GC28 0725 2_OS_VS2_System_Programming_Library_MVS_Diagnostic_Techniques_Rel_3.7_Sep78 2 OS VS2 System Programming Library MVS Diagnostic Techniques Rel 3.7 Sep78
Page Count: 564
GC28-0725-2
File No. S370-37

Systems

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Release 3.7

Includes Selectable Units:

  Scheduler Improvements                                  VS2.03.804
  Supervisor Performance #1                               VS2.03.805
  Supervisor Performance #2                               VS2.03.807
  Service Data Improvements                               VS2.03.817
  JES2 Release 4.1                                        5752-825
  3838 Vector Processing Subsystem Support                5752-829
  Dumping Improvements                                    5752-833
  Attached Processor System for Models 158/168            5752-847
  Hardware Recovery Enhancements                          5752-855
  Interactive Problem Control System (IPCS)               5752-857

Third Edition (September, 1978)

This is a major revision of, and obsoletes, GC28-0725-1, incorporating changes released in the following System Library Supplement:

  Interactive Problem Control System (IPCS) 5752-857      GD23-0095-0 (dated March 31, 1978)

See the Summary of Amendments following the Contents for a summary of the changes that have been made to this manual. A vertical line to the left of the text or illustration indicates a technical change made in this edition; revision bars are not used, however, to indicate changes made in previous editions, technical newsletters, or supplements.

This edition applies to release 3.7 of OS/VS2 and to all subsequent releases of OS/VS2 until otherwise indicated in new editions or Technical Newsletters. Changes are continually made to the information herein; before using this publication in connection with the operation of IBM systems, consult the latest IBM System/370 Bibliography, GC20-0001, for the editions that are applicable and current.

Publications are not stocked at the address given below; requests for IBM publications should be made to your IBM representative or to the IBM branch office serving your locality.

A form for reader's comments is provided at the back of this publication.
If the form has been removed, comments may be addressed to IBM Corporation, Publications Development, Department D58, Building 706-2, PO Box 390, Poughkeepsie, NY 12602. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation whatever. You may, of course, continue to use the information you supply.

© Copyright International Business Machines Corporation 1976, 1977, 1978

Guide for Using This Publication

The following is a list of the requirements for using this publication.

• This publication contains information for the following Selectable Units:

    Scheduler Improvements - SU4
    Supervisor Performance #1 - SU5
    Supervisor Performance #2 - SU7
    Service Data Improvements - SU17
    JES2 Release 4.1 - SU25
    3838 Vector Processing Subsystem Support - SU29
    Dumping Improvements - SU33
    Attached Processor System for Models 158/168 - SU47
    Hardware Recovery Enhancements - SU55
    Interactive Problem Control System (IPCS) - SU57

• To use this publication, you must have installed at least SUs 4, 5, 7, 17, 25 (if you are a JES2 user), 33, and 55.

• The implied date of this publication, for the purpose of adding new supplements/TNLs, is September 30, 1978. Always use the page with the latest date (shown at the top of the page) when adding pages from different supplements/TNLs.

Preface

This publication describes diagnostic techniques and guidelines for isolating problems on MVS systems. It is intended for the use of system programmers and analysts who understand MVS internal logic and who are involved in resolving MVS system problems.

This publication is intended for use only in debugging. None of the information contained herein should be construed as defining a programming interface.

Organization and Contents

This publication stresses a three-step debugging approach:

1. Identifying the external symptom of the problem.
2. Gathering relevant data from system data areas in order to isolate the problem to the component level.
3. Analyzing the component to determine the cause of the problem.

In support of this approach, the publication has been reorganized into three basic parts consisting of five sections and three appendixes as follows:

Part 1

Section 1. General Introduction -- describes the debugging approach that is used and defines the external symptoms that are used to identify a system problem.

Section 2. Important Considerations Unique to MVS -- describes concepts and functions that should be understood prior to undertaking system diagnosis. Included are: global system analysis, system execution modes and status saving, locking, use of recovery work areas, effects of MP, trace analysis, debugging hints, and general data gathering techniques.

Section 3. Diagnostic Materials Approach -- provides guidelines for obtaining and analyzing storage dumps of data areas affected by the problem.

Part 2

Section 4. Symptom Analysis Approach -- describes how to identify an external symptom (loop, wait state, TP problem, performance degradation, or incorrect output), and provides an analysis procedure for determining what kind of problem is causing the symptom.

Section 5. Component Analysis -- describes the operating characteristics and recovery procedures of selected system components and provides debugging techniques for determining the cause of a problem that has been isolated to a particular component.

Part 3

Appendixes

A. -- describes the flow of various MVS processes.
B. -- provides a step-by-step approach to analyzing a stand-alone dump.
C. -- contains definitions of abbreviations used throughout the publication.
Referenced Publications

The following publications either are referenced in this publication or provide related reading:

GA22-7000    System/370 Principles of Operation
GA27-3093    Synchronous Data Link Control General Information
GC34-2006    OS/VS2 MVS Interactive Problem Control System (IPCS) User's Guide and Reference
GC28-0772    OS/VS Environmental Recording Editing and Printing (EREP) Program

OS/VS2 System Programming Library:
GC28-0681    Initialization and Tuning Guide
GC28-0628    Supervisor
GC28-0627    Job Management
GC28-0674    Service Aids
GC28-0677    SYS1.LOGREC Error Recording
GC28-0751 and GC28-0752    Debugging Handbook (2 volumes)
GC28-0703    JES3 Debugging Guide

GC30-2051    OS/VS2 TCAM System Programmer's Guide, TCAM Level 10
GC30-3040    OS/VS TCAM Debugging Guide, TCAM Level 10
GC27-0023    OS/VS2 MVS VTAM Debugging Guide

Operator's Library:
GC38-0229    OS/VS2 MVS System Commands
GC23-0007    OS/VS2 MVS JES2 Commands
GC23-0008    OS/VS2 MVS JES3 Commands
GC27-6997    VTAM Network Operating Procedures
GC30-3037    OS/VS TCAM Level 10

OS/VS Message Library:
GC38-1002    VS2 System Messages
GC38-1008    VS2 System Codes

GY30-3012    3704/3705 Program Reference Handbook
SY26-3823    OS/VS2 I/O Supervisor Logic
SY28-0623    OS/VS2 System Initialization Logic
SY26-3825    OS/VS2 VSAM Logic
SY26-3826    OS/VS2 Catalog Management Logic
SY27-7267    OS/VS2 VTAM Data Areas
SY35-0010    OS/VS2 Access Method Services Logic
SY28-0621    OS/VS2 VTAM Logic
SY28-0713 through SY28-0719    OS/VS2 System Logic Library (7 volumes)
SY35-0011    OS/VS2 CVOL Processor Logic
SY24-6000    OS/VS2 MVS JES2 Logic
SY26-3834    OS/VS2 VIO Logic
SY28-0612    OS/VS2 MVS JES3 Logic
SY30-3032    OS/VS2 TCAM Level 10 Logic
SY30-3013    IBM 3704 and 3705 Communications Controllers NCP/VS Logic
SYB8-0606    OS/VS2 Data Areas (microfiche)
GC30-3004    3704/3705 Communications Controllers Principles of Operation
GC30-3008    IBM 3704/3705 Communications Controllers Emulation Program Generation and Utilities Guide and Reference Manual
GC30-3007    IBM 3704/3705 Communications Controllers NCP/VS Generation and Utilities Guide and Reference Manual

Contents

Section 1. General Introduction .... 1.1.1
  Basic MVS Problem Analysis Techniques .... 1.1.1
  IPCS - Interactive Problem Control System .... 1.1.4

Section 2. Important Considerations Unique to MVS .... 2.1.1
  Global System Analysis .... 2.1.3
    Global Indicators that Determine the Current System State .... 2.1.3
    Work Queues, TCBs and Address Space Analysis .... 2.1.6
    TCB Summary .... 2.1.6
    SRB Dispatching Queues .... 2.1.7
    Address Space Analysis .... 2.1.7
    Task Analysis .... 2.1.8
    Summary .... 2.1.10
  System Execution Modes and Status Saving .... 2.2.1
    System Execution Modes .... 2.2.1
    Task Mode .... 2.2.1
    SRB Mode .... 2.2.2
    Physically Disabled Mode .... 2.2.2
    Locked Mode .... 2.2.3
    Determining Execution Mode From a Stand-alone Dump .... 2.2.4
    Locating Status Information in a Storage Dump .... 2.2.5
    Task/SRB Mode Interruptions .... 2.2.5
    Locally Locked Task Suspension .... 2.2.6
    SRB Suspension .... 2.2.7
  Locking .... 2.3.1
    Classes of Locks .... 2.3.1
    Types of Locks .... 2.3.2
    Locking Hierarchy .... 2.3.3
    Determining Which Locks Are Held On a Processor .... 2.3.4
    Content of Lockwords .... 2.3.5
    How to Find Lockwords .... 2.3.5
    Results of Requests for Unavailable Locks .... 2.3.7
  Use of Recovery Work Areas for Problem Analysis .... 2.4.1
    SYS1.LOGREC Analysis .... 2.4.2
    Listing the SYS1.LOGREC Data Set .... 2.4.2
    SYS1.LOGREC Records .... 2.4.3
    Important Considerations About SYS1.LOGREC Records .... 2.4.13
    SYS1.LOGREC Recording Control Buffer .... 2.4.14
    Formatting the LOGREC Buffer .... 2.4.15
    Finding the LOGREC Recording Control Buffer .... 2.4.15
    Format of the LOGREC Recording Control Buffer .... 2.4.15
    FRR Stacks .... 2.4.17
    Extended Error Descriptor (EED) .... 2.4.19
    RTM2 Work Area (RTM2WA) .... 2.4.19
    Formatted RTM Control Blocks .... 2.4.19
    System Diagnostic Work Area (SDWA) Use in RTM2 .... 2.4.20
  Effects of Multi-Processing On Problem Analysis .... 2.5.1
    Features of an MP Environment .... 2.5.1
    MP Dump Analysis .... 2.5.2
    Data Areas Associated With the MP Environment .... 2.5.3
    Parallelism .... 2.5.4
    General Hints for MP Dump Analysis .... 2.5.6
    Inter-Processor Communication .... 2.5.7
    Direct Services .... 2.5.8
    Remote Pendable Services .... 2.5.9
    Remote Immediate Services .... 2.5.10
    MP Debugging Hints .... 2.5.16
  MVS Trace Analysis .... 2.6.1
    Trace Entries .... 2.6.1
    Trace Examples .... 2.6.3
    Notes for Traces .... 2.6.5
    Tracing Procedure .... 2.6.5
    Cautionary Notes .... 2.6.7
  Miscellaneous Debugging Hints .... 2.7.1
    Alternate CPU Recovery (ACR) Problem Analysis .... 2.7.1
    Pattern Recognition .... 2.7.3
    Low Storage Overlays .... 2.7.4
    Common Bad Addresses .... 2.7.5
    OPEN/CLOSE/EOV ABENDs .... 2.7.5
    Debugging Machine Checks .... 2.7.6
    Debugging Problem Program Abend Dumps .... 2.7.11
    Debugging from Summary SVC Dumps .... 2.7.14
    SUMDUMP Output for SVC-Entry SDUMP .... 2.7.14
    SUMDUMP Output for Branch-Entry SDUMP .... 2.7.16
    Started Task Control ABEND and Reason Codes .... 2.7.18
    SWA Manager Reason Codes .... 2.7.19
  Additional Data Gathering Techniques .... 2.8.1
    Using the CHNGDUMP, DISPLAY DUMP, and DUMP Commands .... 2.8.2
    How to Print Dumps .... 2.8.2
    How to Automatically Establish System Options for SVC Dump .... 2.8.5
    How to Copy PRDMP Tapes .... 2.8.5
    How to Rebuild SYS1.UADS .... 2.8.6
    How to Print SYS1.DUMPxx .... 2.8.7
    How to Clear SYS1.DUMPxx Without Printing .... 2.8.7
    How to Print the SYS1.COMWRITE Data Set .... 2.8.8
    How to Print an LMOD Map of a Module .... 2.8.8
    How to Re-create SYS1.STGINDEX .... 2.8.9
    Software LOGREC Recording .... 2.8.9
    Using the PSA as a Patch Area .... 2.8.10
    Using the SLIP Command .... 2.8.10
    Designing an Effective SLIP Trap .... 2.8.12
    Enabling the PER Hardware to Monitor Storage Locations .... 2.8.15
    System Stop Routine .... 2.8.17
    Using the MVS Trace to Monitor Storage .... 2.8.18
    How To Expand the Trace Table .... 2.8.18

Section 3. Diagnostic Materials Approach .... 3.1.1
  Standalone Dumps .... 3.1.3
  SVC Dumps .... 3.1.5
    How to Change the Contents of an SVC Dump Issued by an Individual Recovery Routine .... 3.1.6
    SDUMP Parameter List .... 3.1.7
  SYSABENDs, SYSMDUMPs, and SYSUDUMPs .... 3.1.9
    Software-Detected Errors .... 3.1.9
    Hardware-Detected Errors .... 3.1.10

Section 4. Symptom Analysis Approach .... 4.1.1
  Waits .... 4.1.3
    Characteristics of Enabled Waits .... 4.1.3
    Characteristics of Disabled Waits .... 4.1.4
    Analysis Approach for Disabled Waits .... 4.1.5
    Analysis Approach for Enabled Waits .... 4.1.7
    Stage 1: Preliminary Global System Analysis .... 4.1.8
    Stage 2: Key Subsystem Analysis .... 4.1.10
    Stage 3: System Analysis .... 4.1.15
  Loops .... 4.2.1
    Common Loop Situations .... 4.2.1
    Analysis Procedure .... 4.2.2
  TP Problems .... 4.3.1
    Message Flow Through the System .... 4.3.1
    Types of Traces .... 4.3.3
    EP Mode Traces .... 4.3.4
    NCP Mode Traces .... 4.3.5
    Trace Output Under Normal Conditions .... 4.3.7
    Example 1: VTAM I/O Trace .... 4.3.7
    Example 2: VTAM and GTF Traces .... 4.3.12
    Notes on Examples 1 and 2 .... 4.3.27
    Summary .... 4.3.28
    VTAM Buffer Trace Modification .... 4.3.29
    VTAM I/O Trace (RNIO) Modification .... 4.3.29
    Other Tracing Methods .... 4.3.30
  Performance Degradation .... 4.4.1
    Operator Commands .... 4.4.1
    Dump Analysis Areas .... 4.4.2
  Incorrect Output .... 4.5.1
    Initial Analysis Steps .... 4.5.1
    Isolating the Component .... 4.5.1
    Analyzing System Functions .... 4.5.2
    Summary .... 4.5.3

Section 5. Component Analysis .... 5.1.1
  Dispatcher .... 5.1.3
    Important Dispatcher Entry Points .... 5.1.3
    Dispatchable Units and Sequencing of Dispatching .... 5.1.4
    Dispatchability Tests .... 5.1.10
    Miscellaneous Notes About the Dispatcher .... 5.1.12
    Dispatcher Recovery Considerations .... 5.1.13
    Dispatcher Error Conditions .... 5.1.14
  IOS .... 5.2.1
    Front-End Processing .... 5.2.1
    Back-End Processing .... 5.2.1
    IOS Problem Analysis .... 5.2.1
    IOS Abend Codes .... 5.2.4
    Loops .... 5.2.4
    IOS Wait States .... 5.2.5
    General Hints for IOS Problem Analysis .... 5.2.6
    Error Recovery Procedures (ERPs) .... 5.2.8
    IOS and ERP Processing .... 5.2.8
    Identifying ERP Module Names .... 5.2.9
    How ERP Transfers Control .... 5.2.9
    Abnormal End Appendages .... 5.2.10
    Retry/Restart the Channel Program .... 5.2.11
    Error Interpreter .... 5.2.11
    ERP Messages and Logging .... 5.2.12
    Intercept Conditions .... 5.2.13
    Unit Check on Sense Command .... 5.2.13
    Compound Errors .... 5.2.13
    Diagnostic Approach .... 5.2.14
  Program Manager .... 5.3.1
    Functional Description .... 5.3.1
    Program Manager Organization .... 5.3.1
    Program Manager Control Blocks .... 5.3.1
    Program Manager Queues .... 5.3.2
    Queue Validation .... 5.3.4
    System Initialization .... 5.3.5
    Basic Functional Flow .... 5.3.5
    LINK .... 5.3.5
    ATTACH .... 5.3.8
    XCTL .... 5.3.8
    LOAD .... 5.3.11
    DELETE .... 5.3.11
    Exit Resource Manager .... 5.3.11
    SYNCH .... 5.3.12
    IDENTIFY .... 5.3.12
    Abend Resource Manager .... 5.3.13
    806 Abend .... 5.3.14
    APF Authorization .... 5.3.14
    Module Subpools .... 5.3.19
    Fetch/Program Manager Work Area (FETWK) .... 5.3.19
    RB Extended Save Area (RBEXSAVE) .... 5.3.20
  VSM .... 5.4.1
    Address Space Initialization .... 5.4.3
    Step Initialization/Termination .... 5.4.5
    Virtual Storage Allocation .... 5.4.6
    GETMAIN's Functional Recovery Routine .... 5.4.8
    VSM Cell Pool Management .... 5.4.10
    Miscellaneous Debugging Hints .... 5.4.10
  Real Storage Manager (RSM) .... 5.5.1
    Major RSM Control Blocks .... 5.5.1
    PCB .... 5.5.3
    SPCT .... 5.5.5
    PFTE .... 5.5.6
    Page Stealing .... 5.5.6
    Reclaim .... 5.5.8
    Relate .... 5.5.8
    RSM Recovery .... 5.5.9
    RSM Debugging Tips .... 5.5.12
    Converting a Virtual Address to a Real Address .... 5.5.13
    Example: Converting a Virtual Address to a Real Address .... 5.5.15
  Auxiliary Storage Manager (ASM) .... 5.6.1
    Component Functional Flow .... 5.6.2
    Saving an LG .... 5.6.2
    Requesting I/O .... 5.6.3
    Requesting Swap I/O .... 5.6.4
    Component Operating Characteristics .... 5.6.4
    System Mode .... 5.6.4
    Address Space, Task, and SRB Structure .... 5.6.6
    Storage Considerations .... 5.6.6
    MP Considerations .... 5.6.6
    Interfaces With Other Components .... 5.6.7
    Register Conventions .... 5.6.7
    Footprints and Traces .... 5.6.7
    General Debugging Approach .... 5.6.8
    Paging Interlocks .... 5.6.8
    Incorrect Pages .... 5.6.9
    Finding the LSID for a Given Page .... 5.6.10
    Finding LSIDs of VIO Data Sets .... 5.6.10
    Locate PART and PAT Bit .... 5.6.12
    Converting a Slot Number to a Full Seek Address .... 5.6.14
    Unusable Paging Data Sets .... 5.6.15
    Page/Swap Data Set Errors .... 5.6.17
    Error Analysis Suggestions .... 5.6.18
    Validity Checking .... 5.6.19
    ASM Serialization .... 5.6.19
    SALLOC Lock .... 5.6.19
    ASM Class Locks .... 5.6.20
    Local Lock of Current Address Space .... 5.6.21
    Compare and Swap (CS) Serialization .... 5.6.21
    Serialization via Control Block Queues .... 5.6.22
    Recovery Considerations .... 5.6.22
    Recovery Traces .... 5.6.23
    Recovery Structure .... 5.6.23
    Recovery as a Debugging Tool .... 5.6.24
    Recovery Footprints .... 5.6.24
    FRR/ESTAE Work Areas .... 5.6.24
    SDWA Variable Recording Area .... 5.6.25
    ASM Diagnostic Aids .... 5.6.25
    COD ABEND Meanings for ASM .... 5.6.26
    ASM Recovery Control Blocks .... 5.6.26
    ASM Tracking Area (ATA) .... 5.6.26
    Recovery Audit Trail Path (EPATH) .... 5.6.29
    Additional ASM Data Areas .... 5.6.32
    BSHEADER .... 5.6.32
    BUFCONBK .... 5.6.33
    DSNLIST .... 5.6.33
    MSGBUFER .... 5.6.34
  System Resources Manager (SRM) .... 5.7.1
    SRM Objectives .... 5.7.1
    Address Space States .... 5.7.2
    SRM Indicators .... 5.7.3
    System Indicators .... 5.7.3
    Individual User Indicators .... 5.7.6
    Other Indicators .... 5.7.8
    SRM Error Recovery .... 5.7.8
    Module Entry Point Summaries .... 5.7.8
    IRARMINT - SRM Interface Routine .... 5.7.9
    IRARMEVT - SRM SYSEVENT Router .... 5.7.9
    IRARMSTM - Storage Management Routine .... 5.7.9
    IRARMSRV - SRM Service Routine .... 5.7.10
    IRARMERR - SRM's Functional Recovery Routine .... 5.7.10
    IRARMCPM - Processor Management .... 5.7.11
    IRARMIOM - I/O Management .... 5.7.12
    IRARMRMR - Resource Manager .... 5.7.13
    IRARMCTL - SRM Control Algorithms .... 5.7.13
    IRARMWAR - Workload Activity Recording .... 5.7.15
    IRARMWLM - SRM Workload Manager .... 5.7.16
  VTAM .... 5.8.1
    VTAM's Relationship With MVS .... 5.8.1
    Processing Work Through VTAM .... 5.8.2
    VTAM Function Management Control Block (FMCB) .... 5.8.5
    VTAM Operating Characteristics .... 5.8.6
    Module Naming Convention .... 5.8.6
    Address Space Usage .... 5.8.6
    Locking .... 5.8.7
    VTAM Recovery/Termination .... 5.8.8
    VTAM Debugging .... 5.8.10
    Waits .... 5.8.11
    Program Checks .... 5.8.15
    Miscellaneous Hints on VTAM .... 5.8.15
  VSAM .... 5.9.1
    Record Management .... 5.9.1
    RPL .... 5.9.1
    PLH .... 5.9.2
    BUFC .... 5.9.3
    Record Management Debugging Aids
    Open/Close/End-of-Volume
    O/C/EOV Debugging Aids
    I/O Manager
    I/O Manager Debugging
  Catalog Management
Major Registers and Control Blocks. . . . . . . . . • . . . . . . . . . . . . . • . . . How to Find Registers . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . . . Major Registers . . . . . . • . . • . . . . . . . . . . . . • . . . . . . . . . . . . . Major Control Blocks . . . . . . . . . . . . ~ . . . . . . • . . . . . . . . . . . . . Module Structure. • . • . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . . VSAM Catalog Recovery Logic . . . . , • . . • . . . . . . • . • . . . . . . . . . . . Establishing/Releasing a Recovery Environment . . . . . . . . . . . . . . . . . Maintaining a Pushdown List End Mark . . . . . . . . . . . • . . . . • . . . • . Tracking GETMAIN/FREEMAIN Activity . . . . . . . . . . . . . . . . . . . . CMS Function Gate. . . . . . . . . . . . . . . . • . . . . . • . . . . . . . . '. . . Recovery Routine Functions . . . . • . . . • . . . • . . . . . . . . . . . . . . . . . Diagnostic Output . . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . . Backout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Drop Catalog Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Storage Freeup . . . . . . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DEFINE/DELETE Backout . . . . . . . . . . . . . . . • . . . . . . . . . . . . . Debugging Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Allocation/Unallocation . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functional Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Unallocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • . Batch Initialization and Control. . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic Initialization and Control. . . . . . . . . . . . . . . . . . . . . . . . . 
JFCB Housekeeping . . . . . . . . . . .'. . . . . . . . . . • . . . . . . . . . . . Common Allocation . . . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fixed Device Allocation . • . • • • . . . . . . . . . . . . . . . . . . . . . . . TP Allocation . • . . . . . . . . • • . . . . . . . . . . . . . . .' . . • . . . . . Generic Allocation . . . . . . . . • . . . • . . . . . . . . . • . . . . . . . . . Recovery Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Common Unallocation . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Volume Mount and Verify. . . • . . . . . . . • . . . . . . . . . . . . . . . . . . General Debugging Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Allocation Module Naming Conventions. . . . . . . . . . . . . . . . . • . . . . Registers and Save Areas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Common Allocation Control Block Processing • . . . . . . . . . . . . . . . . . ESTAE Processing . • . . • . . . . . . . .. . . . . . • . . . . • • . . . . . . . . • Debugging Hints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Allocation Serialization . . .'. . . . . . . . . . . . . . . . . . . . • . . . . • . . Subsystem Allocation Serialization. . . . . . . . . . . . . . . . . . . . . . . . . Device Selection Problems (Non-Abend) . . . . . . . . . . . . . . . . . . . . . . Address Space Termination. . . . . . . . . . . . . . . . . . . . . . . . . . . • . OBO Abend. . . . . . . . . . . . . . . . . . . . ' . . . . . . . . . . . . . . . . . . . OC4 Abend in IEFAB4FC, or Loop in IEFDB413 . . . . . . . . . . . . . . . . Volume Mount and Verify (VM&V) Waiting Mechanism . . . . . . . • . . . . . Allocation/Unallocation Reason Codes. . . . . • . . . • . • . . . . • . . . . . . . . Common and Batch Allocation and JFCB Housekeeping Reason Codes . . . . Common and Batch Unallocation Reason Codes . . . . . . . . . . . . . . . . . 
Dynamic Allocation Reason Codes. . . • . . . • . . . . . . . . . . . . . . . . . JES2 Job Processing Through JES2 . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conversion. . . . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . . . Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . . . . . . . Purge . . . . . . . . . . . . . . . . . . '. . . . . . . . . . . . . . . ~ . . . . . . . . xiv OS/VS2 System Programming Library: MVS Diagnostic Techniques 5.9.3 5.9.6 5.9.7 5.9.8 5.9.9 5.10.1 5.10.1 5.10.1 5.10.2 5.10.2 5.10.9 5.10.10 5.10.10 5~10.10 5.10.11 5.10.11 5.10.12 5.10.12 5.10.13 5.10.13 5.10.13 5.10.14 5.10.15 5.11.1 5.11.1 5.11.2 5.11.2 5.11.2 5.11.3 5.11.3 5.11.4 5.11.4 5.11.4 5.11.5 5.11.5 5.11.5 5.11.5 5.11.6 5.11.6 5.11.6 5.11.7 5.11.10 5.11.11 5.11.11 5.11.12 5.11.12 5.11.13 5.11.13 5.11.13 5.11.14 5.11.16 5.11.16 5.11.19 5.11.19 5.12.1 5.12.1 5.12.1 5.12.1 5.12.1 5.12.1 5.12.2 JES2 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.2 HASJES20 Program Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.2 HASJES20 Module Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.3 HASP Control Table (HCT) . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 5.12.4 HASPSSSM, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.6 Subsystem Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.7 Dispatcher Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.9 $WAIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.9 $$POST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.10 JES2 WAIT . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.10 Dispatcher Queue Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.10 JES2 Error Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.11 Disastrous Error Routine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.11 JES2 ESTAE Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.13 Catastrophic Error Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.13 JES2 Exit Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.13 Input/Output Error Logging Routine . . . . . . . . . . . . . . . . . . . . . . . . 5.12.14 JES2 $DEBUG Functions In a Multi-Access Spool Configuration . . . . . . . . . 5.12.14 Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.15 Read . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.15 5.12.15 Write . . . . . . . . . . . . . . . . . . . . . . . . . . '. . . . . . . . . Release . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.16 Miscellaneous Hints on JES2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.16 Starting JES2 - Enqueue Wait on STCQUE . . . . . . . . . . . . . . . . . . . . 5.12.16 Subsystem Interface (SSI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.1 5.13.1 System Initialization Processing . . . . . . . . . . . . . . . . . . . . . . . . 5.13.2 Subsystem Interface Major Control Blocks . . . . . . . . . . . . . . . . . 5.13.5 Requesting Subsystem Services . . . . . . . . . . . . . . . . . . . . . . . . 5.13.5 Invoking the Subsystem Interface. . . . . . . . . . . . . . . . . . . . . Logic Flow Examples. . . . . . ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.7 Notifying a Single Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
5.13.7 Notifying All Active Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.8 Debugging Hints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . 5.13.9 Recovery Termination Manager (R TM) . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1 Functional Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1 Work Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1 Major RTM Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1 Process Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.2 Hardware Error Processing.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.2 Normal Task Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.4 Abnormal Task Termination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.5 Retry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.6 Cancel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.7 FORCE Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.8 5.14.9 Address-Space Termination . . . . . . . . . . . . . . . . . . . . . 5.14.10 Error ID . . . . . . . . . . . . . . . . . . . . , . . . . . . . . . . . . . . . . SVC Dump Debugging Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ,5.14.11 Important SVC Dump Entry Points . . . . . . . . . . . . . . . . . . . . . . . . 5.14.11 BRANCH=YES Option. . . . . . . . . . . . . 5.14.11 BRANCH=NO Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.11 SVC Dump Error Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.12 SYS1.LOGREC Entries Produced for SVC Dump Errors 5.14.12 Fixed Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.12 Variable Data. . . . . . . . . . . . . . . . . . . . . . . . . 
5.14.13 Control Blocks Used to Debug SVC Dump Errors . . . . . . . 5.14.14 Address Space Control Block (ASCB) . . . . . . . . . . . . 5.14.14 Recovery Termination Control Table (RTCT) . . . . . . , . . . . . . . . . 5.14.14 SVC Dump Work Area (SDWORK). .. . . . . . . . . . . . . . . . . . . . 5.14.14 Summary Dump Work Area (SMWK). . . . . . . . . 5.14.14 Resource Cleanup for SVC Dump. . . . .. . . . . . . . . . . . . . . . . . . . 5.14.15 Contents 'xv Communications Task . . . . • . . . . . . . . . . . . . . • . . . . . . . . . . . . • . . . Functional Description. . . . . . . . . . . . . • . . . . • . . . . . . . . • . . • . . . Communications Task Control Blocks. . . • . . . . . . . . . . . . . . . . . • . .. Debugging Hints. . . . . . . . . • . .. . . . . • . . • • . . . . . . . . . . . . . • . . Console Not Responding to Attention. . . . . . . • . . . . . . . . . . • . . . . Enabled Wait State. . • . . • . . . . • . . . . . . . . . . . . . . . . . . . . . . . Disabled Wait State . . . . . . . . . • . . . . . . . . . . • . . . . . . . • . . . . . Messages or Replies Lost. . . . . . . . • . . . • . . . . . . . . . . . . . . . . . . No Messages on One Console . . . • . . . . . . . . . . . . . . . . . . . . • . . . Messages Routed to Wrong Console. • . . . . . . . . . • . . . . . . . . . . • . . Truncated Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Console Switching . . . . . . • . . . . . . . . . • . . . . . . . . . . . . . . . . • DIDOCS Trace Table. ; . . . . • . . . . . . . . . . . . . . . . . . . . . . . . . . DIDOCS-In-Operation Indicator . . . . . . . . • . . . . . • . . . . . . . . . . . DIDOCS Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix A: Process Flows . . . . . . . . . . . . . . . . • • . . . . . . . . . • . . . • . RSM Processing for Page Faults. . . . . . . . • . . . . . . . . . . . . . . . . . . . . .. IEAVPIX Tests . . . . . . . . . • . . . . . • . . . . . . . . . • . . . . . 
• . . . . .. IEAVGF A Tests. . . . . . . . . . . . . . . . . . . . • . • . . . • . . . . . . . . . . . IEAVPIOP Tests . . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . • . . IEAVIOCP Tests . . . . . . . . . . . . • . . . . . . . . • . . . . . • • . . . . . • . . Swapping. . . . . . . . '. . . . . . . . . . . . . . . . • . . • . . . . . . . . . . . . . . .. Swap-In Process . . . . . . . • . . . . . . . . . '. . . . . • . . . . . . . . . . . . . . . Swap-Out Process . . . . . . . . . . ; . . . . . . . . . . . . . . . . . . . . . . . . . . EXCP/IOS . . . . . . . . . . . . . . . . . . . . . . . " . . . . . . . . . . . . . . . . . . . GETMAIN/FREEMAIN . . . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . " GETMAIN Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FREEMAIN Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VTAM Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TSO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • . Time Sharing Initialization. . . . . ~ . . . . . . . . . . . . . . . . . . . . . . . . . . LOGON Processing. . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . • . . . . LOGON Scheduling Diagnostic Aids . . . . . . . . . . . . . . . . . . . . . . . . TSO Line Drop Processing . . . . . . . . . . . . . . . . . . . . • . . . . . • • . . . . TMP and Command Processor Interface . . . . . . . ',' . . . . . . . . . . . • . . . TSO Command Processor Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . TSO Terminal I/O Overview. . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . Terminal Output Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Terminal Input Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . TSO/TIOC Terminal I/O Diagnostic Techniques . . . . . . . . . . . . . . . . . . . TSO Attention Processing . . . . . . 

Appendix B: Stand-alone Dump Analysis
  Overview
  Analysis Procedure

Appendix C: Abbreviations

Index

Figures

Figure 2-1. Definition and Hierarchy of MVS Locks
Figure 2-2. Bit Map to Show Locks Held on a Processor
Figure 2-3. Classification and Location of Locks
Figure 2-4. SYS1.LOGREC Software Incident Record 1
Figure 2-5. SYS1.LOGREC Software Incident Record 2
Figure 2-6. SYS1.LOGREC Software Incident Record 3
Figure 2-7. Format of the LOGREC Recording Control Buffer
Figure 2-8. Format of Records Within the LOGREC Recording Control Buffer
Figure 2-9. SIGP Return Codes
Figure 2-10. External Call (XC) Process Flow
Figure 2-11. Emergency Signal (EMS) Process Flow
Figure 2-12. How to Locate the Trace Table
Figure 2-13. Types of Trace Entries
Figure 2-14. MVS Trace of a Page Fault Without I/O
Figure 2-15. MVS Trace of a Page Fault With I/O
Figure 2-16. GTF Trace of a Page Fault Without I/O
Figure 2-17. GTF Trace of a Page Fault With I/O
Figure 2-18. Trace Example of PER Hardware Monitoring
Figure 4-1. Summary of EP and UCP Mode Traces
Figure 4-2. VTAM I/O Trace Example
Figure 4-3. VTAM and GTF Traces Example
Figure 4-4. JES2 Commands for Status Information
Figure 4-5. System Use of Hardware Components
Figure 5-1. Global SRB Queue Structure and Control Block Relationships
Figure 5-2. Local SRB Queue Structure and Control Block Relationships
Figure 5-3. Dispatcher Processing Overview
Figure 5-4. IOS Processing Overview
Figure 5-5. Major IOS and EXCP Control Block Relationships
Figure 5-6. Program Manager Modules
Figure 5-7. Program Manager Control Blocks and Work Areas
Figure 5-8. Program Manager Queues
Figure 5-9. IEAVNP05 Initialization
Figure 5-10. New PRB Initialization - LINK
Figure 5-11. New RB Initialization - XCTL
Figure 5-12. XCTL RB Manipulation
Figure 5-13. CDE Initialization by IDENTIFY
Figure 5-14. Module Search Sequence for LINK, ATTACH, XCTL and LOAD
Figure 5-15. Module Search Sequence of Private Libraries
Figure 5-16. CDE Allocation
Figure 5-17. VSM's View of MVS Storage
Figure 5-18. VSM's Control Block Usage
Figure 5-19. VSM's Global Data Area
Figure 5-20. SDWAVRA Error Indicators
Figure 5-21. VSM Cell Pool Management
Figure 5-22. Major RSM Control Blocks and Their Functions
Figure 5-23. Relationship of Critical RSM Control Blocks
Figure 5-24. Page Stealing Process Flow
Figure 5-25. Converting Virtual Addresses to Real Addresses
Figure 5-26. Relationship of Important ASM Control Blocks
Figure 5-27. Locating an LSID From an LPID
Figure 5-28. Relating the Virtual Address to the PART and PAT
Figure 5-29. Page/Swap Data Set Error Action Matrix
Figure 5-30. SRM Control Block Overview
Figure 5-31. SRM Module/Entry Point Cross Reference
Figure 5-32. VTAM Control Block Structure
Figure 5-33. Several RPHs Waiting for the Same Lock
Figure 5-34. Sample Storage Pool Dump
Figure 5-35. Queueing of RPHs While Waiting for Storage
Figure 5-36. Relationship of the Six Major Functions of Allocation/Unallocation
Figure 5-37. Common Allocation Input
Figure 5-38. Common Allocation Control Blocks After Construction of Volunit Table and EDLs
Figure 5-39. VM&V Control Block Structure
Figure 5-40. HASJES20 Module Map
Figure 5-41. Locating the JES2 Module Directory in HASPNUC
Figure 5-42. HCT Major Vector Fields
Figure 5-43. The Subsystem Vector Table
Figure 5-44. HASPSSM - HASJES20 - OS/VS2 Relationship
Figure 5-45. Formal Subsystem-Interface Vectors
Figure 5-46. JES2 Queue Control Fields
Figure 5-47. JES2 Processor Control Element Relationships
Figure 5-48. Example Dump of JES2 Processor Queue Chains
Figure 5-49. Major JES2 Control Blocks
Figure 5-50. Subsystem Interface Control Block Usage
Figure 5-51. Control Block Structure for Invoking Subsystem Interface
Figure 5-52. Finding the SSIB for a Job When SSOB Pointer is Zero
Figure 5-53. Sequence of Communications Task Processing
Figure 5-54. Communications Task Control Block Structure
Figure A-1. Page Fault Process Flow
Figure A-2. Swap-In Process Flow
Figure A-3. Swap-Out Process Flow
Figure A-4. IOS/EXCP Process Flow
Figure A-5. VTAM SEND Process Flow
Figure A-6. Overview of Logon Processing
Figure A-7. TCAM Organization After a TSO Logon
Figure A-8. Logon Work Area
Figure A-9. LOGON Work Area Bits That Indicate the Currently Executing Module
Figure A-10. LOGON Scheduling Post Codes
Figure A-11. Overview of TSO Line Drop Process
Figure A-12. Summary of Command Processor Recovery Activity
Figure A-13. TSO Attention Flow
Figure B-1. Standalone Dump Analysis Flowchart

Summary of Amendments for GC28-0725-2 VS2 Release 3.7

Changes have been made throughout this publication to reflect a Service Update to OS/VS2 Release 3.7 and to include the following topics:

Diagnostic Aids Information

Information from OS/VS2 System Logic Library, Volume 7, SY28-0719, was added in the following topics:
• Started task control (STC) abend and reason codes.
• Scheduler work area (SWA) manager reason codes.
• Auxiliary storage manager (ASM) diagnostic aids and serialization information.
• Allocation/unallocation reason codes.
• TSO logon scheduling.
• Communications task overview and diagnostic aids.
• DIDOCS diagnostic aids.

Also, diagnostic aids information was added for:
• Error recovery procedures (ERPs).
• Converting virtual addresses to real addresses.
• JES2 miscellaneous hints.

Interactive Problem Control System (IPCS), SU57

Overview information was added for IPCS.

Miscellaneous Changes

Throughout the text:
• Minor technical and editorial changes were made.
• References to DSS (dynamic support system) were removed.
• References to EREP0 were changed to EREP1 (environmental recording editing and printing).

Section 1. General Introduction

This section introduces basic MVS problem analysis and provides an overview of the interactive problem control system (IPCS).

Basic MVS Problem Analysis Techniques

Problem isolation and determination are significantly more complex in MVS than in previous operating systems because of:
• Enabled System Design, which has made the internal and environmental status-saving functions more extensive than those of previous systems.
• Multiprocessing (MP), which potentially allows the execution of code in sequences not encountered in a uniprocessing (UP) environment. MP can also cause contention for serially reusable resources. (In this manual, MP refers to multiprocessing on both multiprocessors and attached processors.)
• Locking Mechanism, which facilitates Enabled System Design and Multiprocessing functions and maintains data integrity.
• Subsystems, which are responsible for processing work requested from the system. They maintain their own work queues, control block structures, and dispatching mechanisms, all of which must be understood in order to pursue problems in the MVS operating system effectively.
• Software Recovery, which attempts to keep the system available despite errors.
• The large number of new components, which provide new functions and whose internal logic must be understood for effective problem determination.

As a result of this complexity, MVS problem solvers have made two adjustments in their diagnostic outlook:
• Rather than learning the system logic at an instruction or module level, they have learned the system in terms of component interactions at the interface level.
• They have learned that the most effective problem analysis at a system level is obtained from a disciplined, almost formal, diagnostic approach.

This publication contains those debugging techniques and guidelines that have proven the most useful to problem solvers with several years' experience in analyzing MVS system problems.
These techniques are presented in terms of a debugging "approach" that can be summarized in three steps:
1. Identifying the external symptom of the problem.
2. Gathering relevant data from system data areas in order to isolate the problem to a component.
3. Analyzing the component to determine the cause of the problem.

The most important step in this approach is often the first: correctly identifying the external symptom of a problem. To do this, it is best to get a description of the problem as it was perceived by an eyewitness. You will want a description that provides a context from which to start, such as:
"System is looping; can't get in from console."
"Job abended with 213."
"I/O error on 251."
"Console locked out."
"Terminal hung, keyboard locked."
"System in wait, nothing running."
"Bad output."
"Job won't cancel."
"System degrading. Very slow."
"System died."
"0C4 in component abc."

The list is endless, of course. Your objective is to fit one (or more) of these descriptions to one of the following external symptoms:
• Enabled wait - The system is not executing any work and when it takes interrupts, nothing happens. Something appears to be stuck.
• Disabled wait - The system freezes with a disabled PSW that has the wait bit on. This can be either an explicit and intentional disabled wait or a situation that occurs because the PSW area has been overlaid. Unfortunately, the latter is more often the case.
• Disabled loop - This is normally a small (fewer than 50 instructions) loop in disabled code.
• Enabled loop - This is normally a large loop in enabled code (and may include disabled portions - loops as a result of interrupts).
• Program check - The program is automatically cancelled by the system, usually because of improper specification or incorrect use of instructions or data in the program.
The program check message gives the location of the failing operation and the condition code. If a SYSABEND, SYSMDUMP, or SYSUDUMP DD statement was included in the JCL for the job, a dump of the problem program will be taken.
• ABEND - The system issues an SVC 13 with a specific code from 1 to 4095 to indicate an abnormal situation.
• Incorrect output - The system is not producing expected output. Incorrect output can be categorized as: missing records, duplicate records, or invalid data that has sequence errors, incorrect values, format errors, or meaningless data. If a program has apparently executed successfully, incorrect results will not be detected until the data is used at some future time.
• Performance degradation - A bottleneck or system failure (hardware or software) has severely degraded job execution and throughput.
• TP problem - A problem, usually detected by the operator or terminal user, that indicates malfunctions are affecting one or more terminals, lines, etc.

The chapters in Section 4 (Symptom Analysis Approach) will help you identify these symptoms.

The main rule at this stage of your analysis is to proceed carefully. When first screening a problem, do not assume too much. Don't even assume that the original eyewitness description was correct. Keep all initial information about the problem as a reference for your later analysis.

In the course of identifying the correct external symptom, you will begin gathering data that will lead you to other sections of the publication. Specific data gathering techniques are contained in Sections 2 and 3. Section 2 describes the major MVS debugging areas such as LOGREC records and recovery work areas. Section 3 describes how to use a storage dump effectively as your main source of diagnostic material.

Eventually you should have gathered enough data to isolate the problem to a particular component or process.
Section 5 and Appendix A provide techniques for analyzing system components and processes so that you can determine the cause of the problem. Appendix B contains a step-by-step procedure that can be used as a guide for analyzing a stand-alone dump.

Note: Before you begin using this publication for problem analysis, scan through it to find out where the various types of information are located. Depending on your current debugging skill level, various sections will be more important than others.

Always keep in mind that trouble-shooting a system of the internal complexity of MVS is not always an "If A, then B" procedure. The guidelines and techniques presented in this publication define "generally" what the analyst will discover. The nature of the debugging process is such that the problem solver does not perform the same analysis for every problem.

IPCS - Interactive Problem Control System

The interactive problem control system (IPCS) provides MVS installations with expanded capabilities for diagnosing software failures and facilities for managing problem information and status. IPCS includes facilities for:
• Online examination of storage dumps.
• Analysis of key MVS system components and control blocks.
• Online management of a directory of software problems that have occurred in the user's system.
• Online management of a directory of problem-related data, such as dumps or the output of service aids.

IPCS runs as a command processor under TSO, allowing the user to make use of existing TSO facilities from IPCS, including the ability to create and execute command procedures (CLISTs) containing the IPCS command and its subcommands.

IPCS supports three forms of MVS storage dumps:
• High-speed stand-alone dumps produced by AMDSADMP.
• Virtual dumps produced by MVS SDUMP on SYS1.DUMP data sets.
• Virtual dumps produced by MVS SDUMP on data sets specified by the SYSMDUMP DD statement.

Dumps on data sets specified by the SYSABEND or SYSUDUMP DD statements cannot be analyzed using the IPCS facilities.

For information about IPCS, refer to the OS/VS2 MVS Interactive Problem Control System (IPCS) User's Guide and Reference.

Section 2. Important Considerations Unique to MVS

This section describes concepts and functions that are unique to the MVS environment and useful to problem analysis. It also contains miscellaneous debugging hints and general data gathering techniques. The chapters in this section are:
• Global System Analysis
• System Execution Modes and Status Saving
• Locking
• Use of Recovery Work Areas in Problem Analysis
• Effects of Multi-Processing on Problem Analysis
• MVS Trace Analysis
• Miscellaneous Debugging Hints
• Additional Data Gathering Techniques

Global System Analysis

In trying to isolate a problem to an internal symptom, a global system analysis often uncovers enough data to provide a starting point for the actual problem isolation and debugging. This chapter discusses the main considerations the analyst should be aware of when analyzing a stand-alone dump, including:
• The system areas that should be inspected to understand the current system state at the time of a dump
• The system areas that should be examined to understand the current state of the work in the system and the current disposition of storage and tasks

Global Indicators That Determine the Current System State

The following areas should be examined to help determine the current state of the system:

1. PSA - occupies the first 4K bytes of real storage for each processor. Note that absolute 0 is not used during normal system operation on a machine with the MP feature - this is true whether the system is operating in MP or UP.
(The one exception is a control program that is system generated with ACRCODE=NO.) During NIP processing the PSA(s) for the processor(s) are initialized and the prefix register(s) are initialized to point to them.

Special Notes About Stand-alone Dumps:

• Before taking a stand-alone dump, it is necessary to perform a STORE STATUS operation. This hardware facility does not use prefixing; instead it stores values such as the current PSW, registers, CPU timer, and clock comparator in the unprefixed PSA (the one used before NIP initialized the prefix register) at absolute address 100. The dump program subsequently saves these values and, in an MP environment, issues a SIGP instruction to the other processor requesting a STORE STATUS operation. As a result, these values in the unprefixed PSA are overlaid by the second processor's values. Therefore, in an MP environment the status in the unprefixed PSA is always that of the non-IPLed processor, not the one on which the stand-alone dump was IPLed.

• In a machine not equipped with the MP feature and therefore without prefixing, the IPLing of the stand-alone dump program causes low storage (0-X'18') to be overlaid with CCWs. You should be aware of this and not consider it as a low storage overlay.

• In an MP environment, the STORE STATUS operation must be performed only from the processor to be IPLed for the stand-alone dump program.

• IPLing the stand-alone dump program twice causes the storage dump to contain a dump of itself because it was read in for the first IPL. This causes the dump program to overlay a certain portion of the nucleus (generally starting at X'7000') and the general purpose registers to contain values associated with the stand-alone dump program and not MVS.
• If the operator does not issue the STORE STATUS instruction before IPLing a stand-alone dump, the message "ONLY GENERAL PURPOSE REGS VALID" appears on the formatted dump. The PSW, control registers, etc., are not included. This greatly hampers the debugger's task.

2. Registers and PSW - The print dump program formats the current PSW and the general, floating point, and control registers associated with each processor. From these, you can determine the program executing on each processor. If the current PSW is 070E0000 00000000 and the GPRs are all 0, you are in the no-work wait condition, which indicates no ready work is available for this processor to execute. If there is or should be work remaining, an invalid wait condition results. (Refer to the chapter on "Waits" in Section 4.) If the registers are not equal to zero and the PSW does not contain the wait bit (X'0002'), there is an active program. If the wait task is dispatched, the system is in the no-work wait condition.

3. ILC/CC - location X'84' for external interrupts; location X'88' for SVC interrupts; location X'8C' for program interrupts. These fields indicate the last type of interrupt associated with each interrupt class for each processor. The work active when each interrupt occurs is represented by the old PSWs at locations X'18' (external); X'20' (SVC); X'28' (program). Common contents of these fields are:

X'84' - 00001004 clock comparator
        00001005 CPU timer
        00001201 SIGP-emergency signal
        00001202 SIGP-external call

X'88' - 000200xx where xx is the SVC number.
This field should be inspected for unusual SVCs such as:

        1  - WAIT: can indicate an enabled wait situation
        D  - ABEND: can indicate program error processing
        F  - ERREXCP: can indicate a problem in I/O error processing
        10 - PURGE: can indicate a problem in the swap process
        38 - ENQ: can indicate a resource contention problem
        4F - STATUS: can indicate a non-dispatchability problem

X'8C' - 000X0011 indicates a page fault interrupt. Anything other than a code of 11 is highly suspect and must be inspected further. Also with a code of 11, the program check old PSW (location X'28') must be enabled (mask X'07') because disabled page faults are not allowed in MVS and it is an error if one occurs.

4. PSA + X'204' (CPU ID)

5. PSA + X'210' (address of LCCA - 1 per processor) - The LCCA contains many of the status-saving areas that were located in low storage in previous systems. It is used for software environment saving and indications. The registers associated with each of the interrupts you find in the PSA are saved in this area. In addition, the system mode indicators for each processor are maintained in the LCCA.

6. PSA + X'224' (PSAAOLD) - This is the address of the ASCB of the work last dispatched on each processor. This field indicates the address space that is currently executing.

7. PSA + X'21C' (PSATOLD) - This is the address of the TCB of the work last dispatched on each processor. This field in conjunction with PSAAOLD isolates to a task within an address space.

Note: PSATOLD=0 when SRBs are dispatched.

8. PSA + X'228' (PSASUPER) - This is a field of bits that represent various supervisory functions in the system. If a loop is suspected, these bits should be checked in an attempt to isolate the looping process.

Note: Because of SRM timer processing in MVS, the external first level interrupt handler bit (X'20') or the dispatcher bit (X'04') may be set in this field even in the enabled wait situation.

9. PSA + X'2F8' (PSAHLHI) - This field indicates the current locks held on each processor. Knowing which locks are held helps isolate the problem, especially in a loop situation. By determining the lock holders you can isolate the current process. (See the chapter on "Locking" later in this section.)

10. PSA + X'380' (PSACSTK) - This is the address of the active recovery stack, which contains the addresses of the recovery routines to be given control in case of an error. If the address is other than X'C00' (normal stack), the type of stack (for example, program check FLIH or restart FLIH) is meaningful, especially in the loop situation. By searching the normal stack (X'C00') and associating the recovery routine to active mainline routines you may get an idea of the current process. This is true only if the pointer to the current entry is not X'C34', which would indicate an empty recovery stack.

Note: If a loop is suspected, the first word following each routine address in the current stack should be scanned. A X'80' indicates that routine is in control. A X'40' indicates that routine is in control and that it is a nested recovery routine. If X'10' into the stack is non-zero, also check for an SDWA address at X'44' into the active stack. This block is mapped by the SDWA DSECT and is described in the Debugging Handbook. (RTCA and SDWA are different names for the same control block.) If an SDWA address is present, an error has occurred and it can be related to the problem you are analyzing. If trapping via RTM's SLIP facility, the registers at entry to RTM are contained in this area.

At this point you should understand each processor's current activity, any possible errors that have been detected by recovery, and the current system state or mode.

Work Queues, TCBs and Address Space Analysis

Examine the following areas to help determine the current state of work in the system.
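Before turning to those areas, it may help to see how the interruption-code words and the no-work wait test from the global indicators (items 2 and 3) could be checked mechanically. The following Python sketch is illustrative only: the function and table names are hypothetical and are not part of AMDPRDMP or any IBM service aid. The code values are the ones listed in this chapter, and the caller is assumed to have already read the fullwords from the dump.

```python
# Illustrative sketch only. Helper names are hypothetical; the code values
# are the ones listed under "Global Indicators" in this chapter.

EXTERNAL_CODES = {            # low halfword of the word at X'84'
    0x1004: "clock comparator",
    0x1005: "CPU timer",
    0x1201: "SIGP-emergency signal",
    0x1202: "SIGP-external call",
}

SVCS_OF_INTEREST = {          # low byte of the word at X'88'
    0x01: "WAIT: can indicate an enabled wait situation",
    0x0D: "ABEND: can indicate program error processing",
    0x0F: "ERREXCP: can indicate a problem in I/O error processing",
    0x10: "PURGE: can indicate a problem in the swap process",
    0x38: "ENQ: can indicate a resource contention problem",
    0x4F: "STATUS: can indicate a non-dispatchability problem",
}


def describe_external(word):
    """Classify the fullword found at location X'84'."""
    return EXTERNAL_CODES.get(word & 0xFFFF, "not one of the common codes")


def describe_svc(word):
    """Return (SVC number, note or None) for the fullword at X'88'."""
    svc = word & 0xFF
    return svc, SVCS_OF_INTEREST.get(svc)


def describe_program(word):
    """Classify the fullword found at location X'8C'."""
    if word & 0xFFFF == 0x0011:
        return "page fault"
    return "suspect: inspect further"


def is_no_work_wait(psw_word1, psw_word2, gprs):
    """True when the PSW is 070E0000 00000000 and all GPRs are zero."""
    return (psw_word1 == 0x070E0000 and psw_word2 == 0
            and all(reg == 0 for reg in gprs))
```

For example, `describe_svc(0x00020038)` reports SVC X'38' with the ENQ note, which would direct the analyst toward resource contention. A page fault result still requires the separate check that the program check old PSW at X'28' is enabled.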
TCB Summary

The TCB summary report, produced by AMDPRDMP (print dump program), contains a summary of the address spaces and their associated tasks. A quick scan of the completion (CMP) field for each task reveals any abnormal terminations that have occurred. Discovery of an error completion code warrants further investigation as to the cause. Remember, however, that these codes are residual and the job or task might have recovered from the problem.

Also investigate multiple abnormal completion codes which all relate to the same area of the system, or many tasks that all have the same completion code. These completion codes can all relate to one area of the system and perhaps to the problem you are investigating. Again, LOGREC should provide further documentation in an error situation such as this.

Once you understand the system's history from a trace, LOGREC, and error viewpoint, you should examine the work to be done as your next step to understanding the problem.

SRB Dispatching Queues

The print dump program formats the SRB dispatching queues. Elements on any of these queues should be investigated, especially in cases where no work appears to be progressing through the system.

Elements on the global or local service manager queues (GSMQ/LSMQ) can indicate that the dispatcher has not received control since these SRBs were scheduled. This is an unusual condition that should be investigated. It can also indicate that the CVT anchors for these queues have been inadvertently altered. This again is an error condition.

Elements on the GSPLs/LSPLs should be explained. It is possible the dump was taken before the SRB routines were able to execute. But it more likely indicates some other system problem such as an enabled wait or disabled loop.
If there are SRBs on an LSPL, you should determine if the associated address space is swapped into storage and, if it is not, why not. (Possible causes are real frame shortage or a problem in the paging/swapping mechanism.) Again this is an indication of a potential system problem. The chapter on "Waits" in Section 4 and the chapter on "Dispatcher" in Section 5 contain additional information on the dispatching queues.

If, at this point, you can isolate the problem to a component, refer to the "Component Analysis" for that component in Section 5. The chapter on "Waits" in Section 4 should prove helpful if you have isolated to a problem in the system.

Address Space Analysis

If you have isolated the error to a given address space or wish to determine the state of a given address space, analyze the ASCB. Important indicators in the ASCB are:

• ASCBLOCK (ASCB + X'80') - to determine the specific state of the local lock. If it contains 7FFFFFFF or FFFFFFFF (the lock suspend/interrupt IDs), refer to the chapter on "Locking" later in this section for an explanation.

Note: When holding a suspend lock, code can only be suspended because it attempts to obtain an unavailable higher suspend lock or because of a page fault. To find the reason for the suspension, refer to the discussion of Task Analysis later in this chapter and to the chapter on "Locking" later in this section.

• ASCBEWST (ASCB + X'48') - to determine the TOD clock value when the address space last executed. This field helps you determine how long an address space has been swapped out. By subtracting this field (middle four digits) from the last timer value in the MVS trace table and converting to seconds, you can discover the approximate swap-out time. (See the chapter "MVS Trace Analysis" later in this section.)

• ASCBRCTF (ASCB + X'66'), ASCBFLG1 (ASCB + X'67') - current status of the address space.

• ASCBASXB (ASCB + X'6C') - pointer to the ASXB that anchors the TCBs.

• ASCBSRBS (ASCB + X'76') - number of SRBs active (currently executing or suspended) in the address space.

• ASCBOUCB (ASCB + X'90') - pointer to the OUCB, which is helpful when determining why an address space is swapped out.

• ASCBFMCT (ASCB + X'?8') - number of real frames currently occupied by the address space.

• ASCBTCBS (ASCB + X'7C') - number of ready TCBs.

• ASCBCPUS (ASCB + X'20') - number of processors running tasks in this address space.

Task Analysis

Once you understand the ASCB you should analyze the associated task structure. Once again, scan the TCBs associated with your address space and look for an abnormal completion field. While doing so, check the RB structure for each task. Remember that the region control task, dump task, and started task control are represented by the first three TCBs. "Normally" they will be waiting during task execution. If one of them is not, you should determine why.

Assuming the first three TCBs are not obvious problem areas, continue inspecting the remaining TCBs. You are trying to explain each RB. Starting with the last RB created (the first RB, pointed to by the TCB + 0), determine what work is represented. If work is waiting, find out why.

Note: The master scheduler address space has system task TCBs that differ from other address spaces. Refer to the diagrams for Master Scheduler Initialization, Start Initiator, and Job Execution in the topic "General System Flow" in the Debugging Handbook, Volume 1 for details of the TCB structures.

The RBOPSW indicates the issuer of an explicit WAIT. If an explicit WAIT is not obvious, consider the following suspension possibilities and their associated key indicators:

1. If ASCBLOCK = X'7FFFFFFF' or X'FFFFFFFF', the status (registers and PSW) of the suspended or interrupted task is saved in the IHSA (ASCB + X'6C' points to ASXB; ASXB + X'20' points to IHSA). The reason for suspension is important. If it is for a lock, find out what address space or task owns that lock and what the owner's state is. (The chapter on "Locking" later in this section shows how to determine lock owners.) If it is for a page fault, find out the state of that page fault. Note also that while the RBTRANS field points to the page fault causing address, the RBWCF is 0.

Note: If a task owned the local lock at the time of the suspension or interrupt, the TCB active indicators and the TCBCPUID (last processor on which this task was dispatched) are set on. If no TCB in the task structure has these indicators set, you can assume an SRB owned the lock. If no SRBs are on the CMS suspend queue, the suspension is probably the result of a page fault.

An SRB can be suspended because of a page fault or a request for an unavailable suspend lock. The save area for the suspended SRB is the SSRB (see the Debugging Handbook). If suspended for page fault processing, the SSRB is pointed to by the corresponding PCB + X'1C'. PCBs are generally chained together and anchored in two locations: (1) the RSMHDR for local address space page faults; (2) the PVT for page faults caused by referencing commonly addressable storage. Note that if real frames were not available when the page fault occurred, even local page faults are queued from the PVT on the defer queue (PVTGFADF, PVT + X'754').

For a CMS lock request, the SSRB is on the CMS lock suspended queue. See the chapter on "Waits" in Section 4 for details on how to locate the SSRB. For local lock suspensions, the SSRBs are chained together on a queue anchored in the ASCB (ASCB + X'84').

A locked TCB can be suspended for the same reasons as an SRB. The save area is the IHSA (described in the Debugging Handbook).
The IHSA is valid during a page fault if the corresponding PCB+8 flag is on, indicating the lock was held at the time of the page fault. Also, the TCBLLH (TCB + X'114') is set to X'01' if the task was locally locked at the time of the page fault. The IHSA is valid for a CMS lock suspension if the ASCB is on the CMS lock suspend queue at label CMSASBF in IEANUC01. The TCB can be suspended because of a page fault while holding both the local and CMS locks. One way to tell is that the ASCB+X'67' flag for the CMS lock is turned on and the ASCB address is in the CMS lockword.

2. If ASCBLOCK = X'00000000' and the memory/task is waiting, the status is saved in the RB/TCB. (See the chapter on "System Execution Modes and Status Saving" later in this section.)

3. Suspended SRBs can cause bottlenecks. The chapter on "System Execution Modes and Status Saving" can aid in locating any suspended SRBs that relate to the address space.

Note: Do not spend time looking for them unless other facts about the problem indicate a potential problem in this area.

By far the most important consideration in task analysis is the RB structure of each task. Generally if you have isolated the problem to an address space, RB analysis shows a potential problem in the way of:

• Long RB chains
• Contention caused by an ENQ (SVC 38) request
• Page fault waits
• I/O waits
• Abnormal termination processing, that is, SVC D RB

Once you have analyzed the RB structure you might want to go back and further analyze the TCBs. Following are additional important fields in the TCB:

1. TCBFLGS (TCB + X'1D') - indicators of how the system currently considers this task.

2. TCBGRS (TCB + X'30') - general purpose registers (0-15) saved when a TYPE 1 SVC is issued or for an interruption for a non-locked task.

3. TCBSCNDY (TCB + X'AC') - additional system indicators for this task that help to determine why this task is not executing.

4. TCBRTWA (TCB + X'E0') - pointer to the RTM2 work area (mapped in the Debugging Handbook) which contains information similar to the SDWA but also data for RTM processing.

Summary

This chapter contains major considerations you must be aware of when analyzing a stand-alone dump in MVS. A disciplined approach is important; resist the tendency to go off on tangents upon finding the first unexplainable condition. After gathering all the facts, try to resolve the "cause and effect" situations you are bound to uncover. Generally, at this point you will have isolated the error and can start a detailed component/process analysis.

System Execution Modes and Status Saving

MVS differs significantly from previous operating systems by having multiple execution modes. Status is saved and restored from many different locations depending upon the execution mode at the time control was lost. This chapter explains those modes and how they affect problem analysis.

System Execution Modes

MVS has four execution modes:

1. Task mode
2. SRB mode
3. Physically disabled mode
4. Locked mode

Code always executes in one of these modes or, in certain cases, in a combination of modes. For instance, code running in task or SRB mode can also be either locally locked or physically disabled.

Task Mode

Task mode describes code that is executing in the system because the dispatcher selected work from the task control block (TCB) chain. To start execution, the dispatcher sets up the environment (registers and PSW) and then passes control to the code to be executed. The registers and PSW are found in one of two places:

1. In the TCB at TCBGRS (TCB+X'30'), which is a register save area used when unlocked, enabled TCB mode work is interrupted.
The PSW is obtained from the request block (RB) that is found through the TCB+0.

2. In the IHSA (interrupt handler save area), which is used to save registers when locally locked task mode code is interrupted. IHSA is found through ASXB+X'20'; ASXB is found through ASCB+X'6C'. The PSW for locally locked tasks is obtained from the IHSA.

Task mode is probably the most common execution mode. All programs given control via ATTACH, LINK, and XCTL operate in this mode.

SRB Mode

SRB (service request block) mode describes code that is executing in the system because the dispatcher finds an SRB on one of the SRB queues. SRB set-up is started by the SCHEDULE macro. SCHEDULE is an in-line macro that places the requestor-furnished SRB on one of two service queues, local or global, depending on the requestor's specification. These queues can be found from the CVT at CVTGSMQ (CVT+X'264'), which contains the address of the global service manager queue, or at CVTLSMQ (CVT+X'268'), which contains the address of the local service manager queue.

Whenever the dispatcher finds work on either queue, the SRBs are moved to the corresponding system priority list queue. The global system priority list queue (GSPL), which contains globally scheduled SRBs, is found from the CVT at CVTGSPL (CVT+X'26C'). There is also one local system priority list queue (LSPL) per address space. Each LSPL, which is found from the ASCB at ASCBSPL (ASCB+X'1C'), contains all SRBs locally scheduled by the requestor and also those SRBs that were globally scheduled when the targeted address space was swapped out. SRBs are selected from these LSPLs by the dispatcher in order to start execution.

The dispatcher loads registers 0, 1, 14, and 15 from information in the SRB and builds the PSW. The PSW key and address are the responsibility of the scheduler of the SRB and are specified in the SRB.
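The movement of SRBs from the service manager queues to the system priority lists can be pictured with a small model. The Python sketch below is a simplification under stated assumptions: the class and method names are hypothetical, the queues are plain FIFO deques, and priority ordering within a priority list is not modeled. It illustrates only the routing rule described above, namely that a globally scheduled SRB whose target address space is swapped out ends up on that address space's LSPL.

```python
from collections import deque

# Simplified model of the routing described above. Names are hypothetical;
# real queue formats and priority handling are not represented.


class AddressSpace:
    def __init__(self, name, swapped_in=True):
        self.name = name
        self.swapped_in = swapped_in
        self.lspl = deque()      # local system priority list (one per address space)


class SrbQueues:
    def __init__(self):
        self.gsmq = deque()      # global service manager queue
        self.lsmq = deque()      # local service manager queue
        self.gspl = deque()      # global system priority list

    def schedule(self, srb, target, scope):
        """Model of SCHEDULE: queue the SRB on the GSMQ or the LSMQ."""
        if scope not in ("GLOBAL", "LOCAL"):
            raise ValueError("scope must be 'GLOBAL' or 'LOCAL'")
        queue = self.gsmq if scope == "GLOBAL" else self.lsmq
        queue.append((srb, target))

    def dispatcher_pass(self):
        """Model of the dispatcher moving SRBs to the priority lists."""
        while self.gsmq:
            srb, target = self.gsmq.popleft()
            if target.swapped_in:
                self.gspl.append(srb)
            else:
                # Globally scheduled, but the target is swapped out.
                target.lspl.append(srb)
        while self.lsmq:
            srb, target = self.lsmq.popleft()
            target.lspl.append(srb)
```

In this model, SRBs still sitting on `gsmq` or `lsmq` correspond to the dump symptom noted in "Global System Analysis": the dispatcher has not received control since they were scheduled.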
SRB mode has the characteristics of being enabled, supervisor state, key requested, and non-preemptable. Non-preemptable means that the interrupt handler should return control to the interrupted service routine (code running under SRB mode). However, service routines can be suspended because of a page fault or because a lock (CMS or local) is unavailable.

Physically Disabled Mode

Disabled mode is reserved for high-priority system code whose function is the manipulation of critical system queues and data areas. It is usually combined with supervisor state and key 0 in the PSW, and assures that the routine running disabled is able to complete its function before losing control. It is restricted to just a few modules in MVS (for example, interrupt handlers, the dispatcher, and programs holding a global spin lock).

Physically disabled mode is used for one of two reasons:

1. To assure that data remains static while the code is referencing or updating the data.

2. To assure that non-reentrant code does not lose control while performing critical system functions. For example, IOS must run disabled while enqueueing and dequeueing requests to UCBs and while updating UCBs at the start and end of I/O operations.

Because of MP, physical disablement alone does not serialize on a system-wide basis in MVS; it must be accompanied by locking in order to guarantee serialization. MVS disabled code is also always accompanied by either a global spin lock or code executing under a "super bit". The "super bits" are located in each processor's PSA (X'228'). They are used primarily for recovery reasons - they allow RTM to recognize that a disabled supervisory function was in control at the time of error even though global locks were not held. This indicates that FRR recovery processing should be initiated by RTM.

Note that type 1 SVCs do not execute disabled in MVS.
Instead they are entered with the local lock. Thus they are considered to be task mode, physically enabled, holding the local lock.

Locked Mode

Locked mode describes code executing in the system while owning a lock. (See the chapter on "Locking" later in this section.) A lock can be requested during any execution mode (SRB, TCB, physically disabled).

Status saving while in a locked mode requires unique considerations from the system. An example is a program that invokes a type 1 SVC, such as EXCP or WAIT, that executes in locked mode. When a type 1 SVC is enabled, it can be interrupted. However, if the SVC is interrupted, the registers cannot be saved in the TCB because it is being used to save registers active at the time of the SVC request for return to the requestor. Therefore, status must be saved elsewhere.

For programs executing in locked mode, status is saved according to the condition surrounding the programs, as follows:

Locally locked task is interrupted. A new area, the interrupt handler save area (IHSA), has been defined in MVS to contain the status when a locally locked task is interrupted. The IHSA is found from the ASCB + X'6C', which points to the ASXB; the ASXB + X'20' points to the IHSA.

Locally locked SRB is interrupted. When locally locked SRBs are interrupted, there is no problem because SRBs are non-preemptable. The registers and PSW are saved in the LCCA. When the system has handled the interrupt, the SLIHs return to the FLIHs, the status is restored from the LCCA, and control is returned to the interrupted SRB routine.

Locally locked SRB is suspended. Locally locked SRBs that are suspended must have their status saved in a unique area. The process that suspends an SRB is responsible for obtaining an SSRB (suspended SRB), which will contain the interrupted status and will also serve as the control block used to reschedule the service routine once the reason for suspension has been resolved.
See "Locating Status Information in a Storage Dump" later in this chapter for a detailed description of how to find these SSRBs.

Determining Execution Mode from a Stand-alone Dump

Knowing the system's execution mode at the time a stand-alone dump was taken is important in analyzing a disabled coded wait state or a loop. The following areas may help determine the mode of execution:

LCCA Indicators - There are two bytes of important dispatcher flags at LCCA + X'21C'. At location X'21D', the LCCADSRW flag is turned on just prior to any LPSW (Load PSW) for a global SRB, a local SRB, or task dispatch. For a global SRB, the LCCAGSRB and LCCASRBM flags are also set on. For a local SRB, only the LCCASRBM flag is set on in addition to LCCADSRW.

PSA Indicators

• Super Bits - Flags in the supervisor control word located at PSA + X'228' indicate whether the dump was taken while in one of the interrupt handlers or the dispatcher.

• Recovery Stack - If the first two words of the RTM stack vector table (PSA + X'380') are not equal, then control is in one of the interrupt handlers or the dispatcher. Compare the address at PSA + X'380' with each entry in the FRR stack vector table starting at PSA + X'384' to determine the owner of the active stack. (See the chapter on "Use of Recovery Work Areas for Problem Analysis" later in this section for stack vector table analysis.)

• Current Work - PSA + X'218' contains the addresses of the new TCB, old TCB, new ASCB, and old ASCB consecutively in a four-word area. If the system is in SRB mode, the address of the old TCB equals 0. If the addresses of the new and old ASCBs are not equal, then the stand-alone dump was taken between the time that an address space switch was requested and the time the dispatcher dispatched an address space or a global SRB. In all cases, the old TCB and ASCB indicate the current work.
• Locks - The PSA also contains the lock indicators. (See the chapter on "Locking" later in this section for a description of how to determine the lock mode.)

ASCB Indicators - The following ASCB locations help determine execution mode:

X'1C'      Address of the local service priority list, which contains SRBs queued for dispatching.
X'66-67'   RCT flags.
X'72-73'   Non-dispatchability flags.
X'76'      Count of SRBs dispatched in this address space.
X'7C'      Number of ready TCBs in this address space.
X'80'      Local lock (see the chapter on "Locking" later in this section for how to interpret this field when it is not equal to 0).
X'84'      Address of the SRB suspend queue for unavailable local lock requestors.

Keep in mind that mixed modes frequently occur. For example, a local SRB can obtain a lock, be interrupted, and the stand-alone dump taken while disabled in the I/O supervisor. Depending on the system mode at the time of the interrupt, a task's status (registers, PSW, etc.) can be saved in one of several places.

Locating Status Information in a Storage Dump

Status information is located in a storage dump depending on the conditions under which it was saved.

• Task and SRB Mode Interruptions: Status saving is required whenever the code gives up control, whether voluntarily or involuntarily.
Initial status is saved by the first level interrupt handler (FLIH) as follows:

SVC FLIH (task mode only) -
  Initially:
    registers saved at LCCA+X'380' (LCCASGPR)
  Then for Type 1 and Type 4 SVCs:
    registers moved to TCB+X'30' (TCBGRS)
    PSW moved from PSA to requestor's RB
  Then for Type 2, 3, and 4 SVCs:
    registers moved to SVRB
    PSW moved from PSA to requestor's RB

I/O FLIH -
  Initially:
    registers saved at LCCA+X'1C0' (LCCAGPGR)
    PSW saved at LCCA+X'200' (LCCAIOPS)
  Then for unlocked tasks:
    registers moved to TCB
    PSW moved to RB
  For locked tasks (CMS or local):
    registers moved to IHSA (ASCB+X'6C' points to ASXB; ASXB+X'20' points to IHSA)
    PSW moved to IHSA
  For SRBs:
    registers remain in LCCA
    PSW remains in LCCA

External FLIH -
  Initially:
    registers saved at LCCA+X'A0' (LCCAXGR1)
  Then for recursion purposes:
    registers moved to LCCA+X'E0' (LCCAXGR2)
    PSW is in PSA+X'240' (PSAEXPS1)
  If first recursion:
    registers moved from LCCA+X'A0' (LCCAXGR1) to LCCA+X'120' (LCCAXGR3)
    PSW is in PSA+X'248' (PSAEXPS2)
  If second recursion:
    registers moved to LCCA+X'A0' (LCCAXGR1), where they stay
    PSW is in PSA+X'18' (FLCEOPSW)

Note: Subsequent status manipulation for tasks and SRBs is the same as for the I/O FLIH (that is, the movement from LCCA to TCB or IHSA is identical).

Program check -
  Initially:
    registers saved at LCCA+8 (LCCAPGR1)
  Then:
    registers moved to LCCA+X'48' (LCCAPGR2)
    PSW is in LCCA+X'88' (LCCAPPSW)
  For page faults that require I/O the following occurs:
    Unlocked tasks: registers moved to TCB; PSW moved to RB
    Locked tasks: registers moved to IHSA; PSW moved to IHSA
    SRBs: are suspended; see "SRB Suspension" later in this chapter.

Note: For SRB code, status is not moved from the LCCA save areas. SRBs are non-preemptable and are given control back immediately, with the status being restored from the LCCA.
• Locally Locked Task Suspension: Status saving is the same as for locked task interruptions (described earlier under "I/O FLIH") except that the IHSA also contains the floating point registers, the FRR stacks, and the PSW. The ASCBLOCK field is updated to contain X'7FFFFFFF'.

• SRB Suspension: An SRB can be suspended in two cases.

The first case is a page fault. If a service routine encounters a page fault and a page-in is required, then the SRB routine must give up control. In that event, an SSRB (suspended SRB) must be obtained and the status saved in that control block. Then the SSRB is queued from the page control block (PCB) in the real storage manager. When the paging I/O completes, the SSRB is re-queued to the local service priority list (LSPL) where it is found later by the dispatcher. The SSRB must be obtained because the original SRB was not retained after the dispatch. Status saved in an SSRB must include the current FRR stack.

The second case of SRB suspension is an unconditional request for an unavailable lock. Status saving for SRB suspension for a lock differs from the page fault case in where the SSRB is queued and in where control returns after the redispatch of the SSRB. For a request for a local lock that is unavailable, the SSRB is queued from the ASCB. For a request for an unavailable CMS lock, the SSRB is queued on the CMS suspend queue header. (For more detail see the chapter on "Locking" later in this section.) In both cases of suspension for a lock, resumption is at the appropriate entry in the lock manager to try to acquire the lock. Upon release of the CMS lock by the holder, any SSRBs are rescheduled. Upon release of the local lock by the holder, the first SSRB that was suspended is given the local lock and rescheduled.
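The difference between the two release rules just described can be made concrete with a short sketch. This is a model under stated assumptions, not the lock manager's logic: the function names are hypothetical, the suspend queues are plain FIFO deques of SSRB labels, and ASCBs suspended for the CMS lock are ignored.

```python
from collections import deque

# Model only: hypothetical names, FIFO queues of SSRB labels.


def release_cms_lock(cms_suspend_queue):
    """CMS release: any queued SSRBs are rescheduled.

    Returns (rescheduled, new_owner). No SSRB is handed the lock; each
    resumes in the lock manager and tries again to acquire it.
    """
    rescheduled = list(cms_suspend_queue)
    cms_suspend_queue.clear()
    return rescheduled, None


def release_local_lock(local_suspend_queue):
    """Local lock release: the first suspended SSRB is given the lock
    and rescheduled; the rest stay queued."""
    if not local_suspend_queue:
        return [], None
    first = local_suspend_queue.popleft()
    return [first], first
```

With three SSRBs queued, `release_cms_lock` empties the queue and names no owner, while `release_local_lock` removes only the oldest entry and reports it as the new owner. In a dump, SSRBs that remain on the local lock queue after a release are therefore expected, whereas this model would leave none on the CMS queue.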
Suspend SRB queues can be summarized:

Page Faults - PCB is chained from PVTCIOQF (at PVT+X'75C') for a common area page and from RSMLIOQ (at RSMHD+X'24') for a private area page. PCB+X'1C' points to SSRB.

Local Lock Requests - SSRB is queued from ASCBLSQH (ASCB+X'84').

CMS Lock Requests - SSRB is queued from the CMS SRB suspend queue in IEAVESLA as shown:

[Figure: PSALITA (PSA + X'2FC') points to the LIT; the DISP LOCK entry at LIT + 0 points to IEAVESLA. IEAVESLA contains, in order, DISP LOCK, SALLOC LOCK, SRM LOCK, a word of 00000000, CMS LOCK (at +10), and CMS SUSPEND Q HDR (at +14). The last two are the CMS lockword and the queue header for SRBs and ASCBs suspended for CMS.]

Locking

Serialization of resources to provide data integrity and protection is a necessary function of operating systems. In pre-MVS systems, resource serialization was accomplished by physical disablement and by the ENQ/DEQ component. Physical disablement controls only one processor and thus, in MP systems, does not guarantee serialization. To achieve these requirements the locking facility provides:

• Serialization in a tightly-coupled MP system
• Serialization across address spaces for common resources
• Serialization within address spaces

A central lock manager acquires and maintains all locks. Use of the lock manager is restricted to key 0 programs running in supervisor state, which prevents unauthorized problem programs from interfering with the serialization process. The lock manager is located in the nucleus in CSECT IEAVELK.

Classes of Locks

MVS locks are divided into two classes:

• Global Locks, which protect serially reusable resources related to more than one address space. These resources provide system-wide services or use control information in the common area. Examples of resources protected by global locks are UCBs and dispatcher control blocks.

• Local Locks, which protect serially reusable resources assigned to a particular address space.
When a task or SRB holds a local lock, the queues and control blocks serialized by that lock can be used only by the task or SRB holding the lock.

Figure 2-1 defines the MVS locks. All MVS locks, except the local lock, are global locks.

Name      Description
DISP      Global dispatcher lock - serializes all functions associated with the dispatching queues.
ASM       Auxiliary storage management lock - serializes the auxiliary storage resources.
SALLOC    Space allocation lock - serializes real storage management (RSM) resources, virtual storage management (VSM) global resources, and some auxiliary storage management (ASM) resources.
IOSYNCH   I/O supervisor synchronization lock - serializes the IOS purge function and other IOS resources.
IOSCAT    IOS channel availability table lock - serializes the IOS processor-related save area.
IOSUCB    IOS unit control block lock - serializes access and updates to the unit control blocks. There is one lock per UCB.
IOSLCH    IOS logical channel queue lock - serializes access and updates to the IOS logical channel queues. There is one lock per channel queue.
SRM       System resources manager lock - serializes use of the SRM control blocks and associated data.
CMS       Cross memory services lock - serializes on more than one address space where this serialization is not provided by one or more of the other global locks. Provides global serialization when enablement is required.
LOCAL     Local storage lock - serializes functions and storage within a local address space. There is one lock per address space.

Note: Locks are listed in hierarchical order, with DISP being the highest lock in the hierarchy.

Figure 2-1. Definition and Hierarchy of MVS Locks

Types of Locks

Two types of locks exist. The type determines what happens when a processor makes an unconditional request for a lock that is unavailable.
The types are:

• Spin locks - prevent the requesting processor from doing any work until the lock is cleared by the other processor. The requesting processor enters a loop in the lock manager (IEAVELK) that keeps testing the lock until the other processor releases it. As soon as the resource is free, the first processor can obtain the resource and continue processing.
• Suspend locks - prevent the requesting program from doing work until the lock is available, but allow the processor to continue doing other work. The request is queued by suspending the requesting task or SRB, and the requesting processor is dispatched to do other work. Upon release of the lock, the highest priority queued requestor is given control of the lock, except in the case of the local lock. Upon release of the local lock, the first SSRB will be given the lock and rescheduled.

Combining classes and types of locks provides three categories of locks:

Global Spin Lock, which is used primarily to provide serialization in MP systems. While code is executing under a global spin lock, it is physically disabled. An unconditional request for an unavailable lock will cause the processor to spin in the lock manager. Upon release of the global spin lock, the looping processor acquires ownership and returns control to the requestor. The global spin locks supported by MVS are: DISP, SALLOC, ASM, IOSYNCH, IOSCAT, IOSUCB, IOSLCH, and SRM.

Local Suspend Lock, which is used to serialize resources within an address space. There is one local suspend lock per address space and it is located in the ASCB. An unconditional request for the local lock when it is not available causes the suspension of the requesting task or SRB until the lock is released.

Global Suspend Lock, which is used to serialize resources that are commonly addressable from any address space. The requestor remains physically enabled while owning the lock.
The CMS (cross memory services) lock is the only supported global suspend lock. The local lock must be held in order to obtain the CMS lock. An unconditional request for the CMS lock when it is unavailable causes suspension of the requesting task or SRB.

Locking Hierarchy

To prevent a deadlock between processors, MVS locks are arranged in a hierarchy, and a processor may unconditionally request only locks higher in the hierarchy than locks that it currently holds. The locking hierarchy is the order in which the locks are listed in Figure 2-1, with DISP being the highest lock in the hierarchy.

Some locks are single system locks (for example, DISP), and some locks are multiple locks in which there is more than one lock within the lock level (for example, IOSUCB). For those global lock levels that have more than one lock, a processor may hold only one lock of each level. For example, if a processor holds an IOSUCB lock, it may not request a different IOSUCB lock. The local lock must be held by the caller when requesting the CMS lock. Also, the local lock cannot be released while holding the CMS lock.

It is not necessary to obtain all locks in the hierarchy up to the highest lock needed. Only the needed locks have to be obtained, but in hierarchical sequence.

Determining Which Locks Are Held On a Processor

To diagnose certain MVS problems, such as wait states and performance degradation, it is necessary to determine the lock status of the system as well as the back-up of work caused by lock contention. Locks held by a particular processor are indicated in the processor's PSA (prefixed save area). There is a bit map in the PSA which the lock manager checks when a request is made for a lock. This map is called PSAHLHI (PSA highest lock held indicator). Each bit corresponds to a particular lock in the hierarchy.
The bits are in the same order as the hierarchy so that the low-order bit corresponds to the lowest lock in the lock hierarchy. When a bit is on, it means that lock is held by the corresponding processor. Figure 2-2 shows the bit assignments. (Note: When a holder of a CMS or local lock is suspended, the corresponding bit in the PSAHLHI field is reset to 0 even though the lock is still held.)

PSAHLHI (location X'2F8' in PSA)

  PSA byte   Bit      Lock
  X'2FA'     X'10'    DISP
  X'2FA'     X'08'    ASM
  X'2FA'     X'04'    SALLOC
  X'2FA'     X'02'    IOSYNCH
  X'2FA'     X'01'    IOSCAT
  X'2FB'     X'80'    IOSUCB
  X'2FB'     X'40'    IOSLCH
  X'2FB'     X'20'    not assigned
  X'2FB'     X'10'    not assigned
  X'2FB'     X'08'    not assigned
  X'2FB'     X'04'    SRM
  X'2FB'     X'02'    CMS
  X'2FB'     X'01'    LOCAL

  All other bits of the fullword are zero.

Figure 2-2. Bit Map to Show Locks Held on a Processor

Content of Lockwords

Each lock is represented by a lockword that defines the availability and status of the lock. The contents of lockwords differ according to the type of lock they describe:

Global Spin Lockword

  X'00000000' - Lock is available.
  X'00000040' - Lock is held on processor 0.
  X'00000041' - Lock is held on processor 1.

Global Suspend Lockword (CMS Lock)

  X'00000000' - Lock is available.
  X'00xxxxxx' - ASCB address of owner of lock.

If an address space owned the CMS lock but was interrupted or suspended, the ASCBCMSH flag in ASCBFLG1 is turned on and the CMS lock-held bit in PSAHLHI is turned off until the address space is redispatched. The ASCB address remains in the CMS lock until it is released.

Local Suspend Lockword (Local Lock)

  X'00000000' - Lock is available.
  X'00000040' - Lock is held on processor 0.
  X'00000041' - Lock is held on processor 1.
  X'7FFFFFFF' - Task or SRB suspended while holding the lock. The reason for suspension is either a page fault or an unconditional request for the CMS lock while it was unavailable.
  X'FFFFFFFF' - Task or SRB holding the local lock was suspended or interrupted but is now dispatchable.
The reasons for this state are:

• A page fault has been resolved for a locked task or SRB.
• The CMS lock, at one time unavailable, is now available.
• A higher priority address space was given control over this locked task.

How To Find Lockwords

Lockwords for single system locks are located in a table called IEAVESLA (PSA + X'2FC' points to the lock interface table (LIT); LIT + 0 points to IEAVESLA). They can also be located at the label IEAVESLA in a NUCMAP. Lockwords for multiple system locks are supplied by the requestor of the lock. The addresses of these are placed in the PSA for each processor at locations X'284' to X'298'.

The locations of all the lockwords are shown in Figure 2-3. Note that all lockwords must reside in fixed common storage.

  Lock Name  Class   Type     Number of Locks      Location of Lock   Location of Address of Lock (when actually held)
  DISP       Global  Spin     1                    IEAVESLA+0*
  ASM        Global  Spin     1 per ASID           ASMHD+X'14'
  SALLOC     Global  Spin     1                    IEAVESLA+4*
  IOSYNCH    Global  Spin     1                    IOCOM+X'38'        PSA+X'28C'
  IOSCAT     Global  Spin     1                    IOCOM+X'30'        PSA+X'290'
  IOSUCB     Global  Spin     1 per UCB            UCB-8              PSA+X'294'
  IOSLCH     Global  Spin     1 per LCH            LCH+8              PSA+X'298'
  SRM        Global  Spin     1                    IEAVESLA+8*
  CMS        Global  Suspend  1                    IEAVESLA+X'10'*
  LOCAL      Local   Suspend  1 per address space  ASCB+X'80'         PSA+X'284'

  *PSA+X'2FC' points to the lock interface table; the lock interface table +0 points to IEAVESLA.

Figure 2-3. Classification and Location of Locks

Results of Requests for Unavailable Locks

Global Spin Locks - An unconditional request for a global spin lock results in a disabled loop in IEAVELK. In this case, register 11 contains the address of the requested lock and register 14 contains the address of the requestor.

Local Locks - Tasks requesting an unavailable local lock are suspended.
In each case, the request block old PSW (RBOPSW) is set to re-enter the lock manager, and the registers are saved in the TCB.

Note: The dispatcher will not dispatch any task in the address space other than the holder of the lock until the lock is released.

SRBs requesting an unavailable local lock are suspended. In each case, the lock manager obtains an SSRB and places the GPRs and the current FRR stack there.

Notes:

1. The FRR stack can be used to help recreate the process leading up to the point of suspension by interpreting the recovery routines that are currently active. SSRBs for local lock suspensions can be found by inspecting the local lock suspend queue anchored in the ASCB from field ASCBLSQH (ASCB+X'84'). SSRBs are obtained from SQA (SP 245). SSRBs on the local lock suspend queue are chained together at SRB+4.

2. When interrogating a given address space, if the ASCBLOCK field is not 0, check the ASCBLSQH to determine the SRB work being delayed in this address space because of lock contention.

CMS Lock - Tasks unconditionally requesting the CMS lock when it is unavailable are suspended. For each task:

• GPRs are saved in the IHSA, which is pointed to from ASXB + X'20'.
• The resume PSW is set to re-enter the lock manager.
• The ASCB is queued on the CMS suspend queue. (The first element of the CMS suspend queue is anchored in CSECT IEAVESLA + X'14'; this anchor points to either an SSRB or an ASCB which is suspended for the CMS lock. There is only one queue for suspended CMS lock requesters.)

Note: When a NUCMAP is not available, locate IEAVESLA through PSA + X'2FC', which contains the address of the lock interface table; the lock interface table + X'0' contains the address of IEAVESLA.

The address spaces suspended on the CMS lock are represented by the ASCBs on the CMS suspend queue. The ASCBs are chained together at the field ASCBCMSF (forward pointer).
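The check described in note 2 above (inspect ASCBLOCK, then follow ASCBLSQH through the SSRBs chained at SRB+4) can be sketched as a small dump-reading helper. This is a minimal illustration, not a service aid: the `Dump` class is a hypothetical flat storage image, the chain is assumed to end with a zero forward pointer, and addresses are assumed to occupy the low-order 24 bits of each word. The offsets and lockword values are the ones given in this chapter.

```python
import struct

ASCBLOCK_OFF = 0x80  # local lockword (ASCB+X'80')
ASCBLSQH_OFF = 0x84  # local lock suspend queue header (ASCB+X'84')
SRBFLNK_OFF = 0x04   # forward chain pointer in an SRB/SSRB (SRB+4)


class Dump:
    """Hypothetical storage image: `data` begins at virtual address `base`."""

    def __init__(self, base, data):
        self.base = base
        self.data = data

    def word(self, addr):
        """Return the big-endian fullword stored at `addr`."""
        return struct.unpack_from(">I", self.data, addr - self.base)[0]


def describe_local_lock(value):
    """Interpret a local lockword using the values listed in this chapter."""
    if value == 0x00000000:
        return "available"
    if value in (0x00000040, 0x00000041):
        return f"held on processor {value - 0x40}"
    if value == 0x7FFFFFFF:
        return "holder suspended (page fault or CMS lock request)"
    if value == 0xFFFFFFFF:
        return "holder suspended or interrupted, now dispatchable"
    return "unrecognized lockword value"


def local_suspend_queue(dump, ascb):
    """Return the addresses of the SSRBs queued from ASCBLSQH, in chain order."""
    ssrbs = []
    seen = set()  # guards against a looping (damaged) chain
    ptr = dump.word(ascb + ASCBLSQH_OFF) & 0x00FFFFFF
    while ptr and ptr not in seen:
        seen.add(ptr)
        ssrbs.append(ptr)
        ptr = dump.word(ptr + SRBFLNK_OFF) & 0x00FFFFFF  # assumed zero-terminated
    return ssrbs
```

Given an ASCB address taken from a dump, `describe_local_lock(dump.word(ascb + ASCBLOCK_OFF))` reports the state of the local lock, and `local_suspend_queue(dump, ascb)` lists the SRB work delayed behind it.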
Note: When an ASCB is on the CMS suspend queue, the ASCBLOCK contains X'7FFFFFFF'. When the CMS lock is released, the ASCBLOCK is changed to X'FFFFFFFF', which indicates that work was interrupted but it is now ready to be resumed.

SRBs unconditionally requesting the CMS lock when it is unavailable are suspended. For each SRB, the lock manager:

• Obtains an SSRB from SQA
• Saves GPRs and the FRR stack in the SSRB
• Sets ASCBLOCK to X'7FFFFFFF'
• Chains the SSRB on the CMS SRB suspend queue located in IEAVESLA (IEAVESLA + X'14')

Note: Since there is only one queue for suspended CMS lock requesters, the SSRBs and ASCBs are chained on the CMS suspend queue using either ASCBCMSF (ASCB + X'C') or SRBFLNK (SSRB + 4). There are no backward pointers. Thus the CMS suspend queue could have the following appearance:

  PSALITA (PSA+X'2FC') --> LIT
  LIT+0 --> IEAVESLA
    IEAVESLA +0    DISP lock
             +4    SALLOC lock
             +8    SRM lock
             +10   CMS lock
             +14   CMS suspend queue header --> SSRB (+4) --> SSRB (+4) --> ASCB (+C) --> SSRB (+4) --> ASCB (+C) --> ASCB

Use of Recovery Work Areas For Problem Analysis

Recovery processing, which is unique to MVS, enhances the reliability of the operating system. When an error occurs, "active recovery" is given control, one routine at a time, in an attempt to isolate the error to a unit of work. Recovery terminates that work instead of the entire operating system and then continues normal system operation. This process occurs whether the error is in the system or an application.

Because system operation is not halted at the point of error, the resulting storage dumps represent system status sometime after the original error(s). Often the system can encounter numerous errors, fully recover, and continue. At other times it can be a recovery failure that causes the system to cease operations and to take a stand-alone dump. In either case, the obvious problem and its associated tracks have been covered over.
This makes the back-tracking process extremely difficult. However, experience has shown that although recovery causes this difficulty, it can very often provide valuable clues for the problem analyst. This chapter points out important recovery areas and explains how they can be used in the debugging process.

CAUTION: Recovery is not designed to aid the problem solver; it is designed as a means by which the system can prevent total loss. Because recovery maintains system status information, its work areas often provide the same information to the analyst. However, once recovery is invoked, the system is in a tenuous position; it is attempting to maintain operation despite an error. It is possible that the recovery process itself can encounter the same error or bad data. Most often this is not the case; the system does recover and continues normal operation. But the possibility of recursive errors in the recovery process does exist, in which case the new error becomes of prime consideration. If you are dependent on internal recovery control blocks and queues, be aware of this possibility. Don't get caught following a chain of blocks for some subsequent or unrelated problem that will not help your own error-finding efforts. This danger is most prevalent when you use recovery work areas without following the normal work-related debugging techniques. Do not immediately use the RTM2 work area without analyzing the Task/RB structure and associated indicators.

The following work areas should be used carefully and only after traditional techniques have failed. The exceptions to this rule are:

• When the dump is taken as a result of a trap (for example, SLIP) and the analyst understands that the current status at the time of error can only be found by using the recovery save areas.
• When there are problems in the recovery process itself.
In other instances, be aware of the total environment so that what you discover in these areas bears some relationship to the problem you are analyzing. These areas are of great importance if used with understanding.

SYS1.LOGREC Analysis

For effective problem analysis, use the information in SYS1.LOGREC to understand the error history of the system. Because of recovery processing, MVS does not halt operation when an error occurs. Dump analysis must be performed using a snapshot of storage as it appears sometime after the error and recovery have occurred; therefore, some type of recording mechanism is needed in order to trace the error. The entries in SYS1.LOGREC provide information about a potential problem. This is the most informative data about the error that you receive.

The SYS1.LOGREC entries serve as a diagnostic trace of the problem encountered by the operating system; they usually provide a history of events leading up to a system incident. Use this information to understand system problems, the recovery actions that are taken as a result of these problems, and the outcome of the recovery attempt. Often more than one record exists for the same software incident. You must be able to relate these records in the proper sequence and understand the progress of recovery the various records indicate. Knowing the errors that have occurred since the last IPL helps you understand the system behavior and explains your findings at dump analysis time.

In stand-alone dump analysis you should always inspect the in-storage LOGREC buffer for entries that recovery routines have made but which were not written to the SYS1.LOGREC data set because of a system problem. Very often it is these records that are the key to the problem solution. (There is a discussion of LOGREC buffer analysis later in this chapter.)
Information that is written by recovery routines to the SYS1.LOGREC data set is used primarily to monitor incidents both when retry is attempted and when percolation to the next recovery routine takes place. Generally, functional recovery routines (FRRs) will write a SYS1.LOGREC record whenever they are entered. The default for ESTAE routines, however, is to not write a record. This means that unless the ESTAE routine specifically requests recording, no SYS1.LOGREC record will be built.

Listing the SYS1.LOGREC Data Set

To get a listing of the SYS1.LOGREC data set, use the IFCEREP1 service aid as described in OS/VS Environmental Recording Editing and Printing (EREP) Program. (The JCL required to print the SYS1.LOGREC data set is contained in the chapter "Additional Data Gathering" later in this section. It is important to obtain both an event history and a full report. The event history (EVENT=Y parameter on the EXEC statement) prints an abstract for all records in chronological order. This allows the analyst to recreate the sequence of events.)

IFCEREP1 formats the standard area, the first X'194' bytes of each SDWA, into a series of titles, each followed by pertinent data found in the standard area. IFCEREP1 will put the variable area, the last X'6C' bytes of each SDWA, in an alphameric or hexadecimal format, whichever is specified. This variable area is used by the recovery routines to construct messages and to provide data that often contains valuable debugging information.

There are five different types of software incidents for which the failure is written to SYS1.LOGREC. They are:

1. ABEND (SVC 13)
2. Invalid SVC
3. MCH software recovery attempt
4. Program check
5. Restart key depressed

SYS1.LOGREC Records

This section contains examples and explanations of three different types of error records that you can obtain from SYS1.LOGREC.
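The split between the standard area (the first X'194' bytes of the SDWA) and the variable area (the last X'6C' bytes) described under "Listing the SYS1.LOGREC Data Set" can be reproduced when you work from an unformatted SDWA image. The following is a minimal sketch: the function names are hypothetical, the SDWA image is assumed to be exactly X'194' + X'6C' = X'200' bytes, and code page 037 is assumed as the EBCDIC encoding for the alphameric rendering.

```python
STANDARD_AREA_LEN = 0x194  # standard area formatted into titled fields
VARIABLE_AREA_LEN = 0x6C   # variable recording area filled by recovery routines


def split_sdwa(sdwa):
    """Split an SDWA image into (standard area, variable recording area)."""
    expected = STANDARD_AREA_LEN + VARIABLE_AREA_LEN
    if len(sdwa) != expected:
        raise ValueError(f"expected {expected:#x} bytes, got {len(sdwa):#x}")
    return sdwa[:STANDARD_AREA_LEN], sdwa[STANDARD_AREA_LEN:]


def variable_area_as_text(area):
    """Render the variable area as text, assuming EBCDIC code page 037."""
    text = area.decode("cp037")
    # Show unprintable bytes (for example X'00' padding) as periods.
    return "".join(ch if ch.isprintable() else "." for ch in text)


def variable_area_as_hex(area):
    """Render the variable area as fullword-grouped hexadecimal."""
    digits = area.hex().upper()
    return " ".join(digits[i:i + 8] for i in range(0, len(digits), 8))
```

The two rendering functions correspond to the alphameric and hexadecimal choices mentioned above; which one is useful depends on what the individual recovery routine placed in the area.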
SYS1.LOGREC Software Incident Record 1

Figure 2-4 is an example of the data that is recorded in SYS1.LOGREC when a software (source) entry is recorded as the result of an SVC 13.

[Figure 2-4. SYS1.LOGREC Software Incident Record 1 (parts 1 and 2 of 2). The formatted record listing and its hexadecimal dump are not legibly reproduced here. Legible header fields: record entry source SOFTWARE, type SOFTWARE (SVC 13); ERRORID=SEQ00056 CPU0040 ASID0002 TIME 20.31.01.0.]

The following explanations are called out by Notes A-E in the example:

Note A: The CSECT name is IDAVBPP1; it can be found in module IDDWI. IDAVBPR1 is the FRR that processed the error under consideration. The EC PSW indicates that SVC D was issued at location X'F4BB64'.

Note B: Approximately midway into the formatted record shown, you find more specific information about why this particular LOGREC entry was made. Note B points out three bits that reflect the status of the system at the time this failure was detected:

• SVC was issued by a locked or SRB routine.
• Logically disabled (physically disabled, locked, or SRB) routine was in control.
• Type 1 SVC routine was in control.

Note C: For this LOGREC record there is no formatted entry for the system completion code. Only a portion of the recorded software incidents are assigned a system completion code. A system completion code can be found at X'04' bytes into the SDWA, which can be found unformatted at the bottom of the record. Also, if the cause of a record is an abend SVC, a completion code is contained in register 1 under "Regs at time of error". The system completion code for this failure is 0E3.

Given the name of the module involved in the error, you can determine the ID of the failing component by using the "Module Summary" section of the Debugging Handbook.
This summary also names the corresponding PLM for each component. Component microfiche numbers are found in the "Component Summary" section of the Debugging Handbook, Volume 1.

Note D: The "Diagnostic Aids" section of the OS/VS2 VIO Logic describes the diagnostic output for module IDAVBPP1. It explains that the recovery routine sets starting and ending addresses for the DSPCT header and the BUFC in the SDWADPSL field of the SDWA. A diagnostic message is then built in the variable recording area in the SDWA (at X'194'). This message is formatted in the LOGREC record under the 'User Variable EBCDIC Data' field, just above the unformatted SDWA. (Also see Note P.)

Note E: The entries in the 'Dump Characteristics' section of this LOGREC record reflect the SDATA and PDATA options specified by the recovery routine for a SYSABEND, SYSMDUMP, or SYSUDUMP. All recovery routines can specify exactly what portions of storage are dumped. In addition, the recovery routines can specify a list of storage ranges that are to be dumped. In the dump for the failure in this example, the only area of storage displayed would be the SWA. A range of addresses would also be included. Range 1 is from CA7B28 to CA7D38; range 2 is from CA1F08 to CA1F2C.

In summary, from studying this LOGREC entry you find that the module IDAVBPP1 has detected an error and has issued a 0E3 ABEND, with a return code of 4. At the time of the failure, the system was logically disabled and a type 1 SVC was in control. SVC 13 was issued while the system was logically disabled, which is why the LOGREC entry was written. A functional recovery routine, module IDAVBPR1, was given control and tried to recover from the error. It was unsuccessful, so it dumped the scheduler work area (SWA) and two sections of storage where the DSPCT and an important parameter list were located. The module then percolated to the next higher FRR in the stack.
Note that the 'Recovery Return Code' field (SDWA X'FC') = 00; this indicates percolation. A code of '04' indicates that retry was requested. + SYSl.LOGREC Software Incident Record 2 Figure 2-5 is another example of the type of data recorded in SYSI.LOGREC when a software incident occurs. Compare this example with record 1 in order to understand the different types of information that you can . obtain from SYS1.LOGREC. First compare the time stamp at the top of this record with that in record 1. These times are either identical or just a fraction of a second apart whenever the system is percolating through FRRs. 2.4.6 OS/VS2 System Programming Library: MVS Diagnostic Techniques ----- -- ----~--------.-- ..- - - - DATE D.\YY~ TIME C?U CPU RELEASE SERIAL 10 lEV_El___-:-:-- HHMM-SS.TH SO{YWflR E--':::=--iYP-E-SO~iwAP:E (SVC -'1-3''''-- 117 -75--2-0- 31 01 ·93---·02J·78Z--oisa-·--vs 2 REl. 03 ERRORID=SEQ00056 CPU0040 ASID0002 TIME20.31.01.0 -J-U3r~A-Yft:'----'- -NW01ADP2' ASENDING_PROGRAM NAME ____ r:-JA: BC M_QPE PSW.-!LU_.11LQ_Lf.fi.~PR BC MOOE PSW OF LAST RB ---' RECORD f:tiTQY SUURCf - UF ~UDULE INVULVED IDDWI OF CS ECT. INVOLVED _ ._ _ IDD~t TkM._ _ _ __ ~FUNCTIONAL RECuVERY RJuT1NE IDCWlfRR Note G_____ NA~E _~A~E REGS AT TIME OF c c CC.Of) OLOO_v 0000 0_ O_O_::I_Q.O_O_O'(LO 0 0_0_0.0.0.0,--_ _ __ E~~OR u c;-c--oooo oc REGS 0-7---- 0~)6uoOJ-·-Ii OOT3(Y60-Q0F4EE 9COOC A 7BF400CAl F 08 __ ~EG~ __ .8-1 '?___ o~9_q'O.QO_ _ 40F!!fHJ_4_ _COQ..9..Q.00 LA.Y--SOA ---0 D-i S?LAY-SAVE--:-AREA -HEADER 0 RANGE-2--0-0-CS2800-CO-CS-2B20-_ _S.lQ~~G_L~J..iL$l).?p_~_t~Q. 
_ _Q_ 01 S PL~LL_SQ_A l _ _D.t.5£>L A~_~EGISTER S 0 RA~G~_ _8_Q~B2A~JL_J~9C62~F8 DISPLAY S~A 1 DISPLAY TASK LPA MODULES 0 RANGE ~ 00000000 00000000 DISPL~Y GTF TRACE TABLE 0 DISPLAYTAS~K~J~P~A~M~D~O.~U~L~E~S__O~________________________________ DISPLAY CCNTQOL BLeCKS 0 DISPLAY PSW DISPLAY OCB/OELS 0DISPLAY uSER SUBPODLS,____~__~________________~___________ USER VARIABLEEa-C-DIC--DATA--~·--------- ··-/NoteH~-- ------- - t-t sr-. J ~ rIl ~ <§ -:rua-NA-,~:E'; N ~ OTAD? 2, V10--- L:"10 o,i; I JD..r;"kTtxCp;Af! C-S-)---ooc"Ifzs-bS; A {V DSC-fD =i)"ifcAi E7 0 i ~. HEX DUMP OF RECORD HEADER. 40-83-0-8{i~o--c:C-C:-0:-C:-O::-:0::-:O::-:O:-----:O--:O---=7---=5117F ~ g. 11494e95 o a00---0506 DC 7C ---8-6DEja 00 000 a0 0 00 000 00000 OOzo OOF4EE90 OOOOOC9C OOCA7DF4 00CA1F08 - - ---- - - OJ40 -0)000::)00 JOCA 7B28--~Ci=-4BC;B4---o-6cJ'.7664 0060 00000000 00000000 070C2000 OOF4S666 --------"008-0-00020000--66-2ci72"0----co-(000-00 000E3-boo OOAO 00100001 OOCA7CDO CCCCOCOD 40F4B774 ------OOCO-SOF4BbE6--0C000004--00000000--0000:)OOO OOEO OOOOOGOF CCCCCOOO 04CAI000 00000000 - - - - - 0 1 0 0 - 00000000--OJOOoo-00--60000 0-0-0--000-00 000 0120 ooooeooo C9C4C4E6 (9404040 CQC4C4E6 -----o~i4o_o-i)o05:)-6--3-ocooo-6-0---c-cC J2~2a ooc-~Tf28 0160 00000000 00000000 000000-00 40000000 -----0-180-4534000 o - - c Ccecco o--c"CO{)C 00-6-----00-600-660 o LAO C1C4D7F2 6Bf5C9LJ6 4003D406 C47EC9C4 ------':OlCO-5D407E40--FOFOC-3-t~i{iF-O-FO--6B-ci4i5E5 OlEO 00000000 CCCCOCOO CCCOOCCO OOOCOOOO ~. .: (D "" Figure 2-5. 
015802AO D506D5C5 60C6D909 000 00000----0 0001 0 0 0--0-0-000 0 00 0 OUE 3000 00100001 OCCA7CDO 00000000 4aF4B17~ -S-OF483E6--0a 00 0 J)4--0000000-0---0000 OJOO------00020000 00202720 070S0000 OQF4E8FO OO-F4EE-90 OJoooE9{:---OO-CA1BF4 OOCA IF 08=-------OOOOCOOO 00CArB28 50F489B4 00C~7B64 6-o-o-o000·0--ooooooo-0--ooooci-oo-0--o(iuOo-coo-'----'-----OOOaOlJO OJCCES50 00000000 00800000 0000000-0--600:)00-0-0--00000000--00000000'-----C9E3D9D4 C9C4C4E6 C9C6)QD9 OOCCE500 OOc"i3-2tDO OJCBZ-S20--a70Ct32t.C-8 00c"B2AF~8'------DSiJ6DSC-S 60C60909 00380040 00020008 00 6"Ci;o4-S--Dt" [)& c-2155----c104(: S-iE--O-S!: 6F JF 1----------C4E6C968 EoC9C5E7 C3D702C1 4DC9DbC2 C-4-E2C3C-2--~fD-iE-F"(:lFO C3clF--i(s---F7F040CO'-"------00000000 00000000 00000000 OOOOOJOO SYSl.LOGREC Software Incident Record 2 (part 2 of 2) ~-- ~ 00023782- £-:::.... Use of Recovery Work Areas For Problem Analysis (continued) The following explanations refer to Notes F-I in Figure 2-5. Note F: Look at the status bits that appear approximately midway through the example. One additional bit has been turned on in this entry that was off in record I. This indicates that this routine received control through percolation (SDWA + X'EA' =X'IO'). This indicator (that is, SDWA + X'EA') is not set when recovery processing goes from the last active FRR to the current ESTAE. Note G: The name of the FRR in this example is IDDWIFRR. The completion code and the register contents are the same as in record I. Note H: Look at the 'User Variable EBCDIC Data' field. This area gives the location of two more control blocks that can be used in determining exactly what failed. These two control blocks are: • lOB, located at address X'CB2BOO' • VDSCB, located at address X'CA 7E70' Note I: Compare the lOB and VDSCB addresses to the 'Dump Ranges Area' that have been specified. The VDSCB does not fall in one of these ranges because it is part of the SWA. Using the 'Diagnostic Aids' section of. 
OS/VS2 VIO Logic, you can identify the other two dump ranges that are printed. Included in these ranges are the current channel program and the DEB.

SYS1.LOGREC Software Incident Record 3

Figure 2-6 illustrates a SYS1.LOGREC software (source) entry that has been recorded as a result of a program check (type). The following explanations refer to Notes J-N in Figure 2-6.

Note J: Because there is no completion code in register 1 for this type of entry, look for the completion code in SDWA + X'4' (in this case 0C9).

Note K: Check the status bits. They confirm the fact that the failure was a program check that occurred while an enabled RB was in control.

Note L: The 'Dump Characteristics' bits are on only if the functional recovery routine issues a SETRP macro with the DUMP=YES operand. This macro uses the SDWA to contain its dump options and these are the fields formatted in the LOGREC entry. Functional recovery routines can also take dumps by issuing the SDUMP macro. The SDUMP macro uses a different area for its dump options. You might receive a dump of certain failures even though the LOGREC 'Dump Characteristics' are zeros. Check the byte at displacement X'4' into the SDWA. This flag is turned on if a dump was requested by a SETRP, CALLRTM, or ABEND macro. As a general rule, ESTAE routines are the most common users of the DUMP and DUMPOPT operands of the SETRP macro. Since the 0C9 abend code in this LOGREC entry was for a problem program (an enabled RB in control), a dump would also be taken if the job had a SYSUDUMP, SYSMDUMP, or SYSABEND DD statement in its JCL.

Note M: There is a dump associated with this failure because location SDWA + 4 (X'80') indicates a request for a dump. This can be seen from the unformatted record.
Note N: For this entry, the data in the variable recording area (at X'194' under 'Hex Dump of Record') is not formatted under 'User Variable EBCDIC Data'. This data is formatted by specifying an option (see Note P) in the individual recovery routine.

Note P: The two bytes at SDWA + X'190' specify the length of the variable recording area that starts at X'194'. In the two bytes at SDWA + X'192': the first byte specifies how the routine wants its data in the variable recording area printed (X'80' for unformatted hexadecimal, X'40' for hexadecimal and formatted under 'User Variable EBCDIC Data'); the second byte gives the length of the data.

It is often helpful while reading LOGREC entries to refer to the SDWA layout in the Debugging Handbook for additional information about individual bit settings.

Figure 2-6. SYS1.LOGREC Software Incident Record 3 (Part 2 of 2) [listing: formatted program check entry with registers and PSWs at time of error, lock indicators, dump characteristics, and hex dump of record, annotated with Notes J-N and P]

Important Considerations About SYS1.LOGREC Records

As shown in the three incident records, the LOGREC records are mostly SDWAs the system supplies, plus variable user data areas the individual recovery routines supply. Following are some special considerations pertaining to specific portions of LOGREC entries:

• Jobname - If the jobname is "NONE-FRR", this indicates that the record is generated by an SRB's FRR (functional recovery routine) or the current ASCB was invalid.
• "BC mode PSW at Time of Error, of Last RB" - You can ignore these fields.
• "EC PSW from ESTAE RB (0 for ESTAI)" - This field has the following possible meanings:
a. If the ESTAE is associated with an RB level other than the one encountering the error, this is the PSW at the time that the RB level associated with the ESTAE last gave up control. Note: If this is the case, the "RB of ESTAE Not in Control" flag should also be set.
If the ESTAE is associated with the RB level in error, the PSW is equal to the "EC PSW at Time of ABEND" because the last time the RB level gave up control was when the error occurred.
b. If the record was generated by an FRR, this is the PSW used to pass control to the FRR and is therefore the address of the FRR.
c. If the record was generated by an FRR (that is, a locked/disabled routine is in control, or the system is in SRB mode), and the "EC PSW at Time of ABEND" is equal to the EC PSW from ESTAE RB, this is a system-generated record.
• "Regs of RB Level of ESTAE Exit or Zero for ESTAI":
a. If the ESTAE exit is associated with the RB level that encountered the error, these registers are the same as "Regs at Time of Error".
b. If the ESTAE is associated with an RB level other than the one encountering the error, then these are the registers at the time that RB last gave up control.
c. If this is an FRR-generated record, the two sets of registers are identical. However, if the FRR or ESTAE has updated the registers for retry, these registers are the new, updated registers.
• "SVC by Locked or SRB Routine" - This indicator can be misleading. A forced SVC 13, which is often the way FRR-protected code passes control to recovery, also causes this flag to be set if the SVC occurred in locked, disabled, or SRB mode. Although the flag is set, this situation is not a key error indication in itself. The analyst must investigate why the issuing routine invoked SVC 13.
• Error Identifier - This field, as described in recovery termination management (Section 5), contains pertinent information regarding the error described by this SYS1.LOGREC entry, and provides a correlation to other SYS1.LOGREC entries.
Related software and MCH records have the same sequence (SEQ) number that allows the correlation of records written in a particular recovery path (that is, FRR and/or ESTAE percolation, or MCH and subsequent software entries). For locked, disabled, or SRB routines, the processor identifier (CPU) indicates the processor on which the routine was running when it encountered an error. A zero processor identifier indicates that the record was written by an ESTAE routine (that is, the processor identifier is not uniquely identifiable because the ESTAE routine may be executing on a processor other than the mainline). ASID indicates the current ASID at the time of the error. TIME indicates the time that the ERRORID was generated. It is normally very close to the time that the record was written, as indicated in the first line of the record. TIME can be used to chronologically order related SYS1.LOGREC entries that contain the same SEQ number. This ordering is useful in reconstructing the environment as it was at the time of the error. If an SVC dump is taken, the ERRORID, as it appears in the SYS1.LOGREC record, will also appear in the SVC dump output and associated IEA911I message.

Do not be concerned if the ERRORID sequence numbers seem to have an increment of more than one. Although the RTM adds one to the sequence number of each unique entry (not percolation or recursion), there may be no associated recording of the error; thus, the sequence number is updated internally but is not always externally written.

As shown above, the SYS1.LOGREC data set is a vital tool in debugging. At times, the information in the LOGREC printout can be used to describe the entire problem situation. A search of Retain for the CSECT, FRR, and abend code will often identify the problem as a known one.

SYS1.LOGREC Recording Control Buffer

This is one of the most important areas to be used when analyzing problems in
MVS. The previous discussion of LOGREC records analysis generally applies to the in-storage LOGREC buffer as well.

This buffer serves as the intermediate storage location for data that the recovery process uses after it has completed but before the data reaches SYS1.LOGREC. The physical I/O is done from this buffer. Its real significance is in the error history it displays. Also, any records in the buffer that have not reached SYS1.LOGREC are almost certainly related to the problem you are trying to solve.

Formatting the LOGREC Buffer

The in-storage LOGREC buffer can be formatted by specifying the LOGDATA verb under AMDPRDMP. This verb causes the entries still in the buffer to be formatted in the same manner as those printed from SYS1.LOGREC. For detailed information on how to invoke the AMDPRDMP service aid, see OS/VS2 SPL: Service Aids.

Finding the LOGREC Recording Control Buffer

The CVT + X'23C' points to the RTCT (recovery termination control table); and RTCT + X'20' points to the RTMRCB (LOGREC recording control buffer). The buffer always resides in SQA on a page boundary, is 4K bytes in length, and is generally located just beyond the trace table. Scanning the EBCDIC portion of the dump following the trace table usually leads you to a series of module/job names that are part of the individual records.

Format of the LOGREC Recording Control Buffer

The LOGREC recording control buffer is a "wrap-table" similar to the MVS trace table. The entries are variable in size. The latest entries are the most significant, especially if they have not yet been written to SYS1.LOGREC.
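
As an illustration of the pointer chain described under "Finding the LOGREC Recording Control Buffer", the following Python sketch walks from the CVT to the buffer in a dump that has been loaded into a dictionary of fullwords. It is only a sketch under stated assumptions: the dictionary representation, the sample addresses, and the helper names (pointer, find_logrec_buffer) are hypothetical. Only the offsets X'23C' and X'20' and the page-boundary property come from the text, and the 24-bit mask assumes System/370 addressing.

```python
ADDR_MASK = 0x00FFFFFF   # assumption: 24-bit System/370 addresses, high byte ignored
PAGE_SIZE = 0x1000       # the text says the buffer sits on a page boundary (4K)


def pointer(storage, addr):
    """Fetch the fullword at addr and keep only its address part."""
    return storage[addr] & ADDR_MASK


def find_logrec_buffer(storage, cvt_addr):
    """Follow CVT + X'23C' -> RTCT, then RTCT + X'20' -> RTMRCB."""
    rtct = pointer(storage, cvt_addr + 0x23C)
    rtmrcb = pointer(storage, rtct + 0x20)
    if rtmrcb % PAGE_SIZE != 0:
        # A buffer address off a page boundary suggests a wrong CVT or an overlay.
        raise ValueError(f"RTMRCB address {rtmrcb:06X} is not page aligned")
    return rtct, rtmrcb


# Hypothetical dump contents: {address: fullword}
dump = {
    0x00FD00 + 0x23C: 0x0000A200,   # CVT + X'23C' holds the RTCT address
    0x00A200 + 0x20: 0x00F4B000,    # RTCT + X'20' holds the RTMRCB address
}

rtct, rtmrcb = find_logrec_buffer(dump, 0x00FD00)
print(f"RTCT at {rtct:06X}, RTMRCB at {rtmrcb:06X}")
```

The alignment check is a cheap sanity test while reading a dump by hand: if the address obtained this way is not on a page boundary, one of the earlier pointers is probably wrong.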

Knowing the areas of the system that have encountered errors and the actions of their associated recovery routines (information obtained from SYS1.LOGREC and the LOGREC recording control buffer) helps provide an overall understanding of the environment you are about to investigate. Figure 2-7 shows the format of the buffer and Figure 2-8 shows the format of individual records within the buffer.

Displacement    Field      Contents
0               RCBBUFB    Address of start of record area
4               RCBBUFE    Address of end of record area
8               RCBFREE    Address of next available space
C               RCBFLNG    Number of bytes available
10              RCBDUM     Dummy
(not legible)   RCBTLNG    Total buffer length
(not legible)              SRB used to post the recording task in the master address space in order to write the record to SYS1.LOGREC
X'50'                      Missing record header. This record shows the number of times space was requested but was not available. If the record contains a counter or is present in SYS1.LOGREC, you have a good indication of a recovery loop.
X'58'                      Processor serial number
X'59'           LCNT       Missing record count
X'5E'           FLGS       SRB in use flag
X'60'                      First possible record header

[The figure also marks displacement X'40'; the field it labels is not legible.]

Figure 2-7. Format of the LOGREC Recording Control Buffer

Record Header

Displacement    Contents
0               Length of record
2               Record type
3               Options
4               ASID for POST
6, 8, C         ECB and reserved fields (individual displacements not legible)
10              Actual record

Record type and option values:
X'80' - This record wraps around from the end of the buffer space back through the beginning.
X'40' - This record is to go to SYS1.LOGREC.
X'20' - This record is a WTO.
X'08' - Record not buffered; the address of the record exists at X'10'.
X'04' - The recording requestor is to be posted when the record is written.
X'01' - Record is ready to be written. If not set, the record is still being constructed.

Note: The beginning of the actual record + X'20' is the start of the SDWA for software records. The SDWA contains software diagnostic information at the time of the error and is mapped in the Debugging Handbook.

Figure 2-8.
Format of Records Within the LOGREC Recording Control Buffer

FRR Stacks

The FRR (functional recovery routine) stacks are often useful for understanding the latest processes on the processors. Entries are added and deleted dynamically as processing occurs. The PSA + X'380' contains the pointer to the current stack. The format is described in the Data Areas section of the Debugging Handbook under FRRS. Experience has shown that the normal stack (located at X'C00' in each PSA) is perhaps the most useful, although all stacks have been beneficial on occasion.

The FRR stack + X'C' points to the current recovery stack entry. (If FRR stack + X'C' matches FRR stack + 0, however, no recovery is present on the stack.) This entry + 0 points to the recovery routine that is to gain control in case of error. The entry + 4 contains flags used for RTM processing; a X'80' indicates this FRR is currently in control, a X'40' indicates a nested FRR is currently in control. The next 24 bytes serve as a work area for the mainline function associated with the FRR pointed to by this entry. This parameter area may contain footprints useful to your debugging efforts. The previous entry in the stack (X'20' bytes in front of the current) represents the next most current recovery routine.

Only the current and previous entries are valid. The stacks do contain residual information associated with recovery that was previously active but is no longer valid. You should not rely on any information beyond the current entry. Also consider the case where: A gains control and establishes recovery; A passes control to B; B establishes recovery, performs its function, deletes recovery, and passes control to C; C establishes recovery and subsequently encounters an error. The FRR stack will contain entries for module A's and C's recovery routines.
There is no indication from the FRR stack that B was ever involved in the process although it might have contributed to or even caused the error. The debugger gains an insight into the process but is not presented with the exact flow. Although you can get an idea of the general process or flow, do not make assumptions based solely on the FRR stack contents.

If you have trapped a specific problem, the stacks often contain valuable information. The same is true of a stand-alone dump taken because of a suspected loop. If RT1W + 0 (at FRR stack + X'10') is not zero, the FRR stack contains current, valid data. Following are some of the more valuable fields in the FRR stacks from a debugging viewpoint:

1. FRR stack + X'10' - RTM1 work area (RT1W)
In the case of an error, the RT1W + 2 (FRR stack + X'12') field indicates the error type as follows:
1 - program check
2 - restart key
3 - SVC error (SVC was issued while in locked, disabled, or SRB mode)
4 - DAT error
5 - machine check
10 - paging I/O error
11 - abnormal termination
12 - branch entry to abnormal termination (compatibility interface)
13 - cross memory abnormal termination
15 - memory termination
20 - MCH (machine check handler)

2. RT1W + X'34' (FRR stack + X'44') - address of system diagnostic work area (SDWA)
If no pointers can be found, the SDWA for each supervisor FRR stack can be found at X'20' bytes past the start of the last entry in the respective stack. (FRR + 4 points to the last entry.) The SDWA for disabled errors on the normal stack is at X'330' bytes past the start of the last entry on the restart stack. (PSA + X'3B8' points to the restart stack.)

3.
RT1W + X'40' (FRR stack + X'50') - mode at entry to RTM1
X'80' - supervisor control mode (PSASUPER ≠ 0)
X'40' - physically disabled mode
X'20' - global spin lock held
X'10' - global suspend lock held
X'08' - local lock held
X'04' - Type 1 SVC mode
X'02' - SRB mode
X'01' - unlocked task mode
This is the system mode at the time of entry to RTM1. The mode may change as processing continues through recovery; the current mode is at RT1W + X'41' (FRR stack + X'51').

Extended Error Descriptor (EED)

The extended error descriptor (EED) passes error information between RTM1 and RTM2 and also between successive schedules of RTM1. The EED address is found at RT1W + X'3C' (FRR stack + X'4C'), at TCBRTM12 (TCB + X'104'), or in the RTM2 SVRB at X'7C'. The EED is generally not present because RTM2 releases it early in its processing. The EED is described in the Debugging Handbook as part of the RT1W. Important EED fields are:
EED + 0 - pointer to next EED
EED + 4 (byte 0) - description of contents of the rest of the EED:
1 - software EED
2 - dump parameters
3 - hardware EED
4 - errorid EED
For a software EED:
EED + X'C' - registers 0-15
EED + X'4C' - PSW/instruction length code (ILC)/translation exception address (TEA) at time of error

RTM2 Work Area (RTM2WA)

This is the work area used by RTM2 to control abend processing. Registers, PSW, abend code, etc. at the time of the error are recorded in the RTM2WA. This area is often useful for debugging purposes and is described in the Debugging Handbook under RTM2WA. This work area can be found through TCB + X'E0', or RTM2 SVRB + X'80'.

Formatted RTM Control Blocks

RTM control blocks are formatted either by AMDPRDMP as a TCB exit with the FORMAT, PRINT CURRENT, and PRINT JOBNAMES control statements, or with the ERR option under SNAP/ABEND.
With the exception of the RTCT, the formatted control blocks are all TCB-related, and are formatted only when they are associated with the TCB. The formatted control blocks are:
• RTCT (recovery termination control table) - formatted with the first TCB of the current address space on the processor on which the dump was initiated. (This control block is formatted only by AMDPRDMP.)
• FRRS (functional recovery routine stack) - has the RT1W embedded within it and is formatted with the current TCB if the local lock is held. (This control block is formatted only by AMDPRDMP and it is mutually exclusive of the IHSA.)
• IHSA (interrupt handler save area) - has the FRR stack saved within it and is formatted with the TCB pointed to by the IHSA, if the address space was interrupted or suspended while the TCB was holding the local lock. (This control block is formatted only by AMDPRDMP and it is mutually exclusive of the FRRS.)
• RTM2WA (RTM2 work area) - formatted if the TCB pointer to it is not zero.
• ESA (extended save area of the SVRB) bit summary - formatted only if the RTM2WA formatted successfully and the related SVRB could be located.
• SDWA (system diagnostic work area) - formats the registers at the time of error only if the ESA formatted successfully and the SDWA could be located.
• EED (extended error descriptor block) - formatted if the TCB or RT1W pointer to it is not zero.
• SCB (STAE control block) - formatted under AMDPRDMP for abend tasks only. It is formatted under SNAP/ABEND whenever the TCB pointer to it is not zero.

System Diagnostic Work Area (SDWA) Use in RTM2

This work area is used to pass information to ESTAE recovery routines. It is found by: SVRB + X'80' points to RTM2WA; RTM2WA + X'D4' points to SDWA. Also, register 1 contains the address of the SDWA when the recovery routines are entered.
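
The chain just described, together with the earlier notes that the word at SDWA + X'4' carries both the dump-requested flag (X'80') and the completion code (0C9 in incident record 3), can be sketched in Python as follows. This is an illustrative sketch, not a formatter: the dictionary-of-fullwords dump, the sample addresses, and the function names are hypothetical. The split of the word into a flag byte, a 12-bit system code, and a 12-bit user code is an assumption that should be checked against the SDWA mapping in the Debugging Handbook.

```python
ADDR_MASK = 0x00FFFFFF   # assumption: 24-bit System/370 addresses


def pointer(storage, addr):
    """Fetch the fullword at addr and keep only its address part."""
    return storage[addr] & ADDR_MASK


def find_sdwa(storage, svrb_addr):
    """Follow SVRB + X'80' -> RTM2WA, then RTM2WA + X'D4' -> SDWA."""
    rtm2wa = pointer(storage, svrb_addr + 0x80)
    sdwa = pointer(storage, rtm2wa + 0xD4)
    return rtm2wa, sdwa


def decode_sdwa_word4(word):
    """Split the fullword at SDWA + X'4' (layout assumed, see the lead-in)."""
    flags = (word >> 24) & 0xFF
    return {
        "dump_requested": bool(flags & 0x80),          # X'80' flag from Note M
        "system_code": f"{(word >> 12) & 0xFFF:03X}",  # assumed: next 12 bits
        "user_code": word & 0xFFF,                     # assumed: last 12 bits
    }


# Hypothetical dump contents: {address: fullword}
dump = {
    0x00F8A0 + 0x80: 0x0000C400,   # SVRB + X'80' holds the RTM2WA address
    0x00C400 + 0xD4: 0x0000D000,   # RTM2WA + X'D4' holds the SDWA address
    0x00D000 + 0x04: 0x800C9000,   # SDWA + X'4': flag byte and completion code
}

rtm2wa, sdwa = find_sdwa(dump, 0x00F8A0)
status = decode_sdwa_word4(dump[sdwa + 4])
print(f"RTM2WA {rtm2wa:06X}, SDWA {sdwa:06X}, {status}")
```

With the sample word X'800C9000' the sketch reports a dump request and system completion code 0C9, matching what Notes J and M read from the unformatted record.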

Effects Of Multiprocessing On Problem Analysis

The multiprocessing (MP) capability of MVS allows two processors to share real storage using one control program. (MP refers to multiprocessing on both multiprocessors and attached processors.) MVS also functions on a uniprocessor configuration, which may be only one processor configured out of what is otherwise an MP system.

In MP mode, each processor has addressability to all of main storage and executes under the control of one set of supervisor routines. Because various queue structures must be processed in a serial fashion, interlocking facilities are implemented in both the hardware and software to allow serialization of portions of the control program where conflicts may arise. Queue structures that don't require serialization are processed in parallel, that is, without regard to the other processor.

Features of an MP Environment

The main features of a multiprocessing configuration are:

PSA - Each processor has a unique real storage frame, called a prefixed save area (PSA), referenced with addresses from 0 to 4K. Its location in real storage is determined by the processor's prefix register.

Inter-Processor Communication - Malfunction alerts (MFA) are automatically generated by failing processors before entering the check-stop state. Other interprocessor signaling is accomplished with the SIGP instruction. (This feature is discussed in detail later in this chapter.)

VARY Command - Performs three functions: (1) dynamically add or remove a processor from the configuration; (2) dynamically increase or decrease the amount of usable real storage; (3) control the availability of channels and devices.

QUIESCE Command - Quiesces the system so that I/O pools or two-channel switches or both can be reconfigured.

Locking - Access to various supervisory services is serialized by means of a software locking structure.

Dispatching - Assures that highest-priority ready work is processed by available processors.

PTLB (purge translation lookaside buffer) - When an entry is to be invalidated in a page or segment table, the translation lookaside buffer (TLB) on every processor must be purged before permitting subsequent references to the corresponding virtual address.

Timing - The TOD clocks must be synchronized among the configured processors.

RMS - When components of the hardware operating system fail, it becomes the responsibility of the recovery management support (RMS) to help define the extent of the damage.

Compare and Swap - Two instructions assure interlocked update operations. They are Compare and Swap (CS) and Compare Double and Swap (CDS). References to storage for these instructions are interlocked the same way as the Test and Set (TS) instruction.

IOS - IOS has the ability to initiate I/O activity to a device from whichever processor has an available path.

ACR - When one processor fails in an MP configuration, the alternate CPU recovery (ACR) function attempts to take the failing processor offline so that system operation can continue with the remaining processor. (See Miscellaneous Debugging Hints.)

CPU Affinity - The ability to force a job step to execute on a particular processor is a feature of MVS. (For example, because an emulator feature is generally installed on only one of the processors in an MP environment, processor affinity will force the execution of programs that require this feature to the proper processor.)

MP Dump Analysis

Experience with MVS has shown that there are comparatively few bugs unique to MP. Usually, problems encountered in an MP environment could also be discovered in a UP environment.
The increased interaction (parallelism) between software components in an MP environment tends to increase the probability of hitting bugs that are not unique to MP. Thus, the odds are that the dump you are trying to debug could also occur on a UP configuration.

The first step of MP dump analysis is to determine conclusively that it is an MP dump. To do this, you must find the common system data area (CSD). The CSD address is located at offset X'294' in the CVT. The halfword CSDCPUOL, at offset X'A' in the CSD, gives the number of processors currently active. If this number is two, you are looking at an MP dump. For the rest of this discussion, we will assume that CSDCPUOL=2.

Several other fields in the CSD are informative. For example, the byte CSDACR, at offset X'16', indicates whether or not ACR is in progress. ACR in progress (X'FF' in CSDACR) indicates that one of the processors in the configuration is becoming inactive. If this is the case, the problem may be the result of a failure during ACR processing, and the MP dump will probably present at least two problems:
1. A hardware failure causing ACR to be invoked.
2. A failure during ACR processing. (See the discussion on ACR processing in the "Miscellaneous Debugging Hints" chapter later in this section.)

Data Areas Associated With the MP Environment

There are several processor-related areas with which you should be familiar:
1. The PCCA (physical configuration communication area)
2. The LCCA (logical configuration communication area)
3.
The PSA (prefixed save area)

There is a set of these control blocks for each processor, located as follows:
CVT + X'2FC' points to the PCCAVT (contains the address of a PCCA for each processor)
CVT + X'300' points to the LCCAVT (contains the address of an LCCA for each processor)
PCCA + X'18' points to the virtual address of the PSA for that processor
PCCA + X'1C' points to the real address of the PSA for that processor

The PSA is the "low storage area" (first 4K bytes of storage) and it contains, among other things, the hardware-assigned storage locations. System/370 Principles of Operation details the prefixing mechanism the hardware uses to reassign a block of real storage for each processor to a different block in absolute main storage. Prefixing permits processors to share main storage and operate concurrently. The PCCA contains information about the physical facilities associated with its processor; the LCCA contains save areas for use by the first level interrupt handlers (FLIHs).

The need for processor-unique areas arises, for example, because external interrupts could occur simultaneously on each processor, and therefore a processor-related area must exist for status saving by the external FLIH. Such areas are in the processor's LCCA. After locating these control blocks, you can determine several things about the status of each processor:
• The PSWs at the time of the last program, I/O, SVC, external, and machine check interrupts for each processor (PSA)
• The general purpose registers at each interrupt (LCCA)
• The mode (SRB or task) of each processor (LCCA)
• The last program interrupt on each processor (PSA)
• The address of the device causing the last I/O interrupt on each processor (PSA)

In addition, a work/save area vector table (WSAVTC) pointed to at LCCA + X'218' is associated with each processor. This vector table contains pointers to processor-related work/save areas.
For example, there is a large save area for use by ACR, which is pointed to in the processor's WSAVTC. It is important to be aware of the existence of these processor-related areas because GTF, SRM, ACR, IOS, etc., use them; but you must narrow your problem to one of these processes (such as GTF, SRM, etc.) before the information in the associated work/save areas becomes helpful.

Parallelism

The most important characteristic of the MVS MP capability is parallelism. In looking at MP dumps, you must always remember that any two processes run in parallel and reference the same main storage locations. As a result, queue structures and common data areas are vulnerable. In order to preserve their integrity, the system must ensure that they are accessed serially. The resources that must be serialized in order to guarantee their integrity are called serially reusable resources (SRRs). The use of shared resources is undoubtedly the key item to be kept in mind in debugging an MP dump.

There are various mechanisms available for serializing SRRs:
• ENQ/DEQ
• WAIT/POST
• Disablement
• Locking
• Compare and Swap (CS) instruction
• Non-dispatchability
• Test and Set (TS) instruction
• RESERVE/RELEASE

Obviously all users of a particular SRR must use the same serialization mechanism. The integrity of an SRR is not enhanced if one user employs locking and another uses ENQ/DEQ.

The point is to understand the processes going on in both processors at the time of the failure. The processor on which the failure occurred may not be the one that caused the problem. Use of the work/save areas pointed to from the ASXB is a good example. These areas are serialized with the local lock. The following diagram shows what could happen if the same address space is running on both processors and one of the processes involved fails to serialize properly.
PROCESSOR 0                               PROCESSOR 1
     •                                         •
     •                                    Gets local lock
     •                                    Branch enters validity check routine
Branch enters validity check routine           •
     •                                    Releases local lock
     •                                         •

In this example, assume that the process executing on processor 0 fails to get the local lock before it branch enters the system validity check routine. The validity check routine uses the local lock to serialize one of the save areas mentioned above in order to save the caller's registers. The registers saved by the validity check routine on processor 1 can be overlaid by the registers saved by the validity check routine on processor 0. Thus, the failure would be encountered on processor 1, but the processor 0 process would be the one that caused the failure.

OI/NI (OR Immediate and AND Immediate) instructions also illustrate this phenomenon. These instructions take more than one machine cycle to complete (that is, the operand is fetched, altered, and then stored). In previous operating systems, physical disablement and UP environments were enough to ensure the completion of one instruction before another was executed. In MVS, with multiple processors, this is no longer true. For example, suppose processor 0 issues OI and the operand has been fetched. Before processor 0 stores the changed byte, processor 1 executes the fetch cycle of an NI instruction to change a different bit in the same byte. Now, processor 0 stores the original status plus the OI change; subsequently the NI instruction completes, which erases the effect of the OI on the same byte.

In MVS, locking is used to solve some of the problems arising from such multi-cycle instructions. When locking is not an appropriate solution, the CS instruction is. CS serializes the word containing the byte against the other processor.
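The OI/NI exposure described above can be replayed as a small model. The following Python sketch is illustrative only: the `Word` class and the function names are hypothetical, and `compare_and_swap` merely imitates the outcome of the CS instruction (store only if the word is unchanged), not its hardware atomicity.

```python
# Toy model of one shared byte updated by two processors.
# The explicit steps mirror the fetch / alter / store cycles of OI and NI.

def interleaved_oi_ni(byte, oi_mask, ni_mask):
    """Replay the race: processor 0 runs OI, processor 1 runs NI on the same byte."""
    cpu0_fetched = byte                 # processor 0 fetches the operand for OI
    cpu1_fetched = byte                 # processor 1 fetches before processor 0 stores
    byte = cpu0_fetched | oi_mask       # processor 0 stores original status plus OI change
    byte = cpu1_fetched & ni_mask       # processor 1 stores; the OI change is erased
    return byte


class Word:
    """Hypothetical stand-in for a storage word updated with compare-and-swap."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        # Imitates the CS result: store only if the word still holds `expected`.
        if self.value == expected:
            self.value = new
            return True
        return False


def cs_update(word, change, interfere=None):
    """Retry loop: recompute from the current value until the swap succeeds."""
    while True:
        old = word.value
        new = change(old)
        if interfere is not None:       # let the "other processor" update once
            interfere(word)
            interfere = None
        if word.compare_and_swap(old, new):
            return word.value
```

In the first function the bit set by OI is lost; in the retry loop the failed swap forces the update to be recomputed from the value the other processor stored, so both changes survive.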
The point is that in debugging an MP dump, both processors must be considered because interaction between processes and shared resources is generally the key to solving the problem.

When a program serializes an SRR incorrectly, other programs can alter the SRR before the first program completes its update. The other programs may be running on the other processor, or they may have received control on the same processor because the first program was pre-empted (for example, SRB suspension because of a page fault) before completing its update.

Proving that a problem resulted from incorrect serialization is accomplished by finding both the "other" program and the window (an interval in which a program opens a serialization exposure is called a window).

The system trace table can sometimes be used to find potential "other" programs. If the occurrence of the error has not been overlaid in the trace table, it may be possible to reconstruct the series of events leading up to the failure by:

1. Listing all events on that processor, in order, using the logical processor address field in each event's trace entry
2. Making a similar list of all of the events on the other processor
3. Comparing the two lists to see if the processes executing in parallel on the processors are altering a common resource

Try to relate these two processes to the serialization problem that caused the dump. The existence of the window is confirmed by reading the code that alters the state of the SRR and finding where the two programs serialize improperly.

General Hints For MP Dump Analysis

The following is a list of general hints to help you analyze an MP dump.

1. The use of PRIORITY and DPRTY parameters no longer ensures the order in which tasks are dispatched.
First, the SRM, when attempting to handle resources, can allow a task or job with a lower DPRTY to run prior to a job with a higher priority. Second, as the dispatcher dispatches tasks to both processors, tasks of different priority may be executing on both processors simultaneously.
2. The CHAP (change priority) SVC does not ensure that tasks are dispatched in the expected order when dispatching on two processors.
3. Attached tasks can execute at the same time as the mother task on different processors. Therefore, if both tasks reference the same data, serialization of the data is required.
4. Any references made to system control blocks that change dynamically after IPL must be serialized to preserve the integrity of the data. The serialization technique for the data item must match that employed by the system.
5. Tasks can be redispatched on a different processor from the one on which they were previously operating. Therefore, do not use storage from 0-4K because redispatch on a different processor results in different data being referenced.
6. If subpools are shared between tasks, users must serialize the use of any data in the subpools common to the two tasks.
7. SRBs can be dispatched on either processor unless they are scheduled with affinity for a particular processor.
8. Asynchronous appendages can operate simultaneously with the task on the other processor.
9. Recovery routines can run on either processor, not necessarily the one on which the error was detected.
10. STATUS STOP does not prevent SRBs from being added to the local queue; it merely quiesces the address space after any currently executing or suspended SRBs have completed.
11. When access methods allow sharing of data sets between tasks in the same address space, access to the data sets must be serialized between the tasks.
Inter-Processor Communication

MVS uses the inter-processor communication (IPC) function in doing its inter-processor related work. The IPC function uses the SIGP (Signal Processor) instruction to provide the necessary hardware interface between the MP-configured processors. This instruction provides twelve distinct functions. Two of these functions are augmented by the control program to request services of the other processor: external call (XC) and emergency signal (EMS), which are SIGP codes 02 and 03, respectively. Thus, there are two classes of IPC services:

1. Direct - These services are defined for those control program functions that require the modification or sensing of the physical state of one of the processors. Ten of the twelve SIGP functions are defined as IPC direct services:

   Function                      Function Code
   sense                         01
   start                         04
   stop                          05
   restart                       06
   initial program reset         07
   program reset                 08
   stop and store status         09
   initial microprogram load     0A
   initial processor reset       0B
   processor reset               0C

   Note: Codes 0A, 0B, and 0C are not valid on a Model 158.

2. Remote - These services are defined for those control program functions that require the execution of a software function on one of the processors. The two remaining SIGP functions, external call (XC) and emergency signal (EMS), provide the hardware interface and interruption mechanism to initiate the desired program on the proper processor. The remote service function is provided in two categories:

   • Pendable service - uses the XC function of SIGP
   • Immediate service - uses the EMS function of SIGP

When processor A issues a SIGP (XC or EMS) instruction to processor B, a request for an interrupt becomes pending in processor B for the external class. If external interrupts are disabled in the current PSW for processor B, the interrupt is not taken.
If the PSW for processor B is enabled, then separate mask bits for XC and EMS are interrogated in control register 0. Interrupts are taken one at a time for those requests enabled in the control register. If processor B is disabled, processor B keeps pending at most one XC and one EMS request. XC requests can pend simultaneously. Each specific XC request is encoded in a physical configuration communication area (PCCA) buffer associated with the receiving processor.

Both the direct and remote services may be used to initiate the desired function on any of the processors physically attached via the MP feature, including the processor the request is initiated on.

Direct Services

The direct service function consists of a macro instruction (DSGNL) and a SIGP issuing routine (IEAVEDR). The DSGNL macro generates an in-line sequence of instructions that:

1. Loads general register 0 with one of the ten SIGP function codes used to perform the desired hardware action
2. Loads general register 1 with the address of the specified processor's physical configuration communication area (PCCA)
3. Loads general register 15 with the address of IEAVEDR
4. BALRs 14, 15

Upon return from IEAVEDR, register 15 contains a return code indicating the status of the request. If the return code is 8, register 0 contains sense information about the receiving processor as shown in Figure 2-9.

Return Code of 8: Register 0

   Bit      Meaning
   0        Equipment check
   1-23     Reserved
   24       External call pending
   25       Stopped
   26       Operator intervening
   27       Check stop
   28       Not ready
   29       Reserved
   30       Invalid order
   31       Receiver check

The other return codes are:

0 - SIGP instruction successfully initiated. The function is not necessarily completed upon return to the caller.
4 - SIGP function not completed because path to the addressed processor was busy or the addressed processor was in a state where it could not accept and respond to the function code.
12 - Not operational, that is, the specified processor is either not installed or is not configured into the system or is powered off.
16 - SIGP unsuccessful. Processor is a uniprocessor and does not have SIGP sending and receiving capabilities.

Figure 2-9. SIGP Return Codes

Remote Pendable Services

The remote pendable services function (external call) consists of a macro instruction (RPSGNL) and a routine (IEAVERP) which are used to invoke the execution of a specified program on a specific processor. This service is used by supervisor state, zero protection key functions that are not dependent upon the completion of the specified service in order to continue their processing. The RPSGNL macro generates an in-line instruction sequence that:

1. Loads register 0 with a code identifying one of the services to be initiated
2. Loads register 1 with the address of the PCCA of the processor on which the service is to be initiated
3. Loads register 15 with the address of IEAVERP
4. BALRs 14, 15

Upon return, register 15 contains a return code. If the return code is 8, register 0 contains sense information (see Figure 2-9).

There are currently six functions that can be initiated via external call:

1. SWITCH - specifies that the service routine (IEAVEMS1) used by the memory/task switch function is to be executed.
2. SIO - specifies that the IOS start I/O routine (IECIPC) is to be executed on the specified processor.
3. RQCHECK - specifies that the timer supervisor TQE check service routine (IEAPRQCK) is to be executed. This routine ensures that the top TQE on the real-time queue is being timed.
4. GTFCRM - specifies that the GTF service routine (AHLSTCLS), which modifies the Monitor Call (MC) control registers, is to be executed.
5. MODE - specifies that the recovery management services (RMS) service routine (IGFPEXI2), which modifies the RMS-oriented control registers, is to be executed.
6. MFITCH - specifies that the MF/1 service routine (pointed to by CVT + X'320') is to be executed. This routine executes TCH (Test Channel) instructions on the processor to which the channels are attached.

The remote pendable services routine (IEAVERP) sets the appropriate code in the external call buffer of the receiving processor's PCCA (offset X'84') as follows:

   SWITCH     X'80'
   SIO        X'40'
   RQCHECK    X'20'
   GTFCRM     X'10'
   MODE       X'04'
   MFITCH     X'02'

Then IEAVERP sets the external call (XC) function code (X'02') in register 0 and uses the DSGNL macro instruction to cause the SIGP instruction to be issued. The receiving processor will take an external interrupt when it becomes enabled for such interrupts. The external FLIH determines that the interrupt was an XC and passes control to the XC SLIH. The XC SLIH locates the XC buffer (X'84') in its PCCA, determines the function requested, and branches (BAL) to the appropriate routine. Refer to Figure 2-10 for the XC process flow.

Remote Immediate Services

The remote immediate services function consists of a macro instruction, RISGNL, and a routine, IEAVERI, which are used, like the remote pendable services, to cause the execution of a specified program on any of the online MP-configured processors. However, the immediate service differs from the pendable service in two important ways:

• The processors in an MP configuration are enabled for the emergency signal (EMS) interrupt at times when the processors are not enabled for the external call interrupt.
In particular, EMS interrupts are enabled when the processor is in the "window spin" state in which all other asynchronous interrupts (except machine check and malfunction alerts) are disabled. This "window spin" state is entered by a routine, such as the lock manager, when a point is reached in its processing that requires an action on the other processor in order for processing to continue. The "window spin" state specifically allows either the malfunction alert or EMS interrupts that are used to trigger the alternate CPU recovery (ACR) function to be accepted and processed.

• An immediate service routine can be requested to execute serially or in parallel with the function requesting the service. That is, IEAVERI will spin while waiting for the designated processor to signal either that the receiving routine has completed execution (serial) or that the receiving routine has been given control (parallel).

Some of the functions that can be initiated via EMS are:

• HIO - A Halt I/O command is issued to the designated device by the receiving processor.
• ACR Function - The receiving processor helps the sending processor recover from a failure by alternate CPU recovery procedures.
• Clock Synchronization - TOD clocks are adjusted so the same value is in each clock.
• PTLB - The receiving processor purges its translation-lookaside buffer (TLB).

The remote immediate services macro, RISGNL, generates an in-line sequence of instructions that:

1. Loads register 0 with the PARALLEL/SERIAL indication
2. Loads register 1 with the address of the PCCA of the processor on which the service is to be executed
3. Loads register 11 with the address of a parameter list to be passed to the service routine
4. Loads register 12 with the entry point address of the service routine to be executed
5. Loads register 15 with the address of IEAVERI
6. BALRs 14, 15

As for direct and remote pendable services, upon return register 15 contains a return code. Register 0 contains sense information in case the return code was eight. (See Figure 2-9.)

IEAVERI builds the emergency signal buffer in the sending processor's own PCCA at offset X'88', sets the EMS function code X'03' in register 0, and issues the DSGNL macro to cause the SIGP to be issued. The receiving processor will take an external interrupt when it becomes enabled. The external FLIH determines that the interrupt is an EMS and routes control to the EMS SLIH. The SLIH locates the EMS buffer of the sender and, for a parallel request, the SLIH turns off the parallel bit and calls the receiving routine. For a serial request, the receiving routine is given control, and, upon completion, the serial bit is turned off. During this interrupt handling process, the sending processor was in the window spin state until the serial or parallel bit was turned off. Figure 2-11 shows the EMS process flow.

SENDING PROCESSOR

IEAVERP (invoked via RPSGNL macro expansion)

   RPSGNL operands: the function (SWITCH, SIO, RQCHECK, GTFCRM, MODE, MFITCH, or (0)) and the processor (PCCA entry address or (1))

   Input registers:
      R0    Function code (X'80', X'40', X'20', X'10', X'08', X'04', X'02')
      R1    Receiving processor's PCCA
      R14   Return address
      R15   IEAVERP EP

   1. Disables (STOSM) external and I/O interrupts. Sets up (see Note 1).
   2. Is receiving processor online? (No / Yes)
   3. Turns on external call's sub-function code in external call's buffer in receiving processor's PCCA (Compare and Swap on).
   4. Sets external call function code, X'02', in Reg 0.
   5. Issues DSGNL (0),(1).
   6. Checks return codes. If R.C. = 8 and status is external call pending, sets return code = 0.
   Restores caller's status and returns to caller.

   Return registers:
      R0    Status bits
      R14   Return address
      R15   Return code
   Note: R.C. 8 means status bits are set in Register 0.

IEAVEDR

   Input registers:
      R0    Function code = X'02'
      R1    Receiving processor's PCCA
      R14   Return address
      R15   IEAVEDR entry point

   1. Disables (STOSM) external and I/O interrupts. Sets up (see Note 1).
   2. Establishes SIGP registers:
      a. Physical processor address: R2 = PCCACPUA based on R1
      b. Establishes parameter register: R1 = 0
      c. Establishes function code: R3 = R0
      SIGP R1,R2,0(R3)
   3. Checks condition code:
      CC2 - Busy - Retry (Note 2)
      CC1 - Equipment check, operator intervention, receiver check - Retry within limits
      CC1 - All others - R.C. 8
      CC3 - R.C. 8 (see Note 3)
      CC0 - R.C. 0
   4. Returns to IEAVERP.

   Return registers:
      R0    Status bits
      R14   Return address
      R15   Return code
   Note: R.C. 8 means status bits are set in Register 0.

Figure 2-10. External Call (XC) Process Flow (Part 1 of 2)

RECEIVING PROCESSOR

External FLIH

   Determines if interrupt is an external call.

External Call SLIH

   Input registers:
      R2    FLIH return address
      R10   External call SLIH entry address

   1. Turns on active bit.
   2. Locates external call buffer (PSA → PCCA).
   3. If buffer equals 0, returns to FLIH.
   4. Determines subfunction requested, turns the bit off (Compare and Swap), and BALs 14 to the appropriate routine:
      X'80'   SWITCH     IEAVEMS1
      X'40'   SIO        IECIPC
      X'20'   RQCHECK    IEAPRQCK
      X'10'   GTFCRM     AHLSTCLS
      X'08'   RESERVED
      X'04'   MODE       IGFPEXI2
      X'02'   MFITCH     routine addressed by CVTMFRTR
   5. Turns off active indicator and returns to external FLIH (BR 2).

Notes:
1. Set up: turns on active indicator, saves caller's registers, establishes addressability.
2. Disables/enables spin:
   1. Turns on SPIN indicator
   2. Enables for MFA and emergency signal interrupts
   3. Disables
   4. Turns off SPIN indicator
3. If CC = 3 and yet the processor is logically online, a SIGP hardware failure may exist. A "Soft ACR" option is available to the system operator to reconfigure to a UP system.

Figure 2-10. External Call (XC) Process Flow (Part 2 of 2)

SENDING PROCESSOR

IEAVERI (invoked via RISGNL macro expansion)

   RISGNL operands: PARALLEL or SERIAL; CPU (PCCA entry address or (1)); EP (address or (12)); PARM (address or (11))

   Input registers:
      R0    Bit 0 - Parallel; Bit 1 - Serial; Bit 31 - RMS indicator
      R1    Receiving processor's PCCA
      R11   Receiving routine's parameter address
      R12   Receiving routine's entry point
      R14   Return address
      R15   IEAVERI EP

   1. Disables (STOSM) external and I/O interrupts. Sets up (see Note 1).
   2. Is receiving processor online? (No: RC = 4 / Yes)
   3. Builds emergency signal buffer in own PCCA:
      a) Turns on parallel or serial indicator.
      b) Places receiving routine's EP, routine's parameter address, and processor's address in the buffer.
   4. Sets emergency signal function code, X'03', in Reg 0. Issues DSGNL (0),(1).
   5. Checks return codes (unsuccessful / successful).
   6. Serial request: spins until serial bit is off (Note 2). Parallel request: spins until parallel bit is off (Note 2).
   7. Restores caller's status and returns to caller.

   Return registers:
      R0    Status bits
      R14   Return address
      R15   Return code

IEAVEDR

   Input registers:
      R0    Function code = X'03'
      R1    Receiving processor's PCCA
      R14   Return address
      R15   IEAVEDR entry point

   1. Disables (STOSM) external and I/O interrupts. Sets up (see Note 1).
   2. Establishes SIGP registers:
      a. Physical processor address: R2 = PCCACPUA based on R1
      b. Establishes parameter register: R1 = 0
      c. Establishes function code: R3 = R0
      SIGP R1,R2,0(R3)
   3. Checks condition code:
      CC2 - Busy - Retry (Note 2)
      CC1 - Equipment check, operator intervention, receiver check - Retries within limits
      CC1 - All others - R.C. 8
      CC3 - R.C. 12 (see Note 3)
      CC0 - R.C. 0

Figure 2-11. Emergency Signal (EMS) Process Flow (Part 1 of 2)

RECEIVING PROCESSOR

External FLIH

   Determines interrupt is an emergency signal.

Emergency Signal SLIH

   Input registers:
      R2    FLIH return address
      R10   EMS SLIH entry address

   1. Turns on active bit.
   2. Locates EMS buffer of sender (CVT → PCCAVT (processor ID) → PCCA).
   3. If RMS indicator on, calls ACR.
   4. If receiving processor ID equals this processor ID, returns to FLIH.
   5. Determines if this is serial or parallel:
      Serial: calls receiving routine, then turns off serial bit.
      Parallel: turns off parallel bit, then calls receiving routine.
   6. Turns off active indicator.
   7. Returns to FLIH.

Receiving Routine

   Input registers:
      R1    Parameter address
      R14   Return address
      R15   Receiving routine's entry address

Notes:
1. Set up: turns on active indicator, saves caller's registers, establishes addressability.
2. Disables/enables spin:
   1. Turns on SPIN indicator
   2. Enables for MFA and emergency signal interrupts
   3. Disables
   4. Turns off SPIN indicator
3. If CC = 3 and yet the processor is logically online, a SIGP hardware failure may exist. A "Soft ACR" option is available to the system operator to reconfigure to a UP system.

Figure 2-11. Emergency Signal (EMS) Process Flow (Part 2 of 2)

MP Debugging Hints

1. Apparent disabled loop in IEAVERI on processor A.

This is probably caused when processor A sends an EMS to processor B, but the receiving routine on processor B has not yet turned off the serial or parallel bit in processor A's PCCA. Thus, processor A is in the "window spin" state in IEAVERI. To find what processor A wanted processor B to do, locate processor A's PCCA.

CVT + X'2FC' points to the PCCAVT
PCCAVT + 4 × (CPU ID for processor A) points to processor A's PCCA.
PROCESSOR A's PCCA

   X'88'   RISP (X'80' for parallel request, X'40' for serial request)
   X'8C'   Receiving routine PARM address
   X'90'   Receiving routine EP address
   X'94'   Receiving processor's PCCA address

By locating the proper PCCA (in this case processor A's), you can determine whether the EMS request was parallel or serial, the entry point, and, therefore, the name of the receiving routine. Although this information tells quite a bit about the current process on processor A, the real problem is most likely on processor B. Three past experiences can help determine the state of processor B.

• Processor B, if disabled for EMS interrupts, would never take the EMS interrupt; therefore the receiving routine would never get control and the parallel or serial bit would never get turned off.
• There could be a hardware problem with the SIGP circuitry. For example, if IEAVERI got condition code 0 as a result of issuing the SIGP instruction on processor A, but the SIGP was never received on processor B, there would be a loop in IEAVERI.
• Processor B was stopped in order to take a stand-alone dump. Before the dump program was IPLed or processor A was stopped, processor A issued the EMS for page invalidation. Thus, when the dump occurred, processor A was looping while waiting for the page invalidation to complete. So it appeared that processor A's looping was the problem when actually it was caused by a previously-identified problem on processor B.

2. Locate External Call buffers

The external call buffer is located at offset X'84' in the PCCA.
Normally, the buffer is clear, but it is worthwhile to check to make sure that there is no XC work to process, as indicated by the request codes below:

   PCCA X'84' code     Request
   X'80'               SWITCH
   X'40'               SIO
   X'20'               RQCHECK
   X'10'               GTFCRM
   X'08'               RESERVED
   X'04'               MODE
   X'02'               MFITCH

The code is set in the receiving processor's PCCA so that a bit on in processor B's PCCA, for example, means that processor A initiated the request.

3. Determining Which Processor Has I/O Capability

The processor attribute bits, PCCAATTR, are located at offset X'178' in the PCCA. If bit 1 (PCCAIO) is 1, then this processor has I/O capability, which means that this processor has at least one channel logically online.

Bit 1 is set to 0 by:

IEAVNIP0: For each processor that has no channels physically online. (Note: For Model 158 and Model 168 AP systems, PCCAIO=0 for the attached processing unit.)
IEEVCPU: When the last channel of a processor is varied offline.

Bit 1 is set to 1 by:

IEAVNIP0: For each processor that has channels physically online.
IEEVWKUP: When a processor is varied online and it has channels physically online. When the first channel of a processor is varied online.

Bit 1 is referenced by:

IGFPTERM: When searching for a live processor, if that processor has I/O capability (PCCAIO=1), a SIGP EMS is issued to that processor.
IGFPTSIG: When processing an EMS received from a failing processor. When invoked during system termination, if executing on a processor with I/O capability, IGFPTSIG writes to LOGREC and the console.
IGFPXMFA: When processing an MFA received from a failing processor. If executing on a processor that has I/O capability, IGFPXMFA invokes ACR.
IEAVTACR: If PCCAIO=1 for the failing processor, IEAVTACR invokes I/O restart to handle outstanding I/O.
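The two PCCA checks in hints 2 and 3 can be sketched as a small dump-reading aid. The Python below is an illustration under stated assumptions, not a supported tool: it assumes the PCCA has already been extracted from the dump as a byte string, that the external call buffer is the single byte at offset X'84', and that "bit 1" of PCCAATTR means mask X'40' of the byte at offset X'178' (bit 0 being the high-order bit). The function names are hypothetical.

```python
# Request codes in the external call buffer, as listed in hint 2.
XC_REQUESTS = {
    0x80: "SWITCH",
    0x40: "SIO",
    0x20: "RQCHECK",
    0x10: "GTFCRM",
    0x08: "RESERVED",
    0x04: "MODE",
    0x02: "MFITCH",
}

PCCA_XC_BUFFER = 0x84   # offset of the external call buffer (assumed one byte)
PCCA_ATTR = 0x178       # offset of PCCAATTR
PCCAIO_MASK = 0x40      # bit 1, counting the high-order bit as bit 0 (assumption)


def pending_xc_requests(pcca):
    """Return the names of XC requests still flagged in this PCCA."""
    code = pcca[PCCA_XC_BUFFER]
    return [name for bit, name in XC_REQUESTS.items() if code & bit]


def has_io_capability(pcca):
    """True if PCCAIO is on, that is, at least one channel is logically online."""
    return bool(pcca[PCCA_ATTR] & PCCAIO_MASK)
```

An empty list from `pending_xc_requests` corresponds to the normal case of a clear buffer; any name returned was requested by the other processor.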
MVS Trace Analysis

This chapter reviews the trace formats found in VS2 storage dumps. The MVS trace (similar to the OS trace) and the GTF trace are available in both system-initiated dumps (SNAP) and in stand-alone dumps. There are formatting routines for most combinations. The trace table entry format can be found in the "Data Areas" section (see TTE - Trace Table Entry) and the "Dump and Trace Formats" section of the Debugging Handbook. The information in this chapter is provided to assist you in reviewing the various formats as you will see them in a storage dump.

The page fault path is used as the vehicle for describing these formats in the following examples and descriptions.

Trace Entries

To have these entries formatted in a SYSUDUMP/SYSABEND/SYSMDUMP, the installation must specify SDATA=(TRT) in the SYS1.PARMLIB members or use the CHNGDMP command.

Note: SYSMDUMP produces a machine-readable dump; AMDPRDMP must be used to print it. AMDPRDMP does not format the system trace table.

For unformatted trace table entries, the system queue area (SQA) must have been printed. Use location X'54' as shown in Figure 2-12 to locate the trace table. Remember that 'TRACE ON' was required at IPL time. (Note that if GTF is active, the system trace is turned off.)

[Figure 2-12: storage dump of the SQA beginning at address FD95C0. Location X'54' points to the trace control information (the Current, First, and Last entry pointers), which is followed by the trace entries. The columns of the dump are labeled a through g.]

where:
a - address column in SQA
b - PSW or device address/CAW if an SIO operation
c - variable, see TTE in Debugging Handbook
d - CPU ID: 0040 for processor 0; 0041 for processor 1
e - ASID: 0001 is Master Scheduler; 0002 is usually JES; 0000 is Dummy Task or N/A
f - TCB address
g - Timer value

Figure 2-12. How to Locate the Trace Table

If low address storage is overlaid and the trace table pointer (X'54') is lost, you can locate the trace table (which is in the SQA) by searching through the high address range of common storage. Each trace entry is X'20' bytes in length and begins in the extreme left-hand column of a storage dump. Once you locate a pattern of X'07' and X'04' combinations, you have found the trace table. If location X'54' has not been overlaid, then it will point to the control information for the trace; this information is directly in front of the actual table.

The trace routine places an entry (record) type indicator in the fifth position of the PSW and moves the interrupt code in to make the PSW appear as BC mode. Figure 2-13 illustrates and explains each of the trace entry types.
[Figure 2-13: storage dump of trace table entries at addresses FD9740 through FDBA20, with position 5 (the fifth digit of the first word) of each sample entry marked and keyed to the explanations below.]

Fifth digit in first word is 0. This is an SIO entry for device 250. The CAW address is '60F8'. The CSW is residual from the previous I/O interrupt. The IOSB address is 'FF5194'.

Fifth digit = 1, an external type. This entry has an interrupt code of X'1004' so it was generated by a clock comparator interrupt.

Fifth digit = 2, an SVC interrupt. An SVC '6F' was issued from location F63284 (minus the interruption length code -- ILC). Variable fields are registers 15, 0 and 1.

Fifth digit = 3, a program interrupt. Interrupt code of X'11' is a page exception. Word four is the referenced translation exception address (TEA).

Fifth digit = 4, an SRB dispatch. The address in the PSW (1C760) is the entry point address. Word 3 contains the ASID to be dispatched. This illustrates the scheduling of POST status after an I/O interrupt.

Fifth digit = 5, an I/O interrupt. The device address (250) has been moved into the PSW. Words 3 and 4 are the CSW with the channel end/device end.

Fifth digit = 6, SRB redispatch. SRBs can be suspended because of lock contention or a page fault. The address in the PSW is the return address to the lock manager or the instruction that caused a page fault.

Fifth digit = 7, Task dispatch. Interrupt code is from the last task interrupt. If the interrupt code is 0, it is a return from SVC 0 or the first dispatch of this request block (RB) for the task.

Figure 2-13. Types of Trace Entries
>I< where: IO SRB DSP 017 ASCB 00FD5858 CPU 0000 RO 0000005B Rl 0000005B R8 008B5F04 R9 00000000 TIME 44413.341696 ASCB 000167FO CPU 0000 TIME 44413.343055 0353 ASCB 00016780 CPU 0000 FLGS 00000010 8801 TIME 44413.344333 ASCB 00017058 CPU 0000 'l'IME 44413.345269 0353 ASCB 00016780 CPU 0000 CSW 00078498 OC000001 'rIME 44413.372394 AseB 00F05858 CPU 0000 TIME 44413.373942 ASCB 00F05858 CPU 0000 TIME 44413.375033 JOBN USR1085 OLD PSW 075C0011 00C6BOOO TCB 00888FB8 MOON SVC-RES VPA-00C68000 R2 8F8B5B78 R3 40C69002 R4 008B58F8 R5 01885F2C R6 008B5EE4 R7 018B5F20 R10 00000008 R11 008B5A04 R12 00000000 R13 0000005B R14 008B8EB8 R15 00C6BOOO JOBN *MASTER* SRB PSW 070COOOO 00061A40 SRB 00FE7400 PARM 00000000 JOBN *MASTER* STAT 0000 R/V CPA 00078740 00078470 SK ADOR 00000000 OE000803 CAW OOOOEFBO CC 0 OSIO 00000000 JOBN N/A DSP PSW 070EOOOO 00000000 TCB 00017158 MOON N/A JOBN *MASTER* SNS N/A OLD PSW 070EOOOO 00000000 R/V CPA 00078470 00078470 TCB N/A FLG C0108801 USIO 00000000 A2000353 00 JOBN USRT085 SRB PSW 070COOOO 0004B6FA SRB 00FFB480 PARM 00000001 JOBN USRT085 OSP PSW 075COOOO 00C6BOOO TCB 008B8EB8 MOON SVC-RES PGM 017 The page fault. VPA=address of fault. SRB The dispatch of ASM's part monitor routine in master's address space. SIO 353 The Start 110 to page-in the requested page. DSP The dispatch of any ready work while the page-in 1/0 is in progress. In this case, there is no ready work, so the wait task is dispatched. 10353 The 1/0 interrupt from the paging device. ASM's disable interrupt exit(DIE) routine gets control. SAB The dispatch of RSM's I EAVIOCP page-in completion processor, to validate the page table entry and post the faulter as ready to run. IDSP The faulter resumed where he left off. Figure 2-17. GTF Trace of a Page Fault With I/O - 2.6.4 OS/VS2 System Programming LibraIy: MVS Diagnostic Techniques TYPE GLOBAL TYPE LOCAL MVS Trace Analysis (continued) Notes for Traces The trace provides a history of some of the events that lead to a storage dump. 
Trace interpretation is one of the most important aspects of debugging.

Tracing Procedure

When attempting to recreate the process that was occurring on the processor(s) when the dump was taken, start at the last entry in the trace table (identified either by the trace header or by the highest clock value in the last column) and scan upwards. While scanning, look for unexpected events. These include:

• Unit checks or unit exceptions on I/O devices
• SIOs with a condition code other than 0
• Program checks other than type 11
• SVC D, 33 (see number 6 under "Cautionary Notes" later in this chapter)
• Malfunction alerts (X'1200' external interrupt)
• Entries that show both processors executing the same code, as indicated by the ICs (instruction counters) in the entries
• Large time gaps in the TOD clock value
• An MP environment in which only one processor is doing anything

These entries indicate a potential for errors. Do not be distracted if you discover an entry of this type. Record the incident for future use. Then continue scanning back through the trace and try to determine what was happening in the system that might have caused the failure.

Remember to conduct the scan by unique processor. Separate the processes that occur on each processor and watch for any obvious interactions in the processes. You can further subdivide the activity by address space (as depicted by ASID) or by task (TCB address; remember to stay under the same ASID).

As you recreate the situation, remember that you are relating individual entries to real events that must occur in order to accomplish work. Do not be distracted. For example, do not look for an I/O interrupt just because you see an SIO. The two events should be associated, but you should also determine the following:

• Why the I/O is occurring;
• If the I/O is related to the process, address space, task, page fault, etc. that you are concerned with;
• If the I/O completion should trigger another event.
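When scanning the MVS trace table in this way, it helps to classify each entry before interpreting it. The following Python sketch reads the entry type from the fifth hexadecimal digit of the first word, using the types shown in Figure 2-13. It is an illustration only: the names `TRACE_ENTRY_TYPES` and `trace_entry_type` are hypothetical and not part of any MVS service, and the sketch assumes the first word of each entry has already been extracted from the dump as a 32-bit integer.

```python
# Entry types keyed by the fifth hex digit of the first word (Figure 2-13).
# The table and function names are hypothetical, chosen for illustration.
TRACE_ENTRY_TYPES = {
    0x0: "SIO",
    0x1: "external interrupt",
    0x2: "SVC interrupt",
    0x3: "program interrupt",
    0x4: "SRB dispatch",
    0x5: "I/O interrupt",
    0x6: "SRB redispatch",
    0x7: "task dispatch",
}


def trace_entry_type(first_word: int) -> str:
    """Classify a trace entry from the first 32-bit word of the entry."""
    # Digits are counted 1..8 from the left, so digit 5 occupies bits 12-15
    # counted from the right-hand end of the word.
    fifth_digit = (first_word >> 12) & 0xF
    return TRACE_ENTRY_TYPES.get(fifth_digit, "unknown")


if __name__ == "__main__":
    for word in (0x00000250, 0x078D206F, 0x070C700C):
        print(f"{word:08X}: {trace_entry_type(word)}")
```

A lookup table keeps the sketch close to the figure; any digit outside 0 through 7 is reported as unknown rather than guessed.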
This is the way work is accomplished in MVS, that is, events triggering more events. As you become familiar with trace coding you learn to expect this "event causing" sequence. Certain sequences occur very frequently; you learn to recognize these and to look for less familiar sequences.

As you are searching trace entries, watch for repeating patterns, which can indicate a loop in the system. These patterns can appear as constantly repeating ICs (generally the case in a tight enabled loop), or as a repeating sequence of entries (often the case in a process loop, such as an ERP constantly retrying an I/O operation). Note that in the latter case, other entries from other processes can intervene periodically in the trace table, especially in an MP environment.

If you reach a point in the trace analysis where you are somewhat comfortable with the processes you are uncovering and recreating, and you feel you have a fair understanding of the activity in the system, pause. Try to understand what you have found. Is there any way you can relate your findings to the reason you have taken the dump in the first place? Do the unexpected events have anything to do with the problem, or are they unrelated to the problem?

It can happen that the events you have discovered are unrelated to the problem causing the dump and you have exhausted the scope of the trace. In this case, you probably have to go into the system and study the address space and task structures, queues, and global data areas in order to zero in on the problem.

However, if the events you have discovered are related to the problem causing the dump, you must then attempt to isolate the erroneous process. Try to understand how the unexpected events relate to the process. Look on both sides of the event: did the event trigger the bad process, or is it a result of the bad process?
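The search for repeating patterns described in this procedure can be partly mechanized once the entries for one processor have been extracted. The Python sketch below is a simplified illustration under stated assumptions: the caller has already reduced the trace to a list of instruction counters or entry type names for a single processor, and repetition is exact. Intervening entries from other processes, which the text warns about in an MP environment, defeat the second function unless they are filtered out first. Both function names are hypothetical.

```python
from collections import Counter


def repeated_ics(ics, min_count=4):
    """Return {ic: count} for instruction counters seen at least min_count times.

    A small set of ICs with high counts suggests a tight enabled loop.
    """
    counts = Counter(ics)
    return {ic: n for ic, n in counts.items() if n >= min_count}


def repeating_period(entry_types, min_repeats=3):
    """Return the shortest period p for which the newest entries consist of one
    p-long sequence repeated min_repeats times, or None if there is none.

    entry_types is ordered oldest first, for example ["SIO", "IO", "SRB", ...].
    """
    n = len(entry_types)
    for p in range(1, n // min_repeats + 1):
        tail = entry_types[n - p * min_repeats:]
        if all(tail[i] == tail[i % p] for i in range(len(tail))):
            return p
    return None
```

A result from either function is only a prompt to look more closely; as the text notes, a repeating sequence can also be normal activity such as the enabled wait state.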
It is also necessary in trace analysis in MVS to understand whether you are looking at the primary error or at some secondary problem. Is this a mainline failure or a failure because of a problem in the recovery? Also, you must decide if the problem is caused by a previous error from which the system has recovered. Always be sure that it was not something several pages earlier in the trace that caused recovery to be activated and eventually led to the current problem. If this is the case, you must now decide which error to pursue. The original error is probably more important; however, much of the required information might be lost because of recovery and the subsequent recovery failure. Also keep in mind that if you must attack the secondary error condition, your search of the dump and the recovery areas can often uncover information about the first error.

The trace is one of the most useful tools available for back-tracking through a problem sequence. You must use it in conjunction with system control blocks and indicators in order to recreate the error sequence. This is still true in MVS despite the fact that the trace contains less information than in previous systems. In MVS, the SVC calls have been greatly reduced because of branch entry logic for both transfer of control and supervisor services. This means that trace entries are not provided as in previous operating systems. Also, many significant events, such as lock acquisition and release, SRB scheduling, and SIGP issuance, are not traced. Because of these MVS considerations, you must be able to understand the processes and interpret the trace table rather than just read it.

Cautionary Notes

Listed below are some items the problem solver should understand when analyzing an MVS trace table.

1.
I/O Processing:
• Much I/O is accomplished in MVS by the branch entry interface to IOS and without the use of SVC 0 (EXCP). Therefore, you often find I/O entries (SIO/I/O interrupt) that are not accompanied by SVC 0.
• Back-end I/O processing in MVS generally results in an SRB schedule of IECVPST. This trace entry should appear soon after an I/O interrupt. The register 1 slot will contain the IOSB address. The IOSB is the key to tracking the I/O request.

2. Timer Value: The last field of each trace entry contains the middle four bytes of the eight-byte TOD clock at the time the entry was made. The clock can be of considerable importance when trace entries and various system fields (such as the ASCB or LCCA, which also contain TOD clock values) are used to determine how much time has elapsed between significant events. The last digit represents a value that is increased every 16 microseconds. Also, the fourth digit represents a value that is increased approximately every second.

3. Enabled Wait State: Because of recovery, the end symptom of many problems is an enabled wait state. For tracing, the wait state presents particular problems in MVS. SRM maintains a timer interval that causes a clock comparator interrupt (code X'1004') approximately every 1/2 second. These external interrupts are recorded in the trace table. You then see the re-dispatch of the no-work wait task followed by another clock comparator interrupt, and so on. Even though this occurs, the sequence is not repeatedly traced. In addition, in an MP environment there are external calls (code X'1202') issued between the two processors requesting that the receiver look for ready work. These calls will be followed by a re-dispatch of the no-work wait on the receiving processor. In short, the wait state is a combination of dispatches of the no-work wait task, clock comparator interrupts, and SIGP external calls. The IC (instruction counter) will always be 0.
At approximately 12- or 13-second intervals, an SRB is dispatched in the master scheduler address space to run a section of SRM in order to gather system statistics. When the SRB has completed, the no-work wait task is again dispatched. All this extraneous activity causes the trace to wrap around and overlay the important trace entries of the events that led up to the enabled wait state.

4. MP Activity: The communication between the two processors in the MP environment is traced as the external interrupts are accepted by the receiving processor. An external interrupt code of X'1201' is an emergency signal, and an external interrupt code of X'1202' is an external call. (The previous chapter, "Effects of MP on Problem Analysis," explains this communication process.)

5. Trace Currency: Various processes that occur in MVS turn off the MVS trace. The most prominent of these are GTF and SVC dump. Determine if the trace was running when the dump occurred: if you are unaware that the trace was not running when the dump was taken, you might go off on a fruitless chase and lose considerable time. The trace was still active when the dump occurred if the value at CVT+X'191' is X'FA'.

Note: When SVC dump turns off the MVS trace, it sets bit 0 on in the CPU identifier (offset X'14') in the current trace table entry.

6. SVC D Entries: SVC D is the means by which termination is invoked. In previous operating systems, SVC D meant abnormal termination. This is not always true in MVS. RTM2 is the mechanism for normal end-of-task processing as well as for abnormal termination; RTM2 is invoked via SVC D. Consequently, SVC D for normal termination is a valid situation and is traced. You can determine whether SVC D implies normal or abnormal termination by inspecting the register 1 slot associated with the SVC D entry.
If the first byte contains a X'08', RTM2 is being invoked for normal termination and this is not an error situation.

7. Important events not traced: MVS design prevents locked, disabled, or SRB code from issuing SVCs. The SVC FLIH abnormally terminates code that issues an SVC. Note that in this case, the erroneous SVC invocation is not traced. Also note that locked, disabled, or SRB code that issues SVC D does so as a means of entering RTM1; this is a common technique used by IBM SCP code in order to invoke recovery. RTM indicators show SVC error, but the real problem is why the SVC D was issued.

8. Unit exception I/O interrupt on a 3705 communications controller: The presence of unit exception conditions from the 3705 is a common occurrence while running VTAM. This is a normal situation and should not be considered erroneous. The host processor has issued a set of read commands to the 3705, and the channel program has been terminated before all the reads have completed because the NCP did not have enough data to satisfy each read CCW.

9. GETMAIN, FREEMAIN - SVC X'A', SVC X'78': For SVC X'A', inspect the register 1 slot of the associated trace entry. A value of X'80' in the high-order byte indicates GETMAIN; a value of X'00' indicates FREEMAIN. SVC X'78' uses a code in register 15 (see the Debugging Handbook). If a GETMAIN is indicated, the register 1 slot of the associated re-dispatch of the SVC issuing code can be used to locate the storage allocated by the GETMAIN process.

10. A GETMAIN for X'3CC' bytes is often seen soon after an SVC D is issued: This is RTM2's request for storage for an RTM2WA. By locating the redispatch of RTM2 and inspecting the register 1 slot, you can locate the RTM2WA.
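Several of the cautionary notes above reduce to small calculations on values read from trace entries. The Python sketch below illustrates notes 2, 6, and 9. It is a sketch under stated assumptions, not a description of any IBM tool: the function names are hypothetical, the timestamp and register 1 slot are assumed to have been extracted as 32-bit integers, "first byte" in note 6 is taken to mean the high-order byte, and the elapsed-time calculation assumes the clock wrapped at most once between the two entries.

```python
TICK_MICROSECONDS = 16  # note 2: the last digit advances every 16 microseconds


def elapsed_microseconds(earlier: int, later: int) -> int:
    """Time between two 32-bit trace timestamps, allowing for a single wrap."""
    return ((later - earlier) & 0xFFFFFFFF) * TICK_MICROSECONDS


def high_order_byte(word: int) -> int:
    """Return the leftmost byte of a 32-bit register slot."""
    return (word >> 24) & 0xFF


def svc_d_is_normal_termination(reg1: int) -> bool:
    """Note 6: X'08' in the first byte of the register 1 slot means normal end."""
    return high_order_byte(reg1) == 0x08


def svc_a_request(reg1: int) -> str:
    """Note 9: classify an SVC X'A' entry from the register 1 slot."""
    code = high_order_byte(reg1)
    if code == 0x80:
        return "GETMAIN"
    if code == 0x00:
        return "FREEMAIN"
    return "unknown"
```

As a check on note 2, two timestamps that differ by one in the fourth digit (X'00010000') are 1,048,576 microseconds apart, which is the "approximately every second" interval.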
Miscellaneous Debugging Hints

This chapter is a collection of miscellaneous debugging hints to aid the problem solver in specific situations not covered elsewhere in this book. It includes the following topics:

• Alternate CPU Recovery Problem Analysis
• Pattern Recognition
• OPEN/CLOSE/EOV ABENDs
• Debugging Machine Checks
• Debugging Problem Program ABEND Dumps
• Debugging from Summary SVC Dumps
• Started Task Control ABEND and Reason Codes
• SWA Manager Reason Codes

Alternate CPU Recovery (ACR) Problem Analysis

Alternate CPU recovery (ACR) is the process by which MVS dynamically adjusts to the unexpected failure of a processor in a multiprocessing (MP) configuration. ACR is initiated by the failing processor. If the failing processor's hardware detects the failure, it issues a malfunction alert (MFA) external signal to the other processor. If the failing processor generates the severe machine check interrupt (recursive or invalid logout) type, the machine check interrupt handler will initiate ACR via the SIGP instruction, emergency signal (EMS) operand, which generates an external interrupt on the receiving processor.

When the running processor detects that a failing processor is requesting ACR, it places X'FF' in the CSDACR byte (CSD+X'16') in the CSD control block. The byte will be restored to X'00' after ACR is complete.

ACR works in three phases: pre-processing, intermediate, and post-processing. Pre-processing is the initialization phase: the running processor copies the PSA and normal functional recovery routine (FRR) stacks of both processors and places them in the area pointed to from their respective LCCA's WSACACR pointer.
The WSACACR pointer is located at X'10' beyond the area pointed to by LCCACPUS. Additionally, LCCAs are marked so that in both processors' LCCAs, LCCADCPU points to the LCCA of the failing processor and LCCARCPU points to the LCCA of the running processor. By means of the LCCACPUA field in the LCCA, you can determine which processor has failed and which is still running.

Note that in a storage dump, the physical PSA of the failed processor is the same as it was when the processor decided that ACR should be initiated. The normal FRR stack, pointers to other FRR stacks, locks, PSASUPER bits, etc. all reflect the state of the processor at the time it failed. This will be useful for solving problems in the recovery initiated for the process on the failed processor.

The ACR intermediate phase gets control from the MVS dispatcher, or lock manager global spin lock routine. In this phase, ACR switches from the process on one logical processor to the process on the other logical processor. This switching continues until the RTM1 recovery (routing to FRRs) completes on behalf of the process on the failed processor. At this point, the ACR post-processing phase is entered.

ACR post-processing consists of cleanup activities performed by other components and by ACR. Post-processing invokes I/O restart (IECVRSTI) to initialize the channel reconfiguration hardware (CRH) function on a Model 168 or to mark outstanding I/O from the failed processor with a permanent error, which then initiates error recovery processing via error recovery procedures (ERPs). Console switch is invoked via POST. Additionally, the system resources manager (SRM) is notified of the loss of the processor. Finally, ACR performs additional cleanup activities and sets the CSDACR flag to X'00'.
Historically, the parts of the ACR process that have had software problems are the FRRs (written by component developers to protect particular mainline functions) and the ERPs (device-dependent routines). The mainline ACR routine (IEAVTACR) is basic and has been quite free of problems.

Note: The I/O error processing invoked during the ACR process has caused many of the problems discovered to date. Of significant importance is EXCP I/O error processing. The following flow describes the non-CRH situation for an MVS 158 MP system.

1. I/O restart (IECVRSTI) determines all devices that have outstanding requests at the time of a machine check.
2. IECVRSTI simulates an I/O interrupt for each device of a channel control check and interface control check (X'00000000 00060000') and sets the pseudo interrupt bit in the IRT (IRTPINT bit at X'02' in IRTENVR). This prevents IOS from interfacing with the channel check handler (CCH).
3. IECVRSTI passes control to IOS via the I/O FLIH.
4. IOS sets the IOSCOD field in the IOSB to X'74' and schedules IECVPST.
5. IECVPST routes control to the abnormal exit routine.
6. For an EXCP, the EXCP compatibility interface routine receives control.
7. EXCP converts the X'74' to X'7F' in the IOB.
8. EXCP branches to the abnormal end appendage.
9. The abnormal end appendage returns to EXCP, which returns to IECVPST.
10. IECVPST invokes normal ERP processing.
11. If no path remains to a device, subsequent I/O requests (either ERP retry or normal new requests) are intercepted by IOS and flagged with IOSCOD = X'51', and IECVPST is scheduled.
12. IECVPST routes control to the abnormal exit routine.
13. For EXCP requests, the abnormal exit is again the EXCP compatibility interface routine.
14. EXCP converts the X'51' to a X'41' (permanent error) in the IOB and enters the abnormal end appendage.
15.
The abnormal end appendage returns to EXCP; EXCP returns to IECVPST, which enters the termination routine.

The important point in the above discussion is that EXCP changes the ACR completion codes to conventional error post codes.

The most frequent I/O problems have been:

• ERP's abnormal end appendages not coded for a 0 CCW address in CSW.
• ERP's abnormal end appendages not recognizing that the last path to a device has been lost (as with asymmetric I/O) and thus going into an I/O retry loop.

Pattern Recognition

When analyzing a dump you should always be aware of the possibility of a storage overlay. System incidents in MVS are often caused by storage overlays that destroy data, control blocks, or executable code. The results of such an overlay vary. For example:

• The system detects the problem and issues an abnormal completion code, yet the error can be isolated to an address space.
• Referencing the data or instructions can cause an immediate error such as a specification or op-code exception.
• The bad data can be used to reference a second location, which then causes an evident error.

When you recognize that the contents of a storage location are invalid and subsequently recognize the bit pattern as a certain control block or piece of data, you generally can identify the erroneous process/component and start a detailed analysis. This section discusses pattern recognition and potential causes of storage overlays, and points out common patterns that aid the debugger.

Once you recognize an overlay, analyze the bit pattern. If you do not recognize the pattern at all, try to determine the extent of the damaged area. Look at the data on both sides of the obviously bad areas. See if the length is familiar; that is, can you relate the length to a known control block length, data size, MVC length, etc.? If so, check various offsets to determine their contents and, if you recognize some, try to determine the exact control block/data.
Even if you do not recognize the pattern, take one more step. Can you determine the offset from some base (X) that would have to be used in order to create the bit pattern? If so, the fact that there is a certain bit pattern at a certain offset (Y) can be helpful. For example, a BALR register value (X'40D21C58') at an offset X'C' can indicate that a program is using this storage for a register save area (perhaps caused by a bad register 13). Another field in the same overlaid area might trigger recognition.

Look at the overlaid area and scan for familiar addresses such as device addresses, UCB addresses, and BAL/BALR register values (full word with high-order byte containing some "1" bits). If you find any of these, try to determine what components or modules are involved or what control blocks contain these addresses.

Repetition of a pattern can indicate a bad process. If you can recognize the bad data you might be able to relate that data to the component or module that is causing the error. This provides a starting point for further analysis.

Low Storage Overlays

Low storage is a common location for storage overlays. The following should be noted:

• Location X'10' (CVT pointer) should contain a nucleus address. This location is refreshed by the program check first level interrupt handler and so is often valid when adjacent locations are bad.
• Location X'14' should always be 0.
• Locations X'18' through X'3F' (old PSWs) should always contain valid PSWs. The mask (first byte) of each PSW should be X'07', with the exception of X'30', which can contain X'00', X'04', or X'07'.
• Location X'4C' should be equal to location X'10'.
• Locations X'58' through X'7F' (new PSWs) should contain valid PSWs.

If any of the above statements is not true, consider a low storage overlay. Further analysis is required to determine what the cause may be.
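The low storage checks listed above lend themselves to a quick mechanical pass before manual analysis. The Python sketch below is a minimal illustration, assuming the first X'80' bytes of storage have been copied from the dump into a bytes object. The function name is hypothetical. The sketch covers only the checks that can be stated exactly (location X'14', location X'4C' against X'10', and the old PSW mask bytes); whether location X'10' is a nucleus address and whether the new PSWs are valid depend on the particular system and are left to the analyst.

```python
def check_low_storage(low: bytes) -> list:
    """Return a list of findings that suggest a low storage overlay."""
    if len(low) < 0x80:
        raise ValueError("need at least the first X'80' bytes of storage")

    def word(offset: int) -> int:
        # System/370 storage is big-endian.
        return int.from_bytes(low[offset:offset + 4], "big")

    findings = []
    if word(0x14) != 0:
        findings.append("location X'14' is not zero")
    if word(0x4C) != word(0x10):
        findings.append("location X'4C' does not match location X'10'")
    # Old PSWs occupy X'18' through X'3F', eight bytes each.
    for offset in range(0x18, 0x40, 8):
        allowed = (0x00, 0x04, 0x07) if offset == 0x30 else (0x07,)
        if low[offset] not in allowed:
            findings.append(
                "old PSW at X'%02X' has mask X'%02X'" % (offset, low[offset])
            )
    return findings
```

An empty list does not prove that low storage is intact; it only means none of these particular rules of thumb were violated.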
Also consider that, on a non-prefixed machine, the low storage locations described above can be overlaid by CCWs for the stand-alone dump program, starting at location X'10'. Do not consider this an error situation.

Two common low storage problems are:

• A register save area starting at location X'30'. This can happen when an area of the system saves register status in a TCB at location 0. Or it can be caused by a routine using PSATOLD for a TCB address when the system is in SRB mode; this is indicated by PSATOLD=0.
• An SRB/IOSB combination starting at location X'0'. This can be caused by a problem in the IOS storage manager. The contents vary depending upon how many control blocks the code has initialized. Points to consider are:
  1. The two blocks might point to each other (X'1C' into each).
  2. An ASCB address might be at location 8.
  3. Addresses of IECVEXCP routines might be at X'68' and/or X'6C'.

Common Bad Addresses

Three common bad addresses are:

• X'C0000', and this address plus some offset. These are generally the result of some code using 0 as the base register for a control block and subsequently loading a pointer from 0 plus an offset, thereby picking up the first half of a PSW in the PSA. Look for storage overlays in first level interrupt handlers or in code pointed to by the old PSW. These overlays result when 0 plus an offset causes the second half (IC) of a PSW to be used as a pointer.
• X'C00', X'C34', X'C50', X'C54', X'C5C', X'C7C', and other pointers to fields in the normal FRR stack. Routines often lose the contents of a register during a SETFRR macro expansion and illegally use the address of the 24-byte work area returned from the expansion.
• Register save areas. Storage might be overlaid by code doing an STM (Store Multiple) instruction with a bad register save area address.
In this case, the registers saved are often useful in determining the component or module at fault.

OPEN/CLOSE/EOV ABENDs

When a dump shows an abend issued from O/C/EOV, the key area to start your diagnosis in is the RTM2 work area. The failing TCB has a pointer (at TCB+X'E0') to this area. This work area contains information current at the time of the abend, the most important being the register contents. Register 4 points to the current O/C/EOV work area. This work area is built by IFG0RR0A during problem determination and contains key information about the problem: the JFCB, IOB, DEB and other pertinent fields are all saved in the work area for use later by the recovery routines. The O/C/EOV work area is documented on microfiche in each O/C/EOV module.

The module in control at the time of the abend can be determined from the "Where To Go" (WTG) table, which is pointed to by register 6 in the RTM2 work area. The WTG table is contained within another work area called the DCB work area. IFG0RR0A saves a copy of the current DCB in this work area. If multiple DCBs are involved, the prefix to the DCB work area points to another DCB work area. These DCB areas are laid out precisely like a DCB. All these work areas and their prefixes are documented at the end of every O/C/EOV module in the microfiche.

In an MVS environment, O/C/EOV must build these work areas rather than rely on what is in real storage at the time of the dump. The main task is to find these areas and interpret their fields using microfiche. A quick way to find these work areas is to find subpool 230 in the dump. All O/C/EOV data is in this subpool. Assuming you have all the pertinent information about the failure, the problem becomes the same as an O/C/EOV problem in OS.

One more point: built into the code is message IEC999I.
This message indicates that there is a problem in the O/C/EOV code that cannot be determined. While you may be able to circumvent this problem, you should also submit an APAR for it.

Debugging Machine Checks

The machine check interruption is the hardware's method of informing the MVS control program that it has detected a hardware malfunction. Machine checks vary considerably in their impact on software processing. Some machine checks notify software that the processor detected and corrected a hardware problem that required no software recovery action (software calls these errors soft errors). Hard errors are hardware problems detected by a processor that require software-initiated action for damage repair. Hard errors also require software recovery to verify the integrity of the process that experienced the failure. Obviously, if there are software problems after a machine check, it is more likely that the machine check was a hard error.

It is important to get a feeling for which software components are affected by particular hardware failures. The machine check interrupt code (MCIC), located in the PSA, describes the error causing the interrupt. The following discussion shows how to find MCICs and how to interpret them for subsequent software processing.

Machine checks can be found in a LOGREC buffer (LRB), the SYS1.LOGREC data set, or in the storage area used as a buffer prior to writing records to SYS1.LOGREC (see the discussion of SYS1.LOGREC analysis in the "Recovery Work Areas" chapter earlier in this section). Also, a pointer to the LRB that describes the last machine check that occurred on a processor can be found in that processor's PCCA at PCCALRBV (PCCA+X'A0'). The LRB contains the machine check interrupt code (MCIC), except when:

• The machine check old PSW is zero. The MCIC is also zero. The LRBMTCKS bit (field LRBTERM at LRB+X'20') is turned on by software.
• MCIC is zero and the machine check old PSW is non-zero.
The LRBMTINV bit (field LRBTERM at LRB+X'20') is turned on by software.

The MCIC is the principal driver of software processing after a machine check. It must be examined to determine the actions that MVS should take. The MCIC contains bits describing the conditions that caused the interrupt. Note that more than one failing condition can be described by a machine check at one time. Software performs repair processing for each condition found; software recovery processing is initiated if any hard error conditions are found (except in the cases described on the following pages).

Because hard errors require FRR and ESTAE processing, identifying a hard error is important. Important MCIC bits are listed below, with a description of their hardware significance and impact on software. A handy MCIC reference matrix, containing additional machine check and ensuing action-taken information, appears at the back of this section.

Bit 0 (System damage) - The processor is still usable, but damage occurred while the processor was in the process of changing PSWs or otherwise changing system control, and thus has lost the associated process or interrupt. Software recovery routines (FRRs) are entered for this hard error.

Bit 1 (Instruction processing damage) - The processor is still usable but an instruction has failed to operate as intended. Software recovery is initiated for this hard error, unless the backed-up bit is on with storage error or key error uncorrected on refreshable storage (see Bit 16 description).

Bit 2 (System recovery) - The processor detected and corrected a potential hardware problem. The interrupted process is completely restored by software for this soft error; no repair is performed and no recovery routines are entered.

Bit 3 (Timer damage) - The interval timer at PSA location X'50' has failed.
Because MVS does not use this timer, this failure is ignored (indicated as a soft error).

Bit 4 (Timing facility damage) - Damage has occurred to the CPU timer, clock comparator, or time-of-day clock. The particular clock facility that is damaged is described by MCIC bits 46 and 47. A first failure to a facility results in an attempt to reuse it. Subsequent failures result in taking the facility offline (described in the PCCA fields PCCATODE, PCCACCE, or PCCAINTE). If no clock of a particular type remains in the system, any task which requests timing using that type of clock is sent through software recovery. This is treated as a soft error for the process current on the processor at the time of the interrupt.

Bit 5 (External damage) - Damage has occurred to a unit external to the processor. MVS expects more information in a channel check I/O interrupt. This is treated as a soft error.

Bit 7 (Degradation) - The system has detected that elements of the high-speed buffer (cache) or translation look-aside buffer have had bit (parity) errors. The bad elements are automatically reconfigured out of the buffer. Once a predefined threshold of degradation machine checks is reached, the buffer and the translation look-aside buffer are reset, thus making the entire buffer available again. This threshold has a default value of 3, which can be changed by the operator via the MODE command. Until then, the system might perform at a reduced rate because of increased storage access time (cache element deletion) or increased time to translate virtual addresses (because of translation look-aside buffer element deletion). However, because no damage has been done to any software process or data, this soft error is merely recorded in SYS1.LOGREC. The system state at the time of the error is re-established, ignoring the occurrence of the buffer bit error. It is treated as a soft error and no software recovery is initiated.
Bit 8 (Warning) - Damage is imminent; there is a cooling loss or a power drop, etc. Software determines if the error is transient or permanent. If it is transient, the warning interrupt is treated as a soft error. If permanent, an attempt is made to invoke the power warning feature software, to record the system state at the time of this hard error.

Bit 16 (Storage error uncorrected) - There is a block in storage with a double bit error that is located at the real, prefixed address stored in PSA location X'F8'. If the frame's page is refreshable, that is, unchanged, pageable, and in the current address space, it is marked invalid so a future reference will cause a fresh copy to be paged into a new frame. (Note: More than one error can occur before the page goes offline.) In all cases, an attempt is made to take the damaged frame offline (unless the frame is in the nucleus). For unchanged nucleus frames, the page is refreshed from a copy paged out at NIP time. When a storage error uncorrected condition occurs in conjunction with a system recovery or external damage error, it is treated as a soft error and no recovery routines are entered. If the storage error occurs in conjunction with instruction processing damage when the backed-up bit (bit 14) and storage logical validity bit (bit 31) are on, and the frame's page is refreshable, the error is treated as soft and no recovery routines are entered. Any other occurrences of storage error uncorrected are treated as hard errors and software recovery is initiated for the error.

Bit 17 (Storage error corrected) - A single-bit storage error was detected and successfully corrected by hardware. Software treats this error as a soft error. This error sometimes appears in conjunction with system recovery (bit 2).

Bit 18 (Storage key error uncorrected) - Hardware has detected a bit error in a storage key.
Software attempts to reset the storage key to its original value. If the key is successfully reset, and the storage key error occurs in conjunction with instruction processing damage when the backed-up bit (bit 14) and the storage logical validity bit (bit 31) are on, the error is treated as soft and no recovery routines are entered. When the storage key error occurs in conjunction with a system recovery or external damage error, it is also treated as a soft error and no recovery routines are entered. Change bits are set to one in case the frames have been altered. Any other occurrences of storage key error are treated as hard errors and software recovery is initiated for the error.

In addition to these error description bits there are other MCIC fields that describe the time-of-occurrence of the machine check interrupt, or the validity of the registers, PSW, and other data logged out during the machine check interruption process.

The two time-of-occurrence bits are bits 14 and 15. The backed-up bit (bit 14), when set to 1, indicates that the machine check occurred before actual damage occurred. The delayed bit (bit 15) is set to 1 when the processor has been disabled for one or more of the interrupt conditions described in the MCIC. The processor had been processing after damage was detected.

Validity bits describe the validity of the associated field logged out during the machine check interrupt. If a validity bit is 0, the associated data logged out is incorrect.
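When reading an MCIC out of a dump or a LOGREC record, it can help to turn the hexadecimal value into the bit names used in the descriptions above. The following is a minimal sketch only: it assumes the MCIC is handled as a single 8-byte value with bit 0 as the leftmost bit, and the names `MCIC_BITS`, `mcic_bit`, and `describe_mcic` are illustrative, not part of MVS. It does not attempt the hard/soft error decision, which depends on the combinations described above.

```python
# Bit names taken from the descriptions in this section.
# Assumption: the MCIC is an 8-byte value and bit 0 is the leftmost bit.
MCIC_BITS = {
    0: "system damage",
    1: "instruction processing damage",
    2: "system recovery",
    3: "timer damage",
    4: "timing facility damage",
    5: "external damage",
    7: "degradation",
    8: "warning",
    14: "backed-up",
    15: "delayed",
    16: "storage error uncorrected",
    17: "storage error corrected",
    18: "storage key error uncorrected",
}


def mcic_bit(mcic: int, bit: int) -> bool:
    """Return True if the given bit (0 = leftmost of 64) is on."""
    return bool((mcic >> (63 - bit)) & 1)


def describe_mcic(mcic: int) -> list[str]:
    """List the names of all described bits that are on, in bit order."""
    return [name for bit, name in sorted(MCIC_BITS.items()) if mcic_bit(mcic, bit)]


if __name__ == "__main__":
    # Hypothetical value: bits 1 and 14 on.
    sample = 0x4002000000000000
    print(describe_mcic(sample))
```

A value with several bits on returns several names, which matches the point made earlier that one machine check can describe more than one failing condition.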
Validity bits are:
• Bit 20 (PSW EMWP mask validity)
• Bit 21 (masks and key validity)
• Bit 22 (program mask and condition code validity)
• Bit 23 (instruction address of machine check old PSW validity)
• Bit 24 (failing storage address validity)
• Bit 25 (region code validity)
• Bit 27 (floating point register validity)
• Bit 28 (general purpose register validity)
• Bit 29 (control register validity)
• Bit 30 (processor model-dependent logout validity)
• Bit 46 (processor timer validity)
• Bit 47 (clock comparator validity)

Additionally, the storage logical validity bit (bit 31 set to 1) indicates that all store operations (that were to occur before the machine check interrupt) have completed.

The following chart attempts to show the action taken for each error condition. For example: in column 6 the condition involves recursive machine checks, or a check stop, or an invalid logout. The condition originated on either a Model 158 or a Model 168 attached processor system, and did not involve the APU. The action taken resulted in a disabled wait. Where multiple errors do exist, appropriate repair action is taken for all errors, and recovery action is taken for the most severe error. With the exception of I/O reserve outstanding, the status of each of the conditions can be determined from examination of MCH SYS1.LOGREC records.

[Chart: machine check conditions versus action taken, columns 1 through 30. The individual column entries are not recoverable in this copy; only the row titles survive.]

CONDITION rows:
  Recursion; Check Stop; Invalid Logout
  Subclass (MCIC): System Damage; Inst. Proc'g. Damage; System Recovery; Timer Damage; Clock Damage; External Damage; Degradation; Warning
  Time: Backed Up; Delayed
  Type: Stor Err Uncorr; Stor Err Corr; Key Error; Key Err Unresetable
  Validity: PSW (WP, MS, PM, IA); Failing Stor Addr; Registers (FP, GR, CR); Logout; Storage Logical; CPU Timer; Clock Comparator
  Location: Pageable; Nucleus; LSQA, SQA; Fixed; V=R; Outside Curr. Memory
  Storage State: Changed; Unchanged
  System: UP; MP; AP
  Processor: 158; 168; APU
  I/O Reserve Outstanding
  Occurrence: 1st; 2nd

ACTION TAKEN rows:
  Reset timing component; Mark CPU Timer perm. damaged; Mark Clock Comp perm. damaged; Mark TOD Clock perm. damaged; Invoke PWF if available; Activate CRH; Take frame offline immed.; Take frame offline when avail.; Invalidate Page Table Entry; Refresh the nucleus page; Repair SPF Key; Resume at MCOPSW; Take Processor offline; Possible loss of job; Restartable Wait; Disabled Wait; Enter RTM for Recov.; Record

Notes:
• Key.
  X = Condition must be present
  O = Condition must not be present
  Circled X = The action is the same no matter which condition represents the situation

Debugging Problem Program Abend Dumps

The following steps may provide some initial assistance in this debugging process:

1. Locate the RTM2 work area (RTM2WA), which is pointed to by the TCBRTWA field in the TCB and the ESART2WA field in the abend SVRB. It provides a summary of the abend as follows:

Name      Offset  Explanation
RTM2CC    1D      Abend completion code.
RTM2ABNM  8C      Abending program name. This is the name of a load module or an external entry point (ALIAS) in the load module.
RTM2ABEP  94      Abending program address (the beginning of the load module or an ALIAS in the load module).
RTM2EREG  3C      Registers at time of error.
RTM2APSW  7C      EC PSW at time of error.
RTM2ILC1  85      Instruction length code for PSW at time of error.
RTM2ERAS  36C     Error ASID.
RTM2TRCU  37C     Address of current trace entry for saved system trace table.
RTM2TRFS  380     Address of first trace entry for saved system trace table.
RTM2TRLS  384     Address of last trace entry for saved system trace table.
RTM2ERRA  B4      Error type.

Notes:
• The RTM2ABNM and RTM2ABEP fields do not contain information about the abending program if an SVC has abended.
• In a recursive abend (an abend occurring while the original abend is being processed by an ESTAE or other recovery routine), more than one RTM2WA may be created, and the RTM2PREV or RTM2PRWA field points to other RTM2WAs associated with the problem. The system diagnostic work area (SDWA) is pointed to by the RTM2RTCA field during recovery routine processing, and has register contents at time of error stored in the SDWAGRSV field. These register contents may differ from those in the RTM2WA after a recursive abend.

2. To find the abend code and its explanation, look at the completion code at the top of the abend dump. A user completion code is printed as a 4-digit decimal number and a system completion code is printed as a 3-digit hexadecimal number.

If the user code is non-zero, a user program has specified the completion code in an abend macro instruction. Looking up the name of the abending program in the RTM2WA, and investigating why the program would issue this completion code, should lead directly to the cause of the error in the user program.

Usually the system code is non-zero. This indicates that a system routine issued the abend but a problem program might indirectly have caused the abnormal termination. For example, a problem program might have branched to an invalid storage address, specified an invalid parameter on a macro instruction, or requested too much storage space. Often the explanation of the system code gives enough information to determine the cause of the termination.
The explanations of system completion codes, along with a short description of the action for the programmer to take to correct the error, are contained in OS/VS Message Library: VS2 System Codes. A summary of system codes is in the Debugging Handbook Volume 1.

Note: Completion codes are not printed at the top of abend dumps that are formatted with the AMDPRDMP service aid. System completion codes can be found in the third to fifth digits (00xxx000) of the abend completion code in the RTM2 work area. User completion codes are located in the sixth to eighth digits (00000xxx) of the abend code in the RTM2 work area, and in this case are in 3-digit hexadecimal form.

3. To find the name of the abending program, look in the RTM2 work area. System routines usually start with the letters A or I; module prefixes for system routines are listed in the Debugging Handbook Volume 1.

Note: If the RTM2 work area is not available, or if the name of the abending program is not given in the RTM2 work area, the routine name can be obtained from the request blocks (RBs) that are formatted in the dump. If the ABEND dump was taken to a data set (or to SYSOUT) specified with a SYSABEND, SYSMDUMP, or SYSUDUMP DD statement, the last two RBs are SVRBs for the SNAP and SYNCH SVCs used to take the dump. The SVC numbers can be checked by obtaining the hexadecimal SVC number from the interruption code of the WC-L-IC field in the RB. The Debugging Handbook contains a list of SVC numbers. The SNAP SVC is hexadecimal '33', and the SYNCH SVC is hexadecimal '0C'. The RB for the program that caused the abend is immediately before these two RBs.

CSECTs within load modules in the private area of an address space can be located using a linkedit map produced by the AMBLIST service aid. CSECTs in load modules in the nucleus, FLPA, or PLPA can be located using a nucleus or link pack area map, also produced by AMBLIST.
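The digit positions given in the note to step 2 can be turned into a small worked example. This is a sketch under stated assumptions: the abend completion code is taken as the 8-hexadecimal-digit value read from the RTM2 work area, and the function name `split_completion_code` is purely illustrative. The user code is converted to decimal because that is how it is printed at the top of an abend dump.

```python
def split_completion_code(code: int) -> tuple[str, int]:
    """Split an 8-hex-digit abend completion code.

    Digits 3-5 (00xxx000) hold the system completion code (3 hex digits).
    Digits 6-8 (00000xxx) hold the user completion code, returned here
    as a decimal number.
    """
    system = (code >> 12) & 0xFFF   # third to fifth hex digits
    user = code & 0xFFF             # sixth to eighth hex digits
    return f"{system:03X}", user


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(split_completion_code(0x000C4000))  # system code, no user code
    print(split_completion_code(0x00000064))  # user code only
```

For example, a user code stored as hexadecimal 064 corresponds to user completion code 0100 in the printed 4-digit decimal form.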
4. To find the instruction that caused a program interrupt (program check) completion code (0Cx) in a problem program, examine the PSW at the time of error. It is at the top of the abend dump, in the RTM2 work area, and in the RB for the program that caused the abend.

The instruction address field in the PSW contains the address of the next instruction to be executed. The length of the abend-causing instruction is printed following the instruction length code's title 'ILC' at the top of some abend dumps. It is also located in the RTM2ILC1 field (see the RTM2 work area), and is formatted in the third and fourth digits (00xx0000) of the WC-L-IC field in the PRB.

The address of the instruction that caused the termination can be found by subtracting the instruction length from the address in the PSW. Subtract the program address found in the RTM2WA (and in the last PRB) from the instruction address. The resulting offset can be used to find the matching instruction in the abending program's assembler listing for this CSECT.

5. To find the cause of a program interrupt, check the explanation of the system completion code and the instruction that caused the interrupt. Also check the registers from the time of error, which are saved in the RTM2WA and in the SVRB following the RB for the program that caused the abend. The formatted save area trace can be used to check the input to the failing CSECT.

6. To find the cause of an abend code from an SVC or from a system I/O routine, check the explanation of the system completion code, then find the last instruction executed in the failing program and examine the related SVC and I/O entries in the trace table or GTF trace records.

The last PRB in the formatted RBs has a PSW field containing the address of the instruction following the instruction that issued the SVC.
For I/O requests, check the entry point address ('EPA') field in the last PRB. The formatted save area trace gives the address of the I/O routine branched to, and the return address in that save area is the address of the last instruction executed in the failing program.

The trace information can be checked for SVC entries that match the formatted SVRBs, or for I/O entries issued from addresses in the failing program. The trace information is formatted in the dump if the installation has specified it as a dump option. If the system trace table is not formatted, look in the RTM2 work area for pointers to the copy of the system trace table that was saved from the time of the error. Location X'54', which is the FLCTRACE field in the prefixed save area (PSA), points to the system trace table header. The system trace table is frequently overlaid with entries for other system activity by the time the dump is produced.

If the dump contains trace records, begin at the most recent entry and proceed backwards to locate the most recent SVC entry indicating the problem state. From this entry, proceed forward in the table. Examine each entry for an error that could have terminated the SVC or I/O system routine.

The format of system trace table entries is described in the Debugging Handbook under the heading 'TTE Trace Table Entry.' The format of GTF trace records is also described in the Debugging Handbook.

Debugging from Summary SVC Dumps

The summary dump area formatted by the SUMDUMP option of SDUMP should contain the most current data relevant to the problem present in the dump. It is strongly recommended that the SUMDUMP output be reviewed prior to investigating the usual portions of the dump.

The SUMDUMP option provides different output for SVC and branch entries.
For example, branch entries generally dump PSA, LCCA, and PCCA control blocks, and SVC entries generally dump RTM2WA control blocks. Each output type is indicated by the header "- - - - tttt - - - - RECORD ID X'nnnn'," where tttt is the title for the type of SUMDUMP output, and nnnn is the hexadecimal record identifier assigned to the type. The record id values are described in the table below. They are also described by the IHASMDLR mapping macro in the Debugging Handbook.

SUMDUMP Output for SVC-Entry SDUMP

The following table summarizes the SUMDUMP output types for an SVC entry to SDUMP:

SVC-ENTRY TABLE

Record ID
Dec.  Hex  Title             Mapping Macro  Fields used to Dump PSW or Register Areas
4     04   TRACE TABLE       TTE
46    2E   SUMLIST RANGE
48    30   REGISTER AREA
49    31   PSW AREA
53    35   NORMAL DATA END
57    39   RTM 2 WORK AREA   IHARTM2A       RTM2NXT1, RTM2EREG
58    3A   RTM2WA TRACE TAB  TTE
60    3C   ASIDINFO

For an SVC entry to SDUMP, the SUMDUMP output can contain information that is not available in the remainder of the SVC dump if options such as region, LSQA, nucleus, and LPA were not specified in the dump parameters.

For each address space that is dumped, the SUMDUMP output is preceded by a header with the ASID, plus the jobname and stepname for the last task created in the address space.

The SUMDUMP output contains RTM2 work areas for tasks in address spaces that are dumped. Many of the fields in the RTM2WA provide valuable debugging information. (See "Debugging Problem Program ABEND Dumps" for more details.)

Each RTM2WA is followed by 'RTM2WA TRACE TAB' output (record id X'3A'), if there is a copy of the system trace table associated with the RTM2WA (RTM2TRCU, RTM2TRFS, and RTM2TRLS fields are non-zero). The current entry in the trace table copy is pointed to by RTM2TRCU (offset 37C) in the associated RTM2 work area.
System trace table entries are mapped by the TTE (Trace Table Entry) section in the Debugging Handbook.

Each RTM2WA is also followed by 'PSW AREA' output (record id X'31'). A PSW area, consisting of the instruction pointed to by the RTM2NXT1 field in the EC PSW saved in the RTM2WA, and the preceding instruction with length from the RTM2ILC1 field, is dumped if the instructions can be accessed.

After information for all RTM2WAs associated with a task is dumped, 'PSW AREA' (record id X'31') and 'REGISTER AREA' (record id X'30') output appears. This consists of 2K of storage before and after each valid unique address pointed to by the PSW and the registers from the time of the error (RTM2NXT1 and RTM2EREG fields) from all the RTM2 work areas. Up to 32 unique addresses can be dumped for each task. Register addresses less than 2K are not dumped because they are considered to be counters. If the storage that is 2K before and after an address cannot be accessed, a length of 300 bytes is tried. If that amount of storage cannot be accessed, the address' record entry appears with a zero length.

'TRACE TABLE' output (record id X'04') appears if the first address space dumped has no trace table saved in an RTM2 work area and the system trace was active. The output includes the header (pointers to the current, first, and last entries) and the entries in the system trace table. System trace table entries are mapped by the trace table entry (TTE) described in the Debugging Handbook.

'SUMLIST RANGE' output (record id X'2E') appears at the beginning of the SUMDUMP output if the SUMLIST keyword was specified in the SDUMP macro instruction.
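The selection rules for 'PSW AREA' and 'REGISTER AREA' output can be illustrated with a short sketch, which may help when predicting which storage ranges to expect in the summary dump. It models only what the text states: unique addresses, register values below 2K skipped, at most 32 addresses per task, and 2K on either side of each address. It is not the SDUMP implementation; the function name `summary_areas` is hypothetical, and the accessibility checks (the 300-byte retry and the zero-length record) are deliberately left out.

```python
TWO_K = 2048          # storage dumped on each side of an address
MAX_ADDRESSES = 32    # limit of unique addresses per task, per the text


def summary_areas(psw_address: int, registers: list[int]) -> list[tuple[int, int]]:
    """Return (start, end) ranges covering 2K before and after each address.

    Register values below 2K are skipped because they are treated as
    counters. Duplicate addresses are dumped only once.
    """
    candidates = [psw_address] + [r for r in registers if r >= TWO_K]
    selected: list[int] = []
    for addr in candidates:
        if addr not in selected:
            selected.append(addr)
        if len(selected) == MAX_ADDRESSES:
            break
    # Clamp the start at zero so a low PSW address does not go negative.
    return [(max(addr - TWO_K, 0), addr + TWO_K) for addr in selected]


if __name__ == "__main__":
    # Hypothetical PSW address and register contents.
    for start, end in summary_areas(0x5000, [0x10, 0x5000, 0x9000]):
        print(f"{start:06X}-{end:06X}")
```

In the sample call, the register holding X'10' is dropped as a counter and the register that repeats the PSW address adds no second range.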
SUMDUMP Output for Branch-Entry SDUMP

The following table summarizes the SUMDUMP output types from a branch entry to SDUMP:

BRANCH-ENTRY TABLE

Record ID
Dec.  Hex  Title              Mapping Macro      Fields used to Dump PSW or Register Areas
1     01   PCCA               IHAPCCA
2     02   LCCA               IHALCCA
3     03   PSA                IHAPSA             FLCIOPSW, FLCPOPSW, FLCEOPSW, FLCROPSW
4     04   TRACE TABLE        TTE
5     05   FRRSTACK           IHAYSTAK
6     06   GWSA PAGE IO ERR
7     07   GWSA GET/FREEMAIN
8     08   GWSA RSM
9     09   GWSA RSM SUSPEND
10    0A   GWSA MEM SWITCH
11    0B   GWSA STATUS
12    0C   GWSA SRM
13    0D   GWSA MEM TERM
14    0E   GWSA ENQ/DEQ
15    0F   GWSA STOP/RESTRT
16    10   GWSA IEAVESC0
17    11   CWSA LOW-LVL CMN
18    12   CWSA GTF
19    13   CWSA SRM
20    14   CWSA TIMER
21    15   CWSA ACR
22    16   CWSA RTM/MACHK
23    17   CWSA IOS FLIH
24    18   CWSA DISPATCHER
25    19   CWSA MF1
26    1A   CWSA ABTERM
27    1B   CWSA I/O RESTART
28    1C   CWSA STATUS
29    1D   CWSA SUPR REPAIR
30    1E   CWSA RTM-CCH
31    1F   LWSA LOW-LVL CMN
32    20   LWSA VALID'Y CHK
33    21   LWSA RTM
34    22   LWSA SDUMP
35    23   LWSA ABTERM
36    24   LWSA CIRB
37    25   LWSA STG2 EXT EF
38    26   LWSA EXIT (SVC3)
39    27   LWSA POST
40    28   LWSA WAIT
41    29   LWSA STATUS
42    2A   LWSA STAE
43    2B   LWSA EVENTS
44    2C   LWSA RSM
45    2D   LWSA ASCB CHAP
46    2E   SUMLIST RANGE
47    2F   INT HANDLER SA     IHAIHSA            IHSAGPRS
48    30   REGISTER AREA
49    31   PSW AREA
50    32   GBL WSA VEC TABL   IHAWSAVT (WSAVTG)
51    33   CPU WSA VEC TABL   IHAWSAVT (WSAVTC)
52    34   LCL WSA VEC TABL   IHAWSAVT (WSAVTL)
53    35   NORMAL DATA END
54    36   CWSA ASM DIE
55    37   CWSA ASM SRB-I/O
56    38   SDWA               IHASDWA            SDWAGRSV
60    3C   ASIDINFO

The SUMDUMP output for a branch entry to SDUMP might not match the data that is at the same addresses in the remainder of the dump. The reason for this is that the SUMDUMP is taken at the entry to SDUMP, and while the processor is disabled for interrupts.
The system data in the remainder of the dump is often changed because other system activity occurs before the dump is complete.

The SUMDUMP output is preceded by a header with the ASID for the failing address space.

From a branch entry into SDUMP, the SUMLIST range and trace table output is handled similarly to that from an SVC entry. However, SUMLIST addresses must point to areas that are paged in or they cannot be dumped.

The PSA, LCCA, and PCCA are dumped for each alive processor (record ids X'03', X'02', and X'01' respectively).

The interrupt handler save area (IHSA - record id X'2F') is dumped for the current address space. This save area includes the current FRR stack for suspended address spaces.

The system diagnostic work area (SDWA - record id X'38') is dumped for the current error if the RTM1 work area is currently valid and being used.

Unique register contents are obtained from the IHSA and the current SDWA. Each unique register value is used as an address and storage is dumped from 2K plus and minus this address for a total of 4K each. These 'Register Areas' are printed with record id X'30'.

The Super FRR Stacks (record id X'05'), including RTM1 work areas, are dumped.

The global, local, and processor work save area vector tables (record id X'32', X'34', and X'33' respectively) are dumped. The save areas pointed to by these save area vector tables are also dumped. The branch-entry table at the beginning of this description lists the record ids for each work save area.

2K of storage on either side of the address portion of the I/O old PSW, the program check old PSW, the external old PSW, and the restart old PSW saved in the PSA for all processors, is dumped. These 'PSW Areas' are printed with record id X'31'.

Note: The SUMDUMP output from a branch entry to SDUMP only contains areas that were already paged in when the SUMDUMP was taken.
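When scanning printed SUMDUMP output, the record id in each header is the quickest way to tell which area follows. The sketch below pulls the id out of a header line and looks it up against the ids named in the branch-entry description above. It is an illustration only: the exact spacing of the printed header is an assumption, so the pattern matches just the `RECORD ID X'nnnn'` portion, and the names `RECORD_TITLES`, `record_id`, and `describe_record` are hypothetical.

```python
import re

# Record ids named in the branch-entry description above.
RECORD_TITLES = {
    0x01: "PCCA",
    0x02: "LCCA",
    0x03: "PSA",
    0x05: "Super FRR stacks",
    0x2F: "interrupt handler save area (IHSA)",
    0x30: "register area",
    0x31: "PSW area",
    0x32: "global work save area vector table",
    0x33: "processor work save area vector table",
    0x34: "local work save area vector table",
    0x38: "system diagnostic work area (SDWA)",
}

_RECORD_ID = re.compile(r"RECORD ID X'([0-9A-Fa-f]+)'")


def record_id(header: str):
    """Return the record id from a SUMDUMP header line, or None if absent."""
    match = _RECORD_ID.search(header)
    return int(match.group(1), 16) if match else None


def describe_record(header: str) -> str:
    """Return a short description for the record id found in the header."""
    rid = record_id(header)
    if rid is None:
        return "no record id found"
    return RECORD_TITLES.get(rid, f"record id X'{rid:02X}' (see the branch-entry table)")


if __name__ == "__main__":
    # Hypothetical header line; real spacing may differ.
    print(describe_record("- - - - PSW AREA - - - - RECORD ID X'0031'"))
```

Ids that are not in the small dictionary fall through to a pointer back to the full table rather than a guess.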
Started Task Control ABEND and Reason Codes

In case of an irreparable error, the started task control (STC) routines issue these ABEND codes:

0B8 - An error occurred while STC routines were processing a START, MOUNT, or LOGON command. In each case, the command task is terminated; for a START or MOUNT command, the STC routines issue message IEE824I. The following error codes can appear in register 15 at the time of the ABEND:

   04 - Module IEEPRWI2 or IEFJSWT detected an invalid command code in the CSCB; the command code was incorrect for a START, MOUNT, or LOGON command.

   08 - Module IEESB605 invoked IEFAB4FC (an Allocation routine) to build a TIOT for the START, MOUNT, or LOGON task; IEFAB4FC returned control to IEESB605 with a return code indicating failure.

   12 - Module IEESB605 invoked IEFJSWT (an STC routine) to write the internal JCL text for the START, MOUNT, or LOGON command into a system data set; IEFJSWT returned control to IEESB605 with a return code indicating that it failed in its attempt to open the data set.

0B9 - Module IEESB605 invoked the master subsystem via the subsystem interface to determine whether a START command was issued to start a subsystem; an error occurred during master subsystem processing. The command task is terminated; for a START or MOUNT command, IEESB605 issues message IEE824I.

0BA - Module IEESB605 invoked the master subsystem via the subsystem interface to determine whether a START command was issued to start a subsystem; an error occurred during subsystem interface processing. The command task is terminated; for a START or MOUNT command, IEESB605 issues message IEE824I.

SWA Manager Reason Codes

In case of an irreparable error, the SWA manager routines issue a 0B0 ABEND. Before abending, both object modules IEFQB550 and IEFQB555 place a code in register 15 indicating the exact cause of the error.
These are the error codes that can appear in register 15:

04 - The routine that called SWA manager requested an invalid function.

08 - The routine that called SWA manager passed an invalid SWA virtual address (SVA). Either the SVA does not point to the beginning of a SWA prefix or the SWA prefix has been destroyed.

0C - A SWA manager routine has attempted to read a record not yet written into SWA.

10 - Either IEFQB550 (move mode module) has attempted to read or write a block which is not 176 bytes or IEFQB555 (locate mode module) has attempted to assign a block with a specified length of 0 or a negative number.

14 - The routine that called SWA manager has specified an invalid count field. For move mode, an invalid count is 0 for a READ, WRITE, or ASSIGN function; an invalid count for WRITE/ASSIGN is 00.

18 - The routine that called SWA manager by issuing the QMNGRIO macro instruction specified both or neither of the READ or WRITE options.

1C - The routine that called SWA manager was attempting to write into a SWA block for the first time; it either passed a nonexistent ID or failed to pass one at all.

20 - IEFQB555 has attempted to write a block using an invalid pointer to the block.

Additional Data Gathering Techniques

This chapter describes additional techniques for gathering data and circumventing certain system problems. The superzaps should be checked out before they are applied to your system. Displacements vary according to release level and PTF activity. The examples were deliberately kept simple and are designed to illustrate a technique rather than to be practical in themselves.

CAUTION: Extreme care must be used when you are considering a system alteration in order to gather additional data about a problem.
None of the superzaps described in this chapter should be applied before the system programmer has verified the logic being zapped and the trap logic itself. Remember, if any one location or offset within the module or trap changes, all offsets and base registers must be verified.

This chapter contains the following topics:
• Using the CHNGDUMP, DISPLAY DUMP, and DUMP Commands
• How to Print Dumps
• How to Automatically Establish System Options for SVC Dump
• How to Copy PRDMP Tapes
• How to Rebuild SYS1.UADS
• How to Print SYS1.DUMPxx
• How to Clear SYS1.DUMPxx Without Printing
• How to Print the SYS1.COMWRITE Data Set
• How to Print an LMOD Map of a Module
• How to Re-create SYS1.STGINDEX
• Software LOGREC Recording
• Using the PSA as a Patch Area
• Using the SLIP Command
• Enabling the PER Hardware to Monitor Storage Locations
• System Stop Routine
• Using the MVS Trace to Monitor Storage
• How to Expand the Trace Table

Using the CHNGDUMP, DISPLAY DUMP, and DUMP Commands

A dump obtained from MVS contains those storage areas specified in the dump request and those defined as system defaults in SYS1.PARMLIB for SYSABEND, SYSMDUMP, and SYSUDUMP. Normal system defaults are:

SYSABEND: CB, ENQ, TRT, ALLPA, SPLS, LSQA, PSW, REGS, SA, DM, IO, and ERR
SYSMDUMP: LSQA, NUC, RGN, SQA, SWA, and TRT
SYSUDUMP: CB, ENQ, TRT, ALLPA, SPLS, PSW, REGS, SA, DM, IO, and ERR

There are no defaults for an SVC dump other than SQA, ALLPSA, and SUMDUMP, which are assumed by the dump program if the options NOSQA, NOALLPSA, and NOSUM are not specified.

The CHNGDUMP operator command is used to dynamically alter the options specified originally by SYS1.PARMLIB or by previous CHNGDUMP commands. Dump mode may be set to ADD, OVER, or NODUMP. System action for each setting is:

ADD - merges the options specified on the dump request with the options in the system dump options list.

OVER - ignores the options specified in the dump request and uses only the options in the dump options list.

NODUMP - ignores the request and does not dump.

To determine the current system dump options, use the DISPLAY DUMP,OPTIONS command.

If an error is made while specifying the CHNGDUMP command, the system rejects the command and issues an error message.

The topic "How to Automatically Establish System Options for SVC Dump", which appears later in this chapter, describes how to issue the CHNGDUMP command during IPL. See Operator's Library: OS/VS2 MVS System Commands for the format of the CHNGDUMP command.

The DUMP command must be used carefully if the desired dump is to be obtained. For instance, the following typical error can occur when requesting a dump. The operator enters DUMP COMM=(title). The system responds with message IEE094 requesting the dump parameters. If the operator replies 'U' to this message, the system dumps the current address space, which is the master scheduler address space. The operator must reply with ASID, Jobname, or TSOname. See Operator's Library: OS/VS2 MVS System Commands for the format of the DUMP command.

How to Print Dumps

The PRDMP control statements can be used to minimize the size of the output produced from a stand-alone dump and still keep the number of reruns to a minimum.
This section discusses the DD statements and control statements used in the following example:

    //ASIDDMP  JOB  MSGLEVEL=1
    //         EXEC PGM=AMDPRDMP
    //PRINTER  DD   SYSOUT=A
    //SYSPRINT DD   SYSOUT=A
    //TAPE     DD   UNIT=TAPE,LABEL=(1,NL),VOL=SER=ABCTPE,DISP=OLD
    //SYSUT1   DD   UNIT=251,SPACE=(TRK,(400,20)),DISP=NEW
    //*  PRINT STORAGE=ASID(X)=(X,X,X,X,X,X) IS PROPER FORMAT
     CVTMAP
     CPUDATA
     SUMMARY
     QCBTRACE
     SUMDUMP
     LPAMAP
     FORMAT
     EDIT
     PRINT CURRENT,SQA
     PRINT STORAGE=ASID(X)=(xxxx,xxxx,xxxx,xxxx)
     PRINT JOBNAME=(jobnames)
     PRINT REAL=(xxxx,xxxx)
     ASMDATA
     END

The PRINTER DD statement defines the output data set for the dump itself. It should be directed to a SYSOUT class as shown.

The SYSPRINT DD statement defines the data set for PRDMP messages, etc.

The TAPE DD statement defines the input data set to PRDMP. It can define one of the SYS1.DUMPxx data sets, a stand-alone dump tape, or a GTF output data set on either tape or DASD.

The SYSUT1 DD statement defines work space to PRDMP. It can be used to define the input data set. It is not required if the input data set is defined by the TAPE DD statement. It does, however, significantly enhance the performance of PRDMP when it is used in conjunction with the TAPE DD statement and when the input is a tape data set. The SPACE parameter is determined by the size of the dump. Generally 5 cylinders, or 95 tracks, or 285 4104-byte records should be specified for each megabyte of real storage dumped by SADMP.

Control Statements

The placement of the control statements determines the sequence in which the dump is printed. Refer to the "Dump and Trace Formats" section of the Debugging Handbook for examples of how these statements format a dump.

The following statements should be included in every run of PRDMP:

SUMMARY - defines and prints the dump ranges of the dump, active processor, active tasks, etc.
CVTMAP - formats the CVT and can be an aid in finding other significant control blocks in the system.

CPUDATA - formats the CSD, PSA, PCCA and LCCA for each active processor.

QCBTRACE - formats the ENQ/DEQ control blocks in use at the time the dump was taken.

SUMDUMP - locates and prints the summary dump data provided by SVC dump. It should be used on all SVC dumps.

LPAMAP - provides a listing of the modules on the link pack area list. It identifies the entry point address of those modules and their length. It does not identify SVC modules since they are found by the SVC table.

The FORMAT statement can produce voluminous data depending on the number of address spaces defined at the time the dump is taken. However, it should be included in the initial run of PRDMP because it produces the formatted TCB summary showing the abend completion codes for each TCB in the system and the global and local SPLs.

The EDIT statement should also be included in every initial run of PRDMP. It formats and prints the GTF buffers (that is, all internal trace buffers or those external trace buffers that have not been written to the TRACE data set) if GTF is active at the time the dump is taken. If GTF is not active, only an error message is printed. The OS trace is not valid if GTF is running.

The PRINT statement can be used several ways:

• PRINT CURRENT,SQA - should be included in the initial run of PRDMP. It formats and prints the address space and task-related control blocks of the address space active at the time the dump is taken. SQA should be printed for the valuable data it contains, such as the trace table and logrec buffers. PRINT CURRENT prints only the current address space of the processor from which the SADMP program was IPLed.

• PRINT NUC,CSA - should not be included in the initial run of PRDMP because of the volume of data it produces.
Once a problem is suspected in this area, the PRDMP program should be rerun specifying only these parameters.

• PRINT STORAGE=ASID(x)=(xxxx,xxxx) - should not be included in the initial run of PRDMP. Once a problem is isolated to an address space or a range of storage addresses, rerun PRDMP specifying only these parameters. Several ASIDs and several address ranges can be requested with one run of PRDMP. PRDMP does not duplicate address ranges for every ASID but prints all storage dumped (NUC, CSA, SWA, LPA in storage) if only ASIDs are specified without address ranges. PRINT STORAGE is useful for printing SVC dumps. See the discussion "How to Print SYS1.DUMPxx" later in this chapter.

• PRINT JOBNAME=(jobnames) - produces output equivalent to PRINT CURRENT except it prints the private address space of the job(s) requested. It should not be used for the initial run of PRDMP unless the jobname is known from another source, such as the system log.

• PRINT REAL=(xxxx,xxxx) - prints real storage in specified address range pairs. Use this option only when the system cannot find adequate data to format the dump.

ASMDATA - formats and prints all ASM control blocks. It produces voluminous data and should not be run until an ASM failure is suspected.

How to Automatically Establish System Options for SVC Dump

A potential problem is that the SVC dumps written to the SYS1.DUMPxx data sets contain only those address ranges that the FRR or ESTAE routine passes to SDUMP. When these dumps are subsequently printed by PRDMP, the PRDMP formatting program might not find sufficient data to format the dump properly. This can make it difficult to find data in an SVC dump and it can provide erroneous indicators to the problem solver. The CHNGDUMP command can be used to alter the SVC dump system options and provide a complete dump.
The following job updates the COMMND00 member of SYS1.PARMLIB to issue the CHNGDUMP command automatically at IPL time. The CHNGDUMP command can also be entered by the operator. (See Operator's Library: OS/VS2 MVS System Commands for a description of the CHNGDUMP command.)

    //UPDAT    JOB  ("S,S),MSGLEVEL=1,REGION=100K
    //         EXEC PGM=IEBUPDTE
    //SYSPRINT DD   SYSOUT=A
    //SYSUT1   DD   UNIT=SYSDA,VOL=SER=SYSRES,DISP=OLD,DSN=SYS1.PARMLIB
    //SYSUT2   DD   UNIT=SYSDA,VOL=SER=SYSRES,DISP=OLD,DSN=SYS1.PARMLIB
    //SYSIN    DD   DATA
    ./ REPL NAME=COMMND00,LIST=ALL
    ./ NUMBER NEW1=10,INCR=20
    COM='TRACE ON'
    COM='CD SET,SDUMP=(PSA,NUC,SQA,LSQA,RGN,TRT),Q=YES,ADD'
    ./ ENDUP

How to Copy PRDMP Tapes

It is sometimes necessary to copy dump tapes to supply another location with a copy of the dump while retaining your own. It is particularly useful to be able to supply a dump tape with an APAR. A simple way to do this is to use PRDMP as a copy program. Define the input tape with the TAPE DD statement and the output tape with the SYSUT2 DD statement. It is also possible to put several dumps on one tape or take one dump from a multiple dump tape by manipulating the file number parameters in the LABEL parameter. The following example shows how this is done:

    //ASIDDMP  JOB  MSGLEVEL=1
    //         EXEC PGM=AMDPRDMP
    //PRINTER  DD   SYSOUT=A
    //SYSPRINT DD   SYSOUT=A
    //TAPE     DD   UNIT=TAPE,LABEL=(2,NL),VOL=SER=DMPIN,DISP=OLD
    //SYSUT2   DD   UNIT=TAPE,LABEL=(,NL),VOL=SER=DMPOUT,DISP=(NEW,KEEP)
    //SYSIN    DD   *
     END
    /*

After copying a PRDMP tape, a quick run through PRDMP to verify that the CVT can be formatted and printed will prove that the copy was successful.
    //ADMP     JOB  MSGLEVEL=1
    //         EXEC PGM=AMDPRDMP
    //PRINTER  DD   SYSOUT=A
    //SYSPRINT DD   SYSOUT=A
    //TAPE     DD   UNIT=TAPE,LABEL=(1,NL),VOL=SER=DMPTPE,DISP=OLD
    //SYSUT1   DD   UNIT=SYSDA,SPACE=(TRK,(400,20)),DISP=NEW
     CVTMAP
     END
    /*

How to Rebuild SYS1.UADS

The loss of the SYS1.UADS data set can significantly impact a TSO environment. However, it is possible to run the TMP as a batch job and recreate SYS1.UADS in the background. The following is an example of a job that has been run successfully to scratch and recreate a SYS1.UADS data set.

    //BLDUADS  JOB  MSGLEVEL=1
    //         EXEC PGM=IEFBR14
    //DD2      DD   VOL=SER=SYSRES,DISP=(OLD,DELETE),UNIT=3330,
    //         DSN=SYS1.UADS
    //         EXEC PGM=IKJEFT01
    //SYSPRINT DD   SYSOUT=A
    //SYSUADS  DD   DSN=SYS1.UADS,DISP=(NEW,KEEP),SPACE=(800,(20,9,30)),
    //         UNIT=3330,VOL=SER=SYSRES,DCB=(RECFM=FB,DSORG=PO,LRECL=80,
    //         BLKSIZE=800)
    //SYSLBC   DD   DSN=SYS1.BRODCAST,DISP=SHR
    //SYSIN    DD   *
     ACCOUNT
     SYNC
     ADD (USER01 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER02 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER03 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER04 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER05 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER06 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER07 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER08 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER09 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER0A TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER0B TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     ADD (USER0C TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
     LIST (*)
     END
    /*

How to Print SYS1.DUMPxx

See the discussion under "How To Print Dumps" earlier in this chapter to define the control statements required.
The same rules apply except in this case the TAPE DD statement points to one of the SYS1.DUMPxx data sets. These are cataloged data sets and require no further definition. Be aware that the dump data sets contain only those address ranges passed to SVC dump by the dump requestor and might not contain sufficient data for PRDMP to properly format all requested control blocks. Because SVC dumps usually contain a limited number of address ranges, printing the entire SYS1.DUMPxx data set is feasible and assures that all the information about the problem will be available. See the next topic "How To Clear SYS1.DUMPxx Without Printing" for a description of how to clear the dump data sets for reuse.

Note: Printing the dump data sets does not clear them as it did on previous systems.

The following example shows how to print SYS1.DUMP00:

    //ASIDDMP  JOB  MSGLEVEL=1
    //         EXEC PGM=AMDPRDMP
    //PRINTER  DD   SYSOUT=A
    //SYSPRINT DD   SYSOUT=A
    //TAPE     DD   DSN=SYS1.DUMP00,DISP=OLD
    //SYSUT1   DD   UNIT=SYSDA,DISP=NEW,SPACE=(CYL,(10,5))
     SUMMARY
     CVTMAP
     CPUDATA
     SUMDUMP
     LPAMAP
     PRINT STORAGE
    /*

How To Clear SYS1.DUMPxx Without Printing

In previous systems, printing the dump data set also cleared it and made it available for reuse. In MVS this is no longer true. The dump data sets can be cleared at 'SPECIFY SYSTEMS PARAMETERS' time during IPL. They can also be cleared and made available for reuse by using PRDMP to copy the data set to tape with the SYSUT2 DD statement pointing to the output data set. This must be a separate job step from printing the dump. If it has been determined that the SYS1.DUMPxx data set need not be saved, it can be cleared and made available for reuse by running PRDMP with the SYSUT2 DD statement defined as DUMMY. The following example shows how to clear SYS1.DUMP00. See the example in the discussion "How to Copy PRDMP Tapes" earlier in this chapter for how to define the SYSUT2 DD statement to unload the SYS1.DUMPxx data sets.
    //ASIDDMP  JOB  MSGLEVEL=1
    //         EXEC PGM=AMDPRDMP
    //PRINTER  DD   SYSOUT=A
    //SYSPRINT DD   SYSOUT=A
    //TAPE     DD   DSN=SYS1.DUMP00,DISP=OLD
    //SYSUT2   DD   DUMMY
    //SYSIN    DD   *
     END

How To Print The SYS1.COMWRITE Data Set

The following job will format and print the TCAM SYS1.COMWRITE data set. Note that the PARM fields in the EXEC statement define the traces to be formatted and printed. See OS/VS TCAM Debugging Guide Level 10 for more information on the use of the SYS1.COMWRITE data set.

    //COMWRITE JOB  MSGLEVEL=1
    //STEP1    EXEC PGM=IEDQXA,PARM='STCB,IOTR,BUFF'
    //SYSPRINT DD   SYSOUT=A
    //SYSUT1   DD   DSN=SYS1.COMWRITE,DISP=SHR
    /*

How To Print An LMOD Map Of a Module

The following job produces a module cross-reference of the nucleus, module IEFW21SD, and a link pack area map. In addition, AMBLIST produces an IDR listing or a complete hexadecimal dump of an object module. If you include the RELOC parameter, the cross-reference listing is based at the address the module is loaded in LPA.

Note that the JCL must contain a DD statement for every data set containing a module you referenced in the control card section. For more information about AMBLIST, see OS/VS2 System Programming Library: Service Aids.

    //AMBLIST  JOB  MSGLEVEL=1
    //         EXEC PGM=AMBLIST
    //SYSLIB   DD   DSN=SYS1.LPALIB,DISP=OLD
    //LOADLIB  DD   DSN=SYS1.NUCLEUS,DISP=OLD
    //SYSPRINT DD   SYSOUT=A
    //SYSIN    DD   *
     LISTLOAD OUTPUT=XREF,MEMBER=IEANUC01,DDN=LOADLIB
     LISTLPA
     LISTLOAD OUTPUT=XREF,MEMBER=IEFW21SD
    /*

How To Re-Create SYS1.STGINDEX

It is possible for the SYS1.STGINDEX data set to be destroyed because of system failure or operator intervention during an IPL with the coldstart (CLPA,CVIO) option. Loss of this data set prevents warmstarting the system or restarting jobs using VIO data sets.
The following job has been run successfully to recreate this data set. Remember to change the VOLUME and CYLINDERS parameters to apply to your system.

    //STGINDEX JOB  MSGLEVEL=1
    //         EXEC PGM=IDCAMS
    //SYSPRINT DD   SYSOUT=A
    //VOL      DD   DISP=OLD,UNIT=3330,VOL=SER=SYSRES
    //SYSIN    DD   *
     DEFINE SPACE(VOL(SYSRES)FILE(VOL)CYL(7))
     DEFINE CLUSTER(NAME(SYS1.STGINDEX)VOLUME(SYSRES)CYLINDERS(7)KEYS(12 8)BUFFERSPACE(5120)RECORDSIZE(2041 2041)REUSE)DATA(CONTROLINTERVALSIZE(2048))INDEX(CONTROLINTERVALSIZE(1024))
    /*

Software LOGREC Recording

The following JCL defines a two-step job. The first step prints an event history report for all SYS1.LOGREC records. The second step formats each software, IPL, and EOD record individually. The event history report is printed as a result of the EVENT=Y parameter on the EXEC statement of the first step. It can be a very useful tool to the problem solver because it prints the records in the same sequence they were recorded and therefore shows an interaction between hardware error records and software error records.

    //EREP     JOB  MSGLEVEL=1
    //EREPA    EXEC PGM=IFCEREP1,PARM='EVENT=Y,ACC=N',REGION=128K
    //SERLOG   DD   DSN=SYS1.LOGREC,DISP=SHR
    //TOURIST  DD   SYSOUT=A
    //EREPPT   DD   SYSOUT=A,DCB=BLKSIZE=133
    //EREPB    EXEC PGM=IFCEREP1,PARM='TYPE=SIE,PRINT=PS,ACC=N',REGION=128K
    //SERLOG   DD   DSN=SYS1.LOGREC,DISP=SHR
    //TOURIST  DD   SYSOUT=A
    //EREPPT   DD   SYSOUT=A,DCB=BLKSIZE=133
    /*

See the discussion on LOGREC analysis in the "Use of Recovery Work Areas" chapter earlier in this section for an explanation of its use and for examples of the output produced.

Using The PSA As a Patch Area

There are two areas in the PSA reserved for future expansion. They can be used for quick implementation of a trap without having to consider base registers. They are X'410' - X'BFF' and X'E84' - X'FFF'.
Both of these areas are frequently used in examples throughout this chapter.

CAUTION: Use extreme care when you use this method. Patches should be made only to disabled code unless the patch is completely reentrant. Saving registers and data in the PSA while the system is enabled could produce unpredictable results, especially in an MP environment where more than one PSA exists and the code could be interrupted and subsequently redispatched on the other processor.

Extreme care must be used when considering a system alteration in order to gather additional data about a problem. No superzaps should be applied before the system programmer has verified the logic being zapped and the trap logic itself. Remember if any one location or offset within the module or trap changes, all offsets and base registers must be verified.

Using the SLIP Command

SLIP (serviceability level indication processing) provides a way of getting information from RTM prior to ESTAE or FRR recovery processing. This is in addition to the information ordinarily supplied by dumping services during abnormal termination. The SLIP command, usually entered by a system programmer, either at the console or via the input stream, can also reside in the COMMNDxx parmlib member.

The SLIP command's purpose is to establish SLIP definitions of the error circumstances under which interception of an error is to occur, and of the action to be taken following the interception.

As long as enough system queue area storage is available, SLIP definitions may be established at any time. The recovery termination manager (RTM) compares the SLIP definitions with the dynamic system conditions at the time of the error. If RTM detects a match, the requested action is taken.

The ACTION keyword has the following options:

• ACTION=SVCD indicates that an SVC dump will be scheduled for the current ASID. This is the default option if ACTION is not specified.
SDUMP parameters in this case are: SUM, SQA, RGN, TRT, LPA, CSA, and NUC.

One of the advantages of this dump over one taken by a recovery routine is that nothing has been done to correct the error situation. Although the bulk of the SVC dump is not taken until later, the summary dump portion preserves as much volatile data as possible. An SVC dump also contains more data than a SYSABEND or SYSUDUMP, and because it is machine readable, it can, if necessary, be copied onto a tape to accompany an APAR, or used with interactive dump display programs. The biggest advantage is in situations where no dump would otherwise have been taken.

• ACTION=WAIT indicates that the system will be placed in a 01B wait state. At this time, the operator can find the save area where the stop/restart routine (IEESTPRS) saves the caller's (IEAVTSLP) registers. Register 2 contains the address of the RTM work area for the error. This is either IHAFRRS (RTM1) or IHARTM2A (RTM2). Register 4 contains the address of the SLIP control element (SCE), which contains the id for this trap. The save area is located as follows:

    CVT + X'2AC' (CVTSPSA) --> IHAWSAVT + X'24' (WSAGREST) --> save area (registers 0-14)

• ACTION=NODUMP indicates that SLIP is to set a flag in the RTM work area which is checked by the dump programs ABEND/SNAP and SVC dump. If the bit is on, all dump requests are ignored. Because the bit is in the RTM work area, only dumps requested during processing of this error by RTM or its subroutines (FRR and ESTAE) are suppressed. Should the error involve recursive entry into RTM, the bit setting is propagated to the next RTM work area. This action is useful for preventing dumps that may not be needed (X22, X37, etc.) because accompanying messages provide sufficient information. It can also be used to prevent duplicate dumps for known problems which have already been documented.
• ACTION=IGNORE indicates that the system will not do any further SLIP processing, and that normal system recovery will continue. This option is normally chosen for known errors. For example, if trapping 0C4 completion codes and SLIP SET,COMP=0C4,ACTION=IGNORE,LPAMOD=MODX,END is entered after SLIP SET,COMP=0C4,A=SVCD,END had been issued, it results in dumps for all 0C4 errors except those in module MODX. The ACTION=IGNORE command must be issued after the original command because trap conditions are checked LIFO.

It is also possible to display information about SLIP definitions by using the DISPLAY command at the operator's console.

For details concerning operand usage and entering the SLIP and DISPLAY commands, see Operator's Library: OS/VS2 MVS System Commands.

The following is provided to demonstrate a typical application of the SLIP command:

Obtaining a Dump with Queue Control Blocks and Elements

An error in the DEQ SVC routine is suspected because whenever program DVTRTN executes, it abnormally terminates even though its parameter list is correct. The resulting abend dump does not include queue control blocks and queue elements. To get a dump that does include this information, issue the following SLIP command:

    SLIP SET,ID=QELS,COMP=X30,ERRTYPE=ABEND,JSPGM=DVTRTN,END

ID identifies this SLIP definition as "QELS"; COMP specifies the applicable system completion code; ERRTYPE specifies that an abend condition must exist for this error interception; JSPGM identifies "DVTRTN" as the job step program that must be executing for this error interception; END denotes the end of this SLIP command.

Designing an Effective SLIP Trap

The design of a SLIP trap requires knowledge of the error conditions and what makes the error unique. An effective trap should catch only the intended error. To do this, the description should be as specific as possible.
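The interaction between trap specificity and the LIFO checking order described under ACTION=IGNORE can be illustrated with a small model. The following Python sketch is only a toy: the Trap class, the slip_action function, and the module name MODY are hypothetical names invented for illustration, and the matching rule (every keyword in the trap must equal the corresponding value in the error) is an assumed simplification of what RTM does.

```python
from dataclasses import dataclass, field


@dataclass
class Trap:
    """Toy stand-in for one SLIP definition (not a real SLIP control block)."""
    action: str
    conditions: dict = field(default_factory=dict)


def slip_action(traps, error):
    """Return the action of the first matching trap, checking newest first."""
    for trap in reversed(traps):  # LIFO: the last trap set is checked first
        if all(error.get(key) == value for key, value in trap.conditions.items()):
            return trap.action
    return None  # no trap matched; normal recovery proceeds


# Traps in the order they were set: general dump trap first, exception second.
traps = [
    Trap("SVCD", {"COMP": "0C4"}),
    Trap("IGNORE", {"COMP": "0C4", "LPAMOD": "MODX"}),
]

in_modx = {"COMP": "0C4", "LPAMOD": "MODX"}
elsewhere = {"COMP": "0C4", "LPAMOD": "MODY"}  # MODY is a made-up module name

print(slip_action(traps, in_modx))    # the more specific IGNORE trap wins
print(slip_action(traps, elsewhere))  # falls through to the general SVCD trap

# Setting the same two traps in the opposite order lets the general trap win.
print(slip_action(list(reversed(traps)), in_modx))
```

In this model the specific IGNORE trap only takes effect because it was set last, which is why a narrowly described trap and the order of entry both matter.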
The best way to design a trap is from a dump of the error. In the case of the NODUMP action, a dump should be available. In other cases, an approximate dump (one taken near the time of the error) or one without sufficient information to debug might be available.

The following chart lists several SLIP keywords and indicates the data area fields that SLIP compares them with. It should be understood that SLIP operates as a subroutine within the RTM. SLIP is called from either RTM1 or RTM2, depending on whether the error environment allowed FRR or only ESTAE recovery respectively. The level of RTM in control affects the data areas available. The calls to SLIP are prior to calls to any error recovery routines, therefore it is possible that the data areas contained in a dump may have been changed since SLIP examined them. This is especially true of the COMP keyword value. Many recovery routines change the abend completion code to make it more specific. For example, a system service that receives a bad address from a user parameter list will get an 0C4 which it converts to its own completion code meaning a bad parameter list.

SLIP Keywords and Corresponding Data Areas

Note: There may be several RTM2 work areas pointed to by the TCB if several abends occurred. The oldest one (last on the queue) is probably the best one to use.

ERRTYP

(RTM1) Field RT1TENPT of the RTM1 work area contains a number indicating the reason for entry into RTM1:

    1=PROG      5=MACH
    2=REST      10=PGIO
    3=SVCERR    15=MEMTERM
    4=DAT

(RTM2) The reason for entry into RTM2 is indicated by flags in the RTM2 work area as follows:

    RTM2MCHK=MACH       RTM2SVCE=SVCERR
    RTM2PCHK=PROG       RTM2TEXC=DAT
    RTM2RKEY=RESTART    RTM2PGIO=PGIO
    RTM2SVCD=ABEND      RTM2EOM=MEMTERM
    RTM2ABTM=ABEND

MODE

System mode at error time is indicated in the MODEBYTE as follows:

    1... ....   MODESUPR   Supervisor control
    .1.. ....   MODEDIS    Physically disabled
    ..1. ....   MODEGSPN   Global spin lock held
    ...1 ....   MODEGSUS   Global suspend lock held
    .... 1...   MODELOC    Locally locked
    .... .1..   MODETYP1   Type 1 SVC
    .... ..1.   MODESRB    SRB mode
    .... ...1   MODETCB    Task mode (unlocked)

(RTM1) The MODEBYTE value is contained in RT1WMODE. The PSW from SDWAEC1 is used for PP, Super, SKey, and PKey states. The SDWASTAF bit is used for RECV.

(RTM2) In the ESAMODE field (SVRB + X'B1,l') of the SVRB pointed to by RTM2VRBC are bits mapped by MODEBYTE as indicated above. For the PSW values, SLIP uses the RBOPSW field of the RB preceding the SVRB. The RTM2RECR bit must be on for RECV, and in the previous RTM2 work area the RTM2XIP bit plus the SCBINUSE bit of the SCB pointed to by RTM2NSCBN must be on.

COMP

(RTM1) In the SDWA, field SDWACMPC contains the original value.

(RTM2) The RTM2CC field contains the original value for each work area.

JOBNAME

(RTM1 and RTM2) In the ASCB, fields ASCBJBNI or ASCBJBNS point to the job name for either initiated or started jobs.

JSPGM

This keyword does not apply to errors which enter RTM1, so if it is specified, the trap is limited to RTM2 type errors only.

PVTMOD (RTM2 only), LPAMOD, ADDRESS

The address used for these keywords is obtained from the same PSW used when checking values for the MODE keyword. Additionally, PVTMOD applies to RTM2 type errors only and restricts the trap accordingly. The module name for PVTMOD is compared with those in the CDE list for the jobstep TCB of the current address space.

ASID

(RTM1) Both SDWAFMID and SDWAASID are checked.

(RTM2) Both RTM2FMID and ASCBASID are checked.

Enabling The PER Hardware To Monitor Storage Locations

A convenient place to hook the system is in the MVS trace table's common prologue code in IEAVTRCE.
All interrupts and dispatcher entries enter this code. Therefore a modification here will enter this trap after every interrupt and before the dispatcher dispatches or redispatches any TCB or SRB.

The trap in the examples below was inserted in module IEAVTRCE three instructions after the label STR in place of the code that normally stores the timer value in the trace table. This trap does not stop the system but traces in the MVS trace the PSWs that alter a specified storage location. To stop the system, a branch from the program check FLIH can be made to a patch area, and a test can be made for the interrupt code of X'80' with a branch equal to a trap to stop the system. In the system dump, the instruction that performs the modification is pointed to by X'98' in the PSA.

Care should be used with this diagnostic aid since degradation occurs in proportion to the number of interrupts taken. Only use it to monitor a section of storage which is never modified or only infrequently modified.

Once the trap is in, there is no need to re-IPL to remove it. Manually storing a word of zeros in control register 9 prevents further interrupts.

Following is an example of the PER hardware trap to be applied by superzap.

    NAME IEANUC01 IEAVTRCE
    VER 03A0 B2058000,4780B02C,D70380028002,D203C01C8002,947FC014
    REP 03A0 47F00608,07000700,07000700,07000700,07000700,07000700
    NAME IEANUC01 IEAVTRTS
    VER 0796 82001078
    REP 0796 47F00600
    NAME IEANUC01 IEAVFX00
    VER 0600 00000000,00000000,00000000,00000000,00000000,00000000,00000000
    VER 061C 00000000,00000000,00000000,00000000,00000000,00000000,00000000
    REP 0600 96401078,82001078    TURN ON PER BEFORE ENTERING FRR
    REP 0608 96400300             ALWAYS TURN ON PER FOR DISPATCHER
    REP 060C 4700B032,92F0060D    BUT SET THE FIRST TIME SWITCH FOR THE REST
    REP 0614 96400058,96400060,96400068,96400070,96400078    SET THE NEW PSW.
    REP 0628 B79B0630             LOAD FUNCTION CODE, LOW AND HIGH RANGE
    REP 062C 47F0B032             RETURN TO MAINLINE
    REP 0630 XX000000             FUNCTION CODE IN HIGH ORDER
    REP 0634 XXXXXXXX             LOW RANGE *
    REP 0638 XXXXXXXX             HIGH RANGE *

*Note: To check a word in storage starting at 9F41C, for example, low range address = 9F41C and high range address = 9F41F. To check a byte, use the same address in low and high.

Because the switch is in the PSA, the control registers and NEW PSWs are initialized on both processors in an MP environment. However, they are set only once and not each time through the routine.

The example in Figure 2-18 shows trace entries using the storage alteration mask (function code X'20'). The interrupt address is the address of the instruction that modified the monitored storage.

    PGM OLD PSW 470C3080 A0009A0A  R15/R0 00009970 00DF467A  R1 00FF0B08  IDS 00400002  TCB 00000000  TME 9BD80401
    PGM OLD PSW 470C1080 E0034126  R15/R0 00014C20 00DF467A  R1 00FF837C  IDS 00400002  TCB 00000000  TME 9BD80439
    PGM OLD PSW 470C3080 A000BEF8  R15/R0 0000BEB8 00DF467A  R1 00FF837C  IDS 00400002  TCB 00000000  TME 9BD8053F

Figure 2-18. Trace Example of PER Hardware Monitoring

On occasion it might be necessary to monitor only while one particular address space is active. One way of doing this is to change the previous superzap example at address 060E from B032 to 0640 and include the following superzap. This superzap turns PER on only if the specified address space is active.

    NAME IEANUC01 IEAVFX00
    VER 0640 00000000,00000000,00000000,00000000,00000000,00000000,00000000
    REP 0640 58D00224    GET CURRENT ASCB
    REP 0644 48D0D024    GET CURRENT ASID
    REP 0648 49D00664    IS THIS MY ASID?
    REP 064C 47800658    YES - GO TURN PER ON
    REP 0650 B7990660    TURN PER OFF
    REP 0654 47F0B032    RETURN TO MAINLINE
    REP 0658 B7990630    TURN PER ON
    REP 065C 47F0B032    RETURN TO MAINLINE
    REP 0660 00000000    THIS WORD IN CR9 TURNS OFF PER
    REP 0664 xxxx        ASID TO BE MONITORED

Caution: Extreme care must be used when considering a system alteration in order to gather additional data about a problem. No superzaps should be applied before the system programmer has verified the logic being zapped and the trap logic itself. Remember if any one location or offset within the module or trap changes, all offsets and base registers must be verified.

System Stop Routine

On occasion it is necessary to stop the system and take a stand-alone dump to fully document a problem. Loading a wait state PSW is sufficient on a uniprocessor. Stopping only one processor on an MP system is not adequate. This routine will stop an MVS MP or UP system. The caller must be in supervisor state and key zero. The wait state code you wish displayed is placed at location X'756'. This trap also moves the wait state PSW to storage location zero and loads the PSW from there to prevent inadvertent restarts when the trap is hit.

    NAME IEANUC01 IEAVFX00
    VER 0700 36F'00'
    REP 0700 ACFC074E             DISABLE
    REP 0704 900F0758             SAVE REGISTERS
    REP 0708 58F00010             GET CVT POINTER
    REP 070C 58E0F294             GET CSD POINTER
    REP 0710 91C0E008             TEST IF MP
    REP 0714 47E00744             NO JUST LOAD WAIT PSW
    REP 0718 41200000             SET REG 2 TO CPU 0
    REP 071C 41300001             SET REG 3 TO CPU 1
    REP 0720 48400204             GET CPU ADDRESS
    REP 0724 1244                 TEST FOR CPU 0
    REP 0726 4770073C             NO, STOP CPU 0 FIRST
    REP 072A AE030009             YES, STOP CPU 1 FIRST
    REP 072E 4760072A             SPIN TIL CC=0
    REP 0732 D20700000750         MOVE THE WAIT PSW TO ZERO
    REP 0738 82000000             LOAD WAIT STATE ON CPU 0
    REP 073C AE020009             SIGP STOP CPU 0
    REP 0740 4760073C             SPIN TIL CC=0
    REP 0744 D20700000750         MOVE THE WAIT PSW TO ZERO
    REP 074A 82000000,0000        LOAD WAIT STATE ON CPU 1
    REP 0750 000E0000,0000DEAD    WAIT PSW
    REP 0758 00000000             SAVE AREA

Caution: Extreme care must be used when considering a system alteration in order to gather additional data about a problem. No superzaps should be applied before the system programmer has verified the logic being zapped and the trap logic itself. Remember if any one location or offset within the module or trap changes, all offsets and base registers must be verified.

Using The MVS Trace To Monitor Storage

The MVS trace code in module IEAVTRCE is an excellent place to hook the system to monitor system operation and branch to a trap routine. Three instructions past label STR in IEAVTRCE is the code which stores the timer values in the trace table. All trace entries pass through this code. Overlaying this code allows you to monitor any place in the system as it runs disabled, key zero and supervisor state.

It must be understood that this code is physically disabled and therefore the trap must not page fault. Also no reference can be made to private area addresses since the trap can receive control in any address space.
For larger patches a branch from this code to a patch area in the PSA is possible.

At entry to this code, register 12 (C) points to the trace entry. This code normally stores the timer value located at X'1C' into the trace table. Storing a word at register 12 (C) + X'1C' would allow dynamic monitoring of that word in storage if addressability is obtained. The other seven words of the trace table are built within the trace entry code for each trace type. Monitoring for more than one word entails changing all entries. To eliminate certain trace entry types, it is only necessary to put a branch instruction 07FB at the entry point for that entry.

Caution: Location X'10' cannot be monitored with this trap because the PCFLIH refreshes location X'10' before it branches to the trace routine.

Extreme care must be used when considering a system alteration in order to gather additional data about a problem. No superzaps should be applied before the system programmer has verified the logic being zapped and the trap logic itself. Remember if any one location or offset within the module or trap changes, all offsets and base registers must be verified.

How To Expand The Trace Table

Use the following zap to force trace on during NIP processing.

    NAME IEEVWAIT IEEVWAIT
    VER 0194 4710
    REP 0194 47F0

To increase the size of the trace table, you may zap module IEAVNIP0 at label NVTTRACE to a greater value. It defaults to X'190' (400 decimal). Do not exceed a value of X'400' for the size of the trace table; 806-4 and 0C4 abends can occur when the link pack area directory is accessed.

    NAME IEANUC01 IEAVNIP0
    VER 2EC0 0190
    REP 2EC0 XXXX    WHERE X IS THE NEW VALUE DESIRED.

Caution: Extreme care must be used when considering a system alteration in order to gather additional data about a problem. No superzaps should be applied before the system programmer has verified the logic being zapped and the trap logic itself.
Remember if any one location or offset within the module or trap changes, all offsets and base registers must be verified.

Section 3. Diagnostic Materials Approach

This section provides guidelines for analyzing storage dumps to find which data areas were affected by the error and to isolate internal symptoms of the problem. The three chapters in this section are:

• Stand-alone Dumps
• SVC Dumps
• SYSABENDs, SYSMDUMPs, and SYSUDUMPs

Stand-alone Dumps

The stand-alone dump provides the problem solver with a larger quantity of data than system-initiated dumps because it contains areas that belong to the entire operating system rather than just a single address space or component. One of the major problems for the analyst is finding the important data for his problem and then isolating the problem area. Once this isolation is achieved, the debugger uses unique system/component techniques to gain further insight into the exact cause of the problem.

This chapter points out where to look in a stand-alone dump to determine various problem symptoms. The general approach is to analyze a stand-alone dump to find out what the system is doing (or not doing). Important areas will be described and possible reasons for their current state/contents will be explained. The analysis starts at the global system level and, by gathering data and gaining an understanding of the environment, works down to the address space and task level.

The experienced problem solver realizes that under certain conditions it may be necessary or advantageous to omit interpreting various areas.
For example, if during system operation he observes that a given segment of the system (such as VT AM) is not functioning (other areas.appear okay - jobs are executing, SYSIN!SYSOUT is appearing, etc.), he may decide to take a stand-alone dump. In this case, the current state of the system is probably not important. He would not be interested in current PSW, registers, etc.; he would be interested only in the address spaces that are using VT AM and the state of the TP network. The dump is not taken for a problem that is "active" now, but to give the analyst data with which to determine a problem that appears to have originated some time ago. The point is that knowing why the dump was taken will often govern. which, if any, of the stand-alone dump areas are of significance for a given problem. Information contained in the chapter on "Waits" in Section 4 can be used as a supplement to the following discussions. (Also, a step-by-step approach to analyzing a stand-alone dump is contained in Appendix B of this manual.) To analyze a stand-alone dump, you should always ask the following questions: 1. Why was the dump taken ? Console sheets/logs are very important in stand-alone dump analysis. They are often the key to solving "enabled wait" situations and may present valuable information about system activity prior to taking the dump. Messages concerning I/O errors, condition code=3, SVC dumps, abnormal job terminations, device mounts, etc. should be thoroughly investigated to determine if they could possibly contribute to the problem you are tracking. The dump title gives an'indication of the problem's external signs or, possibly, a specific situation that must be investigated, such as "VTAM NOT FUNCTIONING." Section 3. Diagnostic Materials-Approach 3.1.3 Standalone Dumps (continued) . 2. What is the current state 01 the system ? Examine the available global data areas to determine what the system is currently doing. 
The "Global System Analysis" chapter in Section 2 aids in this process. Remember that at this point, you are gathering information and trying to understand the system environment in order to isolate the internal symptom; you are not ready yet to debug.

3. Has your global analysis isolated the problem to an internal symptom? If so, refer to the discussion of that symptom in Section 4 of this manual.

4. What previous errors have occurred within the system; could they possibly have any effect on your current problem? The interpretation of SYS1.LOGREC and the in-storage LOGREC buffers is most important in determining error history. See the chapter on "Use of Recovery Work Areas" in Section 2.

5. What is the recent system activity? The chapter on "MVS Trace Analysis" in Section 2 aids in trace table interpretation.

6. What is the work status within the system? Your objective is to determine if the system has for some reason not completed all scheduled work. Determining what that work is and why it is not progressing can provide insight into the problem as well as answer some questions that may have arisen during an earlier analysis. Understanding the major control block structure and work queue status should aid in determining the possible source of the error. Refer to the discussion of "Work Queues and Address Space Status" in the "Global System Analysis" chapter of Section 2.

At this point, you should have gathered enough data to have a definition of the internal problem symptom. You should also have considerable information about the system's state, error history, and job status. You should refer to the appropriate chapter in Section 4 "Symptom Analysis Approach" or, if you have isolated the error to a component or process, Section 5 or Appendix A, respectively.
SVC Dumps

SVC dumps (invoked by the SDUMP macro) are usually taken as a result of an entry into a functional recovery routine (FRR) or ESTAE routine. The component recovery routine specifies the address that will be dumped. The "Component Analysis" chapters in Section 5 should help you identify what areas of the system were dumped and what they contain. The SVC dump is taken asynchronously and the global data areas (PSA, LCCA, PCCA, etc.) usually contain no relevant data except in cases where overlays, machine checks, channel checks, etc., have occurred.

SDUMP options SQA, ALLPSA, and SUMDUMP are the defaults for all requests.

The SUMDUMP option of SDUMP provides a summary dump within an SVC dump. There is a twofold purpose for this. First, since dump requests from disabled, locked, or SRB-mode routines cannot be handled by SVC dump immediately, system activity destroys much useful diagnostic data. With SUMDUMP, copies of selected data areas are saved at the time of the request and then included in the SVC dump when it is taken. Second, SUMDUMP provides a means of dumping many predefined data areas simply by specifying one option.

The data areas saved in SUMDUMP can be printed out by using the AMDPRDMP control statement SUMDUMP. This summary dump data is not mixed with the SVC dump because in most cases it is chronologically out of step. Instead, each data area selected in the summary dump is separately formatted and identified.

For information on print dump program changes needed to print the summary dump, and multiple address-space output from SVC dump, see OS/VS2 System Programming Library: Service Aids.

The RTM2WA pointed to by the TCB upon whose behalf the dump is being taken is the most valid system status indicator available. The dump task is usually the current task; the task upon whose behalf the dump is being taken will contain a completion code in the TCB completion code field.
It is possible for the ESTAE routine to issue SVC D itself, in which case the current task is also the failing task.

Because of MVS recovery (retry and percolation), the SVC dump may be only part of the documentation at the problem solver's disposal. The problem solver should attempt to obtain:

1. The system log for the time the dump was taken to ascertain if:
• Any other SVC dumps were taken before or after the one he is investigating.
• Any task subsequently abended. If so, a system dump that displays other areas of storage that have meaningful data may be available.

2. The LOGREC formatted listing for the time immediately preceding the time of the SVC dump.

If the component analysis procedure fails to determine the cause of the problem, analyze the dump as you would a stand-alone dump. Keep in mind that the information obtained via the CPUDATA option on AMDPRDMP is probably meaningless. Refer to the "Global System Analysis" chapter in Section 2 for information on how to do a task analysis of available address-space-related control blocks.

Keep in mind that the system has detected the error and has attempted recovery, at least on a system basis. Therefore, there will be a good indication of the type (internal symptom) of error (loop, abend, program check, etc.) that caused the problem. (See Section 4, "Symptom Analysis Approach.")

How to Change the Contents of an SVC Dump Issued by an Individual Recovery Routine

At times, SVC dump contents are not sufficient to solve a problem. The most convenient way to change the contents is the CHNGDUMP command. It can be used to establish system options to be added to the options on each SDUMP request, or to totally override the SDUMP options. See "Using the CHNGDUMP Command" in Section 2.

If you do not want to affect all SVC dumps or if storage lists are involved, you may want to change the parameter list in a particular ESTAE exit instead.
You can usually find the name of the recovery routine by looking at the user data (or title) on the SVC dump printout. If not, search the ESTAE's PRB for the virtual address of the SDUMP SVC instruction. The following description of SDUMP's parameter list can help you decide which bits will provide the data you want. The SDUMP macro expansion generates the parameter list and puts the address of the list in register 1.

SDUMP Parameter List

Offset  Bit setting  Contents
0       1... ....    user-supplied DCB=
        .1.. ....    BUFFER=YES
        ..1. ....    user-specified STORAGE= or LIST=
        ...1 ....    user-specified HDR= or HDRAD=
        .... 1...    user-specified ECB=
        .... .1..    user-specified ASID=
        .... ..1.    QUIESCE=YES
        .... ...1    BRANCH=YES
1       1... ....    indicates SDUMP (as opposed to SNAP)
        .1.. ....    indicates a SYSMDUMP request
        ..1. ....    indicates enhanced SVC Dump
        ...1 ....    user-specified ASIDLST=
        .... 1...    user-specified SUMLIST=
        others       reserved
2                    SDATA options
        1... ....    ALLPSA
        .1.. ....    PSA
        ..1. ....    NUC
        ...1 ....    SQA
        .... 1...    LSQA
        .... .1..    RGN
        .... ..1.    LPA
        .... ...1    TRT (MVS trace table)
3                    more SDATA options
        1... ....    CSA
        .1.. ....    SWA
        ..1. ....    SUMDUMP
        ...1 ....    NOSUMDUMP
        .... 1...    NOALLPSA
        .... .1..    NOSQA
        others       reserved
4                    DCB address
8                    address of storage list
C                    address of header record
10                   address of ECB
14                   caller's ASID
16                   target ASID of scheduled dump
18                   address of ASID list
1C                   address of summary dump storage list
20                   address of SYSMDUMP 4K SQA area
24                   address of SYSMDUMP CSA work area
28                   length of header record (less than 100)
29                   header record (will appear as title)

SYSABENDs, SYSMDUMPs, and SYSUDUMPs

SYSABENDs, SYSMDUMPs, and SYSUDUMPs are produced by the system when a job abnormally terminates and a SYSABEND, SYSMDUMP, or SYSUDUMP DD statement was included in the JCL for the terminating step.
In an MVS system, the output produced is dependent on parameters supplied in the SYS1.PARMLIB members IEAABD00, IEADMR00, and IEADMP00 for SYSABENDs, SYSMDUMPs, and SYSUDUMPs, respectively. See OS/VS2 System Programming Library: Initialization and Tuning Guide for the IBM-supplied defaults and options that are available. If the IBM defaults are used, a hexadecimal dump of LSQA is produced when the SYSABEND DD statement is specified. MVS systems do not dump the nucleus or SQA as a default for SYSABEND or SYSUDUMPs. SYSMDUMP defaults include NUC and SQA.

With a SYSABEND, SYSMDUMP, or SYSUDUMP, the system has detected the error and therefore provided a starting point (such as a job step completion code) for analysis. The analyst should always look at the JCL and allocation messages that accompany the dump. The allocation messages contain error messages that can sometimes be helpful. There will also be a JES2 job log that shows the operator messages and responses that relate to the job. The error messages also contain valuable information about the error and should always be investigated.

SYSABEND, SYSMDUMP, and SYSUDUMP errors can generally be divided into two categories: software-detected errors and hardware-detected errors.

Software-Detected Errors

Software-detected errors are those in which one or more of the following occurs:

• A module detects an invalid control block queue.
• A called module returns with a bad return code.
• A program check occurs in system code and a recovery routine changes the program check to a completion code and abnormally terminates the task.

The best approach for a software-detected error is:

1. Use the JES2 job log and allocation messages to investigate all error messages produced. (Refer to the appropriate Message manual to determine the causes and corrective action of each message.)

2. Check the abend code defined in the dump. (Refer to OS/VS Message Library: VS2 System Codes to determine causes and corrective actions of the code.)
Some abend codes define problem determination areas that can be used to help define the problem.

3. In the event that sufficient data is not available in the Messages and Codes manuals to resolve the problem, the analyst can go directly to the program listing. The diagnostic sections of most PLMs contain a message/module and abend/module cross-reference. Once the correct module has been located, the program listing (supplied in the system microfiche) helps to define the problem.

SYSABENDs, SYSMDUMPs, and SYSUDUMPs normally do not produce system-related data areas other than those which are formatted. Because of this and the fact that error recovery will attempt to reconstruct invalid control block chains before terminating the task, any error that does not occur in the private area may be difficult to resolve from a SYSABEND, SYSMDUMP, or SYSUDUMP alone.

Because of the recovery and percolation aspects of MVS, the SYSABEND, SYSMDUMP, or SYSUDUMP could be the end result of an earlier system error. If so, the analyst should determine if any LOGREC entries were made pertaining to this task and if any SVC dumps were taken while this task was running. The system error is normally reflected in either the LOGREC entries, the dump data sets, or both.

Hardware-Detected Errors

A hardware-detected error is a program check that is not intercepted by a recovery routine. This is identified by a system completion code of X'0Cx' where x is the program check type. For this type of error, the analyst needs to know the address of the module where the program check occurred, and the register contents when the program check occurred. The best place to locate this information is in the RTM2WA that is pointed to by the abending TCB.
Given the registers and PSW at the time of the error, the analyst should determine the module that program checked by using the load list link edit maps of the program. (If the module is outside the private area, a NUCMAP or LPA map may be necessary.) Then he should examine the program listing for the module until the cause of the program check is defined.

Section 4. Symptom Analysis Approach

This section describes how to identify correctly an external symptom, and provides an analysis procedure for determining what kind of problem is causing the symptom. Each external symptom is described in a separate chapter, as follows:

• Waits
• Loops
• TP Problems
• Performance Degradation
• Incorrect Output

Waits

Wait states may be either enabled or disabled. The characteristics of each type are described below.

Characteristics of Enabled Waits

Enabled waits have traditionally been the most difficult problem to analyze because of the lack of an obvious failure. The enabled wait provides no indication of error other than that the system apparently has nothing to do. In fact the enabled wait has been accurately described as an end symptom of a problem with no obvious causes. The task of determining the possible cause is left to the debugger.

Other types of software failures - abends, program checks, loops, messages - provide a starting point for analysis; that is, software or hardware has indicated a violation of interfaces or data integrity and has halted the erroneous process at the point of error. The enabled wait provides none of these.

Note: The subsystem design of many components includes a dispatching mechanism and internal control block structure not generally recognized by the operating system.
When these subsystems (for example, VTAM, TCAM, JES2) malfunction, work through these components is often halted. Because of the critical nature of these processes, external signs of the problem are often detectable. Within this debugging discussion, these problems are often treated as wait states, that is, the system may be capable of running batch work, but the TP network appears "hung-up." This general discussion of analysis approach applies for problems such as "permanently" swapped-out address spaces, TP network hung, and no batch running. The advantage is that the external symptoms may allow you to more easily isolate the problem component or at least a starting point - it may be obvious that TCAM is not responding, or that JES2 is not processing input.

Experience has shown that in MVS a much greater percentage of re-IPL situations are caused by enabled waits than in previous systems. One reason for this characteristic of MVS is software recovery. Software recovery attempts to repair the damage caused by a failure and allow the system to continue meaningful operation. The general philosophy of recovery is to isolate the error to a job, terminate the job, and allow the system to continue. This philosophy dictates that under certain conditions innocent work may be forcefully terminated.

Software recovery obviously may cause the termination of some critical process which in turn causes dependent processes to wait indefinitely. For example, assume that while processing a page-fault, an error occurred during the I/O interruption processing; software recovery was invoked and subsequently caused a cleanup of the bad control blocks, but did not post the I/O requestor. It is possible that the paging mechanism will wait indefinitely for the missing interrupt. This in turn could cause a problem program to wait indefinitely for the paging operation to complete.
The end result is no work accomplished and also no external problem symptom, although a problem clearly exists. The debugger must find the bottleneck - the paging exception - and subsequently back-track enough to determine why the bottleneck still exists. Very often, this back-tracking requires analysis of several components in order to determine the original cause.

Characteristics of Disabled Waits

Situations can develop during execution of the MVS system that require the software to abruptly terminate the system by loading a disabled PSW with the wait bit set to 1. In previous systems, this occurred much more frequently than it does in MVS because, in MVS, many of these situations were removed from the code and replaced with software error recovery. However, a few cases still remain that cause this symptom. To understand these situations better, refer to the "Wait State Codes" section of OS/VS Message Library: VS2 System Codes.

A more critical situation for the analyst is a disabled wait that is caused when data areas containing PSWs referenced by the dispatcher or hardware are overlaid and subsequently fetched for use in an LPSW. This often occurs when a PSA overlay condition exists, that is, the low storage PSWs fetched by the hardware have been inadvertently overlaid by a program running in supervisor state key 0. Other data areas, such as PRBs, may contain PSWs used by the dispatcher and are also potential sources of the disabled wait state.

Bad LPSWs are difficult to track down. The most common MVS uses of the LPSW instruction are:

• hardware loading from low storage for an interruption-processing sequence
• dispatcher loading from X'300' into the PSA
• RTM (IEAVTRTS) passing control to FRRs
• the system termination routine
• SVC FLIH and I/O FLIH LPSWs

Storage overlays resulting in wait state PSWs are approached in the same manner as other storage overlays.
The important step is to realize the storage overlay has occurred, then re-create the process that was possibly responsible. The discussion of pattern recognition in the chapter "Miscellaneous Debugging Hints" in Section 2 should be helpful.

Analysis Approach for Disabled Waits

The following is a list of objectives that provides a systematic approach to analyzing a disabled wait.

Objective 1 - Determine positively that an actual disabled wait condition exists. Is the PSW the type that is used when MVS loads an explicit wait or is this an overlaid PSW with the wait bit on?

Analysis - Examine the current PSW contained in the dump according to the technique described in the chapter "Stand-alone Dumps" in Section 3. The PSA overlay should also be analyzed to determine if key PSWs have been overlaid. If the PSW shows an explicit wait, look up the wait state code in OS/VS Message Library: VS2 System Codes to find what conditions could cause the explicit wait. You may need to do some extra analyzing before the condition can be related to a component. (Note: No further analysis for explicit wait situations is discussed in this book.) If the PSW suggests an overlaid PSA or some other error source, proceed to Objective 3; otherwise proceed to Objective 2.

If, for any reason, the current PSW is not formatted in the dump, the last PSW shown in the trace table, location X'300' (used by the dispatcher), or low storage should be examined as possible sources of the last PSW.

Objective 2 - Determine if the situation has been improperly diagnosed as a disabled wait. This will eliminate a situation in which the locked console is diagnosed as a disabled wait.

Analysis - In previous operating systems, the operator's inability to communicate with the system through the console was an external indication of a disabled wait condition.
In MVS, this same external symptom is often not a true disabled wait. Console communication is dependent upon other services of the operating system, such as paging, and the I/O subsystem. A problem in any of these services often terminates console activity and causes an apparent "disabled wait" situation, when the PSW does not actually reflect a disabled wait. If the current PSW is not disabled for external and I/O interrupts or if the wait bit (X'0002') in the PSW is not set to one (PSW = X'070E0000 00000000'), you should proceed to either the "Enabled Wait Analysis" topic later in this chapter or to the chapter on "Loops" later in this section.

Objective 3 - Once you know that the disabled PSW is the result of an overlay in low storage or in another data area, you must gather specific data about the overlay. Ask such questions as: What was the damage to the PSW? When did the overlay most likely occur? Where did the PSW come from?

Analysis - It is important to try to find out how the PSW was overlaid - was it a byte, an entire word or doubleword, a single bit, or was a large portion of the surrounding area destroyed along with the PSW? (The discussion of Pattern Recognition in the chapter "Miscellaneous Debugging Hints" in Section 2 will help you determine this.) Much of this analysis depends on your experience and familiarity with the normal data for the subject PSW and the surrounding area. You should try to gather enough data to know, for example, that "n" bytes were overlaid beginning at location xyz.

Also, examine the trace table, if available, and try to determine when the PSW was probably last valid. Look for interrupts and unusual conditions in the trace entries to try to reconstruct the process(es) leading up to the incorrect PSW.
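The Objective 3 goal of knowing that "n" bytes were overlaid beginning at a given location amounts to comparing what the area normally holds with what the dump shows. The following Python sketch is illustrative only and is not part of any MVS service aid: the function name `overlaid_runs`, the sample byte values, and the use of X'300' as a base address are assumptions chosen for the example, and the "expected" contents must come from the analyst's own knowledge of the area.

```python
def overlaid_runs(expected: bytes, observed: bytes, base: int = 0):
    """Return (start_address, length) for each run of bytes that differ.

    expected -- what the analyst believes the area normally contains
    observed -- the same area as transcribed from the dump
    base     -- storage address of the first byte (hypothetical here)
    """
    if len(expected) != len(observed):
        raise ValueError("both areas must be the same length")
    runs = []
    start = None
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            if start is None:
                start = i              # a new run of damaged bytes begins
        elif start is not None:
            runs.append((base + start, i - start))
            start = None
    if start is not None:              # damage runs to the end of the area
        runs.append((base + start, len(expected) - start))
    return runs


# Hypothetical 8-byte PSW slot; the values are invented for illustration.
expected = bytes.fromhex("070E000000000000")
observed = bytes.fromhex("070E0000C1C2C300")
for addr, length in overlaid_runs(expected, observed, base=0x300):
    print(f"{length} byte(s) overlaid beginning at X'{addr:X}'")
# prints: 3 byte(s) overlaid beginning at X'304'
```

A byte of the overlay that happens to equal the normal value is not reported, so the true extent of the damage may be larger than the comparison shows; the surrounding area should still be inspected by eye.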
If the trace indicates the overlay occurred after the most recent trace entry, the registers are important because they may show recent BALs and BALRs and they may contain the address of a routine or control block that was used to overlay the subject PSW. This is actually a good situation because it will not take long to relate the overlay to some bad pointer in a control block and, hopefully, your analysis will proceed to a specific component. If the overlay occurred several trace entries earlier, determine a possible save area that might contain the registers that were active at the time of the overlay by examining interrupt entries or dispatch entries in the trace table.

If there is no trace table, it is almost impossible to define when the overlay occurred. You might try to analyze, for example, TCB save areas, hoping for a clue as to when the overlay occurred and to gather information concerning the problem. However, this process is basically undefined and undisciplined. In most cases, a trap for the overlay can be generated at this point and used as soon as possible.

Objective 4 - Determine which component most likely caused the overlay and choose a likely set of modules from that component to analyze at an instruction level. Determine which data area field contains the bad address and who set up the field.

Analysis - As mentioned earlier, by using the registers and trace table it is possible to identify which code actually overlaid the PSW, but the source of the error must still be found. This mostly involves screening code to reconstruct the path which caused the overlay and locating the data that generated the bad address. At this point, you want to learn which module set the bad field so you can start backtracking. Shortcuts are possible according to the analyst's familiarity with the modules that are involved.
Certainly the main objective should be to decide which component is most likely responsible and then to proceed to the discussion of that component's analysis (in Section 5).

Analysis Approach for Enabled Waits

It is most important that you understand the actions that must take place in order to accomplish work in the operating system. This requires a basic understanding of the key system processes in MVS - paging, I/O, dispatching, locking, WAIT/POST, ENQ/DEQ, VTAM, TCAM, SRM, JES2/3. These areas of the system are responsible for directing work through MVS; a malfunction in any one may cause global system problems. Several, if not all, must be investigated in order to determine why work is not progressing.

This investigation requires a disciplined approach. The relationships of component interfaces and their mutual dependencies must be understood. With this in mind, the debugger should proceed to gather information about the various processes and try to integrate his findings with his other information and assumptions about the problem, always trying to isolate one cause of the bottleneck. He must avoid the tendency to guess, assume, and go off on tangents once the first irregular item is uncovered. Instead, he should continue to gather known facts and piece them together in some logical pattern that recreates the situation.

In the vast majority of wait state cases, more than one key process will appear backlogged. The challenge is to determine how these problem processes relate and which is the fundamental cause of the wait situation. After you gather the facts and understand the bottlenecks, you must answer one question: if I "pull the cork" on this given bottleneck, will all the other intertwined situations resolve themselves? In every problem there is only one bottleneck for which the answer to this question is "yes".
The other problems are consequences of this key process's failure to complete its designed function. Isolating the process is half your battle; the other half is determining the cause of this one process's failure.

Following is a suggested disciplined approach for the problem solver who is approaching a system wait problem. The approach involves three distinct stages of problem analysis:

Stage 1 - Preliminary global system understanding, including
• system externals
• current system state
• LOGREC analysis
• trace analysis
• determining the reason for waiting

Stage 2 - Key subsystem analysis - an in-depth analysis of the MVS components that are responsible for accomplishing work.

Stage 3 - System analysis - using the information gathered in Stages 1 and 2 the problem solver must "step back", get perspective about the known facts by piecing them together in a logical fashion, and isolate the error to a process, component, module, etc.

This approach is described in detail in the following sections.

Stage 1: Preliminary Global System Analysis

1. System Externals - Completely understand the system externals of the situation. Console sheets and the system log should be inspected.
• For any enabled wait (operators call it "system hung") find out if a display requests command was issued. (Lack of operator action can cause system bottlenecks.)
• Often many pages of console sheets must be investigated to uncover operational problems and explain events uncovered in the dump. Scanning provides a feeling for the events, jobs, requests, etc. leading up to the problem.
• Make sure all DDR SWAP requests, I/O error messages, SQA shortage messages, etc. can be explained.
• Always take the time to examine these external areas because a small effort here could save many hours of detailed dump analysis. Do not overlook obvious items such as a MOUNT PENDING message in the console log that can cause system problems.

2. Current System State - Investigate fully the current situation as depicted by the dump. For enabled waits, the PSW should equal X'070E0000 00000000' (often called the "no-work" wait) or there should be a considerable recurrence of the no-work wait in the OS trace table - see the chapter on "MVS Trace Analysis" in Section 2. If this is not the case, use the disabled wait analysis approach (earlier in this chapter). If the PSW indicates the no-work wait situation, you have an enabled wait. You should now check other global system data area indicators to get the whole picture. Following are key global indicators:
• There should be no bit set in the PSASUPER field (PSA+X'228'). If there is, some supervisor routine should be in control. This situation can indicate incomplete processing by the associated routine. All possibilities should be pursued until the situation can be explained.
• Because of SRM timer/analysis processing, even when the system is in the enabled wait situation, the state of the processor at the very instant the dump was taken can indicate, via the "super bits" or locks indicator (PSAHLHI), that some process was occurring. You must determine in this case that these fields being set is normal and continue with wait analysis. If the fields cannot be explained, you have isolated the error.
• There should be no locks held, as indicated by PSAHLHI on either processor. This situation is similar to the one described just above. You must try to discover the owner of the lock and determine why it is still held despite the fact that the system is waiting. Often the purpose of the lock will provide insight as to who the owner might be. The chapter on "Locking" in Section 2 should be of help in your analysis.

3. LOGREC Analysis - Determine if key components have encountered difficulty; determine previous errors encountered by the system.
This can be accomplished by inspecting SYS1.LOGREC as well as the in-storage LOGREC buffer. Errors encountered in any of the key processes noted earlier (RSM, ASM, IOS, JES2/3, SRM, ENQ/DEQ, VTAM, etc.) may provide further information. If you do find an error associated with any of these areas, determine whether it could lead to the bottleneck. The LOGREC records generally contain the names of the error-encountering routines and often the job on whose behalf the system was processing at the time of the error. If the routine names are not present, you may have to use system maps and the PSW/register information in the LOGREC records in order to associate errors with components. The discussion of LOGREC analysis in the "Use of Recovery Work Areas" chapter in Section 2 should be helpful in your analysis.

4. Trace Analysis - Determine the last activity within the system. Because of SRM's timer processing, the trace table for most wait conditions is not useful. However, on the rare occasion that the system has been stopped or if for some reason the trace is not overlaid with timer interrupts (X'1004' external interrupt entries), the trace should be analyzed to ensure normal processing, for example, page faults are being processed, I/O is being accomplished. Be suspicious of large (relative to most entries) time gaps in the trace table. If the table has not wrapped around, process re-creation may be of some use in determining what the system was doing up to the point of incident. (The chapter on "MVS Trace Analysis" in Section 2 should be helpful.)

5. Determine the reason for waiting - Once it has been determined that the system is waiting, it is always useful to determine what the various address spaces or jobs are waiting for. This is accomplished by inspecting and scanning the various tasks and their associated RB structure in a formatted stand-alone dump.
Remember the RCT, started task control (STC)/LOGON, and dump task may all be waiting in each address space - this is normal. The question you should ask is: Why are the subtasks below the STC/LOGON waiting? Generally in an active system more than one address space will be waiting for the same or similar resource in a problem situation. Therefore, as you scan and analyze address space status, look for suspensions in common modules (RB resume PSWs containing similar addresses):
• Many tasks in page-fault wait can indicate the paging or I/O mechanism is faulty.
• The PVT can indicate a real frame shortage.
• Many tasks in terminal I/O wait can indicate something is wrong with the TP access method or some part of the network.
• Several resume PSWs pointing into the ENQ/DEQ routine, IEAVENQ1, can indicate an ENQ resource contention problem.

In general, be on the look-out. Try to compare and relate the system activities as you encounter them. Often more than one process or address space is held up because of a common bottleneck. It may be a global resource required by more than one address space, for example, a lock or data set. It is important that the exact cause be determined.

Stage 2: Key Subsystem Analysis

As part of this investigation, if nothing can be easily determined from a cursory address space scan, you may have to delve into the key components. Following are some highlights of the important and potentially suspect areas:

1. I/O Subsystem - Check for unprocessed I/O requests; bottlenecks in the I/O process will almost always log-jam the system. Since IOS is the central facility for controlling I/O operations, I/O problems should always be suspected in an enabled wait condition. Therefore, the IOS component and its associated queues should be analyzed early in the subsystem analysis stage of debugging.
Two important IOS queues and control blocks will indicate whether problems exist in the I/O process:

• Logical channel queues (LCH) contain lists of elements for I/O requests. If these queues (pointed to by the CVT + X'8C') are not empty in a waiting system, IOS must be further investigated.
• Unit control blocks (UCBs) are a logical representation of each I/O device containing I/O active indicators at offset 6/7. If any indicator is set, this device must be further investigated. This condition can indicate either a hardware or software problem.

Both the queued (LCH) and active (UCB/IOQE) requests must be further investigated to determine the associated requestors and what effect their I/O not being serviced will have on system operation (for example, if paging I/O or console I/O is not being serviced, the system will usually stop).

The UCB contains indicators for DDR, intervention required, and missing interrupt handler processing. Any such indication must be further investigated.

An ENQ on the SYSZEC16 resource is an indication of a waiting condition generally associated with swapping. The swapping process cannot complete until active I/O finishes. In a quiesced system, an ENQ on this resource must be further investigated.

2. Paging Mechanism - Check for unserviced page faults. ASM, RSM, and SRM are closely related and depend upon each other to maintain real storage, the swapping process, and page fault resolution. If, when you determined the reason for waiting as described in stage 1, you discovered several page fault wait conditions, be suspicious. Some key indicators in determining page fault waits are:

• ASCBLOCK = X'7FFFFFFF' - indicates suspension while holding the local lock. If in task mode at the time of suspension, the resume PSW instruction address (saved at IHSA + X'10') should be checked.
When the instruction address = RBRTRAN (-C offset), it indicates the task is suspended while it waits for a page fault resolution. The page fault occurred when a new module (paged-out) was referenced. If in SRB mode at the time of suspension, an SSRB will be queued from a PCB. The anchor for these PCBs is the RSMHDR (private area page fault) or the PVT (common area page fault).

• ASCBLOCK = 00000000 indicates no locks are held. The RB structure can reveal the same situation as described above for RBOPSW instruction address = RBRTRAN or RBXWAIT=0 and, in addition, an RB wait count = 1. If you find several tasks in this state, check the dump for the page represented by RBRTRAN. Is it in storage? (Remember for private area addresses to be sure that the address space you are investigating is printed.) If the page is not in storage you may have a potential paging problem. Again, if in SRB mode at page fault time, the SSRB must be found to determine more about the process.

If you believe paging is a potential problem, check the PVTAFC (available frame count). A "low" value may indicate a frame shortage. While "low" is difficult to define, the value should certainly be above the PVTAFCOK value (PVT+ X'6 '). Beyond this, "low" is influenced by sizes of working sets of the address spaces in your system. The working set size for each address space is contained in the associated SRM-user control block (OUCB). This count (plus an SRM constant of 10) is the number of frames required to swap-in the corresponding address space. If enough frames are not available, the address space will remain swapped-out.

ASM maintains a count of the number of paging requests received and the number for which processing has completed in the ASMVT. If these counts are not equal, ASM is backed-up and page faults have not been resolved. This can be caused by an I/O problem or some internal ASM problem.
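The paging checks just described amount to a few comparisons on values read from the dump. The following is a minimal sketch in Python, not part of any MVS service aid: it assumes the analyst has already transcribed the values by hand, and the class, field, and function names are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PagingSnapshot:
    """Values copied by hand from a dump; field names are illustrative only."""
    available_frames: int        # PVTAFC, the available frame count
    frames_ok_threshold: int     # PVTAFCOK
    asm_requests_received: int   # ASMVT count of paging requests received
    asm_requests_completed: int  # ASMVT count of requests completed


def paging_findings(snap):
    """Return a list of observations worth noting; an empty list means none."""
    findings = []
    # The text says the available frame count should be above PVTAFCOK.
    if snap.available_frames <= snap.frames_ok_threshold:
        findings.append("possible real frame shortage")
    # Unequal ASM counts mean page faults have not all been resolved.
    backlog = snap.asm_requests_received - snap.asm_requests_completed
    if backlog != 0:
        findings.append(f"ASM backed up: {backlog} paging request(s) outstanding")
    return findings


def frames_to_swap_in(working_set_size, srm_constant=10):
    """Frames needed to swap in an address space: working set plus SRM constant."""
    return working_set_size + srm_constant
```

A non-empty result is only something to note and carry forward; as with the rest of this stage, it does not by itself identify ASM as the failing component.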
The ASM Component Analysis chapter in Section 5 describes the work queues in the paging activity reference table entries (PARTEs). Finding unprocessed work on these queues will aid in determining whether ASM is the problem component. But again be careful: you are still gathering data about the wait state. Your purpose now is not to debug ASM - it may not be the problem. Note the apparent ASM problems and continue your investigation. Later when you piece together your findings and find the real source of the problem, detailed debugging and logic flow will be required.

3. ENQ/DEQ - Check for unresolvable resource contention. Finding an ENQ/DEQ interlock and determining what work is being held up because of this interlock can provide important information about the overall problem. The QCBTRACE option of AMDPRDMP provides a formatted structure of the resources and the work that is in contention for them. Determining who owns the resources and the current status of the owners (if swapped-out, why? or if waiting, for what?) often provides important clues in understanding the bottleneck.

Also in your scanning process, you should be on the alert for address spaces that contain subtasks (usually below the STC/LOGON level) with multiple RB levels, and with the lowest RB containing a resume PSW with an address somewhere within ENQ code (nucleus resident) and with the RB wait count RBWCF = 1. The previous RB should be an RB with the ENQ SVC (SVC X'38') indication in the "WC-L-IC" portion of the RB prefix (-4 offset). This indicates that this task and probably the address space are suspended because of an unsatisfied ENQ request. If several address spaces or tasks are found in this state you should find out why. The QCBTRACE facility of AMDPRDMP can be most helpful. An illustration follows:

Investigation of QCBTRACE data shows many requests backed-up on resource A.
The analyst notes this and determines what ASID or TCB owns resource A at this time (in this example, ASID 9). The other resources represented in the QCBTRACE are now scanned. If ASID 9 is backed-up behind someone else (ASID 10) waiting for another resource (B), you must now determine ASID 10's status with respect to other resources, including resource A. Essentially you are looking for cases where:

• An address space has resource A and is waiting for resource B and a second address space has resource B and is waiting for resource A. This indicates a deadlock. You must determine the faulty process. In this case you have probably isolated the error to the ENQ process and the way it is being used. You must analyze the task structure of each address space to determine how this situation occurred. Do not forget the SYS1.LOGREC buffers. They may contain clues like errors in ENQ/DEQ or one of the tied-up address spaces (jobs). Faulty recovery should be suspected if the latter is the case. It may be that a job requests control (via ENQ) of a resource and subsequently encounters a software error. The task's associated recovery gains control and "recovers" from the error but does not dequeue (DEQ), and therefore does not release the resource. Eventually, the contention for this resource, depending on its importance, could cause severe problems.

• An address space has control of a resource and a lot of address spaces are queued-up behind this address space. In this case, you must find out why the holder is not releasing the resource. Also know your system. It is not unusual to see activity on the master catalog resource: "SYSIGGV1 Master Catalog Name." But be suspicious of most resources. Determine from the holder's task structure what process it is attempting. Determine whether the address space is waiting or swapped-out and why.
If it is not waiting or swapped-out, check the non-dispatchability bits and the possibility that the address space is looping. This second case is much more likely to be a sign of some other system problem. Your clue is what is preventing the holder's execution; this will point you to another process which must be investigated and may lead to the detection of the final problem.

Note: When analyzing a dump of a quiesced system you should be suspicious of "unusual" ENQ resource names - resources that should not be a contention factor in a quiesced system. The presence of these names should be understood and explained because they very often will point you to the problem area. Common resource names are:

"SYSZEC16 - PURGE" - Can indicate a problem in the I/O process related to the resource holder's address space. Can indicate a bottleneck in the swapping process.

"SYSZVARY - x" - Indicates the reconfiguration component has been invoked - why is it not completing?

4. Dispatching - Determine if there is work to do in the system. A common trouble indicator is an MVS dispatching queue containing elements that indicate work is ready to execute in a waiting system. The GSMQ, LSMQ, GSPL, and each LSPL should be empty. (The chapter on "System Modes and Status Saving" in Section 2 contains details of these queues and how to find them.) Generally it is not a problem in the dispatching mechanism itself but merely an error indication. Often the most useful information is just that "yes, there is work." Why is it not being dispatched? Is there a problem in some other area of the system? Is the address space swapped out? Yes, there may be a real storage problem delaying swap-in. Or perhaps SRM has not been told to swap-in the address space via a "user-ready" SYSEVENT. In short, investigate the OUCB for the address space you are concerned with.

Another useful point is to find out what problems could arise if this work were not dispatched.
Investigating the queued work will indicate what would be accomplished if this work were executed. This is usually important because it can clear up much of the "smoke" you may be encountering in your overall system investigation.

Likewise, investigate the task structure. Generally, you can ask the same questions as above, but you must look in different places for the key indicators. Among the most important indicators are:

• The ASCB, which contains a count of ready TCBs in the memory
• The TCB non-dispatchability flags
• The RTM work area, which contains status at time of error
• The RB structure. Look for long RB chains or unusual SVCs and interrupt codes. Look for page fault waits.

Again, use this information to lead you to processes or problems that hold up the system.

5. Locking - Determine if there is a locking conflict. The locking mechanism causes system bottlenecks when it is not used properly. The global spin locks cause obvious problem symptoms such as one processor spinning in the lock manager (IEAVELK) in an MP environment. (In a UP environment, global spin locks are generally not a problem unless a lockword or interface is overlaid or bad, causing a disabled spin. The enabled locks (local/CMS) are generally the problem ones.) The chapter "Locking" in Section 2 describes in detail the considerations with which you should be concerned. Elements on the CMS/local suspend queues may indicate a problem. The technique you adopt to resolve the conflicts is exactly the same as the ENQ interlock or log-jam situation.

6. Teleprocessing - Determine if the TP network is responding. Problems in the TP network often manifest themselves as waiting network or waiting terminals, even waiting systems. The chapter "TP Problems" in Section 4 contains a detailed description of TP problem analysis. The VTAM and TCAM chapters in Section 5 contain techniques for VTAM and TCAM problem analysis.
An important fact for the problem solver here is that these are subsystems. As such, they maintain their own control blocks, queues, and dispatching mechanisms. They are responsible for work being processed once it enters the subsystem and they often have little direct dependency on MVS. That is, normal MVS problem indicators will not generally solve the problem. You must understand the subsystem's work-processing mechanism in order to be an effective analyst.

For example, VTAM has its own address space with a number of tasks used primarily for network start-up, shut-down, and operator commands. In most VTAM problems, a look at the VTAM address space will show these tasks are waiting. However, this is normal when no operator processing is required. Even though VTAM is waiting, this is not the place to be distracted. Again, remember this VTAM task structure, put it aside as part of your information gathering, and then proceed to the analysis of VTAM's internal work queues as described in the VTAM chapter of Section 5.

7. Console Communications - Determine whether console communication is possible. The system can appear or actually prove to be waiting because the operator is not able to communicate with MVS. This could be the sign of a problem almost anywhere in MVS, but it often indicates an error in the communications task or its associated processing.

The communications task (comm task) runs as a task in the master scheduler's address space and is usually represented by the third TCB in the formatted portion of the stand-alone dump and identified by a X'FD' in the TCBTID field (TCB+ X'EE'). By inspecting the RB structure associated with this task, you can determine the current status. It is not unusual to find one RB with a resume PSW address in the LPA and an RB wait count of one.
If more than one RB is chained from the TCB and you were not able to enter commands, analyze the RB structure because this is not a normal condition.

The key control block is the unit control module (UCM) which is located in the nucleus. CVTCUCB (CVT+X'64') points to the base UCM. The base UCM-4 contains the address of the UCM MCS prefix and the base UCM-8 contains the address of the UCM extension. From the UCM you can determine the status of the various consoles. The following should be considered and can warrant further investigation:

• Important WTORs are outstanding.
• An out-of-buffer (WQEs, OREs) situation exists.
• There are unusual flags in the UCM.
• There is a full-screen condition.
• There is a console out of ready.

Remember that comm task processing is dependent on the rest of the operating system. Most likely, some external service or process has caused comm task to back up, and this possibility should be investigated. Remember the debug process: gather all the facts, then proceed with analysis.

Stage 3: System Analysis

At this point you should have a detailed understanding of the system and its key components. You should know which components or processes are back-logged and, correspondingly, what work (jobs) is not being processed by the system because of these back-logs. You must now stand back from the problem.

Answer this question: Which of these problems and situations can be related to or attributed to each other? For example, if I/O is queued for the paging devices (indicated by IOQEs on the LCHs associated with the paging devices' DCBs) and you also found several address spaces are in "page-fault wait", you can now relate these findings. And if one of these address spaces performed an ENQ for a resource and did not yet DEQ because of the page-fault suspension, it is very likely other address spaces are also backlogged behind this job's processing.
Initially your ENQ/DEQ analysis showed the problem, but at this point you can attribute the ENQ contention problem to the page-fault suspension problem that you have already attributed to the I/O problem.

This process must be repeated for all the potential error situations you uncovered in your investigation. Do not forget to use the system indicators in your attempt to arrive at the source of the problem. And most importantly, ask yourself: If I unplug this bottleneck, will all the other intertwined situations resolve themselves?

In the previous example, resolving the ENQ situation will allow the work queued in the ENQ/DEQ component to execute but the "page-fault waiting" job will still be hung. That is, ENQ/DEQ is not the problem to pursue. Indeed, if you resolve the I/O problem, this page fault is resolved, the DEQ will be performed, and all work in the system will resume normal operation. Yes, the I/O problem is the important consideration in this case. The I/O problem is the one that must be pursued. When this problem is resolved, the enabled wait state condition has been resolved. Global system areas, recovery work areas, LOGREC analysis, and IOS component analysis will be necessary to further isolate, and eventually solve, the problem.

Loops

Loops are defined as disabled or enabled, depending upon their external appearances.

A disabled loop can be recognized externally by a solid system light and the inability to communicate with the system through the consoles (that is, no input or output). Usually, a disabled loop indicates a hardware and/or software malfunction. There are several cases in MVS, however, in which a disabled loop is purposely used and is not an error indication. These cases are discussed later in this chapter.

An enabled loop is generally much larger than a disabled loop.
Observed from the console it appears as a bottleneck: the system seems to be slowing down periodically, suggesting performance degradation. The operator may notice that a particular job remains in the system for a long time and does not terminate.

Common Loop Situations

There are two common loop situations:

1. Two processors of an MP environment communicate via the signal processor (SIGP) instruction. Often the SIGP-issuing processor enters a disabled loop until the receiving processor either accepts the SIGP-caused interrupt or performs the operation requested by the issuing processor. This loop serializes the processors in the MP configuration. The SIGP-issuing processor loops in a nucleus-resident module, IEAVERI. Often during an MP dump analysis you will find that one processor was in this loop. This is not an error if:

• The operator pushed the STOP button on one processor and not the other to investigate a suspected problem.
• The receiving processor disabled for external interrupts thereby preventing the SIGP-issuing processor from proceeding. If this situation continues for an extended period, it means there is a system problem but the loop is a result of that problem and is not an error itself. Most often, the other processor's activities must be analyzed to determine the problem.

For a more detailed discussion of MP communication, refer to the chapter "Effects of MP on Problem Analysis" in Section 2.

2. The lock manager (IEAVELK), which resides in the nucleus and controls the locking mechanism of MVS, contains a section of code that enters a disabled loop when a global spin lock is requested but is not available. On a UP this is an invalid condition and always signifies an overlaid lockword or invalid lockword address. On an MP system, this usually indicates that the other processor is holding the lock and not releasing it.
But it may indicate an overlaid lockword; if not, the problem is definitely on the other processor. In either case, register 11 contains the pointer to the requested lockword and register 14 contains the address of the requestor. Check the value in the lockword. Valid values are a fullword of zeros, or three bytes of zeros and the logical processor address in the fourth byte. Any other bit configuration will cause the system to spin in a disabled loop and signifies an overlaid lockword or invalid lockword address. If the lockword is not valid, it is necessary to identify who overlaid the lockword. It is possible that the lockword was overlaid in conjunction with some other problem. Again, since the disabled loop may not be the problem but a symptom of a possible error on the other processor, determine why the requested lock is not available. For a detailed discussion of "Locking" see Section 2.

Analysis Procedure

Generally for loop analysis, you will have a stand-alone dump if the operator considered the problem serious enough to re-IPL the system, or an SVC dump, SYSUDUMP, SYSMDUMP, or SYSABEND (provided by the software recovery) if the operator pressed the RESTART key in order to break the apparent loop. For the SVC dump, SYSUDUMP, SYSMDUMP, and SYSABEND dumps there is an abnormal completion code of X'071' associated with the looping task of a job if the RESTART key was pressed when the program was actually looping. In addition, a formatted SYS1.LOGREC listing should be available.

Before you can determine what problem is causing the loop, you must determine first that a loop really exists, and second whether it is enabled or disabled.

First, verify that a loop exists. The disabled loop situation is fairly straightforward. The PSW contains a disabled mask (X'40' or X'00') and all other system activity will have stopped. Recognizing that there is an enabled loop is often the most difficult step.
Enabled loops are often quite large and may encompass several distinct operations - I/O events, SVCs, module linkage, etc. Because the loop is enabled, it is often interrupted, pre-empted and eventually resumed many times. This makes it difficult to recognize the loop pattern. Following are some indicators of a potential enabled loop:

• The current PSW has an enabled mask, X'07', in the first byte and the instruction address portion is not equal to 0. This alone does not prove there is a loop, but the information may help your analysis of the problem later.

• The MVS trace table shows a repetitive pattern of events, for example, SVCs issued from the same virtual addresses, or dispatcher entries for virtual addresses that are relatively close together. Determine if the entries are related to the same address space by using the ASID field (offset X'16' into the trace entry). If so, you can now examine the task and control block structure indicated by the trace entries. The chapter on "MVS Trace Analysis" in Section 2 should prove helpful.

• Many tasks (TCBs) or address spaces (ASIDs) appear to be bottlenecked waiting for some resource(s). This can be determined by using the QCBTRACE option for AMDPRDMP and analyzing the output. If there appears to be a bottleneck, determine what job owns the resource(s) and what that job is currently doing. It may be that the job that acquired the resource(s) is in an infinite enabled loop; therefore, when other jobs request the same resource(s), their requests cannot be satisfied, which eventually causes a major performance throughput problem. See the chapter on "System Execution Modes and Status Saving" in Section 2 for how to recreate the job's current status. A reconstruction of the PSW and registers helps you to determine if there was an enabled loop.

• TCB/RB structure analysis. Look for unusual or long RB structures chained from TCBs.
These may indicate a loop that includes several levels of supervisor linkage.

Enabled Loop Exception: The system resources manager (SRM) of MVS constantly monitors resources, gathers data, and analyzes the system. SRM uses a timer interrupt approximately every .4 seconds in order to gather its statistics. This timer interrupt occurs even when the system is in an enabled wait condition. Because of this, the enabled wait is often referred to by operators as an enabled loop. (They observe the "WAIT" indicator from the console, followed by a burst of activity (SRM processing), followed by the "WAIT" indicator, etc. It may even be possible to enter certain operator commands.) However, this is really an enabled wait condition and analysis should proceed according to the discussion on "Enabled Waits" in the "Waits" chapter earlier in this section.

The dump you are analyzing may show the MVS trace table containing a no-work wait (070E0000 00000000) PSW followed by a timer interrupt, SRB dispatch, MP communication, etc. This pattern indicates an enabled wait condition, not an enabled loop. (See the "Pattern Recognition" topic in the "Miscellaneous Debugging Hints" chapter in Section 2.)

Once you have determined the type of loop, the following analysis procedure should help determine what problem is causing the loop.

Objective 1 - Who is looping?

The PSW and registers saved at the time of the dump indicate the active work. (See the chapter "Global System Analysis" in Section 2.) The register save areas in the LCCA/PSA indicate important environmental data at the time of the last I/O interrupt, external interrupt, etc. (See the chapter "System Execution Modes and Status Saving" in Section 2.) The PSA indicators contain valid information about disabled loops. Also remember the recovery areas active at the time of the loop are valid and may provide hints as to the current process.
(See the chapter "Use of Recovery Work Areas" in Section 2.)

Unlike the disabled loop situation, the enabled loop may not have the current registers associated with it. This is true if the loop was interrupted and new processing was initiated before the dump was taken. For the enabled loop, find the current registers and status from the ASCB/ASXB/TCB/RB structure and the associated save areas (for example, IHSA). The chapter "System Execution Modes and Status Saving" in Section 2 will be helpful for this phase.

Objective 2 - What is the system mode?

It is important to know whether the system is in SRB or task mode and the implications of these modes. In all cases of true disabled loops, the PSW, LCCA, and PCCA contain valid status indicators such as the last dispatched routine (PSA+X'300'). The old PSWs reflect the last interrupt status. The register save areas in the LCCA are valid. The LCCA+X'21D' set to 1 indicates SRB mode; set to 0 indicates task mode. The ASCB NEW/OLD and TCB NEW/OLD pointers reflect the current task. (Note: If the TCB OLD pointer is zero, the system is in SRB mode or possibly in supervisor mode - that is, dispatcher or supervisor recovery.) The discussion in the "System Execution Modes and Status Saving" chapter in Section 2 and the "Dispatcher" chapter in Section 5 are useful.

By scanning the MVS trace table, you will be able to determine system events leading up to the loop. See the chapter on "MVS Trace Analysis" in Section 2.

SYS1.LOGREC and the in-storage LOGREC buffer may contain indications of previous occurrences of the loop (records with X'071' completion codes) or records of previous errors in the currently looping process that could possibly contribute to the current loop. See the "Locking" chapter and the discussion on LOGREC in the "Use of Recovery Work Areas" chapter in Section 2.
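When scanning the trace table for evidence of an enabled loop, the thing to look for is a group of entries that keeps recurring at the end of the table. The following is a minimal sketch in Python, not a function of AMDPRDMP or any other service aid. It assumes the trace entries have already been transcribed by hand into (ASID, event, address) tuples; the function name repeating_tail, the tuple layout, and the sample values are hypothetical.

```python
def repeating_tail(entries, min_repeats=3):
    """Return the shortest group of entries that repeats back to back at
    least min_repeats times at the end of the list, or None if there is none."""
    n = len(entries)
    for period in range(1, n // min_repeats + 1):
        tail = entries[n - period * min_repeats:]
        cycle = tail[:period]
        # The tail must consist of the candidate cycle repeated exactly.
        if all(tail[i] == cycle[i % period] for i in range(len(tail))):
            return cycle
    return None


# Hypothetical transcription: one I/O entry, then the same SVC and dispatch
# entries recurring for ASID 9.
trace = [
    (9, "IO", 0x0001A2),
    (9, "SVC 0A", 0x7F1000),
    (9, "DSP", 0x7F1040),
    (9, "SVC 0A", 0x7F1000),
    (9, "DSP", 0x7F1040),
    (9, "SVC 0A", 0x7F1000),
    (9, "DSP", 0x7F1040),
]
cycle = repeating_tail(trace)
```

A repeated group found this way is only a candidate. The SRM timer pattern described under "Enabled Loop Exception" also repeats, so the entries in the cycle must still be examined to decide whether they represent a loop or an enabled wait.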
Objective 3 - What is the extent of the loop and why is the system looping?

Using the current PSW and the global data areas in combination with the general purpose registers, you should be able to determine the extent of the loop. One register often contains the key to a loop-causing value. Try to isolate that one register. It may be necessary to inspect the actual object/source code to determine the basic logic in case there is an encoded loop that is supposed to end when a certain value is reached. If that value cannot be reached for some reason, the loop will not end.

Isolating the cause of the loop is important in loop analysis. Once the cause is isolated, you can proceed the same as with a system-detected error such as a program check.

Objective 4 - Determine the cause of the error - how is the value that is causing the loop developed?

To determine how the bad value was developed, it is necessary to back through the logic leading to the loop. Be aware of bad control blocks. Look at the bad value itself and the areas from which it was developed. Try to determine if the value is the result of a storage overlay or if it was calculated from bad logic. See the "Pattern Recognition" topic in the "Miscellaneous Debugging Hints" chapter of Section 2 to help make this determination.

In addition to bad control blocks and data fields, consider the loop control mechanism used for encoded loops. Often a common cause of problems is that the BCT instruction is used and the loop control register contains a negative value. Scanning the active registers at the time of the dump often aids in discovering this type of problem.

Figuring out how the erroneous field could possibly contain the value it does is the most challenging part of the process. Again, the contents of the field often provide the clue to determining the error-causing process.
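The BCT hazard mentioned above can be made concrete with a little arithmetic. BCT subtracts one from the register and branches while the result is non-zero, so a loop entered with zero or a negative count does not end promptly; it runs until the 32-bit register wraps around to zero. The following Python sketch is an illustration only: the function names are hypothetical, and the small simulator uses a narrow register width purely so that it finishes quickly.

```python
WORD = 1 << 32  # number of values a 32-bit general register can hold


def bct_iterations(initial, modulus=WORD):
    """Times the loop body runs when BCT closes the loop and the count
    register holds `initial` on entry (register arithmetic wraps)."""
    n = initial % modulus
    return n if n != 0 else modulus


def simulate_bct(initial, bits=8):
    """Step-by-step model of the same loop on a register `bits` wide."""
    modulus = 1 << bits
    reg = initial % modulus
    count = 0
    while True:
        count += 1                 # the loop body executes once
        reg = (reg - 1) % modulus  # BCT subtracts one from the register
        if reg == 0:               # a zero result falls through, ending the loop
            return count
```

For example, bct_iterations(3) is 3, while bct_iterations(-1) is 4294967295. A count register that looks like a small negative number in the dump therefore points to a loop that is, for practical purposes, endless.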
Also, consider how serialization is accomplished for the field in question. Is it possible for both processors to be updating the field simultaneously? The MVS trace helps you recreate recent processes, but you also must understand the modes and structure of the code that updates the field. (Your work in Objective 2 should be helpful.) It is possible that the code setting up the field was physically interrupted and, because it was non-reentrant or the logic was faulty, another process updated the field or control fields and subsequently caused the first process to encounter unexpected data.

TP Problems

A common problem in teleprocessing (TP) environments is incorrect data, which may affect one terminal or an entire component. The symptoms include no data, wrong data, or too much data, but the general problem symptom is that something is wrong with one or more messages. The problem is usually not tied directly to a component or access method, as a program check would be; often an error message is from a component not directly causing the problem. Typical symptoms are:

• An error response from an application that suggests incorrect data was entered from a terminal, when in fact the data was correct
• A "hung" terminal - the system will not respond
• Wait states, in which message traffic gradually dies off
• Incorrect characters in a message (the data may be going into or out of the system)

This chapter discusses TP problem analysis, in the following topics:

• Message Flow Through the System
• Types of Traces
• Trace Output Under Normal Conditions

Message Flow Through the System

Data exchanged between programs in the system and terminals follows a route through several components. The first step in solving "typical" problems is to determine where along that route something is happening incorrectly.
By far the most valuable tools for doing this are the traces in the various components. To use the traces effectively it is necessary to understand how messages flow through the system.

For example, consider a message from a TCAM application to a TCAM terminal. The path might go from the application program buffer to a TCAM queue data set, to a TCAM buffer while TCAM processes it, over the channel into the 3705, then into an NCP buffer, and finally over a communications line to a terminal. Traces allow the message to be checkpointed at certain spots along the path; therefore, understanding the path is vital to knowing what traces to use and what you should see for a message that flows correctly.

To use traces effectively you must also understand how components refer to terminals or lines and how they communicate with each other regarding these terminals and lines. Terminals or lines are identified in traces by a line control block (LCB), a logical name, a network address, a polling/addressing sequence, or a subchannel address. Not only must these relationships be known in order to use multiple traces, but certain correspondences must be correct in order for data to move through the network.

When using traces, the general approach to a problem of incorrect data is to track the data flow from a point where everything was all right to a point where the messages stopped or were incorrect. Messages that are flowing correctly can be used to establish time relationships between different traces. Then each message can be followed along its route past each checkpoint, with the goal of isolating a gap between two checkpoints where the message stopped or became bad.

The next step is to focus on this narrower area to learn what is wrong. If a message stops, what is wrong or what is missing? How does the flow up to that point compare with a normal flow?
You must understand what resources and what processes are required for a message to move from where it appeared last to where it should have appeared next. What buffers and/or control blocks would have been used? Were they available? A single terminal or all terminals may encounter a "wait state", and it is necessary to dig into the component to determine what processing has taken place and what condition or resource is preventing further processing. The TCAM Debugging Guide should be referenced for problem isolation in TCAM.

If a part of the data moving through the system becomes bad, the traces should isolate a component or an interface over which it was transformed. Comparison with normal message flow will indicate whether any change at all should have taken place. If no change should have occurred, an overlay ("clobber") or incorrect pointers to data buffers may be the problem. The exact amount and positioning of bad data should be determined, for it might provide an obvious correlation with other known variables such as a buffer length. If some transformation normally occurs to a message, the controlling process under which it is performed must be examined. What could cause an incorrect transformation of data? Examples are translate/edit tables or mappings from one resource name into another name, such as mapping the logical network name into the network address.

Types of Traces

The following is a summary of EP and NCP mode traces and their relationships to each other.
For more information on these traces refer to:

• IBM 3704 and 3705 Communication Controllers Emulation Program Generation and Utilities Guide and Reference Manual
• IBM 3704 and 3705 Communication Controllers Network Control Program/VS Generation and Utilities Guide and Reference Manual
• OS/VS2 MVS VTAM Debugging Guide
• IBM 3704 and 3705 Program Reference Handbooks
• OS/VS2 TCAM System Programmer's Guide, Level 10
• OS/VS TCAM Debugging Guide, Level 10

Figure 4-1 shows a summary of EP and NCP mode traces.

(Figure 4-1 is a block diagram. For EP mode it shows a BTAM or TCAM application, IOS, the EP subchannels, and a 3705 running EP or PEP, labeled with the TCAM I/O trace, the GTF SIO and I/O trace, and the EP line trace at level 3 and level 2. For NCP mode it shows the application, TCAM, IOS, the native subchannels, and the 3705, labeled with the TCAM dispatcher subtask trace, the TCAM buffer trace, the PIU trace, the GTF SIO and I/O trace, the NCP channel adapter trace, and the NCP line trace.)

Figure 4-1. Summary of EP and NCP Mode Traces

EP Mode Traces

The following describes the six EP mode traces:

1. 3705 Emulator Program line trace - Any or all emulator subchannels can be traced in the 3705. Each character over a line (level 2 interrupt) and/or each interaction with the channel (level 3 interrupt) for data or status transfer can be traced. The trace is activated via the 3705 panel or via the EP (emulator program) service aid, Dynadump. Trace data is retrieved from a 3705 dump or via Dynadump. All of the 3705 storage above the relatively small emulator program itself is used for a trace table (this can be a significant amount in a large 3705). If a problem can be isolated to one or two lines and can be detected quickly, an in-storage trace (in the 3705) is usually sufficient. The trace can be stopped before critical information is lost because of wrapping.
If all lines must be traced, if there are a good number of high-speed lines with constant activity such as polling, or if the problem is not externally detectable, then Dynadump should be used to dump trace data as it is created.

2. PEP Emulator line trace - This traces EP lines in a PEP (partitioned emulator program). The size of the trace table is set at PEP generation time and is fixed. The size of the trace table compared to the amount of storage used for tracing in an EP makes wrapping a much more serious problem. Dynadump is extremely useful when tracing EP lines in a PEP.

3. GTF I/O and SIO trace - All emulator subchannels can be traced. SIOs, I/O interrupts, CAWs, CSWs, and SIO condition codes are traced. No CCWs or data are traced. Data can be traced to an external file and selectively reduced by time and subchannel address. SIOs and I/O interrupts are directly correlated with channel activity seen in a 3705 EP line trace. Refer to OS/VS2 System Programming Library: Service Aids for details on activating and printing this trace.

4. TCAM EP mode line I/O interrupt trace table - This traces I/O interrupts on EP or 2701/2/3 lines. The TCAM INTRO macro or the operator specifies the response at TCAM startup.

5. TCAM dispatcher subtask trace - This records the flow of all TCAM-dispatched elements. It is specified, as is the TCAM I/O trace, by the INTRO macro or by operator responses. It is useful in determining the flow of activity within TCAM. It contains a history of TCAM buffers, LCBs, QCBs, etc., that are processed by the various TCAM subtasks.

6. TCAM buffer trace - This traces buffers for specified lines as they are processed by a message handler (MH). The lines are the same lines as those specified by the I/O interrupt trace or TPIO trace.

NCP Mode Traces

The following describes the NCP mode traces:

1. NCP line trace - This traces all characters on a given TP line (level 2 interrupt) in the 3705. Only one line can be traced at a time. There is one four-byte entry for each character interrupt in the 3705, including a one-byte time field. The time field wraps after 25.5 seconds, but it is useful for seeing delays and when time-out periods expire. This trace is controlled through access method operator commands. Data is shipped to VTAM from the NCP and traced via GTF, so GTF must be active for the USR trace when line trace data is required.

2. NCP channel adapter trace - This traces the NCP's interaction with the channel for status and data transfer. There is one 32-byte entry for each level 3 NCP channel adapter interrupt. The data stays in a static trace table within the 3705 and is retrieved via a 3705 dump. The trace option is included in an NCP by altering the TRACE= operand in the SYSCG006 macro (in SYS1.MAC3705). Refer to the section "Channel Adapter Trace" in IBM 3704 and 3705 Communications Controller Network Control Program/VS Logic, and to the 3704/3705 Program Reference Handbook. If the NCP uses a Type 1 channel adapter, the trace includes all data transferred over the channel. If the NCP uses a Type 2 or Type 3 channel adapter (CA), the trace does not include data. The trace is most useful for problems involving NCP ABENDs where the last activity on the channel is of interest. Because the trace wraps very fast and cannot be written out dynamically, it is not useful for finding what happened over a period of time.

3. GTF SIO and I/O trace - This traces SIO and I/O interrupts for a 3705. Its main value is in showing what SIOs and I/O interrupts take place in relation to the RNIO trace (which is discussed next). The RNIO trace shows data (PIUs) in and out. The SIO and I/O traces can then show attention interrupts, how many PIUs were transferred at a time, and whether there are problems in timing or in "coat-tailing." Coat-tailing is the ability to bring PIUs into the system from the 3705 on the same I/O operation used to write other data out to the 3705.

4. GTF RNIO trace, or VTAM I/O trace - This shows the header and first few bytes of data of each PIU coming into or going out of VTAM. It is sent to GTF under the RNIO trace option. Each entry is time-stamped. The tracing is done in channel-end appendages as soon as the I/O operation that transfers the data is complete.

5. VTAM buffer trace - This is a VTAM trace sent to GTF under the USR trace option. The trace is performed at two points in VTAM.

a) TPIOS. Traces are labeled TPIOS IN/OUT REMOTE. Application jobname, destination and origin node names, feedback data block (FDB), feedback status block (FSB) for input operations, PIU header and text are traced. For TPIOS OUT REMOTE entries, the transmission header is not exactly as it will appear when I/O is finally performed. Sequence number and length fields, plus some other bits in the TH (transmission header) may be filled in after the TPIOS trace. Use the RNIO trace to see exactly how the PIU header was sent to the 3705.

b) Control Layer. Traces are labeled C/L IN and C/L OUT. Application jobname, destination and origin node names, and text are traced approximately at the time of the transfer of a message from the application's buffer to VTAM's buffer, and vice versa.

Note: VTAM traces such as I/O trace (RNIO) and buffer trace (USR) must be started and stopped by a VTAM operator command for each terminal or node in the network that is to be traced. There is no higher level operation available.

6. TCAM dispatcher subtask trace - This trace is described in item 5 under "EP Mode Traces" earlier in this chapter.

7. TCAM buffer trace - This trace is similar to the TCAM buffer trace for EP lines described at the end of the section "EP Mode Traces" earlier in this chapter.

8. TCAM PIU trace - This traces path information units (PIUs) for a line, a line group, or an NCP.

Trace Output Under Normal Conditions

The following sections illustrate how some normal situations are seen in traces. Understanding normal processing cannot be over-emphasized, for it is often a comparison between traces such as these and a trace of an error that reveals the key to a problem.

Example 1: VTAM I/O Trace

The first example (Figure 4-2) shows only a VTAM I/O trace (RNIO) for:

1. the activation of a PEP (it is incidental that it's a PEP),
2. the activation and connection of a 3600 logical unit,
3. data exchange between an application and the LU, then
4. disconnection and deactivation of the LU and the PEP.

Trace Entries:

• 1 - 9 - show the PEP activation and initialization.
• 10 - 15 - are line-activates for start-stop lines.
• 16 - 21 - are line-activates for some BSC lines.
• 24 - 32 - show the activation of the link, controller, and LU. Operator VARY commands are used to activate a 3600 cluster controller and logical unit. Unlike old-device lines, the SDLC link is not activated until a device on the link needs to be activated. An application program is then started that opens (via OPNDST) the LU.
• 33 - 37 - show both the connection and first message to the LU.
• 38 - 42 - show another link activation and controller contact; the controller and its LUs are not being traced, however, so none of the subsequent activation is shown. (Every node to be traced must have a VTAM command issued.) The PEP itself is being RNIO traced, which is the reason its activation is shown.
• 43 - 52 - show data exchange between the LU and the application.
• 53 - 56 - show the disconnection from issuing a CLSDST.
• 57 - end - show the PEP deactivation of the SNA devices, the S/S and BSC lines, and the PEP itself.
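When reading RNIO entries such as those listed above, it can help to split the leading header words into fields so that a request and its response can be paired. The following Python sketch does this under a stated assumption: that the first ten bytes of each entry are an SNA FID1 transmission header, with the destination address in bytes 2-3, the origin address in bytes 4-5, the sequence number in bytes 6-7, and the data count in bytes 8-9. That layout is not spelled out in this chapter and should be confirmed against the 3704/3705 Program Reference Handbook. The names `decode_th` and `is_reply_to` are hypothetical. The two sample strings are the header words of an adjacent OUT/IN pair from the PEP activation portion of Figure 4-2.

```python
from dataclasses import dataclass


@dataclass
class TransmissionHeader:
    fid: int  # format identifier (high-order four bits of byte 0)
    daf: int  # destination address field
    oaf: int  # origin address field
    snf: int  # sequence number field
    dcf: int  # data count field (bytes that follow the header)


def decode_th(hex_words: str) -> TransmissionHeader:
    """Decode the first ten bytes of an RNIO entry as a FID1 header.

    The field layout used here is an assumption of this sketch.
    """
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    if len(raw) < 10:
        raise ValueError("at least ten header bytes are required")
    return TransmissionHeader(
        fid=raw[0] >> 4,
        daf=int.from_bytes(raw[2:4], "big"),
        oaf=int.from_bytes(raw[4:6], "big"),
        snf=int.from_bytes(raw[6:8], "big"),
        dcf=int.from_bytes(raw[8:10], "big"),
    )


def is_reply_to(request: TransmissionHeader, reply: TransmissionHeader) -> bool:
    """True when the addresses are swapped and the sequence numbers match."""
    return (
        request.daf == reply.oaf
        and request.oaf == reply.daf
        and request.snf == reply.snf
    )


if __name__ == "__main__":
    out_entry = decode_th("1F002000 10000002 00046B80 00A0")
    in_entry = decode_th("1F001000 20000002 0004FB80 00A0")
    print(out_entry)
    print(in_entry)
    print("paired:", is_reply_to(out_entry, in_entry))
```

Pairing entries this way is only a reading aid; an OUT entry with no matching IN entry is a starting point for investigation, not a diagnosis.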
Note entries 8 - 9 and 22 - 23 in Figure 4-2: this trace was made with a level 3.0 PEP, which does not support VTAM's attempt to alter the channel attention delay; therefore, error responses are received. VTAM, through the NCP or PEP, immediately activates all old-device lines (S/S and BSC) in NCP mode. The SDLC devices are system generated with ISTATUS=INACTIVE and therefore no SDLC links are activated. (The VTAM message IST...I PEP736A ACTIVE would have appeared on the console after trace entry 23.)

(Figure 4-2 is a GTF external trace listing, headed EXTERNAL TRACE - DD TAPE, for day 168 of 1975, beginning at 16.30.32.113950. It consists of numbered RNIO entries. Each entry shows the ASCB address, the CPU, the job name (VTAM or TRAC3600), a time stamp, the direction (IN or OUT), and the PIU header and first data bytes in hexadecimal.)

Figure 4-2. VTAM I/O Trace Example (Parts 1 through 4 of 4)
Example 2: VTAM and GTF Traces

The second example (Figure 4-3) shows all of the VTAM-GTF traces for parts of the process shown in the previous example. The TPIOS buffer trace, control layer (C/L) buffer trace, RNIO trace, and NCP line trace are illustrated in the order they occur.

Trace Entries:

• 1 - 17 - show the PEP activation, etc.
• 18 - 29 - are activations for start-stop lines.
• 30 - 55 - show the SDLC link activation; they also show controller and LU activation and connection.
• 56 - is the first C/L record, the first data received from the application.
• 56 - 76 - show the message flow between the application and the LU.
• 77 - 93 - show the NCP line trace in relation to the data in the PIUs. The exact placement of line trace records relative to RNIO and buffer trace records cannot be depended upon; in general, most or all of the line trace that shows receipt of a message in the NCP will precede the inbound host traces for that message. The RNIO records are omitted from this section of the trace: there should be RNIO records for each PIU, plus one for each line trace entry showing a buffer of line trace data coming into VTAM from the NCP.
• 84 - 122 - show the last data exchanges, the disconnection and deactivation of the 3600, and deactivation of the SDLC link.

The RNIO trace entries in this example can be matched exactly with the entries of the previous example. This example shows how messages can be followed through the network.

The GTF I/O and SIO traces are not shown in this example. Running a single terminal at a low message rate, as in this example, would cause almost every RNIO trace to be preceded by several I/O - SIO entries.
The sequences are usually as follows:

Outbound:
GTF SIO
GTF I/O - CE, DE
GTF RNIO OUT

Inbound:
GTF I/O - ATTN
GTF SIO
GTF I/O - CE, DE, UE
GTF RNIO IN

All SIO and I/O entries are for the native subchannel address of the 3705. With more message traffic, different sequences may be seen. Very few problems have occurred in the area of the VTAM-NCP channel interface, but a brief explanation is included to avoid confusion. The following GTF I/O, SIO, and RNIO sequences are possible:

1. Coat-tailing - data is returned from the NCP on the same I/O operation used to send data out, as follows:

GTF SIO (WRITE CCWs with READ CCW appended)
GTF I/O CE, DE, UE (UE because some but not all READ CCWs were used)
RNIO IN
RNIO IN
RNIO OUT

The number of PIUs transferred out and in can vary. If the maximum number of PIUs is sent in (see the MAXBFRU operand of the HOST macro in NCP generation), then the I/O interruption has a status of just CE, DE.

2. More than the maximum number of PIUs were in the NCP ready to be sent to VTAM (assumes the maximum is 3):

GTF I/O ATTN
GTF SIO (READ CCW string)
GTF I/O CE, DE, ATTN
RNIO IN
RNIO IN
RNIO IN
GTF SIO (READ CCW string)
GTF I/O CE, DE, UE
RNIO IN

The first read operation ends normally with an attention, indicating more data in the NCP. This avoids an extra interrupt just to present an attention.
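The ending statuses in the inbound sequences above follow a simple rule, which the following Python sketch restates as a toy model. It is not a description of the actual channel program. The function name `inbound_reads` and the parameter `max_pius` (standing in for the maximum number of PIUs per read that the text relates to MAXBFRU) are assumptions made for illustration.

```python
from typing import List, Tuple


def inbound_reads(pius_waiting: int, max_pius: int = 3) -> List[Tuple[int, Tuple[str, ...]]]:
    """Model the read operations needed to bring in the PIUs waiting in the NCP.

    Returns one (pius_read, ending_status) tuple per read operation:
      CE, DE, ATTN - the read was filled and more data remains in the NCP
      CE, DE       - exactly the maximum number of PIUs came in, none remain
      CE, DE, UE   - some but not all READ CCWs were used
    """
    if pius_waiting < 1 or max_pius < 1:
        raise ValueError("both counts must be at least 1")
    reads = []
    remaining = pius_waiting
    while remaining > 0:
        count = min(remaining, max_pius)
        remaining -= count
        if remaining > 0:
            status = ("CE", "DE", "ATTN")
        elif count == max_pius:
            status = ("CE", "DE")
        else:
            status = ("CE", "DE", "UE")
        reads.append((count, status))
    return reads


if __name__ == "__main__":
    # Four PIUs waiting with a maximum of three: the second case in the text.
    for count, status in inbound_reads(4, 3):
        print(count, "PIU(s) read, ending status", ", ".join(status))
```

With four PIUs waiting and a maximum of three, the model gives a first read ending with CE, DE, ATTN and a second read ending with CE, DE, UE, which is the sequence shown in item 2.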
(Figure 4-3 is a GTF external trace listing, headed EXTERNAL TRACE - DD TAPE, for day 168 of 1975. RNIO entries are interleaved with USR entries (USRFD FEF) for the TPIOS IN/OUT REMOTE and C/L IN/OUT buffer traces. The USR entries show the job name, the ANODE and DNODE node names, and the FDB, FSB, THRH, and TEXT fields in hexadecimal, with the text also shown in character form.)

Figure 4-3. VTAM and GTF Traces (Parts 1 through 6 of 13)
33;" '163 TJM:; - - - .JUblli TRt.C3cfO AN(lCf PIVT3le;) N !3 - - EXTFF:NAL TP,AU - DC TAPI ~OCCDOOC (OB670FG rOb~GOOO LNC2 RSVu 0000 lC0021l[l li':Cll"'''CO ; Of}G3'-'0 CO TEXT C4LIE3Cl ~OCZCIE~ t54CC187 0703(9C3 Clt:3C9D6 054CDCfC5 £.287Db(\5 !'2Ce; /tCDlo C~E2f2C 1 C7C54040 404~4(40 ~c4t4r4C 404C4n~C 4n404G4C 4P4J4C40 404t~C4n 40404(40 4040404( 40404C4C 40404040 404(4C40 404 r 4C4C HiE ~?7:J79 LCe LCD LCu LCD LCD UUT It 0 0211D TRAC~~0~ 9 '1 9 9 9 lIMf42085.93Y817 78 USRFD FF2 ASCb rCFOCD6C JOhN VTAM ut-.(,DE P~P73bA LHH LCD 9 LCD 9 LCU 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LC[j 9 TIME 42085.942128 79 USRF~ FF2 ASCB rOFDCD60 JDBN VTAM LINE uNDO£: Pl:-P736A LCD 9 LCD 9 LCD 9 LC[! 9 LCD 9 LC[I 9 LCD 9 LCD 9 LCD 9 LCD 9 TIME 42(-85.942605 80USRFD FF2 ASCB OOFDC060 JOBN VTAM ONODE PE:P73tA LINE LCD 9 LCD 9 LCD 9 LCD 9 LC[) 9 TIME42085.943443 Figure 4-3. VTAM and GTF Traces (Part 7 of 13) (:(lA4 ~,~Vl. C00C~~OO ~OCOtCOQ nl~:t-: '" C058:390 Or(4C}E3 C140('2Ll - f'CF q P(.F Q I-'CF " f'CF 9 PCf 5 TIME TlMf TIME TIME TlMf [P 6 7 1 7 7 7 1 7 PCf- 1 PCF 7 00 TIME TIMF TIME TIME TIME TIME TIME TIME TIME TIME PCf PCf f'CF PCF PCF Pc.F PCF PCf l~nlG004 *OA1A ~~Sl A~PLI(ATIO* *N kE~PON~~ HES~~lt: * ** * >I' * PDf 7F P[~F Cl PDF 33 PDF 1F PDF FF LCu LCD LCD LCD 9 9 9 9 PCF 9 PCF ,Q PC': 9 PCF 9 TIME TIMF TIME TIME 07 01 07 01 ~('f "'5 SCf 40 SCF 40 SCF 1t5 PDF PDF PDF PDF 71:; Fl 3A FF 09 09 C9 TIME 00 SCF CD SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 PDF PDF PDF PDF PDF PDF PDF PDF f'OF PDF CO 01 OC 90 C9 08 C9 E8 LtD LCD LCD LCD LCD LCD LCD LCD LCD 9 9 9 9 9 9 9 9 9 PCF PCF PCF PCF PCF PCF PCF PCF PCF 7 1 1 1 1 7 7 7 7 TIME TIME TIME TIME TIME' TIME TIME TIME TIME 09 09 09 09 09 09 09 09 09 SCF SCF SCF SCF SCF SCF SCF SCF SCF 49 49 49 49 49 49 49 49 49 PDF PDF PDF PDF PDF PDF PDF PDF PDF C'l 2C 01 00 03 CO D5 f4 09 07 07 07 01 01 09 09 09 09 09 09 (i9 ~CF 4~ SCf SCF SCF SCF 40 40 45 45 FF !:.E PCF f'CF PCF PCF PCF PCF PCF PCF PCF PCF 7 1 1 7 1 7 7 7 6 9 
TIME 09 TIME 09 TIME 09 TIME OA TIME OA TIME OA TIME OA TIMl OA TIME OA TIME,OA TIME 08 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 49 SCF 00 SCF 45 PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF 40 F6 FO FO F1 E4 C3 CB 85 7f LCD LCD LCD LCD LCD LCD LCD LCD LCD 9 9 9 9 9 9 9 9 9 PCF PCF PCF PCf PCF PCF PCF PCF PCF 7 7 7 7 7 7 7 7 9 TIME TIME TIME TIME TIME TIME TIME TIMe; TIME 09 09 OA OA OA OA OA OA OA SCF SCF SCF SCF SCF SCF SCF SCF SCF 49 4'1 49 49 49 49 49 49 45 PDF PDF, PDF PDF PDF PDF PDF PDF PDF F3 FO 40 H 40 FI FI 85 00 PCF PCF PCF PCF PCF [P 9 9 9 9 5 00 TIME TIME: TIME TIME TIME TIME' OB SCF 45 SCF 40 SCF 40 SCF 45 SCf 45 PDF PDF PDF PDF PDF 7E CI 3D 7E FF LCD lCO LCD LCD 9 9 9 9 PCF PCF PCF PCF 9 9 9 9 TIME TIME TIME TIME OA OA OA OA SCF SCF SCF SCf 45 40 40 45 PDF P.OF PDF PDF 1E 11 00 FF f:P 00 OA OA OA OA OA - - -::-J a~ a fIj .() o s·a ~ ~ "",.....,., .-, ~XTERNAL TRACE - DO TAPE 81 USRFD FEF ASCS 00FE4390 JObN TRAC3600 TPIOS IN ANODE PIVT360C FOB REMOTE ONOOE U1CllOE7 F~B THRH TEXT TIME 42085.948765 USRFO FF1 ASCB 00FE4390 JOE:;N TRAC3600 ell IN ANODE PIVT3600 TEXT ONOOE UIC'1l0E7 TIME 42085.956048 . USRFO FF1 ASCB 00~E4390 JOBN TRAC3600 C/l OUT ANODE PIVT3000 TEXT ONODE UICILOf7 00000000 00B67369 00150000 RSVD 0812 lNG2 OOCO 022COOOO 00000000 10012110 OOOCOOOO 00000000 OOOOOOO'\) oooooooe 00180000 1COOI001' 2110(100C-00180390 OC C90508E4 C9D9E840 F3FbFOFO 40FOFIF1 4OE4FIC3 *INQUIRY 3bOO Oil UIC* FI *1 * C9D5v8E4 C909f84C F3F6FOFO 40FOF1Fl 40E4FlC3 Fl *INQUIRY 3600 011 UIC* *1 * C4CIE3CI 0540D9C5 40404040 4C404040 *OATA BASE APPlICATIO* *N RESPONSE ME:SSAGE * * * * * 40C2Cl£:2 E207D605 4('404040 40404040 C540CI07 E2C540D4 40404040 4C40404r 0703C9C3 C!1E2E:2Cl 40404040 404n4C4C C1E3C90b C7C54040 404C4040 40404040 82 ff (') ;to 8 ~ ~ a a .... "I:S ·0 ~ 'e?, "< ~. 
flO) .t' "I:S USRFD FEF ~sce OOFF4390 JOSN TRAC36fO TPIOS OUT ANODE PIVT3600 FOb 00000000 00867348 C06~00OO REMOTE DNODE UICILOE:7 THRH IC00211D 10010000 00000390 TEXT C4Clf3Cl 40C2tlE2 C540C1D7 054009C5 E2070605,E2C540D4 40404040 4040404C 40404'::40 40404·040 404(14040 40404041) 4040404~ 4r4"4040 TIME 42085.970723 83 USRFD FF2 ASCS OtFDCD60 JOEN VTAM DNODE: PFP7:;;6A LIM E:P co TIME CF LCD 9 po- 6 lIME DC SC,F 00 POF fF Lce 9 H.F 7 TIME UC SCF 49 PDF 51 LCfI 9 PCF 7 TIME OC StF 49 PDF 9F LCD 9 pet- <.1 TIME OC SCF 45 PDF 00 LtD 9 PCF 'i TIME OC S(.F 45 PDF 7E LCD 9 PCF S- TIME cc StF 40 PCF Cl LCD 9 PtF 9 TIME DC SCF 40 PDF 2A LCC 9 f'CF 'i TIME OC . SCF 40 PDF 01 LC.I. 9 PCF- 9 TIME OC SCF 40 PI)F 00 Leo 9 peF 9 TIME (-(. S('F 4C PDF 03 TIME 420&6.537157 84 USRFD FF2 .ASCS CC.FOC[lb0 JOBN VTAM LINE UNODE PfP736A E P 00 TIMI: Of: LC[) 9 I-'CF C) TIME OC S(.F 4C PDF 90 LCL 9 P(.F 9 TJ"'E nc SC.F 40 POF C~ LCf) 9 PCF 9 TIME OC SCF 40 PLF E3 LCD 9 P('F C; TIME: OQ SCF 4Ci PDF 40 LCD 9 PCF CI TIM( O[} SCF 4() P[}F C1 LC[: 9 PCF 7' TIME C·D SCF 4(\ PDF C5 LC!) 9 H.F .,.. TIMt cn SCF 40 ~DF Cl LCu <;, PCF 9 TIME- 0[1 SCF 40 PDF 07 LCu 9 PCF 9 UMf:' 0[: SCF .. 0 pt;.F e9 U .. r, '1 TIME (o[) SCF 40 PDF Cl PC.F ':' 11M[ 42(·!<0.537[;43 a~ ~ ~ ..... CO 0703C9C3 C5E2E2Cl 4040404Q 41)4(4040 LNG2 00A4 CIE3C906 C7C54040 40404040 404()4C40 ~ 00000000 OOOCOOOO 9 9 9 9 9 9 9 9 9 PCF PCF PC'-: PcrPCF PCF PCF PCF PCF 7 7 6 9 9 9 9 9 9 TIME TIME TIME TIMf. TIME TIME HME TIMf: TIME DC DC ct (lC DC Ot OC DC LCD L(.9 L(.I:; ·LCD lCD LCD LCD LCD LCD 9 9 9 9 9 9 9 9 9 PCJ-= PCF PCF PCF peF PCF PCF PCF PCF 9 9 9 9 9 9 9 9 TIMl: TIME TIMl: TIMl: UMF TIM!: TIM£. TIM£; TIME OC 0(, oe OD OD Of) 00 DC O~ OD SCF SCF SCF SCF SCF SCF SCF SCF SCF 49 49 00 45 45 40 'to 40 4C PDF PDF PDF PDF PDF PDF PDF PDF PDF CI 39 9f 7E 7E 04 CO rl 00 S('F S('F S('F ::'CF SCF SCF SCF SCF SCF- 40 40 40 PDF PDF PDF Pl.F P[;F PDFPDFPDF PDF- riO Cl Cl C2 E2 4(1 40 '40 40 40 4f) Figure 4-3. 
VTAM and GTF Traces (part 8 of 13) . .." .." g. *OATA BASE APPLICATIO* *N RESPONSE MESSAGE *. * * * * * * LCO LCD LCD LCD LCD LCD LCD LCD LCD Cj RSVD ;'" a fIJ .() -;s·= Q -Q. ~o li7 03 C3 - =- ~ RSVD 0000 -- ~ W N N 0 rn "< rn N rn '< fI> S!3 "'CI Jot 0 ~ r» = !3 er OQ ~ ; '3 a:: -< .rn ~ ~0 !. n ~ (D n e.=- .c c: (D fI> I, EXT~kNAL c, 85 USRFD FF2ASCB CO~OtOt:O JGbN VTAM LINE Dr'WDl PEP736A LtC. 9 LCD 9 LtL 9 LCD 9 LCV 9 LtD 9 LCD 9. LCD 9 LC[ c;. lIME 86 USRFD FF2 [P OC 'i TIME: TIME TIME TlMl TIME TIME TIME TIME. TIME Cf P(.F PtF P(F PCF PCF P(.F 9 9 q 9 9 '-I 9 TIMf TIME: TIME TIME TIME TIME lIME T J~I: 11 SCF 40 SCF 40 SCF 40 SCF 45 SCF 45 SCF 40 SCF 40 ~CF 45 SCf 00 SCF 4'1 TIMl 11 SCF 49 SCf- DO SCF 45 SCF 4!> SCF 4() set- 40 SCF 40 SCF 40 SCF 40 SCF 40 PDF C1 PDF 01 PDF 1£ FDF 1£ P~)F 11 I'DF DO PDF F-F PlJF ~F PDF 71 SCF SCF SCF SCF S(.F SCF SCF ~ ." ." ... 0 ~ tD a --.... fIJ () 0 == 5· PDF 3B PDF PUF F['F t'CF PDF PDF bE 71: 7~ (:6 oa '01 PDF Or\ V()F 4') PDF 40 lIME 11 S('F 40 PDF 40 SCF 40 . PDF 40 TIM!: 11 11 11 11 11 11 11 P[;F E2 LCD LC[' Lcr. LCl; LCD LCD LCD LCD LCD 9 9 4 9 9 PCF 7 PCF 9 PCF 9 PC~ 9 PCF 9 PCF 9 PCF 9 PCf- q PCF 9 TIM.., TIME TIME TIME TIME TIME TIMf: TIME TIMf 10 10 10 10 10 10 10 lC' 10 SCF 49 SCF 45 SCF 45 SCF 40 StF 40 SCF40 SCF 40 ~CF 4(1 SCF 40 P[)F PDF PDF PDF PDF PDF PDF PDF PDF LCD 9 PCF 9 TIME. 10 SCF 40 PDF 40 LCD LCD LCD LCO PCF PCF PCF PCF PCF PCF PCF TIME 11 SCF SCF SCF SCf SCF $CF PDF PDF PDF P[IF PDF P[)F PDF q 9 9 <1 = tD b£: OC 11: Cl 26 l'l 00 t5 40 Q. '-' 13 40 40 40 40 40 40 4CJ PDF PDFPDF PDF PDF PDF f>OF 40 40 4(1 40 40 40 40 9 9 9 9 lee 9 LCD 9 LCD 9 9 9 9 9 9 9 9 TIME 11 TIME 11 TIMF 11 lIME,l1 TIME 11 TlMf 11 ~CF 40 .40 40 40 40 40 40 ~o 40 4'0 40 40 40 40 .Figure 4-3. VTAM and GTF Traces (part 9 of 13) ~ ~ -~ '3 fXTFRN~L TRACE - DD TAPE LCD 9 LCD 9 LCD 9 f€l (') d'. ~ f: !R a "0 ~. ~ ea. '< ~. 
I:fl ~ TIM!: 42087.037982 90 USRFC FF2 ASC6 OOFDCObO JOhN VTAM LINE DNDDE P!:P736A LCD 9 LCll 9 LCD '1 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 TIME 42087.03&683 91 USRFD FF2 ASCB DOFOCD60 JOBN VTAM oNDDE PtP736A lINE LCD 9 LCD 9 LCD 9 lCO 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 TIMF 42087.039365 92 USRFD FF2 ASCB OOFDCD60 JOBN VTAM LINE DNODE PE.P73bA LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 LCD 9 TIME 42087.041704 93 USRFD FF2 ASCS 00FDCD60 JOSN VTAM LINE DNODE PE:P736A LCD 9 LCD 9 LC~ 9 LCD 9 LCD 9 LCD 9 ...o go "0 .1\) ~ iN ~ IN Figure 4-3. VT AM and GTF Traces (Part 10 of 13) PCF 9 PCF 9 PCF 9 TIME: 11 TIME 11 TIME 11 SCF 40 SCF 4Q SCF 40 PDF 40 PDF 40 PDF 4(1 LCD 9 LCD 9 PCF 9 . TIME- 11 PCF 9 TIME 11 SCF 40 SCF 40 PDF 40 .-[iF 40 00 TIME TIME TIME TIME TIME TIM£: TIME TIME TIME lIME 11 11 11 11 11 11 12 12 12 12 TIME. 13 SCF 40 SCF 40 SCF 40 SCF 40 SCF 40 SCF 40 SCF 40 SCF 40 SCF 40 SCF 40 PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF LCD LCD LCD LCD LCD LCD LCD LCD LCD PCF PCF PCF PCF PCF PCF PCF PCF PCF SCF SCF SCF SCF SCF SCF SCF SCF SCF PDF PDF PDF PDF POF PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF 40 40 40 9F 7E 7E Cl 3D 71: FF LCD 9 LCD 9 LCD 9 LCD 9 LCD.9 LCD 9 LCD 9 LCD 9 LCD 9 PCF PCF PCF PCF PCF PCF PCF PCF PCF 9 9 9 9 9 9 9 9 9 TIME TIME TIME TIME TIME TIME TIME TIME TIME 12 12 12 12 12 12 12 12 12 SCF SCF SCF SCF SCF SCF SCF SCF SCF 40 40 40 40 45 45 40 40 45 PDF PDF PDF PDF PDF PDF PDF PDF PDF 40 40 40 02 7E 7E 11 DO FF FF 91 59 00 7E C1 30 7E FF Cl LCD LCD LCD LCD LCD LCD LCD LCD LCD 9 9 9 9 9 9 9 9 9 PCF Pt-F PCF PCF PCF PCF PCF PCF PCF 7 7 6 9 9 9 9 9 6 TIME TIME TIME TIME TIME TIME TIME TIME TIME 14 14 14 14 14 14 14 14 14 SCF SCF SCF SCF SCF SCF SCF SCF SCF 49 49 00 45 45 40 40 45 00 PDF PDF PDF PDF PDF PDF PDF PDF P(JF C1 35 59 7E 7E 11 DO FF FF 91 LCD LCD LCD LCD LCD LCD 9 9 9 9 9 9 PCF PCF PCF PCF PCf PCF 7 6 9 9 9 9 TIME TIME TIME TIME TIME TIME 14 14 15 15 15 15 SCF SCF SCF SCf SCf SCF 49 00 45 45 40 40 PDf PDf PDF 
PDF PDf PDF 35 59 7E 1E 11 DO PCF PCF PCF PCF PCF PCF PCF PCF PCF PCF EP 9 9 9 9 9 9 9 9 9 9 PCF PCF PCF PCF PCF PCF PCF PCF PCF PCF EP 00 9 TIM!:: 9 TIME 9 TIME 9 TIME 9 TIME 9 TIMf 9 TIME 9 TIME 9 TIME 5 TtME 12 12 12 12 12 12 12 12 12 12 TIME 13 SCF 40 SCF 40 SCF 40 SCF 40 SCF 45 SCF 45 SCF 40 SCF 40 SCF 45 SCE' 45 9 5 7 00 TIME TIME TIME HME TIME TIME TIME TIME TIME TIME 14 14 14 14 14 14 14 14 14 14 TIME 16 SCF 00 SCF 49 SCF 49 SCF 45 SCF 45 SCF 40 SCF 40 SCF 45 SCF 40 SCF 49 PDF PDF PDF PDF PDF PDF PDF PDF PDF PDF EP 7 7 9 9 9 9 00 TIME TIME TIME TIME TIME TIME 14 14 14 15 15 15 liME 10 SCF 49 SC.F 49 SCF 45 SCF 45 SCf 40 SCF 40 PDF PDF PDF PDF PDF PD·F PCF PCF PCF PCF PCF flCF PCF PCF PCF PCF PCF PCF PCF PCF PCF PCF EP 6 7 7 9 9 9 9 40 40 40 40 40 40 40 40 40 40 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 TIME TIME. TIME TIME TIME TIME TIME: TIME TIME 11 11 11 11 11 12 12 12 12 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ~ """ ~9 CO 7E Cl 3D J a --....g fIJ () ~. -e ~ W EXTERNAL TRACE - 00 TAPE N ~ lCLI lCD lCD lCD oCIl "< CIl TIME ~ ~ c:I> S- a ~ ~ aa 5· CJQ t:'! fa:: ~ S2 «!i g. a.~ i c:I> ... 
~ 9 9 9 9 PCF PCF PCF PCF 9 5 1 1 TIME TIME TIME TIME 15 15 15 15 SCF 'SCF SCF SCF 45 40 49 49 PDF PDF PDF PDf 1E FF C1 35 lCO 9 lCD 9 lCD 9 PCF.9 PCF b PCF 1 TIME 15 TIME 15 TIME 15 SCF 45 SCF 00 SCF 49 PDF FF PDF FF PDF 91 42081.042310 '--- 94 RNIO ASCS 00FE4390 CPU 0001 JObN TRAC3bOO IN 1COOI00121100017 00180390 00C90508 E4C909E8 TIMf 42224.350661 95 USRFO FEF ASCB OOFE4390 JObN TRAC3600 TPIOS IN ANODE P1VT3600 FOB 00000000 00B61111 ~0150000 RSVO 0812 lNG2 OOCO REMOTE ONODE UICll0E1 FSB 022COOOO 00000000 10012110 00170000 00000000 00000000 00000007 ~1800CO THR~ lCOOI001 211D0011 00180390 00 TEXT C90508E4 C9D9E8~0 F3F6FOFO 40FOF2F2 40E4FIC3 *INQUIRY 36CO 022 U1C* Fl *1 * 42224.35366~; TIME 96 USRFD FFI ASCS OOFE4390 JOSN TkAC3600 IN ANODE P1VT3bOO TEXT C90508[4 C9D9[840 F3F6FCfO 40FOF2F2 4DE4FIC3 *INQUIRY 3600 022 U1C* C/l DNODE UIClLOE7 F1 *1 * 42224.351028 TIME 97 USRF£: FF 1 ASCf OOFE4390 JObN TRAC3bOO OUT ANODE ~IVT3600 TEXT C4C1E3Cl 40(2CIE2 C54CCI01 0703C9C3 CIE3C906 *OATA BASf APPlI(ATIO* C/l ONODE: UIClLOl:1 054009C5 [2010605 £2(54004 C5E2E2Cl C1C54040 *N RESPONSE MESSAbE * 40404040 40.404(;40 40404040 40404040 40404040 * * 40404040 40404040 40404040 40404040 40404040 * ** 40404040 40404Q40 TIME 42224.365018 98 USRFO FEf- ASC£) OOH4390 JOlN TRAC3H,~ Tf'lOS OUT ANODE PI VT3600 FOB OO'd)(..OC>O cOB670FO 00650000 RSVO 0000 LNG2 OOA4 RSVD COCOOO~ 0 Qr.,OOOOOO REMOT~ DNODE UICll0E7 THRH 1(002110 10010000 00000390 00 TEXT C4C1E3Cl 40C2CIE2 C540CI07 07D3C9C3 Clf3C906 *OATA BASE APPlICATIO* D~4009C5 E2D7DbD5 E2C54004 C5E2E2Cl C7C54040 *N RESPONSE MESSAGE. 
40404~40 40404040 4040404C 4C404040 40404040 n 40404 40 4C40404r 404~,4{\4':-' 4C4G4040 40404040 40404,:40 404(\4(,40 TIME 42224.36E637 OUT lE00211D 10010018 005B0390 OOC4CIE3 C140(2Cl 99 RNIO AStB COFF4390 CPU 0001 JObN TRAC36CG lIM[ 42224.375301 OOFE439( CPU("~~1 JObN A('3bOO IN 1(,001001 21100018 O~180390 COC9D5D8 E4('909E8 100 RNIO ASC8 TIME 42231.552094\ 101 USRFD FEF ASCS COFt4390 JOBN TRAC36~O TPIOS LUT ANOOE PIVT36C0 FOb ocoeoooo (0867280 OOOEOOOO RSVD COOO lNG2 COA4 RSVD 00000000 00000000 kEMnT~ DNODf UICIlOE7 HMH lC002110 10010000 20006B80 00 T XT Al *. TIME 4?23f:.C343Q2 . 102 RNIO ASCo (OF f"4390 CPU COCI JOBN TRAC36C,_ IN IFOOIOOI 21100003 0004EB80 OCA1 lIME 4L23~.b52~~O ' 103 USRF r HF ASCP 0 vFE439C JOt.N TRAC36';O RSVO 0812 lNG2 OOCO TPI0S IN ANODE PIVT3600 fDB 00000000 "0667111 COCIOOOO 0~2COOCO 00000000 I~CI2rlD 0003CDct oooeoooo 00000000 (OeOOOoo OC04COOO klMOT[ DNOOE UICILOE7 FSB THRH IFOOIDCl 21100003 0004EB80 00 next Al *. ~ ::p g. [ f'-I -.. (') -! g * * * * * * * * \ * - lIHF- * 42238.691~21 ~ FigUre 4-3. VTAM and GTF Traces (Part Hof 13) ~ ~ ~ - - - -- 104 USRFfJ HF ASCB OOH4-390 JO~N TRAC36('.O FOB TPIGS OUT ANODE PIVT3bOO REMOTE DNOOt U1CILCE7 THRH TEXT TIME 4223E.717014 105 RNIO ASCS OOFE43YO CP~ 0001 JOBN TRAC36CO TIME 42239.251781 106 USRFD HF ASCB OOFt439Q JOBN TRAC3b~O TPIOS IN ANUDe P1VT3bOO FOB R~MOTE ~NOOt UIC1LOE7 FSb THRH TEXT TIMr 42239.254&10 1W USRFU FEF ASCB ODFDCD60 JOBN VTAM TPIOS OUT ANODE VTAM FDB REMOTE DNODE UICILOE7 THRH TEXT TIME 42253.494622 108 RNIO ASCB ft"FDCD6(" CPU 0001 JOBN VT AM TIME 42253.499154 109 RNIO ASCB OOFDCObO CPU 0001 JOElN VTAM TIME 42254.054066 110 USRFD FEF ASCB COFDCDbO JOliN VlAM TPIOS IN ANODf VTAM FDB REMOTE DNOU~ UICILCE1 FSE THRH HXT TIME 42254.121367 - - EXTERNAL TRACE - DO TAPE 00000000 GOBblQ28 OOOFcono RSVD lCCC211D 10010000 C:006880 00 3201 IN IFOOI001 211UCCOl croe LNG2 C0A4 PSVO - OODocaso CQOOGboo * •. 
0004Eb~O * 0032 OOeb75Cl 0001000e RSVD C812 lNG2 OOCO C22C0000 00000000 1001211D 00010000 oeOraDOO OOOOOOCO IFnDl~Ol illDOOOl rOC4fB8C 00 32 *. toco~~oo 000000:0 OOB~75AO r-OE0000 RSVD 00(;(' lC00211D 100COOOO r0006B8C 00 OE lNC;2 rCA4 OO~OOOO 00040000 * RSVD *. 00000000 oooooeoo * ..,;j ""0 ~ OUT IFOC211D 1(000002 00046680 OOOf: 0 TN IFOOIUOO 21}00002 0004~&~C s: OOOf ('I) !3 fIl -- ooooeOOD UOBb72Al ~OO10000 RSVD 0812 lNC;2 ooco 'J22COOOCt :;OO))O}O'; 1(,'02110 0;;020:00 ooocoooo 00000000 00000000 00040000 IFOOI000 21100002 C004EB80 DO uf *. ( ") 0 ::I 9· * s:: ('I) ~ *** DATE: DAY 168 YEAI<. 1975 111 USRFD FH ASCB OOFDCD60 JOhN VTAM TPIOS OUT ANODE VTAM REMPTE DNODE Clll0b7 ~ (') 112 RNIO ASCb d". g ~ 113 RNIO ASCB 114 USRFD FEF ~ a '0 i ~ ~ 115 USRFD FEF ~. ~ '0 i 116 RNIC ASCB ~ ~ FOB 00000000 10B67280 ~~OFCOCO RSVO 0000 THRH lC00211C 10000000 00006e80 00 TfXT 1202 TIMf 42263.944e7b 00FDCD60 CPU «01 JDBN VTAM TIME 42263.952790 OOFOCObO CPU COOl JuBN VTAM TIME 42264.555b~1 ASC5 00FDCD60 JOBN VTAM Foe TPIOS IN ANODE V1AM REMOTE UNODE tLll0~7 FSE THRH lEXT TIME 422b4~55q330 ASCB otFDCDbC JOeN VTAM FOB TPIOS OUT ANOCE VTAM REMOTE DNOuE PeP730A THRH TEXT TIME 42264.582179 COFDCD60 CPU 0001 JUBN VTAM TIME 42264.~665b7 ~ Figure 4-3. VTAM and GTF Traces (Part 12 of 13) **. TIME 16.44.23.800573 LNG2 OOA4 kSVD *•• 00000000 00000000 * OUT IF00211C 10000002 00056880 001202 IN IFOOI000 211C0002 C012 ~~04fB80 RSVO 0812 00000000 008&75(1 00010000 lNG2 OOCO 022c0600 00(00000 100021]C 00020000 COOCOOOO 00000000 00000000 00040000 IF001000 211COOC2 nOC4EB8C 00 12 *. • 00000000 GCB675AC ~0120000 RSVD 0000 lCC02000 10000000 00000BbO 00 01020221 lC OUT IfC0200C lSOOC034 OOQ&~B8C lNG2 r;I)A4 * ••••• OCDIr2C2 lIIC - "SVD 00000000 00000000 * - - .,.,. 
[Figure 4-3. VTAM and GTF Traces (Part 13 of 13). GTF external trace listing, trace entries 117 through 122 (RNIO and USRFD FEF entries for job VTAM). The hexadecimal detail is not legible in this copy.]

TP Problems (continued)

Notes on Examples 1 and 2

1. Mappings of the data in the various trace entries are not included. Blocks such as the FDB and FSB are described in the VTAM manuals OS/VS2 VTAM Logic and OS/VS2 VTAM Data Areas. PIU formats can be found in the 3704/3705 Program Reference Handbook. Those PIUs that accomplish a function other than transfer of data between an application and a terminal have a network command in the RU (as shown in entry 3 of the examples). Most are detailed in the "Network Commands" section of the 3704/3705 Program Reference Handbook.
Network commands that the NCP must process are shown in more detail in the "Network Commands" appendix in IBM 3704 and 3705 Communications Controller Network Control Program/VS Logic. For a full understanding of the line trace entries, refer to 3704/3705 Communications Controller Principles of Operation for ICW field definitions. SDLC commands and N(R) - N(S) processing can be seen in the trace examples (at entries 83-93). The IBM 3704 and 3705 Program Reference Handbook section "SDLC Commands and Responses" and IBM Synchronous Data Link Control General Information may be useful.

2. Data flow between the application and the LU is on an exception-response-only basis. Other PIUs request positive FME acknowledgement, and the FMEs can be seen in the trace examples (at entry 8).

3. No pacing is used; it was not included in the NCP definition.

4. Note that the line trace shows the outbound PIU (see entry 82) changed from the FID1 TH (transmission header) format that was transferred to the NCP into a FID2 format TH that is sent to a cluster controller (see entry 83). Also note that the NCP segmented the PIU during transmission to the controller. The length of the PIU was greater than the MAXDATA operand for the controller in the NCP generation (66 in this NCP), so it was broken into two segments. The second segment is sent starting in entry 87.

5. At the beginning of each PIU transmission, there are three or four flags set (X'7E') because a temporary "superzap" was made in the NCP at the time the trace was made. Normally only one flag would be sent.

Summary

When symptoms are intermittent or confusing, the debugger should be aware that there is some variable present that he has not recognized. In such situations, assumptions are dangerous, yet it is very easy and common to focus immediately on the wrong process as the location of a problem and assume that other steps must be correct.
Using the traces in the TP components should help the debugger to make fewer assumptions. A less obvious benefit of these traces is their educational value. NCP line traces, for example, illustrate from actual examples the workings of TP line protocols and their use with different devices. These protocols are important when "what if" situations need to be projected and when line errors or terminal errors can be translated into some of the none-too-obvious external symptoms that sometimes result. Then, symptoms may be seen later in terms of possible component errors, and traces or traps can be used to confirm suspicions.

As discussed earlier, an operator command turns on (or off) a trace for one node. Sometimes, many or all terminals of a certain class or on a set of lines need to be traced. Depending on when an error is occurring, or on the connection design of a network, tracing sometimes should start as soon as the NCP is activated. A trace can be started when the NCP is activated (see Operator's Library: VTAM Network Operating Procedures or Operator's Library: OS/VS TCAM Level 10), but if several must be started, the following technique has proven useful.

In the NCP definition, code INITEST=YES on the PCCU macro, but do not put an INITEST DD card in the VTAM procedure. Upon activation of the NCP, VTAM asks the operator if he wants to bypass initial test. At this point the network is defined inside VTAM, so traces can be started by operator command, but the NCP has not yet been loaded. Start as many traces as desired and reply to bypass initial test. After your reply, the NCP will activate. This technique was used to trace the initialization sequences shown in the first trace example.

VTAM Buffer Trace Modification

Many operator commands are required if all nodes are to be traced. However, VTAM modules can be superzapped to trace unconditionally.
The examples that follow are intended only to illustrate a technique for gathering information. They may be applied differently to suit individual situations.

Various techniques can be used for a VTAM buffer trace. Buffer trace for a node is indicated by bits in the RDTE (resource definition table entry) and in the FMCB (function management control block) that describe a session. Module ISTOCCFB creates the FMCB; temporarily modifying it to create all FMCBs with trace bits on causes buffer trace always to be done.

NAME ISTOCCRT ISTOCCFB
ver 08BA 9110A015     test RDTE trace bit
ver 08BE 47E0B8AE
ver 08C2 9604E020     turn on in FMCB
rep 08BE 4700         cause fall-thru

Alternatively, each traced path can be "superzapped" in the logic that checks a trace bit. For buffer trace, four "zaps" are required: one each for the C/L and TPIOS buffer traces, IN and OUT. By zapping at these locations, selectivity is possible, and alternate control can be introduced by changing the logic from checking the trace bit in the FMCB to checking some other global indicator, such as a flag in the PSA. This technique is illustrated in the following discussion of I/O trace.

VTAM I/O Trace (RNIO) Modification

The same alternatives apply to the I/O trace that applied to the VTAM buffer trace. To create every NCB (node control block) with the RNIO trace indicator on (so that everything is traced), the zap is:

NAME ISTOCCRT ISTOCCFB
ver 06BA 91087015     check flag in RDTE
ver 06BE 47E0B6AE
ver 06C2 9201A01D     turn on flag in NCB
rep 06BE 4700         cause fall-thru

Using the second approach, altering the check at the time of tracing, two zaps are required.
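The VER and REP operands in the zaps above are System/370 machine instructions, and reading them as instructions makes the intent of each patch easier to follow. The Python sketch below is not part of any IBM service aid. It decodes only the SI-format and RX-format opcodes listed in its tables; the function name decode and the table names are assumptions made for illustration.

```python
# Minimal decoder for two System/370 instruction formats:
#   SI: opcode(8) I2(8) B1(4) D1(12)        e.g. TM, MVI, CLI, OI
#   RX: opcode(8) R1(4) X2(4) B2(4) D2(12)  e.g. BC (R1 field is the mask)
# Only the opcodes in these tables are handled; this is a sketch.

SI_OPS = {0x91: "TM", 0x92: "MVI", 0x95: "CLI", 0x96: "OI"}
RX_OPS = {0x47: "BC"}


def decode(hex_word):
    """Decode one 4-byte instruction given as 8 hexadecimal digits."""
    word = int(hex_word, 16)
    opcode = word >> 24
    base = (word >> 12) & 0xF       # base register number
    disp = word & 0xFFF             # 12-bit displacement
    if opcode in SI_OPS:
        imm = (word >> 16) & 0xFF   # immediate byte (bit mask or value)
        return "%s X'%03X'(%d),X'%02X'" % (SI_OPS[opcode], disp, base, imm)
    if opcode in RX_OPS:
        mask = (word >> 20) & 0xF   # condition mask for BC
        index = (word >> 16) & 0xF  # index register number
        return "%s %d,X'%03X'(%d,%d)" % (RX_OPS[opcode], mask, disp, index, base)
    raise ValueError("opcode not handled in this sketch: %02X" % opcode)


if __name__ == "__main__":
    for text in ("9110A015", "47E0B8AE", "9604E020", "4700B8AE"):
        print(text, "=", decode(text))
```

Applied to the buffer trace zap, 9110A015 decodes as TM X'015'(10),X'10' (test the RDTE trace bit), 47E0B8AE as BC 14,X'8AE'(0,11) (branch unless the tested bit is on), and 9604E020 as OI X'020'(14),X'04' (turn on the bit in the FMCB). Replacing the first halfword of the branch with 4700 sets the condition mask to zero, so the branch is never taken and control always falls through to the OI.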
The following zap causes tracing of all inbound PIUs if a byte in the PSA(s) (location X'xxx') is set non-zero:

NAME ISTZCEAB ISTZFMIB
ver 68 9500301D       check NCB flag
rep 68 95000xxx       check low-core flag instead
ver 7A 9500401D       check NCB flag
rep 7A 95000xxx       check low-core

Two paths exist within ISTZFMIB, so two locations must be changed.

For outbound PIUs, the logic in VTAM is slightly more complex. An indication of whether or not to perform RNIO trace is transferred from the NCB to the PIU's buffer area before the buffer is queued for output. However, the trace is not performed until the I/O is complete, so although the indicator is used at that time, VTAM also checks to see if the GTF RNIO trace is still active. Therefore, in addition to the following "zap," the GTF RNIO trace must be active before VTAM attempts to make the trace entry:

NAME ISTZCEAB ISTZFFDB
ver D2 91406011 47E0yyyy     check trace flag
rep D2 95000xxx 4780yyyy     check low-core instead

Other Tracing Methods

If there is a single point to be made in this section, it is to minimize assumptions about what data is travelling through a network and how it travels. There are other sources of information besides standard traces that can provide snapshots of messages at various points en route. An application program may log every message received and sent. It may be important to know exactly at what point in the flow the logging occurs, but in general the log can be used as another trace point when access method traces have been followed as far toward the application as they go.

Access method or component buffers may sometimes be used to see if a message got as far as a buffer, or to see what form a message had when it was put into a buffer by a component. Dumps of buffer areas and dumps of TCAM queues, from disk or main storage, would be used in place of traces.
The limitation here is buffer or queue reuse, which often creates confusion when half of the message to be examined is found but critical information has been lost because of reuse. Nevertheless, these sources can be valuable. In the NCP, for example, because only one line can be traced, buffers usually provide the last snapshot of a message as it appeared before going to a terminal. Status indicators in buffer headers often can be used to tell how a message was processed; if the buffer is still in use, then backtracking to find a work element or process that refers to the buffer can provide the key to understanding why a message is stuck. The following example shows such a case.

Assume a heavy-running 3600 network in which a few logical units do not receive a response message after input is entered. The problem is intermittent and strikes any LU at any time over a period of several hours. The GTF VTAM traces are run for all LUs during a typical run, and when one LU fails to receive a response message, traces are stopped and all network components are dumped. It is not possible (without writing a user exit) to print the GTF trace for selected terminals, so the entire trace is printed for a short period surrounding the time of failure. Activity on the problem LU shows that the last input message came into VTAM and the application, and that a response message was sent all the way out to the 3705. Line trace is not used because only one line can be traced and the problem line is not predictable.

Other variations are tried with the LU; it can be closed (via CLSDST), reconnected, and run, so a hardware problem is unlikely. No MDR records in SYS1.LOGREC indicate an error on the line. At this point the problem is isolated to the NCP or beyond in the network.
Using other indicators (in this case the NCP's logical unit block, LUB), an analysis of the message path from the time the NCP receives it shows that the last outbound PIU did not go out over the TP line. (The NCP keeps the sequence number of the last PIU sent, and the number in the GTF trace is the next higher sequence number than the one in the LUB.) The NCP buffers are searched and the missing PIU is found intact in a buffer. The problem is isolated to the NCP; the buffer is still in use, and indicators in the buffer header reveal how much processing was done in the NCP; this leads eventually to the bug in the NCP.

Performance Degradation

This chapter describes how to investigate performance degradation problems. It is not intended to serve as a tuning guide or as a reference for general performance analysis (which should be performed through SMF, GTF, etc.). The following points should be considered when a problem is suspected in the operating system itself or in the manner in which applications use the operating system.

Operator Commands

When a bottleneck or system failure, hardware or software, is degrading throughput, the following operator commands can help identify the source of degradation and possibly eliminate it.

D A,L    Displays current system status. A job step with a name of STARTING indicates initiation has not successfully completed. Also, if a job step is marked with an 'S', it is considered swapped out. Other jobs may be queuing behind these jobs in an allocation/deallocation path.

D R,L    Displays any outstanding requests. Operator action is required (for example, to mount a volume). Other jobs may need to wait until action has been taken.

D M      Displays configuration information. The loss of a hardware component (for example, a channel) may have been noted on a hard copy console and missed by the operator.
If a resource queue "snooper" program exists, it should be started and its output examined to find any ENQ bottlenecks. If no such program is available, take a dump of an address space and the nucleus, and request SQA. The PRDMP service aid (with the QCBTRACE option) can then be used to print the dump so the resource queue can be examined.

Use the job entry subsystem display commands to find the status of jobs, queues, printer setups, requirements of SYSOUT data sets, etc., to find reasons why JES2 is not able to schedule work. Some JES2 commands that may be useful are shown in Figure 4-4.

$D J1-9999, S1-9999, T1-9999    Status of jobs, started tasks, or time-sharing users. If a range of jobs has been held, they may be released using $AJ.
$AJ             Release jobs.
$DF             Status of output forms queue.
$DU,PRTS        Status of printer setup characteristics.
$TPRTN          Change setup to needs of queue output.
$LJ1-9999,H     List held SYSOUT data sets.
$OJ1            Release held SYSOUT data sets.
$DQ             Display queue.
$AQ             Release queue.
$AA             Release all jobs held by a $H A command.

Figure 4-4. JES2 Commands for Status Information

If the use of previous commands does not make it obvious why JES2 is not scheduling work, take a dump of the JES2 address space. Print the SYS1.DUMPxx data set to help determine the problem.

Find the number of the IPS member that should be active and issue T IPS=nn to ensure that it is active. Print the IPS member in SYS1.PARMLIB and analyze the IPS for an explanation of degraded service. Then, enter the W command to print the system log to obtain the history of system execution.

Figure 4-5 shows important hardware components used by the system that should be understood when a degradation problem is suspected.

Dump Analysis Areas

The following areas in a storage dump may provide a starting point for further analysis. Problems in these areas may indicate a bug or some unexpected use of system services.

1.
ENQ/DEQ - A check of ENQ/DEQ's processing queues may indicate contention problems. The in-use blocks are anchored in the CVT at CVT+X'280' (CVTFQCB, first QCB element) and at CVT+X'284' (CVTLQCB, last QCB element). A queue of many QELs off a particular major or minor QCB should be explained. An indication of a possible problem is a mixture of shared and exclusive requests intertwined for one resource. The state (running/waiting/swapped-out/etc.) of the holder of the resource should be determined.

Also check the free elements. The ENQ/DEQ global save area, mapped in IEAVENQ1, contains six addresses, each of which points to the first element of a free queue. The ENQ/DEQ global save area is found through CVT+X'2AC' (CVTSPSA), which points to the global save area table; the global save area table+X'20' points to the ENQ/DEQ global save area. There are multiple queues, each containing blocks of all one size. These blocks occupy SQA. Merging the free elements with the in-use elements should provide an indication of ENQ SQA fragmentation. Because fixed storage is involved, the fragmentation may be reducing the number of frames available for paging.

[Figure 4-5. System Use of Hardware Components. The flowchart is not recoverable from the extraction. Legible labels: Profile Services, Find Dominant Jobs, Build PAK (IEAPAKxx) List, Concurrent Analysis, Find Major Contributors, DASD Seek Analysis. Primary tools: SMF, MF/1, GTF, Hardware Monitor.]

2. IOS storage manager queues should be inspected. The anchors for the various pools (small, medium, and large block pools) are located at the end of IECVSMGR at external symbol IECVSHDR, which should show up in a NUCMAP. Generally there should be one 2K page of small blocks (used for IOQEs) and one 4K page of medium blocks (used for RQEs).
Examine the large blocks in detail. If the system was quiesced, there should be two 4K pages of large blocks and all blocks should be on the free queue. Many heavily loaded systems require 8-10 pages of large blocks. If the actual number is much higher than this, determine the ASID that each in-use block is assigned to (the two bytes at block address-8 contain the ASID). System address spaces can have many blocks, but any user address space with a large number of blocks should be explained. Common problems are: I/O loop, I/O errors, and storage not being freed at I/O termination time. These page frames occupy real storage, which depletes the pool of available real storage and possibly causes excessive paging.

3. Check page frame table entries (PFTEs) for large fix counts. CVT+X'164' contains the address of the PVT; PVT+X'C' contains the address of the apparent PFTE origin - you must index several hundred bytes (X'10' times the number of pages in the nucleus). Large fix counts may indicate a page fix macro loop, or page fix without page free. Frames allocated to a private area space may indicate a user error. Try to analyze the contents of the page for a clue as to who is page fixing without page freeing.

4. Check PFTEs for bad frames caused by hardware storage errors that rendered these frames unusable (in PFTE+X'C', the X'04' flag should be set if this is the case). Contact hardware personnel to determine if a machine malfunction has occurred.

5. CDE (contents directory entry). These blocks represent modules loaded into virtual storage; CDEs reside in SQA and the queue is anchored at CVTQLPAQ (CVT+X'BC'). The loaded module's name and starting address reside in the CDE. Those with starting addresses less than the value in CVTLPDIA (CVT+X'168') were members of either an IEAFIXnn list or an IEALPAnn list. For members of these lists, CDEs are built by NIP and they occupy real, fixed storage even when the module is not in use.
If fixed storage or fragmentation is a problem, moving these modules to LPA can provide a partial solution.

6. The BLDL table, pointed to by the nucleus external symbols IEARESBL and/or IEARESBS, should be checked. The address(es) should be less than the value at CVTNUCB, the upper nucleus boundary. If not, try changing the BLDL=nn system initialization parameter to BLDLF=nn. This will cause the BLDL list to occupy real storage at all times. If the number of entries is less than 93, one frame is used.

7. In a quiesced system, the number of paging requests received should equal the number of paging requests completed by ASM. The fields ASMIORQR (ASMVT+X'1D0') and ASMIORQC (ASMVT+X'1D4') in the ASMVT represent the number of requests received and completed, respectively. The difference between the two counts represents requests not completed. A large number of uncompleted requests can indicate ASM is either not processing at all or is taking considerable time for each operation. Examine the PAT (page allocation table) to determine whether the page data sets are almost full. Also examine ASMERRS (ASMVT+X'74'), PAREFLGS (PART+X'8'), and the IOSB for paging requests (IOSCOD=X'41') to determine if I/O errors have occurred and the data sets are no longer in use.

8. CSA use should be examined. If SQA is depleted, requests are filled from CSA. This can be determined by inspecting the SQA DQE (descriptor queue element):

• the CVT+X'230' points to the GDA (global data area)
• the GDA+X'18' points to the SQA SPQE (subpool queue element)
• the SPQE+X'4' points to the SQA DQE.

The DQEs are chained together. If more than one DQE exists for the SQA, it has expanded into the CSA. This causes the frame to be fixed. Also, CSA users often page fix. In this case fragmentation, if present, could cause performance degradation.

9.
Possible real frame shortage can be indicated by inordinately large counts in the PVT fields: PVTRSQA (a count of the number of times the SQA reserved frame was allocated), and PVTDFRS (a count of the number of times real frame allocation was deferred because of a lack of frame availability). These counts by themselves mean little, but can be of some use when analyzing an overall problem.

Incorrect Output

The problem of missing, unexpected, or erroneous output is one of the most difficult. This incorrect output might take the form of a message on the console log or in SYSOUT, or an incorrect total in a report. There is usually very little documentation that assists the debugger in analyzing incorrect output.

Initial Analysis Steps

To resolve the problem of missing or incorrect output, the analyst must have a complete understanding of the job environment. There is no fast, clear-cut approach to these errors. This section only tries to assist your thought processes as you begin to work on a problem of this type.

There are four basic categories of incorrect output: missing, unexpected, erroneous, or a combination of these. The steps in resolving the problem must take the category into account. Initially, consider the following steps:

1. Gather all possible documentation. You will probably need additional information as you begin to understand the problem in more detail.

2. Consider all recent hardware and software changes to the system and to the application(s) if relevant. A change to an application that updates a data base affects all other data base users.

3. Remember that output requires input. Consider the possibility of bad input.

4. Consider whether the problem is associated with some new function or application. Most incorrect output errors occur in the installation and test phase.
Isolating the Component

Next, attempt to locate the component causing the error. Do this by thinking through the flow. Listed below are some questions that might assist you.

• Is the problem related to a user function or application? If yes, have there been recent changes or is testing still in progress?
• Is the job control language correct? Have there been recent changes to the JCL?
• Have any user exits been added or modified?
• Have any user supervisor calls (SVCs) been added or modified?
• Are there operator interactions that could affect the input/output?
• From which access method or function is the output expected? Some examples are: JES, VSAM, BTAM, TCAM, and WTO.
• Was RJE involved in the input and/or output?
• Was there any cross-address-space communication involved in the data movement? In MVS, most telecommunication requires data passing between address spaces.
• Is there any evidence of I/O error activity? Refer to the console log and LOGREC data.
• Do you have a storage dump, or should you obtain one? See the chapter on "Additional Data Gathering" in Section 2.
• Would a trace be helpful in understanding the flow? Consider tracing the activity with GTF.

Many of the above questions have to be answered in order to get a better understanding of the problem area. In many cases, the problem has to be recreated with various traces or traps. These questions help to determine what data is needed to solve the problem.

Analyzing System Functions

To solve an incorrect output problem, you must understand the mode of operation and the processes required to accomplish the function in question. The first question must be the following: where does the output originate? Then you must be able to verify that the activity did occur. There must be some means for understanding the path the data should take from the origin to the final location (device).

Consider the following example:

1.
A TSO user invokes his program, which should write a message to the terminal and then wait.

2. The program waits after the I/O but no message appears.

3. What are the system functions involved?

a. A language translator and the linkage editor that created the load module.
b. OPEN code necessary to complete the link between the device and the user PUT macro.
c. TSO TIOC flow. The user issues PUT, which branches to the TIOC module IGG019T4. This module issues TPUT. What is the TPUT path through TIOC?
d. TSO TIOC interfaces with TCAM. What is the data path through TCAM?
e. TCAM interfaces with the I/O supervisor. Can evidence be found of the SIO? What types of trace would be helpful?

In this example it may be necessary to take a series of dumps to resolve where the message was lost. But first be certain that the correct message is in the correct buffer at the time of the user PUT macro.

It could be necessary to apply this type of thinking all the way down to the CSECT level.

Summary

In analyzing incorrect output, there are two key points. The first is that a better understanding of the system flow is probably required for this type of problem than for any other. The second point is that it is very important to be able to obtain the correct documentation at the correct time.

Note: The chapter on TP (teleprocessing) problem analysis earlier in this section provides some specific steps for analyzing incorrect output in the TP environment. Many of the techniques in that chapter can be applied to incorrect output analysis.

Section 5: Component Analysis

This section describes the operating characteristics and recovery procedures of 15 system components and provides debugging techniques for determining the cause of an error that has been isolated to a component.
The components described in this section are contained in the following chapters:

• Dispatcher
• IOS
• Program Manager
• VSM
• RSM
• ASM
• SRM
• VTAM
• VSAM
• Catalog Management
• Allocation/Unallocation
• JES2
• SSI
• RTM
• Communications Task

Dispatcher

For effective problem analysis, it is important to understand how work is processed by the MVS system. The MVS dispatcher plays a large role in processing work by controlling the initiating of all work within the system. An understanding of the dispatcher's processing and control block structure is imperative for the debugger.

This chapter describes the following items about the MVS dispatcher:

• Important dispatcher entry points
• Dispatchable units and sequence of dispatching
• Dispatchability tests
• Dispatcher recovery considerations
• Dispatcher error conditions

Important Dispatcher Entry Points

The dispatcher's main entry points are the following:

IEA0DS - Entered disabled, key 0, supervisor state, no locks held. This entry point is called by the following:

• Exit prologue (IEAVEEXP), when control is not returned to the issuer of an SVC.
• Lock manager (IEAVELK), when it is suspending a task that unconditionally requested a local lock that was unavailable.
• Program check FLIH (IEAVEPC), when a TCB or SRB was suspended because of a page fault that required I/O or because no frames were available.
• External FLIH.
• RTM (recovery termination manager).

IEAPDS7 - Entered disabled, key 0, supervisor state, no locks held. This entry point is called by I/O FLIH and by SVC FLIH when the SVC requires a local lock that is not available.

IEAPDS6 - Entered disabled, key 0, supervisor state, no locks held. This entry point is called by RTM on an EOT (end of task) condition.

IEAPDS2 - Entered disabled, key 0, supervisor state, the dispatcher lock held.
This entry point is called by the lock manager (IEAVELK), when suspending an SRB that has requested the local or CMS lock, and when suspending an address space that has requested the CMS lock.

IEAPDSRT - Entered enabled or disabled, any key, supervisor state, no locks held. This entry point is the termination return address for all SRBs.

DSJSTCSR - Job step timing subroutine. Calculates and accumulates job step timing. This entry point is called by the following:

• Lock manager (IEAVELK), when the common suspend routine of the lock manager is suspending an SRB or locally locked TCB because of a lock request or a page fault suspension.
• Dispatcher. The dispatcher calls this subroutine internally when it is saving the status of a previously executed unit of work.
• Timer SLIH. The timer SLIH calls this subroutine before it gives control to SRM.

Dispatchable Units and Sequence of Dispatching

This section describes the unique dispatchable units of work and the queues where they are located. The dispatchable units are described below and are listed according to the priority with which they are dispatched.

1. Special Exit

A special exit is made known to the dispatcher by a unique flag setting in the LCCADSF1 (LCCA+X'21C') field. The LCCADSF1 bits and the exits they indicate are:

Bit         Exit
LCCAACR     ACR
LCCAVCPU    Vary CPU
LCCATIMR    Timer Recovery

The dispatcher enters these exits via a branch.

2. Global SRBs

IEAGSMQ is the header for the global SRB staging queue. If it is not zero, it points to the global SRB queue. (See Figure 5-1.) Requestors use the SCHEDULE macro to compare and swap global SRBs onto the queue. The dispatcher obtains the DISP lock and removes the SRBs from the queue with the compare and swap (CS) instruction. The dispatcher then calls CSECT IEAVESC0 at entry IEAVESC1 in order to move the SRBs to the appropriate priority level (0 or 4) on the GSPL (global system priority list) queue.
IEAGSPL is the global SRB dispatching queue. The queue is divided into non-quiescable (priority level 4) and system level (priority level 0) SRBs. The dispatcher removes the SRB from the GSPL queue, updates the PSAAOLD with the SRBASCB address, loads its STOR value, and dispatches the SRB. PSAANEW is not updated.

[Figure 5-1. Global SRB Queue Structure and Control Block Relationships. The diagram is not recoverable from the extraction. Legible labels: Global SRB Staging Queue (IEAGSMQ, CVTGSMQ X'264'); Global SRB Dispatching Queue (IEAGSPL, CVTGSPL X'26C'); PSAANEW and PSAAOLD; ASCB and ASCBSTOR. Note in the figure: 0 and 4 in SRBs represent system priority level.]

3. Local SRBs

IEALSMQ is the header for the local SRB staging queue. If it is not zero, it points to the local SRB queue. (See Figure 5-2.) Requestors use the SCHEDULE macro to compare and swap local SRBs onto the queue. The dispatcher tests this queue if it cannot find any special exits or global SRBs to dispatch. If this queue is not empty, the dispatcher obtains the DISP lock and removes the entire queue with compare and swap instructions. The dispatcher then calls CSECT IEAVESC0 at entry IEAVESC2 in order to move the SRBs to the appropriate priority level (0 or 4) on the LSPL. IEAVESC2 also notifies SRM via the SYSEVENT macro if the address space is swapped out. Memory switch is then invoked to direct the dispatcher to the highest priority work. (Note that no work is dispatched. The SRBs are simply moved to the appropriate dispatching queues (ASCBSPLs).)

[Figure residue, diagram not recoverable. Two panels: (1) "After a user request to schedule local SRBs", with labels IEALSMQ, CVTLSMQ X'268', SRB, and SRBASCB; (2) "After the dispatcher has determined there are SRBs to be processed and moves them to the appropriate ASCB level", with label X'1C'.]

Figure 5-2.
Local SRB Queue Structure and Control Block Relationships

4. Address Space Dispatcher

This is not actually a unique dispatchable unit of work, but rather an anchor for the real dispatchable units of work (that is, local SRBs or TCBs). The address space dispatcher is entered to select the next address space (memory) in which work will be dispatched. If an address space is dispatchable, the priority of dispatching within the address space is the following:

a) Local SRBs
b) Local Supervisor (locally locked, interrupted work)
c) TCBs

If the dispatcher finds any SRBs on the LSPL (pointed to by the ASCBSPL), the top SRB is dequeued and dispatched.

If there are no SRBs on the local SPL queue, the local lock is tested for the interrupt ID, X'FFFFFFFF'. If the interrupt ID is in the local lock, the ID is changed to the current CPU ID via compare and swap, and the status (FPRs, GPRs, FRR stack, CPU timer value, PSATOLD, PSATNEW, and resume PSW) is restored from the IHSA (Interrupt Handler Save Area). The ASCBASXB points to the ASXB; ASXBIHSA (ASXB+X'20') in turn points to the IHSA. Status is saved in the IHSA when a locally locked program is interrupted and control is switched away from it because there is higher priority work to handle.

The dispatcher does a compare and swap to obtain the local lock:

• If the local lock is available and the number of ready TCBs exceeds the number of processors active in the address space,
• or if the ASCBS3S bit (ASCB+X'67') indicates that there is work for the Stage 3 Exit Effector to process.

If the dispatcher is successful in obtaining the lock, it will go to the Stage 3 exit effector, if necessary, and then select the first dispatchable TCB that is not active on another processor.
The dispatcher may dispatch the above units (SRBs, supervisor, TCBs) without going through the memory dispatcher if the address space was current when the dispatcher was entered and if there was no indication that a memory switch was required (PSAANEW = PSAAOLD).

5. Wait Task

The wait task is dispatched when the dispatcher reaches the bottom of the ASCB ready queue and can find no ready work after a recursive search of the SRB queues and the ready queue.

Figure 5-3 provides an overview of the processing sequence through the MVS dispatcher.

[Figure 5-3. Dispatcher Processing Overview. The flowchart is not recoverable from the extraction. Legible steps: START; indicate SRB mode; update PSAANEW, PSAAOLD; restore status from IHSA, LPSW; restore registers and PSW from IHSA, LPSW; get highest ready TCB; WAIT after a recursive search of the SRB queue and the ready queue to verify that no ready work exists; EXIT.]

Dispatchability Tests

The dispatcher conducts the following dispatchability checks:

SRB Tests

Test*                                  Condition
1. ASCBRCTF//ASCBOUT                   Address space swapped out.
2. CSDSCFL1//CSDSYSND                  System non-dispatchable. (a)
   ASCBFLG2//ASCBXMPT (a)
3. ASCBDSP1//(any bit on)              Address space non-dispatchable.
4. ASCBFLG2//ASCBSNQS                  All SRBs stopped.
5. ASCBSSRB                            System level SRBs stopped. (This does not apply to NONQ SRBs.)
6. SRBCPAFF                            Does SRB have affinity to this processor? (PCCACAFM defines the current processor.)
7. SSRBFLG1//SSRBLLH                   If set, compare and swap (via CS instruction) CPUID into local lock word, ASCBLOCK, which has the suspend ID in it (X'7FFFFFFF').

(a) If the system is non-dispatchable, the SRB must have been scheduled to an exempt address space.

*Format of test description is "field//bit within field."
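The compare-and-swap step in SRB test 7, and in the interrupt ID case described for the address space dispatcher, can be modelled in a few lines. The following Python sketch is an illustration only, not MVS code: the class `LockWord`, the function `claim_lock`, and the CPU ID values are hypothetical names and numbers chosen for the example, and the model is not atomic the way the CS instruction is. Only the two marker values come from the text.

```python
# Illustrative model only; not MVS code and not atomic like the CS instruction.
SUSPEND_ID = 0x7FFFFFFF    # suspend ID in ASCBLOCK (from SRB test 7)
INTERRUPT_ID = 0xFFFFFFFF  # interrupt ID in the local lock (from the text)


class LockWord:
    """Hypothetical stand-in for the ASCBLOCK fullword."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`.

        Returns True when the swap happened, False otherwise.
        """
        if self.value == expected:
            self.value = new
            return True
        return False


def claim_lock(lock, cpu_id, required_id):
    """Try to replace the marker ID in the lock word with this CPU's ID."""
    return lock.compare_and_swap(required_id, cpu_id)


# Two processors (hypothetical IDs) race for a lock word holding the suspend ID.
lock = LockWord(SUSPEND_ID)
first = claim_lock(lock, cpu_id=0x00000041, required_id=SUSPEND_ID)
second = claim_lock(lock, cpu_id=0x00000040, required_id=SUSPEND_ID)
```

In this model only the first claim succeeds; the second processor finds another CPU ID in the word and would bypass the unit of work. The same function applies to the interrupt ID case by passing `INTERRUPT_ID` as `required_id`.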
Address Space and Task Tests

The following address space test criteria must be met before the task dispatcher gets control.

Address Space Tests                    Condition
1. ASCBDSP1//ASCBNOQ                   The ASCB is not on the ready queue. The address space will not be dispatched.
2. ASCBDSP1//ASCBFAIL                  The ASCB is in failure mode and in the process of being terminated. The address space will not be dispatched.
3. CSDSCFL1//CSDSYSND                  System non-dispatchable. (a)
   ASCBFLG2//ASCBXMPT (a)
4. LOCAL LOCK//ASCBLOCK
   Suspend ID (X'7FFFFFFF')            Cannot process the address space unless an SRB owned the local lock and was suspended and is now re-scheduled to be dispatched.
   Interrupt ID (X'FFFFFFFF')          Compare and swap CPUID into local lock and restore the status (FPRs, GPRs, FRR stack, CPU timer value, PSATOLD, PSATNEW, and resume PSW) from the IHSA.
   Own CPUID                           Restore GPRs (general purpose registers) and PSW from the IHSA.
   Free (X'00000000')                  If ready work is in the address space, compare and swap (via CS instruction) the CPUID into ASCBLOCK.
   Other CPUID                         Bypass the address space.
5. ASCBFLG1//ASCBS3S                   Go to the task dispatcher and interface with Stage 3 exit effector.
6. ASCBTCBS > ASCBCPUS                 There are more TCBs ready than there are processors currently executing in the address space; the address space can be dispatched.

(a) If the system is non-dispatchable, the SRB must have been scheduled to an exempt address space.

After these six tests indicate that the dispatcher should dispatch in an address space, the following task indicators are tested.

Task Tests                             Condition
1. RBWCF                               RB must not be waiting.
2. TCBFLGS4                            TCB primary non-dispatchability flags must not be set.
3. TCBFBYT1//TCBACTN                   If TCB is active, it must be a redispatch situation; otherwise, this TCB is active on the other processor (TCBCCPVI).
4.
TCBAFFN                                TCB affinity, if any, must match this processor's physical address (which is located in PCCACAFM).

Miscellaneous Notes about the Dispatcher

1. You can determine the last dispatch by examining the PSW at location X'300'. The TOD of the last dispatch is located at LCCADTOD (LCCA+X'258').

2. The dispatcher sets the following mode indicators before dispatching work.

a. For a global SRB - LCCADSF2//LCCASRBM, LCCAGSRB, and LCCADSRW; PSATNEW/PSATOLD = 0's
b. For a local SRB - LCCADSF2//LCCASRBM, and LCCADSRW; PSATNEW/PSATOLD = 0's
c. For a task - LCCADSF2//LCCADSRW; PSATNEW/PSATOLD ≠ 0's (TCB address)

Dispatcher Recovery Considerations

Dispatcher recovery is designed to record information about the error, reconstruct critical dispatching queues, and retry to continue normal dispatching functions. The data that the dispatcher records in the system diagnostic work area (SDWA) is the following:

Fixed Data:

SDWAMODN    IEAVEDS0, dispatcher module name
SDWACSCT    IEAVEDS0, dispatcher CSECT name
SDWAREXN    IEAVEDSR, dispatcher recovery routine

Variable Data:

SDWAURAL - Seven full words of data as follows:

PSAHLHI - Locks held at time of error.
ASCBLOCK - Value of local lockword for current address space at the time of error.
LCCASPLJ - SRB queue journal word. Contains the address of the top SRB on the staging queue when dequeued by the dispatcher and passed to IEAVESC0.
PSAAOLD - Current ASCB address at the time of error.
Control Register 1 - Value of STOR (CR1) at the time of error.
PSATOLD - Current TCB address at the time of error.
LCCADSF1 - Dispatcher flag bytes that were on at the time of error.

If the dispatcher lock was held at the time of error, the following recovery routines are called by the dispatcher recovery routine:

• IEAVESCR - Schedule recovery routine; it recovers SRB queues.
• IEAVEQV3 - Verifies, and possibly reconstructs, the ASCB Ready Queue.
• IEAVEGAS - Verifies each ASCB on the ready queue.

If the local lock was held by the dispatcher, the error was not a DAT (dynamic address translation) error, and the current ASCBSTOR value equaled the CR1 value, then the following recovery routines are invoked by the dispatcher:

• IEAVEEER - Exit effector recovery routine (if the ASCBS3S is on).
• IEAVEQV3 - Verifies, and possibly reconstructs, the TCB queue.
• IEAVETCB - Verifies each TCB on the TCB queue.

Note: The queue verification routine, IEAVEQV3, also records error information in the SDWAURAL about any changes to the queue structure.

By removing elements that have been overlaid (or "clobbered") from the queue, the dispatcher recovery routine attempts to keep the system up at the cost of a particular user, job, address space, etc. There is a certain exposure in this philosophy because the element that has been lost might have owned a critical system resource or might be a critical function in itself (for example, a TCB that represents the user's main application program). Once the element is lost, there might be no indication that it was a critical resource (a valid control block, for example) or that it owned a critical resource.

Dispatcher Error Conditions

• The abend COD is issued from CSECT IEAVESC0 when a local SRB is scheduled to an invalid ASCB.
• Program check interrupts (usually of the page, addressing, or segment exception variety) occur when:
  - PSAANEW is overlaid and the dispatcher attempts to switch address spaces into the value in PSAANEW
  - PSALCCAV or PSAPCCAV values are overlaid
  - The CVT pointer is overlaid
  - The ASCB ready queue is overlaid
  - The TCB queue or the TCBRBP field is overlaid

IOS

The purpose of the I/O supervisor (IOS) is to provide a central facility to control and conduct I/O activity through the operating system.
The structure of IOS in MVS is somewhat different from that of previous operating systems. In MVS, IOS "front-end processing" is responsible for device control and I/O initiation; IOS "back-end processing" is responsible for processing interrupts, providing sense information in error situations, and scheduling the posting of the I/O requestor at completion time. (Figure 5-4 provides an overview of IOS front-end and back-end processing. Figure 5-5 shows the major IOS and EXCP control block relationships.)

Front-End Processing

The major portion of the I/O process (the queueing of I/O requests and starting them) is contained in CSECT IECIOSCN (microfiche name IECIOSAM), which is called the channel scheduler. The channel scheduler is invoked through an interface provided by the STARTIO macro via a branch entry. The channel scheduler assumes that all channel program translation and page fixing of buffers and CCWs is performed by the caller. The control block interface is the SRB/IOSB combination, which must be non-pageable and commonly addressable from any address space (that is, SQA and fixed CSA). The channel scheduler operates in physically disabled mode. Invokers (called "drivers") of the channel scheduler include EXCP, VSAM block processor, VTAM TPIOS, and PCI fetch; they are identified by the driver IDs located in the IOSB+4.

Back-End Processing

When IOS is invoked for an I/O interrupt, processing starts in the I/O first level interrupt handler (FLIH), which branches to an entry point, IECINT, within the channel scheduler. Back-end IOS executes physically disabled in the address space that is active on the processor at the time of the I/O interrupt. IOS then schedules the SRB/IOSB to the address space of the requestor. The module IECVPST (post status) receives control under the SRB and interfaces with the driver's special exits and termination routines (channel end, abnormal end appendages).
Figure 5-4 shows an overview of the IOS process using EXCP as the I/O driver.

IOS Problem Analysis

Problems in the I/O process can cause three symptoms:

1. Abend codes
2. Loops
3. Wait states

These symptoms are discussed in the following sections.

[Figure 5-4. IOS Processing Overview. The diagram is not recoverable from the extraction. Legible labels, front-end processing: User, SVC 0; EXCP driver; IECVSMGR gets storage for SRB/IOSB, TCCW, BEB, FIXLIST, RQE; IECVTCCW (CCW translation, fixing); SRB/IOSB as input parameter; IECVSMGR gets storage for IOQE (I/O queue element); IECIOSCN. Back-end processing: IECINT (IOS); scheduled via SRB/IOSB; EXCP driver (POST, appendage interface); IECVTCCW (retranslation, unfixing); IECVSMGR (free blocks); IOS frees IOQE.]

[Figure 5-5. Major IOS and EXCP Control Block Relationships. The diagram is not recoverable from the extraction. Legible labels: CVT, LCH (first IOQ, last IOQ), IOQ, UCB prefix, UCB, common UCB extension, ASCB, TCB, IOB, DCB, DEB, channel program (virtual and real; CCW1, CCW2, ... CCWn), TCCW, EWA, RQE, IOSB, SRB.]

IOS ABEND Codes

IOS abends are generally caused by an invalid control block. The error can be caught by validity checking or it can cause a program check. The recovery routines, generally FRRs, receive control on a program check. For either a validity check or a program check, the error is converted to an abend code. EXCP FRR processing saves the abend code and the relevant status (that is, error PSW and error registers) at the time of error in the EXCP problem determination area, which is pointed to by the TCB (X'C0').
IOS abend codes are documented in OS/VS Message Library: VS2 System Codes. The EXCP problem determination area is documented in OS/VS2 I/O Supervisor Logic.

Note: During abend processing, the EXCP problem determination areas are not freed. When you find the area pointed to by the TCB, scan that area for previously-obtained areas to help with IOS analysis.

Loops

If an invalid control block is passed to IOS and it is not caught by the validity check routines, a loop is often the result. The traditional problem has been caused by the storage manager (IECVSMGR) passing a bad address back to a requestor. Consequently, the requestor initializes the bad block and overlays or clobbers some valuable piece of storage.

On occasion the bad address passed by IECVSMGR is 0. The fact that most of the I/O process runs in supervisor state, key 0 means that the PSA can be overlaid. This usually causes a program check loop whenever any type of interrupt is subsequently received by the processor.

At this point, pattern recognition is important to determine whether the storage manager has been involved in the problem. (Pattern recognition is discussed in the "Miscellaneous Debugging Hints" chapter in Section 2.) Try to determine whether 0 has been used as the address of an SRB/IOSB or EWA control block. The first X'A0' bytes of the PSA may be affected. The routine responsible for this could be an IOS driver or recovery routine. Look for addresses of exit routines which are pointed to by the IOSB; they give an indication of the driver and, potentially, some idea of the process. Remember that the hardware stores the current PSW as an old PSW (at locations X'18' - X'40') if any interrupt occurs. Therefore these locations may not look bad.

The main thing to keep in mind is that generally IECVSMGR is not the cause of the problem. For performance optimization reasons, the storage manager has minimal validity checking and thus trusts that the invoker is operating correctly.
Historically the cause of this type of problem is that the same block is freed twice, which causes the storage manager's free queue to contain invalid pointers. Often this double freeing has occurred some time earlier, which makes the recreation of the erroneous process very difficult. Extensive analysis and piecing are required. Multiple dumps may help provide the pieces necessary to recognize a pattern or common occurrence. Or, a trap might have to be devised.

If there is evidence of a recent error in the I/O process, searching the in-storage LOGREC buffer or SYS1.LOGREC records for an IOS error helps recreate the process. Generally the IOS recovery routines attempt to free control blocks and might inadvertently free one that has just been freed. Try to determine if there is any way that the channel scheduler or I/O driver and its associated exits could have freed blocks before or after recovery processing. In a retry situation, normal termination procedures could have freed a block that was already freed by recovery. Again, traps might be required.

IOS WAIT States

Another problem is an enabled wait state with work remaining for IOS to accomplish. To analyze a wait state, it is necessary to determine the current status of IOS.

To determine current IOS status, scan the UCBs for valid IOQEs in UCBIOQ (UCB-4). The IOQE is valid if UCBPST (UCB+6, bit X'20') is on. The IOQE address is valid only when it is active. Understand that once a block is freed, it is generally reused quickly when a subsequent request for an I/O operation is encountered. Because of this, it is very uncommon to find a significant IOQE pointed to by the UCB prefix once IOS has returned the block. The block usually represents another request. If the UCB pointer in the IOSB pointed to by the IOQE does not equal the address of the UCB you started with, the blocks have been reused and the data is invalid.
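The cross-check described above (UCBPST on, then UCB to IOQE to IOSB and back to the same UCB) can be sketched as follows. This is a minimal illustration in Python, not part of IOS: the classes `Ucb`, `Ioqe`, and `Iosb`, their attribute names, and the status strings are hypothetical stand-ins for values an analyst would read out of a dump by hand. Only the offsets quoted in the comments come from the text; the offset of the UCB pointer within the IOSB is not given there and is not assumed here.

```python
from dataclasses import dataclass
from typing import Optional

UCBPST = 0x20  # flag bit in the byte at UCB+6, as given in the text


@dataclass
class Iosb:
    ucb_addr: int  # UCB pointer kept in the IOSB (hypothetical field name)


@dataclass
class Ioqe:
    iosb: Optional[Iosb]  # IOQEIOSB (IOQE+8)


@dataclass
class Ucb:
    addr: int             # address of this UCB in the dump
    flags6: int           # byte at UCB+6
    ioqe: Optional[Ioqe]  # UCBIOQ (UCB-4)


def ioqe_status(ucb: Ucb) -> str:
    """Classify the IOQE reached from a UCB as described in the text."""
    if not (ucb.flags6 & UCBPST):
        return "not active"          # UCBPST off: the IOQE address is not valid
    if ucb.ioqe is None or ucb.ioqe.iosb is None:
        return "not active"          # nothing to follow
    if ucb.ioqe.iosb.ucb_addr != ucb.addr:
        return "reused"              # block now represents another request
    return "valid"
```

A status of "reused" corresponds to the case in which the blocks have already been returned and handed to a later request, so their contents say nothing about the request under investigation.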
Additionally the IOQEs can be found in the storage manager areas. These are located by CVT+X'7C', which points to IOCOM+X'24', which points to module IECVSMGR. Label IECVSHDR is an external symbol for the storage pool headers for small blocks (IOQEs). These are followed by the pool headers for medium (RQEs) and large (SRB/IOSBs, BEB, TCCW, ERPWA, fix lists) blocks. The pool headers are 16 bytes long and the last word points to segment headers for 2K bytes (small block) or 4K bytes (medium and large blocks) of storage. The IOQE+5 contains an allocated indicator. If all X'3C' bits are on, the block is allocated and, in the case of IOQEs, represents I/O requests that are started or that have been requested by a driver but have not been started because of a busy or not ready condition (UCBFLA).

After the storage manager (medium and large) blocks are found, notice their 8-byte prefixes, the first halfword of which contains the ASID of the address space to which the block is allocated. Note that the ASID is 0 when the block is not allocated and in special cases such as when unsolicited device ends are not associated with any address space. Scanning these prefixes for an ASID that matches the problem address space can help in finding blocks associated with I/O requests related to that address space. Medium and large blocks that contain a X'17' in the fourth byte of the prefix are not allocated. A value of X'75' for medium blocks, and X'76' for large blocks, indicates that they are currently allocated. (Note that the third byte of this first word of the prefix is unused.) The IOQE points to the associated IOSBs, which contain information about the channel programs and pointers to the requestor's control blocks.

In general, UCBs and associated IOQEs/IOSBs indicate active I/O. Any flag bits set in UCB+6 and UCB+7 help identify the status of the requestor.
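Returning to the 8-byte prefixes of the medium and large storage manager blocks described above, the scan for a problem address space can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the status strings are hypothetical, the prefixes are assumed to have been copied out of a dump as 8-byte strings, and the indicator values are taken as printed in the text.

```python
import struct

# Fourth-byte indicator values, as printed in the text
NOT_ALLOCATED = 0x17
MEDIUM_ALLOCATED = 0x75
LARGE_ALLOCATED = 0x76

_STATES = {
    NOT_ALLOCATED: "not allocated",
    MEDIUM_ALLOCATED: "medium block, allocated",
    LARGE_ALLOCATED: "large block, allocated",
}


def classify_prefix(prefix: bytes):
    """Return (asid, state) for one 8-byte block prefix."""
    if len(prefix) != 8:
        raise ValueError("a block prefix is 8 bytes long")
    # The first halfword holds the ASID (big-endian, unsigned).
    (asid,) = struct.unpack(">H", prefix[:2])
    # The third byte is unused; the fourth byte is the allocation indicator.
    state = _STATES.get(prefix[3], "unrecognized indicator")
    return asid, state


def allocated_to(prefixes, asid: int):
    """Indexes of prefixes that are allocated to the given ASID."""
    hits = []
    for index, prefix in enumerate(prefixes):
        owner, state = classify_prefix(prefix)
        if owner == asid and state.endswith("allocated") and state != "not allocated":
            hits.append(index)
    return hits
```

An ASID of 0 is reported as-is; as the text notes, it appears both for unallocated blocks and for special cases such as unsolicited device ends, so it should not be read as an owner.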
Also, investigate UCB flags indicating the quiesce option, DAVV (direct access volume verification) processing, I/O restart, missing interrupt handler (MIH), or message pending.

Another place to look is the LCHs (logical channel queues). When a STARTIO macro is issued, if both the channel and device are available, IOS attempts to issue the SIO instruction. If any bit in UCBFLA (UCB+6) is on, the device is considered busy. The TCH instruction is used to determine if the channel is busy. If either is busy, the IOQE for the request is queued to the LCH. This queue then indicates all requests that have been accepted for processing but for which either no SIO has yet been issued or an SIO was issued but a non-zero condition code was received.

The first LCH is pointed to by CVT+X'8C'. Each LCH is X'20' bytes long. UCBLCI (UCB+X'A') is an index to the LCH for the given UCB. Each LCH is a double-headed, single-threaded queue of IOQEs. The LCH+0 points to the first IOQE and LCH+4 points to the last, or only, IOQE. If LCH+0 is all Fs or 0s, the queue is empty, in which case there are no requests queued for that channel. The IOQEs themselves are linked with IOQELNK (IOQE+0). IOQEIOSB (IOQE+8) points to the IOSB for the request it represents. Note that IOQENQ (IOQE+4, bit X'40') must be on for all IOQEs on the LCH.

General Hints For IOS Problem Analysis

1. Save areas. IOS does not use save areas in the standard manner. When registers are saved, the order is often 0-15 at offset 0 into the save area. If the local lock is obtained (as is generally the case), IECVPST, the first module to execute in the user's address space after an I/O interrupt, uses the local lock save area (ASXBFSLA at ASXB+X'24') to pass the address of the local lock save area to the exit routines. An exception is I/O interrupt processing for a paging pack, where a storage manager or ASM area is used. Basic IOS uses the IOS save area (LCCA+X'218' points to the CPU work save area vector table (WSAVTC); WSAVTC+X'18' points to the IOS save area). This save area is also passed to DIE (Disabled Interrupt Exit) routines. Also, the TCCW control block contains a save area. EXCP passes the address of the associated TCCW+X'48' (in Register 13) to appendages for use as a save area.

2. EXCP back-end processing does all the interfacing to the traditional appendages. In MVS, appendages are entered in SRB mode, physically enabled, and with register 13 containing the address of a save area. It is EXCP's responsibility to map the IOSB to the IOB to maintain compatibility. Also, on return from the appendage, EXCP re-maps the IOB to the IOSB.

3. The EWA (ERP work area) can be important in problem analysis. The IOSB+X'34' points to the EWA, which contains information, including sense data, passed to the ERPs from IOS as well as work areas and counters for the ERPs. The ERPIB, which is useful for channel errors, is contained in the EWA. See the topic "Error Recovery Procedures (ERPs)" later in this section for a description of ERP processing. Several problems have been uncovered where ERPs constantly retry an I/O operation that constantly fails. The EWA can contain the number of retries and other control information helpful in determining the reason why. EWAs often contain the retry CCWs.

4. The LCCA of each processor contains an IRT (IOS recovery table). IOS uses various fields in the IRT to checkpoint its progress. The IRT also contains pointers to the active control blocks on whose behalf IOS is processing.

5. Two IOSB flags (IOSEX, IOSERR) are used to control error processing. For a permanent error the general flow is:

• Abnormal or normal exit entered with IOSCOD=7F, IOSEX=1, IOSE...

[Figure 5-9. IEAVNP05 Initialization. Annotations: builds fixed LPA (via IEAFIXxx) and puts CDEs on the ALPAQ (note: some modules can now be in both the pageable and fixed LPAs); builds modified LPA (via IEALPAxx), fetching modules and enqueuing CDEs on the bottom of the ALPAQ only if the modules are not already in the fixed LPA; uses IEALOD00 to build "permanent" CDEs for specified pageable LPA modules; if CLPA is specified, builds pageable LPA and PLPAD for all modules on SYS1.LPALIB. Elements shown: ALPAQ (SQA), CDEs, LPDEs, PLPAD, pageable LPA, modified LPA, fixed LPA, SVCLIB modules, LINKLIB modules.]

Program Manager (continued)

If a useable copy of the requested module is not immediately available, the requestor's program manager SVRB is put into a wait state and enqueued on the SVRB suspend queue (SSQ). The SVRB is dequeued and posted out of its wait when the desired module becomes available.

For "not in storage" suspends, module IEAVLK01 posts all SVRBs queued on a CDE's SSQ when it successfully completes a module fetch. Each of these SVRBs then restarts the LINK request essentially from the beginning at entry point IEAQCS02 in module IEAVLK00.

For the serially reuseable case, module IEAVLK02 posts the top SVRB on a CDE's SSQ when the PRB that was using the module represented by the CDE exits. In this case, execution resumes in module IEAVLK00 at entry point IEAQCS03. The logic at this entry point assumes the requested module is in storage and immediately available.

Once a module becomes available to a request, the module-use count in the CDE is increased by one. This use count is decreased by one when the current requestor no longer needs the module.

Next, LINK processing gets storage for a PRB out of subpool 253. The PRB is initialized (including setting the RBOPSW to point to the entry point of the requested module) and enqueued on the current TCB's RB queue. It is enqueued after the program manager SVRB, but before the linking module's RB.
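The use count handling and the enqueue position described above can be sketched as follows. This is a minimal Python illustration, not the IEAVLK00 logic itself: the `Cde` class, the function names, and the representation of the TCB's RB queue as a list whose first element is the top of the queue are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cde:
    name: str
    use_count: int = 0  # module-use count kept in the CDE


def module_available(cde: Cde) -> None:
    """A request has been given the module: use count goes up by one."""
    cde.use_count += 1


def module_released(cde: Cde) -> None:
    """The requestor no longer needs the module: use count goes down by one."""
    cde.use_count -= 1


def enqueue_prb_for_link(rb_queue, new_prb):
    """Place the new PRB after the program manager SVRB (top of the queue)
    and before the linking module's RB. Returns a new list."""
    if not rb_queue:
        raise ValueError("the program manager SVRB must be on the queue")
    return [rb_queue[0], new_prb, *rb_queue[1:]]
```

With a queue of `["PM SVRB", "linking RB"]`, the call `enqueue_prb_for_link(queue, "new PRB")` yields `["PM SVRB", "new PRB", "linking RB"]`, so the new PRB is the next RB below the program manager SVRB.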
The program manager then exits, thus causing the requested load module to gain control next. (See Figure 5-10.)

PRB Field     How initialized by IEAVLK00 for LINK (and ATTACH)
RBPREFIX      zero
RBSIZE        13 double words
RBSTAB1       zero
RBSTAB2       from PM SVRB except RBATTN=0
RBCDFLGS      zero
RBCDE1        -> requested CDE (may be a minor)            Note 1
RBOPSW-LH     from caller's RB (or AABC0D00)               Note 2
RBOPSW-RH     module entry point from CDENTPT
RBPGMQ        from PM SVRB
RBWCF         from PM SVRB
RBLINKB       -> caller PRB (or TCB if ATTACH)
RBGRSAVE      from PM SVRB

Notes:
1. RBCDE1 will point to the CDE containing the requested load module name. This may be a minor CDE. CDRRBP of the major CDE, however, will point to the new PRB. Field CDRRBP in a minor CDE has no meaning.
2. If ATTACH, RBOPSW (left half) is set to AABC0D00 where:
   AA = from current PM PSW
   B = from TCB protect key (TCBPKF)
   C = X'C' if TCBFSM = 1; X'D' otherwise
   D = from PICA if there is one, else 0

Figure 5-10. New PRB Initialization - LINK

ATTACH

When the ATTACH service routine completes the initialization of the requested daughter TCB, it gives control to LINK in order to establish the first PRB for the daughter TCB. ATTACH simulates the SVC FLIH by creating a program manager SVRB under the daughter TCB and then causing the daughter to branch enter module IEAVLK00 at entry point IEAQCS01. Processing is essentially the same as for LINK except for APF considerations, which are explained later.

XCTL

Module IEAVLK00 gets control from the SVC FLIH at entry point IGC007 when the XCTL SVC (SVC 7) is issued. With XCTL, unlike LINK, the first function of module IEAVLK00 is to establish the new RB. The method used depends on the type of caller, as follows:

• If the caller is an SVRB, the caller's SVRB is reused for the new module. It remains in the TCB RB queue in the same position as it was when IEAVLK00 got control.

• If the caller is an IRB, storage is obtained from subpool 255 for a new PRB. The new PRB is then enqueued on the TCB RB queue between the IRB and the program manager SVRB.

• If the caller is a PRB, storage is obtained for a new PRB from subpool 255 and then it is enqueued upon the TCB RB queue following the program manager SVRB. The caller's PRB is then put on top of the queue. The program manager then issues the EXIT SVC (SVC 3), forcing the caller's PRB, since it now is on top of the queue, through exit processing. This results in the storage for the caller's old module being freed before the new module is obtained. The program manager then resumes execution at entry point IEAQCS02 in module IEAVLK00.

Figure 5-11 shows how the new PRB (SVRB in the case where the caller is an SVRB) is initialized for an XCTL. Figure 5-12 shows how the new RB is enqueued in the TCB RB queue before the program manager locates the new load module.

The next function in the XCTL process is to locate the desired module. If the caller is an SVRB, the module is searched for via the ALPAQ; if it is not found, it is searched for via the PLPAD. If it is not found by either the ALPAQ or the PLPAD, an 806 abend is generated. If the load module is found, final initialization in the RB is completed and the program manager exits.

The following exceptions to normal processing occur when an SVRB issues an XCTL macro (they are made for performance reasons):

• Only the ALPAQ and PLPAD are searched.

• If the CDE on the ALPAQ is found useable, the use count is not increased.

• If an LPDE in the PLPAD is found useable, no CDE is built or enqueued on the ALPAQ. Furthermore, the RBCDE1 field is made to point to the LPDE rather than a CDE.

If the caller is not an SVRB, the requested load module is located as it is in LINK. Once found, initialization is completed on the already existing PRB and return is made to the caller.
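The restricted search used when an SVRB issues XCTL can be sketched as follows. This is a minimal Python illustration under stated assumptions: the ALPAQ and PLPAD are modeled as plain lists of dictionaries, and the names `locate_for_svrb_xctl` and `Abend806` are hypothetical. The sketch only mirrors the three exceptions listed above; it does not model useability testing.

```python
class Abend806(Exception):
    """Stands in for the 806 abend issued when the module is not found."""


def locate_for_svrb_xctl(name, alpaq, plpad):
    """Search only the ALPAQ, then the PLPAD, for the requested module.

    Returns ("CDE", entry) or ("LPDE", entry); the caller would make
    RBCDE1 point to whichever entry is returned.
    """
    for cde in alpaq:
        if cde["name"] == name:
            # Found on the ALPAQ: the use count is deliberately left alone.
            return "CDE", cde
    for lpde in plpad:
        if lpde["name"] == name:
            # Found in the PLPAD: no CDE is built or enqueued on the ALPAQ.
            return "LPDE", lpde
    raise Abend806(name)
```

Because the private libraries, SVCLIB, and LINKLIB are never consulted on this path, a module that exists only there produces the 806 abend even though LINK would have found it.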
                How initialized by IEAVLK00 for XCTL
RB field      Caller is a PRB        Caller is an IRB       Caller is an SVRB
RBPREFIX      zero                   zero                   left as is
RBSIZE        from caller PRB        17 double words        left as is
RBSTAB1       from caller PRB        zero                   left as is except RBTRSVRB, Note 1
RBSTAB2       from caller PRB        from caller IRB        left as is
                                     except RBFDYN=1
RBCDFLGS      zero                   zero                   left as is
RBCDE1        -> requested CDE       -> requested CDE       -> CDE or -> LPDE
RBOPSW-LH     from caller PRB        from caller IRB        left as is
RBOPSW-RH     -> module entry point  -> module entry point  -> module entry point
RBPGMQ        zero                   zero                   left as is
RBWCF         from caller PRB        from caller IRB        left as is
RBLINKB       -> caller's caller RB  -> calling IRB         left as is
RBGRSAVE      from caller PRB        from caller IRB        left as is

Note:
1. Bit RBTRSVRB indicates (for a dump routine) the location of the load module. It will be set to 0 if the module was located via a CDE on the ALPAQ. It will be set to 1 if the module was located in the pageable LPA.

Figure 5-11. New RB Initialization - XCTL

[Figure 5-12. XCTL RB Manipulation. Panels: XCTL by PRB (the TCB RB queue at start, before the SVC 3, and after the SVC 3, when processing resumes at IEAQCS02); XCTL by IRB; XCTL by SVRB.]

LOAD

Module IEAVLK00 is called by the SVC FLIH at entry point IGC008 when the LOAD SVC (SVC 8) is issued. For a LOAD request, the TCB's load list is first searched for an LLE representing a useable copy of the requested module. If found, the LLE total responsibility count is increased by one.
In addition, if the caller is in supervisor state and/or key 0-7, the system responsibility count is updated. A separate system count is maintained to prevent a non-system user from deleting a module loaded by a system routine.

If the load list does not yield a useable copy of the requested module, the module is located and CDEs are manipulated as explained earlier for LINK.

The final step for LINK processing is the building of the PRB. For LOAD, however, no PRB is built; instead, an LLE is built and enqueued at the top of the TCB's load list queue. This LLE points to the CDE (whether it be on the JPQ or the ALPAQ) of the requested module. The total responsibility count is initialized to one, and the system responsibility count to zero or, if a system request, to one.

DELETE

Module IEAVLK00 is called by the SVC FLIH at entry point IGC009 when the DELETE SVC (SVC 9) is issued. Since the module to be deleted must have been previously loaded by the same task, IEAVLK00 searches the TCB's load list queue for the module. If it is not found, the program manager exits with a return code of 4. If the module is found, the total responsibility count in the LLE is decreased by one. The system responsibility count is also decreased by one if the DELETE was issued by a system program. Finally, the use count in the CDE is decreased by one.

The LLE is dequeued and freed if the total responsibility count goes to zero. If the use count in the CDE also goes to zero, routine CDHKEEP in module IEAVLK02 is called. This routine frees the CDE and all its minor CDEs, the associated extent list, and the module itself. Once control is returned to IEAVLK00, the program manager exits.

Exit Resource Manager

Module IEAVLK02 is called by the exit prologue at entry point IEAPPGMX whenever a PRB exits. The purpose is to clean up the program resources that were being used by the PRB. First, the program manager decreases by one the use count in the CDE being used by the PRB.

If the module is serially reuseable, and there are SVRBs suspended on the CDE's SSQ, the top SVRB is posted so it can begin using the module. If the CDE's use count goes to zero, then the CDE, all its minor CDEs, the extent list, and the module itself are freed. When the module is freed (by subroutine CDHKEEP) it is freed from:

• Subpool 0, if bit CDSPZ is 1
• Subpool 251, if bit CDSPZ is 0 and bit CDJPA is 1
• Subpool 252, if bit CDSPZ is 0 and bit CDJPA is 0

(See the discussion of "Module Subpools" later in this chapter.)

If the exiting PRB is the last in the TCB's RB queue, IEAVLK02 also does end-of-task clean up. This consists of cleaning up and freeing all LLEs remaining on the TCB's load list queue.

SYNCH

Module IEAVLK00 is called by the SVC FLIH at entry point IGC012 when the SYNCH SVC (SVC 12) is issued. SYNCH essentially uses the tail end of LINK processing to build and enqueue a PRB for the user exit. No module searching, CDEs, LLEs, etc. are involved.

IDENTIFY

Module IEAVID00 is called by the SVC FLIH at entry point IGC041 when the IDENTIFY SVC (SVC 41) is issued. IDENTIFY builds a minor CDE for the requested name and entry point. The CDE is enqueued on the JPQ or ALPAQ following the major CDE that represents the module containing the entry point.

One exception to this is if the requestor is not authorized (not supervisor state, not in a system key, and not executing in an APF-authorized step) and the embedded entry point is in a module from an APF-authorized library. In this case, for integrity reasons, a major CDE for the embedded entry point is built and enqueued on the JPQ. Since the CDE is initialized to represent the module as not coming from an authorized library, no authorized user is allowed to use this user-defined entry point.

Module IEAVID00 also accommodates OS/LOADER with special processing.
When OS/LOADER issues the IDENTIFY SVC, it has loaded a module into subpool 0, built an extent list, and now wants to be represented by a major CDE and extent list built and enqueued on the JPQ. This request is called a "major request" and is indicated when Register 0 contains 0 upon entry to IEAVID00. Register 1 contains a pointer to the module name and extent list.

Figure 5-13 illustrates CDE initialization by IDENTIFY.

              Normal Request                                        Major Request
CDE Field     Normal             Non-authorized requestor and
                                 module from an APF-authorized
                                 library
CDCHAIN       (behind major)     (top of JPQ)                       (top of JPQ)
CDRRBP        zero               zero                               zero
CDNAME        as per input       as per input                       as per input
CDENTPT       as per input       as per input                       as per input
CDXLMJP       -> major CDE       zero                               -> XL (at end of CDE)
CDUSE         zero               zero                               zero
CDNIP         as in major CDE    0                                  0
CDNIC         0                  0                                  0
CDREN         1                  as in major CDE                    0
CDSER         1                  as in major CDE                    0
CDNFN         0                  0                                  0
CDMIN         1                  0                                  0
CDJPA         0                  1                                  0
CDNLR         1                  as in major CDE                    1
CDSPZ         0                  0                                  1
CDXLE         0                  0                                  1
CDRLC         0                  0                                  0
CDOLY         0                  1                                  0
CDSYSLIB      0                  0                                  0
CDAUTH        as in major CDE    0                                  0

Figure 5-13. CDE Initialization by IDENTIFY

ABEND Resource Manager

Module IEAVLK02 is called by RTM at entry point IEAPPGMA under two circumstances: when a TCB is going to abnormally terminate; and when a program manager SVRB is going to be forced through exit processing because of a recovery retry.

When IEAVLK02 is called, its function is to clean up CDE SVRB suspend queues. If the current TCB has any program manager SVRB on an SVRB suspend queue for any CDE on the JPQ, the SVRB is dequeued. Furthermore, when a TCB is going to abnormally terminate, if any CDE on the JPQ has the CDNIC bit on and a program manager SVRB on the abending TCB's RB queue is responsible for fetching the module into virtual storage, all other SVRBs waiting for the module are posted and the CDE is dequeued and freed.

806 ABEND

If the program manager cannot locate a load module in response to a LINK, ATTACH, XCTL, or LOAD request, it issues an 806 abend. Two key areas in the program manager should be understood if an unexpected 806 abend occurs or if the program manager uses a copy of a module that was not anticipated. These are (1) the module search sequence or search order and (2) the criteria used in determining whether or not a module already in virtual storage is useable.

1. Search Sequence

For a LOAD request, CDEs located on the task's load list queue are first searched for a useable module. If this search fails, the search sequence for LOAD is then the same as it is for LINK, ATTACH, and XCTL. The search sequence for LINK, ATTACH, XCTL, and LOAD (if the LLE scan is unsuccessful) is shown in Figures 5-14 and 5-15.

2. Useability Criteria

When searching for a module, the program manager looks for a CDE already enqueued on the JPQ or ALPAQ with a CDNAME the same as that of the requested name. If a matching name is found and the CDE is on the ALPAQ, the module is immediately available to the requestor because all these CDEs represent modules that are reentrant and from APF-authorized libraries.

If the CDE is on the JPQ, however, certain tests have to be made to determine if the module represented by the CDE can be used by the requestor. The routine CDALLOC (CDE Allocation) performs this testing. The CDE with the matching name is the input to CDALLOC. Output is a return code indicating the useability of the associated module. Figure 5-16 describes tests and actions taken by CDALLOC.

APF Authorization

The program manager performs two APF-related functions. The first function determines whether or not a job step is APF-authorized when the job step TCB is attached. The second function prevents any authorized program from accessing, via LINK, ATTACH, XCTL or LOAD, a module that is not from an APF-authorized library.
[Figure 5-14. Module Search Sequence for LINK, ATTACH, XCTL and LOAD. Flowchart boxes, in the order they appear: Search CDEs via TCB's load list queue; Search JPQ; Search private libraries (see Figure 5-15); Search ALPAQ; Search PLPAD; Search SVCLIB; Search LINKLIB. Note: For XCTL, if the caller is an SVRB, only the ALPAQ and PLPAD are searched.

Order of CDEs on ALPAQ:
First: Modules activated from the pageable LPA - newest modules first.
Second: Modules listed in IEALOD00 - in inverse order of specification in list.
Third: Modules in fix lists - in inverse order of specification in lists. Lists also in inverse order of their specification.
Fourth: Modules in MLPA lists - in inverse order of specification in lists. Lists also in inverse order of their specification.]

[Figure 5-15. Module Search Sequence of Private Libraries. Flowchart boxes: Search library via given DCB; Search the parent job/step/task libraries via Z byte; Search all job/step/task libraries; Search current job/step/task libraries; Continue search with the ALPAQ. Decisions test the Z byte (Z byte = 1, Z byte > 1).]

[Figure 5-16. CDE Allocation. Decision table giving the CDALLOC return code for each type of request (LOAD; LINK, ATTACH, and XCTL) and each module condition found via the input CDE. Conditions tested include: whether the requestor is authorized* and whether the module is from an APF-authorized library; module being fetched (CDNIC = 1) or module in storage (CDNIC = 0); no load restrictions (CDNLR = 1) or load only (CDNLR = 0); reentrant (CDREN = 1), serially reuseable, or nonreuseable; in use (CDRRBP not 0) or not in use (CDRRBP = 0); used (CDNFN = 1) or never used (CDNFN = 0); use count; fetched by the program manager or loaded by OS/LOADER. For a LINK, ATTACH, or XCTL request against a load-only module (CDNLR = 0), the result is a 406 ABEND.

Return codes:
0: Module not available via JPQ
4: Module is immediately available
8: Module not available - continue JPQ search
12: Module not immediately available - suspend requestor until module is no longer in use
16: Module not immediately available - suspend requestor until fetch is complete

*In supervisor state, in system key, or as part of an APF-authorized step]

1. Establishing APF Authorization

An APF-authorized job step is executing if bit JSBAUTH is on in the JSCB. This bit is turned on by the program manager if the following conditions exist when LINK is called by ATTACH:

• It must be a job step ATTACH. The program manager considers it a job step ATTACH if field TCBJSTCB in the attached TCB points to itself and if there is a JSCB for the step indicated by a non-zero TCBJSCB field.

• The load module being attached must have been link edited with an APF authorization code of 1. This is indicated to the program manager when bit PDSAPF is on in the module's directory entry.

• The load module being attached must be from an APF-authorized library. This is determined by FETCH and indicated to the program manager by bit WKAUTH being on in the FETWK.

In summary, a job step is APF-authorized if the first program executed in the step is both from an APF-authorized library and is link edited with an APF authorization code of one.

2. 306 ABEND

An authorized program is one that is executing in supervisor state, or with a system protect key (0-7), or as part of an APF-authorized job step. An authorized program must LINK to, ATTACH, LOAD and XCTL to modules exclusively from APF-authorized libraries. The program manager issues an abend code of 306 if the only useable copy of a module requested by an authorized program is on a non-APF-authorized library.

When a load module is fetched into virtual storage, FETCH indicates to the program manager via the FETWK bit, WKAUTH, whether it is (bit on) or is not (bit off) from an APF-authorized library. If the requested module is already in virtual storage, the program manager determines whether or not it is from an APF-authorized library by examining the CDE bit, CDSYSLIB. If it is on, the module can be used by an authorized program. Bit CDSYSLIB = 1 if the associated module is from an APF-authorized library except in the following cases:

• The bit = 0 if the module is reentrant but is still fetched into subpool 251 because of TSO TEST considerations (see the following discussion on "Module Subpools").

• The bit = 0 when IDENTIFY creates a major CDE because a nonauthorized user indicates an embedded entry point in a module from an APF-authorized library.

Module Subpools

All modules represented by CDEs on the ALPAQ are loaded into the pageable LPA, the fixed LPA, and the modified LPA. These modules are never freed.
Modules represented by CDEs on the JPQ, however, are freed by the program manager and can occupy storage in subpool 0, subpool 251, and subpool 252.

Modules loaded by the OS/LOADER always use subpool 0. This is indicated by bit CDSPZ being set to one. When bit CDSPZ is zero, modules fetched by the program manager use subpool 251 if bit CDJPA is set on or subpool 252 if bit CDJPA is set off.

Reentrant modules from APF-authorized libraries are always fetched into subpool 252 if the requestor is a system program (a program in supervisor state or with a system key). Reentrant modules from APF-authorized libraries requested by nonsystem programs are also fetched into subpool 252 provided TSO test (TCB bit TCBTCP=0) is not running. All other modules are fetched into subpool 251.

FETCH/Program Manager Work Area (FETWK)

Module IEAVLK01 obtains the FETWK (mapped by DSECT IHAFETWK) from subpool 253 when a load module is to be fetched into virtual storage. A pointer to the FETWK is placed in RBCSWORK of the program manager SVRB.

FETWK is logically divided into three areas. The first area, up to but not including field WKADDR, is used exclusively by FETCH as a work area.

The second area, up to but not including WKPREFX, is used as a work area by the program manager. Field WKIOADDR and bits WKAUTH and WKSYSREQ in this area are also addressed by FETCH, as follows:

• WKIOADDR is always set to zero by the program manager. This instructs FETCH to do the GETMAIN for the load module.

• WKAUTH is set to one by FETCH to tell the program manager when a load module is from an APF-authorized library.

• WKSYSREQ is set to one by the program manager to tell FETCH that the requesting program is in supervisor state and/or system key. FETCH uses this indication to bypass DEB checking.

The third area of the FETWK, starting with WKPREFX, is the BLDL area. This area contains the directory entry used by FETCH to obtain the requested module.
The directory entry is placed there by the program manager either by moving a caller-supplied directory entry into the area or by issuing a BLDL.

RB Extended Save Area (RBEXSAVE)

The 48-byte extended save area (RBEXSAVE at RB+X'60') of the program manager SVRB is used by the program manager as a work area. This area, along with the FETWK, is a key area in analyzing program manager problems. RBCSNAME (at RB+X'60') contains the module name requested by the caller, and RBCSDE (at RB+X'68') points to a copy of the caller-supplied directory entry if one was supplied. If EP or EPLOC is specified, then this pointer is zero. Other key areas of the extended save area are RBCSWORK (at RB+X'74'), which points to the FETWK if FETWK was obtained, and bit RBCSSYSR (RB+X'70' = X'40'), which is on if the caller is a system program.

VSM

Virtual storage management (VSM) is responsible for allocating virtual storage, keeping track of what is allocated and, for certain subpools, recording to whom it is allocated. These services are performed both for the system and problem programs. (See Figure 5-17.) The following are the five basic functions that VSM performs:

Function                           Module Performing Function   Comments
Nucleus initialization (NIP)       IEAVNP08                     IPL parameters are: SQA=, CSA=, REAL=, VRREGN=
Address space initialization       IEAVGCAS                     Called by memory create
Step initialization/termination    IEAVPRT0                     GETPART/FREEPART
Virtual storage allocation         IEAVGM00                     GETMAIN/FREEMAIN
Cell pool management               IEAVGTCL                     GETCELL (get cell)
                                   IEAVFRCL                     FREECELL (free cell)
                                   IEAVBLDP                     BLDCPOOL (build cell pool)
                                   IEAVDELP                     DELCPOOL (delete cell pool)

The nucleus initialization program (NIP) is not discussed in this book.
The remaining VSM functions, and the related FRRs (functional recovery routines), are described in the following topics:
• Address Space Initialization
• Step Initialization/Termination
• Virtual Storage Allocation
• VSM Cell Pool Management
• Miscellaneous Debugging Hints

Figure 5-17. Virtual Storage Management's View of MVS Storage

Common Area (SQA and CSA can be intermixed on a 4K basis)
  SQA    SP 245 - Key 0, not fetch protect
  LPA
  CSA    SP 231/241/227/228/239
User Area (Private Address Space)
  LSQA   SP 253 - Key 0, not fetch protect, not pageable; AQE for task
         SP 254 - Key 0, not fetch protect, not pageable; AQE for any job step task
         SP 255 - Key 0, not fetch protect, not pageable
  SWA    SP 236, SP 237 - Key 1, not fetch protect, pageable
         SP 229 - User key, fetch protect, pageable
         SP 230 - User key, not fetch protect, pageable
  User region (the top of SP 0-127, 251, 252 is CURRGNTP in the local data area (LDA))
         SP 252 - Key 0, not fetch protect, pageable
         SP 251 - User key, fetch protect, pageable
         SP 0 - SP 127 - User key, fetch protect, pageable
  System region - 16K, chained out of RCT's TCB (TCBPQE at TCB + X'98')
System Area
  Nucleus (FREEMAIN cannot be issued for the nucleus)

Address Space Initialization

The create address space module (IEAVGCAS) initializes the VSM address space. IEAVGCAS creates the local data area (LDA). The LDA is the key anchor block and VSM save area for space allocation within an address space. IEAVGCAS builds all the blocks labeled "C" in Figure 5-18. IEAVGCAS also builds the initial allocation of 16-byte LSQA elements for VSM's local cell pool. GETMAIN then allocates VSM's internal control blocks from this pool.
IEAVGCAS also contains VSM's task termination and address space termination resource managers. The task termination routine frees all non-shared space anchored in the TCB. (See Figure 5-18.) The address space termination routine frees any WAIT or POST elements of a V=R (virtual=real) job that are associated with the terminating address space and are chained out of VSM's GDA (global data area). These V=R WAIT/POST elements exist only when a job is waiting for V=R address space.

IEAVGCAS's functional recovery routine (FRR) is IEAVCARR. IEAVCARR is divided into the following three sections, corresponding to those of IEAVGCAS.
1. Entry IEAVCARR, which protects the create address space portion of IEAVGCAS.
2. Entry IEAVTTRR, which protects the task termination portion of IEAVGCAS.
3. Entry IEAVFARR, which protects the address space termination portion of IEAVGCAS.

IEAVCARR does not place data in the variable recording area of the SDWA (STAE diagnostic work area). It does invoke SDUMP and either retries at the beginning of the function that detects the error or continues with termination.

Figure 5-18. Virtual Storage Management's Control Block Usage
[The figure shows the chain from the current ASCB (ASCBLDA) to the LDA, and from the current TCB the fields TCBSWA, TCBMSS, TCBUKYSP, TCBAQE, and TCBPQE, together with their SPQE, DQE, FQE, AQE, PQE, and FBQE chains.]
*AQEs will be for SP 254 for a job step task or for SP 253 if not a job step TCB.
C = built by IEAVGCAS
Note: Updates of all control blocks and queues in this figure are serialized by the local lock.

Step Initialization/Termination (IEAVPRT0 - GETPART/FREEPART)

IEAVPRT0 is invoked by IEAVGM00 (GETMAIN/FREEMAIN) via a branch entry as a result of a GETMAIN/FREEMAIN request from an initiator for subpools 242 (V=R) or 247 (V=V). IEAVPRT0 does not return to IEAVGM00; it returns directly to the initiator. IEAVPRT0 performs four functions, as follows:

1. For a V=V GETPART request: Sets TCBPQE to point to the dummy address space partition queue element (PQE) that was created at address space initialization time. Calls the initiator exit routine IEALIMIT in order to establish the LDALIMIT, which is the value used by GETMAIN as an upper limit for problem program subpool GETMAIN requests. The OS/VS2 System Programming Library: Supervisor contains a discussion on LDALIMIT, REGION=, and variable GETMAIN requests.

2. For a V=R GETPART request: Performs IEALIMIT processing as described above. Attempts to obtain V=R space by interrogating V=R FBQEs chained from the GDA.
• If unsuccessful
  - Creates V=R wait element
  - Chains the V=R wait element from the GDA
  - Indicates the V=R wait element is waiting for space
• If successful
  - Interfaces with RSM's (real storage manager) IEAVEQR to obtain real frames
  - Builds V=R dummy PQE, V=R PQE, and V=R FBQEs, and updates TCBPQE

3. For a V=V FREEPART request: Frees all task-related space by calling FREEMAIN, and freeing one subpool at a time. Frees SPQEs and task-related subpools. Sets TCBPQE=0.

4. For a V=R FREEPART request: Performs the same functions as for V=V FREEPART. Returns space to V=R FBQEs chained from the GDA.
Attempts to satisfy V=R waiting requests by posting the waiting initiator. The waiting initiator reissues the request; IEAVPRT0 will move the WAIT elements to the POST queue anchored in the GDA. This POST element is freed by GETPART when the initiator's new request is received.

IEAVPRT0's FRR, IEAVGPRR, does not use the variable recording area of the SDWA. It attempts a retry back into IEAVPRT0 based on footprints set in the FRR's six-word parameter area. Refer to the IEAVPRT0 code (microfiche) for a detailed description of this FRR parameter area.

Virtual Storage Allocation (IEAVGM00 - GETMAIN/FREEMAIN)

IEAVGM00 satisfies all GETMAIN requests for virtual storage. The control block structure it uses is shown in Figures 5-18 and 5-19. All GETMAIN processing for the private area subpools and all associated control blocks are serialized by the local lock. All common area subpools and associated control blocks are serialized by the SALLOC lock. A detailed process flow through GETMAIN for a virtual storage allocation request can be found in Appendix A in the GETMAIN/FREEMAIN process flow description.

In debugging GETMAIN, the most important information is contained in the original request for virtual storage. This information can be obtained from the trace table, from a parameter list passed by the problem program code, or sometimes from the LDA (local data area). The LDA cannot always be relied upon to provide information about the request unless the system is stopped immediately. Unless you insert a code or SLIP trap and take a stand-alone dump on error, the LDA is overlaid by another request during the dumping procedure.

If an immediate stop has been obtained upon encountering an error, the most useful information in the LDA is obtained from the flags in the LDARQSTA (LDA + X'10') field. The LDARQSTA contains the current request status.
Compare the flags in this field to the request and determine if the two are consistent. Then check through the control block chain, the LDA and GDA chains that are set up when subpools are requested, to find out why the abend or error occurred.

LDARQSTA (LDA + X'10')

Offset 0
  1... ....   4096-byte Request from Subpool 253/254
  .1.. ....   An AQE is needed
  ..1. ....   Fetch Protected Subpool
  ...1 ....   Authorized User Key Subpool
  .... 1...   SWA Subpool
  .... .1..   LSQA Subpool
  .... ..1.   CSA Subpool
  .... ...1   SQA Subpool
Offsets 1 and 2
  Subpool 254
  Requester has TCBABGM on
  Explicit V=V Region reached
  Variable Request, Pass 2
  SQA or LSQA Expansion
  Deferred Error I/O Flag
  FMAINB or MRELEASR Request
  GETMAINB Request
  Branch Entry
  Subpool FREEMAIN
  Supervisor Mode (if zero)
Offset 3
  SVC Number

Figure 5-19. Virtual Storage Management's Global Data Area (GDA)

The GDA is located through CVTGDA. The figure shows the following GDA fields:
  CSAPQEP
  VRPQEP
  PASTART, PASIZE   Private area start and size
  SQASPQEP          SQA SPQE
  SQASPLFT          SQA space left; includes CSA available for SQA expansion
  VRPOSTQ           V=R post elements: V=R virtual is now available; the initiator is posted to re-issue the request
  VRWAITQ           V=R wait elements: jobs waiting for V=R virtual space
  PFSTCPAB          Permanent cell pool CPABs (the CPAB table is shown in Figure 5-21)
  CSASPQEP          CSA SPQE
  GLBLCELL          (Release 3 only) VSM's pool of 16-byte SQA cells for VSM's internal control blocks
  GBLCELCT          Count of free cells

Note: The GDA is always at the very end of SQA; X'FFFFC8' in Release 2, X'FFFFC0' in Release 3.
Note: Updates of all the control blocks and queues in this figure, except PFSTCPAB, are serialized by the SALLOC lock. PFSTCPAB is read only after NIP.
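When LDARQSTA is read from a dump, the offset 0 flags can be decoded mechanically before they are compared with the original request. The following is a minimal sketch in Python, not part of the system code. The function name and the returned dictionary are hypothetical, only the offset 0 flags are decoded, and the placement of the SVC number in the last byte is an assumption taken from the LDARQSTA layout.

```python
# Flag meanings for offset 0 of LDARQSTA, as listed in the LDARQSTA layout.
LDARQSTA_BYTE0 = {
    0x80: "4096-byte request from subpool 253/254",
    0x40: "an AQE is needed",
    0x20: "fetch protected subpool",
    0x10: "authorized user key subpool",
    0x08: "SWA subpool",
    0x04: "LSQA subpool",
    0x02: "CSA subpool",
    0x01: "SQA subpool",
}


def decode_ldarqsta(word: int) -> dict:
    """Decode the fullword at LDA + X'10' (hypothetical helper).

    Only the first byte is decoded bit by bit; the last byte is
    assumed to hold the SVC number.
    """
    byte0 = (word >> 24) & 0xFF
    flags = [name for mask, name in LDARQSTA_BYTE0.items() if byte0 & mask]
    return {"byte0_flags": flags, "svc_number": word & 0xFF}
```

For example, `decode_ldarqsta(0x0400000A)` reports an LSQA subpool request issued through SVC 10; if the trace table shows a different subpool class for the same request, the LDA has probably been overlaid by a later request.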
GETMAIN's Functional Recovery Routine - IEAVGFRR

Whenever GETMAIN's FRR (IEAVGFRR) finds an error in a queue, an entry is made in the SDWA variable recording area, SDWAVRA (SDWA + X'194'), to indicate the error that has been found, its location, and the corrective action taken. Each entry is added to the next available location and the length of the user-supplied data is increased (field SDWAURAL, SDWA + X'193').

The high-order byte (byte 0) of the first word contains FF if the space in the variable recording area was exceeded and data entries were lost. The low-order byte (byte 3) of the first word contains a digit indicating the type of error that caused IEAVGFRR to get control. This digit comes from FRRBRNDX (branch index FRR) in the LDA, where it is set up by IEAVGM00. FRRBRNDX is X'1F' into the GETMAIN/FREEMAIN work area (GMFMWKAR); GMFMWKAR is the portion of the LDA that is used by IEAVGM00 as a work area. It is mapped at the end of this chapter.

The second word of the SDWA variable recording area contains the LDA field LDARQSTA at the time of error. The contents of LDARQSTA are described in the previous topic "Virtual Storage Allocation (IEAVGM00 - GETMAIN/FREEMAIN)".

The next 16 words usually contain the registers (ordered 0-15) at the time IEAVGM00 was entered. These registers are useful for debugging problems that occur on branch entry requests. Register 14 contains the caller's return address.

The remaining SDWAVRA entries consist of one to three words each. The first word of each entry has a code in the high-order byte and a data length in the low-order byte. If this length is 0, there is no additional data for this entry. A length of 4 or 8 indicates one or two additional words of data. These data words always contain the address of the affected field or control block. These are all shown in the table in Figure 5-20.
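The SDWAVRA layout described above can be walked with a short script when the recording area has been extracted from a dump or a SYS1.LOGREC record. The following Python sketch is illustrative only. The function name and result keys are hypothetical, the caller is assumed to pass exactly the used portion of the area (SDWAURAL bytes), and the 16 register words are assumed to be present even though the text says only that they usually are.

```python
import struct


def parse_sdwavra(data: bytes) -> dict:
    """Split IEAVGFRR's recording data into header, registers, and entries."""
    if len(data) < 18 * 4 or len(data) % 4:
        raise ValueError("expected at least 18 whole fullwords")
    words = struct.unpack(f">{len(data) // 4}I", data)  # big-endian fullwords
    result = {
        "overflow": (words[0] >> 24) == 0xFF,  # byte 0 = FF: entries were lost
        "frrbrndx": words[0] & 0xFF,           # byte 3: error type digit
        "ldarqsta": words[1],                  # request status at time of error
        "registers": list(words[2:18]),        # registers 0-15 at entry (assumed present)
        "entries": [],                         # (code, [address words]) pairs
    }
    i = 18
    while i < len(words):
        code = words[i] >> 24        # high-order byte: error code
        length = words[i] & 0xFF     # low-order byte: data length (0, 4, or 8)
        count = length // 4
        if length not in (0, 4, 8) or i + 1 + count > len(words):
            result["unparsed_from_word"] = i  # stop rather than guess
            break
        result["entries"].append((code, list(words[i + 1:i + 1 + count])))
        i += 1 + count
    return result
```

Each returned entry pairs an error code with the addresses recorded for it, so the codes can be looked up directly in Figure 5-20.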
Control blocks are checked to determine if they are in the correct subpool, for example, SQA or LSQA; queues are checked for validity.

Code  Data    Data Addresses                         Explanation
      Length  (First / Second)
01    4       PVTLCSA                                PVTLCSA is incorrect - it will remain unchanged.
02    4       PASTRT                                 PASTRT in GDA is incorrect - it is reset using PVT.
03    4       PASIZE                                 PASIZE in GDA is incorrect - it is reset using PVT.
04    0                                              All three sources of CSA start addresses (GDA, PVT, CSA PQE) disagree - no change will be made.
05    4       PVTHQSA                                PVTHQSA is incorrect - it will remain unchanged.
06    4       Bad TCB pointer                        TCB pointer is not valid - no queue repairing is attempted.
B1    4       SPQE with bad pointer                  Next SPQE pointer is incorrect - the SPQE queue is truncated (by setting the bad pointer to zero).
B2    4       SQA SPQE                               SQA SPQE flagword is incorrect - it will remain unchanged.
B3    4       LSQA SPQE                              LSQA SPQE flagword is incorrect - it will remain unchanged.
B4    4       SPQE                                   Next SPQE pointer = 0, but last SPQE flag is not on - the last SPQE flag is set on.
C1    4       Cell with bad pointer                  Next cell address is incorrect - the cell pool chain is truncated.
D1    4       SQA SPQE                               First SQA DQE pointer (in SPQE) is incorrect; it points outside SQA so all of SQA may be lost - it will remain unchanged.
D2    8       DQE with bad pointer / Bad DQE address DQE pointer is not in SQA or LSQA - DQE queue is truncated (by setting the bad pointer to zero).
D3    4       DQE                                    DQE area address or length is incorrect - this DQE is removed from the queue.
D4    8       Current DQE / Overlapped DQE           Current DQE area overlaps a previous DQE - current DQE is removed from queue.
D5    8       Current DQE / Previous DQE             Circular DQE queue - queue is truncated after previous DQE.
D6    4       DQE                                    First SQA DQE area address or length is incorrect - address and length are corrected.
F1    4       FQE                                    Incorrect type flag in FQE - the flag is corrected.
F2    4       DQE or FBQE with bad pointer           Next FQE pointer is incorrect (if SQA or LSQA, then previous FQE length could be too large) - FQE queue is truncated (by setting the bad pointer to zero).
F3    4       FQE                                    Incorrect (too long) length in FQE - FQE queue is truncated.

Figure 5-20. SDWAVRA Error Indicators

VSM Cell Pool Management

VSM's cell pool management consists of the following functions:

Module     Macro      Function
IEAVGTCL   GETCELL    Gets a cell from a preformatted pool of cells
IEAVFRCL   FREECELL   Frees a cell to a preformatted pool of cells
IEAVBLDP   BLDCPOOL   Builds a pool of formatted cells
IEAVDELP   DELCPOOL   Deletes a pool of formatted cells

The key to the VSM cell pool management function is the cell pool anchor block (CPAB). The layout of the cell pools is shown in Figure 5-21. Note that the permanent cell pools have IDs that are negative, for example, RSM01 (X'D9E2D401'), while the dynamic cell pools have IDs that are the address of the first CPAB divided by 4 (shift right 2). The four VSM cell pool management modules are small enough that processing can be determined from studying the CPAB mapping along with the code.

Figure 5-21. VSM Cell Pool Management
[The figure shows CVTGDA (CVT + X'230') pointing to the GDA, and PFSTCPAB (GDA + X'30') pointing to the permanent CPAB (cell pool anchor block) table. Four permanent pools are currently in use: SRB00, RSM01, RM103, RT104. Each CPAB contains a pointer to the next CPAB for this pool and a pointer to the next available cell; a cell contains the CPID when it is in use. Dynamic pools and their cells are anchored in the same way.]

Miscellaneous Debugging Hints

1. Most common problems with GETMAIN/FREEMAIN occur in the interface processing. There is usually a bad TCB or ASCB address or the local lock is not held upon entry. The ASCB is used only to find the LDA, which is in the same location in all address spaces except the master scheduler's.
   Note: On a branch entry to GETMAIN, register 7 contains the address of the ASCB; however, on return from the branch entry, register 7 no longer contains this address.

2. You can catch callers who do not hold the local lock on entering GETMAIN within routine CSPCHK. To do this, test for all of the following conditions:
   • Not NIP (CVTNIP)
   • Not GLBRANCH entry (SALHELD flag offset X'1E' in LDA)
   • Not GETMB or FREEMB (offset X'10' in LDA)
   • Local lock not held (PSAHLHI)
   • Not in the address space creation process (ASXBTCBS not equal to 0)

3. A valid GETMAIN/FREEMAIN that is issued for zero bytes is treated as a no-op.

4. The routine CSPCHK is a good place for a GETMAIN/FREEMAIN trap to occur because CSPCHK is called for every request during the beginning of IEAVGM00 processing.

5. GETMAIN makes few queue manipulation errors. If IEAVGM00 rejects a request, it is usually because the caller made an error on the interface; the message IEA700I is issued.

6. Subpool queue elements (SPQEs) are not freed even on a subpool FREEMAIN, and multiple keys for a problem program subpool on the same TCB are not allowed. This can result in a problem if a user changes TCBPFK. The following is an example of such a situation:

   Set TCB key TCBPFK=6
   GETMAIN SP 1               Causes SPQE to be built, storage in key 6
   FREEMAIN SP 1              SPQE for SP 1 is not destroyed
   Change TCB key TCBPFK=8
   GETMAIN SP 1               IEAVGM00 uses old SPQE and assigns storage in key 6

7. On branch entry to GETMAIN, IEAVGM00 saves registers at field BRANCHSV in the LDA and turns on the BRENTRY flag at offset X'10' in LDA. To be sure these saved registers are for the request in question, it is necessary to stop the system via a trap. (See "Using the SLIP Command" and the "System Stop Routine" topics in the chapter "Additional Data Gathering Techniques" in Section 2.)

8. Since the GDA occupies the last few bytes of storage, a random store at -4 or -8 overlays the GDA.

9. MVS has added a new register type GETMAIN/FREEMAIN SVC and branch entry. It is SVC 120.
The parameters differ from those of SVC 10 as follows:

   Register 1    Zero for a GETMAIN; address to be freed for FREEMAIN.
   Register 15   (SVC only)
                 Bytes 0 and 1: Reserved, set to 0.
                 Byte 2: Subpool ID
                 Byte 3: Following flag values:
                    Bits 0-4   Reserved, set to 0
                    Bit 5      = 0 Double word boundary
                               = 1 Page boundary
                    Bit 6      = 0 Conditional request
                               = 1 Unconditional request
                    Bit 7      = 0 GETMAIN
                               = 1 FREEMAIN

   For the branch entry SVC 120, register 15 contains the entry point address and register 3 is used for the parameters. Register 3 is set up the same as register 15 above with one exception: Byte 1 is the protect key for subpools 227-231 and subpool 241. The address that was obtained via GETMAIN is returned in register 1 as in SVC 10.

10. Some GETMAIN failures are recorded in an information list contained in the nucleus. This list is similar to a trace table and is pointed to by the CVTQMSG (CVT + X'10C'). Each entry contains data such as: ABEND code, ASCB address, TCB address, register 14 if GETMAIN was branch entered, and the parameters passed. Refer to the DSECT INFOLIST in module IEAVGM00 for additional information.

GMFMWKAR (in LDA at + X'18')

Offset in hex
(from start
of LDA)       Field
18            ABNDATA (var. data)
1C            MSGLEN, FREESW, LOCKFLAG, FRRBRNDX (one byte each)
20            REGSAVE - save area used for SRM and RSM (18 fullwords)
68            MSAVE - save area used for MRELEASE (16 fullwords)
A8            GNOTSAVE - save area used for GNOTSAT (16 fullwords)
E8            GFSMFSVE/CSPCKSAV (SMF core routines save area/CSPCHK save area)
F0            LOCKSAVE (overlaps into GFSMFSVE and MPTRS)
FC            MPTRS (previous and next ptrs save area)
104           DUMYDQE (dummy DQE for L/SQA expansion) (4 fullwords)
114           TEMPDQE (temporary DQE for FMCOMMUN) (4 fullwords)
124           DUMFBQE (dummy FBQE for MRELEASE) (4 fullwords)
134           SAV911 - save area for regs 9-11 (branch entry)
140           LASTSAVE (last list entry)
144           MINMAX - max and min length for variable request
14C           LASTLSTA (last list entry address)
150           LSTINDEX (index for list request)
154           FMAREAS (ptr to areas to be FREEMAINed)
158           PARMLDA, FRRPARM (FRR parm area addr)
15C           CLOPREV (previous FQE to closest)
160           CLOSEST (closest in size FQE)
164           LARGESTA (largest available)
168           LARGEST (largest available FBQE)
16C           SAVESIZE (size of multiple of 4K core)
170           ENDADD (end address)
174           STRTADD (start address)
178           LENSAVE (save area for length list ptr)
17C           DIFF/SAVEPQE (difference/PQE ptr in FBQESPCH)
180           FIXSTART (starting address to clear)
184           FIXEND (ending address to clear)
188           NOTSATSV (len ptr used by GNOTSAT)
18C           NOTSATSI (LDARQSTA save area for GNOTSAT)
190           SAVESEG (address of multiple of 4K core)
194           LARSOFAR (largest available in FBQE)
198           RSTRTADD (rounded start address)
19C           RENDADD (rounded end address)
1A0           VPFP (find page address to be used)
1A4           DQESAVE - save DQE ptr and previous DQE ptr
1AC           FMSAVE (save return reg for FREEMAIN)
1B0           CODE1, PREVFQSV (save area for previous FQE ptr)
1B4           CLEARSW, FQESAVE (save area for FQE)
1B8           SPQESAVE (save area for SPQE)
1BC           SVRLB (save area for RLB)
1C0           SVSIZE (save area for rounded size)
1C4           DQESAVE1 - save DQE info when using FMAINB
1CC           GMFMSW, FMSAVE1 (save return reg in FMAINB)
1D0           FETCH, FQESAVE1 (save FQE info in FMAINB)
1D4           PREVFQS1 (save previous FQE in FMAINB)
1D8           SPQSAVE1 (save SPQE in FMAINB)
1DC           OUTSW, SVRLB1 (save RLB for FMAINB)
1E0           SVSIZE1 (save rounded size for FMAINB)
1E4           CODE2, SAVSVTSV (save LDARQSTA in FMAINB)
1E8           FQENXTSV (FQE next save area)
1EC           SAVFRESW, SPID
1F0           OLDFQELN (old FQE length)
1F4           NEWFQELN (new FQE length)
1F8           SEGTEST (end seg test area)
1FC           LSPQEKEY, RQSTRKEY, SAVSPID, SAVSPID2 (one byte each)
200           End of GMFMWKAR

One-byte fields and flag values:

MSGLEN     Reason code and length of var. data
FREESW     X'80' FREEMAIN in progress
           X'40' Length has been incremented
           X'20' Address has been decremented
           X'10' Not 1st DQE (for L/SQA)
           X'08' FQE was below freed area
           X'04' Further page release needed
LOCKFLAG   X'02' SALLOC lock obtained
           X'01' SALLOC lock already held
FRRBRNDX   X'07' Subpool FREEMAIN, AQE area not in DQE
           X'06' Page release return code of 1
           X'05' SALLOC obtain return code not 0 or 4
           X'04' On L/SQA expansion, GFRECORE failed
           X'03' FINDPAGE return code not 0 or 4
           X'02' Create segment return code > 0
           X'01' SALLOC release return code > 0
           X'00' Unexpected error, see status
PARMLDA    X'80' Global request (GLBRANCH or MRELEASE on global request)
           X'40' SALLOC lock obtained by GM/FM
           X'04' First flag byte in FRR parm
           X'00' LDA address in FRR parm
CODE1      (Save area for option code)
           X'C0' List indicator (mixed if list)
           X'20' Conditional indicator
           X'10' Mask for page boundary
           X'04' SVC 120 page boundary request
           X'02' SVC 120 unconditional request
           X'01' SVC 120 FREEMAIN request
CLEARSW    (CLEARSW for GFRECORE)
           X'01' FQECPB indicator on in FQE
GMFMSW     (GM/FM switch for MRELEASE)
           X'04' First time switch for MRELEASE
           X'02' Indicates FM for FBQE
           X'01' Indicates GM for FBQE
FETCH      (Key and fetch protect)
           X'08' Fetch protect on
OUTSW      (Switch for out of real/virt.)
           X'00' Real indicator for OUTSW
           X'FF' Virt. indicator for OUTSW
CODE2      (Save area for option code)
SAVFRESW   (Save FREESW in FMAINB)
SPID       (SPID for MRELEASE)
LSPQEKEY   (Protect key from current SPQE)
RQSTRKEY   (Requester key or key = PARM)
SAVSPID    (Save SPID for FREEMAIN)
SAVSPID2   (SPID for messages)

Real Storage Manager (RSM)

The real storage manager (RSM) manages the real storage of the system. To do this, it divides all potentially pageable real storage into 4K-byte frames. Within RSM, the page frame table entry (PFTE) describes the frame according to type, current use, or its most recent use. The current or last state of a request for RSM pageable services is described by the page control block (PCB) within RSM: the requestor supplies information about his request and RSM reformats this data into a PCB. As the request is processed, RSM adds other internal RSM information to the PCB.

RSM is a queue-driven component. Both PFTEs and PCBs are queued based on their current state. Simply stated, frames that can be used immediately are queued on the available frame queue; their PFTEs describe the frame's last use. Similarly, free request elements are queued on the FIFO PCB free queue; these PCBs describe the final state of previously processed requests. (This historical nature of PCBs is often useful in problem analysis.) To manipulate these control blocks and manage the queues, RSM has a PFTE manager (IEAVPFTE) and a PCB manager (IEAVPCB).

Besides being queued, PFTEs are located in a contiguous table starting at (PVTPFTP) + (PVTFPFN) and ending at (PVTPFTP) + (PVTLPFN).
PCBs, however, are obtained (via GETMAIN) in groups and are spread out in SQA. They can be found only by following queue pointers.

Major RSM Control Blocks

RSM's major control blocks are the PFTE, PCB, page table entry (PGTE), external page table entry (XPTE), paging vector table (PVT), RSM header (RSMHDR), and swap control table (SPCT). An RSM service routine called find page (IEAVFP) locates the PGTE and XPTE control blocks. The table in Figure 5-22 lists the control block functions.

Control Block   Function
PFTE            Describes the last use of a frame
PCB             Describes the current or last state of a request
PGTE, XPTE      Describe the current real frame and virtual page relationship for a particular virtual address
PVT, RSMHDR     Basically these are RSM anchors and work areas
SPCT            Related only to swapping; it describes the RSM requirements necessary to swap in an address space (the swap-out process formats the SPCT)

Figure 5-22. Major RSM Control Blocks and Their Functions

Only the leftmost 12 bits of either a real or virtual address are needed to describe a specific real frame or virtual page (a modulo 4K-bytes real and virtual addressing scheme). These 12-bit numbers are multiplied by 16 to form block numbers; for example, VBN0 and RBN0 are four-digit, hexadecimal, virtual and real block numbers. Also, note the following:
• PGTEs contain RBNx values.
• The contents of PVTPFTP plus RBN0 is the address of the PFTE for the frame whose real address is RBN000 through RBNFFF.

Of all the RSM control blocks, the most critical are the PCB, PFTE, and SPCT. The important fields in each block are described below. Figure 5-23 shows the relationship among the blocks.

Figure 5-23. Relationship of Critical RSM Control Blocks
[The figure relates the PCB and its AIA (field AIARPN) to the XPTE in the XPT, the PGTE in the PGT, and the PFTE. The contents of PVTPFTP plus RBN0 equal the address of the PFTE. VBN0 has the form SSP0, where SS is the virtual segment index and P is the virtual page index.]

PCB (Page Control Block)

Important fields in the PCB are:

+0      PCBCQN - indicates the current queue location of this PCB as follows:
        X'10' - PCB is not currently in use. It is queued on the PCB free queue anchored in the PVT.
        X'18' - PCB is currently waiting for frame allocation to occur. It is queued on the PCB defer queue anchored in the PVT.
        X'20' - PCB represents a common area I/O operation. Actual physical I/O may or may not be complete. It is queued on the PCB common-I/O queue anchored in the PVT.
        X'88' - PCB represents a private area I/O operation. Actual physical I/O may or may not be complete. It is queued on the PCB local-I/O queue anchored in the RSMHDR for the address space indicated by PCBASCB; ASCBRSM points to the RSMHDR.
        X'FF' - PCB is probably in use. The not-queued state means only that the PCB is not on the primary forward/backward chain of the above-mentioned major PCB queues. It can be a related PCB, a root PCB, or an associated PCB.

+8      PCBFL1:
        PCBSRBMD=X'20' - PCBSRB is the address of a page-fault-suspend SSRB. The use of this address is the only means of locating page-fault-suspended SRBs.
        PCBROOT=X'04' - PCBRTPA is the address of a root PCB. Root PCBs are only valid if their PCBCQN field is X'FF'.

+9      PCBRTPA - When the PCBROOT bit is on, this contains the address of a PCB that controls a block page operation.

+X'D'   PCBRLPA - The address of a chain of PCBs for the same PCBVBN/PCBRBN. The related chain of PCBs are dequeued PCBs that are chained via the PCBRLPA field (not via PCBFQP/PCBBQP).

+X'10'  PCBFL2:
        PCBRESET=X'10' - The function indicated by the PCB has been suspended for a page fault because no frames were available or paging I/O had to be completed before redispatching the page faulter.
PCBASCB, PCBRTPA, and PCBSRB define the ASCB, TCB, and RB to be RESET when PCBSRBMD is 0. When PCBSRBMD is 1, PCBASCB and PCBSRB define the ASCB and SSRB that will be RESET.

+X'11'  PCBXPTA - Is either 0 or the address of the XPTE.

+X'15'  PCBPGTA - Is either 0 or the address of the PGTE.

+X'18'  PCBRBN - This value when added to the address in PVTPFTP gives the address of the associated PFTE.

+X'1A'  PCBVBN - This field is often zero; when it is zero, the operation has either been NOPed with page I/O still in progress or the I/O is complete and the PCB is only serving a scheduling/tracking function. The operation is considered to be complete when PCBVBN=0; no other paging request should be able to relate to it; that is, it cannot be found via an equal compare on PCBVBN. When PCBVBN is zero, its previous value can be determined from the AIARPN field in the AIA. The AIA is the last 28 bytes of the PCB.

The following information about roots is useful to the problem solver.
• Root PCBs can generally be recognized because most of the PCB is still zero.
• The SPCT points to active roots for SWAP; RSMSPCT in the RSMHDR points to the SPCT.
• V=R waits for region roots are queued from PVTVROOT in the PVT.
• Vary offline roots are queued from PVTOROOT in the PVT.
• PAGE FIX and PAGE LOAD roots can only be found via PCBRTPA of the queued FIX/LOAD PCBs.

For non-root PCBs, PCBCQN, PCBFL1, PCBFL2, and PCBFL3 are the key fields. They describe the current state of the paging request for which the non-root PCB was last used.

SPCT (Swap Control Table)

The SPCT is mapped in modules IEAVSOUT, IEAVSWIN, IEAVCSEG, and IEAVITAS. Space for the SPCT is obtained via GETMAIN and is initialized in IEAVITAS. As segments are created, IEAVCSEG updates the SPCT.
IEAVSOUT initializes the SPCT with the pages that make up the working set (such as, LSQA and fixed pages plus recently referenced pages). IEAVSWIN uses the information IEAVSOUT put in the SPCT in order to start up a previously swapped-out address space. The first portion of the SPCT contains the _address of the swap root PCB (SPCTSWRT); the number of fixed and LSQA entries in this SPCT (SPCTFIX and SPCTLSQA); the number of segment entries and the number of active segment entries (SPCTNSEG and SPCTSSEG); and the working set size (SPCTWSSZ). The flags at offset X' A' are defined as follows: X'80' Swap-in in progress X'40' Swap-out in progress X'20' Paging was purged during swap-out X'lO' There is at least one fix entry with a fix count greater than 255 X'08' Page data set used for LSQA X'04' Swap-out requested by IEAVEQRP The next portion of the SPCT (SPCTSW AP) is the SPCT extension and is 56 (decimal) bytes long. It contains a maximum of six fix swap entries or eight LSQA swap entries, or a combination of the two. In a combination, LSQA entries precede all fix entries. LSQA entries are six bytes each and fix entries are eight bytes each. Both entries contain the following flags in the first byte: X'80' LSID in this entry is valid. X'40' This is an LSQA entry. X'20' The VBN in this entry is for common. X'lO' The page was flagged defer release at swap time. This flag byte is followed by a three-byte LSID and a two-byte VBN. If the entry is for LSQA, there is nothing more, but if the entry is a fix entry, the next two bytes contain the fix count. The last portion of the SPCT contains a variable number of six-byte segment entries. The first byte is the segment number and it is followed by the address of the page table. The next two-byte field (SPCTBITM) is a 16-bit map indicating which pages are to be brought in at swap-in time. Section 5. 
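When reading an SPCT out of a dump by hand, it can help to decode the flag bytes mechanically. The following is a minimal sketch in Python, not part of MVS or any IBM tool: the function names and the dictionary layout are hypothetical, and the only facts it relies on are the flag values and entry lengths given in the SPCT description above (one flag byte, a three-byte LSID, a two-byte VBN, and a two-byte fix count for fix entries only).

```python
# Illustrative decoder for SPCT flag bytes and swap entries.
# Names are hypothetical; bit values and field lengths come from the text above.

SPCT_FLAGS = {
    0x80: "swap-in in progress",
    0x40: "swap-out in progress",
    0x20: "paging was purged during swap-out",
    0x10: "at least one fix entry has a fix count greater than 255",
    0x08: "page data set used for LSQA",
    0x04: "swap-out requested by IEAVEQRP",
}


def decode_spct_flags(flag_byte):
    """Return the meanings of the bits that are on in the byte at SPCT + X'A'."""
    return [text for mask, text in SPCT_FLAGS.items() if flag_byte & mask]


def parse_swap_entry(raw):
    """Parse one swap entry from the SPCT extension (SPCTSWAP).

    LSQA entries are 6 bytes: flags, 3-byte LSID, 2-byte VBN.
    Fix entries are 8 bytes: the same fields plus a 2-byte fix count.
    """
    flags = raw[0]
    is_lsqa = bool(flags & 0x40)
    length = 6 if is_lsqa else 8
    if len(raw) < length:
        raise ValueError("entry is shorter than its flag byte implies")
    return {
        "lsid_valid": bool(flags & 0x80),
        "is_lsqa": is_lsqa,
        "vbn_is_common": bool(flags & 0x20),
        "defer_release": bool(flags & 0x10),
        "lsid": int.from_bytes(raw[1:4], "big"),
        "vbn": int.from_bytes(raw[4:6], "big"),
        "fix_count": None if is_lsqa else int.from_bytes(raw[6:8], "big"),
        "length": length,
    }
```

Because LSQA entries precede fix entries in a combined extension, a caller could walk the 56-byte area by repeatedly calling `parse_swap_entry` and advancing by the returned `length`; how unused space at the end of the extension is marked is not stated in the text, so that check is left to the reader.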
PFTE (Page Frame Table Entry)

Important fields in the PFTE are:

PFTIRRG - indicates the format of the first word of the PFTE. This bit is located in PFTFLAG2 at offset X'D' and is a X'10'. If it is on, the first word of the PFTE is mapped as PFTPGID and contains a VIO LGN and RPN. If PFTIRRG is off, the first word of the PFTE is mapped as PFTASID and PFTVBN. An ASID of X'FFFF' indicates a common area page. Note that a VIO LGN can be the same as an address space ASID; address space ASIDs and LGNs are seldom the same but could be.

PFTPCBSI - indicates there is a PCB on an I/O queue for this page; there can be a string of related PCBs for this page. This bit is located in PFTFLAG1 at offset X'C' and is a X'08'. This bit is turned off by the process that validates the page when the I/O completes, or, for output I/O, after the I/O completes but before the PFTE is sent to the free queue. Note that I/O queues sometimes contain several "no-op" PCBs; these appear to point back to a frame and its associated PFTE. When a PCB is made into a "no-op," PFTPCBSI is turned off and the association between that PCB and that frame and its associated PFTE is broken. These "no-op" PCBs are either output PCBs with incomplete I/O or input PCBs with complete I/O.

Page Stealing

Figure 5-24 shows the flow of the page stealing process. The circled numbers in the figure correspond to the notes below.

1. IEAVRFR scans the local frame queue (LFQ) or common frame queue (CFQ); the queue it scans is determined by the parameter list received from SRM.

2. IEAVRFR checks the hardware reference bits and then updates the unreferenced interval count (UIC). IEAVRFR orders the LFQ and the CFQ so that the PFTEs with the highest UICs are at the top of the queue. The queues are in descending order, with zero UICs at the bottom.

3. Frames are selected to be stolen based on their UIC and pageability; that is, fixed/LSQA/bad pages and pages that are V=R allocated cannot be stolen.

IEAVRFR calls a common routine, FREEPAGE, to invalidate selected pages and build a PCB for the page-out process if the page is changed. If the frame queue from which frames are being selected does not correspond to the current address space or the CFQ, IEAVRFR must schedule an SRB (STEAL) to the appropriate address space in order to get to the PGTE in LSQA. Finally, IEAVRFR calls ASM to start output paging.

Entry IEAVRFRA scans the LFQ of the address space it is scheduled into. If PFTSTEAL=1 and if a frame is still stealable and has not been referenced since "Select," IEAVRFRA sets the steal flag. FREEPAGE is then called to steal the frame. After the frame queue has been scanned, ASM is called and given a string of AIAs.

[Figure content:
  IRARMSTM - SRM branches to IEAVRFR with the RFR parameter list (flags and ASCB address). The parameter list is in module IRARMSTM (label RFRLST1).
  IEAVRFR (Select) - Obtains queue headers. Selects frames to be stolen. Stops scanning the frame queue when the UIC is less than the criteria number. Calls FREEPAGE or schedules steal. Starts accumulated I/O.
  IEAVRFR (Free Page) - Invalidates PGTE and builds PCB if changed page.
  IEAVRFR entry IEAVRFRA (Steal) - Locates frames with PFTSTEAL=1. If still stealable, calls FREEPAGE. Starts accumulated I/O.
  ILRPAGIO - Starts page-out I/O.]

Figure 5-24. Page Stealing Process Flow

Reclaim

Reclaim is an RSM function that revalidates a page/real frame pair that was previously invalidated. IEAVGFA performs the reclaim for the normal case after a page fault on an address space or common area virtual address. IEAVAMSI handles the VIO case.
In the virtual address case, IEAVGFA handles work as follows:

1. PCBVBN is used to locate the PGTE.
2. The PGTE is used to obtain the last-used RBN0 value.
3. The RBN0 is used to address the PFTE.
4. PFTIRRG is checked to determine if the first word of the PFTE is in PFTPGID or PFTASID/PFTVBN format.
5. If PFTIRRG=0, PFTVBN is compared to PCBVBN.
6. If the VBNs match and the VBN is in the common area, the reclaim is successful. If the VBN is in the private area and PFTASID matches ASCBASID (which PCBASCB points to), the private area reclaim is successful.

In the VIO case, IEAVAMSI handles work as follows:

1. IEAVAMSI is supplied with both an RBN0 and a DSPID.
2. The RBN0 is used to address the PFTE.
3. PFTIRRG is checked to determine if PFTPGID is in PGID format.
4. If PFTIRRG=1, PFTPGID must match the DSPID; if it matches, the reclaim is successful.

When reclaim fails, normal frame allocation paths are followed just as though the page had never been in storage.

Relate

Relate is an RSM function that associates independently generated page requests (PCBs) for the same virtual address. When the physical action required to satisfy one of these requests (I/O or frame allocation) is completed, all related requests are also satisfied. A PCB-related chain is produced for all cases except the VIO data set. The same modules that do reclaim, IEAVGFA and IEAVAMSI, handle the relate process, which only follows after a successful reclaim.

In the virtual address case, IEAVGFA handles work as follows:

1. The relate function is invoked for one of two cases:
   • The reclaim function has successfully completed and PFTPCBSI is on, indicating page I/O is in progress; a PCB I/O queue is searched.
   • The XPTDEFER bit is on, indicating that the previous PCBs have been delayed because frames were not available. The GFA defer queue will be searched to do the relate function.
2. The search argument is PCBVBN in all cases except that of the GFA defer queue; in that case PCBASCB and PCBVBN are the search arguments.
3. When the correct queued PCB is located, the current PCB is inserted in the related PCB chain between the queued PCB and the previous, first-related PCB.

In the VIO data set case, IEAVAMSI handles work as follows:

1. The PCB local I/O queue is scanned for a match on PCBRBN because PCBVBN is always set to 0 for move-out PCBs. If PCBRBN matches, PCBVAM must be on.
2. When the correct PCB is found, it is updated with the information the I/O completion portion of RSM needs to place the page of the VIO data set in the new window location (this is not necessarily a new page).

RSM Recovery

RSM recovery consists of a SETFRR at all major entry points to RSM:

• The issuer of the SETFRR places the address of the FRR in PVTPRCA.
• Each SETFRR returns a six-word parameter list in the recovery communications area (RCA).
• RSM has only one FRR - IEAVRCV.
• The IHARCA macro maps the RCA; this macro can be found in most RSM modules.
• IEAVPSI contains the RCA macro in assembler language format.

Whenever an unexpected error or C0D abend occurs, the RCA is copied into SDWAVRA. The CSECT ID and the module-entered flag in the RCA can be used to determine the path taken through RSM to the point of error. To determine this path, you must understand the RSM flow and know which module issues SETFRR.

The following RSM modules issue SETFRR at their main entry point:

  IEAVAMSI    IEAVPIOP    IEAVSOUT
  IEAVCSEG    IEAVPIX     IEAVSQA
  IEAVEQR     IEAVPRSB    IEAVSWIN
  IEAVIOCP    IEAVPSI     IEAVTERM
  IEAVITAS    IEAVRCF     IEAVRELS at IEAVRELV
  IEAVPIOI    IEAVRCV     IEAVFRSB at IEAVPRSR (entry point)
  IEAVRFR

RSM's FRR does not attempt complex recovery. Its main objective is to record the error and issue an SDUMP.
It has some special logic based on where the error occurred, as follows:

  Error Occurred In       FRR Action
  IEAVEQRI or IEAVRCFI    Restore registers for return to IEAVPFTE.
  IEAVPIX                 Attempt to reset page faulter.
  IEAVSIRT                "Memterm" swapping in address space.
  IEAVSWIN                "Memterm" swapping in address space.
  IEAVPIOI                Retry for cleanup or "Memterm."
  IEAVINV                 Set "GO" indicator and PTLB or retry to PTLB.
  IEAVPSI                 If error occurred while checking input parameters, set abend of 171.
  Other                   If it is a non-zero retry address, retry; otherwise continue with termination.

Recursion is not allowed. The PVT and PFTEs are dumped on the SDUMP.

The following reason codes are put into the RCARCRD field when real storage management issues an abend with a code of C0D. All C0D abends are retried at the next sequential instruction.

Real Storage Management ABEND Reason Codes

  Code (hex)  Meaning

  01  Findpage, translate real to virtual, or the LRA instruction returned an unexpected code for a segment, page, or frame whose existence was implied by some RSM control block or function. Findpage, translate real to virtual, or LRA is assumed to be correct.
  02  A GETCELL or FREECELL for the RSM cell pool failed. If FREECELL, the error is ignored; if GETCELL, asynchronous retry is attempted where possible.
  03  A FREEMAIN failed for space originally obtained by RSM or VSM using GETMAIN. The error is ignored.
  04  The return code from ASM (ILRSWAP, ILRPAGIO, or ILRTRPAG) indicates an invalid request. The recovery action taken by RSM varies with the type of request, but the RSM function being performed is usually terminated if ASM resources were being requested, or continued if ASM resources were being returned.
  05  A GETMAIN for RSM control block space was unsuccessful. The function for which the space was required is terminated.
  06  An attempt was made to release a lock which was not held. RSM tables might be damaged due to the loss of serialization. RSM attempts to continue normal operation.
  07  RSM control information indicated a PCB for a page should exist on an I/O active queue or on the defer queue, but searching of the queue(s) failed to find the PCB. It is assumed the control information is in error and no such PCB exists.
  08  The existence of a V=R or offline root PCB was implied but no appropriate PCB could be found on the V=R or offline root queue. The error is ignored and indicators are reset.
  09  Swap-in's XMPOST error exit was entered, so restore will not run. The target address space is terminated.
  0A  An incorrect fix count was detected in a PFTE. The count is adjusted to the expected value.
  0B  The interprocessor communication service routine (RISIGNL) could not signal another processor as requested by IEAVINV. The condition is ignored and normal operation continues.
  0C  IEAVPIOP has discovered an undefined combination of I/O completion status flags in the AIA after a page-in or page-out. The condition is treated as an I/O error.
  0D  IEAVDSEG was requested to destroy a non-existent or common area segment. The request is denied.
  0E  A PCB was required but none were available. The routine needing a PCB is terminated.
  0F  The attempt to complete processing of a previously deferred FREEMAIN release has failed.
  10  An FOE could not be found on the specified TCB's fix ownership list.
  11  An internal RSM invocation of the PGOUT function was unsuccessful. The page remains in real storage.
  12  A swap (in or out) was requested for an address space that already has a swap in progress, or no SPCT exists for the address space to be swapped. The request is denied.
  13  Swap-in could not re-establish the address space due to missing or incorrect control information (SPCT or PCBs). The address space is abnormally terminated.
  14  An internal invocation of PGFREE failed. The error is ignored.
  15  Swap-out has detected an inconsistency in the SPCT fix entries it has created. The error is suppressed and recovery attempted.
  16  ASCBCHAP could not enqueue or dequeue an ASCB during a swap-in or swap-out operation. The address space is terminated.
  17  Swap-out has detected an error in the allocated frame count (ASCBFMCT) for the address space. If possible, the count is corrected and the swap-out continued; otherwise, the swap-out is suppressed.
  18  No SPCT segment entry could be found for a segment whose existence was implied by other RSM control information. The error is ignored and the SPCT update is skipped.
  19  An internal RSM function issued a return code which was either invalid or not applicable. System action depends on the nature of the function.
  1A  Swap-in detected a common area page that was not obtained using GETMAIN among the input working set. The page is not made available to the incoming address space. Some other address space must have freed the page using FREEMAIN while the current one was swapped out. Probable user error.
  1B  During an attempt to free the frames backing a V=R region, it has been determined that the virtual space is not backed by real storage, or that the virtual-to-real mapping is not 1 to 1.
  1C  IEAVPSI attempted to fix the ECB for a page service that will complete asynchronously, but IEAVFXLD returned a code indicating the fix was not accomplished.
  1D  A PCB marked I/O complete (indicating that it was previously processed by IEAVPIOP) has been passed to IEAVPIOP by ASM.
  1E  A software error has been found in the AIA passed from ASM to RSM for an I/O request. Possible errors are:
      • The AIA contains data inconsistent with previous AIAs.
      • The original input chain (to ASM) was invalid.
      • The LSID in the XPTE was invalid.
      • The LPID in the XPTE was invalid.
      • A hardware I/O error occurred on a pageout PCB (this should not occur).
  1F  An invalid real storage address was returned to IEAVPRSB at entry point IEAVPRSR.
  21  IEAVPFTE detected a discrepancy in the SQA reserve queue count controls. Use of the SQA reserve queue is discontinued until after re-IPL. RSM attempts to continue normal operation.
  22  IEAVTERM has found an FOE fix count that is greater than the fix count in the corresponding PFTE. The PFTE fix count is not changed, but the FOE is freed.

RSM Debugging Tips

1. Because the PCB free queue is a FIFO queue, it represents recent history in RSM. Start your search of the PCB free queue with the youngest PCB (PVTFPCBL) and look for the appropriate VBN in the PCBVBN or AIARPN. This approach often reveals what has most recently happened to the page in question.

2. Whenever the system wants to break the logical connection between the PCB and the page, it sets PCBVBN to zero. Therefore, look at AIARPN to determine what VBN the PCB was associated with (AIARPN=PCBVBN/16).

3. The PVT contains several work/save areas that belong to a unique module. These are often useful to determine the last thing a module did.

4. At any time, there should never be more than one input PCB with a given PCBVBN on the I/O-active or GFA-deferred PCB queues. Output PCBs are never related.

5. The XPTVIOLP flag can be confusing. If it is on, XPTXAV must be on. SVAUX=1 means the LPID in the XPT is a VIO LPID and not an address space LPID. When a page with XAV=1 and SVAUX=1 is stolen, it is paged out under an address space LPID and SVAUX is set to zero. If the next operation on the page is a VIO move out, RSM tells ASM to logically transfer the address space LPID contents to the VIO LPID contents.

6. It is sometimes useful to observe the AIANXAIA pointers in PCBs on the PCB free queue.
   These pointers probably indicate the order in which I/O completed for a group of requests.

7. To help diagnose a C0D abend, the PVTDUMP bit (byte 0, bit 7 of the PVT) can be turned on (using superzap) to cause the RSM FRR to dump the PVT, PFT, SQA, and current LSQA data areas.

Converting a Virtual Address to a Real Address

A virtual address contains the segment number in the first byte, the page number in the next four bits, and the page displacement in the remaining twelve bits (that is, sspddd - segment, page, displacement).

The ASCB for the address space points to the RSMHD. The first word (RSMVSTO) of the RSMHD is the virtual address of the segment table (SGT). Multiply the segment number (ss) by the length of a segment table entry (4) to locate the SGT entry (SGTE). The SGTE contains the real address of the page table (PGT).

A real address consists of a real block number (RBN) in the first twelve bits and a page displacement in the remaining twelve bits (that is, rrrddd - RBN, displacement). The RBN portion of the real address of the PGT is concatenated with zero (RBN0) to form an index into the page frame table (PFT). This index is added to the apparent origin of the page frame table (PVTPFTP) in order to obtain the virtual address of the page frame table entry (PFTE). The PFTE identifies the frame that contains the page in which the page table resides. The second half of the first word of the PFTE is the virtual block number (VBN). The VBN is concatenated with the displacement portion of the real address of the page table to form the virtual address of the page table (VBN||ddd).

Multiply the page number (p) of the virtual address being converted by the length of a page table entry (2) to locate the PGT entry (PGTE). The PGTE contains the RBN portion of the real address that corresponds with the initial virtual address. This RBN is concatenated with the displacement portion of the initial virtual address to obtain the desired real address (RBN||ddd).
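The conversion just described is a short chain of shifts, masks, and table lookups, so it can be sketched as a small routine for checking hand calculations against a dump. The Python below is an illustrative sketch only: the function names and the two `read_*` callbacks (which stand in for looking up storage contents in a dump) are hypothetical. It assumes, as the manual's sample does, that the low three bytes of the SGTE are taken as the real address of the page table, and that the 2-byte PFTVBN and PGTE values carry the block number in their high-order twelve bits.

```python
# Sketch of the virtual-to-real conversion described above.
# read_word(addr) and read_half(addr) are hypothetical callbacks that return
# the 4-byte or 2-byte value found at a virtual address in the dump.

def split_virtual(vaddr):
    """Split a 24-bit virtual address sspddd into (ss, p, ddd)."""
    ss = (vaddr >> 16) & 0xFF      # segment number: first byte
    p = (vaddr >> 12) & 0xF        # page number: next four bits
    ddd = vaddr & 0xFFF            # displacement: remaining twelve bits
    return ss, p, ddd


def virtual_to_real(vaddr, rsmvsto, pvtpftp, read_word, read_half):
    """Follow SGT -> PFT -> PGT to turn a virtual address into a real one."""
    ss, p, ddd = split_virtual(vaddr)

    # Step 1: the SGTE at SGT + 4*ss holds the real address of the page table.
    sgte = read_word(rsmvsto + 4 * ss)
    pgt_real = sgte & 0xFFFFFF

    # Step 2: RBN0 (the RBN followed by a zero digit) indexes the PFT;
    # the second halfword of the PFTE is PFTVBN.
    rbn0 = (pgt_real >> 12) << 4
    pfte = pvtpftp + rbn0
    pftvbn = read_half(pfte + 2)
    pgt_virtual = ((pftvbn >> 4) << 12) | (pgt_real & 0xFFF)

    # Step 3: the PGTE at PGT + 2*p holds the RBN of the target page.
    pgte = read_half(pgt_virtual + 2 * p)
    rbn = pgte >> 4

    # Step 4: real address = RBN || ddd.
    return (rbn << 12) | ddd


# Storage contents taken from the sample dump in the example that follows.
SAMPLE_WORDS = {0x89FC28: 0xF0307F20}              # SGTE
SAMPLE_HALVES = {0x7B7D2: 0x87B0, 0x87BF32: 0x3811}  # PFTVBN, PGTE

real = virtual_to_real(0xA9EC0, 0x89FC00, 0x78760,
                       SAMPLE_WORDS.__getitem__, SAMPLE_HALVES.__getitem__)
print(f"{real:06X}")  # prints 381EC0
```

Passing the storage lookups in as callbacks keeps the arithmetic separate from however the dump is being read; a missing key in the sample dictionaries raises `KeyError`, which flags a step that addressed storage the caller did not supply.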
Figure 5-25 shows the relationship of the control blocks used to convert a virtual address to a real address.

Given a virtual address - find the corresponding real address.

Definitions:

  Virtual address = sspddd = VBN||ddd
    ss    segment number
    p     page number
    ddd   displacement within page
    VBN   virtual block number

  Real address = RBN||ddd
    RBN   real block number
    ddd   displacement within page

1. Find the real address of the page table (RBN||d'd'd').
     ASCB (ASCBRSM) -> RSMHD (RSMVSTO) -> SGT + (4*ss) -> SGTE
   The SGTE contains the real address of the page table.

2. Convert the real address of the page table to a virtual address (VBN||d'd'd').
     PVT (PVTPFTP) -> PFT + RBN0 (from SGTE) -> PFTE
   The PFTE contains the VBN portion of the virtual address of the page table.

3. Find the RBN portion of the real address.
     PGT + (2*p) -> PGTE
   The PGTE contains the RBN portion of the desired real address.

Concatenate the displacement portion of the virtual address (ddd) with the real block number (RBN) to form the real address that corresponds to the given virtual address.

  real address = RBN||ddd

Figure 5-25. Converting Virtual Addresses to Real Addresses

Example: Converting a Virtual Address to a Real Address

This example shows how a virtual address of A9EC0 was converted to a real address. The values used in this example (such as ASCBRSM = FC7380) were obtained from a sample dump.

Given:   Virtual address = A9EC0 (sspddd)
         ss  = 0A   (segment number)
         p   = 9    (page number)
         ddd = EC0  (displacement within page)

Step 1:  Find the real address of the page table (PGT).

         ASCBRSM = FC7380 (address of RSMHD)
         RSMVSTO = 89FC00 (address of SGT)

           89FC00   (RSMVSTO)
         +     28   (4*ss)
           ------
           89FC28   (address of SGTE)

         SGTE = F0307F20
         Real address of PGT = 307F20

Step 2:  Convert the real address of the PGT to a virtual address.

         RBN = 307 and RBN0 = 3070
         d'd'd' = F20
         PVTPFTP = 78760

           78760    (PVTPFTP)
         +  3070    (RBN0)
           -----
           7B7D0    (address of PFTE)

         PFTVBN = 87B0
         Virtual address of the PGT = VBN||d'd'd' = 87BF20

Step 3:  Find the RBN portion of the real address.

           87BF20   (virtual address of PGT)
         +     12   (2*p)
           ------
           87BF32   (virtual address of PGTE)

         PGTE = 3811
         RBN portion = 381

Step 4:  Form the real address for the sample.

         Real address = RBN||ddd = 381EC0

Auxiliary Storage Manager (ASM)

The auxiliary storage manager (ASM) controls all system direct access storage that is allocated for virtual address space paging and for virtual input/output (VIO) data sets. ASM supports the dynamic paging requirements of the real storage manager (RSM) and the data set storage and retrieval requirements of the virtual block processor (VBP). For MVS paging, ASM has the responsibility of selecting the auxiliary storage location (slot), maintaining the slot/page mapping, and coordinating the slot/frame transfer.

The auxiliary storage manager consists of four sections:

• I/O control
• I/O subsystem
• VIO control
• VIO group operators.

I/O control is the link between RSM and the I/O subsystem for paging requests, and between RSM and the I/O supervisor (IOS) for swapping requests. RSM initiates all swapping requests; the I/O is executed by the I/O subsystem and IOS. I/O control accepts the paging/swapping requests from RSM, determines the type of I/O to be done and when it can be started, and notifies RSM when the I/O is completed. I/O control also records the auxiliary storage locations of all virtual pages.

The I/O subsystem communicates with IOS to cause the physical transfer of data between real and auxiliary storage. It allocates auxiliary storage slots, builds paging channel programs, passes them to IOS for execution, and processes I/O completions.
VIO control coordinates all the ASM processing required to support VIO data sets (called logical groups by ASM). Operations on a logical group are classified as group operations and page operations. A group operation is not allowed to process while another group operation or page operation is processing for a logical group. The virtual block processor (VBP) initiates group-related operations and VIO control passes them to the VIO group operators to be processed. RSM initiates page-related operations and I/O control and VIO control jointly process them.

VIO group operations maintain the logical group information that VBP requires. The VBP group operators perform all processing necessary to create, save, restore, and delete a logical group. These operators are invoked only by VIO control as a result of requests from VBP.

Modules (CSECTs) belonging to each section are:

  I/O Control, I/O Subsystem:  ILRPAGIO, ILRPAGCM, ILRSWAP, ILRSWPDR, ILRFRSLT, ILRPTM, ILRSRT, ILRCMP, ILRMSG00
  VIO Control:                 ILRPOS, ILRGOS, ILRVIOCM, ILRSRBC, ILRJTERM
  VIO Group Operators:         ILRACT, ILRSAV, ILRRLG, ILRTMRLG, ILRVSAMI

Component Functional Flow

ASM provides seven functional services. The first four are invoked by the use of the ILRCALL macro, the remaining three via BALR:

• ASSIGN LG obtains a logical group identifier from ASM and creates a logical group for a VIO data set.
• SAVE preserves the status of a logical group for recovery at a later time.
• ACTIVATE places a logical group into active status after it has been saved and the saved status of the group is desired. (Used for step restart of VIO data sets.)
• RELEASE LOGICAL GROUP deletes an entire logical group; this allows ASM to reuse all slots associated with that logical group (VIO data set).
• TRANSFER PAGE moves the logical slot identifier (LSID) for a page from an address space to a VIO logical group.
• REQUEST I/O transfers page-sized blocks between real storage and ASM's auxiliary storage.
• REQUEST SWAP I/O transfers LSQA between real storage and ASM's auxiliary storage. Page-size blocks are transferred if page data sets are used. Swap-set size (up to 12 pages) blocks are transferred if swap data sets are used.

The following descriptions track three of these services through the component: SAVE, which is similar to assign LG, activate, and release logical group; request I/O; and request swap I/O.

Saving an LG

SAVE requests ASM to write the ASPCT containing the slot numbers (LSIDs) of a VIO data set to SYS1.STGINDEX. ILRGOS receives control from VBP in task mode with an ASM control area (ACA) containing the LGN of the VIO data set as input. ILRGOS builds an ASM control element (ACE), queues it to the logical group entry (LGE) process queue (LGEPROCQ) for that LGN, and calls ILRSAV.

ILRSAV calls ILRVSAMI, which calls VSAM to write the ASPCT to the SYS1.STGINDEX data set. An 'S' symbol is returned by ILRVSAMI. (The 'S' symbol is part of the VSAM key used to save this ASPCT and can be used to uniquely identify the ASPCT for an activate request.) ILRSAV puts the 'S' symbol in the ACE and returns to ILRGOS. ILRGOS copies it into the ACA, frees the ACE, and returns to VBP.

Requesting I/O

RSM calls ILRPAGIO for I/O requests. An ASM I/O request area (AIA), or string of AIAs, describes the request. ILRPAGIO determines if the request is for a VIO page and, if it is, calls ILRPOS to process it. Otherwise, ILRPAGIO continues to process the request. For write requests, the previous slot for this page is freed. For read requests, the LSID is obtained from the extended page table entry (XPTE), and put into the AIA. The AIA is queued to the ASM staging queue (ASMSTAGQ) and ILRQIOE is called.
ILRQIOE builds an I/O request element (IOE) for each AIA on the staging queue, and queues IOEs to the paging activity reference table (PART) header or to a PART entry. Each PART entry represents a paging data set and controls activity on the data set. Since a read request is for a particular data set, the read IOE is queued to the PART entry identified by the PART index contained in the LSID. Write IOEs are queued to the PART header because the data set to be used is still unknown. If an SRB is not already scheduled for ILRPTM, ILRQIOE schedules one.

The PART monitor (ILRPTM) scans the PART entries for work and available resources (I/O control blocks) to process the IOEs. For each PART entry with I/O to be done, the slot sort routine (ILRSRT) is called. ILRSRT allocates slots for writes interspersed between reads (to minimize arm movement), builds PCCW chains, and issues a STARTIO macro to initiate IOS processing.

When I/O completes, IOS calls ASM's disable interrupt exit (DIE) routine (ILRCMPDI - an entry point in ILRCMP). ILRCMPDI checks for errors, and if one occurred, returns to IOS indicating that the I/O should be handled by the post status IOS routine and ASM's appendages (ILRCMPAE and ILRCMPNE). If the I/O is successful, ILRCMPDI calls page completion (ILRPAGCM).

ILRPAGCM calls VIO completion (ILRVIOCM) if the I/O is for a VIO page. If it is a non-VIO write request, ILRPAGCM takes the LSID that ILRSRT put into the AIA and puts it in the XPTE for the page in the correct address space. The AIA is then returned to RSM (IEAVPIOP).

Requesting Swap I/O

RSM calls ILRSWAP with a chain of AIAs for either a swap-in or swap-out request. The following discussion traces a swap-out operation.

ILRSWAP separates the non-LSQA AIAs from the LSQA AIAs and calls ILRPAGIO to process the non-LSQA pages as a regular request I/O function. The LSQA AIAs are queued to the ASM header of the address space (ASHSWAPQ). If there were no non-LSQA AIAs, ILRSWAP immediately calls ILRSLSQA to process the LSQA AIAs. Otherwise, ILRSWAP returns to RSM.

As non-LSQA AIAs complete, ILRPAGCM is given control (see Requesting I/O). When all non-LSQA AIAs have completed, ILRPAGCM calls ILRSLSQA to process the LSQA AIAs.

ILRSLSQA, called by ILRPAGCM or ILRSWAP, calls ILRPAGIO to process the LSQA AIAs if there are no available swap sets. Otherwise, ILRSLSQA assigns swap sets and initializes swap channel control workareas (SCCWs) for all the AIAs queued to ASHSWAPQ. A count of LSQA pages (ASHSWPCT) is incremented for each AIA. The completed SCCWs are chained to the swap activity reference table (SART) entry SCCW queue (SRESCCW). If an SRB is not already scheduled for the swap driver (ILRSWPDR), ILRSLSQA schedules one.

ILRSWPDR searches each SART entry for a non-zero SCCW queue, chains the SCCWs to an IORB for that data set, and issues a STARTIO macro to initiate I/O processing.

Completed I/O is handled by ILRCMPDI as in the "Requesting I/O" function, and ILRPAGCM is called. ILRPAGCM processes LSQA AIAs by putting the LSID for each page into the SPCT control block for this address space, putting the AIA on the capture queue (ASHCAPQ), and decreasing the swap count (ASHSWPCT) by 1. When the swap count is 0, ILRPAGCM returns all the AIAs on the capture queue to RSM (IEAVSWPC module).

Figure 5-26 shows the relationships among the important ASM control blocks.

Component Operating Characteristics

The following topics discuss characteristics of ASM's operating environment.

System Mode

ASM uses the SALLOC lock in most page and swap processing in I/O control modules. I/O control modules interface directly with RSM, the principal user of the SALLOC lock. The SALLOC lock is held throughout processing, including the calls to the VIO control routines ILRPOS and ILRVIOCM. The local lock is used during assign and release logical group requests processed by ILRGOS and ILRRLG.

[Figure content: the figure relates the AIA, IOE, PART and PART entries, PCCW, IOSB, SRB, LGVT and LGVT entries, LGE, ASCB, and LSID for page fault processing through ILRPAGIO and ILRPOS.]

Figure 5-26. Relationship of Important ASM Control Blocks

An ASM class lock exists for each active address space (lockword in ASMHD). The ASM class lock is used by the VIO control modules. ILRPTM uses an ASM class lock (lockword in PART) to serialize the IOE write queues. The ILRCMPDI entry of ILRCMP (ASM's DIE routine) runs in physically disabled mode since it is running under the I/O interrupt handler. The rest of the ASM modules simply run in task or SRB mode using compare and swap instructions where necessary. For additional information on locking, refer to the topic "ASM Serialization" later in this section.

Address Space, Task, and SRB Structure

I/O control modules receive control in the address space of the caller with the exception of ILRSWPDR, which is an SRB executing in master's address space. Note also that ILRPAGCM transfers to the correct address space (TRAS) to update the external page table entry (XPTE), which is in LSQA.

I/O subsystem modules run in SRB mode under master's address space, except for the ILRCMPDI entry of ILRCMP and the modules it calls, which execute in the address space interrupted by the I/O completion.
VIO group operator modules, as well as ILRGOS (VIO control module), are tasks (locked mode) executing in the address space associated with the VIO requests. ILRTMRLG runs in task mode, but in master's address space. VIO control modules ILRPOS and ILRVIOCM receive control in the address space of the caller. ILRSRBC executes in SRB mode in the address space associated with the VIO requests.

Storage Considerations

ASM maintains four cell pools for its internal control blocks. These cell pools are pushdown stacks, and the elements at the top of the cell pools represent the last control blocks used by ASM. There are three expandable cell pools for work areas: ACEs, BWKs, and SWKs. The IOE cell pool is not expandable. The cell pools are anchored in the ASMVT and the control blocks reside in SQA.

The ASMVT is in the nucleus, but most of the other ASM control blocks are in SQA. One exception is the ASPCT, which resides in the LSQA of the associated address space.

MP Considerations

ASM takes advantage of MP by allowing both the I/O subsystem and the ILRSWPDR module of I/O control to execute concurrently on both processors. This is achieved through extensive use of compare and swap logic. An individual PART entry or SART entry is 'flag' locked by the processing processor, but ASM can process a request for another entry on the second processor.

Interfaces With Other Components

Four other components interface with ASM:

• RSM with I/O control for page and swap I/O requests, and with VIO control for transfer page requests.
• VBP with VIO control for assign, save, activate, and release logical group requests.
• IOS with I/O subsystem and ILRSWPDR routine of I/O control to process I/O requests.
• VSAM with VIO group operators to handle I/O to SYS1.STGINDEX.

RSM and VBP call ASM, and ASM calls IOS and VSAM.
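The pushdown-stack behavior noted under "Storage Considerations" is what makes the top of a cell pool a useful footprint: the top cell is the one most recently returned. A minimal Python sketch of that behavior follows. The class name, the integer cell labels, and the expansion size are all hypothetical; the text states only that three pools are expandable and that the IOE pool is not.

```python
class CellPool:
    """Toy pushdown-stack cell pool (hypothetical; cells are plain integers)."""

    def __init__(self, size, expandable, expansion_size=4):
        self.free_stack = list(range(size))   # end of the list is the top
        self.expandable = expandable
        self.expansion_size = expansion_size  # assumed value, not from the text
        self.total = size

    def get_cell(self):
        """Pop the top cell; expand an empty pool only if it is expandable."""
        if not self.free_stack:
            if not self.expandable:
                return None
            new_cells = range(self.total, self.total + self.expansion_size)
            self.free_stack = list(new_cells)
            self.total += self.expansion_size
        return self.free_stack.pop()

    def free_cell(self, cell):
        """Push a cell back; it becomes the new top of the stack."""
        self.free_stack.append(cell)

    def top(self):
        """The most recently freed cell, or None if the pool is empty."""
        return self.free_stack[-1] if self.free_stack else None
```

After any sequence of gets and frees, `top()` names the cell that was released last, which mirrors the statement that the top elements represent the last control blocks used.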
Register Conventions

ASM modules adhere to the following register conventions when calling other ASM modules. There are some exceptions where certain addresses are not required.

REGISTER:
0 - Parameter register, if required.
1 - Parameter register, if required.
2 - RSMHD address for the current address space or the address space identified by an input parameter in register 0 or 1. The ASMHD is addressable as part of the RSMHD.
3 - ASMVT address.
4 - Address of ATA or EPATH currently active for recovery tracking.
13 - Address of register save area, if required.
14 - Return address.
15 - Entry point address.

The I/O subsystem does not use the ASMHD and therefore does not maintain the register 2 convention.

Footprints and Traces

The most useful traces of ASM processing are its control blocks and queues, because they document the movement of requests through ASM.

The processor work/save area vector table (WSAVT), which is pointed to by LCCACPUS, will point to the work/save areas for the last I/O processed on the processor. WSACASMD points to the 256-byte save/work area for ILRCMPDI (ASM's DIE routine). WSACASMS points to two contiguous 256-byte save/work areas, the first for ILRPTM, and the second for ILRSRT. The first one is also used by ILRSWPDR and ASM's I/O appendages (ILRCMPAE, ILRCMPNE, and ILRCMP).

ASMVT contains save areas for ASM's other I/O-related modules.

ASMBWKPC is a pool of work areas used by VIO-related modules (ILRGOS, ILRACT, ILRSAV, and ILRRLG).

Bits in the X'01' byte of the ASMVT indicate whether the IPL was a cold, quick, or warm start.

The LGE process queues (LGEPROCQ) contain AIAs and ACEs in process, or waiting for processing for VIO requests.

If the PARTE is locked, part monitor (ILRPTM) has called or is about to call slot sort (ILRSRT).

If ASMTMECB (ASMVT + X'A8') is a posted ECB, ILRTMRLG is or was about to process the task portion of a release logical group request.
When an ASM-locked or SRB-mode routine is processing, its functional recovery routine is on the current FRR stack. The first word of the parameter area passed to the FRR contains a one-byte id of the ASM module that established the FRR, followed by three bytes of flags indicating the ASM module or entry point in process at the time of the error. The different ids are discussed in "Recovery Footprints."

When ASM's I/O completion module encounters the first bad slot, an error record is built with its address at X'14' into the ASMVT. It contains the LSIDs of the unusable slots. The first three bytes in the record are the address of the current entry filled, beginning address of the record, and ending address of the record. An entry contains one byte of flags and the three-byte LSID. If bit 0 is on, the error occurred on a swap data set. If bit 4 is on, there was a read error.

I/O error counts are found in the ASMVT, PART entries, and SART entries. ASMERRS (ASMVT + X'7C') is the total of error slots found on local page data sets. PARERRCT (PART entry + X'18') and SRERRCNT (SART entry + X'18') are the error slots encountered on the particular data set represented by the entry.

In the ASMVT there are two counts, ASMIORQR (ASMVT + X'28') and ASMIORQC (ASMVT + X'2C'), which contain, respectively, the number of I/O requests ASM has received and the number completed. If more requests have been received than completed and the system is waiting, there is something wrong with ASM or IOS.

General Debugging Approach

This description helps isolate paging problems, the most difficult problems to debug. Paging problems (not all of which are ASM problems) fall into two main groups: paging interlocks and incorrect or duplicate pages.

Paging Interlocks

Paging interlocks result in an enabled wait state. There are two indicators that hint that the enabled wait is a paging interlock:

• The I/O request counts in the ASMVT (ASMIORQR and ASMIORQC) are not equal.
• The ASMIOCNT contains a count of the number of I/O requests sent to IOS but not received by ILRCMP. It is necessary to determine why paging requests received by ASM are not completing. To learn more about these requests, it is necessary to follow I/O control block chains.
• The ASMSRBCT field indicates whether ASM's SRB for ILRPTM (ASM PART monitor) has been scheduled. If this field is not zero, ASM's SRB has been scheduled but not dispatched. It is necessary to determine why the SRB has not been dispatched.

The blocks discussed here are in the Debugging Handbook.

To find the I/O request blocks for a given page space, start with the PART entry. The PART entry points to the first IORB. There is one IORB for each page data set on a disk, and four for each page data set on a drum. The first bit of the fourth byte indicates whether or not ASM has passed the IOSB to IOS. If the bit is 0, the IORB/IOSB is available. If the bit is 1, the IORB/IOSB is in use. The IORB points to the IOSB and to the first of a queue of PCCWs. For an active I/O request, the third word into the PCCW points to the associated AIA, and the second word points to the next PCCW on the chain.

If the request has been sent to IOS and not returned, it is necessary to trace IOS processing. If I/O processing has caused a page fault or a request for an enabled lock, the interlock is probably explained. Either ASM could not get the resources to handle the page fault, the page is already in use and this request is backed up behind the previous one, or the holder of the lock has page-faulted and the page fault cannot be resolved.

Incorrect Pages

It is almost impossible to determine from a dump what caused the wrong page to be written or read. At best, a dump provides clues as to which general area is causing the problem. Intensive code reviews are then necessary to find it.
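For the paging-interlock checks described under "Paging Interlocks" above, the following Python sketch shows one way the request counts and the PCCW chain might be examined in a dump. It is a sketch under stated assumptions: the dump is modelled as a flat bytes image in which an address is simply an offset, the chain is assumed to end with a zero pointer, and the function names are hypothetical. Only the word positions (second word is the next PCCW, third word is the AIA) come from the text.

```python
import struct


def read_word(storage, addr):
    """Read a big-endian fullword at offset addr of a bytes-like dump image."""
    return struct.unpack_from(">I", storage, addr)[0]


def outstanding_io(asmiorqr, asmiorqc):
    """Requests received by ASM minus requests completed."""
    return asmiorqr - asmiorqc


def walk_pccw_chain(storage, first_pccw, limit=256):
    """Follow a PCCW queue, returning (PCCW address, AIA address) pairs.

    Assumes a zero 'next' pointer ends the chain; the 'seen' set and the
    limit guard against a looping or damaged chain.
    """
    entries = []
    seen = set()
    addr = first_pccw
    while addr != 0 and addr not in seen and len(entries) < limit:
        seen.add(addr)
        aia = read_word(storage, addr + 8)    # third word: associated AIA
        entries.append((addr, aia))
        addr = read_word(storage, addr + 4)   # second word: next PCCW
    return entries
```

A non-zero result from `outstanding_io` on a waiting system is the cue to walk the chains; each AIA address returned identifies a request that ASM has accepted but not yet completed.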
Frequently, traps must be applied to narrow the area further.

The following paragraphs contain descriptions of how to find various pieces of useful information. There is no attempt to describe how to use them because there is no general method.

It is first necessary to determine which page contains bad data and whether the whole page or only part of it is bad. If possible, also determine which page has overlaid the bad page. If only part of the page is bad, the error probably occurred while handling a track overflow record to or from an alternate track. Check for an invalid first or last part of a page. ASM is unlikely to be the cause of invalid data in the middle of the page.

Incorrect pages cause a system failure when the page is used by a system task or by a routine holding a critical system resource. The invalid page is more likely to cause an address space to fail because of program checks that result from invalid data. These failures are rarely attributed to invalid pages.

Scan the SYS1.LOGREC data set for any improbable program checks and obtain any associated dumps. Multiple versions of the same problem are helpful in suggesting a pattern for the error. For example, the error might only occur for the second page of LSQA or only on a page associated with an overflow record.

Finding the LSID for a Given Page

A virtual address contains the segment number in the first byte, the page number in the next four bits, and the page displacement in the remaining bits in the form sspddd (segment, page, displacement).

The ASCB for the address space points to the RSMHD. The first word of the RSMHD is the virtual address of the segment table. Multiply the segment number (ss) by the length of the segment table entry (4) to locate the correct entry. It contains the real address of the page table (PGT). Convert this address to a virtual address.
Then locate the correct extended page table entry (XPTE) by adding to the page table address 16 times the length of the page table entry (2), and adding the page number (p) times the length of an XPTE (12).

The XPTE contains information about the status of this page on auxiliary storage. If either the XPTVALID or XPTVIOLP flag is on, there is a slot assigned to this page. If XPTVALID is on, the LSID (slot identifier) is in the XPTE. If the page is duplexed, two LSIDs are in the XPTE (one for each slot). If the XPTVIOLP flag is on, an LPID, instead of an LSID, is in the XPTE. To relate the LPID to an LSID, see the following topic "Finding LSIDs of VIO Data Sets".

Finding LSIDs of VIO Data Sets

The ASPCT is used to record the auxiliary storage locations (LSIDs) of VIO data set pages. Only a 1088-byte base ASPCT is created at ASSIGN LGN time. This ASPCT can handle up to 1 megabyte of VIO data set space. If more than 1 megabyte of VIO space is used, the ASPCT is expanded as follows:

1. For each 256 megabytes of space up to 1 billion bytes, an ASST extension is built.
2. For each megabyte of space, an LPME extension is built.

Each ASST or LPME extension requires 1088 bytes of storage. Each ASST extension contains a vector table of LPME extension addresses. The ASPCT (base and all extensions) resides in the LSQA of the associated address space.

The LPID is eight bytes. The first four bytes contain an LGID, logical group (VIO data set) identifier. The second four bytes contain a relative page number (RPN).

When given an LGID, there are two methods to locate an ASPCT:

1. The ASCB (of the desired address space) points to the RSMHD. The ASMHD is part of the RSMHD. ASHLGEQ in the ASMHD is the queue of LGEs (active VIO data sets) related to this address space. Search through the address space's ASHLGE queue; one of the LGEs will have an LGELGID field that matches this LGID.
This same LGE has the address of the needed ASPCT (LGEASPCT).
2. Another way to locate an ASPCT from an LGID is to follow the CVT to the LGVT (CVTASMVT, ASMLGVT). Using the LGID as an index, locate the appropriate LGVT entry. The LGVT entry contains the address of the LGE that contains the address of the needed ASPCT.

With the appropriate ASPCT, now use the RPN portion of the LPID as an index to locate the LPME containing the associated LSID. Figure 5-27 shows the pointers and control blocks described in the following paragraphs.

If A' and AA are both zero, use the LL to index ASPLPMES in the ASPCT base for the LPME containing the LSID. Otherwise, use A' to index ASPASSTP for the address of the appropriate ASST extension. Use AA to index the ASPSECTA of the ASST extension for the address of the appropriate LPME extension. And use LL to index the ASPSECTA of the LPME extension for the LPME containing the LSID.

The LSID is the slot identifier for this page of the VIO data set. This LSID can be related to the ASM control blocks PART and PAT and to the actual paging device. See the following topic "Locate PART and PAT Bit".

[Figure 5-27. Locating An LSID From An LPID (Parts 1 and 2 of 2). Within the RPN, A' indexes ASPASSTP in the base ASPCT, AA indexes ASPSECTA in the ASST extension, and LL indexes ASPSECTA in the LPME extension.]

Locate PART and PAT Bit

Suppose LSID 0004E3B0 was found in the XPTE that represents the sample address 07A12C: PART entry index is 04. Relative byte address (RBA) is E3B0.

The PART has one entry for each page data set, each having a pointer to its PAT. The PAT is a cylinder bit mapping of this page data set. PATCYLMW is the number of words that map a cylinder.
PATCYLSZ, slots per cylinder, is the number of significant bits in each cylinder mapping. For device 2305-2:

PATCYLMW is 1
PATCYLSZ is X'1A' (26).

To locate the bit in the PAT map for slot E3B0 (58288):

1. address of map word = (address of PATMAP) + 8964 = (address of PATMAP) + ((58288/26) x PATCYLMW x (bytes in a word)), where 58288/26 is the integer quotient 2241.
2. bit in the map word (origin 0) = remainder of 58288/26 = 22.

Figure 5-28 shows the control blocks involved in relating a virtual address to the PART and PAT.

[Figure 5-28. Relating the Virtual Address to the PART and PAT. Pointer chains shown: PSA (FLCCVT at +10, PSAAOLD at +224); CVTASMVT (CVT +2C0) to the ASMVT; ASMPART to the PART header and its PARTEs; PAREPATP to the PAT bit map; PAREEDB to the EDB, and EDBLPMBA (EDB +4) to the LPMB; ASCBRSM (ASCB +34) to the RSMHD; RSMVSTO (RSMHD +0) to the SGT; the SGTE to the PGT, which holds 16 PGTEs followed by 16 XPTEs. The sample 12-byte XPTE begins 40 80 00 04 E3 B0.]

Converting a Slot Number to Full Seek Address

The full seek address can be used to read the record from the disk and determine exactly what it does contain.

The PART entry points to the AMB extent descriptor block (EDB) for the data set. The EDB and its associated LPMB(s) describe the data set on the device. The EDB consists of an 8-byte header followed by entries, one for each extent. The second byte of the header contains the number of entries and the next two bytes contain the length of an entry.

The relative byte address (RBA) is calculated by multiplying the slot number by 4K. The extent containing this slot is found by comparing the RBA to the low and high RBAs in each extent. These are found at X'C' and X'10' in the EDB entry. The second word of the entry thus found points to an LPMB.
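The address arithmetic in "Finding the LSID for a Given Page" and "Locate PART and PAT Bit" above can be checked with a short Python sketch. The function names are hypothetical. The LSID split (second byte as the PART entry index, low-order halfword as the slot) is an assumption inferred from the sample LSID 0004E3B0, and the bit offset is counted from the first map word of the cylinder.

```python
def split_virtual_address(va):
    """Split a 24-bit address of the form sspddd into (segment, page, displacement)."""
    return (va >> 16) & 0xFF, (va >> 12) & 0xF, va & 0xFFF


def sgte_offset(segment):
    """Offset of the segment table entry: segment number times entry length (4)."""
    return segment * 4


def xpte_offset_in_pgt(page):
    """Offset of the XPTE from the page table origin: 16 PGTEs of 2 bytes, then 12-byte XPTEs."""
    return 16 * 2 + page * 12


def split_lsid(lsid):
    """Return (PART entry index, slot number); the layout is assumed from the sample LSID."""
    return (lsid >> 16) & 0xFF, lsid & 0xFFFF


def pat_bit_location(slot, patcylsz, patcylmw, bytes_per_word=4):
    """Return (byte offset of the map word from PATMAP, bit number, origin 0)."""
    cylinder, bit = divmod(slot, patcylsz)
    return cylinder * patcylmw * bytes_per_word, bit
```

For the sample values, `split_virtual_address(0x07A12C)` gives segment X'07', page X'A', and displacement X'12C', and `pat_bit_location(0xE3B0, 26, 1)` gives a byte offset of 8964 and bit 22, which agrees with the worked figures for the 2305-2.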
To find the relative cylinder, subtract the low RBA of the extent from the RBA and divide by the allocated unit size (LPMB + 4). To find the relative track, take the remainder in the division just performed and divide it by the bytes per track (LPMB + 8). The remainder of the bytes-per-track division is the first step toward finding the record number. Add 4095 to the remainder, divide the result by 4096, and add 1. The result is the R of MBBCCHHR.

To find the CCHH, multiply the relative cylinder by the number of tracks per allocated unit (LPMB + X'10') and add the relative track (as computed by the method just shown) and the starting track from the EDB entry + 8. Divide this result by the number of tracks per cylinder (LPMB + X'12'). The quotient is the CC and the remainder is the HH.

For example, when given an RBA of X'E3B0' (58288), calculate the MBBCCHHR for device 2305-2. (Reference IDAEDB and IDALPMB macros for fields.) In the formulas, / is integer division and // is the remainder of the division.

M           = extent number = 0
BB          = return code = 00
Relative CC = (RBA - EDBLORBA)/LPMAUSZ
            = (58,288 - 0)/53,248 = 1
Relative HH = ((RBA - EDBLORBA)//LPMAUSZ)/LPMBPTRK
            = 5040/13,312 = 0
CC          = (Rel CC * LPMTRRAU + Rel HH + EDBSTTRK)/LPMTPC
            = (1 * 4 + 0 + 8)/8 = 1st cylinder
HH          = (Rel CC * LPMTRRAU + Rel HH + EDBSTTRK)//LPMTPC
            = 12//8 = 4th track
R           = [((((RBA - EDBLORBA)//LPMAUSZ)//LPMBPTRK) + LPMBLKSZ - 1)/LPMBLKSZ] + 1
            = ((5040 + 4095)/4096) + 1 = 3rd record

Therefore: MBBCCHHR = 0000000001000403

Unuseable Paging Data Sets

Certain types of I/O errors received at completion of I/O indicate that ASM is either unable to, or would be ill-advised to, access a particular auxiliary storage data set any longer. ILRCMPAE, an entry in ILRCMP, receives these errors and marks the data set as unuseable. For page data sets, the DSBD flag in the PART entry is turned on. For swap data sets, the DSBD flag in the SART entry is turned on.
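The MBBCCHHR arithmetic worked through above under "Converting a Slot Number to Full Seek Address" can be reproduced with a short Python sketch. The function name is hypothetical, the parameters are named after the EDB and LPMB fields used in the example, and the M and BB values are simply passed through as in the example (both zero). It is a check on the arithmetic, not a description of how ASM itself performs the conversion.

```python
def rba_to_mbbcchhr(rba, edblorba, edbsttrk, lpmausz, lpmbptrk,
                    lpmtrrau, lpmtpc, lpmblksz=4096, m=0, bb=0):
    """Return the 8-byte seek address as 16 hex digits.

    Follows the worked example: integer quotients give the relative
    cylinder and track, remainders feed the next step.
    """
    offset = rba - edblorba
    rel_cc, rem_in_unit = divmod(offset, lpmausz)         # relative cylinder
    rel_hh, rem_in_track = divmod(rem_in_unit, lpmbptrk)  # relative track
    track = rel_cc * lpmtrrau + rel_hh + edbsttrk
    cc, hh = divmod(track, lpmtpc)                        # quotient CC, remainder HH
    r = (rem_in_track + lpmblksz - 1) // lpmblksz + 1     # record number
    return "%02X%04X%04X%04X%02X" % (m, bb, cc, hh, r)


# Values taken from the 2305-2 example in the text.
example = rba_to_mbbcchhr(rba=0xE3B0, edblorba=0, edbsttrk=8,
                          lpmausz=53248, lpmbptrk=13312,
                          lpmtrrau=4, lpmtpc=8)
```

With the example values, `example` is `0000000001000403`, matching the result given in the text.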
Both flags are X'0A' into the respective entries, and are set to X'04'. ILRMSG00 is then called to determine whether ASM can continue processing without the data set.

If ASM is unable to continue processing, ILRMSG00 issues message ILR008W and terminates the system with a X'02E' wait state. At this point, a stand-alone dump should be taken to determine which of the conditions listed below occurred. The console sheet, if available, might also help because ASM may have previously issued message(s) ILR009I.

If ASM is able to continue processing without the unuseable data set, message ILR009I is written to the operator. This message indicates which volume contains the unuseable data set. If this message occurs, use the DUMP command to take an SVC dump of master's address space to determine what error occurred. The options specified should include NUC and SQA.

To determine from the dump what error occurred, the PART or SART entry for the unuseable data set and the IOSB for the failing request must be located. Use the AMDPRDMP service aid (print dump) with the ASMDATA control statement to print the dump. One of the following conditions occurred on the data set to make it unuseable:

• If the number of write errors (X'18' into the entry: PARERRCT or SRERRCNT) is 175, ASM has stopped using the data set because it has incurred too many write errors (one way for this to happen is if the data set was not formatted).
• If the completion code (X'0D') in the IOSB is a X'51', ASM has stopped using the data set because there is no longer a path to the device (this could happen as a result of an ACR condition).
• If the completion code in the IOSB is a X'6D', ASM has stopped using the data set because the channel or the device has become non-operational.
• If the completion code in the IOSB is a X'41', the device status in the IOSB (offset X'18') is X'02', and the sense data in the IOSB (offset X'2A') is X'1000', ASM has stopped using the data set because an equipment check occurred.
• If the completion code in the IOSB is a X'41' and the channel status in the IOSB (offset X'19') is X'08', X'04', or X'02', ASM has stopped using the data set because a channel check occurred.

The system is terminated only if this unuseable data set (or several unuseable data sets) caused one of the following conditions:

• The unuseable paging data set contains PLPA pages and the duplex data set, if any, is already unuseable or full.
• The unuseable paging data set is the duplex data set and not all PLPA pages are accessible (that is, the PLPA paging data set or a data set containing PLPA pages is unuseable).
• The unuseable paging data set is the last local paging data set.

Page/Swap Data Set Errors

Figure 5-29 shows the messages issued and the actions taken by the ASM I/O subsystem for various error conditions with the page and swap data sets.
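The conditions listed above under "Unuseable Paging Data Sets" can be summarized as a small decision routine. The Python sketch below is only an aid for reading a dump: the function name and the returned strings are hypothetical, the values are passed in rather than read from IOSB offsets, the first channel-status value is read here as X'08', and the order in which the conditions are tested is an assumption (the text does not rank them).

```python
WRITE_ERROR_LIMIT = 175  # write-error count at which ASM stops using the data set


def why_unuseable(write_errors, completion_code,
                  device_status=0, channel_status=0, sense=0):
    """Map dump values to the reason ASM stopped using a data set."""
    if write_errors == WRITE_ERROR_LIMIT:
        return "too many write errors"
    if completion_code == 0x51:
        return "no path to the device"
    if completion_code == 0x6D:
        return "channel or device non-operational"
    if completion_code == 0x41:
        if device_status == 0x02 and sense == 0x1000:
            return "equipment check"
        if channel_status in (0x08, 0x04, 0x02):
            return "channel check"
    return "not one of the listed conditions"
```

For example, a completion code of X'41' with device status X'02' and sense data X'1000' classifies as an equipment check, while the same completion code with channel status X'04' classifies as a channel check.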
Duplex Status         Error Conditions                              Message(s) Issued   ASM Action Taken

Duplexing Active      PLPA Full, Common Available*                  ILR005I             Spill to Common
                      PLPA Full, Common Unavailable**               ILR010I             Duplex Only
                      PLPA Bad                                      ILR009I, ILR010I    Duplex Only
                      Common Full, PLPA Available*                  ILR006I             Spill to PLPA
                      Common Full, PLPA Unavailable**               ILR010I             Duplex Only
                      Common Bad                                    ILR009I, ILR010I    Duplex Only
                      Duplex Full, PLPA or Common Available*        ILR007I             Suspend Duplexing
                      Duplex Full, PLPA and Common Unavailable**    ILR008W             Wait X'03C'
                      Duplex Bad, PLPA and Common Available*        ILR007I             Suspend Duplexing
                      Duplex Bad, PLPA or Common Full               ILR007I             Suspend Duplexing
                      Duplex Bad, PLPA or Common Bad                ILR008W             Wait X'02E'
                      Duplex Bad, PLPA and Common Unavailable**     ILR008W             Wait X'02E'

Duplexing Not Active  PLPA Full                                     ILR005I             Spill to Common
                      PLPA Bad                                      ILR008W             Wait X'02E'
                      Common Full                                   ILR006I             Spill to PLPA
                      Common Bad                                    ILR008W             Wait X'02E'
                      PLPA and Common Full                          ILR008W             Wait X'03C'

In Either Case        Local Bad                                     ILR009I             Stop Writes to Bad Data Set
                      Last Local Bad                                ILR008W             Wait X'02E'
                      Swap Bad                                      ILR009I             Stop Swap-outs to Bad Data Set
                      Last Swap Bad                                 ILR009I             All Swap-outs Done to Page Data Sets

*  Available - Data Set Neither Full Nor Bad
** Unavailable - Data Set Either Full or Bad

Figure 5-29. Page/Swap Data Set Error Action Matrix

Error Analysis Suggestions

The following are some guidelines for determining ASM problems:

• Print the dump specifying ASMDATA as a control statement to AMDPRDMP.
• Check SYS1.LOGREC and the LOGREC buffer to see if ASM's mainline has abended. If it has, a request might have been lost or mishandled.
• Check the trace table for recent ASM activity. The key trace table entries are SRB dispatches for ILRPTM (address of SRB in ASMPSRB, X'58' into ASMVT), or ILRSWPDR (address of SRB is SARSRBP, X'30' into SART). Also look for schedules of the post status SRB closely following an interrupt for ASM I/O (CSW points to the nucleus area), which could be temporary or permanent I/O errors coming to ILRCMP or one of its entry points.
• Check for outstanding I/O requests and determine the status of the I/O by looking at the UCB and IOSB.
• Check for I/O errors on the paging packs, either on the error record (X'14' into the ASMVT), or on SYS1.LOGREC.
• Scan the ASMHD's LGE process queues (LGEPROCQ) for current VIO activity. Determine the extent of ASM processing for these LGEs. Determine the logical group for which a VIO group operator has been requested.
• Scan the PART entries for the PAREFSIP flag, which indicates that the PART entry is locked and that slot sort or part monitor should be processing. Check the PART and PARTE IOE queues for requests waiting for I/O. Also scan the SART for the SRELOCK flag, which indicates that ILRSWPDR should be processing.
• If you are interested in a specific request, find the request on ASM queues and determine the extent of ASM processing for the request. For an I/O request, convert the virtual page number to an LSID.
• Scan the BWK and SWK cell pool for a work area that is not chained to another work area (offset 0). An unchained work area indicates current ASM processing or a lost work area.
• Check for suspended ILRPTM or ILRSWPDR SRBs by scanning the PCB I/O queues (pointed to by the RSMHD and the PVT) for a suspended SRB whose address matches ILRPTM or ILRSWPDR SRB's address. Although this situation should not occur, it does occur occasionally.

Validity Checking

ASM is a nucleus-resident, performance-oriented component. For this reason, there is minimal validity checking in mainline code. In addition, few of ASM's problems can be attributed to invalid control blocks; this is probably because ASM communicates only with other system components.

In both mainline and recovery code, critical global control blocks such as the ASMVT, PART, and SART are used without any validity checking.
ASM's recovery routines do validity check control blocks (and queues of these control blocks) that represent work to be processed by ASM. Some of these control blocks are the ACE, AIA, LGE, and PCCW. In most cases, if a control block fails the validity check, it is no longer used by ASM. The only exception is the IORB-IOSB-SRB combination, which is refreshed.

ASM Serialization

Serialization of ASM processing is done using the SALLOC and ASM global locks, the local lock of the current address space, compare-and-swap (CS) logic, and control block queueing.

SALLOC Lock

ASM uses the SALLOC lock to serialize most page and swap processing in I/O control. The I/O control modules interface directly with RSM, the principal user of SALLOC, either as the called routine or the calling routine. The SALLOC is held throughout processing, including calls to the VIO ILRPOS and completion routines.

The SALLOC is used to serialize most processing of:

XPTEs      complete coverage.
PCB/AIAs   complete coverage, except AIA noted below.
SPCTs      complete coverage.
SART       complete coverage, except where noted below.
SATs       complete coverage.

Specific areas of other control blocks serialized by the SALLOC lock are:

ASMVT      Work save areas.
           I/O control section fields:
             Flags: ASMDUPLX, ASMNOCWQ, ASMCALLQ, ASMNODPX, ASMPLPAF, ASMCOMMF
             LGVT pointer
           Non-VIO slot allocated count.
           Expansion of ASM pools.
ASMHD      I/O control flags.
           Swap and page counters.
           Swap queue.
ASCB       Non-VIO slot allocated count.
LGVT       Available LGVTE queue.
           Expansion of the LGVT.
PART       Count of local page data sets.

Modules whose processing is serialized by the SALLOC lock are:

ILRPAGIO   complete coverage, held by caller.
ILRPAGCM   complete coverage, obtained at entry.
ILRFRSLT   complete coverage, except ILRFRSL1 entry point, where the caller may or may not hold the lock. The lock is not obtained by this module; it is held only if the caller holds it.
ILRSWAP    complete coverage, held by caller.
ILRPTM     only obtained to process data set full conditions for non-local page data sets.
ILRCMP     only obtained to process I/O completion error conditions that may require operator notification.
ILRMSG00   complete coverage for main entry point, held by caller.
ILRPOS     complete coverage, held by caller (except for ILRTRANS entry point).
ILRVIOCM   complete coverage, held by caller.
ILRGOS     only obtained for LGVT processing and GETMAIN/FREEMAIN requests.
ILRPGEXP   only obtained to adjust the SART to reflect addition of a new swap data set and update the count of local page data sets on the PART.
ILRTERMR   obtained when referencing above control blocks.
ILRPEX     obtained when expanding an ASM pool.

ASM Class Locks

The ASM lock is a global spin class lock. A lockword must be provided when obtaining or releasing an ASM class lock. A class lock exists for each active address space. The lockword is in the ASMHD. It is used by the VIO controller modules.

A class lock is also defined for the PART write queues with its lockword in the PART header. This lock serializes the four FIFO IOE write queues in the PART.

The address space class locks serialize processing of the following control blocks:

AIA        VIO controller flags, LPID field.
ASMHD      VIO controller flags, LGE queue base pointer.
ASCB       VIO slot allocation count.
LGE        complete coverage.
ACE        complete coverage.
ASPCT      complete coverage while group operations are in progress. Group operations and page operations can be executed in parallel. VIO controller processing of the LGE process queue provides this serialization.

The address space class locks serialize processing in the following modules:

ILRGOS     partial, obtained where processing above control blocks.
ILRPOS     complete coverage.
ILRSRBC    partial, obtained when searching LGE queue and LGE process queues.
ILRVIOCM   complete coverage.
ILRJTERM   partial, obtained when adding ACEs to LGE process queue.

Local Lock of Current Address Space

The local lock is used by VIO controller and VIO group operator modules to serialize certain VIO-related operations. It is used by ILRGOS (held on entry) and ILRJTERM (obtained) to serialize release LG requests with the internal ASM deactivate function used to clean up VIO logical groups for a terminating job. The local lock is also used by most VIO-related modules to allow use of branch entry GETMAIN, rather than the SVC route.

Compare and Swap (CS) Serialization

Certain modules of ASM run without locks, requiring CS serialization of pointers, flags, and counts. Where routines running with the locks change fields used by unlocked routines, CS must be used. The I/O subsystem and VIO group operators run unlocked and are the principal users of compare and swap. Control blocks serialized via CS include:

PART       A special CS lock exists for each PARTE controlled by PART monitor. This lock is used mainly for execution control. Most fields are still serialized by CS. The IOE write queues are the exception described above.
PATs       complete coverage.
ASMVT      I/O subsystem and group operator sections.
           I/O error count.
           Unreserved slot count.
           Pool controllers.
           VIO slot count.
SART       A special CS lock exists in each SARTE to serialize swap driver processing of the swap data sets. Other fields updated by swap driver or I/O completion processing of the I/O subsystem are updated with CS.

The ASM modules that run without locks, using CS to serialize control block fields, are:

ILRSWPDR   ILRPTM     ILRSRT
ILRCMP     ILRSAV     ILRACT
ILRRLG     ILRTMRLG   ILRVSAMI

Serialization via Control Block Queues

Certain ASM control blocks are serialized via their available queues. The blocks are kept on available queues and removed when needed. While in use the block is so marked and associated with a specific operation and/or control block.
Control blocks included in this category are PCCWs, IORBs, and SCCWs.

The ASPCT is a special case. VIO control enforces the rule that page and group operations cannot be performed in parallel for a given logical group and its ASPCT. This is controlled by the LGE process queue. While paging operations are being performed, the ASPCT is serialized via the ASM class lock of the owning address space. While a group operation is in progress, ASPCT serialization is maintained by the ACE for the group operation that is on the LGE process queue. This ACE prevents any other processing of the ASPCT until the group operation completes.

Recovery Considerations

The philosophy of ASM's recovery is to allow the system and ASM to continue processing. To accomplish this, the first step in ASM's recovery routines is to validity check any control block or queue that might have been affected by the error, for example, the AIAs on the ASMSTAGQ. This is to allow future ASM processing to proceed without error.

The second step in ASM's recovery is to notify ASM's caller that an error has occurred. In a few instances where ASM is directly invoked by RSM (such as ILRPAGIO or ILRSWAP), ASM recovery attempts retry to return to RSM with a failing return code. When an error occurs during ASM processing that runs asynchronously, ASM recovery queues the failing request for eventual return to RSM. When an error occurs during ASM processing of a VIO group operator request, ASM recovery cleans up its resources and allows the associated task to terminate.

Recovery Traces

A dump of SYS1.LOGREC is a prerequisite to debugging ASM problems. ASM's recovery always records the SDWA to the SYS1.LOGREC data set. It is the most convenient way of determining that recovery has been invoked. The recovery routine id in the SDWA indicates which recovery routine was invoked.
ASM has a number of system abend completion codes ('08x' series) that are always set up for retry. The purpose of these ABENDs is to record to SYS1.LOGREC logical errors that have occurred in ASM's mainline processing.

Recovery Structure

ASM has eight recovery routines for ASM mainline:

• ILRIOFRR is an FRR that provides recovery for ILRPAGIO, ILRPOS, ILRPAGCM, and ILRVIOCM. It also acts as a router, giving control to the routines in ILRSWP01.
• ILRSWP01 contains recovery routines for ILRSWPDR and ILRSWAP.
• ILRSRT01 is an FRR that provides recovery for part monitor (ILRPTM) and slot sort (ILRSRT).
• ILRCMP01 is an FRR that provides recovery for the I/O completion routine (ILRCMP).
• ILRGOS01 is both an FRR and an ESTAE that provides recovery for ILRGOS, for the group operators ILRSAV, ILRACT, and ILRRLG, and for ILRVSAMI.
• ILRTMI01 is the ESTAE that provides recovery for ILRTMRLG and for its path through ILRVSAMI.
• ILRSRB01 is an FRR that provides recovery for ILRSRBC.
• ILRFRR01 is a validity check routine called by most of the recovery routines.

Additional recovery routines are:

• TERMRFRR is an FRR that is an entry point into, and provides recovery for, ILRTERMR.
• ILRJTM01 is an FRR that is an entry point into, and provides recovery for, ILRJTERM.
• ILRMSG01 is an FRR that is an entry point into, and provides recovery for, ILRMSG00.
• ESTAER is an ESTAE that is the entry point into, and provides recovery for, ILRPGEXP.
• ESTAEXIT is an ESTAE that is an entry point into, and provides recovery for, ILRPREAD.

Recovery As a Debugging Tool

Recovery has a beneficial effect on problem solving, primarily because having it invoked isolates the problem to a specific area of ASM. If there is a paging interlock or duplicate page problem subsequent to an abend in ASM, the two are probably related, and the first error provides information useful in debugging the second problem.
Recovery ignores invalid control blocks and truncates some of ASM's internal queues in order to allow ASM to continue processing. Therefore, recovery will cover up valid problems that cause code overlays in ASM and other system components. The primary culprit in covering up errors is the non-historical nature of ASM resource queues, which results in rapid reuse of critical control blocks. The only valuable information left by the recovery is the SDWA with its variable recording area in the SYS1.LOGREC data set. At the very least, this record provides sufficient information to trap the problem when it recurs.

Recovery Footprints

FRR/ESTAE Work Areas

ILRATA and ILREPATH are mapping macros that define the areas required by ASM modules to provide tracking information for the FRRs and ESTAEs.

• ILRATA defines the six-word parameter area passed to the ASM routine issuing the SETFRR macro, or it defines the parameter area passed to the ASM routine issuing the ESTAE macro. It contains a module ID in the first byte, flags in the next three bytes, and four words which have module-dependent contents. The IDs of the ASM modules are:

  ILRPAGIO ('01'X)    ILRSWPDR ('05'X)    ILRCMPDI ('09'X)
  ILRPAGCM ('02'X)    ILRGOS   ('06'X)    ILRCMPNE ('0A'X)
  ILRSWAP  ('03'X)    ILRPTM   ('07'X)    ILRCMPAE ('0B'X)
  ILRTRPAG ('04'X)    ILRSRBC  ('08'X)    ILRCMP   ('0C'X)

• ILREPATH defines a variable-length area containing any additional recovery audit-trail data required for recovery by ASM recovery routines. The address of the EPATH, if present, is in the ATA. There are four variations of the EPATH area.

The formats of ILRATA (ASM tracking area - ATA) and ILREPATH (recovery audit trail area - EPATH) are described later in this chapter in the topic "ASM Recovery Control Blocks".
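
When reading an ATA out of a dump or a LOGREC record, the first step is usually to identify the module that established the recovery routine. The following Python sketch splits a 24-byte ATA image according to the description above: module ID in the first byte, flags in the next three bytes, and the last four words as module-dependent data. The function name split_ata and the dictionary keys are illustrative only and are not part of ASM. The second word is not described at this point in the text, so the sketch returns it uninterpreted.

```python
import struct

# Module IDs from the list above (first byte of the ATA).
ATA_MODULE_IDS = {
    0x01: "ILRPAGIO", 0x02: "ILRPAGCM", 0x03: "ILRSWAP", 0x04: "ILRTRPAG",
    0x05: "ILRSWPDR", 0x06: "ILRGOS", 0x07: "ILRPTM", 0x08: "ILRSRBC",
    0x09: "ILRCMPDI", 0x0A: "ILRCMPNE", 0x0B: "ILRCMPAE", 0x0C: "ILRCMP",
}


def split_ata(ata: bytes) -> dict:
    """Split a six-word (24-byte) ATA image into its top-level pieces.

    Hypothetical helper for offline dump reading; not an IBM interface.
    """
    if len(ata) != 24:
        raise ValueError("expected a six-word (24-byte) ATA image")
    return {
        "module_id": ata[0],
        "module": ATA_MODULE_IDS.get(ata[0], "unknown"),
        # Three flag bytes, kept as one 24-bit big-endian value.
        "flags": int.from_bytes(ata[1:4], "big"),
        # Second word: returned raw, not interpreted here.
        "second_word": ata[4:8],
        # Last four words: contents depend on the recovery routine.
        "words": struct.unpack(">4I", ata[8:24]),
    }


if __name__ == "__main__":
    sample = bytes([0x06, 0x00, 0x00, 0x00]) + bytes(20)
    print(split_ata(sample)["module"])  # ILRGOS
```

The four module-dependent words are returned as unsigned big-endian integers; how to interpret them depends on which recovery routine owns the ATA.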
SDWA Variable Recording Area

ASM uses the SDWA variable recording area (SDWAVRA) to save the contents of the ATA (and EPATH, if present) upon entry to some of the recovery routines. This preserves the original state of the error before recovery took place. ILRIOFRR saves the ATA. ILRGOS01, ILRCMP01, and ILRSRT01 save the ATA and EPATH. ILRTMI01 saves only the EPATH.

ASM Diagnostic Aids

This section contains diagnostic aids that are helpful in debugging problems in ASM. Topics included are:

• C0D ABEND Meanings for ASM
• ASM Recovery Control Blocks
• Additional ASM Data Areas

C0D ABEND Meanings for ASM

An RSM routine has found one of the following conditions which should not occur and has set the appropriate return code in register 15:

RC 4  - The count of available swap sets for a specific swap data set is non-zero but no available swap sets could be found.
RC 8  - The total count of available swap sets is non-zero but none of the swap data sets contain available swap sets.
RC 12 - The group operations starter has returned from one of the group operators but the ACE is not the first one on the LGE queue.
RC 16 - The memory termination resource manager for ASM has found that LSQA is not available for an address space that is abnormally terminating for one of the following reasons:
        1. address space is not swapped in
        2. address space is in process of being swapped in
        3. RSMLSQA frame queue is unusable.
RC 20 - The ASM SRB controller has found an AIA or ACE on the LGE process queue which does not have the LPID converted flag on.

A software error record is written to SYS1.LOGREC and recovery processing continues.

ASM Recovery Control Blocks

During error recovery and cleanup processing, the ASM recovery routines communicate with other routines by using the ASM tracking area (ATA) and recovery audit trail area (EPATH).
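
When scanning many software records, it can help to translate the register 15 value into its meaning mechanically. The following Python sketch is a simple lookup built from the return codes RC 4 through RC 20 listed earlier in this topic. The wording is abbreviated from that list, and the names ASM_ABEND_RC and describe_rc are illustrative only.

```python
# Register 15 return codes, abbreviated from the list earlier in this topic.
ASM_ABEND_RC = {
    4: "available swap set count for one swap data set is non-zero, "
       "but no available swap set was found",
    8: "total available swap set count is non-zero, "
       "but no swap data set contains an available swap set",
    12: "group operations starter returned from a group operator, "
        "but the ACE is not first on the LGE queue",
    16: "memory termination resource manager found LSQA unavailable "
        "for an abnormally terminating address space",
    20: "SRB controller found an AIA or ACE on the LGE process queue "
        "without the LPID converted flag on",
}


def describe_rc(r15: int) -> str:
    """Return a short description for a register 15 return code.

    Hypothetical helper; values outside the documented list are reported
    as such rather than guessed.
    """
    text = ASM_ABEND_RC.get(r15)
    if text is None:
        return f"RC {r15}: not one of the documented values"
    return f"RC {r15}: {text}"


if __name__ == "__main__":
    print(describe_rc(12))
```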
ASM Tracking Area (ATA)

The ATA contains information necessary for the recovery or cleanup processing performed by the ASM recovery routines. The ATA is mapped to the six-word work area returned by SETFRR when an FRR is established. For task mode routines, the ATA is mapped to the parameter area that is passed via the ESTAE macro.

The mapping macro name is: ILRATA.

Disp  Name      Size    Description
0     ATA       24      ASM Tracking Area.
0     ATAMODID  1       ID of module establishing recovery routine.
      ATAMPGIO  01      ILRPAGIO module ID.
      ATAMPGCM  02      ILRPAGCM module ID.
      ATAMSWAP  03      ILRSWAP module ID.
      ATAMTRPG  04      ILRTRPAG module ID.
      ATAMSWPD  05      ILRSWPDR module ID.
      ATAMGOS   06      ILRGOS module ID.
      ATAMPTM   07      ILRPTM module ID.
      ATAMSRBC  08      ILRSRBC module ID.
      ATAMCMPD  09      ILRCMPDI module ID.
      ATAMCMPN  0A      ILRCMPNE module ID.
      ATAMCMPA  0B      ILRCMPAE module ID.
      ATAMCMP   0C      ILRCMP module ID.
      ATASFLGS  3       Bit map representing logical sections of ASM routines; set to 1 on entry, set to 0 on exit.
      ATAQIOE   800000  ILRQIOE flag.
      ATASLSQA  400000  ILRSLSQA flag.
      ATASCOMP  200000  SWAPCOMP flag.
      ATAVIOCM  100000  ILRVIOCM flag.
      ATAPCOMP  080000  PAGECOMP flag.
      ATAPOS    040000  ILRPOS flag.
      ATAPAGIO  020000  ILRPAGIO flag.
      ATAPAGCM  010000  ILRPAGCM flag.
      ATASWAP   008000  ILRSWAP flag.
      ATATRPAG  004000  ILRTRPAG flag.
      ATASWPDR  002000  ILRSWPDR flag.
      ATASRT    001000  ILRSRT flag.
                        The remaining flags are reserved.
      ATARFLGS  2       Other recovery flags.
      ATACNVRT  8000    ILRSLSQA flag - converting between forward chained AIA's and lateral chained ATA's.
      ATASGNST  4000    ILRSLSQA flag - in ASSIGNSET subroutine.
      ATASCCWP  2000    ILRSLSQA flag - in SCCWPROC subroutine.
      ATABADPK  1000    ILRCMPAE flag - in BADPACK subroutine.
                        The remaining flags are reserved.

Disp  Name      Size    Description
6     ATARCRSN  1       Recursion flags.
      ATARCRF1  80      Recursion flag - function 1.
      ATARCRF2  40      Recursion flag - function 2.
      ATARCRF3  20      Recursion flag - function 3.
      ATARCRF4  10      Recursion flag - function 4.
      ATARCRF5  08      Recursion flag - function 5.
      ATARCRF6  04      Recursion flag - function 6.
      ATARCRF7  02      Recursion flag - function 7.
      ATARCRF8  01      Recursion flag - function 8.
7     ATARCODE  1       Reason code for ASM-issued ABENDs.

The mapping of the remaining four words is dependent on the recovery routine involved.

For the recovery routine ILRIOFRR:

Disp  Name      Size    Description
8     ATAWORDS  16      Maximum size of four-word area.
8     ATAAIA    4       Address of in-process AIA.
8     ATAACE    4       Address of in-process ACE.
C     ATAASCB   4       Address of in-process ASCB, or TRAS'd-to address space.
C     ATALGE    4       Address of in-process LGE.
C     ATAAIAQ   4       Address of AIA queue.

For the recovery routine ILRSWP01:

Disp  Name      Size    Description
8     ATACLEAR  16      Definition allowing next four words to be cleared.
8     ATAAIA    4       Address of in-process AIA.
C     ATASARTE  4       Address of SART entry.
10    ATASCCW   4       Address of in-process SCCW.
14    ATAIORB   4       Address of in-process IORB.

For the recovery routine ILRGOS01:

Disp  Name      Size    Description
8     ATAWORKA  4       Address of work-area cell.
C     ATAEPATH  4       Address of EPATH.

For the recovery routine ILRSRT01:

Disp  Name      Size    Description
8     ATAWORKA  4       Address of PTM work-area cell.
C     ATAEPATH  4       Address of EPATH.

For the recovery routine ILRSRB01:

Disp  Name      Size    Description
8     ATAAIACE  4       Address of in-process AIA/ACE.
C     ATAAIAQ   4       Address of AIA queue.
10    ATAACEQ   4       Address of ACE queue.
14    ATAEPATH  4       Address of EPATH.

For the recovery routine ILRCMP01:

Disp  Name      Size    Description
8     ATAIOSB   4       Address of in-process IOSB.
C     ATAPCCWQ  4       Queue of PCCWs to be put back on PCCW available queue.
10    ATACOMPQ  4       Queue of AIAs to be returned to ILRPAGCM.
14    ATACPCCW  4       Address of in-process PCCW, not on IORB queue and not on ATAPCCWQ.

For the recovery routine ILRJTM01:

Disp  Name      Size    Description
8     ATASAVE   4       Address of register save area.
8     ATAACEQ   4       Address of ACE queue.

For the recovery routine TERMRFRR:

Disp  Name      Size    Description
8     ATARMPL   4       Address of RMPL, resource manager parameter list.
C     ATAWORKA  4       Address of work-area.

Recovery Audit Trail Area (EPATH)

The EPATH is a communication area between the mainline routine and its corresponding recovery routine. The EPATH is necessary when the six-word ATA is not large enough to accommodate the data to be tracked. The mapping of the EPATH is dependent on the recovery routine or mainline routine including the macro.

EPATH for ILRPTM, ILRSRT, and recovery routine ILRSRT01:

Disp  Name      Size    Description
0     EPAPARM   4       Address of parameter list.
4     EPAIOEIP  4       Address of IOE currently being processed.
8     EPAIOEQP  4       Address of first IOE on 'WORK' read IOE queue.
C     EPAFFIOE  4       Address of first IOE on free IOE internal queue.
10    EPALFIOE  4       Address of last IOE on free IOE internal queue.
14    EPAWRTQ   4       Address of write queue from which last write IOEs removed.
18    EPAWTPAT  4       Address of SCYLWRT which is used to update current CYL Map.
1C    EPACYLA   4       Address of current CYL Map.
20    EPAMSPAD  4       Address of 2 word parameter list for ILRMSG00. Also serves as a switch for ILRPTM.
24    EPAWRTCT  2       Number of writes prepared for current CYL.
26    EPACPUID  2       Processor locking count for current part monitor processing.

EPATH for VIO group operators and their recovery routines - ILRGOS, ILRSAV, ILRRLG, ILRACT, ILRVSAMI, ILRGOS01, ILRTMRLG, ILRTMI00, ILRTMI01, ILRSRBC, and ILRSRB01.

ILRGOS01 is the recovery routine for ILRGOS, which calls ILRSAV, ILRRLG, and ILRACT, which call ILRVSAMI. ILRTMI01 is the recovery routine for ILRTMRLG, which calls ILRVSAMI and ILRTMI00. ILRSRB01 is the recovery routine for ILRSRBC, which calls ILRRLG. The first section is common because of the use of ILRVSAMI.
The second section is dependent on the recovery routine involved.

Disp  Name      Size    Description
0     EPAOWKA   4       Group Operator's or ILRTMRLG's workarea address.
4     EPAVWKA   4       ILRVSAMI workarea address; also points to RPL in workarea.
4     EPATMWKA  4       ILRTMI00 workarea address.
4     EPASWKA   4       ILRSRBC workarea address.
8     EPAAASP   4       Address of active ASPCT.
8     EPADSLST  4       Address of data set name list storage.
C     EPABASP   4       Address of buffer ASPCT.
C     EPATMIBA  4       Base address value for ILRTMI00.
10    EPARASP   4       Address of retrieved ASPCT.
10    EPATMACB  4       Address of storage used to build ACB for STGINDEX in ILRTMI00.
14    EPARTYRG  4       Address of 15 word save area containing retry registers R0-R14 for record-only abends.
14    EPABKSLT  4       Backing slots, only used for assign processing.
18    EPAFLAG1  1       Recovery flags.
      EPAVSAMI  X'80'   ILRVSAMI currently processing.
      EPAGRPOP  X'70'   One of group operators processing.
      EPARLG    X'40'   ILRRLG is currently processing.
      EPASAVE   X'20'   ILRSAV is currently processing.
      EPAACT    X'10'   ILRACT is currently processing.
      EPAACASR  X'08'   Activate or assign request.
      EPAASGN   X'04'   Assign processing - backing slots count (ASMBKSLT) has been updated.
      EPAUNSAV  X'02'   Mark slots unsaved in active ASPCT.
      *         X'01'   Reserved.
19    EPAFLAG2  1       Recovery flags.
      EPATMXIT  X'80'   ILRTMI00 completed processing.
      EPAWARM   X'40'   ILRTMI00 warm start is processing.
      EPACOLD   X'20'   ILRTMI00 CVIOSTRT is processing.
      EPABUILD  X'10'   ILRTMI00 BUILDSNL is processing.
      EPAMAST   X'08'   Master scheduler initialization has been posted.
      EPATMI    X'04'   ILRTMI00 is currently processing.
      EPARECUR  X'02'   Recursion indicator for retry into mainline ILRTMRLG.
      *         X'01'   Reserved.

For ILRGOS01, ILRSAV, ILRACT, ILRRLG, ILRSRBC, and ILRSRB01:

Disp  Name      Size    Description
1A    EPALSIZE  2       Size of LGVT expansion.
1C    EPALGVTP  4       New LGVT address for LGVT expansion in ILRGOS.
20    EPALGEP   4       Logical group entry for request being processed.
24    EPASRB    4       Address of SRB for SRB controller.
28    EPAACE    4       Address of current ACE being processed.
2C    EPARBASP  4       Address of rebuilt ASPCT (LSQA).
30    EPARSIZE  2       LSQA block storage size for rebuilt ASPCT.
32    *         2       Reserved.

For ILRTMI01, ILRTMRLG, and ILRTMI00:

Disp  Name      Size    Description
1A    *         2       Reserved.
1C    EPAACE    4       Address of ACE currently being processed.
1C    EPAMSECB  4       Address of master scheduler initialization ECB.
20    EPATMRSV  4       Address of ILRTMRLG save area.
24    EPAABEND  4       Retry address for record-only abends.
24    EPATMIRT  4       Current retry address for failure in ILRTMI00.
28    EPATPART  4       Address of TPARTBLE while in ILRTMI00.

Additional ASM Data Areas

The following four ASM data areas (BSHEADER, BUFCONBK, DSNLIST, and MSGBUFER) are not contained in OS/VS2 Data Areas. For debugging ASM, BSHEADER (bad slot record) may be especially helpful.

BSHEADER

Acronym: BSHEADER.
Full Name: ASM error record (bad slots).
Macro ID: None.
Size: 1024 bytes.
Function: Trace table of the last 253 slots that ASM has found to be bad. Patterns of bad LSIDs can indicate where and what paging data sets are having difficulties.
Location: Pointed to by ASMVT (ASMEREC).

Offset   Length  Name      Description
0 (0)    4       BSCURR    Current bad slot entry filled.
4 (4)    4       BSFIRST   Beginning address of table.
8 (8)    4       BSLAST    End address of table.
12 (C)   1012    BSLIST    253 four-byte bad slot identifiers (LSIDs).

BSLIST entry:

Offset   Length  Name      Description
0 (0)    1       BSFLAG    Flag byte.
                 BSSPLSID  If 1, LSID entry is swap. If 0, LSID entry is page.
                 BSRDLSID  If 1, LSID entry is for a read error. If 0, LSID entry is for a write error.
1 (1)    3       BSTABNTY  LSID that is bad.

BUFCONBK

Acronym: BUFCONBK.
Full Name: VSAM buffer control block.
Macro ID: None.
Size: 12 bytes.
Function: Queue VIO group operation for later processing until VSAM resources are available.
Location: Pointed to by ASMVT (ASMGOSQS).

Offset   Length  Name      Description
0 (0)    4       BUFCHAIN  Pointer to next BUFCONBK.
4 (4)    4       BUFASCB   Pointer to ASCB.
8 (8)    4       BUFACE    Pointer to ACE.

DSNLIST

Acronym: DSNLIST.
Full Name: Data Set Name List (ASM).
Macro ID: None.
Size: 44 times number of possible page/swap data sets. There are two DSNLISTs, one for page data sets and one for swap data sets.
Function: Make data set names available in non-fixed (pageable) storage.
Location: Pointed to by PART (PARTDSNL) for page data sets, and by SART (SARDSNL) for swap data sets.

Offset   Length  Name      Description
0 (0)    44      DSNENTRY  Data set name left-justified and padded with blanks.

MSGBUFER

Acronym: MSGBUFER.
Full Name: ASM message buffer.
Macro ID: None.
Size: 376 bytes.
Function: Ensure that WTOR with LOGREC request will have a buffer to use.
Location: Pointed to by ASMVT (ASMMSGBF).

Offset     Length  Name      Description
0 (0)      4       MSGCURR   Pointer to current buffer used.
4 (4)      4       MSGFIRST  Pointer to first buffer.
8 (8)      4       MSGLAST   Pointer to last buffer.
12 (C)     4       MSGTERM   Pointer to special termination buffer.
16 (10)    240     MSGBFRS   Three 80-byte buffers.
256 (100)  120     MSGTBFR   Special termination buffer.

System Resources Manager (SRM)

The system resources manager (SRM) is a component of the MVS control program. It determines which, of all active address spaces, should be given access to system resources, and the rate at which each address space is allowed to consume the resources.

An installation controls the MVS system primarily through the SRM. The evaluations and resulting decisions made by the SRM are dependent on the constants and parameters with which it is provided.
The reader should understand the philosophy inherent in the use of these constants and parameters, so that their use will produce the desired effect. Part 3 of OS/VS2 System Programming Library: Initialization and Tuning Guide provides the background information necessary to understand the controls available through the SRM, and the implementation of these controls.

SRM Objectives

The SRM bases its decisions on two fundamental objectives:

1. To distribute system resources among individual address spaces in accordance with the installation's response, turnaround, and work priority requirements.
2. To achieve optimal system throughput through use of system resources.

An installation specifies its requirements for the first objective in a member of SYS1.PARMLIB called the installation performance specification (IPS). Through the IPS, the installation divides its types of work into distinct groups called domains, assigns relative importance to each domain, and specifies the desired performance characteristics for each address space within these domains.

A secondary input to the SRM is another member of parmlib, the OPT member. Through a combination of IPS and OPT parameters, an installation can exercise a degree of control over system throughput characteristics. When the need arises, trade-offs can be made between SRM's objectives. That is, the installation can specify whether, and under what circumstances, throughput considerations take priority over turnaround requirements.

The SRM attempts to ensure optimal use of system resources by periodically monitoring and balancing resource utilization. If resources are under-utilized, the SRM attempts to increase the system load. If, on the other hand, resources are over-utilized, the SRM attempts to reduce the system load or to shift commitments to low-usage resources such as the processor, logical channels, auxiliary storage, and pageable real storage.

Address Space States

The SRM recognizes address spaces as being in one of three general states. Each state corresponds in concept to a queue on which SRM places the SRM user control block (OUCB) which describes the address space. These three states are:

1. In - The working set of an address space in this state occupies real storage.
2. Wait - The working set of an address space in this state does not occupy real storage. It has been swapped-out, because it cannot be put into execution.
3. Out - The working set of an address space in this state does not occupy real storage; however, the address space is capable of executing and can be considered for swapping-in.

It is important to recognize that the correspondence between these states and presence on the associated queue is not precise; an address space can be in transit between two states (for example, it may be in the process of being swapped-out). Thus, the presence on a particular queue might not exactly mirror the physical state of affairs. Further, these classes are necessarily broad, and SRM recognizes subclasses; this is especially true among address spaces belonging to the "In" class. The use of the swap transition flags, in conjunction with the presence of an OUCB on a particular queue, mirrors the exact physical state of an address space.

For wait state analysis, the exact state of given address spaces is important. If you can determine precisely what state SRM considers the various address spaces to be in, and the reasons why, you will gain insight for further analysis. The OUCB is the primary address-space-related control block in which much of the above information can be found.

In the OUCBQFL field (OUCB + X'10'), when the OUCBGOB bit is on, the SRM's OUCB repositioning routine is to be invoked. The destination of this pending OUCB repositioning is indicated by the following bit settings:

1. OUCBOUT='0'B - The OUCB will be placed on the "In" queue.
2. OUCBOUT='1'B and OUCBOFF='1'B - The OUCB will be placed on the "Wait" queue.
3. OUCBOUT='1'B and OUCBOFF='0'B - The OUCB will be placed on the "Out" queue.

When the repositioning is completed, the OUCBGOB bit is turned off; the setting of the OUCBOUT and OUCBOFF bits indicates the location of the OUCB.

The setting of the swap transition flags for swap-out processing occurs in the following order:

1. If swap-out is initiated successfully, the OUCBGOO bit is set.
2. At quiesce-complete time, the repositioning of the OUCB takes place.
3. At swap-out-complete time, the OUCBGOO bit is turned off.

The setting of the swap transition flags for swap-in processing occurs in the following order:

1. If swap-in is initiated successfully, the OUCBGOI bit is set.
2. At restore-complete time, the repositioning of the OUCB takes place and the OUCBGOI bit is turned off.

SRM Indicators

It is helpful to understand how SRM views the total MVS system, as well as the individual address spaces. This understanding can assist you in further problem analysis, especially of enabled wait state situations. A discussion of some of the SRM system and individual user indicators follows. Figure 5-30 shows the relationships among important SRM control blocks and queues.

A study of several counters and flags aids in further understanding of SRM processing. The counters and flags that pertain to the entire system are located in the SRM constants module (IRARMCNS), which resides in the nucleus. The counters and flags that pertain to a specific user are found in that user's OUCB.

System Indicators

The SRM control table (RMCT) is located at the start of module IRARMCNS. This address is found at the CVT + X'25C'. Generally, when SRM is in control, the address of the RMCT is contained in register 2.
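
When working from a storage dump held in a file or buffer, locating the RMCT and testing individual indicator bits can be done mechanically. The following Python sketch reads the RMCT address from CVT + X'25C' and tests a bit using System/370 bit numbering, in which bit 0 is the leftmost (high-order) bit of a byte. The function names are illustrative only. Masking the pointer to 24 bits is an assumption about how the address word is stored and should be checked against the dump being analyzed.

```python
CVT_RMCT_OFFSET = 0x25C  # from the text: the RMCT address is at CVT + X'25C'


def rmct_address(cvt: bytes) -> int:
    """Return the RMCT address stored in a CVT image.

    Hypothetical helper. Assumption: the fullword is big-endian and only
    the low-order 24 bits form the address.
    """
    word = cvt[CVT_RMCT_OFFSET:CVT_RMCT_OFFSET + 4]
    if len(word) != 4:
        raise ValueError("CVT image too short to contain the RMCT pointer")
    return int.from_bytes(word, "big") & 0x00FFFFFF


def bit_is_on(block: bytes, offset: int, bit: int) -> bool:
    """Test one bit of the byte at block[offset].

    Bit 0 is the high-order bit (mask X'80'); bit 7 is the low-order bit.
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit number must be 0 through 7")
    return bool(block[offset] & (0x80 >> bit))


if __name__ == "__main__":
    flags = bytes([0x20])            # only bit 2 is on
    print(bit_is_on(flags, 0, 2))    # True
    print(bit_is_on(flags, 0, 0))    # False
```

With this numbering, "bit 2" of a flag byte corresponds to mask X'20' and "bit 0" to mask X'80'.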
In the module IRARMCNS, the following fields provide information concerning SRM's current processing:

MCTAVQ1 (RMCT + X'1DB'; bit 2)
    This bit indicates that the count of available pages has fallen below the PVTAFCLO value, so the real storage manager (RSM) has called SRM to steal pages in order to increase the count of available pages. If this bit is on, it could indicate a normal condition.

MCTSQA1 (RMCT + X'1DB'; bit 0)
    Indicates that the number of available SQA pages is critically low. If MCTSMS1 (RMCT + X'1D9'; bit 4) is 1, the operator was notified of this situation.

MCTSQA2 (RMCT + X'1DB'; bit 1)
    Indicates that the number of available SQA pages has fallen below a second, more critical threshold than the one noted above. If MCTSMS2 (RMCT + X'1D9'; bit 5) is 1, the operator was notified of this situation.

MCTASM1 (RMCT + X'1D9'; bit 0)
    Indicates that the SRM has detected that less than 30% of all local slots are available. The SRM has informed the operator of this fact and has taken appropriate action to relieve the shortage.

[Figure 5-30. SRM Control Block Overview (Part 1 of 2). Diagram not reproduced. It shows the CVT (offset X'25C') pointing to the RMCT (SRM tables and entry points) in IRARMCNS, and the RMCT pointing to the following: WMST (IPS information), LCT (logical channel information), WAST (workload activity specifications), WAMT (workload activity information for MF/1), RQSV (service work request area), DMDT (domain descriptor table), CCT (CPU usage information), ICT (I/O usage information), MCT (storage usage information), RCT (resource control information), RMPT (swap analysis parameters), RMCA (swap analysis variables), RMEX (external entry point descriptors), RMSB (subroutine entry points), EPAT (algorithm entry point descriptors), EPDT (serialized action entry point descriptors), EPST (scanned action entry point descriptors), and the queue anchors INQE (In queue), OTQE (Out queue), WTQE (Wait queue), and TMQE (timed actions) with their OUCBs.]

[Figure 5-30. SRM Control Block Overview (Part 2 of 2). Diagram not reproduced. It shows the SRM registers and the RRPA (recovery parameters: RMCT address, register 0 contents on entry (ASID, PGN, SYSEVENT code), register 1 contents on entry (input parameter address), and the RMEP entry point descriptor of the routine most recently entered); the WMST with its pointers to and indexes into the performance group descriptor table (PGVT, PGDT), the performance objective table (POVT, PODT), and the domain descriptor table (DMVT, DMDT); and the per-address-space blocks ASCB, ASXB, OUCB, OUXB, OUSB (SRM user statistics), and IMCB (I/O measurement information).]

MCTAMS2 (RMCT + X'1D9'; bit 3)
    Indicates that the SRM has detected that less than 15% of the total local auxiliary storage slots are available. The SRM has informed the operator of the slot shortage, and has taken appropriate action to relieve the shortage.

MCTFAVQ (RMCT + X'1D8'; bit 3)
    This bit indicates that the count of fixed pages in the system is above the threshold value, PVTMAXFX, and the real storage manager (RSM) has called SRM to swap-out the users responsible for the shortage of pageable frames. If MCTFX1 (RMCT + X'1D9'; bit 6) is 1, the operator was informed of this situation.

RCVUICA (RMCT + X'21E'; halfword)
RCVCPUA (RMCT + X'220'; halfword)
RCVASMQA (RMCT + X'224'; halfword)
    These values are the system contention indicators that the resource monitor examined for the last interval. They represent, in the order given, the average unreferenced interval count (UIC), the average processor utilization, and the average ASM queue length. Based on these values, the target MPL for a domain is altered.

RMCAINUS (RMCT + X'29E'; halfword)
    Indicates the count of address spaces currently residing in storage. This count includes non-swappable address spaces. If this count is high, look at the next field.

CCVENQCT (RMCT + X'138'; halfword)
    Indicates the count of address spaces currently residing in storage and marked non-swappable because they are holding ENQ resources that other address spaces want.

Individual User Indicators

The SRM user control block (OUCB) contains flags and counters to provide information about a specific user.
There is one OUCB for each address space, pointed to by ASCBOUCB (ASCB + X'90'). The following fields help in the understanding of specific user characteristics.

OUCBMWT (OUCB + X'15'; bit 7)
    If this bit is on, the SRM has detected that this user has not been dispatched, but was occupying storage for at least a given number of seconds. (This interval is processor-model dependent.) The user will be swapped-out until the dispatcher informs SRM that the address space has work to do.

OUCBAXS (OUCB + X'12'; bit 5)
    When this bit is on, the user has been swapped-out of storage because the user's address space was obtaining auxiliary storage slots at the fastest rate in the system when an ASM slot shortage occurred.

OUCBENQ (OUCB + X'11'; bit 6)
    A different address space has tried to ENQ on a resource held by this address space. This user is treated as non-swappable for an installation-defined time period.

OUCBYFL (OUCB + X'12')
    See specific bit designations below:
    • Bit 1 - indicates that the user was created via a START command.
    • Bit 2 - indicates that the user was created via a TSO LOGON command.
    • Bit 3 - indicates that the user was created via a MOUNT command.

OUCBFXS (OUCB + X'12'; bit 7)
    When this bit is on, it indicates that the user has been swapped-out of storage because the user's address space had allocated to it the greatest number of fixed frames when a pageable frame shortage occurred.

OUCBJSAS (OUCB + X'17'; bit 1)
    When this bit is on, it indicates that, at the time of job select processing for this user, there was an auxiliary slot shortage. This user's initiation is being delayed until the shortage is relieved.

OUCBJSFS (OUCB + X'17'; bit 0)
    When this bit is on, it indicates that there was a pageable frame shortage at the time of job select processing for this user. This user's initiation is being delayed until the shortage is relieved.
OUCBSRC (OUCB + X'2S'; 1 byte) This field contains a code describing why this user was last swapped-out. The codes are: 01 Terminal output wait 02 - Terminal input wait 03 - Long wait 04 - Auxiliary storage shortage OS Real storage shortage 06 Detected wait 07 Reqswap or transwap SYSEVENT issued Section S. Component Analysis 5.7.7 System ResQurcesManager (SRM) (continued) 08 09 OA OUCBRDY ENQexchange by swap analysis Exchange based on recommendation values by swap analysis Unilateral swapout by swap analysis. ( (OUCB +X'56'; bit 0) This flag indicates that ready work became available for this address space which was swapped-out due to a long wait. The address space is now capable of executing and is a candidate for swap-in. Other Indicators TheSRM domain descriptor table can be useful in pinpointing a problem involving SRM's MPL control. Mapping of the table can reveal why a user is kept out of main storage, why erratic response time occurs, and other user and system information. SRM Error Recovery SRM maintains two functional recovery routines (FRRs) that are located in IRARMERR. One FRR (recovery routine 1 - RR1) gets control whenever errors occur after SRM is branch-entered by a routine that holds a lock higher in the lock hierarchy than the SRM lock. The other FRR (recovery routine 2 RR2) gets control whenever errors occur and SRM is running with the SRM lock. If it is suspected that SRM is entering error recovery and a stop is necessary at the time of error, RMRR2INT is a subroutine common to both RRI and RR2. ( Recovery routine 1 (RR1) retries if a retry routine exists. If no routine exists, or if the error recurs, RRI percolates the error. With recovery routine 2 (RR2), many special situations such as the following are first checked: • Is RMF active and should it be terminated? • Is SET IPS active and should abend code be converted? • Is OUCB valid and should abend code be converted? Then RR2 retries if a retry routine exists. 
If no retry routine exists, or if the error recurs, RR2 percolates the error.

Module Entry Point Summaries

Figure 5-31 shows a cross reference between SRM modules and entry points. Descriptions of selected SRM modules and entry points are:

IRARMINT - SRM Interface Routine
IGC095 - SVC entry point to SRM.
IRARMI00 - Branch entry point to SRM. Handle all external SYSEVENTs.
IRARMI48 - Branch entry point to the SRM. Handle the internal SYSEVENT (48).
IRARMI01 - Entry point from IRARMEVT or IRARMCTL. Return to the SYSEVENT issuer.
IRARMI10 - Entry point from IRARMEVT. Abend a user of the SRM.

IRARMEVT - SRM SYSEVENT Router
IRARMEVT - SYSEVENT processor. Begin to process the indicated SYSEVENT.
IRARMXVT - SYSEVENT retry. Prepare a retry of a SYSEVENT that had incurred a system error.
IRARMDEL - Synchronize address-space delete processing.
IRARMIPS - Set new IPS. Invoke IRARMSET to establish a new IPS.
IRARMUXB - Synchronize OUXB deletion at swapout-completion time.

IRARMSTM - Storage Management Routine
IRARMPR1 - Page Replacement Normal Processing. Examine each user in main storage and the system pageable area, and call RSM real frame replacement to update UICs for each user.
IRARMPR5 - Page Replacement Real Page Shortage Force Steal. Steal as many pages as required to relieve a real page frame shortage. The steal decision is made at entry IRARMMS2. The oldest unreferenced pages are stolen first.
IRARMMS2 - Real Page Shortage Prevention. Calculate the number of frames necessary to reach the O.K. threshold, and schedule IRARMPR5 processing (if a real page shortage exists). Inform the operator of users that have the greatest number of fixed frames and direct the swaps of these users (if a pageable real page shortage exists).
IRARMMS6 - Main Storage Occupancy Long Wait Detection.
Discover users who have gone into long wait without notifying SRM. Swap out such users, if swappable.
IRARMASM - Auxiliary Storage Shortage Monitoring. Monitor the extent of auxiliary storage allocation. If auxiliary pages are in short supply, inform the operator and direct swaps of users who are most rapidly acquiring auxiliary storage slots.
IRARMSQA - SQA Shortage Message Writer. Inform operator of system queue area shortages.
STEAL - Internal STM Steal Subroutine. Add users to RFR interface list until full, then call RSM real frame replacement (RFR) routine (via IRARMI03) and record the number of pages stolen.

IRARMSRV - SRM Service Routine.
IRARMI02 - Interface to ASCB CHAP.
IRARMI03 - Interface to RSM's real page frame replacement.
IRARMI04 - Obtain or free SQA storage.
IRARMI05 - Requeue SRM TQE routine.
IRARMI06 - Cross-memory post entry point.
IRARMI07 - SWAP SRB SCHEDULE routine.
IRARMI09 - RECORD entry point.
IRARMR16 - Set a return code of 16 in register 15 and return. (Dummy routine)

IRARMERR - SRM's Functional Recovery Routine.
IRARMRR1 - Functional recovery for globally-locked entries (entries to SRM in which the SRM lock could not be obtained). Retry the failing SRM routine when possible. Otherwise, percolate the error.
IRARMRR2 - Functional recovery for non-globally-locked entries (entries to SRM in which the SRM lock was obtained). Validate queues and clean up. Retry the failing routine if possible; otherwise percolate the error.
RMRR2RTY - Return to RTM indicating retry.
RMRR2PER - Return to RTM indicating percolation.
RMRR2INT - FRR initialization.
RMRR2VLD - Validate control blocks.
RMRR2GST - Release the dispatcher lock in order to call IRARMI04.
RMRR2CKQ - Verify the location of an OUCB.
RMRR1VFB - Verify addresses.
RMRR2REQ - OUCB enqueue routine entry point.
RMRR2SPR - Return with the return code in register 15.

IRARMCPM - Processor Management.
IRARMAP1 - Automatic Priority Group Reorder Processing. Recompute dispatching priorities for all APG users in main storage. Invoke ASCBCHAP for each user whose dispatching priority has changed.
IRARMEQ1 - ENQ/DEQ Algorithm ENQ Time Monitoring. Stop giving extra CPU service to users with ENQHOLD SYSEVENTs outstanding who have already received their guaranteed processor service.
IRARMCL0 - Processor Load Balancing User Swap Processing. Compute user processor usage profile at QSCECMP SYSEVENT.
IRARMCL1 - Processor Utilization Monitoring. Compute processor utilization variables for processor load balancing and resource management algorithms.
IRARMCL3 - Processor Load Balancing User Swap Evaluation. Produce a numerical recommendation value that reflects the desirability of swapping a user based on processor utilization.
CHAP - IRARMCPM Internal Chapping Subroutine. Search queue for APG users with changed dispatching priorities. Put them in a list and call ASCBCHAP.
CPLRVSWF - IRARMCPM Internal Wait Factor Computation Subroutine. Compute system wait factor for CPU load balancing recommendation value.
CPUWAIT - IRARMCPM Internal Wait Time and Processor Utilization Compute Subroutine. Compute accumulated system wait time total for all processors and compute recent processor utilization.
CPUTLCK - IRARMCPM Internal Processor Utilization Checking Routine. Ensure that the computed processor utilization percentage falls between 0 and 100 percent. If 100 percent and lowest priority user has not been dispatched, set to 101 percent.
NEWDP - IRARMCPM Internal APG Computation Routine. Compute mean time to wait and a new dispatching priority for the APG user.

IRARMIOM - I/O Management.
IRARMIL0 - I/O Load Balancing User I/O Monitoring. Compute I/O usage profile for all swappable problem-state users.
IRARMIL1 - I/O Load Balancing Logical Channel Utilization Monitoring.
Compute channel utilization values for I/O load balancing, page replacement algorithms, and the device allocation SYSEVENT.
IRARMIL3 - I/O Load Balancing User Swap Evaluation. Compute numerical recommendation value that reflects desirability of swapping a user based on logical channel utilization.
IRARMIL4 - I/O Load Balancing IMCB Deletion Routine. At the end of the user job step, clean up the control blocks used in monitoring a heavy I/O user.
LCHUSE - Internal I/O Subroutine. Compute logical channel utilization, request rate, and I/O load balancing recommendation value computation factor.

IRARMRMR - Resource Manager
IRARMRM1 - Resource Monitor Periodic Monitoring. Accumulate, at periodic sample intervals, several system resource contention indicators and the number of ready users for each domain.
IRARMRM2 - Resource Monitor MPL Adjustment Processing. Compute the average system resource utilization and determine if the system MPL should be raised or lowered.

IRARMCTL - SRM Control Algorithms.
IRARMCTL - Mainline Control Processing. Transfer to deferred user action processing (IRARMCEN) and then to the algorithm request routine (IRARMCEL).
IRARMCEN - Deferred User Action Processing. Examine the OUCBACN field of the OUCBs on the action queue and route control to all routines whose request bits have been set in that field. Dequeue each OUCB after its indicated actions have been performed.
IRARMCEL - Algorithm Request Routine. Examine the RMCTALR and RMCTALA fields in the RMCT. Route control (via IRARMCRT) to each algorithm whose request bit has been set in either of the two fields. Reset the individual request bit after each algorithm completes.
IRARMCET - Periodic Entry Point Scheduler. Accept timer interrupts, schedule the algorithms currently due for execution, and requeue the SRM timer element to permit interrupts again when the next algorithm is due for execution.
IRARMCED - SRB-Dispatched Original Entry Processor. Receive control under an SRB scheduled by the dispatcher and set up an entry to the mainline of SRM (IRARMCEN) by invoking SYSEVENT 48.
IRARMCQT - Periodically-Invoked Entry Point Rescheduler. Accept a request to reschedule the execution of a periodically-invoked algorithm, and requeue the corresponding RMEP block on the timed entry queue.
IRARMCRD - SRB Scheduling Routine. Accept a request to schedule the SRM SRB which, if available, is scheduled to obtain the SRM lock.
IRARMCRL - Algorithm Scheduling Routine. Accept requests for an algorithm to be run. Turn on the bit in the RMCTALA or RMCTALR associated with the algorithm.
IRARMCRN - Action Request Routine. Accept requests for an action requiring the SRM lock. If the SRM lock is held, control is given to the action immediately via a routing routine. If the SRM lock is not held, the bit is set in the OUCBACN field of the OUCB associated with the requesting user, to indicate that the action requested is deferred.
IRARMCRT - Entry Point Table Scanner. Accept an invocation bit pattern and an entry point table address. Compare the bit pattern to invocation flags in the entry point table entries. When a match is found, invoke the routine identified by the entry point.
IRARMCRY - User Swap Request Receiving Routine. Accept a request for a user swap and check to see if such a swap is already in progress. Route control to IRARMCSO or IRARMCSI if a swap is not in progress and the SRM lock is held.
IRARMCSI - User Swap-In Request. Accept a swap-in request, allocate an OUXB for the user, and initiate the swap-in.
IRARMCSO - User Swap-Out Request. Accept a swap-out request and post the region control task's quiesce routine to initiate the swap-out.
IRARMRPS - OUCB Repositioning Routine. Dequeue an OUCB and requeue it at the end of the queue specified in its OUCBQFL field.
IRARMWMY - Periodic Entry Point Requeuing Routine. Requeue all of the members on the timed algorithm queue and adjust all the time-due fields.
IRARMCAP - Swap Analysis Algorithm. Attempt to keep the multiprogramming level (MPL) at its target level in each domain by performing user swaps.
IRARMCPI - Select Swap-In Candidate Subroutine. Scan the OUT queue for the user in a particular domain with the highest recommendation value.
IRARMCPO - Select Swap-Out Candidate Subroutine. Scan the IN queue for the user in a particular domain with the lowest recommendation value.
IRARMCVL - User Swap Evaluation Routine. Compute a numerical value representing the recommendation for a user to be swapped in. This recommendation value is the sum of the user's workload level and the recommendations of the I/O and processor resource managers.

IRARMWAR - Workload Activity Recording
IRARMWR1 - Workload Activity Recording Initialization Subroutine. Construct and initialize the workload activity measurement table (WAMT) in the buffer (storage from SQA obtained by MF/1 and input with SYSEVENT 45).
IRARMWR2 - Workload Activity Recording WAMT Initialization Subroutine. Build the WAMT in a format suitable for updating by the SRM.
IRARMWR3 - SRM Workload Activity Recording Data Collection Subroutine. Move the contents of the WAMT into a collection buffer capable of containing the data. (Note that the buffer is obtained by MF/1 from LSQA, storage key 0, and must be fixed in storage.) If the IPS has not been changed, add to the collected data the transaction data for the current in-storage interval for each in-storage address space with an active transaction, re-initialize the data collection buffer for the next collection interval, and calculate the workload level for each performance group period that contains transaction data.
IRARMWR4 - SRM Workload Activity Recording Transaction Data Update Subroutine. Add the service and transaction active time to the appropriate WAMT performance group period accumulator in the data collection buffer.
IRARMWR5 - SRM Workload Activity Recording Workload Level Calculation Subroutine. Calculate the workload level for each WAMT performance group period entry in which a transaction has been accumulated during the data collection interval. Note that for those WAMT entries in which the service rate calculated can be associated with multiple workload levels, or is zero (even though at least one transaction has been active during the data collection interval), the negative value of the workload level is calculated to indicate an estimated value to MF/1.
IRARMWR6 - SRM Workload Activity Recording Transaction End Update Subroutine. Add the transaction elapsed time to the appropriate WAMT performance group period accumulator and count the number of transactions that terminated during the current data collection interval.
IRARMWR7 - SRM Workload Activity Recording WAMT Entry Determination Subroutine. Obtain addressability to the WAMT performance group period entry used to accumulate user transaction information.
IRARMWR8 - SRM Workload Activity Recording. Terminate workload activity data collection whenever an IPS change occurs.

IRARMWLM - SRM Workload Manager
IRARMWM1 - Workload Manager Service Calculator Routine. Calculate the amount of service provided to a user since the beginning of the current workload manager measurement for that user. Service is calculated using the following equation:

Service = (MP/K) + (CT/K) + EI

where:
T = the TCB processor time elapsed for the current interval.
K = the time required to execute 10,000 instructions (dependent on the processor model).
M = the MSO service coefficient scaled by 1/50.
P = the number of page-seconds used by the user.
C = the processor service coefficient.
E = the EXCP count for this interval.
I = the I/O service coefficient.
This routine calculates each of the three service factors and the total service for the user for the interval.
IRARMWM2 - Swappable User Evaluation Routine. Scan the in-storage queue and the out-of-storage-but-ready queue, and evaluate each swappable user, assigning each his current workload level.
IRARMWM3 - Individual User Evaluation Routine. Evaluate a swappable user on the in queue or the out queue, assigning a current workload level.
IRARMWM4 - Workload Manager Workload Level Calculator Subroutine. Accept a service rate and a performance objective, and calculate the corresponding workload level.
IRARMWM5 - Workload Manager Update Performance Group Period Subroutine. Test whether a user has accumulated enough service/time to be assigned to a new performance group period. If so, adjust the pointers that indicate the performance group period, the performance objective, APG priority, and the domain applicable to the transaction current for the user. Note that the frequency (resolution) at which the test for period-end is made depends on how often IRARMWM5 is called for any given user.
IRARMWM7 - WLM Recommendation Calculation Routine. Calculate a workload manager recommendation value for a user, based on the service that was received and on the performance objective currently associated with the user. Users who have not yet received an amount of service equal to their interval service value (ISV) specification while in storage are given a recommendation value boost. The boost gives preferential treatment to users in their ISV as compared to users not in their ISV, or to users between job steps.
IRARMHIT - Workload Manager User Ready SYSEVENT Swap-In Scheduling Routine. Reposition the now-ready user from the wait queue to the out queue.
Receive control as the result of a decision to apply swap-in processing to a now-ready user.
IRARMWMI - Workload Manager In Storage Interval Change Subroutine. Update the transaction accumulators with the service and the time received by the user during the preceding in-storage interval.
IRARMWMJ - Routine to Determine the Scope of Applicability of Analysis to a User. Examine the current swap status and the performance specification for a user. Indicate if the resource manager algorithms are applicable to this user.
IRARMWMK - WLM Dontswap/Okswap User Analysis Routine. Calculate the current service and ensure that the user is in the correct performance group period. Set applicable algorithm indicators based on the new swap status of the user.
IRARMWMN - Workload Manager Transaction Start Routine. This routine receives control as the result of a SYSEVENT that has been defined by the workload manager to signify that a new transaction should be started for that user. If the user is not in storage, a flag is set to cause the IRARMWMN routine to be reentered during the swap-in of the user. Otherwise, any existing transaction is stopped by calling IRARMWMO, and the user transaction fields are reset to reflect the new transaction being started.
IRARMWMO - Workload Manager Transaction Stop Routine. This routine receives control as the result of a SYSEVENT that has been specified by the workload manager as defining the end of any current user transaction. If a new transaction is to be created for the user, IRARMWMO indicates the end of the current transaction. If the next user event is known, IRARMWMO leaves the transaction accumulated values for later resumption of the transaction. In any case, IRARMWMO causes the preceding time and service to be properly recorded for the current transaction.
IRARMWMQ - Workload Manager Quiesce Completed SYSEVENT Processing Routine.
This routine receives control when a user has stopped executing and is being swapped out so that the workload manager can record the service given that user while he was in storage. The workload manager determines if a user event caused the swapout, and flags the user to indicate whether previous service is to be considered when the user is next swapped in.
IRARMWMR - Workload Manager Restore Completed SYSEVENT Processing Routine. This routine receives control when a user has been swapped in and is ready to begin executing. The workload manager sets up the fields used to calculate the service rate received by the user during the forthcoming in-storage residency period.

IRARMSET - Set to New IPS Non-Resident Action Routine. Replace the internal IPS currently in use by the SRM with a new IPS. All references to the old IPS in the SRM's control blocks are resolved with offsets or addresses in the new one.

IEEMB812 - Set IPS Processor.
IEEMB812 - Open PARMLIB. Process the IPS parameter of the SET command.
IRARMRDR - Obtain a buffer and read records from PARMLIB.
IRARMWTR - Write a message to system log.

IRARMIPS - SRM List Processor.
IRARMIPS - Scan the IPS list in the SYS1.PARMLIB member and, if valid, build control blocks containing the IPS information.
IRARMFRE - Free the obsolete IPS tables.
IRARMOPT - Scan the IEAOPTxx member of PARMLIB.

IEAVNP10 - SRM Initialization.
IEAVNP10 -
1. Initialize constants in SRM tables.
2. Initialize sysgened address spaces for the SRM.
3. Process the APG, OPT, and IPS system parameters.
IRARMRDR - Obtain a buffer and read a record from SYS1.PARMLIB.

IEEDISPD - Display Domain Processor. Write a console display of entries in the domain descriptor table to a target console.

IEE8603D - SETDOMAIN Command Processor. Process the SETDMN command by altering the domain descriptor table.
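The service equation given under IRARMWM1 (the workload manager service calculator) can be worked through numerically. The Python sketch below is illustrative only. The function name and every input value are hypothetical, and the code simply evaluates Service = (MP/K) + (CT/K) + EI as printed, without modeling any units or internal scaling that SRM may apply.

```python
# Illustrative sketch only; not SRM code. All input values are hypothetical.

def srm_service(M, P, C, T, E, I, K):
    """Evaluate Service = (MP/K) + (CT/K) + EI and return the three factors.

    M: MSO service coefficient (already scaled by 1/50)
    P: page-seconds used by the user
    C: processor service coefficient
    T: TCB processor time elapsed for the current interval
    E: EXCP count for the interval
    I: I/O service coefficient
    K: time required to execute 10,000 instructions (model dependent)
    """
    mso_service = (M * P) / K      # main storage occupancy factor
    cpu_service = (C * T) / K      # processor factor
    io_service = E * I             # I/O factor
    total = mso_service + cpu_service + io_service
    return mso_service, cpu_service, io_service, total

mso, cpu, io, total = srm_service(M=2, P=300, C=10, T=40, E=120, I=5, K=20)
print(mso, cpu, io, total)   # 30.0 20.0 600 650.0
```

Returning the three factors separately mirrors the statement that the routine calculates each service factor as well as the total; comparing the factors shows which resource dominates a user's accumulated service.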
Figure 5-31. SRM Module/Entry Point Cross Reference (parts 1 and 2): a matrix of SRM entry points (rows) against SRM modules (columns).

VTAM

This chapter outlines the important aspects of VTAM problem analysis. It is important that the problem solver have some understanding of how VTAM works. The following publications provide important VTAM structure, logic, control block format, and debugging information:
• OS/VS2 VTAM Logic
• OS/VS2 System Programming Library: VTAM
• OS/VS2 VTAM Data Areas
• OS/VS2 MVS VTAM Debugging Guide

VTAM is a subsystem in itself. For VTAM problem determination, it is especially important to understand how work progresses through VTAM via its internal dispatching mechanism, process scheduling services (PSS).

Some of the VTAM concepts that are discussed in this section are:
• Process scheduling services (PSS)
• VTAM's Relationship With MVS
• Processing Work Through VTAM (PABs, FMCBs)
• VTAM Locking
• VTAM Recovery/Termination
• VTAM Debugging

VTAM's Relationship with MVS

VTAM has its own address space to manage the network control program (NCP) network. Under VTAM's main task, the following services are performed:
• VTAM initiation and termination.
• VARY, DISPLAY, and MODIFY network operator commands.
• An access method control block (ACB) is opened to VTAM so that VTAM services can be used to communicate with an NCP.

The VARY processor obtains and releases IBM 3704/3705 communications controllers through the system dynamic allocation services. A waiting subtask is posted to build an NCP resource definition table (RDT), which provides a table definition of the network. Another subtask is attached to actually load the 370x NCP. This procedure allows multiple 370x's to be activated concurrently.

If TOLTEP and NETSOL are selected, each has its own subtasks operating in VTAM's address space.
Each is also connected to VTAM with an OPEN ACB.

Because the VTAM address space owns the 370x, IOS schedules global SRBs to this address space for POST STATUS processing. Normally, however, VTAM uses a disabled interrupt exit (DIE) to run its channel end appendage. The DIE schedules SRBs (physically located in the I/O buffer) into the application program's address space to run the posting process. POST STATUS is used only to handle error situations or when RNIO is being traced with GTF active.

VTAM operates in the application program's address space when a service is requested by the application program. Local SRBs are used for all VTAM I/O processing to terminals or logical units. Other VTAM services such as OPNDST/CLSDST are run under an IRB from the task that opened the VTAM ACB. VTAM exits (ACB and request parameter list (RPL) exits) are given control when VTAM issues a SYNCH under the IRB. This means that a VTAM exit runs as a program request block (PRB) under the task that opened the VTAM ACB. VTAM macros (for I/O or other services) can be issued from these exits; however, if the SYN option is used on the macro, a serialization bottleneck can result.

As seen from this explanation, VTAM's address space is not used for normal I/O activity. In analyzing VTAM problems, do not be concerned if several tasks are waiting in VTAM's address space. These tasks are the operator control, NCP communication, and initiation/termination tasks, and are normally waiting in VTAM's address space.

Processing Work Through VTAM

Following is an explanation of the dispatching mechanism and the associated key control blocks that the problem solver should understand.

VTAM satisfies an application program's request by executing a series of processes. Examples of processes are control layer and TPIOS; each process is a discrete piece of work.
Each process is represented by a process anchor block (PAB), which is four words long and serves as a serialization mechanism for a resource. See Figure 5-32. A PAB always resides within some larger control block, called a major control block, such as an FMCB or an ACDEB. A process is always executed for a particular terminal, logical unit, or option as defined by the major control block that contains the PAB.

Figure 5-32. VTAM Control Block Structure: a diagram relating the major control block, the PST and MPST, the PAB (WEL, CHAIN, Offset/DVT, and Flag/RPH words), the chain of work elements (WELs), the DVT, the component recovery area (CRA) with its RPH (addressed by register 1), the PSS CRR (ISTAPCRR), and the process CRR.

The first word of a PAB contains a work element pointer. A work element is a parameter list for the process. A request parameter list (RPL) and a logical channel program block (LCPB) are examples of work elements. The high-order bit (byte 0, X'80') of this first word is a gate bit which indicates that a work element has been queued to the PAB. The gate bit serves as a serialization mechanism; as more work elements are queued to the PAB, the gate bit prevents rescheduling of the PAB until it can handle the work. The gate bit is needed to prevent double scheduling of the PAB, because for many VTAM processes the process scheduling service (PSS) dequeues the work element before it gives the process control. The TPQUE macro is always used to queue work elements. This in-line macro checks the gate bit to determine if scheduling is required and, if so, executes an inner macro, TPSCHED.

The second word of the PAB is the PAB chain field. As a general convention, PSS and its macros (for example, TPQUE) use the second word of any control block as a chain field. The end of the chain is indicated by X'80000000' in the chain field. The PAB chain field is used to chain the PAB to some queue, for instance, a dispatching queue.
The chain field's high-order bit is a gate bit. The gate bit indicates that the PAB has been scheduled for dispatching. Following are the three ways to schedule a PAB for dispatch:
• While running under a VTAM process, queue the PAB to the PABQ in the request parameter header (RPH). The PAB will be dispatched when the current process completes.
• If not running as a VTAM process, queue the PAB to the process scheduling table (PST) for task-related work, or to the memory process scheduling table (MPST) for address-space related or cross-address-space related work.
• DIRECT scheduling causes an SRB, with the PAB address as a parameter, to be scheduled to a special PSS entry point. TPIOS uses this method to initiate inbound processing from the DIE.

Note that if the PAB chain gate is off while the work element gate is on, the PAB is probably suspended. A TPSCHED macro is required to reactivate the process.

The third word in the PAB contains the PAB offset and the destination vector table (DVT) pointer.

The PAB offset is used to locate the beginning of the major control block. It is necessary to locate the beginning of the major control block because there is a PSS convention that uses the third word of the major control block as a pointer to the process scheduling table (PST). The fourth word of the PST points to the memory process scheduling table (MPST), which is related to a particular address space. The PAB offset then provides a means to identify task and address space relationships for a given PAB. As a rule, the PST is used to schedule processes to run under the IRB of a particular task, while the MPST is used for scheduling a local SRB into the address space.

The PAB DVT pointer points to the beginning of a module list, that is, a list of addresses that are entry points to the modules to be given control during the process.
Because the DVT defines a whole process that is to be executed, many PABs will have DVT pointers to the same DVT. The next entry in the DVT to be given control is kept in the request parameter header (RPH); the RPH is updated each time the TPESC macro is used to pass control to the next module in the DVT.

The fourth and last word of the PAB contains a byte of flags and a pointer to the request parameter header (RPH). The flag byte contains scheduling indicators for PSS and a bit to indicate whether or not PSS should dequeue the work element for the process. The RPH pointer is set by PSS when the PAB is to be dispatched and is reset to zero when the process completes.

Register 1 always points to the RPH when the process is given control. All of the information relating to the process is stored in the RPH. This information includes such items as pointers to the work element (RPHWEA), the PST (RPHTSKID), a resume address (RPHRESMA) and a register save area (RPHWORK) if the process is suspended, and back pointers to the PAB (RPHMAJCB). The RPH resides within the component recovery area (CRA).

VTAM Function Management Control Block (FMCB)

The function management control block (FMCB) is the primary control block used in controlling I/O processing between an application program and a destination node (terminal, component, logical unit (LU), etc.). This block usually contains the most information when a problem develops in the I/O processing to a particular node.

The FMCB is created at OPNDST time; at least one FMCB exists for each open connection. All FMCBs for an application are chained together (at offset X'4') out of the application's ACDEB (at offset X'40'). In addition to the application FMCBs, VTAM maintains FMCBs for such things as dial-in lines and cluster control units. For logical units, there is also an SSCP FMCB chained from VTAM's ACDEB that is used for network control.
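The four-word PAB layout described under "Processing Work Through VTAM" can be sketched as a small decoder for a PAB copied out of a dump. This Python example is illustrative only and is not VTAM code. The function and key names are hypothetical. It also assumes that addresses occupy the low-order three bytes of each word, and that the PAB offset (word 3) and the flag byte (word 4) are the high-order byte of their words; the text states only which word each item is in.

```python
import struct

# Illustrative sketch only; not VTAM code. Names are hypothetical.
ADDR_MASK = 0x00FFFFFF      # assumption: 24-bit address in the low 3 bytes
GATE_BIT = 0x80000000       # high-order bit (byte 0, X'80') of a word
END_OF_CHAIN = 0x80000000   # end-of-chain value for a PSS chain field

def decode_pab(pab):
    """Split a 16-byte PAB image into the fields described in the text."""
    word1, word2, word3, word4 = struct.unpack(">4I", pab[:16])
    work_gate = bool(word1 & GATE_BIT)        # a work element is queued
    scheduled_gate = bool(word2 & GATE_BIT)   # PAB scheduled for dispatch
    return {
        "work_element": word1 & ADDR_MASK,
        "work_gate": work_gate,
        "chain": word2 & ADDR_MASK,
        "scheduled_gate": scheduled_gate,
        "end_of_chain": word2 == END_OF_CHAIN,
        "pab_offset": word3 >> 24,            # assumed byte position
        "dvt": word3 & ADDR_MASK,
        "flags": word4 >> 24,                 # assumed byte position
        "rph": word4 & ADDR_MASK,             # zero when no process is active
        # Work queued but PAB not scheduled: the process is probably
        # suspended and needs a TPSCHED to be reactivated.
        "probably_suspended": work_gate and not scheduled_gate,
    }

# Hypothetical PAB image: work element queued, chain gate off.
sample = struct.pack(">4I", 0x80123456, 0x00000000, 0x40ABCDE0, 0x00000000)
info = decode_pab(sample)
print(hex(info["work_element"]), info["probably_suspended"])
```

For the sample image, the work element gate is on and the chain gate is off, so the decoder reports the suspended-process condition noted in the text.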
The FMCB contains the PABs that control processing through control layer (inbound/outbound) and TPIOS (outbound). Although there is a PAB in the FMCB for TPIOS inbound, the PAB in the DNCB is normally used to control it. In addition to the PABs, the FMCB contains many flags and indicators and some queue headers. These flags and headers are described in the OS/VS2 Data Areas (microfiche).

The wait queues at offsets X'110' and X'11?' of the FMCB are important in debugging. These fields are used to queue logical channel program blocks (LCPBs) that have had channel programs built and queued to be shipped out to the 370x or local 3270. The LCPBs are dequeued from the wait queues when the requested operation completes. Expect to see read-type operations queued to the wait queue because these operations do not complete until data is entered and received from the associated terminal. However, if write or control type operations are not completing, investigate the situation further.

VTAM Operating Characteristics

The following topics describe characteristics of VTAM's operating environment.

Module Naming Convention

Each VTAM module name indicates the type of processing that it performs. Following are the major VTAM module naming groups and the processes associated with them:

Module Group   Process
ISTAICxx       Application program interface
ISTAPCxx       Process scheduling services (PSS - VTAM's dispatching mechanism)
ISTDCCxx       Basic and common
ISTRCCxx       Record format control layer
ISTZxxxx       TPIOS
ISTORFxx       Storage management
ISTOCCxx       OPNDST, CLSDST, OPEN, CLOSE
ISTINxxx       SSCP (VARY, DISPLAY)
ISTRAMxx       Task termination and address space termination resource manager
ISTSDCxx       SYSDEF control layer

Address Space Usage

VTAM's modules reside in the nucleus, the VTAM address space private area, and the LPA. The nucleus contains the attention handling routine and type 1 SVC routine.
The private area contains modules for initialization, termination, initial command processing, SYSDEF, and NETSOL. The LPA contains all of the other VTAM modules. Most of the VTAM control blocks are located in the CSA. The data buffers as well as the majority of the control blocks occupy the 11 buffer pools that are allocated at VTAM initialization within the CSA.

Locking

Because VTAM uses a number of SRBs and TCBs, it is important to serialize VTAM's internal use of shared resources (that is, to prevent simultaneous update of a control block by two different processes). The VTAM locking structure accomplishes this serialization. The VTAM locking structure is an internal VTAM function not visible to the user or MVS. VTAM's locking structure is totally independent of the MVS locking structure. In storage, the locks exist as fullwords in various areas of the VTAM control blocks. Following is the organization of the lockword:

Lockword (one fullword)
  1 byte        - Count of lock holders of this lock.
  22 bits       - Chain of RPHs (request parameter headers) when process(es) is waiting for this lock.
  Low-order bit - On indicates lock is held exclusively.

Each lock is defined as being of a certain lock level. This allows a lock to be maintained according to a predetermined hierarchy. These lock levels are usually not of significance to the debugger except that he needs to know that they exist so he can interpret the lock level bits in the component recovery area (CRA) as described below.

Locks are obtained and released by using VTAM internal macro instructions. The access to the locks is controlled as follows:
• A shared request for a lock that is free or held as shared (with no outstanding exclusive requests) is honored immediately.
• An exclusive request for a lock that is held as shared is queued until all current shared requests are released.
• Any request for a lock that is held as exclusive, or has an exclusive request outstanding, is queued until the exclusive use is complete.

The locks held by a process are indicated by the lock level bits in the CRA (at offset X'8'). The pointers to the various locks are located at X'C' through X'30' of the CRA. The pointers to the locks are filled in when a lock request is made; therefore, only the locks currently held have valid pointers. Locks are held only for the duration of a VTAM process; all locks must have been released when a process exits.

Example of a locking situation:

  CRA+X'8'    Lock level:    6 5 4 3 2 1
              Bit setting:   0 0 1 0 0 0    (the bit that is on indicates a level 4 lock is held)
  CRA+X'18'   Address of the level 4 lock

If the CRA lock accounting word appeared as above, it would mean that a level 4 lock was held by the process currently active. Offset X'18' (level 4 lock pointer) of the CRA contains the pointer to the lock in question. Refer to OS/VS2 Data Areas (microfiche) for details.

RPHs that are waiting for a lock will be queued onto that lock. Multiple RPHs waiting for the same lock will be chained together. This relationship is shown in Figure 5-33.

Summary of VTAM Locking: The main concern of the debugger regarding locks is that a process can be forced to wait because it cannot obtain a lock. The lock is unavailable because it is held by some other process. This situation is reflected by an active CRA with a resume address that points to code that follows a lock request. Lay out the CRAs, etc., as shown in Figure 5-33 and investigate those processes that are waiting, determine what lock is being requested, locate the current lock holder, and determine why the lock has not been released.

VTAM Recovery/Termination

VTAM recovery/termination is accomplished by means of STAE, ESTAE, FRR, and resource manager routines.
The exact recovery action attempted by VTAM depends on conditions at the time of the errors. However, for debugging purposes, the basic functions of VTAM are the following:
• To record the SDWA in SYS1.LOGREC.
• To take an SDUMP.
• To terminate the application program or VTAM. (Note, if the error occurs in the VTAM address space, VTAM generally attempts to simulate a normal shutdown.)

Figure 5-33. Several RPHs Waiting for the Same Lock
(Diagram: five CRAs in LPBUF, each showing the address of a lock, its RPH, and the resume address at RPH + X'10', together with two lockwords located elsewhere in storage.)
  Non-waiting RPH 1 (CRA 1) holds the lock that RPH 3 (CRA 3) and RPH 4 (CRA 4) are waiting for.
  Non-waiting RPH 2 (CRA 2) holds a lock no RPHs are waiting for.
  Waiting RPH 3 (CRA 3) holds a lock that no RPH is waiting for.
  Waiting RPH 4 (CRA 4) holds a lock that RPH 5 (CRA 5) is waiting for.

The termination of VTAM or VTAM applications causes the VTAM resource managers to get control. The resource manager routines clean up the VTAM resources allocated to the terminating task or address space.

VTAM recovery/termination functions affect debugging in the following ways:
1. A dump and SDWA in SYS1.LOGREC are provided for the error condition. If the error was in a VTAM application address space, VTAM and other VTAM applications continue to run. This allows you to debug certain problems without having a major impact on the installation's operation.
2. Subsequent errors can occur in termination, in VTAM, or in other VTAM applications as a result of the original error that was undetected. If this is the case, a dump can be very difficult to understand because cleanup was attempted or performed on behalf of the original error.
Always be aware of any problems that have occurred prior to the particular problem being diagnosed. To detect these previous problems, inspect the in-storage LOGREC buffer and SYS1.LOGREC.

In addition to the major recovery action described above, there are other recovery actions:
• A failure during command interpretation (but not during command execution) results in the loss of the current operator command, but continued availability of the operator control function.
• Failure during various SSCP functions can result in the immediate termination of VTAM without simulating a normal shutdown.
• Failure in the storage management services (SMS) modules, ISTORFBA and ISTORFBD, results in a failure of the storage request, but does not cause termination. The module requesting SMS service is informed of this action by return codes.
• Authorized path entry/exit errors are retried or the RPL is posted with an error indication.

VTAM Debugging

Because VTAM is a large component that interacts with other components and application programs, when you debug VTAM you must look at a number of factors besides the storage dump. Begin the debugging process by considering the operating environment and all the conditions that could have led to the suspected error.

Following are some items you should look at when attempting to solve VTAM problems:
• Console sheet
• GTF traces (especially for VTAM I/O activity)
• SYS1.LOGREC entries for program checks or MDR records
• NCP generation listing
• PTF level of the system

Waits

VTAM waits can happen to the following groups:
• Entire VTAM component and all VTAM applications
• One or more applications only
• VTAM network operator commands only (possibly only VARY)
• One or more terminals only

Also, a wait can occur when VTAM will not halt.
VTAM process scheduling services (PSS) routines control the flow of work through VTAM by performing an internal VTAM dispatching function. In debugging, the most important control block in determining the status of the dispatching activity and the wait states is the request parameter header (RPH), which is located within the component recovery area (CRA).

If a VTAM internal process is waiting for a VTAM lock, for storage to become available, or for another process to complete, this condition is reflected by an active CRA. CRAs are found in the LPBUF buffer pool. (See Figure "How to Locate the CRA" in OS/VS2 MVS VTAM Debugging Guide.) This pool is located as shown in Figure 5-34. The RPH is located X'34' bytes into the CRA; it can be recognized by the string X'016C' in the first halfword. Offset X'10' into the RPH is the RPHRESMA field; this field contains zeros if the RPH is not waiting or a resume address if it is waiting.

Once you find a waiting RPH, the best way to determine why it is waiting is to find the module at the address in the resume address field, and then look at the module listing. Unless the wait is for a lack of buffers (which can be resolved by increasing the number of buffers), further analysis is necessary to determine why the process is not being posted or why a lock is not being freed.

RPHs waiting for a VTAM lock are queued onto that lock. Multiple RPHs waiting for the same lock are chained together (as shown in Figure 5-33). If a process holds any locks, the lock level bits at offset X'8' in the CRA indicate the level of the lock(s) being held. Pointers to the various locks are located at offsets X'C' through X'30' of the CRA. Note that although all these pointers can contain addresses, only the pointers to the locks held or requested during the dispatching of the current process are valid.

A VTAM internal process can often be waiting for storage.
VTAM routines obtain and release buffers by using VTAM internal macro instructions. These macro instructions branch to the VTAM storage management modules that control the buffer pools allocated at VTAM start time. Because the number of buffer pools is specified at VTAM initialization and is constant, it is possible to encounter a shortage condition in unusual situations. See the section on tuning VTAM in the OS/VS2 System Programming Library: VTAM for information on specifying the proper storage pool values, the threshold value effect, and slowdown processing.

To determine if a buffer pool is in a slowdown state, do the following:
• Locate the buffer pool control blocks (BPCBs) as shown in Figure 5-34.
• Look at offset X'10' of each BPCB. If the X'40' bit is on, the buffer pool is in a slowdown state. (The BPCBs are located in contiguous storage and can thus be scanned quickly.)

If a dump has been taken because of a wait-type problem in VTAM, and the dump shows a buffer pool in slowdown, you can usually conclude that a buffer shortage has caused the wait problem. If you increase the number of buffers in the appropriate pool, this usually eliminates the problem. However, the problem should be investigated further to determine if a VTAM logic error has caused the buffers to be wasted and thus depleted.

VTAM routines that request buffers can choose to wait or not to wait if there are not enough buffers available to fulfill the buffer request. If a routine chooses to wait (and most do) when buffers are unavailable, the process is represented by an active CRA with a non-zero resume address. In addition to the slowdown bit being on in the BPCB, the RPH for the process is queued onto the BPCB. The queue headers in the BPCB are located at offsets X'18' and X'1C'; X'18' for queuing priority requests and X'1C' for queuing normal requests. Figure 5-35 represents the queuing of RPHs and Figure 5-34 shows how to find the BPCBs.
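The BPCB checks described above lend themselves to a small script when a dump is available as raw bytes. This is a sketch under stated assumptions, not a definitive tool: the function name `check_bpcb` is hypothetical, the slowdown flag is assumed to be the single byte at offset X'10', and each queue header is assumed to be a fullword that is zero when nothing is queued. The offsets X'10', X'18', and X'1C' and the X'40' bit are taken from the text.

```python
import struct

BPCB_FLAG_OFFSET = 0x10        # text: look at offset X'10' of each BPCB
SLOWDOWN_BIT = 0x40            # text: X'40' bit on means slowdown
PRIORITY_QUEUE_OFFSET = 0x18   # text: queue header for priority requests
NORMAL_QUEUE_OFFSET = 0x1C     # text: queue header for normal requests


def check_bpcb(bpcb: bytes) -> dict:
    """Summarize the wait-related fields of one BPCB image.

    Assumptions: the flag is one byte, and a non-zero queue header is
    the address of the first RPH queued for buffers.
    """
    flags = bpcb[BPCB_FLAG_OFFSET]
    (priority_head,) = struct.unpack_from(">I", bpcb, PRIORITY_QUEUE_OFFSET)
    (normal_head,) = struct.unpack_from(">I", bpcb, NORMAL_QUEUE_OFFSET)
    return {
        "slowdown": bool(flags & SLOWDOWN_BIT),
        "priority_queue_head": priority_head,
        "normal_queue_head": normal_head,
        "requests_waiting": bool(priority_head or normal_head),
    }


# Illustrative BPCB image: slowdown bit on, one RPH on the normal queue.
sample = bytearray(0x20)
sample[BPCB_FLAG_OFFSET] = 0x40
struct.pack_into(">I", sample, NORMAL_QUEUE_OFFSET, 0x00B27EC0)
print(check_bpcb(bytes(sample)))
```

Because the BPCBs are contiguous, the same function could be applied to each BPCB in turn once their addresses have been taken from the buffer pool entries.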
Any address in either of the queue headers of a BPCB indicates a buffer problem with that pool. The RPHs queued from the BPCB have a resume address that points to code following the buffer request. Examine the routine in question to determine if an error in the code has caused the buffer problem or if the condition exists because the buffer specification was too small.

(Figure 5-34 storage dump. The labeled areas of the dump are:)
  Address of ATCVT     Location 000400 contains 86578881 60FE3000 00B2EA40 00000000 00000000 00000000 00000000 00000000; the word at X'408' is 00B2EA40.
  Beginning of ATCVT   B2EA40.
  Address of BPDTY     The word at B2EAC0 (ATCVT + X'80') is 00B2C000.
  Beginning of BPDTY   B2C000.
  Beginning of BPEs    B2C090 (BPDTY + X'90').
  LPBUF BPENT          At B2C0D0: D3D7F0F0 400003A8 00B2C3F8 00000000.
  Beginning of BPCB    B2C3F8 (the LPBUF BPCB); the dump also labels the beginning address and ending address of the pool.

Notes: 1.
Locate the address of the ATCVT (VTAM communication vector table) by going to absolute storage location X'408'.
2. Go to the specified address to locate the ATCVT.
3. Locate the address of the BPDTY (buffer pool directory) by indexing into the ATCVT a value of X'80'.
4. Go to the specified address to locate the BPDTY.
5. Locate the BPEs (buffer pool entries) by indexing into the BPDTY a value of X'90'.
6. The BPEs contain the name of the pool in the first 4 bytes - the length of the pool element in the second 2 bytes of the second word - and the address of the BPCB (buffer pool control block) in the third 4 bytes. Each BPE is X'10' bytes long. (Note that the example shows the LPBUF.)

Figure 5-34. Sample Storage Pool Dump

Figure 5-35. Queuing of RPHs While Waiting for Storage
(Diagram: offset X'18' or X'1C' of a BPCB contains the address of an RPH within a CRA in LPBUF; the waiting RPHs are chained together, and each contains a resume address at offset X'10'.)

If a routine chooses not to wait when buffers are unavailable, return codes notify the routine of the lack of buffers. There are no specific flags that are always set to indicate that the request was rejected. Therefore, you cannot easily determine if a particular routine requested buffers but did not get them. However, you can tell that there is a buffer problem because this is usually indicated by the slowdown bit being on in one of the BPCBs.

Usually there is an active CRA for problems that are described as VTAM waits. However, for some problems (for example, one or more terminals waiting), a dump might be obtained that has no active CRAs. The best place to start with a problem such as this one is to locate the FMCB/DNCB for the terminal(s) in question.
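The buffer pool entry (BPE) layout given in note 6 of Figure 5-34 can be decoded with a few lines of code. This is an illustrative sketch: the function names are hypothetical, the pool name is assumed to be EBCDIC (decoded here with Python's cp037 codec), the BPCB address is assumed to occupy the low-order 24 bits of its word, and the sample bytes are patterned on the LPBUF entry in the figure rather than guaranteed to match it.

```python
import struct

BPE_LENGTH = 0x10  # note 6: each BPE is X'10' bytes long


def parse_bpe(entry: bytes) -> dict:
    """Decode one buffer pool entry.

    Layout from note 6: pool name in the first 4 bytes, element length
    in the second 2 bytes of the second word, BPCB address in the third
    4 bytes. Assumption: the name is EBCDIC text.
    """
    name = entry[0:4].decode("cp037")
    (element_length,) = struct.unpack_from(">H", entry, 6)
    (bpcb_word,) = struct.unpack_from(">I", entry, 8)
    return {
        "name": name,
        "element_length": element_length,
        "bpcb": bpcb_word & 0x00FFFFFF,  # assumption: 24-bit address
    }


def parse_bpes(table: bytes) -> list:
    """Decode every complete X'10'-byte entry in a run of BPEs."""
    usable = len(table) - len(table) % BPE_LENGTH
    return [parse_bpe(table[i:i + BPE_LENGTH])
            for i in range(0, usable, BPE_LENGTH)]


# Illustrative entry patterned on the LPBUF BPE shown in the figure.
sample = bytes.fromhex("D3D7F0F0400003A800B2C3F800000000")
entry = parse_bpe(sample)
print(entry["name"], hex(entry["element_length"]), hex(entry["bpcb"]))
```

The BPCB address returned for each entry is the starting point for the slowdown and queue-header checks on that pool.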
The FMCB/DNCB control blocks contain the following:
• various flags
• PABs to control the inbound and outbound request processing
• queues of outstanding requests

Investigate further any work elements found unprocessed on the PABs or queues of these control blocks.

To find the FMCB/DNCB for a particular node name, look at the following:

  LOC X'408'      → ATCVT
  ATCVT + X'C'    → QAB
  QAB + X'8'      → first RDT segment
  RDT + X'4C'     → next RDT segment

RDTs are segmented tables and each segment contains information about a major node as defined in SYS1.VTAMLST. Each segment contains entries for the groups, lines, clusters, terminals, components, etc., for the major node, as shown below:

  Segment
    Entry: NCP name
    Entry: subelement name
    Entry: subelement name

The node name is in the first eight bytes of each RDT entry. Stop chaining through the RDT entries when the major node name (that is, NCP or LBUILD) is found; then scan down the RDT entries of this major node until the node in question is found. Offset X'28' of the appropriate RDT entry points to the DNCB (offset X'28' from beginning of name). DNCB + X'10' points to the FMCB.

Another way to find the FMCB/DNCB for a process that is waiting (but has an active CRA and an RPH with a non-zero resume address) is to look at the module that is waiting and determine its register usage. Find the registers that were saved in the RPH + X'28'. See the module for the order in which the registers were saved.

Program Checks

The information generally available on VTAM program checks is in the SDWA and the SDUMP that the ESTAEs or FRRs provide. This information is used in the normal manner to determine the cause of the program check. However, there can be cases where more exact or timely information is required. This additional information might have to be obtained through the use of traces or traps.
Traps at the entry point of the FRRs or ESTAEs (ISTAPC61 and ISTAPC62, in particular) are often useful.

Miscellaneous Hints On VTAM

1. VTAM waits can occur because of buffer depletion. Such a situation usually occurs just after VTAM is installed and before the actual buffer requirements are determined. Running GTF (with the USR trace option) at this time can be helpful because VTAM creates an SMS trace record whenever a storage request is queued. Because VARY ACTIVATE/INACTIVATE of an NCP puts the heaviest stress on the buffer pools, start GTF before an NCP is activated.
2. VTAM places warmstart copies of major nodes into the data set SYS1.VTAMOBJ the first time that a node is activated. These warmstart copies are used for subsequent activations of the node. If a node definition is changed in SYS1.VTAMLST, be sure to scratch the corresponding member in SYS1.VTAMOBJ to ensure that the new definition is used by VTAM. Also scratch the members of SYS1.VTAMOBJ after PTFs have been installed because some of the bit definitions might have changed.
3. Most VTAM control blocks are in the 11 VTAM buffer pools. By simply scanning the buffer pools and looking for unusual conditions you can often uncover many of the problems. Each buffer in the buffer pools is preceded by a two-word buffer header. The high-order bit of the first word indicates whether the buffer is allocated or available: on = allocated, off = available. The address portion of the first word points to the module that last released the buffer. The second word contains a pointer to the buffer pool control block.

VSAM

The virtual storage access method (VSAM) consists of three major subcomponents:
• Record management
• Open/close/end-of-volume
• I/O manager

Record Management

Record management processing produces no messages.
Problem determination normally begins with an examination of the request parameter list (RPL). If a physical error occurs and the user has provided a large enough message area (pointed to by RPLERMSA), VSAM (IDA019R5) builds a SYNADAF-type record in that area for the user to examine. For both logical and physical errors, VSAM sets return codes in the RPL.

RPL

Three fields in the RPL are used to indicate an error:

1. RPLERREG - (RPL + X'D') a one-byte value which is also returned in register 15 after a request:
     0   request completed normally
     8   a logical error occurred
     12  a physical error occurred.
2. RPLCMPON - (RPL + X'E') a one-byte value that indicates which component was being processed at the time of the error if the request involved alternate indexes. This value also indicates whether upgrading was valid or was incorrect because of the error.
     CODE   COMPONENT         STATUS OF UPGRADE
     0      base cluster      valid
     1      base cluster      might be incorrect
     2      alternate index   valid
     3      alternate index   might be incorrect
     4      upgrade set       valid
     5      upgrade set       might be incorrect
3. RPLERRCD - (RPL + X'F') a one-byte value describing the error (see the Diagnostic Aids section of OS/VS2 VSAM Logic).

Other important fields in the RPL are:

  RPLREQ   - (+X'02') request type
  RPLPLHPT - (+X'04') pointer to the PLH
  RPLECB   - (+X'08') ECB or pointer to the ECB
  RPLDACB  - (+X'18') pointer to the ACB
  RPLAREA  - (+X'20') pointer to the user's record area
  RPLARG   - (+X'24') pointer to the user's search argument
  RPLOPTCD - (+X'28') two bytes of option flags
  RPLDDDD  - (+X'40') last successful request's RBA value (returned to user by VSAM).

PLH

Once the information in the RPL has been evaluated, the next block to examine is the placeholder (PLH). The PLH contains current information about the request, including positioning and pointers to associated control blocks such as buffer control blocks (BUFCs) and the I/O management block (IOMB).
The following fields are important for understanding the request:

  PLHFLG1  - (+X'02') status flags
  PLHFLG2  - (+X'03') status flags
  PLHEFLGS - (+X'04') two bytes of exception flags
  PLHFLG3  - (+X'06') status flags
  PLHAFLG3 - (+X'07') status flags
  PLHCRPL  - (+X'14') pointer to the current RPL
  PLHDBUFC - (+X'34') pointer to the current data BUFC
  PLHDIOB  - (+X'4C') pointer to IOMB
  PLHRETO  - (+X'74') halfword offset into register 14 pushdown save area. If the halfword at +X'76' is zero, PLHRETO is an offset from +X'78' into a 14-word save area and points to the next available word. If the halfword at +X'76' is not zero, then it is the offset from +X'78' to the beginning of a 20-word save area at the end of the PLH, and PLHRETO is an offset from +X'78' into that save area.
  PLHIBUFC - (+X'BC') pointer to the current index BUFC
  PLHIXSPL - (+X'C8') 32-byte index search parameter list (IXSPL) containing information about the results of the last index search.
  PLHKEYPT - (+X'F8') pointer to the current key value or relative record number.

BUFC

The buffer control block (BUFC) contains function codes, status indicators, and relative byte address (RBA) values describing the associated buffer.

  BUFFLG1  - (+X'01') BUFC status flags
  BUFCIOFL - (+X'02') I/O status flags
  BUFCDDDD - (+X'08') RBA for input if BUFCVAL is on
  BUFCORBA - (+X'0C') RBA for output if BUFCMW is on
  BUFCBAD  - (+X'14') pointer to associated buffer

During record management processing, register usage is as follows:

  R1  RPL pointer
  R2  PLH pointer
  R3  pointer to the access method block (AMB) of the component being processed
  R4  BUFC pointer

Use the register 14 save area in the PLH to find the path taken by a request through record management.
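The three RPL error fields described earlier in this topic (RPLERREG, RPLCMPON, and RPLERRCD at RPL + X'D', X'E', and X'F') can be pulled out of a dumped RPL as follows. This is a sketch for illustration: the function name `decode_rpl_feedback` is hypothetical, and the sample RPL image is fabricated for the demonstration. The offsets and the code meanings are taken from the text.

```python
RPLERREG_OFFSET = 0x0D  # return code, also returned in register 15
RPLCMPON_OFFSET = 0x0E  # component being processed / upgrade status
RPLERRCD_OFFSET = 0x0F  # error code (see OS/VS2 VSAM Logic)

RETURN_CODES = {
    0: "request completed normally",
    8: "logical error",
    12: "physical error",
}

# code -> (component, status of upgrade), as tabulated in the text
COMPONENTS = {
    0: ("base cluster", "valid"),
    1: ("base cluster", "might be incorrect"),
    2: ("alternate index", "valid"),
    3: ("alternate index", "might be incorrect"),
    4: ("upgrade set", "valid"),
    5: ("upgrade set", "might be incorrect"),
}


def decode_rpl_feedback(rpl: bytes) -> dict:
    """Interpret the three error bytes of an RPL image.

    Values not listed in the text are reported as "unknown" rather
    than guessed.
    """
    component, upgrade = COMPONENTS.get(rpl[RPLCMPON_OFFSET],
                                        ("unknown", "unknown"))
    return {
        "result": RETURN_CODES.get(rpl[RPLERREG_OFFSET], "unknown"),
        "component": component,
        "upgrade_status": upgrade,
        "error_code": rpl[RPLERRCD_OFFSET],
    }


# Fabricated RPL image: logical error X'10' on an alternate index.
rpl_image = bytearray(0x44)
rpl_image[RPLERREG_OFFSET] = 8
rpl_image[RPLCMPON_OFFSET] = 3
rpl_image[RPLERRCD_OFFSET] = 0x10
print(decode_rpl_feedback(bytes(rpl_image)))
```

The numeric error code is returned undecoded so that it can be looked up against the record management error code table and the module cross reference.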
Record Management Debugging Aids

It is not always desirable to cause program checks as a method of getting dumps, because some applications have sophisticated error recovery routines that can possibly change the environment. It is preferable to get documentation of the error before such routines get control, and then allow these routines to do their cleanup function after the dump is taken.

The following code is an example of a console-activated communications vector table (CVT) trap for record management errors that causes the failing application to loop, allowing a console dump to be taken. Following the dump the trap can be deactivated, allowing the application to continue processing. The code can be inserted into CSECT IDA019R1 at label 'POSTRPL', label 'POSTRPL2', and the patch area at the end of the module.

  NAME  IDA019L1 IDA019R1
  VER   POSTRPL    950C,100D
  VER   POSTRPL2   1851,9101,1028
  VER   PATCH      0000,0000          X'54' bytes of patch area
  REP   POSTRPL    45E0,Bxxx          to PATCH1
  REP   POSTRPL2   1851,45E0,Bxxx     to PATCH2
  REP   PATCH1     58F0,0010,         point to CVT
        LOOP1      9102,F108,         is trap activated?
                   4780,Bxxx,         no, go to EXIT1
                   D500,F10A,100D,    compare error type
                   4770,Bxxx,         no, go to EXIT1
                   D500,F10B,100F,    compare error code
                   4770,Bxxx,         no, go to EXIT1
                   47F0,Bxxx,         yes, go to LOOP1. Loop until trap bit in CVT is turned off.
        EXIT1      950C,100D,         restore instruction
                   07FE,              branch back inline
        PATCH2     58F0,0010,         point to CVT
        LOOP2      9102,F108,         is trap activated?
                   4780,Bxxx,         no, go to EXIT2
                   D500,F10A,100D,    compare error type
                   4770,Bxxx,         no, go to EXIT2
                   D500,F10B,100F,    compare error code
                   4770,Bxxx,         no, go to EXIT2
                   47F0,Bxxx,         yes, go to LOOP2. Loop until trap bit in CVT is turned off.
        EXIT2      9101,1028,         restore instruction
                   07FE               branch back inline

To activate the trap, set CVT + X'10A'-X'10B' to logical error (X'08xx') where xx is the error code (RPLERRCD), or to physical error (X'0C00').
Then 'OR' on bit 6 (X'02') in CVT + X'108', taking care to leave the other bits in that byte undisturbed. After the loop occurs and a console dump of the failing address space has been taken, turn off bit 6 in CVT + X'108' to deactivate the trap and allow the application to continue processing. Be sure that the dump taken includes the region, SQA, and CSA.

Note that when using the trap for physical errors the RPLERRCD is X'00' at the point of the trap because VSAM has not yet gone to IDA019R5. Physical errors caused by unit check (for example, incorrect length or no record found on a search ID) require that the I/O supervisor block (IOSB) be examined. To get a dump with the IOSB still valid, a trap can be inserted into nucleus CSECT IDA121A4 (abnormal end appendage) at label 'PERMERR'. Since this is in the nucleus, the trap can be set from the console. (See I/O Manager Debugging.)

Record management error codes (RPLERRCD) are described in the Diagnostic Aids section of OS/VS2 VSAM Logic. It is useful to know which module sets each error and the name of each error, so that you can find where it is set in the module via the cross reference.
Logical

  Error Code (hex)   Symbolic Name   Module (IDA019xx)
  04                 RPLEODER        RD, RR, RY, R2, R4, R8
  08                 RPLDUP          RA, RQ, RX, R4
  0C                 RPLSEQCK        RA, RR, RX, R4
  10                 RPLNOREC        RA, RR, RY
  14                 RPLEXCL         RF, RY, R2, R8
  18                 RPLNOMNT        RW, RY, R2, R5
  1C                 RPLNOEXT        RE, RF, RM, R5, R8
  20                 RPLINRBA        RA, R8
  24                 RPLNOKR         RM
  28                 RPLNOVRT        RG, RU, RX
  2C                 RPLINBUF        RR, RT, RY, R4, R8
  40                 RPLNOPLH        RU, RX, R1
  44                 RPLINACC        RQ, R4, R8
  48                 RPLINKEY        R1, R8
  4C                 RPLINADR        R1, R8
  50                 RPLERSER        RL, RX, R8
  54                 RPLINLOC        RQ, R1, R4, R8
  58                 RPLNOPTR        RD, RR, R4, R8
  5C                 RPLINUPD        RQ, RX, R4, R8
  60                 RPLKEYCH        RL, RX
  64                 RPLDLCER        RL, RQ
  68                 RPLINVP         RA, RR, RY, RX, R1, R4, R8
  6C                 RPLINLEN        RL, RQ, RU, R4, R8
  70                 RPLKEYLC        R1
  74                 RPLINLRQ        RR, R4, R8
  78                 RPLINTCB        RP
  84                 RPLSRLOC        RT
  88                 RPLARSRK        RT
  8C                 RPLSRISG        R4
  90                 RPLNBRCD        RX
  94                 RPLNXPTR        RU
  98                 RPLNOBFR        RY
  C0                 RPLIRRNO        RQ, RR
  C4                 RPLRRADR        R1
  C8                 RPLPAACI        RX
  CC                 RPLPUTBK        RQ, R4
  D0                 RPLINVEQ        RP

Physical

  Error Code (hex)   Symbolic Name   Module (IDA019xx)
  04                 RPLRDERD        RS
  08                 RPLRDERI        RS
  0C                 RPLRDERS        RS
  10                 RPLWTERD        RS
  14                 RPLWTERI        RS
  18                 RPLWTERS        RS

Record management processing sometimes requires serialization of internal resources. When the needed resource can be acquired, processing proceeds normally. However, when another request has control of the resource the request is deferred. As each request completes, a scan is made for requests which have been deferred. If the resource has become available, the deferred request is restarted. While a request is deferred, PLHDRPND is set in the PLH and PLHDRRSC points to the resource byte to be tested for availability.

Open/Close/End-Of-Volume

O/C/EOV documents errors by means of error messages and access method control block (ACB) return codes. The codes returned in the ACB (ACBERFLG) are explained in the Diagnostic Aids section of OS/VS2 VSAM Logic, along with an indication of the modules that set each error. In the cross reference of the modules, these error codes have the symbolic name of OPERRddd, where ddd is the decimal error code.
The most significant problem determination feature of O/C/EOV, however, is its message facility. The following messages are issued:

  MSGIEC070I - END OF VOLUME
  MSGIEC161I - OPEN
  MSGIEC251I - CLOSE
  MSGIEC252I - CLOSE (TYPE=T)

The messages contain both problem codes (symbolic PPddd) and function codes (symbolic PDFddd). The problem codes that describe the error are explained with each message in VS2 System Messages. The function codes are described best in the Diagnostic Aids section of OS/VS2 VSAM Logic, along with the module that was performing the function at the time of the error.

O/C/EOV Debugging Aids

There is a built-in trap for O/C/EOV (see the Caution later in this topic). There are two bits involved. Bit 4 (X'08') at CVT + X'108' can be OR'd on (being careful to leave the other bits in that byte undisturbed) to cause an abend dump (U888) when the message is issued. Bit 6 (X'02') at CVT + X'10A' when turned on prevents the freeing of module work areas. When both these bits are on, the U888 dump produced contains the module work area for every module gone through in the open path. There is a discussion in the Diagnostic Aids section of OS/VS2 VSAM Logic on finding the work areas in the dump and a diagram showing how the work areas are chained together.

GTF trace is also available for debugging. If GTF is active for TRACE=USR at the time of the error, VSAM Open (IDA0192P) writes user records FFF and FF5 containing the VSAM control blocks at the time of the failure. The standard OPEN work area trace is also available by coding AMP='TRACE' on the DD statement.

The following ENQs are issued by O/C/EOV (major name, minor name, modules, and reason):

  SYSVSAM   NNNCCCCB (Note 1)   Modules IDA0192A, IDA0200T, IDA0231T, IDA0557A
  SYSVSAM   NNNCCCCI (Note 1)
  SYSVSAM   NNNCCCCO (Note 1)   Module IDA0192A. The 'O' ENQ is issued for each component of a data set being opened for output processing.
DEQ is issued when the data set is closed.

The 'B' or busy ENQ is used to serialize the modification of the control block chains by allowing only one of the functions (OPEN, CLOSE, TCLOSE, or END OF VOLUME) to process the data set. This resource is held for the life of the function.

The 'I' ENQ (module IDA0192A) is issued for each component of a data set being opened for input processing. DEQ is issued when the data set is closed.

Note: If the data set is opened for both input and output, both the 'I' and 'O' resources will be held for each component.

Note 1: In the minor name, NNN = the 3-byte CI number of the component's catalog record; CCCC = the 4-byte catalog ACB address.

When a VSAM (non-catalog) ACB is opened, data extent blocks (DEBs) are constructed and chained as follows:
• A DEB containing the data set ACB address at DEB + X'18' is chained on the DEB chain of the current TCB. This DEB is referred to as the 'dummy' DEB. Its purpose is to allow abend to close the VSAM data set if abnormal termination occurs.
• A DEB containing the component access method block (AMB) address at DEB + X'18' is chained on the DEB chain of the jobstep TCB for each component being opened. These are the 'real' DEBs and are the ones actually used by VSAM processing.

When an ACB is being opened for DSNAME or DDNAME sharing and the data set is already open, the ACB is just connected to the existing control block structure and only the 'dummy' DEB is built and chained on the current TCB.

Caution: When using the O/C/EOV trap be aware that:
• If the bit is turned on to prevent the freeing of work areas and the job causes many calls to O/C/EOV, the region size may have to be increased to prevent ABEND80A.
• JOBCATs and STEPCATs are opened under the initiator TCB. The work area core is owned by the initiator TCB.
If this core is not freed because the CVT debug bit is on, the initiator may get an ABEND20A when it issues FREEMAIN for subpool 247 at job termination.

I/O Manager

I/O management includes the following modules:

  IDA019R3   Problem state I/O driver; a CSECT of LPA load module IDA019L1
  IGC121     Supervisor state I/O driver (SIOD); a CSECT in the nucleus
  IDA121A2   Actual block processor (ABP); a CSECT in the nucleus
  IDA121A3   Channel end appendage; a CSECT in the nucleus
  IDA121A4   Abnormal end appendage; a CSECT in the nucleus

The drivers and the ABP translate requests for access to the contents of control intervals into requests for reading and writing physical records. They also build the channel program to be passed to IOS.

I/O Manager Debugging

The combination of the I/O management block (IOMB), the I/O supervisor block (IOSB), and the service request block (SRB) is used by I/O management to control the processing of a request. The PLH (PLHIOB) points to the IOMB. The IOMB points to the IOSB (IOMIOSB), which in turn points to the SRB (IOSSRB).

For debugging unit checks (for example: no record found, incorrect length, channel program check, channel protection check), the best place to trap for a dump is at label 'PERMERR' in nucleus CSECT IDA121A4.

Catalog Management

Catalog management manages system requests for references and updates to the master catalog. The following description of catalog management includes these topics:

• Major Registers and Control Blocks
• Module Structure
• VSAM Catalog Recovery Logic
• Debugging Hints

Major Registers and Control Blocks

This section describes the major catalog management registers and control blocks, shows how each can be located, and describes those control block fields and flags that have proven to be useful in debugging.
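Several of the control blocks described in this section carry an identifier field (CCAID X'ACCA', ACBID X'A0', CAXID X'CA', PCCACRO 'PCCB'), which helps confirm that a pointer followed in a dump really addresses the expected block. The following is a minimal sketch of such a check, not part of any IBM tooling. It assumes the dump storage is available as a byte string and that each identifier sits at offset 0 of its block; only the CCA is described here as having its identifier in the first word, so the other offsets, and the helper name classify_block, are assumptions made for illustration.

```python
"""Sketch: recognise catalog management control blocks by identifier."""

# Identifier values quoted in this section. Placing them at offset 0 of
# each block is an ASSUMPTION of this sketch (stated only for the CCA).
CCA_ID = bytes.fromhex("ACCA")        # CCAID
ACB_ID = bytes.fromhex("A0")          # ACBID
CAXWA_ID = bytes.fromhex("CA")        # CAXID
PCCB_ID = "PCCB".encode("cp037")      # PCCACRO, EBCDIC characters


def classify_block(storage: bytes, offset: int) -> str:
    """Return a best-guess block name for the bytes at offset, or 'unknown'."""
    chunk = storage[offset:offset + 4]
    # Test the longer identifiers first; the one-byte identifiers are
    # weak evidence and are only tried when nothing longer matches.
    if chunk.startswith(PCCB_ID):
        return "PCCB"
    if chunk.startswith(CCA_ID):
        return "CCA"
    if chunk.startswith(ACB_ID):
        return "ACB"
    if chunk.startswith(CAXWA_ID):
        return "CAXWA"
    return "unknown"
```

A one-byte identifier matches many unrelated storage areas, so a result of ACB or CAXWA is best treated as a plausibility check on a pointer already obtained from a chain field, not as a way to scan storage for blocks.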
How to Find Registers

Catalog management runs under control of an SVRB. The registers are saved across supervisor-assisted linkages and interruptions in the standard ways. Depending upon the nature of the problem, the registers can usually be found in one of the following areas:

• For abends, registers are stored in RTM's SVRB and SDWA.
• For program checks, registers are stored in RTM's SVRB, the SDWA, and the LCCA.
• For catalog-management-issued type 2, 3, and 4 SVCs, registers are stored in the successor SVRB.
• For waits, registers are stored in the TCB.

The registers stored in any of these areas will be the registers that existed when the code that was running under a catalog SVRB gave up control. These registers will either be the registers of one of the three catalog management routines or the registers of a routine that was branch-entered by catalog management. If register 11 points to the CCA (identifiable via X'ACCA' in the first word), the registers probably belong to IGG0CLA1; register 12 will be the base register for the CSECT last in control. Otherwise, if register 11 is a base register, the code that it references may be inspected to determine the routine in control. If the routine in control is one that was branch-entered by catalog management, then catalog management's registers may have been saved in a standard area pointed to by register 13.

Major Registers

  IGC0002F
    Register 11 - Base register
    Register 12 - Work area pointer

  IGG0CLA1
    Register 11 - CCA pointer
    Register 12 - Base register (current CSECT)
    Register 13 - Register save push down list pointer (see CCAREGS) or standard save area pointer

  IGG0CLCA
    Register 11 - Base register
    Register 12 - Work area pointer

Major Control Blocks

The control blocks described in this section (AMCBS, PCCB, ACB, CAXWA, CTGPL, and CCA) are those that are most useful from a debugging standpoint.
The AMCBS and PCCB are useful in locating the control block structures for open catalogs. The ACB and CAXWA relate to a particular catalog or catalog recovery area (CRA) data set. The CTGPL and CCA relate to a particular catalog request.

AMCBS

The AMCBS (access method control block structure) is essentially a VSAM vector table. It is constructed within the SQA during early NIP processing (IEAVNP11) and resides there throughout the life of the system. The AMCBS is found through CVT+X'100' (field CVTCBSP). Major fields in the AMCBS are:

  Field      Description
  CBSACB     Pointer to the master catalog's ACB.
  CBSCMP     Pointer to the IGG0CLA1 load module.
  CBSCAXCN   CAXWA chain pointer. The CAXWAs of all currently open VSAM catalogs are included in this chain. The master catalog's CAXWA is the last CAXWA in this chain.

PCCB

A PCCB (private catalog control block) connects a VSAM user catalog to a particular initiator or job step. A PCCB is constructed (in SWA) for each user catalog opened during the life of a job step. PCCBs are chained together to form an initiator or job-step-oriented PCCB chain. Generally, PCCBs are freed by step termination. A PCCB is not required for the master catalog.

PCCBs are located through the TCB: TCB+X'B4' (field TCBJSCB) points to the JSCB; JSCB+X'15C' (field JSCBACT) points to the active JSCB; the active JSCB+X'CC' (field JSCBPCC) points to the first PCCB. PCCBs are chained via PCCNEXTP. Major fields in a PCCB are:

  Field      Description
  PCCACRO    PCCB identifier ('PCCB').
  PCCNEXTP   Pointer to the next PCCB. This field is 0 if it is the last PCCB.
  PCCACBP    Pointer to the catalog's ACB.
  PCCDSNAM   Catalog's name.
  PCCTGCON   Catalog's alias name.

Major flags in a PCCB are:

  Flag       Description
  PCCSTEPC   The catalog was specified to the job step through the use of a JOBCAT or STEPCAT DD card.
  PCCACTIV   The catalog is allocated and active.
  PCOSCVOL   The catalog is an OS CVOL.

ACB

There is one ACB (access method control block) for each open VSAM catalog or CRA. The ACB is created by the routine that opens the data set. Catalog and CRA ACBs generally reside in the CSA. An ACB can be located in the following ways:

1. The master catalog's ACB can be located from the AMCBS (CBSACB).

2. A particular user catalog's ACB can be located either via the CAXWA chain or via the PCCB chain.

   To locate the ACB via the CAXWA chain, inspect the CAXCNAM field of each CAXWA in turn until the desired catalog name is found. The first CAXWA is pointed to by the AMCBS (CBSCAXCN). The CAXWAs are chained via CAXCHN. When the desired CAXWA is found, it points to the desired ACB (CAXACB).

   To locate the ACB via the PCCB chain, inspect the PCCDSNAM and PCCTGCON fields of each PCCB in turn until the desired catalog name or alias name is found. The first PCCB is pointed to by the job step's active JSCB (JSCBPCC). The PCCBs are chained via PCCNEXTP. When the desired PCCB is found, it points to the desired ACB via PCCACBP.

3. A particular CRA's ACB can be located as follows:
   a. Find the owning catalog's ACB (via steps 1 or 2).
   b. Find the owning catalog's CAXWA (pointed to by ACBUAPTR).
   c. Find the first CRA's ACB (pointed to by CAXCRACB).
   d. Find the first CRA's CAXWA (pointed to by the CRA ACB's ACBUAPTR field at ACB+X'40').
   e. Inspect the CAXVOLID field for the desired CRA volume serial number.
   f. If the desired CRA's ACB has not yet been found, then search the remaining CAXWAs in the CRA CAXWA chain. Inspect the CAXVOLID field of each remaining CRA CAXWA in turn until the desired CRA volume serial number is found. The remaining CRA CAXWAs are chained to the first CRA CAXWA (and to each other) via CAXCHN. When the desired CRA CAXWA is located, it points to the desired CRA ACB via CAXCRACB.

4. The ACB representing the VSAM catalog that is currently being processed by a particular catalog request can be located via the CCA (CCAACB).

5. The ACB representing the CRA that is currently being processed by a particular catalog request can be located via the CCA (CCARAACB).

Major fields in the ACB are:

  Field      Description
  ACBID      Control block identifier (X'A0').
  ACBAMBL    Pointer to the VSAM record management control block structure. This set of control blocks is built at OPEN time, resides in CSA, and consists of those control blocks required to support a KSDS (catalog) or an ESDS (CRA).
  ACBERFLG   Error code stored by OPEN or CLOSE when the operation is unsuccessful.
  ACBUAPTR   Pointer to the CAXWA.

Major flags in the ACB are:

  Flag       Description
  ACBCAT     ACB represents a catalog.
  ACBSCRA    ACB represents a CRA that has been opened for catalog management use.
  ACBVCRA    ACB represents a CRA that has been opened for use by an access method services (AMS) utility function.

CAXWA

There is one CAXWA (catalog ACB extended work area) for each open catalog or CRA. The CAXWA is created during the OPEN process (either before the OPEN or by the catalog OPEN routines). CAXWAs generally reside in the CSA. The CAXWA is pointed to by the ACB (field ACBUAPTR). See step 3 for locating the ACB under the heading "ACB" earlier in this chapter. Major fields in the CAXWA are:

  Field      Description
  CAXID      Control block identifier (X'CA').
  CAXCHN     Pointer to the next CAXWA in the CAXWA chain. This is 0 if it is the last CAXWA in the chain.
  CAXACT     Count of the number of job steps for which this catalog is currently open.
  CAXACB     Pointer to the catalog ACB.
  CAXVCB     Pointer to the catalog's or CRA's VCB.
  CAXRPL     Pointer to a pool of RPLs. This pool is obtained at OPEN time and resides in CSA. (Note: This field is not used in CRA CAXWAs. CRA RPLs are included within the owning catalog's RPL pool.)
  CAXCNAM    Catalog name (for catalog CAXWA only).
  CAXVOLID   CRA volume serial number (for CRA CAXWA only).
  CAXCRACB   For a catalog CAXWA: pointer to the first CRA ACB. For a CRA CAXWA: pointer to the CRA ACB.

Major flags in the CAXWA are:

  Flag       Description
  CAXBLD     The catalog or CRA is in the process of being created.
  CAXOPN     The catalog or CRA is being opened.
  CAXCLS     The catalog or CRA is being closed.
  CAXEOV     The catalog or CRA is being extended.
  CAXMCT     The CAXWA represents the master catalog.
  CAXF2DT    The catalog has been deleted.
  CAXF2NDD   Unable to OPEN or CLOSE - DDNAME not found.
  CAXF2NCR   Unable to OPEN or CLOSE - insufficient main storage.
  CAXF2IOE   Unable to OPEN or CLOSE - I/O error.
  CAXF2REC   The catalog is a recoverable catalog (catalog CAXWA only).

CTGPL

The CTGPL (catalog parameter list) is built by the routines that issue SVC 26 to represent the desired catalog management request. The storage area where this block resides varies and is controlled by the building routine. When a caller issues SVC 26, the caller's registers are saved in the SVRB under which catalog management operates. Register 1 of this SVRB's register save area points to the CTGPL. The CTGPL may also be located via the CCA (CCACPL).

Note: At times, catalog management processing uses CCACPL as a pointer to an internal CTGPL. Therefore, you should be careful when you use this pointer to locate the caller's CTGPL.

Major fields in the CTGPL are:

  Field      Description
  CTGOPT1, CTGOPT2, CTGOPT3, CTGOPT4, CTGOPTNS, CTGTYPE
             These fields contain the codes and flags that indicate the type of function requested.
  CTGENT     Pointer to the entry name or CI number (for types of requests other than DEFINE or ALTER).
  CTGFVT     Pointer to the field vector table (FVT) for DEFINE and ALTER requests.
  CTGCAT     Pointer to an area that indicates the specific catalog (if any) to be used in processing this request.
             The area may contain either the catalog name or a pointer to the catalog's ACB. If no specific catalog is indicated, CTGCAT will be 0.
  CTGWKA     Pointer to the work area. In general, catalog management stores the requested information into this area.
  CTGNOFLD   Number of FPL pointers in CTGFIELD.
  CTGFIELD   An array of 4-byte FPL pointers. The FPLs describe the data fields that the request is to process.

CCA

The CCA (catalog communications area) is the main VSAM catalog work area. It is built upon entry to the VSAM catalog processor and freed just before exit. The CCA resides in subpool 252 of the caller's address space. Register 11 points to the CCA. Major fields in the CCA are:

  Field      Description
  CCAID      Control block identifier (X'ACCA').
  CCAPROB    Error data - consists of a CSECT ID (2 bytes), reason code (1 byte), and error code (1 byte).
  CCATCB     Pointer to the caller's TCB.
  CCACPL     Pointer to the CTGPL.
  CCAACB     Pointer to the ACB of the catalog that is currently being processed.
  CCAURAB    Pointer to the record area block (RAB) of the record area currently in use.
  CCASRCH    Search argument for I/O requests.
  CCARxREC   Pointer to record area x. (There are six record areas, record area 0 through record area 5; x indicates the number of the record area in question.)
  CCARPLI    Pointer to the RPL that is currently assigned to this request.
  CCAEQDQ    An ENQ/DEQ parameter list that is used when VSAM catalog management issues the RESERVE macro.
  CCAMSSPL   A GETMAIN/FREEMAIN parameter list that the VSAM catalog processor uses for most GETMAIN/FREEMAINs.
  CCACMS     Pointer to the catalog management services work area (CMSWA); it is used only for DELETE, ALTER, DEFINE, and LISTCATALOG requests.
  CCAREGS    An array of small (12-byte) register save areas.
             When a VSAM catalog processor routine calls a lower level (nested) routine, the contents of registers 12-14 are saved in the next save area by the routine that is called. Registers 12 and 14 contain the calling routine's base address and return address, respectively. Register 13 is used to maintain position within the array. Each time register 13 is saved, it points to the preceding save area. During a lower level routine's processing, register 13 points to the current save area (that is, the area containing the caller's registers). When a lower level routine exits, registers 12-14 are restored, which causes register 13 to be automatically switched (the preceding save area becomes the current save area). Whenever VSAM catalog processor routines branch-enter external routines, they pass a standard 72-byte save area to the external routine. This is accomplished by increasing register 13 by 12 during the process of setting up the linking conventions for the branch and link. (The 72 bytes that follow the current save area are used as the standard save area.)
             Note: The register contents stored within this array can be used in debugging to identify predecessor routines and modules.
  CCARAACB   Pointer to the ACB of the CRA that is currently being processed, or zero.
  CCARARPL   Pointer to the RPL that is currently assigned to this request for CRA I/O use.

Major flags in the CCA are:

  Flag       Description
  CCAFLG1-4  Miscellaneous processing control flags.
  CCARPLX    I/O option flags:
               00.. ...0   PUT direct
               00.. ...1   PUT sequential
               01.. ....   ERASE
               1... ...0   GET direct
               1... ...1   GET key equal to or greater than
               ..0. ....   Use the record area pointed to indirectly by CCAURAB
               ..1. ....   Use record area 0
               ...0 ....   Addressed or CI operation
               ...1 ....   Keyed operation
               .... 0...   Update operation
               .... 1...   Non-update operation
               .... .0..   Check for errors
               .... .1..   Bypass error checking
               .... ..0.   505-byte low-key range record
               .... ..1.   47-byte high-key range record
  CCAFLG9    Miscellaneous CRA processing flags.
  CCARVFG1   Miscellaneous recovery (ESTAE) control flags.

Module Structure

Catalog management is packaged into three load modules:

1. IGC0002F - Catalog Controller
2. IGG0CLA1 - VSAM Catalog Processor
3. IGG0CLCA - CVOL Processor

This set of modules resides within SYS1.LPALIB and can be viewed as a type 4 SVC routine consisting of three load modules. Catalog management receives control via SVC 26 and operates under an SVRB. Control is passed between the three load modules via XCTL. Each load module establishes its own ESTAE routine. A brief description of each load module follows.

1. IGC0002F - Catalog Controller

The function of this module is to translate (map) interfaces. The module logically processes from a front end and a back end.

The front end receives control from the SVC SLIH whenever SVC 26 is issued. Register 1 points either to an OS CAMLIST or a VSAM CTGPL. If register 1 points to an OS CAMLIST, the OS request is translated into an appropriate VSAM request (a CTGPL is constructed). Control is then passed to IGG0CLA1.

The back end receives control (at EP IGG0102F) from IGG0CLA1 upon completion of a VSAM request for a VSAM catalog. It determines if the original request was an OS CAMLIST request and, if so, translates the CTGPL output and the IGG0CLA1 return code into appropriate CAMLIST format. It then returns control to the issuer of SVC 26. For a more detailed description of this module, see OS/VS2 Catalog Management Logic.

2. IGG0CLA1 - VSAM Catalog Processor

IGG0CLA1 is a large load module that consists of many CSECTs and procedures. Control is passed between the various procedures via CALLs.
This module relates a request to a specific catalog and also determines the catalog type. If the catalog is an OS CVOL, IGG0CLA1 passes control to the CVOL processor (IGG0CLCA). Otherwise, IGG0CLA1 accesses the VSAM catalog and performs the function indicated by the CTGPL. When the function is completed, IGG0CLA1 exits by passing control to the back end of IGC0002F. For a detailed description of VSAM catalog management, see OS/VS2 Catalog Management Logic.

3. IGG0CLCA - CVOL Processor

IGG0CLCA is a load module that consists of several CSECTs and procedures. Control is passed between the various procedures via CALLs. This module translates CTGPL requests into OS catalog requests and accesses OS CVOLs to perform the indicated function. Upon completion of processing, this module returns control to the issuer of SVC 26. For a detailed description of this module, see OS/VS2 CVOL Processor Logic.

VSAM Catalog Recovery Logic

This section describes how mainline VSAM catalog management supports recovery and also how its recovery routine works. Mainline VSAM catalog management does the following:

• Establishes/releases the recovery environment
• Maintains a pushdown list end mark
• Tracks GETMAIN/FREEMAIN activity
• Maintains a CMS (catalog management services) function gate

Establishing/Releasing a Recovery Environment

To establish or release a recovery environment, the following actions occur:

1. Subfunction BLDCCA in module IGG0CLC9 issues a branch entry to ESTAE to establish the recovery environment. This is done immediately after storage has been obtained for the CCA via GETMAIN.

2. When BLDCCA completes the initialization of the CCA, it sets RVCCAV to indicate that the CCA is now valid.

3. Subfunction IGGPRCLU (request cleanup) in module IGG0CLC9 performs the following:
   • Indicates that the CCA is no longer valid (RVCCAV = OFF)
   • Frees any GETMAIN/FREEMAIN tracking spill blocks that may exist
   • Branch enters ESTAE to remove the recovery environment

Maintaining a Pushdown List End Mark

A pushdown list end mark is maintained so that the ESTAE recovery routine can reliably locate the last pushdown list entry. This enables the recovery routine to determine:

1. The address at which the last call to a nested subfunction was issued.
2. The routine to which this call was directed.

There is an instruction in the exit procedure code contained within each CSECT to ensure that the first byte following the last active entry contains an end-of-list marker. (Note that X'00' and X'FF' are considered end-of-list markers.)

Tracking GETMAIN/FREEMAIN Activity

GETMAIN/FREEMAIN tracking provides the recovery routine with the information it needs to automatically issue FREEMAINs against those areas of main storage that have been acquired and not yet freed by VSAM catalog management. The GETMAIN/FREEMAIN tracking function is implemented as follows:

1. A 256-byte contiguous area is defined in the CCA. The area consists of:
   a. A 248-byte tracking buffer.
   b. A single entry GETMAIN/FREEMAIN length list (four bytes) with the high-order byte initialized to X'80' and the low-order three bytes defined as CCAMNLEN.
   c. The GETMAIN/FREEMAIN address word (CCAMNADR).

2. The ?GETMS and ?FREEMS macros generate code that does the following:
   a. Tracks the operation. This is accomplished by an MVC instruction that traces the GETMAIN/FREEMAIN length and address by pushing it (shifting it left) to the bottom (low address) of the 248-byte tracking area.
   b. Checks for a full tracking buffer. If the buffer is full, a spill routine (IGGPARFS) is called before the tracking MVC instruction is issued.
      This spill routine:
      (1) Issues GETMAIN to obtain a 256-byte spill buffer.
      (2) Chains this buffer to the end of the spill buffer chain. (Note: Chain anchor words are located in the CCA.)
      (3) Copies the CCA tracking buffer into the new spill buffer.
      (4) Clears the CCA tracking buffer.
   c. If the ?GETMS macro call is specified with CLASS(S) for storage (global), a flag (MNATSCLS) is set in the first byte of the two-word trace entry to indicate this.

Refer to the description of CCAMNCAT, a work area that is located at CCA+X'308', contained in OS/VS2 Catalog Management Logic.

CMS Function Gate

The CMS function gate assists the recovery routine in determining if DEFINE or DELETE backout action is required. This gate is represented by a bit (RVCMSFG) in field CCARVFG1. The bit is turned on by the CMS driver (IGGPCDVR in module IGG0CLAT) immediately after a successful return from the check authorization function. The bit is reset upon entry to the CMS cleanup function (IGGPCCLN in module IGG0CLAT).

Recovery Routine Functions

VSAM's catalog processor recovery routine is labelled IGGPCMRR (CSECT IGG0CLA9). This recovery routine is entered from MVS's recovery termination manager (RTM) whenever an error or interruption occurs either in VSAM catalog management or in any successor routine that VSAM catalog management can cause to receive control. A pointer to the STAE diagnostic work area (SDWA) is passed as input to IGGPCMRR.

IGGPCMRR performs the following functions. (Functions 2-13 are performed only when the CCA is marked valid, that is, RVCCAV = ON.)

1. Retrieves the CCA pointer from the SDWA and puts it into register 11.
2. Saves the RTM return address in CCAR14S.
3. Saves the SDWA pointer in CCASDWAP.
4. Produces diagnostic output.
5. Initializes register 13 to point to the first register save area.
6. Cleans up RPLs (if required).
7. Determines if backout is to be performed.
8. Checkpoints the CCR (if required).
9. Drops catalog orientation.
10. Frees storage (using GETMAIN/FREEMAIN tracking information).
11. Frees GETMAIN/FREEMAIN tracking spill blocks (if any exist).
12. Performs DEFINE/DELETE backout (if applicable).
13. Restores the RTM return address and the SDWA pointer.
14. Frees the CCA.
15. Returns to RTM indicating that RTM should continue with termination.

The following sections describe the more complex of these recovery routine functions in greater detail.

Diagnostic Output (Function 4)

Diagnostic output is produced except in those situations where the recovery routine is invoked only for cleanup-type functions, such as CANCEL. Diagnostic output can be produced in two forms:

1. Information is placed in a variable recording area (SDWAVRA) within the SDWA. This data is written to the SYS1.LOGREC data set as part of an entry describing the error. This variable data is formatted as follows:

   Byte     Length   Description of Data
   0(0)     8        VSAM catalog processor module name - 'IGG0CLA1'
   8(8)     3        IGG0CLA1's entry point address
   11(B)    8        Procedure name of the last-called routine
   19(13)   3        Address of the last-called routine
   22(16)   8        Procedure name of the routine that called the last-called routine
   30(1E)   3        Address of the CALL to the last-called routine
   33(21)   4        The characters 'CPL='
   37(25)   28       A copy of the user's CTGPL

2. An SDUMP is taken (if allowed by the system).

Backout (Function 7)

Backout is performed for DEFINE or DELETE requests (except for DEFINE or DELETE catalog requests) when the CMS function gate is active (RVCMSFG = ON). When backout is to be performed, a switch (RVESBOR) is set. The backout function (Function 12) is described later in this chapter.

Drop Catalog Orientation (Function 9)

This function uses the normal IGGPRPLF subfunction to perform the RPL freeup/DEQ functions.
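The SDWAVRA layout given under "Diagnostic Output (Function 4)" lends itself to mechanical decoding when a SYS1.LOGREC entry is being examined. The following is a minimal sketch rather than an IBM-supplied tool: it assumes the 65 bytes of variable data have already been extracted into a byte string, that the names are EBCDIC text (decoded here with the cp037 code page), and that the 3-byte addresses are big-endian. The class and function names are hypothetical.

```python
"""Sketch: decode the catalog recovery routine's SDWAVRA data."""
from dataclasses import dataclass

VRA_LENGTH = 65  # 8 + 3 + 8 + 3 + 8 + 3 + 4 + 28 bytes, per the layout table


@dataclass
class CatalogVra:
    module_name: str       # offset 0,  length 8
    entry_point: int       # offset 8,  length 3
    last_called_name: str  # offset 11, length 8
    last_called_addr: int  # offset 19, length 3
    caller_name: str       # offset 22, length 8
    call_addr: int         # offset 30, length 3
    cpl_marker: str        # offset 33, length 4, expected 'CPL='
    ctgpl: bytes           # offset 37, length 28, copy of the user's CTGPL


def _text(raw: bytes) -> str:
    # Assumption: names are EBCDIC characters, blank padded.
    return raw.decode("cp037").rstrip()


def _addr(raw: bytes) -> int:
    # 3-byte (24-bit) address, most significant byte first.
    return int.from_bytes(raw, "big")


def parse_catalog_vra(vra: bytes) -> CatalogVra:
    """Split the variable recording area into its documented fields."""
    if len(vra) < VRA_LENGTH:
        raise ValueError(f"need at least {VRA_LENGTH} bytes of VRA data")
    return CatalogVra(
        module_name=_text(vra[0:8]),
        entry_point=_addr(vra[8:11]),
        last_called_name=_text(vra[11:19]),
        last_called_addr=_addr(vra[19:22]),
        caller_name=_text(vra[22:30]),
        call_addr=_addr(vra[30:33]),
        cpl_marker=_text(vra[33:37]),
        ctgpl=vra[37:65],
    )
```

A caller might check that cpl_marker equals 'CPL=' before trusting the remaining fields, since a mismatch suggests the bytes were not taken from the start of the catalog data in the SDWAVRA.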
Storage Freeup (Function 10)

This function frees all the storage (with the exception of the CCA and any existing tracking spill blocks) that has been acquired and is still owned by the current VSAM catalog management request. Storage freeup is done as follows:

1. The GETMAIN/FREEMAIN tracking data is scanned starting at the first spill block (if any) and following the chain of spill blocks. When the last spill block has been processed, the scan continues with the first valid entry in the CCA tracking buffer. This first scan selects and eliminates paired entries; a paired entry consists of two entries with matching storage addresses, which indicate that the storage area in question has already been freed.

2. The tracking data is scanned again. During this second scan, each valid remaining entry is processed as follows:
   a. The length and address of the storage to be freed are extracted from the entry.
   b. The subpool is determined from a switch setting within the entry.
   c. A ?FREEMS macro is issued to free the main storage. This macro specifies "RFR (NO)" to prevent recursive tracking.

DEFINE/DELETE Backout (Function 12)

This function attempts to preserve catalog integrity by cleaning up partially-completed DEFINE or DELETE operations. It uses the normal DELETE function to accomplish this. The switch indicating that backout is required is tested. If this switch is on, the following actions are performed:

1. A backout work area is obtained.

2. A DELETE CTGPL is constructed in the backout work area. This CTGPL is set up to cause a DELETE of the object that was being defined (with DEFINE) or deleted (with DELETE) when the error occurred.

3. The CCA is rebuilt as follows:
   a. CCACPL, CCASZ, CCATCB, CCASDWAP, CCAR14S, and CCARVFG1 are saved (in the backout work area).
   b. The complete CCA is cleared.
   c. The previously-saved fields (with the exception of CCACPL) are restored.
   d. CCACPL is initialized to point to the CTGPL that was built in the backout work area.
   e. CCAID, CCAURAB, CCAR0REC through CCAR5REC, CCAEDXFF, CCAMNPTR, CCAMNLLP, CCAMNLL, and register 13 are reinitialized to their original values.
   f. CCAF2SYS is set on.
   g. RVESBO is set on to indicate that backout is in control.

4. The CMS driver (IGGPCDVR) is invoked, which then invokes the DELETE function; when the DELETE action is complete, control is returned to the recovery routine.

5. The CCR is checkpointed (if required).

6. Catalog orientation is dropped (via a call to IGGPRPLF).

7. CCACPL is restored.

8. The backout work area is freed.

9. Any spill blocks acquired during the backout process are freed.

Debugging Aids

The control block structures for the VSAM catalog reside in the CSA. There is a built-in communications vector table (CVT) debug word which allows you to get a console dump at the time of the failure. This word is located at CVT + X'108' and is examined by module IGG0CLC9 at the end of each catalog request. The contents of the CVT debug word are as follows:

Byte 0 (X'108')
  bits 0-3   must remain unchanged
  bit 4      not used by catalog
  bit 5=1    causes message IEC331I to be issued when the condition specified in byte 1 (X'109') is met. IEC331I contains the name of the catalog module which detected the error.
  bit 6      not used by catalog
  bit 7=1    prevents catalog FRR (IGG0CLA9) from freeing the catalog communications area (CCA) so that it is available in the dump.

Byte 1 (X'109')
  Condition for which the action specified at location X'10A-10B' is to be taken.
  X'01'             take action at end of every catalog request
  X'02'             take action for any non-zero catalog return code
  X'03'             take action for return codes other than those considered to be "normal". (The following are considered to be normal return codes - X'00, 0B, 24, 2B, 2C, 4C, BC' and reason codes X'2B, BC, and F0'.)
  X'04' to X'FF'    take action only when catalog return code equals value in this byte.

Bytes 2 and 3 (X'10A-10B')
  Action to be taken on above condition:
  X'07FE'   return immediately to inline catalog code and continue processing. This setting, in conjunction with bit 5 of byte 0, causes no action other than message IEC331I.
  X'07FF'   will cause a loop here at CVT + X'10A' to allow a console dump of the failing ASID. To break the job out of the loop, either cancel the job or set these bytes to X'07FE' to continue processing.

When message IEC331I appears by itself, use the above CVT trap to get a dump of the failure. When messages IEC331I, IEC332I, and IEC333I appear together, the error is the result of a call to record management. Message IEC333I contains the record management return code in the form Lxxx (for logical error) or Pxxx (for physical error), where xxx = decimal return code. In these cases use the CVT trap discussed earlier in the Record Management Debugging Aids section of VSAM component analysis.

In situations where an attempt to open a VSAM catalog results in message IEC161I 004-080, it is difficult to determine the exact nature of the problem because there are many conditions which can cause this error. The best place to trap for a dump is at label 'CAPERR' in modules IFG0191X and IFG0191Y. Register 14 at that point will be in the calling routine which detected the failure.

It is sometimes necessary to examine the records in the catalog as part of the problem analysis. The following is an example of the access method services job necessary for this.
  //PRINT    EXEC PGM=IDCAMS
  //STEPCAT  DD   DSN=catalogname,DISP=SHR
  //DD1      DD   DSN=catalogname,DISP=SHR
  //SYSPRINT DD   SYSOUT=A
  //SYSIN    DD   *
    PRINT INFILE(DD1)
  /*

The following ENQs are issued for catalog processing:

  Major Name   Minor Name       Modules                          Reason
  SYSIGGV1     MCATOPEN         IGG0CLAC, IGG0CLAD               Open master catalog
  SYSIGGV2     catalogname      IGG0CLA3 (Assign RPL processing) Catalog request; component recovery area (CRA) orientation
  SYSVTOC      volser           IGG0CLBU                         Read/Write format 4 DSCB
  SYSZCAXW     CAXW             IDACAT11, IDACAT12, IGG0CLBG     Open, close, or delete
  SYSZPCCB     PCCB             IGG0CLA3                         While building PCCB for catalog open
  SYSZTIOT     asid             IDACAT11, IDACAT12, IGG0CLAD     Open and close of catalog
  IEZIGGV3     addr of CAXWA    IGG0CLA3                         While CAXWA RPL count is being altered

Allocation/Unallocation

This section is divided into four parts. Part one provides a description of the six major functional areas of allocation/unallocation and the way in which they interrelate. Parts two, three, and four contain general debugging aids, debugging hints, and reason codes.

Functional Description

Figure 5-36 illustrates the control-flow discussion that is presented in the following paragraphs.

Figure 5-36. The Relationship of the Six Major Functions of Allocation/Unallocation

Allocation

The flow through allocation following either batch initialization or dynamic initialization is the same:

• Batch/dynamic initialization and control invokes JFCB housekeeping.
• Batch/dynamic initialization and control then invokes common allocation.
• Common allocation invokes volume mount and verify (if volume unloading or mounting is needed).
Unallocation

At batch/dynamic unallocation, the control flow is as follows:

• Batch/dynamic initialization and control invokes common unallocation.
• Common unallocation invokes volume mount and verify (if any volume unloading is needed).
• Batch initialization and control invokes volume mount and verify (if volume unloading is needed).

Batch Initialization and Control

Batch initialization and control uses the following control blocks:

• Job control table (JCT)
• Step control table (SCT)
• Linkage control table (LCT)
• Job step control block (JSCB)

The SCT is needed to locate the chain of step I/O tables (SIOTs) and job file control blocks (JFCBs) in the scheduler work area (SWA). A SIOT and its corresponding JFCB are constructed by the converter/interpreter for each DD statement in a job step's JCL. Allocation allocates one step at a time. The SIOTs and JFCBs for a step are read by batch initialization and control when initializing for the allocation or unallocation of a step.

At step initiation, space for the task I/O table (TIOT) is obtained, and the JSCB is initialized to point at the top of the chain of data set association blocks (DSABs), which are actually constructed by common allocation.

At job step allocation, the SIOTs and JFCBs are passed as the main input, first to JFCB housekeeping, and then to common allocation. At job step unallocation, the SIOTs and JFCBs are passed as the main input to common unallocation.

At the end of the job, batch initialization and control uses a volume unload table (VUT) to determine those private volumes that belong to the ending job and that are to be unloaded. Unloading is done by volume mount and verify (VM&V).

Dynamic Initialization and Control

When dynamic initialization and control is invoked, the job step's SIOTs and JFCBs must be read. This is done only for the first dynamic allocation during a given job step.
The caller's parameters are syntax- and validity-checked and used to build a SIOT and JFCB, just as for a DD statement. Existing allocations (represented by an existing DSAB and TIOT entry) are used where possible to satisfy the request. If the requested data set is already allocated, certain information is copied from the SIOT and JFCB of the existing allocation to those of the new allocation. By using the existing allocation, invocation of JFCB housekeeping and common allocation is avoided.

If an existing allocation cannot be used to satisfy the dynamic request, the SIOT and JFCB built by dynamic initialization and control are used, first as input to JFCB housekeeping, then to common allocation. After common allocation completes, the SIOT(s) representing the request is chained to the step's other SIOTs.

If dynamic unallocation is being requested, the parameters must be syntax- and validity-checked. The correct SIOT is located and passed to common unallocation.

JFCB Housekeeping

The major input to JFCB housekeeping is the SIOT chain, each SIOT having an associated JFCB. JFCB housekeeping completes needed information about either batch or dynamic allocation requests that was not placed in SIOTs and JFCBs by the converter/interpreter. Allocation parameters that JFCB housekeeping completes are the name, volume, unit, DCB, and disposition of the data set.

Before processing these parameters, JFCB housekeeping, using dynamic allocation, allocates to the initiator's task control block (TCB) any STEPCAT DD or JOBCAT DD statements. A private catalog control block (PCCB) is built for each such catalog allocated, and all SIOTs are processed, one at a time. This JOBCAT/STEPCAT processing takes place in a batch environment only.

Information for a request is placed in the JFCB housekeeping work area as a SIOT/JFCB pair is processed; the work area is reinitialized for each SIOT.
If volume information was not specified for an old data set, the passed data set information (PDI) is searched (only in a batch environment) in the SWA to locate volume and unit information. If it is not found, or if the data set name is a generation data group (GDG) single name, a catalog LOCATE is issued to obtain the volume and unit information. If a volume reference is specified in the SIOT, either the data set referenced is located in the PDI or via catalog LOCATE, or the SIOT/JFCB of the referenced DD statement is found. The source of volume and unit information is recorded in the JFCB housekeeping work area; the information is then retrieved and placed into the SIOT/JFCB being processed.

A DCB reference to a cataloged data set is resolved by LOCATE and OBTAIN. A DCB reference to a DD statement is resolved by going to the JFCB of the referenced DD statement and then issuing an OBTAIN. Finally, disposition-related information is entered into the SIOT/JFCB.

Common Allocation

Common allocation receives as input the SIOTs and JFCBs of allocation requests. For requests that do not require a unit to be allocated, namely DUMMY, VIO, and subsystem requests, DSAB and TIOT entries are built and the SIOT is marked "allocated."

For each request requiring units, a list of eligible devices, called the eligible device list (EDL), is constructed and pointed to by the requestor's SIOT. An entry is built in the volunit (VU) table to represent each volume/unit required. Inter-DD relationships are represented primarily by setting fields in the VU table for use by the remainder of common allocation.

The remainder of common allocation is divided into:

• Fixed Device Allocation
• TP Allocation
• Generic Allocation
• Recovery Allocation

Common allocation control invokes each of these functions in the order indicated.
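The ordering above can be pictured as a pipeline in which each function considers only the requests left unallocated by the functions before it. The following Python sketch is a conceptual model only: the Request class, the request kinds, and the eligibility predicates are illustrative assumptions and do not reflect the data layout or logic of the allocation modules.

```python
from dataclasses import dataclass

# Conceptual model only. The class, the "kind" values, and the predicates
# are hypothetical; they stand in for the real VU-entry processing.
@dataclass
class Request:
    ddname: str
    kind: str               # e.g. "fixed_dasd", "tp", "tape", "offline_only"
    allocated_by: str = ""  # name of the function that allocated it

def fixed_device(r):
    return r.kind == "fixed_dasd"

def tp(r):
    return r.kind == "tp"

def generic(r):
    return r.kind in ("tape", "demountable_dasd", "unit_record", "graphics")

def recovery(r):
    # Last resort: anything still unallocated is handled here.
    return True

# The four functions, in the order common allocation control invokes them.
PHASES = [
    ("fixed device", fixed_device),
    ("TP", tp),
    ("generic", generic),
    ("recovery", recovery),
]

def common_allocation(requests):
    """Run each function, in order, over the still-unallocated requests."""
    for name, can_allocate in PHASES:
        for r in requests:
            if not r.allocated_by and can_allocate(r):
                r.allocated_by = name
    return requests
```

In this model a request is never reconsidered once a function has allocated it, which mirrors the statement that later functions handle only what earlier ones left unallocated.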
If all requests have been allocated, any requests needing volumes mounted have volume mount and verify (VM&V) RBs chained to their SIOTs. These VM&V RBs are chained to each other and sent to VM&V as input. VM&V mounts the necessary volumes.

Fixed Device Allocation

Allocation for any request that can be allocated to a volume on a permanently resident or reserved DASD uses fixed device allocation. The allocation of a request (VU entry) involves:

• The selection of the device
• The building of the DSAB (pointed to by a SIOT)
• The building of a TIOT entry (pointed to by a DSAB)
• Setting indicators in the unit control block (UCB) of the selected device
• Issuing DADSM
• Demounting incorrect volumes (except in the case of fixed device allocation)
• Scheduling a mount, by building a VM&V request block (VM&V RB), if a volume must be mounted

TP Allocation

This is a small specialized operation for teleprocessing lines. TP lines, once allocated, remain allocated whether online or not, and cannot be reallocated.

Generic Allocation

Generic allocation attempts to allocate the remaining requests that were not allocated by previous processes. Requests for tapes, demountable direct access volumes, graphics devices, and unit record devices are not considered until generic allocation. A special set of tables, the generic allocation tables, is built to represent the units eligible for each request (VU entry). These tables are used throughout generic and recovery allocation.

Generic allocation processes requests not sequentially but on the basis of generic device type. The order in which generic device types are chosen is determined by a table, built at SYSGEN time, called the device preference table.

Recovery Allocation

Requests left unallocated by previous steps are allocated by recovery allocation.
The main functions of recovery allocation are to interface with the operator to request that offline devices be brought online, and, once they are online, to allocate these devices to unallocated VU entries.

Common Unallocation

The input to common unallocation is a chain of RBs, each of which points to a SIOT to be unallocated. Disposition processing uses the SIOT/JFCB and common unallocation RB to give the data set a disposition. Units allocated to each SIOT are unallocated by using the TIOT entry. Private tape volumes are unloaded, and the VUT is updated with volume serials to indicate which of the job's volumes were left mounted at unallocation time but need demounting by batch initialization and control at end of job.

Data sets are released (dequeued) by using the data set enqueue table to determine if the data set's last use in the job is in the current step. All volumes used by a step are released by a generic dequeue if unallocation is for a step. In the dynamic unallocation environment, only the subject request's volumes are dequeued.

Volume Mount and Verify

Volume mount and verify (VM&V) mounts, verifies, and unloads volumes. VM&V is driven by a chain of VM&V request blocks. A VM&V count table is built in which the numbers of mount, verify, and unload requests are maintained.

In mounting and verifying direct access volumes, VM&V builds a mount verification communication area (MVCA) in CSA. This contains a pointer to an MVCA extension (MVCAX), which VM&V builds in the user region. The MVCAX contains a device-end ECB and UCB pointers for each device for which a mount has been issued. After issuing mounts and building the MVCA/MVCAX blocks, VM&V waits for the device-end ECB in the MVCAX. Whenever a device-end occurs on a unit that VM&V is waiting for, a nucleus routine (IEFVPOST) posts the device-end ECBs in all MVCAXs. Any VM&V that is waiting looks at all UCBs being waited for. Volume serials are read and verified when the devices become ready.
Volume unloading is accomplished for DASD by issuing an unload message to the operator and clearing volume-related data from the UCB. For tape volume unloading, a physical rewind/unload operation is also performed. Virtual volume unloading is accomplished by issuing an unload SVC (SVC 126) and clearing volume-related data from the UCB.

General Debugging Aids

Described here in general terms are the following:

• Allocation Module Naming Conventions
• Registers and Save Areas
• Common Allocation Control Block Processing
• ESTAE Processing

Allocation Module Naming Conventions

All allocation module names have the following format:

IEF_B4__

IEF indicates the module is a scheduler module. The fourth character has the following meaning:

• If A, the module is part of common allocation, common unallocation, JFCB housekeeping, or volume mount and verify.
• If B, the module is part of batch allocation or batch unallocation.
• If D, the module is part of dynamic allocation or dynamic unallocation.

B4 identifies the module as a part of allocation. The last two characters are a unique module identifier.

Registers and Save Areas

Allocation follows standard register saving and usage conventions. Register 13 is used as a save area pointer, register 14 as a return address, and register 15 as a branch address. Register save areas are chained in the standard manner. Since allocation is coded completely in top-down fashion, it is a simple matter to find the flow of control leading to the current point of processing by tracing back through the save areas. All allocation modules have identifiers just after the beginning of the module, which contain the module name in EBCDIC. A graphic representation of control flow can be found under "Allocation/Unallocation" in "Module-to-Module Control Flow" of Volume 6 of OS/VS2 System Logic Library.
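The trace back through the save areas can be sketched as follows. This is a minimal illustration, not a dump formatter: it assumes the standard OS save area layout (back pointer at +4, register 14 at +12, register 15 at +16), 24-bit addresses, and a storage image held as a Python bytes object that starts at a known address. The function names and the storage-image representation are assumptions made for the example.

```python
import struct

ADDR_MASK = 0x00FFFFFF  # assumption: 24-bit addressing, high byte ignored
SAVE_AREA_LEN = 72      # standard 18-word register save area

def read_word(storage, base, addr):
    """Read one big-endian fullword at virtual address addr."""
    return struct.unpack_from(">I", storage, addr - base)[0]

def walk_save_areas(storage, base, r13, limit=50):
    """Follow the back chain starting at the save area register 13 points to.

    Returns a list of dicts, newest save area first. In each save area,
    r14 is the return address saved by the routine that was called, and
    r15 is that called routine's entry address.
    """
    chain = []
    seen = set()
    sa = r13 & ADDR_MASK
    while sa and sa not in seen and len(chain) < limit:
        # Stop if the save area is not inside the storage image.
        if not (base <= sa and sa + SAVE_AREA_LEN <= base + len(storage)):
            break
        seen.add(sa)
        chain.append({
            "save_area": sa,
            "r14": read_word(storage, base, sa + 12) & ADDR_MASK,
            "r15": read_word(storage, base, sa + 16) & ADDR_MASK,
        })
        sa = read_word(storage, base, sa + 4) & ADDR_MASK  # back pointer
    return chain
```

The seen set and the limit guard against a damaged chain that loops back on itself, which is a common condition in the dumps this kind of trace is used on.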
Space for the allocation save areas is obtained in a unique manner, which can be of help in debugging. On entry to allocation, a 4K block of space is obtained from subpool 230. This block is used to contain the save area and data area for each module called, until the block is full, at which time another 4K block is obtained. Save areas of modules that had been given control but then returned are still valid, that is, not freed, if the 4K block in which they had been placed has not been freed.

Allocation does not keep the address of a control block in any particular register. Register 13 always points at the save area of the module in control. Register 12 is usually the base register of the module in control.

Common Allocation Control Block Processing

This section graphically describes the control blocks used by common allocation and explains how these control blocks reflect allocation processing. Figure 5-37 shows the control blocks that are input to common allocation. Data set association blocks (DSABs) and their associated task input/output table (TIOT) entries are shown as input. Note that DSABs exist only if common allocation was called by dynamic allocation. When batch allocation calls common allocation, there are no DSABs, but there is a DSAB queue descriptor block (QDB).

The first major step in common allocation processing is the construction of the allocation work area (ALCWA). Following this, requests that do not require units, such as DUMMY and SYSOUT DD requests, are allocated. A DSAB and TIOT entry are built for each of these requests as they are allocated. SIOTETIO is initialized to point to the DSAB whenever it is created for a given SIOT. Bit SIOTALCD is set to 1 whenever a request (SIOT) is fully allocated.
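When working from a dump, the allocation indicators named in this section can be tested directly against the raw bytes of each control block. The sketch below uses the offsets and masks given in this section for SIOTALCD (Figure 5-38) and for the VU entry and UCB bits described in the text that follows Figure 5-38. The helper names and the idea of passing each block as a bytes object are assumptions made for the example.

```python
# (offset, mask) pairs as stated in this section.
SIOTALCD = (0x2B, 0x02)  # SIOT: request fully allocated
VOLALOC = (0x07, 0x80)   # VU entry: entry has been allocated
VDEVREQD = (0x07, 0x20)  # VU entry: entry requires a unit
UCBALOC = (0x03, 0x08)   # UCB: unit is allocated

def flag_on(block, flag):
    """Return True if the given bit is on in a raw control block image."""
    offset, mask = flag
    return bool(block[offset] & mask)

def all_units_allocated(vu_entries):
    """True when every VU entry that requires a unit is marked allocated.

    This is the condition under which SIOTALCD is expected to be on
    for the SIOT that owns these VU entries.
    """
    return all(
        flag_on(vu, VOLALOC)
        for vu in vu_entries
        if flag_on(vu, VDEVREQD)
    )
```

A mismatch between all_units_allocated for a SIOT's VU entries and the SIOTALCD bit in that SIOT indicates how far allocation had progressed for the request at the time of the dump.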
After allocating these requests, the volunit table (VU table) is created to represent the unit requirements of the remaining (unallocated) SIOTs. In addition, an eligible devices list (EDL) is created for each remaining SIOT. The EDL contains the unit control block (UCB) pointers to all UCBs representing devices eligible for allocation to the SIOT. (A device is "eligible" at this point whether online or offline, either logically or physically.)

Figure 5-38 shows the relationship of the ALCWA, SIOTs, etc., after the VU table and EDLs are built. The first SIOT on the chain (SIOT A) represents a SYSOUT DD statement that has already been allocated. The second SIOT on the chain (SIOT B) represents a SIOT that requires one or more units. It is shown to have 2 volunit entries, which indicates the total number of units that can be allocated to that SIOT. SVOLUNNO in the SIOT contains the number of VU entries for a SIOT. (Note that the total number of units allocated to a request can exceed the number of units requested. This happens, for example, if a specifically requested volume is found to be mounted with the permanently resident mount attribute.)

Figure 5-37. Common Allocation Input

Figure 5-38. Common Allocation Control Blocks After Construction of Volunit Table and EDLs

Common allocation processing is reflected by the status of a request's SIOT and VU entries. As each VU entry requiring a unit is allocated, bit VOLALOC (bit 0 (X'80') at +7 into the VU entry) is set on. Bit VDEVREQD (bit 2 (X'20') at +7 into the VU entry), if on, indicates that the VU entry requires a unit. Once all VU entries with VDEVREQD=1 for a given SIOT are allocated (VOLALOC=1), the SIOT is marked allocated by setting on SIOTALCD (bit 6 (X'02') at X'2B' into the SIOT).

As each unit is allocated to a request, that allocation is reflected in (1) the unit's UCB, by setting UCBALOC (bit 4 (X'08') at +3 in the UCB) on, and in (2) the request's TIOT entry, by placing the UCB pointer into field TIOUCBP in the TIOT entry. (TIOUCBP is at +X'10' into a TIOT entry for the first unit allocated, at +X'14' for the second, etc.)

The first time a VU entry for a SIOT is allocated, a DSAB and TIOT entry are created. For subsequent VU entries allocated to a SIOT, the DSAB and TIOT entries are updated.

ESTAE Processing

All of allocation is protected from abends by ESTAE processing. Only one ESTAE is issued during allocation. The batch allocation ESTAE exit routine, IEFAB4E4, performs a retry, causing routine IEFAB4E3 to get control. IEFAB4E3 returns to the initiator with a failure return code, causing the initiator to fail the job. All other ESTAE exit routines percolate to the next higher level of ESTAE protection. In a batch unallocation environment, this causes the initiator to terminate.

When an abend occurs in a batch environment, message IEF197I "SYSTEM ERROR DURING ALLOCATION/UNALLOCATION" is issued to SYSOUT by ESTAE processing. If the abend occurs in batch allocation or a routine called by batch allocation, such as JFCB housekeeping, message IEF197I is issued to the job's SYSOUT.
If the abend occurs during batch unallocation, the same message goes to the initiator's SYSOUT. An SVC dump is always taken if an abend occurs when allocation is in control.

Debugging Hints

Hints for debugging specific problem areas are described here, including:

• Allocation Serialization
• Device Selection Problems (Non-Abend)
• 0B0 Abend
• 0C4 Abend in IEFAB4FC, or Loop in IEFDB413
• Volume Mount and Verify (VM&V) Waiting Mechanism

Allocation Serialization

Allocation serializes on several types of resources. This has resulted in deadlocks between job steps when a programming change caused incorrect serialization.

Both dynamic allocation and JFCB housekeeping enqueue on data set names. Dynamic allocation enqueues on non-temporary data set names before calling JFCB housekeeping. JFCB housekeeping enqueues on real data set names when it finds, via LOCATE, that the specified data set name is an alias; the fully qualified names of GDG single requests (found via LOCATE); the individual names in a generation data group; and the data set names of temporary, non-VIO data sets. (The initiator enqueues all non-temporary names of JCL-specified data sets before a job starts.) Data set names are dequeued by unallocation, either batch or dynamic, in the last step in which the data set is referenced.

Common allocation enqueues on volume serials of all specific volume requests except for direct-access volumes that are either permanently resident or reserved. This is done after the allocation of permanently resident or reserved direct-access volumes, that is, following fixed device allocation. The volume serials of demountable volumes allocated to non-specific volume requests are enqueued either when the volume is allocated (if the volume is already mounted) or when the volume is mounted (if allocation mounts and verifies it).
(When there is a nonspecific request for tape, OPEN enqueues the tape-volume serial numbers because allocation only waits for direct-access volumes to be mounted.)

Before actually allocating a device, common allocation serializes the status of devices by enqueuing on several resources, all with the major name SYSIEFSD. The minor names and functions serialized are as follows:

1. Q4 - to serialize device allocation with VARY offline processing, which is actually done by common allocation
2. CHNGDEVS - to serialize device allocation with device unloading done by the UNLOAD operator command and JES3
3. DDRDA - to serialize device allocation with dynamic device reconfiguration (DDR) processing of direct access devices
4. DDRTPUR - to serialize device allocation with DDR processing of tape and unit record devices

These four resources are enqueued for shared use by allocation and for exclusive use by the other functions. Within common allocation, these resources, with the exception of Q4, are dequeued when allocation must wait on an allocation recovery WTOR or on an allocation group.

Allocation serializes, via an internal mechanism, the processing of all devices except direct access devices containing non-demountable (permanently mounted or reserved) volumes. The serialization unit is an allocation group. This serialization is done to serialize the device allocation in one address space with that in another. Group serialization is exclusive, that is, it prevents an allocation in a given address space from considering the same device that an allocation in another address space is considering. All allocations serialize on groups in the same order; this order is specified at SYSGEN and is represented in the CSECT PREFTAB, which is part of allocation load module IEFW21SD. PREFTAB is simply a list of generic device types.
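Obtaining groups in one fixed order is what keeps two allocations from each holding a group the other needs. The following Python sketch illustrates the idea only. The preference list contents, the use of Python locks to stand in for the internal group serialization mechanism, and the function names are all assumptions made for the example.

```python
import threading

# Hypothetical preference order. The real order comes from PREFTAB;
# "TAPE-A" and "UR-A" are made-up group names for illustration.
PREFTAB = ["3330", "2314", "TAPE-A", "UR-A"]

# One exclusive lock per group stands in for group serialization.
GROUP_LOCKS = {group: threading.Lock() for group in PREFTAB}

def lock_order(needed, preftab=PREFTAB):
    """Return the needed groups sorted into preference-table order."""
    position = {group: i for i, group in enumerate(preftab)}
    return sorted(set(needed), key=position.__getitem__)

def with_groups(needed, work):
    """Hold every needed group, obtained in table order, while work runs."""
    order = lock_order(needed)
    held = []
    try:
        for group in order:
            GROUP_LOCKS[group].acquire()
            held.append(group)
        return work(order)
    finally:
        # Release in the reverse of the order obtained.
        for group in reversed(held):
            GROUP_LOCKS[group].release()
```

Because every caller sorts its groups by the same table before obtaining any of them, no caller can hold a later group while waiting for an earlier one, so a circular wait between two callers cannot form.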
To serialize changes to a specific UCB, allocation and unallocation always obtain the local and CMS locks before setting fields in the UCB.

Dynamic allocation serializes with itself so that only one dynamic allocation may proceed in an address space. This is done by an enqueue for exclusive use on major name SYSZTIOT, the minor name consisting of the 2-byte ASID and the 4-byte address of the DSAB QDB.

Subsystem Allocation Serialization

Allocation does not serialize when processing subsystem data set requests, but provides the capability whereby a subsystem may serialize its own requests if it so desires. The mechanism to do this is the subsystem allocation sequence table (SAST). A skeletal SAST is built during subsystem interface initialization to define the order in which subsystems are to be invoked for the allocation of subsystem data sets. During common allocation processing the subsystem requests are sorted by subsystem. Using the sequence defined by the SAST, all requests for a given subsystem are passed to that subsystem for allocation before the next subsystem's requests are processed. Thus a subsystem can serialize its allocation processing in order to prevent deadlocks.

Device Selection Problems (Non-Abend)

The device selection logic of common allocation is heavily dependent on the eligible devices table (EDT), which is built at SYSGEN. The EDT describes the unit eligibility of any unit name that may be specified either via JCL or dynamic request. Users have in the past tried to modify the EDT without doing either a full or an I/O SYSGEN. Modification of the EDT can result in incorrect allocation, for example, allocation of a 3330 request to a 2314, or failure of a request or job step with no error indicated.
If such a device selection error occurs after modification of an EDT, the modification is suspect and should be carefully verified by consulting the EDT descriptions in the OS/VS2 System Logic Library section on Data Areas, and/or the EDT mapping in OS/VS2 Data Areas (microfiche), via mapping macro IEFZB421.

Address Space Termination

When an address space is being abnormally terminated, the allocation address space termination routine, IEFAB4E5, gets control. This routine releases any allocation groups held by the address space and unallocates any non-shareable units allocated to the address space. Non-shareable units include all units except shareable direct-access devices. The ASID of the address space to which a non-shareable unit is allocated is at X'E' (halfword) in the common UCB extension.

0B0 Abend

0B0 abends have occurred in allocation more than once. The code is issued by the SWA manager, which handles the reading, writing, and assigning of SWA records. Allocation requests all these functions of the SWA manager. Two situations cause allocation to receive a 0B0 abend from the SWA manager:

1. The address of a SWA record to be read or written, in behalf of allocation, has been overlaid. Allocation usually obtains a SWA virtual address (SVA) to read or write from another SWA record. When such an SVA has been overwritten by a scheduler sub-component, a 0B0 abend may occur.

2. A 0B0 abend will occur when allocation assigns an SVA for a record and then uses the SVA to attempt to read the record without first having written the record.

0C4 Abend in IEFAB4FC, or Loop in IEFDB413

This error always occurs when the device type in a UCB is changed from one generic type to another, and when a JCL statement or dynamic request specifies that particular unit. If this error occurs, it can be diagnosed as follows:

1. Find the device type (+X'10') in the UCB of the specific unit.

2. In the EDT (a CSECT that is well mapped in its assembly at SYSGEN), find the look-up entry representing the device type in the UCB. If the requested unit is not among the units represented by the look-up entry, the problem is that the device type in the UCB was changed.

Volume Mount and Verify (VM&V) Waiting Mechanism

Volume mount and verify must wait for direct access volumes to be mounted so that the labels can be verified, and so that allocation can enqueue the volume serials for non-specific volume requests and obtain space (for new data set requests). In order to allow for several allocations to be waiting simultaneously, the control block structure shown in Figure 5-39 is set up by VM&V.

Each address space waiting for at least one direct access volume to be mounted has its own mount verification control area (MVCA), MVCA extension (MVCAX), one or more mount entries, and, in each mount entry, one or more UCB entries. Each MVCAX contains an ECB. When an allocation is waiting for a direct-access volume to be mounted, VM&V waits for this ECB in behalf of the allocation.

The MVCA chain is anchored in the allocation/termination communication area (ATCA) in the nucleus. The ATCA is pointed to by location CVTQMWR in the CVT.

All devices on which allocation waits for a device end (volume mount) will have the scheduler attention table index placed in their UCBs (at +3 in the common UCB extension). The index is X'0C'.

Any destruction of the MVCA/MVCAX structure causes one or more allocations to wait "permanently." The wait is not truly permanent, however, because VM&V also waits for (in a batch environment) the cancel ECB (in the CSCB, the command scheduling control block), which is posted when the operator cancels a job. In a dynamic environment, VM&V waits for a WTOR ECB, in which case the operator can, via reply, cancel the single mount but not the job.
Figure 5-39. VM&V Control Block Structure

Allocation/Unallocation Reason Codes

The reason codes listed here are divided into three groups:

• Reason codes set by batch and common allocation modules and by JFCB housekeeping modules.
• Reason codes set by unallocation modules.
• Reason codes set by dynamic allocation modules.

Common and Batch Allocation and JFCB Housekeeping Reason Codes

The reason codes set by common and batch allocation and by JFCB housekeeping are divided into step-related reason codes and DD-related reason codes.

The following are DD-related error reason codes set by allocation and JFCB housekeeping modules and placed in the SIOTRSNC field of the SIOT. The reason codes serve as an index into message module IEFBB4M3. The prologue of IEFBB4M3 lists the modules which detect the error conditions.

Reason Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37

Dynamic Allocation Error Reason Code
1700 0244 0210 020C 0458 0214 021C 0480 0224 0398 4714 47A8 47AC reserved 039C 0228 4704 4708 470C 4710 4714 4718 4734 4738 reserved 4740 reserved

Message
IEF212I IEF371I IEF211I IEF211I IEF365I IEF702I IEF221I IEF210I IEF195I IEF192I IEF194I IEF246I IEF721I IEF372I IEF318I IEF719I IEF720I IEF688I

Meaning
Data set not found.
Telecommunication device is not accessible.
Unable to ENQ on data set name.
Unable to ENQ on data set name.
Referenced data set name is GDG ALL.
Unable to allocate.
Invalid backward reference to a step.
Invalid UNIT parameter.
Maximum number of devices for statement exceeded.
Not enough eligible devices.
Volume sequence number incorrect.
Insufficient space on storage volumes.
Protection conflict in ISAM requests (SU 32 only).
VOL=REF to unresolved DD.
UNIT=AFF to new direct data set.
Data set previously defined (SU 32 only).
User not authorized to define this data set (SU 32 only).
Nullfile and DSNAME conflict in ISAM concatenation.

Message
IEF245I IEF474I IEF253I IEF254I IEF193I IEF256I IEF257I IEF258I IEF260I IEF261I IEF262I IEF263I IEF264I IEF266I IEF140I

Meaning
Inconsistent unit name and volser.
Unit or volume in use by system task.
Duplicate data set name on direct access volume.
Insufficient space in VTOC.
Space not obtained because of I/O error.
Absolute track not available.
Space requested not available.
Invalid record length in SPACE parameter.
Incorrect DSORG or DISP.
No prime area request for ISAM data set.
Prime area must be requested before overflow area.
Space request not on cylinder boundary.
Duplication of DSNAME element.
Invalid JFCB or partial DSCB pointer.
Directory space request too large.

Message
IEF273I

Meaning
Invalid user label request.

* - means that the error cannot be set in dynamic allocation.

Reason Code
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96

Dynamic Allocation Error Reason Code
474C 476C 4780 035C 0390 0394 0218 0494 022C 0214 0220 4794 4798 479C 0490 17FF 022C 024C 0250 03A0 04A4 0484 7700 04A8 04AC 04B0 7704 reserved 04B4 03A4 0230 0488 048C 47A4 0214 0240 04B8 04BC 0234 0470 046C

Message
IEF127I IEF128I IEF129I IEF130I IEF131I IEF132I IEF133I IEF134I IEF135I IEF136I IEF267I IEF145I IEF141I IEF143I IEF366I IEF219I IEF286I IEF466I

Meaning
No SPACE parameter or zero space request at ABSTR 0.
Invalid request for ISAM index.
Multivolume index request.
DSNAME element wrong.
Multivolume OVFLOW request.
CYL and ABSTR conflict in SPACE parameter.
CYL and CONTIG conflict in SPACE parameter.
Subparameter wrong in SPACE parameter.
Zero primary space request.
Index area requested twice.
Space request for directory larger than primary space request.
Space request not ABSTR for DOS volume.
Index request did not precede prime request.
Last concatenated DD card unnecessary or invalid.
Relative GDG generation number contains syntax error.
GDG group name exceeds 35 characters.
DISP field incompatible with data set name.
Unable to recover from DADSM failure.
Mounting required but not allowed.
Can't access SYSCATLG data set on CVOL.
Volume on ineligible permanently resident or reserved device.
Units required not available - waiting not allowed.
Volumes required not available - waiting not allowed.
Data sets overlap in VTOC.
DOS split cylinder data sets overlap.
Possible VTOC error.
VTOC error on second or later volume of ISAM prime data set.
Same unit request twice - conflicts exist.
Permanently resident or reserved volume on requested unit.
Volume containing pattern DSCB not mounted.
Pattern DSCB record not found in VTOC.
New data set requested on DOS stacked pack format volume.
Can't wait for offline devices.
Requested device is a console.
MSS not initialized.
MSS select error.
More units required for demand request.
Invalid JOBCAT or STEPCAT parameters.
Invalid data set name for JOBCAT or STEPCAT.

Message
IEF704I IEF475I IEF467I IEF485I IEF476I IEF477I IEF478I IEF479I IEF481I IEF482I IEF217I IEF218I IEF703I IEF483I IEF726I IEF725I IEF484I IEF493I IEF492I reserved IEF480I reserved IEF701I IEF213I IEF687I IEF752I IEF752I IEF752I IEF752I IEF752I IEF752I IEF753I

Meaning
Unauthorized requestor of subsystem data set.
Invalid destination requested.
Error changing allocation assignments.
Error processing cataloged data set.
Requested volume mounted on JES3-managed unit.
The request for a subsystem data set was failed by the subsystem attempting to allocate the request.
IEF754I IEF755I IEF756I
A SUBSYS parameter specified a subsystem which does not support the allocation of subsystem data sets.
The subsystem requested on a SUBSYS parameter was not operational.
The subsystem requested on a SUBSYS parameter does not exist.
A system error occurred in allocating a subsystem data set.
IEF740I IEF741I
Data set/volume could not be RACF protected - RACF not active (SU 32 only).
Protect request failed - invalid data set/volume specification (SU 32 only).

* - means that the error cannot be set in dynamic allocation.

The following are step-related error reason codes set by allocation and JFCB housekeeping modules in an area pointed to by the allocation work area (ALCWA). With the exception of reason code 1, the reason codes serve as an index into message module IEFBB4M2. The prologue of IEFBB4M2 lists the modules which detect the error condition. Reason code 1 is set by IEFAB469 and is returned to dynamic allocation.

Reason Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Dynamic Allocation Error Reason Code
023C 0204 0220 0484 0238 0220 049C 0474 0248 0450 172C 1718 670C 0478 047C 0214 0490 0468 0498 04A0 024C 0250 03A0 04A4 7700

Message
IEF180I IEF713I reserved IEF251I IEF240I IEF485I IEF71'1-1 IEF473I IEF716I IEF491I IEF363I IEF364I IEF367I IEF465I IEF456I IEF700I IEF701I IEF361I IEF362I IEF202I IEF202I IEF715I IEF717I IEF718I IEF751I IEF751I IEF751I IEF751I IEF751I

Meaning
Catalog not mounted.
GETMAIN error.
MSS volume not available.
Job cancelled.
No space in TIOT.
Volumes not available and waiting not allowed.
MSS volume not defined.
System Resources Manager error.
Unable to mount MSS volume.
Number of DDs exceeds 1635.
Not enough storage for processing cataloged data set.
Permanent I/O error processing cataloged data set.
I/O error obtaining pattern DSCB.
Unable to allocate subsystem data set.
Error issuing ESTAE macro.
Environment changed - no longer able to allocate.
Error changing allocation assignments.
Unable to allocate private catalog.
Unable to unallocate private catalog.
Step not run because of condition codes.
Step not run because of condition codes.
MVS volume inaccessible.
Specified virtual volume group (VVGRP) name does not exist.
Space or virtual volume group (VVGRP) required for nonspecific MSS request.
The job was failed in allocation by a subsystem processing a request to allocate one or more subsystem data sets.

* - means that the error cannot be set in dynamic allocation.

Common and Batch Unallocation Reason Codes

The following reason codes are set by common and batch unallocation modules. Reason codes 1, 2, and 4 serve as an index into message module IEFBB4MS. Reason code 3 does not result in a message; it is returned to dynamic allocation.

Reason Code 2 Message Meaning Module Setting IEF468I GETMAIN error. IEFBB410, IEFBB414, IEFBB416, IEFAB4A0 IEF469I 3 IEF724I 4 IEF456I Data sets not released. IEFAB4A0, IEFAB4A6 Volumes not released. (Dynamic allocation only). Step catalogs not allocated. (Warm start only). IEFAB4A0, IEFAB4A8 Error issuing ESTAE macro. IEFBB410, IEFAB4A0 IEFAB4A2

In addition, IEFAB4A2 (disposition processor) receives return codes returned by the data management catalog and scratch functions (called by IEFAB4A2 to perform disposition processing). If the allocation is dynamic, these return codes are returned to dynamic allocation as reason codes in a field in the unallocation request block. For batch allocation, the return code is converted to a code for a disposition message.
Dynamic Allocation Reason Codes

For a description of dynamic allocation reason codes, refer to the topics "Informational Reason Codes" and "Error Reason Codes" in OS/VS2 System Programming Library: Job Management.

JES2

JES2 is a job entry subsystem for OS/VS2 MVS. An overview of the JES2 structure is presented in this section. For detailed information on JES2 structure, logic, and control block formats, see OS/VS2 JES2 Logic. A partial list of major JES2 control blocks, showing storage location and primary use, appears in Figure 5-49 at the end of this section.

JES2 is a subsystem that runs as an operator-started job in a separate address space. It provides input and output spooling for local and remote unit record devices, and simplified batch scheduling. A subsystem support module, provided by JES2 and located in the pageable link pack area (PLPA), is used to communicate with other system components in performing job selection and execution. JES2 may be connected to as many as seven other JES2 subsystems via the multi-access spool direct access storage devices.

Job Processing Through JES2

JES2 job processing is divided into the following five major phases:

Input
Jobs are read into the system from online card readers, remote terminals, and internal reader interfaces (TSO LOGONs, TSO-submitted jobs, system tasks, or jobs presented to the internal reader from other sources). These jobs are then entered into a priority queue to await processing by the next stage.

Conversion
As soon as the converter is available, the JCL for a job is passed through the converter, scanned for syntax errors, and converted into internal text. Any jobs having JCL errors bypass execution and are queued for output processing immediately. Those jobs that successfully complete conversion are queued by priority, within class, to await an eligible initiator for execution.
Execution
Jobs are selected by priority, within class, for an eligible initiator. Input cards are supplied as required to the executing program. Output records are received and written onto JES2 spool devices. At the completion of execution, the job is placed in a queue to await output processing.

Output
The print data sets created during execution, and messages created during earlier stages, are printed. The punch data sets are punched.

Purge
Upon completion of all processing required for the job, the direct access space acquired by JES2 for the job and all JES2 resources associated with the job are released.

JES2 Structure

JES2 consists of two basic modules: HASJES20, which operates in the JES2 address space and provides the subsystem's job processing functions; and HASPSSSM, which is located in PLPA and provides the interface between the operating system and the HASJES20 programs.

HASJES20 Program Structure

The HASJES20 module is made up of seven tasks that perform JES2 job processing. The JES2 main task provides the basic functions of reading and spooling job input, converting JCL, selecting jobs for execution from the JES2 job queue, receiving and outputting job output, and job cleanup. These functions are accomplished with a set of programs called basic functional processors. These processors are supported by another set of programs, the control processors, which provide subsystem control and JES2 facilities. Both sets of processors use numerous subroutines called control service programs.

The heart of HASJES20 is the dispatcher, which schedules and dispatches the various processors under the single TCB of the main task. Since the main task cannot afford to go into a wait state, any JES2 programs that have the potential for waiting are isolated as subtasks. There are six JES2 subtasks:

1. Conversion subtask - links to the OS/VS2 converter.
2. Image loader subtask - loads universal character set (UCS) and forms control buffer (FCB) images.
3. System management facility (SMF) subtask - issues SVC 83 to write accounting records for the main task.
4. Communications subtask - issues SVC 34 and SVC 35 for main task operator communications.
5. SNA subtask - initializes JES2 use of the VTAM interface with OPEN ACB.
6. Dynamic spool allocation subtask - initializes JES2's spool volumes (SYS1.HASPACE and SYS1.HASPCKPT).

HASJES20 Module Structure

HASJES20, shown in Figure 5-40, consists of ten source modules which contain JES2 main flow job processing code and associated directories.

The HASPINIT module is loaded in the JES2 address space to initialize the subsystem. After initialization, HASPINIT is deleted.

HASPRDR, HASPXEQ, and HASPPRPU contain the functional and control processors needed to effect major job processing steps, along with related specialized subroutines and subtasks.

The first 4K of HASPNUC is fixed in JES2 memory since it contains the system routines which provide support to the other modules. HASPNUC also contains the HCT and the JES2 module directory (see Figure 5-41).

HASPRTAM contains all the access method and line management functions to support both bisynchronous and systems network architecture (SNA) remote job entry terminals. Communication functions are isolated to this module.

All processing associated with JES2 multi-access spool systems is contained in HASPMISC, along with spool initialization, checkpoint, and job purge operations.
JES2 Address Space (Low end)

HASPNUC
• HCT
• Dispatcher
• I/O Supervisor
• Service Routines
• Service Processors
• Module Directory

HASPRDR, HASPXEQ
• Conversion Processor
• Conversion Subtask
• Execution Processor
• Time Excession Processor
• Process SYSOUT Processor

HASPPRPU
• Output Processor
• Print/Punch Processor
• Image Loading Subtask

HASPACCT
• SMF Subtask

HASPMISC
• Purge Processor
• Checkpoint Processor
• Track Group Allocation Subroutine
• Priority Aging Processor
• Warm Start Processor
• Dynamic Spool Allocation Subtask

HASPCON
• Service Routines
• Communication Subtask

HASPRTAM
• RTAM Service Routines (BSC and SNA)
• Line Manager Processor (BSC and SNA)
• Remote Console Processor
• VTAM Subtask (ACB OPEN, CLOSE)
• API Routines

HASPCOMM
• Command Processor

HASPINIT (Deleted after subsystem initialization)

JES2 Address Space (High end)

Figure 5-40. HASJES20 Module Map

[Figure 5-41 shows R11 (BASE1) addressing the HCT in HASPNUC, and the fullword at HCT offset 0008 addressing the module map, whose entries point to the JES2 modules (HASPXEQ, HASPINIT, HASPCOMM, HASPACCT, and so on).]

Figure 5-41. Locating the JES2 Module Directory in HASPNUC

HASP Control Table (HCT)

The global directory for HASJES20 is the HASP control table (HCT), which is found in HASPNUC. (Figure 5-42 shows the major vector fields of the HCT.) In HASJES20, R11 may be used to locate the HCT. Eight bytes into the HCT is the address of the JES2 module map, which contains a symbolic name and VCON entry for each of the JES2 modules.

[Figure 5-42 shows R11 addressing the HCT in HASJES20, with the following fields.]

Rel. 4.1 Offset    Rel. 4.0 Offset    Field
0008               0008               $HASPMAP - Address of JES2 Module Directory
0244-0254          0238-0248          Configuration Constraints
0256-029B          024A-028F          Operating Constraints
029C-02E2          0290-02CE          Internal Constraints
02E4-03E8          02D0-03CF          Control Fields
03EC-042C          03D0-0410          Processor PCE Addresses
0430               0414               $CURPCE - Current PCE Address
0434-0436          0418-041A          Dispatcher Event Control Fields
0438-0490          041C-0474          Processor Queue Addresses
0498-0540          047C-051A          Checkpoint Record

The figure also shows, at offsets 0010-001C, 0020-00F0, 00F4-012C, 0134, and 0138-0240 (0138-0234 in Release 4.0), the following fields: HASP Dispatcher Entries, Service Routines, TCB and ECB Addresses for Subtasks, $SSVT (Address of SSVT), and Control Block Directory.

Figure 5-42. HCT Major Vector Fields

HASPSSSM

Located in PLPA, HASPSSSM interfaces directly with the operating system through the formal subsystem interface (SSI) to provide job scheduling, data management (SYSIN and SYSOUT), and operator communications. HASPSSSM contains function routines which are invoked through the use of vectors in the subsystem vector table (SSVT) shown in Figure 5-43. The vectors are used by the operating system to invoke functions which are defined by the IEFJSSOB macro expansion. Additional SSVT vectors are used by the HASJES20 module to provide services to the rest of the JES2 system. During execution of functions represented by the SSVT vectors, additional vectors are set into data extent blocks (DEBs) and access method control blocks (ACBs) for data management support.

In the performance of its functions, HASPSSSM makes requests for services to the HASJES20 module running under the JES2 TCB as well as to the operating system. The module is entered in the privileged state. The storage relationship of HASPSSSM and HASJES20 to the operating system is shown in Figure 5-44.
[Figure 5-43 shows the SSVT in two parts. Common to all subsystems: the function map and the pointers to function routines. JES2 extension: pointers to service routines, queue heads, and spool control.]

Figure 5-43. The Subsystem Vector Table

[Figure 5-44 shows the SSVT and HASPSSSM in commonly addressable storage, the user regions (STC tasks, TSU tasks, batch job tasks), the JES2 region containing HASJES20 (HCT, supporting code, control blocks), and the nucleus containing SVC 111.]

Figure 5-44. HASPSSSM - HASJES20 - OS/VS2 Relationship

Subsystem Interface

MVS interfaces formally with JES2 by building a subsystem options block (SSOB) and issuing the IEFSSREQ macro. This formal subsystem interface is shown in Figure 5-45. The subsystem is entered by indexing the SSVT for the entry pointer into HASPSSSM. HASPSSSM performs the requested function, if necessary communicating with HASJES20 by cross-memory post, and returns.

The SSVT, located in CSA subpool 241, contains the pointers necessary for MVS/JES2 and HASPSSSM/HASJES20 communication. With the HCT, the major control block in HASJES20, the SSVT forms the central directory for JES2.

The subsystem interface is used for the following:

A. Job scheduling and control functions
   1. Job selection
   2. Job deletion (termination)
   3. Re-enqueue job
   4. Request job identification
   5. Return job identification
   6. End of memory
   7. End of task

B. Data set access method functions
   1. Allocation
   2. Open - activates following interfaces:
      a. GET/PUT/PUT ENDREQ/NOTE/POINT
      b. End of block (SVC 111)
   3. Checkpoint
   4. Restart
   5. Close
   6. Unallocation

C. TSO/external writer communications
   1. Process SYSOUT
   2. CANCEL
   3. STATUS
   4. User identification validity check

D. Operator communications
   1. Command processing (SVC 34)
   2. Write to operator (SVC 35)

In addition to the formal SSOB interfaces, the following miscellaneous subsystem interfaces are defined:

1. Exit from the OS/VS2 converter
2. Unsolicited device end
3. Privilege status from the program properties table.
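The entry mechanism described above (index the SSVT to find the routine for a requested function) can be sketched as follows. This is only an illustrative model, not the IEFSSREQ implementation: the class layouts, the function code numbers, the return code values, and the convention that a zero map entry means "no routine" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

RC_OK = 0              # hypothetical: request was passed to a function routine
RC_NOT_SUPPORTED = 4   # hypothetical: subsystem has no routine for this function


@dataclass
class SSOB:
    """Simplified stand-in for the subsystem options block."""
    function_code: int
    result: str = ""


@dataclass
class SSVT:
    """Simplified stand-in for the subsystem vector table."""
    function_map: List[int]                  # indexed by function code; 0 = no routine (assumed)
    routines: List[Callable[[SSOB], None]]   # stand-ins for pointers to function routines


def subsystem_request(ssvt: SSVT, ssob: SSOB) -> int:
    """Index the function map, then enter the selected function routine."""
    code = ssob.function_code
    if not 0 <= code < len(ssvt.function_map):
        return RC_NOT_SUPPORTED
    slot = ssvt.function_map[code]
    if slot == 0:
        return RC_NOT_SUPPORTED
    ssvt.routines[slot - 1](ssob)            # map entries are 1-based in this sketch
    return RC_OK


# Hypothetical function codes for three of the functions listed above.
JOB_SELECTION, JOB_DELETION, STATUS = 1, 2, 3


def job_selection(ssob: SSOB) -> None:
    ssob.result = "job selected"


def job_deletion(ssob: SSOB) -> None:
    ssob.result = "job deleted"


# A subsystem that supports job selection and job deletion but not STATUS.
ssvt = SSVT(function_map=[0, 1, 2, 0], routines=[job_selection, job_deletion])
```

Keeping the map separate from the routine pointers lets a caller find out that a function is unsupported without entering the subsystem at all.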
[Figure 5-45 shows the flow of a formal subsystem request. The user module builds the SSOB and makes the request. IEFSSREQ validates the request, sets the SSIB, and enters the subsystem. HASPSSSM performs the function and returns. The control blocks shown are the SSOB (header and extension), SSCT, SSIB, SSVT, and SJB; register 1 addresses the pointer to the SSOB, register 11 is shown with the SSVT, and register 13 addresses the save area.]

Figure 5-45. Formal Subsystem-Interface Vectors

Dispatcher Structure

The JES2 dispatcher allocates processor time to the JES2 main task processors. Each processor is represented by a control block called a processor control element (PCE). When a processor is eligible for dispatching, its PCE is on a dispatcher queue called the $READY queue. When a processor is waiting on an event, it is ineligible for dispatching. If the processor is waiting for a resource, its PCE is chained to the designated resource wait queue; if the processor is waiting for a specific event, its PCE is queued to itself via a specific event wait field, PCEEWF, in the PCE. The currently active processor's PCE is at the top of the $READY queue and is addressed by the $CURPCE field of the HCT. Major queue and event control fields in the HCT and SSVT are shown in Figure 5-46.

$WAIT

A processor that is currently active remains so until it issues a $WAIT macro instruction, at which time the dispatcher is entered at entry point $WAIT (for a specific event) or $WAITR (for a general resource). The dispatcher continues to dispatch eligible processors from the $READY queue until the queue is found to be empty. At this time control is passed to the dispatcher's resource posting routine, which looks for waiting PCEs that have been posted for events and are therefore eligible for dispatching. All eligible PCEs are moved to the $READY queue, and control passes back to the dispatcher.

SSVT (PLPA)

Offset       Field
0204         $SVECF - $$POST Event Control Field
0208-02FC    $$POST Elements for HASJES20 Processors

HCT (JES2 Address Space)

Rel. 4.1 Offset    Rel. 4.0 Offset    Field
03EC-042C          03D0-040C          PCE Addresses for JES2 Processors
0430               0414               $CURPCE - Current PCE Address
0434               0418               $HASPECF - Master Event Control Field
0438-0480          041C-0464          Wait Queue Header Addresses
0488               046C               $EWQABIT - Header for PCEs Awaiting Any Post
0490               0474               $READY - Header for PCEs Eligible for Dispatching

Figure 5-46. JES2 Queue Control Fields

$$POST

The JES2 dispatcher can be notified of work from within its own address space by the $POST macro. In addition, the dispatcher can be notified of work from other address spaces, or from subtasks within its address space, by the $$POST macro, which causes a HASPSSSM interface routine to cross-memory post the JES2 main task. In this case, the dispatcher post promulgation routine, which receives control when the resource posting routine runs out of work, propagates event posts from the SSVT fields used by HASPSSSM interface routines to the HCT fields in the JES2 address space. Here the resource posting routine can pick them up and mark the corresponding processors eligible for dispatching. Control then returns to the dispatcher.

JES2 WAIT

When the JES2 dispatcher determines that there is no more work to be done, it issues an MVS WAIT macro and waits to be posted for more work. When a $$POST macro is issued, the dispatcher post promulgation routine receives control and transfers the event notifications to the HCT, where they are picked up by the resource posting routine. The corresponding PCEs are transferred to the $READY queue.

Dispatcher Queue Structure

The dispatcher queues are double headed and double threaded. Each PCE (as shown in Figure 5-47) has a chain field to the following PCE entry and one to the preceding entry on the queue.
In the special case of the first PCE (referred to as PCE zero), the preceding entry field points to the queue header, offset so that the queue header appears to be a PCE itself. The last PCE has a following entry field which points back to the queue header. The queue header itself is double, with pointers to the first and last PCEs in the chain. An empty queue has both queue header fields pointing to itself, offset to appear as PCE zero. A PCE that is not on a queue has both its preceding and following entry fields pointing to its origin.

In addition to the chain fields, each PCE has a PCEEWF field, which contains information about the type of event the processor is waiting for. Figure 5-48 provides an example of a dump of JES2 processor queue chains.

[Figure 5-47 shows an HCT specific PCE queue header chained to PCE 0, and the pointer arrangements for a queue head, an empty queue, and a PCE not on a queue.]

Figure 5-47. JES2 Processor Control Element Relationships

JES2 Error Services

The following routines make up the JES2 error services:

• Disastrous Error Routine
• JES2 ESTAE Routine
• Catastrophic Error Routine
• JES2 Exit Routine
• Input/Output Error-Logging Routine

Disastrous Error Routine

This routine is entered at entry point $DSTERR in HASPNUC whenever a physical I/O error occurs, or whenever a logical error is detected when reading a job control table (JCT) or an input/output table (IOT). The symbol and module names are moved into the message from the $DISTERR macro expansion. A $WTO is issued to notify the operator of the error, and control is returned to the calling processor. The message to the operator is as follows:

$HASP096 DISASTROUS ERROR AT SYMBOL symbol IN MODULE module
[Figure 5-48 is a hexadecimal dump listing. It shows the JES2 processor queues (labelled $EWQ1) at dump addresses 075380 through 075480, and a JES2 processor control element at dump addresses 0A8680 through 0A8880. The figure carries the following annotations:]

• Queue is empty if first and last pointers point to queue - 48 (hex).
• PCEPCEA points to the next PCE waiting on this queue.
• PCENEXT points to the next PCE in the chain of all PCEs.
• Register 15 in the PCE is the resume point when the processor is dispatched.

Figure 5-48. Example Dump of JES2 Processor Queue Chains
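When following the chains of Figure 5-48 by hand, it can help to write the rule down once. The sketch below walks a processor queue from its header over a dictionary of fullwords taken from a dump. It assumes that PCEPCEA sits at PCE offset 48 (hex), inferred from the "queue - 48" annotation, and that PCEPCEB is the next fullword; the second offset is a guess and should be checked against the PCE mapping. The addresses in the sample data are invented.

```python
QUEUE_BIAS = 0x48   # figure annotation: an empty queue's pointers hold "queue - 48 (hex)"
PCEPCEA = 0x48      # inferred: forward chain field offset equals the bias
PCEPCEB = 0x4C      # ASSUMPTION: backward chain field is the following fullword


def walk_queue(memory, header):
    """Return the PCE addresses chained from a queue header, first to last.

    memory maps an address to the fullword stored there (values read from a dump).
    """
    pseudo_pce = header - QUEUE_BIAS      # the header viewed as if it were a PCE
    chain, seen = [], set()
    current = memory[header]              # first-PCE pointer of the double header
    while current != pseudo_pce:
        if current in seen:
            raise ValueError(f"chain loops at {current:06X}")
        seen.add(current)
        chain.append(current)
        current = memory[current + PCEPCEA]
    return chain


def back_chain_ok(memory, header):
    """Check that the backward pointers mirror the forward chain."""
    pseudo_pce = header - QUEUE_BIAS
    previous = pseudo_pce
    for pce in walk_queue(memory, header):
        if memory[pce + PCEPCEB] != previous:
            return False
        previous = pce
    return memory[pseudo_pce + PCEPCEB] == previous   # last-PCE pointer of the header


# Invented sample: one queue holding two PCEs, and one empty queue.
HEADER, EMPTY = 0x1000, 0x1100
PCE1, PCE2 = 0x2000, 0x3000
memory = {
    HEADER: PCE1, HEADER + 4: PCE2,
    PCE1 + PCEPCEA: PCE2, PCE1 + PCEPCEB: HEADER - QUEUE_BIAS,
    PCE2 + PCEPCEA: HEADER - QUEUE_BIAS, PCE2 + PCEPCEB: PCE1,
    EMPTY: EMPTY - QUEUE_BIAS, EMPTY + 4: EMPTY - QUEUE_BIAS,
}
```

A forward chain that never returns to the biased header address, or a backward pointer that disagrees with the forward chain, is a sign of a damaged queue.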
[Further annotations on the Figure 5-48 dump:]

• PCEPCEB points to the processor queue - 48 if this is the first PCE on the queue.
• PCEID 8107 = local printer.

JES2 should be quiesced and restarted as soon as it is practicable in order to recover any direct-access space that might have been lost as a result of the error.

JES2 ESTAE Routine

This routine is entered at entry point $ABEND in HASPNUC whenever JES2 abends for any reason. The catastrophic error routine is called with an error code of ABND, and control is passed to the JES2 exit routine.

Catastrophic Error Routine

This routine is entered at entry point $ERRORTN in HASPNUC whenever an unrecoverable error is discovered by JES2. Register 0 contains the address of a three or four character left-justified error code. The four byte error code field is moved into the operator message, which is then written on the operator's console (using a WTO macro instruction) as follows:

$HASP095 JES2 SYSTEM CATASTROPHIC ERROR. CODE=code.

Control is then passed to the JES2 exit routine. The error codes and their meanings are listed in OS/VS Message Library: VS2 System Codes.

JES2 Exit Routine

This routine is entered from the catastrophic error routine whenever JES2 is to terminate under abnormal circumstances, and whenever a $P JES2 command is successfully executed. When entered from the catastrophic error routine, the following WTOR message is issued:

$HASP098 ENTER TERMINATION OPTION

The routine waits for the operator to respond with one of the following replies:

• EXIT
• PURG
• DUMP text

If the reply is EXIT, the subsystem vector table (SSVT) termination complete flag and the $SVPOSTP byte are set on.
Control is returned to the system in the case of JES2 error detection by an SVC 3 instruction with register 15 set to 24, or in the case of a JES2 task abend by a branch to the location in register 14.

If the reply is PURG, the routine attempts to clean up commonly addressable control blocks. If the subsystem is the primary subsystem: the UCB attention index values are set to zero; tasks waiting for CANCEL/STATUS, process-SYSOUT, and storage cell expansion queues are posted; a system management facility (SMF) record may optionally be written; and JES2 subtasks are terminated and detached. Control is then returned to the system as with the EXIT option.

If the reply is DUMP, a $DUMP macro instruction is executed with the text (if any) used as the header. Processing continues as with the PURG reply.

If entry to the routine is through the normal execution of the $P JES2 command, processing is the same as with the PURG option for abnormal terminations, except that control is returned to the system by an SVC 3 instruction with register 15 set to zero.

Input/Output Error Logging Routine

This routine is entered whenever an unrecoverable input/output error occurs on a JES2 spooling volume, or whenever a line error occurs which may require the attention of the operator. A message to the operator is generated as follows:

• The channel status, channel command code, sense information, track address, and line status are retrieved from the IOB (pointed to by register 1) and formatted.
• The unit address and volume serial are obtained from the UCB.
• The device name (if applicable) is acquired from the device control table (DCT).

The format of the message is described in OS/VS Message Library: VS2 System Codes.
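The termination paths described under "JES2 Exit Routine" can be summarized as a small decision function. This is a reading aid only: the function name, the entry labels ("error", "abend", "p_jes2"), and the step strings are invented for the example, and the steps are paraphrases of the text rather than actual routine names.

```python
def exit_routine_steps(entry, reply=""):
    """List the actions taken by the exit routine, in order.

    entry: "error"  - JES2 detected a catastrophic error
           "abend"  - the JES2 task abended
           "p_jes2" - normal execution of the $P JES2 command
    reply: operator reply to $HASP098 (ignored for "p_jes2")
    """
    if entry == "p_jes2":
        return ["clean up as for PURG", "SVC 3 with register 15 = 0"]
    if entry not in ("error", "abend"):
        raise ValueError(f"unknown entry type: {entry}")

    option, _, text = reply.partition(" ")
    steps = []
    if option == "EXIT":
        steps.append("set SSVT termination complete flag and $SVPOSTP")
    elif option in ("PURG", "DUMP"):
        if option == "DUMP":
            steps.append(f"$DUMP with header {text!r}")
        steps.append("clean up commonly addressable control blocks")
    else:
        raise ValueError(f"unknown reply: {reply}")

    # Return to the system depends on how the termination was triggered.
    if entry == "error":
        steps.append("SVC 3 with register 15 = 24")
    else:
        steps.append("branch to address in register 14")
    return steps
```

For example, `exit_routine_steps("abend", "PURG")` ends with the branch through register 14, while the same reply after a JES2-detected error ends with SVC 3 and register 15 set to 24.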
JES2 $DEBUG Functions in a Multi-Access Spool Configuration

JES2 systems in a multi-access spool configuration share a single job queue, job output table, master track group map, and remote message spooling queues, all of which are kept on the JES2 checkpoint record. In addition, the checkpoint record contains shared system queue elements (QSEs) and other miscellaneous information needed for inter-system control. The checkpoint record is allocated to one processor at a time. Access to any part is controlled by a system's JES2 checkpoint processor, found in the HASPMISC module. The processor has four major sections:

• Initialization
• Read
• Write
• Release

Initialization

Initialization is executed once, when the processor is first activated by the JES2 dispatcher. If the debug option has been selected (&DEBUG=YES), storage is obtained, if possible, for debug copies of the job queue and the job output table (JOT). If sufficient storage is not available, message $HASP452 is issued and processing continues.

Read

This is executed at the beginning of each shared queue ownership period by systems in a multi-access spool configuration.

If parameter &DEBUG=YES, the job queue and JOT areas are compared with copies saved just prior to the last checkpoint write. A mismatch indicates an invalid alteration to these shared queue areas, and JES2 is terminated with a K01 catastrophic error. This step is skipped for the first read following initialization.

All checkpoint records are read from DASD. A lockout warning timer (parameter &WARNTIM) is started and the read is started by $EXCP. IOS performs the actual hardware reserve of the checkpoint device. The processor then waits for read completion or timer expiration. If the timer expires before read completion, warning message $HASP260 is issued and the warning timer is restarted.
The message is issued repeatedly, at warning intervals, until read completion occurs. If a permanent read error occurs, JES2 is terminated with a K02 catastrophic error.

After a successful read completion, the time stamps in this system's QSE, and in any QSE for which the $ESYS command had been entered, are compared with a time stamp saved in the HCT. A mismatch indicates that another system has illegally taken ownership of a QSE owned by this system, or that the reserve mechanism (hardware or IOS) has failed to prevent simultaneous access to the multi-access spool checkpoint records. JES2 is terminated with a K03 catastrophic error.

Write

This is executed repeatedly as a loop, in response to various requests by other processors or timers. In a multi-access spool environment, the loop operates only during an ownership or hold period.

If parameter &DEBUG=YES, the saved copies of the job queue and JOT areas are compared with the current job queue and JOT areas. If a record has been modified, but its corresponding checkpoint control byte does not indicate so, JES2 is terminated with a K05 abend. A K05 abend indicates failure to issue a $QCKPT or $#CKPT macro prior to executing a $WAIT macro, after modifying the job queue or JOT.

The current hardware time-of-day (TOD) clock is recorded in the HCT, along with this system's QSE and a QSE for which any $ESYS command had been entered. In multi-access spool systems, these stamps are verified following the next read operation to ensure the integrity of QSE ownership.

If parameter &DEBUG=YES, copies of the job queue and JOT areas to be written are saved. In multi-access spool systems, these are used prior to the next read operation to detect invalid updates to shared but not owned information.

The write is started by $EXCP and the processor waits for completion. If a permanent error occurs, JES2 is terminated with a K04 catastrophic error.
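The &DEBUG=YES comparison that leads to a K05 abend amounts to looking for records that differ from the saved copy without being flagged for checkpoint. A minimal sketch of that comparison follows. The record representation and the convention that a nonzero control byte means "flagged" are assumptions for illustration, not the JES2 layout.

```python
def unflagged_changes(saved, current, control_bytes):
    """Return the indexes of records that changed without being flagged.

    saved         - record images copied before the previous checkpoint write
    current       - the same records as they are now
    control_bytes - one byte per record; nonzero means "marked for checkpoint"
                    (this convention is assumed for the sketch)
    """
    return [
        index
        for index, (old, new) in enumerate(zip(saved, current))
        if old != new and control_bytes[index] == 0
    ]


# Invented sample: record 1 was changed and flagged, record 2 was changed silently.
saved = [b"JOB00001", b"JOB00002", b"JOB00003"]
current = [b"JOB00001", b"JOB0002X", b"JOB0003X"]
control_bytes = [0, 1, 0]
suspects = unflagged_changes(saved, current, control_bytes)
```

In this sample only record 2 is reported. A non-empty result corresponds to the condition the text describes: a processor modified the job queue or JOT and then waited without first issuing $QCKPT or $#CKPT.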
Following a successful completion, the $CKPTACT bit in the HCT is cleared.

Release

Release is executed only by systems in a multi-access spool configuration, at the end of each shared queue ownership period.

Miscellaneous Hints on JES2

Starting JES2 - Enqueue Wait on STCQUE

The installation can choose to start JES2 manually by changing MSTRJCL with AMASPZAP. When MSTRJCL is changed, JES2 parameters are entered on an operator-issued START command (which must be issued before MVS processing can occur). If the operator misspells JES2 on the START command (for example, by entering JES), a wait occurs with no indication other than STC (IEESB605) being exclusively enqueued on SYSIEFSD STCQUE behind IEEVWAIT, which does not release the resource until JES2 is initialized. Also note that the CSCB (command scheduling control block) pointed to by the parameter list to IEFJSWT is not formatted in the dump.

Therefore, if you encounter an enqueue wait on STCQUE, check that the START command was entered correctly. This also holds true for any normally started tasks (such as mounts or installation started tasks), which cannot be started until the primary job entry subsystem is started.

Figure 5-49. Major JES2 Control Blocks

Each entry gives the abbreviation, the name, the storage type, and the primary use.

Type: Organizational
SSVT - Subsystem Vector Table. Storage: CSA subpool 241. Use: Contains system pointers and parameters for HASPSSSM interface routines.
HCT - HASP Control Table. Storage: JES2 address space, HASPNUC module. Use: Major directory for HASJES20. Contains queue headers, control block pointers, module and entry pointers, and system parameters.

Type: Processor Management
PCE - Processor Control Element. Storage: JES2 address space. Use: Unit of JES2 dispatcher. Has associated work space and save areas for JES2 processors.

Type: Buffer Management
BUFFER - Buffer. Storage: JES2 address space. Use: Basic building block for JES2 control blocks (JCT, IOT, special).

Type: Job Management (Transient)
JCT - Job Control Table. Storage: SYS1.HASPACE; user address space during XEQ. Use: Primary job-oriented control block. Contains accounting information and pointers to other job information.
IOT - Input Output Table. Storage: SYS1.HASPACE; user address space during XEQ. Use: Contains job DASD information and PDDBs for input/output data sets.
PDDB - Peripheral Data Definition Block. Storage: SYS1.HASPACE; user address space during XEQ. Use: Describes a job input or output data set.
OCT - Output Control Table. Storage: SYS1.HASPACE; user address space during XEQ. Use: Contains output control records (OCRs) to describe data output records (forms, route, and so on).

Type: Job Management (Resident)
JQE - Job Queue Element. Storage: In JQB (JES2 address space). Use: Represents a job in process. Resides on the appropriate job queue chain.
JQB - Job Queue Buffer. Storage: SYS1.HASPCKPT and JES2 address space. Use: Contains the job queue chain headers, JQEs, and I/O parameters for checkpointing.
JOT - Job Output Table. Storage: SYS1.HASPCKPT and JES2 address space. Use: Central control block for all JES2 output. Contains three kinds of JOEs.
JOE - Job Output Element. Storage: In JOT (JES2 address space). Use: Represents an output data set by units of work, characteristics of data set, and class of output.

Type: Job Management (Miscellaneous)
SJB - Subsystem Job Block. Storage: CSA subpool 231. Use: Represents a job in process to OS/VS2; used by HASPSSSM interface routines.
SDB - Subsystem Data Block. Storage: User address space, subpool 229. Use: Used by HASPSSSM to control processing of a data set using the HASP Access Method (HAM).
PIT - Partition Information Table. Storage: CSA. Use: Completely describes a JES2 logical partition, its job classes, and its current state.
CAT - Class Attribute Table. Storage: JES2 address space. Use: Describes the attributes of a job class.
SCAT - SYSOUT Class Attribute Table. Storage: In SSVT space. Use: Describes output classes by print, punch, plot, and similar characteristics.

Type: Unit Management
DCT - Device Control Table. Storage: JES2 address space. Use: Represents a unit record device or RJE line. Contains all the information necessary to set up EXCP.
RAT - Remote Attribute Table. Storage: JES2 address space. Use: Consists of one entry per remote device, containing the attributes of the device.

Type: Multi-System Management
QSE - Shared Queue Control Element. Storage: SYS1.HASPCKPT and JES2 address space (JQB). Use: One per system of a multi-access spool environment, containing identification and cross-system communication parameters.

Subsystem Interface (SSI)

In the course of HASP/ASP installation, hooks were put into the OS and SVS operating systems to establish an interface. With the job entry subsystem (JES), an interface was designed to eliminate the need for these hooks. The subsystem interface (SSI) is primarily used to communicate with the job entry subsystem (either JES2 or JES3), but is flexible enough to communicate with any subsystem.

System Initialization Processing

At SYSGEN, the names of the primary job entry subsystem and of the secondary subsystems are listed on the SCHEDULR macro and put in the job entry subsystem names table (CSECT IEFJESNT). Alternatively, secondary subsystems may be specified in the subsystem names table in load module IEFJSSNT.

The master scheduler base initialization module (IEEVIPL) gives control to the subsystem interface initialization module (IEFJSINT). This module builds a subsystem communication vector table (SSCVT) for each unique name in the JES names table and in the subsystem names table. The SSCVTs are chained together with the primary job entry subsystem SSCVT first and the master subsystem SSCVT second. These are followed by the SSCVTs for the JES names table and for the subsystem names table, in the same order as the names appear in their respective tables. IEFJSINT also puts the name of the primary job entry subsystem in the JES control table (JESCT).
The subsystem vector table (SSVT) for the master subsystem is built and initialized. The subsystem allocation sequence table (SAST) is built for later use in allocating subsystem data sets. IEFJSINT then returns control to IEEVIPL.

The job entry subsystem builds and initializes its own SSVT when the system is initialized. All other subsystems must do likewise. A subsystem can be initialized as follows:

• By being started (for example: START JES2), or
• By having an initialization routine specified in the subsystem names table.

Additional subsystem initialization processing is performed by module IEEMB860. This module has two subsystem initialization functions. The first is to issue operator messages for errors that occurred during subsystem interface initialization; the second is to LINK to the subsystem initialization routines specified in IEFJSSNT.

Operator messages are issued by IEEMB860 rather than by IEFJSINT because IEFJSINT executes before the communications task has been initialized. Message IEE730I is issued to indicate that a duplicate subsystem has been specified in the subsystem names table. A subsystem is a duplicate if it is:

• A respecification of the primary job entry subsystem,
• A respecification of the MSTR subsystem, or
• A respecification of a subsystem that has been initialized previously.

Message IEE858I is issued if the subsystem names table (IEFJSSNT) could not be found; message IEE859I is issued for each subsystem initialization routine that could not be found.

It is the responsibility of the subsystem initialization routine to inform the operator of errors in that routine and to recover from them. If the subsystem initialization routine fails to recover from these errors, the next entry in IEFJSSNT is processed and the failing subsystem may not be completely initialized.
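The chaining order and the duplicate rules described above can be sketched as follows. This is only an illustrative model, not IEFJSINT or IEEMB860 logic: the function name and the subsystem names in the usage line are hypothetical, "already in the chain" stands in for "initialized previously", and the sketch assumes that a repeated name in the JES names table is skipped silently while a repeated name in the subsystem names table is reported as a duplicate (the IEE730I case).

```python
def build_sscvt_chain(primary_jes, jes_names, subsystem_names):
    """Model the SSCVT chain order: primary JES, MSTR, then table entries in order.

    Returns (chain, duplicates). 'duplicates' lists subsystem names table
    entries that respecify a name already on the chain (assumed IEE730I case).
    """
    chain = [primary_jes, "MSTR"]      # primary JES first, master subsystem second
    duplicates = []

    # Assumption: one SSCVT per unique name; repeats in the JES names table are skipped.
    for name in jes_names:
        if name not in chain:
            chain.append(name)

    # Repeats in the subsystem names table are treated as duplicate specifications.
    for name in subsystem_names:
        if name in chain:
            duplicates.append(name)
        else:
            chain.append(name)

    return chain, duplicates


chain, dups = build_sscvt_chain("JES2", ["JES2", "JES3"], ["SUBA", "MSTR", "SUBA", "JES2"])
print(chain)  # chain order as it would be searched
print(dups)   # names that would be reported as duplicates
```

When reading a dump, the same ordering gives a quick sanity check: the first SSCVT on the chain should name the primary job entry subsystem and the second should name MSTR.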
Subsystem Interface Major Control Blocks

The major control blocks of the subsystem interface are the JES control table (JESCT), the subsystem communications vector table (SSCVT), the subsystem vector table (SSVT), the subsystem information block (SSIB), the subsystem options block (SSOB), and the extension to the SSOB, or function dependent area. The following table summarizes each of these control blocks; all are described in the Debugging Handbook.

JESCT - Created by: SYSGEN. Subpool: nucleus. Key: 0. Size: 44 bytes. Pointed to by: CVT. Mapping macro: IEFJESCT.
  Function: Contains information needed by the subsystem interface and addresses of scheduler routines.
SAST - Created by: IEFJSINT. Subpool: 241. Key: 0. Size: see Note 1. Pointed to by: JESCT. Mapping macro: IEFJSAST.
  Function: Defines the order in which subsystems will be invoked to allocate subsystem data sets.
SSCVT - Created by: IEFJSINT. Subpool: 241. Key: 0. Size: 24 bytes. Pointed to by: JESCT. Mapping macro: IEFJSCVT.
  Function: Identifies each subsystem defined to the system and points to the SSVT for each subsystem.
SSVT - Created by: the subsystem owning the SSVT, at initialization of the subsystem. Subpool: any (determined by the subsystem). Key: any. Size: see Note 2. Pointed to by: SSCVT. Mapping macro: IEFJSSVT.
  Function: Contains the indications of the functions of a subsystem and the addresses of the routines that perform those functions.
SSIB - Created by: the user of the subsystem interface. Subpool: user's subpool. Key: any. Size: 36 bytes. Pointed to by: SSOB, JSCB. Mapping macro: IEFJSSIB.
  Function: Identifies the subsystem to the subsystem interface and passes information between the subsystem and its caller.
SSOB - Created by: the user of the subsystem interface. Subpool: user's subpool. Key: any. Size: 20 bytes. Pointed to by: SSWA, IEL. Mapping macro: IEFJSSOB.
  Function: The parameter list for the subsystem interface.
Function dependent area - Created by: the user of the subsystem interface. Subpool: user's subpool. Key: any. Size: variable. Pointed to by: SSOB. Mapping macro: IEFJSSOB.
  Function: Passes information to the function of the subsystem that the user wishes to invoke.

Notes:
1. The SAST size is 8 bytes plus 12 times the number of subsystems in the SSCVT chain.
2. The SSVT size is 260 bytes plus 4 times the number of functions supported by the subsystem. The minimum size is 264 bytes; the maximum is 1284 bytes.

Control block usage is shown in Figures 5-50 and 5-51.

Figure 5-50. Subsystem Interface Control Block Usage
(The figure shows the following chain. Location X'10' points to the CVT, and CVT + X'128' points to the JESCT. The JESCT contains JESSSREQ at X'14', JESSSCT at X'18', JESPJESN at X'1C', and JESSAST at X'30', which points to the SAST. JESSSCT points to the first SSCVT. Each SSCVT contains the identifier 'SSCT' at X'0', SSCTSCTA at X'4' (pointer to the next SSCVT), SSCTSNAM at X'8', and SSCTSSVT at X'10' (pointer to the SSVT). Each SSVT contains a 256-byte function matrix (SSVTFCOD), followed at X'104' by the function pointer matrix SSVTRTN, which can be a maximum of 256 words.)

Requesting Subsystem Services

To request subsystem services, a system routine enters the correct function code (see Subsystem Interface Summary in OS/VS2 System Logic Library) in the subsystem options block (SSOB), and the name of the desired subsystem in the subsystem information block (SSIB). The IEFSSREQ macro is then issued, causing control to pass to the subsystem interface routine IEFJSREQ. The specified function code and subsystem name indicate to the interface routine which subsystem routine is to receive control.

Invoking the Subsystem Interface

Storage is acquired for the SSIB, the SSOB, and the function dependent area of the SSOB, if required. The following entries are made in the SSOB header:

SSOBID - 'SSOB'.
SSOBLEN - The length of the SSOB header.
SSOBFUNC - The function ID of the function to be invoked.
SSOBSSIB - The address of the SSIB, or zero. Zero means that the life-of-job SSIB is to be used. Its address is in the active JSCB, field JSCBSSIB. The request will thus be directed to the subsystem that started the initiator under which the job is running. (See Figure 5-52.)
SSOBINDV - The address of the function dependent area or, if not needed by the function, zero.
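As an aid to reading an SSOB header in a dump, the header entries above can be modeled with a fixed 20-byte layout. The sketch below is an assumption-laden illustration rather than the IEFJSSOB mapping itself: the 20-byte size and the offsets of SSOBRETN (X'C') and SSOBINDV (X'10') come from this section, while the split of the word at X'4' into two halfwords (SSOBLEN, then SSOBFUNC) and the helper name `build_ssob_header` are assumptions made for illustration. Python's `cp037` codec is used to stand in for EBCDIC.

```python
import struct

# Assumed layout (big-endian, no padding):
#   X'0'  SSOBID   4 characters, 'SSOB' in EBCDIC
#   X'4'  SSOBLEN  halfword (assumed)   X'6' SSOBFUNC halfword (assumed)
#   X'8'  SSOBSSIB address of SSIB, or zero for the life-of-job SSIB
#   X'C'  SSOBRETN return code set by the subsystem
#   X'10' SSOBINDV address of the function dependent area, or zero
SSOB_HEADER = struct.Struct(">4sHHIiI")


def build_ssob_header(function_code, ssib_addr=0, indv_addr=0):
    """Return a 20-byte SSOB header image for the given function code."""
    ssob_id = "SSOB".encode("cp037")          # EBCDIC identifier
    return SSOB_HEADER.pack(ssob_id, SSOB_HEADER.size, function_code,
                            ssib_addr, 0, indv_addr)


header = build_ssob_header(function_code=2)   # function code 2 is the CANCEL request
print(header.hex().upper())
```

A zero in the SSOBSSIB word of such an image corresponds to the life-of-job SSIB case, so the SSIB must then be located through JSCBSSIB in the active JSCB.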
The following entries are made in the SSIB:

SSIBID - 'SSIB'.
SSIBLEN - The length of the SSIB.
SSIBSSNM - The name of the subsystem to which the request is being made.
SSIBJBID, SSIBDEST - If the function requires these fields.

The entries made in the function dependent area are:

length - The length of the function dependent area (first halfword).
* - Any fields required by the function.

Figure 5-51. Control Block Structure for Invoking Subsystem Interface
(The figure shows register 1 pointing to a parameter list that points to the SSOB header. The SSOB header contains SSOBRETN at X'C' and SSOBINDV at X'10'; SSOBINDV points to the function dependent area (SSOB extension), whose contents depend on the type of function. The SSOB header also points to the SSIB, which contains 'SSIB' at X'0', SSIBLEN at X'4', and SSIBSSNM at X'8'.)

Figure 5-52. Finding the SSIB for a Job When SSOB Pointer is Zero
(The figure shows TCBJSCB at TCB + X'B4' pointing to the JSCB, JSCBACT at JSCB + X'15C' pointing to the active JSCB, and JSCBSSIB at X'13C' in the active JSCB pointing to the SSIB, which contains 'SSIB' at X'0', SSIBLEN at X'4', and SSIBSSNM at X'8'.)
Note: The active JSCB may be the same as TCBJSCB.

Register 1 points to a one-word parameter list which points to the SSOB. (See Figure 5-51.)

Macro IEFSSREQ is invoked; it passes control to the routines that handle the subsystem interface request. The communications vector table (CVT) and the JES control table (JESCT) must be mapped if IEFSSREQ is invoked.

The subsystem interface returns a code in register 15. Possible return codes are:

0 - Successful completion - request went to subsystem
4 - Subsystem does not support this function
8 - Subsystem exists, but is not active
12 - Subsystem does not exist
16 - Function not completed - disastrous error
20 - Logical error (such as invalid SSOB format, incorrect length)

The field SSOBRETN in the SSOB contains a return code from the subsystem if the request was successful. The return code depends on the function being invoked (see the SSOB description in the Debugging Handbook).
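The two-level return code convention above (register 15 from the subsystem interface, SSOBRETN from the subsystem) can be captured in a small lookup when annotating dumps or traces. This is a minimal sketch; the table text is taken from the list above, and the function name `describe_ssi_result` is hypothetical.

```python
SSI_RETURN_CODES = {
    0: "Successful completion - request went to subsystem",
    4: "Subsystem does not support this function",
    8: "Subsystem exists, but is not active",
    12: "Subsystem does not exist",
    16: "Function not completed - disastrous error",
    20: "Logical error (such as invalid SSOB format, incorrect length)",
}


def describe_ssi_result(r15, ssobretn=None):
    """Describe a register 15 value; SSOBRETN is meaningful only when r15 is 0."""
    meaning = SSI_RETURN_CODES.get(r15, "Unrecognized return code")
    text = f"{r15}: {meaning}"
    if r15 == 0 and ssobretn is not None:
        # The subsystem's own return code depends on the function invoked.
        text += f" (SSOBRETN={ssobretn})"
    return text


print(describe_ssi_result(8))
print(describe_ssi_result(0, ssobretn=4))
```

The sketch deliberately ignores SSOBRETN for a nonzero register 15, since in that case the request did not complete successfully and the field cannot be trusted.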
Logic Flow Examples

This section provides an overall logic flow from a task making a request, through the subsystem interface to the subsystem, and then back to the task. Two examples are described.

Notifying a Single Subsystem

1. A task (TSO/cancel) wants to inform JES2 that a job is to be canceled.
2. The task creates an SSOB, an SSIB, and a function dependent area.
   a. The SSOB is filled in. A function code of 2 is used. (See OS/VS2 System Logic Library, Volume 3, for a complete function code list.)
   b. The SSIB is filled in. The subsystem name is JES2.
   c. The function dependent area is filled in with the information that the subsystem needs for this type of request.
3. Macro IEFSSREQ is invoked, which branches to module IEFJSREQ (the address of IEFJSREQ is in the JESCT). Register 1 points to a parameter list which points to the SSOB.
4. IEFJSREQ checks:
   a. Are the pointers to the SSOB and SSIB valid? If not, return with a return code of 16 in register 15.
   b. Are the formats of the SSOB and SSIB correct? If not, return with a return code of 20 in register 15.
   c. Find the requested subsystem's SSCVT. If not found, return with a return code of 12 in register 15.
   d. Find the requested subsystem's SSVT. If not found, return with a return code of 8 in register 15.
   e. Is the requested function code valid? If not, return with a return code of 16 in register 15.
   f. Is the requested function code supported by the requested subsystem? If not, return with a return code of 4 in register 15.
   g. Index into the SSVT and get the address of the function routine.
   h. Branch to the function routine. Register 0 = address of the SSCVT; register 1 = address of the SSOB.
5. Module HASPSSSM at label HOSCANC receives control. It is the function routine for JES2 for function code 2 (CANCEL request).
   a. Process the request and place a return code in the SSOB (SSOBRETN).
   b. Return codes for this function code are as follows:
      0 - CANCEL completed.
      4 - Job name not found.
      8 - Invalid JOBNAME/JOB ID combination.
      12 - Job not canceled - duplicate jobnames and no job ID given.
      20 - Job not canceled - job is on output queue.
      24 - Job ID with invalid syntax for subsystem.
      28 - Invalid CANCEL request. Cannot cancel an active TSO user or a started task.
6. Control is then returned to the requesting task directly from the function routine. The task then examines register 15 and SSOBRETN and acts accordingly.

Notifying All Active Subsystems

1. A task wants to notify all active subsystems of a WTO message.
2. The task creates an SSOB and a function dependent area. No SSIB need be created if the task's life-of-job SSIB has the master subsystem's name (MSTR) in it. If it does not, and that SSIB is used, only one subsystem would be notified. The SSOB and the function dependent area are filled in. A function code of 9 is used. (A list of all function codes is in OS/VS2 System Logic Library, Volume 3.)
3. Macro IEFSSREQ is invoked, which branches to IEFJSREQ (its address is in the JESCT). Register 1 points to a parameter list which points to the SSOB.
4. IEFJSREQ checks:
   a. Are the pointers to the SSOB and SSIB valid? If not, return with a return code of 16 in register 15.
   b. Are the formats of the SSOB and SSIB correct? If not, return with a return code of 20 in register 15.
   c. Find the requested subsystem's SSCVT. If not found, return with a return code of 12 in register 15.
   d. Find the requested subsystem's SSVT. If not found, return with a return code of 8 in register 15.
   e. Is the requested function code valid? If not, return with a return code of 16 in register 15.
   f. Is the requested function code supported? If not, return with a return code of 4 in register 15.
   g. Index into the SSVT and get the address of the function routine.
   h. Branch to the function routine. Register 0 = address of the SSCVT; register 1 = address of the SSOB.
5. The SSIB points to the master subsystem, so module IEFJRASP is the function routine that receives control.
   a. IEFJRASP makes a copy of the SSIB.
   b. For each SSCVT, the name of the subsystem is copied into the SSIB copy. (The master subsystem's SSCVT is skipped.) IEFSSREQ is then invoked for each subsystem.
   c. The highest return code from the subsystems is placed in the requesting task's SSOB, and the lowest return code from the subsystem interface is put in register 15.
6. Control is then returned to the requesting task directly from the function routine. The task then examines register 15 and SSOBRETN and acts accordingly.

Debugging Hints

• Paging must be possible at the time the subsystem interface is entered, since the code for the subsystem interface may not already be paged in at the time the call is made.
• For the same reason, the processor must not be physically disabled.
• The mapping macro IEFJSSVT maps the SSVT. Only the master subsystem's SSVT matches the mapping exactly. JES2 and JES3 SSVTs have additional material appended to the end of the area mapped by IEFJSSVT. For JES3, the mapping macro is IATYSVT. For the contents of the JES2 SSVT, refer to OS/VS2 JES2 Logic.
• Some functions requested of the master subsystem cause the function to be broadcast to every active subsystem. These function codes are:
  4 - Notify the subsystem of end-of-task.
  8 - Notify the subsystem of end-of-address space.
  9 - Notify the subsystem of a WTO message.
  10 - Notify the subsystem of an operator command.
  14 - Notify the subsystem of a delete operator message (DOM).
  32 - Notify the subsystem of a failing START command.
• Function code 9 is used in an SSOB with the pointer to the SSIB always zero. This causes the SSIB pointer in the JSCB to be used. If that SSIB is for the master subsystem, the request is given to every active subsystem. If the SSIB is not for the master subsystem, the request is given only to the subsystem named in the SSIB.
• If a subsystem verification request (function code 15) is made to the master subsystem (field SSIBSSNM in the SSIB contains 'MSTR'), and the name in the SSIBJBID field of the SSIB is not that of a job entry subsystem, then upon return from the subsystem interface, field SSIBSSNM will contain the name of the primary job entry subsystem. A job entry subsystem is defined as a subsystem that can provide its own sysout services. This is indicated by bit SSCTUPSS being off in the subsystem's SSCVT.

Recovery Termination Manager (RTM)

The recovery termination manager (RTM) cleans up system resources when a task or address space terminates, either normally or abnormally.

Functional Description

Logically, RTM consists of four related processes.

1. RTM1 attempts recovery for software or hardware errors; it is entered via the CALLRTM macro instruction issued by supervisory routines. Functional recovery routines (FRRs) are processed in this logical phase.
2. RTM2 performs normal and abnormal task termination for both system and problem program routines. The ABEND macro (SVC 13) requests RTM2 services.
3. Address space termination provides normal and abnormal address space termination for supervisory routines. The CALLRTM macro instruction is used to request this function.
4. RTM support functions, such as error recording, formatting of dumps*, and creating recovery control blocks for error exit processing.

*Note: RTM generates an error id that ensures that information recorded in SYS1.LOGREC concerning a problem can be readily correlated with SVC dump information concerning the same problem. See 'Error ID' later in this topic.
Work Areas

For details of RTM work areas, see "Use of Recovery Work Areas for Problem Solving" in Section 2.

Major RTM Modules

RTM1, which is part of the nucleus, comprises four modules:

1. IEAVTRT1 - RTM entry point processor
2. IEAVTRTM - RTM1 mainline
3. IEAVTRTS - system recovery manager
4. IEAVTRTR - RTM1 recovery routines

RTM2, which resides in the link pack area (LPA), is entered via SVC 13. The mainline for RTM2 comprises the following three modules:

1. IEAVTRT2 - initialization
2. IEAVTRTC - controller
3. IEAVTRTE - exit handler

Other important RTM2 modules are:

• IEAVTAS1 - pre-exit processing
• IEAVTAS2 - post-exit processing
• IEAVTAS3 - control recovery
• IEAVTSKT - task termination purges
• IEAVTMRM - RTM2 resource manager
• IEAVTRML - installation resource manager list

Process Flow

The following charts depict the process flow for:

• Hardware error processing
• Normal end-of-task termination
• Abnormal end-of-task termination
• Retry
• Cancel
• Address-space termination

Hardware Error Processing

Depicted here is the processing for a hard machine check in a global routine that has FRR recovery. The chart shows the interfaces and control flow between the machine check handler (MCH) and RTM1 for both hardware error processing and the resulting software recovery attempt by the FRR. It indicates that software recovery continues in task mode because, in this example, the FRR does not recover from the error.

The use of extended error descriptors (EEDs) allows the LOGREC buffer to be available for further possible machine checks, and is the mechanism for passing information to RTM1 and RTM2. The information in the global system diagnostic work area (SDWA) used by RTM1 recovery is obtained from the EEDs. RTM2 obtains an SDWA, but also uses the EEDs as its source of error data to be passed to recovery routines.
RTM1 uses the RTM processor-related work save area (WSACRTMK) to alter the registers and the PSW that MCH reloads, thereby determining whether MCH resumes the interrupted process (soft error) or reenters RTM1 for software recovery (hard error).

(Process flow chart: hardware error processing. Legend: pointer, control flow, data flow. The chart contains the following elements.)

MCH
• Processing a storage check in a global routine that has established an FRR.
• Invokes RTM1 for software repair: CALLRTM TYPE=MACHCK. The LOGREC buffer holds the information about the hardware error.

RTM1 - IEAVTRT1
• Sets up the environment for MACHCK entry.
• Calls IEAVTRTH (hardware error processor).
• Returns to the caller (MCH) with a pointer to WSACRTMK.

IEAVTRTH
• Preserves hardware data in EEDs (RTM's internal control blocks): the registers and PSW at the time of the MACHCK, and repair status information.
• Calls the appropriate repair routine.
• Records the hardware error to LOGREC.
• Establishes the environment for re-entry to RTM in WSACRTMK (registers and PSW for re-entry to RTM1).
• Passes back a pointer to the re-entry data (stored in WSACRTMK).

MCH
• Loads registers and PSW from WSACRTMK (the MCH registers and PSW as altered by RTM1), causing re-entry to RTM1 (type MACHCK re-entry) for software recovery.

RTM1 - IEAVTRT1
• Sets up the environment for MACHCK re-entry.

IEAVTRTM
• Attempts system recovery, since the error (MACHCK) occurred in a global routine.
• Sets up the task for entry to RTM2 by altering RBOPSW (MACHCK information is recorded for the task through TCBRTM12).
• Exits to the dispatcher.

IEAVTRTS
• Routes to the FRR to attempt recovery for the routine that suffered the MACHCK.

FRR (passed the SDWA, built from the EEDs)
• Records the error.
• Returns with a 'continue-with-termination' indicator (percolates).

Dispatcher
• The task is eventually dispatched and executes the SVC 13, which causes RTM2 task recovery/termination services to be invoked.

Normal Task Termination

EXIT and parts of RTM2 make up this function. The flow shows how EXIT is entered and then reentered to complete task termination; it also provides a perspective of the RTM2 functions related to normal termination of a task.

(Process flow chart: normal task termination. Legend: pointer, control flow, data flow. The chart contains the following elements.)

Task issues SVC 3.

IGC003 - EXIT
1. Determine the task's eligibility for normal task termination:
   • EXIT was issued by the last RB on the RB queue.
   • TCBEOT = zero.
2. Issue SVC 13 to pass control to RTM2.

RTM2 (RTM2WA communications are for processing within the RTM2 load module)

IEAVTRTC
• No abnormal conditions to handle.
• Pass control to the task termination processor.

IEAVTSKT
• Free resources via LINK to RTM2 and user-defined resource managers, passing the resource manager parameter list (RMPL): LINK to all resource managers defined in IEAVTRML, and to the system resource managers.
• Indicate the proper control flow in the RTM2 work area: if this is the last task in memory, indicate address space termination processing; otherwise, indicate that it is not the last task.

Exit handling
1. If ASXBTCBS indicates that one task is left in the memory, then address space termination is necessary. Set the task non-dispatchable and issue CALLRTM TYPE=MEMTERM to schedule an SRB which initiates address space termination processing.
2. If only normal task termination is required:
   • Free the RTM2 work area.
   • Set the PRB resume PSW to point to an SVC 3 instruction.
   • Set the end-of-task indicator for EXIT in the TCB (TCBEOT).
   • Branch to exit prolog to get rid of the SVRB. (Prolog deletes the SVRB.)

IGC003 - EXIT (reentered)
1. Since the end-of-task indicator has been set (TCBEOT), BALR to the resource managers for cleanup of the task:
   • IEAVTSBP - dequeue/free SCBs owned by the RB or task.
   • IEAPPGMX - free programs.
   • IEAQSPET - IEAVGCAS - free storage.
2. BALR to IGC062R1 (IEAVEED0):
   • Free TCB and RB core.
   • Dequeue the TCB.
   • Either schedule the end-of-task exit routine for the task, or post the mother task if the task was attached with the ECB operand.
3. Exit to the dispatcher. Normal task termination is complete.

Abnormal Task Termination

Shown here is the logic flow during abnormal termination of a non-critical nature. If the error is not recoverable at a particular task level, that task and its subtasks are removed. If the scope of the abend is "Step," then the entire job step is removed. Optionally, serviceability information (dumps and software error records) is supplied to the user.

(Process flow chart: abnormal task termination. Legend: pointer, control flow, data flow. The chart contains the following elements.)

ABEND

IEAVTRT2
• Obtain, initialize, and queue the RTM2 work area.
• Save a copy of the trace table.

IEAVTRTC
• Validity check and process the dump options.

IEAVTRTC (control is returned from IEAVTAS1; stacking)
• ABTERM SMC subtasks with 100.
• Wait for the subtasks in RTM2 to complete.
• Set the subtasks non-dispatchable.
• Purge the resources (resource managers).
• Update the RB queue for exit.

Normal exit.

Retry

Shown here is the flow through RTM2 when processing a potentially recoverable error. The recovery exit is supplied environmental data that describes the error (for example, completion code, register contents, PSW, system state at time of error) to aid in diagnosing the error. To retry, the resume PSW in each request block (RB) up to and including the retry RB is modified.
The retry address supplied by the exit is placed in the resume PSW field of the retrying RB, and all RBs between the retry RB and the RTM2 RB have their resume PSW set to either exit prologue or SVC 3. When RTM2 eventually returns to the system, supervisor-assisted linkage will cause the retry address in the retry RB to be given control.

(Process flow chart: retry. Legend: pointer, control flow, data flow. The chart contains the following elements.)

IEAVTRTC
• Process and validity check the dump options.
• Select an exit (SCB).
• Obtain and initialize the SDWA.
• Perform I/O requests and block asynchronous exits, if requested.

User exit

IEAVTAS2 (control from IEAVTAS1 via IEAVTRTC)
• Diagnose the error.
• Select options.
• Track the SDWA.
• Record, if requested.
• Save the dump options.

IEAVTRT2
• Route control to IEAVTRTE.

IEAVTRTE
• Free the saved copy of the trace table, if available (RTM2TRTB10).
• Free the RTM2 work area.
• Clear the TCB flags.
• Branch to the exit prologue.

Abnormal exit.

Cancel

Shown here is the flow of control through RTM when a job is canceled. The CANCEL request is indicated by specific completion codes set in the TCB by RTM1 (code = 'X22'). The CANCEL process is distinctive in that it is considered a strictly unrecoverable situation. Normal termination procedures are abandoned in favor of creating an express path through termination. However, term exits are given control.

(Process flow chart: cancel. Legend: pointer, control flow, data flow. The chart contains the following elements.)

RTM1

IEAVTABD
• Process the EEDs.
• SDUMP/SLIP considerations.

IEAVTRTC
• Determine the type of dump (SYSABEND or SYSMDUMP).
• Process the dump data set for the current task, and SNAP.
• Find the daughters, and SNAP if not SYSMDUMP.
• Reset the TCB flags in the current task and the daughters.
• Set the subtasks non-dispatchable.
• Process the subtasks and the current task: set abend bits, halt I/O, and purge resources.

IEAVTRTC
• Initialize term exit processing until all term exits have been entered.

Resource managers (task termination) - IEAVTSKT
• Initiate task termination until each subtask has exited.
• Find the deepest subtask.
• Detach the subtask.
• Installation resource managers.
• IBM resource managers.

Exit
• Free the saved trace table, if available.
• Exit prologue (IEAVEEXP).

Normal exit.

FORCE Command

The FORCE command is designed to remove a job or TSO user from the system after the CANCEL command has failed to do so. For example, a job is writing to a DASD unit when the unit is suddenly made unavailable to the system; in this case, the CANCEL command is frequently unable to remove the job. If CANCEL does fail to remove the job, then FORCE can be used. However, FORCE does not use normal termination or normal cleanup routines, and is intended to be used only as an alternative to another IPL.

When FORCE is issued, the job's address space is terminated and any task running in the address space is terminated. If a job is running under an initiator, the initiator is also terminated.

FORCE processing is dependent on the recovery termination manager (RTM), and on the command scheduling control block (CSCB), which contains a new bit definition in CHAFORCE.

When the FORCE command is entered on a console having system authority, control is given to the CANCEL-FORCE processor, which verifies that the command syntax is correct. The processor then scans the CSCB chain to see if the job exists and is cancelable. A bit in the job's CSCB is then checked to see if a CANCEL has been issued for this job. If not, a call is made to the message module to issue message IEE838I - 'CANCELABLE - ISSUE CANCEL BEFORE FORCE', and control is returned to the system.
If CANCEL has been issued, a CALLRTM TYPE=MEMTERM is issued. The message module is called to issue message IEE301I - 'FORCE COMMAND ACCEPTED', and control is then returned to the system. If an error is found in the command syntax, or the job was either not found or was non-cancelable, the message module issues an appropriate message and control is returned to the system.

The FORCE processor uses the current CANCEL serialization code. The CSCB chain resource is serialized via ENQUEUE on SYSIEFSD Q10. Because the holder of CSCB Q10 must be non-swappable while holding the resource, the FORCE command processor issues a SYSEVENT DONTSWAP before issuing the ENQUEUE on Q10.

For additional information concerning the use of FORCE, see Operator's Library: OS/VS2 MVS System Commands.

Address-Space Termination

The process of terminating an address space (memory) is one that cannot be isolated to one task, module, or logical unit of code. The many parts of the process, depicting control flow and data flow, are shown here.

Since the MEMTERM process circumvents all task recovery and task resource manager processing, its use is restricted to a select group of routines which can determine that task recovery and resource manager cleanup is either not warranted or will not successfully operate in the address space being terminated. It therefore is restricted to the following users:

1) Paging supervisor when it determines that it cannot swap in the LSQA for an address space.
2) Address space create when it determines that an address space cannot be initialized.
3) The RTM or the supervisor control FRR when they determine that uncorrectable translation errors are occurring in the address space.
4) The RTM2 when it determines that task recovery and termination cannot take place in the current address space.
5) The RCT when it determines that the address space is permanently deadlocked.
6) The RTM2 when all tasks in the address space have terminated (IEAVTRTE). This is the only requestor of normal address space termination (that is, COMPCOD=0).
7) The auxiliary storage management recovery routine when it suffers an indeterminate error from which it cannot recover while handling a swap-in or swap-out request.
8) The auxiliary storage management recovery routine when it determines that uncorrectable translation errors are occurring while ASM is using the control register of another address space to update the address space's LSQA.
9) SVC 34 in response to a FORCE command.
10) VTIOC in response to an FSTOP reply.

Note: Since callers 4, 5, and 6 above are task-related and running in the address space to be terminated, they will set themselves non-dispatchable after issuance of CALLRTM.

[Figure: Address-space termination control flow and data flow. Recoverable text follows.

Step 1 - Identify the requesters
Step 2 - The request format
Steps 3, 4 - Initiate the request
Steps 5, 6, 7 - Process the request

The request format: CALLRTM TYPE=MEMTERM, ASID=, COMPCOD=0 (normal) or nonzero (abnormal). Entry is by BALR to IEAVTRT1.

IEAVTRT1:
• Via branch table, go to the 'TYPE' processor.

IEAVTRTM (TYPE=MEMTERM):
• Store the completion code in the ASCB with matching ASID (or current).
• Put the ASCB of the address space to be terminated on the address space queue (the queue of address spaces to be terminated).
• Schedule the SRB to post the address space termination task in the master address space (use of the SRB routine is serialized by compare and swap).

IEAVTRT1:
• Return to the caller.

Dispatcher (IEAVEDS0), global SRB dispatcher, gives control to the address space termination SRB:
• Post RTCTMECB. This activates the address space termination task in the master address space.

IEAVTMTC (address space termination controller task in the master address space; a resident task that remains inactive until posted for work):
• Dequeue the last ASCB on the address space termination queue (queue modification serialized via compare and swap).
• Set the address space indicated by the ASCB non-dispatchable.
• MP - Wait for task and SRB activity for this address space to stop in the other processor.
• Get the local lock, CMS lock, and dispatcher lock.
• Free locks.
• Call SVC I/O purge. Purge I/O for that address space.
• Call RSM (real storage management) to free all real and auxiliary storage.
• Attach a subtask to handle the remainder of the purges for the address space (pass the ASCB in R1).
• If the address space termination ASCB queue pointer is not zero, repeat the preceding processing steps for the next ASCB. Otherwise, the task waits for work (wait on RTCTMECB).

IEAVTMTR (address space terminator processor task, attached by IEAVTMTC):
• Set R0 to point to this terminating address space's ASCB. Indicate the MEMTERM options in R1. Issue SVC 13 to invoke the services of RTM2.

RTM2 (SVC 13), IEAVTERM:
• Perform address space purges.
• Return to caller. EXIT to the dispatcher.

Data areas shown: the RTCT (RTCTFASB, the ASCB queue pointer, and RTCTMECB) and the queue of ASCBs, each identified by its ASID, that are to be terminated.

Legend: pointer; control flow; data flow.]

Error ID

Error ID ensures that problem information recorded in SYS1.LOGREC can be easily correlated with SVC dump information concerning the same problem. The error id function is invoked whenever the RTM is entered to process an error condition. The RTM determines if the entry is to process a recursive or a new error, a new error being one unrelated to a previous error. If an error occurs during the processing of a previous error, the error id has the same sequence number as the original error, but is given a new time stamp. In this way, the sequence numbers show that the errors are related, and the time stamps show the history of error processing.
The RTM generates an entirely new error id:

1. Upon entry to RTM1 for a machine-check error (module IEAVTRTH).
2. Upon entry to RTM1 in SLIH mode for non-recursive error processing.
3. Upon entry to RTM2 when there has been no previous error processing in RTM1.

Control passed to RTM2 by RTM1 does not result in a new error id if RTM1 has already generated one.

RTM1 maintains the error id in either the SDWA or an EED. RTM2 maintains the error id in its work area, RTM2WA. At an appropriate point in the error processing, the error id is moved to the SDUMP work area (pointed to by RTCTSDWK), where it is stored until processed by SDUMP. The correct error id is passed to SYS1.LOGREC when a software or hard machine check error record is written by RTM. Soft machine check error records do not contain an error id because no subsequent software recovery takes place following a "soft" error.

IFCEREP1 recognizes and prints the error id in the LOGREC software or machine check record. AMDPRDMP recognizes and prints the error id as part of the header information for an SVC dump, formatted as follows:

ERRORID FOR THIS DUMP = SEQyyyyy CPUzz ASIDaaaa TIMEhh.mm.ss.t

where:

yyyyy       represents the sequence number of the error id
zz          represents the error id logical processor
aaaa        represents the error id ASID
hh.mm.ss.t  represents the error id time stamp converted to read as hours, minutes, seconds, and tenths of seconds

If an error id is not available for a dump (indicated to AMDPRDMP by zeroes in the error id header field), the message "NO ERRORID ASSOCIATED WITH THIS DUMP" is printed where the error id is normally found.

To further increase the usefulness of error id, message IEA911A is changed to include the error id when an SVC dump is taken. The message reads:

IEA911A COMPLETE/PARTIAL DUMP ON SYS1.DUMPxx/UNIT=ddd ERRORID=SEQyyyyy CPUzz ASIDaaaa TIMEhh.mm.ss.t

where xx and ddd have the same meaning as in the current message.
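The correlation that the error id makes possible can be sketched in a few lines. The following Python fragment is a minimal illustration only, not part of any IBM-supplied tool. It assumes the error id appears in captured message or dump header text exactly in the SEQyyyyy CPUzz ASIDaaaa TIMEhh.mm.ss.t layout shown above; the function names and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Pattern for the printed error id layout: SEQyyyyy CPUzz ASIDaaaa TIMEhh.mm.ss.t
# (field widths are taken from the layout in the text; contents are kept as strings).
ERRORID_RE = re.compile(
    r"SEQ(?P<seq>\w{5})\s+CPU(?P<cpu>\w{2})\s+ASID(?P<asid>\w{4})\s+"
    r"TIME(?P<time>\d{2}\.\d{2}\.\d{2}\.\d)"
)


def parse_errorid(line):
    """Return the error id fields found in a line, or None if there is no error id."""
    match = ERRORID_RE.search(line)
    return match.groupdict() if match else None


def group_by_sequence(lines):
    """Group time stamps by sequence number.

    Errors that share a sequence number are related; the ordered time stamps
    show the history of error processing for that sequence number.
    """
    groups = defaultdict(list)
    for line in lines:
        errorid = parse_errorid(line)
        if errorid:
            groups[errorid["seq"]].append(errorid["time"])
    # Fixed-width, zero-padded time stamps sort correctly as strings.
    return {seq: sorted(times) for seq, times in groups.items()}


# Hypothetical sample text, invented for illustration.
sample = [
    "ERRORID FOR THIS DUMP = SEQ00012 CPU01 ASID000A TIME14.02.31.7",
    "IEA911A COMPLETE DUMP ON SYS1.DUMP00 ERRORID=SEQ00012 CPU01 ASID000A TIME14.02.30.2",
    "NO ERRORID ASSOCIATED WITH THIS DUMP",
]
history = group_by_sequence(sample)
```

In this sketch, two entries that share a sequence number are reported together with their time stamps in ascending order, which mirrors the way related errors are recognized by hand.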
SVC Dump Debugging Aids

The SVC dump function of RTM is invoked when the SDUMP macro is issued. SVC dump produces dumps of system errors on a SYS1.DUMPxx or user-defined data set. SVC dump also produces abend dumps requested by SYSMDUMP DD statements.

Items that are important for you to understand when debugging errors in SVC dump processing are described in the following topics:

• Important SVC Dump Entry Points
• SVC Dump Error Conditions
• SYS1.LOGREC Entries Produced for SVC Dump Errors
• Control Blocks Used to Debug SVC Dump Errors
• Resource Cleanup for SVC Dump

Important SVC Dump Entry Points

The BRANCH= parameter on the SDUMP macro determines the SVC dump entry points and mainline processing to be used.

BRANCH=YES Option

Entry point IEAVTSDX is used for branch-entry SVC dumps. IEAVTSDX creates a summary dump in a real storage buffer (if the SUMDUMP option is requested on the SDUMP macro), schedules one or more SRBs to invoke dump task (IEAVTSDT) processing for the requested address spaces, and then returns control to the caller. The branch-entry option is requested by many FRR routines and some ESTAE routines. This option is also requested when ACTION=SVCD is specified on the SLIP command.

BRANCH=NO Option

Entry point IEAVAD00 is used for the SVC entry to SVC dump. For scheduled dump requests (ASID or ASIDLST is specified on the SDUMP macro), IEAVAD00 calls IEAVTSDX, which schedules one or more SRBs to invoke dump task (IEAVTSDT) processing for the requested address spaces, and then returns control to the caller. For synchronous dump requests (ASID and ASIDLST are not specified on the SDUMP macro), IEAVAD00 processes the dump and then returns to the caller. The SVC entry is requested by many ESTAE routines.
It is also requested by the DUMP command (as a scheduled dump), and by the abend dump processor (IEAVTABD) for SYSMDUMP DD statements (as a synchronous dump).

SVC Dump Error Conditions

If the SVC dump function encounters an unexpected abend during its processing, it produces a software SYS1.LOGREC record and, if possible, continues taking the dump. Expected program checks can occur when SVC dump is checking whether a virtual page that is to be dumped is valid and assigned. These program checks do not result in SYS1.LOGREC entries.

SVC dump issues abends 133 and 233 if it detects an unauthorized caller or invalid input parameter. In these cases, LOGREC entries are not created and retry is not attempted.

SVC dump issues a C0D abend for some unexpected errors during its processing. In this case, retry is attempted.

SYS1.LOGREC Entries Produced for SVC Dump Errors

The best starting place for debugging SVC dump problems is the SYS1.LOGREC entries contained in the in-storage SYS1.LOGREC buffer or in SYS1.LOGREC, because a dump of the SVC dump problem is generally not available. (SVC dump does not take a dump of its own problems.) Many SVC dump problems can be debugged from the SYS1.LOGREC entries alone. However, more complex problems may require a stand-alone dump that can be taken after a SLIP trap with ACTION=WAIT has matched. These problems include loops and failures to free critical system resources.

Fixed Data

The fixed data that SVC dump places in the system diagnostic work area (SDWA) for recording on SYS1.LOGREC is:

SDWAMODN - Load module name (generally IGC0005A, which is the SVC 51 load module in SYS1.LPALIB).

SDWACSCT - CSECT (microfiche) name, which can be any SVC dump module name. For details of SVC dump module functions, interfaces, and flow, refer to OS/VS2 System Logic Library.

SDWAREXN - Recovery routine name, which is given as a label.
This label is not always within the failing CSECT shown in SDWACSCT.

The following table shows the label of the recovery routine, the name of the containing CSECT, and a description of the recovery processing.

Label      CSECT      Description

DTESTAE1   IEAVTSDT   ESTAE routine for scheduled SVC dumps that are executing under the dump task (IEAVTSDT) in the requested address space. IEAVTSDT also establishes SDESTAEX, which can percolate to DTESTAE1.

SCHFRR     IEAVTSDX   FRR routine for branch-entry to SVC dump and for the timer disabled interrupt element (DIE) used to free SVC dump's real storage buffer. This FRR is established by IEAVTSDX.

SDESTAEX   IEAVAD00   ESTAE routine for mainline SVC dump processing. This ESTAE is established by IEAVAD00 and IEAVTSDT.

SDFRRRTN   IEAVAD00   FRR routine for mainline SVC dump processing. This FRR is established by SVC dump modules when a lock is held and a retry is needed in the locked state. IEAVAD00 and IEAVTSDT are the main users of this FRR.

SUMFRFRR   IEAVTSSD   FRR routine for SUMFRR routine processing. This FRR is established by the SUMFRR routine.

SUMFRR     IEAVTSSD   FRR routine for summary dump processing invoked for branch-entered SVC dumps. This FRR is established by IEAVTSSD. If SUMFRR abends, SUMFRFRR receives control; and if it percolates, SCHFRR receives control.

Variable Data

The variable data that SVC dump places in the SDWA for recording to SYS1.LOGREC is:

SDWAVRA - Contains the 24-byte recovery routine parameter area if DTESTAE1, SDESTAEX, SDFRRRTN, SUMFRFRR, or SUMFRR is the recovery routine name in SDWAREXN. This area contains bits that indicate the resources held, other status bits, the retry address, the base register value, and the address of the SVC dump work area (ERRWKADR at X'8'). The contents of the parameter area are mapped by the IHASDERR macro. The common name of the work area is ERRWORK.
To obtain the offset into the failing module, subtract the base register field in ERRWORK (ERBASEI at X'C') from the address in the failing PSW (found in the SDWANXT1 field at X'6C').

Control Blocks Used to Debug SVC Dump Errors

The following control blocks contain key information that can be used to debug problems in SVC dump routines.

• Address Space Control Block (ASCB)
• Recovery Termination Control Table (RTCT)
• SVC Dump Work Area (SDWORK)
• Summary Dump Work Area (SMWK)

Address Space Control Block (ASCB)

The ASCB contains the address of the TCB for the SVC dump task (IEAVTSDT) in the ASCBDUMP field (at offset X'60'). In this TCB, the TCBEXSVC bit (low-order bit at X'CC') is set on while the SVC dump task is executing.

Recovery Termination Control Table (RTCT)

The RTCT is pointed to by the CVTRTMCT field (at X'23C') in the CVT. It contains SVC dump information including status bits, an array that describes the SYS1.DUMPxx data sets, and an array that contains information for the address spaces to be dumped.

SVC Dump Work Area (SDWORK)

The SDWORK is pointed to by the RTCTSDPL field (at X'9C') in the RTCT. It contains most of the reentrant storage used by SVC dump including register save areas, CCWs, and the I/O buffer that contains the 4104-byte SVC dump records before they are written to the dump data set.

Summary Dump Work Area (SMWK)

The SMWK is pointed to by the RTCTSDSW field (at X'B4') in the RTCT and contains fields used when a summary SVC dump was requested or defaulted (via the SUMDUMP option on the SDUMP macro). It includes counter fields that show how many real frames are used for the real storage buffer that holds the summary dump created for branch-entry callers of SVC dump. The count of real frames held (field SMWKFRHD at X'C6') is zeroed after the summary dump is written to the dump data set and the frames are returned to RSM.
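The pointer chain just described (CVT to RTCT, then RTCT to SDWORK and SMWK) can be followed mechanically once a storage image is at hand. The following Python fragment is a minimal sketch under stated assumptions, not a description of AMDPRDMP or of any IBM routine. It assumes a hypothetical flat image in which the byte index equals the virtual address, 24-bit addresses held in big-endian fullwords, and a halfword width for SMWKFRHD (the text gives only its offset). The offsets themselves are the ones quoted in this topic, and the high-order bit of RTCTSDPL is treated as the dump-in-progress indicator that this manual describes under "Resource Cleanup for SVC Dump".

```python
import struct

ADDR_MASK = 0x00FFFFFF  # assumption: 24-bit addresses in the low-order three bytes


def word(storage, addr):
    """Read a big-endian fullword from the (hypothetical) flat storage image."""
    return struct.unpack_from(">I", storage, addr)[0]


def halfword(storage, addr):
    """Read a big-endian halfword from the storage image."""
    return struct.unpack_from(">H", storage, addr)[0]


def locate_svc_dump_blocks(storage, cvt_addr):
    """Follow CVT -> RTCT -> SDWORK / SMWK using the offsets quoted in the text."""
    rtct = word(storage, cvt_addr + 0x23C) & ADDR_MASK  # CVTRTMCT
    sdpl = word(storage, rtct + 0x9C)                   # RTCTSDPL
    smwk = word(storage, rtct + 0xB4) & ADDR_MASK       # RTCTSDSW
    return {
        "rtct": rtct,
        "sdwork": sdpl & ADDR_MASK,
        # High-order bit of RTCTSDPL on: an SVC dump is in progress.
        "dump_in_progress": bool(sdpl & 0x80000000),
        "smwk": smwk,
        # SMWKFRHD at X'C6'; the halfword width is an assumption.
        "frames_held": halfword(storage, smwk + 0xC6),
    }
```

A nonzero frames_held value together with dump_in_progress still on is the kind of combination worth checking when a failure to free the summary dump buffer is suspected.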
Resource Cleanup for SVC Dump

Resource cleanup performed by SVC dump includes: setting the system dispatchable, setting tasks dispatchable, freeing the summary dump real storage buffer, deleting the TQE for the storage buffer, restarting the system trace, writing end-of-file on the dump data set, dequeueing the dump data set, and turning off indicators that an SVC dump is in progress. These resources are cleaned up by SVC dump's mainline processing or recovery routines. In special cases, the following routines also perform resource cleanup.

If an address space terminates during SVC dump processing, SVC dump's MEMTERM exit (IEAVTSDR) cleans up the resources related to that address space (such as setting tasks dispatchable). If the address space was the last to be processed, then all resources are cleaned up and the SVC dump in-progress indicators (high-order bits in the CVTSDBF (at X'24C') and RTCTSDPL (at X'9C') fields) are turned off so that additional dumps can be taken.

SVC dump also uses a timer DIE exit that is contained in module IEAVTSDX at label SCHDIE. This exit ensures that the SVC dump real storage buffer is returned to RSM if SVC dump encounters an error during processing (such as a loop).

Communications Task

The communications task (comm task) handles communications between console operators and the system (user programs and system routines). The types of communications that the communications task handles are:

• Operator commands from a console as a result of an attention interrupt (on local devices).
• Output to the operator caused by the write-to-operator (WTO), write-to-operator with reply (WTOR), and delete-operator-message (DOM) macro instructions.
• External interrupts that are caused by the operator pressing the interrupt key on the operator control panel. The communications task switches the master console's function to an alternate console.
• Automatic console switching from a failing console to its alternate when an unrecoverable I/O error occurs.
• Console switching as the result of the VARY CHANNEL, VARY CPU, or VARY MSTCONS command.
• Console switching as a result of a processor failure in a multiprocessing system as a part of alternate CPU recovery (ACR).

Before processing WTO, WTOR, and DOM macro requests, the communications task passes control to the job entry subsystem (JES) responsible for the job issuing the request. The JES exit routine may suppress the message, or modify the message text or routing code.

Multiple console support (MCS) is a standard feature that supports up to 99 consoles. With MCS, messages can be routed to up to 15 different functional areas according to the type of information in the message.

Device independent display operator console support (DIDOCS) is an optional feature that provides uniform console services for various display consoles.

Communications Task (continued)

Functional Description

The wait service routine (IEAVMQWR) determines the functions to be performed by the communications task. It is given control by the dispatcher (supervisor control routine IEAVEDS0) after one of the communications task's event control blocks (ECBs) has been posted. Upon each entry to the wait service routine, the entire list of communications task ECBs is tested from top to bottom in priority sequence. The posted ECB identifies the service that will be performed by the communications task. As each service is completed, control is returned to the wait service routine and the entire list of ECBs is again tested for an active ECB.
When no active ECBs are found, the wait service routine issues the WAIT macro, which places the communications task in the wait state until the next communications task ECB is posted.

In addition to testing for posted ECBs, the wait service routine checks other indicators (represented by control bits). The communications task ECBs and control bits are located in the unit control module (UCM) and unit control module entries (UCMEs). Figure 5-53 lists and describes the ECBs and control bits in the sequence that the wait service routine makes the tests.

ECB or Control Bit          Function

UCMARECB (in UCM)           Alternate CPU recovery - the process of switching from one processor to another in multiple processor configurations. The communications task switches consoles as required.

UCMXECB (in UCM)            External interrupt - switches the master console functions from the current master console to the next available alternate console. This results when the operator presses the interrupt key on the console control panel.

UCMAECB (in UCM)            Attention interrupt - prepares the console (from which the interrupt was received) to accept operator input.

UCMECB (in UCME)            I/O processing complete - indicates a message has been sent to or received from a console. Results from the interrupt that an I/O device causes after performing each I/O operation.

UCMPF (bit in UCME)         Console output pending - indicates a message is queued and ready for some console. Results if (1) one message is queued for several consoles, or (2) a console is busy when a WTO or WTOR message is queued for that console.

UCMSYSJ (bit in UCM)        Hardcopy output pending - indicates a message is queued for hardcopy output and ready to be sent to a data set.

Note: Before the following ECB is processed, the communications task tests the WQEs and may issue message IEA405E (80% of WQEs in use), IEA404A (limit of WQEs reached), or IEA406I (shortage relieved).

UCMOECB (in UCM)            Queue message for output - prepares the message posted by the WTO or WTOR macro for output to the appropriate consoles.

UCMSYSI (bit in UCM)        Cleanup WQE chain - eliminates WQEs marked for deletion by system functions (such as task termination).

UCMDECB (in UCM)            Delete operator message - indicates that a DOM macro has been issued to (1) delete a WTOR message that the operator has not responded to, or (2) delete a WTO message when the issuer has determined that the requested action was performed.

UCMNPECB (in UCM prefix)    Write NIP messages to buffer - indicates that NIP messages stored during nucleus initialization can be written.

Figure 5-53. Sequence of Communications Task Processing

Communications Task Control Blocks

The following control blocks are used by the communications task:

UCM    Unit control module - created at system generation. Contains pointers to the control blocks and routines that support the communications task.

UCME   Unit control module entry - created at system generation for each generated device. Contains information about the device including attributes, pointer to the UCB, I/O ECB, and message queue for the device.

WQE    Write queue element - created for each WTO or WTOR request. Contains information about the request including message text and routing code.

ORE    Operator reply element - created for each WTOR. Contains information about the reply portion of a WTOR request including the buffer to receive the reply.

CQE    Console queue element - created for each console that is to receive a message. Contains information about messages queued to particular consoles.

EIL    Event indication list - created at system generation. Contains pointers to the various ECBs in the UCM and UCME.

RDCM   Resident display control module - created at system generation. Contains information about a display console.

TDCM   Pageable display control module - created at system generation. Contains DIDOCS work and save areas, pointers to related modules, and the screen image.

CXSA   Communications extended save area - used to communicate among communications task modules.

Refer to Figure 5-54 for the relationship of these control blocks.

[Figure 5-54. Communications Task Control Block Structure. Recoverable relationships: the CVT (CVTCUCB) points to the UCM; the communications task TCB is shown with the EIL, which holds pointers to the ECBs in the UCM and UCME. UCM fields shown: UCMPXA, UCMLSTP, UCMWTOQ, UCMWQEND, UCMRPYQ, UCMVEA, UCMVEL, UCMXB. UCMWTOQ points to the first WQE and UCMWQEND to the last; WQEs are chained through WQELKPA and hold the message in WQETXT. UCMRPYQ points to the first ORE; OREs are chained through ORELKP, point to their associated WQE through OREWQE, and hold the reply buffer in ORERPY or OREOPBUF. UCMVEA and UCMVEL delimit the UCMEs (first through last). In a UCME, UCMOUTQ points to the CQEs; CQEWQEA holds the pointers to the associated WQEs, and word 6 points to the next group of CQEs. The RDCM points to the TDCM through DCMADTRN, and the TDCM points to the CXSA through DCMCXSVE.]

Communications Task (continued)

Debugging Hints

Hints for debugging various problems are described in this topic.

Console Not Responding to Attention

If a console is not responding to an attention interrupt, check the following:

• The console attention processor (IEAVVCRA) may not be posting the attention ECB (UCMAECB) in the UCM. The communications task will not process the attention interrupt until the attention ECB is posted. This normally occurs when the console is inactive (UCMUF indicator in the UCME is off), a CLOSE is pending for the device (UCMCF indicator in the UCME is on), or the device does not support interrupts (UCMIF indicator in the UCME is off).
• The attention processor may not be setting the attention pending indicator (UCMAF in the UCME) on for the console causing the interrupt. It is turned on when the attention ECB (UCMAECB) is posted.
• If the attention pending (UCMAF) and busy (UCMBF) indicators in the UCME are both on, the attention interrupt will not be processed until an I/O processing complete interruption is received from the console. I/O processing is performed by a specific device support processor (DSP). The busy indicator (UCMBF) is turned on while the console is waiting for the completion of an I/O operation and is turned off when the I/O completion operation is processed.

Enabled Wait State

If the communications task is in an enabled wait state, check the following:

Normal Case: The communications task has no work to do; that is, no communications task ECBs have been posted. Check the following ECBs (see Figure 5-53 for descriptions and locations of the ECBs):

UCMXECB
UCMARECB
UCMAECB
UCMNPECB
UCMOECB
UCMECB
UCMDECB

WQE Limit Reached: The system limit for WQEs or OREs has been reached (indicated by message IEA404A).

• Check the following fields in the UCM:

UCMWQNR - indicates the current number of WQEs in the system.
UCMWQLM - indicates how many WQEs can be built.
UCMRQNR - indicates the current number of OREs in the system.
UCMRQLM - indicates how many OREs can be built.

• Check the following indicators in the UCM prefix:

UCMSYSI - indicates that cleanup of the WQE chain is needed; that is, eliminate WQEs marked for deletion. This indicator is checked by the wait service (IEAVMQWR) and device service (IEAVMDSV) routines; and it is set on by the DOM processing (IEAVMDOM), wait service (IEAVMQWR), console queueing (IEAVMWSV), multiple-line processing (IEAVMWTO), and WTO/WTOR processing (IEAVVWTO) routines.

UCMSYSJ - indicates that at least one message needs to be sent to the hardcopy log. Possibly, the WQE space is filled with WQEs (messages) that need to be sent to the hardcopy log. This indicator is referenced by the wait service (IEAVMQWR) and device service (IEAVMDSV) routines, and it is set on by the wait service (IEAVMQWR) and console switching (IEAVSWCH) routines.

UCMSYSM - indicates a failure in a composite console. This indicator is used by the console switching (IEAVSWCH) routine.

UCMSYSO - indicates a dummy attention interrupt. This indicator is checked by the wait service (IEAVMQWR) routine. It is set on by the WTO/WTOR processing (IEAVVWTO) routine when the system log is not available and a WTL (write-to-log) is changed to a WTO macro.

Disabled Wait State

The communications task issues only one wait state code, code 007. This code is issued during nucleus initialization when a master console is not available to the system. See wait state code 007 in OS/VS2 System Initialization Logic.

Messages or Replies Lost

Messages and replies can be lost or routed incorrectly if the WQE, ORE, or CQE control blocks are not chained correctly.

• To ensure that the WQE chain is intact, check the following:

- In the UCM, check fields:
  UCMWTOQ - points to the first WQE on the chain.
  UCMWQEND - points to the last WQE on the chain.
- In each WQE, check:
  WQELKPA - points to the next WQE on the chain.
  WQEORE - indicates that an ORE exists for this WQE.

• To ensure that the ORE chain is intact, check the following:

- In the UCM, the UCMRPYQ field points to the first ORE.
- In each ORE, check:
  ORELKP - points to the next ORE on the chain.
  OREWQE - points to the WQE associated with this ORE.

• To ensure that the CQE chain is intact, check the following:

- In the UCME (for each console), the UCMOUTQ field points to the first group of six CQEs.
- In each group of CQEs, the CQEWQEA field in the last (sixth) CQE points to the next group of CQEs on the chain.
Note: Each CQE is one word: one byte for control bits, and three bytes for a pointer. The CQEs are built in groups of six. The first five CQEs point to WQEs and the sixth points to the next group of CQEs.

- In each group of CQEs, the CQEWQEA field in each of the first five CQEs points to its associated WQE.
- In each CQE, the CQEFLAG byte contains the control bits.

No Messages on One Console

If messages are not being received on a specific console, check the following:

• The device busy indicator (UCMBF) in the failing console's UCME may be on. A message is not processed until an I/O processing complete interruption is received from the console. I/O processing is performed by a specific device support processor (DSP). The busy indicator (UCMBF) is turned on while the console is waiting for the completion of an I/O operation and is turned off when the I/O completion operation is processed.
• If the console is not busy, ensure that the CQE chain for the console is intact. (Refer to the previous topic "Messages or Replies Lost".)
• If the CQE chain is valid, then check for unusual status in the failing console's UCME and UCB.

Messages Routed to Wrong Console

The console queueing routine (IEAVMWSV) queues messages for specific consoles and builds the CQE chain. If messages are routed to the wrong console, then:

• Ensure that the CQE chain is correct for the failing console. (Refer to the previous topic "Messages or Replies Lost".)
• Check the routing codes for each console. The UCMRTCD field in each console's UCME defines the routing codes for the respective consoles.
• Check the routing codes for the messages that are being incorrectly routed:
- In the WTO/WTOR WQE, the WQEROUT field contains the routing codes for the message.
- In a major multiple-line WQE (for MLWTO), the WMJMRTC field contains the routing codes.
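The comparison of a message's routing codes with each console's routing codes can be sketched as a bit-mask test. The following Python fragment is an illustration only. It assumes, and this manual does not state it, that UCMRTCD and WQEROUT are both 16-bit masks in which the high-order bit stands for routing code 1; the helper names and the console names are hypothetical.

```python
def routing_mask(codes, width=16):
    """Build a routing code mask; the high-order bit is routing code 1 (assumed layout)."""
    mask = 0
    for code in codes:
        if not 1 <= code <= width:
            raise ValueError("routing code out of range: %d" % code)
        mask |= 1 << (width - code)
    return mask


def routing_codes(mask, width=16):
    """List the routing codes that are set in a mask."""
    return [code for code in range(1, width + 1) if mask & (1 << (width - code))]


def consoles_for_message(wqerout, consoles):
    """Return the consoles whose UCMRTCD mask shares at least one code with WQEROUT."""
    return [name for name, ucmrtcd in consoles.items() if ucmrtcd & wqerout]


# Hypothetical console definitions, invented for illustration.
consoles = {
    "MASTER": routing_mask([1, 2, 10]),
    "TAPE": routing_mask([3, 5]),
}
receivers = consoles_for_message(routing_mask([5]), consoles)
```

If a console receives a message although its mask and the message's mask have no bit in common, the routing codes themselves are not the cause, and the CQE chain for that console is the next thing to examine.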
Communications Task (continued)

Truncated Messages

If message text is being truncated (the length of the message text is shortened), then:

• The message may exceed the maximum allowable bytes for console messages.
• The console operator may have requested that time stamps and/or job names appear with the messages. Check the following indicators in the UCME for the failing console:

UCMDISPI - indicates that messages are to appear with both time stamps and job names.
UCMDISPJ - indicates that only job names are to appear with messages.

Console Switching

Console switching is performed by the IEAVSWCH routine for the following error conditions:

• An I/O error occurs on a console. The failing console's function is automatically switched to its alternate (or, if none is available, to the master console). Check the I/O interrupt ECB (UCMECB) in the failing console's UCME. Note that successful I/O completion is indicated by X'7F' in the first byte of the ECB.
• An abnormal termination occurs in the device support processor (DSP) that services the failing console. The failing console's function is automatically switched to its alternate (or, if none is available, to the master console). Check the appropriate DSP in load module IGC0007B.
• A processor failure occurs in a multiprocessing environment as a part of alternate CPU recovery (ACR). Consoles are switched as required. Check the alternate CPU recovery ECB (UCMARECB) in the UCM.

DIDOCS Trace Table

A DIDOCS trace table exists in the pageable DCM (display control module IEETDCM) beginning at field DCMTRACE. The trace table contains the identifiers of up to 16 of the last DIDOCS modules to receive control on the console represented by the pageable DCM. After each DIDOCS module receives control, it places a two-byte identifier in the trace table. The first byte of the identifier states whether the module is an "E" module (such as IEECVETA) or an "F" module (such as IEECVFTA). The second byte of the identifier is the last character in the module name. For example, the identifier for IEECVETA is "EA" and the identifier for IEECVFT1 is "F1". (An exception to this rule occurs during DIDOCS recovery processing. Entries to the ESTAE routine in IEECVET1 are indicated by the identifier "ES".)

When DIDOCS is entered for the first time to perform an operation, the first DIDOCS module to receive control (module IEECVET1) places two bytes of asterisks in the trace table before it stores its identifier. The asterisks signal the beginning of a DIDOCS operation.

DIDOCS-In-Operation Indicator

At offset X'11F' in a console's pageable DCM (display control module IEETDCM) is a field labeled DCMMCSST. When DIDOCS is processing, bit DCMUSE (X'80') in DCMMCSST is set on. This bit remains on during any SVC processing initiated by DIDOCS (SVC 34, GETMAIN, FREEMAIN, and EXCP). DIDOCS turns the bit off when DIDOCS exits (via BR 14).

DIDOCS Locking

DIDOCS uses two fields (CSAXB and CSAXC) in the communications extended save area (CXSA) to control locking during DIDOCS operations. The two fields are used as follows:

• When the lock is available:
- Field CSAXB contains the address of the subroutine that obtains the lock.
- Field CSAXC contains the address of a BR 14 instruction.
• After a DIDOCS module obtains the lock, the subroutine that obtains the lock:
- Sets field CSAXB to the address of a BR 14 instruction.
- Sets field CSAXC to the address of the subroutine that releases the lock.
• After the DIDOCS module releases the lock, the subroutine that releases the lock:
- Resets field CSAXB to the address of the subroutine that obtains the lock.
- Resets field CSAXC to the address of a BR 14 instruction.
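The way the two fields trade values can be modeled with a pair of routine references. The following Python fragment is a toy model only, not the DIDOCS code: the class and method names are hypothetical, a callable stands in for each address, and a method that does nothing stands in for the BR 14 instruction. Serialization between processors is outside the scope of the sketch.

```python
class DidocsLockModel:
    """Toy model of the CSAXB / CSAXC field swapping described in the text."""

    def __init__(self):
        self.held = False
        # Lock available: CSAXB -> obtain subroutine, CSAXC -> BR 14.
        self.csaxb = self._obtain
        self.csaxc = self._br14

    def _br14(self):
        """Stand-in for a BR 14 instruction: return at once, changing nothing."""
        return False

    def _obtain(self):
        """Obtain the lock, then swap the fields so that further obtains are no-ops."""
        self.held = True
        self.csaxb = self._br14
        self.csaxc = self._release
        return True

    def _release(self):
        """Release the lock, then restore the fields to the lock-available state."""
        self.held = False
        self.csaxb = self._obtain
        self.csaxc = self._br14
        return True


lock = DidocsLockModel()
first = lock.csaxb()     # obtains the lock
second = lock.csaxb()    # CSAXB now points at the BR 14 stand-in: nothing happens
released = lock.csaxc()  # releases the lock and restores both fields
```

When reading a dump, the same reasoning applies in reverse: a CSAXB that contains the address of a BR 14 instruction indicates that some DIDOCS module held the lock at the time of the dump.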
When the lock is already held by a DIDOCS module (field CSAXB contains the address of a BR 14), any attempt by another DIDOCS module to obtain the lock results in a no-operation (NOP).

Appendix A: Process Flows

This appendix describes the flow of various MVS processes. These processes are described in the following chapters:

• RSM Processing for Page Faults
• Swapping
• EXCP/IOS
• GETMAIN/FREEMAIN
• VTAM Process
• TSO

RSM Processing for Page Faults

This chapter describes the important aspects of the RSM component's page-fault processing. Figure A-1 outlines the major functions in this processing.

During page fault processing, several important tests are made. The following describes what these tests are, where they are made, and what they mean during the course of the RSM page fault process.

IEAVPIX Tests

IEAVPIX performs the following:

• Checks the PGTE to ensure that PGTPVM is still on after the SALLOC lock has been obtained. This is done because in an MP environment the other processor might have validated the PGTE (turned PGTPVM off) between the time this processor page-faulted and the time the SALLOC lock was obtained.
• Checks the PGTE to ensure that PGTPAM is on. If it is not, this is a logical protection violation.

IEAVGFA Tests

IEAVGFA performs the following:

• Checks XPTE to ensure that XPTDEFER is off. If it is on, some paging request (PCB) for this page has been deferred. (The PCB is on the GFA defer queue anchored in the PVT.) Normally, the fact that a paging request is currently outstanding is indicated by PFTPCBSI, but in the defer case there is no frame and therefore no PFTE is yet associated with the request.
• Checks to see if RBNx from the PGTE is non-zero. If it is non-zero, the last used PFTE can be located.
Once that PFTE is located, you can determine if the frame has been used for another purpose since last backing the requested VBN.
• Checks the XPTE for XPTFREL. If XPTFREL is on, the RBNx of the PGTE is zero and there is a paging request (PCB) on an I/O queue for this page that must be "related to." Because only a swap produces this condition, which applies only to stage 2 working set pages, this bit can only be on in private area XPTEs.
• If reclaim is successful, checks PFTONAVQ. If it is on, the reclaimed frame is available immediately; that is, no page I/O can be in progress for it. If it is off, the frame must be on either a local queue or the common queue and a PCB must be on an I/O queue.

PIC 11
IEAVEPC   Determines type of program interrupt
IEAVFP    Locates PGTE and XPTE
IEAVEDS0  Saves status and selects next unit of work to run
IEAVPIX   Formats page fault PCB
IEAVPCB   Allocates a PCB from the PCB free queue
IEAVPFTE  Moves PFTE from AFQ to LFQ or CFQ
IEAVSUSP  (physically located in IEAVEPC) Suspends page faulting TCB/SRB and allocates an SSRB for an SRB mode page fault
IEAVGFA
IEAVPCB   Queues PCB on I/O queue
Note: Circled numbers indicate the sequence of processing.
Figure A-1. Page Fault Process Flow (Part 1 of 2)

ILRPAGIO  Passes AIA (ASM's request element) to ASM. If ASM is not active, schedules an SRB to the ASM monitor.
ILRPAGCM  ASM I/O complete processor (obtains SALLOC lock)
          (Normally given control via IECIOSCN)
IEAVPIOP  Runs with the SALLOC lock held in the address space interrupted at I/O completion. Schedules SRB to IEAVIOCP and validates PGTE for common area pages. Marks PCB I/O complete (zeroes PCBVBN for common area PCBs).
IEAVIOCP  Runs in page faulting address space in SRB mode with SALLOC lock and sometimes also with local lock. Validates private area PGTE.
IEAVRSET  (physically located in IEAVEPC) Sets dispatchable suspended TCB/RB or schedules SSRB
IEAVPCB   Returns PCB to PCB free queue
Note: Circled numbers indicate the sequence of processing.
Figure A-1. Page Fault Process Flow (Part 2 of 2)

• Checks to see if PFTPCBSI is zero. If it is, there is an inconsistency and IEAVGFA issues a C0D abend to record the error. If PFTPCBSI is on, the VBNO value is used to select the PCB I/O queue to be searched and a PCB relate function is performed.
• If an old frame with or without I/O in progress cannot be found, a frame is selected from the front of the AFQ. The PFTE is filled in and it is queued on either the common or the local frame queue. The XPTE (XPTXAV) is now checked to see if the paging data sets contain a copy of this virtual page. If XPTXAV is on, a page-in operation is started; if it is off, the frame is cleared to zeroes.
• If the AFQ is empty, the request is deferred by placing the PCB on the PCB defer queue (PVTGFADF) of the PVT. The XPTDEFER flag is set in the XPTE.
• If a page-in is needed, the RBNO of the allocated frame is placed in the AIA (which is always physically adjacent to the PCB) and the AIA is passed to ASM. Processing then exits as shown by steps 9, 10 and 11 in Figure A-1.

IEAVPIOP Tests

IEAVPIOP receives control from ASM and is passed the AIA when I/O has completed.
IEAVPIOP checks for an I/O error and marks the PCB I/O complete. If necessary, IEAVPIOP indicates an I/O error in the PCB. IEAVPIOP checks PCBFREAL to determine if the reason for the page-in still exists. If PCBFREAL=1, the page-in has been NOPed for some reason (such as FREEMAIN) and the frame is sent to the AFQ. If PCBFREAL=0, the PGTE must be validated. IEAVPIOP validates common area PGTEs but must schedule IEAVIOCP to validate private area PGTEs because they are in the LSQA of the page-faulting address space. If IEAVPIOP validates the common area PGTEs, PCBVBN is set to 0 to prevent a second validation by IEAVIOCP. IEAVIOCP will be scheduled if PCBRESET=1. PCBRESET is still one unless the PCB has been NOPed.

IEAVIOCP Tests

IEAVIOCP runs in SRB mode and gets the local lock according to SRBPARM. SRBPARM is set earlier by IEAVOPBR (a subroutine of IEAVPIOP) if IEAVIOCP will need the local lock. IEAVOPBR is called from several places in RSM; its sole function is to determine if IEAVIOCP will need the local lock and to schedule IEAVIOCP.

IEAVIOCP searches the local and common PCB queues looking for I/O-complete PCBs. Once found, IEAVIOCP calls IEAVRSET for any I/O-complete PCBs with PCBRESET=1. The reset function (IEAVRSET in IEAVEPC) is responsible for making the suspended work (TCB/SSRB) redispatchable. IEAVIOCP validates the PGTE for any I/O-complete PCB with a non-zero PCBVBN, with PCBFREAL=0, and without an I/O error (PCBIOERR=0). When this is done, IEAVIOCP returns the PCB to the free queue.

Because IEAVIOCP is queue-driven, it might not be able to get the local lock when it requests it. In such a case, it can be held in suspension by a page faulter whose PCB is on the queue IEAVIOCP is working on. Therefore, up to two SRBs can be scheduled for IEAVIOCP at one time.
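The tests that IEAVIOCP applies to each PCB can be read as a small decision rule over the PCB flags. The Python sketch below restates that rule for illustration only. The `Pcb` class, the `iocp_actions` function, and the action strings are hypothetical names, not RSM interfaces, and the relative order of validation and reset is an assumption taken from the order of the boxes in Figure A-1.

```python
from dataclasses import dataclass


@dataclass
class Pcb:
    """Hypothetical stand-in for the PCB flags named in the text."""
    io_complete: bool   # PCB marked I/O complete by IEAVPIOP
    pcbvbn: int         # PCBVBN; zeroed by IEAVPIOP for common area PCBs
    pcbfreal: bool      # PCBFREAL=1: the page-in was NOPed (such as by FREEMAIN)
    pcbioerr: bool      # PCBIOERR=1: an I/O error was recorded
    pcbreset: bool      # PCBRESET=1: suspended work must be reset


def iocp_actions(pcb: Pcb) -> list:
    """Return the actions this sketch takes for one PCB found on a queue."""
    if not pcb.io_complete:
        return []                       # I/O still in progress: leave it queued
    actions = []
    # Validate only when all three conditions from the text hold.
    if pcb.pcbvbn != 0 and not pcb.pcbfreal and not pcb.pcbioerr:
        actions.append("validate PGTE")
    if pcb.pcbreset:
        actions.append("call IEAVRSET")
    actions.append("return PCB to free queue")
    return actions


private_fault = iocp_actions(Pcb(True, 0x1A2, False, False, True))
common_fault = iocp_actions(Pcb(True, 0, False, False, True))
nopped = iocp_actions(Pcb(True, 0x1A2, True, False, False))
in_progress = iocp_actions(Pcb(False, 0x1A2, False, False, True))
```

In this sketch a common area PCB skips validation because IEAVPIOP has already zeroed PCBVBN, which is the mechanism the text describes for preventing a second validation.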
If IEAVIOCP does not hold the local lock and discovers an I/O-complete PCB that needs to be reset and for which reset requires the local lock (PCBLLHLD=0, PCBSRBMD=0, PCBPEX=1, an unlocked TCB page fault), it can call IEAVOPBR to reschedule itself (exit to dispatcher). IEAVIOCP continues its scan of the PCB queues, doing any work possible before it exits to the dispatcher.

Swapping

This chapter describes the major considerations and decisions of the swapping processes (swap-in and swap-out).

Swap-in Process

The following descriptions follow the sequence of the circled numbers in Figure A-2.

SRM schedules IEAVSWIN and passes it the address of the ASCB in SRBPARM.

IEAVSWIN obtains working-set size (SPCTWSSZ) + 1 PCBs. It then scans the SPCT LSQA entries and fills in a PCB for each entry. Next, IEAVSWIN scans the fix entries. For private area fix entries, it builds a stage one PCB. For common area fixes, it adds the SPCT fix count to the PFTE fix count. For common area fixes not in storage, it builds a PCB. Next, IEAVSWIN scans the SPCT segment entries and builds a PCB for each bit map entry. It then returns unused PCBs to the PCB free queue and calls IEAVGFA.

If enough frames are not available for the stage one pages, IEAVGFA returns a code of eight to IEAVSWIN and sets PCBRETRY. IEAVSWIN notifies SRM via a SYSEVENT SWINFL to try the swap-in later.

IEAVGFA allocates frames for both stage one and stage two PCBs and then calls ASM to start swap-in I/O.

After swap-in I/O completes, the IEAVSWIN root exit IEAVSIRT is called by IEAVPIOP with stage one PCBs chained from the root PCB. IEAVSWIN does the following:

• Updates PFTFXCT if any fix counts are greater than 255
• Sets ASCBSTO
• Fills in SGTEs in non-translate mode
• Fills in PGTEs in non-translate mode

IEAVSIRT calls IEAVPCB to free the root and all stage one PCBs.
IEAVSIRT calls ASCBCHAP to put the ASCB back on the ASCB queue.

IEAVSIRT calls STATUS to start both quiesceable and non-quiesceable SRBs.

IRARMCSI  SRM schedules swap-in
IEAVPCB   SWIN gets SPCTWSSZ+1 PCBs
IEAVSWIN  (MAINLINE) Executes in SRB mode in master scheduler's address space. Builds PCBs and gets frames allocated.
IEAVGFA   Allocates Stage 1 and Stage 2 frames
ILRSWAP   Starts swap-in paging I/O (Stage 1 and Stage 2)
ILRPAGCM  Stage 1 I/O completes (normally executes in the address space interrupted at the completion of I/O)
IEAVPIOP  Decrements SWIN root count; calls root exit when count=0
IEAVSWIN  Entry IEAVSIRT (root exit): Rebuilds segment and page tables. Schedules SRB to swapped-in address space.
IEAVSWIN  Entry SWINPOST: Posts RCT to restore address space
IEAVPCB   Frees root and Stage 1 PCBs
IEAVEAC0  (ASCBCHAP) Places ASCB on dispatching queue
IGG079    (entry IGC07903) STATUS start SRBs
PCBs on exit from SWIN mainline; PCBs on entry to SWIN root exit: private area Stage 1 PCBs chained out of PCBRWRK1 and PCBRWRK2
Figure A-2. Swap-In Process Flow

IEAVSIRT obtains an SRB from the RSM cell pool and schedules an entry point in IEAVSWIN (SWINPOST) into the swapped-in address space so it can post the region control task. SWINPOST posts RCT's ASCBQECB to restore the address space.

Note that stage two frames are allocated at the same time as stage one frames. The XPTFREL flag is on in each stage two PCB's corresponding XPTE. Then, if a page fault or other request reaches IEAVGFA prior to stage two I/O complete, IEAVGFA can relate the request to ongoing I/O (see the chapter on RSM in Section 4 for a discussion of RSM's relate functions). IEAVIOCP sets the XPTFREL flag to zero and fills in the PGTRSA field when stage two I/O completes. IEAVSOUT sets to 0009 all PGTEs for which it made a bit map entry in the SPCT.
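Figure A-2 shows IEAVPIOP decrementing a swap-in root count and calling the root exit when the count reaches zero. The following Python sketch illustrates that countdown pattern only. The class `SwapInRoot`, its attributes, and the PCB labels are hypothetical, and the assumption that the count starts at the number of stage one PCBs is made for the example; the manual does not state the initial value.

```python
class SwapInRoot:
    """Hypothetical model of the swap-in root PCB and its root count."""

    def __init__(self, stage_one_pcbs, root_exit):
        # Assumption: one count per outstanding stage one page-in.
        self.count = len(stage_one_pcbs)
        self.chained = []            # stage one PCBs chained from the root
        self.root_exit = root_exit   # stands in for root exit IEAVSIRT

    def stage_one_io_complete(self, pcb):
        """Called once per completed stage one PCB (the IEAVPIOP role)."""
        self.chained.append(pcb)
        self.count -= 1
        if self.count == 0:
            # All stage one I/O is complete: drive the root exit once.
            self.root_exit(self.chained)


calls = []
stage_one = ["LSQA-1", "LSQA-2", "FIX-1"]
root = SwapInRoot(stage_one, root_exit=calls.append)
for pcb in stage_one:
    root.stage_one_io_complete(pcb)
```

The root exit runs exactly once, after the last stage one completion, and receives the chained stage one PCBs; stage two I/O is not counted, which matches the text's statement that stage two requests are handled later through the relate function.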
Swap-Out Process

IEAVAR02

SRM (IRARMCSO) posts the region control task (RCT) to swap out the address space. RCT is responsible for:

• IOS purge processing: I/O requests that have been requested or started are purged or quiesced, respectively.
• Halting all tasks in the address space with the exception of its own task.
• Preventing quiesceable SRBs from executing.

IEAVSOUT

The following descriptions follow the sequence of the circled numbers in Figure A-3.

IEAVSOUT receives control from RCT and calls STATUS (IEAVSSNQ) to stop non-quiesceable SRBs.

IEAVSOUT gets enough PCBs to page out every private area page in the address space plus one to be used as a swap-out root.

IEAVSOUT clears the swap control table (SPCT) LSQA, fix entries (SPCTSWPE), and all bits in the bit maps in the segment entries (SPCTSEGE). Prior to this, the SPCT reflects the status of the address space at the last swap-out. SPCTSEGEs provide a mechanism to check how many and which segments are obtained via GETMAIN in an address space because there is a SPCTSEGE for each private area segment that is obtained by GETMAIN.

IEAVSOUT builds a two-byte LSQA entry for each frame on the LSQA frame queue.

IEAVSOUT builds a four-byte fix entry for each page (private or common) that has an FOE on any TCB in this address space. The fix count is added into the fix entry SPCTSWPE. Note that fixes done without a TCB address supplied do not have FOEs.

IEAVSOUT initializes a root PCB to zero and places the address of IEAVSORT in PCBRGOTO. It initializes the remaining PCBs, which might be used to swap out a page, as follows:

Partially initialized swap-out PCB (values in extraction order; bracketed text is a figure annotation):
  PCB: FF 000000 00 06 00 80 80 00 00 000000 A(ROOT) [Root and output] 000000 000000 000000 000000 000000 [PCBFREAL] [Swap-out] A(ASCB)
  AIA: 00 00 18 00 00 00 00 000000 000000 C00000 [Swap-out and write] 000000 000000 000000 000000

IEAVSOUT purges paging for this address space on the common PCB I/O queue, local PCB I/O queue, and the GFA deferred queue. The processing is to post users waiting on fixes, reset page faulters, and to NO-OP the PCB. (Fix entries are made for PCBs found for zero TCB fixes.) The NO-OP process makes the PCB look like a cancelled page load PCB; that is, no notification (RESET/POSTING) is to be done and the frame is to be freed. PCBs on the GFA defer queue are removed. The only exceptions here are for zero TCB fixes for which no entries could be made in the SPCT (GETMAIN for SQA failed). These PCBs remain unchanged and the fixed frame remains fixed throughout the swap.

IEAVSOUT fills RBNs and VBNs into PCBs for each LSQA or fix entry now in the SPCT. Even unchanged fixed or LSQA pages are paged out.

IEAVSOUT next initializes a PCB for each changed page on the local frame queue and sets a bit in the bit map (SPCTBITM) for all pages that are not to be stolen. The steal is based on a comparison of a criterion number passed by SRM in OUXBSTC to PFTUIC.

IEAVSOUT returns any unused PCBs to the free queue. This marks where on the free queue the swap-out began.

IEAVSOUT schedules an SRB for IEAVPIOI, releases the SALLOC lock, and returns to RCT (IEAVAR02), which waits for ASCBQECB to be posted by swap-in (IEAVSWIN). Because release of the SALLOC lock enables the processor, an address space is often swapped out before RCT has gotten a chance to wait. When analyzing a stand-alone dump, you will see the following if the above case is true:

• The RCT is dispatchable.
• There is no wait count in RBWCF.
• There are no frames allocated to storage (ASCBFMCT=0).
• The address space is not on the ASCB dispatchability queue.

Do not consider this situation a problem.

IEAVPIOI

IEAVPIOI receives control in the master scheduler's address space. It calls ASCBCHAP to remove the ASCB from the dispatching queue, calls ASM with the string of AIAs passed to it from IEAVSOUT via the SRBPARM field, calls IEAVINV to issue PTLB, and exits.

IEAVAR02  Region Control Task
IEAVSOUT  Stops non-quiesceable SRBs
          Gets ASCBFMCT+1 PCBs (1 extra for root)
          Initializes SPCT
          Builds LSQA entries in SPCT
          Builds fix entries in SPCT from FOEs
          Initializes PCBs including root
          Purges paging I/O
          Completes Stage 1 PCBs (LSQA and fixed)
          Completes working set PCBs (changed private area)
          Frees unused PCBs
          Schedules IEAVPIOI to master scheduler's address space (SRB for IEAVPIOI)
          Returns to RCT
After the SRB is dispatched in the master scheduler's address space:
IEAVEAC0  (ASCBCHAP) Removes ASCB from dispatching queue
          PCBs are on local queue (RSMLIOQ) when received by PIOI
IEAVINV   Issues PTLB
ILRSWAP   Starts I/O swapping
Figure A-3. Swap-out Process Flow

EXCP/IOS

Figure A-4 is an overview of the I/O process through MVS using EXCP as the driver. The following outline correlates to this process.

1. Problem program issues GET/PUT (implied wait).
2. Problem program branches to access method.
3. Access method issues SVC 0 (EXCP) to EXCP front end,
   or
   access method issues SVC 114 (EXCPVR) to EXCP front end.
4. EXCP front end:
   a. Validates request.
   b. Builds RQE.
   c. Queues related requests.
   d. If a VIO data set, goes to window intercept processor.
   e. Builds SRB/IOSB.
   f. If a virtual user, gets TCCW and BEB.
   g. Branches to PAGE FIX appendage (if specified and not a V=R region).
   h. Branch returns.
   i. If EXCPVR request, fixes pages from PAGE FIX appendage.
   j. Fixes DEB for V=R user if not already fixed.
   k.
If a DASD device, branches to END of EXTENT appendage (if seek address is out of specified extent).
   l. Branch returns.
   m. Branches to START I/O appendage if specified.
   n. Branch returns.
   o. If virtual user: translates CCWs, fixes pages for buffers, and builds IDAL.
   p. Issues START I/O macro (branch entry to IOS front end).
5. IOS front end:
   a. Builds IOQ.
   b. Selects physical path (channel scheduling).
   c. If path available, adds prefix CCWs and issues SIO; otherwise, queues IOQ on LCH.
   d. Restarts all queued I/Os to available channels.
   e. Branch returns to EXCP front end and branch returns from EXCP front end to problem program WAIT.

Figure A-4. IOS/EXCP Process Flow (flow diagram covering the user, EXCP processor, and IOS, annotated with the system mode at each stage, such as: Enabled, Problem Program, User Key, Under TCB; Enabled, Supervisor, Key 0, Under TCB, Local Lock; Disabled, Supervisor, Key 0; Supervisor, Enabled, Key 0, Under SRB)

6. IOS back end (entry from I/O FLIH), entered as a result of an I/O interrupt.
   a. If DIE is specified:
      (1) TRAS (translates address space, to get addressability to control blocks in originating address space).
      (2) Branch enters DIE.
      (3) If PCI and V=R or EXCPVR, maps IOSB to IOB and branch enters PCI appendage.
      (4) PCI processing.
      (5) Branch returns to DIE.
      (6) Maps IOB to IOSB.
      (7) Queues type-3-related requests.
      (8) Branch returns to IOS front end.
      (9) TRAS (returns to addressability at time of interrupt).
   b. Schedules POST STATUS [global] (means POST STATUS will be entered via dispatcher).
   c. Branches to channel restart to start queued IOQEs on LCHs.
   d. Returns to FLIH.
   e. If system was in SRB mode, loads PSW for SRB or returns to dispatcher.
7. IOS POST STATUS (scheduled from IOS back end).
   a. If PCI, CE or ABE appendages specified:
      (1) Branch enters EXCP back end.
      (2) Maps IOSB to IOB.
      (3) Branch enters appropriate appendage.
      (4) Appendage processing.
      (5) Branch returns to EXCP back end.
      (6) Maps IOB to IOSB.
      (7) Branch returns to IOS POST STATUS.
   b. If error, schedules ERP. (See 8.)
   c. Branches to EXCP back end for termination processing.
      (1) Maps IOSB to IOB.
      (2) Starts related requests.
      (3) Unfixes buffer pages.
      (4) Posts ECB (the one after the GET/PUT).
      (5) Exits to the dispatcher.
8. ERP interface.
   a. If IBM ERP, get ERP work area.
   b. If DASD (IECVDERP), branch to ERP.
   c. If non-DASD, schedule ERP loader (IECVERPL) under SIRB. Use stage II exit effector to queue SRB to ASXBFSRB. Set stage II exit effector switch in ASCB.

GETMAIN/FREEMAIN

This chapter describes the processing for virtual storage requests in terms of GETMAIN processing and FREEMAIN processing. The flow through the GETMAIN/FREEMAIN process is complicated and the VSM control block structure should be understood prior to following this process. This process flow is not intended to explain exactly how GETMAIN/FREEMAIN works but to provide an understanding of the important considerations of virtual storage management, how the important control blocks are manipulated, and the common subroutines of VSM.

GETMAIN Processing

The following describes the processing required to satisfy a given GETMAIN.

1. A problem program issues an SVC 10 GETMAIN for subpool 0 for 256 bytes.
2. GETMAIN (entry at IGC010) saves the TCB addresses in LDA, sets IEAVGM00's FRR (module IEAVGFRR), sets up the length and subpool ID for common processing routines, saves the caller's mode in LDARQSTA, and goes to the common GETMAIN routine, GMCOMM1.
3. GMCOMM1 goes to routine CSPCHK to find the SPQE for the requested subpool. CSPCHK is a key routine for defining the characteristics of various subpools.
For subpool 0, CSPCHK searches the TCBMSS chain. If no SPQE is found, CSPCHK returns a zero for the address of the SPQE and saves the address of the previous SPQE on the chain in SPQE SAVE. GMCOMM1 then calls routine GSPQESPC to get a 16-byte element to build and chain in an SPQE for the requested subpool. The 16-byte blocks for internal control blocks are obtained via GETMAINC (a simple GETCELL function).
4. GSPQESPC passes control to (label) ROUND where the request is rounded up to a doubleword boundary.
5. GMCOMM calls GFRECORE to search the FQEs pointed to by each DQE for the appropriate subpool. A best-fit algorithm is used to find the smallest free element large enough to satisfy the request.
   Exception: LSQA/SQA requests for 4096 bytes or less are not satisfied across page boundaries because the request can be for page or segment tables that must reside in contiguous real storage.
6. If storage is found in an FQE, GFRECORE calls GFQEUPDT to maintain and update the FQE chain. (Control is passed to step 9.)
7. If storage is not found in an FQE, GFRECORE determines the number of 4K-blocks that are required and calls G4KSRCH to satisfy the request.
8. G4KSRCH performs the following functions:
   a. Calls FBQSRCH to search the appropriate FBQE chain to find 4K bytes of free space. (For problem program subpools, TCBPQE points to PQE which points to FBQE.) Once found, FBQSRCH removes the space from the FBQE and, if the FBQE is empty, frees it via an internal FREEMAIN (FMAINB) or an internal freecell (FMAINC).
   b. Acquires a DQE and chains it onto the DQE chain anchored in the SPQE.
   c. Calls RSM (IEAVFP1) to locate the page table entry (PGTE) and the external page table entry (XPTE) of the new 4K-block. Then at label SETUPPTE it initializes both the GETMAIN-assigned flag (PGTPAM) in the PGTE and the XPTPROT (protection key) in the XPTE (+0). Note: This is the only place XPTPROT is set.
   d. Updates the SMF region usage fields of the TCT (task control table).
   e. Creates an FQE and chains it from the DQE that was just built.
   f. Returns to GMCOMM1.
9. GMCOMM1 places the address of the allocated storage in register 1 and sets the return code. Then GMCOMM1 performs housekeeping of any areas chained from FMAREAS in the LDA, deletes the FRR, and passes control to the EXIT prologue.

FREEMAIN Processing

The following is a logic flow of the FREEMAIN process when a problem program issues an SVC 10 requesting 256 bytes from subpool 0.

1. Upon entry at IGC010, FREEMAIN:
   a. Saves the TCB address in LDA.
   b. Establishes the FRR (IEAVGFRR).
   c. Saves the caller's mode in LDARQSTA.
   d. Sets up the length and subpool ID for common processing.
   e. Passes control to FMCOMM1.
2. FMCOMM1 passes control to FMCOM because the request is not to free an entire subpool. FMCOM calls CSPCHK to locate the SPQE. The associated DQEs are searched to locate the one DQE that describes the area to be freed.
3. Label QELOCATE ensures that the area is not already described in an FQE (if it is, the requestor is abnormally terminated). Subroutine CREATFQE obtains a 16-byte element for an FQE, then builds the FQE and adds it to the proper FQE chain.
   Note: If possible, FQEs are combined if the new free space is adjacent to free space described by an existing FQE.
4. If less than 4K bytes are freed, FREEMAIN has completed its task and control is passed to the EXIT prologue.
5. a. If all space described by a DQE has become free, FREEMAIN frees the FQE and DQE and notifies RSM (IEAVRELV) that a page(s) can be released.
   b. If a virtual page is freed, FREEMAIN frees the FQE (and adjusts the DQE if the free pages exist at either end of the described area) and notifies RSM (IEAVRELV) to release the page(s).
   c.
If the free page exists in the middle of the area described by the DQE, FREEMAIN obtains a new DQE and the two DQEs will now describe the area (essentially the area has been split into two parts). FREEMAIN updates the associated FQEs and notifies RSM (IEAVRELV) to release the page(s).
   Note: RSM invalidates the PGTE(s) for the associated pages being freed and calls ASM to release the auxiliary storage copy of the page. If a page table has become completely free, IEAVGM00 is passed the PGT address, which is queued from a field in the LDA (FMAREAS) to be freed at exit time. FMAREAS is really a list of items no longer required to describe virtual storage.
6. After restructuring the DQEs, MRELEASE returns virtual space to the appropriate FBQEs. If possible, MRELEASE places 4K blocks of storage in an existing FBQE; if not, it builds a new FBQE and includes it in the existing FBQE chain.
7. FREEMAIN returns to FMCOMM1A, which performs FMAREAS bookkeeping, deletes the FRR, and returns to the caller.
   Note: FMAREAS anchors a one-way chain of areas to be freed. The area itself contains the address of the next area at offset +0 and the subpool's ID and length at offset +4. These areas are not freed immediately by IEAVGM00 because freeing them might cause register save area overlays on the double recursion into FREEMAIN processing.

VTAM Process

The following shows the logic flow through the VTAM component into IOS and out to the 3705 when an application issues a SEND request. This description includes the major module flow and the control blocks required in order to process the request. Note that this is a general processing flow; additional modules not shown can be entered depending on options and device type. Figure A-5 illustrates the system modes at various stages of the VTAM processing.

1.
The application program issues the VTAM SEND macro, passing an RPL (request parameter list), which points to the data that is to be sent.
2. The SEND macro branches to a VTAM interface routine (ISTAICIR).
3. ISTAICIR determines that this is a non-authorized request and issues the VTAM SVC. This is a type 1 SVC (SVC 124).
4. The type 1 SVC routine (ISTAPC22) obtains an MQL (MPST queueing element), places the address of the user RPL in it, and issues the TPQUE macro to queue the MQL to the TPIO PAB for the application's address space.
5. The TPQUE macro (normally) issues the TPSCHED macro in order to schedule the TPIO PAB.
6. The TPSCHED macro invokes ISTAPC32, which queues the TPIO PAB to the memory process scheduling table (MPST), and schedules an SRB to execute ISTAPC55.
7. ISTAPC32 returns to ISTAPC22.
8. ISTAPC22 issues a Type 1 exit back to ISTAICIR.
9. ISTAICIR determines if the request was synchronous or asynchronous. If it was synchronous, it issues the WAIT macro. If it was asynchronous, it returns control to the application program.
10. When the SRB is dispatched, ISTAPC55 dequeues the TPIO PAB from the MPST, obtains a component recovery area (CRA) from the large pageable (LP) pool, and passes control to ISTAPC57.
11. ISTAPC57 formats the request parameter header (RPH) (within the CRA), dequeues the MQL from the TPIO PAB, and passes control to ISTAPC23.
12. ISTAPC23 releases the MQL, obtains a Copy RPL (CRPL) from the CRPL pool, and copies the user RPL into it.
13. ISTAPC23 then issues the TPQUE macro to queue the CRPL to the control layer outbound PAB in the appropriate FMCB and schedules control layer processing.

Figure A-5. VTAM SEND Process Flow (system modes shown: Application's Address Space, Task Mode; Application's Address Space, SRB Mode, entered from the VS2 dispatcher via SRB and exiting to the VS2 dispatcher; Any Address Space, Disabled Mode, entered by I/O interrupt)

14. ISTAPC23 then issues the VTAM TPEXIT macro, which passes control to ISTAPC31.
15. ISTAPC31 recognizes that there is more work to do (control layer processing) and passes control to ISTAPC57.
16. ISTAPC57 reformats the RPH (within the same CRA) for processing by the control layer.
17. ISTAPC57 then passes control to the control layer (ISTDCCOO).
18. ISTDCCOO recognizes that this is a SEND request, obtains a logical channel program block (LCPB) from the CRPL pool, and invokes ISTRCC22.
19. ISTRCC22 sets up the logical channel command words (LCCWs) in the LCPB from the options in the CRPL and issues the TPQUE macro to queue the LCPB to the TPIOS outbound PAB in the FMCB and schedule TPIOS processing.
20. ISTRCC22 then passes control to ISTCDDOO, which issues the TPEXIT macro.
21. The TPEXIT macro passes control to ISTAPC31, which recognizes that there is more work to do (TPIOS processing) and passes control to ISTAPC57.
22. ISTAPC57 reformats the RPH (within the same CRA) for TPIOS processing and passes control to ISTZAFIB.
23. Within TPIOS, ISTZDFAO allocates the fixed I/O buffer; ISTZDFCO and ISTZDFDO move the user data to the I/O buffer.
24. Once the data is moved from the user's buffer, TPIOS invokes a routine (ISTRCFYO) which calls ISTAICPT. ISTAICPT copies the CRPL back to the user's RPL, frees the CRPL, and POSTs the ECB complete.
25. ISTRCFYO then frees the LCPB and returns control to TPIOS.
26. TPIOS then invokes ISTZEMBB, which obtains the UCB lock for the 3705 and checks the ICNCB (intermediate controller node control block) to see if there is an active channel program currently executing for the 3705.
27. If the 3705 is busy, ISTZEMBB queues the I/O buffer to the ICNCB write queue, releases the UCB lock, and returns to TPIOS. (Go to step 29.)
28.
If the 3705 is not busy, ISTZEMBB calls ISTZEMAB, which issues the STARTIO macro to IOS and then returns to ISTZEMBB, which returns to TPIOS. The IOSB, which is the interface to IOS, physically resides within the ICNCB.
29. After ISTZEMBB returns to TPIOS, TPIOS issues the TPEXIT macro, which invokes ISTAPC31.
30. ISTAPC31 recognizes that there is nothing more to do and calls ISTAPC58.
31. ISTAPC58 frees the CRA and exits to the VS2 dispatcher.
32. Sometime later, an I/O interrupt occurs as a result of the write channel program completing.
33. IOS passes control to the VTAM DIE (disabled interrupt exit) (ISTZFM3B).
34. ISTZFM3B frees the I/O buffer and returns to IOS, indicating that POST STATUS should not be scheduled.
35. IOS exits to the dispatcher.

TSO

Following are some of the more important processes involved with the TSO/TIOC/TCAM interface portion of MVS. The processes are:
• Time Sharing Initialization
• LOGON Processing
• TSO Line Drop Processing
• TMP and Command Processor Interface
• TSO Command Processor Recovery
• TSO Terminal I/O Overview
• TSO/TIOC Terminal I/O Diagnostic Techniques
• TSO Attention Processing

Time Sharing Initialization

The system operator issues the MODIFY command (F TCAM,TS=START) to initialize the time sharing system. Terminal I/O control (TIOC) logic is documented in OS/VS TCAM Level 10 Logic.

The major functions that occur during time sharing initialization are:
1. The SYS1.PARMLIB member IKJPRMxx is read to determine the TIOC buffer size and number, the maximum number of time sharing users allowed to be logged on at one time, and thresholds for the maximum number of TIOC buffers a single user can use at one time.
2. The main control block for the time sharing system (TIOC reference table TIOCRPT) is initialized.
This control block points to the free queue of TIOC buffers and has status flags indicating whether the system is in an LWAIT (out of TIOC buffers). The TIOCRPT also points to a pool of terminal status blocks.
3. The pool of terminal status blocks (TSBs) is built. The number of TSBs is determined by the maximum user parameter in IKJPRMxx. A TSB is assigned to a user during logon processing. The TSB connects the ASCB of the user to the terminal-name table entry of the terminal. From the terminal-name table entry, TCAM can locate the terminal table entry for the user and hence the address of the destination QCB. The TSB contains input and output queues for TIOC buffers that are used by the time sharing user. The TSB also contains status indicators that record whether the user is in an input wait (TGET issued and no TIOC buffer on TSB input queue) or an output wait (maximum number of TIOC buffers used for output).

[Figure A-6. Overview of Logon Processing (Part 1 of 2). Labels: Terminal User Issues LOGON; TCAM Address Space; TIOC; ATTACH; SVC 34 'LOGON'; IEDAY3; TIOC LOGON SYNC; POST; IEDAYL and IEDAYLL; New User Address Space; STC; XCTL; LOGON Processor. Note: Details of this process are shown in part 2 of this figure.]

[Figure A-6. Overview of Logon Processing (Part 2 of 2). LOGON scheduling modules: IKJEFLA (Logon Initialization), IKJEFLB (Logon Scheduler Router), IKJEFLC (Logon Monitor), IKJEFLE (Logon Verification), IKJEFLH (Logon Information Routine), IKJEFLJ (Pre-TMP Exit), IEESB605 (Job Scheduling Subroutine, STC). Other labels: Installation Exit; Calls Job Scheduling Subroutine; Schedules Session; TMP Issues "READY" Message; XCTL, LINK, ATTACH, POST, and CALL linkages.]

4. The TIOC buffer pool is built.
The number and size of the buffers is determined from IKJPRMxx. If no parmlib member was specified on the MODIFY TCAM command, SYS1.PARMLIB is searched for the default parmlib member name, IKJPRM00. If this member is not found, standard default values are used.
5. The 'TSO HAS BEEN INITIALIZED' message is issued (via WTO).

LOGON Processing

The major functions of LOGON processing are:
1. TCAM handles line I/O and routes the buffer to the TSO message handler. The message handler routes the buffer to various functional routines. One of these is logon.
2. The logon routine receives control from the TSO message handler as a result of the expansion of the LOGON macro. Logon routes the buffer to TSINPUT so that logon scheduling may retrieve it via a TGET SVC. TIOC logon then issues an SVC 34 to notify the master scheduler that logon processing should be started. TIOC then issues QTIP 10 to initialize control blocks.
   Note: QTIP is the TIOC code invoked when SVC 101 is issued. It performs functions related to communication between the TCAM and TSO user address spaces. The specific function it is to perform is indicated by an entry code (for example, QTIP 10). A table of entry codes, their callers, the functions performed, and the modules that provide the function is contained in OS/VS TCAM Level 10 Logic.
   QTIP issues an XMPOST to inform the master scheduler that TIOC initialization is complete and that memory create may begin. TIOC then returns to the message handler for final buffer disposition. If logon fails or is terminated, TIOC is notified so that the appropriate error message can be issued.
3. TSINPUT invokes QTIP to move the contents of the TCAM buffer to the TIOC buffers. This data can then be accessed using TGET services.
4. The master scheduler recognizes that a logon has been requested and attaches TIOC synchronization. This routine waits until QTIP signals with a post that memory create can begin.
Once an address space has been initialized for the logon request, the region control task is the first task to be dispatched.
5. Region control establishes an ESTAE routine, attaches the dump task, attaches started task control, and waits for one of the following:
   • An attention request signaled by TIOC via XMPOST
   • A swap request signaled by SRM
   • A termination request
6. Started task control recognizes that logon is requested and passes control to logon initialization (IKJEFLA).
7. Logon initialization opens the UADS and broadcast data sets, initializes control blocks, and calls logon scheduling (IKJEFLB).
8. The logon load module contains four service modules. One, IKJEFLPO, contains the default values for the number of seconds requested between 'LOGON PROCEEDING' messages and the number of logon attempts allowed before automatic logoff. Both values are sysgen options on the TSO macro. The logon scheduler attaches the logon monitor (IKJEFLC). The scheduler and monitor now begin parallel processing. WAITs and POSTs are used when synchronization is required.
9. The logon monitor (IKJEFLC) builds the environment control table (ECT), sets the first element of the input stack to indicate terminal input, and links to logon verification.
10. Logon verification (IKJEFLE) calls the user's pre-prompt exit if it was coded. Logon verification makes the following checks:
   • Determines (via ENQ) if the userid is in use.
   • Checks the user's password, account number, and procedure name.
   • Checks the performance group requested in the LOGON command.
   Logon verification prompts the user for missing parameters if required parameters do not have defaults in the UADS. After all required parameters have been obtained, verification builds the JOB and EXEC statement images for the session. The EXEC statement contains the name of a logon procedure specified in the UADS or the LOGON command.
11.
Logon verification posts the logon scheduler when the parameters are complete and the job can be scheduled. The scheduler's job now is to cause the broadcast messages to be listed at the terminal at the same time that the user's job is being scheduled. To do this, it posts the monitor task and then XCTLs to the initiator, passing it the JCL that has been created.
12. The logon monitor regains control when signaled by the logon scheduler, attaches the LISTBC command processor to write broadcast messages to the terminal, and then waits for a post from a special initiator logon routine. This post signals that final processing can be completed.
13. The initiator uses the TSO internal reader to send the logon job to JES2. JES2 reads the user's procedure from the procedure library specified by the &TSU job class parameter and changes the JCL to internal text. This is placed on the spool data set. Once this processing has completed, the initiator requests the user's job by ID and completes initiation and allocation. Initiation finally gives control to a special TSO routine (pre-TMP exit, IKJEFLJ). This routine posts the logon monitor and issues a WAIT. The logon monitor then terminates. This causes the initiator task to regain control. The logon monitor is then detached. Once the monitor is detached, the initiator attaches the TMP and waits.
14. The TMP (specified as IKJEFT01 in the LOGON PROC on the EXEC statement) performs initialization and then issues a PUTGET to write the 'READY' message and request a command from the user. This PUTGET results in a TPUT to send READY and a TGET to request terminal input. The user is now in an input wait. This signals SRM to perform a swap-out until input is available.

Figure A-7 shows TCAM's organization after a TSO logon.

The following are detailed descriptions of the logon process including information on control block manipulation.
The numbers in parentheses correlate to the numbers in the preceding summary of the logon process.

TIOC Logon Processing (2):
• Checks the maximum user count in TIOCRPT.
• Issues SVC 34 'LOGON'.
• Places the returned ASID in the QCB for this line.
• Calls QTIP (entry 10) to find and initialize the TSB.
  - Puts the TSB address in the ASCB for the user's address space.
  - Puts the ASCB address in the TSB.
  - Updates the user count.
  - Puts the UCB address in the TSB.
  - XMPOSTs 'TIOC SYNC'.
• Sets the QCB to indicate TSO.
• Passes the logon message buffer to the TSINPUT QCB (which is now available to system logon processing via GETLINE).

[Figure A-7. TCAM Organization After a TSO Logon. Areas shown: common storage (CVT, ASCB, ASVT, TS buffers with data); TCAM's address space (MCP, MH, TCAM buffers, TSINPUT QCB); TSO user address space (ASID).]

Logon Initialization (IKJEFLA) (7):
Logon initialization uses the address of the ASCB as input and does the following:
• Ensures SYS1.UADS and SYS1.BRODCAST data sets are allocated.
• Gets the LWA (logon work area) from the LSQA. (See Figure A-8.)
• Puts the LWA address in the ASXB.
• Gets the JSEL (job scheduling entrance list) from the LSQA.
• Puts the CSCB and ASCB addresses in the JSEL.
• Gets the JSXL (job scheduling exit list) from the LSQA.
• Puts the LWA address in the JSXL. JSXL contains pointers to the PRE-TMP, POST-TMP, and PRE-FREEPART exits.
• Puts the JSXL address in the JSEL.
• Gets the UPT (user profile table) from subpool 230.
• Issues BLDL for the installation exit routine (Release 2 only).
• Gets the PSCB (protected step control block) from subpool 230.
• Puts the PSCB address in the LWA.
• Puts the UPT address in the PSCB.
• Gets the re-logon buffer from subpool 230.
• Puts the re-logon address in the PSCB.
• Calls the logon scheduler router.
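
The pointer chain that logon initialization builds can be easier to follow in a dump if it is first pictured as linked objects. The following is a minimal sketch only, assuming Python objects in place of storage areas; the class and attribute names are illustrative and are not the field names of the real control blocks, and the subpool and LSQA locations appear only as comments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UPT:                      # user profile table (subpool 230)
    pass


@dataclass
class RLGB:                     # re-logon buffer (subpool 230)
    pass


@dataclass
class PSCB:                     # protected step control block (subpool 230)
    upt: Optional[UPT] = None
    rlgb: Optional[RLGB] = None


@dataclass
class LWA:                      # logon work area (LSQA)
    pscb: Optional[PSCB] = None


@dataclass
class JSXL:                     # job scheduling exit list (LSQA)
    lwa: Optional[LWA] = None


@dataclass
class JSEL:                     # job scheduling entrance list (LSQA)
    cscb: object = None
    ascb: object = None
    jsxl: Optional[JSXL] = None


@dataclass
class ASXB:                     # address space extension block
    lwa: Optional[LWA] = None


def logon_initialization(ascb, cscb, asxb):
    """Link the blocks in the order given in the list above (illustrative)."""
    lwa = LWA()
    asxb.lwa = lwa                          # LWA address in the ASXB
    jsel = JSEL(cscb=cscb, ascb=ascb)       # CSCB and ASCB addresses in the JSEL
    jsxl = JSXL(lwa=lwa)                    # LWA address in the JSXL
    jsel.jsxl = jsxl                        # JSXL address in the JSEL
    upt = UPT()
    pscb = PSCB()
    lwa.pscb = pscb                         # PSCB address in the LWA
    pscb.upt = upt                          # UPT address in the PSCB
    pscb.rlgb = RLGB()                      # re-logon buffer address in the PSCB
    return jsel
```

Starting from the ASXB, the sketch reaches the LWA, then the PSCB, then the UPT and the re-logon buffer, which is the same order in which the blocks would be chased in a dump.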
Logon Scheduler Router (IKJEFLB) (8):
• Frees subpool 0.
• Attaches the logon monitor.
• Posts the monitor with the 'schedule' code.
• Waits for the 'what to do' post from the monitor.

Logon Monitor (IKJEFLC) (9):
• Switches the storage key to '8'.
• Gets the STAX work area from subpool 1.
• Gets the ECT (environment control table) from subpool 1.
• Puts the ECT address in the LWA.
• Invokes the STACK macro (input is to come from the terminal).
• Gets the new CSCB (command scheduler control block) from the SQA.
• Sets the CSCB to indicate the job is:
  - swappable
  - terminal job
  - cancellable
  - TSO
• Gets the local and CMS locks.
• Puts the CSCB address in the ASCB.
• Frees the local and CMS locks.
• Calls MGCR to remove the old CSCB from the chain.
• Puts the new CSCB pointer in the JSCB and JSEL.
• Calls MGCR to add the new CSCB to the chain.
• Issues the STAX macro to set up attention handling.
• Links to logon verification.

[Figure A-8. Logon Work Area. The ASXB (offset X'14') points to the LWA. Fields shown in the LWA: the "LWA" identifier; pointers to the PSCB, ECT, RLGB, and UPT; the LOGON ECB, PROMPT ECB, and SCHED ECB.]
*The logon work area (IKJEFLWA) is a 148-byte area that is created by IKJEFLA and is pointed to by ASXB and JSXL. It contains control block pointers, entrance lists, and parameter lists that are required for logon/logoff.

Logon Verification (IKJEFLE) (10):
• Calls the installation exit (if necessary).
• Issues GETLINE or uses the installation supplied buffer containing the logon parameters.
• Calls the command scan service routine to ensure that input is the LOGON or LOGOFF command (assumes LOGON).
• Calls PARSE for logon parameter parsing.
• Indicates no password required for the UADS.
• Issues ENQ on the UADS (prevents the ACCT CP from changing UADS).
• Opens the UADS.
• Issues FIND for the userid member (userid is taken from the logon parameter).
• Places the userid in the PSCB.
• Posts the logon scheduler.
• Waits for the post from the logon scheduler.

Logon Scheduler (IKJEFLB):
• Enqueues (via ENQ) on SYSIKJUA.USERID.
• Posts logon verification.
• Waits for logon verification.

Logon Verification (IKJEFLE):
• Dequeues (via DEQ) from UADS.
• Puts the userid in the CSCB.
• Puts the userid in the ASCB.
• Enqueues (via ENQ) on UADS.
• Finds userid member.
• Dequeues (via DEQ) on UADS.
• Reads UADS.
• Issues CHECK.
• Places the parameter in the proper control block.
• Places the password in the TSB.
• Places the procname in the CSCB.
• Places the region size in the PSCB.
• Informs SRM of the performance group.
• Builds the JCL:
    //USERID JOB 'account#',REGION=region size
    //procname EXEC procname,PERFORM=performance group
• Issues the 'LOGON IN PROGRESS' message to the terminal.
• Closes the UADS.
• Clears 'NO PASSWORD' in the JSCB.
• Dequeues (via DEQ) from UADS.
• Posts the logon scheduler to schedule the session. (11)
• Waits for the logon scheduler.
• Sends the broadcast messages (via the information routine). (12)
• Issues the 'LOGON IN PROGRESS' messages until posted by the initiator.
• Frees subpool 78.

Logon Scheduler (IKJEFLB) (11):
• Sets up the interface to JSS.
• Posts the logon monitor.
• XCTLs to JSS (initiator).

Job Scheduling Subroutine (IEESB605) (13):
• Calls the PRE-TMP exit.

PRE-TMP Exit (IKJEFLJ):
• Posts the monitor task to terminate.
• Moves the PSCB from (unaccountable) subpool 230 storage to (accountable) subpool 252 storage. The PSCB address is placed in the active JSCB.
• Moves the UPT and the re-logon buffer to subpool 0 (allows updating by CPs).
• Returns to the initiator.

Initiator:
• Attaches TMP (PARM='xxx ... ' is passed).

TMP (14):
• Issues "READY" message.
• Requests terminal input.
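
The JOB and EXEC statement images that logon verification builds follow the two-line template shown in the list above. As a rough illustration only, the sketch below fills that template from the logon parameters. The function name and the sample values are hypothetical, and the exact column layout of the real statement images is not given here, so the output should be read as a picture of the template rather than of the actual images.

```python
def build_logon_jcl(userid, account, region_size, procname, performance_group):
    """Fill the two-statement template shown above (illustrative only)."""
    job = f"//{userid} JOB '{account}',REGION={region_size}"
    exec_stmt = f"//{procname} EXEC {procname},PERFORM={performance_group}"
    return [job, exec_stmt]


# Hypothetical sample values; none of them come from a real UADS entry.
for statement in build_logon_jcl("USER01", "12345", "256K", "LOGPROC", 2):
    print(statement)
```

When a logon fails during scheduling, comparing the userid, procname, region size, and performance group held in the control blocks with what such a template would produce is one way to check which parameter verification actually passed on.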
LOGON Scheduling Diagnostic Aids

The following two figures contain information that can be used for diagnosing problems that occur during logon scheduling.

Field Name and Contents   Name of Executing Module          Common Name of Module
LWAINX1  =1               IKJEFLD                           Installation Exit (written by installation)
LWALA    =1               IKJEFLA                           LOGON Initialization
LWALB    =1               IKJEFLB                           LOGON Scheduling
LWALC    =1               IKJEFLC                           LOGON Monitor
LWALE    =1               IKJEFLE                           LOGON/LOGOFF Verification
LWALEA   =1               IKJEFLEA                          Parse/Scan Interface
LWALI    =1               IKJEFLI                           Installation Interface
LWALH    =1               IKJEFLH                           LOGON Synchronizer
LWALL    =1               IKJEFLL                           LOGOFF Processing
LWALGM   =1               IKJEFLGM                          LOGON Message Handler
LWALJ    =1               IKJEFLJ                           Pre-attach Exit
LWALK    =1               IKJEFLK                           Post-attach Exit
LWALG    =1               IKJEFLG                           Attention Exit
LWALGB   =1               IKJEFLGB                          LOGON Monitor Recovery
LWALS    =1               IKJEFLS                           LOGON Scheduling Recovery and Retry
LWALTBC  =1               IKJEFLH                           Mail and Notices Processing
LWAMCK   =1               IKJEFLGB                          ABEND was a machine check
LWAPCK   =1               IKJEFLGB                          ABEND was a program check
LWAPHASE =0               Any LOGON module except IKJEFLH   LOGON/LOGOFF Verification
LWAPHASE =1               IKJEFLH                           LOGON Synchronizer
LWAPSW                    IKJEFLGB                          Console Restart key depressed
LWATNBT                   IKJEFLG                           Attention Routine

Figure A-9. LOGON Work Area Bits That Indicate the Currently Executing Module

Module Issuing POST   Module Being Posted   Location of ECB
IKJEFLB               IKJEFLC               field LWASECB in LWA
IKJEFLC               IKJEFLB               field LWAPECB in LWA
IKJEFLE               IKJEFLB               field LWAPECB in LWA

Post Code   Condition of Module Issuing POST / Action Taken by Module Being Posted
16   Ready to invoke job scheduling subroutine (IEESB605). / Invoke LOGON information routine (IKJEFLH).
24   Terminating for LOGOFF or for unusual termination of LOGON monitor (IKJEFLC). / Perform clean-up operations and terminate.
12   Termination or attention requested. / Issue DEQ on user identification.
16   Verified and processed the LOGON parameters. / Schedule a terminal session.
24   Processing a LOGOFF command. / Terminate.
8    Authorized the user identification. / Issue ENQ on user identification.
12   Error processing. / Issue DEQ on user identification.

Module Issuing POST   Module Being Posted   Location of ECB
IKJEFLJ               IKJEFLH               field LWASECB in LWA
20   Detects that the initiator is ready to attach the TMP. / Finish LISTBC processing; return to caller.
IKJEFLH               IKJEFLJ               field LWAPECB in LWA
20   Finished LISTBC processing. / Terminate so the initiator can attach the TMP.

Figure A-10. LOGON Scheduling Post Codes

TSO Line Drop Processing

The following description corresponds to the overview of line drop processing shown in Figure A-11.

IEDAYH (Part of TCAM MCP):
• Gets control from the TCAM dispatcher when either of the following occurs:
  - A hang up on a monitoring channel program or a message generation.
  - Each input or output message ends.
• Tests for and handles several kinds of errors. If it discovers the line has dropped, it begins terminating the user. Each of the following is considered a line drop:
  - Entry because of a hang up on a monitoring channel program or a message generation
  - A 3705 control unit error, indicated in the SCB (station control block)
  - A permanent terminal error, indicated in the SCB
  - A countable error and an appropriate number of retries have been done, indicated in the SCB
• If a line drops, issues a QTIP 4 (SVC 101, entry code 4).

QTIP 4 (IEDAYHH):
• Sets TSBHUNG=1.
• Issues QTIP 28 to free the TCAM buffers.
• If the reconnect time limit is 0 (in TIOCRPT), then branch enters SIC (system-initiated cancel, IKJEFLF) with code 622; upon return, returns to caller.
• For a non-0 RECONLIM:
  - Sets TSBMINL equal to the reconnect time limit.
  - If TIOCTECB (in TIOCRPT) is posted, then increases the value in TSBMINL by one. Otherwise, posts TIOCTECB (which IEDAY802, running as a subtask of TCAM, is waiting for).
• Returns.
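
The QTIP 4 decisions listed above can be restated as a short sketch, which may help when checking TSBHUNG and TSBMINL values found in a dump against the expected state. This is an illustration under stated assumptions, not the IEDAYHH code: the Python classes, attribute names, and callback parameters are hypothetical stand-ins, and the completion code is carried as the string "622" simply to mirror the text.

```python
from dataclasses import dataclass

LINE_DROP_CODE = "622"   # completion code passed to SIC for a line drop


@dataclass
class TSB:
    hung: bool = False          # stands in for TSBHUNG
    minutes_left: int = 0       # stands in for TSBMINL


@dataclass
class TIOCRPT:
    reconnect_limit: int            # reconnect time limit (RECONLIM)
    timer_ecb_posted: bool = False  # stands in for TIOCTECB


def qtip4_line_drop(tsb, rpt, free_tcam_buffers, sic):
    """Mirror the QTIP 4 steps listed above.

    free_tcam_buffers and sic are hypothetical callbacks standing in for
    QTIP 28 and the branch entry to SIC (IKJEFLF).
    """
    tsb.hung = True                       # TSBHUNG=1
    free_tcam_buffers(tsb)                # QTIP 28
    if rpt.reconnect_limit == 0:
        sic(tsb, LINE_DROP_CODE)          # no reconnect window: cancel now
        return
    tsb.minutes_left = rpt.reconnect_limit
    if rpt.timer_ecb_posted:
        tsb.minutes_left += 1             # TIOCTECB already posted
    else:
        rpt.timer_ecb_posted = True       # post TIOCTECB for IEDAY802
```

With a reconnect limit of 3, for example, the first dropped line in the sketch leaves its TSB with 3 minutes and posts the timer ECB; a second line that drops while the ECB is still posted is given 4.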
[Figure A-11. Overview of TSO Line Drop Process. TCAM address space: IEDAYH (TCAM MCP) issues SVC 101 (QTIP 4); QTIP 4 posts IEDAY802 (subtask of TCAM) and calls SIC (IKJEFLF), which schedules an SRB. User address space: SIC (SRB) IKJL4T00 posts the initiator; the initiator issues SVC 34, which calls RTM to abend the TMP and the command processor. *Upon return, the initiator continues with normal logoff.]

IEDAY802 (subtask of TCAM):
Keeps track of users whose lines have dropped and, if the time limit expires before they come back, terminates the address space. IEDAY802 does the following:
• Waits for TIOCTECB.
• Sets the one-minute timer.
• Invokes QTIP 27 (IEDAY88), SVC 101, entry 27, which scans the TSBs for TSBHUNG=1 and TSBMINL≠0. If so, QTIP 27 decreases TSBMINL by 1. If TSBMINL is now 0, QTIP 27 branch enters SIC (system-initiated cancel) with code 622. QTIP 27 returns a code of 0 if any users have time left or a code of 4 if all users have been cancelled.
• If the return code is 0, IEDAY802 goes to the one-minute timer.
• If the return code is 4, IEDAY802 waits for TIOCTECB.

SIC (system-initiated cancel):
IKJEFLF schedules an SRB in the address space to be terminated, passes a completion code (622 for line drop), and returns to the caller.

IKJL4T00 runs under the SRB scheduled by IKJEFLF and gets control the next time the address space is dispatched. IKJL4T00 does the following:
• If TMP is in control, skips to POST.
• Issues STATUS STOP for TCB= (IWAIT/OWAIT dispatchability bits).
• Issues QTIP 24, which sets TSBCANC=1 and removes a WAIT for other address spaces TPUTing to this user.
• Posts the cancel ECB in the CSCB (IKJL4T00 branch enters POST with completion code 622). The initiator (IEFSD263) waits for the ECB while TMP is in control.
• If TMP is not in control, issues STATUS START for the logon scheduler and monitor tasks.
• Exits.

Initiator (IEFSD263):
• Waits for the CANCEL ECB and ATTACH ECB of the TMP task.
• When the CANCEL ECB is posted, issues SVC 34 to abnormally terminate the user.

SVC 34:
• Issues CALLRTM, which sets the resume PSW of the TMP task to point to an SVC D instruction and forces the TMP task to be dispatchable.

SVC D (RTM2):
• Oversees the termination of the TMP task and all daughter tasks.
• When the TMP task terminates, its attach ECB is posted, giving the initiator control again.

Initiator:
Processing continues the same as for normal logoff except:
• IKJEFLK, the POST-TMP exit module, issues QTIP 24.
• IKJEFLC issues the "session cancelled" message before the logon scheduler XCTLs to the STC termination.
• If the line drops, IEDAY8, the TIOC resource manager, does not force the remaining messages out.

TMP and Command Processor Interface

The following is a description of the TMP and command processor flow.
1. The TMP is attached by the initiator as a result of a logon command from a terminal user or the execution of a batch job. Logon initialization establishes the STAE environment to handle abends and the STAX exit to handle attention interrupts.
2. The TMP mainline routine receives control and determines which buffer to obtain. This can be either:
   a. The logon buffer (from PARM= on the EXEC statement of the logon procedure)
   b. The command buffer, as a result of a PUTGET
   c. The buffer obtained by the attention prolog
   d. The buffer obtained by the STAI exit
3. If the current input is the command buffer, the TMP must check for five special cases as follows:
   a. PUTGET is responsible for checking for a '?' in the first buffer position in response to a mode message. When one is detected, PUTGET immediately issues the next available second-level message. Thus, the TMP should never receive a '?'
in a buffer, but if the user enters a ' ?' (blank ?), PUTGET passes the buffer through to the TMP.
   b. A null line.
   c. TEST command without operands.
   d. TIME command.
   e. If scan determines that the data in the buffer is not one of these special cases and that the data begins with an alphabetic character and is less than eight bytes, the TMP issues an ATTACH for the command name. Prior to ATTACH processing a search is conducted (through MLPA, LPA, joblib, LNKLSTxx, respectively) to assure a successful ATTACH. If the ATTACH is not successful, the TMP assumes a CLIST and attaches the EXEC CP to search the user's command procedure library. If the TMP does not locate either a command or a command procedure whose name is the same as that found in the input buffer, a 'COMMAND NOT FOUND' message is issued to the terminal.
4. If the command processor was attached, the TMP waits for an ECB list containing the following ECBs:
   a. STAI ECB: The TMP's STAI exit routine posts this when a command processor abnormally terminates and does not recover with its own STAE routine.
   b. Attention exit ECB: The TMP's attention exit routine posts this when it gains control. It gains control when the user enters an attention interrupt and the TMP exit is the current level exit. For more details, see the discussion of "TSO attention processing" later in this chapter.
   c. STOP/MODIFY ECB: This ECB is posted if a stop userid is requested by the system operator.
   d. Command processor ECB: This is the ECB specified in the attach of the command processor. It is posted when the processor terminates.
5. If the command processor ECB is posted, the TMP repeats step 2 to determine what action to take.
6. If the attention exit or STAI ECB is posted, the TMP does one of the following:
   a. If a ' ?' was entered in response to the mode message, the TMP sends second level messages to the terminal.
   b.
If a null line was entered, TMP returns control to the command processor. If an attention interrupt occurred, the TMP continues normal processing. If an abend occurred, the TMP takes a dump.
   c. If TEST was entered without operands, the TMP links to TEST and places the interrupted command processor under test control. When TEST processing is ended, the TMP detaches the current command and prompts the user with a 'READY' to enter a new command.
   d. If the TIME command was entered, the TMP displays the current time and prompts the user for a new command. In this case, the user can exercise any of the preceding options or enter a new command.
   e. If the user enters a new command or exercises one of the preceding options, the TMP detaches the current command and issues a PUTGET requesting new input.

The following common control blocks are used for communication among the TMP, command processors, and service routines (PUTGET, PARSE, etc.):

IKJTMPWA (TMP Work Area)
Created by: IKJEFT01
Length: 1076 bytes
Pointed to by: TMPWAPTR, WORKAPTR
Function: Provides communication among TMP modules. Contains register save areas, parameter lists for TEST and TMP, ABEND exit routines, and mappings of macros commonly used by TMP modules.
[Diagram: the TMP work area contains the TPL mapping, the CPPL (pointers to CBUF, UPT, PSCB, and ECT), and the ECT.]

IKJCPPL (Command Processor Parameter List)
Created by: IKJEFT01
Length: 16 bytes
Pointed to by: Register 1
Function: Provides parameters for the command processor.

IKJECT (Environment Control Table)
Created by: IKJEFT01
Length: 40 bytes
Pointed to by: TPL, CPPL
Function: Provides communication among the TMP, CP, and service routines. Contains current command/subcommand names, pointers to work areas and second-level message chains, and return codes.
[Diagram: ECT fields include the IOWA and SMSG pointers, the primary command name, and the subcommand name.]

IKJPSCB (Protected Step Control Block)
Created by: IKJEFLA
Length: 72 bytes
Pointed to by: LWA, CPPL
Function: Contains information from UADS, control bits, and accounting data for the user ID. (This accounting data is controlled by the installation via the ACCOUNT command.)
[Diagram: PSCB fields include the user ID at offset 0, the RLGB pointer at offset X'30', and the UPT pointer at offset X'34'.]

IKJRLGB (Re-Logon Buffer)
Created by: IKJEFLA
Length: 264 bytes
Pointed to by: PSCB
Function: Contains the LOGON/LOGOFF command entered at the terminal at the end of the session.

IKJUPT (User Profile Table)
Created by: IKJEFLA
Length: 24 bytes
Pointed to by: PSCB, CPPL
Function: Contains information stored in UADS that is used by LOGON/LOGOFF, the TMP, and the command processors. (This information is all controlled by the installation via the PROFILE command.)
[Diagram: UPT fields include the user environmental switches at offset X'C', the line-delete and character-delete characters, and the DSNAME prefix at offset X'10'.]

TSO Command Processor Recovery

The following describes IBM's TSO command processors. Figure A-12 summarizes their recovery activity.

ACCOUNT
The STAE exit routine for ACCOUNT flushes the input stack and posts the ACCOUNT ECB before returning to continue abend processing. ACCOUNT attaches the HELP command processor, specifying for a STAI exit routine the same name as the STAE exit routine.

EDIT
The ESTAE exit routine for EDIT flushes the input stack, stops automatic line prompting, and frees any acquired storage still remaining. The EDIT work area, mapped by IKJEBECA, can be located in a dump to obtain certain data on the EDIT session. The pointer to the communication area is passed between routines in register 0. By convention, most routines keep the pointer in register 9 during execution. A description of IKJEBECA can be found in the data areas microfiche (OS/VS2 Data Areas).
LOGON
The ESTAE exit routine for LOGON dequeues from the userid, closes the UADS data set, and detaches IKJEFLC. The LOGON work area, mapped by IKJEFLWA, can be located in the dump (field ASXBLWA in the ASXB) to obtain certain information on the session. A description of IKJEFLWA can be found in the data areas microfiche. LOGON also has an ESTAI exit routine, which dequeues from the userid, closes the UADS data set, cancels the attention exit, and frees subpools 1 and 78.

OPERATOR
The STAE exit routine for OPERATOR stops all active monitor functions if the abend is caused by a DETACH with STAE. OPERATOR also has a STAI exit routine that has the same name as the STAE exit routine. The SVC 100 parameter list, mapped by IKJEFFIB and passed to the OPERATOR command processor, can be located in the dump and certain data on the session can be obtained. A description of IKJEFFIB can be found in the data areas microfiche.

OUTPUT
Before returning to continue abend processing, the ESTAE exit routine for OUTPUT closes any data sets that are being processed. The OUTPUT work area, mapped by IKJOCMTB, can be located in a dump (while OUTPUT is in control) and certain data on the session can be obtained. OUTPUT attaches the HELP command processor specifying a STAI exit routine. The STAI exit routine simply returns to continue abend processing.

SUBMIT
The SUBMIT command processor runs under the STAI environment established by SVC 100. This STAI routine closes the INTRDR data set before it returns to continue abend processing. The SVC 100 parameter list, mapped by IKJEFFIB and passed to the SUBMIT command processor, can be located in the dump and certain data on the session can be obtained. A description of IKJEFFIB can be found in the data areas microfiche.
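
When several command processors are candidates for a failure, it can help to have the recovery exits and dump-resident areas named in the descriptions above collected in one lookup. The sketch below is only a convenience restatement of that prose: the dictionary layout and the function name are hypothetical, and ACCOUNT is given no area because none is named for it.

```python
# Exit types and dump-resident areas as named in the descriptions above.
RECOVERY = {
    "ACCOUNT":  {"exits": ("STAE", "STAI"),   "dump_area": None},
    "EDIT":     {"exits": ("ESTAE",),         "dump_area": "IKJEBECA"},
    "LOGON":    {"exits": ("ESTAE", "ESTAI"), "dump_area": "IKJEFLWA"},
    "OPERATOR": {"exits": ("STAE", "STAI"),   "dump_area": "IKJEFFIB"},
    "OUTPUT":   {"exits": ("ESTAE", "STAI"),  "dump_area": "IKJOCMTB"},
    "SUBMIT":   {"exits": ("STAI",),          "dump_area": "IKJEFFIB"},
}


def recovery_info(command):
    """Return the exits and mapping macro for a command processor name."""
    return RECOVERY[command.upper()]


print(recovery_info("logon"))
```

For LOGON, the lookup points to IKJEFLWA, which the text locates through field ASXBLWA in the ASXB; for OPERATOR and SUBMIT it points to the SVC 100 parameter list mapped by IKJEFFIB.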
[Figure A-12 table. Columns: Command Processor; STAE/ESTAE; STAI/ESTAI; RETRY; SDUMP; LOGREC; Messages. Rows: ACCOUNT (STAE, STAI; IKJ56554I); EDIT (ESTAE; see Note 1); LOGON (ESTAE, ESTAI; IKJ56451I, IKJ56452I, IKJ56406I); OPERATOR (STAE, STAI; IKJ55004I); OUTPUT (ESTAE, STAI; IKJ56318I); SUBMIT (STAI; IKJ56294I). Table entries also refer to Notes 2 and 3.]

Notes:
1. Abend codes B37, D37, and E37 point to IKJ52427I, IKJ52428I; the others point to IKJ52422I. If the data set is modified, abend codes point to IKJ52555I.
2. SDUMP is issued for all abends except for DETACH with STAE, codes 437, 913, and 422.
3. LOGREC is written to except for DETACH with STAE.
4. An effective trapping and problem solving technique for TSO command processors is to stop the error processing in the appropriate error recovery routine.

Figure A-12. Summary of Command Processor Recovery Activity

TSO Terminal Input/Output Overview

Terminal I/O flow is divided into two parts: input flow and output flow. This overview highlights each at the SVC level. TS/TCAM uses the services of three SVCs to communicate between the user's address space and the TCAM address space:
1. TGET/TPUT (SVC 93): The TMP and command processors issue this SVC to move data from the user's buffer to an interface buffer in CSA (TIOC buffer).
2. QTIP (SVC 101): This SVC is a set of multipurpose routines that perform functions for both the user address space and the TCAM address space. For example, QTIP is used by TCAM to move data from a TCAM buffer to an interface (TIOC) buffer and is also used by TGET/TPUT to move data from a user's buffer to a TIOC buffer.
3. STCC (SVC 94): This SVC is a set of routines used to update TCAM control blocks from the user's address space. For example, the user can use the TERMINAL command to change a terminal characteristic. This is communicated to TCAM via SVC 94.
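
When reading SVC entries in a trace, the three SVC numbers above, together with the QTIP entry codes mentioned elsewhere in this appendix, can be kept in a small lookup. The following sketch is a reading aid only: the function and table names are hypothetical, and only the entry codes that this appendix happens to mention are included (the complete table is in OS/VS TCAM Level 10 Logic).

```python
TERMINAL_IO_SVCS = {93: "TGET/TPUT", 94: "STCC", 101: "QTIP"}

# Only the QTIP entry codes mentioned in this appendix.
QTIP_ENTRY_CODES = {
    4: "line drop detected by IEDAYH",
    10: "find and initialize the TSB at logon",
    24: "set TSBCANC=1 for a cancelled user",
    27: "scan TSBs for expired reconnect time",
    28: "free the TCAM buffers",
}


def describe(svc, entry_code=None):
    """Label an SVC number (and optional QTIP entry code) from a trace."""
    name = TERMINAL_IO_SVCS.get(svc)
    if name is None:
        return f"SVC {svc}: not a TSO terminal I/O SVC"
    if svc == 101 and entry_code is not None:
        detail = QTIP_ENTRY_CODES.get(entry_code, "entry code not covered here")
        return f"SVC {svc} ({name} {entry_code}): {detail}"
    return f"SVC {svc} ({name})"


print(describe(101, 4))
```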
TS/TCAM data flow also requires a logical connection between a terminal, a line, and an address space. This is accomplished as follows:

• The terminal macro in the user's MCP establishes the connection between a terminal name and its destination (destination QCB).
• At TCAM initialization, OPEN establishes the connection between the destination and a physical terminal (a line control block is connected to the terminal name table via an index into the table).
• Logon processing establishes the connection between the destination QCB and the user's address space (the destination QCB contains the ASID of the user and the user's terminal status block (TSB) contains an index to the TCAM terminal name table). Also, a user's TSB and ASCB point to each other. The station's control block contains the address of the TSINPUT QCB.

Terminal I/O flow also requires the use of two special TCAM subtasks: TSINPUT and TSOUTPUT. TSOUTPUT acts as the router for all messages coming from time sharing users. TSOUTPUT is responsible for editing output messages as it moves the data from the time sharing interface buffers (TIOC buffers) in CSA to the TCAM buffers in the TCAM address space. Once TSOUTPUT has moved data to the TCAM buffers, the buffer is routed to the output side of the message handler and then written to the terminal. TSOUTPUT also runs as a subroutine of TCAM. TSOUTPUT is the first subroutine in control of the disk I/O QCB in a TCAM system that supports time sharing.

Terminal Output Flow

Assume that a user has logged on, the TMP has been initialized, and a PUTGET has been issued by the TMP to put out a 'READY' message and request input from the terminal user. The following now occurs:

1. The TMP invokes the services of the PUTGET service routine, which issues a TPUT and then a TGET (both SVC 93s). TPUT performs the following basic functions:
   a. Obtains a TIOC buffer from the pool of free buffers. If a buffer is not available or the user has passed the output buffer limit (OWAITHI parameter in IKJPRM00), the user is placed in an output wait (the appropriate flag is set in the TSB).
   b. If a buffer is available, the 'READY' message is moved from the user's buffer to the TIOC buffer.
   c. The user's terminal status block is placed on TCAM's asynchronous ready queue. (A special element at TSB + X'40' is used.)
   d. An XMPOST is done to alert TCAM.
   e. Control is returned to PUTGET.
2. When the TCAM address space is dispatched, and the MCP TCB regains control, TCAM searches its asynchronous ready queue and discovers the user's TSB. However, because this is a TS/TCAM system, TSOUTPUT receives control instead of the disk I/O routine. TSOUTPUT performs the following functions:
   a. Builds TCAM buffers from basic TCAM buffer units.
   b. Uses QTIP services to move the TIOC buffer from the TSB header queue (queue of complete output messages) to the TSB output trailer queue (queue of TIOC buffers being moved).
   c. Uses special TIOC edit routines (not QTIP) to move and edit data from the TIOC buffer to the TCAM buffer.
   d. Once the data has been moved into the TCAM buffers, the TCAM buffers are routed to the output side of the message handler and are then written to the terminal. After the message is successfully written, the TIOC buffers are freed via a subsequent call to TSOUTPUT.

Terminal Input Flow

The following process can run in parallel with step 2 in the preceding section, "Terminal Output Flow." It starts when control is returned to PUTGET as described at the end of step 1 in that section.

1. PUTGET issues a TGET to obtain input. TGET (SVC 93) performs the following functions:
   a. Checks to determine if there is an input buffer on the user's terminal status input queue. TCAM normally allows users of remote terminals to enter input while the current input is being processed.
      Therefore, it is possible that input could be 'stacked' and an input buffer found on the TSB input queue. However, TCAM does not allow local devices to 'stack' input. In this case, assume a local device and no buffer on the TSB input queue.
   b. Therefore, the TGET notifies SRM that an input WAIT has been entered and sets the appropriate flag in the TSB (IWAIT condition).
   c. SRM eventually performs a swap-out on the user.
2. The user now enters a new command at the display station and hits 'ENTER'. TCAM handles the interrupt, associates it via the LCB to a terminal name table index, terminal table entry, and destination QCB.
3. The TCAM buffer is routed to the input side of the appropriate message handler (determined from the DCB for the line). The message handler normally translates the data from line code to EBCDIC. The message handler must locate the destination QCB of the terminal that issued the message and also check that the terminal is logged on to time sharing. If it is logged on, the message handler routes the buffer to TSINPUT as the common input destination for all time sharing messages.
4. TSINPUT performs the following functions:
   a. From the ASID value in the terminal's destination QCB, TSINPUT determines which address space should receive a particular message.
   b. TSINPUT obtains a TIOC buffer from the free buffer pool. If no TIOC buffers are available, the TCAM buffer is chained from a special queue in the TSINPUT QCB until TIOC buffers are made available. In this case, the time sharing system is placed in an LWAIT (out of TIOC buffers).
   c. If a TIOC buffer is available, TSINPUT uses the services of QTIP to move data from the TCAM buffer to the TIOC buffer. Most line control characters and all 3270 buffer control characters are edited out of the message during this move.
   d. SRM is notified that the user is no longer in an input wait and may be swapped in.
   e.
The TCAM buffer is routed to the buffer disposition routine for final processing.
5. Once the TCAM buffer has been freed and final cleanup has been performed on the line, TCAM searches for additional work on the work-to-do queues. If there is none, TCAM enters a wait.
6. Once SRM has swapped-in the user, TGET regains control. Using QTIP, TGET moves the data from the TIOC buffer to the user's buffer.

TSO/TIOC Terminal I/O Diagnostic Techniques

For terminal hangs or interlocks involving TSO terminal I/O, a good place to start is at the TSB and TIOCRPT. The TSBs are physically contiguous and adjacent to the TIOCRPT (all in CSA), as shown below:

TCX (TCAM CVT extension)
  +24   TIOCRPT (reference pointer table)
        TSB

TIOCRPT is described in the Debugging Handbook. TSB is described in OS/VS2 Data Areas (microfiche). TIOC is described in OS/VS TCAM Level 10 Logic.

TSBOWIP and TSBWOWIP are used to serialize TPUTs to a user. TSBOWIP is set at the start of a TPUT SVC, while that SVC holds the local and CMS locks. If another TPUT is issued before OWIP is reset, then WOWIP is set and the issuer of the second TPUT is put in OWAIT. The task that has "seized the TSB" (that is, set OWIP) can be determined by checking TSBCTCB. (TSBTJIP and TSBTJOW serve approximately the same function for cross-memory TPUTs.)

TSO Attention Processing

The following section summarizes the process of TSO attentions. The numbers in parentheses correlate to the numbers in Figure A-13.

TCAM Channel End Appendage (1):
• Ensures TCAM is active.
• Finds the element associated with this terminal.
• Places the element on the asynchronous queue.
• TCAM dispatcher merges the asynchronous queue to the ready queue and gives control to the message handler.
• TCAM recognizes the following forms of terminal attention interrupts:
  - I/O attention interrupt for a 2741, which is checked in the line end appendage.
  - Two separate interrupts for the 3270: (1) a keyboard-invoked I/O attention interrupt, followed by (2) an I/O complete interrupt for the read issued by TCAM in response to the first interrupt.
  - A user character string for a simulated attention, which is checked by the SIMATTN routine.

[Figure A-13. TSO Attention Flow. Boxes shown: Hardware Attention; (1) TCAM Channel End Appendage; Simulated Attention; (2) Message Control Program (MCP); (3) Message Handler; (4) TIOC Attention Handler; (5) QTIP Attention Handler; (6) Time Sharing RCT; User Issues STAX; (7) RCT Attention Scheduler; (8) Selected User STAX Exit; (9) RCT Attention Exit.]

Simulated Attention (2):
The message control program (MCP) reads input from the terminal the same as it does for normal operation. It then passes the message to the message handler.

Message Handler (MH) (3):
• Checks for the following conditions and calls TIOC if any exist:
  - Terminal input (character string)
  - PA1 function key
  - Terminal output lines

TIOC Attention Handler (4):
• Ensures TSO is active.
• Gets the user's TSB.
• Checks if the attention was caused by a deleted line.
• Invokes QTIP (TIOC/TSO interface).

QTIP Attention Handler (5):
• Checks if the user has issued any STAX macros.
• Ensures the number of unprocessed attentions does not exceed the number of active STAXs (causes '!I' = TOO MANY ATTENTIONS or '!' = ATTENTION ACCEPTED to be printed at the terminal).
• Posts the RCT to schedule the user's attention exit.
• Purges input and output message queues to/from user except ASID type messages.

RCT (Region Control Task) (6):
• Waits for:
  - Termination
  - QUIESCE/RESTORE
  - Attention

RCT Attention Scheduler (7):
• Cancels previously-scheduled attentions that have not been executed.
• Determines the current attention level requested.
• Disables any affected tasks.
• If OBUF and/or IBUF was specified on the STAX macro, issues TPUT and/or TGET.

User STAX Exit (8):
• User defined.

RCT Attention Exit (9):
• Enables any affected tasks.
• Checks for another attention pending.
• RCT enters wait.

TSO APAR Documentation

TSO APAR documentation should include:
• Terminal input and output.
• SYSUDUMP or stand-alone dump, as appropriate.
• Information about how the system differs from PID release in the TSO area:
  - PTF list.
  - Information about non-IBM commands that appear in terminal output.
  - Description of any TMP modifications.
  - Description of applicable installation exits (LOGON, SUBMIT, etc.).
• A listing of the logon procedure, with a list of member names in STEPLIBs, if any.

Appendix B: Stand-alone Dump Analysis

This appendix contains a procedure that has been used successfully in stand-alone dump analysis. It is part of the course material in Field Engineering classes that teach MVS problem determination. This procedure does not attempt to cover all situations but it can be used as a guide through major status areas until you become thoroughly familiar with the system.

Overview

Stand-alone dumps are generally taken by the operator when he detects:
• That the system has stopped in a solid wait state with a wait state code.
• What appears to him to be a system loop.
• That the system is not running or is running slowly.

Usually the 'Title From Dump' reflects what the operator thought happened. Before becoming too involved in the problem itself, it is a good practice to get some feel for the status of the system at the time the dump was taken. Some valuable system status indicators can be obtained from the formatted section of the dump.
Indicators can be obtained from the formatted portion of the dump under "System Summary" (produced by the SUMMARY control statement) and CSD, PSA, LCCA, and PCCA (produced by the CPUDATA control statement). Although it is seldom that any one indicator definitely points out the problem, when all indicators are noted and analyzed, a pattern might emerge that points the problem solver to the proper area for further investigation.

The enabled wait generally occurs as a result of the lack of some critical system resource. If the PRINT statement of PRDMP is used, PRDMP identifies the current task. If the current task is the wait task, the message "Current Task = Wait Task" will appear. If it appears you have an enabled wait condition, read the chapter on "Waits" in Section 4 of this book before proceeding with your analysis.

The system can appear or actually prove to be bottlenecked because the operator cannot communicate with MVS. This is the sign of a problem almost anywhere in MVS, but an error in the communication task or its associated processing might be the direct cause. The communication task runs as a task in the master scheduler's address space, usually represented by the third TCB in the formatted portion of the stand-alone dump; it is identified by an X'FD' in the TCBTID field (TCB+X'EE'). By inspecting the RB structure associated with this task, you can determine the current status. It is not unusual to find one RB with a resume PSW address in the LPA and an RB wait count of one. If more than one RB is chained from the TCB and you could not enter commands, analyze the RB structure as this is not a normal condition.

Remember that communications task processing is very dependent on the rest of the operating system. Probably some external service or process has caused the communications task to back up, and this possibility should be investigated.
For the system to continue execution, the major components must be operational. If any critical system components such as master scheduler, ASM, JES2, and TCAM for TSO, terminate abnormally and fail to recover, the system cannot continue normal operation. Usually this can be determined from the records in SYS1.LOGREC. However, check the TCB summary in the formatted section for completion codes. The presence of a TCB completion code does not positively identify the associated task as being inoperative. It is possible that the completion code is residual and the task has recovered. The presence of a completion code makes the task suspect, however, and should be investigated.

Unless the operator STORE STATUS command was issued before taking the dump or the "Title from Dump" reflects a WSC (wait state code), it can be difficult to determine if a WSC exists and what it is if it does. If, however, the WSC PSW is dispatched by NIP during IPL, it is generally located in one of two places:
• In the MCH new PSW if a program check occurred prior to RTM initialization.
• In the nucleus vector table (NVT + X'E0') in the case of a system-detected error during the NIP process.

The other WSCs (they are few in number) issued by the system are dispatched by the master scheduler communications task and ASM. The current address space should identify who loaded the WSC PSW. WSC PSWs are issued when the system determines that it cannot continue. They are usually preceded by other error indicators that should be investigated along with the WSC.

Note: A valid WSC always looks like: X'00020000 00000xxx'

A disabled wait normally has a wait state code associated with it. If so, the messages and codes should contain a problem description. If there is no wait state code, the trace table should indicate the last sequence of events leading to the wait state condition. Probably a bad PSW (wait bit on) has been loaded.
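
The WSC format given in the Note above lends itself to a mechanical check. The following Python sketch is illustrative only: it assumes the 8-byte PSW has already been copied out of the dump as raw bytes, and the function name wait_state_code is hypothetical, not part of any dump formatting program.

```python
def wait_state_code(psw: bytes):
    """Return the wait state code (xxx) if the PSW has the form
    X'00020000 00000xxx'; return None for any other PSW."""
    if len(psw) != 8:
        raise ValueError("a PSW is 8 bytes long")
    first_word = int.from_bytes(psw[0:4], "big")
    second_word = int.from_bytes(psw[4:8], "big")
    # The first word must be exactly X'00020000' (only the wait bit on).
    if first_word != 0x00020000:
        return None
    # Only the low-order three hex digits of the second word may be non-zero.
    if second_word & 0xFFFFF000:
        return None
    return second_word & 0xFFF


if __name__ == "__main__":
    code = wait_state_code(bytes.fromhex("000200000000001B"))
    print("no valid WSC" if code is None else f"WSC = X'{code:03X}'")
```

A PSW that fails this test but still has the wait bit on falls into the "bad PSW" case described above and calls for trace table analysis instead.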
If no valid WSC exists and if the PSW reflects the wait bit, is disabled, and the STORE STATUS registers are not equal to zero, suspect: a user or Field Engineering trap or a SLIP trap (with a wait state code of X'01B'), a bad branch, or system damage. Examine the trace table and attempt to define the events that led up to the wait condition. Was the last entry an SRB dispatch or an SVC I/O interrupt? Using the PSW address, determine the entry point of the routine if possible.

The PSA is a system area whose status indicators are dynamically changing. The status indicators reflect the condition of the system the instant the dump was taken. Taken out of context, they can be misleading. Therefore, find out if the operator entered a STORE STATUS command and keep in mind the status could have been stored any time and not necessarily just before the dump.

Note: The best evidence that the operator issued STORE STATUS is the content of 'Current Registers and PSW at Time of Dump.' This is because although the stored status is put in the PSA+X'100' and the registers are put at PSA+X'160-1FF', the SADMP program reads this area as the current PSW and registers and writes them to the dump data set. On a UP, the formatted current data will be the same as in the PSA. On an MP system, however, the SADMP program issues SIGP to the other processor to store status. The STORE STATUS command always stores in the normal PSA at location zero. This means that the normal PSA will contain the registers and PSW from the other processor. If the SADMP program did not save the STORE STATUS data before issuing the SIGP instruction to the other processor, the data from the operator's STORE STATUS command would be overlaid and the contents lost. Also note that on an MP system there are three PSAs and the AMDPRDMP program formats all of them for you.
The normal PSA is used only during NIP (and SADMP). Always be sure you are looking at the right PSA when you are analyzing the PSA contents.

If the PSW + X'01' = xE or x2, the PSW = Wait PSW.
If PSW + X'00' = X'04', the system was disabled.
If PSW + X'00' = X'07', the system was enabled.

Determine whether the PSW contains a WSC or an address. Then determine what key the PSW reflects. PSA + X'101' = X'xC' or X'xE' where the x = key, as follows:

Key   Owner
0     supervisor
1     scheduler/JES2/JES3
5     IOS, data management, actual block processor, O/C/EOV
6     TCAM/VTAM
7     IMS
8     virtual problem program
9-F   V=R problem program

Check the PSA for a low storage overlay. Critical fields are the CVT pointer at X'10', the new PSW locations at location X'58-78' and at location X'00', and the trace table pointer at location X'54'. Keep in mind that the CVT pointer at location X'10' is constantly refreshed and the old PSWs are constantly updated by the hardware. They could have been overlaid at one time and still look okay in the dump from an MP system.

In a SADMP on a UP, locations X'00' through X'18' are always overlaid by the IPL CCWs and PSW from the IPL of SADMP itself. These locations never contain valid data.

If the PSW reflects the wait bit and does not have a zero address and if the STORE STATUS registers are zero, check location X'300'. Is it equal to the wait state PSW? If so, it is possible some task scheduled a bad SRB. Examine the trace table for the SRB dispatch. Register 0's position in the trace table is a pointer to the SRB. The previous address space before the SRB dispatch is the possible scheduler of the SRB. Another possibility is an overlaid RB or LCCA. What does the last entry in the trace table reflect - SRB or task dispatch? Make sure that the trace table was not stopped by the dump task. Check for an X'80' in the high order byte of the CPUID field.
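
The PSW byte tests described earlier in this section (wait PSW, enabled or disabled, and protection key) can be collected into one small routine. The Python sketch below is a hedged illustration: it applies the rules exactly as this section states them, assumes the 8-byte PSW is available as raw bytes, and uses hypothetical names (describe_psw, KEY_OWNERS) that do not come from any IBM program.

```python
# Key assignments as listed in this section; keys 2-4 are not listed there.
KEY_OWNERS = {
    0x0: "supervisor",
    0x1: "scheduler/JES2/JES3",
    0x5: "IOS, data management, actual block processor, O/C/EOV",
    0x6: "TCAM/VTAM",
    0x7: "IMS",
    0x8: "virtual problem program",
}


def describe_psw(psw: bytes) -> dict:
    """Summarize the first two PSW bytes using the rules in the text."""
    if len(psw) != 8:
        raise ValueError("a PSW is 8 bytes long")
    byte0, byte1 = psw[0], psw[1]
    key = byte1 >> 4            # high-order hex digit of PSW + X'01'
    low_digit = byte1 & 0x0F    # low-order hex digit of PSW + X'01'
    if byte0 == 0x04:
        state = "disabled"
    elif byte0 == 0x07:
        state = "enabled"
    else:
        state = "other"         # not covered by the two cases in the text
    if 0x9 <= key <= 0xF:
        owner = "V=R problem program"
    else:
        owner = KEY_OWNERS.get(key, "not listed")
    return {
        "wait": low_digit in (0xE, 0x2),   # xE or x2 means a wait PSW
        "state": state,
        "key": key,
        "owner": owner,
    }


if __name__ == "__main__":
    print(describe_psw(bytes.fromhex("040E000000000000")))
```

The routine only classifies the PSW; deciding whether the remaining bytes hold a WSC or an address is still a separate step.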
Loops can be either disabled or enabled. The best way of determining which has occurred is by noting the address of the loop if the operator recorded it before taking the SADMP. Recorded addresses that fall within the SRM code are usually not indicative of a loop because this code is entered periodically as a result of a timer interrupt. This signifies, however, that the system does enable for interrupts and you can treat the error as an enabled loop.

Caution: If the only addresses the operator furnished are in the timer or SRM code, check that it is not really an enabled wait condition.

The typical disabled loop is quite short, whereas the enabled loop covers a wide range of addresses. Be careful that the recorded addresses that may reflect a short loop are not a 'loop within a loop.' Scan the trace table and try to determine if a pattern of activity exists. Look for SIOs to the same device, SVCs from the same address, program checks occurring frequently for other than page faults, or any repetitive activity. If no pattern exists, try to correlate the last trace entry with what you already know about the loop (for example, I/O interrupts, a loop in an IOS or SRB dispatch, and a loop in the nucleus in some routine which is entered via an SRB).

The enabled loop usually reflects a wide range of addresses and can even span address spaces between a user and the system address spaces. An examination of the trace table usually shows some pattern of activity that is recognizable as a loop. Be especially suspicious of a SVC 0D or SVC 0A for the same size area, SVC 33, SVC 4C, and SIOs to the same device with the same IOSB address in register 1. Trace table entries with SVC 0D and/or SVC 33 in a stand-alone dump usually mean that some task is abending and the system is attempting to recover and purge the task from the system.
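
Scanning the trace table for repetitive activity, as described above, amounts to counting how often the same kind of entry recurs. The Python sketch below shows the idea under loud assumptions: the (kind, detail) tuples are a hypothetical, hand-transcribed representation of trace entries, not the real trace table layout, and repeated_activity is an illustrative name.

```python
from collections import Counter


def repeated_activity(entries, min_count=3):
    """Return (entry, count) pairs that occur at least min_count times.

    entries is an iterable of hashable (kind, detail) tuples, for example
    ("SIO", "device 190") or ("SVC", "0D"), transcribed by hand from the
    trace table. This is a hypothetical representation, not the real format.
    """
    counts = Counter(entries)
    return [(entry, n) for entry, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    trace = [
        ("SIO", "device 190"),
        ("SVC", "0D"),
        ("SIO", "device 190"),
        ("EXT", "timer"),
        ("SIO", "device 190"),
    ]
    for entry, n in repeated_activity(trace):
        print(entry, "seen", n, "times")
```

A high count is only a hint; the entries still have to be read in sequence to decide whether they form a loop or normal periodic activity such as timer interrupts.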
If any address within the loop points to the lock manager (module IEAVELK), the problem is probably caused by someone requesting an unavailable spin lock. On a UP, this is an invalid condition and always signifies an overlaid lockword. On an MP system, this signifies that the other processor is holding the lock and failing to release it. There is a strong possibility that this indicates an overlaid lockword also. If not, the problem is on the other processor. In either case, register 11 can point to the lockword requested and register 14 is the address of the requestor. Check the value in the lockword. Valid values are a fullword of zeros or three bytes of zeros and the CPUID in the fourth byte. Any other bit configuration causes the system to spin in a disabled loop and signifies an overlaid lockword. Register 12 always contains the bit mask to check the locks-held-table in the PSA. If the lockword is overlaid, you must identify who overlaid it. It is possible that the lockword was overlaid in conjunction with some other problem.

This procedure is designed to aid the problem solver and to supplement the diagnostic procedures he has developed over the years. Its main purpose is to call attention to the new serviceability features within MVS and provide an index into the correct component analysis procedures in Section 5 of this manual. Once again, the component analysis procedures are there as hints and helps rather than to provide a structured approach to all problems.

[Figure B-1. Stand-alone Dump Analysis Flowchart]

Analysis Procedure

The following explanations correlate to the "Notes" in Figure B-1.

Note 0 - Dummy Task?

The enabled wait generally occurs as a result of the lack of some critical system resource. If the PRINT statement of PRDMP is used properly (see the chapter "Additional Data Gathering Techniques" in Section 2), the message "CURRENT TASK = WAIT TASK" appears in the formatted portion of the dump.

PSA + X'218/21C' is the new/old TCB pointer. PSA + X'220/224' is the new/old ASCB pointer. The ASCB with an ASID=0 is the dummy ASCB. If the dummy task is the current task, go to the TCB summary before going to the next block and check whether any task has an error completion code. If any TCBs are abending, continue at Point A and start with the top three address spaces if they have a completion code.

If the ASCB is the dummy ASCB but the TCB new/old pointers are zero, then take the "no" path and check for SRB mode.

Note 1 - System Enabled for I/O?

Is bit 6 on in the current PSW? Is control register 2 correctly loaded? The current status of the system is in the PSA if a STORE STATUS command was entered before the dump was taken.

Note 2 - Dispatchable Work to be Done?

1. One of the first places to check for system dispatchability is the common system data area (CSD). For example, CSD+C=40 indicates that most of the system is non-dispatchable. This bit can be set by SDUMP. Is any address space abending and in the process of taking an SDUMP? Check the TCB summary for completion codes.
2. Dispatchable work within an address space is indicated by:
   ASCB+80 = 00000000 or FFFFFFFF
   ASCB+66, 67, 72, 73 = 00
   ASCB+7C = some value
   or ASCB+1C = the service priority list has an SRB queued.
3.
The JES2/JES3 address space can contain work that should be passed to a waiting initiator or interface that has an address space for SYSIN or SYSOUT data.
4. Dispatchable work at the system level is indicated by SRBs queued to the service manager queue and the global service priority list.

For (1), you must determine who set the bit on, who should have reset it, and why the bit was set. It might be necessary to trap on the setting of this bit.
For (2), a 7FFFFFFF in the lockword keeps the dispatcher from dispatching. Most of the flags show the reason for not dispatching.
For (3), check the JES control blocks more closely.
For (4), determine why the dispatcher is not functioning. See the "Dispatcher" chapter in Section 5 of this manual.

Note 3 - Enqueue Lockout?

As in other systems, an exclusive enqueue prevents other tasks from using the same resource. However, in MVS, locks are now used frequently instead of an enqueue.

1. Use the QCB format function to print the QCBs and check for exclusive enqueues.
2. The CVT+X'280' points to the first major QCB. Major QCB+8 points to the first minor QCB. Minor QCB+0 points to the next minor QCB. Minor QCB+8 points to the first QEL.
3. Any QEL reflecting exclusive control or reserve status prevents any other task from using that resource. Any QEL reflecting shared status prevents any task requesting exclusive control from using that resource.
4. The Debugging Handbook defines some of the major and minor ENQ names.

Note 4 - Incomplete I/O?

Label IECVSHDR in IEANUC01 points to a pool of cells used by IOS to build the IOQ (I/O queue element). The IOQs are found in two places:

1. An IOQ chained to the UCB indicates an I/O operation is in progress or has completed on that device. The flag bytes at UCB + 6 determine the current state of the device. The device is available when the flag byte is zero.
   No request for this device should be chained to the LCH during an enabled wait.
2. The IOQs are chained to the logical channel queues (LCH) if the I/O operation has been requested but not started. The LCH is pointed to by the CVT+X'8C'. The entry for each logical channel is 20 bytes long. At X'00' into each entry is a pointer to the first IOQ queued for that logical channel. The presence of IOQs on any logical channel is immediately suspect when examining an enabled wait state dump. An empty queue (no requests) is indicated by a word of FFFFFFFF in the LCH at X'00'.

Note 5 - Is Any Task in a Page Wait?

Check the TCB RBs for a wait count not equal to zero.
RB+1C = wait count
RB-8 ≠ 40 (FLAG1)

Note 6 - Explicit Wait in System Code?

Does the address in the PSW fall within the nucleus or LPA code? Compare the address with a NUCMAP or LPA map. Check the load list and CDEs for system modules that have been loaded into the private area.

Note 7 - Real Storage Okay?

If a task remains in a page wait, it could indicate a shortage of page frames or a real storage failure. The control blocks that contain status about the use of real storage are:

1. Page vector table (PVT)
   PVT+4 = available frame count
   PVT+X'24' = free PCB count
   PVT+X'140' = deferred for lack of free page frames
   PVT+X'148' = requests sent to ASM
2. Page frame table (PFT)
   Shows use of each frame of real storage available for paging.

Note 8 - Is Auxiliary Storage Okay?

If tasks are in a page wait and real storage is not a problem, the trouble could be within the auxiliary storage manager (ASM). ASM status indicators are:

1. ASMVT+X'28' = the number of paging I/O requests received
2. ASMVT+X'2C' = the number of paging I/O requests completed
3. ASMVT+X'50' = the number of started I/O requests that have not completed
4.
ASMVT+X'54' = indicates whether the ASM's SRB for ILRPTM (ASM PART monitor) has been scheduled

If the number of paging I/O requests completed is equal to the number of paging I/O requests received, the ASM has no outstanding work. However, if these counts differ, check the other status indicators for the following:

1. If I/O requests have been started but not completed, determine what has happened to the I/O.
2. If ASM's SRB for ILRPTM has been scheduled, determine what the dispatcher has done with the SRB.

Note 9 - Is IOS Okay?

If the number of started I/O requests that have not completed (ASMVT+X'50') is zero, then IOS has completely processed all the I/O that ASM has started.

Note 10 - Interrupted SRB or TCB?

The condition that caused the SRB to be suspended has been resolved. The suspended SRB (SSRB) is queued on the SPL at the non-quiesceable level.

The condition that caused the TCB holding the local lock or local and CMS locks to be suspended has been resolved. The save area to be restored upon dispatching is the IHSA.

A TCB holding the local lock or local and CMS locks has been interrupted by a higher priority task. The save area used for redispatching is the IHSA.

See the chapter "Dispatcher" in Section 5 or the chapter "System Execution Modes and Status Saving" in Section 2 of this manual.

Note 11 - Not RTM?

Without the detection of a failure by MVS, which would have caused entry into RTM, check the following. If the stand-alone dump reflects the same current task, this could be normal operation or the task could be in a loop. Check the following for status information:

• LCCA
• PSA
• PCCA
• Trace table
• TCB
• RB/SVRB

If no failure information is found (the system appears to be running normally), the problem might be that another task or address space should be running and is unable to. Check the following for status information:

1.
Check each address space that is expected to be running to find out why it is not running. The information about each address space and task within that address space can be found in: ASCB, ASXB, TCB, and RB/SVRB.
2. Or, check the total system to find out why other work is not being run. Check the status of the system resources:
   - ENQ lockout of data sets
   - I/O failures
   - RSM or ASM failure
   - Waits in system code for other system resources (such as buffers)

If you are checking other than the current task, the TCBs could be dispatchable, but not yet dispatched. If the task is non-dispatchable (non-dispatchability bits on in the TCB), this can indicate an error situation. Or the task could be simply waiting (indicated by a wait count in the current RB). Check the dispatchability flags in the following control blocks for status of this task or select another address space or task and continue at Point A. Status information can be found in: ASCB, ASXB, TCB, and RB/SVRB.

If this is a system dump, the TCB belongs to a non-abending sister, mother, or daughter task. Find the task that has a completion code (by checking the TCB summary) and continue at Point A.

Note 12 - RTM2, Yes.

The most important place to find information about abend codes is OS/VS Message Library: VS2 System Codes.

The RTM2 work area address is stored by RTM2 in TCB+X'E0'. Every system dump (SYSABEND/SYSMDUMP/SYSUDUMP) should have at least one TCB with an RTM2WA address at TCB+X'E0'. The error indicators contained in the RTM2WA are described in the Debugging Handbook.

If an ESTAE routine is in control when an error occurs, RTM builds an SDWA (described in the Debugging Handbook) and places its address at the RTM2WA+X'D4'.

Additional information about the failure may be found in the LOGREC buffer. RTM2WA+X'38' points to RTCT; RTCT+X'20' points to the LOGREC buffer.
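
The pointer chain just described (TCB+X'E0' to the RTM2WA, RTM2WA+X'38' to the RTCT, RTCT+X'20' to the LOGREC buffer) can be followed mechanically. The Python sketch below is a minimal illustration under stated assumptions: storage is a hypothetical flat byte image in which the offset equals the storage address, pointers are masked to 24 bits on the assumption that the high-order byte may hold flags, and the function names are invented for this example.

```python
def read_address(storage: bytes, addr: int) -> int:
    """Read a fullword at addr and keep the low-order 24 bits.

    Masking is an assumption: the high-order byte is treated as flags.
    """
    word = int.from_bytes(storage[addr:addr + 4], "big")
    return word & 0x00FFFFFF


def locate_logrec_buffer(storage: bytes, tcb_addr: int):
    """Follow TCB+X'E0' -> RTM2WA+X'38' -> RTCT+X'20'.

    Returns the LOGREC buffer address, or None if any pointer is zero.
    """
    rtm2wa = read_address(storage, tcb_addr + 0xE0)
    if rtm2wa == 0:
        return None          # no RTM2 work area for this TCB
    rtct = read_address(storage, rtm2wa + 0x38)
    if rtct == 0:
        return None
    logrec = read_address(storage, rtct + 0x20)
    return logrec or None


if __name__ == "__main__":
    # Build a tiny synthetic image: TCB at X'100', RTM2WA at X'400',
    # RTCT at X'800', LOGREC buffer at X'1000'.
    image = bytearray(0x2000)
    image[0x1E0:0x1E4] = (0x400).to_bytes(4, "big")
    image[0x438:0x43C] = (0x800).to_bytes(4, "big")
    image[0x820:0x824] = (0x1000).to_bytes(4, "big")
    print(hex(locate_logrec_buffer(bytes(image), 0x100)))
```

In a stand-alone dump the RTM2WA normally resides in LSQA, so a real lookup would also have to resolve addresses within the correct address space; this sketch ignores that.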
If recursion occurs during RTM processing, other RTM2WAs may exist. If other work areas were obtained, the last one is pointed to by TCB+X'E0'. The last RTM2WA points to the previous work area (RTM2WA+X'168, 16C, 170'). If there is no space in LSQA to build the work area, SQA is used.

Normally the RTM2WA is obtained from LSQA. It is therefore unique to each address space. If you are looking at a stand-alone dump, be sure that the area you are looking at belongs to the failing address space.

If the abending task is one of several abending tasks, it is important to decide which task to look at first. There could be several failures or one failure causing all the others. Any failures in the system address spaces (JES2, master scheduler) are important because they might have caused the user address spaces to terminate.

For the system to continue execution, the major components must be operational. If any of the critical system components (master scheduler, ASM, JES2, TCAM for TSO, etc.) abend and fail to recover, the system cannot continue normal operation. Usually this can be determined from the records in logrec. However, check the TCB summary in the format section for completion codes.

The presence of a TCB completion code does not positively identify the associated task as being inoperational. It is possible that the completion code is residual and the task has recovered. The presence of a completion code makes the task suspect, however, and it should be investigated.

Simplify your choice of address spaces by using:

• SYS1.LOGREC external and internal entries
• Console sheets
• Trace table or GTF (check for SVC D or program check entries)

Once you have selected an address space and TCB, continue at Point A. (Check Section 5 for the component analysis of the involved component.)
In addition to the RTM2WA, status indicators related to the problem can be found in:

• Trace table
• ESTAE control block (SCB)
• RB/SVRB
• TCB

Note 13 - Local Lock Only?

The current ASCB+X'C' contains the CPU ID. The current TCB+X'110' also contains the CPU ID. The loop is within this task. Status is saved (if a STORE STATUS was done) in:

• PSA
• LCCA
• Current stack
• Local SDWA (ASCB+6C) - if the task abended while holding the lock
• Trace table
• In-storage LOGREC buffer

Is this task looping in the lock manager's code? Check PSA+X'228' and LCCA+X'20C'. If the task is looping and this is an MP system, the other processor could be causing the loop by not freeing a spin lock that it is currently holding.

Note: The failure to free or obtain a lock can be caused by the lockword being overlaid on either an MP or UP.

If both processors of an MP are looping in lock manager code, then the failure could be in that code. If only one processor is in lock manager code, then the failure is likely to be in the processor currently holding the lock.

Within the lock manager code, register 12 contains the bit mask for the locks-held table in the PSA (PSA+X'2F8'). Register 11 can contain the address of the lockword itself and register 14 contains the return address of the requestor.

Where is the task looping? Why doesn't it free the locks? Is RTM involved with this task? If it is, continue at Point A.

See the chapters on "Locking" and "Effects of Multi-Processing on Problem Analysis" in Section 2 of this manual.

Note 14 - Local Lock Plus Another Lock.

The current ASCB contains the CPU ID. The current TCB+X'110' contains the CPU ID. The loop is within this task if only the local and CMS locks are held. The loop could be a spin loop waiting for the other processor to release a global lock. In this case, determine why the lock has not been released.
Status indicators can be found in the following areas (if a STORE STATUS was done):

• PSA
• LCCA
• Current stack
• Local SDWA (ASXB+6C) - if the task abended while holding local and CMS locks
• Trace table
• In-storage LOGREC buffer

See Note 13 for additional information. Also see the chapters "Locking" and "Effects of Multiprocessing on Problem Analysis" in Section 2 of this manual.

Note 15 - Global Lock Held.

A global lock loop in an MP system could be normal. The spin loop continues until the global lock is released by the other processor. Determine why the other processor has not released the lock.

Error status indicators can be found in the following areas if a STORE STATUS was done:

• PSA (current PSW)
• LCCA
• Current stack
• Global SDWA (if there was an abended failure while the global lock was held)

The global SDWA for the super stacks is located at the respective super stack+X'254'. For the normal stack, the global SDWA immediately follows the RESTART super stack SDWA.

Now continue at Point A in the procedure.

See Note 13 for additional information. Also see the chapters "Locking" and "Effects of Multiprocessing on Problem Analysis" in Section 2 of this manual.

Note 16 - IOS Not Okay.

Check the requests sent to IOS from auxiliary storage manager (ASM). Control blocks containing information are:

1. PART (paging activity reference table) - One entry per page data set. Each PART entry contains a pointer to an IORB (I/O request block) at X'1C' and a pointer to a UCB at X'2C'.
2. IORB contains the following I/O related data:
   IORB+X'1' = number of IORBs for this page data set
   IORB+X'3' = indicates whether IORB is in use
   IORB+X'4' = pointer to next IORB for this page data set
   IORB+X'8' = pointer to the first PCCW
   IORB+X'C' = pointer to the IOSB

Refer to the Component Analysis section for additional IOS status indicators.
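
The IORB offsets listed in Note 16 lend themselves to a small decoding aid when many IORBs have to be inspected. What follows is a hedged sketch in Python rather than an authoritative mapping: the field widths (one byte for the count and the in-use indicator, a fullword for each pointer) are assumptions, and the class and function names are invented for illustration. The Debugging Handbook remains the reference for the real layout.

```python
import struct
from dataclasses import dataclass


@dataclass
class IorbFields:
    """The IORB fields named in Note 16 (names chosen for this sketch)."""
    iorb_count: int      # IORB+X'1': number of IORBs for this page data set
    in_use_byte: int     # IORB+X'3': raw in-use indicator, left undecoded
    next_iorb: int       # IORB+X'4': pointer to next IORB for this page data set
    first_pccw: int      # IORB+X'8': pointer to the first PCCW
    iosb: int            # IORB+X'C': pointer to the IOSB


def decode_iorb(raw: bytes) -> IorbFields:
    """Decode the first X'10' bytes of an IORB copied out of a dump."""
    if len(raw) < 0x10:
        raise ValueError("need at least X'10' bytes of IORB storage")
    # Assumption: the three pointers are consecutive big-endian fullwords.
    next_iorb, first_pccw, iosb = struct.unpack_from(">III", raw, 0x4)
    return IorbFields(
        iorb_count=raw[0x1],
        in_use_byte=raw[0x3],
        next_iorb=next_iorb,
        first_pccw=first_pccw,
        iosb=iosb,
    )
```

The in-use indicator is returned as a raw byte because the text does not say which bit or value marks an IORB as in use.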
Note 17 - Suspended SRB or TCB With Lock Held.

An SRB can be suspended because of a page fault or a request for a CMS lock when it is being held by another processor. The save area for the suspended SRB is the SSRB. If interrupted by a page fault, the SSRB is pointed to by the corresponding PCB+X'1C'. For a CMS lock request, the SSRB is on the CMS lock-suspended queue, which can be located in IEANUC01 at label CMSSRBF. (See AMBLIST of IEANUC01.)

A locked TCB can be suspended for the same reasons as an SRB. The save area is the IHSA (described in the Debugging Handbook). The IHSA is valid during a page fault if the corresponding PCB+8 flag is on. The IHSA is valid for a CMS lock suspend if the ASCB is on the CMS lock suspend queue in IEANUC01 at label CMSASBF.

The TCB can be suspended because of a page fault while holding the local lock and the CMS lock. A difference would be that the ASCB+X'67' flag for the CMS lock is turned on.

See the chapter "Dispatcher" in Section 5 and the chapter "System Execution Modes and Status Saving" in Section 2 of this manual.

Note 18 - Not RTM2.

The presence of a TCB completion code does not positively identify the associated task as being inoperational. It is possible that the completion code is residual and the task has recovered. The presence of a completion code makes the task suspect, however, and it should be investigated.

The save areas have been released. The status of the error has been written to SYS1.LOGREC. Continue at Point A with other TCBs in the dump. Another abending task is likely.

If this is a stand-alone dump, it very likely has the needed LOGREC entry in the in-storage buffer. CVT+X'23C' points to RTCT; RTCT+X'20' points to the LOGREC buffer.

Note 19 - Real Storage Not Okay.

If page waits seem to be caused by the lack of real frames, check their usage.
The PFT contains information about each frame currently being used. Important items to check are:

• Which ASID holds the most real storage?
• What are the frames being used for?
• Is it valid that they be held or is there a problem with the freeing of the frames?

Status information might be found in the PVT, PFT, and RSMHD and ASCB (X'98') for the ASID that is holding all the frames.

See the "RSM" chapter in Section 5 of this manual for more information about RSM.

Note 20 - IOS Okay.

Either something was missed along the way or the failure is in one of the following areas:

• The IOS interrupt handler has failed to schedule the SRB/IOSB to the address space.
• The dispatcher has not handled the SRB correctly.
• POST has not functioned properly.

Information on these errors might be found in the trace table or the in-storage LOGREC buffers.

Note 21 - RTM1 Involved.

If there is an address at TCB+X'104' there might be two problems to resolve:

• The failure that caused the system to enter RTM initially.
• A loop between RTM1 and RTM2, since the pointer at TCB+X'104' normally lasts for only a short time.

The pointer at TCB+X'104' points to the EED (described under RT1W in the Debugging Handbook). This data area is used to pass information from RTM1 to RTM2. Once RTM2 receives control the information is moved to the RTM2 work area and the EED is deleted. Therefore, because of its short life span, the presence of an EED is unusual. A SLIP trap may be required to solve the RTM loop. This loop is of course the most important problem.

If the loop is in the current task, check these status indicators:

• LCCA
• PSA
• Current stack
• RTM1WA
• RTM2WA
• SDWA pointed to by RTM1WA
• EEDs
• LOGREC buffer
• Trace table

If the loop is not in the current task, all the indicators above except the LCCA, PSA, and current stack are valid.
The current FRR stack is also a valid status indicator. Remember that all disabled or locally locked code runs under the protection of an FRR routine. Check the current stack pointer at PSA+X'380'. If the current stack pointer points to a super FRR it is almost certain that system damage has occurred.

The normal stack at X'C00' contains a record of FRR activity for the current address space. Location X'C0C' is the pointer to the current entry on the normal FRR stack. An address of X'C34' at X'C0C' indicates an empty stack. Any address between X'C54' and X'E34' indicates that the system is currently under FRR protection and the first word in each FRR entry is a pointer to the FRR routine. Because the FRR routine is usually embedded within the routine it protects, identifying the FRR routine identifies the "looper."

The second word in each entry contains an indicator in the first byte. A X'80' indicates that this routine is in control. A X'40' indicates that this nested recovery routine is in control. If any entry on the stack points to RTM or ABDUMP's FRR, it is almost certain that system damage has occurred in a SADMP. This is normal in an SVC dump.

If there is an address at either X'C44' or X'C48', there has been an entry into RTM1 and an RTCA (SDWA) has been obtained.

The loop could be occurring in the FRR routine itself. The first word in the FRR stack entry points to the FRR routine. The SDWA (pointed to by X'C44' or X'C48') is the input passed to the FRR. Examine the code for the FRR and the module and consider the input passed to it in the SDWA to gain some insight into the cause of the loop.

Note 22 - Auxiliary Storage Not Okay.
If the count of I/O requests received (ASMVT+X'28') differs from the count of I/O requests completed (ASMVT+X'2C'), and the number of started I/O requests that have not completed (ASMVT+X'50') is zero, locate those paging I/O requests (represented by an AIA) that ASM has received but not completed. Control blocks containing information are:

1. AIA-X'28' = part of PCB which contains RSM-related data
2. ASMVT+X'20-24' = queue of AIAs waiting for IOEs
3. The I/O request element (IOE) which points to the AIA is queued to one of the following PART queues:
   PART+X'30-34' = common write queue
   PART+X'38-3C' = spill write queue
   PART+X'40-44' = duplex write queue
   PART+X'48-4C' = local write queue
   Each PART entry contains an unsorted read queue (X'C') and a sorted read queue (X'30').
4. Each active IORB (PART entry+X'1C') contains a chain of PCCWs (IORB+X'8'). Each of these PCCWs points to an AIA (PCCW+X'8').
5. If the AIA cannot be found by the above means (that is, it was lost by ASM), PCB/AIA may be found on the common I/O queue (PVT+X'75C-760') or one of the local I/O queues (RSMHD+X'1C-20').

For further information, see ASM's "General Debugging Approach" in Section 5.

Note 23 - Local SRB Mode.

This indicates a loop (or enabled wait) within a single address space. The SRB code cannot be pre-empted. If a loop occurs in the SRB routine, no higher priority task can be dispatched. For an MP system there is a second possibility. Determine if the loop is in the lock manager code. If so, see notes 13, 14, and 15 for additional information. Continue at Point A.

Status Indicators

• Trace table.
• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if abend occurred during SRB processing.
• ASCB.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN (if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN for SDWA failed.
Note: If the system is an MP and the loop is in the lock manager code, then the other processor might be at fault. See notes 13, 14, and 15 for additional information. Continue at Point A.

Status Indicators

• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if failure occurred during SRB processing.
• Trace table.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN (if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN failed.

See the chapter "Dispatcher" in Section 5. Also see the chapters "Locking," "System Execution Modes and Status Saving," and "Effects of MP on Problem Analysis" in Section 2 of this manual.

Note 24 - Global SRB Mode.

This indicates an enabled loop (or enabled wait) within a single address space. The SRB code cannot be pre-empted. If a loop occurs in the SRB routine, no higher priority task can be dispatched. For an MP system there is a second possibility. Determine if the loop is in the lock manager code. If so, see notes 13, 14, and 15 for additional information. Continue at Point A.

Status Indicators

• Trace table.
• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if ABEND occurred during SRB processing.
• ASCB.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN (if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN failed.

Note: If this is an MP system and the loop is in the lock manager code, then the other processor might be at fault. See notes 13, 14, and 15 for additional information. Continue at Point A.

Status Indicators

• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if failure occurred during SRB processing.
• Trace table.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN (if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN failed.

See the chapter "Dispatcher" in Section 5.
Also see the chapters "Locking," "System Execution Modes and Status Saving," and "Effects of MP on Problem Analysis" in Section 2 of this manual.

Note 25 - Wait in User Code.

This could be normal operation for an explicit wait (SVC 1) issued by a user routine. Determine if the event waited upon has completed. Check the TCB non-dispatchability flags to determine the reason. The flags normally indicate the area of the problem. For example, if Flags4 = X'04', this indicates a VARY or QUIESCE command is in process on an MP system; Flags5 = X'80' means the task was terminated.

Note 26 - Non-enabled System.

A disabled wait normally has a wait state code associated with it. If so, the messages and codes should contain a problem description. If there is no wait state code, the trace table should indicate the last sequence of events leading to the wait state condition. Probably a bad PSW (wait bit on) has been loaded.

Status Indicators

• LCCA
• PSA
• Current stack
• Trace table
• In-storage LOGREC buffer

If no valid WSC exists, if the PSW reflects the wait bit and is disabled, and if the STORE STATUS registers are not equal to zero, suspect a user/FE trap, a SLIP trap (wait state code 01B), bad branch, or system damage. Examine the trace table and attempt to define events that lead up to the wait condition. Was the last entry an SRB dispatch or an SVC or I/O interrupt? Using the PSW address, determine the entry point of the routine if possible and go to the chapter "MVS Trace Analysis" in Section 2 of this manual.

If the wait state occurs during system initialization, see the NIP vector table for error information.

If the system is in a disabled loop, determine what code is in control and why it is not returning to the enabled state. A disabled loop in the lock manager on an MP system could be okay. Read notes 13, 14, and 15.
A disabled loop in the SIGP processor on an MP system could be okay. (The other processor should turn off its PCCA's parallel/serial bit.) If the system is looping (no wait bit), follow the SRB mode path. Check if RTM is involved and if it is, go to Point A.

Note 27 - Dispatchable Work Available.

If the system is dispatchable and an address space has dispatchable work, the following are possible causes:

• The dispatcher is not functioning.
• CPU affinity may have been requested.
• JES2 might not be sending work to the initiators. In this case, take a closer look at JES2.

See the chapter "Dispatcher" in Section 5 of this manual to determine why the dispatcher is not functioning properly.

Note 28 - Enqueue Lockout.

Determine why the top task of a series of exclusive enqueues is not running or has not dequeued from the resource.

Note: It is valid for the top task to be swapped out. If it does not get swapped back in, then the failure might be in the system resource manager (SRM).

Note 29 - Incomplete I/O.

This is a probable hardware error. See the "IOS" chapter in Section 5 to determine the status of I/O.

Note 30 - Explicit Wait in System Code.

Check in the program listings (on microfiche) for the reason for the wait. Then determine which resource is being waited upon. Once the resource is identified, determine if the wait should have been satisfied. If the wait appears to be a normal operation, continue at Point A for this TCB.

If the last thing done before the wait was an SVC 23 (WTO), related information can be found in the UCM base, prefix UCM, UCM extension and the chain of used WQEs.

Note 31 - System Analysis.

If the failing task or component is not known, continue on the "yes" path of the flowchart.
To determine status about a TCB without doing a total system analysis, continue on the "no" path of the flowchart.

For a complete system analysis, start with low storage. Check the PSA for a low storage overlay. Critical fields are the CVT pointer at X'10', the PSW new locations at location X'58-78' and at location X'00', and the trace table pointer at location X'54'. Be especially critical of the interrupt handler new PSWs. Any change to any new PSW will cause the next interrupt handler for that event to be dispatched in the wrong mode or key or to the wrong address. Subsequent results can be very unpredictable.

Keep in mind that the CVT pointer at location X'10' is constantly refreshed and the old PSWs are constantly updated by the hardware. They could have been overlaid at one time and still look okay in the dump from an MP system. In a SADMP on a UP, locations X'00' through X'18' are always overlaid by the IPL CCWs and PSW from the IPL of SADMP itself. They will never contain valid data.

Other important fields in the PSA are as follows.

The interrupt codes for the various classes of interrupts are located at:

• X'84' external interrupt
• X'88' SVC interrupt
• X'8C' program interrupt

These fields indicate the last type of interrupt associated with each interrupt class for each processor.

PSA+X'210' - address of the LCCA (1 per processor). The LCCA contains many of the status-saving areas that were located in low storage in previous systems. It is used for software environment saving and indicators. The registers associated with each of the interrupts you have discovered in the PSA are saved in this area. In addition, the system mode indicators for each processor are maintained in the LCCA.

The ASCB and TCB NEW/OLD pointers in the PSA (locations X'218-227') indicate the currently dispatched task.

Note: PSATOLD can equal zero if an SRB is dispatched.
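
When the same low-storage fields have to be checked in several dumps, it can help to pull them out in one pass. The sketch below is illustrative only. It assumes a copy of one processor's PSA is available as a byte string, reads the interrupt-code locations as fullwords, and labels the individual new PSWs with their System/370 assignments, which the text above gives only as the range X'58-78' plus X'00'. All names are invented for this example.

```python
import struct

# New PSW doublewords: the text gives X'00' and X'58-78'; the per-class
# labels are the System/370 assignments and are not spelled out above.
NEW_PSW_OFFSETS = {
    "restart": 0x00,
    "external": 0x58,
    "svc": 0x60,
    "program": 0x68,
    "machine_check": 0x70,
    "io": 0x78,
}

# Fullword fields quoted in the text (reading them as fullwords is an assumption).
WORD_FIELDS = {
    "cvt_pointer": 0x10,
    "trace_table_pointer": 0x54,
    "external_interrupt_code": 0x84,
    "svc_interrupt_code": 0x88,
    "program_interrupt_code": 0x8C,
    "lcca_address": 0x210,
}

# The ASCB/TCB NEW/OLD pointers occupy X'218-227'; they are returned as
# four raw words because the text does not give their order.
DISPATCH_WORD_OFFSETS = range(0x218, 0x228, 4)


def summarize_psa(psa: bytes) -> dict:
    """Collect the PSA fields named in Note 31 for side-by-side inspection."""
    if len(psa) < 0x228:
        raise ValueError("need at least X'228' bytes of PSA storage")
    summary = {}
    for name, offset in NEW_PSW_OFFSETS.items():
        summary[f"{name}_new_psw"] = psa[offset:offset + 8].hex().upper()
    for name, offset in WORD_FIELDS.items():
        summary[name] = struct.unpack_from(">I", psa, offset)[0]
    summary["dispatch_pointers"] = [
        struct.unpack_from(">I", psa, offset)[0]
        for offset in DISPATCH_WORD_OFFSETS
    ]
    return summary
```

The function only extracts values. Deciding whether a field has been overlaid still requires comparing it with a known-good system, bearing in mind the caveats above about refreshed fields and the SADMP IPL overlay of X'00' through X'18'.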
PSA+X'228' - PSASUPER. This is a field of bits that represent various supervisory functions in the system. If a loop is suspected, check these fields to isolate the looping process.

PSA+X'2F8' - PSAHLHI. This field indicates the current locks held on each processor. Knowing which locks are held may help isolate the problem, especially in a loop situation. By determining the lock holders you can isolate the current process.

PSA+X'380' - PSACSTK. This is the address of the active recovery stack that contains the addresses of the recovery routines to which control will be routed in case of an error. If the address is other than X'C00' (normal stack), determining the type of stack (for example, program check FLIH, restart FLIH) should aid in debugging the loop situation.

Another thing to consider in systems analysis is the possibility of a storage overlay of some critical system code such as IOS or GETMAIN. Because of the recovery aspects of MVS (percolation and retry), evidence of storage overlays can often be found in the LOGREC recording buffers.

To find the LOGREC recording buffers:

CVT+X'23C' = pointer to the recovery termination control table (RTCT).
RTCT+X'20' = pointer to the recording buffers.
The recording buffer (LRB)+0 = pointer to the first entry.
The recording buffer (LRB)+4 = pointer to the last entry.
The recording buffer (LRB)+8 = pointer to the next available buffer.

Each buffer entry for a software record begins with X'408x' or X'428x' where x = the release number. Each software entry is approximately X'200' bytes long. The first X'20' bytes are header information and contain the CPUID and serial, the time and date, and the JOBNAME if entry is made from an ESTAE routine. This is followed by an SDWA as defined in the Debugging Handbook.

Identify the last entry. Are there entries following it?
If so, the buffer might have been wrapped and it no longer contains the earliest entry. It is a good idea to have the SYS1.LOGREC records for the time leading up to the dump.

Scan the trace table for SVC 4C. This represents a call to the logrec recording task and identifies a record being written to SYS1.LOGREC. If SVC 4Cs appear in the trace, it is certain that there are SYS1.LOGREC records that may more closely define the problem. (See the discussion of logrec records in the chapter "Use of Recovery Work Areas for Problem Analysis" in Section 2 of this manual.)

As a general approach, follow the flow of FRR activity from the last entry backwards until a pattern is recognizable or the first entry is found. If the abend codes relate to a particular component, refer to that component's analysis procedure in Section 5 of this manual.

If you can define a function that is consistently failing (IOS, a program check, etc.), examine the trace table for evidence of successful completion of this function. If the function completed successfully, the search for the function that caused the overlay is narrowed to those functions appearing in the trace between the last successful completion and the first evidence of error. This should at least narrow the search to the address space and task level.

Analyze the contents of the overlaid storage. If it appears to contain registers, determine what data areas or modules the registers are pointing at. This helps to identify the failing code.

If there is no evidence of a storage overlay, return to your system analysis at the beginning of Note 31.

If a storage overlay exists, further examination of the reported problem is usually non-productive until the cause of the system damage is explained. It might be necessary to build a trap to identify the cause of the overlay.
The chapter "Additional Data Gathering Techniques" in Section 2 of this manual helps in building such a trap.

Appendix C: Abbreviations

ABP       Actual block processor
ACA       ASM control area
ACB       Access method control block
ACE       ASM control element
ACP       Automatic command processing
ACR       Alternate CPU recovery
ACT       Account control table
ADA       Automatic data area
AFQ       Available frame queue
AIA       ASM I/O request area
ALCWA     Allocation work area
ALPAQ     Active link pack area queue
AMB       Access method block
AMBL      AMB list
AMCBS     Access method control block structure
AMDSB     Access method data statistics block
AP        Attached processor
APF       Authorized program facility
APG       Automatic priority group
ASCB      Address space control block
ASID      Address space identification
ASM       Auxiliary storage manager
ASMHD     Auxiliary storage management header
ASMVT     ASM vector table
ASPCT     Auxiliary storage page correspondence table
ASST      Address space sector table
ASVT      Address space vector table
ASXB      Address space extension block
ATA       ASM tracking area
AVT       TCAM address vector table

BPCB      Buffer pool control block
BUFC      Buffer control area

CA        Control area or channel adapter
CAW       Channel address word
CAXWA     Catalog ACB extended work area
CCA       Catalog communications area
CCH       Channel check handler
CCW       Channel command word
CDE       Contents directory entry
CFQ       Common frame queue
CHAP      Change priority
CI        Control interval
CIDF      Control interval definition field
CMB       Console message buffer
CMP       Completion field
CMS       Cross memory services or catalog management services
CMSWA     CMS work area
CPA       Channel program area
CPAB      Cell pool anchor block
CPB       Channel program block
CPPL      Command processor parameter list
CPU       Central processing unit
CPUID     CPU identification
CQE       Console queue element
CRA       Component recovery area
CSA       Common storage area
CSCB      Command scheduling control block
CSD       Common system data area
CTGPL     Catalog parameter list
CVT       Communications vector table
CXSA      Communications extended save area

DAT       Dynamic address translation
DAVV      Direct access volume verification
DCB       Data control block
DCM       Display control module
DCT       Device control table
DDRCOM    Dynamic device reconfiguration communication table
DE        Directory entry
DEB       Data extent block
DECB      Data event control block
DIDOCS    Device independent display operators console support
DIE       Disable interrupt exit
DIR       Deferred incident record
DMDT      Domain descriptor table
DMVT      Domain vector table
DQE       Descriptor queue element
DRQ       Data ready queue
DSAB      Data set association block
DSCB      Data set control block
DSPCT     Data set page correspondence table
DVT       Destination vector table

ECB       Event control block
ECC       Error checking and correction
ECT       Environment control table
EDB       Extent descriptor block
EDL       Eligible device list
EED       Extended error descriptor
EIL       Event indication list
EIP       EXCP intercept processor
EMS       Emergency signal
EOA       End of address
EP        Emulator program
EPATH     Error path (recovery audit trail area)
EPS       External page storage
ERP       Error recovery procedures
ERPIB     Error recovery procedures interface block
ESTAE     Extended STAE
ESTAI     Extended STAI
EVNT      Event table
EWA       Common ERP work area

FBQE      Free block queue element
FDB       Feedback data block
FETWK     Fetch work area
FIFO      First in first out
FLIH      First level interrupt handler
FMCB      VTAM function management control block
FOE       Fixed ownership element
FOT       Fixed ownership table
FQE       Free queue element
FRR       Functional recovery routine
FRRS      FRR stack
FSB       Feedback status block
FVT       Field vector table

GDA       Global data area
GPR       General purpose register
GSMQ      Global service manager queue
GSPL      Global system priority list
GSR       Global shared resource
GTF       General Trace Facility

HIR       Hardware instruction retry

IC        Instruction counter
ICNCB     Intermediate controller node control block
IHSA      Interrupt handler save area
ILC/CC    Instruction length condition code
IOB       Input output block
IOE       I/O request element
IOMB      I/O management block
IOQE      I/O queue element
IORB      I/O request block
IOSB      I/O supervisor block
IOT       I/O table
IOWA      I/O work area
IPC       Inter-processor communication
IPCS      Interactive problem control system
IPL       Initial program load
IPS       Installation performance specifications
IQE       Interrupt queue element
IRB       Interrupt request block
IRT       IOS recovery table

JCL       Job control language
JCT       Job control table
JES       Job Entry Subsystem
JESCT     JES control table
JFCB      Job file control block
JFCBX     Job file control block extension
JOE       Job output element
JOT       Job output table
JPQ       Job pack queue
JQE       Job queue element
JSCB      Job step control block
JSEL      Job scheduling entry list
JSXL      Job scheduling exit list

KSDS      Key sequence data set

LCB       TP line control block
LCCA      Logical configuration communication area
LCCAVT    Logical configuration communication area vector table
LCH       Logical channel queue
LCPB      Logical channel program block
LCT       Linkage control table
LDA       Local data area
LFQ       Local frame queue
LG        Logical group
LGB       Line group block
LGCB      Logical group control block
LGE       Logical group entry
LGN       Logical group number
LGVT      Logical group vector table
LGVTE     Logical group vector table entry
LIFO      Last in first out
LIT       Lock interface table
LLE       Load list element
LLQ       Load list queue
LPA       Link pack area
LPDE      Link pack directory entry
LPID      Logical page identifier
LPME      Logical to physical mapping entry (or) logical page mapping entry
LRB       Logrec buffer
LSID      Logical slot ID
LSMQ      Local service manager queue
LSPL      Local service priority list
LSQA      Local system queue area
LVB       NCP logical unit block
LWA       Logon work area

MCH       Machine check handler
MCIC      Machine check interrupt code
MCP       Message control program
MCS       Multiple console support
MFA       Malfunction alert
MH        Message handler
MIH       Missing interrupt handler
MLPA      Modified link pack area
MP        Multiprocessing
MPST      Memory process scheduling table
MSS       Mass storage subsystem
MVS       Multiple Virtual Storage
MWA       Module work area

NCB       VTAM node control block
NCP       Network Control Program
NIP       Nucleus initialization program

OCR       Output control record
OCT       Output control table
OPWA      Open work area
ORE       Operator reply element
OUCB      SRM-user control block
OUSB      SRM-user swappable block
OUXB      SRM-user extension block

PAB       Process anchor block
PART      Paging activity reference table
PARTE     PART entry
PAT       Page allocation table
PCB       Page control block
PCCA      Physical configuration communication area
PCCAVT    PCCA vector table
PCCB      Private catalog control block
PCCW      Paging channel command work area
PCE       Processor control element
PDDB      Peripheral data definition block
PDS       Partitioned data set
PEP       Partitioned emulator program
PER       Program event recording
PFT       Page frame table
PFTE      Page frame table entry
PGT       Page table
PGTE      Page table entry
PICA      Program interrupt control area
PIE       Program interrupt element
PIT       Partition information table
PIU       Physical information unit
PLH       Place holder
PLPA      Pageable link pack area
PLPAD     PLPA directory
PQE       Partition queue element
PRB       Program request block
PSA       Prefixed save area
PSAHLHI   PSA highest lock held indicator
PSCB      Protected step control block
PSS       Process scheduling service
PST       Process scheduling table
PSW       Program status word
PTLB      Purge translation lookaside buffer
PVT       Paging vector table
PVTAFC    PVT available frame count
PWKA      Paging work area

QAB       Queue anchor block
QCB       Queue control block
QEL       Queue element

RACF      Resource Access Control Facility (Program Product)
RB        Request block
RBA       Relative byte address
RBN       Real block number
RCB       Resource control block
RCT       Region control task
RDCM      Resident display control module
RDF       Record definition field
RDT       Resource definition table
RDTE      Resource definition table entry
RIM       Resource initialization module
RJE       Remote job entry
RMCT      Resource manager control table
RMF       Resource Management Facility (Program Product)
RMS       Recovery management support
RPH       Request parameter header
RPL       Request parameter list
RQE       Request queue element
RSM       Real storage manager
RSMHD     RSM header
RTAM      Remote terminal access method
RTCA      Recovery termination control area
RTCT      Recovery termination control table
RTM       Recovery termination manager

S/A       Stand-alone (dump program)
SART      Swap activity reference table
SAST      Subsystem allocation sequence table
SAT       Swap allocation table
SCCW      Swap channel control work area
SCT       Step control table
SDWA      System diagnostic work area
SGT       Segment table
SGTE      Segment table entry
SIC       System initiated cancel
SIGP      Signal processor
SIO       Start input/output
SIOT      Step I/O table
SLIH      Second level interrupt handler
SMF       System measurement facility
SMS       Storage management services
SNA       System Network Architecture
SPCT      Swap control table
SPQE      Subpool queue element
SQA       System queue area
SRB       Service request block
SRM       System resources manager
SRR       Serially reusable resource
SSCP      System services control point
SSCVT     Subsystem communications vector table
SSI       Subsystem interface
SSIB      Subsystem identification block
SSOB      Subsystem options block
SSQ       SVRB suspend queue
SSRB      Suspended service request block
SSVT      Subsystem vector table
STAE      Specify task abnormal exit
STAI      Subtask abend intercept
STC       Started task control
STCB      Subtask control block
STM       Store Multiple instruction
STOR      Segment table origin register
SVC       Supervisor call
SVRB      Supervisor request block
SWA       Scheduler work area

TCAM TCB TCH TIE Telecommunications Access Method Task control block Test channel TCAM Pageable display control module Translation exception address Transmission header Terminal I/O coordinator Task input/output table Translation lookaside buffer Task mode controller Task mode element Terminal monitor
program Time of day Terminal statl,ls block Time Sharing Option Trace table entry UADS UCB UCM User attribute data sets Unit control block Unit control module TeX TDCM TEA . TH TIOC TIOT TLB TMC TME TMP TOO TSB TSO - - . Appendix C: Abbreviations C.I.7 Abbreviations (contin~ed) UCME UIC UPT Unit controlmQdule. entry Unreferencedidterval count User .profile table VBN VBP VOSCB VIO VSAM VSM VTAM VTOC VUT·. Virtual block number Virtual block processor Virtual data set control block· Virtual I/O Virtaul Storage Access Method Virtual storage··management Virtual Telecommunications Access Method Volume table of contents :Volume unload table - I WAST WMST WQE WfQE XL XPTE C.l.8 . Workload activity specification 'table Workload manager specification table Write queue element Wait queue element Extent list External page table entry OS/VS2 System Prol@JPmingLibrary: MVS Diagnostic Techniques Index abbreviations, list of C.l.3 abend codes ASM 08x series 5.6.14 COD in ASM 5.6.19 started task control 2.7.19 SW A manager 2.7.20 symptoms of lOS problems 5.2.4 OBO in Allocation 5.11.13 OC4 in Allocation 5.11.13 .306 abend in program manager 5.3.18 806 abend in program manager 5.3..14 abend dump debugging 2.7.11 .abend resource manager 5.3.13 abnormal end appendages with ERPs 5.2.10 abnormal task termination (RTM) 5.14.5 ACB (access method control block) how to locate 5.10.3 major fields in 5.10.4 major flags in 5.10.4 ACCOUNT command processor A.6.19 ACR (see alternate CPU recovery) active recovery stack 2.1.6 additional data gathering techniques 2.8.1 addresses, commonly bad 2.7.5 address space analysis 2.1. 
7 ASM's 5.6.5 blocks 4.4.4 dispatchable work in B.1.7 dispatcher's 5.1.8 initialization 5.4.3 OUCB queues 5.7.6 states 5.7.2 termination 5.14.9 tests made by dispatcher 5.1.11 allocation of SRM device 5.7.5 of virtual storage 5.4.6 allocation/unallocation abends OBO 5.11.13 OC4 5.11.13 address space termination 5.11.13 allocation common 5.11.4 fixed device 5.11.4 generic 5.11.5 module naming conventions 5.11.6 recovery 5.11.5 serialization 5.11.11 TP 5.11.4 work area 5.11.7 batch initialization 5.11.2 data set association block (DSAB) 5.11. 7 device selection 5.11.12 dynamic initialization 5.11.3 EST AE processing 5.11.10 JFCB housekeeping 5.11.3 job control table (JCT) 5.11.2 job step control block (JSCB) 5.11.2 allocation/unalloca tion (con tin ued) linkage control table (LCT) 5.11.2 reason codes 5.11.16 step control table (SCT) 5.11.2 unallocation common 5.11.5 dynamic 5.11.3 volume mount and verify (VM&V) 5.11.5 defmition 2.5.2 initiated via EMS 2.5.10 problem analysis 2.7.1 AMCBS, major fields in 5.10.2 AMDPRDMP control cards 2.8.2 example of use of data 4.1.2 how to copy tapes 2.8.5 QCBTRACE option function 2.8.4 use for loop analysis 4.2.3 use for wait analysis 4.1.2 APFauthorization 5.3.14,5.3.18 appendages, abnormal end with ERPs 5.2.10 ASCB (address space control block) analysis 2.1.7 ASM (auxiliary storage manager) address space structure 5.6.6 cell pools 5.6.6 component functional flow 5.6.2 operating characteristics 5.6.4 con trol blocks 5.6.19 converting a slot number to full seek address COD abend 5.6.19 diagnostic aids 5.6.18 error analysis suggestion 5.6.12 finding the LSID for a given page 5.6.9 footprints and traces 5.6.7 FRR/ESTAE work areas 5.6.15 general debugging approach 5.6.8 incorrect pages 5.6.9 interfaces with other components 5.6.7 MP considerations 5.6.6 page/swap data set errors 5.6.12 paging interlocks 5.6.8 recovery as a debugging tool 5.6.15 considerations 5.6.13 footprints 5.6.15 structure 5.6.14 traces 5.6.14 register conventions 
5.6.7.0 requesting I/O 5.6.3 requesting swap I/O 5.6.4 saving an LG 5.6.2 SDWA variable recording area 5.6.16 serialization 5.6.13.0 SRB structure 5.6.4 storage considerations 5.6.4 system mode 5.6.4 task structure 5.6.4 5.6.10 Index I.i.l ASM (auxiliary storage manager) (continued) unuseable paging data sets 5.6.11 validity checking 5.6.13 ATA (ASM tracking area) 5.6.19 ATTACH (program manager function) 5.3.8,5.3.15 attention processing (TSO) A.6.25 attention, console not responding 5.15.6 audit trail area (EPATH) 5.6.22 auxiliary storage manager (see ASM) backout (for DEFINE/DELETE) 5.10.13 batch initialization 5.11.2 BLDL table analysis 4.4.5 BPCBs (buffer pool control blocks) 5.8.12 BSHEADER data area 5.6.25 BUFCONBK data area 5.6.25 buffer emergency signal 2.5.11 external call 2.5.17 LOGREC 2.4.14 translation lookaside 2.5.1 VTAM buffer pools 5.8.16 VTAM buffer trace 4.3.6,4.3.29 cancel process (RTM) 5.14.7 catalog communications area (see CCA) catalog management backout 5.10.13 CMS function gate 5.10.11 component analysis 5.10.1 debugging aids 5.10.15 diagnostic output 5.10.12 establishing/releasing a recovery environment 5.10.10 how to fmd registers 5.10.1 maintaining a pushdown list end mark 5.10.10 major control blocks 5.10.2 major registers 5.10.2 module structure 5.10.9 recovery routine functions 5.10.12 tracking GETMAIN/FREEMAIN activity 5.10.11 VSAM catalog recovery logic 5.10.10 catalog parameter list (CTGPL), major fields 5.10.6 CAXWA major fields 5.10.5 major flags 5.10.6 CCA (catalog communication area) major fields 5.10.7 major flags 5.10.7 CDE (contents directory entry) allocation 5.3.17 analysis 4.4.4 initialization by IDENTIFY 5.3.12 order of on ALPAQ 5.3.15 cell pool anchor block (see CP AB) cell pool management VSM 5.4.10 channel program with ERPs 5.2.10 channel scheduler, invoked for lOS 5.2.1 CHNGDUMP command to change SDUMP contents 2.8.2, 3.1.6 to override SVCDUMP parameters 2.8.5 C/L IN, OUT traces definition 4.3.6 example 4.3.12 class 
locks with ASM 5.6.13.2 L1.2 CMS function gate 5.10.11 CMS lock 2.3.2 CMS lockword contents 2.3.5 requests for unavailable 2.3.7 suspend queues 2.3.7 command processor and TMP interface A.6.15 parameter list A.6.17 COMM task, current status 4.1.15 (see also communications task) common allocation 5.11.2 storage area (see CSA) unallocation 5.11.2 communications task control blocks 5.15.4 debugging hints 5.15.6 description 5.15.1 sequence of processing 5.15.3 compare and swap serialization with ASM 5.6.13.3 completion codes in IOSB for ASM errors 5.6.11 console messages 5.15.9 not responding to attention 5.15.6 switching 5.15.10 contents directory entries (see CD E) controllayer A.5.1 converting virtual to real addresses 5.5.14 CPAB 5.4.10 CQE control block 5.15.4 CSA (common storage area) analysis of use of 4.4.5 use by TCAM A.6.7 CTGPL (catalog parameter list), major fields 5.10.6 current recovery stack (see FRR stacks) CVOL processor 5.10.9 CXSA contro block 5.15.4 DASD ERPs 5.2.14 data gathering techniques 2.8.1 data sets page/swap errors 5.6.12 DEFINE/DELETE backout 5.10.14 DELETE (function of program manager) 5.3.11 DIDOCS in-operation indicator 5.15.11 locking 5.15.12 trace table 5.15.11 disabled loop (see loops) disabled mode 2.2.2 disabled wait (see waits) DISP lock description 2.3.2 recovery routines when held 5.1.3 dispatch able units of work in an address space B.1. 
7 priority and location 5.1.4 dispatchability tests address space 5.1.11 SRB 5.1.10 task 5.1.12 dispatcher component analysis 5.1.3 determining the last dispatch 5.1.12 dispatchability tests 5.1.10 error conditions 5.1.14 OS/VS2 System Programming Library: MVS Diagnostic Techniques dispatcher (continued) important entry points 5.1.3 processing overview 5.1.9 recovery considerations 5.1.13 DISPLAY DUMP command 2.8.2 DSNLIST data area 5.6.26 dump analysis areas 3.5.6 MP 2.5.2 stand-alone 3.1.3, B.1.1 tracing procedure 2.6.5 dumps how to copy tapes 2.8.5 how to print 2.8.2 sample storage pool dump 5.8.13 DUMP command 2.8.2,5.14.12 FRR stacks, important field contents 2.4.17, B.1.17 functional recovery routine (see FRR) GDA (global data area) for VSM 5.4.7 GETMAIN/FREEMAIN GETMAIN FRR 5.4.8 indication in trace table 2.6.9 process flow A.4.1 SVC 120 5.4.12 virtual storage allocation 5.4.6 GETPART/FREEPART 5.4.5 global data area for VSM 5.4.7 global indicators of current system state 2.1.3 global locks defmition 2.3.1 error status B.1.14 spin locks definition 2.3.2 content of lockword 2.3.5 suspend locks defmition 2.3.2 content of lockword 2.3.5 global SRBs control block relationships 5.1.5 dispatching 5.1.4 mode indicators set by dispatcher 5.1.12 queue structure 5.1.5 status indicators B.l.19 global service priority list 2.2.2 (5740-XEl) global system analysis (chapter) 2.1.3 GSMQ/LSMQ 2.1.7 GSPLs/LSPLs 2.1.7, 2.2.2 GTF (generalized trace facility) I/O and SIO trace (EP) 4.3.4 I/O and SIO trace (NCP) 4.3.5 output examples 2.8.17 RNIO trace 4.3.5 trace examples 2.6.3 EDIT command processor A.6.19 EED, important fields 2.4.18 ElL control block 5.15.4 Emergency Signal instruction (see EMS) EMS (function of SIGP) defmition 2.5.7,2.5.10 process flow 2.5.14 enabled loop (see loops) enabled loop exception 4.2.3 enabled wait (see waits) enabling PER hardware 2.8.18 ENQ/DEQ analysis for enabled waits 4.1.12 analysis for performance degradation common ENQ resource names 4.1.13 
enqueue lockout B.1.8 global save area 4.4.4 EP mode traces 4.3.4 EPATH (error path) 5.6.22 ERPs (error recovery procedures) abnormal end appendages 2.7.3, 5.2.10 description 5.2.8 diagnostic approach 5.2.17 EWA (ERP work area) 5.2.6 traps 5.2.16 error id 5.14.10 error interpreter table 5.2.11 error recovery procedures (see ERPs) EST AE/EST AI ASM work areas 5.6.15 processing, allocation 5.11.10 EWA (ERP work area) 5.2.6 EXCP major control block relationships 5.2.3 EXCP/IOS process flow A.3.1 execution modes (see system mode) exit resource manager 5.3.11 explicit waits 2.1.8, B.1.9 extended error descriptor (EED) 2.4.18,5.14.2 external call (XC function of SIGP) description 2.5.7,2.5.9 process flow 2.5.12 FETCH, program manager work area (FETWK) FMCB/DNCB, how to fmd for a node 5.8.14 FORCE command 5.14.8 FORMAT statement (ofPRDMP) 2.8.4 formatted RTM control blocks 2.4.19 formatting (LOGREC buffer) 2.4.15 FRR (functional recovery routine) ASM's 5.6.14 ASM's FRR work areas 5.6.15 GETMAIN's 5.4.8 RSM's 5.5.9 SRM's 5.7.10 hard ware-detected errors, analy sis hierarch¥ of locks 2.3.2 5.3.19 3.1.10 IDENTIFY (function of program manager) 5.3.12 lEA VGF A tests by RSM A.l.3 IEAVIOCP tests by RSM A.1.6 lEA VPIOP tests by RSM A.l.6 lEA VPIX tests by RSM A.I.3 IEAVSWIN A.2.1 IHSA 2.2.1 IEAVTABD 5.14.12 IEAVTSDT 5.14.11 ILC/CC important field contents 2.1.4 Incorrect Output (chapter) 4.5.1 analyzing system functions 4.5.2 initial analysis 4.5.1 isolating the component 4.5.1 in-operation indicator DIDOCS 5.15.11 Installation Performance Specification (IPS) 5.7.1 inter-processor communication 2.5.7 interactive problem control system (lPCS) 1.1.4 intercept condition ERPs 5.2.13 interrupts, PSA fields B.1.23 I/O capability in MP 2.5.17 incomplete B.1.8 problems in enabled waits 4.1.10 requesting (ASM) 5.6.3 requesting swap 5.6.4 trace entries 2.6.7 Index 1.1.3 I/O (continued) VTAM I/O trace (see VTAM) lOB (see 10MB) I/O manager debugging 5.9.9 modules 5.9.8 10MB 5.9.9 I/O 
request, information in PLH 5.9.2 lOS (I/O Supervisor) ABEND codes 5.2.4 back-end processing 5.2.1, A.3.3 component analysis S.2.1 ERP processing 5.2.8 EXCP/IOS process flow A.3.1 front-end processing 5.2.1, A.3.1 general hints 5.2.6 loops 5.2.4 major control block relationships 5.2.3 POST STATUS A.3.3 problem analysis 5.2.1 processing overview 5.2.2 save areas 5.2.6 storage manager queues 4.4.4 VTAM interaction A.5.1 wait states 5.2.5 IOSB flags 5.2.7 IOSCAT lock 2.3.2,2.3.6 IOSLCH lock 2.3.2, 2.3.6 IOSUCB lock 2.3.2, 2.3.6 IOSYNCH lock 2.3.2.2.3.6 IPC (see inter-processor communication) IPCS 1.1.4 JES2 Gob entry subsystem) &DEBUG parameter 5.12.14 &WAIT macro 5.12.9 control blocks 5.12.17 conversion 5.12.1 dispatcher 5.12.9 queue structure 5.12.10 error routines catastrophic 5.12.13 disastrous 5.12.11 ESTAE 5.12.13 exit 5.12.13 I/O error logging 5.12.14 execution 5.12.1 HASP Control Table (HCT) 5.12.4 HASPSSSM 5.12.6 multi-access spool configuration 5.12.14 initialization 5.12.15 read 5.12.15 release 5.12.16 write 5.12.15 operator commands for status information output 5.12.1 processor control element (PCE) 5.12.9 purge 5.12.2 structure 5.12.2 subsystem interface 5.12.7 JFCB housekeeping 5.11.3 job control table (JCT) 5.11.2 LCCA indicator 2.2.4 LCH queues, analysis for enabled waits LDA, important flags 5.4.6 LG, saving 5.6.2 line drop (TSO processing) A.6.12 linkage control table (LCT) 5.11.2 LINK (function of program manager) description 5.3.5 module search sequence 5.3.15 1.1.4 4.1.10 4.4.2 LMOD map, how to print 2.8.8 LOAD (function of program manager) description 5.3.11 module search sequence 5.3.15 local lock defmition 2.3.1 dispatcher recovery routines 5.1.13 lockword contents 2.3.5 lockword location 2.3.6 requests for unavailable 2.3.7 suspend (defmition) 2.3.3 local SRBs control block relationship 5.1.7 dispatching 5.1.6 dispatching priority in address space 5.1.8 mode indicators set by dispatcher 5.1.12 queue structure 5.1.7 status indicators B.1.19 
locating status information in a storage dump 2.2.5
locked mode
  definition 2.2.3
  status saving during execution in 2.2.3
locking (chapter) 2.3.1
lock interface table (IEAVESLA) 2.3.5
locks (see also lockwords)
  classes 2.3.1, 2.3.6
  determining which held on a processor 2.3.4
  hierarchy 2.3.2
  location of 2.3.6
  PSAHLSI bits 2.3.4
  requests for unavailable 2.3.7
  table of definitions 2.3.2
  types 2.3.2
  VTAM locking 5.8.7
  with ASM 5.6.4, 5.6.19
  with DIDOCS 5.15.2
lockwords
  contents of 2.3.5
  how to find 2.3.5
LOGDATA verb 2.4.15
logging, ERPs 5.2.12
logical groups
  assigning 5.6.2
  releasing 5.6.2
logon
  command processor A.6.19
  diagnostic aids A.6.11.0
  initialization A.6.8
  monitor A.6.8
  post codes A.6.11.0
  process overview A.6.1
  scheduler A.6.10
  scheduler router A.6.8
  verification A.6.10
  work area A.6.9, A.6.11.0
LOGREC
  analysis 2.4.2
  buffer, recording control 2.4.14
  for debugging SVC dump 5.14.12
  formatting 2.4.15
  how to print 2.8.9
  listing LOGREC data set 2.4.2
  record examples 2.4.3
  recording control buffer 2.4.14
loops
  common loops 4.2.1
  disabled
    apparent in IEAVERI 2.5.16
    definition 4.2.1
    intentional 4.2.1
    PSASUPER bits to check 2.1.5
    system mode 4.2.4
  enabled
    definition 4.2.1
    exception 4.2.3
  in Lock Manager code B.1.19
  symptoms of IOS problems 5.2.4
low storage overlays 2.7.4
LPAMAP (statement in PRDMP) 2.8.4
LPSW, common uses of 4.1.4
LSID, finding
  for a page 5.6.9
  for VIO 5.6.10.0
LSMQ 2.1.7
LSPL 2.1.7, 2.2.2
machine checks
  debugging 2.7.6
  interrupt code (MCIC) 2.7.6
  reference matrix 2.7.10
message
  flow through the system 4.3.1
  trace examples 4.3.12
messages
  ERPs 5.2.12
  lost 5.15.8
  routed wrong 5.15.9
miscellaneous debugging hints (chapter) 2.7.1
module search sequence
  for LINK, ATTACH, XCTL, LOAD 5.3.15
  of private libraries 5.3.16
module subpools 5.3.19
MP (multiprocessing)
  activity in trace table 2.6.8
  ASM's use of 5.6.6
  associated data areas 2.5.3
  debugging hints 2.5.16
  dump analysis 2.5.2
  effects on problem analysis 2.5.1
  features of MP environment 2.5.1
  parallelism 2.5.4
  PSA analysis B.1.3
  remote pendable services 2.5.9
  remote immediate services 2.5.10
  SIGP instruction 2.5.7
  system stop routine 2.8.20
MSGBUFER data area 5.6.26
multiprocessing (see MP)
multi-access spool configuration 5.12.14
MVS trace (see trace, trace table)
NCP (network control program)
  activating several NCP traces 4.3.28
  channel adapter traces 4.3.5
  line trace
    definition 4.3.5
    example 4.3.12
  node trace 4.3.5
normal stack 2.1.6, 2.4.17
normal task termination 5.14.4
no-work wait (see also enabled waits) 4.1.8
O/C/EOV (open/close/end-of-volume)
  abends 2.7.5
  DEB chaining 5.9.8
  debugging aids 5.9.7
  ENQs issued by 5.9.7
  messages 5.9.6
online problem analysis 1.1.4
open/close/end-of-volume (see O/C/EOV)
OPERATOR command processor A.6.19
operator commands
  for status information 4.4.2
  to identify performance degradation 4.4.1
ORE control block 5.15.4
other tracing methods 4.3.30
OUCB (SRM user control block) important fields 5.7.2
OUTPUT command processor A.6.20
overlays, storage
  cause of wait state PSWs 4.1.4
  how to locate in trace table 2.6.2
  in low storage 2.7.4
  pattern recognition 2.7.3
PAB (process anchor block) 5.8.2
page control block (see PCB)
page fault
  process flow A.1.3
  Reclaim 5.5.8
  status saving 2.2.6
  trace examples 2.6.3
  waits 4.1.10
page frame table entries (see PFTE)
page stealing 5.5.6
page waits B.1.9
page/swap data set errors 5.6.12
paging
  finding the LSID 5.6.9
  incorrect pages 5.6.9
  interlocks 5.6.8
  process 5.5.6
  unuseable data sets 5.6.11
paging requests, analysis 4.4.5
parallelism 2.5.4
PART/PAT bit, locating 5.6.10.3
pattern recognition 2.7.3
PCB (page control block)
  important fields in 5.5.3
  swap-out A.2.5
  use in debugging A.2.5
PCCB major fields and flags 5.10.3
PEP emulator line trace 4.3.4
performance degradation
  chapter on 4.4.1
  dump analysis areas 4.4.2
  operator commands to identify 4.4.1
PER hardware
  enabling to monitor storage 2.8.18
  trace example 2.8.19
PFTE (page frame table entries)
  analysis 4.4.4
  important fields 5.5.6
PGTE, RSM tests on A.1.3
physically disabled mode 2.2.2
PIU (physical information unit)
  format 4.3.27
  tracing inbound/outbound 4.3.30
PLH (place holder) 5.9.2
post codes, LOGON A.6.11.0
PRB initialization 5.3.7
PRDMP (see AMDPRDMP)
PRE-TMP exit A.6.11
PRINT statement (in AMDPRDMP), use of 2.8.4
printer ERP 5.2.15
private libraries, module search sequence 5.3.16
process anchor block (PAB) 5.8.2
process flows
  page faults (RSM processing) A.1.3
  EXCP/IOS A.3.1
  GETMAIN/FREEMAIN A.4.1
  swapping A.2.1
  TSO A.6.1
  VTAM A.5.1
program checks
  example of LOGREC entry 2.4.10
  interrupts 5.1.14
  VTAM 5.8.15
Program Manager
  APF authorization 5.3.14
  ATTACH 5.3.8
  CDE allocation 5.3.17
  component analysis 5.3.1
  control blocks 5.3.1, 5.3.3
  DELETE 5.3.11
  exit resource manager 5.3.11
  FETCH/program manager work area 5.3.19
  functional description 5.3.1
  functional flow 5.3.5
  IDENTIFY 5.3.12
  LINK 5.3.5
  LOAD 5.3.11
  module description 5.3.2
  module search sequence
    for LINK, ATTACH, XCTL, LOAD 5.3.15
    of private libraries 5.3.16
  module subpools 5.3.19
  organization 5.3.1
  queue validation 5.3.4
  queues, description 5.3.2
  RB extended save area 5.3.20
  SYNCH 5.3.12
  system initialization 5.3.5
  XCTL 5.3.8
  806 ABEND 5.3.14
PSA (prefixed save area)
  analysis on MP systems B.1.13
  contents of important fields 2.1.5
  indicators 2.2.4
  interrupt indicator B.1.23
  using as a patch area 2.8.10
  used to determine current system state 2.1.3, 2.2.4
PSW (program status word)
  analysis 2.1.4
  wait state B.1.2
pushdown list end mark, maintaining 5.10.10
QCBTRACE (AMDPRDMP option)
  when to use 2.8.4
  use for loop analysis 4.2.3
  use for wait analysis 4.1.12
QTIP
  attention handler A.6.27
  processing A.6.4
RB (request block)
  analysis 2.1.9
  extended save area (RBEXSAVE) 5.3.20
  manipulation by XCTL 5.3.10
  new RB initialization for XCTL 5.3.9
RCB (recording control buffer) 2.4.14
RCT (region control task)
  attention exit A.6.28
  attention scheduler A.6.28
  functions A.6.27
RDCM (resident display control module) control block 5.15.4
real addresses, converting 5.5.14
real frame shortage, indicators 4.4.5
real storage manager (see RSM)
reason codes
  allocation 5.11.16
  started task control 2.7.19
  SWA manager 2.7.20
reclaim (function of RSM) 5.5.8
record management
  debugging aids 5.9.3
  processing 5.9.1
recovery audit trail (ASM) 5.6.22
recovery stack 2.2.4
recovery work areas, use of 2.4.1
register conventions ASM 5.6.7.0
Relate (function of RSM) 5.5.8
replies, lost 5.15.8
requesting
  I/O (ASM) 5.6.3
  swap I/O (ASM) 5.6.4
retry process (RTM) 5.14.6
retry/restart with ERPs 5.2.10
RMCT (SRM control table) system indicators 5.7.3
RNIO trace example 4.3.12
RPHs (request parameter headers)
  location of 5.8.11
  queuing while waiting for storage 5.8.14
  waiting for the same lock 5.8.9
RPL error fields 5.9.1
RSM (real storage manager)
  abend reason codes 5.5.10
  component analysis 5.5.1
  debugging tips 5.5.12
  major control blocks 5.5.1
  page fault processing A.1.3
  page stealing process 5.5.6
  Reclaim 5.5.8
  Recovery 5.5.9
  Relate 5.5.8
RTM (recovery termination manager)
  cancel 5.14.7
  error id 5.14.10
  FORCE command 5.14.8
  extended error descriptor (EED) 5.14.2
  hardware error processing 5.14.2
  major RTM modules 5.14.1
  process flow 5.14.2
  retry 5.14.6
  RTM1 5.14.1
  RTM2 5.14.2
  stack vector table 2.2.4
  system diagnostic work area (SDWA) 5.14.2
  termination
    abnormal task 5.14.5
    address space 5.14.9
    normal task 5.14.4
  use in producing SVC dump 5.14.11
RTM2WA
  definition 2.4.19
  status information A.11-A.12
SALLOC lock 2.3.2, 2.3.6
  with ASM 5.2.13.0
SCHEDULE macro 2.2.2
scheduler work area (see SWA manager)
SDUMPs
  analysis 3.1.5
  how to change contents of 3.1.6
  parameter list 3.1.6
SDWA (system diagnostic work area)
  data recorded by dispatcher 5.1.13
  use by Catalog Management 5.10.1
  use by SYS1.LOGREC 2.4.3, 2.4.6-2.4.10
  use in FRR stack 2.4.18
SDWAVRA (SDWA variable recording area)
  entries 2.4.10, 5.4.8
  error indicators 5.4.9
  use by ASM 5.6.16
  use by catalog management 5.10.12
sense command with ERPs 5.2.13
serialization, ASM 5.6.13.0
SIC (system-initiated cancel) A.6.14
SIGP (signal processor) instruction 2.5.7
  EMS function 2.5.10
  return codes 2.5.8
  XC function 2.5.9
SLIP command, using 2.8.11
SLIP keywords 2.8.11
SLIP trap design 2.8.12
slot number, converting to full seek address 5.6.10
software-detected errors, analysis 3.1.9
software incidents
  examples 2.4.3
  types 2.4.3
SPCT (swap control table)
  format of 5.5.5
  important fields in 5.5.5
special exits, dispatching 5.1.4
spin locks, definition 2.3.2
SRB (see also local and global SRBs)
  dispatching queues 2.1.7, 5.1.4
  global 5.1.4
  local 5.1.6
  locally locked interrupted/suspended 2.2.3
  mode 2.2.2
  suspension 2.1.9, 2.2.7, 2.3.7
  tests made by dispatcher 5.1.10
SRM (system resources manager)
  address space states 5.7.2
  control algorithms 5.7.12
  entry point summaries 5.7.8
  error recovery 5.7.8
  functional recovery routine 5.7.10
  indicators 5.7.3
  interface routine 5.7.8
  I/O management 5.7.12
  objectives 5.7.1
  processor management 5.7.11
  resource manager 5.7.12
  service routine 5.7.10
  storage management routine 5.7.9
  SYSEVENT router 5.7.9
  system (RMCT) indicators 5.7.3
  user (OUCB) indicators 5.7.6
  workload activity recording 5.7.14
  workload manager 5.7.15
SSI (subsystem interface)
  function codes 5.13.10
  function dependent area 5.13.5
  initialization processing 5.13.1
  JES control table (JESCT) 5.13.2
  logic flow examples 5.13.7
  major control blocks 5.13.2
  requesting services 5.13.5
  return codes 5.13.8
  subsystem
    communications vector table (SSCVT) 5.13.1
    information block (SSIB) 5.13.1
    options block (SSOB) 5.13.1
    vector table (SSVT) 5.13.1
stand-alone dump
  analysis B.1.1
    procedure B.1.7
  chapter on 3.1.3
  debugging SVC dump 5.14.11
  determining system mode from 2.2.4
  how to print 2.8.2
  special notes 2.1.3
started task control (see STC)
status information, locating in storage dump 2.2.5
STATUS STOP SRB 2.5.6
STC (started task control)
  abend codes 2.7.19
  reason codes 2.7.19
step initiation/termination 5.4.5
SUBMIT command processor A.6.20
subpools for modules 5.3.19
SUM DUMP output 2.7.14
summary dump 2.7.14
super bits (see PSASUPER)
superzaps
  to expand trace table 2.8.21
  to force tracing during NIP processing 2.8.21
  to modify trace table to monitor low storage 2.8.20
  to stop MP system 2.8.21
  to trace all inbound PIUs 4.3.30
  to trace all outbound PIUs 4.3.30
  VTAM buffer trace modification 4.3.29
  VTAM I/O trace modification 4.3.29
suspend locks, definition 2.3.2
suspended
  locally locked tasks 2.2.6
  SRB status 2.1.9
  SRB/task with lock held B.1.15
  tasks or address space caused by unsatisfied ENQ request 4.1.12
  task status 2.1.8
SVC D entries in trace table 2.6.8
SVC dumps
  analysis 3.1.5
  debugging of
    control blocks, use of 5.14.14
    fixed data, use in 5.14.12
    procedure 5.14.12
    recovery routines, use in 5.14.14
    SLIP traps, use in 5.14.12
    SYS1.LOGREC, use in 5.14.12
    variable data, use in 5.14.14
    variable data offset determination 5.14.14
  how to override parameters 2.8.5
  IEAVTSDT, dump task for 5.14.11
  invocation of
    branch entry 5.14.11
    DUMP Command 5.14.12
    IEAVTABD 5.14.12
  producing
    RTM, use in 5.14.11
    SYSMDUMP DD, use in 5.14.11
SWA (scheduler work area) manager reason codes 2.7.20
swap-in process A.2.1
swap transition flags 5.7.2
swapping process flow A.2.1
swap-out process A.2.3
swap-out PCB A.2.5
SWIN (IEAVSWIN) A.2.1
SYNCH (function of Program Manager) 5.3.12
SYSABENDs
  analysis approach 3.1.9
  hardware-detected errors 3.1.10
  software-detected errors 3.1.9
system degradation (see performance degradation)
system diagnostic work area (see SDWA)
system execution modes and status saving 2.2.1
system hung (see enabled waits)
system options for SVCDUMP 2.8.5
system modes
  at entry to RTM1 2.4.18
  determining from Stand-alone dump 2.2.4
  locked mode 2.2.3
  physically disabled mode 2.2.2
  SRB mode 2.2.2
  task mode 2.2.1
system resources manager (see SRM)
system stop routine 2.8.20
system options for SVC dump
SYSABENDs 3.1.9
SYSMDUMPs 3.1.9, 5.14.1
SYSUDUMPs, analysis approach 3.1.9
SYSZEC16-PURGE 4.1.13
SYSZVARY-x 4.1.3
SYS1.COMWRITE data set, how to print 2.8.8
SYS1.DUMP
  how to clear without printing 2.8.7
  how to print 2.8.7
SYS1.LOGREC (see LOGREC)
SYS1.STGINDEX, how to recreate 2.8.9
SYS1.UADS, how to rebuild 2.8.6
tape ERP 5.2.15
task
  analysis 2.1.8
  locally locked interrupted 2.2.3
  locally locked suspended 2.2.6, 2.3.5, 2.3.7, 4.1.11
  mode indicators set by dispatcher 5.1.12
  RB structure 2.1.9
  tests made by dispatcher 5.1.12
TCAM
  address space A.6.7
  buffer trace (EP) 4.3.4
  buffer trace (NCP) 4.3.6
  channel end appendage A.6.25
  dispatcher subtask trace 4.3.6
  EP mode line I/O interrupt trace table 4.3.4
  organization after a TSO logon A.6.7
  PIU trace 4.3.6
  subtask trace 4.3.4
  TIOC logon processing A.6.6
  TSO terminal I/O diagnostic techniques A.6.24
TCB (task control block)
  analysis 2.1.8
  dispatching priority in address space 5.1.8
  summary report 2.1.6
  suspended with lock held B.1.5
TDCM (pageable display control module) control block 5.15.4
teleprocessing (see TP)
timer value in trace table 2.6.7
time sharing and TCAM data flow A.6.21
TMP (terminal monitor program) A.6.6
TMP/command processor
  interface A.6.15
  work area A.6.17
TP
  EP mode 4.3.4
  typical problems 4.3.1
TPIOS buffer trace, example 4.3.12
TPIOS IN/OUT REMOTE trace 4.3.6
traces (see also trace table)
  activating several NCP traces 4.3.28
  analysis of 2.6.1
  currency 2.6.8
  EP mode 4.3.4
  events not traced 2.6.8
  examples 2.6.3, 4.3.7
  interpreting 2.6.5
  NCP mode 4.3.5
  other tracing methods 4.3.30
  output under normal conditions 4.3.7
  summary of 4.3.3
  to monitor storage 2.8.21
  types of 4.3.3
trace table
  cautionary notes 2.6.7
  how to expand 2.8.21
  how to locate 2.6.1
  how to modify to monitor low storage 2.8.20
  types of entries 2.6.1
  with DIDOCS 5.15.11
traps, ERPs 5.2.16
TSO (time sharing option)
  APAR documentation A.6.28
  attention processing A.6.25
  command processor recovery A.6.19
  line drop processing A.6.12
  message handler A.6.27
  overview of logon processing A.6.2
  process flow A.6.1
  terminal I/O flow A.6.21
  time sharing initialization A.6.1
  TSO/TIOC terminal I/O diagnostic techniques A.6.24
UCB, analysis for enabled waits 4.1.10
UCM (unit control module) control block 5.15.4
UCME (UCM entry) control block 5.15.4
unit check with ERPs 5.2.13
use of recovery work areas for problem analysis 2.4.1
validity bits for machine checks 2.7.9
virtual addresses, converting 5.5.14
virtual storage access method (see VSAM)
virtual storage manager (see VSM)
virtual telecommunications access method (see VTAM)
volume mount & verify (VM&V) 5.11.5
VSM (virtual storage manager)
  address space initialization 5.4.3
  allocation 5.4.6
  basic functions 5.4.1
  cell pool management 5.4.10
  control block usage 5.4.4
  debugging hints 5.4.10
  GETMAIN/FREEMAIN process flow A.4.1
  global data areas (GDA) 5.4.7
  step initialization/termination 5.4.5
  view of MVS storage 5.4.2
VSAM (virtual storage access method)
  component analysis 5.9.1
  I/O manager debugging 5.9.9
  O/C/EOV debugging aids 5.9.7
  O/C/EOV messages 5.9.6
  record management
    buffer control block (BUFC) 5.9.3
    debugging aids 5.9.3
    error codes 5.9.5
    placeholder (PLH) 5.9.2
    request parameter list (RPL) 5.9.1
VTAM (virtual telecommunications access method)
  address space usage 5.8.6
  component analysis 5.8.1
  control block structure 5.8.3
  debugging 5.8.10
  function management control block (FMCB) 5.8.5
  how work is processed 5.8.2
  locating FMCB/DNCB for a node 5.8.14
  locking 5.8.7
  miscellaneous hints 5.8.15
  module naming conventions 5.8.6
  operating characteristics 5.8.6
  process flow A.5.1
  program checks 5.8.15
  recovery/termination 5.8.8
  relationship with MVS 5.8.1
  sample storage pool dump 5.8.13
  SEND process flow A.5.2
  VTAM buffer trace
    definition 4.3.6
    modification 4.3.29
  VTAM GTF trace example 4.3.12
  VTAM I/O trace
    definition 4.3.5
    example 4.3.7
    modification 4.3.29
waits
  chapter on 4.1.3
  disabled
    analysis approach 4.1.5
    characteristics of 4.1.4
    locked console exception 4.1.5
    with communications task 5.15.7
  enabled
    analysis approach 4.1.7
    analysis via trace table 2.6.7
    characteristics of 4.1.3
    with communications task 5.15.6
  enabled loop exception 4.2.3
  explicit 2.1.8, B.1.9
  for VTAM buffer depletion 5.8.15
  indications of paging interlocks 5.6.8
  in user code B.1.21
  in VTAM 5.8.11
  no-work wait 4.1.8
  OUCB analysis 5.7.6
  page fault waits 4.1.10
  page waits B.1.9
  record management debugging aids 5.9.3
wait state PSW B.1.2
wait task, dispatching of 5.1.8
window spin 2.5.10
working set sizes 4.1.11
work area bits logon scheduler A.6.11.0
work queues, TCBs, address space analysis 2.1.6
WQE control block 5.15.4
XC (SIGP external call)
  definition 2.5.9
  process flow 2.5.12
XCTL (function of program manager)
  description 5.3.8
  module search sequence 5.3.15
  new RB initialization 5.3.9
  RB manipulation 5.3.10
zaps (see superzaps)
0B0 abend 5.11.13
306 abend 5.3.18
3705 EP line trace 4.3.4
806 abend 5.3.14

READER'S COMMENT FORM

OS/VS2 System Programming Library: MVS Diagnostic Techniques
GC28-0725-2

This manual is part of a library that serves as a reference source for systems analysts, programmers, and operators of IBM systems. This form may be used to communicate your views about this publication. They will be sent to the author's department for whatever review and action, if any, is deemed appropriate. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation whatever. You may, of course, continue to use the information you supply.

Note: Copies of IBM publications are not stocked at the location to which this form is addressed. Please direct any requests for copies of publications, or for assistance in using your IBM system, to your IBM representative or to the IBM branch office serving your locality.

Possible topics for comments are: Clarity, Accuracy, Completeness, Organization, Coding, Retrieval, Legibility

If comments apply to a Selectable Unit, please provide the name of the Selectable Unit ____

If you wish a reply, give your name and mailing address:

Please circle the description that most closely describes your occupation.

Customer
  (Q) Install Mgr.
  (U) System Consult.
  (X) System Analyst
  (Y) System Prog.
  (Z) Applica. Prog.
  (F) System Oper.
  (I) I/O Oper.
  (L) Term. Oper.

IBM
  (S) System Eng.
  (P) Prog. Sys. Rep.
  (A) System Analyst
  (B) System Prog.
  (C) Applica. Prog.
  (D) Dev. Prog.
  (R) Comp. Prog.
  (G) System Oper.
  (J) I/O Oper.
  (E) Ed. Dev. Rep.
  (N) Cust. Eng.
  (T) Tech. Staff Rep.

Number of latest Newsletter associated with this publication: ____

Thank you for your cooperation. No postage stamp necessary if mailed in the U.S.A. (Elsewhere, an IBM office or representative will be happy to forward your comments.)

GC28-0725-2
Reader's Comment Form

NO POSTAGE NECESSARY IF MAILED IN THE UNITED STATES

BUSINESS REPLY MAIL
FIRST CLASS PERMIT NO. 40 ARMONK, N.Y.

POSTAGE WILL BE PAID BY ADDRESSEE:

International Business Machines Corporation
Department 058, Building 706-2
PO Box 390
Poughkeepsie, New York 12602

International Business Machines Corporation
Data Processing Division
1133 Westchester Avenue, White Plains, N.Y. 10604

IBM World Trade Americas/Far East Corporation
Town of Mount Pleasant, Route 9, North Tarrytown, N.Y., U.S.A. 10591

IBM World Trade Europe/Middle East/Africa Corporation
360 Hamilton Avenue, White Plains, N.Y., U.S.A. 10601