CRAY RESEARCH, INC.
CRAY-1® COMPUTER SYSTEMS
COS EXEC/STP/CSP INTERNAL REFERENCE MANUAL
SM-0040

PUBLICATION CHANGE NOTICE
CRAY RESEARCH, INC.
October, 1980
TITLE: COS EXEC/STP/CSP Internal Reference Manual
PUBLICATION NO. SM-0040 REV.
This manual supports COS Version 1.09 and obsoletes portions of the CRAY-OS Version 1 System Programmer's Manual, publication 2240012.

Copyright © 1980 by CRAY RESEARCH, INC. This manual or parts thereof may not be reproduced in any form without permission of CRAY RESEARCH, INC.

RECORD OF REVISION
PUBLICATION NUMBER SM-0040

Each time this manual is revised and reprinted, all changes issued against the previous version in the form of change packets are incorporated into the new version, and the new version is assigned an alphabetic level. Between reprints, changes may be issued against the current version in the form of change packets. Each change packet is assigned a numeric designator, starting with 01 for the first change packet of each revision level.

Every page changed by a reprint or by a change packet has the revision level and change packet number in the lower right-hand corner. Changes to part of a page are noted by a change bar along the margin of the page. A change bar in the margin opposite the page number indicates that the entire page is new; a dot in the same place indicates that information has been moved from one page to another but has not otherwise changed.

Requests for copies of Cray Research, Inc. publications and comments about these publications should be directed to:
CRAY RESEARCH, INC.
1440 Northland Drive
Mendota Heights, Minnesota 55120

Revision   Description
           October, 1980 - Original printing; supports COS Version 1.09. This manual obsoletes portions of the CRAY-OS Version 1 System Programmer's Manual, publication 2240012.
PREFACE

This manual describes the internal features of the EXEC, STP, and CSP portions of the CRAY-1 Operating System. This publication is part of a set of manuals that describes the internal design of the CRAY-1 Operating System and its product set. Other publications in this set are:

SM-0042  COS Front-End Protocol Internal Reference Manual
SM-0043  COS Operational Procedures Reference Manual
SM-0044  COS Operational Aids Reference Manual
SM-0045  COS Table Descriptions Internal Reference Manual
SM-0046  IOS Software Internal Reference Manual
SM-0049  DGS Internal Reference Manual
SM-0050  COS Simulator (CSIM) Reference Manual

The following publications, which are available for use only by Cray Research personnel, complete the set of software maintenance documentation:

SM-0017  FORTRAN (CFT) Internal Reference Manual
SM-0041  COS Product Set Internal Reference Manual

Manuals designated as internal describe the internal design of the software, whereas the other manuals in the set define procedures and external features of tools needed for installing and maintaining CRI software.

The reader is assumed to be familiar with the contents of the CRAY-OS Version 1 Reference Manual (SR-0011) and to be experienced in coding the CRAY-1 Assembly Language (CAL) as described in the CAL Version 1 Reference Manual (SR-0000). In addition, the I/O Subsystem assembler language (APML) is described in the APML Reference Manual (SR-0036).

Operating information is available in the following publications:

SG-0006  Data General Station (DGS) Operator's Guide
SG-0051  I/O Subsystem (IOS) Operator's Guide

CONTENTS

PREFACE
1. INTRODUCTION
1.1 GENERAL DESCRIPTION
1.2 HARDWARE CHARACTERISTICS
    1.2.1 Computation section
    1.2.2 Central Memory section
    1.2.3 Memory protection
    1.2.4 Mass storage
    1.2.5 I/O Subsystem
    1.2.6 Front-end computer systems
    1.2.7 Maintenance Control Unit (MCU)
    1.2.8 Peripheral Expander
1.3 SOFTWARE CONFIGURATION
    1.3.1 CRAY-1 Operating System (COS)
    1.3.2 Language systems
    1.3.3 Library routines
    1.3.4 Applications programs
1.4 SYSTEM RESIDENCE
    1.4.1 EXEC table area
    1.4.2 EXEC program area
    1.4.3 STP table area
    1.4.4 STP program area
    1.4.5 CSP area
    1.4.6 User area
1.5 MASS STORAGE SUBSYSTEM ORGANIZATION
    1.5.1 Formatting
    1.5.2 Device label (DVL)
    1.5.3 Dataset Catalog (DSC)
1.6 EXCHANGE MECHANISM
    1.6.1 Exchange package
    1.6.2 Exchange package areas
    1.6.3 B, T, and V registers
1.7 COS STARTUP
1.8 GENERAL DESCRIPTION OF JOB FLOW
    1.8.1 Job entry
    1.8.2 Job initiation
    1.8.3 Job advancement
    1.8.4 Job termination
1.9 DATASET MANAGEMENT
1.10 I/O INTERFACES

2. EXEC
2.1 INTERCHANGE INTERRUPT ANALYSIS
2.2 INTERRUPT HANDLERS
2.3 CHANNEL MANAGEMENT
    2.3.1 Channel tables
    2.3.2 Channel assignments
    2.3.3 Channel processors
2.4 TASK SCHEDULER
2.5 EXEC RESOURCE ACCOUNTING
2.6 EXECUTIVE REQUEST PROCESSOR
    2.6.1 Executive requests
    2.6.2 EXEC error codes
2.7 FRONT-END DRIVER
    2.7.1 Theory of operation
    2.7.2 System tables used by FED
    2.7.3 Processors
2.8 DD-19/29 DISK DRIVER
    2.8.1 ROll
    2.8.2 Hardware sequences for sample requests
2.9 I/O SUBSYSTEM DRIVER
    2.9.1 Functional description
    2.9.2 Recovery
    2.9.3 MIOP command and status packet formats
2.10 EXEC DEBUG AIDS
    2.10.1 History trace
    2.10.2 System crash message buffer
2.11 INTERACTIVE SYSTEM DEBUGGING

3. SYSTEM TASK PROCESSOR (STP)
3.1 GENERAL DESCRIPTION
3.2 TASK COMMUNICATIONS
    3.2.1 EXEC/task communication
    3.2.2 Task-to-task communication
    3.2.3 User/STP communication
3.3 STP COMMON ROUTINES
    3.3.1 Task I/O routines (TIO)
    3.3.2 System tables used by TIO
    3.3.3 Circular I/O routines (CIO)
    3.3.4 Memory allocation/deallocation routines
    3.3.5 Chaining/unchaining subroutines
    3.3.6 Interactive communication buffer management routines

4. SYSTEM TASKS
4.1 COS STARTUP
    4.1.1 Input to Startup
    4.1.2 Tables used by Startup
    4.1.3 Startup subroutines
    4.1.4 Install
    4.1.5 Deadstart
    4.1.6 Restart
    4.1.7 Job recovery by Restart
4.2 DISK QUEUE MANAGER (DQM)
    4.2.1 System tables used by DQM
    4.2.2 DQM interface with other tasks
    4.2.3 Dataset allocation
    4.2.4 Resource management
    4.2.5 Queue management
    4.2.6 I/O request flow in DQM
    4.2.7 Hardware error logging
4.3 STATION CALL PROCESSOR (SCP)
    4.3.1 System tables used by SCP
    4.3.2 Processing flow for SCP
    4.3.3 Interactive processing
4.4 EXCHANGE PROCESSOR (EXP)
    4.4.1 System tables used by EXP
    4.4.2 User area tables used by EXP
    4.4.3 Exchange processor request word
    4.4.4 User normal exit
    4.4.5 System action requests
    4.4.6 User error exit
    4.4.7 Job scheduler requests
    4.4.8 Job rerun
    4.4.9 Reprieve processing
    4.4.10 Non-recoverability of jobs
4.5 JOB SCHEDULER (JSH)
    4.5.1 Job flow
    4.5.2 Scheduling philosophy
    4.5.3 Tuning the system
    4.5.4 Memory management
    4.5.5 Job startup
    4.5.6 Job status and state changes
    4.5.7 JSH interface with other tasks
4.6 PERMANENT DATASET MANAGEMENT (PDM)
    4.6.1 Tables used by PDM
    4.6.2 Subfunctions
    4.6.3 PDD status
    4.6.4 Theory of operation
4.7 LOG MANAGER
    4.7.1 Message processor (MSG)
    4.7.2 System tables used by MSG
    4.7.3 Task calls to MSG
    4.7.4 $SYSTEMLOG format
    4.7.5 $LOG format
4.8 MEMORY ERROR PROCESSOR (MEP)
4.9 DISK ERROR CORRECTION (DEC)
    4.9.1 System table used by DEC
    4.9.2 DEC interface with other tasks
4.10 SYSTEM PERFORMANCE MONITOR (SPM)
    4.10.1 Control parameters
    4.10.2 Method of data collection
    4.10.3 Data collection and record definition
    4.10.4 Task flow for SPM
    4.10.5 System tables used by SPM
4.11 JOB CLASS MANAGER (JCM)
    4.11.1 Job class assignment
    4.11.2 Interface between JCM and other tasks
4.12 OVERLAY MANAGER (OVM)
    4.12.1 Task communication with OVM
    4.12.2 System generation/overlay definition
    4.12.3 Overlay calling macros
    4.12.4 OVM tables

5. CONTROL STATEMENT PROCESSOR (CSP)
5.1 SYSTEM TABLES USED BY CSP
    5.1.1 Job communication block (JCB)
    5.1.2 Logical file table (LFT)
    5.1.3 Dataset parameter area (DSP)
    5.1.4 Dataset name table (DNT)
5.2 THEORY OF OPERATION
    5.2.1 CSP load process
    5.2.2 Entry and exit conditions
    5.2.3 Begin job
    5.2.4 Crack statements
    5.2.5 Process statements
    5.2.6 Advance job
    5.2.7 Error exit processing
    5.2.8 End job
5.3 CSP STEP FLOW
5.4 RECOVERY STATUS MESSAGES

FIGURES

1-1   CRAY-1A/B or CRAY-1 S Series Model S/250, S/500 or S/1000 Computer Systems
1-2   CRAY-1 S Series Model S/1200 through S/4400 Computer Systems
1-3   Program field
1-4   Elements of CRAY-OS
1-5   Memory assignment
1-6   Expansion of a user area
1-7   Expansion of COS resident
1-8   Mass storage organization
1-9   Exchange package
1-10  Exchange package management
1-11  Overview of COS I/O
2-1   EXEC-controlled exchange sequences
2-2   System control
2-3   Channel table linkage
2-4   Task Scheduler table linkage
3-1   Task communication tables
3-2   Dataset table linkage
3-3   TIO logical read
3-4   TIO logical write
3-5   Physical I/O
3-6   Memory allocation tables
3-7   Chain tables
4.2-1 DQM table linkages
4.2-2 DAT structure
4.2-3 DCU-2, 3 controller configuration
4.2-4 DCU-4 controller configuration
4.5-1 Job flow
4.5-2a to 4.5-2f Memory priority variation
4.5-3 Normal transition between job states
5-1   CSP general flow diagram

TABLES

1-1    Characteristics of models of the CRAY-1 Computer Systems
1-2    Operational characteristics of disk storage units
2-1    History trace functions
2-2    EXEC stop message
4.5-1  DNT initialization
4.5-2  Status bit assignments
4.5-3  State change sequences
4.5-4  JSH functions
4.6-1  PDD status
4.10-1 CPU usage record - subtype 1
4.10-2 Task usage record - subtype 2
4.10-3 EXEC requests record - subtype 3
4.10-4 User memory usage record - subtype 4
4.10-5 Disk usage record - subtype 5
4.10-6 Disk channel usage record - subtype 6
4.10-7 Link usage record - subtype 7
4.10-8 EXEC call usage record - subtype 8
4.10-9 User call usage record - subtype 9
4.10-10 Interrupt count record - subtype 10
4.10-11 Job Scheduler management statistics record - subtype 11
4.10-12 Job class information record - subtype 12
4.11-1 JCM functions

1. INTRODUCTION

1.1 GENERAL DESCRIPTION

CRAY-OS (COS) is a multiprogramming operating system for the CRAY-1 Computer System. The operating system provides for efficient use of system resources by monitoring and controlling the flow of work presented to the system in the form of jobs. The operating system centralizes many of the job functions, such as input/output and memory allocation, and resolves conflicts when more than one job needs resources.

CRAY-OS is a collection of programs that, following startup of the system, reside in CRAY-1 Central Memory, on system mass storage, and, on some models of the CRAY-1 S Series, in the I/O Subsystem.
(Startup is the process of bringing the CRAY-1 and the operating system to an operational state.)

Jobs are presented to the CRAY-1 by one or more computers, referred to as front-end systems, which may be any of a variety of computer systems. Since a front-end system operates asynchronously under control of its own operating system, software executing on the front-end system is beyond the scope of this publication.

The FORTRAN compiler, the CAL assembler, the SKOL macro translator, the UPDATE program, and utility programs execute as parts of user jobs and are described in separate publications.

The operating system is available in two forms: (1) preassembled absolute binary programs in an unblocked format and (2) source language programs in the form of UPDATE decks. The binary form is provided for installation of the basic system. The UPDATE decks provide a means of modifying and updating the source code and of generating a new system in binary form by reassembling the modified programs. Details for generating, installing, and starting up the operating system are given in the COS Operational Procedures Reference Manual, CRI publication SM-0043.

1.2 HARDWARE CHARACTERISTICS

This section briefly summarizes the hardware characteristics of the CRAY-1 Computer System. The basic components of the system are summarized in table 1-1 and illustrated in figures 1-1 and 1-2.

Figure 1-1 illustrates the basic components of the CRAY-1A/B and the CRAY-1 S Series Models S/250, S/500, and S/1000 Computer Systems. These systems consist of a central processing unit (CPU), power and cooling equipment, a minicomputer maintenance control unit (MCU), a mass storage disk subsystem, and a front-end system.

Table 1-1. Characteristics of models of the CRAY-1 Computer Systems

Models without an I/O Subsystem:

Model                                  S/250    S/500 or 1/B    S/1000 or 1/A
CPU memory size (64-bit words)         1/4M     1/2M            1M
Front-end interfaces                   1-3      1-3             1-3
Mass storage subsystems§
  DCU-3 or DCU-2 disk controllers      2-8      2-8             2-8
  DD-29 or DD-19 disk storage units    2-32     2-32            2-32

Models with an I/O Subsystem:

         CPU memory   Front-end    Buffer      DCU-4         DD-29 disk      Mass storage on CPU channels§§
Model    (words)      interfaces   memory      controllers   storage units   Controllers   Storage units
S/1200   1M           1-3          .5 or 1M    1-4           2-16            2-8           1-32
S/1300   1M           1-3          .5 or 1M    1-8           2-32            2-8           1-32
S/1400   1M           1-3          .5 or 1M    1-12          2-48            2-8           1-32
S/2200   2M           1-3          .5 or 1M    1-4           2-16            2-8           1-32
S/2300   2M           1-3          .5 or 1M    1-8           2-32            2-8           1-32
S/2400   2M           1-3          .5 or 1M    1-12          2-48            2-8           1-32
S/4200   4M           1-3          .5 or 1M    1-4           2-16            2-8           1-32
S/4300   4M           1-3          .5 or 1M    1-8           2-32            2-8           1-32
S/4400   4M           1-3          .5 or 1M    1-12          2-48            2-8           1-32

§  Mass storage limits assume a configuration with [illegible] channels available.
§§ While connection of mass storage devices through the I/O Subsystem is preferred, available CPU channels can be used, where possible, for additional mass storage.

Figure 1-2 illustrates the CRAY-1 S Series Models S/1200 through S/4400 Computer Systems. These systems are characterized by the incorporation of an I/O Subsystem composed of two to four I/O Processors.

[Figure 1-1. CRAY-1A/B or CRAY-1 S Series Model S/250, S/500 or S/1000 Computer Systems. The CPU comprises a computation section (registers, functional units, instruction buffers), a control section (control registers, exchange mechanism, interrupt system, real-time clock, programmable clock), a memory section (0.25 M, 0.5 M, or 1 M words of 64 bits each), and an I/O section (12 I/O channel pairs). The CPU connects to the MCU and to front-end computers, mass storage, and peripheral equipment.]

[Figure 1-2. CRAY-1 S Series Model S/1200 through S/4400 Computer Systems. The CPU comprises computation, control, and memory sections (1 M, 2 M, or 4 M words of 64 bits each) and an I/O section (12 I/O channel pairs and 1 Memory Channel). The CPU connects to front-end computers and to the I/O Subsystem (2 to 4 I/O Processors; 1/2 to 1 M words of Buffer Memory), which in turn connects to front-end computers, mass storage, block multiplexers, and peripheral expander equipment.]

1.2.1 COMPUTATION SECTION

The computation section is composed of instruction buffers, registers, and functional units that operate together to execute sequences of instructions. At any one time, only one program can be in execution, although several programs may be candidates for execution. This means that multiprogramming, the sharing of the computation section among multiple programs, is possible, but multiprocessing, the concurrent execution of multiple programs, is not.

1.2.2 CENTRAL MEMORY SECTION

The CRAY-1 Central Memory is constructed of LSI chips arranged in 8 or 16 banks. Memory size depends on the model; available sizes are 262,144, 524,288, 1,048,576, 2,097,152, and 4,194,304 words. A word is 64 bits. The lower memory addresses contain exchange packages, operating system tables and pointers, and operating system programs. The extreme upper memory addresses contain operating system I/O buffers.
The remainder of memory is available for user jobs. An algorithm that calculates the maximum memory size allocation for a job is described in Appendix B of the COS Operational Procedures Reference Manual, publication SM-0043.

1.2.3 MEMORY PROTECTION

Two registers, BA and LA, define the field of memory addresses that can be referenced by the executing program (see figure 1-3). The contents of the base address (BA) register define the beginning address; the contents of the limit address (LA) register define the upper limit. The first usable address is $(\mathrm{BA}) \times 2^{4}$, and the last usable address is $(\mathrm{LA}) \times 2^{4} - 1$.

[Figure 1-3. Program field: within memory, starting from absolute address 0, the program field extends from $(\mathrm{BA}) \times 2^{4}$ to $(\mathrm{LA}) \times 2^{4} - 1$.]

The hardware senses an attempt by a program to reference an address outside this range and sets an error interrupt flag. Note that both BA and LA addresses are relative to absolute address 0.

Some of the operating system programs are privileged, having access to all of memory; others are limited to certain portions of the operating system and to user program areas. Each user program has access only to its own defined field.

1.2.4 MASS STORAGE

On CRAY-1A/B Systems and CRAY-1 S/250, S/500, and S/1000 Systems, CRAY-1 mass storage consists of one or more Cray Research DCU-2 or DCU-3 Disk Controllers and multiple DD-19 or DD-29 Disk Storage Units (DSUs). The disk controller is a Cray Research product implemented in ECL logic similar to that used in the mainframe. Each controller may have up to four DD-19 or DD-29 disk storage units attached to it. Operational characteristics of the DSUs are summarized in table 1-2. (The DD-29 resembles the DD-19, except that it has approximately twice the storage capacity.) Additional information about the CRAY-1 mass storage subsystem is given in the CRAY-1 DCU-2, DCU-3 Disk Controller Reference Manual, publication 2240630.
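The BA/LA check described in section 1.2.3 can be modeled in a few lines. This is a minimal Python sketch that assumes only what that section states: BA and LA are scaled by 2^4 words, both are measured from absolute address 0, and the program field runs from (BA) x 2^4 through (LA) x 2^4 - 1. The names `program_field` and `reference_allowed` are hypothetical; the real check is done by hardware, which sets an error interrupt flag instead of returning a value.

```python
# Sketch of the CRAY-1 memory-protection (BA/LA) check from section 1.2.3.
# Assumption: BA and LA hold addresses in units of 16 words (2**4),
# measured from absolute address 0. Function names are illustrative only.

BLOCK_SHIFT = 4  # BA and LA are scaled by 2**4 words


def program_field(ba: int, la: int) -> tuple[int, int]:
    """Return the (first, last) absolute word addresses a program may reference."""
    first = ba << BLOCK_SHIFT          # (BA) x 2**4
    last = (la << BLOCK_SHIFT) - 1     # (LA) x 2**4 - 1
    return first, last


def reference_allowed(address: int, ba: int, la: int) -> bool:
    """True if an absolute address lies inside the BA/LA program field.

    The hardware would set an error interrupt flag for an out-of-range
    reference; this model simply reports the outcome.
    """
    first, last = program_field(ba, la)
    return first <= address <= last


if __name__ == "__main__":
    # Hypothetical job field: starts at word 0o40000 and holds 0o10000 words.
    ba = 0o40000 >> BLOCK_SHIFT
    la = (0o40000 + 0o10000) >> BLOCK_SHIFT
    print(program_field(ba, la))                # (16384, 20479)
    print(reference_allowed(0o37777, ba, la))   # False: just below the field
    print(reference_allowed(0o47777, ba, la))   # True: last usable word
    print(reference_allowed(0o50000, ba, la))   # False: first word at (LA) x 2**4
```

Because BA and LA are scaled by 16 words, a program field always begins and ends on a 16-word boundary; the example values are octal to match the usual CRAY-1 notation.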
For the CRAY-1 S/1200 and above, mass storage is attached to the I/O Subsystem. The I/O Subsystem consists of two or more I/O Processors. One of these serves as the Master I/O Processor (MIOP). A second processor, the Buffer I/O Processor (BIOP), is dedicated to mass storage. The other two processors, when present, may also be dedicated to mass storage; if they are, they are referred to as Data I/O Processors (DIOPs). Each BIOP or DIOP can drive up to four DCU-4 Disk Control Units, and each DCU-4 supports up to four disk storage units. All units connected to a DCU-4 may be simultaneously active. However, the number of concurrent data streams is limited by the Buffer Memory size, the BIOP transfer capacity, and software overhead. For example, a Model S/x200 might be limited to 6 streams, while a larger system could have as many as 12 streams.

1.2.5 I/O SUBSYSTEM

Starting with the S/1200 (1 million words), I/O throughput to front-end computers and to mass storage devices is significantly enhanced by the incorporation of an I/O Subsystem. The I/O Subsystem is a Cray Research product specifically designed to complement the CRAY-1 CPU requirements. A primary feature is a Memory Channel linking the I/O Subsystem to Central Memory; maximum transfer rates of approximately 850 Mbits per second are achievable on this channel.

The power of the I/O Subsystem relates directly to the number of I/O Processors it contains. Two, three, or four I/O Processors may comprise the I/O Subsystem. With each additional I/O Processor, mass storage capacity or the ability to drive peripheral devices increases significantly.

Table 1-2. Operational characteristics of disk storage units

Characteristic                                     DD-19                  DD-29
Word capacity per drive                            3.723 x 10^7           7.585 x 10^7
Word capacity per cylinder                         92,160                 92,160
Bit capacity per drive                             2.424 x 10^9           4.854 x 10^9
Tracks per surface or cylinders per drive          404*                   814**
Sectors per track                                  18                     18
Bits per sector                                    32,768                 32,768
Number of head groups                              10                     10
Latency                                            16.7 ms                16.7 ms
Access time                                        15-80 ms               15-80 ms
Data transfer rate (average bits per second)       35.4 x 10^6            35.4 x 10^6
Longest continuous transfer per command            92,160 words           92,160 words
                                                   (1 cylinder)           (1 cylinder)
Total bits that can be streamed to a unit
  (disk cylinder capacity)                         5.9 x 10^6             5.9 x 10^6

*  411 less 7 cylinders reserved for diagnostics
** 823 less 9 cylinders reserved for diagnostics

1.2.6 FRONT-END COMPUTER SYSTEMS

The CRAY-1 Computer System may be equipped with one or more front-end computer systems that provide input data to the CRAY-1 and receive output from the CRAY-1 for distribution to a variety of slow-speed peripheral equipment. Peripherals attached to the front-end system vary with application requirements (for example, local or remote job entry stations, or data concentrators for multiplexing remote stations).

The CRAY-1 is interfaced to front-end systems through special interface controllers that compensate for differences in channel widths, machine word size, electrical logic levels, and control protocols. The interface controller is a Cray Research product implemented in logic compatible with the host system.

CRAY-1 front-end systems connect directly to the CPU I/O channels on systems that do not have I/O Subsystems. On Models S/1200 and above, the front-end computers normally connect through the I/O Subsystem but may also be connected to the CPU I/O channels.
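The I/O Subsystem limits in table 1-1 and the derived entries of table 1-2 can be cross-checked with a little arithmetic. The Python sketch below is illustrative, not COS code, and rests on stated assumptions: every I/O Processor other than the Master I/O Processor is dedicated to mass storage, each such processor drives up to four DCU-4s, each DCU-4 supports up to four disk units (all from section 1.2.4), and the 16.7 ms latency in table 1-2 is taken to be one disk revolution. The function names are hypothetical.

```python
# Illustrative cross-check of tables 1-1 and 1-2 (not COS code).

BITS_PER_WORD = 64


def max_dcu4_and_units(io_processors: int) -> tuple[int, int]:
    """Upper bounds on DCU-4 controllers and disk units for an I/O Subsystem.

    Assumes every processor except the Master I/O Processor (MIOP) is a
    BIOP or DIOP, each driving up to 4 DCU-4s with up to 4 units apiece.
    """
    if not 2 <= io_processors <= 4:
        raise ValueError("the I/O Subsystem has two to four I/O Processors")
    mass_storage_iops = io_processors - 1
    controllers = 4 * mass_storage_iops
    return controllers, 4 * controllers


def disk_geometry(sectors_per_track=18, bits_per_sector=32_768,
                  head_groups=10, revolution_s=0.0167):
    """Derive per-cylinder capacity and transfer rate from table 1-2 inputs."""
    words_per_sector = bits_per_sector // BITS_PER_WORD  # 512 words
    words_per_cylinder = words_per_sector * sectors_per_track * head_groups
    bits_per_cylinder = words_per_cylinder * BITS_PER_WORD
    # Assumption: one full track (18 sectors) passes the heads per revolution.
    rate_bps = sectors_per_track * bits_per_sector / revolution_s
    return words_per_cylinder, bits_per_cylinder, rate_bps


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, max_dcu4_and_units(n))  # (4, 16), (8, 32), (12, 48)
    words, bits, rate = disk_geometry()
    print(words, bits, round(rate / 1e6, 1))  # 92160 5898240 35.3
```

Under these assumptions, two, three, and four I/O Processors give the 1-4/2-16, 1-8/2-32, and 1-12/2-48 upper limits of the S/x200, S/x300, and S/x400 columns. The computed 5,898,240 bits per cylinder matches the 5.9 x 10^6 streaming figure. The roughly 35.3 x 10^6 bits per second rate agrees with the tabulated 35.4 x 10^6 to within the rounding of the 16.7 ms latency.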
1.2.7 MAINTENANCE CONTROL UNIT (MCU)

On CRAY-1A/B Systems and Models S/250, S/500, and S/1000 Systems, a Data General minicomputer serves as a maintenance control unit. The MCU performs initial system startup and recovery for the operating system. Included in the MCU system is a software package that enables the minicomputer to monitor CRAY-1 performance during production hours. When not used for maintenance purposes, the MCU can serve as a front-end system for the CRAY-1 by employing CRI-supplied software. A description of the software for the MCU is beyond the scope of this publication.

1.2.8 PERIPHERAL EXPANDER

On CRAY-1 Models S/1200 through S/4400, peripheral devices connected to the I/O Subsystem through a Peripheral Expander interface allow for maintenance operations such as initial system startup and recovery.

1.3 SOFTWARE CONFIGURATION

The CRAY-1, as with any other computer system, requires three types of software: an operating system, language systems, and applications programs. The I/O Subsystem, when present, also requires its own software. The internal features of the I/O Subsystem software are described in the IOS Software Internal Reference Manual, publication SM-0046.

1.3.1 CRAY-1 OPERATING SYSTEM (COS)

The CRAY-1 Operating System (COS) consists of memory-resident and disk-resident programs that (1) manage resources, (2) supervise job processing, and (3) perform input/output operations. COS also contains a set of disk-resident utility programs. The operating system is activated through a system startup operation performed from the MCU or the I/O Subsystem.

A job may consist of a compilation or assembly of a program written in some source language such as FORTRAN, followed by execution of the program resulting from the compilation or assembly.
The CRAY-1 Operating System consists of the following modules that execute on the CPU (figure 1-4):

Executive (EXEC)
System Task Processor (STP)
Control Statement Processor (CSP)
Utility programs (not shown)

EXEC (described in section 2) runs in monitor mode and is responsible for control of the system. It schedules STP tasks, manages exchange packages, performs I/O, and handles all interrupts. EXEC has access to all of memory.

STP (described in section 3) runs in object program (user) mode. It accesses all memory other than that occupied by EXEC and is responsible for processing all user requests. STP is composed of a number of programs known as tasks, each of which has its own exchange package.

CSP (described in section 5) is responsible for interpreting all job control statements and either performing the requested function or making the appropriate system request. An image of CSP resides in memory after the STP area but is copied into a user field for execution.

Utility programs (described in the COS Product Set Internal Reference Manual, publication SM-0041) include the loader, a library generation program (BUILD), a source language maintenance program (UPDATE), permanent dataset utility programs, copy and positioning routines, and so on.

[Figure 1-4. Elements of CRAY-OS: jobs, STP, and EXEC.]

Images of utility programs reside on disk storage and are summoned for loading and execution in the user field through control statements.

1.3.2 LANGUAGE SYSTEMS

Currently, four language systems developed by Cray Research are available for use on the CRAY-1: the FORTRAN compiler (CFT), the CRAY-1 assembler language program (CAL), the SKOL macro translator, and A Programming Macro Language (APML) for the I/O Subsystem.

FORTRAN compiler

Developed in parallel with the CRAY-1 Computer System, the Cray Research FORTRAN compiler is designed to take advantage of the vector capability of the computer.
The compiler itself determines the need for vectorization and generates code accordingly, removing the burden of such considerations from the programmer. Optimizing routines examine FORTRAN source code to see whether it can be vectorized. The compiler adheres closely to the ANSI 1966 standard and includes many ANSI 1978 extensions. A description of the design of the compiler is outside the scope of this publication; it is included in the CRAY-1 FORTRAN (CFT) Internal Reference Manual, CRI publication SM-0017, which is distributed only to CRI personnel.

CAL assembler

The CAL assembler provides users with a means of expressing symbolically all hardware functions of the CPU. Augmenting the instruction repertoire is a set of versatile pseudo instructions that provide users with options for generating macro instructions, organizing programs, and so on. Programs written in CAL may take advantage of Cray Research-provided system macros that facilitate communication with the operating system.

CAL enables the user to tailor programs to the architecture of the CRAY-1. Much of the operating system, as well as other software provided by Cray Research, is coded in CAL assembly language. A description of the design of the CAL assembler is beyond the scope of this publication. See the CRAY-1 CAL Assembler Language Reference Manual, publication SR-0000, for assembler information.

APML assembler

The APML assembler executes on the CRAY-1 CPU and generates absolute code that is executable in the CRAY-1 I/O Processors. APML allows the system programmer to express symbolically all hardware functions of a CRAY-1 I/O Processor. It is used to generate the I/O Subsystem software. In addition to the full range of symbolic instructions, which allow the APML user to fully use the I/O Processor's arithmetic and I/O instructions, registers, and memory, APML provides a number of macro, conditional assembly, and pseudo instructions that simplify the task of creating assembly language programs.
APML is described in the APML Reference Manual, publication SR-0036.

SKOL macro translator

SKOL is a high-level programming language that stresses readability and extensibility. It offers the user a well-structured language while retaining the power and efficiency of the CFT compiler. This is possible because SKOL is translated into FORTRAN code by a set of string-processing macro instructions. By adding to these instructions, the user can extend the language to suit his own purposes. By inserting macros directly into the SKOL source program, changes in the language can be defined for a specific run.

Many of the control statements are familiar to users of other high-level languages. For example, SKOL's IF-ELSEIF-ELSE-ENDIF structure is derived from LISP and ALGOL, and its LOOP-WHILE-ENDLOOP subsumes all single-exit loop structures. The scalar case structure is derived from Pascal. The important situation case structure, which eliminates the need for labels and GOTOs, is unique to SKOL.

The use of the record and pointer data structures in SKOL also largely parallels Pascal. Character string processing is performed in SKOL with the STRING data structure, and partial-word variables can be defined by the WORD structure. The user can also define his own enumerated data types.

Since any valid FORTRAN code is also valid SKOL code, SKOL makes use of the subroutine and the function. Additionally, SKOL offers routines without parameters, recursive routines, and the concept of a process. A process consists of several cooperating coroutines that can activate one another or suspend the process.

SKOL provides a number of tools for testing and debugging programs. Among the tools are:

• Conditional compilation, which specifies a statement, part of a statement, or a series of statements to be either compiled or not compiled, as determined by the user for a specific run.

• The TRACE statement, which prints the value of a variable whenever an assignment is made to it.
• The VALIDATE statement, which enables or disables the output of built-in run-time debugging messages.

1.3.3 LIBRARY ROUTINES

The CRAY-1 software includes a group of subprograms that are callable from CAL and CFT programs. These subprograms reside in the $FTLIB, $SYSLIB, and $SCILIB libraries and are grouped by UPDATE deck name within each library. The subprograms are divided among the three libraries generally on a functional basis.

$FTLIB contains routines that are an intrinsic part of CFT, such as the mathematical functions. All of the basic external functions specified by ANSI X3.9-1966 are incorporated in the library, and a large number of vector FORTRAN library routines are also provided. $FTLIB also contains nonmathematical routines such as the DATE routine.

$SYSLIB routines, which link directly to the operating system, are not usually accessible from a CFT program but are callable from $FTLIB routines for specific tasks. In general, $SYSLIB serves as a link between the general-purpose $FTLIB routines and the details of COS.

The routines in $SCILIB generally perform scientific computations such as matrix multiplication or Fourier transforms.

1.3.4 APPLICATIONS PROGRAMS

Applications programs are specialized programs, usually written in a source language such as FORTRAN, that solve particular user problems. These programs are generally written by customers and as such are not described in this publication.

1.4 SYSTEM RESIDENCE

This section describes the locations of the various components of the operating system without attempting to explain what they are; the components are described in later sections. The system components reside in areas of memory as defined during startup (section 4.1). Figure 1-5 illustrates the general contents of memory following startup. Figure 1-6 illustrates the general layout of a user area. Figure 1-7 itemizes the memory-resident portions of the operating system.
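Each user area in figure 1-6 is addressed relative to its base address (BA) and bounded by its limit address (LA), and the 128-word Job Communication Block described in section 1.4.6 occupies its first 200 (octal) words. The Python sketch below illustrates that arithmetic under stated assumptions: that a relative address is simply added to BA, and that any absolute address at or above LA is rejected. The names `absolute_address`, `program_start`, and `RangeError` are hypothetical, and the sketch does not model which hardware range error would actually be reported.

```python
# Hypothetical sketch of user-area addressing (BA/LA bounds, JCB offset).

JCB_WORDS = 0o200  # 128-word Job Communication Block = 200 (octal) words


class RangeError(Exception):
    """Stands in for the hardware range error on an out-of-bounds reference."""


def absolute_address(ba: int, la: int, relative: int) -> int:
    """Map a user-relative address to an absolute address.

    The JTA lies below BA, so negative relative addresses are rejected;
    any absolute address at or above LA is outside the user area.
    """
    if relative < 0 or ba + relative >= la:
        raise RangeError(f"relative address {relative:o} outside [BA, LA)")
    return ba + relative


def program_start(ba: int) -> int:
    """The user program begins immediately after the JCB, at BA+200 (octal)."""
    return ba + JCB_WORDS
```

For example, with BA = 10000 (octal), `program_start` returns 10200 (octal), matching the BA+200 label in figure 1-6.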
Figure 1-5. Memory assignment [address 0 at the top; user areas 1 through n, up to the parameter that defines maximum memory]

Figure 1-6. Expansion of a user area [Job Table Area below user BA; Job Communication Block at user BA; user program beginning at user BA+200 (octal); user field pointers JCHLM and JCLFT; dataset buffers and I/O tables ending at user LA-1]

Figure 1-7. [Memory-resident portions of the operating system, from address 0: EXEC table area, EXEC, XMTR, STP table area, STP, CSP (CSPBASE to CSPEND), memory available for jobs, and memory for the CRAY-OS system log and station buffers, up to I@MEM]

1.4.1 EXEC TABLE AREA

The EXEC table area contains the following tables and parameters used by EXEC. Detailed descriptions of the tables are given in the COS Table Descriptions Internal Reference Manual, publication SM-0045.
IC       Channel interrupt counters
         Miscellaneous pointers and constants
XMELIM   Logged single-bit error limit
XMECNT   Single-bit error count
XMEDIS   Single-bit interrupt disabled flag
SAXP     Pointer to currently connected user job
SAEF     Error flags from current exchange package
SUXC     User exchange package in JTA flag
NCAERR   Count of channel address errors on disk channels
SXBF     Current user exchange package
IDXP     Idle exchange package
CORXP    Correction exchange package
MCCCNT   Disk master clear count
MCLCNT   Disk master clear loop count
SIDLE    Alternate scheduling flag
SERRLIM  Station input error retry limit
DSLWA    1+LWA of COS binary and parameter file
         Miscellaneous pointers and constants
CRT      Disk channel reservation table
SSBO     Contents of B0 register of interrupted processor
SXTC     Clock at beginning of interrupt
ETIM     Accrued executive time
ITIM     Accrued idle time
UTIM     Accrued user time
BTIM     Accrued system I/O blocked time
RUNTIM   Accrued time since deadstart
MSLIM    Idle memory scan limit
STT      System Task Table. This table consists of three parts: a 4-word header, a task parameter word area, and an exchange package area. The sign bit of the second word of the STT header is set if the highest priority STP task is to execute; the address in the low-order bits of that word points to the parameter word for the task to be executed. The third header word contains a bit for each STP task; the bit is set if the task has been created. This word also contains a pointer to the exchange package for the currently scheduled STP task. The fourth word contains a breakpoint flag.
CHT      Channel Processor Table. This table contains a 1-word entry for each side (input and output) of a physical channel and a pseudo channel. An entry contains a pointer to the channel message buffer for the channel-assigned task ID and the address of the channel processor assigned to that side of the channel. Input sides are assigned even numbers; output sides, odd numbers.
CBT      Channel Buffer Table. This table contains one entry of working storage for each disk driver channel.
AET      Assigned Equipment Table. This table points to entries in the PUT based on channels.
         TII tables: TIICSW, TIILID, TIILIDC, TIICHUN
PUT      Physical Unit Table. This table contains one entry of working storage for each disk drive on the system.
MTCT     Executive Request Table. This jump table contains a 1-word entry for each executive request that can be made by a task. The entry consists of the address of the routine that processes the request.
DOFS     A set of constants used by the system
SMSC     1 ms in 12.5 ns counts
SSEC     1 second in 12.5 ns counts
SHMS     ASCII time in hours, minutes, and seconds
SMDY     ASCII date in month, day, and year
TBPT     Task Breakpoint Table
PERT     Parity Error Table
DBF      History trace buffer
SCT      Subsystem control table
CXT      Channel extension table

1.4.2 EXEC PROGRAM AREA

The area occupied by the System Executive (EXEC) includes interrupt handlers, channel processors, the task scheduler, the drivers (disk, I/O Subsystem, and front-end), system interchange, request processors, and debug aids. EXEC has a BA of 0 and an LA equal to the installation parameter I@MEM.

1.4.3 STP TABLE AREA

This area contains tables accessible to all STP tasks (not necessarily in the order listed).

AUT      Active User Table. It contains an entry for each interactive user that is logged on.
CMCC     Communication Module Chain Control. This table controls task-to-task communication. It is a contiguous area containing an entry for each possible combination of tasks within the system. The CMCC is arranged in task number sequence; the IDs of the requesting task and the requested task determine the appropriate CMCC entry.
CMOD     Communications Modules. These are 6-word blocks allocated as needed from a pool. Two words are used for control, two as input registers, and two as output registers.
         A task receives all of its requests and makes all of its replies through a CMOD.
CSD      Class Structure Definition. CSD contains the job class structure. For each class defined in the structure, there is a class map; these maps appear in CSD in descending order. A header precedes the class maps, and variable-length characteristic expressions for each class follow the maps.
DAT      Dataset Allocation Table. There is a DAT for each dataset known to the system; it defines where the dataset logically resides on mass storage, that is, on which logical device(s) and what portion of a device.
DCT      Device Channel Table. The DCT serves as a link between the channel and the EQT. It is used by the disk driver to report completion of I/O and to report disk status.
DET      Device Error Table. The DET is used to build messages for the system log.
DRT      Device Reservation Table. There is a DRT for each device known to the system. The DRT contains a bit map showing available and reserved tracks on the device.
ECT      Error Code Table. This table controls abort and reprieve processing done by UEP. It contains a 1-word entry for each system error code and is defined using the ERDEF macro.
EQT      Equipment Table. The EQT contains an entry for each device known to the system.
IBT      Interactive Buffer Table. It manages the Interactive Buffer pool.
JXT      Job Execution Table. The JXT contains an entry for each job that has begun processing. The table is used to control all active jobs in the system and may contain from 0 to 63 entries. A 64th entry is reserved to represent the operating system itself.
LCT      Link Configuration Table. It contains an entry for each front end connected to physical channels.
LIT      Link Interface Table. SCP assigns an LIT entry at deadstart to each channel used for interface communications.
LST      Link Interface Stream Table. Each LXT, as used by SCP, contains eight input-stream and eight output-stream LSTs.
LXT      Link Interface Extension Table. SCP assigns an LXT entry to an active LIT entry for each front-end ID at LOGON and deassigns it at LOGOFF. The LXT contains SCP working storage and the input and output LSTs.
MST      Memory Segment Allocation Table. The MST contains an entry for each segment of memory that has been allocated by JSH as well as an entry for each free segment. It may contain from 1 to 127 1-word entries.
PDI      Permanent Dataset Information Table. This table contains information used by the Permanent Dataset Manager, such as the number of overflow and hash pages.
PDS      Permanent Dataset Table. The PDS table consists of a 1-word header followed by a 1-word entry for each active permanent dataset. The entry indicates how a dataset is accessed and whether multiple access exists; if so, the entry tells how many users are accessing the dataset.
RJI      Rolled Job Index Table. For each defined JXT entry, the RJI Table contains an entry that describes the job assigned to the JXT entry and controls the recovery of jobs from mass storage.
RQT      Request Table. This table is used to queue transfer requests for disk management.
QDT      Queued Dataset Table. This table describes the multitype attributes for a dataset that has been disposed. It is managed by PDM and EXP. The number of entries in the QDT must equal the SDT entry count.
SDR      System Directory. This area contains a Dataset Name Table (DNT) (section 1.4.6) for each of the datasets comprising the system library. The SDR is initialized after a system startup.
SDT      System Dataset Table. This table contains an entry for each dataset spooled to or from a front-end system.
STPDD    STP Dump Directory. This area contains pointers to task origins, buffers, and so on. An entry gives a mnemonic in ASCII plus the relative STP address for the area.

Details of the STP tables are given in the COS Table Descriptions Internal Reference Manual, publication SM-0045.

1.4.4 STP PROGRAM AREA

The System Task Processor (STP) consists of tasks and re-entrant code common to all of the tasks.
Tasks cannot access the memory area occupied by EXEC but may access the rest of memory. Although tasks are loaded into memory during Startup, they are recognized only through an Executive create-task request (usually issued by the Startup task). The Startup task is a special case, since it executes only when the system is started up and is created by EXEC itself. Recovery of rolled-out jobs executes as a portion of the Startup task rather than as a separate task. STP is described further in section 3.

1.4.5 CSP AREA

A prototype of the Control Statement Processor (CSP) is maintained in memory following STP. This program is copied into each user program field, where it executes each time the job requires interpretation of a control statement. CSP is further described in section 5.

1.4.6 USER AREA

The user area of memory is assigned to one or more jobs. Each job has an area referred to as the Job Table Area (JTA) preceding the field defined for the user. A JTA is accessible to the operating system but not to the user. The JTA contains job-related information such as accounting data; the JXT pointer; sense switches; areas for saving B, T, and V register contents; control statement, logfile, and EXU DSPs (user calls that load the binaries); a logfile buffer; and a DNT area.

DNT      Dataset Name Table. This area in each user's JTA contains an entry for each dataset used by the job.

Each user field begins with a 128-word block referred to as the Job Communication Block (JCB), which contains a copy of the current control statement for the job as well as other job-related information. The high end of the user field contains dataset buffers and I/O tables. The user field, in addition to being used for user-requested programs such as the compiler, assembler, and object programs, is also the area in which the operating system utility programs such as the loader, copy and positioning routines, and permanent dataset utility programs execute.
The Control Statement Processor (CSP) also executes in the user field.

Tables that may reside in the user field include the following:

BAT      Binary Audit Table. This table contains an entry for each permanent dataset that meets requirements specified on the AUDIT control statement and for which the user number matches the user number for the job.
DDL      Dataset Definition List. A DDL in the user field accompanies each request to create a DNT.
DSP      Dataset Parameter Area. A DSP area in the user field contains information concerning the status of a particular dataset and the location of the I/O buffer for the dataset.
JAC      Job Accounting Table. This table defines the format of data returned to the user by an accounting request.
LFT      Logical File Table. This table in the user field contains an entry for each dataset name and alias referenced by FORTRAN users. Each entry points to the DSP for a dataset.
ODN      Open Dataset Name Table. A request to open a dataset for a job contains a pointer to the ODN table in the user field.
PDD      Permanent Dataset Definition Table. A PDD table in the Control Statement Processor (CSP) is used for saving, accessing, and deleting permanent datasets.

Refer to the COS Table Descriptions Internal Reference Manual, publication SM-0045, for detailed descriptions of these tables.

1.5 MASS STORAGE SUBSYSTEM ORGANIZATION

Depending on the CRAY-1 model, mass storage consists of either DD-19 or DD-29 Disk Storage Units and DCU-2, DCU-3, and DCU-4 Disk Control Units; the controllers are model dependent. These devices are physically nonremovable. For models that do not have an I/O Subsystem, the assignment of units and of DCU-2 and DCU-3 controllers to channels is assembled into the Equipment Table (EQT). The DCU-4 controllers and their corresponding units are on the I/O Subsystem.

Each disk storage unit contains a device label, datasets, and unused space to be allocated to datasets (figure 1-8).
Additionally, one disk storage unit is designated as the master device and contains a table area called the Dataset Catalog (DSC), which is used for maintaining information about permanent datasets.

1.5.1 FORMATTING

Before a unit can be introduced into the system, it must be formatted. Formatting is the process of writing cylinder, head, and sector identification on the disk storage unit. This process is performed off-line by field engineers. Unless addressing information has been inadvertently destroyed, formatting is performed only once.

Figure 1-8. Mass storage organization [a master device and other devices]

1.5.2 DEVICE LABEL (DVL)

A disk storage unit (DSU) must be labeled before it can be used by the system. The Install program writes a Device Label Table (DVL) on one track of each DSU. The DVLs act as the starting point for determining the status of mass storage when the system is deadstarted or restarted. The DVL is usually, but not necessarily, located on the first track of the device.

Flaw information

A DVL contains a list of flaws (bad tracks) for its DSU. Initial flaw information is obtained from an engineering diagnostic run prior to the Install program. Install reads back each DVL after writing it to verify its integrity. If a DVL cannot be read back perfectly, the track is overwritten with a test pattern and a different track is tried. The DVL is the last track written by Install, so that all flaws, including any discovered while trying to write the DVL itself, are recorded in the DVL.

Dataset Allocation Table (DAT) for DSC

The DVL for the master device maps the Dataset Catalog (DSC): it contains the complete Dataset Allocation Table (DAT) for the DSC except for DAT page headers.

1.5.3 DATASET CATALOG (DSC)

The Device Label Table (DVL) for the master device states which tracks comprise the Dataset Catalog (DSC). Similarly, the DSC states which tracks comprise each of the currently cataloged datasets.
Deadstart and Restart update the Device Reservation Table (DRT) in STP-resident memory to reserve these dataset tracks, so that the existence of permanent datasets is known to the system when it is deadstarted or restarted; an install, by contrast, assumes that all of mass storage is vacant.

Special consideration is given to job input and output datasets, however. Deadstart deletes all of the input and output datasets, which are identified by flags in the DSC, and zeroes their DSC entries. Restart, on the other hand, recovers the job input and output datasets.

1.6 EXCHANGE MECHANISM

The technique employed in the CRAY-1 to switch execution from one program to another is termed the exchange mechanism. A 16-word block of program parameters is maintained for each program. When another program is to begin execution, an operation known as an exchange sequence is initiated. This sequence exchanges the program parameters for the next program to be executed with the information in the operating registers. The operating register contents are thus saved for the terminating program, and the registers are loaded with data for the new program.

Exchange sequences may be initiated automatically upon occurrence of an interrupt condition or may be voluntarily initiated by the user or by the operating system through normal (EX) or error (ERR) exit instructions. As section 2 shows, the System Executive (EXEC) is always a partner in the exchange; that is, it is either the program relinquishing control or the program receiving control. All other programs must return control to EXEC. The contents of the interrupt flag register (F) are instrumental in selecting the next program to be executed.

1.6.1 EXCHANGE PACKAGE

An exchange package is a 16-word block of data in memory that is associated with a particular computer program.
It contains the basic hardware parameters necessary to provide continuity from one execution interval for the program to the next. The exchange package is illustrated in figure 1-9.

1.6.2 EXCHANGE PACKAGE AREAS

System hardware requires that all exchange packages be located in the first 4096 words of memory. In addition, the deadstart function expects an exchange package to be at address 0. This is the exchange package that initiates execution of EXEC and, consequently, the operating system. The EXEC exchange package is either active or is in one of the other exchange package areas (figure 1-10).

Figure 1-9. Exchange package [16 words, n through n+15; bit positions 0 through 63, numbered from the left]

  S      Syndrome bits
  RAB    Read address for error (where B is the bank)
  P      Program address
  BA     Base address
  LA     Limit address
  XA     Exchange address
  VL     Vector length
  E      Error type (bits 0 and 1 of word n):
           10  Uncorrectable memory error
           01  Correctable memory error
  R      Read mode (bits 10 and 11 of word n):
           00  Scalar
           01  I/O
           10  Vector
           11  Fetch
  M      Modes:
           Word n+1, bit 39  Interrupt monitor mode (note 1)
           Word n+2, bit 36  Interrupt on correctable memory error
           Word n+2, bit 37  Interrupt on floating point error
           Word n+2, bit 38  Interrupt on uncorrectable memory error
           Word n+2, bit 39  Monitor mode
  F      Flags (word n+3):
           Bit 31  Programmable clock interrupt (note 2)
           Bit 32  MCU interrupt
           Bit 33  Floating point error
           Bit 34  Operand range error
           Bit 35  Program range error
           Bit 36  Memory error
           Bit 37  I/O interrupt
           Bit 38  Error exit
           Bit 39  Normal exit

  The package also holds the contents of the A and S registers.

  Note 1: Supports the Monitor Mode Interrupt option on the CRAY-1A and CRAY-1B.
  Note 2: Supports the Programmable Clock (optional on the CRAY-1A and CRAY-1B; standard on CRAY-1 S Series computers).

These other exchange packages, summarized below, are selected by EXEC depending on interrupt flags and other conditions, as defined later:

• Any of a set of exchange packages in the System Task Table (STT). There is one exchange package for each STP task.
• The active user exchange package. This exchange package, located at SXBF, points to the currently active user program as selected by the Job Scheduler. When a user program becomes inactive (is disconnected from the CPU) or causes a normal or error exit, its exchange package is copied from the active user exchange package area into the Job Table Area (JTA) for that job. When a user program is disconnected, another job may be connected to the CPU, and its exchange package is copied from its JTA to the active user exchange package area.
• The idle program exchange package. This exchange package, located at IDXP, is selected when no tasks or user programs are scheduled for execution.
• The memory error correction exchange package. This exchange package, located at CORXP, is selected when an exchange is caused by a memory parity error.

1.6.3 B, T, AND V REGISTERS

On any exchange to EXEC, EXEC saves the task's or user program's B0 register because EXEC uses B0. A task's B0 register value is stored in the STT. The active user's B0 value is stored in SSBO during interrupt processing. When EXEC exchanges out, it restores the proper B0 register value.
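The exchange sequence of section 1.6 and the flag word of figure 1-9 can be modeled in a few lines of Python. This is an illustrative sketch, not the hardware behavior: the operating registers are flattened into a hypothetical 16-word image, the hardware performs the swap as a single exchange sequence, and bits are numbered from the left (bit 0 is the most significant bit of a 64-bit word), which is how figure 1-9 appears to number them. Because the package holds only the fields shown in figure 1-9, registers such as B0 that EXEC itself uses must be saved separately, as described above.

```python
from __future__ import annotations

XP_WORDS = 16  # an exchange package is a 16-word block

# Flag bits in word n+3 (figure 1-9), numbered from the left:
# bit 0 is the most significant bit of a 64-bit word.
FLAG_NAMES = {
    31: "programmable clock interrupt",
    32: "MCU interrupt",
    33: "floating point error",
    34: "operand range error",
    35: "program range error",
    36: "memory error",
    37: "I/O interrupt",
    38: "error exit",
    39: "normal exit",
}


def cray_bit(word: int, bit: int) -> bool:
    """Return True if `bit` (numbered from the left, 0..63) is set in `word`."""
    return ((word >> (63 - bit)) & 1) == 1


def set_flags(xp: list[int]) -> list[str]:
    """List the flags set in word n+3 of an exchange package, in bit order."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if cray_bit(xp[3], bit)]


def exchange(registers: list[int], memory: list[int], xa: int) -> None:
    """Swap a 16-word register image with the package stored at memory[xa].

    Afterward, memory holds the outgoing program's parameters and
    `registers` holds the incoming program's parameters.
    """
    if len(registers) != XP_WORDS:
        raise ValueError("register image must be 16 words")
    incoming = memory[xa:xa + XP_WORDS]
    memory[xa:xa + XP_WORDS] = registers  # save the outgoing program
    registers[:] = incoming               # load the incoming program
```

Swapping in place, rather than saving and loading in two separate steps, mirrors the text's point that one exchange sequence both saves the terminating program and starts the new one.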
All B, T, and V register values are saved by EXEC only when the current user job is being disconnected from the CPU in favor of some other job. A job's B, T, and V register values are restored when it is reconnected to the CPU. These registers are maintained in the job's JTA.

Figure 1-10. [Exchange package areas (User XP, Idle XP, Task 0 XP through Task n XP), program areas (EXEC, STP, users), and operating registers (BA, P, LA); panels: A. EXEC in execution; B. Task 1 in execution; C. Current user in execution]

1.7 COS STARTUP

System Startup is the process of loading the operating system into CRAY-1 memory, beginning execution, and generating or recovering tables for the operating system. There are three types of startup: Install, Deadstart, and Restart. A general description follows; details are given in section 5.

Install

For an install, COS is started as if for the very first time. All CRAY-1 mass storage is assumed to be vacant. The startup program labels devices and establishes the Dataset Catalog (DSC) on mass storage.

Deadstart

For a deadstart, COS is started as if after a normal system power-down. Permanent datasets are recovered, but input queues and output queues are not reconstructed. Rolled-out jobs cannot be recovered during a deadstart.

Restart

For a restart, COS is started as if after a system failure (crash). Input queues and output queues as well as permanent datasets are recovered.
Rolled-out jobs may be recovered according to operator selection.

1.8 GENERAL DESCRIPTION OF JOB FLOW

A job passes through the following stages from the time it is read by the front-end system until it terminates:

• Entry
• Initiation
• Advancement
• Termination

1.8.1 JOB ENTRY

A job enters the system from a front-end system. The Station Call Processor task (SCP) in STP is responsible for making the job's existence known to the system. It does this by:

• Making an entry in the System Dataset Table (SDT),
• Requesting that an entry be created in the Dataset Catalog (DSC), thereby making the dataset permanent, and
• Readying the Job Scheduler Task (JSH).

1.8.2 JOB INITIATION

The Job Scheduler Task (JSH) scans the SDT looking for candidates for processing. A job is scheduled to begin processing (initiated) when:

• An entry for a job of the correct class is available in the Job Execution Table (JXT) (the maximum number of entries in the JXT is 63), and
• No other job of higher priority is waiting to begin processing.

JSH uses an available entry in the JXT to create an entry for the job being initiated. The Job Scheduler continues to use the JXT entry during the life of the job to control CPU use, job roll-in/roll-out, and memory allocation. JSH also moves the job's SDT entry from the input queue to the executing queue, still in the SDT. The Rolled Job Index entry corresponding to the assigned JXT entry is also initialized at this point.

1.8.3 JOB ADVANCEMENT

The Job Scheduler gives each job a CPU priority that reflects its history of CPU usage, so that I/O-bound jobs have a greater chance of being assigned to the CPU. A job requiring a large memory area is allowed to stay in memory longer to compensate for its greater roll-in/roll-out time. A job that has received more than the average CPU time for its priority is liable to be rolled out sooner as a consequence. The operator may change a job's priority while the job is running.
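The two initiation conditions of section 1.8.2 can be expressed as a single predicate. The Python sketch below is a simplification under stated assumptions: per-class availability is modeled as a hypothetical dictionary of free class slots (the real class structure lives in the CSD), a larger `priority` value is assumed to mean higher priority, and `WaitingJob` and `can_initiate` are illustrative names, not JSH routines.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_JXT_ENTRIES = 63  # user jobs; a 64th entry represents the operating system


@dataclass
class WaitingJob:
    name: str
    job_class: str
    priority: int  # hypothetical convention: larger value = higher priority


def can_initiate(job: WaitingJob, waiting: list[WaitingJob],
                 jxt_used: int, class_slots: dict[str, int]) -> bool:
    """Return True if `job` satisfies both initiation conditions (sketch)."""
    if jxt_used >= MAX_JXT_ENTRIES:
        return False  # the JXT has no free entry
    if class_slots.get(job.job_class, 0) <= 0:
        return False  # no entry available for a job of this class
    # No other waiting job may have a higher priority.
    return all(other.priority <= job.priority
               for other in waiting if other is not job)
```

Under this reading, a high-priority job that cannot start because its class is full also holds back lower-priority jobs, which is the literal effect of the second condition.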
Not all jobs having entries in the JXT are in memory. Some may have been rolled out to mass storage when some event caused other jobs to replace them in memory.

The Control Statement Processor (CSP) advances a job through its program steps. CSP is first loaded and executed in the user field following job initiation; thereafter, it is called whenever a job step terminates. Normal job step termination occurs when the user program makes a normal termination call to the system. Abnormal termination occurs upon detection of an error during the job step or upon an F$ABT call by the user program.

1.8.4 JOB TERMINATION

When a job terminates, the following actions occur:

• A DSC entry is created for the job's output datasets.
• An SDT entry is created for the job's output datasets.
• The DSC entry for the input dataset is deleted.
• The user dayfile, $LOG, is copied onto the end of $OUT.
• The SDT entry is deleted from the executing queue.
• The JXT entry and the memory assigned to the job are released.
• The Rolled Job Index entry is cleared (zeroed).
• SCP is readied and scans the SDT for output to send to the front-end system.
• SCP deletes the output dataset's DSC and SDT entries after the dataset has been completely transmitted to the front-end system.

1.9 DATASET MANAGEMENT

All information maintained on mass storage by the CRAY-1 Operating System is organized into collections of information known as datasets. Datasets are of two types: local or permanent. A local dataset exists only for the life of the job that created it and can be accessed only by that job. A permanent dataset is available to the system and can survive system deadstarts. A dataset is permanent if it has an entry in the Dataset Catalog on disk.
Permanent datasets are of two types: those that are created through the use of directives (user permanent datasets), and those that represent standard job input and output datasets (system permanent datasets).

User permanent datasets are maintained for as long as the user or installation desires. A user permanent dataset is protected from unauthorized access by permission control words. The user may create a user permanent dataset by pre-staging a dataset from a front-end computer system or by using the SAVE or ACQUIRE control statement or macro. A user accesses a user permanent dataset by using the ACCESS control statement or macro. The dataset may be removed from the system with the DELETE control statement or macro.

More than one authorized user may access a permanent dataset. A user wishing to write on or otherwise alter a permanent dataset must have unique access; multiple users wishing only to read the dataset may have multiaccess.

Some permanent datasets similar to user permanent datasets are created and maintained by the system. No user can delete or access these datasets, because the system has unique access to them. Among these datasets is the Rolled Job Index dataset, which is created or accessed by the Startup task and remains in use throughout the operation of the system.

System permanent datasets are job related. Each job's input dataset is made permanent when the job is received by the CRAY-1. When job processing ends, those of the job's local datasets that have special names or that were given a disposition other than scratch by the user are made permanent, and the job's input dataset is deleted from CRAY-1 mass storage. The output datasets that were made permanent are sent to a front-end computer system for processing. They are deleted from CRAY-1 mass storage when their receipt has been acknowledged by the front-end computer system.
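The unique-access and multiaccess rule above behaves like a readers/writer lock. The Python sketch below is a hypothetical illustration, not PDM code: `PermanentDataset`, `request_access`, and `release_access` are invented names, permission control words are not modeled, and the reader set plays the role of the count of accessing users that the PDS entry records.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PermanentDataset:
    """Hypothetical per-dataset access record."""
    name: str
    writer: Optional[str] = None                     # holder of unique access
    readers: set[str] = field(default_factory=set)   # multiaccess holders


def request_access(ds: PermanentDataset, user: str, unique: bool) -> bool:
    """Grant or refuse access; permission control words are not modeled."""
    if unique:
        # A user who will write or alter the dataset needs it exclusively.
        if ds.writer is not None or ds.readers:
            return False
        ds.writer = user
        return True
    # Readers may share the dataset, but not while unique access is held.
    if ds.writer is not None:
        return False
    ds.readers.add(user)
    return True


def release_access(ds: PermanentDataset, user: str) -> None:
    """Drop whatever access `user` holds on the dataset."""
    if ds.writer == user:
        ds.writer = None
    ds.readers.discard(user)
```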
1.10 I/O INTERFACES

Figure 1-11 presents an overview of the interfaces and system components involved in performing input and output in the system. It summarizes the request levels and routine calls without going into details on the movement of data; that is, it does not describe how data is transferred from disk to a circular buffer and then to a user area on a read, nor how it is transferred in the reverse sequence on a write.

Major interfaces exist between the user and STP and between STP and EXEC. Details of the user levels of I/O are presented in the FORTRAN Reference Manual, publication 2240009, and in the CRAY-OS Version 1 Reference Manual, publication SR-0011. Details for EXEC (driver-level) I/O are given in section 2 of this publication. Details for STP interfaces are given in section 3.3 of this publication.

I/O can be blocked or unblocked and can be initiated by the user or by the system. FORTRAN statements for logical I/O represent the highest level of I/O requests. The FORTRAN statements fall into two categories: formatted/unformatted and buffered. The formatted/unformatted statements (i.e., READ, PUNCH, WRITE, and PRINT) result in calls to library routines $RFI through $WUF. These routines contain calls to the Logical Record I/O routines, also in the library. These calls may be formatted by the user or may be made through CAL language macros. The Logical Record I/O routines issue Exchange Processor requests (i.e., F$ calls) that consist of read circular and write circular requests to the Circular Input/Output (CIO) routines resident in STP (see section 3.3.3).
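This section deliberately omits how data moves through the circular buffer, but the read circular and write circular requests imply a ring-buffer discipline. As a generic illustration only, the Python sketch below models a word buffer with hypothetical IN and OUT pointers; it does not reproduce the actual DSP fields or CIO logic (section 3.3.3). One slot is always left empty so that IN equal to OUT unambiguously means "empty".

```python
class CircularBuffer:
    """Hypothetical word-oriented ring buffer with IN/OUT pointers."""

    def __init__(self, size: int) -> None:
        if size < 2:
            raise ValueError("size must be at least 2")
        self.words = [0] * size
        self.inp = 0  # next slot the producer fills ("IN")
        self.out = 0  # next slot the consumer empties ("OUT")

    def used(self) -> int:
        """Number of words waiting to be consumed."""
        return (self.inp - self.out) % len(self.words)

    def free(self) -> int:
        """Number of words that can still be stored (one slot stays empty)."""
        return len(self.words) - 1 - self.used()

    def put(self, data: list) -> int:
        """Write side: copy as many words as fit; return how many were stored."""
        n = min(len(data), self.free())
        for word in data[:n]:
            self.words[self.inp] = word
            self.inp = (self.inp + 1) % len(self.words)
        return n

    def get(self, count: int) -> list:
        """Read side: remove up to `count` words in arrival order."""
        n = min(count, self.used())
        result = []
        for _ in range(n):
            result.append(self.words[self.out])
            self.out = (self.out + 1) % len(self.words)
        return result
```

A partial `put` corresponds to a buffer that fills before all the data fits; the caller retries after the consumer has advanced OUT.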
Figure 1-11. Overview of COS I/O [user level: CFT buffered I/O statements (BUFFER IN, BUFFER OUT) and formatted/unformatted statements (READ, PUNCH, PRINT, WRITE), and CAL buffered, unblocked, and blocked I/O macros; library routines: $RFI through $WUF, $RB, $WB, the CAL buffered I/O interface $CBIO, and the Logical Record I/O routines; system calls: F$RDC, F$WDC, F$BIO; STP: CIO, TIO, and the non-CIO interface (Z, SCP, and JSH) to DQM; EXEC: disk driver and disk controller functions; asynchronous and synchronous I/O paths]

System logical I/O required by COS tasks (e.g., management of the Dataset Catalog) is generally performed through Task I/O (TIO) routines resident in STP (see section 3.3.2). TIO routines closely resemble the Logical Record I/O routines. In addition to supporting I/O for system tasks, TIO routines also handle FORTRAN buffered I/O. At the FORTRAN level, the BUFFER IN and BUFFER OUT statements are compiled into calls to two library routines, $RB and $WB. These routines issue F$BIO Exchange Processor requests that interface with a subset of TIO routines in STP.

Since TIO routines reside jointly with CIO in STP, they directly call CIO routines to perform the same functions as those requested through F$ calls by the Logical Record I/O routines. Thus, CIO becomes the focal point for all logical I/O in the system.

CIO communicates its needs for physical I/O to the Disk Queue Manager (DQM) through DNT and DSP tables.
The DNT for a dataset points to its DSP, which specifies the request. This is the normal mode of communication with DQM. Currently, however, DQM also communicates with the Station and Startup interfaces. In these interfaces, SCP and Z pass a caller-built DNT containing the I/O request for DQM. The Job Scheduler (JSH) also uses a non-CIO interface to process job roll-in/roll-out and to manipulate the Rolled Job Index dataset. DQM coordinates physical I/O activity on the disks by queueing executive requests for the Disk Driver (section 2.8). This driver consists of a number of channel processors that issue functions to the disk controllers.

2 EXEC

The system Executive module (EXEC) is the control center for the operating system. It alone accesses all of memory, controls the I/O channels, and selects the program to execute next. Components of EXEC include an interchange routine, interrupt handlers, channel processors, an EXEC request processor, and a task scheduler. These programs are integral to EXEC; control transfers from routine to routine through simple jumps.

EXEC first begins execution when the system is started up. Following this initial system interchange, EXEC performs the following functions:

• Disables and clears the programmable clock,
• Determines the size of the deadstart binary by reading the channel address for the MCU channel,
• Receives a word packet containing the time and date from the MCU,
• Sets the real-time clock,
• Master clears each disk control unit,
• Sets the SECDED bits in memory in case a power off has occurred,
• Sets the limit address to the proper value, and
• Starts up the root task (Startup).

After the system is started up, EXEC begins execution whenever a task or the currently active user program is interrupted. This interrupt may result from the program executing an exit instruction (ERR or EX), or from an I/O or error condition. EXEC saves (B0) and the clock value upon entry.
If one second has elapsed, it sets the real-time (RT) flag in the exchange package. After setting and checking the clock, EXEC initiates execution of the Interchange routine.

Interchange analyzes the cause of the interrupt (details are given below) and passes control to the appropriate handler. The interrupt handler, in turn, clears the interrupt flag and activates a channel processor. After processing the interrupt condition, the channel processor returns to Interchange, which checks for additional conditions. When all of the outstanding interrupts have been analyzed and processed, the Task Scheduler selects the highest priority task ready for execution. If no tasks are ready, the currently active user program is scheduled for execution. If no user program is currently active, the idle program is executed. EXEC then initiates an exchange sequence to the selected program and does not regain control until the next interrupt occurs.

[Figure 2-1. EXEC-controlled exchange sequences (EXEC exit to the selected program; entry to EXEC on an I/O interrupt, program exit, or error condition)]

2.1 INTERCHANGE INTERRUPT ANALYSIS

Each time Interchange is entered, it checks whether the cause of the interrupt was an I/O condition. It does not do this by directly checking the I/O interrupt flag in the F register, but rather by requesting the channel number of the highest priority channel with an I/O request. If a channel has an I/O request, Interchange calls the I/O Interrupt Handler, IOI, which then uses the channel number to initiate the correct channel processor. The channel processor returns control to Interchange, which repeats the inquiry for the highest priority channel having an I/O request. In this way, Interchange continues to process I/O requests until none remains.
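The polling loop described above can be modeled in a few lines. This is a hypothetical Python sketch, not COS code: the pending-request set, the `cht` dictionary standing in for the Channel Table, and the assumption that a lower channel number means a higher priority are all illustrative. What it shows is that Interchange asks again for the highest priority channel after every channel processor returns, so a request raised while another one is being serviced is still handled before Interchange moves on to the other interrupt flags.

```python
# Hypothetical model of Interchange's I/O polling loop (illustrative, not COS code).
from typing import Callable, Dict, List, Optional, Set

# A channel processor receives its channel number and the pending-request set.
Processor = Callable[[int, Set[int]], None]


def highest_priority_channel(pending: Set[int]) -> Optional[int]:
    """Assumption: the lowest channel number has the highest priority."""
    return min(pending) if pending else None


def drain_io(pending: Set[int], cht: Dict[int, Processor]) -> List[int]:
    """Service channels until none has an I/O request; return the service order."""
    order: List[int] = []
    while True:
        channel = highest_priority_channel(pending)
        if channel is None:
            return order  # no I/O left: Interchange moves on to the other flags
        pending.discard(channel)        # the handler clears the request
        cht[channel](channel, pending)  # run the processor named in the CHT
        order.append(channel)


def finish(channel: int, pending: Set[int]) -> None:
    """A processor that completes without raising further requests."""


def finish_and_raise(channel: int, pending: Set[int]) -> None:
    """A processor whose work causes a new request on channel 2."""
    pending.add(2)


cht = {2: finish, 5: finish_and_raise, 9: finish}
print(drain_io({5, 9}, cht))  # [5, 2, 9]
```

Channel 2 is serviced before channel 9 even though its request appeared only after the loop started, because the inquiry is repeated after each processor returns.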
When no I/O requests are outstanding, Interchange checks each of the following interrupt flags in turn, summoning the appropriate interrupt handler:

• MCU
• Programmable clock
• Real-time
• Error exit, floating-point error, program range error, and operand range error (tested as a group)
• Normal exit
• Memory error exit

After a flag is processed, control always returns to the beginning of EXEC to ensure that any I/O interrupts occurring in the interim are handled.

[Figure 2-2. System control: EXEC (Interchange, interrupt handlers, Task Scheduler, EXEC Request Processor, and channel processors with the IOP channel, disk, and front-end drivers), STP tasks and common routines, and user jobs]

2.2 INTERRUPT HANDLERS

An interrupt occurs, triggering an exchange to EXEC, if one or more of the interrupt flags is set in the F register. One or more of the following interrupt handlers is then executed:

IOI   I/O interrupt handler
CII   MCU interrupt handler
RTI   Real-time interrupt handler
NEI   Normal exit interrupt handler
EEI   Error exit interrupt handler
MEI   Memory error interrupt handler

The interrupt handler clears the interrupt flag in the exchange package of the interrupted program and determines the channel on which the interrupt occurred. Execution of the processor assigned to that channel is then initiated.

2.3 CHANNEL MANAGEMENT

The operating system recognizes 12 physical I/O channels and 4 pseudo channels. The four pseudo channels are for normal exit, error exit, memory error, and real-time interrupts and can be processed in a manner similar to I/O interrupts.
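Section 2.3 numbers channels so that input sides are even and output sides odd, a channel pair is numbered by its input channel number divided by two, and the pseudo channels for normal exit, error exit, real-time, and memory error interrupts are 26, 28, 30, and 32. The hypothetical helpers below capture that arithmetic; the function names are illustrative, not COS routines.

```python
# Hypothetical helpers for the COS channel-numbering convention (illustrative names).
from typing import Tuple

# Pseudo channels, identified by the even (input-side) channel number.
PSEUDO_CHANNELS = {
    26: "normal exit",
    28: "error exit",
    30: "real-time interrupt",
    32: "memory error",
}


def side(channel: int) -> str:
    """Input sides are even; output sides are odd."""
    return "input" if channel % 2 == 0 else "output"


def pair_number(channel: int) -> int:
    """A pair is numbered by its input channel number divided by two."""
    return channel // 2


def channels_of_pair(pair: int) -> Tuple[int, int]:
    """Return the (input, output) channel numbers of a pair."""
    return 2 * pair, 2 * pair + 1


def classify(channel: int) -> str:
    """Name the use of a channel side, following the channel numbering table."""
    input_side = channel - channel % 2
    if input_side == 0:
        return "console interrupt"
    if 2 <= input_side <= 24:
        return "I/O channel"
    return PSEUDO_CHANNELS.get(input_side, "unassigned")


print(channels_of_pair(5))                      # (10, 11)
print(pair_number(11), side(11), classify(29))  # 5 output error exit
```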
Each channel is considered as having two sides, an input side and an output side, each numbered separately. Input sides are assigned even numbers; output sides are assigned odd numbers. When both sides of a channel are referenced (sometimes referred to as a channel pair), the number is the input channel number divided by two. That is, channel pair 5 is channels 10 and 11. Channel numbers are assigned (in decimal) as follows; each entry gives the even, input-side number:

0        Console interrupt
2-24     I/O channel pairs (12 pairs: 2, 4, 6, ..., 24)
26       Normal exit pseudo channel pair
28       Error exit pseudo channel pair
30       Real-time interrupt pseudo channel pair
32       Memory error pseudo channel pair

2.3.1 CHANNEL TABLES

The following tables aid in channel management:

• Channel Table (CHT)
• I/O Service Task Defined Table
• System Task Table (STT)
• Channel Extension Table (CXT)
• Channel Buffer Table (CBT)
• Subsystem Control Table (SCT)

Detailed information on these tables is available in the COS Table Descriptions Internal Reference Manual, publication SM-0045. Figure 2-3 illustrates how these tables are linked together.

[Figure 2-3. Channel table linkage: CHT, I/O Service Processor Table, and STT]

Channel Table (CHT)

CHT contains an entry for the input side and the output side of each channel, real or pseudo. The entry contains (1) a pointer to the I/O Service Processor table used by the channel processor to control the channel, (2) the address of a processor for that side of the channel, and (3) a pointer to the STT parameter block for the task using the request on this channel.

I/O Service Processor Tables

The I/O Service Processor tables contain information for control of the channel processor and may contain pointers to other tables. Each of the following channel types has a different I/O Service Processor table: front-end channel, mass storage channel, exchange pseudo channel, memory error, and real-time pseudo channel.

System Task Table (STT)

The STT contains a parameter block and the exchange package area for each task.
Tasks are identified by numeric IDs.

2.3.2 CHANNEL ASSIGNMENTS

Channel assignment is a 2-level process:

• Assign a task to a channel
• Assign an EXEC processor to a channel

Startup performs the first level when it enters the task ID and Channel Control Table address in the CHT. The second-level assignments occur either at system build time or dynamically during I/O initiation. The following executive requests relate to channel assignment:

• Assign channel
• Disk block I/O request
• Station I/O request

2.3.3 CHANNEL PROCESSORS

Some channel processors are assigned to a channel permanently; others are assigned on an as-needed basis as a result of task requests to EXEC. Each side of a channel always has one of the following processors assigned to it. The current assignment is maintained in the Channel Table (CHT).

Console Exit Processor (CEP)

The Console Exit Processor is always assigned to channel 0 and to the MCU interrupt flag. It forces the executive to check the top event on its time event stack.

Reject Processor (RJ)

The Reject Processor is assigned to any unused channels. A transient interrupt could result in RJ being executed. The processor simply returns control to the Interchange routine.

Normal Exit Processor (NE)

The Normal Exit Processor is assigned to pseudo channel 26. If the normal exchange is from a task, bit STRTS of word 1 of the STT header is nonzero and NE initiates execution of the Executive Request Processor. Otherwise, the job has issued a system task request and the Exchange Processor task is scheduled.

Error Exit Processor (EE)

The Error Exit Processor is assigned to pseudo channel 28. If the error exit was from a task, EE checks for a possible breakpoint condition. If the error exit (including error exit, operand range, and floating-point errors) was from the currently active user, the Exchange Processor task is readied.
The interrupt register error flags are passed to the Exchange Processor for interpretation of the cause of the error exit.

Memory Error Processor (ME)

The Memory Error Processor is assigned to pseudo channel 32. When a single-bit memory error occurs, it compares the total single-bit error count against the limit for disabling single-bit error detection. If the count exceeds the limit, the processor disables the interrupt on single-bit errors. For single-bit errors, the processor attempts to restore the memory address to its correct value so that the error does not occur on the next read of that address. For all memory errors, the processor readies the memory error task, which determines what software corrective action should occur.

Real-time Interrupt Processor (RTP)

The Real-time Interrupt Processor is assigned to pseudo channel 30. If the job's time slice has elapsed, it schedules the Job Scheduler task.

Disk Processors

Any of the following disk processors can be assigned to a CPU I/O channel as a result of an executive request for the disk driver from the Disk Queue Manager:

US   Unit select
DP   Disk position
WD   Write
MS   Margin select
RD   Read
SS   Subsystem status
MC   Master clear controller
EC   Error correction

When a processor completes its function, the disk driver in the Executive Request Processor assigns a processor for the next function in the sequence to be performed without involving a task. Refer to Disk Driver, section 2.8, for details.

Front-end Processors

Any of the following processors can be assigned to a CPU I/O channel as a result of an executive request for the front-end driver from the Station Call Processor task:
WLCP    Write link control package
WSSEG   Write subsegment
WLTP    Write link trailer package
RLCP    Read link control package
RSSEG   Read subsegment
RLTP    Read link trailer package
WXLCP   Write error link control package
WXLTP   Write error link trailer package

When a processor completes its function, it assigns the next front-end processor or reject (RJ) to the channel without involving the Station Call Processor. Refer to Front-end Driver, section 2.7, for details.

I/O Subsystem MIOP Command and Status Processors

The following two processors are assigned respectively to the I/O Subsystem Master I/O Processor (MIOP) status and command channels, which are handled in a fully asynchronous manner:

• APIIP   Processes MIOP status input interrupt
• APOIP   Processes MIOP command output interrupt

APIIP calls the following packet processors to handle the appropriate status packets:

APPACN   Null packet
APPACX   STP task packet
APPACE   Echo packet
APPACI   I-packet processor
APPACJ   J-packet processor
APPACA   Disk packet
APPACB   Station packet

I/O Subsystem BIOP Simulated Memory Channel

The following processors handle the I/O Subsystem Buffer I/O Processor (BIOP) simulated memory channel in a synchronous manner:

SHSRQT   Simulated memory channel request processor
SHSOT    Simulated memory channel output processor
SHSIT    Simulated memory channel input processor

2.4 TASK SCHEDULER

If a channel processor has requested execution of the Task Scheduler, the request task scheduler flag (STRTS) is set. If execution of the Task Scheduler has not been requested, the currently executing task resumes execution. The Task Scheduler executes, if requested, when Interchange has checked all of the interrupt conditions. If the STP interlock flag is set, the Task Scheduler returns to the previously executing task. This flag allows tasks to run in uninterruptible mode. Depending on the condition of the debug scheduling flag, one of two parts of the Task Scheduler executes.
If the flag is not set, normal scheduling occurs. The STT is examined for the highest priority task ready to execute and its exchange package is selected. If two tasks have equally high priority, the first encountered task is selected. If debug scheduling is indicated (the debug scheduling flag is set), only the Station Call Processor (SCP) is a candidate for scheduling.

A task is a candidate for execution when all of the status bits in its parameter word in the STT are clear (ready is not considered a status bit). The status of a task may change when a task issues one of the following executive requests:

• Create a task
• Ready a task
• Suspend self
• Ready called task and suspend self

Figure 2-4 illustrates the table linkage for task scheduling. If no task is ready for execution, the currently active job as defined by the Job Scheduler task is allowed to run. If no job is currently active, the idle program is selected.

2.5 EXEC RESOURCE ACCOUNTING

EXEC maintains the following performance information in EXEC tables:

• Accumulated CPU time for itself (in EXEC table ETIM)
• Accumulated CPU time for each task (in STT)
• Total time given to users (in table UTIM)
• Count of all channel interrupts for both real and pseudo channels (IC)
• Each user's execution time (in Job Table Area)
• Number of normal exits for each task (in STT)
• Number of ready task requests, both from other tasks and from external and internal interrupts, for each task (in STT)
• Number of each type of EXEC request

2.6 EXECUTIVE REQUEST PROCESSOR

The Executive Request Processor is initiated by the Normal Exit (NE) channel processor when a normal exchange from a task implies the presence of a request for the Executive. The request is passed to EXEC in registers S6 and S7 of the task's exchange package. The low-order bits of word 1 of the STT header point to the parameter word for the interrupted task.
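EXEC receives a request in S6 and S7, and the request formats in section 2.6.1 place the function code in bits 56-63 of S7. Cray documentation numbers bits from 0 at the most significant end, so bits 56-63 are the low-order byte of the word. The Python sketch below decodes such a word and looks the code up in a small stand-in for the Task Call Table. It is hypothetical: the table holds only a subset of the mnemonics from section 2.6.1, and mapping an unknown code to error 6 (Illegal task call, section 2.6.2) is an assumption about how EXEC reacts.

```python
# Hypothetical decoding of an executive request word (illustrative, not COS code).
from typing import Optional, Tuple

# A subset of the request mnemonics from section 2.6.1; keys are octal codes.
TCT = {
    0o01: "CTSK", 0o02: "RTSK", 0o03: "SUSP", 0o04: "ARES",
    0o05: "FET", 0o06: "TDLY", 0o11: "IO", 0o15: "RQST",
}
ILLEGAL_TASK_CALL = 0o6  # EXEC error code 6, "Illegal task call" (section 2.6.2)


def cray_field(word: int, first_bit: int, last_bit: int) -> int:
    """Extract bits first_bit..last_bit, where bit 0 is the most significant bit."""
    width = last_bit - first_bit + 1
    return (word >> (63 - last_bit)) & ((1 << width) - 1)


def decode_request(s7: int) -> Tuple[Optional[str], int]:
    """Return (mnemonic, 0) for a known code, or (None, error code) otherwise."""
    code = cray_field(s7, 56, 63)
    if code in TCT:
        return TCT[code], 0
    return None, ILLEGAL_TASK_CALL


print(decode_request((0o1234 << 8) | 0o11))  # ('IO', 0)
print(decode_request(0o77))                  # (None, 6)
```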
The Executive Request Processor handles the requests defined by the Task Call Table (TCT). When EXEC returns to a task following processing of an executive request, control returns at (P)+3 for a normal return and at (P)+1 if an error occurred. (P) is the address of the exit to EXEC.

2.6.1 EXECUTIVE REQUESTS

This section provides the request format and functional flow of executive requests issued by tasks.

[Figure 2-4. Task Scheduler table linkage: STT header, STT task parameter blocks, and STT task exchange packages]

Create a task request (CTSK=01)

Request format: S7 holds the task ID and function code 01 (bits 56-63).

This function adds a task to the EXEC STT. The flow is:

1. If the table is full, report a table full error to STP.
2. Retrieve the task's XP information from the caller.
3. Set task status to requested status.
4. Set task ID to requested ID.
5. Set task priority to requested priority.
6. Construct task XP in STT XP area.
7. Set task defined bit in header of STT.
8. Build disk I/O reply queue control word.
9. Force task into execution.

Ready a task request (RTSK=02)

Request format: S7 holds the task ID and function code 02 (bits 56-63).

This function readies a task. Its flow is:

1. Find task with desired ID; if none, report error to STP.
2. Clear suspend flags; set reready status if task is currently ready.
3. If task is suspended, request task scheduler.

Self-suspend task request (SUSP=03)

Request format: function code 03 in S7 (bits 56-63).

This request suspends a task. Its flow is:

1. If reready bit is clear, set suspend bit; otherwise, clear reready bit.
2. Request task scheduler to run.

Assign channel request (ARES=04)

Request format: S7 holds the task ID and function code 04 (bits 56-63).

This request assigns a task to a channel pair. The flow is:

1. Find task with desired ID.
2. Check whether a task is already assigned to the channel; if one is, report status to STP.
3. Link task to channel by entering the Task Parameter Block (TPB) address of the task in the CHT entry for the channel pair.
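The suspend and reready bits handled by the RTSK and SUSP requests, together with the Task Scheduler selection rule in section 2.4, can be modeled as follows. This is a hypothetical sketch: `Task`, `ready_task`, `suspend_self`, and `schedule` are illustrative names, a larger priority number is assumed to mean a higher priority, `scp_id` stands in for the Station Call Processor's task ID, and the STP interlock flag is not modeled. A ready that arrives while the task is already ready is remembered in the reready bit, so the next self-suspend consumes it instead of blocking the task.

```python
# Hypothetical model of task readiness and Task Scheduler selection (not COS code).
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: int
    priority: int             # assumption: a larger number is a higher priority
    suspended: bool = False   # the only status bit modeled here
    reready: bool = False     # a ready received while the task was already ready


def ready_task(task: Task) -> bool:
    """RTSK: clear suspend, or note a reready. True means request the scheduler."""
    if task.suspended:
        task.suspended = False
        return True
    task.reready = True
    return False


def suspend_self(task: Task) -> None:
    """SUSP: a pending reready cancels this suspend instead of blocking the task."""
    if task.reready:
        task.reready = False
    else:
        task.suspended = True


def schedule(stt: List[Task], active_job: Optional[str],
             debug: bool = False, scp_id: int = 0) -> str:
    """Pick the highest priority runnable task; STT order breaks ties."""
    best: Optional[Task] = None
    for task in stt:
        if task.suspended or (debug and task.task_id != scp_id):
            continue
        if best is None or task.priority > best.priority:
            best = task
    if best is not None:
        return f"task {best.task_id}"
    return active_job if active_job is not None else "idle"


stt = [Task(1, 5), Task(2, 7, suspended=True), Task(3, 7)]
print(schedule(stt, "job A"))  # task 3
ready_task(stt[1])
print(schedule(stt, "job A"))  # task 2 (ties with task 3 but comes first)
```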
Station I/O request (FET=05)

Request format: parameters in S6 and S7; function code 05 in S7 (bits 56-63).

This request activates the input and/or output sides of a channel pair. The processing flow is as follows:

1. If channel ordinal is 0:
   a. Assign task to channel.
   b. Set input and/or output active flags.
   c. Set CA and CL for input and/or output (privileged to monitor mode).
   d. Start processing by station channel driver.
   e. Release task from channel.
2. Otherwise:
   a. Build MIOP station request in CXT.
   b. When MIOP requests addresses, put message on send queue to MIOP. (The CXT contains a flag indicating that an address request has arrived and addresses should be queued immediately.)
   c. Return to requesting task.

Timed ready request (TDLY=06)

Request format: RTC value at which the task is to resume; function code 06 in S7 (bits 56-63).

This request suspends a task until the specified time occurs. The processing flow is:

1. Clear current task time delay, if set.
2. Set resume time in time event stack.
3. After time delay, ready task and request scheduling.

Dequeue SDT entry request (DQSD=10)

Request format: DQ field (0 for FIFO dequeuing, 1 for entry dequeuing); function code 10 in S7 (bits 56-63).

The processing flow is:

1. Type of dequeue is determined.
   - If FIFO dequeuing, the first entry is used. Its address is placed in S6.
   - If entry dequeuing, the entry specified in S6 is used.
2. Entry is dequeued.
3. Count in queue head is decremented by 1.
4. Entry is unlocked.

Disk block I/O request (IO=11)

Request format: DNT address, DCT address, and EQT address; function code 11 in S7 (bits 56-63).

The processing flow is as follows:

1. If channel number is greater than 12:
   a. Build a MIOP disk request.
   b. Set a request timeout.
   c. Return to the calling task.
2. Otherwise, the disk block I/O request results in execution of the disk driver. See section 2.8.

Enqueue SDT entry request (EQSD=12)

Request format: parameters in S6 and S7, including the EQ field (0 for FIFO enqueuing, 1 for class-priority enqueuing); function code 12 in S7 (bits 56-63).

The processing flow is:

1. Type of enqueue is determined.
   - If FIFO enqueuing, the last entry position is used.
   - If class-priority enqueuing, the queue is searched by class rank, priority, and time submitted to determine position.
2. Entry is enqueued.
3. Count in queue head is incremented by 1.
4. Entry is locked.
5. Set up SDT with control information.
6. Set return address.

The routine that processes this request is actually the disk driver (section 2.8).

Ready task and suspend self request (RTSS=14)

Request format: S7 holds the task ID and function code 14 (bits 56-63).

This request permits one task to ready another and then have itself suspended. The processing flow is:

1. Find task ID.
2. Clear suspend bit of called task.
3. Set suspend bit of calling task.
4. Set EXEC scheduler request bit.

Get time and date request (RQST=15)

Request format: function code 15 in S7 (bits 56-63).

This request returns the time and date to the requesting task. The flow is:

1. Get time and date from real-time pseudo channel.
2. Set time and date into S6 and S7 of requesting task's XP area.

Time format: ASCII hh in bits 0-15, separator in bits 16-23, mm in bits 24-39, separator in bits 40-47, ss in bits 48-63.

Date format: ASCII mm/dd/yy, with mm in bits 0-15, "/" in bits 16-23, dd in bits 24-39, "/" in bits 40-47, and yy in bits 48-63.

Connect user job to CPU request (RCP=16)

Request format: relative address of the job's JTA and time slice (in msec); function code 16 in S7 (bits 56-63).

The job scheduler issues this request when the CPU is to be switched from the currently executing job to a newly selected job. The executive request flow is:

1. Verify CPU not in use by another job.
2. Get absolute address of job.
3. Set SAXP to user P register and XP pointer.
4. Copy XA from user XP area.
5. Set time slice value into real-time pseudo channel.
6. Set start time into real-time pseudo channel.
7. Load B, T, and V registers.

Disconnect user job from CPU request (DCP=17)

Request format: function code 17 in S7 (bits 56-63).

This request must precede an RCP request issued by the job scheduler. The processing flow for this request is:

1. Clear SAXP.
2. Save B, T, and V registers.
3. Relocate SA and LA.
4. Copy XP to Job Table Area for the job being disconnected.
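The RQST request returns the time and the date as two 64-bit words of ASCII characters: mm/dd/yy for the date, and hh, mm, and ss in bits 0-15, 24-39, and 48-63 for the time. The Python sketch below builds such words. The helper names are hypothetical, and the colon used as the time separator is an assumption.

```python
# Hypothetical helpers for the RQST time and date words (illustrative, not COS code).


def pack_ascii_word(text: str) -> int:
    """Pack 8 ASCII characters into a 64-bit word, first character in bits 0-7
    (Cray numbering: bit 0 is the most significant bit)."""
    data = text.encode("ascii")
    if len(data) != 8:
        raise ValueError("a word holds exactly 8 ASCII characters")
    return int.from_bytes(data, "big")


def unpack_ascii_word(word: int) -> str:
    """Recover the 8 characters stored in a 64-bit word."""
    return word.to_bytes(8, "big").decode("ascii")


def time_word(hours: int, minutes: int, seconds: int) -> int:
    # hh in bits 0-15, mm in bits 24-39, ss in bits 48-63; ':' is an assumed separator
    return pack_ascii_word(f"{hours:02d}:{minutes:02d}:{seconds:02d}")


def date_word(month: int, day: int, year: int) -> int:
    # mm in bits 0-15, '/' in bits 16-23, dd in bits 24-39, '/' in 40-47, yy in 48-63
    return pack_ascii_word(f"{month:02d}/{day:02d}/{year % 100:02d}")


print(unpack_ascii_word(time_word(9, 5, 30)))     # 09:05:30
print(unpack_ascii_word(date_word(10, 1, 1980)))  # 10/01/80
```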
Post message in history buffer request (POST=20)

Request format: debug function code; function code 20 in S7 (bits 56-63).

This request permits any STP task to enter two S registers of information into the history buffer, when that debug function is selected. The processing flow is:

1. Set up call to EXEC subroutine DEBUG. In other words, move debug function code to A5, first S register to S6, and second S register to S7.
2. Call subroutine DEBUG to enter message in trace with time and issuing location stamp.

Set memory size request (SMSZ=21)

Request format: new limit address; function code 21 in S7 (bits 56-63).

This request is used during system initialization when the size of memory is changed through a Startup *SIZ parameter. The processing flow is:

1. Set new system limit address in all system exchange packages.

Packet I/O request (PIO=22)

Request format:

Field   Word   Bits    Description
SCT     S6     40-63   Subsystem Control Table address
FC      S7     52-54   Function code: 0 = Clear, 1 = Send packet, 2 = Receive packet
        S7     56-63   Request code 022 (octal)

This request invokes the I/O Subsystem driver called the IOP driver. Before a task uses this request to perform I/O, the IOP driver must be linked to the STP resident table called the Subsystem Control Table (SCT). This linking is accomplished when the task issues the first clear PIO request. The SCT address must never change thereafter.

A task monitors the status of the subsystem by inspecting the status field (SCSTAT) of the SCT table. Three flags are maintained by the IOP driver in the SCSTAT field. They are as follows:

• SCDOWN=1   I/O Subsystem Down flag
• SCRST=1    I/O Subsystem Reset flag
• SCIR=1     Input Ready flag

Flag SCDOWN is cleared and flag SCRST is set by the IOP driver when the I/O Subsystem has been restarted or initialized. A task can then acknowledge reset by issuing a PIO clear request, which clears the SCRST flag. The driver cannot accept a clear until all input has been processed, which means the flag SCIR must be clear.
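The SCSTAT rules for the PIO request reduce to a few bit tests: after a restart the IOP driver clears SCDOWN and sets SCRST, a clear acknowledges the reset but is accepted only while SCIR is clear, a packet can be received when SCIR is set, and a packet can be sent only when all three flags are clear. The Python sketch below encodes those rules. The bit values given to the flags are hypothetical, since the actual SCSTAT layout is documented in SM-0045.

```python
# Hypothetical model of the SCSTAT flag rules for PIO requests (not COS code).

# Illustrative bit values; the real SCSTAT layout is documented in SM-0045.
SCDOWN = 0o1  # I/O Subsystem down
SCRST = 0o2   # I/O Subsystem reset
SCIR = 0o4    # input ready


def subsystem_restarted(status: int) -> int:
    """IOP driver, after a restart or initialization: clear SCDOWN, set SCRST."""
    return (status & ~SCDOWN) | SCRST


def can_clear(status: int) -> bool:
    """A PIO clear is accepted only after all input has been processed."""
    return (status & SCIR) == 0


def clear(status: int) -> int:
    """A PIO clear acknowledges the reset by clearing SCRST."""
    if not can_clear(status):
        raise RuntimeError("clear refused: SCIR is set (input pending)")
    return status & ~SCRST


def can_receive(status: int) -> bool:
    """A packet can be received when SCIR is set."""
    return (status & SCIR) != 0


def can_send(status: int) -> bool:
    """A packet can be sent when all status flags are clear."""
    return (status & (SCDOWN | SCRST | SCIR)) == 0


status = subsystem_restarted(SCDOWN)
print(status == SCRST, can_send(status))  # True False
status = clear(status)
print(can_send(status))                   # True
```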
Sending or receiving a packet requires that a packet address (SCCIP) and packet size in words (SCPSZ) be passed in the SCT table. In general, a packet can be received when the SCIR flag is set, and a packet can be sent when all status flags are clear.

Move system down to execution area request (MVEDWN=23)

Request format: parameters in S6 and S7; function code 23 in S7 (bits 56-63).

This request moves an image of an operating system down to the executable area. The processing flow is:

1. Build an exchange package at location 0, with monitor mode set and with P set to the first instruction in the move-down loop.
2. Consider that exchange package to be the current one and activate it.

The move-down flow is:

1. Starting with the high address of the system, perform full-length vector transfers.
2. Assign the exchange package at location 0 and exchange to it.

Start system request (START=24)

Request format: function code 24 in S7 (bits 56-63).

This request starts the system after a system breakpoint is encountered or after a stop function has been issued. The flow is:

1. Clear the alternate task scheduling flag that forced the system to idle except for external requests to the station (SCP).
2. Request execution of the task scheduler.

Stop system request (STOP=25)

Request format: function code 25 in S7 (bits 56-63).

This request stops the system except for entry of interactive debugging commands. The processing flow is:

1. Set alternate task scheduling flag. The alternate scheduling allows only SCP to execute so interactive debugging commands can be entered.

Display memory request (DMEM=26)

Request format: display area FWA, buffer area FWA, and length; function code 26 in S7 (bits 56-63).

This request copies memory to a specified area. It is used to display memory during interactive debugging. The processing flow is:

1. Move the memory block from the requested area to the display buffer.
Enter memory request (EMEM=27)

Request format: S6 holds the value to be entered; S7 holds the memory word address, the bit offset and bit length of the field, and function code 27 (bits 56-63).

This request enters the bit string into memory at the specified bit position. The flow is:

1. Shift value right (64 - bit offset - bit length).
2. Merge into the memory address.

Display exchange package request (DXPR=30)

Request format: function code 30 in S7 (bits 56-63).

This request moves the contents of the exchange package and B0 to a buffer. The processing flow is:

1. Copy exchange package to words 0 through 15 of buffer.
2. Copy (B0) to word 16 of buffer from the task B0 Save Table.

Enter exchange package register request (EXPR=31)

Request format: value to be entered and register designator; function code 31 in S7 (bits 56-63).

This request inserts the bit string into the specified exchange package register. The flow is:

1. Determine word length and position of specified register in memory.
2. Shift value right to desired position.
3. Merge into memory address.

Register designators can be any of those noted in the COS Front-end Protocol Internal Reference Manual, publication SM-0042, Debug Function Request (027, octal).

Set system breakpoint request (SBKPT=32)

Request format: function code 32 in S7 (bits 56-63).

This request sets a single or double breakpoint in the system by changing an instruction parcel to an illegal value. If a breakpoint exists at the address, an error is reported. The double breakpoint allows for automatic resetting of the initial breakpoint when the second breakpoint is encountered. Up to eight system task breakpoints are allowed. The processing flow is:

1. Verify breakpoint number.
2. Verify breakpoint number not in use.
3. Verify memory address not already in Breakpoint Table.
4. Store information in task breakpoint table.
5. Save breakpoint instruction parcel.
6. Set breakpoint.

Clear system breakpoint request (CBKPT=33)

Request format: function code 33 in S7 (bits 56-63).

This request clears a system task breakpoint entry. The processing flow is:
1. Verify breakpoint number.
2. Verify breakpoint number in use.
3. Determine which of two possible breakpoint addresses is active.
4. Restore instruction parcel at the active address.
5. Clear breakpoint table entry.

Report CPU usage request (CPUTIL=34)

Request format: assigned buffer address and size; function code 34 in S7 (bits 56-63).

This request puts CPU usage data into the assigned buffer. The processing flow is:

1. Validate buffer size.
2. Fill the buffer with CPU usage data, zeroing the fields in EXEC that collect such data.

Report task usage request (TASKUTIL=35)

Request format: assigned buffer address and size; function code 35 in S7 (bits 56-63).

This request puts task usage data into the assigned buffer. The processing flow is:

1. Validate buffer size.
2. Put number of tasks into buffer.
3. Put number of readies of each task into buffer, zeroing the fields in the STT that collect such data.

Report EXEC request (EREQNT=36)

Request format: assigned buffer address and size; function code 36 in S7 (bits 56-63).

This request puts the EXEC request count of each task into the assigned buffer. The processing flow is:

1. Validate buffer size.
2. Put number of tasks into buffer.
3. Put number of requests made by each task into buffer, zeroing the fields in the STT that collect such data.

Report EXEC call counts request (ECALLCNT=37)

Request format: assigned buffer address and size; function code 37 in S7 (bits 56-63).

This request puts the number of EXEC requests of each type into the assigned buffer. The processing flow is:

1. Validate buffer size.
2. Put number of task EXEC request types into buffer.
3. Put number of requests of each type into buffer, zeroing the fields in the STT that collect such data.

Report interrupt counts request (CINTCNT=40)

Request format: assigned buffer address and size; function code 40 in S7 (bits 56-63).

This request puts interrupt counts of each channel and pseudo channel into the assigned buffer. The processing flow is:

1. Validate buffer size.
2. Put number of interrupt channels into buffer.
3. Put interrupt count of each channel into buffer, zeroing the table entries that collect such data.
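The EMEM and EXPR requests both merge a bit string into a word. A hypothetical Python sketch of that merge follows. It assumes the value arrives right-justified and that bits are numbered Cray-style, from 0 at the most significant end. Under that numbering, a field at bit offset o with length l ends 64 - o - l bits from the right end of the word, which is the shift count named in the EMEM flow; because the value in this sketch is right-justified, it is shifted left by that count before being masked into place.

```python
# Hypothetical sketch of an EMEM/EXPR-style bit-string merge (not COS code).
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def merge_field(word: int, value: int, offset: int, length: int) -> int:
    """Store the low `length` bits of `value` into `word` starting at bit `offset`.

    Bits are numbered Cray-style: bit 0 is the most significant bit of the word.
    """
    if not (1 <= length <= WORD_BITS and 0 <= offset <= WORD_BITS - length):
        raise ValueError("field does not fit in a 64-bit word")
    shift = WORD_BITS - offset - length   # field's distance from the right end
    mask = ((1 << length) - 1) << shift   # ones over the target field
    return (word & ~mask & WORD_MASK) | ((value << shift) & mask)


print(oct(merge_field(0, 0o27, 56, 8)))  # 0o27
print(hex(merge_field(0, 0b101, 0, 3)))  # 0xa000000000000000
```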
2.6.2 EXEC ERROR CODES

EXEC returns one of the following error codes in register S6 if a request cannot be processed. The P register is not incremented in this case.

Code (octal)   Significance
1              No task space left
2              No task assigned
3              Task does not exist
4              Resource already assigned to a task
5              Channel already active
6              Illegal task call
7              Input side of channel active
10             Output side of channel active
11             Illegal breakpoint number
12             Address already has a breakpoint
13             Bad field definition
14             Job already connected
15             Disk malfunction
16             Breakpoint invalid
17             Device does not exist
20             Illegal register name
21             Equipment not in system
24             Time queue is full
25             Insufficient buffer length

2.7 FRONT-END DRIVER

The front-end driver (FED) physically controls I/O between front-end computer systems and STP. It performs all hardware error recovery, while STP provides all logical error recovery. FED supports configurations allowing up to four front-end channels with multiple front-end computer systems attached to each channel. These front-end systems may be attached either directly to the channel or as remote batch entry stations attached to a front-end system.

The front-end driver also intercepts and processes maintenance control unit (MCU) request messages. The format and function of these messages are described in the CRAY-OS Message Manual, publication SR-0039.

2.7.1 THEORY OF OPERATION

The front-end driver is a slave to the Station Call Processor (SCP) in that all external interrupts are rejected on front-end channels until SCP requests an output/input (O/I) pair. The driver then becomes the passive partner of the front-end computer system.

Error recovery is initiated by the front-end computer system. FED only reports input hardware errors to the front-end system and retransmits output messages on request from the front-end system.
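As section 2.7.1 describes, FED rejects front-end interrupts until SCP requests an O/I pair, and section 2.7.3 shows each output processor naming its successor for the next interrupt. The hypothetical Python sketch below models only the error-free output side (WLCP, WSSEG, WLTP, and the reject processor RJ). The class and attribute names are illustrative, and returning to RJ when neither subsegments nor an LTP follow the LCP is an assumption.

```python
# Hypothetical model of front-end driver output-side chaining (not COS code).
from typing import List


class OutputSide:
    """Tracks which interrupt processor will handle the next output interrupt."""

    def __init__(self) -> None:
        self.next_processor = "RJ"  # reject until SCP requests an O/I pair
        self.subsegments_left = 0
        self.ltp_follows = False
        self.trace: List[str] = []

    def start(self, subsegments: int, ltp_follows: bool) -> None:
        """SCP requests output: the LCP goes first, so WLCP takes its interrupt."""
        self.subsegments_left = subsegments
        self.ltp_follows = ltp_follows
        self.next_processor = "WLCP"

    def interrupt(self) -> None:
        """Run the processor currently assigned to the output side."""
        current = self.next_processor
        self.trace.append(current)
        if current == "WSSEG":
            self.subsegments_left -= 1  # one subsegment has just been sent
        if current in ("WLCP", "WSSEG"):
            if self.subsegments_left > 0:
                self.next_processor = "WSSEG"
            elif self.ltp_follows:
                self.next_processor = "WLTP"
            else:
                self.next_processor = "RJ"  # output done: clear active, reject
        elif current == "WLTP":
            self.next_processor = "RJ"  # LTP sent: clear active, reject
        # RJ: an unsolicited interrupt is simply rejected; nothing changes


out = OutputSide()
out.interrupt()  # arrives before SCP asks for output, so RJ handles it
out.start(subsegments=2, ltp_follows=True)
for _ in range(4):
    out.interrupt()
print(out.trace)  # ['RJ', 'WLCP', 'WSSEG', 'WSSEG', 'WLTP']
```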
2.7.2 SYSTEM TABLES USED BY FED

FED uses the following system tables:

CHT   Channel Table
LIT   Link Interface Table
LXT   Link Extension Table
CXT   Channel Extension Table

Detailed information on these tables is available in the COS Table Descriptions Internal Reference Manual, publication SM-0045.

Channel Table (CHT)

The front-end driver sets the link interface table address and requesting task parameter block address into the channel table (CHT) when the O/I is initiated, and clears them when the O/I completes. FED sets the interrupt processor address for the next interrupt in the processing sequence.

Link Interface Table (LIT)

FED uses the Link Interface Table to determine the type of O/I request, to communicate O/I completions to SCP, to maintain usage statistics, and to maintain auxiliary control information.

Link Extension Table (LXT)

FED uses the LXT to validate the source ID and obtain parameters for control of checksumming.

Channel Extension Table (CXT)

FED uses this table to build output messages for the I/O Subsystem.

2.7.3 PROCESSORS

The front-end driver is composed of a request processor and the following set of re-entrant interrupt processors:

RLCP    Processes interrupt from input of Link Control Package (LCP). If the channel error flag is set, RLCP sets up WXLCP to process the output interrupt from the error LCP. If the LCP is a hardware error LCP, RLCP restarts the driver; otherwise, it checks the input LCP for following subsegments. If no subsegments exist, RLCP terminates the O/I operation or sets up for receipt of an LTP. When there are subsegments, RLCP sets the channel to receive the first subsegment and requests RSSEG to process the next interrupt on this input channel.

RSSEG   Processes interrupt from input of subsegment. If a channel error occurs, RSSEG requests WXLCP to process the next interrupt on the output side and resets RLCP to process the next interrupt on the input side.
        When there are no errors and no LTP is expected, RSSEG readies the requesting task if there are no more subsegments; otherwise, it sets up to receive the next subsegment or LTP.

RLTP    Processes the interrupt from input of a Link Trailer Package (LTP).

WLCP    Processes the interrupt from output of an LCP. If a subsegment is to follow, WLCP sets the channel to send the first subsegment and requests WSSEG to process the next interrupt on this output channel. If an LTP is to follow, WLCP sets the channel for a data transfer and requests WLTP to process the interrupt.

WSSEG   Processes the interrupt from output of a subsegment. If another subsegment remains to be sent, WSSEG sets up to send the next subsegment and requests itself to process the interrupt. Otherwise, WSSEG sets up for an LTP and requests WLTP to process the interrupt. If there is to be no LTP, it clears output active in the LIT and sets reject on the output channel.

WLTP    Processes the interrupt from output of an LTP, clears output active, and sets reject on the output channel.

WXLCP   Processes the interrupt from output of a hardware error LCP. WXLCP clears output active and sets reject on the output channel. If an LTP is to follow, WXLCP sets the channel for a data transfer and requests WXLTP to process the interrupt.

WXLTP   Processes the interrupt from output of an error LTP.

APIIP   Processes the MIOP status input interrupt
APOIP   Processes the MIOP command output interrupt
APPACN  Null packet processor
APPACX  STP task packet processor
APPACE  Echo packet processor
APPACI  I-packet processor
APPACJ  J-packet processor
APPACA  Disk packet processor
APPACB  Station packet processor
SHSRQT  BIOP simulated memory channel request processor
SHSOT   BIOP simulated memory channel output processor
SHSIT   BIOP simulated memory channel input processor

2.8 DD-19/29 DISK DRIVER

The disk driver drives either a DD-19 or a DD-29 disk storage unit connected to a CRAY-1 I/O channel.
The disk driver executes in monitor mode as the part of the Executive Request Processor devoted to processing the I/O executive request. This request is described in section 2.6.1.

2.8.1 R011

The disk driver is labeled R011 in EXEC. R011 receives control when an STP task executes an EX instruction with an I/O function code in S7. The contents of S6 and the remainder of S7 specify parameter addresses relative to STP. The parameters at these addresses completely specify the current request to R011.

R011 does no I/O until it has checked the legality of the request parameters. If the request parameters are legal, R011 activates the CRAY-1 channels and disk hardware specified by the request. Channel activation consists of two parts: sending data and disk hardware functions, and receiving data and disk hardware status back.

An illegal value causes R011 to schedule its caller to resume execution at the parcel immediately following the EX instruction. A legal request causes R011 to schedule its caller at the third parcel following the EX instruction.

R011 is interrupt driven and executes a request in short bursts. After activating the I/O hardware, R011 gives control to Interchange. It regains control when the I/O hardware generates an interrupt on the CRAY-1 channel involved. R011 informs its calling STP task of the progress of the request after each sector is transferred and when the request completes.

Lost interrupts
A hardware failure could prevent an interrupt from occurring. R011 therefore protects itself by scheduling a timeout interrupt for each request. Each execution of Interchange compares the current contents of the RTC (Real-Time Clock) with the timeout value, and Interchange gives control to R011 if the timeout has occurred. Exchanges or other interrupts might not occur for an extended period, so the MCU real-time interrupts to the CRAY-1 should be enabled to ensure frequent execution of Interchange.
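The lost-interrupt protection described above works as follows. The disk driver schedules one timeout per request, each pass through Interchange compares the RTC with the pending deadlines, and a completed request releases its timeout. The following Python model is a minimal sketch of that bookkeeping only. The class and method names are hypothetical, and request IDs are assumed to be unique per request.

```python
import heapq


class TimeoutQueue:
    """Hypothetical model of per-request timeouts guarding against lost interrupts."""

    def __init__(self):
        self._heap = []     # (deadline, request_id) pairs, earliest deadline first
        self._live = set()  # requests whose timeout has not been released

    def schedule(self, request_id, now, delay):
        # One blanket timeout per request, however many interrupts it involves.
        heapq.heappush(self._heap, (now + delay, request_id))
        self._live.add(request_id)

    def release(self, request_id):
        # Called when the request completes normally.
        self._live.discard(request_id)

    def check(self, rtc):
        """Run on each pass through Interchange; return requests that timed out."""
        expired = []
        while self._heap and self._heap[0][0] <= rtc:
            _, rid = heapq.heappop(self._heap)
            if rid in self._live:   # released timeouts are silently dropped
                self._live.discard(rid)
                expired.append(rid)
        return expired
```

A request whose timeout is released before its deadline never appears in `check`. This matches the manual's point that lost interrupts are rare and the timeout is normally just discarded.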
The timeout delay reflects the size of the request to R011 but is generous enough to avoid needless timeouts. A single R011 request may involve many interrupts, so the single timeout scheduled per request acts as blanket protection. Lost interrupts are rare; generally, only expected interrupts occur. When the request completes, the timeout is released.

Multiprogramming
Between the short bursts of R011 activity, Interchange finds other work for the CPU. User programs, STP, and even other parts of EXEC are therefore multiprogrammed with R011. A trace of COS activity generally shows considerable non-R011 activity between disk interrupts.

R011 also supports CPU-I/O overlap for user programs. A user program can use each sector of a transfer as soon as that sector completes, rather than waiting for the entire request to complete. The user program is taken out of recall as soon as possible so that it can immediately begin to process the next block of data.

I/O requests by user programs reach R011 through the Disk Queue Manager task (DQM) of STP. DQM queues these user calls and issues the corresponding requests to R011 in a sequence that optimizes system throughput; that is, seeks are optimized. DQM passes R011 the address of the DNT for the dataset involved.

Status checking and error recovery
R011 checks hardware status at the start of each request, during the request, and at the completion of the request. R011 notifies the calling STP task when a request completes, whether successfully or in error. To carry out error recovery, the calling task must make the appropriate calls to R011. R011 contains features explicitly for error recovery: it performs the servo offset and data strobe functions when the calling task requests them.

Callers
The Disk Queue Manager is the only STP task that calls R011.
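Section 2.8.1 says only that DQM orders its requests to the disk driver so that "seeks are optimized". The manual does not give DQM's algorithm. The following Python sketch is an assumed illustration, not DQM's documented method. It orders pending cylinder numbers with the classic elevator (SCAN) policy and compares the total head travel with first-come order. The function names are hypothetical.

```python
def elevator_order(current_cyl, pending, moving_up=True):
    """Order pending cylinders SCAN-style: sweep one direction, then reverse.

    An assumed illustration of seek optimization, not DQM's documented algorithm.
    """
    if moving_up:
        ahead = sorted(c for c in pending if c >= current_cyl)
        behind = sorted((c for c in pending if c < current_cyl), reverse=True)
        return ahead + behind
    # Sweeping downward: nearest cylinders below first, then those above.
    below = sorted((c for c in pending if c <= current_cyl), reverse=True)
    above = sorted(c for c in pending if c > current_cyl)
    return below + above


def seek_distance(start, order):
    """Total number of cylinders crossed when serving requests in the given order."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total
```

With the heads at cylinder 50 and requests for cylinders 10, 60, 55, 90, and 20, the elevator order [55, 60, 90, 20, 10] crosses 120 cylinders. Serving the same requests in arrival order crosses 200.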
R011 rejects any call that would interfere with an ongoing request. The one exception is the call to master clear the disk controller and the CRAY-1 input and output channels to which it is connected.

2.8.2 HARDWARE SEQUENCES FOR SAMPLE REQUESTS

This subsection assumes that the reader is familiar with the CRAY-1 DCU-2, DCU-3 Reference Manual, publication 2240630. The processing sequences for several requests are presented here.

Multiple sector read

If another DSU is selected, release the connected DSU.
If the DSU is not selected, then:
  - Set the unit selected flag,
  - Wait 5 usec after the last head select,
  - Activate the output channel to send the unit reserve function to the disk control unit, and
  - Await the interrupt caused by output channel completion.

1. • If in error recovery mode, then:
     - Activate output channel to send status readout function (requesting subsystem status) to disk control unit,
     - Await interrupt caused by output channel completion,
     - Activate input channel to receive subsystem status from disk control unit,
     - Await interrupt caused by input channel completion,
     - If subsystem status is bad, then go to error recovery,
     - If not at the desired cylinder, then:
       (1) Update current cylinder to desired cylinder,
       (2) Activate output channel to send cylinder select function to disk control unit, and
       (3) Await interrupt caused by output channel completion.
   • If sectors to transfer equal 0, then go to 2.
   • If retry is enabled, then:
     - Read continuity is broken,
     - Activate output channel to send margin select function to disk control unit, and
     - Await interrupt caused by output channel completion.
   • If read continuity is broken, then:
     - Wait 5 usec after last head select,
     - Activate output channel to send read function to DCU,
     - Await interrupt caused by output channel completion,
     - Activate input channel to receive read-write response from DCU,
     - Await interrupt caused by input channel completion, and
     - If read-write response is bad, then go to error recovery.
   • Activate input channel to receive data block from DCU-2.
   • Await interrupt caused by input channel completion.
   • If channel error, get subsystem status, then go to error recovery.
   • Update tables.
   • If sectors to transfer equal 0, then go to 2.
   • If recall flag in DNT is set, then ready caller task.
   • Go to 1.

2. • Activate output channel to send status readout function (requesting subsystem status) to DCU.
   • Await interrupt caused by output channel completion.
   • Activate input channel to receive subsystem status from DCU.
   • Await interrupt caused by input channel completion.
   • If subsystem status is bad, then go to error recovery.
   • Update tables.
   • Ready caller task.

Multiple sector write
A multiple sector write resembles the multiple sector read, with two differences: retry is implicitly disabled and write continuity is checked. Either a margin select function or a read function destroys write continuity. A write function destroys read continuity.

Cylinder select
A cylinder select resembles a multiple sector read or write, except that sectors to transfer equal 0 on entry to the driver.

Controller master clear
1. Clear reserved bits in all PUTs for the channel.
2. Clear PUTs to force unit select and seek.
3. Master clear channel with recommended I/O master clear sequence.
4. Reserve unit.
5. Send subsystem status.
6. Open input channel for one parcel immediately.
7. If the input interrupt precedes the output interrupt for subsystem status, reject further input interrupts until the output interrupt.
8. If subsystem status is not ready:
   - Send fault status function,
   - Release unit with fault clear bit set,
   - If fault status shows a seek error, perform clear fault and return to zero seek,
   - Reserve unit.
9. If subsystem status is ready, return to caller; otherwise, retry the clearing procedure until it is successful.

Unit release
1. Activate output channel to send unit release function to DCU-2.
2. Await interrupt caused by output channel completion.
3. Update tables.
4. Ready caller task.

Margin select
The margin select algorithm selects margins in the following sequence:
1.     Late strobe, maximum offset toward center (position 37 octal)
2.     Late strobe, maximum offset toward edge
3.     Early strobe, maximum offset toward center
4.     Early strobe, maximum offset toward edge
5.     Maximum offset toward center
6.     Maximum offset toward edge
7-12.  Same strobe and direction but with offset position 24 octal
13-18. Same strobe and direction but with offset position 11 octal

2.9 I/O SUBSYSTEM DRIVER

The I/O Subsystem driver, called the IOP driver, is responsible for controlling all normal I/O channels between the CRAY-1 CPU and its I/O Subsystem.

2.9.1 FUNCTIONAL DESCRIPTION

All information passed through the driver is in the form of a packet of six 64-bit words. The general packet format and the specific formats can be found in the COS Table Descriptions Internal Reference Manual, publication SM-0045. The relevant tables are:

ADC   I/O Processor Disk Command
ASC   IOP Station Command
APT   Any Packet Table

The packet's source ID (SID) determines how the driver processes it. The possible SIDs are listed below.

SID   Definition and processing
CI    Any packet to be queued for output to the I/O Subsystem
A     Disk request reply packet; passed to the disk driver for processing.
B     Station request packet; passed to the station driver for processing.
C     IOP message packet (not implemented)
D     Reserved for expansion
E     Echo packet; the source and destination IDs are swapped and the packet is queued for output to the I/O Subsystem.
I     Initialize channel; the subsystem is downed, the output queue is cleared, and the packet is echoed to the sender.
J     Acknowledgement of channel initialization; the subsystem is brought up and the packet is echoed.
N     Null packet; the packet is discarded with no further processing.

STP tasks interface to the IOP driver through the PIO monitor request (PIO=22), as described in section 2.6.1 of this manual. The tables used internally by the IOP driver are described as follows.

Channel Extension Table (CXT)
The station driver interfaces to the IOP driver through the CXT. This table holds some control information for the station driver as well as the station command packet (ASC).

Queue Control Table (QCT)
The IOP driver uses this table internally to maintain an input queue to each STP task and an output queue to each I/O Subsystem.

Subsystem Control Table (SCT)
This table resides in EXEC, and the driver uses it to manage I/O with the I/O Subsystem. STP requires a duplicate of this table for each task that uses the IOP driver. The tables are linked when the task makes its first IOP driver request. The task can then monitor the subsystem status in its SCT.

2.9.2 RECOVERY

The IOP driver does not handle the recovery of any requests. It only notifies each requester when a subsystem has gone down and when it comes up again.

2.9.3 MIOP COMMAND AND STATUS PACKET FORMATS

Field   Word  Bits    Description
                      Destination ID
                      Source ID
                      Channel ordinal
                      Message number
                      Address request flag. MIOP sets this flag only when requesting the next set of addresses. When this field is clear, SCP is initiated.
SGZ     2     32-63   Segment size. Set by EXEC; MIOP does not need to return these values.
OLCP    3     0-31    Output LCP address. Set by EXEC; MIOP does not need to return these values.
ILCP    3     32-63   Input LCP address. Set by EXEC; MIOP does not need to return these values.
OSEG    4     0-31    Output segment address. Set by EXEC; MIOP does not need to return these values. If 0, there is no segment or LTP.
ISEG    4     32-63   Input segment address. Set by EXEC; MIOP does not need to return these values. If 0, there is no segment or LTP.
OLTP    5     0-31    Output LTP address. Set by EXEC; MIOP does not need to return these values. If 0, there is no segment or LTP.
ILTP    5     32-63   Input LTP address. Set by EXEC; MIOP does not need to return these values. If 0, there is no segment or LTP.

The station command protocol interleaves with the standard station protocol in the following cycle:
1. At the polling interval, MIOP sends a station packet with the address request flag set.
2. The CPU returns addresses to MIOP.
3. The CPU input LCP and segment are transferred over the memory channel.
4. MIOP sends a station packet with the address request flag clear, which causes the CPU to execute the station task.
5. The CPU returns addresses to MIOP.
6. The CPU output LCP and segment are transferred over the memory channel.

Disk packet format

[Figure: Disk packet format, Request and Reply layouts (words 0-5); field positions are listed in the table below.]

Field   Word  Bits    Description
DID     0     0-15    Destination ID
SID     0     16-31   Source ID
PCH           0-15    Pseudo channel number
TPB     2     16-31   Task parameter block address
OCT     2             OCT address
EQT     2     48-63   EQT address
DA      3     0-31    Bipolar data address
FCT     3     32-39   Function
STS     3     40-47   Logical status
        3     48-52   Unused
PN      3     53-54   IOP processor number
CHN     3     55-63   IOP channel number
CYL     4     0-10    Cylinder
HD      4     11-15   Head
SEC     4     16-22   Sector
OFF     4     23-31   Word offset
WL      4     32-63   Word length
MOS     5     0-31    MOS data address (request)
RAC     5     32-47   Read ahead count (request)
RES     5     48-63   Reserved for IOP (request)
RESP    5     0-15    Error flags (reply)
INLK    5     16-31   Interlock status (reply)
CYSK    5     32-47   Cylinder status (reply)
IMRG    5     48-63   Last margin (reply)

The disk packet command protocol interleaves with data flow over the memory channel in the following cycle:
1. The CPU sends MIOP the disk request.
2. Data transfers over the memory channel.
3. MIOP sends the disk status reply.

Note that only one request is outstanding to the IOP on each pseudo channel.
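The disk packet fields are specified by word and bit range. The following minimal Python sketch packs and unpacks word 4 (CYL, HD, SEC, OFF, WL) from the field table above. It assumes the numbering used in the packet diagrams, where bit 0 is the leftmost (most significant) bit and bit 63 is the rightmost. The helper names are hypothetical and not taken from COS.

```python
# Disk packet word 4 layout: name -> (first bit, last bit), from the field table.
# Assumption: bit 0 is the leftmost (most significant) bit of the 64-bit word.
WORD4_FIELDS = {
    "CYL": (0, 10),   # cylinder
    "HD": (11, 15),   # head
    "SEC": (16, 22),  # sector
    "OFF": (23, 31),  # word offset
    "WL": (32, 63),   # word length
}


def put_field(word, first, last, value):
    """Insert value into bits first..last (left-to-right numbering) of a 64-bit word."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in {width} bits")
    shift = 63 - last                   # distance of the field from the rightmost bit
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | (value << shift)


def get_field(word, first, last):
    """Extract bits first..last (left-to-right numbering) from a 64-bit word."""
    width = last - first + 1
    return (word >> (63 - last)) & ((1 << width) - 1)


def pack_word4(**values):
    word = 0
    for name, (first, last) in WORD4_FIELDS.items():
        word = put_field(word, first, last, values.get(name, 0))
    return word


def unpack_word4(word):
    return {name: get_field(word, first, last)
            for name, (first, last) in WORD4_FIELDS.items()}
```

For example, `pack_word4(CYL=0o1234, HD=5, SEC=0o17, OFF=3, WL=512)` places the cylinder in the top 11 bits and the word length in the low 32 bits. A head number of 32 or more is rejected because HD is only 5 bits wide.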
2.10 EXEC DEBUG AIDS

EXEC has two debugging aids: the history trace and the stop buffer.

2.10.1 HISTORY TRACE

The history trace is an EXEC-resident facility in which the subroutine DEBUG enters 4-word messages into a circular buffer. The buffer begins at location DBF and holds 1024 messages before previous messages are overwritten. Location DBFP holds the offset, maintained by DEBUG, of the next message address.

The general format of a trace message is as follows:

Field   Word  Bits    Description
F       0     0-6     Function number
P       0     27-48   P register of interrupted exchange package
XP      0     51-63   Exchange package address
B0      1     0-23    Contents of B0 of interrupted task
        1     24-63   Time interval since last entry
        2-3           Function dependent; see table 2-1

Table 2-1 defines the purpose of each trace type and the contents of words 2 and 3 (the two captured registers). The function trace tool changes frequently; check listings for current formats.

The function trace may be enabled selectively or collectively. To enable all functions, set location DBUGM nonzero. To enable a single function, clear (DBUGM) and set (DBUGM plus the function number). Any combination of functions may be enabled at a given time. To disable a function, clear the location that enabled it.

STP functions (12 and 21 through 32) are further selected through the POST micro acting together with the POST macros that occur throughout STP. An STP function not listed in the POST micro in the early part of STP is disabled. It can be re-enabled only by reassembly or by patching the code generated by a particular POST macro. In table 2-1, disabled functions are indicated with a § flag. Functions 1-11 and 13-20 are used by EXEC and are not affected by the POST micro.

Use the following macro to make a trace entry from a task in STP.
This example assumes that O'77 is the function number and that S2 and S3 contain the information to be captured. Any registers other than S0 and S7 can be used instead.

Location  Result    Operand           Comment
1         10        20                35
          POST      O'77,S2,S3

In EXEC, to make a history trace entry, place the information of interest in S6 and S7 and execute the following:

Location  Result    Operand           Comment
1         10        20                35
          A5        O'77              Function number
          R         DEBUG

The history trace is easily expanded to add new function types. New DEBUG function numbers may be assigned up to a maximum value of 77 octal.

Table 2-1. History trace functions

Function  Event causing trace entry          Contents of words 2 and 3
1         I/O trace interrupt                Word 2: 7/channel error flag, 9/channel number, 24/fwa of CBxx, 24/fpa
                                             Word 3: 64/1st word of relevant PUT table if disk
2         User-initiated normal exchange     Word 2: 64/user S0
                                             Word 3: 64/user S1
3         STP-initiated normal exchange      Word 2: 64/task S6
                                             Word 3: 64/task S7
4         EXEC-initiated normal exchange     Word 2: 64/MCPT
                                             Word 3: 64/SAXP

          where MCPT = 1/task scheduler request flag, 51/not used, 12/task parameter block address (zero if user program was interrupted)
          and   SAXP = 13/JXT offset, 24/JTA address, 15/time slice in milliseconds, 12/XP address (constant)
5         Real-time interrupt                Word 2: 64/MCPT
                                             Word 3: 64/SAXP
6         Copy user XP to JTA                Word 2: 64/MCPT
                                             Word 3: 64/SAXP
7         Station input interrupt for LCP    Word 2: 64/LCP+0
                                             Word 3: 64/LCP+1
10        User is being given control of CPU Word 2: 7/not used, 9/XP flags, 24/not used, 24/nonzero if user XP is in JTA
                                             Word 3: 64/SAXP
10        Input SCBs received                Word 2: 64/input stream control bytes
                                             Word 3: 64/output stream control bytes
11        Physical disk I/O request          Word 2: 16/transfer length, 24/disk address, 24/buffer address
                                             Word 3: 1/transfer direction, 54/unused, 3/1, 4/software channel number, 2/unit number
11        Physical disk I/O retry,           Word 2: 16/last function, 24/last status, 24/first parcel address
          first message                      Word 3: 16/edited status
11        Physical disk I/O retry,           Word 2: 16/retry length, 24/disk address, 24/memory address
          second message                     Word 3: 16/, 24/CA IN, 24/CA OUT
11        Physical disk I/O channel error    Word 2: 16/channel no., 24/desired CA, 24/actual CA
                                             Word 3: 64/'CA ERROR'
12        Intertask message                  Word 2: 64/input word 0 or output word 0
                                             Word 3: 64/input word 1 or output word 1
13        Error exchange                     Word 2: 64/MCPT
                                             Word 3: 7/unused, 9/XP flags, 40/unused
14        Station output interrupt for LCP   Word 2: 64/LCP+0
                                             Word 3: 64/LCP+1
15        Input/output segment termination   Word 2: 63/, 1/input active
                                             Word 3: 63/, 1/output active
16        Input SCBs received                Word 2: 64/input stream control bytes
                                             Word 3: 64/output stream control bytes
17        Station output interrupt for       Word 2: 63/, 1/input active
          error LCP                          Word 3: 63/, 1/output active
20        Output SCBs sent                   Word 2: 64/input stream control bytes
                                             Word 3: 64/output stream control bytes
          User's time slice expired          Word 2: 24/'RTC', 40/number of RTC interrupts
                                             Word 3: 64/RTC interrupt word (JRTCWORD)
          Job being initiated                Word 2: 64/ASCII job name from SDT
                                             Word 3: 16/priority, 24/job size (including JTA), 24/time limit in seconds
          Job being reactivated after a      Word 2: 64/ASCII job name
          time elapsed or an event occurred  Word 3: 64/RTC value, or the event comparison value
          Job status change                  Word 2: 8/0, 56/old JXSTCH (with "-->" if room)
                                             Word 3: 56/new JXSTCH, 8/JXT ordinal (1-O'77)
          Search for a free memory segment   Word 2: 64/JXSTCH for job that needs memory
                                             Word 3: 64/size of free segment sought
          Allocation of a memory segment     Word 2: 64/MST entry§§ for the free segment from which the allocation is to be taken
                                             Word 3: 64/number of words to be allocated

§  Entries disabled in default system
§§ MST entry = 16/JXT ordinal, 24/segment size, 24/segment address.
          Liberation of a memory segment     Word 2: 64/offset of the MST entry§§ for the segment to be freed
                                             Word 3: 64/the MST entry itself
          User job about to be connected     Word 2: 64/job name
                                             Word 3: 64/'GOT CPU'
          Disconnected user job losing       Word 2: 64/job name
          X status                           Word 3: 64/'HAD CPU'
          Disconnected user job about to     Word 2: 64/job name
          be reconnected                     Word 3: 64/'KEPT CPU'
27        CPU going to idle state            First entry: Word 2: 64/'CPU IDLE'; Word 3: 64/JXTPOP
                                             For each job in the JXT, another function-27 entry follows:
                                             Word 2: 64/job name; Word 3: 56/JXSTCH, 8/JXORD
          Request received by JSH            Word 2: 40/ASCII function name, 24/JXT ordinal
                                             Word 3: 64/ASCII job name
32        J$ALLOC request's initial          Word 2: 64/address of memory request word in STP
          processing done                    Word 3: 64/memory request word itself
32        Entry to MOVEMEM routine (trace    Word 2: 16/'MV', 24/from-address, 24/from-length
          entry suppressed if no data will   Word 3: 40/to-address, 24/to-length
          be moved)
32        Entry to ERASEMEM routine (trace   Word 2: 64/'ERASE'
          entry suppressed if no data will   Word 3: 40/address, 24/length of area to be erased
          be erased)
32        Exit from RELOCATE routine         Values before relocating:
          (always two trace entries:           Word 2: 22/HLM, 21/LFT, 21/DSP
          before and after relocating)         Word 3: 22/BFB, 21/buffer boundary, 21/FL
                                             Values after relocating:
                                               Word 2: 22/HLM, 21/LFT, 21/DSP
                                               Word 3: 22/BFB, 21/change in FL, 21/FL

§  Entries disabled in default system

2.10.2 SYSTEM CRASH MESSAGE BUFFER

This error reporting feature of EXEC helps the computer operator or system analyst find the general cause of a system crash. When EXEC detects a fatal error condition, it builds a STOP message in a buffer called the stop buffer. The stop buffer is located in EXEC at B@STOP, just before the history trace buffer. It is loaded with three items: the EXEC label where the error was detected, the word addresses of P and B0, and a stop message in ASCII. The buffer is dumped in the following format.
*==============================*
        S T O P   B U F F E R
*==============================*
EXEC STOPPED AT LABEL: $STOP006
W.P  =
W.B0 =
EEF - OPERAND RANGE ERROR
------------END BUFFER ---------

The stop label is the label used in EXEC to call the STOP macro. The values of P and B0 are not converted to ASCII characters, so they appear in the dump as raw values. The value of P is in the word after the word containing W.P, and the value of B0 is in the word after the word containing W.B0. Remember that these two values have been truncated to words.

STOP labels and messages follow a convention. The label has the form $STOPec, where ec is a unique decimal number for each error condition. The stop message contains the name of the routine where the stop occurred and a short descriptive error message. EXEC stop messages are shown in table 2-2.

Table 2-2. EXEC stop messages

Label      Code      Significance
$STOP000   EEF       Unknown Error
$STOP001   DEQRT     Invalid Time Index
$STOP002   DEQRT     Parameter Word Mismatch
$STOP003   DEQRT     Time Queue Empty
$STOP004   EE        Program Address Range Error
$STOP005   EEF       Floating-point Error
$STOP006   EEF       Operand Range Error
$STOP007   EEF       Program Range Error
$STOP008   EEF       STP Error Exit (See STP Hang Message)
$STOP009   ENSEC1    Time Queue Full
$STOP010   T11CLCA   Illegal Disk Channel Selected
$STOP011   NER       Uncorrectable I/O Read Memory Error
$STOP012   EE        Double Bit Memory Error
$STOP013   EEF       Memory Error
$STOP014   MC        Disk Controller Not Responding

STOP macro to hang EXEC
The STOP macro calls routine $STOP when the selected condition is true; otherwise, execution continues after the macro call. Routine $STOP builds the message in the STOP buffer, restores all registers to the values they had before the STOP macro was entered, and then hangs in a tight loop.

Format:

Location  Result    Operand
p0        STOP      p1,p2

p0   Label field; required.
p1   Stop condition argument. EXEC hangs when this condition is true.

     Condition   Significance
     UC          Unconditional stop
     SZ          Stop if S0 zero
     SN          Stop if S0 not zero
     SP          Stop if S0 positive
     SM          Stop if S0 negative
     AZ          Stop if A0 zero
     AN          Stop if A0 not zero
     AP          Stop if A0 positive
     AM          Stop if A0 negative

p2   Message argument; a string of 1 to 64 characters enclosed in parentheses.

2.11 INTERACTIVE SYSTEM DEBUGGING

The executive requests described in section 2.6.1 provide the mechanism through which interactive system debugging control passes from the user to SCP to EXEC. The debugging capability provides memory entry and display, operating register entry and display, setting and clearing of breakpoints, and starting and stopping of the system. The operator debug commands that use this capability are described in the COS Operational Procedures Reference Manual, publication SM-0043.

3 SYSTEM TASK PROCESSOR (STP)

3.1 GENERAL DESCRIPTION

The System Task Processor (STP) consists of tables, a set of routines called tasks, and some re-entrant routines common to all tasks. A task is a routine that serves a specific purpose and usually recognizes a set of subfunctions that other tasks can request. Each task has the following characteristics:
  - its own ID (a number in the range 0-35 octal),
  - an assigned priority (000-377 octal),
  - its own exchange package area in the System Task Table (STT), and
  - its own intertask communication control table, which defines which tasks are allowed to communicate.

The Base Address (BA) and Limit Address (LA) registers hold the same addresses for all tasks. BA is set to the beginning of STP, and LA is set to I@MEM (an installation-defined maximum memory value).

A task is loaded into memory during system startup. However, it does not normally become known to the system until an existing task issues an executive request to create it. COS Startup is the necessary exception.
A "create task" request assigns an ID and a priority to a task through the task's parameter block in the STT.

Tasks execute in program mode and are thus interruptible. An interrupt can arise in two ways. The task may execute an exit instruction (ERR or EX), or one of the interrupt flags may be set automatically (for example, when an I/O interrupt occurs).

When a task is created, it is forced into execution. During this initial execution, it usually performs some initialization and setup operations and then suspends itself. Thereafter, a task is executed only if it is readied.

Readying a task consists of altering its suspend bit. However, the task is not a candidate for execution unless all of the bits in its status field are 0, including the breakpoint and stop bits.

Task readying occurs either automatically or explicitly. Readying occurs automatically for a task assigned to a channel when an interrupt occurs on that channel. A task may also be readied by an explicit EXEC request issued by another task for its execution. A System Operator Station request (Station Debug Command) can also ready or suspend a task. A task remains ready (unless breakpointed or stopped) until EXEC receives a request to suspend it.

A task requests self-suspension when it has completed an assigned function or when it posts a request for another task. If the requested task has a lower priority than the requesting task, the requesting task must suspend itself so that the lower-priority task can execute.

Further requests to ready a task that is already ready set the ready request bit in the task's parameter word. When this bit is set, the next suspend request for the task causes the task to be readied again rather than suspended. The ready request bit is then cleared.

3.2 TASK COMMUNICATIONS

Tasks may communicate with EXEC, with each other, and with user jobs.
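The readying rules at the end of section 3.1 interact in a way that is easy to misread. A ready request for a task that is already ready is remembered in the ready request bit, and that bit cancels the next suspend. The following Python model is a minimal sketch under stated assumptions. The class and attribute names are hypothetical, and only the suspend, breakpoint, and stop bits of the status field are modeled.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    """Hypothetical model of a task's readying state (names are illustrative)."""
    suspended: bool = True       # suspend bit in the status field
    breakpoint: bool = False     # breakpoint bit in the status field
    stopped: bool = False        # stop bit in the status field
    ready_request: bool = False  # ready request bit in the parameter word

    def ready(self):
        if self.suspended:
            self.suspended = False      # readying alters the suspend bit
        else:
            self.ready_request = True   # already ready: remember the extra request

    def suspend(self):
        if self.ready_request:
            self.ready_request = False  # re-ready instead of suspending
        else:
            self.suspended = True

    def runnable(self):
        # A candidate for execution only when every modeled status bit is clear.
        return not (self.suspended or self.breakpoint or self.stopped)
```

In this model, two ready requests followed by two suspend requests leave the task suspended, but only after the second suspend. A breakpointed or stopped task is never runnable, even when it is ready.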
3.2.1 EXEC/TASK COMMUNICATION

A task communicates with EXEC by placing a request and parameters in registers S6 and S7 and executing an EX instruction. When a task executes an EX, the error return is to the instruction following the EX, and the normal return is to the instruction following the error return. The error return instruction must be a 2-parcel instruction. EXEC returns its reply to the request in S6 and S7. EXEC requests are described in detail in section 2.6.

3.2.2 TASK-TO-TASK COMMUNICATION

STP contains two areas used for intertask communication: the Communication Module Chain Control (CMCC) and the Communication Modules (CMODs).

The CMCC is a contiguous area containing an entry for each combination of tasks possible within the system. The CMCC is arranged in task number sequence; that is, all possible combinations of requests to task 0 are followed by all possible combinations of requests to task 1. The IDs of the requesting task and the requested task together determine the appropriate CMCC entry.

CMODs are allocated from a pool as needed and therefore have no fixed location. Memory Pool 2 is reserved exclusively for intertask communications. A CMOD consists of six words: two are used for control, two as input registers, and two as output registers. A task receives all of its requests and makes all of its replies through CMODs. Figure 3-1 illustrates the tables used for task communication.

One task communicates with another by placing a request in the input words of a CMOD. The requested task replies by placing the request status in the output words of the CMOD. The called task defines the format of its requests. The requests that each task recognizes are described with that task later in this section. However, some conventions do exist. By convention, the requested function is placed in INPUT+0.
By convention, OUTPUT+0 is 0 if no error has occurred; otherwise, it contains a nonzero error code.

Task communication routines
Six re-entrant STP routines common to all tasks facilitate intertask communication:

PUTREQ     Put request routine, asynchronous; destroys A6 and A7
GETREQ     Get request routine; destroys A6 and A7
PUTREPLY   Put task reply routine; destroys A6 and A7
GETREPLY   Request status routine; destroys A6 and A7
TSKREQ     Task request routine, synchronous; destroys A3
REPLIES    Queues unrequested reply; destroys A6 and A7

The task placing a request calls PUTREQ to place the request and calls GETREPLY to check for a status from the requested task. Conversely, the requested task uses GETREQ to locate outstanding requests and uses PUTREPLY to return the status.

[Figure 3-1. Task communication tables: the Communication Module Chain Control (a TASK 0 header followed by per-task entries such as TASK 0 to TASK 1 ... TASK n to TASK 1) links chains of Communication Modules (CMOD No. 1 ... CMOD No. n), each holding CONTROL, INPUT, and OUTPUT words.]

PUTREQ - This STP common subroutine places the request in the input registers of a CMOD and links the CMOD to the appropriate communication module chain control. The request cannot be chained if no CMODs are available or if the chain is at its maximum. In that case PUTREQ suspends the calling task or, at the caller's discretion, returns control to the requester with no action taken. Once PUTREQ has generated the CMOD and linked it to the CMCC, the requested task is readied and control returns to the requester. PUTREQ is called with a return jump, and the caller provides the following values:

INPUT REGISTERS:
(A1) = "Throw-away" indicator.
         If (A1) is positive, control is not returned to the caller until the request is queued. If (A1) is negative, control returns with no action taken if the request cannot be queued without suspending the caller.
  (A2) = Requested task's ID
  (S1) = INPUT+0 } Request
  (S2) = INPUT+1 }
OUTPUT REGISTERS: None

GETREQ - This STP common subroutine locates any outstanding request for the caller. Using the CMCC, GETREQ searches for a CMOD representing a request not yet given to the requester. GETREQ begins the CMCC search with the lowest numbered task and returns the first request encountered to the caller. A task calls GETREQ via a return jump.

INPUT REGISTERS: None
OUTPUT REGISTERS:
  (A0) = "Found" indicator. If (A0)=0, no outstanding requests exist. If (A0)≠0, a request is being returned.
  (A2) = ID of task that generated the request
  (S1) = INPUT+0 } Request
  (S2) = INPUT+1 }

PUTREPLY - This STP common subroutine places the reply to a request in the first available CMOD. Requests and replies are stored in the CMOD in the sequence in which they are generated; therefore, a single CMOD may represent an unrelated request and reply. The subroutine readies the task to which the reply is directed and returns to the requester. PUTREPLY is called via a return jump.

INPUT REGISTERS:
  (A2) = ID of task to receive the reply
  (S1) = OUTPUT+0 } Reply
  (S2) = OUTPUT+1 }
OUTPUT REGISTERS: None

GETREPLY - This STP common subroutine searches for a reply to the calling task. The search begins with the lowest numbered task and ends with the highest numbered task, returning the first reply encountered. GETREPLY removes the CMOD from the CMCC and releases it for reallocation. The subroutine is called via a return jump.

INPUT REGISTERS: None
OUTPUT REGISTERS:
  (A0) = "Found" indicator. If (A0)=0, no reply was located; if (A0)≠0, a reply is being returned to the caller.
  (A2) = ID of replying task
  (S1) = OUTPUT+0 } Reply
  (S2) = OUTPUT+1 }

TSKREQ - This STP common subroutine makes a request to a task for processing and suspends the caller until a reply is received. If the request cannot be queued immediately, because either the queue is at its maximum or no communication modules are available, the caller is suspended until the request can be queued. Once the request has been queued, the caller is suspended until a reply is received. TSKREQ is called via a return jump.

INPUT REGISTERS:
  (A2) = ID of requested task
  (S1) = INPUT+0 } Request
  (S2) = INPUT+1 }
OUTPUT REGISTERS:
  (S1) = OUTPUT+0 } Reply
  (S2) = OUTPUT+1 }

REPLIES - This subroutine queues a reply for which no request was made. The reply is queued at the beginning of the reply queue, so a reply sent via this subroutine will be seen by GETREPLY before any reply sent via PUTREPLY.

INPUT REGISTERS:
  (A1) = "Throw-away" indicator. If (A1) is positive, control is not returned to the caller until the reply is queued. If (A1) is negative, control returns with no action taken if the reply cannot be queued without suspending the caller.
  (A2) = Replied task's ID
  (S1) = INPUT+0 } Reply
  (S2) = INPUT+1 }
OUTPUT REGISTERS: None

3.2.3 USER/STP COMMUNICATION

All user/STP communication is initiated by user jobs. A user-program request to STP may be issued as a CAL macro (see CRAY-OS Version 1 Reference Manual, publication SR-0011). The user macro results in a normal exit from the user program. EXEC routes all normal exits from a job to the Exchange Package Processor. Exchange Package Processor handling of these requests is described in section 4.4.

3.3 STP COMMON ROUTINES

Certain re-entrant routines resident in STP may be called via return jumps rather than via a call to another task. These include task logical I/O routines (TIO), circular I/O routines (CIO), memory management routines, and item chaining/unchaining routines.
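The request/reply routines of section 3.2.2 can be modeled in a few lines to make the search order concrete. The following Python sketch is illustrative only: `cmcc_index`, `TaskComm`, and the chain limit `max_chain` are hypothetical names; the index formula assumes CMCC entries are grouped by requested task and then ordered by requesting task (one reading of the arrangement described in 3.2.2); and requests and replies are kept in separate chains, whereas STP may hold an unrelated request and reply in the same CMOD. Suspension is not modeled, so `putreq` behaves like PUTREQ with a negative throw-away indicator.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Pair = Tuple[int, int]  # two data words, e.g. (INPUT+0, INPUT+1)


def cmcc_index(requester: int, requested: int, ntasks: int) -> int:
    """Hypothetical CMCC entry index: all entries for requests *to* task 0
    come first, then all entries for requests to task 1, and so on."""
    if not (0 <= requester < ntasks and 0 <= requested < ntasks):
        raise ValueError("task ID out of range")
    return requested * ntasks + requester


class TaskComm:
    """Toy model of intertask communication (not STP's actual data layout)."""

    def __init__(self, ntasks: int, max_chain: int = 4) -> None:
        self.ntasks = ntasks
        self.max_chain = max_chain  # assumed per-chain limit
        size = ntasks * ntasks
        self.requests: List[Deque[Pair]] = [deque() for _ in range(size)]
        self.replies: List[Deque[Pair]] = [deque() for _ in range(size)]

    def putreq(self, requester: int, requested: int, in0: int, in1: int) -> bool:
        """Like PUTREQ with a negative throw-away indicator:
        returns False (no action taken) if the chain is full."""
        chain = self.requests[cmcc_index(requester, requested, self.ntasks)]
        if len(chain) >= self.max_chain:
            return False
        chain.append((in0, in1))
        return True

    def getreq(self, caller: int) -> Optional[Tuple[int, int, int]]:
        """Like GETREQ: scan requesting tasks from the lowest number up and
        return (A2, S1, S2) for the first request found, else None (A0=0)."""
        for requester in range(self.ntasks):
            chain = self.requests[cmcc_index(requester, caller, self.ntasks)]
            if chain:
                in0, in1 = chain.popleft()
                return requester, in0, in1
        return None

    def putreply(self, replier: int, receiver: int, out0: int, out1: int) -> None:
        """Like PUTREPLY: queue the reply behind earlier replies."""
        self.replies[cmcc_index(replier, receiver, self.ntasks)].append((out0, out1))

    def replies_unrequested(self, replier: int, receiver: int,
                            out0: int, out1: int) -> None:
        """Like REPLIES: queue at the head so it is seen before PUTREPLY
        replies (in this sketch, only within the same replier's chain)."""
        self.replies[cmcc_index(replier, receiver, self.ntasks)].appendleft((out0, out1))

    def getreply(self, caller: int) -> Optional[Tuple[int, int, int]]:
        """Like GETREPLY: scan replying tasks from lowest to highest number,
        remove the first reply found, and return (A2, S1, S2), else None."""
        for replier in range(self.ntasks):
            chain = self.replies[cmcc_index(replier, caller, self.ntasks)]
            if chain:
                out0, out1 = chain.popleft()
                return replier, out0, out1
        return None
```

For example, if tasks 2 and 0 both send requests to task 1, `getreq(1)` returns task 0's request first, because the scan starts at the lowest-numbered task regardless of arrival order.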
3.3.1 TASK I/O ROUTINES (TIO)

Task I/O is a re-entrant routine in STP that logically can be considered a part of any task that uses it. It operates only on blocked datasets. TIO allows a system programmer to do logical I/O at the task level without being concerned about physical I/O.

A task that uses TIO must have a DSP, DNT, and buffer assigned for the dataset; TIO does no allocation or deallocation of DSPs, DNTs, and buffers. A task may use a dataset's DNT, DSP, and buffer in a user field or may generate its own DNT, DSP, and buffer for a dataset local to STP. For example, SCP may use the existing DNT portion of an SDT entry while generating a DSP and buffer to accommodate the logical I/O. Similarly, EXP may reference the existing DNT for a user's $OUT dataset, or JSH may reference its own DNT for a roll-out dataset. Figure 3-2 illustrates the linkages between the DNT, DSP, and buffer.

[Figure 3-2. Dataset table linkages (DNT, DSP, and buffer)]

Since task I/O cannot sense completion of physical I/O, each task must provide the sensing. To do this, each task, when it is readied from a suspension, should call GETREPLY. If a reply is found that belongs to task I/O (A2=DQMID), the task should jump to REPCIO with S1 and S2 intact from GETREPLY. When TIO must wait for completion of physical I/O before completing a task's logical request, it returns to the task's main interrupt loop. TIO returns to the task's calling address only on completion of the logical request. The calling task may not make another TIO request for a particular DNT until any previous logical request has completed.
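The completion-sensing rule above amounts to a small dispatch step in each task's main loop. This Python sketch assumes a hypothetical `DQMID` value and passes GETREPLY and REPCIO in as callables; draining every pending reply in one pass is also an assumption of the sketch, not something this manual specifies.

```python
from typing import Callable, Optional, Tuple

DQMID = 3  # hypothetical task ID of the Disk Queue Manager

Reply = Tuple[int, int, int]  # (A2, S1, S2) as returned by GETREPLY


def on_readied(getreply: Callable[[], Optional[Reply]],
               repcio: Callable[[int, int], None],
               handle_other: Callable[[int, int, int], None]) -> None:
    """Run when the task is readied from a suspension."""
    while True:
        reply = getreply()
        if reply is None:          # A0 = 0: no reply located
            return
        a2, s1, s2 = reply
        if a2 == DQMID:
            # Reply belongs to task I/O: pass S1/S2 to REPCIO unchanged so
            # TIO can continue the suspended logical request.
            repcio(s1, s2)
        else:
            # Any other reply is the task's own business.
            handle_other(a2, s1, s2)
```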
The following TIO routines are available to system programmers:

  $RWDP   Read word(s); partial mode (does not point to next eor)
  $RWDR   Read word(s); record mode (points to next eor)
  $WWDP   Write word(s); partial mode (no eor written)
  $WWDS   Write word(s) with unused bits in last word; record mode (eor written)
  $WWDR   Write word(s); record mode (eor written)
  $WEOF   Write eof; calls $WWDR if no eor was written
  $WEOD   Write eod; calls $WEOF if no eof was written
  $REWD   Rewind dataset; calls $WEOD if no eod was written

To call a TIO routine, a task places the parameters required by the routine in A registers and executes a return jump to the routine. The routine returns results to the caller via A registers.

CAUTION

These TIO routines have the same names as logically equivalent routines in the system library, $SYSLIB. However, the TIO routines reside in STP, and the source for the library routines resides in the SYSLBPL program library.

3.3.2 SYSTEM TABLES USED BY TIO

TIO uses the following system tables for the dataset on which I/O is to be performed:

  DNT   Dataset Name Table
  DSP   Dataset Parameter Area

Detailed information on these tables is available in the COS Table Descriptions Internal Reference Manual, publication SM-0045.

Dataset Name Table (DNT)

TIO uses the DNT as indicated by the F$RDC and F$WDC routines available to users (refer to the description of the Exchange Package Processor in section 4.4).

Dataset Parameter Area (DSP)

TIO uses certain DSPs located in the user field, such as those for $IN, $OUT, datasets read or written by BUFFER IN/OUT, and sequential COS blocked datasets that are being closed when in write mode and not positioned to end of data. TIO uses reserved words at the end of the DSPs. These are saved in the JTA when a TIO routine goes into recall.

Error processing

When TIO detects an error, a negative value is returned in A0. The caller is responsible for processing these errors.
Appropriate error bits in the DSP error status field (DPERR) indicate which error occurred.

TIO logical read routines

The TIO read routines transfer partial or full records of data from the I/O buffer to the task's data area. The data is placed in the data area in full words, depending on the read request issued. Figure 3-3 provides an overview of the logical read operation. The calling routine must examine DPEOR, DPEOF, and DPEOD in the DSP to determine end-of-record, end-of-file, or end-of-data status. If the record control word indicates unused bits in the last word of the record, these bits are zeroed in the data area and field DPUBC is set to the number of unused bits.

$RWDP - Words are transmitted from the I/O buffer defined by the DSP to the area beginning at FWA until either the word count in A3 is satisfied or an eor is encountered. $RWDP calls $RBLK automatically.

SUBROUTINE NAME: $RWDP - Read words, partial mode
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A2)  FWA of task's data area
  (A3)  Word count. If word count is 0, no data is transferred.
  (A6)  Address of DNT
  (A7)  Address of JXT (=0 if not job related)
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error (block number error, null dataset, etc.)
          =0  Logical I/O complete
  (A1)  Address of DSP
  (A2)  FWA of task's data area
  (A3)  Word count
  (A4)  LWA+1
  (A6)  Address of DNT
  (A7)  Same value as on input

$RWDR - This routine resembles $RWDP; however, following the read, the dataset is positioned after the eor that terminates the current record.

SUBROUTINE NAME: $RWDR - Read words, record mode
ENTRY CONDITIONS: Same as $RWDP
RETURN CONDITIONS: Same as $RWDP

TIO logical write routines

The TIO write routines transfer partial or full records of data from the task's data area to the I/O buffer. The data is transferred in full words, depending on the write operation requested. Two additional write routines provide for writing an eof or an eod on the dataset. Figure 3-4 provides an overview of the logical write operations.
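The unused-bit handling described above (zeroing the unused bits of a record's last word and reporting the count in DPUBC) can be illustrated on a single 64-bit word. This Python sketch assumes the data is left-justified, so the unused bits are the low-order (rightmost) bits of the word; `clear_unused_bits` is a hypothetical helper, not a TIO entry point.

```python
WORD_BITS = 64                    # CRAY-1 word size
WORD_MASK = (1 << WORD_BITS) - 1  # all 64 bits set


def clear_unused_bits(word: int, ubc: int) -> int:
    """Zero the `ubc` unused low-order bits of a record's last word.

    Assumes unused bits occupy the right end of the word; ubc must be
    0-63, the same range $WWDS accepts for its unused bit count.
    """
    if not 0 <= ubc < WORD_BITS:
        raise ValueError("unused bit count must be 0-63")
    # Keep every bit except the lowest `ubc` bits.
    keep_mask = WORD_MASK & ~((1 << ubc) - 1)
    return word & keep_mask
```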
When writing in record mode, it is possible to provide a count of unused bits in the last word of the record. These bits are not zeroed in the buffer, but the record control word indicates unused bits, and the bits are then cleared when the record is read.

$WWDP - The number of words specified by the count is transmitted from the task's data area beginning at FWA and written in the I/O buffer defined by the DSP. $WWDP automatically calls $WBLK, as needed.

SUBROUTINE NAME: $WWDP - Write words, partial mode
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A2)  FWA of task's data area
  (A3)  Word count. If count is 0, no data is transferred.
  (A6)  Address of DNT
  (A7)  Address of JXT (=0 if not job related)
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error
          =0  Logical I/O complete
  (A1)  Address of DSP
  (A2)  FWA of task's data area
  (A3)  Word count
  (A4)  LWA+1
  (A6)  Address of DNT
  (A7)  Same value as on input

$WWDR - The $WWDR routine resembles $WWDP. However, an eor RCW terminating the record is inserted in the I/O buffer in the next word following the data. To simply write an eor, the task issues a $WWDR with (A3)=0.

SUBROUTINE NAME: $WWDR - Write words, record mode
ENTRY CONDITIONS: Same as $WWDP
RETURN CONDITIONS: Same as $WWDP

[Figure 3-3. TIO logical read (task's data area at (A2), word count (A3), DNT at (A6), DSP at (A1), I/O buffer, CMCC for task I/O, DQM, physical I/O)]

[Figure 3-4. TIO logical write (task's data area at (A2), word count (A3), I/O buffer, CMCC for task I/O, DQM, physical I/O to mass storage)]

$WWDS - The $WWDS routine is identical to $WWDR, except that the last word of the record contains unused bits, and the eor RCW constructed contains the unused bit count.
SUBROUTINE NAME: $WWDS - Write words, record mode, with unused bit count
ENTRY CONDITIONS: Same as $WWDR, plus:
  (A4)  Unused bit count in the last word of the record; a value from 0-63
RETURN CONDITIONS: Same as $WWDR

$WEOF - This routine writes an eof RCW, preceded by an eor RCW if necessary, as the next words in the I/O buffer.

SUBROUTINE NAME: $WEOF - Write end of file
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A6)  Address of DNT
  (A7)  Address of JXT (=0 if not job related)
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error
          =0  Logical I/O complete
  (A6)  Address of DNT
  (A7)  Same value as on input

$WEOD - This routine writes an eod RCW, preceded by an eor and an eof if necessary, as the next words in the I/O buffer. $WEOD forces the final block of data to be written on the disk; that is, it flushes the I/O buffer. A $WEOD cannot be followed by a write.

SUBROUTINE NAME: $WEOD - Write end of data
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A6)  Address of DNT
  (A7)  Address of JXT (=0 if not job related)
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error
          =0  Logical I/O complete
  (A6)  Address of DNT
  (A7)  Same value as on input

Positioning routine

TIO supports a single positioning routine, $REWD.

$REWD - The $REWD routine positions the dataset at the beginning of data (bod). If the dataset is in write mode and no eod has been written, $REWD calls $WEOD.

SUBROUTINE NAME: $REWD - Rewind dataset
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A6)  Address of DNT
  (A7)  Address of JXT (=0 if not job related)
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error
          =0  Logical I/O complete
  (A6)  Address of DNT
  (A7)  Same value as on input

Block transfer routines

TIO supports two block transfer routines, $RBLK and $WBLK.

$RBLK - $RBLK is called only by other task I/O routines and may not be called directly by a task. $RBLK checks whether the buffer is less than half full. If it is, $RBLK calls CIO to initiate a disk read.
CIO continues to read as long as the user continues to empty the buffer fast enough to keep it half empty. If the buffer is more than half full when $RBLK is called, $RBLK verifies the next BCW (its block number must equal the relative sector number of the dataset) and returns to the caller.

SUBROUTINE NAME: $RBLK - Read block(s)
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A5)  Address of current block control word
  (A6)  Address of DNT
  (A7)  Base address of DSP buffer pointers (either BA or JM address) (W.DPTM)
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error
          =0  Logical I/O complete
  (A1)  Address of DSP
  (A4)  OUT
  (A6)  Address of DNT
  (A7)  Same value as on input

$WBLK - $WBLK is called only by other task I/O routines. $WBLK checks whether the buffer is more than half full. If it is, $WBLK calls CIO to initiate a disk write and writes a BCW. CIO continues to write as long as the user continues to fill the buffer fast enough to keep it more than half full. If the buffer is less than half full when $WBLK is called, $WBLK does no more than insert BCWs as needed.

SUBROUTINE NAME: $WBLK - Write block(s)
ENTRY CONDITIONS:
  (A1)  Address of DSP
  (A5)  Address of next block control word
  (A6)  Address of DNT
  (A7)  Base address of DSP buffer pointers
RETURN CONDITIONS:
  (A0)  Status
          <0  TIO error
          =0  Logical I/O complete
  (A1)  Address of DSP
  (A6)  Address of DNT
  (A7)  Same value as on input

Stepflows of TIO subroutines

$REWD
  1. If eod not written, call $WEOD.
  2. Reset DSP.
  3. Exit.

$WEOD
  1. If eof not written, call $WEOF.
  2. Call $WWDR to write eod.
  3. Exit.

$WEOF
  1. If eor not written, call $WWDR.
  2. Call $WWDR to write eof.
  3. Exit.

$WWDP/$WWDR/$WWDS
  1. If preceding function was a write, go to 3.
  2. Process write after read.
  3. Move words into buffer; if end of move, go to 7.
  4. If not at BCW, go to 3.
  5. Call $WBLK.
  6. Go to 3.
  7. If not record mode ($WWDR), go to 11.
  8. Insert eor.
  9. If not at BCW, go to 11.
  10. Call $WBLK.
  11. Update DSP.
  12. Exit.

$WBLK
  1. If buffer more than half used, call WDCS.
  2. Update DSP.
  3. Exit.

$RWDP/$RWDR
  1. Move words out of buffer; if end of move, go to 5.
  2. If not at BCW, go to 5.
  3. Call $RBLK.
  4. Go to 1.
  5. If not record mode ($RWDR), go to 9.
  6. Point at next eor.
  7. If not at BCW, go to 9.
  8. Call $RBLK.
  9. Update DSP.
  10. Exit.

$RBLK
  1. If buffer more than half empty, call RDCS.
  2. Update DSP.
  3. Exit.

3.3.3 CIRCULAR I/O ROUTINES (CIO)

Physical I/O on a dataset uses a circular buffering technique and is initiated by a set of STP common routines known as CIO (Circular I/O). The Exchange Processor uses CIO to handle I/O calls from user programs. CIO is also accessible to all other tasks through TIO and through direct calls.

The Disk Queue Manager initiates circular I/O operations on mass storage in response to CIO calls. These calls are issued by user programs or tasks when data is to be transferred between the I/O buffer defined by the DSP and mass storage. However, these requests need not be explicitly issued: FORTRAN I/O routines in user programs and TIO routines in STP manage the I/O buffers and make calls to CIO.

The I/O buffer consists of an integral number of 512-word blocks. For a COS blocked file, the first word of each block is a block control word. The size and location of the buffer are defined when the DSP is generated. The default size is defined by an installation parameter.

Logical I/O on a buffer may be concurrent with physical I/O. That is, on a read operation, the user may be extracting data from the buffer at the same time the system is inserting data, with the user read lagging the system read. Alternatively, on a write operation, the user may be inserting data into the buffer at the same time the system is emptying it. In this case, the user write leads the system write. The buffers are managed through the IN, OUT, FIRST, and LIMIT pointers in the DSP. Figure 3-5 illustrates the format of physical I/O.
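The TIO stepflows listed before section 3.3.3 chain their terminators: $REWD calls $WEOD if no eod has been written, $WEOD calls $WEOF if no eof has been written, and $WEOF calls $WWDR if no eor has been written. This Python sketch models that chain with hypothetical names (`TioWriter` and its `last` field stand in for DSP state). It assumes "eor not written" means a partial record is pending and that a terminator counts as written only if it was the most recent thing written; it also appends eof and eod directly, although the stepflows route them through $WWDR.

```python
from typing import List, Optional, Sequence, Union

Item = Union[int, str]  # a data word, or an RCW name such as "eor"


class TioWriter:
    """Toy model of the TIO write-terminator chain (hypothetical class)."""

    def __init__(self) -> None:
        self.stream: List[Item] = []      # words and RCWs in the order written
        self.last: Optional[str] = None   # "data", "eor", "eof", "eod", or None
        self.write_mode = True

    def _check_open(self) -> None:
        if self.last == "eod":
            raise RuntimeError("a $WEOD cannot be followed by a write")

    def wwdp(self, words: Sequence[int]) -> None:
        """$WWDP: write words, partial mode (no eor)."""
        self._check_open()
        self.stream.extend(words)
        if words:
            self.last = "data"

    def wwdr(self, words: Sequence[int] = ()) -> None:
        """$WWDR: write words followed by an eor; with no words, just an eor."""
        self._check_open()
        self.stream.extend(words)
        self.stream.append("eor")
        self.last = "eor"

    def weof(self) -> None:
        """$WEOF: terminate a pending partial record first, then write eof."""
        self._check_open()
        if self.last == "data":
            self.wwdr()
        self.stream.append("eof")
        self.last = "eof"

    def weod(self) -> None:
        """$WEOD: write eof first unless one was just written, then eod."""
        self._check_open()
        if self.last != "eof":
            self.weof()
        self.stream.append("eod")
        self.last = "eod"

    def rewd(self) -> None:
        """$REWD: in write mode, finish with $WEOD if no eod was written."""
        if self.write_mode and self.last != "eod":
            self.weod()
        self.write_mode = False  # now positioned at bod (DSP reset not modeled)
```

Rewinding after a partial write therefore emits the full eor, eof, eod sequence, which is why a task can simply call $REWD to close out a dataset it has been writing.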
Referring to step A, the IN pointer advances from FIRST to LIMIT as data is inserted into the buffer. Step B illustrates how emptying the buffer lags filling the buffer. The OUT pointer, which is initially the same as IN, advances toward LIMIT but always lags IN. For writing, a buffer can become full when data is inserted faster than it is extracted. For reading, a buffer can become empty if data is extracted faster than it is inserted.

Physical reads and writes always involve 512-word blocks. On a read, IN is always at a 512-word buffer boundary, but OUT, which is being modified by the user, need not be. Conversely, on a write, OUT is always at a 512-word buffer boundary, but IN need not be.

On a read operation, EXEC and CIO modify the IN pointer and the caller modifies the OUT pointer. If IN=OUT, the buffer is empty if errors have occurred (DPERR≠0) or if the DSP is busy (DPBSY=1). The buffer is full when IN=OUT, the DSP is not busy, and no errors have occurred.

[Figure 3-5. Physical I/O (FIRST, IN, OUT, and LIMIT pointers): A. Filling the buffer; B. Emptying the buffer; C. Concurrently filling and emptying the buffer]

On a write operation, DQM and CIO modify the OUT pointer and the caller modifies the IN pointer. If IN=OUT, the buffer is full if errors have occurred (DPERR≠0) or if the DSP is busy (DPBSY=1). The buffer is empty if IN=OUT, the DSP is not busy, and no errors have occurred.

A dataset may be declared memory resident. If so, CIO determines whether a physical I/O request should be issued for the dataset based on the processing direction and on whether the buffer is full or empty. If the request is to write the dataset and the buffer is full (IN=OUT), CIO issues a physical I/O request. In this case, CIO also clears the memory-resident indicators in the DSP and DNT. If the buffer is not full, CIO merely returns to the caller.
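The pointer rules above can be captured in a small model. In this Python sketch, `CircularBuffer` and its fields are hypothetical stand-ins for the DSP's FIRST, LIMIT, IN, and OUT pointers and its DPBSY and DPERR fields. It shows how the same IN=OUT condition is read as empty or full depending on the processing direction, and applies the half-full thresholds given earlier for $RBLK and $WBLK.

```python
from dataclasses import dataclass


@dataclass
class CircularBuffer:
    first: int
    limit: int
    inp: int             # IN pointer ("in" is a Python keyword)
    out: int             # OUT pointer
    busy: bool = False   # DPBSY = 1
    error: int = 0       # DPERR

    @property
    def size(self) -> int:
        return self.limit - self.first

    def words_held(self, reading: bool) -> int:
        """Words between OUT and IN, resolving IN == OUT by direction."""
        if self.inp != self.out:
            # Pointers wrap from LIMIT back to FIRST.
            return (self.inp - self.out) % self.size
        troubled = self.busy or self.error != 0
        if reading:
            # Read: IN=OUT is empty if busy or in error, otherwise full.
            return 0 if troubled else self.size
        # Write: IN=OUT is full if busy or in error, otherwise empty.
        return self.size if troubled else 0

    def rblk_should_read(self) -> bool:
        """$RBLK-style test: start a disk read if less than half full."""
        return self.words_held(reading=True) * 2 < self.size

    def wblk_should_write(self) -> bool:
        """$WBLK-style test: start a disk write if more than half full."""
        return self.words_held(reading=False) * 2 > self.size
```

With a four-block (2048-word) buffer, one block of unread data is less than half full, so a $RBLK-style check would ask CIO to read ahead.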
If the request is to read the dataset and the buffer is empty (IN=OUT and DPIBN=0), CIO issues a physical request if the DNT shows that mass storage space exists. If CIO is called to read and the buffer is not empty, CIO returns as if a successful read had occurred. If the buffer is empty, CIO determines whether the requested block (DPIBN) is within the buffer (IBN*512 < LIMIT-FIRST) and whether the block exists (IBN …

… and $\rho = \rho_p$ for jobs not in memory. The approximation improves with decreasing $\Delta t$, becoming exact when $\Delta t$ is infinitesimally small. ($\Delta p$ is too large by a factor that approaches $1 + \rho\Delta t/2$ as $\rho\Delta t$ approaches 0. Any error introduced by too large a scheduling interval causes $p$ to approach its asymptote at $p_0 \pm a$ more quickly.)

The rate of change of a job's memory priority should depend to some extent on the size of the job. A large job takes more time to be rolled in and out than does a small job, so the large job should remain longer in memory and on the disk. Job size is made to affect memory priority by multiplying the rate of change ($\rho$) by

$$1 - jsw\,\frac{jsize}{flmax}$$

where jsw is an adjustable weighting factor (I@JSHJSW), jsize is the job size, and flmax is the maximum job size (I@JFLMAX).

Note that the value of this expression is always less than or equal to 1, and that it approaches 1 either as jsw approaches 0 or as the job size approaches 0. Thus, the final version of the algorithm is

$$\Delta p = (p_0 \pm a - p)\left(1 - jsw\,\frac{jsize}{flmax}\right)\rho\,\Delta t$$

For the purpose of determining when a job should be rolled out, the time actually used by a job is assumed to be a function of the time spent executing and the time spent waiting for I/O completion; that is, it excludes only the time spent waiting for assignment to the CPU. The underlying assumption is that no time is lost in the I/O request queue and no time is lost waiting for the best opportunity to begin the seek operation.
To the extent that this assumption is false, jobs are short-changed on memory time. Also penalized are those jobs that access disk cylinders far removed from the centers of disk activity. An offsetting advantage is that, when waiting time is low, memory priorities age nearly as fast in memory as on the disk.

Behavior of the high-level scheduling algorithm

Figures 4.5-2a through 4.5-2f show the variation of the memory priorities for several jobs running at the same time. In all of the examples given below, the following conditions hold:

• Not all the jobs can fit into memory together.
• The scheduling interval is very small, so that exponential curves may be used to describe the variations in each job's running priority.
• The rate of change of memory priority is independent of how much CPU time each job gets. (This is contrary to the facts, but it simplifies the graphs by giving each curve fragment the same decay constant.)
• The deadband interval, db, is 2.
• The amplitude, a, is 3.
• The solid lines represent memory priority for jobs in memory; the dashed lines represent jobs that are waiting for memory.
• All jobs are submitted at the same time, and they each run long enough that their subpriorities are all 0.
• Competing jobs are of the same size (until figure 4.5-2e).
• The rates of rise and decay are independent of job size.

Figure 4.5-2a shows two jobs that are too large to share memory. Their initial priorities (from the job statements) are 3 and 5. The higher-priority job runs first and consistently enjoys a longer stay in memory, because its priority has asymptotes at P=2 and 8 (as opposed to 0 and 6 for the other job).

[Figure 4.5-2a. Memory priority variation (priority p versus time t for jobs with initial priorities 3 and 5; db=2)]

Note: In all the examples, jobs are swapped when the difference between their priorities equals or exceeds db.

Figure 4.5-2b shows three jobs, only two of which can share memory at any one time.
Except for the presence of the priority-7 job, this graph is identical to figure 4.5-2a. The priority-7 job can never be forced out of memory because its priority will never be even as much as one unit below P=5, which is the maximum attainable by either of the other two jobs. (If its initial priority were 6 instead of 7, it still would never be forced out by the other two jobs.)

[Figure 4.5-2b. Memory priority variation (p versus t for jobs with initial priorities 3, 5, and 7)]

Figure 4.5-2c shows three jobs which, like those in figure 4.5-2b, must share memory two at a time. In this case, though, the priorities are close enough together that they each get some time in memory. Although the pattern may not be a repeating one, each job's time in memory is consistent with its priority.

[Figure 4.5-2c. Memory priority variation (p versus t)]

Figure 4.5-2d shows a very large number of jobs, all of the same initial priority (4), only one of which can fit in memory at any one time. The uppermost dashed line represents the current priority of all the jobs that have yet to be initiated. The jobs are initiated in the order they were submitted (or in JXT-entry order, if they were submitted simultaneously). Each job that runs gets slightly less time than the previous one, but this time in memory rapidly approaches a limiting value T as t advances. (It may be possible to specify the decay rate $\rho_p$ indirectly by specifying this value of T, along with db and a.)

[Figure 4.5-2d. Memory priority variation (p versus t). Time in memory approaches a limit T as t→∞; specifying T would fix the decay rate if db and a were known.]

Figure 4.5-2e is the first example in which the jobs are of different sizes.
One high-priority job (6L) fills memory, while two other jobs (6S and 3) are each half as large as 6L. Note that the two possible states of memory (6L versus 6S and 3) each exist half the time, so that the low-priority job gets as much memory time as if it had the same priority as the other two jobs. This unfairness cannot be remedied by making job 3's priority decay faster, because the other two jobs completely determine scheduling.

[Figure 4.5-2e. Memory priority variation (p versus t for jobs 6L, 6S, and 3; alternating memory states for figures 4.5-2e and 4.5-2f)]

Figure 4.5-2f shows the same job mix as figure 4.5-2e, but the rates of priority rise and decay are both inversely proportional to job size. The two possible states of memory each exist half the time, just as before. No gain is made in fairness, but making both rates depend on job size in this way improves the efficiency of the operating system, because larger jobs take more time to roll in and out. The strength of the relationship between job size and rate of priority change depends on the value of I@JSHJSW.

[Figure 4.5-2f. Memory priority variation (p versus t)]

Interrelationship between a, db, and user priorities

By choosing a value for the amplitude, a, an installation determines the range of permissible user priorities, so that the difference between the minimum and maximum $p_0$ is $15 - 2a$. Furthermore, any two jobs whose $p_0$ values differ by as much as $2a - db$ can never force each other to roll out. Therefore, if memory swapping is to affect all jobs, the installation should make sure that $15 - 2a$ is less than $2a - db$, that is, that $db < 4a - 15$. The installation also selects a default priority (I@MEMPRI) that is used when the P option is missing from the job statement. This priority should be in the middle of the permissible range; ideally, 7 or 8.
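The memory-priority update and the swap rule can be exercised numerically. This Python sketch is a simplified illustration with hypothetical function names. It assumes the priority moves toward $p_0 - a$ while the job is in memory and toward $p_0 + a$ while it waits (consistent with the asymptotes quoted for figure 4.5-2a), uses a single rate ρ for both directions, and treats "swapped when the difference equals or exceeds db" as the waiting job's priority minus the resident job's. It also checks the factor of about $1 + \rho\Delta t/2$ by which one linear step overshoots the exact exponential change, and the condition $db < 4a - 15$.

```python
import math


def priority_step(p: float, p0: float, a: float, in_memory: bool,
                  rho: float, dt: float, jsw: float = 0.0,
                  jsize: float = 0.0, flmax: float = 1.0) -> float:
    """One scheduling interval of
    dp = (p0 +/- a - p) * (1 - jsw * jsize / flmax) * rho * dt."""
    target = p0 - a if in_memory else p0 + a   # assumed sign convention
    size_factor = 1.0 - jsw * (jsize / flmax)
    return p + (target - p) * size_factor * rho * dt


def should_swap(p_waiting: float, p_resident: float, db: float) -> bool:
    """Swap when the waiting job's priority exceeds the resident job's
    by at least the deadband db."""
    return p_waiting - p_resident >= db


def overshoot_factor(x: float) -> float:
    """Linear step (x = rho*dt) divided by the exact exponential change."""
    return x / (1.0 - math.exp(-x))


def swapping_affects_all_jobs(a: float, db: float) -> bool:
    """True when the p0 span 15 - 2a is less than 2a - db."""
    return 15 - 2 * a < 2 * a - db
```

With the default installation-style values used in the figures (a = 3, db = 2), `swapping_affects_all_jobs(3, 2)` is False: the permissible $p_0$ span of 9 exceeds $2a - db = 4$, so widely separated priorities never force each other out.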
Job class scheduling

Job class scheduling consists of identifying sets of jobs having common characteristics so that they can receive special handling. Special handling occurs during job initiation (when a job receives a JXT) and/or job processing (when a job is given the CPU and/or memory).

A class receives a job initiation advantage when a number of JXTs are reserved for its exclusive use. A job belonging to that class is initiated as soon as it enters the input queue unless all of that class's reserved JXTs are allocated. If they are, a JXT may be drawn from the JXT pool. The JXT pool consists of all the nonreserved JXTs available to the system. Each class may draw from the pool when all of its reserved JXTs are allocated. The total number of a class's allocated JXTs, both reserved and drawn from the pool, must not exceed the class's maximum JXT limit. A relative advantage is given to jobs in a class that has a large maximum JXT limit.

A class receives an additional job initiation advantage when it is assigned a high class rank in a CLASS directive, because JXTs are allocated from the pool in class rank order. Job initiation is disabled when a class is turned OFF in a CLASS directive; that is, no new JXTs are allocated to a class that is OFF. Members of such a class remain in the input queue until the operator turns the class ON. Job processing is affected when the class provides a class priority that overrides the job priority of all of its members.

Job class structure

Job class scheduling is defined by the job class structure in effect. See JCSDEF for a detailed description of a job class structure. After a system Install, the following default job class structure is in effect:

  SNAME,SN=DEFAULT.
  CLASS,NAME=JOBSERR,RANK=1,CHAR=JSE,RES=0,MAX=63.
  CLASS,NAME=NORMAL,RANK=2,CHAR=ORPH,RES=0,MAX=63.
  SLIMIT,LI=15.

When the default structure is in effect, all jobs are classified as normal.
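The initiation rules above can be sketched as a single allocation loop. The Python below uses hypothetical names (`JobClass`, `allocate_jxt`, `initiate`) and two assumptions worth flagging: a lower RANK number is treated as a higher rank, and "JXTs are allocated from the pool in class rank order" is interpreted as serving classes one at a time in that order. Each allocation takes a reserved JXT first, then a pool JXT, never exceeds the class's MAX, and skips classes that are OFF.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobClass:
    name: str
    rank: int              # lower number = higher rank (assumed)
    res: int               # JXTs reserved for this class (RES=)
    max_jxt: int           # maximum JXTs for this class (MAX=)
    on: bool = True        # CLASS directive ON/OFF
    reserved_used: int = 0
    pool_used: int = 0
    waiting: int = 0       # jobs of this class in the input queue


def allocate_jxt(cls: JobClass, pool_free: int) -> Optional[str]:
    """Give one waiting job of `cls` a JXT; return "reserved", "pool",
    or None if nothing can be allocated."""
    if not cls.on or cls.waiting == 0:
        return None
    if cls.reserved_used + cls.pool_used >= cls.max_jxt:
        return None                      # class is at its MAX
    if cls.reserved_used < cls.res:
        cls.reserved_used += 1           # reserved JXTs are used first
        source = "reserved"
    elif pool_free > 0:
        cls.pool_used += 1               # then draw from the shared pool
        source = "pool"
    else:
        return None
    cls.waiting -= 1
    return source


def initiate(classes: List[JobClass], pool_free: int) -> int:
    """Serve classes in rank order; return the pool JXTs left over."""
    for cls in sorted(classes, key=lambda c: c.rank):
        while True:
            source = allocate_jxt(cls, pool_free)
            if source is None:
                break
            if source == "pool":
                pool_free -= 1
    return pool_free
```

Because the higher-ranked class drains the pool first, a lower-ranked class may end up limited to its reserved JXTs, which is the rank advantage the text describes.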
The operator may use the LIMIT command to set the maximum number of JXTs to the structure's recommended value, 15. When the LIMIT command sets the maximum number of active JXTs to less than the total reserved JXTs, the JXTs are reserved in class rank order until they are exhausted.

The utility JCSDEF can be used to define a job class structure from a series of card image directives. The operator can run JCSDEF to invoke a new class structure at any time without a system interruption. A job class structure can be recovered or invoked at system Startup.

Job class structure monitoring

The effectiveness of a job class structure can be monitored by system log entries. When a structure is invoked or recovered, COS enters a message into the system log recording the structure's name. Additionally, the System Performance Monitor task logs scheduling status information at regular intervals. Scheduling status information includes:

• Number of jobs in the system
• Number of active JXTs
• Number of available pool JXTs
• Number of active JXTs in each class
• Number of classes waiting for JXTs
• Number of jobs waiting for JXTs in each class
• RES, MAX, and ON/OFF values of each class

COS enters a message into the system log when a job enters the input queue and again when it is given a JXT. The system utility EXTRACT can be used to generate job class reports.

4.5.3 TUNING THE SYSTEM

To reduce thrashing (excessive rolling in and out), several adjustments are possible. For quick results, an installation might increase the deadband, decrease the amplitude, and/or decrease the rise and decay rates, all of which affect high-level scheduling. A much slower change occurs if the operator reduces the maximum number of jobs in the mix. The most drastic approach is to reduce the maximum field length, thus preventing very large jobs from ever being able to run.
To reduce turnaround time for low-priority jobs (at the expense of system throughput), an installation might take the opposite course: increase the number of jobs in the mix, increase the rise and decay rates, increase the amplitude, and/or decrease the deadband.

Installation parameters

Installations can change several JSH parameters to suit their specific needs. (Parameters are made adjustable by using a memory word for each parameter, rather than equated values assembled into multiple locations, and by providing a means for changing them during deadstart and/or while the system is running.) A list of installation parameters (including JSH parameters) and the values assembled into the released system is given in the COS Operational Procedures Reference Manual, publication SM-0043.

Time slice

The time slice is the maximum amount of continuous execution time that a user job is allowed before it is disconnected from the CPU. This maximum normally affects only CPU-bound jobs, because an I/O-bound job usually relinquishes the CPU before its time slice expires.

The formula used by the Job Scheduler allows an installation to determine each job's time slice. If I@TSCTM=0, the time slice is directly proportional to the job's initial memory priority. If I@TSMPM=0, all jobs have the same time slice, as determined by the mean connect time for all jobs in the past scheduling interval. A position anywhere between these two extremes is also possible.

The value of I@TSMIN is important. The smaller it is, the more sensitive the time slice computation is to variations in the other four terms, and the more likely it is that the system will be subjected to excessively frequent exchanges when strongly I/O-bound jobs are sharing the CPU. For the latter reason, very small values of I@TSMIN are to be avoided. A value of 0, however, turns off the entire time-slice mechanism in favor of a strategy of instant preemption by the higher priority job, as described under Low-level scheduling.
Maximum number of jobs in mix (LIMIT command)

The maximum number of jobs in the mix (JXTMAX) is set by the operator using the station's LIMIT command. JXTMAX is usually smaller than the number of jobs waiting to be initiated but larger than the number of jobs that can be in memory together. If JXTMAX is greater than the actual number of jobs in the mix, jobs are taken from the input queue (where they are ordered by priority) and added to the mix. The operator can reduce this parameter to 0 to prevent any new jobs from being added while the operator cancels, or lets complete, all jobs in the mix. No job can be added to a full mix, regardless of its priority, until one of the running jobs terminates.

Deadband

Increasing the deadband decreases the importance of priority levels. As an extreme example, if the deadband height were 10 (with a = 3 and Po therefore ranging from 3 to 13), all jobs would inevitably run to completion without ever being forced to roll out. Even with the deadband height as low as 2 (with a = 3), a job with Po = 7 would never roll out to make way for one with Po = 3; however, reducing the deadband height to 1.9 in this case would allow the job with Po = 7 to eventually be forced out by the job with Po = 3.

Rise and decay rates

Decreasing the rates at which memory priorities rise and fall reduces roll-out activity, increases the advantage of jobs with higher initial priorities, and greatly increases the turnaround time of some low-priority jobs. Infinitely slow aging rates prevent any rolling out, just as if a large deadband were used.

Weighting factors

I@JSHPHW: The past-history weighting factor. When increased, this factor reduces the effect (on CPU sharing) of momentary fluctuations in the character of a job from CPU-bound to I/O-bound or vice versa.

I@JSHJSW: The job-size weighting factor.
At 1.0, this factor allows close correspondence between job size and rollout frequency, so that a job that is twice as large is rolled to disk about half as often. If this factor is set to 0.0, job size is not considered at all in computing the changes in memory priority.

4.5.4 MEMORY MANAGEMENT

The Job Scheduler allocates memory for job initiation and rolling in. It also handles requests made by the currently running job for changes in field length. If a request for more memory cannot be satisfied by annexing part of an area contiguous to the current job, the job must be moved to another free area or rolled out.

The first-fit method is used for allocating memory to jobs that are to be moved or rolled in. That is, JSH allocates from the first free block of memory that is large enough, always beginning its search at the low end of memory to encourage large blocks of free memory to grow at the high end.

If a memory requirement cannot be satisfied immediately, but could have been satisfied if the available blocks were collected into one large block, a storage compaction mechanism is enabled and remains enabled as long as it can help to satisfy such requirements. If compaction is enabled, it takes place after all allocations that can be made without compaction have been made. During compaction, only jobs in wait state (not running and not in I/O suspension) can be moved, so the result is necessarily imperfect: more than one free area will very likely remain. After each cycle of compaction, JSH tries the unsatisfied requests again in the same order in which they originally occurred. When all requests are satisfied, compaction is disabled.

The compaction mechanism moves allocated blocks from the high end of memory to the low end, again (like the first-fit allocation strategy) to encourage the growth of large blocks of free memory at the high end. Jobs to be moved are selected starting at the low end of memory.
For each job that can be moved, the memory it occupies is temporarily freed and a search is made for a segment of the same size, starting at the low end of memory. (The search is always successful because it can return the same segment or one that overlaps it.) The job is relocated to the found segment, if necessary, and compaction resumes with the next movable job.

The compaction mechanism is enabled based on a tally of the total amount of memory available, including blocks that will become available when previously requested rollouts (if any) are completed. If a job is promised memory (to be allocated when made available by compaction), this tally is reduced accordingly. Compaction does any good only when an unsatisfied memory request is less than this total amount; larger requests must wait for rollouts to occur (rollouts triggered by changes in memory priorities).

A table of memory segment descriptors (MST) is maintained by the Job Scheduler. Each 1-word entry in this table corresponds to a free or allocated memory segment. A newly freed segment adjacent to another free segment is immediately combined with it. All of the memory that is allocatable by JSH is initialized as a single free block.

For a description of how the Job Scheduler handles requests for changes in field length, see Allocate Request in section 4.5.7.

4.5.5 JOB STARTUP

When JSH sets up a job for its first shot at the CPU, it allocates the amount of memory specified by the M parameter on the JOB statement (first adjusting this value if it exceeds I@JFLMAX or is smaller than I@JFLMIN). It then initializes the job's JXT entry and the Job Table Area (JTA). Most of the JTA is initially filled with zeros.
The exceptions are the following fields:

Field     Description
JTJN      7-character job name (from JXT)
JTUSR     15-character user number (from SDT)
JTXP+1    Bits 18-35 (BA)
JTXP+2    LA and flags
JTJCB     JCB pointers
JTCMSG    Conditional message flags
JTEPJ     JSH request flag
JTSID     Source ID (from SDT)
JTDID     Destination ID (from SDT)
JTJXT     Pointer to JXT entry
JTTID     Terminal ID (from SDT)
JTDNT     3 DNTs (the first 2 for system use only), in the following order: $CS, $LOG, and $IN. The DNTs for $CS and $IN refer to the same dataset.
JTCDP     DSP list for $CS
JTLDP     DSP list for $LOG

JSH makes the DNTs for $CS and $LOG unavailable to the user by storing a nonzero value in the low-order 8 bits of the first word in each DNT. The names $CS and $LOG, therefore, cannot be found by FSDNT and are used only as reference points in a dump. These two DNTs are always placed in the JTA in the order given above. The user DNTs, theoretically, may be in any order following the first two. (The DNTs for $CS and $IN refer to the same dataset.)

JSH sets up DSPs for $CS and $LOG in the JTA, initializing the dataset name (the same as the dataset name in the DNT) and the four I/O pointers: FIRST, IN, OUT, and LIMIT. $CS is opened for input and $LOG is opened for output.

The DNTs are initialized as shown in table 4.5-1. The DNT for the roll dataset is in the JXT rather than in the JTA.

Table 4.5-1. DNT initialization

Files initialized (left to right): roll dataset ('ROLLDNT'), control statement file ('$CS'§), LOG file ('$LOG'§), and standard input dataset ('$IN').

DNT field  Type     Values
DNOC       binary   '11'B   '10'B   '01'B   '00'B
DNP        binary   1§§
DNDC       ASCII    'SC'
DNDN       ASCII    -   -   -   'IN'   'PR'   -   'IN'
DNDAT      address  -
DNPDS      binary   0   1   0
DNACS      octal    0007   0007   0007   0007
DNBFZ      decimal  -   1   1   4§§§
DNDSP      address  -   yes   yes   copied from SDT   copied from SDT   1   no

§ The dataset names $CS and $LOG are unavailable to the user because the low-order byte of the word in which each name is stored is set to a nonzero value.
The roll dataset's DNT is unavailable because it is not in the JTA.

§§ The DNP (processing direction) flag for the roll dataset is toggled according to the expected direction of the next I/O transfer.

§§§ The buffer size for $IN is an installation-dependent parameter. The numbers given for DNBFZ are multiples of 512-word blocks.

4.5.6 JOB STATUS AND STATE CHANGES

The 23-bit status field (JXSTAT) in each job's JXT entry is described in table 4.5-2. The bits labeled Q, R, X, I, U, L, S, O, and M determine the job's state; the other bits modify the job's state. If all of bits 3 through 22 are 0, the job is said to be waiting to be connected to the CPU (state W).

Table 4.5-2. Status bit assignments

Bit position  Bit   Corresponding  Interpretation (when bit is set)
in JXSTAT     name  job state
0             K     S              Keep this job in memory; do not roll it out.
1             A     any            Abort pending; reason given in JXEPC.
2             H     O              Holding operator or shutdown suspension until RN is set.
3             O     O              Suspended (indefinitely) by operator.
4             S     S              Suspended (possibly by system).
5             T     S              Suspended until a given time elapses.
6             E     S              Suspended until a given event occurs.
7             M     M              Memory allocation is pending.
8             Q     Q              Queued up; waiting to be initiated.
9             R     R              Rolled out. The M bit may also be set.
10            X     X              Executing.
11            I     I              Dormant pending recall on I/O completion.
12            C     any            Rerun request in process.
13            D     any            Delete request in progress.
14            U     U              Unloading from memory to roll file.
Table 4.5-2. Status bit assignments (continued)

Bit position  Bit   Corresponding  Interpretation (when bit is set)
in JXSTAT     name  job state
15            L     L              Loading into a new memory area.
16            P     U or L         Unload or load initiation is pending.
17            Y     M, Q, or R     Waiting for memory liberation.
18            Z     M, Q, or R     Waiting for memory compaction.
19            B     S              Suspended (indefinitely) by recovery.
20            V     I              Waiting on INDEX write completion.
21            F     I              Waiting on rollfile write completion.
22            N     M, Q, or R     Not in memory.

Figure 4.5-3 and table 4.5-3 illustrate some of the transitions that normally occur between job states.

[Figure 4.5-3. Normal transitions between job states. Transition labels in the diagram include: must wait for I/O; disconnection (time slice expired); I/O is complete; allocate request; roll-in is done; roll-out forced by another job; preempts memory from another job; roll-out is done; operator SUSPEND command; operator RESUME command.]

Table 4.5-3. State-change sequences

Sequence            Explanation
QN → (W) → X        A job is started up for the first time.
X → I → W           A job requests recall on I/O completion.
X → W               A job's CPU time slice expires.
W → X               A job is given another time slice.
W → UN → RN         A job is rolled out.
RN → LN → W         A job is rolled in.
X → MN → W          A job's memory allocation is adjusted.
X → MN → UN → RN    Not enough memory is immediately available.
X → OUN → ORN       A job is suspended by the operator.
ORN → RN            A job is resumed.

State changes involved in CPU swapping

Figure 4.5-3 shows most of the state changes that can occur for any particular job. The state changes shown with broad solid lines are the basic changes that all jobs must undergo. If all the jobs in the JXT can fit into memory at the same time, these are also the only state changes the jobs undergo, as long as they make no memory requests and open no auxiliary datasets.

QN → X   A job that has been queued in the JXT (Job Execution Table) waiting for sufficient memory to become available is given its first CPU time slice.
(The job actually exists momentarily in the W state before it begins executing; but because the new job's CPU priority is initialized to the highest possible value, the transition to state X is immediate.)

X → I   The currently executing job becomes dormant by requesting suspension pending the completion of a particular I/O transfer. The CPU becomes available for use by another job.

I → W   The I/O transfer for which suspension was requested is now complete. The job joins others that may be waiting for CPU time.

X → W   The executing job's time slice expires. Unless it is the only job that is running, it is disconnected from the CPU and joins any other jobs that may be waiting.

W → X   The CPU has just become available. JSH selects the waiting job that has the highest CPU priority (excluding the job that was just disconnected) and connects it to the CPU.

State changes involved in memory swapping

In figure 4.5-3, the paths involved in rolling jobs in and out are shown as dashed lines. Any job that is in state W, waiting for more CPU time, is liable to be rolled out if it occupies space that can be used by another job with sufficiently high memory priority. In essence, jobs in states W and R exchange places, but each must first pass through an intermediate state (U or L).

W → UN    A roll-out I/O request is initiated for a waiting job in order to make memory available.

UN → RN   The roll-out I/O request is complete; the job's memory is released.

RN → LN   Memory is allocated for the job and a roll-in I/O request is initiated.

LN → W    The roll-in I/O request is complete; the job begins contending for CPU time.

State changes involved in job suspension and reactivation

In figure 4.5-3, the suspended states are connected to the rest with narrow solid lines. A job may be momentarily suspended when it makes an allocation request (J$ALLOC).
Shortly after it suspends the job, JSH checks for active I/O requests; if there are no I/O requests and the allocation request can be satisfied, the suspension is lifted. The suspension is kept in force if the job must be moved but there is no space for it right away; that is, the M bit remains set and the job is then liable to be rolled out if memory is in demand. Suspension can also occur as a result of an explicit user request to suspend processing until a given event has occurred (J$AWAIT) or until a given time has elapsed (J$DELAY). Finally, a job can be suspended and resumed by the operator to prevent the job from using any system resources.

X → M   The currently executing job makes an allocation request and is considered dormant until the request can be satisfied. If the request involves the movement of I/O buffers or tables, it cannot be satisfied until all the job's I/O is done. If, after all I/O is done, the request still cannot be satisfied, the job may be rolled out.

MN → W   The allocation request was satisfied before the job could be rolled out. The job joins any other jobs that may be waiting for CPU time.

MN → UN   The job has been rolled out because memory is in demand. More space was required to satisfy the allocation request than could be obtained merely by reallocating memory.

X → S   The currently executing job makes a suspension request (J$DELAY or J$AWAIT). The job is disconnected and may be rolled out if memory is in demand.

W → O   A job in memory is suspended by operator intervention. A job that is operator-suspended (O) is always rolled out when active I/O finishes.

S → SUN   A roll-out I/O request to make memory available is initiated for a suspended job.

SUN → SRN   The roll-out I/O request is complete; the job's memory is released.

ORN → RN   A job that was suspended by the operator (and subsequently rolled out) is reactivated by the operator. The job is rolled back in when memory is available.
SRN → RN   A job that was suspended by the system (and subsequently rolled out) is reactivated because a given time elapsed or a particular event occurred. The job is rolled back in when memory is available.

S → W   A job that was suspended by the system is reactivated while still in memory.

4.5.7 JSH INTERFACE WITH OTHER TASKS

The Job Scheduler task is created, along with all other system tasks, by the startup procedure. It can then be called by any other task through the instruction sequence shown later in this subsection.

JSH always replies to each request by setting the appropriate output registers and readying the requesting task. However, the reply is not always immediate. For some requests, JSH must wait until memory is available or until an I/O transfer is complete before it replies, so that the requesting task may proceed correctly. To enable the requesting task to determine what a delayed reply means, JSH echoes the entire contents of the first input register as the second output register. This word contains the JSH function code, the JXT ordinal identifying the job, and additional information as supplied by the requesting task. A status indicator is returned in the first output register.

Input register format:

INPUT+0:  AUX (bits 0-23) | CODE or ADDR (bits 24-47) | FC (bits 48-52) | JXO (bits 53-63)

Field  Word     Bits   Description
AUX    INPUT+0  0-23   Auxiliary information; unused by JSH. (Any value the caller places in INPUT+0 is returned verbatim in OUTPUT+1.)
CODE   INPUT+0  24-47  (J$ABORT request only) Abort code; use equated labels of the form A$xxxxx, where xxxxx = DROP, KILL, RERUN, or other predefined abort code. These abort codes are also used in the J$UROLL request.
ADDR   INPUT+0  24-47  Word address, relative to the beginning of STP, of an additional word or list of words if needed to fully specify the call. Refer to the individual function descriptions for more detail.
FC     INPUT+0  48-52  Function code; use equated labels of the form J$xxxx, selected from table 4.5-4.
JXO    INPUT+0  53-63  JXT ordinal for the job in question.
                       It can assume a value from 1 to I@JXTSIZ. It cannot be 0.

Output register format:

OUTPUT+0: STATUS (bits 0-63)
OUTPUT+1: AUX (bits 0-23) | CODE or ADDR (bits 24-47) | FC (bits 48-52) | JXO (bits 53-63)

Field   Word      Bits   Description
STATUS  OUTPUT+0  0-63   Status of requested function:
                         =0  Requested function completely accomplished
                         ≠0  Error, or system is unable to fulfill request completely
AUX     OUTPUT+1  0-23   Auxiliary information; unused by JSH. (Any value the caller places in INPUT+0 is returned verbatim in OUTPUT+1.)
CODE    OUTPUT+1  24-47  (J$ABORT request only) Abort code; use equated labels of the form A$xxxxx, where xxxxx = DROP, KILL, RERUN, or other predefined abort code. These abort codes are also used in the J$UROLL request.
ADDR    OUTPUT+1  24-47  Word address, relative to the beginning of STP, of an additional word or list of words if needed to fully specify the call. Refer to the individual function descriptions for more detail.
FC      OUTPUT+1  48-52  Function code; use equated labels of the form J$xxxx, selected from table 4.5-4.
JXO     OUTPUT+1  53-63  JXT ordinal for the job in question. It can assume a value from 1 to I@JXTSIZ. It cannot be 0.

Calling sequence

JSH can be invoked from any other task by calling either TSKREQ or PUTREQ with the following instruction sequence:

Location  Result  Operand
          A2      JSHID,0
          S1      function code (already shifted)
          S2      job's ordinal in JXT
          S1      S1!S2
          S2      address, if any
          S2      S2<D'16
          S1      S1!S2
          S2      auxiliary information, if any