CRAY Y-MP™, CRAY X-MP EA™,
CRAY X-MP™, and CRAY-1®
Computer Systems

UNICOS® On-line Diagnostic
Maintenance Manual

SMM-1012 C

Cray Research, Inc.

CRAY PROPRIETARY
Dissemination of this documentation to non-CRI personnel requires
approval of the appropriate vice president and that the recipient sign
a nondisclosure agreement. Export of technical information in this
category may require an export license.

CRAY PROPRIETARY
Dissemination of this documentation to non-CRI personnel requires approval from the
appropriate vice president and a nondisclosure agreement. Export of technical
information in this category may require a Letter of Assurance.
Restricted Rights Legend
Use, duplication, or disclosure by the Government is subject to restrictions as set forth
in the subparagraph [(c) (1) (ii)] of the rights in Technical Data and Computer
Software clause at 52.227-7013. (May 1987)
Cray Research, Inc.
608 2nd Avenue South
Minneapolis, MN 55402
Cray Research, Inc.
Unpublished Proprietary Information - All Rights Reserved under the copyright laws
of the United States and the U.C.C.
CRAY, CRAY-1, HSX, SSD, and UNICOS are registered trademarks and CFT, CFT77,
CFT2, COS, Cray Ada, CRAY-2, CRAY X-MP, CRAY X-MP EA, CRAY Y-MP, CSIM,
Delivering the power..., IDS, SEGLDR, and SUPERLINK are trademarks of
Cray Research, Inc.
HYPERchannel and NSC are registered trademarks of Network Systems Corporation.
IBM is a registered trademark of International Business Machines Corporation.
Motorola is a registered trademark of Motorola, Inc. Sun Workstation is a registered
trademark and Sun is a trademark of Sun Microsystems, Inc. UNIX is a registered
trademark of AT&T. VMEbus is a trademark of Motorola, Inc.
The UNICOS operating system is derived from the AT&T UNIX System V operating
system. UNICOS is also based in part on the Fourth Berkeley Software Distribution
under license from The Regents of the University of California.
Due to space restrictions, the following abbreviations are used in place of the specific
system names:
CX/1      Includes all models of the CRAY X-MP and CRAY-1 computer systems

CEA       Includes all models of the Extended Architecture (EA) series, including
          the CRAY Y-MP and CRAY X-MP EA computer systems

CRAY-2    Includes all models of the CRAY-2 computer system

CX/CEA    Includes all models of the CRAY X-MP computer systems plus all
          models of the CRAY Y-MP and CRAY X-MP EA computer systems. It
          does not include the CRAY-1 computer systems.

Requests for copies of Cray Research, Inc. publications should be sent to the following
address:
Cray Research, Inc.
Distribution Center
2360 Pilot Knob Road
Mendota Heights, MN 55120

NEW AND ENHANCED FEATURES

This UNICOS release 5.0 overview describes the new and enhanced features
contained in the CRAY Y-MP, CRAY X-MP EA, CRAY X-MP, and CRAY-1 Computer
Systems UNICOS On-line Diagnostic Maintenance Manual, CRI publication
SMM-1012.
With UNICOS 5.0, there is support for diagnostics that run on CRAY Y-MP
and CRAY X-MP EA computer systems, as follows:

• Y-mode (32-bit addressing), available only as indicated in
  appendix A, On-line Diagnostic Programs

• X-mode (24-bit addressing), unless otherwise indicated

Specific new and enhanced features are as follows:

Feature       Status     Section   Description

cleario       Enhanced   6         Adds support for the Operator Workstation
                                   (OWS) and the CRAY Y-MP and CRAY X-MP EA
                                   computer systems.

dsdiag        Enhanced   6         Adds support for the OWS and the CRAY Y-MP
                                   and CRAY X-MP EA computer systems.

donut         New        5         On-line disk maintenance program

offmon        New        2         Off-line confidence monitor

olcfpt        New        3         Comprehensive floating-point instructions
                                   and data test

olcm          New        3         Common memory test

olcrit        Enhanced   3         Adds cluster selection.

oldmon        New        5         Down CPU monitor

olhpa         Enhanced   7         Adds support for DD-40 disk drives, SSD
                                   errors, and the CRAY Y-MP and CRAY X-MP EA
                                   computer systems.

olibuf        New        3         Instruction buffer test

olsbt         New        3         On-line semaphore, shared B and shared T
                                   register test

runsequence   Enhanced   7         Adds examples of sequence files used for
                                   testing and file cleanup. Invokes one
                                   less shell.

unitap        New        5         On-line magnetic tape test

CRAY RESEARCH, INC.

RECORD OF REVISION                          PUBLICATION NUMBER SMM-1012

Each time this manual is revised and reprinted, all changes issued against the previous version are incorporated into the new version
and the new version is assigned an alphabetic level.
Every page changed by a reprint with revision has the revision level in the lower right-hand corner. Changes to part of a page are noted
by a change bar in the margin directly opposite the change. A change bar in the margin opposite the page number indicates that the
entire page is new. If the manual is rewritten, the revision level changes but the manual does not contain change bars.
Requests for copies of Cray Research, Inc. publications should be directed to the Distribution Center and comments about these
publications should be directed to:

CRAY RESEARCH, INC.
1345 Northland Drive
Mendota Heights, Minnesota 55120

Restricted Rights Legend
Use, duplication, or disclosure by the Government is subject to restrictions as
set forth in the subparagraph [(c)(1)(ii)] of the Rights in Technical Data and
Computer Software clause at 52.227-7013. (May 1987) Cray Research, Inc.,
608 2nd Avenue South, Minneapolis, Minnesota 55402

Revision    Description

            September 1986 - Original printing. This printing supports
            the on-line diagnostic tests that run under the Cray operating
            system UNICOS, release 2.0, on the CRAY X-MP and CRAY-1
            computer systems. The on-line diagnostic tests for CRAY-1
            computer systems are not available for UNICOS release 2.0.
            All trademarks are listed in the record of revision.

A           June 1987 - Rewrite. This printing supports the on-line
            diagnostic tests that run under the Cray operating system
            UNICOS, release 3.0, on CRAY X-MP and CRAY-1 computer systems.

B           July 1988 - Rewrite. This printing supports the on-line
            diagnostic tests that run under the Cray operating system
            UNICOS, release 4.0, on CRAY Y-MP, CRAY X-MP EA, CRAY X-MP,
            and CRAY-1 computer systems.

C           March 1989 - Rewrite. This printing supports the on-line
            diagnostic tests that run under the Cray operating system
            UNICOS, release 5.0, on CRAY Y-MP, CRAY X-MP EA, CRAY X-MP,
            and CRAY-1 computer systems.

SMM-1012 C

CRAY PROPRIETARY

iii

PREFACE

This manual describes the on-line environment for diagnostic tests that
run under the Cray operating system UNICOS, release 5.0, on CRAY Y-MP,
CRAY X-MP EA, CRAY X-MP, and CRAY-1 computer systems. It is intended for
Cray Research, Inc. (CRI) field engineers and analysts. A working
knowledge of UNICOS is assumed.

CONVENTIONS
To aid in identifying the various groups of Cray mainframes, this manual
uses the naming conventions shown in the Hardware Product Line sheet,
which is located at the end of the preface. The Hardware Product Line
sheet shows both the chronological evolution of Cray mainframes and the
characteristics of each group. The reverse side contains definitions of
the terms used on the sheet and throughout this manual.
The conventions for entering the diagnostic commands are as follows:

Convention        Description

bold              Bold indicates one of the following:
                    Diagnostic program
                    Command option
                    Man page entry
                    File name

italic            Italic indicates variable or user-supplied
                  information.

O'x               The prefix O' indicates that x is an octal value.

RETURN            This indicates the RETURN key. You must press
                  RETURN after entering each keyboard command.

[ ]               Square brackets indicate optional items.

+option           A plus sign (+) preceding a command option indicates
                  that the option is enabled.

-option           A minus sign (-) preceding a command option indicates
                  that the option is disabled.

command(1)        This refers to an entry in the UNICOS User Commands
                  Reference Manual, CRI publication SR-2011.

command(1M)       This refers to an entry in the UNICOS Administrator
                  Commands Reference Manual, CRI publication SR-2022.

system call(2)    This refers to an entry in the UNICOS System Calls
                  Reference Manual, CRI publication SR-2012.

entry(4X)         This refers to an entry in the UNICOS File Formats and
                  Special Files Reference Manual, CRI publication
                  SR-2014. The X indicates the section of the manual
                  that contains the entry.
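
The octal prefix and the +option/-option conventions can be illustrated with a small parser. The following Python sketch is for illustration only: the function names parse_octal and parse_option and the option name "verbose" are hypothetical and are not part of any CRI diagnostic program.

```python
def parse_octal(text):
    """Convert a value written with the O' prefix (octal) to an integer."""
    if not text.startswith("O'"):
        raise ValueError("expected the O' octal prefix: " + repr(text))
    # Everything after the prefix is interpreted in base 8.
    return int(text[2:], 8)


def parse_option(text):
    """Return (name, enabled) for a +option or -option argument."""
    if len(text) < 2 or text[0] not in "+-":
        raise ValueError("expected +option or -option: " + repr(text))
    # A plus sign enables the option; a minus sign disables it.
    return text[1:], text[0] == "+"


print(parse_octal("O'177"))      # 127
print(parse_option("+verbose"))  # ('verbose', True)
print(parse_option("-verbose"))  # ('verbose', False)
```

The sketch only shows how the notation is read; each diagnostic program defines its own options, as described in the program synopsis sections.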

OTHER PUBLICATIONS
CRI off-line diagnostic publications that may be of interest are as
follows:

HO-01004        CRAY-1 Computer Systems Diagnostic Ready Reference Guide
HO-01005        CRAY X-MP Computer Systems Diagnostic Ready Reference
                Guide
HO-01007        I/O Subsystem (IOS) Diagnostic Ready Reference Guide
HM-01010        CRAY X-MP Computer Systems IOS-based Diagnostic Reference
                Manual

CRI software publications that may be of interest are as follows:

SO-0083         CRAY Y-MP, CRAY X-MP EA, CRAY X-MP and CRAY-1 CAL
                Assembler Version 2 Ready Reference
SD-0235         Software Problem Report (SPR) User's Guide
SG-0307         I/O Subsystem (IOS) Administrator's Guide
SG-2005         I/O Subsystem (IOS) Operator's Guide for UNICOS
SR-2011         UNICOS User Commands Reference Manual
SR-2012         Volume 4: UNICOS System Calls Reference Manual
SR-2014         UNICOS File Formats and Special Files Reference Manual
SR-2022         UNICOS Administrator Commands Reference Manual
SN-3030         Operator Workstation (OWS) Guide

CRI hardware publications that may be of interest are as follows:

HR-0030         I/O Subsystem Model B Hardware Reference Manual
HR-0081         I/O Subsystem Model C/D Hardware Reference Manual
CSM-0110-000    CRAY X-MP/2 System Programmer Reference Manual
CSM-0111-000    CRAY X-MP/1 System Programmer Reference Manual
CSM-0112-000    CRAY X-MP/4 System Programmer Reference Manual
CSM-0400-000    CRAY Y-MP System Programmer Reference Manual

For additional information, refer to the on-line diagnostic listings.

UNICOS SYSTEM INSTALLATION BULLETIN
Refer to the UNICOS System Installation Bulletin for the following
information:
• Build and installation procedures
• Configuration guidelines

Each site receives this bulletin with the UNICOS release package. You
can order additional copies from the CRI Distribution Center.

Note that appendix G, Installation Information, describes the procedure
for on-line diagnostic re-installation subsequent to system installation.

READER COMMENTS
If you have any comments about the technical accuracy, content, or
organization of this manual, please tell us. You can contact us in any
of the following ways:
• Call our Technical Publications department at (612) 681-5729
  during the hours of 7:30 A.M. to 6:00 P.M. (Central Time).

• Send us electronic mail from a UNICOS or UNIX system, using the
  following UUCP addresses:

      uunet!cray!publications
      sun!tundra!hall!publications

• Send us electronic mail from a UNICOS or UNIX system, using the
  following ARPAnet address:

      publications@cray.com

• Send a facsimile of your comments to the attention of
  "Publications" at FAX number (612) 681-5602.

• Use the postage-paid Reader's Comment form at the back of this
  manual.

• Write to us at the following address:

      Cray Research, Inc.
      Technical Publications Department
      1345 Northland Drive
      Mendota Heights, Minnesota 55120

We value your comments and will respond to them promptly.


Hardware Product Line

CX/1 Systems

[Chart of Cray mainframe groups; the system names did not survive
reproduction. Legible characteristics of the three entries shown:
12.5-ns clock cycle, up to 1 Mword of memory; 12.5-ns clock cycle,
up to 4 Mwords of memory, introduction of I/O Subsystem (IOS);
up to 4 Mwords of memory.]

The following list defines architecture terms:

Term                      Definition

CX/1 systems              This group includes all models of the CRAY X-MP and CRAY-1
                          computer systems. It is characterized by 24-bit addressing
                          capabilities.

CEA systems               This group includes all models of the Extended Architecture
                          (EA) series, which are the CRAY Y-MP and CRAY X-MP EA
                          computer systems. It is characterized by 32-bit addressing
                          capabilities.

CRAY-2 systems            This group includes all models of the CRAY-2 computer
                          systems. It is characterized by 32-bit addressing
                          capabilities, large common memories, and immersion cooling.

CX/CEA systems            This group designates all models of CRAY X-MP computer
                          systems plus all models of the CRAY Y-MP and CRAY X-MP EA
                          computer systems. It does not include CRAY-1 computer
                          systems.

EAM bit (hardware)        In CX/1 systems, the EAM bit is the Enhanced Addressing
                          Mode bit in the Flag register. When set, it sign-extends
                          certain instructions for memory addressing in 8- and
                          16-Mword systems. In CEA systems, the EAM bit is the
                          Extended Addressing Mode bit in the Flag register. It is
                          set by the operating system to select either 24- or 32-bit
                          addressing.

EMA feature (software)    In CX/1 systems, EMA is the Extended Memory Addressing
                          feature for 8- or 16-Mword systems.

X-mode                    This term refers to the 24-bit addressing mode in CEA
                          systems. The operating systems select this mode with the
                          EAM bit in the Exchange Package.

Y-mode                    This term refers to the 32-bit addressing mode in CEA
                          systems. The operating systems select this mode with the
                          EAM bit in the Exchange Package.
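
The difference between the two addressing modes can be made concrete with a short calculation. The following Python sketch assumes word addressing and takes one Mword as 2**20 words; both are assumptions made for this illustration, and the function name addressable_words is hypothetical. The result is the size of the address range only, not the amount of memory installed in any particular system.

```python
MWORD = 2 ** 20  # one Mword, assumed here to be 1,048,576 words


def addressable_words(address_bits):
    """Number of distinct word addresses for a given address width."""
    return 2 ** address_bits


for mode, bits in (("X-mode", 24), ("Y-mode", 32)):
    words = addressable_words(bits)
    print(mode, "(%d-bit):" % bits, words // MWORD, "Mwords")
# X-mode (24-bit): 16 Mwords
# Y-mode (32-bit): 4096 Mwords
```

Under these assumptions, the 24-bit figure of 16 Mwords matches the largest memory size named in the EAM and EMA definitions above.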

CONTENTS

PREFACE
    CONVENTIONS
    OTHER PUBLICATIONS
    UNICOS SYSTEM INSTALLATION BULLETIN
    READER COMMENTS

1.  ON-LINE DIAGNOSTIC SYSTEM
    1.1  ON-LINE DIAGNOSTIC ENVIRONMENT
    1.2  ON-LINE DIAGNOSTIC PROGRAMS

2.  CONFIDENCE TEST AND MONITOR OVERVIEW
    2.1  ON-LINE CONFIDENCE MONITOR (olcmon)
    2.2  PROGRAM SYNOPSIS
    2.3  TEST EXECUTION
    2.4  TEST TERMINATION
    2.5  TEST EXAMPLES
    2.6  TEST MESSAGES
         2.6.1  Informative messages
         2.6.2  Error messages
    2.7  OFF-LINE CONFIDENCE MONITOR (offmon)

3.  CONFIDENCE TEST DESCRIPTIONS
    3.1  olcfdt
         3.1.1  Test synopsis
         3.1.2  Test examples
         3.1.3  Test messages
                3.1.3.1  Informative messages
                3.1.3.2  Error messages
    3.2  olcfpt
         3.2.1  Test synopsis
         3.2.2  Test execution
                3.2.2.1  Test initialization
                3.2.2.2  Random floating-point instruction and data generation
                3.2.2.3  Random floating-point instruction buffer simulation
                3.2.2.4  Random floating-point instruction buffer execution
                3.2.2.5  Comparison of simulation and execution results
                3.2.2.6  Error isolation
         3.2.3  Test termination
         3.2.4  Test examples
         3.2.5  Test messages
                3.2.5.1  Informative messages
                3.2.5.2  Error messages
    3.3  olcm
         3.3.1  Test synopsis
         3.3.2  Test execution
                3.3.2.1  Test initialization
                3.3.2.2  Test section execution
                         Test section 1
                         Test sections 2 and 3
                         Test section 4
                         Test section 5
                         Test section 6
                         Test section 7
                3.3.2.3  Comparison of expected and actual data
                3.3.2.4  Error report
         3.3.3  Test termination
         3.3.4  Test examples
         3.3.5  Test messages
                3.3.5.1  Informative messages
                3.3.5.2  Error messages
                3.3.5.3  Error output definitions
    3.4  olcrit
         3.4.1  Test synopsis
         3.4.2  Test execution
                3.4.2.1  Test initialization and hardware configuration detection
                3.4.2.2  Random instruction and data generation
                3.4.2.3  Random instruction buffer simulation
                3.4.2.4  Random instruction buffer execution
                3.4.2.5  Comparison of simulation and execution results
                3.4.2.6  Error isolation
         3.4.3  Test termination
         3.4.4  Test examples
         3.4.5  Test messages
                3.4.5.1  Test mode messages
                3.4.5.2  Informative messages
                3.4.5.3  Error messages
    3.5  olcsvc
         3.5.1  Test synopsis
         3.5.2  Test execution
                3.5.2.1  Test initialization and hardware configuration detection
                3.5.2.2  Random instruction and data generation
                3.5.2.3  Instruction buffer execution
                3.5.2.4  Comparison of execution results
                3.5.2.5  Error isolation
         3.5.3  Test termination
         3.5.4  Test examples
         3.5.5  Test messages
                3.5.5.1  Test mode messages
                3.5.5.2  Informative messages
    3.6  olibuf
         3.6.1  Test synopsis
         3.6.2  Test execution
                3.6.2.1  Test initialization
                3.6.2.2  CRAY X-MP computer system test buffer generation
                3.6.2.3  CRAY Y-MP computer system test buffer generation
                3.6.2.4  Test buffer execution
                3.6.2.5  Comparison of expected and actual data
                3.6.2.6  Error report
         3.6.3  Error isolation to the failing bit
                3.6.3.1  CX/1 system error isolation
                3.6.3.2  CRAY Y-MP computer system error isolation
         3.6.4  Test termination
         3.6.5  Test examples
         3.6.6  Test messages
                3.6.6.1  Informative messages
                3.6.6.2  Error messages
    3.7  olsbt
         3.7.1  Test synopsis
         3.7.2  Test execution
                3.7.2.1  Test initialization and hardware configuration detection
                3.7.2.2  Random instruction and data generation
                3.7.2.3  Random instruction buffer simulation
                3.7.2.4  Random instruction buffer execution
                3.7.2.5  Comparison of simulation and execution results
                3.7.2.6  Error isolation
         3.7.3  Test termination
         3.7.4  Test examples
         3.7.5  Test messages
                3.7.5.1  Test mode messages
                3.7.5.2  Informative messages
                3.7.5.3  Error messages

4.  MAINTENANCE TEST AND MONITOR OVERVIEW
    4.1  MAINTENANCE MONITOR (olmon)
    4.2  PROGRAM SYNOPSIS
    4.3  TEST EXECUTION
    4.4  TEST-SPECIFIC REQUIREMENTS
         4.4.1  olaht
         4.4.2  olCDm:
         4.4.3  olibz
    4.5  TEST TERMINATION
    4.6  TEST EXAMPLES
    4.7  TEST MESSAGES
    4.8  DIAGNOSTIC MEMORY IMAGE FOR MAINTENANCE TESTS

5.  DOWN-DEVICE PROGRAMS
    5.1  donut
         5.1.1  Disk selection
         5.1.2  Disk mode
                5.1.2.1  System mode
                5.1.2.2  Maintenance mode
         5.1.3  Warnings and messages
         5.1.4  Menu displays
         5.1.5  Program execution
         5.1.6  Main menu
                5.1.6.1  Commands to display submenus
                5.1.6.2  Commands to select display format
                5.1.6.3  Commands to set arguments
                5.1.6.4  Commands to display the data buffer
                5.1.6.5  Commands to display flaw table menus
                5.1.6.6  Commands to change the data buffer
                5.1.6.7  Commands to change the type of write command used
                5.1.6.8  Commands to display commands list
         5.1.7  Buffer Utility menu
         5.1.8  Error Utility menu
                5.1.8.1  Error Table menu
                5.1.8.2  Error Log menu
         5.1.9  Formatting menu
                5.1.9.1  Logical address of the sector ID
                5.1.9.2  Position field of the sector ID (DD-10s and DD-40s only)
                5.1.9.3  Examine Data Buffer menu
                5.1.9.4  ID Analysis menu (DD-10s, DD-39s, DD-40s, and DD-49s only)
                         ID analysis (DD-39s/49s)
                         ID analysis (DD-40s)
                         ID Analysis menu commands
                5.1.9.5  Parameter menu
         5.1.10 Surface Tests menu
                5.1.10.1 Write Data, Read Data and Compare, and Surface Analysis menus
                5.1.10.2 Examine Data Buffer menu
                5.1.10.3 Parameter menu
         5.1.11 Flaw Table Utility menus
         5.1.12 Error correction code test
         5.1.13 Parameter menu
         5.1.14 Exiting donut
         5.1.15 Program examples
    5.2  oldmon
         5.2.1  Down CPU tests
                Down CPU tests
                Modifications to the off-line diagnostic test base
         5.2.2  Program synopsis
         5.2.3  Program execution
                5.2.3.1  Default configuration files
                5.2.3.2  Test loop code
                5.2.3.3  Environment variables
         5.2.4  Display modes
                5.2.4.1  Scroll mode display
                5.2.4.2  Screen mode display
         5.2.5  Program commands
                5.2.5.1  Common arguments
                5.2.5.2  Append (a) and Dump (d) commands
                5.2.5.3  CPU command (c)
                5.2.5.4  Enter command (e)
                5.2.5.5  Execute command (x)
                5.2.5.6  Fill command (f)
                5.2.5.7  Go command (g)
                5.2.5.8  Halt command (h)
                5.2.5.9  Load command (l)
                5.2.5.10 Options command (o)
                5.2.5.11 Quit command (q)
                5.2.5.12 Redraw command (r)
                5.2.5.13 Shell escape command (!)
                5.2.5.14 Status command (s)
                5.2.5.15 Up command (u)
                5.2.5.16 View command (v)
                5.2.5.17 Write command (w)
         5.2.6  Program example
         5.2.7  Program messages
    5.3  unitap
         5.3.1  Program synopsis
         5.3.2  Interactive program execution
         5.3.3  Program menus
                5.3.3.1  Main Menu
                5.3.3.2  Variable Menu
                5.3.3.3  Test Menu
                5.3.3.4  Canned Test Menu
                5.3.3.5  Debug Menu
                5.3.3.6  Global Options Menu
                5.3.3.7  Hardware Layout Menu
         5.3.4  Debug tools
                5.3.4.1  Breakpoint Tool
                5.3.4.2  Channel Commands Tool
                5.3.4.3  Display Data Buffer Tool
                5.3.4.4  Compare Data Tool
                5.3.4.5  System Call History Tool
                5.3.4.6  Programming Tool
                5.3.4.7  Packet Status Tool
         5.3.5  Trace file
         5.3.6  Learn mode
         5.3.7  Program examples
         5.3.8  Program messages
                5.3.8.1  Messages with menu displays
                5.3.8.2  Messages without menu displays

6.  I/O SUBSYSTEM DEADSTART PROGRAMS
    6.1  SYSTEM CONFIGURATION
    6.2  cleario
         6.2.1  Program execution
         6.2.2  Program messages
                6.2.2.1  Informative messages
                6.2.2.2  Error messages
    6.3  dsdiag
         6.3.1  Program execution
                6.3.1.1  IOP-0 tests
                6.3.1.2  I/O Subsystem tests
                         dsmos16k
                         dsiom
                         dsiop
                         dsmos
                         dshsp
                         dslsp
         6.3.2  Program messages
                6.3.2.1  Informative messages
                6.3.2.2  Error messages
                         Messages applicable to all tests
                         IOP-0 messages
                         dsmos16k messages
                         dsiom messages
                         dsiop messages
                         dsmos messages
                         dshsp messages
                         dslsp messages

7.  UTILITY PROGRAMS
    7.1  olhpa
         7.1.1  Program synopsis
         7.1.2  Help menus
         7.1.3  Program examples
         7.1.4  Shell script generation and execution
         7.1.5  Program messages
    7.2  runsequence
         7.2.1  crontab input file
         7.2.2  Sequence files
         7.2.3  runsequence shell script

APPENDIX SECTION

A.  ON-LINE DIAGNOSTIC PROGRAMS
    A.1  CONFIDENCE TESTS
    A.2  MAINTENANCE TESTS
    A.3  DOWN-DEVICE PROGRAMS
    A.4  ON-LINE NETWORK COMMUNICATIONS PROGRAM
    A.5  I/O SUBSYSTEM DEADSTART PROGRAMS
    A.6  UTILITY PROGRAMS
    A.7  offmon TESTS

B.  TEST EXECUTION TIMES
    B.1  EXECUTION TIMES FOR CONFIDENCE TESTS
    B.2  EXECUTION TIMES FOR MAINTENANCE TESTS

C.  ON-LINE DIAGNOSTIC PROGRAM LIBRARIES
    C.1  DIAGPL
    C.2  XMPPL
    C.3  CRAY1PL

D.  SOFTWARE PROBLEM REPORTING

E.  SYSTEM UTILITIES

F.  SITE COMMUNICATIONS

G.  INSTALLATION INFORMATION
    G.1  ON-LINE DIAGNOSTIC DIRECTORIES
    G.2  GENERATING ON-LINE DIAGNOSTIC BINARIES
    G.3  GENERATING ON-LINE DIAGNOSTIC LISTINGS
    G.4  SAVING OFF-LINE VERSIONS OF ON-LINE CONFIDENCE TESTS
         G.4.1  MVS-based systems running CMS
         G.4.2  Expander-based systems running DDS
    G.5  SAVING I/O SUBSYSTEM (IOS) DEADSTART PROGRAMS
         G.5.1  OWS UNICOS
         G.5.2  Expander Tape UNICOS
         G.5.3  Expander disk UNICOS
    G.6  GENERATING olnet
         G.6.1  IBM front-end
         G.6.2  Sun Workstation front-end (NSC)
         G.6.3  Sun Workstation front-end (VME)
         G.6.4  Motorola Workstation, OWS, or MWS front-end (VME)
    G.7  DELETING PROPRIETARY SOURCE CODE

FIGURES

4-1     Sample Diagnostic Memory Image
5-1     Main Menu for donut
5-2     Buffer Utility Menu
5-3     Write Buffer Menu
5-4     Read Buffer Menu
5-5     Error Utility Menu
5-6     Error Table Menu
5-7     Error Log Menu
5-8     Formatting Menu
5-9     Examine Data Buffer Menu
5-10    ID Analysis Menu for DD-39 and DD-49 Disk Drives
5-11    ID Analysis Menu for DD-40 Disk Drives
5-12    Surface Tests Menu
5-13    Write Data Menu
5-14    Read Data and Compare Menu
5-15    Surface Analysis Menu
5-16    Flaw Table Utility Menu
5-17    Factory Flaw Table Menu
5-18    User Flaw Table Menu for DD-39 and DD-49 Disk Drives
5-19    User Flaw Table Menu for DD-10 and DD-40 Disk Drives
5-20    System Flaw Table Menu
5-21    Found Flaw Table Menu for DD-19/29/39/49 Disk Drives
5-22    Found Flaw Table Menu for DD-10 and DD-40 Disk Drives
5-23    Parameter Menu
5-24    Main Menu for oldmon
5-25    Scroll Mode Display
5-26    Screen Mode Display
5-27    Main Menu for unitap
5-28    Variable Menu
5-29    Test Menu
5-30    Canned Test Menu
5-31    Debug Menu
5-32    Global Options Menu
5-33    Hardware Layout Menu
5-34    Block Multiplexer Layout Menu (BMC-5)
5-35    Breakpoint Tool
5-36    Channel Commands Tool
5-37    Display Data Buffer Tool
5-38    Compare Data Tool
5-39    System Call History Tool
5-40    Programming Tool
5-41    Packet Status Tool
7-1     Disk Help Menu
7-2     Memory Help Menu
7-3     Tape Help Menu
7-4     SSD Help Menu
D-1     SPR Form

TABLES

5-1     Main Menu Commands
5-2     Commands to Set Arguments
5-3     Buffer Utility Menu Commands
5-4     Commands for the Write Buffer and Read Buffer Menus
5-5     Error Utility Menu Commands
5-6     Error Table Menu Commands
5-7     Error Log Menu Commands
5-8     Formatting Menu Commands
5-9     Examine Data Buffer Menu Commands
5-10    ID Analysis Menu Commands
5-11    Surface Tests Menu Commands
5-12    Commands for the Write Data, Read Data and Compare, and
        Surface Analysis Menus
5-13    Flaw Table Utility Menu Commands
5-14    Commands for the Flaw Table Menus
5-15    Parameter Menu Commands
5-16    oldmon Commands
A-1     Confidence Tests
A-2     CPU Maintenance Tests
A-3     Down-Device Programs
A-4     Down CPU Confidence Tests
A-5     Down CPU Maintenance Tests
A-6     On-line Network Communications Program
A-7     I/O Subsystem Deadstart Programs
A-8     Utility Programs
A-9     offmon Tests
B-1     Execution Times for Confidence Tests
B-2     Execution Times for Maintenance Tests

INDEX

1.  ON-LINE DIAGNOSTIC SYSTEM

This manual describes the on-line test environment for diagnostics that
run under the Cray operating system UNICOS on the following computer
systems:
• CEA systems
    Y-mode (32-bit addressing)
    X-mode (24-bit addressing)

• CX/1 systems

The on-line diagnostic system performs error detection and isolation
concurrent with system operation. This type of on-line maintenance
provides the following benefits:
• Ensures an enhanced level of continuous system operation

• Prevents possible system software failures and identifies data
  integrity problems in system output

• Provides the capability for concurrent maintenance

• Reduces mean time to repair (MTTR) by isolating the failing
  hardware while the system is running

• Reduces off-line preventive maintenance (PM) time required for
  failure detection, isolation, and repair

1.1  ON-LINE DIAGNOSTIC ENVIRONMENT

The on-line diagnostic system consists of programs that reside in Cray
central memory or in Cray mass storage. To run the on-line diagnostic
programs in a Cray computer system configuration, UNICOS must be running
in at least one Central Processing Unit (CPU).
Throughout this document, the term operator's station refers to one of
the following devices, as appropriate to your site:
• Peripheral expander

• Operator workstation

1.2  ON-LINE DIAGNOSTIC PROGRAMS

To ensure maximum system reliability, the on-line diagnostic programs do
the following:

• Detect, isolate, and report hardware faults

• Gather and analyze system performance data

The on-line diagnostic programs are grouped as follows:

Diagnostic Group         Description

Confidence tests         These tests provide error detection and
                         isolation. To verify system integrity, it
                         is recommended that these tests be run at
                         system startup and at intervals thereafter.

Maintenance tests        These tests provide error detection and
                         isolation. These tests are variants of
                         off-line diagnostic tests.

Down-device programs     The down-device programs provide on-line
                         CPU and peripheral testing while the
                         hardware is removed from normal system
                         operations.

Network test (olnet)†    This test detects and isolates faults in
                         the communications link between a Cray
                         mainframe and a front-end computer system.

I/O Subsystem (IOS)      These programs can be run prior to system
deadstart programs       deadstart to verify the integrity of the
                         IOS hardware. They isolate failures to the
                         functional area, at which point a CRI field
                         engineer must interpret the results.

Utility programs         These are on-line diagnostic tools.

† The olnet test is described in the On-line Diagnostic Network
  Communications Program (OLNET) Maintenance Manual, CRI publication
  SMM-1016.

2.  CONFIDENCE TEST AND MONITOR OVERVIEW

On-line diagnostic confidence tests provide a comprehensive performance
check of the system hardware. This test level consists of the following:
•  High-level language diagnostic programs

•  A set of CAL Version 2 diagnostic programs that direct hardware
   testing to specific logic areas

This section provides an overview of the following:

•  On-line confidence monitor (olcmon)
•  Program synopsis
•  Test execution
•  Test termination
•  Test examples
•  Test messages
•  Off-line confidence monitor (offmon)

For a brief description of each confidence test, refer to appendix A,
On-line Diagnostic Programs. For a list of test execution times, refer
to appendix B, Test Execution Times. For additional information on
specific confidence tests and their command options, refer to section 3,
Confidence Test Descriptions.

2.1  ON-LINE CONFIDENCE MONITOR (olcmon)

The on-line confidence monitor program, olcmon, does the following:

•  Accepts and interprets command options and arguments

•  Sends test results to stdout (standard output device) by default
   or to a file when UNICOS output redirection is indicated on the
   command line

2.2  PROGRAM SYNOPSIS

The olcmon command options are entered with the test command options of
each confidence test to be executed. The test-specific command options
are described in section 3, Confidence Test Descriptions.

The olcmon command options can be entered in any order. If an option
is omitted, the program uses the default value.

The following command options provide different methods of specifying the
starting seed value (specify only one for each test executed):
•  +/-getseed

•  getseed file

•  seed n (a test-specific command option described in section 3,
   Confidence Test Descriptions)

Synopsis:

     test [chkpnt mode] [cpu clist] [cputime h:m:s] [+/-getseed]
          [getseed file] [help] [maxerr n] [maxp n] [+/-parcel] [time h:m:s]
          [+/-verbose] [+xmp] [+cray1]
          [test options]†

†  For additional information on confidence tests and their test-specific
   command options, refer to section 3, Confidence Test Descriptions.

chkpnt mode
     Indicates whether restart files are to be generated.
     mode is one of the following arguments:

     Argument    Description

     first       Generates a restart file for the first
                 failure detected (default)

     all         Generates a restart file for each failure
                 detected, including failures detected during
                 error isolation

     none        Does not generate restart files

     The default generates a restart file for the first failure
     detected.

     For additional information, refer to the following:
     chkpnt(1), restart(1), chkpnt(2), and restart(2).

cpu clist
     Selects the CPUs to be tested. Enter clist in the
     following format:

          x,x,...,x

     x can be a, b, c, d, e, f, g, or h. The first CPU
     selected is the master CPU. The default is cpu a.
     If you enter an invalid CPU value in clist or a value for
     a CPU that is currently down, you will receive an error
     message.
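
The clist rules above (letters a through h, first entry is the master CPU) can be illustrated with a short Python sketch. This is not the monitor's code: the function name, the `down` parameter, and the error strings borrowed from the error message list are assumptions made for illustration only.

```python
VALID_CPUS = "abcdefgh"  # CPU letters accepted by the cpu option, per the text


def parse_clist(clist, down=()):
    """Split a clist such as 'c,a,b' into (master, all_cpus).

    `down` is a hypothetical collection of CPUs that are currently down.
    Raises ValueError for an invalid letter or a down CPU.
    """
    cpus = [c.strip() for c in clist.split(",")]
    for c in cpus:
        if len(c) != 1 or c not in VALID_CPUS:
            raise ValueError(f"Illegal CPU selection {c}.")
        if c in down:
            raise ValueError(f"An error occurred when selecting CPU {c}.")
    return cpus[0], cpus  # the first CPU selected is the master


print(parse_clist("c,a,b"))
```

With `cpu c,a,b`, the sketch reports c as the master and c, a, b as the CPUs under test.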
cputime h:m:s
     Sets the test execution time in CPU time. The time is
     specified in hours (h), minutes (m), and seconds (s);
     minutes and seconds; or just seconds. Use colons as
     delimiters, as follows: h:m:s.

     Generally, actual execution time is within one second of
     the specified CPU time. If cputime is allowed to
     default, or is set to 0, the test uses the maxp value.
     However, if set to a value other than 0, cputime
     overrides maxp.
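
As an illustration of the h:m:s argument format, the following Python sketch converts the three accepted forms to seconds. It assumes that each field is a decimal number, which the text does not state; the function name is hypothetical.

```python
def parse_hms(text):
    """Convert 'h:m:s', 'm:s', or 's' to a number of seconds.

    Assumption: each field is written in decimal.
    """
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Illegal argument {text}.")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


print(parse_hms("1:30:05"))  # hours, minutes, and seconds
print(parse_hms("2:00"))     # minutes and seconds
print(parse_hms("45"))       # just seconds
```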
+/-getseed
     Enables (+getseed) or disables (-getseed) the option
     that reads the file test.seed to obtain a starting
     seed. If the test terminates because the maximum pass or
     error limit is reached, the seed from the last pass is
     saved in the file test.seed. If there are any problems
     with reading the seed from this file, the program uses the
     default seed (O'33). If you select +getseed, do not
     select seed n (test-specific command option). The
     default is -getseed.

getseed file
     Gets a starting seed from file. file can contain a
     dump from a previous failure or a single seed value. If
     allowed to default, the program uses the seed value
     specified by +getseed or seed n (test-specific
     command option).

help
     Generates an on-line help display containing a synopsis and
     a brief description of the command options and arguments.
     If help is entered with a test name, help information is
     written to stdout, and the test terminates.

maxerr n
     Sets the maximum number of errors. n is an octal
     value. The default for n is 1.

maxp n
     Sets the maximum number of passes. n is an octal
     value. The default for n is O'1000. If cputime or
     time is set to a value other than 0, the specified option
     overrides maxp.
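
Because the maxerr and maxp arguments are octal, the default pass limit of O'1000 corresponds to 512 decimal passes. The following Python sketch shows the conversion; the helper name is hypothetical and the sketch says nothing about how the monitor itself parses its arguments.

```python
def parse_octal(text):
    """Interpret an option argument such as '1000' as an octal number."""
    return int(text, 8)


print(parse_octal("1000"))  # 512, the default maxp of O'1000 in decimal
print(parse_octal("10"))    # 8
```

A digit of 8 or 9 is not valid in an octal argument, and `int(text, 8)` raises ValueError for such input.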

+/-parcel
     Enables (+parcel) or disables (-parcel) the option that
     forces dumped data to parcel format. +parcel forces data
     that would otherwise be in word format (64 bits in octal,
     with leading 0's) to parcel format (four groups of 16 bits
     in octal). Parcel format displays two words (8 parcels)
     per line. Word format displays four words per line. The
     default is -parcel.
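
The difference between word format and parcel format can be sketched in Python as follows. The field widths (22 octal digits per word, 6 octal digits per parcel) follow from the 64-bit and 16-bit sizes given above; the function names and the single-space separator are assumptions, and the exact layout of a real dump may differ.

```python
MASK64 = (1 << 64) - 1


def word_format(word):
    """One 64-bit word as 22 octal digits with leading zeros."""
    return format(word & MASK64, "022o")


def parcel_format(word):
    """The same word as four 16-bit parcels, each shown as 6 octal digits."""
    word &= MASK64
    parcels = [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]
    return " ".join(format(p, "06o") for p in parcels)


sample = 0o1252525252525252525252
print(word_format(sample))    # 1252525252525252525252
print(parcel_format(sample))  # 125252 125252 125252 125252
```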
time h:m:s
     Sets the test execution time in elapsed (wall-clock) time.
     The time is specified in hours (h), minutes (m), and
     seconds (s); minutes and seconds; or just seconds. Use
     colons as delimiters, as follows: h:m:s.

     Generally, actual execution time is within one second of
     the specified elapsed time. If time is allowed to
     default (or is set to 0), the test uses the maxp value.
     However, if specified to a value other than 0, time
     overrides maxp.
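
The way cputime, time, and maxp interact can be summarized with a small Python sketch. It only restates the rule given above (a nonzero time limit overrides maxp); the function name and the returned dictionary are illustrative assumptions, and the sketch does not say what happens when both cputime and time are set, because the text does not specify it.

```python
def active_limits(maxp=0o1000, cputime_s=0, time_s=0):
    """Return the limits that can end a run.

    maxp applies only when neither time limit is set to a nonzero value.
    """
    limits = {}
    if cputime_s > 0:
        limits["cputime"] = cputime_s
    if time_s > 0:
        limits["time"] = time_s
    if not limits:
        limits["maxp"] = maxp
    return limits


print(active_limits())               # only the default pass limit applies
print(active_limits(cputime_s=30))   # cputime overrides maxp
```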
+/-verbose
     Enables (+verbose) or disables (-verbose) the
     generation of informational messages. The +verbose
     option causes a line of output to be generated after each
     pass of the diagnostic. The default is -verbose.
+xmp
+cray1
     Indicates the test mode for the following computer systems:

     Command     Computer System

     +xmp        CRAY X-MP
     +cray1      CRAY-1

     If allowed to default, the monitor determines the machine
     type during test execution and selects the appropriate test
     mode. This option can be used to override the default
     selection. These command options are not applicable to a
     CEA system.


2.3  TEST EXECUTION

To start a single diagnostic test, enter the following on the command
line:

•  test
•  Monitor command options
•  Test-specific command options

To run a sequence of diagnostics, use the runsequence utility described
in section 7, Utility Programs.
Before a test can be started, UNICOS must be running in the CPUs to be
tested. The master CPU (the first CPU selected) does the following:
•  Generates instructions and data

•  Generates expected results

•  Compares the test execution buffers of the selected CPUs to the
   expected results

•  Generates and formats error reports

•  Controls error isolation

Each CPU, including the master, does the following:

•  Loads registers and buffers

•  Executes test instructions

•  Saves results

2.4  TEST TERMINATION

A test stops under the following conditions:

•  The test successfully completes the maximum number of passes
   (maxp n).

•  The test reaches the specified CPU time (cputime h:m:s) or
   elapsed (wall-clock) time (time h:m:s).

•  The test detects and isolates the maximum number of errors
   (maxerr n). Error reports are automatically sent to stdout
   (standard output device), but they can be redirected to an error
   file.

•  The help option is entered with a test name. Help information is
   written to stdout, and the test terminates.

•  The monitor or test detects an error in a command line entry and
   writes a message to stderr (standard error device). Only the
   first error detected is reported.

2.5  TEST EXAMPLES

The following example executes olcsvc in CPUs c, a, and b, with c as
the master.

Example:

     olcsvc cpu c,a,b

The following example executes olcsvc in CPUs a and b, with a as the
master. The seed x option provides an octal seed value to start
random number generation.

Example:

     olcsvc seed x cpu a,b

In the following example, the nohup(1) command allows olcsvc to
continue executing after you log off the system. The ampersand (&)
causes the entire command to execute in the background, so that another
prompt is immediately displayed and you can continue to use the system.

Example:

     nohup olcsvc &

The following example shows the test-specific help information that is
displayed if help is entered with a test name.

Example:

     olcsvc help

Help display:

olcsvc help
olcsvc [chkpnt mode] [cpu clist] [+/-getseed] [getseed file] [help] [maxerr n]
  [maxp n] [+/-parcel] [+/-verbose] [+cray1] [+xmp] [cputime h:m:s]
  [time h:m:s] [disable ilist] [enable ilist] [+/-isolate] [isop n] [numpar n]
  [+/-repeat] [seed n] [+/-sgci] [vl n] [+/-cm] [+/-fpadd] [+/-fpmult]
  [+/-fprecip] [+/-int] [+/-logical] [+/-pop] [+/-shift] [+/-onezero]
  [+/-random] [+/-slide]
chkpnt mode     - Checkpoint mode: none, first, or all. (Default: first)
cpu clist       - Run in selected CPUs. (Default: a)
+/-getseed      - Get/don't get seed from test.seed. (Default: -getseed)
getseed file    - Search file for starting seed
help            - Provides a help display.
+/-verbose      - Enable/disable info. messages to stdout. (Default: -verbose)
maxp n          - Set maximum pass limit to n. (Default: O'1000)
maxerr n        - Set maximum error limit to n. (Default: 1)
+/-parcel       - Force/don't force dump to parcel format. (Default: -parcel)
+cray1/+xmp     - Selects CRAY-1/CRAY X-MP test mode. (Default: host machine)
cputime h:m:s   - Set amount of CPU time to execute.
time h:m:s      - Set amount of wall clock time to execute.
disable ilist   - Do not run specific instructions. Ignored if invalid.
enable ilist    - Run specific instructions. Ignored if invalid.
+/-isolate      - Enable/disable isolation. (Default: +isolate)
isop n          - Loop during isolation n times to find error. (Default: O'1000)
numpar n        - Number of parcels to run in vector buffer. (Default: O'100)
+/-repeat       - Repeat/do not repeat first pass. (Default: -repeat)
seed n          - Set seed for random number generator to n. (Default: O'33)
+/-sgci         - Enable/disable scatter/gather/compressed index testing.
vl n            - Set VL. 0 <= n <= 100. If n = 0, VL is random. (Default: 0)
+/-cm, +/-fpadd, +/-fpmult, +/-fprecip, +/-int, +/-logical, +/-pop, +/-shift
                - Enable/disable specific instruction groups.
                  (Default: all instructions)
+/-onezero, +/-random, +/-slide
                - Enable/disable specific data patterns.
                  (Default: all data patterns)

The following example shows the output that is displayed when olcsvc is run
with all default values.

Example:

     olcsvc

Output:

     olcsvc
     olcsvc: started in cpu A on Thu Jan 8 08:55:46 1987
     CRAY X-MP MODE
     olcsvc reached maximum pass limit with 1000 passes and 0 errors
     on Thu Jan 8 08:56:08 1987
The following example shows the output that is displayed if +verbose is
specified and maxp reaches 10.

Example:

     olcsvc +verbose maxp 10

Output:

     olcsvc +verbose maxp 10
     olcsvc: started in cpu A on Thu Jan 8 08:56:43 1987
     CRAY X-MP MODE
     olcsvc: pass =    1, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =    2, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =    3, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =    4, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =    5, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =    6, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =    7, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc: pass =   10, error =    0 Thu Jan 8 08:56:43 1987
     olcsvc reached maximum pass limit with 10 passes and 0 errors
     on Thu Jan 8 08:56:43 1987

2.6  TEST MESSAGES

Each test generates the following types of messages:
•  Informative
•  Error

These messages are listed in the subsections that follow.


2.6.1  INFORMATIVE MESSAGES

This subsection lists the informative messages, which are sent to
stdout (standard output device).

test: Cannot open test.seed. Seed cannot be saved.
     The test cannot write test.seed. Therefore, the ending seed
     cannot be saved. Check write permissions of the current directory.

test: Cannot write restart file. errno = n.
     The test cannot write a restart file. Contact your CRI
     representative.

2.6.2  ERROR MESSAGES

This subsection lists the error messages, which are sent to stderr
(standard error device).

test: Illegal option x.
     Option x is invalid. Correct and rerun.

test: Illegal argument x.
     Argument x is invalid. Correct and rerun.

test: Illegal CPU selection x.
     CPU x is invalid. Correct and rerun.

test: Maximum of O'x items in option list.
     Too many items are in the argument list for option. The maximum
     number of items allowed in the argument list is O'x. Correct
     and rerun.

test: An error occurred when selecting CPU x.
     CPU x is unavailable. Contact your CRI representative.

test: Cannot allocate memory. Cannot save buffers.
     The test cannot allocate memory or save buffers. Regenerate the
     diagnostic and rerun. If the problem persists, contact your CRI
     representative.

test: Too many buffers. Cannot save buffers.
     The test cannot save buffers. Regenerate the diagnostic and
     rerun. If the problem persists, contact your CRI representative.

test: Cannot open file.
     The test cannot open the file name specified by the getseed
     option. Correct and rerun.

test: Cannot find seed in file.
     The test cannot find the seed in file. Ensure that file is
     valid and rerun.

test: Error selecting cluster x.
     Cluster x is unavailable. Contact your CRI representative.

2.7  OFF-LINE CONFIDENCE MONITOR (offmon)

The offmon† monitor allows the following on-line confidence tests to
be executed either in an off-line environment or in a down CPU under the
down CPU monitor, oldmon:††

•  olcfpt
•  olcm
•  olcrit
•  olcsvc
•  olibuf

To execute in these environments, each on-line confidence test is
concatenated to offmon and assembled (instead of being linked to
olcmon). To ensure compatibility between the on-line and off-line test
environments, the on-line and off-line confidence tests are built from
the same source code. The equivalent off-line confidence test names
start with the prefix off instead of ol. For example, the off-line
equivalent of olcrit is offcrit.

To generate the same test conditions in both the on-line and off-line
test environments, use the same seed value. Set the seed value for the
on-line confidence test (refer to subsection 2.2, Program Synopsis), and
use the same value for the off-line test.

For information on executing offmon, refer to the diagnostic listing.

†   The offmon monitor is supported on CX/CEA systems only.
††  The oldmon monitor is supported on multiple-CPU Cray computer
    systems only.

3.  CONFIDENCE TEST DESCRIPTIONS

This section describes the following on-line confidence tests:

Test       Description

olcfdt     Mass storage device test
olcfpt     Comprehensive floating-point test
olcm       Central memory test
olcrit     Comprehensive random instruction test
olcsvc     Comprehensive scalar and vector comparison test
olibuf     Instruction buffer test
olsbt      Semaphore, shared B and shared T register test

For general information on confidence tests, refer to section 2,
Confidence Test and Monitor Overview. For a list of test execution
times, refer to appendix B, Test Execution Times.

3.1  olcfdt

The olcfdt test is an on-line confidence test for mass storage
devices. It creates a user-specified file that is used for all input and
output operations during test execution.
To test a specific device, specify the absolute path name to the device.
If an absolute path name is not specified, olcfdt creates a file on the
user's current working directory and tests the device associated with the
working directory. Your system file configuration determines which
directories and files reside on each device.
The created file is permanent. To delete the file, use the rm(1)
command.

The test uses the values specified by the record size (rsz) and file
size (sz) options to determine the following:
•  Data record size
•  Size of the device file to be created
•  Number of data records required to fill the file

The default values for the tests and patterns to be run (specified by the
test and pat options, respectively) are designed for optimum
functionality. When selecting arguments for these options, be aware that
varying degrees of functionality may be achieved.


If a failure occurs, messages are output to stdout, provided the
program is in control after the failure. However, you can redirect
output from stdout to a specified file.

3.1.1  TEST SYNOPSIS

The olcfdt command options can be entered in any order. If an option
is omitted, the program uses the default value.

Synopsis:

     olcfdt [disp display] dt type [fn file] [help] [maxp n] [ntks]
            [pat patterns] [rsz n] [seed n] [sz n] [test tests] [upat n]

disp display
     Selects the error information and history display. The
     default is err (all error information is displayed).
     display is one of the following:

     Value     Description

     hst       Displays a history of the current iteration
               (test pattern and test sections executed)

     err       Displays all error information

     none      Does not display error information or a history
               of the current iteration

     all       Displays all error information and a history of
               the current iteration

dt type
     Device type (required). If the specified device type
     is not associated with the specified file name, the program
     overrides the dt command option and tests the device type
     associated with file. type is one of the following
     (only one device type can be selected at a time):

     Device Type    Description

     dd10           DD-10 disk drive
     dd19           DD-19 disk drive
     dd29           DD-29 disk drive
     dd39           DD-39 disk drive
     dd40           DD-40 disk drive
     dd49           DD-49 disk drive
     bmr            Buffer memory resident storage
     ssd            SSD solid-state storage device

fn file
     File name. file is the absolute path name to a file. The
     created file is permanent. When assigning a file, you must
     know which directory is associated with the selected device
     type. Consult your CRI analyst to determine the directory
     associated with a specific device. The default is
     workfil under the current working directory.

help
     Produces an on-line help display containing a synopsis and
     brief description of the command options and arguments. If
     the help option is entered with a test name, help
     information is written to stdout, and the test terminates.

maxp n
     Pass count (decimal). On each pass, all selected test
     patterns and test sections are run. The default for n is
     512.

ntks
     File size is in number of tracks. This command option
     indicates that the argument associated with the sz
     command option is the file size in number of tracks
     (decimal). If allowed to default, the file size is in data
     sectors (decimal).

pat patterns
     Patterns to be run. The default is all (all test
     patterns are run). If the upat option is specified, you
     must either set pat to all or include user in the
     list of arguments. patterns is a comma-separated list of
     up to nine test pattern arguments. Duplicate entries are
     allowed. For example:

          pat zeros,ones

     patterns can be one of the following:

     Argument    Pattern

     zeros       All 0's

     ones        All 1's

     chkbrd      Checkerboard (1252525252525252525252B,
                 0525252525252525252525B ...)

     chkbrdc     Complement of the chkbrd pattern

     rvi         Record/word index. The record number in the
                 upper 31 bits of the data word, followed by
                 the data word number within the record in
                 the lower 33 bits (hardware numbered bits).

     rvic        Complement of the rvi pattern

     fpn         Random floating-point numbers

     rdm         Random numbers

     user        User pattern. This is the pattern specified
                 by the upat option (upat must be
                 specified if this argument is entered).

     all         All patterns are run (default). The
                 patterns are processed in the following
                 order:

                      zeros,ones,chkbrd,chkbrdc,
                      rvi,rvic,fpn,rdm,user

                 The user argument is processed only if the
                 upat option is entered. all is a
                 stand-alone argument.
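
As an illustration of the record/word index (rvi) pattern, the following Python sketch packs a record number into the upper 31 bits and a word number into the lower 33 bits of a 64-bit word, and forms the complement used by rvic. The function names are hypothetical, and the sketch does not settle whether the test numbers records and words from 0 or from 1.

```python
MASK64 = (1 << 64) - 1


def rvi_word(record, word):
    """Record number in the upper 31 bits, word number in the lower 33 bits."""
    if not 0 <= record < (1 << 31) or not 0 <= word < (1 << 33):
        raise ValueError("field out of range")
    return (record << 33) | word


def rvic_word(record, word):
    """Bitwise complement of the rvi word, limited to 64 bits."""
    return rvi_word(record, word) ^ MASK64


print(format(rvi_word(9, 99), "022o"))   # 0000000001100000000143
print(format(rvic_word(9, 99), "022o"))
```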

rsz n
     Record size in data words. n is a decimal record size of
     512, or a multiple thereof, up to a maximum value of 4096.
     The default is 512 words.
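
The record size rule (512 or a multiple of 512, up to 4096) can be checked with a short Python sketch. The function name is hypothetical; the real test reports an out-of-range value through its own error messages.

```python
def valid_rsz(n):
    """True if n is 512 or a multiple of 512, and no larger than 4096."""
    return 512 <= n <= 4096 and n % 512 == 0


# All record sizes the rule allows
print([n for n in range(1, 4097) if valid_rsz(n)])
```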

seed n
     Random number seed. n is an octal value that is less
     than or equal to 48 bits. The default for n is rdm,
     which selects the nearest integer of the product of a
     random number and the real-time clock.

sz n
     File size (decimal). If sz n is specified without the
     ntks command option, the file size is in data sectors; if
     ntks is specified, the file size is in number of tracks.
     The minimum value for n is 1. The maximum value for n
     is as follows:

          (Track size * number of tracks) - 1

          or

          Maximum file size allowed by the system

     The default for n is the track size of the device
     specified by the command option dt.
test tests
     Test sections to be run. The test does a sequential write
     before executing the selected test sections. The default
     for tests is all (all test sections are run).

     tests is a comma-separated list of up to three test
     section entries. The test sections are processed in the
     order in which they are entered on the command line.
     Duplicate entries are allowed. For example:

          rw,rw,rr

     tests can be one of the following:

     Test Section   Description

     rr             Random read; performs random reads on the
                    work file. A data compare is performed on
                    each record read. On a miscompare, a message
                    is displayed and the program is aborted.

     rw             Random write; performs random writes on the
                    work file. This section automatically
                    performs a sequential read (sr) if sr is
                    not selected after a random write (rw).
                    For example, the following entry runs test
                    sections rr, rw, and sr, respectively:

                         test rr,rw

     sr             Sequential read; reads the work file
                    sequentially. A data compare is performed on
                    each record read. On a miscompare, a message
                    is displayed and the program is aborted.

     all            Runs all test sections (default). This is a
                    stand-alone argument. The tests are run in
                    the following order: rr,rw,sr.

upat n
     User pattern. n is an octal value that is less than
     or equal to 64 bits. An error occurs if the upat option
     is not specified when user is entered in the argument
     list for the pat option. The default is no user pattern.
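
The rule for the test option, in which a random write (rw) is followed by a sequential read (sr) even when sr is not selected, can be sketched in Python. The sketch assumes that "not selected after a random write" means sr is not the very next entry in the list; that reading, the function name, and the error string are assumptions.

```python
def expand_tests(tests):
    """Return the test sections in the order they would run."""
    sections = ["rr", "rw", "sr"] if tests == "all" else tests.split(",")
    order = []
    for i, name in enumerate(sections):
        if name not in ("rr", "rw", "sr"):
            raise ValueError(f"Argument {name} is invalid.")
        order.append(name)
        following = sections[i + 1] if i + 1 < len(sections) else None
        if name == "rw" and following != "sr":
            order.append("sr")  # implied sequential read after a random write
    return order


print(expand_tests("rr,rw"))  # ['rr', 'rw', 'sr'], as in the example above
print(expand_tests("all"))    # ['rr', 'rw', 'sr']
```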

3.1.2  TEST EXAMPLES

This subsection contains olcfdt execution examples.

The following example runs olcfdt using default command options to test
a DD-29 disk drive. It is assumed that the current user directory is
associated with the DD-29 disk drive to be tested.

Example:

     olcfdt dt dd29 rsz 512

Output:

     olcfdt dt dd29 rsz 512
     olcfdt submitted on Wed Mar 11 15:38:30 1987
     odt06 - Test completed.

The following example runs olcfdt using user-specified command options
to test a DD-29 disk drive. It is assumed that the specified file name,
/w/xxx/yyy, is associated with the DD-29 disk drive to be tested.

Example:

     olcfdt fn /w/xxx/yyy dt dd29 sz 36 rsz 512 test all pat all
     upat 707070707070707070707 seed 7070707070707070 maxp 10
     disp none

Output:

     olcfdt fn /w/xxx/yyy dt dd29 sz 36 rsz 512 test all pat all
     upat 707070707070707070707 seed 7070707070707070 maxp 10 disp none
     olcfdt submitted on Wed Mar 11 16:26:20 1987
     odt06 - Test completed.

The following example runs olcfdt using default options and the
checkerboard data pattern to test a DD-29 disk drive. The test displays
the data compare error output by default.

The test output indicates that a data compare error was detected at
word 99 of record 9. The test displays expected, actual, and difference
data for the following words:

•  Ten words on either side of the failing word
•  Last word of the preceding record
•  First word of the next record

If there are fewer than 10 words preceding or following the word that
failed, more words are displayed from one side than the other to make up
the difference. In the following example, data information is displayed
for words 89 through 109 of record 9, word 1024 of record 8, and word 1
of record 10.
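
The choice of which words to display around a failing word can be sketched in Python. The sketch assumes that words are numbered from 1 within a record, as the example suggests, and that a shortfall on one side is made up entirely on the other side; the function name is hypothetical.

```python
def display_window(failing, record_size, span=10):
    """Return (first, last) word numbers displayed around a failing word."""
    first = failing - span
    last = failing + span
    if first < 1:              # too few words before: show more after
        last += 1 - first
        first = 1
    if last > record_size:     # too few words after: show more before
        first -= last - record_size
        last = record_size
    return max(first, 1), last


print(display_window(99, 1024))  # (89, 109), matching the example
print(display_window(3, 1024))   # (1, 21)
```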
Example:

     olcfdt dt dd29 pat chkbrd rsz 1024

Output:

olcfdt dt dd29 pat chkbrd rsz 1024
olcfdt submitted on Wed Mar 11 13:14:19 1987
odt14 - Data compare error.

*****  DATA COMPARE ERROR  *****

FILENAME                 workfil
FILE SIZE                18
DEVICE TYPE              dd29
CURRENT DATA PATTERN     chkbrd
CURRENT TEST             sr
ITERATION COUNT          512
NUMBER OF PASSES         100
RECORD SIZE              1024
NUMBER OF RECORDS        13
FAILING RECORD NUMBER    9
FAILING WORD NUMBER      99
USER PATTERN             0000000000000000000000
RANDOM NUMBER SEED       0000003427130120254365

WORD   EXPECTED                 ACTUAL                   DIFFERENCE
  89   1252525252525252525252   1252525252525252525252   0000000000000000000000
  90   0525252525252525252525   0525252525252525252525   0000000000000000000000
  91   1252525252525252525252   1252525252525252525252   0000000000000000000000
  92   0525252525252525252525   0525252525252525252525   0000000000000000000000
  93   1252525252525252525252   1252525252525252525252   0000000000000000000000
  94   0525252525252525252525   0525252525252525252525   0000000000000000000000
  95   1252525252525252525252   1252525252525252525252   0000000000000000000000
  96   0525252525252525252525   0525252525252525252525   0000000000000000000000
  97   1252525252525252525252   1252525252525252525252   0000000000000000000000
  98   0525252525252525252525   0525252525252525252525   0000000000000000000000
  99   1252525252525252525252   1777777777777777777777   0525252525252525252525
 100   0525252525252525252525   0525252525252525252525   0000000000000000000000
 101   1252525252525252525252   1252525252525252525252   0000000000000000000000
 102   0525252525252525252525   0525252525252525252525   0000000000000000000000
 103   1252525252525252525252   1252525252525252525252   0000000000000000000000
 104   1252525252525252525252   1252525252525252525252   0000000000000000000000
 105   0525252525252525252525   0525252525252525252525   0000000000000000000000
 106   1252525252525252525252   1252525252525252525252   0000000000000000000000
 107   0525252525252525252525   0525252525252525252525   0000000000000000000000
 108   1252525252525252525252   1252525252525252525252   0000000000000000000000
 109   0525252525252525252525   0525252525252525252525   0000000000000000000000

*****  LAST WORDS OF PREVIOUS RECORD  *****

WORD   EXPECTED                 ACTUAL
1024   0525252525252525252525   0525252525252525252525

*****  FIRST WORDS OF NEXT RECORD  *****

WORD   EXPECTED                 ACTUAL
   1   1252525252525252525252   1252525252525252525252
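
The DIFFERENCE column in the output above is consistent with a bitwise exclusive OR of the expected and actual words. The following Python sketch reproduces the value shown for word 99 under that assumption; the manual does not state how the test computes the column, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def difference(expected, actual):
    """Bits that differ between two 64-bit words, as 22 octal digits."""
    return format((expected ^ actual) & MASK64, "022o")


expected = 0o1252525252525252525252   # checkerboard word expected at word 99
actual = 0o1777777777777777777777     # word actually read
print(difference(expected, actual))   # 0525252525252525252525
```

A word that compares correctly yields a difference of all zeros.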

The following example runs olcfdt with user-specified command options
to test a DD-29 disk drive. Test output is sent to /a/b/ccc.

Example:

     olcfdt fn /w/xxx/yyy dt dd29 sz 36 rsz 4096 test all pat rdm
     seed 7070707070707070 > /a/b/ccc

3.1.3  TEST MESSAGES

The olcfdt test produces the following types of messages:

•  Informative
•  Error

These messages are listed in the subsections that follow.


3.1.3.1  Informative messages

This subsection lists the informative messages, which are sent to
stdout (standard output device).

odt06 - Test completed.

odt16 - iteration  pattern  tests
odt16 - iteration  pattern  tests
     This message is generated if the disp command option is set to
     display the history of the current iteration. On each iteration
     through the test, the selected device is tested with one of the
     selected patterns in all of the selected test sections. The
     following information is displayed:

     iteration    Current iteration
     pattern      Current test pattern (64-bit octal word)
     tests        Test sections being run

3.1.3.2  Error messages

This subsection lists the error messages, which are sent to stderr
(standard error device).
odt01 - Option x is invalid.
     Enter a valid option and rerun.

odt02 - Argument x is invalid.
     Enter a valid argument and rerun.

odt03 - Too many items in value list 1.
     Reenter argument list and rerun.

odt04 - Required option x is not present.
     Enter option x and rerun.

odt15 - Argument is missing.
     An option requiring an argument was entered alone. Reenter the
     option with an argument and rerun the test.

The following error messages are sent to stdout:

odt05 - Specified record size exceeds
odt05 - the maximum limit of 4096.
     Reenter the rsz option and rerun.

odt07 - Cannot open file.
     Contact your CRI representative.

odt08 - Cannot close file.
     Contact your CRI representative.

odt09 - Cannot seek file.
     Contact your CRI representative.

odt10 - Cannot read file.
     Contact your CRI representative.

odt11 - Cannot write file.
     Contact your CRI representative.

odt12 - User pattern option (upat) must be specified
odt12 - when pattern option (pat) is 'user'.
     Enter the upat option and rerun.

odt13 - Pattern option (pat) must be 'user' or 'all'
odt13 - when the user pattern option (upat) is specified.
     Enter the pat option and rerun.

odt14 - Data compare error.
     Examine the error output to identify the point at which the
     failure occurred.


3.2  olcfpt

The olcfpt test is an on-line comprehensive floating-point test. It
generates floating-point instructions and data to detect data-sensitive
failures in the floating-point functional units. The generated
instructions are simulated and then executed. The simulation and
execution results are compared, and any differences are reported. This
process continues until the maximum pass, error, or time limit is
reached. If an error is detected, the diagnostic attempts to isolate the
failing data.

3.2.1  TEST SYNOPSIS

The olcfpt command options can be entered in any order. If an option
is omitted, the program uses the default value. The test synopsis lists
the olcfpt command options and arguments in the following order:

1.  Monitor options
2.  Test-specific options
3.  Data pattern options
4.  Instruction options

Synopsis:

     olcfpt [chkpnt mode] [cpu clist] [cputime h:m:s] [+/-getseed]
            [getseed file] [help] [maxerr n] [maxp n] [+/-parcel] [time h:m:s]
            [+/-verbose] [+xmp] [+cray1]†
            [disable ilist] [enable ilist] [+/-isolate] [isop n]
            [numins n] [+/-repeat] [seed n] [vl n] [+/-vload]
            [+/-fpbits] [+/-fprand] [+/-random]
            [+/-fpadd] [+/-fpmult] [+/-fprecip] [+/-scalar] [+/-vector]

†  The monitor command options are described in section 2, Confidence
   Test and Monitor Overview.

disable ilist
     Deselects specific instructions. Enter ilist in the
     following format:

          n,n,...,n

     n is the octal value in the gh field of the specific
     instruction. The disable ilist option overrides the
     enable ilist option and any selected (+) or
     deselected (-) instruction options.

enable ilist
     Selects specific instructions. Enter ilist in the
     following format:

          n,n,...,n

     n is the octal value in the gh field of the specific
     instruction. The enable ilist option overrides any
     selected (+) or deselected (-) instruction options.
     When the test is run with default values for the +/-
     instruction options, and the enable ilist option is
     selected, only the instructions specified by the
     enable ilist option are run.
+I-isolate
Enables (+isolate) or disables (-isolate) the error
isolation option. The default is +isolate.
isop n
     Sets the isolation pass limit to n (octal). During
     isolation, the diagnostic repeatedly executes the suspected
     failing sequence. If the sequence fails, the loop
     terminates and the diagnostic attempts to isolate the
     sequence further. If the sequence does not fail, the loop
     terminates after n passes, and olcfpt assumes that the
     error is not in the tested sequence. The default for n
     is O'1000.

numins n
     Sets the number of instructions to be generated. n can
     be any octal value within the range 1 through O'20. The
     default for n is O'20.

+|-repeat
     Enables (+repeat) or disables (-repeat) the option that
     repeats the first pass until the diagnostic terminates.
     +repeat is useful for recreating an error. It is
     normally used with one of the following options: seed n,
     +getseed, or getseed file. The default is -repeat
     (the program generates new instructions and data after each
     pass).


seed n
     Sets the random seed to n. n can be any 64-bit
     octal value. If n is 0, the test reads the real-time
     clock and uses the value for the initial seed. The default
     for n is O'33. If seed n is selected, do not select
     +getseed or getseed file.

vl n
     Sets the vector length to n. n can be any octal value
     in the range 0 through O'100. If vl is set to 0, a
     random vl value is used to initialize the test. The
     default for n is O'100.

+|-vload
     Selects (+vload) or deselects (-vload) vector instructions
     for the instruction buffer and, in the case of -vload,
     does not allow you to load (write) or save (read) the
     vector registers. -vload overrides vector instructions
     selected by +vector and enable ilist. The default is
     +vload.

+|-fpbits, +|-fprand, +|-random
     Selects (+) or deselects (-) specific data patterns.
     If allowed to default, all of the data patterns are run.
     If the vl option is 0 or not specified, the vector length
     register is initialized with 6 bits of random data. The
     data patterns are as follows:

     Option    Data Pattern

     fpbits    Random number of consecutive 1 bits in the
               coefficient. Exponent data depends on the
               floating-point instruction. For example:

               0370000000000007740000
               1574777740003777777777
               0217600000000000030000
               0237740000000000100000

     fprand    Random bit generation in the coefficient.
               Exponent data depends on the floating-point
               instruction. For example:

               0224055214537525453301
               1327217472141363076211

     random    Random bit generation in a word. For example:

               1023122123232122777127
               0003423100233344322177
               1640034356453221213532
               1123235467543221322120
               1304322300332105534311
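
The example words above can be read more easily once they are split into fields. The following Python sketch is an illustration only and is not part of olcfpt. It assumes the standard 64-bit floating-point word layout of these systems (1 sign bit, a 15-bit exponent, and a 48-bit coefficient); the function names decode_word and is_single_run are hypothetical.

```python
def decode_word(octal_text):
    """Split a 22-digit octal dump word into (sign, exponent, coefficient)."""
    word = int(octal_text, 8)
    sign = (word >> 63) & 1                 # top bit of the 64-bit word
    exponent = (word >> 48) & 0o77777       # next 15 bits
    coefficient = word & ((1 << 48) - 1)    # low 48 bits
    return sign, exponent, coefficient


def is_single_run(value):
    """Return True if the 1 bits of value form one consecutive run."""
    if value == 0:
        return False
    shifted = value // (value & -value)     # drop trailing 0 bits
    return (shifted & (shifted + 1)) == 0   # remaining bits are all 1s


sign, exponent, coefficient = decode_word("0370000000000007740000")
print(sign, oct(exponent), oct(coefficient), is_single_run(coefficient))
```

For the first fpbits example, the coefficient holds a single run of seven 1 bits, whereas the first fprand example has its 1 bits scattered through the coefficient.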


+|-fpadd, +|-fpmult, +|-fprecip, +|-scalar, +|-vector
     Selects (+) or deselects (-) specific instruction
     groups for the following options:

     Option    Instruction Type

     fpadd     Floating-point addition
     fpmult    Floating-point multiply
     fprecip   Floating-point reciprocal
     scalar    Scalar instruction (destination)
     vector    Vector instruction (destination)

     If allowed to default, all instruction groups are run. The
     groups are as follows:

     Option    Instruction Group

     fpadd     062, 063
               170 through 173

     fpmult    064 through 067
               160 through 167

     fprecip   070, 174

     scalar    062, 063
               064 through 067
               070

     vector    160 through 167
               170 through 174

3.2.2  TEST EXECUTION

The olcfpt execution sequence is as follows:

1.  Test initialization
2.  Random floating-point instruction and data generation
3.  Random floating-point instruction buffer simulation
4.  Random floating-point instruction buffer execution
5.  Comparison of simulation and execution results
6.  Error isolation

Steps 2 through 5 occur on each pass through the test loop. Step 6
occurs only on error.
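
The execution sequence can be pictured as a simple loop. The following Python sketch is an illustration only, not the diagnostic's code: the callables generate, simulate, execute, and isolate are hypothetical stand-ins for steps 2, 3, 4, and 6, the per-pattern inner loop is omitted, and the default limits (O'1000 passes, 1 error) are assumptions taken from the sample output in this section.

```python
def run_passes(generate, simulate, execute, cpus,
               maxp=0o1000, maxerr=1, isolate=None):
    """Repeat steps 2 through 5 until a pass or error limit is reached."""
    passes = errors = 0
    while passes < maxp and errors < maxerr:
        buffer, data = generate()                   # step 2
        expected = simulate(buffer, data)           # step 3 (master CPU only)
        failed = False
        for cpu in cpus:
            actual = execute(cpu, buffer, data)     # step 4
            if actual != expected:                  # step 5
                failed = True
                if isolate is not None:
                    isolate(cpu, buffer, data)      # step 6, only on error
        if failed:
            errors += 1
        passes += 1
    return passes, errors


# Toy stand-ins: CPU "b" returns a wrong result on every pass.
result = run_passes(
    generate=lambda: ("buffer", 3),
    simulate=lambda buffer, data: data * 2,
    execute=lambda cpu, buffer, data: data * 2 + (1 if cpu == "b" else 0),
    cpus=["a", "b"],
    maxp=5,
)
print(result)
```

The time limits (time and cputime) are left out of the sketch for brevity.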

3.2.2.1  Test initialization

At test initialization, the selected instructions are processed in the
following order:

1.  All instructions are initially enabled unless either of the
    following occurs (in which case no instructions are initially
    enabled):

    •  An instruction group is selected (+option)

    •  An enable option is entered and there are no deselected
       (-option) instruction group entries

2.  Selected groups are processed, enabling instructions in the
    selected groups.

3.  Deselected groups are processed, disabling instructions in the
    deselected groups.

4.  Individually selected instructions are processed (all
    instructions specified by the enable option).

5.  Individually deselected instructions are processed (all
    instructions specified by the disable option).

6.  Vector instructions disabled by -vload are processed.

7.  If no instructions are selected, an error message is displayed
    and the test is terminated.
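
The processing order above determines which options win when they conflict. The following Python sketch is one possible reading of those seven steps, using the instruction groups listed in subsection 3.2.1; it is an illustration only, and the names GROUPS and select_instructions are hypothetical.

```python
# gh values (octal) for each instruction group, from subsection 3.2.1.
GROUPS = {
    "fpadd": {0o062, 0o063} | set(range(0o170, 0o174)),
    "fpmult": set(range(0o064, 0o070)) | set(range(0o160, 0o170)),
    "fprecip": {0o070, 0o174},
    "scalar": set(range(0o062, 0o071)),
    "vector": set(range(0o160, 0o175)),
}
ALL_INSTRUCTIONS = set().union(*GROUPS.values())


def select_instructions(plus=(), minus=(), enable=(), disable=(), vload=True):
    """Return the set of enabled gh values after steps 1 through 7."""
    # Step 1: start from everything unless a +group or a bare enable is given.
    if plus or (enable and not minus):
        enabled = set()
    else:
        enabled = set(ALL_INSTRUCTIONS)
    for group in plus:                      # step 2
        enabled |= GROUPS[group]
    for group in minus:                     # step 3
        enabled -= GROUPS[group]
    enabled |= set(enable)                  # step 4
    enabled -= set(disable)                 # step 5
    if not vload:                           # step 6
        enabled -= GROUPS["vector"]
    if not enabled:                         # step 7
        raise ValueError("No executable instructions selected.")
    return enabled


chosen = select_instructions(plus=["vector"], disable=[0o166, 0o167])
print(sorted(oct(gh) for gh in chosen))
```

Under this reading, olcfpt +fpmult enable 70,174 yields the multiply group plus instructions 070 and 174, and +vector with -vload leaves nothing to run.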

3.2.2.2  Random floating-point instruction and data generation

These routines build and generate the floating-point instruction buffer
and initial data. Instructions for the buffer are randomly selected from
a list of enabled floating-point instructions.
If the i, j, or k field is represented by an x in the Cray
Assembly Language (CAL), a 0 is used for the field (for additional
information, refer to the CRAY Y-MP, CRAY X-MP EA, CRAY X-MP and CRAY-1
CAL Assembler Version 2 Ready Reference, CRI publication SQ-0083).
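
The gh, i, j, and k fields can be extracted from a one-parcel instruction as shown in the following Python sketch. It is an illustration only and assumes a 16-bit parcel holding a 7-bit gh field followed by three 3-bit fields; decode_parcel is a hypothetical name. The sample parcel 165431 is taken from the instruction buffer dump in subsection 3.2.4, where it is listed as V4 = V3*RV1.

```python
def decode_parcel(parcel):
    """Split a 16-bit instruction parcel into its gh, i, j, and k fields."""
    gh = (parcel >> 9) & 0o177   # 7-bit operation code
    i = (parcel >> 6) & 0o7      # result register designator
    j = (parcel >> 3) & 0o7      # first operand designator
    k = parcel & 0o7             # second operand designator
    return gh, i, j, k


gh, i, j, k = decode_parcel(0o165431)
print(f"gh={gh:03o} i={i} j={j} k={k}")
```

The gh value is the number that the enable ilist and disable ilist options expect.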

3.2.2.3  Random floating-point instruction buffer simulation

After the instructions and data are generated, the floating-point
instruction buffer is simulated by the master CPU only. The save
monitor routine saves the results.


Each instruction type has a unique simulation routine. The simulation
routines use machine resources differently from the instruction being
simulated. For example, the scalar add, pop, leading zero, and logical
functional units are used to simulate the floating-point add functional
unit.

3.2.2.4  Random floating-point instruction buffer execution

After the instructions are simulated, all of the selected CPUs execute
the floating-point instruction buffer. Before the instructions can be
executed, the program loads the following:

•  Scalar registers
•  Vector registers
•  Vector length register

Then an unconditional jump to the floating-point instruction buffer is
executed. At the end of the floating-point instruction buffer is an
unconditional jump to a routine that unloads the contents of all the
registers. The save monitor routine saves the results.

3.2.2.5  Comparison of simulation and execution results

After the instructions are executed in all of the selected CPUs, the
compare monitor routine compares the results, and one of the following
actions occurs:

•  If the results match, the test proceeds with the next data
   pattern. After all of the selected data patterns are run, the
   pass count is incremented.

•  If the results do not match, the test dumps all of the data
   related to the suspected failure and, if the isolation option is
   enabled (+isolate), attempts to isolate the failure.

3.2.2.6  Error isolation

If an error is detected and the isolation option is enabled (+isolate),
the test attempts to identify and isolate the failing instruction by
executing the instructions in the floating-point instruction buffer, one
at a time.

For scalar instructions, error isolation occurs as follows:

1.  The j operand is set to 0. If no error is detected, the
    operand is restored.

2.  The k operand is set to 0. If no error is detected, the
    operand is restored.

3.  Each bit of the j operand is set to 0 (one at a time). If no
    error is detected, the bit is restored.

4.  Each bit of the k operand is set to 0 (one at a time). If no
    error is detected, the bit is restored.

For vector instructions, error isolation occurs as follows:

1.  Each element of the j operand is set to 0 (one at a time). If
    no error is detected, the element is restored.

2.  Each element of the k operand is set to 0 (one at a time). If
    no error is detected, the element is restored.

3.  Each bit of the j operand is set to 0 (one at a time, for all
    elements). If no error is detected, the bit is restored.

4.  Each bit of the k operand is set to 0 (one at a time, for all
    elements). If no error is detected, the bit is restored.
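
The scalar procedure amounts to stripping away operand data that the failure does not depend on. The following Python sketch illustrates that idea under stated assumptions: fails is a hypothetical callable that reruns the suspect instruction with the given operands (in the real test, up to isop n times) and reports whether an error was seen, and isolate_scalar is a hypothetical name.

```python
def isolate_scalar(j, k, fails, width=64):
    """Reduce operands (j, k) to a smaller pair that still fails."""
    if fails(0, k):                 # step 1: try a zero j operand
        j = 0
    if fails(j, 0):                 # step 2: try a zero k operand
        k = 0
    for bit in range(width):        # step 3: clear j bits one at a time
        mask = 1 << bit
        if j & mask and fails(j & ~mask, k):
            j &= ~mask              # error persists, so the bit stays cleared
    for bit in range(width):        # step 4: clear k bits one at a time
        mask = 1 << bit
        if k & mask and fails(j, k & ~mask):
            k &= ~mask
    return j, k


# Toy failure that needs bit 2 of j and bit 0 of k to be set.
def toy_fails(j, k):
    return bool(j & 0o4) and bool(k & 0o1)


print(isolate_scalar(0o777, 0o77, toy_fails))
```

In the toy case the operands shrink to the two bits the failure actually needs, which is the kind of reduced data shown in the isolation dump.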

When the isolation process terminates, the output dump contains the
following:

•  Floating-point instruction buffer
•  Data used when the failure occurred
•  Simulated execution results
•  Actual execution results (if different from the simulated results)
•  An exclusive OR of the simulated and actual execution results
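
The exclusive OR line can be reproduced by hand, which is useful when reading a dump. The following Python sketch is an illustration only; octal_xor is a hypothetical helper, and the two input words are the simulated and actual s5 values from the example dump in subsection 3.2.4.

```python
def octal_xor(expected, actual, digits=22):
    """Return the XOR of two octal dump words as a zero-padded octal string."""
    diff = int(expected, 8) ^ int(actual, 8)
    return format(diff, "o").zfill(digits)


simulated_s5 = "1277607777777777777617"
actual_s5 = "1277607777776000000000"
print(octal_xor(simulated_s5, actual_s5))
```

Each 1 bit in the result marks a bit position where the actual result differs from the simulated result.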

If the failure is very intermittent, the isolation process may terminate
without detecting an error, and then the output dump does not contain any
actual execution results (differences). In this case, increase the value
of isop n, enable the +repeat option, select the failing CPU, and
use the failing seed to rerun the test.
The program may report an error resulting from a failure in either the
simulated or actual execution. To determine if the error is the result
of an actual execution failure, start olcfpt in a different CPU and
select the suspected failing CPU. For example, the following entry
starts olcfpt in CPU c:
olcfpt cpu c
If olcfpt fails, and the simulated execution is suspect, rerun olcfpt
using a different master CPU and the failing seed, as follows:
olcfpt cpu a,c +repeat seed n
If olcfpt fails in CPU c, the failure is in the actual execution of the
floating-point instruction buffer. If olcfpt does not fail, the error
is either in the simulated execution results from CPU c or it is very
intermittent.


3.2.3  TEST TERMINATION

For information on test termination, refer to section 2, Confidence Test
and Monitor Overview.

3.2.4  TEST EXAMPLES

This subsection contains olcfpt execution examples.

The following example runs olcfpt for O'10000000 passes. Output is
redirected to olcfpt.log. The nohup(1) command allows the program to
continue executing after you log off the system. You can later log on to
check the test's progress. The ampersand (&) causes the entire command
to execute in the background, so that another prompt is immediately
displayed and you can continue to use the system.

nohup olcfpt maxp 10000000 >olcfpt.log &

The following example runs olcfpt with selected command options and
shell facilities. The test runs for O'1000000 passes in CPU a with all
default instructions. The job runs as a background process, and the
output is sent to olcfpt.log.

olcfpt maxp 1000000 cpu a >olcfpt.log &

The following example shows a procedure for determining how frequently an
error occurs. The test is rerun with the +repeat option, so that the
first pass is run repeatedly until the test terminates. The test uses
the seed value from the output at the time of the initial error. Error
isolation is disabled. The output is filtered through tail and written
to olcfpt.log.

olcfpt +repeat -isolate maxerr 100 maxp 100 cpu d seed
1436651016713554002511 | tail >olcfpt.log &

The following example runs olcfpt with floating-point multiply
instructions, and instructions 70 and 174.

olcfpt +fpmult enable 70,174 >olcfpt.log &

The following example runs olcfpt with all of the floating-point vector
instructions except instructions 166 and 167.

olcfpt +vector disable 166,167 >olcfpt.log &


The following example runs olcfpt with all of the instructions except
floating-point multiply.

olcfpt -fpmult >olcfpt.log &

The following example shows the output displayed when olcfpt is run
with all default values.

olcfpt

Output:

olcfpt
olcfpt started in cpu A on Tue Aug 25 15:32:16 1987
olcfpt reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 15:32:32 1987

The following example runs olcfpt with the +verbose option enabled so
that a line of output is generated after each pass.
olcfpt +verbose
Output:
olcfpt +verbose
olcfpt started in cpu A on Tue Aug 25 11:42:47 1987
olcfpt: pass = 1, error = 0 Tue Aug 25 11:42:47 1987
olcfpt: pass = 2, error = 0 Tue Aug 25 11:42:47 1987
olcfpt: pass = 3, error = 0 Tue Aug 25 11:42:47 1987

olcfpt: pass = 1000, error = 0 Tue Aug 25 11:43:03 1987
olcfpt reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 11:43:03 1987

The following example runs olcfpt in CPU c only.

olcfpt cpu c

Output:

olcfpt cpu c
olcfpt started in cpu C on Tue Aug 25 11:44:51 1987
olcfpt reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 11:45:07 1987


The following example runs olcfpt in CPUs a and b, with a as the
master. On each pass, olcfpt tests a sequence of instructions, using
fpbits data for the initial register values.
olcfpt +fpbits cpu a,b
Output on an error:
olcfpt +fpbits cpu a,b
olcfpt started in cpus A, B with master cpu A on Wed Oct 26 10:38:22 1988
CRAY X-MP mode
olcfpt: restart file written to A34408-olcfpt
name       <   1640> = 'olcfpt
rev        <   1641> = '5.0
date       <   1642> = '10/21/88'
pass       <   1643> = 11
error      <   1644> = 1
seed       <   1645> = 1260350316637024772740
failpat    <   3254> = 'fpbits
isop       <   1656> = 1000

random floating-point instruction buffer
ibuff
(the floating-point instruction buffer is displayed)
6040a   165431           V4 = V3*RV1
6040b   063556           S5 = S5-FS6
6040c   062607           S6 = S0+FS7
6040d   062031           S0 = S3+FS1
6041a   066742           S7 = S4*RS2
6041b   163360           V3 = V6*HV0
6041c   163125           V1 = V2*HV5
6041d   174670           V6 = /HV7
6042a   006000 016400    J    3500a

initial scalar register data
inits0     <  12740> = 0200777600017777777777
inits1     <  12741> = 1200174777777777777777
inits2     <  12742> = 0201747777740037777777
inits3     <  12743> = 1200070000000000000100
inits4     <  12744> = 0201767777400000000007
inits5     <  12745> = 0277760000000000037777
inits6     <  12746> = 0277607777777777777617
inits7     <  12747> = 1200750077777777777776

initial vector length register
initvl     <  12750> = 0000000000000000000100

initial vector register data
(vector register data is displayed)


simulated floating-point instruction buffer results
The expected data shown below has the following format:

name + index <offset> = data

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.

*** Expected Results ***
cpu A (master)
Source data buffer at 13640 in Memory
Memory address in source data buffer = <offset> + 13640 (source data buffer)

simulated scalar register data results
s0         <   1100> = 1200174777777777777777
s1         <   1101> = 1200174777777777777777
s2         <   1102> = 0201747777740037777777
s3         <   1103> = 1200070000000000000100
s4         <   1104> = 0201767777400000000007
s5         <   1105> = 1277607777777777777617
s6         <   1106> = 1200677777777777777600
s7         <   1107> = 0000000000000000000000
simulated vector length register data results
vl         <   1110> = 0000000000000000000100
simulated vector register data results
(vector register data is displayed)
Differences are the results from actual execution of the floating-point
instruction buffer that differ from the master (simulated or
actual) execution.

s0-s7  = scalar register data results
vl     = vector length register data result
v0-v7  = vector register data results

The difference data shown below has the following format:

name + index <offset> = data
                        data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped. The differences are marked with an
         asterisk (*) preceding the data word.
data differences:
         The bits in difference between the actual results and
         the expected results.

*** Differences ***
cpu A (master)
Source data buffer at 15640 in Memory copied to save buffer at 112573 in Memory
Memory address in source data buffer = <offset> + 15640 (source data buffer)
Memory address in save data buffer = <offset> + 112573 (save data buffer)
actual floating-point buffer execution results

*** Differences ***
cpu B
Source data buffer at 15640 in Memory copied to save buffer at 113705 in Memory
Memory address in source data buffer = <offset> + 15640 (source data buffer)
Memory address in save data buffer = <offset> + 113705 (save data buffer)
actual floating-point buffer execution results
s5         <   1105> = *1277607777776000000000
                        0000000000001777777617

Beginning error isolation
Error isolation complete
name       <   1640> = 'olcfpt
rev        <   1641> = '5.0
date       <   1642> = '10/21/88'
pass       <   1643> = 11
error      <   1644> = 1
seed       <   1645> = 1260350316637024772740
failpat    <   3254> = 'fpbits
isop       <   1656> = 1000

isolation: random floating-point instruction buffer
ibuff
6040b   063556           S5 = S5-FS6
6040c   006000 016400    J    3500a

isolation: initial scalar register data
inits0     <  12740> = 0200777600017777777777
inits1     <  12741> = 1200174777777777777777
inits2     <  12742> = 0201747777740037777777
inits3     <  12743> = 1200070000000000000100
inits4     <  12744> = 0201767777400000000007
inits5     <  12745> = 0200000000000000002000
inits6     <  12746> = 0000000000000000000000
inits7     <  12747> = 1200750077777777777776

(From this point on, the dump is similar to the previously listed
portion of the dump that displayed the unisolated error information.)
The first address (FADD) of the diagnostic is 1640a
olcfpt reached maximum error limit with 11 passes and 1 errors
on Wed Oct 26 10:40:37 1988

3.2.5  TEST MESSAGES

The olcfpt test produces the following types of messages:

•  Informative
•  Error

These messages are described in the subsections that follow.

3.2.5.1  Informative messages

If no error occurs, olcfpt produces two messages, one at start-up time
and another at test termination. If the +verbose option is enabled, a
message is sent to stdout (standard output device) after each pass
through the test loop.

On an error, the test provides information such as the following:

•  Pass and error counts

•  Seed at the beginning of the pass on which the error occurred

•  Contents of the instruction buffer

•  Initial data

•  Data results from the simulated instruction execution in the
   master CPU

•  Differences between the simulated execution results from the
   master CPU and the actual execution results from all of the
   selected CPUs

3.2.5.2  Error messages

One of the following error messages is sent to stderr (standard error
device) if an invalid command option is entered:

olcfpt: selins: No executable instructions selected.
     Correct and rerun.

olcfpt: selins: Vector length must be in the range of 0 through 100.
     Correct the vl option and rerun.

olcfpt: No data patterns(s) selected.
     All data patterns are deselected. Correct and rerun.

One of the following error messages is sent to stderr if olcfpt
detects an unexpected error. Select a different master CPU and rerun the
test. If the problem persists, contact your CRI representative.

olcfpt: simulate: (software error) The gh field is greater than 177.

olcfpt: simulate: (software error) The instruction does not have a
simxxx routine.

olcfpt: generate: (software error) The instruction does not have a
genxxx routine.


3.3  olcm

The olcm test is an on-line central memory test. It tests central
memory and the paths for the S, T, B, and V registers by using unique
algorithms that perform an ascending and descending READ/TEST/WRITE loop
of central memory, one word at a time with scalars and one block (O'100
words) at a time with the T, B, and V registers. olcm also has a
random-data section and a section to create memory conflicts. olcm runs
on CX/CEA and CX/1 systems.

3.3.1  TEST SYNOPSIS

The olcm command options can be entered in any order. If an option is
omitted, the program uses the default value. The test synopsis lists the
olcm command options and arguments in the following order:

1.  Monitor options
2.  Test-specific options

Synopsis:
olcm [chkpnt mode] [cpu clist] [cputime h:m:s] [+|-getseed]
[getseed file] [help] [maxerr n] [maxp n] [+|-parcel] [time h:m:s]
[+|-verbose] [+xmp] [+cray1]†
[section slist] [seed n] [words n]

†  The monitor command options are described in section 2, Confidence
   Test and Monitor Overview.

section slist
     Selects the test sections to be executed. slist is
     entered in the following format:

          n,n,...,n

     n can be any of the following test sections, entered in
     any order (if allowed to default, all test sections are
     executed):

     Section   Description

     1         Central memory storage and scalar path test

     2         Central memory storage and T register path test

     3         Central memory storage and B register path test

     4         Central memory storage and vector register
               path test using only the first vector logical
               unit

     5         Central memory storage and vector register
               path test using both vector logical units

     6         Central memory random data test

     7         Central memory conflict test

seed n
     Sets the random seed to n. n can be any 64-bit
     octal value. If n is 0, the test reads the real-time
     clock and uses the value for the initial seed. The default
     for n is O'33. If seed n is selected, do not select
     +getseed or getseed file.

words n
     Indicates the number of words to be tested in central
     memory. n is a value in the range O'100 through
     O'4,000,000. All values are rounded down to the nearest
     O'100 words. For example, O'150 is rounded down to O'100;
     O'1000 remains unchanged. The default for n is O'3,000.
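
The range check and rounding for words n can be expressed in a few lines. The following Python sketch is an illustration only, not the test's code; validate_words is a hypothetical name, the error texts are those listed in subsection 3.3.5.2, and checking the range before rounding is an assumption.

```python
MIN_WORDS = 0o100
MAX_WORDS = 0o4000000
BLOCK = 0o100


def validate_words(n):
    """Check the words value and round it down to a multiple of O'100."""
    if n < MIN_WORDS:
        raise ValueError(
            "Number of words selected is too small (minimum is 100 octal).")
    if n > MAX_WORDS:
        raise ValueError(
            "Number of words selected is too large (maximum is 4,000,000 octal).")
    return n - (n % BLOCK)


for text in ("150", "1000", "1234"):
    print(text, "->", format(validate_words(int(text, 8)), "o"))
```

The last case matches the error example in subsection 3.3.4, where words 1234 is reported in the dump as words = 1200.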

3.3.2  TEST EXECUTION

The olcm execution sequence is as follows:

1.  Test initialization
2.  Test section execution
3.  Comparison of expected and actual data within each test section
4.  Error report

Steps 2 and 3 occur on each pass through the test loop. Step 4 occurs
only on error.

3.3.2.1  Test initialization

At test initialization, the test information is processed as follows:

1.  The number of words to be tested in central memory is validated
    (words n).

2.  Selected test sections are validated (section slist).

3.  The random seed is validated (seed n).

3.3.2.2  Test section execution

The subsections that follow describe the olcm test sections.

Test section 1 - This section tests central memory storage and the scalar
paths.

The following algorithm is used to perform an ascending and descending
read/test/write loop of central memory (one word at a time):

1.  Write a 64-bit address pattern to all memory locations in the
    test buffer.

2.  Load the scalar register with the pattern from the address
    register.

3.  Verify data integrity by comparing the memory location written
    with the 64-bit address pattern to the scalar register. Generate
    a dump on a data miscompare.

4.  Write the 64-bit address pattern to the previously tested memory
    location.

5.  Increment location if ascending, or decrement if descending.

6.  Repeat steps 2 through 5 until all locations are written.
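
The read/test/write loop of test section 1 can be modeled on an ordinary list. The following Python sketch is an illustration only, not the olcm code: it uses the buffer index as the address pattern, and the names address_pattern_pass and StuckBit are hypothetical. StuckBit simulates a memory word whose 2**2 bit always reads as 0.

```python
def address_pattern_pass(memory, ascending=True):
    """Run one ascending or descending read/test/write pass over memory."""
    size = len(memory)
    for address in range(size):                  # step 1: write the pattern
        memory[address] = address
    order = range(size) if ascending else range(size - 1, -1, -1)
    miscompares = []
    for address in order:                        # steps 5 and 6
        expected = address                       # step 2: pattern from address
        actual = memory[address]                 # step 3: read and compare
        if actual != expected:
            miscompares.append((address, expected, actual, expected ^ actual))
        memory[address] = expected               # step 4: write it back
    return miscompares


class StuckBit(list):
    """List whose word 5 always reads back with the 2**2 bit cleared."""

    def __getitem__(self, index):
        value = super().__getitem__(index)
        return value & ~0o4 if index == 5 else value


print(address_pattern_pass(list(range(8))))
print(address_pattern_pass(StuckBit([0] * 8), ascending=False))
```

Each miscompare tuple holds the address, the expected word, the actual word, and their exclusive OR, which mirrors the $exp, $act, and $dif items of the error dump.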


Test sections 2 and 3 - These sections test the T and B register paths,
respectively, and central memory storage.

The algorithm used in test section 1 is used in these test sections to
perform an ascending and descending read/test/write loop of central
memory. However, in test sections 2 and 3, the algorithm differs as
follows:

•  Data transfers are done in 64-word blocks (rather than one word at
   a time).

•  Data transfers use ascending memory addresses only (the descending
   loops contain descending data blocks with ascending addresses).

Test section 4 - This section tests central memory storage and the vector
register paths, using only the first vector logical unit.


The algorithm used in test section 1 is used in this test section to
perform an ascending and descending test of central memory storage and
the vector register paths. However, in test section 4, the algorithm
differs as follows:

•  Data transfers are done in 64-word blocks (rather than one word at
   a time).

•  Data transfers use negative indexing in the descending test
   subsections, so that the 64-bit pattern is stored in the vector
   registers in reverse order of the way the pattern is stored in
   test sections 2 and 3.

Test section 5 - This section tests central memory storage and the vector
register paths, using both vector logical units.

In section 5, the following occurs:

•  Vector loads are doubled to force the use of more than one central
   memory port.

•  Vector comparisons are doubled to force the use of both logical
   units.

•  The 64-bit pattern is generated with vector recursion. (In a
   vector instruction, vector recursion results when Vi and Vj
   or Vi and Vk refer to the same vector register.)

The algorithm used in test section 1 is used in this test section to
perform an ascending and descending test of central memory storage and
the vector register paths. However, in test section 5, the algorithm
differs as follows:

•  Data transfers are done in 64-word blocks (rather than one word at
   a time).

•  Data transfers use negative indexing in the descending test
   subsections, so that the 64-bit pattern is stored in the vector
   registers in reverse order of the way the pattern is stored in
   test sections 2 and 3.


Test section 6 - This section tests central memory by generating random
data in the subroutine RANCOM. The test does the following in
subsection 1:

1.  Loads random data (64 bits) into V1 (all O'100 elements).

2.  Writes V1 to the central memory area under test (the same block
    of O'100 random words is written consecutively, so that each
    O'100th word is the same).

3.  The central memory area under test is read into V2.

4.  V1 and V2 are compared in V0.
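
Subsection 1 can be modeled as follows. This Python sketch is an illustration only: Python's random module stands in for the RANCOM subroutine, lists stand in for V0, V1, and V2, the buffer length is assumed to be a multiple of O'100 words, and random_block_test is a hypothetical name.

```python
import random

BLOCK = 0o100  # words per vector register


def random_block_test(memory, seed=0o33):
    """Fill memory with one repeated random block, then read and compare."""
    rng = random.Random(seed)
    v1 = [rng.getrandbits(64) for _ in range(BLOCK)]         # step 1
    for base in range(0, len(memory), BLOCK):                # step 2
        memory[base:base + BLOCK] = v1
    differences = []
    for base in range(0, len(memory), BLOCK):
        v2 = memory[base:base + BLOCK]                       # step 3
        v0 = [expected ^ actual for expected, actual in zip(v1, v2)]  # step 4
        differences += [(base + i, d) for i, d in enumerate(v0) if d]
    return differences


buffer = [0] * 0o300
print(random_block_test(buffer))
print(buffer[5] == buffer[5 + BLOCK] == buffer[5 + 2 * BLOCK])
```

A nonzero word in V0 identifies both the failing address and the failing bits, because V0 holds the exclusive OR of the expected and actual data.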

The test does the following in subsection 2:

1.  Loads random data (64 bits) into T00 through T77.

2.  Writes T00 through T77 to the central memory area under test (the
    same block of O'100 random words is written consecutively, so
    that each O'100th word is the same).

3.  The central memory area under test is read into S2.

4.  The T registers are loaded into S1, one word at a time.

5.  S1 and S2 are compared in S0.

The test does the following in subsection 3:

1.  Loads random data (32 bits) into B02 through B77. (B00 and B01
    are skipped because they are used for return jumps.)

2.  Writes B02 through B77 to the central memory area under test (the
    same block of O'100 random words is written consecutively, so
    that each O'100th word is the same).

3.  The central memory area under test is read into A2.

4.  The B registers are loaded into A1, one word at a time.

5.  A1 and A2 are compared in A0.

Test section 7 - This section tests central memory by generating
conflicts in the vector reads. The conflicts are generated as follows:

1.  Do a vector read from the first memory buffer location to V2,
    using an increment of 0.

2.  Increment the memory location by O'40.

3.  Initiate a fetch.

4.  Do a vector read from the memory location (from step 2) to V3,
    using an increment of 0.

5.  Compare V2 and V3 to V4.

6.  Increment the memory location (from step 1) by O'1000, and write
    V4 to the new memory location, using an increment of 1.

7.  Check for error.

8.  Increment the memory location (from step 1) by 1.

9.  Repeat steps 1 through 8 until all memory locations are read.

The two vector reads to locations O'40 words apart generate section and
subsection conflicts. A fetch issued between the two reads generates
conflicts in port D.

3.3.2.3  Comparison of expected and actual data

After each test section is executed, the actual results are compared to
the expected results. If the results match, the test continues. If the
results do not match, the test dumps all of the data related to the
suspected failure. After all of the selected sections are run, the pass
count is incremented.

3.3.2.4  Error report

If an error is detected, the test dumps all of the data related to the
suspected failure. The output dump contains the following:

•  Diagnostic Information Blocks (DIBs)
•  Section and subsection under test
•  Number of central memory words being tested
•  Expected results
•  Actual results
•  Differences
•  Address of the code at the time the error was detected
•  Buffer address of the data at the time the error was detected

3.3.3  TEST TERMINATION

There are several monitor options that can cause a test to terminate.
Refer to the information on test termination in section 2, Confidence
Test and Monitor Overview.

3.3.4  TEST EXAMPLES

This subsection contains olcm execution examples.

The following example executes olcm for a maximum of O'500 passes,
testing O'100,000 words of central memory.

olcm maxp 500 words 100000

The following example executes olcm for a maximum of O'1500 passes,
with test sections 1 and 5 enabled.

olcm maxp 1500 section 1,5

The following example executes olcm for a maximum of O'1000 passes
(default), using an initial seed value of 12345, with test sections 1, 2,
3, 6, and 7 enabled.

olcm seed 12345 section 6,3,2,1,7

The following example runs olcm for O'1000 passes (default), with test
sections 1, 2, 3, and 4 enabled. Output is redirected to olcm.log.
The nohup(1) command allows the program to continue executing after you
log off the system. You can later log on to check the test's progress.
The ampersand (&) causes the entire command to execute in the
background, so that another prompt is immediately displayed and you can
continue to use the system.

nohup olcm section 1,2,3,4 >olcm.log &

The following example shows the output displayed when olcm is run with
all default values.
olcm
Output:
olcm
olcm started in cpu A on Mon Jul 18 11:14:10 1988

CRAY Y-MP MODE
olcm reached maximum pass limit with 1000 passes and 0 errors
on Mon Jul 18 11:14:42 1988

The following example executes olcm for a maximum of 0'1000 passes
(default), testing 0'150 words of central memory.
olcm words 150
Output:
olcm words 150
olcm started in cpu A on Fri Jul 15 15:30:12 1988
CRAY Y-MP MODE
The value for words was rounded down to the nearest 100 octal words
olcm reached maximum pass limit with 1000 passes and 0 errors
on Fri Jul 15 15:30:47 1988


The following example executes olcm for 0 passes (terminated on error),
testing 0'1234 words of central memory.
olcm words 1234
Output (on error):

error      = 1
seed       = 33
failsec    = 1
words      = 1200
subs       = 2
lower      = 13270
upper      = 14470
$dif       = 0000000000000000000004
$exp       = 0000000000000000004467
$act       = 0000000000000000004463
$elem      = 0000000000000000000000
$vm        = 0000000000000000000000

Error Address of the executing code
errcode           <   2760> = 0000000000000000004577
Error Address of the data area
errdata           <   2761> = 0000000000000000014467
A registers at the time of error
savea             <   4340> = 0000000000000000000000
savea    + 0004   <   4344> = 0000000000000001100333
S registers at the time of error
saves             <   4350> = 0000000000000000001234
saves    + 0004   <   4354> = 0000000000000000000000
B registers (sections 3 and 6 only)
$actb             <   3640> = 0000000000000000000000
$actb    + 0004   <   3644> = 0000000000000000000000
$actb    + 0010   <   3650> = 0000000000000000000000

$actb    + 0074   <   3734> = 0000000000000000000000

T registers (sections 2 and 6 only)
$actt             <   3740> = 0000000000000006720344
$actt    + 0004   <   3744> = 0000000015033227672440
$actt    + 0010   <   3750> = 0000356647785921190300

$actt    + 0074   <   4034> = 3987564008722334539870

V0 - Difference (section 6 only)
$difv0            <   4040> = 0000000000000000000000
$difv0   + 0004   <   4044> = 0000000000000000000000
$difv0   + 0010   <   4050> = 0000000000000000000000

$difv0   + 0074   <   4134> = 0000000000000000000000

V1 - Expected (section 6 only)
$expv1            <   4140> = 0000000000000000000000
$expv1   + 0004   <   4144> = 0000000000000000000000
$expv1   + 0010   <   4150> = 0000000000000000000000

$expv1   + 0074   <   4234> = 0000000000000000000000

V2 - Actual (section 6 only)
$actv2            <   4240> = 0000000000000000000000
$actv2   + 0004   <   4244> = 0000000000000000000000
$actv2   + 0010   <   4250> = 0000000000000000000000

$actv2   + 0074   <   4334> = 0000000000000000000000


The first address (FADD) of the diagnostic is 550a
olcm reached maximum pass limit with 0 passes and 1 errors
on Mon Jul 18 14:58:37 1988

3.3.5

TEST MESSAGES

The olcm test produces the following types of messages:

•  Informative
•  Error

These messages are described in the subsections that follow.
3.3.5.1

Informative messages

If no error occurs, olcm produces two messages, one at start-up time and
another at test termination. If the +verbose option is enabled, a message
is sent to stdout (standard output device) after each pass through the test
loop.
If the value for words n is rounded down to the nearest 0'100 words, the
following informative message is displayed:
The value for words was rounded down to the nearest 100 octal words.
If the value for seed n is set to 0, the following informative message is
displayed:
Seed selected was 0, so the test read RTC to initial seed.
3.3.5.2

Error messages

One of the following error messages is sent to stderr (standard error
device) if an invalid command option is entered:

Invalid section selected. Valid sections are: 1, 2, 3, 4, 5, 6, and 7.
Rerun olcm using a valid value for section slist.

Number of words selected is too small (minimum is 100 octal).
Rerun olcm using a valid value for words n.

Number of words selected is too large (maximum is 4,000,000 octal).
Rerun olcm using a valid value for words n.

System could not allocate words; words selected may be too large.
Rerun olcm using a smaller value for words n.
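
The limits and rounding described by these messages can be illustrated with a short sketch. This is not the diagnostic's code: the function name normalize_words is hypothetical, and the order of the checks (limits first, then rounding) is an assumption. Only the limits (0'100 and 0'4,000,000) and the rounding to a multiple of 0'100 come from the messages above.

```python
from typing import Tuple

MIN_WORDS = 0o100       # minimum, from the "too small" message
MAX_WORDS = 0o4000000   # maximum, from the "too large" message


def normalize_words(n: int) -> Tuple[int, bool]:
    """Return (value rounded down to a multiple of 0'100, whether rounding occurred).

    Hypothetical helper; the real test may validate in a different order.
    """
    if n < MIN_WORDS:
        raise ValueError("Number of words selected is too small (minimum is 100 octal).")
    if n > MAX_WORDS:
        raise ValueError("Number of words selected is too large (maximum is 4,000,000 octal).")
    rounded = n & ~0o77  # clear the low six bits
    return rounded, rounded != n


rounded, was_rounded = normalize_words(0o1234)
print(format(rounded, "o"), was_rounded)
```

With words 1234, the sketch yields 1200 (octal), which matches the words value shown in the error output example in section 3.3.4.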

3.3.5.3

Error output definitions

The following are definitions of the output that is dumped on error.
Refer to section 3.3.4, Test Examples, for an example of error output.
Output      Description

failsec     Test section that was executing when the error occurred
words       Size of the central memory buffer being tested
subs        Subsection of the test section
lower       Address of the beginning of the buffer defined by words
upper       Address of the end of the buffer defined by words
errcode     Address where the test code was executing
errdata     Address within the central memory buffer that was being
            tested at the time the error occurred

3.4

olcrit

The olcrit test is an on-line comprehensive random instruction test.
It randomly generates instructions and data to detect
instruction-sensitive and data-sensitive sequence failures. The
generated instructions are simulated and then executed. The simulation
and execution results are compared, and any differences are reported. If
an error is detected, the diagnostic attempts to isolate the failing
instruction sequence. The test generates, simulates, executes, and
compares new instructions and data until the maximum pass, error, or time
limit is reached.
The olcrit test runs under the confidence monitor program, olcmon.
The olcmon monitor compares the test simulation and execution results.
For additional information on olcmon, refer to section 2, Confidence
Test and Monitor Overview.

3.4.1

TEST SYNOPSIS

The olcrit command options can be entered in any order. If an option
is omitted, the program uses the default value. The test synopsis lists
the olcrit command options and arguments in the following order:
1. Monitor options
2. Test-specific options
3. Data pattern options
4. Instruction options

Synopsis:
olcrit [chkpnt mode] [cpu clist] [cputime h:m:s] [+|-getseed]
[getseed file] [help] [maxerr n] [maxp n] [+|-parcel] [time h:m:s]
[+|-verbose] [+Emp] [+cray1]†
[+|-cluster] [cluster n] [disable ilist] [enable ilist]
[+|-isolate] [isop n] [numins n] [+|-repeat] [seed n] [vl n]
[+|-vload]
[+|-bits] [+|-onezero] [+|-random]
[+|-address] [+|-ci] [+|-cm] [+|-ema] [+|-fpadd] [+|-fpmult]
[+|-fprecip] [+|-int] [+|-jump] [+|-logical] [+|-pop] [+|-scalar]
[+|-shift] [+|-shr] [+|-vector]

†  The monitor command options are described in section 2, Confidence
   Test and Monitor Overview.

+|-cluster
    Enables (+cluster) or disables (-cluster) cluster
    selection. This option is recommended only for sites that
    run multitasking jobs. If a site runs multitasking jobs
    and olcrit detects a failure in the shared registers, the
    only way to determine which cluster was used is to enable
    the +cluster option. However, selecting a specific
    cluster with the cluster n option does not ensure that
    olcrit will be able to access that cluster immediately.
    The UNICOS scheduler must wait for that cluster to become
    available. The default is -cluster.


cluster n
    Selects a specific cluster. n can be any one of the
    following cluster numbers associated with the indicated
    mainframe (cluster number 1 is reserved for the operating
    system):

    Mainframe      Cluster Numbers

    CRAY Y-MP/8    2, 3, 4, 5, 6, 7, 10, 11
    CRAY Y-MP/4    2, 3, 4, 5
    CRAY X-MP/4    2, 3, 4, 5
    CRAY X-MP/2    2, 3
    CRAY X-MP/1    2, 3

    If cluster n is selected, the +cluster option must
    also be selected. The default for n is a random cluster
    number.
disable ilist
    Deselects specific instructions. Enter ilist in the
    following format:

    n,n,...,n

    n is the octal value in the gh or ghijk field of the
    specific instruction. If the gh field does not specify a
    unique instruction, the ijk field can be used to deselect
    a specific instruction. For example, the following
    instructions all have the same gh field:

    030j0, 036jk, 037jk

    To deselect the preceding instructions, you must specify
    the ghijk field, as follows:

    disable 03000,03600,03700

    The disable ilist option overrides the enable ilist
    option and any selected (+) or deselected (-)
    instruction options.
enable ilist
    Selects specific instructions. Enter ilist in the
    following format:

    n,n,...,n

    n is the octal value in the gh or ghijk field of the
    specific instruction. If the gh field does not specify a
    unique instruction, the ijk field can be used to select a
    specific instruction. For example, the following
    instructions all have the same gh field:

    0030j0, 0036jk, 0037jk

    To select the preceding instructions, you must specify the
    ghijk field, as follows:

    enable 003000,003600,003700

    The enable ilist option overrides any selected (+) or
    deselected (-) instruction options. When the test is run
    with default values for the +/- instruction options, and
    the enable ilist option is selected, only the
    instructions specified by the enable ilist option are
    run.

    When using the enable option to select any of the
    following instructions, numins n should be greater
    than 1 or the selected instructions will not be placed in
    the instruction buffer:

    34 through 37
    56, 57, 76, 77
    100 through 130
    150 through 153
    176, 177

    All of these instructions use an A register for operations
    such as an index or a shift count. Before each of the
    selected instructions is executed, the test executes an
    A register load instruction. As a result, if numins is
    set to 1, there is no buffer space remaining for the
    instruction using the A register.

+|-isolate
    Enables (+isolate) or disables (-isolate) the error
    isolation option. The default is +isolate.

isop n
    Sets the isolation pass limit to n (octal). During
    isolation, the diagnostic repeatedly executes the suspected
    failing sequence. If the sequence fails, the loop
    terminates and the diagnostic attempts to isolate the
    sequence further. If the sequence does not fail, the loop
    terminates after n passes, and olcrit assumes that the
    error is not in the tested sequence. The default for n
    is 0'1000.


numins n
    Sets the number of instructions to be generated.
    n can be any octal value within the range 1 through 0'2000.
    The default for n is 0'200.

+|-repeat
    Enables (+repeat) or disables (-repeat) the option that
    repeats the first pass until the diagnostic terminates.
    +repeat is useful for recreating an error. It is
    normally used with one of the following options: seed n,
    +getseed, getseed file, or +cluster together with
    cluster n. The default is -repeat (the program generates
    new instructions and data after each pass).

seed n
    Sets the random seed to n. n can be any 64-bit
    octal value. If n is 0, the test reads the real-time clock
    and uses the value for the initial seed. The default for n is
    0'33. If seed n is selected, do not select +getseed or
    getseed file.

vl n
    Sets the vector length to n. n can be any octal value
    within the range 0 through 0'100. The default for n is 0.
    If vl is set to 0, a random vl value is used to initialize
    the test and the value may change during the execution of the
    random instruction buffer.
    If the vl value is within the range 1 through 0'100,
    instruction 00200k is disabled. The vl value is initialized
    to n and remains set to n during the execution of the random
    instruction buffer. However, if instruction 00200k is
    selected by the enable option, the vl value is initialized
    to n and may change each time a 00200k instruction is
    executed in the random instruction buffer.

+|-vload
    Selects (+vload) or deselects (-vload) vector instructions
    for the instruction buffer and, in the case of -vload,
    does not allow you to load (write) or save (read) the
    vector registers. -vload overrides vector instructions
    selected by +vector and enable ilist. The default is
    +vload.

+|-bits, +|-onezero, +|-random
    Selects (+) or deselects (-) specific data patterns.
    If allowed to default, all of the data patterns are run.
    The selected data patterns are used for the initial
    register and memory values. However, the vector length
    (VL) register is always initialized with 6 bits of random
    data. The data patterns are as follows:

    Option     Data Pattern

    bits       Random number of consecutive 1-bits in a
               word. For example:

               0000017777777776000000
               1777000000000000000377
               1777777777777777777777
               0000000000000000000000
               0000000000100000000000

    onezero    Random selection of all 1's or all 0's in a
               word. For example:

               1777777777777777777777
               0000000000000000000000

    random     Random bit generation in a word. For example:

               1023122123232122777127
               0003423100233344322177
               1640034356453221213532
               1123235467543221322120
               1304322300332105534311
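
The three data patterns can be sketched as follows. This is an illustration only, not the diagnostic's generator: the function names are hypothetical, Python's random module stands in for the test's own random number routine, and the assumption that a run of 1-bits may wrap around the word boundary is inferred from the second bits example above (1777000000000000000377).

```python
import random

MASK64 = (1 << 64) - 1  # 64-bit word


def fmt(word):
    """Format a word as 22 octal digits, as in the examples above."""
    return format(word & MASK64, "022o")


def bits_pattern(rng):
    """Run of consecutive 1-bits, random length and position.

    Assumption: the run may wrap around the end of the word.
    """
    length = rng.randint(0, 64)
    run = (1 << length) - 1
    shift = rng.randrange(64)
    # Rotate the run left by `shift` bit positions within 64 bits.
    return ((run << shift) | (run >> (64 - shift))) & MASK64


def onezero_pattern(rng):
    """Either all 1's or all 0's."""
    return rng.choice((0, MASK64))


def random_pattern(rng):
    """64 independent random bits."""
    return rng.getrandbits(64)


rng = random.Random(0o33)  # arbitrary seed for the illustration
for make in (bits_pattern, onezero_pattern, random_pattern):
    print(make.__name__, fmt(make(rng)))
```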

+|-address, +|-ci, +|-cm, +|-ema, +|-fpadd, +|-fpmult, +|-fprecip, +|-int,
+|-jump, +|-logical, +|-pop, +|-scalar, +|-shift, +|-shr, +|-vector
    Selects (+) or deselects (-) specific instruction
    groups for the following options:

    Option     Instruction Type

    address    Address register
    ci         Compressed index
    cm         Central memory
    ema        Extended memory addressing
    fpadd      Floating-point addition
    fpmult     Floating-point multiply
    fprecip    Floating-point reciprocal
    int        Integer
    jump       Jump
    logical    Logical
    pop        Population/parity count
    scalar     Scalar register
    shift      Shift
    shr        Shared register
    vector     Vector register

    The instruction groups are as follows:

    Option     CX/CEA Instructions          CRAY-1 Instructions

    address    001000, 00200k               001000, 00200k
               002200, 002300               010ijkm through 013ijkm
               002500 through 002700        020 through 022
               01hijkm                      023i01
               010ijkm through 013ijkm      024, 025
               020 through 022              030 through 032
               023i01                       034, 035
               024, 025                     10hijkm, 11hijkm
               026ij7, 027ij7
               030 through 032
               034, 035
               10hijkm, 11hijkm

    ci         175ij4, 175ij5               None
               175ij6, 175ij7

    cm         10h through 13h              10h through 13h
               34 through 37                34 through 37
               176i00, 1770j0               176i00, 1770j0

    ema†       01hijkm                      None

    fpadd      062, 063                     062, 063
               170 through 173              170 through 173

    fpmult     064 through 067              064 through 067
               160 through 167              160 through 167

    fprecip    070, 174ij0                  070, 174ij0

    int        030 through 032              030 through 032
               060 through 061              060 through 061
               154 through 157              154 through 157

    jump       005, 006, 007                005, 006, 007
               010 through 017              010 through 017

    logical    042 through 051              042 through 051
               140 through 147              140 through 147
               175                          175

    pop        026ij0, 026ij1               026ij0, 026ij1
               027ij0                       027ijx
               174ij1, 174ij2               174ij1, 174ij2

    scalar     0036jk, 0037jk               014jkm through 017jkm
               014jkm through 017jkm        023ij0
               023ij0                       026ij0, 026ij1
               026ij0, 026ij1               027ij0
               027ij0                       036 through 071
               036 through 071              074, 075
               072i02, 072ij3               12hijkm, 13hijkm
               073i02, 073ij3
               074, 075
               12hijkm, 13hijkm

    shift      052 through 057              052 through 057
               150 through 153              150 through 153

    shr        0036jk, 0037jk               None
               026ij7, 027ij7
               072i02, 072ij3
               073i02, 073ij3

    vector     0030j0, 073i00               003, 073, 076, 077
               076, 077                     140 through 177
               140 through 177

    †  Extended memory instructions are not available on CEA systems in
       Y-mode.

    The diagnostic does not currently execute the following
    instructions in the random instruction buffer: 0, 002400,
    0034jk, 4, 33, 072i00, 073ij1, 176i0k, 176i1k,
    1770jk, 1771jk.

    If allowed to default on a CEA system in Y-mode, all
    instruction groups are selected with the following
    exceptions:

    •  If the cluster number assigned to the job is 0, the
       shared register (shr) instruction group is
       deselected.

    •  The extended memory addressing (ema) instruction
       group is deselected.

    If allowed to default on a CRAY X-MP computer system, all
    instruction groups are selected with the following
    exception: if extended memory addressing (ema) or
    compressed index (ci) hardware is not present in the
    system, the ema and ci instruction groups are
    deselected, respectively.

    If allowed to default on a CRAY-1 computer system, all
    instruction groups are selected except ema, ci, and
    shr. However, the vector population count and parity
    (pop) instruction group is selected only if pop
    hardware is present in the system.

3.4.2

TEST EXECUTION

The olcrit execution sequence is as follows:
1. Test initialization and hardware configuration detection
2. Random instruction and data generation
3. Random instruction buffer simulation
4. Random instruction buffer execution
5. Comparison of simulation and execution results
6. Error isolation

Hardware configuration detection occurs only at test initiation. Steps
2 through 5 occur on each pass through the test loop. Step 6 occurs only
on error.

3.4.2.1

Test initialization and hardware configuration detection

At test initialization, instructions are processed in the following order:
1. All instructions are initially enabled unless either of the
   following occurs (in which case no instructions are initially
   enabled):

   •  An instruction group is selected (+option)

   •  An enable option is entered and there are no deselected
      (-option) instruction group entries

2. Selected groups are processed, enabling instructions in the
   selected groups.

3. Deselected groups are processed, disabling instructions in the
   deselected groups.

4. If the vl option is set to a value within the range
   1 through 0'100, instruction 00200k is deselected.

5. Individually selected instructions are processed (all
   instructions specified by the enable option).

6. Individually deselected instructions are processed (all
   instructions specified by the disable option).

7. Any vector instructions disabled by -vload are processed.

8. If no instructions are selected, an error message is displayed
   and the test is terminated.
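
The processing order above determines which option wins when several are combined. The following sketch models that order with Python sets. It is only an illustration: the function name, its parameters, and the representation of instructions as strings are hypothetical, and hardware-dependent defaults are not modeled.

```python
def initial_instruction_set(all_instrs, groups, selected=(), deselected=(),
                            enable=(), disable=(), vl=0, vload=True,
                            vector_instrs=frozenset(), load_vl="00200k"):
    """Model of steps 1 through 8 (hypothetical helper).

    groups maps a group name (for example "fpadd") to a set of instructions.
    """
    # Step 1: start with everything, or with nothing if a group is selected
    # or an enable list is given without any deselected groups.
    if selected or (enable and not deselected):
        enabled = set()
    else:
        enabled = set(all_instrs)
    # Step 2: selected (+option) groups.
    for name in selected:
        enabled |= groups[name]
    # Step 3: deselected (-option) groups.
    for name in deselected:
        enabled -= groups[name]
    # Step 4: a fixed vl (1 through 0'100) deselects the load-VL instruction.
    if 1 <= vl <= 0o100:
        enabled.discard(load_vl)
    # Step 5: individually enabled instructions.
    enabled |= set(enable)
    # Step 6: individually disabled instructions.
    enabled -= set(disable)
    # Step 7: -vload removes vector instructions.
    if not vload:
        enabled -= set(vector_instrs)
    # Step 8: nothing left is an error.
    if not enabled:
        raise ValueError("no instructions selected")
    return enabled


groups = {"fpadd": {"062", "063"}, "shift": {"052"}}
everything = {"062", "063", "052", "00200k"}
print(sorted(initial_instruction_set(everything, groups, enable=["052"])))
```

Because step 6 runs after step 5, disable overrides enable, and because step 5 runs after step 4, enabling 00200k explicitly restores it even when vl is fixed, as the option descriptions state.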

The hardware configuration detection routine determines which of the
following computer systems is configured:
•  CRAY X-MP computer system
•  CRAY-1 computer system

Then the hardware configuration detection routine adjusts testing
accordingly, by determining the following:

Mainframe       Hardware Configuration Detection Routine

CEA (Y-mode)    Determines whether cluster 0 is in use

CRAY X-MP       Determines whether the system contains extended
                memory addressing and/or compressed indexing
                hardware, and whether cluster 0 is in use

CRAY-1          Determines whether the system contains a vector
                population count functional unit

After determining the hardware characteristics, the routine writes a
message to stdout to indicate the type of system detected, and disables
all instructions that are not available because of hardware constraints.
Instruction generation is dependent on the hardware configuration
detected, as follows (you can use +|-ci, +|-ema, +|-pop, or +|-shr
to override this default instruction generation process):

Mainframe       Instructions Generated

CEA (Y-mode)    All instructions except extended memory addressing
                instructions are generated

CRAY X-MP       All instructions are generated with the
                following exception: compressed indexing and
                extended memory instructions are generated only if
                present in the hardware.

CRAY-1          All instructions are generated except the following:

                A load VL instruction (00200k)
                Scatter/gather/compressed indexing instructions
                Extended memory instructions
                Shared register instructions

                Vector pop/parity instructions are generated
                only if the hardware contains a vector
                population count functional unit.

3.4.2.2

Random instruction and data generation

These routines build and generate the random instruction buffer and
initial data. Instructions for the buffer are randomly selected from a
list of enabled instructions. The values of the i, j, and k fields
are randomly selected when appropriate.
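
The role of the seed in this step can be illustrated with a small sketch. It is an assumption-laden model, not the diagnostic's generator: build_buffer is a hypothetical name, Python's random module replaces the test's own random routine, and every instruction is given random i, j, and k fields even though the real test does so only when appropriate.

```python
import random


def build_buffer(enabled, numins, seed):
    """Pick numins instructions at random from the enabled list.

    Hypothetical model: each entry is (opcode, i, j, k) with 3-bit fields.
    """
    rng = random.Random(seed)
    pool = sorted(enabled)  # fixed order so the same seed gives the same buffer
    buffer = []
    for _ in range(numins):
        opcode = rng.choice(pool)
        i, j, k = (rng.randrange(8) for _ in range(3))
        buffer.append((opcode, i, j, k))
    return buffer


first = build_buffer({"030", "061", "144"}, 0o15, seed=0o33)
again = build_buffer({"030", "061", "144"}, 0o15, seed=0o33)
print(first == again)  # same seed, same instruction buffer
```

This reproducibility is what makes the seed n and +repeat options useful for recreating an error.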

3.4.2.3

Random instruction buffer simulation

After the instructions and data are generated, the random instruction
buffer is simulated by the master CPU only. The save monitor routine
saves the results.
Each instruction type has a unique simulation routine. The simulation
routines use machine resources differently from the instruction being
simulated. For example, the address multiply functional unit may be
simulated with the floating-point multiply functional unit.

3.4.2.4

Random instruction buffer execution

After the instructions are simulated, all of the selected CPUs execute
the random instruction buffer code. Before the instructions can be
executed, the program loads the following:
•  Vector registers
•  Vector length register
•  Vector mask register
•  Address registers
•  B registers
•  T registers
•  Semaphore registers
•  Shared T registers
•  Shared B registers
•  Scalar registers
•  Central memory

Then an unconditional jump to the random instruction buffer is executed.
At the end of the random instruction buffer is an unconditional jump to a
routine that unloads the contents of the registers and central memory.
The save monitor routine saves the results.

3.4.2.5

Comparison of simulation and execution results

After the instructions are executed in all of the selected CPUs, the
compare monitor routine compares the results, and one of the following
actions occurs:
•  If the results match, the test proceeds with the next data
   pattern. After all of the selected data patterns are run, the
   pass count is incremented.

•  If the results do not match, the test dumps all of the data
   related to the suspected failure and, if the isolation option is
   enabled (+isolate), attempts to isolate the failure.
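
The pass loop described in sections 3.4.2.2 through 3.4.2.5 can be summarized in a short sketch. All names here are hypothetical: the generate, simulate, execute, compare, and on_mismatch callables stand in for the test and monitor routines, and the handling of the error limit (stopping as soon as maxerr mismatches are seen, without counting the failing pass) is an assumption.

```python
def run_passes(maxp, patterns, generate, simulate, execute, compare,
               on_mismatch, maxerr=1):
    """Model of the olcrit test loop (illustrative only)."""
    passes = errors = 0
    while passes < maxp:
        for pattern in patterns:
            buffer, data = generate(pattern)     # step 2
            expected = simulate(buffer, data)    # step 3 (master CPU)
            actual = execute(buffer, data)       # step 4 (selected CPUs)
            if not compare(expected, actual):    # step 5
                errors += 1
                on_mismatch(buffer, data, expected, actual)  # dump, isolate
                if errors >= maxerr:
                    return passes, errors
        passes += 1  # incremented only after every selected pattern has run
    return passes, errors


# Toy stand-ins: the "execution" always agrees with the "simulation".
result = run_passes(3, ["bits", "onezero", "random"],
                    generate=lambda p: ([p], p),
                    simulate=lambda b, d: d,
                    execute=lambda b, d: d,
                    compare=lambda e, a: e == a,
                    on_mismatch=lambda *args: None)
print(result)
```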

3.4.2.6

Error isolation

If an error is detected and the isolation option is enabled (+isolate),
the test attempts to reduce the random instruction buffer to the minimum
number of failing instructions. The isolation process consists of two
parts.
In the first part of the isolation process, the instruction buffer is
shortened from the end, one instruction at a time. The isolation routine
initially tests the number of instructions to be generated minus one
(numins n-1). The routine executes until the specified number of
passes is reached (isop n) or an error is detected. If an error is
detected, the number of instructions tested is decremented by one, and
testing continues for isop n passes. This process continues until no
errors are detected or there are no remaining instructions to be tested.
If there are no remaining instructions to be tested and the test detects
an error resulting from loading and unloading the registers, the test
generates an output dump and the isolation process terminates.
In the second part of the isolation process, the last instruction removed
is tested by itself for isop n passes. If an error is not detected,
the last instruction removed and the instruction preceding it in the
random instruction buffer are tested for isop n passes. Until the
program detects an error or reaches the beginning of the instruction
buffer, one more preceding instruction is added to the test sequence on
each iteration of the isolation process.
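
The two parts of the isolation process can be sketched as follows. This is a model under stated assumptions, not the diagnostic's code: isolate and fails are hypothetical names, and fails(sequence) is assumed to run the sequence for up to isop n passes and return True if an error is detected.

```python
def isolate(buffer, fails):
    """Return (status, sequence) for a failing instruction buffer.

    buffer must contain at least one instruction.
    """
    # Part 1: shorten the buffer from the end until it stops failing.
    n = len(buffer) - 1
    while fails(buffer[:n]):
        if n == 0:
            # Error with no instructions left: register load/unload problem.
            return "load/unload", []
        n -= 1
    # buffer[:n] ran clean, so buffer[n] is the last instruction removed.

    # Part 2: start with that instruction alone and add one preceding
    # instruction per iteration until the error reappears.
    start = n
    while True:
        candidate = buffer[start:n + 1]
        if fails(candidate):
            return "isolated", candidate
        if start == 0:
            return "not reproduced", []  # very intermittent failure
        start -= 1


# Toy failure: an error appears only when X precedes Y in the sequence.
toy = ["A", "X", "B", "Y", "C"]
print(isolate(toy, lambda seq: "X" in seq and "Y" in seq))
```

In the toy run, part 1 stops once Y is removed, and part 2 grows the sequence backward from Y until X is included, so the trailing C is excluded from the isolated buffer.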
When the isolation process terminates, the output dump contains the
following:
•  Isolated instruction buffer
•  Data used when the failure occurred
•  Simulated execution results
•  Actual execution results (if different from the simulated results)
•  An exclusive OR of the simulated and actual execution results

If the failure is very intermittent, the second part of the isolation
process may terminate without detecting an error, and then the output
dump will not contain any actual execution results (differences). In
this case, increase the value of isop n, enable the +repeat option,
select the failing CPU, and use the failing seed to rerun the test.
The program may report an error resulting from a failure in either the
simulated or actual execution. To determine if the error is the result
of an actual execution failure, start olcrit in a different CPU and
select the suspected failing CPU. For example, the following entry
starts olcrit in CPU c:
olcrit cpu c

If olcrit fails, and the simulated execution is suspect, rerun olcrit
using a different master CPU and the failing seed, as follows:
olcrit cpu a,c +repeat seed n
If olcrit fails in CPU c, the failure is in the actual execution of the
random instruction buffer. If olcrit does not fail, the error is
either in the simulated execution results from CPU c or it is very
intermittent.

3.4.3

TEST TERMINATION

For information on test termination, refer to section 2, Confidence Test
and Monitor Overview.

3.4.4

TEST EXAMPLES

This subsection contains olcrit execution examples.
The following example runs olcrit for 0'10000000 passes. Output is
redirected to crit.log. The nohup(1) command allows the program to
continue executing after you log off the system. You can later log on to
check the test's progress. The ampersand (&) causes the entire command
to execute in the background, so that another prompt is immediately
displayed and you can continue to use the system.
nohup olcrit maxp 10000000 >crit.log &

The following example runs olcrit with selected command options and
shell facilities. The test runs for 0'1000000 passes in CPU b with all
default instructions. The job runs as a background process, and output
is sent to crit.log.
olcrit maxp 1000000 cpu b >crit.log &

The following example shows a procedure for determining how frequently an
error occurs. The test is rerun with the +repeat option, so that the
first pass is run repeatedly until the test terminates. The test uses
the seed value from the output at the time of the initial error.
Error isolation is disabled. The output is filtered through tail and
written to crit.log.
olcrit +repeat -isolate maxerr 100 maxp 100 cpu d seed 1436651016713554002511 | tail >crit.log &

The following example runs olcrit with floating-point and vector
instructions.
olcrit +fpadd +fpmult +fprecip +vector >crit.log &
The following example runs olcrit with all of the vector instructions
except instructions 146 and 147.
olcrit +vector disable 146,147 >crit.log &

The following example runs olcrit with instructions 026ij0, 026ij1,
026ij7, 031, and 072i02.
olcrit enable 26,31,072002 &
The following example runs olcrit with all of the default instructions
except floating-point add and multiply.
olcrit -fpadd -fpmult >crit.log &
The following example shows the output displayed when olcrit is run
with all default values.
olcrit
Output:
olcrit
olcrit started in cpu A on Tue Aug 25 11:32:08 1987
CRAY X-MP MODE
olcrit reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 11:32:18 1987
The following example runs olcrit with the +verbose option enabled so
that a line of output is generated after each pass.
olcrit +verbose

Output:
olcrit +verbose
olcrit started in cpu A on Tue Aug 25 11:42:47 1987
CRAY X-MP MODE
olcrit: pass = 1, error = 0 Tue Aug 25 11:42:47 1987
olcrit: pass = 2, error = 0 Tue Aug 25 11:42:47 1987
olcrit: pass = 3, error = 0 Tue Aug 25 11:42:47 1987

olcrit: pass = 1000, error = 0 Tue Aug 25 11:42:57 1987
olcrit reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 11:42:57 1987

The following example runs olcrit for 10 seconds (wall-clock time) in
CPU c only.
olcrit cpu c time 10
Output:
olcrit cpu c time 10
olcrit started in cpu C on Tue Aug 25 11:44:51 1987
CRAY X-MP MODE
olcrit reached maximum time limit with 1016 passes and 0 errors
on Tue Aug 25 11:45:01 1987

The following example runs olcrit in CPUs a and b, with b as the
master. On each pass, olcrit tests a sequence of 15 instructions,
using random data for the initial register and memory values.
olcrit numins 15 +random cpu b,a
Output on an error:
olcrit numins 15 +random cpu b,a
olcrit started in cpus A, B with master cpu B on Tue Mar 1 12:40:37 1988
olcrit: restart file written to B67350-olcrit
CRAY X-MP MODE
name      <  2100> = 'olcrit
rev       <  2101> = '4.0
date      <  2102> = '03/01/88'
pass      <  2103> = 31
error     <  2104> = 1
seed      <  2105> = 1114623621420641250446
failpat   <  4027> = 'random
isop      <  2116> = 1000
numins    <  2107> = 15

random instruction buffer
ibuff
10100a   144744           V7        S4\V4
10100b   061032           S0        S3-S2
10100c   012000 042400    JAP       10500a
10101a   020000 026211    A0        00026211
10101c   077406           V4,A6     S0
10101d   144107           V1        S0\V7
10102a   030367           A3        A6+A7
10102b   002700           CMR
10102c   037705           0,A0      T05,A7
10102d   067045           S0        S4*IS5
10103a   020600 000172    A6        00000172
10103c   007000 042410    R         10502a
10104a   021033 031327    A0        #06631327
10104c   006000 021200    J         4240a

jump buffer (used by the random instruction buffer)
jbuff
10500a   001000           PASS
10500b   110000 026400    26400,0   A0
10500d   001000           PASS
10501a   006000 040404    J         10101a
10501c   000000           ERR
10501d   000000           ERR
10502a   024100           A1        B00
10502b   110100 026401    26401,0   A1
10502d   001000           PASS
10503a   005000           J         B00
10503b   000000           ERR
10503c   000000           ERR
10503d   000000           ERR

initial address register data
inita0    < 21600> = 0000000000000016317572
inita1    < 21601> = 0000000000000017662707
inita2    < 21602> = 0000000000000066352041
inita3    < 21603> = 0000000000000066313277
inita4    < 21604> = 0000000000000014173556
inita5    < 21605> = 0000000000000027243236
inita6    < 21606> = 0000000000000055114565
inita7    < 21607> = 0000000000000006421710

initial scalar register data
inits0    < 21610> = 0570435766134171410070
inits1    < 21611> = 0657045641432164307775
inits2    < 21612> = 0362774051154520352750
inits3    < 21613> = 1427136526115123426026
inits4    < 21614> = 1510553624661224560223
inits5    < 21615> = 1734474576202245120017
inits6    < 21616> = 1460472150234237442222
inits7    < 21617> = 1214375337067423156017

initial vector length and mask register data

(vector length and mask register data is displayed)
initial central memory data

(central memory data is displayed)
initial jump data (octal ones pattern)

(jump data is displayed)
initial vector register data

(vector register data is displayed)
initial shared B register data

(shared B register data is displayed)
initial shared T register data

(shared T register data is displayed)
initial semaphore register data

(semaphore register data is displayed)
initial B register data

(B register data is displayed)
initial T register data

(T register data is displayed)
simulated random instruction buffer results
The expected data shown below has the following format:

name  + index  <offset> = data ...

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.

*** Expected Results ***     cpu B (master)

Source data buffer at 22100 in Memory
Memory address in source data buffer = <offset> + 22100 (source data buffer)

simulated address register data results
a0        <  2500> = 0000000000000071146450
a1        <  2501> = 0000000000000000040420
a2        <  2502> = 0000000000000066352041
a3        <  2503> = 0000000000000055114565
a4        <  2504> = 0000000000000014173556
a5        <  2505> = 0000000000000027243236
a6        <  2506> = 0000000000000000000172
a7        <  2507> = 0000000000000006421710

simulated scalar register data results
s0        <  2510> = 0600005600346143005524
s1        <  2511> = 0657045641432164307775
s2        <  2512> = 0362774051154520352750
s3        <  2513> = 1427136526115123426026
s4        <  2514> = 1510553624661224560223
s5        <  2515> = 1734474576202245120017
s6        <  2516> = 1460472150234237442222
s7        <  2517> = 1214375337067423156017

simulated vector length and mask register data results

(vector length and mask register data is displayed)
simulated central memory data results

(central memory data is displayed)
simulated jump data results

(jump data is displayed)
simulated vector register data results

(vector register data is displayed)
simulated shared B register data results

(shared B register data is displayed)
simulated shared T register data results

(shared T register data is displayed)
simulated semaphore register data results

(semaphore register data is displayed)
simulated B register data results

(B register data is displayed)

simulated T register data results

(T register data is displayed)
Differences are the results from actual execution of the random instruction
buffer that differ from the master (simulated or actual) execution.
a0-a7  = address register data results
s0-s7  = scalar register data results
vl     = vector length register data results
vm     = vector mask register data results
cm     = central memory data results
jmp    = jump buffer data results
v0-v7  = vector register data results
sb     = sb0-sb7 register data results
st     = st0-st7 register data results
sm     = semaphore register data results
br     = b00-b77 register data results
tr     = t00-t77 register data results

The difference data shown below has the following format:

name  + index  <offset> = data
                          data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.
         The differences are marked with an asterisk (*) preceding the data word.
data differences: The bits that differ between the actual results and
         the expected results.

*** Differences ***     cpu B (master)

Source data buffer at 25100 in Memory copied to save buffer at 106362 in Memory
Memory address in source data buffer = <offset> + 25100 (source data buffer)
Memory address in save data buffer   = <offset> + 106362 (save data buffer)
actual random buffer execution results
a3        <  2503> = *0000000000000063536475
                      0000000000000036422110

*** Differences ***     cpu A

Source data buffer at 25100 in Memory copied to save buffer at 106362 in Memory
Memory address in source data buffer = <offset> + 25100 (source data buffer)
Memory address in save data buffer   = <offset> + 106362 (save data buffer)
actual random buffer execution results
a3        <  2503> = *0000000000000063536475
                      0000000000000036422110
Beginning error isolation
Error isolation complete
name     <  2100> = 'olcrit'
rev      <  2101> = '4.0'
date     <  2102> = '03/01/88'
pass     <  2103> = 31
error    <  2104> = 1
seed     <  2105> = 1114623621420641250446
failpat  <  4027> = 'random'
isop     <  2116> = 1000
numins   <  2107> = 15

isolation: random instruction buffer
ibuff
10102a   030367           A3        A6+A7
10102b   006000 021200    J         4240a

jump buffer (may be used by the isolated random instruction buffer)
jbuff
10500a   001000           PASS
10500b   110000 026400    26400,0   A0
10500d   001000           PASS
10501a   006000 040404    J         10101a
10501c   000000           ERR
10501d   000000           ERR
10502a   024100           A1        B00
10502b   110100 026401    26401,0   A1
10502d   001000           PASS
10503a   005000           J         B00
10503b   000000           ERR
10503c   000000           ERR
10503d   000000           ERR

isolation: initial address register data
inita0   < 21600> = 0000000000000000026211
inita1   < 21601> = 0000000000000017662707
inita2   < 21602> = 0000000000000066352041
inita3   < 21603> = 0000000000000066313277
inita4   < 21604> = 0000000000000014173556
inita5   < 21605> = 0000000000000027243236
inita6   < 21606> = 0000000000000055114565
inita7   < 21607> = 0000000000000006421710
isolation: initial scalar register data
inits0   < 21610> = 1044142454740403053056
inits1   < 21611> = 0657045641432164307775
inits2   < 21612> = 0362774051154520352750
inits3   < 21613> = 1427136526115123426026
inits4   < 21614> = 1510553624661224560223
inits5   < 21615> = 1734474576202245120017
inits6   < 21616> = 1460472150234237442222
inits7   < 21617> = 1214375337067423156017

(From this point on, the dump is similar to the previously listed
portion of the dump that displayed the unisolated error information.)
The first address (FADD) of the diagnostic is 2100a
olcrit reached maximum error limit with 31 passes and 1 errors
at Tue Mar 1 12:40:59 1988
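
A difference line in the dump above can be decoded by hand, but a small helper makes the arithmetic explicit. The following Python sketch is an illustration only and is not part of olcrit. The function name and argument names are hypothetical, and it assumes that the "data differences" word is the bitwise exclusive OR of the actual and expected words, which is one reading of "the bits that differ". All values are octal strings as printed in the dump.

```python
def decode_difference(offset, actual, diff_bits, source_base, save_base):
    """Decode one difference line; every argument is an octal string."""
    off = int(offset, 8)
    act = int(actual, 8)
    dif = int(diff_bits, 8)
    return {
        # Memory address = <offset> + buffer base, as stated in the dump.
        "source_addr": format(off + int(source_base, 8), "o"),
        "save_addr": format(off + int(save_base, 8), "o"),
        # Assumption: expected word = actual word XOR difference bits.
        "expected": format(act ^ dif, "022o"),
        # Bit positions (0 = least significant) that differ.
        "bad_bits": [b for b in range(64) if (dif >> b) & 1],
    }


# Values taken from the a3 line of the sample dump.
result = decode_difference(
    "2503",
    "0000000000000063536475",
    "0000000000000036422110",
    "25100",
    "106362",
)
print(result["source_addr"], result["save_addr"], result["expected"])
```

For the sample line, the helper places the a3 word at octal address 27603 in the source data buffer and 111065 in the save buffer.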

3.4.5

TEST MESSAGES

The olcrit test produces the following types of messages:
•  Test mode
•  Informative
•  Error

These messages are listed in the subsections that follow.

3.4.5.1

Test mode messages

During test execution, one of the following informational messages is
displayed to indicate the test mode:
CRAY Y-MP MODE
Indicates that the mainframe is a CEA system in Y-mode.


CRAY Y-MP MODE, shared register testing disabled
Indicates that the mainframe is a CEA system in Y-mode, and that
shared register instruction testing is disabled because cluster 0
is in use. If this message is inconsistent with your hardware
configuration, it normally indicates an instruction failure. To
determine where the failure occurred, rerun olcrit with the
+shr command option. Contact your CRI representative for
additional assistance.
CRAY X-MP MODE
Indicates that the mainframe is a CRAY X-MP computer system.
CRAY X-MP MODE, shared register testing disabled
Indicates that the mainframe is a CRAY X-MP computer system, and
that shared register instruction testing is disabled because
cluster 0 is in use. If this message is inconsistent with your
hardware configuration, it normally indicates an instruction
failure. To determine where the failure occurred, rerun olcrit
with the +shr command option. Contact your CRI representative
for additional assistance.
CRAY X-MP MODE, compressed index testing disabled
Indicates that the mainframe is a CRAY X-MP computer system
without compressed indexing hardware. If this message is
inconsistent with your hardware configuration, it normally
indicates an instruction failure. To determine where the failure
occurred, rerun olcrit with the +ci command option. Contact
your CRI representative for additional assistance.
CRAY X-MP MODE, extended memory testing disabled
Indicates that the mainframe is a CRAY X-MP computer system
without extended memory instruction hardware. If this message is
inconsistent with your hardware configuration, it normally
indicates an instruction failure. To determine where the failure
occurred, rerun olcrit with the +ema command option. Contact
your CRI representative for additional assistance.
CRAY-1 MODE
Indicates that the mainframe is a CRAY-1 computer system.
CRAY-1 MODE, vector pop/parity testing disabled
Indicates that the mainframe is a CRAY-1 computer system without
vector population count/parity instruction hardware. If this
message is inconsistent with your hardware configuration, it
normally indicates an instruction failure. To determine where the
failure occurred, rerun olcrit with the +pop command option.
Contact your CRI representative for additional assistance.


3.4.5.2

Informative messages

If the +verbose option is enabled, a message is sent to stdout
(standard output device) after each pass through the test loop.
On an error, the test provides information such as the following:

•  Pass and error counts
•  Seed at the beginning of the pass on which the error occurred
•  Contents of the instruction buffer
•  Initial data
•  Data results from the simulated instruction execution in the
   master CPU
•  Differences between the simulated execution results from the
   master CPU and the actual execution results from all of the
   selected CPUs

In addition, the following informative messages may be displayed:
The ijk field is invalid; the instruction was not
selected/deselected.
The ijk field specified with the gh field for enable ilist
or disable ilist is invalid. Correct and rerun.
The ijk field is not needed to select/deselect the instruction.
The ijk field specified with the gh field for enable ilist
or disable ilist is not required. However, the specified
instruction was selected or deselected.

3.4.5.3

Error messages

One of the following error messages is sent to stderr (standard error
device) if an invalid command option is entered:
olcrit: pattern: No data pattern(s) selected.
All data patterns are deselected. Correct and rerun.
olcrit: selins: No executable instructions selected.
All instructions are deselected. Correct and rerun.
olcrit: selins: Vector length must be in the range 0 through 100.
Vector length is not in the range 0 through 100. Correct the vl
option and rerun.


One of the following error messages is sent to stderr if olcrit
detects an unexpected error. Select a different master CPU and rerun the
test. If the problem persists, contact your CRI representative.
olcrit: simulate: (software error) The instruction does not have a
simxxx routine.

olcrit: generate: (software error) The instruction does not have a
genxxx routine.

olcrit: simulate: (software error) The gh field is greater than
177.


3.5. olcsvc

The olcsvc test provides comprehensive testing of the vector registers,
functional units, and paths, and limited testing of the scalar registers,
functional units, and paths. All address registers, address functional
units, and related paths are assumed to be operating correctly.

The olcsvc test generates a random sequence of vector instructions,
followed by a sequence of scalar instructions. The scalar and vector
instructions perform identical functions. The two sets of instructions
are executed with random data, and the results are compared. Any
differences are reported, and the test attempts to isolate the error. If
no differences are detected, the test generates new instructions and
data, and repeats the process.
The olcsvc test runs under the confidence monitor program, olcmon.
The olcmon monitor compares the scalar and vector execution results.
For additional information on olcmon, refer to section 2, Confidence
Test and Monitor Overview.

3.5.1

TEST SYNOPSIS

The olcsvc command options can be entered in any order. If an option
is omitted, the program uses the default value. The test synopsis lists
the olcsvc command options and arguments in the following order:
1.  Monitor options
2.  Test-specific options
3.  Data pattern options
4.  Instruction options


Synopsis:
olcsvc [chkpnt mode] [cpu clist] [cputime h:m:s] [+/-getseed]
[getseed file] [help] [maxerr n] [maxp n] [+/-parcel] [time h:m:s]
[+/-verbose] [+xmp] [+cray1]†
[disable ilist] [enable ilist] [+/-isolate] [isop n]
[numpar n] [+/-repeat] [seed n] [+/-sgci] [vl n]

[+/-onezero] [+/-random] [+/-slide]

[+/-cm] [+/-fpadd] [+/-fpmult] [+/-fprecip] [+/-int] [+/-logical]
[+/-pop] [+/-shift]
disable ilist
Deselects specific instructions. Enter ilist in the
following format:

n,n,...,n

n is the octal value in the gh field of the specific
vector instructions. Only vector instructions are valid;
all other instructions are ignored. The disable ilist
option overrides the enable ilist option and any
selected (+) or deselected (-) instruction options.

enable ilist
Selects specific instructions. Enter ilist in the
following format:

n,n,...,n

n is the octal value in the gh field of the specific
vector instructions. Only vector instructions are valid;
all other instructions are ignored. If you do not enter
enable ilist, all vector instructions are run. The
enable ilist option overrides any selected (+) or
deselected (-) instruction options. When the test is run
with default values for the +/- instruction options, and
the enable ilist option is selected, only the
instructions specified by the enable ilist option are
run.

†  The monitor command options are described in section 2, Confidence
   Test and Monitor Overview.


+/-isolate
Enables (+isolate) or disables (-isolate) the error
isolation option. The default is +isolate.

isop n
Sets the isolation pass limit to n (octal). During
isolation, the diagnostic repeatedly executes the suspected
failing sequence. If the sequence fails, the loop
terminates and the diagnostic attempts to isolate the
shortened sequence further. If the sequence does not fail,
the loop terminates after n passes, and olcsvc assumes
that the error is not in the tested sequence. The default
for n is 0'1000.

numpar n
Sets the minimum number of parcels of vector instructions
to be generated on each pass. The actual number of parcels
generated can be greater than n on any given pass. n
can be any octal value in the range 1 through 0'200. The
default for n is 0'100.

+/-repeat
Enables (+repeat) or disables (-repeat) the option that
repeats the first pass until the diagnostic terminates.
+repeat is useful for recreating an error. It is
normally used with one of the following options:
seed n, +getseed, or getseed file. The default is -repeat
(the program generates new instructions and data after each
pass).

seed n
Sets the random seed to n. n can be any 64-bit
octal value. If n is 0, the test reads the real-time
clock and uses the value for the initial seed. The default
for n is 0'33. If seed n is selected, do not select
+getseed or getseed file.

+/-sgci
Enables (+sgci) or disables (-sgci) testing of the
scatter/gather/compressed index hardware. When enabled,
testing occurs even if the hardware configuration detection
routine indicates that the hardware is not present in the
system. However, if this option is enabled and the
hardware is not present in the system, you will receive a
dump indicating that the hardware has failed. When allowed
to default, the test determines the type of hardware
configuration and sets the default value accordingly.

vl n
Sets the vector length to n. n can be any octal value
within the range 0 through 0'100. The default for n is 0.

If vl is set to 0, a random vl value is used to
initialize the test and the value may change during the
execution of the random instruction buffer.

If the vl value is within the range 1 through 0'100,
instruction 00200k is disabled. The vl value is
initialized to n and remains set to n during the
execution of the random instruction buffer. However, if
instruction 00200k is selected by the enable option,
the vl value is initialized to n and may change each
time a 00200k instruction is executed in the random
instruction buffer.

+/-onezero, +/-random, +/-slide
Selects (+) or deselects (-) specific data patterns.
Except when the vl value is initialized to a value
within the range 1 through 0'100, random data is used
for the vector length register. The default is
+onezero +random +slide. The data patterns are as
follows:

Option     Data Pattern

onezero    Random selection of all 1's or all 0's in a
           word. For example:

           1777777777777777777777
           0000000000000000000000

random     Random bit generation in a word. For example:

           1023122123232122777127
           0003423100233344322177
           1640034356453221213532
           1123235467543221322120
           1304322300332105534311

slide      Random number of consecutive 1's (0's) that
           slide in either direction through a field of
           0's (1's). Consecutive words contain the
           sliding pattern. For example:

           0777777777777777777777
           0377777777777777777777
           0177777777777777777777
           1077777777777777777777
           1437777777777777777777

           1777777777777777777770
           1777777777777777777774
           1777777777777777777776
           1777777777777777777777

           0000000000000000000001
           0000000000000000000003
           0000000000000000000007
           0000000000000000000017
           0000000000000000000036

           0740000000000000000000
           1700000000000000000000
           1600000000000000000000
           1400000000000000000000
           1000000000000000000000
           0000000000000000000000
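
The slide pattern can be illustrated with a short generator. The Python sketch below is not the olcsvc generator; the function name and its fixed direction (1's entering from the right and sliding left through a field of 0's) are assumptions chosen to reproduce the second example sequence. Words are printed as 22 octal digits, as in the dumps.

```python
MASK64 = (1 << 64) - 1


def slide_left(width):
    """Yield 64-bit words in which `width` consecutive 1's enter from the
    right, slide left through a field of 0's, and leave at the left."""
    ones = (1 << width) - 1
    for shift in range(-(width - 1), 65):
        if shift < 0:
            # Pattern is still entering: only part of it is visible.
            yield ones >> -shift
        else:
            # Pattern slides left; bits pushed past bit 63 are dropped.
            yield (ones << shift) & MASK64


words = [format(w, "022o") for w in slide_left(4)]
for w in words[:5] + words[-6:]:
    print(w)
```

With a width of 4 the first five and last six words match the example listed above. A 0's-through-1's variant could be obtained by complementing each word.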
+/-cm, +/-fpadd, +/-fpmult, +/-fprecip, +/-int, +/-logical, +/-pop,
+/-shift
Selects (+) or deselects (-) specific instruction
groups for the following options:

Option     Instruction Type

cm         Central memory
fpadd      Floating-point addition
fpmult     Floating-point multiply
fprecip    Floating-point reciprocal
int        Integer
logical    Logical
pop        Population/parity count
shift      Shift

If allowed to default, all instruction groups are run. The
groups are as follows:

Option     Instruction Group

cm         176, 177
fpadd      170 through 173
fpmult     160 through 167†
fprecip    174ij0
int        154 through 157
logical    003, 073, 140 through 147, 175
pop        174ij1, 174ij2
shift      150 through 153

†  Instruction 166 is not generated on a CEA system.


3.5.2

TEST EXECUTION

The olcsvc execution sequence is as follows:
1.  Test initialization and hardware configuration detection
2.  Random instruction and data generation
3.  Instruction buffer execution
4.  Comparison of execution results
5.  Error isolation

Hardware configuration detection occurs only at test initiation. Steps
2 through 4 occur on each pass through the test loop. Step 5 occurs only
on error.
3.5.2.1

Test initialization and hardware configuration detection

At test initialization, instructions are processed in the following order:

1.  All instructions are initially enabled unless either of the
    following occurs (in which case no instructions are initially
    enabled):

    •  An instruction group is selected (+option)
    •  An enable option is entered and there are no deselected
       (-option) instruction group entries

2.  Selected groups are processed, enabling instructions in the
    selected groups.

3.  Deselected groups are processed, disabling instructions in the
    deselected groups.

4.  If the vl option is set to a value within the range
    1 through 0'100, instruction 00200k is deselected.

5.  Individually selected instructions are processed (all
    instructions specified by the enable option).

6.  Individually deselected instructions are processed (all
    instructions specified by the disable option).

7.  If no instructions are selected, an error message is displayed
    and the test is terminated.
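
The order of the seven steps matters, because later steps override earlier ones. The following Python sketch models that order. It is an illustration under stated assumptions, not the olcsvc source: the function and variable names are hypothetical, instructions are represented by the strings used in the option tables, and the set of "all instructions" is simplified to the eight groups plus the load VL instruction 00200k.

```python
def octal_range(first, last):
    """Instruction names for an inclusive octal range, e.g. 150 through 153."""
    return {format(n, "o") for n in range(first, last + 1)}


GROUPS = {
    "cm": {"176", "177"},
    "fpadd": octal_range(0o170, 0o173),
    "fpmult": octal_range(0o160, 0o167),
    "fprecip": {"174ij0"},
    "int": octal_range(0o154, 0o157),
    "logical": {"003", "073", "175"} | octal_range(0o140, 0o147),
    "pop": {"174ij1", "174ij2"},
    "shift": octal_range(0o150, 0o153),
}
LOAD_VL = "00200k"


def select_instructions(plus=(), minus=(), enable=(), disable=(), vl=0):
    """Apply steps 1 through 7 in order and return the enabled set."""
    everything = set().union(*GROUPS.values()) | {LOAD_VL}
    # Step 1: start empty if a group is selected, or if enable is used
    # with no deselected groups; otherwise start with everything.
    enabled = set() if plus or (enable and not minus) else set(everything)
    for group in plus:              # Step 2: selected groups
        enabled |= GROUPS[group]
    for group in minus:             # Step 3: deselected groups
        enabled -= GROUPS[group]
    if 1 <= vl <= 0o100:            # Step 4: fixed vl drops load VL
        enabled.discard(LOAD_VL)
    enabled |= set(enable)          # Step 5: individually selected
    enabled -= set(disable)         # Step 6: individually deselected
    if not enabled:                 # Step 7: nothing left to run
        raise ValueError("No executable instructions selected.")
    return enabled


print(sorted(select_instructions(plus=["shift"])))
print(LOAD_VL in select_instructions(vl=0o10, enable=[LOAD_VL]))
```

In this model, a fixed vl removes 00200k in step 4, and an explicit enable entry restores it in step 5, which matches the behavior described for the vl option.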

The hardware configuration detection routine determines which of the
following computer systems is configured:

•  CRAY X-MP computer system
•  CRAY-1 computer system


Then the hardware configuration detection routine adjusts testing
accordingly, by determining the following:

Mainframe    Hardware Configuration Detection Routine

CRAY X-MP    Determines whether the system contains
             scatter/gather/compressed indexing hardware

CRAY-1       Determines whether the system contains a vector
             population count functional unit

After determining the hardware characteristics, the routine writes a
message to stdout to indicate the type of system detected.
Instruction generation is dependent on the hardware configuration
detected, as follows (you can use the +/-sgci option to override this
default instruction generation process):

Mainframe    Instructions Generated

CEA          All instructions are generated except instruction 166,
             which is the 32-bit vector integer multiply instruction.

CRAY X-MP    All instructions are generated with one condition:
             scatter/gather/compressed indexing instructions are
             generated only if present in the hardware.

CRAY-1       All instructions are generated except the following:

             -  A load VL instruction (00200k)
             -  Scatter/gather/compressed indexing instructions
             -  Any instructions that would cause vector
                recursion. (In a vector instruction, vector
                recursion results when Vi and Vj or Vi and
                Vk refer to the same vector register.)

             Vector pop/parity instructions are generated only
             if the hardware contains a vector population
             count functional unit.

3.5.2.2

Random instruction and data generation

These routines build the random vector instruction buffer. As each
vector instruction is generated, the sequence of scalar instructions that
simulates the vector instructions is generated in the scalar instruction
buffer.


The following information applies to the sequence of scalar instructions
that is generated for each vector instruction:

•  Sa, Sb, Sc, and Sd are randomly selected S registers. Am, An, Ap,
   and Aq are randomly selected A registers. The test uses unique
   A registers and S registers for each sequence, but not A0 or S0.
   The registers are not selected based on the ijk fields of the
   vector instruction. Therefore, the same vector instruction does
   not always generate the same sequence of scalar instructions. The
   registers used in the scalar sequence will vary.

•  The labels vireg, vjreg, vkreg, sireg, sjreg, skreg, and vmreg
   are central memory locations containing the simulated vector
   registers, scalar registers, and vector mask register,
   respectively. The actual address depends on the i, j, and k
   fields of the actual vector instruction.

•  For vector instructions that require A registers to contain
   certain values (memory and shift instructions), constant loads of
   the A registers are generated immediately preceding the actual
   vector instruction in the vector instruction buffer.

•  These sequences are altered for certain vector instructions if the
   i, j, and k fields of the vector instruction refer to the
   same vector register. For instructions 141, 143, 145, 155, 157,
   161, 163, 165, 167, 171, and 173, if the j field is equal to the
   k field of the instruction, the read from vkreg in the scalar
   instruction sequence is not generated because it is the same as
   the read from vjreg; this results in faster execution of the
   scalar instruction sequence.

   The following applies only to CRAY-1 computer systems:

   -  For instructions 141, 143, 145, 147 through 153, 155, 157,
      161, 163, 165, 167, 171, and 173, the i field never equals
      the j field.

   -  For instructions 140 through 147, and 154 through 174, the
      i field never equals the k field.

•  The shift instructions normally produce a shift value in the range
   0 through 0'77 for a single shift and 0 through 0'177 for a double
   shift, and only occasionally use a random value for the shift
   amount.

•  For instructions 176i0k and 1770jk (read/write vector to
   central memory), the central memory address is a random address
   within the first 0'400 words of cmbuff. The stride is a random
   value with its upper limit based on the random address and the
   current vector length. Therefore, a large stride can be used if
   the vector length is small.


•  For instructions 176i1k and 1771jk (gather and scatter), the
   program sets up a vector register containing a specific range of
   values by forcing a sequence of instructions before instruction
   176i1k or 1771jk is generated. The forced instructions
   consist of a load of an S register with a 9-bit mask from the
   right (042i67), followed by a 140 instruction (the logical
   product of a scalar register with a vector register to a vector
   register). The resulting vector register is then used as the Vk
   register in a 176i1k or 1771jk instruction. This forces the
   values into the range 0 through 0'777, and it reduces the
   randomness of the instruction sequence generated. The test tracks
   the vector registers that can be used for a gather/scatter
   instruction. If the Vk register is within the range 0 through
   0'777 when a 176i1k or 1771jk instruction is generated, the
   set-up sequence is not generated.

   The following conditions indicate that a vector register is within
   the range 0 through 0'777:

   -  The register was set up for a previous gather/scatter
      instruction.

   -  The register received the results from a 174ij1 or 174ij2
      instruction (pop/parity).

   -  The register received the results from a 140 instruction, and
      the Vk field of the instruction was set up for
      scatter/gather.

   -  The register received the results from a 141 instruction, and
      either the Vj or Vk field of the instruction was set up
      for scatter/gather.

   -  The register received the results from a 143, 145, or 147
      instruction, and the Vj and Vk fields of the instruction
      were set up for scatter/gather.

   -  The register received the results from a 151 instruction
      (single shift right), and the shift value was greater than 55
      (decimal).

   -  The register received the results from a 153 instruction
      (double shift right), and the shift value was greater than
      119 (decimal).
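
The range argument behind the set-up sequence can be checked numerically. The Python sketch below is an illustration only; the helper name is hypothetical. It shows that a logical product with a 9-bit mask from the right confines any 64-bit word to the range 0 through 0'777, and that a single right shift of more than 55 places leaves at most 8 significant bits, which also falls inside that range.

```python
MASK64 = (1 << 64) - 1
MASK9 = (1 << 9) - 1          # 9-bit mask from the right, octal 777


def force_index(word):
    """Logical product of a 64-bit word with the 9-bit mask."""
    return word & MASK9


# Any 64-bit element is confined to 0 through octal 777 by the mask.
print(format(force_index(MASK64), "o"))

# A single right shift by 56 through 63 places leaves at most 8 bits.
worst_case = [MASK64 >> n for n in range(56, 64)]
print(format(max(worst_case), "o"))
```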

The scalar instruction sequence that is generated for each vector
instruction follows.


Scalar instructions are not generated for vector instruction 00200k.
However, during the vector instruction sequence, the VL value to be used
in scalar instruction sequences is loaded into an A register and,
subsequently, the VL register is loaded from the A register.
The scalar instruction sequence for vector instruction 0030j0 is as
follows:

        Sb          sjreg,        ; Read Sj value
        vmreg,      Sb            ; Store resulting VM

The scalar instruction sequence for vector instruction 073i00 is as
follows:

        Sa          vmreg,        ; Read simulated VM reg.
        sireg,      Sa

The scalar instruction sequence for vector instruction 076 is as follows:

        Ap          element       ; Random element number
        Sa          vjreg,Ap      ; Read element from Vj
        sireg,      Sa            ; Store into Si

The scalar instruction sequence for vector instruction 077 is as follows:

        Ap          element       ; Random element number
        Sa          sjreg,        ; Read Sj
        vireg,Ap    Sa            ; Store into Vi

The scalar instruction sequence for vector instructions 140, 142, 144,
154, 156, 160, 162, 164, 166,† 170, and 172 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
        Sb          sjreg,        ; Get S register value
loop    Sc          vkreg,An      ; Get next vector element
        Sa          SbopSc        ; Perform operation
        vireg,An    Sa            ; Store result
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

op can be one of the following:
&, !, \, +, -, *f, *h, *r, *i, +f, -f

†  Instruction 166 is not generated on a CEA system in Y-mode.


The scalar instruction sequence for vector instructions 141, 143, 145,
155, 157, 161, 163, 165, 167, 171, and 173 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Sb          vjreg,An      ; Get next vector
        Sc          vkreg,An      ;   elements
        Sa          SbopSc        ; Perform operation
        vireg,An    Sa            ; Store result
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

op can be one of the following:
&, !, \, +, -, *f, *h, *r, *i, +f, -f

The scalar instruction sequence for vector instruction 146 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
        Sd          vmreg,        ; Get simulated VM reg.
loop    S0          Sd            ; VM to S0 for testing
        jsp         skip1         ; Decide on result
        Sa          sjreg,        ; Read Sj register
        j           skip2         ; Skip vector read
skip1   Sa          vkreg,An      ; Read vector element
skip2   vireg,An    Sa            ; Write result element
        Sd          Sd<1          ; Shift VM value
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

The scalar instruction sequence for vector instruction 147 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
        Sd          vmreg,        ; Get simulated VM reg.
loop    S0          Sd            ; VM to S0 for testing
        jsp         skip1         ; Decide on result
        Sa          vjreg,An      ; Read Vj element
        j           skip2         ; Skip vector read
skip1   Sa          vkreg,An      ; Read Vk element
skip2   vireg,An    Sa            ; Write result element
        Sd          Sd<1          ; Shift VM value
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL
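
The merge sequences for instructions 146 and 147 can be modeled in a few lines. The Python sketch below is an illustration only, with a hypothetical function name. It follows the listed loop: the high-order bit of the shifting VM copy selects the Vj (or Sj) value when set and the Vk element when clear, and the copy is shifted left by one place for each element.

```python
MASK64 = (1 << 64) - 1


def vector_merge(vm, vj, vk, vl):
    """Model of the 147 loop. For 146, pass [sj] * vl as vj."""
    vi = []
    sd = vm                          # Sd holds the simulated VM register
    for n in range(vl):
        if (sd >> 63) & 1:           # sign bit set: jsp does not jump
            vi.append(vj[n])
        else:                        # sign bit clear: jsp jumps to skip1
            vi.append(vk[n])
        sd = (sd << 1) & MASK64      # Sd<1: next element's bit to bit 63
    return vi


vm = 0b101 << 61                     # VM bits for elements 0 and 2 set
print(vector_merge(vm, [1, 2, 3], [10, 20, 30], 3))
print(vector_merge(vm, [7] * 3, [10, 20, 30], 3))   # 146 with Sj = 7
```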


The scalar instruction sequence for vector instructions 150 and 151 is as
follows:

        Ap          shift         ; Amount to shift
        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Sa          vjreg,An      ; Get Vj element
        Sa          SaopAp        ; Do the shift
        vireg,An    Sa            ; Store result
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

op can be < (left shift) or > (right shift).
The scalar instruction sequence for vector instruction 152 is as follows:

loop

skip

Ap
Am
An
Sa
An
AO

shift

vI
0
vjreg,An
An+1
Am-An
Sb
0
jaz
skip
vjreg,An
Sb
Sa,SbAp
Sa
Sd
An+1
Am-An
loop

simulated VL
Index
Zero fill the shift
, Get Vj element
; Copy Sa into Sd
, Do the shift
, Store the result
Copy Sd into Sb
Update index
Test for end
Loop until index = VL

·

·
·


The scalar instruction sequence for vector instruction 174ij0 is as
follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Sb          vjreg,An      ; Get Vj element
        Sa          /hSb          ; Perform operation
        vireg,An    Sa            ; Store result
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

The scalar instruction sequence for vector instructions 174ij1 and
174ij2 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Sb          vjreg,An      ; Get Vj element
        Ap          opSb          ; Perform operation
        vireg,An    Ap            ; Store result
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

op can be P or Q.

The scalar instruction sequence for vector instructions 175ij0 through
175ij3 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
        Sc          SB            ; Mask of current element
        Sa          0             ; Build VM in this register
loop    S0          vjreg,An      ; Get next element
        jump        skip          ; Set VM bit?
        Sa          Sa!Sc         ; Yes - Set bit in VM
skip    Sc          Sc>1          ; Shift for next element
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL
        vmreg,      Sa            ; Store resulting VM

The jump value is determined by the vector instruction, as follows:

Vector         Jump
Instruction    Value

175ij0         jsn
175ij1         jsz
175ij2         jsm
175ij3         jsp


The scalar instruction sequence for vector instructions 175ij4 through
175ij7 is as follows:

        Am          vl            ; Current simulated VL
        An          0             ; Index
        Sc          SB            ; Mask of current element
        Sa          0             ; Build VM in this register
        Ap          0             ; Compressed index pointer
loop    S0          vjreg,An      ; Get next element
        jump        skip          ; Set VM bit?
        Sa          Sa!Sc         ; Yes - set bit in VM
        vireg,Ap    An            ; Store index in Vi
        Ap          Ap+1          ; Update compressed index
skip    Sc          Sc>1          ; Shift for next element
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL
        vmreg,      Sa            ; Store resulting VM

The jump value is determined by the vector instruction, as follows:

Vector         Jump
Instruction    Value

175ij4         jsn
175ij5         jsz
175ij6         jsm
175ij7         jsp
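
The 175ijk sequences differ only in the jump that skips the "set bit" step and in whether element indexes are also stored. The Python sketch below models both forms under stated assumptions: the function and table names are hypothetical, words are treated as 64-bit unsigned values with bit 63 as the sign, and the condition for setting a VM bit is taken to be the opposite of the listed jump condition.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63                       # SB: the mask of the current element

# The VM bit is set when the listed jump is NOT taken.
SET_BIT = {
    0: lambda x: x == 0,             # jsn skips when nonzero
    1: lambda x: x != 0,             # jsz skips when zero
    2: lambda x: not (x & SIGN),     # jsm skips when negative
    3: lambda x: bool(x & SIGN),     # jsp skips when positive
}


def build_vm(k, vj, vl):
    """Model of the 175ij0 through 175ij7 loops; returns (vm, vi).
    vi stays empty for k = 0 through 3 (no compressed index)."""
    test = SET_BIT[k & 3]
    sc, sa, vi = SIGN, 0, []
    for n in range(vl):
        if test(vj[n] & MASK64):
            sa |= sc                 # Sa!Sc: set this element's VM bit
            if k >= 4:
                vi.append(n)         # store index, advance pointer
        sc >>= 1                     # Sc>1: mask for the next element
    return sa, vi


vm, vi = build_vm(4, [0, 5, 0, 0], 4)
print(format(vm, "022o"), vi)
```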

The scalar instruction sequence for vector instruction 176i0k is as
follows:

        Ap          cmaddress     ; CM address in cmbuff
        Aq          stride        ; Random stride value
        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Sa          ,Ap           ; Read from cmbuff
        vireg,An    Sa            ; Store element of vector
        Ap          Ap+Aq         ; Increment address by stride
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL


The scalar instruction sequence for vector instruction 176i1k is as
follows:

        Ap          cmbuff        ; Address of cmbuff
        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Aq          vkreg,An      ; Get element of vector
        Aq          Aq+Ap         ; Calculate address
        Sa          ,Aq           ; Get word from memory
        vireg,An    Sa            ; Store vector element
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL
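
The two read sequences above (176i0k and 176i1k) reduce to simple address arithmetic. The Python sketch below is an illustration only; the function names are hypothetical and central memory is modeled as a Python list indexed by word address.

```python
def strided_read(cm, address, stride, vl):
    """176i0k model: element n comes from address + n * stride."""
    return [cm[address + n * stride] for n in range(vl)]


def gather(cm, base, vk, vl):
    """176i1k model: element n comes from base + Vk[n]."""
    return [cm[base + vk[n]] for n in range(vl)]


cm = list(range(100, 132))           # stand-in for cmbuff contents
print(strided_read(cm, 2, 3, 4))
print(gather(cm, 4, [0, 7, 1], 3))
```

The write sequences for the 177 instructions follow the same address arithmetic with the data flowing in the opposite direction.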

The scalar instruction sequence for vector instruction 1770jk is as
follows:

        Ap          cmaddress     ; CM address in cmbuff
        Aq          stride        ; Random stride value
        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Sb          vjreg,An      ; Get element of vector
        ,Ap         Sb            ; Write to cmbuff
        Ap          Ap+Aq         ; Increment address by stride
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

The scalar instruction sequence for vector instruction 1771jk is as
follows:

        Ap          cmbuff        ; Address of cmbuff
        Am          vl            ; Current simulated VL
        An          0             ; Index
loop    Aq          vkreg,An      ; Get element of vector
        Aq          Aq+Ap         ; Calculate address
        Sb          vjreg,An      ; Get vector element
        ,Aq         Sb            ; Write word to memory
        An          An+1          ; Update index
        A0          Am-An         ; Test for end
        jan         loop          ; Loop until index = VL

3.5.2.3

Instruction buffer execution

After the instructions and data are generated, the scalar and vector
instruction buffers are executed first in the master CPU, and then in
each of the other selected CPUs. Immediately following the execution of
an instruction buffer, the save monitor routine is called to save the
execution results.


3.5.2.4

Comparison of execution results

After the scalar and vector instruction buffers are executed in all of
the selected CPUs, the compare monitor routine compares the results,
and one of the following actions occurs:
•  If the results match, the test proceeds with the next pass.

•  If the results do not match, the test dumps all of the data
   related to the suspected failure and, if the isolation option is
   enabled (+isolate), attempts to isolate the failure by reducing
   the number of instructions in the execution buffers in which the
   failure is occurring. Refer to the test output to determine which
   CPU has failed.

3.5.2.5

Error isolation

If an error is detected and the isolation option is enabled (+isolate),
the test attempts to reduce the random vector instruction buffer to the
minimum number of failing instructions. If an instruction sequence is
removed from the vector instruction buffer, the corresponding scalar
instruction sequence is removed from the scalar instruction buffer. If a
vector instruction requires that a set of registers be used together to
perform a specific function, such as the address registers for memory
references, the set of instructions is considered to be a single
instruction sequence.
The isolation process consists of two parts. During the first part, the
vector instruction buffer is shortened from the end, one instruction
sequence at a time. The isolation routine initially tests the number of
instruction sequences generated minus one. The routine executes until
the specified number of passes is reached (isop n) or an error is
detected. If an error is detected, the number of instruction sequences
tested is decremented by one, and testing continues for isop n
passes. This process continues until no errors are detected or until
there are no remaining instructions to be tested.
If there are no remaining instructions to be tested and the test detects
an error resulting from loading and unloading the registers, the test
generates an output dump and the isolation process terminates.
During the second part of the isolation process, the last instruction
sequence removed is tested by itself for isop n passes. If no error
is detected, the preceding instruction sequence is loaded into the random
vector instruction buffer and tested for isop n passes. Until the
program detects an error or reaches the beginning of the instruction
buffer, one more preceding instruction is added to the test sequence on
each iteration of the isolation process.
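
The two parts of the isolation process can be summarized as a short procedure. The Python sketch below is a simplified model, not the olcsvc implementation. The function names are hypothetical, `fails` stands in for one execute-and-compare pass over a buffer of instruction sequences, and an empty buffer represents the case in which only the register load and unload remain.

```python
def isolate(sequences, fails, isop):
    """Return the isolated instruction sequences, [] if the failure is in
    register load/unload, or None if no error is detected in part two."""

    def fails_within(buffer):
        # Run up to isop passes; stop at the first failing pass.
        return any(fails(buffer) for _ in range(isop))

    # Part 1: shorten the buffer from the end, one sequence at a time,
    # starting with the number of sequences generated minus one.
    count = len(sequences) - 1
    while count >= 0 and fails_within(sequences[:count]):
        count -= 1
    if count < 0:
        return []                    # fails with no instructions left
    end = count + 1                  # sequences[count] was removed last

    # Part 2: test the last removed sequence by itself, then add one
    # preceding sequence per iteration until an error is detected.
    for start in range(count, -1, -1):
        if fails_within(sequences[start:end]):
            return sequences[start:end]
    return None                      # intermittent: nothing isolated


def needs_b_and_d(buffer):
    """Example failure: occurs only when both B and D are executed."""
    return "B" in buffer and "D" in buffer


print(isolate(list("ABCDE"), needs_b_and_d, isop=4))
```

In the example, the failure requires both B and D, so the procedure reports the contiguous run B, C, D. A return value of None corresponds to the intermittent case, in which a larger isop value and the +repeat option are the suggested next steps.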


When the isolation process terminates, the output dump contains the
following:
•  Isolated vector and scalar instruction buffers
•  Data used when the failure occurred
•  Scalar execution results from the master CPU
•  Vector execution differences from the master CPU
•  Scalar and vector execution differences from other CPUs

If the failure occurs intermittently, the second part of the isolation
process may terminate without detecting an error, and execution
difference results do not appear in the output dump. In this case,
increase the value of isop n, enable the +repeat option, select the
failing CPU, and use the failing seed to rerun the test.
All of the selected CPUs execute the scalar and vector instruction
buffers. Therefore, if the program reports an error resulting from a
failure in either the scalar or vector execution, the difference results
should indicate where the failure occurred. For example, if the scalar
and vector results indicate differences in all of the selected CPUs, the
scalar instruction buffer in the master CPU is suspect. In this case,
use the failing seed to rerun olcsvc in a different master CPU.

3.5.3 TEST TERMINATION

For information on test termination, refer to section 2, Confidence Test
and Monitor Overview.

3.5.4 TEST EXAMPLES

This subsection contains olcsvc execution examples.
The following example runs olcsvc for 0'10000000 passes in CPU b.
Output is redirected to olcsvc.log. The nohup(1) command allows the
program to continue executing after you log off the system. You can
later log on to check the test's progress. The ampersand (&) causes
the entire command to execute in the background, so that another prompt
is immediately displayed and you can continue to use the system.
nohup olcsvc maxp 10000000 cpu b >olcsvc.log &


The following example shows a procedure for determining how frequently an
error occurs. The test is rerun with the +repeat option, so that the
first pass is run repeatedly until the test terminates. The test uses
the seed value from the output sent to fail.log at the time of the
initial error. Error isolation is disabled. The output is filtered to
olcsvc.log.
olcsvc +repeat -isolate maxerr 100 maxp 100 cpu d getseed fail.log |
tail >olcsvc.log &
The following example runs olcsvc with floating-point multiply and
central memory instructions, and instructions 140 through 143. The test
uses a constant vector length of 0'100.
olcsvc +fpmult +cm enable 140,141,142,143 vl 100 >olcsvc.log &
The following example runs olcsvc with all of the vector logical
instructions except instructions 146 and 147.
olcsvc +logical disable 146,147 >olcsvc.log &
The following example runs olcsvc with all of the instructions except
floating-point multiply.
olcsvc -fpmult >olcsvc.log &
The following example shows the output displayed when olcsvc is run
with all default values.
olcsvc
Output:
olcsvc
olcsvc started in cpu A on Tue Aug 25 13:42:07 1987
CRAY X-MP MODE
olcsvc reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 13:42:15 1987
The following example runs olcsvc with the +verbose option enabled so
that a line of output is generated after each pass.
olcsvc +verbose


Output:
olcsvc +verbose
olcsvc started in cpu A on Tue Aug 25 11:42:47 1987
CRAY X-MP MODE
olcsvc: pass = 1, error = 0 Tue Aug 25 11:42:47 1987
olcsvc: pass = 2, error = 0 Tue Aug 25 11:42:47 1987
olcsvc: pass = 3, error = 0 Tue Aug 25 11:42:47 1987

olcsvc: pass = 1000, error = 0 Tue Aug 25 11:42:55 1987
olcsvc reached maximum pass limit with 1000 passes and 0 errors
on Tue Aug 25 11:42:55 1987

The following example runs olcsvc for 10 seconds (CPU time) in CPU c
only.
olcsvc cpu c cputime 10
Output:
olcsvc cpu c cputime 10
olcsvc started in cpu C on Tue Aug 25 11:44:51 1987
CRAY X-MP MODE
olcsvc reached maximum cputime limit with 1510 passes and 0 errors
on Tue Aug 25 11:45:06 1987

The following example runs olcsvc in CPUs a and c, with a as the
master. On each pass, the test generates 20 parcels of vector
instructions.
olcsvc cpu a,c numpar 20
Output on an error:
olcsvc cpu a,c numpar 20
olcsvc started in cpus A, C with master cpu A on Mon Feb 9 11:19:19 1987
CRAY X-MP MODE
olcsvc: restart file written to A11524-olcsvc
name     <11760> = 'olcsvc
rev      <11761> = '4.0
date     <11762> = '02/09/87'
pass     <11763> = 4
error    <11764> = 1
seed     <11765> = 37507312636362015466
vl       <11770> = 0
numpar   <12016> = 20
isop     <14527> = 1000
failpat  <12475> = 'slide


Output (continued):
random vector instruction buffer

15456a
15456b
15456c
15456d
15457a
15457c
15457d
15460a
15460b
15460c
15460d
15461a
15461b
15461c
15461d
15462a

175073
160464
143005
153060
020600 000072
150156
163334
162334
147015
163607
165604
141752
162716
141227
172347
006000 057120

VM
V4
VO
VO
A6
V1
V3
V3
VO
V6
V6
V7
V7
V2
V3
J

vbuff
V7,M
S6*FV4
VO!V5
V6,V6>AO
00000072
V500
S6
24457,A2
SO
15523c
JSP
S6!S2
S6
S2>100-77
S2
A2+AO
A2
AO
A6-A2
15522b
JAN
23546,0
S6
00000002
A6
00
A5
23555,0
S2
24157,A5
S3
S2*FS3
S4
24157,A5 S4
A5+AO
A5
A6-A5
AO
15526c
JAN

(scalar instructions simulating all of the vector instructions are
displayed)
initial vector length and mask register data
initvl   <21533> = 0000000000000000000002
initvm   <21534> = 1600000000000000000000


initial scalar register data
inits0   <21535> = 1700000000000000000000
inits1   <21536> = 1740000000000000000000
inits2   <21537> = 1760000000000000000000
inits3   <21540> = 1770000000000000000000
inits4   <21541> = 1774000000000000000000
inits5   <21542> = 1776000000000000000000
inits6   <21543> = 1777000000000000000000
inits7   <21544> = 1777400000000000000000

initial vector register data

(vector register data is displayed)
initial Central Memory data

(central memory data is displayed)
scalar instruction buffer execution results
The expected data shown below has the following format:

name + index <offset> = data ...

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.

*** Expected Results ***    cpu A (master)

Source data buffer at 16300 in Memory copied to save buffer at 73613 in Memory
Memory address in source data buffer = <offset> + 16300 (source data buffer)
Memory address in save data buffer   = <offset> + 73613 (save data buffer)

Scalar Buffer Execution Results
scalar buffer execution: vector length and mask register data results
vlreg    <2010> = 0000000000000000000002
vmreg    <2011> = 0000000000000000000000

scalar buffer execution: scalar register data results
s0reg    <2000> = 1700000000000000000000
s1reg    <2001> = 1740000000000000000000
s2reg    <2002> = 1760000000000000000000
s3reg    <2003> = 1770000000000000000000
s4reg    <2004> = 1774000000000000000000
s5reg    <2005> = 1776000000000000000000
s6reg    <2006> = 1777000000000000000000
s7reg    <2007> = 1777400000000000000000

scalar buffer execution: vector register data results

(vector register data is displayed)
scalar buffer execution: central memory data results

(central memory data is displayed)
The following data shows the differences between executing
the scalar buffer in the master CPU and executing the
vector buffer and scalar buffer in any remaining CPUs.

vlreg        = vector length register results
vmreg        = vector mask register results
s0reg-s7reg  = scalar register data results
v0reg-v7reg  = vector register data results
cmbuff       = central memory data results

The difference data shown below has the following format:

name + index <offset> = data
                        data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.
         The differences are marked with an asterisk (*) preceding the data word.
data differences: The bits that differ between the actual results and
         the expected results.

*** Differences ***    cpu A (master)

Source data buffer at 16300 in Memory copied to save buffer at 75626 in Memory
Memory address in source data buffer = <offset> + 16300 (source data buffer)
Memory address in save data buffer   = <offset> + 75626 (save data buffer)
Vector Buffer Execution Results

*** Differences ***    cpu C

Source data buffer at 16300 in Memory copied to save buffer at 77641 in Memory
Memory address in source data buffer = <offset> + 16300 (source data buffer)
Memory address in save data buffer   = <offset> + 77641 (save data buffer)
Scalar Buffer Execution Results


*** Differences ***    cpu A (master)

Source data buffer at 16300 in Memory copied to save buffer at 101654 in Memory
Memory address in source data buffer = <offset> + 16300 (source data buffer)
Memory address in save data buffer   = <offset> + 101654 (save data buffer)
Vector Buffer Execution Results
v0reg    <23557> = *1773777777777777777000
                    0004000000000000000000
Beginning error isolation
Error isolation complete
name     <11760> = 'olcsvc
rev      <11761> = '4.0
date     <11762> = '02/09/87'
pass     <11763> = 4
error    <11764> = 1
seed     <11765> = 37507312636362015466
vl       <11770> = 0
numpar   <12016> = 20
isop     <14527> = 1000
failpat  <12475> = 'slide

isolated random vector instruction buffer
vbuff
15460a    162334           V3      S3*HV4
15460b    147015           V0      V1!V5&VM
15460c    006000 057120    J       13624a

(From this point on, the dump is similar to the previously listed portion of
the dump that displayed the unisolated error information.)
The first address (FADD) of the diagnostic is 11760a
olcsvc reached maximum error limit with 4 passes and 1 errors
on Mon Feb 9 17:23:52 1987

3.5.5 TEST MESSAGES

The olcsvc test produces the following types of messages:
• Test mode
• Informative

These messages are listed in the subsections that follow.


3.5.5.1 Test mode messages

During test execution, one of the following messages is displayed to
indicate the test mode:
CRAY Y-MP MODE
Indicates that the mainframe is a CEA system.
CRAY X-MP MODE
Indicates that the mainframe is a CRAY X-MP computer system.
CRAY X-MP MODE: scatter/gather/compressed index testing disabled
Indicates that the mainframe is a CRAY X-MP computer system
without scatter/gather/compressed indexing hardware. If this
message is inconsistent with your hardware configuration, it
normally indicates an instruction failure. To determine where the
failure occurred, rerun olcsvc with the +sgci command option.
Contact your CRI representative for additional assistance.
CRAY-1 MODE
Indicates that the mainframe is a CRAY-1 computer system.
CRAY-1 MODE: vector pop/parity testing disabled
Indicates that the mainframe is a CRAY-1 computer system without
vector population count/parity hardware. If this message is
inconsistent with your hardware configuration, it normally
indicates an instruction failure. To determine where the failure
occurred, rerun olcsvc with the +pop command option. Contact
your CRI representative for additional assistance.

3.5.5.2 Informative messages

If the +verbose option is enabled, a message is sent to stdout
(standard output device) after each pass through the test loop.
On an error, the test provides information such as the following:

• Pass and error counts
• Seed at the beginning of the pass on which the error occurred
• Contents of the vector instruction buffer
• Contents of the scalar instruction buffer
• Initial data
• Data results from the scalar instruction execution in the master CPU
• Differences in the scalar execution results from the master CPU,
  the scalar execution results from the remaining selected CPUs, and
  the vector execution results from all of the selected CPUs

3.6 olibuf

The olibuf test is an on-line instruction buffer test. To detect
data-sensitive failures, the program generates test buffers and runs data
patterns through the instruction buffer. To detect branching failures,
the program generates test buffers containing in-stack and out-of-stack
jumps, compares expected jump addresses to actual jump addresses, and
reports any differences. The test continues until the maximum pass,
error, or time limit is reached.

3.6.1 TEST SYNOPSIS

The olibuf command options can be entered in any order. If an option
is omitted, the program uses the default value. The test synopsis lists
the olibuf command options and arguments in the following order:
1. Monitor options
2. Test-specific options
3. Data pattern options

Synopsis:
olibuf [chkpnt mode] [cpu clist] [cputime h:m:s] [+|-getseed] [getseed file]
       [help] [maxerr n] [maxp n] [+|-parcel] [time h:m:s]
       [+|-verbose] [+xmp] [+cray1]†
       [+|-repeat] [seed n] [section slist]
       [+|-onezero] [+|-random] [+|-solid]

+|-repeat
Enables (+repeat) or disables (-repeat) the option that
repeats the first pass until the diagnostic terminates.
+repeat is useful for recreating an error. It is
normally used with one of the following options: seed n,
+getseed, or getseed file. The default is -repeat
(the program generates new instructions and data after each
pass).

† The monitor command options are described in section 2, Confidence
  Test and Monitor Overview.


seed n
Sets the random seed to n. n can be any 64-bit
octal value. If n is 0, the test reads the real-time
clock and uses the value for the initial seed. The default
for n is 0'33. If seed n is selected, do not select
+getseed or getseed file.

section slist
Selects the test sections to be executed. slist is
entered in the following format:

n,n,...,n

n can be one of the following test sections (if allowed
to default, all test sections are executed):

Section    Description

1          Executes a 16-bit pattern through parcel 0 of
           all words in the instruction buffer

2          Executes a 16-bit pattern through parcel 1 of
           all words in the instruction buffer

3          Executes a 16-bit pattern through parcel 2 of
           all words in the instruction buffer

4          Executes a 16-bit pattern through parcel 3 of
           all words in the instruction buffer

5          Executes random in-stack and out-of-stack
           jumps in the instruction buffer

+|-onezero, +|-random, +|-solid
Selects (+) or deselects (-) specific data patterns.
If allowed to default, all of the data patterns are run.
The data patterns are as follows:

Option     Data Pattern

onezero    On each pass, random patterns of all 1's or
           all 0's are run through the test area. For
           example:
           177777
           000000

random     On each pass, random bit patterns are run
           through the test area. For example:
           102314
           000347
           164002
           112323
           130431

solid      On each pass, a random pattern of either all
           1's or all 0's is run through the test area
           with one complement pattern. The location of
           the complement pattern is randomly selected.
           For example:
           Pass 1
           177777
           177777
           000000 (complement)
           177777
           177777
           Pass 2
           000000
           177777 (complement)
           000000
           Pass 3
           000000
           177777 (complement)
           000000
           Pass 4
           177777 (complement)
           000000
           177777
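
The three data pattern options can be illustrated with a short Python sketch. It assumes 16-bit parcels, as in the octal examples above; the function names are hypothetical, and Python's `random.Random` merely stands in for the test's own seeded generator, so the values produced will not match those of olibuf.

```python
import random

ONES = 0o177777   # 16-bit parcel of all 1's
ZEROS = 0o000000  # 16-bit parcel of all 0's


def onezero(count, rng):
    """Each parcel is randomly either all 1's or all 0's."""
    return [rng.choice((ONES, ZEROS)) for _ in range(count)]


def random_pattern(count, rng):
    """Each parcel is a random 16-bit pattern."""
    return [rng.randrange(0o200000) for _ in range(count)]


def solid(count, rng):
    """All 1's or all 0's, with one complement parcel at a random location."""
    base = rng.choice((ONES, ZEROS))
    parcels = [base] * count
    parcels[rng.randrange(count)] = base ^ ONES  # 16-bit complement
    return parcels


def as_octal(parcels):
    """Format parcels as six-digit octal strings, as in the dumps."""
    return [f"{p:06o}" for p in parcels]


if __name__ == "__main__":
    rng = random.Random(0o33)  # illustrative seed only
    for generate in (onezero, random_pattern, solid):
        print(generate.__name__, as_octal(generate(5, rng)))
```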

3.6.2 TEST EXECUTION

The olibuf execution sequence is as follows:
1. Test initialization
2. Test buffer generation
3. Test buffer execution
4. Comparison of expected and actual data
5. Error report

Steps 2 through 4 occur on each pass through the test loop. Step 5
occurs only on error.

3.6.2.1 Test initialization

At test initialization, the selected sections and patterns are processed
in the following order:

1. All sections and patterns are initially enabled.
2. Selected sections are processed.
3. Deselected patterns are processed. If all patterns are
   deselected, an error message is displayed and the test is
   terminated.
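
The processing order above can be modeled with a few lines of Python. This is only a sketch of the documented order: the function name, its arguments, and the use of `ValueError` for the error case are assumptions made for illustration, not part of olibuf.

```python
ALL_SECTIONS = (1, 2, 3, 4, 5)
ALL_PATTERNS = ("onezero", "random", "solid")


def initialize(sections=None, deselected=()):
    """Return the (sections, patterns) that would be run.

    sections   -- values given with `section slist`, or None for the default
    deselected -- pattern names given with a leading minus sign
    """
    # 1. All sections and patterns are initially enabled.
    enabled_sections = set(ALL_SECTIONS)
    enabled_patterns = set(ALL_PATTERNS)

    # 2. Selected sections are processed.
    if sections:
        unknown = set(sections) - set(ALL_SECTIONS)
        if unknown:
            raise ValueError(f"unknown test section(s): {sorted(unknown)}")
        enabled_sections = set(sections)

    # 3. Deselected patterns are processed; deselecting all is an error.
    enabled_patterns -= set(deselected)
    if not enabled_patterns:
        raise ValueError("all data patterns are deselected")

    return sorted(enabled_sections), sorted(enabled_patterns)
```

For example, `initialize(sections=[1], deselected=["solid"])` models a run of `olibuf section 1 -solid`.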


3.6.2.2 CRAY X-MP computer system test buffer generation

The generation routine builds and generates the test buffers. A test
buffer is generated for each section selected. Test sections 1 through 4
use the following instructions to execute a pattern through the
instruction buffer:
001000     PASS              Pass
020ijkm    Ai     exp        Transmit exp=jkm to Ai
11hi000    ,Ah    Ai         Store (Ai) to (Ah)
030ijk     Ai     Aj+Ak      Integer sum of (Aj) and (Ak) to Ai
0050jk     J      Bjk        Jump to (Bjk)

Test section 5 uses the following instructions to execute random in-stack
and out-of-stack jumps in the instruction buffer:

020ijkm    Ai     exp        Transmit exp=jkm to Ai
11hi000    ,Ah    Ai         Store (Ai) to (Ah)
030ijk     Ai     Aj+Ak      Integer sum of (Aj) and (Ak) to Ai
006ijkm    J      exp        Jump to exp
0050jk     J      Bjk        Jump to (Bjk)

The following example shows a sample test buffer for section 1. The
parcel 0 instructions and data patterns are used to test first the odd
and then the even words. When the test buffer is executed, each data
pattern (nnnnnn) is loaded into parcel 0 of each instruction buffer
word.
Example:

                                                      Instruction
Address   Opcode           CAL Mnemonics              Buffer Word

5340a     001000           PASS
5340b     001000           PASS
5340c     001000           PASS
5340d     020100 nnnnnn    A1      00nnnnnn           001
5341b     112100 000000    0,A2    A1
5341d     030223           A2      A2+A3
5342a     001000           PASS
5342b     001000           PASS
5342c     001000           PASS
5342d     020100 nnnnnn    A1      00nnnnnn           003
5343b     112100 000000    0,A2    A1
5343d     030223           A2      A2+A3
5344a     001000           PASS
5344b     001000           PASS
5344c     001000           PASS

5536d     020100 nnnnnn    A1      00nnnnnn           177
5537b     112100 000000    0,A2    A1
5537d     030223           A2      A2+A3
5540a     001000           PASS
5540b     001000           PASS
5540c     001000           PASS
5540d     001000           PASS
5541a     001000           PASS
5541b     001000           PASS
5541c     001000           PASS
5541d     020100 nnnnnn    A1      00nnnnnn           002
5542b     112100 000000    0,A2    A1
5542d     030223           A2      A2+A3
5543a     001000           PASS
5543b     001000           PASS
5543c     001000           PASS

5735d     020100 nnnnnn    A1      00nnnnnn           176
5736b     112100 000000    0,A2    A1
5736d     030223           A2      A2+A3
5737a     001000           PASS
5737b     001000           PASS
5737c     001000           PASS
5737d     020100 nnnnnn    A1      00nnnnnn           000
5740b     112100 000000    0,A2    A1
5740d     030223           A2      A2+A3
5741a     005000           J       B00

The following example shows a sample test buffer for section 5.

The following example shows a sample test buffer for section 5.
Example:

Absolute Address    CAL Mnemonics            Jump Address

testbuff:           ERR     000
testbuff+02:        A1      0000000001
testbuff+06:        0,A2    A1
testbuff+12:        A2      A2+A3
testbuff+14:        J       00000026660      testbuff+214a
testbuff+20:        ERR     000
testbuff+22:        ERR     000
testbuff+24:        ERR     000
testbuff+26:        ERR     000
testbuff+30:        ERR     000
testbuff+32:        ERR     000
testbuff+34:        ERR     000
testbuff+36:        ERR     000
testbuff+40:        A1      0000000020
testbuff+44:        0,A2    A1
testbuff+50:        A2      A2+A3
testbuff+52:        J       00000026201      testbuff+100b
testbuff+56:        ERR     000
testbuff+60:        ERR     000
testbuff+62:        ERR     000
testbuff+64:        ERR     000
testbuff+66:        A1      0000000033
testbuff+72:        0,A2    A1
testbuff+76:        A2      A2+A3
testbuff+100:       J       00000026507      testbuff+161d

testbuff+2340:      ERR     000
testbuff+2342:      ERR     000
testbuff+2344:      A1      0000001162
testbuff+2350:      0,A2    A1
testbuff+2354:      A2      A2+A3
testbuff+2356:      J       B00              Return jump
testbuff+2360:      ERR     000

testbuff+2370:      ERR     000
testbuff+2372:      ERR     000
testbuff+2374:      A1      0000001176
testbuff+2400:      0,A2    A1
testbuff+2404:      A2      A2+A3
testbuff+2406:      J       00000026634      testbuff+207a
testbuff+2412:      ERR     000
testbuff+2414:      ERR     000
testbuff+2416:      ERR     000


3.6.2.3 CRAY Y-MP computer system test buffer generation

The generation routine builds and generates the test buffers. A test
buffer is generated for each section selected. Test sections 1 through 4
use the following instructions to execute a pattern through the
instruction buffer:
0010000     PASS              Pass
020i00mn    Ai     exp        Transmit nm to Ai
11hi00 00   ,Ah    Ai         Store (Ai) to (Ah)
030ijk      Ai     Aj+Ak      Integer sum of (Aj) and (Ak) to Ai
0050jk      J      Bjk        Jump to (Bjk)

Test section 5 uses the following instructions to execute random in-stack
and out-of-stack jumps in the instruction buffer:
0010000     PASS              Pass
020i00mn    Ai     exp        Transmit nm to Ai
11hi00 00   ,Ah    Ai         Store (Ai) to (Ah)
030ijk      Ai     Aj+Ak      Integer sum of (Aj) and (Ak) to Ai
006ijkm     J      exp        Jump to exp
0050jk      J      Bjk        Jump to (Bjk)

The following example shows a sample test buffer for section 1. The
parcel 0 instructions and data patterns are used to test first the odd
and then the even words. When the test buffer is executed, each data
pattern (nnnnnn) is loaded into parcel 0 of each instruction buffer
word.
Example:

                                                            Instruction
Address   Opcode                  CAL Mnemonics             Buffer Word

15740a    001000                  PASS
15740b    001000                  PASS
15740c    001000                  PASS
15740d    020100 nnnnnn 000000    A1      00000nnnnnn       001
15741c    112100 000000 000000    0,A2    A1
15742b    030223                  A2      A2+A3
15742c    001000                  PASS
15742d    020100 nnnnnn 000000    A1      00000nnnnnn       003
15743c    112100 000000 000000    0,A2    A1
15744b    030223                  A2      A2+A3
15744c    001000                  PASS
15744d    020100 nnnnnn 000000    A1      00000nnnnnn       005
15745c    112100 000000 000000    0,A2    A1
15746b    030223                  A2      A2+A3

16136d    020100 nnnnnn 000000    A1      00000nnnnnn       177
16137c    112100 000000 000000    0,A2    A1
16140b    030223                  A2      A2+A3
16140c    001000                  PASS
16140d    001000                  PASS
16141a    001000                  PASS
16141b    001000                  PASS
16141c    001000                  PASS
16141d    020100 nnnnnn 000000    A1      00000nnnnnn       002
16142c    112100 000000 000000    0,A2    A1
16143b    030223                  A2      A2+A3
16143c    001000                  PASS
16143d    020100 nnnnnn 000000    A1      00000nnnnnn       004
16144c    112100 000000 000000    0,A2    A1
16145b    030223                  A2      A2+A3

16335d    020100 nnnnnn 000000    A1      00000nnnnnn       176
16336c    112100 000000 000000    0,A2    A1
16337b    030223                  A2      A2+A3
16337c    001000                  PASS
16337d    020100 nnnnnn 000000    A1      00000nnnnnn       000
16340c    112100 000000 000000    0,A2    A1
16341b    030223                  A2      A2+A3
16341c    005000                  J       B00


The following example shows a sample test buffer for section 5.
Example:

Absolute Address    CAL Mnemonics

15740a:             A1      00000000000
15740d:             0,A2    A1
15741c:             A2      A2+A3
15741d:             J       0016061b
15742b:             ERR
15742c:             ERR
15742d:             ERR
15743a:             ERR
15743b:             ERR
15743c:             ERR
15743d:             ERR
15744a:             A1      00000000020
15744d:             0,A2    A1
15745c:             A2      A2+A3
15745d:             J       0016040b
15746b:             ERR
15746c:             ERR
15746d:             A1      00000000033
15747c:             0,A2    A1
15750b:             A2      A2+A3
15750c:             J       0016152d
15751a:             ERR
15751b:             ERR
15751c:             ERR
15751d:             ERR
15752a:             ERR
15752b:             ERR
15752c:             ERR
15752d:             ERR
15753a:             ERR
15753b:             ERR
15753c:             ERR
15753d:             ERR
15754a:             ERR
15754b:             ERR
15754c:             ERR
15754d:             ERR
15755a:             ERR
15755b:             ERR
15755c:             A1      00000000066
15756b:             0,A2    A1
15757a:             A2      A2+A3
15757b:             J       0016012c

16034b:             ERR
16034c:             ERR
16034d:             ERR
16035a:             ERR
16035b:             A1      00000000365
16036a:             0,A2    A1
16036d:             A2      A2+A3
16037a:             J       B00             (Return Jump)
16037b:             ERR
16037c:             ERR
16037d:             ERR
16040a:             ERR
16040b:             A1      00000000401
16041a:             0,A2    A1
16041d:             A2      A2+A3
16042a:             J       0015746d
16042c:             ERR
16042d:             ERR

16166c:             ERR
16166d:             A1      00000001133
16167c:             0,A2    A1
16170b:             A2      A2+A3
16170c:             J       0015744a
16171a:             ERR
16171b:             ERR
16171c:             ERR
16171d:             ERR
16172a:             ERR
16172b:             ERR
16172c:             ERR
16172d:             ERR
16173a:             ERR
16173b:             ERR
16173c:             ERR
16173d:             ERR
16174a:             ERR
16174b:             ERR
16174c:             A1      00000001162
16175b:             0,A2    A1
16176a:             A2      A2+A3
16176b:             J       0015775c
16176d:             ERR
16177a:             ERR


3.6.2.4 Test buffer execution

After the test buffers are generated, the execution routine jumps to the
buffer and executes the test buffer code in all of the selected CPUs.
The save monitor routine saves the results. If a jump fails and an
error exit occurs (section 5 only), no results are saved.

3.6.2.5 Comparison of expected and actual data

After the instructions are executed in all of the selected CPUs, the
compare monitor routine compares the results. The actual results are
compared to the expected results. If the results match, the test
continues.
After all of the selected sections and data patterns are run, the pass
count is incremented. If the results do not match, the test dumps all of
the data related to the suspected failure.

3.6.2.6 Error report

If an error is detected, the test dumps all of the data related to the
suspected failure. The output dump contains the following:
• Diagnostic Information Block
• Test buffer data at the time of the failure
• Expected results
• Differences

3.6.3 ERROR ISOLATION TO THE FAILING BIT

An error report is generated for each section in which an error occurs.
By examining a dump for any one of the test sections 1 through 4, you can
isolate the error to the failing bit.


3.6.3.1 CX/1 system error isolation

Use the following procedure to isolate an error to the failing bit
(perform all arithmetic operations in octal):
1. For a CRAY X-MP computer system, use the index to determine the
   failing word as follows:

   Index             Failing Word

   0'177             0
   index < 0'100     (index x 2) + 1
   index >= 0'100    (index - 0'77) x 2

   For a CRAY-1 computer system, use the index to determine the
   failing word as follows:

   Index             Failing Word

   0'77              0
   index < 0'40      (index x 2) + 1
   index >= 0'40     (index - 0'37) x 2

2. Examine the failing word to isolate the error to the failing bit.
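
The index-to-word tables in step 1 can be checked with a short Python function. The function name and the `buffer_words` parameter are assumptions for illustration: 0o200 reproduces the CRAY X-MP table and 0o100 reproduces the CRAY-1 table. Python octal literals (0o...) are used in place of the manual's 0'... notation.

```python
def failing_word(index, buffer_words=0o200):
    """Map a dump index to the failing instruction buffer word.

    buffer_words -- 0o200 for the CRAY X-MP table, 0o100 for the CRAY-1 table
    """
    half = buffer_words // 2       # 0o100 (X-MP) or 0o40 (CRAY-1)
    last = buffer_words - 1        # 0o177 (X-MP) or 0o77 (CRAY-1)
    if not 0 <= index <= last:
        raise ValueError("index is outside the instruction buffer")
    if index == last:              # the last index maps to word 0
        return 0
    if index < half:               # first half: odd words
        return index * 2 + 1
    return (index - (half - 1)) * 2  # second half: even words


if __name__ == "__main__":
    # Index 0'100 on a CRAY X-MP system gives failing word 2.
    print(oct(failing_word(0o100)))
```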

The following example for a CRAY X-MP computer system shows a dump that
was generated after test section 1 detected an error. By examining the
dump, you can isolate the error to the failing bit, as follows (perform
all arithmetic operations in octal):
1. Use the index (0'100) to determine the failing word as follows:

   (index - 0'77) x 2 = failing word
   (0'100 - 0'77) x 2 = 2

2. By examining the failing word, you can see that bit 2^5 is
   dropped.
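
The bit position in step 2 is read from the difference word that the dump prints beneath the flagged data word (000040 in this example). A minimal Python sketch of that reading follows; the function names are hypothetical, and the exclusive-OR in `difference_word` is an assumption about how a word of differing bits is formed from expected and actual values.

```python
def difference_word(expected, actual):
    """Bits that differ between the expected and actual data words."""
    return expected ^ actual


def differing_bits(diff):
    """Return the positions n (as in 2^n) of the bits set in a difference word."""
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    # Difference word 000040 (octal) has only bit 2^5 set.
    print(differing_bits(0o000040))
```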

Example:
olibuf started in cpu A on Mon May 23 15:53:40 1988
olibuf: running
olibuf: restart file written to A33641-olibuf
name     <1340> = 'olibuf
rev      <1341> = '1.0
date     <1342> = '05/17/88'
pass     <1343> = 0
error    <1344> = 1
seed     <1345> = 33
failsec  <1422> = 1
failpat  <2156> = 'random

Section 1 - test buffer tests parcel 0
buff
5340a     001000           PASS
5340b     001000           PASS
5340c     001000           PASS
5340d     020100 000033    A1      00000033
5341b     112100 000000    0,A2    A1
5341d     030223           A2      A2+A3
5342a     001000           PASS
5342b     001000           PASS
5342c     001000           PASS

5541a     001000           PASS
5541b     001000           PASS
5541c     001000           PASS
5541d     020100 120304    A1      00120304
5542b     112100 000000    0,A2    A1
5542d     030223           A2      A2+A3
5543a     001000           PASS
5543b     001000           PASS
5543c     001000           PASS
5543d     020100 164114    A1      00164114
5544b     112100 000000    0,A2    A1
5544d     030223           A2      A2+A3
5545a     001000           PASS
5545b     001000           PASS
5545c     001000           PASS

Expected results
data        <  0> = 000000 000000 000000 000033  000000 000000 000000 000505
data + 0002 <  2> = 000000 000000 000000 016667  000000 000000 000000 010021
data + 0004 <  4> = 000000 000000 000000 130653  000000 000000 000000 042425

data + 0174 <174> = 000000 000000 000000 147000  000000 000000 000000 141014
data + 0176 <176> = 000000 000000 000000 073260  000000 000000 000000 042520

Difference(s) between exp and act results
data + 0100 <200> = *000000 000000 000000 120304* 000000 000000 000000 164114
                     000000 000000 000000 000040  000000 000000 000000 000000


3.6.3.2 CRAY Y-MP computer system error isolation

Use the following procedure to isolate an error to the failing bit
(perform all arithmetic operations in octal):
1. Use the index to determine the failing word as follows:

   Index             Failing Word

   0'177             0
   index < 0'100     (index x 2) + 1
   index >= 0'100    (index - 0'77) x 2

2. Examine the failing word to isolate the error to the failing bit.

The following example for a CRAY Y-MP computer system shows a dump that
was generated after test section 1 detected an error. By examining the
dump, you can isolate the error to the failing bit, as follows (perform
all arithmetic operations in octal):
1. Use the index (0'132) to determine the failing word as follows:

   (index - 0'77) x 2 = failing word
   (0'132 - 0'77) x 2 = 66

2. By examining the failing word, you can see that bit 2^3 is
   dropped.

Example:
olibuf started in cpu A on Thu Aug 25 15:14:33 1988
olibuf: restart file written to A62851-olibuf
name     <10740> = 'olibuf
rev      <10741> = '1.0
date     <10742> = '08/19/88'
pass     <10743> = 0
error    <10744> = 1
seed     <10745> = 33
failsec  <11022> = 1
failpat  <11616> = 'random

Section 1 - test buffer tests parcel 0
buff
15740a    001000                  PASS
15740b    001000                  PASS
15740c    001000                  PASS
15740d    020100 000033 000000    A1      00000000033
15741c    112100 000000 000000    0,A2    A1
15742b    030223                  A2      A2+A3
15742c    001000                  PASS
15742d    020100 000505 000000    A1      00000000505
15743c    112100 000000 000000    0,A2    A1
15744b    030223                  A2      A2+A3
15744c    001000                  PASS
15744d    020100 016667 000000    A1      00000016667
15745c    112100 000000 000000    0,A2    A1
15746b    030223                  A2      A2+A3

16223d    020100 063732 000000    A1      00000063732
16224c    112100 000000 000000    0,A2    A1
16225b    030223                  A2      A2+A3
16225c    001000                  PASS
16225d    020100 165420 000000    A1      00000165420
16226c    112100 000000 000000    0,A2    A1
16227b    030223                  A2      A2+A3
16227c    001000                  PASS
16227d    020100 152151 000000    A1      00000152151
16230c    112100 000000 000000    0,A2    A1
16231b    030223                  A2      A2+A3

Expected results
data        <  0> = 000000 000000 000000 000033  000000 000000 000000 000505
data + 0002 <  2> = 000000 000000 000000 016667  000000 000000 000000 010021
data + 0004 <  4> = 000000 000000 000000 130653  000000 000000 000000 042425

data + 0174 <174> = 000000 000000 000000 147000  000000 000000 000000 141014
data + 0176 <176> = 000000 000000 000000 073260  000000 000000 000000 042520

The difference data shown below has the following format:

name + index <offset> = data
                        data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.
         The differences are marked with an asterisk (*) preceding the data word.
data differences: The bits that differ between the actual results and
         the expected results.

*** Differences ***
Source data buffer at 14740 in Memory copied to save buffer at 103362 in Memory
Memory address in source data buffer = <offset> + 14740 (source data buffer)
Memory address in save data buffer   = <offset> + 103362 (save data buffer)
Difference(s) between exp and act results
data + 0132 <264> = *000000 000000 000000 165420* 000000 000000 000000 152151
                     000000 000000 000000 000010  000000 000000 000000 000000

3.6.4 TEST TERMINATION

If a jump fails in section 5, an error exit occurs.
There are several monitor options that can cause a test to terminate.
Refer to the information on test termination in section 2, Confidence
Test and Monitor Overview.

3.6.5 TEST EXAMPLES

This subsection contains olibuf execution examples.

The following example runs olibuf with selected command options and
shell facilities. The test runs for 0'1000000 passes in CPU b with all
default instructions. The job runs as a background process, and the
output is sent to olibuf.log.
olibuf maxp 1000000 cpu b >olibuf.log &
The following example runs olibuf with section 1 selected.
olibuf section 1


The following example runs olibuf for 0'10000000 passes. Output is
redirected to olibuf.log. The nohup(1) command allows the program to
continue executing after you log off the system. You can later log on to
check the test's progress. The ampersand (&) causes the entire command
to execute in the background, so that another prompt is immediately
displayed and you can continue to use the system.
nohup olibuf maxp 10000000 >olibuf.log &

The following example shows the output displayed when olibuf is run
with all default values.
olibuf
Output:
olibuf
olibuf started in cpu A on Fri Aug 28 11:14:10 1987
olibuf reached maximum pass limit with 1000 passes and 0 errors
on Fri Aug 28 11:14:14 1987

The following example runs olibuf with the +verbose option enabled so
that a line of output is generated after each pass.
olibuf +verbose
Output:
olibuf +verbose
olibuf started in cpu A on Fri Aug 28 11:14:14 1987
olibuf: pass = 1, error = 0 Fri Aug 28 11:14:14 1987
olibuf: pass = 2, error = 0 Fri Aug 28 11:14:14 1987
olibuf: pass = 3, error = 0 Fri Aug 28 11:14:14 1987

olibuf: pass = 1000, error = 0 Fri Aug 28 11:14:14 1987
olibuf reached maximum pass limit with 1000 passes and 0 errors
on Fri Aug 28 11:14:14 1987


The following example runs olibuf in CPU c only.
olibuf cpu c
Output:
olibuf cpu c
olibuf started in cpu C on Fri Aug 28 11:14:14 1987
olibuf reached maximum pass limit with 1000 passes and 0 errors
on Fri Aug 28 11:14:14 1987

The following example runs olibuf in CPUs a and b, with a as the master.
olibuf cpu a,b
Output:
olibuf cpu a,b
olibuf started in cpus A, B with master cpu A on Fri Aug 28 11:14:14 1987
olibuf reached maximum pass limit with 1000 passes and 0 errors
on Fri Aug 28 11:14:14 1987

The following example runs olibuf with the +verbose option enabled.
The output is generated after an error is detected.
olibuf +verbose
Output:
olibuf +verbose
olibuf started in cpu A on Fri Aug 28 11:14:14 1987
olibuf: restart file written to A7465-olibuf
name     < 14420> = 'olibuf
rev      < 14421> = '1.0
date     < 14422> = '08/27/87'
pass     < 14423> = 0
error    < 14424> = 1
seed     < 14425> = 52301500217376
failsec  < 23221> = 1
failpat  < 15174> = 'solid
Generated test buffer tests parcel 0
buff

(the test buffer that was executing when the error was detected is
dumped in parcel and ASCII format)


Section 1 - parcel 0 test
The expected data shown below has the following format:
name + index <offset> = data ...

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.

*** Expected Results ***    cpu A (master)

Source data buffer at 6427 in Memory copied to save buffer at 70201 in Memory
Memory address in source data buffer = <offset> + 6427 (source data buffer)
Memory address in save data buffer   = <offset> + 70201 (save data buffer)

*** Expected Results ***
(the expected data is dumped in parcel format)
The difference data shown below has the following format:
name + index <offset> = data
data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.
         The differences are marked with an asterisk (*) preceding the data word.
data differences: The bits that differ between the actual results and
the expected results.

*** Differences ***    cpu A (master)

Source data buffer at 6427 in Memory copied to save buffer at 71204 in Memory
Memory address in source data buffer = <offset> + 6427 (source data buffer)
Memory address in save data buffer   = <offset> + 71204 (save data buffer)

*** Differences ***

(The differences are displayed. Differences are the results of the
actual execution of the test buffer that differ from the expected
results.)
The first address (FADD) of the diagnostic is 14420a
olibuf reached maximum error limit with 0 passes and 1 errors
on Fri Aug 28 11:14:23 1987

3.6.6

TEST MESSAGES

The olibuf test produces the following types of messages:
•  Informative
•  Error

These messages are described in the subsections that follow.

3.6.6.1

Informative messages

If no error occurs, olibuf produces two messages, one at start-up time
and another at test termination. If the +verbose option is enabled, a
message is sent to stdout (standard output device) after each pass
through the test loop.
On an error, the test provides information such as the following:

•  Pass and error counts
•  Seed at the beginning of the pass on which the error occurred
•  Failing word and parcel
•  Test buffer data used when the error occurred
•  Expected results
•  Actual results
•  Differences between the expected results from the master CPU and
   the actual execution results from all of the selected CPUs


3.6.6.2

Error messages

One of the following error messages is sent to stderr (standard error
device) if an invalid command option is entered:
olibuf: initpat: No data patterns selected
Select one or more data patterns and rerun.
olibuf: bldtbl: Invalid section selected. Valid sections are: 1-5.
Select one or more valid test sections and rerun.

3.7

olsbt

The olsbt test is an on-line semaphore, shared B and shared T register
test for CX/CEA systems.
It tests the following components:

•  Shared B registers
•  Shared T registers
•  Semaphores
•  Clusters

The olsbt test generates a random sequence of shared register
instructions and data to detect inter-CPU communication failures. The
generated instructions are simulated and then executed. If no
differences are detected, the test generates new instructions and data,
and repeats the process until the maximum pass, error, or time limit is
reached for the selected cluster number.
The olsbt test runs under the confidence monitor program, olcmon.
The olcmon monitor compares the actual and simulated results. For
additional information on olcmon, refer to section 2 of this manual,
Confidence Test and Monitor Overview.
For additional information on inter-CPU communication, refer to the
following manuals (as appropriate to your system configuration):
Publication      Title

CSM-0110-000     CRAY X-MP/2 System Programmer Reference Manual
CSM-0111-000     CRAY X-MP/1 System Programmer Reference Manual
CSM-0112-000     CRAY X-MP/4 System Programmer Reference Manual
CSM-0400-000     CRAY Y-MP System Programmer Hardware Reference Manual

3.7.1

TEST SYNOPSIS

The olsbt command options can be entered in any order. If an option is
omitted, the program uses the default value. The test synopsis lists the
olsbt command options and arguments in the following order:

1. Monitor options
2. Test-specific options
3. Data pattern options
4. Instruction options


Synopsis:
olsbt [chkpnt mode] [cpu clist] [cputime h:m:s] [+|-getseed]
[getseed file] [help] [maxerr n] [maxp n] [+|-parcel] [time h:m:s]
[+|-verbose]†
[cluster n] [numins n] [+|-repeat] [seed n]
[+|-bits] [+|-onezero] [+|-random]
cluster n
Selects a specific cluster. n can be any one of the
following cluster numbers associated with the indicated
mainframe (cluster number 1 is reserved for the operating
system):

Mainframe       Cluster Numbers

CRAY Y-MP/8     2, 3, 4, 5, 6, 7, 10, 11
CRAY Y-MP/4     2, 3, 4, 5
CRAY X-MP/4     2, 3, 4, 5
CRAY X-MP/2     2, 3
CRAY X-MP/1     2, 3

The default for n is a random cluster number. The
cluster number does not change during test execution.
cluster n must be used to recreate a failure.
numins n
Sets the number of instructions to be generated. n can
be any value within the range 1 through O'20. The default
for n is O'20.

+|-repeat
Enables (+repeat) or disables (-repeat) the option that
repeats the first pass until the diagnostic terminates.
+repeat is useful for recreating an error. It is
normally used with cluster n and one of the following
options: seed n, +getseed, or getseed file. The
default is -repeat (the program generates new instructions
and data after each pass).

†  The monitor command options are described in section 2, Confidence Test
   and Monitor Overview.


seed n
Sets the random seed to n. n can be any 64-bit octal value.
If n is 0, the test reads the real-time clock
and uses the value for the initial seed. The default for n
is O'33. If seed n is selected, do not select +getseed
or getseed file.
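
The seed rules above can be summarized in a short sketch. This is an illustration only, not the olsbt source: the function name resolve_seed and the use of a nanosecond timer as a stand-in for the real-time clock are assumptions.

```python
import time

DEFAULT_SEED = 0o33          # default stated in the seed n description
WORD_MASK = (1 << 64) - 1    # seeds are 64-bit values


def resolve_seed(arg=None, clock=time.time_ns):
    """Return the initial seed for a run.

    arg   -- octal digit string typed after `seed`, or None if omitted
    clock -- hypothetical stand-in for reading the real-time clock
    """
    if arg is None:
        return DEFAULT_SEED
    value = int(arg, 8)              # the argument is interpreted as octal
    if value > WORD_MASK:
        raise ValueError("seed does not fit in 64 bits")
    if value == 0:
        # seed 0 means: take the initial seed from the clock
        return clock() & WORD_MASK
    return value
```

Passing the clock as a parameter keeps the sketch deterministic when a fixed value is supplied in its place.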

+|-bits, +|-onezero, +|-random
Selects (+) or deselects (-) specific data patterns.
The default selects all of the patterns. The data patterns
are as follows:

Option     Data Pattern

bits       Random number of consecutive 1-bits in a word. For example:
           0000017777777776000000
           1777000000000000000377
           1777777777777777777777
           0000000000000000000000
           0000000000100000000000

onezero    Random selection of all 1's or all 0's in a word. For example:
           1777777777777777777777
           0000000000000000000000

random     Random bit generation in a word. For example:
           1023122123232122777127
           0003423100233344322177
           1640034356453221213532
           1123235467543221344120
           1304322300332105534311
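
As an illustration of the three data patterns, the following sketch generates 64-bit words and prints them as 22-digit octal values, as in the examples above. It is not the olsbt generator. The function names are hypothetical, and the wrap-around of the 1-bit run in bits_pattern is an assumption inferred from the example word 1777000000000000000377.

```python
import random

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def to_octal(word):
    """Format a 64-bit word as 22 octal digits, as in the manual's dumps."""
    return format(word & WORD_MASK, "022o")


def rotate_left(word, shift):
    """Rotate a 64-bit word left by shift bit positions."""
    shift %= WORD_BITS
    return ((word << shift) | (word >> (WORD_BITS - shift))) & WORD_MASK


def bits_pattern(rng):
    """A run of 0 to 64 consecutive 1-bits at a random (circular) position."""
    length = rng.randint(0, WORD_BITS)
    run = (1 << length) - 1
    return rotate_left(run, rng.randrange(WORD_BITS))


def onezero_pattern(rng):
    """Either all 1's or all 0's."""
    return WORD_MASK if rng.getrandbits(1) else 0


def random_pattern(rng):
    """64 independent random bits."""
    return rng.getrandbits(WORD_BITS)


if __name__ == "__main__":
    rng = random.Random(0o33)
    for make in (bits_pattern, onezero_pattern, random_pattern):
        print(make.__name__, to_octal(make(rng)))
```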


3.7.2

TEST EXECUTION

The olsbt test should be executed with the maximum number of CPUs
available on the system. This allows the requested cluster number to
become available more quickly, since one process will be started in each
CPU.
The olsbt test execution sequence is as follows:

1. Test initialization and hardware configuration detection
2. Random instruction and data generation
3. Random instruction buffer simulation
4. Random instruction buffer execution
5. Comparison of simulation and execution results
6. Error isolation

Steps 2 through 5 occur on each pass through the test loop. Step 6
occurs only on error.

3.7.2.1

Test initialization and hardware configuration detection

At test initialization, all instructions are enabled. The hardware
configuration detection routine identifies the number of available
clusters. If the cluster specified by the command option cluster n
is not available, the program overrides cluster n and uses a random
cluster.
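
The cluster override described above might look like the following sketch. The cluster lists are taken from the cluster n table in the test synopsis. The function name select_cluster and the dictionary layout are assumptions made for illustration.

```python
import random

# Cluster numbers (octal) available to the test on each mainframe, from
# the cluster n table; cluster 1 is reserved for the operating system.
CLUSTERS = {
    "CRAY Y-MP/8": [0o2, 0o3, 0o4, 0o5, 0o6, 0o7, 0o10, 0o11],
    "CRAY Y-MP/4": [0o2, 0o3, 0o4, 0o5],
    "CRAY X-MP/4": [0o2, 0o3, 0o4, 0o5],
    "CRAY X-MP/2": [0o2, 0o3],
    "CRAY X-MP/1": [0o2, 0o3],
}


def select_cluster(mainframe, requested=None, rng=None):
    """Return the requested cluster if available, else a random one."""
    rng = rng or random.Random()
    available = CLUSTERS[mainframe]
    if requested in available:
        return requested
    # Requested cluster missing or not given: fall back to a random choice.
    return rng.choice(available)
```

For example, requesting cluster 5 on a CRAY X-MP/2 would fall back to cluster 2 or 3 under this sketch.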

3.7.2.2

Random instruction and data generation

These routines build and generate the random instruction buffers and
initial data. Instructions for the buffers are randomly selected from a
list of instructions. The values of the i, j, and k fields are
randomly selected when appropriate.
If four CPUs are selected, four random instruction buffers are created;
one for each CPU. If only one CPU is selected, two random instruction
buffers are created and both are executed in the selected CPU. Each
instruction buffer contains instructions that enable it to write to the
shared registers. Only one buffer can write to the shared registers at a
time. The buffer that can write to the shared registers is rotated
through the selected CPUs, starting with the selected master CPU. The
other buffers can read from the shared registers if the master is not
writing to that particular shared register. Before another buffer can
begin writing to the shared registers, all buffers must be synchronized.
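
The rotation of write permission can be pictured with a minimal sketch. It shows only the order in which buffers take the writer role, assuming the rotation follows the order of the cpu list. The function name writer_schedule is hypothetical.

```python
def writer_schedule(cpus, master, turns):
    """Return which CPU's buffer may write on each of `turns` turns.

    The writer role starts at the master CPU and then rotates through
    the selected CPUs in list order (the ordering is an assumption).
    """
    start = cpus.index(master)
    return [cpus[(start + turn) % len(cpus)] for turn in range(turns)]


if __name__ == "__main__":
    print(writer_schedule(["a", "b", "c", "d"], "a", 6))
```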


A sample of the instruction buffers for four CPUs is as follows:

ibuff0                      ibuff1
003416  SM16  1,TS          003616  SM16  0
003404  SM04  1,TS          003403  SM03  1,TS
003401  SM01  1,TS          072473  S4    ST7
003603  SM03  0             072333  S3    ST3
003627  SM27  0             026607  A6    SB0
003434  SM34  1,TS          003603  SM03  0
003730  SM30  1             003431  SM31  1,TS
003726  SM26  1             003425  SM25  1,TS
003702  SM02  1             003427  SM27  1,TS
026227  A2    SB2           003634  SM34  0
003634  SM34  0             003620  SM20  0
003635  SM35  0             026427  A4    SB2
003405  SM05  1,TS          003623  SM23  0
003605  SM05  0             003405  SM05  1,TS
003617  SM17  0             003605  SM05  0
003413  SM13  1,TS          003600  SM00  0
003613  SM13  0             003413  SM13  1,TS
003410  SM10  1,TS          003613  SM13  0
003415  SM15  1,TS          003610  SM10  0
003406  SM06  1,TS          003436  SM36  1,TS
003636  SM36  0             003636  SM36  0
005000  J     B00           005000  J     B00


ibuff2                      ibuff3
003604  SM04  0             003601  SM01  0
003403  SM03  1,TS          003403  SM03  1,TS
003603  SM03  0             003603  SM03  0
003631  SM31  0             026067  A0    SB6
003434  SM34  1,TS          026367  A3    SB6
003634  SM34  0             026767  A7    SB6
003433  SM33  1,TS          003614  SM14  0
003435  SM35  1,TS          003625  SM25  0
003423  SM23  1,TS          003434  SM34  1,TS
026267  A2    SB6           003634  SM34  0
072663  S6    ST6           003633  SM33  0
073343  ST4   S3            003405  SM05  1,TS
003605  SM05  0             003605  SM05  0
072213  S2    ST1           003417  SM17  1,TS
026647  A6    SB4           003400  SM00  1,TS
003621  SM21  0             003421  SM21  1,TS
003413  SM13  1,TS          003613  SM13  0
003613  SM13  0             003606  SM06  0
003615  SM15  0             003436  SM36  1,TS
003436  SM36  1,TS          003636  SM36  0
003636  SM36  0             005000  J     B00
005000  J     B00
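
The parcels in the sample listing follow a regular field layout, which the sketch below uses to turn an octal parcel back into the mnemonic shown beside it. The decoding rules are inferred only from the sample listing in this subsection and cover only the instruction forms that appear there. They are not a full instruction set reference, and the function name decode_parcel is hypothetical.

```python
def decode_parcel(parcel):
    """Decode a 16-bit parcel into the mnemonic form used in the listing.

    Field layout assumed from the sample: gh = upper 7 bits, then the
    3-bit i, j, and k fields.
    """
    gh = (parcel >> 9) & 0o177
    i = (parcel >> 6) & 0o7
    j = (parcel >> 3) & 0o7
    k = parcel & 0o7
    jk = (j << 3) | k

    if gh == 0o003 and i in (4, 6, 7):
        # 0034jk test and set, 0036jk clear, 0037jk set semaphore jk
        operand = {4: "1,TS", 6: "0", 7: "1"}[i]
        return f"SM{jk:02o} {operand}"
    if gh == 0o005 and i == 0:
        return f"J B{jk:02o}"            # jump to the address in Bjk
    if gh == 0o026 and k == 7:
        return f"A{i} SB{j}"             # read shared B register into Ai
    if gh == 0o072 and k == 3:
        return f"S{i} ST{j}"             # read shared T register into Si
    if gh == 0o073 and k == 3:
        return f"ST{j} S{i}"             # write Si to shared T register
    raise ValueError(f"parcel {parcel:06o} is not covered by this sketch")


if __name__ == "__main__":
    for parcel in (0o003416, 0o003603, 0o003730, 0o026227, 0o073343, 0o005000):
        print(f"{parcel:06o}  {decode_parcel(parcel)}")
```

Raising an error for any other parcel keeps the sketch honest about its limited coverage.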


3.7.2.3

Random instruction buffer simulation

After the instructions and data are generated, the master CPU simulates
the random instruction buffers. The save monitor routine saves the
results.
Each instruction type has a unique simulation routine. The simulation
routines do not use any of the shared register hardware.
3.7.2.4

Random instruction buffer execution

After the instructions are simulated, all of the selected CPUs execute
their own instruction buffer in the selected cluster. The master CPU
uses the system call cpu(4D) to select the cluster.
The olsbt test allows you to test inter-CPU control and communication
by synchronizing code execution among selected CPUs. The first CPU
selected is the master CPU, which generates and simulates all instruction
buffers for all selected CPUs.
The following characteristics apply to instruction buffer execution:
•  The master CPU creates and schedules processes using the following
   system calls:

   System Call     Description

   tfork(2)        Creates a multitasking process for each selected CPU
   cpselect(2)     Schedules the processes in the CPUs

•  Only one buffer can write to the shared B and shared T registers
   in the specified cluster at a time.

•  The master CPU loads the shared registers with the generated data
   before starting the other CPUs. The master CPU then waits for all
   CPUs to execute their buffers before unloading the shared
   registers.

•  All semaphores used in the test and set instructions in the
   instruction buffers are initially set.
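
To make the role of the initially set semaphores concrete, here is a toy model of the three semaphore operations seen in the sample listing (SMjk 1,TS, SMjk 0, and SMjk 1). It rests on two assumptions not stated in this section: that there are 32 semaphores (SM00 through SM37 octal), and that a test and set on a semaphore that is already set makes the issuing CPU wait until another CPU clears it. The class name is hypothetical.

```python
class SemaphoreRegister:
    """Toy model of a cluster's semaphore register (assumed 32 bits)."""

    def __init__(self, all_set=True):
        # The test starts with the semaphores it uses already set.
        self.bits = [1 if all_set else 0] * 32

    def set(self, jk):
        """SMjk 1: set semaphore jk."""
        self.bits[jk] = 1

    def clear(self, jk):
        """SMjk 0: clear semaphore jk."""
        self.bits[jk] = 0

    def test_and_set(self, jk):
        """SMjk 1,TS: return True if the instruction can complete.

        Returns False, leaving the semaphore untouched, when the caller
        would have to wait for another CPU to clear the semaphore.
        """
        if self.bits[jk]:
            return False
        self.bits[jk] = 1
        return True
```

Under this reading, a buffer that begins with SM16 1,TS waits until another buffer executes SM16 0. In the sample listing, ibuff0 begins with SM16 1,TS and ibuff1 begins with SM16 0, which is consistent with the semaphores being used to synchronize the buffers.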


Before the instructions can be executed, the master CPU loads the
following:

•  Shared B registers
•  Shared T registers
•  Semaphore register
•  Address registers for the master CPU
•  Scalar registers for the master CPU

The other CPUs load the following:

•  Address registers
•  Scalar registers

Then an unconditional jump to the random instruction buffer is executed
in each CPU. At the end of the random instruction buffer is a jump to
B00. Each CPU unloads the contents of its address and scalar registers.
The master CPU waits until all CPUs have executed and then unloads the
contents of the shared registers. The save monitor routine saves the
results.
3.7.2.5

Comparison of simulation and execution results

After the instructions execute in all of the selected CPUs, the compare
monitor routine compares the results, and one of the following actions
occurs:

•  If the results match, the test proceeds with the next data
   pattern. After all of the selected data patterns are run, the
   pass count is incremented.

•  If the results do not match, the test dumps all of the data
   related to the suspected failure.

If a deadlock interrupt was received, a core dump is produced and the
test terminates.
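
The pass loop described in steps 2 through 5 can be sketched as follows. The callables generate, simulate, and execute are hypothetical placeholders for the corresponding test phases. The default limits of 1000 passes and 1 error are inferred from the sample output in this section rather than from a stated default.

```python
def run_test(patterns, generate, simulate, execute, maxp=1000, maxerr=1):
    """Run passes until the pass or error limit is reached.

    Returns (passes, errors). A pass is counted only after every
    selected data pattern has run without a mismatch.
    """
    passes = errors = 0
    while passes < maxp and errors < maxerr:
        failed = False
        for pattern in patterns:
            buffers, data = generate(pattern)      # step 2
            expected = simulate(buffers, data)     # step 3
            actual = execute(buffers, data)        # step 4
            if expected != actual:                 # step 5
                errors += 1
                failed = True
                break
        if not failed:
            passes += 1
    return passes, errors
```

Time limits and the deadlock case are left out to keep the sketch short.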
3.7.2.6

Error isolation

The output dump contains the following:

•  Data used when the failure occurred
•  Simulated execution results
•  Actual execution results (if different from the simulated results)
•  Exclusive OR of the simulated and actual execution results
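
The exclusive OR entry in the dump can be reproduced with a few lines. This is a sketch of the comparison only, with hypothetical function names. Words are treated as 64-bit values and printed as 22 octal digits, as in the dumps.

```python
def octal(word):
    """Format a 64-bit word as 22 octal digits."""
    return format(word, "022o")


def diff_words(expected, actual):
    """Yield (index, actual_word, xor) for each word that differs.

    A 1-bit in xor marks a bit position where the actual execution
    result differs from the simulated (expected) result.
    """
    for index, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            yield index, act, exp ^ act


if __name__ == "__main__":
    simulated = [0o0000000000000777760000, 0o1777777777777777777777]
    executed = [0o0000000000000777760000, 0o0000000000000000000000]
    for index, word, xor in diff_words(simulated, executed):
        print(f"word {index}: *{octal(word)}")
        print(f"          {octal(xor)}")
```

In the dump format, the failing word is marked with an asterisk and the exclusive OR is printed on the following line.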


The program may report an error resulting from a failure in either the
simulated or actual execution. To determine if the error is the result
of an actual execution failure, start olsbt in a different CPU and
select the suspected failing CPU. For example, the following entry
starts olsbt in CPU c:
olsbt cpu c
If olsbt fails, and the simulated execution is suspect, rerun olsbt
using a different master CPU, the failing seed, and the failing cluster,
as follows:
olsbt cpu a,c +repeat seed n cluster n
If olsbt fails in CPU c, the failure is in the actual execution of the
random instruction buffer. If olsbt does not fail, the error is either
in the simulated execution results from CPU c or it is very intermittent.

3.7.3

TEST TERMINATION

For information on test termination, refer to section 2.4, Test
Termination.

3.7.4

TEST EXAMPLES

This subsection contains olsbt execution examples.
The following example runs olsbt with all defaults. olsbt executes
in CPU a. The output is displayed at the operator console.
olsbt
The following example runs olsbt in CPUs a, b, c, and d. The output is
displayed at the operator console.
olsbt cpu a,b,c,d

The following example runs olsbt for O'10000000 passes. By default,
olsbt executes in CPU a. Output is redirected to sbt.log. The
nohup(1) command allows the program to continue executing after you log
off the system. You can later log on to check the test's progress. The
ampersand (&) causes the entire command to execute in the background,
so that another prompt is immediately displayed and you can continue to
use the system.
nohup olsbt maxp 10000000 >sbt.log &


The following example runs olsbt with selected command options and
shell facilities. olsbt runs for O'1000000 passes in CPUs a and b.
The job runs as a background process, and output is sent to sbt.log.
olsbt maxp 1000000 cpu a,b >sbt.log &
The following example shows a procedure for determining how frequently an
error occurs. olsbt is rerun with the +repeat option, so that the
first pass is run repeatedly until the test terminates. The test uses
the seed value and the failing cluster number from the output at the time
of the initial error. Error isolation is disabled and olsbt executes
in CPUs a, b, c, and d. The job runs as a background process, and output
is sent to sbt.log.
olsbt +repeat -isolate maxerr 100 maxp 100 cpu a,b,c,d seed
1436651016713554002511 cluster 4 >sbt.log &
The following example shows the output displayed when olsbt is run with
all default values.
olsbt
Output:
olsbt
olsbt started in cpu A on Wed Dec 14 15:18:56 1988
CRAY Y-MP MODE
olsbt reached maximum pass limit with 1000 passes and 0 errors
on Wed Dec 14 15:20:23 1988
The following example runs olsbt in four CPUs with the +verbose
option enabled so that a line of output is generated after each pass.
olsbt cpu a,b,c,d +verbose
Output:
olsbt cpu a,b,c,d +verbose
olsbt started in cpus A, B, C, D with master cpu A on Wed Dec 14 15:19:08 1988
CRAY Y-MP MODE
olsbt: pass = 1, error = 0 Wed Dec 14 15:19:26 1988
olsbt: pass = 2, error = 0 Wed Dec 14 15:19:26 1988
olsbt: pass = 3, error = 0 Wed Dec 14 15:19:26 1988

olsbt: pass = 1000, error = 0 Wed Dec 14 15:21:23 1988
olsbt reached maximum pass limit with 1000 passes and 0 errors
on Wed Dec 14 15:21:23 1988

The following example runs olsbt in CPUs a, b, c, d with CPU a as the master.
olsbt cpu a,b,c,d
Output on an error:
olsbt cpu a,b,c,d
olsbt started in cpus A, B, C, D with master cpu A on Wed Dec  7 14:27:00 1988
CRAY Y-MP MODE

olsbt: restart file written to A35411-olsbt
name     <   200> = 'olsbt
rev      <   201> = '5.0
date     <   202> = '12/07/88'
pass     <   203> = 4
error    <   204> = 1
seed     <   205> = 103336000000000000000
failpat  <  1774> = 'bits
failcln  <   220> = 2
numins   <   206> = 20
TASK 0 random instruction buffer executed in CPU A
ibuff0
4200a  003416  SM16  1,TS
4200b  003404  SM04  1,TS
4200c  003401  SM01  1,TS
4200d  003603  SM03  0
4201a  003627  SM27  0
4201b  003434  SM34  1,TS
4201c  003730  SM30  1
4201d  003726  SM26  1
4202a  003702  SM02  1
4202b  026227  A2    SB2
4202c  003634  SM34  0
4202d  003635  SM35  0
4203a  003405  SM05  1,TS
4203b  003605  SM05  0
4203c  003617  SM17  0
4203d  003413  SM13  1,TS
4204a  003613  SM13  0
4204b  003410  SM10  1,TS
4204c  003415  SM15  1,TS
4204d  003406  SM06  1,TS
4205a  003636  SM36  0
4205b  005000  J     B00

TASK 1 random instruction buffer executed in CPU B
ibuff1
4240a  003616  SM16  0
4240b  003403  SM03  1,TS
4240c  072473  S4    ST7
4240d  072333  S3    ST3
4241a  026607  A6    SB0
4241b  003603  SM03  0
4241c  003431  SM31  1,TS
4241d  003425  SM25  1,TS
4242a  003427  SM27  1,TS
4242b  003634  SM34  0
4242c  003620  SM20  0
4242d  026427  A4    SB2
4243a  003623  SM23  0
4243b  003405  SM05  1,TS
4243c  003605  SM05  0
4243d  003600  SM00  0
4244a  003413  SM13  1,TS
4244b  003613  SM13  0
4244c  003610  SM10  0
4244d  003436  SM36  1,TS
4245a  003636  SM36  0
4245b  005000  J     B00

TASK 2 random instruction buffer executed in CPU C
ibuff2
4300a  003604  SM04  0
4300b  003403  SM03  1,TS
4300c  003603  SM03  0
4300d  003631  SM31  0
4301a  003434  SM34  1,TS
4301b  003634  SM34  0
4301c  003433  SM33  1,TS
4301d  003435  SM35  1,TS
4302a  003423  SM23  1,TS
4302b  026267  A2    SB6
4302c  072663  S6    ST6
4302d  073343  ST4   S3
4303a  003605  SM05  0
4303b  072213  S2    ST1
4303c  026647  A6    SB4
4303d  003621  SM21  0
4304a  003413  SM13  1,TS
4304b  003613  SM13  0
4304c  003615  SM15  0
4304d  003436  SM36  1,TS
4305a  003636  SM36  0
4305b  005000  J     B00

TASK 3 random instruction buffer executed in CPU D
ibuff3
4340a  003601  SM01  0
4340b  003403  SM03  1,TS
4340c  003603  SM03  0
4340d  026067  A0    SB6
4341a  026367  A3    SB6
4341b  026767  A7    SB6
4341c  003614  SM14  0
4341d  003625  SM25  0
4342a  003434  SM34  1,TS
4342b  003634  SM34  0
4342c  003633  SM33  0
4342d  003405  SM05  1,TS
4343a  003605  SM05  0
4343b  003417  SM17  1,TS
4343c  003400  SM00  1,TS
4343d  003421  SM21  1,TS
4344a  003613  SM13  0
4344b  003606  SM06  0
4344c  003436  SM36  1,TS
4344d  003636  SM36  0
4345a  005000  J     B00

initial address register data for TASK 0
initar0         <  5210> = 0000000000020000000000
initar0 + 0004  <  5214> = 0000000000000000000000

initial scalar register data for TASK 0
initsr0         <  5200> = 0377777777776000000000
initsr0 + 0004  <  5204> = 0000000000000000000000

initial address register data for TASK 1

(address register data is displayed for task 1)
initial scalar register data for TASK 1

(scalar register data is displayed for task 1)
initial address register data for TASK 2

(address register data is displayed for task 2)

initial scalar register data for TASK 2

(scalar register data is displayed for task 2)
initial address register data for TASK 3

(address register data is displayed for task 3)
initial scalar register data for TASK 3

(scalar register data is displayed for task 3)
initial shared B register data
initsb          <  5300> = 0000000000000000000000
initsb + 0004   <  5304> = 0000000000000177777777

initial shared T register data
initst          <  5310> = 0000000000000777760000
initst + 0004   <  5314> = 1777740000000001777777

initial semaphore register data
initsm          <  5320> = 1577777777700000000000
simulated random instruction buffer results
The expected data shown below has the following format:
name + index <offset> = data ...

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.

*** Expected Results ***    cpu A (master)

Source data buffer at 6200 in Memory
Memory address in source data buffer = <offset> + 6200 (source data buffer)

simulated address register data results for TASK 0
actar0          <    10> = 0000000000020000000000
actar0 + 0004   <    14> = 0000000000000000000000

simulated scalar register data results for TASK 0
actsr0          <     0> = 0377777777776000000000
actsr0 + 0004   <     4> = 0000000000000000000000
simulated address register data results for TASK 1

(address register data is displayed for task 1)
simulated scalar register data results for TASK 1

(scalar register data is displayed for task 1)
simulated address register data results for TASK 2

(address register data is displayed for task 2)
simulated scalar register data results for TASK 2

(scalar register data is displayed for task 2)
simulated address register data results for TASK 3

(address register data is displayed for task 3)
simulated scalar register data results for TASK 3

(scalar register data is displayed for task 3)
simulated shared B register data results
actsb           <   100> = 0000000000000000000000
actsb + 0004    <   104> = 0000000000000177777777

simulated shared T register data results
actst           <   110> = 0000000000000777760000
actst + 0004    <   114> = 1777777777777777777777

simulated semaphore register data results
actsm           <   120> = 1657473777200000000000

Differences are the results from actual execution of the random instruction
buffer that differ from the master (simulated or actual) execution.

actar = address register data results
actsr = scalar register data results
actsb = sb0-sb7 register data results
actst = st0-st7 register data results
actsm = semaphore register data result
The difference data shown below has the following format:
name + index <offset> = data
data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.
         The differences are marked with an asterisk (*) preceding the
         data word.
data differences: The bits that differ between the actual results and the
expected results.

*** Differences ***    cpu A (master)

Source data buffer at 7200 in Memory copied to save buffer at 113755 in Memory
Memory address in source data buffer = <offset> + 7200 (source data buffer)
Memory address in save data buffer   = <offset> + 113755 (save data buffer)

actual random buffer execution results
actst + 0004    <   114> = *0000000000000000000000
                            1777777777777777777777

The first address (FADD) of the diagnostic is 200a

olsbt reached maximum error limit with 4 passes and 1 errors
at Wed Dec  7 14:27:00 1988

If olsbt determines that the initial load of the semaphores failed, the
test produces a dump and terminates.
Output on an error:
olsbt cpu a,b,c,d
olsbt started in cpus A, B, C, D with master cpu A on Wed Dec  7 15:12:29 1988
CRAY Y-MP MODE
execute: an error was detected in the initial load of the semaphore register
olsbt: restart file written to A60249-olsbt
name     <   200> = 'olsbt
rev      <   201> = '5.0
date     <   202> = '12/07/88'
pass     <   203> = 0
error    <   204> = 1
seed     <   205> = 33
failpat  <  1774> = 'bits
failcln  <   220> = 2
numins   <   206> = 20
TASK 0 random instruction buffer executed in CPU A

2175a  073102  SM  S1
2175b  072202  S2  SM
2175c  046012  S0  S1\S2

initial address register data for TASK 0
initar0         <  5210> = 0000000000000000000000
initar0 + 0004  <  5214> = 0000000000000000000000

initial scalar register data for TASK 0
initsr0         <  5200> = 0000000000000000000760
initsr0 + 0004  <  5204> = 0000777777777777777777

initial shared B register data
initsb          <  5300> = 0000000000000000000000
initsb + 0004   <  5304> = 0000000000000000000000

initial shared T register data
initst          <  5310> = 0000000000000000000020
initst + 0004   <  5314> = 1777776000000000000007

initial semaphore register data
initsm          <  5320> = 1106721617240000000000

simulated random instruction buffer results
The expected data shown below has the following format:
name + index <offset> = data ...

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.

*** Expected Results ***    cpu A (master)

Source data buffer at 6200 in Memory
Memory address in source data buffer = <offset> + 6200 (source data buffer)

simulated address register data results for TASK 0
actar0          <    10> = 0000000000000000000000
actar0 + 0004   <    14> = 0000000000000000000000

simulated scalar register data results for TASK 0
actsr0          <     0> = 0000000000000000000000
actsr0 + 0004   <     4> = 0000000000000000000000

simulated shared B register data results
actsb           <   100> = 0000000000000000000000
actsb + 0004    <   104> = 0000000000000000000000

simulated shared T register data results
actst           <   110> = 0000000000000000000000
actst + 0004    <   114> = 0000000000000000000000

simulated semaphore register data results
actsm           <   120> = 1106721617240000000000

Differences are the results from actual execution of the random instruction
buffer that differ from the master (simulated or actual) execution.
actar = address register data results
actsr = scalar register data results
actsb = sb0-sb7 register data results
actst = st0-st7 register data results
actsm = semaphore register data result
The difference data shown below has the following format:
name + index <offset> = data
data differences

name:    The name of the data dumped on this line.
index:   The index into the data starting at name. Optional, default: 0.
offset:  The offset into the data buffer.
data:    The actual data dumped.
         The differences are marked with an asterisk (*) preceding the
         data word.
data differences: The bits that differ between the actual results and
the expected results.

*** Differences ***    cpu A (master)

Source data buffer at 6200 in Memory
Memory address in source data buffer = <offset> + 6200 (source data buffer)

*** Differences ***    cpu A (master)

Source data buffer at 7200 in Memory copied to save buffer at 113755 in Memory
Memory address in source data buffer = <offset> + 7200 (source data buffer)
Memory address in save data buffer   = <offset> + 113755 (save data buffer)
actsm           <   120> = *1000000000000000000000
                            0106721617240000000000

The first address (FADD) of the diagnostic is 200a

olsbt reached maximum error limit with 0 passes and 1 errors
at Wed Dec 7 15:12:30 1988


3.7.5

TEST MESSAGES

The olsbt test produces the following types of messages:
•  Test mode
•  Informative
•  Error

These messages are listed in the subsections that follow.
3.7.5.1

Test mode messages

During test execution, one of the following messages is displayed to
indicate the test mode:
CRAY Y-MP MODE
Indicates that the mainframe is a CEA system (Y-mode).
CRAY X-MP MODE
Indicates that the mainframe is a CRAY X-MP computer system.
3.7.5.2

Informative messages

If no error occurs, the test generates two messages, one at start-up time
and the other at test termination.
If the +verbose option is enabled, a message is sent to stdout
(standard output device) after each pass through the test loop. On an
error, the test provides information such as the following:

•  Pass and error counts
•  Seed at the beginning of the pass on which the error occurred
•  Cluster number for the error that occurred
•  Contents of the instruction buffers and in which CPU each
   instruction buffer was executed
•  Initial data
•  Resulting data from the simulated instruction execution in the
   master CPU
•  Differences between the simulation execution results from the
   master CPU and the actual execution results from all of the
   selected CPUs


3.7.5.3

Error messages

The following error message is sent to stderr (standard error device)
if an invalid command option is entered:
olsbt: no data pattern(s) selected
All data patterns were deselected (-bits -onezero -random).
Correct and rerun.
The following messages are sent to stderr if olsbt detects an
unexpected error. Select a different master CPU and rerun the test. If
the problem persists, contact your CRI representative.

olsbt: generate: (software error) The instruction does not have a
generation routine.

olsbt: simulate: (software error) a deadlock was encountered
during simulation.

olsbt: simulate: (software error) gh field is not valid.

olsbt: simulate: (software error) ijk field is not valid.

olsbt: simulate: (software error) The instruction does not have a
simulation routine.

The following error message is sent to stderr if olsbt detects an
error in the initial load of the semaphore register. Contact your CRI
representative.
execute: an error was detected in the initial load of the semaphore
register.


4.

MAINTENANCE TEST AND MONITOR OVERVIEW

The on-line maintenance tests provide error detection and isolation.
These on-line tests are variants of the off-line diagnostic tests.
This section provides an overview of the following information:
•  Maintenance monitor (olmon)
•  Program synopsis
•  Test execution
•  Test-specific requirements
•  Test termination
•  Test examples
•  Test messages
•  Diagnostic memory image

For a brief description of each maintenance test, refer to appendix A,
On-line Diagnostic Programs. For a list of test execution times, refer
to appendix B, Test Execution Times. For additional information on the
maintenance tests, refer to the on-line diagnostic listings.

4.1

MAINTENANCE MONITOR (olmon)

The olmon monitor is a C program monitor for the on-line maintenance
tests. The loader program attaches olmon to a slightly modified
version of an off-line diagnostic test to create an on-line maintenance
program.

The olmon monitor provides the interface to the on-line maintenance
tests. By accepting and interpreting command options and arguments,
olmon allows you to do the following:

•  Set the diagnostic information block (DIB) locations in the
   diagnostic

•  Set limits on the maximum number of passes and errors allowed
   (maxerr n and maxp n)

•  Set limits on test execution time, in CPU time (cputime h:m:s)
   or elapsed (wall-clock) time (time h:m:s)

•  Allocate memory for memory tests

•  Select the CPU to be tested

•  Send test results to stdout (standard output device) by default
   or to a file by indicating output redirection on the command line

†  CEA (X-mode) and CX/1 systems only

4.2

PROGRAM SYNOPSIS

Before a test can be started, UNICOS must be running in the CPU to be
tested. The olmon command options can be entered in any order. If an
option is omitted, the program uses the default value.

Synopsis:

test [chkpnt mode] [cpu x] [cputime h:m:s] [data x:y] [dib x]
     [help] [maxerr n] [maxp n] [time h:m:s] [+|-verbose] [words n]

chkpnt mode
     Indicates whether restart files are to be generated.
     Restart files cannot be created unless output is directed
     to a disk file.

     mode is one of the following arguments:

     Argument    Description

     first       Generates a restart file for the first
                 failure detected (default)

     all         Generates a restart file for each failure
                 detected, including failures detected during
                 error isolation

     none        Does not generate restart files

     For additional information, refer to the following:
     chkpnt(1), restart(1), chkpnt(2), and restart(2).
cpu x
     Selects cpu x. x can be a, b, c, d, e, f, g, or h.
     The default is cpu a.

cputime h:m:s
     Sets the test execution time in CPU time. The time is
     specified in hours (h), minutes (m), and seconds (s);
     minutes and seconds; or just seconds. Use colons as
     delimiters, as follows: h:m:s.

     Generally, actual execution time is within one second of
     the specified CPU time. If cputime is allowed to default
     (or is set to 0), the test uses the maxp value. However,
     if set to a value other than 0, cputime overrides maxp.

data x:y
     Stores data y (octal) at location x (octal) before
     the diagnostic is started; no length check is performed on x.

dib x
     Allows you to set the following diagnostic information
     block (DIB) options in the diagnostic:

     Option      Description

     modes x     Test mode
     secs x      Section select
     stop x      Stop condition bits
     option x    Refer to the on-line listings for
                 additional DIB descriptions.

     In addition to the previously listed options, you can set
     the following options for olcmx only (refer to subsection
     4.4.2, olcmx):

     Option      Description

     param x     Test control bits
     rep x       Repeat current pass
     r~ix        Number of parcels requested
     rislp x     Repeat isolation loop
     rnum x      Initial random number
     rpass x     Starting pass count (maxp n must be
                 greater than rpass x)

     To determine the dib x settings, refer to the on-line
     diagnostic listings.
help
     Generates an on-line help display containing a synopsis and
     brief description of the command options and arguments. If
     help is entered with a test name, help information is
     written to stdout, and the test terminates.

maxerr n
     Sets the maximum number of errors. n is an octal
     value. The default for n is 1.

maxp n
     Sets the maximum number of passes. n is an octal
     value. The default for n is 0'1000. If cputime or
     time is set to a value other than 0, the specified option
     overrides maxp.

time h:m:s
     Sets the test execution time in elapsed (wall-clock) time.
     The time is specified in hours (h), minutes (m), and
     seconds (s); minutes and seconds; or just seconds. Use
     colons as delimiters, as follows: h:m:s.

     Generally, actual execution time is within one second of
     the specified elapsed time. If time is allowed to
     default (or is set to 0), the test uses the maxp value.
     However, if set to a value other than 0, time
     overrides maxp.

+|-verbose
     Enables (+verbose) or disables (-verbose) the
     generation of informational messages. The +verbose
     option causes a line of output to be generated after each
     pass of the diagnostic. The default is -verbose.

words n
     Allocates words for memory testing, and sets the DIB
     locations mfrst and mlast (the first and last memory
     addresses to be tested). n is an octal value. If
     words n is not entered, the diagnostic sets the test
     limits by default. Default values are test-dependent
     (refer to the on-line diagnostic listings).

4.3

TEST EXECUTION

To start a single diagnostic test, enter the following:

•  test

•  Monitor command options

To run a sequence of diagnostics, use the runsequence utility described
in section 7, Utility Programs.
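The h:m:s format accepted by the cputime and time options (hours,
minutes, and seconds; minutes and seconds; or just seconds) can be
sketched in shell as follows. This is an illustration only, not part of
olmon: the function name hms_to_seconds is hypothetical, and the sketch
assumes each field is a plain decimal number.

```shell
#!/bin/sh
# hms_to_seconds: hypothetical helper (not part of olmon) that converts
# "h:m:s", "m:s", or "s" into a total number of seconds.
hms_to_seconds() {
    spec=$1
    h=0
    m=0
    s=0
    case $spec in
        *:*:*:*) echo "hms_to_seconds: too many fields: $spec" >&2
                 return 1 ;;
        *:*:*)   h=${spec%%:*}          # text before the first colon
                 rest=${spec#*:}        # text after the first colon
                 m=${rest%%:*}
                 s=${rest#*:} ;;
        *:*)     m=${spec%%:*}
                 s=${spec#*:} ;;
        *)       s=$spec ;;
    esac
    # Reject empty or non-numeric fields before doing any arithmetic.
    for field in "$h" "$m" "$s"; do
        case $field in
            ''|*[!0-9]*) echo "hms_to_seconds: bad field in: $spec" >&2
                         return 1 ;;
        esac
    done
    # awk treats the fields as decimal, so "08" is read as 8.
    awk -v h="$h" -v m="$m" -v s="$s" 'BEGIN { print h * 3600 + m * 60 + s }'
}
```

For example, hms_to_seconds 2:00 treats the argument as minutes and
seconds, which matches the cputime 2:00 example in subsection 4.6.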

4.4

TEST-SPECIFIC REQUIREMENTS

This subsection provides information on test-specific requirements and
command line entries. You must observe these requirements to ensure that
the indicated test executes properly.


4.4.1

olaht

To run olaht† (on-line A register indexing test), you must set
cput n (the DIB option to set the CPU type), as follows:

Value           CPU Type

10              CRAY X-MP/1

20              CRAY X-MP/2

40 (default)    CRAY Y-MP
                CRAY X-MP EA (X-mode)
                CRAY X-MP/4

To execute olaht on a CRAY X-MP/2 or CRAY X-MP/1 computer system, you
must set cput as previously indicated (rather than allow it to default)
or the test will generate invalid results.

To ensure that the test automatically selects the appropriate cput
value, do the following:

1.  Rename olaht to olaht1 or olaht2.

2.  Create a shell script called olaht.

3.  Enter the following information into the olaht shell script:

    olaht1 cput 10 $*

    or

    olaht2 cput 20 $*
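The renaming scheme above can be sketched as a single shell function
that prints the command line a wrapper script would run for a given
mainframe. This is only a sketch: the function name olaht_command and
the model keywords xmp1 and xmp2 are hypothetical, and the function
prints the command rather than executing it.

```shell
#!/bin/sh
# olaht_command: hypothetical helper that prints (does not run) the
# command a wrapper script would execute. The model keywords xmp1 and
# xmp2 are assumptions made for this sketch.
olaht_command() {
    model=$1
    shift
    case $model in
        xmp1) echo "olaht1 cput 10 $*" ;;   # CRAY X-MP/1
        xmp2) echo "olaht2 cput 20 $*" ;;   # CRAY X-MP/2
        *)    echo "olaht_command: unknown model: $model" >&2
              return 1 ;;
    esac
}
```

For example, olaht_command xmp2 cpu b maxerr 3 prints the olaht2
command line with the cput value set and the remaining options passed
through unchanged.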

4.4.2

olcmx

To run olcmx† (on-line random instruction and operand test) on a Cray
computer system without compressed indexing capabilities, you must set
param n (DIB option to set the test control bits) so that the vector
compressed indexing instructions are disabled. To disable these
instructions, set param as follows:

olcmx param 400000001

The default value for param is 1 (stop on isolated error). If you
allow param to default, and the Cray computer system does not have
compressed indexing capabilities, the test does not run properly.

†  CRAY X-MP EA (X-mode) and CRAY X-MP computer systems only.

To ensure that the test automatically disables the vector compressed
indexing instructions, do the following:

1.  Rename olcmx to olcmxa.

2.  Create a shell script called olcmx.

3.  Enter the following information into the olcmx shell script:

    olcmxa param 400000001 $*
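The param value 400000001 can be read as the default value 1 (stop on
isolated error) combined with one additional control bit. The sketch
below ORs two octal values together. The function name param_or is
hypothetical, and treating 400000000 as the bit that disables the
vector compressed indexing instructions is an inference from the two
documented values, not a statement from the diagnostic listings.

```shell
#!/bin/sh
# param_or: hypothetical helper that ORs two octal values and prints
# the result in octal. The leading 0 makes the shell read each
# argument as an octal number.
param_or() {
    printf '%o\n' $(( 0$1 | 0$2 ))
}
```

With these assumptions, param_or 400000000 1 reproduces the value used
in the olcmx shell script.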

4.4.3

olibz

To run olibz† (on-line instruction buffer test), you must set cput
(the DIB option to set the CPU type), as follows:

Value           CPU Type

10 (default)    CRAY X-MP/1

20              CRAY X-MP/2

40              CRAY X-MP EA (X-mode)
                CRAY X-MP/4

The default value for cput is 10, indicating a CRAY X-MP/1 computer
system. If you allow cput to default, and you attempt to run olibz
on a mainframe other than the CRAY X-MP/1, the test executes but it
generates invalid error information. Therefore, ensure that the
appropriate cput value is set.

To ensure that the test automatically selects the appropriate cput
value, do the following:

1.  Rename olibz to olibz4 or olibz2.

2.  Create a shell script called olibz.

3.  Enter the following information into the olibz shell script:

    olibz4 cput 40 $*

    or

    olibz2 cput 20 $*

†  CRAY X-MP EA (X-mode) and CRAY X-MP computer systems only.

4.5

TEST TERMINATION

A test stops under the following conditions:

•  The test successfully completes the maximum number of passes
   (maxp n).

•  The test reaches the specified CPU time (cputime h:m:s) or
   elapsed (wall-clock) time (time h:m:s).

•  The test detects the maximum number of errors (maxerr n). If
   maxerr is set to a value greater than 1, stop (DIB option to
   set stop condition bits) must be set to 0 (continue on error).
   Error reports are automatically sent to stdout (standard output
   device), but they can be redirected to an error file.

•  The test detects an error and stop is set to 1 (stop on error).

•  The help option is entered with a test name. Help information is
   written to stdout, and the test terminates.

•  The monitor or test detects an error in a command line entry and
   writes a message to stderr (standard error device). Only the
   first error detected is reported.

4.6

TEST EXAMPLES

The following example executes olvrx with two DIB options set:
secs 3 executes test sections 0 and 1; stop 0 directs the program to
continue on error. To stop a test that is continuing on error, enter
the kill(1) command to terminate test execution.

Example:

olvrx secs 3 stop 0

The following example executes olvrx with two options set:
secs 3 executes test sections 0 and 1; data 205:77 stores the value
0'77 at location 0'205.

Example:

olvrx secs 3 data 205:77

The following example executes olvrx with one DIB option:
secs 1 executes test section 0.

Example:

olvrx secs 1
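The secs values in the examples above (3 selects sections 0 and 1; 1
selects section 0) are consistent with a bit mask in which bit n
selects test section n. The sketch below decodes such a mask. The
function name secs_sections is hypothetical, and the bit-mask reading
is an assumption drawn from these examples; refer to the on-line
diagnostic listings for the actual definition.

```shell
#!/bin/sh
# secs_sections: hypothetical helper that lists the section numbers
# selected by an octal secs mask, assuming bit n selects section n.
secs_sections() {
    mask=$(( 0$1 ))      # leading 0: read the argument as octal
    bit=0
    out=
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"     # append, space separated
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}
```

Under this assumption, secs_sections 3 lists sections 0 and 1, and
secs_sections 1 lists section 0 only.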
The following example executes test in CPU c, sets the maximum error
limit to 3, and redirects the output to test.loge.

Example:

test cpu c maxerr 3 > test.loge

The following example displays test results from test.loge one page
at a time (press the RETURN key to display the next page).

Example:

pg test.loge

The following example executes olcmx in CPU b for 500,000 passes,
starting at pass 500,000. Output is redirected to olcmx.log. The
nohup(1) command allows the program to continue executing after you log
off the system. You can later log on to check the test's progress. The
ampersand (&) causes the entire command to execute in the background,
so that another prompt is immediately displayed and you can continue to
use the system.

Example:

nohup olcmx cpu b maxp 1000000 rpass 500000 > olcmx.log &
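When a test has been left running under nohup, its progress can be
checked from the log file. The sketch below assumes the summary line
has the form shown in the sample outputs in this subsection ("test
reached maximum ... limit with n passes and n errors"). The function
names test_finished and test_error_count are hypothetical.

```shell
#!/bin/sh
# test_finished: hypothetical helper; succeeds if the log contains the
# "reached maximum ... limit" summary line.
test_finished() {
    grep -q 'reached maximum .* limit with' "$1"
}

# test_error_count: hypothetical helper; prints the error count taken
# from the summary line, or nothing if the test has not finished.
test_error_count() {
    sed -n 's/.*reached maximum .* limit with [0-9]* passes and \([0-9]*\) errors.*/\1/p' "$1"
}
```

For example, test_finished olcmx.log returns success only after the
summary line has been written to the log.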
The following example shows the help information that is displayed if
help is entered with a test name.
Example:
olaht help


Help display:

olaht help
olaht [help] [chkpnt mode] [cpu x] [cputime h:m:s] [data x:y] [maxerr n]
      [maxp n] [time h:m:s] [+|-verbose] [words n] [dib x]
chkpnt mode   - Checkpoint mode: none, first, or all. (Default: first)
cpu x         - Selects CPU x. (Default: a)
cputime h:m:s - Set amount of CPU time to execute.
data x:y      - Stores data y at diagnostic location x before the
                diagnostic is started.
maxerr n      - Sets maximum number of errors. (Default: 1)
maxp n        - Sets maximum number of passes. (Default: 0'1000)
time h:m:s    - Set amount of wall clock time to execute.
+|-verbose    - Send (+verbose)/do not send (-verbose) informational
                messages to output. (Default: -verbose)
words n       - Allocates x words for Central Memory testing.
                MFRST (sta) and MLAST (lim) are set with the appropriate
                values.
dib x         - Sets the DIB location to x.
                Refer to the individual test to determine which
                DIBs are available for the test.
NOTE: Actual results of setting a DIB location are test-dependent.

The following example shows the output that is displayed when the test is
run with all default values.
Example:
olsr3
Output:
olsr3
olsr3: started running in cpu A on Thu Dec 17 09:10:05 1987
olsr3 reached maximum pass limit with 1000 passes and 0 errors
on Thu Dec 17 09:10:05 1987

The following example shows the output that is displayed if +verbose is
specified and maxp reaches 10.

Example:

olsr3 +verbose maxp 10

Output:

olsr3 +verbose maxp 10
olsr3: started running in cpu A on Thu Dec 17 09:10:48 1987
olsr3: pass =  1, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass =  2, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass =  3, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass =  4, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass =  5, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass =  6, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass =  7, error = 0 Thu Dec 17 09:10:48 1987
olsr3: pass = 10, error = 0 Thu Dec 17 09:10:48 1987
olsr3 reached maximum pass limit with 10 passes and 0 errors
on Thu Dec 17 09:10:48 1987
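Because maxp n is an octal value, maxp 10 requests eight passes. The
sample output above also suggests that the pass counter is printed in
octal (pass 7 is followed by pass 10), although that is an inference
from the listing. The conversions can be sketched as follows; the
function names oct_to_dec and dec_to_oct are hypothetical.

```shell
#!/bin/sh
# oct_to_dec: hypothetical helper; prints the decimal value of an octal
# number. The leading 0 makes the shell read the argument as octal.
oct_to_dec() {
    echo $(( 0$1 ))
}

# dec_to_oct: hypothetical helper; prints a decimal number in octal.
dec_to_oct() {
    printf '%o\n' "$1"
}
```

For example, oct_to_dec 1000 gives the decimal equivalent of the
default maxp value.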

The following example shows the output that is displayed if olsr3 is
run for 2 minutes (CPU time) in CPU c only.
Example:
olsr3 cpu c cputime 2:00
Output:
olsr3 cpu c cputime 2:00
olsr3: started running in cpu C on Fri Dec 4 09:11:45 1987
olsr3 reached maximum cputime limit with 1114656 passes and 0 errors
on Fri Dec 4 09:13:49 1987

The following example shows the output that is displayed if maxerr
reaches 1 (default).

Example:

oltrb

Output:

oltrb
oltrb started running in cpu A at Wed Jan 6 15:30:34 1988
oltrb: pass = 0, error = 1 Wed Jan 6 15:30:34 1988
oltrb: restart file written to A55663-oltrb
NAME         <   630> = 'TRB
REV          <   632> = 'X3.0
DATE         <   634> = '12/07/87'
MODES        <   636> = 'TB RU
MTRT         <   642> = 16                   000000 000000 000000 000016
SECS         <   241> = 7654321              000000 000000 000037 054321
PASS         <    64> = 0                    000000 000000 000000 000000
STOP         <    66> = 1                    000000 000000 000000 000001
ERROR        <    63> = 1                    000000 000000 000000 000001
ERA          <    65> = 1576                 000000 000000 000000 001576
ACT          <    61> = 1777777777777777777  177777 177777 177777 177777
EXP          <    62> = 1                    000000 000000 000000 000001
DIF          <    60> = 1777777777777777776  177777 177777 177777 177776
CF           <    67> = 0                    000000 000000 000000 000000
IBUF         <  1440> = 1777777777777777777  177777 177777 177777 177777
IBUF  + 0001 <  1441> = 1777777777777777777  177777 177777 177777 177777
   .
   .
IBUF  + 0077 <  1537> = 1777777777777777777  177777 177777 177777 177777
OBUF         <  1540> = 1777777777777777777  177777 177777 177777 177777
OBUF  + 0001 <  1541> = 1777777777777777777  177777 177777 177777 177777
   .
   .
OBUF  + 0077 <  1637> = 1777777777777777777  177777 177777 177777 177777
SAVAO        < 27616> = 0000000000000000001  000000 000000 000000 000001
SAVAO + 0001 < 27617> = 0000000000000000100  000000 000000 000000 000100
SAVAO + 0002 < 27620> = 0000000000000000076  000000 000000 000000 000076
SAVAO + 0003 < 27621> = 0000000000000000077  000000 000000 000000 000077
SAVAO + 0004 < 27622> = 0000000000000034772  000000 000000 000000 034772
SAVAO + 0005 < 27623> = 0000000000000037035  000000 000000 000000 037035
SAVAO + 0006 < 27624> = 0000000000000037027  000000 000000 000000 037027
SAVAO + 0007 < 27625> = 0000000000000000001  000000 000000 000000 000001
SAVBR        < 30640> = 0000000000000001576  000000 000000 000000 001576
SAVBR + 0001 < 30641> = 0000000000000001311  000000 000000 000000 001311
SAVBR + 0002 < 30642> = 0000000000000001576  000000 000000 000000 001576
SAVBR + 0003 < 30643> = 0000000000000001471  000000 000000 000000 001471
SAVBR + 0004 < 30644> = 0000000000000036711  000000 000000 000000 036711
SAVBR + 0005 < 30645> = 0000000000000000000  000000 000000 000000 000000
SAVSO        < 27626> = 0000000000000000000  000000 000000 000000 000000
SAVSO + 0001 < 27627> = 0000000000000000000  000000 000000 000000 000000
SAVSO + 0002 < 27630> = 1777777777777777777  177777 177777 177777 177777
SAVSO + 0003 < 27631> = 0000000000000000004  000000 000000 000000 000004
SAVSO + 0004 < 27632> = 0000000000000000000  000000 000000 000000 000000
SAVSO + 0005 < 27633> = 0000000000000000102  000000 000000 000000 000102
SAVSO + 0006 < 27634> = 0000000000000000001  000000 000000 000000 000001
SAVSO + 0007 < 27635> = 0000000000000000001  000000 000000 000000 000001
SAVVL        < 30636> = 0000000000000000003  000000 000000 000000 000003
SAVVM        < 30637> = 0000000000000000000  000000 000000 000000 000000
SAVTR        < 30740> = 1777777777777777777  177777 177777 177777 177777
SAVTR + 0001 < 30741> = 1777777777777777777  177777 177777 177777 177777
   .
   .
SAVTR + 0077 < 31037> = 1777777777777777777  177777 177777 177777 177777

The first address (FADD) of the diagnostic is 40a
oltrb reached maximum error limit with 0 passes and 1 errors
on Wed Jan 6 15:30:35 1988

4.7

TEST MESSAGES

Each test sends messages to stdout (standard output device) by default
or to a file when UNICOS output redirection is indicated on the command
line. When a test detects an error, the following information is
displayed:

•  DIBs
•  Absolute addresses of the DIBs
•  DIB values in word and parcel formats
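In the sample error output in subsection 4.6, each DIB value is shown
as an octal word followed by the same word split into four 16-bit
parcels. The split can be sketched as follows. The function name
word_to_parcels is hypothetical, and the sketch assumes a shell with
64-bit signed arithmetic, so it handles only words below 2^63.

```shell
#!/bin/sh
# word_to_parcels: hypothetical helper that prints an octal word as
# four 16-bit parcels, most significant parcel first.
# Assumes 64-bit signed shell arithmetic (words below 2^63 only).
word_to_parcels() {
    w=$(( 0$1 ))                        # leading 0: read as octal
    printf '%06o %06o %06o %06o\n' \
        $(( (w >> 48) & 0177777 )) \
        $(( (w >> 32) & 0177777 )) \
        $(( (w >> 16) & 0177777 )) \
        $(( w & 0177777 ))
}
```

For example, word_to_parcels 7654321 reproduces the parcel columns
shown for SECS in the sample output.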

The following error messages are sent to stderr (standard error device):

test: Illegal argument x.
     Argument x is invalid. Correct and rerun.

test: Error selecting cpu x.
     CPU x is unavailable. Contact your CRI representative.

test: Error allocating memory: number of words = n, error = 0.
     The test cannot allocate memory. Decrease the amount of memory
     requested by the words n option, or regenerate the diagnostic,
     and rerun. If the problem persists, contact your CRI
     representative.

test: Cannot write restart file. errno = n.
     The test cannot write a restart file. Contact your CRI
     representative.

4.8

DIAGNOSTIC MEMORY IMAGE FOR MAINTENANCE TESTS

Figure 4-1 shows a sample memory image of a diagnostic that is
executing. The diagnostic test is relocated to start at the first
address (FADD) of the test. FADD must be subtracted from the error
address if the diagnostic fails. After an error occurs, FADD is
displayed in the following format:

The first address (FADD) of the diagnostic is xa

The value x is determined by the length of the on-line monitor program.
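The FADD subtraction can be sketched as octal arithmetic in shell. The
function name relative_address is hypothetical. Using the ERA value
1576 and the FADD value 40 from the sample output in subsection 4.6 as
inputs is an assumption made only for illustration; this section does
not state which DIB location holds the error address.

```shell
#!/bin/sh
# relative_address: hypothetical helper that subtracts FADD from an
# error address. Both arguments and the result are octal.
relative_address() {
    addr=$(( 0$1 ))                     # leading 0: read as octal
    fadd=$(( 0$2 ))
    if [ "$addr" -lt "$fadd" ]; then
        echo "relative_address: address is below FADD" >&2
        return 1
    fi
    printf '%o\n' $(( addr - fadd ))
}
```

For example, relative_address 1576 40 prints the octal offset of
address 1576 from a first address of 40.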
The on-line maintenance tests call the following monitor routines:

Routine     Description

UERROR()    The test calls the UERROR() routine when an error is
            detected. The monitor dumps the DIB and examines a DIB
            macro at the end of the diagnostic for memory areas to be
            dumped.

UPASS()     The test calls the UPASS() routine on each successful
            pass.

If an error is detected, the following occurs:

1.  The test does the following:

    •  Creates a restart file

    •  Saves the CPU registers using the SAVEREG macro, defined in
       the common deck OLMAC

    •  Calls the monitor error function routine, UERROR()

    •  Restores the CPU registers using the RESTORE macro, defined
       in the common deck OLMAC

    For additional information on the restart file, refer to the
    following system calls: chkpnt(2) and restart(2). The
    SAVEREG and RESTORE code is assembled into the on-line
    maintenance test, but the memory required to save the registers
    is allocated to the following monitor arrays: SAVAO, SAVBR,
    SAVSO, SAVTR, SAVVO, SAVVL, and SAVVM.

2.  The system produces a core dump of the diagnostic test area.

Location Names     Memory Image

                   Base Address

UERROR()           Monitor program (olmon)
UPASS()

SAVAO              Data area for storing
SAVBR              register data
SAVSO
SAVTR
SAVVO
SAVVL
SAVVM

FADD               Diagnostic program
START()
DIB
SAVEREG
RESTORE

mfrst              Memory allocated for a memory test
mlast

                   C library routines

                   Unused area

                   System stack

                   Limit Address

Figure 4-1.  Sample Diagnostic Memory Image

5.

DOWN-DEVICE PROGRAMS

The down-device programs provide on-line CPU and peripheral testing. The
hardware is removed from normal system operations and can be accessed and
exercised only by the down-device programs.

This section describes the following programs:

Program     Description

donut       On-line disk maintenance program
oldmon†     Down CPU monitor
unitap      On-line magnetic tape test

5.1

donut

The donut program is an interactive, menu-driven diagnostic program for
testing and maintaining DD-10, DD-19, DD-29, DD-39, DD-40, and DD-49 disk
drives. The donut program cannot be run off-line.

The donut program can be used to perform the following functions:

•  Buffer testing†
•  Error correction code (ECC) testing††
•  Flaw table maintenance
•  Formatting
•  ID verification††
•  Surface analysis

The subsections that follow describe the following topics:

•  Disk selection
•  Disk mode
      System mode
      Maintenance mode
•  Warnings and messages
•  Menu displays
•  Program execution
•  Menus
•  Program execution examples

†   Multiple-CPU Cray computer systems only
††  Not available for DD-19 or DD-29 disk drives

5.1.1

DISK SELECTION

The donut program can test only one disk at a time. However, multiple
copies of donut can be executed simultaneously to test different disk
drives.
To access a disk, donut uses the same logical device name as that
assigned during system configuration. To select the disk to be
exercised, define the logical device name by doing one of the following:

•  Enter dev from the Main menu (refer to subsection 5.1.6.3,
   Commands to Set Arguments)

•  Enter a from the Parameter menu (refer to subsection 5.1.13,
   Parameter Menu)

The donut program attempts to open and retrieve iobuf information for
the specified device, to determine whether the specified logical device
name is valid.

If the logical device name is valid, donut determines the device type
and adjusts the other arguments accordingly. As a precaution, donut
sets the initial cylinder argument to point to a scratch cylinder.
donut reads and verifies the disk flaw tables for the device, and
displays an appropriate message if any abnormalities are detected.

If the logical device name is invalid, donut does not accept disk
requests and the device argument is set as follows:

* none *

Reenter a valid logical device name and continue.

5.1.2

DISK MODE

A disk in the system configuration can be in one of the following modes:

Mode           Description

System         UNICOS system routines and all user jobs can access
               the disk

Maintenance    Only UNICOS system routines and donut can access
               the disk

The current mode is displayed under the MODE heading in the argument
banner of various menus (refer to subsection 5.1.4, Menu Displays).

To change the mode, do the following:

1.  Select the mode by doing one of the following:

    •  Enter mode from the Main menu (refer to subsection
       5.1.6.3, Commands to Set Arguments)

    •  Enter t from the Parameter menu (refer to subsection
       5.1.13, Parameter Menu)

///////////////////////////////////////////////////////

WARNING

The donut program can write to any of the cylinders
on a disk. Therefore, device labels and flaw tables
are vulnerable to accidental destruction. It is
recommended that writes and surface analysis not be
performed on the CE cylinders that contain the flaw
tables (typically, cylinder 0 and the second-to-last
cylinder on a device) unless absolutely necessary, and
then only if backup procedures are used. Before
writing to a disk, donut displays a message that flaw
table information will be destroyed on those cylinders
that contain information.

///////////////////////////////////////////////////////

5.1.2.1

System mode

In system mode, donut and other user jobs have equal access to the
disk. The following operations are supported:

•  donut can read from and write to CE cylinders only
•  donut can perform ID verification (except on DD-19s and DD-29s)
•  Flaw tables can be updated

5.1.2.2

Maintenance mode

In maintenance mode, only UNICOS and donut requests can access the
disk. All donut functions are valid.

If a maintenance mode function is requested while the disk is in system
mode, the function aborts and donut displays the following message:

*** DIAGNOSTIC TASK ERROR CODE ***
1 - Device not in Maintenance mode

5.1.3

WARNINGS AND MESSAGES

The donut program displays various warnings and messages. For example,
the following warning is displayed if you are about to overwrite the User
Flaw Table in donut's area of central memory:

WARNING
USER flaw table in memory will be altered.
Enter go to continue
or enter anything else to abort.

If an invalid command is entered, an error message is displayed and the
menu from which the command was entered is redisplayed. If an invalid
argument is entered, an informative message is displayed. After some of
the informative messages, the following prompt is displayed:

---> Enter anything to continue <---

Some of the donut messages require a response. For example, the
following message requires a response to ensure that read, write, and
surface analysis operations are performed on only selected sectors.

L I M I T S   C H E C K

Check CYLINDER, HEAD and SECTOR limits.
Enter go to initiate.
Enter any other character to abort.

5.1.4

MENU DISPLAYS

At the top of various menus is the argument banner displaying the
arguments used in the program. A sample argument banner is as follows:

==========================================================================
=  DEVICE    CYLINDERS    HEADS    SECTORS    SLIP    DISK MODE  09:50:10 =
= * none *      0-0        0-0       0-0        0   ----- none -----      =
==========================================================================

By default, arguments are displayed in decimal. The cylinder, head, and
sector values must be entered in decimal unless otherwise indicated.

To generate an octal display, enter oct from any of the menus (enter
dec to return to a decimal display).

If you generate an octal display, the following applies:

•  The argument banner displays the heading (OCTAL) to the left of
   the arguments

•  The cylinder, head, and sector information is entered and
   displayed in octal

5.1.5

PROGRAM EXECUTION

The donut program resides in the /ce/bin directory. To execute donut,
enter the following:

/ce/bin/donut

The initial donut screen display is as follows:

W e l c o m e   t o   X - M P   U N I C O S   D O N U T

V e r s i o n   2.0

---> Enter anything to continue <---

To continue, press any key. The program displays the Main menu. From
the Main menu, you can get to various other menus. Menu commands are not
case sensitive. They can be entered in uppercase or lowercase. In this
document, the menus show commands in uppercase; however, the descriptions
show them in lowercase and bold, according to UNICOS conventions.

The menu structure is as follows:

Main Menu
Command   Description

a         Displays disk information

b         Displays Buffer Utility menu

          Command   Description

          a         Displays Write Buffer menu
          b         Displays Read Buffer menu

e         Displays Error Utility menu

          Command   Description

          a         Displays Error Table menu

                    a   Adds the displayed error to the
                        Found Flaw Table
                    b   Adds all errors to the Found Flaw
                        Table
                    d   Deletes the displayed error record
                        from the Error Table
                    e   Prints the error record to a file

          b         Displays Error Log menu

                    a   Adds top entry to the Found Flaw
                        Table
                    b   Adds all entries to the Found Flaw
                        Table
                    c   Prints the entire error log
                    e   Deletes all error log entries

f         Displays Formatting menu

          Command   Description

          b         Displays argument banner with warning.
                    Enter go to format IDs with flaw
                    handling.
          c         Displays argument banner with warning.
                    Enter go to format IDs with no flaw
                    handling.
          e         Displays Examine Data Buffer menu
          f         Verifies track IDs using the User Flaw
                    Table
          g         Verifies track IDs without using the User
                    Flaw Table
          z         Displays Parameter menu

s         Displays Surface Tests menu

          Command   Description

          a         Displays Write Data menu
          b         Displays Read Data and Compare menu
          c         Displays argument banner with warning.
                    Enter go to perform a read exercise.
          d         Displays Surface Analysis menu
          e         Displays Examine Data Buffer menu
          f         Displays argument banner with warning.
                    Enter go to execute a read absolute
                    operation.
          g         Displays argument banner with warning.
                    Enter go to execute a write current
                    data buffer operation.
          z         Displays Parameter menu

t         Displays Flaw Table Utility menu

          Command   Description

          a         Displays Factory Flaw Table menu
          b         Displays User Flaw Table menu
          c         Displays System Flaw Table menu
          d         Displays Found Flaw Table menu

w         Executes the Error Correction Code test

z         Displays Parameter menu

          Command   Description

          a         Sets logical device name
          b         Sets cylinder limits
          c         Sets head limits
          d         Sets sector limits
          e         Sets diagnostic flags
          t         Toggles disk mode

q         Exits donut
In addition, there are various commands that can be entered from the Main
menu or various other menus. These commands are described in the
following subsections:

•  Subsection 5.1.6.1, Commands to Display Submenus
•  Subsection 5.1.6.2, Commands to Select Display Format
•  Subsection 5.1.6.3, Commands to Set Arguments
•  Subsection 5.1.6.4, Commands to Display the Data Buffer
•  Subsection 5.1.6.5, Commands to Display Flaw Table Menus
•  Subsection 5.1.6.6, Commands to Change the Data Buffer
•  Subsection 5.1.6.7, Commands to Change the Type of Write
   Command Used
•  Subsection 5.1.6.8, Commands to Display Commands List

5.1.6

MAIN MENU

Figure 5-1 shows donut's Main menu.

==========================================================================
=  DEVICE    CYLINDERS    HEADS    SECTORS    SLIP    DISK MODE  09:50:10 =
= * none *      0-0        0-0       0-0        0   ----- none -----      =
==========================================================================

     D I S K   O N L I N E   U T I L I T Y   (DONUT)

     A    Disk Information
     B    Buffer tests
     E    Review Errors
     F    Formatting and ID analysis
     S    Surface tests
     T    Flaw Table Utility
     W    Error Correction Test
     Z    Reset Parameters

     Q    Exit DONUT - (Quit)

Enter command ==>

Figure 5-1.  Main Menu for donut

5.1.6.1

Commands to display submenus

Table 5-1 lists the Main menu commands, which are used to do the
following:

•  Display disk information (enter a from the Main menu or enter
   info from any menu)

•  Display various submenus

•  Execute the Error Correction Code test

Table 5-1.  Main Menu Commands

Command     Description

a           Displays disk information
b           Displays the Buffer Utility menu
e           Displays the Error Utility menu
f           Displays the Formatting menu
s           Displays the Surface Tests menu
t           Displays the Flaw Table Utility menu
w           Executes the Error Correction Code test
z           Displays the Parameter menu
q           Quit; exits donut.

5.1.6.2

Commands to select display format

The following commands for selecting the display format can be entered
from any menu:

Command     Description

oct         Displays the cylinder, head, and sector information in
            octal

dec         Displays the cylinder, head, and sector information in
            decimal (default)

5.1.6.3

Commands to set arguments

Table 5-2 lists the commands to set arguments from the Main menu or any
of the subsequent menus (except the data pattern menus). Alternatively,
you can set arguments by entering z (reset parameters) from the Main
menu or various other menus.

Table 5-2.  Commands to Set Arguments

Command     Description

cyl         Sets the cylinder range
dev         Sets the logical device name
flags       Sets diagnostic flags
hed         Sets the head range
mode        Sets the disk mode to system or maintenance
sec         Sets the sector range

5.1.6.4

Commands to display the data buffer

The donut program keeps a record of the 1-track buffer used during the
last disk operation. When donut writes data or IDs, the buffer
contains data for the last track written. When donut reads data or IDs
or performs surface analysis, the buffer contains data for the last track
read. The buffer is reused during the next disk operation.

To display the data buffer from any menu, enter the following command:

data

The data buffer can also be displayed by entering e from the Formatting
menu (subsection 5.1.9) or the Surface Tests menu (subsection 5.1.10).

5.1.6.5

Commands to display flaw table menus

To display a flaw table without going through the Flaw Table Utility
menu, enter one of the following commands from the Main menu or any of
the flaw table menus, as appropriate:
Command    Description

fac        Factory Flaw Table menu
fnd        Found Flaw Table menu
sys        System Flaw Table menu
usr        User Flaw Table menu

For additional information on flaw tables, refer to subsection 5.1.11,
Flaw Table Utility Menus.


5.1.6.6

Commands to change the data buffer

To change the contents of the donut data buffer, the following commands
can be used:
Command    Description

clr        Fills all sectors of the data buffer selected in the
           sectors section of the argument banner with 0's

fill       Fills all sectors of the data buffer selected in the
           sectors section of the argument banner with 1's

5.1.6.7

Commands to change the type of write command used

To change the type of write command used during write operations to the
disk, use the following commands. These commands are needed only for
DD-40 type disks.

Command    Description

wrt        Sets the write command to perform a write (function
           code 4) during write operations. The write function is
           the default.

wrim       Sets the write command to issue a write immediate
           (function code 22 octal) during write operations. This
           function code is valid only for DD-40 disks. It may be
           used when a controller releases control after all data is
           received but before the data is written to the disk, and
           an error occurs when the remaining data is finally
           written.


5.1.6.8

Commands to display commands list

Entering the help command displays a list of global commands that can
be entered from any menu:

Parameters changes:
DEV - Change DEVICE Parameter
CYL - Change CYLINDER Parameter Limits
HED - Change HEAD Parameter Limits
SEC - Change SECTOR Parameter Limits
MODE - Toggle Disk MODE (System/Maint.)
Flaw tables:
FAC - Factory Flaw Table
FND - Found Flaw Table
SYS - System Flaw Table
USR - User Flaw Table

Miscellaneous:
CLR - Clear Data Buffer To Zeros
DATA - Display Data Buffer
FILL - Fill Data Buffer With Ones
HELP - Display This Help Information
INFO - Display Disk Information
MAIN - Main Menu
WRT - Select Write Function (WRT=4)
WRIM - Select Write Immediate Function (WRTIM=22 oct)

5.1.7

BUFFER UTILITY MENU

Figure 5-2 shows the Buffer Utility menu (not applicable to DD-19 or
DD-29 disk drives). Table 5-3 lists the Buffer Utility menu commands.
These commands display the following submenus:

• Write Buffer menu

• Read Buffer menu

From the submenus, you can execute a write or read function in the
controller's 16-parcel buffer. To exercise the basic Cray-to-disk
communication path, put the disk in maintenance mode and execute a write
followed by a read and compare (if the disk is in system mode, other jobs
may be using the buffer and the test may not be effective).
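
The write, read, and compare sequence described above can be sketched in Python. This is only an illustration of the idea, not donut code: FakeControllerBuffer is a hypothetical in-memory stand-in for the controller's 16-parcel buffer, and all function names are assumptions.

```python
from typing import List, Tuple

BUFFER_PARCELS = 16  # buffer size named in the text (16 parcels)


class FakeControllerBuffer:
    """Hypothetical stand-in for the controller buffer; not a real interface."""

    def __init__(self) -> None:
        self._parcels: List[int] = [0] * BUFFER_PARCELS

    def write(self, parcels: List[int]) -> None:
        if len(parcels) != BUFFER_PARCELS:
            raise ValueError("the buffer holds exactly 16 parcels")
        # A parcel is 16 bits wide, so keep only the low 16 bits.
        self._parcels = [p & 0xFFFF for p in parcels]

    def read(self) -> List[int]:
        return list(self._parcels)


def compare_parcels(expected: List[int], actual: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, expected, actual) for every parcel that differs."""
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


def write_read_compare(buf: FakeControllerBuffer, pattern: List[int]) -> List[Tuple[int, int, int]]:
    """Write the pattern, read it back, and report any miscompares."""
    buf.write(pattern)
    return compare_parcels(pattern, buf.read())


if __name__ == "__main__":
    mismatches = write_read_compare(FakeControllerBuffer(), [0x5555] * BUFFER_PARCELS)
    print("miscompares:", mismatches)
```

An empty result means every parcel read back as written. In system mode another job could overwrite the buffer between the write and the read, which is why the text recommends maintenance mode for this test.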


=============================================================== 09:54:01 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

              B U F F E R   U T I L I T Y

              A   Write Buffer
              B   Read Buffer and compare
              R   Return

Figure 5-2.  Buffer Utility Menu

Table 5-3.  Buffer Utility Menu Commands

Command    Description

a          Displays the Write Buffer menu, from which you can
           select a data pattern to perform a 16-parcel write
           to the buffer

b          Displays the Read Buffer menu, from which you can
           compare actual data to the selected data pattern

r          Returns to previous menu

Figure 5-3 shows the Write Buffer menu. Figure 5-4 shows the Read Buffer
menu. Table 5-4 lists the commands for the Write Buffer and Read Buffer
menus.


=============================================================== 09:53:52 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

                    W R I T E   B U F F E R

       0   All zeros                     1   All ones
       A   Addressing pattern            B   Bump
       C   Alternating 0,1 pattern
       E   Hole                          F   Fixed data
       S   Sequential data               T   Peak shift
       Z   Reset Parameters              R   Return

       Input the data pattern ==>

Figure 5-3.  Write Buffer Menu

=============================================================== 09:54:07 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

                    R E A D   B U F F E R

       0   All zeros                     1   All ones
       A   Addressing pattern            B   Bump
       C   Alternating 0,1 pattern
       E   Hole                          F   Fixed data
       S   Sequential data               T   Peak shift
       Z   Reset Parameters              R   Return

       Input the data pattern ==>

Figure 5-4.  Read Buffer Menu

Table 5-4.  Commands for the Write Buffer and Read Buffer Menus

Command    Description

0          All 0's

1          All 1's

a          Addressing pattern in a Cray word:

           Parcel    Value

           0         Cylinder number
           1         Head number
           2         Sector number
           3         Word number

b          Bump.  This is a repeating 4-word pattern:

           Word    Octal                     Hexadecimal

           0       0525252525242104252525    5555 5555 1111 5555
           1       0525250421052525252525    5555 1111 5555 5555
           2       0104212525252525252525    1111 5555 5555 5555
           3       0525252525252525210421    5555 5555 5555 1111

c          Alternating 0's and 1's.  This is a repeating 2-word
           pattern:

           Word    Octal                     Hexadecimal

           0       1252525252525252525252    AAAA AAAA AAAA AAAA
           1       0525252525252525252525    5555 5555 5555 5555

e          Hole.  This is a repeating 4-word pattern:

           Word    Octal                     Hexadecimal

           0       0525252525256735652525    5555 5555 7777 5555
           1       0525356735252525252525    5555 7777 5555 5555
           2       0735672525252525252525    7777 5555 5555 5555
           3       0525252525252525273567    5555 5555 5555 7777

f          Fixed data.  This is a 1-word, user-input pattern.

s          Sequential data pattern:

           Word    Description

           0       Random number
           n       Word 0 + n

t          Peak shift.  This is a repeating 3-word pattern:

           Word    Octal                     Hexadecimal

           0       0631466735667356663146    6666 DDDD BBBB 6666
           1       1567355673554631556735    DDDD BBBB 6666 DDDD
           2       1356733146333567335673    BBBB 6666 DDDD BBBB

z          Displays the Parameter menu

r          Returns to previous menu

5.1.8

ERROR UTILITY MENU

Figure 5-5 shows the Error Utility menu. Table 5-5 lists the Error
Utility menu commands. These commands display the following submenus:
• Error Table menu
• Error Log menu

=============================================================== 09:54:21 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

              E R R O R   U T I L I T Y

              A   Review details of the latest Error Table
              B   Review Error Log
              R   Return

Figure 5-5.  Error Utility Menu

Table 5-5.  Error Utility Menu Commands

Command    Description

a          Displays an error record and the Error Table menu

b          Displays the error log and the Error Log menu

r          Returns to previous menu

5.1.8.1

Error Table menu

When a disk request generates an error (such as a seek, read, or write
error), the IOS sends donut an error record containing information such
as function, address, status, and syndromes. The donut program
interprets these records and stores them in the Error Table. If no error
is detected in the disk function, no error record is returned. The Error
Table is valid only for the latest call in error and is overwritten
during each disk function call.

Figure 5-6 shows an error record for a DD-39 read time-out error, and the
Error Table menu. Table 5-6 lists the Error Table menu commands.

---------------- E R R O R   R E C O R D   1 of 1 (octal data) ----------
Read
Dev Type
000004 lOP number
0001 Channel #
000032 Major Err
Expect CYL 001511 Fin Err St Unrecov Expect HED 000001 Expect SEC 000017
Disk Funct LMA Rg1 Retry Cnt
000000 Orig Cntlr 007611 Orig GenSt 041600
Sel Stat 0 001600 Sel Stat 1 103200 Sel Stat 2 000200 Sel Stat 3 070200
Err Correc Is off
Sel Stat 4 000200 Unit numbr 000000 Offset dir None
C3 Cor Msk 000000 C3 Cor Off 000000 C2 Cor Msk 000000 C2 Cor Off 000000
C1 Cor Msk 000000 C1 Cor Off 000000 CO Cor Msk 000000 CO Cor Off 000000
Expect LMA 000000 Actual LMA 000000 Fin ctl st 007611 Fin gen st 041600
Fin Dsk Fn Unknown Orig Recov ON -set Finl Recov Unknown
C3 Syn upr 000000 C3 Syn low 000000 C2 Syn upr 000000 C2 Syn low 000000
C1 Syn upr 000000 C1 Syn low 000000 CO Syn upr 000000 CO Syn low 000000

A   Add THIS error to FOUND Flaw Table
B   Add ALL errors to FOUND Flaw Table
D   Delete THIS error record
E   Erase ALL error records
R   Return
Enter Command or Error Number ==>

Figure 5-6.  Error Table Menu

Table 5-6.  Error Table Menu Commands

Command    Description

a          Adds the displayed error record to the Found Flaw Table

b          Adds all error records in the Error Table to the Found
           Flaw Table

d          Deletes the displayed error record from the Error Table

e          Creates a file called ERRECRD in the current
           directory. The error record is saved in this file.

r          Returns to previous menu

5.1.8.2

Error Log menu

The donut program maintains a log of all disk errors detected during a
session. For each error, the log contains an error summary with the
time, device, address, function, and pattern. The Error Log is deleted
if you exit or abort donut, or if you enter e from the Error Table
menu. Figure 5-7 shows a typical Error Log display and the Error Log
menu. Table 5-7 lists the Error Log menu commands.

                     E R R O R   L O G                      LAST= 17
--------------------------------------------------------------------------
  TIME    NUM  LOG DEV   CYL HEAD SEC  CHANNEL  ERROR DISK FUNC  TEST
--------------------------------------------------------------------------
09:58:56    1  49-2-24A   10    0   0    B1     Read LMA Reg 1   Compare
09:58:58    2  49-2-24A   11    0   0    B1     Read LMA Reg 1   Compare
09:59:02    3  49-2-24A   12    0   0    B1     Read LMA Reg 1   Compare
09:59:04    4  49-2-24A   13    0   0    B1     Read LMA Reg 1   Compare
09:59:24    5  49-2-24A   10    0   0    B1     Read LMA Reg 1   Compare
09:59:25    6  49-2-24A   11    0   0    B1     Read LMA Reg 1   Compare
09:59:28    7  49-2-24A   12    0   0    B1     Read LMA Reg 1   Compare
09:59:31    8  49-2-24A   13    0   0    B1     Read LMA Reg 1   Compare
10:08:19    9  49-2-24A   10    0   0    B1     Read LMA Reg 1   Compare
10:08:20   10  49-2-24A   11    0   0    B1     Read LMA Reg 1   Compare

A   Add TOP entry to FOUND Flaw Table
B   Add ALL entries to FOUND Flaw Table
C   Print out entire log
E   Erase ALL log entries
R   Return
Enter Command or Entry Number ==>

Figure 5-7.  Error Log Menu

Table 5-7.  Error Log Menu Commands

Command    Description

a          Adds the top entry in the Error Log to the Found Flaw
           Table. Duplicate entries are skipped.

b          Adds all entries in the Error Log to the Found Flaw
           Table. Duplicate entries are skipped.

c          Creates a file called DONULOG in the current
           directory. The Error Log is saved in this file.

e          Deletes all Error Log entries.

r          Returns to previous menu.

5.1.9

FORMATTING MENU

Figure 5-8 shows the Formatting menu. Table 5-8 lists the Formatting
menu commands. These commands display the following submenus:
• Examine Data Buffer menu
• ID Analysis Results menu
• Parameter menu

=============================================================== 09:57:18 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

              F O R M A T T I N G

              B   Format with USER Flaw Table
              C   Format with NO flaw handling
              E   Examine Buffer
              F   Verify IDs with USER flaw table
              G   Verify IDs with NO flaw handling
              Z   Reset Parameters
              R   Return

              Enter Command ==>

Figure 5-8.  Formatting Menu

Table 5-8.  Formatting Menu Commands

Command    Description

b          Uses the User Flaw Table to format IDs

c          Formats IDs without using the User Flaw Table (donut
           assumes there are no flaws)

e          Displays the Examine Data Buffer menu

f          Reads track IDs and does ID verification based on the
           assumption that IDs were formatted with the User Flaw
           Table (DD-10, DD-39, DD-40, and DD-49 disk drives only)

g          Reads track IDs and does ID verification based on the
           assumption that IDs were formatted without the User
           Flaw Table (DD-39, DD-40, and DD-49 disk drives only)

z          Displays the Parameter menu, from which you can set the
           arguments in the argument banner

r          Returns to previous menu

5.1.9.1

Logical address of the sector ID

Formatting is performed on a track basis, using spare sectors if
applicable and the User Flaw Table if specified. Only DD-10, DD-39,
DD-40, and DD-49 disks have a User Flaw Table, and only DD-39 and DD-49
disks have spare sectors.

During formatting, the logical address is written into the sector ID
field. For flawed sectors, a flawed ID is written into this field. The
formatting routine does the following:

• Uses the slip argument to calculate the logical address

• Determines whether the User Flaw Table is to be used


When the logical address is written into the sector ID field, the type of
disk drive determines how the data is affected, as follows:

Disk               Logical Address is Written to Sector ID

DD-10/39/40/49s    Data in the sector is not affected when the logical
                   address is written to the ID field.

DD-19/29s          The entire data area is corrupted when the logical
                   address is written to the ID field. donut does
                   not automatically write to the newly formatted
                   sectors. If a read is attempted following a
                   formatting operation, an unrecoverable error
                   occurs. Therefore, after completing a formatting
                   operation, write data before performing a read.

5.1.9.2

Position field of the sector ID (DD-10s and DD-40s only)

DD-40 disk drives can contain the following types of defects:

Defect Type    Description

Hideable       Contains a defect that resides in a 16-byte field
               called the defect address, which is skipped during
               all disk operations. The defect address is written
               to the position field (POS) of the sector ID.

Unhideable     Contains either a defect that spans more than one
               address or multiple defects. These defects are not
               hidden because only one defect address is skipped
               during all disk operations. The sector ID is set to
               all 1's to indicate that the sector is unavailable.

If a sector has no defects, the sector ID is formatted with the position
field set to D'511 (all 1's).

5.1.9.3

Examine Data Buffer menu

Figure 5-9 shows the Examine Data Buffer menu. Table 5-9 lists the
Examine Data Buffer menu commands.

              E X A M I N E   D A T A   B U F F E R

              nn[,WPH]   Display sector nn (Word(8), Parcel(8) or Hex)
              A          Print out ALL sectors
              B,nn       Print out sector nn
              R          Return

              Input sector number or option ==>

Figure 5-9.  Examine Data Buffer Menu

Table 5-9.  Examine Data Buffer Menu Commands

Command     Description

nn[,wph]    Displays sector nn in octal words (W) or parcels (P),
            or in hexadecimal (H)

a           Prints all sectors to file BUFFER in the current
            directory

b,nn        Prints sector nn to file BUFFER in the current
            directory

r           Returns to previous menu

5.1.9.4

ID Analysis menu (DD-10s, DD-39s, DD-40s, and DD-49s only)

ID analysis can be performed with or without the User Flaw Table (see
commands f and g, respectively, in table 5-8).

The ID analysis report contains the following field headings for both the
expected and actual IDs:

Heading    Description

NUM        Entry number
CYL        Cylinder number
HED        Head number
SEC        Sector number

The ID analysis report for DD-10s and DD-40s contains the following
additional headings:

Heading    Description

POS        Position field (POS) of the sector ID (contains the
           defect address)

SPIN       Spindle associated with the sector ID. Each DD-40
           contains four spindles, each of which is associated with
           12 sectors. For DD-10s, SPIN is always 0.

ID analysis (DD-39s/49s) - Figure 5-10 shows the ID Analysis menu for
DD-39 and DD-49 disk drives. Table 5-10 shows the ID analysis menu
commands.

The following example describes the results of an ID analysis that was
performed using the User Flaw Table (enter f from the Formatting menu).
To verify that IDs are being written correctly, the User Flaw Table is
used to read the IDs of a track containing a flawed ID.

If all IDs match, a display such as the following is generated:

VERIFYING IDs
On Cylinder =   10 at 09:58:55
On Cylinder =   11 at 09:58:56
On Cylinder =   12 at 09:59:01
On Cylinder =   13 at 09:59:03
On Cylinder =   14 at 09:59:05
On Cylinder =   15 at 09:59:06
On Cylinder =   16 at 09:59:06
On Cylinder =   17 at 09:59:08
On Cylinder =   18 at 09:59:08
On Cylinder =   19 at 09:59:09
On Cylinder =   20 at 09:59:09

--------------------------------------
All IDs checked were correct
--------------------------------------

---> Enter anything to continue <---

If there are any unexpected IDs, such as a flawed ID or an invalid sector
ID, the routine generates an ID analysis report and displays the report
with the ID Analysis menu (refer to figure 5-10, ID Analysis Menu for
DD-39 and DD-49 Disk Drives).

If an ID matches the expected value, MATCH is displayed in the RESULTS
column; otherwise, MISMATCH is displayed. If a mismatch occurs, refer to
the MISMATCH column to determine whether the error is in the ID's
cylinder (C), head (H), or sector (S). An ID of -1 (O'77) represents a
flawed ID.

Generally, when one ID is in error, all subsequent IDs for the track are
in error. To view specific IDs in the report, enter the desired entry
number (NUM).
V E R I F Y   I D   A N A L Y S I S   F O R   39-1-32A
15:21:55 06/23/87

       EXPECTED ID     ACTUAL ID
NUM    CYL HED SEC     CYL HED SEC    RESULTS                     MISMATCH
--------------------------------------------------------------    --------
  1    841   0   0     841   0   0    M A T C H
  2    841   0   1     841   0   1    M A T C H
  3    841   0   2      -1  -1  -1    - Uncharted flaw found -    C H S
  4    841   0   3     841   0   2    M I S M A T C H --->        S
  5    841   0   4     841   0   3    M I S M A T C H --->        S
  6    841   0   5     841   0   4    M I S M A T C H --->        S
  7    841   0   6     841   0   5    M I S M A T C H --->        S
  8    841   0   7     841   0   6    M I S M A T C H --->        S
  9    841   0   8     841   0   7    M I S M A T C H --->        S
 10    841   0   9     841   0   8    M I S M A T C H --->        S

A   Show all entry types
B   Show mismatched entries: First=   1 Last=  72
C   Print out all entries
D   Print only mismatched entries
R   Return
Enter Command or Entry Number ==>

Figure 5-10.  ID Analysis Menu for DD-39 and DD-49 Disk Drives
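
The RESULTS and MISMATCH columns described above can be modeled with a short Python sketch. It is an illustration only, not donut's routine: the names SectorId, compare_id, and is_flawed are hypothetical, and treating a flawed ID as -1 in every field follows the sample report rather than a documented data format.

```python
from typing import NamedTuple, Tuple

FLAWED = -1  # the sample report shows a flawed ID as -1 in each field


class SectorId(NamedTuple):
    cyl: int
    hed: int
    sec: int


def is_flawed(sector_id: SectorId) -> bool:
    """True when every field of the ID carries the flawed marker."""
    return all(field == FLAWED for field in sector_id)


def compare_id(expected: SectorId, actual: SectorId) -> Tuple[str, str]:
    """Return (RESULTS, MISMATCH) for one report entry.

    MISMATCH lists C, H, and/or S for the fields that differ.
    """
    letters = [
        letter
        for letter, want, got in zip("CHS", expected, actual)
        if want != got
    ]
    if not letters:
        return "MATCH", ""
    return "MISMATCH", " ".join(letters)


if __name__ == "__main__":
    # Entries 1, 3, and 4 of the sample report for device 39-1-32A.
    print(compare_id(SectorId(841, 0, 0), SectorId(841, 0, 0)))
    print(compare_id(SectorId(841, 0, 2), SectorId(-1, -1, -1)))
    print(compare_id(SectorId(841, 0, 3), SectorId(841, 0, 2)))
```

Entry 4 differs only in its sector field, which is why the report flags S alone. Every ID after the uncharted flaw is off by one sector, matching the remark that one bad ID usually puts all later IDs on the track in error.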

ID analysis (DD-40s) - Figure 5-11 shows the ID Analysis menu for DD-40
disk drives. Table 5-10 shows the ID analysis menu commands.

The following example describes the results of an ID analysis that was
performed without using the User Flaw Table (enter g from the
Formatting menu).

The ID analysis report preceding the ID Analysis menu (figure 5-11) is
for logical device 40-2-30A (command b, 'Show mismatched entries,' was
entered). The results show that three mismatched entries were detected
in the position (POS) field of the sector ID.

The SEC column in the ID analysis report shows the physical sector
number. To calculate the logical sector number, do the following:
1.  Multiply the spindle number (SPIN) by 12 (the number of sectors
    in each spindle).

2.  Add the result from step 1 to the physical sector number.

For example, the ID analysis report in figure 5-11 shows physical
sector 5 is associated with spindle 1. Calculate the logical sector
number as follows:

1.   1 * 12 = 12    (spindle number * number of sectors in the spindle)
2.  12 +  5 = 17    (result from step 1 + physical sector number)

Logical sector 17 is the equivalent of physical sector 5 on spindle 1.
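
The two-step calculation above can be written as a small Python function. This is a sketch for checking values by hand: the function and constant names are illustrative, and the range checks assume spindles are numbered 0 through 3 and physical sectors 0 through 11, which the text implies but does not state outright.

```python
SECTORS_PER_SPINDLE = 12   # from the text: 12 sectors per spindle
SPINDLES_PER_DD40 = 4      # from the text: each DD-40 contains four spindles


def logical_sector(spin: int, physical_sector: int) -> int:
    """Convert a (SPIN, physical SEC) pair from the report to a logical sector."""
    if not 0 <= spin < SPINDLES_PER_DD40:
        raise ValueError("spindle number out of range")
    if not 0 <= physical_sector < SECTORS_PER_SPINDLE:
        raise ValueError("physical sector out of range")
    # Step 1: spindle number * sectors per spindle.
    # Step 2: add the physical sector number.
    return spin * SECTORS_PER_SPINDLE + physical_sector


if __name__ == "__main__":
    # Worked example from the text: physical sector 5 on spindle 1.
    print(logical_sector(1, 5))
```

Under these assumptions the highest logical sector is 47 (spindle 3, physical sector 11).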

V E R I F Y   I D   A N A L Y S I S   F O R   40-2-30A
15:16:44 05/04/88

     EXPECTED ID            ACTUAL ID
NUM   CYL HED SEC POS     CYL HED SEC POS SPIN   RESULTS              MISMATCH
------------------------------------------------------------------   --------
  4  1063   0   3 511    1063   0   3 210    0   M I S M A T C H ->   P
114  1063   2   5 511    1063   2   5   7    1   M I S M A T C H ->   P
170  1063   3   1 511    1063   3   1 169    2   M I S M A T C H ->   P
E N D   O F   D A T A

A   Show all entry types
B   Show mismatched entries: First=   4 Last= 170
C   Print out all entries
D   Print only mismatched entries
R   Return
Enter Command or Entry Number ==>

Figure 5-11.  ID Analysis Menu for DD-40 Disk Drives

ID Analysis menu commands - Table 5-10 shows the ID analysis menu
commands.

Table 5-10.  ID Analysis Menu Commands

Command    Description

a          Displays all entries in the report

b          Displays only the mismatched entries (which are not
           necessarily contiguous). The first and last mismatched
           entry numbers are displayed on the command line.

c          Enters the entire report in a file called PRINTIDS,
           which is located in the current directory

d          Enters only the mismatched entries in a file called
           PRINTIDS, which is located in the current directory

r          Returns to previous menu

5.1.9.5

Parameter menu

Figure 5-23 shows the Parameter menu. Table 5-15 lists the Parameter
menu commands (refer to subsection 5.1.13, Parameter Menu).

5.1.10

SURFACE TESTS MENU

Figure 5-12 shows the Surface Tests menu. Table 5-11 lists the Surface
Tests menu commands. These commands display the following submenus:
• Write Data menu
• Read Data and Compare menu
• Surface Analysis menu
• Examine Data Buffer menu
• Parameter menu

Surface tests consist of the following operations: reads, writes, read
absolute, and surface analysis. These operations are all performed
within the cylinder, head, and sector ranges specified in the argument
banner. The read absolute operation only reads from the lowest track
specified.


=============================================================== 10:11:47 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

              S U R F A C E   T E S T   C H O I C E S

              A   Write data
              B   Read data and compare
              C   Read exercise
              D   Surface Analysis
              E   Examine Buffer
              F   Read Absolute (one track only)
              G   Write Current Data Buffer
              Z   Reset parameters
              R   Return

              Enter read/write option ==>

Figure 5-12.  Surface Tests Menu

Table 5-11.  Surface Tests Menu Commands

Command    Description

a          Displays the Write Data menu, from which you can
           select a data pattern to perform a write operation

b          Displays the Read Data and Compare menu, from which you
           can read the sectors listed in the argument banner and
           compare the data to the selected data pattern.

c          Reads the sectors listed in the argument banner. This
           command can be used to verify the readability of a
           sector or group of sectors.

d          Displays the Surface Analysis menu, from which you can
           do a write-read-compare on the sectors listed in the
           argument banner, using the selected surface analysis
           pattern.

e          Displays the Examine Data Buffer menu

f          Executes a read absolute operation, reading the
           specified sectors of the track with the lowest cylinder
           and head numbers in the argument banner. The read is
           performed without checking the sector's ID field.
           Therefore, the program reads the physical, rather than
           the logical, sector addresses.

g          Writes the contents of the data buffer to the specified
           cylinder, head, and sector locations

t          Reads the track headers of all the tracks in the
           cylinder with the lowest number in the argument banner.
           The information is stored in the data buffer. This
           menu command is displayed for DD-39s only.

z          Displays the Parameter menu, from which you can set the
           arguments in the argument banner.

r          Returns to previous menu

5.1.10.1

Write Data, Read Data and Compare, and Surface Analysis menus

Figure 5-13 shows the Write Data menu. Figure 5-14 shows the Read Data
and Compare menu. Figure 5-15 shows the Surface Analysis menu.
Table 5-12 lists the commands for these menus. Use the commands to
select patterns to be used for various operations. For a write or a read
and compare operation, select only one pattern. For a surface analysis
operation, select one or more patterns.


=============================================================== 15:21:55 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  39-1-32A   841 - 841    0 - 4    0 - 23     1       DD39    System    =
==========================================================================

                    W R I T E   D A T A

       0   All zeros                     1   All ones
       A   Addressing pattern            B   Bump
       C   Alternating 0, 1 pattern
       E   Hole                          F   Fixed data
       G   Random
       S   Sequential data               T   Peak shift
       Z   Reset Parameters              R   Return

       Input the data pattern ==>

Figure 5-13.  Write Data Menu

=============================================================== 15:21:55 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  39-1-32A   841 - 841    0 - 4    0 - 23     1       DD39    System    =
==========================================================================

              R E A D   B U F F E R   &   C O M P A R E

       0   All zeros                     1   All ones
       A   Addressing pattern            B   Bump
       C   Alternating 0, 1 pattern
       E   Hole                          F   Fixed data
       S   Sequential data               T   Peak shift
       Z   Reset Parameters              R   Return

       Input the data pattern ==>

Figure 5-14.  Read Data and Compare Menu

=============================================================== 15:21:55 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  39-1-32A   841 - 841    0 - 4    0 - 23     1       DD39    System    =
==========================================================================

              S U R F A C E   A N A L Y S I S

       0   All zeros                     1   All ones
       A   Addressing pattern            B   Bump
       C   Alternating 0, 1 pattern
       D   All patterns but F
       E   Hole                          F   Fixed data
       G   Random
       S   Sequential data               T   Peak shift
       Z   Reset Parameters              R   Return

       Input the data pattern ==>

Figure 5-15.  Surface Analysis Menu

Table 5-12.  Commands for the Write Data, Read Data and
             Compare, and Surface Analysis Menus

Command    Description

0          All 0's

1          All 1's

a          Addressing pattern in a Cray word:

           Parcel    Value

           0         Cylinder number
           1         Head number
           2         Sector number
           3         Word number

b          Bump.  This is a repeating 4-word pattern:

           Word    Octal                     Hexadecimal

           0       0525252525242104252525    5555 5555 1111 5555
           1       0525250421052525252525    5555 1111 5555 5555
           2       0104212525252525252525    1111 5555 5555 5555
           3       0525252525252525210421    5555 5555 5555 1111

c          Alternating 0's and 1's.  This is a repeating 2-word
           pattern:

           Word    Octal                     Hexadecimal

           0       1252525252525252525252    AAAA AAAA AAAA AAAA
           1       0525252525252525252525    5555 5555 5555 5555

d          All patterns except the fixed data pattern (F)

e          Hole.  This is a repeating 4-word pattern:

           Word    Octal                     Hexadecimal

           0       0525252525256735652525    5555 5555 7777 5555
           1       0525356735252525252525    5555 7777 5555 5555
           2       0735672525252525252525    7777 5555 5555 5555
           3       0525252525252525273567    5555 5555 5555 7777

f          Fixed data.  This is a 1-word, user-input pattern.

g          Random data

s          Sequential data pattern:

           Word    Description

           0       Random number
           n       Word 0 + n

t          Peak shift.  This is a repeating 3-word pattern:

           Word    Octal                     Hexadecimal

           0       0631466735667356663146    6666 DDDD BBBB 6666
           1       1567355673554631556735    DDDD BBBB 6666 DDDD
           2       1356733146333567335673    BBBB 6666 DDDD BBBB

z          Displays the Parameter menu

r          Returns to previous menu
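
As an aid to reading the pattern listings, the following Python sketch builds the bump, alternating, sequential, and addressing words and prints them in the octal and hexadecimal forms the table uses. It is not taken from donut. All function names are illustrative, and the sketch assumes that parcel 0 is the leftmost (most significant) 16 bits of the 64-bit word.

```python
from typing import List

MASK64 = (1 << 64) - 1


def pack_parcels(p0: int, p1: int, p2: int, p3: int) -> int:
    """Pack four 16-bit parcels into one 64-bit word (parcel 0 leftmost, assumed)."""
    word = 0
    for parcel in (p0, p1, p2, p3):
        if not 0 <= parcel <= 0xFFFF:
            raise ValueError("a parcel holds 16 bits")
        word = (word << 16) | parcel
    return word


def octal(word: int) -> str:
    """Format a word as 22 octal digits, as in the Octal column."""
    return format(word & MASK64, "022o")


def hexa(word: int) -> str:
    """Format a word as four 4-digit hexadecimal groups."""
    digits = format(word & MASK64, "016X")
    return " ".join(digits[i:i + 4] for i in range(0, 16, 4))


# Bump: the 1111 parcel moves one position to the left on each word.
BUMP = [
    pack_parcels(0x5555, 0x5555, 0x1111, 0x5555),
    pack_parcels(0x5555, 0x1111, 0x5555, 0x5555),
    pack_parcels(0x1111, 0x5555, 0x5555, 0x5555),
    pack_parcels(0x5555, 0x5555, 0x5555, 0x1111),
]

# Alternating 0's and 1's: a repeating 2-word pattern.
ALTERNATING = [0xAAAAAAAAAAAAAAAA, 0x5555555555555555]


def repeat(pattern: List[int], count: int) -> List[int]:
    """Fill count words by cycling through a repeating pattern."""
    return [pattern[i % len(pattern)] for i in range(count)]


def sequential(word0: int, count: int) -> List[int]:
    """Sequential data: word n is word 0 + n (wrapped to 64 bits)."""
    return [(word0 + n) & MASK64 for n in range(count)]


def addressing_word(cyl: int, head: int, sector: int, word_number: int) -> int:
    """Addressing pattern: parcels 0-3 hold cylinder, head, sector, word number."""
    return pack_parcels(cyl, head, sector, word_number)


if __name__ == "__main__":
    for n, word in enumerate(BUMP):
        print(n, octal(word), hexa(word))
```

Running the module prints the four bump words, which can be compared against the Octal and Hexadecimal columns of the table.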


5.1.10.2

Examine Data Buffer menu

Figure 5-9 shows the Examine Data Buffer menu. Table 5-9 lists the
Examine Data Buffer menu commands (refer to subsection 5.1.9.3, Examine
Data Buffer Menu).

5.1.10.3

Parameter menu

Figure 5-23 shows the Parameter menu. Table 5-15 lists the Parameter
menu commands (refer to subsection 5.1.13, Parameter Menu).

5.1.11

FLAW TABLE UTILITY MENUS

Figure 5-16 shows the Flaw Table Utility menu. Table 5-13 lists the Flaw
Table Utility menu commands. These commands display the following
submenus:
• Factory Flaw Table
• User Flaw Table
• System Flaw Table
• Found Flaw Table

=============================================================== 10:11:53 =
=  DEVICE     CYLINDERS    HEADS    SECTORS    SLIP    DISK    MODE      =
=  --------------------------------------------------------------------  =
=  49-2-24A   10 - 20      0 - 7    0 - 41     2       DD49    Maint.    =
==========================================================================

              F L A W   T A B L E   U T I L I T Y

              A   FACTORY Flaw Table
              B   USER Flaw Table
              C   SYSTEM Flaw Table
              D   FOUND Flaw Table
              R   Return

              Choose a flaw table ==>

Figure 5-16.  Flaw Table Utility Menu

Table 5-13.  Flaw Table Utility Menu Commands

Command    Description

a          Displays the Factory Flaw Table (not used for DD-19/29
           disks). This table contains the factory flaws
           originally found on the disk.

b          Displays the User Flaw Table (not used for DD-19/29
           disks). This table contains the physical addresses of
           the flawed sectors.

c          Displays the System Flaw Table (sometimes called
           the System EFT). This table contains the flaws used
           by UNICOS when creating the UNICOS Flaw Map.

d          Displays the Found Flaw Table, which resides in donut.
           This table contains flaws detected during surface
           analysis.

r          Returns to the previous menu

The flaw table utilities allow you to read, edit, write, or print the
disk flaw tables. Flaw tables can be edited in donut's area of central
memory only. However, donut does not automatically write the edited
tables to disk; you must enter f (Write flaw table to disk) from either
the User or System flaw table, as appropriate. Any function that
requires flaw tables (such as formatting) uses the tables currently in
donut's area of central memory (the tables must be read into donut
before they can be referenced).
To display a flaw table without going through the Flaw Table Utility
menu, enter one of the following commands from the Main menu or any of
the flaw table menus, as appropriate:
Command    Description

FAC        Displays the Factory Flaw Table menu
USR        Displays the User Flaw Table menu
SYS        Displays the System Flaw Table menu
FND        Displays the Found Flaw Table menu

For example, if your current screen display shows the User Flaw Table
menu, you can enter sys to display the System Flaw Table menu. To
return to the User Flaw Table menu, enter r.


The main heading in each flaw table menu contains the following
information:
• Logical device name
• Flaw table name
• Number of entries

Below the main heading are the following field headings:

Heading    Description

NUM        Entry number
CHANNEL    Channel number
CYL        Cylinder number
HEAD       Head number
SEC        Sector number
USER       User-input-flaw bit

The User and Found Flaw Tables for DD-10s and DD-40s contain the
following additional headings (and no channel number heading):

Heading     Description

U/H         Hideable/unhideable defects. For additional information,
            refer to subsection 5.1.9.2, Position Field of the Sector
            ID (DD-10s and DD-40s only).

Position    Position field (POS) of the sector ID. The POS field
            contains the defect address.

In the System Flaw Table, the field headings include a contiguous
(CONTIG) number, which always has a value of 1, in place of the channel
number, and there is no USER bit heading. The CONTIG field is not used
under UNICOS.

Each flaw table display lists up to 18 flaws, two per line. From any of
the flaw tables, you can do the following:

• Enter a menu command to perform a specific function

• Enter the number of the first flaw that you want to appear in a
  display of any contiguous group of flaws

• Enter + (plus) or - (minus) to scroll forward or backward,
  respectively
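
The display behavior described above (up to 18 flaws per screen, two per line, with a selectable starting entry and + or - scrolling) can be sketched in Python as follows. This is an illustration under stated assumptions, not donut's display code. The function names are hypothetical, the pairing of entry n with entry n + 9 on one line is inferred from the sample Factory Flaw Table display, and a scroll step of one full screen is assumed.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

FLAWS_PER_SCREEN = 18   # from the text: up to 18 flaws per display
ROWS_PER_SCREEN = 9     # two flaws per line


def screen_rows(flaws: Sequence[T], first: int) -> List[Tuple[T, Optional[T]]]:
    """Return the rows of one display, starting at 1-based entry number first.

    Each row pairs a left-column entry with the entry nine places later
    (assumed layout); the partner is None when the table runs out.
    """
    window = list(flaws[first - 1:first - 1 + FLAWS_PER_SCREEN])
    left, right = window[:ROWS_PER_SCREEN], window[ROWS_PER_SCREEN:]
    rows: List[Tuple[T, Optional[T]]] = []
    for i, entry in enumerate(left):
        partner = right[i] if i < len(right) else None
        rows.append((entry, partner))
    return rows


def scroll(first: int, key: str, total: int) -> int:
    """Return the new first entry after + (forward) or - (backward)."""
    if key not in ("+", "-"):
        raise ValueError("key must be '+' or '-'")
    step = FLAWS_PER_SCREEN if key == "+" else -FLAWS_PER_SCREEN
    # Clamp so the display always starts on an existing entry.
    return min(max(1, first + step), max(1, total))


if __name__ == "__main__":
    entries = list(range(1, 250))          # entry numbers 1..249
    for left, right in screen_rows(entries, 1):
        print(left, right)
```

Entering a flaw number corresponds to calling screen_rows with that number as first, and + or - corresponds to calling scroll and redrawing.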

For additional flaw information, refer to the Disk Systems Hardware
Reference Manual, CRI publication HR-0077.


The flaw tables are shown in the following figures:
Figure    Title

5-17      Factory Flaw Table Menu
5-18      User Flaw Table Menu for DD-39 and DD-49 Disk Drives
5-19      User Flaw Table Menu for DD-10 and DD-40 Disk Drives
5-20      System Flaw Table Menu
5-21      Found Flaw Table Menu for DD-19/29/39/49 Disk Drives
5-22      Found Flaw Table Menu for DD-10 and DD-40 Disk Drives

Table 5-14 shows the commands for the flaw table menus. These commands
apply to all of the flaw tables unless otherwise indicated.

49-2-24A        F A C T O R Y   F L A W   T A B L E        LAST= 249

NUM CHANNEL  CYL HEAD SEC USER       NUM CHANNEL  CYL HEAD SEC USER
--------------------------------     --------------------------------
  1   A2       0    0   0    0        10   A2       8    0   0    0
  2   A2       1    0   0    0        11   A2       9    0   0    0
  3   A2       2    0   0    0        12   A2      10    0   0    0
  4   A2       3    0   0    0        13   A2      11    0   0    0
  5   A2       4    0   0    0        14   A2      12    0   0    0
  6   A2       5    0   0    0        15   A2      13    0   0    0
  7   A2       6    0   0    0        16   A2      40    0   0    0
  8   A2       7    0   0    0        17   A2      41    0   0    0
  9   B2       7    5   1    0        18   B2      43    5  21    0

B   - Read flaw table from disk
C   - Check flaw table validity
E   - Erase flaw table from memory
V   - Print out flaw table
X n - Start display at cylinder n
R   - Return
Enter Command or Flaw Number ==>

Figure 5-17.  Factory Flaw Table Menu

[Screen display: USER FLAW TABLE for logical device 49-2-24A. Field
headings: NUM, CHANNEL, CYL, HEAD, SEC, USER.]

A   - Add a flaw
B   - Read flaw table from disk
C   - Check flaw table validity
D   - Delete a flaw
E   - Erase flaw table from memory
F   - Write flaw table to disk
G   - Merge FACTORY flaws into USER
V   - Print out flaw table
X n - Start display at cylinder n
R   - Return
Enter Command or Flaw Number ==>

Figure 5-18.  User Flaw Table Menu for DD-39 and DD-49 Disk Drives

[Screen display: USER FLAW TABLE for logical device 40-1-36A,
HIDEABLE = 425, LAST=1165. Field headings: NUM, CYL, HEAD, SEC, USER,
U/H, POSITION.]

A   - Add a flaw
B   - Read flaw table from disk
C   - Check flaw table validity
D   - Delete a flaw
E   - Erase flaw table from memory
F   - Write flaw table to disk
V   - Print out flaw table
X n - Display unhideables at CYL n
Y n - Display hideables at CYL n
R   - Return
Enter Command or Flaw Number ==>

Figure 5-19.  User Flaw Table Menu for DD-10 and DD-40 Disk Drives

[Screen display: SYSTEM FLAW TABLE for logical device 49-2-24A. Field
headings: NUM, CONTIG, CYL, HEAD, SEC.]

A   - Add a flaw
B   - Read flaw table from disk
C   - Check flaw table validity
D   - Delete a flaw
E   - Erase flaw table from memory
F   - Write flaw table to disk
G   - Make SYSTEM table from FACTORY
H   - Make SYSTEM table from USER
V   - Print out flaw table
X n - Start display at cylinder n
R   - Return
Enter Command or Flaw Number ==>

Figure 5-20.  System Flaw Table Menu

[Screen display: FOUND FLAW TABLE for logical device 49-2-24A. Field
headings: NUM, CHANNEL, CYL, HEAD, SEC, USER.]

A   - Add a flaw
C   - Check flaw table validity
D   - Delete a flaw
E   - Erase flaw table from memory
G   - Merge FOUND flaws into USER flaw table
V   - Print out flaw table
X n - Start display at cylinder n
R   - Return
Enter Command or Flaw Number ==>

Figure 5-21.  Found Flaw Table Menu for DD-19/29/39/49 Disk Drives

[Screen display: FOUND FLAW TABLE for logical device 40-1-36A, with
HIDEABLE= and LAST= counts. Field headings: NUM, CYL, HEAD, SEC, USER,
U/H, POSITION.]

A   - Add a flaw
C   - Check flaw table validity
D   - Delete a flaw
E   - Erase flaw table from memory
G   - Merge FOUND flaws into USER flaw table
V   - Print out flaw table
X n - Start display at cylinder n
R   - Return
Enter Command or Flaw Number ==>

Figure 5-22.  Found Flaw Table Menu for DD-10 and DD-40 Disk Drives

Table 5-14.  Commands for the Flaw Table Menus

Command    Description

a          Adds a flaw; issues prompts for the flaw arguments and
           inserts valid flaws in their proper order in the flaw
           table. Flaws cannot be added to the Factory Flaw Table.

b          Reads the flaw table from disk to central memory, after
           first deleting the table currently in central memory.

           •  System Flaw Table menu
              When the System Flaw Table is read from disk,
              the table is compared to the UNICOS Flaw Map
              and any mismatches are displayed on the screen.

c          Verifies that the flaw table is in order, that no
           duplicate entries exist, that values are within a valid
           range, and that the table is terminated correctly. If a
           problem exists in any of these areas, a message is
           displayed indicating the first entry in error.

d          Deletes a flaw; issues prompts for the entry number of
           the flaw to be deleted. The flaw is only removed from
           the table currently in central memory (does not affect
           the disk-resident table). Factory flaws cannot be
           deleted.


e          Deletes flaw table from central memory (does not affect
           the disk-resident table)

f          Writes flaw table from central memory to disk,
           overwriting the disk-resident table. Factory and Found
           flaw tables cannot be written to disk.

           •  System Flaw Table menu
              In addition to writing the table from central
              memory to disk, the UNICOS Flaw Map (used by
              UNICOS to define alternate sectors for flawed
              sectors) will be updated to reflect the new
              System Flaw Table.

g          Merges flaws from one flaw table into another. The
           menu from which the g command is entered and (in some
           cases) the device type being exercised determine which
           flaw tables are merged. You can enter g from the
           following menus:

           •  Found Flaw Table menu
              For DD-39, DD-40, and DD-49 disk drives:
              Copies the Found Flaw Table entries into the
              User Flaw Table. Duplicate entries are
              skipped. Entries are added in their proper
              order.
              For DD-19 and DD-29 disk drives:
              Copies the Found Flaw Table entries into the
              System Flaw Table (this does not overwrite
              the current System Flaw Table)

           •  User Flaw Table menu
              Copies the Factory Flaw Table entries into the
              User Flaw Table. Duplicate entries are skipped.
              Entries are added in their proper order.

           •  System Flaw Table menu
              Creates a System Flaw Table from the Factory Flaw
              Table entries. The SLIP argument determines
              which entries are made in the System Flaw Table.

h          Creates a System Flaw Table from the User Flaw Table
           entries (h is entered from the System Flaw Table
           menu only). The SLIP argument determines which
           entries are made in the System Flaw Table.

v          Creates a file with the name of the flaw table
           (FACTORY, USER, SYSTEM, or FOUND) in the current
           directory.

x n        Displays flaws starting at cylinder n. For DD-40s,
           the flaws displayed are unhideable defects.

y n        Displays hideable defects starting at cylinder n
           (DD-40s only)

+          Displays the next screen of flaws

-          Displays the previous screen of flaws

r          Returns to previous menu

5.1.12

ERROR CORRECTION CODE TEST

The Error Correction Code (ECC) test does the following:†

1.  Writes a 512-word buffer of random data with 0's for ECC.

2.  Reads the data, expecting an ECC error.

3.  Writes the same data with standard ECC.

4.  Reads the data, expecting no errors.

5.  Compares the data read with that written in step 3.

6.  Displays a message indicating whether the ECC test passed or
    failed. If the test failed, the message also indicates the
    word-in-error.

†  The ECC test cannot be performed on DD-19 or DD-29 disk drives.
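The six steps of the ECC test can be sketched against a toy in-memory sector. This is an illustration only: `ToySector`, `toy_ecc`, and `ecc_test` are hypothetical names, and the XOR check value merely stands in for the drive's real ECC, which this manual does not describe.

```python
import random

WORDS = 512  # buffer length given in the text

def toy_ecc(words):
    """Stand-in check value: XOR of all words (NOT the drive's real ECC)."""
    acc = 0
    for w in words:
        acc ^= w
    return acc

class ToySector:
    """Hypothetical in-memory sector holding data plus a check value."""
    def write(self, words, ecc=None):
        self.words = list(words)
        # ecc=None models a write with standard (device-generated) ECC
        self.ecc = toy_ecc(words) if ecc is None else ecc

    def read(self):
        """Return (data, ecc_error_flag)."""
        return list(self.words), toy_ecc(self.words) != self.ecc

def ecc_test(sector, seed=1):
    rng = random.Random(seed)
    data = [rng.getrandbits(64) for _ in range(WORDS)]
    if toy_ecc(data) == 0:
        data[0] ^= 1                   # ensure a zeroed check value is wrong
    sector.write(data, ecc=0)          # step 1: random data, 0's for ECC
    _, err = sector.read()             # step 2: expect an ECC error
    if not err:
        return "failed: no ECC error reported"
    sector.write(data)                 # step 3: same data, standard ECC
    got, err = sector.read()           # step 4: expect no errors
    if err:
        return "failed: unexpected ECC error"
    for i, (a, b) in enumerate(zip(got, data)):   # step 5: compare
        if a != b:
            return f"failed: word-in-error {i}"
    return "passed"                    # step 6: report the result
```

Calling `ecc_test(ToySector())` walks all six steps; a sector model that never flags an error fails at step 2.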

The ECC test uses the DISK and DEVICE arguments (displayed in the
argument banner) and the following software CE cylinder numbers (instead
of the numbers in the argument banner):

Cylinder = n    Scratch cylinder; typically the last cylinder on
                the device.
Head     = 0
Sector   = 0

5.1.13

PARAMETER MENU

Figure 5-23 shows the Parameter menu, from which you can define the
logical device name and set the arguments (parameters) in the argument
banner. Table 5-15 lists the Parameter menu commands.

[Screen display: argument banner with the headings DEVICE, CYLINDERS,
HEADS, SECTORS, SLIP, DISK, and MODE and the current time, followed by
the menu below.]

                      P A R A M E T E R S

A   Logical Device
B   Cylinder limits
C   Head limits
D   Sector limits
E   Diagnostic flags (not displayed)
T   Toggle disk mode (system/maintenance)
R   Return
Enter Command ==>

Figure 5-23.  Parameter Menu

Table 5-15.  Parameter Menu Commands

Command  Parameter  Description

a        DEVICE     Sets the logical device name. You must
                    respond to the prompts.

b        CYLINDERS  Sets the cylinder range. You must respond
                    to the prompts.

c        HEADS      Sets the head range. You must respond to
                    the prompts.

d        SECTORS    Sets the sector range. You must respond to
                    the prompts.

e        FLAGS      Sets diagnostic flags related to IOS error
                    handling and read-ahead/write-behind
                    operations. You can set any combination
                    of the following flags:

                    Flag  Description

                    a     Returns the error record to the
                          diagnostic error logger, diagerr

                    b     Disables error recovery. The IOS
                          does not attempt a retry.

                    c     Disables error reporting. The IOS
                          does not log errors in the error
                          logger.

                    d     Disables read-ahead/write-behind
                          operations

                    If no flags are set, all flags are enabled.

t        MODE       Sets the disk mode to system or
                    maintenance

r                   Returns to previous menu

5.1.14

EXITING donut

To exit donut, enter q from the Main menu. The exit process does not
change the disk mode or write any edited flaw tables to disk. (It is
assumed that these operations are performed prior to exiting.) The final
donut screen display is as follows:

G o o d b y e   f r o m   D O N U T

5.1.15

PROGRAM EXAMPLES

This subsection contains various donut execution examples, all of which
originate from the Main menu.

Example 1 shows how to enable maintenance mode for a DD-39 disk with a
logical device name of 39-2-27A.

Example 1:

1.  Enter z (reset parameters) from the Main menu.

2.  Enter a (logical device) from the Parameter menu.
    Enter 39-2-27A for the logical device name.

3.  Enter t (toggle disk mode) from the Parameter menu.
    Enter go to acknowledge the warning.
    The following message is displayed and remains on the screen
    until the disk is offloaded and in maintenance mode:

    Please wait while 39-2-27A is entering MAINTENANCE mode

4.  Enter r to return to the Main menu.

Example 2 shows the procedure to do the following:

•  Read the User Flaw Table from disk

•  Add the following flaw to the table:
   CYLINDER=25, HEAD=2, SECTOR=19, all surfaces

•  Write the modified User Flaw Table to the disk

•  Print the User Flaw Table in octal format

Example 2:

1.  Enter t (Flaw Table Utility) from the Main menu.

2.  Enter b (USER Flaw Table) from the Flaw Table Utility menu.

3.  Enter b (Read flaw table from disk) from the User Flaw Table
    menu.
    Enter go to acknowledge the warning.

4.  Enter a (Add a flaw) from the User Flaw Table menu.
    Enter 25 for the cylinder number.
    Enter 2 for the head number.
    Enter 19 for the sector number.
    Enter a for all surfaces.

5.  Enter f (Write flaw table to disk) from the User Flaw Table
    menu.
    Enter go to acknowledge the warning.

6.  Enter v (Print out flaw table) from the User Flaw Table menu.
    Enter c for octal format.
    Enter r to return to the User Flaw Table menu.

7.  Enter r to return to the Main menu.

Example 3 shows the procedure to do the following:

•  Format the track of CYLINDER=25, HEAD=2 (using the User Flaw Table)
•  Verify that the IDs were written correctly

Example 3:

1.  Enter f (formatting and ID analysis) from the Main menu.

2.  Enter z (reset parameters) from the Formatting menu.

3.  Enter b (cylinder limits) from the Parameter menu.
    Enter 25 for the lower cylinder number.
    Enter 25 for the upper cylinder number.

4.  Enter c (head limits) from the Parameter menu.
    Enter 2 for the lower head number.
    Enter 2 for the upper head number.
    Enter r to return to the Formatting menu.

5.  Enter b (Format with USER Flaw Table) from the Formatting menu.
    Enter go after checking the formatting limits.
    After formatting, the IDs are checked. If all IDs match
    their expected values, a message to that effect is displayed
    with the following prompt:

    ---> Enter anything to continue <---

    If an ID error occurs, the ID Analysis Results menu is
    displayed. Check the results and/or obtain a printout.
    Enter r to return to the Formatting menu.

6.  Enter r to return to the Main menu.

Example 4 shows how to perform surface analysis on cylinder 25, using the
default patterns and executing the random pattern 50 times with a seed
value of 6065.
Example 4:
1.

Enter z (reset parameters) from the Main menu.

2.

Enter b (cylinder limits) from the Parameter menu.
Enter 25 for the lower cylinder number.
Enter 25 for the upper cylinder number.

3.

Enter c (head limits) from the Parameter menu.
Enter a for all heads.

4.

Enter d (sector limits) from the Parameter menu.
Enter a for all sectors.

5.

Enter r to return to the Main menu.

6.

Enter s (surface tests) from the Main menu.

7.

Enter d (surface analysis) from the Surface Tests menu.
Enter d to execute all patterns except the fixed data
pattern.
Enter 50 for the number of random passes.
Enter 6065 for the seed value.
Enter go after checking the arguments.
The display changes as the program analyzes each track.
After all tracks are analyzed, the program displays a
message indicating the number of flaws added to the Found
Flaw Table. This signals the end of the surface analysis
operation.
Respond to the following prompt:
---> Enter anything to continue <---
Enter r to return to the Surface Tests menu.

8.

Enter r to return to the Main menu.

Example 5 shows the procedure to do the following:

•  Read the User Flaw Table for the DD-49 disk with a logical device
   name of 49-1-24A.

•  Add the following flaw to the User Flaw Table:

   Cylinder = 1507 (octal)
   Head     = 3
   Sector   = 17 (octal)
   Channel  = A2

•  Generate a printout of the User Flaw Table (in octal).

•  Write the User Flaw Table to disk.

•  Generate the System Flaw Table from the User Flaw Table.

•  Generate a printout of the System Flaw Table (in octal).

•  Write the System Flaw Table to disk.

•  Reformat Cylinder = 1507, Head = 3, using the User Flaw Table in
   central memory.

Example 5:

1.  Enter oct (octal display) from the Main menu.

2.  Enter dev from the Main menu to change the logical device name.

3.  Enter 49-1-24A.

4.  Enter usr (display the User Flaw Table) from the Main menu.

5.  Enter b (read flaw table from disk) from the User Flaw Table
    menu.
    Enter go to acknowledge the warning.

6.  Enter a (Add a flaw) from the User Flaw Table menu.
    Enter 1507 for the cylinder number.
    Enter 3 for the head number.
    Enter 17 for the sector number.
    Enter a2 for the channel number.

7.  Enter v (Print out flaw table) from the User Flaw Table menu.
    Enter c for octal printout.

8.  Enter f (Write flaw table to disk) from the User Flaw Table
    menu.
    Enter go to acknowledge the warning.

9.  Enter sys (display the System Flaw Table) from the User Flaw
    Table menu.

10. Enter h (Make SYSTEM table from USER) from the System Flaw
    Table menu.

11. Enter v (Print out flaw table) from the System Flaw Table menu.
    Enter c for an octal printout.

12. Enter f (Write flaw table to disk) from the System Flaw Table
    menu.
    Enter go to acknowledge the warning.

13. Enter r to return to the Main menu.

14. Enter cyl (set cylinder range) from the Main menu.
    Enter 1507 for the lower cylinder number.
    Enter 1507 for the upper cylinder number.

15. Enter hed (set head range) from the Main menu.
    Enter 3 for the lower head number.
    Enter 3 for the upper head number.

16. Enter f (Formatting and ID analysis) from the Main menu.

17. Enter b (Format with USER Flaw Table) from the Formatting menu.
    Enter go after checking the argument limits.
    After formatting, the IDs are checked. If all IDs match
    their expected values, a message to that effect is displayed
    with the following prompt:

    ---> Enter anything to continue <---

    If an ID error occurs, the ID Analysis Results menu is
    displayed. Check the results and/or obtain a printout.
    Enter r to return to the Formatting menu.

18. Enter r to return to the Main menu.

Example 6 shows how to return the disk to system mode before exiting
donut.

Example 6:

1.  Enter z (reset parameters) from the Main menu.

2.  Enter t (toggle disk mode) from the Parameter menu.

    (Alternatively, you can enter mode from the Main menu instead of
    steps 1 and 2 and proceed with step 3.)

3.  Enter go to acknowledge the request.
    The following message is displayed and remains on the screen
    until the disk is in system mode:

    Please wait while 39-2-27A is entering SYSTEM mode

4.  Enter r to return to the Main menu.

5.  Enter q to exit donut.

5.2

oldmon

The oldmon† monitor is the down CPU monitor, which initiates,
controls, and monitors the down CPU tests. These tests execute under
oldmon in multiple-CPU environments only. For a list of the down CPU
tests, refer to appendix A, On-line Diagnostic Programs. For information
on the down CPU interface to UNICOS, refer to cpu(4D).

†  Multiple-CPU Cray computer systems only

5.2.1

DOWN CPU TESTS

The down CPU tests are executed in a down CPU from an operational CPU.
Down CPU tests cannot be executed in monitor mode; consequently, they
cannot perform I/O operations. A CPU other than the down CPU initiates
I/O activity and all CPUs other than the down CPU are favored for
external interrupts. If the down CPU receives interrupts, it redirects
them to another CPU. For additional information on interrupts and
monitor mode, refer to the following manuals, as appropriate:

CSM0111000      CRAY X-MP/1 System Programmer Reference Manual
CSM0110000      CRAY X-MP/2 System Programmer Reference Manual
CSM0112000      CRAY X-MP/4 System Programmer Reference Manual
CSM-0400-000    CRAY Y-MP System Programmer Reference Manual

To execute in a down CPU, a program must meet the following requirements:

•  Must be an absolute binary

•  Must not require any operating system support (the program cannot
   allow screen output, keyboard input, disk reading, or disk writing)

The oldmon monitor does the following:

•  Downs the CPU

•  Loads a down CPU test from a file into central memory

•  Monitors and controls the execution of a down CPU test

•  Loads central memory areas from files

•  Allows an operator to modify the central memory image of a down
   CPU test

•  Displays central memory areas in various data formats

•  Writes central memory areas to files

•  Dumps central memory areas in a variety of formats to files or to the
   expander printer

•  Executes user-defined program loops

5.2.2

PROGRAM SYNOPSIS

The oldmon monitor resides in /ce/bin. Log on interactively at the
system console or any other supported front-end station (refer to the
appropriate front-end station reference manual).

Synopsis:

oldmon [-d cpulist] [-q] [-u cpulist]

-d cpulist    Down CPUs immediately. cpulist is entered in the
              following format:

                  n,n,...,n

              n is a value in one of the following ranges:

                  0,1,2,...,n    or    a,b,c,...,x

              If allowed to default, no CPUs are downed.

-q            Exit oldmon after processing the command line entry.
              This command option should be entered with other options.

-u cpulist    Return CPUs to normal system operations. cpulist is
              entered in the following format:

                  n,n,...,n

              n is a value in one of the following ranges:

                  0,1,2,...,n    or    a,b,c,...,x
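The cpulist format and the three command line options can be illustrated with a small parser. This is a sketch, not the oldmon source: the function names are hypothetical, and mapping the letter a to CPU 0 (b to CPU 1, and so on) is an assumption that the synopsis does not state.

```python
def parse_cpulist(text):
    """Parse 'n,n,...,n' where each n is a decimal number or a letter a-x."""
    cpus = []
    for item in text.split(","):
        item = item.strip().lower()
        if item.isdecimal():
            cpus.append(int(item))
        elif len(item) == 1 and "a" <= item <= "x":
            cpus.append(ord(item) - ord("a"))   # assumption: a means CPU 0
        else:
            raise ValueError(f"bad CPU identifier: {item!r}")
    return cpus

def parse_args(argv):
    """Parse the arguments of: oldmon [-d cpulist] [-q] [-u cpulist]."""
    opts = {"down": [], "up": [], "quit": False}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-q":
            opts["quit"] = True
        elif arg in ("-d", "-u"):
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires a cpulist")
            key = "down" if arg == "-d" else "up"
            opts[key] = parse_cpulist(argv[i + 1])
            i += 1                               # skip the cpulist argument
        else:
            raise ValueError(f"unknown option: {arg}")
        i += 1
    return opts
```

With no -d option the "down" list stays empty, which mirrors the documented default that no CPUs are downed.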

Table 5-16 lists the oldmon commands. For additional information on
these commands, refer to subsections 5.2.5.2 through 5.2.5.17.

Table 5-16.  oldmon Commands

Command    Description

a          Appends a formatted central memory dump to a file
c          Specifies a new default CPU
d          Dumps a formatted central memory dump to a file
e          Enters a value at a specific address
f          Fills consecutive central memory locations
g          Starts a test in a CPU
h          Halts test execution in a down CPU
l          Loads a test into a CPU's central memory buffer
o          Sets test options
q          Exits oldmon
r          Redraws the display
s          Updates the current Exchange Package of the current CPU
u          Returns a down CPU to normal system operations
v          Views a formatted area of central memory
w          Writes an area of central memory to a binary file
x          Executes a command buffer containing oldmon commands

5.2.3

PROGRAM EXECUTION

When oldmon is started, it does the following:

1.  Allocates an area of central memory to each CPU

2.  Loads the test loop code into each CPU's memory area

3.  Executes $HOME/.oldmonrc (a profile file containing any
    oldmon commands)

4.  Displays the Main menu for oldmon (refer to figure 5-24)

A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute

Figure 5-24.  Main Menu for oldmon

The following subsections describe program execution under oldmon:

•  Down CPU tests (listed in appendix A, On-line Diagnostic Programs)
•  Test loop code
•  Environment variables

5.2.3.1

Down CPU tests

The down CPU tests reside in /ce/oldmon. Two types of down CPU tests
run under oldmon: confidence tests and maintenance tests. The down
CPU confidence tests are on-line confidence tests that have been
converted to run under oldmon (off-line). The down CPU maintenance
tests are taken from the off-line diagnostic release.†

The initial Exchange Package starts each test. The current Exchange
Package allows a test to continue from the point at which it is
interrupted.

For a list of the off-line diagnostics (down CPU tests) that run under
oldmon, refer to Appendix A, On-line Diagnostic Programs.

†  The down CPU maintenance tests are deferred for CEA systems.

Modifications to the off-line diagnostic test base - The down CPU tests
are derived from the off-line diagnostic release X3.0. Some of the
off-line diagnostics require modifications before they can be executed in
a down CPU test environment. A configuration file containing a list of
oldmon commands is used to make the necessary modifications.

When oldmon is executed, it attempts to access the configuration file
oldmon.cf. If oldmon.cf is found, oldmon uses the information in
it to automatically configure a loaded diagnostic to execute in a down
CPU environment; if oldmon.cf is not found, oldmon uses the default
configuration file.

If oldmon.cf is not found, you can initialize it by entering y (yes)
in response to the following prompt:

Cannot find configuration file oldmon.cf, should I initialize it?
Enter Yes or No (y/n)>

If you enter n (no), oldmon does not initialize oldmon.cf.

Default configuration files - The default configuration files are used to
make the necessary modifications to the off-line diagnostic tests, so
that they can execute in a down CPU test environment.

The following is the default configuration file for a CRAY X-MP computer
system.

# OLDMON configuration file for X-MP off-line diagnostics.
#
aht:
        e k cput 40                    # Set CPU type, 20 for X-MP/2, 40 for X-MP/4
        e k mlast 7777                 # Set last address to be tested
        o l 7777                       # Set limit address
arb:
        e 40a 005000                   # Change MTA I/O routine to return
        e 140 100000000000             # Set P in SEXP
        e 143 160000000000000          # Set mode bits in SEXP
        e 144 1000000000000000000000   # Set EMA bit in SEXP
arm:                                   # Nothing to configure
brb:
        o l 1577                       # Set limit address
cmp:                                   # Nothing to configure
cmx:
        e 26c 1000                     # Run CMX with cluster 0
        o l 44777                      # Set limit address
        e 1152c 001000                 # Change monitor req. exch to pass
gth:
        e k mlast 33777                # Set last address to be tested
        o l 33777                      # Set limit address
ibz:
        e k cput 40                    # Set CPU type, 20 for X-MP/2, 40 for X-MP/4
        e k secs 62                    # Can only run sections 1, 4 and 5
        o l 400777                     # Set limit address
        e k cpun 1                     # Set number of CPUs to 1
mit:
        e k cput 40                    # Set CPU type, 20 for X-MP/2, 40 for X-MP/4
        e k mlast 7777                 # Set last address to be tested
        o l 7777                       # Set limit address

Default configuration file (continued):
sfa:
sfm:
sfr:
sis:
sr3:
sra:
srb:

e 20Sc 177777
e 205e 177777
e k secs 65432

srI:

e
e
e
e
e
e
e
e

o

srs:

stan:
svc:
trb:
vpp:
vra:

o
o
e
o
e
e
e
e
e
e
o
o
o
e
o
o
o
o
o
o

vrl:

vrn:
vrr:
vrs:
vrx:
olcrit:
olcsvc:
olcfpt:
olibuf:
olcm:

# Disable timing portion of test
# Disable timing portion of test

# Disable section 1 of test
# Nothing to configure
I 6277
# Set limit address
# Nothing to configure
# Nothing to configure
40a 005000
# Change MTA liD routine to return
140 100000000000
# Set P in SEXP
143 160000000000000
# Set mode bits in SEXP
144 1000000000000000000000 # Set EMA bit in SEXP
40a 005000
# Change MTA liD routine to return
140 100000000000
# Set P in SEXP
143 160000000000000
# Set mode bits in SEXP
144 1000000000000000000000 # Set EMA bit in SEXP
# Nothing to configure
I 1577
# Set limit address
1 1577
# Set limit address
20Sc 177777
# Disable timing portion of test
I 2077
# Set limit address
20Sc 177777
# Disable timing portion of test
40a 005000
# Change MTA liD routine to return
140 100000000000
# Set P in SEXP
143 160000000000000
# Set mode bits in SEXP
144 1000000000000000000000 # Set EMA bit in SEXP
205b 177777
# Disable timing portion of test
1 2777
# Set limit address
1 4777
# Set limit address
I 2077
# Set limit address
205d 177777
# Disable timing portion of test
I 23777
# Set limit address
1 60000
# Set limit address
1 50000
# Set limit address
1 40000
# Set limit address
1 30000
# Set limit address
1 40000
# Set limit address

The following is the default configuration file for a CEA system.

# OLDMON configuration file for Y-MP off-line diagnostics.
#
olcrit:
        o l 60000                      # Set limit address
olcsvc:
        o l 50000                      # Set limit address
olcfpt:
        o l 40000                      # Set limit address
olibuf:
        o l 30000                      # Set limit address
olcm:
        o l 40000                      # Set limit address
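The structure of the configuration files shown above (a test name followed by a colon, then the commands for that test, with # starting a comment) can be read with a few lines of code. This is a sketch under that assumed grammar, not the parser used by oldmon; `parse_config` is a hypothetical name, and the sketch accepts a command either on the label line or on the lines that follow it.

```python
def parse_config(text):
    """Map each 'name:' label to the list of commands configured for it."""
    tests = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue
        head, sep, rest = line.partition(":")
        if sep and " " not in head:              # 'label:' starts a new test
            current = head
            tests.setdefault(current, [])
            line = rest.strip()
            if not line:
                continue
        if current is None:
            raise ValueError("command found before any test label")
        tests[current].append(line.split())      # command split into fields
    return tests
```

A test whose entry says only "Nothing to configure" ends up with an empty command list, so a caller can still tell that the test is known.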

5.2.3.2

Test loop code

The test loop code can be used to build a failing loop. The initial
Exchange Package resides at address O'140. Use either the Enter or Fill
command to overwrite the PASS instructions (instruction 001000 at address
O'500a) with the suspected failing code. The suspected failing code (at
address O'500a) is executed with the test loop. The program then jumps
to a check routine.

The check routine does the following:

1.  Compares the actual results in S1 to the expected results in S2

2.  Increments the PASS and ERROR counts

3.  Jumps to the suspected failing code sequence (at address O'500a)
    to loop

The current Exchange Package resides at address O'120. It allows the
loop to continue from the point at which it is interrupted.
The test loop code is as follows:

START     =       *
          S0      0               ; Initialize values.
          PASS,   S0
          ERROR,  S0
          ACT,    S0
          EXP,    S0
          DIF,    S0
MAINLOOP  =       *
          J       TESTCODE        ; Jump to testcode provided by user.
*
*   Test code provided by user should return here. The test code can
*   use all registers. It should return with s1 containing the
*   actual value, and s2 containing the expected value.
*
TESTRTN   =       *
          S0      S1\S2           ; Compare actual and expected.
          JSZ     CONTIN          ; No failure, increment pass count.
          ACT,    S1              ; Save actual result
          EXP,    S2              ; Save expected result
          DIF,    S0              ; Save difference
          S6      ERROR,          ; Increment error count
          S7      1
          S6      S6+S7
          ERROR,  S6
          S6      STOP,           ; check stop flag
          S0      S6\S7
          JSN     CONTIN
          ERR                     ; Stop on error
CONTIN    =       *
          S6      PASS,           ; Increment pass count
          S7      1
          S6      S6+S7
          PASS,   S6
          J       MAINLOOP

The following gives the locations of items within the test code.

              CRAY X-MP
Item          Computer System    CEA System

START         200                2000
TESTCODE      500                2100
PASS          24                 1104
ERROR         23                 1103
ACT           21                 1101
EXP           22                 1102
DIF           20                 1100
STOP          26                 1010

Location TESTCODE contains a series of PASS instructions, followed by an
unconditional jump to TESTRTN. You can create a test loop by overwriting
the PASS instructions at TESTCODE with the suspected failing
instructions. Before the jump to TESTRTN, the actual value should be in
S1, and the expected value in S2.
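The bookkeeping done by the check routine can be modeled in a few lines. This is a sketch based on one reading of the test loop listing, not a translation that has been run on Cray hardware: `run_test_loop` is a hypothetical name, the `passes` bound exists only so the example terminates (the real loop runs until halted), and treating a STOP value of 1 as "stop on error" is an assumption drawn from the listing.

```python
def run_test_loop(testcode, passes, stop_on_error=False):
    """Model the PASS/ERROR/ACT/EXP/DIF bookkeeping of the test loop.

    `testcode` is a callable standing in for the user code at TESTCODE;
    it returns (actual, expected), the values left in S1 and S2.
    """
    counters = {"PASS": 0, "ERROR": 0, "ACT": 0, "EXP": 0, "DIF": 0}
    for _ in range(passes):
        s1, s2 = testcode()
        dif = s1 ^ s2                    # exclusive OR of actual and expected
        if dif != 0:
            counters["ACT"] = s1         # save actual result
            counters["EXP"] = s2         # save expected result
            counters["DIF"] = dif        # save difference
            counters["ERROR"] += 1
            if stop_on_error:
                break                    # error exit when the stop flag is set
        counters["PASS"] += 1            # counted on every completed pass
    return counters
```

In this reading the pass count advances on every completed iteration, including iterations that recorded an error while the stop flag was clear.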


5.2.3.3

Environment variables

The oldmon environment can be modified by setting certain environment
variables. These variables are as follows:

Variable           Description

DMONPATH:          Enter a list of directories to search when opening a
                   file for reading. Separate directories with a
                   colon. When oldmon tries to read a file, it first
                   checks the current directory for that file. If the
                   file is not found, oldmon checks $HOME/oldmon.
                   If the file is not found, the program searches the
                   directories specified by the DMONPATH environment
                   variable. If the file is not found in any of those
                   directories, the program searches the directory
                   /ce/oldmon. If the file is still not found,
                   oldmon issues an error message.

OLDMON_PRINTER:    Command used to print output. The data to be
                   printed is sent to stdin (the command's standard
                   input). If this variable is not defined, ezlp(1)
                   is used.

TERM:              Terminal type being used. The terminal specified
                   must be defined in the terminfo(4F) database.
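The file search order given for DMONPATH can be expressed as a short lookup routine. This is an illustrative sketch rather than the oldmon source; `find_file` and its parameters are hypothetical names, and the directory names follow the description above.

```python
import os

def find_file(name, env=None, cwd=".", system_dir="/ce/oldmon"):
    """Return the first match for `name` using the documented search order."""
    env = os.environ if env is None else env
    dirs = [cwd]                                     # 1. current directory
    home = env.get("HOME")
    if home:
        dirs.append(os.path.join(home, "oldmon"))    # 2. $HOME/oldmon
    dirs += [d for d in env.get("DMONPATH", "").split(":") if d]  # 3. DMONPATH
    dirs.append(system_dir)                          # 4. system directory
    for directory in dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(name)                    # 5. report an error
```

Passing `env`, `cwd`, and `system_dir` explicitly keeps the routine easy to exercise without touching the real environment.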

Set the environment variables before entering oldmon. If you are
running under the Bourne shell, sh(1), enter the following:

VAR=value
export VAR

If you are running under the C shell, csh(1), enter the following:

setenv VAR value

Examples:

To specify a VT100 terminal type while running under csh(1), enter the
following at the csh(1) prompt:

% setenv TERM vt100

To specify an oldmon search path while running under sh(1), enter the
following at the sh(1) prompt:

$ DMONPATH=search-path-one:search-path-two
$ export DMONPATH

To specify a different print command while running under sh(1), enter
the following at the sh(1) prompt:

$ OLDMON_PRINTER='remsh remsys /usr/ucb/lpr'
$ export OLDMON_PRINTER

In the preceding example, the single quotes are necessary because the
command contains spaces. When oldmon prints output, it executes this
command and sends the data to be printed to the standard input (stdin)
of the command. In this example, the remsh command initiates a remote
shell on the remsys system and executes the /usr/ucb/lpr command on the
remote system. This allows oldmon output to be sent to a printer
attached to a remote system. See remsh(1) for more information.
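The way a print command receives its data on standard input can be sketched as follows. This is an assumption-laden illustration, not how oldmon is implemented: `print_output` is a hypothetical name, and running the value through a shell is simply one way to honor a command that contains spaces.

```python
import os
import subprocess

DEFAULT_PRINTER = "ezlp"   # per the text, ezlp(1) is used when the variable is unset

def print_output(data, env=None):
    """Send `data` to the stdin of the command named by OLDMON_PRINTER."""
    env = os.environ if env is None else env
    command = env.get("OLDMON_PRINTER", DEFAULT_PRINTER)
    # shell=True lets a value such as 'remsh remsys /usr/ucb/lpr' run as typed
    result = subprocess.run(command, shell=True, input=data.encode(), check=False)
    return result.returncode
```

The return code is passed back so a caller can tell whether the print command reported a failure.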

5.2.4

DISPLAY MODES

The following subsections describe the oldmon display modes:

•  Scroll mode display

•  Screen mode display

The oldmon display contains the following information:

Information       Description

Command menu      Lists input values

Command prompt    Prompts user for information

Error messages    Identifies error condition

Information

Description

CPU status

Displays the following information for the current
CPU:
•

State of the CPU:
Up
Down
Down, idle
Down, running

•

Name of the diagnostic in the current CPU

•

Program register (P) of the current Exchange
Package of the current CPU

•

Status bits (S) of the current Exchange
Package of the current CPU
For CRAY X-MP computer systems:

f fff nun nun c
f fff indicates flags.
nun nun indicates mode bits.
C indicates the cluster number.
For CEA systems:

ffff mmmm cc
ffff indicates flags.
mmmm indicates modes.
cc indicates the cluster number.

Down CPU list      List of the down CPUs

Time               Current date and time

Display area       Display area for the portion of central memory
                   associated with the current CPU. The display area
                   can be divided into separate displays, showing
                   different areas of central memory. In addition,
                   each central memory display can be formatted
                   differently. For additional information, refer to
                   subsection 5.2.5.16, View command (v).
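
To illustrate how one central memory word looks in the word, parcel, and text formats described under the format argument in subsection 5.2.5.1, the following sketch renders a 64-bit word three ways. It is not oldmon code. The function names are hypothetical, and the sketch assumes that parcel a is the leftmost 16 bits of the word and that non-printing bytes are shown as periods.

```python
from typing import List

WORD_BITS = 64
PARCEL_BITS = 16


def as_word(value: int) -> str:
    """Word format: one 22-digit octal word."""
    return format(value & (2**WORD_BITS - 1), "022o")


def as_parcels(value: int) -> List[str]:
    """Parcel format: four 6-digit octal parcels, a through d."""
    return [
        format((value >> shift) & 0o177777, "06o")
        for shift in range(WORD_BITS - PARCEL_BITS, -1, -PARCEL_BITS)
    ]


def as_text(value: int) -> str:
    """Text format: eight ASCII characters, '.' for non-printing bytes."""
    raw = value.to_bytes(8, "big")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)
```

For example, the word 0675543067115135020040 that appears in the figures of this section decodes in text format to the diagnostic name olcrit followed by two blanks.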


5.2.4.1

Scroll mode display

Figure 5-25 shows a scroll mode display.

[Screen image: status lines for CPU B (Down, running; Name: offcrit; Wed Oct 19 14:13:14 1988; Downed CPUs: B), followed by the DIB display for olcrit beside word-format and text-format views of central memory, and ending with the command menu line.]

A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute

Figure 5-25.  Scroll Mode Display

The following information is displayed (in the order listed):

1.  Current CPU status; time; down CPU list
2.  Central memory display area
3.  Error messages
4.  Command menu
5.  Command prompt


The following information applies to command line entries:

•  Enter commands after the command prompt.

•  If a command string is executed, the display scrolls upward and a
   new display appears.

•  If a command is entered without a required argument, the argument
   menu is displayed with a command prompt. Enter an argument after
   the prompt. After all commands are executed, the display scrolls
   upward and a new display appears.

5.2.4.2

Screen mode display

Figure 5-26 shows a screen mode display.

A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute

[Screen image: the command menu line shown above, followed by the status lines for CPU B (Down, running; Name: offcrit; Wed Oct 19 14:13:14 1988; Downed CPUs: B) and the DIB display for olcrit beside word-format and text-format views of central memory.]

Figure 5-26.  Screen Mode Display


To execute in screen mode, your terminal type must be defined in the
terminfo(4F) database. See terminfo(4F) and curses(3X) for more
information.

The TERM environment variable sets the default terminal type. If TERM is
set to a valid terminal type, oldmon executes in screen mode; if not,
oldmon executes in scroll mode. For information on the TERM environment
variable, refer to sh(1).

If your terminal type is not defined or is invalid, oldmon does not
enter screen mode; instead, an error message is displayed.

In screen mode, the display is updated (overwritten) rather than
scrolled. The following information is displayed (in the order listed):

1.  Command menu
2.  Command prompt
3.  Error messages
4.  Current CPU status; time; down CPU list
5.  Central memory display area

The following information applies to command line entries:

•  Enter commands after the command prompt.

•  If a command string is executed, the entire display is updated.

•  If a command is entered without a required argument, the argument
   menu is displayed with a command prompt. Enter an argument after
   the prompt. After all commands are executed, the entire display
   is updated.
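
The choice between screen mode and scroll mode described in this subsection can be sketched as a small decision function. This is an illustration under stated assumptions, not the oldmon logic: the function name is hypothetical, known_types stands in for a lookup in the terminfo(4F) database, and the two message strings are taken from subsection 5.2.7.

```python
from typing import Mapping, Optional, Set, Tuple


def choose_display_mode(
    env: Mapping[str, str], known_types: Set[str]
) -> Tuple[str, Optional[str]]:
    """Return (mode, message); message is None when screen mode is usable."""
    term = env.get("TERM", "")
    if not term:
        # TERM was not set when the monitor was started.
        return "scroll", "Terminal type not set, cannot use screen mode"
    if term not in known_types:
        # known_types is a stand-in for the terminfo database (assumption).
        return "scroll", f"Unknown terminal {term}; cannot use screen mode"
    return "screen", None
```

The terminal type can also be set after startup with the Options command (o t type), which would amount to calling the same check with the new type.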

5.2.5

PROGRAM COMMANDS

The oldmon commands are entered from a front-end terminal or an IOS
station console. Figure 5-24 shows the Main menu for oldmon.

Unless a complete command string is entered from the Main menu (with all
of the required arguments), the program displays various menus with
prompts for additional entries. If you enter an invalid argument, the
program displays a menu listing the valid arguments. Reenter a valid
argument and continue.

Between argument entries, the menu, prompt, and message lines are
updated. After a command is executed with all of the required arguments,
the entire display is redrawn.


The following guidelines apply to all command entries:

•  Select commands from the command menu by entering the first letter
   of the command. Depending on the command, the program displays
   various menus with prompts for arguments.

•  Enter all inputs in uppercase, lowercase, or a combination of both.

•  Press the Return key to receive a prompt for the next required
   argument or to execute the command if all of the required
   arguments are entered.

•  Enter the less-than key (<) to return to the preceding menu. This
   allows you to reenter an argument.

•  Enter the greater-than key (>) to abort the current command and
   return to the Main menu.

•  Use a semicolon (;) to combine commands. The following applies to
   a combined command entry:

      If any of the command entries are incomplete, the program
      issues a prompt for additional arguments for the first
      incomplete command.

      If an error is detected in the command list, the program
      displays the menu for the first incorrect command. This
      allows you to reenter the menu commands and any subsequent
      commands.

      If you have not yet pressed the Return key to execute the
      command list, you can abort the last command in the list by
      pressing the greater-than key (>). All commands in the list
      are executed except the last entry, and the program returns
      to the Main menu.

•  Use white space (blank spaces, tabs, and newline characters) to
   indicate the end of an address or file name.

•  Enter a pound sign (#) to start a comment in a command buffer.
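
The separator rules in the guidelines above (semicolons combine commands, white space ends an argument, a pound sign starts a comment) can be sketched as a small tokenizer for a command buffer. This is only an illustration; the function name is hypothetical. It assumes that a comment runs to the end of its line and that semicolons are the only command separators, with newlines treated as ordinary white space.

```python
from typing import List


def split_command_buffer(text: str) -> List[List[str]]:
    """Split command buffer text into commands, each a list of tokens."""
    # Drop everything from '#' to the end of each line (assumed comment span).
    stripped = " ".join(line.split("#", 1)[0] for line in text.splitlines())
    commands = []
    for entry in stripped.split(";"):
        tokens = entry.split()  # blanks, tabs, and newlines all separate tokens
        if tokens:
            commands.append(tokens)
    return commands
```

For instance, the buffer line "v r w 4000; e 4017 1  # set repeat flag" yields the two commands used in the example in subsection 5.2.6.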


5.2.5.1

Common arguments

Several of the oldmon commands accept the following arguments:

Argument    Description

address     Enter an octal address, or press K (Key) followed by a
            diagnostic information block (DIB) entry (refer to the
            off-line diagnostic listings for a list of DIB entries).
            All addresses are relative to the central memory image
            of the current CPU. The related menus indicate whether
            a parcel or word address is expected.

            If a parcel address is required, enter the word
            address followed by a parcel designator (do not
            leave a space between them). The parcel designator
            can be a, b, c, or d; the default is a.

            If a parcel address is not required and no parcel
            designator is specified, the address is assumed to
            be a word address.

cpu         CPU number. cpu is a value in one of the following
            ranges:

               0, 1, 2, 3, 4, 5, 6, 7
            or
               a, b, c, d, e, f, g, h

            The default is the current CPU.

file        Enter a valid file name. Full and relative path names
            are valid file names. If a relative path name is
            specified, the program searches for the file in the
            current directory. If the file is not found, the
            program uses the DMONPATH environment variable to
            search. For information on the DMONPATH environment
            variable, refer to sh(1).

format      Enter one of the following arguments to select the
            display format for the Dump (d) and View (v)
            commands:

            Argument   Format

            d          DIB format (View command only); displays
                       the DIB of the diagnostic in the current
                       CPU.

            i          Instruction format; displays central
                       memory in disassembled instructions. The
                       program issues a prompt for a word or
                       parcel address.

            p          Parcel format; displays central memory in
                       6-digit octal parcels. The program
                       issues a prompt for a word address.

            r          Register format (View command only);
                       displays the registers of the current CPU
                       when the CPU is down and idle.

            t          Text format; displays central memory in
                       ASCII. The program issues a prompt for a
                       word address.

            w          Word format; displays central memory in
                       22-digit octal words. The program issues
                       a prompt for a word address.

            x          Exchange Package format; displays central
                       memory as an Exchange Package (View
                       command only). The program issues a
                       prompt for a word address or an Exchange
                       Package value. The Exchange Package
                       arguments are as follows:

                       Argument   Exchange Package

                       c          Current (default)
                       s          Starting

5.2.5.2

Append (a) and Dump (d) commands

To append or dump a formatted central memory dump to a file (commands
a and d, respectively), use the following command synopses.


Synopsis (Append command):

a start-address end-address format file

Synopsis (Dump command):

d start-address end-address format file

You must have permission to write to the specified file. The file is
created if it does not already exist. Before writing the dump to the
file, the program issues a prompt for comments to precede the dump.

To print the dump, enter an asterisk (*) for file. See subsection
5.2.3.3, Environment Variables, for more information.

To set append or dump arguments, use the following command synopses.

Synopsis (Append command):

a argument file

Synopsis (Dump command):

d argument file

argument    Enter one of the following values for argument:

            Argument   Description

            d          Appends or dumps the DIB of the diagnostic
                       in the current CPU to file

            r          Appends or dumps the registers of the
                       current CPU to file (the CPU must be down
                       and idle)

            s          Appends or dumps the current screen to file

5.2.5.3

CPU command (c)

To specify a new default CPU, use the following command synopsis.
Synopsis:
c cpu


The default CPU's memory area can be displayed in the memory display
area. The Status command is valid for the default CPU only. The Go,
Halt, and Load commands assume the default CPU if a different CPU is not
specified. The initial default CPU is the first CPU downed from the
command line or CPU a if no CPU was downed.

5.2.5.4

Enter command (e)

To enter a value at a specific address, use the following command
synopsis.
Synopsis:

e address value

If address is a parcel address and value exceeds 0'177777, the
program displays an error message. Reenter and continue.

5.2.5.5

Execute command (z)

To execute a command buffer containing oldmon commands, use the
following command synopsis.
Synopsis:

z file

5.2.5.6

Fill command (f)

To fill consecutive central memory locations, use either of the following
command synopses.
Synopsis:

f address value ... value

address     Indicates the first central memory location to be filled
            with the first value specified. Each consecutive value is
            placed in the next consecutive central memory location.
            Depending on the address specified, the program fills the
            memory location with words or parcels.

Press the Return key after address and after each value. If you press
the Return key without first entering a value, the current central memory
location remains unchanged and the next value specified is placed in the
next consecutive memory location.

To return to the preceding word or parcel location, press the less-than
key (<). You can modify the word or parcel value before proceeding to
the next location.

To signal the completion of the consecutive entries, enter a period (.)
or the greater-than key (>).

To fill memory in a specified range with a specific data pattern, use the
following command synopsis:
Synopsis:

fp start-address end-address value

If parcel addresses are specified, each parcel in the given range is
filled with the given data value. If word addresses are given, the given
range of words is filled with the given data value.
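
The address forms and the range fill described above can be sketched as follows. This is a minimal model, not the oldmon implementation. The function names are hypothetical. The sketch assumes that the end address is inclusive, that a range counts as a parcel range when either address carries a parcel designator (a missing designator then defaults to a), and that parcel locations can be numbered as word * 4 + parcel index. The error strings are taken from subsection 5.2.7.

```python
import re
from typing import Dict, Optional, Tuple

PARCEL_MAX = 0o177777  # largest value that fits in one parcel
_ADDRESS = re.compile(r"^([0-7]+)([a-d]?)$", re.IGNORECASE)


def parse_address(text: str) -> Tuple[int, Optional[int]]:
    """Return (word address, parcel index or None) for an octal address."""
    match = _ADDRESS.match(text)
    if match is None:
        raise ValueError(f"Invalid input {text}")
    word = int(match.group(1), 8)
    designator = match.group(2).lower()
    parcel = "abcd".index(designator) if designator else None
    return word, parcel


def fill_range(start: str, end: str, value: int) -> Dict[int, int]:
    """Model fp: map each filled location (word or parcel number) to value."""
    (w0, p0), (w1, p1) = parse_address(start), parse_address(end)
    if p0 is not None or p1 is not None:
        if value > PARCEL_MAX:
            raise ValueError("Value exceeds parcel size")
        first, last = w0 * 4 + (p0 or 0), w1 * 4 + (p1 or 0)
    else:
        first, last = w0, w1
    if last <= first:
        raise ValueError("Second address must be greater than first address")
    return {location: value for location in range(first, last + 1)}
```

Under these assumptions, fp 100a 101b 177777 fills six parcels (100a through 100d, 101a, and 101b), whereas fp 100 107 0 fills eight words.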

5.2.5.7

Go command (g)

To start a test in a CPU, use the following command synopsis.
Synopsis:

g [cpu] [exchange-package]

exchange-package
            Enter one of the following arguments for exchange-package:

            Argument   Exchange Package     CX/CEA Location   CEA Location

            c          Current              120               1200
            s          Starting (default)   140               740
            address

If the CPU is not down, the program issues a prompt for you to verify the
request to down the CPU. Enter y (yes) to down the CPU and start the
test. Enter n (no) to cancel the Go command.

5.2.5.8

Halt command (h)

To halt test execution in a down CPU, use the following command synopsis.
Synopsis:

h [cpu]

The CPU idles until the Go or Up command is executed.


5.2.5.9

Load command (l)

To load a test into a CPU's central memory buffer, use the following
command synopsis.
Synopsis:

l [cpu] address file

file        Enter one of the following arguments for file:

            Argument   Description

            file       File containing the test to be executed
            *          Test loop

5.2.5.10

Options command (o)

To set test options, use the following command synopsis.
Synopsis:

o option argument

option      The values for option are as follows (the argument
            value is dependent on option):

            Option   Description

            c        Generates a display that is continuously
                     refreshed at a specified interval (in
                     seconds). Use the following command synopsis:

                     o c seconds

                     seconds is the number of seconds; a value
                     in the range 1 through 9.

                     To return to the Main menu, an interrupt must
                     be sent to oldmon. Typically, pressing the
                     Control-C keys sends an interrupt to
                     oldmon. See the appropriate front-end
                     station guide and stty(1).

            d        Downs a specified CPU. Use the following
                     command synopsis:

                     o d cpu

                     cpu defaults to the current CPU. The CPU is
                     downed and left idle. (The Go command also
                     downs the CPU.)

            l        Sets a new limit address for the current CPU.
                     Use the following command synopsis:

                     o l address

                     The new limit address is rounded up to the
                     next 0'1000 word boundary.

            t        Specifies the terminal type (required for
                     screen mode; refer to subsection 5.2.4.2,
                     Screen mode display). Use the following
                     command synopsis:

                     o t type

                     type is one of the terminal types defined in
                     the terminfo(4F) database. The TERM
                     environment variable sets the default terminal
                     type. For information on the TERM environment
                     variable, refer to sh(1).
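
The rounding applied to a new limit address can be worked through with a short sketch. This is an illustration, not oldmon code: the function name is hypothetical, and the sketch assumes that an address already on a 0'1000 word boundary is left unchanged.

```python
LIMIT_GRANULE = 0o1000  # words, from the "0'1000 word boundary" wording


def round_limit(address: int) -> int:
    """Round a word address up to a multiple of 0'1000 words."""
    # Ceiling division; an address on a boundary is kept as is (assumption).
    return -(-address // LIMIT_GRANULE) * LIMIT_GRANULE
```

Under this reading, a requested limit of 0'4001 becomes 0'5000, and a requested limit of 0'4000 stays at 0'4000.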
5.2.5.11

Quit command (q)

To exit oldmon, enter one of the following commands:

Command     Description

eof         End-of-file (typically, press the Control-d keys). Enter
            from any menu. A prompt is displayed before the request
            is processed. To verify or cancel the request, enter y
            (yes) or n (no), respectively.

q           Quit. Enter from the Main menu only. A prompt is
            displayed before the request is processed. To verify or
            cancel the request, enter y (yes) or n (no),
            respectively.

5.2.5.12

Redraw command (r)

To redraw the display, enter r.


5.2.5.13

Shell escape command (!)

To execute a shell command, use the following command synopsis.
Synopsis:

! [shell-command]

The oldmon monitor executes shell-command in a subshell. If
shell-command is omitted, oldmon executes /bin/sh. You must
exit this shell to continue oldmon. See sh(1) for more information.

5.2.5.14

Status command (s)

To update the current Exchange Package of the current CPU, enter s. If
the current CPU is not down, an error message is displayed.

5.2.5.15

Up command (u)

To return a down CPU to normal system operations, use the following
command synopsis.
Synopsis:
u [cpu]

5.2.5.16

View command (v)

To view a formatted area of central memory on all or part of the display
area, use the following command synopsis.
Synopsis:

v display format address

display     Enter one of the following arguments for display:

            Argument   Description

            f          Full display
            l          Left half of the display
            r          Right half of the display
            tl         Top left quadrant
            tr         Top right quadrant
            bl         Bottom left quadrant
            br         Bottom right quadrant

To display the DIB of the current diagnostic, use the following synopsis.
Synopsis:

v display d argument

argument    Enter one of the following arguments:

            Argument   Description

            RETURN     Displays the DIB starting at the beginning

            d          Displays the differences section of the
                       DIB (confidence tests only)

            k key      Displays the DIB starting with DIB entry key

To display the current values of the CPU's registers, use the following
synopsis.
Synopsis:

v display r

To scroll the display areas forward or backward, use the plus (+) or
minus (-) parameters, respectively. The command synopses are as
follows.
Synopsis:

v [display] +[n]

or

v [display] -[n]

display     Enter the display to be scrolled. If omitted, all display
            areas are scrolled.

n           Number of lines to scroll. The default for n is 8 if
            display is tl, tr, bl, or br. Otherwise, the
            default is 16 (the number of lines in the display area).

5.2.5.17

Write command (w)

To write an area of central memory to a binary file, use the following
command synopsis.
Synopsis:

w start-address end-address file


5.2.6

PROGRAM EXAMPLE

This subsection contains a commented oldmon execution example.
Example:
$ oldmon -d b
Do you really want to down CPU b?
Type y or n> y

**************************************************************
The -d b command line option requests that oldmon down
CPU B immediately. Enter y to confirm the request.
**************************************************************
Cannot find configuration file oldmon.cf, should I initialize it?
Enter Yes or No (y/n)> y

**************************************************************
The oldmon monitor cannot locate the configuration file
oldmon.cf. Enter y to initialize oldmon.cf.
**************************************************************


A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute
v

CPU B: Down, idle     Name: ** none **       Wed Oct 19 13:21:18 1988
P 0a     B 0     L 0                         Downed CPUs: B
S 0000 0000 00

OLDMON Version 1.0 - Online Down CPU Monitor
CRAY Y-MP Down CPU Monitor for the
UNICOS Operating System.
Copyright (c) Cray Research, Inc. Unpublished - All rights
reserved under the copyright laws of the United States.
CRAY PROPRIETARY

**************************************************************
The Main menu for oldmon is displayed. CPU B is the
default CPU. It is displayed as down and idle. Enter v to
select the View command.
**************************************************************
Display: Full, Top, Bottom, Left, Right; Scroll: +
View l

**************************************************************
The choice of input values is displayed. Enter l to
select the left half of the screen as the display area.
**************************************************************

Format: Dib, Instr, Parcel, Word, Register, Text, eXchange pkg; Scroll: +
View Left in d

**************************************************************
The choice of input values is displayed. Enter d to
select the DIB format.
**************************************************************
RETURN for DIB; Differences; Key
View Left in DIB format RETURN

**************************************************************
The choice of input values is displayed. Press RETURN
to display the beginning of the DIB.
**************************************************************

A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute
l

CPU B: Down, idle     Name: ** none **       Wed Oct 19 13:22:49 1988
P 0a     B 0     L 0                         Downed CPUs: B
S 0000 0000 00
DIB display unavailable

**************************************************************
The Main menu for oldmon is redisplayed. Enter l to load
a diagnostic into the common memory buffer for CPU B.

**************************************************************
Enter word address
Load cpu B at 0 RETURN

**************************************************************
Enter the address within the buffer where the diagnostic
is to be loaded. Pressing RETURN without entering an
address will default to zero.

**************************************************************
Enter file name, * for testloop
Load cpu B at 0 from offcrit

**************************************************************
Enter a file name. In this example, offcrit (off-line
version of olcrit) is specified.

**************************************************************


A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute
v r w 4000

CPU B: Down, idle     Name: offcrit          Wed Oct 19 13:24:39 1988
P 0a     B 0     L 0                         Downed CPUs: B
S 0000 0000 00
DIB display for olcrit
name    ='olcrit  '
rev     ='5.0     '
date    ='10/12/88'
pass    = 0
error   = 0
seed    = 33
lmstart = 0
failpat =
isop    = 1000
numins  = 200

ibuff   17000a EXIT 00

jbuff   17400a EXIT 00

inita0  = 0000000000000000000000
inita1  = 0000000000000000000000

**************************************************************
The command string to set the right half of the display
is entered. The blank space between each entry is optional.

1.  Enter v to select the View command.
2.  Enter r to select the right half of the display.
3.  Enter w to select word format.
4.  Enter 4000 to specify the display address.

**************************************************************


A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute
e

CPU B: Down, idle     Name: offcrit          Wed Oct 19 14:10:33 1988
P 0a     B 0     L 0                         Downed CPUs: B
S 0000 0000 00
DIB display for olcrit                  00,000,004,000
name    ='olcrit  '                     00 0675543067115135020040 olcrit
rev     ='5.0     '                     01 0324561402004010020040 5.0
date    ='10/12/88'                     02 0304601363046213634070 10/12/88
pass    = 0                             03 0000000000000000000000 ........
error   = 0                             04 0000000000000000000000 ........
seed    = 33                            05 0000000000000000000033 ........
failpat =                               06 0000000000000000000000 ........
failcln = 0                             07 0000000000000000000200 ........
isop    = 1000                          10 1000000000000000037777 ......?.
numins  = 200                           11 0000000000000000000000 ........
                                        12 0000000000000000000007 ........
ibuff   12000a ERR                      13 0000000000000000000000 ........
                                        14 0000000000000000000000 ........
jbuff   12400a ERR                      15 0000000000000000000000 ........
                                        16 0000000000000000001000 ........
inita0  = 0000000000000000000000        17 0000000000000000000000 ........
inita1  = 0000000000000000000000

**************************************************************
The new display is shown. Use the Enter command to set
a location within the memory buffer.

**************************************************************
Key  : Press RETURN when complete
Enter at Key seed

**************************************************************
Enter seed to specify that the seed DIB entry is to be
used.

**************************************************************
The current value at Key seed is 0000000000000000000033
Enter at Key seed the value of
1206302764022300543002

**************************************************************
Enter the value 1206302764022300543002. Presumably,
this is the seed from an on-line failure of olcrit.

**************************************************************

A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute
e 4017 1

CPU B: Down, idle     Name: offcrit          Wed Oct 19 14:12:59 1988
P 0a     B 0     L 0                         Downed CPUs: B
S 0000 0000 00
DIB display for olcrit                  00,000,004,000
name    ='olcrit  '                     00 0675543067115135020040 olcrit
rev     ='5.0     '                     01 0324561402004010020040 5.0
date    ='10/12/88'                     02 0304601363046213634070 10/12/88
pass    = 0                             03 0000000000000000000000 ........
error   = 0                             04 0000000000000000000000 ........
seed    = 1206302764022300543002        05 1206302764022300543002 .._@....
failpat =                               06 0000000000000000000000 ........
failcln = 0                             07 0000000000000000000200 ........
isop    = 1000                          10 1000000000000000037777 ......?.
numins  = 200                           11 0000000000000000000000 ........
                                        12 0000000000000000000007 ........
ibuff   12000a ERR                      13 0000000000000000000000 ........
                                        14 0000000000000000000000 ........
jbuff   12400a ERR                      15 0000000000000000000000 ........
                                        16 0000000000000000001000 ........
inita0  = 0000000000000000000000        17 0000000000000000000000 ........
inita1  = 0000000000000000000000

**************************************************************
The Enter command is used again to enter a 1 at location
4017. This sets the repeat flag for offcrit.
(Refer to
the offcrit listing for more information.)

**************************************************************

A/Dump Cpu Enter Fill Go Halt Load Opts Quit Redraw Stat Up View Write Xecute
y
**************************************************************
Enter a y to confirm the quit. Note that CPU B will be
left down since it was not explicitly returned to UNICOS
with the Up command.
**************************************************************


5.2.7

PROGRAM MESSAGES

This subsection lists the oldmon messages in alphabetical order.

Address addr exceeds limit address
   This message is associated with the Enter (e) command. Reenter a
   valid address to continue.

Cannot access printer
   If the OLDMON_PRINTER environment variable is set, its value is not a
   valid command. If OLDMON_PRINTER is not set, the command ezlp cannot
   be executed.

Cannot allocate memory
   This message is associated with the Load (l) or Options (o) command.

Cannot dump DIB of the loaded diagnostic
   This message is associated with the Append (a) or Dump (d) command.

Cannot fill memory outside of buffer
   This message is associated with the Fill (f) command. Reenter the
   Fill command.

Cannot find DIB entry x
   This message is associated with the Enter (e) or Fill (f) command.

CPU n interrupts: list
   This message lists all the interrupts for CPU n.

CPU n is already down
   The oldmon monitor tried to down a CPU that it has downed already.
   Indicates an internal oldmon error. Contact your CRI representative.

CPU n is not down
   This message is associated with the Status (s) or Up (u) command.

CPU n registers are unavailable and cannot be dumped
   Registers cannot be dumped unless the current CPU is down and idle.
   This message is associated with the Append (a) or Dump (d) command.

Exception condition: caught signal
   Refer to signal(2).

Exchange Package is not in the CPU's memory
   This message is associated with the Go (g) command.

File file is empty
   An empty file was specified when loading a diagnostic.


File file: system error message
   The oldmon monitor had an error while accessing, reading, or
   writing file.

Invalid input input
   The oldmon monitor received unexpected input.

The ioctl-request ioctl failed for cpu-device: errno n: system error message
   The oldmon monitor made the specified request to UNICOS and the
   request failed.

plock: errno n: system error message
   The oldmon monitor made a request to be locked in memory and the
   request failed.

Second address must be greater than first address
   This message is associated with the Append (a), Dump (d), Fill
   (f), or Write (w) command.

Single CPU system; cannot down a CPU.
   The oldmon monitor does not allow downing a CPU on a single CPU
   system.

Terminal type not set, cannot use screen mode
   The TERM environment variable was not set when oldmon was
   started.

Unable to configure loaded diagnostic
   This message is associated with the Load (l) command.

Unknown terminal terminal; cannot use screen mode
   terminal is not defined in the terminfo(4F) database.

Value exceeds parcel size
   This message is associated with the Enter (e) or Fill (f)
   command. value must not exceed 0'177777.


5.3

unitap

The unitap† test is an on-line magnetic tape test that allows you to
test up to 8 tape paths in parallel. It is supported in a standard
configuration. You can execute unitap interactively or from a UNICOS
shell script.†† Interactive execution is menu-driven, with a
240-character command buffer. From each menu, you can access all of the
other menus.

All user input and output is saved in a trace file for later evaluation.
To simulate passing and failing test execution examples without removing
the tape device from normal system operations, you can execute unitap
in Learn mode.

The unitap testing options are as follows:

Testing Option     Description

All tape tests     All of the tape tests (test sections) are
                   executed (run time: approximately 3 minutes).

Two-channel        A selection of tape tests are executed in
conflict tests     parallel to exercise 2 tape paths (run time:
                   approximately 10 minutes). The tests verify
                   whether the channels can withstand conflict.

Three-channel      A selection of tape tests are executed in
conflict tests     parallel to exercise 3 tape paths (run time:
                   approximately 10 minutes). The tests verify
                   whether the channels can withstand conflict.

Canned test        A user-selected test is executed (for example, a
                   byte counter test).

Test loop          A user-defined test is executed (refer to
                   subsection 5.3.4.6, Programming Tool).

For additional information, refer to subsection 5.3.3.3, Test Menu.

†   CX/CEA systems only.
††  Execution from a shell script is deferred.


In addition to providing error detection capabilities, unitap provides
the following troubleshooting tools:

Troubleshooting Tool     Description

Breakpoint               Sets breakpoints in the tape tests

Channel Commands†        Issues channel commands

Compare Data Buffer      Displays data miscomparisons for the write
                         and read data buffers

Display Memory           Displays the write and read central memory
                         data buffers, and allows you to modify the
                         write buffer

System Call History      Displays a history of the last 15 system
                         calls and the last 10 events that preceded
                         the current event. An event is defined
                         as any of the following actions:

                            A failure occurs
                            A breakpoint is reached

Programming              Builds test loops

Packet Status            Displays the status of the last packet sent
                         for each channel at the time of the last
                         10 events that preceded the current event

For additional information, refer to subsection 5.3.4, Debug Tools.

5.3.1

PROGRAM SYNOPSIS

You can execute unitap interactively or from a UNICOS shell script.††
This subsection describes how to execute unitap from a shell script.
For a description of interactive execution, refer to subsection 5.3.2,
Interactive Program Execution.

†   Deferred implementation.
††  Execution from a shell script is deferred.


5.3.2

INTERACTIVE PROGRAM EXECUTION

Interactive execution is menu-driven, with a 240-character command
buffer. From each menu, you can access all of the other menus.

Menu options can be entered in uppercase or lowercase.

5.3.3

PROGRAM MENUS

This subsection provides a summary of the unitap menu system. The
following menus are described.

•  Main menu
•  Variable menu
•  Test menu
•  Canned Test menu
•  Debug menu
•  Global Options menu
•  Hardware Layout menu


5.3.3.1

Main Menu

The Main Menu is displayed when unitap is initialized or when you enter
MR from any menu (refer to figure 5-27).

          unitap                 Main Menu

Option          Description

D               Debug Menu
T               Test Menu
V               Variable Menu

G               Global Options Menu
W               Program notes

EXIT            Exit the diagnostic
HELP option     Information on option

Note: these menu options are global (valid from all menus).

Figure 5-27.  Main Menu for unitap

The menu options are as follows:

Option          Description

D               Debug Menu (refer to subsection 5.3.3.5)

T               Test Menu (refer to subsection 5.3.3.3)

V               Variable Menu (refer to subsection 5.3.3.2)

G               Global Options Menu (refer to subsection 5.3.3.6)

W               Program notes

EXIT            Exit the diagnostic; channels dedicated to on-line
                diagnostic testing are released.

HELP option     Information on option

5-92

CRAY PROPRIETARY

SMM-1012 C

5.3.3.2

Variable Menu

The Variable Menu is displayed when you enter V from any menu (refer to
figure 5-28).

unitap                      Variable Menu

Path 1    CH=20, CO=0, DV=dv, DN=6250, PC=1

Option    Description

CH n      Channel number (20-33 octal)
CO n      Controller number (0-F hexadecimal)
DN n      Density value (800, 1600, or 6250, CART)
DV dv     Device number (0-FFF ASCII)
Pn        Path (1-8)
PC n      Pass count (decimal)
RL        Release the dedicated (reserved) path for the tape unit

G         Global Options Menu
R         Previous menu

Note: these menu options are global (valid from all menus).

Figure 5-28.  Variable Menu
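The option ranges shown in the Variable Menu, together with the defaults given in the option descriptions of this subsection, can be expressed as a small validity check. The following Python sketch is illustrative only: the function and constant names are hypothetical and are not part of unitap, and the returned string merely imitates the status line shown in the figure.

```python
# Sketch only: names are hypothetical; the ranges and defaults are the
# ones listed for the Variable Menu options in this subsection.
VALID_DENSITIES = ("800", "1600", "6250", "CART")


def default_channel(path):
    """Return the default channel for a path: O'20..O'27 for paths 1..8."""
    if not 1 <= path <= 8:
        raise ValueError("path must be in the range 1 through 8")
    return 0o20 + (path - 1)


def check_variables(path=1, channel=None, controller=0, density="6250"):
    """Validate one path definition and return it in status-line form."""
    fallback = default_channel(path)      # also validates the path number
    if channel is None:
        channel = fallback
    if not 0o20 <= channel <= 0o33:
        raise ValueError("channel must be in the range O'20 through O'33")
    if not 0x0 <= controller <= 0xF:
        raise ValueError("controller must be in the range 0 through F")
    if str(density) not in VALID_DENSITIES:
        raise ValueError("density must be 800, 1600, 6250, or CART")
    # Channel is shown in octal and controller in hexadecimal, as in the menu.
    return "CH={:o}, CO={:X}, DN={}".format(channel, controller, density)


if __name__ == "__main__":
    print(check_variables(path=3))
```

The device number (DV) is left out of the check because the manual describes it only as a site-defined ASCII value.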

Each option is briefly described in the Variable Menu. However, the
following descriptions provide further clarification:

Option    Description

CH n      Channel number. n is a value in the range O'20 through
          O'33. The default for n is O'20 through O'27, for paths
          1 through 8, respectively.

CO n      Controller number. n is a value in the range 0 through
          F (hexadecimal). The default for n is 0.

DN n      Density value. n is one of the following values: 800,
          1600, or 6250 (default), CART.

DV dv     Device number (required). dv is a site-defined ASCII
          value.

Pn        Path under test (channel, controller, and device). n is
          a value in the range 1 through 8. The default for n
          is 1.

PC n      Pass count. The default for n is 1.

RL        Release the dedicated path for the tape unit.

5.3.3.3

Test Menu

The Test Menu is displayed when you enter T from any menu (refer to
figure 5-29).

unitap                      Test Menu

Path 1    CH=20, CO=0, DV=dv, DN=6250, PC=1

Option    Description

A         Execute all the tape tests
C         Display the Canned Test Menu
2         Execute the two-channel conflict tests
3         Execute the three-channel conflict tests

G         Global Options Menu
R         Previous menu

Note: these menu options are global (valid from all menus).

Figure 5-29.  Test Menu

The menu options are as follows:

Option    Description

A         All tape tests. All of the tape tests are executed (run
          time: approximately 3 minutes).

2         Two-channel conflict tests. A selection of tape tests are
          executed in parallel to exercise 2 tape paths (run time:
          approximately 10 minutes). The tests verify whether the
          channels can withstand conflict.

3         Three-channel conflict tests. A selection of tape tests
          are executed in parallel to exercise 3 tape paths (run
          time: approximately 10 minutes). The tests verify
          whether the channels can withstand conflict.

C         Canned test. A user-selected test is executed (for
          example, a byte counter test).

5.3.3.4

Canned Test Menu

The Canned Test Menu is displayed when you enter C from any menu (refer
to figure 5-30).

unitap                      Canned Test Menu

Path 1    CH=20, CO=0, DV=dv, DN=6250, PC=1

Option    Description

AC        All basic commands tests (except Read)
BC        Byte counter test (transfers up to 4 kbytes)
BF        Buffer tests (R/W 64 bits)
BN        Next byte counter test (transfers 4 to 8 kbytes)
BS        Bus test (R/W 8 bits)
LA        Ladder tests
RB        Random buffer tests (R/W 64 random bits)
ST        Stress test
TP        Tape position commands tests

G         Global Options Menu
R         Previous menu

Figure 5-30.  Canned Test Menu

The menu options are as follows:

Option    Description

AC        All basic commands tests. Tests the rewind, write, write
          tape mark, forward block, backward block, forward tape
          mark, and backward tape mark tape movement commands.

BC        Byte counter test. Writes and reads 1, 2, 4, 8, 16, 32,
          64, 128, 256, 512, 1024, 2048, and 4096 bytes to the tape.

BF        Buffer tests. Writes and reads 64-bit patterns to the
          tape.

BN        Next byte counter test. Writes and reads 1 sector (4096
          bytes) plus 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024,
          2048, and 4096 bytes to the tape.

BS        Bus test. Writes and reads 8-bit patterns to the tape.

LA        Ladder tests. Writes and reads 1, 2, 3, 4, 5, 6, 7, and 8
          sectors to the tape.

RB        Random buffer tests. Writes and reads random data
          patterns to the tape.

ST        Stress test

TP        Tape position commands tests. Writes patterns to the
          tape, issues tape positioning commands, and then reads the
          patterns to verify that the positioning commands work.
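The transfer sizes used by the BC, BN, and LA canned tests follow a simple arithmetic pattern. The Python sketch below only reproduces the size sequences given in the option descriptions; the function names are hypothetical, and the 4096-byte sector size is taken from the BN description.

```python
SECTOR_BYTES = 4096  # the BN description equates 1 sector with 4096 bytes


def byte_counter_sizes():
    """BC: powers of two from 1 byte through 4096 bytes."""
    return [2 ** k for k in range(13)]


def next_byte_counter_sizes():
    """BN: one full sector plus each of the BC sizes."""
    return [SECTOR_BYTES + n for n in byte_counter_sizes()]


def ladder_sizes():
    """LA: 1 through 8 sectors, expressed in bytes."""
    return [sectors * SECTOR_BYTES for sectors in range(1, 9)]


if __name__ == "__main__":
    print("BC:", byte_counter_sizes())
    print("BN:", next_byte_counter_sizes())
    print("LA:", ladder_sizes())
```

The BN sizes therefore run from 4097 bytes to 8192 bytes, which agrees with the "4 to 8 kbytes" summary in the Canned Test Menu.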

5.3.3.5

Debug Menu

The Debug Menu is displayed when you enter D from any menu (refer to
figure 5-31).

unitap                      Debug Menu

Option    Description

B         Breakpoint Tool
CC†       Channel Commands Tool
CD        Compare Data Buffer Tool
E         Fail execution (Learn mode)
H         System Call History Tool
L         Learn mode/System mode (toggle)
LO        Hardware Layout Menu
M         Memory Tool (Central Memory)
PG        Programming Tool
S         Packet Status Tool
G         Global Options Menu
R         Previous menu

Note: these menu options are global (valid from all menus).

†  Deferred implementation

Figure 5-31.  Debug Menu

For additional information, refer to subsection 5.3.4, Debug Tools.

5.3.3.6

Global Options Menu

The Global Options Menu is displayed when you enter G from any menu
(refer to figure 5-32).

unitap                      Global Options Menu

Option    Description                   Option    Description

A         All confidence tests          M         Memory Tool
B         Breakpoint Tool               MN        Main menu
C         Canned Test Menu              PG        Programming Tool
CB n      Command buffer pass count     PC n      Pass count (decimal)
CC†       Channel Commands Tool         Pn        Path (1-8)
CD        Compare Data Buffer Tool      PT        Print screen
CH n      Channel number                RL        Release path
CO n      Controller number             RT        Return from breakpoint
D         Debug Menu                    S         Packet Status Tool
DN n      Density value                 T         Test Menu
DV n      Device number                 V         Variable Menu
E         Error mode (Learn mode)       W         Program notes
H         System Call History Tool      2         Two-channel conflict test
L         Learn mode/System mode        3         Three-channel conflict test
LO        Display layout

EXIT      Exit diagnostic               HELP option   Information on option
R         Previous menu

†  Deferred implementation

Figure 5-32.  Global Options Menu

5.3.3.7

Hardware Layout Menu

The Hardware Layout Menu is displayed when you enter LO from any menu
(refer to figure 5-33).

unitap                      Hardware Layout Menu

Option    Description

D         Debug Menu
BM        Block Multiplexer layout

Figure 5-33.  Hardware Layout Menu

The Block Multiplexer Layout Menu for a BMC-5 is displayed when you enter
BM from the Hardware Layout Menu (refer to figure 5-34).

unitap                      Block Multiplexer Layout Menu (BMC-5)

Option    Description

D         Debug Menu
BM        Block Multiplexer layout

Figure 5-34.  Block Multiplexer Layout Menu (BMC-5)

5.3.4

DEBUG TOOLS

The unitap debug tools can be selected from any menu. These tools are
as follows:

Tool                     Menu Option

Breakpoint               B
Channel Commands†        CC
Memory Buffer            M
Compare Data Buffer      CD
System Call History      H
Programming              PG
Packet Status            S

These tools are described in the subsections that follow.

†  Deferred implementation

5.3.4.1

Breakpoint Tool

The Breakpoint Tool is displayed when you enter B from any menu (refer
to figure 5-35). This tool allows you to set a breakpoint immediately
preceding or following a system call in a test. When the breakpoint is
reached, the user's keyboard input is executed.

If an error is detected, information relating to the event is
displayed. An event is defined as any of the following actions: a
failure occurs or a breakpoint is reached. Use the System Call History
and Packet Status tools to display additional information regarding an
event.

unitap                      Breakpoint Tool

Breakpoint = 0              Breakpoint pass count = 1

When breakpoint is reached, the user's keyboard input is executed.

message displayed on error
Event n occurred after y system calls.

Option    Description

BP n      Execute a breakpoint on pass n

BR n      Set or clear a breakpoint. n is one of the
          following breakpoint numbers:

          0 - Clear the breakpoint
          1 - Set breakpoint prior to the system call
          2 - Set breakpoint after the system call

RT        Return to test after a breakpoint (global option)

D         Debug Menu
G         Global Options Menu
R         Previous menu

Figure 5-35.  Breakpoint Tool

5.3.4.2

Channel Commands Tool

The Channel Commands Tool† is displayed when you enter CC from any
menu (refer to figure 5-36). This tool allows you to issue channel
commands to the tape device, and to display channel status. For
additional information on the channel commands, refer to the APML
Reference Card for COS and UNICOS, CRI publication SQ-0059.

unitap                      Channel Commands Tool

Path 1    CH=20, CO=0, DV=dv, DN=6250, PC=1

LMAR0        = 123456            Bus in  = 123001
LMAR1        = 123457            Tags in = 377
Byte counter = 1000              Flags   = IDLE

Command  Description             Command  Description

00       Clear chan control      11       Read byte counter register
01       Reset channel           12       Read bus and status
02       Send command            13       Read input tags
03       Read address            14 n     Write LMAR (n: accumulator value)
04       Single byte I/O         15 n     Write BC   (n: accumulator value)
05       Run diagnostics         16 n     Enter Addr (n: accumulator value)
10       Read LMAR               17 n     Write tags (n: accumulator value)

R  Previous menu
G  Global Options Menu

Figure 5-36.  Channel Commands Tool

†  Deferred implementation

5.3.4.3

Display Data Buffer Tool

The Display Data Buffer Tool is displayed when you enter M from any
menu (refer to figure 5-37). This tool allows you to display the read
and write data buffers, and to modify the write data buffer. Each data
buffer is 16 Kwords.

unitap                      Display Data Buffer Tool

message displayed on error

Write Address = 0                      Read Address = 0
0 000000 000000 020124 044145          0 000000 000000 000000 000000
1 000000 000000 070565 064553          1 000000 000000 000022 153207
2 000000 000000 041162 067556          2 000000 000000 000045 126416
3 000000 000000 021106 067570          3 000000 000000 000070 101625
4 000000 000000 045155 070144          4 000000 000000 000113 055034
5 000000 000000 000000 170435          5 000000 000000 000136 030243
6 000000 000000 000001 020526          6 000000 000000 000161 003452
7 000000 000000 000001 050617          7 000000 000000 000203 156661

Option         Description

DA n           Display address
DF DB DP DW    Display Forward or Back in Parcel or Word format
DI DO DD DX    Display in Ascii, Octal, Decimal, Hex
ST SS SP SK    Store adr data, Store Seeded random, Store Pattern,
               Store Skip
CP LP LN       Copy a block of data, Locate Pattern, Locate a
               non-pattern

Figure 5-37.  Display Data Buffer Tool (1 of 2)

unitap                      Display Data Buffer Tool

Command             Description

CP addr1 addr2 n    Copy n words from addr1 to addr2
LP addr pattern     Search for pattern starting at addr
SK addr data n y    Store data in n words (skip y words between stores),
                    starting at addr
SP addr data n      Store data consecutively in n words, starting at addr
SS addr seed n      Store random data consecutively in n words, starting at
                    addr, using seed to start the random number generator
ST addr data        Store data in addr

D/DR/DL addr        Display full/right/left screen starting at addr
Dx/DRx/DLx          Display x: F (forward), B (backward), A (ASCII), O (octal),
                    D (decimal), X (hexadecimal), P (parcel), W (word)

Figure 5-37.  Display Data Buffer Tool (2 of 2)
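The store, copy, and locate commands of the Display Data Buffer Tool can be modelled on an ordinary list of words. The Python sketch below is an illustration under stated assumptions, not the unitap implementation: the function names simply mirror the command mnemonics, words are assumed to be 64 bits wide, "skip y words between stores" is read as a stride of y + 1, and Python's random module stands in for whatever generator the diagnostic actually uses.

```python
import random

BUFFER_WORDS = 16 * 1024        # each data buffer is 16 Kwords
WORD_MASK = (1 << 64) - 1       # assumption: 64-bit words


def st(buf, addr, data):
    """ST addr data: store data in addr."""
    buf[addr] = data & WORD_MASK


def sp(buf, addr, data, n):
    """SP addr data n: store data consecutively in n words."""
    for i in range(n):
        buf[addr + i] = data & WORD_MASK


def sk(buf, addr, data, n, y):
    """SK addr data n y: store data in n words, skipping y words between stores."""
    for i in range(n):
        buf[addr + i * (y + 1)] = data & WORD_MASK


def ss(buf, addr, seed, n):
    """SS addr seed n: store seeded random data consecutively in n words."""
    rng = random.Random(seed)   # stand-in generator, reproducible per seed
    for i in range(n):
        buf[addr + i] = rng.getrandbits(64)


def cp(buf, addr1, addr2, n):
    """CP addr1 addr2 n: copy n words from addr1 to addr2."""
    block = buf[addr1:addr1 + n]        # snapshot, so overlapping copies are safe
    for i, word in enumerate(block):
        buf[addr2 + i] = word


def lp(buf, addr, pattern):
    """LP addr pattern: return the first address at or after addr holding pattern."""
    for i in range(addr, len(buf)):
        if buf[i] == pattern:
            return i
    return None


if __name__ == "__main__":
    write_buffer = [0] * BUFFER_WORDS
    sk(write_buffer, 0, 0o7, 3, 2)      # stores at addresses 0, 3, and 6
    print(write_buffer[:7], lp(write_buffer, 1, 0o7))
```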

5.3.4.4

Compare Data Tool

The Compare Data Tool is displayed when you enter CD from any menu
(refer to figure 5-38). This tool allows you to display the read and
write data buffers, and exclusive ORs (logical differences) for the Write
and Read address comparisons. Each data buffer is 16 Kwords.

unitap                      Compare Data Tool

The Read compare grid is the Exclusive OR (or logical difference) of
the data at the Write grid address and the data at the Read grid
address.

Write Address = 0                      READ COMPARE Address = 0
0 000000 000000 020124 044145          0   20124 044145
1 000000 000000 070565 064553          1   70547 137754
2 000000 000000 041162 067556          2   41127 141140
3 000000 000000 021106 067570          3   21176 166355
4 000000 000000 045155 070144          4   45046 025170
5 000000 000000 000000 170435          5     136 140676
6 000000 000000 000001 020526          6     160 023174
7 000000 000000 000001 050617          7     202 106076

------------------------------------------------------------------------
Display  Forw,Back,Oct,Dec,Hex,Parc,Word, Display Address, Locate Error
Enter    DF   DB   DO  DD  DX  DP   DW    DA               LE

Figure 5-38.  Compare Data Tool
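The read compare grid can be reproduced with a word-by-word exclusive OR. The following Python sketch is illustrative only: the function names are hypothetical, the 16-bit parcel split of a 64-bit word is an assumption based on the four-column display, and the Locate Error behavior shown (first address with a nonzero difference) is an assumed reading of the LE option. The sample values are word 1 of the write and read buffers in figure 5-37.

```python
def join_parcels(parcels):
    """Pack four 16-bit parcels (most significant first) into one word."""
    word = 0
    for parcel in parcels:
        word = (word << 16) | (parcel & 0xFFFF)
    return word


def split_parcels(word):
    """Split a 64-bit word into four 16-bit parcels, most significant first."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def compare_grid(write_words, read_words):
    """Exclusive OR (logical difference) of corresponding words."""
    return [w ^ r for w, r in zip(write_words, read_words)]


def locate_error(write_words, read_words, start=0):
    """Assumed LE behavior: first address at or after start that differs."""
    for addr in range(start, min(len(write_words), len(read_words))):
        if write_words[addr] ^ read_words[addr]:
            return addr
    return None


if __name__ == "__main__":
    write_word = join_parcels([0, 0, 0o070565, 0o064553])
    read_word = join_parcels([0, 0, 0o000022, 0o153207])
    diff = compare_grid([write_word], [read_word])[0]
    print(["{:06o}".format(p) for p in split_parcels(diff)])
```

The difference parcels printed for this word are 070547 and 137754, the same values that appear in row 1 of the read compare grid in figure 5-38.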

5.3.4.5

System Call History Tool

The System Call History Tool is displayed when you enter H from any
menu (refer to figure 5-39). This tool allows you to display a history
of the last 15 system calls (commands) and the last 10 events that
preceded the current event. An event is defined as any of the
following actions: a failure occurs or a breakpoint is reached.

unitap                      System Call History Tool

Event # 1 was on PATH 1 in the LMAR Test at label L11002 pattern=40
The diagnostic wrote 40 to the LMAR and read back 44445

Seq   Path  Chan  Cont  Dev  CMD    Sec  Blk  B Adr  Flg  ACC    Label  Pattern

14    1     20    0     0    RLMAR  0    0    0      0    10     11001  10
13    3     22    2     0    F BK   0    0    0      0    0      27008  0
12    2     21    1     0    W BUS  0    0    0      0    2      15000  2
11    1     20    0     0    WLMAR  0    0    0      0    20     11000  20
10    3     22    2     0    BK BK  0    0    0      0    0      27009  0
9     2     21    1     0    W TAG  0    0    0      0    2000   15001  2
8     1     20    0     0    RLMAR  0    0    0      0    20     11001  20
7     3     22    2     0    F BK   0    0    0      0    0      27010  0
6     2     21    1     0    W TAG  0    0    0      0    2000   15002  2
5     1     20    0     0    WLMAR  0    0    0      0    40     11000  40
4     3     22    2     0    BK BK  0    0    0      0    0      27011  0
3     2     21    1     0    R BUS  0    0    0      0    2000   15003  2
2     1     20    0     0    RLMAR  0    0    0      0    40     11001  40
1     3     22    2     0    W TAG  0    0    0      0    0      21000  0
LAST  2     21    1     0    W BUS  0    0    0      0    3      15000  3

------------------------------------------------------------------------
Option    Description                   Option    Description

D         Debug Menu                    N or P    Previous or next event
G         Global Options Menu           S         Status tool

Figure 5-39.  System Call History Tool
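A fixed-depth history such as the one this tool displays (the last 15 system calls and the last 10 events) is commonly kept in bounded queues. The Python sketch below shows one way such a history could be modelled; the class, its methods, and the idea of attaching a snapshot of the preceding calls to each event are assumptions made for illustration, not a description of how unitap stores its data.

```python
from collections import deque

CALL_HISTORY = 15    # system calls kept, per the tool description
EVENT_HISTORY = 10   # events kept, per the tool description


class History:
    """Hypothetical bounded history of system calls and events."""

    def __init__(self):
        self.calls = deque(maxlen=CALL_HISTORY)
        self.events = deque(maxlen=EVENT_HISTORY)

    def record_call(self, path, cmd, label):
        # The oldest call is discarded automatically once 15 are held.
        self.calls.append((path, cmd, label))

    def record_event(self, kind, description):
        # An event is a failure or a breakpoint; keep the calls that led to it.
        self.events.append({
            "kind": kind,
            "description": description,
            "calls": list(self.calls),
        })


if __name__ == "__main__":
    history = History()
    for seq in range(20):
        history.record_call(path=1, cmd="RLMAR", label=seq)
    history.record_event("failure", "LMAR Test, pattern=40")
    print(len(history.calls), len(history.events[-1]["calls"]))
```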

5.3.4.6

Programming Tool

The Programming Tool is displayed when you enter PG from any menu
(refer to figure 5-40). This tool allows you to define a test loop with
up to 32 steps and up to 8 channels performing read, write, rewind, and
compare operations.

unitap                      Programming Tool

Path 1    CH=20, CO=0, DV=dv, DN=6250, PC=1

STEP  PATH  DEV  COMMAND  SECT  BLOCKS  BYTES  BUF ADR  FLAGS  JUMP TO STEP

1     1     20   WRITE    5     1       0      1234     1357   0
2     2     21   REWIND   0     0       0      0        0      0
3     1     20   REWIND   0     0       0      0        0      0
4     2     21   READ     3     2       0      7010     0      0
5     1     20   READ     0     0       0      11000    0      0
6     2     21   FORW TM  2     0       0      0        0      2
7     0     0             0     0       0      0        0      0
8     0     0             0     0       0      0        0      0

Option    Description                   Option    Description

BA n      Buffer address                JP n      Jump to step n
BK n      Number of blocks              PPn       Path (1-8)
BY n      Number of bytes               SC n      Number of sectors
CM n      Tape/channel command          ST n      Step (1-32)
FG n      Flag settings

DF/DB     Scroll forward/backward       HELP option   Information on option
G         Display global options        RUN           Run test for n passes (PC n)

Now loading step number n

Figure 5-40.  Programming Tool

5.3.4.7

Packet Status Tool

The Packet Status Tool is displayed when you enter S from any menu
(refer to figure 5-41). This tool allows you to display the status of
the last packet sent for each channel at the time of the last 10 events
that preceded the current event. An event may be either of the
following actions: a failure occurs or a breakpoint is reached.

unitap                      Packet Status Tool

Path 1    CH=20, CO=0, DV=dv, DN=6250, PC=1

Path 1 was in the LMAR Test at label L11002 pattern=40.
Event # 1 was on PATH 1 in the LMAR Test at label L11002 pattern=40.
The diagnostic wrote 40 to the LMAR and read back 44445

                             Last DFT    Last DFT Reply

Requested Sector Count   =   0           0
Requested Block Count    =   0           0
Data buffer address      =   0           0
Accumulator              =   40          44445
Function                 =   RLMAR       RLMAR
Diagnostic Flags         =   0           0
DFT packet Status flag   =               DONE
DFT packet Status code   =               0

Option    Description

G         Global Options Menu
H         System Call History Tool
P or N    Previous or next event, respectively
Pn        Status for path (1-8)
R         Previous menu

Figure 5-41.  Packet Status Tool

5.3.5

TRACE FILE

All user input and output is saved in a trace file for later evaluation.

5.3.6

LEARN MODE

To simulate passing test execution examples without removing the tape
device from normal system operations, you can execute unitap in Learn
mode. To enter Learn mode, enter L from any menu; to return to normal
system operations (system mode), enter L again.
When you execute in Learn mode, the mode is indicated at the top of all
the menus.

5.3.7

PROGRAM EXAMPLES

This subsection contains unitap execution examples.

The following example runs all of the unitap tests on device 00 and
then exits the program.

     unitap dv 00 a exit

The following example runs the two-channel conflict tests on devices 00
and 01, and then exits the program.

     unitap dv 00 p2 dv 01 2 exit

5.3.8

PROGRAM MESSAGES

The following subsections contain the unitap messages:

•  Messages with menu displays
•  Messages without menu displays

The messages are listed alphabetically in each subsection.

5.3.8.1

Messages with menu displays

The messages are listed alphabetically in this subsection.

BREAKPOINT PROCESSED

Option    Description                          Option    Description

C         Canned Test Menu                     0         Rerun test
D         Debug Menu                           PG        Programming Tool
G         Global Options Menu                  R         Previous menu
H         System Call History Tool             S         Packet Status Tool
MN        Main Menu                            T         Test Menu
N         Continue testing with next pattern   V         Variable Menu

TEST FAILED

Path 1    CH=20, CO=0, DV=dv, DN=6250

3-channel conflict tests were executing on pass 1 at label L4
Event # 1 was flagged in the diagnostic at label DL11002
Path 2 was in the Bus test at label L15004 variable=2
Path 3 was in the Tag-Loopback test at label L21001 variable=0
The error was on Path 1 in the LMAR Test at label L11002 variable=40
The diagnostic wrote 40 to the LMAR and read 44445

Option    Description                          Option    Description

C         Canned Test Menu                     N         Continue testing with next pattern
D         Debug Menu                           0         Rerun test
G         Global Options Menu                  S         Packet Status Tool
H         System Call History Tool             T         Test Menu

F         Loop on failing pattern until next error or pass count is reached
X         Loop on failing pattern until abort (press the ESC-A keys)

5.3.8.2

Messages without menu displays

The messages are listed alphabetically in this subsection.

Invalid entry: n
Range: n through n (radix)
Enter a valid value to continue
or an asterisk (*) to abort.

     The value entered is invalid. Enter a valid value.

Test passed: test

     The test completed successfully.

6.

I/O SUBSYSTEM DEADSTART PROGRAMS

This section describes the following I/O Subsystem (IOS) deadstart
programs:

Program    Description

cleario    IOS deadstart utility. The cleario utility attempts to
           clear the IOS if the deadstart procedure fails.

dsdiag     IOS deadstart diagnostic control program. The dsdiag
           program allows the system operator to run deadstart
           diagnostics from tape or disk.

6.1

SYSTEM CONFIGURATION

The file aptext contains the system text, including the configuration
information for the IOS deadstart programs. The following system
components are defined during system configuration:

•  Optional I/O processors (IOP-2 and IOP-3)

•  IOS type (model A, B, C, or D)

•  High-speed channel connections to central memory and the SSD
   solid-state storage device

•  Low-speed channel connection from IOP-0 to the CPU

•  Console channels

•  Central memory size

•  Buffer memory size

•  SSD memory size

For information on the IOS installation parameters, refer to the I/O
Subsystem (IOS) Administrator's Guide, CRI publication SG-0307.

6.2

cleario

If the IOS deadstart procedure fails, the system operator can execute
cleario from tape or disk in an attempt to clear the IOS. For
information on the IOS deadstart procedure, refer to one of the following
CRI publications, as appropriate to your configuration:

     SG-2005    I/O Subsystem (IOS) Operator's Guide for UNICOS
     SN-3030    Operator Workstation (OWS) Guide

IOP-0 must be minimally operational to execute the tape, disk, or OWS
bootstrap routine (TAPELOAD, DISKLOAD, or VMELOAD, respectively) and
cleario.

6.2.1

PROGRAM EXECUTION

The cleario program does the following:

•  Disables all interrupts
•  Clears all of the IOS channels
•  Zeros the following:
   -  The exit stack, the operand registers, and local memory in
      each IOP
   -  Buffer memory
   -  The last 64 words of central memory

Use the following procedure to execute cleario:

1.   Mount the deadstart tape or disk at the operator's station.

2.   Set the IOS maintenance panel toggle switches, as follows:

                          Switch Setting
     Tape/Disk Unit       Octal     Binary

     Tape                 22        010 010
     Ampex disk           60        110 000
     CDC disk             27        010 111

                              NOTE

     If the IOS maintenance panel has a 'maintenance mode'
     switch, set the switch to the 'on' position. When
     cleario is completed (successfully or unsuccessfully),
     return the switch to the 'off' position.

3.   Press the IOP-0 MC button (or the MASTER CLEAR button on a
     CRAY-1 A computer system) and the DEADSTART button on the Power
     Distribution Unit or IOS chassis maintenance panel (as
     appropriate for your site).

4.   Respond to one of the following prompts (for tape or
     disk, respectively) at the IOP-0 Kernel console:

          FILE @MTO:
     or
          FILE @DKO:

                              NOTE

     The FILE @MTO prompt is not displayed unless a tape is
     mounted at the operator's station.

     In response to the tape prompt, enter the number of the tape file
     containing cleario and press RETURN. If a tape is written
     using standard Cray generation procedures, file 7 contains
     cleario.

     In response to the disk prompt, enter the name of the directory
     and file containing cleario (dir/cleario) and press RETURN.
5.   If cleario completes successfully, the following message is
     displayed at the IOP-0 Kernel console:

          CLEARIO COMPLETE

     The operating system bootstrap program is reloaded and one of the
     following prompts (for tape or disk, respectively) is displayed:

          FILE @MTO:
     or
          FILE @DKO:

     Proceed with the IOS deadstart procedure. For information on the
     IOS deadstart procedure, refer to one of the following CRI
     publications, as appropriate to your configuration:

          SG-2005    I/O Subsystem (IOS) Operator's Guide for UNICOS
          SN-3030    Operator Workstation (OWS) Guide

6.   If either of the following conditions occurs, run the IOS
     deadstart tests to determine if an IOS hardware malfunction
     exists:

     -  cleario does not complete successfully (the message
        'CLEARIO TERMINATED' is displayed or there is no response
        within one minute).

     -  The IOS deadstart procedure continues to fail after
        cleario completes execution.
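The octal and binary columns of the switch-setting table in step 2 describe the same six toggle positions. As a quick cross-check, the Python sketch below converts each octal setting to its two 3-bit groups; the dictionary and function names are hypothetical and serve only to illustrate the conversion.

```python
# Octal switch settings from the maintenance panel table in step 2.
SWITCH_SETTINGS = {
    "Tape": 0o22,
    "Ampex disk": 0o60,
    "CDC disk": 0o27,
}


def switch_bits(value):
    """Format a 6-bit switch value as two groups of three binary digits."""
    bits = format(value, "06b")
    return bits[:3] + " " + bits[3:]


if __name__ == "__main__":
    for unit, value in SWITCH_SETTINGS.items():
        print("{:<12}{:o}   {}".format(unit, value, switch_bits(value)))
```

Each octal digit corresponds to one group of three switches, which is why the table lists the binary value in two 3-bit groups.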

6.2.2

PROGRAM MESSAGES

The cleario program generates the following types of messages:

•  Informative
•  Error

6.2.2.1

Informative messages

The following informative messages are displayed at the IOP-0 Kernel
console:

CLEARIO COMPLETE

     cleario completed successfully.

TAPE NOT READY

     This message is displayed until the tape is ready for use.

6.2.2.2

Error messages

The following error messages are displayed at the IOP-0 Kernel console.
Unless otherwise indicated, use the IOS deadstart tests to do further
error isolation.

CLEARIO TERMINATED

     An error in one of the IOPs prevented cleario from executing
     successfully. Check the error logger for errors and run the
     dsdiag program for more information on the failure.

BUFFER MEMORY TIMEOUT

     A Done flag is not set on the buffer memory channel. Check the
     error logger for errors and run the dsdiag program for more
     information on the failure.

BUFFER MEMORY ERROR

     A Busy flag is set on the buffer memory channel. Check the error
     logger for errors and run the dsdiag program for more
     information on the failure.

device ERROR, STATUS=status

     A device error occurred while the overlay was being loaded.
     device can be TAPE or DISK. status is the controller status
     for the deadstart device. Select a different device and deadstart
     the IOS. If no other device is available or the failure
     continues, use off-line diagnostics to isolate the error.

TAPE ERROR, STATUS=status AFTER REWIND

     A tape error occurred after the overlay was loaded. status is
     the controller status for the tape device. Use a disk device and
     deadstart the IOS. If a disk device is unavailable or the failure
     continues, use off-line diagnostics to isolate the error.

6.3

dsdiag

The dsdiag program is the deadstart diagnostic control program that
allows the system operator to run deadstart tests from tape or disk.
The dsdiag program does the following:

1.   Executes a series of basic IOP-0 tests

2.   Loads and executes subsequent IOS tests from a diagnostic overlay
     file

6.3.1

PROGRAM EXECUTION

Prior to loading the IOS Kernel, the system operator can run deadstart
diagnostics from tape or disk by loading and executing the deadstart
diagnostic control program, dsdiag. IOP-0 must be minimally
operational to execute the tape, disk, or OWS bootstrap routine
(TAPELOAD, DISKLOAD, or VMELOAD, respectively) and dsdiag.

Use the following procedure to execute the IOS deadstart diagnostics:

1.   Mount the deadstart tape or disk at the operator's station.

2.   Set the IOS maintenance panel toggle switches, as follows:

                          Switch Setting
     Tape/Disk Unit       Octal     Binary

     Tape                 22        010 010
     Ampex disk           60        110 000
     CDC disk             27        010 111

3.   Press the IOP-0 MC button (or the MASTER CLEAR button on a
     CRAY-1 A computer system) and the DEADSTART button on the Power
     Distribution Unit or IOS chassis maintenance panel (as
     appropriate for your site).

4.   Respond to one of the following prompts (for tape or disk,
     respectively) at the IOP-0 Kernel console:

          FILE @MTO:
     or
          FILE @DKO:

                              NOTE

     The FILE @MTO prompt is not displayed unless a tape is
     mounted at the operator's station.

     In response to the tape prompt, enter the number of the tape file
     containing dsdiag and press RETURN. If a tape is written using
     standard Cray generation procedures, file 8 contains dsdiag.

     In response to the disk prompt, enter the name of the directory
     and file containing dsdiag (dir/dsdiag) and press RETURN.

     Pass/fail status messages are displayed at the IOP-0 Kernel
     console during test execution.
5.   If the diagnostic tests complete successfully, the following
     message is displayed:

          DIAGNOSTICS COMPLETE

     The operating system bootstrap program is reloaded and one of the
     following prompts (for tape or disk, respectively) is redisplayed
     at the IOP-0 Kernel console:

          FILE @MTO:
     or
          FILE @DKO:

     Proceed with the IOS deadstart procedure. For information on the
     IOS deadstart procedure, refer to one of the following CRI
     publications, as appropriate to your configuration:

          SG-2005    I/O Subsystem (IOS) Operator's Guide for UNICOS
          SN-3030    Operator Workstation (OWS) Guide

6.   If a diagnostic test detects a failure, the message 'DIAGNOSTICS
     TERMINATED' is displayed at the IOP-0 Kernel console or there is
     no response within one minute. The system operator should report
     failures to a CRI field engineer.

6.3.1.1

IOP-0 tests

Although IOP-0 must be minimally operational to perform deadstart
operations, it can still contain faults. Therefore, dsdiag tests IOP-0
before loading the deadstart tests from an overlay file. If the IOP-0
diagnostics do not execute successfully, use off-line diagnostics to do
further testing.

The IOP-0 tests exercise the following areas, in the order shown:

1.   Instruction buffers
2.   Exit stack
3.   Operand registers
4.   Local memory
5.   Real-time clock

The test procedure is as follows:

Logic Tested    Test Procedure

Instruction     Forces 1's and 0's through each buffer location to
buffers         detect dropped and picked bits, and adder faults.

                If a failure is detected, the test does not issue an
                error message; instead, it loops at the point of
                failure. Use off-line diagnostics to do further
                testing.

                Instruction buffer addressing is not tested. However,
                a fault in this area is likely to prevent dsdiag from
                loading. If no messages are displayed at the IOP-0
                Kernel console within a few seconds of loading, a
                failure exists. You can scope the IOP-0 P register
                before using off-line diagnostics to do further
                testing.

Logic Tested    Test Procedure

Exit stack†     Checks for basic addressing and data faults in each
                stack location. Using I/O instructions for access,
                the test detects all single-stuck addressing and
                data faults, and all coupled-data bit faults. It
                also tests return jumps and exits at all stack
                depths.

Operand         Checks for basic faults in all of the registers
registers†      except 0 and 1, which are used to run the test
                algorithm. The test detects all single-stuck
                addressing and data faults, and all coupled-data bit
                faults.

Local memory    Tests the area of local memory between the end of
                dsdiag and the highest local memory address. The
                test uses an algorithm with a parcel-oriented,
                ascending and descending, marching 1's and 0's
                pattern to detect all single-stuck addressing and
                data faults, and all coupled-data bit faults.

Real-time       Tests the real-time clock to ensure that an
clock           interrupt occurs approximately once every
                millisecond.
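The local memory entry above refers to an ascending and descending marching 1's and 0's pattern. The manual does not give the exact sequence that dsdiag uses, so the Python sketch below shows only one common form of a march test, run against a simulated parcel memory. The class, the function, the fault model (a single bit stuck at 0), and the address range are all assumptions chosen for illustration.

```python
PARCEL_MASK = 0xFFFF  # assumption: 16-bit parcels


class SimulatedMemory:
    """Toy parcel memory with an optional stuck-at-0 bit at one address."""

    def __init__(self, size, stuck_addr=None, stuck_bit=0):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_bit = stuck_bit

    def write(self, addr, value):
        self.cells[addr] = value & PARCEL_MASK

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.stuck_addr:
            value &= ~(1 << self.stuck_bit)   # the faulty bit always reads 0
        return value


def march_test(memory, start, end):
    """Return the first failing address in [start, end), or None on a pass."""
    zeros, ones = 0, PARCEL_MASK
    for addr in range(start, end):                # background of 0's
        memory.write(addr, zeros)
    for addr in range(start, end):                # ascending: read 0, write 1
        if memory.read(addr) != zeros:
            return addr
        memory.write(addr, ones)
    for addr in range(end - 1, start - 1, -1):    # descending: read 1, write 0
        if memory.read(addr) != ones:
            return addr
        memory.write(addr, zeros)
    for addr in range(start, end):                # final check of 0's
        if memory.read(addr) != zeros:
            return addr
    return None


if __name__ == "__main__":
    good = SimulatedMemory(64)
    bad = SimulatedMemory(64, stuck_addr=5, stuck_bit=3)
    # start=2 stands in for "the end of dsdiag"; the real boundary is not given.
    print(march_test(good, 2, 64), march_test(bad, 2, 64))
```

Unlike this sketch, which returns the failing address, the instruction buffer test described earlier loops at the point of failure rather than reporting it.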

When all of the IOP-0 tests complete successfully, the following message
is displayed at the IOP-0 Kernel console (it is not required that the
real-time clock test complete successfully):

     IOP-0 KERNEL PASSED

The dsdiag program then loads and executes the deadstart tests
contained in an overlay file.

If any one of the IOP-0 tests does not complete successfully (excluding
the real-time clock test), dsdiag does not execute any subsequent
diagnostics. An error message is displayed if a test fails (with the
exception of the instruction buffer test, which loops at the point of
failure instead of issuing an error message). The dsdiag program
automatically attempts to reload the deadstart bootstrap program,
TAPELOAD, DISKLOAD, or VMELOAD. If the attempt is unsuccessful, dsdiag
halts and you can use off-line diagnostics to isolate the fault.

For a list of messages, refer to subsection 6.3.2, Program Messages.

†  The test uses a variant of the Milner fast memory test algorithm (EDN,
   28, 21; Oct 13, 1983). The Milner algorithm detects dropped and
   picked bits in address data, and coupled-data bit faults. The
   algorithm uses a rotating single-bit pattern to ensure that only one
   bit is changed in each memory chip at each step.

6.3.1.2

I/O Subsystem tests

If all of the IOP-0 tests complete successfully (excluding the real-time
clock test), dsdiag loads and executes subsequent IOS tests from a
diagnostic overlay file. The tests are executed in the following order:

Test        Description

dsmos16k    Test of the lower 16 Kwords of buffer memory from IOP-0
            only

dsiom       Local memory addressing and data test for each IOP
            except IOP-0

dsiop       Instruction test for each IOP

dsmos       Buffer memory addressing and data path test for each IOP

dshsp       High-speed channel test from an IOP to central memory or
            to an SSD solid-state storage device

dslsp       Low-speed channel test from IOP-0 to central memory

dsmos16k - This program tests addressing and data in the first 16384
words of buffer memory from IOP-0 only. This area of buffer memory is
used to load an IOP. Therefore, dsmos16k must complete successfully
before tests can be executed in IOP-1, IOP-2, or IOP-3.

The dsmos16k program consists of the following test sections:

1.   Address and data test
2.   Block length test

The dsmos16k test sections are as follows:

Section    Description

1          Address and data test. This section uses an algorithm
           with a word-oriented, ascending and descending, marching
           1's and 0's pattern to test the lower 16 Kwords of buffer
           memory. The block length is 1.

2          Block length test. This section tests block length bits
           1 through 13 (that is, block lengths 2^1 through 2^13).
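Because each block length bit selects a power-of-two length, the lengths exercised by the block length test can be listed directly. The short Python sketch below is only an arithmetic illustration; the function name is hypothetical.

```python
BUFFER_WORDS_TESTED = 16 * 1024   # dsmos16k covers the lower 16 Kwords


def block_lengths(first_bit=1, last_bit=13):
    """Block lengths selected by block length bits first_bit..last_bit."""
    return [1 << bit for bit in range(first_bit, last_bit + 1)]


if __name__ == "__main__":
    lengths = block_lengths()
    print(lengths[0], lengths[-1], len(lengths))
```

The largest length, 8192 words, is half of the 16-Kword area that dsmos16k covers.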

If dsmos16k completes successfully, the following message is displayed:
MOS-16K PASSED
The test completed successfully.
For a list of messages, refer to subsection 6.3.2, Program Messages.


dsiom - This program tests local memory addressing and data for each
IOP except IOP-0. The test detects basic faults that would inhibit the
proper loading of diagnostics into an IOP.

The dsiom program consists of the following test sections:

1.  All 0's test
2.  All 1's test
3.  Address pattern test
4.  All 0's test

The test uses deadstart and dead dump procedures to load and dump data
patterns. In the IOP being tested, no code is executed except a jump to
P + 0 at address 0. The jump is required to prevent the IOP from
executing after a deadstart. (In each of the dsiom test sections,
address 0 contains O'7000.)
The dsiom test sections are as follows:

Section     Description

1           All 0's test. The test data is all 0's. The background
            data is all 1's.

2           All 1's test. The test data is all 1's. The background
            data is all 0's.

3           Address pattern test. The test data for each parcel
            (except parcel 0) is the parcel address. The background
            data is all 0's.

4           All 0's test. This section is the same as section 1.
            Section 4 is run so that local memory is reset to all 0's
            at the end of the test.

Each section uses the upper half of IOP-0 and the lower 16 Kwords of buffer
memory as data buffers.
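The address pattern section can be pictured with a short Python sketch: build the image that would be loaded, then compare it with what comes back from a dead dump. This is only an illustration. The names `address_pattern_image` and `compare_dump` are hypothetical, the local memory size of 2^16 parcels is an assumption, and the octal value 7000 at parcel 0 is the one the text gives for address 0.

```python
JUMP_TO_SELF = 0o7000  # contents of address 0 in every dsiom section (per the text)
PARCELS = 1 << 16      # assumed local memory size in parcels


def address_pattern_image(parcels=PARCELS):
    """Build a section 3 style image: each parcel holds its own address."""
    image = list(range(parcels))
    image[0] = JUMP_TO_SELF  # parcel 0 is the exception: it holds the jump
    return image


def compare_dump(expected, dumped):
    """Yield (lma, exp, act) for every parcel that differs after the dump."""
    for lma, (exp, act) in enumerate(zip(expected, dumped)):
        if exp != act:
            yield lma, exp, act
```

Each tuple produced by `compare_dump` carries the same kind of information as the LMA, EXP, and ACT fields of the dsiom data compare message.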
If dsiom completes successfully, the following message is displayed at
the IOP-0 Kernel console:

IOP-n IOM PASSED
The test completed successfully in IOP-n.

For a list of messages, refer to subsection 6.3.2, Program Messages.

dsiop - This program tests instructions and registers in IOP-1, IOP-2,
and IOP-3. Part of test section 1, basic instructions and registers
test, executes in all of the IOPs, including IOP-0.


The dsiop program consists of the following test sections:

1.  Basic instructions and registers test
2.  Jump instructions test
3.  Operand registers test

The dsiop test sections are as follows:

Section     Description

1           Basic instructions and registers test. Testing starts
            with the simplest instructions and data paths and becomes
            increasingly complex. The following IOP components are
            tested:

            1.  Registers A, B, and C
            2.  Instructions in the range 4 through 67 (octal)
            3.  Add and shift networks
            4.  Operand registers 0 through 20, 40, 100, 200, 400,
                and 777 (octal)
            5.  Local memory addressing
            6.  I/O instructions on channels 0 through 5
            7.  E register and exit stack location 0
            8.  Interprocessor channels to IOP-0

            In IOP-0, only areas 1, 2, and 3 are tested; testing in
            the other areas would conflict with resident code. IOP-0
            must be minimally operational to execute dsdiag.
            Therefore, this test is run in IOP-0 only to ensure that
            the basic instructions and the add/shift network are
            tested completely.

            There are no jumps in this test except a jump to P + 0,
            which is executed when a fault is detected, causing the
            test to loop at the point of failure.

2           Jump instructions test. This section is not run in
            IOP-0. The following areas of the IOP are tested:

            1.  Jump instructions 070 through 137
            2.  Exit instruction 001
            3.  Operand registers 0 and 1
            4.  Exit stack data and addressing

3           Operand registers test. This section is not run in
            IOP-0. This test section contains two subsections, as
            follows:

            Subsection        Description

            Systematic data   Performs a comprehensive test of operand
                              register addressing and data.† The
                              test detects all single-stuck faults in
                              addressing or data, and all coupled
                              data-bit faults.

            Random data       Uses random data patterns to test
                              registers 20 through 777 (octal). The
                              test detects pattern-sensitive faults,
                              which normally cannot be detected by
                              systematic data. New data patterns are
                              used each time the test is run.

If test section 1 (basic instructions and registers test) completes
successfully, the following message is displayed at the IOP-0 Kernel
console:

IOP-n BASIC PASSED

If test section 2 (jump instructions test) completes successfully, the
exit stack is reset to all 0's and the following message is displayed at
the Kernel consoles of IOP-0 and the IOP being tested:

IOP-n JUMPS PASSED

If test section 3 (operand registers test) completes successfully, the
operand registers are reset to all 0's and the following message is
displayed at the Kernel consoles of IOP-0 and the IOP being tested:

IOP-n OPREG PASSED

The dsiop program is run in all of the IOPs, regardless of whether a
fault is detected in any single IOP. However, if a fault is detected in
any of the IOPs, subsequent diagnostics cannot be executed until the
fault is corrected. Use off-line diagnostics to isolate the failure.

For a list of messages, refer to subsection 6.3.2, Program Messages.

† The test uses a variant of the Milner fast memory test algorithm (EDN,
  28, 21; Oct 13, 1983).

dsmos - This program tests the address and data paths from each IOP to
buffer memory. It does not test the buffer memory data chips.

The dsmos program consists of the following test sections:

1.  Data path test
2.  Local memory addressing test
3.  Buffer memory addressing test

The dsmos test sections are as follows:

Section     Description

1           Data path test. This section tests for dropped or picked
            data bits by transferring a single word between address 0
            of local memory and address 0 of buffer memory. Dropped
            address bits do not affect this test.

2           Local memory addressing test. This section transfers
            data between address 0 of buffer memory and selected
            local memory addresses, using an algorithm with an
            ascending and descending, marching 1's and 0's pattern.
            The block length is always 1.

            The following local memory addresses (in octal) are used
            for test data: 0, 100000, 100000 + 2^n (includes all
            values for which n is an integer in the range
            2 through 14), and 177774.

3           Buffer memory addressing test. This section transfers
            data between local memory and selected buffer memory
            addresses. The block length is always 1. The test
            algorithm is identical to that used in section 2 (local
            memory addressing) except that the local memory address
            is fixed and the buffer memory address varies.

            The following buffer memory word addresses are used for
            test data: 0, 2^n (includes all values for which n
            is an integer value in the interval [0, log2(MOS@SIZ)]).
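The two address lists for sections 2 and 3 can be written out with a few lines of Python. This is a sketch for checking the arithmetic only. The function names are hypothetical, the buffer memory size is assumed to be a power of two, and the closed interval [0, log2(MOS@SIZ)] is taken literally from the text even though the units of MOS@SIZ are not defined in this section.

```python
def dsmos_local_addresses():
    """Section 2 local memory parcel addresses, per the list in the text."""
    base = 0o100000
    walking = [base + (1 << n) for n in range(2, 15)]  # n = 2 through 14
    return [0, base] + walking + [0o177774]


def dsmos_buffer_addresses(mos_siz):
    """Section 3 buffer memory word addresses: 0, then 2**n for n in [0, log2(mos_siz)]."""
    top = mos_siz.bit_length() - 1  # log2 of a power-of-two size
    return [0] + [1 << n for n in range(top + 1)]


if __name__ == "__main__":
    print([oct(a) for a in dsmos_local_addresses()])
    print(dsmos_buffer_addresses(1 << 20))
```

Each list walks a single address bit at a time, so the number of test addresses grows with the number of address bits rather than with the size of the memory.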

If dsmos completes successfully, the following message is displayed at
the Kernel consoles of IOP-0 and the IOP being tested:

IOP-n MOS PASSED
The test completed successfully in IOP-n.

The dsmos program is run in all of the IOPs, regardless of whether a
fault is detected in any single IOP. However, if a fault is detected in
any of the IOPs, subsequent diagnostics cannot be executed until the
fault is corrected. Use off-line diagnostics to isolate the failure.

For a list of messages, refer to subsection 6.3.2, Program Messages.


dshsp - This program is a high-speed channel test from IOP-n to
central memory or to an SSD solid-state storage device. Although it does
not test memory, dshsp uses part of central memory or SSD memory to
test the channel. The contents in the portion of memory used for testing
are saved at the start of test execution and are restored only if the
test completes successfully.

The dshsp program consists of the following test sections:

1.  Buffer addressing and data test
2.  Local memory addressing test
3.  Central memory or SSD addressing test

The dshsp test sections are as follows:

Section     Description

1           Buffer addressing and data test. This section detects
            all single-stuck faults and coupled-data bit faults in
            the high-speed channel data buffers. The test writes to
            and reads from a block of memory beginning at absolute
            address 0 in either central memory or an SSD. For
            central memory, the block length is fixed at 32 words
            (the size of the data buffers). For an SSD, the block
            length is fixed at 64 words (minimum block size).

            This test section uses an algorithm† to move a block of
            sliding 1's and 0's through memory in an ascending and
            descending pattern. The block is addressed in ascending
            order due to hardware constraints.

2           Local memory addressing test. This test uses an
            algorithm with an ascending and descending marching 1's
            and 0's pattern. The transfer length is always one word
            for central memory and 64 words for an SSD. The central
            memory or SSD address is always 0.

            The following local memory addresses are tested if the
            test is from IOP-n to central memory: 77774, 100000,
            100000 + 2^n (includes all values for which n is an
            integer in the range 2 through 14), and 177774.

            The following local memory addresses are tested if the
            test is from IOP-n to an SSD: 77400, 100000,
            100000 + 2^n (includes all values for which n is an
            integer in the range 8 through 14), and 177400.

† The test uses a variant of the Milner fast memory test algorithm (EDN,
  28, 21; Oct 13, 1983).

3           Central memory or SSD addressing test. This section uses
            an algorithm with an ascending and descending marching
            1's and 0's pattern. The transfer length is always one
            word for central memory and 64 words for an SSD.
            The local memory address is arbitrary because it is
            assumed that section 2 (local memory addressing test)
            passed successfully.

            The following central memory addresses are tested if the
            test is from IOP-n to central memory: 0, 2^n
            (includes all values for which n is an integer in the
            interval [0, log2(central memory size)-1]).

            The following SSD addresses are tested if the test is
            from IOP-n to an SSD: 0, 2^n (includes all values
            for which n is an integer in the interval
            [0, log2(SSD size)-1]).
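Probing only address 0 and the power-of-two addresses exercises every address line once. The following Python sketch shows why that is enough to expose a stuck address bit, using a simulated memory. It is a simplified illustration, not the dshsp algorithm: it writes one unique tag per address instead of a marching pattern, and the names `power_of_two_addresses`, `AliasedMemory`, and `find_miscompares` are hypothetical.

```python
def power_of_two_addresses(size_words):
    """0 and 2**n for n in [0, log2(size) - 1]; size assumed a power of two."""
    bits = size_words.bit_length() - 1
    return [0] + [1 << n for n in range(bits)]


class AliasedMemory:
    """Sparse word store whose address lines can have one bit stuck at 0."""

    def __init__(self, stuck_bit=None):
        self.cells = {}
        self.mask = ~0 if stuck_bit is None else ~(1 << stuck_bit)

    def write(self, addr, value):
        self.cells[addr & self.mask] = value

    def read(self, addr):
        return self.cells.get(addr & self.mask, 0)


def find_miscompares(mem, size_words):
    """Write a unique tag at each probe address, then read them all back."""
    probes = power_of_two_addresses(size_words)
    for tag, addr in enumerate(probes, start=1):
        mem.write(addr, tag)
    return [addr for tag, addr in enumerate(probes, start=1)
            if mem.read(addr) != tag]
```

With a stuck address bit, the write aimed at the corresponding power-of-two address lands on address 0 and overwrites its tag, so the miscompare is reported at the overwritten address rather than at the address whose bit is stuck.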

If dshsp completes successfully, the following message is displayed at
the Kernel consoles of IOP-0 and the IOP being tested:

IOP-n HSP CH=ch/ch PASSED
The test completed successfully in the high-speed channel pair
ch/ch in IOP-n. The contents of central memory or the SSD
are restored.

The dshsp program is run in all of the IOPs for which a high-speed
channel is defined in $APTEXT, regardless of whether a fault is detected
in any single IOP. However, if a fault is detected in any of the IOPs,
subsequent diagnostics cannot be executed until the fault is corrected.
Use off-line diagnostics to isolate the failure.

For a list of messages, refer to subsection 6.3.2, Program Messages.

dslsp - This program tests the low-speed deadstart channel from IOP-0
to the Cray mainframe.

The dslsp program consists of the following test sections:

1.  Deadstart data test
2.  Central memory addressing test

The dslsp test sections are as follows:

Section     Description

1           Deadstart data test. This section uses an algorithm with
            a marching 1's and 0's pattern to test the lower 64 words
            of central memory. Each data transfer begins at
            address 0 of central memory for a dead load or a dead dump.

2           Central memory addressing test. This section uses a CPU
            driver for the CPU end of the low-speed channel to test
            all address bits. The CPU driver occupies the first 64
            words of central memory. The driver manages the channel
            protocol; it does not check for errors.

            All transfers are one word in length. The test uses the
            following central memory addresses: 2^n (includes all
            values for which n is an integer value in the interval
            [5, log2(CM@SIZE/2)]). The first five address bits are
            tested in section 1, deadstart data test.

If dslsp completes successfully, the following message is displayed at
the IOP-0 Kernel console:

IOP-0 LSP CH=ch/ch PASSED
The test completed successfully in the low-speed channel pair
ch/ch in IOP-0. The contents of central memory are restored.

If a fault is detected, subsequent diagnostics cannot be executed until
the fault is corrected. Use off-line diagnostics to isolate the failure.

For a list of messages, refer to subsection 6.3.2, Program Messages.

6.3.2  PROGRAM MESSAGES

The dsdiag program generates the following types of messages:

•  Informative
•  Error

6.3.2.1  Informative messages

The following informative messages are displayed at the IOP-0 Kernel
console unless otherwise indicated.

DIAGNOSTICS COMPLETE
The dsdiag program completed successfully.

test PASSED
test completed successfully. This message is displayed at the
Kernel consoles of IOP-0 and the IOP being tested.

TAPE NOT READY
This message is displayed until the tape is ready for use.


6.3.2.2  Error messages

This subsection lists the dsdiag error messages, which are grouped as
follows:

•  Messages applicable to all tests
•  IOP-0 messages
•  dsmos16k messages
•  dsiom messages
•  dsiop messages
•  dsmos messages
•  dshsp messages
•  dslsp messages

Messages applicable to all tests - The following error messages are
displayed at the IOP-0 Kernel console. Use off-line diagnostics to do
further error isolation.

DIAGNOSTICS TERMINATED
An error in one of the tests prevented dsdiag from executing
successfully. An error message from the failing test is displayed
at one or more of the Kernel consoles. Use off-line diagnostics
to do further error isolation.

device ERROR, STATUS:status
A device error occurred while the overlay was being loaded.
device can be TAPE or DISK. status is the controller status
for the deadstart device. Select a different device and deadstart
the IOS. If no other device is available or the failure
continues, use off-line diagnostics to isolate the error.

TAPE ERROR, STATUS:status AFTER REWIND
A tape error occurred after the overlay was loaded. status is
the controller status for the tape device. Use a disk device and
deadstart the IOS. If a disk device is unavailable or the failure
continues, use off-line diagnostics to isolate the error.

OVERLAY HEADER ERROR
The dsdiag program detected an error in the overlay header.
Select a different device and deadstart the IOS. If no other
device is available or the failure continues, use off-line
diagnostics to isolate the error.

ATTEMPTED TO READ PAST ADDRESS 77777
The dsdiag program attempted to read beyond address 77777 in the
overlay. Select a different device and deadstart the IOS. If no
other device is available or the failure continues, use off-line
diagnostics to isolate the error.


END-OF-FILE ENCOUNTERED
While reading the overlay, dsdiag detected an unexpected
end-of-file. Select a different device and deadstart the IOS. If
no other device is available or the failure continues, use
off-line diagnostics to isolate the error.

INVALID OVERLAY DIRECTORY
While reading the overlay, dsdiag detected an invalid overlay
directory. Select a different device and deadstart the IOS. If
no other device is available or the failure continues, use
off-line diagnostics to isolate the error.

NO OVERLAY FILE FOUND
The dsdiag program did not find an overlay file. Select a
different device and deadstart the IOS. If no other device is
available or the failure continues, use off-line diagnostics to
isolate the error.
IOP-0 messages - The following error messages are displayed at the IOP-0
Kernel console. Use off-line diagnostics to do further error isolation.

IOP-0 FAILED EXIT STACK
The test terminated after detecting a fault in the IOP-0 exit
stack. The bootstrap program is not reloaded. An IOS deadstart
is required.

IOP-0 FAILED OPERAND REGISTER
The test terminated after detecting a fault in an IOP-0 operand
register.

IOP-0 FAILED MEMORY, P=address, LMA=lma
EXP=exp
ACT=act
The test terminated after detecting a data compare error in IOP-0
local memory. The following information is displayed:

P=address   Parcel address relative to the start of the test
            module in which the fault was detected

LMA=lma     Absolute parcel address in IOP-0 local memory

EXP=exp     Expected data

ACT=act     Actual data

IOP-0 FAILED REAL-TIME CLOCK
The test detected a fault in the real-time clock. Although the
test continues, subsequent tests can fail as a result of an
inaccurate clock. A clock failure can occur if the IOP model is
not defined correctly when the deadstart tests are generated.
Check the I@IOPMOD installation parameter and regenerate. If the
failure continues, use off-line diagnostics to isolate the fault.
For a brief description of the IOS installation parameters, refer
to the I/O Subsystem (IOS) Administrator's Guide, CRI publication
SG-0307.
dsmos16k messages - The following error messages are displayed at the
IOP-0 Kernel console. Use off-line diagnostics to do further error
isolation.

MOS-16K FAILED, P=address, BMA=bma
The test detected a hardware failure in buffer memory. The
following information is displayed:

P=address   Parcel address relative to the start of dsmos16k
            in IOP-0

BMA=bma     Absolute word address in buffer memory

MOS-16K FAILED, P=address, BMA=bma
EXP=exp
ACT=act
The test detected a data compare error in buffer memory. The
following information is displayed:

P=address   Parcel address relative to the start of dsmos16k
            in IOP-0

BMA=bma     Absolute word address in buffer memory

EXP=exp     Expected data

ACT=act     Actual data
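When a data compare message reports EXP and ACT, the failing bits are the ones that differ between the two values. The Python sketch below is one way to work that out by hand. It assumes the values are displayed in octal, which this subsection does not state, and the function name `failing_bits` is hypothetical. "Dropped" here means a bit expected as 1 and read as 0; "picked" means a bit expected as 0 and read as 1.

```python
def failing_bits(exp, act, base=8):
    """Return (dropped, picked) bit positions for EXP and ACT digit strings."""
    e, a = int(exp, base), int(act, base)
    diff = e ^ a  # 1 wherever expected and actual disagree
    positions = [n for n in range(diff.bit_length()) if (diff >> n) & 1]
    dropped = [n for n in positions if (e >> n) & 1]  # expected 1, read 0
    picked = [n for n in positions if (a >> n) & 1]   # expected 0, read 1
    return dropped, picked
```

For example, `failing_bits("177777", "177767")` reports bit 3 as dropped, and `failing_bits("0", "40")` reports bit 5 as picked. Bit 0 is the least significant bit in this sketch.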

dsiom messages - The following error messages are displayed at the
IOP-0 Kernel console. Use off-line diagnostics to do further error
isolation.

IOP-n IOM FAILED, P=address, LMA=lma
The test detected a hardware failure in IOP-n local memory. The
following information is displayed:

P=address   Parcel address relative to the start of dsiom in
            IOP-0

LMA=lma     Absolute parcel address in IOP-n local memory

IOP-n IOM FAILED, P=address, LMA=lma
EXP=exp
ACT=act
The test detected a data compare error in IOP-n local memory.
The following information is displayed:

P=address   Parcel address relative to the start of dsiom in
            IOP-0

LMA=lma     Absolute parcel address in IOP-n local memory

EXP=exp     Expected data

ACT=act     Actual data

dsiop messages - The following error messages are displayed at the
IOP-0 Kernel console unless otherwise indicated. Use off-line
diagnostics to do further error isolation.

IOP-n section FAILED, NO RESPONSE
An input-channel-done signal was not received from IOP-n within
the required time limit. section is one of the following test
sections: BASIC, JUMPS, or OPREG. This message precedes the
following message (described in this subsection):

    PRESS ANY KEY TO CONTINUE WITH REGISTER DUMP

IOP-n section FAILED, P=address, CH=ipc
The test detected a time-out or a protocol error in ipc, the
interprocessor channel from IOP-0 to IOP-n. section is one of
the following test sections: BASIC, JUMPS, or OPREG. The
following information is displayed:

P=address   Parcel address relative to the start of dsiop in
            IOP-0

CH=ipc      Interprocessor channel number associated with IOP-0

This message precedes the following message (described in this
subsection):

    PRESS ANY KEY TO CONTINUE WITH REGISTER DUMP

IOP-n section FAILED, P=address, MOS ERROR, BMA=bma
The test detected a failure in a data transfer between local
memory in one of the configured IOPs and buffer memory. section
is one of the following test sections: BASIC, JUMPS, or OPREG.
The following information is displayed:

P=address   Parcel address relative to the start of dsiop in
            IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

BMA=bma     Absolute word address in buffer memory

IOP-n BASIC FAILED, P=address, CH=ipc
EXP=exp, ACT=act
The BASIC test section detected a data compare error in ipc, the
interprocessor channel from IOP-0 to IOP-n. The following
information is displayed:

P=address   Parcel address relative to the start of dsiop in
            IOP-0

CH=ipc      Interprocessor channel number associated with IOP-0

EXP=exp     Expected data

ACT=act     Actual data

This message precedes the following message (described in this
subsection):

    PRESS ANY KEY TO CONTINUE WITH REGISTER DUMP
IOP-n JUMPS FAILED, CODE=code
The JUMPS test section detected a jump instruction error in
IOP-n. code is the error code returned from the accumulator
of the IOP being tested. This message precedes the following
message (described in this subsection):

    PRESS ANY KEY TO CONTINUE WITH REGISTER DUMP

IOP-n OPREG FAILED, P=address, B=register
EXP=exp, ACT=act
The OPREG test section detected a data compare error in the
operand register in IOP-n. The following information is
displayed:

P=address   Parcel address relative to the start of dsiop in
            IOP-0

B=register  B register in which the error was detected

EXP=exp     Expected data

ACT=act     Actual data

The message is displayed at the Kernel consoles of IOP-0 and the
IOP being tested. This message precedes the following message
(described in this subsection), which is displayed at the IOP-0
Kernel console only:

    PRESS ANY KEY TO CONTINUE WITH REGISTER DUMP
PRESS ANY KEY TO CONTINUE WITH REGISTER DUMP
The dsiop program detected an error and issued the error message
that preceded this message. If you press any key, dsiop dumps
the IOP being tested to the IOP-0 Kernel console. The following
information is displayed:

A=a, C=c, B=b, (B)=r, E=e, (E)=s1, (E-1)=s2, (E-2)=s3

A=a         Accumulator of the IOP being tested

C=c         Carry flag

B=b         B register

(B)=r       B register contents

E=e         Exit stack pointer

(E)=s1
(E-1)=s2
(E-2)=s3    Contents of the top three exit stack locations.
            One of the stack locations normally represents the
            address at which a fault was detected in the IOP
            being tested.

Examine the dump values to isolate the fault. Depending on the
fault, some or all of the dump values can be unreliable.
Therefore, check the values for consistency. Prior to taking the
dump (by pressing any key), a field engineer can scope the
P register of the IOP being tested to ensure reliable values. Use
off-line diagnostics to isolate the fault.
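If dump lines are transcribed from the console for later review, a small parser makes it easier to compare fields across runs. The Python sketch below splits a line in the format shown above into named fields and reports any that are missing. The function name `parse_register_dump`, the sample line, and its numeric values are made up for illustration; the actual number format of the dump is not specified here.

```python
import re

# Field names as they appear in the dump line described above.
EXPECTED_FIELDS = ("A", "C", "B", "(B)", "E", "(E)", "(E-1)", "(E-2)")

# A name is a single letter, optionally parenthesized, optionally with "-digit".
DUMP_FIELD = re.compile(r"(\(?[A-Z](?:-\d)?\)?)=(\w+)")


def parse_register_dump(line):
    """Return (fields, missing): a name-to-text dict and the absent field names."""
    fields = dict(DUMP_FIELD.findall(line))
    missing = [name for name in EXPECTED_FIELDS if name not in fields]
    return fields, missing


if __name__ == "__main__":
    # Hypothetical values, not taken from a real console.
    sample = ("A=000012, C=1, B=000777, (B)=000000, "
              "E=03, (E)=001234, (E-1)=000000, (E-2)=000000")
    print(parse_register_dump(sample))
```

The parser keeps every value as text so that no assumption about the number base is baked in; a missing field is one simple sign that the dump may be unreliable.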
dsmos messages - The following error messages are displayed at the
IOP-0 Kernel console unless otherwise indicated. Use off-line
diagnostics to do further error isolation.

IOP-n MOS FAILED, P=address
The test detected a failure in the path between IOP-n and buffer
memory. The following information is displayed:

P=address   Parcel address relative to the start of dsmos in
            IOP-0; or, if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

IOP-n MOS FAILED, P=address, NO RESPONSE
IOP-0 did not receive a response from IOP-n following the buffer
memory test. The following information is displayed:

P=address   Parcel address relative to the start of dsmos in
            IOP-0; or, if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

IOP-n MOS FAILED, P=address, MOS ERROR
The test detected a failure in the path between IOP-n and buffer
memory. The following information is displayed:

P=address   Parcel address relative to the start of dsmos in
            IOP-0; or, if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

This message is displayed at the Kernel consoles of IOP-0 and the
IOP being tested.
IOP-n MOS FAILED, P=address
LMA=lma, BMA=bma
EXP=exp
ACT=act
The test detected a data compare error in the path between IOP-n
and buffer memory. The following information is displayed:

P=address   Parcel address relative to the start of dsmos in
            IOP-0; or, if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

LMA=lma     Absolute parcel address in local memory

BMA=bma     Absolute word address in buffer memory

EXP=exp     Expected data

ACT=act     Actual data

This message is displayed at the Kernel consoles of IOP-0 and the
IOP being tested.

dshsp messages - The following error messages are displayed at the
IOP-0 Kernel console unless otherwise indicated. Check the error logger
for double bit errors. Use off-line diagnostics to do further isolation.

IOP-0 HSP CH=ch/ch FAILED, P=address, MOS ERROR
IOP-0 tried to write the diagnostic overlay to MOS. Upon
completion, both the Busy and Done flags were found to be set.
The probable error is in the channel from IOP-0 to MOS memory.
Run off-line diagnostics to further isolate the problem.

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp in
            IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

The contents of CM or SSD remain unchanged. This message is
displayed on the IOP-0 console.

IOP-n HSP CH=ch/ch FAILED, P=address, NO RESPONSE
IOP-0 sent an overlay package to MOS, deadstarted IOP-n, and
waited for a response. The Done flag was never set (indicating
that IOP-n did not respond by sending a return code). The
probable error is in the deadstarting of IOP-n, the ability of
IOP-n to read from MOS, or the test code was corrupt (due to a
hardware memory problem). Check for further test messages or run
off-line diagnostics.

IOP-n       The IOP that would not deadstart

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp in
            IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

The contents of CM or SSD remain unchanged. This message is
displayed on the IOP-0 console.

IOP-n HSP CH=ch/ch FAILED, P=address, BAD RETURN STATUS, S=address
IOP-0 sent a test to IOP-n. IOP-n executed the tests and returned
a bad status. This indicates that the test found an error in IOP-n.
Check the IOP-n console for further messages.

IOP-n       The IOP that sent the message to IOP-0

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp
            in IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

S=address   The address of the problem in IOP-n is
            returned. The address is relative to the start
            of the overlay sent to IOP-n.

It is unknown whether the contents of CM or SSD have been
corrupted. This message is displayed on the IOP-0 console.
IOP-n HSP CH=ch/ch PASSED
IOP-0 sent a test to IOP-n. IOP-n executed the tests and
returned a zero status indicating that no errors were discovered.

IOP-n       The IOP that sent the message to IOP-0

CH=ch/ch    High-speed channel pair

The contents of CM or SSD were restored to their original state.
This message is displayed on the IOP-0 console.
The following messages are displayed on the IOP-n console.

IOP-n HSP CH=ch/ch FAILED, P=address, NO CONFIGURED MEMORY SIZE
IOP-n found a high-speed channel configured, but the configured
memory size for CM or SSD attached to that channel is zero. This
is not a hardware error. Correct the channel and memory size
configured in $APTEXT or $IOSDEF. The test in IOP-n for this
channel was bypassed.

IOP-n       The IOP being tested

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp
            in IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

The contents of CM or SSD remain unchanged. This message is
displayed on the IOP-n console.

IOP-n HSP CH=ch/ch FAILED, P=address, CH=ch, routine, TIMEOUT SAVEMEM
IOP-n tried to read from CM or SSD to save the contents of the
memory to be tested before beginning the test. After the read was
started, the program waited for the Done flag to be set. The Done
flag was never set so the program timed out. The probable error
is in the channel from IOP-n to CM or SSD memory. Run off-line
diagnostics to further isolate the problem.

IOP-n       The IOP being tested

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp
            in IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

CH=ch       Channel on which the error was detected

routine     The test routine executing in IOP-n when the
            error was encountered. The test routines in
            order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
            HSPSSDA. The test routine HSPBUFF is the first
            time the HSP channel is used.

The contents of CM or SSD remain unchanged. This message is
displayed on the IOP-n console.

IOP-n HSP CH=ch/ch FAILED, P=address, CH=ch, routine, BZ & DN SAVEMEM
LMA=address, CMA or SSDA=address
EXP=exp
ACT=act
IOP-n tried to read from CM or SSD to save the contents of the memory
to be tested before beginning the test. Upon completion of the read
(when the Done flag was set), both the Busy and Done flags were found
to be set. The probable error is in the channel from IOP-n to CM or
SSD memory. Check the error logger for double bit errors. Run
off-line diagnostics to further isolate the problem.

This error can also occur if the test tries to read or write past the
end of CM or SSD. Check the configured memory size of CM or SSD in
$APTEXT.

IOP-n       The IOP being tested

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp
            in IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

CH=ch       Channel on which the error was detected

routine     The test routine executing in IOP-n when the
            error was encountered. The test routines in
            order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
            HSPSSDA. The test routine HSPBUFF is the first
            time the HSP channel is used.

LMA=address Absolute parcel address in local memory of data

CMA or
SSDA=address
            Absolute word address in central memory or SSD of
            the data

EXP=exp     Expected data

ACT=act     Actual data

The contents of CM or SSD remain unchanged. This message is
displayed on the IOP-n console.

IOP-n HSP CH=ch/ch FAILED, P=address, CH=ch, routine, TIMEOUT
IOP-n tried to read/write a test pattern from/to CM or SSD.
Check the channel number to determine if the error was on a read
or write. After the read/write was started, the program waited
for the Done flag to be set. The Done flag was never set so the
program timed out. The probable error is in the channel CH=ch
from IOP-n to CM or SSD memory. Run off-line diagnostics to
further isolate the problem.

IOP-n       The IOP being tested

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp
            in IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

CH=ch       Channel on which the error was detected

routine     The test routine executing in IOP-n when the
            error was encountered. The test routines in
            order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
            HSPSSDA.

The contents of CM or SSD may have been corrupted. This message
is displayed on the IOP-n console.

IOP-n HSP CH=ch/ch FAILED, P=address, CH=ch, routine, ERROR FLAG
IOP-n tried to write a test pattern to CM or SSD. Upon
completion of the write (when the Done flag was set), both the
Busy and Done flags were found to be set. The probable error is
in the channel CH=ch from IOP-n to CM or SSD memory. Check
the error logger for double bit errors. Run off-line diagnostics
to further isolate the problem.

IOP-n       The IOP being tested

CH=ch/ch    High-speed channel pair

P=address   Parcel address relative to the start of dshsp
            in IOP-0; or if IOP-0 is being tested, the parcel
            address relative to the start of the test module
            in which the fault was detected.

CH=ch       Channel on which the error was detected

routine     The test routine executing in IOP-n when the
            error was encountered. The test routines in
            order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
            HSPSSDA.

The contents of CM or SSD may have been corrupted. This message
is displayed on the IOP-n console.

IOP-n HSP CH=ch|ch FAILED, P=address, CH=ch, routine, ERROR FLAG
LMA=address, CMA or SSDA=address
EXP=exp
ACT=act
IOP-n tried to read a test pattern from CM or SSD. Upon
completion of the read (when the Done flag was set), both the Busy
and Done flags were found to be set. The probable error is in the
channel CH=ch from IOP-n to CM or SSD memory. Check the error
logger for double-bit errors. Run off-line diagnostics to further
isolate the problem.

IOP-n          The IOP being tested

CH=ch|ch       High-speed channel pair

P=address      Parcel address relative to the start of dshsp
               in IOP-0; or if IOP-0 is being tested, the parcel
               address relative to the start of the test module
               in which the fault was detected.

CH=ch          Channel on which the error was detected

routine        The test routine executing in IOP-n when the
               error was encountered. The test routines in
               order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
               HSPSSDA.

LMA=address    Absolute parcel address in local memory of data

CMA or
SSDA=address   Absolute word address in central memory or SSD of
               the data

EXP=exp        Expected data

ACT=act        Actual data

The contents of CM or SSD may have been corrupted. This message
is displayed on the IOP-n console.

IOP-n HSP CH=ch|ch FAILED, P=address, routine, CH=ch, DATA COMPARE
LMA=address, CMA or SSDA=address
EXP=exp
ACT=act
IOP-n wrote a test pattern to CM or SSD and then read it back.
The data read from memory (ACT) did not match the original data
(EXP) written to memory. The probable error is in the channel
from IOP-n to CM or SSD memory. Run off-line diagnostics to
further isolate the problem.

IOP-n          The IOP being tested

CH=ch|ch       High-speed channel pair

P=address      Parcel address relative to the start of dshsp
               in IOP-0; or if IOP-0 is being tested, the parcel
               address relative to the start of the test module
               in which the fault was detected.

CH=ch          Channel on which the error was detected

routine        The test routine executing in IOP-n when the
               error was encountered. The test routines in
               order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
               HSPSSDA.

LMA=address    Absolute parcel address in local memory of data

CMA or
SSDA=address   Absolute word address in central memory or SSD of
               the data

EXP=exp        Expected data

ACT=act        Actual data

The contents of CM or SSD may have been corrupted. This message
is displayed on the IOP-n console.

IOP-n HSP CH=ch|ch FAILED, P=address, CH=ch, routine, TIMEOUT RESTMEM
After testing, IOP-n tried to write to CM or SSD to restore the
original contents of memory. After the write was started, the
program waited for the Done flag to be set. The Done flag was
never set, so the program timed out. The probable error is in the
channel from IOP-n to CM or SSD memory. Run off-line
diagnostics to further isolate the problem.

IOP-n          The IOP being tested

CH=ch|ch       High-speed channel pair

P=address      Parcel address relative to the start of dshsp
               in IOP-0; or if IOP-0 is being tested, the parcel
               address relative to the start of the test module
               in which the fault was detected.

CH=ch          Channel on which the error was detected

routine        The test routine executing in IOP-n when the
               error was encountered. The test routines in
               order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
               HSPSSDA.

The contents of CM or SSD may have been corrupted. This message
is displayed on the IOP-n console.

IOP-n HSP CH=ch|ch FAILED, P=address, CH=ch, routine, BZ &
DN RESTMEM
After testing, IOP-n tried to write to CM or SSD to restore the
original contents of memory. Upon completion of the write (when
the Done flag was set), both the Busy and Done flags were found to
be set. The probable error is in the channel from IOP-n to CM
or SSD memory. Check the error logger for double-bit errors. Run
off-line diagnostics to further isolate the problem.

IOP-n          The IOP being tested

CH=ch|ch       High-speed channel pair

P=address      Parcel address relative to the start of dshsp
               in IOP-0; or if IOP-0 is being tested, the parcel
               address relative to the start of the test module
               in which the fault was detected.

CH=ch          Channel on which the error was detected

routine        The test routine executing in IOP-n when the
               error was encountered. The test routines in
               order are HSPBUFF, HSPLMCM, HSPLMSSD, HSPCMA, and
               HSPSSDA.

The contents of CM or SSD may have been corrupted. This message
is displayed on the IOP-n console.
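
The high-speed channel messages above turn on two flags: a transfer whose Done flag is never set ends in a TIMEOUT message, and a transfer that completes with both Busy and Done set ends in an ERROR FLAG message (or the equivalent RESTMEM message). The following is a minimal sketch of that decision only. The function name hsp_outcome and the 0/1 flag encoding are assumptions made for illustration; they are not taken from the dshsp program.

```shell
#!/bin/sh
# Hypothetical helper: classify the outcome of one HSP transfer from the
# Busy and Done flag values (1 = set, 0 = clear). The name and the 0/1
# encoding are illustrative assumptions, not part of dshsp.
hsp_outcome() {
    busy=$1
    done_flag=$2
    if [ "$done_flag" -eq 0 ]; then
        # Done never set: the wait expires and a TIMEOUT message results.
        echo "TIMEOUT"
    elif [ "$busy" -eq 1 ]; then
        # Done set while Busy is still set: reported as ERROR FLAG.
        echo "ERROR FLAG"
    else
        # Done set and Busy clear: the transfer completed normally.
        echo "OK"
    fi
}

hsp_outcome 0 0   # Done never set
hsp_outcome 1 1   # Busy and Done both set
hsp_outcome 0 1   # normal completion
```

The sketch does not model the DATA COMPARE case, which depends on the data read back rather than on the flags.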

dslsp messages - The error messages are displayed at the IOP-0 Kernel
console. Use off-line diagnostics to do further error isolation.
In this subsection, the messages are grouped as follows:

•  Time-out messages
•  Channel interface status flag messages
•  Data compare error messages
•  Overlay messages

For information on the channel interface status flags (FLAGS=flags),
refer to the following CRI publications, as appropriate:

HR-0030    I/O Subsystem Model B Hardware Reference Manual
HR-0081    I/O Subsystem Model C/D Hardware Reference Manual

The time-out messages follow.
IOP-n LSP CH=ch|ch FAILED, P=address, LMA=lma, CH=ch, TIMEOUT
LSPCPUA, READ FROM CM
While attempting to read one word from central memory addresses in
multiples of 10, starting at address 100 and continuing to the end
of central memory, the program detected a time-out in the
low-speed channel pair ch|ch in IOP-n. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
LMA=lma        Absolute parcel address in local memory
CH=ch          Channel on which the error was detected
LSPCPUA        Read one word from central memory addresses in
               multiples of 10, starting at address 100 and
               continuing to the end of central memory
READ FROM CM   Read from central memory

IOP-n LSP CH=ch|ch FAILED, P=address, LMA=lma, CH=ch, TIMEOUT
LSPCPUA, WRITE TO CM
While attempting to write one word to central memory addresses in
multiples of 10, starting at address 100 and continuing to the end
of central memory, the program detected a time-out in the
low-speed channel pair ch|ch in IOP-n. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
LMA=lma        Absolute parcel address in local memory
CH=ch          Channel on which the error was detected
LSPCPUA        Write one word to central memory addresses in
               multiples of 10, starting at address 100 and
               continuing to the end of central memory
WRITE TO CM    Write to central memory

IOP-n LSP CH=ch|ch FAILED, P=address, LMA=lma, CH=ch, TIMEOUT
LSPDSDD, READ FROM CM
While attempting to read blocks of various lengths from central
memory address 0, the program detected a time-out in the low-speed
channel pair ch|ch in IOP-n. Central memory may have been
corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
LMA=lma        Absolute parcel address in local memory
CH=ch          Channel on which the error was detected
LSPDSDD        Read blocks of various lengths from central memory
               address 0
READ FROM CM   Read from central memory

IOP-n LSP CH=ch|ch FAILED, P=address, LMA=lma, CH=ch, TIMEOUT
LSPDSDD, WRITE TO CM
While attempting to write blocks of various lengths to central
memory address 0, the program detected a time-out in the low-speed
channel pair ch|ch in IOP-n. Central memory may have been
corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
LMA=lma        Absolute parcel address in local memory
CH=ch          Channel on which the error was detected
LSPDSDD        Write blocks of various lengths to central
               memory address 0
WRITE TO CM    Write to central memory

IOP-n LSP CH=ch|ch FAILED, P=address, LMA=lma, CH=ch, TIMEOUT
RESTMEM, WRITE TO CM
While attempting to restore the central memory locations used in
the test, the program detected a time-out in the low-speed channel
pair ch|ch in IOP-n. Central memory may have been
corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
LMA=lma        Absolute parcel address in local memory
CH=ch          Channel on which the error was detected
RESTMEM        Final write to central memory
WRITE TO CM    Write to central memory

IOP-n LSP CH=ch|ch FAILED, P=address, LMA=lma, CH=ch, TIMEOUT
SAVEMEM, READ FROM CM
While attempting to save the central memory locations used in the
test, the program detected a time-out in the low-speed channel
pair ch|ch in IOP-n. Central memory is not corrupted. The
following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
LMA=lma        Absolute parcel address in local memory
CH=ch          Channel on which the error was detected
SAVEMEM        Initial read from central memory
READ FROM CM   Read from central memory

The status flag messages follow.
IOP-n LSP CH=ch|ch FAILED, P=address, FLAGS=flags, CH=ch
LSPCPUA, READ FROM CM
While attempting to read one word from central memory addresses in
multiples of 10, starting at address 100 and continuing to the end
of central memory, the program detected a hardware error in the
low-speed channel pair ch|ch in IOP-0. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
FLAGS=flags    An octal value representing one or more channel
               interface status flags
CH=ch          Channel on which the error was detected
LSPCPUA        Read one word from central memory addresses in
               multiples of 10, starting at address 100 and
               continuing to the end of central memory
READ FROM CM   Read from central memory

IOP-n LSP CH=ch|ch FAILED, P=address, FLAGS=flags, CH=ch
LSPCPUA, WRITE TO CM
While attempting to write one word to central memory addresses in
multiples of 10, starting at address 100 and continuing to the end
of central memory, the program detected a hardware error in the
low-speed channel pair ch|ch in IOP-0. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
FLAGS=flags    An octal value representing one or more channel
               interface status flags
CH=ch          Channel on which the error was detected
LSPCPUA        Write one word to central memory addresses in
               multiples of 10, starting at address 100 and
               continuing to the end of central memory
WRITE TO CM    Write to central memory

IOP-n LSP CH=ch|ch FAILED, P=address, FLAGS=flags, CH=ch
LSPDSDD, READ FROM CM
While attempting to read blocks of various lengths from central
memory address 0, the program detected a hardware error in the
low-speed channel pair ch|ch in IOP-0. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
FLAGS=flags    An octal value representing one or more channel
               interface status flags
CH=ch          Channel on which the error was detected
LSPDSDD        Read blocks of various lengths from central
               memory address 0
READ FROM CM   Read from central memory

IOP-n LSP CH=ch|ch FAILED, P=address, FLAGS=flags, CH=ch
LSPDSDD, WRITE TO CM
While attempting to write blocks of various lengths to central
memory address 0, the program detected a hardware error in the
low-speed channel pair ch|ch in IOP-0. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
FLAGS=flags    An octal value representing one or more channel
               interface status flags
CH=ch          Channel on which the error was detected
LSPDSDD        Write blocks of various lengths to central
               memory address 0
WRITE TO CM    Write to central memory

IOP-n LSP CH=ch|ch FAILED, P=address, FLAGS=flags, CH=ch
RESTMEM, WRITE TO CM
While attempting to restore the central memory locations used in
the test, the program detected a hardware error in the low-speed
channel pair ch|ch in IOP-0. Central memory may have been
corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
FLAGS=flags    An octal value representing one or more channel
               interface status flags
CH=ch          Channel on which the error was detected
RESTMEM        Final write to central memory
WRITE TO CM    Write to central memory

IOP-n LSP CH=ch|ch FAILED, P=address, FLAGS=flags, CH=ch
SAVEMEM, READ FROM CM
While attempting to save the central memory locations used in the
test, the program detected a hardware error in the low-speed
channel pair ch|ch in IOP-0. Central memory is not
corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
FLAGS=flags    An octal value representing one or more channel
               interface status flags
CH=ch          Channel on which the error was detected
SAVEMEM        Initial read from central memory
READ FROM CM   Read from central memory

The data compare error messages follow.
IOP-n LSP CH=ch|ch FAILED, P=address, CMA=cma
LSPCPUA
EXP=exp
ACT=act
While writing and reading one word to and from central memory
addresses in multiples of 10, starting at address 100 and
continuing to the end of central memory, the program detected a
data compare error in the low-speed channel pair ch|ch in
IOP-n. The expected data did not match the actual data.
Central memory may have been corrupted. The following information
is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
CMA=cma        Absolute word address in central memory
LSPCPUA        Write and read one word to and from central memory
               addresses in multiples of 10, starting at address
               100 and continuing to the end of central memory
EXP=exp        Expected data
ACT=act        Actual data

IOP-n LSP CH=ch|ch FAILED, P=address, CMA=cma
LSPDSDD
EXP=exp
ACT=act
While writing and reading blocks of various lengths to and from
central memory address 0, the program detected a data compare
error in the low-speed channel pair ch|ch in IOP-n. The
expected data did not match the actual data. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair
P=address      Parcel address relative to the start of dslsp
CMA=cma        Absolute word address in central memory
LSPDSDD        Write and read blocks of various lengths to and
               from central memory address 0
EXP=exp        Expected data
ACT=act        Actual data
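
When a data compare message is reported, the exclusive OR of the EXP and ACT values shows which bit positions differ. The following is a minimal sketch of that arithmetic. It assumes the two values are available as octal strings and uses short invented sample values; the radix and width of the real EXP= and ACT= fields are not stated in this section, and failing_bits is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: XOR expected and actual values to show which bits
# differ. Both arguments are assumed to be octal strings without a
# leading 0; the real EXP=/ACT= radix and width may differ.
failing_bits() {
    exp=$1
    act=$2
    # A leading 0 makes POSIX shell arithmetic treat the value as octal.
    printf '%o\n' $(( 0$exp ^ 0$act ))
}

failing_bits 123456 123446   # prints 10 (a single bit differs)
```

A result with one bit set points toward a single data line or memory bit; a result with many bits set suggests a wider fault. Off-line diagnostics remain the way to isolate the failure.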

The overlay messages follow.
IOP-n LSP CH=ch|ch FAILED - OVERLAY NOT DSLSPCP
The overlay that the test read was not DSLSPCP. Central memory may
have been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair

IOP-n LSP CH=ch|ch FAILED - OVERLAYS NOT FOUND
The test could not find an overlay file. Central memory may have
been corrupted. The following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair

IOP-n LSP CH=ch|ch FAILED - OVERLAY WRONG TYPE
The test found the overlay file DSLSPCP, but it has the wrong
overlay type. Central memory may have been corrupted. The
following information is displayed:

IOP-n          IOP in which the test was executing
CH=ch|ch       Low-speed channel pair

7.  UTILITY PROGRAMS

Utility programs are on-line diagnostic tools rather than tests. This
section describes the following utilities:

•  olhpa (hardware performance analyzer)
•  runsequence (automatic test sequencer)

7.1  olhpa

The olhpa program is a hardware performance analyzer that analyzes and
reports the hardware errors and statuses recorded in the system error
log. The olhpa program displays the following types of reports:

•  A report listing one line of error information for each hardware
   error. The error information is displayed in fields and is sorted
   from left to right (refer to sort(1)).

•  A comprehensive error report similar to the errpt(1M) report
   (-l command option)

•  A summary of total errors (-q command option)

•  A bar graph showing total errors for the specified time interval
   (-g [d]n command option)

7.1.1  PROGRAM SYNOPSIS

This subsection contains the olhpa program synopsis. All of the
command options except errfiles can be entered in any order. If
errfiles is specified, it must be the last entry on the command line.

The olhpa program displays disk, memory, tape, and SSD error reports in
fields. If olhpa is entered without command options and arguments, it
is equivalent to entering the following:

olhpa -dmtv

The start time is the current time and date minus 30 days. The end time
is the current time and date. The olhpa program reads from the error
file /usr/adm/errfile.

Synopsis:

olhpa [-l] [-q] [-g [d]n] [-d] [-m] [-t] [-v] [-D argument]
      [-M argument] [-T argument] [-V argument] [-s start]
      [-e end] [errfiles]

-l          Displays a long version of the selected error report. If
            you select -l, do not select -q or -g [d]n. A
            -l report contains the same information as the
            errpt(1M) report. For example, enter the following to
            display a long version of a memory error report:

            olhpa -m -l

            Long reports are not sorted.

-q          Displays only the summary information of an error report.
            If you select -q, do not select -l or -g [d]n.

-g [d]n     Displays a bar graph showing the total errors for the
            specified time interval. If you select -g [d]n, do
            not select -l or -q. A single mnemonic value
            represents each error, as follows:

            Mnemonic    Description
            R           Represents one recovered/corrected error
            U           Represents one unrecovered/uncorrected error

            The required argument n indicates the time interval that
            each bar in the graph represents. If the interval (n) is
            in days, precede n with d; otherwise, n is assumed to be
            in hours.

            n can be any integer value. However, n should be
            within the limits set by the start/end times and dates
            (-s start and -e end, respectively). For example, if the
            start time is 7:00, the end time is 11:00, and n is 8,
            the interval is adjusted so that the program generates a
            report for one 4-hour interval.
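
The interval adjustment in the preceding example can be sketched as simple arithmetic. This is only an illustration of the stated rule for whole hours on a single day; clamp_interval is a hypothetical name, and the way olhpa itself computes the adjustment is not described in this manual.

```shell
#!/bin/sh
# Sketch of the interval adjustment described for -g: if n exceeds the
# span between the start and end times, one bar covers the whole span.
# Whole hours only; clamp_interval is an illustrative name, not an
# olhpa function.
clamp_interval() {
    start_hour=$1
    end_hour=$2
    n=$3
    span=$(( end_hour - start_hour ))
    if [ "$n" -gt "$span" ]; then
        echo "$span"
    else
        echo "$n"
    fi
}

clamp_interval 7 11 8   # prints 4
clamp_interval 7 11 2   # prints 2
```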


-d          Displays a report of all disk errors. The default display
            contains the following information in the order listed:

            Field                Field Mnemonic
            Date                 dte
            Time                 tme
            Error type           et
            Device type          dt
            IOP                  iop
            Channel              cha
            Head                 hd
            Sector               sct
            Cylinder             cyl
            General status       gs
            Status               st

-m          Displays a report of all memory errors. The default
            display contains the following information in the order
            listed:

            Field                Field Mnemonic
            Date                 dte
            Time                 tme
            Syndrome             syn
            Bank                 bnk
            Failing bit          bit
            Chip select          chp
            Failing module       loc
            CPU                  cpu
            Current command      cmd
            Count                cnt
            Status               st

-t          Displays a report of all tape errors. The default display
            contains the following information in the order listed:

            Field                  Field Mnemonic
            Date                   dte
            Time                   tme
            Error type             et
            Initial channel        ich
            Initial device path    idp
            Final device path      fdp
            Block                  blk
            Retry                  ret
            Sense byte #00         s0
            Status                 st

-v          Displays a report of all SSD errors. The default display
            contains the following information in the order listed:

            Field                     Field Mnemonic
            Date                      dte
            Time                      tme
            Channel                   cha
            Status                    st
            SSD address               sad
            Central memory address    mad
            Transfer length           len
            Read/write flag           rwf

-D argument, -M argument, -T argument, -V argument
            Displays a report of disk, memory, tape, or SSD errors
            (-D, -M, -T, or -V option, respectively). The
            required argument can be one of the following:

            Argument    Description

            P[,+],field[,field]
                        Replaces or adds to the default display. If
                        entered with the plus (+) option, the
                        specified fields are displayed in addition
                        to the default display. If entered without
                        the plus (+) option, the specified fields
                        are displayed instead of the default fields
                        (and the specified fields become the default
                        display for the test run). field can be
                        any mnemonic listed in the help menu.

                        The fields are displayed in the order in
                        which they are entered. The error
                        information is sorted from left to right.
                        Refer to sort(1).

            S,field=value[,field=value]
                        Displays only the records in which the
                        fields meet all of the associated value
                        restrictions. field can be any mnemonic
                        listed in the help menu. value is the
                        field assignment.

            H           Displays an associated help menu. The
                        mnemonics in the menu are used to select
                        fields for the field portion of the
                        preceding arguments.

-s start    Sets the start time and date of the report. Enter the
            -s option with one of the following required arguments:

            Argument          Description

            n                 End time and date of the report
                              (-e end) minus n days

            hh:mm,MM/DD/YY    Time (hours:minutes) and date
                              (month/day/year)

            hh:mm             Time (hours:minutes). The date is set
                              to the current date.

            MM/DD/YY          Date (month/day/year). The time is
                              set to 00:00.

            The default for start is the current time and date minus
            30 days.

-e end      Sets the end time and date of the report. The required
            argument must be in one of the following formats:

            Format            Description

            hh:mm,MM/DD/YY    Time (hours:minutes) and date
                              (month/day/year)

            hh:mm             Time (hours:minutes). The date is set
                              to the current date.

            MM/DD/YY          Date (month/day/year). The time is
                              set to 23:59.

            The default for end is the current time and date.

errfiles    Specifies the errfiles to be read. errfiles can be
            one or more files created by errdemon(1M). The default
            errfile is /usr/adm/errfile.

7.1.2  HELP MENUS

This subsection contains the menus to use in selecting the fields for the
field portion of the arguments associated with the -D, -M, -T,
and -V options.

Figures 7-1, 7-2, 7-3, and 7-4 show the Disk, Memory, Tape, and SSD Help
Menus, respectively.

dte) Date                     tme) Time
dtc) Dt-IOP-channel-unit      dt ) Device type
iop) IOP                      ios) IOS
cha) Channel                  et ) Error type
hd ) Head                     sct) Sector
sst) Spiralled sector         cyl) Cylinder
st ) Status                   ret) Retry
blk) Block                    sbk) Spiralled block
cs ) Control status           gs ) General status
df ) Disk function            s0 ) Status 00
s1 ) Status 01                s2 ) Status 02

s21) Status 21                s22) Status 22
s23) Status 23                ies) Initial error status
ecs) End controller status    eds) End drive status
edf) End disk function        fes) Final error status

                        DD49 only
a1 ) A1 - bit 5 of G.S.       a2 ) A2 - bit 6 of G.S.
b1 ) B1 - bit 7 of G.S.       b2 ) B2 - bit 8 of G.S.
aof) A-offset                 bof) B-offset
b2c) B2 correction mask       b1c) B1 correction mask
a2c) A2 correction mask       a1c) A1 correction mask
b2o) B2 offset                b1o) B1 offset
a2o) A2 offset                a1o) A1 offset
elm) Expected LMA             alm) Actual LMA

                        DD39 only
c0 ) C0 - bit 3 of G.S.       c1 ) C1 - bit 4 of G.S.
c2 ) C2 - bit 5 of G.S.       c3 ) C3 - bit 6 of G.S.
ofs) Offset                   sy0) Chan. 0 syndrome
sy1) Chan. 1 syndrome         sy2) Chan. 2 syndrome
sy3) Chan. 3 syndrome
c2c) C2 correction mask       c3c) C3 correction mask
c0c) C0 correction mask       c1c) C1 correction mask
c2o) C2 offset                c3o) C3 offset
c0o) C0 offset                c1o) C1 offset

Figure 7-1.  Disk Help Menu (1 of 2)

                        DD40 only
ibs) Initial buffer stat.     ids) Initial drive status
fbs) Final buffer status      fds) Final drive status
msk) DD40 correction mask     off) DD40 offset
dfa) Defect address           if0) Initial fault stat 0
if1) Initial fault stat 1     if2) Initial fault stat 2
if3) Initial fault stat 3     io0) Initial oper. stat 0
io1) Initial oper. stat 1     io2) Initial oper. stat 2
io3) Initial oper. stat 3     ic0) Initial FRU code 0
ic1) Initial FRU code 1       ic2) Initial FRU code 2
ic3) Initial FRU code 3       ef0) Ending fault stat 0
ef1) Ending fault stat 1      ef2) Ending fault stat 2
ef3) Ending fault stat 3      ec0) Ending FRU code 0
ec1) Ending FRU code 1        ec2) Ending FRU code 2
ec3) Ending FRU code 3        syn) Channel syndrome

                        DD29/DD19 only
cid) Cylinder from ID         csr) Cylinder status reg.
req) Request                  fsr) Fault status reg.
isr) Interlock stat. reg.     mrg) Margin
ofd) Offset direction         mgn) Magnitude
cv0) Correction vector 0      cv1) Correction vector 1
cv2) Correction vector 2      cv3) Correction vector 3

Figure 7-1.  Disk Help Menu (2 of 2)

dte) Date                     tme) Time
cnt) Count                    ity) Initial type
st ) Status                   sub) Subtype
mde) Mode                     cpu) CPU
syn) Syndrome                 chp) Chip-select
bnk) Bank                     rh ) Rh
add) Failing address          bit) Failing bit
loc) Failing module           usr) Current user
cmd) Current Command

Figure 7-2.  Memory Help Menu

dte) Date                     tme) Time
et ) Error type               st ) Status
ich) Initial channel          ios) IOS number
idp) Initial device path      ids) Initial device stat.
fch) Final channel            fdp) Final device path
fds) Final device stat.       ifn) Initial function
ffn) Final function           blk) Block
dns) Density                  ret) Retry
vol) Volume                   usr) User
cmd) Command                  ipt) Input tags
s0 ) SB00                     s1 ) SB01

s22) SB22                     s23) SB23

                        IBM 3480 only
s24) SB24                     s25) SB25
s26) SB26                     s27) SB27
s28) SB28                     s29) SB29
s30) SB30                     s31) SB31

Figure 7-3.  Tape Help Menu

dte) Date                     tme) Time
cha) Channel                  st ) Status
sad) SSD-Address              mad) MEM-Address
len) Length                   rwf) Read/write flag

Figure 7-4.  SSD Help Menu

7.1.3  PROGRAM EXAMPLES

This subsection contains olhpa execution examples. Depending on
whether errors are in the current error file, it may be necessary to
specify an error file. If you need assistance, contact your CRI
representative.

To display disk, tape, memory, and SSD error reports, enter the following:
olhpa

To display a disk error report, enter the following:
olhpa -d
To display a disk error report for an error file, enter the following:
olhpa -d errfile

To display the disk help menu, enter the following:
olhpa -D H

To display a disk error report for the date, time, head, and channel
fields only, enter the following:
olhpa -D P,DTE,TME,HD,CHA

To display a disk error report of only the records for which the channel
is equal to 26 and the IOP is equal to 2, enter the following:
olhpa -D S,CHA=26,IOP=2

The following example searches for disk errors for a specific channel and
IOP, and displays the associated error information in the specified
fields. The disk error report will display the following fields for only
the records for which the channel is equal to 26 and the IOP is equal
to 2: date, time, device type, general status, and A1, A2, B1, and B2 of
the general status. Enter the following:
olhpa -DS,CHA=26,IOP=2 -DP,DTE,TME,DT,GS,A1,A2,B1,B2

To display a bar graph showing yesterday's disk errors in 2-hour
intervals, enter the following (using yesterday's date for date):
olhpa -d -s date -e date -g 2
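
As a small illustration of the preceding example, the command line can be assembled from a date supplied in MM/DD/YY form. This is a sketch only: graph_cmd is a hypothetical helper, and it prints the command rather than running it so that the result can be checked first.

```shell
#!/bin/sh
# Illustrative helper: print the bar-graph command for a given date in
# MM/DD/YY form. graph_cmd is a hypothetical name, not part of olhpa.
graph_cmd() {
    day=$1
    echo "olhpa -d -s $day -e $day -g 2"
}

graph_cmd 03/01/88
```

On a system whose date command accepts the GNU -d option, `date -d yesterday +%m/%d/%y` could supply the argument. That is an assumption about the local date command and is not taken from this manual.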

7.1.4  SHELL SCRIPT GENERATION AND EXECUTION

Shell scripts let you generate and execute olhpa command sequences
easily.

The following example shows a shell script that generates a disk error
report for each disk drive for which errors are logged.

Example:

#
#   Shell script to report errors for each disk drive.
#   $1 is an optional error file name that is passed to olhpa.
#
echo "**************************************************************"
echo "        R E P O R T   O F   D I S K   E R R O R S"
echo
echo "   Only devices which logged errors will create reports."
echo
echo "**************************************************************"
#
#   List the Dt-IOP-channel-unit field of every logged disk error and
#   produce one report per distinct device.
#
for DEV in `olhpa -DPdtc $1 | awk '{print $1}' | uniq | grep '-'`
do
echo "**************************************************************"
echo "   $DEV"
echo "**************************************************************"
echo
olhpa -DSdtc=${DEV} $1
done
echo "**************************************************************"
echo "        E N D   O F   R E P O R T"
echo "**************************************************************"
#

Error report output from preceding shell script:

**************************************************************
        R E P O R T   O F   D I S K   E R R O R S

   Only devices which logged errors will create reports.

**************************************************************

**************************************************************
   40-1-34A
**************************************************************

Cray Hardware Performance Analyzer
Run time         10:26 03/02/88
Starting time    10:26 02/01/88
Ending time      10:26 03/02/88

Hardware Error Report For Disks
Restrictions:
   Dt-IOP-channel-unit = 40-1-34A

Date      Time      Errtyp  DT/IOP/CHA  HD  Sect  Cyl   Gen-Stat  Status
88/02/26  03:10:51  Read    40-1-34A    00  0000  0013  011426    Corre.
88/02/26  03:37:30  Read    40-1-34A    00  0000  0013  011426    Corre.
88/02/26  04:46:23  Read    40-1-34A    00  0000  0013  011426    Recov.

88/03/01  04:14:00  Read    40-1-34A    00  0000  0011  011411    Recov.
88/03/01  04:14:13  Read    40-1-34A    00  0000  0011  011411    Recov.
88/03/01  06:24:40  Read    40-1-34A    00  0000  0013  011426    Corre.

Total Disk Errors            30
Recovered Disk Errors        12
Corrected Disk Errors        18
Unrecovered Disk Errors       0
Uncorrected Disk Errors       0
Total Retries                70

**************************************************************
   40-2-34A
**************************************************************

Cray Hardware Performance Analyzer
Run time         10:26 03/02/88
Starting time    10:26 02/01/88
Ending time      10:26 03/02/88

Hardware Error Report For Disks
Restrictions:
   Dt-IOP-channel-unit = 40-2-34A

Date      Time      Errtyp  DT/IOP/CHA  HD  Sect  Cyl   Gen-Stat  Status
88/02/26  05:48:17  Read    40-2-34A    00  0000  0007  011433    Corre.
88/02/26  06:01:49  Read    40-2-34A    00  0000  0003  011442    Corre.

Total Disk Errors             2
Recovered Disk Errors         0
Corrected Disk Errors         2
Unrecovered Disk Errors       0
Uncorrected Disk Errors       0
Total Retries                 6

Error information for all drives for which errors are logged is
displayed.

**************************************************************
        E N D   O F   R E P O R T
**************************************************************
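
The device list that drives the loop in the example script comes from a short pipeline: take the first field of each report line, collapse adjacent duplicates, and keep only entries that contain a hyphen. The sketch below runs that pipeline on invented sample lines so that it can be tried without olhpa. The sample input and the device_list name are assumptions, and real olhpa -DPdtc output may be laid out differently.

```shell
#!/bin/sh
# Stand-in for the device-list pipeline in the example script. The
# sample input lines are invented for illustration; real olhpa -DPdtc
# output may be laid out differently.
device_list() {
    # Keep the first field, collapse adjacent duplicates, and keep only
    # entries containing a hyphen (drops headings and blank lines).
    awk '{print $1}' | uniq | grep -e '-'
}

printf '%s\n' 'Dt/IOP/CHA' '40-1-34A' '40-1-34A' '40-2-34A' | device_list
```

Because uniq removes only adjacent duplicates, the approach relies on the report being sorted on the selected field, which is the behavior this section describes for field reports. The -e option is used here only to make the hyphen pattern unambiguous to grep.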

7.1.5  PROGRAM MESSAGES

If an invalid or nonexistent command option is entered, olhpa displays
the incorrect entry and the complete program synopsis.

If an invalid or nonexistent error file is entered, the following message
is displayed:

olhpa:  Cannot open file

In an error report, a field can contain the following symbols:

Symbol    Description

N/A       No information was recorded in the system error log.

(x)       No information was recorded in the system error log. The
          field is specific to device type x.

7.2  runsequence

The runsequence utility is used with the crontab(1) command to
perform automatic test sequencing (scheduling and testing without
operator intervention). Error messages are returned to specified users
through the UNICOS mail. This alerts field engineers and analysts that
there is an error. They can then examine the error log to determine
where the error occurred. The goal is to detect and isolate failures
before a system or application failure occurs.
To initiate automatic test sequencing, do the following:

1.  Set the shell variables in the runsequence shell script.
2.  Create the sequence files.
3.  Create the input file for the crontab(1) command.
4.  Execute the crontab(1) command.

After being called from the crontab(1) input file, runsequence
reads a file containing a list of diagnostics and related command
options, executes the diagnostics (one at a time), and saves any output
in a file.
After each diagnostic in the sequence file is executed, runsequence
determines the number of lines of output generated, as follows:

•  If there are more than five lines of output, runsequence assumes
   that the diagnostic detected an error and sends specified users a
   message.

•  If no error is detected but standard error output is generated,
   runsequence sends specified users a message.

•  If no error is detected, the output files from the diagnostic are
   removed.
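
The three-way decision described above can be sketched as a small shell function. This is only an illustration of the rule as stated, not the actual runsequence source; the function name classify_output and its two file arguments are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of the output check; not the real runsequence code.
# $1 = file holding the diagnostic's standard output
# $2 = file holding the diagnostic's standard error output
classify_output() {
    # Arithmetic expansion strips any padding that wc may print.
    lines=$(( $(wc -l < "$1") ))
    if [ "$lines" -gt 5 ]; then
        echo error      # more than five lines: assume the diagnostic failed
    elif [ -s "$2" ]; then
        echo stderr     # no error, but standard error output was generated
    else
        echo clean      # nothing to report: the output files can be removed
    fi
}
```

A caller could mail the specified users for the first two results and delete both files for the third. Because the check depends only on output size, it also shows why the diagnostics need to be run with verbose output disabled.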

7.2.1

crontab INPUT FILE

The crontab(1) input file contains the following information:

•  Times at which the sequences are to be run
•  Calls to runsequence

When defining the crontab(1) input file, you must include calls to
runsequence. Each call to runsequence must contain an appropriate
sequence file name and, optionally, a CPU designator. For additional
information on the crontab(1) command, refer to the UNICOS User
Commands Reference Manual, CRI publication SR-2011.


runsequence synopsis:

     runsequence seqfile [cpu]

seqfile   Indicates the name of the file containing the sequence of
          diagnostics to be run, the diagnostic command options, and
          any comments. The comments are the same as shell script
          comments; they start with a pound sign (#) and continue to
          the end of the line.

cpu       Indicates the CPU in which the diagnostics are to be run.
          cpu can be a, b, c, d, e, f, g, or h. If the cpu
          option is specified, the diagnostics in the sequence file
          must be CPU tests. All the log and core files are placed
          in a subdirectory of the DIAGLOG directory, which is
          created if it does not already exist.

          If the cpu option is not specified, the diagnostic uses
          the default value or you can specify the CPU option for the
          diagnostic in the sequence file. All log and core files
          are placed in the DIAGLOG directory instead of a
          subdirectory.
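
As a rough sketch of how a seqfile in this format could be read, the function here strips shell-style comments and blank lines and prints one diagnostic command per line. The name list_diagnostics is hypothetical, and the real runsequence script may parse its input differently.

```sh
#!/bin/sh
# Hypothetical helper: print the diagnostic commands in sequence file $1,
# dropping '#' comments, trailing blanks, and empty lines.
list_diagnostics() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}
```

This simple approach treats every # as the start of a comment, so it would not suit a command option that contains a literal pound sign.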

The following example shows a sample crontab(1) input file:

# Run in a different cpu every 15 minutes
1 * * * * $HOME/scripts/runsequence hourlyseq a
15 * * * * $HOME/scripts/runsequence hourlyseq b
30 * * * * $HOME/scripts/runsequence hourlyseq c
45 * * * * $HOME/scripts/runsequence hourlyseq d
1 * * * * $HOME/scripts/runsequence sbtseq a,b,c,d
15 * * * * $HOME/scripts/runsequence sbtseq b,c,d,a
30 * * * * $HOME/scripts/runsequence sbtseq c,d,a,b
45 * * * * $HOME/scripts/runsequence sbtseq d,a,b,c
# Run at midnight each day
1 0 * * 0-6 $HOME/scripts/runsequence dailyseq a
1 0 * * 0-6 $HOME/scripts/runsequence dailyseq b
1 0 * * 0-6 $HOME/scripts/runsequence dailyseq c
1 0 * * 0-6 $HOME/scripts/runsequence dailyseq d
1 0 * * 0-6 FSPATH=/tmp DT=DD49 $HOME/scripts/runsequence cfdtseq
1 0 * * 0-6 FINDPATH=$HOME/log $HOME/scripts/findseq

The minute field is set to 1 to offset the diagnostic program execution
to one minute after the hour. This allows scheduled system activities to
be performed at the start of each hour.


7.2.2

SEQUENCE FILES

The sequence files contain a list of the diagnostics to be executed and
their related command options. You must place these files in the
directory specified by the DIAGSCRIPTS shell variable. Before creating
sequence files, refer to appendix B, Test Execution Times.
The following example shows the recommended sequence files for the
crontab(1) input file.

Example:

hourlyseq:
# Run the following sequence once every 15 minutes in a different CPU.
olcrit cputime 0:0:30 +getseed   # Read seed from olcrit.seed if available
olcsvc cputime 0:0:30 +getseed   # Read seed from olcsvc.seed if available
olibuf cputime 0:0:30 +getseed   # Read seed from olibuf.seed if available
olcfpt cputime 0:0:30 +getseed   # Read seed from olcfpt.seed if available
olcm cputime 0:0:30 +getseed     # Read seed from olcm.seed if available

dailyseq:
# Run the following sequence once a day.
olcrit cputime 0:6:0 +getseed    # Read seed from olcrit.seed if available
olcsvc cputime 0:6:0 +getseed    # Read seed from olcsvc.seed if available
olibuf cputime 0:6:0 +getseed    # Read seed from olibuf.seed if available
olcfpt cputime 0:6:0 +getseed    # Read seed from olcfpt.seed if available
olcm cputime 0:6:0 +getseed      # Read seed from olcm.seed if available

sbtseq:
#
# sbtseq: This sequence tests olsbt in all cpus available
# it should be run once every 15 minutes.
#
olsbt cputime 30 +getseed

cfdtseq:
# Run the following sequence to test a mass storage device.
olcfdt maxp 50 fn $FSPATH/workfil.$$ rsz 512 sz 250 dt $DT
find $FSPATH -name 'workfil.*' -user $LOGNAME -exec rm -f {} \;


findseq:
#
# findseq: This sequence finds and removes any small log files
# or stderr files that the runsequence created.
#
TOO_OLD=180     # Number of days to save log files
#FPATH          # Path to log files. default FPATH=$HOME/log in cronfile

find $FPATH \( \( -name '*.[0-9]*[0-9]' -size -300c \) -o -name \
'stderr.*' \) -atime +0 -type f -exec rm -f {} \; 2>/dev/null 1>&2
#
# Remove any log file that has not been touched recently
#
find $FPATH -name '*.[0-9]*[0-9]' -type f -atime +$TOO_OLD \
-exec rm -f {} \;

Each site must determine if additional testing is desirable.

7.2.3

runsequence SHELL SCRIPT

The runsequence shell script runs under the Bourne shell and executes a
series of diagnostics by reading a file containing a list of the
diagnostics to be run. The diagnostics should be run with the verbose
option disabled (-verbose), because the size of each diagnostic output
file is used to determine if the diagnostic has failed.
The shell script maintains the diagnostic output and sends messages to a
specified list of users when an error is detected. You can set the
following variables in the runsequence shell script:
DIAGBIN=path
          Indicates the full path name of the directory where the
          executable binaries of the diagnostics reside. If the
          binaries reside in more than one directory, separate the
          directories with colons. The following entry defines a
          single directory:

               DIAGBIN=/ce/bin

          The following entry defines several directories:

               DIAGBIN=/ce/bin:$HOME/bin

DIAGLOG=path
          Indicates the full path name of the directory where the log
          files are saved when a diagnostic detects an error.

DIAGSCRIPTS=path
          Indicates the path name where the sequence files reside.
          You can specify only one full path name.

MAILLIST="user ... user"
          Provides a list of users to be notified when a diagnostic
          detects an error. Enter a space between each user name and
          enclose the list in double quotes. It is recommended that
          the list contain more than one user name.

NICE=n
          Indicates the amount by which the diagnostic's priority
          in the execution queue is to be lowered. n can be any
          integer within the range 1 through 19. If a value greater
          than 19 is entered, it is processed as if it were 19. If a
          value less than 0 is entered, it has no effect.

RUNLOG=logfile
          Indicates the name of the log file containing information
          on the sequence being run and any errors detected. The log
          file resides in the DIAGLOG directory.

SAVECORE=ON|OFF
          Enables (ON) or disables (OFF) the option that renames
          and saves each core file generated. If SAVECORE is set
          to OFF, any new core file overwrites an existing one.
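
Two of the rules above lend themselves to a short sketch: searching a colon-separated DIAGBIN for a diagnostic binary, and limiting the NICE value. The helper names find_diag and effective_nice are hypothetical and do not come from the runsequence script; treating a negative value as an increment of 0 is one reading of "has no effect".

```sh
#!/bin/sh
# Illustrative only: find_diag and effective_nice are hypothetical helpers,
# not part of the runsequence script itself.

# Print the first executable named $1 found in the colon-separated DIAGBIN.
find_diag() {
    old_ifs=$IFS
    IFS=:
    for dir in $DIAGBIN; do
        if [ -x "$dir/$1" ]; then
            IFS=$old_ifs
            echo "$dir/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Map a requested NICE value to the increment that would be applied.
effective_nice() {
    if [ "$1" -gt 19 ]; then
        echo 19          # values above 19 are processed as 19
    elif [ "$1" -lt 0 ]; then
        echo 0           # negative values have no effect (modeled as 0)
    else
        echo "$1"
    fi
}
```

With DIAGBIN=/ce/bin:$HOME/bin, find_diag olcm would report /ce/bin/olcm if that file exists and is executable, and would otherwise fall back to $HOME/bin/olcm.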
The default values for the variables in the runsequence shell script
are as follows:

DIAGBIN=/ce/bin              # Location of the executable diagnostics
DIAGLOG=$HOME/log            # Location of the diagnostic log files
DIAGSCRIPTS=$HOME/scripts    # Location of the diagnostic sequence lists
RUNLOG=$DIAGLOG/runlog       # Program log
NICE=4                       # Lower the diagnostic's priority by this amount
SAVECORE=OFF                 # Existing core file will be overwritten
MAILLIST="$LOGNAME"          # List of people to receive error messages

APPENDIX SECTION

A.

ON-LINE DIAGNOSTIC PROGRAMS

This appendix lists and briefly describes the following types of on-line
diagnostic programs:

•  Confidence tests
•  Maintenance tests
•  Down-device programs
•  Network communications test (olnet)
•  I/O Subsystem (IOS) deadstart programs
•  Utilities
•  offmon tests

The on-line diagnostic programs listed in this section are supported on
the following computer systems:

•  CEA systems
   Y-mode (32-bit addressing)

•  CRAY X-MP and CRAY-1 computer systems

A.1

CONFIDENCE TESTS

Table A-1 briefly describes each on-line confidence test.

Table A-1.  Confidence Tests

Test      Description                                 Language
========  ==========================================  ========
olcfdt    Mass storage device test                    CFT77
olcfpt    Comprehensive floating-point test           CAL 2
olcm      Central memory test                         CAL 2
olcrit    Comprehensive random instruction test       CAL 2
olcsvc    Comprehensive scalar/vector compare test    CAL 2
olibuf    Instruction buffer test                     CAL 2
olsbt     Semaphore, shared B and T register test     CAL 2

A.2

MAINTENANCE TESTS

Table A-2 briefly describes each on-line CPU maintenance test.

NOTE
The CPU Maintenance Tests are supported for CX/CEA
systems in X-mode only.

Table A-2.  CPU Maintenance Tests

Test        Description                                    Language
==========  =============================================  ========
olaht       A register indexing test                       CAL 2
olarb       A register data test                           CAL 2
olarm       A register multiply test                       CAL 2
olbrb       B register basic data test                     CAL 2
olcmd†      Random instruction and operand test            CAL 2
olcmp††     Vector compress instruction test               CAL 2
olcmx†††    Random instruction and operand test            CAL 2
olgth††     Scatter/gather test                            CAL 2
olibz†††    Instruction buffer test                        CAL 2
olmit       Moving inversions memory test                  CAL 2
olsfa       Simulate floating-point add test               CAL 2
olsfm       Simulate floating-point multiply test          CAL 2
olsfr       Simulate floating-point reciprocal             CAL 2
olsis       Scalar register instruction simulation test    CAL 2
olsr3       Random instruction issue register conflicts    CAL 2
olsra       Scalar register add test                       CAL 2
olsrb       Scalar register basic test                     CAL 2
olsrl       Scalar register logical test                   CAL 2
olsrs       Scalar register shift test                     CAL 2
olstan      Standard answer functional units test          CAL 2
olsvc       Scalar and vector compare test                 CAL 2
oltrb       T register basic data test                     CAL 2
olvpop†     Vector population count test                   CAL 2
olvpp††     Vector population count test                   CAL 2
olvra       Vector register add test                       CAL 2
olvrl       Vector register logical test                   CAL 2
olvrn       Vector register random test                    CAL 2
olvrr       Vector register random length test             CAL 2
olvrs       Vector register shift test                     CAL 2
olvrz††     Vector register stress test                    CAL 2

†    CRAY-1 computer systems only
††   CEA (X-mode) and CRAY X-MP computer systems only
†††  CRAY X-MP EA (X-mode) and CRAY X-MP computer systems only

A.3

DOWN-DEVICE PROGRAMS

Table A-3 briefly describes the down-device programs, which reside on
DIAGPL.

Table A-3.  Down-Device Programs

Test      Description                         Language
========  ==================================  ===========
donut     On-line disk maintenance program    CFT77, C
oldmon†   Down CPU monitor                    C & CAL 2
bmxtap    On-line magnetic tape test          C & CAL 2

†  Multiple CPU Cray computer systems only

Tables A-4 and A-5 briefly describe the down CPU tests, which reside on
XMPPL and execute under oldmon, the down CPU monitor. These tests run
on CRAY X-MP computer systems in multiple-CPU environments only
(CRAY X-MP/4 and CRAY X-MP/2 computer systems).

Table A-4.  Down CPU Confidence Tests

Test      Description                                 Language
========  ==========================================  ========
offcfpt   Comprehensive floating-point test           CAL 2
offcm     Central memory test                         CAL 2
offcrit   Comprehensive random instruction test       CAL 2
offcsvc   Comprehensive scalar/vector compare test    CAL 2
offibuf   Instruction buffer test                     CAL 2

Table A-5.  Down CPU Maintenance Tests

Test      Description                                    Language
========  =============================================  ========
aht       A register indexing test                       CAL 2
arb       A register data test                           CAL 2
arm       A register multiply test                       CAL 2
brb       B register basic data test                     CAL 2
cmp       Vector compress instruction test               CAL 2
cmx       Random instruction and operand test            CAL 2
gth       Scatter/gather test                            CAL 2
ibz       Instruction buffer test                        CAL 2
mit       Moving inversions memory test                  CAL 2
sfa       Simulate floating-point add test               CAL 2
sfm       Simulate floating-point multiply test          CAL 2
sfr       Simulate floating-point reciprocal             CAL 2
sis       Scalar register instruction simulation test    CAL 2
sr3       Random instruction issue register conflicts    CAL 2
sra       Scalar register add test                       CAL 2
srb       Scalar register basic test                     CAL 2
srl       Scalar register logical test                   CAL 2
srs       Scalar register shift test                     CAL 2
stan      Standard answer functional units test          CAL 2
svc       Scalar and vector compare test                 CAL 2
trb       T register basic data test                     CAL 2
vpp       Vector population count test                   CAL 2
vra       Vector register add test                       CAL 2
vrl       Vector register logical test                   CAL 2
vrn       Vector register random test                    CAL 2
vrr       Vector register random length test             CAL 2
vrs       Vector register shift test                     CAL 2
vrz       Vector register stress test                    CAL 2

A.4

ON-LINE NETWORK COMMUNICATIONS PROGRAM

Table A-6 briefly describes the Cray-to-front end communications test,
olnet.

Table A-6.  On-line Network Communications Program

Test      Description                                   Language
========  ============================================  ==========
olnet     Cray-to-front end communications test         CFT77 & C†
          (exercises all or part of the path between
          a Cray mainframe and a front end)

†  Motorola Operator Workstation (OWS) and Maintenance Workstation
   (MWS) only

The olnet test is described in the On-line Diagnostic Network
Communications Program (OLNET) Maintenance Manual, CRI publication
SMM-1016.

A.5

I/O SUBSYSTEM DEADSTART PROGRAMS

Table A-7 briefly describes the I/O Subsystem (IOS) deadstart programs,
which reside on DIAGPL. The cleario program is executed independently
from the other programs listed. The dsdiag program, the IOS deadstart
diagnostic control program, loads and executes all of the programs
(except cleario) from a diagnostic overlay file, after first executing
a series of basic IOP-0 tests.

Table A-7.  I/O Subsystem Deadstart Programs

Program    Description                                      Language
=========  ===============================================  ========
cleario    Attempts to clear the IOS if the deadstart       APML
           procedure fails
dsdiag     Deadstart diagnostic control program             APML
dshsp      High-speed channel test from an I/O processor    APML
           (IOP) to central memory or to an SSD
           solid-state storage device
dsiom      Local memory addressing and data test for        APML
           each IOP
dsiop      Instruction test for each IOP                    APML
dslsp      Low-speed channel test from IOP-0 to central     APML and
           memory                                           CAL 1
dsmos      Buffer memory addressing and data path test      APML
           for each IOP
dsmos16k   Test of the lower 16 Kbytes of buffer memory     APML
           from IOP-0 only

A.6

UTILITY PROGRAMS

Table A-8 briefly describes each on-line utility program.

Table A-8.  Utility Programs

Utility       Description                      Language
===========   =============================    ============
olhpa         Hardware performance analyzer    C
runsequence   Diagnostic sequencer utility     Shell script

A.7

offmon TESTS

Table A-9 briefly describes each offmon test.

Table A-9.  offmon Tests

Confidence
Test        Description                                 Language
==========  ==========================================  ========
offcfpt     Comprehensive floating-point test           CAL 2
offcm       Central memory test                         CAL 2
offcrit     Comprehensive random instruction test       CAL 2
offcsvc     Comprehensive scalar/vector compare test    CAL 2
offibuf     Instruction buffer test                     CAL 2

B.

TEST EXECUTION TIMES

This appendix lists the execution times for the following types of
on-line diagnostic tests:

•  Confidence
•  Maintenance

The tests were run at Cray Research, Inc. during normal workday
operations, using a default pass count of 512 (O'1000). The times are
for test execution in a single CPU of a CRAY X-MP computer system and
cannot be extrapolated to determine execution times for multiple CPU runs.
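
The pass count is written in both decimal and octal: 1000 in octal is 1 x 8^3 = 512. As a quick check, a POSIX shell reads a leading 0 in arithmetic expansion as octal; the helper name octal_to_decimal is hypothetical.

```sh
#!/bin/sh
# Convert an octal value such as 1000 to decimal.
# A leading 0 marks an octal constant in shell arithmetic.
octal_to_decimal() {
    echo $((0$1))
}

octal_to_decimal 1000    # prints 512
```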

NOTE
The execution times may vary depending on system load,
and should not be used for CPU or benchmark comparisons.

In the test execution tables, the following times are listed in the
headings:

Time       Description

Elapsed    Wall-clock time
User       CPU time
System     System overhead time

B.1

EXECUTION TIMES FOR CONFIDENCE TESTS

Table B-1 lists the execution times for the confidence tests. Each test
was run with a pass count of 512 (O'1000).

Table B-1.  Execution Times for Confidence Tests

Test      Elapsed Time†   User Time   System Time
========  =============   =========   ===========
olcm      65.00 s         34.25 s     0.88 s
olcfpt    23.00 s         7.15 s      0.47 s
olcrit    15.00 s         7.55 s      0.28 s
olcsvc    12.00 s         4.27 s      0.21 s
olibuf    78.00 s         21.00 s     0.11 s
olsbt††   4.66 s          2.29 s      1.43 s

†   Execution times may be reduced or increased by the use of
    test-specific options.
††  Times are for test execution with four CPUs (cpu a,b,c,d)

B.2

EXECUTION TIMES FOR MAINTENANCE TESTS

Table B-2 lists the execution times for the maintenance tests. Each test
was run with a pass count of 512 (O'1000) except olibz and olsfm;
these tests were run for fewer than 512 (O'1000) passes, and their
respective execution times were then used to extrapolate elapsed, user,
and system times for 512 passes.
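
The manual does not say how the extrapolation was done. A minimal sketch, assuming execution time grows linearly with the pass count, scales a measured time by 512 divided by the number of passes actually run. The function name extrapolate and the sample figures are illustrative only, not measurements from this manual.

```sh
#!/bin/sh
# Hypothetical linear extrapolation to a 512-pass run.
# $1 = measured time in seconds, $2 = number of passes actually run
extrapolate() {
    awk -v t="$1" -v p="$2" 'BEGIN { printf "%.2f\n", t * 512 / p }'
}

extrapolate 10 64    # prints 80.00 (10 s for 64 passes scales to 80 s)
```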

Table B-2.  Execution Times for Maintenance Tests

Test         Elapsed Time   User Time   System Time
===========  ============   =========   ===========
olaht        10.03 s        2.24 s      0.08 s
olarb        0.74 s         0.11 s      0.01 s
olarm        21.10 m        15.95 m     17.35 s
olbrb        0.69 s         0.24 s      0.01 s
[illegible]  7.10 s         2.92 s      0.04 s

†   CRAY-1 computer systems only
††  CEA (X-mode) and CRAY X-MP computer systems only

Table B-2.  Execution Times for Maintenance Tests (continued)

Test         Elapsed Time   User Time   System Time
===========  ============   =========   ===========
olcmx†       25.35 s        2.49 s      0.1
olgth††      15.11 s        7.41 s      0.12 s
olibz†       6.74 h         1.62 h      1.25 m
olmit        1.61 m         42.12 s     1.58 s
olsfa        9.39 s         7.95 s      0.17 s
olsfm        117.0 h        14.3 h      12.64 m
olsfr        8.02 m         6.33 m      5.77 s
olsis        0.46 s         0.02 s      0.01 s
olsr3        0.46 s         0.18 s      0.01 s
olsra        0.96 s         0.70 s      0.04 s
olsrb        1.00 s         0.34 s      0.02 s
olsrl        1.96 s         0.05 s      0.01 s
olsrs        20.64 s        18.04 s     0.37 s
olstan       0.31 s         0.21 s      0.01 s
olsvc        0.35 s         0.17 s      0.01 s
oltrb        6.07 s         5.13 s      0.12 s
olvpop†††    0.73 s         0.57 s      0.02 s
olvpp††      0.84 s         0.62 s      0.01 s
olvra        0.82 s         0.68 s      0.02 s
olvrl        0.87 s         0.59 s      0.01 s

†    CRAY X-MP EA (X-mode) and CRAY X-MP computer systems only
††   CEA (X-mode) and CRAY X-MP computer systems only
†††  CRAY-1 computer systems only

Table B-2.  Execution Times for Maintenance Tests (continued)

Test         Elapsed Time   User Time   System Time
===========  ============   =========   ===========
olvrn        0.23 s         0.12 s      0.01 s
olvrr        0.28 s         0.12 s      0.01 s
olvrs        26.3 s         17.34 s     0.36 s
olvrz†       2.86 m         2.83 m      1.44 s

†  CEA (X-mode) and CRAY X-MP computer systems only

C.

ON-LINE DIAGNOSTIC PROGRAM LIBRARIES

This appendix describes the on-line diagnostic program libraries (PLs)
and their contents and associated decks. The on-line diagnostic PLs are
as follows:

PL         Description

DIAGPL     Contains on-line diagnostic programs that execute on
           CX/CEA and CRAY-1 computer systems

XMPPL      Contains diagnostic programs that execute on CX/CEA
           systems

CRAY1PL    Contains diagnostic programs that execute on a CRAY-1
           computer system

Each deck contains source code that is used to generate a binary.

C.1

DIAGPL

DIAGPL contains on-line diagnostic programs that execute on CX/CEA and
CRAY-1 computer systems. The contents of DIAGPL are as follows:

Program       Deck

bmxtap        BMXTAP
cleario       CLEARIO
donut         DONUT
dsdiag        DSDIAG, DSDIAGD, DSMOS16K, DSIOM, DSIOP, DSMOS, DSHSP, DSLSP
olcm          OLCM
olcfdt        OLCFDT
olcfpt        OLCFPT
olcrit        OLCRIT
olcsvc        OLCSVC
oldmon        OLDMON
olhpa         OLHPA
olibuf        OLIBUF
olnet         OLNET
olsbt         OLSBT
runsequence   RUNSEQ

C.2

XMPPL

XMPPL contains diagnostic programs that execute on CX/CEA systems. The
contents of XMPPL are as follows:

Program    Deck

olaht      AHT
olarb      ARB
olarm      ARM
olbrb      BRB
olcmp      CMP
olcmx      CMX
olgth      GTH
olibz      IBZ
olmit      MIT
olsfa      SFA
olsfm      SFM
olsfr      SFR
olsis      SIS
olsr3      SR3
olsra      SRA
olsrb      SRB
olsrl      SRL
olsrs      SRS
olstan     STAN
olsvc      SVC
oltrb      TRB
olvpp      VPP
olvra      VRA
olvrl      VRL
olvrn      VRN
olvrr      VRR
olvrs      VRS
olvrz      VRZ

C.3

CRAY1PL

CRAY1PL contains diagnostic programs that execute on CRAY-1 computer
systems. The contents of CRAY1PL are as follows:

Program    Deck

olaht      AHT
olarb      ARB
olarm      ARM
olbrb      BRB
olcmd      CMD
olmit      MIT
olsfa      SFA
olsfm      SFM
olsfr      SFR
olsis      SIS
olsr3      SR3
olsra      SRA
olsrb      SRB
olsrl      SRL
olsrs      SRS
olstan     STAN
olsvc      SVC
oltrb      TRB
olvpop     VPOP
olvra      VRA
olvrl      VRL
olvrn      VRN
olvrr      VRR
olvrs      VRS

D.

SOFTWARE PROBLEM REPORTING

This appendix describes the on-line diagnostic software problem reporting
procedure.
The on-line diagnostics are released as part of the operating system
software. To report problems with or request changes to the on-line
diagnostic software, send the information electronically to the automated
Software Technical Support database, or send a Software Problem Report
(SPR) form to the Software Technical Support department.
Figure D-1 shows an SPR form.
You can order these forms from the CRI
Distribution Center. For additional SPR information, refer to the
Software Problem Report (SPR) User's Guide, CRI publication SD-0235.


[Figure D-1. SPR Form. A three-copy Software Problem Report form
(white: CRI file; blue: SPR coordinator; pink: AIC) with fields for name,
phone, mainframe, site code, date, software versions and prerelease
status, on-site analyst's signature, title of problem, supporting dump,
listing, and job, SPR description, and whether corrective code and a
test case are supplied. Completed forms are sent to Cray Research, Inc.,
1345 Northland Drive, Mendota Heights, MN 55120.]

E.

SYSTEM UTILITIES

This appendix briefly describes the UNICOS system utilities that have
been identified as effective diagnostic tools. These utilities are as
follows:
Utility       Description

dda(1)        The dda command (dynamic dump analyzer) allows you
              to examine the contents of a program memory dump.

icrash(1M)    The icrash command allows you to examine the I/O
              Subsystem (IOS) core image.

If you know of other system utilities that should be mentioned in this
appendix, please use one of the following options to forward the
information to the Technical Publications department:

•  Call our Technical Publications department at (612) 681-5729
   during the hours of 7:30 A.M. to 6:00 P.M. (Central Time).

•  Send us electronic mail from a UNICOS or UNIX system, using the
   following UUCP addresses:

        uunet!cray!publications
        sun!tundra!hall!publications

•  Send us electronic mail from a UNICOS or UNIX system, using the
   following ARPAnet address:

        publications@cray.com

•  Send a facsimile of your comments to the attention of
   "Publications" at FAX number:

        (612) 681-5602

•  Use the postage-paid Reader's Comment form at the back of this
   manual.

•  Write to us at the following address:

        Cray Research, Inc.
        Technical Publications Department
        1345 Northland Drive
        Mendota Heights, Minnesota 55120

We value your comments and will respond to them promptly.

F.

SITE COMMUNICATIONS

This appendix describes on-line diagnostic field support. This support
includes the following:

•  On-line diagnostic error dumps analysis

•  On-line diagnostic formatted error output analysis

•  On-line diagnostic installation, usage, and availability
   information

Please use one of the following options to forward inquiries to the
On-line Diagnostic department:

•  Call our On-line Diagnostic department at (612) 681-5642 during
   the hours of 8:00 A.M. to 5:00 P.M. (Central Time). From 5:00
   P.M. to 8:00 A.M., you can leave a recorded message. Include the
   following information in your message:

        Your name
        Telephone number
        Site identification
        Operating system/release level
        On-line diagnostic release
        Failing on-line diagnostic
        Description of the problem

•  Send us electronic mail from a UNICOS or UNIX system, using the
   following electronic mail address:

        oldiag@Crayamid

•  Write to us at the following address:

        Cray Research, Inc.
        On-line Diagnostic Department
        1345 Northland Drive
        Mendota Heights, Minnesota 55120

G.

INSTALLATION INFORMATION

Typically, the on-line diagnostics are installed as part of the system
installation procedure documented in the UNICOS System Installation
Bulletin (SIB). If you need to re-install the on-line diagnostics
subsequent to system installation, a different procedure must be used.
This appendix describes how to install the on-line diagnostics after
system installation. The following topics are discussed:

•  On-line diagnostic directories

•  Generating on-line diagnostic binaries and listings

•  Saving off-line versions of on-line confidence tests and I/O
   Subsystem (IOS) deadstart programs

•  Generating olnet

•  Deleting proprietary source code

G.1

ON-LINE DIAGNOSTIC DIRECTORIES

The on-line diagnostics are located in the following directories:

Directory        Description

/usr/src/diag    Source code

/ce/bin          On-line diagnostic binaries

/ce/oldmon       Off-line diagnostic binaries for oldmon

/ce/olnet        olnet source code for front-end computer systems

/ce/scripts      runsequence scripts

/ce/log          Log directory for runsequence

/ce/ios          IOS deadstart programs for single IOS systems

/ce/iosa         IOS deadstart programs for two IOS systems
/ce/iosb

G.2

GENERATING ON-LINE DIAGNOSTIC BINARIES

Perform the following steps to generate on-line diagnostic binaries:

1.  Load the on-line diagnostic tape. This tape is normally included
    with the UNICOS release package. If necessary, you can order
    another copy from the CRI Distribution Center.

2.  Enter the following commands to execute the Makefile:

         cd /usr/src/diag
         update -p diagpl -q DIAGMAKE -c diag -a m
         mv diag.m diag.mk
         make -f diag.mk install SN=xxxx

    xxxx is your mainframe's serial number.

G.3

GENERATING ON-LINE DIAGNOSTIC LISTINGS

To generate the on-line diagnostic listings, enter the following commands:

     cd /usr/src/diag
     make -f diag.mk listings

                              NOTE

     The listings include all on-line diagnostic test
     listings, off-line versions of CPU on-line test
     listings, and IOS deadstart and cleario test listings.

The diagnostic listings are CRAY PROPRIETARY. Print the listings or
write them to tape; do not keep the listings on-line.

G.4

SAVING OFF-LINE VERSIONS OF ON-LINE CONFIDENCE TESTS

This section describes where to save off-line versions of on-line
confidence tests for Maintenance Workstation-based (MWS-based) systems
running the Cray Maintenance System (CMS) or expander-based systems
running DSS.

G.4.1

MWS-BASED SYSTEMS RUNNING CMS

Enter the following commands to copy the off-line confidence diagnostics
to the MWS:

     rcp /ce/oldmon/offcrit mws:/CPUDIR
     rcp /ce/oldmon/offcsvc mws:/CPUDIR
     rcp /ce/oldmon/offcfpt mws:/CPUDIR
     rcp /ce/oldmon/offibuf mws:/CPUDIR
     rcp /ce/oldmon/offcm mws:/CPUDIR

CPUDIR is the directory on the MWS where the CPU off-line diagnostics
reside. mws is the hostname for the MWS.

G.4.2

EXPANDER-BASED SYSTEMS RUNNING DSS

1.  Enter the following commands to write the off-line confidence
    diagnostics to a scratch tape:

         extd -o -r -n 0 < /ce/oldmon/offcrit
         extd -o -r -n 1 < /ce/oldmon/offcsvc
         extd -o -r -n 2 < /ce/oldmon/offcfpt
         extd -o -r -n 3 < /ce/oldmon/offibuf
         extd -o -n 4 < /ce/oldmon/offcm

                              NOTE

    Steps 2 and 3 cannot be performed while the operating
    system is running. Perform these steps the next time
    you shut down your system.

2.  Copy the diagnostics to the off-line expander pack under FNT 4. To
    copy the diagnostics from the tape that was just written, enter the
    following commands under DSS:

         READ @ SCRIT 4
         READ @ SCSVC 4
         READ @ SCFPT 4
         READ @ SIBUF 4
         READ @ SCM 4

3.  These off-line diagnostics are dependent on the latest off-line IOPPL
    release P2.0. This release of the Cray Maintenance Operating System
    (CMOS) allows diagnostics larger than 6000 words to be loaded and
    deadstarted. To load and execute these diagnostics, use the CMOS
    command DS L.

G.5

SAVING I/O SUBSYSTEM (IOS) DEADSTART PROGRAMS

This section describes where to save I/O Subsystem (IOS) deadstart
programs for Operator Workstation (OWS), expander tape, or expander disk
UNICOS.

G.5.1

OWS UNICOS

To copy the newly created dsdiag and cleario binaries to the OWS,
enter the following commands:

     rcp /ce/ios/dsdiag ows:/IOSDIR
     rcp /ce/ios/dsdiag.ov ows:/IOSDIR
     rcp /ce/ios/cleario ows:/IOSDIR
     rcp /ce/ios/cleario.ov ows:/IOSDIR

IOSDIR is a site-specific parameter that indicates the location of the
IOS kernel and overlays. ows is the hostname for the OWS. The
deadstart diagnostics should reside in the same OWS directory as the IOS
kernel and overlays. Two IOS systems will store diagnostics in two OWS
directories based on the IOS serial number.

                              NOTE

     Two IOS systems store diagnostics in directories
     /ce/iosa/ and /ce/iosb/.

The deadstart diagnostic binaries are now saved on the OWS as files
called dsdiag, dsdiag.ov, cleario, and cleario.ov.
G.5.2

EXPANDER TAPE UNICOS

Write the deadstart diagnostics to the same deadstart tape as the UNICOS
kernel. To write the newly created deadstart diagnostic binaries to
expander tape, enter the following commands:

     extd -o -r -n 7 < /ce/ios/cleario
     extd -o -r -n 8 < /ce/ios/dsdiag
     extd -o -n 9 < /ce/ios/dsdiag.ov

                              NOTE

     Two IOS systems store diagnostics in directories
     /ce/iosa/ and /ce/iosb/.

The deadstart binaries are now saved on the expander tape as files called
CLEARIO, DSDIAG, and DSDIAG.OV.

G.5.3

EXPANDER DISK UNICOS

To write the newly created dsdiag and cleario binaries to expander
disk pack, enter the following commands:

     exdf -o /INSTALL/dsdiag < /ce/ios/dsdiag
     exdf -o /INSTALL/dsdiag.ov < /ce/ios/dsdiag.ov
     exdf -o /INSTALL/cleario < /ce/ios/cleario

INSTALL is a site-specific parameter that indicates the location of
CLEARIO, DSDIAG, and DSDIAG.OV on an expander disk. The deadstart
diagnostic binaries should reside in the same directory as the UNICOS
kernel and overlays.

                              NOTE

     Two IOS systems store diagnostics in directories
     /ce/iosa/ and /ce/iosb/.

The deadstart binaries are now saved on the expander disk pack as files
called CLEARIO, DSDIAG, and DSDIAG.OV.

G.6

GENERATING olnet

This section describes how to generate olnet for computer systems with
the following front-ends:

•  IBM
•  Sun Workstation
•  Motorola workstation, OWS, or MWS

G.6.1

IBM FRONT-END

The following olnet build procedure is intended for sites with
front-end computer systems running VM.
1.

Transfer the following files created during the UNICOS build
procedure:
UNICOS Name

VM Name

Description

olnet.vm.f

file name OLNET
file type FORTRAN

olnet Fortran source
code

driver.vm.a

file name OLFEIV
olnet driver (BAL code)
file type ASSEMBLE

Perform steps 2 through 6 from the CMS user environment:
2.

Compile the olnet Fortran source code:
FORTVS OLNET

3.

Access the VM/SP macro libraries:
LINK MAINT 194 194 RR
ACCESS 194 B
ACC 194 I
GLOBAL MACLIB OSMACRO

4.

(a

password may be required)

DMSSP DMKSP CMSLIB TSOMAC

Assemble the VM driver:
ASSEMBLE OLFEIV
REL B
REL I

5.  Link the olnet driver and source code modules to create an
    executable binary module named OLNET:

    GLOBAL TXTLIB VLNKMLIB VFORTLIB CMSLIB
    LOAD OLNET OLFEIV
    GENMOD OLNET

NOTE
The following step is required by the olnet licensing
agreement.

6.  Discard the following files:

    File Name    File Type

    OLNET        FORTRAN
    OLNET        TEXT
    OLFEIV       ASSEMBLE
    OLFEIV       TEXT
    LOAD         MAP

G.6.2

SUN WORKSTATION FRONT-END (NSC)

The following olnet NSC build procedure is intended for sites with Sun
Workstation front-end computer systems.
1.  Transfer the following files created during the UNICOS build
    procedure:

    UNICOS Name       Sun Name          Description

    olnet.sunnsc.f    olnet.sunnsc.f    olnet Fortran source code
    drv.sunnsc.c      drv.sunnsc.c      olnet driver (C code)

2.  Compile the olnet Fortran source code and C driver:

    f77 -o olnet olnet.sunnsc.f drv.sunnsc.c

NOTE
The following step is required by the olnet licensing
agreement.

3.  Remove the following files:

    rm olnet.sunnsc.f
    rm olnet.sunnsc.o
    rm drv.sunnsc.c
    rm drv.sunnsc.o

G.6.3

SUN WORKSTATION FRONT-END (VME)

The following olnet VME build procedure is intended for sites with Sun
Workstation front-end computer systems:
1.  Transfer the following files created during the UNICOS build
    procedure:

    UNICOS Name       Sun Name          Description

    olnet.sunvme.f    olnet.sunvme.f    olnet Fortran source code
    drv.sunvme.c      drv.sunvme.c      olnet driver (C code)

2.  Compile the olnet Fortran source code and C driver:

    f77 -o olnet olnet.sunvme.f drv.sunvme.c

NOTE
The following step is required by the olnet licensing
agreement.

3.  Remove the following files:

    rm olnet.sunvme.f
    rm olnet.sunvme.o
    rm drv.sunvme.c
    rm drv.sunvme.o

G.6.4

MOTOROLA WORKSTATION, OWS, OR MWS FRONT-END (VME)

The following olnet VME build procedure is intended for sites with
Motorola workstation, OWS, or MWS front-end computer systems.
1.  Transfer the following file created during the UNICOS build
    procedure:

    UNICOS Name    Sun Name       Description

    olnet.mot.c    olnet.mot.c    olnet C source code

2.  Compile the olnet C source code and driver:

    cc -o olnet olnet.mot.c

NOTE
The following step is required by the olnet licensing
agreement.

3.  Discard the following files:
rm olnet.mot.c
rm olnet.mot.o
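
Because the licensing agreement requires the source and object files to be discarded, a site may want to remove them only after confirming that the olnet binary was actually produced. The following is a minimal sketch in POSIX shell under that assumption; the helper name discard_after_build is hypothetical and is not part of the Cray procedure.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cray procedure): remove the listed
# source and object files only if the named binary exists as an
# executable regular file.
# Usage: discard_after_build binary file...
discard_after_build() {
    binary=$1
    shift
    if [ ! -f "$binary" ] || [ ! -x "$binary" ]; then
        echo "$binary not built; keeping source files" >&2
        return 1
    fi
    # Remove every remaining argument; -f ignores files already gone
    rm -f -- "$@"
}

# Example for the Motorola procedure, run in the build directory:
#   cc -o olnet olnet.mot.c && discard_after_build olnet olnet.mot.c olnet.mot.o
```

The same helper could be applied to the Sun procedures by passing the corresponding .f, .c, and .o file names.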

G.7

DELETING PROPRIETARY SOURCE CODE

The CRAY1PL, XMPPL, and DIAGPL libraries contain source code
that is CRAY PROPRIETARY. Therefore, the program libraries,
source code, binaries, and listings must not be maintained on
system storage.
Remove the source code files, listings, binaries, and program libraries
from system storage by entering the following commands:
cd /usr/src/diag
make -f diag.mk delete
rm -f cray1pl xmppl diagpl cray1pl.mods xmppl.mods diagpl.mods
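
After the removal, a quick check can confirm that none of the program library files remain. The following is a minimal sketch in POSIX shell; the function name verify_libraries_removed is hypothetical, the file names are taken from the rm command above, and the sketch does not check the source files, listings, or binaries that the make delete target handles.

```shell
#!/bin/sh
# Hypothetical post-check (not a Cray-supplied tool): report any program
# library or .mods file still present in the given directory.
# Returns 0 when none remain, 1 otherwise.
# Usage: verify_libraries_removed [directory]   (defaults to /usr/src/diag)
verify_libraries_removed() {
    dir=${1:-/usr/src/diag}
    found=0
    for lib in cray1pl xmppl diagpl; do
        for f in "$dir/$lib" "$dir/$lib.mods"; do
            if [ -e "$f" ]; then
                echo "still present: $f" >&2
                found=1
            fi
        done
    done
    return $found
}
```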

INDEX

cleario
execution, 6-2
messages, 6-4
overview, 6-2
Confidence tests
examples, 2-6
execution, 2-5
execution times, B-1
list of, A-1
messages, 2-8
off-line monitor (offmon), 2-10
olcfdt, 3-1
olcfpt, 3-11
olcm, 3-25
olcrit, 3-36
olcsvc, 3-61
olibuf, 3-85
olsbt, 3-107
on-line monitor (olcmon), 2-1
overview, 2-1
termination, 2-5
Deadstart programs
cleario, 6-2
dsdiag, 6-5
list of, A-8
overview, 6-1
system configuration, 6-1
donut
buffer utility menu, 5-13
disk mode
maintenance mode, 5-3
overview, 5-2
system mode, 5-3
disk selection, 5-2
error correction code test, 5-41
error utility menu, 5-17
error log menu, 5-19
error table menu, 5-18
examples, 5-44
execution, 5-5
exiting, 5-44
flaw table utility menus, 5-33
formatting menu, 5-20
examine data buffer menu, 5-22
ID analysis menu, 5-23
logical address of the sector
ID, 5-21
parameter menu, 5-27
position field of the sector ID, 5-22

donut (continued)
main menu, 5-9
commands to change the data
buffer, 5-12
commands to change the type of write
command used, 5-12
commands to display commands
list, 5-13
commands to display flaw table
menus, 5-11
commands to display submenus, 5-9
commands to display the data
buffer, 5-11
commands to select display
format, 5-10
commands to set arguments, 5-10
menu displays, 5-4
overview, 5-1
parameter menu, 5-42
surface tests menu, 5-27
examine data buffer menu, 5-33
parameter menu, 5-33
write data, read data and compare,
and surface analysis menus, 5-29
warnings and messages, 5-4
Down-device programs, 5-1
donut, 5-1
list of, A-4,5,6
oldmon, 5-50
unitap, 5-89
dsdiag
execution
IOP-0 tests, 6-7
IOS tests
dshsp, 6-14
dsiom, 6-10
dsiop, 6-10
dslsp, 6-15
dsmos, 6-13
dsmos16t, 6-9
overview, 6-9
messages
error
all tests, 6-17
dshsp, 6-24
dsiom, 6-19
dsiop, 6-20
dslsp, 6-31
dsmos, 6-22
dsmos16t, 6-19
IOP-0 tests, 6-18

dsdiag messages (continued)
informative, 6-16
overview, 6-16
overview, 6-5

Error messages (see Program messages)
Examples (see Program execution examples)
Execution (see Program execution)

Installation information, G-1
generating olnet, G-6
generating on-line diagnostic
binaries, G-2
generating on-line diagnostic
listings, G-2
on-line diagnostic directories, G-1
saving IOS deadstart programs, G-4
saving off-line versions of on-line
confidence tests, G-3
IOS deadstart programs (see deadstart
programs)

Libraries (see Program libraries)

Maintenance tests
diagnostic memory image, 4-13
examples, 4-7
execution, 4-4
execution times, B-2
list of CPU tests, A-2
messages, 4-12
monitor (olmon), 4-1
overview, 4-1
synopsis, 4-2
termination, 4-7
test-specific requirements
olaht, 4-5
olCllUr:, 4-5
olibz, 4-6
Messages (see Program messages)
Monitors
down CPU (oldmon), 5-50
off-line confidence (offmon), 2-10
on-line confidence (olcmon), 2-1
maintenance (olmon), 4-1

offmon
list of tests, A-9
overview, 2-10
olcfdt
examples, 3-6
messages, 3-8
overview, 3-1
synopsis, 3-2
olcfpt
examples, 3-18
execution
comparison of simulation and
execution results, 3-16

olcfpt execution (continued)
error isolation, 3-16
random floating point instruction
and data generation, 3-15
random floating point instruction
buffer execution, 3-16
random floating point instruction
buffer simulation, 3-15
test initialization, 3-15
messages, 3-23
overview, 3-11
synopsis, 3-11
termination, 3-18
olcm
examples, 3-30
execution
comparison of expected and actual
data, 3-30
error report, 3-30
test initialization, 3-26
test section execution, 3-27
messages, 3-34
overview, 3-25
synopsis, 3-25
termination, 3-30
olcmon
examples, 2-6
execution, 2-5
messages, 2-8
overview, 2-1
synopsis, 2-1
termination, 2-5
olcrit
examples, 3-49
execution
comparison of simulation and
execution results, 3-47
error isolation, 3-48
random instruction and data
generation, 3-46
random instruction buffer
execution, 3-47
random instruction buffer
simulation, 3-47
test initialization and hardware
configuration detection, 3-45
messages, 3-57
overview, 3-36
synopsis, 3-36
termination, 3-49
olcsvc
examples, 3-77
execution
comparison of execution results, 3-76
error isolation, 3-76
instruction buffer execution, 3-75
overview, 3-66
random instruction and data
generation, 3-67
test initialization and hardware
configuration detection, 3-66
messages, 3-83
overview, 3-61

olcsvc (continued)
synopsis, 3-61
termination, 3-77
oldmon
commands, 5-63
append (a) and dump (d), 5-66
common arguments, 5-65
CPU (c), 5-67
enter (e), 5-68
execute (z), 5-68
fill (f), 5-68
go (g), 5-69
halt (h), 5-69
load (l), 5-70
options (o), 5-70
quit (q), 5-71
redraw (r), 5-71
shell escape (!), 5-72
status (s), 5-72
up (u), 5-72
view (v), 5-72
write (w), 5-73
display modes
screen mode display, 5-62
scroll mode display, 5-61
down CPU tests, 5-50
example, 5-74
execution
down CPU tests, 5-53
environment variables, 5-58
test loop code, 5-56
messages, 5-87
overview, 5-50
synopsis, 5-51
olhpa
examples, 7-9
help menus, 7-6
messages, 7-13
overview, 7-1
shell script generation and
execution, 7-10
synopsis, 7-1
olibuf
error isolation to the failing bit, 3-96
CRAY X-MP computer system error
isolation, 3-99
CXll system error isolation, 3-97
examples, 3-101
execution
comparison of expected and actual
data, 3-96
CRAY X-MP computer system test
buffer generation, 3-89
CRAY Y-MP computer system test
buffer generation, 3-92
error report, 3-96
test buffer execution, 3-96
test initialization, 3-88
messages, 3-105
overview, 3-85
synopsis, 3-85
termination, 3-101

olmon
diagnostic memory image, 4-13
examples, 4-7
execution, 4-4
messages, 4-12
overview, 4-1
synopsis, 4-2
termination, 4-7
olnet, A-7
olsbt
examples, 3-115
execution
comparison of simulation and
execution results, 3-114
error isolation, 3-114
random instruction and data
generation, 3-110
random instruction buffer
execution, 3-113
random instruction buffer
simulation, 3-113
test initialization and hardware
configuration detection, 3-110
messages, 3-126
overview, 3-107
synopsis, 3-107
termination, 3-115
On-line diagnostics
confidence tests
olcfdt, 3-1
olcfpt, 3-11
olcm, 3-25
olcrit, 3-36
olcsvc, 3-61
olibuf, 3-85
olsbt, 3-107
overview, 2-1
deadstart programs
cleario, 6-2
dsdiag, 6-5
overview, 6-1
down-device programs, 5-1
donut, 5-1
oldmon, 5-50
unitap, 5-89
environment, 1-1
list of
confidence tests, A-1
CPU tests, A-2
deadstart programs, A-8
down-device programs, A-4,5,6
maintenance tests, A-2
utility programs, A-5
maintenance tests
diagnostic memory image, 4-13
examples, 4-7
execution, 4-4
execution times, B-2
messages, 4-12
monitor (olmon), 4-1
overview, 4-1
synopsis, 4-2
termination, 4-7

On-line diagnostics (continued)
monitors
offmon, 2-10
olcmon, 2-1
oldmon, 5-50
olmon, 4-1
utilities
olhpa, 7-1
runsequence, 7-14
system, E-1

Program execution
confidence tests
execution times, B-1
olcfdt, 3-1
olcfpt, 3-14
olcm, 3-26
olcrit, 3-44
olcsvc, 3-66
olibuf, 3-88
olsbt, 3-110
overview, 2-5
deadstart programs
cleario, 6-2
dsdiag, 6-5
down-device programs
donut, 5-5
oldmon, 5-53
unitap, 5-91
examples
confidence tests, 2-6
donut, 5-44
maintenance tests, 4-7
olcfdt, 3-6
olcfpt, 3-18
olcm, 3-30
olcmon, 2-6
olcrit, 3-49
olcsvc, 3-77
oldmon, 5-74
olhpa, 7-9
olibuf, 3-101
olmon, 4-7
olsbt, 3-115
unitap, 5-111
libraries
DIAGPL, C-1
XMPPL, C-2
CRAY1PL, C-2
maintenance tests
execution times, B-2
overview, 4-4
times
confidence tests, B-1
maintenance tests, B-2
utilities
olhpa, 7-1
runsequence, 7-14
Program messages
confidence tests, 2-8
cleario, 6-4
donut, 5-4

Program messages (continued)
dsdiag, 6-16
maintenance tests, 4-12
olcfdt, 3-8
olcfpt, 3-23
olcm, 3-34
olcmon, 2-8
olcrit, 3-57
olcsvc, 3-83
oldmon, 5-87
olhpa, 7-13
olibuf, 3-105
olmon, 4-12
olsbt, 3-126
unitap, 5-111

runsequence
crontab input file, 7-14
overview, 7-14
sequence files, 7-16
shell script, 7-17

Site communications, F-1
Software Problem Report (SPR)
description, D-1
form, D-2
SPR (see Software Problem Report)
Support (see Site communications)

Times (see Program execution times)

unitap
debug tools, 5-102
breakpoint tool, 5-103
channel commands tool, 5-104
compare data tool, 5-107
display data buffer tool, 5-105
packet status tool, 5-110
programming tool, 5-109
system call history tool, 5-108
examples, 5-111
execution, 5-91
learn mode, 5-111
menus
canned test menu, 5-96
debug menu, 5-98
global options menu, 5-99
hardware layout menu, 5-100
main menu, 5-92
test menu, 5-94
variable menu, 5-93
messages, 5-111
overview, 5-89
synopsis, 5-90
trace file, 5-111
Utility programs
list of, A-9
olhpa, 7-1
runsequence, 7-14
system, E-1

READER'S COMMENT FORM
CRAY Y-MP, CRAY X-MP EA, CRAY X-MP, and CRAY-l Computer Systems
UNICOS On-line Diagnostic Maintenance Manual

SMM-1012 C

Your reactions to this manual will help us provide you with better documentation. Please take a moment to
check the spaces below, and use the blank space for additional comments.
1) Your experience with computers: _ _ 0-1 year _ _ 1-5 years _ _5+ years
2) Your experience with Cray computer systems: _ _0-1 year _ _ 1-5 years _ _5+ years
3) Your occupation: _ _ computer programmer _ _ non-computer professional
_ _ other (please specify): _ _ _ _ _ _ _ _ _ _ __
4) How you used this manual: _ _ in a class __as a tutorial or introduction _ _ as a reference guide
__ for troubleshooting
Using a scale from 1 (poor) to 10 (excellent), please rate this manual on the following criteria:
5) Accuracy _ _
6) Completeness _ _
7) Organization __

8) Physical qualities (binding, printing) _ _
9) Readability _ _
10) Amount and quality of examples _ _

Please use the space below, and an additional sheet if necessary, for your other comments about this
manual. If you have discovered any inaccuracies or omissions, please give us the page number on which
the problem occurred. We promise a quick reply to your comments and questions.

Name ____________________
Title ____________________
Company ____________________
Telephone ____________________
Today's Date ____________________

Address ____________________
City ____________________
State/Country ____________________
Zip Code ____________________

FOLD

NO POSTAGE
NECESSARY
IF MAILED
IN THE
UNITED STATES

BUSINESS REPLY CARD
FIRST CLASS

PERMIT NO 6184

ST PAUL, MN

POSTAGE WILL BE PAID BY ADDRESSEE

CRAY RESEARCH, INC.

Attention: PUBLICATIONS
1345 Northland Drive
Mendota Heights, MN 55120

FOLD
