1990_AMD_29K_Family_Data_Book 1990 AMD 29K Family Data Book
Page Count: 447
29K Family
1990 Data Book
Advanced Micro Devices

Advanced Micro Devices
29K Family Data Book

© 1989 Advanced Micro Devices

Advanced Micro Devices reserves the right to make changes in its products without notice in order to improve design or performance characteristics. The performance characteristics listed in this document are guaranteed by specific tests, correlated testing, guard banding, design and other practices common to the industry. For specific testing details, contact your local AMD sales representative. The company assumes no responsibility for the use of any circuits described herein.

901 Thompson Place, P.O. Box 3453, Sunnyvale, California 94088-3000
(408) 732-2400   TWX: 910-339-9280   TELEX: 34-6306

Am29000, Am29027, Am29041, 29K, ADAPT29K, ASM29K, BTC, Branch Target Cache, Fusion29K, HighC29K, MON29K, PCEB29K, and XRAY29K are trademarks of Advanced Micro Devices, Inc.
CROSSTALK is a registered trademark of Digital Communications Associates, Inc.
DEC is a registered trademark of Digital Equipment Corporation.
Hewlett-Packard is a registered trademark of Hewlett-Packard, Inc.
IBM and PC-AT are registered trademarks of International Business Machines Corporation.
MetaWare is a trademark of MetaWare, Inc.
Motorola and MC68000 are registered trademarks of Motorola, Inc.
PAL is a registered trademark of Advanced Micro Devices, Inc.
Sun Workstation is a registered trademark of Sun Microsystems, Inc.
Sun and Sun-3 are trademarks of Sun Microsystems, Inc.
Tektronix is a registered trademark of Tektronix, Inc.
UniSite is a trademark of Data I/O Corporation.
UNIX is a registered trademark of American Telephone and Telegraph Company.
VAX is a registered trademark of Digital Equipment Corporation.

INTRODUCTION

The RISC-based Am29000 Streamlined Instruction Processor from Advanced Micro Devices is the high-performance solution for your general-purpose embedded systems needs.
As the heart of the 29K Family, this 32-bit CMOS microprocessor delivers outstanding performance, yet offers flexible, cost-effective solutions that can quickly move your product to market.

This data book is your comprehensive guide to AMD's 29K Family of microprocessors and development tools. These products have helped current developers create applications that fully exploit the power of the Am29000 microprocessor: laser printers of all types, real-time graphics systems, networks and bridges, and a host of other peripheral and communication devices.

To provide a total system solution for you, AMD has taken the 29K Family's advantages of 17-MIPS performance, flexible memory-configuration requirements, and outstanding development tools and coupled them with our Fusion29K(TM) program. This program provides you with AMD and industry-standard third-party solutions, including the application-specific solutions you need for successful system integration that can substantially shorten the time-to-market factor of your design.

AMD is committed to the 29K Family, and will continue to apply substantial resources to ensure that the present levels of high performance, cost and design flexibility, and rapid design cycles are maintained and further enhanced. Qualified support is readily available for our customers; our highly trained field applications engineers are backed by experts in the factory.

For further details on how the 29K Family can be the solution to your design needs, call your local AMD sales office or the authorized representative listed in the back of this publication.

Geoff Tate
Senior Vice President
Microprocessors & Peripherals Group

PREFACE

Advanced Micro Devices' 29K [...]

[...]     [...] SRCB THEN DEST <- TRUE ELSE DEST <- FALSE
CPLT      IF SRCA < SRCB THEN DEST <- TRUE ELSE DEST <- FALSE
CPLTU     IF SRCA < SRCB (unsigned) THEN DEST <- TRUE ELSE DEST <- FALSE
CPLE      IF SRCA <= SRCB THEN DEST <- TRUE ELSE DEST <- FALSE
CPLEU     IF SRCA <= SRCB (unsigned) THEN DEST <- TRUE ELSE DEST <- FALSE
CPGT      IF SRCA > SRCB THEN DEST <- TRUE ELSE DEST <- FALSE
CPGTU     IF SRCA > SRCB (unsigned) THEN DEST <- TRUE ELSE DEST <- FALSE
CPGE      IF SRCA >= SRCB THEN DEST <- TRUE ELSE DEST <- FALSE
CPGEU     IF SRCA >= SRCB (unsigned) THEN DEST <- TRUE ELSE DEST <- FALSE
CPBYTE    IF (SRCA.BYTE0 = SRCB.BYTE0) OR (SRCA.BYTE1 = SRCB.BYTE1) OR
          (SRCA.BYTE2 = SRCB.BYTE2) OR (SRCA.BYTE3 = SRCB.BYTE3)
          THEN DEST <- TRUE ELSE DEST <- FALSE
ASEQ      IF SRCA = SRCB THEN Continue ELSE Trap (VN)
ASNEQ     IF SRCA <> SRCB THEN Continue ELSE Trap (VN)
ASLT      IF SRCA < SRCB THEN Continue ELSE Trap (VN)
ASLTU     IF SRCA < SRCB (unsigned) THEN Continue ELSE Trap (VN)
ASLE      IF SRCA <= SRCB THEN Continue ELSE Trap (VN)
ASLEU     IF SRCA <= SRCB (unsigned) THEN Continue ELSE Trap (VN)
ASGT      IF SRCA > SRCB THEN Continue ELSE Trap (VN)
ASGTU     IF SRCA > SRCB (unsigned) THEN Continue ELSE Trap (VN)
ASGE      IF SRCA >= SRCB THEN Continue ELSE Trap (VN)
ASGEU     IF SRCA >= SRCB (unsigned) THEN Continue ELSE Trap (VN)

Figure 37. Compare Instructions

29K Family CMOS Devices

Mnemonic  Operation Description
AND       DEST <- SRCA & SRCB
ANDN      DEST <- SRCA & ~SRCB
NAND      DEST <- ~(SRCA & SRCB)
OR        DEST <- SRCA | SRCB
NOR       DEST <- ~(SRCA | SRCB)
XOR       DEST <- SRCA ^ SRCB
XNOR      DEST <- ~(SRCA ^ SRCB)

Figure 38. Logical Instructions

Mnemonic  Operation Description
SLL       DEST <- SRCA << SRCB (zero fill)
SRL       DEST <- SRCA >> SRCB (zero fill)
SRA       DEST <- SRCA >> SRCB (sign fill)
EXTRACT   DEST <- high-order word of (SRCA || SRCB << FC)

Figure 39. Shift Instructions

Reserved Instructions

Sixteen Am29000 operation codes are reserved for instruction emulation. These instructions cause traps, much like the floating-point instructions, but currently have no specified interpretation.
The relevant operation codes and the corresponding trap vectors are:

Operation Codes (hexadecimal)   Trap Vector Numbers (decimal)
D8-DD                           24-29
E7-E9                           39-41
F8                              56
FA-FF                           58-63

These instructions are intended for future processor enhancements, and users desiring compatibility with future processor versions should not use them for any purpose.

Am29000

Mnemonic  Operation Description
LOAD      DEST <- EXTERNAL WORD [SRCB]
LOADL     DEST <- EXTERNAL WORD [SRCB]
          assert LOCK output during access
LOADSET   DEST <- EXTERNAL WORD [SRCB]
          EXTERNAL WORD [SRCB] <- h'FFFFFFFF',
          assert LOCK output during access
LOADM     DEST .. DEST + COUNT [...]

[...]     [...] = SRCB (single-precision) THEN DEST <- TRUE ELSE DEST <- FALSE
DGE       IF SRCA (double-precision) >= SRCB (double-precision) THEN DEST <- TRUE ELSE DEST <- FALSE
FGT       IF SRCA (single-precision) > SRCB (single-precision) THEN DEST <- TRUE ELSE DEST <- FALSE
DGT       IF SRCA (double-precision) > SRCB (double-precision) THEN DEST <- TRUE ELSE DEST <- FALSE
SQRT      DEST (single-precision, double-precision, extended-precision) <-
          SQRT[SRCA (single-precision, double-precision, extended-precision)]
CONVERT   DEST (integer, single-precision, double-precision) <-
          SRCA (integer, single-precision, double-precision)
CLASS     DEST (single-precision, double-precision, extended-precision) <-
          CLASS[SRCA (single-precision, double-precision, extended-precision)]

Figure 42. Floating-Point Instructions

Mnemonic  Operation Description
CALL      DEST <- PC//00 + 8
          PC <- TARGET
          Execute delay instruction
CALLI     DEST <- PC//00 + 8
          PC <- SRCB
          Execute delay instruction
JMP       PC <- TARGET
          Execute delay instruction
JMPI      PC <- SRCB
          Execute delay instruction
JMPT      IF SRCA = TRUE THEN PC <- TARGET
          Execute delay instruction
JMPTI     IF SRCA = TRUE THEN PC <- SRCB
          Execute delay instruction
JMPF      IF SRCA = FALSE THEN PC <- TARGET
          Execute delay instruction
JMPFI     IF SRCA = FALSE THEN PC <- SRCB
          Execute delay instruction
JMPFDEC   IF SRCA = FALSE THEN
            SRCA <- SRCA - 1
            PC <- TARGET
          ELSE
            SRCA <- SRCA - 1
          Execute delay instruction

Figure 43. Branch Instructions

Mnemonic  Operation Description
CLZ       Determine number of leading zeros in a word
SETIP     Set IPA, IPB, and IPC with operand register numbers
EMULATE   Load IPA and IPB with operand register numbers, and Trap (VN)
INV       Reset all Valid bits in Branch Target Cache to zeros
IRET      Perform an interrupt return sequence
IRETINV   Perform an interrupt return sequence, and reset all Valid bits in Branch Target Cache to zeros
HALT      Enter Halt mode on next cycle

Figure 44. Miscellaneous Instructions

DATA FORMATS AND HANDLING

This section describes the various data types supported by the Am29000, and the mechanisms for accessing data in external devices and memories. The Am29000 includes provisions for the external access of bytes, half-words, unaligned words, and unaligned half-words, as described in this section.

Integer Data Types

Most Am29000 instructions deal directly with word-length integer data; integers may be either signed or unsigned, depending on the instruction. Some instructions (e.g., AND) treat word-length operands as strings of bits. In addition, there is support for character, half-word, and Boolean data types.

Byte Operations

The processor supports character data through load, store, extraction, and insertion operations on word-length operands, and by a compare operation on byte-length fields within words.

The format for unsigned and signed characters is shown in Figure 45; for signed characters, the sign bit is the most-significant bit of the character. For sequences of packed characters within words, bytes are ordered either left-to-right or right-to-left, depending on the Byte Order (BO) bit of the Configuration Register (see Addressing and Alignment section).

If the Data Width Enable (DW) bit of the Configuration Register is 1, the Am29000 is enabled to load and store byte data. On a load, an external packed byte is converted to one of the character formats shown in Figure 45.
On a store, the low-order byte of a word is packed into every byte of an external word. The External Data Accesses section describes external byte accesses in more detail.

The Extract Byte (EXBYTE) instruction replaces the low-order character of a destination word with an arbitrary byte-aligned character from a source word. For the EXBYTE instruction, the destination word can be a zero word, which effectively zero-extends the character from the source operand.

The Insert Byte (INBYTE) instruction replaces an arbitrary byte-aligned character in a destination word with the low-order character of a source word. For the INBYTE instruction, the source operand can be a character constant specified by the instruction.

The Compare Bytes (CPBYTE) instruction compares two word-length operands and gives a result of True if any corresponding bytes within the operands have equivalent values. This allows programs to detect characters within words without first having to extract individual characters, one at a time, from the word of interest.

Half-Word Operations

The processor supports half-word data through load, store, insertion, and extraction operations on word-length operands.

The format for unsigned and signed half-words is shown in Figure 46; for signed half-words, the sign bit is the most-significant bit of the half-word. For sequences of packed half-words within words, half-words are ordered either left-to-right or right-to-left, depending on the Byte Order (BO) bit of the Configuration Register (see Addressing and Alignment section).

If the Data Width Enable (DW) bit of the Configuration Register is 1, the Am29000 is enabled to load and store half-word data. On a load, an external packed half-word is converted to one of the formats shown in Figure 46. On a store, the low-order half-word of a word is packed into every half-word of an external word.
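As a rough illustration of the byte operations described above, the following Python sketch models CPBYTE, EXBYTE, and INBYTE on 32-bit words. The function names, the argument order, and the explicit `shift` parameter are assumptions made for illustration only; they are not the processor's operand encoding, and on the Am29000 the byte position is not supplied as an explicit operand of this kind.

```python
MASK32 = 0xFFFFFFFF

def cpbyte(srca, srcb):
    """True if any pair of corresponding bytes in the two words is equal."""
    for shift in (24, 16, 8, 0):
        if ((srca >> shift) & 0xFF) == ((srcb >> shift) & 0xFF):
            return True
    return False

def exbyte(src, dest, shift):
    """Replace the low-order byte of dest with the byte of src at bit offset shift.

    'shift' (0, 8, 16, or 24) is a hypothetical stand-in for the byte selection.
    """
    return (dest & 0xFFFFFF00) | ((src >> shift) & 0xFF)

def inbyte(dest, src, shift):
    """Replace the byte of dest at bit offset shift with the low-order byte of src."""
    cleared = dest & ~(0xFF << shift) & MASK32
    return cleared | ((src & 0xFF) << shift)

# Detect a character inside a word without extracting bytes one at a time:
# compare against a word holding the character of interest in one byte position.
word = 0x41424344                      # "ABCD"
print(cpbyte(word, 0x00004300))        # byte 0x43 matches in the same position
print(hex(exbyte(word, 0, 16)))        # zero-extends byte 0x42 into a zero word
```

Passing a zero destination word to `exbyte` mirrors the zero-extension case mentioned in the text.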
The Extract Half-Word (EXHW) instruction replaces the low-order half-word of a destination word with either the low-order or high-order half-word of a source word. For the EXHW instruction, the destination word can be a zero word, which effectively zero-extends the half-word from the source operand.

The Extract Half-Word, Sign-Extended (EXHWS) instruction is similar to the EXHW instruction, except that it sign-extends the half-word in the destination word (i.e., it replaces the most-significant 16 bits of the destination word with the most-significant bit of the source half-word).

The Insert Half-Word (INHW) instruction replaces either the low-order or high-order half-word in a destination word with the low-order half-word of a source word.

Unsigned:  bits 31-8 = 0 (zeros)            bits 7-0 = data
Signed:    bits 31-8 = s (copies of sign)   bits 7-0 = data

Figure 45. Character Format

Unsigned:  bits 31-16 = 0 (zeros)            bits 15-0 = data
Signed:    bits 31-16 = s (copies of sign)   bits 15-0 = data

Figure 46. Half-Word Format

Boolean Data

Some instructions in the Compare class generate word-length Boolean results. Also, conditional branches are conditional upon Boolean operands. The Boolean format used by the processor is such that the Boolean values True and False are represented by a 1 or 0, respectively, in the most-significant bit of a word. The remaining bits are unimportant; for the compare instructions, they are reset. Note that twos-complement negative integers are indicated by the Boolean value True in this encoding scheme.
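The Boolean encoding and the half-word extraction rules above can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the function names and the `high` flag (which selects the high-order or low-order source half-word) are hypothetical, and the compare helpers cover only the signed and unsigned "less than" cases.

```python
MSB = 0x80000000
TRUE = MSB      # compare results: 1 in the most-significant bit, other bits reset
FALSE = 0

def to_signed(word):
    """Interpret a 32-bit word as a twos-complement integer."""
    return word - (1 << 32) if word & MSB else word

def is_true(word):
    """Only the most-significant bit decides the Boolean value."""
    return bool(word & MSB)

def cplt(srca, srcb):
    """Signed less-than, returning a Boolean word."""
    return TRUE if to_signed(srca) < to_signed(srcb) else FALSE

def cpltu(srca, srcb):
    """Unsigned less-than, returning a Boolean word."""
    return TRUE if srca < srcb else FALSE

def exhw(src, dest, high):
    """Replace the low-order half-word of dest with a half-word of src."""
    half = (src >> 16) & 0xFFFF if high else src & 0xFFFF
    return (dest & 0xFFFF0000) | half

def exhws(src, high):
    """Like exhw, but fill the upper 16 bits with the half-word's sign bit."""
    half = (src >> 16) & 0xFFFF if high else src & 0xFFFF
    return half | 0xFFFF0000 if half & 0x8000 else half

print(is_true(0xFFFFFFFE))                 # any negative integer reads as True
print(hex(exhws(0x1234ABCD, high=False)))  # sign-extended low half-word
```

Because only the most-significant bit is examined, a sign test on an integer needs no separate compare in this model: the integer itself already serves as the Boolean.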
Floating-Point Data Types

The Am29000 defines single- and double-precision floating-point formats that comply with the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std. 754-1985). These data types are not supported directly in processor hardware, but can be implemented by a virtual floating-point interface provided in the Am29000.

In this section, the following nomenclature is used to denote fields in a floating-point value:

• s: sign bit
• bexp: biased exponent
• frac: fraction
• sig: significand

Single-Precision Floating-Point

The format for a single-precision floating-point value is shown in Figure 47.

[Figure: one 32-bit word containing the s, bexp, and frac fields]

Figure 47. Single-Precision Floating-Point Format

Typically, the value of a single-precision operand is expressed by: (-1)**s * 1.frac * 2**(bexp-127). The encoding of special floating-point values is given in the Special Floating-Point Values section.

Double-Precision Floating-Point

The format for a double-precision floating-point value is shown in Figure 48.

[Figure: two 32-bit words, labeled "0" and "1", containing the s, bexp, and frac fields]

Figure 48. Double-Precision Floating-Point Format

Typically, the value of a double-precision operand is expressed by: (-1)**s * 1.frac * 2**(bexp-1023). The encoding of special floating-point values is given in the Special Floating-Point Values section.

In order to be properly referenced by a floating-point instruction, a double-precision floating-point value must be double-word aligned. The absolute register number of the register containing the first word (labeled "0" in Figure 48) must be even. The absolute register number of the register containing the second word (labeled "1" in Figure 48) must be odd. If these conditions are not met, the results of the instruction are unpredictable. Note that the appropriate registers for a double-precision value in the local registers depend on the value of the Stack Pointer.

Special Floating-Point Values

The Am29000 defines floating-point values that are encoded for special interpretation. The values are described in this section.

Not-a-Number

A Not-a-Number (NaN) is a symbolic value used to report certain floating-point exceptions. It also can be used to implement user-defined extensions to floating-point operations. A NaN comprises a floating-point number with maximum biased exponent and non-zero fraction. The sign bit can be either 0 or 1 and has no significance.

There are two types of NaN: signaling NaNs and quiet NaNs. A signaling NaN causes an Invalid Operation exception if used as an input operand to a floating-point operation; a quiet NaN does not cause an exception. The Am29000 distinguishes signaling and quiet NaNs by the most-significant bit of the fraction: a 1 indicates a quiet NaN, and a 0 indicates a signaling NaN.

An operation never generates a signaling NaN as a result. A quiet NaN result can be generated in one of two ways:

• as the result of an invalid operation that cannot generate a reasonable result, or
• as the result of an operation for which one or more input operands are either signaling or quiet NaNs.

In either case, the Am29000 produces a quiet NaN having a fraction of 11000...0; that is, the two most-significant bits of the fraction are 11, and the remaining bits are 0.

If desired, the Reserved Operand exception can be enabled to cause a Floating-Point Exception trap. The trap handler in this case can implement a scheme whereby user-defined NaN values appear to pass through operations as results, providing overall status for a series of operations.

Infinity

Infinity is an encoded value used to represent a value that is too large to be represented as a finite number in a given floating-point format. Infinity comprises a floating-point number with maximum biased exponent and zero fraction. The sign bit of an infinity distinguishes +infinity from -infinity.
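The NaN and infinity encodings described above can be checked with a short Python sketch for the single-precision case. It assumes the standard IEEE 754 single-precision layout (sign in bit 31, an 8-bit biased exponent, a 23-bit fraction); the function and constant names are illustrative and do not come from the data book. Values with a biased exponent below the maximum are simply reported as "other" here.

```python
import struct

def fields(bits):
    """Split a 32-bit pattern into (s, bexp, frac), assuming IEEE 754 single layout."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def classify(bits):
    """Classify the maximum-exponent encodings: infinities and NaNs."""
    s, bexp, frac = fields(bits)
    if bexp == 0xFF:
        if frac == 0:
            return "+infinity" if s == 0 else "-infinity"
        # Most-significant fraction bit: 1 = quiet NaN, 0 = signaling NaN
        return "quiet NaN" if frac & 0x400000 else "signaling NaN"
    return "other"

# Quiet NaN with fraction 11000...0 (sign bit chosen as 0 for illustration)
DEFAULT_QNAN = (0xFF << 23) | 0x600000

def normalized_value(bits):
    """(-1)**s * 1.frac * 2**(bexp-127); valid only for 0 < bexp < 255."""
    s, bexp, frac = fields(bits)
    return (-1.0) ** s * (1 + frac / 2**23) * 2.0 ** (bexp - 127)

print(classify(DEFAULT_QNAN), hex(DEFAULT_QNAN))
# Cross-check the value formula against the host's own single-precision decoding
pattern = 0x40490FDB
host = struct.unpack(">f", struct.pack(">I", pattern))[0]
print(normalized_value(pattern) == host)
```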
Denormalized Numbers

The IEEE Standard specifies that, wherever possible, a result that is too small to be represented as a normalized number be represented as a denormalized number. A denormalized number may be used as an input operand to any operation. For single- and double-precision formats, a denormalized number comprises a floating-point number with a biased exponent of 0 and a non-zero fraction field; the sign bit can be either 1 or 0.

The value of a denormalized number is expressed by: (-1)**s * 0.frac * 2**(-bias+1), where "bias" is the exponent bias for the format in question.

Zero

A zero comprises a floating-point number with a biased exponent of 0 and a zero fraction field. The sign bit of a zero can be either 0 or 1; however, positive and negative zero are both exactly zero, and are considered equal by comparison operations.

External Data Accesses

All processor external accesses occur between general-purpose registers and external devices and memories. Accesses occur as the result of the execution of load and store instructions. The load and store instructions specify which general-purpose register receives the data (for a load) or supplies the data (for a store). The format of the load and store instructions is shown in Figure 49.

Addresses for accesses are given either by the content of a general-purpose register or by a constant value specified by the load or store instruction. The load and store instructions do not perform address computation directly. Any required address computations are performed explicitly by other instructions.

In the load or store instruction, the Coprocessor Enable (CE) bit (bit 23) determines whether or not the access is directed to the coprocessor. If the CE bit is 0, the access is directed to an external device or memory. If the CE bit is 1, data is transferred to or from the coprocessor. The CE bit affects the interpretation of the Control (CNTL) field as well as the channel protocol.
This section deals with all external accesses other than coprocessor accesses.

Bits 31-24: operation code (X X X X X X X M)
Bit 23:     CE
Bits 22-16: CNTL
Bits 15-8:  RA
Bits 7-0:   RB or I

Figure 49. Load/Store Instruction Format

The format of the instructions that do not perform coprocessor data transfers (i.e., in which the CE bit is 0) is shown in Figure 50.

In load and store instructions, the "RB or I" field specifies the address for the access. The address is either the content of a general-purpose register, with register number RB, or a constant with a value I (zero-extended to 32 bits). The M bit determines whether the register or the constant is used.

The data for the access is written into the general-purpose register RA for a load, and is supplied by register RA for a store.

The definitions for other fields in the load or store instruction are given below:

Bit 23: Coprocessor Enable (CE). The CE bit is 0 for a non-coprocessor load or store.

Bit 22: Address Space (AS). If the AS bit is 0 for an untranslated load or store, the access is directed to instruction/data memory. If the AS bit is 1 for an untranslated load or store, the access is directed to input/output.

The AS bit must be 0 for a translated load or store; if the AS bit is 1 for a translated load or store, a Protection Violation trap occurs. The address space for a translated load or store is determined by the Input/Output (IO) bit of the associated TLB entry.

Bit 21: Physical Address (PA). The PA bit may be used by a Supervisor-mode program to disable address translation for an access. If the PA bit is 1, then address translation is not performed for the access, regardless of the value of the Physical Addressing/Data (PD) bit in the Current Processor Status Register. If the PA bit is 0, address translation depends on the PD bit.

The PA bit may be 1 only for Supervisor-mode instructions. If it is 1 for a User-mode instruction, a Protection Violation trap occurs.

Bit 20: Set Byte Pointer/Sign Bit (SB). If the Data Width Enable (DW) bit of the Configuration Register is 0 and the SB bit is 1, the Byte Pointer Register is written with the two least-significant bits of the address for the access. These address bits can control subsequent character and half-word operations. If the SB bit is 0, the Byte Pointer Register is not affected.

If the Data Width Enable (DW) bit of the Configuration Register is 1 and the SB bit is 1 for a load, the loaded byte or half-word is sign-extended in the destination register; if the SB bit is 0, the byte or half-word is zero-extended.

If the DW bit is 1 and the SB bit is 1 for either a load or store, then each bit of the Byte Pointer Register is written with the complement of the Byte Order bit of the Configuration Register. The Byte Pointer Register is set in this case to provide software compatibility across different types of memory systems. If the SB bit is 0, the Byte Pointer Register is not affected.

Bit 19: User Access (UA). The UA bit allows programs executing in the Supervisor mode to emulate User-mode accesses. This allows checking of the authorization of an access requested by a User-mode program. It also causes address translation (if applicable) to be performed using the PID field of the MMU Configuration Register, rather than the fixed Supervisor-mode process identifier zero.

Bits 31-24: operation code (X X X X X X X M)
Bit 23:     CE
Bit 22:     AS
Bit 21:     PA
Bit 20:     SB
Bit 19:     UA
Bits 18-16: OPT
Bits 15-8:  RA
Bits 7-0:   RB or I

Figure 50. Non-Coprocessor Load/Store Format

If the UA bit is 1 for a Supervisor-mode load or store, the access associated with the instruction is performed in the User mode. In this case, the User mode affects only TLB protection checking, the SUP/US output, and the use of the PID field in translation; it has no effect on the registers that can be accessed by the instruction. If the UA bit is 0, the program mode for the access is controlled by the SM bit.
If the UA bit is 1 for a User-mode load or store, a Protection Violation trap occurs.

Bits 18-16: Option (OPT). This field is placed on the OPT2-OPT0 outputs during the address cycle of the access. There is a one-to-one correspondence between the OPT field and the OPT2-OPT0 outputs; that is, the most-significant OPT bit is placed on OPT2, and so on. The OPT field controls system functions as described below.

Bits 15-8: (RA). The data for the access is written into the general-purpose register RA for a load, and is supplied by register RA for a store.

Bits 7-0: (RB or I). In load and store instructions, the "RB or I" field specifies the address for the access. The address is either the content of a general-purpose register with register number RB, or a constant value I (zero-extended to 32 bits). The M bit of the operation code (bit 24) determines whether the register or the constant is used.

Load and store operations are overlapped with the execution of instructions that follow the load or store instruction. Only one load or store may be in progress on any given cycle. If a load or store instruction is encountered while another load or store operation is in progress, the processor enters the Pipeline Hold mode until the first operation is completed. However, the address for the second operation may appear on the address bus if the first operation is to a device or memory that supports pipelined operations (see Pipelined Accesses section).

Load Operations

The processor provides the following instructions for performing load operations: Load (LOAD), Load and Lock (LOADL), Load and Set (LOADSET), and Load Multiple (LOADM). All of these instructions transfer data from an external device or memory into one or more general-purpose registers.

The LOADL instruction supports the implementation of device and memory interlocks in a multiprocessor configuration. It activates the LOCK output during the address cycle of the access.
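The non-coprocessor load/store field layout given above (M in bit 24, CE in bit 23, AS, PA, SB, and UA in bits 22-19, OPT in bits 18-16, RA in bits 15-8, and "RB or I" in bits 7-0) can be decoded with a short Python sketch. The type name, function names, and the sample instruction word are hypothetical; the upper operation-code bits are left as zero in the example because they are not specified in this section.

```python
from typing import NamedTuple

class LoadStoreFields(NamedTuple):
    m: int        # bit 24: 1 = use constant I, 0 = use register RB
    ce: int       # bit 23: Coprocessor Enable
    as_: int      # bit 22: Address Space
    pa: int       # bit 21: Physical Address
    sb: int       # bit 20: Set Byte Pointer / Sign Bit
    ua: int       # bit 19: User Access
    opt: int      # bits 18-16: Option
    ra: int       # bits 15-8: data register number
    rb_or_i: int  # bits 7-0: address register number or 8-bit constant

def decode(word):
    """Extract the documented fields from a 32-bit load/store instruction word."""
    return LoadStoreFields(
        m=(word >> 24) & 1,
        ce=(word >> 23) & 1,
        as_=(word >> 22) & 1,
        pa=(word >> 21) & 1,
        sb=(word >> 20) & 1,
        ua=(word >> 19) & 1,
        opt=(word >> 16) & 0x7,
        ra=(word >> 8) & 0xFF,
        rb_or_i=word & 0xFF,
    )

def user_mode_violation(f):
    """PA = 1 or UA = 1 in a User-mode instruction causes a Protection Violation trap."""
    return bool(f.pa or f.ua)

# Hypothetical word: M=1, AS=1, SB=1, OPT=001, RA=0x80, I=0x40
f = decode(0x01518040)
print(f)
print(user_mode_violation(f))
```

The sketch covers only the User-mode PA and UA checks; the rule that AS must be 0 for a translated access depends on processor state that is outside the instruction word.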
The LOADSET instruction implements a binary semaphore. It loads a general-purpose register and atomically writes the accessed location with a word that has 1 in every bit position (that is, the write is indivisible from the read). The LOCK output is asserted during both the read and write accesses. Note that, if address translation is enabled for the LOADSET instruction, the TLB memory-protection bits must allow both the read and write accesses. If either the read or write access is not allowed, neither access is performed.

The LOADM instruction loads a specified number of registers from sequential addresses, as explained below.

Load operations are overlapped with the execution of instructions that follow the load instruction. The processor detects any dependencies on the loaded data that subsequent instructions may have, and, if such a dependency is detected, enters the Pipeline Hold mode until the data are returned by the external device or memory. If a register that is the target of an incomplete load is written with the result of a subsequent instruction, the processor does not write the returning data into the register when the load is completed; the Not Needed (NN) bit in the Channel Control Register is set in this case.

Store Operations

The processor provides the following instructions for performing store operations: Store (STORE), Store and Lock (STOREL), and Store Multiple (STOREM). All of these instructions transfer data from one or more general-purpose registers to an external device or memory.

The STOREL instruction supports the implementation of device and memory interlocks in a multiprocessor configuration. It activates the LOCK output during the address cycle of the access.

The STOREM instruction stores a specified number of registers to sequential addresses, as explained below.

Store operations are overlapped with the execution of instructions that follow the store instruction.
However, no data dependencies can exist since the store prevents any subsequent accesses until it is completed.

Multiple Accesses

Load Multiple (LOADM) and Store Multiple (STOREM) instructions move contiguous words of data between general-purpose registers and external devices and memories. The number of transfers is determined by the Load/Store Count Remaining Register.

The Load/Store Count Remaining (CR) field in the Load/Store Count Remaining Register specifies the number of transfers to be performed by the next LOADM or STOREM executed in the instruction sequence. The CR field is in the range of 0 to 255 and is zero-based; a count value of 0 represents one transfer, and a count value of 255 represents 256 transfers. The CR field also appears in the Channel Control Register.

Before a LOADM or STOREM is executed, the CR field is set by a Move To Special Register. A LOADM or STOREM uses the most recently written value of the CR field. If an attempt is made to alter the CR field and the Channel Control Register contains information for an external access that has not yet been completed, the processor enters the Pipeline Hold mode until the access is completed. Note that since the CR is set independently of the LOADM and STOREM, the CR field may represent a valid state of an interrupted program even if the Contents Valid (CV) bit of the Channel Control Register is 0.

Because of the pipelined implementation of LOADM and STOREM, at least one instruction (e.g., the instruction that sets the CR field) must separate two successive LOADM and/or STOREM instructions.

After the CR field is set, the execution of a LOADM or STOREM begins the data transfer. As with any other load or store operation, the LOADM or STOREM waits until any pending load or store operation is complete before starting. The LOADM instruction specifies the starting address and starting destination general-purpose register.
The STOREM instruction specifies the starting address and the starting source general-purpose register.

During the execution of the LOADM or STOREM instruction, the processor updates the address and register number after every access, incrementing the address by 4 and the register number by 1. This continues until either all accesses are completed or an interrupt or trap is taken.

For a Load Multiple or Store Multiple address sequence, addresses wrap from the largest possible value (hexadecimal FFFFFFFC) to the smallest possible value (hexadecimal 00000000).

The processor increments absolute register numbers during the Load Multiple or Store Multiple sequence. Absolute register numbers wrap from 127 to 128, and from 255 to 128. Thus, a sequence that begins in the global registers may make a transition to the local registers, but a sequence that begins in the local registers remains in the local registers. Also, note that the local registers are addressed circularly.

The normal restrictions on register accesses apply for the Load Multiple and Store Multiple sequences. For example, if a protected general-purpose register is encountered in the sequence for a User-mode program, a Protection Violation trap occurs.

Intermediate addresses are stored in the Channel Address Register, and register numbers are stored in the Target Register (TR) field of the Channel Control Register. For the STOREM instruction, the data for every access is stored in the Channel Data Register (this register also is set during the execution of the LOADM instruction, but has no interpretation in this case). The CR field is updated on the completion of every access so that it indicates the number of accesses remaining in the sequence.

Load Multiple and Store Multiple operations are indicated by the Multiple Operation (ML) bit in the Channel Control Register. This bit may be 1 even though the CR field has a value of 0 (indicating that one transfer remains to be performed).
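The address and register stepping described above for LOADM and STOREM can be sketched as a small Python generator. It models only the sequence of (address, absolute register number) pairs for an uninterrupted transfer; the function name and parameters are illustrative, and protection checks, interrupts, and the Stack Pointer mapping of local registers are left out.

```python
def multiple_access_sequence(start_addr, start_reg, cr):
    """Yield (address, absolute register number) for each transfer.

    cr is the zero-based Count Remaining value: 0 means one transfer,
    255 means 256 transfers.
    """
    addr, reg = start_addr, start_reg
    for _ in range(cr + 1):
        yield addr, reg
        # Address advances by one word and wraps from FFFFFFFC to 00000000
        addr = (addr + 4) & 0xFFFFFFFF
        # 127 -> 128 is an ordinary increment; 255 wraps back to 128
        reg = 128 if reg == 255 else reg + 1

# A sequence that starts in the global registers moves into the local registers,
# and the address wraps past the top of the 32-bit address space.
for addr, reg in multiple_access_sequence(0xFFFFFFF8, 126, cr=3):
    print(f"{addr:08X} -> register {reg}")
```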
The ML bit is used to restart a multiple operation on an interrupt return; if it is set independently by a Move To Special Register before a load or store instruction is executed, the results are unpredictable.

While a multiple load or store is executing, the processor is in the Pipeline Hold mode, suspending any subsequent instruction execution until the multiple access is completed. If an interrupt or trap is taken, the Channel Address, Channel Data, and Channel Control registers contain the state of the multiple access at the point of interruption. The multiple access may be resumed at this point, at a later time, by an interrupt return.

The processor attempts to complete multiple accesses using the burst-mode capability of the channel (see Burst-Mode Accesses section). For this reason, multiple accesses of individual bytes and half-words are not supported. If the burst-mode access is preempted, the processor retransmits the address at the point of preemption. If the external device or memory cannot support burst-mode accesses, the processor transmits an address for every access. If the address sequence causes a virtual page-boundary crossing, the processor preempts the burst-mode access, translates the address for the new page, and reestablishes the burst-mode access using the new physical address.

The last load or store is executed as a simple access. The processor will preempt burst-mode transfer immediately prior to the last word of the transfer.

Option Bits

The Option field in the load and store instructions supports system functions, such as byte and half-word accesses.
The definition of this field for a load or store, depending on the AS bit of the instruction, is as follows:

AS   OPT2  OPT1  OPT0   Meaning
x    0     0     0      Word-length access
x    0     0     1      Byte access
x    0     1     0      Half-word access
0    1     0     0      Instruction ROM access (as data)
0    1     0     1      Cache control
0    1     1     0      ADAPT29K accesses
-all others-            Reserved

Note that some of these encodings do not affect processor operation, and could have other interpretations in a particular system. For example, the OPT values 000, 001, and 010 affect processor operation only if the DW bit of the Configuration Register is 1. However, nonstandard uses of the OPT field have an implication on the portability of software between different systems.

Addressing and Alignment

Address Spaces

External instructions and data are contained in one of four 32-bit address spaces:

1. Instruction/Data Memory
2. Input/Output
3. Coprocessor
4. Instruction Read-Only Memory (Instruction ROM).

An address in the instruction/data memory address space may be treated as virtual or physical, as determined by the Current Processor Status Register. Address translation for data accesses is enabled separately from address translation for instruction accesses. A program in the Supervisor mode may temporarily disable address translation for individual loads and stores; this permits load-real and store-real operations.

It is possible to partition physical instruction and data addresses into two separate physical address spaces. However, virtual instruction and data addresses appear in the same virtual address space (i.e., instruction/data memory).

The coprocessor address space is not an address space in the strictest sense. The coprocessor address space is defined so that transfers of operands and operation codes to the coprocessor do not interfere with other external devices and memories.
The processor does not directly support the access of the instruction ROM address space using loads and stores; this capability is defined as a system option requiring external hardware.

For untranslated data accesses, bits contained in load and store instructions distinguish between the instruction/data memory, input/output, and coprocessor address spaces. For translated data accesses, the Input/Output bit of the associated TLB entry distinguishes between the instruction/data memory and input/output address spaces. For instruction fetches, the ROM Enable (RE) bit of the Current Processor Status Register distinguishes between the instruction/data and instruction ROM address spaces.

Byte and Half-Word Addressing

The Am29000 generates word-oriented byte addresses for accesses to external devices and memories. Addresses are word-oriented because loads, stores, and instruction fetches access words. However, addresses are byte addresses because they are sufficient to select bytes packed within accessed words. For load and store operations, the processor provides means for using the least-significant address bits to access bytes and half-words within external words.

The selection of a byte within a word is determined by the two least-significant bits of an address and the Byte Order (BO) bit of the Configuration Register. The selection of a half-word within a word is determined by the next-to-least-significant bit of an address and the BO bit.

Figure 51 illustrates the addressing of bytes and half-words when the BO bit is 0, and Figure 52 illustrates the addressing of bytes and half-words when the BO bit is 1. In Figure 51 and Figure 52, addresses are represented in hexadecimal notation.

In the processor, the two least-significant bits of an external address can be reflected in the Byte Pointer (BP) field of the ALU Status Register when the DW bit of the Configuration Register is 0.
Alternatively, the two least-significant bits of the address can be used to control byte and half-word accesses when the DW bit is 1.

The BO bit affects only the interpretation of the BP field and the two least-significant address bits. If the BO bit is 0, bytes are ordered within words such that a 00 in the BP field or in the two least-significant address bits selects the high-order byte of a word, and a 11 selects the low-order byte. If the BO bit is 1, a 00 in the BP field or in the two least-significant address bits selects the low-order byte of a word, and a 11 selects the high-order byte.

If the BO bit is 0, half-words are ordered within words such that a 0 in the most-significant bit of the BP field or the next-to-least-significant address bit selects the high-order half-word, and a 1 selects the low-order half-word. If the BO bit is 1, a 0 in the most-significant bit of the BP field or the next-to-least-significant address bit selects the low-order half-word of a word, and a 1 selects the high-order half-word. Note that since the least-significant bit of the BP field or an address does not participate in the selection of half-words, the alignment of half-words is forced to half-word boundaries in this case.

Alignment of Words and Half-Words

Since only byte addressing is supported, it is possible that an address for the access of a word or half-word is not aligned to the desired word or half-word. The Am29000 either ignores or forces alignment in most cases. However, some systems may require that unaligned accesses be supported for compatibility reasons. Because of this, the Am29000 provides an option that creates a trap when a nonaligned access is attempted. This trap allows software emulation of the nonaligned accesses in a manner that is appropriate for the particular system.

The detection of unaligned accesses is activated by a 1 in the Trap Unaligned Access (TU) bit of the Current Processor Status Register.
Unaligned access detection is based on the data length as indicated by the OPT field of a load or store instruction, and on the two least-significant bits of the specified address. Only addresses for instruction/data memory accesses are checked; alignment is ignored for input/output accesses and coprocessor transfers.

Figure 51. Byte and Half-Word Addressing with BO = 0

           Half-word            Half-word            Byte       Byte       Byte       Byte
Word       (bits 31-16)         (bits 15-0)          (31-24)    (23-16)    (15-8)     (7-0)
00000000   00000000             00000002             00000000   00000001   00000002   00000003
00000004   00000004             00000006             00000004   00000005   00000006   00000007
FFFFFFFC   FFFFFFFC             FFFFFFFE             FFFFFFFC   FFFFFFFD   FFFFFFFE   FFFFFFFF

Figure 52. Byte and Half-Word Addressing with BO = 1

           Half-word            Half-word            Byte       Byte       Byte       Byte
Word       (bits 31-16)         (bits 15-0)          (31-24)    (23-16)    (15-8)     (7-0)
00000000   00000002             00000000             00000003   00000002   00000001   00000000
00000004   00000006             00000004             00000007   00000006   00000005   00000004
FFFFFFFC   FFFFFFFE             FFFFFFFC             FFFFFFFF   FFFFFFFE   FFFFFFFD   FFFFFFFC

An Unaligned Access trap occurs only if the TU bit is 1 and any of the following combinations of OPT field and address bits is detected for a load or store to instruction/data memory:

OPT2  OPT1  OPT0  A1  A0   Meaning
0     0     0     0   1    Unaligned word access
0     0     0     1   0    Unaligned word access
0     0     0     1   1    Unaligned word access
0     1     0     0   1    Unaligned half-word access
0     1     0     1   1    Unaligned half-word access

The trap handler for the Unaligned Access trap is responsible for generating the correct sequence of aligned accesses and performing any necessary shifting, masking and/or merging. Note that a virtual page-boundary crossing also may have to be considered.

Alignment of Instructions

In the Am29000, all instructions are 32 bits in length, and are aligned on word-address boundaries. The processor's Program Counter is 30 bits in length, and the least-significant 2 bits of processor-generated instruction addresses are always 00.
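As an aside on the data-access rules above, the byte and half-word selection under the BO bit and the Unaligned Access trap conditions can be expressed as a short model. This is an illustrative sketch: the function names and the `OPT_*` constants are hypothetical, and the unaligned check applies only to instruction/data memory accesses, as the text states.

```python
OPT_WORD, OPT_BYTE, OPT_HALF = 0b000, 0b001, 0b010   # OPT field encodings


def select_byte(word, addr, bo):
    """Return the byte of a 32-bit word selected by address bits 1-0."""
    index = addr & 0b11
    # BO = 0: 00 selects the high-order byte; BO = 1: 00 selects the low-order byte.
    shift = 8 * (index if bo else 3 - index)
    return (word >> shift) & 0xFF


def select_halfword(word, addr, bo):
    """Return the half-word selected by address bit 1 (bit 0 is ignored)."""
    index = (addr >> 1) & 1
    shift = 16 * (index if bo else 1 - index)
    return (word >> shift) & 0xFFFF


def is_unaligned(opt, addr, tu):
    """True if a load or store to instruction/data memory would cause
    an Unaligned Access trap (requires TU = 1)."""
    if not tu:
        return False
    if opt == OPT_WORD:
        return (addr & 0b11) != 0     # word: A1-A0 must be 00
    if opt == OPT_HALF:
        return (addr & 0b01) != 0     # half-word: A0 must be 0
    return False                      # byte accesses are always aligned


print(hex(select_byte(0x11223344, 0, bo=0)))   # high-order byte when BO = 0
print(hex(select_byte(0x11223344, 0, bo=1)))   # low-order byte when BO = 1
```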
An unaligned address can be generated by indirect jumps and calls. However, alignment is ignored by the processor in this case, and it expects the system to force alignment (i.e., by interpreting the two least-significant address bits as 00, regardless of their values).

Accessing Instructions as Data

To aid the external access of instructions and data on separate buses, the processor distinguishes between instruction and data accesses. However, it does not support a logical distinction between instruction and data address spaces (except in the case of instruction read-only memory). In particular, address translation in the Memory Management Unit is in no way affected by this distinction (although memory protection is).

In systems where it is necessary to access instructions as data, this function should be performed via the shared address space. The OPT field provides a means for loads to access instructions in the instruction read-only memory (ROM) address space. The Am29000 does not take any action to prevent a store to the instruction ROM address space.

Byte and Half-Word Accesses

The Am29000 can perform byte and half-word accesses in either software or hardware under control of the Data Width Enable (DW) bit of the Configuration Register. Software byte and half-word accesses are selected by a DW bit of 0, and hardware byte and half-word accesses are selected by a DW bit of 1.

Software byte and half-word accesses are less efficient than hardware byte and half-word accesses, but hardware accesses require that the system be able to selectively write individual byte and half-word positions within external devices and memories. The software-only technique is compatible with systems designed to provide hardware support for byte and half-word accesses.

This section describes the operation of both software and hardware byte and half-word accesses. Byte and half-word accesses operate as described here for memory and input/output accesses, but not for coprocessor transfers. Coprocessor transfers are unaffected by the DW bit.

The DW bit is cleared by a processor reset. It must explicitly be set to 1 by software before hardware byte and half-word accesses can be performed.

Software Byte and Half-Word Accesses

If the DW bit is 0, the Am29000 allows the Byte Pointer Register to be set with the least-significant bits of an address specified by any load or store instruction, except those that transfer information to and from the coprocessor. Insert and extract instructions can then be used to access the byte or half-word of interest, after the external word has been accessed. This provides a general-purpose mechanism for manipulating external byte and half-word data, without the need for external hardware support.

To load a byte or half-word, a word load is first performed. This load sets the BP field with the two least-significant bits of the address. A subsequent EXBYTE, EXHW, or EXHWS instruction extracts the byte or half-word of interest from the accessed word.

To store a byte or half-word, a load is first performed, setting the BP field with the two least-significant bits of the address. A subsequent INBYTE or INHW instruction inserts the byte or half-word of interest into the accessed word, and the resulting word is then stored.

Software that relies on loads and stores setting the BP field cannot operate correctly when the Freeze (FZ) bit of the Current Processor Status Register is 1, because the ALU Status Register is frozen.

Hardware Byte and Half-Word Accesses

If the DW bit is 1 on a load, the Am29000 selects a byte or half-word from the loaded word depending on the Option (OPT) bits of the load instruction, the Byte Order (BO) bit of the Configuration Register, and the two least-significant bits of the address (for bytes) or the next-to-least-significant bit of the address (for half-words). The selected byte or half-word is right-justified within the destination register. If the SB bit of the load instruction is 0, the remainder of the destination register is zero-extended. If the SB bit is 1, the remainder of the destination register is sign-extended with the sign bit of the selected byte or half-word.

If the DW bit is 1 on a store, the Am29000 replicates the low-order byte or half-word in the source register into every byte and half-word position of the stored word. The system is responsible for generating the appropriate byte and/or half-word strobes, based on the OPT2-OPT0 signals and the two least-significant bits of the address, to write the appropriate byte or half-word in the selected device or memory (the system byte order must also be considered). The SB bit does not affect the operation of a store, except for setting the BP field as described below.

If the SB bit is 1 for either a load or store and the DW bit is also 1, both bits of the BP field are set to the complement of the BO bit when the load or store is executed. This does not directly affect the load or store access, but supports compatibility for software developed for word-write-only systems.

Hardware byte and half-word accesses, in contrast to software byte and half-word accesses, can be performed when the FZ bit is 1, because these accesses do not rely on the BP field.

System Alternatives and Compatibility

The two mechanisms for performing byte and half-word accesses create the possibility of two types of systems. These are named for convenience:

• Type 1: simple, word-only accesses in external devices and memories; software byte and half-word accesses.
• Type 2: byte/half-word strobes in external devices and memories; hardware byte and half-word accesses by the Am29000.

The provision for hardware byte and half-word accesses encourages Type 2 systems. Software for Type 1 systems can execute on Type 2 systems, but the reverse is not true. Software compatibility is possible primarily because of the DW bit and because the Am29000 sets the BP field with an appropriate byte pointer even when it performs byte and half-word accesses with internal hardware. Also, the system must return a full word in either type of system, regardless of the access data width. The DW bit must be 0 in Type 1 systems and must be 1 in Type 2 systems.

To illustrate compatibility between systems, consider the following steps of an unsigned byte load compiled for a Type 1 system, but executing on a Type 2 system:

1. Perform a load with OPT = 001 and SB = 1.
   • Type 1 system: The addressed word is accessed and placed into the destination register. The BP field is set with the two least-significant bits of the address.
   • Type 2 system: The addressed byte is accessed, aligned, padded, and placed into the destination register. The BP field is set to point to the low-order byte, reflecting the alignment that has been performed (the pointer depends on the value of the BO bit).

2. Perform a byte extract on the loaded word.
   • Type 1 system: The byte selected by the BP field is aligned to the low-order byte of the destination register and the remainder of the word is zero-extended. The selected byte may be in any byte position.
   • Type 2 system: The byte selected by the BP field (set to point to the low-order byte) is aligned to the low-order byte of the destination register and the remainder of the word is zero-extended. (Note that the selected byte was already in the low-order byte position. This operation does not change the program state but merely allows software compatibility.)

The recommended instruction sequences for all types of byte and half-word accesses and for both types of systems are enumerated below. Compatibility between these systems follows the above example, but for brevity, compatibility is not described in detail here.

Byte read, unsigned:

  Type 1
    load    0,17,temp,addr      ; OPT = 001, SB = 1
    exbyte  temp,temp,0         ; get byte
  Type 2
    load    0,1,temp,addr       ; OPT = 001, SB = 0

Byte read, signed:

  Type 1
    load    0,17,temp,addr      ; OPT = 001, SB = 1
    exbyte  temp,temp,0         ; get byte
    sll     temp,temp,24        ; sign extend
    sra     temp,temp,24
  Type 2
    load    0,17,temp,addr      ; OPT = 001, SB = 1 (sign extended)

Byte write:

  Type 1
    load    0,17,temp,addr      ; OPT = 001, SB = 1
    inbyte  temp,temp,data      ; insert byte
    store   0,1,temp,addr       ; store
  Type 2
    store   0,1,data,addr       ; OPT = 001, SB = 0

Half-word read, unsigned:

  Type 1
    load    0,18,temp,addr      ; OPT = 010, SB = 1
    exhw    temp,temp,0         ; get half-word unsigned
  Type 2
    load    0,2,temp,addr       ; OPT = 010, SB = 0

Half-word read, signed:

  Type 1
    load    0,18,temp,addr      ; OPT = 010, SB = 1
    exhws   temp,temp           ; get half-word sign-extend
  Type 2
    load    0,18,temp,addr      ; OPT = 010, SB = 1 (sign-extend)

Half-word write:

  Type 1
    load    0,18,temp,addr      ; OPT = 010, SB = 1
    inhw    temp,temp,data      ; insert half-word
    store   0,2,temp,addr       ; store
  Type 2
    store   0,2,data,addr       ; OPT = 010, SB = 0

INTERRUPTS AND TRAPS

Interrupts and traps cause the Am29000 to suspend the execution of an instruction sequence and to begin the execution of a new sequence. The processor may or may not later resume the execution of the original instruction sequence.

The distinction between interrupts and traps is largely one of causation and enabling. Interrupts allow external devices and the Timer Facility to control processor execution, and are always asynchronous to program execution. Traps are intended to be used for certain exceptional events that occur during instruction execution, and are generally synchronous to program execution.

Throughout this manual, a distinction is made between the point at which an interrupt or trap occurs and the point at which it is taken. An interrupt or trap is said to occur when all conditions that define the interrupt or trap are met. However, an interrupt or trap that occurs is not necessarily recognized by the processor, either because of various enables or because of the processor's operational mode (e.g., Halt mode). An interrupt or trap is taken when the processor recognizes the interrupt or trap and alters its behavior accordingly.

Interrupts

Interrupts are caused by signals applied to any of the external inputs INTR3-INTR0, or by the Timer Facility. The processor may be disabled from taking certain interrupts by the masking capability provided by the Disable All Interrupts and Traps (DA) bit, Disable Interrupts (DI) bit, and Interrupt Mask (IM) field in the Current Processor Status Register.

The DA bit disables all interrupts and most traps. The DI bit disables external interrupts without affecting the recognition of traps and Timer interrupts. The 2-bit IM field selectively enables external interrupts as follows:

IM Value   Result
00         INTR0 enabled
01         INTR1-INTR0 enabled
10         INTR2-INTR0 enabled
11         INTR3-INTR0 enabled

Note that the INTR0 interrupt cannot be disabled by the IM field. Also, note that no external interrupt is taken if either the DA or DI bit is 1.

The Interrupt Pending bit in the Current Processor Status indicates that one or more of the signals INTR3-INTR0 is active, but that the corresponding interrupt is disabled due to the value of either DA, DI, or IM.

Traps

Traps are caused by signals applied to one of the inputs TRAP1-TRAP0, or by exceptional conditions such as protection violations. Except for the Instruction Access Exception, Data Access Exception, and Coprocessor Exception traps, traps are disabled by the DA bit in the Current Processor Status; a 1 in the DA bit disables traps, and a 0 enables traps. It is not possible to selectively disable individual traps.

Wait Mode

A wait-for-interrupt capability is provided by the Wait mode. The processor is in the Wait mode whenever the Wait Mode (WM) bit of the Current Processor Status is 1. While in Wait mode, the processor neither fetches nor executes instructions and performs no external accesses. The Wait mode is exited when an interrupt or trap is taken.

Note that the processor can take only those interrupts or traps for which it is enabled, even in the Wait mode. For example, if the processor is in the Wait mode with a DA bit of 1, it can leave the Wait mode only via the Reset mode or a WARN trap.

Vector Area

Interrupt and trap processing rely on the existence of a user-managed Vector Area in external instruction/data memory or instruction read-only memory (instruction ROM). The Vector Area begins at an address specified by the Vector Area Base Address Register, and provides for as many as 256 different interrupt and trap handling routines. The processor reserves 24 routines for system operation and 40 routines for instruction emulation. The number and definition of the remaining 192 possible routines are system-dependent.

The Vector Area has one of two possible structures as determined by the Vector Fetch (VF) bit in the Configuration Register. The first structure, as described below, requires less external memory than the second, but imposes the performance penalty of the vector-table lookup.

If the VF bit is 1, the structure of the Vector Area is a table of vectors in instruction/data memory. The layout of a single vector is shown in Figure 53. Each vector gives the beginning word-address of the associated interrupt or trap handling routine, and specifies, by the R bit, whether the routine is contained in instruction/data memory (R = 0) or instruction ROM (R = 1).

If the VF bit is 0, the structure of the Vector Area is a segment of contiguous blocks of instructions in instruction/data memory or instruction ROM.
The ROM Vector Area (RV) bit of the Configuration Register determines whether the Vector Area is in instruction/data memory (RV = 0) or instruction ROM (RV = 1). A 64-instruction block contains exactly one interrupt or trap handling routine, and blocks are aligned on 64-instruction address boundaries.

Figure 53. Vector Table Entry (bits 31-2: Handler Starting Address; bit 1: R; bit 0: 0)

Vector Numbers

When an interrupt or trap is taken, the processor determines an 8-bit vector number associated with the interrupt or trap. The vector number gives either the number of a vector table entry or the number of an instruction block, depending on the value of the VF bit.

If the VF bit is 1, the physical address of the vector table entry is generated by replacing bits 9-2 of the value in the Vector Area Base Address Register with the vector number.

If the VF bit is 0, the physical address of the first instruction of the handling routine is generated by replacing bits 15-8 of the value in the Vector Area Base Address Register with the vector number.

Vector numbers are either predefined or specified by an instruction causing the trap. The assignment of vector numbers is shown in Figure 54 (vector numbers are in decimal notation). Vector numbers 64 to 255 are for use by trapping instructions; the definition of the routines associated with these numbers is system-dependent.

Interrupt and Trap Handling

Interrupt and trap handling consists of two distinct operations: taking the interrupt or trap, and returning from the interrupt or trap handler. If the interrupt or trap handler returns directly to the interrupted routine, the interrupt or trap handler need not save and restore processor state.

Taking an Interrupt or Trap

The following operations are performed in sequence by the processor when an interrupt or trap is taken:

1. Instruction execution is suspended.

2. Instruction fetching is suspended.

3. Any in-progress load or store operation is completed. Any additional operations are canceled in the case of Load Multiple and Store Multiple.

4. The contents of the Current Processor Status Register are copied into the Old Processor Status Register.

5. The Current Processor Status Register is modified as shown in Figure 55 (the value "u" means unaffected). Note that setting the Freeze (FZ) bit freezes the Channel Address, Channel Data, Channel Control, Program Counter 0, Program Counter 1, Program Counter 2, and ALU Status Registers.

6. The address of the first instruction of the interrupt or trap handler is determined. If the VF bit of the Configuration Register is 1, the address is obtained by accessing a vector from instruction/data memory, using the physical address obtained from the Vector Area Base Address Register and the vector number. This access appears on the channel as a data access, and the OPT2-OPT0 signals indicate a word-length access. If the VF bit is 0, the instruction address is given directly by the Vector Area Base Address Register and the vector number.

7. If the VF bit is 1, the R bit in the vector fetched in Step 6 is copied into the RE bit of the Current Processor Status Register. If the VF bit is 0, the RV bit of the Configuration Register is copied into the RE bit. This step determines whether or not the first instruction of the interrupt handler is in instruction ROM.

8. An instruction fetch is initiated using the instruction address determined in Step 6. At this point, normal instruction execution resumes.

Note that the processor does not explicitly save the contents of any registers when an interrupt is taken. If register saving is required, it is the responsibility of the interrupt or trap-handling routine. For proper operation, registers must be saved before any further interrupts or traps may be taken.
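The two address computations used in Step 6 can be sketched as follows. This is an illustration under stated assumptions, not the processor's implementation: the function names are hypothetical, `vab` stands for the value in the Vector Area Base Address Register, and the entry layout in `decode_vector` (R in bit 1, bit 0 zero) is read from Figure 53.

```python
def vector_entry_address(vab, vector_number):
    """VF = 1: replace bits 9-2 of the base value with the vector number."""
    return (vab & ~(0xFF << 2) & 0xFFFFFFFF) | ((vector_number & 0xFF) << 2)


def handler_block_address(vab, vector_number):
    """VF = 0: replace bits 15-8 of the base value with the vector number.

    Each block holds 64 instructions of 4 bytes, i.e. 256 bytes, which is
    why the vector number lands in bits 15-8.
    """
    return (vab & ~(0xFF << 8) & 0xFFFFFFFF) | ((vector_number & 0xFF) << 8)


def decode_vector(entry):
    """Split a fetched vector into (handler word address, R bit).

    Assumes the Figure 53 layout: address in bits 31-2, R in bit 1.
    """
    return entry & 0xFFFFFFFC, (entry >> 1) & 1


TIMER = 14                                   # Timer vector number (Figure 54)
print(hex(vector_entry_address(0x00010000, TIMER)))
print(hex(handler_block_address(0x00010000, TIMER)))
```

With 256 vector numbers, the table form (VF = 1) occupies 256 x 4 bytes = 1 Kbyte, while the block form (VF = 0) spans 256 x 256 bytes = 64 Kbytes, which matches the text's remark that the table structure requires less external memory.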
The FZ bit must be reset at least two instructions before interrupts or traps are reenabled to allow the program state to be reflected properly in processor registers if an interrupt or trap is taken.

Returning from an Interrupt or Trap

Two instructions are used to resume the execution of an interrupted program: Interrupt Return (IRET), and Interrupt Return and Invalidate (IRETINV). These instructions are identical except in one respect: the IRETINV instruction resets all Valid bits in the Branch Target Cache, whereas the IRET instruction does not affect the Valid bits.

In some situations, the processor state must be set properly by software before the interrupt return is executed. The following is a list of operations normally performed in such cases:

1. The Current Processor Status is configured as shown in Figure 56 (the value "x" is a "don't care"). Note that setting the FZ bit freezes the registers listed below so that they may be set for the interrupt return.

2. The Old Processor Status is set to the value of the Current Processor Status for the target routine.

3. The Channel Address, Channel Data, and Channel Control registers are set to restart or resume uncompleted channel operations of the target routine.

4. The Program Counter 1 and Program Counter 0 registers are set to the addresses of the first and second instructions, respectively, to be executed in the target routine.

5. Other registers are set as required. These may include registers such as the ALU Status, Q, and so forth, depending on the particular situation. Some of these registers are unaffected by the FZ bit, so they must be set in such a manner that they are not modified unintentionally before the interrupt return.

Once the processor registers are configured properly, as described above, an interrupt return instruction (IRET or IRETINV) performs the remaining steps necessary to return to the target routine. The following operations are performed by the interrupt return instruction:

1. Any in-progress load or store operation is completed. If a Load Multiple or Store Multiple sequence is in progress, the interrupt return is not executed until the sequence is completed.

2. Interrupts and traps are disabled, regardless of the settings of the DA, DI, and IM fields of the Current Processor Status, for Steps 3 through 10.

3. If the interrupt return instruction is an IRETINV, all Valid bits in the Branch Target Cache are reset.

4. The contents of the Old Processor Status Register are copied into the Current Processor Status Register. This normally resets the FZ bit, allowing the Program Counter 0, 1, 2, Channel Address, Data, Control, and ALU Status registers to update normally. Since certain bits of the Current Processor Status Register always are updated by the processor, this copy operation may be irrelevant for certain bits (e.g., the Interrupt Pending bit).

5. If the Contents Valid (CV) bit of the Channel Control Register is 1, and the Not Needed (NN) and Multiple Operation (ML) bits are both 0, an external access is started. This operation is based on the contents of the Channel Address, Channel Data, and Channel Control registers. The Current Processor Status Register conditions the access, as is normally the case. Note that Load Multiple and Store Multiple operations are not restarted at this point.

6. The address in Program Counter 1 is used to fetch an instruction. The Current Processor Status Register conditions the fetch. This step is treated as a branch in the sense that the processor searches the Branch Target Cache for the target of the fetch.

7. The instruction fetched in Step 6 enters the decode stage of the pipeline.

8. The address in Program Counter 0 is used to fetch an instruction. The Current Processor Status Register conditions the fetch. This step is treated as a branch in the sense that the processor searches the Branch Target Cache for the target of the fetch.

9. The instruction fetched in Step 6 enters the execute stage of the pipeline, and the instruction fetched in Step 8 enters the decode stage.

10. If the CV bit in the Channel Control Register is a 1, the NN bit is 0, and the ML bit is 1, a Load Multiple or Store Multiple sequence is started, based on the contents of the Channel Address, Channel Data, and Channel Control registers.

11. Interrupts and traps are enabled per the appropriate bits in the Current Processor Status Register.

12. The processor resumes normal operation.

Figure 54. Vector Number Assignments

Number   Type of Trap or Interrupt                 Cause
0        Illegal Opcode                            executing undefined instruction
1        Unaligned Access                          access on unnatural boundary, TU = 1
2        Out of Range                              overflow or underflow
3        Coprocessor Not Present                   coprocessor access, CP = 0
4        Coprocessor Exception                     coprocessor DERR response
5        Protection Violation                      invalid User-mode operation
6        Instruction Access Exception              IERR response
7        Data Access Exception                     DERR response, not coprocessor
8        User-Mode Instruction TLB Miss            no TLB entry for translation
9        User-Mode Data TLB Miss                   no TLB entry for translation
10       Supervisor-Mode Instruction TLB Miss      no TLB entry for translation
11       Supervisor-Mode Data TLB Miss             no TLB entry for translation
12       Instruction TLB Protection Violation      TLB UE/SE = 0
13       Data TLB Protection Violation             TLB UR/SR = 0, UW/SW = 0 on write
14       Timer                                     Timer Facility
15       Trace                                     Trace Facility
16       INTR0                                     INTR0 input
17       INTR1                                     INTR1 input
18       INTR2                                     INTR2 input
19       INTR3                                     INTR3 input
20       TRAP0                                     TRAP0 input
21       TRAP1                                     TRAP1 input
22       Floating-Point Exception                  unmasked floating-point exception
23       reserved
24-29    reserved for instruction emulation (opcodes D8-DD)
30       MULTM                                     MULTM instruction
31       MULTMU                                    MULTMU instruction
32       MULTIPLY                                  MULTIPLY instruction
33       DIVIDE                                    DIVIDE instruction
34       MULTIPLU                                  MULTIPLU instruction
35       DIVIDU                                    DIVIDU instruction
36       CONVERT                                   CONVERT instruction
37       SQRT                                      SQRT instruction
38       CLASS                                     CLASS instruction
39-41    reserved for instruction emulation (opcodes E7-E9)
42       FEQ                                       FEQ instruction
43       DEQ                                       DEQ instruction
44       FGT                                       FGT instruction
45       DGT                                       DGT instruction
46       FGE                                       FGE instruction
47       DGE                                       DGE instruction
48       FADD                                      FADD instruction
49       DADD                                      DADD instruction
50       FSUB                                      FSUB instruction
51       DSUB                                      DSUB instruction
52       FMUL                                      FMUL instruction
53       DMUL                                      DMUL instruction
54       FDIV                                      FDIV instruction
55       DDIV                                      DDIV instruction
56       reserved for instruction emulation (opcode F8)
57       FDMUL                                     FDMUL instruction
58-63    reserved for instruction emulation (opcodes FA-FF)
64-255   Assert and EMULATE instruction traps (vector number specified by instruction)

Figure 55. Current Processor Status after an Interrupt or Trap

Figure 56. Current Processor Status Before Interrupt Return

Fast Interrupt Processing

The registers affected by the FZ bit of the Current Processor Status Register are those that are modified by almost any usual sequence of instructions. Since the FZ bit is set by an interrupt or trap, the interrupt or trap handler is able to execute while not disturbing the state of the interrupted routine, though its execution is somewhat restricted. Thus, it is not necessary in many cases for the interrupt or trap handler to save the registers that are affected by the FZ bit.

The processor provides an additional benefit if the Program Counter 0 and Program Counter 1 registers are not modified by the interrupt or trap handler. If Program Counters 0 and 1 contain the addresses of sequential instructions when an interrupt or trap is taken, and if they are not modified before an interrupt return is executed, Step 8 of the interrupt return sequence above occurs as a sequential fetch, instead of a branch, for the interrupt return.
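Two of the decisions in the interrupt return sequence lend themselves to a compact illustration: which step, if any, restarts a channel operation (Steps 5 and 10), and whether the Step 8 fetch is sequential. The sketch below is hypothetical; the function names are invented, and treating "sequential" as Program Counter 0 = Program Counter 1 + 4 is an assumption drawn from the statement that these registers hold the first and second instructions to be executed.

```python
def channel_restart_step(cv, nn, ml):
    """Return the interrupt-return step that restarts a channel operation,
    or None if nothing is restarted.

    CV, NN, and ML are the Contents Valid, Not Needed, and Multiple
    Operation bits of the Channel Control Register.
    """
    if cv and not nn:
        # ML = 0: simple access restarted in Step 5.
        # ML = 1: Load/Store Multiple sequence restarted in Step 10.
        return 10 if ml else 5
    return None


def pc0_fetch_is_sequential(pc1, pc0):
    """Assumption: the Step 8 fetch is sequential when Program Counter 0
    holds the word address immediately following Program Counter 1."""
    return pc0 == ((pc1 + 4) & 0xFFFFFFFF)


print(channel_restart_step(cv=1, nn=0, ml=1))
print(pc0_fetch_is_sequential(0x00001000, 0x00001004))
```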
The performance impact of a sequential fetch is normally less than that of a nonsequential fetch.

Because the registers affected by the FZ bit are sometimes required for instruction execution, it is not possible for the interrupt or trap handler to execute all instructions unless the required registers are first saved elsewhere (e.g., in one or more global registers). Most of the restrictions due to register dependencies are obvious (e.g., the Byte Pointer for byte extracts), and will not be discussed here. Other less obvious restrictions are listed below:

1. Load Multiple and Store Multiple. The Channel Address, Channel Data, and Channel Control registers are used to sequence Load Multiple and Store Multiple operations, so these instructions cannot be executed while the registers are frozen. However, note that other external accesses may occur; the Channel Address, Channel Data, and Channel Control registers are required only to restart an access after an exception, and the interrupt or trap handler is not expected to encounter any exceptions.

2. Loads and stores that set the Byte Pointer. If the Set Byte Pointer (SB) bit of a load or store instruction is 1 and the FZ bit is also 1, there is no effect on the Byte Pointer. Thus, the execution of external byte and half-word accesses using this mechanism is not possible.

3. Extended arithmetic. The Carry bit of the ALU Status Register is not updated while the FZ bit is 1.

4. Divide step instructions. The Divide Flag of the ALU Status Register is not updated when the FZ bit is 1.

If the interrupt or trap handler does not save the state of the interrupted routine, it cannot allow additional interrupts and traps. Also, the operation of the interrupt or trap handler cannot depend on any trapping instructions (e.g., Floating-Point instructions, illegal operation codes, arithmetic overflow, etc.) since these are disabled.
There are certain cases, however, where traps are unavoidable; these are discussed in the Arithmetic Exceptions section.

WARN Trap

The processor recognizes a special trap, caused by the activation of the WARN input, that cannot be masked. The WARN trap is intended to be used for severe system-error or deadlock conditions. It allows the processor to be placed in a known, operable state, while preserving much of its original state for error reporting and possible recovery. Therefore, it shares some features with the Reset mode as well as features common to other traps described in this section.

The major differences between the WARN trap and other traps are:

1. The processor does not wait for an in-progress external access to be completed before taking the trap, since this access might not be completed. However, the information related to any outstanding access is retained by the Channel Address, Channel Data, and Channel Control registers when the trap is taken.

2. The vector-fetch operation is not performed, regardless of the VF bit of the Configuration Register, when the WARN trap is taken. Instead, the ROM Enable (RE) bit in the Current Processor Status is set, and instruction fetching begins immediately at Address 16 in the instruction ROM. The trap handler executes directly from the instruction ROM without the need to access external (and possibly nonfunctional or invalid) instruction/data memory.

Note that the WARN trap may disrupt the state of the routine that is executing when it is taken, prohibiting this routine from being restarted.

Sequencing of Interrupts and Traps

On every cycle, the processor decides either to execute instructions or to take an interrupt or trap. Since there are multiple sources of interrupts and traps, more than one interrupt or trap may be pending on a given cycle. To resolve conflicts, interrupts and traps are taken according to the priority shown in Figure 57.
In this table, interrupts and traps are listed in order of decreasing priority. This section discusses the first three columns of Figure 57. The last two columns are discussed in the Exception Reporting and Restarting section.

In Figure 57, interrupts and traps fall into one of two categories depending on the timing of their occurrence relative to instruction execution. These categories are indicated in the third column by the labels "inst" and "async." These labels have the following meanings:

1. Inst: Generated by the execution or attempted execution of an instruction.

2. Async: Generated asynchronous to and independent of the instruction being executed, although it may be a result of an instruction executed previously.

The principle for interrupt and trap sequencing is that the highest priority interrupt or trap is taken first. Other interrupts and traps remain active until they can be taken, or are regenerated when they can be taken. This is accomplished, depending on the type of interrupt or trap, as follows:

1. All traps in Figure 57 with Priority 13 or 14 are regenerated by the re-execution of the causing instruction.

2. Most of the interrupts and traps of Priorities 4 through 12 must be held by external hardware until they are taken. The exceptions to this are listed in (3) below.

3. The exceptions to (2) above are the Data Access Exception trap, the Coprocessor Exception trap, the Timer interrupt, and the Trace trap. These are caused by bits in various registers in the processor and are held by these registers until taken or cleared. The relevant bits are: the Transaction Faulted (TF) bit of the Channel Control Register for Data Access Exception and Coprocessor Exception traps, the Interrupt (IN) bit of the Timer Reload Register for Timer interrupts, and the Trace Pending (TP) bit of the Current Processor Status Register for Trace traps.

4. All traps of Priorities 2 and 3 in Figure 57, except for the Unaligned Access trap, are not regenerated.
These traps are mutually exclusive and are given high priority because they cannot be regenerated; they must be taken if they occur. If one of these traps occurs at the same time as a reset or WARN trap, it is not taken, and its occurrence is lost.

5. The Unaligned Access trap is regenerated internally when an external access is restarted by the Channel Address, Channel Data, and Channel Control registers. Note that this trap is not necessarily exclusive to the traps discussed in (4) above.

Note that the Channel Address, Channel Data, and Channel Control registers are set for a WARN trap only if an external access is in progress when the trap is taken.

Exception Reporting and Restarting

When an instruction encounters an exceptional condition, the Program Counter 0, Program Counter 1, and Program Counter 2 registers report the relevant instruction address(es), and allow the instruction sequence to be restarted once the exceptional condition has been remedied (if possible). Similarly, when an external access or coprocessor transfer encounters an exceptional condition, the Channel Address, Channel Data, and Channel Control registers report information on the access or transfer, and allow it to be restarted. This section describes the interpretation and use of these registers.

The "PC1" column in Figure 57 describes the value held in the Program Counter 1 Register (PC1) when the interrupt or trap is taken. For traps in the "inst" category, PC1 contains either the address of the instruction causing the trap, indicated by "curr," or the address of the instruction following the instruction causing the trap, indicated by "next."

For interrupts and traps in the "async" category, PC1 contains the address of the first instruction that was not executed due to the taking of the interrupt or trap. This is the next instruction to be executed upon interrupt return, as indicated by "next" in the PC1 column.
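The sequencing rules (1) through (5) above can be restated as a small lookup. This is only a sketch: the function name, the trap-name strings, and the returned phrases are hypothetical labels chosen for illustration, and the WARN trap (Priority 1) is deliberately left out because the rules above do not cover it.

```python
# Traps and interrupts of Priorities 4 through 12 that are held by a bit
# inside the processor rather than by external hardware (rule 3).
HELD_BY_REGISTER_BIT = {
    "Data Access Exception": "TF bit of the Channel Control Register",
    "Coprocessor Exception": "TF bit of the Channel Control Register",
    "Timer": "IN bit of the Timer Reload Register",
    "Trace": "TP bit of the Current Processor Status Register",
}


def how_trap_stays_pending(priority, name):
    """Describe how a not-yet-taken interrupt or trap is kept or regenerated."""
    if priority in (13, 14):
        # Rule 1: the causing instruction is simply executed again.
        return "regenerated by re-executing the causing instruction"
    if 4 <= priority <= 12:
        bit = HELD_BY_REGISTER_BIT.get(name)
        if bit is not None:
            return "held by the " + bit          # rule 3
        return "held by external hardware"       # rule 2
    if priority in (2, 3):
        if name == "Unaligned Access":
            return "regenerated internally when the access is restarted"  # rule 5
        return "not regenerated"                 # rule 4
    raise ValueError("priority not covered by the sequencing rules")
```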
Instruction Exceptions

For traps caused by the execution of an instruction (e.g., the Out of Range trap), the Program Counter 2 Register contains the address of the instruction causing the trap. In all of these cases, PC1 is in the "next" category. The Exception Opcode Register contains the operation code of the instruction causing the trap.

The traps associated with instruction fetches (i.e., those of Priority 13) occur only if the processor attempts the execution of the associated instruction. An exception may be detected during an instruction prefetch, but the associated trap does not occur if a nonsequential fetch occurs before the processor attempts the execution of the invalid instruction. This prevents the spurious indication of instruction exceptions.

Priority      Type of Interrupt or Trap               Inst/Async  PC1   Channel Regs
1 (highest)   WARN                                    async       next  see Note
2             User-Mode Data TLB Miss                 inst        next  all
              Supervisor-Mode Data TLB Miss           inst        next  all
              Data TLB Protection Violation           inst        next  all
3             Unaligned Access                        inst        next  all
              Coprocessor Not Present                 inst        next  all
              Out of Range                            inst        next  N/A
              Floating-Point Exceptions               inst        next  N/A
              Assert Instructions                     inst        next  N/A
              Floating-Point Instructions             inst        next  N/A
              MULTIPLY                                inst        next  N/A
              MULTM                                   inst        next  N/A
              DIVIDE                                  inst        next  N/A
              MULTIPLU                                inst        next  N/A
              MULTMU                                  inst        next  N/A
              DIVIDU                                  inst        next  N/A
              EMULATE                                 inst        next  N/A
4             Data Access Exception                   async       next  all
              Coprocessor Exception                   async       next  all
5             TRAP0                                   async       next  multiple
6             TRAP1                                   async       next  multiple
7             INTR0                                   async       next  multiple
8             INTR1                                   async       next  multiple
9             INTR2                                   async       next  multiple
10            INTR3                                   async       next  multiple
11            Timer                                   async       next  multiple
12            Trace                                   async       next  multiple
13            User-Mode Instruction TLB Miss          inst        curr  N/A
              Supervisor-Mode Instruction TLB Miss    inst        curr  N/A
              Instruction TLB Protection Violation    inst        curr  N/A
              Instruction Access Violation            inst        curr  N/A
14 (lowest)   Illegal Opcode                          inst        curr  N/A
              Protection Violation                    inst        curr  N/A

Note: The Channel Address, Channel Data, and Channel Control registers are set for a WARN trap only if an external access is in progress when the trap is taken.

Figure 57. Interrupt and Trap Priority Table

Data Exceptions

The "Channel Regs" column of Figure 57 indicates the cases for which the Channel Address, Channel Data, and Channel Control registers contain information related to an external access or coprocessor transfer (these registers collectively are termed "channel registers" in the following discussion). For the cases indicated, the access or transfer was not completed because of some exceptional condition. Note that the Channel Data Register contains relevant information only in the case of a store.

For the WARN trap, the channel registers are valid only if a load or store was in progress when the trap was taken. Recall that the WARN trap does not wait for any in-progress access to be completed.

For the traps with an "all" in the "Channel Regs" column of Figure 57, the channel registers contain information relevant to the trap in all cases. These traps are associated with exceptional events during external accesses or coprocessor transfers.

For the traps with a "multiple" in the "Channel Regs" column, the channel registers might contain information for restarting an interrupted Load Multiple or Store Multiple operation. In these cases, the operation did not encounter an exception, but was simply canceled for latency considerations.

The information contained in the channel registers allows the processor to restart the related operation during an interrupt return sequence, without any special assistance by software. Software must only ensure that the relevant information is retained in, or restored to, the channel registers before an interrupt return is executed.

Arithmetic Exceptions

Integer and floating-point instructions can cause Out of Range or Floating-Point Exception traps, respectively, if an exception is detected during the arithmetic operation. This section describes the conditions under which these traps occur and the additional operations performed beyond those described in the Interrupt and Trap Handling section.

Integer Exceptions

Some integer add and subtract instructions (ADDS, ADDU, ADDCS, ADDCU, SUBS, SUBU, SUBCS, SUBCU, SUBRS, SUBRU, SUBRCS, and SUBRCU) cause an Out of Range trap upon overflow or underflow of a 32-bit signed or unsigned result, depending on the instruction.

Two integer multiply instructions, MULTIPLY and MULTIPLU, cause an Out of Range trap upon overflow of a 32-bit signed or unsigned result, respectively, if the MO bit of the Integer Environment Register is 0. If the MO bit is 1, these multiply instructions cannot cause an Out of Range trap.

Two integer divide instructions, DIVIDE and DIVIDU, take the Out of Range trap upon overflow of a 32-bit signed or unsigned result, respectively, if the DO bit of the Integer Environment Register is 0. If the DO bit is 1, the divide instructions cannot cause an Out of Range trap unless the divisor is 0. If the divisor is 0, an Out of Range trap always occurs, regardless of the DO bit.

In addition to the operations described in the Interrupt and Trap Handling section, the following operations are performed when an Out of Range trap is taken:

1. The operation code of the instruction causing the exception is placed in the IOP field of the Exception Opcode Register.

2. For the MULTIPLY, MULTIPLU, DIVIDE, and DIVIDU instructions, the absolute register numbers of the excepting instruction's source and destination registers are placed into the Indirect Pointer A, Indirect Pointer B, and Indirect Pointer C registers.

3. For the MULTIPLY, MULTIPLU, DIVIDE, and DIVIDU instructions, the destination register or registers are unchanged.

Floating-Point Exceptions

A Floating-Point Exception trap occurs when an exception is detected during a floating-point operation, and the exception is not masked by the corresponding bit of the Floating-Point Mask Register. In this context, a floating-point operation is defined as any operation that accepts a floating-point number as a source operand, that produces a floating-point result, or both. Thus, for example, the CONVERT instruction may create an exception while attempting to convert a floating-point value to an integer value.

In addition to the operations described in the Interrupt and Trap Handling section, the following operations are performed when a Floating-Point Exception trap is taken:

1. The operation code of the instruction causing the exception is placed in the IOP field of the Exception Opcode Register.

2. The status of the trapping operation is written into the trap status bits of the Floating-Point Status Register. The status bits that are written do not depend on the values of the corresponding mask bits in the Floating-Point Environment Register.

3. The absolute register numbers of the excepting instruction's source and destination registers are placed into the Indirect Pointer A, Indirect Pointer B, and Indirect Pointer C registers. If the RB or RC fields specify a function code, that code is transferred to the corresponding indirect pointer. Note that if the most-significant bit of this function code is 1, the value of the Stack Pointer has been added to the RB field and must be subtracted to recover the original field.

4. The destination register or registers are left unchanged.
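The integer Out of Range conditions described under Integer Exceptions above can be summarized in a short sketch. The function names are hypothetical, Python integers stand in for register values, and for the divide instructions the overflow indication is passed in as an argument rather than computed, since the exact quotient-overflow rule is not spelled out in this section.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1
UINT32_MAX = (1 << 32) - 1


def multiply_traps(a, b, mo_bit):
    """MULTIPLY (signed): trap on 32-bit signed overflow only when MO is 0."""
    overflow = not (INT32_MIN <= a * b <= INT32_MAX)
    return overflow and mo_bit == 0


def multiplu_traps(a, b, mo_bit):
    """MULTIPLU (unsigned): trap on 32-bit unsigned overflow only when MO is 0."""
    overflow = a * b > UINT32_MAX
    return overflow and mo_bit == 0


def divide_traps(quotient_overflows, divisor, do_bit):
    """DIVIDE/DIVIDU: a zero divisor always traps; overflow traps only when DO is 0."""
    if divisor == 0:
        return True
    return bool(quotient_overflows) and do_bit == 0
```

Note the asymmetry: setting MO suppresses every multiply trap, whereas setting DO never suppresses the divide-by-zero case.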
Exceptions During Interrupt and Trap Handling

In most cases, interrupt and trap handling routines are executed with the DA bit in the Current Processor Status having a value of 1. It is assumed that these routines do not create many of the exceptions possible in most other processor routines, so most of these are ignored.

If the assumption of no exceptions is not valid for a particular interrupt or trap handler, it is important that the handler save the state of the processor and reset the FZ bit of the Current Processor Status, so that the handler itself may be restarted properly. This must be accomplished before any interrupts or traps can be taken. In this case, the state (or the state of some other process) must be restored before an interrupt return is executed.

It is possible that errors reported via the IERR and DERR signals are associated with hardware errors, independent of any routine being executed. For this reason, the Instruction Access Exception, Data Access Exception, and Coprocessor Exception traps cannot be disabled by the DA bit, and the processor may take one of these traps even while handling another interrupt or trap.

If the processor does take an unmaskable trap while handling another interrupt or trap, and the state of the interrupt or trap handler is not reflected in processor registers, it is not possible to return to the point at which the unmaskable trap is taken. When the unmaskable trap is taken, the processor state saved is that state associated with the original interrupt or trap, not with the unmaskable trap; however, the Old Processor Status Register is modified to reflect the Current Processor Status Register of the interrupt or trap handler. This situation, indicated by the DA bit being 1 in the Old Processor Status Register, may not be recoverable.
MEMORY MANAGEMENT

The Am29000 incorporates a Memory Management Unit (MMU) for performing virtual-to-physical address translation and memory access protection. This section describes the logical operation of the Memory Management Unit.

Address translation can be performed only for instruction/data memory accesses. No address translation is performed for instruction ROM, input/output, coprocessor, or interrupt/trap vector accesses. However, an instruction/data memory access can be redirected to input/output by the address-translation process.

Translation Look-Aside Buffer

The MMU stores the most recently performed address translations in a special cache, the Translation Look-Aside Buffer (TLB). All virtual addresses generated by the processor are translated by the TLB. Given a virtual address, the TLB determines the corresponding physical address.

The TLB reflects information in the processor system page tables, except that it specifies the translation for many fewer pages; this restriction allows the TLB to be incorporated on the processor chip where the performance of address translation is maximized.

A diagram of the TLB is shown in Figure 58. The TLB is a table of 64 entries, divided into two equal sets, called Set 0 and Set 1. Within each set, entries are numbered 0 to 31. Entries in different sets that have equivalent entry numbers are grouped into a unit called a line; there are thus 32 lines in the TLB, numbered 0 to 31.

Each TLB entry is 64 bits long and contains mapping and protection information for a single virtual page. TLB entries may be inspected and modified by processor instructions executed in the Supervisor mode. The layout of TLB entries is described in the Register Description section.

The TLB stores information about the ownership of the TLB entries in an 8-bit Task Identifier (TID) field in each entry.
This makes it possible for the TLB to be shared by several independent processes without the need for invalidation of the entire TLB as processes are activated. It also increases system performance by permitting processes to warm-start (i.e., to start execution on the processor with a certain number of TLB entries remaining in the TLB from a previous execution).

Each TLB entry contains a Usage bit to assist management of the TLB entries. The Usage bit indicates which set of the entry within a given line was least recently used to perform an address translation. Usage bits for two entries in the same line are equivalent.

The TLB contains other fields, described in the following sections.

Figure 58. Translation Look-Aside Buffer Organization (TLB Set 0 and TLB Set 1, each holding 64-bit entries 0 to 31; Line n pairs entry n of both sets)

Address Translation

For the purpose of address translation, the virtual instruction/data address space of a process is partitioned into regions of fixed size, called pages. Pages are mapped by the address-translation process into equivalent-sized regions of physical memory, called page frames. All accesses to instructions or data contained within a given page use the same virtual-to-physical address translation.

Virtual addresses are partitioned into three fields for the address-translation process, as shown in Figure 59. The partitioning of the virtual address is based on the page size. Page sizes may be 1, 2, 4, or 8 kb, as specified by the MMU Configuration Register. The fields shown in Figure 59 are described in the following discussion.

Figure 59. Virtual Address for 1-, 2-, 4-, and 8-kb Pages

Address Translation Controls

The processor attempts to perform address translation for the following external accesses:

1. Instruction accesses, if the Physical Addressing/Instructions (PI) and ROM Enable (RE) bits of the Current Processor Status are both 0.

2. User-mode accesses to instruction/data memory if the Physical Addressing/Data (PD) bit of the Current Processor Status is 0.

3. Supervisor-mode accesses to instruction/data memory if the Physical Address (PA) bit of the load or store instruction performing the access is 0, and the PD bit of the Current Processor Status is 0.

Address translation also is controlled by the MMU Configuration Register. This register specifies the virtual page size and contains an 8-bit Process Identifier (PID) field. The PID field specifies the process number associated with the currently running program, if this is a User-mode program. Supervisor-mode programs are assigned a fixed process number of 0. The process number is compared with Task Identifier (TID) fields of the TLB entries during address translation. The TID field of a TLB entry must match the process number for the translation to be valid.

Address Translation Process

The address-translation process is diagrammed in Figure 60. Address translation is performed by the following fields in the TLB entry: the Virtual Tag (VTAG), the Task Identifier (TID), the Valid Entry (VE) bit, the Real Page Number (RPN) field, and the Input/Output (IO) bit.

To perform an address translation, the processor accesses the TLB line whose number is given by certain bits in the virtual address.
The bits used depend on the page size as follows:

Page Size   Virtual Address Bits (for Line Access)
1 kb        14-10
2 kb        15-11
4 kb        16-12
8 kb        17-13

The accessed line contains two TLB entries, which in turn contain two VTAG fields. The VTAG fields are both compared to bits in the virtual address. This comparison depends on the page size as follows (note that VTAG bit numbers are relative to the VTAG field, not the TLB entry):

Page Size   Virtual Address Bits   VTAG Bits
1 kb        31-15                  16-0
2 kb        31-16                  16-1
4 kb        31-17                  16-2
8 kb        31-18                  16-3

Certain bits of the VTAG field do not participate in the comparison for page sizes larger than 1 kb. These bits of the VTAG field are required to be 0.

For an address translation to be valid, the following conditions must be met:

1. The virtual address bits match corresponding bits of the VTAG field as specified above.

2. For a User-mode access, the TID field in the TLB entry matches the PID field in the MMU Configuration Register. For a Supervisor-mode access, the TID field is 0.

3. The VE bit in the TLB entry is 1.

4. Only one entry in the line meets conditions 1, 2, and 3 above. If this condition is not met, the results of the translation may be treated as valid by the processor, but the results are unpredictable.

Figure 60. Address Translation Process

If the address translation is valid for one TLB entry in the selected line, the RPN field in this entry is used to form the physical address of the access. The RPN field gives the portion of the physical address that depends on the translation; the remaining portion of the virtual address, called the Page Offset, is invariant with address translation. The Page Offset comprises the low-order bits of the virtual address, and gives the location of a byte (because of byte addressing) within the virtual page.
This byte is located at the same position in the physical page frame, so the Page Offset also comprises the low-order bits of the physical address.

The 32-bit physical address is the concatenation of certain bits of the RPN field and Page Offset, where the bits from each depend on the page size as follows (note that RPN bit numbers are relative to the RPN field, not the TLB entry):

Page Size   RPN Bits   Virtual Address Bits for Page Offset
1 kb        21-0       9-0
2 kb        21-1       10-0
4 kb        21-2       11-0
8 kb        21-3       12-0

Note that certain bits of the RPN field are not used in forming the physical address for page sizes greater than 1 kb. These bits of the RPN are required to be 0. In addition, for certain instruction accesses, the Page Offset is incremented by 16.

The address space of the physical address is determined by the Input/Output (IO) bit of the TLB entry. If the IO bit is 0, the address is in the instruction/data memory address space. If the IO bit is 1, the address is in the input/output address space.

Successful and Unsuccessful Translations

If an address translation is successful, the TLB entry is further used to perform protection checking for the access. Bits in the TLB make it possible to restrict accesses, independently for Supervisor-mode and User-mode accesses, to any combination of load, store, and instruction accesses, or to no access.

If the address translation is valid and no protection violation is detected, the physical address from the translation is placed on the processor's address bus and the access is initiated. If the translation is not valid or a protection violation is detected, a trap occurs. Depending on the state of the channel interface, the access request may be placed on the address bus with the signal BINV asserted, even though the trap occurs.

Also, if the address translation is successful and there is no protection violation, the PGM bits from the TLB entry used for translation are placed on the MPGM1-MPGM0 outputs during the address cycle for the access.
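The line selection, tag comparison, and physical-address formation described above can be put together in a short sketch. It is an illustration under stated assumptions, not the processor's implementation: the `TlbEntry` class, the `translate` function, and the list-of-two-sets layout are hypothetical, protection checking and the Page Offset increment for certain instruction accesses are not modeled, and the case of two matching entries (which the text calls unpredictable) is reported here as an error.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    """Hypothetical container for the fields used in translation."""
    vtag: int    # 17-bit Virtual Tag
    tid: int     # 8-bit Task Identifier
    ve: int      # Valid Entry bit
    rpn: int     # 22-bit Real Page Number
    io: int = 0  # Input/Output bit


def translate(va, page_kb, tlb, pid, supervisor):
    """Return (physical_address, space), or None on a TLB miss.

    tlb is [set0, set1], each a list of 32 TlbEntry objects.
    """
    k = {1: 0, 2: 1, 4: 2, 8: 3}[page_kb]    # log2 of the page size in kb
    offset_bits = 10 + k                     # Page Offset is VA bits (9+k)..0
    line = (va >> offset_bits) & 0x1F        # five bits just above the offset
    wanted_tid = 0 if supervisor else pid    # Supervisor mode uses process number 0
    hits = [e for e in (tlb[0][line], tlb[1][line])
            if e.ve == 1
            and e.tid == wanted_tid
            and (e.vtag >> k) == (va >> (15 + k))]  # low k VTAG bits ignored
    if not hits:
        return None                          # TLB miss
    if len(hits) > 1:
        raise ValueError("both entries in the line match (unpredictable)")
    entry = hits[0]
    offset = va & ((1 << offset_bits) - 1)
    pa = ((entry.rpn >> k) << offset_bits) | offset  # low k RPN bits unused
    return pa, ("input/output" if entry.io else "instruction/data memory")
```

As a usage example with 4-kb pages, virtual address 0xABC678 selects line 28; an entry there with VTAG 0x154, RPN 0x48C, and a TID equal to the current PID yields physical address 0x123678 in instruction/data memory.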
If address translation is not performed, the MPGM1-MPGM0 pins are both Low for the address cycle.

If the TLB cannot translate an address, a TLB miss occurs. The MMU causes a trap if either a TLB miss occurs, or the translation is successful and a protection violation is detected. The processor distinguishes between traps caused by instruction and data accesses, and between traps caused by User-mode and Supervisor-mode accesses, as follows:

Trap Vector Number   Type of Trap
8                    User-Mode Instruction TLB Miss
9                    User-Mode Data TLB Miss
10                   Supervisor-Mode Instruction TLB Miss
11                   Supervisor-Mode Data TLB Miss
12                   Instruction TLB Protection Violation
13                   Data TLB Protection Violation

The distinction between the above traps is made to assist trap handling, particularly the routines that load TLB entries.

Reload

So that the MMU may support a large variety of memory-management architectures, it does not directly load TLB entries that are required for address translation. It simply causes a TLB miss trap when address translation is unsuccessful. The trap causes a program, called the TLB reload routine, to execute. The TLB reload routine is defined according to the structure and access method of the page table contained in an external device or memory.

When a TLB miss trap occurs, the LRU Recommendation Register is written with the TLB register number for Word 0 of the TLB entry to be used by the TLB reload routine. For instruction accesses, the Program Counter 1 Register contains the instruction address that was not successfully translated. For data accesses, the Channel Address Register contains the data address that was not successfully translated.

The TLB reload routine determines the translation for the address given by the Program Counter 1 Register or Channel Address Register, as appropriate.
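The trap vector assignment listed above, together with the register that reports the untranslated address, can be expressed as a small lookup. The function names and the string arguments ("miss", "protection", "instruction", "data") are hypothetical conveniences for this sketch.

```python
def mmu_trap_vector(kind, access, supervisor):
    """Return the trap vector number for an MMU trap.

    kind is "miss" or "protection"; access is "instruction" or "data".
    """
    if access not in ("instruction", "data"):
        raise ValueError("access must be 'instruction' or 'data'")
    if kind == "miss":
        if access == "instruction":
            return 10 if supervisor else 8
        return 11 if supervisor else 9
    if kind == "protection":
        # Protection violations are not split by program mode.
        return 12 if access == "instruction" else 13
    raise ValueError("kind must be 'miss' or 'protection'")


def faulting_address_register(access):
    """Name the register holding the address that failed translation."""
    if access == "instruction":
        return "Program Counter 1"
    if access == "data":
        return "Channel Address"
    raise ValueError("access must be 'instruction' or 'data'")
```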
The TLB reload routine uses an external page table to determine the required translation, and loads the TLB entry indicated by the LRU Recommendation Register so that the entry may perform this translation. In a demand-paged environment, the TLB reload routine may additionally invoke a page-fault handler when the translation cannot be performed.

TLB entries are written by the Move To TLB (MTTLB) instruction, which copies the contents of a general-purpose register into a TLB register. The TLB register number is specified by bits 6-0 of a general-purpose register. TLB entries are read by the Move From TLB (MFTLB) instruction, which copies the contents of a TLB register into a general-purpose register. Again, the TLB register number is specified by a general-purpose register.

Entry Invalidation

There are two methods for invalidating TLB entries that are no longer required at a given point in program execution. The first involves resetting the Valid Entry bit of a single entry (this is done by a Move To TLB instruction). The second involves changing the value of the Process Identifier (PID) field of the MMU Configuration Register; this invalidates all entries whose Task Identifier (TID) fields do not match the new value.

If an entry is invalidated by changing the PID field, the TLB entry still remains valid in some sense. If the PID field is changed again to match the TID field, the entry may once again participate in address translation. This ability can be used to reduce the number of TLB misses in a system during process switching. However, it is important to manage TLB entries so that an invalid match cannot occur between the PID field and the TID field of an old TLB entry.

Protection

If an address translation is performed successfully, the TLB entry used in address translation is used to perform protection checking for the access. There are 6 bits in the TLB entry for this purpose: Supervisor Read (SR), Supervisor Write (SW), Supervisor Execute (SE), User Read (UR), User Write (UW), and User Execute (UE). These bits restrict accesses, depending on the program mode of the access, as shown in Figure 61 (the value "x" is a "don't care").

Note that for the Load and Set (LOADSET) instruction, the protection bits must be set to allow both the load and store access. If this condition does not hold, neither access is performed.

If protection checking indicates that a given access is not allowed, a Data TLB Protection Violation or Instruction TLB Protection Violation trap occurs. The cause of the trap is determined by inspection of the Program Counter 1 Register for an Instruction TLB Protection Violation, or by inspection of the contents of the Channel Address and Channel Control registers for a Data TLB Protection Violation.

SR  SW  SE  UR  UW  UE   Type of Access Allowed
x   x   x   0   0   0    No user access
x   x   x   0   0   1    User instruction
x   x   x   0   1   0    User store
x   x   x   0   1   1    User store or instruction
x   x   x   1   0   0    User load
x   x   x   1   0   1    User load or instruction
x   x   x   1   1   0    User load or store
x   x   x   1   1   1    Any user access
0   0   0   x   x   x    No supervisor access
0   0   1   x   x   x    Supervisor instruction
0   1   0   x   x   x    Supervisor store
0   1   1   x   x   x    Supervisor store or instruction
1   0   0   x   x   x    Supervisor load
1   0   1   x   x   x    Supervisor load or instruction
1   1   0   x   x   x    Supervisor load or store
1   1   1   x   x   x    Any supervisor access

Figure 61. TLB Access Protection

CHANNEL DESCRIPTION

The processor channel provides the bandwidth required for performance, while permitting the connection of many different types of devices. This section describes the channel and methods of connecting devices and memories to the processor.

The channel consists of three 32-bit synchronous buses with associated control and status signals: the Address Bus, Data Bus, and Instruction Bus. The Address Bus transfers addresses and control information to devices and memories.
The Data Bus transfers data to and from devices and memories. The Instruction Bus transfers instructions to the processor from instruction memories. In addition, a set of signals allows control of the channel to be relinquished to an external master.

There are five logical groups of signals performing five distinct functions, as follows (since some signals perform more than one function, a signal may appear in more than one group):

1. Instruction Address Transfer and Instruction Access Requests: A31-A0, SUP/US, MPGM1-MPGM0, PEN, IREQ, IREQT, PIA, BINV
2. Instruction Transfer: I31-I0, IBREQ, IRDY, IERR, IBACK
3. Data Address Transfer and Data Access Requests: A31-A0, R/W, SUP/US, LOCK, MPGM1-MPGM0, PEN, DREQ, DREQT1-DREQT0, OPT2-OPT0, PDA, BINV
4. Data Transfer: D31-D0, DBREQ, DRDY, DERR, DBACK, CDA
5. Arbitration: BREQ, BGRT, BINV

User-Defined Signals

There are two types of user-defined outputs on the processor to control devices and memories directly in a system-dependent manner. Each of these outputs is valid simultaneously with, and for the same duration as, the address for an access.

The first set of user-defined signals, MPGM1-MPGM0, is determined by the PGM bits in the Translation Look-Aside Buffer entry used in address translation. If address translation is not performed, these outputs are both Low.

The second set of signals, OPT2-OPT0, is determined by bits 18-16 of the load or store instruction that initiates an access. These signals are valid only for data accesses, and have a predefined interpretation for coprocessor data transfers. Standard interpretations of OPT2-OPT0 are given in the Pin Description section.

Since the OPT2-OPT0 signals are determined by instructions, they have an impact on application-software compatibility, and system hardware should use the given definitions of OPT2-OPT0. The OPT2-OPT0 signals are used to encode byte and half-word accesses.
However, for a load, the system should return an entire aligned word, regardless of the indicated data width.

Note that the standard interpretations of OPT2-OPT0 apply only to accesses to instruction/data memory and input/output. Other interpretations may be used for coprocessor transfers.

For interrupt and trap vector fetches, the MPGM1-MPGM0 and OPT2-OPT0 outputs are all Low.

Instruction Accesses

Instruction accesses occur to one of two address spaces: instruction/data memory and instruction read-only memory (instruction ROM). The distinction between these address spaces is made by the IREQT signal, which is in turn derived from the ROM Enable (RE) bit of the Current Processor Status Register. These are truly distinct address spaces; each may be populated independently based on the needs of a particular system.

Instruction/data memory contains both instructions and data. Although the channel supports separate instruction and data memories, the Memory Management Unit does not. In certain systems, it may be required to access instructions via loads and stores, even though instructions may be contained in physically separate memories. For example, this requirement might be imposed because of the need to load instructions into memory.

Note also that the OPT2-OPT0 signals may be used to allow the access of instructions in instruction ROM, using loads; the Am29000 does not prevent a store to the instruction ROM, and protection against stores to the instruction ROM must be provided externally, if required.

All processor instruction fetches are read accesses, and the R/W signal is High for all instruction fetches.

Data Accesses

Data accesses occur to one of three address spaces: instruction/data memory, input/output (I/O), and the coprocessor. The distinction between these spaces is made by the DREQT1-DREQT0 signals, which are in turn determined by the load or store instruction that initiates a data access. Each of these address spaces is distinct from the others.
The protocol for data transfers to and from the coprocessor is slightly different than the protocol for instruction/data memory and I/O accesses.

Data accesses may occur either from a slave device or memory to the processor (for a load), or from the processor to a slave device or memory (for a store). The direction of transfer is determined by the R/W signal. In the case of a load, the processor requires that data on the data bus be held valid only for a short time before the end of a cycle. In the case of a store, the processor drives the data bus as soon as the bus is available and holds the data valid until the slave device or memory signals that the access is complete.

Reporting Errors

The successful completion of an instruction access is indicated by an active level on the IRDY input, and the successful completion of a data access is indicated by an active level on the DRDY input. If there are exceptional conditions for which an instruction or data access cannot be completed successfully, the unsuccessful completion is indicated by an active level on the IERR or DERR input, as appropriate.

If the processor receives an IERR or DERR in response to an instruction or data access, it ignores the content of the instruction or data bus and the value of IRDY or DRDY. An IERR response causes an Instruction Access Exception trap, unless it is associated with an instruction that the processor does not ultimately execute (because of a nonsequential instruction fetch). A DERR response always causes either a Data Access Exception trap or a Coprocessor Exception trap.

The processor supports the restarting of unsuccessful accesses upon an interrupt return. In the case of an unsuccessful instruction access, the restart is performed by the Program Counter 0 and Program Counter 1 registers. In the case of an unsuccessful data access, the restart is performed by the Channel Address, Channel Data, and Channel Control registers.
In any event, the control program must determine whether or not an access can and/or should be restarted.

The Instruction Access Exception and Data Access Exception traps cannot be masked. If one of these traps occurs within an interrupt or trap handler, the processor state may not be recoverable.

Access Protocols

Figure 62 shows a control flowchart for accesses performed by the Am29000. This control flow applies independently to both instruction and data accesses. Since the processor performs concurrent instruction and data accesses, these accesses may be at different points in the control flow at any given point in time.

Note that the items on the flowchart of Figure 62 do not represent actual states and have no particular relationship to processor cycles. The flowchart provides only a high-level understanding of the control flow. Also, exceptions and error conditions are not shown.

The channel supports three protocols for accesses: simple, pipelined, and burst-mode. These are described in the following sections. The various protocols are defined to accommodate minimum-latency accesses as well as maximum-transfer-rate accesses. The protocols allow an access to complete in a single cycle, although they support accesses requiring arbitrary numbers of cycles. Address transfers for accesses may be independent of instruction or data transfers.

Simple Accesses

For a simple access, the processor holds the address valid throughout the entire access. This protocol is used for single-cycle accesses, and for accesses to simple devices and memories.

On any cycle before the completion of the access, a simple access may be converted to a pipelined access (by the assertion of PEN) or to a burst-mode access (by the assertion of IBACK or DBACK, if the processor is asserting IBREQ or DBREQ). Thus, the protocol for simple accesses also may be used during the initial cycles of pipelined and/or burst-mode accesses. This is advantageous, for example, in cases where the slave device or memory either requires the address to be held for multiple cycles at the beginning of the pipelined or burst-mode access, or cannot respond to the pipelined or burst-mode request within one cycle.

Pipelined Accesses

A pipelined access is one that starts before an earlier in-progress access is completed. The in-progress access is called a primary access and the second access is called a pipelined access. A pipelined access is of the same type as the primary access. For example, an instruction access that begins before the completion of a data access is not considered to be a pipelined access, whereas a second data access is. The Am29000 allows only one pipelined access at any given time.

Tradeoffs

For accesses that require more than one cycle to complete, pipelined accesses perform better than simple accesses because they allow the overlap of portions of two accesses. In addition, the ability to latch addresses in support of pipelined accesses reduces utilization of the address bus, thereby reducing contention between instruction and data accesses. However, devices and memories that support pipelined accesses are somewhat more complex than devices and memories that support only simple accesses.

Support for pipelined operations is required for both the primary access and the pipelined access. The slave performing the primary access must contain some means for storing the address and other information about the access. The slave performing the pipelined access must be able to restrict its use of the instruction bus or data bus, and must be prepared to cancel the access (as explained below).

Pipelined Operation

Pipelined accesses are controlled by the signals PEN, PIA, and PDA. Because of internal data-flow constraints, the Am29000 does not perform a pipelined store operation while a load is in progress. However, the protocol does not restrict pipelined operations.
Other channel masters may perform a pipelined store during a load.

Figure 62. Channel Flowchart (processor and slave-device actions for the no-access, primary-access, and pipelined-access cases)

Except as noted above, the processor attempts to perform pipelining for every access; the input PEN indicates whether or not pipelining is supported for a given access. The PEN input can be driven by individual devices, or can be tied active or inactive to enable or disable system-wide pipelined accesses. The processor ignores the value of PEN unless it is performing an access.

The processor samples PEN on every cycle during a primary access. If PEN is active on any cycle, the processor ceases to drive the address and associated controls for the primary access in the next cycle. If the processor requires another access before the primary access is completed, it drives the address and controls for the second access, asserting PIA or PDA to indicate that the second access is a pipelined access.

The output IREQ or DREQ, as appropriate, is not asserted for a pipelined access. Devices and memories that cannot support pipelined accesses should therefore ignore PIA and/or PDA, and base their operation upon IREQ and/or DREQ.

A device or memory that receives a request for a pipelined access may treat it as any other access, with one exception: the pipelined access cannot use the instruction and data buses or the associated controls (e.g., IRDY or DRDY). In the case of a data read or instruction access, the results of the pipelined access cannot be driven on the appropriate bus. In the case of a data write, the data do not appear on the data bus. Any other operations for the access, such as address decoding, can occur.
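The PEN sampling rule described above can be sketched as a small cycle-by-cycle model. This is only an illustration under stated assumptions, not a description of the actual processor logic: the function name, the "idle" label, and the idea of passing the cycle at which a second access becomes wanted are all hypothetical, and the primary access is assumed not to complete within the modeled cycles.

```python
def address_bus_owner(pen_samples, second_request_cycle=None):
    """Return one (owner, pia_or_pda) pair per cycle for an uncompleted primary access.

    pen_samples: PEN level (True = active) sampled in each cycle.
    second_request_cycle: hypothetical cycle index from which the processor
        wants a second access of the same type (None = no second access).
    """
    trace = []
    released = False  # becomes True once PEN has been sampled active
    for cycle, pen in enumerate(pen_samples):
        if not released:
            # The address and controls of the primary access are still driven.
            trace.append(("primary", False))
            if pen:
                released = True  # takes effect in the next cycle
        elif second_request_cycle is not None and cycle >= second_request_cycle:
            # Address of the second access is driven, flagged by PIA or PDA.
            trace.append(("pipelined", True))
        else:
            trace.append(("idle", False))
    return trace


if __name__ == "__main__":
    for row in address_bus_owner([False, True, False, False], second_request_cycle=3):
        print(row)
```

With PEN active in the second cycle, the address of the primary access is dropped from the third cycle on, and the pipelined address appears (with PIA or PDA) once the second access is wanted. If PEN is never active, the model degenerates to the simple-access protocol.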
When the primary access is completed (as indicated by IRDY or DRDY), the pipelined access becomes a primary access. The processor indicates this by asserting IREQ or DREQ, depending on the type of access. The device or memory performing the pipelined access may complete the access as soon as IREQ or DREQ is asserted (possibly in the same cycle). When the access becomes a primary access, it controls the channel as any other primary access. For example, it may determine whether or not another pipelined access can be performed.

When the pipelined access becomes a primary access, the output PIA or PDA remains asserted for one cycle to ensure continuity of control within the slave device or memory. In the cycle after IREQ or DREQ is asserted, PIA or PDA is deasserted unless the processor initiates another pipelined access, in which case PIA or PDA remains asserted for the new access.

Cancellation of Pipelined Accesses

If the processor takes an interrupt or trap before a pipelined access becomes a primary access, the request for the pipelined access is removed from the channel. This may occur, for example, when IERR or DERR is signaled for the primary access.

If the pipelined access is removed from the channel, the slave device or memory does not receive an IREQ or DREQ for the pipelined access. Hence, the pipelined access does not become a primary access, and cannot be completed. A pipelined access may be canceled in this manner at any time before it becomes a primary access. Because of this, a pipelined access should not change the state of a slave device or memory until the pipelined access becomes a primary access.

Burst-Mode Accesses

A burst-mode access allows multiple instructions or data words at sequential addresses to be accessed with a single address transfer. The number of accesses performed and the timing of each access within the sequence are controlled dynamically by the burst-mode protocol.
Burst-mode accesses take advantage of sequential addressing patterns, and provide several benefits over simple and pipelined accesses:

1. Simultaneous instruction and data accesses. Burst-mode accesses reduce the utilization of the address bus. This is especially important for instruction accesses, which are normally sequential. Burst-mode instruction accesses eliminate most of the address transfers for instructions, allowing the address bus to be used for simultaneous data accesses.
2. Faster access times. By eliminating the address-transfer cycle, burst-mode accesses allow addresses to be generated in a manner that improves access times.
3. Faster memory access modes. Many memories have special high-bandwidth access modes (e.g., fast page mode DRAM). These modes generally require a sequential addressing pattern, even though addresses may not be presented explicitly to the memory for all accesses. Burst-mode accesses allow the use of these access modes without hardware to detect sequential addressing patterns.

Burst-Mode Overview

The control-flow diagrams in Figure 63 and Figure 64 illustrate the operation of the processor and an instruction memory during a burst-mode instruction access. The control-flow diagrams in Figure 65 and Figure 66 illustrate the operation of the processor and a data memory or device during a burst-mode data access. These diagrams are for illustration only; nodes on these diagrams do not necessarily correspond to processor or slave states, and transitions on these diagrams do not necessarily correspond to processor cycles.

Figure 63. Processor Burst-Mode Instruction Accesses: Control Flow (IPB = Instruction Prefetch Buffer)

A burst-mode access is in one of the following operational conditions at any given time:

1. Established: The processor and slave device have successfully initiated the burst-mode access. A burst-mode access that has been established is either active or suspended. An established burst-mode access may become preempted, terminated, or canceled.
2. Active: Instruction or data accesses and transfers are being performed as the result of the burst-mode access. An active burst-mode access may become suspended.
3. Suspended: No accesses or transfers are being performed as the result of the burst-mode access, but the burst-mode access remains established. Additional accesses and transfers may occur at some later time (i.e., the burst-mode access may become active) without the retransmission of the address for the access.
4. Preempted: The burst-mode access can no longer continue because of some condition, but the burst-mode access can be reestablished within a short amount of time.
5. Terminated: All required accesses have been performed.
6. Canceled: The burst-mode access can no longer continue because of

Figure 65. Processor Burst-Mode Data Accesses: Control Flow (Note: The Am29000 does not suspend burst-mode data accesses.)

mode access on each subsequent address transfer, as long as there are more accesses yet to be performed. During any subsequent access, the addressed device or memory may establish a burst-mode access by asserting IBACK or DBACK. If the burst-mode access is never established, the default behavior is to have the processor transmit an address for every access.

Active and Suspended Burst-Mode Accesses

After the burst-mode access is established, IBREQ and DBREQ are used during subsequent accesses to indicate that the processor requires at least one more access. If IBREQ or DBREQ is active at the end of the cycle in which an access is successfully completed (i.e., when IRDY or DRDY is active), the processor requires another access.
If the slave device or memory previously has not preempted the burst-mode access, and does not preempt (by deasserting IBACK or DBACK) or cancel (by asserting IERR or DERR) the burst-mode access in the cycle that the access completes, the additional access must be performed.

The execution rate of instructions is known only dynamically, so that in certain situations, a burst-mode instruction access must be suspended. If IBREQ is inactive during the cycle in which an instruction access is completed, the burst-mode access is suspended (if it is neither preempted nor canceled at the same time). The burst-mode access remains suspended unless the processor requests a new instruction access (in which case IREQ is asserted), or unless the instruction memory preempts the burst-mode access.

A suspended burst-mode instruction access becomes active whenever the processor can accept more instructions. The processor activates the burst-mode access by asserting IBREQ. If the instruction memory does not preempt the burst-mode access during this cycle, an instruction access must be performed.

Figure 66. Slave Burst-Mode Data Accesses: Control Flow

When a suspended burst-mode instruction access is activated, the resulting instruction access is not permitted to be completed in the cycle in which IBREQ is asserted, but may be completed in the next cycle. The reason for this restriction is that the burst-mode protocol is defined such that the combination of an active level on IBREQ and IRDY causes an instruction access (as previously discussed). If the instruction access is completed immediately in the cycle where a suspended burst-mode access is activated, there is an ambiguity in the protocol: it is possible to interpret a single-cycle assertion of IBREQ as a request for two instructions.
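The interaction of IBREQ and IRDY described above can be illustrated with a minimal model of an instruction memory that responds as fast as the protocol allows. The sketch rests on several assumptions that are not from the text: the burst-mode access is already established, it is never preempted or canceled, one access is already owed in the first modeled cycle, and the memory needs no wait states. The function name is hypothetical.

```python
def zero_wait_irdy(ibreq_samples):
    """Return the IRDY level per cycle for a zero-wait slave in an established burst.

    ibreq_samples: IBREQ level (True = active) in each cycle.
    """
    irdy = []
    state = "active"  # assumption: one access is already owed in cycle 0
    for ibreq in ibreq_samples:
        if state == "active":
            irdy.append(True)   # the owed access completes in this cycle
        else:
            irdy.append(False)  # suspended: no access may complete in this cycle
        # IBREQ in this cycle decides whether an access is owed in the next one.
        # In the active state it is sampled together with IRDY; in the suspended
        # state an active IBREQ reactivates the burst, served one cycle later.
        state = "active" if ibreq else "suspended"
    return irdy


if __name__ == "__main__":
    ibreq = [True, True, False, False, True, False]
    print(zero_wait_irdy(ibreq))
```

In this model every cycle with IBREQ active accounts for exactly one further instruction, whether the burst was active or suspended at the time. That one-to-one correspondence is what the one-cycle delay on reactivation preserves.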
The above ambiguity is resolved by delaying the instruction access resulting from a reactivated burst-mode access for a cycle. Since this restriction applies only when the Instruction Prefetch Buffer is full and the instruction memory is capable of a very fast access, the delayed instruction response has no performance impact.

The Am29000 does not suspend burst-mode data accesses because the data transfers occur to and from general-purpose registers, which are always available. However, other channel masters may suspend burst-mode data accesses (during direct memory accesses, for example). The principles for suspending burst-mode data accesses are the same as those for instruction accesses discussed above.

Processor Preemption, Termination, and Cancellation

The processor may preempt, terminate, or cancel a burst-mode access by deasserting IBREQ or DBREQ and asserting IREQ or DREQ at some later point. Normally, the processor receives one more instruction or data word after IBREQ or DBREQ is deasserted. However, this access may be completed in the same cycle that IBREQ or DBREQ is deasserted. During the period after IBREQ or DBREQ is deasserted and before IREQ or DREQ is asserted, the burst-mode access is in a suspended condition. The slave device or memory cannot distinguish between preempted, terminated, and canceled burst-mode accesses, when these are caused by the processor, until the processor asserts IREQ or DREQ.

If the slave continues to assert IBACK or DBACK after IBREQ or DBREQ is deasserted, the slave should be prepared to accept any new request during the cycle in which IREQ or DREQ is asserted to begin the new access.
The reason for this is that the processor may attempt to establish a burst-mode access for the new access: if the slave is asserting IBACK or DBACK because of a previously preempted, terminated, or canceled burst-mode access, the processor interprets the active IBACK or DBACK as establishing the new burst-mode access and removes the request in the following cycle.

The processor preempts a burst-mode access when an external channel master arbitrates for the channel, or when a burst-mode fetch crosses a potential virtual-page boundary. Since the minimum page size is 1 Kbyte, burst-mode instruction and data accesses are preempted whenever the address sequence crosses a 1-Kbyte address boundary. The burst is reestablished as soon as a new address translation is performed (if required). A new physical address is transmitted when the burst-mode access is reestablished.

Note that the preemption resulting from page boundaries is advantageous for devices or memories that require counters to follow the burst-mode address sequence. Since all burst-mode accesses are word accesses and the processor retransmits an address at every 1-Kbyte address boundary, an 8-bit counter in the slave device or memory is sufficient to follow the burst-mode address sequence. Additional address bits are simply latched.

The processor terminates a burst-mode access whenever all required instructions or data have been accessed. In the case of instruction accesses, the burst-mode access is terminated when a nonsequential fetch occurs. In the case of data accesses, the burst-mode access is terminated when the count indicates a single load or store remains. The last load or store is executed as a simple access.

The processor cancels a burst-mode access when an interrupt or trap is taken. Note that a trap may be caused by the burst-mode access, for example when a Translation Look-Aside Buffer miss occurs on an address in the burst-mode sequence.
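The 8-bit counter arrangement mentioned above for following a burst-mode address sequence can be sketched as follows. This is a minimal illustration, assuming 32-bit byte addresses, 4-byte words, and a slave that latches address bits 31-10 and counts bits 9-2; the function names and the choice to stop at the boundary (where the processor is expected to retransmit an address) are hypothetical.

```python
WORD_BYTES = 4
BOUNDARY_BYTES = 1024                           # 1-Kbyte preemption boundary
WORDS_PER_BLOCK = BOUNDARY_BYTES // WORD_BYTES  # 256 words, so 8 counter bits
UPPER_MASK = 0xFFFFFFFF & ~(BOUNDARY_BYTES - 1)  # address bits 31-10


def split_address(addr):
    """Split a byte address into (latched upper bits, 8-bit word counter)."""
    latched = addr & UPPER_MASK
    counter = (addr >> 2) & (WORDS_PER_BLOCK - 1)
    return latched, counter


def burst_addresses(start, max_words):
    """List the word addresses a slave can follow before a new address is needed."""
    latched, counter = split_address(start)
    addresses = []
    for _ in range(max_words):
        addresses.append(latched | (counter << 2))
        counter += 1
        if counter == WORDS_PER_BLOCK:
            break  # 1-Kbyte boundary reached: wait for a retransmitted address
    return addresses


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x000013F8, 4)])
```

Because the counter never has to carry into the latched bits, the slave does not need a full-width incrementer.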
If the processor cancels a burst-mode access when an access in the sequence remains to be completed, this access must be completed in spite of the cancellation.

Canceled burst-mode data accesses may be restarted at some (possibly much later) point in execution via the Channel Address, Channel Data, and Channel Control registers. In this case, the burst-mode access is restarted at the point at which it was canceled, rather than at the beginning of the original address sequence.

Slave Preemption and Cancellation

The slave device or memory involved in a burst-mode access may preempt the access by deasserting IBACK or DBACK. The processor samples IBACK and DBACK when IRDY and DRDY are active so that IBACK and DBACK may be deasserted as the last supported access is completed. However, IBACK and DBACK also may be deasserted in any cycle before the access is completed. If IBACK or DBACK is deasserted when the processor is in a state where it expects an access, the access must be completed.

In general, the slave device or memory preempts the burst-mode access whenever it cannot support any further accesses in the burst-mode sequence. This normally occurs whenever an implementation-dependent address boundary is encountered (e.g., a cache-block boundary), but may occur for any reason. By preempting the burst-mode access, the slave receives a new request with the address of the next instruction or data word required by the processor.

The slave device or memory may cancel a burst-mode access by asserting IERR or DERR in response to a requested access. The signals IBACK or DBACK need not be deasserted at this time, but should be deasserted in the next cycle. Note that the IERR and DERR signals cause non-maskable traps, except in the case where IERR is asserted for an instruction that the processor does not execute.

Arbitration

External masters can gain access to the address, data, and instruction buses by asserting the BREQ input.
The processor completes any pending accesses, preempts any burst-mode access, and asserts the BGRT output. At this time, the processor places all channel outputs associated with the address, data, and instruction buses in the high-impedance state.

For the first cycle in which BGRT is asserted, the output BINV is also asserted. If the external master cannot control the address bus and associated controls in the cycle where BGRT is asserted, the active level on BINV may be used to define an idle cycle for the channel (i.e., any spurious access requests are ignored). The BINV signal is asserted only for a single cycle, so the external master must take control of the channel in the cycle after BGRT is asserted.

While the BREQ input remains asserted, the processor continues to assert BGRT. The external master has control over the channel during this time.

To release the channel to the processor, the external master deasserts BREQ, but must continue to control the channel for the first cycle in which BREQ is deasserted. In the cycle after BREQ is deasserted, the processor asserts BINV and deasserts BGRT; the external master should release control of the channel at this time. On the following cycle, the processor deasserts BINV and is able to use the channel. The processor reestablishes any burst-mode access preempted by arbitration.

The processor does not relinquish the channel when the LOCK signal is active. This prevents external masters from interfering with exclusive accesses.

Use of BINV to Cancel an Access

Besides using the BINV signal to transfer control of the channel from one master to another, the Am29000 uses the BINV signal to cancel accesses after they have been initiated. To cancel an access, BINV is asserted during a cycle in which IREQ or DREQ also is asserted.
If an access is canceled, the accompanying response (using IRDY, IERR, DRDY, or DERR) is ignored during the cycle where BINV is asserted; thereafter, the system should not respond to the canceled access.

The BINV signal is used to cancel an instruction access in the following situations:

• when an interrupt or trap is taken
• when an instruction fetch-ahead is canceled because a target block is only partially present in the Branch Target Cache
• when an instruction TLB miss or protection violation occurs on an instruction access
• when a branch instruction is the delay instruction of another branch, and the targets of both branches are in the Branch Target Cache (in this case, the external fetch for the target of the first branch is not required)
• when the processor enters the Load Test Instruction Mode, and there is an active instruction request on the channel

The BINV signal is used to cancel a data access in the following situations:

• when a data TLB miss or protection violation occurs on the data access
• when an interrupt or trap is taken in the cycle where a pipelined data access becomes a primary access

If, for data accesses, address translation is not performed and pipelined accesses are not implemented, the BINV signal can be ignored by the system during the access.

When a LOADSET instruction encounters a protection violation because store access is not permitted, the processor cancels the load access with BINV.

Bus Sharing-Electrical Considerations

When buses are shared among multiple masters and slaves, it is important to avoid situations where these devices are driving a bus at the same time. This may occur when more than one master or slave is allowed to drive a bus in the same cycle if bus arbitration is incompletely or incorrectly performed.
However, it also occurs when a master or slave releases a bus in the same cycle that another master or slave gains control, and the first master or slave is slow in disabling its bus drivers, compared to the point at which the second master or slave begins to drive the bus. The latter situation is called a bus collision in the following discussion.

In addition to the logical errors that can occur when multiple devices drive a bus simultaneously, such situations may cause bus drivers to carry large amounts of electrical current. This can have a significant impact on driver reliability and power dissipation. Since bus collisions usually occur for a small amount of time, they are of less concern, but may contribute to high-frequency electromagnetic emissions.

The Am29000 channel is defined to prevent all situations where multiple drivers are driving a bus simultaneously. However, bus collisions may be allowed to occur, depending on the system design.

In the case of the Am29000 channel, arbitration for the channel prevents the processor from driving the address and data buses at the same time as another channel master. If there is more than one external master, the system design must include some means for ensuring that only one external master gains control of the channel, and that no external master gains control of the channel at the same time as the processor.

When the processor relinquishes control of the channel to an external master, bus collisions may be prevented by not allowing the external master to drive any bus while BINV is active. This ensures that all processor outputs are disabled by the time the external master takes control of the channel. However, there is nothing in the channel protocol to prevent the external master from taking control as soon as BGRT is asserted.
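The arbitration handshake from the Arbitration section, combined with the collision-avoidance rule just stated, can be sketched as a cycle-level model. It is an illustration only: the function names are hypothetical, and `grant_latency` stands in for the unspecified number of cycles the processor needs to complete pending accesses before granting the channel.

```python
def processor_arbitration_outputs(breq_samples, grant_latency=1):
    """Return (bgrt, binv) level lists for a given BREQ trace.

    grant_latency: assumed number of cycles BREQ must have been seen asserted
        before BGRT is asserted (a placeholder for pending-access completion).
    """
    bgrt, binv = [], []
    granted = False
    seen = 0           # consecutive earlier cycles in which BREQ was asserted
    prev_breq = False  # BREQ level in the previous cycle
    for breq in breq_samples:
        if not granted:
            if seen >= grant_latency:
                granted = True
                bgrt.append(True)
                binv.append(True)    # first BGRT cycle also carries BINV
            else:
                bgrt.append(False)
                binv.append(False)
        elif prev_breq:
            bgrt.append(True)        # master keeps the channel
            binv.append(False)
        else:
            granted = False
            bgrt.append(False)       # cycle after BREQ was deasserted
            binv.append(True)        # BINV marks the hand-back cycle
        seen = seen + 1 if breq else 0
        prev_breq = breq
    return bgrt, binv


def master_drive_cycles(bgrt, binv):
    """Cycles in which a cautious external master drives the buses."""
    return [i for i, (g, v) in enumerate(zip(bgrt, binv)) if g and not v]


if __name__ == "__main__":
    breq = [False, True, True, True, False, False, False]
    bgrt, binv = processor_arbitration_outputs(breq)
    print(bgrt, binv, master_drive_cycles(bgrt, binv))
```

Gating the external master's drivers with "BGRT active and BINV inactive" keeps it off the buses during both the grant cycle and the hand-back cycle, which is the conservative choice described in the text.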
Slave devices and memories are prevented from simultaneously driving the instruction bus or data bus by allowing only the device or memory performing a primary access to drive the appropriate bus. When a pipelined access becomes a primary access, it may drive the instruction or data bus immediately, so there is a potential bus collision if the pipelined access is performed by a slave other than the slave performing the original primary access. This bus collision may be prevented by restricting all slaves to driving the instruction and data buses in the second half-cycle (using SYSCLK, for example). Since the processor samples data only at the end of a cycle, this restriction does not affect performance.

When the processor performs a store immediately following a load, it drives the data bus for the store in the second cycle following the cycle in which the data for the load appears on the data bus. This provides a complete cycle for the slave involved in the load to disable its data drivers. The processor continues to drive the data bus until it receives a DRDY or DERR in response to the store; it ceases to drive the data bus in the cycle following the response.

Channel Behavior for Interrupts and Traps

If an interrupt or trap is taken, any burst-mode accesses are canceled. If a request for a pipelined access is on the address bus, this request is removed. Any other accesses are completed and no new accesses are started, other than those required for the interrupt or trap. Note that any accesses that the processor expects to complete must be completed, even though burst-mode and pipelined accesses are canceled.

When interrupt or trap processing is complete, any canceled burst-mode access transactions are reestablished using the address of the access that was to be performed next when the interrupt or trap was taken. Uncompleted pipelined accesses are restarted, either by the interrupt return sequence in the case of an instruction access, or by restarting the initiating instruction in the case of a data access.

Note that the restarting of a pipelined access is not performed by the Channel Address, Channel Data, and Channel Control registers, since these registers may be required to restart the primary access. The instruction initiating the pipelined access is not allowed to be completed until the primary access is completed, so that the Program Counter 1 (PC1) register contains the address of the initiating instruction when a pipelined access is canceled. The address in PC1 can restart this instruction on interrupt return.

Effect of the LOCK Output

The LOCK output provides synchronization and exclusion of accesses in a multiprocessor environment. LOCK has no predefined effect for a system, other than the fact that the Am29000 does not grant the channel to an external master while LOCK is active.

The LOCK output is asserted for the address cycle of the Load-and-Lock and Store-and-Lock instructions, and is asserted for both the read and write accesses of a Load and Set instruction. LOCK may also be active for an extended period of time under control of the Lock bit in the Current Processor Status Register (this capability is available only to Supervisor-mode programs).

LOCK may be defined to provide any level of resource locking for a particular system. For example, it may lock the channel, an individual device or memory, or a location within a device or memory.

When a resource is locked, it is available for access only by the processor with the appropriate access privilege. The mechanisms for restricting accesses and the methods for reporting attempted violations of the restrictions are system-dependent.

Initialization and Reset

When power is first applied to the processor, it is in an unknown state and must be placed in a known state. Also, under certain circumstances, it may be necessary to place the processor in a defined state. This is accomplished by the Reset mode, which is invoked by activating the RESET pin for the required duration. The Reset mode configures the processor state as follows:

1. Instruction execution is suspended.
2. Instruction fetching is suspended.
3. Any interrupt or trap conditions are ignored.
4. The Current Processor Status Register is set as shown in Figure 67.
5. The Cache Disable bit of the Configuration Register is set.
6. The Data Width Enable bit of the Configuration Register is reset.
7. The Contents Valid bit of the Channel Control Register is reset.

Except as previously noted, the contents of all general-purpose registers, special-purpose registers, and TLB registers are undefined. The contents of the Branch Target Cache are also undefined.

The Reset mode also configures the processor to initiate an instruction fetch using an address of 0. Since the ROM Enable (RE) bit of the Current Processor Status is 1, this fetch is directed to external instruction read-only memory. This fetch occurs when the Reset mode is exited (i.e., when the RESET input is deasserted).

Figure 67. Current Processor Status Register in Reset Mode (register bit diagram; fields CA, IP, TE, TP, TU, FZ, LK, RE, WM, PD, PI, SM, IM, DI, DA)

The Reset mode is invoked by asserting the RESET input and can be entered only if the SYSCLK pin is operating normally, whether or not the SYSCLK pin is being driven by the processor. The Reset mode is entered within four processor cycles after RESET is asserted. The RESET input must be asserted for at least four processor cycles to accomplish a processor reset.

The Reset mode can be entered from any other processor mode (e.g., the Reset mode can be entered from the Halt mode). If the RESET input is asserted at the time that power is first applied to the processor, the processor enters the Reset mode only after four cycles have occurred on the SYSCLK pin.

The Reset mode is exited when the RESET input is deasserted. Either three or four cycles after RESET is deasserted (depending on internal synchronization time), the processor performs an initial instruction access on the channel. The initial instruction access is directed to Address 0 in the instruction read-only memory (instruction ROM). If instruction ROM is not implemented in a particular system, another device or memory must respond to this instruction fetch.

If the CNTL1-CNTL0 inputs are 10 or 01 when RESET is deasserted, the processor enters the Halt or Step mode, respectively. If the processor enters the Halt mode immediately after reset, the protection checking that normally applies to the Halt instruction is disabled so that the Halt instruction can be used as an instruction breakpoint in a User-mode program. The Load Test Instruction mode cannot be directly entered from the Reset mode. If the CNTL1-CNTL0 inputs are 00 immediately after RESET is deasserted, the effect on processor operation is unpredictable. If the CNTL1-CNTL0 inputs are 11, the processor enters the Executing mode.

The processor samples the STAT0 output internally when RESET is asserted. A High level on STAT0 in this case is used to enable a special test configuration and causes the processor to be inoperable. When RESET is asserted, the processor drives STAT0 Low in order to disable this test configuration. However, if processor outputs are disabled by the Test mode, the processor is not able to drive STAT0. Thus, if RESET is asserted when the processor is in the Test mode, the STAT0 pin must be driven Low externally. (In a master/slave configuration, STAT0 is driven Low by the master processor when RESET is asserted.)
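The reset-exit behavior described above (the four-cycle minimum RESET assertion and the mode selected by CNTL1-CNTL0 when RESET is deasserted) can be summarized as a small C model. This is a sketch for board bring-up code or a simulator; the type and function names are assumptions made for illustration and do not come from the data book.

```c
#include <stdbool.h>

/* Hypothetical enumeration of the outcomes described in the text. */
typedef enum {
    MODE_UNPREDICTABLE,  /* CNTL1-CNTL0 = 00: effect is unpredictable */
    MODE_STEP,           /* CNTL1-CNTL0 = 01 */
    MODE_HALT,           /* CNTL1-CNTL0 = 10 */
    MODE_EXECUTING       /* CNTL1-CNTL0 = 11 */
} reset_exit_mode;

/* Mode entered when RESET is deasserted, given the sampled CNTL levels
 * (true = 1, false = 0). */
reset_exit_mode mode_after_reset(bool cntl1, bool cntl0)
{
    if (cntl1 && cntl0)  return MODE_EXECUTING;
    if (cntl1 && !cntl0) return MODE_HALT;
    if (!cntl1 && cntl0) return MODE_STEP;
    return MODE_UNPREDICTABLE;
}

/* RESET must be asserted for at least four processor cycles. */
bool reset_pulse_long_enough(unsigned asserted_cycles)
{
    return asserted_cycles >= 4u;
}
```

A reset controller model could call `reset_pulse_long_enough` before releasing RESET and then use `mode_after_reset` to predict whether the processor starts executing, halts, or single-steps.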
Am29000 ABSOLUTE MAXIMUM RATINGS OPERATING RANGES Storage Temperature Voltage on any Pin with Respect to GND Commercial (C) Devices Case Temperature (Tc) Supply Voltage (Vee) -0.5 to Vee +0.5 V Stresses above those listed under ABSOL UTE MAXIMUM RA TINGS may cause permanent device failure. Functionality at or above these limits is not implied. Exposure to absolute maximum ratings for extended periods may affect device reliability. oto +85°C +4.75 to +5.25 V MIlHary Devices Case Temperature (Tc)* Supply Voltage (Vcc) -55 to +125°C +4.5 to +5.5 V Operating ranges define those limits between which the functionality of the device is guaranteed. *measured "instant on" DC CHARACTERISTICS over COMMERCIAL and MILITARY operating ranges Parameter Symbol Parameter Description Test Conditions -0.5 2.0 -0.5 2.0 -0.5 Vee-O.8 VIL VIH VILINCLK VIHINCLK VILSYSCLK VIHSYSCLK Va. VOH lu !Lo Iccop Min. Output low Voltage for All Outputs except SYSClK Output High Voltage for All Outputs except SYSClK IOL=3.2 rnA .... ~.ll\r;h. Max. 0.8 Vee +0.5 0.8 Vee +0.5 0.8 Vee +0.5 V V V V V V 0.45 V 2.4 V ±10 ±10 Output leakage Current Operating Power-Supply Current 22 for Commercial 25 for Military VOLC VOHC Unit O.S Vee-O.S ~ mNMHz V V losGNO 100 rnA 100 rnA losvcc Circuit Current CAPACITANCE Parameter Symbol Parameter Description CIN CINCLK CSYSCLK COUT Coo Input Capacitance INClK Input Capacitance SYSClK Capacitance Output Capacitance VO Pin Capacitance Test Conditions fC=1 MHz (Note 1) Min. Max. 15 20 90 20 20 UnH pF pF pF pF pF Note: 1. Not 100% tested. 1·101 29K Family CMOS Devices SWITCHING CHARACTERISTICS over COMMERCIAL operating range No. 
1 1A 2 3 4 5 6 6A 7 8 SA 9 9A 98 10 11 12 12A 12B 13 14 15 16 17 18 19 20 1-102 Parameter Description System Clock (SYSCLK) Period (T) SYSCLK at 1.5V to SYSCD< at 1.5V when used as an output SYSCLK High Time when used as input SYSCLK Low Time when used as input SYSCLK Rise Time SYSCLK Fall Time Synchonous SYSCLK Output Valid Delay Synchronous SYSCLK Output Valid Delay for 031-00 Three-State Synchronous SYSCLK Output Invalid Delay Synchronous ~ Output Valid Delay Three·State SYSCIJ< Synchronous Output Invalid Delay Synchronous Input Setup Time Synchronous Input Setup Time for Ds,-Oo' 13,-10 Synchronous Input Setup Time forlmDY Synchronous Input Hold Time Asynchronous Input Minimum Pulse Width INCLI< Period INCLI< to SYSCLK Delay ';'~,~ INCLI< to SYSCO< Delay INCLI< Low Time INCLI< High Time INCLK Rise Time INCLI< Fall Time INCLI< to Deassertion of ~ (for phase synchronization of SYSCLK) WARN Asynchronous Deassertion Hold Minimum Pulse Width BiNV Synchronous Output Valid Delay from SYSCIJ( Three-State synchronous SYSCLK output invalid delay for 0 31-00 25 MHz Min. Max. UnH Note 1 40 1000 ns Note 13 Note 13 Note 13 Note 2 Note 2 0.5T-1 19 17 0.5T +1 5 5 ns ns ns ns ns Notes 3. 12 3 14 ns Note 12 Notes 4, 14.15 4 18 ns 3 30 ns 3 14 ns 3 12 30 ns ns Test CondHlons 33 MHz Min. Max. Notes 5. 1.2. Notes :~ 1~" .i\, ~ ""'" 30 ns 15 15 ns 8 8 ns 9A Synchronous Input Setup Time for 0 31-00 , 131-10 98 Synchronous Input Setup Time for DRDY 16 16 ns 10 Synchronous Input Hold~ Note 6 2 2 ns 11 Asynchronous Input Minimu Pulse Width INCLK Period Note 8 T +10 25 500 T +10 30 500 ns 2 12 2 15 ns 12 2 12 15 ns INCLK Low Time 2 10 14 INCLK High Time 10 15 INCLK Rise Time 5 5 ns 16 INCLK Fall Time 5 5 ns 17 INCLK to Deassertion of RESET (for phase synchronization of SYSCLK) 5 ns 18 WARN Asynchronous Oeassertion 19 BiNV Synchronous Output Valid 20 Three-State synchronous SYSCLK output invalid delay for 0 31 -00 12 12A INCLK to SYSCLK Delay 128 13 INCLK to SYSCLK Delay . v v ",%li~.,. 
Hold Minimum Pulse Width Delay from SYSCLK ns ns 12 5 0 ns Note 9 0 Note 10 4T Note 12 Notes 11, 14,15 1 8 1 9 ns 3 25 3 25 ns 4T ns 1·103 29K Family CMOS Devices SWITCHING CHARACTERISTICS over MILITARY operating range No. 1 Parameter Test Description Conditions System Clock (SYSCLK) Period (T) 20 MHz 16 MHz Min. Max. Min. Max. Unit Note 1 50 1000 60 1000 ns SYSCLK at 1.5V to SYSCI:R' at 1.5V when used as an output Note 13 0.5T -1 0.5T +1 0.5T-2 0.5T +2 ns 2 SYSCLK High Time when used as input Note 13 22 27 ns 3 4 SYSCLK Low Time when used as input SYSCLK Rise Time Note 13 19 22 5 SYSCLK Fall Time 6 Synchonous SYSCLK Output Valid Delay 1A 6A 7 8 8A ns 3 16 ns 4 20 ns 30 3 30 ns 16 3 16 ns 30 3 15 30 15 8 8' 16 16 ns 2 2 ns Notes 5, 12{&f~ Notes 14, Na~7.,~\"\",,> :>I,i,> Three-State SYSCD< Synchronous Output Invalid Delay Synchronous Input Setup Time for 0 31-00 , 131-10 Synchronous Input Setup Time forl5RDY 12A 128 5 3 Jt~!' !:~"~5) ''\~J' I~:::~t~~ i'>, 3 ns ns ns ~'i 1~~iP' Asynchronous Input Minimum Pulse Width Note 8 INCLK Period --+~ \::;Y Synchronous Inputs 1.5 V Relative to SYSCLK 1-106 .- Am29000 SWITCHING WAVEFORMS INCLK Jr---1.5-4~81-----~ V 1~4~----------~~r-----------'~1 Asynchronous Inputs 1.5 V 1.5 V INCLK and Asynchronous Inputs 1-107 29K FamIlY'CMOS Devices SWITCHING WAVEFORMS ~----~3r-----~ ~------~2r-------~ SYSCLK Definition 1.5 V SYSCLK INCLK 1.--------;12r-------~ INCLK to SYSCLK Delay 1·108 Am29000 Capacitive Output Delays For loads greater than 80 pF This table describes the additional output delays for capacitive loads greater than 80 pF. Values in the Maximum Additional Delay column should be added to the value listed in the SWitching Characteristics table. For loads less than or equal to 80 pF, refer to the delays listed in the SWitching Characteristics table. No. 
6 6A Total External capacitance Parameter Description Synchronous SYSCLK Output Valid Delay 100 pF 150 pF 209;, Synchronous SYSCLK Output Valid Delay for 0 31-0 B 19 BINV Synchronou~~utput Valid Delay from SYSCLK Maximum Additional Delay +1 ns +2 ns +4ns +6 ns +B ns +1 ns +6 ns +10 ns +15 ns +19 ns +1 ns +2 ns +4ns +6ns +B ns +1 ns +3 ns +4ns +6ns +7ns FJ. O'fjF OOpF 250pF 300pF 100pF 150 pF 200pF 250pF 300pF 100 pF 150 pF 200pF 250pF 300 F SWITCHING TEST CIRCUIT 10&. = 3.2mA Am29000 Pin Under Test 080751HlO1A ICOO 1030 CL is guaranteed to BO pF. For capacitive loading greater than BO pF, refer to the Capacitive Output Delay table. 1-109 29K Family CMOS Devices Am29000 Thermal Characteristics Pln-Grld-Array Package Thermal Resistance - °ClWatt 700 (3.58) 900 (4.61) 2 2 2 13 11 10 Parameter OJC Junction-to-Case SCA Case-to-Ambient (no Heat~i,~l~ SCA Case-to-Ambient (w ''0 Heatsink, Thermall'7 6 3 2 2 2 6 3 2 2 2 700 (3.58) 900 (4.61) OCA Case-to-Ambient (witttLnidiredional Pin Fin Heatsink, Wakefield 840-20) 10 Ceramlc-Quad-Flat-Pack Package , I II· . . I SJC I OCA r SJA IC001040 Thermal Resistance - °ClWatt Alrflow-ft./mln. (m/sec) Parameter Ox Junction-ta-Case SCA Case-to-Ambient Note: This is for reference only. 
0 (0) 150 (0.76) 300 (1.53) 480 (2.45)

Am29027
Arithmetic Accelerator
Advanced Micro Devices

DISTINCTIVE CHARACTERISTICS

• High-speed floating-point accelerator for the Am29000™ processor
• Comprehensive floating-point and integer instruction sets, including addition, subtraction, and multiplication
• Single-, double-, and mixed-precision operations
• Performs conversions between precisions and between data formats
• Complies with seven industry-standard floating-point formats:
  - IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985), single- and double-precision
  - DEC™ F, DEC D, and DEC G Standards
  - IBM® System/370 single- and double-precision
• Exact IEEE compliance for denormalized numbers with no speed penalty
• Simple interface requires no glue logic between Am29000 and Am29027™
• Eight-deep register file for intermediate results and on-chip 64-bit data path facilitate compound operations, for example, Newton-Raphson division, sum-of-products, and transcendentals
• Supports pipelined or flow-through operation
• Full compiler and assembler support for IEEE format
• Fabricated with Advanced Micro Devices' 1.2-micron CMOS process

SIMPLIFIED SYSTEM DIAGRAM (drawing 09114-001C)

Publication # 09114 Rev. C Amendment /0 Issue Date: October 1989

TABLE OF CONTENTS

DISTINCTIVE CHARACTERISTICS
SIMPLIFIED SYSTEM DIAGRAM
GENERAL DESCRIPTION
CONNECTION DIAGRAMS
PIN DESIGNATIONS
LOGIC SYMBOL
ORDERING INFORMATION
PIN DESCRIPTION
FUNCTIONAL DESCRIPTION
  Overview
    Architecture
    Instruction Set
    Performance
    Interface
    Master/Slave
    Support
  Block Diagram Description
    Input Registers
    Operand Selection Multiplexers
    Instruction Register
    ALU
    Output Register/Register File
    Flag Register
    Status Register
    Output Multiplexer
    Mode Register
    Control Unit
    Master/Slave Comparator
  System Interface
  Special-Purpose Registers
    Mode Register
    Status Register
    Flag Register
    Precision Register
    Instruction Register, I-Temp Register
  Operand Registers
  Accelerator Transaction Requests
    Write Transaction Requests
    Read Transaction Requests
    Coprocessor Data Accept
    Data Ready
    Data Error
  Accelerator Instruction Set
    Instruction Word
    Base Operation Code
    Sign-Change Selects
    Operand Precision Selects
    Operand Source Selects
    Register File Controls
    Accelerator Operations
    Base Operation Code Description
    Primary and Alternate Floating-Point Formats
    Operation Precision
    Operation Flags
    Updating the Status Register
  Operation Sequencing
    Operation in Flow-Through Mode
    Operation in Pipeline Mode
    Pipeline Advance
    Performing Operations
  Master/Slave Operation
  Initialization and Reset
  Applications
ABSOLUTE MAXIMUM RATINGS
OPERATING RANGES
DC CHARACTERISTICS
CAPACITANCE
SWITCHING CHARACTERISTICS
SWITCHING WAVEFORMS
SWITCHING TEST CIRCUIT
TEST PHILOSOPHY AND METHODS
APPENDIX A-DATA FORMATS
APPENDIX B-ROUNDING MODES
APPENDIX C-ADDITIONAL OPERATION DETAILS
APPENDIX D-TRANSACTION REQUEST/OPERATION TIMING

GENERAL DESCRIPTION

The Am29027 Arithmetic Accelerator is a high-performance computational unit intended for use with the Am29000 Streamlined Instruction Processor. When added to an Am29000-based system, the Am29027 improves floating-point performance by an order of magnitude or more.

The Am29027 implements an extensive floating-point and integer instruction set, and can perform operations on single-, double-, or mixed-precision operands. The three most widely used floating-point formats (IEEE, DEC, and IBM) are supported. IEEE operations fully comply with the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985), with direct implementation of special features such as gradual underflow and exception handling.

The Am29027 consists of a 64-bit ALU, a 64-bit data path, and a control unit. The ALU has three data input ports, and can perform operations requiring one, two, or three input operands. The data path comprises two 64-bit input operand registers, an 8-by-64-bit register file for storage of intermediate results, three operand selection multiplexers that provide for orthogonal selection of input operands, and an output multiplexer that allows access to the result data, the operation status, the flags, or the accelerator state. The control unit interprets transaction requests from the Am29000, and sequences the ALU and data path.

Operations can be performed in either of two modes: flow-through or pipeline. In flow-through mode, the ALU is completely combinatorial; this mode is best suited to scalar operations. Pipeline mode divides the ALU into two or three pipelined stages for use in vector operations, such as those found in graphics or signal processing.

The Am29027 connects directly to Am29000 system buses and requires no additional interface circuitry.

Fabricated with AMD's 1.2-micron CMOS technology, the Am29027 is housed in two packages: a 169-lead pin-grid-array (PGA) package, and a 164-lead ceramic-quad-flat-pack (CQFP) package for military applications.

Related AMD Products

Part No.   Description
Am29000    Streamlined Instruction Processor

29K™ Family Development Support Products

Contact your local AMD representative for information on the complete set of development support tools.

Software development products on several hosts:
• Optimizing compilers for common high-level languages
• Assembler and utility packages
• Source- and assembly-level software debuggers
• Target-resident development monitors
• Simulators

Hardware Development:
• ADAPT29K™ Advanced Development and Prototyping Tool

CONNECTION DIAGRAMS

169-Lead PGA* Bottom View (pin grid with columns A through U and rows 1 through 17; alignment pin** at D-4)

* Pinout observed from pin side of package.
•• Alignment pin (not connected internally). CD009761 1-115 29K Family CMOS Devices CONNECTION DIAGRAMS (continued) 164·Lead CQFp· Top View (Lid Facing Viewer) 164 124 123 41 83 L 42 1-116 82 Am29027 PGA PIN DESIGNATIONS (sorted by Pin No.) Pin Na. Pin Name PinNa. Pin Name PinNa. Pin Name PinNa. Pin Name A-1 A-"2 S31 F4 Fs Fa Flo F12 F14 FIS Fla F21 F22 F24 F27 F2a F31 SLAVE It S30 Fl F3 F5 F7 F9 F13 F15 F17 F19 F23 F25 F26 F30 GND MSERR 15 S27 S28 Fo F2 Veeo GNDO Fl1 GNDO Veeo C-10 C-11 F20 Veeo GNDO F29 GNDO Veco 12 Is S24 S25 S29 (see note) 10 13 18 S21 S23 S2S 14 17 19 S18 S20 S22 Vee Ito 1t2 SIS S17 S19 GND hI 1t4 S13 S14 SIS GND 1t3 Its S11 S12 Vee 117 J-16 J-17 lIS Ita S9 SIO GND 121 120 1t9 S8 S7 S6 GNDO 123 122 S5 S4 S2 Veeo DRDY CDA S3 SI R30 NC EXCP DERR So R29 R26 126 NC NC R31 R27 R24 R20 Vee GND R12 R8 GND Vee ClK R-12 R-13 DREOTo RESET DREO 129 127 124 R28 R23 R21 R18 R16 R13 RIO R7 R5 R3 Ro OPTI DREOTI BINV 131 128 125 R25 R22 R19 R17 R15 R14 Rll R9 Rs R4 R2 Rl OPTo OPT2 R/W OE 130 A-3 A-4 A-5 A-6 A-7 A-8 A-9 A-10 A-11 A-12 A-13 A-14 A-15 A-16 A-17 B-1 B-2 B-3 B-4 B-5 B-6 B-7 B-8 B-9 B-10 B-11 B-12 B-13 B-14 B-15 B-16 B-17 C-1 C-2 C-3 C-4 C-5 C-6 C-7 e-8 e-g C-12 C-13 C-14 C-15 C-16 C-17 0-1 0-2 0-3 0-4 0-15 0-16 0-17 E-1 E-2 E-3 E-15 E-16 E-17 F-1 F-2 F-3 F-15 F-16 F-17 G-1 G-2 G-3 G-15 G-16 G-17 H-1 H-2 H-3 H-15 H-16 H-17 J-1 J-2 J-3 J-15 K-1 K-2 K-3 K-15 K-16 K-17 L-1 L-2 L-3 L-15 L-16 L-17 M-1 M-2 M-3 M-15 M-16 M-17 N-1 N-2 N-3 N-15 N-16 N-17 P-1 P-2 P-3 P-15 P-16 P-17 R-1 R-2 R-3 R-4 R-5 R-6 R-7 R-8 R-9 R-10 R-11 R-14 R-15 R-16 R-17 T-1 T-2 T-3 T-4 T-5 T-6 T-7 T-8 T-9 T-10 T-11 T-12 T-13 T-14 T-15 T-16 T-17 U-1 U-2 U-3 U-4 U-5 U-6 U-7 U-S U-9 U-10 U-11 U-12 U-13 U-14 U-15 U-16 U-17 Note: Pin Number 0-4 =Alignment Pin. Veeo and GNOO are power and ground pins for the output buffers. Vee and GNO are power and ground pins for the rest of the logic. 1-117 29K Family CMOS Devices PGA PIN DESIGNATIONS (sorted by Pin Name) Pin No. Pin Name Pin No. 
Pin Name Pin No. Pin Name Pin No. Pin Name T-14 M-17 R-11 N-17 BINV CDA G-15 H-15 8-16 MSERR So NC NC NC M-16 R-14 DRDY DREQ K-3 R-6 R-9 C-6 N-15 P-16 P-17 P-1 N-2 CLK DERR GND GND GND GND GND GNDO U-16 U-13 R-12 T-13 N-16 DREQTo DREQTl EXCP C-8 C-12 C-14 GNDO GNDO GNDO C-3 B-2 C-4 8-3 A-2 8-4 A-3 8-5 A-4 8-6 A-5 C-7 A-6 8-7 A-7 8-8 A-8 8-9 A-9 8-10 C-10 A-10 A-11 8-11 Fo Fl F2 F3 F4 F5 F6 F7 Fs F9 Flo F11 F12 F13 F14 F15 F16 F17 F18 F19 F20 F21 F22 F23 L-15 0-15 A-17 C-16 0-16 E-15 8-17 C-17 E-16 0-17 E-17 F-16 G-16 F-17 H-16 G-17 H-17 J-16 J-15 J-17 K-17 K-16 K-15 L-17 GNDO 10 11 A-12 8-12 8-13 L-16 R-17 T-17 P-15 A-14 C-13 8-14 A-15 F24 F25 F26 F27 F28 Fn F30 F31 8-15 GND A~13 "R-16 T-16 R-15 U-17 T-15 M-3 N-1 SI S2 S3 OE OPTo M-2 M-1 S4 S5 T-12 U-14 T-11 OPTI OPT2 Ro L-3 L-2 L-1 S6 S7 Ss Is 19 110 111 112 113 114 115 116 117 lIs 119 120 121 122 U-12 U-11 T-10 U-10 T-9 U-9 T-8 R-8 U-8 T-7 U-7 R-7 T-6 U-6 U-5 T-5 U-4 T-4 U-3 R-4 T-3 U-2 T-2 R-3 Rl R2 R3 R4 R5 R6 R7 Rs R9 RIo R11 R12 R13 R14 R15 RIB R17 RIS R19 R20 R21 R22 R23 R24 K-1 K-2 J-1 J-2 H-1 H-2 G-1 H-3 G-2 F-1 G-3 E-1 F-3 E-2 0-1 0-2 E-3 C-1 C-2 0-3 8-1 A-1 A-16 S9 S10 S11 S12 S13 S14 SIS S16 S17 SIS S19 S20 S21 S22 S23 S24 S25 S26 S27 S28 S29 S30 S31 SLAVE 123 124 125 126 127 128 In 130 U-1 P-3 R-2 T-1 P-2 N-3 R-1 R-13 R25 R26 R27 R28 Rn R30 R31 RESET F-15 J-3 R-5 R-10 C-5 C-9 C-11 C-15 Vee Vee Vee Vee Veeo Veeo Veeo Veeo 131 U-15 R/W M-15 Veeo b 13 14 15 16 17 Note: Pin Number D-4 = Alignment Pin. Vcco and GNDO are power and ground pins for the output buffers. Vee and GND are power and ground pins for the rest of the logic. '-"8 F-2 Am29027 CQFP PIN DESIGNATIONS (sorted by Pin No.) Pin No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 Pin Name Fo FI F2 F3 F4 Veeo GNDO F5 Fe F7 Fs F9 Flo FII FI2 FI3 FI4 FIs GNDO Vcco FIe F17 FIB FI9 F20 F21 F22 F23 F24 F2S F26 Vcco GNDO F27 F2S F29 F30 F31 GND SLAVE M8ERR Pin No. 
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 Pin Name Vee GND 10 11 12 13 14 15 16 b Is 19 110 III 112 113 GND 114 115 116 117 lIs 119 120 121 122 123 CDA DRDY DERR GNDO Vcco EXCP NC NC NC 124 125 126 127 I~ Pin No. 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 Pin Name 129 128 131 DREQ OE BINV RE8ET R/W DREQTI DREQTo OPT2 OPTI OPTo ClK Ro RI R2 R3 R4 Vee GND Rs R6 R7 Rs R9 RIo RII RI2 RI3 RI4 Rls RI6 RI7 RI8 RI9 R20 R21 R22 R23 R24 Pin No. 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 Pin Name R25 R26 R27 R2S R29 R30 R31 80 81 82 83 84 85 86 87 8s 89 810 811 GND Vee 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 82S 829 830 831 1·119 29K Family CMOS Devices CQFP PIN DESIGNATIONS (sorted by Pin Name) Pin No. 88 69 96 71 86 92 91 70 74 1 2 3 4 5 8 9 10 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 29 30 31 34 35 36 37 38 1·120 Pin Name BINV CDA ClK DERR DREQ DREQTo DREQT1 DRDY EXCP Fo F1 F2 F3 F4 Fs F& F7 Fa F9 F10 Fll F12 F13 F14 F15 F1& F17 F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 FlO F31 Pin No. 39 43 58 103 143 7 19 33 72 44 45 46 47 48 49 50 51 52 53 54 55 56 57 59 60 61 62 63 64 65 66 67 68 78 79 80 81 84 83 82 85 Pin Name GND GND GND GND GND GNDO GNDO GNDO GNDO 10 11 12 b 14 15 16 17 18 III 110 In 112 113 114 115 116 117 11a 119 120 121 122 123 124 125 12& 127 128 129 130 131 Pin No. 41 75 76 77 87 95 94 93 89 90 97 98 99 100 101 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 Pin Name MSERR NC NC NC OE OPTo OPT1 OPT2. RESET RtW· Ro R1 R2 R3 R4 Rs R6 R7 Ra RII R10 Rll R12 R13 R1. R15 R16 R17 R18 R111 R20 R21 R22 R23 R24 R2S R26 R27 R28 R29 R30 Pin No. 
Pin Name 130 R31 SLAVE 40 131 So 132 S1 133 82 S3 134 135 S4 136 Ss 137 S6 138 S7 139 Sa 140 S9 141 S10 142 S11 145 S12 . S13 146 147 S14 S15 148 149 S16 150 S17 151 S18 S19 152 S20 153 154 S21 155 822 156 S23 S24 157 S25 158 159 S26 S27 160 S28 161 162 S29 163 S30 164 S31 Vee 42 102 Vee Vee 144 Veeo 6 Veeo 20 Veeo 32 Veeo 73 Am29027 LOGIC SYMBOL RESET Transaction Request 2 CDA RIW DRDY DREO DERR ). Transact;on Status DREOT,-DREOTo F31-Fo OPTrOPTo BIN V MSERR R31-Ro EXCP 8 31 -8 0 bl-Io OE 09114B-002C 1-121 29K Family CMOS Devices ORDERING INFORMATION Standard Products AMD standard products are available in several packages and operating ranges. The ordering number (Valid Combination) is formed by a combination of: a. Device Number b. Speed Option (if applicable) c. Package Type d. Temperature Range e. Optional Processing AM29027 -25 . L= C G B .. e. OPTIONAL PROCESSING Blank.. Standard Processing B - Burn-in d. TEMPERATURE RANGE C ... Commercial (0 to +85°C) ~-------------------c.PACKAGETYPE G .. 169-Lead Pin Grid Array without Heatsink (CGX169) L -_ _ _ a. DEVICE NUMBER/DESCRIPTION Am29027 Arithmetic Accelerator Valid Combinations AM29027-25 AM29027-20 AM29027-16 1-122 GC,GCB b. SPEED OPTION -25 =25 MHz -20 .. 20 MHz -16 16 MHz = Valid Combinations Valid Combinations list configurations planned to be supported in volume for this device. Consult the local AMD sales office to confirm availability of specific valid combinations, to check on newly released combinations, and to obtain additional data on AM D's standard military grade products. Am29027 MILITARY ORDERING INFORMATION APL Products AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) is formed by a combination of 8. Device Number b. Speed Option (if applicable) c. Device Class d. Package Type e. 
Lead Finish AM29027 -20 z IB C L== e. LEAD FINISH C = Gold d. PACKAGE TYPE Z = 169-Lead Pin Grid Array without Heatsink (CGX169) Y = 164-Lead Ceramic Quad Flat Pack without Heatsink ' - - - - - - - - - - - - c. DEVICE CLASS IB = Class B b. SPEED OPTION -20 = 20 MHz -16 = 16 MHz ' - - - - 8. DEVICE NUMBER/DESCRIPTION Am29027 Arithmetic Accelerator Valid Combinations AM29027-20 AM29027-16 I I ISZC,/BYC Valid Combinations Valid Combinations list configurations planned to be supported in volume for this device. Consult the local AMD sales office to confirm availability of specific valid combinations or to check on newly released valid combinations. Group A Tests Group A tests consist of Subgroups 1, 2,3, 7, 8, 9, 10, 11. 1-123 29K Family CMOS Devices PIN DESCRIPTION BINV Bus Invalid (Synchronous Input) A logic Low indicates that the Am29000 address bus and related control signals are invalid. The Am29027 will ignore signal DREOTl when BINV is Low. CDA Coprocessor Data Accept (Three-State Output) A logic Low indicates that the Am29027 is ready to accept data from the Am29000. This signal is normally driven by the Am29027, and assumes a high-impedance state only if input signal OE is High or input signal SLAVE is Low. ClK Clock (Input) DERR Data Error (Three-State Output) A logic Low indicates that an unmasked exception occurred during or preceding the current transaction request. This signal is normally driven by the Am29027, and assumes a high-impedance state only if input signal OE is High or input signal SLAVE is Low. DRDY Data Ready (Three-State Output) A logic Low indicates that data is available on Port F. This signal is normally driven by the Am29027, and assumes a high-impedance state only if input signal OE is High or input signal SLAVE is Low. DREQ Data Request (Synchronous Input) A logic Low indicates that the Am29000 is making a data access. The Am29027 will ignore signal DREOTl when DREQ is High. 
DREQT0
Start Instruction/Suppress Errors (Synchronous Input)
This signal, when accompanied by a valid write operand R, write operand S, write operands R, S, or write instruction transaction request, commands the Am29027 to begin a new operation. When accompanying a valid read result LSBs, read result MSBs, read flags, or read status transaction request, DREQT0 suppresses the reporting of operation errors. DREQT0 also modifies the action of the write status transaction request to retime an operation in flow-through mode, or to invalidate the ALU pipeline in pipeline mode.

DREQT1
Accelerator Transaction Request (Synchronous Input)
A logic High indicates that the Am29000 is making an accelerator transaction request. This signal is considered valid only when signal BINV is High and signal DREQ is Low.

EXCP
Exception (Three-State Output)
Indicates that the status register contains one or more unmasked exception bits. This signal can be used as an interrupt or trap signal by the Am29000. EXCP is normally driven by the Am29027, and assumes a high-impedance state only if input signal OE is High or input signal SLAVE is Low.

F31-F0
F Output Bus (Three-State Output)

I31-I0
Instruction Bus (Synchronous Input)
Used to specify the operation to be performed by the accelerator.

MSERR
Master/Slave Error (Output)
Reports the result of the comparison of processor outputs with the signals provided internally to the off-chip drivers. If there is a difference for any enabled driver, MSERR assumes the logic High state.

OE
Output Enable (Asynchronous Input)
A logic High forces all accelerator outputs except MSERR to assume a high-impedance state unconditionally; master/slave comparison circuitry is also disabled. This signal is provided for test purposes.

OPT2-OPT0
Transaction Type (Synchronous Input)
These signals, in conjunction with R/W, specify the type of accelerator transaction, if any, currently being requested by the Am29000.
R31-R0
R Data Bus (Synchronous Input)

RESET
Reset (Asynchronous Input)
Resets the Am29027. When RESET is a logic Low, the state of internal sequencing circuitry is initialized, and the status register is cleared. RESET must be connected to the signal line used to reset the Am29000.

R/W
Read/Write (Synchronous Input)
Determines the direction of a transaction. When R/W is High, data is transferred from the Am29027 to the Am29000; when R/W is Low, data is transferred from the Am29000 to the Am29027.

S31-S0
S Data Bus (Synchronous Input)

SLAVE
Master/Slave Mode Select (Synchronous Input)
A logic Low selects Slave mode; in this mode all outputs except MSERR assume a high-impedance state. A logic High selects Master mode.

FUNCTIONAL DESCRIPTION

Overview
The Am29027 is a high-performance, single-chip arithmetic accelerator for the Am29000 Streamlined Instruction Processor.

Architecture
The Am29027 comprises a high-speed ALU, a 64-bit data path, and control circuitry.

The core of the Am29027 is a 64-bit floating-point/integer ALU. The ALU takes operands from three 64-bit input ports and performs the selected operation, placing the result on a 64-bit output port. Seven ALU flags report operation status. The ALU is completely combinatorial for minimum latency; optional pipelining is available to boost throughput for vector operations.

The data path consists of two 32-bit input buses, R and S; two 64-bit input registers; two 64-bit temporary input registers; a 64-bit result register; an 8-word-by-64-bit register file for storage of intermediate results; three operand selection multiplexers that provide for orthogonal selection of input operands; an output multiplexer that selects data, operation flags, operation status, or other accelerator state; and a 32-bit output bus, F. Input operands enter the floating-point accelerator through the R and S buses, and are then demultiplexed and buffered for subsequent storage in the input registers. The operand selection multiplexers route the operands to the ALU; operation results and status leave the device on Output Bus F. Operation results also can be stored in the register file for use in subsequent operations.

On-board control circuitry sequences the ALU and data path during operations, and manages the transfer of data between the accelerator and the Am29000. A 32-bit instruction register and a 32-bit temporary instruction register hold the instruction words for current and pending operations.

Instruction Set
The Am29027 implements 57 arithmetic and logical instructions. Thirty-five instructions operate on floating-point numbers; these instructions fall into the following categories:
• addition/subtraction
• multiplication
• multiplication-accumulation
• comparison
• selecting the maximum or minimum of two numbers
• rounding to integral value
• absolute value, negation, pass
• reciprocal seed generation
• conversion between any of the supported floating-point formats, including conversions between precisions
• conversion of a floating-point number to an integer format, with an optional scale factor

By concatenating these operations, the user can also perform division, square-root extraction, polynomial evaluation, and other functions not implemented directly.
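As an illustration of how division can be composed from the operations listed above, the following minimal sketch refines a coarse reciprocal estimate with Newton-Raphson steps, each of which needs only a multiplication-accumulation and a multiplication. The functions `reciprocal_seed` and `divide`, the 8-bit seed accuracy, and the iteration count are assumptions made for this example; they do not describe the accuracy of the Am29027 seed or any particular routine supplied with it.

```python
import math

def reciprocal_seed(b, seed_bits=8):
    """Hypothetical stand-in for a hardware reciprocal seed: 1/b to about seed_bits bits."""
    m, e = math.frexp(b)                      # b = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << seed_bits
    approx = round((1.0 / m) * scale) / scale # truncate the mantissa reciprocal
    return math.ldexp(approx, -e)             # reapply the exponent

def divide(a, b, iterations=4):
    """Approximate a / b using only multiply and multiply-accumulate style steps."""
    x = reciprocal_seed(b)
    for _ in range(iterations):
        # Newton-Raphson: x <- x * (2 - b*x); the error roughly squares each pass
        x = x * (2.0 - b * x)
    return a * x

print(divide(1.0, 3.0))
```

Because the error roughly squares on every pass, a better seed directly reduces the number of passes needed for a given precision.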
Twenty-two instructions operate on integers, and belong to the following general categories:
• addition/subtraction
• multiplication
• comparison
• selecting the maximum or minimum of two numbers
• absolute value, negation, pass
• logical operations, e.g., AND, OR, XOR, NOT
• arithmetic, logical, and funnel shifts
• conversion between single- and double-precision integer formats
• conversion of an integer number to a floating-point format, with an optional scale factor

One special instruction is provided to move data:
• pass operand

Performance
The Am29027 provides operation speeds several times greater than conventional floating-point processors by virtue of its extensive use of combinatorial, rather than sequential, logic. Most floating-point operations, whether single, double, or mixed precision, can be performed in as few as six system clock cycles.

Performance is further enhanced by the presence of the on-board register file that can be used to hold intermediate results, thus reducing the amount of time needed to transfer operands between the Am29027 and the Am29000. The input operand registers and the instruction register are double-buffered, so that a new operation can be specified while the current operation is being completed.

Interface
The Am29027 connects directly to the Am29000 system buses. Am29027 operations are specified by a series of operand and instruction transactions issued by the Am29000. Eight input signals specify the transaction to be performed; three output signals report transaction status.

Master/Slave
The Am29027 contains special comparison hardware to allow the operation of two accelerators in parallel, with one accelerator (the slave) checking the results produced by the other (the master). This feature is of particular importance in the design of high-reliability systems.
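The master/slave check can be pictured with a small behavioral sketch: a device compares the level it intends to place on each output with the level actually present on the pin, and reports any mismatch. This is only a software analogy under assumed names (`mserr`, the signal dictionaries); it is not a description of the comparison circuitry itself.

```python
def mserr(intended, observed, signals):
    """Return True (MSERR High) if any checked output differs from its pin level."""
    return any(intended[name] != observed[name] for name in signals)

# The slave computes the same results as the master but does not drive the pins,
# so it compares its own intended outputs with what the master is driving.
master_pins = {"CDA": 1, "DRDY": 0, "DERR": 1, "F": 0x3FF00000}
slave_internal = {"CDA": 1, "DRDY": 0, "DERR": 1, "F": 0x3FF00000}

print(mserr(slave_internal, master_pins, master_pins.keys()))   # agreement

faulty_pins = dict(master_pins, F=0x3FF00001)                   # one bit differs
print(mserr(slave_internal, faulty_pins, faulty_pins.keys()))   # mismatch reported
```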
Support
The Am29027 IEEE format is fully supported by those hardware and software tools available for the Am29000, including:
• HighC29K Cross-Development Toolkit
• ASM29K Cross-Development Toolkit
• ADAPT29K, a general-purpose hardware development system. The ADAPT29K permits single-step operation, break-point insertion, and other standard debugging techniques.

Block Diagram Description
A block diagram of the Am29027 is shown in Figure 1. The Am29027 comprises the input registers, the operand selection multiplexers, the instruction register, the ALU, the output register/register file, the flag register, the status register, the output multiplexer, the mode register, the control unit, and the master/slave comparator.

Figure 1. Am29027 Block Diagram

Input Registers
Operands are loaded into the accelerator via the 32-bit R and S buses, and are demultiplexed and buffered for subsequent storage in 64-bit registers R and S; input operands may be either single-precision (32-bit) or double-precision (64-bit). Two single-precision or one double-precision operand may be written to the input registers in a single system clock cycle. Accompanying the input registers are two 64-bit temporary registers, R-Temp and S-Temp, that permit the overlapping of operand transfers and ALU operations.

Operand Selection Multiplexers
The operand selection multiplexers route operands to the ALU. These multiplexers, as well as selecting operands from input registers R and S and register file locations RF7-RF0, also have access to a set of floating-point and integer constants. These constants are double-precision preprogrammed numbers for use in ALU operations, and are automatically provided in the appropriate format.
Instruction Register
The instruction register stores a 32-bit word specifying the current accelerator operation. Included in the instruction word are fields that specify the core operation to be performed by the ALU, operand format (integer or floating-point), sign-change selects for ALU input and result operands, operand precisions, operand sources, and register file controls. The instruction register is preceded by the 32-bit temporary register, I-Temp, permitting the overlapping of instruction transfers and ALU operations. Instructions enter the accelerator via 32-bit Instruction Bus I.

ALU
The ALU is a combinatorial arithmetic/logic unit that performs a large repertoire of floating-point and integer operations. The ALU has three operand inputs. Some operations require a single input operand, for example, conversion operations. Others, such as addition or multiplication, require two input operands. The multiplication-accumulation and funnel shift operations require three input operands. Most ALU operations allow the user to modify operand signs, thus greatly increasing the number of arithmetic expressions that can be evaluated in a single ALU pass.

The ALU can be configured in either flow-through mode, for which the ALU is completely combinatorial, or pipeline mode, for which ALU operations are divided into one or two pipeline stages.

Output Register/Register File
Operation results are stored in 64-bit output register F; results can also be stored in the 8-by-64-bit register file for use in subsequent operations. A precision register, part of the register file, contains bits indicating the precisions of the operands stored in each register file location, thus permitting the ALU to correctly process these operands in later operations.

Flag Register
The 32-bit flag register stores flags pertaining to the most recently performed operation.
The flags indicate error conditions, such as underflow or overflow, and also report results for operations that produce result flags, such as comparisons.

Status Register
The 32-bit status register contains information regarding the status of past, current, and pending operations. Six exception bits report operation error conditions. These exception bits are individually latched; once a given bit is set, it remains set until reset by the Am29000 or by system reset. The exception bits indicate error conditions of overflow, underflow, zero result, reserved operand, invalid operation, and inexact result. At the user's option, the presence of an exception can be used to report a data error to the Am29000, or to halt Am29027 operation; exception bits can be individually enabled or disabled by programming the corresponding mask bit in the mode register.

Exception bit activity is summarized by a seventh bit, Exception Status, which indicates that one or more unmasked status bits are set. If desired, the state of this bit can be placed on signal EXCP, which can be used to interrupt the Am29000.

The status register contains four additional bits (R-Temp Valid, S-Temp Valid, I-Temp Valid, and Operation Pending) that pertain to the state of pending operands and operations.

Output Multiplexer
The output multiplexer routes operation results and the accelerator's internal state to the Am29000 through the 32-bit F bus. This multiplexer can select Register F, the flag register, status register, instruction register, mode register, or precision register.

Mode Register
The 64-bit mode register contains accelerator control parameters that change infrequently or not at all, such as floating-point format, round mode, and operation timing information. These parameters are initialized by the Am29000 during system start-up, and are modified as required during operation.
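To make the start-up initialization concrete, the sketch below assembles a 64-bit mode word from named fields, using the bit positions and value ranges given in the Mode Register bit list of the Special-Purpose Registers section. The table `MODE_FIELDS` and the function `build_mode_word` are hypothetical helpers written for this illustration; how the resulting word is delivered to the device is not modeled here.

```python
# Field name -> (least significant bit, width), per the Mode Register bit list.
MODE_FIELDS = {
    "EX": (46, 1), "HE": (45, 1), "AD": (44, 1),
    "MVTC": (40, 4), "MATC": (36, 4), "PLTC": (32, 4),
    "ZMSK": (27, 1), "XMSK": (26, 1), "UMSK": (25, 1),
    "VMSK": (24, 1), "RMSK": (23, 1), "IMSK": (22, 1),
    "PL": (20, 1), "RMS": (14, 3), "MF": (12, 2), "MS": (11, 1),
    "BU": (9, 1), "BS": (8, 1), "SU": (7, 1), "TRP": (6, 1),
    "AP": (5, 1), "SAT": (4, 1), "AFF": (2, 2), "PFF": (0, 2),
}

def build_mode_word(**fields):
    """Pack the supplied fields into a mode word; reserved bits are left at 0."""
    word = 0
    for name, value in fields.items():
        lsb, width = MODE_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << lsb
    pipelined = fields.get("PL", 0)
    for name in ("MVTC", "MATC", "PLTC"):
        if name in fields:
            # Timer counts start at 3, except PLTC in pipeline mode, which may be 2.
            lowest = 2 if (name == "PLTC" and pipelined) else 3
            if fields[name] < lowest:
                raise ValueError(f"{name} must be at least {lowest}")
    if fields.get("RMS", 0) >= 0b110:
        raise ValueError("RMS values 11X are illegal")
    return word

# Flow-through mode, IEEE primary format, round to nearest, inexact result masked.
mode = build_mode_word(PFF=0, RMS=0, PL=0, XMSK=1, PLTC=3, MATC=3, MVTC=3)
print(hex(mode))
```

Only the fields passed in are range-checked, so a caller that omits the timer counts gets zeros in those positions and must supply legal values before the word is used.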
Control Unit
The control unit manages the transfer of data between the Am29000 and the Am29027, as well as the timing of operation execution. The Am29000 oversees operation of the Am29027 by issuing one of thirteen commands, or transaction requests, to the control unit via eight signal lines. Each transaction request specifies an action on the part of the Am29027, such as writing an operand to an input register or returning a result to the Am29000. The control unit interprets the transaction request and sequences the Am29027 to produce the desired action. Three transaction status lines are generated by the control unit to indicate transaction completion, or to indicate the existence of an accelerator error condition.

Master/Slave Comparator
Each Am29027 output signal has associated logic that compares that signal with the signal that the accelerator provides internally to the output driver; any discrepancies are indicated by assertion of signal MSERR. For a single accelerator, this output comparison detects short circuits in output signals or defective output drivers, but does not detect open circuits. It is possible to connect a second accelerator in parallel with the first, with the second accelerator's outputs disabled by assertion of signal SLAVE. The second accelerator detects open-circuit signals, and provides a check of the outputs of the first accelerator.

System Interface
The Am29027 takes its clock input from the Am29000 SYSCLK system clock output. Three Am29027 buses (R31-R0, I31-I0, and F31-F0) are connected to Am29000 Data Bus D31-D0; the remaining Am29027 bus, S31-S0, is connected to Am29000 Address Bus A31-A0. Through these connections, the Am29000 can transfer to the Am29027 a 32-bit instruction, two 32-bit operands, or a 64-bit operand in a single cycle, or can receive a 32-bit result from the Am29027 in a single cycle.

Twelve additional signals govern communication between the Am29000 and Am29027. Eight Am29000 output signals (R/W, DREQ, DREQT1, DREQT0, OPT2-OPT0, and BINV) are connected to the corresponding Am29027 signals and are used to issue transaction requests to the Am29027. Three Am29027 signals (CDA, DRDY, and DERR) report transaction status. CDA is directly connected to the corresponding input of the Am29000, while DRDY and DERR must be ORed with like signals from other resources. A fourth Am29027 signal, EXCP, may be connected to an Am29000 trap or interrupt input to signal the presence of Am29027 operation exceptions at the user's option.

The signal used to reset the Am29000 must also be connected to the Am29027 RESET input.

Am29000/Am29027 signal interconnects are depicted in Figure 2.

Figure 2. Am29000/Am29027 Hardware Interface

Special-Purpose Registers
The Am29027 contains six special-purpose registers: the mode register, status register, flag register, precision register, instruction register, and I-Temp register.

Mode Register
The 64-bit mode register stores 24 infrequently changed parameters pertaining to accelerator operation; its format is shown in Figure 3. The Am29000 modifies the accelerator parameter set by issuing a write mode register transaction request. The mode register should be initialized after hardware reset, and may be written with new parameters when a new mode of accelerator operation is required; mode changes take effect immediately. The Am29027 does not alter the contents of the mode register in the course of operation.

Bits 63-47 - Reserved for future use. This field must be set to 0 to assure future compatibility.
Bit 46 - EXCP Enable (EX): When EX is High, reporting of unmasked exceptions via signal EXCP is enabled. When EX is Low, signal EXCP is forced inactive (logic High).

Bit 45 - Halt On Error Enable (HE): When HE is High, the Am29027 will halt operation in the presence of an unmasked exception.

Bit 44 - Advance DRDY (AD): When AD is High, signal DRDY is advanced one cycle in flow-through mode. This bit has no effect in pipeline mode.

Bits 43-40 - Timer Count for the MOVE P Operation (MVTC): In flow-through mode, MVTC specifies the number of clock cycles needed for data to traverse the ALU for base operation code MOVE P; in pipeline mode, it has no effect. This field can assume values between 3 and 15, inclusive.

Bits 39-36 - Timer Count for the Multiply-Accumulate Operation (MATC): In flow-through mode, MATC specifies the number of clock cycles needed for data to traverse the ALU for base operation code F' = (P' x Q') + T'; in pipeline mode, it has no effect. This field can assume values between 3 and 15, inclusive.

Bits 35-32 - Pipeline Timer Count (PLTC): In flow-through mode, PLTC specifies the number of clock cycles needed for data to traverse the ALU for any base operation code except F' = (P' x Q') + T' or MOVE P; in pipeline mode, it specifies the number of cycles needed for data to traverse a single pipeline stage for any base operation code. This field can assume values between 3 and 15, inclusive, in flow-through mode, and between 2 and 15, inclusive, in pipeline mode.

Bits 31-28 - Reserved for future use. This field must be set to 0 to assure future compatibility.

Bit 27 - Zero Result Exception Mask (ZMSK): When ZMSK is High, the status register zero result exception bit is masked and will not contribute to the detection of an error condition.

Bit 26 - Inexact Result Exception Mask (XMSK): When XMSK is High, the status register inexact result exception bit is masked and will not contribute to the detection of an error condition.
Bit 25 - Underflow Exception Mask (UMSK): When UMSK is High, the status register underflow exception bit is masked and will not contribute to the detection of an error condition.

Bit 24 - Overflow Exception Mask (VMSK): When VMSK is High, the status register overflow exception bit is masked and will not contribute to the detection of an error condition.

Bit 23 - Reserved Operand Exception Mask (RMSK): When RMSK is High, the status register reserved operand exception bit is masked and will not contribute to the detection of an error condition.

Bit 22 - Invalid Operation Exception Mask (IMSK): When IMSK is High, the status register invalid operation exception bit is masked and will not contribute to the detection of an error condition.

Bit 21 - Reserved for future use. This bit must be set to 0 to assure future compatibility.

Bit 20 - Pipeline Mode Select (PL): When PL is High, pipeline mode is selected; when PL is Low, flow-through (unpipelined) mode is selected.

Bits 19-17 - Reserved for future use. This field must be set to 0 to assure future compatibility.

Bits 16-14 - Round Mode Select (RMS): Selects one of six rounding modes as follows:

RMS    Round Mode
000    Round to nearest (IEEE)
001    Round to minus infinity
010    Round to plus infinity
011    Round to zero
100    Round to nearest (DEC)
101    Round away from zero
11X    Illegal value

Additional information on round modes can be found in Appendix B.

Bits 13-12 - Integer Multiplication Format Adjust (MF): Selects the output format for integer multiplication. The user may select either the MSBs or the LSBs of an integer multiplication result, with optional format adjust. MF is encoded as follows:

MF    Output Format
00    LSBs
01    LSBs, format-adjusted
10    MSBs
11    MSBs, format-adjusted

"Format-adjusted" indicates that the product is shifted left one place before the MSBs or LSBs are selected.
Bit 11 - Integer Multiplication Signed/Unsigned Select (MS): If MS is High, input operands for integer multiplication operations are treated as two's complement numbers. If MS is Low, the input operands are treated as unsigned numbers.

Bit 10 - Reserved for future use. This bit must be set to 0 to assure future compatibility.

Bit 9 - IBM Underflow Mask Enable (BU): If BU is High, certain underflowed IBM operations will produce a normalized result with a biased exponent increased by 128. If BU is Low, these operations will produce a final result of true zero. BU affects only those operations that produce a result in IBM format and that use the following base operation codes:
   F' = P' + T'
   F' = P' x Q'
   Compare P, T
   F' = (P' x Q') + T'
   Convert T to Alternate F.P. Format
   Convert T from Alternate F.P. Format
   Scale T to Floating-Point by Q

Bit 8 - IBM Significance Mask Enable (BS): If BS is High, certain IBM operations having intermediate results of 0 will produce a final result of 0 with the biased exponent unchanged. If BS is Low, these operations will produce a final result of true zero. BS affects only those operations that produce a result in IBM format and that use the F' = P' + T' and Compare P, T base operation codes.

Bit 7 - IEEE Sudden Underflow Enable (SU): If SU is High, all IEEE denormalized results are replaced by a 0 of the same sign; if SU is Low, the appropriate denormalized number will be produced. If IEEE traps are enabled (mode register bit TRP High), sudden underflow is disabled.

Bit 6 - IEEE Trap Enable (TRP): If TRP is High, IEEE trapped operation is enabled; the Saturate Enable (SAT) and Sudden Underflow (SU) bits are ignored. For an underflowed result, the biased exponent is increased by 192 (single precision) or 1536 (double precision), with the significand unchanged. For an overflowed result, the biased exponent is decreased by a like amount with the significand unchanged. If TRP is Low, IEEE trapped operation is disabled.
This bit affects only those operations that produce a result in IEEE floating-point format.

Bit 5 - IEEE Affine/Projective Select (AP): If AP is High, IEEE addition or subtraction operations having infinite input operands are performed in affine mode; if AP is Low, these operations are performed in projective mode. In affine mode, it is permissible to add infinities of like sign or subtract infinities of opposite sign, producing an infinite result with the appropriate sign. In projective mode these operations will produce an invalid operation exception. This bit affects only those operations that produce a result in IEEE floating-point format.

Bit 4 - Saturate Enable (SAT): If SAT is High, overflowed results are replaced by the largest representable value in the selected format of the same sign as the overflowed result; if SAT is Low, the result produced depends on the overflow conventions for the selected floating-point format. If IEEE traps are enabled (mode register bit TRP High), saturation is disabled for any operation that produces a result in IEEE floating-point format.

Bits 1-0 - Primary Floating-Point Format (PFF), Bits 3-2 - Alternate Floating-Point Format (AFF): The primary format is used as the source and destination format for all floating-point operations except conversions, and as the source or destination format for operations that convert between floating-point and integer formats. The alternate format is used as a source or destination format in operations that convert one floating-point format to another. Both the PFF and AFF fields are encoded as follows:

High Bit   Low Bit   Format
0          0         IEEE
0          1         DEC F (Single), DEC D (Double)
1          0         DEC F (Single), DEC G (Double)
1          1         IBM

Floating-point formats are discussed in further detail in Appendix A.

Figure 3. Mode Register Format

Status Register
The status register contains operation exception status, as well as the status of pending operands and operations; its format is shown in Figure 4. The Am29000 can initialize or modify the contents of the status register by issuing a write status transaction request, and can read current status register contents by issuing a read status transaction request or as part of a save state sequence. All status register bits are initialized to a logic Low after hardware reset.

Figure 4. Status Register

Bits 31-11 - Reserved for future use. This field must be set to 0 when written to assure future compatibility.

Bit 10 - Operation Pending (OPP): A logic High indicates that an operation awaits execution.

Bit 9 - I-Temp Valid (IVA): A logic High indicates that register I-Temp contains an instruction for a pending operation.

Bit 8 - S-Temp Valid (SVA): A logic High indicates that register S-Temp contains an operand for a pending operation.

Bit 7 - R-Temp Valid (RVA): A logic High indicates that register R-Temp contains an operand for a pending operation.

Bit 6 - Exception Status (ES): A logic High indicates that status register bits 0-5 contain an unmasked exception.

Bit 5 - Zero Result Exception Bit (ZEX): A logic High indicates that an operation produced a zero result. Latches until cleared.

Bit 4 - Inexact Result Exception Bit (XEX): A logic High indicates that an operation result had to be rounded to fit the destination format. Latches until cleared.

Bit 3 - Underflow Exception Bit (UEX): A logic High indicates that an operation result has underflowed the destination format. Latches until cleared.

Bit 2 - Overflow Exception Bit (VEX): A logic High indicates that an operation result overflowed the destination format. Latches until cleared.

Bit 1 - Reserved Operand Exception Bit (REX): A logic High indicates that a reserved operand appeared as an input operand to an operation or was generated as a result. Latches until cleared.

Bit 0 - Invalid Operation Exception Bit (IEX): A logic High indicates that input operands are unsuitable for the operation performed (e.g., infinity x 0). Latches until cleared.

Flag Register
The flag register contains 7 flag bits that report exception or Boolean results for the most recently performed operation; its format is shown in Figure 5. The remaining 25 register bits are reserved for future use. The Am29000 can read the current flag register contents by issuing a read flags transaction request.

Flag register bits 6-0 correspond to Flag 6-Flag 0 (FL6-FL0). These flags assume a meaning that is operation-dependent, as discussed in the Operation Flags section. The flag register is made transparent in flow-through mode.

Figure 5. Flag Register

Precision Register
The precision register contains 8 bits that report the precision of operands stored in the register file; its format is shown in Figure 6. Bit 0 (PR0) reports the precision of register file location 0 (RF0), bit 1 the precision of location 1 (RF1), and so on. A logic High indicates a single-precision value, logic Low a double-precision value.

The precision register also contains the Accelerator Release Level (ARL), an 8-bit, read-only identification number that specifies the accelerator version. The ARL field occupies bits 31-24. The remaining 16 bits of the precision word are reserved for future use, and must be set to 0 when written to assure future compatibility.
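The status and precision register layouts above lend themselves to a short decoding sketch. It relies on the observation that the six mode register mask bits (IMSK through ZMSK, bits 22-27) appear in the same order as status bits 0-5, so one shift aligns them. The function names and the sample register values are invented for illustration, and the summary computed here is a reading of the Exception Status description rather than a statement of the device's internal logic.

```python
# Status register exception bits (bit positions from the Status Register list).
EXCEPTION_BITS = {"IEX": 0, "REX": 1, "VEX": 2, "UEX": 3, "XEX": 4, "ZEX": 5}

def unmasked_exceptions(status, mode):
    """Names of the exception bits that are set in status and not masked in mode."""
    masks = (mode >> 22) & 0x3F          # IMSK..ZMSK line up with status bits 0..5
    active = status & 0x3F & ~masks
    return [name for name, bit in EXCEPTION_BITS.items() if (active >> bit) & 1]

def pending_state(status):
    """Decode the four pending-operation bits of the status register."""
    return {
        "RVA": bool((status >> 7) & 1),
        "SVA": bool((status >> 8) & 1),
        "IVA": bool((status >> 9) & 1),
        "OPP": bool((status >> 10) & 1),
    }

def decode_precision(word):
    """Return (ARL, per-location flags); True means RFn holds a single-precision value."""
    arl = (word >> 24) & 0xFF
    single = [bool((word >> n) & 1) for n in range(8)]
    return arl, single

status = 0b100_0000_1100                 # OPP set, UEX and VEX latched (sample value)
mode = 1 << 25                           # UMSK High: underflow is masked (sample value)
print(unmasked_exceptions(status, mode))
print(decode_precision(0x05000081))
```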
Figure 6. Precision Register

Instruction Register, I-Temp Register
The instruction register contains a 32-bit instruction word that specifies the ALU operation; its format is shown in Figure 7.

Figure 7. Instruction Register

Bit 31 - Register File Enable (RF): Enables a write to the register file. When RF is High, the operation result is written to the register file location specified by RFS and the resulting precision is written to the corresponding bit of the precision register. When RF is Low, no write is performed either to the register file or the precision register.

Bits 30-28 - Register File Select (RFS): Selects the register file location (RF7-RF0) to which the operation result is to be written. If bit RF is Low, the value of RFS is a "don't care."

Bits 27-24 - Select for P Operand Multiplexer (PMS): Selects the data input for the ALU P port.

Bits 23-20 - Select for Q Operand Multiplexer (QMS): Selects the data input for the ALU Q port.

Bits 19-16 - Select for T Operand Multiplexer (TMS): Selects the data input for the ALU T port.

Bit 15 - Input Precision (IPR): Precision of the operands in Registers R and S; single precision when High, double precision when Low.

Bit 14 - Result Precision (RPR): Precision of the ALU output; single precision when High, double precision when Low.

Bits 13-12 - Sign P (SIP): Sign-change control for the ALU P input.

Bits 11-10 - Sign Q (SIQ): Sign-change control for the ALU Q input.

Bits 9-8 - Sign T (SIT): Sign-change control for the ALU T input.

Bits 7-6 - Sign F (SIF): Sign-change control for the ALU output.

Bit 5 - Integer/Floating-Point Select (IF): A logic Low selects a floating-point operation, a logic High an integer operation.

Bits 4-0 - Core Operation (CO): Specifies the core operation to be performed by the ALU.

The function of the instruction word fields is discussed in further detail in the Accelerator Instruction Set section.

The I-Temp register has a format identical to that of the instruction register; this register is used to temporarily buffer instructions for pending operations, thus allowing the overlap of operation specification and execution.

The Am29000 can write to the instruction and I-Temp registers by issuing the write instruction transaction request, and can read the contents of these registers as part of the save state sequence.

Operand Registers
The Am29027 holds operands in thirteen 64-bit registers. Four registers (R, S, R-Temp, and S-Temp) store ALU input operands; a fifth register, F, stores ALU results. Eight remaining registers, RF7-RF0, are arranged as a file into which operation results can be written, and from which operands can be taken for use in subsequent operations.

All operand registers share common data formats; any register can hold a single- or double-precision floating-point number, or a single- or double-precision integer. Floating-point numbers are stored with the sign bit in the most significant bit (bit 63) of the operand register. For single-precision numbers, the 32 LSBs of the register are unused; the value of these unused bits is a "don't care." Integer numbers are stored with the least significant bit placed in the least significant bit (bit 0) of the operand register. For single-precision numbers, the 32 MSBs of the register are unused; the value of these unused bits is a "don't care."

Floating-point and integer formats are described in further detail in Appendix A.

Accelerator Transaction Requests
The Am29000 controls the Am29027 with 13 transaction requests. Transaction request type is indicated by the state of four signals: R/W and OPT2-OPT0. Table 1 lists the transaction types and corresponding signal states.
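The request encoding can be sketched as a lookup keyed on R/W and OPT2-OPT0. The sketch assumes that the thirteen request types of Table 1 are numbered in ascending binary order in the sequence in which the table lists them, and it applies the recognition condition from the pin descriptions (DREQT1 and BINV High, DREQ Low). The names `REQUESTS` and `decode_request` are hypothetical, and returning `None` for unlisted combinations is a choice made for this example.

```python
# (R/W, OPT2-OPT0) -> request type, assuming Table 1 rows ascend in binary order.
REQUESTS = {
    (0, 0b000): "Write Operand R",
    (0, 0b001): "Write Operand S",
    (0, 0b010): "Write Operands R, S",
    (0, 0b011): "Write Mode",
    (0, 0b100): "Write Status",
    (0, 0b101): "Write RF Precisions",
    (0, 0b110): "Write Instruction",
    (0, 0b111): "Advance Temp Registers",
    (1, 0b000): "Read Results MSBs",
    (1, 0b001): "Read Results LSBs",
    (1, 0b010): "Read Flags",
    (1, 0b011): "Read Status",
    (1, 0b100): "Save State",
}

def decode_request(rw, opt, dreqt1, binv, dreq):
    """Return the request name, or None if no accelerator request is recognized."""
    # A request is considered only when DREQT1 and BINV are High and DREQ is Low.
    if not (dreqt1 == 1 and binv == 1 and dreq == 0):
        return None
    return REQUESTS.get((rw, opt & 0b111))

print(decode_request(rw=0, opt=0b010, dreqt1=1, binv=1, dreq=0))
print(decode_request(rw=1, opt=0b011, dreqt1=1, binv=0, dreq=0))
```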
Transaction requests are conditioned by signal DREQT1 (which when High indicates an accelerator transaction) and signals BINV and DREQ. The Am29027 will recognize a transaction request only if DREQT1 and BINV are High and DREQ is Low.

Signal DREQT0 modifies the execution of most transaction requests. For transaction requests that transfer operands or instructions to the Am29027, asserting DREQT0 will start the execution of an accelerator operation. For transaction requests that transfer operation results, status, or flags to the Am29000, asserting DREQT0 will suppress the reporting of unmasked exceptions via signal DERR. For the write status transaction request, asserting DREQT0 either retimes the operation currently described by the instruction register (flow-through mode) or invalidates the ALU pipeline (pipeline mode).

Write Transaction Requests
Write transactions transfer data from the Am29000 to the Am29027, or cause the Am29027 to transfer data internally. To perform a write request, the Am29000:
• Issues the appropriate transaction request on signals OPT2-OPT0, and asserts signal R/W Low
• Places the data to be transferred, if any, on output signals D31-D0 and A31-A0

The Am29027 responds to the request by asserting one (and only one) of two status signals:
• CDA indicates that the Am29027 will take the specified action and clock in the data accompanying the transaction request, if any, on the next rising edge of clock.
• DERR indicates that the Am29027 is unable to accept the data, due to the presence of an unmasked exception.

Timing for write transactions is illustrated in Appendix D.

Table 1. Transaction Requests

R/W   OPT2   OPT1   OPT0   Request Type
0     0      0      0      Write Operand R
0     0      0      1      Write Operand S
0     0      1      0      Write Operands R, S
0     0      1      1      Write Mode
0     1      0      0      Write Status
0     1      0      1      Write RF Precisions
0     1      1      0      Write Instruction
0     1      1      1      Advance Temp Registers
1     0      0      0      Read Results MSBs
1     0      0      1      Read Results LSBs
1     0      1      0      Read Flags
1     0      1      1      Read Status
1     1      0      0      Save State

There are eight write transactions:

Write Operand R: An operand is written to Input Register R and/or R-Temp. The most significant half of the 64-bit operand to be written is placed on Input Bus R, the least significant half on Input Bus S. The action taken depends on signal DREQT0 and on whether an accelerator operation will be in progress during the next clock cycle.

DREQT0     Operation in progress   Data         R-Temp      Operation
asserted   next clock cycle        written to   valid bit   pending bit
No         X                       R-Temp       Set         Unchanged
Yes        No                      R-Temp, R    Reset       Reset
Yes        Yes                     R-Temp       Set         Set

If DREQT0 is asserted and no accelerator operation will be in progress during the next clock cycle, a new operation will be started on the next rising edge of CLK. If mode register bit HE (Halt On Error Enable) is High and an unmasked exception has been detected, the Am29027 will respond to a write operand R request by asserting signal DERR; the contents of Registers R and R-Temp will not be changed, and the R-Temp Valid and Operation Pending bits will retain their current values.

Write Operand S: An operand is written to Input Register S and/or S-Temp. The most significant half of the 64-bit operand to be written is placed on Input Bus R, the least significant half on Input Bus S. The action taken depends on signal DREQT0 and on whether an accelerator operation will be in progress during the next clock cycle.
DREQT0     Operation in progress   Data          S-Temp      Operation
asserted   next clock cycle        written to    valid bit   pending bit
No         X                       S-Temp        Set         Unchanged
Yes        No                      S-Temp, S     Reset       Reset
Yes        Yes                     S-Temp        Set         Set

If DREQT0 is asserted and no accelerator operation will be in progress during the next clock cycle, a new operation will be started on the next rising edge of CLK.

If mode register bit HE (Halt On Error Enable) is High and an unmasked exception has been detected, the Am29027 will respond to a write operand S request by asserting signal DERR; the contents of Registers S and S-Temp will not be changed, and the S-Temp Valid and Operation Pending bits will retain their current values.

Write Operands R, S: Two 32-bit operands are written to Registers R and S and/or Registers R-Temp and S-Temp. The 32-bit operand to be written to Registers R or R-Temp is placed on Input Bus R; the 32-bit operand to be written to Registers S or S-Temp is placed on Input Bus S. Each 32-bit word is written to both the upper and lower halves of the target register. The action taken depends on signal DREQT0 and on whether an accelerator operation will be in progress during the next clock cycle.

DREQT0     Operation in progress   Data                    R-, S-Temp   Operation
asserted   next clock cycle        written to              valid bits   pending bit
No         X                       R-Temp, S-Temp          Set          Unchanged
Yes        No                      R-Temp, S-Temp, R, S    Reset        Reset
Yes        Yes                     R-Temp, S-Temp          Set          Set

If DREQT0 is asserted and no accelerator operation will be in progress during the next clock cycle, a new operation will be started on the next rising edge of CLK.

If mode register bit HE (Halt On Error Enable) is High and an unmasked exception has been detected, the Am29027 will respond to a write operands R, S request by asserting signal DERR; the contents of Registers R, R-Temp, S, and S-Temp will not be changed, and the R-Temp Valid, S-Temp Valid, and Operation Pending bits will retain their current values.

Write Mode: A 64-bit word is written to the mode register. The least significant half of the mode word is placed on Input Bus R, the most significant half on Input Bus S. The state of signal DREQT0 is a "don't care" for this transaction request.

Write Status: A 32-bit word is written to the status register; the status word to be written is placed on Input Bus R. Asserting signal DREQT0 will produce an additional action that is mode-dependent. In flow-through mode, asserting DREQT0 will cause the operation currently specified by the instruction register to be retimed; operation results will not be written to the status register or the register file. In pipeline mode, asserting DREQT0 will invalidate the ALU pipeline.

Write Register File Precisions: A 32-bit word indicating the precisions of register file locations RF7-RF0 is written to the precision register; the precision word to be written is placed on Input Bus R. The state of signal DREQT0 is a "don't care" for this transaction request.

Write Instruction: A 32-bit accelerator instruction is written to the instruction register and/or Register I-Temp. The 32-bit instruction is taken from input signals I31-I0. The action taken depends on signal DREQT0, and on whether an accelerator operation will be in progress during the next clock cycle.

DREQT0     Operation in progress   Data                            I-Temp      Operation
asserted   next clock cycle        written to                      valid bit   pending bit
No         X                       I-Temp                          Set         Unchanged
Yes        No                      I-Temp, instruction register    Reset       Reset
Yes        Yes                     I-Temp                          Set         Set

If DREQT0 is asserted and no accelerator operation will be in progress during the next clock cycle, a new operation will be started on the next rising edge of CLK.

If mode register bit HE (Halt On Error Enable) is High and an unmasked exception has been detected, the Am29027 will respond to a write instruction transaction request by asserting signal DERR; the contents of Register I-Temp and the instruction register will not be changed, and the I-Temp Valid and Operation Pending bits will retain their current values.
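The operand and instruction write requests above all follow the same three-row decision table. The sketch below models that table as a pure function; it is an illustration only, the function and argument names are hypothetical, and "temp" and "main" stand for the temporary register (R-Temp, S-Temp, or I-Temp) and its working counterpart (R, S, or the instruction register).

```python
def write_action(dreqt0, op_in_progress_next, halted, valid_before, pending_before):
    """Model one operand/instruction write request.

    halted is True when mode bit HE is High and an unmasked exception
    has been detected. Returns a tuple:
    (registers_written, temp_valid, op_pending, derr, starts_operation)
    """
    if halted:
        # DERR is asserted; registers and both bits keep their values.
        return ((), valid_before, pending_before, True, False)
    if not dreqt0:
        # Data is parked in the temp register only.
        return (("temp",), True, pending_before, False, False)
    if not op_in_progress_next:
        # Data goes to both registers and a new operation starts.
        return (("temp", "main"), False, False, False, True)
    # An operation is still running: the new one becomes pending.
    return (("temp",), True, True, False, False)
```

For example, `write_action(True, False, False, False, False)` describes the case in which a new operation starts on the next rising edge of CLK.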
Advance Temp Registers: The contents of the R-Temp, S-Temp, and I-Temp registers are transferred to Register R, Register S, and the instruction register, respectively. The state of signal DREQT0 is a "don't care" for this transaction request. The advance temp registers transaction request is used during restoration of accelerator state.

Read Transaction Requests

Read transactions transfer data from the Am29027 to the Am29000. When data is to be transferred, the Am29000:

• Issues the appropriate transaction request on signals OPT2-OPT0, and asserts signal R/W High.
• Places its data bus drivers in a high-impedance state.

The Am29027 then places the requested data on signals F31-F0 and issues two status signals:

• DRDY indicates that the data requested is available on Output Bus F31-F0.
• DERR indicates that the Am29027 has detected an unmasked exception; the exception may or may not be related to the data requested.

DRDY and DERR may both be active at the same time; if so, the Am29000 will respond to DERR and ignore DRDY. Timing for read transactions is illustrated in Appendix D.

There are five read transactions:

Read Result MSBs: The 32 MSBs of Register F are placed on Output Bus F. Asserting signal DREQT0 will suppress the reporting of unmasked exceptions.

Read Result LSBs: The 32 LSBs and 32 MSBs of Register F are placed on Output Bus F in consecutive clock cycles. Asserting signal DREQT0 will suppress the reporting of unmasked exceptions. The read result LSBs request must always be followed by a read result MSBs request.

Read Flags: The flag register contents are placed on Output Bus F; bits F31-F7 will be logic Low. Asserting signal DREQT0 will suppress the reporting of unmasked exceptions.

Read Status: The status register contents are placed on Output Bus F; bits F31-F11 will be logic Low. Asserting signal DREQT0 will suppress the reporting of unmasked exceptions.
Save State: The contents of the instruction register, mode register, status register, register file, precision register, and Registers R, R-Temp, S, S-Temp, and I-Temp are transferred to the Am29000 via Output Bus F. Exception reporting via signal DERR is suppressed; the state of signal DREQT0 is a "don't care." Further details on the use of this request appear in the Saving and Restoring State sections.

Coprocessor Data Accept

The Coprocessor Data Accept (CDA) signal indicates to the Am29000 that the Am29027 is able to accept new operands or instructions. CDA is normally Low (active), but will go High if:

• The Am29027 has an operation currently in progress and a completely specified pending operation waiting in the temporary registers, or
• The Am29027 has halted in response to an unmasked exception (Halt On Error mode enabled).

If the Am29000 issues any write transaction request and CDA is active Low, the transaction request will complete in a single cycle. If CDA is High, response to a write transaction request depends on request type:

• For the write operand R, write operand S, write operands R, S, and write instruction transaction requests, the Am29027 will assert CDA active when it is able to accept new data. If it is unable to accept new data indefinitely because of an unmasked exception (Halt On Error mode enabled), it will respond to the transaction request by asserting signal DERR.
• For the write mode, write status, write register file precisions, and advance temp registers transaction requests, the Am29027 will temporarily assert CDA during the cycle after the request is issued, regardless of whether an operation is in progress or an unmasked exception has halted the accelerator.

CDA pertains only to write transaction requests; for read transaction requests, the Am29000 ignores the state of CDA.

Data Ready

The Data Ready (DRDY) signal indicates to the Am29000 that the Am29027 is placing data on the F output bus.
The Am29027 generates DRDY in response to the read result MSBs, read result LSBs, read status, read flags, and save state transaction requests.

For the read result MSBs, read result LSBs, read flags, and read status transaction requests, there is usually a minimum of one cycle delay between the time the request is issued and the time that DRDY is asserted. The only exception to this rule is when a read result LSBs request is immediately followed by a read result MSBs request, in which case the Am29027 responds to the second request in a single cycle. If the Am29027 is unable to respond immediately to a read transaction request, as may be the case when an operation is in progress, the DRDY signal will be held inactive until such a time as the requested data can be output.

For the save state transaction request, the delay between the issuance of the transaction request and the DRDY response varies according to the specific data requested.

DRDY pertains only to read transaction requests; for write transaction requests, DRDY remains inactive.

Data Error

The Data Error (DERR) signal indicates to the Am29000 that the Am29027 is unable to respond to a transaction request normally, due to the presence of an unmasked exception bit in the status register.

For the read transaction requests read result LSBs, read result MSBs, read flags, and read status, the Am29027 asserts DERR if the status register contains an unmasked exception bit. The Am29000 may suppress error reporting for these requests by issuing them with signal DREQT0 asserted.

For the write transaction requests write operand R, write operand S, write operands R, S, and write instruction, DERR is issued in the presence of an unmasked exception if Halt On Error mode is enabled; in such an event, the contents of the target registers are left unchanged.
DERR is never issued in response to transaction requests write mode, write status, write register file precisions, advance temp registers, and save state.

Accelerator Instruction Set

The ALU performs 57 arithmetic and logic instructions. Input operands for these instructions can be taken from Input Registers R and S, register file locations RF7-RF0, and on-board constant stores. At the user's option, results can be stored in register file locations RF7-RF0.

Instruction Word

The 32-bit instruction word, IN31-IN0, specifies the operation to be performed by the ALU. The instruction word is stored in the instruction register; instruction register format is shown in Figure 7. In flow-through mode, the instruction word specifies the operation to be performed by the entire ALU. In pipeline mode, the instruction word specifies the operation to be performed by the first pipeline stage; the remaining pipeline stage or stages are controlled by their respective pipeline registers. The instruction word also specifies input operand sources, result destination, and operand precisions.

An instruction word comprises five sections: base operation code, sign-change selects, operand precision selects, operand source selects, and register file controls.

Base Operation Code

The base operation code consists of the core operation field (CO), which specifies the type of operation to be performed, and the integer/floating-point select bit (IF), which specifies whether the operation is integer or floating-point. Available base operation codes and the corresponding values for CO and IF are listed in Table 2. Note that the value of IF is a "don't care" for base operation code MOVE P.

Sign-Change Selects

Each ALU input and output port has associated hardware that can be used to modify operand signs (see Figure 8). These sign-change blocks, when applied to base operations, greatly increase the number of available operations.
The base operation code F' = P' + T', for example, can be used to perform operations such as P - T, ABS(P) + ABS(T), ABS(P + T), and others, simply by modifying the signs of the input and output operands.

The SIP, SIQ, and SIT instruction word fields control the sign-change blocks for the P, Q, and T input operands, respectively; the SIQ and SIF fields control the sign-change block for output operand F. Using the sign-change blocks, the sign of an input operand may be left unchanged, inverted, set Low, or set High; the sign of the output operand may be left unchanged, inverted, set Low, set High, set to the sign of the P input operand, or set to the sign of the T input operand. Select codes for the P, Q, T, and F sign-change blocks are shown in Tables 3, 4, 5, and 6, respectively.

Operand Precision Selects

The Am29027 supports mixed-precision operations; it is possible, for example, to perform an operation having single-precision inputs and a double-precision output, or one single- and one double-precision input, or any other combination.

The precision of the operands in Registers R and S is specified by instruction bit IPR, which is logic High for single-precision operands and logic Low for double-precision operands. Note that the operands in the R and S registers must have the same precision if they are to be used in the same operation. This restriction does not preclude performing an operation with mixed-precision input operands, as there are no restrictions on the precisions of operands stored in the register file.

The precision of each operand stored in the register file is recorded in the precision register; this precision information is automatically supplied to the ALU when a register file location is specified as an input operand to an operation.

The precision of an operation result is specified by instruction bit RPR, which is set High for a single-precision result, and Low for a double-precision result.
Should the instruction word specify that the result is to be written to the register file (instruction bit RF High), the resulting precision will be written to the appropriate precision register bit when the result is written to the register file.

Table 2. Operation Codes (fields IF and CO, instruction bits IN5-IN0)

IF   CO      Base Operation Code (Floating-Point)
0    00000   F' = P
0    00001   F' = P' + T'
0    00010   F' = P' x Q'
0    00011   Compare P, T
0    00100   Max P, T
0    00101   Min P, T
0    00110   Convert T to Integer
0    00111   Scale T to Integer by Q
0    01000   F' = (P' x Q') + T'
0    01001   Round T to Integral Value
0    01010   Reciprocal Seed of P
0    01011   Convert T to Alternate F.P. Format
0    01100   Convert T from Alternate F.P. Format

IF   CO      Base Operation Code (Integer)
1    00000   F = P
1    00001   F = P + T
1    00010   F = P x Q
1    00011   Compare P, T
1    00100   Max P, T
1    00101   Min P, T
1    00110   Convert T to Floating-Point
1    00111   Scale T to Floating-Point by Q
1    10000   F = P OR T
1    10001   F = P AND T
1    10010   F = P XOR T
1    10011   Shift P Logical Q Places
1    10100   Shift P Arithmetic Q Places
1    10101   Funnel Shift PT Logical Q Places

IF   CO      Base Operation Code (Special)
X    11000   MOVE P

Figure 8. ALU Sign-Change Blocks

Table 3. Select Codes for P Operand Sign-Change Block

SIP (IN9 IN8)   SIGN(P')
0 0             SIGN(P)
0 1             inverted SIGN(P)
1 0             0
1 1             1

Table 4. Select Codes for Q Operand Sign-Change Block

SIQ (IN11 IN10)   SIGN(Q')
0 0               SIGN(Q)
0 1               inverted SIGN(Q)
1 0               0
1 1               1

Table 5. Select Codes for T Operand Sign-Change Block

SIT (IN13 IN12)   SIGN(T')
0 0               SIGN(T)
0 1               inverted SIGN(T)
1 0               0
1 1               1

Table 6. Select Codes for F Operand Sign-Change Block

Base Operation                    SIQ (IN11 IN10)   SIF (IN7 IN6)   SIGN(F)
F' = P (Floating-Point),          0 X               0 0             SIGN(F')
F = P (Integer),                  0 X               0 1             inverted SIGN(F')
Maximum P, T, or                  0 X               1 0             0
Minimum P, T                      0 X               1 1             1
                                  1 0               X X             SIGN(P)
                                  1 1               X X             SIGN(T)
All other base operations         X X               0 0             SIGN(F')
                                  X X               0 1             inverted SIGN(F')
                                  X X               1 0             0
                                  X X               1 1             1

Operand Source Selects

Instruction fields PMS, QMS, and TMS specify the select codes for the P, Q, and T operand multiplexers, respectively; these codes are summarized in Table 7. The P, Q, and T operand multiplexers can independently select Register R, Register S, register file locations RF7-RF0, or one of six predefined constants.

For operations with floating-point inputs, constants 0, 0.5, 1, 2, 3, and pi are available; for operations with integer inputs, constants 0, -1, 1, 2, 3, and -(2^63) are available. These constants are supplied to the ALU as double-precision numbers, independent of the precisions specified for other input and result operands. Hexadecimal values for the constants are listed in Table 8.

Table 7. Select Codes for P, Q, and T Operand Multiplexers

PMS: IN27 IN26 IN25 IN24
QMS: IN23 IN22 IN21 IN20
TMS: IN19 IN18 IN17 IN16

Select code   P, Q, or T operand
0 0 0 0       Register R
0 0 0 1       Register S
0 0 1 0       0 (Zero)
0 0 1 1       0.5 (F.P.), -1 (integer)
0 1 0 0       1
0 1 0 1       2
0 1 1 0       3
0 1 1 1       pi (F.P.), -2^63 (integer)
1 0 0 0       RF0
1 0 0 1       RF1
1 0 1 0       RF2
1 0 1 1       RF3
1 1 0 0       RF4
1 1 0 1       RF5
1 1 1 0       RF6
1 1 1 1       RF7

Table 8. Hexadecimal Values for On-Chip Constants

IEEE Floating-Point Constant    Hexadecimal Representation
0                               0000000000000000
0.5                             3FE0000000000000
1                               3FF0000000000000
2                               4000000000000000
3                               4008000000000000
pi                              400921FB54442D18

DEC D Floating-Point Constant   Hexadecimal Representation
0                               0000000000000000
0.5                             4000000000000000
1                               4080000000000000
2                               4100000000000000
3                               4140000000000000
pi                              41490FDAA22168C2

DEC G Floating-Point Constant   Hexadecimal Representation
0                               0000000000000000
0.5                             4000000000000000
1                               4010000000000000
2                               4020000000000000
3                               4028000000000000
pi                              402921FB54442D18

IBM Floating-Point Constant     Hexadecimal Representation
0                               0000000000000000
0.5                             4080000000000000
1                               4110000000000000
2                               4120000000000000
3                               4130000000000000
pi                              413243F6A8885A31

Integer Constant                Hexadecimal Representation
0                               0000000000000000
-1                              FFFFFFFFFFFFFFFF
1                               0000000000000001
2                               0000000000000002
3                               0000000000000003
-2^63                           8000000000000000

Register File Controls

Instruction fields RF and RFS control the storing of operation results in the register file. If register file enable bit RF is High, the result of the operation specified by the instruction word will be stored in register file location RFS, where RFS is a number from 7 to 0; the precision of the result, as specified by the RPR bit, will be written to the appropriate bit in the precision register. If RF is Low, the operation result is written to neither the register file nor the precision register.

Accelerator Operations

Table 9 illustrates a number of possible ALU instructions and corresponding values for instruction word fields SIP, SIQ, SIT, SIF, IF, and CO. Note that the remaining instruction fields (RF, RFS, PMS, QMS, TMS, IPR, and RPR) can be specified independently.

The user may create additional instructions using instruction words other than those listed in Table 9. For some base operation codes, sign-change control settings SIP, SIQ, SIT, and SIF are completely arbitrary; for others, only the sign-change field values shown in Table 9 are valid. Table 10 summarizes permissible sign-change field values for each base operation code.

Table 9. Instruction Words for Typical ALU Operations

Operation                                           SIP  SIQ  SIT  SIF  IF  CO
FP P                                                00   00   XX   00   0   00000
FP -P                                               00   00   XX   01   0   00000
FP ABS(P)                                           00   00   XX   10   0   00000
FP Sign(T) x ABS(P)                                 00   11   XX   XX   0   00000
FP P + T                                            00   XX   00   00   0   00001
FP P - T                                            00   XX   01   00   0   00001
FP T - P                                            01   XX   00   00   0   00001
FP -P - T                                           01   XX   01   00   0   00001
FP ABS(P + T)                                       00   XX   00   10   0   00001
FP ABS(P - T)                                       00   XX   01   10   0   00001
FP ABS(P) + ABS(T)                                  10   XX   10   00   0   00001
FP ABS(P) - ABS(T)                                  10   XX   11   00   0   00001
FP ABS[ABS(P) - ABS(T)]                             10   XX   11   10   0   00001
FP P x Q                                            00   00   XX   00   0   00010
FP (-P) x Q                                         01   00   XX   00   0   00010
FP ABS(P x Q)                                       00   00   XX   10   0   00010
FP Compare P, T                                     00   XX   01   00   0   00011
FP Max P, T                                         00   00   01   00   0   00100
FP Max ABS(P), ABS(T)                               10   00   11   00   0   00100
FP Min P, T                                         01   00   00   00   0   00101
FP Min ABS(P), ABS(T)                               11   00   10   00   0   00101
FP Limit P to Magnitude T                           11   10   10   XX   0   00101
FP Convert T to Integer                             XX   XX   00   00   0   00110
FP Scale T to Integer by Q                          XX   00   00   00   0   00111
FP T + P x Q                                        00   00   00   00   0   01000
FP T - P x Q                                        01   00   00   00   0   01000
FP -T + P x Q                                       00   00   01   00   0   01000
FP -T - P x Q                                       01   00   01   00   0   01000
FP ABS(T) + ABS(P x Q)                              10   10   10   00   0   01000
FP ABS(T) - ABS(P x Q)                              11   10   10   00   0   01000
FP ABS(P x Q) - ABS(T)                              10   10   11   00   0   01000
FP Round T to Integral Value                        XX   XX   00   00   0   01001
FP Reciprocal Seed (P)                              00   XX   XX   00   0   01010
FP Convert T to Alternate Floating-Point Format     XX   XX   00   00   0   01011
FP Convert T from Alternate Floating-Point Format   XX   XX   00   00   0   01100
int P                                               00   00   00   00   1   00000
int -P                                              00   00   00   01   1   00000
int ABS(P)                                          00   00   00   10   1   00000
int Sign(T) x ABS(P)                                00   11   00   XX   1   00000
int P + T                                           00   XX   00   00   1   00001
int P - T                                           00   XX   01   00   1   00001
int T - P                                           01   XX   00   00   1   00001
int ABS(P + T)                                      00   XX   00   10   1   00001
int ABS(P - T)                                      00   XX   01   10   1   00001
int P x Q                                           00   00   XX   00   1   00010
int Compare P, T                                    00   XX   01   00   1   00011
int Max P, T                                        00   00   01   00   1   00100
int Min P, T                                        01   00   00   00   1   00101
int Convert T to Float                              XX   XX   00   00   1   00110
int Scale T to Float by Q                           XX   00   00   00   1   00111
int P OR T                                          XX   XX   XX   XX   1   10000
int P AND T                                         XX   XX   XX   XX   1   10001
int P XOR T                                         XX   XX   XX   XX   1   10010
int NOT T (see Note 1)                              XX   XX   XX   XX   1   10010
int Shift P Logical Q Places                        00   00   XX   00   1   10011
int Shift P Arithmetic Q Places                     00   00   XX   00   1   10100
int Funnel Shift PT Q Places                        00   00   00   00   1   10101
MOVE P                                              XX   XX   XX   XX   X   11000

Note 1. NOT T is performed by XORing T with a word containing all 1s (integer -1). When invoking NOT T the user must set instruction field PMS to 0011 (binary), thus selecting integer constant -1.

Table 10. Allowable Sign-Change Combinations

IF  CO      Operation                      SIP  SIQ  SIT  SIF
0   00000   FP F' = P                      F    V    X    V
0   00001   FP F' = P' + T'                V    X    V    V
0   00010   FP F' = P' x Q'                V    V    X    V
0   00011   FP Compare P, T                F    X    F    F
0   00100   FP Max P, T                    F    F    F    F
0   00101   FP Min P, T                    F    F    F    F
0   00110   FP Convert T to Integer        X    X    F    F
0   00111   FP Scale T to Integer          X    F    F    F
0   01000   FP F' = (P' x Q') + T'         V    V    V    V
0   01001   FP Round T                     X    X    F    F
0   01010   FP Reciprocal Seed P           F    X    X    F
0   01011   FP Convert T to Alt Format     X    X    F    F
0   01100   FP Convert T from Alt Format   X    X    F    F
1   00000   int F = P                      F    F    F    F
1   00001   int F = P + T                  F    X    F    F
1   00010   int F = P x Q                  F    F    X    F
1   00011   int Compare P, T               F    X    F    F
1   00100   int Max P, T                   F    F    F    F
1   00101   int Min P, T                   F    F    F    F
1   00110   int Convert T to F.P.          X    X    F    F
1   00111   int Scale T to F.P.            X    F    F    F
1   10000   int F = P OR T                 X    X    X    X
1   10001   int F = P AND T                X    X    X    X
1   10010   int F = P XOR T                X    X    X    X
1   10011   int Shift P Logical            F    F    X    F
1   10100   int Shift P Arithmetic         F    F    X    F
1   10101   int Funnel Shift PT            F    F    F    F
X   11000   MOVE P                         X    X    X    X

Key:
V = Variable; user can specify arbitrary sign change.
F = Fixed; user is restricted to sign-change combinations shown in Table 9.
X = Don't care; this field does not affect the operation or its result.

Base Operation Code Description

F' = P (Floating-Point): The P-operand is passed through the ALU unchanged, except for any specified precision conversions.
If the user specifies different input and output precisions, the operation may be used to perform single-to-double or double-to-single conversions. Instructions such as negation, absolute value extraction, and sign transfer may be executed by setting the sign-change controls appropriately while executing this base operation.

F' = P' + T' (Floating-Point): The two operands P' and T' are added, taking into account any specified precision conversions. Instructions such as subtraction, sum-of-absolute-values, difference-of-absolute-values, absolute-value-of-sum, and absolute-value-of-difference may be executed by setting the sign-change controls appropriately while executing this base operation.

F' = P' x Q' (Floating-Point): The operands P' and Q' are multiplied, taking into account any specified precision conversions. Instructions such as negative-product and absolute-value-of-product may be executed by setting the sign-change controls appropriately while executing this base operation.

Compare P, T (Floating-Point): The two operands P and T are compared, taking into account any specified precision conversions. The output of the operation is the result of the subtraction (P - T). The flags are set appropriately to indicate the result of the comparison, conforming to the relevant parts of the floating-point standards. For IEEE and DEC operations, one of four flags (greater than, less than, equal to, or unordered) is set for any given compare operation. For IBM operations, the unordered flag does not apply since the format does not support reserved operands.

Maximum P, T (Floating-Point): The two operands P and T are compared, taking into account any specified precision conversions. The most positive operand is selected as the output. The Winner flag indicates which of the operands is selected. Additionally, the operation maximum-of-absolute-value may be performed by setting the appropriate sign-change controls.
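The flag behavior of the floating-point compare operation described above can be sketched at the value level. This is an illustration only: the function name and flag strings are hypothetical, and an IEEE NaN is used here as a stand-in for the reserved operands that make a comparison unordered.

```python
import math


def fp_compare(p, t):
    """Return (flag, output) for Compare P, T; the output is P - T."""
    output = p - t
    if math.isnan(p) or math.isnan(t):
        flag = "unordered"      # assumption: NaN models a reserved operand
    elif p > t:
        flag = "greater than"
    elif p < t:
        flag = "less than"
    else:
        flag = "equal to"
    return flag, output
```

Exactly one flag is returned per call, matching the statement that one of four flags is set for any given compare operation.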
Minimum P, T (Floating-Point): The two operands P and T are compared, taking into account any specified precision conversions. The most negative operand is selected as the output. The Winner flag indicates which of the two operands is selected. Additionally, the operations minimum-of-absolute-values and limit-P-to-magnitude-T may be performed by setting the appropriate sign-change controls. The limit-P-to-magnitude-T operation is useful for clipping a sequence of operands to ensure that their magnitude never exceeds a preset limit.

Convert T to Integer (Floating-Point): The operand T is converted from floating-point representation to two's complement integer representation, taking into account the specified precision of the floating-point operand. If the output precision is specified as single, the result is a 32-bit integer. If the output precision is specified as double, the result is a 64-bit integer.

Scale T to Integer by Q (Floating-Point): The operand T is converted from floating-point representation to two's complement integer representation, using the exponent of the floating-point operand Q as a scale factor and taking into account the specified precision of the floating-point operands. The unbiased exponent of the operand Q is added to the exponent of the operand T, permitting IEEE and DEC operands to be multiplied by any power of 2, and IBM operands by any power of 16, before the conversion is performed. If the output precision is specified as single, the result is a 32-bit integer. If the output precision is specified as double, the result is a 64-bit integer.

F' = (P' x Q') + T' (Floating-Point): The operands P' and Q' are multiplied, producing a double-precision product. This product is added to the operand T', taking into account any specified precision conversions. Instructions such as P x Q - T, T - P x Q, ABS(P x Q) + ABS(T), and ABS(P x Q + T) may be executed by setting the sign-change controls appropriately while executing this base operation.
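The way sign-change controls turn one base operation into several instructions can be sketched at the value level. The model below is an illustration, not a description of the hardware: it ignores precision, rounding, and exceptions, the function names are hypothetical, and the two-bit codes (00 unchanged, 01 inverted, 10 sign set Low, 11 sign set High) follow the field values listed in Table 9 for F' = P' + T'.

```python
import math


def sign_change(code, x):
    """Apply a two-bit sign-change select code to a value."""
    if code == 0b00:
        return x                        # sign unchanged
    if code == 0b01:
        return -x                       # sign inverted
    if code == 0b10:
        return math.copysign(x, 1.0)    # sign set Low (positive)
    if code == 0b11:
        return math.copysign(x, -1.0)   # sign set High (negative)
    raise ValueError("select code must be a two-bit value")


def add_op(sip, sit, sif, p, t):
    """Base operation F' = P' + T' followed by the output sign change."""
    return sign_change(sif, sign_change(sip, p) + sign_change(sit, t))


def mac_op(sip, siq, sit, sif, p, q, t):
    """Base operation F' = (P' x Q') + T' followed by the output sign change."""
    product = sign_change(sip, p) * sign_change(siq, q)
    return sign_change(sif, product + sign_change(sit, t))
```

With these definitions, `add_op(0b00, 0b01, 0b00, p, t)` computes P - T and `add_op(0b10, 0b10, 0b00, p, t)` computes ABS(P) + ABS(T), while `mac_op(0b01, 0b00, 0b00, 0b00, p, q, t)` computes T - P x Q.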
Round T to Integral Value (Floating-Point): The floating-point operand T is rounded to an integer-valued floating-point operand, using the specified rounding mode and taking into account any specified precision conversions. As an example, the operation converts a floating-point representation of pi (3.14159...) to a floating-point representation of 3.0 or 4.0, depending on the rounding mode selected. The final result of the operation is a floating-point number.

Reciprocal Seed of P (Floating-Point): An approximation to the reciprocal of the operand P is evaluated, taking into account any specified precision conversions. The reciprocal seed comprises an accurate sign, a fully accurate exponent, and a mantissa that is accurate to only one place. This operation can be used as the initial step in performing Newton-Raphson division; optionally, an external seed look-up table can be used for faster convergence.

Convert T to Alternate Floating-Point Format (Floating-Point): The floating-point operand T, assumed to be in the primary floating-point format, is converted to a floating-point operand in the alternate floating-point format, taking into account any specified precision conversions.

Convert T from Alternate Floating-Point Format (Floating-Point): The floating-point operand T, assumed to be in the alternate floating-point format, is converted to a floating-point operand in the primary floating-point format, taking into account any specified precision conversions.

F = P (Integer): The P-operand is passed through the ALU unchanged except for any specified precision conversions. If the user specifies different input and output precisions, the operation may be used to perform single-to-double or double-to-single conversions. Instructions such as negation, absolute value extraction, and sign transfer may be performed by setting the sign-change controls appropriately while executing this base operation.
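The Newton-Raphson division mentioned under Reciprocal Seed of P can be sketched as follows. This is an illustration under stated assumptions: `reciprocal_seed` is a hypothetical stand-in for the hardware seed (correct sign and exponent, a fixed coarse mantissa), the divisor is assumed nonzero and finite, and six iterations are chosen because the relative error of this particular seed is at most one half and is squared on each step.

```python
import math


def reciprocal_seed(b):
    """Hypothetical coarse seed: exact sign and exponent, fixed mantissa."""
    _, e = math.frexp(b)                   # b = m * 2**e with 0.5 <= |m| < 1
    return math.copysign(math.ldexp(1.5, -e), b)


def reciprocal(b, iterations=6):
    """Refine the seed with x <- x * (2 - b * x); the error squares each step."""
    x = reciprocal_seed(b)
    for _ in range(iterations):
        x = x * (2.0 - b * x)
    return x


def divide(a, b, iterations=6):
    """Division expressed as multiplication by the refined reciprocal."""
    return a * reciprocal(b, iterations)
```

Each refinement step uses only a multiply and a multiply-add style update, so it maps naturally onto base operations F' = P' x Q' and F' = (P' x Q') + T'; a more accurate seed, such as one from an external look-up table, would reduce the number of iterations required.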
F = P + T (Integer): The two operands P and T are added, taking into account any specified precision conversions. Instructions such as subtraction, absolute-value-of-sum, and absolute-value-of-difference may be performed by setting the sign-change controls appropriately while executing this base operation.

F = P x Q (Integer): The two operands P and Q are multiplied, taking into account any specified precision conversions. Either 32-bit multiplication or 64-bit multiplication may be performed, and the user may select either the MSBs or the LSBs of the product as the final result. In addition, format-adjusting may be implemented if required, and the operands may be considered as signed (two's complement) or unsigned.

Compare P, T (Integer): The two operands P and T are compared, taking into account any specified precision conversions. The output of the operation is the result of the subtraction (P - T). The flags are set appropriately to indicate the result of the comparison, one of three flags (greater than, less than, or equal to) being set for any given compare operation.

Maximum P, T (Integer): The two operands P and T are compared, taking into account any specified precision conversions. The most positive operand is selected as the output. The Winner flag indicates which of the two operands is selected.

Minimum P, T (Integer): The two operands P and T are compared, taking into account any specified precision conversions. The most negative operand is selected as the output. The Winner flag indicates which of the two operands is selected.

Convert T to Floating-Point (Integer): The operand T is converted from two's complement integer representation to floating-point representation, taking into account the specified precision of the integer operand. If the output precision is specified as single, the result is a 32-bit floating-point operand. If the output precision is specified as double, the result is a 64-bit floating-point operand.
Scale T to Floating-Point by Q (Integer): The operand T is converted from two's complement integer representation to floating-point representation, using the exponent of the floating-point operand Q as a scale factor and taking into account the specified precision of the integer operand. The unbiased exponent of the operand Q is added to the exponent of the floating-point result, permitting IEEE and DEC operands to be multiplied by any power of 2, and IBM operands by any power of 16, after the conversion is performed. If the output precision is specified as single, the result is a 32-bit floating-point operand. If the output precision is specified as double, the result is a 64-bit floating-point operand.

F = P OR T (Integer): The operand P is logically ORed with the operand T. Before the operation is performed, the inputs, if 32-bit, are sign-extended to 64 bits.

F = P AND T (Integer): The operand P is logically ANDed with the operand T. Before the operation is performed, the inputs, if 32-bit, are sign-extended to 64 bits.

F = P XOR T (Integer): The operand P is logically exclusive-ORed with the operand T. Before the operation is performed, the inputs, if 32-bit, are sign-extended to 64 bits. This operation may be used to invert an operand by selecting the second operand to be the integer constant -1, so that all bits of this second operand are 1. Exclusive-ORing an operand with -1 is equivalent to inverting each bit in the operand.

Shift P Logical Q Places (Integer): This operation cannot be performed in mixed-precision mode. The precision of the result is the same as the precision of the input operand P. A two's-complement shift length in the range -64 to +63 (double-precision) or -32 to +31 (single-precision) is extracted from the LSBs of the operand Q. The operand P is logically right-shifted by the number of places specified by the shift length. A negative shift length therefore produces a left-shift. If a right-shift is performed, 0s fill vacated bit positions to the left of the input operand. If a left-shift is performed, 0s fill vacated bit positions to the right of the input operand.

Shift P Arithmetic Q Places (Integer): This operation cannot be performed in mixed-precision mode. The precision of the result is the same as the precision of the input operand P. A two's-complement shift length in the range -64 to +63 (double-precision) or -32 to +31 (single-precision) is extracted from the LSBs of the operand Q. The operand P is arithmetically right-shifted by the number of places specified by the shift length. A negative shift length therefore produces a left-shift. If a right-shift is performed, the MSB (bit 63 or 31) is replicated to fill vacated bit positions to the left of the input operand. If a left-shift is performed, 0s fill vacated bit positions to the right of the input operand.

Funnel Shift PT Q Places (Integer): This operation cannot be performed in mixed-precision mode. The operand T is interpreted as having the same precision as the input operand P, and the precision of the result is also the same as the precision of the input operand P. A two's-complement shift length in the range -64 to +63 (double-precision) or -32 to +31 (single-precision) is extracted from the LSBs of the operand Q. A triple-width operand (96-bit or 192-bit) is formed by concatenating the input operands into the arrangement P-T-P, with the 32-bit or 64-bit result field initially aligned with the T operand. The triple-width operand is logically right-shifted by the number of places specified by the shift length. A negative shift length therefore produces a left-shift.

Move P (Floating-Point or Integer): The 64-bit operand P is passed unchanged through the ALU. No exceptions are detected or signaled.

Primary and Alternate Floating-Point Formats

Two mode register fields, PFF and AFF, specify the primary and alternate floating-point formats used by the ALU.
All floating-point operations except format conversions are performed in the format specified by PFF. For format conversion operations, either primary floating-point format PFF or alternate floating-point format AFF is used as follows:

• For conversions between floating-point and integer formats (base operation codes Convert T to integer, Convert T to floating-point, Scale T to integer by Q, Scale T to floating-point by Q), the floating-point source or destination format is specified by PFF; for the scale operations, the format of operand Q is also specified by PFF.

• When converting from the primary floating-point format to the alternate floating-point format (base operation code Convert T to alternate F.P. format), an operand in format PFF is converted to format AFF.

• When converting from the alternate floating-point format to the primary floating-point format (base operation code Convert T to primary F.P. format), an operand in format AFF is converted to format PFF.

Operation Precision

The ALU performs all operations in double-precision format. All single-precision input operands are converted to double-precision equivalents by the ALU at the start of an operation. If the operation is to report a single-precision result, the ALU converts the double-precision internal result to single-precision at the end of the operation.

Note that operation flags and exception bits pertain to the source and destination precisions. If, for example, an operation produces a single-precision overflowed result, an overflow is indicated regardless of whether that result overflows the double-precision internal format.

Operation Flags

For each operation, the ALU produces thirteen flags. Of these, a maximum of seven are relevant to any given operation. The relevant flags are placed in the flag register in the manner shown in Table 11. All flags are active High. In flow-through mode the flag register is made transparent, and the selected flags are presented directly to the output multiplexer. The ALU flags are:

C-CARRY: Carry-out bit produced by integer addition, subtraction, or comparison.

I-INVALID OPERATION: Indicates that the input operands are unsuitable for the operation performed (e.g., ∞ x 0).

R-RESERVED OPERAND: Indicates that the operation result is a reserved operand. Reserved operands include signaling or quiet NaNs in IEEE format, and DEC reserved operands in DEC D or G formats.

S-SIGN: Result sign; Low for a non-negative result, High for a negative result.

U-UNDERFLOW: Indicates that the operation result underflowed the destination format.

V-OVERFLOW: Indicates that the operation result overflowed the destination format.

W-WINNER: Indicates which of two input operands is reported as the result of the MAX P, T and MIN P, T operations. A logic High indicates that operand T is reported as the result, a logic Low operand P.

X-INEXACT RESULT: Indicates that the operation result had to be rounded to fit the destination format.

Z-ZERO RESULT: Indicates that the operation produced a zero result. Note that the result is exactly zero only if the Z flag is High and the X flag is Low.

>, =, <, #-GREATER THAN, EQUAL TO, LESS THAN, UNORDERED: Used to report the result of an operation with the Compare P, T base operation code. The Greater Than flag indicates that P > T, the Equal To flag that P = T, and the Less Than flag that P < T. The Unordered flag indicates that one or both input operands are reserved operands and cannot be compared. Note that the Unordered flag cannot arise when comparing IBM floating-point operands or integers. Exactly one comparison flag will be active per comparison operation.

Table 11.
Organization of Flags

[Flag register contents (Flag 6 through Flag 0) for each operation are not recoverable from this copy of the table; formats, operations, and base operation codes are listed.]

Formats IEEE, DEC D, DEC G, and IBM (the same operations and codes are listed for each format):

  Operation                          I4-I0
  F' = P'                            00000
  F' = P' + T'                       00001
  F' = P' x Q'                       00010
  Compare P, T                       00011
  Maximum P, T                       00100
  Minimum P, T                       00101
  Convert T to Integer               00110
  Scale T to Integer                 00111
  F' = (P' x Q') + T'                01000
  Round T to Integral Value          01001
  Reciprocal Seed of P               01010
  Convert T to Alt F.P. Format       01011
  Convert T from Alt F.P. Format     01100

Table 11. Organization of Flags (continued)

Format Integer:

  Operation                          I4-I0
  F = P                              00000
  F = P + T                          00001
  F = P x Q                          00010
  Compare P, T                       00011
  Maximum P, T                       00100
  Minimum P, T                       00101
  Convert T to Floating-Point        00110
  Scale T to Floating-Point          00111
  F = P OR T                         10000
  F = P AND T                        10001
  F = P XOR T                        10010
  Logical Shift P by Q Places        10011
  Arithmetic Shift P by Q Places     10100
  Funnel Shift P T by Q Places       10101
  MOVE P                             11000

Note: Unused flags assume the Low state.

Updating the Status Register

The status register exception bits are updated at the conclusion of each operation in flow-through mode, and at the start of each operation in pipeline mode. An exception bit is updated only if the operation reports that exception with a flag. For example, an IEEE floating-point addition operation produces an overflow flag and would therefore update the overflow exception bit; an IEEE floating-point comparison operation, on the other hand, does not produce an overflow flag and would therefore leave the overflow exception bit unchanged.

The mode register exception mask bits do not affect the updating of the status register exception bits; masked exceptions still appear in the status register. However, a masked exception will not set the exception status bit (ES).
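The update rule above can be sketched as a small Python model. It is an illustration under stated assumptions, not a description of the hardware: the dictionary representation and the name `update_status` are hypothetical, "updated" is taken to mean that the exception bit takes the value of the reported flag, and the model only ever sets ES (how ES is cleared is not covered by this passage). A mask value of True means the exception is masked.

```python
# Flags that have a corresponding status register exception bit.
EXCEPTIONS = ("I", "R", "V", "U", "X", "Z")

def update_status(status: dict, reported_flags: dict, masks: dict) -> dict:
    """Return a new status after an operation that reports `reported_flags`.

    Only exceptions the operation actually reports are updated; others keep
    their old value. ES is set only by a reported exception that is unmasked.
    """
    new_status = dict(status)
    for name, value in reported_flags.items():
        if name in EXCEPTIONS:
            new_status[name] = bool(value)   # assumption: bit follows the flag
    if any(reported_flags.get(n) and not masks.get(n, False) for n in EXCEPTIONS):
        new_status["ES"] = True              # masked exceptions never set ES
    return new_status
```

For example, an addition that reports an overflow with the overflow mask active leaves ES Low but still records V, while a comparison (which reports no V flag) leaves V untouched.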
Operation Sequencing

The Am29027 can be configured for either pipelined or flow-through (unpipelined) operation. Flow-through mode is normally selected for performing scalar operations; pipeline mode provides high throughput for vector operations. The manner in which operations are sequenced depends on the mode currently invoked.

Operation In Flow-Through Mode

Flow-through mode is invoked by setting mode register bit PL (Pipeline Mode Select) to logic Low.

Programmer's Model

A programmer's model of the Am29027 in flow-through mode is shown in Figure 9. Note that Output Register F and the flag register are made transparent in this mode.

[Figure 15. Programmer's Model for Flow-Through Mode]

Performing Operations

Flow-through mode operations are performed by:

• Storing instructions and/or operands in the Am29027 and starting the operation
• Loading the result

Storing instructions and operands can be done in any of three ways:

• Writing the instruction only, and starting the operation: This is appropriate when all necessary operands are already present in the Am29027, as is sometimes the case when using on-board constants or the results of previous operations stored in the register file.

• Writing the operands only, and starting the operation: This is appropriate when the desired instruction is already present in the Am29027, as is the case when performing the second of two identical operations.

• Writing the instruction and operands, and starting the operation: This is appropriate whenever the next operation requires both a new instruction and new operands.

Operands and instructions are written using the write operand R, write operand S, write operands R, S, and write instruction transaction requests. Operands and instructions can be written to the Am29027 in any order, with the operation start bit (DREQT0 High) accompanying the last of the transaction requests.

Loading an operation result is performed using the read result MSBs, read result LSBs, and read flags transaction requests. The specific request used depends on whether the result of an operation is a flag or flags (as is the case with comparison operations) or data (as is the case with most other operations). In cases where the operation result is stored in the register file, the user may elect not to read the result but to proceed with the next operation.

Operation Timing

The Am29027 will usually start a flow-through operation during the first cycle following the receipt of a write operand R, write operand S, write operands R, S, or write instruction transaction request having signal DREQT0 set High. Operation execution begins with the transfer of the contents of the R-Temp, S-Temp, and I-Temp registers to Register R, Register S, and the instruction register, respectively; only those temporary registers written to as part of the operation specification will be transferred. The operand or instruction accompanying the transaction request that starts the operation (that is, the transaction request for which signal DREQT0 is High) is written directly to the appropriate working register, that is, Register R, Register S, or the instruction register.

Once started, an operation will proceed for the number of cycles specified by mode register fields MATC, MVTC, and PLTC; MATC specifies the number of cycles for base operation code (P x Q) + T, MVTC the number of cycles for base operation code MOVE P, and PLTC the number of cycles for all other base operation codes. At the end of the last operation cycle, the status register exception bits and exception status bit will be updated and, optionally, the operation result will be written to the register file and precision register.

There are two conditions for which the Am29027 will not start an operation immediately. The first condition is when an operation is already in progress. In this case the new operation is kept pending in the I-Temp, R-Temp, and S-Temp registers until the current operation is completed, at which time the new operation begins. The second condition is when a previous operation creates an unmasked exception in Halt On Error mode (mode register bit HE High). In this case the new operation is kept in the I-Temp, R-Temp, and S-Temp registers until the exception is cleared, at which time the new operation begins.

Timing for typical accelerator operations in the flow-through mode is illustrated in Appendix D.

Availability of Operation Results

In order to directly read the result of an operation, the operation specification should be followed by the appropriate read transaction request. Should the Am29000 attempt to read an operation result before the operation is completed, the Am29027 will withhold acknowledging the transaction request by holding signals DRDY and DERR inactive until the operation has been completed. All read transaction requests, including save state, will be held off in this manner.

Overlapping Operations

Due to the presence of the R-Temp, S-Temp, and I-Temp registers, it is possible to partially or completely specify a new operation while the previously specified operation is being performed. Execution of the new operation will begin immediately after the previous operation is completed. Execution begins with the transfer of the contents of the R-Temp, S-Temp, and I-Temp registers to the corresponding working registers; only those temporary registers that have been written to as part of the operation specification are transferred.
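The role of the temporary registers in overlapping operations can be illustrated with a small behavioral model. This Python sketch is a simplification under stated assumptions: the class and method names are hypothetical, timing is reduced to an explicit `complete()` call, and the direct write to a working register by the request that carries the start bit is modeled as a write to the temporary register followed by an immediate transfer, which has the same net effect.

```python
class FlowThroughModel:
    """Toy model of R/S/I working registers fed by R-Temp/S-Temp/I-Temp."""

    def __init__(self):
        self.working = {"R": None, "S": None, "I": None}
        self.temp = {}               # holds only registers written since last start
        self.busy = False            # True while an operation is in progress
        self.pending_start = False   # a fully specified operation is waiting

    def write(self, register: str, value, start: bool = False) -> None:
        """Write an operand or instruction; `start` models DREQT0 High."""
        self.temp[register] = value
        if start:
            if self.busy:
                self.pending_start = True   # held in the temp registers
            else:
                self._begin()

    def _begin(self) -> None:
        # Only temporary registers that were written are transferred.
        self.working.update(self.temp)
        self.temp.clear()
        self.busy = True
        self.pending_start = False

    def complete(self) -> None:
        """Current operation finishes; a pending operation starts at once."""
        self.busy = False
        if self.pending_start:
            self._begin()
```

In this model, writing only a new R operand with the start bit while an addition is in progress leaves S and the instruction untouched, so the second operation reuses them once the first completes.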
It is important to note that, once the new operation is completely specified, any attempt to read a result will be held off until the new operation is completed. This means that it is not possible to directly read the result of an operation if another operation is completely specified before the results of the first operation are read. If, for example, specification of operation 2.0 + 3.0 is immediately followed by specification of operation 4.0 x 5.0, subsequent read result LSBs and read result MSBs transaction requests will return value 20.0, the result of the second operation. Similarly, a read flags transaction request will return flags for the second operation, and a read status transaction request will return status reflecting the completion of the second operation. This delayed read feature is provided to eliminate ambiguity in the correspondence between operations and results.

Should two operations be overlapped, and should the first operation have as its target a register file location, the second operation can be completely specified before the first operation is completed. If the first operation produces a result that is to be read directly by the Am29000, the second operation can be partially specified before the result of the first operation is read. A partial operation specification is one that includes all but the last operand or instruction.

Timing for typical overlapped operations in flow-through mode is illustrated in Appendix D.

Saving and Restoring State

In flow-through mode, the complete state of the Am29027 can be saved and restored with the save state transaction request. The first save state transaction request will return the contents of the instruction register; subsequent requests will return the contents of Registers I-Temp, R, S, R-Temp, S-Temp, the status register, the precision register, register file locations RF7-RF0, and the mode register.
The user has the option of saving only part of the state by issuing only the number of save state transaction requests needed to save registers of interest. When issuing a series of save state transaction requests, data is returned in the following order:

  Request   Data Returned
  1         Instruction
  2         I-Temp
  3         R LSBs
  4         R MSBs
  5         S LSBs
  6         S MSBs
  7         R-Temp LSBs
  8         R-Temp MSBs
  9         S-Temp LSBs
  10        S-Temp MSBs
  11        Status
  12        Precision
  13        RF0 LSBs
  14        RF0 MSBs
  ...       ...
  27        RF7 LSBs
  28        RF7 MSBs
  29        Mode LSBs
  30        Mode MSBs

Sequencing for the save state transaction request is reinitialized when the Am29000 issues any transaction request other than save state. If, for example, the Am29000 issues a write operand R transaction request after a series of save state requests, the next save state request will return the contents of the instruction register.

It should be noted that the process of saving state alters the contents of the instruction register and Registers R and S. Error reporting via signal DERR is suppressed for the save state transaction request.

Accelerator state is restored using transaction requests in concert with the MOVE P base operation code. Before restoring state, all status register bits should be set to logic Low using the write status transaction request to prevent the possibility of an unmasked exception bit inhibiting the restore sequence. The accelerator operand and instruction registers can then be restored, followed by restoration of the status register using the write status transaction request, with signal DREQT0 asserted to indicate the end of the restore sequence. When state restoration is complete, the Am29027 will retime the operation specified by current instruction register contents.

Accelerator state is restored in the following order:

  Register to be restored     Procedure for restoring

  Status                      Set all bits in the status register to a logic Low using the write status transaction request.

  Mode                        Write using write mode transaction request.

  RF0                         Write "Move R to RF0" instruction using write instruction transaction request.
                              Write RF0 value to Register R using write operand R transaction request, start operation.

  ...                         ...

  RF7                         Write "Move R to RF7" instruction using write instruction transaction request.
                              Write RF7 value to Register R using write operand R transaction request, start operation.

  Precision                   Guarantee that "Move R to RF7" operation has been completed by performing a read result MSBs transaction request.
                              Write precisions using write register file precisions transaction request.

  R, S, Instruction           Write R value to Register R-Temp using the write operand R transaction request.
                              Write S value to Register S-Temp using the write operand S transaction request.
                              Write instruction value to Register I-Temp using write instruction transaction request.
                              Transfer contents of Registers R-Temp, S-Temp, and I-Temp to Register R, Register S, and the instruction register, respectively, using the advance temp registers transaction request.

  R-Temp, S-Temp, I-Temp      Write R-Temp value to Register R-Temp using the write operand R transaction request.
                              Write S-Temp value to Register S-Temp using the write operand S transaction request.
                              Write I-Temp value to Register I-Temp using the write instruction transaction request.

  Status                      Write status to status register using the write status transaction request, with signal DREQT0 asserted to indicate that the restore sequence is complete.

The user may elect to restore only those registers relevant to a particular application by omitting parts of the state restoration sequence. The only mandatory portions of state restoration are the initial clearing of the status register, and restoration of the status register with signal DREQT0 asserted to indicate completion of the restore sequence.

Error Recovery

Six exception bits (invalid operation, reserved operand, overflow, underflow, inexact result, and zero result) are maintained in the status register; these bits are updated upon completion of an operation. Exception bits can be masked individually by programming the appropriate bits in the mode register; if the corresponding mask bit is inactive (logic Low), the exception bit is said to be unmasked and contributes to error reporting. The Am29027 provides three mechanisms with which unmasked exceptions can be handled.

Reporting Errors Upon Read

If an unmasked status register exception bit is set, the Am29027 will signal an error by asserting signal DERR when the Am29000 performs a read result LSBs, read result MSBs, read flags, or read status transaction request. Error reporting can be suppressed by issuing any of these transaction requests with signal DREQT0 asserted.

Halt On Error Mode

Should the application require, the Am29027 can be configured to halt operation upon detection of an unmasked exception; this mode is invoked by setting mode register bit HE (Halt On Error) High. Once configured this way, the Am29027 will respond to an unmasked exception as follows:

• Signal CDA will become inactive upon completion of the operation producing the unmasked exception.

• Should the operation producing the unmasked exception specify that the operation result be stored on-chip, that is, in the register file, the result will not be written to its destination.

• A pending operation will not be started; the operands and/or instruction for that operation will remain in the appropriate temporary registers.

• If the Am29000 attempts to start a new operation during the last cycle of the operation that produces the unmasked exception by issuing a write operand R, write operand S, write operands R, S, or write instruction transaction request with DREQT0 asserted, and if no other operation is pending, the operand or instruction will be written to the appropriate temporary register rather than to the R, S, or instruction register.

• Once CDA is deasserted, the Am29027 will respond to the write operand R, write operand S, write operands R, S, and write instruction transaction requests by asserting signal DERR one cycle after the request is issued; the contents of the target register or registers will remain unchanged.

Through these measures, the Am29027 will retain the input operands and instructions for the operation causing the exception. The input operands will be retained in the R register, S register, or register file locations, and the instructions will be retained in the instruction register. Additionally, the R-Temp, S-Temp, and I-Temp registers may contain the operands and instructions for a partially or fully specified pending operation. The Am29000 can recover these operands and instructions with the save state transaction request; this information can then be given to an error-handling routine for resolution.

The error halt condition is removed by clearing the status register exception status (ES) bit and the exception bit or bits responsible for producing the halt.

Reporting Errors via EXCP

Signal EXCP will go active Low in the presence of an unmasked exception. This signal can be connected to an Am29000 trap or exception input signal, and is enabled or disabled independent of other exception handling mechanisms with mode register bit EX.

PLTC

PLTC specifies the number of cycles allotted to operations other than those using base operation codes (P x Q) + T or MOVE P. This count can assume values between 3 and 15, inclusive, and must be given a value that satisfies the relationship:

  [8] ≤ PLTC x [1]

where

  [8] = Operation time, flow-through mode, all other base operation codes
  [1] = CLK period, as described in the Switching Characteristics table.
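The timer-count relationship can be worked through numerically. The Python sketch below is illustrative only; the function name `min_timer_count` is hypothetical, and the operation time and clock period must be taken from the Switching Characteristics table for the actual part and speed grade. It picks the smallest count in the permitted range 3 to 15 whose product with the clock period is at least the operation time.

```python
import math

def min_timer_count(operation_time_ns: float, clk_period_ns: float) -> int:
    """Smallest count N with operation_time <= N * clk_period, limited to 3..15."""
    count = max(3, math.ceil(operation_time_ns / clk_period_ns))
    if count > 15:
        raise ValueError("operation time cannot be covered by a count of 15")
    return count
```

For the data book's own example of a 45-ns clock cycle and a 240-ns operation, 240 / 45 is about 5.33, so the minimum count is 6 (6 x 45 ns = 270 ns). The same calculation applies to the other timer-count fields, each with its own operation-time parameter.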
MATC MATC specifies the number of cycles allotted to operations that use base operation code F' = (P' X a') + 1'. This count can assume values between 3 and 15, inclusive, and must be given a value that satisfies the relationship: [6]~MATC x [1], where Writing to the Mode, Status, and Precision Registers Unlike the R, S, and instruction registers, the mode, status, and precision registers are not preceded by temporary registers. Accordingly, writing to these registers may produce undesirable or unpredictable side effects if an accelerator 'operation is in progress at the time. To avoid such side effects, a write to any of these registers should be preceded by a read transaction request, which will guarantee that any current or pending accelerator operations will have been completed before the write transaction request is issued. and [6] = Operation time, flow-through mode, F' =(P' x 0') + T' [1] = ClK period, as described in the Switching Characteristics table. MVTC MVTC specifies the number of cycles allotted to operations that use the MOVE P base operation code. This count can assume values between 3 and 15, inclusive, and must be given a value that satisfies the relationship: [7] ~ MVTC x [1], where Writing to the Register File The numerical result of any operation may be written to the register file by specifying the desired destination in instruction field RFS and setting instruction bit RF High. The result can then be used as an input operand for subsequent operations. It is permissible for an operation result to be placed in a register file location that previously contained an input operand for that operation. In such a case, however, it is not permissible for the Am29000 to directly read the result, status, or flags for that operation, as the writing of the result modifies the operation performed by the ALU. 
Determining Timer Counts

To provide optimum accelerator performance over a range of possible system clock frequencies, the timing of Am29027 operations is programmable. Three mode register fields must be programmed according to system clock frequency and accelerator speed: pipeline timer count (PLTC), timer count for the multiply-accumulate operation (MATC), and timer count for the MOVE P operation (MVTC).

and
[7] = Operation time, flow-through mode, MOVE P
[1] = CLK period,
as described in the Switching Characteristics table.

Advancing DRDY

Normally, an operation result produced by the Am29027 in flow-through mode is read by the Am29000 no sooner than the clock cycle following operation completion. Depending on the system clock frequency used, it may be advantageous to overlap the reading of the result with the last cycle of the operation.

Consider, for example, a system with a 45-ns clock cycle and an Am29027 that performs an operation in 240 ns. The pipeline timer count PLTC will have to be set to a minimum of 6 for such a system, and the Am29000 will read a result no sooner than during the seventh clock cycle after the start of an operation. Mode register bit DA, DRDY Advance, can be used to advance transaction status signals DRDY and DERR by a full clock cycle, thus allowing the Am29000 to read data one clock cycle earlier than would otherwise be possible. For the example given above, PLTC remains at 6, but the Am29000 can read data during the sixth clock cycle after the operation starts rather than the seventh, thus saving a clock cycle.

In order to advance DRDY and DERR, the following system timing conditions must be met:

[19] ≤ (MATC × [1]) - [x9B] - [gate]
[20] ≤ (MVTC × [1]) - [x9B] - [gate]
[21] ≤ (PLTC × [1]) - [x9B] - [gate]

where
[19] = Data operation-start-to-output-valid delay, F' = (P' × Q') + T'
[20] = Data operation-start-to-output-valid delay, MOVE P
[21] = Data operation-start-to-output-valid delay, all other operations
[1] = CLK period
as described in the Switching Characteristics table, and
[x9B] = Synchronous input setup time as described in the Switching Characteristics table of the Am29000 Preliminary Data Sheet (order #09075).

The term [gate] represents the delay of the external gate through which the DERR signal passes. Timing for a typical accelerator operation with DRDY advanced is illustrated in Appendix D.

Operation In Pipeline Mode

Pipeline mode is invoked by setting mode register bit PL (Pipeline Mode Select) to logic High.

Programmer's Model

A programmer's model of the Am29027 in pipeline mode is shown in Figure 10. Note that Output Register F and the flag register are non-transparent in this mode, thus permitting the overlap of the current operation(s) with the reading of the result for a previous operation.

Pipeline Delays

When placed in pipeline mode, the ALU is divided into three pipeline stages for multiply-accumulate operations, and into two stages for all other operations. The ALU configuration for pipeline mode is shown in Figure 11. Note that for multiplication-accumulation operations, multiplicand P and multiplier Q enter the first pipeline stage, while addend T enters the second pipeline stage. As a consequence, the source for operands P and Q must be specified in the corresponding multiply-accumulate instruction, while the source for operand T must be specified in the following instruction.

Pipeline Advance

The ALU pipeline is advanced whenever a new operation begins. One consequence of this advance criterion is that data does not fall through the pipe but instead is "pushed" through. If, for example, an addition is performed in pipeline mode, the pipe must be advanced twice (by starting two operations) before the result of the addition appears in Register F, the flag register, the status register, and, optionally, a register file location.

Performing Operations

Pipeline mode operations are performed by:
• Storing instructions and/or operands in the Am29027, and starting the operation
• Loading the result of a previous operation

Storing instructions and operands can be done in any of three ways:
• Writing the instructions only, and starting the operation: This is appropriate when all necessary operands are already present in the Am29027, as is sometimes the case when using on-board constants or the results of previous operations stored in the register file.
• Writing the operands only, and starting the operation: This is appropriate when the desired instructions are already present in the Am29027, as is the case when performing the second of two identical operations.
• Writing the instructions and operands, and starting the operation: This is appropriate whenever the next operation requires both new instructions and new operands.

Operands and instructions are written using the write operand R, write operand S, write operands R, S, and write instruction transaction requests. Operands and instructions can be written to the Am29027 in any order, with the operation start bit (DREQT0 High) accompanying the last of the transaction requests.

Loading the result of a previous operation is performed using the read result MSBs, read result LSBs, and read flags transaction requests. The specific request used depends on whether the result is a flag or flags (as is the case with comparison operations) or data (as is the case with most other operations). In cases where the operation result is stored in the register file, the user may elect not to read the result, but to proceed with the next operation.
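Two of the rules in this section lend themselves to a short sketch: the minimum timer count for a given clock period and operation time, and the "pushed through" behavior of the ALU pipeline. The Python below is only an illustrative model, not AMD software; the names min_timer_count and PipelineModel are hypothetical, and the model ignores pipeline invalidation, exceptions, and mixing of two-stage and three-stage operations.

```python
def min_timer_count(operation_time_ns, clk_period_ns):
    """Smallest whole number of clock cycles that covers the operation time.

    Uses integer ceiling division, so both arguments are assumed to be
    integers expressed in nanoseconds.
    """
    return -(-operation_time_ns // clk_period_ns)


class PipelineModel:
    """Toy model of the pipeline-advance rule (hypothetical helper).

    The pipe only moves when a new operation starts. At that moment the
    output of the last stage is copied to Register F and every other
    stage shifts forward by one position.
    """

    def __init__(self, stages=2):
        # Two stages for most operations, three for multiply-accumulate.
        self.stages = [None] * stages
        self.register_f = None

    def start(self, result):
        """Start an operation; 'result' stands in for the value it will produce."""
        self.register_f = self.stages[-1]          # last stage -> Register F
        self.stages = [result] + self.stages[:-1]  # push everything forward
        return self.register_f


if __name__ == "__main__":
    # The 45-ns clock / 240-ns operation example from the text.
    print("timer count:", min_timer_count(240, 45))

    pipe = PipelineModel(stages=2)
    pipe.start("ADD1")  # the addition enters stage 1
    pipe.start("OP2")   # first advance
    print("Register F after second advance:", pipe.start("OP3"))
```

In this model the addition reaches Register F only when the second following operation is started, which mirrors the statement that the pipe must be advanced twice.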
Figure 16. Programmer's Model for Pipeline Mode

Operation Timing

The Am29027 will usually start a pipelined operation during the first cycle following the receipt of a write operand R, write operand S, write operands R, S, or write instruction transaction request having signal DREQT0 set High. Operation execution begins with the transfer of the contents of the R-Temp, S-Temp, and I-Temp registers to Register R, Register S, and the instruction register, respectively; data is transferred only from those temporary registers written to as part of the operation specification. The operand or instruction accompanying the transaction request that starts the operation (that is, the transaction request for which signal DREQT0 is High) is written directly to the appropriate working register, that is, Register R, Register S, or the instruction register.

At the start of the operation, the output of the last ALU pipeline stage is transferred to Register F, the flag register, and, optionally, to a register file location; the status register exception status and exception bits are updated. The outputs of all other ALU pipeline stages are written to their respective pipeline registers. Once started, an operation will proceed for the number of cycles specified by mode register field PLTC, which denotes the number of cycles needed for data to traverse a single pipeline stage.

There are two conditions for which the Am29027 will not start an operation immediately. The first condition is when an operation has been started recently and has not yet had time to settle at the output of the first pipeline stage. In this case the new operation is kept pending in the I-Temp, R-Temp, and S-Temp registers until the previous operation completes the first pipeline stage.
The second condition is when a previous operation creates an unmasked exception in Halt On Error mode (mode register bit HE High). In this case the new operation is kept in the I-Temp, R-Temp, and S-Temp registers until the exception is cleared, at which time the new operation will begin.

Figure 17. ALU Configuration for Pipeline Mode (a. Multiply-Accumulate; b. Other Operations)

Timing for typical accelerator operations in the pipeline mode is illustrated in Appendix D. Because Register F, the flag register, and the status register are updated at the beginning of an operation, these registers can be read at any time after an operation begins.

Overlapping Operations

Due to the presence of the R-Temp, S-Temp, and I-Temp registers, it is possible to partially or completely specify a new operation while the previously specified operation is propagating through the first ALU pipeline stage. Execution of the new operation will begin immediately after the previous operation completes the first pipeline stage. Execution begins with the transfer of the contents of the R-Temp, S-Temp, and I-Temp registers to the corresponding working registers; only those temporary registers that have been written to as part of operation specification are transferred.

Availability of Operation Results

It is important to note that, once the new operation is completely specified, any attempt to read a result will be held off until the new operation begins; this means that it is not possible to read the result that is placed in the output registers when the first operation begins. If, for example, result X is placed in Register F when an operation starts and if another operation is completely specified thereafter, subsequent read result MSBs and read result LSBs transaction requests will return not X, but the result placed in the F register when the second operation begins; the read flags and read status transaction requests will behave in like manner. This delayed read feature is provided to eliminate ambiguity in the correspondence between operations and results.

Saving and Restoring State

Due to the presence of ALU pipeline registers, it is not possible to save the complete state of the Am29027 in pipeline mode. Pipeline operations may therefore be interrupted only under special circumstances, such as:
• If the interrupting routine does not use the floating-point accelerator, or
• If the current series of pipelined operations has been completed, and any operands needed for future operations have already been transferred to the Am29000

The save state transaction request is disabled in pipeline mode. It is permissible to switch to flow-through mode and use the save state transaction request, but doing so does not permit the saving of Register F, the flag register, or the ALU pipeline registers.

Error Recovery

As for flow-through mode, the Am29027 provides three mechanisms with which unmasked exceptions can be handled.

Reporting Errors Upon Read

If an unmasked status register exception bit is set, the Am29027 will signal an error by asserting signal DERR when the Am29000 performs a read result LSBs, read result MSBs, read flags, or read status transaction request. Error reporting can be suppressed by issuing any of these transaction requests with signal DREQT0 asserted.

Reporting Errors via EXCP

Same as for the flow-through mode.

Halt On Error Mode

Should the application require it, the Am29027 can be configured to halt operation upon detection of an unmasked exception; this mode is invoked by setting mode register bit HE (Halt On Error) High. Once configured this way, the Am29027 will respond to an unmasked exception as follows:
• Signal CDA will become inactive when the results of the operation producing the unmasked exception are transferred from the last pipeline stage to Register F, the flag register, and the status register.
• Once CDA is deasserted, the Am29027 will respond to the write operand R, write operand S, write operands R, S, and write instruction transaction requests by asserting signal DERR one cycle after the request is issued; the contents of the target register or registers will remain unchanged.

Through these measures, the Am29027 will retain the input operands and instructions for the most recently started operation. The input operands for that operation will be retained in the R register, S register, or register file locations, and the instructions will be retained in the instruction register. Additionally, the R-Temp, S-Temp, and I-Temp registers may contain the operands and instructions for a partially or fully specified pending operation.

Note that the input operands and instruction words for the operation causing the exception, as well as for operations currently in the ALU pipeline, will not be available. At the user's option, this information can be stored in a circular queue in the Am29000 register file so that full recovery from a pipelined exception is possible.

The Am29000 can read the contents of Am29027 operand and instruction registers by invoking flow-through mode and using the save state transaction request. Note that the contents of Register F, the flag register, and the ALU pipeline registers will be lost. This information can then be given to an error-handling routine for resolution.

The error halt condition is removed by clearing the status register exception status (ES) bit and the exception bit or bits responsible for producing the halt.

Pipeline Invalidation

There are several situations for which the ALU pipeline stages may contain invalid data. The Am29027 recognizes these situations and invalidates results automatically; results marked as invalid will not update the status register, register file locations RF7-RF0, or the precision register. Results are invalidated for the following conditions:
• The Am29027 is switched from flow-through mode to pipeline mode. Any data present in the ALU at the time of the switch is marked as invalid. This invalidation is illustrated in Figure 12a.
• The Am29027 performs a multiply-accumulate operation that is preceded by an operation other than multiply-accumulate. The multiply-accumulate operation result and the result that precedes it will be separated by a spurious result, due to the insertion of an additional pipeline stage for the multiply-accumulate operation. The spurious result is marked invalid. This invalidation is illustrated in Figure 12b.

The pipeline may also be invalidated manually by issuing a write status transaction request with signal DREQT0 asserted High; this request invalidates all current pipeline contents. Pipeline invalidation does not apply to operation in flow-through mode.

Writing to the Mode, Status, and Precision Registers

Unlike the R, S, and instruction registers, the mode, status, and precision registers are not preceded by temporary registers. Accordingly, writing to these registers may produce undesirable or unpredictable side effects if an accelerator operation is pending at the time. To avoid such side effects, a write to any of these registers should be preceded by a read transaction request, which will guarantee that any pending accelerator operation will have started before the write transaction request is issued.

The mode register outputs are not pipelined in the ALU, that is, all pipeline stages receive mode information directly from the mode register. Accordingly, writing to the mode register may produce undesirable or unpredictable side effects for operations currently in the ALU pipeline.
To avoid such side effects, a write to the mode register should be performed only if the contents of the ALU pipeline are a "don't care," that is, only after the last operation result of interest has been written to Register F, the flag register, or a register file location. If, for example, the last in a series of addition operations has just been started, the mode register should not be written until the pipeline is advanced twice, placing that operation's results in the F register, flag register, and, optionally, a register file location.

Figure 18. Pipeline Invalidation Timing
a. Pipeline invalidation timing for switch from flow-through to pipeline mode. Operations shown incur two pipeline delays in pipeline mode [all base operations except F' = (P' × Q') + T'].
b. Pipeline invalidation timing for multiply-accumulate operations in pipeline mode.
Notes: ADDx = addition operation; MPYx = multiplication operation; MACx = multiply-accumulate operation; (DMAC) = dummy multiply-accumulate operation.

Writing to the Register File

The numerical result of any operation may be written to the register file by specifying the desired destination in instruction field RFS and setting instruction bit RF High. The result may then be used as an input operand in subsequent operations. Because all ALU operations incur one or more pipeline delays, the result of an operation will not be available for use by the very next operation.

It is permissible for an operation result to be placed in a register file location that previously contained an input operand for that operation.

Multiplication-Accumulation Operations

The pipeline structure of the Am29027 permits the evaluation of sum-of-products expressions in a canonically efficient manner by interleaving the evaluation of two sum-of-product expressions. Operation sequencing is described in Figure 13.

Determining Timer Counts

As for flow-through mode, the timing of operations in pipeline mode is programmable to accommodate variations in system timing. A single mode register field, pipeline timer count (PLTC), specifies the timing of all pipelined operations; fields MATC and MVTC are not used.

PLTC specifies the number of cycles allotted for data to traverse a single pipeline stage. This count can assume values between 2 and 15, inclusive, and must be given a value that satisfies the relationship:

[9] ≤ PLTC × [1]

where
[9] = Operation time, pipeline mode, all operations
and
[1] = CLK period,
as described in the Switching Characteristics table.

Advancing DRDY

Because the Am29027 F register and flag register are non-transparent in pipeline mode, it is not possible (nor advantageous) to advance DRDY. Accordingly, mode register bit M44 has no effect in pipeline mode.

Master/Slave Operation

Two Am29027 accelerators can be tied together in master/slave configuration, with the slave checking the results produced by the master. All input and output signals of the slave, with the exception of SLAVE and MSERR, are connected directly to the corresponding signals of the master. The master is selected by asserting signal SLAVE Low, the slave by asserting signal SLAVE High.

The slave accelerator, by comparing its outputs to the outputs of the master accelerator, performs a comprehensive check of master accelerator logic. In addition, if the slave accelerator is connected at the proper position on the Am29000 buses, it may detect open circuits and other faults in the electrical path between the master accelerator and the Am29000.

Note that the master accelerator also performs a comparison between its outputs and its own internally generated results, and is therefore able to detect faults in its output drivers, which it reports with its MSERR signal.

Initialization and Reset

The accelerator is in an unknown state when power is first applied and must be initialized before processing can begin. This is accomplished by asserting the RESET signal, which initializes accelerator state as follows:
• All bits in the status register are cleared
• The accelerator is placed in flow-through mode
• Signal CDA is active; signals DRDY and DERR are inactive
• All internal circuitry controlling operation timing is initialized

The RESET signal does not initialize the operand and instruction registers and may corrupt existing register contents. It is the responsibility of the user to initialize these registers, if needed.

Applications

Suggestions for Power and Ground Pin Connections

The Am29027 operates in an environment of fast signal rise times and substantial switching currents. Therefore, care must be exercised during circuit board design and layout, as with any high-performance component. The following is a suggested layout, but since systems vary widely in electrical configuration, an empirical evaluation of the intended layout is recommended.

The VCCO and GNDO pins carry output driver switching currents and can be electrically noisy. The VCC and GND pins, which supply the logic core of the device, tend to produce less noise and the circuits they supply may be adversely affected by noise spikes on the VCC plane.
For this reason, it is best to provide isolation between the VCC and VCCO pins as well as independent decoupling for each. Isolating the GND and GNDO pins is not required.

Printed Circuit-Board Layout Suggestions

1. Use of a multilayer PC board with separate power, ground, and signal planes is highly recommended.
2. All VCC and VCCO pins should be connected to the VCC plane. VCCO pins should be isolated from VCC pins by means of an isolation slot which is cut in the VCC plane (see Figure 14). By physically separating the VCC and VCCO pins, coupled noise will be reduced.
3. All GND and GNDO pins should be connected directly to the ground plane.
4. The VCCO pins should be decoupled to ground with a 0.1-µF ceramic capacitor and a 10-µF electrolytic capacitor, placed as closely to the Am29027 as is practical. VCC pins should be decoupled to ground in a similar manner.

A suggested layout is shown in Figure 14.

Figure 13. Canonically Efficient Sum-of-Products Evaluation in Pipeline Mode
The figure tabulates, for a sequence of MAC operations, the contents of Register R, Register S, pipeline stages 1 through 3, register file location RF0, and Register F while calculating the matrix product C = A × B, where A is the 4 × 4 matrix with elements a11 through a44, B is the column vector (b1, b2, b3, b4), and C is the column vector (c1, c2, c3, c4):
c1 = a11 × b1 + a12 × b2 + a13 × b3 + a14 × b4
c2 = a21 × b1 + a22 × b2 + a23 × b3 + a24 × b4
c3 = a31 × b1 + a32 × b2 + a33 × b3 + a34 × b4
c4 = a41 × b1 + a42 × b2 + a43 × b3 + a44 × b4
Notes:
1. Register file location RF0 is used as the accumulator.
2. Parentheses are used to indicate partial sums of products.
* Additional MAC operation needed to terminate sequence.

Figure 20. Suggested Printed Circuit-Board Layout (power and ground connections)
Legend: VCC isolation cut; through hole; VCC plane connection.
C1 = C3 = C5 = C7 = 0.1 µF (ceramic or monolithic capacitor)
C2 = C4 = C6 = C8 = 10 µF (electrolytic or tantalum capacitor)

ABSOLUTE MAXIMUM RATINGS

Storage Temperature ............ -65 to +150°C
(Ambient) Temperature Under Bias .. -55 to +125°C
Supply Voltage to Ground Potential Continuous .... -0.3 V to +7.0 V
DC Voltage Applied to Outputs for High Output State ......... -0.3 V to +VCC +0.3 V
DC Input Voltage ........... -0.3 V to +VCC +0.3 V
DC Output Current, Into Low Outputs ....... 30 mA
DC Input Current ............. -10 mA to +10 mA

Stresses above those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent device failure. Functionality at or above these limits is not implied. Exposure to absolute maximum ratings for extended periods may affect device reliability.

OPERATING RANGES

Commercial (C) Devices
Case Temperature (TC) .......... 0 to +85°C
Supply Voltage (VCC) ....... +4.75 V to +5.25 V

Military* (M) Devices
Case Temperature (TC) ........
-55 to +125°C Supply Voltage (Vee) ......... +4.5 V to +5.5 V Operating ranges define those limits between which the functionality of the device is guaranteed. "Military Product 100% tested at Tc=+25°C, +125°C, and -55°C. 1-161 29K Family CMOS Devices DC CHARACTERISTICS over COMMERCIAL operating range unless otherwise specified (for APL Products, Group A, Subgroups 1, 2, and 3 are tested unless otherwise noted) Parameter Symbol VOH Parameter Description Output High Voltage VOL Output Low Vo~age VIH Guaranteed Input Logical High Voltage (Note 2) V IL Guaranteed Input Logical Low Voltage (Note 2) VIH(F) Guaranteed Input Logical High Voltage (Notes 2, 6) F Bus, Slave Operation Only VIL(F) Guaranteed Input Logical Low Voltage (Notes 2, 6) F Bus, Slave Operation Only IlL Input Leakage Current to Output Leakage Current Test Conditions (Note 1) Min. Max. Unit Vee =Min. V 2.4 IOH=-4.0 rnA Vee =Min. VIN = VIH or VIL 0.45 2.0 V V 0.8 Vee -0.5 V V 0.5 V .r,,""o'\\\i" Icc Static Static Power Supply Current ;~::~,;\,;t;~\~)" (Note 3) Qfy10S VIN =Vee or ',>" GND 240 (Note 3) TTL VIN =0.5 V or 2.4 V 275 (N6te3) rnA CMOS VIN =Vee or GND Te =-55 to (Note 3) +125°C Iccop Operating Power Supply Current TTL VIN =0.5 V or 2.4 V Vee =Max. Outputs floating 9.0 rnA/MHz Notes: 1. Vee conditions shown as Min. or Max. refer to ±5% Vee (commercial) and ±10% Vee (military). 2. These input levels provide zero noise immunity and should only be statically tested in a noise-free environment (not functionally tested). 3. Use CMOS lee when the device is driven by CMOS circuits and TTL Icc when the device is driven by TTL circuits. 4. lee (Total) .. lee (Static) + lecop x f, where f is in MHz. This is tested on a sample basis only. 5. Tested on a sample basis only. 6. These levels guaranteed compatible with F bus output levels. CAPACITANCE Parameter Symbol Parameter Description C IN Input Capacitance COUT Output Capacitance ClIO 1/0 Pin Capacitance 1-162 Test Conditions tc =1 MHz (Note 5) Min. Max. 
Unit 12 pF 20 pF 20 pF Am29027 SWITCHING CHARACTERISTICS over COMMERCIAL operating range 25 MHz No. Parameter Description 1 2 3 4 5 ClK Period 6 7 8 9 10 11 12 Test Conditions Min. (Note 1) 40 18 18 ClK Low Time ClK High Time ClK Rise Time (Note 2) ClK Fall Time (Note 2) "", 280~~'::: ""290>'; ;~;~ .<,:':,:'< l'i~:;~1'5~>:::: Operation Time, Pipeline Mode All Operations ,; ';;:~:'~120' '<;;.~ (Note 3) i"<,:>\~ ",'(;,:>it!·>' i(::,>",';,i:'~:"':"i:,: :"<");.1"1 (Note 3) Transaction Request Setup Time Transaction Request Hold Time BIN V Setup Time <~,;!,,::,\"~ii:::::'";':::;,:~, 13 BIN V Hold Time 14 15 16 Data Setup Time 17 Instruction Hold Time 18 CDA ClK-to-Output-V~li~"Q~lay. ;:: 19 20 F31-Fo ClK-to-Output-Val14:'qelay Instruction Setup TIme 50 20 20 16 MHz Max. Min. DC 60 22 22 5 ,c:t26~;) .,., Min. DC 5 5 Operation Time, low-latency Mode, F' = (P' x a') + T' MOVEP (All Other Base Operation Codes) Max. DC Unit ns ns ns 5 5 ns 5 300 150 250 360 180 300 ns ns ns 180 ns 150 ns 24 0 13 26 0 15 ns 2 2 2 ns 18 2 18 2 22 2 22 2 24 2 24 2 ns ns ns .;,,,. ~ ~;::"::t:(~ot;~~1'»';f' .,(;:;,~;,/",. Data Hold Time 20 MHz Max. "d:~~', ';';:':,"';"~:~,i; ,,"" i:"~~ote 5) "" . :t'I'""/:~· "",' ns ns ns 20 24 26 ns 30 35 37 ns 22 25 27 ns 285 135 235 21 340 160 280 23 ns ns ns DRDY ClK-to-Output-Valid Delay 270 110 190 18 25 DERR ClK-to-Output-Valid Delay 18 21 23 ns 26 EXCP ClK-to-Output-Valid Delay 21 MSERR ClK-to-Output-Valid Delay 23 30 ns 27 18 20 21 22 23 24 ""1"" F31-Fo Three-State ClK-to-Output-lnactive Delay (Note 6) Data Operation-Start-to-OutputValid Delay F'=(p'xa')+ T' MOVEP (All Other Base Operation Codes) 25 ns ns Notes: 1. ClK switching characteristics are made relative to 1.5 V. 2. ClK rise time/fall time measured between 0.8 V and (Vee -1.0 V). Tested on a sample basis only. 3. Transaction request signals include RiW, oREa, DREaT,-DREaT 4. Data signals include R31-RO and S31-S0. " and OPTrOPTo. 5. Instruction signals include b,-Io. 6. 
Three-State Output Inactive Test load. Three-State ClK-to-Output-lnactive Delay is measured as the time to a ±500 mV change from prior output level. Conditions: A. All inputs/outputs are TTL-compatible for V1H , V 1L, and VOL unless otherwise noted. B. All outputs are driving 80 pF unless otherwise noted. C. All setup, hold, and delay times are measured relative to elK at 1.5 V unless otherwise noted. 1·163 29K Family CMOS Devices SWITCHING CHARACTERISTICS over MILITARY operating range 20 MHz No. Parameter Description 1 2 3 4 ClK Period ClK Rise Time (Note 2) 5 CLK Fall Time (Note 2) 6 8 Operation Time, low-latency Mode, F'=(P'xQ')+ T' MOVEP (All Other Base Operation Codes) D Operation Time, Pipeline Mode All Operations 7 Test Conditions (Note 1) ClK low Time ClK High Time Max. Min. Max. Unit 50 20 20 DC 60 22 22 DC ns 'i('\ i!·•··':"':·:,:. . ,.'i,. Ii,::';; " .. 10 11 12 Transaction Request Setup Time (Note 3)/",,"\ 1(:.~::'.\24i· Transaction Request Hold Time (Note 3)':',"<: I.'·,", 0 13 BINV Hold Time 14 15 16 17 Data Setup Time /':' .... BINV Setup Time '\ Instruction Setup Time Instruction Hold Time ::,\:.: I""')' """'\".'';:, Data Hold Time .... , ns ns 5 5 ns 5 5 ns 360 180 300 ns ns ns 180 ns ./ ··'·~:'1..:,~:.,\, '., 3Q01 "'\.150 " .:250 1<> '-',,:,:"'\.,,1.·\ ,< 16 MHz Min. ~,. '":':':" ".":,., ,':'ii(Note"a) . ii'' .··.·.';.. .. : ........ I:, . ··'··· (Note 5) '" 150 ns 2 26 0 16 2 22 2 22 2 24 2 24 2 ns 14 ns ns ns ns ns ns 18 COA CLK-to-Output-Valid Delay';\, 24 26 ns 19 20 F31-FO CLK-to-Output-Valid'pelay 35 40 ns 26 30 ns 340 160 280 23 ns ns ns F3,-Fo Three-State CLK-toOutput-Inactive Delay (Note 6) Data Operation-Start-to·OutputValid Delay 21 22 23 24 F' = (P'xQ/) + T' MOVEP (All Other Base Operation Codes) DRDY ClK-to-Output-Valid Delay 285 135 235 21 25 DERR CLK-to-Output-Valid Delay 21 23 ns 26 EXCP ClK-to-Output-Valid Delay 21 ns 27 MSERR ClK-to-Output-Valid Delay 25 23 30 ns ns Notes: 1. 
CLK switching characteristics are made relative to 1.5 V.
2. CLK rise time/fall time measured between 0.8 V and (VCC - 1.0 V). Tested on a sample basis only.
3. Transaction request signals include R/W, DREQ, DREQT1-DREQT0, and OPT2-OPT0.
4. Data signals include R31-R0 and S31-S0.
5. Instruction signals include I31-I0.
6. Three-State Output Inactive Test Load. Three-State CLK-to-Output-Inactive Delay is measured as the time to a ±500 mV change from prior output level.

Conditions:
A. All inputs/outputs are TTL-compatible for VIH, VIL, and VOL unless otherwise noted.
B. All outputs are driving 80 pF unless otherwise noted.
C. All setup, hold, and delay times are measured relative to CLK at 1.5 V unless otherwise noted.

SWITCHING WAVEFORMS

Input Signal Timing; CDA, EXCP Timing

Operation Timing for Flow-Through Mode, DRDY, DERR Not Advanced (Mode Register Bit AD=0)
Notes:
1. Transaction request Write Operand R; Write Operand S; Write Operands R, S; or Write Instruction with signal DREQT0 asserted.
2. Transaction request Read Result MSBs, Read Result LSBs, Read Flags, Read Status, or Save State. If request Read Result LSBs is issued, the Am29027 produces two data outputs in two consecutive cycles, with DRDY or DERR active for both cycles.
3. Signal EXCP is asserted in the presence of an unmasked exception.

Operation Timing for Flow-Through Mode, DRDY, DERR Advanced (Mode Register Bit AD=1)
Notes:
1. Transaction request Write Operand R; Write Operand S; Write Operands R, S; or Write Instruction with signal DREQT0 asserted.
2. Transaction request Read Result MSBs, Read Result LSBs, Read Flags, Read Status, or Save State. If request Read Result LSBs is issued, the Am29027 produces two data outputs in consecutive cycles, with DRDY or DERR active for both cycles.
3. Signal EXCP is asserted in the presence of an unmasked exception.

Operation Timing for Pipeline Mode
Notes:
1. Transaction request Write Operand R; Write Operand S; Write Operands R, S; or Write Instruction with signal DREQT0 asserted.
2. Transaction request Read Result MSBs, Read Result LSBs, Read Flags, Read Status, or Save State. If request Read Result LSBs is issued, the Am29027 produces two data outputs in consecutive cycles, with DRDY or DERR active for both cycles.
3. Signal EXCP is asserted in the presence of an unmasked exception.

Master/Slave Timing

SWITCHING TEST CIRCUIT

R1 = 300 ohms; IOL = 4.0 mA; IOH = 4.0 mA (Three-State Output Inactive Test). CL is guaranteed to 80 pF.

TEST PHILOSOPHY AND METHODS

The following nine points describe AMD's philosophy for high-volume, high-speed automatic testing.

1. Ensure that the part is adequately decoupled at the test head. Large changes in VCC current as the device switches may cause erroneous function failures due to VCC changes.
2. Do not leave inputs floating during any tests, as they may start to oscillate at high frequency.
3. Do not attempt to perform threshold tests at high speed. Following an output transition, ground current may change by as much as 400 mA in 5-8 ns. Inductance in the ground cable may allow the ground pin at the device to rise by hundreds of millivolts momentarily.
4. Use extreme care in defining point input levels for AC tests. Many inputs may be changed at once, so there will be significant noise at the device pins and they may not actually reach VIL or VIH until the noise has settled. AMD recommends using VIL ≤ 0 V and VIH ≥ 3.0 V for AC tests.
5. To simplify failure analysis, programs should be designed to perform DC, Function, and AC tests as three distinct groups of tests.
6. Capacitive Loading for AC Testing
Automatic testers and their associated hardware have stray capacitance that varies from one type of tester to another, but is generally around 50 pF. This, of course, makes it impossible to make direct measurements of parameters that call for smaller capacitive load than the associated stray capacitance. Typical examples of this are the so-called float delays, which measure the propagation delays into the high-impedance state and are usually specified at a load capacitance of 5.0 pF. In these cases, the test is performed at the higher load capacitance (typically 50 pF), and engineering correlations based on data taken with a bench setup are used to predict the result at the lower capacitance.
Similarly, a product may be specified at more than one capacitive load. Since the typical automatic tester is not capable of switching loads in mid-test, it is impossible to make measurements at both capacitances even though they may both be greater than the stray capacitance. In these cases, a measurement is made at one of the two capacitances. The result at the other capacitance is predicted from engineering correlations based on data taken with a bench setup and the knowledge that certain DC measurements (IOH, IOL, for example) have already been taken and are within spec. In some cases, special DC tests are performed in order to facilitate this correlation.
7.
Threshold Testing The noise associated with automatic testing (due to the long, inductive cables) and the high gain of the tested device when in the vicinity of the actual device threshold, frequently give rise to oscillations when testing high-speed circuits. These oscillations are not indicative of a reject device, but instead of an overtaxed test system. To minimize this problem, thresholds are tested at least once for each input pin. Thereafter, hard high and low levels are used for other tests. Generally this means that function and AC testing are performed at hard input levels rather than at VIL Max. and VIH Min. 8. AC Testing Occasionally, parameters are specified that cannot be measured directly on automatic testers because of tester limitations. Data input hold times often fall into this category. In these cases, the parameter in question is guaranteed by correlating these tests with other AC tests that have been performed. These correlations are arrived at by the cognizant engineer by using precise bench measurements in conjunction with the knowledge that certain DC parameters have already been measured and are within spec. In some cases, certain AC tests are redundant, since they can be shown to be predicted by some other tests that have already been performed. In these cases, the redundanttests are not performed. Am29027 Am29027 Thermal Characteristics Pin-Grid-Array Package 9JA = 9x + 9CA Thermal Resistance - °C/WaU Alrflow-ft./mln., (fTl/sec) ' 9CA Case-to-Ambient (wiflt'oqtp \:>~ ,../I-:/f; Heatsink, ThermalloY,J~4t7 , e Case-to-Ambient (wit~~~nidir~ctional Pin Fin CA 700 (3.58) 900 (4.61) 4 4 11 9 8 (2,;~5) Parameter Heatsink, Wakefield 840-20 10 6 3 2 2 2 6 3 2 2 2 700 (3.58) 900 (4.61) Am29027 Thermal Characteristics Ceramic Quad-Flat-Pack Package Thermal Resistance - °C/Watt Alrflow-ft./mln. (rn/sec) Parameter ex eCA 0 (0) 150 (0.76) 300 (1.53) 480 (2.45) Junction-to-Case Case-to-Ambient (no Heatsink) Note: This is for reference only. 
APPENDIX A - DATA FORMATS

The following data formats are supported: 32-bit integer, 64-bit integer, IEEE single-precision, IEEE double-precision, DEC F, DEC D, DEC G, IBM single-precision, and IBM double-precision. The primary and alternate floating-point formats are selected by mode register fields PFF and AFF. The user may select between floating-point operations and integer operations by means of instruction bit INs. The nine supported formats are described below.

Integer Formats

32-Bit Integer

The 32-bit integer word is arranged as follows:

Bit 31: weight $-2^{31}$
Bits 30-0: weights $2^{30}$ down to $2^{0}$

The 32-bit word is interpreted as a two's-complement integer. For integer multiplications, the user has the option of interpreting integers as unsigned. An unsigned single-precision integer has a format similar to that of the two's-complement integer, but with an MSB weight of $2^{31}$.

64-Bit Integer

The 64-bit integer word is arranged as follows:

Bit 63: weight $-2^{63}$
Bits 62-0: weights $2^{62}$ down to $2^{0}$

The 64-bit word is interpreted as a two's-complement integer. For integer multiplications, the user has the option of interpreting integers as unsigned. An unsigned double-precision integer has a format similar to that of the two's-complement integer, but with an MSB weight of $2^{63}$.

IEEE Formats

IEEE Single Precision

The IEEE single-precision word is 32 bits wide and is arranged in the format shown below:

Bit 31: sign
Bits 30-23: biased exponent (e), weights $2^{7}$ down to $2^{0}$
Bits 22-0: fraction (f), weights $2^{-1}$ down to $2^{-23}$

The floating-point word is divided into three fields: a single-bit sign, an 8-bit biased exponent, and a 23-bit fraction. The sign bit is 0 for positive numbers and 1 for negative numbers. 0 may have either sign.
The biased exponent is an 8-bit unsigned integer representing a multiplicative factor of some power of 2. The bias value is 127. If, for example, the multiplicative value for a floating-point number is to be $2^{a}$, the value of the biased exponent is a + 127, where "a" is the true exponent.

The fraction is a 23-bit unsigned fractional field containing the 23 least significant bits of the floating-point number's 24-bit mantissa. The weight of the fraction's most significant bit is $2^{-1}$. The weight of the least significant bit is $2^{-23}$.

An IEEE single-precision floating-point number is evaluated or interpreted as follows:

Not a Number: if e = 255 and f ≠ 0, value = NaN
Infinity: if e = 255 and f = 0, value = $(-1)^{s} \infty$
Normalized number: if 0 < e < 255, value = $(-1)^{s} \, 2^{e-127} \, (1.f)$
Denormalized number: if e = 0 and f ≠ 0, value = $(-1)^{s} \, 2^{-126} \, (0.f)$
Zero: if e = 0 and f = 0, value = $(-1)^{s} \, 0$

The IEEE double-precision format (bias 1023, maximum biased exponent 2047) follows the same five cases; a normalized number has the value $(-1)^{s} \, 2^{e-1023} \, (1.f)$ and a denormalized number has the value $(-1)^{s} \, 2^{-1022} \, (0.f)$.

Infinity: Infinity can have either a positive or negative sign. The interpretation of infinities is determined by mode register bit AP.

NaN: A NaN is interpreted as a signal or symbol. NaNs are used to indicate invalid operations and as a means of passing process status through a series of calculations. They arise in two ways: either generated by the Am29027 to indicate an invalid operation, or provided by the user as an input. A signaling NaN has the MSB of its fraction set to 0 and at least one of the remaining fraction bits set to 1. A quiet NaN has the MSB of its fraction set to 1.

The IEEE format is fully described in ANSI/IEEE Standard 754-1985.

DEC Formats

DEC F

The DEC F word is 32 bits wide and is arranged in the format shown below:

Bit 31: sign
Bits 30-23: biased exponent (e), weights $2^{7}$ down to $2^{0}$
Bits 22-0: fraction (f), weights $2^{-2}$ down to $2^{-24}$

The floating-point word is divided into three fields: a single-bit sign, an 8-bit biased exponent, and a 23-bit fraction.
The sign bit is 0 for positive numbers and 1 for negative numbers; 0 has a positive sign.

The biased exponent is an 8-bit unsigned integer representing a multiplicative factor of some power of 2. The bias value is 128. If, for example, the multiplicative value for a floating-point number is to be $2^{a}$, the value of the biased exponent is a + 128, where "a" is the true exponent.

The fraction is a 23-bit unsigned fractional field containing the 23 least significant bits of the floating-point number's 24-bit mantissa. The weight of the fraction's most significant bit is $2^{-2}$. The weight of the least significant bit is $2^{-24}$.

A DEC F floating-point number is evaluated or interpreted as follows:

If e ≠ 0: value = $(-1)^{s} \, 2^{e-128} \, (0.1f)$
If s = 0 and e = 0: value = 0
If s = 1 and e = 0: value = DEC-Reserved Operand

DEC-Reserved Operand: A DEC-Reserved Operand is interpreted as a signal or symbol. DEC-Reserved Operands are used to indicate invalid operations and operations whose results have overflowed the destination format. They may also be used to pass symbolic information from one calculation to another.

The DEC formats are fully described in the VAX Architecture Manual.

DEC D

The DEC D word is 64 bits wide and is arranged in the format shown below:

Bit 63: sign
Bits 62-55: biased exponent (e), weights $2^{7}$ down to $2^{0}$
Bits 54-0: fraction (f), weights $2^{-2}$ down to $2^{-56}$

The floating-point word is divided into three fields: a single-bit sign, an 8-bit biased exponent, and a 55-bit fraction. The sign bit is 0 for positive numbers and 1 for negative numbers; 0 has a positive sign.

The biased exponent is an 8-bit unsigned integer representing a multiplicative factor of some power of 2. The bias value is 128. If, for example, the multiplicative value for a floating-point number is to be $2^{a}$, the value of the biased exponent is a + 128, where "a" is the true exponent.
The fraction is a 55-bit unsigned fractional field containing the 55 least significant bits of the floating-point number's 56-bit mantissa. The weight of the fraction's most significant bit is $2^{-2}$. The weight of the least significant bit is $2^{-56}$.

A DEC D floating-point number is evaluated or interpreted as follows:

If e ≠ 0: value = $(-1)^{s} \, 2^{e-128} \, (0.1f)$
If s = 0 and e = 0: value = 0
If s = 1 and e = 0: value = DEC-Reserved Operand

DEC-Reserved Operand: A DEC-Reserved Operand is interpreted as a signal or symbol. DEC-Reserved Operands are used to indicate invalid operations and operations whose results have overflowed the destination format. They may also be used to pass symbolic information from one calculation to another.

The DEC formats are fully described in the VAX Architecture Manual.

DEC G

The DEC G word is 64 bits wide and is arranged in the format shown below:

Bit 63: sign
Bits 62-52: biased exponent (e)
Bits 51-0: fraction (f)

The floating-point word is divided into three fields: a single-bit sign, an 11-bit biased exponent, and a 52-bit fraction. The sign bit is 0 for positive numbers and 1 for negative numbers; 0 has a positive sign.

The biased exponent is an 11-bit unsigned integer representing a multiplicative factor of some power of 2. The bias value is 1024. If, for example, the multiplicative value for a floating-point number is to be $2^{a}$, the value of the biased exponent is a + 1024, where "a" is the true exponent.

The fraction is a 52-bit unsigned fractional field containing the 52 least significant bits of the floating-point number's 53-bit mantissa. The weight of the fraction's most significant bit is $2^{-2}$. The weight of the least significant bit is $2^{-53}$.

A DEC G floating-point number is evaluated or interpreted as follows:

If e ≠ 0: value = $(-1)^{s} \, 2^{e-1024} \, (0.1f)$
If s = 0 and e = 0: value = 0
If s = 1 and e = 0: value = DEC-Reserved Operand

DEC-Reserved Operand: A DEC-Reserved Operand is interpreted as a signal or symbol. DEC-Reserved Operands are used to indicate invalid operations and operations whose results have overflowed the destination format. They may also be used to pass symbolic information from one calculation to another.

The DEC formats are fully described in the VAX Architecture Manual.

IBM Formats

IBM Single Precision

The IBM single-precision word is 32 bits wide and is arranged in the format shown below:

Bit 31: sign
Bits 30-24: biased exponent (e)
Bits 23-0: fraction (f)

The floating-point word is divided into three fields: a single-bit sign, a 7-bit biased exponent, and a 24-bit fraction. The sign bit is 0 for positive numbers and 1 for negative numbers; a true 0 has a positive sign.

The biased exponent is a 7-bit unsigned integer representing a multiplicative factor of some power of 16. The bias value is 64. If, for example, the multiplicative value for a floating-point number is to be $16^{a}$, the value of the biased exponent is a + 64, where "a" is the true exponent.

The fraction is a 24-bit unsigned fractional field containing the 24 least significant bits of the floating-point number's 25-bit mantissa. The weight of the fraction's most significant bit is $2^{-1}$. The weight of the least significant bit is $2^{-24}$.

An IBM floating-point number is evaluated or interpreted as follows:

value = $(-1)^{s} \, 16^{e-64} \, (0.f)$

Zero: There are two classes of zero. If the sign, biased exponent, and fraction are all zero, the operand is known as a "True Zero." If the fraction is zero, but the sign and biased exponent are not both zero, the operand is known as a "Floating-point Zero."

The IBM format is fully described in the IBM System/370 Principles of Operation Manual.
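The IBM single-precision interpretation rule above can be sketched in Python as follows. This is only an illustration, assuming the 32-bit word is held in an unsigned integer with the sign in bit 31, the biased exponent in bits 30-24, and the fraction in bits 23-0. The function names decode_ibm_single and classify_ibm_zero are hypothetical and are not part of any AMD software.

```python
from fractions import Fraction


def decode_ibm_single(word: int) -> Fraction:
    """Evaluate a 32-bit IBM single-precision word as (-1)^s * 16^(e-64) * (0.f)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    s = (word >> 31) & 0x1            # sign, bit 31
    e = (word >> 24) & 0x7F           # 7-bit biased exponent, bits 30-24
    f = word & 0xFFFFFF               # 24-bit fraction, bits 23-0
    mantissa = Fraction(f, 1 << 24)   # 0.f: the fraction MSB has weight 2^-1
    scale = Fraction(16) ** (e - 64)  # power of 16 with a bias of 64
    return (-1) ** s * scale * mantissa


def classify_ibm_zero(word: int) -> str:
    """Distinguish the two classes of zero described in the text."""
    if word & 0xFFFFFF:
        return "nonzero"
    # Fraction is zero: a True Zero also needs sign and biased exponent zero.
    return "true zero" if (word >> 24) == 0 else "floating-point zero"


if __name__ == "__main__":
    for w in (0x41100000, 0xC1200000, 0x42640000, 0x00000000, 0x41000000):
        print(f"{w:08X}", decode_ibm_single(w), classify_ibm_zero(w))
```

Exact rational arithmetic is used here so that the power-of-16 scaling introduces no rounding of its own; a production decoder would more likely convert directly to a native floating-point type.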
IBM Double Precision

The IBM double-precision word is 64 bits wide and is arranged in the format shown below:

Bit 63: sign
Bits 62-56: biased exponent (e), weights $2^{6}$ down to $2^{0}$
Bits 55-0: fraction (f), weights $2^{-1}$ down to $2^{-56}$

The floating-point word is divided into three fields: a single-bit sign, a 7-bit biased exponent, and a 56-bit fraction. The sign bit is 0 for positive numbers and 1 for negative numbers; a true 0 has a positive sign.

The biased exponent is a 7-bit unsigned integer representing a multiplicative factor of some power of 16. The bias value is 64. If, for example, the multiplicative value for a floating-point number is to be $16^{a}$, the value of the biased exponent is a + 64, where "a" is the true exponent.

The fraction is a 56-bit unsigned fractional field containing the 56 least significant bits of the floating-point number's 57-bit mantissa. The weight of the fraction's most significant bit is $2^{-1}$. The weight of the least significant bit is $2^{-56}$.

An IBM floating-point number is evaluated or interpreted as follows:

value = $(-1)^{s} \, 16^{e-64} \, (0.f)$

Zero: There are two classes of zero. If the sign, biased exponent, and fraction are all zero, the operand is known as a "True Zero." If the fraction is zero, but the sign and biased exponent are not both zero, the operand is known as a "Floating-point Zero."

The IBM format is fully described in the IBM System/370 Principles of Operation Manual.

APPENDIX B - ROUNDING MODES

The round mode is selected by mode register field RMS as follows:

RMS   Round Mode
000   Round to Nearest (IEEE)
001   Round to Minus Infinity (IEEE)
010   Round to Plus Infinity (IEEE)
011   Round to Zero (IEEE)
100   Round to Nearest (DEC)
101   Round Away from Zero
11X   Illegal Value

Round to Nearest (IEEE)
The infinitely precise result of an operation is rounded to the closest representable value in the destination format.
If the infinitely precise result is exactly halfway between two representations, it is rounded to the representation having a least significant bit of 0.

Round to Minus Infinity (IEEE)
The infinitely precise result of an operation is rounded to the closest representable value in the destination format that is less than or equal to the infinitely precise result.

Round to Plus Infinity (IEEE)
The infinitely precise result of an operation is rounded to the closest representable value in the destination format that is greater than or equal to the infinitely precise result.

Round to Zero (IEEE)
The infinitely precise result of an operation is rounded to the closest representable value in the destination format whose magnitude is less than or equal to that of the infinitely precise result.

Round to Nearest (DEC)
The infinitely precise result of an operation is rounded to the closest representable value in the destination format. If the infinitely precise result is exactly halfway between two representations, it is rounded to the representation having the greater magnitude.

Round Away from Zero
The infinitely precise result of an operation is rounded to the closest representable value in the destination format whose magnitude is greater than or equal to that of the infinitely precise result.

A graphical representation of these round modes is shown in Figures B1 and B2.

The IEEE standard specifies that all four "IEEE" modes be available so that the user may select the mode most appropriate for the algorithm being executed. The DEC standard specifies that two rounding modes be available: Round-to-Nearest (DEC) and Round-to-Zero. The IBM standard specifies that all operations be performed using the Round-to-Zero mode.

It should be noted, however, that the Am29027 permits any of the supported rounding modes to be selected, regardless of the format of the operation.
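The six rounding rules described in this appendix can be sketched in Python as follows. This is an illustrative model only, not a description of the Am29027 datapath: it assumes the representable values are the integer multiples of a quantum q (the weight of the destination format's least significant bit) and that the infinitely precise result is available as an exact rational number. The function name round_to_quantum is hypothetical; the rms argument uses the three-bit RMS encodings listed for this appendix.

```python
from fractions import Fraction
import math


def round_to_quantum(x: Fraction, q: Fraction, rms: int) -> Fraction:
    """Round the exact value x to a multiple of q using the RMS-selected mode."""
    n = x / q              # exact position of x in units of q
    lo = math.floor(n)     # closest multiple at or below x
    hi = math.ceil(n)      # closest multiple at or above x
    if rms == 0b000:       # Round to Nearest (IEEE): ties go to an LSB of 0
        if n - lo < hi - n:
            k = lo
        elif n - lo > hi - n:
            k = hi
        else:
            k = lo if lo % 2 == 0 else hi
    elif rms == 0b001:     # Round to Minus Infinity
        k = lo
    elif rms == 0b010:     # Round to Plus Infinity
        k = hi
    elif rms == 0b011:     # Round to Zero: never increase the magnitude
        k = lo if n >= 0 else hi
    elif rms == 0b100:     # Round to Nearest (DEC): ties go to greater magnitude
        if n - lo < hi - n:
            k = lo
        elif n - lo > hi - n:
            k = hi
        else:
            k = hi if n >= 0 else lo
    elif rms == 0b101:     # Round Away from Zero: never decrease the magnitude
        k = hi if n >= 0 else lo
    else:                  # 11X is an illegal value
        raise ValueError("illegal RMS value")
    return k * q


if __name__ == "__main__":
    for rms in range(6):
        print(format(rms, "03b"),
              round_to_quantum(Fraction(5, 2), Fraction(1), rms),
              round_to_quantum(Fraction(-5, 2), Fraction(1), rms))
```

The halfway cases 5/2 and -5/2 are used in the usage loop because they are the inputs on which the two round-to-nearest modes differ.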
It is permissible to use one of the IEEE rounding modes with an IBM operation, or DEC rounding with an IEEE operation, or any other possible combination. For those integer operations where rounding is performed, any rounding mode may be chosen. This flexibility allows the user to select the mode most appropriate for the arithmetic environment in which the processor is operating.

[Figure B1. Graphical Interpretation of Round-to-Nearest (Unbiased), Round-to-Minus-Infinity, and Round-to-Plus-Infinity Rounding Modes. Diagram not reproduced; each panel maps the Infinitely Precise Result to the Rounded Result along an axis marked -(P+1q), -P, -(P-1q), 0, P-1q, P, P+1q.]

[Figure B2. Graphical Interpretation of Round-to-Zero, Round-to-Nearest (DEC), and Round-Away-from-Zero Rounding Modes. Diagram not reproduced; each panel maps the Infinitely Precise Result to the Rounded Result along an axis marked -(P+1q), -P, -(P-1q), 0, P-1q, P, P+1q.]

APPENDIX C - ADDITIONAL OPERATION DETAILS

There are several cases in which the implementation of the IEEE, DEC, and IBM floating-point standards in the Am29027 differs from the formal definitions of those standards. This appendix describes these differences.

Differences Between IEEE Floating-Point Arithmetic and Am29027 IEEE Operation

Section 7.3 of the IEEE-754 standard specifies that "Trapped overflow on conversion from a binary floating-point format shall deliver to the trap handler a result in that or a wider format, possibly with the exponent bias adjusted, but rounded to the destination's precision."

According to the IEEE standard, then, if a double-to-single IEEE operation overflows while traps are enabled, the result is a double-precision operand, rounded to single-precision width (23-bit fraction), together with a correctly adjusted (double-precision) exponent and the appropriate flags for a trapped overflow.

In the case of an overflow in any IEEE operation, the Am29027 returns a result in the destination format specified by the user, rounded to that destination format. In the case of the double-to-single overflow described above, the result from the Am29027 is a single-precision operand, together with a correctly adjusted (single-precision) exponent and the appropriate flags for a trapped overflow.

A simple example serves to illustrate the discrepancy by describing the conversion of the double-precision IEEE number 52B123456789ABCD to single-precision, with traps enabled, and the round-to-nearest rounding mode selected. This number is too large to be represented in single-precision format.
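The arithmetic behind this example can be sketched in Python with plain integer operations. This is a minimal illustration, assuming a finite, normalized double-precision input that overflows single precision, round-to-nearest (IEEE) rounding, and the exponent adjustment of 192 used for trapped single-precision overflow. The function names are hypothetical; the sketch models only the bit patterns, not the V and X flags.

```python
def split_double(word: int):
    """Split a 64-bit IEEE double into sign, biased exponent, and 52-bit fraction."""
    s = (word >> 63) & 0x1
    e = (word >> 52) & 0x7FF
    f = word & ((1 << 52) - 1)
    return s, e, f


def round_fraction_to_23(f52: int):
    """Round a 52-bit fraction to 23 bits (nearest, ties to an LSB of 0).

    Returns (f23, carry); carry is 1 if rounding overflowed into the exponent.
    """
    drop = 52 - 23
    kept = f52 >> drop
    rem = f52 & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    return kept & 0x7FFFFF, kept >> 23


def ieee_trapped_result(word: int) -> int:
    """Double-precision word with the fraction rounded to single-precision width."""
    s, e, f = split_double(word)
    f23, carry = round_fraction_to_23(f)
    return (s << 63) | ((e + carry) << 52) | (f23 << 29)


def am29027_trapped_result(word: int) -> int:
    """Single-precision word with the overflowed exponent reduced by 192."""
    s, e, f = split_double(word)
    f23, carry = round_fraction_to_23(f)
    true_exp = e - 1023 + carry           # unbiased exponent after rounding
    e_single = true_exp - 192 + 127       # adjust by 192, then apply single bias
    return (s << 31) | (e_single << 23) | f23


if __name__ == "__main__":
    x = 0x52B123456789ABCD
    print(f"{ieee_trapped_result(x):016X}")     # 52B1234560000000
    print(f"{am29027_trapped_result(x):08X}")   # 75891A2B
```

For this input the 24th fraction bit is 0, so rounding to nearest leaves the upper 23 fraction bits unchanged in both results; only the container format and the exponent encoding differ.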
According to the IEEE standard, the result of this operation is the double-precision number 52B1234560000000, comprising the double-precision exponent of the input and a fraction truncated to 23 bits, together with flags V and X. When the operation is performed in the Am29027, however, using the F' = P' operation with appropriate precision controls, the result is the single-precision number 75891A2B, comprising the single-precision (overflowed) exponent reduced by 192 (decimal) and a single-precision fraction, together with flags V and X.

It should be noted that trapped operation is an optional part of the IEEE standard. Full adherence to the IEEE specification of trapped operation is therefore not necessary to ensure compliance with IEEE-754.

Differences Between DEC Floating-Point Arithmetic and Am29027 DEC Operation

The DEC F, DEC D, and DEC G standards, as implemented in the Am29027, differ from the implementations in a VAX only in the way in which the subfields of the floating-point word are arranged. The differences are listed in Table C1.

Table C1. Differences in Am29027 and DEC Floating-Point Formats

Format  Field     Am29027 Arrangement  VAX Arrangement
DEC F   sign      bit 31               bit 15
        exponent  bits 30-23           bits 14-7
        fraction  bits 22-0            bits 6-0, bits 31-16
DEC D   sign      bit 63               bit 15
        exponent  bits 62-55           bits 14-7
        fraction  bits 54-0            bits 6-0, bits 31-16, bits 47-32, bits 63-48
DEC G   sign      bit 63               bit 15
        exponent  bits 62-52           bits 14-4
        fraction  bits 51-0            bits 3-0, bits 31-16, bits 47-32, bits 63-48

Differences Between IBM 370 Floating-Point Arithmetic and Am29027 IBM Operation

The Am29027's deviations from the IBM standard may be summarized as follows, assuming that the user has selected the round-to-nearest rounding mode:

1. The Am29027 provides more guard bits in its internal format than specified by the IBM standard.
With certain combinations of input operands, the Am29027 produces more accurate results than a standard IBM processor for instructions based on addition operations and comparisons.
2. The discrepancies are much larger for single-precision operations than double-precision operations, because the difference in the number of guard bits is much greater (33 more for single, one more for double).
3. There is no universal rule for determining whether a given set of input operands will result in a discrepancy. Provided the conditions in (1) above are met, the user must examine each operation on a case-by-case basis, taking into account the input operands and the internal formats discussed in this section.
4. The Am29027 does not produce unnormalized results from additions. The results of all addition operations are renormalized.

Am29027 internal formats are compared with IBM internal formats in Figure C1.

[Figure C1. Differences in Internal Mantissa Formats of an IBM CPU and the Am29027. Diagram not reproduced; the fields shown in each panel are:
a. Am29027 Internal Format, IBM Single-Precision: overflow bit, 24 fraction bits, 37 guard bits, sticky bit
b. Am29027 Internal Format, IBM Double-Precision: overflow bit, 56 fraction bits, 5 guard bits, sticky bit
c. IBM Internal Format, Single-Precision: overflow bit, 24 fraction bits, 4 guard bits
d. IBM Internal Format, Double-Precision: overflow bit, 56 fraction bits, 4 guard bits]

APPENDIX D - TRANSACTION REQUEST/OPERATION TIMING

The timing diagrams in this appendix are not reproduced; the captions, panel titles, and legends are listed below. The signals shown include CLK, Transaction Request, A31-A0, D31-D0, DREQT0, CDA, DRDY, and DERR. In all figures, signals A31-A0 and D31-D0 are the Am29000 address and data buses, respectively.

Figure D1. Timing for the Write Operand R, Write Operand S, Write Operands R, S, and Write Instruction Transaction Requests
a. Normal Operation, Data Accepted
b. Halt On Error Mode, Unmasked Exception Present

Figure D2. Timing for the Write Mode, Write Status, and Write Register File Precisions Transaction Requests
a. CDA Low
b. CDA High Initially

Figure D3. Timing for the Advance Temp. Registers Transaction Request
a. CDA Low
b. CDA High Initially

Figure D4. Timing for the Read Result LSBs Transaction Request, No Unmasked Exceptions
a. Read Result MSBs Request Issued in Cycle after Read Result LSBs Request
b. Read Result MSBs Request Issued Two or More Cycles after Read Result LSBs Request

Figure D5. Timing for Read Result LSBs Transaction Request, Unmasked Exception Present

Figure D6. Timing for Read Result MSBs, Read Flags, and Read Status Transaction Requests
a. No Unmasked Exceptions Present
b. Unmasked Exceptions Present

Figure D7. Timing for the Save State Transaction Request, 64-Bit Resources (Registers R, R-Temp, S, S-Temp; Register File Locations RF7-RF0; Mode Register)
a. Second Save State Request Issued in Cycle Following First Request
b. Second Save State Request Issued Two or More Cycles after First Request

Figure D8. Timing for the Save State Transaction Request, 32-Bit Resources (Instruction Register, Register I-Temp, Status Register, Precision Register)

Legend for the single-precision figures (D9, D11, D13, D15): WRS = Write Operands R, S; WI = Write Instruction; RM = Read MSBs; A, B = Operands A, B; INST = Addition Instruction; RES = Result.

Legend for the double-precision figures (D10, D12, D14, D16): WR = Write Operand R; WS = Write Operand S; WI = Write Instruction; RL = Read LSBs; RM = Read MSBs; A = Operand A; B = Operand B; INST = Addition Instruction; LSB = Result LSBs; MSB = Result MSBs.

Figures D9 through D16 each show the operation A PLUS B followed by a read of the result, with mode register field PLTC = 6 and the operation in progress for 6 cycles.

Figure D9. Typical Timing for Single-Precision Operation in Flow-Through Mode
Figure D10. Typical Timing for Double-Precision Operation in Flow-Through Mode
Figure D11. Typical Timing for Single-Precision Operation in Flow-Through Mode, with Unmasked Exception Present
Figure D12. Typical Timing for Double-Precision Operation in Flow-Through Mode, with Unmasked Exception Present
Figure D13. Typical Timing for Single-Precision Operation in Flow-Through Mode, with DRDY Advanced
Figure D14. Typical Timing for Double-Precision Operation in Flow-Through Mode, with DRDY Advanced
Figure D15. Typical Timing for Single-Precision Operation in Flow-Through Mode, with DRDY Advanced and Unmasked Exception Present
Figure D16. Typical Timing for Double-Precision Operation in Flow-Through Mode, with DRDY Advanced and Unmasked Exception Present

Figure D17. Typical Timing for Overlapped Single-Precision Operations in Flow-Through Mode; Perform the Compound Operation (A PLUS B) x C by Performing Operations: (1) RF0 ← A PLUS B, (2) RF0 x C; Mode Register Field PLTC = 6
Legend: WRS = Write Operands R, S; WR = Write Operand R; WI = Write Instruction; RM = Read MSBs; A, B = Operands A, B; C = Operand C; I1 = Addition Instruction; I2 = Multiplication Instruction; RES = Result.

Figure D18. Typical Timing for Overlapped Double-Precision Operations in Flow-Through Mode; Perform the Compound Operation (A PLUS B) x C by Performing Operations: (1) RF0 ← A PLUS B, (2) RF0 x C; Mode Register Field PLTC = 6
Legend: WR = Write Operand R; WS = Write Operand S; WI = Write Instruction; RL = Read LSBs; RM = Read MSBs; A = Operand A; B = Operand B; C = Operand C; I1 = Addition Instruction; I2 = Multiplication Instruction; LSB = Result LSBs; MSB = Result MSBs.

Figure D19. Typical Timing for Single-Precision Operations in Pipeline Mode; Perform a Series of Addition Operations A PLUS B, C PLUS D, E PLUS F, ...; Mode Register Field PLTC = 3
Legend: WRS = Write Operands R, S; WI = Write Instruction; RM = Read MSBs; A, B, ... = Operands; I = Addition Instruction; RES = Result. The diagram tracks each addition through PL STAGE 1 and PL STAGE 2.

Figure D20. Typical Timing for Double-Precision Operations in Pipeline Mode; Perform a Series of Addition Operations A PLUS B, C PLUS D, E PLUS F, ...; Mode Register Field PLTC = 3
Legend: WR = Write Operand R; WS = Write Operand S; WI = Write Instruction; RL = Read LSBs; RM = Read MSBs; A, B, ... = Operands; I = Addition Instruction; LSB = Result LSBs; MSB = Result MSBs. The diagram tracks each addition through PL STAGE 1 and PL STAGE 2.
CHAPTER 2
29K Family Support Tools

Table of Contents
  ASM29K Data Sheet
  HighC29K Data Sheet
  MON29K Data Sheet
  XRAY29K Data Sheet

ASM29K
Cross-Development Toolkit, Release 2
Advanced Micro Devices

DISTINCTIVE CHARACTERISTICS

• Relocatable Macro Assembler supports the complete Am29000™ microprocessor instruction set.
• Linker/Loader combines separately assembled modules by resolving external references and by searching libraries.
• Librarian provides a management facility for organizing modules into logical collections of functions.
• IEEE Software Floating-Point Emulation routines.
• Available for the PC-AT™ and Sun-3™ development environments.

GENERAL DESCRIPTION

Processor performance depends on the processor's hardware and software environment. The key to maximizing performance lies in the realization that the processor is part of a system that is a collection of components that must be integrated properly. To take advantage of the advanced RISC architecture of the Am29000 microprocessor, equally sophisticated software tools must be available.

The ASM29K™ cross-development toolkit offers such a development environment for creating efficient and portable Am29000 microprocessor software. The package consists of the assembler, the linker, the floating-point emulation routines, and the object module librarian.
These tools allow users to design more efficient systems and applications than ever before.

Cross-development is the design of an application program on one computer (the host system) and the execution of that same application program on a different computer (the target system). The operating system on the host, such as UNIX™ or DOS, provides the tools needed to create the application program. These tools include editors for writing the source code, compilers and assemblers for translating the modules into executable code, and utilities for preparing the application for execution. The Am29000 microprocessor-based target computer generally does not provide the tools required to develop the application program. Figure 1 shows the path that an application follows from development on the host system to execution on the target system.

[Block diagram not reproduced: Host Computer connected to Target Computer (Am29000 Microprocessor) via On-Board Monitor or ADAPT29K Debugger.]
Figure 1. Cross Software Development

Publication # 10292  Rev. B  Amendment /0  Issue Date: September 1989

The ASM29K cross-development toolkit transforms a PC or Sun-3 workstation host into a powerful software development environment. ASM29K software assembles user source and produces a relocatable object module. This module can be combined with other relocatable object modules (derived from the assembler or high-level language cross-compilers) using the ASM29K linker. Library modules prepared by the librarian can be linked in at this point as well. The resulting absolute object module then can be downloaded to a target system.

AMD has established and published the Am29000 microprocessor Common Object File Format (COFF) to which all Am29000 development tools conform. The AMD COFF format extends the already standard AT&T COFF format to support source-level debugging and other Am29000 microprocessor-specific features.
Similarly, AMD has established a common calling convention that maximizes performance on the Am29000 microprocessor as well as defining another standard for software vendors. This has led to a variety of compilers, assemblers, debuggers, and associated tools that may be mixed freely by developers of Am29000 microprocessor software.

The contents of the ASM29K cross-development toolkit include:

• ASM29K macro assembler
• ASM29K linker
• ASM29K librarian
• Hex utilities
• IEEE floating-point emulation routines
• Documentation

ORDERING INFORMATION

Licensing

The ASM29K cross-development toolkit is licensed through AMD's Standard End-User Software License Agreement (Boxtop). This license does not require a signature; breaking the seal on the software envelope indicates acceptance of the license terms. If changes are required to the license agreement, they can be arranged through your AMD sales representative.

Many software products require the customer to provide a CPU ID number when ordering the product. Contact your sales representative if this information is not available at the time of purchase. In addition, terms of the license require the customer to complete a Software Warranty card with the serial number and site of the host computer on which the software will reside. This card must be returned to AMD within 30 days of receipt for the warranty to be valid.

Order Numbers

The ASM29K cross-development toolkit is available for several different environments. Documentation can be ordered separately.
The order number (valid combination) is formed as a combination of:

• Product Family
• Product Category
• Product Identifier
• License Type
• Host/OS Type
• Media Type

The fields, in the order they appear in the order number, are:

  Product Family       AM29000 = Am29000 Microprocessor
  Product Category     SW/ = Software Product
                       DC/ = Documentation Product
                       MN  = Maintenance Agreement
  Product Identifier   ASM = ASM29K Cross-Development Toolkit
  License Type         B   = Boxtop
                       S   = Signed
                       "-" = Not Applicable
  Host/OS Type         07  = Sun-3
                       10  = PC-AT
  Media Type           08  = 0.25" Sun cartridge tape, TAR format
                       14  = 3.5" DSHD floppies
                       21  = 9-track, 1600 BPI mag tape, TAR format
                       24  = 5.25" DSHD floppies

Valid Combinations

Valid Combinations list configurations planned to be supported in volume for this device. Consult the local AMD sales office to confirm availability of specific valid combinations and to check on newly released combinations.

  Order Number         Product                Host     Media
  AM29000SW/ASMB0708   ASM29K Toolkit         Sun-3    0.25" cartridge tape, TAR format
  AM29000SW/ASMS0708   ASM29K Toolkit         Sun-3    0.25" cartridge tape, TAR format
  AM29000SW/ASMB0721   ASM29K Toolkit         Sun-3    9-track, 1600 BPI tape, TAR format
  AM29000SW/ASMS0721   ASM29K Toolkit         Sun-3    9-track, 1600 BPI tape, TAR format
  AM29000SW/ASMB1014   ASM29K Toolkit         PC-AT    3.5" DSHD floppies
  AM29000SW/ASMS1014   ASM29K Toolkit         PC-AT    3.5" DSHD floppies
  AM29000SW/ASMB1024   ASM29K Toolkit         PC-AT    5.25" DSHD floppies
  AM29000SW/ASMS1024   ASM29K Toolkit         PC-AT    5.25" DSHD floppies
  AM29000DC/ASM-99     ASM29K Documentation   UNIX     Not Media Specific
  AM29000MNASM-07      ASM29K Maintenance     Sun-3    Not Media Specific
  AM29000MNASM-10      ASM29K Maintenance     PC-AT    Not Media Specific

FUNCTIONAL INFORMATION

Assembler

The ASM29K assembler converts user-written Am29000 assembly code into relocatable object modules. It produces standard COFF object modules that can be linked with other assembled or compiled modules. Its advanced features permit the design of well-structured modules that are easily maintained.
The assembler processes Am29000 microprocessor instructions as defined in Chapter 8 of the Am29000 User's Manual. Each instruction mnemonic and register identifier is recognized in both upper and lower case. Identifiers (that is, user-named variables) can have up to 63 characters, all of which are significant. Integer, character, string, and floating-point constants are supported, as well as complex expression analysis.

In addition to the Am29000 microprocessor instructions, the assembler supports a powerful macro facility. Programmers can define macros with multiple parameters and direct macros to be repeated a specified number of times. Macro code is inserted into the source code at the position of the macro call. Macros may use local labels (labels that are visible only within the macro itself) to label an instruction that can be copied several times throughout the program. Local labels are distinguished from regular labels by the format $n, where n can be from one to six digits.

The assembler also provides a number of directives for organizing the code into efficient sections or modules. The include directive merges separate files during assembly. The section directive assigns areas of code to named text, data, uninitialized memory, or initialized memory sections. Conditional assembly is also supported; this feature allows the programmer to assemble code conditionally for debugging. The assembler directives are listed in Table 1.

The ASM29K software also produces a cross-reference table for symbols. Flags allow the programmer to print listings that contain expanded macros, instructions not assembled due to conditional statements, and symbol tables, and to insert user-specified headers into the listing.

The assembler optionally emits debug information for use with the XRAY29K™ source-level debugger. This information allows the programmer to specify the symbolic names of variables and labels during debugging sessions.
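The two lexical limits quoted above (local labels of the form $n with one to six digits, and identifiers of up to 63 significant characters) can be sketched as simple checks in C. This is only an illustration of the stated rules: the function names are hypothetical and are not part of ASM29K, and because this data sheet does not give the identifier character set, only the length is checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Limits taken from the text above. */
#define MAX_LOCAL_LABEL_DIGITS 6
#define MAX_IDENTIFIER_LENGTH  63

/* Hypothetical helper (not an ASM29K routine): returns 1 if s has the
 * local-label form "$n", where n is one to six decimal digits. */
int is_local_label(const char *s)
{
    size_t digits = 0;

    assert(s != NULL);
    if (*s != '$')
        return 0;
    for (s++; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;               /* anything but a digit disqualifies */
        digits++;
    }
    return digits >= 1 && digits <= MAX_LOCAL_LABEL_DIGITS;
}

/* Hypothetical helper: returns 1 if the identifier is non-empty and no
 * longer than 63 characters. The allowed character set is not checked. */
int identifier_length_ok(const char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    return len >= 1 && len <= MAX_IDENTIFIER_LENGTH;
}
```

Under these assumptions, is_local_label("$42") and is_local_label("$123456") return 1, while "$", "$1234567", and "loop" are rejected.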
The wide selection of features available in the ASM29K assembler gives the user the latest tools to produce well-structured and maintainable code.

Linker

The ASM29K linker integrates a group of separately compiled or assembled modules into a composite module in which all references between modules are resolved. It processes and produces COFF modules, including any module produced by a compiler in any language and any assembler that adheres to the AMD-defined COFF and calling-convention standards. Incremental linking is also supported.

The ASM29K linker produces an extensive load map with an optional symbol cross-reference table. Object module libraries are searched, with required modules automatically included. All code and data sections are given absolute addresses as specified by the programmer.

The linker provides options that create ROMable programs, generate warnings for possible undefined external references, produce a global cross-reference, and list defined symbols. Directives to the linker may be included in a file (batch mode), on the command line, or in combination.

Programmers can use the ASM29K linker to:

- Resolve external references between separately compiled or assembled modules.
- Assign absolute addresses.
- Direct section ordering.
- Perform incremental linking.
- Load only those library modules referenced, for efficient code space use.
- Optionally generate ROMable programs.

Librarian

The ASM29K librarian is a management facility for organizing independently developed pieces of software into logical units. It permits the addition, deletion, and replacement of object modules in one or more libraries.

The ASM29K librarian:

- Organizes and initializes modules into a library file.
- Lists library contents and information.
- Lists a library directory.

Table 1. Assembler Directives

  Group / Directive   Meaning

  File Processing
    .end              End of Assembly
    .err              Generate Assembly Error
    .ident            Specify Module Name
    .include          Include Text File

  Conditional Assembly
    .else             Alternate Condition
    .endif            End of Conditional Assembly Block
    .if               Assemble if Value is Not Zero
    .ifdef            Assemble if Identifier is Defined
    .ifeqs            Assemble if Strings are Equal
    .ifnes            Assemble if Strings are Not Equal
    .ifnotdef         Assemble if Identifier is Not Defined

  Listing Control
    .eject            Advance to Top of Page
    .lflags           Set Listing Flags
    .list             Enable Listing
    .nolist           Disable Listing
    .print            Print to Standard Output
    .sbttl            Set the Listing Subtitle
    .space            Space N Lines
    .title            Set the Listing Title

  Symbol Declaration
    .equ              Equate a Symbol to a Value (Unlimited Scope)
    .extern           Declare Symbols as External to This Module
    .global           Make Symbols Visible to Other Modules
    .reg              Declare a Symbol as a Synonym for a Register
    .set              Set a Symbol to a Value (Limited Scope)

  Section Declaration
    .comm             Declare a Common Symbol
    .data             Use the .data Section
    .dsect            Declare a Dummy Section
    .lcomm            Declare a Local bss Symbol
    .sect             Declare a New Section
    .text             Use the .text Section
    .use              Use a Declared Section

  Data Storage Declaration
    .align            Specify Byte Alignment
    .ascii            Store the String
    .block            Reserve Bytes
    .byte             Initialize Bytes
    .double           Initialize Double-Precision Values
    .extend           Initialize Extended-Precision Values
    .float            Initialize Single-Precision Values
    .hword            Initialize Half-Words
    .word             Initialize Words

  Repeat Block
    .endr             End of Repeat Block
    .irep             Repeat for Each Item in the List
    .irepc            Repeat for Each Character in the String
    .rep              Repeat N Times

  Macro Definition
    .endm             End Macro Definition
    .exitm            Terminate Macro Expansion
    .macro            Macro Heading
    .purgem           Purge All Macros Listed

  High-Level Language (HLL) Debugging
    .def              Define Symbol Table Entry Directive
    .dim              Dimensions of an Array Attribute
    .endef            End of Symbol Definition Block Directive
    .file             Source Filename Directive
    .line             Source-File Line-Number Directive
    .ln               HLL Source-File Line-Number Directive
    .scl              Storage Class of a Symbol Attribute
    .size             Size of a Symbol Attribute
    .tag              Structure, Union, or Enumeration Identifier Attribute
    .type             Basic and Derived Type of a Symbol Attribute
    .val              Value of a Symbol Attribute

Floating-Point Emulation

The Am29000 microprocessor instruction set includes floating-point and integer math operations. In the current processor implementation, these instructions cause traps to routines that perform the operations. The user is provided with source to two complete sets of routines that emulate IEEE Floating-Point Standard 754 for each of the instructions listed in Table 2.

The first set of routines is provided for users who have integrated an Am29027™ arithmetic accelerator into their systems. The Am29000 microprocessor math instructions are emulated using the Am29027 coprocessor.

The second set of routines implements emulation of the floating-point operations entirely in software. No special hardware is required.

Documentation instructs users how to integrate the package into their target system. Both packages are designed to ensure upward compatibility with next-generation processors.

Table 2. Arithmetic Instructions

  Type / Mnemonic     Operation

  Integer Arithmetic
    MULTIPLY          Signed Multiply
    MULTIPLYU         Unsigned Multiply
    DIVIDE            Signed Divide
    DIVIDEU           Unsigned Divide

  Single-Precision Floating-Point Arithmetic
    FADD              Single-Precision Add
    FSUB              Single-Precision Subtract
    FMUL              Single-Precision Multiply
    FDIV              Single-Precision Divide

  Double-Precision Floating-Point Arithmetic
    DADD              Double-Precision Add
    DSUB              Double-Precision Subtract
    DMUL              Double-Precision Multiply
    DDIV              Double-Precision Divide

  Floating-Point Compare
    FEQ               Single Compare Equal To
    DEQ               Double Compare Equal To
    FGT               Single Compare Greater Than
    DGT               Double Compare Greater Than
    FGE               Single Compare Greater Than Or Equal To
    DGE               Double Compare Greater Than Or Equal To

  Data Format Conversion
    CONVERT           Convert Data Format

Hex Utilities

A set of hex utilities is provided to create hex files for downloading into target systems and for creating ROM images. These tools convert AMD standard COFF files into Motorola® S-Record or Tektronix® Extended Hex files. The utilities and a brief description of each are listed below.

• btoa       Converts a binary file into an ASCII file.
• coff2hex   Converts a COFF file into a hex file.
• sim29      ASM29K software architectural simulator.
• nm29       Prints the name list of a COFF file.
• romcoff    Generates a COFF file for ROM.
• cvcoff     Translates Am29000 microprocessor COFF files between big-endian and little-endian hosts.
• strpcoff   Strips symbolic information from a COFF file.

WARRANTY and SUPPORT

Software Warranty

Software programs licensed by AMD are covered by the warranty and patent indemnity provisions appearing in AMD's standard software license forms. AMD makes no warranty, express, statutory, implied or by description, regarding the information set forth herein or regarding the freedom of the described software program from patent infringement. AMD reserves the right to modify, change or discontinue the availability of this software program at any time and without notice.
Customer Support

Maintenance

All orderable software products include one year of free Maintenance Support, which starts from the date of original purchase. Maintenance Support allows customers to receive technical assistance from highly trained field and factory personnel, to use a call-in on-line information system, and to receive product and documentation updates at no additional charge. Customers may extend Maintenance Support in one-year increments. Customers can access support services by calling the 24-hour, toll-free 29K™ Family hotline at (800) 2929-AMD (292-9263).

On-Line Call-In Bulletin Board

In addition to the support engineering staff, AMD offers a 24-hour on-line technical support center. The customer can call (800) 2929-AMD at any time to query the system for the latest information on a particular product: bug fixes, work-arounds, information on upcoming releases, etc. Messages may be left for the support engineering staff during "after hours."

Training Classes

AMD offers training classes for the 29K Family products. These classes focus on 29K Family system design and implementation using the broad range of AMD software development tools. Customers can shorten the development process through extensive hands-on training covering a variety of topics. Contact your local AMD field office for more information on training classes.

Fusion29K Program

AMD encourages broad-based development and support for the Am29000 microprocessor with the Fusion29K™ program, a joint-effort program between AMD and third-party developers. Published twice a year, the Fusion29K program catalog reveals the breadth of development and system solutions for the 29K Family, including software generation and debug tools; hardware development tools; executive, kernel and multi-user operating systems; board-level products; silicon products; and more. For a copy of the Fusion29K program catalog, call your local AMD field sales office or the literature center at (800) 222-9323.
HighC29K
Cross-Development Toolkit, Release 2
Advanced Micro Devices

DISTINCTIVE CHARACTERISTICS

• Efficient, globally optimizing C compiler technology developed by MetaWare™, Inc.
• ANSI Standard C support and conformance verification (ANSI document X3J11/88-159, December 7, 1988) and compile-time error checking.
• Compiler supports load scheduling and delayed branch optimizations to promote fast Am29000™ microprocessor code execution.
• Compiler supports AMD's Am29027™ Arithmetic Accelerator.
• Full ANSI standard run-time library of over 100 functions, including all standard I/O routines (stdio).
• Special library of high-performance transcendental functions.
• IEEE software floating-point emulation functions accessible from C and assembly language modules.
• HighC29K™ toolkit includes the entire ASM29K™ Cross-Development Toolkit. The ASM29K package contains:
  - Relocatable macro assembler that supports the complete Am29000 microprocessor instruction set.
  - Linker/loader that combines separately compiled or assembled modules by resolving external references and by searching libraries.
  - Librarian that provides a management facility for organizing modules into logical collections of functions.
  - Full architectural simulator of the Am29000 microprocessor with user-defined memory access times. Allows designers to obtain price/performance statistics for their particular Am29000 microprocessor design.
• Available for the PC-AT™ and Sun-3™ development environments.

GENERAL DESCRIPTION

Processor performance depends on the processor's hardware and software environment. The key to maximizing performance lies in the realization that the processor is part of a system, which is a collection of components that must be properly integrated. To take advantage of the advanced RISC architecture of the Am29000 microprocessor, equally sophisticated software tools must be available to achieve this integration.
The HighC29K™ Cross-Development Toolkit offers such a development environment for creating efficient and portable software for the 29K™ Family. The package consists of the full ANSI standard, optimizing C compiler, run-time libraries, assembler, linking loader, floating-point emulation, and object module librarian. These tools allow users to design more efficient systems and applications.

Cross-development is the design of an application program on one computer (the host system) and the execution of that same application program on a different computer (the target system). The operating system on the host, such as UNIX or DOS, provides the tools needed to create the application program. These tools include editors for writing the source code, compilers and assemblers for translating the modules into executable code, and utilities for preparing the application for execution. The Am29000-based target computer generally does not provide the tools required to develop the application program. Figure 1 shows the path that an application follows from development on the host system to execution on the target system.

The HighC29K Cross-Development Toolkit transforms a PC or Sun workstation host into a powerful software development environment. The HighC29K cross-compiler generates 29K Family relocatable object modules, which can be combined with other relocatable object modules derived from the assembler or HighC29K compiler using the 29K Family linker/loader. Library modules prepared by the librarian can be linked in at this point as well. The resulting absolute object module can then be downloaded to a target system.

AMD has established and published the 29K Family Common Object File Format (COFF) to which all 29K Family development tools conform. The AMD COFF format extends the already standard AT&T COFF format to support source-level debugging and other 29K Family-specific features. Similarly, AMD has established a common calling convention that maximizes performance on the 29K Family of microprocessors as well as defining standards for software vendors. This has led to a variety of compilers, assemblers, debuggers, and associated tools that may be mixed freely by developers of 29K Family software.

Publication # 10957  Rev. B  Amendment /0  Issue Date: September 1989

The contents of the HighC29K Cross-Development Toolkit include:

HighC29K:
• Optimizing C Compiler
• Documentation
• Function Libraries

ASM29K (included in HighC29K Development Package):
• Relocatable Macro Assembler
• Documentation
• Architectural Simulator
• Linker/Loader
• Librarian
• IEEE Floating-Point Emulation Routines
• Utilities

[Block diagram not reproduced: Host Computer connected to Target Computer (Am29000 Microprocessor) via On-Board Monitor or ADAPT29K Debugger.]
Figure 1. Cross Software Development

ORDERING INFORMATION

Licensing

The HighC29K Cross-Development Toolkit is licensed through AMD's Standard End-User Software License Agreement (Boxtop). This license does not require a signature; breaking the seal on the software package indicates acceptance of the license terms. If changes are required to the license agreement, they can be arranged through your AMD sales representative.

Many software products require the customer to provide a CPU ID number when ordering the product. Contact your sales representative if this information is not available at time of purchase. In addition, terms of the license require the customer to complete a Software Warranty card with the serial number and site of the host computer on which the development package will reside. This card must be returned to AMD within 30 days of receipt for the warranty to be valid.

Order Numbers

The HighC29K Cross-Development Toolkit is available for several different environments. Documentation can be ordered separately.
The order number (Valid Combination) is formed as a combination of:

• Product Family
• Product Category
• Product Identifier
• License Type
• Host/OS Type
• Media Type

Order number pattern: AM29000 SW/ HCC B ## ##

  Product Family       AM29000 = Am29000 Microprocessor
  Product Category     SW/ = Software Product
                       DC/ = Documentation Product
                       MN  = Maintenance Agreement
  Product Identifier   HCC = HighC29K Cross-Development Toolkit
  License Type         B   = Boxtop
                       S   = Signed
                       "-" = Not Applicable
  Host/OS Type         07  = Sun-3
                       10  = PC-AT
                       99  = Not Host Specific
  Media Type           08  = 0.25" Sun cartridge tape, TAR format
                       14  = 3.5" DSHD floppies
                       21  = 9-track, 1600 BPI mag tape, TAR format
                       24  = 5.25" DSHD floppies

Valid Combinations

Valid Combinations list configurations planned to be supported in volume for this device. Consult the local AMD sales office to confirm availability of specific valid combinations and to check on newly released combinations.

  Order Number         Product                  Host                Media
  AM29000SW/HCCB0708   HighC29K Toolkit         Sun-3               0.25" cartridge tape, TAR format
  AM29000SW/HCCS0708   HighC29K Toolkit         Sun-3               0.25" cartridge tape, TAR format
  AM29000SW/HCCB0721   HighC29K Toolkit         Sun-3               9-track, 1600 BPI tape, TAR format
  AM29000SW/HCCS0721   HighC29K Toolkit         Sun-3               9-track, 1600 BPI tape, TAR format
  AM29000SW/HCCB1014   HighC29K Toolkit         PC-AT               3.5" DSHD floppies
  AM29000SW/HCCS1014   HighC29K Toolkit         PC-AT               3.5" DSHD floppies
  AM29000SW/HCCB1024   HighC29K Toolkit         PC-AT               5.25" DSHD floppies
  AM29000SW/HCCS1024   HighC29K Toolkit         PC-AT               5.25" DSHD floppies
  AM29000DC/HCC-99     HighC29K Documentation   Not Host Specific   Not Media Specific
  AM29000MA/HCC-07     HighC29K Maintenance     Sun-3               Not Media Specific
  AM29000MA/HCC-10     HighC29K Maintenance     PC-AT               Not Media Specific

FUNCTIONAL INFORMATION

Compiler

The HighC29K cross-compiler supports an extended version of the C language designed for professional programmers.
It includes a full ANSI implementation for portable applications, yet also gives the user access to the best features of other languages, such as nested functions from Pascal and named parameter association from Ada. Extensions to the C language also are supported, such as range notation in case statements and enumerated data types. The compiler allows users to create re-entrant procedures and to generate code that is efficient in both space and execution speed.

The HighC29K cross-compiler facilitates program development for dedicated or stand-alone Am29000 designs. The compiler generates optimized, sharable code that takes full advantage of the Am29000 instruction set. The language contains a variety of control statements, data types, and predeclared procedures and functions that promote the development of well-structured programs. For example, the user may specify the parameter types for external functions so that the compiler can check that arguments are passed correctly.

The HighC29K cross-compiler generates 29K Family object modules directly. The HighC29K compiler optionally generates information necessary for symbolic debugging at the C or assembly level with XRAY29K™, AMD's source-level debugger for the 29K Family. The compiler preprocessor allows the user to define macros, merge files into source, and conditionally include or exclude code.

Optimization

As a highly optimizing cross-compiler, HighC29K software ensures the generation of fast, compact code by using advanced optimization techniques, including common subexpression elimination, loop invariant analysis, global register allocation, and automatic allocation of variables to registers. Many of the optimizations are particularly effective when using the unique features of the Am29000 microprocessor architecture. For example, its large register set means passing parameters in registers is more effective on the Am29000 microprocessor than on any other microprocessor.
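Two of the techniques named above, common subexpression elimination and loop invariant analysis, can be illustrated at the source level. The sketch below is not HighC29K output; it shows a function as a programmer might write it, followed by a hand-transformed version of the kind an optimizing compiler may produce internally. The function names and the arithmetic are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* As written: scale * offset is recomputed on every iteration even though
 * it never changes inside the loop, and a[i] * b[i] is evaluated twice. */
long sum_naive(const int *a, const int *b, size_t n, int scale, int offset)
{
    long sum = 0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        sum += (long)(a[i] * b[i]) + (long)(a[i] * b[i]) * (scale * offset);
    return sum;
}

/* Hand-transformed equivalent: the loop-invariant product is hoisted out
 * of the loop, and the common subexpression is computed once per pass. */
long sum_hand_optimized(const int *a, const int *b, size_t n,
                        int scale, int offset)
{
    long sum = 0;
    const int k = scale * offset;       /* loop invariant, hoisted */
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        const long p = a[i] * b[i];     /* common subexpression */
        sum += p + p * k;
    }
    return sum;
}
```

Both functions return the same value for the same inputs; the second simply performs fewer multiplications per iteration, which is the effect these optimizations aim for without requiring the programmer to restructure the source.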
Optimizations specifically developed for the Am29000 RISC microprocessor architecture are also performed, such as load scheduling for maximum instruction throughput. Additionally, the compiler makes extensive use of the Am29000 microprocessor's large register file as a stack cache to store frequently accessed values. The optimizations performed include:

• Common subexpression elimination
• Retention/reuse of register contents
• Automatic allocation of variables to registers
• Dead code elimination and cascaded jumps
• Cross jumping (tail merging)
• Constant folding
• Switch statements optimally encoded using an in-line branch table, binary search, or linear search
• Global flow analysis leading to removal of loop-invariant values
• Load scheduling
• Delayed branch

Several of these optimizations are explained below:

Loop Invariant Analysis: Computations inside a loop whose values do not change within the loop can be moved outside the loop. The value is stored in a register for optimum access. Since an application may spend as much as 90% of its time executing loops, this optimization produces a significant gain in performance.

Fold Constants: Operands that are constant can often be folded into a single constant, or into a temporary value. If constants are defined at compile time, the compiler can reduce them to a single value.

Load Scheduling: The Am29000 microprocessor supports overlapped load and store capabilities to decrease delays incurred while waiting for data. The compiler recognizes when certain instructions can be advanced in the pipeline for efficient operation.

Delayed Branch: The Am29000 microprocessor branch instruction is delayed by one cycle to allow the processor pipeline to achieve maximum throughput. The instruction following the branch instruction, called the delay instruction, is executed whether the branch is taken or not. In most cases, the compiler can place a useful instruction (that is, an instruction other than a NO-OP) as the delay instruction by reorganizing the code.

Data Types

The single addressing mode of the Am29000 microprocessor combines with high-level language implementations to provide efficient access to all data types.

Data Type          Size (Bits)
int                32
long int           32
pointer            32
short int          16
char               8
float              32
double             64
unsigned           32
unsigned char      8
unsigned short     16
enum (default)     32
enum (option)      8, 16, 32

Am29027 Arithmetic Accelerator Support

Target systems that include the Am29027 Arithmetic Accelerator for high-speed computations are directly supported through the compiler. Users may direct the compiler to generate in-line code to access the control and instruction registers of the accelerator. Versions of the libraries that assume direct use of the Am29027 microprocessor are included. Alternatively, the user can direct the compiler to generate Am29000 microprocessor floating-point instructions, which are used in conjunction with the IEEE Floating-Point Emulation Routines to access the accelerator.

The HighC29K Cross-Development Toolkit includes AMD's entire ASM29K Cross-Development Toolkit. Details of this package are contained in the ASM29K Cross-Development Toolkit data sheet (order #10292).

Function Libraries

The HighC29K toolkit includes three different sets of function libraries that enhance the functionality of the compiler. The library sets comprise:

• the ANSI standard library, which provides the full set of functions specified by the ANSI C language standard
• a library of routines implementing the floating-point environment functions specified in the IEEE-754 standard
• a library of hand-coded transcendental functions optimized for use with the Am29000/Am29027 microprocessor combination

Each library set contains several versions of the library, which reflect the different possible target environments. The compiler driver selects the proper version of the library based on the compile-time options specified.

ANSI Standard Library

This library contains the full functionality specified by the ANSI standard for the C language (X3J11/88-159, December 1988). At the lowest level, the library functions interface with HIF (Host Interface), a small kernel system defined by AMD. HIF is supported in all AMD products, and is defined in the HighC29K toolkit manual for the customer who needs to adapt to a different environment.

The functions included in the ANSI Standard Library are:

Mathematical Routines
  atan2 abs ceil fabs floor log log10 sinh frexp exp ldexp pow sin tanh modf tan atan sqrt asin cosh acos cos fmod

Memory Allocation
  calloc free malloc realloc

Standard Formatted I/O
  fprintf printf sprintf vfprintf vsprintf fscanf sscanf vprintf _setmode scanf

Standard File I/O
  fopen fclose fflush freopen remove rename setbuf setvbuf tmpfile tmpnam

Character Routines
  isalnum iscntrl isgraph isxdigit toupper isalpha ispunct isupper tolower isprint isdigit isspace islower

Character I/O Routines
  fgetc fputc getc ungetc fgets fputs gets getchar putchar putc puts

String Routines
  memchr strcat _strncat memcmp strxfrm memcpy _rmemcpy memmove _rstrcpy memset _strcats strncpy strerror strlen strncat strncmp strcspn strchr strcmp strcoll strcpy strtok strpbrk strrchr strspn strstr

Direct I/O Routines
  fgetpos fread fseek rewind fsetpos ftell fwrite

General Routines
  abort atol strtoul atexit srand system exit strtod mblen qsort getenv bsearch atoi wcstombs strtol mbstowcs labs div atof wctomb rand mbtowc ldiv onexit

Date and Time Routines
  asctime ctime gmtime strftime time clock localtime mktime difftime

Miscellaneous Routines
  assert ferror localeconv perror setjmp signal va_end clearerr kill longjmp raise setlocale va_arg va_start feof

Floating-Point Environment Library

The functions included in the Floating-Point Environment Library are:

  class rclass copysign rcopysign finite rfinite isnan risnan logb rlogb nextafter rnextafter remainder rremainder scalb rscalb unordered runordered

Fast Transcendental Library

This library provides special hand-coded versions of the standard transcendental functions. These functions are optimized for performance with the Am29000/Am29027 microprocessor combination. The functions included are:

  atan cos exp sin sqrt tan log pow

Floating-Point Emulation

The Am29000 microprocessor's instruction set includes floating-point and integer math operations. In the simplest processor implementation, these instructions cause traps to routines that perform the operations. The user is provided with source to two complete sets of routines that emulate IEEE Floating-Point Standard 754 for each of the instructions listed below.

The first set of trap handlers is provided for users who have integrated the Am29027 arithmetic accelerator into their systems. The Am29000 microprocessor math instructions are performed using the Am29027 microprocessor.

The second set of trap handlers implements emulation of the floating-point operations entirely in software. No special hardware is required.

Documentation instructs users how to integrate the package into their target system. Both packages are designed to ensure upward compatibility with future-generation processors. The floating-point routines are accessible from both the assembler and the compiler.

To eliminate the overhead incurred by using the trap handlers, direct code generation (in-line coding) of Am29027 microprocessor floating-point operations is an included option of the HighC29K Cross-Development Toolkit.
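The Loop Invariant Analysis and Fold Constants optimizations described earlier in this section can be illustrated with a small source-level sketch. It is written in portable C and is not HighC29K output; the function and macro names are hypothetical, chosen only for illustration. The "hoisted" function shows by hand the transformation an optimizing compiler would be expected to apply to the "naive" one.

```c
#include <assert.h>
#include <stddef.h>

/* Fold Constants: the whole expression is made of compile-time constants,
 * so a compiler can reduce it to the single value 86400. */
#define SECONDS_PER_DAY (24L * 60L * 60L)

/* Hypothetical example: converts days to seconds using the folded constant. */
long days_to_seconds(long days)
{
    return days * SECONDS_PER_DAY;
}

/* As the programmer might write it: scale * offset is recomputed on
 * every pass even though neither operand changes inside the loop. */
long sum_with_bias_naive(const int *v, size_t n, int scale, int offset)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += (long)v[i] + scale * offset;
    return total;
}

/* Loop Invariant Analysis applied by hand: the invariant product is
 * computed once before the loop and kept in a local, which a compiler
 * would normally hold in a register. */
long sum_with_bias_hoisted(const int *v, size_t n, int scale, int offset)
{
    const int invariant = scale * offset;
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += (long)v[i] + invariant;
    return total;
}
```

Both summing functions return the same result for the same arguments; the second simply performs one multiplication instead of one per iteration.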
Am29000 Microprocessor Floating-Point Instructions

Mnemonic   Operation
CONVERT    Convert values between types Integer, Float, and Double
FEQ        Compare Floats Equal
DEQ        Compare Doubles Equal
FGT        Compare Floats Greater Than
DGT        Compare Doubles Greater Than
FGE        Compare Floats Greater Than or Equal
DGE        Compare Doubles Greater Than or Equal
FADD       Float Add
DADD       Double Add
FSUB       Float Subtract
DSUB       Double Subtract
FMUL       Float Multiply
DMUL       Double Multiply
FDIV       Float Divide
DDIV       Double Divide

Utilities

A set of utilities is provided to work with the output files produced by the development tools. They allow the user to prepare output files for downloading into target systems and to create ROM images. The utilities include:

• coff2hex: Converts Am29000 microprocessor COFF files to Motorola® S-record or Extended Tektronix® Hex files.
• romcoff: Allows creation of ROM images from Am29000 microprocessor COFF files.
• cvcoff: Translates Am29000 microprocessor COFF files between big-endian and little-endian hosts.
• strpcoff: "Strips" symbolic information from an executable COFF file.

MAINTENANCE AND SUPPORT

Software Warranty

Software programs licensed by AMD are covered by the warranty and patent indemnity provisions appearing in AMD's standard Software License Forms. AMD makes no warranty, express, statutory, implied, or by description, regarding the information set forth herein or regarding the freedom of the described software program from patent infringement. AMD reserves the right to modify, change, or discontinue the availability of this software program at any time and without notice.

Customer Support

All orderable software products include one year of free maintenance support, which starts from the date of original purchase. Maintenance support allows customers to receive technical assistance from highly trained field and factory personnel, to use a call-in on-line information system, and to receive product and documentation updates at no additional charge.
Customers may extend maintenance support in one-year increments. Customers can access support services by calling the 24-hour, toll-free 29K Family hotline at (800) 2929-AMD (292-9263).

On-Line Call-In Bulletin Board

In addition to the support engineering staff, AMD offers a 24-hour on-line technical support center. The customer can call (800) 2929-AMD at any time to query the system for the latest information on a particular product: bug fixes, work-arounds, and information on upcoming releases. Messages may be left for the support engineering staff during "after hours."

Training Classes

AMD offers training classes for the 29K Family products. These classes focus on 29K Family system design and implementation using the broad range of AMD software development tools. Customers can shorten the development process through extensive hands-on training covering a variety of topics. Contact your local AMD field sales office for more information on training classes.

Fusion29K Program

AMD encourages broad-based development and support for the Am29000 with the Fusion29K™ program, a joint-effort program between AMD and third-party developers. A bi-annual Fusion29K program catalog reveals the breadth of development and system solutions for the 29K Family, including software generation and debug tools; hardware development tools; executive, kernel, and multi-user operating systems; board-level products; silicon products; and more. For a copy of the Fusion29K program catalog, call your local AMD field sales office or the literature center at (800) 222-9323.
MON29K Target Resident Debug Monitor

Advanced Micro Devices

DISTINCTIVE CHARACTERISTICS

• Provides local control of an Am29000™ microprocessor-based system
• Provides eight breakpoints plus single- and multiple-instruction stepping
• Interfaces to the XRAY29K™ Source-Level Debugger
• Allows selection of user-defined displays after each breakpoint or single step
• Allows modification and display of memory, registers, and I/O ports
• Provides in-line assembler and disassembler
• Supports modification and display of special-purpose registers by group
• Supports downloading of COFF and hex files from remote systems
• Allows access to both user- and system-level code
• Provided in source form (C and Am29000 microprocessor assembly) to simplify installation of I/O devices
• Supports the AMD Am29027™ Arithmetic Accelerator
• Offers familiar user interface, similar to DEBUG on IBM® PC
• Allows modification and display of Am29027 microprocessor registers

GENERAL DESCRIPTION

The Target Resident Debug Monitor (MON29K™) resides on Am29000 microprocessor-based hardware. It provides all the control a designer needs to load, execute, and debug Am29000 microprocessor programs. MON29K software is provided in source form so its I/O drivers and service routines can be modified easily, which allows MON29K software to be customized for various hardware configurations.

MON29K software provides the ability to set breakpoints, to set and display memory and registers, to read and write I/O ports, to trace execution in single or multiple steps, and to download files from a remote host. MON29K software is controlled by either an ASCII terminal or a host computer connected to a serial port on the target system.

MON29K software supports high-level language debugging through XRAY29K, the Am29000 microprocessor source-level debugger. In addition to its own standard command set, the XRAY29K debugger supports all the MON29K software commands.
The MON29K product includes:

• MON29K source code
• Documentation

Rev. B, Amendment /0. Issue Date: September 1989

ORDERING INFORMATION

Licensing

The MON29K Resident Monitor is licensed through AMD's Standard End-User Software License Agreement (Boxtop). This license does not require a signature; breaking the seal on the product package indicates acceptance of the license terms. If changes are required to the license agreement, they can be arranged through your AMD sales representative.

Many software products require the customer to provide a CPU ID number when ordering the product. Contact your sales representative if this information is not available at the time of purchase.

In addition, terms of the license require the customer to complete a Software Warranty card with the serial number and site of the host computer on which the resident monitor source will reside. This card must be returned to AMD within 30 days of receipt for the warranty to be valid.

Order Numbers

MON29K software executes on Am29000 microprocessor-based systems but is distributed in machine-readable source form for several hosts. Thus, media type is the only distinguishing characteristic when ordering MON29K software. Documentation can be ordered separately.

The order number (Valid Combination) is formed as a combination of the following fields, in this order:

• Product Family
    AM29000 = Am29000 Microprocessor
• Product Category
    SW/ = Software Product
    DC/ = Documentation Product
    MA/ = Maintenance Agreement
• Product Identifier
    MON = MON29K Target Resident Debug Monitor
• License Type
    B = Boxtop
    S = Signed
    "-" = Not Applicable
• Host/OS Type
    99 = Not Host Specific
• Media Type
    08 = 0.25" cartridge tape, TAR format
    14 = 3.5" DSHD floppies
    21 = 9-track, 1600 BPI mag tape, TAR format
    24 = 5.25" DSHD floppies

Valid Combinations

Valid Combinations lists configurations planned to be supported in volume for this device. Consult the local AMD sales office to confirm availability of specific valid combinations and to check on newly released combinations.

Part Number          Product                    Host                Media
AM29000SW/MONB9908   MON29K Resident Monitor    Not Host Specific   0.25" cartridge tape, TAR format
AM29000SW/MONS9908   MON29K Resident Monitor    Not Host Specific   0.25" cartridge tape, TAR format
AM29000SW/MONB9914   MON29K Resident Monitor    Not Host Specific   3.5" DSHD floppies
AM29000SW/MONS9914   MON29K Resident Monitor    Not Host Specific   3.5" DSHD floppies
AM29000SW/MONB9921   MON29K Resident Monitor    Not Host Specific   9-track, 1600 BPI tape, TAR format
AM29000SW/MONS9921   MON29K Resident Monitor    Not Host Specific   9-track, 1600 BPI tape, TAR format
AM29000SW/MONB9924   MON29K Resident Monitor    Not Host Specific   5.25" DSHD floppies
AM29000SW/MONS9924   MON29K Resident Monitor    Not Host Specific   5.25" DSHD floppies
AM29000DC/MON-99     MON29K Documentation       UNIX                Not Media Specific
AM29000MA/MON-99     MON29K Maintenance         Not Host Specific   Not Media Specific

FUNCTIONAL DESCRIPTION

MON29K software resides on the target system and interfaces to the user through an ASCII terminal connected to a serial port on the target system. All commands and formatted displays are communicated through this serial link. MON29K software supports simple display formats so that compatibility can be maintained with any CRT.
MON29K software provides program development support at the assembler source level. High-level source code development is provided by the XRAY29K debugger when it is connected to the MON29K monitor. MON29K serves as the target-resident monitor that interrogates memory and registers for the host-resident source-level debugger.

Memory, Register and I/O Addresses

MON29K software supports three address spaces: register, memory, and I/O. Data values are always represented in hex, as are memory and I/O addresses. Register addresses are represented by decimal numbers and grouped as general, local, global, special-purpose, and TLB. Special-purpose and TLB registers can be accessed by register number or by their abbreviated mnemonic. The Special-Purpose Registers section that follows discusses other commands for accessing these registers.

Memory and I/O addresses are assumed to be real because MON29K software has no mechanism for calculating or interpreting virtual addresses. MON29K software allows specification of user and supervisor modes and specification of OPT lines with all memory and I/O addresses.

Displaying Memory and Registers

The Display command shows data for a specified range of addresses, beginning at a specified address or from the currently active address. Each line in the display contains 16 bytes of data. The 16 bytes are displayed as bytes, half-words, words, single-precision floating-point numbers, or double-precision floating-point numbers, depending on the command entered. Floating-point numbers are displayed in decimal format if the value can be represented accurately within the digits available. Otherwise, scientific notation (E format) is used.

Following the numeric data is a string of ASCII characters in which each character corresponds to one byte of data. When no ASCII equivalent exists for the byte of data, a period is displayed. Figure 1 shows examples of memory and register displays.

Altering Memory and Registers

Memory and register contents can be set, filled, or moved.
The set command allows the contents of registers and memory to be examined and optionally changed. One or more values can be set without examining the previous contents. The fill command sets a range of register or memory addresses to a specific value. The move command copies blocks of data from one range of addresses to another. Blocks in the destination address range may overlap blocks in the source address range.

In-Line Assembler/Disassembler

An in-line assembler/disassembler allows the user to examine and change memory using instruction mnemonics rather than hex values. This improves readability and minimizes user effort when entering changes to instruction memory. The lexical conventions and statement syntax used are identical to those of the standard AMD assembler, ASM29K™.

I/O Commands

I/O commands provide simple forms of input and output. They are intended to allow quick examination and simple control of devices. These commands read or write a full word of data to or from a real I/O address.

Special-Purpose Registers

The special-purpose register commands provide another method for accessing the Am29000 microprocessor special-purpose and TLB registers. These registers are organized into groups: Unprotected, Protected, TLB Entries, and Coprocessor. Specific commands are used for examining the contents of registers in each group. Within a group, each register's contents can be examined or changed explicitly.

The large number of registers necessitates special register display screens that clearly present each group's registers. To enhance display efficiency, the single command X is available. It displays the registers most likely to be in use: all the global registers, half the local registers, and all the unprotected registers. Figures 2 and 3 show examples of special-purpose register display screens.
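The display line format described under Displaying Memory and Registers (16 bytes per line, followed by one ASCII character per byte, with a period where no printable equivalent exists) can be sketched as follows. This is an illustrative C routine, not MON29K source; the function name, buffer size, and exact column spacing are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define BYTES_PER_LINE 16
#define DUMP_LINE_MAX  80   /* enough for address, 16 hex bytes, and ASCII */

/* Hypothetical helper: formats one display line in byte format.
 * The address is masked to 32 bits, the Am29000 address width. */
void format_dump_line(unsigned long addr, const unsigned char *bytes,
                      size_t count, char out[DUMP_LINE_MAX])
{
    size_t i;
    int used;

    assert(bytes != NULL && out != NULL && count <= BYTES_PER_LINE);

    /* Address column, in hex as for all memory addresses. */
    used = sprintf(out, "%08lx ", addr & 0xFFFFFFFFUL);

    /* Numeric data: each byte as two hex digits. */
    for (i = 0; i < count; i++)
        used += sprintf(out + used, " %02x", (unsigned)bytes[i]);

    /* ASCII column: one character per byte, '.' if not printable. */
    used += sprintf(out + used, "  ");
    for (i = 0; i < count; i++)
        out[used++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    out[used] = '\0';
}
```

Called with address 0x10000 and the bytes 61 00 62 00 through 68 00, the routine produces a line whose ASCII column reads a.b.c.d.e.f.g.h., matching the pattern of the memory display in Figure 1.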
  #dw LR4,LR11
  LR004  61006200 63006400 65006600 67006800
  LR008  69006a00 6b006c00 6d006e00 6f007000
  #
  #DB 100001, 1001FI
  000100001  61 00 62 00 63 00 64 00 65 00 66 00 67 00 68 00  a.b.c.d.e.f.g.h.
  000100101  69 00 6a 00 6b 00 6c 00 6d 00 6e 00 6f 00 70 00  i.j.k.l.m.n.o.p.

Figure 1. Register and Memory Display

[Figure 2. Protected Register Group Display: output of the xp command, showing the contents and individual fields of the protected special-purpose registers, including VAB, OPS, CPS, CFG, CHA, CHD, CHC, RBP, PC0, PC1, PC2, and MMU.]

Downloading

Downloading controls the transmission of data from a remote system to the local memory on the target system. MON29K software can read COFF binary, Motorola S3 hex records, and TEK extended hex files. Each of these formats contains the address and byte count information for loading memory, so no other parameters need to be specified.

An optional downloading parameter can be specified by the user. It is a character string that is uploaded by MON29K to the remote host system. This can be used to initiate the host download procedure remotely from the MON29K monitor terminal.

Remote Mode

MON29K software supports two serial ports: one to a terminal and one to a host computer. In normal mode, either port can be used for initiating commands or for downloading programs. In remote mode, the two serial ports are linked together, allowing the terminal to communicate directly with the host computer.

Execution Control

Execution control commands allow the user to start program execution, step through instructions singly or in groups, set breakpoints, and specify monitor commands to be performed when termination occurs.

Following each break in program execution, the MON29K monitor displays the address and disassembled contents of the next executable instruction. In addition, the user can identify registers and memory to be viewed after the termination of each breakpoint or step command. This reduces the amount of information displayed to the data that is pertinent to the current debugging session.

MON29K software provides eight "sticky" and two "non-sticky" breakpoints. Sticky breakpoints remain set until expressly removed by the user. These are useful when debugging code within an instruction loop. Non-sticky breakpoints occur once and are removed automatically. Non-sticky breakpoints are optional parameters of the go command. Users can easily display, set, and reset breakpoint addresses.

Program execution can be stepped one instruction at a time or a group of instructions at a time. User-defined displays and the address and contents of the next executable instruction are displayed after each instruction step. When stepping by group, these displays can be delayed either until after the last instruction in the group is executed, or until after each instruction is executed. An option allows only register data that was changed to be displayed. This automatically informs the user of register changes, thus eliminating the need to visually monitor register contents.

Miscellaneous Commands

An on-screen help facility, as seen in Figure 4, lists all MON29K monitor commands. Information about a specific command is obtained by specifying the command name as a parameter to the help command.

Am29027 Arithmetic Accelerator Support

MON29K software is fully integrated with the AMD Am29027 Arithmetic Accelerator. In the same manner that the Am29000 microprocessor registers can be accessed, the Am29027 microprocessor registers can be both displayed and modified using MON29K software. An example of an Am29027 microprocessor register display is shown in Figure 5.
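The sticky and non-sticky breakpoint behavior described under Execution Control can be modeled with a small table. The sketch below is illustrative C only and is not taken from the MON29K source; the type and function names, and the choice to keep the two kinds of breakpoint in fixed slot ranges, are assumptions made for the example. Only the counts (eight sticky, two non-sticky) and the removal rule come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define STICKY_SLOTS     8   /* remain set until expressly removed */
#define NONSTICKY_SLOTS  2   /* occur once, then removed automatically */
#define TOTAL_SLOTS      (STICKY_SLOTS + NONSTICKY_SLOTS)

struct breakpoint {
    unsigned long addr;
    int in_use;
};

/* Assumed layout: slots 0..7 are sticky, slots 8..9 are non-sticky. */
struct break_table {
    struct breakpoint slot[TOTAL_SLOTS];
};

void bp_init(struct break_table *t)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < TOTAL_SLOTS; i++) {
        t->slot[i].addr = 0;
        t->slot[i].in_use = 0;
    }
}

/* Returns 0 on success, -1 if every slot of the requested kind is taken. */
int bp_set(struct break_table *t, unsigned long addr, int sticky)
{
    size_t first = sticky ? 0 : STICKY_SLOTS;
    size_t last  = sticky ? STICKY_SLOTS : TOTAL_SLOTS;
    size_t i;

    assert(t != NULL);
    for (i = first; i < last; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].addr = addr;
            t->slot[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Returns 1 if pc matches a breakpoint. A non-sticky match is cleared
 * as a side effect; a sticky match stays armed. */
int bp_hit(struct break_table *t, unsigned long pc)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < TOTAL_SLOTS; i++) {
        if (t->slot[i].in_use && t->slot[i].addr == pc) {
            if (i >= STICKY_SLOTS)
                t->slot[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}
```

In this model a sticky breakpoint inside a loop reports a hit on every pass, while a non-sticky breakpoint reports a hit once and then frees its slot for reuse.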
[Figure 3. TLB Entries Group Display: output of the xt command, listing each TLB entry by line, set, and first register number (TR000 through TR070 in the sample), with the fields of word 0 (VTAG, VE, SR, SW, SE, UR, UW, UE, TID) and word 1 (RPN, PGM, U, F).]

Target System Requirements

The Am29000 microprocessor supports separate code and data spaces and provides no instructions for moving information between data and instruction spaces. Because of this, the target system must provide a mechanism for writing to code space in order for the MON29K monitor to set breakpoints and load instruction memory.

MON29K software is designed to support a memory-mapped Z8530 SCC serial device. However, source code is provided so the user can change the MON29K monitor to support other devices on a particular target system.

Other Tools

MON29K is a stand-alone product that does not depend on other software to function. However, MON29K software is delivered in source form and will need to be compiled with the AMD HighC29K™ Cross-Development Toolkit; modification may be necessary if it is compiled with other Am29000 microprocessor C compilers.

  #H
  Help:
    H or ?  - to see this display
    H or ? with a command name - help with a named command
  Target Resource Access:
    D  - Display registers/memory
    S  - Set registers/memory
    F  - Fill registers/memory
    M  - Move registers/memory
    A  - Assemble in memory
    L  - List disassembly from mem
    I  - Input from port
    O  - Output to port
    XU - Display/set unprotected reg
    XP - Display/set protected reg
    XT - Display/set TLB entries
    XC - Display/set Am29027 reg
    X  - Display key registers
    Y  - Load a file to memory
    V  - Save memory to a file
  Execution Control:
    E  - End execution command list
    B  - Display/Set/Clear breaks
    G  - Go (start execution)
    T  - Trace (single/multiple step)
  Miscellaneous:
    R  - Remote mode (talk to host)
    N  - Normal (change 'normal' char)
    Q  - Re-initialize monitor

Figure 4. On-Screen Help Facility

[Figure 5. Am29027 Register Display: output of the xc command, showing register file locations RF0 through RF7 (PR, MSW, LSW), the R, S, and F registers with their temporaries, the instruction register fields, and the status, flag, and mode register bits.]

MAINTENANCE AND SUPPORT

Software Warranty

Software programs licensed by AMD are covered by the warranty and patent indemnity provisions appearing in AMD's standard software license forms. AMD makes no warranty, express, statutory, implied, or by description, regarding the information set forth herein or regarding the freedom of the described software program from patent infringement.
AMD reserves the right to modify, change, or discontinue the availability of this software program at any time and without notice.

Training Classes

AMD offers training classes for the 29K Family products. These classes focus on 29K Family system design and implementation using the broad range of AMD software development tools. Customers can shorten the development process through extensive hands-on training covering a variety of topics. Contact your local AMD field office for more information on training classes.

Fusion29K Program

AMD encourages broad-based development and support for the Am29000 microprocessor with the Fusion29K™ program, a joint-effort program between AMD and third-party developers. Published twice a year, the Fusion29K program catalog reveals the breadth of development and system solutions for the 29K Family, including software generation and debug tools; hardware development tools; executive, kernel, and multi-user operating systems; board-level products; silicon products; and more. For a copy of the Fusion29K program catalog, call your local AMD field sales office or the literature center at (800) 222-9323.

Customer Support

All orderable software products include one year of free Maintenance Support, which starts from the date of original purchase. Maintenance Support allows customers to receive technical assistance from highly trained field and factory personnel, to use a call-in on-line information system, and to receive product and documentation updates at no additional charge. Customers may extend Maintenance Support in one-year increments. Customers can access support services by calling the 24-hour, toll-free 29K™ Family hotline at (800) 2929-AMD (292-9263).

On-Line Call-In Bulletin Board

In addition to the support engineering staff, AMD offers a 24-hour on-line technical support center.
The customer can call (800) 2929-AMD at any time to query the system for the latest information on a particular product: bug fixes, work-arounds, information on upcoming releases, etc. Messages may be left for the support engineering staff during "after hours."

XRAY29K Source-Level Debugger

Advanced Micro Devices

DISTINCTIVE CHARACTERISTICS

• Supports symbolic debugging with C expressions and statements for Am29000™ microprocessor development environments
• Controls and examines program execution in high-level and assembly-level modes
• Provides interface and start-up code for the Am29000 microprocessor, which allows use of the MON29K™ Target-Resident Monitor, ADAPT29K™ Advanced Development and Prototyping Tool, and PCEB29K™ PC Execution Board
• Uses window-oriented display to segregate debug information in meaningful regions
• Allows single-step execution and placement of simple and complex breakpoints
• Supports custom screens and viewports, and one-key command functions
• Provides command, breakpoint, and viewport macros
• Supports automatic test sequences by processing command files and logging output to a file
• Includes on-line help, comprehensive documentation, and a sample debug session

GENERAL DESCRIPTION

AMD's XRAY29K™ source-level debugger provides engineers with a multiwindow interactive environment for debugging high-level and assembly-level software programs for Am29000-based systems. XRAY29K software resides on IBM® ATs® and compatibles, and Sun Workstations®.

Program execution is monitored and controlled in high-level source or assembly language, from the host system through the PCEB29K execution board, MON29K monitor, or ADAPT29K debugger on the target system. Control is extensive, including debugger commands for setting breakpoints, single stepping through the program, and examining or altering register and memory contents.
XRAY29K software allows examination and modification of a variable's contents and computation of high-level and assembly language expression values. Symbols can be added, displayed, and deleted in the symbol table.

The XRAY29K product includes:

• XRAY29K software
• Documentation
• Install testing program
• Start-up code for ADAPT29K or targets using MON29K

Publication # 10626 Rev. C, Amendment /0. Issue Date: September 1989

ORDERING INFORMATION

Licensing

The XRAY29K Source-Level Debugger is licensed through AMD's Standard End-User Software License Agreement (Boxtop). This license does not require a signature; breaking the seal on the product package indicates acceptance of the license terms. If changes are required to the license agreement, they can be arranged through your AMD sales representative.

Many software products require the customer to provide a CPU ID number when ordering the product. Contact your sales representative if this information is not available at the time of purchase.

Order Numbers

The XRAY29K Source-Level Debugger is available for several different environments. Documentation can be ordered separately.

The order number (Valid Combination) is formed as a combination of the following fields, in this order:

  AM29000 SW/ XRY B ## ##

• Product Family
    AM29000 = Am29000 Microprocessor
• Product Category
    SW/ = Software Product
    DC/ = Documentation Product
    MA/ = Maintenance Agreement
• Product Identifier
    XRY = XRAY29K Source-Level Debugger
• License Type
    B = Boxtop
    S = Signed
    "-" = Not Applicable
• Host/OS Type
    07 = Sun-3
    10 = PC-AT
• Media Type
    08 = 0.25" Sun cartridge tape, TAR format
    14 = 3.5" DSHD floppies
    21 = 9-track, 1600 BPI mag tape, TAR format
    24 = 5.25" DSHD floppies

Valid Combinations

Valid Combinations lists configurations planned to be supported in volume for this device.
Consult the local AMD sales office to confirm availability of specific valid combinations and to check on newly released combinations.

Order Number         Product                          Host     Media
AM29000SW/XRYB0708   XRAY29K Source-Level Debugger    Sun-3    0.25" cartridge tape, TAR format
AM29000SW/XRYS0708   XRAY29K Source-Level Debugger    Sun-3    0.25" cartridge tape, TAR format
AM29000SW/XRYB0721   XRAY29K Source-Level Debugger    Sun-3    9-track, 1600 BPI tape, TAR format
AM29000SW/XRYS0721   XRAY29K Source-Level Debugger    Sun-3    9-track, 1600 BPI tape, TAR format
AM29000SW/XRYB1014   XRAY29K Source-Level Debugger    PC-AT    3.5" DSHD floppies
AM29000SW/XRYS1014   XRAY29K Source-Level Debugger    PC-AT    3.5" DSHD floppies
AM29000SW/XRYB1024   XRAY29K Source-Level Debugger    PC-AT    5.25" DSHD floppies
AM29000SW/XRYS1024   XRAY29K Source-Level Debugger    PC-AT    5.25" DSHD floppies
AM29000DC/XRY-99     XRAY29K Documentation            UNIX     Not Media Specific
AM29000MA/XRY-07     XRAY29K Maintenance              Sun-3    Not Media Specific
AM29000MA/XRY-10     XRAY29K Maintenance              PC-AT    Not Media Specific

FUNCTIONAL DESCRIPTION

XRAY29K software aids the control and examination of program execution. It can set and examine memory and register contents, set and remove breakpoints in either high-level source or assembly language code, and display and alter the microprocessor state. In addition to symbolic debugging, the XRAY29K debugger's special features include help screens, macro capabilities, command files, conditional commands, and debugging through ports. For example, in batch mode, command files can issue directives to XRAY29K software to implement automated test sequences.

XRAY29K software functions in either high-level or assembly-level mode. In high-level mode, an application is debugged using C language source lines to control and monitor execution. C variables and expressions replace numeric addresses for memory access. Code can be viewed by line number or procedure name. In assembly-level mode, an application is debugged using assembly language statements.
In addition to all the capabilities available in high-level mode, assembly-level mode includes machine-level register and status bit manipulation. For each mode, the monitor's screen is partitioned in areas called viewports, where information is displayed in meaningful regions and is easy to identify. 2-26 Viewport Commands When the XRAY29K debugger executes, the screen is divided in areas called viewports. The number of viewports and the information shown in each depends on whether the object module was written in a high-level language (high-level mode) or assembly language (assembly-level mode). The standard screen for high-level mode has four viewports: data, trace, code, and command. This screen is displayed when an object module generated by a highlevel source program is executed. The standard screen for assembly-level mode has five viewports: data, stack, disassembled code, Am29000 microprocessor registers, and command. This screen is displayed when an object module generated by an assembly language program is executed. Figures 1 and 2 show examples of these screens. Viewport commands control the way information is displayed on the screen. Changing a viewport's size, color, and cursor position as well as adding and deleting a custom viewport are viewport commands. In addition, viewports can be cleared of data, and macros can be associated with them. Frequently used viewport commands are associated with function keys for easy access. XRAY29K vactive vclear vclose vmacro vopen vscreen vsetc zoom Activate a viewport Clear data from a viewport Remove a user-defined viewport or screen Attach a macro to a viewport Create a screen/create or resize a viewport Activate a screen Set a viewport's cursor position Increase or decrease viewport size Commands to attach a macro to a viewport are part of the viewport command set. Commands that attach a macro to a breakpoint are part of the execution and breakpoint command set. 
Debugger Commands

Commands, whether in high-level source or assembly language mode, can be entered interactively from the keyboard in the command viewport or placed in a command file and accessed as include or batch files. Some commands take qualifiers, which provide additional information on how to execute the command, and parameters, which describe an object and communicate addresses or file specifications.

Macro Commands

XRAY29K software supports macros to create and execute complex command procedures, such as testing program variables, and to conditionally execute other sets of commands. Macros can be defined and used at any time during a debugging session and can include comments to explain their function. The macro definition may contain parameters that can be changed for each macro call. Used as commands or in expressions, macros can be attached to a breakpoint to create complex breakpoint condition testing, or to a custom viewport to control data display. Complex initialization conditions can be represented as a sequence of macro commands in a command file.

Statements to increment variables, perform loops and conditions, and control target program flow can be part of a macro. XRAY29K software provides a set of macro flow control statements. These statements are similar to C conditional statements (e.g., IF, ELSE, WHILE, DO, FOR, RETURN and CONTINUE).

To create a macro, the define command is used. After macro creation, the show command allows the macro's source to be viewed.

define     Create a macro
show       Display a macro source

Breakpoints and Execution Commands

A breakpoint causes program execution to halt or causes the XRAY29K debugger to take some action, such as incrementing a counter each time the target program attempts to execute an instruction at a specified memory location. A macro can be associated with the breakpoint to control execution.
A special breakpoint viewport shows breakpoint information during the debugging session, including the breakpoint identification number. Automatically assigned by XRAY29K software, the breakpoint number is used to reference or clear a breakpoint.

Execution commands start program execution or resume execution after explicit suspension. The program can be instructed to continue, single step, or set temporary instruction breakpoints. Single stepping is performed by C source line in high-level mode and by microprocessor instruction in assembly-level mode. In addition, for each step, a macro can be invoked.

breakinstruction   Set an instruction breakpoint
clear              Clear a breakpoint
go                 Start or continue program execution
gostep             Execute a macro after each instruction step
step               Execute a number of instructions or lines
stepover           Single step, but execute through procedures

Figure 1. Standard High-Level Screen (viewports: Data, showing monitored data; Trace, showing routine traceback information; Code, showing source code; a status line; and Command, showing debugger commands)

Figure 2. Standard Assembly-Level Screen (viewports: Data, showing monitored data; Stack, showing stack contents; Registers, showing Am29000 microprocessor registers; Code, showing disassembled code; a status line; and Command, showing debugger commands)

Display Commands

Display commands write program information about memory, expressions, or procedures to a viewport or file. C source code, for example, can be listed starting at a particular line number or for a named procedure. Any active procedure (a procedure on the stack) can have its values displayed. Memory contents can be dumped in both hexadecimal and ASCII text format, and, when in assembly-level mode, memory can be disassembled and displayed in the code viewport.

Variables can be monitored and examined in the data viewport as the target program executes. An expression or expression range can be displayed in the command viewport according to type. For type conversions, scaling, and output positioning, display commands can open a file or device and then write formatted output to it. Several format options are provided, similar in function to those provided to C in standard runtime libraries.

disassemble   Display disassembled memory (assembly mode)
dump          Display memory contents
expand        Display a procedure's local variables
find          Search for a string
fopen         Open a file or device for writing
fprintf       Print formatted output to a viewport
list          Display C source code
monitor       Monitor expressions
next          Find a string's next occurrence
nomonitor     Discontinue monitoring an expression
printf        Print formatted output to command viewport
printvalue    Print a variable's value

Memory and Register Commands

To help track down problems and test fixes, memory and registers can be examined and altered. Two blocks of memory, for example, can be compared for similarities or differences to check for a corrupt RAM image. Memory and registers can be modified temporarily to patch programs and continue testing during a debugging session. Expression evaluation is supported during searching and modification.

compare    Compare two blocks of memory
copy       Copy a memory block
fill       Fill a memory block with values
nomen      Prevent access to a memory location
search     Search a memory block for a value
setmem     Change a memory address
setreg     Change a register's contents
test       Examine memory area for invalid values

Symbol Commands

A symbol is a sequence of characters used to represent arithmetic values, memory addresses, and C variables. XRAY29K software knows about two types of symbols: program and debugger. Program symbols are symbolic data names or program labels that were defined during the source program's creation.
Debugger symbols manipulate and direct the flow of the debugger and are specified by the user during a debugging session. Symbol commands encompass both types of symbols. Debugger symbols can be added to the debugger symbol table, and then displayed or removed. Information about program symbols, such as name, data type, storage class, and memory location, can be displayed.

add            Create a symbol
context        Show the current context
delete         Delete a symbol from the symbol table
printsymbols   Display symbol information
scope          Specify current module and procedure

Utility Commands

Command files are commonly used to read macro definitions from a file or to change viewports. After a command file has been created, it may be included in a startup file and executed as if entered at the keyboard. When an include file error is encountered, XRAY29K software can be directed to quit, abort, or continue. A log of commands entered at the keyboard can be retained and then subsequently used as a command file. If XRAY29K software display and execution defaults are changed, they can be saved in a new startup file. All these operations are accessed through utility commands.

Other utility commands control the microprocessor's state. Reset simulates a microprocessor reset. Restart restores the microprocessor to its initial state without initializing memory or restarting the program, and it sets the program counter to the original starting address from the absolute file but maintains breakpoint declarations.

In addition, the user can temporarily change the default values for debugger startup options, such as enabling procedure-level tracing in the trace viewport and intermixing C source code with assembly code in the code viewport.

XRAY29K software automatically selects the correct debugging mode based on whether the object module was created by the high-level compiler or the assembler. When a program has both kinds of object modules, a utility command toggles between the two modes.
XRAY29K software includes a search facility that can find information in a source file and display the value of an expression in decimal, hexadecimal or ASCII format. On-line help is provided for all debugger commands, command arguments, and function keys, and includes a selection menu.

alias         Replace the name of the command
cexpression   Calculate an expression's value
error         Set include file error handling
help          Display on-line help screen
history       Recall a specific command
include       Read in and process a command file
journal       Save all viewport commands and data to a file
log           Record debugger commands and errors in a file
mode          Select debugging mode (high or assembly)
option        Set debugger options for this session
pause         Pause simulation
reset         Simulate microprocessor reset
restart       Reset the program starting address
startup       Save the default startup options

Session Control

The debugger session can be ended at any time, or it can be paused while the host operating system environment is used and then entered again. This area also controls which object modules are loaded for debugging.

host       Temporarily enter the host environment
load       Load an object module for debugging
quit       End a debugging session

System Requirements

The XRAY29K software resides on the host system and presents the user with a friendly, high-level interface to the Am29000 microprocessor-based system. The software communicates with the target system through a serial interface to the ADAPT29K unit or to a target board running the MON29K target-resident debug monitor, or through a bus interface to the PCEB29K personal computer execution board. The MON29K software and the ADAPT29K unit actually perform all the Am29000 microprocessor memory and register reads and writes requested by the user through XRAY29K debugger commands.

Before the XRAY29K debugger can be used, an absolute object module must be created and downloaded into the target system RAM. The object module is created using AMD's HighC29K compiler or ASM29K assembler.
Once generated, the object module is loaded into target system RAM by invoking the XRAY29K software Load command. Figure 3 illustrates the AMD development tool chain.

Software Warranty

Software programs licensed by AMD are covered by the warranty and patent indemnity provisions appearing in AMD's standard software license forms. AMD makes no warranty, express, statutory, implied, or by description regarding the information set forth herein or regarding the freedom of the described software program from patent infringement. AMD reserves the right to modify, change or discontinue the availability of this software program at any time and without notice.

Customer Support

Maintenance

All orderable software products include one year of free Maintenance Support, which starts from the date of original purchase. Maintenance Support allows customers to receive technical assistance from highly trained field and factory personnel, to use a call-in on-line information system and to receive product and documentation updates at no additional charge. Customers may extend Maintenance Support in one-year increments. Customers can access support services by calling the 24-hour, toll-free 29K Family hotline at (800) 2929-AMD (292-9263).

On-Line Call-In Bulletin Board

In addition to the support engineering staff, AMD offers a 24-hour on-line technical support center. A customer can call (800) 2929-AMD at any time to query the system for the latest information on a particular product: bug fixes, work-arounds, information on upcoming releases, etc. Messages may be left for the support engineering staff during "after hours."

Training Classes

AMD offers training classes for the 29K Family products. These classes focus on 29K Family system design and implementation using the broad range of AMD software development tools. Customers can shorten the development process through extensive hands-on training covering a variety of topics.
Contact your local AMD field sales office for more information on training classes.

Fusion29K Program

AMD encourages broad-based development and support for the Am29000 microprocessor with the Fusion29K program, a joint-effort program between AMD and third-party developers. Published twice a year, the Fusion29K program catalog reveals the breadth of development and system solutions for the 29K Family, including software generation and debug tools; hardware development tools; executive, kernel and multi-user operating systems; board-level products; silicon products; and more. For a copy of the Fusion29K program catalog, call your local AMD field sales office or the AMD literature center at (800) 222-9323.

Figure 3. AMD Development Tool Chain

CHAPTER 3
29K Family Application Notes

Table of Contents

Am29000 SYSCLK Driving
Connected Am29000 Instruction/Data Buses
Byte-Writable Memories for the Am29000
Am29027 Hardware Interface
When is Interleaved Memory with the Am29000 Unnecessary?
Implementation of an Am29000 Stack Cache
Introduction to the Am29000 Development Tools
Preparing PROMs Using the Am29000 Development Tools
Programming Standalone Am29000 Systems
Host Interface (HIF) v1.0 Specification

Am29000 SYSCLK Driving
Application Note by Tom Crawford
Publication # 11024, Rev. A, Amendment /0, Issue Date: 11/89. © 1989 Advanced Micro Devices, Inc.

INTRODUCTION

The purpose of this note is to describe the options for connecting the SYSCLK pin in an Am29000™ system.

GENERAL CONSIDERATIONS

SYSCLK in any Am29000 system is going to be a high-frequency, heavily loaded signal with strict duty factor requirements. The most important considerations are DC levels, capacitive loading, rise/fall times, high/low times, and transmission line effects.

There are basically two options. One may make SYSCLK a source or one may make SYSCLK a destination.

SYSCLK AS A SOURCE

The easiest (and the recommended) way to connect the clocks in the system is to have the Am29000 generate and drive SYSCLK. Figure 1 shows the connections.

Figure 1. Source (PWRCLK tied to Vcc; a 2X clock drives INCLK; the Am29000 drives SYSCLK to external logic)

In this configuration, PWRCLK (pin P3) is connected directly to Vcc. This is a power pin; it must not be just pulled up through a resistor. Two times the desired operating frequency is injected into INCLK. This is a TTL signal and the duty factor is unimportant so long as it meets the minimum High time and Low time parameters (see the Am29000 data sheet, order #09075).

SYSCLK is an output with CMOS levels (it swings from nearly ground to nearly Vcc). All the SYSCLK relative timing parameters are measured with respect to SYSCLK at 1.5 volts, the normal TTL "trip point."

Since SYSCLK must have fairly fast rise and fall times and may be physically long, it may behave as a transmission line (i.e., exhibit reflections). These effects can be minimized using a few precautions. If SYSCLK goes to more than one or, at most, two places on the board, separate traces to each destination should be used. This minimizes the length of each line and minimizes the capacitive loading on each line. Series resistors at the source (at the Am29000) for each line will reduce the edge rates. Using Schottky or Fast logic is often preferable to CMOS logic, which lacks input diodes to ground.

Before resorting to parallel termination, one should consider carefully the effects of relatively high DC loading on the buffer VOH and VOL. The prudent engineer will analyze the SYSCLK signal with SPICE or a similar CAD package. This permits a prediction of the actual behavior of the circuit, which is essentially impossible to obtain without modeling.

At this time, there is no guaranteed relationship between the input on INCLK and the output on SYSCLK. Information on this relationship will be included in the Am29000 Data Sheet (order #09075).

SYSCLK AS A DESTINATION

SYSCLK can be driven externally. This is typically done to provide an external signal with a known phase relationship to SYSCLK, perhaps at twice the frequency. Figure 2 shows the connections.

PWRCLK and INCLK must both be connected directly to ground. SYSCLK is an input and must be driven with a CMOS-level clock at the operating frequency. The fact that signals are generated from both edges of SYSCLK dictates that it be very nearly a perfect square wave (from 1.5 V to 1.5 V). Perhaps the best way to generate such a signal is to begin with one at 2X frequency and divide it by two with a flip-flop. The result is buffered with one or more pieces of a CMOS buffer. A typical clock generator is shown in Figure 3.
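The divide-by-two approach can be illustrated with a small simulation. This is a minimal sketch of an ideal edge-triggered flip-flop, not a model of any particular device: the function names, the 20 ns input period (a 2X clock for a 25-MHz SYSCLK), and the chosen duty factors are assumptions made for illustration, and propagation delays (tPLH, tPHL) are ignored.

```python
def two_x_clock_edges(period_ns, duty, cycles):
    """Return (time_ns, level) transitions of the 2X input clock."""
    edges = []
    for i in range(cycles):
        start = i * period_ns
        edges.append((start, 1))                     # rising edge
        edges.append((start + duty * period_ns, 0))  # falling edge
    return edges


def divide_by_two(edges):
    """Toggle an ideal flip-flop output on every rising input edge."""
    q = 0
    previous_level = 0
    transitions = []
    for time_ns, level in edges:
        if level == 1 and previous_level == 0:
            q ^= 1
            transitions.append((time_ns, q))
        previous_level = level
    return transitions


def duty_factor(transitions):
    """Fraction of time the output is high between first and last transition."""
    high = 0.0
    total = 0.0
    for (t0, q), (t1, _) in zip(transitions, transitions[1:]):
        total += t1 - t0
        if q:
            high += t1 - t0
    return high / total


# An odd number of input cycles gives a whole number of output cycles.
for duty in (0.3, 0.5, 0.7):
    edges = two_x_clock_edges(period_ns=20.0, duty=duty, cycles=9)
    print(f"input duty {duty:.1f} -> output duty {duty_factor(divide_by_two(edges)):.2f}")
```

Because the output changes only on rising input edges, its high and low times both equal one input period, whatever the input duty factor. In a real circuit the result is only nominally square, since the flip-flop and buffer delays for the two output transitions may differ.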
Figure 2. Destination (PWRCLK and INCLK tied to GND; SYSCLK driven by an external CMOS clock)

The TTL oscillator operates at twice the required frequency. Since the 74AC74 is edge triggered, it responds only to the Low-to-High transition of the oscillator. Its output is nominally a square wave (nominally because the tPHL may not be the same as tPLH).

The buffer is more interesting. Clearly, it has to be CMOS since SYSCLK is a CMOS input. It has to be characterized to drive substantial capacitance since the Am29000 has an input capacitance of 90 pF. One can put multiple elements in parallel as long as they are in the same package. In addition, one can drive different portions of the load with different sections of the device. As long as they are in the same package and are similarly loaded, they will exhibit similar delays. In some design groups, putting buffers in parallel is a prohibited activity, since it is sometimes difficult to determine when one of the buffers has failed. Local design rules should always prevail.

Take, for example, the IDT 74FCT240A. With light DC loading, the output swings within 0.2 V of the power supply. At 50-pF loading, the propagation delay is 1.5 ns minimum and 4.8 ns maximum. Putting two elements in parallel will solve the capacitive-loading situation, if it really needs to be solved. The actual waveforms should be examined before adding another buffer. The IDT data book does not distinguish between tPHL and tPLH. The device should be characterized at the actual expected loading, temperature, and voltage ranges to determine the actual switching characteristics.

Take, for a second example, the 74AC04. With light DC loading, the output swings within 0.1 V of the power supply. The guaranteed propagation delays for the 74AC00 are 1.0 ns to 7.0 ns; we expect an AC04 to be the same. In fact, a device actually driving an Am29000 has measured propagation delays of tPLH = 4 ns and tPHL = 5 ns. Two elements in parallel appear to provide a somewhat cleaner waveform.

Figure 3. Clock Generator (an oscillator clocks a 74AC74 flip-flop; buffered outputs go to the Am29000, to the Am29027™, to each half of the board, and to an early SYSCLK)

Connected Am29000 Instruction/Data Buses
Application Note by Tom Crawford

The use of the Am29000™ has been proposed in a system where the instruction and data buses are connected directly to each other and to a single memory. Presumably this would be either a fairly high-end system with lots of very fast memory or a cache system with a modest amount of SRAM backed up by lots of DRAM.

Figure 1. Block Diagram (Am29000 address bus and connected instruction and data buses attached to a single static RAM)

If the memory is very fast (single cycle), then pipelined or burst accesses never need to take place. Every access is a simple one-cycle access. Data writes would have to be two cycles (because BINV is valid so late).

This depends on the availability of very fast static RAMs. The equation below shows how to calculate the required access time of the RAMs.

\[ t_{MAX} \le t_{CLK} - (\mathrm{para6} + \mathrm{para9A}) \]

For a 25-MHz device running at various clock rates:

FREQ (MHz)   tCLK (ns)   para6 (ns)   para9A (ns)   tMAX (ns)
25.00        40          14           6             20
22.22        45          14           6             25
20.00        50          14           6             30
18.18        55          14           6             35

An attempt to actually mechanize a system like this uncovered a problem. When the Am29000 follows an instruction read with a data write, there is a guaranteed "bus crash."

Parameter 10 requires that the data remain on the bus for 2 ns after the rising edge of SYSCLK; in fact, RAM disable times are typically 15 ns. This means there is no known method to get the instruction off the instruction bus until as long as 15 ns after the clock rises. Additionally, in the best possible case, a PAL

When Is Interleaved Memory with the Am29000 Unnecessary?
Publication # 11656, Rev. A, Amendment /0, Issue Date: 11/89

Figure 1. Typical Memory (block diagram; recoverable labels: Initial Address, Instruction Bus)
Figure 2. Single-Cycle Burst (timing diagram, not to scale: SYSCLK, counter output, column address, address at DRAM, and data, annotated with tCLOCK, tPD_COUNT, tPD_MUX, tPD_WIRE, tMAX, and tSU)

In order to guarantee positive margins, the following inequality must be satisfied:

Equation 1:

\[ n \cdot t_{CLOCK} - (t_{MAX} + t_{PD\_COUNT} + t_{PD\_MUX} + t_{SU} + t_{PD\_WIRE}) > 0 \]

The value n is the number of clock cycles available for memory. If there is no interleaving or wait states, n = 1. For two-way interleaving, n = 2, and so on. The maximum column address delay (static column decode DRAM) that can be allowed is tMAX. The clock-to-output delay of the counter is tPD_COUNT. The value of tPD_MUX is the input-to-output delay of the multiplexer. The value of tSU is the setup time for Am29000 instructions or data. The value of tPD_WIRE is the propagation delay from the multiplexer output to the furthest memory chip input. This is the propagation delay per unit length of wire times the length of the wire.

The propagation delay per unit length can be estimated from the equation:

\[ t_{pd}' = t_{pd} \sqrt{1 + C_d / C_o} \]

(See Appendix A of the Am29000 Memory Design Handbook, order #10623, for additional information on this equation.)

The unloaded propagation delay (tpd) is determined only by the board material dielectric constant. It is equal to approximately 1.77 ns/ft. The trace capacitance (Co) is a function of the trace impedance and propagation delay and is usually taken to be approximately 18.5 pF/ft. The distributed capacitance (Cd) resulting from the memory chips is calculated from the per-device input capacitance and the device spacing; assuming 5 pF per device and two devices per inch gives 120 pF/ft. Using these numbers in the above equation yields:

\[ t_{pd}' = 1.77 \sqrt{1 + 120/18.5} = 4.84 \ \mathrm{ns/ft} \]

Finally, assuming that 32 devices at 24 devices per foot equals 1.33 ft, the value for tPD_WIRE is 6.45 ns. These numbers are summarized in Table 1.

Table 1. Initial Numbers

Name               Value     Obtained From
tPD_COUNT          6.5 ns    PAL16R8-7
tPD_MUX            8.0 ns    74F253 In to Zn
tPD_WIRE           6.5 ns    See discussion above
tSU (25 MHz)       6.0 ns    Am29000 25-MHz tSU
tSU (20/16 MHz)    8.0 ns    Am29000 20/16-MHz tSU

Figure 3 shows the results of these values in Equation 1. The x-axis is tCLOCK and the y-axis is the allowable access time. The solid line shows the allowable access time for n = 1 (single-cycle operation, no interleaving). The dotted line shows the allowable access time for n = 2. The discontinuity in the n = 1 line reflects the difference in tSU between 25 MHz and 20/16 MHz. The horizontal lines show the access times for -70, -80, and -100 Toshiba 1M-by-1 DRAMs. The vertical lines show the minimum tCLOCK times for 25-, 20-, and 16-MHz Am29000s. The hatched area indicates where operation is possible without interleaving.

INITIAL RESULTS

From inspection of Figure 3, it might be concluded that it is almost possible to build a single-cycle burst memory for a 16-MHz Am29000 from "fast" DRAMs with no interleaving. However, one cannot build a single-cycle burst memory for a 20- or 25-MHz system without interleaving with any available DRAM. Finally, using two-way interleaving, it is possible to build a memory that supports single-cycle bursts at a clock rate of 25 MHz or below, from memories with a column address access time of less than 50 ns.

Figure 3. Initial Results (plot of allowable access time, 15 to 55 ns, versus tCLOCK, 35 to 70 ns, for n = 1 and for n = 2 (two-way interleave), with DRAM access-time and minimum-tCLOCK reference lines)

ARE IMPROVEMENTS POSSIBLE?

Could a system be built with single-cycle bursts without interleaving to run at 20 MHz?

To answer this question graphically, move the heavy line in Figure 3 upwards (extending the hatched area to the left). This is done by reducing or eliminating the numbers, other than tMAX, in the inequality. These are examined below, one at a time.

tPD_COUNT

The 6.5 ns value is based on using a -7 PAL device. This is already faster than any 74F, 74AS, or 74ACT counter (or flip-flop, for that matter) in any data book this author has examined.

It is certainly possible to "play games" with the clock scheme. SYSCLK on the Am29000 could be driven a little later than the clock to the counter. Data hold time is unlikely to ever be a problem. But the uncertainties in propagation delay through a CMOS clock driver are likely to cancel a lot of what could be gained. Furthermore, delaying the clock to the Am29000 delays the address on the initial cycle.

tPD_MUX

The 8.0 ns value is based on using a 74F253. A 1/2 ns reduction could be realized by building a multiplexer with a 16L8-7 (7.5 ns). A better way is to completely eliminate the multiplexer delay by building a three-state bus. Figure 4 shows one way to do this. The counter is implemented with a 16R8-7 (actually, more than one is probably required). An 8-bit counter is required and 2 additional bits of address must be maintained. Since the clock is not gated, some additional inputs are required to indicate whether the counter should load, hold, or count.

Figure 4. Multiplexer Avoidance (a three-state buffer and the counter both drive the address lines of the memory array)

Just before RAS falls, the three-state buffer is enabled. When the Column Address is required, the three-state buffers of the PAL device are enabled and the counter is driven into the array.

In this configuration, a worst-case design requires that the extraordinary loading on the PAL device be considered. The total capacitance connected to the outputs of the PAL devices is greater than the standard load. However, the capacitances are distributed rather than lumped. The driver never sees the entire load, so the wire delay allowance is sufficient.

tPD_WIRE

The wire delay can be reduced only by reducing the wiring length. Instead of connecting all the memory chips in series, the board can be designed so that there are two sets of chips connected in parallel. This halves the 1.33-foot length previously calculated and reduces the wire delay to 3.22 ns.

tSU

To reduce tSU, a fast Am29000 at a reduced clock rate can be used. For example, a 30-MHz Am29000 has a tSU of only 5 ns; this is 3 ns better than a 16-MHz part, but it is expensive.

Another approach is to insert a pipeline register with a very low setup time. For example, the data setup time of a 74F374 is only 2 ns. Of course, including a pipeline register has adverse consequences. The first access of a burst-mode access will then be one SYSCLK cycle longer than would otherwise be required. In addition, the control logic is made slightly more complicated. A positive side effect is that three-state buffers are included in the register packages. Figure 5 shows registers in the instruction path.

Now, assuming the implementation of all the changes described above, the fixed numbers become the values shown in Table 2.

Table 2. The Improved Numbers

Name         Value     Obtained From
tPD_COUNT    6.5 ns    PAL16R8-7
tPD_MUX      0.0 ns    Three-state multiplexer
tPD_WIRE     3.2 ns    Length
tSU          2.0 ns    74F374 data sheet

If this is plotted as a function of cycle time, the line has moved up a considerable amount as compared to Figure 3. This indicates that it is possible to build a 20-MHz system with the fastest available DRAMs. It also indicates that it is possible to build a 16-MHz system with 100-ns DRAMs.

Figure 5. Pipeline Access (the Am29000 address bus feeds the counter and instruction memory; a pipeline register clocked by SYSCLK sits between the instruction memory and the Am29000 instruction bus)
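The arithmetic behind Equation 1 and the two tables can be checked with a short script. This is a sketch under stated assumptions: the function and variable names are illustrative, the numeric inputs are the values quoted in Tables 1 and 2, and the clock periods of 40, 50, and 62.5 ns are simply the reciprocals of 25, 20, and 16 MHz. No DRAM data-sheet figures are assumed.

```python
import math


def loaded_wire_delay_ns(length_ft, tpd_unloaded=1.77, c_trace=18.5, c_dist=120.0):
    """tPD_WIRE = length * tpd * sqrt(1 + Cd/Co); defaults are the note's values."""
    return length_ft * tpd_unloaded * math.sqrt(1.0 + c_dist / c_trace)


def allowed_tmax_ns(t_clock, n, t_count, t_mux, t_wire, t_su):
    """Largest tMAX for which the left side of Equation 1 is not negative."""
    return n * t_clock - (t_count + t_mux + t_wire + t_su)


# Fixed delays in ns, copied from Table 1 and Table 2.
TABLE1_25MHZ = dict(t_count=6.5, t_mux=8.0, t_wire=6.5, t_su=6.0)
TABLE1_20_16MHZ = dict(t_count=6.5, t_mux=8.0, t_wire=6.5, t_su=8.0)
TABLE2 = dict(t_count=6.5, t_mux=0.0, t_wire=3.2, t_su=2.0)

# 32 devices at 24 devices per foot.
print(f"tPD_WIRE = {loaded_wire_delay_ns(32 / 24):.2f} ns")

cases = [
    ("Table 1, 25 MHz", 40.0, TABLE1_25MHZ),
    ("Table 1, 20 MHz", 50.0, TABLE1_20_16MHZ),
    ("Table 1, 16 MHz", 62.5, TABLE1_20_16MHZ),
    ("Table 2, 20 MHz", 50.0, TABLE2),
    ("Table 2, 16 MHz", 62.5, TABLE2),
]
for label, t_clock, delays in cases:
    for n in (1, 2):
        tmax = allowed_tmax_ns(t_clock, n, **delays)
        print(f"{label}, n={n}: tMAX <= {tmax:.1f} ns")
```

With the Table 1 numbers, the fixed delays add up to 27 ns at 25 MHz, leaving 13 ns for the memory without interleaving and 53 ns with two-way interleaving. With the Table 2 numbers the fixed delays drop to 11.7 ns, which leaves 38.3 ns at 20 MHz and 50.8 ns at 16 MHz for n = 1.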
CONCLUSION

By substituting the values for proposed memory architectures into Equation 1, two to four specific values of tMAX can be determined for appropriate values of tCLOCK. With this information it is easy to draw graphs like those of Figures 3 and 6. Such graphs provide a simple display of the available trade-offs between system clock rate, memory architecture, and the memory device access speed. Multiplying the memory device count for each configuration by the access-speed-driven memory device cost of the configuration yields an approximate cost for each memory system approach.

Such an analysis may point out significant cost reductions by quickly identifying those situations in which a non-interleaved memory architecture and reduced clock rate can support the required system performance.

Figure 6. Final Results (plot of allowable access time, 15 to 55 ns, versus tCLOCK, 35 to 70 ns)

Implementation of an Am29000 Stack Cache
Application Note by Phil Bunce and Erin Farquhar

INTRODUCTION

This application note describes the basic mechanisms of the AMD Am29000's cache of the run-time stack. The stack cache is an important performance feature, because it permits a procedure's entire context to be resident in on-chip registers, thus eliminating, or at least reducing, the need for memory accesses.

Our discussion is centered around a single example program, which is shown in its entirety in Appendix B. Before discussing this example, we provide a brief overview of the basic operation of the stack cache.

OVERVIEW

Procedures executing on the Am29000 make use of a run-time stack, which consists of consecutive, overlapping structures called activation records. An activation record contains the dynamically allocated information specific to a particular activation of a procedure. Each time a procedure is called, a new activation record is allocated on the stack; when the procedure has finished executing, its activation record is deallocated from the stack.

Compilers and assemblers for the Am29000 use two run-time stacks for activation records: the register stack and the memory stack.
A procedure's activation record may be divided between these stacks. Both stacks grow toward lower addresses in memory, and items on the stacks are referenced as positive offsets from RSP (Register Stack Pointer) and MSP (Memory Stack Pointer). Both pointers are realized using internal Am29000 global registers. The global and local registers are both subsets of the general-purpose registers.

The register stack contains parameters passed to the procedure, the local scalar variables used by the procedure, return linkage information, and the arguments that the procedure will pass to procedures that it in turn calls. The register stack is cached in the local registers, lr0-lr127, as explained below.

Publication # 13053, Rev. A, Amendment /0

The memory stack is used for local structured data, for example, arrays and records. It also is used for additional scalar data when needed. When the scalar portion of the activation record for a particular procedure requires more than 128 words of local-register storage, the excess may be kept in the procedure's activation record in the memory stack.

Both stacks are aligned on a double-word (64-bit) boundary. Procedures are required to maintain this alignment by adjusting the size of the register stack frame allocated at procedure entry to be a multiple of eight bytes.

STACK CACHE

The 128 local registers are used to cache locations in the register stack, such that when a procedure is active, its entire register-stack activation record is mapped to the local registers. Each word location in the register stack is mapped to a single local register. The register number corresponding to a location in the register stack is given by bits 8-2 of the 32-bit memory address of that location in the register stack. Because there are 128 local registers, quantities whose addresses differ by 512 (all addresses are byte addresses) are mapped to the same local register and cannot be in the cache at the same time.
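The address-to-register mapping described above can be written out in a few lines. This is a minimal sketch: the function name `stack_cache_slot` and the sample addresses are chosen for illustration, and the value returned is the index within the 128 local registers taken from bits 8-2 of the byte address.

```python
LOCAL_REGISTERS = 128  # number of local registers used as the stack cache


def stack_cache_slot(byte_address):
    """Return bits 8-2 of a register-stack byte address (0..127)."""
    return (byte_address >> 2) & (LOCAL_REGISTERS - 1)


base = 0x4054  # illustrative word-aligned register-stack address
for address in (base, base + 4, base + 512):
    print(f"address 0x{address:04X} -> local-register slot {stack_cache_slot(address)}")
```

Consecutive words map to consecutive slots, while two addresses 512 bytes apart produce the same slot, which is why they cannot both be cached at the same time.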
Figure 1 shows a snapshot of the register stack in memory after some calls have been made, and the mapping of the register stack to the local registers. As shown in the figure, Global Register 1, called the Register Stack Pointer (RSP), contains the 32-bit virtual address of the top of the register stack in memory. This virtual address on the Am29000 is the lowest-addressed valid stack location in the current activation record.

Issue Date: 11/89

Figure 1. Mapping of Register Stack to Stack Cache (the register stack in memory, from the Start of Stack downward: spilled activation records, used locations, the current activation record, and free locations; the 512 bytes beginning at RAB are mapped to the local registers, with lr0 located by GR1 (RSP))

Local registers are addressed as positive word offsets from RSP, as in Figure 2. Specifically, when a local register operand is specified in an instruction (that is, the most significant bit of the register number is set), the seven least significant bits are added to bits 8-2 of RSP and the result is truncated to seven bits. For example, if RSP has the value 0, as shown in Figure 2, then lr0 is absolute register 128 (the first local register), and lr1 is absolute register 129 (the second local register); if RSP has the value four, then lr0 is absolute register 129 and lr1 is absolute register 130.

Referring again to Figure 1, the current activation record is delimited by the Frame Pointer (FP), which by software convention uses Local Register 1, and RSP. FP points to the "top" of the previous activation record, that is, to the lowest-addressed word location above the current activation record. When a procedure is active, this entire area must be cached in local registers.
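The RSP-relative addressing of local registers described above can be modeled with a short C function. This is a sketch under the stated rule, not part of the example program; the name local_to_absolute is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute register number (128..255) selected by local register
 * number lr (0..127) for a given RSP value: the seven-bit sum of lr
 * and bits 8-2 of RSP, with the local-register select bit (0x80) set. */
static unsigned local_to_absolute(unsigned lr, uint32_t rsp)
{
    assert(lr < 128u);
    return 0x80u | ((lr + (rsp >> 2)) & 0x7Fu);
}
```

With RSP equal to 0, lr0 and lr1 give 128 and 129; with RSP equal to 4 they give 129 and 130, and lr127 wraps around to absolute register 128.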
The register stack between FP and RFB (Register Free Bound) contains the saved activation records of previously called procedures, which are also currently mapped to the local-register cache. RFB, by convention Global Register 127, is set to point to the lowest-addressed word in the register stack that is not mapped to the local registers.

Figure 2. Local Register Addressing

The register stack between RSP and RAB (Register Allocate Bound) represents stack locations (and corresponding local registers) that are currently "unused" and thus available for allocation when another procedure is called. RAB (by convention Global Register 126) is set to point to the lowest-addressed word in the register stack that is currently mapped to a local register.

When a procedure is called, RSP is decremented by the number of words required to accommodate the called procedure's activation record. When RSP is decremented beyond the location pointed to by RAB and thus beyond the available local registers, more local registers will be required for the activation record, and some locations in the stack cache must be written to memory (or "spilled") before the new activation record is created. This condition is called overflow. Note that in Figure 1, locations between RFB and the Start of Stack are saved activation records that have been previously spilled to memory.

On return from a procedure, the activation record is de-allocated by incrementing RSP by the same amount it was decremented when the procedure was called.
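The overflow condition described above amounts to a single comparison between the decremented RSP and RAB. The following C sketch illustrates it; the function name spill_bytes_for_call and its parameters are hypothetical and do not appear in the example program.

```c
#include <stdint.h>

/* Number of bytes that must be spilled before a call that allocates
 * alloc_words words on the register stack. Returns 0 when the new
 * activation record still fits between RSP and RAB (no overflow). */
static uint32_t spill_bytes_for_call(uint32_t rsp, uint32_t rab,
                                     uint32_t alloc_words)
{
    uint32_t new_rsp = rsp - 4u * alloc_words;  /* stack grows downward */
    return (new_rsp < rab) ? (rab - new_rsp) : 0u;
}
```

A call that brings RSP exactly down to RAB does not overflow; only a decrement past RAB requires a spill.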
If the caller's FP (which points to the highest location in the caller's activation record) is greater than RFB (which points to the first unmapped register stack location above the activation record), the contents of that portion of the register stack will have to be loaded into the local registers to accommodate the caller's activation record. This condition is called underflow.

Overflow and underflow conditions are detected by instruction sequences in the prologue and epilogue, which are the instruction sequences that execute as a result of a procedure call and procedure return, respectively, and cause a transfer of control to the appropriate trap handler routine. In the case of an overflow, the trap handler moves the contents of the required number of local registers to the register stack in memory and adjusts the value in RAB and RFB. In the case of an underflow, the trap handler loads the required number of register stack locations into the local registers and adjusts the value in RAB and RFB.

OVERVIEW OF EXAMPLE PROGRAM

Our example program consists of the four text files listed below.

    regdcl.h:   Register name declarations
    macros.h:   Macro definitions for prologue and epilogue
    start.s:    CPU initialization; overflow and underflow trap handler routines
    example.s:  Two procedures, main and recurse

Appendix A contains partial listings from the example program that are described individually in the sub-sections below. Appendix B contains the source for the entire example program, which includes all of the above files.

INCLUDE FILES

There are two include files, regdcl.h and macros.h. Note that regdcl.h must be included before macros.h, because macros.h uses definitions from regdcl.h.

In regdcl.h (see Appendix A-1, Register Declarations), we assign the value 80 as the base of registers to be used as temporaries by system software. Additional temporaries will be addressed as offsets from it. These registers will be used for work space in the start code and the two trap handler routines.

    .equ    SYS_TMP, 80         ;system temp registers

We also assign symbolic names to global and local registers, in accordance with the software calling conventions of the Am29000.

    .reg    rsp, gr1            ;local reg stack pointer
    .reg    msp, gr125          ;memory stack pointer
    .reg    rab, gr126          ;register allocate bound
    .reg    rfb, gr127          ;register free bound
    .reg    fp, lr1             ;frame pointer
    .reg    raddr, lr0          ;return address

The overflow and underflow trap vectors, V_SPILL and V_FILL, are set to the constant values 64 and 65. These are the vector numbers for the trap handlers chosen for this example.

    .equ    V_SPILL, 64
    .equ    V_FILL, 65

The second include file in our example program, macros.h, contains the macro definitions for PROLOGUE and EPILOGUE. These macros are discussed in the Prologue and Epilogue sections.

START CODE

The module start.s contains code that sets up the execution environment for our example program. The initial portion of the start code is shown in Appendix A-2, Start Code. The overflow and underflow trap handlers, also in start.s, will be discussed later.

We set the beginning of the stack (its highest address in memory) at 0x5000. The "& ~7" in the expression ensures that the value is a multiple of eight, with rounding downward if necessary.

    .equ    TOP_STK, (0x5000 & ~7)      ;create double word alignment

The two temporary registers, tmp1 and tmp2, are assigned values that are offsets of SYS_TMP, which means that tmp1 is Global Register 80, and tmp2 is Global Register 81.

    .reg    tmp1, %%(SYS_TMP + 0)
    .reg    tmp2, %%(SYS_TMP + 1)

Then we initialize the four pointers that define the stack environment.

    const   rsp, (TOP_STK-8)    ;set stack pointer
    add     rsp, rsp, 0         ;update rsp
    const   rab, (TOP_STK-512)  ;set register alloc bound
    const   fp, TOP_STK         ;set frame ptr
    const   rfb, TOP_STK        ;set reg free bound

Figure 3 shows the initialized stack.
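The alignment expression and the four initial pointer values set up by the start code can be checked with a small C model. This is a sketch only; the struct stack_regs and the functions align_down8 and initial_stack are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* The four pointers that define the register-stack environment. */
struct stack_regs {
    uint32_t rsp;  /* register stack pointer (gr1)    */
    uint32_t rab;  /* register allocate bound (gr126) */
    uint32_t fp;   /* frame pointer (lr1)             */
    uint32_t rfb;  /* register free bound (gr127)     */
};

/* Round an address down to a multiple of eight, as "& ~7" does. */
static uint32_t align_down8(uint32_t addr)
{
    return addr & ~(uint32_t)7u;
}

/* Pointer values established by the start code for a given stack top. */
static struct stack_regs initial_stack(uint32_t top)
{
    struct stack_regs s;
    uint32_t top_stk = align_down8(top);
    s.rsp = top_stk - 8u;    /* room for the caller's FP and raddr   */
    s.rab = top_stk - 512u;  /* 128 words are available in the cache */
    s.fp  = top_stk;
    s.rfb = top_stk;
    return s;
}
```

With the stack top at 0x5000, this gives RSP = 0x4FF8 and RAB = 0x4E00, with FP and RFB both at 0x5000.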
Because there has been no spilling of local registers to the stack in memory, RFB points to the top of the stack. RAB is, by definition, 512 bytes less than RFB. In the initial activation record, defined by FP and RSP, FP points to the top of the stack (because there has been no prior context) and RSP is set to a value eight bytes less than FP to allow for the current FP and raddr when a new activation record is created. Note that the setting of RSP must precede the setting of FP by at least two instructions because of the delayed effect of modifying RSP, and that an explicit arithmetic or logical instruction must be used to update RSP.

Figure 3. Initialized Stack

The CPS (Current Processor Status Register) is initialized with the value 0x0072. Assuming the prior state of this register was Reset mode (shown in Figure 4), we have in effect cleared FZ, DA, and RE, and left the other bits unchanged. The FZ (Freeze) bit is cleared because the processor is unfrozen for normal operation. (For a description of the Freeze bit, refer to the section called "Special-Purpose Registers" in the Am29000 User's Manual.) We clear the DA (Disable All Interrupts and Traps) bit to enable all traps. The RE (ROM Enable) bit is cleared because this example assumes we are executing from RAM.

PD, PI, SM, and DI remain set, meaning that address translation is disabled (PD and PI), supervisor mode is selected (SM), and external interrupts are disabled (DI). Supervisor mode is selected because some of the instructions in our example program are privileged. Address translation is disabled because this example is designed for systems not using the TLB. External interrupts are disabled because we have no interrupting devices and want to eliminate any spurious interrupt requests.

    mtsrim  cps, 0x72           ;PD, PI, SM, DI

Figure 4. Current Processor Status Register in Reset Mode

We set the Vector Fetch bit in the Configuration Register to select a vector table configuration for the Vector Area.

    mtsrim  cfg, 0x10           ;VF

The VAB (Vector Area Base Address) register, which specifies the beginning address of the vector table in memory, is set to zero.

    mtsrim  vab, 0

Next we initialize the vector table with the address of the overflow trap handler routine, called SpillHandler. First we load the address of SpillHandler into a temporary register, using two instructions (CONST and CONSTH) for the case when SpillHandler is not in the first 64K bytes of memory.

    const   tmp1, SpillHandler
    consth  tmp1, SpillHandler

Because each entry in the vector table is four bytes, we compute the address in the vector table by multiplying the vector number V_SPILL (64) by four (a shift left by two).

    const   tmp2, V_SPILL
    sll     tmp2, tmp2, 2       ;compute vector address

Then we store the address of SpillHandler (in tmp1) into the vector table address we just computed.

    store   0, 0, tmp1, tmp2    ;write spill vector

Initializing the vector table with the address of the underflow trap handler routine (vector number V_FILL) is done the same way:

    const   tmp1, FillHandler
    consth  tmp1, FillHandler
    const   tmp2, V_FILL
    sll     tmp2, tmp2, 2       ;compute vect addr
    store   0, 0, tmp1, tmp2    ;write fill vector

The procedure start then calls main, passing it the return address (lr0). A NOP follows the call because the Am29000 always executes one instruction beyond a call instruction before the call is taken.

    call    raddr, main
    nop
    halt                        ;halt after successful completion

EXAMPLE FUNCTIONS MAIN() AND RECURSE()

After the start code has executed, control is passed to the procedure main(). The purpose of main() is to call the procedure recurse(),
providing it with an initial set of values. Recurse() calls itself a total of 86 times, then returns to itself 86 times before returning to main(). An overflow condition occurs with the 21st call, and each subsequent call causes an additional spill of local registers to memory. When the program returns, the 22nd return causes an underflow condition, and each subsequent return causes an additional fill from memory to the local registers.

The basic operation of main() and recurse() is summarized by the following C program:

    main()
    {
        recurse(1,42);
    }

    recurse(n,m)
    int n,m;
    {
        int i,j;
        if (n > 85) return;
        i = n + 1;
        recurse(i,m);
    }

The code for main() and recurse() is shown in Appendix A-3 and A-4, Code for Main() and Code for Recurse(), respectively.

PROLOGUE

As with all Am29000 procedures, main() begins with a prologue. The macro definition of PROLOGUE and the expansion of PROLOGUE for main() are shown in Appendix A-5 and A-6, Prologue Macro and Prologue Expansion for Main(), respectively.

The purpose of PROLOGUE is to allocate an activation record and check for overflow before the body of the procedure is executed. It is invoked with three parameters: the number of arguments passed (INCNT), the number of registers required for the procedure's local variables (LOCCNT), and the maximum number of arguments that the procedure may pass to any one function it in turn calls (OUTCNT).

    .macro  PROLOGUE,INCNT,LOCCNT,OUTCNT

The values of ALLOC_CNT and SIZE_CNT are computed from the parameters.

    .set    ALLOC_CNT, ((2+OUTCNT+LOCCNT+1)&~1)
    .set    SIZE_CNT, (ALLOC_CNT+2+INCNT)

ALLOC_CNT is the amount of space on the stack that must be newly allocated by the Prologue for the procedure's activation record. SIZE_CNT is the amount of space that must be accessible by the procedure, that is, the size of its activation record.

The expression for ALLOC_CNT does not use INCNT, because incoming parameters were already allocated space on the stack as the outgoing parameters (OUTCNT) of the calling procedure. "2" is the number of words needed for the called procedure's FP and return address when it calls another procedure. ANDing the expression with the complement of 1 (& ~1) maintains double-word alignment on the stack by setting the least significant bit to zero. The "+ 1" ensures that the amount is rounded up, not down.

The expression for SIZE_CNT includes INCNT and two additional words for lr0 (return address) and FP of the caller.

The three macro variables, IN_PRM, LOC_REG, and OUT_PRM, are used to establish offsets into the stack for input, local, and output arguments. These macro variables are set only if the corresponding value of the parameter is not equal to zero.

    .if (INCNT)
      .set  IN_PRM, (2 + ALLOC_CNT + 0x80)
    .endif
    .if (LOCCNT)
      .set  LOC_REG, (2 + OUTCNT + 0x80)
    .endif
    .if (OUTCNT)
      .set  OUT_PRM, (2 + 0x80)
    .endif

In the above, a macro variable is set equal to an expression that is evaluated to a local register number when the program is assembled. The macro variable can then be used as the base register for offset addressing of parameters of that type (as shown in Figure 5). The "0x80" provides the 128-word offset required for a local register access.

Figure 5. Prologue Parameters (from higher to lower register-stack addresses: IN_PRM+1, IN_PRM+0, old fp, old raddr, LOC_REG+1, LOC_REG+0, OUT_PRM+1, OUT_PRM+0, fp, raddr, with RSP pointing to raddr)

The body of the PROLOGUE macro has three instructions:

    sub     rsp, rsp, (ALLOC_CNT << 2)
    asgeu   V_SPILL, rsp, rab
    add     fp, rsp, (SIZE_CNT << 2)

In the above instructions, ALLOC_CNT and SIZE_CNT are shifted left by two to convert them from word quantities to the required byte quantities (the stack registers, whose contents will be modified, contain byte addresses).

The first instruction allocates an activation record by decrementing RSP by the amount ALLOC_CNT.

The second instruction asserts that RSP of the new activation record is greater than or equal to RAB. If this is not the case (that is, RSP has been decremented beyond RAB), an overflow trap occurs, and there is a transfer of control to the trap handler routine, SpillHandler, pointed to by the vector V_SPILL. The trap handler will move (spill) the contents of the required number of local registers to the register stack in memory and adjust RFB and RAB, as described in the Overflow Trap Handler section.

The third instruction sets FP to point to the location just above the new activation record, so it can be used for underflow checking in the EPILOGUE macro of a procedure that is called by this procedure (see Epilogue section).

After the prologue, main() calls recurse(). The expansion of PROLOGUE for recurse() is shown in Appendix A-7, Prologue Expansion for Recurse().

OVERFLOW TRAP HANDLER

On the 21st call to itself, recurse() causes an overflow trap. The code that services this trap is shown in Appendix A-8, Overflow Trap Handler, and is described below.

In the following discussion of SpillHandler, we assume the reader is familiar with the processor's response to traps. If not, refer to the section called Interrupt and Trap Handling in the Am29000 User's Manual.

The first three .reg directives assign symbolic names to the three temporary system registers used by SpillHandler.

    .reg    R_Cnt, %%(SYS_TMP+0)        ;temp for count
    .reg    R_TmpPC0, %%(SYS_TMP+1)     ;temp for PC0
    .reg    R_TmpPC1, %%(SYS_TMP+2)     ;temp for PC1

The old PCs are saved in two of the temporary registers just declared.

    mfsr    R_TmpPC0, pc0       ;save the PCs
    mfsr    R_TmpPC1, pc1

The CPS (Current Processor Status Register) is set to the value 0x73. This clears the FZ (Freeze) bit, which was set by hardware when the trap was taken (see Figure 6), so that the trap handler can execute a Store Multiple instruction. (Note that the PCs must be saved before the FZ bit is cleared.) The DA (Disable All Interrupts and Traps) bit remains set, which prevents the processor from taking any traps except the *WARN, Instruction Access Exception, Data Access Exception, and Coprocessor Exception traps. PD, PI, SM, and DI also remain set.

    mtsrim  cps, 0x73           ;PD, PI, SM, DI, DA

Figure 6. Current Processor Status Register After an Interrupt or Trap

Now we can use the Store Multiple instruction to store the required number of local registers into the register stack in memory. This instruction requires a source, a destination, and a count.

As explained earlier (and shown in Figure 1), the area between RSP and RAB represents the local registers available for allocation when a procedure is called. Because there has been an overflow and RSP has been decremented beyond RAB, we can compute the size of the required spill (the count for the Store Multiple) by subtracting RSP from RAB.

    sub     R_Cnt, rab, rsp     ;R_Cnt = number of bytes to spill

Then we use R_Cnt to adjust RFB, so that it correctly reflects the area in the register stack that will be mapped to the local registers.

    sub     rfb, rfb, R_Cnt     ;move down the frame bound

The local registers that have to be spilled are those corresponding to register-stack locations between RSP and RAB, because they are the local registers that must be occupied by the new activation record. So the instruction source will be lr0, which corresponds to RSP. The instruction's destination will be the register-stack location pointed to by the previously modified RFB, because that is the register-stack location at the correct 512-byte offset from RSP.

    storem  0, 0, lr0, rfb      ;spill from the allocated area

Then we set RAB to point to the top of stack, because that is now the lowest stack address currently cached in local registers.
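The pointer arithmetic of PROLOGUE and of the overflow handler described above can be traced with a small C model. This is a sketch under the values used in this note (stack top at 0x5000, PROLOGUE 0,0,2 for main and PROLOGUE 2,2,2 for recurse); it models only RSP, RAB, and RFB, not the register contents, and all C names are hypothetical. Calls to recurse are counted from 1, beginning with the call made by main.

```c
#include <assert.h>
#include <stdint.h>

struct stack_regs { uint32_t rsp, rab, rfb; };

/* Words newly allocated by PROLOGUE: two linkage words plus outgoing
 * arguments and locals, rounded up to an even number of words. */
static unsigned alloc_cnt(unsigned loccnt, unsigned outcnt)
{
    return (2u + outcnt + loccnt + 1u) & ~1u;
}

/* Size of the whole activation record in words. */
static unsigned size_cnt(unsigned incnt, unsigned loccnt, unsigned outcnt)
{
    return alloc_cnt(loccnt, outcnt) + 2u + incnt;
}

/* Pointer updates made by the overflow handler; returns words spilled. */
static uint32_t spill(struct stack_regs *s)
{
    uint32_t bytes;
    assert(s->rsp < s->rab);   /* only reached after a failed asgeu */
    bytes = s->rab - s->rsp;   /* sub R_Cnt, rab, rsp */
    s->rfb -= bytes;           /* sub rfb, rfb, R_Cnt */
    s->rab = s->rsp;           /* add rab, rsp, 0     */
    return bytes >> 2;
}

/* The sub and asgeu steps of PROLOGUE; returns words spilled (0 if none). */
static uint32_t prologue(struct stack_regs *s, unsigned loccnt, unsigned outcnt)
{
    s->rsp -= (uint32_t)alloc_cnt(loccnt, outcnt) << 2;
    return (s->rsp < s->rab) ? spill(s) : 0u;
}

/* Runs main's prologue and then depth calls of recurse; returns the
 * number of the first call that overflowed (0 if none did). */
static unsigned first_spilling_call(struct stack_regs *s, unsigned depth)
{
    unsigned call, first = 0u;
    s->rsp = 0x5000u - 8u;
    s->rab = 0x5000u - 512u;
    s->rfb = 0x5000u;
    prologue(s, 0u, 2u);                    /* main: PROLOGUE 0,0,2 */
    for (call = 1u; call <= depth; ++call)
        if (prologue(s, 2u, 2u) != 0u && first == 0u)
            first = call;                   /* recurse: PROLOGUE 2,2,2 */
    return first;
}
```

Under these assumptions main allocates 4 words and each call of recurse allocates 6, so 20 calls fit in the 488 bytes left between RSP and RAB after main's prologue and the 21st call is the first to trap. After every spill RFB remains exactly 512 bytes above RAB.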
The instruction that moves RAB down to RSP is:

    add     rab, rsp, 0         ;move down the allocate bound

Note that before the Store Multiple instruction is executed, R_Cnt must be written as a word amount into the CR field of the Channel Control register, which is used by the processor to determine the number of stores to memory. So we convert R_Cnt from a byte to a word amount using the Shift Right Logical instruction.

    srl     R_Cnt, R_Cnt, 2     ;R_Cnt = count of words to spill

Because the CR field is zero-based, we subtract one from R_Cnt

    sub     R_Cnt, R_Cnt, 1     ;correct for storem

and then use the Move to Special Register instruction to write it to the CR field.

    mtsr    cr, R_Cnt           ;set up count for storem

In the handler itself, these three instructions precede the Store Multiple instruction (see Appendix A-8).

We set CPS to the value 0x473. This sets the FZ bit, which must be set before we restore PC0 and PC1. PD, PI, SM, DI, and DA remain set.

    mtsrim  cps, 0x473          ;FZ, PD, PI, SM, DI, DA

Then the two PCs are restored and the IRET (Interrupt Return) instruction restores the previous contents of CPS from the Old Processor Status Register, unfreezes the processor, and begins fetching from PC0 and PC1.

    mtsr    pc0, R_TmpPC0       ;restore the PCs
    mtsr    pc1, R_TmpPC1
    iret

EPILOGUE

When recurse has called itself 86 times, it returns and executes an Epilogue. The EPILOGUE macro is shown in Appendix A-9, EPILOGUE Macro.

EPILOGUE's first instruction de-allocates the procedure's activation record by adding ALLOC_CNT to RSP. This is followed by a NOP, because a change in the value of RSP must be separated by at least one cycle from an instruction that references a local register (in this case, the instruction JMPI, whose operand raddr is lr0).

    add     rsp, rsp, (ALLOC_CNT << 2)
    nop
    jmpi    raddr

Before the Jump Indirect instruction finishes executing, the next instruction, ASLEU, is executed. This instruction asserts that the caller's FP, now restored because the caller's RSP has been restored, is less than or equal to RFB. If the assertion is false (which means that FP is pointing to an unmapped, previously spilled register-stack location), an underflow trap occurs, and control is transferred to the trap handler routine, FillHandler, pointed to by the vector V_FILL. The trap handler will move the contents of locations in the register stack to the local registers and adjust RAB and RFB, as described in the Underflow Trap Handler section.

    asleu   V_FILL, fp, rfb

At the end of the Epilogue, the parameters are set to an illegal value. This ensures that if they are used again before they are explicitly set, an assembly-time error will be reported.

    .set    IN_PRM, (1024)      ;illegal, to cause err on ref
    .set    LOC_REG, (1024)     ;illegal, to cause err on ref
    .set    OUT_PRM, (1024)     ;illegal, to cause err on ref
    .set    ALLOC_CNT, (1024)   ;illegal, to cause err on ref

The expansion of EPILOGUE for recurse() is shown in Appendix A-10, Epilogue Expansion for Recurse().

UNDERFLOW TRAP HANDLER

On the 22nd return of recurse() to itself, an underflow trap occurs. The code that services this trap is shown in Appendix A-11, Underflow Trap Handler, and is discussed below.

The two old PCs are saved in temporary registers declared in the SpillHandler routine.

    mfsr    R_TmpPC0, pc0       ;save the PCs
    mfsr    R_TmpPC1, pc1

The CPS (Current Processor Status Register) is set to the value 0x73. This clears the FZ bit, so that the trap handler can execute a Load Multiple instruction. The DA bit remains set, which prevents the processor from taking any traps except the *WARN, Instruction Access Exception, Data Access Exception, and Coprocessor Exception traps. PD, PI, SM, and DI also remain set.

    mtsrim  cps, 0x73           ;PD, PI, SM, DI, DA

We will use the Load Multiple instruction to load locations in the register stack into the local registers. The Load Multiple instruction requires a source, a destination, and a count.

Clearly, the source for the Load Multiple instruction is the location pointed to by RFB, since RFB points to the first location in the register stack that was previously spilled from the local registers.

The destination of the Load Multiple instruction will, of course, be the local register corresponding to RFB. Local registers may be specified as instruction operands in one of two ways: using a local register number (in the range from 0 to 127), or using the absolute register number (in the range 128 to 255) in an Indirect Pointer Register. With the first method, the local register number is computed as a positive word offset of RSP. This option is not available to us because the trap handler has no way of knowing the offset from RSP (that is, the local register number) corresponding to RFB. So we will convert the address in RFB to an absolute local register number, put this number in Indirect Pointer A (because the destination operand uses Indirect Pointer A), and then specify Global Register 0 (which indicates an indirect pointer access) as the destination register in the Load Multiple instruction.

To convert the address in RFB to an absolute local register number, we OR it with 512. This sets bit 9, which selects a local register; bits 2-8 give the absolute local register number.

    const   R_Cnt, 512
    or      R_Cnt, R_Cnt, rfb   ;make local reg ip from rfb

Then we use the Move To Special Register instruction to put this value in the Indirect Pointer A Register.

    mtsr    ipa, R_Cnt          ;set up indirect ptr for loadm

Recalling that the underflow trap was signaled because FP is pointing to an unmapped and previously spilled register stack location at a higher memory address than RFB, we can compute the size of the required fill by subtracting RFB from FP.

    sub     R_Cnt, fp, rfb      ;R_Cnt = # of bytes to fill

We use the just-computed value to adjust RAB, so that it correctly points to the new lower bound of the register stack mapped to local registers. We perform this operation now because it requires a byte amount, and R_Cnt will be converted to a word amount in the next instruction.

    add     rab, rab, R_Cnt     ;move up the allocate bound

Before use of the Load Multiple instruction, the count must be written as a word amount into the CR field of the Channel Control Register. Hence, we convert R_Cnt from a byte to a word amount using the Shift Right instruction.

    srl     R_Cnt, R_Cnt, 2     ;R_Cnt = number of words to fill

Because the CR field is zero-based, we subtract one from R_Cnt

    sub     R_Cnt, R_Cnt, 1     ;correct for loadm

and then use the Move to Special Register instruction to write it to the CR field.

    mtsr    cr, R_Cnt           ;set up count for loadm

Now we use the Load Multiple instruction to transfer the contents of the register stack in memory to the local registers, specifying RFB as the address in the register stack from which to load, and gr0 (Indirect Pointer A) as the local register number at which to begin the fill.

    loadm   0, 0, gr0, rfb      ;fill area freed

After the registers have been filled, we update RFB so that it correctly points to the upper bound of the register stack that is currently cached.

    add     rfb, fp, 0          ;move up frame bound

We set CPS to the value 0x473. This sets the FZ bit, which must be set before we restore PC0 and PC1. PD, PI, SM, DI, and DA remain set.

    mtsrim  cps, 0x473          ;FZ, PD, PI, SM, DI, DA

Then the two PCs are restored and the IRET (Interrupt Return) instruction restores the previous contents of CPS, unfreezes the processor, and begins fetching from PC0 and PC1.

    mtsr    pc0, R_TmpPC0       ;restore the PCs
    mtsr    pc1, R_TmpPC1
    iret

APPENDIX A: PARTIAL LISTINGS EXTRACTED FROM EXAMPLE PROGRAM

A-1.
REGISTER DECLARATIONS ,-----------------------------------------------------------------------; Global registers ,-----------------------------------------------------------------------.equ .reg .reg .reg .reg SYS _TMP, 80 rsp, grl msp, gr125 rab, gr126 rfb, gr127 system temp registers local register stack pointer memory stack pointer register allocate bound register free bound ,-----------------------------------------------------------------------; Local compiler registers (only valid if frame has been established) .reg .reg fp, lrl raddr, IrO frame pointer ; return address i----------------------------------------------------- -----------------; Vectors ,-----------------------------------------------------------------------.equ V_SPILL, 64 .equ V_FILL, 65 3-31 29K Family Application Notes A-2. START CODE . include .equ .text "regdcl.h" TOP_STK, (Ox5000 .global start & -7) create double word aligned value start: .reg .reg const add const const const tmp1, %%(SYS_TMP + 0) tmp2, %%(SYS_TMP + 1) rsp, (TOP STK-8) rsp,rsp,O rab, (TOP STK-512) fp,TOP_STK rfb,TOP_STK set correct mode cps, Ox72 mtsrim cfg, Ox10 mtsrim mtsrim vab,O connect up spill const consth const sll store handler tmp1,SpillHandler tmp1,SpillHandler tmp2,V_SPILL tmp2,tmp2,2 0,0,tmp1,tmp2 set set set set set stack ptr shadow rsp reg alloc bound frame ptr reg free bound PD, PI, SM, DI VF compute vect addr write spill vector connect up fill handler tmp1,FillHandler const tmp1,FillHandler consth tmp2,V_FILL const tmp2,tmp2,2 sll 0,0,tmp1,tmp2 store compute vect addr write fill vector call main program call raddr,main nop halt halt after successful completion 3-32 Implementation of an Am29000 Stack Cache A-3. CODE FOR MAINO . include . 
include .global "regdcl.h" "macros.h" main main () recurse(1,42); main: PROLOGUE 0,0,2 invoke macro ° ic, ° loc, 2 og name outgoing args .reg M_out n, %%(OUT_PRM + 0) .reg M_out_m, %%(OUT_PRM + 1) recurse(1,42) const call const M_out_m, 42 raddr,recurse M_out_n, 1 EPILOGUE 3·33 29K Family Application Notes A-4. CODE FOR RECURSE(} .global recurse recurse (n,m) { int i, j; if (n > 85) return; i = n + 1; recurse(i,m); recurse: PROLOGUE 2,2,2 invoke macro 2 ic, 2 loc, 2 og name ic args .reg .reg R_in_n, %%(IN_PRM + 0) R_in_m, %%(IN_PRM + 1) name locals .reg .reg R_i, %%(LOC_REG + 0) R_j, %%(LOC_REG + 1) name outgoing args .reg R_out n, %%(OUT_PRM + 0) .reg R_out_m, %%(OUT_PRM + 1) name temporary register R_tmp, IrO .reg i f (n > 85) cpgt jmpt i = n return R_tmp, R_in_n, 85 R_tmp, rec_01 + 1 add recurse(i,m) add call add EPILOGUE 3-34 R_out_m, R_in_m, raddr, recurse R_out_n, R_i, ° ° Implementation of an Am29000 Stack Cache A-S. PROLOGUE MACRO macro PROLOGUE Parameters: INCNT LOCCNT OUTCNT .set .set .set .endif input parameter count local register count output parameter count ALLOC_CNT, «2 + OUTCNT + LOCCNT + 1) & -1) SIZE_CNT, (ALLOC_CNT + 2 + INCNT) IN_PRM, (2 + ALLOC_CNT + Ox80) (LOCCNT) .set .endif LOC_REG, (2 + OUTCNT + Dx8D) (OUTCNT) .set .endif OUT_PRM, (2 + Ox8D) . if . if sub asgeu add .endm rsp, rsp, (ALLOC_CNT« 2) V_SPILL, rsp, rab fp, rsp, (SIZE_CNT« 2) A-6. PROLOGUE EXPANSION FOR MAINO main: PROLOGUE .set .set .set sub asgeu add ; invoke macro 0,0,2 ALLOC_CNT, «2 + 2 + 0 + 1) & -1) SIZE_CNT, (ALLOC_CNT + 2 + 0) OUT_PRM, (2 + Ox80) rsp, rsp, (ALLOC_CNT « 2) V_SPILL, rsp, rab fp, rsp, (SIZE_CNT « 2) 3-35 29K Family Application Notes A-7. PROLOGUE EXPANSION FOR RECURSEO recurse: PROLOGUE 2,2,2 ; invoke macro .set .set .set .set .set ALLOC_CNT, «2 + 2 + 2 + 1) & -1) SIZE_CNT, (ALLOC_CNT + 2 + 2) IN_PRM, (2 + ALLOC_CNT + Ox80) LOC_REG, (2 + 2 + Ox80) OUT_PRM, (2 + Ox80) sub asgeu add rsp, rsp, (ALLOC_CNT« 2) V_SPILL, rsp, rab fp, rsp, (SIZE_CNT« 2) A-8. 
OVERFLOW TRAP HANDLER .reg .reg .reg R_Cnt, %%(SYS_TMP + 0) R_TmpPCO,%%(SYS_TMP + 1) R_TmpPC1,%%(SYS_TMP + 2) .global SpillHandler temp for count (shared) temp for PCO temp for PC1 SpillHandler: This routine handles a false assertion in the standard prologue. In: rab > rsp Ir1 <= rfb rfb rab + 512 (requiring an allocation) Out: rab == rsp Ir1 <= rfb rfb rab + 512 (just enough allocated) mfsr mfsr mtsrim sub sub srI sub mtsr storem add mtsrim mtsr mtsr iret 3-36 R_TmpPCO, pcO R_TmpPC1, pc1 cps, Ox73 R_Cnt, rab, rsp rfb, rfb, R_Cnt R_Cnt, R_Cnt, 2 R_Cnt, R_Cnt, 1 cr, R_Cnt 0, 0, IrO, rfb rab, rsp, 0 cps, Ox473 pcO, R_TmpPCO pc1, R_TmpPC1 save the PCs PD, PI, SM, DI, DA R_Cnt = of bytes to spill move down the frame bound R_Cnt = count of words to spill correct for storem set up count for storem spill from the allocated area move down the allocate bound FZ, PD, PI, SM, DI, DA restore the PCs * Implementation of an Am29000 Stack Cache A-9. EPILOGUE MACRO ; macro EPILOGUE .macro EPILOGUE add nop jmpi asleu .else jmpi nop .endif . set .set .set .set .endm rsp, rsp, (ALLOC_CNT« 2) raddr V_FILL, fp, rfb raddr illegal, illegal, illegal, illegal, IN_PRM, (1024) LOC_REG, (1024) OUT_PRM, (1024) ALLOC_CNT, (1024) to to to to cause cause cause cause err err err err on on on on ref ref ref ref A-10. EPILOGUE EXPANSION FOR RECURSEO EPILOGUE add rsp, rsp, (ALLOC_CNT« nop jmpi raddr asleu V_FILL, fp, rfb 2) 3-37 29K Family Application Notes A-11. UNDERFLOW TRAP HANDLER .global FillHandler FillHandler: iThis routine handles a false assertion in the standard epilogue. 
iIn: iOut: lrl > rfb rsp >= rab rfb rab + 512 lrl == rfb rsp >= rab rfb rab + 512 mfsr mfsr mtsrim const or mtsr sub add srI sub mtsr loadm add mtsrim mtsr mtsr iret 3·38 (requiring de-allocation) (just enough freed) R_TmpPCO, pcO R_TmpPC1, pcl cps, Ox73 R_Cnt, 512 R_Cnt, R_Cnt, rfb ipa, R_Cnt R_Cnt, lrl, rfb rab, rab, R_Cnt R_Cnt, R_Cnt, 2 R_Cnt, R_Cnt, 1 cr, R_Cnt 0, 0, grO, rfb rfb, lrl, 0 cps, Ox473 pcO, R_TmpPCO pcl, R_TmpPCl save the, PCs PO, PI, 8M, DI, DA make local reg ip from rfb set up indirect ptr for loadm R_Cnt = # of bytes to fill move up the allocate bound R_Cnt = number of words to correct for loadm set up count for loadm fill area freed move up frame bound FZ, PO, PI, 8M, DI, DA restore the PCs Implementation of an Am29000 Stack Cache APPENDIX B: COMPLETE LISTING OF EXAMPLE PROGRAM . include "regdcl.h" .equ TOP_STK, (Ox5000 & -7) ;create double word ;aligned value .text .global start start: .reg .reg tmp1, (SYS_TMP + 0) tmp2, (SYS_TMP + 1) const const const const rsp, (TOP_STK-8) rab, (TOP_STK-512) fp,TOP_STK rfb,TOP_STK ;set correct mode mtsrim cps, Ox72 mtsrim cfg, Ox10 mtsrim vab,O ;set ;set ;set ;set stack ptr reg alloc bound frame ptr reg free bound ;PD, PI, SM, DI ;VF ; connect up spill handler const tmp1,SpillHandler consth tmp1,SpillHandler const tmp2,V_SPILL sll tmp2,tmp2,2 store 0,0,tmp1,tmp2 ;compute vect addr ;write spill vector ;connect up fill handler const tmp1,FillHandler consth tmp1,FillHandler const tmp2,V_FILL sll tmp2,tmp2,2 store 0,0,tmp1,tmp2 ;compute vect addr ;write fill vector ;call main program call raddr,main nop halt ;halt after successful completion ;The routines below handle overflow and underflow conditions. iThe temps which they use are given below. 
        .reg    R_Cnt, (SYS_TMP + 0)    ;temp for count (shared)
        .reg    R_TmpPC0, (SYS_TMP + 1) ;temp for PC0
        .reg    R_TmpPC1, (SYS_TMP + 2) ;temp for PC1

        .global SpillHandler
SpillHandler:
;This routine handles a failed assertion in the standard prologue
;In:    rab > rsp           (requiring an allocation)
;       fp <= rfb
;       rfb == rab + 512
;Out:   rab == rsp          (just enough allocated)
;       fp <= rfb
;       rfb == rab + 512
        mfsr    R_TmpPC0, pc0           ;save the PCs
        mfsr    R_TmpPC1, pc1

        mtsrim  cps, 0x73               ;PD, PI, SM, DI, DA

        sub     R_Cnt, rab, rsp         ;R_Cnt = # of bytes to spill
        sub     rfb, rfb, R_Cnt         ;move down the frame bound
        srl     R_Cnt, R_Cnt, 2         ;R_Cnt = count of words to spill
        sub     R_Cnt, R_Cnt, 1         ;correct for storem
        mtsr    cr, R_Cnt               ;set up count for storem
        storem  0, 0, lr0, rfb          ;spill from the allocated area
        add     rab, rsp, 0             ;move down the allocate bound

        mtsrim  cps, 0x473              ;FZ, PD, PI, SM, DI, DA

        mtsr    pc0, R_TmpPC0           ;restore the PCs
        mtsr    pc1, R_TmpPC1
        iret

        .global FillHandler
FillHandler:
;This routine handles a failed assertion in the standard epilogue
;In:    fp > rfb            (requiring de-allocation)
;       rsp >= rab
;       rfb == rab + 512
;Out:   fp == rfb           (just enough freed)
;       rsp >= rab
;       rfb == rab + 512
        mfsr    R_TmpPC0, pc0           ;save the PCs
        mfsr    R_TmpPC1, pc1

        mtsrim  cps, 0x73               ;PD, PI, SM, DI, DA

        const   R_Cnt, 512              ;make local reg ip
        or      R_Cnt, R_Cnt, rfb       ;from rfb
        mtsr    ipa, R_Cnt              ;set up indirect ptr for loadm

        sub     R_Cnt, fp, rfb          ;R_Cnt = # of bytes to fill
        add     rab, rab, R_Cnt         ;move up the allocate bound
        srl     R_Cnt, R_Cnt, 2         ;R_Cnt = number of words to fill
        sub     R_Cnt, R_Cnt, 1         ;correct for loadm
        mtsr    cr, R_Cnt               ;set up count for loadm
        loadm   0, 0, gr0, rfb          ;fill area freed
        add     rfb, fp, 0              ;move up frame bound

        mtsrim  cps, 0x473              ;FZ, PD, PI, SM, DI, DA

        mtsr    pc0, R_TmpPC0           ;restore the PCs
        mtsr    pc1, R_TmpPC1
        iret

Introduction to the Am29000 Development Tools
Application Note by Doug Kern and Douglas Walton

INTRODUCTION

The
development of a microprocessor-based system is a complicated and detailed undertaking that requires skilled personnel and efficient test equipment. Because of the sophistication of modern microprocessing systems, they usually cannot be flawlessly designed on the first iteration, and nearly always require extensive debugging and testing time.

Experienced developers know that few designs function perfectly at power-up. Faults occur due to erroneous logic, poor assembly, or defective parts, so some debugging is virtually always necessary. Therefore, every effort should be made to plan the debugging and testing process before the first prototype is built. Without advance planning, the designer may find that the circuit either cannot be successfully debugged, or that the necessary debug time is prohibitive.

Planners should keep in mind that testing and debugging continues throughout the life of the product. Because different phases in the product life cycle have different characteristics, the requirements for each must be considered. The major phases of the product life cycle are development, production (pilot, limited, and large-scale), and field service.

Apart from the skill of the personnel, the efficiency of test equipment is a critical area that affects the testing time in every phase. Outdated or ineffective equipment will slow down even the most highly trained personnel. More importantly, expensive, state-of-the-art test equipment will be wasted if its use is not preplanned. Careful consideration must be given to the type of equipment needed to service the product, as well as its cost and how it will be disbursed to the field.

actual system hardware. They normally are used with a prototype or production system to determine the cause of failure, and are distinguished from the 29K tools used to prepare programs for execution on a target system (see the 29K Tool Chain section).
Figure 1 shows the relationship of these development tools to the application and each other. The components are described below:

ADAPT29K: Advanced Development and Prototyping Tool. ADAPT29K is a standalone system that interfaces to the application like an in-circuit emulator. It provides a wide range of debugging functions without intruding on the application's execution.

MON29K: Target Resident Monitor. MON29K is a monitor program that executes on the target Am29000. It provides many of the same debugging functions as the ADAPT29K, even though it is a software product.

XRAY29K: Source-Level Debugger.

Specific help on an individual command can be obtained by entering H followed by the letter of the command. All command explanations show the complete command syntax and give a short description of how the command functions.

HOW THE ADAPT29K WORKS

The ADAPT29K runs on a different processor than the target. It performs all operations on the target by controlling the target Am29000. A buffered cable connects the ADAPT29K to the target's Am29000 socket. Figure 10 shows the signals carried on the cable. Note that although the ADAPT29K traces the address bus, it cannot drive it, and, consequently, cannot provide an overlay memory. It uses the target Am29000 to set up all memory addresses before it can access them.

Execution Control

The execution state of the target Am29000 is controlled by using the CNTL0 and CNTL1 signals. By asserting different combinations of the two signals, the Am29000 can be placed in one of four states: RUN, HALT, STEP, and LOAD TEST INSTRUCTION. How these states affect the processor is explained in detail in the Am29000 User's Manual, order #10620.

The LOAD TEST INSTRUCTION state should be noted due to its importance to the ADAPT29K.
Because the LOAD TEST INSTRUCTION state interrupts normal sequential processing and permits a sequence of instructions to be loaded into the processor's instruction stream, the ADAPT29K can use this state to force the processor to perform operations on the target.

Memory Access

Due to the high speed of the Am29000, the ADAPT29K, unlike some in-circuit emulators, does not provide any overlay memory. To maintain real access times, the processor must be kept as physically close to its memory as possible. There is no time available for the propagation delay that would be experienced in accessing memory across the interface cable to the ADAPT29K.

Figure 10. The ADAPT29K-to-Target Interface

All target code and data is stored on the target. When the ADAPT29K is commanded to display a data object, it places the target Am29000 in the LOAD TEST INSTRUCTION state. Then a sequence of instructions is inserted to store the present Am29000 state, set up a new memory address, load the data into an Am29000 register, store the data to the ADAPT29K, and restore the Am29000 state.

This method imposes certain requirements. Because data is transferred between the ADAPT29K and the target over the data bus, the target memory must be protected from corruption. To prevent inadvertent changes to the target memory, it must be disabled from responding when the ADAPT29K and the target processor are transferring data. There are two ways of doing this: (1) the memory can be disabled by a low state on the PIN169 alignment pin (pin D4), or (2) the target memory can be disabled when 06 hex is decoded on the OPT2-OPT0 pins.
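The memory-disable requirement described above would normally be implemented in the target's decode logic, but its behavior can be sketched in software. The Python model below is only an illustration under stated assumptions: the function and constant names are hypothetical, and it assumes the OPT2-OPT0 pins are read as a 3-bit value with OPT2 as the most significant bit. A real design needs only one of the two methods; the sketch models both.

```python
# Hypothetical model of the target memory-disable decode described above.
# Assumption: OPT2-OPT0 form a 3-bit value, OPT2 being the most significant bit.

ADAPT_TRANSFER_OPT = 0x6  # "06 hex" on OPT2-OPT0 marks an ADAPT29K transfer


def target_memory_enabled(opt: int, pin169_low: bool = False) -> bool:
    """Return True if the target memory may respond to the current access.

    opt        -- value decoded from the OPT2-OPT0 pins (0..7)
    pin169_low -- True when the PIN169 alignment pin (D4) is held low
    """
    if not 0 <= opt <= 7:
        raise ValueError("OPT2-OPT0 is a 3-bit field")
    # Method 1: a low level on PIN169 disables the memory outright.
    if pin169_low:
        return False
    # Method 2: the memory ignores any access tagged with OPT = 06 hex.
    return opt != ADAPT_TRANSFER_OPT


if __name__ == "__main__":
    for opt in range(8):
        print(f"OPT={opt:03b} memory enabled: {target_memory_enabled(opt)}")
```

In this model the memory responds to every OPT encoding except 06 hex, and never responds while PIN169 is held low, which is the protection the ADAPT29K relies on during its data transfers.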
When the contents of instruction ROM must be displayed, the ADAPT29K must instruct the processor to read instruction ROM as data. Hence, a hardware path must exist for data stored in the instruction ROM space (on the instruction bus) to be loaded into an Am29000 register from the data bus.

Similarly, when the ADAPT29K is used to download a program, the code will be written word-by-word to the target Am29000, which then writes the instructions into proper memory space. Suppose, for example, code is to be written into the instruction/data RAM. Because the ADAPT29K has no means for virtual translation of addresses, it will use Store instructions to write the code into the absolute address in the instruction/data space. When the Am29000 goes to execute the code, it will expect to fetch its instructions over the instruction bus. This requires that there be a hardware path from the data bus to the instruction bus and a one-to-one correspondence between addresses on the data bus and the addresses on the instruction bus. This occurs because the instruction is stored at an address on the data bus, but is fetched via the instruction bus. In other words, instructions fetched from an address in the instruction RAM space via the instruction bus must produce the exact information as would be retrieved from the same address in the data RAM space via the data bus.

Breakpoints

Because the Am29000 is one of the fastest commercial processors available, there is no practical way to read each address on the address bus and compare it against a breakpoint table to determine if a break should occur, as is done in an in-circuit emulator. The method used by the ADAPT29K is to swap a halt instruction into memory at the location of the breakpoint. When the executing processor encounters the breakpoint, it halts. Then, the ADAPT29K, upon detecting the halt, compares the halt address with the breakpoint table and determines if there is a match.
If there is, it swaps the original instruction back into memory and informs the operator that a breakpoint has occurred.

This method of setting breakpoints also contributes to the requirement for a one-to-one translation of addresses between the data bus and the instruction bus. For example, when the ADAPT29K sets a breakpoint in the instruction ROM space, it does so by using the target Am29000 to read the original instruction, then writes the halt into the address location. This is performed as a data movement operation, using the bi-directional path to the instruction bus discussed in the Memory Access section. For the breakpoint to be effective, the executing program must encounter the breakpoint at the same address at which it was stored.

TARGET DESIGN REQUIREMENTS

Throughout the preceding discussion, it should be clear that the ADAPT29K only interfaces to the target via the target Am29000, and uses only the target memory for storage of the application program. This places certain hardware requirements on the application. These are listed below. For a specific example, see the Standalone Execution Board section.

1. The physical device in the instruction ROM space must be a RAM device if code is to be downloaded to the instruction ROM space, or if breakpoints will be set in the instruction ROM space.
2. A bi-directional path must exist between the instruction and data buses.
3. There must be a one-to-one translation between instruction bus addresses and data bus addresses.
4. The ADAPT29K must be able to disable the target memory using a low signal on the PIN169 alignment pin (D4), or when OPT0-OPT2 are 06 hex.
5. Physical clearance must be provided for the connection of the interface cable at its proper orientation.
6. Signals driven by the ADAPT29K (see Table 5) must be open-collector or tri-state.

MON29K TARGET RESIDENT MONITOR

Table 5.
Am29000 Signals Driven by the ADAPT29K

Pin              Configuration
Alignment pin    Input with pull-up resistor (notes 1, 2)
D31-D0           Tri-state
I31-I0           Tri-state
DERR             Input with pull-up resistor (note 1)
RESET            Open collector, pull-up with 1 K ohm resistor
DRDY             Pull-up resistor (note 1)
STAT1-STAT0      Input
TEST             Open collector (note 3)

1. Pull-up resistors should be 330 to 1000 ohms.
2. This is an optional configuration. It is used if memory will be disabled by the alignment pin (PIN169).
3. Note that mT is active longer than ~. Since all outputs will be in a high-impedance state, it may be prudent to pull up all Am29000 outputs to avoid ambiguous inputs (to other devices).

MON29K is a target-resident monitor that has functionality similar to the ADAPT29K monitor. MON29K provides many important debugging capabilities, including memory display and alteration, code uploading and downloading, and assembly and disassembly. However, unlike the ADAPT29K, MON29K is an entirely software product. It resides completely in the target memory and executes on the target Am29000 (see Figure 11).

MON29K has I/O driver routines to handle two serial ports. Either port can be used to receive commands, although the hardware must be supplied by the target. With the proper hardware, MON29K can receive commands from an ASCII terminal or a remote host. It also can act as the interface between XRAY29K and the target.

MON29K is supplied in C source code form so the I/O drivers and service routines can be modified to fit the particular hardware environment.

Since it is entirely software, MON29K can be permanently embedded in the product. It takes only 256K of address space in instruction ROM; thus, it can remain with the application and be used to diagnose problems at all stages of the product life cycle, from development to field support.

Figure 11.
MON29K System Connections

FEATURES

MON29K provides powerful testing capabilities. Many of MON29K's features are, in fact, the same as those of the ADAPT29K. These include:

• Display and alteration of memory, I/O ports, and registers. Using MON29K, target data can be displayed, set, or altered. All Am29000 memory spaces may be accessed, including: Am29000 internal registers (global, local, and special), coprocessor registers, instruction/data RAM, or instruction ROM.
• In-line assembly and disassembly. MON29K comes with a built-in, in-line assembler/disassembler. Am29000 instruction mnemonics can be converted to machine codes and stored at a specified location, or ranges of addresses may be disassembled and displayed in mnemonic form.
• Uploading and downloading of programs. MON29K can use two serial ports, assuming they are provided by the target hardware. One port is a data communications equipment (DCE) port; the other is a data terminal equipment (DTE) port. Files may be uploaded or downloaded in Motorola or Tektronix formats. Also, XRAY29K can communicate with MON29K through one of the ports.
• Execution Control. MON29K can control target execution. It can initiate full-speed execution, or single-step the processor.
• Set/Reset Breakpoints. Both permanent and temporary breakpoints are supported.
• On-line help. On-line help that shows the complete syntax is available for all commands.

MON29K Commands

Many of the MON29K commands (and consequently the features) are identical to those of the ADAPT29K. The MON29K commands, all of which are implemented in ADAPT29K, are listed in Table 6.

Table 6. MON29K Commands
Command   Description
A         Assemble in memory
B         Breakpoint display, set, and reset
C         Check execution state
D         Display registers/memory
E         End execution command list
F         Fill registers/memory
G         Go (start program execution)
I         Input from a port
L         List memory
M         Move memory
N         Change the "normal character"
O         Output to a port
R         Enter remote mode
S         Set registers/memory
T         Trace (single-step) instructions
V         Save memory to a file
X         Display key registers
XC        Display/set co-processor registers
XP        Display/set protected registers
XT        Display/set TLB registers
XU        Display/set unprotected registers
Y         Load a file to memory

Differences Between MON29K and ADAPT29K

Because MON29K runs on the target processor, not as a separate unit, it has limitations that the ADAPT29K does not have. In particular, MON29K has no K (Kill), J (Jam), Z (Trace), or W (interface diagnostics) commands.

MON29K is not able to assert a kill command because when the application is running, the application controls the processor. Clearly, when MON29K is not in control of the processor, it has no means of evaluating serial input and taking action. This would be possible if MON29K polled the serial I/O device, but such continuous polling would hinder real-time execution.

Instead, to allow programs to be forcefully terminated, MON29K can be configured to respond to interrupt-driven serial I/O. When MON29K is initialized to respond to interrupt-driven serial I/O, it intercepts a CTRL-C and passes control to a handler that recovers the processor to MON29K. This technique is effective in most cases, except if the application program has reached a HALT instruction. Then, the system must be reset. Usage of interrupt-driven serial I/O is determined as an option of the command (not present on the ADAPT29K).

TARGET DESIGN REQUIREMENTS

MON29K does place some requirements on the target design. They are listed below. For a sample implementation of the compatibility requirements, see the Standalone Execution Board section.
1. The physical part in the instruction ROM space must be a RAM device if the code will be downloaded to the instruction ROM space, or if breakpoints will be set in the instruction ROM space.
2. The Am29000 cannot write on the instruction bus, so a bi-directional path must exist between instruction and data buses.
3. Instruction bus addresses must produce the same data as data bus addresses.
4. As a target-resident monitor, MON29K does take up some of the target memory; thus, sufficient memory must be provided for MON29K. An application using MON29K must have 256 Kbytes of memory in the instruction ROM space for the program, and a 64-Kbyte workspace in instruction/data RAM. Both spaces must begin at address 0 (0r and 0d).
5. If program control must be recovered from the application before it ends or returns control normally, accommodations must be made to use interrupt-driven serial I/O. When interrupt-driven serial I/O is used, a MON29K interrupt routine will handle a CTRL-C by terminating the application program and returning control to MON29K.
6. MON29K expects the serial I/O driver to be an 8530 serial communications controller. Using a different I/O driver will require modifications to be made to MON29K.
7. AMD cannot anticipate every possible scenario in which the Am29000 will be introduced, and it is possible that MON29K will require some modifications to the I/O drivers and service routines before it can run on the target. Although binary code is available from AMD, MON29K is supplied in source code form. Of course, any changes will have to be compiled using a C compiler that produces object modules for the Am29000.

XRAY29K SOURCE-LEVEL DEBUGGER

XRAY29K, the high-level/assembly-level debugger, is a program that provides an interactive, windowed environment for debugging Am29000-based systems. Using XRAY29K, program statements may be read in source language, and data objects may be modified and changed by referencing symbol names.
Thus, target operations can be performed using source-level constructs, rather than machine codes and numeric addresses. To further clarify the target environment, XRAY29K's multi-window interface simultaneously displays user-selected program information.

Commands are issued to XRAY29K using a comprehensive debugger command language. The language supports a wide range of functions, including setting breakpoints, single-stepping, and examining or altering any C- or assembly-language variables. The language syntax is very similar to C, and also supports debugging commands, creation of symbols during a debugging session, and convenient specification of address ranges.

XRAY29K resides on a host system and communicates with the target system through either the ADAPT29K or MON29K. Frequently, the host system is an engineering workstation attached to the ADAPT29K, as shown in Figure 12. In that system, XRAY29K provides a comfortable user interface, while operations are asserted on the target by the ADAPT29K. Alternately, XRAY29K could reside on a mainframe and communicate with a target running MON29K. The user interface could then be provided via an ASCII terminal.

FEATURES

XRAY29K supports source-level debugging in either of two modes: high-level or assembly-level. In high-level mode, an application can be debugged using C-language expressions and statements. In this way, C variables and expressions replace numeric addresses for memory access, and the code can be viewed by line number or procedure name. In assembly-level mode, an application can be debugged using assembly-language statements. The assembly-level mode additionally allows machine-level register and status bit manipulation.

Commands are given to XRAY29K using its powerful debugger language, thus gaining access to XRAY29K's full range of debugging services. The services include:

• Setting and examination of memory and register contents using the declared format and the variable name.
• Simple and complex breakpoints that can be set and removed in either C-language or assembly-language source code.
• Single-step and full-speed program execution.
• Assembly and disassembly of object code.
• Simulated I/O and interrupts.
• Execution time measurement.

Figure 12. XRAY29K System Connections

The commands for manipulating memory and registers are shown in Table 7.

Table 7. XRAY29K Memory and Register Commands

Command    Description
compare    Compare two blocks of memory
copy       Copy a memory block
fill       Fill a memory block with values
search     Search a memory block for a value
setmem     Change the values of memory locations
setreg     Change a register's contents
test       Examine memory area for invalid values

Commands for controlling program execution are listed in Table 8; other display commands are listed in Table 9.

Table 8. XRAY29K Breakpoint and Execution Commands

Command            Description
breakinstruction   Set an instruction breakpoint
clear              Clear a breakpoint
go                 Start or continue program execution
gostep             Execute macro after each instruction step
step               Execute a number of instructions or lines
stepnocall         Step, but execute through procedures

Table 9. XRAY29K Display Commands

Command       Description
disassemble   Display disassembled memory
dump          Display memory contents
expand        Display a procedure's local variables
find          Search for a string
fopen         Open a file or device for writing
fprintf       Print formatted output to a viewport
list          Display C source code
monitor       Monitor variables
next          Find string's next occurrence
nomonitor     Discontinue monitoring variables
printf        Print formatted output to command viewport
printvalue    Print a variable's value

Windowed Information Display

XRAY29K shows all critical program information at once in multi-windowed displays.
The contents of the runtime stack, the selected general-purpose registers, the current source lines being executed, or virtually any other program information, can be checked at a glance, without the need to constantly request each piece of information individually.

Information is grouped into screens, which are composed of one or more windows of specific data called viewports. There are three predefined screens: high-level, assembly-level, and standard I/O. Distributed among these screens are the 17 pre-defined viewports listed in Table 10.

Figure 13 shows the high-level mode screen display. It has four viewports: data, trace, code, and command. This screen is displayed when an object module generated by a C source program is executed.

Figure 14 shows the assembly-level mode screen display. It has five viewports: data, stack, disassembled code, registers (Am29000), and command. This screen is displayed when an object module generated by an assembly-language program is executed.

Table 10. XRAY29K Predefined Viewports

Viewport         Description
Command(2)       Debugger commands are submitted to XRAY29K from this viewport. There is a command viewport for both high-level and assembly-level modes.
Code(2)          Displays source code in high-level mode or disassembled instructions in assembly-level mode.
Data(2)          Displays monitored variable expressions in high-level and assembly-level mode.
Trace            Shows the procedure calling chain (high-level mode only).
Stack            Shows stack contents beginning from the stack pointer (assembly-level mode only).
Register         Displays current values of Am29000 registers (assembly-level mode only).
Status Line(2)   Used for debugger command information such as CPU type, current module name, and current operation. This viewport is present in both high-level and assembly-level modes.
Standard I/O     Shows interactive information being received from stdin or sent to stdout.
Break            Shows breakpoint information such as number, address, module name. Temporarily overlays top of screen when breakpoint is encountered.
Error            Appears when an error occurs to indicate type and source of error.
Help             Shows on-line help information when requested.
Log              Displays logged keystrokes.
Journal          Shows all previous commands and their results.

Figure 13. The Standard High-Level-Mode Screen

Figure 14.
The Standard Assembly-Level Mode Screen

The standard I/O screen has one regular viewport: the standard I/O viewport, although the breakpoint, error, and help viewports also will appear. The standard I/O screen is used when interactive input is requested from the standard input device, or when output is directed to the standard output device.

The viewport commands, shown in Table 11, control the way information is displayed on the screen. By using the viewport commands, a viewport's size, color, and cursor position can be changed. Viewports can be added or deleted, and custom screens and viewports can be defined.

Table 11. XRAY29K Viewport Commands

Command   Description
vactive   Activate a viewport
vclear    Clear data from a viewport
vclose    Remove user-defined viewport or screen
vcolor    Select viewport colors
vmacro    Attach a macro to a viewport
vopen     Create a screen or viewport or change size
vsetc     Set a viewport's cursor position
zoom      Increase or decrease a viewport's size

Utility Functions

In addition to its powerful features for execution control and display of system information, XRAY29K provides several utility features. These features ease debugging by streamlining the routine operations. The services include command keys, macros, and command files.

Command Keys

The most frequently used XRAY29K functions have been assigned to a key combination referred to as a "command key." By using command keys, common debugger commands can be entered with the minimum number of keystrokes, often only one key or a CTRL-key combination.

Macros

XRAY29K has a powerful, multifaceted macro facility. Because a macro may contain complex user command procedures, which are executed by entering the macro name on the command line, the facility can be used for several purposes. Table 12 shows the debugging language's macro-related commands.

Table 12.
Macro Commands

Command   Description
define    Create a macro
show      Display the macro source

Macros can be invoked when a breakpoint is encountered. Powerful conditional and looping statements in the command language allow the macro to evaluate program or register variables, and alter program flow depending on their condition. Hence, macros can be used to establish very complex breakpoints that take specific action, depending on their environment.

Macros also can be attached to user-defined viewports. When the associated window is opened, the macro will execute. This type of macro can write specific data into the window, which is useful for monitoring environmental information.

Command/Batch Files

XRAY29K can process command files. A command file contains one or more debugger commands that can be processed by XRAY29K automatically, without the need for user interaction. This is also called batch-mode operation.

Command files can be used to recreate a debugging session, easily implement automated test procedures, and eliminate reentering of frequently used command sequences.

Other XRAY29K Utility Functions

XRAY29K possesses several other utility functions. These include services for manipulating symbols, evaluating expressions, setting display and recording modes, and controlling the session. Table 13 lists the symbol commands, Table 14 lists the miscellaneous utility commands, and Table 15 lists the session commands.

Table 13. Symbol Commands

Command        Description
add            Create a symbol
delete         Delete a symbol from the symbol table
printsymbols   Display symbol, type, and address
scope          Specify current module and procedure scope

Table 14.
Miscellaneous Utility Commands

Command       Description
cexpression   Calculate an expression's value
error         Set include file error handling
help          Display on-line help screen
include       Read in and process a command file
log           Record debugger commands and errors in a file
mode          Select debugger mode (high-level or assembly-level)
option        Set debugger options for this session
pause         Pause simulation
reset         Simulate processor reset
restart       Reset the program starting address
startup       Save the default start-up options

Table 15. Session Commands

Command   Description
host      Enter the host operating system environment
load      Load an object module for debugging
quit      End a debugging session

TARGET DESIGN REQUIREMENTS

XRAY29K itself places no restrictions on the target hardware design. However, being strictly a software product, XRAY29K needs a hardware connection to the target. For debugging Am29000-based systems, XRAY29K must be used in conjunction with either ADAPT29K or MON29K; the target design requirements for those tools apply.

XRAY29K requires a host system. Versions of XRAY29K currently exist for UNIX and DOS environments.

XRAY29K works only with object files that have been compiled in such a way that they contain debugger information regarding line numbers, etc. Thus, to use XRAY29K, either the ASM29K macro-assembler or HighC29K cross-compiler must be used, as well as the ASM29K linker. These are explained in the "29K Tool Chain" section.

Am29000 PROBE INTERFACE

The Am29000 probe interface provides a non-intrusive, low-capacitance connection to an Am29000. Inserted between the processor and its socket, the probe interface makes the Am29000 pins available for convenient attachment to a logic analyzer or other test equipment. Figure 15 shows the probe interface.

The software available with the probe interface supplies configuration information about the Am29000 pins and instruction mnemonics to either an HP 1650 or 16500 logic analyzer for display formatting.
When the display is formatted, the logic analyzer will disassemble instructions into mnemonics and display processor, bus, and error status, as well as data bus activity. Figure 16 shows how the probe interface is connected between the logic analyzer and the target.

Although the probe interface was designed for the HP 1650 or 16500 logic analyzer, any type of test equipment can be attached to it. The following discussion assumes a connection to an HP 1650 or 16500 logic analyzer, unless otherwise stated.

Figure 15. The Probe Interface (11014A-15)

Figure 16. Connection of the Probe Interface (11014A-16)

FEATURES

The probe interface can add important event-triggering and high-speed (10 ns) resolution capabilities, including:

• Convenient connection to the target.
• Low-capacitance probing.
• Complete status information, including identification of burst, pipeline, and simple accesses.
• Status reporting of bus conditions, such as slave accesses, wait states, and co-processor transfers.
• User-configurable setup and hold parameters that allow triggering on a specific target condition.
• Monitoring of all Am29000 signals except INCLK.

The probe interface comes with the disassembler, configuration files, and a user's manual.

DISPLAYS

Figure 17 shows data bus information, as would be shown on an HP 16500 logic analyzer. Figures 18 and 19 show signal state and timing screens and the disassembly screen for the 16500 analyzer.

TARGET DESIGN REQUIREMENTS

Because the probe interface only monitors Am29000 signals, there are no particular target compatibility requirements except for sufficient clearance to install the probe interface. Most applications will not be affected by the low-capacitance, high-impedance connection; however, see the probe interface data sheet for electrical and physical specifications.
Apart from supporting the physical size and electrical specifications of the connection, a logic analyzer is needed. The logic analyzer should have 80 to 160 state channels. Some termination adapters also are needed, depending on the number of state channels on the logic analyzer.

Figure 17. HP 16500 Data Bus Information Display (11014A-17)

Figure 18. HP 16500 Signal and Timing Display (11014A-18)

Figure 19. HP 16500 Disassembly Listing (11014A-19)

SUMMARY OF THE TOOLS

From the sections on ADAPT29K, MON29K, XRAY29K, and the probe interface, it should be clear that a comprehensive range of tools exists for developing Am29000-based systems. Each of the available tools has unique characteristics that make it more advantageous in particular situations. Depending on the characteristics of the application, one or all of the tools may be needed.

This section summarizes the information presented in the previous sections, with emphasis on which conditions are most appropriate for a particular tool or tool combination, and what compatibility requirements are placed on the target as a result of the tool selection.

SELECTION GUIDE

In the development phase of virtually any Am29000-based system, either the ADAPT29K or MON29K will be needed. It is possible to debug a microprocessor system with only a logic analyzer and a PROM programmer, but this method is not very practical when compared against the following ADAPT29K and MON29K features:

• Memory display and modification, including special registers.
• Uploading and downloading of programs.
• Execution control, including setting breakpoints and single-stepping.

Apart from the advantages gained from MON29K and the ADAPT29K, their performance can be augmented in certain situations if they are combined with XRAY29K and/or the probe interface with a logic analyzer. The following questions highlight the critical target characteristics that suggest the optimum tool selection.

How much memory does the target have?

Perhaps the most crucial factor in deciding whether the ADAPT29K or MON29K is more appropriate is the size of the available target memory. This determines whether or not MON29K can be used. Because MON29K is target resident, the target must have at least 256 Kbytes of space in instruction ROM, and 64 Kbytes of instruction/data RAM for MON29K's workspace. An application without this memory space will not be able to use MON29K, and will have to use the ADAPT29K.

For systems with sufficient memory, MON29K, ADAPT29K, or both may be used. While both MON29K and the ADAPT29K have excellent debugging features, the ADAPT29K has some features MON29K does not, including:

• Provides a bus trace facility.
• Can halt a failing program.
• Can force execution of an Am29000 instruction.
• Provides memory diagnostics.
• Can be used with a target that cannot run its program.

It should be noted that in most cases (see the Differences Between MON29K and ADAPT29K section), MON29K can halt a crashed program if interrupt-driven serial I/O is provided on the target, and the target still is responding to interrupts.

How many units will be produced?

The number of units to be produced determines the volume over which the development and servicing costs can be defrayed. The ADAPT29K, while more powerful than MON29K, costs more and will raise the amount of nonrecurring charges that must be recovered. Of course, the difference will be insignificant for the advantages gained in large volumes. In fact, it may be advisable to use the ADAPT29K when the product is in development and final test, using MON29K for field service.

How and where will servicing be performed?

Servicing can be performed on-site or at service centers. Often this depends on the size, function, and value of the application system. If the system is moved to a service center for repair, the ADAPT29K will provide the most capabilities, particularly when coupled with the probe interface and XRAY29K. However, the ADAPT29K may be too bulky to perform maintenance on-site. MON29K can be embedded in the application and used to diagnose faults via a portable ASCII terminal or PC (which could run XRAY29K).

How complex is the program?

If the program is complex, XRAY29K should be considered.
Debugging complex programs using hex values and physical addresses can be very time consuming and error prone, especially for programs containing many modules. Often, XRAY29K's windowed interface and source-level debugging language will greatly reduce time spent tracking down errors encountered in address calculations, decimal to hex conversions, or just looking up values in a listing.

SUMMARY OF COMPATIBILITY REQUIREMENTS

Once a combination of tools has been selected, it is important to ensure that they will be compatible with the target system. The following lists summarize the compatibility requirements for each tool. More detailed explanations can be found in the specific sections related to the particular tool.

ADAPT29K

1. The target must support RAM in the instruction ROM space.
2. A bi-directional path must exist between the instruction and data buses.
3. There must be a one-to-one translation of addresses between buses.
4. Target memory must be disabled either by a low signal on the alignment pin (D4), or when OPT2-OPT0 are 06 hex.
5. There must be physical clearance for the connection of the interface cable at the proper orientation.
6. The signals driven by the ADAPT29K must be open-collector or three-state.

MON29K

1. The target must support RAM in the instruction ROM space.
2. A bi-directional path must exist between the instruction and data buses.
3. There must be a one-to-one translation of addresses between buses.
4. The system memory must include 256 Kbytes in instruction ROM beginning at Address 0 to store the MON29K program, and 64 Kbytes of instruction/data RAM at Address 0 for MON29K's workspace.
5. If program control must be recovered from the application without it ending or returning control normally, accommodations must be made to use interrupt-driven serial I/O.
6. The I/O drivers may have to be modified.

XRAY29K

1. Requires a host system, such as an engineering workstation.
2. Requires MON29K or ADAPT29K.

Probe Interface

1. Requires a logic analyzer (an HP 1650 or 16500 is recommended).
2. Requires termination adapters.
3. There must be sufficient physical clearance to allow the probe to be attached to the target.

A COMPATIBILITY EXAMPLE: STANDALONE EXECUTION BOARD

The Standalone Execution Board (STEB) is an excellent example of compatibility with all the development tools. It is a complete Am29000-based system that can run many types of programs, including the software packages MON29K and VRTX32/29000®. The STEB can also be used with the ADAPT29K and/or the HP probe interface. The STEB also can be used as an execution vehicle for application software or as a comparison system for isolating hardware faults.

This section focuses on how the STEB's design achieves compatibility with the development tools. The major areas of the STEB are discussed, with emphasis on how each area contributes to compatibility. See Figure 20 for a block diagram of the STEB.

Figure 20. Block Diagram of the STEB (11014A-20)

FUNCTIONAL DESCRIPTION

Mounted on a single card, the STEB contains an Am29000 with memory, I/O, and system timing resources. See Appendix A for schematic diagrams, Sheets 1 through 12.

A 9513A timing controller is installed at U55-58, and U64 on Sheet 10. The 9513A supports up to five 16-bit counters. Address decoding for various timer functions is provided by a PAL (U56 on Sheet 10). The clock source can be from the Am29000, a hardware oscillator, or a crystal oscillator.

In addition to the Am29000 (U51 on Sheet 2), the STEB supports the Am29027 arithmetic accelerator (U10 on Sheet 3). The Am29027 is capable of high-speed, single-precision and double-precision arithmetic using fixed and floating-point numbers.
It can be operated in pipelined or non-pipelined (flow-through) mode, depending on system capability and requirements. The pipelined mode maximizes the overall execution time for scalar operations.

System timing can be provided by one of two methods. The Am29000 itself can generate the system clock, which is output on the SYSCLK pin; or circuitry on the board (U8, U9 on Sheet 4) can generate an external clock signal that can be applied to the SYSCLK pin of the processor. Clock selection is done by jumpers.

Memory is supported in both the instruction ROM and instruction/data RAM spaces. By using a DIP switch (SW3 on Sheet 7), from 0 to 7 wait states may be selected. Each space has its own wait-state generator, and may be configured separately, depending on the access speed of the installed memory devices.

Power to the STEB is provided by a series-regulated power supply that provides a regulated +12 VDC and +5 VDC to the board. Connectors are furnished for attachment to the type of power supply used with PCs.

CIRCUIT AREAS CONTRIBUTING TO COMPATIBILITY

In the following section, circuit sections related to compatibility issues are described. The circuit sections are referenced by their locations on the STEB, as indicated in Figure 21.

ADAPT29K and MON29K Compatibility

Because the ADAPT29K and MON29K are very similar to each other, several STEB design aspects simultaneously address their compatibility requirements. These include the type of memory supported, and the bus architecture for accessing memory.

Figure 21. Data Read from Instruction/Data RAM (11014A-21)

Support for RAM Devices in the Instruction ROM Space

The STEB supports RAM in the instruction ROM (U25, U32 on Sheet 5) space and the instruction/data RAM (U33-U43 on Sheets 6 and 7) space. The instruction ROM space has a maximum capacity of 1024 Kbytes and uses 27010 EPROMs. The instruction/data RAM space has a maximum capacity of 512 Kbytes and uses 32-Kbyte x 8 static RAMs.

Instructions may be executed from either space. So that programs can be downloaded via the ADAPT29K or MON29K, the instruction ROM area can be constructed from 32-Kbyte x 8 static RAMs. However, the maximum memory size using RAM is limited to 256 Kbytes.

Swap Buffer

On the STEB, a swap buffer provides the necessary bi-directional path between the data bus and the instruction bus (U11-U14 on Sheet 2). The swap buffer is created from four 74ALS245 octal bus transceivers. Transfer direction and timing are controlled by the transceiver's ENA and A/B inputs. By decoding the DREQT1-DREQT0, IREQT, OPT2-OPT0, DREQ, and IREQ signals (U17, U18, U49 on Sheet 4) and applying the result to the transceiver, the STEB channels data between the buses at the appropriate time.

The swap buffer is not required in many straightforward operations. For example, when assembling/disassembling instructions or reading/writing other data into the instruction/data RAM space, data is written directly to the instruction/data RAM space over the data bus. Likewise, a standard instruction fetch from the instruction ROM space does not require the swap buffer, as instructions may be loaded directly into the processor's instruction pre-fetch buffer from the instruction bus.

However, when disassembling instructions in the instruction ROM space, the instructions must be read as data, which makes the swap buffers necessary. The configuration of the IREQT bits causes an instruction to be accessed from the instruction ROM, gated onto the data bus, and read into the processor. Note the combination of control signals indicated on the side of the figure. They are used to select the path for data movement.

Similarly, when instructions are fetched from the instruction/data RAM, they must be transferred to the instruction bus from the data bus. The direction of data movement is shown by the darkened path in Figure 22.

Figure 22. Instruction Fetch from Instruction/Data RAM (11014A-22)

One-To-One Address Translation

Note that addresses in both memory spaces have a one-to-one translation. This means that when a data object is stored at a given address in the instruction/data RAM space, the exact same data object will be retrieved when the same address is asserted by an instruction fetch to the instruction/data RAM space. This is an important requirement for assuring compatibility with the ADAPT29K and MON29K because when they are downloading programs, they store instructions as data over the data bus. Neither tool has the capability to translate a virtual address, so when the program is executed it must find its instructions at their absolute addresses.

ADAPT29K Compatibility

In addition to the elements discussed in the ADAPT29K and MON29K Compatibility section, certain considerations were added to the STEB's design strictly for the ADAPT29K. These include tri-stating the control lines driven by the ADAPT29K and disabling memory during data transfers to and from the ADAPT29K.

Tri-Stated Control Lines

The STEB must relinquish some control lines to the ADAPT29K when it is operating.
Therefore, these lines are tri-stated or open-collector, as was described in Table 7, thus preventing contention that may cause unpredictable results.

When the ADAPT29K is not connected to the target, the CNTL0 and CNTL1 lines are pulled high to ensure that the processor is in a normal mode of operation. When the ADAPT29K is connected to the target, it isolates the CNTL1-CNTL0 signals from the board. Any use of those signals by the application will be inhibited.

Memory Disable

The STEB supports both methods of disabling memory for ADAPT29K accesses. Via a jumper selection, the STEB can be configured either to decode an 06 hex on the OPT bits or to disable memory when the alignment pin is low.

When Jumper JP7 (on Sheet 7) has pins 1 and 2 connected together, it causes the SEL_OP signal to PAL U20 (on Sheet 7) to be high. The ROM/RAM decode circuit (composed of U15, U20, U21, and U24 on Sheets 6 and 7) then decodes the OPT2-OPT0 pins to determine whether or not memory should be enabled.

Memory is disabled by a low state on the alignment pin (D4) when jumper JP7 is used to connect pins 2 and 3 together. The low condition is decoded by the ROM/RAM decode circuit, which then disables memory. When the ADAPT29K is not installed, the alignment pin is pulled high to prevent inadvertent and/or intermittent memory disables.

MON29K Compatibility

Apart from the requirements mentioned in the "ADAPT29K and MON29K Compatibility" section, MON29K needs at least one, and preferably two, serial port(s) to communicate with the host/operator. It also needs sufficient memory to contain the software.

Serial Ports

The serial ports are provided by the 8530 serial communications controller (SCC) and support circuits located at U1, U2, and U5-U7 (on Sheet 8). The SCC is a dual-channel, multi-protocol data communications peripheral designed for use with 8-bit and 16-bit microprocessors. The interrupt request line INT can be wired to provide a trap or interrupt to the processor for MON29K.
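Because the STEB maps its serial ports into the Am29000 data space so that C code can drive them, a short sketch may help show what such an access looks like. The following is a minimal polled-transmit example, not MON29K's driver. The two-byte register window `struct scc_channel` and the function name `scc_putc` are hypothetical; the real STEB addresses and register spacing are not given in this note. The only device facts assumed are the 8530 status bits in read register 0 (bit 0, receive character available; bit 2, transmit buffer empty).

```c
#include <stdint.h>

/* Hypothetical register window for one SCC channel. The actual STEB
 * addresses and register spacing are NOT taken from this note. */
struct scc_channel {
    volatile uint8_t control; /* a read returns status register RR0 */
    volatile uint8_t data;    /* transmit/receive data register */
};

#define SCC_RR0_RX_AVAILABLE 0x01u /* RR0 bit 0: receive character available */
#define SCC_RR0_TX_EMPTY     0x04u /* RR0 bit 2: transmit buffer empty */

/* Send one character by polling the status register.
 * Returns 0 on success, or -1 if the transmitter did not become
 * ready within max_polls status reads. */
int scc_putc(struct scc_channel *ch, char c, unsigned max_polls)
{
    while (max_polls-- > 0) {
        if (ch->control & SCC_RR0_TX_EMPTY) {
            ch->data = (uint8_t)c; /* ordinary store; no special I/O instruction */
            return 0;
        }
    }
    return -1;
}
```

On a real target the pointer would be set to the port's mapped address; polling is used here only because it is the simplest case, and an interrupt-driven driver would use the INT line instead.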
DIP switches on the board are used to select port characteristics. Because the 8530 is a dual-port device, it supports both the DTE and DCE RS232 ports on the STEB. The ports are standard RS232 ASCII ports. The DCE port can be used to communicate with an ASCII terminal or PC running a terminal emulator; the DTE port can communicate with a remote host such as a UNIX machine.

Because the C language does not differentiate between address spaces, the serial ports must be memory-mapped into the Am29000 data space. This requirement allows C code to be used in place of assembly language.

Sufficient Memory Space

Sufficient memory is provided on the STEB for MON29K. There is also room for additional application programs in the ROM space. The space normally is configured with MON29K in EPROMs (Bank 0), and RAM in the remaining banks. MON29K then could be used to download an application into the RAM in the instruction ROM space.

MON29K also uses 64 Kbytes of workspace in RAM. This is provided for, with additional space available for use by the application program.

Built-In Probe Interface

The STEB includes built-in probe interface connectors. Thus, test equipment like the HP 1650 or 16500 logic analyzer can be connected directly to the STEB, eliminating the requirement for a separate probe interface.

Appendix A: STEB Schematic Diagrams

[Schematic diagrams, Sheets 1 through 12, not reproduced. Legible sheet notes: jumpers JP1 through JP6 select RAM or ROM; the RAMs can be 32K x 8 or 8K x 8 devices.]
Preparing PROMs Using the Am29000 Development Tools

Application Note
by Manoj Desai and Doug Walton

Publication # 11966  Rev. A  Amendment /0  Issue Date: 11/89

INTRODUCTION

Source code for a given application must be converted to executable Am29000™ object code and transferred to the appropriate storage media before it can be executed in a real system. Usually several utilities are involved; these include:

• Assemblers
• Compilers
• Linkers
• Format translators (optional, depending on the destination media)

This application note shows how an example program in source code form is made into object code and downloaded to a target board with the ADAPT29K™ Advanced Development and Prototyping Tool, or programmed into PROMs.

THE 29K TOOL CHAIN

The 29K™ tool chain is used to produce the executable object module. The tool chain is an integrated set of programs that includes compilers, assemblers, linkers, and format translators. These programs perform the operations necessary to translate the source code into a machine-readable format. The components of the 29K tool chain are:

• HighC29K™ Compiler
• ASM29K™ Assembler
• ASM29K Linker
• COFF2HEX (COFF to hexadecimal translator)
• ROMCOFF
• BTOA (binary to ASCII translator)

Figure 1 shows the relationship of the 29K tool chain elements to each other. In the following discussion, familiarity with these tools is assumed. Consult the appropriate reference manuals for more details.

The 29K tool chain can be run under UNIX®, SunOS®, or DOS, but it must be installed properly on the host system before the following example can be performed.
The host in the following discussion is assumed to be an IBM® AT or compatible.

© 1989 Advanced Micro Devices, Inc.

Figure 1. The 29K Tool Chain (11966A-01)
[Flow shown: C or assembly-language source file (.C or .S); HighC29K Compiler or ASM29K Assembler; relocatable object module (.O), together with library files (.LIB); ASM29K Linker; absolute object module (.OUT); then either BTOA (binary to ASCII), giving an ASCII object module (.ASC) for the ADAPT29K or MON29K target, or COFF2HEX for the PROM programmer.]

SUGGESTED REFERENCE MATERIALS

Consult the following reference materials for more information on the topics covered in this application note.

• Am29000 Streamlined Instruction Processor User's Manual, order #10620. It contains details regarding the instruction set and register organization of the Am29000.
• Am29000 Streamlined Instruction Processor Data Sheet, order #09075. It contains a great deal of information about the Am29000, including: distinctive characteristics, general description, simplified system diagram, connection diagram, pin designations and descriptions, functional description, absolute maximum ratings, operational ranges, DC characteristics, switching characteristics and waveforms, and physical dimensions.
• ADAPT29K User's Manual. It provides detailed information on the ADAPT29K, including installation, commands, theory of operation, and target design requirements.
• ASM29K Documentation Set. It provides complete information on the installation and use of the ASM29K assembler, linker, and librarian manager. This includes information on using the ROMCOFF and COFF2HEX utilities.
• HighC29K Documentation Set. It covers how the Am29000 C compiler is used.

These materials can be obtained by writing to:

Advanced Micro Devices, Inc.
901 Thompson Place
P.O. Box 3453
Sunnyvale, CA 94088-3453

or by calling (800) 222-9323.

For questions that cannot be resolved with the current literature, further technical support can be obtained by writing or calling:

29K Support Products Engineering
Mail Stop 561
5900 E. Ben White Blvd.
Austin, TX 78741
(800) 2929-AMD (US)
0-800-89-1131 (UK)
0-031-11-1129 (Japan)

THE EXAMPLE SYSTEM

The example system used for illustration in this document consists of a generic hardware environment and a small software program. The only function of this self-contained standalone system is to test a block of memory. This section describes how the example system works.

SOFTWARE

The software is a small program that initializes its operating environment and then continuously tests memory. It consists of a boot module and a C-language module. A flow chart for the complete application is shown in Figure 2.

The main portions of the program are contained in two source files: smplboot.s and cprog.c. The smplboot.s module is an assembly-language boot program that receives control on power up. The C-language program cprog.c performs the memory test.

The tasks performed by smplboot.s are: (1) establish the execution environment, (2) set up a block of initialized data in instruction/data RAM (using a routine generated by the ROMCOFF utility), (3) call the main program cprog.c, and (4) evaluate the results of the memory test. If the test fails, smplboot.s halts the processor.

The cprog.c program tests a 32-Kbyte block of RAM, using a simple binary write and read test. Then, cprog.c checks the validity of the initialized data section in instruction/data RAM. After each successful completion, a flag is returned to smplboot.s, which increments a counter. If a test fails, cprog.c returns the address of the failing memory location. A memory map of the application is shown in Figure 3.

Three additional files (traps.s, r29k.s, and scregs.def) contain the supporting procedures and declarations. All of the files in the application are listed in Appendices A through E.
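The write-and-read test that cprog.c is described as performing can be sketched as follows. This is not the cprog.c listed in the appendices; it is a minimal illustration under stated assumptions. The function names `check_pattern` and `memory_test` and the two bit patterns are hypothetical choices. The return convention follows the description above: a null result for success, otherwise the address of the failing location.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill the block with one pattern, then read every word back.
 * Returns NULL if all words match, else the first failing address. */
static volatile uint32_t *check_pattern(volatile uint32_t *base,
                                        size_t words, uint32_t pattern)
{
    size_t i;

    for (i = 0; i < words; i++)
        base[i] = pattern;
    for (i = 0; i < words; i++)
        if (base[i] != pattern)
            return &base[i];
    return NULL;
}

/* Test a block of 'bytes' bytes starting at 'base'.
 * The pattern and its complement (an assumed choice) drive every
 * bit both high and low. Returns NULL on success. */
volatile uint32_t *memory_test(volatile uint32_t *base, size_t bytes)
{
    static const uint32_t patterns[] = { 0x55555555u, 0xAAAAAAAAu };
    size_t words = bytes / sizeof(uint32_t);
    size_t p;

    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        volatile uint32_t *failed = check_pattern(base, words, patterns[p]);
        if (failed != NULL)
            return failed;
    }
    return NULL;
}
```

On the target, `base` would point at the start of the tested region and `bytes` would be 32 * 1024; the `volatile` qualifier keeps the compiler from optimizing away the read-back.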
To actually perform the example, the files must be entered onto the host system.

HARDWARE ENVIRONMENT

The application runs on the Standalone Execution Board (STEB), manufactured by STEP Engineering. Figure 4 shows a block diagram of the STEB, which contains an Am29000, some RAM and ROM, and two serial ports (provided by an 8530 serial communications controller).

A few important features of the STEB should be noted. First, data can be passed between the instruction and data buses via a bi-directional swap buffer. The swap buffer permits code to be downloaded into the instruction RAM area via the ADAPT29K. It also allows data objects in the instruction ROM space to be read as data.

Second, the instruction ROM space can contain RAM devices or ROM devices. RAM devices should be installed when working with the ADAPT29K (see Appendix F), so that code can be downloaded into the instruction ROM space.

Figure 2. Flow Chart of the Example Application (11966A-02)
[Steps shown: Initialize Am29000; Transcribe Initialized Data; Call Mem Test; Write Pattern and Check; Check Initialized Data.]

Figure 3. Memory Map of the Example Application (11966A-03)
[Regions shown: the instruction ROM holds the Am29000 VAT and the example code; the instruction/data RAM holds workspace, initialized data, the 32K tested space, an empty region, a 2K MStack, and a 2K RStack. Addresses marked: 0x0, 0x400, 0x420, 0x500, 0x8500.]

Figure 4. Block Diagram of the STEB (11014A-04)

PREPARING AN EXECUTABLE OBJECT MODULE

Preparing the executable object module involves several steps. Typically, the steps are repeated frequently because errors must be corrected and revisions must be made. The process can be automated by placing the commands in a DOS batch file. Listing 1 shows the batch file sc.bat, which is used in the example application.
Following the listing, each step is explained.

Listing 1. The Batch File sc.bat

    @echo off
    echo *********************************************************
    echo "Compiling cprog.c and Assembling the .s files"
    echo *********************************************************
    hc29 -c -w cprog.c > cprog.e
    hc29 -S -Hasm cprog.c > cprog.e
    as29 -l > smplboot.lst -o smplboot.o smplboot.s
    as29 -l > traps.lst -o traps.o traps.s
    as29 -l > r29k.lst -o r29k.o r29k.s
    echo *********************************************************
    echo "Linking object files with libraries and generating"
    echo "executable object module for ROMCOFF"
    echo *********************************************************
    ld29 -o step1.out -f tx -m > outlink.map -c step1.cmd
    echo *********************************************************
    echo "Using ROMCOFF"
    echo *********************************************************
    c:\29k\bin\romcoff -tlb step1.out rom.o
    echo *********************************************************
    echo "Linking object files with libraries and generating"
    echo "final executable object module"
    echo *********************************************************
    as29 -l > smplboot.lst -DRAMINIT -o smplboot.o smplboot.s
    ld29 -c step2.cmd -o step2.out -f tx -m > step2.map
    echo *********************************************************
    echo "Converting executable object code to downloadable format"
    echo *********************************************************
    c:\29k\bin\btoa step2s.out sc.a
    echo *********************************************************
    echo "Converting executable into PROM-programmable format"
    echo *********************************************************
    coff2hex -c t -m -p 27512 step2e.out > step2.e
    echo on

COMPILING CPROG.C AND ASSEMBLING THE .S FILES

The first group of operations in the batch file obtains relocatable object modules from the source files. The C-language source file cprog.c is compiled by invoking the HighC29K compiler with the command line:

    hc29 -c -w cprog.c

HighC29K replaces the symbolic instructions in the source file with equivalent machine-code routines. Then a relocatable object file (cprog.o) is produced, as shown in Figure 5.

The parameter -w suppresses warning messages, limiting the output to errors only; the -c parameter instructs the compiler to produce the object file. Note that a second compilation is performed with the -Hasm flag on. This produces an assembly listing (.s file) only.

Next, the ASM29K assembler is used to assemble the modules smplboot.s, traps.s, and r29k.s. This involves replacing assembly-language symbolic instructions in the source file with the corresponding machine instruction code. To assemble smplboot.s and obtain a relocatable object file, the following command line can be entered:

    as29 -l > smplboot.lst -o smplboot.o smplboot.s

A relocatable object file (smplboot.o) and a listing file (smplboot.lst) are produced from the assembly. All assembly-time errors are directed to the standard output. The operation is shown in Figure 6. The same operation is done on traps.s and r29k.s.

LINKING

Once the relocatable object files have been made, they must be linked (i.e., assigned physical addresses). This is done using the ASM29K linker, which allows one or more object files from either the assembler or the compiler to be linked together into a single executable object file. The object modules are linked by entering the command line:

    ld29 -c step1.cmd -o step1.out -f tx -m > outlink.map
Using the command file step1.cmd (see Listing 2), the files smplboot.o, r29k.o, and traps.o are linked with cprog.o into a single, non-relocatable object file called step1.out. A reference to where each module was placed is put in the map file outlink.map. Any error messages are sent to the standard output.

The linking process produces a map file that lists the local symbol table, external symbols, and the cross-reference. This type of output is a good reference to the entire application program.

Figure 5. Compiling cprog.c (HighC29K Compiler)

Figure 6. Assembling smplboot.s (ASM29K Assembler, producing smplboot.lst)

Listing 2. The Linker Command File step1.cmd

    ORDER .text=0x0
    ORDER .bss=0x100400
    ORDER .data=0x100420
    PUBLIC _MSTACK=0x1f7fc
    PUBLIC _RSTACK=0x1fffc
    load smplboot.o,r29k.o,traps.o
    load cprog.o
    load c:\29k\lib\libmw.lib

TRANSFERRING CODE FROM ROM TO RAM: ROMCOFF

The smplboot.s file contains a section of initialized data that must be loaded into instruction/data RAM and tested by the application program. This could be accomplished by writing many lines of const, consth, and storem instructions into the smplboot.s file. Another method is to use the ROMCOFF utility.

The ROMCOFF utility transforms user-specified sections of an Am29000 program into a stream of instructions that will perform the transcription. From a fully linked, executable Am29000 program, the ROMCOFF utility generates a COFF output file containing initializers that will establish the image of an executable COFF input file in instruction/data RAM. The output file contains one section, RI_text, within which is one routine, RAMInit. The output file can then be linked with other relocatable modules that will remain in instruction ROM, to produce a single non-relocatable module for programming PROMs.
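To make the const and consth mechanism concrete, the sketch below shows how one 32-bit data word divides into the two 16-bit immediates that such an instruction pair would carry. It is an illustration of the arithmetic only, not the output format of ROMCOFF; the type imm_pair and the functions split_word and join_word are hypothetical names. It assumes the usual Am29000 behavior in which const loads a zero-extended 16-bit value and consth then replaces the upper 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate fields for one const/consth pair (illustrative type). */
struct imm_pair {
    uint16_t const_imm;   /* low 16 bits, carried by const   */
    uint16_t consth_imm;  /* high 16 bits, carried by consth */
};

/* Splits a 32-bit data word into the two 16-bit immediates. */
struct imm_pair split_word(uint32_t value)
{
    struct imm_pair p;

    p.const_imm  = (uint16_t)(value & 0xFFFFu);
    p.consth_imm = (uint16_t)(value >> 16);
    return p;
}

/* Models the register contents after executing the pair:
 * const zero-extends its immediate, then consth overwrites
 * the upper half and leaves the lower half unchanged. */
uint32_t join_word(struct imm_pair p)
{
    uint32_t reg = p.const_imm;                              /* const  */

    reg = (reg & 0x0000FFFFu) | ((uint32_t)p.consth_imm << 16); /* consth */
    return reg;
}
```

For example, the .data origin 0x100420 from Listing 2 would split into a const immediate of 0x0420 and a consth immediate of 0x0010.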
ROMCOFF can be used to transcribe entire sections of code into instruction/data RAM. Then, once the application's boot program has finished preparing the environment, it transfers control to the transcribed program in instruction/data RAM. This allows the code to be executed out of high-speed RAM devices, which are frequently more cost effective than high-speed PROMs. See Figure 7.

In the example program, only a section of initialized data in smplboot.s is transferred to RAM. ROMCOFF creates a relocatable object module that transcribes the data sections to RAM when the following command line is entered:

    romcoff -tlb step1.out rom.o

The linked output file step1.out is made into the file rom.o. Only the data section is output, because of the ROMCOFF options -tlb, which specify that the text, literal, and bss sections should be ignored.

The output from ROMCOFF (rom.o) contains only code to transcribe data sections. It must be re-linked with the object files to produce a final absolute object module. First, the code in smplboot.s, which contains a call to the RI_text section, must be assembled to include the conditional assembly statements. To assemble smplboot.s so that it will contain the call, enter:

    as29 -l > smplboot.lst -DRAMINIT -o smplboot.o smplboot.s

The -D option defines RAMINIT so that conditional assembly statements in the source file will be assembled. The statements include an external declaration of RAMInit, and a call to it. Then, all of the object modules can be linked with rom.o as follows:

    ld29 -c step2.cmd -o step2.out -f tx -m > step2.map

A second linker command file is used because rom.o must be identified to the linker (see Listing 3).

Figure 7. Using ROMCOFF (Instruction ROM holds Boot, which initializes the environment, transcribes code to RAM, and calls Main; Instruction/Data RAM holds Main, which executes the application)

Listing 3. The Linker Command File step2.cmd

    ORDER .text=0x0,RI_text
    ORDER .bss=0x100400
    ORDER .data=0x100420
    PUBLIC _MSTACK=0x1f7fc
    PUBLIC _RSTACK=0x1fffc
    load smplboot.o
    load rom.o
    load r29k.o,traps.o
    load cprog.o
    load c:\29k\lib\libmw.lib

DOWNLOADING TO THE ADAPT29K

Once the final executable object module is created, the example program can be downloaded to the target system and tested using the ADAPT29K.

USING BTOA

The BTOA utility creates an ASCII COFF output from the input file. Although the ADAPT29K can handle Tektronix or Motorola hex files, using the BTOA utility to make the ASCII file has several advantages. Most importantly, BTOA encodes the input file into (7-bit) ASCII using a compact base-85 scheme that limits file expansion to only 25 percent, as opposed to 150 percent for standard hex formats. Hence, the resulting output file is smaller, and consequently quicker to transfer. Also, BTOA maintains the ASCII COFF format, rather than converting it to absolute addresses. As shown in the sc.bat batch file, BTOA produces the output file sc.a and is invoked by:

    btoa step2s.out sc.a

Figure 8. List Memory Display

    00000000R c6400200 MFSR  GR64,CPS
    00000004R 03fb41ff CONST GR65,0xFBFF
    00000008R 90404041 AND   GR64,GR64,GR65
    0000000cR ce000240 MTSR  CPS,GR64
    00000010R 03004000 CONST GR64,0x0
    00000014R ce000040 MTSR  VAB,GR64
    00000018R 0300403f CONST GR64,0x3F
    0000001cR ce000740 MTSR  RBP,GR64

Listing 4.
Results of "End Execution" Command List .., ..., o g o g o ..~ o o o o > d 400,420 00000400 00000410 00000420 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 o o o ,- ~ TESTING THE EXAMPLE PROGRAM WITH THE ADAPT29K the next prompt has appeared, the contents of the instruction ROM can be verified by entering: *1 Once the object module has been translated using the BTOA utility, it can be downloaded to the target using ADAPT29K. For use with ADAPT29K, the STEB should be configured as indicated in Appendix F. The ADAPT29K should respond to the L (list memory) command with the display shown in Figure 8. The locations starting at Ox400 in instruction/data RAM contain the status of the test and number of successful loops, respectively. Which location actually contains which variable is a decision made by the linker, and must be determined by inspection. To download the file, communication must be estab-. lished with the ADAPT29K. On a PC, this is done by invoking the terminal emulator program (for example, CrossTalkl!!i), establishing communication with the ADAPT29K, and entering (note that # is the ADAPT29K monitor prompt): * ya Or To check these locations automatically when the execution stops, set up an "end execution" command list by entering: e,Or *e The Y (load a file to memory) command prepares the ADAPT29K to receive an ASCII-encoded file from the DCE port. Then, the emulator must be instructed to transmit the file (for example, se sC.a when using CrossTalk). After the code has been downloaded, and d 400,420; The list is executed on entry. It should appear as shown in Listing 4. 
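Returning to the BTOA discussion: the 25 percent expansion figure corresponds to packing every four input bytes into five printable characters, which is how conventional base-85 encoders work. The sketch below illustrates that arithmetic only. It assumes the common layout of five base-85 digits offset from the '!' character, which may not match the exact file format written by the AMD BTOA utility, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes one 32-bit group as five base-85 digits, most significant
 * digit first. Each digit is offset by '!' so the result is a printable
 * 7-bit character. 85^5 exceeds 2^32, so five digits always suffice. */
void base85_encode_word(uint32_t word, char out[5])
{
    int i;

    for (i = 4; i >= 0; i--) {
        out[i] = (char)('!' + (word % 85u));
        word /= 85u;
    }
}

/* Reverses base85_encode_word for a well-formed five-character group. */
uint32_t base85_decode_word(const char in[5])
{
    uint32_t word = 0;
    int i;

    for (i = 0; i < 5; i++)
        word = word * 85u + (uint32_t)(in[i] - '!');
    return word;
}
```

Four bytes in and five characters out gives the 5/4 ratio, that is, 25 percent growth. A hex format needs at least two characters per byte before any record overhead is added.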
GR080 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 GR088 GR096 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00104a18 00000000 00000000 00000000 00000000 00000000 00000000 00000000 GR104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 GR112 GR120 00000000 00000000 00000000 00000000 80000020 000095d9 00100400 00000095 ffffffff 80000000 00000000 00000000 00000000 000lf7fe 00000£££ 06050101 LROOO 00000928 OOOlf££e 00100414 00108414 000000£0 OOOlf££e 00000000 00000000 LR008 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 LR016 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 LR024 LR032 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 LR040 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 LR048 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 LR056 GR001 000lf£e4 (R249) IPC IPA IPB 00 00 00 (GROOO) (GROOO) Q 00000000 ALU: DF V N Z C 0 o 0 0 BP FC CR 0 00 00 (GROOO) Figure 9. Key Registers Display 3-92 0 11966A-09 Preparing PROMs Using the Am29000 Development Tools Prior to starting the test, it is a good practice to reset the system by using the P reset command: Enter: # e # preset The display will be similar to that shown in Figure 11. The precise display in any given situation, particularly the loop count stored in location 40CD is dependent on the exact time elapsed between the start execution and the entry of the E command. At another time, it may appear as shown in Figure 12. To verify the condition of the system before execution, the X (Display Key Registers) command is entered as: # x This will result in a display as shown in Figure 9. 
The special-purpose protected registers can be checked using the XP (display protected registers) command. The display appears as shown in Figure 10. The state of the processor can be checked using the C (check execution state command): # c To execute the program starting from address 0 in instruction ROM, the G (go-start execution) command is used: When the processor is running, ADAPT29K displays: Am29000 is Running. # g Or During execution, the status of the program can be checked by invoking the previously defined "end execution" command list. . # xp CPS: ops: CA IP TE TP TU FZ LK RE WM PD PI SM IM DI DA 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 1 1 1 0 0 1 1 1 LS ML ST LA TF TR NN CV 79 1 CFG: PRL 01 VAB 0000 CHA CHD 00104a14 00000000 VF RV BO 1 0 0 CHC: CE 00 1 CD CP 1 CNTL 0 1 1 CR 00 1 0 0 0 0 0 RBP: BF BE BD BC BB BA B9 B8 B7 B6 B5 B4 B3 B2 B1 BO 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 TCV 000000 TR: OV IN IE 1 1 0 TRV 000000 PCO PC1 PC2 00000a34 00000a30 00000a2c MMU: PS 0 PID 00 LRU 0 11966A-10 # Figure 10. Protected Registers Display 3-93 29K Family Application Notes > d 400,420 00000400 00000410 009595d9 00000000 00108414 0000002d 00000420 00000000 00100414 00000000 00000000 00000000 11966A-11 Figure 11. Check Status Display > d 400,420 00000400 00000410 00000420 # 009595d9 00000000 00108414 000000e1 00100414 ffffffff ffffffff ffffffff ffffffff 11966A-12 Figure 12. Second Check Status Display PREPARING PROMs PROGRAMMING THE PROMS Once the absolute object file has been prepared, it must be transferred to the media from which the code will be executed. Often, this medium is a PROM set. Most PROM programmers require their input to be in an ASCII hex format, so a translation normally is performed before sending the program to the PROM programmer. A PROM programmer is used to "burn" the binary object file into PROM devices. Many types of PROM programmers are available. The Data 1/0 Unisite(ll) PROM programmer is used in the following example. 
MAKING HEX FILES: COFF2HEX

The COFF2HEX utility produces a 32-bit ASCII hex file in either the Motorola S3 or Tektronix Extended format. Both of these formats are accepted by most PROM programmers, as well as the ADAPT29K. Note that the ADAPT29K requires the file to be one module, rather than being divided into separate modules by part size (see the options of the COFF2HEX utility). In sc.bat, COFF2HEX is invoked by entering:

    coff2hex -c t -m -p 27512 step2e.out > sccoff.e

This produces 8-bit wide modules that will fit into a 27512 EPROM (-p option). The format is Motorola S3 (-m option), and will include only the text sections (-c t option). The resulting files will be named a.a00, a.a08, a.a16, and a.a24, indicating which bytes of the word they represent. If the file is larger than the capacity of the part size specified, additional sets of four will be generated with filenames a.b00, a.b08, a.b16, a.b24, and so on, with further sets having a corresponding nomenclature. Once generated, the files can then be transmitted to a PROM programmer.

Assuming an object module has been created as described in the first part of this document (and a set of Motorola S3 modules has been obtained using COFF2HEX), the following procedure could be used to create a PROM set.

1. Turn on the PROM programmer. Make sure the algorithm disk is properly inserted in the lower front slot.

2. Once the power-up sequence and diagnostics have completed, a screen should appear on the attached terminal. If there is no terminal, or the screen does not appear, refer to the set-up section of the user's manual for the PROM programmer.

3. Make sure a host system is attached. In this example, the use of a PC is assumed. At the PC, set the COM1 serial port to 9600 baud, no parity, 8-bit bytes, and one stop bit by entering:

    mode com1:96,n,8,1

   On the PROM programmer, select "Configure System," followed by "Edit," and then "Serial I/O." Make sure the remote port parameters are set properly.

4. The program will be placed in AMD 27512 PROMs. To inform the PROM programmer, choose "Select Device," "3" (AMD), and "25" (27512).

5. It is a good idea to clear the PROM programmer's memory before downloading data. This ensures that the PROMs do not become programmed with leftover data from a previous operation, which may cause troublesome errors. To clear the memory, select "Fill Memory." Enter 00 to 7FFFF as the address range, and FF as the data.

6. The PROM programmer must know the format of the incoming data. Select "Transfer Data," followed by "Format Select." Enter "95" for Motorola S3 Record.

7. Select "Load Device" on the programmer. On the PC, enter:

    copy a.a00 com1:

   This causes the lowest 8 bits of the application to be transmitted to the PROM programmer, which will load the data into its memory.

8. Properly insert a PROM into the ZIF socket on the PROM programmer and engage the locking mechanism. Select the "Program Device" option on the PROM programmer.

9. Once the PROM has been burned, remove it and label it with the program name, range of bits, version, and date. Then, repeat steps 7-9 using the files a.a08 through a.a24. If a larger program is used, it may be necessary to repeat steps 7-9 using modules a.b00, a.b08, a.b16, a.b24, and so on.
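The division of a 32-bit image into four byte-wide PROM files can be sketched in C as follows. This is an illustration of the idea, not the COFF2HEX implementation. The function name split_byte_lanes is hypothetical, and the mapping of the 08, 16, and 24 suffixes to bit positions is an assumption extrapolated from the statement that a.a00 carries the lowest 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits `count` 32-bit words into four byte-wide images.
 *   lanes[0] receives bits  7..0  (a.a00 in the text)
 *   lanes[1] receives bits 15..8  (assumed to correspond to a.a08)
 *   lanes[2] receives bits 23..16 (assumed to correspond to a.a16)
 *   lanes[3] receives bits 31..24 (assumed to correspond to a.a24)
 * Each lane buffer must hold at least `count` bytes. */
void split_byte_lanes(const uint32_t *words, size_t count, uint8_t *lanes[4])
{
    size_t i;
    int lane;

    assert(words != NULL && lanes != NULL);
    for (i = 0; i < count; i++)
        for (lane = 0; lane < 4; lane++)
            lanes[lane][i] = (uint8_t)(words[i] >> (8 * lane));
}
```

Each lane image is what one 8-bit PROM in the set would hold, which is why steps 7-9 are repeated once per file.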
3-95 29K Family Application Notes APPENDIX A: smplboot.s .extern .extern .extern .extern .extern .equ .equ .equ .include .data .word .comm .text .ifdef .extern .endif .global r29k_init _main V_SPILL, V_FILL spill, fill _RSTACK,_MSTACK ROM_TH,Ox2 RSC_SIZE,Ox200 TBM_SIZE,Ox20000 "scregs.def" mfsr canst and mtsr canst mtsr canst tmpO,CPS tmpl,OxFBFF tmpO,tmpO,tmpl CPS,tmpO tmpO,O VAB,tmpO tmpO,Oxll mtsr canst canst consth sub srI sub CFG,tmpO tmp2,0 tmpO,O tmpl,TBM_SIZE tmpl,tmp1,tmpO tmpl,tmpl,2 tmpl, tmp1, 2 store jmpfdec add canst canst consth canst tmp1,mem_00 tmpO,tmpO,4 tmpO,256-2 tmp1,illtrap+Ox2 tmp1, illtrap tmp2,0 assembly module C module Linker definable V_SPILL and V FILL vector numbers spill and fill procedure Link time definable stack pointer assignments Spill and fill trap interface do truly reside in ROM space Default reg_stack_cache usage=512 32K*4=12Bkb of Inst/RAM size (20)170 mtp_count,4 RAMINIT RAMInit if RAMINIT Flag on make RAMInit available start start: vtd_init: store jmpfdec add canst consth canst sll store canst consth canst sll store canst consth canst sub sub canst consth add canst consth calli nap .ifdcf 3-96 0,0,tmp1,tmp2 tmpO,vtd_init tmp2,tmp2,4 tmpO,spilltrap+ROM_TH tmpO,spilltrap tmp1,V_SPILL tmp1,tmp1,2 0,0,tmpO,tmp1 tmpO,filltrap+ROM_TH tmpO, filltrap tmp1,V_FILL tmp1,tmp1,2 0,0,tmpO,tmp1 rfb,_RSTACK rfb,_RSTACK tmpO,RSC_SIZE rab,rfb,tmpO rsp,rfb,OxB msp,_MSTACK msp,_MSTACK Ir1,rfb,0 tmpO,r29k_init tmpO,r29k_init IrO,tmpO RAMINIT Read CPS Clear FZ bit Update CPS Set VAB pointing to LOW memory Set VF=l, i.e., Vector table scheme and CD=l, i.e., Branch Target Cache is disabled Write Data pattern = OxOOOOOOOO Low memory address High memory address Get address difference Get word count from diff value adjustments for jmpfdec instr fill TB_memory with all zeros 0,0,tmp2,tmpO Total of 256 vector table entries ROM based illegal trap handlers address, by default fill vector table with default trap handlers get spill trap entry point get 
spill trap vector number generate vect number location store address of trap handler into vector table get fill trap entry point get fill trap vector number generate vect number location store address of trap handler into vector table Set RFB Ox200=512 bytes ie 12B*4 Set RAB=RFB-512 Set RSP=RFB-B Set MSP Set Ir1 to RFB call procedure to init 29K registers if RAMINIT on, Preparing PROMs Using the Am29000 Development Tools const consth calli .else nop nop nop .endif nop const consth mtsrim mtsrim mtsr add mtsr xor iret tmpO,RAMInit tmpO,RAMInit gr96,tmpO set up RAMInit call and do the call make sure code takes same number of locations regardless of RAMINIT condition tmpO,exec tmpO,exec OPS,Oxl72 CPS,Ox573 PCl,tmpO tmpO,tmpO,4 PCO,tmpO tmpO,tmpO,tmpO in case we did calli get target application task address RE=l, PI=l, PO=l, SM=l and 01=1 Set Target application Task address Any additional regs clean up Give control to application via IRET exec: const consth calli nop sll sll sll const consth load cpeq jmpt nop halt lrO, _main lrO,_main lrO,lrO gr97,gr64,O gr98,gr65,O gr99,gr66,O gr64,mtp_count gr64,mtp_count O,O,gr65,gr64 gr67,gr96,O gr67,again get C-callable routine entry point make the call Save user global registers gr64 through gr66 get address of memory test pass count recorder get current count so far check for memory test pass? 
true then run test again false halt further memory testing again: add store sll sll sll jmp nop spilltrap: mfsr const consth mtsr add mtsr iret filltrap: mfsr const consth mtsr add mtsr iret illtrap: halt .end gr65,gr65,l O,O,gr65,gr64 gr64,gr97,O gr65,gr98,O gr66,gr99,O exec tpc,PCl tmpO, spill tmpO, spill PCl,tmpO tav,tmpO,tmpO+4 - PCO, tmpO tpc,PCl tmpO, fill tmpO, fill PCl,tmpO tav,tmpO,tmpO+4 PCO,tmpO bump mtp_count by 1 update in memory also Restore user global registers gr64 through gr66 run the memory test once again save return address in tpc get spill procedure entry point fill Am29000 pipeline target address fill Am29000 pipeline with target address+4 save return address in tpc get fill procedure entry point fill Am29000 pipeline target address fill Am29000 pipeline with target address+4 3-97 29K Family Application Notes APPENDIX B: IIdefine IIdefine #define IIdefine IIdefine #define #define #define #define int int int int cprog.c MT_PASSED SOLID_ONES SOLID_ZEROS MT_BLK_SIZE WORD_SIZE INIT_DATA MEM_BLOCK NIT_DATA_BASE INIT_DATA_SIZE 0 -1 32768 4 170 1056 1280 15 *mt_sts; Im_addr,hm_addr; initdata; *mem_test(); main () ( Im_addr = hm_addr = initdata mt_sts INIT_DATA_BASE; INIT_DATA_BASE+MT_BLK_SIZE/WORD_SIZE; MEM_BLOCK; mem_test(lm_adqr,hm_addr,initdata); int *mem_test(low,high,initd) int *low,*high,*initd; int *addr; /* Solid Ones test */ for{addr=low; addr<=high; addr++) *addr = SOLID_ONES; for{addr=low; addr<=high; addr++) if{*addr != SOLID_ONES) return{addr); /* Solid Zeros test */ for(addr=low; addr<=high; addr++) *addr = SOLID_ZEROS; for(addr=low; addr<~high; addr++) if(*addr != SOLID_ZEROS) return(addr); for (addr=initd;addr ' Address Bus Figure 7. Example Am29000 System 3-114 --q 11025A-07 Programming Standalone Am29000 Systems tions off (PD,PI = 1), turns on supervisor mode (SM = 1), and disables all interrupts and traps (DI,DA= 1). 
Step 2-Establlshlng a Simple Register Stack Frame BOOT.S calls several procedures, so it establishes a Register Stack Frame. However, control will not return to BOOT.S after calling _main. Therefore, it only needs to use a limited stack frame. The frame is set up with: const frame const sub pl add rfb, 512 ;set up temp reg rab, 0 rsp, rfb, 16 ;enough for pO and lrl, rfb, 0 Step 3-lnltlallzlng I/O Devices An I/O device is initialized early, so that it can be used to transmit error messages. The 8530 serial communications controller is initialized using the routine shown in Listing 1. LIsting 1. Initializing I/O Devices SerInit: .reg .reg const consth const store const store const store const store const store const store const store const store const store const store const store const store const store const store const store const store SI CtAd, %% (TEMP_REG + 0) SI_CtVl, %% (TEMP_REG + 1) SI CtAd, SCCCntlAd SI_CtAd, SCCCntlAd SI_CtVl, 9 0, 0, SI_CtVl, SI CtAd SI_CtVl, OxcO 0, 0, SI _CtVl, SI_CtAd SI CtVl, 4 0, 0, SI _CtVl, SI CtAd SI_CtVl, Ox44 0, 0, SI_CtVl, SI_CtAd SI_CtVl, 3 0, 0, SI_CtVl, SI CtAd SI_CtVl, OxcO 0, 0, SI_CtVl, SI _CtAd SI_CtVl, 5 0, 0, SI_CtVl, SI_CtAd SI_CtVl, Ox60 0, 0, SI_CtVl, SI_CtAd SI_CtVl, 9 0, 0, SI_CtVl, SI_CtAd S1_CtVl, OxO 0, 0, S1 CtVl, SI CtAd SI_CtVl, 10 0, 0, SI _CtVl, S1_CtAd S1 CtVl, OxO 0, 0, SI _CtVl, S1 CtAd SI CtVl, 11 0, 0, S1 _CtVl, SI CtAd SI CtVl, Ox56 0, 0, SI _CtVl, SI_CtAd S1 CtVl, 12 0, 0, S1 _CtVl, S1_CtAd SI - CtVl, Ox6 0, 0, SI _CtVl, SI CtAd - ;control port address ;control port value - ;reset the port - - ;x16, 1 stop, no parity ;8 bits receive - ;8 bits xmit ;Int. disabled ;NRZ ;Tx & Rx BRG out ;9600 baud - 3-115 29K Family Application Notes Listing 1. 
Initializing 1/0 Devices (continued) const store const store const store const store const store const store const store const store const store const store EPILOGUE SI_CtVl, 13 0, 0, SI_CtVl, SI_CtVl, OxO 0, 0, SI_CtVl, SI_CtVl, 14 0, 0, SI_CtVl, SI_CtVl, OxO 0, 0, SI_CtVl, SI_CtVl, 14 0, 0, SI_CtVl, SI_CtVl, Ox1 0, 0, SI_CtVl, SI_CtVl, 3 0, 0, SI_CtVl, SI_CtVl, Oxc1 0, 0, SI_CtVl, SI_CtVl, 5 0, 0, SI_CtVl, SI_CtVl, Oxea 0, 0, SI_CtVl, ;9600 baud SI_CtAd SI_CtAd ;BRG in RTxC SI_CtAd SI_CtAd ;BRG on SI CtAd - SI_CtAd ;Rx enable SI CtAd - SI_CtAd ;Tx enable SI_CtAd SI_CtAd Step 4-Testlng RAM The RAM is tested before code is transferred to it. BOOT.S calls a single test, an address pattern test. Other tests are included in the source listing shown in Appendix A. The test used by BOOT.S is shown in Listing 2. Step S-Settlng the Vector Table Entries to the Invalid Trap Handler will START.S set up the vector table, but BOOT.S guards against abnormal ends by making all of the vector table entries point to an invalid trap handler in ROM. This is done with the following routine, which is called from the main loop, as shown In Listing 3. Listing 2. Testing RAM .sbttl FUNCTION "RAM Address Pattern Test" RAMAddr, 2, 0, 3 This routine will run a two-pass test on RAM. It will be controlled by input values specifying the base address and the count of locations ~o be tested. In the first pass, the data will be set equal to the address. In the second pass, the data will be set equal to the complement of the address. 
3·116 In: (see below) Out: (see below) .reg .reg .reg .reg .reg .reg .reg .reg RA_StrtAdd, RA_WrdCnt, RA_TmpCnt, RA_StrtPat, RA_PtrnInc, RA_NxtAdd, RA_WrtPat, RA_RedPat, %% (IN PRM + 0) n (IN_PRM + 1) %% (TEMP_REG + 0) %% (TEMP_REG + 1) n (TEMP_REG + 2) n (OUT_PRM + 0) %!is (OUT_PRM %!is (OUT_PRM + 1) + 2) :starting address ;count of words :total test word count ;starting pattern :ptrn increment value ;error address ;pattern written ; pattern read Programming Standalone Am29000 Systems Listing 2. Testing RAM (continued) .reg add const RA_Fail, %%(RET_VAL + 0) RA_StrtPat, RA_StrtAdd, 0 RA_Ptrnlnc, 4 ;fill memory with pattern add RA_NxtAdd, RA_StrtAdd, 0 RA_TmpCnt, RA_WrdCnt, 2 sub RA_WrtPat, RA_StrtPat, 0 add 0, 0, RA_WrtPat, RA_NxtAdd store add RA WrtPat, RA_WrtPat, RA_Ptrnlnc jmpfdec RA_TmpCnt, RA_2 RA_NxtAdd, RA_NxtAdd, add ;check memory for pattern add RA_NxtAdd, RA_StrtAdd, 0 RA_TmpCnt, RA_WrdCnt, 2 sub RA_WrtPat, RA_StrtPat, 0 add ;TRUE for fail ;start with address ;get start address ;for jmpfdec ;set the pattern ;next test mem addr ;get start address ;for jmpfdec ;set the pattern RA_3: CD, DATA_CTL, RA_RedPat, RA_NxtAdd load cpneq RA_Fail, RA_RedPat, RA_WrtPat jmpt RA_Fail, RA_ERR nop add RA_WrtPat, RA_WrtPat, RA_Ptrnlnc jmpfdec RA_TmpCnt, RA_3 add RA_NxtAdd, RA_NxtAdd, ; invert ptrn for next pass nor RA_StrtPat, RA_StrtPat, 0 cpneq RA_Fail, RA_StrtPat, RA_StrtAdd jmpt RA_Fail, RA_l subr RA_Ptrnlnc, RA_Ptrnlnc, 0 jmp RA_EXIT nop ;err if neq ;next test mem address ;invert initial ;negate inc value RA_ERR: call nop const consth lrO, RAMErr RA_Fail, TRUE RA_Fail, TRUE ;set after call RA_EXIT: EPILOGUE 3·117 29K Family Application Notes Listing 3. Setting Vector Table Entries . sbttl LEAF "Vector Initialization" Vectlnit, 0 This routine initializes the vector table and vab. All vectors are set to point to the invalid trap handler in ROM. 
            .reg    VI_Vect, %%(TEMP_REG + 0)        ;vector value
            .reg    VI_VectSt, %%(TEMP_REG + 1)      ;vector storage address
            .reg    VI_VectCnt, %%(TEMP_REG + 2)     ;vector count register
            mtsrim  vab, 0
            mfsr    VI_VectSt, vab
            const   VI_Vect, (InvalidTrapHandler | 2)
            consth  VI_Vect, InvalidTrapHandler
            const   VI_VectCnt, (256 - 2)
    VI_Loop:
            store   0, 0, VI_Vect, VI_VectSt         ;store the vector
            jmpfdec VI_VectCnt, VI_Loop
            add     VI_VectSt, VI_VectSt, 4          ;for jmpfdec
            EPILOGUE

Step 6-Transcribing Code to RAM

BOOT.S transcribes START.S and the C-language application (simulated by TEST.S) into instruction/data RAM by calling RAMInit. RAMInit is a routine that is created by the ROMCOFF utility. When an executable Am29000 object file is submitted to ROMCOFF, the utility generates a relocatable object file of type RI_text that (when called) establishes an image of the executable module in instruction/data RAM. RAMInit is called by:

            call    RI_Ret, RAMInit                  ;initialize RAM

Note that when RAMInit is called, the return address is not stored in a local register (such as lr0), and that RAMInit is called just before transferring control to _main. To transcribe data to RAM, RAMInit will create a stream of const and consth instructions that will load up the local registers starting from lr0. Then it will insert a store multiple command to transfer the data into memory. Consequently, any data in local registers will be overwritten.

Step 7-Calling START.S

As BOOT.S does not intend to have control returned to it, it calls START.S by simulating a return from interrupt. This is accomplished by setting the freeze (FZ) bit ON in the old processor status (ops) and current processor status (cps) registers, putting the starting address of START.S in pc0, and performing a return from interrupt (see Listing 4).

The Main Loop of BOOT.S

When all of the preceding steps are put together, the main loop appears as shown in Listing 5.
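The status values that BOOT.S writes to cps and ops (0x173 and 0x473) can be read as combinations of individual status bits. The sketch below composes them from named masks. The bit positions are an assumption inferred from the constants and comments in the listings and from the field order of the protected registers display; they are not quoted from the register definition, and the identifiers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions within the processor status registers. */
enum cps_bits {
    CPS_DA = 1u << 0,   /* disable all interrupts and traps  */
    CPS_DI = 1u << 1,   /* disable interrupts                */
    CPS_SM = 1u << 4,   /* supervisor mode                   */
    CPS_PI = 1u << 5,   /* physical addressing, instructions */
    CPS_PD = 1u << 6,   /* physical addressing, data         */
    CPS_RE = 1u << 8,   /* ROM enable                        */
    CPS_FZ = 1u << 10   /* freeze                            */
};

/* Value used while BOOT.S runs out of ROM: RE, PD, PI, SM, DI, DA. */
uint32_t cps_boot_value(void)
{
    return CPS_RE | CPS_PD | CPS_PI | CPS_SM | CPS_DI | CPS_DA;
}

/* Value used for the simulated return: FZ, PD, PI, SM, DI, DA. */
uint32_t cps_handoff_value(void)
{
    return CPS_FZ | CPS_PD | CPS_PI | CPS_SM | CPS_DI | CPS_DA;
}
```

Under these assumptions the two functions return 0x173 and 0x473, matching the immediates in the listings. The only difference between them is that the freeze bit replaces ROM enable for the hand-off.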
Listing 4. Calling START.S

        mtsrim  ops,0x473               ;FZ, PD, PI, SM, DI, DA
        mtsrim  cps,0x473               ;FZ, PD, PI, SM, DI, DA
        const   lr0,TextBas             ;(using lr0 as temp)
        consth  lr0,TextBas
        mtsr    pc1,lr0
        add     lr0,lr0,4
        mtsr    pc0,lr0
        iretinv                         ;go to inst space, TextBas

Listing 5. Main Loop of BOOT.S

Boot:
        .reg    RI_Ret,%%(TEMP_REG + 0) ;RAMInit return
        mtsrim  cps,0x173               ;RE, PD, PI, SM, DI, DA
        const   rfb,512                 ;set up temp reg frame
        const   rab,0
        sub     rsp,rfb,16              ;enough for p0 and p1
        add     lr1,rfb,0
        call    lr0,SerInit             ;initialize an 8530 to report errors
        nop
        const   p1,(RAM_SIZE >> 2)      ;test full RAM size
        consth  p1,(RAM_SIZE >> 2)      ;(input parm to RAM test)
        call    lr0,RAMAddr             ;call a RAM address test
        const   p0,0                    ;test from addr 0 (input parm to RAM test)
        call    lr0,VectInit            ;routine to initialize traps to
        nop                             ;invalid trap handler
        call    RI_Ret,RAMInit          ;initialize RAM -- from ROMCOFF
        mtsrim  ops,0x473               ;FZ, PD, PI, SM, DI, DA
        mtsrim  cps,0x473               ;FZ, PD, PI, SM, DI, DA
        const   lr0,TextBas             ;(using lr0 as temp)
        consth  lr0,TextBas
        mtsr    pc1,lr0
        add     lr0,lr0,4
        mtsr    pc0,lr0
        iretinv                         ;go to inst space, TextBas

CREATING THE EXECUTION ENVIRONMENT WITH START.S

The START.S file prepares the execution environment for the application program (simulated by TEST.S). Although the requirements of a given application will vary with the hardware environment, the tasks performed by START.S are needed to establish virtually any operating environment on the Am29000. These are:

1. Configure the Am29000.
2. Allocate the register and memory stacks.
3. Initialize the vector table and trap handlers.
4. Initialize the TLB by marking all entries invalid.
5. Call "main."

Step 1-Configuring the Am29000

Code similar to that shown below can be used to set the contents of the configuration register (cfg) so that the vector area is a table of pointers (VF = 1) and the Branch Target Cache™ is disabled (CD = 1). Also, the cps register is set so that physical addressing is used for both instructions and data (PD = 1, PI = 1), all interrupts and traps are disabled (DI = 1), and supervisor mode is ON (SM = 1). The timer reload register (tmr) is also set to 0 to avoid unwanted timer interrupts:

        mtsrim  tmr,0
        mtsrim  cfg,(VF|CD)
        mtsrim  cps,(PD|PI|SM|DI)

The setting of the VF bit determines the structure of the vector area. The vector area is a user-managed table in external instruction/data memory that starts at the address held in the vector area base (VAB) register. The vector area can have one of two structures, as determined by the VF bit of the configuration register. If VF = 1, the vector area is organized as a list of 256 pointers to interrupt/trap handlers. If VF = 0, the vector area is arranged as 256 blocks of 64 instructions each, and each fixed-size block contains the handler for the corresponding interrupt or trap. Figure 8 shows the two structures.

When the Am29000 receives an interrupt or trap, the location of the appropriate handler is determined by the vector area. Each interrupt and trap has a vector number between 0 and 255 that corresponds to an entry in the vector area. Of the vector numbers, 0 to 63 are reserved for system and floating-point operations. The assigned vector numbers are given in the Am29000 User's Manual.

If the table is a list of pointers, control is passed to the address stored at VAB + (vector number * 4). Multiplication by 4 scales the vector number to one 4-byte word per entry. If the vector table is composed of handlers, control is passed to a handler starting at VAB + (vector number * 64 * 4), where the vector number is scaled to words and multiplied by the fixed number of instructions per block (see Table 2).
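The address arithmetic described above can be sketched in C. This is only a host-side model for checking the two formulas, not code from START.S; the function name vector_entry_address and the constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the text: 256 vectors, 4-byte words,
   64 instructions per handler block when VF = 0. */
enum { VECTOR_COUNT = 256, WORD_BYTES = 4, BLOCK_INSTRUCTIONS = 64 };

/* Hypothetical helper: byte address of the vector-area entry.
   vf != 0: table of pointers; the result is the address of the
            word that holds the handler pointer.
   vf == 0: table of handler blocks; the result is the address of
            the first instruction of the handler itself. */
uint32_t vector_entry_address(uint32_t vab, unsigned vector_number, int vf)
{
    assert(vector_number < VECTOR_COUNT);
    if (vf)
        return vab + vector_number * WORD_BYTES;
    return vab + vector_number * BLOCK_INSTRUCTIONS * WORD_BYTES;
}
```

For example, with the vector area based at 0x1000, vector 64 resolves to the pointer word at 0x1100 when VF = 1, and vector 2 resolves to a handler block starting at 0x1200 when VF = 0.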
Table 2. The Location of a Pointer in the VAT

        CFG:VF      ISR Address
        1           VAB + (vector number * 4)
        0           VAB + (vector number * 256)

Step 2-Allocating Register and Memory Stack Frames

A full register stack frame is established by START.S, because it will call the application program (_main). Further, control could be passed back to the START.S return address (which then initiates a "warm start"). This should be done early in the main loop, as START.S will call some supporting assembly-language routines. The register stack frame can be established by the code shown in Listing 6.

Arguments that overflow the register stack will have to be placed in the memory stack (see Figure 8). The current position in the memory stack is pointed to by the memory stack pointer (msp). The stack can be established by:

        const   msp,MStkTop
        consth  msp,MStkTop

Listing 6. Allocating Register and Memory Stack Frames

        const   rfb,RStkTop             ;RStkTop is set to the
        consth  rfb,RStkTop             ;desired address in the declarations file
        const   rab,(RStkTop - 512)     ;128*4, maximum
        consth  rab,(RStkTop - 512)     ;part that can be cached
        add     lr1,rfb,0
        sub     rsp,rfb,16              ;adjusts for lr0, lr1, argc, and argv

Figure 8. The Two Structures of the Vector Area. With CFG:VF = 1, the area is a table of pointers, and the pointer for a vector is at VAB + (Vector Number * 4). With CFG:VF = 0, the area is a table of handler blocks, and the handler for a vector starts at VAB + (Vector Number * 256).

Step 3-Initializing the Vector Area and Vectors

Although the organization of the vector area is determined by the configuration register, the table and pointers still must be initialized. In the following example, the vector initialization code is kept compact, while permitting easy expansion of the vector set, by using a table in the .data section. Each entry in the table has two words. The first is the vector number; the second is the handler address (see Listing 7).
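The two-word table scheme can be modeled in C to make the data flow explicit. This is a minimal sketch, assuming a pointer-format vector area (VF = 1) represented as an array of 256 words; the names struct vect_init and install_vectors are hypothetical and do not appear in START.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One table entry: two words, as described in the text. */
struct vect_init {
    uint32_t number;    /* first word: vector number (0 to 255) */
    uint32_t handler;   /* second word: handler address         */
};

/* Hypothetical helper: copy each handler address into the slot
   selected by its vector number. vector_area models the 256-word
   pointer table that starts at the vector area base. */
void install_vectors(uint32_t *vector_area,
                     const struct vect_init *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(table[i].number < 256);
        vector_area[table[i].number] = table[i].handler;
    }
}
```

Adding a handler then means adding one entry to the table; the loop itself does not change, which is the compactness the text refers to.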
When the vector area base (vab) is supplied to the routine shown in Listing 8, it initializes the handlers.

Listing 7. Initializing the Vector Area and Vectors

        .data                           ;switch to .data for table
VectInitTable:
        .word   V_SupInstTLB, SupInstTLBHandler
        .word   V_SupDataTLB, SupDataTLBHandler
        .word   V_MULTIPLY, MultiplyHandler
        .word   V_DIVIDE, DivideHandler
        .word   V_MULTIPLU, MultipluHandler
        .word   V_DIVIDU, DividuHandler
        .word   V_SPILL, SpillHandler
        .word   V_FILL, FillHandler
        .word   V_Timer, TimerHandler
        .equ    VINIT_CNT, ((. - VectInitTable) / 8)
        .text                           ;switch back to .text for code

Listing 8. Initializing Vector Handlers

VectInit:
        .reg    VI_Vect,%%(TMP_REG + 0) ;vector value
        .reg    VI_St,%%(TMP_REG + 1)   ;vector storage address
        .reg    VI_Cnt,%%(TMP_REG + 2)  ;vector count
        .reg    VI_Base,%%(TMP_REG + 3) ;vector base
        .reg    VI_TbPt,%%(TMP_REG + 4) ;vector base
        mfsr    VI_Base,vab
        const   VI_Cnt,(VINIT_CNT - 2)  ;for jmpfdec
        const   VI_TbPt,VectInitTable
        consth  VI_TbPt,VectInitTable
VI_Loop:
        load    0,0,VI_St,VI_TbPt       ;get the vector
        add     VI_TbPt,VI_TbPt,4
        sll     VI_St,VI_St,2           ;convert to address
        add     VI_St,VI_St,VI_Base
        load    0,0,VI_Vect,VI_TbPt     ;get the handler
        add     VI_TbPt,VI_TbPt,4
        jmpfdec VI_Cnt,VI_Loop
        store   0,0,VI_Vect,VI_St
        jmpi    raddr                   ;return to caller
        nop

Step 4-Initializing the Translation Look-Aside Buffer (TLB)

When the Am29000 is first powered up, the TLB will not have valid entries. To prevent spurious matches against uninitialized entries, the start-up sequence should mark every entry invalid before control is passed to the application program. This can be done with an assembly-language sequence (see Listing 9).

Step 5-Calling "main"

Once the proper environment has been established for the application program, the main C program must be called. This is done by placing the address of the starting instruction in registers and performing a call. When the jump is "short," or less than 256 words, a call can be done directly. However, the jump often will be farther, and calli must be used in conjunction with an address stored in registers, as shown below:

        const   raddr,_main             ;store lower 16 bits
        consth  raddr,_main             ;store upper 16 bits
        calli   raddr,raddr             ;call indirect

Notice that raddr signifies the return address, usually lr0 by convention. Once the call is made, raddr holds the caller's return address in place of the target address, in the event there is a return from _main.

The START.S Main Loop

The complete START.S main loop, as developed in the previous sections, is shown in Listing 10. The routine receives control after being transcribed to RAM; once there, it initializes the vector handlers, clears the BSS area, initializes the TLBs, and establishes initial stack pointers and an initial register frame. Lastly, it invokes _main. Note that, in the event _main returns, a warm start is performed.

Listing 9. Initializing the TLB

        .reg    TI_Reg,%%(TEMP_REG + 0) ;the TLB register number
        .reg    TI_Val,%%(TEMP_REG + 1) ;the TLB value (0)
        .reg    TI_Cnt,%%(TEMP_REG + 2) ;the TLB register count
        const   TI_Reg,0
        const   TI_Val,0
        const   TI_Cnt,(TLB_CNT - 2)    ;for jmpfdec
TI_Loop:
        mttlb   TI_Reg,TI_Val
        jmpfdec TI_Cnt,TI_Loop
        add     TI_Reg,TI_Reg,1

Listing 10.
START.S Main Loop

Start:
        mtsrim  cps,0x73                ;set PD, PI, SM, DI, DA
        mtsrim  mmu,MMU_PS              ;PID = 0
        mtsrim  cfg,0x10                ;VF
        const   rfb,RStkTop             ;set up stack pointers
        consth  rfb,RStkTop
        const   rab,(RStkTop - 512)
        consth  rab,(RStkTop - 512)
        add     lr1,rfb,0
        sub     rsp,rfb,16              ;make room for lr0, lr1, argc, argv
        const   msp,MStkTop
        consth  msp,MStkTop
        call    lr0,VectInit            ;routine to install handled vectors
        nop
        call    lr0,TLBInit             ;routine to mark TLBs invalid
        nop
        mtsrim  cps,0x10                ;SM
        const   lr2,0                   ;argc
        const   lr3,0                   ;argv
        call    lr0,_main
        nop
        mtsrim  cps,0x473               ;set FZ, PD, PI, SM, DI, DA
        mtsrim  ops,0x173               ;set RE, PD, PI, SM, DI, DA
        mtsrim  cfg,1                   ;cache disabled
        mtsrim  chc,0                   ;contents invalid
        mtsrim  pc1,0                   ;cold start address
        mtsrim  pc0,4
        iretinv

APPENDIX A: boot.s

        .title  "ROM Boot Code"

Copyright 1988, Advanced Micro Devices
Written by Gibbons and Associates, Inc.

This module is intended to receive control at address 0. It handles a hardware reset or a simulation of that event in a "warm start" situation. Its purpose is to provide sufficient initializations for the operation of a program in RAM data/instruction space. The initializations must include the transcription of the program and its initialized data. The code and initialized data are stored in ROM prior to transcription.

To provide for orderly operation, C linkages are used. It is known that the register stack will never overflow. When certain calamities occur (e.g., invalid traps), the registers will be re-initialized to allow the use of subroutines in this module. There is no intention of ever returning under these circumstances.

Some of the routines in this module have a rather tedious implementation because they do not assume the validity of RAM or the readability of ROM. This is considered appropriate since it assures the validity of error handling.

This module provides no global addresses for external use; it is not intended to be called. It is best thought of as bootstrap code.
It is not intended to Some tests which are not actually used are included here for use in environments that may allow them. The external addresses named below are required. .extern RAMlnit ;romcoff generated This module needs the addresses for the control and data ports of the SCC. are declared below. .equ .equ SCCCntlAd,OxfffffffO SCCDataAd,Oxfffffff4 These ;control port address ;data port address This module assumes that RAM begins at data address 0 and has the size declared below. .equ .include .eject .sbttl RAM_SIZE,Ox40000 "romdcl.h" ;256K bytes "Section Declarations" This module has only one section, which is called "rom." It receives control at reset, i.e., it is an absolute segment based at address 0 (in ROM space). .sect .use 3-124 rom, text, absolute 0 rom Programming Standalone Am29000 Systems RomBase: jmp nop nop nop halt nop Boot ;the RESET entry ;the warn entry ;Could be a report routine .eject .sbttl "scc LEAF Serlnit,O Routines" This routine initializes the serial port for non-interrupt driven access at 9600 baud. In: (nothing) Out: (nothing) .reg .reg const consth const store const store const store const store const store const store const store const store const store const store const store const store const store const store const store const store const store const store SI_CtAd,%%(TEMP_REG SI _CtVI,%%(TEMP_REG SI _CtAd,SCCCntIAd SI _CtAd,SCCCntIAd SI_CtVI,9 O,O,SI_CtVI,SI CtAd SI CtVI,OxcO 0, 0, SI_CtVI,SI_CtAd SI CtVI,4 O,O,SI CtVI,SI_CtAd SI CtVI,Ox44 O,O,SI_CtVI,S1_CtAd S1 CtVI,3 O,O,SI CtVI,S1_CtAd SI CtVI,OxcO O,O,S1_CtVI,S1 CtAd S1 CtVI,5 O,O,SI CtVI,S1_CtAd SI _CtVI,Ox60 O,O,SI CtVI,S1_CtAd S1_CtVI,9 O,O,S1 CtVI,S1_CtAd SI _CtVI,OxO O,O,S1 CtVI,S1_CtAd S1 CtVI,10 O,O,SI _CtVI,S1_CtAd SI_CtVI,OxO O,O,S1 CtVI,SI_CtAd S! CtVI,11 O,O,S1_CtVI,S1 CtAd S1 CtVI,Ox56 0, 0, SI_CtVI, S1_CtAd SI _CtVI,12 O,O,SI CtVI,SI_CtAd S1 CtVI,Ox6 O,O,SI CtVI,S1_CtAd SI_CtVI,13 O,O,S1 CtVI,SI_CtAd S1 CtVI,OxO O,O,S! 
CtVI,SI CtAd + 0) + 1) ;control port address ;control port value ;reset the port ;x16,1 stop,no parity - - - ;8 bits receive - ;8 bits xmit - ;1nt. disabled - ;NRZ ;Tx & Rx BRG out - - ;9600 baud - - ;9600 baud - 3-125 29K Family Application Notes const store const store const store const store const store const store const store const store EPILOGUE SI CtVl,14 O,O,SI CtVl,SI CtAd SI CtVl,OxO O,O,SI CtVl,SI_CtAd SI CtVl,14 O,O,SI _CtVl,SI CtAd SI CtVl,Oxl O,O,SI CtVl,SI _CtAd SI CtVl,3 O,O,SI CtVl,SI CtAd SI CtVl,Oxcl O,O,SI _CtVl,SI CtAd SI _CtVl,5 O,O,SI CtVl,SI CtAd SI_CtVl,Oxea O,O,SI CtVl,SI CtAd LEAF SerXmt,l - - (see below) Out: (nothing) ;BRG on - - ;Rx enable ;Tx enable - This routine transmits a single character via the SCC. the SCC to become ready. In: ;BRG in RTxC - It will wait (forever) for .reg .reg .reg const consth SX_Char,%%(IN_PRM + 0) SX_Ad,%%(TEMP_REG + 0) SX_Vl,%%(TEMP_REG + 1) SX_Ad,SCCCntlAd SX_Ad,SCCCntlAd ; character ; port address ;port value load and cpeq jmpf nop const consth store EPILOGUE O,O,SX_Vl,SX_Ad SX_Vl,SX_Vl,Ox4 SX_Vl,SX_Vl,O SX_Vl,SX_Wait ; get the status ;check tx buf empty SX_Ad,SCCDataAd SX_Ad,SCCDataAd O,O,SX_Char,SX_Ad ;send the character LEAF SerRcv,O SX Wait:, This routine waits for a receive character to become ready, then reads and'returns that character. In: (nothing) Out: (see below) .reg .reg const consth 3-126 SR_Ad,%%(TEMP_REG + 0) SR_Char,%%(RET_VAL + 0) SR_Ad,SCCCntlAd SR_Ad,SCCCntlAd ; port address ;character (stat tmp) Programming Standalone Am29000 Systems SR_Wait: load and cpeq jmpf nop const consth load and EPILOGUE LEAF 0, 0, SR_Char,SR_Ad SR_Char,SR_Char,Ox1 SR_Char,SR_Char,O SR_Char,SR_Wait ;get the status ;check rcv buf ready SR_Ad,SCCDataAd SR_Ad,SCCDataAd 0, 0, SR_Char,SR_Ad SR_Char,SR_Char,Oxff ;fetch the character SerChk,O This routine checks to determine if a receive character is ready at the serial port. It will return -1 if a character is ready and 0 if it is not. 
In: (nothing) Out: (see below) .reg .reg const consth load and cpeq sra EPILOGUE .sbttl SC_Ad,%%(TEMP_REG + 0) SC_Rdy,%%(RET_VAL + 0) SC_Ad,SCCCntlAd SC_Ad,SCCCntlAd 0, O,SC_Rdy,SC_Ad SC_Rdy,SC_Rdy,Ox1 SC_Rdy,SC_Rdy,0 SC_Rdy,SC_Rdy,31 ; port address ; character ;get the status ;check rcv buf ready ;convert to 0 or -1 .eject "Error Message Routines" FUNCTION SendErr,O,O,l This routine sends the text "Error .reg call const call const call const call const call const call const call const call const EPILOGUE SE_Char,%%(OUT_PRM lrO,SerXmt SE_Char,'E' lrO,SerXmt SE_Char,' r' lrO,SerXmt SE_Char,' r' lrO,SerXmt SE_Char,'o' lrO,SerXmt SE_Char,' r' lrO,SerXmt SE_Char, , , lrO,SerXmt SE_Char,'-' lrO,SerXmt SE_Char, , , FUNCTION SendNL,O,O,l + 0) ; output character ;send a "E" ;send a "r" ;send a "r" ;send a "0" ;send a "r" ;send a ;send a "_" ; send a 3-127 29K Family Application Notes This routine sends a CR-LF sequence. .reg call const call const EPILOGUE SN_Char,%%(OUT_PRM + 0) lrO,SerXmt SE_Char,OxOd lrO,SerXmt SE_Char,OxOa FUNCTION SendWord,1,1,1 isend a "CR" isend a "LF" This routine sends a 32-bit word in ASCII hex .reg .reg .reg .reg const SW_Word,%%(IN_PRM + 0) SW_Shift,%%(LOC_REG + 0) SW_T_Flag,%%(TEMP_REG + 0) SW_Char,%%(OUT_PRM + 0) SW_Shift,28 srI and cplt jmpt add add SW_Char,SW_Word,SW_Shift SW_Char,SW_Char,Oxf SW_T_Flag,SW_Char,10 SW_T_Flag,SW_1 SW_Char,SW_Char,Ox30 SW_Char,SW_Char,Ox27 iconvert to ASCII digit iconvert to ASCII letter call nop subs cpge jmpt nop EPILOGUE lrO,SerXmt isend the character SW Shift,SW_Shift,4 SW_T_Flag,SW_Shift,O SW_T_Flag,SW_O inext digit shift fact icheck if done icontinue if not ithe word to send i shift factor icharacter to send iright shift factor SW_O: iisolate nibble icheck decimal SW 1: ........................................ 
, FUNCTION ' RAMErr,3,0,1' This routine reports RAM errors with the message, "Error - RAM at aaaaaaaa write bbbbbbbb read cccccccc\n" .reg .reg .reg .reg .reg call nop call const call const call const call const call const 3-128 RE_ErrAdd,%%(IN_PRM + 0) + 1) RE_RedPat,%%(IN_PRM + 2) RE_Char,%%(OUT_PRM + 0) RE_Word,%%(OUT_PRM + 0) RE_WrtPat,%%(IN~PRM lrO,SendErr lrO,SerXmt RE_Char,' R' lrO,SerXmt RE_Char,'A' lrO,SerXmt RE_Char,'M' lrO,SerXmt RE_Char, , lrO,SerXmt RE_Char,'A' isend "Error - isend a "R" isend a "A" isend a "M" isend a isend a "A" " Programming Standalone Am29000 Systems call const call const call add call const call const call const call const call const call const call const call add call const call const call const call const call const call const call add call nop EPILOGUE lrO,SerXmt RE_Char,'T' lrO,SerXmt RE_Char,' , lrO,SendWord RE_Word,RE_ErrAdd,O lrO,SerXmt RE_Char,' , lrO,SerXmt RE_Char,' w' lrO,SerXmt RE_Char,' r' lrO,SerXmt RE_Char,' i' lrO,SerXmt RE_Char,'t' lrO,SerXmt RE_Char,'e' lrO,SerXmt RE_Char, , lrO,SendWord RE_Word,RE_WrtPat,O lrO,SerXmt RE_Char, , lrO,SerXmt RE_Char,'R' lrO,SerXmt RE_Char, , e' lrO,SerXmt RE_Char,'a' lrO,SerXmt RE_Char,'d' lrO,SerXmt RE_Char,' , lrO,SendWord RE_Word,RE_RedPat,O lrO,SendNL FUNCTION ROMErr,l,O,l ;send a "T" ;send a ;send error address ;send a ;send a "w" ;send a "r" ;send a "i" ;send a "t" ;send a "e" ;send a ;send good pattern ;send a ;send a "R" ;send a "e" ;send a "a" ;send a "d" ;send a ;send bad pattern ;send a new line This routine reports a ROM sum error with the message, "Error - ROM sum aaaaaaaa\n" .reg .reg .reg call nop call const call const call const call const call const ROM_Sum,%%(IN_PRM + 0) ROM_Char, %% (OUT_PRM + 0) ROM_Word,%%(OUT_PRM + 0) lrO,SendErr lrO,SerXmt ROM_Char, 'R' lrO,SerXmt ROM_Char,'O' lrO,SerXmt ROM_Char,'M' lrO,SerXmt ROM_Char, , lrO,SerXmt ROM_Char, , 5' ;send "Error - " ;send a "R" ;send a "0" ;send a "M" ;send a ;send a "5" 3-129 29K Family Application Notes call 
const call const call const call const call const call add call nop EPILOGUE lrO,SerXmt ROM_Char,'u' IrO,SerXmt ROM_Char,'m' IrO,SerXmt ROM_Char, , IrO,SerXmt ROM_Char, '=' lrO,SerXmt ROM_Char, ' IrO,SendWord ROM_Word,ROM_Sum,° IrO,SendNL FUNCTION SizeErr,O,O,l ;send a "u" ;send a "mil ;send a ;send a "_,, ;send a ;send ROM check sum ;send a new line This routine reports insufficient RAM size with the message "Error - RAM size\n" .reg call nop call const call const call const call const call const call const call const call const call nop EPILOGUE FUNCTION SIZ_Char,%%(OUT_PRM + 0) IrO,SendErr IrO,SerXmt SIZ_Char,'R' lrO,SerXmt SIZ_Char, 'A' IrO,SerXmt SIZ_Char,'M' lrO,SerXmt , SIZ_Char, ' lrO,SerXmt SIZ_Char,'s' lrO,SerXmt SIZ_Char,'i' lrO,SerXmt SIZ_Char, 'z' lrO,SerXmt SIZ_Char,'e' lrO,SendNL ;send "Error - " ;send a \\R" ;send a "A" ;send a "Mil ;send a ;send a "s" ;send a "i" ;send a "z" ;send a "e" ;send a new line TrapErr,O,O,l This routine reports insufficient RAM size with the message "Error - Invalid trap\n" .reg call nop call const call const call const call 3-130 TE_Char,%%(OUT_PRM + 0) lrO,SendErr lrO,SerXmt TE_Char, 'I' lrO,SerXmt TE_Char,'n' lrO,SerXmt TE_Char, 'v' lrO,SerXmt ;send "Error - " ;send a "I" ;send a "nil ;send a "v" Programming Standalone Am29000 Systems const call const call const call const call const call const call const call const call const call nop EPILOGUE TE_Char,'a' lrO,SerXmt TE_Char,'l' lrO,SerXmt TE_Char,' i' lrO,SerXmt TE_Char,'d' lrO,SerXmt TE_Char, , , lrO,SerXmt TE_Char,'t' lrO,SerXmt TE_Char,' r' lrO,SerXmt TE_Char,'a' lrO,SerXmt TE_Char,'p' lrO,SendNL .eject .sbttl "ROM Checksum Test" FUNCTION ROMSum,2,0,1 isend a "a" isend a "1" isend a "iN isend a "d" isend a isend a "t" isend a "r" isend a \\a" isend a "p" ; send a new line This routine is used to ensure that the ROM is "intacted" correctly by using the checksum checking method. 
In: (see below) Out: (see below) .reg .reg .reg .reg .reg xor sub RS_StrtAdd,%%(IN_PRM + 0) RS_WrdCnt,%%(IN_PRM + 1) RS_SumTmp,%%(TEMP_REG + 0) RS_ChkSum,%%(OUT_PRM + 0) RS_Fail,%%(RET_VAL + 0) RS_ChkSum,RS_ChkSum,RS_ChkSum RS_WrdCnt,RS_WrdCnt,2 load add jmpfdec add CD, ROM_CTL, RS_SumTmp, RS_StrtAdd RS_ChkSum,RS_ChkSum,RS_SumTmp RS_WrdCnt,RS_1 RS_StrtAdd,RS_StrtAdd,4 cpneq jmpf nop RS_Fail,RS_ChkSum,O RS_Fail,RS_EXIT call nop const consth lrO,ROMErr iO/P para -- ChkSum RS_Fail,TRUE RS_Fail,TRUE istart address iword count ;TRUE for fail iclear ChkSum ifor jmpfdec iadd to ChkSum inext ROM addr ;if ChkSum == 0 then iRS_PASS else RS_ERR icall ROMErr routine iTRUE for test fail 3-131 29K Family Application Notes RS EXIT: EPILOGUE .eject .sbttl "RAM 01 Test" FUNCTION RAM01,2,0,3 This routine tests the RAM by the following method set all RAM area to 0 then check for o. set all RAM area to 1 then check for 1. In: (see below) Out: (see below) .reg .reg .reg .reg .reg .reg .reg xor ROl_StrtAdd,%%(IN_PRM + 0) ROl_WrdCnt,%%(IN_PRM + 1) ROl_TmpCnt, %% (TEMP_REG + 0) R01_NxtAdd,%%(OUT_PRM + 0) ROl_WrtPat,%%(OUT_PRM + 1) ROl_RedPat,%%(OUT_PRM + 2) ROl_Fail,%%(RET_VAL + 0) add sub R01_NxtAdd,R01_StrtAdd,0 ROl_TmpCnt,ROl_WrdCnt,2 store jmpfdec add CD, DATA_CTL, R01_WrtPat, R01_NxtAdd ROl_TmpCnt,R01_l ROl_NxtAdd,R01_NxtAdd,WRD_SIZ add sub R01_NxtAdd,R01_StrtAdd,0 ROl_TmpCnt,ROl_WrdCnt,2 load cpneq jmpt nop jmpfdec add cpeq jmpt nor jmp nop CD,DATA_CTL,ROl_RedPat,ROl_NxtAdd ROl_Fail,ROl_RedPat,ROl_WrtPat ROl_Fail,ROl_ERR call nop const consth lrO,RAMErr ROl~WrtPat,R01_WrtPat,ROl_WrtPat ROl 0: - ;starting address ;count of words ; counter ;error addres ;pattern written ;pattern read ;TRUE for fail ;0 to start ;set O's or l's ;get strt RAM addr ; for jmpfdec ROl 1 : - ;check for O's or l's ;get strt RAM addr ; for jmpfdec R01 2 : - R01_TmpCnt,ROl_2 ROl_NxtAdd,ROl_NxtAdd,WRD_SIZ ROl_Fail,ROl_WrtPat,O ;if WrtPat = 0 then ;ROl_O else done ROl_WrtPat, ROl_WrtPat, ROl_WrtPat ; invert ptrn 
ROl_EXIT ;pass 0 and 1 test ROl ERR: - ;O/P Parms -- NxtAdd,WrtPat,RedPat ROl_Fail,TRUE R01_Fail,TRUE EPILOGUE .eject .sbttl 3-132 ;err if neq "RAM Checker Pattern Test" ;TRUE for test fail Programming Standalone Am29000 Systems FUNCTION RAMChkr,2,0,3 This routine will run a two-pass checkerboard on RAM. It will be controlled by input values specifying the base address and the count of locations to be tested. In: (see below) Out: (see below) .reg .reg .reg .reg .reg .reg .reg .reg const consth RC_StrtAdd,%%(IN_PRM + 0) RC_WrdCnt,%%(IN_PRM + 1) RC_TmpCnt,%%(TEMP_REG + 0) RC_StrtPat,%%(TEMP_REG + 1) RC_NxtAdd,%%(OUT_PRM + 0) RC_WrtPat,%%(OUT_PRM + 1) RC_RedPat,%%(OUT_PRM + 2) RC_Fail,%%(RET_VAL + 0) RC_StrtPat,CHKPAT_aS RC_StrtPat,CHKPAT_aS add sub add RC_NxtAdd,RC_StrtAdd,O RC_TmpCnt,RC_WrdCnt,2 RC_WrtPat,RC_StrtPat,O store R_LEFT jmpfdec add 0, 0, RC_WrtPat,RC_NxtAdd RC WrtPat RC_TmpCnt,RC_2 RC_NxtAdd,RC_NxtAdd,4 add sub add RC_NxtAdd,RC_StrtAdd,O RC_TmpCnt,RC_WrdCnt,2 RC_WrtPat,RC_StrtPat,O load cpneq jmpt nop R_LEFT jmpfdec add CD,DATA_CTL,RC_RedPat,RC_NxtAdd RC_Fail, RC_RedPat, RC_WrtPat RC_Fail,RC_ERR RC 1: jstarting address jcount of words ;total test word count jstarting pattern jerror address jpattern written jpattern read ;TRUE for fail ;start with as ;fill memory with pattern ;get start address ;for jmpfdec jset the pattern RC 2: ;rotate ptrn left jnext test mem addr ; check memory for pattern ;get start address ; for jmpfdec ;set the pattern RC 3: RC_WrtPat RC_TmpCnt,RC_3 RC_NxtAdd,RC_NxtAdd,4 jerr if neq ;rotate ptrn left nor jmpt nop jmp nop RC_StrtPat,RC_StrtPat,O RC_StrtPat,RC_EXIT ;next test mem addr ; invert ptrn for next pass ; invert initial ;done if msb = 1 RC 1 ;try with inverted call nop const consth lrO,RAMErr RC ERR: RC_Fail,TRUE RC_Fail,TRUE ;set after call RC EXIT: EPILOGUE 3-133 29K Family Application Notes .eject .sbttl "RAM Address Pattern Test" FUNCTION RAMAddr,2,O,3 This routine will run a two-pass test on RAM. 
It will be controlled by input values specifying the base address and the count of locations to be tested. In the first pass, the data will be set equal to the address. In the second pass, the data will be set equal to the complement of the address. In: (see below) Out: (see below) .reg .reg .reg .reg .reg .reg .reg .reg .reg add const RA_StrtAdd,%%(IN_PRM + 0) RA_WrdCnt,%%(IN_PRM + 1) RA_TmpCnt,%%(TEMP_REG + 0) RA_StrtPat,%%(TEMP_REG + 1) RA_PtrnInc,%%(TEMP_REG + 2) RA_NxtAdd,%%(OUT_PRM + 0) RA_WrtPat,%%(OUT_PRM + 1) RA_RedPat,%%(OUT_PRM + 2) RA_Fail,%%(RET_VAL + 0) RA_StrtPat,RA_StrtAdd,O RA_PtrnInc,4 add sub add RA_NxtAdd,RA_StrtAdd,O RA_TmpCnt,RA_WrdCnt,2 RA_WrtPat,RA_StrtPat,O :fill memory with pattern :get start address : for jmpfdec :set the pattern store add jmpfdec add O,O,RA_WrtPat,RA_NxtAdd RA_WrtPat,RA_WrtPat,RA_PtrnInc RA_TmpCnt,RA_2 RA_NxtAdd,RA_NxtAdd,4 :next test mem addr RA 1 : ;starting address ;count of words ;total test word count ;starting pattern ;ptrn increment value :error address :pattern written : pattern read :TRUE for fail ;start with address RA_2: check memory for pattern add RA_NxtAdd,RA_StrtAdd,O sub RA_TmpCnt,RA_WrdCnt,2 add RA_WrtPat,RA_StrtPat,O 3-134 load cpneq jmpt nop add jmpfdec add CD,DATA_CTL,RA_RedPat,RA_NxtAdd RA_Fail,RA_RedPat,RA_WrtPat RA_Fail,RA_ERR nor cpneq jmpt subr jmp nop RA_StrtPat,RA_StrtPat,O RA_Fail,RA_StrtPat,RA_StrtAdd RA_Fail, RA_1 RA_PtrnInc,RA_PtrnInc,O RA_EXIT RA_WrtPat,RA_WrtPat,RA_PtrnInc RA_TmpCnt,RA_3 RA_NxtAdd,RA_NxtAdd,4 :get start address : for jmpfdec :set the pattern :err if neq :next test mem addr ; invert ptrn for next pass :invert initial :negate inc value Programming Standalone Am29000 Systems call nop const consth lrO,RAMErr RA_Fail,TRUE RA_Fail, TRUE iset after call EPILOGUE .eject .sbttl ~Invalid Trap Handler" InvalidTrapHandler: This routine receives control when an invalid trap occurs. It will reinitialize a register frame for use in error reporting. 
It then reports the fact that an invalid trap has occurred. Reporting of specific trap numbers could be achieved, but at considerable cost in size. The use of an instrument such as the ADAPT29KTM is recommended for invalid trap identification. If that is not practical, this handler (or some other) could be extended to report numbers. It would require 2K bytes of additional code (jmp/const for each of 256 vectors). mtsrim const const sub call add call nop halt nop cps,Ox173 rfb,5l2 rab,O rsp,rfb,8 IrO,SerInit Irl,rfb,O I rO, T rapErr .eject .sbttl ~Vector LEAF VectInit,O iRE,PD,PI,SM,DI,DA iset up temp reg frame iroom for linkage iready to report errors ismall frame required ishow trap error Initialization" This routine initializes the vector table and vab. All vectors are set to point to the invalid trap handler in ROM. .reg .reg .reg mtsrim mfsr const consth const VI_Vect,%%(TEMP_REG + 0) VI_VectSt,%%(TEMP_REG + 1) VI_VectCnt,%%(TEMP_REG + 2) vab,O VI_VectSt,vab VI_Vect, (InvalidTrapHandler I 2) VI_Vect,InvalidTrapHandler VI_VectCnt, (256 - 2) ivector value ivector storage address ivector count register store jmpfdec add EPILOGUE O,O,VI_VectSt,VI_Vect VI_VectCnt,VI_Loop VI_VectSt,VI_VectSt,4 istore the vector ifor jmpfdec VI_Loop: 3·135 29K Family Application Notes .eject .sbttl "Boot" Boot: This routine receives control upon a hardware reset. Its purpose is to establish the execution environment for the main program. This involves transcriptions of data and possibly code. The transcriptions may take the form of executing code since the ROM may not be readable. 
.reg mtsrim const const sub add call nop const consth call const call nop call mtsrim mtsrim const consth mtsr add mtsr iretinv ; end of boot.s 3-136 RI_Ret, %% (TEMP_REG + 0) cps,Oxl73 rfb,5l2 rab,O rsp,rfb,l6 lrl,rfb,O lrO,SerInit pl, (RAM_SIZE pl, (RAM_SIZE lrO,RAMAddr pO,O lrO,VectInit » » 2) 2) RI_Ret,RAMInit ops,Ox473 cps,Ox473 lrO,TextBas lrO,TextBas pcl,lrO lrO,lrO,4 pcO,lrO ;go to inst space,TextBas ; RAMIni t return ;RE,PD,PI,SM,DI,DA ;set up temp reg frame ;enough for pO and pl ;ready to report errors ;test full RAM size ;just use one test ;test from address zero ;invalid traps ; initialize RAM ;FZ,PD,PI,SM,DI,DA ;FZ,PD,PI,SM,DI,DA ; (using lrO as temp) Programming Standalone Am29000 Systems APPENDIX 8: start.s .title "Start and Other Assembly-language Routines" Copyright 1988, Advanced Micro Devices,Inc. Written by Gibbons and Associates, Inc. HISTORY: 1.3 29 July 88 E M Greenawalt SPR 0001 Fixed shift count on line 1034 This module provides initializations and trap handling for a program written in C and operating in a stand alone environment. It is designed for compatibility with the ADAPT29K and various Am29000 monitors. In this module, the first 16 system registers (gr64-gr79) are available for use as system statics. They are not used in any of the routines in this file. Their values are not saved and restored in the C interrupt handler interrupts, so they are truly static. The second 16 system registers (gr80,-gr95) are used as temporary registers by trap handlers, etc., in this module. No such trap handler is itself interruptable. No presumption is made about the preservation of values in these registers by any program. .extern .global .global NOTE: main V_SPILL V FILL ;the C main routine ;the spill/fill vectors The equates below define the padding in the vector section (to a full page), and constants related to the page size. The register and memory stack size are also declared. 
When operating with a monitor, the VECT PAD may need to be increased. .equ .equ .equ .equ .equ .equ .equ .equ . include NOTE: PS,3 RPN_SHIFT, (10 + PS) PAGE_SIZE, (1 « RPN_SHIFT) MMU_PS, (PS « 8) RPN_MASK, (- (PAGE_SIZE - 1)) VECT_PAD, (PAGE_SIZE - Ox400) RSTK_SIZE,PAGE_SIZE MSTK_SIZE,PAGE_SIZE "romdcl.h" ;page size designation The equates below define traps for divide by zero and divide overflow. They are not standard. They are not handled here. .equ .equ .eject .sbttl V_DIVO,80 V_DIVOV,81 ;divide by zero ;divide overflow "Section Declarations" 3-137 29K Family Application Notes Sections will be ordered in memory as shown below. vectors rstack mstack .data .bss .text endsect (at 0) (register stack) (memory stack) (dummy for establishing bounds) Vectors will be initialized by start-up code with pointers to an invalid trap handler in ROM. The initialization code will explicitly intercept those vectors that will be handled. .sect .sect .sect .sect vectors,bss rstack,bss mstack,bss endsect,bss The declarations that follow suggest the order of the segments, provide base names for each, and allocate sizes for the vectors and stacks. Jump instructions are also provided at the base of the .text section for ease in linkage to the Start routine and the special routine which provides for ADAPT29K initializations. .use .block .block .use vectors (4 * 256) VECT_PAD rstack RStkBase: .block RStkTop: .use mstack MStkBase: .block MStkTop: .data DataBase: ;base of init data .bss BSSBase: ;base of BSS data .text TextBase: jmp nop jmp nop .use 3-138 ;base of .text Start Adaptlnit endsect ;allows easy linkage to Stait ;for bootstrap code ;makes Adaptlnit easier to find Programming Standalone Am29000 Systems ;marks end of .text ;dummy to assure existence ;switch back to text EndBase: .block .text .eject .sbttl .global .global .global .global "Timer read/write functions" _GetTmCnt _SetTmCnt GetTmRld SetTmRld LEAF _GetTmCnt,O This routine returns the timer/counter register value. 
i.e., no mask is applied. In: (nothing) Out: (see below) .reg mfsr EPILOGUE GTC_Val,%%(RET_VAL + 0) GTC_Val,tmc LEAF _SetTmCnt,l This routine sets the timer/counter register value. i.e., no mask is applied. In: ·(see below) Out: (nothing) .reg mtsr EPILOGUE STC_Val,%%(IN_PRM + 0) tmc,STC_Val LEAF _GetTmRld,O All the fields are returned; ;timer reg value All the fields are set; ;timer reg value This routine gets the current contents of the timer reload register. are applied. In: (nothing) Out: (see below) .reg mfsr EPILOGUE GTR_Val,%%(RET_VAL + 0) GTR_Val,tmr LEAF _SetTmRld,l This routine sets the timer/counter reload value. i.e., no mask is applied. In: No masks ;timer reload value All the fields are set; (see below) 3-139 29K Family Application Notes ; Out: (nothing) .reg mtsr STR_Val,%%(IN_PRM + 0) tmr,STR_Val ;timer reload value EPILOGUE .eject .sbttl "32-bit Time Extensions" The routines below extend the timer counter to 32 bits via a trap handler. The 32-bit value may be initialized and read by C-callable routines declared as globals. The trap handler is also included. Note that the caller of the C routines must be running in supervisor mode . . global .global .bss _ClrTm32 _GetTm32 ;switch to declare bss .block .text 4 LEAF _ClrTm32,0 TimeUpper: ;reserve a word for extension ;switch back This routine clears the 32-bit extended counter by setting the tmc, tmr and software extension value. The timer interrupt is also enabled in tmr. In: (nothing) Out: (nothing) (timer initialized to zero) Temp: .reg .reg const consth mtsr consth mtsr const consth const store EPILOGUE CTVal,%%(TEMP_REG + 0) CTUpPt,%%(TEMP_REG + 1) CTVal,Oxffffff CTVal,Oxffffff tmc,CTVal CTVal,Ox1ffffff tmr,CTVal CTUpPt,TimeUpper CTUpPt,TimeUpper CTVal,O O,O,CTVal,CTUpPt (see below) ;timer reg value ;upper pointer ;for tc and TimeUpper ;should keep it busy ;set ie ;no extension LEAF _GetTm32,O This routine returns a 32-bit clock counter. 
; The clock counter is implemented by extending the hardware counter in
; software and negating the value before it is returned.  The negation
; causes the returned value to be an up counter of the time since the
; counter was last reset.  The low-level timer access routines may be
; used in initializations to assure a desired starting value.
;
; The software extension to 32 bits introduces a coordination problem
; in reading the counter's value.  This is resolved by reading the
; upper 8 bits both before and after the TC value.  If the TC value is
; greater than 2**23, the second upper value read is presumed to be
; correct.  Lengthy interruptions of this routine (> 2**21 clocks)
; could cause errors.
;
; In:   (nothing)
; Out:  (see below)
; Temp: (see below)

        .reg    TUpPt,%%(TEMP_REG + 0)          ;upper time pointer
        .reg    TUpr1,%%(TEMP_REG + 1)          ;upper time bits - 1st read
        .reg    TUpr2,%%(TEMP_REG + 2)          ;upper time bits - 2nd read
        .reg    TLwr,%%(TEMP_REG + 3)           ;lower time bits - from cntr
        .reg    TChk,%%(TEMP_REG + 4)           ;temp to check high bit
        .reg    T32,%%(RET_VAL + 0)             ;32-bit time value
        const   TUpPt,TimeUpper
        consth  TUpPt,TimeUpper
        load    0,0,TUpr1,TUpPt                 ;get upper 8 bits of timer
        add     TUpr1,TUpr1,0                   ;hold till load complete
        mfsr    TLwr,tmc
        load    0,0,TUpr2,TUpPt                 ;get upper 8 bits again
        sll     TChk,TLwr,8                     ;is upper TC bit set?
        jmpf    TChk,GT_Exit                    ;if not, use 1st read
        or      T32,TLwr,TUpr1                  ;poss ovfl before 2nd read
        or      T32,TLwr,TUpr2                  ;poss ovfl after 1st read
GT_Exit:
        subr    T32,T32,0                       ;negate to count up from zero
        EPILOGUE

TimerHandler:
;
; This routine handles the timer trap.  The timer trap will occur at
; intervals in the range of a second (depending on the actual clock
; speed).  The extension to 32 bits makes the timer somewhat more
; useful for common benchmarks.  A different scheme would be required
; for longer intervals.
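The read-coordination rule that _GetTm32 applies, together with the upper-byte decrement that the timer trap handler performs, can be modeled in portable C. The sketch below is illustrative only: the type `timer_reads` and the functions `combine_time32` and `decrement_upper` are hypothetical names that do not appear in start.s, and the three reads are passed in as plain values rather than taken from the tmc register and the TimeUpper word.

```c
#include <stdint.h>

/* Hypothetical snapshot of the three reads made while sampling the timer. */
typedef struct {
    uint32_t upper_first;   /* software extension (bits 31..24), read before the counter */
    uint32_t counter;       /* 24-bit hardware down-counter value */
    uint32_t upper_second;  /* software extension, read again after the counter */
} timer_reads;

/* Combine the reads into a 32-bit up-counting time value.
 * If bit 23 of the counter is set, the counter is far from underflow, so any
 * underflow happened before the counter was read: use the second upper read.
 * Otherwise an underflow may have followed the counter read: use the first. */
uint32_t combine_time32(timer_reads r)
{
    uint32_t low   = r.counter & 0x00FFFFFFu;
    uint32_t upper = (low & 0x00800000u) ? r.upper_second : r.upper_first;
    uint32_t down  = (upper & 0xFF000000u) | low;
    return 0u - down;       /* negate so the result counts up */
}

/* Model of the trap handler's update: decrement the 8-bit extension
 * that is kept in bits 31..24 of the extension word. */
uint32_t decrement_upper(uint32_t upper)
{
    return ((upper >> 24) - 1u) << 24;
}
```

Selecting one upper read, rather than merging the two, matters because the two reads differ whenever the trap handler runs between them.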
        .reg    THTr,%%(SYS_TEMP + 0)           ;temp for tmr (shared)
        .reg    THUpPt,%%(SYS_TEMP + 0)         ;pointer to upper 8 bits
        .reg    THUpVl,%%(SYS_TEMP + 1)         ;upper 8-bit value
        mfsr    THTr,tmr
        sll     THTr,THTr,7                     ;clear out upper tmr bits
        srl     THTr,THTr,7                     ;leaving ie alone
        mtsr    tmr,THTr
        const   THUpPt,TimeUpper
        consth  THUpPt,TimeUpper
        load    0,0,THUpVl,THUpPt
        srl     THUpVl,THUpVl,24                ;decrement the upper bits
        sub     THUpVl,THUpVl,1
        sll     THUpVl,THUpVl,24
        store   0,0,THUpVl,THUpPt
        iret                                    ;done

        .eject
        .sbttl  "C Interrupt Handler Interface"
        .global CIntf

CIntf:
;
; This routine is used to call a C routine that will handle an
; interrupt.  In order to accomplish this, the context of the current
; program must be saved prior to the call and restored after the call.
; It is relatively expensive.  In many instances, it may be best to
; write the interrupt handlers in assembly language.  Note that
; assembly-language handlers will have the system statics available to
; retain state information.  Note also that system statics are not
; saved and restored here.  They are "static."
;
; This routine receives as inputs the address of the C routine and the
; vector number.  It passes the vector number to the C routine as its
; only parameter.  An initial stack of 16 registers (including inputs)
; is provided to the C routine.
+ 0) + 1) In: (SYS_TEMP (SYS_TEMP Out: (nothing) Temp: (SYS_TEMP 2-13) (see below) 3-142 C routine address vector number used to hold specials .reg .reg .reg .reg mfsr mfsr mfsr mfsr mfsr mfsr mfsr mfsr mfsr mfsr mfsr add mtsrim sub const consth asge store mtsr storem add const sub add sub mtsr storem add add calli mtsrim CI_Rout,%%(SYS_TEMP + 0) CI_Vect,%%(SYS_TEMP + 1) CI_Stk,%%(SYS_TEMP + 14) CI_Frm,%%(SYS_TEMP + 14) st2,ops st3,cha st4,chd st5,chc st6,pcO st7,pc1 'sta, ipc st9,ipa st10,ipb stU, q st12,alu st13,rsp,0 cps,Ox73 msp,msp, «64 - 16) * 4) CI_Stk,MStkBase CI_Stk,MStkBase V_DataTLBProt,msp,CI_Stk O,O,graO,msp im O,O,graO,msp rfb,rsp,O CI_Frm,512 rab,rfb,CI_Frm rsp, rab, (13 * 4) msp,msp, (16 * 4) im O,O,rab,msp lr1,rfb,0 pO,CI_Vect,O lrO,CI_Rout cps,Ox13 mtsrim sub mtsrim loadm add mtsrim loadm add mtsr cps,Ox73 rab, rsp, (13 * CR, (16 - 1) O,O,rab,msp msp,msp, (16 * CR, «64 - 16) 0,0,gr64,msp msp,msp, «64 ops,st2 ithe C routine ithe vector istack check value iframe size (shared) isave specials temps ;PD,PI,SM,DI,DA iallocate space for globals icheck for overflow isimulate Prot (no return on fail) iflush for CPU bug CR, « 64 - 16) - 1) isave the globals imove down the frame ibeneath rsp iset rsp in 16 reg frame isave the frame CR, (16 - 1) 4) irequire remaining locals ivector is output parm 0 ;call the handler iwith prot and no ints (no good i for more complex TLB schemes) iready to reload ireload locals in frame 4) - 1) ireload globals 16) * 4) irestore specials Programming Standalone Am29000 Systems mtsr mtsr mtsr mtsr mtsr mtsr mtsr mtsr mtsr mtsr add iret cha,st3 chd,st4 chc,st5 pcO,st6 pc1,st7 ipc,stS ipa,st9 ipb,st10 q,st11 alu,st12 rsp,st13,0 ; return from int .eject .sbttl "Multiply and Divide Handlers" MultiplyHandler: This trap handler performs the (signed) operation: DEST//Q <- SRCA * SRCB. IPC, IPA, and IPB are set by the MULTIPLY instruction prior to the invocation of this trap handler. 
In: IPC IPA IPB DEST SRCA SRCB Out: DEST//Q IPB Temp: (see below) .reg mtsr mfsr mtsr mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul mul IPC MH_ IP,%%(SYS_TEMP + 0) q,grO MH_IP,ipc ipb,MH_IP grO,grO,O grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO . (unimportant side effect) ; temp for move operation ;SRCB (multiplier) to Q ;use a system temp to set ; ipb = ipc ;step I. (no initial prod) ;step 2. ;step 3. ;step 4. ; step 5. ;step 6. ; step 7. ; step S. ;step 9. ;step 10. ;step 1I. ;step 12. ; step 13. ; step 14. ;step 15. ; step 16. ; step 17. ;step lS. ; step 19. ; step 20. ;step 21; step 22. ; step 23. ; step 24. ; step 25. 3-143 29K Family Application Notes mul mul mul mul mul mul mull iret grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO ;done ; step ;step ;step ; step ;step ;step ;step 26. 27. 28. 29. 30. 31. 32. This trap handler performs the (unsigned) operation DEST//Q <- SRCA * SRCB. IPC,IPA,and IPB are set by the MULTIPLU instruction prior to the invocation of this trap handler. 
In: IPC IPA IPB DEST SRCA SRCB Out: DEST//Q IPB = IPC (unimportant side effect) Temp: (see below) .reg mtsr mfsr mtsr mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mulu mu1u mulu mulu mulu mulu mulu mu1u mulu mulu mulu mulu iret 3-144 MU IP,%%(SYS_TEMP + 0) q,grO MU_IP,ipc ipb,MU_IP grO,grO,O grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO grO,grO,grO ; done ;temp for move operation ;SRCB (multiplier) to Q ;use a system temp to set ; ipb = ipc ;step I. (no initial prod) ;step 2. ; step 3. ;step 4. ;step 5. ;step 6. ;step 7. ; step 8. ;step 9. ;step 10. ;step II. ; step 12. ; step 13. ;step 14. ;step 15. ; step 16. ;step 17. ;step 18. ;step 19. ;step 20. ;step 2I. ; step 22. ;step 23. ; step 24. ;step 25. ;step 26. ; step 27. ;step 28. ;step 29. ;step 30. ;step 3I. ;step 32. Programming Standalone Am29000 Systems DivideHandler: This trap handler performs the (signed) operation: DEST <- (SRCA//Q) / SRCB IPC,IPA,and IPB are set by the DIVIDE instruction prior to the invocation of this trap handler. 
;In: IPC IPA IPB Q Out: DEST Temp: (see below) .reg .reg .reg .reg .reg .reg .reg .reg add mfsr sub add asneq DividendCheck: jmpf const cpeq subr subre DivisorCheck: jmpf nop cpeq subr DEST SRCA SRCB D_Rmdr,%%(SYS_TEMP + 0) D_Dvsr,%%(SYS_TEMP + 1) D_Sign,%%(SYS_TEMP + 2) D_DvdHi,%%(SYS _TEMP + 3) D_DvdLo,%%(SYS_TEMP + 4) D_Quot,%%(SYS TEMP + 5) D_Ovfl, %% (SYS _TEMP + 6) D_MnNg,%%(SYS_TEI1P + 7) D_DvdHi,grO,O D_DvdLo,q D_Dvsr,D_ Dvsr,O D_Dvsr,D_Dvsr,grO V_DIVO,D_ Dvsr,O D_DvdHi,DivisorCheck D_Sign, FALSE D_Sign, D_Sign, D_DvdLo,D_DvdLo,O D_DvdHi,D _DvdHi, ° ° ;shift area and remainder ;divisor ;0 for positive ;dividend high ';dividend low ;most negative integer ;SRCA is dividend high ;Q is dividend low ;divisor is in SRCB ;any easier access? ;check for divisor zero ;toggle flag ;negate dividend D_Dvsr, DivideOp ° D_Sign,D_Sign, D_Dvsr,D_ Dvsr,O ;toggle flag ;negate divisor q,D_DvdLo D_Rmdr, D_DvdHi D_Rmdr,D_Rmdr,D Dvsr D_Rmdr,D_Rmdr,D - Dvsr D_Rmdr,D_Rmdr,D- Dvsr D_Rmdr,D_Rmdr,D Dvsr D_Rmdr,D_Rmdr,D Dvsr D_Rmdr,D_Rmdr,D Dvsr D_Rmdr,D_Rmdr,D Dvsr D_Rmdr,D_Rmdr,D- Dvsr D_Rmdr,D_Rmdr,D Dvsr D_Rmdr,D_Rmdr,D _Dvsr D_Rmdr,D_Rmdr,D- Dvsr D_Rmdr,D_Rmdr,D- Dvsr ;dividend low to q ;D_Rmdr becomes shift high ; step I. ; step 2. ; step 3. ;step 4. ;step 5. ; step 6. ; step 7. ; step 8. ; step 9. ;step 10. ; step II. ; step 12. DivideOp: mtsr divO div div div div div div div div div div div div 3·145 29K Family Application Notes D Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr, D_Rmdr, D_Dvs"r D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Rmdr,D_Rmdr,D_Dvsr D_Quot,q D_Ovfl,D_Quot,O D_Sign,DivideCorrect ; step 13. ;step 14. ; step 15. ;step 16. ; step 17. ; step 18. ; step 19. ; step 20. 
; step 21; step 22. ; step 23. ; step 24. ;step 25. ;step 26. ; step 27. ; step 28. ; step 29. ; step 30. ; step 31;don't need remainder ;get quotient out of q ; check overflow D_MnNg,D_MnNg,D~MnNg D_Ovfl,D_MnNg,D_Quot D_Ovfl,D_Ovfl,D_Sign ;set most neg ;check for most neg ;allow if to be neg DivideCorrect: jmpf aseq subr subr D_Sign,DivideExit V_DIVOV,D_Ovfl,O D_Quot,D_Quot,0 D_Rmdr,D_Rmdr,O ;done if positive ;trap on overflow ; negate quotient ;don't need remainder DivideExit: add iret grO,D_Quot,O ;done ;set DEST div div div div div div div div div div div div div div div div div div div divrem mfsr cplt jmpf cpeq cpeq cpneq DividuHandler: This trap handler performs the (unsigned) operation: DEST <- (SRCA//Q) / SRCB IPC,IPA,and IPB are set by the DIVIDU instruction prior to the invocation of this trap handler. In: IPC IPA IPB Q Out: DEST Temp: (see below) .reg add divO div div div 3-146 DEST SRCA SRCB DU_Rmdr, %% (SYS_TEMP DU_Rmdr,grO,O DU_Rmdr,DU_Rmdr DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO + 0) ;shift area and remainder ;SRCA to DU_Rmdr ;DU_Rmdr becomes shift high ; step 1;step 2. ;step 3. Programming Standalone Am29000 Systems div div div div div div div div div div div div div div div div div div div div div div div div div div div div divrem mfsr iret DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr, DU_Rmdr, grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr,DU_Rmdr,grO DU_Rmdr,DU_Rmdr,grO grO,q idone .eject .sbttl "Spill and Fill Handlers" istep 4. i step 5. 
istep 6. i step 7. i step B. i step 9. i step 10. istep 1I. i step 12. istep 13. i step 14. i step 15. i st.ep 16. i step 17. istep lB. istep 19. i step 20. istep 2I. istep 22. istep 23. istep 24. istep 25. istep 26. istep 27. istep 2B. istep 29. i step 30. i step 3I. idon't need remainder iquotient to (ipc) The routines below handle the allocation and free assertions in subroutine prologues and epilogues. The temps they use are given below. .reg . reg .reg .reg R_Cnt,%%(SYS_TEMP + 0) R_Bnd, %% (SYS_TEMP + 0) R_TmpPCO,%%(SYS_TEMP + 1) R_TmpPC1,%%(SYS_TEMP + 2) itemp itemp itemp itemp for for for for count (shared) boundary PCO PC1 SpillHandler: This routine handles a false assertion in the standard prologue In: rab > rsp (requiring an allocation) lr1 <= rfb rfb rab + 512 Out: rab mfsr mfsr mtsrim sub sub rsp (just enough allocated) R_TmpPCO,pcO R_TmpPC1,pc1 cps,Ox73 R_Cnt,rab,rsp rfb,rfb,R_Cnt Ir1 <= rfb rfb rab + 512 isave the PCs iPD,PI,SM,DI,DA iR_Cnt = # of bytes to spill imove down the frame bound 3-147 29K Family Application Notes store srI sub mtsr storem add const consth asge O,O,lrO,rfb R_Cnt,R_Cnt,2 R_Cnt,R_Cnt,l cr,R_Cnt O,O,lrO,rfb rab,rsp,O R_Bnd,RStkBase R_Bnd,RStkBase V DataTLBProt,rab,R_Bnd mtsrim mtsr mtsr iret cps,Ox473 pcO,R_TmpPCO pc1,R_TmpPC1 iflush for storem bug iR_Cnt ~ count of words to spill icorrect for ~torem iset up count for storem ispill from the allocated area imove down the allocate bound icheck for possible overflow isimulate TLB prot iNOTE: no return on fail iFZ,PD,PI,SM,DI,DA irestore the PCs FillHandler: This routine handles a false assertion in the standard epilogue. 
In: Ir1 > rfb (requiring deallocation) rsp >= rab rfb == rab + 512 Out: Ir1 (just enough freed) rsp >= rab rfb = rab + 512 isave the PCs rfb mfsr mfsr mtsrim const consth asle R_TmpPCO,pcO R_TmpPC1,pc1 cps,Ox73 R_Bnd,RStkTop R_Bnd,RStkTop V DataTLBProt,rfb,R_Bnd const or mtsr sub add srI sub mtsr loadm add mtsrim mtsr mtsr iret R_Cnt,512 R_Cnt,R_Cnt,rfb ipa,R_Cnt R_Cnt,lr1,rfb rab,rab,R_Cnt R_Cnt,R_Cnt,2 R_Cnt,R_Cnt,l cr,R_Cnt O,O,grO,rfb rfb,lr1,O cps,Ox473 pcO,R_TmpPCO pc1,R_TmpPC1 .eject .sbttl iPD,PI,SM,DI,DA icheck for possible underflow ;simulate TLB prot iNOTE: no return on fail imake local reg ip from rfb iset up indirect ptr for loadm ;R_Cnt = # of bytes to fill imove up the allocate bound iR_Cnt = number of words to fill icorrect for loadm iset up count for loadm ifill area freed imove up frame bound iFZ,PD,PI,SM,DI,DA irestore the PCs "TLB Miss Handler" The routines below provide one-for-one TLBs, i.e., the virtual address is set equal to the physical address. A central routine is used to do the actual TLB update. Some enhancement would be appropriate to allow I/O access as data,i i.e., memory-mapped I/O. Speed improvements could be realized (four instructions) by the allocation and initialization of system registers for the bounds. The temp registers used are indicated'below. 3-148 Programming Standalone Am29000 Systems .reg .reg .reg .reg .reg .reg TH_Ad,%%(SYS_TEMP + 0) TH_Ac,%%(SYS_TEMP + 1) TH_Bnd,%%(SYS_TEMP + 2) TH_Reg,%%(SYS_TEMP + 3) TH_WdO,%%(SYS_TEMP + 4) TH_Wd1,%%(SYS_TEMP + 5) ;the miss address ;the required privileges ;access bound ;TLB register number ;TLB word 0 value ;TLB word 1 value This routine handles supervisor instruction TLB misses. An attempted access out of range is treated as an instruction TLB protection violation. 
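The range check and the one-for-one mapping described above can be sketched in C. This is a simplified model, not the handler itself: the names `region`, `inst_access_ok`, and `identity_page_base` are hypothetical, the two bounds stand in for the TextBase and EndBase labels, addresses are compared as unsigned values, and the 8K-byte page size is taken from the PS setting used in this file.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x2000u   /* 8K bytes: 1 << (10 + PS) with PS = 3 */

typedef struct {
    uint32_t base;  /* first address of the region (stand-in for TextBase) */
    uint32_t end;   /* first address past the region (stand-in for EndBase) */
} region;

/* An instruction miss is serviced only when base <= addr < end; otherwise the
 * access is treated as a protection violation and no TLB entry is loaded. */
bool inst_access_ok(uint32_t addr, region text)
{
    return addr >= text.base && addr < text.end;
}

/* One-for-one mapping: the physical page base equals the virtual page base. */
uint32_t identity_page_base(uint32_t addr)
{
    return addr & ~(PAGE_SIZE - 1u);
}
```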
mfsr const consth asge TH_Ad,pc1 TH_Bnd,TextBase TH_Bnd,TextBase V_InstTLBProt,TH_Ad,TH Bnd const consth aslt TH_Bnd,EndBase TH_Bnd,EndBase V_InstTLBProt,TH_Ad,TH Bnd jmp const TLBHandler TH_Ac,Ox4BOO ;NOTE: no return on fail ;NOTE: no return on fail ;VE,SE SupDataTLBHandler: This routine handles the supervisor data TLB misses. It should be enhanced to allow I/O access as well as data access. mfsr TH_Ad,cha const TH_Ac,Ox7000 ;VE,SR,SW const TH_Bnd,MStkBase consth TH_Bnd,MStkBase asge V DataTLBProt,TH_Ad,TH_Bnd iNOTE: no return on fail const TH_Bnd,TextBase consth TH_Bnd,TextBase aslt V_InstTLBProt,TH_Ad,TH Bnd ;NOTE: no return on fail (drop through to TLB handler) TLBHandler: This routine handles TLB updates once it has been determined that the update is appropriate.NOTE: This routine presumes an BK-byte page size. In: TH Ad TH_Ac lru the address where access is ,required the access that is required the recommended TLB for replacement Out: (lru) constn sl1 and and or mfsr provides access to TH Ad TH_Wd1 , RPN_MASK TH_WdO,TH_Wd1,5 TH_Wd1,TH_Wd1,TH_Ad TH_WdO,TH_WdO,TH_Ad TH_WdO,TH_WdO,TH_Ac TH_Reg,lru ishift for vtag iestablish addr fields iestablish access iset the TLB entry 3·149 29K Family Application Notes mttlb add mttlb iret .eject .sbttl LEAF TH_Reg,TH_WdO TH_Reg,TH_Reg,l TH_Reg,TH_Wd1 "TLB Initialization" TLBInit,O This routine is uoed to initialize the TLBs. It clears all the TLB registers, thus marking all entries invalid. In: (nothing) Out: (nothing) Temps: (see below) .reg .reg .reg const const const TI_Reg,%%(TEMP_REG + 0) TI_Val,%%(TEMP_REG + 1) TI_Cnt,%%(TEMP_REG + 2) TI_Reg,O TI_Val,O TI_Cnt, (TLB_CNT - 2) mttlb jmpfdec add EPILOGUE TI_Reg,TI_Val TI_Cnt,TI_Loop TI_Reg,TI_Reg,l .eject .sbttl ;the TLB register number ;the TLB value (0) ;the TLB register count ;for jmpfdec "Vector Initialization" In order that the vector initialization code might be compact and that the set of vectors initialized might be easily expanded, a table in .data is used. 
Each entry in the table has two words. The first word is the number of the vector to be initialized. The second word is the address of the handler. .data ;switch to .data for table VectInitTable: .word .word .word .word .word .word .word .word .word .equ .text 3·150 V_SupInstTLB,SupInstTLBHandler V_SupDataTLB,SupDataTLBHandler V_MULTIPLY, MultiplyHandler V_DIVIDE, DivideHandler V_MULTIPLU,MultipluHandler V_DIVIDU,DividuHandler V_SPILL,SpillHandler V_FILL, FillHandler V_Timer, TimerHandler VINIT_CNT, ((. - VectInitTable) / 8) ;switch back to .text for code Programming Standalone Am29000 Systems VectInit: This routine initialzes the vectors for which handlers exist. vector area base In: vab Out: (vectors initialized) (see below) Temp: .reg .reg .reg .reg .reg mfsr const const consth VI_Vect,%%(TEMP_REG + 0) VI_St,%%(TEMP_REG + 1) VI_Cnt,%%(TEMP_REG + 2) VI_Base,%%(TEMP_REG + 3) VI_TbPt,%%(TEMP_REG + 4) VI_Base,vab VI_Cnt, (VINIT_CNT - 2) VI_TbPt,VectInitTable VI_TbPt,VectInitTable ;vector ;vector ;vector ;vector ;vector load add sll add load add jmpfdec store jmpi nop O,O,VI_St,VI_TbPt VI_TbPt,VI_TbPt,4 VI_St,VI_St,2 VI_St,VI_St,VI_Base O,O,VI_Vect,VI_TbPt VI_TbPt,VI_TbPt,4 VI_Cnt,VI_Loop O,O,VI_Vect,VI_St lrO ;get the vector .eject .sbttl value storage address count base base ;for jmpfdec ;convert to address (fixed v1.3) ;get the handler "ADAPT29K Initializations" AdaptInit: This routine is for use in situations where the bootstrap process has not occurred. Instead, the ADAPT29K has been used to load the program. Initializations of the vectors, etc., will be required. As an aid to fault identification, the vector table is initialized with pointers to the words immediately following the vectors. These words are initialized with HALT instructions. When one of these halts executes, the ADAPT29K will report the event and the address of the halt. This will allow the invalid trap that has occurred to be identified. CAUTION! 
This requires that the vector pad be at least 1024. .reg .reg .reg .reg mtsrim mtsrim mfsr const const AI_Vect,%%(TEMP_REG + 0) AI_St,%%(TEMP_REG + 1) AI_Cnt,%%(TEMP_REG + 2) AI_Halt,%%(TEMP_REG + 3) cps,Ox73 vab,O AI_St,vab AI_Vect,1024 AI_Halt,Ox89000000 ;vector value ;vector storage address ;vector count register ;halt instruction register ;PD,PI,SM,DI,DA ;just beyond vectors 3-151 29K Family Application Notes consth const AI_Halt,Ox89000000 AI_Cnt, (256 - 2) store add store jmpfdec add jmp nop O,O,AI_St,AI_Vect AI_St,AI_St,4 O,O,AI_Vect,AI_Halt AI_Cnt,AI_Loop AI_Vect,AI_Vect,4 Start .eject .sbttl ; for jmpfdec :store the vector ;store the HALT "Start" Start: This routine receives control after any required bootstrap processes. It will initialize the vectors which are actually handled, clear the BSS area, initialize the TLBs, and establish initial stack pointers and an initial register frame. It will then invoke _main. In the event that _main returns, this routine will perform a warm start. In: vab Out: (nothing) mtsrim mtsrim mtsrim const consth const consth add sub const consth call nop call nop call nop mtsrim const const call nop mtsrim mtsrim mtsrim mtsrim mtsrim mtsrim iretinv ; end of start.s 3·152 indicates vector area cps,Ox73 mmu,MMU_PS cfg,Ox10 rfb,RStkTop rfb,RStkTop rab, (RStkTop - 512) rab, (RStkTop - 512) lr1,rfb,O rsp,rfb,16 msp,MStkTop msp,MStkTop lrO,Vectlnit ;install handled vectors lrO,TLBInit ;establish TLBs invalid lrO, ClrTm32 (leave to _main ???) cps,Ox10 lr2,O lr3,O lrO, main ;clear and enable timer cps,Ox473 ops,Ox173 cfg,l chc,O pc1,O pcO,4 :FZ,PD,PI,SM,DI,DA ;RE,PD,PI,SM,DI,DA ;cache disabled ;contents invalid ;cold start address - ;PD,PI,SM,DI,DA ;order It ~ 0 ;VF ;set up stack pointers ;lrO,lr1,argc,argv iSM ;argc ;argv = 0 0 - Programming Standalone Am29000 Systems APPENDIX C: test.s .title "Test of Assembly-language Utilities" Copyright 1988, Advanced Micro Devices, Inc. Written by Gibbons and Associates, Inc . "romdcl.h" . 
include .extern _GetTm32 .data OxDEADBEEF .word .bss 1024 .block .text .eject "Multiply/Divide Test" .sbttl LEAF ;just to test ; verify zeros _MultDiv,O This routine gives a test of the multiply and divide trap handlers by the simple expedient of performing one of each. Using the debugger, it can be forced to loop, etc. In: (nothing) Out: (nothing) (see below) Temp: .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg const const consth const consth MD_Mpd,%%(TEMP_REG + 0) MD_Mpr,%%(TEMP_REG + 1) MD_PrLo,%%(TEMP_REG + 2) MD_PrHi,%%(TEMP_REG + 3) MD_Mlp,%%(TEMP_REG + 4) MD_DvdHi,%%(TEMP_REG + 0) MD_DvdLo,%%(TEMP_REG + 1) MD_Dvsr,%%(TEMP_REG + 2) MD_Quot,%%(TEMP_REG + 3) MD_Dlp,%%(TEMP_REG + 4) MD_Mlp,O MD_Mpd,3 MD_Mpd,3 MD_Mpr, 5 MD_Mpr,5 multiply mfsr jmpt nop const const consth const consth const consth MD_PrHi,MD_Mpd,MD_Mpr MD_PrLo,q MD_Mlp,M_Loop mtsr divide jmpt q,MD_DvdLo MD Quot,MD_DvdHi,MD_Dvsr MD_Dlp,D_Loop ;multiplicand ; multiplier ;product low ;product high ;BOOLEAN for looping ;dividend high ;dividend low ;divisor ; quotient ;BOOLEAN for looping ; FALSE ; (full 32-bit for patching) M_Loop: MD_Dlp,O MD_DvdHi,O MD_DvdHi,O MD_DvdLo,15 MD_DvdLo,15 MD_Dvsr,3 MD_Dvsr,3 iFALSE i (full setting for patch) D_Loop: 3·153 29K Family Application Notes nop EPILOGUE .eject .sbttl "Spill/Fill Test" FUNCTION _Recurse,1,29,1 This routine is a simple recursive do-nothing that is used to test spill/fill. It accepts a count as its input, decrements that count, and, if the result is zero or greater, calls itself with the now decremented count. Each instance of the routine allocates 32 new registers. Thus the total register requirement is 32 * (InCnt + 1) where InCnt is the input count. 
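The register requirement stated above is simple arithmetic, worked here in C. The function names are hypothetical, and the 128-register window is an assumption derived from the rfb = rab + 512 relation used by the spill and fill handlers (512 bytes at 4 bytes per register).

```c
#include <stdbool.h>

#define REGS_PER_CALL   32u   /* registers allocated by each instance of the routine */
#define WINDOW_REGS    128u   /* assumed: 512-byte window / 4 bytes per register */

/* Total registers needed for an input count of in_cnt: 32 * (in_cnt + 1). */
unsigned recurse_regs(unsigned in_cnt)
{
    return REGS_PER_CALL * (in_cnt + 1u);
}

/* True when the recursion alone exceeds the register window, so that
 * at least one spill must occur on the way down. */
bool recurse_must_spill(unsigned in_cnt)
{
    return recurse_regs(in_cnt) > WINDOW_REGS;
}
```

An input count of 31 gives 32 * 32 = 1024 registers, which matches the "require 1024 registers" comment on the call in _main.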
In: (see below) Out: (nothing in final return) Temp: (allocated but not used) .reg .reg sub jmpt nop call nop R_InCnt,%%(IN_PRM + 0) R_OutCnt,%%(OUT_PRM + 0) R_OutCnt,R_InCnt,l R_OutCnt,R_Exit IrO, _Recurse R Exit: EPILOGUE .eject .sbttl .extern "C Interrupt Interface Test" CIntf LEAF _Trap70,1 This "C" routine handles trap 70. It increments the value of a global system register so that its effect may easily be seen. In: (see below) Out: stO st1 incremented set to input parameter value .reg add add EPILOGUE T70_V,%%(IN_PRM + 0) stO,stO,l st1,T70_V,0 ;the vector Trap70: ; This is the assembly-language routine that should get control on 3·154 Programming Standalone Am29000 Systems trap 70. It invokes CIntf in such a way as to give control to _Trap70, the "c" routine above. Note that control never returns to this routine. CIntf performs the iret. In: (nothing) Out: (nothing) .reg .reg const consth jmp const T70_Rout,%%(SYS_TEMP + 0) T70_Vect,%%(SYS_TEMP + 1) T70_Rout,_Trap70 T70_Rout,_Trap70 CIntf T70_Vect,70 .eject .sbttl .global FUNCTION _main,2,2,1 This routine plays the role of a C main routine. It is coded in assembly language to ease testing with an absolute debugger. .reg .reg .reg .reg call nop add call nop call const asneq call nop add EPILOGUE argc, %% (IN_PRM + 0) argv, %% (IN_PRM + 1) StTm,%%(LOC_REG + 0) EndTm,%%(LOC_REG + 1) IrO, GetTm32 ;argc (= 0) ;argv (= NULL) ; start time ;end time ;should return start time StTm,vO,O IrO, MultDiv ;save the result ;test multiply/divide IrO, _Recurse pO,IS 70,grl,grl IrO, GetTm32 ;test spill/fill ; require 1024 registers ;force trap 70 ;should return end time EndTm,vO,O ;save the result - - ; end of test.s 3·155 29K Family Application Notes APPENDIX D: romdcl.h .eject .sbttl "Register, constant, and Macro Declarations" Copyright 1988, Advanced Micro Devices Written by Gibbons and Associates, Inc. 
; Global registers .reg .equ .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .equ .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .equ .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg 3-156 rsp,gr1 SYS_TEMP,64 stO,gr64 st1,gr65 st2,gr66 st3,gr67 st4,gr68 st5,gr69 st6,gr70 st7,gr71 st8,gr72 st9,gr73 st10,gr74 stll, gr75 st12,gr76 st13,gr77 st14,gr78 st15,gr79 SYS_STAT,80 ssO,gr80 ss1,gr81 ss2,gr82 ss3,gr83 ss4,gr84 ss5,gr85 ss6,gr86 ss7,gr87 ss8,gr88 ss9,gr89 ss10,gr90 ss11, gr91 ss12,gr92 ss13,gr93 ss14,gr94 ss15,gr95 RET_VAL, 96 vO,gr96 v1,gr97 v2,gr98 v3,gr99 v4,gr100 v5,gr101 v6,gr102 v7,gr103 v8,gr104 v9,gr105 v10,gr106 vll, gr107 v12,gr108 v13,gr109 v14,grllO ;local reg. var. stack pointer ;system temp registers ;system static registers ;return registers Programming Standalone Am29000 Systems .reg .equ .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .equ .reg .reg .reg .reg .equ .req .reg .reg .reg .reg .reg .reg .reg .reg i i temp registers ireserved (for user) itemp extension (and shared) Global registers with special calling convention uses .reg .reg .reg .reg .reg .reg .reg i vlS,grl11 TEMP_REG, 96 to,gr96 tl,gr97 t2,gr9S t3,gr99 t4,grlOO tS,grlOl t6,grlO2 t7,grlO3 'tS, grlO4 t9,grlOS tlO,grlO6 t11, grlO7 t12,grlOS t13,grlO9 t14, gr110 tlS,grl11 RES_REG, 112 rO, gr112 rl,gr113 r2, gr114 r3, gr11S TEMP_EXT, 116 xO,gr116 xl,gr117 x2, gr11S x3,grl19 x4,gr120 xS,gr121 x6,gr122 x7,gr123 xS,gr124 tav,gr121 tpc,gr122 lsrp,gr123 slp,gr124 msp,gr12S rab,gr126 rfb,gr127 itrap handler argument (also x6) itrap handler return (also x7) ilarge return pointer (also xS) istatic link pointer (also x9) imemory stack pointer iregister alloc bound ;register frame bound Local compiler registers - output parameters, etc. 
(only valid if frame has been established) .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg .reg plS,lr17 p14,lr16 p13,lrlS p12,lr14 p11,lr13 plO,lr12 p9,lr11 pS,lrlO p7,lr9 p6,lrS pS,lr7 p4,lr6 p3,lrS p2,lr4 iparameter registers 3·157 29K Family Application Notes .reg .reg p1,lr3 pO,lr2 ; TLB register count .equ .eject ; constants for general use .equ .equ .equ .equ WRD_SIZ,4 TRUE,OxBOOOOOOO FALSE,OxOOOOOOOO CHKPAT_a5,Oxa5a5a5a5 ;word size ;logical true -- bit 31 ; logical false -- 0 ; check pattern ; constants for data access control .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ CE,Ob1 CD,ObO AS,Ob1000000 PA,Ob0100000 SB,Ob0010000 UA,Ob0001000 ROM_OPT,Ob100 DATA_OPT,ObOOO INST_OPT,ObOOO ROM_CTL, (PA + ROM_OPT) DATA_CTL, (PA + DATA_OPT) INST_CTL, (PA + INST_OPT) IO~CTL, (AS + PA + DATA_OPT) ;co-processor enable ;co-processor disable ; set for I/O ;set for physical ad ;set for set BP ;set for user access ;OPT values for acc ;control field .eject ;----------------------------------------------------------------------;defined vectors i----------------------------------------------------- ------------------ 22 3-158 - .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ 31 reserved .equ V_IllegalOp,O V_Unaligned, 1 V_Out Of Range, 2 V_NoCoProc,3 V_CoProcExcept,4 V_ProtViol,5 V_InstAccExcept,6 V_DataAccExcept,7 V_UserlnstTLB,B V_UserDataTLB,9 V_SuplnstTLB,lO V_SupDataTLB,ll V_InstTLBProt,12 V_DataTLBProt,13 V_Timer, 14 V_Trace, 15 V_INTRO,16 V_INTR1,17 V_INTR2,lB V_INTR3,19 V_TRAPO,20 V_TRAP1,21 V_MULTIPLY, 32 Programming Standalone Am29000 Systems .equ .equ .equ .equ 37 - 41 reserved .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ .equ 56 - 63 reserved .equ .equ .equ .equ .equ .equ .eject .macro V_DIVIDE, 33 V_MULTIPLU,34 V_DIVIDU,35 V_CONVERT, 36 V_FEQ,42 V_DEQ,43 V_FGT,44 V_DGT,45 V_FGE,46 V_DGE,47 V_FADD,48 V_DADD,49 V_FSUB,50 V_DSUB,51 V_FMUL,52 V_DMUL,53 
V_FDIV,54 V_DDIV,55 V_SPILL, 64 V_FILL, 65 V_BSDCALL,66 V_SYSVCALL,67 V_BRKPNT,68 V EPI _OS, 69 R_LEFT,REGVAR Rotate left Parameters: REGVAR register to rotate add addc .endm REGVAR,REGVAR,REGVAR REGVAR,REGVAR,O ;shift left by 1 bit,C ;add C to LSB .macro FUNCTION,NAME,INCNT,LOCCNT,OUTCNT MSB Introduces a non-leaf routine. This macro defines the standard tag word before the function, then establishes the statement label with the function's name and finally allocates a register stack frame. It may not be used if a memory stack frame is required. Note also that the size of the register stack frame is limited. Neither this nor the lack of a memory frame is considered to be a severe restriction in an assembly-language environment. The assembler will report errors if the requested frame is too large for this macro. It may be good practice to allocate an even number of both output registers and local registers. This will help in maintaining double word alignment within these groups. The macro will assure double word alignment of the stack frame as a whole, as required for correct linkage. 3·159 29K Family Application Notes Paramters: the function name input parameter count local register count output parameter count NAME INCNT LOCCNT OUTCNT .set .set .set .if .set .endif .if .set .endif .if .set .endif .word ALLOC_CNT, ((2 + OUTCNT + LOCCNT) « 2) PAD_CNT, (ALLOC_CNT & 4) ALLOC_CNT, (ALLOC_CNT + PAD_CNT) (INCNT) IN_PRM, (4 + OUTCNT + PAD_CNT + LOCCNT + Ox80) sub asgeu add .endm rsp,rsp,ALLOC_CNT V_SPILL,rsp,rab Ir1,rsp, ((4 + OUTCNT + LOCCNT + INCNT) .macro LEAF,NAME,INCNT (LOCCNT) LOC_REG, (2 + OUTCNT + PAD_CNT + Ox80) (OUTCNT) OUT_PRM, (2 + Ox80) ((2 + OUTCNT + LOCCNT) « 16) NAME: « 2) Introduces a leaf routine This macro defines the standard tag word before the function, then establishes the statement label with the function's name. 
; Parameters:
;       NAME            the function name
;       INCNT           input parameter count

        .if     (INCNT)
        .set    IN_PRM, (2 + 0x80)
        .endif
        .set    ALLOC_CNT, 0
        .word
NAME:
        .endm

        .macro  EPILOGUE
;
; Deallocates register stack frame (if and only if necessary).

        .if     (ALLOC_CNT)
        add     rsp,rsp,ALLOC_CNT
        nop
        jmpi    lr0
        asleu   V_FILL,lr1,rfb
        .else
        jmpi    lr0
        nop
        .endif
        .set    IN_PRM, (1024)                  ;illegal, to cause err on ref
        .set    LOC_REG, (1024)                 ;illegal, to cause err on ref
        .set    OUT_PRM, (1024)                 ;illegal, to cause err on ref
        .set    ALLOC_CNT, (1024)               ;illegal, to cause err on ref
        .endm

; Initial values for macro set variables to guard against misuse

        .set    IN_PRM, (1024)                  ;illegal, to cause err on ref
        .set    LOC_REG, (1024)                 ;illegal, to cause err on ref
        .set    OUT_PRM, (1024)                 ;illegal, to cause err on ref
        .set    ALLOC_CNT, (1024)               ;illegal, to cause err on ref

; end of romdcl.h

APPENDIX E: test.ld

test.ld         Linker Directives
see test.s and start.s for descriptions of sections

load test.o,start.o
order vectors=0,rstack,mstack,.bss,.data,.text,endsect

Host Interface (HIF) v1.0 Specification
Application Note by E. M. Greenawalt

PREFACE

This document describes HIF (v1.0), the Am29000 Architectural Host Interface, and explains how to use it. HIF is the software standard that defines the interface between the user's high-level language program and the Am29000 processor. The document is written for experienced programmers and assumes a working knowledge of the Am29000 microprocessor.

INTRODUCTION

Advanced Micro Devices is developing a complete line of Am29000™ simulators, hardware target execution vehicles, and high-level language development tools for the Am29000 32-bit Streamlined Instruction Processor. These products are designed to support end-users who are building embedded system applications based on the Am29000 processor. For these users, there is often no existing operating system or kernel for their hardware design.
Before AMD could create development tools for the Am29000, a standard set of kernel services had to be defined that would interface a user-application program, written in a high-level language, to a host operating system or an Am29000 processor. HIF, the host interface, is the software specification that defines this standard set of kernel services.

Figure 1 shows the level where HIF resides. As implied by the figure, HIF does not describe any particular implementation; rather, each simulator, hardware vehicle, and high-level language implements HIF in its own way. The kernel services provide the minimum functionality needed to interface high-level language library functions to the user's operating system code.

Using HIF, program modules written in any of the languages available for the Am29000 can be combined and the resulting program can run, without change, on any Am29000 simulator or hardware execution vehicle. Future AMD products will also use HIF, and AMD is actively encouraging third-party vendor support.

AMD is indebted to Embedded Performance, Incorporated (EPI), who originally developed the HIF concepts and then graciously placed them in the public domain.

    User's application program
    High-level language library
    Host interface (HIF)
    Operating system kernel

Figure 1. HIF Interface

Publication # 11014 Rev. A Amendment 10 Issue Date: 11/89 © 1989 Advanced Micro Devices, Inc.

HIF APPLICATIONS

The HIF specification has broad applications; currently it provides the interface between the user's high-level language program and the following hardware and software products:

• Am29000 Architectural Simulator. This software product provides the means to simulate the operation of the Am29000 in a specified system environment. It provides detailed performance statistics by modeling the internal architecture of the Am29000, as well as system memory configurations and timing.
The HIF specification is implemented to provide the interface between the user's program and the host operating system.

• PC Execution Board (PCEB29K™). This hardware/software product contains an Am29000 processor and memory and is an add-in board to IBM® PC-based systems. Part of the HIF specification is implemented on the board with another part implemented on the PC, to interface with the DOS operating system.

• Standalone Execution Board (STEB). This hardware product from STEP Engineering is intended to be an evaluation vehicle for the Am29000 and, optionally, Am29027™ Arithmetic Accelerator devices. The entire HIF specification is implemented on this board, which contains a resident monitor to implement the necessary kernel services.

Because HIF is a general-purpose standard, it can be used to interface any high-level language to the Am29000. User programs need not be written entirely in a high-level language; they may incorporate assembly-language functions when maximized performance is the primary concern.

HIF USERS

There are three categories of end-users who need to know the details of the host interface:

• Those using AMD-supplied hardware execution vehicles or simulators. This document defines the low-level mechanisms of HIF. With this information and the design concepts presented herein, end-users can extend the HIF environment to meet the needed degree of software functionality and sophistication.

• Those developing a custom kernel operating system for an Am29000 design. These users need access to AMD's high-level and assembly-language development tools. This document provides the information required to build a HIF-conforming kernel that uses the high-level language development tools directly. With this information, end-users can extend and customize the operating system code without interfering with the basic capabilities of the HIF.
• Those who are using the AMD-supplied high-level language development tools, but who must conform to another kernel operating system interface. There is sufficient information in this document to enable users to modify the development tools to properly interface with the target kernel's specifications.

HIF CONCEPTS

Programmers developing software in a high-level language do not work directly with the processor. Instead, they think in terms of a virtual machine ideally suited to the computational paradigm of the language. For instance, the C-language virtual machine has operations such as fprintf() and strcpy(), and the FORTRAN machine has operations such as alog and sqrt. In actual practice, these virtual machines are implemented by libraries of object code that perform language-specific operations. As long as programmers use only the functions of the language's implied virtual machine, the programs will be portable across a broad range of implementations of the language.

However, computer systems generally provide another virtual machine to the world: one that is defined by the operating system software. This virtual machine requires system calls to perform the services that are implemented within the operating system code. Typical services are: process management, file system management, device management, and memory management.

The high-level language virtual machine usually consists of: (1) functions that can be implemented entirely within library routines, and (2) functions that require the services of the operating system. The functions of the first group (usually defined as the standard library for that language) are independent of the operating system virtual machine on which they are implemented. The functions of the second group must be coded in terms of the operating system virtual machine. In other words, they must make system calls.
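The split between the two groups can be illustrated with a minimal C sketch. The names my_strlen, my_puts, and sys_write are hypothetical and do not come from the HIF specification. sys_write is only a host-side stand-in that copies into a buffer so that the sketch runs anywhere; on a real target it would be a routine that makes the system call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the operating system virtual machine:
 * "written" bytes are appended to an in-memory console buffer. */
static char   console[256];
static size_t console_len = 0;

long sys_write(int fileno, const char *buf, size_t nbytes)
{
    (void)fileno;                       /* the stand-in has one output only */
    assert(buf != NULL);
    if (nbytes > sizeof console - console_len)
        nbytes = sizeof console - console_len;   /* clip to free space */
    memcpy(console + console_len, buf, nbytes);
    console_len += nbytes;
    return (long)nbytes;                /* number of bytes accepted */
}

/* Group 1: implemented entirely within the library; no system call. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Group 2: coded in terms of the operating system virtual machine. */
long my_puts(const char *s)
{
    return sys_write(1, s, my_strlen(s));
}
```

Porting my_strlen to another system requires no change, whereas my_puts is only as portable as the sys_write interface beneath it. That interface is what a common kernel service definition is meant to hold fixed.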
It is often useful for end-users to also make system calls, even though this practice makes their programs less portable. This requirement can be accommodated by augmenting the language library with glue routines that specifically invoke the system calls, while providing the end user with suitable high-level syntax and semantics. (For detailed information on the glue routines for the C compiler, see the HighC29K Reference Manual, "Appendix A, Host Interface Definition.")

Given the above discussion, the required task is to create high-level language development tools that can be used easily and efficiently on a variety of execution vehicles. This task can be broken down into the following steps:

• Define an operating system virtual machine that provides sufficient functionality to support the fundamental requirements of each high-level language, but not so much as to require a massive development effort to create.

• Add appropriate glue routines to the standard libraries of the language so that the libraries are defined in terms of the operating system virtual machine.

• Implement the operating system's virtual machine services on the various execution vehicles. For hardware vehicles, the virtual machine is implemented by a kernel, typically contained in a resident monitor software program. For simulation vehicles, the virtual machine is implemented by code internal to the simulator and by code simulated by the simulator.

For the Am29000 hardware and software support products, HIF consists of the following operating system virtual machine definitions:

• A carefully defined, efficient system call mechanism. Accessing an HIF kernel service requires a transition from user mode to supervisor mode on the processor. This requires a specific mechanism, such as a trap handler, to be invoked.

• A set of services that support the primitive requirements of C, FORTRAN, and Pascal.
Most of the services are defined according to UNIX® operating system interface specifications.

• A specification of the environment created by the kernel. This involves the definition of storage allocation and register initializations implemented by the kernel.

IMPLEMENTATION TYPES

Implementations of the HIF specification take two fundamental forms: self-hosted and embedded. Examples of each of these are provided in the Standalone Execution Board (STEB) manufactured by STEP Engineering and AMD's PC Execution Board (PCEB29K).

The STEB is a single-board computer that incorporates an Am29000 processor, an optional Am29027 arithmetic accelerator, program and data memory, serial ports, and timer-counter resources. The HIF implementation for this board consists of a resident monitor program that is downloaded into low-memory locations, and which implements the kernel services described in the "HIF Service Routines" section of this document. This is a self-hosted implementation.

In contrast to the STEB, the PCEB29K is an add-in board for IBM PC-compatible computers that incorporates an Am29000 processor, program and data memory, serial ports, and timer-counter resources. The HIF implementation for this board consists of two portions of code. One performs some of the kernel services on the board and the other performs some of the kernel services through the auspices of the DOS operating system. In the sense that the HIF is grafted onto the existing host operating system, it is called an embedded implementation.

The architectural and instruction simulators are also embedded implementations because they share the HIF implementation between custom code and existing host-computer operating-system code.

There is no preference for either type of implementation as long as the services and features of the HIF specification are fully implemented in the target environment.
With the standard interfaces that a HIF implementation presents, application programs written for one environment will run equally well in another.

HIF SERVICES PREVIEW

Table 1 lists the services defined by the HIF interface. Most are similar or identical to equivalent UNIX operating system calls. The titles given in column one are not the names that actually exist in a particular library but, instead, are the generic names of the services, for the purpose of this overview.

Table 1. HIF Services

Name        Description
clock       Returns the elapsed processor time, in milliseconds
close       Closes a file
cycles      Returns processor cycle counts
exit        Terminates a program
getargs     Returns an argument address
getenv      Gets the environment
getpsize    Returns the memory page size
lseek       Sets a file position
open        Opens a file
read        Reads a buffer of data from a file
remove      Removes (deletes) a file
rename      Renames a file
sysalloc    Allocates memory space
sysfree     Frees allocated memory space
setvec      Sets user trap addresses
time        Returns number of seconds since Jan. 1, 1970
tmpnam      Returns a temporary file name
write       Writes a buffer of data to a file

INTENDED AUDIENCE

This document is intended for systems designers and programmers who have a working knowledge of the Am29000 and its supporting peripheral hardware. It does not cover CPU design, the Am29000 instruction set, or any other hardware detail. Those topics are adequately covered in the reference documents listed below.

ABOUT THIS DOCUMENT

The contents of each section and appendix of this document are described below:

Section 1: Introduction. Discusses the important concepts underlying the host interface definition and previews the services that form the basis of the HIF specification.

Section 2: System Call Mechanism. Describes the mechanism used to make calls on the HIF services, and includes information on register usage for passing parameters and receiving results.

Section 3: Service Routine Descriptions. Describes each of the services defined in HIF and shows details of the code sequences, including examples, for invoking the services.

Section 4: Process Environment. Describes the standard memory allocation and register initializations performed by the HIF-conforming kernel prior to execution of a user program.

Appendix A: HIF Quick Reference. Lists all of the services and service parameters used in this document, in quick reference form.

Appendix B: Error Messages. Lists the error codes that HIF-conforming services may return.

REFERENCE DOCUMENTS

The user should have access to the following AMD documents:

• Am29000 Streamlined Instruction Processor User's Manual, order #10620
• ADAPT29K User's Manual
• MON29K User's Manual
• MON29K Installation and Customization Manual
• Am29000 Execution Board and Monitor User's Manual
• ASM29K Utilities Manual from the ASM29K documentation set
• HighC29K Reference Manual from the HighC29K documentation set

DOCUMENTATION CONVENTIONS

This specification assumes some familiarity with the UNIX operating system and the C language. In the following sections, the conventions presented in the subsections below are assumed.

Numeric Values

All numeric values are presumed to be expressed in decimal notation, unless otherwise stated. Hexadecimal values are prefaced by the characters "0x." Any value not prefaced by "0x" is defined to be a decimal number. For example:

    100092      Decimal number
    0x100092    Hexadecimal number

The first number, above, is a decimal value by implication, because it has not been prefaced by "0x." The second constant includes the explicit "0x" prefix, designating it as a hexadecimal value.
Character Strings

In the documentation, frequent mention is made of character strings that hold file names, path names, and environment variable names. In all cases, the HIF Specification requires that strings be constructed as a sequence of ASCII characters terminated by a NULL byte (an 8-bit character composed of all zero bits). This is the form in which strings are represented in the C language. Thus, the space reserved for a string must be one byte longer than the length of the string, to accommodate the NULL byte.

Languages such as Pascal, which require "counted" strings (that is, a single 8-bit byte in the first character of the string that specifies the number of bytes that follow), are required to convert these to NULL-terminated form before calling the HIF kernel services. In addition, languages other than C may need to convert strings passed back from the HIF kernel services to a compatible internal form. All returned strings are in NULL-terminated form.

SYSTEM CALL MECHANISM

System calls on Am29000-based systems are accomplished through invocation of a specific software trap. The Am29000 traps are roughly equivalent to software interrupts on other CPUs. System call traps are invoked through execution of an appropriate assert instruction whose assertion is FALSE at the time the instruction is executed. Execution of an ASEQ, ASGE, ASGEU, ASGT, ASGTU, ASLE, ASLEU, ASLT, ASLTU, or ASNEQ instruction, where the result of the assertion is FALSE, will cause the trap specified in the instruction to be taken.

Once the trap is invoked, the Am29000 accesses a trap vector containing up to 256 separate trap handler addresses; or it may directly invoke a trap handler routine, depending on the implementation of the operating system trapping mechanism and the state of the Vector Fetch (VF) bit in the processor's Configuration Register. In most implementations, a table of vectors is used.
However, the operating system software may implement direct trap execution instead. This bypasses the vector table lookup and is therefore more efficient, but it requires the reservation of a much greater amount of system memory.

When a trap is taken, the normal program execution sequence is interrupted and the trap handler is invoked. At this point, the current program's context is contained in Am29000 CPU registers. No saving or restoring of registers is performed by the processor when a trap occurs.

HIF services are required to preserve the following registers and restore their contents before returning to the application program:

• All local registers
• Global registers gr1, gr112, gr115, and gr125
• Global registers gr126 and gr127 should be preserved according to AMD calling conventions. Their values may differ upon return from a HIF service, but the span between their values will remain the same.

The HIF services may modify the contents of certain registers without first saving their values, namely gr121, gr96, and gr97. In addition, the application program should not count on gr96 through gr111 being left untouched by current and future HIF kernel services.

HIF SERVICE INVOCATION

Before invoking a HIF service, the service number and any input parameters to be passed must be loaded into Am29000 general registers. Both local and global registers are used for various HIF services, as shown in the HIF Quick Reference table in Appendix A of this document. Details for invoking specific services are contained in the Service Routine Descriptions section.

Service Number

Every HIF system service is identified by a unique number. Service numbers 0-127 and 256-383 are reserved for use by AMD and should not be used for user-supplied extensions.
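The reserved ranges can be checked mechanically when numbers are assigned to user-supplied services. The following is a minimal sketch in C; the function names are hypothetical and are not part of the HIF specification.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a service number lies in a range reserved for AMD:
 * 0 through 127, or 256 through 383 (both ranges inclusive). */
bool hif_service_is_reserved(unsigned service)
{
    return service <= 127u || (service >= 256u && service <= 383u);
}

/* Usage sketch: checks the boundaries of both reserved ranges. */
void hif_reserved_selftest(void)
{
    assert(hif_service_is_reserved(17u));    /* inside the first range   */
    assert(!hif_service_is_reserved(128u));  /* first number above it    */
    assert(hif_service_is_reserved(256u));   /* start of second range    */
    assert(!hif_service_is_reserved(384u));  /* first number above it    */
}
```

Under this reading of the ranges, numbers 128 through 255 and numbers from 384 upward remain available for user-supplied extensions.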
The service number must be loaded into global register gr121, the trap-handler argument register. gr121 is a temporary register and its value is not preserved over a system call, nor will its value be preserved over any trap invoked by the running program.

Input Parameters

Any input parameters to be passed must be placed in local registers lr2 through lr17. Input parameters are passed to HIF services using the parameter passing mechanism specified in the Am29000 calling conventions documentation (Am29000 Streamlined Instruction Processor User's Manual, order #10620).

Invoking a HIF Service

The HIF services are accessed by forcing trap 69 to occur, after the service number and parameters (if any) are loaded in the designated registers. The trap can be forced with any of the assert instructions (ASEQ, ASGE, ASGEU, ASGT, ASGTU, ASLE, ASLEU, ASLT, ASLTU, or ASNEQ) whose assertion is FALSE when it is executed. Trap handler 69 executes the service in supervisor mode.

Returned Values

Most services return values, usually a single integer value (number of bytes read or written, number of clock ticks, size of a memory block, etc.), or a pointer (address of a file descriptor, address of a memory block, etc.). These values are returned in register gr96, per standard high-level language calling conventions. If a service returns multiple values, the additional values are returned in gr97, gr98, and so forth. If the service fails to perform the requested task, the values contained in gr96 and succeeding registers are not guaranteed to be valid. See the documentation that accompanies your language processor for additional details on Am29000 high-level language calling conventions.

Status Reporting

In all cases, upon return from a HIF service, global register gr121 contains either a TRUE value (0x80000000), or a positive non-zero integer error code indicating the reason for failure. Pre-defined error codes are listed in Appendix B of this document for existing HIF implementations. HIF does not specify these error codes. They may be completely defined by an implementation, except for cases in which there is a corresponding, existing, UNIX error code. In these cases, the UNIX error code is expected to be used.

Example Assembly Code

The code fragment below shows how the definitions are implemented in Am29000 assembly language to invoke the open HIF service to open a file:

        const   lr2,input_file      ; set input file pathname address
        consth  lr2,input_file
        const   lr3,O_RDONLY        ; set open mode
        const   gr121,17            ; service number = 17 (open)
        asneq   69,gr1,gr1          ; force trap 69 (system call)

In this example, local register lr2 is loaded with the address of the filename constant; local register lr3 contains the code O_RDONLY, indicating that the file is to be opened for read-only access. The service number (17) is loaded into global register gr121 and the service is executed by asserting that register gr1 is not equal to itself. Since this is FALSE, the trap is invoked.

USER-MODE TRAPS

When a trap is invoked, the Am29000 switches from user mode to supervisor mode to execute the trap handler code. Most traps are properly executed in this mode, including the kernel services that implement the HIF specification. However, a few traps, such as the spill/fill handlers, are intended to execute in user mode. In these cases, the trap handler code is not part of the kernel, but is supplied by the particular high-level language product library and is linked with the user's application program.

In order to use a consistent trap handling mechanism, and to support the individual language products' methodologies for user-mode traps, a HIF service called setvec is called with the address of the user-mode trap handler code for each of the traps handled in this way.

Once the user-mode handler addresses have been supplied, and the corresponding trap is invoked, the operating-system kernel receives control in supervisor mode. It then reinstates user mode and invokes the appropriate language library trap handler to complete the required operation. This bouncing from user mode to supervisor mode and back to user mode is referred to as a "trampoline" effect. When the trap handler's execution is complete, it returns directly to the user's application program, rather than back through the kernel.

The register stack spill/fill handlers are appropriate examples of code that is intended to execute in user mode. When a user's application program calls a function that requires a large number of local registers to execute, some currently unused registers may have to be written to main memory to free enough of the on-chip registers. In this case, the registers are spilled to memory via the spill-trap handler. When the function completes execution and intends to return to its caller, the spilled registers may have to be restored by calling the fill-trap handler. Since register stack management is unique for each application environment, individual spill/fill handlers are provided with each of the high-level language products.

HIF SERVICE ROUTINES

The HIF service routine calls currently defined are listed by decimal service number in Table 2 below and described in detail in the following pages. Service numbers 0 through 127 and 256 through 383 are reserved by AMD and should not be used for user-supplied extensions. Table 3 describes the parameter names used in the service descriptions.

Table 2.
HIF Service Calls

Number  Title       Description
1       exit        Terminate a program
17      open        Open a file
18      close       Close a file
19      read        Read a buffer of data from a file
20      write       Write a buffer of data to a file
21      lseek       Seek file byte
22      remove      Remove a file
23      rename      Rename a file
33      tmpnam      Return a temporary name
49      time        Return seconds
65      getenv      Get environment
257     sysalloc    Allocate memory space
258     sysfree     Free memory space
259     getpsize    Return memory page size
260     getargs     Return base address
273     clock       Return milliseconds
274     cycles      Return processor cycles
289     setvec      Set user trap address

Table 3. Service Call Parameters

Parameter   Description
addrptr     A pointer to an allocated memory area, command-line-argument array, pathname buffer, or NULL-terminated environment variable name string.
baseaddr    The base address of the command-line-argument vector.
buffptr     A pointer to the buffer area which data is to be read from or written to during the execution of I/O services.
count       The number of bytes actually read from a file or written to a file.
cycles      The number of processor cycles returned.
errcode     The error code returned by the service, usually the same as the codes returned in the UNIX variable errno. See Appendix B, Table 8, for a list of HIF error codes.
exitcode    The exit code of the application program.
filename    A pointer to a NULL-terminated ASCII string containing the directory path of a temporary filename.
fileno      The file descriptor, a small integer number. Descriptors 0, 1, and 2 are guaranteed to exist and correspond to open files on program entry (0 is the UNIX equivalent of stdin and is opened for input, 1 is UNIX stdout and is opened for output, 2 is UNIX stderr and is opened for output). The fileno is returned when an open call is successful.
funaddr     A pointer to the address of a service.
mode        A series of option flags whose values represent the operation to be performed.
msecs       Milliseconds.
name        A pointer to a NULL-terminated ASCII string that contains an environment variable name.
nbytes      The number of data bytes requested to be read from or written to a file, or the number of bytes to allocate from the heap.
newfile     A pointer to a NULL-terminated ASCII string that contains the directory path of a new filename.
offset      The number of bytes from a specified position (orig) in a file.
oldfile     A pointer to a NULL-terminated ASCII string that contains the directory path of the old filename.
orig        A value of 0, 1, or 2 that refers to the beginning, current position, or the position of the end of a file.
pagesize    The memory page size in bytes returned.
pathname    A pointer to a NULL-terminated ASCII string that contains the directory path of a filename.
pflag       The UNIX file access permission codes.
retval      The return value that indicates success or failure.
secs        The seconds count returned.
trapno      The trap number.
where       The current position in a specified file.

Each service description on the pages that follow contains a concise explanation of the purpose of the service, the input and result register contents, and example assembly-language code to invoke the service. In all cases, operating system kernel services that meet the HIF specifications are invoked by forcing the software trap 69 to occur. The service number is always contained in general register gr121 and parameters are passed, if necessary, in local registers, beginning with lr2.

HIF implementations are required to return an error code when a requested operation is not possible. The codes from 0 to 255 are reserved for compatibility with current and future error return standards. The currently assigned codes and their meanings are listed in Appendix B, Table 8.
If a HIF implementation returns an error code in the range of 0 to 255, it must carry the identical meaning to the corresponding error code in this table. Error code values larger than 255 are available for implementation-specific errors.

When the service returns, general register gr121 is required to report the success or failure of the service. If successful, gr121 is expected to contain a TRUE boolean value (a 1 bit in the most significant bit position). If the service is not successful, a positive non-zero error code is returned in gr121. If the service returns results, the first result is held in gr96, the second in gr97, and so forth.

In the examples, references are made to error handlers that are not part of the example code. These are assumed to be contained in the larger part of the user's code and are not supplied as part of the HIF specification. The JMPF instructions have been provided to show that interface glue routines should incorporate this error testing philosophy in order to be robust. In practice, error handling may be relegated to a single routine, or may be vested in individual sections of in-line code, or in services callable by the glue routines.

Since HIF implementations may exist over a wide spectrum of systems, the capabilities of the HIF may vary from one system to the next. In the simplest case, the HIF implementation in an embedded Am29000 system, such as a printer controller, may contain no external file system. In this event, the input/output facilities specified in the kernel service descriptions need not be implemented. In more common cases, where the HIF will exist on systems that have full operating system capabilities, such as DOS or UNIX, it is assumed that all of the features of the HIF will be implemented.

The service descriptions in this document provide a set of standard interfaces for commonly implemented operating system interfaces.
If individual features are implemented, the interfaces are expected to follow the guidelines in this specification.

Descriptions of the individual services follow on the remaining pages of this section. They are listed in numeric sequence by service number. Appendix A, HIF Quick Reference, allows easy location of a service by its number.

Terminate a Program

Service 1: exit

Description

This service terminates the current program and returns a value to the system kernel, indicating the reason for termination. By convention, a zero passed in lr2 indicates normal termination, while any non-zero value indicates an abnormal termination condition. There are no returned values in registers gr96 and gr121 since this service does not return.

Register Usage

Type      Regs    Contents    Description
Calling:  gr121   1 (0x1)     Service number
          lr2     exitcode    User-supplied exit code
Returns:  gr96    undefined   This service call does not return
          gr121   undefined   This service call does not return

Example Call

        const   lr2,1           ; exit code = 1
        const   gr121,1         ; service = 1
        asneq   69,gr1,gr1      ; call the operating system

In the above example, the operating system kernel is being called with service code 1 and an exit code of 1, which is interpreted according to the specifications of the individual operating system. The value of the exit code is not defined as part of the HIF specification. In general, however, an exit code of zero (0) specifies a normal program termination condition, while a non-zero code specifies an abnormal termination resulting from detection of an error condition within the program.

Programs can terminate normally by falling through the curly brace at the end of the main function in a C-language program. Other languages may require an explicit call to the kernel's exit service.

Open a File

Service 17: open

Description

This service opens a named file in a requested mode.
Files must be explicitly opened before any read, write, close, or other file positioning accesses can be accomplished. The open service, if successful, returns an integer token that is used to refer to the file in all subsequent service requests. In many high-level languages, the returned token is referred to as a "file descriptor."

Register Usage

Type      Regs    Contents     Description
Calling:  gr121   17 (0x11)    Service number
          lr2     pathname     A pointer to a filename
          lr3     mode         See parameter descriptions below
          lr4     pflag        See parameter descriptions below
Returns:  gr96    fileno       Success: ≥ 0 (file descriptor)
                               Failure: < 0
          gr121   0x80000000   Logical TRUE, service successful
                  errcode      Error number, service not successful (implementation dependent)

Parameter Descriptions

Pathname is a pointer to a zero-terminated string that contains the full path and name of the file being opened.* Individual operating systems have different means to specify this information. With hierarchical file systems, individual directory levels are separated with special characters that cannot be part of a valid filename or directory name. In UNIX-compatible file systems, directory names are separated by forward slash characters "/" (e.g., "/usr/jack/files/myfile"), where "usr," "jack," and "files" are successively lower directory levels, beginning at the root directory of the file system. The name "myfile" is the filename to be opened at the specified level. The individual characteristics of files and pathnames are determined by the specifications of a particular operating system implementation.

* The HIF specification intentionally refrains from defining the constituents of a legal path name, or any intrinsic characteristics of the implemented file system. In this regard, the only requirement of a HIF-conforming kernel is that when the open service is successfully performed, the routine returns a small integer value that can be used in subsequent input/output service calls to refer to the opened file.

Mode is composed of a set of flags, whose mnemonics and associated values are listed in Table 4.

Table 4. Open Service Parameters

Name        Value     Description
O_RDONLY    0x0000    Open for read only access
O_WRONLY    0x0001    Open for write only access
O_RDWR      0x0002    Open for read and write access
O_APPEND    0x0008    Always append when writing
O_NDELAY    0x0010    No delay
O_CREAT     0x0200    Create file if it does not exist
O_TRUNC     0x0400    If the file exists, truncate it to zero length
O_EXCL      0x0800    Fail if writing and the file exists
O_FORM      0x4000    Open in text format

The O_RDONLY mode provides the means to open a file and guarantee that subsequent accesses to that file will be limited to read operations. The operating system implementation will determine how errors are reported for unauthorized operations. The file pointer is positioned at the beginning of the file, unless the O_APPEND mode is also selected.

The O_WRONLY mode provides the means to open a file and guarantee that subsequent accesses to that file will be limited to write operations. The operating system implementation will determine how errors are reported for unauthorized operations. The file pointer is positioned at the beginning of the file, unless the O_APPEND mode is also selected.

The O_RDWR mode provides the means to open a file for subsequent read and write accesses. The file pointer is positioned at the beginning of the file, unless the O_APPEND mode is also selected.

If O_APPEND mode is selected, the file pointer is positioned to the end of the file at the conclusion of a successful open operation, so that data written to the file is added following the existing file contents.
Ordinarily, a file must already exist in order to be opened. If the O_CREAT mode is selected, files that do not currently exist are created; otherwise, the open function will return an error condition in gr121.

If a file being opened already exists and the O_TRUNC mode is selected, the original contents of the file are discarded and the file pointer is placed at the beginning of the (empty) file. If the file does not already exist, the HIF service routine should return an error value in gr121, unless O_CREAT mode is also selected.

The O_EXCL mode provides a method for refusing to open the file if the O_WRONLY or O_RDWR modes are selected and the file already exists. In this case, the kernel service routine should return an error code in gr121.

O_FORM mode indicates that the file is to be opened as a text file, rather than a binary file. The nominal standard input, output, and error files (file descriptors 0, 1, and 2) are assumed to be open in text mode prior to commencing execution of the user's program.

When opening a FIFO (interprocess communication file) with O_RDONLY or O_WRONLY set, the following conditions apply:

• If O_NDELAY is set (i.e., equal to 0x0010):
  - A read-only open will return without delay.
  - A write-only open will return an error if no process currently has the file open for reading.
• If O_NDELAY is clear (i.e., equal to 0x0000):
  - A read-only open will block until a process opens the file for writing.
  - A write-only open will block until a process opens the file for reading.

When opening a file associated with a communication line (e.g., a remote modem or terminal connection), the following conditions apply:

• If O_NDELAY is set, the open will return without waiting for the carrier detect condition to be TRUE.
• If O_NDELAY is clear, the open will block until the carrier is found to be present.
The optional pflag parameter specifies the access permissions associated with a file; it is only required when O_CREAT is also specified (i.e., create a new file if it does not already exist). If the file already exists, pflag is ignored. This parameter specifies UNIX-style file access permission codes (r, w, and x for read, write, and execute, respectively) for the file's owner, the work group, and other users. If the parameter is missing, pflag will be set to -1 (all accesses allowed). See the UNIX operating system documentation for additional information on this topic.

Example Call

path:   .ascii  "/usr/jack/files/myfile\0"
        .set    mode, O_RDWR | O_CREAT | O_FORM
        .set    permit, 0x180
fd:     .word   0

        const   lr2,path        ; address of pathname
        consth  lr2,path
        const   lr3,mode        ; open mode settings
        const   lr4,permit      ; permissions
        const   gr121,17        ; service = 17 (open)
        asneq   69,gr1,gr1      ; perform OS call
        jmpf    gr121,open_err  ; jump if error on open
        const   gr120,fd        ; set address of file descriptor
        consth  gr120,fd
        store   0,0,gr96,gr120  ; store file descriptor

In the above example, the file is being opened in read/write text mode. The UNIX permissions of the owner are set to allow reading and writing, but not execution, and all other permissions are denied. As indicated above in the parameter descriptions, the file permissions are only used if the file does not already exist. When the open service returns, the program jumps to the open_err error handler if the open was not successful; otherwise, the file descriptor returned by the service is stored for future use in read, write, lseek, remove, rename, or close service calls.

As described in the introduction to these services, the HIF can be implemented to several degrees of elaboration, depending on the underlying system hardware, and whether the operating system is able to provide the full set of kernel services.
In the least capable instance (i.e., a standalone board with a serial port), it is likely that only the O_RDONLY, O_WRONLY, and O_RDWR modes will be supported. In more capable systems, the additional modes should be implemented, if possible.

Service 18-close    Close a File

Description

This service closes the open file associated with the file descriptor passed in lr2. Closing of all files is automatic on program exit (see exit), but since there is an implementation-defined limit on the number of open files per process, an explicit close service call is necessary for programs that deal with many files.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   18 (0x12)     Service number
          lr2     fileno        File descriptor
Returns:  gr96    retval        Success: = 0
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

fd:     .word   0

        const   gr96,fd         ; set address of file descriptor
        consth  gr96,fd
        load    0,0,lr2,gr96    ; get file descriptor
        const   gr121,18        ; service = 18
        asneq   69,gr1,gr1      ; and call the OS
        jmpf    gr121,clos_err  ; handle close error
        nop

The above example illustrates loading a previously stored file descriptor (fd, in this case) and calling the kernel's close service to close the file associated with that descriptor. If an error occurs when attempting to close the file, the kernel will return an error code in gr121 (the content of that register will not be TRUE) and the program will jump to an error handler; otherwise, program execution will continue.

Service 19-read    Read a Buffer of Data from a File

Description

This service reads a number of bytes from a previously opened file (identified by a small integer file descriptor in lr2 that was returned by the open service) into memory starting at the address given by the buffer pointer in lr3. Lr4 contains the number of bytes to be read.
The number of bytes actually read is returned in gr96. Zero is returned in gr96 if the file is already positioned at its end-of-file. If an error is detected, a small positive integer is returned in gr121, indicating the nature of the error.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   19 (0x13)     Service number
          lr2     fileno        File descriptor
          lr3     buffptr       A pointer to buffer area
          lr4     nbytes        Number of bytes to be read
Returns:  gr96    count         Success: > 0 (number of bytes actually read)
                                EOF: = 0
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

fd:     .word   0
        .set    BUFSIZE,256
buf:    .block  BUFSIZE
num:    .word   0

        const   gr96,fd         ; set address of file descriptor
        consth  gr96,fd
        load    0,0,lr2,gr96    ; get file descriptor
        const   lr3,buf         ; set buffer address
        consth  lr3,buf
        const   lr4,BUFSIZE     ; specify buffer size
        const   gr121,19        ; service = 19
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,rd_err    ; handle read errors
        const   gr120,num       ; set address of 'num' argument
        consth  gr120,num
        store   0,0,gr96,gr120  ; store bytes read

The above example requests the HIF to return BUFSIZE bytes from the file descriptor contained in the variable fd. If the call is successful, gr121 will contain a TRUE value and gr96 will contain the number of bytes actually read. If the service fails, gr121 will contain the error code.

Service 20-write    Write a Buffer of Data to a File

Description

This service writes a number of bytes from memory (starting at the address given by the pointer in lr3) into the file specified by the small positive integer file descriptor that was returned by the open service when the file was opened for writing. Lr4 contains the number of bytes to be written. The number of bytes actually written is returned in gr96. If an error is detected, gr121 will contain a small positive integer on return from the service, indicating the nature of the error.
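The return convention shared by the read and write services (a byte count in gr96, and either TRUE or an error number in gr121) can be modeled in a few lines of Python. This is only a sketch: hif_read and is_true are hypothetical names, the error number 5 is an arbitrary stand-in for an implementation-dependent code, and a Python byte stream takes the place of a real file descriptor. It relies on the Am29000 convention that a boolean is held in bit 31, which is why a small positive error number tests as FALSE.

```python
import io

HIF_TRUE = 0x80000000  # value returned in gr121 when a service succeeds


def is_true(word):
    """Am29000 booleans live in bit 31; jmpf branches when that bit is clear."""
    return bool(word & 0x80000000)


def hif_read(stream, nbytes, buffer):
    """Hypothetical model of service 19; returns the pair (gr96, gr121)."""
    try:
        data = stream.read(nbytes)
    except (OSError, ValueError):
        # Assumption: 5 stands in for an implementation-dependent errcode.
        return -1, 5
    buffer[:len(data)] = data  # copy the bytes into the caller's buffer
    return len(data), HIF_TRUE
```

A caller would test the second value first, as the jmpf instruction does in the assembly examples, and only then trust the count: a count of zero with a TRUE status indicates end-of-file and is not an error.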
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   20 (0x14)     Service number
          lr2     fileno        File descriptor
          lr3     buffptr       A pointer to the buffer area
          lr4     nbytes        Number of bytes to be written
Returns:  gr96    count         Success: = lr4
                                Failure: 0 ≤ gr96 < lr4
                                Extreme: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

fd:     .word   0
        .set    BUFSIZE,256
buf:    .block  BUFSIZE
num:    .word   0

        const   gr96,fd         ; set address of file descriptor
        consth  gr96,fd
        load    0,0,lr2,gr96    ; get file descriptor
        const   lr3,buf         ; set buffer address
        consth  lr3,buf
        const   lr4,BUFSIZE     ; specify buffer size
        const   gr121,20        ; service = 20
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,wr_err    ; handle write errors
        const   gr120,num       ; set address of "num" variable
        consth  gr120,num
        store   0,0,gr96,gr120  ; store bytes written

The example above writes BUFSIZE bytes from the buffer located at buf to the file associated with the descriptor stored in fd. If errors are detected during execution of the service, the value returned in gr121 will be FALSE. In this case, the wr_err error handler will be invoked. The number of bytes actually written is stored in the variable num.

Service 21-lseek    Seek a File Byte

Description

This service positions the file associated with the file descriptor in lr2, "offset" number of bytes from the position of the file referred to by the orig parameter. Lr3 contains the number of bytes offset and lr4 contains the value for orig.

The lseek service can be used to reposition the file pointer anywhere in a file. The offset parameter may be either positive or negative. However, it is considered an error to attempt to seek in front of the beginning of the file.

The parameter orig is defined as:
0 = Beginning of the file
1 = Current position of the file
2 = End of the file

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   21 (0x15)     Service number
          lr2     fileno        File descriptor
          lr3     offset        Number of bytes offset from orig
          lr4     orig          A code number indicating the point within
                                the file from which the offset is counted
Returns:  gr96    where         Success: ≥ 0 (current position in the file)
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

fd:     .word   6               ; file descriptor = 6
orig:   .word   0               ; origin = start of file
off:    .word   23              ; offset = 23 bytes

        const   gr96,fd         ; set address of file descriptor
        consth  gr96,fd
        load    0,0,lr2,gr96    ; get file descriptor
        const   gr96,off        ; set address of offset argument
        consth  gr96,off
        load    0,0,lr3,gr96    ; get offset
        const   gr96,orig       ; set address of origin argument
        consth  gr96,orig
        load    0,0,lr4,gr96    ; get origin
        const   gr121,21        ; service = 21
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,seek_err  ; seek error if false
        nop

The above example shows how a file can be positioned to a particular byte address by specifying the orig, which is the starting point from which the file position is adjusted, and the offset, which is the number of bytes from the orig to move the file pointer. In this case, the file identified by file descriptor 6 is being repositioned to byte 23, measured from the beginning of the file (orig = 0).

The file descriptor, offset, and orig values are loaded from preset constants and lseek is called to perform the file positioning operation. If an error occurs when attempting to reposition the file, the value returned in gr121 is not TRUE; it contains an error code that indicates the reason for the error. Upon return, gr96 also contains the file position measured from the beginning of the file. In this case, this value is not stored.
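The position arithmetic behind lseek can be sketched as follows. The function name new_position and the constant names are hypothetical, and returning -1 for every error is a simplification of the "Failure: < 0" convention; a real kernel would also place an error number in gr121.

```python
# orig codes as defined for the lseek service; the names are illustrative only.
SEEK_BEGIN, SEEK_CURRENT, SEEK_END = 0, 1, 2


def new_position(current, length, offset, orig):
    """Return the resulting file position, or -1 if the request is an error."""
    bases = {SEEK_BEGIN: 0, SEEK_CURRENT: current, SEEK_END: length}
    if orig not in bases:
        return -1  # unknown orig code
    target = bases[orig] + offset
    # Seeking in front of the beginning of the file is an error.
    return target if target >= 0 else -1
```

With orig = 0 and offset = 23, as in the example call, the result is byte 23 regardless of the current position.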
Service 22-remove    Remove a File

Description

This service deletes a file from the file system. Lr2 contains a pointer to the pathname of the file. The path must point to an existing file, and the referenced file should not be currently open. The behavior of the remove service is undefined if the file is open.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   22 (0x16)     Service number
          lr2     pathname      A pointer to string that contains the
                                pathname of the file
Returns:  gr96    retval        Success: = 0
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

path:   .ascii  "/usr/jack/files/myfile\0"

        const   lr2,path        ; set address of file pathname
        consth  lr2,path
        const   gr121,22        ; service = 22
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,rem_err   ; jump if error
        nop

In the above example, a file with a UNIX-style pathname stored in the string named path is being removed. The address of (pointer to) the string is put into lr2 and kernel service 22 is called to remove the file. If the file does not exist, or if it has not previously been closed, an error code will be returned in gr121; otherwise, the value in gr121 will be TRUE.

Service 23-rename    Rename a File

Description

This service moves a file to a new location within the file system. Lr2 contains a pointer to the file's old pathname and lr3 contains a pointer to the file's new pathname. When all components of the old and new pathnames are the same, except for the filename, the file is said to have been renamed. The file identified by the old pathname must already exist, or an error code will be returned and the rename operation will not be performed.
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   23 (0x17)     Service number
          lr2     oldfile       A pointer to string containing the old
                                pathname of the file
          lr3     newfile       A pointer to string containing the new
                                pathname of the file
Returns:  gr96    retval        Success: = 0
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

old:    .ascii  "/usr/fred/payroll/report\0"
new:    .ascii  "/usr/fred/history/june89\0"

        const   lr2,old         ; set address of old pathname
        consth  lr2,old
        const   lr3,new         ; set address of new pathname
        consth  lr3,new
        const   gr121,23        ; service = 23 (rename)
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,ren_err   ; jump if rename error
        nop

The above example moves a file from its old path (renaming it in the process) to its new pathname location. The file will no longer be found at the old location.

Service 33-tmpnam    Return Temporary Name

Description

This service generates a string that can be used as a temporary file pathname. A different name is generated each time it is called. Generally, the name is guaranteed not to duplicate any existing filename. The argument passed in lr2 should be a valid pointer to a buffer that is large enough to contain the constructed file name. HIF implementations are required to allocate a minimum of 128 bytes for this purpose.

If the argument in lr2 contains a NULL pointer, the HIF service routine should treat this as an error condition and return a non-zero error number in global register gr121.

The HIF specification sets no standards for the format or content of legal pathnames; these are determined by individual operating system requirements. However, each implementation should undertake to construct a valid filename that is also unique.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   33 (0x21)     Service number
          lr2     addrptr       A pointer to buffer into which the
                                filename is to be stored
Returns:  gr96    filename      Success: pointer to the temporary filename
                                string. This will be the same as lr2 on
                                entry unless an error occurred
                                Failure: = 0 (NULL pointer)
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

fbuf:   .block  21              ; buffer size = 21 bytes

        const   lr2,fbuf        ; set buffer pointer
        consth  lr2,fbuf
        const   gr121,33        ; service = 33
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,tmp_err   ; jump if error
        nop

In the above example, the tmpnam service is called with a pointer to fbuf, which has been allocated to hold a name that is up to 21 bytes in length. If the service is able to construct a valid name, the filename will be stored in fbuf when the service returns. If the content of gr121 on return is not TRUE, the program fragment jumps to tmp_err to handle the error condition.

Service 49-time    Return Seconds Since 1970

Description

This service returns, in register gr96, the number of seconds elapsed since midnight, January 1, 1970, as a 32-bit integer value. It is assumed that the kernel service will have access to a counter, whose contents can be preloaded, that measures time with at least a one-second resolution for this purpose.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   49 (0x31)     Service number
Returns:  gr96    secs          Success: ≠ 0 (time in seconds)
                                Failure: = 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

secs:   .word   0

        const   gr121,49        ; service = 49
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,tim_err   ; jump if error
        const   gr120,secs      ; set the address for storing 'secs'
        consth  gr120,secs
        store   0,0,gr96,gr120  ; store the seconds

In the above example, the kernel service time is being called. If the value returned in gr121 is TRUE, the number of seconds returned in gr96 is stored in the secs variable; otherwise, the program jumps to tim_err to determine the cause of the error.

Service 65-getenv    Get Environment

Description

This service searches the system environment for a string associated with a specified symbol. Lr2 contains a pointer to the symbol name. If the symbol name is found, a pointer to the string associated with it is returned in gr96; otherwise, a NULL pointer is returned.

In UNIX-hosted systems, the setenv command allows a user to associate a symbol with an arbitrary string. For example, the command

        setenv TERM vt100

defines the string "vt100" to be associated with the symbol named TERM. Application programs can use this association to determine the type of terminal connected to the system and, therefore, use the correct set of codes when outputting information to the user's screen. To access the string, getenv should be called with lr2 pointing to a string containing the TERM symbol name. The address returned in gr96 will point to the corresponding "vt100" string if TERM is found. In UNIX-hosted systems, entering a different setenv command lets the user select a different terminal name without requiring recompilation of the application program.

Operating system implementations that do not include provisions for environment variables should always return a NULL value in gr96 when this service is requested.
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   65 (0x41)     Service number
          lr2     name          A pointer to the symbol name
Returns:  gr96    addrptr       Success: pointer to the symbol name string
                                Failure: = 0 (NULL pointer)
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

mysym:  .ascii  "MYSYMBOL\0"
strptr: .word   0

        const   lr2,mysym       ; set address of symbol to be
        consth  lr2,mysym       ; located in environment
        const   gr121,65        ; service = 65
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,env_err   ; jump if error
        const   gr120,strptr    ; set address of string pointer
        consth  gr120,strptr
        store   0,0,gr96,gr120  ; store string pointer

The above example program calls the operating system getenv service to access a string associated with the environment variable MYSYMBOL. If the symbol is found, a pointer to the string associated with the symbol is returned in gr96. If the call is not successful (i.e., gr121 holds a FALSE boolean value upon return), the program jumps to env_err to handle the error condition.

Service 257-sysalloc    Allocate Memory Space

Description

This service allocates a specified number of contiguous bytes from the operating-system-maintained heap and returns a pointer to the base of the allocated block. Lr2 contains the number of bytes requested. If the storage is successfully allocated, gr96 contains a pointer to the block; otherwise, gr121 contains an error code indicating the reason for failure of the call.
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   257 (0x101)   Service number
          lr2     nbytes        Number of bytes requested
Returns:  gr96    addrptr       Success: pointer to allocated bytes
                                Failure: = 0 (NULL pointer)
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

blkptr: .word   0

        const   lr2,1200        ; request 1200 bytes
        const   gr121,257       ; service = 257
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,alloc_err ; jump if error
        const   gr120,blkptr    ; set address to store pointer
        consth  gr120,blkptr
        store   0,0,gr96,gr120  ; store the pointer

The above example requests a block of 1200 contiguous bytes from the system heap. If the call is successful, the program stores the pointer returned in gr96 into a local variable called blkptr. If gr121 contains a boolean FALSE value when the service returns, the program jumps to alloc_err to handle the error condition.

Service 258-sysfree    Free Memory Space

Description

This service returns memory to the system starting at the address specified in lr2. Lr3 contains the number of bytes to be released. The pointer passed to the sysfree service in lr2 and the byte count passed in lr3 must match the address returned by a previous sysalloc service request for the identical number of bytes. No dynamic memory allocation structure is implied by this service. High-level language library functions such as malloc() and free() for the C language are required to manage random dynamic memory block allocation and deallocation, using the sysalloc and sysfree kernel functions as their basis.
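Because sysfree must be given the same byte count that was passed to the matching sysalloc request, a library layer has to remember each block's size. The Python sketch below shows one way such bookkeeping could look; it is not how any particular C library does it. HeapShim and FakeKernel are hypothetical names, and FakeKernel is only a stand-in for the two kernel services (a simple bump allocator that records what was freed).

```python
class FakeKernel:
    """Stand-in for the HIF kernel: a bump allocator that logs frees."""

    def __init__(self, base=0x1000, limit=0x2000):
        self.next, self.limit, self.freed = base, limit, []

    def sysalloc(self, nbytes):
        if self.next + nbytes > self.limit:
            return 0  # NULL pointer: request cannot be satisfied
        addr, self.next = self.next, self.next + nbytes
        return addr

    def sysfree(self, addr, nbytes):
        self.freed.append((addr, nbytes))
        return 0  # retval = 0 on success


class HeapShim:
    """Remembers block sizes so free() can pass the matching byte count."""

    def __init__(self, sysalloc, sysfree):
        self._sysalloc = sysalloc
        self._sysfree = sysfree
        self._sizes = {}

    def malloc(self, nbytes):
        addr = self._sysalloc(nbytes)
        if addr != 0:
            self._sizes[addr] = nbytes
        return addr

    def free(self, addr):
        nbytes = self._sizes.pop(addr, None)
        if nbytes is None:
            raise ValueError("address was not returned by malloc")
        return self._sysfree(addr, nbytes)
```

A production allocator would normally request large regions from sysalloc and subdivide them itself; the one-to-one mapping used here is chosen only to keep the size-matching rule visible.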
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   258 (0x102)   Service number
          lr2     addrptr       Starting address of area returned
          lr3     nbytes        Number of bytes to release
Returns:  gr96    retval        Success: = 0
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

blkptr: .word   0

        const   gr120,blkptr    ; set address of previously
        consth  gr120,blkptr    ; allocated block pointer
        load    0,0,lr2,gr120   ; fetch pointer to block
        const   lr3,1200        ; set number of bytes to release
        const   gr121,258       ; service = 258
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,free_err  ; jump if error
        nop

The above example calls sysfree to deallocate 1200 bytes of contiguous memory, beginning at the address stored in the blkptr variable. If the call is successful, the program continues; otherwise, if the return value in gr121 is FALSE, the program jumps to free_err to handle the error condition.

Service 259-getpsize    Return Memory Page Size

Description

This service returns, in register gr96, the page size, in bytes, used by the memory system of the HIF implementation.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   259 (0x103)   Service number
Returns:  gr96    pagesize      Success: memory page size, one of the
                                following: 1024, 2048, 4096, and 8192
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

pagsiz: .word   0

        const   gr121,259       ; service = 259
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,pag_err   ; jump if error
        const   gr120,pagsiz    ; set address to store the page size
        consth  gr120,pagsiz
        store   0,0,gr96,gr120  ; store it!

The above example calls the operating system kernel to return the page size used by the virtual memory system. If the call was successful, gr121 will contain a boolean TRUE result and the program will store the value in gr96 into the pagsiz variable; otherwise, a boolean FALSE is returned in gr121.
In this case, the program will jump to pag_err to handle the error condition.

Service 260-getargs    Return Base Address

Description

This service returns the base address of the command-line-argument vector argv in register gr96, as constructed by the operating system kernel when an application program is invoked.

Arguments are stored by the operating system as a series of NULL-terminated character strings. A pointer containing the address of each string is stored in an array whose base address (referred to as argv) is returned by the getargs HIF service. The last entry in the array contains a NULL pointer (an address consisting of all zero bits). The number of arguments can be computed by counting the number of pointers in the array, using the fact that the NULL pointer terminates the list.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   260 (0x104)   Service number
Returns:  gr96    baseaddr      Success: base address of argv
                                Failure: = 0 (NULL pointer)
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

argptr: .word   0

        const   gr121,260       ; service = 260
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,bas_err   ; jump if error
        const   gr120,argptr    ; set address where base pointer
        consth  gr120,argptr    ; is to be stored
        store   0,0,gr96,gr120  ; store the pointer

The above example calls operating system service 260 to access the command-line-argument vector address. If the service executes without error, the program continues by storing the argument vector address in the variable argptr. If gr121 contains a boolean FALSE value upon return, the program jumps to bas_err to handle the error condition.

Service 273-clock    Return Time in Milliseconds

Description

This service returns the elapsed processor time in milliseconds. Operating system initialization procedures set this value to zero on startup.
Successive calls to this service return times that can be arithmetically subtracted to accurately measure time intervals.

Register Usage

Type      Regs    Contents      Description
Calling:  gr121   273 (0x111)   Service number
Returns:  gr96    msecs         Success: ≠ 0 (time in milliseconds)
                                Failure: = 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

time:   .word   0

        const   gr121,273       ; service = 273
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,clk_err   ; jump if error
        const   gr120,time      ; set the address where time
        consth  gr120,time      ; is to be stored
        store   0,0,gr96,gr120  ; store the time in ms.

The above example calls the operating system kernel to get the current value of the system clock in milliseconds. On return, if gr121 contains a boolean FALSE value, the program jumps to clk_err to handle the error; otherwise, the time in milliseconds is stored in the variable time.

Service 274-cycles    Return Processor Cycles

Description

This service returns an ascending positive number in registers gr96 and gr97 that is the number of processor cycles that have elapsed since the last hardware RESET was applied to the CPU. It provides a mechanism for user programs to access the contents of the internal Am29000 timer counter register. The cycle count can be multiplied by the processor clock period (equivalently, divided by the clock frequency) to convert it to a time value.

Gr97 will contain the most significant bits of the cycle count, while gr96 will contain the least significant bits. HIF implementations of this service are required to provide a cycle count with a minimum of 56 bits of precision.
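The two-register result of the cycles service can be combined and converted to a time interval as sketched below. The function names are hypothetical, the clock frequency is a caller-supplied value, and the wraparound handling assumes the counter is exactly 56 bits wide, which the specification states only as a minimum.

```python
MASK_56 = (1 << 56) - 1  # assumption: counter is exactly 56 bits wide


def cycle_count(gr96, gr97):
    """Combine the LSBs (gr96) and MSBs (gr97) into a single integer."""
    return ((gr97 << 32) | (gr96 & 0xFFFFFFFF)) & MASK_56


def elapsed_seconds(start, end, clock_hz):
    """Interval between two counts, tolerating one wrap of the counter."""
    return ((end - start) & MASK_56) / clock_hz
```

For example, at an assumed clock rate of 25 MHz, a difference of 25,000,000 cycles between two readings corresponds to one second.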
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   274 (0x112)   Service number
Returns:  gr96    cycles        Success: Bits 0-31 of processor cycles
                                Failure: = 0 (in both gr96 and gr97)
          gr97    cycles        Success: Bits 32-55 of processor cycles
                                Failure: = 0 (in both gr96 and gr97)
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

cycles: .word   0               ; MSBs of cycles
        .word   0               ; LSBs of cycles

        const   gr121,274       ; service = 274
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,cyc_err   ; jump if error
        const   gr120,cycles    ; set the address where the
        consth  gr120,cycles    ; count is to be stored
        store   0,0,gr97,gr120  ; store the MSBs,
        add     gr120,gr120,4   ; increment the address, then
        store   0,0,gr96,gr120  ; store the LSBs of cycles.

The above example program fragment calls operating system service 274 to access the number of CPU cycles that have elapsed since it was powered on. The cycle count (in gr96 and gr97) is stored in the two words addressed by the variable cycles if the service call is successful. If gr121 contains a boolean FALSE value on exit, the program jumps to cyc_err to handle the error condition.

Service 289-setvec    Set User Trap Address

Description

This service sets the address for user-level trap handler services that implement the local register stack spill and fill traps. It returns an indication of success or failure in register gr96. The method used to invoke these traps in user mode is described on page 6 of this specification, in the "User-Mode Traps" section.
Register Usage

Type      Regs    Contents      Description
Calling:  gr121   289 (0x121)   Service number
          lr2     trapno        Trap number
          lr3     funaddr       Address of user trap handler
Returns:  gr96    retval        Success: = 0
                                Failure: < 0
          gr121   0x80000000    Logical TRUE, service successful
                  errcode       Error number, service not successful
                                (implementation dependent)

Example Call

trpadr: .word   0

        const   lr2,64          ; trap number = 64
        const   lr3,t64_hnd     ; set address of trap-64 handler
        consth  lr3,t64_hnd
        const   gr121,289       ; service = 289
        asneq   69,gr1,gr1      ; call the OS
        jmpf    gr121,vec_err   ; jump if error
        const   gr120,trpadr    ; set address where to store
        consth  gr120,trpadr    ; the trap address
        store   0,0,gr96,gr120  ; and store it!

The above example calls the setvec service to pass the address to be used for the trap 64 trap handler routine. If the service returns with gr121 containing a boolean TRUE result, the program continues by storing the trap address returned in gr96; otherwise, the program jumps to vec_err to handle the error condition.

PROCESS ENVIRONMENT

There are standard memory and register initializations that must be performed by a HIF-conforming kernel before entry to a user program. In C-language programs, this is usually performed by the module crt0. This module receives control when an application program is invoked, and executes prior to invocation of the user's main function. Other high-level languages have similar modules.

STARTUP INITIALIZATION

Initialization procedures must establish appropriate values for the general registers mentioned below. In addition, file descriptors for the standard input and output devices must be opened.

Register Stack Pointer (gr1)

The register stack pointer (RSP) register contains the main memory address in which the local register lr0 will be saved, and from which it will be restored. The content of RSP is compared to the content of RAB to determine when it is necessary to spill part of the local register stack to memory.

Memory Stack Pointer (gr125)

The memory stack pointer (MSP) register points to the top of the memory stack, or the lowest-addressed entry on the memory stack. This register must be preserved (or, more conventionally, restored).

Register Allocate Bound (gr126)

The register allocate bound (RAB) register contains the register stack address of the lowest-addressed word contained within the register file. RAB is referenced in the prolog of most user program functions to determine whether a register spill operation is necessary to accommodate the local register requirements of the called function.

Register Free Bound (gr127)

The register free bound (RFB) register contains the register stack address of the lowest-addressed word not contained within the register file (and greater than RAB). RFB is referenced in the epilog of most user program functions to determine whether a register fill operation is necessary to restore previously spilled registers needed by the function's caller.

On startup, the values in RAB, RSP, and RFB should be initialized to prevent a spill trap from occurring on entry to the crt0 code, as shown by the following relation:

        (RAB + 256) ≤ RSP ≤ RFB

This provides the crt0 code with at least 64 registers on entry, which should be a sufficient number to accomplish its purpose.

Open File Descriptors

File descriptor 0 (corresponding to the standard input device) must be opened for text mode input. File descriptors 1 and 2 (corresponding to standard output and standard error devices) must be opened for text mode output prior to entry to the user's program.

PROGRAM TERMINATION

The only valid way for an application to terminate execution is by calling the exit service. Most high-level languages provide this capability, even if the programmer does not explicitly invoke a corresponding library function.

TRAP HANDLERS

The trap vector entries shown in Table 5 must be installed, and corresponding handlers must be provided.
Table 5. Trap Handler Vectors

Trap  Description
32    MULTIPLY
33    DIVIDE
34    MULTIPLU
35    DIVIDU
36    CONVERT
42    FEQ
43    DEQ
44    FGT
45    DGT
46    FGE
47    DGE
48    FADD
49    DADD
50    FSUB
51    DSUB
52    FMUL
53    DMUL
54    FDIV
55    DDIV
64    Spill (Set up by the user's task through a setvec call)
65    Fill (Set up by the user's task through a setvec call)
69    HIF System Call

Note: The Spill (64) and Fill (65) traps are returned to the user's code to perform the trap handling functions in user mode, as described in the "User Mode Traps" section.

APPENDIX A: HIF QUICK REFERENCE

Table 6 lists the HIF service calls, calling parameters, and the returned values. If a column entry is blank, it means the register is not used or is undefined. Table 7 describes the parameters given in Table 6.

Table 6. HIF Service Calls

          Service  |------ Calling Parameters ------|  |---------- Returned Values ----------|
Title     (GR121)  LR2        LR3        LR4          GR96         GR97         GR121
exit      1        exitcode
open      17       pathname   mode       pflag        fileno                    errcode
close     18       fileno                             retval                    errcode
read      19       fileno     buffptr    nbytes       count                     errcode
write     20       fileno     buffptr    nbytes       count                     errcode
lseek     21       fileno     offset     orig         where                     errcode
remove    22       pathname                           retval                    errcode
rename    23       oldfile    newfile                 retval                    errcode
tmpnam    33       addrptr                            filename                  errcode
time      49                                          secs                      errcode
getenv    65       name                               addrptr                   errcode
sysalloc  257      nbytes                             addrptr                   errcode
sysfree   258      addrptr    nbytes                  retval                    errcode
getpsize  259                                         pagesize                  errcode
getargs   260                                         baseaddr                  errcode
clock     273                                         msecs                     errcode
cycles    274                                         LSBs cycles  MSBs cycles  errcode
setvec    289      trapno     funaddr                 retval                    errcode

Table 7. Service Call Parameters

Parameter  Description
addrptr    A pointer to an allocated memory area, a command-line-argument array, a pathname buffer, or a NULL-terminated environment variable name string.
baseaddr   The base address of the command-line-argument vector.
buffptr    A pointer to the buffer area where data is to be read from or written to during the execution of I/O services.
count      The number of bytes actually read from a file or written to a file.
cycles     The number of processor cycles (returned value).
errcode    The error code returned by the service. These are usually the same as the codes returned in the UNIX errno variable. See Appendix B, Table 8, for a list of HIF error codes.
exitcode   The exit code of the application program.
filename   A pointer to a NULL-terminated ASCII string that contains the directory path of a temporary filename.
fileno     The file descriptor, which is a small integer number. File descriptors 0, 1, and 2 are guaranteed to exist and correspond to open files on program entry (0 refers to the UNIX equivalent of stdin and is opened for input; 1 refers to the UNIX stdout, and is opened for output; 2 refers to the UNIX stderr, and is opened for output).
funaddr    A pointer to the address of a function.
mode       A series of option flags whose values represent the operation to be performed.
msecs      Milliseconds.
name       A pointer to a NULL-terminated ASCII string that contains an environment variable name.
nbytes     The number of data bytes requested to be read from or written to a file, or the number of bytes to allocate from the heap.
newfile    A pointer to a NULL-terminated ASCII string that contains the directory path of a new filename.
offset     The number of bytes from a specified position (orig) in a file.
oldfile    A pointer to a NULL-terminated ASCII string that contains the directory path of the old filename.
orig       A value of 0, 1, or 2 that refers to the beginning, the current position, or the position of the end of a file.
pagesize   The memory page size in bytes (returned value).
pathname   A pointer to a NULL-terminated ASCII string that contains the directory path of a filename.
pflag      The UNIX file access permission codes.
retval     The return value that indicates success or failure.
secs       The seconds count returned.
trapno     The trap number.
where      The current position in a specified file.

APPENDIX B: ERROR NUMBERS

HIF implementations are required to return error codes when a requested operation is not possible. The codes from 0 to 255 are reserved for compatibility with current and future error return standards. The currently assigned codes and their meanings are shown in Table 8. If a HIF implementation returns an error code in the range of 0 to 255, it must carry the identical meaning to the corresponding error code in this table. Error code values larger than 255 are available for implementation-specific errors.

Table 8. HIF Error Numbers Assigned

Number  Error Name  Description

 0  Not used.
 1  EPERM  Not owner
    This error indicates an attempt to modify a file in some way forbidden except to its owner.
 2  ENOENT  No such file or directory
    This error occurs when a file name is specified and the file should exist but does not, or when one of the directories in a path name does not exist.
 3  ESRCH  No such process
    The process or process group whose number was given does not exist, or any such process is already dead.
 4  EINTR  Interrupted system call
    This error indicates that an asynchronous signal (such as interrupt or quit) that the user has elected to catch occurred during a system call.
 5  EIO  I/O error
    Some physical I/O error occurred during a read or write. This error may in some cases occur on a call following the one to which it actually applies.
 6  ENXIO  No such device or address
    I/O on a special file refers to a sub-device that does not exist or is beyond the limits of the device.
 7  E2BIG  Arg list is too long
    An argument list longer than 5120 bytes is presented to execve.
 8  ENOEXEC  Exec format error
    A request is made to execute a file that, although it has the appropriate permissions, does not start with a valid magic number.
 9  EBADF  Bad file number
    Either a file descriptor refers to no open file, or a read (write) request is made to a file that is open only for writing (reading).
10  ECHILD  No children
    Wait and the process has no living or unwaited-for children.
11  EAGAIN  No more processes
    In a fork, the system's process table is full, or the user is not allowed to create any more processes.
12  ENOMEM  Not enough memory
    During an execve or break, a program asks for more memory than the system is able to supply or else a process size limit would be exceeded.
13  EACCESS  Permission denied
    An attempt was made to access a file in a way forbidden by the protection system.
14  EFAULT  Bad address
    The system encountered a hardware fault in attempting to access the arguments of a system call.
15  ENOTBLK  Block device required
    A plain file was mentioned where a block device was required, such as in mount.
16  EBUSY  Device busy
    An attempt was made to mount a device that was already mounted, or an attempt was made to dismount a device on which there is an active file (open file, current directory, mounted-on file, or active text segment).
17  EEXIST  File exists
    An existing file was mentioned in an inappropriate context, e.g., link.
18  EXDEV  Cross-device link
    A hard link to a file on another device was attempted.
19  ENODEV  No such device
    An attempt was made to apply an inappropriate system call to a device, e.g., to read a write-only device, or the device is not configured by the system.
20  ENOTDIR  Not a directory
    A non-directory was specified where a directory is required, for example, in a path name or as an argument to chdir.
21  EISDIR  Is a directory
    An attempt to write on a directory.
22  EINVAL  Invalid argument
    This error occurs when some invalid argument for the call is specified. For example, dismounting a non-mounted device, mentioning an unknown signal in signal, or specifying some other argument that is inappropriate for the call.
23  ENFILE  File table overflow
    The system's table of open files is full, and temporarily no more open requests can be accepted.
24  EMFILE  Too many open files
    The configuration limit on the number of simultaneously open files has been exceeded.
25  ENOTTY  Not a typewriter
    The file mentioned in stty or gtty is not a terminal or one of the other devices to which these calls apply.
26  ETXTBSY  Text file busy
    The referenced text file is busy and the current request cannot be honored.
27  EFBIG  File too large
    The size of a file exceeded the maximum limit.
28  ENOSPC  No space left on device
    A write to an ordinary file, the creation of a directory or symbolic link, or the creation of a directory entry failed because no more disk blocks are available on the file system.
29  ESPIPE  Illegal seek
    A seek was issued to a socket or pipe. This error may also be issued for other non-seekable devices.
30  EROFS  Read-only file system
    An attempt to modify a file or directory was made on a device mounted read-only.
31  EMLINK  Too many links
    An attempt was made to establish a new link to the requested file and the limit of simultaneous links has been exceeded.
32  EPIPE  Broken pipe
    A write on a pipe or socket was attempted for which there is no process to read the data. This condition normally generates a signal; the error is returned if the signal is caught or ignored.
33  EDOM  Argument too large
    The argument of a function in the math package is out of the domain of the function.
34  ERANGE  Result too large
    The value of a function in the math package is unrepresentable within machine precision.
35  EWOULDBLOCK  Operation would block
    An operation that would cause a process to block was attempted on an object in non-blocking mode.
36  EINPROGRESS  Operation now in progress
    An operation that takes a long time to complete was attempted on a non-blocking object.
37  EALREADY  Operation already in progress
    An operation was attempted on a non-blocking object that already had an operation in progress.
38  ENOTSOCK  Socket operation on non-socket
    A socket-oriented operation was attempted on a non-socket device.
39  EDESTADDRREQ  Destination address required
    A required address was omitted from an operation on a socket.
40  EMSGSIZE  Message too long
    A message sent on a socket was larger than the internal message buffer or some other network limit.
41  EPROTOTYPE  Protocol wrong type for socket
    A protocol was specified that does not support the semantics of the socket type requested.
42  ENOPROTOOPT  Option not supported by protocol
    A bad option or level was specified when accessing socket options.
43  EPROTONOSUPPORT  Protocol not supported
    The protocol has not been configured into the system, or no implementation for it exists.
44  ESOCKTNOSUPPORT  Socket type not supported
    The support for the socket type has not been configured into the system, or no implementation for it exists.
45  EOPNOTSUPP  Operation not supported on socket
    For example, trying to accept a connection on a datagram socket.
46  EPFNOSUPPORT  Protocol family not supported
    The protocol family has not been configured into the system or no implementation for it exists.
47  EAFNOSUPPORT  Address family not supported by protocol family
    An address was used that is incompatible with the requested protocol.
48  EADDRINUSE  Address already in use
    Only one usage of each address is normally permitted.
49  EADDRNOTAVAIL  Cannot assign requested address
    This normally results from an attempt to create a socket with an address not on this machine.
50  ENETDOWN  Network is down
    A socket operation encountered a dead network.
51  ENETUNREACH  Network is unreachable
    A socket operation was attempted to an unreachable network.
52  ENETRESET  Network dropped connection on reset
    The host you were connected to crashed and rebooted.
53  ECONNABORTED  Software caused connection abort
    A connection abort was caused internal to your host machine.
54  ECONNRESET  Connection reset by peer
    A connection was forcibly closed by a peer. This normally results from a loss of the connection on the remote socket due to a timeout or a reboot.
55  ENOBUFS  No buffer space available
    An operation on a socket or pipe was not performed because the system lacked sufficient buffer space or because a queue was full.
56  EISCONN  Socket is already connected
    A connect request was made on an already connected socket; or a sendto or sendmsg request on a connected socket specified a destination when already connected.
57  ENOTCONN  Socket is not connected
    A request to send or receive data was disallowed because the socket was not connected and (when sending on a datagram socket) no address was supplied.
58  ESHUTDOWN  Cannot send after socket shutdown
    A request to send data was disallowed because the socket had already been shut down with a previous shutdown call.
59  ETOOMANYREFS  Too many references; cannot splice.
60  ETIMEDOUT  Connection timed out
    A connect or send request failed because the connected party did not properly respond after a period of time. (The timeout period is dependent on the communication protocol.)
61  ECONNREFUSED  Connection refused
    No connection could be made because the target machine actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host.
62  ELOOP  Too many levels of symbolic links
    A pathname lookup involved more than the maximum limit of symbolic links.
63  ENAMETOOLONG  File name too long
    A component of a pathname exceeded the maximum name length, or an entire path name exceeded the maximum path length.
64  EHOSTDOWN  Host is down
    A socket operation failed because the destination host was down.
65  EHOSTUNREACH  Host is unreachable
    A socket operation was attempted to an unreachable host.
66  ENOTEMPTY  Directory not empty
    A non-empty directory was supplied to a remove directory or rename call.
67  EPROCLIM  Too many processes
    The limit of the total number of processes has been reached. No new processes can be created.
68  EUSERS  Too many users
    The limit of the total number of users has been reached. No new users may access the system.
69  EDQUOT  Disk quota exceeded
    A write to an ordinary file, the creation of a directory or symbolic link, or the creation of a directory entry failed because the user's quota of disk blocks was exhausted; or the allocation of an inode for a newly created file failed because the user's quota of inodes was exhausted.
70  EVDBAD  RVD related disk error

CHAPTER 4
General Information

Table of Contents
  Related Literature
  Package Outlines

RELATED LITERATURE

Additional Support Literature

The following is a list of AMD 29K Family literature that can be ordered from your local AMD Sales Representative or the Literature Distribution Center at (800) 222-9323, extension 5000; inside California, call (408) 749-5000. Technical and marketing information concerning the 29K Family also can be obtained by calling the 29K Hotline at (800) 2929-AMD.

Order No.
09548   Am29000 Article Reprint Brochure
10344   Am29000 Family Overview Brochure
10345   29K Support Products Brochure
10620   Am29000 User's Manual
10621   Am29000 Performance Analysis Brochure
10623   Am29000 Memory Design Handbook
11426   Fusion 29K Catalog
11852   Am29027 Handbook

PACKAGE OUTLINES*

CGX169 (bottom view; package drawing not reproduced)
CQ164 (top view; package drawing not reproduced)

*For reference only. All dimensions are measured in inches. BSC is an ANSI standard for Basic Space Centering.

North American

ALABAMA ............................ (205) 882-9122
ARIZONA ............................ (602) 242-4400
CALIFORNIA,
  Culver City ...................... (213) 645-1524
  Newport Beach .................... (714) 752-6262
  Roseville ........................ (916) 786-6700
  [two entries illegible]
  Woodland Hills ................... (818) 992-4155
CANADA, Ontario,
  [two entries illegible]
COLORADO ........................... (303) 741-2900
CONNECTICUT ........................ (203) 264-7800
FLORIDA,
  Clearwater ....................... (813) 530-9971
  Ft. Lauderdale ................... (305) 776-2001
  Orlando (Casselberry) ............ (407) 830-8100
GEORGIA ............................ (404) 449-7920
ILLINOIS,
  Chicago (Itasca) ................. (312) 773-4422
  Naperville ....................... (312) 505-9517
[two entries illegible]
MASSACHUSETTS ...................... (617) 273-3970
MICHIGAN ........................... (313) 347-1522
MINNESOTA .......................... (612) 938-0001
NEW JERSEY,
  [entries illegible]
NEW YORK,
  [entries illegible]
NORTH CAROLINA ..................... (919) 878-8111
OHIO,
  Columbus (Westerville) ........... (614) 891-6455
  Dayton ........................... (513) 439-0470
OREGON ............................. (503) 245-0080
PENNSYLVANIA ....................... (215) 398-8006
SOUTH CAROLINA ..................... (803) 772-6760
TEXAS,
  Austin ........................... (512) 346-7830
  Dallas ........................... (214) 934-9099
  Houston .......................... (713) 785-9001

International

BELGIUM, Bruxelles ......... TEL (02) 771-91-42    FAX (02) 762-37-12    TLX 846-61028
FRANCE, Paris .............. TEL (1) 49-75-10-10   FAX (1) 49-75-10-13   TLX 263282F
WEST GERMANY,
  Hannover area ............ TEL (0511) 736085     FAX (0511) 721254     TLX 922850
  München .................. TEL (089) 4114-0      FAX (089) 406490      TLX 523883
  Stuttgart ................ TEL (0711) 62 3377    FAX (0711) 625187     TLX 721882
HONG KONG, Wanchai ......... TEL 852-5-8654525     FAX 852-5-8654335     TLX 67955AMDAPHX
ITALY, Milan ............... TEL (02) 3390541, (02) 3533241    FAX (02) 3498000    TLX 843-315286
JAPAN,
  Kanagawa ................. [illegible]
  Tokyo .................... TEL (03) 345-8241     FAX (03) 342-5196     TLX J24064AMDTKOJ
  Osaka .................... TEL 06-243-3250       FAX 06-243-3253
KOREA, Seoul ............... TEL 822-784-0030      FAX 822-784-8014
LATIN AMERICA,
  Ft. Lauderdale ........... TEL (305) 484-8600    FAX (305) 485-9736    TLX 5109554261 AMDFTL
NORWAY, Hovik .............. TEL (03) 010156       FAX (02) 591959       TLX 19079HBCN
SINGAPORE .................. TEL 65-3481188        FAX 65-3480161        TLX 55650 AMDMMI
SWEDEN,
  Stockholm (Sundbyberg) ... TEL (08) 733 03 50    FAX (08) 733 22 85    TLX 11602
TAIWAN ..................... TEL 886-2-7213393     FAX 886-2-7723422     TLX 886-2-7122066
UNITED KINGDOM,
  Manchester area (Warrington) ... TEL (0925) 828008    FAX (0925) 827693    TLX 851-628524
  London area (Woking) ........... TEL (0483) 740440    FAX (0483) 756196    TLX 851-859103

North American Representatives

CANADA
  Burnaby, B.C. - DAVETEK MARKETING ........................... (604) 430-3680
  Calgary, Alberta - DAVETEK MARKETING ........................ (403) 291-4984
  Kanata, Ontario - VITEL ELECTRONICS ......................... (613) 592-0060
  Mississauga, Ontario - VITEL ELECTRONICS .................... (416) 676-9720
  Lachine, Quebec - VITEL ELECTRONICS ......................... (514) 636-5951
IDAHO
  INTERMOUNTAIN TECH MKTG, INC ................................ (208) 888-6071
ILLINOIS
  HEARTLAND TECH MKTG, INC .................................... (312) 577-9222
INDIANA
  Huntington - ELECTRONIC MARKETING CONSULTANTS, INC .......... (317) 921-3450
  Indianapolis - ELECTRONIC MARKETING CONSULTANTS, INC ........ (317) 921-3450
IOWA
  LORENZ SALES ................................................ (319) 377-4666
KANSAS
  Merriam - LORENZ SALES ...................................... (913) 384-6556
  Wichita - LORENZ SALES ...................................... (316) 721-0500
KENTUCKY
  ELECTRONIC MARKETING CONSULTANTS, INC ....................... (317) 921-3452
MICHIGAN
  Birmingham - MIKE RAICK ASSOCIATES .......................... (313) 644-5040
  Holland - COM-TEK SALES, INC ................................ (616) 399-7273
  Novi - COM-TEK SALES, INC ................................... (313) 344-1409
MISSOURI
  LORENZ SALES ................................................ (314) 997-4558
NEBRASKA
  LORENZ SALES ................................................ (402) 475-4660
NEW MEXICO
  THORSON DESERT STATES ....................................... (505) 293-8555
NEW YORK
  East Syracuse - NYCOM, INC .................................. (315) 437-8343
  Woodbury - COMPONENT CONSULTANTS, INC ....................... (516) 364-8020
OHIO
  Centerville - DOLFUSS ROOT & CO ............................. (513) 433-6776
  Columbus - DOLFUSS ROOT & CO ................................ (614) 885-4844
  Strongsville - DOLFUSS ROOT & CO ............................ (216) 238-0300
PENNSYLVANIA
  DOLFUSS ROOT & CO ........................................... (412) 221-4420
PUERTO RICO
  COMP REP ASSOC, INC ......................................... (809) 746-6550
UTAH
  R2 MARKETING ................................................ (801) 595-0631
WASHINGTON
  ELECTRA TECHNICAL SALES ..................................... (206) 821-7442
WISCONSIN
  HEARTLAND TECH MKTG, INC .................................... (414) 792-0920

Advanced Micro Devices reserves the right to make changes in its product without notice in order to improve design or performance characteristics. The performance characteristics listed in this document are guaranteed by specific tests, guard banding, design and other practices common to the industry. For specific testing details, contact your local AMD sales representative. The company assumes no responsibility for the use of any circuits described herein.

Advanced Micro Devices, Inc.
901 Thompson Place, P.O. Box 3453, Sunnyvale, CA 94088, USA
Tel: (408) 732-2400 • TWX: 910-339-9280 • TELEX: 34-6306 • TOLL FREE: (800) 538-8450
APPLICATIONS HOTLINE TOLL FREE: (800) 222-9323 • (408) 749-5703
© 1989 Advanced Micro Devices, Inc. 8/9189 Printed in USA