TEXAS INSTRUMENTS

Digital Signal Processing
Applications with the TMS320 Family
Volume 3

1990

Digital Signal Processor Products

Edited by
Panos Papamichalis, Ph.D.
Digital Signal Processing
Semiconductor Group
Texas Instruments

TEXAS INSTRUMENTS

IMPORTANT NOTICE
Texas Instruments (TI) reserves the right to make changes to or to discontinue
any semiconductor product or service identified in this publication without notice.
TI advises its customers to obtain the latest version of the relevant information
to verify, before placing orders, that the information being relied upon is current.
TI warrants performance of its semiconductor products to current specifications
in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Unless mandated by government requirements, specific testing of all parameters of
each device is not necessarily performed.
TI assumes no liability for TI applications assistance, customer product design,
software performance, or infringement of patents or services described herein.
Nor does TI warrant or represent that license, either express or implied, is granted
under any patent right, copyright, mask work right, or other intellectual property
right of TI covering or relating to any combination, machine, or process in which
such semiconductor products or services might be or are used.
TRADEMARKS
ADI and AutoCAD are trademarks of Autodesk, Inc.
Apollo and Domain are trademarks of Apollo Computer, Inc.
ATVista is a trademark of Truevision, Inc.
CodeView, MS-Windows, MS, and MS-DOS are trademarks of Microsoft Corp.
DEC, DigitalDX, VAX, VMS, and Ultrix are trademarks of Digital Equipment Corp.
DGIS is a trademark of Graphic Software Systems, Inc.
EPIC, XDS, TIGA, and TIGA-340 are trademarks of Texas Instruments, Inc.
GEM is a trademark of Digital Research, Inc.
GSS*CGI is a trademark of Graphic Software Systems, Inc.
HPGL is a registered trademark of Hewlett-Packard Co.
Macintosh and MPW are trademarks of Apple Computer Corp.
NEC is a trademark of NEC Corp.
PC-DOS, PGA, and Micro Channel are trademarks of IBM Corp.
PEPPER is a registered trademark of Number Nine Computer Corp.
PM is a trademark of Microsoft Corp.
PostScript is a trademark of Adobe Systems, Inc.
RTF is a trademark of Microsoft Corp.
Sony is a trademark of Sony Corp.
Sun 3, Sun Workstation, Sun View, Sun Windows, and SPARC are trademarks of
Sun Microsystems, Inc.
UNIX is a registered trademark of AT&T Bell Laboratories.
Copyright © 1990, Texas Instruments Incorporated

CONTENTS
FOREWORD
PREFACE
PART I. INTRODUCTION
1. The TMS320 Family and Book Overview
2. The TMS320 Family of Digital Signal Processors
(Kun-Shan Lin, Gene A. Frantz, and Ray Simar, Jr., reprinted from PROCEEDINGS OF THE IEEE,
Vol. 75, No. 9, September 1987)
3. The TMS320C30 Floating-Point Digital Signal Processor
(Panos Papamichalis and Ray Simar, Jr., reprinted from IEEE Micro Magazine, Vol. 8, No. 6,
December 1988)
PART II. DIGITAL SIGNAL PROCESSING ROUTINES
4. An Implementation of FFT, DCT, and Other Transforms on the TMS320C30
(Panos Papamichalis)
5. Doublelength Floating-Point Arithmetic on the TMS320C30
(Al Lovrich)
6. An 8 x 8 Discrete Cosine Transform Implementation on the TMS320C25 or the TMS320C30
(William Hohl)
7. Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30
(Sen Kuo, Chein Chen)
8. A Collection of Functions for the TMS320C30
(Gary Sitton)
PART III. DIGITAL SIGNAL PROCESSING INTERFACE TECHNIQUES
9. TMS320C30 Hardware Applications
(Jon Bradley)
10. TMS320C30-IEEE Floating-Point Format Converter
(Randy Restle and Adam Cron)
PART IV. TELECOMMUNICATIONS
11. Implementation of a CELP Speech Coder for the TMS320C30 Using SPOX
(Mark D. Grosen)
PART V. COMPUTERS
12. A DSP-Based Three-Dimensional Graphics System
(Nat Seshan)
PART VI. TOOLS
13. The TMS320C30 Applications Board Functional Description
(Tony Coomes and Nat Seshan)
TMS320 BIBLIOGRAPHY

Foreword

Much has happened in the TMS320 Family since Volume 1 of Digital Signal Processing
Applications with the TMS320 Family was published, and Volumes 2 and 3 are a timely update to
the family history.
The DSP microcomputers keep changing the perspective of the systems designers by offering more computational power and better interfacing capabilities. The steps of change are coming
more quickly, and the potential impact is greater and greater. Because things change so rapidly in
this area, there is a pressing need for ways to quickly learn how to utilize the new technology. These
new volumes respond to that need.
As with Volume 1, the purpose of these books is to teach us about the issues and techniques
that are important in implementing digital signal processing systems using microprocessors in the
TMS320 Family. Volume 2 highlights the TMS320C25; and Volume 3, the TMS320C30 chip. A
large part of the books is devoted to such matters as characteristics of the TMS320C25 and
TMS320C30 chips, useful program code for implementing special DSP functions, and details on
interfacing the new chips to external devices. The remainder of the books illustrates how these
chips can be used in communications, control, and computer graphics applications.
What these two volumes make clear is how remarkably fast the field of DSP microcomputing
is evolving. IC technologists and designers are simply packing more and more of the right kind of
computing power into affordable microprocessor chips. The high-speed floating-point computing
power and huge address spaces of chips like the TMS320C30 open the door to a whole new class
of applications that were difficult or impractical with earlier generations of fixed-point DSP chips.
The signal processing theorists and system designers are clearly being challenged to match the creativity of the chip designers.
The present books differ from Volume 1 in the inclusion of a small section on tools. This is
a hopeful sign, because it is progress in this area that is likely to have the greatest impact on speeding
the widespread application of DSP microprocessors. While useful design tools are beginning to
emerge, much more can be done to help system designers manage the complexity of sophisticated
DSP systems, which often involve a unique combination of theory, numerical and symbolic processing algorithms, real-time programming, and multiprocessing. No doubt future volumes of Digital Signal Processing Applications with the TMS320 Family will have more to say about this important topic. Until then, Volumes 2 and 3 have much useful information to help system designers
keep up with the TMS320 Family.
Ronald W. Schafer
Atlanta, Georgia
November 14, 1989


Preface
The newer, floating-point DSP devices, such as the TMS320C30, have brought an added dimension to DSP applications. With the TMS320C30, programming is much easier because the designer does not have to worry about dynamic range and accuracy issues. An algorithm implemented
in floating-point in a high-level language can be easily ported to such a device. The new architecture
contains other features, besides the floating point capability, that simplify programming. Some of
these features (such as the software stack, the large register file, etc.) were added to facilitate the
development of high-level language compilers. Currently, C and Ada compilers have been introduced. In addition, Spectron Microsystems introduced an operating system for DSPs (called
SPOX) that further facilitates the development of algorithms on the DSP devices.
Volume 3 of Digital Signal Processing Applications with the TMS320 Family contains application reports primarily on the third generation of the TMS320 Family (floating-point devices).
This book is a continuation of Volumes 1 and 2 in the sense that it addresses the same needs of the
designer. The designer still has the task of selecting the DSP device with the appropriate cost, performance, and support, developing the DSP algorithm that will solve the problem, and implementing the algorithm on the processor. This volume tries to help by bringing the designer up to date
on the applications of newer processors or in different applications of earlier processors.
The objectives remain the same as in earlier volumes. First, the application reports supply
examples of device use and serve as tutorials in programming the devices. Of course, the same purpose is served on a more elementary basis by the software and hardware applications sections of
the corresponding user's guides. Second, since the source code of each application is provided with
the report, the designer can take it intact (or extract a portion of it) and place it in the application.
It is assumed that the reader has exposure to the TMS320 devices or, at least, has the necessary
manuals (such as the appropriate TMS320 user's guides) that will help the reader understand the
explanations in the reports. The reports themselves include as references the necessary background
material. Additionally, the Introduction gives a brief overview of the available devices at the time
of the writing and points to the source of more information.

The reports are grouped by application area. The term report is used here in a broad sense,
since some articles from technical publications are also included. The authors of the reports are either the digital signal processing engineering staff of the Texas Instruments Semiconductor Group
(including both field and factory personnel, and summer students) or third parties.
The source code associated with the reports is also available in electronic form, and the reader
can download it from the TI DSP Electronic Bulletin Board (telephone (713) 274-2323). If more
information is needed, the DSP Hotline can be called at (713) 274-2320.
The editor thanks all the authors and the reviewers for their contribution to this volume of
application reports.
Panos E. Papamichalis, Ph.D.
Senior Member of Technical Staff

Part I. Introduction
1. The TMS320 Family and Book Overview
2. The TMS320 Family of Digital Signal Processors
(Kun-Shan Lin, Gene A. Frantz, and Ray Simar, Jr., reprinted from
PROCEEDINGS OF THE IEEE, Vol. 75, No.9, September 1987)
3. The TMS320C30 Floating-Point Digital Signal Processor
(Panos Papamichalis and Ray Simar, Jr., reprinted from IEEE Micro
Magazine, Vol. 8, No.6, December 1988)


TMS320 Family and Book Overview
Digital signal processors have found applications in areas where they were not even considered a few years ago. The two major reasons for such proliferation are an increase in processor performance and a reduction in cost. Volume 3 of Digital Signal Processing Applications with the
TMS320 Family presents a set of application reports primarily on the TMS320C30, the third-generation TMS320 device.

Organization of the Book
The material in this book is grouped by subject area:
• Introduction
• Digital Signal Processing Routines
• DSP Interface Techniques
• Telecommunications
• Computers
• Tools
• Bibliography
The Introduction contains this overview and two review articles. The first article gives a
general description of the TMS320 family and is reprinted from a special issue of the IEEE Proceedings, while the second article discusses the TMS320C30 device and is reprinted from the IEEE
Micro Magazine. The overview points out how the TMS320 family has grown since the two articles
were published and also introduces newer devices.
The five articles in the Digital Signal Processing Routines section present useful algorithms, such as the FFT, the Discrete Cosine Transform, etc., that are implemented on the
TMS320C30. Two of the reports also consider implementations on the TMS320C25.
The section on DSP Interface Techniques contains an article on interfacing the
TMS320C30 with external hardware, such as memories and A/D and D/A converters, and an article
on a hardware implementation of a floating-point converter between the IEEE and the TMS320C30
formats.
The following three sections contain one article each. In the Telecommunications section,
an implementation of the government-standard CELP speech-coding algorithm is presented. The
Computers section contains an article on 3-D graphics systems, which shows examples of using
the TMS320C30 device for graphics problems. In the Tools section, the article gives a functional
description of the TMS320C30 Application Board that is part of the hardware emulator for that device.

The Bibliography section contains a list of articles mentioning DSP implementations using
TMS320 devices. The different titles are listed chronologically and are grouped by subject. The list
is not exhaustive, but it gives pointers for pursuing practical implementations in representative
application areas.
Digital Signal Processing Applications with the TMS320 Family, Vol. 3


The TMS320 Family of Processors
The TMS320 Family of digital signal processors started with the TMS32010 in 1982, but it
has been expanded to encompass five generations (at the time of this writing) with devices in each
generation. Figure 1 shows this progression through the generations. The TMS320 devices can be
grouped in two broad categories: fixed-point and floating-point devices. As implied by Figure 1,
the first, second, and fifth generations are the fixed-point devices, while the third and the fourth
generations (the latest one under development) support floating-point arithmetic.

Figure 1. TMS320 Family Roadmap
[Figure: performance (MIPS/MFLOPS) plotted against generation, with separate fixed-point and floating-point DSP lines. Devices marked * are new TMS320 devices for 1990.
Fixed-point: TMS320C1x (TMS320C10, TMS320C10-14, TMS320C10-25, TMS320C15/E15, TMS320C15-25, TMS320C17/E17, TMS320C14/E14); TMS320C2x (TMS32020, TMS320C25, TMS320E25, TMS320C25-50, *TMS320C26); TMS320C5x (*TMS320C50, *TMS320C51).
Floating-point: TMS320C3x (TMS320C30, *TMS320C30-26, *TMS320C31); TMS320C4x (*TMS320C40).]


The following article, "The TMS320 Family of Digital Signal Processors," by Lin et al.,
is reprinted from the Proceedings of the IEEE and gives an overview of the TMS320 family. Since
additional devices have been developed from the time the article was written, this section highlights
these newer devices. Table 1 shows a comprehensive list of the currently available TMS320 devices
and their salient characteristics.

Table 1. TMS320 Family Overview

                              Cycle   --- On-Chip Memory ---  Off-Chip  ----------------- I/O -----------------
Gen  Device        Data Type  Time    RAM     ROM     EPROM   Memory    Parallel  Serial  DMA      On-Chip  Package
                              (ns)                                                                 Timers
1st  TMS320C10     Integer    200     144     1.5K    -       4K        8x16      -       -        -        DIP/PLCC
     TMS320C10-25  Integer    160     144     1.5K    -       4K        8x16      -       -        -        DIP/PLCC
     TMS320C10-14  Integer    280     144     1.5K    -       4K        8x16      -       -        -        DIP/PLCC
     TMS320E14     Integer    160     256     -       4K      4K        7x16      1       -        4        CERQUAD
     TMS320C15     Integer    200     256     4K      -       4K        8x16      -       -        -        DIP/PLCC
     TMS320C15-25  Integer    160     256     4K      -       4K        8x16      -       -        -        DIP/PLCC
     TMS320E15     Integer    200     256     -       4K      4K        8x16      -       -        -        DIP/CERQUAD
     TMS320E15-25  Integer    160     256     -       4K      4K        8x16      -       -        -        DIP/CERQUAD
     TMS320C17     Integer    200     256     4K      -       4K        6x16      2       -        1        DIP/PLCC
     TMS320E17     Integer    200     256     -       4K      4K        6x16      2       -        1        DIP/CERQUAD
2nd  TMS32020      Integer    200     544     -       -       128K      16x16     1       Ext      1        PGA
     TMS320C25     Integer    100     544     4K      -       128K      16x16     1       Ext      1        PGA/PLCC
     TMS320C25-50  Integer    80      544     4K      -       128K      16x16     1       Ext      1        PGA/PLCC
     TMS320E25     Integer    100     544     -       4K      128K      16x16     1       Ext      1        CERQUAD
     TMS320C26     Integer    100     1.5K    256     -       128K      16x16     1       Ext      1        PLCC
3rd  TMS320C30     Float Pt   60      2K      4K      -       16M       16Mx32    2       Ext/Int  2        PGA
5th  TMS320C50     Integer    50      8.5K    2K      -       128K      16x16     1       Ext      1        CLCC

Ext = External DMA
Ext/Int = External/Internal DMA

For information on military versions of these devices, contact your local TI sales office.


The additions to the first generation are the TMS320C14 and the TMS320E14; the latter is
identical with the former, except that the latter's on-chip program memory is EPROM. The
TMS320C14/E14 devices have features that make them suitable for control applications. Figure
2 shows the components of these devices. The memory and the CPU are identical to
TMS320C15/E15, while the peripherals reflect the orientation of the devices toward control.
Figure 2. TMS320C14/E14 Key Features
[Block diagram. Blocks shown: barrel shifter, 32-bit ALU, 16 x 16-bit multiply, 32-bit accumulator, 0-, 1-, 4-bit shift, 32-bit P register, 2 auxiliary registers, 4-level hardware stack, watchdog timer, timer/counter 2, 16-bit I/O, serial port.]

Some of the key features of the TMS320C14/E14 are:
• 160-ns instruction cycle time
• Object-code-compatible with the TMS320C15
• Four 16-bit timers
- Two general-purpose timers
- One watchdog timer
- One baud-rate generator
• 16 individual bit-selectable I/O pins
• Serial port/USART with codec-compatible mode
• Event manager with 6-channel PWM D/A
• CMOS technology, 68-pin CERQUAD
The additions to the second generation are the TMS320E25, the TMS320C25-50, and the
TMS320C26. The TMS320E25 is identical to the TMS320C25, except that the 4K-word on-chip
program memory is EPROM. Since increased speed is very important for the real-time implementation of certain applications, the TMS320C25-50 was designed as a faster version of the
TMS320C25 and has a clock frequency of 50 MHz instead of 40 MHz.
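The clock rates quoted here can be related to the instruction cycle times listed in the family overview table. The sketch below is illustrative only: it assumes that one instruction cycle spans four input-clock periods, a ratio inferred from the 40-MHz clock and 100-ns cycle time given for the TMS320C25 rather than stated in this text. The function name is hypothetical.

```python
def instruction_cycle_ns(clock_hz, clocks_per_cycle=4):
    """Return the instruction cycle time in nanoseconds.

    clocks_per_cycle=4 is an assumption inferred from the TMS320C25
    figures (40-MHz clock, 100-ns cycle); it is not taken from the text.
    """
    return clocks_per_cycle * 1_000_000_000 // clock_hz


# Compare the TMS320C25 clock with the faster TMS320C25-50 clock.
for clock_hz in (40_000_000, 50_000_000):
    print(clock_hz // 1_000_000, "MHz ->", instruction_cycle_ns(clock_hz), "ns")
```

Under that assumption, the 50-MHz clock corresponds to an 80-ns instruction cycle, which agrees with the cycle time listed for the TMS320C25-50.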
The TMS320C26 is a modification of the TMS320C25 in which the program ROM has been
exchanged for RAM. The memory space of the TMS320C26 has 1.5K words of on-chip RAM and
256 words of on-chip ROM, making it ideal for applications requiring larger RAM but minimal
external memory.
A new generation of higher-performance fixed-point processors has been introduced in the
TMS320 Family: the TMS320C5x devices. This generation shares many features with the first and
the second generations, but it also encompasses significant new features. Figure 3 shows the basic
components of the first device in that generation, the TMS320C50.

Figure 3. TMS320C50 Key Features
[Block diagram. Legible blocks: serial port, timer, software wait states, 16 x 16.]

Some of the important features of the TMS320C50 are listed below:
• Source code is upward compatible with the TMS320C1x/C2x devices
• 50/35-ns instruction cycle time
• 8K words of on-chip program/data RAM
• 2K words boot ROM
• 544 words of data/program RAM
• 128K words addressable total memory
• Enhanced general-purpose and DSP-specific instructions
• Static CMOS, 84-pin CERQUAD
• JTAG serial scan path


The software and hardware development tools for the TMS320 family make the development of applications easy. Such tools include assemblers, linkers, simulators, and C compilers for
the software. They include evaluation modules, software development boards, and extended development systems for hardware. These tools are mentioned in the following paper by Lin et al. The
interested reader can find much more information in the additional literature that is published by
Texas Instruments and mentioned in the next section. In particular, the TMS320 Family Development Support Reference Guide is an excellent source.
One important addition to the list of tools is the SPOX operating system, developed by Spectron Microsystems. SPOX permits you to write an application in a high-level language (C) and run
it on actual DSP hardware. The operating system of SPOX hides the details of the interface from
you and lets you concentrate on your algorithm while running it at supercomputer speeds on the
TMS320C30.

References
Texas Instruments publishes an extensive bibliography to help designers use the TMS320 devices effectively. Besides the user's guides for corresponding generations, there are manuals for
the software and the hardware tools. The TMS320 Family Development Support Reference Guide
is particularly useful because it provides information, not only on development tools offered by TI,
but also on those produced by third parties. Here is a partial list of the literature available (the literature number is in parentheses):
• TMS320 Family Development Support Reference Guide (SPRU011A)
• TMS320C1x User's Guide (SPRU013A)
• TMS320C2x User's Guide (SPRU014)
• TMS320C3x User's Guide (SPRU031)
• TMS320C1x/TMS320C2x Assembly Language Tools User's Guide (SPRU018)
• TMS320C30 Assembly Language Tools User's Guide (SPRU035)
• TMS320C25 C Compiler Reference Guide (SPRU024)
• TMS320C30 C Compiler Reference Guide (SPRU034)
• Digital Signal Processing Applications with the TMS320 Family, Volume 1 (SPRA012)
• Digital Signal Processing Applications with the TMS320 Family, Volume 2 (SPRA016)

You can request this literature by calling the Customer Response Center at 1-800-232-3200,
or the DSP Hotline at 1-713-274-2320.

Contents of Other Volumes of the Application Book
Volume 1
Part I. Digital Signal Processing and the TMS320 Family
• Introduction
• The TMS320 Family
Part II. Fundamental Digital Signal Processing Operations
• Digital Signal Processing Routines
- Implementation of FIR/IIR Filters with the TMS32010/TMS32020
- Implementation of Fast Fourier Transform Algorithms with the TMS32020
- Companding Routines for the TMS32010/TMS32020
- Floating-Point Arithmetic with the TMS32010
- Floating-Point Arithmetic with the TMS32020
- Precision Digital Sine-Wave Generation with the TMS32010
- Matrix Multiplication with the TMS32010 and TMS32020
• DSP Interface Techniques
- Interfacing to Asynchronous Inputs with the TMS32010
- Interfacing External Memory to the TMS32010
- Hardware Interfacing to the TMS32020
- TMS32020 and MC68000 Interface
Part III. Digital Signal Processing Applications
• Telecommunications
- Telecommunications Interfacing to the TMS32010
- Digital Voice Echo Canceller with a TMS32020
- Implementation of the Data Encryption Standard Using the TMS32010
- 32K-bit/s ADPCM with the TMS32010
- A Real-Time Speech Subband Coder Using the TMS32010
- Add DTMF Generation and Decoding to DSP-µP Designs
• Computers and Peripherals
• Speech Coding/Recognition
- A Single-Processor LPC Vocoder
- The Design of an Adaptive Predictive Coder Using a Single-Chip Digital Signal Processor
- Firmware-Programmable µC Aids Speech Recognition
• Image/Graphics
- A Graphics Implementation Using the TMS32020 and TMS34061
• Digital Control
- Control System Compensation and Implementation with the TMS32010
Volume 2
Part I. Introduction
• Book Overview
• The TMS320 Family of DSP
• The Texas Instruments TMS320C25 Digital Signal Microcomputer
Part II. Digital Signal Interface Techniques
• Hardware Interfacing to the TMS320C2x
• Interfacing the TMS320 Family to the TLC32040 Family
• ICC Requirements of the TMS320C25
• An Implementation of a Software UART Using the TMS320C25
• TMS320C17/E17 and TMS370 Serial Interface


Part III. Data Communications
• Theory and Implementation of a Split-Band Modem Using the TMS320C17
• Implementation of an FSK Modem Using the TMS320C17
• An All-Digital Automatic Gain Control
Part IV. Telecommunications
• General Purpose Tone Decoding and DTMF Detection
Part V. Control
• Digital Control
Part VI. Tools
• TMS320 Algorithm Debugging Techniques


The TMS320 Family
of
Digital Signal Processors

Kun-Shan Lin
Gene A. Frantz
Ray Simar, Jr.
Digital Signal Processor Products-Semiconductor Group
Texas Instruments

Reprinted from
PROCEEDINGS OF THE IEEE
Vol. 75, No.9, September 1987

The TMS320 Family of Digital Signal Processors

KUN-SHAN LIN, MEMBER, IEEE, GENE A. FRANTZ, SENIOR MEMBER, IEEE, AND RAY SIMAR, JR.

This paper begins with a discussion of the characteristics of digital signal processing, which are the driving force behind the design
of digital signal processors. The remainder of the paper describes
the three generations of the TMS320 family of digital signal processors available from Texas Instruments. The evolution in architectural design of these processors and key features of each generation of processors are discussed. More detailed information is
provided for the TMS320C25 and TMS320C30, the newest members
in the family. The benefits and cost-performance tradeoffs of these
processors become obvious when applied to digital signal processing applications, such as telecommunications, data communications, graphics/image processing, etc.
Manuscript received October 6, 1986; revised March 27, 1987.
The authors are with the Semiconductor Group, Texas Instruments Inc., Houston, TX 77521-1445, USA.
IEEE Log Number 8716214.

DIGITAL SIGNAL PROCESSING CHARACTERISTICS

Digital signal processing (DSP) encompasses a broad
spectrum of applications. Some application examples
include digital filtering, speech vocoding, image processing, fast Fourier transforms, and digital audio [1]-[10]. These
applications and those considered digital signal processing
have several characteristics in common:
mathematically intensive algorithms,
real-time operation,
sampled data implementation,
system flexibility.

To illustrate these characteristics in this section, we will use
the digital filter as an example. Specifically, we will use the
Finite Impulse Response (FIR) filter, which in the time
domain takes the general form of

$$y(n) = \sum_{i=1}^{N} a(i) \cdot x(n - i) \qquad (1)$$

where y(n) is the output sample at time n, a(i) is the ith coefficient or weighting factor, and x(n - i) is the (n - i)th input
sample.
With this example in mind, we can discuss the various
characteristics of digital signal processing: mathematically
intensive algorithms, real-time processing, sampled data
implementation, and system flexibility. First, let us look at
the concept of mathematically intensive algorithms.

Mathematically Intensive Algorithms

From (1), we can see that to generate every y(n), we have
to compute N multiplications and additions or sums of
products. This computation makes it mathematically intensive, especially when N is large.
At this point it is worthwhile to give the FIR filter some
physical significance. An FIR filter is a common technique
used to eliminate the erratic nature of stock market prices.
When the day-to-day closing prices are plotted, it is sometimes difficult to obtain the desired information, such as the
trend of the stock, because of the large variations. A simple
way of smoothing the data is to calculate the average closing values of the previous five days. For the new average
value each day, the oldest value is dropped and the newest
value added. Each daily average value (average(n)) would
be the sum of the weighted value of the latest five days,
where the weighting factors (a(i)'s) are 1/5. In equation form,
the average is determined by

$$\mathrm{average}(n) = \tfrac{1}{5} \cdot d(n - 1) + \tfrac{1}{5} \cdot d(n - 2) + \tfrac{1}{5} \cdot d(n - 3) + \tfrac{1}{5} \cdot d(n - 4) + \tfrac{1}{5} \cdot d(n - 5) \qquad (2)$$

where d(n - i) is the daily stock closing price for the (n - i)th day. Equation (2) assumes the same form as (1). This is
also the general form of the convolution of two sequences
of numbers, a(i) and x(i) [5], [6]. Both FIR filtering and convolution are fundamental to digital signal processing.
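The sum of products in (1) and its five-day special case in (2) can be sketched in a few lines of Python. This is an illustration, not code from the paper: the function name `fir_output`, the sample closing prices, and the choice to treat samples before the start of the record as zero are all assumptions made for the example.

```python
def fir_output(a, x, n):
    """Evaluate y(n) = sum over i = 1..N of a(i) * x(n - i), as in (1).

    a holds the coefficients a(1)..a(N); x holds input samples indexed
    from 0. Samples outside the record are treated as zero, which is an
    assumption of this sketch rather than part of the equation.
    """
    total = 0.0
    for i in range(1, len(a) + 1):      # N multiply-and-add steps
        k = n - i
        if 0 <= k < len(x):
            total += a[i - 1] * x[k]
    return total


# Five-day average of (2): every weighting factor a(i) is 1/5.
weights = [1 / 5] * 5
closing = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]   # hypothetical closing prices

# The average for day 5 uses the closing prices of days 0 through 4.
print(round(fir_output(weights, closing, 5), 6))
```

Changing `weights` to other coefficient values turns the same loop into a general FIR filter, which is why the cost per output sample grows directly with N.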
Real-Time Processing

In addition to being mathematically intensive, DSP algorithms must be performed in real time. Real time can be
defined as a process that is accomplished by the DSP without creating a delay noticeable to the user. In the stock market example, as long as the new average value can be computed prior to the next day when it is needed, it is considered
to be completed in real time. In digital signal processing
applications, processes happen faster than on a daily basis.
In the FIR filter example in (1), the sum of products must
be computed usually within hundreds of microseconds
before the next sample comes into the system. A second
example is in a speech recognition system where a noticeable delay between a word being spoken and being recognized would be unacceptable and not considered real-time. Another example is in image processing, where it is
considered real-time if the processor finishes the processing within the frame update period. If the pixel information
cannot be updated within the frame update period, problems such as flicker, smearing, or missing information will
occur.

©1989 IEEE. Reprinted, with permission, from PROCEEDINGS OF THE IEEE,
Vol. 75, No. 9, pp. 1143-1159, September 1987.

Sampled Data Implementation

The application must be capable of being handled as a
sampled data system in order to be processed by digital
processors, such as digital signal processors. The stock
market is an example of a sampled data system. That is, a
specific value (closing value) is assigned to each sample
period or day. Other periods may be chosen such as hourly
prices or weekly prices. In an FIR filter as shown in (1), the
output y(n) is calculated to be the weighted sum of the previous N inputs. In other words, the input signal is sampled
at periodic intervals (1 over the sample rate), multiplied by
weighting factor a(i), and then added together to give the
output result of y(n). Examples of sample rates for some typical sampled data applications [2], [4] are shown in Table 1.
Table 1 Sample Rates versus Applications

Application           Nominal Sample Rate
Control               1 kHz
Telecommunications    8 kHz
Speech processing     8-10 kHz
Audio processing      40-48 kHz
Video frame rate      30 Hz
Video pixel rate      14 MHz

In a typical DSP application, the processor must be able
to effectively handle sampled data in large quantity and also
perform arithmetic computations in real time.
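To make the real-time constraint concrete, the sample rates of Table 1 can be converted into the number of instruction cycles available between consecutive samples. The sketch below is a rough budget only. It assumes a 100-ns instruction cycle, the figure listed for the TMS320C25 in this volume's family overview, and it takes the upper end of each range in the table as the worst case. The function and dictionary names are hypothetical.

```python
def cycles_per_sample(sample_rate_hz, cycle_time_ns):
    """Whole instruction cycles that fit into one sample period."""
    return 1_000_000_000 // (sample_rate_hz * cycle_time_ns)


CYCLE_TIME_NS = 100  # assumed: TMS320C25 cycle time from the family overview

# Nominal rates from Table 1; ranges use their upper (worst-case) end.
NOMINAL_RATES_HZ = {
    "Control": 1_000,
    "Telecommunications": 8_000,
    "Speech processing": 10_000,
    "Audio processing": 48_000,
    "Video frame rate": 30,
    "Video pixel rate": 14_000_000,
}

for name, rate_hz in NOMINAL_RATES_HZ.items():
    budget = cycles_per_sample(rate_hz, CYCLE_TIME_NS)
    print(f"{name:20s} {budget:>8d} cycles per sample")
```

If each filter tap costs roughly one cycle (an idealization that ignores loop and I/O overhead), the budget is also an upper bound on the FIR length N in (1) that can be sustained at that rate. A budget of zero, as at the video pixel rate, indicates that a processor with this cycle time cannot perform per-sample work at that rate.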

System Flexibility
The design of the digital signal processing system must
be flexible enough to allow improvements in the state of
the art. We may find out after several weeks of using the
average stock price as a means of measuring a particular

stock's value that a different method of obtaining the daily
information is more suited to our needs, e.g., using different daily weightings, a different number of periods over
which to average, or a different procedure for calculating
the result. Enough flexibility in the system must be available
to allow for these variations. In many of the DSP applications, techniques are still in the developmental phase, and
therefore the algorithms tend to change over time. As an
example, speech recognition is presently an inexact technique requiring continual algorithmic modification. From
this example we can see the need for system flexibility so
that the DSP algorithm can be updated. A programmable
DSP system can provide this flexibility to the user.

HISTORICAL DSP SOLUTIONS
Over the past several decades, digital signal processing
machines have taken on several evolutions in order to
incorporate these characteristics. Large mainframe computers
were initially used to process signals in the digital
domain. Typically, because of state-of-the-art limitations,
this was done in non real time. As the state of the art
advanced, array processors were added to the processing
task. Because of their flexibility and speed, array processors
have become the accepted solution for the research laboratory, and have been extended to end-applications in
many instances. However, integrated circuit technology has
matured, thus allowing for the design of faster microprocessors and microcomputers. As a result, many digital signal processing applications have migrated from the array
processor to microprocessor subsystems (i.e., bit-slice
machines) to single-chip integrated circuit solutions. This
migration has brought the cost of the DSP solution down
to a point that allows pervasive use of the technology. The
increased performance of these highly integrated circuits
has also expanded DSP applications from traditional telecommunications to graphics/image processing, then to
consumer audio processing.

A recent development in DSP technology is the single-chip digital signal processor, such as the TMS320 family of
processors. These processors give the designer a DSP solution with performance attainable only by array processors a few years ago. Fig. 1 shows the TMS320 family in
graphical form with the y-axis indicating the hypothetical
performance and the x-axis being the evolution of the semiconductor processing technology. The first member of the
family, the TMS32010, was disclosed to the market in 1982
[11], [12]. It gave the system designer the first microcomputer capable of performing five million DSP operations
per second (5 MIPS), including the add and multiply functions [13] required in (1). Today there are a dozen spinoffs
from the TMS32010 in the first generation of the TMS320
family. Some of these devices are the TMS320C10,
TMS320C15, and TMS320C17 [14]. The second generation
of devices includes the TMS32020 [15] and TMS320C25 [16].
The TMS320C25 can perform 10 MIPS [16]. In addition,
expanded memory space, combined single-cycle multiply/
accumulate operation, multiprocessing capabilities, and
expanded I/O functions have given the TMS320C25 a
2 to 4 times performance improvement over its predecessors. The third generation of the TMS320 family of processors, the TMS320C30 [26], [27], has a computational rate of
33 million DSP floating-point operations per second (33
MFLOPS). Its performance (speed, throughput, and precision) has far exceeded that of the digital signal processors available today and has reached the level of a supercomputer.
If we look closely at the TMS320 family as shown in Fig.
1, we can see that devices in the same generation, such as
the TMS320C10, TMS320C15, and TMS320C17, are assembly
object-code compatible. Devices across generations, such
as the TMS320C10 and TMS320C25, are assembly source-code compatible. Software investment on DSP algorithms
therefore can be maintained during the system upgrade.
Another point is that since the introduction of the
TMS32010, semiconductor processing technology has
progressed from 3-μm NMOS to 2-μm CMOS to 1-μm CMOS.

The TMS320 Family of Digital Signal Processors

Fig. 1. The TMS320 family of digital signal processors.
The TMS320 generations of processors have also taken the
same evolution in processing technology. Low power consumption, high performance, and high-density circuit integration are some of the direct benefits of this semiconductor processing evolution.
From Fig. 1, it can be observed that various DSP building
blocks, such as the CPU, RAM, ROM, I/O configurations,
and processor speeds, have been designed as individual
modules and can be rearranged or combined with other
standard cells to meet the needs of specific applications.
Each of the three generations (and future generations) will
evolve in the same manner. As applications become more
sophisticated, semicustom solutions based on the core CPU
will become the solution of choice. An example of this
approach is the TMS320C17/E17, which consists of the
TMS320C10 core CPU, expanded 4K-word program ROM
(TMS320C17) or EPROM (TMS320E17), enlarged data RAM
of 256 words, dual serial ports, companding hardware, and
a coprocessor interface. Furthermore, as integrated circuit
layout rules move into smaller geometry (now at 2 μm, rapidly going to 1 μm), not only will the TMS320 devices become
smaller in size, but also multiple CPUs will be incorporated
on the same device along with application-specific I/O to
achieve low-cost integrated system solutions.
BASIC TMS320 ARCHITECTURE
As noted previously, the underlying assumption regarding a digital signal processor is fast arithmetic operations
and high throughput to handle mathematically intensive
algorithms in real time. In the TMS320 family [11]-[17], [26],
[27], this is accomplished by using the following basic concepts:
Harvard architecture,
extensive pipelining,
dedicated hardware multiplier,
special DSP instructions,
fast instruction cycle.

These concepts were designed into the TMS320 digital signal processors to handle the vast amount of data characteristic
of DSP operations, and to allow most DSP operations
to be executed in a single-cycle instruction.
Furthermore, the TMS320 processors are programmable
devices, providing the flexibility and ease of use of general-purpose microprocessors. The following paragraphs discuss how each of the above concepts is used in the TMS320
family of devices to make them useful in digital signal processing applications.
Harvard Architecture
The TMS320 utilizes a modified Harvard architecture for
speed and flexibility. In a strict Harvard architecture [18],
[19], the program and data memories lie in two separate
spaces, permitting a full overlap of instruction fetch and
execution. The TMS320 family's modification of the Harvard architecture further allows transfer between program
and data spaces, thereby increasing the flexibility of the
device. This architectural modification eliminates the need
for a separate coefficient ROM and also maximizes the processing power by maintaining two separate bus structures
(program and data) for full-speed execution.
Extensive Pipelining
In conjunction with the Harvard architecture, pipelining
is used extensively to reduce the instruction cycle time to
its absolute minimum, and to increase the throughput of
the processor. The pipeline can be anywhere from two to
four levels deep, depending on which processor in the family is used. The TMS320 family architecture uses a two-level
pipeline for its first generation, a three-level pipeline for its
second generation, and a four-level pipeline for its third
generation of processors. This means that the device is processing from two to four instructions in parallel, and each
instruction is at a different stage in its execution. Fig. 2 shows
an example of a three-level pipeline operation.

Fig. 2. Three-level pipeline operation.

In pipeline operation, the prefetch, decode, and execute
operations can be handled independently, thus allowing
the execution of instructions to overlap. During any instruction cycle, three different instructions are active, each at a
different stage of completion. For example, as the Nth
instruction is being prefetched, the previous (N - 1)th
instruction is being decoded, and the previous (N - 2)th
instruction is being executed. In general, the pipeline is
transparent to the user.
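
The overlap described above can be tabulated with a small Python sketch. It is an idealized model, not a description of the actual hardware timing: every instruction is assumed to spend exactly one cycle in each stage, and the names `STAGES` and `pipeline_schedule` are introduced here only for illustration.

```python
STAGES = ("prefetch", "decode", "execute")


def pipeline_schedule(num_instructions):
    """Return, cycle by cycle, which instruction index occupies each stage.

    An instruction enters prefetch in cycle k, decode in k + 1, execute in k + 2.
    None marks a stage that is empty while the pipeline fills or drains.
    """
    depth = len(STAGES)
    schedule = []
    for cycle in range(num_instructions + depth - 1):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instr = cycle - stage_index
            row[stage] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(4)):
    print(cycle, row)
```

Once the pipeline is full, each row shows instruction N in prefetch, N - 1 in decode, and N - 2 in execute, which matches the description in the text.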

Dedicated Hardware Multiplier
As we saw in the general form of an FIR filter, multiplication is an important part of digital signal processing. For
each filter tap (denoted by i), a multiplication and an addition must take place. The faster a multiplication can be performed, the higher the performance of the digital signal
processor. In general-purpose microprocessors, the multiplication instruction is constructed by a series of additions, therefore taking many instruction cycles. In comparison, a characteristic of every DSP device is a dedicated
multiplier. In the TMS320 family, multiplication is a single-cycle instruction as a result of the dedicated hardware multiplier. If we look at the arithmetic for each tap of the FIR
filter to be performed by the TMS32010, we see that each
tap of the filter requires a multiplication (MPY) instruction.

    LT      ;LOAD MULTIPLICAND INTO T REGISTER
    DMOV    ;MOVE DATA IN MEMORY TO DO DELAY
    MPY     ;MULTIPLY
    APAC    ;ADD MULTIPLICATION RESULT TO ACC

The other three instructions are used to load the multiplier
circuit with the multiplicand (LT), move the data through
the filter tap (DMOV), and add the result of the multiplication (stored in the product register) to the accumulator
(APAC). Specifically, the multiply instruction (MPY) loads
the multiplier into the dedicated multiplier and performs
the multiplication, placing the result in a product register.
Therefore, if a 256-tap FIR filter is used, these four instructions are repeated 256 times. At each sample period, 256
multiplications must be performed. In a typical general-purpose microprocessor, this requires each tap to be 30 to
40 instruction cycles long, whereas in the TMS320C10, it is
only four instruction cycles. We will see in the next section
how special DSP instructions reduce the time required for
each FIR tap even further.
Special DSP Instructions
Another characteristic of DSP devices is the use of special
instructions. We were introduced to one of them in the previous example, the DMOV (data move) instruction. In digital signal processing, the delay operator (z^-1) is very important. Recalling the stock market example, during each new
sample period (i.e., each new day), the oldest piece of data
(the closing price five days ago) was dropped and a new one
(today's closing price) was added. Or, each piece of the old
data is delayed or moved one sample period to make room
for the incoming most current sample. This delay is the
function of the DMOV instruction. Another special instruction in the TMS32010 is the LTD instruction. It executes the
LT, DMOV, and APAC instructions in a single cycle. The LTD
and MPY instructions then reduce the number of instruction
cycles per FIR filter tap from four to two. In the second-generation TMS320, such as the TMS320C25, two more special
instructions have been included (the RPT and MACD
instructions) to reduce the number of cycles per tap to one,
as shown in the following:

    RPTK 255    ;REPEAT THE NEXT INSTRUCTION 256 TIMES
    MACD        ;LT, DMOV, MPY, AND APAC

Fast Instruction Cycle
The real-time processing capability is further enhanced
by the raw speed of the processor in executing instructions.
The characteristics which we have discussed, combined
with optimization of the integrated circuit design for speed,
give the DSP devices instruction cycle times less than 200
ns. The specific instruction cycle times for the TMS320 family are given in Table 2. These fast cycle times have made
the TMS320 family of processors highly suited for many real-time DSP applications. Table 1 showed the sample rates for
some typical DSP applications. This table can be combined
with the cycle times indicated in Table 2 to show how many
instruction cycles per sample can be achieved by the various generations of the TMS320 for real-time applications
(see Fig. 3).

Table 2 TMS320 Cycle Times

Device          Cycle Time (ns)
TMS320C10*      160-200
TMS32020        160-200
TMS320C25       100-125
TMS320C30       60-75

*The same cycle time applies to all of the first-generation processors.

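
The combination of Table 1 and Table 2 amounts to dividing the sample period by the instruction cycle time. The Python sketch below works this out; it is only a rough budget, assuming the upper end of each sample-rate range from Table 1 and the fastest cycle time of each range in Table 2. The dictionary and function names are illustrative, not part of the original text.

```python
# Nominal sample rates from Table 1, in hertz (upper end of each quoted range).
SAMPLE_RATES_HZ = {
    "Control": 1_000,
    "Telecommunications": 8_000,
    "Speech processing": 10_000,
    "Audio processing": 48_000,
    "Video frame rate": 30,
    "Video pixel rate": 14_000_000,
}

# Fastest instruction cycle time of each range in Table 2, in nanoseconds.
CYCLE_TIMES_NS = {
    "TMS320C10": 160,
    "TMS32020": 160,
    "TMS320C25": 100,
    "TMS320C30": 60,
}


def cycles_per_sample(sample_rate_hz, cycle_time_ns):
    """Whole instruction cycles that fit in one sample period."""
    return 1_000_000_000 // (sample_rate_hz * cycle_time_ns)


for application, rate in SAMPLE_RATES_HZ.items():
    budget = {dev: cycles_per_sample(rate, t) for dev, t in CYCLE_TIMES_NS.items()}
    print(f"{application:20s} {budget}")
```

Under these assumptions an 8-kHz telecommunications channel leaves a 100-ns device 1250 cycles per sample, while a 14-MHz pixel stream leaves about one cycle even on the fastest device listed.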
As we can see from Fig. 3, many instruction cycles are
available to process the signal or to generate commands for
real-time control applications. Therefore, for simple control applications, the general-purpose microprocessors or
controllers would be adequate. However, for more mathematically intensive control applications, such as robotics
and adaptive control, digital signal processors are much
better suited [24]. The number of available instruction cycles
is reduced as we increase the sample rate from 8 kHz for
typical telecommunication applications to 40-48 kHz for
audio processing. Since most of these real-time applications require only a few hundred instructions per sample (such as ADPCM [4] and echo cancelation [4]), this is
within the reach of the TMS320. For higher sample rate
applications, such as video/image processing, digital signal
processors available today are not capable of handling the
processing of the real-time video data. Therefore, for these
types of applications, multiple digital signal processors and
frame buffers are usually required. From Fig. 3, it can also
be seen that for slower speed applications, such as control,
the first-generation TMS320 provides better cost-performance tradeoffs than the other processors. For high sample
rate applications, such as video/image processing, the second and third generations of the TMS320 with their multiprocessing capabilities and high throughput are better
suited.

Fig. 3. Number of instruction cycles/sample versus sample rate for the TMS320 family.

Now that we have discussed the basic characteristics of
digital signal processors, we can concentrate on specific
details of each of the three generations of the TMS320 family devices.
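
Before moving on, the per-tap work attributed to the LT, DMOV, MPY, and APAC instructions in the multiplier discussion can be summarized with a behavioral Python model. It is a sketch, not a cycle-accurate emulation: the delay line is a plain list, taps are assumed to be processed oldest first so the DMOV-style copy never overwrites a sample still needed, and the names `fir_sample`, `CYCLES_PER_TAP`, and `tap_cycles` are hypothetical.

```python
def fir_sample(coeffs, delay_line, new_sample):
    """One output sample of an N-tap FIR, computed one tap at a time.

    delay_line[i] holds x(n - i). The list is updated in place so that it is
    ready for the next sample period.
    """
    n_taps = len(coeffs)
    delay_line[0] = new_sample                 # newest input x(n)
    acc = 0                                    # accumulator
    for i in range(n_taps - 1, -1, -1):
        t_reg = delay_line[i]                  # LT: load T register
        if i + 1 < n_taps:
            delay_line[i + 1] = delay_line[i]  # DMOV: delay sample by one period
        p_reg = t_reg * coeffs[i]              # MPY: product register
        acc += p_reg                           # APAC: add product to accumulator
    return acc


# Instruction cycles per tap quoted in the text for each coding style.
CYCLES_PER_TAP = {"LT/DMOV/MPY/APAC": 4, "LTD/MPY": 2, "RPTK + MACD": 1}


def tap_cycles(n_taps):
    """Cycles spent on the taps only; setup and output overhead are ignored."""
    return {style: cycles * n_taps for style, cycles in CYCLES_PER_TAP.items()}


coeffs = [1, 2, 3]
state = [0, 0, 0]
impulse_response = [fir_sample(coeffs, state, x) for x in (1, 0, 0, 0)]
print(impulse_response)
print(tap_cycles(256))
```

Feeding a unit impulse through the model returns the coefficients in order, a quick check that the delay line shifts by one position per sample period.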
THE FIRST GENERATION OF THE TMS320 FAMILY
The first generation of the TMS320 family includes the
TMS32010 [13] and TMS32011 [17], which are processed in
2.4-μm NMOS technology, and the TMS320C10 [13],
TMS320C15/E15 [14], and TMS320C17/E17 [14], processed in
1.8-μm CMOS technology. Some of the key features of these
devices are [14] as follows:
Instruction cycle timing:
-160 ns
-200 ns
-280 ns.
On-chip data RAM:
-144 words
-256 words (TMS320C15/E15, TMS320C17/E17).
On-chip program ROM:
-1.5K words
-4K words (TMS320C15, TMS320C17).
4K words of on-chip program EPROM (TMS320E15, TMS320E17).
External memory expansion up to 4K words at full speed.
16 x 16-bit parallel multiplier with 32-bit result.
Barrel shifter for shifting data memory words into the ALU.
Parallel shifter.
4 x 12-bit stack that allows context switching.
Two auxiliary registers for indirect addressing.
Dual-channel serial port (TMS32011, TMS320C17, TMS320E17).
On-chip companding hardware (TMS32011, TMS320C17, TMS320E17).
Coprocessor interface (TMS320C17, TMS320E17).
Device packaging:
-40-pin DIP
-44-pin PLCC.

TMS320C10
The first generation of the TMS320 processors is based
on the architecture of the TMS32010 and its CMOS replica,
the TMS320C10. The TMS32010 was introduced in 1982 and
was the first microcomputer capable of performing 5 MIPS.
Since the TMS32010 has been covered extensively in the
literature [4], [11]-[14], we will only provide a cursory review
here. A functional block diagram of the TMS320C10 is shown
in Fig. 4.
As shown in Fig. 4, the TMS320C10 utilizes the modified
Harvard architecture in which program memory and data
memory lie in two separate spaces. Program memory can
reside both on-chip (1.5K words) and off-chip (4K words). Data
memory is the 144 x 16-bit on-chip data RAM. There are four
basic arithmetic elements: the ALU, the accumulator, the
multiplier, and the shifters. All arithmetic operations are
performed using two's-complement arithmetic.

Fig. 4. TMS320C10 functional block diagram. Legend: ACC = accumulator; ARP = auxiliary register pointer; AR0 = auxiliary register 0; AR1 = auxiliary register 1; DP = data page pointer; PC = program counter; P = P register; T = T register.

ALU: The ALU is a general-purpose arithmetic logic unit
that operates with a 32-bit data word. The unit can add, subtract, and perform logical operations.
Accumulator: The accumulator stores the output from the
ALU and is also often an input to the ALU. It operates with
a 32-bit word length. The accumulator is divided into a high-order word (bits 31 through 16) and a low-order word (bits
15 through 0). Instructions are provided for storing the high-
and low-order accumulator words in data memory (SACH
for store accumulator high and SACL for store accumulator
low).
Multiplier: The 16 x 16-bit parallel multiplier consists of
three units: the T register, the P register, and the multiplier
array. The T register is a 16-bit register that stores the multiplicand, while the P register is a 32-bit register that stores
the product. In order to use the multiplier, the multiplicand
must first be loaded into the T register from the data RAM
by using one of the following instructions: LT, LTA, or LTD.
Then the MPY (multiply) or the MPYK (multiply immediate)
instruction is executed. The multiply and accumulate operations
can be accomplished in two instruction cycles with
the LTA/LTD and MPY/MPYK instructions.
Shifters: Two shifters are available for manipulating data:
a barrel shifter and a parallel shifter. The barrel shifter performs a left-shift of 0 to 16 bits on all data memory words
that are to be loaded into, subtracted from, or added to the
accumulator. The parallel shifter, activated by the SACH
instruction, can execute a shift of 0, 1, or 4 bits to take care
of the sign bits in two's-complement arithmetic calculations.
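
The role of the SACH shift can be illustrated with a small Python model. The sketch assumes the operands are 16-bit fractions in Q15 format, which the text does not state explicitly; under that assumption the 32-bit product carries two sign bits, and a left shift of 1 before storing the high word brings the result back to Q15. The helper names `to_q15`, `from_q15`, `multiply_q15`, and `sach` are illustrative only.

```python
def to_q15(value):
    """Convert a float in [-1, 1) to a 16-bit two's-complement Q15 integer."""
    return max(-32768, min(32767, int(round(value * 32768))))


def from_q15(word):
    """Convert a signed Q15 integer back to a float."""
    return word / 32768.0


def multiply_q15(a_q15, b_q15):
    """16 x 16 multiply giving a 32-bit product in Q30 format (two sign bits)."""
    return a_q15 * b_q15


def sach(acc, shift):
    """Return the high word of the 32-bit accumulator after a left shift of 0, 1, or 4."""
    if shift not in (0, 1, 4):
        raise ValueError("shift must be 0, 1, or 4")
    shifted = (acc << shift) & 0xFFFFFFFF               # keep 32 bits
    high = (shifted >> 16) & 0xFFFF                     # bits 31 through 16
    return high - 0x10000 if high & 0x8000 else high    # reinterpret as signed


product = multiply_q15(to_q15(0.5), to_q15(0.25))
print(from_q15(sach(product, 1)))   # high word after a shift of 1, read as Q15
print(from_q15(sach(product, 0)))   # without the shift the value appears halved
```

With a shift of 0 the stored word is still correct but must be read as Q14, which is why the one-bit shift is convenient when chaining fractional multiplications.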
Based on the architecture of the TMS32010/C10, several
spinoffs have been generated offering different processor
speeds, expanded memory, and various I/O integration.
Currently, the newest members in this generation are the
TMS320C15/E15 and the TMS320C17/E17 [14].

TMS320C15/E15

The TMS320C15 and TMS320E15 are fully object-code and
pin-for-pin compatible with the TMS32010 and offer
expanded on-chip RAM of 256 words and on-chip program
ROM (TMS320C15) or EPROM (TMS320E15) of 4K words. The
TMS320C15 is available in either a 200-ns version or a 160-ns version (TMS320C15-25).
TMS320C17/E17

The TMS320C17/E17 is a dedicated microcomputer with
4K words of on-chip program ROM (TMS320C17) or EPROM
(TMS320E17), a dual-channel serial port for full-duplex serial
communication, on-chip companding hardware (μ-law/
A-law), a serial port timer for stand-alone serial communication, and a coprocessor interface for zero glue interface
between the processor and any 4/8/16-bit microprocessor.
The TMS320C17/E17 is also object-code compatible with the
TMS32010 and can use the same development tools. The
device is based on the TMS320C10 core CPU with
peripheral memory and I/O modules added on-chip. The
TMS320C17/E17 can be regarded as a semicustom DSP solution suited for high-volume telecommunication and consumer applications.
Table 3 provides a feature comparison of all members of
the first-generation TMS320 processors. References to more
detailed information on these processors are also provided.

Table 3 TMS320 First-Generation Processors

                 Instruction           On-Chip    On-Chip      On-Chip    Off-Chip
TMS320           Cycle Time            Prog ROM   Prog EPROM   Data RAM   Prog
Devices          (ns)        Process   (words)    (words)      (words)    (words)    Ref
TMS32010         200         NMOS      1.5K       -            144        4K         [13]
TMS32010-25      160         NMOS      1.5K       -            144        4K         [13]
TMS32010-14      280         NMOS      1.5K       -            144        4K         [13]
TMS32011         200         NMOS      1.5K       -            144        -          [17]
TMS320C10        200         CMOS      1.5K       -            144        4K         [13]
TMS320C10-25     160         CMOS      1.5K       -            144        4K         [13]
TMS320C15        200         CMOS      4.0K       -            256        4K         [13]
TMS320C15-25     160         CMOS      4.0K       -            256        4K         [14]
TMS320E15        200         CMOS      -          4.0K         256        4K         [14]
TMS320C17        200         CMOS      4.0K       -            256        -          [14]
TMS320C17-25     160         CMOS      4.0K       -            256        -          [14]
TMS320E17        200         CMOS      -          4.0K         256        -          [14]

THE SECOND GENERATION OF THE TMS320 FAMILY
The second-generation TMS320 digital signal processors
include two members, the TMS32020 [15] and the
TMS320C25 [16]. The architecture of these devices has
evolved from the TMS32010, the first member of the TMS320
family. Key features of the second-generation TMS320 are
as follows:
Instruction cycle timing:
-100 ns (TMS320C25)
-200 ns (TMS32020).
4K words of on-chip masked ROM (TMS320C25).
544 words of on-chip data RAM.
128K words of total program/data memory space.
Eight auxiliary registers with a dedicated arithmetic unit.
Eight-level hardware stack.
Fully static double-buffered serial port.
Wait states for communication to slower off-chip memories.
Serial port for multiprocessing or interfacing to codecs.
Concurrent DMA using an extended hold operation (TMS320C25).
Bit-reversed addressing modes for fast Fourier transforms (TMS320C25).
Extended-precision arithmetic and adaptive filtering support (TMS320C25).
Full-speed operation of MAC/MACD instructions from external memory (TMS320C25).
Accumulator carry bit and related instructions (TMS320C25).
1.8-μm CMOS technology (TMS320C25):
-68-pin grid array (PGA) package.
-68-pin lead chip carrier (PLCC) package.
2.4-μm NMOS technology (TMS32020):
-68-pin PGA package.

TMS320C25 Architecture

The TMS320C25 is the latest member in the second generation of TMS320 digital signal processors. It is a pin-compatible CMOS version of the TMS32020 microprocessor,
but with an instruction cycle time twice as fast and the inclusion of additional hardware and software features. The
instruction set is a superset of both the TMS32010 and
TMS32020, maintaining source-code compatibility. In addition, it is completely object-code compatible with the
TMS32020 so that TMS32020 programs run unmodified on
the TMS320C25.
The 100-ns instruction cycle time provides a significant
throughput advantage for many existing applications. Since
most instructions are capable of executing in a single cycle,
the processor is capable of executing ten million instructions per second (10 MIPS). Increased throughput on the
TMS320C25 for many DSP applications is attained by means
of single-cycle multiply/accumulate instructions with a data
move option (MAC/MACD), eight auxiliary registers with a
dedicated arithmetic unit, instruction set support for adaptive filtering and extended-precision arithmetic, bit-reversal addressing, and faster I/O necessary for data-intensive
signal processing.
Instructions are included to provide data transfers
between the two memory spaces. Externally, the program
and data memory spaces are multiplexed over the same bus
so as to maximize the address range for both spaces while
minimizing the pin count of the device. Internally, the
TMS320C25 architecture maximizes processing power by
maintaining two separate bus structures, program and data,
for full-speed execution.
Program execution in the device takes the form of a three-level instruction fetch-decode-execute pipeline (see Fig.
2). The pipeline is essentially invisible to the user, except
in some cases where it must be broken (such as for branch
instructions). In this case, the instruction timing takes into
account the fact that the pipeline must be emptied and
refilled. Two large on-chip data RAM blocks (a total of 544
words), one of which is configurable either as program or
data memory, provide increased flexibility in system design.
An off-chip 64K-word directly addressable data memory
address space is included to facilitate implementations of
DSP algorithms. The large on-chip 4K-word masked ROM
can be used for cost-reduced systems, thus providing for
a true single-chip DSP solution. The remainder of the 64K-word program memory space is located externally. Large
programs can execute at full speed from this memory space.
Programs may also be downloaded from slow external
memory to on-chip RAM for full-speed operation. The VLSI
implementation of the TMS320C25 incorporates all of these
features as well as many others such as a hardware timer,
serial port, and block data transfer capabilities.
A functional block diagram of the TMS320C25, shown in
Fig. 5, outlines the principal blocks and data paths within
the processor. The diagram also shows all of the TMS320C25
interface pins.
In the following architectural discussions on the memory, central arithmetic logic unit, hardware multiplier, control
operations, serial port, and I/O interface, please refer
to the block diagram shown in Fig. 5.

Fig. 5. TMS320C25 functional block diagram. Legend: ACCH = accumulator high; ACCL = accumulator low; ALU = arithmetic logic unit; ARAU = auxiliary register arithmetic unit; ARB = auxiliary register pointer buffer; ARP = auxiliary register pointer; DP = data memory page pointer; DRR = serial port data receive register; DXR = serial port data transmit register; IFR = interrupt flag register; IMR = interrupt mask register; IR = instruction register; MCS = microcall stack; QIR = queue instruction register; PR = product register; PRD = period register for timer; TIM = timer; TR = temporary register; PC = program counter; PFC = prefetch counter; RPTC = repeat instruction counter; GREG = global memory allocation register; RSR = serial port receive shift register; XSR = serial port transmit shift register; AR0-AR7 = auxiliary registers; ST0, ST1 = status registers.

Memory Allocation: The TMS320C25 provides a total of
4K 16-bit words of on-chip program ROM and 544 16-bit
words of on-chip data RAM. The RAM is divided into three
separate blocks (B0, B1, and B2). Of the 544 words, 256 words
(block B0) are configurable as either data or program memory by CNFD (configure data memory) or CNFP (configure
program memory) instructions provided for that purpose;
288 words (blocks B1 and B2) are always data memory. A
data memory size of 544 words allows the TMS320C25 to
handle a data array of 512 words while still leaving 32 locations for intermediate storage. The TMS320C25 provides
64K words of off-chip directly addressable data memory
space as well as a 64K-word off-chip program memory space.
A register file containing eight Auxiliary Registers (AR0-AR7), which are used for indirect addressing of data memory and for temporary storage, increases the flexibility and
efficiency of the device. These registers may be either
directly addressed by an instruction or indirectly addressed
by a 3-bit Auxiliary Register Pointer (ARP). The auxiliary registers and the ARP may be loaded either from data memory
or by an immediate operand defined in the instruction. The
contents of these registers may also be stored into data
memory. The auxiliary register file is connected to the Auxiliary Register Arithmetic Unit (ARAU). Using the ARAU,
accessing tables of information does not require the CALU
for address manipulation, thus freeing it for other operations.
Central Arithmetic Logic Unit (CALU): The CALU contains
a 16-bit scaling shifter, a 16 x 16-bit parallel multiplier, a 32-bit Arithmetic Logic Unit (ALU), and a 32-bit accumulator.
The scaling shifter has a 16-bit input connected to the data
bus and a 32-bit output connected to the ALU. This shifter
produces a left-shift of 0 to 16 bits on the input data, as programmed in the instruction. Additional shifters at the outputs of both the accumulator and the multiplier are suitable
for numerical scaling, bit extraction, extended-precision
arithmetic, and overflow prevention.
The following steps occur in the implementation of a typical ALU instruction:
1) Data are fetched from the RAM on the data bus.
2) Data are passed through the scaling shifter and the
ALU where the arithmetic is performed.
3) The result is moved into the accumulator.
The 32-bit accumulator is split into two 16-bit segments
for storage in data memory: ACCH (accumulator high) and
ACCL (accumulator low). The accumulator has a carry bit
to facilitate multiple-precision arithmetic for both addition
and subtraction instructions.
Hardware Multiplier: The TMS320C25 utilizes a 16 x 16-bit hardware multiplier, which is capable of computing a
32-bit product during every machine cycle. Two registers
are associated with the multiplier:
a 16-bit Temporary Register (TR) that holds one of the
operands for the multiplier, and
a 32-bit Product Register (PR) that holds the product.
The output of the product register can be left-shifted 1 or
4 bits. This is useful for implementing fractional arithmetic
or justifying fractional products. The output of the PR can
also be right-shifted 6 bits to enable the execution of up to
128 consecutive multiply/accumulates without overflow.
An unsigned multiply (MPYU) instruction facilitates
extended-precision multiplication.
I/O Interface: The TMS320C25 I/O space consists of 16
input and 16 output ports. These ports provide the full 16-bit parallel I/O interface via the data bus on the device. A
single input (IN) or output (OUT) operation typically takes
two cycles; however, when used with the repeat counter,
the operation becomes single-cycle. I/O devices are mapped
into the I/O address space using the processor's external
address and data buses in the same manner as memory-mapped devices. Interfacing to memory and I/O devices of
varying speeds is accomplished by using the READY line.
A Direct Memory Access (DMA) to external program/data
memory is also supported. Another processor can take
complete control of the TMS320C25's external memory by
asserting HOLD low, causing the TMS320C25 to place its
address, data, and control lines in the high-impedance state.
Signaling between the external processor and the
TMS320C25 can be performed using interrupts. Two modes
of DMA are available on the device. In the first, execution
is suspended during assertion of HOLD. In the second
"concurrent DMA" mode, the TMS320C25 continues to
execute its program while operating from internal RAM or
ROM, thus greatly increasing throughput in data-intensive
applications.
TMS320C25 Software

The majority of the TMS320C25 instructions (97 out of 133)
are executed in a single instruction cycle. Of the 36 instructions that require additional cycles of execution, 21 involve
branches, calls, and returns that result in a reload of the
program counter and a break in the execution pipeline.
Another seven of the instructions are two-word, long-immediate instructions. The remaining eight instructions
support I/O, transfers of data between memory spaces, or
provide for additional parallel operation in the processor.
Furthermore, these eight instructions (IN, OUT, BLKD,
BLKP, TBLR, TBLW, MAC, and MACD) become single-cycle
when used in conjunction with the repeat counter. The
functional performance of the instructions exploits the parallelism of the processor, allowing complex and/or numerically intensive computations to be implemented in relatively few instructions.
Addressing Modes: Since most of the instructions are
coded in a single 16-bit word, most instructions can be executed in a single cycle. Three memory addressing modes
are available with the instruction set: direct, indirect, and
immediate addressing. Both direct and indirect addressing
are used to access data memory. Immediate addressing uses
the contents of the memory addressed by the program
counter.
When using direct addressing, 7 bits of the instruction
word are concatenated with the 9 bits of the data memory
page pointer (DP) to form the 16-bit data memory address.
With a 128-word page length, the DP register points to one
of 512 possible data memory pages to obtain a 64K total data
memory space. Indirect addressing is provided by the auxiliary registers (AR0-AR7). The seven types of indirect
addressing are shown in Table 4. Bit-reversed indexed
addressing modes allow efficient I/O to be performed for
the resequencing of data points in a radix-2 FFT program.
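
The address arithmetic described above can be illustrated with a short Python sketch. It is an illustration only, not TI code: the function names direct_address and reverse_carry_add are hypothetical, and the bit-reversed walk assumes that AR0 holds half the FFT length, which is the usual convention for radix-2 bit-reversed addressing rather than something stated in the text.

```python
def direct_address(dp, offset7):
    """Form a 16-bit data address from the 9-bit DP and 7 instruction bits."""
    assert 0 <= dp < 512 and 0 <= offset7 < 128
    return (dp << 7) | offset7


def reverse_carry_add(ar, ar0, bits=16):
    """Add ar0 to ar with the carry propagating from MSB toward LSB.

    Modeled by bit-reversing both operands, adding normally, and
    bit-reversing the (truncated) sum.
    """
    def rev(value):
        return int(format(value, "0{}b".format(bits))[::-1], 2)

    mask = (1 << bits) - 1
    return rev((rev(ar) + rev(ar0)) & mask)


if __name__ == "__main__":
    # Page 4, offset 5 -> word 4 * 128 + 5 of the 64K data space.
    print(direct_address(4, 5))

    # Walk an 8-point buffer in bit-reversed order (AR0 = 8 / 2 = 4, assumed).
    index, order = 0, []
    for _ in range(8):
        order.append(index)
        index = reverse_carry_add(index, 4)
    print(order)
```

With 512 pages of 128 words each, the largest address the first function can return is 65535, which matches the 64K data memory space quoted above.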

Table 4 Addressing Modes of the TMS320C25

Addressing Mode      Operation
OP A                 direct addressing
OP *(,NARP)          indirect; no change to AR.
OP *+(,NARP)         indirect; current AR is incremented.
OP *-(,NARP)         indirect; current AR is decremented.
OP *0+(,NARP)        indirect; AR0 is added to current AR.
OP *0-(,NARP)        indirect; AR0 is subtracted from current AR.
OP *BR0+(,NARP)      indirect; AR0 is added to current AR
                     (with reverse carry propagation).
OP *BR0-(,NARP)      indirect; AR0 is subtracted from current AR
                     (with reverse carry propagation).

Note: The optional NARP field specifies a new value of the ARP.

TMS320C25 System Configurations

The flexibility of the TMS320C25 allows systems configurations to satisfy a wide range of application requirements
[16]. The TMS320C25 can be used in the following configurations:

a stand-alone system (a single processor using 4K
words of on-chip ROM and 544 words of on-chip RAM),
parallel multiprocessing systems with shared global
data memory, or
host/peripheral coprocessing using interface control
signals.

A minimal processing system is shown in Fig. 6 using
external data RAM and PROM/EPROM. Parallel multiprocessing and host/peripheral coprocessing systems can be
designed by taking advantage of the TMS320C25's direct
memory access and global memory configuration capabilities.
In some digital processing tasks, the algorithm being
implemented can be divided into sections with a distinct
processor dedicated to each section. In this case, the first
and second processors may share global data memory, as
well as the second and third, the third and fourth, etc. Arbitration logic may be required to determine which section
of the algorithm is executing and which processor has
access to the global memory. With multiple processors dedicated to distinct sections of the algorithm, throughput can
be increased via pipelined execution. The TMS320C25 is
capable of allocating up to 32K words of data memory as
global memory for multiprocessing applications.

THE THIRD GENERATION OF THE TMS320 FAMILY

The TMS320C30 [26]-[27] is Texas Instruments' third-generation member of the TMS320 family of compatible digital
signal processors. With a computational rate of 33 MFLOPS
(million floating-point operations per second), the
TMS320C30 far exceeds the performance of any programmable DSP available today. Total system performance has
been maximized through internal parallelism, more than
twenty-four thousand bytes of on-chip memory, single-cycle
floating-point operations, and concurrent I/O. The total system cost is minimized with on-chip memory and on-chip
peripherals such as timers and serial ports. Finally, the user's
system design time is dramatically reduced with the availability of the floating-point operations, general-purpose
instructions and features, and quality development tools.
The TMS320C30 provides the user with a level of performance that, at one time, was the exclusive domain of
supercomputers. The strong architectural emphasis of providing a low-cost system solution to demanding arithmetic
algorithms has resulted in the architecture shown in Fig. 7.
The key features of the TMS320C30 [26], [27] are as follows:
60-ns single-cycle execution time, 1-µm CMOS.
Two 1K x 32-bit single-cycle dual-access RAM blocks.
One 4K x 32-bit single-cycle dual-access ROM block.
64 x 32-bit instruction cache.
32-bit instruction and data words, 24-bit addresses.
32/40-bit floating-point and integer multiplier.
32/40-bit floating-point, integer, and logical ALU.
32-bit barrel shifter.
Eight extended-precision registers.
Two address generators with eight auxiliary registers.
On-chip Direct Memory Access (DMA) controller for
concurrent I/O and CPU operation.
Peripheral bus and modules for easy customization.
High-level language support.
Interlocked instructions for multiprocessing support.
Zero overhead loops and single-cycle branches.
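
As a quick consistency check on the figures quoted so far, the following Python sketch relates the 60-ns cycle time to the 33-MFLOPS rate and the block sizes to the on-chip memory total. The function names are hypothetical, and the factor of two floating-point operations per cycle (one multiply and one add issued in parallel) is an assumption made for this illustration, not a figure taken from the feature list.

```python
def peak_mflops(cycle_time_s, flops_per_cycle=2):
    """Peak rate in MFLOPS; flops_per_cycle=2 is an assumed multiply + add."""
    return flops_per_cycle / cycle_time_s / 1e6


def on_chip_bytes(ram_blocks=2, ram_words=1024, rom_words=4096, bytes_per_word=4):
    """Total RAM + ROM capacity in bytes for 32-bit (4-byte) words."""
    return (ram_blocks * ram_words + rom_words) * bytes_per_word


if __name__ == "__main__":
    print(round(peak_mflops(60e-9), 1))   # about 33.3 under the assumption above
    print(on_chip_bytes())                # 24576, i.e. "more than 24 thousand"
```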
Fig. 6. Minimal processing system with external data RAM and PROM/EPROM.

Fig. 7. TMS320C30 functional block diagram. [Blocks shown: RAM blocks 0 and 1 (1K x 32 each), ROM block (4K x 32), program cache (64 x 32), integer/floating-point multiplier and ALU, 32-bit barrel shifter, extended-precision registers (R0-R7), address generators 0 and 1, auxiliary registers (AR0-AR7), and control registers.]

The architecture of the TMS320C30 is targeted at 60-ns
and faster cycle times. To achieve such high-performance
goals while still providing low-cost system solutions, the
TMS320C30 is designed using Texas Instruments' state-of-the-art 1-µm CMOS process. The TMS320C30's high system
performance is achieved through a high degree of parallelism, the accuracy and precision of its floating-point units,
its on [...] bit format for immediate unsigned-integer operands and a 32-bit
single-precision unsigned-integer format.
The three floating-point formats are assumed to be normalized, thus providing an extra bit of precision. The first
is a 16-bit short floating-point format for immediate floating-point operands, which consists of a 4-bit exponent, 1
sign bit, and an 11-bit fraction. The second is a single-precision format consisting of an 8-bit exponent, 1 sign bit, and
a 23-bit fraction. The third is an extended-precision format
consisting of an 8-bit exponent, 1 sign bit, and a 31-bit fraction.
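
To make the field widths concrete, the sketch below checks that each format adds up to its stated width and decodes a single-precision word. The bit layout used here (two's-complement exponent in the top 8 bits, then the sign bit, then the fraction, with an implied leading bit so that the mantissa reads 01.f when the sign is 0 and 10.f when it is 1, and an exponent of -128 reserved for zero) is our understanding of the TMS320C30 convention and should be treated as an assumption; the names FORMATS and decode_single are hypothetical.

```python
# (exponent bits, fraction bits) for each format; one sign bit is implied.
FORMATS = {"short": (4, 11), "single": (8, 23), "extended": (8, 31)}


def total_width(name):
    """Exponent + sign + fraction width in bits."""
    exp_bits, frac_bits = FORMATS[name]
    return exp_bits + 1 + frac_bits


def decode_single(word):
    """Decode a 32-bit word under the layout assumed in the lead-in."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:              # two's-complement exponent
        exponent -= 0x100
    sign = (word >> 23) & 1
    fraction = word & 0x7FFFFF
    if exponent == -128:              # reserved encoding for zero (assumed)
        return 0.0
    # Implied bit: 01.f for positive values, 10.f (that is, -2 + 0.f) otherwise.
    mantissa = (-2.0 if sign else 1.0) + fraction / float(1 << 23)
    return mantissa * 2.0 ** exponent


if __name__ == "__main__":
    for name in FORMATS:
        print(name, total_width(name))
    print(decode_single(0x00400000))
```

The implied leading bit is what the text means by normalization "providing an extra bit of precision": a 23-bit fraction field carries 24 significant bits.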

The total memory space of the TMS320C30 is 16M (million) x 32 bits. A machine word is 32 bits, and all addressing
is performed by word. Program, data, and I/O space are contained within the 16M-word address space.
RAM blocks 0 and 1 are each 1K x 32 bits. The ROM block
is 4K x 32 bits. Each RAM block and ROM block is capable
of supporting two data accesses in a single cycle. For example, the user may, in a single cycle, access a program word
and a data word from the ROM block.
The separate program, data, and DMA buses allow for parallel program fetches, data reads and writes, and DMA operations. Management of memory resources and busing is
handled by the memory controller. For example, a typical
mode of operation could involve a program fetch from the
on-chip program cache, two data fetches from RAM block
0, and the DMA moving data from off-chip memory to RAM
block 1. All of this can be done in parallel with no impact
on the performance of the CPU.
A 64 x 32-bit instruction cache allows for maximum system performance with minimal system cost. The instruction
cache stores often repeated sections of code. The code may
then be fetched from the cache, thus greatly reducing the
number of off-chip accesses necessary. This allows for code
to be stored off-chip in slower, lower cost memories. Also,
the external buses are freed, thus allowing for their use by
the DMA or other devices in the system.
DMA

The TMS320C30 possesses an on-chip Direct Memory
Access (DMA) controller. The DMA controller is able to perform reads from and writes to any location in the memory
map without interfering with the operation of the CPU. As
a consequence, it is possible to interface the TMS320C30
to slow external memories and peripherals (A/Ds, serial
ports, etc.) without affecting the computational throughput
of the CPU. The result is improved system performance and
decreased system cost.
The DMA controller contains its own address generators,
source and destination registers, and transfer counter.
Dedicated DMA address and data buses allow for operation
with no conflicts between the CPU and DMA controller.
The DMA controller responds to interrupts in a similar
way to the CPU. This ability allows the DMA to transfer data
based upon the interrupts received. Thus I/O transfers that
would normally be performed by the CPU may instead be
performed by the DMA. Again, the CPU may continue processing data while the DMA receives or transmits data.
Peripherals

All peripheral modules are manipulated through memory-mapped registers located on a dedicated peripheral bus.
This peripheral bus allows for the straightforward addition,
removal, and creation of peripheral modules. The initial
TMS320C30 peripheral library will include timers and serial
ports. The peripheral library concept allows Texas Instruments to create new modules to serve a wide variety of
applications. For example, the configuration of the
TMS320C30 in Fig. 7 includes two timers and two serial ports.
Timers: The two timer modules are general-purpose
timer/event counters, with two signaling modes and internal or external clocking.
Available to each timer is an I/O pin that can be used as
an input clock to the timer or as an output signal driven by
the timer. The pin may also be configured as a general-purpose I/O pin.
Serial Ports: The two serial ports are modular and totally
independent. Each serial port can be configured to transfer
8, 16, 24, or 32 bits of data per frame. The clock for each serial
port can originate either internally or externally. An internally generated divide-down clock is provided. The pins of
the serial ports are configurable as general-purpose I/O
pins. A special handshake mode allows TMS320C30s to
communicate over their serial ports with guaranteed
synchronization. The serial ports may also be configured to
operate as timers.

External Interfaces

The TMS320C30 provides two external interfaces: the parallel interface and the I/O interface. The parallel interface
consists of a 32-bit data bus, a 24-bit address bus, and a set
of control signals. The I/O interface consists of a 32-bit data
bus, a 13-bit address bus, and a set of control signals. Both
ports support an external ready signal for wait-state generation and the use of software-controlled wait states.
The TMS320C30 supports four external interrupts, a number of internal interrupts, and a nonmaskable external reset
signal. Two dedicated, general-purpose, external I/O flags,
XF0 and XF1, may be configured as input or output pins
under software control. These pins are also used by the
interlocked instructions to support multiprocessor communication.

Pipelining in the TMS320C30

The operation of the TMS320C30 is controlled by five
major functional units. The five major units and their functions are as follows:
Fetch Unit (F), which controls the program counter
updates and fetches of the instruction words from
memory.
Decode Unit (D), which decodes the instruction word
and controls address generation.
Read Unit (R), which controls the operand reads from
memory.
Execute Unit (E), which reads operands from the register file, performs the necessary operation, and writes
results back to the register file and memory.
DMA Channel (DMA), which reads and writes memory
concurrently with CPU operation.
Each instruction is operated upon by four of these stages;
namely, fetch, decode, read, and execute. To provide for
maximum processor throughput, these units can perform
in parallel, with each unit operating on a different instruction. The overlapping of the fetch, decode, read, and execute operations of different instructions is called pipelining. The DMA controller runs concurrently with these units.
The pipelining of these operations is key to the high performance of the TMS320C30. The ability of the DMA to move
data within the processor's memory space results in an even
greater utilization of the CPU with fewer interruptions of
the pipeline, which inevitably yields greater performance.
The pipeline control of the TMS320C30 allows for an
extremely high-speed execution rate by allowing an effective rate of one execution per cycle. It also manages pipeline conflicts in a way that makes them transparent to the
user.
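
The overlap of the four stages can be pictured with a small Python sketch. It models an ideal pipeline with no conflicts, branches, or wait states, which is a simplifying assumption; the names STAGES, pipeline_schedule, and total_cycles are hypothetical and not taken from TI documentation.

```python
STAGES = ["F", "D", "R", "E"]   # fetch, decode, read, execute


def pipeline_schedule(instructions):
    """Map each cycle to the instruction occupying every stage."""
    schedule = {}
    for issue_cycle, name in enumerate(instructions):
        for depth, stage in enumerate(STAGES):
            schedule.setdefault(issue_cycle + depth, {})[stage] = name
    return schedule


def total_cycles(count):
    """Cycles needed to drain `count` instructions through the ideal pipeline."""
    return count + len(STAGES) - 1 if count else 0


if __name__ == "__main__":
    program = ["I1", "I2", "I3", "I4", "I5"]
    schedule = pipeline_schedule(program)
    for cycle in sorted(schedule):
        row = " ".join(schedule[cycle].get(stage, "--") for stage in STAGES)
        print(cycle, row)
```

Once the pipeline is full, one instruction leaves the execute stage on every cycle, which is the "effective rate of one execution per cycle" described above; the three extra cycles are paid only once, when the pipeline fills.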

While the pipelining of the different phases of an instruction is key to the performance of the TMS320C30, the
designers felt it essential to avoid pipelining the operation
of the multiplier or ALU. By ruling out this additional level
of pipelining, it was possible to greatly improve the processor's usability.

Instructions
The TMS320C30 instruction set is exceptionally well
suited to digital signal processing and other numerically
intensive applications. The TMS320C30 also possesses a full
complement of general-purpose instructions. The instruction set is organized into the following groups:
load and store instructions;
two-operand arithmetic instructions;
two-operand logical instructions;
three-operand arithmetic instructions;
three-operand logical instructions;
parallel operation instructions;
arithmetic/logical instructions with store;
program control instructions;
interlocked operations instructions.
The load and store instructions perform the movement
of a single word to and from the registers and memory.
Included is the ability to load a register conditionally. This
operation is particularly useful for locating the maximum
and minimum of a set of data.
The two-operand arithmetic and logical instructions consist of a complete set of arithmetic instructions. They have
two operands: src and dst for source and destination,
respectively. The src operand may come from memory, a
register, or be part of the instruction word. The dst operand
is always a register. This portion of the instruction set
includes floating-point, integer, and logical operations, support of multiprecision arithmetic, and 32-bit arithmetic and
logical shifts.
The three-operand arithmetic and logical instructions are
a subset of the two-operand arithmetic and logical instructions. They have three operands: two src operands and a
dst operand. The src operands may come from memory or
a register. The dst operand is always a register. These
instructions allow for the reading of two operands from
memory and/or the CPU register file in a single cycle.
The parallel operation instructions allow for a high degree
of parallelism. They support very flexible, parallel floating-point and integer multiplies and adds. They also include the
ability to load two registers in parallel.
The arithmetic/logical and store instructions support a
high degree of parallelism, thus complementing the parallel operation instructions. They allow for the performance
of an arithmetic or logical instruction between a register
and an operand read from memory, in parallel with the storing of a register to memory. They also provide for extremely
rapid operations on blocks of memory.
The program control instructions consist of all those
operations that affect the program flow. This section of the
instruction set includes a set of flexible and powerful constructs that allow for software control of the program flow.
These fall into two main types: repeat modes and branching.
For many algorithms, there is an inner kernel of code
where most of the execution time is spent. The repeat modes
of the TMS320C30 allow for the implementation of zero
overhead looping. Using the repeat modes allows these
time-critical sections of code to be executed in the shortest
possible time. The instructions supporting the repeat
modes are RPTB (repeat a block of code) and RPTS (repeat
a single instruction). Through the use of the dedicated stack pointer, block repeats (RPTBs) may be nested.
The branching capabilities of the TMS320C30 include two
main subsets: standard and delayed branches. Standard
branches, as in any pipelined machine that comprehends
them, empty the pipeline to guarantee correct management of the program counter. This results in a branch
requiring, in the case of the TMS320C30, four cycles to execute. Included in this subset are calls and returns. A standard branch (BR) is illustrated below.

        BR      THREE   ; standard branch.
        MPYF            ; not executed.
        ADDF            ; not executed.
        SUBF            ; not executed.
        AND             ; not executed.

THREE   MPYF            ; fetched 3 cycles after BR is fetched.

Delayed branches do not empty the pipe, but rather
guarantee that the next three instructions will be fetched
before the program counter is modified by the branch. The
result is a branch that only requires a single cycle. Every
delayed branch has a standard branch counterpart. A
delayed branch (BRD) is illustrated below.

        BRD     THREE   ; delayed branch.
        MPYF            ; executed.
        ADDF            ; executed.
        SUBF            ; executed.
        AND             ; not executed.

THREE   MPYF            ; fetched after SUBF fetched.

The combination of the repeat modes, standard branches,
and delayed branches provides the user with a set of programming constructs which are well suited to a wide range
of performance requirements.
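
The difference between the two branch types can be reproduced with a toy interpreter. The Python sketch below is only a behavioral model under stated assumptions: the program is a list of (mnemonic, target) pairs, the cycle costs are the four-cycle and single-cycle figures quoted above, every other instruction is taken to cost one cycle, and the names run, cycles, and COST are hypothetical.

```python
COST = {"BR": 4, "BRD": 1}      # cycle costs quoted in the text; others cost 1


def run(program, labels, max_steps=50):
    """Return the indices of the instructions that actually execute."""
    trace, pc, pending = [], 0, None      # pending = [slots_left, target]
    while pc < len(program) and len(trace) < max_steps:
        op, target = program[pc]
        trace.append(pc)
        pc += 1
        if pending is not None:           # count down the three delay slots
            pending[0] -= 1
            if pending[0] == 0:
                pc, pending = pending[1], None
        if op == "BR":                    # standard branch: redirect at once
            pc = labels[target]
        elif op == "BRD":                 # delayed branch: redirect after 3 more
            pending = [3, labels[target]]
    return trace


def cycles(program, trace):
    """Total cycle count of an executed trace under the COST table."""
    return sum(COST.get(program[i][0], 1) for i in trace)


if __name__ == "__main__":
    body = [("MPYF", None), ("ADDF", None), ("SUBF", None),
            ("AND", None), ("MPYF", None)]
    labels = {"THREE": 5}
    for branch in ("BR", "BRD"):
        program = [(branch, "THREE")] + body
        trace = run(program, labels)
        print(branch, [program[i][0] for i in trace], cycles(program, trace))
```

In this model both sequences take the same number of cycles, but the delayed version completes three additional useful instructions in that time, which is the benefit the delayed branch is meant to provide.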
The program control instructions also include conditional calls and returns. The decrement and branch conditionally instruction allows for efficient loop control by
combining the comparison of a loop counter to zero with
the check of condition flags, e.g., floating-point overflow.
The condition codes available include unsigned and signed
comparisons, comparisons to zero, and comparisons based
upon the status of individual condition flags. These conditions may be used with any of the conditional instructions.

The interlocked operations instructions support multiprocessor communication. Through the use of external signals, these instructions allow for powerful synchronization
mechanisms, such as semaphores, to be implemented. The
interlocked operations use the two external flag pins, XF0
and XF1. XF0 signals an interlocked-operation request and
XF1 acts as an acknowledge signal for the requested interlocked operation. The interlocked operations include interlocked loads and stores. When an interlocked operation is
performed, the external request and acknowledge signals
can be used to arbitrate between multiple processors sharing memory, semaphores, or counters.

DEVELOPMENT AND SUPPORT TOOLS

Digital signal processors are essentially application-specific microprocessors (or microcomputers). Like any other
microprocessor, no matter how impressive the performance of the processor or the ease of interfacing, without
good development tools and technical support, it is very
difficult to design it into the system. In developing an application, problems are encountered and questions are asked.
Oftentimes the tools and vendor support provided to the
designer are the difference between the success and failure
of the project.
The TMS320 family has a wide range of development tools
available [25]. These tools range from very inexpensive evaluation modules for application evaluation and benchmarking purposes, assembler/linkers, and software simulators, to full-capability hardware emulators. A brief summary of these support tools is provided in the succeeding
subsections.

Software Tools

Assembler/linkers and software simulators are available
on PC and VAX for users to develop and debug TMS320 DSP
algorithms. Their features are described as follows:
Assembler/Linker: The Macro Assembler translates
assembly language source code into executable object
code. The Linker permits a program to be designed and
implemented in separate modules that will later be linked
together to form the complete program.
Simulator: The Simulator simulates operations of the
device in software to allow program verification and debug.
The simulator uses the object code produced by the Macro
Assembler/Linker.
C Compiler: The C Compiler is a full implementation of
the standard Kernighan and Ritchie C as defined in The C
Programming Language [28]. The compiler supports the
insertion of assembly language code into the C source code.
The user may also write functions in assembly language,
and then call these functions from the C source. Similarly,
C functions may be called from assembly language.
Variables defined in the C source may be accessed in
assembly language modules and vice versa. The result is a
compiler that allows the user to tailor the amount of high-level programming versus the amount of assembly language according to his application. The C compiler is supported on the TMS320C25 and the TMS320C30.

Hardware Tools

Evaluation modules and emulation tools are available for
in-circuit emulation and hardware program debugging for
developing and testing DSP algorithms in a real product
environment.
Evaluation Module (EVM): The EVM is a stand-alone single-board module that contains all of the tools necessary
to evaluate the device as well as provide basic in-circuit
emulation. The EVM contains a debug monitor, editor,
assembler, reverse assembler, and software communications to a host computer or a line printer.
Software Development System (SWDS): The Software
Development System is a PC plug-in card with functionality similar to that of the EVM.
Emulator (XDS): The eXtended Development System provides full-speed in-circuit emulation with real-time hardware breakpoint/trace and program execution capability
from target memory. By setting breakpoints based on internal conditions or external events, execution of the program
can be suspended and the XDS placed into the debug mode.
In the debug mode, all registers and memory locations can
be inspected and modified. Full-trace capabilities at full
speed and a reverse assembler that translates machine code
back into assembly instructions are included. The XDS system is designed to interface with either a terminal or a host
computer. In addition to the above design tools, other
development support is available [25].
APPLICATIONS

The TMS320 is designed for real-time DSP and other computation-intensive applications [4]. In these applications,
the TMS320 provides an excellent means for executing signal processing algorithms such as fast Fourier transforms
(FFTs), digital filters, frequency synthesis, correlation, and
convolution. The TMS320 also provides for more general-purpose functions via bit-manipulation instructions, block
data move capabilities, large program and data memory
address spaces, and flexible memory mapping.
To introduce applications performed by the TMS320, digital filters will be used as examples. The remaining portion
of this section will briefly cover applications, and conclude
by showing some benchmarks.
Digital Filtering

As discussed several times in this paper, the FIR filter is
simply the sum of products in a sampled data system. This
was shown in (1). A simple implementation of the FIR filter
uses the MACD instruction (multiply/accumulate and data
move) for each filter tap, with the RPT/RPTK instruction
repeating the MACD for each filter tap. As we saw earlier,
a 256-tap FIR filter can be implemented by using the following two instructions:

        RPTK    255
        MACD    *-,COEFFP

In this example, the coefficients may be stored anywhere
in program memory (reconfigurable on-chip RAM, on-chip
ROM, or external memories). When the coefficients are
stored in on-chip ROM or externally, the entire on-chip data
RAM may be used to store the sample sequence. This allows
filters of up to 512 taps to be implemented. Execution of the
filter will be at full speed or 100 ns per tap as long as the
memory supports full-speed execution (either on-chip RAM
or high-speed external RAM).
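
For readers who prefer a high-level view, the following Python sketch mirrors the effect of the repeated multiply/accumulate with data move: each new sample shifts the delay line by one position and the output is the sum of products. It is not a cycle-accurate model of MACD, and the names fir_step and max_sample_rate_hz are hypothetical. The second function simply converts the 100 ns per tap figure into an upper bound on the sample rate, ignoring I/O and loop setup overhead.

```python
def fir_step(delay_line, coeffs, sample):
    """Shift one sample into the delay line and return the filter output."""
    delay_line.insert(0, sample)      # newest sample at index 0
    delay_line.pop()                  # oldest sample falls off the end
    return sum(c * x for c, x in zip(coeffs, delay_line))


def max_sample_rate_hz(taps, ns_per_tap=100):
    """Upper bound on sample rate when each tap costs ns_per_tap."""
    return 1e9 / (taps * ns_per_tap)


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4]
    delay = [0] * len(coeffs)
    # An impulse at the input reproduces the coefficients at the output.
    print([fir_step(delay, coeffs, s) for s in [1, 0, 0, 0, 0]])
    print(max_sample_rate_hz(256))
```

For 256 taps the bound works out to roughly 39 kHz, so a practical rate somewhat below that is to be expected once per-sample overhead is included.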
Up to this point, it has been assumed that the filter coefficients are fixed from sample to sample. If the coefficients
are adapted or updated with time, such as in adaptive filters
for echo cancelation [4], [20], then the DSP algorithm
requires a greater computational capacity from the processor. The requirement to adapt each of the coefficients,
usually with each sample, is accomplished by three instructions (MPYA or MPYS, ZALR, and SACH) on the TMS320C25
[16]. A means of adapting the coefficients is the least-mean-square (LMS) algorithm given by the following equation:

$$b_k(i+1) = b_k(i) + 2\beta\,[e(i)\,x(i-k)]$$

where $b_k(i+1)$ is the weighting coefficient for the next sample period, $b_k(i)$ is the weighting coefficient for the present
sample period, $\beta$ is the gain factor or adaptation step size,
$e(i)$ is the error function, and $x(i-k)$ is the input of the filter.
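
The update rule can be sketched in a few lines of Python. This is a floating-point illustration rather than the fixed-point TMS320 implementation; the names lms_update and lms_filter_step are hypothetical, and the sign convention for the error (reference minus filter output) is an assumption chosen so that the update converges.

```python
def lms_update(b, x, e, beta):
    """Apply b_k(i+1) = b_k(i) + 2*beta*e(i)*x(i-k) to every tap.

    b[k] holds b_k(i) and x[k] holds x(i-k).
    """
    errf = 2.0 * beta * e             # common factor, computed once per sample
    return [bk + errf * xk for bk, xk in zip(b, x)]


def lms_filter_step(b, x, reference, beta):
    """Filter one sample, form the error, and adapt the coefficients."""
    y = sum(bk * xk for bk, xk in zip(b, x))
    e = reference - y                 # assumed sign convention
    return lms_update(b, x, e, beta), e


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    unknown = [0.5, -0.25]            # system to be identified (made up)
    b, x = [0.0, 0.0], [0.0, 0.0]
    for _ in range(2000):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]
        reference = sum(h * xk for h, xk in zip(unknown, x))
        b, e = lms_filter_step(b, x, reference, beta=0.05)
    print(b)
```

Computing errf once per sample period and reusing it for every tap is what reduces the per-tap work to a single multiply/accumulate.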
In an adaptive filter, it is important to update the coefficients $b_k(i)$ in order to minimize the error function $e(i)$,
which is the difference between the output of the filter and
a reference signal. Quantization errors are critical to the
performance of the filter when updating the coefficients
and can be minimized if the result is obtained by rounding
rather than truncating. For each coefficient in the filter at
a given point in time, the factor $2\beta e(i)$ is a constant. This
factor can then be computed once and stored in the T register for each of the updates. Thus the computational
requirement has become one multiply/accumulate plus
rounding. Without the new instructions, the adaptation of
each coefficient is five instructions corresponding to five
clock cycles. This is shown in the following instruction
sequence:

        LRLK    AR2,COEFFD      ; LOAD ADDRESS OF COEFFICIENTS.
        LRLK    AR3,LASTAP      ; LOAD ADDRESS OF DATA SAMPLES.
        LARP    AR2
        LT      ERRF            ; errf = 2*B*e(i)

        ZALH    *,AR3           ; ACC = bk(i)*2**16
        ADD     ONE,15          ; ACC = bk(i)*2**16 + 2**15
        MPY     *-,AR2
        APAC                    ; ACC = bk(i)*2**16
                                ;       + errf*x(i-k) + 2**15
        SACH    *+              ; SAVE bk(i+1).

When the MPYA and ZALR instructions are used, the
adaptation reduces to three instructions corresponding to
three clock cycles, as shown in the following instruction
sequence. Note that the processing order has been slightly
changed to incorporate the use of the MPYA instruction.
This is due to the fact that the accumulation performed by
the MPYA is the accumulation of the previous product.

        LRLK    AR2,COEFFD      ; LOAD ADDRESS OF COEFFICIENTS.
        LRLK    AR3,LASTAP      ; LOAD ADDRESS OF DATA SAMPLES.
        LARP    AR2
        LT      ERRF            ; errf = 2*B*e(i)

        ZALR    *,AR3           ; ACC = bk(i)*2**16 + 2**15
        MPYA    *-,AR2          ; ACC = bk(i)*2**16
                                ;       + errf*x(i-k) + 2**15
                                ; PREG = errf*x(i-k+1)
        SACH    *+              ; SAVE bk(i+1).

The adaptive filter coefficient update can further be simplified using the TMS320C30 [27] as shown below. The first
instruction defines the number of times to repeat the kernel. The second instruction is the repeat-block instruction
(RPTB). The RPTB instruction allows the iterations of the kernel to be performed with zero overhead looping. The kernel
assumes that the error term is stored in register R0. It is
important to note that all of the calculations are performed
in floating-point arithmetic. The MPYF3 is a three-operand
floating-point multiply of the input sample x(i - k), which
is stored in memory, by the error term errf. The next step
is a three-operand floating-point add (ADDF3) of the change
in the filter tap to the filter tap in parallel with the store (STF)
of the previously updated filter tap. That is, the store (STF)
is to be performed in parallel with ADDF3. Thus the number
of cycles for a floating-point adaptation is only two.

        LDI     N,RC                ; load length N into block repeat
                                    ; counter
        RPTB    adapt               ; repeat the adaptation loop N+1
                                    ; times
        MPYF3   *++AR0(1),R0,R1     ; errf * x(i-k) -> R1
adapt:  ADDF3   *+AR1(1),R1,R2      ; b(k,i) + errf * x(i-k) -> R2
     || STF     R2,*AR1++(1)        ; R2 -> b(k-1,i)

Since we have discussed the application of digital filtering, we can now describe several applications in the areas
of telecommunications, graphics/image processing, high-speed control, instrumentation, and numeric processing,
and then conclude this section with several benchmarks.
If more detail is needed on any of these applications, the
reader is referred to [4].

Telecommunications Applications


Many aspects of the telecommunications network can
take advantage of the TMS320. As telecommunications
evolves more toward an all-digital network, DSP will become
even more utilized [23]. Several typical uses of the TMS320
are discussed.
Echo Canceler: In echo cancellation [4], [20], an adaptive
FIR filter performs the modeling routine and signal modifications to adaptively cancel the echo caused by the
impedance mismatches in the telephone transmission lines.
For this application, a large on-chip RAM of 544 words and
on-chip ROM of 4K words on the TMS320C25 provides for
a 256-tap adaptive filter (32-ms echo cancellation) to be executed in a single chip without external data or program
memory.
High-Speed Modems: The TMS320 can perform numerous functions such as modulation/demodulation, adaptive
equalization, and echo cancellation [21], [22]. For lower
speed modems, such as Bell 212A and V.22 bis modems, the
TMS320C17 provides the most cost-effective single-chip
solution to these applications. For higher speed modems,
such as the V.32, requiring more processing power and
multiprocessing capabilities, the TMS320C25 and TMS320C30 are the designer's choice.
Voice Coding: Voice-coding techniques [3], [4], such
as full-duplex 32-kbit/s ADPCM (CCITT G.721), CVSD,
16-kbit/s subband coders, and LPC, are frequently used in
voice transmission and storage. Arithmetic speed, normalization, and the bit-manipulation capability of the
TMS320 provide for implementation of these functions,
usually in a single chip. For example, the TMS320C17 can
be used as a single-chip ADPCM [4], subband [4], or LPC [4]
coder. An application of voice coding is an ADPCM transcoder implemented in half-duplex on a single TMS320C17
or full-duplex on a TMS320C25 for telecommunication multiplexing applications. Another example is a secure-voice
communication system, requiring voice coding, as well as
data encryption and transmission over a public-switched
network via a modem; the TMS320C25 offers an ideal solution.

high data-transfer rate (ten million 16-bit words per second). Both devices can be used in dosed-loop systems for
control signal conditioning, filtering, high-speed computing, and multichannel multiplexing capabilities. The following demonstrates two typical control applications:
Disk Control: Digital filtering in a dosed-loop actuation
mechanism positions the read/write heads over the disk
surface. Supplemented with many general-purpose features, the TMS320 can replace costly bit-slice/custom/analog solutions to perform such tasks as compensation, filtering, fine/coarse tuning, and other signal conditioning
algorithms.
Robotics: Digital Signal processing and bit-manipulation
power, coupled with host interface, allow the TMS320C25
to be useful in robotics control [24J. The TMS320C25 can
replace both the digital controllers and analog signal processing hardware for communication to a central host prOM
cessor and for the performance of numerically intensive
control functions.

Graphics/Image Processing Applications

Numeric and array processing applications benefit from
TMS320 performance. High throughput resulting from features, such as a fast cycle time and anon-chip hardware
multiplier, combined with multiprocessi ng capabilities and
data memory expansion, provide for a low·cost, easy·to-use
replacement for a typical bit-slice solution. The TMS320C30's floating-point precision, high throughput, and
interface flexibility are excellent for this application.

In graphics and image processing applications [4J, the
ability to interface with a host processor is important. Both
the TMS320C30 and the TMS320C25 multiprocessor interface enable them to be used in a variety of host/coprocessor
configurations [4J. Graphics and image processing applications can use the large directly addressable external data
space and global memory capability to allow graphical
images in memory to be shared with a host processor, thus
minimizing unnecessary data transfers. The indexed indirect addressing modes allow matrices to be processed rowby-row when performing matrix multiplication for threedimensional image rotations, translations, and scaling.
The TMS320C30 has a number of features that support
graphics and image processing extremely well. The floating-point capabilities allow for extremely precise computation of perspective transformations. They also support
more sophisticated algorithms such as shading and hidden
line removal, operations which are computationally intensive.
The large address space allows for straightforward
addressing of large images or displays. The flexible addressing registers, coupled with the integer multiply, support
powerful addressing of multiple-dimensional arrays. Vector-oriented instructions allow the user to efficiently
manipulate large blocks of memory. Finally, the on-chip
DMA controller allows the user to easily overlap the processing of data with its I/O.

Instrumentation
Instrumentation, such as spectrum analyzers and various
high-speed/high-precision instruments, often requires a
large data memory space and the high performance of a
digital signal processor. The TMS320C25 and TMS320C30
are capable of performing very long-length FFTs and generating precision functions with minimal external hardware.
Numeric Processing

TMS320 Benchmarks
To complete the discussion on the applications that the
TMS320 can perform, we will provide some benchmarks.
The TMS320 has demonstrated impressive benchmarks in
performing some of the common DSP routines and system
applications. Table 5 shows typical TMS320 benchmarks [4].

Table 5. TMS320 Family Benchmarks

DSP Routines/Applications                  First        Second       Third
                                           Generation   Generation   Generation
FIR filter tap                             400 ns       100 ns       60 ns
256-tap FIR sample rate                    9.25 kHz     37 kHz       >60 kHz
LMS adaptive FIR filter tap                700 ns       400 ns       180 ns
256-tap adaptive FIR filter sample rate    5.4 kHz      9.5 kHz      >20 kHz
Bi-quad filter element (five multiplies)   2 µs         1 µs         360 ns
Echo canceler (single chip)                8 ms         32 ms        >64 ms
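
The per-tap times and the 256-tap sample rates in Table 5 are related by simple arithmetic: a filter cannot run faster than one output per (taps x tap time). The sketch below, with the hypothetical helper `max_sample_rate_khz`, computes that upper bound. The assumption that the quoted rates fall slightly below the bound because of per-sample overhead is an interpretation, not a statement from the source.

```python
def max_sample_rate_khz(tap_time_ns, taps=256):
    """Upper bound on the sample rate (kHz) if every tap costs tap_time_ns."""
    period_ns = tap_time_ns * taps        # time to compute one output sample
    return 1e6 / period_ns                # 1e9 ns per second / 1e3 Hz per kHz

# (tap time in ns, sample rate quoted in Table 5 in kHz; ">60" entered as 60)
TABLE_5_FIR = {"first": (400, 9.25), "second": (100, 37.0), "third": (60, 60.0)}

bounds = {gen: max_sample_rate_khz(tap_ns)
          for gen, (tap_ns, _quoted) in TABLE_5_FIR.items()}
```

For the first generation the bound is 1e6 / (400 x 256) kHz, a little under 10 kHz, which is consistent with the 9.25 kHz entry.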
High-Speed Control
High-speed control applications [4], [24] use the
TMS320C17 and TMS320C25 general-purpose features for
bit-test and logical operations, timing synchronization, and

SUMMARY

This paper has discussed characteristics of digital signal
processing and how these characteristics have influenced
the architectural design of the Texas Instruments TMS320
family of digital signal processors. Three generations of the
TMS320 family were covered, and the support tools necessary to develop end applications were briefly reviewed.
The paper concluded with an overview of digital signal processing applications using these devices.
REFERENCES

[1] L. R. Rabiner and B. Gold, Theory and Application of Digital
Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[2] A. V. Oppenheim, Ed., Applications of Digital Signal Processing.
Englewood Cliffs, NJ: Prentice-Hall, 1978.
[3] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech
Signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.
[4] K. Lin, Ed., Digital Signal Processing Applications with the
TMS320 Family. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[5] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing.
Englewood Cliffs, NJ: Prentice-Hall, 1975.
[6] C. Burrus and T. Parks, DFT/FFT and Convolution Algorithms.
New York, NY: Wiley, 1985.
[7] T. Parks and C. Burrus, Digital Filter Design. New York, NY:
Wiley, 1987.
[8] J. Treichler, C. Johnson, and M. Larimore, A Practical Guide
to Adaptive Filter Design. New York, NY: Wiley, 1987.
[9] P. Papamichalis, Practical Approaches to Speech Coding.
Englewood Cliffs, NJ: Prentice-Hall, 1987.
[10] R. Morris, Digital Signal Processing Software. Ottawa, Ont.,
Canada: DSPS Inc., 1983.
[11] K. McDonough, E. Caudel, S. Magar, and A. Leigh, "Microcomputer with 32-bit arithmetic does high-precision number
crunching," Electronics, pp. 105-110, Feb. 24, 1982.
[12] S. Magar, E. Caudel, and A. Leigh, "A Microcomputer with
digital signal processing capability," in 1982 Int. Solid State
Conf. Dig. Tech. Pap., pp. 32-33, 284, 285.
[13] First Generation TMS320 User's Guide. Houston, TX: Texas
Instruments Inc., 1987.
[14] TMS320 First-Generation Digital Signal Processors Data Sheet.
Houston, TX: Texas Instruments Inc., 1987.
[15] TMS32020 User's Guide. Houston, TX: Texas Instruments
Inc., 1985.
[16] TMS320C25 User's Guide. Houston, TX: Texas Instruments
Inc., 1986.
[17] TMS32011 User's Guide. Houston, TX: Texas Instruments
Inc., 1985.
[18] H. Cragon, "The elements of single-chip microcomputer
architecture," Comput. Mag., vol. 13, no. 10, pp. 27-41, Oct.
1980.
[19] S. Rosen, "Electronic computers: A historical survey," Comput. Surv., vol. 1, no. 1, Mar. 1969.
[20] M. Honig and D. Messerschmitt, Adaptive Filters. Dordrecht, The Netherlands: Kluwer, 1984.
[21] R. Lucky et al., Principles of Data Communication. New York,
NY: McGraw-Hill, 1965.
[22] P. Van Gerwen et al., "Microprocessor implementation of
high speed data modems," IEEE Trans. Commun., vol. COM-25, pp. 238-249, 1977.
[23] M. Bellanger, "New applications of digital signal processing
in communications," IEEE ASSP Mag., pp. 6-11, July 1986.
[24] Y. Wang, M. Andrews, S. Butner, and G. Beni, "Robot-controller system," in Proc. Symp. on Incremental Motion Control Systems and Devices, pp. 17-26, June 1986.
[25] TMS320 Family Development Support Reference Guide.
Houston, TX: Texas Instruments Inc., 1986.
[26] R. Simar, T. Leigh, P. Koeppen, J. Leach, J. Potts, and D. Blalock, "A 40 MFLOPS digital signal processor: The first supercomputer on a chip," in Proc. IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing, Apr. 1987.
[27] TMS320C30 User's Guide. Houston, TX: Texas Instruments
Inc., 1987.
[28] B. Kernighan and D. Ritchie, The C Programming Language.
Englewood Cliffs, NJ: Prentice-Hall, 1978.


The TMS320C30 Floating-Point
Digital Signal Processor

Panos Papamichalis
Ray Simar, Jr.
Digital Signal Processor Products-Semiconductor Group
Texas Instruments

Reprinted from
IEEE MICRO MAGAZINE
Vol. 8, No. 6, December 1988

Digital signal processors have significantly impacted the way we bring
real-time implementations of sophisticated DSP algorithms to life.
What was once only a laboratory curiosity that required large computers or specialized, bulky, and expensive hardware is now incorporated into low-cost consumer products. The rapid advancement of programmable DSPs since
their commercial introduction in the early 1980s lets us satisfy the needs of very
demanding applications. Implementation of basic DSP functions, such as digital
filters and fast Fourier transforms, has been integrated into advanced system
solutions involving speech algorithms, image processing, and control applications. The variety of the applications increases every day as researchers,
developers, and entrepreneurs discover new areas in which DSP devices can be
used. At the same time, the design of new devices incorporates features that make
such implementations easier.
The Texas Instruments family of TMS320 DSPs evolved with the expanding
needs of the DSP applications and currently encompasses over 17 devices. The
TMS320 family consists of three generations of devices. The first two generations are 16-bit, fixed-point-arithmetic devices, while the third one, represented
by the TMS320C30 and explained in detail here, is a 32-bit, floating-point
device. Architecturally, the TMS320 family, like most DSP devices, relies on
multiple Harvard buses. In the first two generations, we expanded the basic
Harvard architecture to permit communication between the program and data
spaces. In the third generation, we unified the two spaces to form an organization
that encompasses the advantages of both the Harvard and the von Neumann
architectures.

Overview of the TMS320C30
The 320C30 is a fast processor (16.7 million instructions per second for an
instruction cycle time of 60 nanoseconds) with a large memory space (16 million
32-bit words) and floating-point-arithmetic capabilities. This last feature is a
major trend in new DSP devices, which was developed to answer the need for
quicker, more accurate solutions to numerical problems. DSP algorithms, being
very intensive numerically, cause a designer to worry about overflows and the
accuracy of results. The introduction of floating-point capabilities eliminates
these difficulties.

©1989 IEEE. Reprinted, with permission, from IEEE MICRO MAGAZINE;
Vol. 8, No. 6, pp. 10-28; December 1988

In the 320C30, a chip design with 1-µm geometries
produces instruction cycle times lower than those achieved
with the fixed-point devices of the first two generations. In
addition, the design produces a controlled increase in die
size that results more from the extended on-chip memory
spaces than from the floating-point capabilities.
The pipelined architecture of the 320C30 permits the
higher throughput achieved by the device, as we explain
later. Yet, programmers do not have to worry about the
pipeline when writing the code. We can describe the design
philosophy of the 320C30 (as well as all the other devices
in the TMS320 family) as an "interlocked" or "hidden-pipeline" approach. When writing the program, programmers can assume that the result of any instruction will be
available for the next instruction. Most of the instructions
execute in one machine cycle. If a conflict arises between
executing an instruction in one cycle and having the data
available for the next instruction, the device automatically
inserts the necessary delay to eliminate the conflict. Since
this delay could result in loss of performance, we provide
development tools that identify where such conflicts occur.
With this data, programmers can rearrange and optimize
code.
Many applications, such as graphics and image processing, are difficult to implement on the earlier DSP devices
because they require a large memory space. To satisfy this
need, the 320C30 provides a total memory space of 16
million 32-bit words, a space several orders of magnitude
larger than that of the fixed-point devices. Furthermore, it contains significantly increased on-chip memory: six thousand 32-bit words of RAM and ROM. The desire to have
a device capable of offering system-level solutions to the
implemented algorithms guided the design decision to
increase on-chip memory. In other words, the 320C30
attempts to offer the capability of implementing an algorithm with as little peripheral circuitry as possible.
Along the same lines, the 320C30 contains a peripheral
bus on which on-chip peripherals can be attached using a
memory-mapped approach. Currently available peripherals include two serial ports, two timers, and a DMA
controller. The modularity of the design permits easy
change, addition, or deletion of peripherals to accommodate different needs. For instance, if a µ-law-to-linear
format converter or a gate array is more important than one
of the timers for certain applications, a user can make the
change without impacting the core of the device.
As the power of the DSP devices increases, so does the
sophistication of the algorithms that are implemented. The
implication is that constructing and debugging an algorithm at the assembly-language level becomes a more and
more tedious task. To address that problem, we provide the
320C30 development tools, which include a high-level-language compiler and a DSP operating system. The extended memory space, the software stack, and the large on-chip register file also facilitate such a development. We've
already introduced a C compiler and announced an Ada
compiler. We expect compiler availability to change significantly the way DSP algorithms are ported to DSP
devices. With these tools, programmers can develop the
algorithms on large computers, requiring at most only
selective optimization when they incorporate the algorithm on the 320C30.
Here, we describe the 320C30 architecture in detail,
discussing both the internal organization of the device and
the external interfaces. We also explain the pipeline structure, addressing software-related issues and constructs,
and examine the development tools and support. Finally,
we present examples of applications.

Architecture of the 320C30
Studying the architecture of the device helps in understanding how the different components contribute toward
a high-throughput system. The interaction and the efficient
use of the parts can contribute to very effective programming. Another very important aspect to consider is the
system cost of the application. We designed the device to
incorporate on-chip features that minimize the amount and
the cost of external logic, thus leading to very compact and
cost-effective solutions. These advantages become explicit when looking at the architecture in detail. The internal structure of the 320C30, as shown in Figure 1, consists
of the
• on-chip memory and cache,
• CPU with register file,
• peripheral bus and peripherals, and
• interconnecting buses.
See Figure 2 for the die photograph. To interface with
the external world, the 320C30 provides pins corresponding to
• two buses (primary and expansion),
• two serial ports and two timers,
• four external interrupt signals,
• two external flags, and
• hold and hold-acknowledge signals.
In addition, other pins exist for address and data strobes,
power, and so on.
The overall architecture of the device is a Harvard type
in the sense that internally and externally it has multiple
buses to access program instructions and data and to perform
DMA transfers. However, it also has a von Neumann flavor
since the memory space is unified, and there is no separation of program and data spaces. As a result, the user can
choose to locate programs and data at any desired location.

Figure 1. Block diagram of the TMS320C30 architecture.

Some of the major features of the 320C30 are:
• a 60-ns cycle time that results in execution of over 16
million instructions per second (MIPS) and over 33 million
floating-point operations per second (Mflops);
• 32-bit data buses and 24-bit address buses for a 16M-word overall memory space;
• dual-access, 4K × 32-bit on-chip ROM and 2K × 32-bit on-chip RAM;
• a 64 × 32-bit program cache;
• a 32-bit integer/40-bit floating-point multiplier and
ALU;
• eight extended-precision registers, eight auxiliary
registers, and 12 control and status registers;
• generally single-cycle instructions;
• integer, floating-point, and logical operations;
• two- and three-operand instructions;
• an on-chip DMA controller; and
• fabrication in 1-µm CMOS technology and packaging in a 180-pin package.
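
The headline figures in the feature list follow from two parameters, the 60-ns cycle time and the 24-bit address width. The short sketch below reproduces them. The factor of two floating-point operations per cycle (one multiply and one add in parallel) is an assumption made here to match the "over 33 Mflops" figure; the constant names are illustrative only.

```python
CYCLE_TIME_NS = 60        # instruction cycle time from the feature list
ADDRESS_BITS = 24         # width of the address buses

# One instruction per cycle: 1000 ns per microsecond / 60 ns per instruction.
mips = 1e3 / CYCLE_TIME_NS

# Assumption: a multiply and an add can complete in the same cycle,
# giving two floating-point operations per instruction cycle.
FLOPS_PER_CYCLE = 2
mflops = FLOPS_PER_CYCLE * mips

# Each address selects one 32-bit word.
address_space_words = 2 ** ADDRESS_BITS
```

The results are about 16.7 MIPS, about 33.3 Mflops, and 16,777,216 words, which is the 16M-word space quoted above.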
Memory organization. The 320C30 provides 4K 32-bit words of on-chip ROM and 2K 32-bit words of on-chip
RAM. The on-chip ROM is mapped into the first 4K of the
overall memory map; it is accessed when the processor
operates in the microcomputer mode. Location 0 of the
memory map holds the reset vector, and adjacent locations
hold other interrupt vectors. In microprocessor mode, the
reset vector resides in external memory, and on-chip ROM
is not accessed. The 2K on-chip RAM consists physically
of two segments of 1K words each. These two segments of
RAM are mapped into adjacent sections of the memory.
Figure 3 shows the arrangement of the on-chip memory, as well as the cache, buses, and two external
interfaces/buses, which we examine later.

Figure 2. Die photograph of the 320C30.

Figure 3. On-chip memory, cache, and buses.

The internal memory (both ROM and RAM) supports
two accesses for reads and/or writes in one cycle. This key
feature permits high throughput and ease of programming,
since it makes possible three-operand instructions with
two operands residing in the memory. Notice that, to
support this feature, we include two buses dedicated to data
addresses (DADDR1, DADDR2) and one bus to carry the
data (DDATA). There are also separate program buses,
PDATA and PADDR.
The address buses are 24 bits wide, indicating that the
overall memory space is 16 million (32-bit) words. We
believe this large space will facilitate implementation of
algorithms in image processing applications that often
require large amounts of memory. The unified memory
space offers flexibility in placing program and data, and it
also lets the user trade program space against data space
to make the best use of the available memory.
An important addition to the architecture is the 64-word
instruction cache. To reduce the overall system cost of
applications, system designers often use slower (and
cheaper) external memories, a tactic that could slow down
the processor and degrade the performance. The instruction cache addresses this problem by storing on-chip instructions that have been fetched previously. Its main
advantage becomes obvious when loops must be executed.
In this case, the first time the instructions are fetched, they
are also stored in the cache. Any subsequent execution of
the loop does not access external memory but fetches
instructions from the cache, resulting in higher speed and
making the external buses available for data transfers.
The cache is segmented into two sections of 32 words
each that are transparent to users. A user can, however,
control the operation of the cache by manipulating three
control bits that are contained in the status register of the
CPU. Each control bit is dedicated to a specific operation:
cache enable/disable, cache freeze, and cache clear. When
a cache miss occurs, that is, when the next instruction is not
included in the cache, the instruction is brought in and also
stored in the cache. The two cache sections are updated on
a least recently used basis.
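
The cache behavior described above can be illustrated with a toy model. The sketch below assumes that each 32-word section holds one aligned 32-word block of memory and tracks which words of that block are present; this organization, and the class name `InstructionCache`, are assumptions for illustration, since the text specifies only the two sections and the least-recently-used replacement.

```python
class InstructionCache:
    """Toy model: two 32-word sections, replaced least-recently-used first."""
    SEGMENT_WORDS = 32
    NUM_SEGMENTS = 2

    def __init__(self):
        # Each entry is (block_tag, set_of_present_word_offsets);
        # the most recently used section is kept at the end of the list.
        self.segments = []
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        tag, offset = divmod(address, self.SEGMENT_WORDS)
        for i, (seg_tag, present) in enumerate(self.segments):
            if seg_tag == tag:
                # Mark this section as most recently used.
                self.segments.append(self.segments.pop(i))
                if offset in present:
                    self.hits += 1
                else:
                    present.add(offset)   # word miss: bring in and store
                    self.misses += 1
                return
        # Section miss: evict the least recently used section if both are in use.
        if len(self.segments) == self.NUM_SEGMENTS:
            self.segments.pop(0)
        self.segments.append((tag, {offset}))
        self.misses += 1

# A 40-instruction loop starting at word address 0x100, executed 10 times.
cache = InstructionCache()
for _ in range(10):
    for address in range(0x100, 0x100 + 40):
        cache.fetch(address)
```

In this model the first pass through the loop misses on every instruction and the remaining nine passes hit entirely in the cache, which is the effect the text describes for loops that fit in the 64 words.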
CPU organization. The CPU consists of the ALU
(arithmetic logic unit), the hardware multiplier, and the
register file. These units are shown in Figure 4.
The register file consists of
• eight 40-bit-wide, extended-precision registers R0
through R7,
• eight 32-bit auxiliary registers AR0 through AR7,
and
• twelve 32-bit control registers.
The extended-precision registers function as accumulators and can handle both floating-point and integer numbers. When they are used for floating-point numbers, the
top eight bits represent the exponent and the bottom 32 bits
the mantissa of the number. In their integer format, registers R0 through R7 use only their bottom 32 bits, keeping
the top 8 bits unchanged in any integer or logical operation.
The eight auxiliary registers AR0 through AR7 can
function as memory pointers in indirect addressing, as loop
counters, or as general-purpose registers in integer arithmetic or logical operations. Associated with these registers
are two auxiliary register arithmetic units (ARAU) that
generate two memory addresses in parallel for the instructions that need them. The flexibility of indirect addressing
increases even further when two index registers are used in
conjunction with the auxiliary registers, as we discuss
later.
The register file contains 12 control registers designated
for specific functions. If the control registers are not used
for these functions, they can be treated as general-purpose
registers in integer arithmetic and logical operations.
Examples of such control registers are the
• status register,
• index registers,
• stack pointer,
• interrupt mask and interrupt flag registers, and
• repeat-block registers.
In particular, the stack-pointer register points to the
software stack. The user has the flexibility of designating
where the stack resides, and even of changing its location
during the program execution. This feature also makes the
stack of essentially unlimited depth and permits its usage
not only for storing the program counter during subroutine
calls but also for passing arguments to subroutines. Such an
arrangement is particularly convenient in the development
of compilers, and we have used it extensively in the
320C30's optimizing C compiler.
The ALU performs floating-point, integer, and logical
operations. The ALU always stores the result in the register
file, but the input can come either from the register file or
from memory, or it can be an immediate value.
In the case of floating-point arithmetic, the input to the
ALU can originate from either a 40-bit extended-precision
register or a 32-bit memory datum. Registers R0 through
R7 store the 40-bit-word result. On the other hand, in
integer arithmetic, both input and output are 32-bit numbers, and the output can move to either the lower 32 bits of
the R0 through R7 registers or to any other register in the
register file.
The single-cycle hardware multiplier has been an integral part of DSPs because any real-time application relies
on the fast execution of multiplies. Following the same
distinction as in the previous paragraph on the ALU, the
multiplier performs both floating-point and integer multiplications. The 32-bit inputs to a floating-point multiplication yield a 40-bit-wide result for storage in one of the
extended-precision registers.
In both the ALU and the multiplier the results of the
operations are automatically normalized, thus handling
any overflows of the mantissa. If there is an exponent
overflow, the result is saturated in the direction of overflow
and the overflow flag is set. Underflows are handled by
setting the result to zero and setting an underflow flag.

Figure 4. The 320C30 central processing unit.

Buses and peripherals. Figure 3 shows that multiple
on-chip buses handle program, data, and DMA operations
in parallel. The device contains separate address and data
buses for these three operations, with the data having two
address buses to accommodate the access of multiple
operands from the memory in one cycle. Also, separate
buses lead to the register file. The rule to remember is that,
in one cycle, up to two data memory accesses are permitted
for any on-chip memory block. This multiplicity of buses
eliminates bottlenecks. The user can maximize the throughput of the device by a judicious combination of the on-chip
memory with the two external buses (the primary bus and
the expansion bus).
The primary bus contains a 24-bit address bus and a 32-bit data bus. Its true space, though, is 16M words minus the
on-chip memory and the expansion bus. The primary bus
can be placed in high impedance when the device is put on
hold. To facilitate its interfacing with slow memories, the
320C30 offers programmable wait states (up to seven) as
well as an external ready signal.
The expansion bus contains a 13-bit address bus and a
32-bit data bus. It has two strobes, one for memory and one
for I/O accesses. In other words, the memory space of the
expansion bus is two segments of 8K words each, one
segment mapped as regular memory and the other one
mapped as I/O. Like the primary bus, the expansion bus
has up to seven software-programmable wait states.
A major innovation in the 320C30, intended to support system-level solutions and to help in adapting the device to
changing needs, is the peripheral bus shown in Figures 1
and 5. The peripheral bus supplies a way of expanding or
varying the interface with the outside world without changing the core of the device. All of the peripherals attached to
this bus are mapped to memory, and they can be replaced
by others with minimal effort if certain applications have
different demands.
Currently, we have implemented a DMA controller, two
serial ports, and two timers as peripherals. The DMA
controller performs reads from and writes to any location
in the 320C30 memory map without interfering with the
operation of the CPU. The DMA controller contains its
own address generators, source and destination address
registers, and transfer counter. The two modular and totally
independent serial ports are identical with a complementary set of control registers. Each serial port can be configured to transfer 8, 16, 24, or 32 bits of data per word, with
each port clock originating either internally or externally.
The pins of the serial ports are configurable as general-purpose I/O pins, while the serial ports can also be configured and used as timers.
The two 320C30 timer modules function as general-purpose timer/event counters; each has two signaling
modes and internal or external clocking. Available to each
timer is an I/O pin for use as an input clock to the timer, as
an output signal driven by the timer, or as a general-purpose pin.

Figure 5. Peripheral bus and peripherals.

Software

The software features of a programmable DSP are
probably the most important features because they determine the effectiveness of the implementation. Typically,
the user first develops an application on a large computer
using a high-level language and, once it is working satisfactorily, ports it to a DSP device. The software features
of the 320C30 that we discuss include the integer and
floating-point number representations, addressing modes,
pipeline effects, and different types of instructions and
constructs.
Integer and floating-point formats. A 32-bit, twos-complement notation represents the integers. In addition to
this single-precision format, we have a short format, consisting of 16-bit, twos-complement numbers used only for
immediate operands. Every instruction of the 320C30
consists of one 32-bit word.
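
To make the short integer format concrete, the sketch below shows how a 16-bit twos-complement immediate field would be widened to a full signed value. The function name `sign_extend_16` is hypothetical, and the sketch illustrates the number format only; it is not a description of the device's internal datapath.

```python
def sign_extend_16(field):
    """Interpret the low 16 bits of `field` as a twos-complement integer."""
    field &= 0xFFFF                 # keep only the 16-bit immediate field
    if field & 0x8000:              # sign bit set: value is negative
        return field - 0x10000
    return field

largest = sign_extend_16(0x7FFF)    # most positive short immediate
smallest = sign_extend_16(0x8000)   # most negative short immediate
minus_one = sign_extend_16(0xFFFF)
```

The short format therefore covers the range from -32768 to 32767; constants outside that range have to be loaded some other way, for example from memory.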
We use three formats for floating-point numbers: short,
single precision, and extended precision. The single-precision, 32-bit-wide format assigns 24 bits to the mantissa and
8 bits to the exponent. The exponent occupies the 8 most
significant bits, and it is represented in twos-complement
notation, taking values between -128 and 127. The exponent value -128 is reserved to represent zero.
The mantissa, placed at the 24 least significant bits of a
32-bit number, is normalized to a number with an absolute
value between 1.0 and 2.0. Since the mantissa is represented in a normalized, twos-complement notation, the
leftmost bit, which corresponds to the sign, and its adjacent
bit will always be the complement of each other. As a
result, only the sign bit is represented, with the most
significant bit suppressed. In other words, the mantissa
contains 24 significant bits plus the sign bit, with the most
significant bit implied.
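
The single-precision format just described can be decoded in a few lines. The sketch below is an interpretation of the text, not vendor code: it assumes the sign occupies the top bit of the 24-bit mantissa field (bit 23) and the remaining 23 bits are the fraction, so that a positive mantissa reads 01.f and a negative one reads 10.f in twos complement. The function name `decode_single` is hypothetical.

```python
def decode_single(word):
    """Decode a 32-bit word in the described format into a Python float."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:
        exponent -= 0x100             # twos-complement exponent, -128..127
    if exponent == -128:
        return 0.0                    # reserved encoding for zero
    sign = (word >> 23) & 1
    fraction = (word & 0x7FFFFF) / 2 ** 23
    # The implied bit is the complement of the sign bit:
    # sign = 0 gives 01.f = 1 + f, sign = 1 gives 10.f = -2 + f.
    mantissa = (fraction - 2.0) if sign else (1.0 + fraction)
    return mantissa * 2.0 ** exponent

one = decode_single(0x00000000)       # exponent 0, mantissa 01.000...
three = decode_single(0x01400000)     # exponent 1, mantissa 01.100...
minus_one = decode_single(0xFF800000) # exponent -1, mantissa 10.000...
zero = decode_single(0x80000000)      # exponent -128
```

Note that under this reading an all-zero word decodes to 1.0 rather than 0.0, which is why a dedicated exponent value is needed to represent zero.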
Addressing modes. The 320C30 supports several addressing modes that allow the user to access data from
memory, registers, and the instruction word. The basic
addressing modes are
• register,
• direct,
• indirect,
• short immediate,
• long immediate, and
• PC relative.
In register mode the operand is placed into a CPU
register that is explicitly specified in an instruction. In
direct mode the data memory address is formed by preceding the 16 least significant bits of the instruction word with
the 8 least significant bits of the data page pointer. To keep
all instructions one word long, we store only the 16 least
significant bits from the address in the instruction word; the
rest become the data page pointer. This restriction implies
that in direct addressing the memory space is segmented
into 256 pages of 64K words each.
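
The direct-mode address formation described above amounts to concatenating two bit fields. A minimal sketch follows; the function name `direct_address` and the example values are illustrative only.

```python
def direct_address(data_page_pointer, instruction_word):
    """Form a 24-bit address from the page pointer and the instruction word."""
    page = data_page_pointer & 0xFF        # 8 least significant bits of DP
    offset = instruction_word & 0xFFFF     # 16 least significant bits of the word
    return (page << 16) | offset

# Page 0x80 with an instruction whose low 16 bits are 0x9800.
address = direct_address(0x80, 0x12349800)

PAGES = 1 << 8            # 256 pages
WORDS_PER_PAGE = 1 << 16  # 64K words per page
```

Here `address` is 0x809800, and 256 pages of 64K words cover the full 16M-word space. A program whose data spans more than one page has to reload the data page pointer when it crosses a page boundary.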

Table 1. Addressing modes of the 320C30.

Mode             Example                  Operation                          Description
Register         ADDF R0,R1                                                  Operand in R0
Direct           ADDF @MEM,R1             Addr = MEM                         Operand in MEM
Short immediate  ADDF 3.14,R1                                                Operand = 3.14
Long immediate   BR LABEL                                                    Branch to LABEL
PC relative      BGE LABEL                                                   Branch to LABEL
Indirect         ADDF *+AR0(di),R1        Addr = AR0 + di                    Predisplacement add without modification
Indirect         ADDF *-AR0(di),R1        Addr = AR0 - di                    Predisplacement subtract without modification
Indirect         ADDF *++AR0(di),R1       Addr = AR0 + di; AR0 = AR0 + di    Predisplacement add and modify
Indirect         ADDF *--AR0(di),R1       Addr = AR0 - di; AR0 = AR0 - di    Predisplacement subtract and modify
Indirect         ADDF *AR0++(di),R1       Addr = AR0; AR0 = AR0 + di         Postdisplacement add and modify
Indirect         ADDF *AR0--(di),R1       Addr = AR0; AR0 = AR0 - di         Postdisplacement subtract and modify
Indirect         ADDF *AR0++(di)%,R1      Addr = AR0; AR0 = circ(AR0 + di)   Postdisplacement add and circular modify
Indirect         ADDF *AR0--(di)%,R1      Addr = AR0; AR0 = circ(AR0 - di)   Postdisplacement subtract and circular modify
Indirect         ADDF *AR0++(IR0)B,R1     Addr = AR0; AR0 = B(AR0 + IR0)     Postindex (IR0) add and bit-reversed modify

di is an integer between 0 and 255 or one of the index registers IR0 and IR1.

Indirect addressing, the most versatile of all the modes,
specifies the address of an operand in memory through the
contents of an auxiliary register. As an option, the contents
of the register can be modified by constant displacements
or by the contents of the index registers. Table 1 lists all of
the addressing modes, with particular emphasis on indirect
addressing modes.
An instruction explicitly specifies the auxiliary register
used for indirect addressing. The user can modify it by a
constant displacement taking values 0 to 255 or by the
contents of one of the two index registers IR0 or IR1. The
modification can take place before or after accessing the
memory. In the case of premodification, the user has the
option to change the contents of the auxiliary register either
permanently or temporarily. The notation used for such
modifications is reminiscent of the C-language syntax.
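
The difference between the pre- and postmodification forms is easiest to see by tabulating, for each form, the address used for the access and the value left in the auxiliary register. The sketch below does this for the six linear displacement modes listed in Table 1. The function name `indirect` and the mode strings are hypothetical labels that mirror the assembler notation, and the model ignores the 24-bit width of real addresses.

```python
def indirect(ar, di, mode):
    """Return (address_used, new_ar) for one linear indirect mode."""
    modes = {
        "*+AR(di)":  (ar + di, ar),        # pre-add, register unchanged
        "*-AR(di)":  (ar - di, ar),        # pre-subtract, register unchanged
        "*++AR(di)": (ar + di, ar + di),   # pre-add and modify
        "*--AR(di)": (ar - di, ar - di),   # pre-subtract and modify
        "*AR++(di)": (ar, ar + di),        # access first, then add
        "*AR--(di)": (ar, ar - di),        # access first, then subtract
    }
    return modes[mode]

# Walk an array of four words starting at address 100 with postincrement.
ar0 = 100
visited = []
for _ in range(4):
    address, ar0 = indirect(ar0, 1, "*AR++(di)")
    visited.append(address)
```

After the loop, `visited` holds the addresses 100 through 103 and the register points one past the end of the array, much like `*p++` in C.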
Two special forms of indirect addressing that are particularly useful are bit-reversed and circular addressing.
Bit-reversed addressing is used with the fast Fourier transform to compensate for the fact that normally ordered data
at the input of the transform are scrambled at output (bit-reversed order). To avoid moving the data around to place
them in the proper order, bit-reversed addressing accesses
the data in scrambled order for any subsequent operation.
Circular addressing implements circular buffers. Such
buffers are very convenient for use in digital-filtering
operations. In circular addressing, BK, one of the control
registers, specifies the size of the block. Then, when the
user modifies the contents of an auxiliary register (pointing
within that block) in a circular fashion, the final value is
tested to determine if it is still within the block. If it is not,
it is wrapped around using modulo arithmetic.
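
Both special modes can be sketched in a few lines. The code below is a simplified model under stated assumptions: the circular buffer is described by an explicit base address plus the block size (the text mentions only the block-size register BK), and the bit-reversed update is modeled as an addition whose carry propagates from the most significant bit toward the least significant bit, with the index register set to half the transform length. The function names are hypothetical.

```python
def circular_step(address, step, base, block_size):
    """Post-modify `address` by `step`, wrapping inside [base, base + block_size)."""
    return base + (address - base + step) % block_size

def reverse_carry_add(address, index, bits):
    """Add `index` to the low `bits` bits of `address` with reversed carry."""
    def rev(x):
        # Reverse the order of the low `bits` bits of x.
        return int(format(x, "0{}b".format(bits))[::-1], 2)
    mask = (1 << bits) - 1
    low = address & mask
    high = address - low                       # bits above the index field
    return high + rev((rev(low) + rev(index & mask)) & mask)

# Circular buffer of 4 words at 0x100: stepping past the end wraps to the start.
wrapped_up = circular_step(0x103, 1, base=0x100, block_size=4)
wrapped_down = circular_step(0x100, -1, base=0x100, block_size=4)

# Bit-reversed walk through an 8-point array (3 index bits, index = 8 / 2).
order = []
addr = 0
for _ in range(8):
    order.append(addr)
    addr = reverse_carry_add(addr, 4, bits=3)
```

The bit-reversed walk visits the offsets 0, 4, 2, 6, 1, 5, 3, 7, which is the order in which an in-place radix-2 FFT leaves its results.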
The short-immediate mode encodes immediate, 16-bit-long operands of arithmetic operations. The long-immediate mode encodes program control instructions (branch
instructions) for which it is useful to have a 24-bit absolute
address contained in the instruction word. Finally, the PC-relative addressing also applies to program control instructions and uses the difference from the present location of
the program counter (PC) rather than an absolute address. The last two
modes are transparent to the user. The user specifies the
branching label wanted, and the assembler assigns the
appropriate addressing mode.
Pipeline. To achieve the high throughput of the device,
the 320C30 uses a four-phase pipeline with five major
functional units operating in parallel. These five units are
• instruction fetching,
• instruction decoding and address generation,
• operand reads,
• instruction execution, and
• DMA transfer.
Figure 6 shows diagrammatically how the pipeline
operates on successive instructions. When the pipeline is
full, an instruction completes the execution phase every
60-ns machine cycle.
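The timing of an ideal, conflict-free pipeline follows directly from this description and can be checked with a small C sketch. The function names are hypothetical, and the model assumes one instruction enters the pipeline per cycle with no extra cycles inserted.

```c
/* Cycles needed to complete n instructions in an ideal pipeline with
   `stages` phases: the first instruction needs `stages` cycles, every
   later one completes one cycle after its predecessor. */
unsigned long pipeline_cycles(unsigned long n, unsigned stages)
{
    return n == 0 ? 0 : (unsigned long)stages + n - 1;
}

/* Elapsed time in nanoseconds for a given machine-cycle time. */
unsigned long pipeline_time_ns(unsigned long n, unsigned stages,
                               unsigned cycle_ns)
{
    return pipeline_cycles(n, stages) * cycle_ns;
}
```

With four phases and a 60-ns cycle, five instructions complete in eight cycles (480 ns), while each additional instruction costs only one more cycle.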
Occasionally conflicts may arise, as in the case of a
loaded auxiliary register that needs to be used for indirect
addressing in the next instruction. To handle such cases, we
established a priority between the different units, giving
DMA the lowest priority. Among the others, an Execute
instruction has the highest and a Fetch instruction the
lowest priority.
In programming the device, the user does not have to
worry about the pipeline conflicts, which do not occur that
often anyway. When a conflict does occur, the device
automatically inserts the necessary extra cycle(s) to make
the instructions behave as expected. In most cases, this
arrangement will be sufficient for successful operation.
For time-critical operations, though, it may be necessary to
remove the extra cycles caused by pipeline conflicts. The
user can make this correction by rearranging the instructions of the program. To do so, the user must determine
how to identify the locations where insertions occur. For
that purpose, the development tools (simulator, emulators)
contain a tracing feature that can display the pipeline. In
this trace, any conflicts are immediately identified, and
then the user can take steps to correct the problem.
                                    Cycle
Instruction   1       2       3       4        5        6        7
    1         Fetch   Decode  Read    Execute
    2                 Fetch   Decode  Read     Execute
    3                         Fetch   Decode   Read     Execute
    4                                 Fetch    Decode   Read     Execute
    5                                          Fetch    Decode   Read

Figure 6. Pipeline of 320C30 instructions.

Instruction set features. The instruction set of the
320C30 supports both two- and three-operand instructions.
In all arithmetic instructions (except Store), the
destination is a register in the register file. The source
operands can come from memory or from a register or, in
the case of two-operand instructions, can be part of the
instruction word.
A unique feature of the 320C30 is the set of instructions
in which operations execute in parallel. This construct
permits a high degree of concurrency and execution of any
arithmetic or logical instruction in parallel with a Store
instruction. It also supports parallel multiplies and adds, as
well as parallel loading and storing of two registers.
Parallel multiply and adds lead to the peak performance of 33
Mflops. Executing the Store instruction at the same time
with another arithmetic operation essentially permits this
kind of data movement without a penalty. As an example,
the following instruction adds the contents of memory
pointed to by AR1 (indicated by *AR1) to register R0
(treating them as floating-point numbers) and places the
result in register R1. In parallel with that process, the
original contents of R1 are stored in the memory location
indicated by AR3.

        ADDF    *AR1,R0,R1
||      STF     R1,*AR3

When executing a branch instruction, the pipeline must
be flushed since the path followed after the branch is data
dependent. As a result, a regular branch instruction is more
costly than other instructions, taking four cycles to
complete. This overhead may be unacceptable in some
time-critical applications. To alleviate this problem and to offer
more flexibility to the programmer, the 320C30 contains
a set of delayed branches that complement the set of
standard branches. In a delayed branch, the three
instructions following the branch instruction execute whether the
branch is taken or not taken. As a result, the delayed branch
ends up taking only one cycle to execute. The same
approach can be used even when there are less than three
such instructions, by adding NOPs (no operations). The
branch will still take less than four cycles.
The greatest cost of branching occurs during the
execution of loops. In looping, a counter is decremented and
compared to zero at the end of the loop. If it is not zero, a
branch is taken to the beginning of the loop. The 320C30
offers a special arrangement that implements loops with no
overhead. The two instructions RPTB (repeat block) and
RPTS (repeat single) realize this arrangement. The format
of the RPTB instruction is:

        RPTB    LABEL
        (put instructions here)
LABEL   (last instruction)

Associated with the repeat-block construct are three of
the 12 control registers in the register file. One register
indicates the beginning of the block, the second indicates
the end of the block, and the third acts as the repeat counter.
The assembler automatically assigns values to the first two
registers. They contain the address of the instruction
immediately below RPTB, and the address of LABEL
respectively. Users should initialize the repeat counter
before entering the loop. In terms of execution time, this
arrangement behaves as if the loop were implemented with
straight-line code.
The instruction RPTS has the format

        RPTS    count

and it repeats the following instruction "count" times. It
differs from RPTB in that it
• applies to only one instruction;
• does not refetch the instruction for every execution, but
keeps it in the instruction register thus freeing the buses for
data transfers, and
• is not interruptible.
Table 2 is a sample of the instructions
available on the 320C30. Although we included a rich set
of instructions for both DSP and general-purpose
processing, the perceived size of the instruction set is much
smaller. The reason is that a symmetry exists between
integer and floating-point instructions, between
instructions with two or three operands, and between single and
parallel instructions. For instance, addition is represented
by ADDI, ADDF, or ADDC in the case of adding integers,
floating-point numbers, or adding with a carry. The
three-operand instructions have the same form, with a 3
appended at the end (ADDF3). All of the multiplier and ALU
operations can be performed in parallel with a Store
instruction, and such instructions take the form of the
following example:

        ADDF3   *AR0,R1,R2
||      STF     R0,*AR1

Furthermore, two loads or two stores can execute in
parallel, as is also the case with a multiply and an add or a
multiply and a subtract. The design of the instruction set
has been guided by a desire to ease programming efforts.
The execution results of an instruction are always available
for use in the instruction that follows.
Besides the regular arithmetic and logical instructions,
the 320C30 includes instructions to handle the software
stack, internal and external interrupts, and branches and
subroutine calls. Conditional loads and calls make the
programming more compact and efficient, while special
instructions (called interlocked instructions) can be used in
multiprocessor environments.

Development tools and support
The newer DSP devices offer increased processing
power that permits the implementation of more complicated
and demanding algorithms. However, as the complexity
of the algorithm increases, the task of debugging
the implementation becomes more difficult. The 320C30
addresses this problem by providing user-friendly development
tools and offering extra support in the form of an
optimizing C compiler and a DSP operating system.
The assembler translates assembly-language source
files into machine-language object files. Source files can
contain instructions, assembler directives, and macro directives. Assembler directives control various aspects of
the assembly process such as the source-listing format,
symbol definition, and method of placing the source code
into sections. Macro directives permit a concise representation of groups of instructions that occur frequently.
The linker combines object files into one executable
object module. As it creates the executable module, the
linker performs relocation operations and resolves external
references. The linker accepts relocatable COFF (Common Object File Format) object files, created by the assembler, as input. It can also accept archive library members
and output modules created by a previous linker run.
Linker directives allow the user to combine object-file
sections, bind sections or symbols to specific addresses or
within specific portions of 320C30 memory, and define or
redefine global symbols. An associated archiver can create
macro or object-file libraries.
The software simulator is a very important tool for
debugging 320C30 programs. Its interface consists of a
screen broken into windows that display the internal registers,
the reverse-assembled program, and a versatile window
where memory, breakpoints, and a wealth of other
information can be displayed. The same interface (modified
to accommodate some special features) is also used
with the hardware emulator. The major features of the
simulator include:
• Simulation of the entire 320C30 instruction set and the
key peripheral features;
• Command entry from either menu-driven keystrokes
(menu mode) or from line commands (line mode);
• Help menus for all screen modes;
• Quick storage and retrieval of simulation parameters
from files to facilitate preparation for individual sessions;
• Reverse assembly allowing editing and reassembly of
source statements;
• Multiple execution modes;
• Trace expressions that are easy to define;
• Trace execution that can display designated expression
values, cache memory, and the instruction pipeline; and
• Breakpoints that can occur on address read, write, or
both, on address execute, and on expression valid.

Table 2. Instructions for the 320C30.

Instruction   Description

Load and store instructions
LDE           Load floating-point exponent
LDF           Load floating-point value
LDFcond       Load floating-point value conditionally
LDI           Load integer
LDIcond       Load integer conditionally
LDM           Load floating-point mantissa
POP           Pop integer from stack
POPF          Pop floating-point value from stack
PUSH          Push integer on stack
PUSHF         Push floating-point value on stack
STF           Store floating-point value
STI           Store integer

Two-operand instructions
ABSF          Absolute value of a floating-point number
ABSI          Absolute value of an integer
ADDC †        Add integers with carry
ADDF †        Add floating-point values
ADDI †        Add integers
AND †         Bitwise logical-AND
ANDN †        Bitwise logical-AND with complement
ASH †         Arithmetic shift
CMPF †        Compare floating-point values
CMPI †        Compare integers
FIX           Convert floating-point value to integer
FLOAT         Convert integer to floating-point value
LSH †         Logical shift
MPYF †        Multiply floating-point values
MPYI †        Multiply integers
NEGB          Negate integer with borrow
NEGF          Negate floating-point value
NEGI          Negate integer
NORM          Normalize floating-point value
NOT           Bitwise logical-complement
OR †          Bitwise logical-OR
RND           Round floating-point value
ROL           Rotate left
ROLC          Rotate left through carry
ROR           Rotate right
RORC          Rotate right through carry
SUBB †        Subtract integers with borrow
SUBC          Subtract integers conditionally
SUBF †        Subtract floating-point values
SUBI †        Subtract integer
SUBRB         Subtract reverse integer with borrow
SUBRF         Subtract reverse floating-point value
SUBRI         Subtract reverse integer
TSTB †        Test bit fields
XOR †         Bitwise exclusive-OR

Program control instructions
Bcond         Branch conditionally (standard)
BcondD        Branch conditionally (delayed)
BR            Branch unconditionally (standard)
BRD           Branch unconditionally (delayed)
CALL          Call subroutine
CALLcond      Call subroutine conditionally
DBcond        Decrement and branch conditionally (standard)
DBcondD       Decrement and branch conditionally (delayed)
IDLE          Idle until interrupt
NOP           No operation
RETIcond      Return from interrupt conditionally
RETScond      Return from subroutine conditionally
RPTB          Repeat block of instructions
RPTS          Repeat single instruction
SWI           Software interrupt
TRAPcond      Trap conditionally

† Two- and three-operand versions
Perhaps the most important trend with the newer DSPs
is the availability of high-level-language compilers. The
presence of C and Ada compilers in the 320C30 is not an
accident since the 320C30 was designed with a compiler in
mind. We expect this path to a high-level language to make
the porting of application programs from large computers
much easier. The algorithm can be developed almost
entirely on a large computer and then converted to the
320C30 assembly language by compilation.
The C compiler for the 320C30 has exceptional efficiency,
which makes a good C program almost as effective as the
assembly-language program. The C compiler
will be sufficient for most applications. The exception is
time-critical applications. In such cases one can use the fact
that most DSP algorithms spend the vast majority of the
execution time on a small section of the code. (Researchers
often mention the 90/10 rule: 90 percent of the time is spent
on 10 percent of the code.) Under these circumstances, the
user can optimize execution by creating very fast
assembly-language routines that implement the time-critical
sections, and call them from C as regular C functions. To
achieve this, we define the C function interface very
precisely so that users can create their own routines. The
C-compiler package comes with a library of general-purpose
mathematical, interface, and I/O functions.
Besides this method of optimizing the performance of
the C language, two more methods can be used. The first
one is based on the fact that the output of the compiler is an
assembly-language program. The user can edit this program and optimize it by rearranging the instructions. The
second method is to use the "asm" directive supported by
the C compiler. The arguments of this directive are passed
to the output of the compilation without any alteration so
that the user can insert assembly-language instructions into
the middle of the C program.
A key part of the 320C30 development environment is
Spox, the first real-time operating system for a single-chip
DSP. Spox, developed by Spectron Microsystems, extends
the core C language with a library of standard I/O routines
and, most importantly, a DSP math package. One of Spox's
unique features is that it provides users with software
objects that are especially suited for DSP. Some of these
objects are vectors, matrices, filters, and streams. The math
package and these software objects are carefully designed
to take full advantage of the capabilities of the 320C30.
Spox also supports multitasking, thus allowing the user to
easily implement the more complex control structures that
are becoming essential for DSP systems.
By providing a complete software development environment that includes compilers and operating systems
along with the more-traditional tools such as assemblers
and linkers, we allow the user to move from system
conception to system implementation in the shortest possible time.
The next level of development tools includes the hardware emulators for debugging target hardware or determining the performance of an algorithm on the 320C30
device itself. The XDS 1000 is a real-time, in-circuit emulator/software development tool based on the 320C30.
Besides these tools from Texas Instruments, other companies offer related support, such as the PC-based development board by Atlanta Signal Processors and the development platform of Spectron Microsystems for PCs and Sun
workstations.

Applications
Certain features of the 320C30, such as its high speed,
versatile architecture, and rich instruction set, make it easy
to implement very demanding algorithms. The large
memory space makes the device suitable for application
areas such as image processing in which memory addressing
is one of the prime considerations. And the C compiler
makes it easy to construct algorithms with complicated
logic.

General DSP algorithms. Almost every DSP application
needs to perform some kind of filtering, the first
application considered for a DSP device. Digital filters are
categorized as FIR (finite-length impulse response) and
IIR (infinite impulse response) filters, or, equivalently,
as filters that have only zeros or both poles and zeros. Each
of these categories can have either fixed or adaptive
coefficients.
The 320C30 implements FIR filters very efficiently. For
instance, let an FIR filter have an impulse response h[0],
h[1], ..., h[N - 1], and let x[n] represent the input of the
filter at time n. Then the following equation gives the
output y[n]:

y[n] = h[0] × x[n] + h[1] × x[n - 1] + ... +
       h[N - 1] × x[n - N + 1]

; Typical Calling Sequence:
;
;       load    AR0
;       load    AR1
;       load    RC
;       load    BK
;       CALL    FIR
;
; Data Memory Organization:
;
;               Impulse
;               response
;   Low        +------------+
;   address    |   h(N-1)   |
;              +------------+
;              |   h(N-2)   |
;              +------------+
;              |     .      |
;              +------------+
;              |    h(1)    |
;              +------------+
;   High       |    h(0)    |
;   address    +------------+
;
;   The input samples x(n) (newest input), x(n-1), ... are held
;   in a circular queue.
;
; The physical address for the start of the input samples must be on
; a boundary with the LSBs set to zero according to the length of the
; buffer. The pointer to the input sequence (x) is incremented and
; assumed to be moving from an older input to a newer input. At the
; end of the subroutine AR1 will be pointing to the position for the
; next input sample.
;
; Argument Assignments:
;
;   Argument | Function
;   ---------+----------------------
;   AR0      | Address of h(N-1)
;   AR1      | Address of x(N-1)
;   RC       | Length of filter - 2 (N-2)
;   BK       | Length of filter (N)
;
; Registers used as input: AR0, AR1, RC, BK
; Registers modified: R0, R2, AR0, AR1, RC
; Register containing result: R0
;
; Program size: 6 words
; Execution cycles: 11 + (N-1)
;
; =====================================================================
;
        .global FIR
;                                           initialize R0:
FIR     MPYF3   *AR0++(1),*AR1++(1)%,R0   ; h(N-1) * x(n-(N-1)) -> R0
        LDF     0.0,R2                    ; initialize R2.
;
;       filter (1 <= i < N)
;
        RPTS    RC                        ; setup the repeat single.
        MPYF3   *AR0++(1),*AR1++(1)%,R0   ; h(N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                  ; multiply and add operation
;
        ADDF    R0,R2,R0                  ; add last product
;
;       return sequence
;
        RETS                              ; return
;
;       end
;
        .end

Figure 7. FIR filter implementation on the 320C30.

; Typical Calling Sequence:
;
;       load    R2
;       load    AR0
;       load    AR1
;       load    IR0
;       load    IR1
;       load    BK
;       load    RC
;       CALL    IIR2
;
; Data Memory Organization:
;
;             Filter           Initial delay         Final delay
;             coefficients     node values           node values
;   Low      +------------+   +------------+        +------------+
;   address  |   a2(0)    |   |   d(0,n)   | newest |  d(0,n-1)  |
;            +------------+   +------------+ delay  +------------+
;            |   b2(0)    |   |  d(0,n-1)  |        |  d(0,n-2)  | circular
;            +------------+   +------------+        +------------+ queue
;            |   a1(0)    |   |  d(0,n-2)  | oldest |   d(0,n)   |
;            +------------+   +------------+ delay  +------------+
;            |   b1(0)    |   |   Empty    |        |   Empty    |
;            +------------+   +------------+        +------------+
;            |   b0(0)    |   |     .      |        |     .      |
;            +------------+   +------------+        +------------+
;            |     .      |   |  d(N-1,n)  |        | d(N-1,n-1) |
;            +------------+   +------------+        +------------+
;            |  a2(N-1)   |   | d(N-1,n-1) |        | d(N-1,n-2) | circular
;            +------------+   +------------+        +------------+ queue
;            |  b2(N-1)   |   | d(N-1,n-2) |        |  d(N-1,n)  |
;            +------------+   +------------+        +------------+
;            |  a1(N-1)   |   |   Empty    |        |   Empty    |
;            +------------+   +------------+        +------------+
;            |  b1(N-1)   |
;            +------------+
;   High     |  b0(N-1)   |
;   address  +------------+
;
; The physical address for the start of each circular queue of delay node
; values must be on a boundary with the LSBs set to zero according to the
; length of the buffer. The BK (block size) register must contain the

(Continued)

Figure 8. Implementation of N biquads on the 320C30.

Two features of the 320C30 facilitate the implementation
of the FIR filters: parallel multiply/add operations and
circular addressing. The first feature permits a multiplication
and an addition to execute in one machine cycle, while
the second makes a finite buffer of length N sufficient for
the data x[n]. Figure 7 shows the arrangement of the data
and the assembly code for an FIR filter. Note that the filter
takes one cycle of execution per tap.
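For readers more comfortable with C, the FIR equation and the circular input buffer can be sketched as follows. This is an illustrative model, not a translation of the Figure 7 listing: the function name fir_step, the argument layout, and the direction in which the buffer is walked are assumptions of the sketch.

```c
/* Compute one output sample of an N-tap FIR filter:
       y[n] = h[0]*x[n] + h[1]*x[n-1] + ... + h[N-1]*x[n-N+1]
   `delay` is a circular buffer holding the N most recent inputs and
   `*newest` is the index of the most recent one. */
float fir_step(const float *h, float *delay, unsigned n_taps,
               unsigned *newest, float input)
{
    /* The new sample overwrites the oldest one. */
    *newest = (*newest + 1) % n_taps;
    delay[*newest] = input;

    float acc = 0.0f;
    unsigned idx = *newest;
    for (unsigned k = 0; k < n_taps; k++) {
        acc += h[k] * delay[idx];                  /* one multiply/add per tap */
        idx = (idx == 0) ? n_taps - 1 : idx - 1;   /* step back circularly */
    }
    return acc;
}
```

Feeding a unit impulse followed by zeros returns the coefficients h[0], h[1], ... in order, which is a quick way to check the buffer handling.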
The transfer function of the IIR filters contains both
poles and zeros, and its output depends on both the input
and the past output. As a rule, these filters need less
computation than an FIR filter of similar frequency response,
but they have the drawback of being sensitive to
coefficient quantization. Most often, the IIR filters are
implemented as a cascade of second-order sections, called
biquads. To implement an IIR filter consisting of N biquads,
let a1[i], a2[i] be the numerator coefficients of the ith
biquad and b0[i], b1[i], b2[i] the denominator coefficients of
the same biquad. Also, let x[n] be the input and y[n] be the
output of the IIR filter. In canonic form, the following C
code implements the N biquads:
y[0,n] = x[n];
for (i=0; i ...

        ...
        ...                                 ; b2(0) * ... -> R1

        MPYF3   *++AR0(1), *AR1, R0         ; a1(0) * d(0,n-1) -> R0
||      ADDF3   R0, R2, R2                  ; first sum term of d(0,n).

        MPYF3   *++AR0(1), *AR1--(1)%, R0   ; b1(0) * d(0,n-1) -> R0
||      ADDF3   R0, R2, R2                  ; second sum term of d(0,n).

        MPYF3   *++AR0(1), R2, R2           ; b0(0) * d(0,n) -> R2
||      STF     R2, *AR1--(1)%              ; store d(0,n) ...

        RPTB    LOOP

        ...                                 ; ... -> R0
        ...                                 ; first sum term of y(i-1,n)
        ...                                 ; ... -> R1
        ...                                 ; second sum term of y(i-1,n)

        MPYF3   *++AR0(1), *AR1, R0         ; a1(i) * d(i,n-1) -> R0
||      ADDF3   R0, R2, R2                  ; first sum term of d(i,n).

        MPYF3   *++AR0(1), *AR1--(1)%, R0   ; b1(i) * d(i,n-1) -> R0
||      ADDF3   R0, R2, R2                  ; second sum term of d(i,n).

        STF     R2, *AR1--(1)%              ; store d(i,n); point to d(i,n-2).

LOOP    MPYF3   *++AR0(1), R2, R2           ; b0(i) * d(i,n) -> R2

;       final summation

        ADDF    R0, R2                      ; first sum term of y(N-1,n)
        ADDF3   R1, R2, R0                  ; second sum term of y(N-1,n)

        NOP     *AR1--(IR1)                 ; return to first biquad
        NOP     *AR1--(1)%                  ; point to d(0,n-1)

;       return sequence

        RETS                                ; return

;       end

        .end

Figure 8 (cont'd.)
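A cascade of canonic-form biquads can also be sketched in portable C. The sketch follows the sum terms named in the comments of the Figure 8 listing: each delay node d(i,n) is formed from the section input plus a1(i)*d(i,n-1) and a2(i)*d(i,n-2), and each section output from b0(i)*d(i,n) + b1(i)*d(i,n-1) + b2(i)*d(i,n-2). The struct layout, the names Biquad and iir_cascade, and the convention that the feedback coefficients carry their own sign are assumptions of this sketch, not part of the listing.

```c
/* State and coefficients of one second-order section (hypothetical layout). */
typedef struct {
    float a1, a2;       /* coefficients applied to d(i,n-1), d(i,n-2) in d(i,n) */
    float b0, b1, b2;   /* coefficients applied to d(i,n), d(i,n-1), d(i,n-2) in y(i,n) */
    float d1, d2;       /* delay nodes d(i,n-1) and d(i,n-2) */
} Biquad;

/* Process one input sample through n_sections cascaded biquads. */
float iir_cascade(Biquad *sections, unsigned n_sections, float x)
{
    float y = x;                        /* input of the first section */
    for (unsigned i = 0; i < n_sections; i++) {
        Biquad *s = &sections[i];
        float d0 = s->a2 * s->d2 + s->a1 * s->d1 + y;        /* d(i,n) */
        y = s->b2 * s->d2 + s->b1 * s->d1 + s->b0 * d0;      /* y(i,n) */
        s->d2 = s->d1;                  /* age the delay nodes */
        s->d1 = d0;
    }
    return y;                           /* output of the last section */
}
```

The assembly version avoids the two copies at the end of the loop body by keeping each section's delay nodes in a circular queue, which is the role of the BK register in the listing.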


main. Computationally efficient implementations of Fourier
transforms are known as the fast Fourier transform
(FFT) [3,5]. Table 3 shows the timing for different FFTs on
the 320C30. The code for these FFTs, as well as the
routines listed in Table 4, appear in the TMS320C30 User's
Guide.
The 320C30 has many features that make it well suited
for FFTs, such as the high speed of the device, the
floating-point capability, the block-repeat construct, and the
bit-reversed addressing mode. For instance, the FFT shown in
Figure 9 can be implemented in code that
can be entirely contained in the 64-word cache of the
320C30 [7].
Telecommunications and speech. Telecommunications
and speech applications have many requirements in
common with other DSP applications, but they also have
some special needs. For instance, telecommunications
applications interfacing to T1 carriers sometimes need to
convert between a linear signal and one compressed by
μ-law or A-law formats. Such a conversion can be realized
with hardware by adding a peripheral to the DSP peripheral
bus. This is the approach taken in some members of the
TMS320 first generation of devices. An alternative way is
to do the same function with software.
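A software conversion of this kind can be sketched in C. The sketch assumes the common 8-bit segmented μ-law encoding of ITU-T G.711 applied to 16-bit linear samples; the function names are hypothetical, and this is not the routine benchmarked in Table 4.

```c
#include <stdint.h>

/* Compress a 16-bit linear sample to an 8-bit mu-law code
   (assumed G.711-style segment encoding with bias 0x84). */
unsigned char linear_to_ulaw(int16_t pcm)
{
    const int BIAS = 0x84;      /* added before the segment search */
    const int CLIP = 32635;     /* largest magnitude that fits after biasing */
    int sign = (pcm < 0) ? 0x80 : 0x00;
    int mag = (pcm < 0) ? -(int)pcm : (int)pcm;
    if (mag > CLIP)
        mag = CLIP;
    mag += BIAS;

    /* Find the segment: position of the highest set bit, from bit 14 down. */
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    int mantissa = (mag >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* Expand an 8-bit mu-law code back to a linear sample. */
int16_t ulaw_to_linear(unsigned char code)
{
    code = (unsigned char)~code;
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int mag = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -mag : mag);
}
```

The compression is lossy by design: a round trip returns the center of the quantization interval, so small inputs are reproduced closely and large inputs more coarsely.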
In speech applications, digital filters are often implemented in lattice form. Depending on the application, both
FIR and IIR filters are realized this way, although sometimes the terminology lattice filter and inverse lattice filter
is used respectively.
Graphics and image processing. In graphics and image
processing applications, DSPs perform operations on
two-dimensional signals, and matrix arithmetic takes on
particular significance. In the 320C30, matrix arithmetic
can be decomposed into a series of dot products, which can
be very effectively implemented using constructs similar
to the FIR filter implementation discussed earlier.
Additionally, the large memory space of the 320C30 allows
processing of large segments of data at a time.
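The decomposition into dot products can be illustrated with a short C sketch of a matrix-times-vector operation. The row-major storage and the names dot and mat_vec are assumptions made for the example.

```c
/* Dot product of two n-element vectors: one multiply/add per element,
   the same inner operation as an FIR filter tap. */
float dot(const float *a, const float *b, unsigned n)
{
    float acc = 0.0f;
    for (unsigned k = 0; k < n; k++)
        acc += a[k] * b[k];
    return acc;
}

/* out = m * v for a rows-by-cols matrix stored row by row:
   each output element is the dot product of one matrix row with v. */
void mat_vec(const float *m, const float *v, float *out,
             unsigned rows, unsigned cols)
{
    for (unsigned r = 0; r < rows; r++)
        out[r] = dot(&m[r * cols], v, cols);
}
```

A matrix-matrix product follows the same pattern, with one dot product per output element.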
Benchmarks. We have implemented several general-purpose
and applications-oriented routines for the 320C30
and include these in the User's Guide. Table 4 lists some
of these routines with the necessary cycles and the memory
requirements for the program.

The last five years have seen a tremendous growth
in the utility of digital signal processors. This
growth has been fueled, at least in part, by the
ever-increasing level of performance and ease of use of
general-purpose DSPs. The TMS320C30 represents the
newest generation of DSPs. But the end of this trend is not
yet in sight. Rather, we expect the trend of higher levels of
performance and greater ease of use to continue. For DSPs,
the next five years look bright indeed.

Table 3. Timing of an FFT on the 320C30.

                        FFT timing (ms)
Number of     Radix-2      Radix-4      Radix-2
points        (complex)    (complex)    (real)

   64         0.167        0.123        0.075
  128         0.367        -            0.162
  256         0.801        0.624        0.354
  512         1.740        -            0.771
1,024         3.750        3.040        1.670

Code size     55           176          86
(words)

The code size does not include the sine/cosine tables.
The timing does not include bit reversal or data I/O.

Table 4. Program memory and timing
requirements for 320C30 routines.

                                                  Cycles
                                                  (best case/
Application                             Words     worst case)

Inverse of a floating-point number      31        31
Integer division                        27        27/58
Double-precision integer
multiplication                          24        20/24
Square root                             32        35
Dot product of two vectors              10        8 + (N - 1)
Matrix times vector operation           10        2 + R(C + 9)
FIR filter                              5         7 + (N - 1)
IIR filter (one biquad)                 7         7
IIR filter (N > 1 biquads)              16        19 + 6N
LMS adaptive filter                     9         8 + 3(N - 1)
LPC lattice filter                      11        9 + 5(P - 1)
Inverse LPC lattice filter              9         9 + 3(P - 1)
μ-law compression                       16        16
μ-law expansion                         13        11/16
A-law compression                       18        18
A-law expansion                         15        14/21

N = length of appropriate vector
P = length of lattice filter
R = number of rows of a matrix
C = number of columns of a matrix

GENERIC PROGRAM TO DO A LOOPED-CODE RADIX-2 FFT COMPUTATION IN 320C30.
THE PROGRAM IS ADAPTED FROM THE FORTRAN PROGRAM IN PAGE 111 OF
REFERENCE [5]
AUTHOR: PANOS E. PAPAMICHALIS
TEXAS INSTRUMENTS
.GLOI3L
.GLOI3L
.GLOI3L
.1365

JULY 16, 1987

FFT SIZE
LOG2 HOLDS THE CURRENT STAGE NUMBER

O,AR6

@FFTSIZ , IRO
1,IRO
@FFTSIZ,R?
I,M?
I,ARS

;

INITIALIZE IE INDEX

*·t-+Ar~6 (1)

@tNPUT

ARO POINTS TO X (l)

..

AR2 POINTS TO ;( (U

Af.7.RC
loRe

;

RC SHOULD BE ONE LESS THAN DESIRED ...

BUTTERFLY WITHOUT TWIDDLE FACTO~S
RPTB
BLK!
AODF
*ARO,*AR2,RO
SUBF
*AR2++,*ARO++,Rl
AOOF
*AR2,*AFW,R2
II
~LKl



CURRENT FFT ST?lGE

,A~'i'

R-',APO,r~f;.

IRO=2*Nl (BECAUSE OF REAL/IMAG)
R7=N2
INITIALIZE REPEAT COUNTER OF FIRST LOOP

RO=XtI)+X(U
Rl=X (I) -x (L)
R2=Y (I )+Y(U

SUBF

*AR2,*ARO,R3

R3=Y(I)-Y(L)

STF
STF
STF

R2, *ARO--

Y ( I ) =R2
Y{L)=R3

CMPl

@LOGFFT ,AR6
END

R3,*AR2--

AND ..•

R0,*ARO++ ,R3

R3-R 1*COS AND ......

SU8F

I I

t1PYF

R.s~V(l)+V(L)

Figure 9. Example of a radix-2, decimation-in-frequency FFT.

::

R3,*+ARO
RO,R3,R4
Rl,R6,RO
*AR2, *ARO, R3
R2,*+AR4 (IR1) ,R3
R3, *ARO++ (IRO)

BLK2

STF
SUBF
MPYF
AD OF
Mf'YF
STF
ADDF
STF

AND •••
II

STF

R4,*+AR2

CMPl
BNE

R7,ARl

LSH
LSH
LDI
LSH
BR

1,AR7

::
I:

END

RO,R3,R5
R5, *AR2++ (IRO)

lNLOP

1,ARS

R7,IRO
-1,R7
LOOP

,

YCI)=Y::::vX 


.GLOBL
.GLOIIL

_1It..2

.BSS
.BSS
.BSS

mSlZ.[

_sint

; ENTRY POINT FOO EXECIITION
; ADDRESS IF SINE TABLE

S[NTAB

_flL2:

FP
SP,FP
R4
R5
R6
R7

PUSH
lOI
PUSH
MH
PUS~

ruSHF

• GLOBAL

_SlOt

PUSH

•DATA
_nne
.FLOAT

PUSH

AR4
AR5

.FLOAT VAl..lf.l = Sln(Ot2fpl/N)
VALlE = sinUI-2+pilNI

PUSH
PUSH

AR6
AR7

• FLOAT

VIIlUEISII/41

LO[

*-fPI2),RO
RO,lFFTSlZ
HPI31,RO
RO.ILOGFFT
HPI41,RO
RO,tINPUT

= sinll5tN/4-11<2

"0
"0

=
9=

LOOPFT,I
Itt'IIT,1

[NITIAL[ZE C FlKT[ON
THE conFUTATION [S lOE [N PLACE, AND TIE ORIG[NAI. DATA [S DESTROYED.
BIT REVERSAL [S [11PLEI'IENTElI AT TIE £NO IF TIE FlKTlON. [F TH[S [S r«rr
NECESSARY, THIS PART CAN BE ilII'IENTEIl OOT.

i:i

AII3

•TEXT

'G

~

.set

; MRENT FFT STAGE
; ARC PO[NTS TO XIII
; AR2 PO[NTS TO XILI
; RC SIIlWI BE ONE LESS THAll DES[REO I

~
~

~

::....

FlST LOG'

::s

3'
~

RPTB
ADlF
SlI!f
ADIF

~

'"

::s

S

5"
::s

~

..
..

STF
STF
STF
STF

IILKI

Cl'I'I
B10

tl

(J
I::>

::s

11t.!F'

S-


LOI
LOI
ADDI
LOI
ADDI
ADDI
ADOI
LOI
SUBI
LIF

RPTB
SUBF
SUBF

..
..
..
.

::

-..l

Rl,fAR2++!lROI

vru

ILOOFFT ,ARb
END

.
, INIT LOG' CIlJIITER FOR INNER LOG'
, INITIIILIZE IA INIEl (AR4=IA)
, IA=IA+IE, AR4 POINTS TO COSlt£
, INCREIENT INNER LOG' CttMER
, IX (J), YU I I POINTER
, I ilL!, Y(Ll! POINTER

STF
ADIF
STF
STF

ADIF

II'YF
STF
SUBF

IIPYF
ADIF

II'YF

Nl=N2
N2=N212

t£lT

.

..

CONT
BITRI'

LOI
SUBI
LOI
LOI
LOI

IFFTSlZ,RC
l,RC
IfFTSlZ ,IRO
@Itf'IIT,1IRO
IINPUT,ARI

, RC SIOl.lI lIE M: LESS THIIH IESlRED •
, Rb=SIN

, R2=X(J)-X(lI
, Rl=YU )-Y(lI
, RQ=R24SIN AND•••
R3=YU )+YIL!
, R3=RUCOS rill •..
YU)=Y(I)+Y(L!
, R4=RltC0S-R24SIN
, RO=Rl4SIN AND...
R3=XIl )+IIL!
, R3=R24COS AND•••
X(J) =1 ( U+X! L! rill ARO=ARO+24N 1
, R5=R24COS+tIl4SIN
, XlLl=R24COS+R14SIN, II«:R AR2 AND .. ,
YIL!=R14COS-R24SIN

,
,

Clf'1

R7,ARI

&IE

11t.0P

, LOOP BACK TO TIE IllER LOOP

LSH

l,AR7

, INCREIENT LOOP ClUlTER FOR t£XT TIlE

LSH

l,AR5

, IE=24IE

m STIME

RPTB
Cl'I'I
IIOE
LIF

BITRIi
IIRO,ARI
COIIT
tIIRO,RO
4IIRl,Rl
RO,4llRl
Rl,+1IRO
f+AROt t). RO
t+ARlU) ,Rl
RO,t+ARl(l)
Rl, t+AR0(1)

LDF

STF
STF
LIF
LDF

STF
STF
I«P
HOP

RC=N

RC SIOl.lI lIE eN: LESS THIIH
lRO=SIZE OF m=N

I++MO(2)

+AIIl++IIRO)B

RESTmE TIE REGISTER VIILI£S AND RETlRI

BLK2

4AR2,4IIRO,R2
4fAR2, HMO,Rl
R2,Rb,RO
4fAR2, HMO,R3
Rl,4+AR4!lRll,R3
R3,HMO
RO,R3,R4
Rl,Rb,RO
fAR2,tIIRO,R3
R2, 4+AR4!lRl) ,R3
R3. 4i\RO++( lRO)
RO,R3,R5
R5,411R2++!lRO)
R4,4+AR2

II'YF

END'

..

2.ARI
ISINTAB,AR4
AR:i,AR4
ARl,ARC
2,ARI
Iltf'lJT,ARC
R7,ARO,AR2
ARl,RC
l,Re
4AR4,Rb

R7,IRO
-l,R7
LOG'

00 TIE 8lT -REIIERS INIl OF TIE OOTPUT

lIRE 00'£

SECOND LOG'

BLK2

00

RO=XIl )+1I1I
Rl=XIlH(lI
R2=Y(J)+Y(lI
R3=YU )-Y(lI
Y(J)=R2 AND .. ,
Y(1I=R3
,X(J)=RO AND •••
X(L!=Rl AND ARO.2 = ARO.2 + 24Nl

/lAIN INt£R LOG'

.:--l
C

,
,
,
,
,

RO, tMOt-t( IRO}

IF THIS IS TlE WT STIME,

."-3

LSH
IIR

IILKI
tIIRO,fAR2,RO
JAR2++. tAROt+, Rl
fAR2,tIIRO,R2
fAR2,tIIRO,R3
R2,411ROR3,4IIR2-

SUIIF

~

~

LOI

pa>
pa>
pa>
pa>

Pa>F
Pa>F
pa>
pa>
pa>

RETS

AR7
ARb
AR:i
AR4
R7
Rb

R5
R4
FP

~SIRED

I

00
00

COI'LEX. RADIX-2 DlT FFT

HfHfHHHHHfIH+HHofHHtHHHHHofH•• HHHfIHfHHHfHHHtHHtH

R2DlT.ASIt

THIS PROGRAII 1rt:uDES Fll.UJIINl FIlES'
WERIC PROGRAII RJl A FAST I.OIJ'ED-C(IE RADIX-2 DlT FFT COIfUTATlIlII
IlII Tl£ TI1S32OC3O
IIlITTEN BY' RAIIINl I£'IER. KMl SCIIWIZ
LStlSTtH. FI£R IW:HRICHTENTECHNIK

TI£ FILE 'TWIDIKIIR.ASIt' CIlIISISTS IF TWIDDLE FACTIIlS
TI£ TWIDDLE FACTIIlS ARE STIllED IN BITRE'IERSED IJUIER AliI WITH A TABLE
LENGTH IF N/2 IN • FFTWIlTH).
EXMPLE' _
Fill No32. 1111.) • COOI2tPltn/N) - jtSINI21/'11./N)

19.07.89

LWI'lERSlTAET~

CAl.(RSTRASSE 7. D-852O ERLAIllEN. FRG
ADIIlESS


THE ICOI'LEX) DATA RESIDE IN INTERN/IL I£IIORY. THE COtIPIJTATIIJI IS DIllIE
IN-PlACE. BUT THE RE5U.T IS I10IIED TO IIIiITHER I£IIORY SECTIIJI TO

o

1

IEl10NSTRATE THE BIT-1lEYEASED ADDRESSINl.

COEFFICIENT
R{ilNIO)) • COOI21/'ItO/32) • I
-HIllIO)) = SINI2tPltO/321 = 0
R{ilN(4)) • COOI21/'1t4/32) • 0.707
-HIII(4)} =SINI2tPIt4/32) =0.707

~

.~

.

Fill THIS PROORAII THE "INIIlIt FFTLENlTH IS 32 POINTS IlECIlUSE IF THE
SEPARATE STAGES.
12
13
14
15

FIRST TWO PASSES ARE REf<.IZED AS A FOUl BUTTERFlY LOOP SINCE THE
I'IJl.TIPLIES ARE TRIVIAL. Tl£ I'IJl.TIPLIER IS OIlY US£D FOR A LOAD IN
PAAALLEL WITH AN IIIU III SUBf.
ffffHHfffHftflftHHHUHfHnHffHHfHHHHHHHf+HIIIIIIIIIII.llllf

n

R{WNI3ll • COOI21/'It3/32) = 0.831
-HIII(3)) = SIN121/'113132) = 0.556
R{WNI7ll =COOI21/'It7I32) = 0.195
-HIflI7ll = SINI2tPlt7I32) = 0.981

lIEN GENERATED Fill A FFT LENGTH IF 1024. THE TABLE IS Fill ALL
AVAILABLE FFT IF LESS OR EllJAL LENGTH.

EXAItf'LE Fill A 1024-POINT FFT IEXCLUDltI; BIT RE'lERSALlI
~Y

THE "ISSINl TWIDDLE FACTIIlS 11II1l.1II1l ..... ) ARE GENERATED BY USINl
TI£ S~TRY IflIN/4+.) • -jllfll.). THIS CAN BE EASILY REf<.IZED BY
CHANlINl REf<. - AND IItAIlINARY PART IF TI£ TWIDilE FACTOOS AND BY
NEGATINl Tl£ NEW REf<. PART.

SIZE'

PROORAII

229 WORDS

DATA ITWIDDLE FACTIIlS)

512 WORDS

CYCLES PER BUTTERFLY'

STAGES 1 AND 2
STAGES 3 TO 8
STAGE 9
STAGE 10

8
8.25

TO CHANlE THE FFT LENGTH. MY TI£ PARAI£TERS IN THE I£ADER IF
TWIDIKBR.ASIt AND THE IIf'UT AND OOTPUT VECTOR LENGTHS NEED TO BE
ALTERED.
HfHff.HffHfffHHffffHHfflffffHHHffHfHfHH*tHfHHHIHHfffUH.

8.5
t

+
AR + j AI -------------------------------- AR' + j AI'

AI,{RAIlE CYCLES/BUTTERFlY
7. 27';
TOTAL BUTTERFLYCYCLES
• 37248
INITIALIlATlIJI MRHEAD
= 2181 = 5.55 I IF TOTAL TINE
TOTAL NltIBER IF INSTRlCTlIJI CYCLES. 39429
TOTAL TINE Fill A 1024 POINT FFT
2.30 os IEXCLUDIMl BIT
REI'ERSAL)

\

1+
\

I
\ I
1\
\ +

•

iii + j BI - - I COO - j SIN) - - - - - - - - - - - - - l1li' + j BI'

t
*

TR'lIitCOS+BI'SIN
TI'IIi*SIN-BI*COO
AR~= AR + TR
AI'. AI - TI
iii" AR - TR
BI'. AI + TI

HfHHHfHtHftHHHHHfffffHHHHHfffffHffflffHffftHfHfffffflHfH

I

*

*

*

J
f.
~

~

3
I

\
I

i

H+HHHffHHHffHfHHHfHHl-tHHHHfHHf+HtHHHHHHHHIIHtHtI

~

i

~
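The butterfly equations in the header can be checked against ordinary complex arithmetic. The following is a minimal Python sketch, not part of the original listing; the function names are hypothetical, and the twiddle factor is assumed to be W = COS - j*SIN as the header defines it.

```python
import cmath
import math


def butterfly_type1(ar, ai, br, bi, cos_w, sin_w):
    """Radix-2 DIT butterfly written with the TR/TI terms of the header.

    The twiddle factor is W = cos_w - j*sin_w.
    Returns ((AR', AI'), (BR', BI')).
    """
    tr = br * cos_w + bi * sin_w
    ti = br * sin_w - bi * cos_w
    return (ar + tr, ai - ti), (ar - tr, ai + ti)


def butterfly_complex(a, b, w):
    """Reference form of the same butterfly: A' = A + B*W, B' = A - B*W."""
    t = b * w
    return a + t, a - t


if __name__ == "__main__":
    n_fft, n = 32, 3                      # example values, chosen arbitrarily
    w = cmath.exp(-2j * math.pi * n / n_fft)
    upper, lower = butterfly_type1(1.0, 2.0, 3.0, -4.0, w.real, -w.imag)
    print(upper, lower)
    print(butterfly_complex(1 + 2j, 3 - 4j, w))
```

Both print statements should show the same two complex values, since B*W = TR - j*TI.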

FIRST 2 STAGES AS RADIX-4 BUTTERFLY

;:

.glob.1
.glob.1
.globll
.glob.l
.,10bll
.glob.1

~

FFT
FILL PIPELINE

N

rtW.II
MlIERT
NATC!£l

IIIIIF
SlIIIf

.globil

"SINE

SlIIIf

5'

• ISS

INP,2048

~

• ISS

""t:l

~

""

;:

S

IIIIIIF

;:

~

~

•'"-l
I:>
;:

I:>..

~
~

""...

~
I:S
;:

~
<::S

FFTSIZ
FGW
FG41!3
FGeItZ
FG2
FG2113
LGGFFT
SINTAB
SINT"I
SINTP2

INPUT
1NPUTP2
OUTPUT

~
g

IIRO
ARI
AR2
ARJ
AR4 ,
AR5 ,
AR6'
AR7 ,


MP,2048

; INPUT VECTOR LENGTH = 2N (DEPENDS ON N)
; OUTPUT VECTOR LENGTH = 2N (DEPENDS ON N)

,

.text

,:""'i

t;:,

IIIIIIF

.word

N

,.ord
.... rd

MlIERT-2
MlIERT-3
NATC!£l-2

.word
.word
,word
,l1li01'0

.word
.1III0rd

•word
...ord
.lIIord

.word

IIIIIF

..
..

tfW.B-3

"SIIE-I
SINE

STF
SlIIF
STF
ADIF
II'Yf

SlIIF
AIU
STF
stIIF
STF

IIIIIIF

lR4iSM+CR
,R'5=M-CR
;116=111+111
,R7=III-1lt
; . ' =ROaR4+R6
; RI = 01 , lit' = Ra = R4 - 116

; RO=BI +01, M' =RO
;RI=BI-OI,III'=Ra
; CR'=R2-R5+RI
; RI - CI , III' = Ra = R5 - RI
; R2-AI +el, CR'-R2
; R6-AI-el,III'-R3
; AI'

C

R4 .: R2 + RO

RADIX-4 BUTTERFLY u.P

SINE+2
INP
1NP+2
OOTP

FFTSIZ
1FG2,IRO
ISINTAB,AR7
IINPUT,IIRO
IRO,IIRO,ARI
lAO, ARI, AR2
IAO,AR2,ARJ
IIRO,AR4
ARI,AR5
ARJ,ARb
2,IRI
-I,IAO
IRO,RC
2,RC

..

::

rtW.II

RPTB

AR + AI
BR + BI
CR + CI + CR' + CI'
DR + DI
AR' + AI'
BR' + BI'
DR' + DI'
FIRST TWIDDLE FACTOR = 1
UlP
LDI
LDI
LDI
RODI
ADDI
RODI
LOI
LDI
LDI
LlII
LSI<
LDI
SUBI

stIIF

..

..

II'Yf

:1

AIU
AIU
STF

..
.

::

stIIF

..
..

IIIIIF

..
; LOAD PAGE POINTER
; IR0 = N/2 = OFFSET BETWEEN INPUTS
; AR7 POINTS TO TWIDDLE FACTOR 1
; AR0 POINTS TO AR
; AR1 POINTS TO BR
; AR2 POINTS TO CR
; AR3 POINTS TO DR
; AR4 POINTS TO AR'
; AR5 POINTS TO BR'
; AR6 POINTS TO DR'
; ADDRESS OFFSET
; IR0 = N/4 = NUMBER OF R4-BUTTERFLIES

stIIF
ftPYF

stIIF
STF
stIIF
ADDF
STF
stIIF
STF

I:

I
FFT:

II'Yf

::

1t1R2._,R4
lt1R2,tMO++,R'5
tMI,IM3,R6
tMl++,_,R7
R6,R4,RO
_,tM7,RI
R6,R4,Ra
RI,tMl,RO
RO,_
RI,tMl++,RI
Ra, tAR'5++
RI,R'5,R2
HM2,tM7,RI
RI,R'5,Ra
RI,_.R2
R2, tM2++IIRIl
RI, tARO++ , R6
R3,1t1Rb++
RO,R2,R4

AOOF
II'Yf

:1

STF
SUIIf
STF
ADIf

"'if
SlIIF

IIIIIF

::

..

STF
stIIF
STF

IlKI
1t1R2-,tM7,RO
RO,R2,R2
tMI++.tM7,RI
R7,R6,Ra
RO,_,R4
R4,tM4++
RO, tMO++, R5
R2, tAR'5++
R7,R6,R7
RI,IM3,R6

; RO - CR , (81' = R2 = R2 - ROI
; RI=III, (el' -R3=R6+R71
; R4 - AR + CR , (AI' = R41
; R5 - AR - CR , (81' = R21
; (01' = R7 = R~ - R71
; 116 = III + lit , 101' = R71

R7.fM6++

RI, IARJ++ , R7
Ra,tM2++
R6,R4,RO
_,tM7,RI
R6,R4,Ra
Rl,tMl,RO
RO,tM4++
RI,tMl++,RI
R3, tAR'5++
RI,R'5,R2
HM2,tM7,RI
RI,R'5,R3
RI,_,R2
R2, tM2++IIRIl
RI,tMO++,R6
R3,tM6++

; R7 = III - lit , (el' = R31
,M'-RO-R4+II6
;RI=0I,IIt'=R3=R4-R6
;RO=BI+DI,M'=RO
; RI =BI-DI,III' =R3
; CR' =R2=R'5+RI
; RI = el , III' = R3 = R5 - RI
; R2-AI +el, CR' =R2
; R6=AI-el,lII' =Ra

8

RO,R2,R~

ADIF

IILKI

; AI' = R4 = R2 + RO

CLEAR PIPELINE
SUflf
ADIF
STF
STF
SU!IF
STF
STF

~

"

RO,R2,R2
R7,Rb,R3
R4,tAR4
R2;tAR5
R7,Rb,R7
R7, tARb

THIRD TO lAST

R2,tAR3++
AIlS,RC

FIRST BUTTERFLY-TYPE:

; BI' = R2 = R2 - RO
; CI' = R3 = Rb + R7
; AI' = R4 , BI' = R2

TR = BR * COS + BI * SIN
TI = BR * SIN - BI * COS
AR'= AR + TR

; DI' = R7 = Rb - R7
; DI' = R7 , CI' = R3

AI'= AI - TI
BR'= AR - TR
BI'= AI + TI

R3,t-AR2

;:

~

STF
llII

ttfUHtnHttUHf .......U.HfH-t.HHHHfH.HH.H+HtttH+HHHHHtH-H

(f

STAGE 2

HHfHHHH+f+4HtH+HtHfH-HHH+HfffHft++fHUHHHHHftHtHfHfHt

"t3


STUFE


"

llII
llII
SUBI
LOI

t!FG2,IRI
IRO,AIlS
I ,AIlS
I,ARb

LOI
1lI1
1lI1
LOI
ADDI
1lI1
LSH
LSH
LSH
LSH

tsINTAB,AR7
O,AR4
@INPUT,ARO
ARO,AR2
IRO,ARO,AR3
AR3,ARI
I,ARb
-2,AIlS
I,ARS
-I.IRO

LSH
ADDI

-I,IRI
I,IRI

lCf
lCf

tARI++,Rb
tAR7,R7

~

~

"

tv
C

"
"

"
"
"

BFLII

"

lCf
It'YF
ADCf
It'YF
l'I'YF
ADCf
It'YF
SUBF
ADCf

f++M7,Rb
tARI-,Rb,RI
HtM4,RO,R3
tARI,R7,RO
fMl++,1{:fI.7--,RO
RO,RI,R3
tARI++,R7,RI
R3,tARO,R2
tARO++ , R3 ,115

BFLYI
.+ARI,Rb,1I5
R5, IM2++
RI,RO,R2
tARI,R7,RO
R2, 'ARO,R3
R2, fARO++ ,R4

R3, tAR3++
RO,II5,R3

It'IF

~1++,R6.RO

SUflf
It'YF
STF
ADCf
STF

R3, 'ARO,R2
tARl++,R7,Rl
R4, tAR2++
tARO++, R3, 115
R2. tAR3++

; 115 = BI • SIN, IAR' = 115)
; IR2 = TI = RO - R1l
; RO = BR • COS , IR3 = AI + TIl
, IR4 = AI - 11 , BI' = R3)
,R3=TR=RO+II5
, RO = IIR • SIN, R2 = AR - TR
, RI = BI • COS , IAI' = R4)
;II5=AR+TR,IIR'=R2

SWITCH OVER TO NEXT IliOLI'
, STEF FRO! 0llI IIWlINARI TO NEW REAl
VILLI:
; M'ItY llIAD, CM.I Foo ADDRESS Ll'DATE
: R7 = COS

,
,
,
,
;

g
S.

"

"

::

FILL PIPElINE

~
to

Q
C

; POINTER TO TWIDDLE FACTOR
; GROUP COUNTER
; UPPER REAL BUTTERFLY INPUT
; UPPER REAL BUTTERFLY OUTPUT
; LOWER REAL BUTTERFLY OUTPUT
; LOWER REAL BUTTERFLY INPUT
; DOUBLE GROUP COUNT
; HALF BUTTERFLY COUNT
; CLEAR LSB
; HALF STEP FROM UPPER TO LOWER REAL PART

GRLI'PE

~
C

~

"

RPTB
It'IF
STF
SUBF
It'YF
ADCf
SUflf
STF
ADCf

AR0 = UPPER REAL BUTTERFLY INPUT
AR1 = LOWER REAL BUTTERFLY INPUT
AR2 = UPPER REAL BUTTERFLY OUTPUT
AR3 = LOWER REAL BUTTERFLY OUTPUT
THE IMAGINARY PART HAS TO FOLLOW
, Rb = SIN
; RI = 81 • SIN
, [JJt11 AOCf foo CCJ..NTER Ll'DATE,
,RO=IIR.COS
, R3 = TR = RO + RI , RO = IIR • SIN
,RI =BI.COS, R2=AR-TR
,R5=AR+TR,IIR'=R2

"
"
"

"

SU!IF
ADCf
STF
SU!IF
STF
HOP
It'YF
STF
It'YF
It'YF
SUflf
It'IF
SUflf
ADCf
STF
llII

RI,RO,R2
R2, tARO,R3
R5, tM2++
R2, 'ARO++( IRl) ,R4
R3, iAR3++(JR1l
tARI++IIRIl
tARI--,R7,RI
R4, fAR2++( IRl)
.ARI,Rb,RO
tARl ++, *AA7++, RO
RO,RI,R3
tARl ++, Rb, RI
R3,tARO,R2
tARO+<, R3, 115
R2,tAR3++
AIlS,RC

;R2=1I=RO-RI
, R3 = AI + 11 , AR' = 115
, R4 = AI - 11 , BI' = R3
; ADDRESS II'DA TE
; RI = 81 • COS , AI' = R4
; RO = IIR • SIN
; R3 = TR = RI - RO , RO = IIR • COS
; RI = BI • SIN, R2 = AR - TR
; 115 = AR + TR , I1R' = R2


"

BFLY2

"

~

"

~
N

C

::

C

::

a

BFLY2
f+ARI,R7,R5
R5,f/IR2++
RI,fIO,R2
tARI,Rb,RO
R2,tARO,R3

, R5 = BI • COS , (AR' = R5)

R2, fARO++,R4

, (R4 = Al - TI , BI' = R3)

R3,.ARO,R2
fARI++,R6,R)

AODF
ADDF
STF
CllPI
BNED
SUBF
STF
LDF
STF

, (R2 = TI = flO + Rll
• 110 = \Ill • SIN, (R3 = Al + TIl

,TR=R3=R5-fIO
,RO=BR'COS, R2=AR-TR

"

, RI = BI f SIN, (AI' = R4)

R4, 1M2++

tARO++ , R3, RS
R2, tAR3++

,R5=AR+TR,BR'=R2

RI,RO,R2
R2,tARO,R3
R5,iAR2H

,R2=TI=RO+RI
,R3=AI+TI
,AR'=R5

ARb,AR4
GRUPPE
R2, tARO++( IRI) ,R4

, DO FOLL[ljIN; 3 INSTRUCTIONS
, R4 = Al - TI , BI' = R3

N(f'

R3,I-M3++fJRl1
H+AR7,R7

"

ADDF
SUBF
AODF
SUBF
STF
STF
STF
STF
STF
STF
STF
STF

ADDF
SUBF
AODF
SUlF

,AR' =Rb=AR+BR
tARO,tARI,Rb
tARl ++, tARO++, R7
,\IIl'=R7=AR-BR
, AI' • R4 = Al + BI
tARO,tARI,R4
tARI++( IfIO), fARO++( IfIO) ,R5 , BI' = R5 = Al - BI
, (AR' = R2)
R2,fAR2++
, (BR' = R3)
R3,tAR3++
, (AI' = 110)
RO,.AR2++
, (BI' = Rll
RI, tAR3++
R6,.M2+t
,AR'=Rb
R7,tIIR3++
,BR'=R7
R4, f/IR2++( IRO)
, AI' = R4
R5, tAR3++( 1110)
, BI' = R5

, R7=COS
, AI' = R4

tARl++IlRI)

, BRIIIICH IERE

LDF

"
"

, ,Uf'

tARO++ , f+/IR I , R5
tARI,tARO,R4
tAR I++, tARO--, Rb
tARt +t, tARO++, R7

LDF
SUBF
STF
STF
STF

\0

tINPUT,ARO
ARO,AR2
IfIO.ARO,ARI

,AR' =RS=AR+BI
, AI' = R4 = Al - BR
; Bli =R6 = AI + BR
,\Ill' =R7=AR-BI

HARt, H+MO,R3

,AR'=R3=AR+BI
f-1iR7,RI
, RI = 0 (FlXl INNER LOCf»
, flO = BR (FlXl INfER LOCf»
tARI++,RO
*ARl++( lRO), fARO++, R2 ,BR'=R2=AR'BI
, (AR' = R5)
R5,t-AR2++
, (BR' = R7)
R7,tIIR3++
, (BI' = Rb)
R6,fAR3++

OUT AFTER lO(NH STIlJlE
5. TO ". WTTERFLY'
RPTB

BF2END

LDF

tAR7++,R7
R4,tAR2++
tAR7++,R6
R2,tAR3++

SECOND TO LAST STIlJlE
LDI
LDI
AODI

,AR'=R2=AR+BR
,\IIl'=R3=AR-BR
, AI' = flO = Al + BI
, BI' = RI = Al - BI

4. WTTERFLY' w"II/4
ADDF

R4, f/IR2++( IRll

4,IRO
STtJ'E

tARO, tARl ,R2
tARl++, tAROt+ ,R3
tARO, tARl , flO
fARl++,tAf'{O++,Rl

3. WTTERFLY, w"II/4

END OF THIS BUTTERFLY GROUP
CllPI
8HZ

ADDF
SUBF
ADDF
SUBF

2, BUTTERFLY' 0"0

CLEAR PIPELINE

So

'"

RPTB

tARl++,R7,RO

LIllER OUTPUT
POINTER TO TlIIDIlE FACTlXl
DISTANCE IIETWEEN TlIO Iil(U'$

1. WTTERFL y. ."0

I'PYF
STF
AODF
ItPYF
AODF
SUBF
STF
SUBF
ItPYF
SUBF
rf>YF
STF
AODF
STF

R3, tAR3++
fIO,R5,R3

ARI,1IR3
ISINTP2,AR7
S,IfIO
IFGSII2,RC

FILL PIPELINE

.HHHtHHHHnHfHHHHUfHHfHHtHfHfHtHHHfHfftHHfHHH*lH

•

(J

a

lOi

TR = BI * COS - BR * SIN
TI = BI * SIN + BR * COS
AR'= AR + TR
AI'= AI - TI
BR'= AR - TR
BI'= AI + TI
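The second butterfly type is the first type with the twiddle factor rotated by -j, which is how the symmetry WN(N/4+n) = -j*WN(n) lets one table entry serve two butterflies. A minimal Python sketch of that relationship follows; it is an illustration only, and the function names are hypothetical.

```python
import cmath
import math


def butterfly_type2(ar, ai, br, bi, cos_w, sin_w):
    """Second butterfly type: same table entry (cos_w, sin_w), but the
    effective twiddle factor is -j * (cos_w - j*sin_w)."""
    tr = bi * cos_w - br * sin_w
    ti = bi * sin_w + br * cos_w
    return (ar + tr, ai - ti), (ar - tr, ai + ti)


def butterfly_reference(a, b, w):
    """Complex reference: A' = A + B*W, B' = A - B*W."""
    t = b * w
    return a + t, a - t


if __name__ == "__main__":
    n_fft, n = 32, 3                          # arbitrary example values
    w = cmath.exp(-2j * math.pi * n / n_fft)  # table entry W_N^n
    w_rotated = -1j * w                       # equals W_N^(n + N/4)
    print(butterfly_type2(1.0, 2.0, 3.0, -4.0, w.real, -w.imag))
    print(butterfly_reference(1 + 2j, 3 - 4j, w_rotated))
```

The rotated factor can also be compared with a directly computed W_N^(n+N/4) to confirm the symmetry itself.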

5";:s

I:l

LDi

SECOND BUTTERFLY-TYPE:

;:s

.:;,
.=ii
...,

lOi
lOi

IHf ...t,n.UUHuffHff+UffltffHfHHH.ftfHffHIHHtHHH'-IHHH.fHf

•

UPPER IIf'UT
UPPER OUTPUT
l~ IIf'UT

"

STF

"

STF

LDF

, R7 = cos

,

((AI' = R4))

, Rb = SIN, (BR' = R2)

\0
N

"

II'YF
STF
ADIF
II'YF
ADIF
SUIIF

STF
ADIF
II'YF
SUIIF


II'YF
STF
ADIF
STF

<+ARI,RO,RS
R3,tM2++
RI,RO,R2
tMI,R7,RO
R2,_ARO,R3
R2,_ARO++URO),R4
Rl,  OFFSET

tMO,tMl,Rh
; AR" =R6=M+BR
tARl++,ofARO++,R7
,BR'=R7=AR-BR
AI' = R4 • Al + BI
,
tMO,tMl,R4
tMl++( lRO), _URO) ,RS , .81' = RS = Al - 81

2. BUTTERFLY; 1O"It/4

; (R4 = Al - Tl , 81' = Rl)

:1

, RI = 81 _ SIN, (AI' = R4)
,RS=AR+TR,BR'sR2

, R4 • Al - Tl , 81' = Rl

LAST STAGE

; (R2 s Tl = RO - Ril
, RO = BR t COS , (Rl = Al + TIl

,Rl=TR=RS-RO
; RO=BR_COS, R2=AR-TR

,R2=Tl=RO+RI
,Rl=AI+Tl, AR'oRl

II

ADIF
LIF
LDF
SlIlIF
STF
STF
STF

3. TO

, RS = 81 _ COS, (AR' = RS)
,(R2=1I =RO+RIl
,RO=BR_SIN, (Rl=AI+TIl

"

; (R4 = Al - Tl , y(U = BI' = Rl)

"

, R3=TR=RS-RO
,RO=BR_COS, R2=AR-TR

"

,AR'=Rl=AR+BI
<+AR1.tMO,Rl
, RI = 0 (FOR INI£R LOOP)·
HtR7,RI
, RO = BR (FOR IllER LOOP)
tMl++,RO
tMl++URO).tARO++,R2 ,BR'=R2=AR-81
, (AR' = Rh)
R6,tM2++
, (BR' = R7)
R7,_
, (81' = RS)
RS, tAR3++! lRO)

n. BUTTERFLY;

LDF
STF
LDF
STF
II'YF
STF
ADIF
II'YF

tM7++,R7
R4, tM2++URO)
tM7++,Rh
R2,tM3++
<+ARI,RO,RS
Rl,tM2++
RI,RO,R2
tMI,R7,RO

, R7 = COS , (AI' = R4)
, RO = SIN, (BR' = R2)
,RS=8ItSIN, (AR'=Rl)
, (R2 s 11 = RO + Ril
; RO = BR t COS , (Rl = Al + TIl

::...
;::;

AD[\'

R2,4ARO,R3

"

i:!~

SUIif
STF
AO[\'
I1PYF

"

SUIIf

\:i

"

rf'YF
STF

"

STF

R2, *AROH( IRO), R4
R3, tAR3++{ IROI
RO,RS,R3
tAA!++,Rb,RO
R3,4ARO,Rl
tAR! ++IIROI, R7, RI
R4, tAR2++ llRO I
fAROH,R3,R3
R2, tAR3++

:?
'f5

"
;::;

S·

;::;

"

AD[\'



[}
<::>

***************************************************************************
*
*                                                               24.07.89
*
*   THE TWIDDLE FACTORS ARE STORED IN BIT-REVERSED ORDER AND WITH A TABLE
*   LENGTH OF N/2 (N = FFTLENGTH).
*
*   EXAMPLE: SHOWN FOR N=32, WN(n) = COS(2*PI*n/N) - j*SIN(2*PI*n/N)
*
*       ADDRESS    COEFFICIENT
*          0        R{WN(0)} = COS(2*PI*0/32) = 1
*          1       -I{WN(0)} = SIN(2*PI*0/32) = 0
*          2        R{WN(4)} = COS(2*PI*4/32) = 0.707
*          3       -I{WN(4)} = SIN(2*PI*4/32) = 0.707
*          .
*          .
*         12        R{WN(3)} = COS(2*PI*3/32) = 0.831
*         13       -I{WN(3)} = SIN(2*PI*3/32) = 0.556
*         14        R{WN(7)} = COS(2*PI*7/32) = 0.195
*         15       -I{WN(7)} = SIN(2*PI*7/32) = 0.981
*
*   WHEN GENERATED FOR A FFT LENGTH OF 1024, THE TABLE IS FOR ALL
*   AVAILABLE FFT OF LESS OR EQUAL LENGTH.
*
*   THE MISSING TWIDDLE FACTORS (WN( ), WN( ), ...) ARE GENERATED BY USING
*   THE SYMMETRY WN(N/4+n) = -j*WN(n). THIS CAN BE EASILY REALIZED BY
*   CHANGING REAL- AND IMAGINARY PART OF THE TWIDDLE FACTORS AND BY
*   NEGATING THE NEW REAL PART.
*
*   THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION IS DONE
*   IN-PLACE, BUT THE RESULT IS MOVED TO ANOTHER MEMORY SECTION TO
*   DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*   FIRST TWO PASSES ARE REALIZED AS A FOUR BUTTERFLY LOOP SINCE THE
*   MULTIPLIES ARE TRIVIAL. THE MULTIPLIER IS ONLY USED FOR A LOAD IN
*   PARALLEL WITH AN ADDF OR SUBF.
*
*   TO CHANGE THE FFT LENGTH, ONLY THE PARAMETERS IN THE HEADER OF
*   TWID1KBR.ASM AND THE INPUT AND OUTPUT VECTOR LENGTHS NEED TO BE
*   ALTERED.
*
***************************************************************************
*
*   EXAMPLE FOR A 1024-POINT FFT (WITH BIT REVERSAL):
*
*   SIZE:   PROGRAM                     231 WORDS
*           DATA (TWIDDLE FACTORS)      512 WORDS
*
*   CYCLES PER BUTTERFLY:   STAGES 1 AND 2      4
*                           STAGES 3 TO 8       8
*                           STAGE 9             8.25
*                           STAGE 10           10.5 (DUE TO EXT. MEMORY WAITS)
*
*   AVERAGE CYCLES/BUTTERFLY            : 7.475
*   TOTAL BUTTERFLY CYCLES              : 38272
*   INITIALIZATION OVERHEAD             : 2185 = 5.4 % OF TOTAL TIME
*   TOTAL NUMBER OF INSTRUCTION CYCLES  = 40457
*   TOTAL TIME FOR A 1024 POINT FFT     : 2.42 ms (INCLUDING BIT REVERSAL)
*
***************************************************************************
*
*   AR + j AI ----------------------\   /-----(+)----- AR' + j AI'
*                                    \ /
*                                     X
*                                    / \
*   BR + j BI ---( COS - j SIN )----/   \-----(-)----- BR' + j BI'
*
*   TR  = BR * COS + BI * SIN
*   TI  = BR * SIN - BI * COS
*   AR' = AR + TR
*   AI' = AI - TI
*   BR' = AR - TR
*   BI' = AI + TI
*
***************************************************************************
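The cycle totals quoted in the two program headers can be reproduced from the per-stage costs. The Python sketch below is illustrative only. It assumes 512 butterflies per stage for a 1024-point FFT, and it takes the cost of stages 1 and 2 as 4 cycles per butterfly, an inferred value that makes the stage costs add up to the quoted totals.

```python
BUTTERFLIES_PER_STAGE = 1024 // 2   # a radix-2 stage of a 1024-point FFT


def butterfly_cycles(stage_costs):
    """Sum cycles over (number_of_stages, cycles_per_butterfly) pairs."""
    return sum(stages * BUTTERFLIES_PER_STAGE * cost
               for stages, cost in stage_costs)


# Per-stage costs; the 4-cycle figure for stages 1 and 2 is inferred.
R2DIT_COSTS = [(2, 4), (6, 8), (1, 8.25), (1, 8.5)]     # no bit reversal
R2DITB_COSTS = [(2, 4), (6, 8), (1, 8.25), (1, 10.5)]   # with bit reversal

if __name__ == "__main__":
    for name, costs, overhead in (("R2DIT", R2DIT_COSTS, 2181),
                                  ("R2DITB", R2DITB_COSTS, 2185)):
        cycles = butterfly_cycles(costs)
        average = cycles / (10 * BUTTERFLIES_PER_STAGE)
        print(name, cycles, average, cycles + overhead)
```

Adding the initialization overhead (2181 or 2185 cycles) to the butterfly total gives the total instruction-cycle counts in the headers.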


.glo... 1
.global
_global

;Ji

.gI0 ••1

~

.glob.l
.global
.global

~

;0

S

5'

.bss

;0

.Q,

.~

.bss

t:I

•FFTSlZ

::.;0

FGW
FD4n3
FGS112

(l
.~

::..
0

S-

...

~

~

I:l
;0

~
C

~

,word

~

~
~

tv

a
C

\0

VI

,lIIIord

. .,ord

S-

C

,word
.!IIord
.!IIord

FG2

;0

•FFT:

FILL PIPEWE

"SINE
111',2048

INPUT VECTOR LENGTH = 2N (DEPENDS ON N)
OUTPUT VECTOR LENGTH = 2N (DEPENDS ON N)

OOTP,2048

. text

FG2n3
LOOm
SINTAB
SINTnI
SINTP2
INPUT
INPUTP2
OUTPUT
IlITPI

'"C

FIRST 2 STAGES AS RADIX-4 BUTTERFLY

FFT
N
ItIAUI
MlIERT
NACHTEL

• itoI'd

.word
..,ord
.lIIOrd

, ..otd
...ord
,lIIord

.word

N

MlIERT-2
NYIERT-3
NACltTEL-2
ItIAUI
NHALB-3

"SIIE
SINE-!
SINEt2
INP
INPt2
DUTP
OOTPtl

FFTSIZ
1FG2,IRO
tsINTAB,AR7
IINPUT,ARG
IRO,ARO,ARI
lRO,ARI,AR2
lRO,AR2,AR3
ARO,AR4
ARl,AR5
AR3,ARb
2,IRI
-I,IRO
IRO,RC
2,RC

..
..
..
..
..

1M2, IARO, R4
IM2,IARO++,R5
tARI. tAR3,Rb
tARt ++ I fAR3++, R7

AlIII'
I'I'YF

R6,R4,~

AR'=RQ=R4+R6

IM3++ , fAR7 , R1
R6,R4,R3
RI,'ARI.RO
RO,tAR4++
Rl, tARt ++, Rt
Rl,+AR5++
RI,R5,R2
HARZ, IM7. Rt
RI,R5,R3
RI, IARO,RZ
R2, tAR2tt IIRII
Rl, tARO++ ,Ro
R3,*Mbt+
RO,R2,R4

RI=DI, 1IR'=Rl=R4-Rb

SU!F
ADIF
SlF
StIIIF
STF
ADIlF
I'I'YF

SU!F
ADIlF
STF
SU!F
STF
ADIlF

; R4 = AR + CR
; R5 = AR - CR
; R6 = DR + BR
; R7 = DR - BR
; R0 = BI + DI , AR' = R0
; R1 = BI - DI , BR' = R3
; CR' = R2 = R5 + R1
; R1 = CI , DR' = R3 = R5 - R1
; R2 = AI + CI , CR' = R2
; R6 = AI - CI , DR' = R3
; AI' = R4 = R2 + R0
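The register comments describe the first two radix-2 stages merged into one multiplier-free four-point butterfly. A minimal Python sketch of that merge is given below. It follows the register names of the comments where practical and compares the result with two explicit radix-2 stages, assuming the only non-trivial twiddle factor of the second stage is -j; the function names are hypothetical.

```python
def first_two_stages(a, b, c, d):
    """Four-point butterfly built only from additions and subtractions.

    a, b, c, d are the complex inputs AR+jAI, BR+jBI, CR+jCI, DR+jDI.
    Returns (A', B', C', D').
    """
    r4 = a.real + c.real          # R4 = AR + CR
    r5 = a.real - c.real          # R5 = AR - CR
    r6 = d.real + b.real          # R6 = DR + BR
    r7 = d.real - b.real          # R7 = DR - BR
    sum_bd_i = b.imag + d.imag    # BI + DI
    dif_bd_i = b.imag - d.imag    # BI - DI
    sum_ac_i = a.imag + c.imag    # AI + CI
    dif_ac_i = a.imag - c.imag    # AI - CI
    a_out = complex(r4 + r6, sum_ac_i + sum_bd_i)
    b_out = complex(r4 - r6, sum_ac_i - sum_bd_i)
    c_out = complex(r5 + dif_bd_i, dif_ac_i + r7)
    d_out = complex(r5 - dif_bd_i, dif_ac_i - r7)
    return a_out, b_out, c_out, d_out


def two_radix2_stages(a, b, c, d):
    """Reference: stage 1 pairs (A,C) and (B,D); stage 2 uses 1 and -j."""
    p, q = a + c, a - c
    r, s = b + d, b - d
    return p + r, p - r, q + (-1j) * s, q - (-1j) * s


if __name__ == "__main__":
    x = (1 + 2j, -3 + 0.5j, 4 - 1j, 0.25 + 7j)   # arbitrary test data
    print(first_two_stages(*x))
    print(two_radix2_stages(*x))
```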

RADIX-4 BUTTERFLY LOOP

AR0 : AR + AI
AR1 : BR + BI
AR2 : CR + CI + CR' + CI'
AR3 : DR + DI
AR4 : AR' + AI'
AR5 : BR' + BI'
AR6 : DR' + DI'
AR7 : FIRST TWIDDLE FACTOR = 1

UIP
LDI
lOI
lOI
AI1DI
ADDI
AI1DI
LDI
LDI
LDI
LDI
LSH
LDI
SUBI

.

ADIlF
SU!F
ADIlF
SUBF

; LOAD PAGE POINTER
; IR0 = N/2 = OFFSET BETWEEN INPUTS
; AR7 POINTS TO TWIDDLE FACTOR 1
; AR0 POINTS TO AR
; AR1 POINTS TO BR
; AR2 POINTS TO CR
; AR3 POINTS TO DR
; AR4 POINTS TO AR'
; AR5 POINTS TO BR'
; AR6 POINTS TO DR'
; ADDRESS OFFSET
; IR0 = N/4 = NUMBER OF R4-BUTTERFLIES

..
..
..
..
..
..
..
..
.
.

"

RPTB
nPYF
SU!F
NPYF
ADIlF
ADIlF
SlF
SU!F
STF
SU!F
ADIlF
STF
SU!F
STF
ADIlF
NPYF
SU!F
ADIlF
STF
SU!F
STF
ADIlF
I'I'YF

StIIIF

ADIlF
STF
SUBF

BLKI
tAR2-, tAR7, RO
RO,RZ,R2
tARI ++, tAR7 , RI
R7, 116, Rl
RO,'ARG,R4
R4,tAR4++
RO, +ARG+t, R5
R2, +AR5+t
R7,Rb,R7
RI,.AR3,Rb
R7.+ARb+t
RI, .AR3+t. R7
R3, tAR2++
R6,R4,RO
tAR3++,tAR7.RI
R6,R4,Rl
Rl,.ARI,RO
RO,tAR4++
RI, tARI++,RI
Rl,+AR5++
Rl,R5,R2
++AR2, tAR7 ,RI
RI,R5,R3
RI,IARO,R2
R2, IM2++ URI)
RI, +ARG+t, Rb

; RO = CR , (SI' = R2 = R2 - RO)
: RI = IIR , ICI' = R3 = 116 t R7)
; R4 = AR t CR , (AI' = R4)
: R5 = AR - CR , (SI' = RZ)
; (DI' = R7 = 116 - R7J
; 116 = III t IIR , 101' = R7)
; R7 = III - IIR , (CI' = Rl)
; AR . . =RO=R4 tRo
; RI=DI, 1IR'=Rl=R4-Rb

; RO = SI t DI , AR' = RO
; RI = SI - DI , IIR' = Rl
;CR~=R2=R5+Rl

; RI = CI , Ill' = Rl = R5 - RI
; R2=AI tel, CR'-R2

; R6=AI-CI, IIl'-Rl

\0
0\

STF
AIIIF

"

IlJ(I

R3,tM6++
RO,R2,R4

LDI
; AI" • R4

II

R2 + RO

CLEAR PIPELIIE
9.IlF

~

::s
~

~

[

AIIIF
STF
STF
SlIIF
STF
STf

II

:::to
STIFE

~

~

LDI
LDI
SUBI
LDI

1IFIl2,IRI
IRO,1III5
1,l1li5
I, ARt.

LDI
LDI
LDI
LDI
ADDI
LDI

ISINTAII,AR7
O,AR4
lllfUf,ARO
ARO,AR2
IRO,ARO,AR3
AR3,ARl
1,AR6
-2,l1li5
1,l1li5
-l,IRO

lSH

t;:,

lSH
lSH

~

,:-.,

lSH

s::.

::s

ADDI

-l,IRI
l,IRl

LDF
LDF

tMl++,R6
+AR7,R7

lSH

s::...

0

;;.

~

....

"

~

IiM'PE

§

~

<:i

::s
~

~

~

"
"

N

C

"

C

"

a

"
,
,
,
,
,
,
,
,

; POINTER TO TWIDDLE FACTOR
; GROUP COUNTER
; UPPER REAL BUTTERFLY INPUT
; UPPER REAL BUTTERFLY OUTPUT
; LOWER REAL BUTTERFLY OUTPUT
; LOWER REAL BUTTERFLY INPUT
; DOUBLE GROUP COUNT
; HALF BUTTERFLY COUNT

"
"

"
lIFLYl
"

, HAlF STEP fRIll lI'PER TO UIER REAl.
PART

LDF
l'fYF
AIIIF
IIPYF
II'YF
ADIF
IIPYF
SlIIF
AIIIF
STf

t++AR7,R6
tMl--,H6,RI
t++AR4, RO, R3
tMI,R7,RO
tMI++,tM7-,RO
RO,RI,R3
tMI++,R7,Rl
R3,tMO,R2
tMO++ ,R3,RS
R2,_

, STEP FlO! OLD lIIA61NARY TO NEll __
YALlE

,

"
II

, R7-COS

,HI =BIICOS, R2=AR-TR
,R5=AR+TR,IIR'=R2

IFLYI
_1,&,115
115,_
RI,RO,R2
tMl,R7,RO
R2,tMO,R3
R2, tMO++, R4
R3,_
RO,II5,R3
tMl++,R6,RO
R3,tARO,R2
tMI++,R7,Rl
M,_
tMO++,R3,II5
R2,_

5UIF
STF
AIIIF
II'YF
SlIIF
II'YF
STf
AIlIF
STF

9.IlF

, IUIff lMD, IN.Y FOR AIlDR£SS lI'DATE

, ARO = lI'PER __ IIIT1BIflY IIfUf
, Ml = UIER __ BUTTERFlY IlIFUT
, AR2 = lI'PER __ IIIT1BIflY 001I'UT
, AR3 = UIER __ IIIT1BIflY 001I'UT
, TIE lIIA61NARY PMT HIlS TIl FCllIJi
, & = SIN
,RI-BltSIN
• !lIllY ADIF FOR CWIIER lFIlA1E
,RO=lIRtCOS
,R3=TR=RO+RI, RO=BRISIN

IIPT8

II'YF
STF
SlIIF
II'YF
AIlIF

,R5=BltSIN, (AR'sR5)
, (R2 = TI = RO - RIl
,RO'IRICOS, (R3'AI+TIl
, (R4 = AI - TI , II' = R3)
,R3=TR=RO+II5
,RO=IRISIN, R2=AR-TR
, Rl • BI t COS , (AI'

=R4)

,R5=AR+TR,IIR"R2

SIIITat MR TO IEXT GROll'

, ClEAR lS8

AIIIF
STF
5UIF
STf
MOP

"

FIll PIPELINE

;;.

, DI' • R7 • R6 - R7
, DI' s R7 , CI' = R3

II

S

~
<:i

TR = BR * COS + BI * SIN
TI = BR * SIN - BI * COS
AR'= AR + TR
AI'= AI - TI
BR'= AR - TR
BI'= AI + TI

, BI' • R2 = R2 - RO
, CI' • R3 = R6 + R7
, AI' =M , II' = R2

THIRD TIl LAST-2 STAGE

•

~

g

RO,R2,R2
R7,&,R3
M,_
R2,_
R7,R6,R7
R7,tAR6
R3,t-AR2

11115,RC

FIRST BUTTERFLY-TYPE:

II

:1
II

IIPYF
STF

"'VI'

IIPYF
SlIIF
II'YF
SlIIF
AIlIF
STf
LDI

Rl,RO,R2
R2,tMO,R3
R5,tAR2++
R2, +MO++lIRIl,R4
R3,_IIRll
tMI++IIRIl
tMI-,R7,RI
R4, tAR2++IIRIl
tMI,R6,RO
tMI++,+AR7++,RO
RO,RI,R3
tMI++,R6,HI
R3,tARO,R2
tMO++ ,R3,R5
R2,_
11115,RC

SECOND BUTTERFLY-TYPE:
TR = BI * COS - BR * SIN
TI = BI * SIN + BR * COS
AR'= AR + TR
AI'= AI - TI
BR'= AR - TR
,R2sTlsRO-Rl
,R3=AI+Tl,M'=R5
, M = AI - Tl , BI' = R3
, ADIJIESS lI'DATE
,Rl=IIICOS,AI'=H4
,RO=BRtSIN
,R3=TR=HI-RO,RO=lIRtCOS
, HI = BI t SIN, R2 • M - TR
,R5=AR+TR,BR'=R2

SlIF

~

BI'= AI + TI

~

rg'

RPTS

IFLY2

II'YF

_I,R7,R5
115,_
RI,RO,R2
IMI,RIo,RO
R2,tMO,R3
R2,_,R4
R3,_
RO,II5,R3
IMI++,R7,RO
R3,tMO,R2
tARl++,R6.Rl
R4,1M2++
tMO++,R3,R5
R2,_

ADIF
SUIF
ADIF

~

is

~

~

c;:,

(")

!"'l

II

..
..
..

IfLY2

'"...

..

~
l:I

II

~

:;


STF
AlII'
STF

, 115 • SI • COS, lilli' • 1151
, IR2 = TI a RO + RII
, RO = SR • SIN, IR3 = AI + TIl
, IR4 = AI - TI , SI' = R31

:1
,TR=R3=II5-RO
,RO-SR'COS,R2=IIII-TR
, RI • SI • SIN, IAI' = R41

I:

..

ADIF
AIIIF
STF
Cll'1
lIED
SIIIF

STF
LIF
STF
10'

, SI' • RI a AI - SI

RI,RO.R2
R2,tMO.R3

-

R5,~++

STF
STF
STF
STF
STF
STF
STF
STF

AIIIF
SUIF
ADIF
SIIIF

,R2=T1=RO+RI
,R3=AI+T1
,l1li'=115

R2,_IIRIl,R4
R3, tAR3++IIRIl
t++AR7,R7
R4, IM2++IIRII
IMIHIIRII

, R7 = COS
, AI' • R4
, lIRMC1t !ERE

_,'+ARI,R5
IMI,'IIRO,R4
IMI ++, tIIRO-, RIo
IMIH,_,R7

,l1li' =115= l1li +SI
, AI' • R4 = AI - SR
, SI' = RIo = Al + SR
,SR'=R7=IIII-SI

4. BUTTEllfLY, l1"li/4

ARIo,IIII4

, 00 FIllIIIINl 3 INSTROCTICtIS
, R4 = AI - TI , BI' = R3

,M'aRlo-IIII+SR
tMO,IMl,RIo
IMI ++, tMO++ ,R7
,SR'=R7=IIII-SR
, AI' • R4' AI + SI
tMO,IMI,R4
IMI++IIROI,_IIROI,R5 , SI' = 115 = AI - SI
, lilli' = R21
R2,iAR2++
, ISR' = R31
R3,IM3++
, IAI' • ROI
RO,IM2++
, lSI' • RII
RI,IM3++
RIo,IM2++
,1III"RIo
,SR'aR7
R7,IM3++
R4, IM2++IIROI
, AI' = R4
, SI' = 115
115, IM3++IIROI

3. IlUTTEllfLY' l1"li/4

,R5-IIII+TR, SR' =R2

AIlIF

..
..

LIF
LIF
SIIIF
STF
STF
STF

_I, _ , R3

,1III'=R3=IIII+SI
, RI = 0 IFIR IIftR LQ(p1
'-AR7,RI
,
RO = SR IFIR IIftR LQ(p1
IMI++,RO
IMI++IlROI, .AROt+,R2 ,SR'=R2=IIII-SI
, lilli' = 1151
R5,tAR2++
R7,_
, ISR' = R71
RIo,_
, lSI' • RIoI

END OF THIS BUTTERFLY GROll'
5. TO ". BUTTERFlYO
Cll'1

eNZ

4,IRO
STtfE

, ..uP OOT AFTER LDINI-3 STADE

SECIltl TO LAST STADE
LDI
LDI
ADDI
LDI
LDI
LDI
LDI

1III'UT,IIRO
1IRO,AR2
.IRO,IIRO,IIIII
1III1,AR3
1S1NTP2,11117
5,IRO

tFPER III'UT
tFPER OOTPUT
LIIER III'UT
LIIER 00TPI1T
POINTER TO TYIIlIl.£ FIICTIR
DISTAra BETlEEN 00 GROlI'S

IFG~.RC

-

I. IlUTTEllfLY' w"O
SUIF

ADIF

RPTB

1F2ENl

..
..
..

LIF
STF
LIF
STF

.

II'YF
AlII'
stlIf

tM7++,R1
R4,_
tAR7++,RIo
R2,tAR3++
_I,RIo,R5
R3,IM2++
RI,RO,R2
IMI,R7,RO
R2,tMO,R3
R2,tMO++IlROI,R4
R3, tAR3++IIROI
RO,R:i,R3
IMI++,RIo,RO
R3,tMO,R2
IMI++,R7,RI
R4, tAR2++IIROI

I:

FILL PIPELINE

~

SlIF

..

!lEAR PIPELIIE

S.
C

II'YF
AlII'
SIIIF
STF
SIIIF
II'YF
SIIIF
II'YF

:1

~

So

STF
AlII'

II

IMI++,_,RI

2, IlUTTEllflYO w"O

tMO,IMI,R2
IMI++,_,R3
tMO,IMl,RO

l1li' -R2=IIII+SR
SR'=R3=IIII-SR
AI' = RO = AI + BI

..
..

~F

STF
ADIF

STF

II'YF
SIIIF
II'YF
STF

, R7 = COS, IIAI' = R411
, RIo = SIN, ISR' • R21
, 115 a SI • SIN, lilli' = R31
, IR2 = TI = RO + RII
,RO=SR'COS,IR3=AI+TII
, IR4 - AI - TI , BI' = R31

;R3=TR=RO+R5
, RO - SR • SIN, R2 = l1li - TR
, RI = BI • COS , IAI' a R41

\0

00

..

AIIIF

STF
II'YF
STF

-

::

SIIiF

II'YF

:1

SIIiF
STF

::
~

:::

~

't:i

~

II

:!

..

::to

is
§

::

.s;,

II

~

AIIIF
II'YF
S1IIF
II'YF
STF
AIIIF

..

STF
II'YF
STF

SIIiF

_,R3,AS
112,_
_I,R6,AS
AS,tlIR2++
RI,RO,R2
tMI,R7,RO
II2,oARO,R3
R2,_,R4
R3,tM3++
RO,AS,R3
tMI++,R6,RO
R3,oARO,II2
tMI++UROI,R7,RI
R4,tlIR2++
_,R3,R3
112,_
_I,R7,AS
R3,iAR2++
RI,RO,R2
tMI,R6,RO
II2,oARO,R3
R2,tMO++UROI,R4
R3,_IIROI
RO,AS,R3
tARl++,R7,RO
R3,oARO,II2
tMI++,R6,RI
R4,tlIR2++(IROI
tMO++,R3,AS

1:::1

II

.!'"'l

:1

II'YF
ADDF
SU8F
STF
S1IIF
II'YF
S1IIF
II'YF
STF

1::1

..

STF

112,_

S-

::

II'YF
STF

~
l::I
:::
~
C

::

_1,R7,AS
AS,tlIR2++
RI,RO,R2 .
tMI,R6,RO
II2,oARO,R3
II2,tMO++,R4
R3,_
RO,AS,R3
-tllRl++,R7,RO
R3,oARO,II2
tMI++UROI,R6,RI
tMO++,R3,R3


..

AIIIF

AIIIF
II'YF

AIIIF

I:
II
8F2END

..

SU8F
STF
SU8F
II'YF
S1IIF
II'YF

-

; AS=M+TR ,IIR' =112
; AS

:=

BI .. SIN ,

(M~

n

ADDF

..

STF
S1IIF

STF
STF

LDI
LDI
LDI
LDI
LDI
lDI
LDI
LDI

; (R4 = AI - TI , BI' = R31
;R3=TR=RO+AS
;RO=IIR-SIN,R2=M-TR

; AI' = R4

1III'UT,ARO
1OOlI'UT, AR2
1III'lITP2,MI
I!IlIlPI,AR3
1S1NTP2,M7
IFFTSII,IRO
3,IRI
IFIl4I12,Rt

; UPPER INPUT
; REAL OUTPUT !!!
; LOWER INPUT
; IMAGINARY OUTPUT !!!
; POINTER TO TWIDDLE FACTORS
; BIT REVERSAL
; GROUP OFFSET

; RI - BI _ COS , (AI' = R41
FlU PIPElIt£
;R3=M+TR,IIR'=II2

I, WTTERFlYI 0"0
AIIIF
SU8F
SU8F
ADDF

; AS-BHCOS, (M' =R31
; (112 = TI = RO - RII
; RO = IIR _ SIN, (R3 = AI + TIl

;R3=TR=AS-RO
;RO=IIR-COS,II2=M-TR
; RI = BI _ SIN, (AI' = R41
; A5=AR+TR, SR' =R2

"

::

; AS = BI _ COS , (M' = ASI

U8F
LDF
LDF
AIIIF
STF
STF
STF

PTB

; IR4 = AI - TI , ylU = BI' = R31

LDF

::

STF

..

LDF

STF
HPYF
STF
A1IIIF

II'YF

; IIR' = 112 , AI' = R4

; R4 = AI - TI , BI' = R3

BFlEND

17 CYCLES IF FFT SIZE < 1024 DUE TO THE USE OF INTERNAL MEMORY FOR BIT
REVERSAL, 21 CYCLES IF FFT SIZE = 1024 DUE TO THE USE OF EXTERNAL MEMORY
FOR BIT REVERSAL

;R3=TR=RS-RO
;ROslIR_COS,II2=M-TR

;R2=T1=RO+RI
; R3=AI +TI, M' =R3

_I,oARO,R3
;1IR'=R3-M-BI
; RI = 0 (Fill 1Nt£R lOOPI
<-M7,RI
; RO = IIR (fill INNER lOOPl
tMI++,RO
tMI++IIRIl,_MO++,R2 ;M'=R2=M+BI
; (M' = R61
R6, +AR2++lIROlb
AS,_IIROlb
; (AI' = ASI
; IIIR' := R71
R7,tlIR2++I1ROlb

3, TO H, WTTERFLY:

; (112 = TI = RO + RII
; RO = IIR - SIN, IR3' AI + TIl

; RI = BI - SIN, R3 • M + TR

; AR~ =R6=AR+8R
oARO, tMI, R6
;1IR'=R7=M-1IR
tMI++,tMO++,R7
tMI, oARO,R4
; BI' = R4 =AI - BI
tMI++(IRII,_(IRII,AS ;AF=ASsAI+BI

2, BUTTERFlYI ."11/4

; IR4 = AI - TI , BI' = R31

::
112,_
R4,tlIR2++
RI,RO,R2
II2,oARO,R3
R3,tAR2++
R2,oARO,R4

R3,_
R4,tlIR2

LAST STAGE WITH INTEGRATED BIT REVERSAL
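The last stage writes its results through the bit-reversed addressing mode of the TMS320C30, in which the index register is added to the pointer with the carry propagating toward the less significant bits. The Python model below is a sketch of that pointer update only. It assumes a step equal to half the sequence length and indexes that start at zero; `reverse_carry_add` and `bit_reversed_sequence` are hypothetical helper names, not TI routines.

```python
def reverse_carry_add(index, step):
    """Add `step` (a power of two) to `index` with reverse carry:
    a carry out of one bit moves to the next LOWER bit."""
    bit = step
    while bit and (index & bit):
        index ^= bit          # bit was set: clear it and carry downward
        bit >>= 1
    return index | bit        # set the first clear bit (wraps to 0 at the end)


def bit_reversed_sequence(n):
    """Indexes visited when stepping n times by n/2 with reverse carry."""
    sequence, index = [], 0
    for _ in range(n):
        sequence.append(index)
        index = reverse_carry_add(index, n // 2)
    return sequence


if __name__ == "__main__":
    print(bit_reversed_sequence(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because the pointer update itself produces the permuted order, the stores of the last stage can place each result directly at its final position without a separate reordering pass.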

• R51

; (R2=T1 =RO-RII
; RO = IIR - COS·, (R3 = AI + TIl

ClEM PlPElIt£
STF
STF

::

"

SIIiF

I:

STF
AIIIF

A1IIIF

II'YF

tM7++,R7
R4,_IIROIB
tM7++,R6
R2,tlIR2++IIROIB
t+MI,R6,RS
R3,tlIR2++IIROIB
Rl,RO,R2
tMl,R7,RO
II2,oARO,R3
112, tARO++URII ,R4
R3,_IIROIB
RO,AS,R3
tMl++,R6,RO

; R7 = COS , IIBI' = R411
; R6 = SIN, 1M' = 1121
; RS • BI _ SIN, IIIR' • R31
; IR2=TI =RO+RIl
; RO = IIR t COS , IAI' = R3 - AI - TIl
; IBI' =R4=AI +TI,AI' =R31
;R3=TR=RO+AS
; RO - IIR t SIN, M' = 112 = M + TR

:.
;:
~

ADIF

"

II'YF
STF

~
;:

"

STF

is
:=:.

"

II'YF
STF
SUBF

~

C
;:

~

.~
I::l

r'l

.""l

"

II'YF

"
"

"


STF
STF
ADIF

SUBF
STF
ADIF

STF
STF

R2, tM2++IIROIB
R4, tIiR3++t lROIB
RI,RO,R2
R2,tMO,RJ
RJ,tM2
R2,tMO,R4
RJ, tM3++( lROIB
R4,tAR3

ENDrfFFT
END:

s..
"-

~

ADrf
II'YF
SUllF

ttMI,R7,R5
RJ, tAR2++( lROIB
RI,RO,R2
tMI,RO,RO
R2,tMO,RJ
R2, *ARO++nRll ,R4
RJ,tM3++IIROIB
RO,R5,RJ
tMI++,R7,RO
RJ,tMO,R2
tMl++t IRll, RO, RI
RJ, tARO++ , RJ

, RI = BI •

cos ,

(BI' • R41

, BR' = RJ = All - 1R , All' = R2
, R5 = BIt COS, (BR' = RJI
, (R2 = TI = RO - RII
, RO = BR * SIN, (AI' = RJ = Al - TIl
, (BI' = R4 = Al • Tl , AI' = RJI
,RJ=1R=RO-R5
, RO = BR * COS , All' = R2 = All + 1R
, RI = BI * SIN, BR' = RJ = All - TR

Q,EAII PIPEWE

"

C
;:

ADIF

STF
SUBF
II'YF

lIFLEND

~

§

SUBF

"
::

~

5..
0
.,"-s..

SlJIIF

RJ,tMO,R2
tMI++IIRI',R7,Rl
R4, tM3++IIROIB
RJ, tMO++, RJ
R2, tAR2++IIROIB

to'
to'
to'

NIP
t
SELF

BR
.end

SELF

, All' = R2, (BI' = R41
,R2=TI=RO+RI
, AI' =RJ =Al - Tl , BR'
, BI'

=R4 =Al

, BI' = R4

=RJ

+ Tl , AI' = RJ

8

APPENDIX A5

.flQat

***************************************************************************
*
*   TITLE: TWID1KBR.ASM
*
*   TABLE WITH TWIDDLE FACTORS FOR A FFT UP TO A LENGTH OF 1024 COMPLEX
*   POINTS.
*   FILE TO BE LINKED WITH THE SOURCE CODE: R2DIT.ASM OR R2DITB.ASM
*
*   WRITTEN BY: RAIMUND MEYER AND KARL SCHWARZ
*               LEHRSTUHL FUER NACHRICHTENTECHNIK
*               UNIVERSITAET ERLANGEN-NUERNBERG
*
*   LENGTH OF TWIDDLE FACTOR TABLE: 512 REAL VALUES (=1024 FFT)
*
***************************************************************************
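The layout of the table can be illustrated by generating it. The following Python sketch is an assumption-based reconstruction of the layout the program headers describe (N/2 real values, stored as cosine and sine pairs for WN(0) to WN(N/4-1) in bit-reversed order); it is not the tool that produced the original file, and the function names are hypothetical.

```python
import math


def bit_reverse(value, nbits):
    """Reverse the lowest `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def twiddle_table(n):
    """Return N/2 reals: [cos, sin] pairs of W_N^k, k in bit-reversed order.

    `n` is assumed to be a power of two, at least 4.
    """
    entries = n // 4
    nbits = entries.bit_length() - 1
    table = []
    for position in range(entries):
        k = bit_reverse(position, nbits)
        angle = 2.0 * math.pi * k / n
        table.extend([math.cos(angle), math.sin(angle)])
    return table


if __name__ == "__main__":
    for address, value in enumerate(twiddle_table(32)):
        print(address, round(value, 3))
```

For N = 32 this gives 16 values; for N = 1024 it gives 512 values, and the first entries are 1, 0, then cos and sin of pi/4, then cos and sin of pi/8.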

.,Iobol
.globol
.,Iobol
.gl.bal
.global
.gl.bal


0

nh.lb
OVlert
DuMel

.set
ant
.Sft

.set
.Sft

...

.Stt

tMalb

...t

I~

.

.set

8
4
5

.OYlert
fAacht.1

.Sft

.Stt

;012
; n/4
, 0/8

, MItIIER IF STAGES

sine
.fI ..t

.f1.at
.fI ..t

.flolt

.floit
.float

~.13588%4915452e-()OJ

9. 99981175282~OI.-OO1


• data

Q
c

,m-LElllTHo

* ONLY THE FIRST 16 VALUES OF THE TABLE ARE NEEDED

.fl ..t
.fl ..t
.fI ..t

C

1024
512
256
128
10

7.027547444572~-OO1

.float
.float

=
==
~>
~~
=-1

sint
0

ohalb
oviert
Mehte1

** ANOTHER EXAMPLE OF FFT-LENGTH N = 32:

N

f:j

14.07.89

7.114321957452160-001

.float

I. ()()()()()()OO(+OOO
O.()()()()()()OO(+ooo

7.071~781181i548t-001
7. 071~781181i548t-001
9.23879532511287.-001
3.826834323650900-001
3.826834323650900-001
9.23879532511287.-001
9.80785280403230e-001

s. ~
!~
• 0
~
~

.,

=

Appendix B. Radix-4 Complex FFT

I"~

Ii

An Implementation of FFT, DCT, and Other Transforms on the TMS320C30


.BSS
.BSS

HfHHHH-IHHfHHUfHHfHftHfffHfUfHHHffHflfHHHHfIHHfH+H

-

APPENDIX B1
GENERIC PROGRAM TO DO A LOOPED-CODE RADIX-4 FFT COMPUTATION IN THE
TMS320C30.

.SSS

-

IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE TWO MIDDLE
BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED DURING STORAGE. NOTE
THIS DIFFERENCE WHEN COMPARING WITH THE PROGRAM IN P. 117 OF THE BURRUS
AND PARKS BOOK.


AUTHOR: PANOS E. PAPAMICHALIS
TEXAS INSTRUMENTS

~

(')

.!""l
I:>

;:s

AUGUST 23, 1987

tHHHHfffHHIHfHfUHI-HHH-tHHfHftHHft+HfHfUffHH+fHH+HH1f

-

INP

I:>..

.GLOR
.!l.OR
.GLOR
.!l.OBL

"SINE

:
:
:
:

.USECT

'IN" ,1024

: I£IOiY WITH INPUT DATA

FFT
N

ENTRY POINT FOR EXECUTION
FFT SIZE
LOG4(N)
ADDRESS OF SINE TABLE

• TEXT

<:>
So
~

INITlAlIZE

.....

LOCATl~

FFT

: STARTlNO

• SPACE

100

: RESERVE 100 W!JlDS Fill IlECTIllS, ETC.

.1mD
.1Ql[)

$+2
FFTSII
N

C

.1Ql[)
.1Ql[)

"SINE

So
~

.1Ql[)

INP

.BSS
.BSS
.BSS
.BSS
.BSS
.BSS
.BSS

FFTSI1,I
LOGFfT, I
SINTAB,I
INPUT, I
STAGE, I
RPTCNT, I
IEINDX,I


TEMP
STORE

.1Ql[)

UP
LDI
LDI
LDI

TEIf'
ITEIlP,ARO
@STIFE,ARt
tAROH,RO

STl
LDI
STl
LDI
STl
LDI
ST!

RO, fARt++

LDP
LDI
LDI
LDI
LDI
STl

FFTSlZ
IFFTSlZ.RO
IFFTSIZ,IRO
IFFTSlZ,IRI
O,AR7
AR7,@STAGE

LSH
LSH
LDI
STl

I,IRO
-2,IRI
I,AR7
AR7, IRFTCNT

LSH
STl
ADDI
STl
SUBI

-2,RO
AR7,IIEINDX
2,RO
RO,M
2,RO
I,RO

LSH

OF 11£ PROGRA/I

.1Ql[)


: XFER DATA FR01 IJE I£IOiY TO 11£

fAROH,RO

~~

RO, *AR1++
fARO++,RO

tI>
~tI>

=

==

RO, tAR1++

tARO,RO
RO,-ARt

: CfftWID TO LOAD DATA PAGE POINTER'

::!.

OOn
(M~
N

Q
=(JQ
(i"'l
(MSI)

; @STAGE HOLDS THE CURRENT STAGE NUMBER
; IR0=2*N1 (BECAUSE OF REAL/IMAG)
; IR1=N/4, POINTER FOR SIN/COS TABLE

=51

....
Q

0Q

: INlTlALlZE Rfl'£AT Illt4TER OF FIRST
LIXP

SI)

: INlTlAllZE IE IHlEX

Q
Q

t""I

"C
tI>
Q.
I
(i

: JT=RO/2+2

: RO=N2

OUTER LIXP

Q

Q.

L()(J>:
LDI
ADDI
ADDI
ADDI
LDI
SUBI

; BEGINNING OF TEMP STORAGE AREA

; FFT SIZE
; LOG4(FFTSIZ)
; SINE/COSINE TABLE BASE
; AREA WITH INPUT DATA TO PROCESS
; FFT STAGE #
; REPEAT COUNTER
; IE INDEX FOR SINE/COSINE

=
e:

: INlTlAllZE DATA lOCATlIH>
: CllWIH!I TO LOAD DATA PAIf: POINTER

OTHER

THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA SECTION. THIS
DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE GENERIC NATURE OF THE
PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF THE FFT N AND LOG4(N) ARE
DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED DURING LINKING.

~
"G

"C
"C

FFT:

THE PROGRAM IS TAKEN FROM THE BURRUS AND PARKS BOOK, P. 117. THE (COMPLEX)
DATA RESIDE IN INTERNAL MEMORY, AND THE COMPUTATION IS DONE IN-PLACE.

::...
;:s

>

; SECOND-LOOP COUNT
; JT COUNTER IN PROGRAM, P. 117
; IA1 INDEX IN PROGRAM, P. 117

LPCNT,1
JT,1
IAI,1

FIST

@INPUT,ARO
RO,ARO,ARI
RO,ARI,AR2
RO,AR2,ARJ
IRPTCNT,RC
I,Re

; AR0 POINTS TO X(I)
; AR1 POINTS TO X(I1)
; AR2 POINTS TO X(I2)
; AR3 POINTS TO X(I3)
; RC SHOULD BE ONE LESS THAN DESIRED #

UJ(J'

RPTB
ADOF
ADDF
ADDF

BLKI
I+ARO,f+AR2,Rl
HARJ, '+ARl, R3
Rl,RI,Rb

RI=VII j+Y( 12)
R3=Y!1 1) tV ( 13)
Rb=RI+R3


SUBF

~

SUIF

~

[

LDF

"

~

is

g'

"

~

"

~

tl

t:l

5.
S?
::-

.,

"
Bl.K1

"

'10

~

511

c::>

C

0

IN

RI=RH!3
R5=X1121
R7=VIIlI
R3=XIIlI+l 1131
; RI=I(J)+X

~
IV

SUIF
SUBF

lOl
lOl
ADDI
AIIOI
SUBI
ADDI
SUBI

R4=V(J)-Y1121
Y(J)=RI~

IF THIS IS TI£ LAST STAII, YOU ME OOIE

§

~

UF
ADDF
AIIIF
S1F
AIIIF
SlIBF
STF

SlIBF

::

(J

.!'"'l

*+M2,I+MO,R4
R6,HARO
RJ,Rl
tAR2,R5
<+ARl,R7
tAR3, tARl ,RJ
R5,tARO,RI
Rt,ltARt
RJ,RI,R6
R5,tARO,R2
R6, IARO++ IlRO I
RJ,Rl
tAR3, tARl ,R6
R7 , 1+AR3, RJ
Rl,tARl++IlROI
R6,R4,R5
R6,R4
R5,ttAR2
R4,1+AR3
RJ,R2,R5
RJ,R2
R5, tAR2++UROI
R2, tAR3++( lROI

S1F

; INCREMENT INNER LOOP COUNTER

; IA1=IA1+IE
; (X(I),Y(I)) POINTER

; (X(I1),Y(I1)) POINTER
; (X(I2),Y(I2)) POINTER
; (X(I3),Y(I3)) POINTER
; RC SHOULD BE ONE LESS THAN DESIRED #
; IF LPCNT=JT, GO TO
SPECIAL BUTTERFLY

"
1IlJ(2

ADDF
If'YF
STF
If'YF
SlIBF
If'YF
STF
""VF
ADDF
IWVF
STF
II"/F

SUBF
IIPYF
STF
II"IF
ADDF
STF

IlK2

1+AR2,1+MO,R3
*+AR3,'+Ml,R5
R5,RJ,R6
1-tAR2,t+ARO,R4

R5,RJ
tAR2, tARO,Rl
tAR3, tARl, R5
RJ,++AR5I1RII,R6
R6,ttARO
R5,Rl,R7
tAR2, tARO,R2
R5,RI
Rl,tARS,R7
R7, IARO++ IIRO I
R7,R6
f+AR3, HARI, R5
Rl, t+AR5IIRII,R7
R6,ttARl
RJ,+AR5,R6
R7'R6
R5,R2,Rl
R5,R2
tAR3, tARl,R5
R5,R4,RJ
R5,R4
RJ, ++AR411RII ,R6
R6,tARl++UROI
Rl, tAR4,R7
R7,Rb
Rl,++AR4URII,Rb
R6,ttAR2
RJ,fAR4.R7
R7,Rb
R4, tfARM IRII ,Rb
Rb, fAR2++IIROI
R2,fARb,R7
R7,Rb
R2, t+AR6IIR11 ,Rb
R6,++AR3
R4,fAR6,R7
R7,R6
R6, tAR3++ II RO I

;
;
;
;

R3=VUI+Y!I21
R5=Y!l1I+Y1131
R6=R3+R5
R4=Y!II-V( 121

; R3=R3-R5

;
;
;
;
;
;
;
;
;
;
;
;
;
;

Rl=XIII+11l21
R5=1(1lI+11131
R/o=R3IC02
V(J)=R3+R5
R7=Rl+R5
R2=XIII-lIl2I
Rl=Rl-fi5
R7=RltS12
11l1=Rl+R5
R/o=R3IC02-RltSI2
R5=VI 11 1-Y1131
R7=RltC02
YllII=R3iC02-i1ltSI2
R6=R3tSI2

;R6=Rl~12

;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;

RI=R2+R5
R2=R2-fi5
R5=XIIlI-lIl3I
R3=R4-fi5
R4=R4+R5
R6=R3tCOI
XIIlI=RltC02+R3tSI2
R7=RltSll
R6=R3fCOl-illtSI1
R6=RlfCOl
YII2I=R3tCOI-RltSl1
R7=R3tS11
R6=RtlCOt +R3tSI1
R6=R4fC03
11l21=RltC01+R3tS1l
R7=R2tSI3
R6=R4tC03-i12tS13
R6=R2tC03
VII3I=R4tC03-R2tS13
R7=R4tSI3
R6=R2tC03+R4'SI3
XII3I=R2tC03+R4'SI3

~

Cll'1

II
IIR

IJ'CIIT,RO
III.OP
COO"

STI
LDI

; LOOP BACK TO THE INNER LOOP

LSH
ANlI
Sf!
SUBI

SPECIAL BUTTERFLY FOR W=J
SI'Cl

LDI
LSH
II1II1

IRI,M4
-I,M4
ISINTAB,AR4

.SH
; POINT TO SIN(45)
; CREATE COSINE INDEX AR4=CO21

IIR

AR6,tlEIIm
RO,IRO
-3,RO
2,RO
RO,M
2,RO
I,RO
LOOP

; N1=N2
; JT=N2/2+2
; N2=N2/4
; NEXT FFT STAGE

STORE RESULT OUT USING BIT-REVERSED ADDRESSING

RPTB

~

::

AIIF

stIF

i"

AIIF

~

r

stIF
IIIIIIF
SlIIF
IIIIIIF

~

S

AIIF

g.
~

~

t:l

~

stIF

"

"

SIIIF
AIIF

~

stIF
IPYF

~

:to

"...

;;'l

~

"

::

I::I

~

§

AIIF
STF
STF
stIF
SlIIF
STF
STF
AIIF
stIF

IlK3

::

So

AIIF
IPYF
STF

stIF
IPYF
SlF
AIIF
IPYF
STF
STF

IlK3
tM2,_,RI
tM2,_,R2
<+M2,Ht1RO;Rl
<+M2,_,R4
IM3,tMI,R5
RI,R5,R6
R5,RI
_,_I,RS
R5,Rl,R7
R5,Rl
Rl,_
RI,-"!IROI
IM3,tMI,RI
_,_I,Rl
R6,.tMl
R7,tMl++!IROI
Rl,R2,RS
R2,Rl,R2
RI,R4,Rl
RI,R4
R5,Rl,RI
_,RI
R5,Rl
_,Rl
RI,.tM2
M,R2,RI
tM4,RI
Rl.tM2++URO)
M,R2
_,R2
RI,_
R2, tM3++llRO)

, RI-XIII+Xml
, R20Xm-11I21
, Rl-Vm+VII2I
, R4=YCI)-YCI2)
, R5=IUll+X1131
,R6oII5-II1
, RIzR\+R5
, R5=V(l1I+YCI31
,R7_

,-

,vmzR3+R5
, IHlO«I+R5
, RI-lIIU-X1131
, R3=YCIU-YCl31
, V(l1lot15-RI
,IUlI_
, R5-R2+R3
, R2=--R2tR3

~!!

, ft3zRHIl
,_4+111
,RI_
, RIO«I1C021

,-

, R3=R3IC02I

,
I
i
,

Y!I21-IRl-R5)1C021
Rl=R2--R4 III
RIO«IIC02I
11l2)=IIG+fIS)1C021
,R2-R2+R4 ! !!
, R2oR2IC02I
t V(3)=-tR4-it21tC021 !!'
, 1!I3)=(R4+f12)1C021 ".

".

BPO

IJ'CIIT,RO
III.OP

, LOOP BM:K TO TI£ IIIIElI LOOP

N

LDI
LDI
LSH

IfIPTCNT,AR7
IIEINDX,AR6
2,AR7

, IIU£IEIIT R£PEAT CIUITER FIll NEXT

C

STI
LSH

AR7 ,IRPTOO
2,AR6

, IE=40IE

Cll'1

"

~
~

C

a

COO"

TINE

00'

"

mRY
"

SELf

LDI
SlIIII
LDI
LDI
LDI
LOP
LDI

IFFTSIZ,RC
I,RC
IFFTSIZ ,IRO
2,IR!
IINI'IIT,ARO
STIllE
ISTIIIE,ARI

RPTB

BlTRY

LIF

_tIl,RO
-"!IROIB,RI
RO,_IUI
RI,tARl++!IRll

LDF
STF
STF
IIR
,00

SELF

; RC=N
; RC SHOULD BE ONE LESS THAN DESIRED #
; IR0=SIZE OF FFT=N

; BRANCH TO ITSELF AT THE END

>


_ I I 82

NIIIIE' ffU -

iat

fft_4IN.
N
ft
1I0&! . .t.
ilt

int

~
g
So

"

~

~

Q
C

~

ft, DATAl
FFT SIZE._
NUMBER OF STAGES = LOG4(N)
ARRAY WITH INPUT AND OUTPUT DATA

.,lobal
•••tl
.1I00t
.fl ..t
.1I01t

A113

.GUIl.
.1LOIl

..ffU
_SII£

•ISS

FfTSll,1
LOGFfT,I
IIfUT,I

~

value1 = sin(0*2*pi/N)

value2 = sin(1*2*pi/N)

value(5N/4) = sin((5*N/4-1)*2*pi/N)

THE VALUES value1, value2, ETC., ARE THE SINE WAVE VALUES. FOR AN
N-POINT FFT, THERE ARE N+N/4 VALUES FOR A FULL AND A QUARTER PERIOD
OF THE SINE WAVE. IN THIS WAY, A FULL SINE AND COSINE PERIOD ARE
AVAILABLE (SUPERIMPOSED).
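
The table layout described above can be sketched in Python, used here only for illustration since the listing itself is TMS320C30 assembly. The names `make_sine_table` and `twiddle` are hypothetical. The sketch assumes the cosine is read from the same table a quarter period (N/4 entries) past the sine, which is what the IR1=N/4 comment in the listing suggests.

```python
import math

def make_sine_table(n):
    """Return n + n//4 samples of sin(2*pi*k/n): a full period plus a quarter."""
    return [math.sin(2.0 * math.pi * k / n) for k in range(n + n // 4)]

def twiddle(table, n, k):
    """Return (cos, sin) of 2*pi*k/n read from the single shared table.

    Since cos(x) = sin(x + pi/2), the cosine sits n//4 entries after the sine.
    """
    return table[k + n // 4], table[k]

if __name__ == "__main__":
    n = 16
    table = make_sine_table(n)
    c, s = twiddle(table, n, 3)
    print(len(table), c, s)
```

Storing one superimposed table instead of separate sine and cosine tables costs N/4 extra words rather than N.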

.lIIord

§Ie

_SINE

INITIALIZE C FUNCTION
PUSH

LDI
PUSH
PUSH
PUSHF
PUSHF
PUSH

FP
SP,FP

LDI
STI
LDI
STI
LDI
STI

o-fPI21,RO
RO,IfFTSIZ
t-FP131,RO
RO,II.OIJFT
t-FP141,RO
RO,IIII'III

PUSH

~

DEDICATED REGISTERS

.855

.BSS
.855
• ISS

LDI
LDI
LDI
LOI

STAGE,I
III'TOO,I
IEINIlI,I
LPCNT,I
JT,I
IAI,I

REGISTERS USED: R0, R1, R2, R3, R4, R5, R6, R7, AR0, AR1, AR2, AR3, AR4,
AR5, AR6, AR7, IR0, IR1, RS, RE, RC

STI

IfFTSIZ,AIl
IfFTSI Z,IRO
IfFTSIZ ,IRI
O,AII7
AIl7,ISTAGE

AUTHOR: PANOS E. PAPAMICHALIS
TEXAS INSTRUMENTS

LSH
LSH
LDI
STI

1,1110
-2,IRI
I,AII7
A117,IIPTtNT

OCTOBER 13, 1987

........ttHHffHt.II.IIIII •• II •• IIIIIIIIIIII •••• IHflHHtHtHHfHflHlH

I

~

J:.

n
; MOVE ARGUMENTS TO LOCATIONS MATCHING
; THE NAMES IN THE PROGRAM

i

"e.
~

INITIALIZE FFT ROUTINE
.ISS

S·

==
==

AS

PUSH

PUSH

Q.~
;

R4

R6
R7
AR4
MIS
AR6
M7

.ISS

STACK STRUCTURE UPON THE CALL:
+-------------+
-FP(4)   DATA
-FP(3)   M
-FP(2)   N
-FP(1)   RETURN ADDR
-FP(0)   OLD FP
+-------------+

~

~

_sine
veluel

1

; ENTRY POINT FOR EXECUTION
; ADDRESS OF SINE TABLE

•TEXT
f
SINTAB

_lIt_4'

IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE TWO
MIDDLE BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED DURING
STORAGE. NOTE THIS DIFFERENCE WHEN COMPARING WITH THE PROGRAM ON
P. 117. THE COMPUTATION IS DONE IN-PLACE, AND THE ORIGINAL DATA IS
DESTROYED. BIT REVERSAL IS IMPLEMENTED AT THE END OF THE FUNCTION.
IF THIS IS NOT NECESSARY, THIS PART CAN BE COMMENTED OUT. THE
SINE/COSINE TABLE FOR THE TWIDDLE FACTORS IS EXPECTED TO BE SUPPLIED
DURING LINK TIME, AND IT SHOULD HAVE THE FOLLOWING FORMAT:

_sine

.SET

,ISS
,ISS

DESCRIPTION:
GENERIC FUNCTION TO DO A RADIX-4 FFT COMPUTATION ON THE TMS320C30.
THE DATA ARRAY IS 2*N-LONG, WITH REAL AND IMAGINARY VALUES ALTERNATING.
THE PROGRAM IS BASED ON THE FORTRAN PROGRAM IN THE BURRUS AND PARKS
BOOK, P. 117.
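
As an illustration of the technique (not a transcription of the assembly), the following Python sketch performs an in-place radix-4 decimation-in-frequency FFT with the two middle butterfly outputs interchanged on storage, so that the result lands in bit-reversed order. It assumes a forward transform with exp(-j...) twiddles and a length that is a power of 4, and it uses a list of Python complex numbers instead of the 2*N-long alternating real/imaginary array of the listing. All function names are hypothetical, and twiddles are computed directly rather than read from a table.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev

def fft_radix4_in_place(x):
    """Radix-4 DIF FFT; len(x) must be a power of 4. Output is bit-reversed."""
    n = len(x)
    size = n
    while size >= 4:
        quarter = size // 4
        for start in range(0, n, size):
            for j in range(quarter):
                i0 = start + j
                i1, i2, i3 = i0 + quarter, i0 + 2 * quarter, i0 + 3 * quarter
                a, b, c, d = x[i0], x[i1], x[i2], x[i3]
                t0, t1 = a + c, a - c
                t2, t3 = b + d, -1j * (b - d)
                w = cmath.exp(-2j * cmath.pi * j / size)
                x[i0] = t0 + t2
                x[i1] = (t0 - t2) * w ** 2   # branch 2 stored in slot 1
                x[i2] = (t1 + t3) * w        # branch 1 stored in slot 2
                x[i3] = (t1 - t3) * w ** 3
        size //= 4
    return x

def to_natural_order(x):
    """Undo the bit-reversed ordering left by fft_radix4_in_place."""
    bits = len(x).bit_length() - 1
    return [x[bit_reverse(k, bits)] for k in range(len(x))]

def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Without the interchange, a radix-4 algorithm leaves its output in base-4 digit-reversed order; swapping branches 1 and 2 reverses the two bits inside each base-4 digit, which turns digit reversal into ordinary bit reversal.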

~

Q

RADIX-4 COMPLEX FFT TO BE CALLED AS A C FUNCTION.

SYNOPSIS:

I:i

~

FP

; FFT STAGE #
; REPEAT COUNTER
; IE INDEX FOR SINE/COSINE
; SECOND-LOOP COUNT
; JT COUNTER IN PROGRAM, P. 117
; IA1 INDEX IN PROGRAM, P. 117

3
S'

==
I'D
n

e..
;-

; @STAGE HOLDS THE CURRENT STAGE
NUMBER
;
; IR0=2*N1 (BECAUSE OF REAL/IMAG)
; IR1=N/4, POINTER FOR SIN/COS TABLE
; INITIALIZE REPEAT COUNTER OF FIRST
LOOP

Q.

&J

=

n

~

LSIt
STI
AlIII
STI
SIIII
LSIt

-2,RO
1iR7,IlfllllX
2,RO
RO,IJT
2,RO
I,RO

; INITIALIZE IE INDEX

,-

I1AIN INNER

; JT=R0/2+2

OUTER LOOP

LDI
ADDI
AlIII
ADDI
LDI
SIIII

~

r
~

FIST


IlJ(I

"

~
~

C

Q
C

LDF
LDF
AIIIF
AIIIF
STF
AIIIF
stIIF
STF
stIIF
stIIF
stIIF
STF
stIIF

STF
STF

stIIF

..

•
;
;
;

ARO
MI
M2
IIR3

; RC

POINTS
POINTS
POINTS
POINTS

SIIJW)

TO
TO
TO
TO

XIlI
11111
X1121
11131

1£ lIE LESS THAll IESIRED •

l.O(I'

RPTB
AIIIF
AIIIF
AIIIF
stIIF
STF
stIIF

~

t1If'UT,ARO
RO,ARG,MI
RO,AIII,M2
RO,M2,1IR3
IIIPTCNT,IIC
I,RC

ADIF
STF
STF

IlJ(I
_,t+M2,RI
O+IIR3, t+AIII,Rl
R3,RI,1It.
<+M2,_,R4

1It.,_
R3,RI
+M2,RS
_I,R7
OIIR3,+ARI,R3
RS,_,RI
RI,t+AR1
Rl,RI,IIt.
RS,_,R2
R6,_UROI
Rl,RI
OIIR3,+ARI,Rt.
R7,_,Rl
RI, +ARI++UROI
Rt.,R4,RS
Rt.,R4
RS,ttAR2
R4,_
Rl,R2,RS
Rl,R2
R5,~++(lRO)

H2,_UROI

IF THIS IS THE LAST STAGE,
LDI
ADDI
CllPI
BZD
STI

ISTAGE,AII7
1,1iR7
1LDGFFT,1iR7
END
A117;ISTAGE

;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;

RI=YIlI+Y(121
R3=Y(II I+YIl31
R6=R1+R3
R4=YIlI-VII21
VII I=RI +R3
RI=RH3
R5=11I21
R7=Y(II1
R3=X1ll1+11l31
Rl:zXtI )+XU2}
YIIII=RI-R3
R6=R1+R3
R2=XUHU21
11II=R1+R3
RI=RI-R3
R6=IIIIH1131
-R3=YIllI-YU31
XUII=RI-R3
R4=R4tR6
V1121=R4-R6
V1131=R4tR6
R5=R2-R3 '"
R2aR2tR3 ~ ~ !
X(2)=R2-R3 ~ ~ ~
X(J3)=R2+R3 !I!

2,ARt.
1LPCNT,ARt.
ILPCNT,ARO
IIAI,IiR7
IIEINlI,AII7
1IIf'UT,ARG
A117,t1AI
RO,ARO,AIII
A116,ILftNT
RO,AIII,M2
RO,M2,1IR3
IRPTCNT,RC
I,RC
1JT,ARt.
SPCl
IIAI,IiR7
tlAI,AR4
ISINTAB,AII4
A114, A117 , AR5
1,AR5
A117,AR5,AII6
1,ARt.

LDI
LDI
ADDI
ADDI
SUBI
AlIII
SUBI

, INIT IAI llIIE.I
; INIT

l.O(I'

CIUITER FOR INNER

l.O(I'

RPTB
ADIF
AIIIF
ADIF
stIIF
SUBF

'"

..

STAGE

; IA1=IA1+IE
; (X(I),Y(I)) POINTER
; (X(I1),Y(I1)) POINTER
; (X(I2),Y(I2)) POINTER
; (X(I3),Y(I3)) POINTER
; RC SHOULD BE ONE LESS THAN DESIRED #
; IF LPCNT=JT, GO TO
SPECIAL BUTTERFLY
; CREATE COSINE INDEX AR4
; IA2=IA1+IA1-1
; IA3=IA2+IA1-1

IlJ(2

ADIF
II'YF

HAR2,"ARO,R3
O+IIR3, .+AIII,RS
RS,Rl,IIt.
<+M2,_,R4
RS,Rl
tM2,_,RI
tIIR3,+ARI,RS
Rl, ++AR5URII, lit.

STF

1It.,_

ADIF

RS,RI,R7
tM2, _,H2
RS,RI
RI,_,R7
R7,_UROI
R7,R6
ttllR3,t+AIII,R5
RI,++AR5I1RlI,R7
R6,t+AR1
Rl,_,R6
R7,R6

SUBF

.
m

; IM:IISIEIIT INNER LID' aJIIjlBl

SEWID LID'

.

vru ARE BIlE

, CIHIIT

LDI
AlIII
LDI
LDI
ADDI
ADDI
STI
ADDI
STl
ADDI
ADDI
LDI
SUBI
CllPI

BZD

, R5=R4-R6

,
;
;
,
;
;
;

1,1iR7
1iR7,IIAI
2,1iR7
1iR7,IJ'CNT

11UP;

l.O(I"

::...
::s

l.O(I'

LDI
STI
LDI
STI

SUBF
II'YF
STF

SUBF
SUBF
II'YF
STF
II'YF
ADDF

;
;
;
,

R3=YIlI+VII21
R5=VllII+YII3I
R6=R3+R5
R4=YIlI-YI121

; R3=R3-R5
, RI=IIlI+XU21
; R5=XIllI+1II31
,
;
;
;
,
,
,
,
,
,
,
,
,

R6=R3tC02
YIII=R3+R5
R7=RI+R5
R2=XIIHII21
RI=RI-ll5
R7=RltS12
11lI=R1+R5
R6=R3tC02-RltSI2
R5=YIllI-Y1131
R7=RI.COZ
YIIII=ft3tCOMltSI2
R6=R3tS12
R6=R1 tC02+R3tS12

::t...
;:s

ADIF
SUII'
SUII'

1IIJ(2

S-

...'"

SlJIF
ADDF
I'PYF
STF
I'PYF
SlJIF
If'YF
STF
II'YF
ADIF
I'PYF
STF
If'YF
SlJIF
If'YF
STF
If'YF
ADDF
STF

Cl'Pi
IlP
IIR

~

Ir.i,Rl,RI
Ir.i,Rl
tAR3,tARI,Ir.i
Ir.i,R4,R3
Ir.i,R4
R3,ttAR411RII,Rb
Rb,tARI++lIROI
RI,tAR4,R7
R7,Rb
RI,ttAR41lRII,Rb
Rb,ft11R2
R3,tAR4,R7
R7,Rb
R4,ttARbllRlI,Rb
Rb, fAR2++IIROI
Rl,tARb,R7
R7,Rb
Rl,++ARbIlRII,Rb
R6,f+AR3

R4,tARb,R7
R7,Rb
Rb, tAR3++1 lROI

, RI=R2+1r.i
, R2=R2-RS
, R5=XIlIH!l31
, R3=R4-RS
, R4=R4+1r.i
, Rb=R3tCOl
, XIlII=RIfC02+R3fSI2
, R7=RlfSll
, Ri>=R3fCOI-RI fSll
, Rb=RltCOl
, Y!I21=R3fCOI-RlfSll
, R7=R3fSll
, Rb=RI tCOl +R3fSll
, Rb=R4tC03
, XI!2I=RIfCOI+R3fSll
, R7=R2fSI3
, Rb=R4fCQ3-RlfSI3
, Rb=R2tC03
, YI131=R4tC03-R2tSI3
, R7=R4fSI3
1 R6=R2*C03+R4*S13
, XI!3I=R2tC03+R4fSI3

SlJBF

..
..
BLr3

..

CONT

, LOtY

BACI(

~
<:)

TO TIE IhNER LOtY

~
<:)

LOI
LSH
ADDI

IRI,AR4
-I,AR4
tsiNTAB,AR4

RPTB
ADIF
SlJIF
ADDF
SlJIF
ADDF
SlJIF

S-

'"

~

~

N

C

a
C

ADDF

HARJ, f+AR1,R5

SU!f

1r.i,R3,R7
Ir.i,R3
R3,*fARO
RI, tARO++IIROI
>AR3,tARl,RI
*>AR3,ttARl,R3
Rb,ttARl
R7,tARlttllROI
R3,Rl,R5
Rl,R3,Rl
RI,R4,R3
RI,R4

AODF
STF
STF
SlJIF
SlJIF
STF
STF
ADIF

-

SUII'
SlJIF
ADDF

0

-I

-:::::,'?:::::-:-:::-::-~:

ADIF

lLK3
>ARl, fARO,RI
+AR2, fARO, Rl
ftAR2,_,R3
ftIIR2, _,R4
>AR3,>ARI,1r.i
Rl,lr.i,Rb
Ir.i,RI

_ ":>---_"C-"'r

-~~~~-~-"-

I~OP

, LOtY

LOI
LOI
LSI<

@Rf>TCHT,AR7
@IEINDX,ARb
2,AR7

, ItCRfPEHT REPEAT COONTER Fill t£XT

STl

AR7 ,IRPTCHT
2,ARb
ARb, tlE1NDX
RO,IRO
-3,RO
2,RO
RO,M
2,RO
I,RO
LOtY

ADOI
STl
SUBI
LSH
IIR
, POINT TO SIN(451
, Cl£ATE casllE INDEX AR4=C021

;:s
RI=XIl 1+X!l21
R2=X!lH(!21
R3=YIlI+YII21
R4=Y!l I-YIl21
R5=XIlII+XII31
Rb=R5-R1
RI=RI+1r.i
R5=YIlII+YII31
R7=ft3-RS
R3=ft3+1r.i
Ylli=R3+1r.i
1!l1=RI+Ir.i
RI=XIlIJ-XII31
R3=Ylli I-Y!l31
YIlII=R5-R1
XIII I=R3-R5

, R5=R2+ftl
; R2=-R2+R3

, R3=R4-R1
, R4=R4+R1

~ ~~

BACI(

TO HE IIf£R LOtY

; IE=4*IE
; N1=N2

; JT=N2/2+2
; N2=N2/4
; NEXT FFT STAGE

DO THE BIT-REVERSING OF THE OUTPUT

END:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,

RI=ft3-R5
RI=RltC021
R3=ft3+1r.i
R3=R3tC021
Y!I21=1R3-lr.iifC02I
RI=R2-R4 '"
RI=RlfC021
1II21=IR3+lr.iltC021
R2=R2+R4 '"
R2=R2tC021 '"
Y1131=-IR4-R2ltC021
X!l3I=IR4+R2IfC02l

@lPCNT,RO

LSH

WIT

SPECIAL BUTTERFLY FOR W=J
f
SPCL

,
,
,
,
,
,
,
,
,
,
,
,

TlI'E

s::i

;:s

Ir.i,R3,RI
fAR4,RI
Ir.i,R3
*AR4.R3
RI,f+AR2
R4,R2,RI
tAR4,RI
R3, +AR2++IlRO I
R4,Rl
tAR4,R2
RI, ftAR3
R2, tAR3++IlROI

Cl'Pi
IlPD

LSH
ST!
LOI

!I.PCNT ,RO
I~OP

If'YF
ADDF
If'YF
STF
SlJIF
I'PYF
STF
ADIF
If'YF
STF
STF

..
.
.

"
CONT
BITRV

LDl
SUBI
LOI
LDl
LDI

tFFTSIZ,R(
I,RC
tFFTSIZ,IRO
tiNPUT ,ARO
tiNPUT,ARI

RPTB
C/1PI
OOE

BITRV
ARO,ARI
CONT
fARO,RO
tARl,RI
RO, _ARI
Rl,fARO

LIF
LDF
STF
STF
LDF
LDF
STF
STF
Nfl'
HOP

RC=N
RC SHOULD BE ONE LESS THAN DESIRED #
IR0=SIZE OF FFT=N

HARQ! 1) ,RO
HARUl),Rl

RO,ttARlIlI
Rl,++MOtl)

..tAR0121
tARI++IlROIB

RESTORE THE REGISTER VALUES AND RETURN

!Ulilili!iI!lIli!e:

uu~hu~


An Implementation of FFT, DCT, and Other Transforms on the TMS320C30

Appendix C. Radix-2 Real FFT


-

Clf'I

0

APPENDIX C1
GENERIC PROGRAM TO DO A RADIX-2 REAL FFT COMPUTATION ON THE TMS320C30.

THE PROGRAM IS TAKEN FROM THE PAPER BY SORENSEN ET AL., JUNE 1987 ISSUE
OF THE TRANSACTIONS ON ASSP.

"
"

CONT
BITRY

THE (REAL) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION IS DONE
IN-PLACE. THE BIT REVERSAL IS DONE AT THE BEGINNING OF THE PROGRAM.
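
The bit-reversal pass in these listings steps one pointer linearly and the other with the C30 bit-reversed addressing mode (an increment by IR0 = N/2 with reverse carry), and exchanges two locations only when the first address is lower than the second. A minimal Python sketch of the same idea follows; `reverse_carry_increment` and `bit_reverse_in_place` are hypothetical names, and the software loop only emulates what the hardware addressing unit does in a single step.

```python
def reverse_carry_increment(j, half):
    """Add `half` (N/2) to j with the carry propagating toward the low bits."""
    k = half
    while k and (j & k):
        j ^= k        # clear this bit and carry into the next lower bit
        k >>= 1
    return j | k

def bit_reverse_in_place(data):
    """Permute `data` (length a power of 2) into bit-reversed order."""
    n = len(data)
    j = 0
    for i in range(n):
        # Exchange only when i < j, so that each pair is swapped exactly once.
        if i < j:
            data[i], data[j] = data[j], data[i]
        j = reverse_carry_increment(j, n // 2)
    return data

if __name__ == "__main__":
    print(bit_reverse_in_place(list(range(8))))
```

Because bit reversal is its own inverse, applying the function twice restores the original order.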

AUTHOR: PANOS E. PAPAMICHALIS
TEXAS INSTRUMENTS


Itf'


"SINE

;
;
;
;

ENTRY POINT FOR EXECUTION
FFT SIZE
LOG2(N)
ADDRESS OF SINE TABLE

.USEeT
.SSS

'IN",1024
IlITP,I024

; MEMORY WITH INPUT DATA
; MEMORY WITH OUTPUT DATA

*
FFTSll
LOOFFT
SINTAB
INPUT
OUTPUT
FFT:

•WORD

FFT

; STARTING LOCATION OF THE PROGRAM

• SPACE

100

; RESERVE 100 WORDS FOR VECTORS, ETC.

•WORD
• WORD
• WORD

.WORD
.WORD
LOP

FFTSIZ

tiNPUT,~

IRQ,RC
I,RC

RI'TlI

ILKI
<+ARO,>AROtt,RQ
>ARC,HIRO,RI
RO.II-ARO
RI,IARO++

;
;
;
;

RQ=Xl!ltl(!tll
RI=XIIHlltll
XIII=X!IltXI!t11
X!Itll=XIIl-Xlltll

; AR0 POINTS TO X(I)
; IR0=2=N2

LDI
LDI
LDI
LSH
SUBI

tiNPUT,ARC
2,IRO
tFFTSlI,RC
-2,RC
I,Re

RPTB

1LK2
; RQ=XIII tX!It21
++ARDIIROI,IARO++IIROI.RQ
>ARC, *-ARCIIROI ,RI ; RI=XIIHllt21
; RO=-XI!t31
++ARD,RO
; XIII=XilItXIIt21
RO, HIROIlROI
RI,


INNER LOOP (DO-20 LOOP IN THE PROGRAM)

INUI'

.

joooI.

; AR0 POINTS TO X(I)
; REPEAT N/2 TIMES
; RC SHOULD BE ONE LESS THAN DESIRED #

MAIN LOOP (FFT STAGES)

DO THE BIT-REVERSING AT THE BEGINNING

C

a

"

1LK2

SINE
INP
OUTP

C

~

~
n

FIRST PASS OF THE DO-20 LOOP (STAGE K=2 IN DO-10 LOOP)

"

tFFTSIZ.RC
I,RC
tFFTSIZ,IRO
-I,IRO
!INPUT,ARC
IINPUT,AIII

~

"

"

N

LDI
SUBI
LDI
LSH
LDI
LDI

N

ILKI

-=>

1

LDI
LDI
SUBI

ADOF
SUBF
STF
STF

INITIALIZE

So

'"

l.IA"
l.IA"
STF
STF
IU'
IU'

; XCHANGE LOCATIONS ONLY
; IF AROARC,RQ
tMI,RI
RQ,1AR1
RI,>ARC
>ARCH
*AIII++IlRQIB

LENGTH-TWO BUTTERFLIES

THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA SECTION. THIS
DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE GENERIC NATURE OF THE
PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF THE FFT N AND LOG2(N) ARE
DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED DURING LINKING. THE LENGTH OF
THE TABLE IS N/4 + N/4 = N/2.

~
~

AIII,ARC

BG:

; AR5 POINTS TO X(I)
; AR0 POINTS TO SIN/COS TABLE
; IR1=N4


AR5,ARl
l,ARl
ARl,AR3
R3,AR3
AR3,AR2
2,AR2
R3,AR2,AR4

LDF

tAR5++IIRll,RO
t+AR5IIRll,RO,Rl
RO, t++AR5URll,RO
Rl,t-AR5URll
RO
t++AR5I1Rll,Rl
RO,tAR5
Rl,tAR5

ADDF
5UIIF
STF
IEGF
IEGF
STF
STF

"

"

LOI
LSH
LOI
SUBI

IFFTSII,IRI
-2,IRI
R4,RC
Z,RC

RPTB

lU(J

ItPVF
If'YF
If'YF
ADDF
If'YF
5UIIF
SUIIF
ADDF

tAR3,t+AROIIRll,RO
tAR4,tARO,RI
tAR4,'+AROIIRll,RI
RO,RI,R2
tAR3,tARO++lIROI,RO
RO,RI,RO
tAR2,RO,RI
tAR2,RO,RI
RI,tAR3++
tARI,R2,RI
RI,tAR4-R2,tARI,Rl
Rl,tAR1++
RI,tAR2-

STr

ADDF

"

STF

"
BLKJ

STr
STF

SlIIIF

; AR3 POINTS TO X(I3)=X(I+J+N2)
; AR2 POINTS TO X(I2)=X(I-J+N2)
; AR4 POINTS TO X(I4)=X(I-J+N1)
,
,
,
,
,
,
,
,

R0=X(I)
R1=X(I)+X(I+N2)
R0=-X(I)+X(I+N2)
X(I)=X(I)+X(I+N2)
R0=X(I)-X(I+N2)
R1=-X(I+N4+N2)
X(I+N2)=X(I)-X(I+N2)
X(I+N4+N2)=-X(I+N4+N2)

; IR1=SEPARATION BETWEEN SIN/COS TBLS
; REPEAT N4-1 TIMES

; R0=X(I3)*COS
; R1=X(I4)*SIN
; R1=X(I4)*COS
; R2=X(I3)*COS+X(I4)*SIN
; R0=X(I3)*SIN
; RO=-X

5.

~
C

LDI
ADDI
LOI
ADDI
LOI
StIlI
ADDI

; AR5=I+N1
; LOOP BACK TO THE INNER LOOP

END

BR
• END

END

; BRANCH TO ITSELF AT THE END

........

to.)

FP

APPENDIX C2
NAME:

IfUI -


SYNl'fSIS'
lOt 1ft-. HM, ", dot. I
FFT SIZE: N=2**M
int N
NUMBER OF STAGES = LOG2(N)
int M
ARRAY WITH INPUT AND OUTPUT DATA
float *data

.gl ...1

.float value1 = sin(0*2*pi/N)
.float value2 = sin(1*2*pi/N)
.float value(N/2) = cos((N/4)*2*pi/N)

STACK STRUCTURE UPON THE CALL:

'"

"N

::

: REMN AOOR
OLDFP

I:

+-------------+

REGISTERS USED: R0, R1, R2, R3, R4, R5, AR0, AR1, AR2, AR4, AR5, IR0,
IR1, RS, RE, RC

AUTHOR: PANOS E. PAPAMICHALIS
TEXAS INSTRUMENTS

C

a
C

~

.

~

_SINE

; SAVE DEDICATED REGISTERS

PUSH

LDI
STI
LDI
STI
LDI
STI

H'PI2),RO
RO,IFFTSIZ
H'P(3},RO
RO,ILIXFFT
<-FP(4),RO
RO,IItf'UT

; MOVE ARGUMENTS TO LOCATIONS MATCHING
THE NAMES IN THE PROGRAM

LDI
LDI

IFFTSIZ,Re
I,Re
IFFTSIl,IRO
-I,IRO
11tf'UT,ARO
11tf'UT,ARI

RPTB

BITAY

~I

ARI,ARO

BGE

COItT

LDF
LDF

tARO,RO
IARI,RI
RO,1AR1
RI,tARO

CONT
BITRV

STF
STF
HOP
HOP

OCTOBER 13, 1987

LDI
LDI
SUBI

I

~
~

~

=

; RC=N
; RC SHOULD BE ONE LESS THAN DESIRED #
; IR0=HALF THE SIZE OF FFT=N/2

~

-=
~

=:I
~

n

; XCHANGE LOCATIONS ONLY
; IF AR0 < AR1

"CI

INITIALIZE C FUNCTION

THE SINE/COSINE TABLE FOR THE TWIDDLE FACTORS IS EXPECTED TO BE
SUPPLIED DURING LINK TIME, AND IT SHOULD HAVE THE FOLLOWING FORMAT:

\:)

~

SINTAII

THE PROGRAM IS BASED ON THE FORTRAN PROGRAM IN THE PAPER BY SORENSEN
ET AL., JUNE 1987 ISSUE OF TRANS. ON ASSP. THE COMPUTATION IS DONE
IN-PLACE, AND THE ORIGINAL DATA IS DESTROYED. BIT REVERSAL IS
IMPLEMENTED AT THE BEGINNING OF THE FUNCTION. IF THIS IS NOT
NECESSARY, THIS PART CAN BE COMMENTED OUT.

!

So

• GUlL
.(L0Il.

LDI

~

s::.

AR3

•TEXT

DESCRIPTION:
GENERIC FUNCTION TO DO A RADIX-2 FFT COMPUTATION ON THE TMS320C30.
THE DATA ARRAY IS N-LONG, WITH ONLY REAL DATA. THE OUTPUT IS STORED
IN THE SAME LOCATIONS WITH REAL AND IMAGINARY POINTS R AND I AS
FOLLOWS: R(0), R(1), ..., R(N/2), I(N/2-1), ..., I(1)
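
To make the packed output order concrete, the Python sketch below expands an N-long array laid out as R(0), R(1), ..., R(N/2), I(N/2-1), ..., I(1) into the full complex spectrum of a real input. The name `unpack_real_fft` is hypothetical. The sketch assumes that I(k) is the imaginary part of X(k) under the usual exp(-j...) forward convention, which the listing does not state explicitly, and it relies on the conjugate symmetry X(N-k) = conj(X(k)) of a real-input transform.

```python
import cmath

def unpack_real_fft(packed):
    """Expand R(0..N/2), I(N/2-1..1) into a full N-point complex spectrum."""
    n = len(packed)
    half = n // 2
    spectrum = [0j] * n
    spectrum[0] = complex(packed[0], 0.0)        # DC term is purely real
    spectrum[half] = complex(packed[half], 0.0)  # N/2 term is purely real
    for k in range(1, half):
        value = complex(packed[k], packed[n - k])  # I(k) is stored at index N-k
        spectrum[k] = value
        spectrum[n - k] = value.conjugate()
    return spectrum

def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

The packed form holds exactly N real numbers, which is why the real FFT can overwrite its N-long real input in place.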

t:I

,:""'l

RADIX-2 REAL FFT TO BE CALLED AS A C FUNCTION.

.SET

AR0 POINTS TO X(I)
REPEAT N/2 TIMES
RC SHOULD BE ONE LESS THAN DESIRED #

-==
~

....

~

::

II'TB

~

't:i

~

~

ILKI
t+ARO,iMO++,RO
f/IRO,'-IIRO,RI
RO,f-MO
RI,f/IRO++

ADIF
SUIif

..

1IU<1

is

STF
STF
FiRST PASS

~

..

; R0=X(I)+X(I+1)
; R1=X(I)-X(I+1)
; X(I)=X(I)+X(I+1)
; X(I+1)=X(I)-X(I+1)

OF THE DO-20 LOOP (STAGE K=2 IN DO-10 LOOP)


BLK2

S
~

§

~
C
LIU'

li
!NLOP

~
N
C

a
C

t.)

II'TB
AIlDF
SUIif

ILK2
t+AROIIRO) , 1t1R0++1IRO) ,RO
; RO=XW+IIH2)
f/IRO, t-AROI IRO),RI , RI=XUHIl+2)
;
ROo-XCI+3)
"ARO,RO
RO,HIROIlRO)
; XW=XW+KIt+2)
; XCI+2)=XUHIl+2)
RI, 'IIRO++IIRO)
RO,f+ARO
; XCI+l)=-XC/+l)

t£GF
STF
STF
STF

; AR0 POINTS TO X(I)
; IR0=2=N2
; REPEAT N/4 TIMES
; RC SHOULD BE ONE LESS THAN DESIRED #

LDI
LSIl
LDI
LDI
LDI
LSH
LSIl
LSH

IFFTSIZ ,IRO
-2,IRO
3,RS
I,R4
2,Rl
-I,IRO
I,R4
I,Rl

LDI
LD!
ADD!
LD!

tlNPUT,ARS
IRO,IIRO
tsINTAB,ARO
R4,!RI

LDI
ADDI
LD!
ADD!
LD!
SUBI
ADD!

ARS,ARI
I,ARI
ARI,AR3
Rl,AR3
ARl,M2
2,M2
R3,AR2,AR4

LDF

1M5++IIRII,RO
t+liR5lIRll,RO,RI
RO, 'HARS, IRII, RO
RI,'-AR5IlRII
RO

ADDF

I:

SUIIf
STF
t£GF

•

::

.
::

..

; IR0=INDEX FOR E
; R5 HOLDS THE CURRENT STAGE NUMBER
; R4=N4
; R3=N2

; E=E/2
; N4=2*N4
; N2=2*N2

INNER LOOP (DO-20 LOOP IN THE PROGRAM)

S

~

tINPUT,1IRO
2,IRO
IFFTSIZ,RC
-2,RC
I,RC

MAIN LOOP (FFT STAGES)

....

~

LDI
LDI
LDI
LSIl
SUBI

; AR5 POINTS TO X(I)
; AR0 POINTS TO SIN/COS TABLE
; IR1=N4

; AR1 POINTS TO X(I1)=X(I+J)
; AR3 POINTS TO X(I3)=X(I+J+N2)
; AR2 POINTS TO X(I2)=X(I-J+N2)
; AR4 POINTS TO X(I4)=X(I-J+N1)
R0=X(I)
R1=X(I)+X(I+N2)
R0=-X(I)+X(I+N2)
X(I)=X(I)+X(I+N2)
R0=X(I)-X(I+N2)

STF

t++M5IlRII,RI
RO,+M5
RI,IM5

RI~KIt'fi4+N2)

KIt +N2) =m I-X II +N2)
XIl_)~m'fi4+N2)

INNERMOST LOOP

~.

li

t£GF
STF

BLK3

LDI
LSIl
LDI
SUBI

IFFTSIl,IRI
-2,IRI
R4,1I:
2,11:

APTB
II'YF

STF
SUBF
STF
STF

BLK3
tIIR3, t+IIROllRll, RO
fllR4,f/IRO;RI
fllR4,t+IIROIIRIl,RI
RO,RI,R2
'ARl,f/IRO++ClRO),RO
RO,RI,RO
fllR2,RO,RI
'M2,RO,RI
Rl,tAR3++
fllRI,R2,RI
RI,fllR4R2,fllRl,RI
RI,fllRl++
RI,fllR2--

SUBI
ADDI
C/l'1
BLED
ADDI

IINPUT,AR5
Rl,ARS
IFFTSIl, ARS
IIl.OP
IItf'IJT,ARS

II'YF
II'YF
ADIF
II'YF
SUIIf
SUIIf

ADDF
STF

ADDF

; IR1=SEPARATION BETWEEN SIN/COS TBLS
; REPEAT N4-1 TIMES

;
;
;
;
;
;
;

R0=X(I3)*COS
R1=X(I4)*SIN
R1=X(I4)*COS
R2=X(I3)*COS+X(I4)*SIN
R0=X(I3)*SIN

;
;
;
;
;
;

X(3)--XII2HRO !!!
RI=XUII+R2
XIl4)=III2)+RO '"
RI=XUIH!2
11II1=IIW+R2
X112)=XIII)-R2

~XUl)tSlN+XU4)tCOS

; AR5=I+N1
; LOOP BACK TO THE INNER LOOP

r«IP

r«P
ADDI
C/l'1
BLE

I,RS
IL~FT,RS

LOOP

RESTORE THE REGISTER VALUES AND RETURN
POP
POP
POP
POP
POP
RETS

ARS
AR4

RS
R4

FP

!!'

Rl~X(2)+RO !!!
; Rl=XCJ2)+RQ !!!

-

LOOP

oj:>.

APPENDIX C3
INLlI'

GENERIC PROGRAM TO DO A RADIX-2 REAL INVERSE FFT COMPUTATION ON THE
TMS320C30.
THE (REAL) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION IS DONE
IN-PLACE. THE BIT REVERSAL IS DONE AT THE BEGINNING OF THE PROGRAM. THE
INPUT DATA ARE STORED IN THE FOLLOWING ORDER:
RE(0), RE(1), ..., RE(N/2), IM(N/2-1), ..., IM(1)


THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA SECTION. THIS
DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE GENERIC NATURE OF THE
PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF THE FFT N AND LOG2(N) ARE
DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED DURING LINKING. THE LENGTH OF
THE TABLE IS N/4 + N/4 = N/2.

AUTHOR: PANOS PAPAMICHALIS
TEXAS INSTRUMENTS

:::to

§

~

.GLoa.

IFFT

.GLOa.

N

.GLoa.

"SINE

.GLOa.

tI

.BSS

!'i

•TEXT

~

INITIALIZE

§

~
<;j

~
§
So

""
~
~

N

.WORD

C

IIIP

_IIRU
, POINT TO XII+N4)
Hlli5I1RIJ. *+AR5URI ).RO
t+AR51 IRU .'-AR5I1RIJ .RI
, XII)=XIIJ+XII+N2)
RO.'-AR5I1RIJ
, XII+N2)=XUJ-XII+N2)
RI ....IIRSIIRIJ
oARS.RO
2.0.RO
, XII+N4)'2*XII+N4)
RO. Hlli5URU
_IIRIJ.RI
-2.0.RI
RI._URI)
, XII+N4+N2)-III+N4+N2)'2

ADIF
SIJBF

"

,
,
,
,

\:

ENTRY POINT FOR EXECUTION
FFT SIZE
LOG2(N)
ADDRESS OF SINE TABLE

LIF
It"fF
STF

INP.I024

, ARI POINTS TO XIIU-XII'-l)

IFFT

; STARTING LOCATION OF THE PROGRAM

100

,

•SPACE

FFTSIZ
LOOfFT
SINTAB
INPUT

.1DlD
.1DlD
.WORD
•WORD

"SINE

IFF!:

LIP

mSlz

,

~

100 I«lWS FOR

~TlRS.

::

(M

tmSIZ.IRI
-2,IRI
R4.11:

; IR1=SEPARATION BETWEEN SIN/COS TBLS

2,RC

; REPEAT N4-1 TIMES

RPTB

1lJ(3
oAR2.oARl.RI
oAR2. oARI,RO
RI.*+AROURIJ.RO
RO. oARl"
oAR3. oAR4.R2
oAR3. oAR4.R6
R2.oIIRO.R6
R6.0IIR2R6.RO
R2.*+AROIIRIJ.R6
RO.0AR3++
RI. oARO++IlRO).RO
R6.RO
RO.oAR4-

LSH
LDI
LSH
INNER LIXI'

I.IAO
3.R5
tFFTSIZ.R3
-1.R3
IFFTSIZ.R4
-2.R4

AIIIIF
It"fF
STF
AIIIIF

,

~

TO LOAD DATA

f'A(E

POINTER

"

::
, lROoINlEX FOR E
, R5 IILDS TI£ C\IIRENT STAGE _
, R3=N1120H2

1lJ(3

It"fF
STF
SIJBF
It'Yf

STF
It'YF
ADIF
STF
SUBI
Clt'l

, R4=NI/_

LTD
ADIJI
LDI
ADDI

tlNPUT.1IRS
tFFTSIZ.M5
liLOP
IINPUT.M5
lRO.ARO
ISINTAB. ARO


LDI
LSH
LDI
SUBI

SIJBF

INP

IMIIN UlOP IFFT STAGES)
LDI
LDI
LDI

ETC.

.

; AR3 POINTS TO X(I3)=X(I+J+N2)
; AR2 POINTS TO X(I2)=X(I-J+N2)
; AR4 POINTS TO X(I4)=X(I-J+N1)

; MEMORY WITH INPUT DATA

SIJBF

C

a

M5.ARI
I.ARI
ARI.AR3
R3.ARl
ARl.AR2
2.AR2
R3.AR2.AR4

"CI

; AR0 POINTS TO SIN/COS TABLE
; IR1=N4

INNERMOST LOOP

~

~

LDI
ADDI
LDI
ADDI
LDI
SUBI
ADDI

STF

.s;,

..,""~

DECEMBER 21, 1988

tINPUT.M5
lRO.ARO
ISINTAB.ARO
R4.IRI

STF
STF
LIF
It'Yf

>
'i
=
~
n

; AR5 POINTS TO X(I)

LDI
LDI
AIIIII
LDI

; R1=T1=X(I1)-X(I2)
; R0=T1*COS
; X(I1)=X(I1)+X(I2)
; R2=T2=X(I3)+X(I4)
; R6=T2*SIN
; X(I2)=X(I4)-X(I3)
; R6=T2*COS
; X(I3)=T1*COS-T2*SIN
; R0=T1*SIN
; X(I4)=T1*SIN+T2*COS

; LOOP BACK TO THE INNER LOOP
; AR0 POINTS TO SIN/COS TABLE

3

~

ADDI
Clf'1
!LED
lSH
lSH
lSH


UF
III'TB
AD!F
SUIlF
STF
SIT
UF

_OIlROl,RO
BLK2
RO,IAIlO++(IROl,RI
RO,I-AIlOIlROl,R!
RI,I-RROIlROl
RI, fARo.+
I-AIlO,R!
2.0,RI
R!,I-RROIlROl

; RO=xtl+21

SIT
UF

; REPEAT N/4 TIMES
; RC SHOULD BE ONE LESS THAN DESIRED #

;
;
;
;

R1=X(I)+X(I+2)
R1=X(I)-X(I+2)
X(I)=X(I)+X(I+2)
X(I+2)=X(I)-X(I+2)

; R1=2.0*X(I+1)
; X(I+1)=2.0*X(I+1)

fARO++,Rt

-2.0,R!
RI,t-AIlO
t+AAOIlROl,RO

; R1=-2.0*X(I+3)
; X(I+3)=-2.0*X(I+3)
; R0=X(I+4+2)

LENGTH-TWO BUTTERFLIES

~

c

; AR0 POINTS TO X(I)
; IR0=2=N2

If'YF

..

BLK2

§

~

@INPUT,AIlO
2,IRO
IffTSlZ,Re
-2,Re
I,Re

STF
llF

~

~

E=E12
_/2
N2=N212

lOI
lDI
lOI
lSH
SOOI

If'YF

...

~

CONT
tARO,RO
tAAI,RI
RO,IARI
RI,itIIRO
tARO..
*AR1++(IRO)B

END

IIR

END

.END

~

§

CONT
BITRV

IIGE
LDF
UF
STF
SIT
to'
to'

..
..

LAST PASS OF THE MAIN LOOP

~

~

1,R5
1I.00FFT,R5
lOOP
I,IRO
-1,R4
-1,R3

..

8lKI

xm

lOI
lDI
lSH
SOOI

!INPUT,ARO
IFFTSlZ,Re

; AIlO POINTS TO

-1,At
I,Re

; REPEAT Nl2 TlI£S
; Re S!IJLlD BE (IE LESS TIWl DESIRED •

RPTB

BUO
_O,IAIl()++,RO
tARO, I-RRO, RI
RO,I-RRO
RI,tAAO++

;
;
;
;

AD!F
SUBF
STF
SIT

R0=X(I)+X(I+1)
R1=X(I)-X(I+1)
X(I)=X(I)+X(I+1)
X(I+1)=X(I)-X(I+1)

DO THE BIT REVERSING AT THE END
lOl
SlJlI
lOI
lSH
lOI
lOI

IFFTSIZ,Re
I,Re
IFFTSII,IRO
-I,IRO
@III'UT,AIlO
@INPUT,ARI

RPTB
Clf'I

BITRV
ARI,AIlO

; RC=N
; RC SHOULD BE ONE LESS THAN DESIRED #
; IR0=HALF THE SIZE OF FFT=N/2

; XCHANGE LOCATIONS ONLY

THE PAPER BY SORENSEN ET AL., OCT 1985 ISSUE
OF THE TRANSACTIONS ON ASSP.

..
..

CO!!
BITRV

THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA SECTION. THIS
DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE GENERIC NATURE OF THE
PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF THE FHT N AND LOG2(N) ARE
DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED DURING LINKING.

~

=
~

~

~

; IR0=INDEX FOR E
; R5 HOLDS THE CURRENT STAGE NUMBER
; R4=N4
; R3=N2
; E=E/2
; N4=2*N4
; N2=2*N2

INNER LOOP (DO-30 LOOP IN THE PROGRAM)

; XCHANGE LOCATIONS ONLY
; IF AR0 < AR1

~
i:=
~

tMl,Rl
RO,tMl
Rl,tMO
tARO++
tMl++UROIB

LENGTH-TWO BUTTERFLIES

THE (REALI OATA RESllf: IN INTERNAL I£IIOOY. 11£ COI1fUTATI!»I IS IOE
IN-PLACE. TI£ BIT-REYERSIi. IS IOE AT TIE IIEGIN'4lt1l IF TIE PROORM.
~

LIF
S1F
STF
10'
10'

; AR5 POINTS TO X(I)
; AR0 POINTS TO SIN/COS TABLE
; IR1=N4


LDI
ADDI
LDI
ADDI
LDI
SUBI
ADDI

AR5,IIAI
I,IIAI
1lA1,AR3
R3,AR3
1lA3,IIA2
2,IIA2
R3,IIA2,1IA4

UF

+/IR5++1 IRI) ,R!
f+/lR5I1RI), RO, Rl
RO,fHAR5IIRIl,RO
RI,f-AR5I1RIl
RO
RO,+/1R5
t+IIR5l1RIl ,RO
RO,f-AR5I1RIl,RI
RO,f-AR5I1RII,Rl
RI,t-AR5IIRIl
Rl,f+AR5I1RIl

,
,
,
,
,
,
,
,
,
,
,

LDI
lSH
LDI
SUBI

IFHTSIZ,IRI
-2,IRl
R4,Re
2,Re

; REPEAT N4-1 TIMES

RPTB

1l.K3

II'VF
II'VF
II'YF

tAR3, t+AROll RIl , RO
fllA4,iIIRO,Rl
tllA4, t+AROllRII,RI
RO,RI,R2
tAR3,fIIAO++IlRO),RO
Rl,RO,RO
RO,tIIA2,RI
tIIA2,RO,RI
RI,tllA4-tllAl,R2,RI
RI,fllA2-R2,tllAl,RI
RI,tllAl++
Rl,_

AJ)JF

SUIIF

STf

.~

"

tl

"

UF
AIIIF
SUIIF

"

STF
STf

STf

(")

."-3

ADOF
II'YF
SLIIF
SUIIF

ADIF
"

STf

II

STF
SUBF
STf
STf

C

Q

"

BlK3

ADOF

SUBI
ADDI
Cll'1

Il.TD
ADDI

IQ

, AR3 POINTS TO XIL3)'XIlI+N2)
, IIA2 POINTS TO XI12)'XlJ-I+I+N2)
, IIA4 POINTS TO 1Il4)'XI12+N2)
RO=XIJ)
Rl'XIJ)+Xl12)
RO=-XlJ)+II12)
IIJ)'IIJl+II121
RO=XIJI-XI121
XI12I,XIJI-Xll2)
RO=XIl41
Rl=Xll3l+1Il4)
RI=Xll3I-Xll4)
XIl31=XlL3)+XIl4)
Xll41'Xll3I-Xll41

INNERMOST LOOP

0

~
g

NEGF

, IlAI POINTS TO XILH'XlJ+I-1)

IINPUT,AR5
R3,AR5
IFHTSIZ ,AR5
1lt.1P
IINPUT,AR5

NIIf'
NIIf'

ADDI
Cll'1

l,RS
IlUMT,R5

; IR1=SEPARATION BETWEEN SIN/COS TBLS

; R0=X(L3)*COS
; R1=X(L4)*SIN
; R1=X(L4)*COS
; R2=X(L3)*COS+X(L4)*SIN=T1
; R0=X(L3)*SIN
; R0=X(L3)*SIN-X(L4)*COS=T2
; R1=X(L2)-T2
; R1=X(L2)+T2
; X(L4)=X(L2)-T2
; R1=X(L1)+T1
; X(L2)=X(L2)+T2
; R1=X(L1)-T1
; X(L1)=X(L1)+T1
; X(L3)=X(L1)-T1

; AR5=I+N1

; LOOP BACK TO THE INNER LOOP

END

IIlE

UO'

IIR
,00

EIIl

; BRANCH TO ITSELF AT THE END


Appendix E. Discrete Cosine Transform


~

; TWO BUTTERFLIES ARE CALCULATED AT
; THE SAME TIME.

IIITSIIE-LOOP'
1tl1111.E..L011"

APPENDIX E1
A FAST COSINE TRANSFORM

II

UF
UF

IM2,R2
_,R3

; GET UIofR tIII.F ~ EACH IlUTTERFl.Y.
(THIS IUIIIS fill 1lIIIE PARIU.B..
;

II

SUIF3
SUIF3
1F'iF3
AllF3

_,_,RI
IM2,tMI,RO
RI, t++M7 ,RI
R3,_,R3

IF'iF3
ADIF3

RO.t-M7,RO
R2,tMI,R2

STF
STF

RI. tM2++IIRlll
R3,_IIRlll

; SUBTRACT SECOND BUTTERFLY DATA.
; SUBTRACT FIRST BUTTERFLY DATA.
; MULTIPLY 2ND SUBTRACTION RESULT BY
; COSINE COEFFICIENT. ADD SECOND
BUTTERFLY DATA.
; MULTIPLY 1ST SUBTRACTION RESULT BY
COSINE COEFFICIENT. ADD FIRST
BUTTERFLY DATA.
; SAVE 2ND MULTIPLY RESULT IN LOWER
HALF OF BUTTERFLY. SAVE
2ND
ADDITION IN UPPER 2ND BUTTERFLY.

BASED ON THE ALGORITHM OUTLINED BY BYEONG GI LEE IN HIS ARTICLE, FCT - A
FAST COSINE TRANSFORM, PUBLISHED IN THE PROCEEDINGS OF THE IEEE INTERNATIONAL
CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, SAN DIEGO, CA,
19-21 MARCH 1984, P 28A.3/1-4 VOL. 2, (CH1945-5/84/...).
LEE'S ALGORITHM HAS BEEN MODIFIED TO ALLOW NATURAL ORDER TIME DOMAIN
COEFFICIENTS RATHER THAN THE LESS ORDERED INPUT SUGGESTED IN HIS ARTICLE.

~

;:s

~
W

THE FREQUENCY DOMAIN COEFFICIENTS ARE IN BIT REVERSE ORDER. THIS IS AN IN
PLACE CALCULATION.
AUTHOR: PAUL WILHELM

~

::to

.globil
• global
• globil
.globtl


CIIIMIS LAwn

,
,,
,,

ElID_CENTELLW"

S

~

.
.

,

FCT
ft
COS_TAB
CIEFF

;
;
,
;

FAST COSINE TRANSFORM
ENTRY POINT.
LENGTH OF DATA ENTRY.
TABLE OF COSINE COEFFICIENTS.
TABLE OF INPUT DATA.

STF
STF

.
END

~

RO,_IIRlll
R2, tMI++IIRIlX

;

~

1ST IILTiPlY IN UIofR tIII.F ~
2HIl1lUTTERFl.Y. ~ 1ST 1IIIl1Tl1It
IN lIPER 1ST IlUTTERFl.Y.

;

AIlDI3

..cus

...DATA

••,..0
.•r4
.1Iord

ft
COUAB
aJEFF

AllF3

_,IM2-,RO
ARl,AR2
ftlDllLLW'
tMI++,tM4-,RO

IIIIlI
OR

2,MT
OIOOH,ST

AIIF3
CftPI

FCT.

IFCTSI ZE, ARO
IFCTSIZE,lI<

LOI
LOI
LDI
LOI
LOI
AODI3
SUlI
LSH3
LOI
ADDI
ADD13
ADDI3

LllATA,AR6
LCOS,MT
ARO,IRI
-1,1110
ARb,ARI
AR6, ARO, AR2
1,AR2
IRO,ARO,ARl
I,MS
AR6,ARl
IRO,ARl,AR4
IRO.AR5,RC

; LOAD DATA LENGTH.
; GET BLOCK SIZE FOR CIRCULAR
ADDRESSING.
; LOAD DATA POINTER.
; LOAD COSINE TABLE POINTER.
; INITIALIZE INDEX REGISTERS FOR FIRST
BUTTERFLY SERIES.
; INITIALIZE DATA POINTERS.

,
,

; INITIALIZE 2'S POWER COUNTER.
; FINISH DATA POINTER INITIALIZATION.
; RC SHOULD BE ONE LESS THAN COUNT
DESIRED.

LSIt
SIJIII3

-I,IRI
ARb,ARI
IRO,AR6,AR2
IRI,AR2
2,IRI
IIITSIIU.OOP
I,MS
IRO, AR4, ARl

AOOI3

IRO,MS,RC

FIRST LW' SERIES

END_CENTElUOOP

,
,,

_SS

IELAY IIIWDI FROft !ERE TO ftiDOLE-LOOP.
LSH
LOI
ADDI
ADDI
CftPI

THIS LOOP SERIES DOES ALL THE BUTTERFLY STAGES EXCEPT THE FINAL ONE.
RPTB

; UPDATE REPEAT COUNTER FOR NEXT BLOCK
REPEAT.
; UPDATE DATA POINTERS.
; HAVE BUTTERFLIES BEEN COMPLETED?
; DELAYED BRANCH, IF NOT.
; UPDATE FINAL TWO POINTERS FOR NEXT
REPEAT.
; UPDATE COSINE COEFFICIENT POINTER.
, GET REPEAT /lIE. (FASTER THAN ISING
RPTB lIEN STMT All) END
lIE STILL Il(g)
;

IIGTO

LOI
LOI

IRO,AR5,RC

lIGTD

END OF FIRST LOOP SERIES.
FINAL BUTTERFLY STAGE LOOP.

l!I!j
.....

>

CENTER LOOP OF FIRST LOOP SERIES.

• text

FCTSllE

>
i
=
~
"C

; UPDATE INDEX REGISTER. (DIVIDE BY 2)
; REINITIALIZE DATA POINTERS.
; IS FIRST BUTTERFLY SERIES COMPLETE?
; DELAY BRANCH, IF NOT.
; MULTIPLY 2'S POWER COUNTER BY 2.
; CONTINUE REINITIALIZING DATA
POINTERS.
; GET REPEAT COUNTER FOR REPEAT BLOCK.


INCLUDES LAST BUTTERFLIES AND FIRST STAGE OF BIT REVERSE ADDITIONS.

~
'G

ADDI
ADDI3
rl'YF3

. 4,IRI
1,AR3
-1,AR5
3,AR4
IRG,AR5,RC
*M.7 • ftAR7 , R4

IIPTD

ENG_2ND-LOOP

SUllF3
SUllf3
nPYF3
ADIF3

, SUBTRACT 1ST BUTTERFLY DATA.
1M2, tARI, RG
, SUBTRACT 2ND BUTTERFLY DATA.
tAR4,1AR3,Rl
, I1lJl.TlPLY 1ST SUBTRACTl{)l RESILT
RG,R4,RG
, BY S. ADD 2ND BUTTERFLY
tAR3++IIRI I, IAR4++lIRll ,R3
liITA.
, ILUIPLY 2ND SUBTRACTl{)l RESlU
Rl,R4,Rl
BY S. ADD 1ST BUTTERFLY
tAR 1HIIRI I, f/lR2HIIRII, R2
DATA.
, IIJl.TlPLY 2ND ADDITION RESILT BY
RJ, 1+AR7, RJ
7071. SAYE 1ST SUBTRACTl{)l IN
RO,HIR2IIRlI
LOIIER 112 IF 1ST BUTTERFLY.
, IlLTlPL Y 1ST ADDITI{)l RESIL T BY
RZ, f+AR7, R2
.7071 SAYE 2ND SUBTRACTl{)l IN
Rl,f-AR4IIRlI
LCIjER 112 IF 2ND BUTTERFLY.
, ADD 2ND SIIBTRACTl{)l IlLTlPLY TO ZND
RJ,Rl,R3
AODITIOO ItUIPL Y.
, SAYE 1ST ADDITION IlLTlPLY IN If'PER
R2,HIRlIlRlI
liZ IF BUTTERFLY.

LDI
ADDI

~

LSH


"
"
"
"

If'YF3
ADIlF3
nPVF3
STF
rl'YF3
STF
AD[I'3
STF

RJ,I-!lR3IlRll

C

THIS LIXJ' SERIES DOES ALL IF THE BIT
COSINE TRANSFCR!.

ENG_INSIl(:
STF

Rl,fAR3HIIRlll:

, SAYE SECOND ADDITIOO.

ENG IF INSIDE LIXJ' FOR LAST LIXJ' SERIES.
AD[I'3
AD[I'3
ADIlF3
AD[I'3
CliPi
BlED
LDI
SUBI
OR

fARIH!lR01B,tARZ++!lRG1B,RG
,If'DATE DATA POINTERS.
tARJ++!lRG1B,fAR4++IlRGIB,RO
tARJ++(IR01B,tAR4++!lROIB,RO
tARl++lIROIB, fAR2++IIR01B,RG
R4,AR4
, IS THIS LIXJ' COrI'LETE?
LASUNSIl(-L1XJ'
,!(lAYED IIRAIOI, IF NOT.
AR5, RC
, SET If' REPEAT crumR.
I,RC
0100H, ST
, SET REPEAT lIllIE.

IIPTB
AD[I'3

LAST Jl.OCK
, slIa THERE ARE AN ODD It.IIlIEII IF
tARl,tAR2++!lRlll,RO
,
ADDlTlOO, THE FINAL (H:S ARE
~.

LASUILOCK:
STF

LOI
LOI
ADDI
LDI
LOI

W

fAR 1,tAR2++ IlRm, RO ,ADD FIRST M DATA.
tARJ,tAR4++!lRlIX,Rl, ADD SEca«I M DATA.
RG,fARl++(IRlIX
, SAYE FIRST ADDITI{)l.

lDt:

BIT REVERSE ADDITION LIXJ' SERIES.

-

ADIlF3
AD[I'3
STF

I

C

to.)

, TI«I ADDITIOO ARE lDt: IN EACH LOOP.

LASUNSllE_LIXJ':

, SAYE 2ND ADDITION IlLTPLY IN If'PER
liZ IF If'PER BUTTERFLY.

ENG IF FINAL BUTTERFLY STAGE LIXJ'.

a

END_INSIlE

ra

BRANCH !(LAYED TO LASLlNSIlLLIXJ',

~

N

RPTB

SUBI
, INITIALIZE REPEAT crumR.
, CAlCllATE IZmIICOSIPI/41.
U.E.-) ISliRHZllln THIS VALLE IS
CAlLED, S, BELCIi.)
, M BUTTERFLIES ARE CAlCllATED PER
LIXJ'.

I
ENG-2NlLLOOP:
STF

LDl

AR5,RC
, SET If' REPEAT crumR.
fAR2++IlRGIB, tAR4++IIR01B,RO
,DATA POINTER If'li\TE.
ARl,R4
, USE INITIAL ARI VALLE AS INNER LIXJ'
C{)lTRIL
I,RC
tAR4++IIROIB
, CONTINLE If'DATlt<:
LOI

RO, tARl++( IRlil

ADDITIONS AT THE ENG IF FAST
I,IRO
lRO,R4
1,AR5
LASUIJTSIl(_LIXJ'
R4,ARZ
R4,ARI
I,IRI

IlLTlPLY lRO BY Z.
Ll'DTEE INNER LIXJ' C{)lTR(L REGISTER.
ARE CAlCllATlOO COIPLETE ?
!(LAVED IIRAIOI, IF NOT.
If'liITE DATA POINTERS.
, IlLTlPLY IRI BY Z.

DELAYED IIRAIOI TO LASLIlJTSlDE_LOOP.

.----~-

--'~-_-.,f'.~,

~

END IF LAST LOOP SERIES.

IIILTJPL Y CW'F1CIENT ZERO BY .5, IF MIT ZERO.


UF
IIEIlD

oM.,RO
IXfff_STOOE

LSH

24,AR5

SUBI3

A115, _ , /lIU

SET ZERO ~ IF oM•• O.
IF CW'FICIEIIT IS ZERO, OOIn DO
THIS.
USE INTESER !tAIH FCR FLOAT DIVllE
BY 2.

I«l'

IELAYED IIWOi FRIll I£RE IF VALlE IS MIT TO lIE STORED.
STI
JIONT..STIlRE:

RETS

AlII, oM.

; STOOE, IF EXPOIEIIT IIASN'T -128.

~

::s

BASED ON THE ALGORITHM OUTLINED BY BYEONG GI LEE IN HIS ARTICLE, FCT - A
FAST COSINE TRANSFORM, PUBLISHED IN THE PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, SAN
DIEGO, CA, 19-21 MARCH 1984, P 28A.3/1-4 VOL 2. (CH1945-5/84/0000-0299).

LEE'S ALGORITHM HAS BEEN MODIFIED TO ALLOW NATURAL ORDER TIME DOMAIN
COEFFICIENTS.

THE FREQUENCY DOMAIN COEFFICIENTS ARE IN BIT REVERSE ORDER. THIS IS AN IN
PLACE CALCULATION.

AUTHOR: PAUL WILHELM

~

.'"-l

.global

IFCT

~

::s
:::...

.global

a
S.
..,'"
~

::s

~
C

~

>
FCTSIZE
.DATA
.ros

•IFCn

'"

~
~

N

C

0C

. global

CIEFF

.glObal

ros.TAB

, IIIIERSE FAST rosllE TRANSFORII ENTRY
POINT.
; LOOTH IF ARRAY TO lIE TRANSFORIIED.
, TABLE IF rosl!E ClEFFICIENTS•
, TABLE IF ARRAY MTA TO lIE
TRANSFORIIED.

UF

AR4++IIROIB,RO ,DO SECOIID ADDITIOII.
RI,


END IF lItER nasT LOll'.

::
, LOAD ARRAY SIZE.
, LOAD BLOCK SIZE FIJI CIRCtuIR
AIIORESSIIfl
; LOAD POINTER TO DATA TABLE.
, LOAD POINTER TO rosllE TABLE.
, POINT TO lAST rosl!E YAlI£ IN TABLE.

LQ(P.

END.CENTER:

::

ros.TAB

, TOP IF IllER IIlST LOll'.
, T(f' IF MIDDlE

ADIJF3

START IF BIT REVERSED ADDITIOII LOll' SERIES.

N
VI

END_CENTER

,

MIDDLE:

• text

§
S.

.....

RPTB

ADIF3
LDI
LOI
LOI
ADIF3

A FAST rosllE TRANSFORII (JIIIERSE TRANSFORIII

~

Sl
~

l::i

LSH

tM2++(JROIB
AR3++1 IROIB, 

I¥'I'ElUIlX E2

~
'"is

LDF
UF
ADIF3
STF
ADIF3

LOI
Clf'1

BlED
LSH
SUBI
OR

>IiR3++1 IRIII, -ARl,RS
*AR7,R3,RI

ADDF3
5U11F3

RI, *-Aft4. R3
R4.>-ARl.R4

; BIT REI'ERSED ADDlTIOO Foo 2NO
BUTTEIRY.
,COSINE PII4 TIlES LOWER HAlF IF 1ST
BUTTERFlY.
, COSINE PI/4 TIrES LOWER HAlF IF 2NO
BUTTERFLY.
, BIT REI'ERSED ADDlTIOO All 4TH
BUTTERFLY.
, ADD lfI'ER HAlF IF 1ST WTTERFL Y.
, COSINE PII4 TIlES LOWER HAlF [Ii' 4TH
IlUTTERFLY.
, ADD lfI'ER HAlF [Ii' 2NO IlUTTEIRY•
, SIIITRACT LOWER HAlF [Ii' 1ST
BUTTERFLY.
, tIL TI PLY UPPER HAlF IF 2NO BUTTERFLY
BY COSIIE COO'FIEIENT.
, SlmTRACT LMR HAlF IF 2NO
BUTTERFlY.
, STORE UPPER HAlF [Ii' 1ST IlUTTERFLY.
, STORE LOWER HAlF IF 1ST IlUTTERFLY.
, STORE LMR HAlF [Ii' 2rtI IIUTTERFl Y.
, COSINE PII4 TIlES LOWER HAlF [Ii' 3RD
BUTTERFLY.
;
, tlLTiPLY LOWER HAlF [Ii' 2NO IIUTTERFlY
BY COSINE COEFFICIENT
;
, SUBTRACT LOWER HAlF IF 4TH
IlUTTERFLY.
, ADO lfI'ER HAlF IF 3RD WTTEIRY.
, IIJI.TlPLY ,MR HAlF [Ii' 4TH WTTERFlY
BY COSllE COO'FICIENT
, ADD lfI'ER HAlF [Ii' 4TH IlUTTERFLY.
, SUBTRACT LOWER HAlF [Ii' 3RD
IlUTTERFLY.
, tlLTiPLY lfI'ER HAlF IF 4TH BUTTEIRY
BY COSINE ClEFFICIENT.
, STORE lfI'ER HAlF [Ii' 4TH BUTTERFLY.
, STORE UPPER HALF [Ii' 2NO IIUTTERFL Y•
, STORE UPPER HALF [Ii' 3RD IIUTTERFL Y•


1t'YF3

"'AR7,R3,RI

C

"

C

:t

STF
STF
STF

RI,~4
RO,I/IR2++(iRII%
RS, >-ARl

IV

a

STF
STF

.

EHlI IF CfNTER IlUTTERFLY LOIP.

THIS SERIES IF LOOPS DOES ALL IIUT TI£ LAST WTTERFLY STAGE. ALL TI£
COSIIE COO'FICIENT tlLTlPLICATIOOS ARE lIOI£, 1Ill.III1NG TI£ tlLTIPLICATIOOS All TI£ UlST IlUTTERFLY STAGE. !THIS _
FLOW ALLilIS All
FAST EXECUTIOO.1

, SAllE REPEAT COJITER Fill LATER USE.

,

,
,
,

SUBI
SUBI
lOl
LDF
LDF

2.Aft7
I,Aft4
AR5,RC
tAR7-,RS
1IIR7--,R4

,
,
;
,

lPMTE COSINE COO'FICIENT POINTER.
lPMTE DATA POINTER.
RELOAD REPEAT COJITER.
GET COSINE COEFFICIENTS.

RPTB

EHlI_NTl

, 00 IIUTTERFlIES ARE CAlCWIT£O PER
CYClE _
TI£ INI£R LOIP.

NTLl-COP:

,

,

, STmE l.(Ij[R HAlF IF 4TH BUTTERFLY.
; STmE LOWER HAlF IF 3RD BUTTERFLY.

START NEXT TO UIST LOOP SERIES.

, GET COSllE PII4.

,
,

RI,1IIR4++!lRIlX
R4,1IIR3++!lRIII

..
"
"

5UIIF3

1IIR2,IIIRI,R3

STF
STF

RO,_IIRm
R2,IIIRI++!lRIIX

, SUBTRACT LOWER HAlF [Ii' 2111
IlUTTERFLY.
, ADD UPPER HAlF [Ii' 2111 IlUTTERFL Y•
, tlLTiPLY lfI'ER HAlF IF 2rtI BUTTERFLY
BY COSINE ClEFFICIENT.
, ADD lfI'ER HAlF IF 1ST BUTTEIRY.
, tl.LTlPLY LOWER HAlF [Ii' 2NO WTTERFLY
BY COSINE COO'FlElCENT •
; SUBTRACT LOWER HAlF IF 1ST
BUTTERFLY.
; STORE lfI'ER HAlF [Ii' 2NO IIUTTERFl Y•
, STORE UPPER HAlF IF 1ST IlUTTEIRY.

STF
STF

RI,IIIR4++!IRIlI
R3, tAR2++(iRIlX

, STORE LOWER HAlF [Ii' 1ST IIUTTERFlY.
, STORE LOWER HAlF IF 2NO BUTTERFLY.

5U11F3

1IIR4,IAf/.3,R6

AllllF3
1t'YF3

1IIR4,IAf/.3,R7
RS,R6,RO

ADDF3
1I'YF3

tAR2,IIIRI,R2
R4,R7.RI

,

ElIi..NTL:

"

EHlI IF CfNTER LOIP IF NEIT TO UlST SERIES.
lOl
LDF
LDF

AR5,Re
tAR7-,RS
1IIR7-,R4

Cll'1

AftI,AR6
NTL_LOOP
1IIR4++, >IIR3-, RO

BNED
ADDF3

, RELOAD REPEAT CIUITER.
, GET NEW COSINE COO'FICIENTS. IFYITI£ UlST TIrE, THIS Will FETCH
FRO! IEIIllY BELIJI TI£ COSIIE
TABLE. I
, HAS "IDILE LOIP BEEN C!If'I.£TED ?
, IF NOT, BRAN:H~YEO.
, IUIf( ADOS TO
IE DATA POINTERS.

~

ADIIF3
III

~~

_

§

a
g'

Cll'1

BGED
AIIII3
LSH
LBI

.~
c
()

, SET REPEAT 1tOIl. (STMT/STIP
AIIII\ESSES ARE STIll OOOD,I

lELAY FROIIER£ TO NTL-UU'.

LBI
AIIII3
LSH

~

tM2++, tMl-, RO

OIOOH,ST

lELAYED _

M3,MI
IRI,MI,M3
1,IRI
IRI,MO
NTLUU'
IRO,M3,M4
-1,AR5
AR5,fIt

, lPMTE DATA POINTERS,
lPMTE IIIIEX II£GISTER.
IS THIS LOIP SERIES CM'lETE ?
IF fliT, _
lELAYED.
lPMTE DATA POINTER •
lPMTE REPEAT 1nNTER,

FROIIER£ TO NTLUlOP.

!""l

END IF IEXT TO UIST LOOP SERIES.

~
s-

START IF TI£ UIST LOOP,
TI£ lAST LOOP IS TI£ UIST IIITTERFlY STAIlE WITHWT TI£ rosiNE ClEFFICIOO
IILTlI'lICATlIIIS, 1II1C11 HAI£ AI.READ'I IIEEH 11K,

C
~

LBI
AIIII3
511113
LBI
LSH
SIIII

2,IRI
IRO,M2,M4
IRO,Ml,M3
MO,RC
-2,fIt
I,RC

, INITIALIZE IIIEX II£GISTER.
, INITIALIZE DATA POINTERS.

RPTB

END..lAST-UU'

, 1110 IIITTERFlIES ARE IDE FIll EACH
, CYClE _
TI£ LOOP.

s-

lJIF

_,RO

~

ADIIF3
SUlf3

IM2,tMl,RI
1M2, tMl,R2


, INITIALIZE REPEAT 1nNTER.

R3, tM3-1IR1I

GET I/AU£ Fill l.mER IW.FIF 2IIl
lIUTTERFlY.
AlII II'PER IW.F IF 1ST lIUTTERFlY.
SlllTRACT LOWER IW.F IF 1ST
lIUTTERFlY,
AlII II'PER HALF IF 2IIl lIUTTERFlY.
STlIIE II'PER IW.F IF 1ST lIUTTERFlY,
SImTRACT l.mER IW.F IF 2IIl
lIUTTERFLY,
STlIIE LOWER IMLf IF 1ST lIUTTERFlY,
STlIIE II'PER HALF IF 2IIlllUTTERFlY.

R4,_nRlI

I STIllE l.mER IMLf IF 2111 lIUTTERFlY.

AlllF3

RO,IM3,R3

STF

RI, tMl-lIRlI

stIf3

RO, IM3,R4

STF
STF

R2, tM2++nRlI

EIILlAST-UU"
STF

END IF UIST LOOP, MIl 1 _ COSINE TRMISFORN.

8

II£TS
••nd

Appendix E3. FCT Cosine Tables File

*
*       APPENDIX E3
*
*       FCT COSINE TABLES FILE
*
*       TO BE LINKED WITH FCT SOURCE CODE FOR 32 POINT FCT.
*
*       COEFFICIENTS ARE 1/(2 * COS(N*PI/2/M)), WHERE N IS A NUMBER FROM 1 TO
*       M-1. M IS THE ORDER OF THE TRANSFORM.
*       FOR A 32 POINT FCT, N IS IN THE FOLLOWING ORDER:
*       1, 15, 3, 13, 5, 11, 7, 9,
*       2, 14, 6, 10,
*       4, 12,
*       8
*
*       THE LAST VALUE IN THE TABLE IS 2/M.
*
        .global COS_TAB
        .global M

M       .set    16

        .data
COS_TAB
        .float  0.5024193
        .float  5.1011487
        .float  0.5224986
        .float  1.7224471
        .float  0.5669440
        .float  1.0606777
        .float  0.6468218
        .float  0.7881546
        .float  0.5097956
        .float  2.5629154
        .float  0.6013449
        .float  0.8999762
        .float  0.5411961
        .float  1.3065630
        .float  0.7071068
        .float  0.1250000

        .end

An Implementation of FFT, DCT, and Other Transforms on the TMS320C30

Appendix E4. Data File

*
*       APPENDIX E4
*
*       DATA FILE
*
        .global COEFF

        .data
COEFF   .float  137.0
        .float  249.0
        .float  105.0
        .float  217.0
        .float  73.0
        .float  185.0
        .float  41.0
        .float  153.0
        .float  9.0
        .float  121.0
        .float  233.0
        .float  89.0
        .float  201.0
        .float  57.0
        .float  169.0
        .float  25.0
        .end


Appendix F. Test Vectors, 64-Point Sine Table, Link Command File


0.5598
0.9166

~

~

APPENDIX F1

O.I~

0.7054
0.0178
O.UII
0.1358
0.0503
0.5782
0.2432
0.9448
0.5876
0.7256
0.2849
0.6767
0.8642
0.1943

EXAMPLE OF A 64-POINT VECTOR TO TEST THE FFT ROUTINES
X =

0.2113
0.0824
0.7599
0.0087
0.8096
0.8474
0.4524
0.8075

".4832

0.6135
0.2749
0.8807
0.6538
0.4899
0.7741
0.9626
0.9933
0.8360
0.7469
0.0378
0.4237
0.2M3
0.2403
0.3405
0.1167
0.6250
0.5510
0.3550
0.4943

0.0365
0.2260
O.BI59
0.2284
0.8553
0.0621
Q.7075
').2408
0.6907
0.1062
0.2640
0.7034
O.~I

0.6553
0.9700
0.0380
0.0988
0.2560

64-POINT FFT CORRESPONDING TO VECTOR X
Y =
1l.3774
1.7780 - 2.5584i
-1.0376 - 2.3999i
-1.0123 + 2.4889i
0.6594 + 2.3639i
-1.5228 - O.7527i
-3.BI71 - O.~i
-2.7096 • 1.2841i
2.1622 - 1.6863i
0.2879 + 1.8671i
-1.5479 + 1.6298i
-0.6366 - 0.1I76i
2.2902 + 1.5549i
-2.4837 - 0.5842i
-1.7338 + O.0738i
-0.2180 - 0.4726i
-0.2104 + 0.4IW7i
-1.7473 - 1.02l3i
0.1233 - 2.3915i
-0.6415 - 1.II44i
-2. n19 - O.4802i
-0.0063 - O.3885i
-0.7163 + I.~i
O.32IB - 1.3316i
-0.7823 + 1.0607i
-0.2553 + 2.8270i
-1.0813 - 2.7861i
3.4869 + 1.9485i
3.0352 + 1.3855i
3.2099 + 2.3564i
-1.9511 - 0.7714i
1.8755 + 0.2867i


-1.5474
1.8755
-1.9511
3.2099
3.0352
3.4869
-1.0813
-0.2553
-6.7823
0.3218
-6.7163
-o.OOb3
-2.7719
-0.MI5
0.1233

-1.7473
-0.2104
-0.2180
-1.7338
-2.4837
2.2902
-0. b3bb
-1.5479
0.2879
2.1622
-2.70%
-3.8171
-1.5228
0.6594
-1.0123
-1.0376

1.7780

- 0.2867i
+ 0.7714i
- 2.3564i
- 1.3855i
- 1.9485i
+ 2.7861i
- 2.8270i
- 1.0607i
+ 1.3316i
- 1.5I>82i
+ O.3885i
+ 0.4802i
+ 1.1144i
+ 2.3915i
+ 1.0213i
- 0.4897i
+ 0.4126i
- O.0738i
+ 0.5842i
- 1.5549i
+ 0.1176i
- 1.6298i
- 1.8b71i
+ 1.6863i
- 1.2841i
+ 0.205Oi
+ O.7527i
- 2.3639i
- 2.4889i
+ 2.39'19i
+ 2.5584i

w

.fl ..t

"""

.fl.tt

APPENDIX F2

.fl ..t

FILE TO BE LINKED WITH THE SOURCE CODE FOR A 64-POINT, RADIX-4 FFT.

•N
::t..
;:s

"

~

.fl ..t
.flod

.fl ..t
.flod

~

.float
.f1 ..t

~

.fl ..t
• float

.fl ..t
. float
.fl ••t
.f1 ..t
.f1 ••t
.fl ..t

tl

~

.""3
t:>

..,

~

§

~
c::i

~
g

S~

6

• float

§'

C

64

.Stt

.fl ••t

S

S~

... t

SINE

§

5..

.flOit
.fl ..t
.f10i.t

.11 ..t
.float
.flOi.t

. float
.floit
.flOit
.floit
.floit
.float
.floit
.float

.f1Olt
.float
. float
.f1 ..t
.f1 ••t

.float

.float
.float
.f1 ••t
.f1 ..t

.flod
.float

.float
.f1 ..t
.f1 ..t
.f1 ..t
.flod
.fl ..t
.floit
.f1 ..t
.fl ••t

.float
.f1 ••t
.f1 ..t
.f1 ••t
. f1 ..t

.float

~

Q

.f1 ..t

C

.float

C

0.000000
0.098017
0.195090
0.290285
0.382683
0.471397
0.555570
0.634393
0.707107
0.773010
0.831470
0.881921
0.923880
0.956940
0.980785
0.995185

COSINE

.f1 ..t
.f1 ••t
.f1 ..t
.f1 ••t
.float

~

SINE

.dot.

~

~

.gl.bl
.globl
.gl.bl

• float

.fl ..t

• float

1.000000
0.995185
0.980785
0.956940
0.923880
0.881921
0.831470
0.773010
0.707107
0.634393
0.555570
0.471397
0.382683
0.290285
0.195090
0.098017
0.000000
-0.098017
-0.195090
-0.290285
-0.382683
-0.471397

.float

.float
• float
.flnt

. float
.floit
.f1 ..t
.float
.f10i.t
. float

-0.555570
-0.634393
-0.707107
-0.773010
-0.831470
-0.881921
-0.923880
-0.956940
-0.980785
-0.995185
-1.000000
-0.995185
-0.980785
-0.956940
-0.923880
-0.881921
-0.831470
-0.773010
-0.707107
-0.634393
-0.555570
-0.471397
-0.382683
-0.290285
-0.195090
-0.098017
0.000000
0.098017
0.195090
0.290285
0.382683
0.471397
0.555570
0.634393
0.707107
0.773010
0.831470
0.881921
0.923880
0.956940
0.980785
0.995185


Appendix F3. Link Command File

*
*       APPENDIX F3
*
*       LINK COMMAND FILE
*
*       DO NOT TYPE IN THESE FIRST SEVEN LINES
*

-o 12opt64.out
12fopt.obj
sin64.obj

SECTIONS
{
        .text: {}
        .data: {}
        IN 809800h: { 12fopt.obj(IN) }
        .bss 809C00h: {}
}


Doublelength Floating-Point
Arithmetic on the TMS320C30

Al Lovrich

Digital Signal Processor Products-Semiconductor Group
Texas Instruments


In the past, extended-precision arithmetic has been implemented only on fixed-point
processors. The introduction of the TMS320C30 Digital Signal Processor (DSP), a floating-point 33-MFLOP device, enables us to represent multilength floating-point math in terms
of singlelength floating-point math. Extended-precision arithmetic gives designers
more accuracy in applications such as digital filtering, FFTs, image processing, and control.
This application report describes how to extend the available precision of floating-point arithmetic on the TMS320C30. Our emphasis is on implementing an efficient extension of the available precision while minimizing both the execution time and the memory
usage.
The structure of this report is as follows: The first section describes the TMS320C30
DSP floating-point number representation. The second section discusses doublelength
arithmetic and some basic definitions. The third section discusses the algorithms used, along
with the TMS320C30 implementation. An analysis of the error introduced by the algorithms
is presented in the fourth section. The last section provides an insight into generating C-callable functions from assembly language routines. Finally, the appendix provides the
source listings for the extended-precision arithmetic.

Floating Point Format
The TMS320C30 supports three floating-point formats [1]:
• Short floating-point format, used to represent immediate operands, consisting of a 4-bit exponent and a 12-bit mantissa.
• Single-precision format, used for regular floating-point value representation, consisting of an 8-bit exponent and a 24-bit mantissa.
• Extended-precision format, used with the extended-precision registers, consisting of an 8-bit exponent and a 32-bit mantissa.

For the extended-precision algorithms to work properly on the DSP, it is important
to start from the highest-precision floating-point format available in the system that is
used for basic floating-point operations. The single-precision format is of particular interest in developing the TMS320C30 code for extended-precision floating-point operations. Therefore, a working knowledge of the properties of this format is essential for
the concepts presented in this application report.


In the single-precision format, the floating-point number is represented by an 8-bit
exponent field (e) in two's complement notation, and a two's complement 24-bit mantissa
field (f) with an implied most-significant nonsign bit. Bit 23 of the mantissa indicates the
sign (s), as shown in Figure 1.

[Figure 1: the fields of the single-precision word: exponent e, sign s (bit 23), and fraction f.]
Figure 1. Single-Precision Floating-Point Format of the TMS320C30
Operations are performed with an implied binary point between bits 23 and 22. When
the implied most-significant nonsign bit is made explicit, it is located to the immediate
left of the binary point after the sign bit. We show the implied bit explicitly throughout
this application report for clarity. The floating-point number x is expressed as follows:

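$$
x = \begin{cases}
01.f \times 2^{e} & \text{if } s = 0 \\
10.f \times 2^{e} & \text{if } s = 1 \\
0 & \text{if } e = -128,\ s = 0,\ \text{and } f = 0
\end{cases}
$$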

The range and precision available with the TMS320C30 single-precision floating-point format are illustrated by the following values:

Most Positive:    x = +3.4028234 × 10^+38
Least Positive:   x = +5.8774717 × 10^-39
Least Negative:   x = -5.8774724 × 10^-39
Most Negative:    x = -3.4028236 × 10^+38
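The format description can be turned into a small decoder. The following C sketch converts a 32-bit TMS320C30 single-precision word to a host double, assuming the layout described above: exponent in bits 31-24, sign in bit 23, fraction in bits 22-0. The function name `c30_to_double` is hypothetical; it is not one of the Appendix B programs. The sketch treats every word with e = -128 as zero, which is a simplification.

```c
#include <math.h>
#include <stdint.h>

/* Convert a TMS320C30 single-precision word to a host double.
   Assumed layout: bits 31-24 = exponent e (two's complement),
   bit 23 = sign s, bits 22-0 = fraction f. */
static double c30_to_double(uint32_t w)
{
    int e = (int)(w >> 24);
    uint32_t s = (w >> 23) & 1u;
    uint32_t f = w & 0x7FFFFFu;
    double mant;

    if (e > 127)
        e -= 256;                 /* sign-extend the 8-bit exponent */
    if (e == -128)
        return 0.0;               /* exponent -128 encodes zero */

    /* 01.f for s = 0; 10.f in two's complement (that is, -2 + 0.f) for s = 1 */
    mant = (s ? -2.0 : 1.0) + (double)f / 8388608.0;   /* 8388608 = 2^23 */
    return ldexp(mant, e);
}
```

For instance, `c30_to_double(0x217FFFFF)` yields 17179868160, the value listed as 1.71798682 × 10^10 in the addition example of the next section.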

Doublelength Floating-Point - The Basics
The techniques used to develop doublelength results in this application report require a singlelength floating-point system and arithmetic that satisfy certain conditions.
The TMS320C30 implementation takes the singlelength system as the highest floating-point precision system available. The algorithms presented do not require a doublelength
accumulator with respect to the singlelength system used. The extended-precision formats
available are used to control the truncation or rounding of the single-precision results.
The doublelength arithmetic presented here increases the precision of a given floating-point operation without the need for a doublelength accumulator. Using this method, the
result of a floating-point operation on two single-precision numbers can be determined
exactly. If x and y are two such numbers and the desired operation is addition, the result
can be represented as a pair of floating-point numbers z and zz. The z value represents
the most significant portion of the result, while zz represents the least
significant portion.
As an example, consider the result of the exact addition of two floating-point numbers
x and y that are expressed in the single-precision format of the TMS320C30:

x = 217FFFFFh    (decimal: 1.71798682 × 10^10)
y = 0C7FFFFFh    (decimal: 8.19199951 × 10^3)

The values are represented in the TMS320C30 binary equivalent as follows:

x = 2^33 × 01.111 1111 1111 1111 1111 1111b
y = 2^12 × 01.111 1111 1111 1111 1111 1111b

Addition of two floating-point numbers requires aligning the two variables x and y [1]:

x = 2^33 × 01.111 1111 1111 1111 1111 1111b
y = 2^33 × 00.000 0000 0000 0000 0000 0111 1111 1111 1111 1111 1111 1000b

As can be seen in this example, most of the precision available for y will not be
available to carry out the addition. Maintaining full precision for floating-point addition
requires extra mantissa bits beyond the 24 bits available on the DSP. Since the need for
such precision is rare, software methods are used to represent the result of the operation
as a floating-point number pair (z,zz). In our example, the exact result is represented as
follows:

z  = 2^34 × 01.000 0000 0000 0000 0000 0011b
zz = 2^9  × 01.111 1111 1111 1111 1111 1000b

The corresponding hexadecimal representation of (z,zz) is shown below:

z  = 22000003h    (decimal: 1.71798753 × 10^10)
zz = 097FFFF8h    (decimal: 1.0239995 × 10^3)

Some definitions are basic to the development of concepts in this report. First is
the definition of the floating-point operations over a system R. The system contains all
the possible floating-point numbers that the single-precision format of the TMS320C30
can represent. All the floating-point arithmetic is carried out in base 2. Therefore, R can
be represented as follows on the TMS320C30:

R = { x | x = m(x) 2^e(x), |m(x)| < 2^24, -128 [...]

[...] |x| > |y| must be valid for Equation (4) to be valid. Implementation
of Equation (4) on the TMS320C30 always generates the exact correction term zz if the
result of the floating-point addition operation is made optimal. This requirement guarantees
that the result of single-precision floating-point add and subtract belongs to system R. By
swapping the x and y values when |x| < |y|, the condition for obtaining an exact result
is met.
The algorithm requires that x and y be normalized. Normalization guarantees that
the floating-point number has only one sign bit, and that sign bit is followed by nonsign
bits [1]. Floating-point addition on the TMS320C30 assumes that the operands are normalized.
The TMS320C30 assembly code for obtaining the doublelength sum of two
singlelength floating-point numbers x and y is shown in Appendix A. First, the values
for x and y are interchanged when |x| < |y|. When you add the x and y values, the number
with the smaller exponent, y, is shifted repeatedly until the exponents of x and y are equal
and their mantissas are aligned. We have now calculated the singlelength number, z, that
satisfies Equation (2). Since the floating-point addition on the TMS320C30 is made optimal by rounding, the extra precision is, in effect, dropped. The extra precision value,
zz, is obtained by implementing Equation (4). Figure 2 is a graphical representation of
the implemented algorithm. The figure also shows the relationship between the doublelength
number pair (z,zz) and singlelength floating-point numbers and their representation on
the TMS320C30.

[Figure 2: the exponent fields e(x), e(y) and 24-bit mantissa fields f(x), f(y) of the operands x and y, the aligned sum x + y, and the resulting pair z and zz (normalized).]

Figure 2. Exact Singlelength Addition

The same algorithm can be used to implement exact floating-point subtraction on
the DSP. This is accomplished by negating the second operand and performing an exact
addition.
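The exact addition can be sketched in C for experimentation on a host machine. This is a minimal sketch, assuming IEEE single precision with round-to-nearest stands in for the rounded TMS320C30 addition; it follows the steps named in the comments of the C-callable listing later in this report (z = x + y, w = z - x, zz = y - w). The function name `add12` mirrors the `_add12` label, but the C signature is hypothetical.

```c
/* Exact singlelength addition: *z receives the rounded sum and *zz the
   correction term, so that z + zz equals x + y exactly (barring overflow).
   The volatile temporaries force each result to be rounded to float. */
static void add12(float x, float y, float *z, float *zz)
{
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;
    volatile float s, w;

    if (ax < ay) {              /* swap so that |x| >= |y| */
        float t = x;
        x = y;
        y = t;
    }
    s = x + y;                  /* z = x + y, rounded */
    w = s - x;                  /* w = z - x */
    *z = s;
    *zz = y - w;                /* zz = y - w */
}
```

Calling `add12` with the two operands of the earlier example (217FFFFFh and 0C7FFFFFh, that is 17179868160 and 8191.99951171875) reproduces the pair (z,zz) given there. Exact subtraction follows by passing -y as the second argument.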

Doublelength Addition
A natural extension of exact singlelength addition and subtraction is its application
to doublelength arithmetic. Figure 3 shows an algorithm for implementing doublelength
addition on the DSP. Using this algorithm, you can add two doublelength numbers (x,xx)
and (y,yy) and represent the result as a doublelength number (z,zz).
The algorithm requires forming a doublelength number (r,rr) that represents an exact
addition of x and y. Generating a second number, s = ((rr + yy) + xx), results in
a number pair (r,s) that approximates the addition of (x,xx) and (y,yy). Finally, an exact
addition of r and s generates a doublelength number (z,zz) that has the same value as (x,xx)
+ (y,yy).

To obtain exact results for addition and subtraction, subtraction and addition must
be optimal; this is guaranteed by following each subtraction or addition instruction on
the DSP with a round instruction.


; Calculate the doublelength sum of (x,xx) and (y,yy),
; the result being (z,zz)
r = x + y;
if (abs(x) > abs(y))
    s = x - r + y + yy + xx;
else
    s = y - r + x + xx + yy;
z = r + s;
zz = r - z + s;

Figure 3. Doublelength Addition
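Figure 3 translates almost line for line into C. The sketch below assumes IEEE single precision on the host in place of the TMS320C30 format, and the pair type `dl_t` and the function name `add2` are hypothetical names chosen for illustration.

```c
/* A doublelength value: hi holds the leading part, lo the correction. */
typedef struct { float hi, lo; } dl_t;

/* Doublelength addition after Figure 3. Every intermediate is stored in a
   volatile float so that each operation is rounded to single precision. */
static dl_t add2(dl_t a, dl_t b)
{
    float x = a.hi, xx = a.lo, y = b.hi, yy = b.lo;
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;
    volatile float r, s, z, t;
    dl_t out;

    r = x + y;
    if (ax > ay) {
        s = x - r;  s = s + y;  s = s + yy;  s = s + xx;
    } else {
        s = y - r;  s = s + x;  s = s + xx;  s = s + yy;
    }
    z = r + s;
    t = r - z;
    out.hi = z;
    out.lo = t + s;             /* zz = r - z + s */
    return out;
}
```

Adding (1, 2^-30) to itself, for example, gives (2, 2^-29): the low-order parts survive even though they are far below the single-precision resolution of the leading part.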

Exact Singlelength Multiplication
The exact singlelength multiplication is shown in Figure 4. The algorithm requires
breaking the x and y mantissas into half-length numbers, referred to as head (hx,hy) and
tail (tx,ty) sections [2]. This algorithm requires addition and subtraction to be optimal
and multiplication faithful. The TMS320C30 DSP multiplication result is faithful if the
contents of the extended-precision register are truncated.
To split x and y into two half-length numbers, a constant value is needed that is
dependent on the number of available digits. The TMS320C30 device has t = 24 bits
of mantissa in the single-precision format. Equation (5) shows that head section hx is chosen
to be as near to the value of x as possible.

$$ hx = \mathrm{round}\left(m(x)\,2^{-t_1}\right) 2^{\,e(x)+t_1} \qquad (5) $$

Also, t1 is chosen to be approximately one-half of the available precision, or 12,
on the processor. This effectively breaks the mantissa into half-length values. Equation
(5) shows that hx is obtained by rounding and is defined to be an element of R(t1). The
tail section tx is easily obtained by subtracting hx from x. Since floating-point subtraction
can be made optimal on the TMS320C30, it follows that tx is an element of R(t1 - 1).
Setting the constant equal to 2^12 does not always satisfy Equation (5) when t is even. When
the constant is set to 2^12 + 1, the definition of Equation (5) is satisfied. The proof for
the above is given in Reference [2].


; Calculate the exact product of x and y, the result being
; a doublelength number (z,zz). This algorithm uses the
; following syntax when called from a user program as shown
; mult12 (x,y,z,zz);
p = x × constant;
hx = x - p + p;
tx = x - hx;
p = y × constant;
hy = y - p + p;
ty = y - hy;
p = hx × hy;
q = hx × ty + tx × hy;
z = p + q;
zz = p - z + q + tx × ty;

Figure 4. Exact Singlelength Product
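The splitting step and the product of Figure 4 can be tried on a host machine. The following C sketch assumes IEEE single precision (t = 24, round-to-nearest) in place of the TMS320C30 arithmetic and uses the constant 2^12 + 1 = 4097 discussed above. The C signature of `mult12` is an assumption made for illustration.

```c
/* Exact singlelength product after Figure 4: z + zz equals x * y exactly,
   barring overflow and underflow. Volatile temporaries force each
   intermediate result to be rounded to single precision. */
static void mult12(float x, float y, float *z, float *zz)
{
    const float c = 4097.0f;              /* 2^12 + 1, the constant for t = 24 */
    volatile float p, q, r, hx, tx, hy, ty, zv;

    p = x * c;  hx = x - p;  hx = hx + p;  tx = x - hx;   /* head and tail of x */
    p = y * c;  hy = y - p;  hy = hy + p;  ty = y - hy;   /* head and tail of y */

    p = hx * hy;
    q = hx * ty;
    r = tx * hy;
    q = q + r;                            /* q = hx*ty + tx*hy */

    zv = p + q;                           /* z = p + q */
    r = p - zv;
    r = r + q;
    q = tx * ty;
    *z = zv;
    *zz = r + q;                          /* zz = p - z + q + tx*ty */
}
```

With x = 65535.99609375 and y = -8589935616, the values of the first multiplication example in Table 1 (0F7FFFFFh and 21FFFFFFh), the sketch returns z = -2^49 and zz = -33554428, whose sum is the exact product -562949986975740.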

Doublelength Multiplication
The doublelength multiplication algorithm, shown in Figure 5, relies on the
singlelength algorithm discussed earlier. The algorithm generates a nearly doublelength
approximation of the output result (c,cc). Note that the exact singlelength multiplication
routine is used for this approximation. Exact addition is used to generate a doublelength
floating-point number that is the closest approximation to the actual result.
The doublelength product program implementation uses the TMS320C30 stack
capabilities to save some intermediate variables. These programs are written to be used
as callable functions or macros in your program. In either case, the stack pointer must
be set to a valid memory segment for proper code execution.
; Calculate the doublelength product of (x,xx) and (y,yy)
; the result being a nearly doublelength number (z,zz).
; Program uses exact singlelength multiplication, mult12 (.).
mult12 (x, y, c, cc);
cc = x × yy + xx × y + cc;
z = c + cc;
zz = c - z + cc;

Figure 5. Exact Doublelength Product
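A host-side C sketch of Figure 5 follows. It assumes IEEE single precision instead of the TMS320C30 format, repeats the `mult12` helper so that the block is self-contained, and uses the hypothetical name `mul2` for the doublelength product.

```c
/* Exact singlelength product (Figure 4), constant 2^12 + 1 for t = 24. */
static void mult12(float x, float y, float *z, float *zz)
{
    const float c = 4097.0f;
    volatile float p, q, r, hx, tx, hy, ty, zv;

    p = x * c;  hx = x - p;  hx = hx + p;  tx = x - hx;
    p = y * c;  hy = y - p;  hy = hy + p;  ty = y - hy;
    p = hx * hy;
    q = hx * ty;
    r = tx * hy;
    q = q + r;
    zv = p + q;
    r = p - zv;
    r = r + q;
    q = tx * ty;
    *z = zv;
    *zz = r + q;
}

/* Doublelength product after Figure 5: (z,zz) approximates (x,xx) * (y,yy). */
static void mul2(float x, float xx, float y, float yy, float *z, float *zz)
{
    float c, cc;
    volatile float t1, t2, s, zv, d;

    mult12(x, y, &c, &cc);
    t1 = x * yy;
    t2 = xx * y;
    s = t1 + t2;
    s = s + cc;                 /* cc = x*yy + xx*y + cc */
    zv = c + s;                 /* z = c + cc */
    d = c - zv;
    *z = zv;
    *zz = d + s;                /* zz = c - z + cc */
}
```

Squaring 1 + 2^-23 with zero low-order parts, for example, gives z = 1 + 2^-22 and zz = 2^-46, which a single-precision multiply alone would lose.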


Doublelength Quotient and Square Root
Figures 6 and 7 show the algorithm used in calculating the doublelength quotient
and doublelength square root routines. Singlelength multiplication is used to generate a
doublelength approximation of the quotient or square root values. As with doublelength
multiplication, exact addition is used to generate a doublelength floating-point result.

; Calculates the doublelength quotient of (x,xx) and (y,yy)
; the result being (z,zz)
c = x / y;
mult12(c, y, u, uu);
cc = (x - u - uu + xx - c × yy) / y;
z = c + cc;
zz = c - z + cc;

Figure 6. Doublelength Quotient

; Calculate the doublelength square root of (x,xx), the
; result being (z,zz)
if (x > 0) {
    c = sqrt (x);
    mult12 (c, c, u, uu);
    cc = (x - u - uu + xx) × 0.5 / c;
    z = c + cc;
    zz = c - z + cc; }
else {
    z = zz = 0; }

Figure 7. Doublelength Square Root


Error Analysis
This section discusses and determines an upper bound for the error generated in
forming a doublelength result. The value of the doublelength number (z,zz) is equal to
z + zz. Singlelength addition, subtraction, and multiplication results are always exact.
In doublelength addition, any error introduced in the end result is generated by calculating
the zz term. An upper bound error magnitude has been calculated in Reference [2] and
is shown in Equation (6) as follows:

$$ |E_{+}| \le \left(|x + xx| + |y + yy|\right) \times 2^{2-2t} = |z| \times 2^{2-2t} \qquad (6) $$

where t = 24 for this system. This gives an upper bound of |z| × 2^-46, or approximately |z| × 1.42 × 10^-14. This translates to a theoretical accuracy greater than 13 decimal
places. Table 1 shows an example of doublelength addition using the exact addition
algorithm previously described. The numbers in the left column represent TMS320C30
hexadecimal notation for the floating-point results, and (z,zz) is the decimal equivalent
of the doublelength output result. Appendix B shows a listing of C programs (exact) that
convert from TMS320C30 hexadecimal notation to decimal notation.
Table 1. Exact Singlelength Arithmetic Examples

Singlelength Addition

x  = 217FFFFFh
y  = 0C7FFFFFh    (z,zz) = 17179876351.9995117 (Exact)
z  = 22000003h             17179876351.9995117 (DSP)
zz = 097FFFF8h

x  = FC7CB923h
y  = 0A29A7E5h    (z,zz) = 1357.37010409682989 (Exact)
z  = 0A29ABD8h             1357.37010409682989 (DSP)
zz = EFA46000h

Singlelength Multiplication

x  = 0F7FFFFFh
y  = 21FFFFFFh    (z,zz) = -562949986975740 (Exact)
z  = 30800000h             -562949986975740 (DSP)
zz = 18800002h

x  = FC7CB923h
y  = 0A29A7E5h    (z,zz) = 167.484236862815123 (Exact)
z  = 07277BF7h             167.484236862815123 (DSP)
zz = EBA714F0h

The doublelength product, quotient, and square-root algorithms all have a small
relative error. The upper-bound error magnitude for each is given in Equations (7) through
(9).

$$ |E_{\times}| = \left(|x + xx| \times |y + yy|\right) \times 11 \times 2^{-48} \qquad (7) $$

$$ |E_{\div}| = \left(|x + xx| \,/\, |y + yy|\right) \times 21.1 \times 2^{-48} \qquad (8) $$

$$ |E_{\sqrt{\ }}| = \sqrt{|x + xx|} \times 12.7 \times 2^{-48} \qquad (9) $$

Equation (7) establishes an upper bound of |z| × 3.9 × 10^-14, or approximately
13 decimal digits of accuracy, for doublelength multiplication. Similarly, Equation (8)
establishes an upper bound of |z| × 7.5 × 10^-14 for the doublelength quotient, and
Equation (9) an upper bound of |z| × 4.5 × 10^-14 for the doublelength square root; both correspond to better than 13 decimal digits. Table 2 shows examples for each algorithm discussed, along
with the algorithm output and expected theoretical output.


Table 2. Exact Doublelength Arithmetic Examples

Doublelength Multiplication

x  = 22000000h
xx = 097FFFFEh
y  = 21000001h    (z,zz) = 1.47573996570139475 × 10^20 (Exact)
yy = 097FFFFEh             1.47573996570139427 × 10^20 (DSP)
z  = 43000002h
zz = 2A7FFFFCh

x  = 22000003h
xx = 097FFFF8h
y  = 0A29ABD8h    (z,zz) = 23319450552284.2434 (Exact)
yy = EFA46000h             23319450552284.1250 (DSP)
z  = 2C29ABDDh
zz = 13907DC2h

Doublelength Quotient

x  = 43000002h
xx = 2A7FFFFCh
y  = 2C29ABDDh    (z,zz) = 6328365.08044074177 (Exact)
yy = 13907DC2h             6328365.08044075966 (DSP)
z  = 1641205Ah
zz = FC24BE20h

x  = 22000000h
xx = 097FFFFEh
y  = 21000001h    (z,zz) = 1.99999964237223082 (Exact)
yy = 097FFFFEh             1.99999964237217398 (DSP)
z  = 007FFFFDh
zz = D3400000h

Doublelength Square Root

x  = 2C2BDD00h
xx = 13907DC2h    (z,zz) = 4860114.04539400958 (Exact)
z  = 161451A4h             4860114.04539400712 (DSP)
zz = FB39EF11h

x  = 21000001h
xx = 097FFFFEh    (z,zz) = 92681.9110722252960 (Exact)
z  = 103504F5h             92681.9110722253099 (DSP)
zz = F7BC0784h

Note that the results were obtained using the programs shown in Appendix B. The
C programs were created and compiled on an 80386-based microcomputer running under
MS-DOS 3.3.

How to Generate C-Callable Functions
The source listings for the extended-precision arithmetic presented in Appendix A
are optimized for execution speed and code size. These routines are designed to be used
as macros in a user program environment or, with a few adjustments, as a C function.
This section provides an overview of the TMS320C30 C compiler calling conventions
necessary to create functions that can be added to the C compiler library. You need a
working knowledge of the C language to understand the terminology in this section [4, 5, 6].
The C compiler uses the processor stack to pass arguments to functions, store local
variables, and save temporary values. The C compiler uses two TMS320C30 registers
to manage the stack: the stack pointer (SP) and the frame pointer (AR3).
When a C program calls a function, it must
1. Push the arguments onto the stack,
2. Call the function, and
3. Pop the arguments off the stack,
in that order.
On the other hand, the called C function must perform the following tasks:
1. Set up a local frame by saving the old frame pointer on the stack.
2. Assign the new frame pointer to the current value of stack pointer.
3. Allocate the frame.
4. Save any dedicated registers that the function modifies.
5. Execute function code.
6. Store a scalar value in R0.
7. Deallocate the frame.
8. Lastly, restore the old frame pointer [4].

The following code segment shows the singlelength addition routine modified to be
in C-callable form. Note that registers R4 through R7 and AR4 through AR7 are dedicated
registers used by the compiler. These registers must be saved as floating-point values.

single  .set    0FFh
fp      .set    ar3
x       .set    r0
y       .set    r1
z       .set    r2
zz      .set    r3
w       .set    r4
x1      .set    r2
y1      .set    r3
        .global _add12
        .width  96
        .text
_add12:
        push    fp              ; Save old fp
        pushf   r4
        push    r4
        ldi     sp,fp           ; Point to top of stack
        ldi     *-fp[2],r0      ; Load x into r0
        ldi     *-fp[3],r1      ; Load y into r1
        absf    x,x1
        absf    y,y1
        cmpf    y1,x1           ; |x| > |y| ?
        ldflt   x,x1            ; if not, exchange x and y
        ldflt   y,x
        ldflt   x1,y
        addf3   x,y,z           ; z = x + y
        rnd     z
        subf3   x,z,w           ; Form w = z - x
        rnd     w
        subf3   w,y,zz          ; zz = y - w
        rnd     zz
        pop     r4
        popf    r4
        pop     fp              ; Restore fp
        retsu
        .end
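
The arithmetic performed between the register saves and restores can also be sketched in portable C. This is an illustration of the same exact-addition step (after Dekker [2]), not a translation of the calling sequence. It assumes an IEEE single-precision `float` on the host rather than the TMS320C30 format, and the pointer-based signature is a hypothetical choice made so that both results can be returned.

```c
/* Exact singlelength addition: on return, z + zz == x + y exactly
 * (barring overflow), with z the rounded sum and zz the rounding error.
 * Assumes IEEE single precision with round-to-nearest on the host. */
void add12(float x, float y, float *z, float *zz)
{
    volatile float s, w;   /* volatile: force rounding to single precision */
    float ax = (x < 0.0f) ? -x : x;
    float ay = (y < 0.0f) ? -y : y;

    if (ax < ay) {         /* make |x| >= |y|, as the LDFLT sequence does */
        float t = x;
        x = y;
        y = t;
    }
    s = x + y;             /* z  = x + y */
    w = s - x;             /* w  = z - x */
    *z = s;
    *zz = y - w;           /* zz = y - w */
}
```

A call such as `add12(16777216.0f, 1.0f, &z, &zz)` returns z = 16777216 and zz = 1: the 1 that cannot be represented in the rounded sum is recovered in the correction term.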

Conclusion
This report presented an implementation of extended-precision arithmetic routines
for the TMS320C30 DSP. The programs presented include singlelength floating-point addition, subtraction, and multiplication, which produce exact doublelength results.
Doublelength floating-point addition, subtraction, multiplication, division, and square root
were also presented. The doublelength floating-point routines all had a small relative error that appeared in the correction term zz. However, it has been shown that the accuracy
of the doublelength floating-point result is at least 13 decimal digits. Table 3 is a summary
of information about the routines contained in Appendices A and B. Execution times shown
in the table are given only for the routines in Appendix A. These times do not include
the call and return if the routine is implemented as a called function. They also do not
include any context saves and restores that may be required.
Table 3. Summary Information

Routine                          Mnemonic   Appendix   Code Size   Execution
                                                       (Words)     (Cycles)
Singlelength Add                 _add12     A1         12          12
Doublelength Add                 _dbladd    A2         25          25
Singlelength Multiply            _mult12    A3         35          35
Doublelength Multiply            _mult2     A4         51          51
Doublelength Divide              _div2      A5         115         115
Doublelength Square Root         _sqrt2     A6         163         163
Change Two Single-Precision      C30DBL     B1
TMS320C30 Numbers to One
Double-Precision Result
Change Two Double-Precision      C30DBL2    B2
TMS320C30 Numbers to a
Double-Precision Result

References
[1] Third-Generation TMS320 User's Guide (literature number SPRU031), Texas Instruments, Inc., 1988.
[2] Dekker, T.J., "A Floating-Point Technique for Extending the Available Precision",
    Numer. Math. 18, 1971, pp 224-242.
[3] Linnainmaa, S., "Software for Doubled-Precision Floating-Point Computations",
    ACM Transactions on Mathematical Software, Vol. 7, No. 3, Sept. 1981, pp 272-283.
[4] TMS320C30 C Compiler (literature number SPRU034), Texas Instruments, Inc., 1988.
[5] Kernighan, B.W. and Ritchie, D.M., The C Programming Language, 2nd Revision,
    Prentice-Hall, Englewood Cliffs, New Jersey, 1978.
[6] Kochan, S.G., Programming in C, Second Edition, Howard W. Sams, Indianapolis,
    Indiana, 1988.

Appendix A

Appendix A1. Single Length Add

******************************************************
*  FUNCTION DEF: _add12
*
*  AUTHOR: Al Lovrich  2/21/89
*          Texas Instruments, Inc.
*
*  Entry Conditions:
*      Upon entry (r0,r1) contains (x,y)
*  Exit Conditions:
*      Upon exit (r2,r3) contains (z,zz).
*  Registers Affected:
*      r0, r1, r2, r3, r4
******************************************************
x       .set    r0
y       .set    r1
z       .set    r2
zz      .set    r3
w       .set    r4
x1      .set    r2
y1      .set    r3
        .text
_add12:
        absf    x,x1
        absf    y,y1
        cmpf    y1,x1           ; |x| > |y| ?
        ldflt   x,x1            ; if not, exchange x, y
        ldflt   y,x
        ldflt   x1,y
        addf3   x,y,z           ; z = x + y
        rnd     z
        subf3   x,z,w           ; form w = z - x
        rnd     w
        subf3   w,y,zz          ; zz = y - w
        rnd     zz
        retsu
        .end

Appendix A2. Double Length Add

******************************************************

*  FUNCTION DEF: _dbladd
*
*  AUTHOR: Al Lovrich  2/21/89
*          Texas Instruments, Inc.
*
*  Entry Conditions:
*      Upon entry (r0,r1) contains (x,xx) and
*      (r2,r3) contains (y,yy).
*  Exit Conditions:
*      Upon exit (r4,r5) contains (z,zz).
*  Registers Affected:
*      r0, r1, r2, r3, r4, r5, r6, r7
*  Revision: Original
*  Execution time: 25 cycles
******************************************************
.global

· set
xx
Y
YY

.5Ht
• SC"t

.set
· set

r~
,1
r2
,:J
r4

21

.5<>1

,..5

x1
y1

· set

,6

.set
· set

r1

.set

,.,

.Iexl
dblaeld:
ab'f
ab5f

unpf
I df I I
I elli t
Idll I
I df I I
I dfl I
I 

if not, exchange (x,xx)
and (Y,YY)

...
"

y + yy

.. Y + yy

"

xx

1"",1

156

addf'3
"nd

S,

r ,1

5ubf:l
rnd
acielf3

7,

r, 11

rnd
retsu
.end

zz

Z :::

r ... s

zz

Zl.

S ,12 ,1Z

2'1.

Z -+- S

Appendix A3. Single Length Multiply

******************************************************
*  FUNCTION DEF: _mult12
*
*  AUTHOR: Al Lovrich  2/21/89
*          Texas Instruments, Inc.
*
*  Entry Conditions:
*      Upon entry (r0,r1) contains (x,y)
*  Exit Conditions:
*      Upon exit (r0,r1) contains (z,zz).
*  Registers Affected:
*      r0, r1, r2, r3, r4, r5, r6, r7
*  Revision: Original
*  Execution Time: 35 Cycles
******************************************************
        .global _mult12
single  .set    0ffh
x       .set    r0
y       .set    r1
p       .set    r2
hx      .set    r3
tx      .set    r4
q       .set    r5
hy      .set    r5
ty      .set    r6
z       .set    r0
zz      .set    r1
temp    .set    r7
        .text
_mult12:
        ldf     @constant,temp
        mpyf3   temp,x,p        ; p = x * constant
        andn    single,p        ; fl(*) is faithful
        subf3   p,x,hx          ; hx = x - p
        rnd     hx
        addf3   hx,p,hx         ; hx = x - p + p
        rnd     hx
        subf3   hx,x,tx         ; tx = x - hx
        rnd     tx
        mpyf3   temp,y,p        ; p = y * constant
        andn    single,p        ; fl(*) is faithful
        subf3   p,y,hy          ; hy = y - p
        rnd     hy
        addf3   hy,p,hy         ; hy = y - p + p
        rnd     hy
        subf3   hy,y,ty         ; ty = y - hy
        rnd     ty
        mpyf3   hx,hy,p         ; p = hx * hy
        andn    single,p        ; fl(*) is faithful
        mpyf3   hx,ty,temp      ; temp = hx * ty
        andn    single,temp     ; fl(*) is faithful
        mpyf3   tx,hy,q         ; q = tx * hy
        andn    single,q        ; fl(*) is faithful
        addf3   q,temp,q        ; q = hx * ty + tx * hy
        rnd     q
        addf3   p,q,z           ; z = p + q
        rnd     z
        subf3   z,p,zz          ; zz = p - z
        rnd     zz
        addf    q,zz            ; zz = p - z + q
        rnd     zz
        mpyf3   tx,ty,temp      ; temp = tx * ty
        andn    single,temp     ; fl(*) is faithful
        addf3   zz,temp,zz      ; zz = p - z + q + tx * ty
        rnd     zz
        retsu
        .data
constant:
        .float  4097            ; constant = 2^(24-24/2)+1
        .end
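
The single length multiply splits each operand into a high and a low half with the constant 4097 = 2^(24-24/2)+1, so that every partial product is exact, and then accumulates the halves into a head z and a correction zz. A minimal C sketch of the same sequence (after Dekker [2]) is given below. It assumes IEEE single precision with round-to-nearest on the host instead of the TMS320C30's truncate-and-round instructions, and the pointer-based signature is hypothetical.

```c
/* Exact singlelength multiplication: on return, z + zz == x * y exactly
 * (barring overflow and underflow).
 * Assumes a 24-bit significand and round-to-nearest arithmetic. */
void mult12(float x, float y, float *z, float *zz)
{
    const float c = 4097.0f;   /* 2^(24 - 24/2) + 1 */
    /* volatile forces every intermediate to be rounded to single precision */
    volatile float p, q, t, hx, tx, hy, ty, r, rr;

    p  = x * c;                /* p  = x * constant */
    hx = x - p;                /* hx = x - p        */
    hx = hx + p;               /* hx = x - p + p    */
    tx = x - hx;               /* tx = x - hx       */

    p  = y * c;                /* p  = y * constant */
    hy = y - p;                /* hy = y - p        */
    hy = hy + p;               /* hy = y - p + p    */
    ty = y - hy;               /* ty = y - hy       */

    p  = hx * hy;              /* p = hx * hy                 */
    t  = hx * ty;              /* temp = hx * ty              */
    q  = tx * hy;              /* q = tx * hy                 */
    q  = q + t;                /* q = hx * ty + tx * hy       */

    r  = p + q;                /* z  = p + q                  */
    rr = p - r;                /* zz = p - z                  */
    rr = rr + q;               /* zz = p - z + q              */
    t  = tx * ty;              /* temp = tx * ty              */
    rr = rr + t;               /* zz = p - z + q + tx * ty    */

    *z  = r;
    *zz = rr;
}
```

Because the product of two 24-bit significands fits in a 53-bit double, the result can be checked on a host by comparing `(double)z + (double)zz` with `(double)x * (double)y`.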

HHHHII.IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

f
f

f
f

FOCIl.. IEF .....1t2

~So
~

~

~.

~

Revisionl Origint.l
Execution nltt: :51 cyclts
111111111111111111111111111111111111111111111111111111

f
f

.global
.HI
...1

-"1112
Offh

rO

y
p

.I.t

hx

...1
...1
...1
...1

r1
r2
r3
r4
r'

olngl.

.set

Ix
q
hy
ty

.stt

r6

rO

zz

...1
...1

...

xx
n

...1

::I.

et

•..1

ltopO
I• .,

...1

S·
::t..

§"
::to
"'
(')

§
So

"'

~
....,

.set

•let
•set

r1
r2
r3
r4
r6
r6
r7

•text
Indn
opyf3
..do
oddf
rod
pushl

=

x,n, 1• .,0
; leo,o Xfyy
single, tapO
I I ..p • ytxx
'1,XX, tap

single, ttap
In,o,ltop
I • .,
ltop

I ltop • Xfyy + ylxx

Q
C

1 IUlt12Cx, 'I, c, eel

; hx=x-p+p

subf3
rod

hx,x,tx
Ix

oddf
rod

=x .. constant

teap,cc

; CC-Xfoyy+xxty+cc

cc

:II

C

+ CC
iddf3
rod

CC,CtZ

I Z

=C + cc

z

; (Xfyy + ytxxl

.,yf3
indo

t.IP,Y,P
single,p

.ubl3
rod
Iddl3
rod

p,y,hy
hy
hy,p,hy
hy

; tx=x-hx

; p c Y ..

; hy=y-p

;hy·y-p+p
I ty.y-hy

opyf3

hx,hy,p

; pShxlh)'

Indn

singl.,p

IIfIyf3

hx, Iy, I.op

Indn

singlt,top
Ix,hy,q

litop-hxfly

; qctxthy

i.Rdn

single,q

oddf3
rod

q,ltop,q
q

,q= hx f Iy + Ix f hy

.,yf3

lx, Iy, ltop

II..p=lxfly

andn

slogl',ltop

oddl3
rod

p,q,c

; C

.ubf3
rnd
oddf
rod
oddf
rod

c,p,Cc

I cc·p-c

=- P + q

cc

cc·p-c+q

q,cc

t

cc
ltop,cc
cc

;cc=p-c+q+txfty

te.p

; Xfyy

1 restore wrilbles
brelle:
popl
t

ee=xfyy+xxf'l+CC

+ ytxx

Z,',ZZ

t

zz -= c -

I

zz

Z

zz
ZZ,CC,Zl

:II

C- Z

+

CC

zz

retsu

.dda

constantl
.fl ..1

hy,y,ly
ty

opyl3

.ubf3
rod
oddf3
rod

CORlbnt

f

N

C

;hx-x-p

hx
hx,p,hx
hx

.ubf3
rod

,.,

.... 1t2.
opyf3

p,x,bx

rod
oddf3
rod

; P

zz·c-z+cc

zz=e-z+eC;

i\

iubf3

ttap,x,p

single"
Z

T.xu InstrUHnts, Inc.
Entry Conditions.
Upon entry (rO,rU conti,ils (x,y).
Ind (1'2,1'3) contains (xx.W)'
Exi t CoRdi tioAs:
Upon lXit (rO,l'll centaios h,zz).
Registers Affected:
1'0, rl f 1'2, 1'3, 1'4, r~, 1'6, 1'7

IlUlt12Cx, y, c, cd;
cc=xt-yy+xx-ty+cCJ
z = c + CC;

~

kORstant, ttlP

2121189

AUTIO!' AI LoYl'icb

Algorilhl used'

S-

Idf
opyf3
..do

.end

~7

; constant

=2"'(24-2412)+1

f=
B=
~

~
~

~

t"'"

i=~

~

~

pu.hf

~

s.
~

l·

cl'

a·
;:...

::I.

t

MmO!! AI Lov,ich 2121/89

~

Texas IMtr-.ents, Inc.
f

Entry Conditionsl

t

Exit Condi tioasl

f

Upon ait (rO,rlJ contains (z,u).

t

•

A.gist.rs Affectedl
rO, rJ, "2, r3, r4. r5, 1'6, r7

I

AlgorU..

I Regist.r uud as inputl Rl
Rtgisters IIOdifiedl ROt Rtf R2, R3
Rtgist.r containing Nlult. R4

•illy_f:

uHdl

IlUlt12(c, 1, v, vu),

p••hf
pop

cc=Cx-u-uu+xx-ctyy)/y;
zec+CC;
ZZ • C - Z + cc;

~
;::;.

uh

singl.

~

,

~

Q
c

y

...t
, ..t
••11

IIlc

... t

t.
yl

...t

4
by

ty

zz
xx
yy

•Itt
.Jtt
•Itt
.stt

...t
...t
.Ht

...t

t.op

.Stt
.Stt

t..,1
t ..,2

...t
...t

cc

...t
•••t
...t
.ttxt

.set

.

::a

•-div2'

pu.hl
pu,hl
p••hf

rl,r3
,1

; Y

is

AWel

for lat.r.

; U ••Igoritt. UStl v

= I'll.

Offh
..0
,I
,2
,3

Algi
• ubi
uh
push

,6
..0
.1
,2

popf

I Tht 8 LS8s of RI c••t.i. tho • .,o...t
I of Y.

..0
1,,0
24,..0
rO
rO

I NoM .. haVt -e-l, tht .xpontnt of x[G) •

I NOlI RI ••[0] • 1.0 • 2tt1...-1I.

• NOlI tho it...lio•• btgl ••

,3
,7
,3
,I
,2
,3

opyf3
..d.

rO,r.,r2
linglt,r2

,R2-vtx[Ol

s.brf

2.0,,2
,2
'2 •.0

I R2

,.d
opyl
..dB

zz

....

xx

..0
"'24,rO

• .[0] foralio. gi... tho ..,....t of v.

,4
r4
,5
r5

yy

,I

A few c. .ats on bolndlry conditiolll, If t • -128, then y =- O. Tht
follOlliog .[0] calcullli•• yi.ld. RI' -128 - I • 127 1M tho .lgo'ila
owrftow 11141 satunt. sinc. x[O] is larg•. Thil !Has Nl.50nablt. If 127,
tho RI • -127 - I • -128. Th•••[0] • 0 .ad Ihi ••ill c.... tho llgo,it...
I to yitld uro. Sinc' th. MOtiSN of V is Ilways blblttn 1 thil is llso
t NUoniblt. As I; rtsult, boundary conditions i,'" handltd nt_ticillly ia
• • !'tUo....l. fuhl ...

g

ir

ldf
....f

Exl,ut tho ••p....t of v.

c =- x I y,

S.

,-x/y;
The flOiting-point ft.er y is stored it Rl. Aft.r the cGaputltion is
co.pl.t.d, 1/'1 is aho ItorH in R4.

Upon tntry (rO,rU contlinl (X,f),
aDd (r2,,,3) contains (xx,»,).

t

1 51ve 'J

; HWY"J
, SIV. xx
I SlV.

x

=2.0 -

v ••[0]

I RI ••111 ••[0] • 12.0 - v ••[Oll

linglt,rO

.,yf

rO,rl,r2
siftll',r2

; R2l1ytxUl

••b,f
,.d
opyf

2.0,,2
,2
'2,.0

; R2 - 2.0 - V I xU]

••do

lintlt,rO

; RI

=.[2] •

xIII • 12.0 - v • x[1lI

>

i~

~

g.~
;'

i

~

3:

i'

-

g

.........

rO,.I,.2

IJYf

I R2.

x[21

V"

1M

liltl.,r2
I R2 • 2.0 - v • x121

IJYf
.....

2,0,.2
•2
.2,rO
lingl •• rO

; RI

IffYI

rO,rl,r2

1R2··tx[31

linglt,rO
2.0,.2

; R2 • 2.0 -

.2
r2,rt

I RI • xI4] • xI3] t (2,0 - v 'x1311

lin,l.,rO

, This ai.aizlI ttr... in till lSBI.

.ad

-....,
....
.ad
IffYI

=x(3) •

x[2] t (2.0 -v • x12])

~

:!l
• ... tilt

OQ

~

C

li'

•
f
::So

rO•• 1

; RoUH SillCl

this is- fol1 • •y I fPYF.

cut .,

v <0 !. _lod.

x[4] • 1.0.. 01 •••) I

; R2 • 1.0 - ...14] • 0.0.. 01 ....) 0
I R2 • x14]

I

(1.0 - •

I

x[4])

!Ix

adM3

IDe",.

rl,r2
-t3,r3
1'2,1'1

I Thil ..to <....iti.. 'I....
,11.(0. thinRI'-R1

ldl

1"1,1"4

I .... lIy

yl,x

IJYf

.....

it

p,y,1Iy
IIy
lIy,p,1Iy
IIy

Illy' Y - P

...13
.l1li

lIy.y.ty

I ty.y-lly

IJYf3
ando

"".by.p
liIlt1",

IffYI3

.ad
oddI3

.M

-

_1til(c, 1,

I, U)

x
yl

IIIy'y-p+p

ty
I p' ""

I

IIy

I t .... ·"".ty

be,,,,.

Iq·tx."Y

oddI3
.ad

.11II10.q
q,t..... q
q

I q.""tly+tx'hy

siftJlt,t..,

I ,..,... tx • ty OptrttiH lAd .t... tM .... 11 h t .....
• .,tial..... I ' ,..I.h.. on tIN "'Iet.

tx.ty,t....
.Ingl•• "",

I "",·tx'ty

, •• ,U

; ... ptq

...13

u,p,uu

, uu·p-u

odd'

",,"t

luu·,-u-+q

...

, .... <
I .... lIy

li ..1",

"".ty,t....

.M

Pf.hI
p.dl

tx

1fyf3

I KW X

• ........ i ... l ..

•

.. tx-x- •
;,-ylcoastut

oddI3
.od

1,-xl(l1y)

hx-x-,+,

"""y"

I rt.tore Y
I Nltore x

li.,1.,x

IDe. x -,

I

IJYf3

an...

push'

""

I

Jix,x.tx

IffI3

pop'

P,x,_

I p·xt ....tut

...13
.l1li

....

IItgf
1M
ldfn

popf

~.

C

(x[4]t(l.~(vtxl41111+X14]

I

Nit.... YltiollS

g

a

I R2 • xiS] •

...

S-

~
~

addl

IJYf

i:

~.

I R2 ••

H~I

...,3
.M

J.~f3

1'0,,.1,1'2
.in"I,t2
1.0••2
.2
,0, ..2
.111110••2
..21 rO

...

1\'
I\"

lin.l."

xU]

F.. tM lut It...U . . . . . . tilt I .... l.ti •••
• xI51' (xI4] I (1,0 - (v I x14])JJ + x[4]
Ifyl

tllp,x"

..d•

~M
V I

.....t ••t,t....

1JYf3

•

.u

nil

II to

~

popl
p.pl
p.pl
subl3
••d
.ubl
rod
popl
addf
••d
popf
.pyl
.. d.
subf
••d
;"yl
..d.

""So

~

~

~

S·

OQ

~
i'
~

yl
c
t ..p
U,t_,CC
cc

uU,ce
cc
ttlP
tHP,CC

I'utor' 1/y
restore c
restore x
cc = x - u
;C(=)(-U-UU

; restort xx
;cc·x-u-uu+xx

cc
t ..,
c,t..,

; r-tstortY'f
; c • Y1

sintl.lt.p
top,CC
cc
y1,«(

; cc=x-u-uu+xx-c"'ff
; cc • ( x - u - uu + xx -

lingle,cc

::I.
z=c+cc

f

addf3
.Ad

~.

§
So

f

i

~

Q

-01

; Z

=C + cc

z

lZ=C-Z+CC

"

Q

C,CC,Z

5ubf

Z,Cllt

rod
addf
r.d

((, ZZ

; ZZ

=-

C- Z

zz
,zz=c-z+((

zz

reha
• datI.

(Gostant:
.Il ..t
.tnd

4097

; constlAt

=2. . '24-24121+1

C f

yy I I Y

~

................................. 11 ............ 1111...

•
••
•
•
•

FllCTIIII IEF , _1qI't2
_

Al

Lovrie~

2/21/89

Texas IftltMlMnts, IIC.
Eatry Conditi... '
Upon ..try (.0,,11 coot.l.. (x,xx).
Exit Conditi....
Upeft exit (rO,".) contains btU).
Rtgi5t... Affect...

rOt 1'1, ,.2, ,.3, r4, rS, t6, ,.7
Algor!tllo used'
c· I4rt(x),
..1t12le, c, u. w);
cc· « x - u - .... + xx ) 10.5 I 'I

r·c+cc,
zz·c-z+ ccl

~

S-

1\

!S-

::a

2

~.

~

a'

•
•

Execution TiM' I63CycI ..

.11 .. 111 .. 11 ••••• 1 ....... 11 ...... 1.11.11 •• 11•• 111.1111

•y

p
hx
tx
~

~

ty

u
xx

~

tat

S-

ec

::I.

:1

~

r:)'

g

.•

cI

.,lokl
• lIt
,lit
.Ht

.m

.Ht
.Ht
.Nt
.lIt
.lIt
.lIt
.lIt
.lIt
.lIt
...t
...t
...t
.lIt
...t
• text

.....t2

om
rO
,I
,Z
,3
r4
r5
r5
r6

~

N

C

Q
C

; Ave X
I rounding

; add

hit in tilt

e.pORtAt

Iinglt,rO
,0
,I
-25,,1

I The

e l.S8s .f RI

coat,i' liZ the .",on

• xtOI for ..tI •• gl... the ........ t .f v.
,I
24,,1
,I
,I

I .... ,I

opyf

0.25,.0

; v/2

lAd.

singl.frO

negi
Dh

push
popf

=x{OI =1.0. 2II(.../Z).

_ ...to viZ.

opyf
..do
opyf
..d•
sub,f
,od

....
opyf

u.

tue rounding hi tout.

• c· .Ux)
• Extract tH ..,.ntnt of Y.

)41
rebl.

rO,r3

, Ave Y

poshl

xx

I rotw. if . - . ....-p••iti...
, SlYI' xx

,2,,1
siag't,tl

I ,I • xtll ••tOI • U.5 - (v/2)"{O]"

.ndo
opyf
..do
opyf
lAd•
sob'f
rod
..yf
lAd.

,1,,1,,2
single,r2
.0,,2
.ingle,r2
1.5.,2
,2
r2,rl
singlt,rt

; ,,2 -= )([2] • x[2]

subof

.0

I ,2 • 1.5 - Iv/2) ••tOI ••to]

I ,2' (v/2) • xtll • xtll

,od
opyf

..sq,t2I

I ,Z • (v/Z) ••tOr ••tOI

I ,2 ••tIl • >

1~
~

I ,Z' xtOI • >

S~

~
~
~

<::>

a
<::>

0'1
c..l

.ubl3
rnd
oddl
rnd
oddl
rnd

sin.ll,rO

Save Vlrjlb1.s

pu.bl
Idl

::l'.

oddl3
rnd

rl,rO
r3,"'0

• cc
I

.ingl',4
I..p.q
q

I q-hxtty+tXfby

I ptrfortl tx I ty 09tl'ltion Ind stOrt the result in tHP.
• This is to optilizt use of registers on tilt devic•.

udn

::I.

; tnp·hxfty
, q=Ix'hy

, r2 = (v/21 • x[41 • x[41

()Q

~~.

hx.ly.l....
singl.ftop
tx,by,q

=( x -

Ix. Iy. 110,
singl •• ttlP
p,q,u

,1..... lx.ly

u,p,uu

;uu·p-u

I U· P + q

uu
4,UU

; uu·p-u+q

uu
I.....~u
uu

,uu=p-u+q+tx1ty

u - uu + xx ) .. 0.5 I c

Rltt2(c, c, v, uu)

popl
popl
subl3
rnd

c
ItlP

; rlltore c
; rutore x

u,ttap,cc

; cc = x - u

Idl
lPyf3
.. dn

icoRstaht,tHP
teap,x,p
; p
singlt, P

.ubf3
rnd
oddl
rnd

p,x,hx
bx
p,bx
bx

.ubl3
rnd

I\)(,x, tx
Ix

; tx·x-hx

lPyl3
IndB

teap,Y,P
single,p

; p = y " (Onstlnt

" The floating-point nUlbtr Y is stored in Rl. After the c...,utltion is
f
(OllPlfted, llv is Ilso stored in R4.

subf3
rnd
addl3
rnd

p,y,hy

,hy=y-p

hy
by,p,by
by

,hy=y-p+p

I Register used IS inputl R2
I Atgisttl's Hdi Hed: RO. AI. R2. A3
I Register cOBtlining rtiultl R2

subl3
rnd

hy,y,ty
Iy

,ty=y-hy

"Py13
lndn

hx,hy,p
single.p

,p=hx*hy

=X i

constut

; hx=x-p
;hx=x-p+p

cc

subl

uU,ce

rod
popl
oddl
rnd

cc

pusbl
pusbl

cc

Idl
.b,f

I..p
top,ec

;cc=x-u-uu
; restore )0(
; cc=x-u-uu+xx

cc

r2.r3
r2

; NYt cc
; MY' (

is SI.Ytd for I,ter.
; The II,orilt. uses v III: Ivl.

; Y

i

f

EXt.-.ct tile _ . t .f v.
pushf

"pyf
IIId.

Jubff

r2
rl
-24,,1

pop
uh

; Tht 8 LS8s of RO contlil'l tbl exponent
I

• ><10]

_ti .. gi..R tht exp....t
negi
uh
push

=v •

x[4] • 1.0•• 01 •• -> I

1.0,.0
rO

, RI • 1.0 - v f x[4] - 0.0.. 01 .. , => 0

r.d
opyf
lodn
addf

f1,rO
sinll.,rO

, RI • x[4] • (1,0'- v f ><14])

.o,rl

, RO • x[5] • (x[4]fU.o-(vtx[4])1I+x[4]

; NoM .. hlve -t-l, the expoRfnt of x[O]

rAd

rl,r2

; Round since this is fo 11 owd by

&

tFYF

• Noll tht cu, .f v < 0 is hi.dled.

24,.1
rl
,I

popf
t

I RI

.f v.

rl
I,rl

HIli

of v

rl,r2,rO
.ing).,tO

, Noll RO = x[O]

=1.0 f

negf
ldf
ldf.

2H( ...-II.

r2,.o

r3,,3
rO,r2

; This uts condition flags.
,lfv<14] = x!3]

For Iht lut it.ratioR .. un tilt f ....l.ti •••
xl5l = Ix[4]

singl.,ce

IRI'v o .[2]

singl.,rO

.....,

0.5,ce

indn

addf3
r.d

I RO • x[2] • x[l] 0 (2.0 - v • x[l])

,1,,2,.0

udt

I rt-stort c

; cc • ( )( -

z=c+cc

udt

-

t..p
cc

v .. xU]

opyf
.ndo

,ad

popf
p.pf
"pyf

+ .[4]

0

.fl ..t
••nd
12.0 - v 0 ><13])

4097

I c•••t ..t - 2"(24-24/2)+1

I.

Appendix B

Doublelength Floating-Point Arithmetic on the TMS320C30

165

166

Doublelength Floating-Point Arithmetic on the TMS320C30

~

1\
~

s.

::J
~

~.

cl'

a"

~

::I.

S.

~~.
~

§

s.
"'

~
~

Q
c

1DOg iot IIIRtis5I., sign,
long int exp;

i!-

sigo • )C • 0x00I00000exp=x» 241
•
/f .xp-I28 co......,oftds

to O. txp-l27 is

deno ...l iztd in iNt:

"1fI...5eDt it as O. f/

=

if lexp (. -1271 .It..nIOII

I-"

i-t,
d.!
ptintH'Typo boo C30 hlx _"'n')1
,rintft·x - .),

/1 odd loplild bit ond .ig ....xt.......tis.. 1/

.h.

prtnU(ly =."
>c..fl '%I' .'11) I
xl • c3)totlxt);
x = lion. dou~IIU'lfl ..t IUbllI;
yl • c3Ot.. lyll,
y' lion. doubl.Ullfl ..t ,U'1m;

untissa

/f adjust III.DtilK if it •• "2.0 ./

If l... ti ••1
MOtiSSI -

I

if (op

if 1_lti.. = 51 z • ,..tlxll
prlntW'nz • %.1...•• zll

CbcOO8OOOOO,

>127)

Nturn(O};

/* tot IViI RUllbtr, rttUrft ttror */

/* uk••"PORtDt 127-exctSS Ind return ittt nuMer 1/

t"" '"

1271
... tis. . . I...ti .... 0x007fffffl : Islgn« 81 : Ilxp « 2311
..ttW'ntKRtisll);
)

=~
g.~

7~

~~
~''a

... I'D

8 ::p

...

~~
~

E.

.... 8~.

g.~
~

+ )'1

priDtff·'n\nType 0 to oit, .IH continut :

= OxOIGOOOOOH

'XP++I

if (eplNtita - 3) z = )C • Y;
if 10000tion - 4) z • x I )'1

!ICIIIf 1"Id·••UI
) Ril. U !- 0);

:=- 0x00I00000,

if (signl III.DtilH . . .atitA;

if (operation fa 11 z = )C + YI
if fOffration - 2) z • x - Y;

priatf(·\nz • 1.1... •• tI;

t:d~

/. COJlvtrt Mntissa t. si.n.....gni tu4t fl

do!
".lntWAHCII. Sob121. !!Jy131. DI.141. S4rt1511 'I;
lCUWU· .......tI .. l,
) IIhIlI l.,.ratlol!l II o,.ratlo0>5l,

)C

~

..ntil .. =x • OxOO7fffff;
If Isignl
MDtilS' : - OxffOOOOOO;

ttlftf(IUI,axul

z•

_

)C)

!

llinll
!
Ion. doublt )C, y, ZI
I... int xl. yll
iot it epentiOAI
I..g int c30h.llong inti;

".lltW'.'.Typo In C30 /lox result"n'll
prillltfClz - I),
lcufl"U' .blll
priatf('zz • I),
scuf('U·yilyU.
xl • c30tHlxlll
x • II ... doubl.II'lfl ..t 'lIbml
yl • c30tHlyUI
Y • II .., _111I'lflllt 'lI'1ml

~

loog int c30h.C 101, iat

3/. C3OI8. - '...... tI operlt. 01 t. sihtle-precision . "..5
i. C30 f. . .t IOd producl a d.uble-prtcllitn .esult ./
Ilnclu4t 
lllIeludt 

·.1

I
/. C3OTOE - routint t. c.....t f ... I c30 flooting poiot n _ t. I
_ . in INt forMt. Both loput IOd ..t,ut In hlx••/

SO

;

i

,. C3IIIIIL2 - Pr..... t ••,....t••1 tlIO ....bl......dli ..._ 1
ift C30 fo ...t Old pr"'co • _1e-p..clli ......1t 'f
liocl" C'altto."
lioc1_ 

)

.iaU
(

1..,
loo,
ilt
1...

_10 x. y, "
lot xl, YI. xxi, yyl;
i .....ti ..,
iot c3Ot.. n ... iotl,

J
So

~

~.

~

i·

•
::I.

~
<\
::t.
r'\

g
So
<\

~

~

Q
c

prilt"'y • '),
sW\f("U" ,'rll;
prlltf"" • ').
ICllfl"XX" ,'rYIl,
xl • c30totlxll,
xxi • c30totlxxll,
yl • c30t.. lyll,
yyl • c30totlyyl1,
X' n... 4.ubl''''"l ..t '''..111 +
n... dtubl,,,,"l ..1 '''bKlIl,
y. n... "oI>lo".lfl ..t '1('rlll +
110o. _1''''111 ..t ''''rYII1;
"(

priolfl"A4dUl, Sob121, 1tpy131, Diy141, !vtIS" "I,
ICIIIflWU", ......U..h
) ..il. ' .......ti ••<1 II .,...1100>51,
If I_tien - 11 ••
if ' ....U.. - 2).If I .....ti.. - 31 • if ",...ti.... 41 ••
if I _ t i.. - 51 ••

printf("U

c

%.1... ·,

~

e

long ilt Mltti.H... lip;
1• ., i.t ....1

"

si,. ;; x • 0x00B00000,
•., • x » 24,
/1 ~128 c.....pood.
rlpreHnt it u O. II

n

to O.

~127

is _

i'~

g.~
-~
IP.~

..d ...tiln ./

li'g.

...tiln • x • IlxODIlffl',
if l.i,.1
... ti ... I-

OxffOOOOOOJ

..":== -~~

Pi1"

.1 ..
IIfttiJ51. :. 0x00B00000;
,. c.....t ...U.n to

.i,•

..,.1 I .....,

e.~~
.... 5"

if (silO) IInti ••• 1IRtilHI

Pi1"

/. adj ••t ...ti ... if it . . -2.0 ./

:=

if 1...lis" .. 0.010000001(
Ixp++J

IIInti.A • 0x00B00000,
1
if I.... ) 1271 ..t ...101, /1· too 1.... _ , ret ..........,

Iqrlbtl;

Z),

I,.

.., .. 127;

xl • c30totlxll,
X' nOl, 400blt"'lfl ..t .11..111;
yl • c30t.. lyl1l
Y • n ... _l.II'(f101t '1I'rlll;

... ti .... I ... til... 0x007fflffl : Illgn

rt-tura(MRtiIH),
I

« 81

I IlXp

~

~

9

/...k••.,....t 127...xc... lid ..t ... i........./

ICUf('U',b:I',
".intf"zz • '1,
scufC'U',IayI),

r

. .lill4 I. ill"

if I .... (. -1271 ..t ...101;

'" .dd il(lli04 bit lid .I••_

x + y,
x -)'1
x • YI
x / y,

prl.I"""".Typo in C30 ........111"'"1,
protfC'z •

Z-X+YI

~

(

p.iotll"T1I" tlIO· C30 .... 0".111."1,
prittfC·x· '),
1CIIIf1"U"

,"II,

1

1.... i.t c30t..n .., I.t x,

i-I;
do(

",htfC'xx III 'I;
Icuf("ZX·,bxU.

"CI

/t C3OTOE - routint to cOIWrt frOi • c30 floating poiat n.Hr te •
..... ill I... I ....t. Both ioput oM ..tput 10 ......,

« 231,

S'

~

8 x 8 Discrete Cosine Transform
Implementation on
the TMS320C25 or the TMS320C30

William Hohl

Digital Signal Processor Products-Semiconductor Group
Texas Instruments


Introduction
In the general class of orthogonal transforms, there exists one in particular, the
discrete cosine transform (DCT), that has recently gained wide popularity in signal processing. The DCT has found applications in such areas as data compression, pattern recognition, and Wiener filtering, primarily because of its close comparison to the Karhunen-Loeve
Transform (KLT) with respect to rate distortion criteria [1]. Although the KLT is considered to be optimal, there is no fast algorithm to compute it, which makes the DCT
an attractive alternative.
For image coding, the DCT works well because of the high correlation among adjacent data samples (pixel values). Because of this correlation, the DCT provides near-optimal reduction while retaining high image quality. In a comparative study [2], the DCT
was shown to outperform the Fourier, Hartley, and cas-cas transforms for image compression, providing even more motivation for finding fast implementations.
A number of algorithms have been developed, most notably those of Hou [3] and
Lee [4], which generate higher-order DCTs from lower-order ones. This paper presents
two 8 x 8 DCT routines, one for the TMS320C25 and another for the TMS320C30, based
upon the routine in [3].


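The DCT Algorithm
For a given real data sequence $x_0, x_1, \ldots, x_{N-1}$, the discrete cosine transform is
given in [1] as

\[ z_k = \sqrt{\frac{2}{N}}\,\alpha(k) \sum_{n=0}^{N-1} x_n \cos\left(\frac{\pi(2n+1)k}{2N}\right), \qquad k = 0, 1, \ldots, N-1 \tag{1a} \]

and its inverse is

\[ x_n = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \alpha(k)\,z_k \cos\left(\frac{\pi(2n+1)k}{2N}\right), \qquad n = 0, 1, \ldots, N-1 \tag{1b} \]

where $\alpha(k) = 1/\sqrt{2}$ for $k = 0$ and unity otherwise. If $z_0$ is scaled up
by $\sqrt{2}$, the DCT can also be written in matrix form as

\[ z = \sqrt{\frac{2}{N}}\; T(N)\, x, \tag{2} \]

where x and z are column vectors denoting the input and output data sequences, and T(N)
is the DCT matrix of order N. Expanding the matrix (neglecting the factor of
$\sqrt{2/N}$ for the moment), a 4-point DCT appears as

\[
\begin{bmatrix} z_0 \\ z_2 \\ z_1 \\ z_3 \end{bmatrix} =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
\alpha & -\alpha & \alpha & -\alpha \\
\beta & -\delta & -\beta & \delta \\
\delta & \beta & -\delta & -\beta
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_2 \\ x_3 \\ x_1 \end{bmatrix}
\tag{3}
\]

where $\alpha = 1/\sqrt{2}$, $\beta = \cos(\pi/8)$, and $\delta = \sin(\pi/8)$. Similarly, the 8-point DCT can be
expressed as

\[
\begin{bmatrix} z_0 \\ z_4 \\ z_2 \\ z_6 \\ z_1 \\ z_5 \\ z_3 \\ z_7 \end{bmatrix} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\alpha & -\alpha & \alpha & -\alpha & \alpha & -\alpha & \alpha & -\alpha \\
\beta & -\delta & -\beta & \delta & \beta & -\delta & -\beta & \delta \\
\delta & \beta & -\delta & -\beta & \delta & \beta & -\delta & -\beta \\
\lambda & \mu & -\nu & -\gamma & -\lambda & -\mu & \nu & \gamma \\
\mu & \nu & -\gamma & \lambda & -\mu & -\nu & \gamma & -\lambda \\
\gamma & -\lambda & \mu & \nu & -\gamma & \lambda & -\mu & -\nu \\
\nu & \gamma & \lambda & \mu & -\nu & -\gamma & -\lambda & -\mu
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_2 \\ x_4 \\ x_6 \\ x_7 \\ x_5 \\ x_3 \\ x_1 \end{bmatrix}
\tag{4}
\]

where $\lambda = \cos(\pi/16)$, $\gamma = \cos(3\pi/16)$, $\mu = \sin(3\pi/16)$, and $\nu = \sin(\pi/16)$. Note that
the input is no longer in natural order but has been rearranged according to the permutation
matrix P and the relation

\[ \hat{x} = P x, \tag{5} \]

where

\[
P = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]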


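Upon examination, the matrix $\hat{T}(N)$ in (4), which is the matrix T(N) with the rows and
columns rearranged, can be described more compactly as

\[
\hat{T}(N) = \begin{bmatrix}
\hat{T}\left(\frac{N}{2}\right) & \hat{T}\left(\frac{N}{2}\right) \\
\hat{D}\left(\frac{N}{2}\right) & -\hat{D}\left(\frac{N}{2}\right)
\end{bmatrix}
\tag{6}
\]

since the upper half of the 8-point DCT is exactly the 4-point DCT matrix previously
generated. Using the results obtained in [3], the relationship between $\hat{D}(N/2)$ and
$\hat{T}(N/2)$ is given as

\[ \hat{D}\left(\frac{N}{2}\right) = K\, \hat{T}\left(\frac{N}{2}\right) Q, \tag{7} \]

where

\[ K = R L R^{t}, \]

R being the matrix that performs a bit reversal on the input data; L is the lower triangular
matrix

\[
L = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -2 & 2 & 0 & 0 & 0 & 0 & 0 \\
-1 & 2 & -2 & 2 & 0 & 0 & 0 & 0 \\
1 & -2 & 2 & -2 & 2 & 0 & 0 & 0 \\
-1 & 2 & -2 & 2 & -2 & 2 & 0 & 0 \\
1 & -2 & 2 & -2 & 2 & -2 & 2 & 0 \\
-1 & 2 & -2 & 2 & -2 & 2 & -2 & 2
\end{bmatrix}
\]

and $Q = \mathrm{diag}\left[\cos\left(\left(n + \frac{1}{4}\right)\frac{2\pi}{N}\right)\right]$ for n = 0, 1, ..., 7. The output vector z
is now in bit-reversed order. Signal flow graphs for 2-point, 4-point, and 8-point DCTs
are shown in Figure 1, with the multipliers defined as in (4).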

[Figure 1: signal flow graphs, panels (a) 2-Point, (b) 4-Point, (c) 8-Point]

Figure 1. Signal Flow Graphs for 2-Point, 4-Point, and 8-Point DCTs
The structure of the algorithm looks very much like that of a Fast Fourier Transform
(FFT), since the most fundamental computation is a 2-point butterfly. This routine is actually
a generalized case of the Cooley-Tukey FFT algorithm with the addition of the recursion
at the end. If the equations for the signal flow graph are written explicitly, the recursive
nature of the DCT becomes clear; for a 4-point DCT, we have

\[
\begin{aligned}
\hat{z}_0 &= z_0, \\
\hat{z}_2 &= z_2, \\
\hat{z}_1 &= z_1, \\
\hat{z}_3 &= 2z_3 - \hat{z}_1,
\end{aligned}
\]

and for the 8-point DCT,

\[
\begin{aligned}
\hat{z}_0 &= z_0, \\
\hat{z}_4 &= z_4, \\
\hat{z}_2 &= z_2, \\
\hat{z}_6 &= z_6, \\
\hat{z}_1 &= z_1, \\
\hat{z}_3 &= 2z_3 - \hat{z}_1, \\
\hat{z}_5 &= 2z_5 - \hat{z}_3, \\
\hat{z}_7 &= 2z_7 - \hat{z}_5.
\end{aligned}
\]

To create a unitary transform, each element in the vector should be multiplied by
the scaling factor $\sqrt{2/N}$ for both the forward and inverse transforms. The inverse
transform is obtained by completely reversing the direction of the signal flow graph; i.e.,
performing the bit-reversal first, then the recursions and the butterflies, and finally, the
data permutation.
For the two-dimensional case of interest, the DCT can be described in the form

\[ z(k,l) = \frac{2}{N}\,\alpha(k)\,\alpha(l) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m,n) \cos\left(\frac{\pi(2m+1)k}{2N}\right) \cos\left(\frac{\pi(2n+1)l}{2N}\right) \tag{8a} \]

\[ x(m,n) = \frac{2}{N} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \alpha(k)\,\alpha(l)\,z(k,l) \cos\left(\frac{\pi(2m+1)k}{2N}\right) \cos\left(\frac{\pi(2n+1)l}{2N}\right) \tag{8b} \]

where $\alpha(k) = 1/\sqrt{2}$ for k = 0, unity otherwise. Like the FFT, the DCT kernel is
separable, allowing the transform to be performed in two steps, first along the rows and
then the columns.
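
The separability can be illustrated with a direct, unoptimized C sketch for N = 8: a 1-D transform following (1a) is applied to every row and then to every column, which yields the 2-D transform of (8a). This is only a reference computation for checking results, not either of the routines described in this report. The names `C16`, `cos16`, `dct8`, and `dct8x8` are hypothetical, and the cosine values are tabulated so that no math library is needed.

```c
#define N 8

/* cos(m*pi/16) for m = 0..8 */
static const double C16[9] = {
    1.0,                0.9807852804032304, 0.9238795325112867,
    0.8314696123025452, 0.7071067811865476, 0.5555702330196022,
    0.3826834323650898, 0.1950903220161283, 0.0
};

/* cos(m*pi/16) for any m >= 0, using the symmetries of the cosine */
static double cos16(int m)
{
    m %= 32;                                  /* period 2*pi            */
    if (m > 16)
        m = 32 - m;                           /* cos(2*pi - t) = cos(t) */
    return (m > 8) ? -C16[16 - m] : C16[m];   /* cos(pi - t) = -cos(t)  */
}

/* 1-D 8-point DCT, equation (1a) */
void dct8(const double x[N], double z[N])
{
    int k, n;
    for (k = 0; k < N; k++) {
        double sum = 0.0;
        for (n = 0; n < N; n++)
            sum += x[n] * cos16((2 * n + 1) * k);
        /* sqrt(2/N) = 0.5 for N = 8; alpha(0) = 1/sqrt(2), else 1 */
        z[k] = 0.5 * (k == 0 ? C16[4] : 1.0) * sum;
    }
}

/* 2-D 8x8 DCT, equation (8a), computed as rows first, then columns */
void dct8x8(double x[N][N], double z[N][N])
{
    double tmp[N][N], col[N], out[N];
    int i, j;

    for (i = 0; i < N; i++)                   /* transform each row */
        dct8(x[i], tmp[i]);

    for (j = 0; j < N; j++) {                 /* transform each column */
        for (i = 0; i < N; i++)
            col[i] = tmp[i][j];
        dct8(col, out);
        for (i = 0; i < N; i++)
            z[i][j] = out[i];
    }
}
```

For a block in which every sample equals 1, only z(0,0) is nonzero and equals 8, which follows from (8a) as (2/8)(1/2)(64).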


Implementation on the TMS320C25
The DCT algorithm may be carried out in one of two ways, either using

1. A matrix formulation, where the DCT coefficients are simply multiplied by
the data, or
2. The signal flow graph.

This routine uses a matrix formulation, which requires the sixty-four cosine
coefficients to be stored in an array in memory. The matrix formulation is based on the
following equation:

\[
\begin{bmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \\ z_6 \\ z_7 \end{bmatrix} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\lambda & \gamma & \mu & \nu & -\nu & -\mu & -\gamma & -\lambda \\
\beta & \delta & -\delta & -\beta & -\beta & -\delta & \delta & \beta \\
\gamma & -\nu & -\lambda & -\mu & \mu & \lambda & \nu & -\gamma \\
\alpha & -\alpha & -\alpha & \alpha & \alpha & -\alpha & -\alpha & \alpha \\
\mu & -\lambda & \nu & \gamma & -\gamma & -\nu & \lambda & -\mu \\
\delta & -\beta & \beta & -\delta & -\delta & \beta & -\beta & \delta \\
\nu & -\mu & \gamma & -\lambda & \lambda & -\gamma & \mu & -\nu
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
\tag{7}
\]

where $\lambda = \cos(\pi/16)$, $\gamma = \cos(3\pi/16)$, $\mu = \sin(3\pi/16)$, and $\nu = \sin(\pi/16)$.

The algorithm described above has been shown to be numerically stable for fixed-point processors; however, to prevent serious data errors, truncation and roundoff must
be accounted for. A roundoff technique similar to the one in [6] is used to prescale the
matrix coefficients by $(2^{15} - 1)$. This product is then loaded into the accumulator with
a one-bit left shift, effectively dividing it by $2^{15}$. After a multiplication is performed, the
32-bit value in the accumulator must be rounded to sixteen bits, where bits 13, 14, and
15 are used to determine the value of the sixteenth bit. The TMS320C25 performs this
operation in a single instruction by adding 3000h to the accumulator product with a one-bit left shift, as outlined in the code shown in Figure 2.


*
*       INITIALIZE MATRIX COEFFICIENTS AND ROUNDOFF VALUES INTO
*       INTERNAL BLOCK 0
*
DCTINI  LDPK    RNDOFF
        RSXM                    SIGN-EXTENSION MODE
        SPM     1               LEFT SHIFT 1 BIT
        LRLK    AR1,COEFF       COEFFICIENTS
        RPTK    EDATA-IDATA
        BLKP    IDATA,*+
        LRLK    AR1,RNDOFF      VARIABLES
        RPTK    10
        BLKP    EDATA,*+
*
*       SECOND SET OF COEFFICIENTS
*
        LAR     AR1,DST         AR1 IS NOW DESTINATION
        MAR     *+,AR2          POINTER
        LAR     AR2,SRC         WORK ON SECOND COLUMN
        LARK    AR3,7
        LT      *+,AR2
        MPY     C10
T2      ZAC
        RPTK    6
        MAC     C11,*+
        LTA     *+,AR1
        MPY     C10
        ADD     RNDOFF
        SACH    *0+,AR3
        BANZ    T2,*-,AR2
*

Figure 2. TMS320C25 Code for Roundoff Routine


After the multiplications are computed, the results are stored in another array area
in transposed order; thus, a separate routine for transposing the matrix is not needed. Once
the rows are transformed, the pointers for the input and output matrices are exchanged.
When the procedure is repeated, the output is stored as rows, completing the transform.
Appendix A contains a complete program listing for the forward transform on the
TMS320C25. To perform an inverse DCT, the table of cosine coefficients should be
replaced with those used for an inverse transform.
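The two-pass scheme can be sketched as follows. Each pass transforms the rows of the source array and writes every result vector as a column of the destination array, after which the two array references are exchanged. The direct 1-D transform and its orthonormal scaling are assumptions of this sketch (the TMS320C25 routine uses its own fixed-point scaling), and all names are illustrative.

```python
import math

N = 8

def dct_1d(vec):
    """Direct 8-point DCT; the orthonormal scaling is an assumption here."""
    out = []
    for k in range(N):
        ck = math.sqrt(0.5) if k == 0 else 1.0
        s = sum(vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for n in range(N))
        out.append(0.5 * ck * s)
    return out

def dct_2d(block):
    """2-D DCT by two row passes, each storing its output transposed."""
    src = [row[:] for row in block]
    dst = [[0.0] * N for _ in range(N)]
    for _ in range(2):
        for r in range(N):
            coeffs = dct_1d(src[r])
            for k in range(N):
                dst[k][r] = coeffs[k]   # store the row result as a column
        src, dst = dst, src             # exchange input and output pointers
    return src

if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    print(round(dct_2d(flat)[0][0], 6))
```

Because the transposition happens during the store, no separate transpose pass over the 64 values is required.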

Implementation on the TMS320C30
The TMS320C30's increased speed and flexible addressing modes can reduce
execution time substantially. In using the FFT-like structure, extraneous multiplications
are removed, and because of the TMS320C30's ability to perform parallel
multiplication/additions, two butterflies can be computed at once. After an initial subtraction
is done, the coefficient multiplication can be executed in parallel with the addition of the
data. The TMS320C30's floating-point capability eliminates not only the problems of
roundoff error associated with fixed-point processors but also the need for any truncation
routines.
Because the DCT size is fixed to eight points, there are only four locations that need
exchanging; this allows for a fast bit-reversal of the data. When using the TMS320C30's
extended-precision registers for temporary storage, the transfers can be done in-place.
These data transfers are also done in parallel, since two load or store operations can be
performed simultaneously. The code for performing the bit reversal is shown in Figure
3 below.

*
*   CORRECT ORDER FROM BIT REVERSED TO NATURAL
*
BITREV  LDF     *AR0,R0         ; ONLY FOUR LOCATIONS ARE
||      LDF     *-AR2,R1        ; ACTUALLY SWITCHED
        STF     R1,*AR0
||      STF     R0,*-AR2
        LDF     *AR1,R0
||      LDF     *-AR3,R1
        STF     R1,*AR1
||      STF     R0,*-AR3

Figure 3. TMS320C30 Code for Bit Reversal
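The claim that only four locations move can be checked with a short sketch. For an eight-point transform the index has three bits, and reversing them leaves indices 0, 2, 5, and 7 in place. The helper names below are illustrative.

```python
def bit_reverse(index, bits=3):
    """Reverse the low `bits` bits of index."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev

def swap_pairs(bits=3):
    """Index pairs that must be exchanged; fixed points are skipped."""
    size = 1 << bits
    return [(i, bit_reverse(i, bits)) for i in range(size)
            if i < bit_reverse(i, bits)]

def to_natural_order(data):
    """Reorder an eight-element list in place from bit-reversed to natural."""
    for i, j in swap_pairs(3):
        data[i], data[j] = data[j], data[i]
    return data

if __name__ == "__main__":
    print(swap_pairs())
    print(to_natural_order([0, 4, 2, 6, 1, 5, 3, 7]))
```

Two exchanges cover the whole reordering, which matches the two load pairs and two store pairs in Figure 3.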


Because of the amount of data shuffling that occurs, an eight-word scratch-pad vector
has been created with four permanent pointers set up at every other memory location.
This allows access to each element in the vector (by predecrement or preincrement
addressing) without requiring constant alteration of one or two pointer locations. Although
there is no overhead for looping on the TMS320C30, straight-line coding is used as much
as possible to increase performance.
You can transpose the DCT matrix in the same way as in the TMS320C25
implementation: namely, store the transformed row vector as a column vector in another
matrix and interchange the input and output pointers.
The complete routines for the forward and inverse transforms are given in Appendix B.

Results
The execution times and memory requirements for the two routines are given in
Table 1. For the TMS320C30 implementation, the forward transform contains the scale
factor of ~, so the transform is not unitary. When the signal flow is reversed,
instructions accumulate and the time required to perform the inverse transform actually
increases (see Table 1). This increase occurs because certain multiplications cannot be
performed in parallel with another instruction. The two times are identical on a TMS320C25
because it uses a matrix routine to compute the transform.

Table 1. Execution Times and Memory Requirements

Device       Transform    Memory Required                Time Required
                          Data           Program         (µs)
TMS320C25    forward      232 words*     203 words       257.3
TMS320C25    inverse      232 words      203 words       257.3
TMS320C30    forward      148 words**    136 words        99.4
TMS320C30    inverse      155 words      136 words       107.9

*  TMS320C25 word lengths are 16 bits
** TMS320C30 word lengths are 32 bits


Summary
Two routines for a two-dimensional Discrete Cosine Transform are presented: one
for the TMS320C25 and one for the TMS320C30, with a development of the algorithm
given for clarification. This report also discusses the similarities of the DCT to the
Cooley-Tukey FFT algorithm and arithmetic shortcuts that can reduce the DCT's execution
time. Although these implementations use the most recent formulation, there is still room
for investigation into more efficient methods. Another approach that might prove fruitful
is to deal with the entire 8 x 8 array all at once, as suggested by Haque [7], rather than
transforming the array by rows and columns. However, both routines given in the
appendices provide fast, numerically stable solutions for applications requiring the DCT.

Acknowledgements
The author thanks Steve Ford for supplying the original code for the TMS320C25
implementation. Francois Charlot helped in modifying the code for the TMS320C25, as
well as in preparing this manuscript. Daniel Chen improved the performance of the code
for both the TMS320C25 and the TMS320C30.

References
[1] Ahmed, N., Natarajan, T., and Rao, K.R. "Discrete Cosine Transform," IEEE
    Transactions on Computing, vol. C-23, pp. 90-93, January 1974.
[2] Perkins, M. "A Comparison of the Hartley, Cas-Cas, Fourier, and Discrete
    Cosine Transforms for Image Coding," IEEE Transactions on Computing, vol.
    36, pp. 758-760, June 1988.
[3] Hou, H.S. "A Fast Recursive Algorithm for Computing the Discrete Cosine
    Transform," IEEE Transactions on ASSP, vol. ASSP-35, No. 10,
    pp. 1455-1461, October 1987.
[4] Lee, B.G. "FCT - A Fast Cosine Transform," Proceedings of 1984 Conference
    on ASSP, pp. 28.A.3.1-28.A.3.4, March 1984.
[5] Jayant, N.S., and Noll, P. Digital Coding of Waveforms, New York,
    Prentice-Hall, 1984.
[6] Srinivasan, S., Jain, A.K., and Chin, T.M. "Cosine Transform Block Codec for
    Images Using the TMS32010," Proceedings of IEEE ISCAS '86, Cat. No.
    86CH2255-8, vol. 1, pp. 299-302.
[7] Haque, M.A. "A Two-Dimensional Fast Cosine Transform," IEEE Transactions
    on ASSP, vol. ASSP-33, pp. 1532-1539, December 1985.


Appendix A. TMS320C25 Program Listing

8 x 8 2-D DCT subroutine for the TMS320C25. This routine performs a two-dimensional
DCT on eight-bit image data and normalizes the data to minimize truncation and roundoff.

[Program listing: the instruction sequence is not legible in this copy. It consists of the
reset branch to DCTINI, the initialization of the coefficients and variables, and eight
blocks of the form shown in Figure 2, one for each set of coefficients C00 through C77.
The legible data definitions are given below.]

Coefficient constants (Q15 format):

        5792 = (1/4) * 2**(-1/2)
        1598 = (1/4) * SIN(PI/16)
        4551 = (1/4) * SIN(3PI/16)
        6811 = (1/4) * COS(3PI/16)
        8034 = (1/4) * COS(PI/16)
        3134 = (1/4) * SIN(PI/8)
        7568 = (1/4) * COS(PI/8)
        12288 = ROUNDOFF FACTOR

Data memory allocation:

        COEFFS,64       ; DCT COEFFICIENTS (GOES INTO B0)
        PICT,64         ; PICTURE
        RESULT,64       ; RESULT, AFTER DCT
        RNDOFF,1        ; ROUNDOFF FACTOR
        SRC,1           ; SOURCE ADDRESS FOR CURRENT DCT LOOP
        DST,1           ; DESTINATION ADDRESS
        C_00,1          ; C00 COEFFICIENT
        C_10,1          ; C10 COEFFICIENT
        C_20,1          ; C20 COEFFICIENT
        C_30,1          ; C30 COEFFICIENT
        C_40,1          ; C40 COEFFICIENT
        C_50,1          ; C50 COEFFICIENT
        C_60,1          ; C60 COEFFICIENT
        C_70,1          ; C70 COEFFICIENT

Appendix B. TMS320C30 Program Listings

2-D Discrete Cosine Transform, (8 x 8), Version 1.0. This program is based on a recent
algorithm proposed by H.S. Hou (Transactions on ASSP, vol. ASSP-35, no. 10, October
1987, pp. 1455-1461). The input matrix is stored in RAM, and the results are stored in
the same location.

[Program listings for the forward and inverse transforms: the instruction sequences are not
legible in this copy. The legible section comments are: transpose the rows into columns;
shuffle the data according to permutation matrix P; first set of butterflies; second group of
butterflies; last set of butterflies; continue with recursion; correct order from bit reversed
to natural; multiply by 1/SQRT(2) and store the result. The inverse routine begins with the
recursion and corrects the order from natural to bit-reversed.]

Data memory allocation:

        INP,64          ; HOLDS INPUT MATRIX
        OUT,64
        SCR,8           ; SCRATCHPAD MEMORY

Constants loaded by the forward routine: 0.25 and 2.0.
> λmin, the settling time is limited by the
slowest mode, λmin. Figure 8 shows the relaxation of the mean square error from its initial value ξ0 toward the optimal value ξmin.
Adaptation based on a gradient estimate introduces noise into the weight vector and
therefore causes a loss in performance. This noise in the adaptive process causes the
steady-state weight vector to vary randomly about the optimum weight vector. The accuracy
of the weight vector in the steady state is measured by the excess mean square error
(excess MSE = E[ξ - ξmin]). The excess MSE in the LMS algorithm [1] is

\[
\text{excess MSE} = u \, \mathrm{Tr}[R] \, \xi_{\min} \qquad (14)
\]

where ξmin is the minimum MSE in the steady state.
Equations (13) and (14) yield the basic trade-off of the LMS algorithm: to obtain
high accuracy (low excess MSE) in the steady state, a small value of u is required, but
this will slow down the convergence rate. Further discussions of the characteristics and
properties of the LMS algorithm are presented in [1, 3 through 9]. The implementations
of the LMS algorithm with the TMS320C25 and TMS320C30 are presented next.
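The steady-state side of this trade-off is a one-line calculation. The sketch below evaluates Equation (14) for a hypothetical 2 x 2 input correlation matrix; the matrix, the minimum MSE, and the step sizes are made-up illustration values, not figures from this report.

```python
def excess_mse(u, R, xi_min):
    """Excess MSE of the LMS algorithm per Equation (14): u * Tr[R] * xi_min."""
    trace = sum(R[i][i] for i in range(len(R)))
    return u * trace * xi_min

if __name__ == "__main__":
    R = [[1.0, 0.5],
         [0.5, 1.0]]      # hypothetical input correlation matrix
    xi_min = 0.01         # hypothetical minimum MSE
    for u in (0.1, 0.05, 0.025):
        print(u, excess_mse(u, R, xi_min))
```

Halving u halves the excess MSE, at the cost of the slower convergence described above.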

Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30

[Plot: mean square error (x 10^-3, 0 to 30) versus iteration (0 to 256); initial w0 = 0.2, w1 = -1.0]

Figure 8. Learning Curve of an Adaptive Transversal Filter and an LMS
Algorithm with Different Step Sizes

Since u*e(n) is constant for the N weight updates, the error signal e(n) is first multiplied
by u to get ue(n). This constant is computed once and then multiplied by x(n-i) to update each weight wi(n). An implementation method of the LMS algorithm in Equation (10) is illustrated
as
ue(n) = u*e[n];
for (i=O; i = 0
= wi(n) - ue(n) , if x(n-i) <0
for i = 0, 1, ..., N - 1. Since the sign of x(n - i) must be determined inside the adaptation
loop, slower throughput is expected. The total number of instruction cycles needed is 11N + 26 for the TMS320C25 and 5N + 16 for the TMS320C30.
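The update rule just described, which adds or subtracts ue(n) according to the sign of x(n - i), can be written compactly. This is a minimal floating-point sketch; the function name and the list layout (x[i] holding x(n - i)) are assumptions made for illustration.

```python
def sign_data_lms_update(w, x, e, u):
    """One weight update in which only the sign of each data sample is used.

    w: weights w_i(n), i = 0..N-1
    x: delay line, x[i] = x(n - i)
    e: error sample e(n)
    u: step size
    """
    ue = u * e                      # computed once for all N weights
    return [wi + ue if xi >= 0 else wi - ue for wi, xi in zip(w, x)]

if __name__ == "__main__":
    print(sign_data_lms_update([0.0, 0.0], [0.3, -0.7], 2.0, 0.25))
```

The per-weight sign test in this loop is the decision that costs extra cycles on a pipelined processor.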
Finally, the sign-sign LMS algorithm is

\[
\underline{w}(n+1) = \underline{w}(n) + u \, \mathrm{sign}[e(n)] \, \mathrm{sign}[\underline{x}(n)] \qquad (27)
\]

which requires no multiplications at all and is used in the CCITT standard for ADPCM
transmission. As the above equations show, the number of multiplications is
reduced. This simplified LMS algorithm looks promising and is designed for VLSI or
discrete IC implementation to save multiplications.
The sign-sign LMS algorithm can be implemented as
    for (i = 0; i < N; i++) {
        if (e[n] >= 0.) {
            if (xn[i] >= 0.)
                wn[i] += u;
            else
                wn[i] -= u;
        }
        else {
            if (xn[i] >= 0.)
                wn[i] -= u;
            else
                wn[i] += u;
        }
    }

When this algorithm is implemented on the TMS320C25 and TMS320C30, which have a
pipelined architecture and a parallel multiplier, the performance of the sign-sign LMS
algorithm is poor compared to the standard LMS algorithm. Determining the sign of the data
can break the instruction pipeline and severely reduce the execution speed of the processors.
In order to avoid double branches inside the loop, the XOR instruction is used
to check the sign bits of e(n) and x(n-i). The sign-sign LMS algorithm can be implemented
as
    wi(n+1) = wi(n) + u ,  if sign[e(n)] = sign[x(n-i)]
            = wi(n) - u ,  otherwise

The following TMS320C25 instruction sequence implements this algorithm without
branching (assuming that the current address register used is AR3):

        LRLK    AR1,N-1         ; Set up counter
        LRLK    AR2,COEFFD      ; Point to wi(n)
        LRLK    AR3,LASTAP+1    ; Point to x(n-i)
ADAP    LAC     *-,0,AR2        ; Load x(n-i)
        XOR     ERR             ; XOR with e(n)
        SACL    ERRF            ; Save sign bit, sign = 0 if same signs
                                ; Sign = 1 if different signs
        LAC     ERRF            ; Sign extension to ACCH,
                                ; ACCH = 0 if ERRF >= 0
                                ; ACCH = 0FFFFh if ERRF < 0
        XORK    MU,15           ; Take one's complement of MU
                                ; if sign = 1
        ADD     *,15            ; Weight update
        SACH    *+,1,AR1        ; Save new weight
        BANZ    ADAP,*-,AR3

The one's complement of u is used instead of -u, because the two differ only slightly
and the step size does not need to be exact. The weight update with
this technique requires 10N instruction cycles and FIR filtering requires N instruction cycles,
so the total number of instruction cycles needed is 11N + 21. The complete TMS320C25
assembly program is given in Appendix F1.
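The sign test and the one's complement substitution can be illustrated with 16-bit integer arithmetic. The sketch below is an approximation of the idea only: it XORs the two operands, turns the resulting sign bit into an all-zeros or all-ones mask, and XORs the step size with that mask. It does not reproduce the shifts of the TMS320C25 sequence, and the names are illustrative.

```python
MASK16 = 0xFFFF

def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value

def sign_sign_step(u, e, x):
    """Return u if e and x have the same sign bit, else the one's complement of u.

    No branch is taken: the sign of (e XOR x) selects a mask of 0 or 0xFFFF.
    """
    sign_diff = ((e ^ x) & MASK16) >> 15    # 0 if same signs, 1 if different
    mask = (-sign_diff) & MASK16            # 0x0000 or 0xFFFF
    return to_signed16((u & MASK16) ^ mask)

if __name__ == "__main__":
    print(sign_sign_step(100, 5, 7), sign_sign_step(100, -5, 7))
```

For a step size of 100 the "negative" step comes out as -101 rather than -100, which is the slight difference the text refers to.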
Determining whether a positive or negative u should be used without branching
is trickier on the TMS320C30. Fortunately, the extended-precision registers of the TMS320C30
interpret the 32 most-significant bits of the 40-bit data as a floating-point number and
the 32 least-significant bits of the 40-bit data as an integer. When a floating-point number
changes its sign, its exponent remains the same. Therefore, the sign of the step size u can
be determined by using XOR logic on its mantissa. The following code shows how the
sign-sign LMS algorithm is implemented on the TMS320C30.

        ASH     -31,R7                  ; R7 = Sign[e(n)]
        XOR3    R0,R7,R5                ; R5 = Sign[e(n)] * u
        LDF     *AR0++(1)%,R6           ; R6 = x(n)
        ASH     -31,R6                  ; R6 = Sign[x(n-i)]
        XOR3    R5,R6,R4                ; R4 = Sign[x(n-i)]*Sign[e(n)] * u
        ADDF3   *AR1,R4,R3              ; R3 = wi(n) + R4

        LDI     order-3,RC              ; Initialize repeat counter
        RPTB    SSLMS                   ; Do i = 0, N-3
        LDF     *AR0++(1)%,R6           ; Get next data
||      STF     R3,*AR1++(1)%           ; Update wi(n+1)
        ASH     -31,R6                  ; Get the sign of data
        XOR3    R5,R6,R4                ; Decide the sign of u
SSLMS   ADDF3   *AR1,R4,R3              ; R3 = wi(n) + R4

        LDF     *AR0,R6                 ; Get last data
||      STF     R3,*AR1++(1)%           ; Update wN-2(n+1)
        ASH     -31,R6                  ; Get the sign of data
        XOR3    R5,R6,R4                ; Decide the sign of u
        ADDF3   *AR1,R4,R3              ; Compute wN-1(n+1)
        STF     R3,*AR1++(1)%           ; Store last w(n+1)

Here, R0, R4, and R5 contain the value of u before updating. AR0 and AR1 point
to the x array and the w array, respectively. R7 contains the value of the error signal e(n). The complete program is given in Appendix F2. The total number of instruction cycles is 5N + 16,
which is much higher than for the LMS algorithm.
The sign-sign LMS algorithm was developed to reduce the multiplication requirement
of the LMS algorithm. Since DSPs provide a hardware multiplier as a standard feature,
this modification does not provide any advantage when the algorithm is implemented on
a DSP. On the contrary, it has some disadvantages, since decision instructions
disrupt the instruction pipeline. If you use the XOR logic operation in order to avoid the decision instructions, the complexity of the program increases and the total
number of instruction cycles is greater than for the regular LMS algorithm.

Leaky LMS Algorithm
When adaptive filters are implemented on signal processors with fixed word lengths,
roundoff noise is fed back to the adaptive weights and accumulates in time without bound.
This leads to an overflow that is unacceptable for real-time applications. One solution is
based upon adding a small forcing function, which tends to bias each filter weight toward
zero. The leaky LMS algorithm has the form

\[
\underline{w}(n+1) = r \, \underline{w}(n) + u \, e(n) \, \underline{x}(n) \qquad (28a)
\]

where r is slightly less than 1.
Since r can be expressed as 1 - c with c << 1, the TMS320C25 can take advantage
of its built-in shifters to implement this algorithm. Equation (28a) can therefore be
changed to

\[
\underline{w}(n+1) = \underline{w}(n) - c \, \underline{w}(n) + u \, e(n) \, \underline{x}(n) \qquad (28b)
\]

In order to achieve the highest throughput by using ZALR and MPYA, c w(n) can
be implemented by shifting wi(n) right by m bits, where 2^(-m) is close to c. Since the length
of the accumulator is 32 bits and the high word (bits 16 to 31) is used for updating w(n),
shifting wi(n) right by m bits can be implemented by loading wi(n) with a left shift of
16 - m bits. The sequence of TMS320C25 instructions to implement Equation (28b) is
shown as
        LRLK    AR1,N-1         ; Set up counter
        LRLK    AR2,COEFFD      ; Point to wi(n)
        LRLK    AR3,LASTAP+1    ; Point to x(n-i)
        LT      ERRF            ; T = ERRF = u*e(n)
        MPY     *-,AR2
ADAPT   ZALR    *,AR3
        MPYA    *-,AR2
        SUB     *,LEAKY         ; LEAKY = 16-m
        SACH    *+,0,AR1
        BANZ    ADAPT,*-,AR2

For each iteration, 7N instruction cycles are needed to perform the adaptation process (6N for the LMS algorithm). The total number of instruction cycles needed is 8N + 28
(see Appendix G1 for the complete program). The leaky factor r has the same effect as
adding white noise to the input. This technique not only solves the adaptive weight
overflow problem, but can also be beneficial in insufficient spectral excitation and stalling
situations [5].
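The shift-based leakage of Equation (28b) can be sketched with integer arithmetic. The code assumes Q15 weights and data, approximates c by 2^(-m) with an arithmetic right shift, and ignores saturation and the rounding performed by ZALR; the function name and argument layout are illustrative.

```python
def leaky_lms_update_q15(w, x, ue, m):
    """One leaky LMS weight update in Q15 integers.

    w:  weights w_i(n) in Q15
    x:  delay line, x[i] = x(n - i), in Q15
    ue: u*e(n) in Q15, computed once per sample
    m:  leakage shift, so that c is approximately 2**-m
    Saturation and rounding are not modeled in this sketch.
    """
    new_w = []
    for wi, xi in zip(w, x):
        leak = wi >> m              # c * w_i(n), by right shift
        step = (ue * xi) >> 15      # Q15 product u*e(n)*x(n - i)
        new_w.append(wi - leak + step)
    return new_w

if __name__ == "__main__":
    print(leaky_lms_update_q15([16384, 0], [0, 16384], 3277, 4))
```

With no input excitation the weight decays by a factor of (1 - 2^(-m)) per update, which is what keeps accumulated roundoff from growing without bound.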
The method used above is specific to the TMS320C25, which has a free shift
feature. Since the TMS320C30 is a floating-point processor, r can simply be multiplied by the filter
coefficient. However, in order to reduce the instruction cycles, this multiplication can
be combined with another instruction as a parallel instruction inside the loop. The following code shows how to rearrange the instructions from the LMS algorithm to include this
multiplication without an extra instruction cycle.
        MPYF    @u_r,R7                 ; R7 = e(n)*u/r
        MPYF3   *AR0++(1)%,R7,R1        ; R1 = e(n)*u*x(n)/r
        MPYF3   *AR0++(1)%,R7,R1        ; R1 = e(n)*u*x(n-1)/r
||      ADDF3   *AR1,R1,R2              ; R2 = w0(n) + e(n)*u*x(n)/r
        LDI     order-4,RC              ; Initialize repeat counter
        RPTB    LLMS                    ; Do i = 0, N-4
        MPYF3   *AR2,R2,R0              ; R0 = r*wi(n) + e(n)*u*x(n-i)
||      ADDF3   *+AR1(1),R1,R2          ; R2 = wi+1(n) + e(n)*u*x(n-i-1)/r
LLMS    MPYF3   *AR0++(1)%,R7,R1        ; R1 = e(n)*u*x(n-i-2)/r
||      STF     R0,*AR1++(1)%           ; Store wi(n+1)
*
        MPYF3   *AR2,R2,R0              ; R0 = r*wN-3(n) + e(n)*u*x(n-N+3)
||      ADDF3   *+AR1(1),R1,R2          ; R2 = wN-2(n) + e(n)*u*x(n-N+2)/r
        MPYF3   *AR0,R7,R1              ; R1 = e(n)*u*x(n-N+1)/r
||      STF     R0,*AR1++(1)%           ; Store wN-3(n+1)
        MPYF3   *AR2,R2,R0              ; R0 = r*wN-2(n) + e(n)*u*x(n-N+2)
||      ADDF3   *+AR1(1),R1,R2          ; R2 = wN-1(n) + e(n)*u*x(n-N+1)/r
        MPYF3   *AR2,R2,R0              ; R0 = r*wN-1(n) + e(n)*u*x(n-N+1)
||      STF     R0,*AR1++(1)%           ; Store wN-2(n+1)
        STF     R0,*AR1++(1)%           ; Update last w

Auxiliary registers AR0 and AR1 point to the x and w arrays. AR2 points to the memory
location that contains the value r. R7 contains the value of the error signal e(n). R1 and R2 are
updated before the loop because the parallel instructions inside the loop use the previous
values in R1 and R2. Note that R1 is updated twice before the loop because the updating
of R2 requires the previous value of R1. In order to leave the x array pointer at the new
beginning of the data buffer for the next iteration, two of the loop instruction sets have been
taken out of the loop and modified by eliminating the incrementation of AR0. The TMS320C30
assembly program of an adaptive transversal filter with the leaky LMS algorithm is listed
in Appendix G2 as an example. The total number of instruction cycles for this algorithm
is 3N + 15, which is the same as for the LMS algorithm. This example shows the power and
flexibility of the TMS320C30.
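The rearrangement that lets the leak multiplication share a cycle relies on a simple identity: r*w + u*e*x equals r*(w + (u*e/r)*x). The sketch below compares the direct and factored forms in floating point. The function names are illustrative, and the variable g stands for the prescaled error term that the listing keeps in R7.

```python
def leaky_lms_direct(w, x, e, u, r):
    """Leaky LMS update written directly: r*w_i + u*e*x_i."""
    return [r * wi + u * e * xi for wi, xi in zip(w, x)]

def leaky_lms_factored(w, x, e, u, r):
    """Same update in factored form: add the prescaled term, then multiply by r."""
    g = e * u / r                   # prescaled once per sample
    return [r * (wi + g * xi) for wi, xi in zip(w, x)]

if __name__ == "__main__":
    w = [0.5, -0.25, 0.125]
    x = [1.0, -2.0, 0.5]
    print(leaky_lms_direct(w, x, 0.3, 0.05, 0.999))
    print(leaky_lms_factored(w, x, 0.3, 0.05, 0.999))
```

In the factored form each weight needs one add followed by one multiply by r, and these two operations are the ones issued as a parallel pair in the loop.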


Implementation Considerations
The adaptive filter structures and algorithms discussed previously were derived on
the basis of infinite-precision arithmetic. When these structures and algorithms are implemented
on a fixed-point machine, the accuracy of the filters is limited because
the DSP operates with a finite number of bits. Thus, designers must pay attention to the effects of finite word length. In general, these effects are input quantization,
roundoff in the arithmetic operations, dynamic range constraints, and quantization of filter
coefficients. These effects can either cause deviations from the original design criteria
or create an effective noise at the filter output. These problems have been investigated
extensively, and techniques to solve them have been developed [28, 29].
The effects of finite precision in adaptive filters are an active research area, and some
significant results have been reported [30 through 32]. There are three categories of finite
word length effects in adaptive filters:
• Dynamic Range Constraint (scaling to avoid overflow). Since this is not
  applicable to a floating-point processor, the TMS320C30 is not mentioned
  in this portion.
• Finite Precision Errors (errors introduced by roundoff in the arithmetic).
• Design Issues (design of the optimum step size u that minimizes system
  noise).

Dynamic Range Constraint
As shown in Figure 1, the most widely used LMS transversal filter is specified by
the difference equations

$$y(n) = \sum_{i=0}^{N-1} w_i(n)\, x(n-i) \tag{29}$$

and

$$w_i(n+1) = w_i(n) + u\, e(n)\, x(n-i), \quad i = 0, 1, \ldots, N-1 \tag{30}$$

where x(n-i) is the input sequence and w_i(n) are the filter coefficients.
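As a floating-point reference for Equations (29) and (30), the following Python sketch performs one LMS iteration and uses it to identify a known FIR response. The names lms_step and identify, the uniform test input, and the noiseless desired signal are assumptions made for this example; they are not taken from the appendix programs.

```python
import random

def lms_step(w, xbuf, d, u):
    """One iteration of Equations (29) and (30).

    w    : list of N coefficients w_i(n), updated in place
    xbuf : list of N samples, xbuf[i] = x(n - i)
    d    : desired sample d(n)
    u    : step size
    Returns (y, e).
    """
    y = sum(wi * xi for wi, xi in zip(w, xbuf))   # Equation (29)
    e = d - y                                     # error signal e(n)
    for i in range(len(w)):                       # Equation (30)
        w[i] += u * e * xbuf[i]
    return y, e

def identify(h, n_samples=2000, u=0.05, seed=1):
    """Adapt w toward a known FIR response h (hypothetical test setup)."""
    rng = random.Random(seed)
    n = len(h)
    w = [0.0] * n
    xbuf = [0.0] * n
    e = 0.0
    for _ in range(n_samples):
        xbuf = [rng.uniform(-1.0, 1.0)] + xbuf[:-1]   # shift in x(n)
        d = sum(hi * xi for hi, xi in zip(h, xbuf))   # noiseless desired signal
        _, e = lms_step(w, xbuf, d, u)
    return w, e
```

Because the desired signal here is noiseless and the step size is well inside the stability range, the weights settle on h and the final error is negligible.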
If the input sequence and filter coefficients are properly normalized so that their
values lie between -1 and 1 in Q15 format, no error is introduced by the addition itself.
However, the sum of two numbers may become larger than one. This is known as overflow.
The TMS320C25 provides four features that can be applied to handle overflow management [13]:

A. Branch on overflow conditions.
B. Overflow mode (saturation arithmetic).
C. Product register right shift.
D. Accumulator right shift.

One technique to reduce the probability of overflow is scaling, i.e., constraining
each node within an adaptive filter to maintain a magnitude less than unity. In Equation
(29), the condition for |y(n)| < 1 is

$$x_{max} < \frac{1}{\sum_{i=0}^{N-1} |w_i(n)|} \tag{31}$$

where x_max denotes the maximum of the absolute value of the input. The right shifter
of the TMS320C25, which operates with no cycle overhead, can be applied to implement
scaling to prevent overflow of the multiply-accumulate operations in Equation (29). By setting the PM bits of status register ST1 to 11 using the SPM or LST1 instructions, the
P register output is right-shifted 6 places. This allows up to 128 accumulations without
the possibility of an overflow. The SFR instruction can also be used to right-shift the
accumulator by one bit when it is near overflow.
Another effective technique to prevent overflow in the computation of Equation (29)
is saturation arithmetic. As illustrated in Figure 12, if the result of an addition
overflows, the output is clamped at the maximum value. If saturation arithmetic is used,
it is common practice [28] to permit the amplitude of x(n-i) to be larger than the upper
bound given in Equation (31). Saturation of the filter represents a distortion, and the choice
of scaling on the input depends on how often such distortion is permissible. The saturation arithmetic on the TMS320C25 is controlled by the OVM bit of status register ST0
and can be changed by the SOVM (set overflow mode), ROVM (reset overflow mode),
or LST (load status register) instructions.
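The difference between wraparound and saturation can be illustrated with a small Python model of a signed 32-bit accumulator. The helper names are assumptions for this sketch; it models only the clamping behavior, not the OVM bit or any instruction timing.

```python
ACC_MAX = 2 ** 31 - 1    # most positive 32-bit two's-complement value
ACC_MIN = -2 ** 31       # most negative 32-bit two's-complement value

def add_wrap(a, b):
    """32-bit two's-complement addition without saturation (wraps around)."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2 ** 32 if s & 0x80000000 else s

def add_sat(a, b):
    """32-bit addition with the result clamped to the representable range."""
    return max(ACC_MIN, min(ACC_MAX, a + b))
```

An overflowing sum changes sign when it wraps, whereas the saturated sum stays at full scale, which is the smaller distortion of the two.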
Figure 12. Saturation Arithmetic (output versus input)


Filter coefficients are updated using Equation (30). As illustrated in Figure 13, a
new technique presented in reference [31] uses the scaling factor a to prevent overflow of the filter coefficients during the weight updating operation. Suppose you use a = 2^(-m). A right
shift by m bits implements multiplication by a, while a left shift by m bits implements
the scaling factor 1/a. Usually, the required value of a is not expected to be very small
and depends on the application. Since a scales the desired signal, it does not affect the
rate of convergence.

Figure 13. Fixed-Point Arithmetic Model of the Adaptive Filter

Finite Precision Errors
The TMS320C25 is a 16/32-bit fixed-point processor. Each data sample is represented
by a fractional number that uses 15 magnitude bits and one sign bit. The quantization interval

$$\delta = 2^{-b} \tag{32}$$

(b = 15) is called the width of quantization since the numbers are quantized in steps of δ.

The products of the multiplications of data by coefficients within the filter must be
rounded or truncated to store in memory or a CPU register. As shown in Figure 14, the
roundoff error can be modeled as white noise injected into the filter by each rounding
operation. This white noise has a uniform distribution over a quantization interval, and
for rounding

$$-\frac{\delta}{2} < e \le \frac{\delta}{2} \tag{33a}$$

and

$$\sigma_e^2 = \frac{1}{12}\,\delta^2 \tag{33b}$$

where σ_e² is the variance of the white noise.

In general, roundoff noise occurs after each multiplication. However, the
TMS320C25 has a full-precision accumulator, i.e., a 16 x 16-bit multiplier with a 32-bit
accumulator, so there is no roundoff when you implement a set of summations and
multiplications as in Equation (29). Rounding is performed when the result is stored back
to memory location y(n), so that only one noise source is present in a given summation
node.
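The uniform-noise model of Equations (33a) and (33b) can be checked empirically. The Python sketch below rounds random values to multiples of δ = 2^(-15) and measures the variance of the rounding error; the function names, sample count, and seed are assumptions chosen for the illustration.

```python
import random

def round_q15(v):
    """Round a real value to the nearest multiple of delta = 2**-15."""
    delta = 2.0 ** -15
    return round(v / delta) * delta

def roundoff_variance(n=200000, seed=7):
    """Measure the variance of the rounding error for uniform random inputs."""
    rng = random.Random(seed)
    errs = []
    for _ in range(n):
        v = rng.uniform(-1.0, 1.0)
        errs.append(round_q15(v) - v)          # rounding error e
    mean = sum(errs) / n
    return sum((e - mean) ** 2 for e in errs) / n
```

The measured variance should agree closely with δ²/12, the value predicted by Equation (33b).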
Figure 14. Fixed-Point Roundoff Noise Model (y = Rounding[x · a] = x · a + e)
For floating-point arithmetic, the variance of the roundoff noise [31] is slightly different from Equation (33b),
(33c)
Since the TMS320C30 has a 40/32-bit floating-point multiplier and ALU, the result of an
arithmetic operation has a mantissa of 31 bits plus one sign bit. Therefore, the δ in
Equation (33c) is equal to 2^(-31). Another roundoff noise is introduced when you store
the result back to memory. This noise has the power of 2^(-23) because the mantissa of
TMS320C30 floating-point data is 23 bits plus one sign bit. Therefore, unless the filter
order is high, the roundoff noise from arithmetic operations is relatively small.
The steady-state output error of the LMS algorithm due to the finite-precision
arithmetic of a digital processor was analyzed in reference [31]. It was found that the power
of arithmetic errors is inversely proportional to the adaptation step size u. The significance
of this result in adaptive filter design is discussed next. Furthermore, roundoff noise
is found to accumulate in time without bound, leading to an eventual overflow [32]. The
leaky LMS algorithm presented in the previous section can be used to prevent this
overflow.


Design Issues
The performance of digital adaptive algorithms differs from that of infinite-precision adaptive algorithms. The finite-precision LMS algorithm is given as

$$\mathbf{w}(n+1) = \mathbf{w}(n) + Q[\,u\, e(n)\, \mathbf{x}(n)\,] \tag{34}$$

where Q[.] denotes the operation of fixed-point quantization. Whenever any correction
term u*e(n)*x(n-i) in the update of the weight vector in Equation (34) is too small, the
quantized value of that term is zero, and the corresponding weight w_i(n) remains unchanged. The condition for the ith component of the vector w(n) not to be updated when the
algorithm is implemented with the TMS320C25 is

$$|u\, e(n)\, x(n-i)| < \frac{\delta}{2} \tag{35a}$$

where δ = 2^(-15). The condition for the TMS320C30 is

$$|u\, e(n)\, x(n-i)| < 2^{exp}\, \frac{\delta}{2} \tag{35b}$$

where exp is the exponent of w_i(n) and δ = 2^(-23).
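The stalling conditions (35a) and (35b) can be tried out with a short Python sketch. The helper names are hypothetical, and the floating-point exponent is taken here as floor(log2|w|), i.e., a mantissa normalized to the range [1, 2); that normalization is an assumption of this example rather than something stated in the text.

```python
import math

def quantize(v, delta):
    """Round v to the nearest multiple of delta (models Q[.] in Equation (34))."""
    return round(v / delta) * delta

def update_stalls(u, e, x, delta=2.0 ** -15):
    """True when |u*e*x| < delta/2 (Equation (35a)): the correction rounds to zero."""
    return abs(u * e * x) < delta / 2.0

def float_threshold(w, delta=2.0 ** -23):
    """Right-hand side of Equation (35b) for a weight w.

    Assumes exp = floor(log2|w|); math.frexp returns a mantissa in
    [0.5, 1), so its exponent is reduced by one to match.
    """
    exp = math.frexp(w)[1] - 1
    return (2.0 ** exp) * delta / 2.0
```

For example, with u = 0.01 and x = 0.5, an error of 0.001 produces a correction below δ/2 that quantizes to zero, while an error of 0.01 still moves the weight.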

Since the adaptive algorithms are designed to minimize the mean squared value of
the error signal, e(n) decreases with time. If u is small enough, most of the time the weights
are not updated. This early termination of the adaptation may not allow the weight values
to converge to the optimum set, resulting in a mean square error larger than its minimum
value. The condition for the adaptation to converge completely [30] is u > u_min where
(36a)
for the TMS320C25 and, for the TMS320C30,

$$u_{min}^2 = \frac{\delta^2\, 2^{exp}}{4\, \sigma_x^2\, \epsilon_{min}} \tag{36b}$$

where σ_x² is the power of the input signal x(n) and ε_min is the minimum mean squared error
at steady state.

In the Leaky LMS Algorithm section, it was mentioned that the excess MSE given
in Equation (14) is minimized by using a small u. However, this may result in a large quantization error since the most significant term in the total output quantization error is [31]

$$\frac{N\, \sigma_e^2}{2\, a^2\, u} \tag{37}$$

The optimum step size u_o reflects a compromise between these conflicting goals.
The value of u_o is shown to be too small to allow the adaptive algorithm to converge completely and also gives slow convergence. In practice, u > u_o is used for faster convergence. Hence, the excess MSE becomes larger, and the roundoff noise can typically
be neglected when compared with the excess mean square error.
Finally, recall Equations (11) and (12). The step size u has an upper limit to guarantee
the stability and convergence. Therefore, the adaptive algorithm requires

1

O 300
Initialize ONE = 1

                                ; Initialize U = MU = 0.01
************************************************************************
*       PERFORM THE PREDICTOR
************************************************************************
INPUT:  IN      D,PA2           ; Get the input
*
        CALL    LMS             ; Call subroutine
*
OUTPUT: OUT     Y,PA2           ; Output the signal
*
        LAC     D               ; Insert the newest sample
        LARP    AR7
        SACL    *
        B       INPUT
        .end


The symbols, such as ORDER, U, ONE, D, LMS, Y, and ERR, are defined and
referenced for the purpose of modular programming. The uninitialized sections specified
by the .usect directive can be placed in any location of memory according to the linker
command file. Note that the MACD instruction requires its operands to come from program memory and data memory separately, and the CNFP instruction configures RAM block
0 as program memory. Therefore, the coeffs section has to be in data RAM block 0, and
the buffer has to be in RAM block 1. Appendix H1 contains the adaptive transversal filter
with LMS algorithm subroutine using the TMS320C25, and Appendix H2 contains an
example of a linker command file.

TMS320C30 Assembly Subroutine
Instead of a hardware stack, the TMS320C30 uses a software stack, which is more flexible and convenient for a high-level language compiler. The stack memory location is
pointed to by the stack pointer SP. To maintain the proper program sequence,
the programmer must make certain that no data is lost and that the stack pointer always
points to the proper location. The PUSH, PUSHF, POP, POPF, CALL, CALLcond, RETIcond, and RETScond instructions change the value of the stack pointer; in addition,
writing data into it and taking an interrupt also change that value. It is the programmer's responsibility to initialize the stack pointer at the beginning of the program. The
same adaptive line enhancer example as above, using the TMS320C30, is listed below. The
adapfltr.int program that initializes the stack pointer and the data RAM is given in Appendix H3.

*
*       DEFINE GLOBAL VARIABLES AND CONSTANTS
*
        .copy   "adapfltr.int"
        .global LMS30,order,u,d,y,e
N       .set    20
mu      .set    0.01
*
*       INITIALIZE POINTERS AND ARRAYS
*
        .text
begin   .set    $
        LDI     N,BK                ; Set up circular buffer
        LDP     @xn_addr            ; Set data page
        LDI     @xn_addr,AR0        ; Set pointer for x[]
        LDI     @wn_addr,AR1        ; Set pointer for w[]
        LDF     0.0,R0              ; R0 = 0.0
        RPTS    N-1
        STF     R0,*AR0++(1)%       ; x[] = 0.
||      STF     R0,*AR1++(1)%       ; w[] = 0.
        LDI     @in_addr,AR6        ; Set pointer for input ports
        LDI     @out_addr,AR7       ; Set pointer for output ports
*
*       PERFORM ADAPTIVE LINE ENHANCER
*
input:
        LDF     *AR6,R7             ; Input d(n)
||      LDF     *+AR6(1),R6         ; Input x(n)
        STF     R7,@d               ; Insert d(n)
        STF     R6,*AR0             ; Insert x(n) to buffer
*
*       CALL ASSEMBLY SUBROUTINE
*
        CALL    LMS30
*
*       OUTPUT y(n) AND e(n) SIGNALS
*
        LDF     @y,R6               ; Get y(n)
        BD      input               ; Delay branch
        LDF     @e,R7               ; Get e(n)
        STF     R6,*AR7             ; Send out y(n)
        STF     R7,*+AR7(1)         ; Send out e(n)
*
*       DEFINE CONSTANTS
*
xn       .usect  "buffer",N
wn       .usect  "coeffs",N
in_addr  .usect  "vars",1
out_addr .usect  "vars",1
xn_addr  .usect  "vars",1
wn_addr  .usect  "vars",1
u        .usect  "vars",1
order    .usect  "vars",1
d        .usect  "vars",1
y        .usect  "vars",1
e        .usect  "vars",1
cinit    .sect   ".cinit"
         .word   6,in_addr
         .word   0804000h
         .word   0804002h
         .word   xn
         .word   wn
         .float  mu
         .word   N-2
         .end

In the above example, data memory location order is initialized to N - 2 for computational convenience. The linker command file and the subroutine that implements the LMS transversal filter can be found in Appendixes H4 and H5.
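The circular addressing used for the x[] buffer (block size in BK, post-increment with the % modifier) behaves like a fixed-length delay line indexed modulo its length. The Python class below is a minimal sketch of that idea; the class and method names are assumptions for illustration and do not correspond to any TMS320C30 register or instruction.

```python
class CircularBuffer:
    """Fixed-length delay line addressed modulo its length."""

    def __init__(self, n):
        self.data = [0.0] * n
        self.pos = 0                  # index of the oldest sample

    def insert(self, x):
        """Overwrite the oldest sample with x(n) and advance the pointer."""
        self.data[self.pos] = x
        self.pos = (self.pos + 1) % len(self.data)

    def taps(self):
        """Return [x(n), x(n-1), ..., x(n-N+1)], newest sample first."""
        n = len(self.data)
        return [self.data[(self.pos - 1 - i) % n] for i in range(n)]
```

Inserting a new sample never moves the stored data; only the pointer wraps around, which is why no explicit data shift is needed in the filter loop.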

C Function Libraries
The TMS320C25 and TMS320C30 C language compilers provide high-level language
support for these processors. The compilers allow application developers without an extensive knowledge of the device's architecture and instruction set to generate assembly
code for the device. Also, since C programs are not device-specific, it is a relatively
straightforward task to port existing C programs from other systems.
To allow fast development of efficient programs for adaptive signal processing applications, C function libraries have been developed. These libraries include functions for
adaptive transversal, symmetric transversal, and lattice structures.
TMS320C25 C-Callable Subroutines

In a C program, the memory assignments are chosen by the compiler. There are
two ways to use the most efficient instruction, MACD:
A. Use inline assembly code to assign memory locations for filter coefficients and
buffers.
B. Reserve the desired memory locations for them and do the assignment in the
linker command file.
The latter method is used in this report.
For a C main program, the parameters passed to and returned from the subroutines
are all within the parentheses following the subroutine name, as shown below:
n - Filter order
mu - Convergence factor
d - Desired signal
x - Input signal
y - Address of output signal
e - Address of error signal
Since the TMS320C25 C compiler pushes the parameters from right to left onto the software stack pointed to by AR1, the subroutine gets the parameters in reverse order, as shown
below:
        MAR     *-              ; Set pointer for getting parameters
        LAC     *-              ; ACC = N
        SUBK    1
        SACL    ORDER           ; ORDER = N - 1
        LAC     *-              ; Getting and storing the mu
        SACL    U
        LAC     *-              ; Getting and storing the D
        SACL    D
        LAC     *-,0,AR3        ; Insert the newest sample
        LRLK    AR3,FRSTAP
        SACL    *

The assembly subroutine returns the parameters y and e as follows:
        LARP    AR1
        LAR     AR2,*-,AR2      ; Get the address of y in main
        LAC     Y
        SACL    *,0,AR1         ; Store y
        LAR     AR2,*,AR2       ; Get the address of e in main
        LAC     ERR
        SACL    *,0,AR1         ; Store e

Therefore, the parameters should be entered in the order given above. If there are
other parameters, they should be inserted right after the convergence factor mu. The leaky
LMS algorithm subroutine is given as an example:
llms(n,mu,r,d,x,&y,&e)
where r is defined in Equation (28a). Note that the values of the AR registers that will
be used in the subroutine, and the status registers, must be saved at the beginning of the
subroutine and restored right before returning to the calling routine. An example of a C-callable
program is given in Appendix I1. Memory locations 0200h to 0200h + N - 1 and 0300h
to 0300h + N - 1 are reserved for filter coefficients and buffers, respectively. N denotes
the filter order.
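The effect of the right-to-left push order on the sequence in which a subroutine reads its arguments can be modeled with a plain list used as a stack. This Python sketch is only an illustration of the ordering; the function names are hypothetical, and it does not model AR1, the return address, or any other stack contents.

```python
def push_args(args):
    """Push the argument list right to left onto a software stack."""
    stack = []
    for a in reversed(args):      # rightmost argument is pushed first
        stack.append(a)
    return stack

def pop_all(stack):
    """Pop every item; the last value pushed is read first."""
    out = []
    while stack:
        out.append(stack.pop())
    return out
```

Reading from the top of the stack therefore yields n first, then mu, d, and x, and finally the addresses of y and e, which matches the order in which the assembly fragments above fetch them.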

TMS320C30 C Subroutine
As previously mentioned, the TMS320C30 architecture has features designed for
a high-level language compiler. Note that the word callable is dropped from this section title
because the TMS320C30 is so flexible that the restrictions for the TMS320C25 no longer
exist. Since the memory locations of filter buffers and coefficients are determined by the
parameters passed from the calling routine, the same subroutine can be used in different
places. The only restriction is that the memory locations of filter buffers must
align to the circular addressing boundary [14]. The features of the TMS320C30 architecture
that make a major contribution toward these improvements are dual data address buses,
a software stack, and flexible addressing modes. The parameters passed to the subroutine are
pushed onto the stack. Therefore, after returning from the subroutine, the stack pointer,
SP, must be updated to point to the location where SP pointed before the parameters
were pushed onto the stack. However, this is done by the C compiler. A usage example of the
C function subroutine is given as follows:
tlms(n,u,d,&w,&x,&y,&e) where

n - Filter order
u - Step size
d - Desired signal
&w - Filter coefficients
&x - Input signal buffers
&y - Address of output signal
&e - Address of error signal

The example below shows how the C subroutine receives and manipulates the
parameters passed from the caller program and how the result is returned to the caller
routine.

*
*       SET FRAME POINTER FP
*
FP      .set    AR3
        PUSH    FP
        LDI     SP,FP
*
*       GET FILTER PARAMETERS
*
        LDI     *-FP(2),R4          ; Get filter order
        LDI     *-FP(6),AR0         ; Get pointer for x[]
        LDI     *-FP(5),AR1         ; Get pointer for w[]
*
*       COMPUTE ERROR SIGNAL e(n) AND STORE y(n) AND e(n)
*
        LDI     *-FP(2),AR2         ; Get y(n) address
        SUBF3   R2,*+FP(1),R7       ; e(n) = d(n) - y(n)
||      STF     R2,*AR2             ; Send out y(n)
        LDI     *-FP(3),AR2         ; Get e(n) address
        STF     R7,*AR2             ; Send out e(n)
        MPYF    *+FP(2),R7          ; R7 = e(n) * u

        POP     FP

Note that AR3 is used as the frame pointer by the TMS320C30 C compiler. Appendix
I2 contains the complete LMS transversal filter example subroutine program.

Development Process and Environment
Following a four-stage procedure [33] to minimize the amount of finite word length
effect analysis and real-time debugging, adaptive structures and algorithms are implemented
on the TMS320C25. Figure 15 illustrates the flowchart of this procedure. Since the implementation on the TMS320C30 is done only with the simulator, the last stage, real-time testing,
is not performed for that device.
Figure 15. Adaptive Filter Implementation Procedure (flowchart: Algorithm Analysis and C Program Implementation; Re-write C Program to Emulate DSP Sequence; Implement in DSP Program and Testing by DSP Simulator; Real-Time Testing)
In the first stage, algorithm design and study is performed on a personal computer.
Once the algorithm is understood, the filter is implemented using a high-level C program
with double precision coefficients and arithmetic. This filter is considered an ideal filter.
In the second stage, the C program is rewritten in a way that emulates the same
sequence of operations with the same parameters and state variables that will be implemented
in the processors. This program then serves as a detailed outline for the DSP assembly
language program or can be compiled using TMS320C25 or TMS320C30 C compiler.
The effects of numerical errors can be measured directly by means of the technique shown
in Figure 16, where H(z) is the ideal filter implemented in the first stage and H'(z) is
a real filter. Optimization is performed to minimize the quantization error and produce
stable implementation.


Figure 16. A Computational Technique for Evaluating Quantization Effects (x(n) drives both H(z) and H'(z); the difference e(n) is accumulated as the sum over n = 1 to M of |e(n)|^2)
In the third stage, the TMS320C25 and TMS320C30 assembly programs are
developed; then they are tested using the simulators with test data from a disk file. Note
that the simulation of the TMS320C25 can also be run on the SWDS with the data
logging option. This test data is a short version of the data used in stage 2, which can be
internally generated from a program or digitized from a real application environment. Output from the simulation is compared against the equivalent output of the C program in the second stage. Since the simulation requires data files to be in Q15 format,
some precision is lost during data conversion. When one-to-one agreement within a
tolerable range is obtained between these two outputs, the processor software is assured
to be essentially correct.
The final stage is applied only to the TMS320C25. First, you download the assembled
program into the target TMS320C25 system (SWDS) to initiate real-time operation. Thus,
the real-time debugging process is constrained primarily to debugging the I/O timing structure of the algorithm and testing the long-term stability of the algorithm. Figure 17 shows
an experimental setup for verification, in which the adaptive filter is configured as the one-step adaptive predictor illustrated in Figure 18. The data used for real-time testing is a
sinusoid generated by a Tektronix FG504 Function Generator embedded in white noise
generated by an HP Precision Noise Generator. The DSP gets a quantized signal from
the Analog Interface Board (AIB), performs adaptive prediction routines, and outputs an
enhanced sinusoid to the analog interface board. The corrupted input and predicted (enhanced) output waveforms are compared on the oscilloscope or on the HP 3561A Dynamic
Signal Analyzer. The corresponding spectra of input and output can be compared on the
signal analyzer. The signal-to-noise ratio (SNR) improvement can be measured from the
analyzer, which is connected to an HP plotter.
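The one-step predictor experiment can be imitated in floating point with a short Python simulation. All numerical choices here (filter order 32, step size 0.002, a sinusoid of amplitude 0.5 at 0.1 cycles per sample, uniform noise) are assumptions picked for the sketch and are not the settings of the real-time test.

```python
import math
import random

def line_enhancer(n_samples=5000, order=32, u=0.002, seed=3):
    """Predict x(n) from x(n-1), ..., x(n-order) with the LMS algorithm.

    Returns (input noise power, output error power), both measured
    against the clean sinusoid over the last 1000 samples.
    """
    rng = random.Random(seed)
    w = [0.0] * order
    buf = [0.0] * order                 # buf[i] = x(n - 1 - i)
    in_noise = out_err = 0.0
    count = 0
    for n in range(n_samples):
        clean = 0.5 * math.sin(2.0 * math.pi * 0.1 * n)
        x = clean + rng.uniform(-0.5, 0.5)            # sinusoid in white noise
        y = sum(wi * bi for wi, bi in zip(w, buf))    # enhanced output y(n)
        e = x - y                                     # prediction error e(n)
        for i in range(order):
            w[i] += u * e * buf[i]                    # LMS update
        buf = [x] + buf[:-1]                          # delay line gets x(n)
        if n >= n_samples - 1000:
            in_noise += (x - clean) ** 2
            out_err += (y - clean) ** 2
            count += 1
    return in_noise / count, out_err / count
```

Because the sinusoid is predictable from past samples and the white noise is not, the output error power returned by the function is well below the input noise power once the weights have converged.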

Figure 17. Real-Time Experiment Setup (personal computer; DSP development system (SWDS and AIB); FG504 function generator; precision noise generator; TEK 2235 scope; HP 3561A dynamic signal analyzer; HP plotter)

Figure 18. Block Diagram of a One-Step Adaptive Predictor
To illustrate the operation in a nonstationary environment, the adaptive predictor
is implemented using a TMS320C25, and the following experiment is performed. The
input signal is swept from 1287 Hz to 4025 Hz, then jumps back to 1287 Hz. The time
for each sweep is one second. The input spectra at every second are shown in Figure 19a;
the corresponding output spectra are shown in Figure 19b. The observations on the
oscilloscope and signal analyzer show the significant SNR improvement, convergence speed,
ability to track nonstationary signals, and long-term stability of the adaptive predictor.
Figure 19(a). Spectrum of Input Signal (amplitude versus frequency, 0 Hz to 5,000 Hz, BW 47.742 Hz, successive spectra over time)

Figure 19(b). Spectrum of Enhanced Output Signal (amplitude versus frequency, 0 Hz to 5,000 Hz, BW 47.742 Hz, successive spectra over time)

Summary
Three adaptive structures and six update algorithms are implemented with the
TMS320C25 and TMS320C30. Applications of adaptive filters and implementation considerations have been discussed. Two subroutine libraries that support both C language
and assembly language for the two processors were developed. These routines can be readily
incorporated into TMS320C25 or TMS320C30 users' application programs.
The advancements in the TMS320C25 and TMS320C30 devices have made it feasible to implement sophisticated adaptive algorithms for real-time
processing tasks. Many adaptive signal processing algorithms are readily available
and capable of solving real-time problems when implemented on the DSP. These programs provide an efficient way to implement the widely used structures and algorithms
on the TMS320C25 and TMS320C30, based on assembly-language programming. They
are also extremely useful for choosing an algorithm for a given application. The performance of the adaptive structures and algorithms that have been implemented using the
TMS320C25 and TMS320C30 is summarized in Tables 1 and 2.


Table 1. The Performance of Adaptive Structures and Algorithms of TMS320C25

Structure               Algorithm        Instruction Cycles   Program Memory (Words)
Transversal             LMS              7N+28                33
Transversal             Leaky LMS        8N+28                34
Transversal             Sign-Data LMS    11N+26               41
Transversal             Sign-Error LMS   7N+26                30
Transversal             Sign-Sign LMS    11N+21               30
Transversal             Normalized LMS   7N+57                47
Symmetric Transversal   LMS              7.5N+38              50
Symmetric Transversal   Leaky LMS        8N+38                51
Symmetric Transversal   Sign-Data LMS    9.5N+36              58
Symmetric Transversal   Sign-Error LMS   7.5N+36              47
Symmetric Transversal   Sign-Sign LMS    9.5N+31              47
Symmetric Transversal   Normalized LMS   7.5N+69              66
Lattice                 LMS              33N+32               63
Lattice                 Leaky LMS        35N+32               65
Lattice                 Sign-Error LMS   36N+32               65
Lattice                 Normalized LMS   90N+34               92

Note: N represents filter order.


Table 2. The Performance of Adaptive Structures and Algorithms of TMS320C30

Structure               Algorithm        Instruction Cycles   Program Memory (Words)
Transversal             LMS              3N+15                17
Transversal             Leaky LMS        3N+15                19
Transversal             Sign-Data LMS    5N+16                24
Transversal             Sign-Error LMS   3N+16                18
Transversal             Sign-Sign LMS    5N+16                24
Transversal             Normalized LMS   3N+47                49
Symmetric Transversal   LMS              2.5N+15              23
Symmetric Transversal   Leaky LMS        2.5N+19              26
Symmetric Transversal   Sign-Data LMS    3.5N+18              30
Symmetric Transversal   Sign-Error LMS   2.5N+18              24
Symmetric Transversal   Sign-Sign LMS    3.5N+17              30
Symmetric Transversal   Normalized LMS   2.5N+50              56
Lattice                 LMS              14N+9                20
Lattice                 Leaky LMS        16N+9                22
Lattice                 Sign-Error LMS   16N+9                22
Lattice                 Normalized LMS   67N+9                73

Note: N represents filter order.
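The cycle formulas for the transversal LMS filter in Tables 1 and 2 translate directly into a bound on the sampling rate. In the Python sketch below, the function names are hypothetical, and the instruction cycle time is passed in as a parameter because it is not stated in this section; the 60 ns figure used in the usage note is an assumption to be replaced by the value for the actual device and clock.

```python
def lms_cycles(n, device):
    """Instruction cycles per sample for the transversal LMS filter."""
    if device == "TMS320C25":
        return 7 * n + 28          # Table 1, transversal LMS
    if device == "TMS320C30":
        return 3 * n + 15          # Table 2, transversal LMS
    raise ValueError("unknown device: " + device)

def max_sample_rate(n, device, cycle_ns):
    """Highest sampling rate in Hz if each sample needs lms_cycles() cycles."""
    return 1e9 / (lms_cycles(n, device) * cycle_ns)
```

For a 64-tap filter the tables give 476 cycles on the TMS320C25 and 207 cycles on the TMS320C30; with an assumed 60 ns cycle, max_sample_rate(64, "TMS320C30", 60.0) is roughly 80.5 kHz.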


References
[1] B. Widrow and S. Stearns, Adaptive Signal Processing, Prentice-Hall, 1985.
[2] R. Lucky, J. Salz, and E. Weldon, Principles of Data Communications, McGraw-Hill, 1968.
[3] S. Haykin, Adaptive Filter Theory, Prentice-Hall, 1986.
[4] M. Honig and D. Messerschmitt, Adaptive Filters: Structures, Algorithms, and Applications, Kluwer Academic, 1984.
[5] J.R. Treichler, C.R. Johnson, and M.G. Larimore, Theory and Design of Adaptive
Filters, Wiley, 1987.
[6] T. Alexander, Adaptive Signal Processing, Springer-Verlag, 1986.
[7] G. Goodwin and K. Sin, Adaptive Filtering Prediction and Control, Prentice-Hall,
1984.
[8] M. Bellanger, Adaptive Digital Filters and Signal Analysis, Marcel Dekker, 1987.
[9] J. Proakis, Digital Communications, McGraw-Hill, 1983.
[10] C. Chen and S. Kuo, "An Interactive Software Package for Adaptive Signal Processing on an IBM Personal Computer," 19th Pittsburgh Conference on Modeling and
Simulation, May 1988.
[11] S. Kuo, G. Ranganathan, P. Gupta, and C. Chen, "Design and Implementation of
Adaptive Filters," IEEE 1988 International Conference on Circuits and Systems, June
1988.
[12] S. Kuo, G. Ma, and C. Chen, "An Advanced DSP Code Generator for Adaptive
Filters," 1988 ASSP DSP workshop, Sept. 1988.
[13] Texas Instruments, Second-Generation TMS320 User's Guide, 1987.
[14] Texas Instruments, Third-Generation TMS320 User's Guide, 1988.
[15] S. Qureshi, "Adaptive Equalization," Invited Paper, Proceedings of the IEEE, Sept.
1985.
[16] L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.
[17] N. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications
to Speech and Video, Prentice-Hall, 1984.
[18] J. Makhoul, "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, April
1975.
[19] C. Cowan and P. Grant, Adaptive Filters, Prentice-Hall, 1985.
[20] C. Gritton and D. Lin, "Echo Cancellation Algorithms," IEEE ASSP Magazine,
April 1984.
[21] D. Messerschmitt, et al, "Digital Voice Echo Canceller with a TMS32020," in Digital
Signal Processing Applications with the TMS320 Family, Prentice-Hall, 1986.
[22] B. Widrow, et al, "Adaptive Noise Cancelling: Principles and Applications," Proceedings of the IEEE, December 1975.
[23] A. Lovrich and R. Simar, "Implementation of FIR/IIR Filter with the
TMS32010/TMS32020," in Digital Signal Processing Applications with the TMS320
Family, Texas Instruments, 1986.


[24] S. Orfanidis, Optimum Signal Processing, MacMillan, 1985.
[25] G. Frantz, K. Lin, J. Reimer, and J. Bradley, "The Texas Instruments TMS320C25
Digital Signal Microcomputer," IEEE Micro, December 1986.
[26] B. Friedlander, "Lattice Filters for Adaptive Processing," Proceedings of the IEEE,
August 1982.
[27] A. Gersho, "Adaptive Filtering with Binary Reinforcement," IEEE Transactions
on Information Theory, March 1984.
[28] A. Oppenheim and R. Schafer, Digital Signal Processing, Chap. 9, Prentice-Hall,
1975.
[29] L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Chap.
5, Prentice-Hall, 1975.
[30] J. R. Gitlin et al, "On the Design of Gradient Algorithms for Digitally Implemented
Adaptive Filters," IEEE Transactions on Circuit Theory, March 1973.
[31] C. Caraiscos and B. Liu, "A Roundoff Error Analysis of the LMS Adaptive
Algorithm," IEEE Transactions on ASSP, February, 1984.
[32] J. Cioffi, "Limited-Precision Effects in Adaptive Filtering," IEEE Transactions on
Circuits and Systems, July 1987.
[33] R. Crochiere, R. Cox, and J. Johnson, "Real-Time Speech Coding," IEEE Transactions on Communications, April 1982.


List of Appendices for Implementation of Adaptive Filters with the
TMS320C25 and TMS320C30

Appendix   Title
A1         Transversal Structure with LMS Algorithm Using the TMS320C25
A2         Transversal Structure with LMS Algorithm Using the TMS320C30
B1         Symmetric Transversal Structure with LMS Algorithm Using the TMS320C25
B2         Symmetric Transversal Structure with LMS Algorithm Using the TMS320C30
C1         Lattice Structure with LMS Algorithm Using the TMS320C25
C2         Lattice Structure with LMS Algorithm Using the TMS320C30
D1         Transversal Structure with Normalized LMS Algorithm Using the TMS320C25
D2         Transversal Structure with Normalized LMS Algorithm Using the TMS320C30
E1         Transversal Structure with Sign-Error LMS Algorithm Using the TMS320C25
E2         Transversal Structure with Sign-Error LMS Algorithm Using the TMS320C30
F1         Transversal Structure with Sign-Sign LMS Algorithm Using the TMS320C25
F2         Transversal Structure with Sign-Sign LMS Algorithm Using the TMS320C30
G1         Transversal Structure with Leaky LMS Algorithm Using the TMS320C25
G2         Transversal Structure with Leaky LMS Algorithm Using the TMS320C30
H1         Assembly Subroutine of Transversal Structure with LMS Algorithm Using the TMS320C25
H2         Linker Command File for Assembly Main Program Calling a TMS320C25 Adaptive LMS Transversal Filter Subroutine
H3         TMS320C30 Adaptive Filter Initialization Program
H4         Assembly Subroutine of Transversal Structure with LMS Algorithm Using the TMS320C30
H5         Linker Command File for Assembly Main Program Calling the TMS320C30 Adaptive LMS Transversal Filter Subroutine
I1         C Subroutine of Transversal Structure with LMS Algorithm Using the TMS320C25
I2         C Subroutine of Transversal Structure with LMS Algorithm Using the TMS320C30


        .title  'TLMS'
*****************************************************************
*
*  TLMS : Adaptive Filter Using Transversal Structure
*         and LMS Algorithm, Looped Code
*
*  Algorithm:
*
*      d(n) -------------------------:
*                                    :+
*                                  (SUM)--> e(n)
*                                    :-
*      x(n) ---------: AF :----------:-----> y(n)
*
*             63
*      y(n) = SUM w(k)*x(n-k)        k=0,1,2,...,63
*             k=0
*
*      e(n) = d(n) - y(n)
*
*      w(k) = w(k) + u*e(n)*x(n-k)   k=0,1,2,...,63
*
*      Where we use filter order = 64 and mu = 0.01.
*
*  Note: This source program is the generic version; I/O configuration has
*        not been set up. User has to modify the main routine for specific
*        application.
*
*  Initial condition:
*      1) PM status bit should be equal to 01.
*      2) SXM status bit should be set to 1.
*      3) The current DP (data memory page pointer) should be page 0.
*      4) Data memory ONE should be 1.
*      5) Data memory U should be 327.
*
*                               Chen, Chein-Chung  February, 1989
*
*****************************************************************
*
*  DEFINE PARAMETERS
*
ORDER:  .equ    64
PAGE0:  .equ    0
*
*  DEFINE ADDRESSES OF BUFFER AND COEFFICIENTS
*
X0:     .usect  "buffer",ORDER-1
XN:     .usect  "buffer",1
WN:     .usect  "coeffs",ORDER
*
*  RESERVE ADDRESSES FOR PARAMETERS
*
D:      .usect  "parameters",1
Y:      .usect  "parameters",1
ERR:    .usect  "parameters",1
ONE:    .usect  "parameters",1
U:      .usect  "parameters",1
ERRF:   .usect  "parameters",1
*
*  PERFORM THE ADAPTIVE FILTER
*
        .text
*
*  ESTIMATE THE SIGNAL Y
*
        LARP    AR3
        CNFP                    ; Configure B0 as program memory
        MPYK    0               ; Clear the P register
        LAC     ONE,15          ; Using rounding
        LRLK    AR3,XN          ; Point to the oldest sample
FIR     RPTK    ORDER-1         ; Repeat N times
        MACD    WN+0fd00h,*-    ; Estimate Y(n)
        CNFD                    ; Configure B0 as data memory
        APAC
        SACH    Y               ; Store the filter output
*
*  COMPUTE THE ERROR
*
        NEG                     ; ACC = - Y(n)
        ADDH    D               ; ERR(n) = D(n) - Y(n)
        SACH    ERR
*
*  UPDATE THE WEIGHTS
*
        LT      ERR             ; T = ERR(n)
        MPY     U               ; P = U * ERR(n)
        PAC
        ADD     ONE,15          ; Round the result
        SACH    ERRF            ; ERRF = U * ERR(n)
        LARK    AR1,ORDER-1     ; Set up counter
        LRLK    AR2,WN          ; Point to the coefficients
        LRLK    AR3,XN+1        ; Point to the data sample
        LT      ERRF            ; T register = U * ERR(n)
        MPY     *-,AR2          ; P = U * ERR(n) * X(n-k)
ADAPT   ZALR    *,AR3           ; Load ACCH with W(k,n) & round
        MPYA    *-,AR2          ; W(k,n+1) = W(k,n) + P
*                               ; P = U * ERR(n) * X(n-k)
        SACH    *+,0,AR1        ; Store W(k,n+1)
        BANZ    ADAPT,*-,AR2
FINISH
        .end
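The TMS320C25 listings ask for data memory U to hold 327 when mu = 0.01. A minimal sketch of where that number comes from, assuming the step size is stored as a 16-bit Q15 fraction obtained by truncation (the helper name `to_q15` is illustrative and not part of the original program):

```python
def to_q15(value):
    """Truncate a fraction in [-1, 1) to a Q15 integer (assumed convention)."""
    return int(value * 32768)        # 2**15 steps per unit; int() drops the fraction

U = to_q15(0.01)                     # 0.01 * 32768 = 327.68, truncated
NEGMU = -U                           # negated step, as used by the sign-error variant
STEP_ACTUALLY_USED = U / 32768       # the step size the fixed-point filter really applies

print(U, NEGMU, STEP_ACTUALLY_USED)
```

Rounding instead of truncating would give 328, so the stored value 327 corresponds to a step size slightly below 0.01.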

*****************************************************************
*
*  T30 - Adaptive transversal filter with LMS algorithm
*        using the TMS320C30
*
*  Algorithm:
*
*      d(n) --------------------------:
*                                     :+
*                                   (SUM)--> e(n)
*                                     :-
*      x(n) ----: AF :----------------:----> y(n)
*
*             63
*      y(n) = SUM w(k)*x(n-k)        k=0,1,2,...,63
*             k=0
*
*      e(n) = d(n) - y(n)
*
*      w(k) = w(k) + u*e(n)*x(n-k)   k=0,1,2,...,63
*
*      Where we use filter order = 64 and mu = 0.01.
*
*                               Chen, Chein-Chung  March, 1989
*
*****************************************************************
        .copy   "adapfltr.int"
order   .set    64
mu      .set    0.01
*
*  PERFORM ADAPTIVE FILTER
*
*  INITIALIZE POINTERS AND ARRAYS
*
        .text
begin   .set    $
        LDI     order,BK                ; Set up circular buffer
        LDP     @xn_addr                ; Set data page
        LDI     @xn_addr,AR0            ; Set pointer for x[]
        LDI     @wn_addr,AR1            ; Set pointer for w[]
        LDF     0.0,R0                  ; R0 = 0.0
        RPTS    order-1
        STF     R0,*AR0++(1)%           ; x[] = 0
||      STF     R0,*AR1++(1)%           ; w[] = 0
        LDI     @in_addr,AR6            ; Set pointer for input ports
        LDI     @out_addr,AR7           ; Set pointer for output ports
input:
        LDF     *AR6,R7                 ; Input d(n)
||      LDF     *+AR6(1),R6             ; Input x(n)
        STF     R6,*AR0                 ; Insert x(n) to buffer
*
*  COMPUTE FILTER OUTPUT y(n)
*
        LDF     0.0,R2                  ; R2 = 0.0
        MPYF3   *AR0++(1)%,*AR1++(1)%,R1
        RPTS    order-2
        MPYF3   *AR0++(1)%,*AR1++(1)%,R1
||      ADDF3   R1,R2,R2                ; y(n) = w[].x[]
        ADDF    R1,R2                   ; Include last result
*
*  COMPUTE ERROR SIGNAL e(n) AND OUTPUT y(n) AND e(n) SIGNALS
*
        SUBF    R2,R7                   ; e(n) = d(n) - y(n)
        STF     R2,*AR7                 ; Send out y(n)
||      STF     R7,*+AR7(1)             ; Send out e(n)
*
*  UPDATE WEIGHTS
*
        MPYF    @u,R7                   ; R7 = e(n) * u
        MPYF3   *AR0++(1)%,R7,R1        ; R1 = e(n) * u * x(n)
        LDI     order-3,RC              ; Initialize repeat counter
        RPTB    LMS                     ; Do i = 0, N-3
        MPYF3   *AR0++(1)%,R7,R1        ; R1 = e(n) * u * x(n-i-1)
||      ADDF3   *AR1,R1,R2              ; R2 = wi(n) + e(n) * u * x(n-i)
LMS     STF     R2,*AR1++(1)%           ; wi(n+1) = wi(n) + e(n) * u * x(n-i)
        MPYF3   *AR0,R7,R1              ; For i = N - 2
||      ADDF3   *AR1,R1,R2
        BD      input                   ; Delay branch
        STF     R2,*AR1++(1)%           ; wi(n+1) = wi(n) + e(n) * u * x(n-i)
        ADDF3   *AR1,R1,R2
        STF     R2,*AR1++(1)%           ; Update last w
*
*  DEFINE CONSTANTS
*
xn      .usect  "buffer",order
wn      .usect  "coeffs",order
in_addr .usect  "vars",1
out_addr .usect "vars",1
xn_addr .usect  "vars",1
wn_addr .usect  "vars",1
u       .usect  "vars",1
cinit   .sect   ".cinit"
        .word   5,in_addr
        .word   0804000h
        .word   0804002h
        .word   xn
        .word   wn
        .float  mu
        .end
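The LMS recursion stated in the transversal listings (y(n) = SUM w(k)*x(n-k), e(n) = d(n) - y(n), w(k) = w(k) + u*e(n)*x(n-k)) can be sketched in floating point as follows. This is an illustrative model, not the TI code: the function names, the 4-tap test system, and the white-noise input are assumptions chosen to keep the example small.

```python
import random

def lms_step(w, x_hist, d, mu):
    """One LMS iteration; x_hist[k] holds x(n-k) and w is updated in place."""
    y = sum(wk * xk for wk, xk in zip(w, x_hist))   # y(n) = SUM w(k) x(n-k)
    e = d - y                                       # e(n) = d(n) - y(n)
    for k, xk in enumerate(x_hist):
        w[k] += mu * e * xk                         # w(k) = w(k) + mu e(n) x(n-k)
    return y, e

def identify(h, mu=0.05, steps=3000, seed=0):
    """Adapt w toward an unknown FIR system h driven by uniform white noise."""
    rng = random.Random(seed)
    w = [0.0] * len(h)
    x_hist = [0.0] * len(h)
    for _ in range(steps):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]      # shift in x(n)
        d = sum(hk * xk for hk, xk in zip(h, x_hist))        # desired signal
        lms_step(w, x_hist, d, mu)
    return w

print(identify([0.5, -0.25, 0.125, 0.1]))
```

The assembly versions fold the delay-line shift into the filter loop (MACD on the TMS320C25, circular addressing on the TMS320C30); the list slicing here only stands in for that.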

.titl.

'Y25'

tHHHtHfHHfHHHfIIIIIII.IIIIIIIIIIIIIIIHHfHHlHHHHtlffH

Y25 : Adaptive Filter Using Symmetric Transversal Structure
and LMS Algorithm, Looped Code

~

,110)

~
~

:--: :

(StII)

~.

~

~

:--:

\

(StII)

:-:

:-:

\

(StII):

:-:

z:

1-:
\

:-:

. used

VI

oU51et

ERR:

.used
• used
ouseet
• used

(J£:

U:
ERfiF:

Q'-<

::3. 9

·parutters" ,I
·~r... ttr5",1
"pi.ruetets", 1
"~rUlttel'5 II ,1
"pi.rueters" ,1
"pi.raeters·,l

~9

9

~

~

tHffHHHf+IHHH+HffHltHHH

d!§
;~

SYMETRIC IlFFER ADDITION

=StII .(k)lx(o-t)

lJIRP

k=O.1.2 ••••• 31

if
~

"ere .. uu fi Iter order

LARK
LRLK

PO

=.(le)

LRLK

+ uftoln)+ziCn-td k-O,l,2, •• 31

SVlI

=64 ud 1M! =0.01.

LRLK
LAC
ADD
S4ICL

IIANZ
Note: This source program is the generic version; I/O configuration has
not been set up. User has to modify the main routine for specific
application.

Initial condition:
1) PM status bit should be equal to 01.
2) SXM status bit should be set to logic 1.
3) The current DP (data memory page pointer) should be page 0.
4) Data memory ONE should be 1.
5) Data memory U should be 327.

Chen, Chein-Chung February, 1989

;
;
;
I

SIt up the CouAter
Point to oldest dati.
Point to ntwlt dati
Point to first buff'r

f-,O,AR4

".O.ARI
SVlI ..... AR3

: Bufftr(k) • IlAno+k) + IlAno-ll+k)

FIR

ltPV)(

o

LAC
LRLK
RPTK
MCD
CN'D
PnC

lIE. 15
AR3.lASIIlF
~

rjioo

z(n-k) = x(n-k) + x(n-63+k)   k=0,1,...,31

.. (kJ

c

·cotffs",1

~.

• text

...,

C

.us.ct

g..

PERFORII Tl£ ADAPTIVE FILTER

y(o)

Q

.used
.used

Alg.ritho:

Q
c
N

\

III:

"bufftr" ,(JU£R2-1
"buffer·, I
·cOfffs· ,DRIlER2
·coeffs·,CRDER-l

D:

a

aEFFlCIENTS

IlESfRYE AOOR£SSES All PMAl£TERS

z :••• -:-: z :---:-:

:-:

I-J

All)

.used
.useet

1-:

(StII)

:-: Z :-:-:

N

~

:

~

114
32

FRSBIF:
lASIIIF:
FRSDATI
LASIlAT:

xln) ---J---J Z :-1--: Z : ••• : - : Z : - : - - :
:--:
I-I
:-1
:/ :-:
:/
:/
:/

So
So

"

:------:

:-::--1

~

~

,lIo-k)

:------1

~.

•• qu
•• qu

JEFlIE AOOR£SSES rE IlFFER

- - - - - yin)
+:+
: A. F. :----)'SlItI--) tin)

~

rEFllE PlIIW£TERS

1IIIER2:

d(o) - - - - - - - - - - - - - - - - - - - - - :

;:

~

•

"IlUER:

Configure B0 as program memory
Clear the P register
Using rounding
Point to the oldest buffer
Repeat N/2 times
Estimate Y(n)
Configure B0 as data memory

; Store the filter output

....~
~

~

~

lEG
ADD
SACH

; ACe

D.15
ERR

=- YIn)

; ERRln) = Din) - Yin)

lI'MTE 11£ !EIGHTS

~

r
~

~

;;J

;t

s:
~

~
~

N

n
.."
C

....

~

~

~
N
C

Q
c

~1.0RDERl-1

LRlJ(

AR2.hW
M3.LASBUF

ZALR

if>YA
SACH
BANl

~

:!l

LARK

LT
IPY
AD4\PT

~

~
::to

ERR

LRlJ(

is

g'

LT
IPY
PAC
ADD
SACH

U
()E. 15

ERRF

ERRF
1-,M2
1.M3
.-,M2
oH,O,ARl

; T = ERRln)
; p = U I ERIIln)

; Round the rtsult
; ERRF = U • ERIIln)
; Stt up countel'
; Point to tht coefficients
; Point to the last bufftt
; T reghter = U * ERR(n)
; P =U .. ERR(n) .. K(n-k)
; Load ACCH with A(k,nl & round
; WCk,n+ll = W(k,nl + P
; p = U f ERR(n) .. Hn-kl
; Store W(k,n+1)

AMPT ..... AR2

UPDATE DATA POSITION FOR NEXT ITERATION
FINISH
MOOV

AR2, LASDAT-l

Set pOinter

RPTl<

1JdlER-2

II'IIlY

1-

Repeat N-1 times
Shift data for next iteration

LRlJ(

.end
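The symmetric transversal structure first folds the delay line, z(n-k) = x(n-k) + x(n-63+k), and then runs the LMS recursion on the half-length vector z. A minimal floating-point sketch under that reading of the listing header (the function name and the 4-sample example are assumptions for illustration):

```python
def symmetric_lms_step(w, x_hist, d, mu):
    """One symmetric-LMS iteration; x_hist[k] holds x(n-k), len(w) == len(x_hist) // 2."""
    n = len(x_hist)
    half = n // 2
    # Fold the buffer: z(n-k) = x(n-k) + x(n-(N-1)+k)
    z = [x_hist[k] + x_hist[n - 1 - k] for k in range(half)]
    y = sum(wk * zk for wk, zk in zip(w, z))        # y(n) = SUM w(k) z(n-k)
    e = d - y                                       # e(n) = d(n) - y(n)
    for k in range(half):
        w[k] += mu * e * z[k]                       # w(k) = w(k) + mu e(n) z(n-k)
    return y, e

w = [0.0, 0.0]
print(symmetric_lms_step(w, [1.0, 2.0, 3.0, 4.0], 1.0, 0.5), w)
```

Folding halves both the multiply count of the filter and the number of weights to update, which is the point of the structure for linear-phase responses.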

~

~

Algorit""'

...is

zh-kl

g'

:I:

IIt.ER

~

Whtr. we us. fi I ttl" order ::;: fA and au

II'YF3
I I STF
ADIIF3

RI.tAAl++Ul.R3
RI.tAA2++111
R3.R2.R2

>
i

"0
:=

x[n-il + x[n-tN-il

=
....

, yll • on.zll
; Store z(n)
; Acc ..ulitt tilt rfsult

Q..
~

=

tAA4++11 II. tM5--UlX.RI

HHHHH+fHfIIIIIIIIII •• llllltHHlfHHttHtHf

<::!

HlHlHlllllllllllnllllllllllfllll.IIIIIIIIIUII'

II'YF3
II STF
AIIlF

PElf'IRI ADAPTIVE FILTER

..

oreltr

I

• copy

·i.Rpfltr.int·

...t

64
0.01

.set

SUIIF
STF
II STF

; Filter order
; Step size

.set

~
N

C

Q

"'.,C"

UIF

S<1>

RPIS

~

"'IS

SlF

STF

II SlF

~

STF
II STF
Ull
lOl

N

C

order, 11K
bJuddr
Ixn..lddr,ARO
......ddr.ARI
Izn-&ddr 1 AR2
ordfr12-1,IRO
O.O.RO
01'411'-1
RO._11I1
ordtr12-2
RO. tAAl++UI
RO,tAA2++UI
RO.tARl--UROI
RO, tAR2--UROI

;
;
;
;

Set
Set
Set
Set

ditl patt
pointer fir
pOinttl' for w[)
point.r for z[)

xn

; Set index pOinter
; SO ;, 0.0

lOl
STF

lIlt]

..

0
=0
=0
=0
point.r for input portl

:I:

' •• t .....dr .AR7
_.R7
tIt1R6U1.R6
ARO,IIR4
R6, tARO-Ult

Inp.t dlnl
Inp.t xlnl
Set Elrwrd pOinter for xU
Insert x(n) to buff...

CIIII'IfIE FILTER OOTPUT ylnl

UtS

,xll=O

input:

UIF
II UIF

II'YF
1I'YF3
Ull
RPlB
II'YF3
I I ADIIF3
STF
II'YF3
IIOOF3
BD
STF
ADIIF3
STF

t Set up circulf.r buffer

zll
oil
z[J
Set
Set point.r for output ports

liLlddr ,AR6

....
9
:r~
9 ....
""l
~ ;;.

R2.R7
R2,tAR7
R7.t+M7Ul

; ten) ::;: dIn) - yin)

; Send out yIn)

....

; Stnd out ten)

fIl

==

lfDATE !.EIGHTS olnl

•text
lOl
L.II'
lOl
Ull
lOl
Ull

RI.tAAl--llROI,R3 , yll =o!l.z[]
; Store zln)
RI. tAR2--IIROI
; Inc1udt lut rtsult
R3.R2

COIf'UTE ERRal SIGNAL ,Inl IIIIl OOTPUT ylnl lAd 'Inl SIGNALS

INITIIII.IlE POINTERS IIIIl ARRAYS

begin

>~
-00
~'<
:3. 9

; zln) =- x[n-i] + x[n... il

s: 0.01

~

~

IItD

tAA4++1 Ill, 1/IR5--1 Ill,RI
; zen)

AIIlF3

~
<1>

Q
C

RPlB

w(k) = w(k) + u*e(n)*z(n-k)   k=0,1,2,...,31

;{i

~

; R2 • 0.0
; Set bact.rd point.r for x[)

e(n) = d(n) - y(n)

~:::to

S<1>

0.0.R2
ARO.M5
order/2-2.Rt

ADIIF3
xCn-k-lI + xln-63+kl k=O,I, ...• 31

       31
y(n) = SUM w(k)*z(n-k)   k=0,1,2,...,31
       k=0

.s;,

~

Ull
Ull

Y30 - Adaptive symmetric transversal filter with
LMS algorithm using the TMS320C30

I

~

s:

UIF

Hf+HHlllllllllllllllllllllllllllllln.llllllllllllllHHMH9H

I

P"I'=
:rfll
~ <

; R7 = e(n) " u
; Rt = e(n) " u " ztn)

; lnitiiliz. repot counter
; Do i = 0, N-3

; Rt =ten) .. u " zen-i-l)
; R2 = lIIIi(n) + etn) " u I zh-i)
; 1IIIi

IItFFERS IN! ClEFFICIOOS

-i

Algorit"":

~

64
(F

K1=

n:

~

N

:-J

-i

:--:

IEFIIE AlllRESSES

-iZi-i-)ISUlI--) •••• ---iZi-i-)ISUlI-)bilnl
bO(nH-J
bUn)
bi-lIn) J-:

~

'"

tki-l :

ItO i

~

So

i--J-:
tki-l

1«0

~

••qu

li-llnl

x(n)-J

't:i
::to

c

11Inl

-1-)(SltIl--} .... ---J-}(SlJU--)fHn)
ii-

::...

~
..,

+

10lnl

~

~

LI'tS Algorit_, Looped Code

!IIIIER:

is
§

So
So

IEFIIE PIIRAIETERS

L25: Adaptive Filter Using Lattice Structure

::to

s.

'125'

HfHfHHfIIIIIIIIIIIIIIIIIIIIIlIIIIIHHtHHHt+tHHHfftHHtHf

i=I , 2, •• 64

=64 and au =0.01.

Note: This source program is the generic version; I/O configuration has
not been set up. User has to modify the main routine for specific
application.
Initial condition:
1) PM status bit should be equal to 01.
2) SXM status bit should be set to logic 1.
3) The current DP (data memory page pointer) should be page 0.
4) Data memory U should be 327.
5) The 81 &: BOI pointer 1M3 &: AR4) should be exchanged every
iteri.tion. For exuple,
For odd iteration: AR3 -) 81
M4 -) BIll
For even iterdion: AR3 -) BOI
IIR4 -) BI

C'-).,

===

UIRP

l1li3

!Mf'D

iJIII(
lAU(
lAU(
lAU(
lAU(
lAU(

MI,IRDElH
AR2,FI
11113,BI
M4,BIlI
IRi,GI
M6,KI

flit"'"

INITIALIZE Tl£ BI
UIC
SACL
SACL

All)

~~

Q~

FI

C'-)

==

e::
~
.,

<,O,AR2
<,0,l1li3

INITIALIZATZIlI
LT
l'PY
PiIC

SACH
lEG
SACH

<,1Ri
<,AR2

~

T =BI
P=BI1

11

~wU~·~!·i

~

~

I

~~i;~i~~~

•• '3


~B~i;~:oi

~
~

i

I

~ •• *

~

L30: Adaptive Lattice Structure Filter with

~

Algorithll:

is

fUn)

;:s

bUn) = bi-Hn-l) - KHnJ
i-I

~

~

5'
~

~::l'.

~
~
::;-

= fi-1(nJ

Kilntll = Kiln)

I

fi-Unl

i=1,2, ••• ,64

au

t

f [

fHh)fbi-Hn-l) + bilnlffi-Hn) J

f

,Hnl

f

bHnl

i=1,2, .. 64

littice
HHHlfHHHHHHIIIIIIIIIIIIIIIIIIIIII.IIHtHlHHHffffHH

~

•copy

idi.pfltr.int·

PEll'CRI AIlAPTII,£ FILTER

~

tHffffoJlff*fHfHHHHHHHfffHHHHHHH+

ord't

V,

•

.s.t
.set

OIl

C

0.04

INITIALIZE PUINTERS

....

AlII)

ARRAYS

.text

. set

begin

LDI
LII'
LDI
LDI
LDI
LDI

~

~
N

C

a
C

ordert 2,BK
Ikn_addl'
Ikn..i.ddr,MO
Ibn_odd' ,ARI

;
;
;
;

'truddr ,M2

; Set pointer for g[J

Set
Set
Set
Set

up circular buffer

IIPTS
STF
:: STF
ADDI
LDI
LDI

.useet

pointer for k[)
pointer for b[]

b.
in_addr
out_addr

.useet
.useet
.useet
.useet
.useet

tn_addl'

O.O,RG
orderf2-1
RO,_lIll
RO,

.useet

g.

, RG = 0.0
, k11 • 0.0 ond gIl = 0.0
= 0.0 ..d bd[] = 0.0

, b[]

input:

LDF
:: LDF

•kn

; Input d(n)
; Input xln)

bn..addr
gn __ ddr
u
cinit

.useet
.useet
•sect
•.ord
.lIIord
.word

.word
, .word
•..ord
.1I ..t
.,nd

-~-.---

- -

--

------.~

input
R4,R6
Rb,1M7
R7,'+AR](1l


'C
'C
~

Q.
=
...
~

n
N

.

~t-

...
= -~...
(Ij

(JQ

De h,y branch
Take out lut tel'll

Updde ~9(] pointer

~

=-~

Send out yen)
Stnd out .(n)
Update k[] pointer

~

-00
"1

~=

:::ll
001:;
~~

~~

~=

~=-

DEFIlE ClIISTANTS

ditl page

order,IRO

LDF

SO
SUBf
STF
:: STF
LDI
II LDI

; Fi Itfr order
; Step sizt

64

rl'YF
ADDF3
rl'VF3
STF
AIlDF
SUBF

Bl • Gl
Insett Bl
E=D-BlTIj

- Kith) • bi-Hn-l)

GHn+1J = Gill'll + IU

~

Q

II'YF3
STF
SUBf

ADDF

~

N

::

                i
ei(n) = d(n) - SUM yk(n) = ei-1(n) - bi(n)*Gi(n)   i=1,2,...,64
               k=0

       64            64
y(n) = SUM yi(n)  =  SUM bi(n)*Gi(n)
       i=1           i=1

~

SS-

u.s Algorithl

using the l1tS32OC3O

·cDtffs". order
"cotffs', order

'buffer" ,2torder
·vars' ,1

'vars',l
"vii's· .1
1
,1
'viI's',1
'v.rs',l
',dnit'
',iruddr
O804OOOh
0804002h
kn
bn
gn

-""'5

.

=t-

:::

00

-...>

(JQ

-=C
"1

9

.title

~

"Te"

Y:
ERR:
M:

TN25: Adaptive Filter Using Transversal Structure
and Normalized LMS Algorithm, Looped Code
Algoritha:

U:

ERRF:
!JAR:

,used
.useet
• usect
.ustet
.useet
• usect

"paruet.rs· ,1
"pt.rueters·.J
"parUM!t.rs" ,1
·pirUtters·,!
"parUtt,rs·,1
"piI'uet.rs" ,1

f

ffHfllllllllllllllllllllll •• 111111

f'ERFORIt TI£ ADAPTI"; FILTER
:iHHHHfHffll.I.111111111111I111

63

y(n)

~

ESTIIlATE TI£ P!IER

-

e(n) = d(n) - y(n)
var(k) = (1-r) * var(k-1) + r * x(n) * x(n)
w(k) = w(k) + u*e(n)*x(n-k)/var(k)   k=0,1,2,...,63

s

gO

~

hr.

.

•

lilt

ust filter orlftr = fA and au

LRLK

SPH

=0.01.

Note: This source program is the generic version; I/O configuration has
not been set up. User has to modify the main routine for specific
application.

Initial condition:
1) PM status bit should be equal to 01.
2) SXM status bit should be set to 1.
3) The current DP (data memory page pointer) should be page 0.
4) Data memory ONE should be 1.
5) Data memory U should be 327.
6) Data memory VAR should be initialized to 07fffh.

~

'"~

~.

~

Q
Sl

s..

1HfIHlllllllllllllllllffHlHfIHtHHIHfHHHf

IEFIIE PM/IIETERS
1I1lIER:
SHIFT:
PAGEO:

•lO:

••qu
.Iflu
.eflu

IEFIIE AIIIIIESSES

64
7

o
(F

IlFFER

All)

Ci£FFICIENT5

IN:

.us.et

~

.UI.et

-Cotff5-,~

~

•

Q
c

SUB
IIIlD

ERRf
VM
VM,SHIFT
ERRf,SHIFT

J ACe

SACH

VM

; I Xln)
; Store VARfn)

=

; !ICC \IM(n-l)
; ACt = (1-r) • VAfUn-l)

= (1-r)

VARfn-1) + r

f

*' xenl

D~

.useet

I£SERI'E AIIIIIESSES All PM/IIETERS

.us.et

-parutttrs-,1

§"rg"1

....

o

Configure BO as progru ....ry
eliif' the P r.gisttr

LRLK

!IE, 15
I1R3,XH

Using rounding
Point to the oldest sUCl'le

IIf'TI(

OOER-I

Repeat" ti...

=
.. =

Mel)

IIl+OfdOOh,'-

EstiRte yen)
Configur. BO as data MlIOry

1-3E;

CIf'I'
II'YK
LAC

FIR

=el

Or:IJ
r:IJ !.

~

SACH

; Store tM fi I ttr output

C!»IPIITE TI£ ERRm
lEG
ADDl

D

SACH

ERR

=- YIn)
ERR(n) =D(n)

; !ICC

;

- YIn)

II'DATE TI£ IEIGHTS

..:

~

~

>~
EI-3
Q "1
"1 s=

; Point to input sigMl X
; Squire input signal

CIFD

Chen, Chein-Chung February, 1989

-buff.r-,IJttER-1
-buff.r-,1

'"

51-'

ESTIIlATE TI£ SI-' Y

::t.

~

(F

AR3
AR3,IO

UIAI'

llUf

I

t

s..
s..
'"

~

• text

k=O,l.2 ••••• 63

koO

I
~

=SlIt w(kl*X(n-k)

t:I.l
tJQ:;-

=-~
I'D
....

~I'D
t:I.l~
~

....

~~
NQ

fAa

!.

LT

ERR

; T • ERR(n)

IFf

U

; P = U 4- ERRfD)

PAC
ADD

~.

!IE, 15

; Round the result

Q..

; PIIk. dividend positiVI
; Repnt 15 tilltS

t:I.l

-'IZEC!JI'IERlEFIICTII!
ABS
IIf'TI(

14

5U8C
BIT

VM
ERR, 0

; P.rfor. U *' lEARfn): I VM.
; Check sign of ERfUn)

~

~

t

IEXT

is
::to
§

.:;,

~
~

~.

So
So
~

~
~

N

C

Q
...<::l
So
~

~
~

N

C

ac
~

ERRF

; Store ERRF

LARK

ARl,ORIEIH

LRI.J(

AR2,1oN
AR3,XN+l
ERRF

; Set up counter
; Point to the coefficients
; Point to the dih sa.ples

LRlK

rl'Y
ADAPT

~

~.

NEXT

LT

.§"
:!l

8l!1
lEG
SAC!.

F1NISii

; ERRF = - U f :ERRlnJ: I I'AA

rl'YA

*-,M2
., M3
f-,AR2

SACH
BANI

H,O,ARl
AOAPT, f-, AR2

I~R

.end

; T register = U f ERRln)
; P

=U f

ERRCn)

* Xtn-k)

; Load ACCH lIIith A(k,n) rt round
; Wlk,ntl) = W(k,nl t P
; p =U .. ERR(n) * Xln-kl
; Store Wlk,n+lJ

N
VI

01
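The normalized LMS listing tracks the input power with a one-pole average, var = (1-r)*var + r*x(n)*x(n), and divides the weight correction by it. A minimal floating-point sketch of one iteration under that form of the recursion (the function name and example values are illustrative assumptions; the TMS320C30 version of the listing writes the same average with the roles of r and 1-r exchanged):

```python
def nlms_step(w, x_hist, d, mu, var, r):
    """One normalized-LMS iteration; returns (y, e, updated power estimate)."""
    x_new = x_hist[0]
    var = (1.0 - r) * var + r * x_new * x_new       # running estimate of input power
    y = sum(wk * xk for wk, xk in zip(w, x_hist))   # y(n) = SUM w(k) x(n-k)
    e = d - y                                       # e(n) = d(n) - y(n)
    gain = mu * e / var                             # step scaled by 1/var
    for k, xk in enumerate(x_hist):
        w[k] += gain * xk                           # w(k) += mu e(n) x(n-k) / var
    return y, e, var

w = [0.0, 0.0]
print(nlms_step(w, [2.0, 1.0], 1.0, 0.5, 1.0, 0.5), w)
```

Starting var at a positive value (the fixed-point listing asks for 07fffh, close to 1.0 in Q15) keeps the division well defined before the estimate has settled.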

HHHH ... IIIIIIIIIIIII.II.I ••• IIIIIIIIIIIIII.ttHHHHHHHf

•

·

•

ESTIMTE TI£ POlO IF TI£ III'UT SIGIIIIl.

IPYF
IPYF
LIF
IPYF

TN30 - Adaptive transversal filter with Normalized LMS algorithm
using the TMS320C30
A)gorithi'

vuh}

l

= rlvv(A-1)

IIItkl

S
:::to
§

I:

~
~

i'\
~

~.

So

..

...t

pollltr

.Stt

olpllo
.lphol

.ltt
.set

01'41.1'

.Itt

bl,in

LDI
LDI
LIF
APTS
STF
II STF
LDI
LDI

.,O

So

a
C

,t.xt
... t
LDI

LIP

Q

c

R2,R7

; tlnl = dIn) - yen)

OOTPIJT yIn) NIl .In) SIGIIIILS

64
0.01
1,0
0.996
0,004

; Fi ittI' order
; Step sizt
; Input signll

STF
:: STF

; 1.0 - llpho

•

PUSIF
ASH

,

NEGI
SUBI

.rder,1II(

I SIt up circuliI' buffer

ASH

"n-lINr
IxIuddr ,AM
houddr.lIRl
O.O,RO
.rdo.... l
RO, tMO++Ill1
RO, tMl++IIlX
lin..a4dr,AR6

I Sot dit. PIg.
I Sot pointor for x[l
! Sot p.inter for .11
; RO = 0.0

PUSH
PII'F

tout-lddr,M7

; xll • 0
0
, Stt point.r fer ilput ports

II'YF
SUIIIF

II'YF

; Set point.r for output porb

inputt

_,R7
HllR6I1l,R6
R6,_

R3
R2
-.24,R2
R2
l,R2
24,R2
R2
R2
R2,I3,RO
2.0,RO
RO,R2

IPYF
SUIIIF

; Input dIn)
I Input xln)

; InHrt x(n) to blfffer

IPYF

R2,A3,RO
2.0,RO
RO,R2

; Cooput. 1I..rln)
; vu(n) :II I I 2.

N
~.

rJ'Q~
Q ...

...... =
==

a=-<~

.... 1:Il

-

O~
I:Il

.... t'I:l
....
== ...

rJ'Q

... =
~~

t'I:l
; Now

11ft

bav.

~-1

; Now R2 = x[0] = 1.0 * 2**(-e-1)
; R0 = v * x[0]
; R0 = 2.0 - v * x[0]
; R2 = x[1] = x[0] * (2.0 - v * x[0])
; R0 = v * x[1]
; R0 = 2.0 - v * x[1]
=x[2] • xlll • 12.0 - v • x[m

! R2

x[21

;ROlIIy

IPYF

R2,R3,RO
2,O,RO
RO,R2

IPYF

R2,I3,RO

; RO-ytx[31

SUIIIF

el

~~

11;1

IPYF
LIF
II LIF
STF

; Stnd out yIn)
; Send out tin)

lfIIATE !EIGHTS .In)

POP

; w[)

R2,1AR7
R7,_711)

pOWI'

INITIALIZE POINTERS MID _VS

C

~

tMO++I 1ll,tMl++ll)1,RI
; yIn) = .[l,xll
RI,R2,R2
; Include lISt result
RI,R2

ERROR SIGIIIIl. .In)
5IJIF

PERFlRt AD1PTIVE FILTER

~

~
N

-

~

HtfffHHltHlllllllllllIllIllllllllIHHHH

N

'"

CIIPIIiE

I R2 • 0,0

"HHlftltIIIIIIIlIIIIlHIIIIIIIIIIIIIIIII.1111

ir
~

•

~

tMO++Ull, tMl++llll,Rl
R6,R3
R3, ...r
I Atltor. wrh)
ordtr-2

II AD1F3

"in,fltr. int"

==
....

IPYF3

IPYF3

=0.01.

Ind IU

Hf.IHHHHfffHHHflHlHHHHfHHtIHfIHfHI

•

>
i
"C

=r * v.,.la-1)

0,0,R2

STF
APTS

Chen, Chein-Chung March, 1989

• copy

, R3

LIF

1:_

lII(leJ.+ u.. tnltxh-Jc:)/w.rh" k=O,1,2, •• 63

Wbtl't' we uu fHter order· 64

~

~:::to

U-r)h:(ft)~(n)

e(n) = d(n) - y(n)

~

~

+

;R6-x2
; R6 = II-r) • x2

COI'\JTE FILTER OOTPUT yIn)

       63
y(n) = SUM w(k)*x(n-k)   k=0,1,2,...,63
       k=0

~

R6,R6
1r_1,R6
1r,.13
.... ,13

f

I RO =2.0 - v • x[21
; R2 • x(3) = x[2) • 12,0 - v • x[2))

~.

N==z
~

....

nQ
~ ...
=a
e-N....
~

~

t~

t'I:l

!il

i5 i~1 i~~~~fi~Ei~~i~ I,. ~~~~~~~~~~~~~~~~~~~~~

.~.o;~HLL~~
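The comments in the TN30 listing (x[0] = 1.0 * 2**(-e-1), then x[1] = x[0] * (2.0 - v * x[0]), and so on) describe a Newton-Raphson reciprocal, which the TMS320C30 code uses in place of a divide by the power estimate v. A minimal sketch, assuming a positive v and five iterations (the iteration count and the function name are assumptions, and `math.frexp` stands in for the exponent extraction done with integer shifts in the assembly):

```python
import math

def reciprocal(v, iterations=5):
    """Approximate 1/v for v > 0 with x[k+1] = x[k] * (2 - v * x[k])."""
    if v <= 0.0:
        raise ValueError("v must be positive")
    _, exp = math.frexp(v)           # v = m * 2**exp with 0.5 <= m < 1
    x = math.ldexp(1.0, -exp)        # initial guess 2**(-exp), so v * x = m
    for _ in range(iterations):
        x = x * (2.0 - v * x)        # relative error squares on every pass
    return x

print(reciprocal(3.0), reciprocal(0.1))
```

With the initial guess chosen this way the starting relative error is at most 0.5, so five passes bring it below about 2.4e-10.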


        .title  'TSE25'
*****************************************************************
*
*  TSE25: Adaptive Filter Using Transversal Structure
*         and Sign-Error LMS Algorithm, Looped Code
*
*  Algorithm:
*
*             63
*      y(n) = SUM w(k)*x(n-k)        k=0,1,2,...,63
*             k=0
*
*      e(n) = d(n) - y(n)
*
*      For k = 0,1,2,...,63
*      w(k) = w(k) + u*x(n-k)   if e(n) >= 0
*      w(k) = w(k) - u*x(n-k)   if e(n) <  0
*
*      Where we use filter order = 64 and mu = 0.01.
*
*  Note: This source program is the generic version; I/O configuration has
*        not been set up. User has to modify the main routine for specific
*        application.
*
*  Initial condition:
*      1) PM status bit should be equal to 01.
*      2) SXM status bit should be set to 1.
*      3) The current DP (data memory page pointer) should be page 0.
*      4) Data memory ONE should be 1.
*      5) Data memory U should be 327.
*      6) Data memory NEGMU should be -327.
*
*                               Chen, Chein-Chung  February, 1989
*
*****************************************************************
*
*  DEFINE PARAMETERS
*
ORDER:  .equ    64
PAGE0:  .equ    0
*
*  DEFINE ADDRESSES OF BUFFER AND COEFFICIENTS
*
X0:     .usect  "buffer",ORDER-1
XN:     .usect  "buffer",1
WN:     .usect  "coeffs",ORDER
*
*  RESERVE ADDRESSES FOR PARAMETERS
*
D:      .usect  "parameters",1
Y:      .usect  "parameters",1
ERR:    .usect  "parameters",1
ONE:    .usect  "parameters",1
U:      .usect  "parameters",1
ERRF:   .usect  "parameters",1
NEGMU:  .usect  "parameters",1
*
*  PERFORM THE ADAPTIVE FILTER
*
        .text
*
*  ESTIMATE THE SIGNAL Y
*
        LARP    AR3
        CNFP                    ; Configure B0 as program memory
        MPYK    0               ; Clear the P register
        LAC     ONE,15          ; Using rounding
        LRLK    AR3,XN          ; Point to the oldest sample
FIR     RPTK    ORDER-1         ; Repeat N times
        MACD    WN+0fd00h,*-    ; Estimate Y(n)
        CNFD                    ; Configure B0 as data memory
        APAC
        SACH    Y               ; Store the filter output
*
*  CHECK THE SIGN OF ERROR
*
        LT      U               ; T register = U
        NEG                     ; ACC = - Y(n)
        ADDH    D               ; ACC = D(n) - Y(n)
        BGEZ    NEXT
        LT      NEGMU           ; T register = -U
*
*  UPDATE THE WEIGHTS
*
NEXT    LARK    AR1,ORDER-1     ; Set up counter
        LRLK    AR2,WN          ; Point to the coefficients
        LRLK    AR3,XN+1        ; Point to the data sample
        MPY     *-,AR2          ; P = U * X(n-k)
ADAPT   ZALR    *,AR3           ; Load ACCH with W(k,n) & round
        MPYA    *-,AR2          ; W(k,n+1) = W(k,n) + P
*                               ; P = U * X(n-k)
        SACH    *+,0,AR1        ; Store W(k,n+1)
        BANZ    ADAPT,*-,AR2
FINISH
        .end

*****************************************************************
*
*  TSE30 - Adaptive transversal filter with Sign-Error LMS
*          algorithm using the TMS320C30
*
*  Algorithm:
*
*             63
*      y(n) = SUM w(k)*x(n-k)        k=0,1,2,...,63
*             k=0
*
*      e(n) = d(n) - y(n)
*
*      For k=0,1,2,...,63
*      w(k) = w(k) + u*x(n-k)   if e(n) >= 0.0
*      w(k) = w(k) - u*x(n-k)   if e(n) <  0.0
*
*      Where we use filter order = 64 and mu = 0.01.
*
*                               Chen, Chein-Chung  March, 1989
*
*****************************************************************
        .copy   "adapfltr.int"
order   .set    64
mu      .set    0.01
*
*  PERFORM ADAPTIVE FILTER
*
*  INITIALIZE POINTERS AND ARRAYS
*
        .text
begin   .set    $
        LDI     order,BK                ; Set up circular buffer
        LDP     @xn_addr                ; Set data page
        LDI     @xn_addr,AR0            ; Set pointer for x[]
        LDI     @wn_addr,AR1            ; Set pointer for w[]
        LDF     0.0,R0                  ; R0 = 0.0
        RPTS    order-1
        STF     R0,*AR0++(1)%           ; x[] = 0
||      STF     R0,*AR1++(1)%           ; w[] = 0
        LDI     @in_addr,AR6            ; Set pointer for input ports
        LDI     @out_addr,AR7           ; Set pointer for output ports
        LDF     @u,R4                   ; R4 = mu
        LDF     @u,R5                   ; R5 = mu
input:
        LDF     *AR6,R7                 ; Input d(n)
||      LDF     *+AR6(1),R6             ; Input x(n)
        STF     R6,*AR0                 ; Insert x(n) to buffer
*
*  COMPUTE FILTER OUTPUT y(n)
*
        LDF     0.0,R2                  ; R2 = 0.0
        MPYF3   *AR0++(1)%,*AR1++(1)%,R1
        RPTS    order-2
        MPYF3   *AR0++(1)%,*AR1++(1)%,R1
||      ADDF3   R1,R2,R2                ; y(n) = w[].x[]
        ADDF    R1,R2                   ; Include last result
*
*  COMPUTE ERROR SIGNAL e(n) AND OUTPUT y(n) AND e(n) SIGNALS
*
        SUBF    R2,R7                   ; e(n) = d(n) - y(n)
        STF     R2,*AR7                 ; Send out y(n)
||      STF     R7,*+AR7(1)             ; Send out e(n)
*
*  UPDATE WEIGHTS w(n)
*
        ASH     -31,R7                  ; Get Sign[e(n)]
        XOR3    R4,R7,R5                ; R5 = S[e(n)] * u
        MPYF3   *AR0++(1)%,R5,R1        ; R1 = S[e(n)] * u * x(n)
        LDI     order-3,RC              ; Initialize repeat counter
        RPTB    SELMS                   ; Do i = 0, N-3
        MPYF3   *AR0++(1)%,R5,R1        ; R1 = S[e(n)] * u * x(n-i-1)
||      ADDF3   *AR1,R1,R2              ; R2 = wi(n) + S[e(n)] * u * x(n-i)
SELMS   STF     R2,*AR1++(1)%           ; wi(n+1) = wi(n) + S[e(n)]*u*x(n-i)
        MPYF3   *AR0,R5,R1              ; For i = N - 2
||      ADDF3   *AR1,R1,R2
        BD      input                   ; Delay branch
        STF     R2,*AR1++(1)%           ; wi(n+1) = wi(n) + S[e(n)]*u*x(n-i)
        ADDF3   *AR1,R1,R2
        STF     R2,*AR1++(1)%           ; Update last w
*
*  DEFINE CONSTANTS
*
xn      .usect  "buffer",order
wn      .usect  "coeffs",order
in_addr .usect  "vars",1
out_addr .usect "vars",1
xn_addr .usect  "vars",1
wn_addr .usect  "vars",1
u       .usect  "vars",1
cinit   .sect   ".cinit"
        .word   5,in_addr
        .word   0804000h
        .word   0804002h
        .word   xn
        .word   wn
        .float  mu
        .end
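The sign-error variant replaces e(n) in the weight update by its sign, so the correction is +u*x(n-k) when e(n) >= 0 and -u*x(n-k) otherwise. A minimal floating-point sketch of one iteration (the function name and example values are illustrative assumptions, not taken from the listings):

```python
def sign_error_lms_step(w, x_hist, d, mu):
    """One sign-error LMS iteration; x_hist[k] holds x(n-k), w is updated in place."""
    y = sum(wk * xk for wk, xk in zip(w, x_hist))   # y(n) = SUM w(k) x(n-k)
    e = d - y                                       # e(n) = d(n) - y(n)
    step = mu if e >= 0.0 else -mu                  # pick +mu or -mu once per sample
    for k, xk in enumerate(x_hist):
        w[k] += step * xk                           # w(k) = w(k) +/- mu x(n-k)
    return y, e

w = [0.0, 0.0]
print(sign_error_lms_step(w, [1.0, -2.0], 1.0, 0.5), w)
```

Because the sign is decided once per sample, the inner loop is the same multiply-accumulate as plain LMS with the product u*e(n) replaced by a preselected constant, which is how the assembly versions keep the update loop unchanged.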

'TSS25'

•Ii tie

B

lWE:

.unct

·parueter,,,1

UI

. u5tCt

·plrueters· • t

TSS: Adaptive Filter Using Transversal Structure
and Sign-Sign LMS Algorithm, Looped Code

.U5tet
"puueters.,l
fHfHHHffHHHHHfIHHHHfff

AlgOl'm••

fHfHlfflHtHfHHHfHHHHHft

ERRF:

•

_

TI£ ADll'TIIIE FILTER

f....

• !txt

63

=SlII wlkl",ln-ki

yIn}

koO

3'
'f::l

~is'

For k = 0,1,2,...,63
w(k) = w(k) + u if e(n)*x(n-k) >= 0
w(k) = w(k) - u if e(n)*x(n-k) < 0

g'
~
...

1) PM status bit should be equal to 01.
2) SXM status bit should be set to 1.
3) The current DP (data memory page pointer) should be page 0.
4) Data memory ONE should be 1.
5) Data memory U should be 327.

~

LARK
l.RI.J(

•

*

IEFIIEPI1RAIEf£RS

1lIDER'
PAGEO'

•equ
••qu

ADAPT

64
0

WI(:

IN:

-SIII

Q

c

ADIIH

D

SAO!

ERR

.usect
.usect
.usect

LAC
XIIl

SIICL
LAC
XlIlk
ADD
SACH
BANZ

IEFIIE AlDlESSES IF IlfFER II1II IXEFFICIOOS

...

~

Set up counter
Point to the coefficients
Point to the dlh Hap1e

-buffer- ,MlER-1
-buffer-,1
·coeffi-,~

FINISH

•• nd

t-.O.AR2
ERR
ERRF
ERRF
11).15
*.15
".I.ARI
ADAPT ..... AR3

~

-....

"!j

>~

IJ'Q~

~

"1

~

-=

a="1:12~

~~
1:12, ~

.... -

=~

IJ'Q

~a

; IICC • Dlnl - Vlnl

n> (')

~=
::~

!.PIlATE TI£ IEIGHTS

*101

~

ARI. OR1ER-I
AR2.1Il
AR3.XN+I

HflHHlHHtHHffHfHHHHHHHHHHHHHH

v.

~

; Store the fil ter output

CI£()( TI£ SIGN IF ERROR

Chen, Chein-Chung February, 1989

C

<:)

AR3.XN
OR1ER-I
IoN+OfdOOll ....

l.RI.J(

N

f,)

l.RI.J(

lEG

~

~

lIE. IS

Configure B0 as program memory
Clear the P register
Using rounding
Point to the oldest sample
Repeat N times
Estimate Y(n)
Configure B0 as data memory

SET !.P TI£ POINTERS

Initial condition:

~.

LAC

SACH

Note: This source program is the generic version; I/O configuration has
not been set up. User has to modify the main routine for specific
application.

::to.

o

IW:II
ClFD
APIIC

Q.

AR3

II'YJ(

RPTK

FIR

.rt we UH filt'r order = 04 and au = 0.01.

.Q,

SIII

i.J1RP
ctFP

.In} = dlnl - ylnl •

~

s:

ESTUIATE TI£ SIGIW. V

koO.I.2 ••••• 63

ACC = X(n-k)
Get the sign of ERR(n) * X(n-k)
Store the sign
Get the sign with its sign extension
Get the convergent factor MU or -MU
Update W(k)

oo~
f.M ....

N ....

=="
noo

N ....
tltlJ'Q

=
I

00
r§.

I£SEINE AIDIESS£S Fill PI1RAIEf£RS

D:
VI
ERRI

.usect
.usect
.IStct

·~rueters-,t

"parillfters-.1
·pirUtters",1

~

00

3'

t
~

=

=dtnl

:!l

IIttN

~

11ft

n

.coPY

orcltr
...

:=

ASH
X0R3

Uf'
ASH

=0.01.

64 and.,

XIlR3
AIIIF3

lOI
RPTB

-i.Apfltr.illt-

.Ht

64

.s.t

0.01

Uf'

:: STF
ASH
X0R3

SSlns

INITIII.IZE POINTERS NIl _VS

_3

.ted
begin

.Stt

lOl

...
s..
'"
~

<:::>

; stt

.....ddr.ARI
".RO
".R4
".115
O.O.RO
order-l
RO. tARO++(Il1
RO.IAAI++CIl%
li ....ddr.AR6
hut.ilddr .AR7

; Set pointer for

poiAter for xU

vn

STF

STF

;RO=-IIU

, R4 =..
,115= ...
, RO =0.0

I....
~

-31.R7
RO.R7.115
tARO++(1)l.R6
-31.R6
R5.R6.R4
IAAI.R4.R3

R7

; Set pointer for Oltput ports

, Input dtnl
; Input xCn)

.,

R5 = Sign[tlnll .. u
Rb = xl.1
R6 = Sign[x(n-i Il
R4 = Sign[x(n-i)]fSign{eln)) ..
R3 = wiln) + R4

, R2

xl_addr

.used

·vars·,1

WI_addr

.useet
,ustet
•sed
,1liiOI'd

·Vlts·.!
·vars 1

=0.0

IF/F3

tARO++tlll. tMI++t1l1.RI

Get I .. t dot.
Updat. or+-2tntll
Get the sign of dati
0. lay branch
Decide the sign of u
Cooput. oN-Itn+1I
Store lut 111(0+1)

ill-Iedr
ollt..a.ddr

CIII'IIIE FILlER 001I'\Il ytnl

O.O.R2

_.R6
R3.IAAI++(1)l
-31.R6
input
115.Rt..R4
1AA1.R4.R3
R3.1AA1 ++(1)X

-huff.r a ,order
"ct,ffs·,.rder
·vars'"
·virs·,l

cinit

.tnd

Ini tii.l ill r.pot count.1'
Do i =O. 1+-3
Get next dlta

Get the sign of dlti.
Dtdft the sign of u
R3 = .Hn) + R4

.U5tct
.ustet
.useet
.use,t

; Insert dn) to buFfer

Uf'

ol'dtr-3~RC

l ,

-,doH5, in-lddr

• 1I01'd

0804000b

• word
• 1liiOI'd

0804002h
xn

.1II01'd

1M

.flOit

IU

~

=Sign[e(n)]

SSIJIS
tARO++tIlX.R6
R3.1AA1 ++tIlt
-31.R6
R5.R6.R4
1AA1.R4.R3

liFlNE ClllSTANTS
XD

,x[]=O
, wI] = 0
; Set pointer for input ports

input:
_.R7
_tIl.R6
R6.1AAO

ASH
B1I
X0R3
ADDF3

, Sot data POI'

IxIudd.
IxIu.ddr,MO

Uf'
nUf'

LIF
:: STF

; s.t up circulill'" ~uffer

LIP

LIJF

N

orw,.

lOl
lOl
Uf'
Uf'
LIF
APTS
STF
II STF
lOl
lOI

~

~

e(n) = d(n) - y(n)
Send out y(n)
Send out e(n)

lPDATE WEIGKTS wtnl

HfHHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHt

VI

~
.....

STF
STF

R2.R7
R2.1AA7
R7.t+AR7(1)

Chen, Chein-Chung March, 1989

Q

Q
<:::>

- ytnl

UH filter orftr

;;"l

~
N

"

For k=0,1,2,...,63
w(k) = w(k) + u, if x(n-k)*e(n) >= 0.0
w(k) = w(k) - u, if x(n-k)*e(n) < 0.0

~.

~

SIIIIf

k=O
.tnl

order-2
tARO++I 111.1AA1 ++1 111.RI
RI.R2.R2
, ylnl =.I].x[]
Rl,R2
I Include list result

CUI1PUTE ERRal SIGNIII. ,tnl AND 001I'\Il ytnl AND etnl SIGNiIlS
63
SUI wtklIXtn-lc1 k=O.1.2..... 63

ylnl

~
;:..
~

'"

ADIF

Algoritbo:

g"

~.

RPTS
1F/F3

TSS30 - Adaptive transversal filter with Sign-Sign LMS
algorithm using the TMS320C30

s

s..
s..

"_3

tHHHHfHHlHffHHHHHHHHfHlttHHHfIHtHHHHHH

•

Updlt. lIIi(n+O

>~
~1-3
U

~ "'l
....
-::s
=-rIl
~

a

~
~~

.... rIl

-~

Jioo

i~

1-3=

~~
oo=!!
eM ....

N==(ioo
eM ....

=~

....

00

~

~

00
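In the sign-sign variant each weight moves by exactly +u or -u, depending only on the sign of e(n)*x(n-k). A minimal floating-point sketch of one iteration (the function name and example values are illustrative assumptions; a zero product is treated as non-negative, following the ">= 0" case in the listing header):

```python
def sign_sign_lms_step(w, x_hist, d, mu):
    """One sign-sign LMS iteration; x_hist[k] holds x(n-k), w is updated in place."""
    y = sum(wk * xk for wk, xk in zip(w, x_hist))   # y(n) = SUM w(k) x(n-k)
    e = d - y                                       # e(n) = d(n) - y(n)
    for k, xk in enumerate(x_hist):
        if e * xk >= 0.0:
            w[k] += mu                              # signs agree (or product is zero)
        else:
            w[k] -= mu                              # signs differ
    return y, e

w = [0.0, 0.0, 0.0]
print(sign_sign_lms_step(w, [1.0, -2.0, 0.0], 1.0, 0.25), w)
```

No multiplication is needed in the update itself, which is why the fixed-point listing can build the correction from sign bits with XOR operations.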

.title

N

~

'Tl25'

fHHfffHH".IIIIIIIIIIIIIIIIIIIHHHHlIIIII.I •• 11111111111I11111

U:

.aleet

ERRF:

.U5ect

·puUtters·, t
-piruet.rs·,1

~

111111111111 ..... 111111111111.11111

Tl25: AdiptiVt Fi ltet Using TrUIy.rn1 Structure
lAd LMky-UIS Algoritho. Looped Code

_

11£ AlW'TII£ FILlER

B

*",'HHHffHflfHHHfHHHHIH

• text

~

Algorith.;
ESTilIATE 11£ SIGNAl. Y

~~

y(n)

63
= SlIIw(H1x(n-k1 k=O,l,2, •.. ,63
Ir-O

,fn)

:=

U1RP
C1FP

d(nl - yen)

§

olk) .... lk) + u.. ln)txln-k) k.o.1.2••• 63

is'

Whet.

§'

11ft

use filter orHr = 64 Ind ..,

s

FIR

1IACD
ClFD

0.01.

~

~

~
i.EI1KY'

N

IIFINE AIIlR£SSES

64
7

(F

IIlFFER AND ClEfFICIENTS

VI

~

IN:

.uHd

"bufhr ,CRER-l
"buffer" ,I

.alz

.usect

"coeffs" ,tIUER

lr
~
N

C

12
c

; ERRCn) = DCn) - YCn)

;2

ERR
U

, T = ERRln)
, p = U • ERRln)

::=
oo~

11£.15
ERRf

; Round the result
, ERRf = U • ERRI.)

LRlJ(

ARI.ORIEIH
AR2.\III
AR3.XNtI

LT
ltPY

ERRf
.... AR2

;
;
;
;

ZALR
ltPYA

'.AR3
.... AR2

;
,

SUB
SACH
BANZ

*.i.EI1KY

;
,

LRlJ(

o

•10:

~

ERR

UiRlI

.equ
.equ
.,qll

wUsect

ll

IIESER'IE AIHIIESSES FtII PARAIET£RS
1)1

.usect

y.
ERR'
11£'

.uJeet
.us.ct
.usect

AIW'T

""rutters" ,1
"pa"lMten" ,I
"",rueters" ,1
"pat-utters" ,I

•FINISH

~
ac-

SACH

.end

f+,O,ARl
AIW'T ..... AR2

.... 00
l-3~

JEFiNE PARAIET£RS

pj\(jE0.

=

.... l"Il

=- Yin)

D

SACH

•IIIIER'

E

~~
l"Il ...

_
NEG

LT
ltPY
PAC
ADD

eben, Cbein-thung February. 1989

~

, ACe

lI'DATE 11£ NEIGHTS

~

Q

; Store the fi 1ttl" output

COIFUIE 11£ EJIIal

Initial condition:
1) PM status bit should be equal to 01.
2) SXM status bit should be set to 1.
3) The current DP (data memory page pointer) should be page 0.
4) Data memory ONE should be 1.
5) Dlt....ory U should be "1ZI.

~

C

~fdOOll.'"

SACH

.'"""

Configure BO as progru ...ory
elM!" the P register
Using rounding
Point to the oldest sup1.
R.pnt N tiHs
Esti.ate yen)
Configure BO as data ••ory

not bun set up. User bu to .odify the ...in routine for speci fie
application.

~.

~

11£.15
_ _I
AR3.XN

me

t...

~

o

LAC
/II'l1(

Noh: This source progl'Ul-is tbl gneric version; I/O configurltion MS

~

lr

1f'Y1(
LRlJ(

~

~

AR3

;

;

Set up counter
Point to the coefficients
Point to the dih 5UJle
T register = U f ERIHn)
P =U f ERRfn) * X(n-k)
Load ACCH .ith A(k,n) " round
Wlk.n+l) = Wlk.n) + P
P =U f ERR(n) * Xtn-kl
ACe = R * Wlk.n) + P
Store W(k, n+1I

~~

=. .

n
....
t-,)="
f.II~
t'D

~

~

00

f

~

~

"G

~

;:!
~

HffHfI-HfflHfHHHfHtHfHHfHtffllllll'I •• III.I.I".11111

•

II IIDIF3
AIlW

TL30 - Adiptivt trusverHl filter with leaky UIS 11goritha
using the 1ltS32OC3O

;:,:

SUBF
STF
It STF

63
y(nl = SU1olkl"'ln-kl k=O,1,2, ••• ,63
k=O

.s;.,
~

/l'YF

.Ikl = r .. (kl + u•• lnl"'ln-kl k=O,1,2".,,63

Where we use filter order = 64, r = 0.995 and

'<:!

I'I'YF3

/l'YF3

.u =0.01.

II

~

~

s:
So
~

~

RPTB
/l'YF3

HtHHIHfftHfHtftHHffffHtHfltfHHffHHHt

~

• copy

Hadapfltr.int·

"

UJtS

HffHf*fHIHHHfI+HHHfHtHHHHHfffH

PERFOOI! AlW'TlI.t: FILTER

.set
.set
.ut

b4

0.01005
0.995

; IU

I leaky

INITIALIZE POINTERS IIIID MRAVS

\..;

N

C

0
V.
...<::>

.hxt
.set
LDI
LIP
LDI
LDI
LDI

begin

So
~

UF

~~

"

C

a
C

RPTS
STF
STF
LDI
LDI

order.8K
bn_addr.
Ixn..addr, ARO
!!on_iddr ,AR!
Ir _addr , AR2
O.O,RO
order-l
RO, 

"0
"0
tI)

R2,R7
R2,eln-N+t1
, Store .,..2(0+11
; Update last III


.titl.

.t.xt

'1UtS'

tfHHfIHffHftlHlHHIftfHHtfHHflllllllllllllllHHHHHIHH

LIIS

IUtS , Adaptive Fi 1ttl' subroutint usi-ng TruIy.rlt.1 Structure
&nd UIS AI!lotithl, Looped Cod.
AI90rit""'

lJ1I1P
SM
SM
SM
CNFP

M3
1IR1.SA\£f
11R2.SA11E2
M3.SA\£3

II'Yk

0
ONE. 15
M3.XN
tIIiSI-1

lAC

N-I
yIn) =SlII.(k)",(.-l<) koO.I.2 ••••• N-I
koO

l

~

Where .. un filter oror

1)

2)
3)
4)
51
6)

~

O!

p.5.

;E

s:

IHHtHHMHHHltHHHHtHHHtHHHftHlHH

lEFllE MIl REFER SVIIIOLS

IV

.g101lo1

C

ddt

."1'),

.,

~

ERRF'

SAIIE2'
SA\£3.

.useet
.ustet

-

LIIS.tIIiSI.U,D.(l£. Y.ERR.XN.1II

.useet
.useet

·parUtters·,l
"pt.ruehrs·,l
"pt.rueteu·,l
"paraeters· •t

111111111111111111111111111111111.,

•

PERARI TIE _[1£ FIL1£R

HlIIIIII.II .... II .......... IIIIII.

ESTIMlE TIE SIIJR. Y

, ACe

D
ERR

=- Y(.)

; ERRC,,)

=OCn)

- YCn)

•FINISH RET

LT
II'Y
PAC
ADD

ERR
U

, T =ERR(.)
; P • U • ERR(n)

ONE. 15

SI1CH

ERRF

; round the N5Ult
; EAIf' IE U .. Bitt.)

U1I1l(

IIRI.OR[E1H
11112.111
M3.XN+1
ERRF
.... 11112
'.M3
'--, M2

; LMd ACQ4 with A(k,nJ &: round
; W(k,n+1) :IE W(k,nJ + P

SACH
lIftHZ

",O.ARI

; Store W(t.D+!)

LIlA
LIlA
LIlA

IIRI.SA\£I
11R2.SA11E2
M3.SA\£3

; Restore flgist.r Ml
; Restore register AR2
I Rutor. register AR3

LRlJ(
LRlJ(

LT
II'Y

lAIR

"'YA

_

..... 11112

"CI

'i

=
Q..
~.

==

t"'*.""'"

~~

00;
>9
-r::r

lI'DAl£ TIE WEIGHTS

RESERVE I1lDIESS FOR PARAlETER

•SAVEll

c

ADIII
SACH

1) Tht r.turn current auxiliary register .ill be AR2.
2} ARt AR3 hlV' been used in this subroutine.

Clteft. Cbein-tnuh9 Februt.ry, 1989

~

Q

15

, Store tht fi Ittr output

lEG

Dtt. MUt)' (f£ should be eqUi) to 1.
Dot, Meory U ilI•• ld b. eq..1 t. III (Q15 f.r ..tI.
PI! stlt.S bit sh •• ld be eq••1 t. 01.
SI.. status bit should be set to logic 1.
0I0!t st.t•• bit sho.ld b. set to I.
The current DP (dltl. IItllOry page pointer) should be pig' O.

So

C

=N

Initial conditions:

~
~

~
IV

; Configure BO

Iff(;

Note: This subroutine performs adaptive filtering using the LMS algorithm.
There are some initial conditions to meet before calling it.

t--.

~

; AtPflt N tiM'
;. Estilit. YCn)

COIIPlITE TIE ERRUI

.a.

C

Stt cur,...t reg.isttr
Save r.,ist.r M1
SaYe registtr AR2
Save rtgist.r M3
Configure 80 u progril MlOry
; CINt tbe P register
; Usin, roahding
; Point to the old.st I_I.

.(k) ••(k) .... (.)",(0-1<) koO.I.2 ..... N-1

-g'

n
v.

~dOOt.,t-

SACH

is

~

IIPTK
IIACD
Clf'D

FIR

ten) = dCn) - yen)·

~;:s

"'

LRlJ(

>

;
;
;
;
;

; Set ., counter
t Point to tilt c..fficitnh
; Point to the dlt... H..pl.
; T regist.,. = U .. ERfUn)
; P = U .. ERR(n) • I(n-k)
, P =U • ERR(.) • Xln-k)


.eltd


Appendix H2. Linker Command File for Assembly Main Program
Calling a TMS320C25 Adaptive LMS Transversal Filter Subroutine


Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30

265

~

••idth

132

tfHHtffllllllJlIllIllllIllIlllllHtHlllllllllllllllllllllllf
I

This is the initial boot routint for TItS32OC3O adaptive

I

fi I ttl' Progrus.

I

This .dule p.tfforas tbt foll.jng actioRs:
1) All.ciotts and initializes the 5YSt. stack.
21 Ptrforas utcHrlitialintioR. Ifhicb copies section
•• constl dda fro. 101 to D1TA HM.
31 P~Plre to st..rt the user's usably progru.

~

~

~

~
S

g'

.Q,
~

t~.
~

STAClLSIZE •••t

FP

. .,t
•stet

RESET

.1II0rd

40h
AR3

; Size of 5yst.. stack
; Frut pointer

-vectors·
dip_iait

I
I
I

AU.OCATE SIW:f FIll n£ SYSTEII STIICIC. INITIALIZE n£ FIRST IIIUIS IN
• ttxt TO POINT TO n£ STIICIC II1II lNITIALlZATUIl TAIILES.

I

stack

.used

lI.stack- ,STACK...SlZE

•text
I

stacuddr .•rd
init.Addr

• SET LI' n£ INITIAL STIICIC POINTER
UIP
lOl
lOl

~
I

...

I:)

So

~

~

c

LOid the address into SP
Aod into FP too

Pl~

of stored address

init-lddr

I Ott

; Sft ,chlress of ioit h,bl.s
1 If AM ..del. skip init

lOl
BID
lOl
LOl
SUBl
Rl'TS

RI

; Block copy

lIEQ

~

Q

Ott plge of stored addtess

liniLaddr.#IRO
-1,ARO
do..
tARO++,RI
do..
tARO++,1IR1
tARO++,RO
I,RI

UIP
lOl
Clf'1

v,

C

staclu.ddr
IstlCk-lddr ,SP

SP,FP

DO IIJTl)lNITIALIZATlIII

Q

N

~

a...
~

bogin
.end

.

c:::2.o

.t5
~

~
=
(i

=
>
~

~.

So
~

~

"ove next count iAto Rl
If there is .ort. ttI)N.t
Get next dtst address
Get next first !fOrd
Count - 1

'a

~

~

IIR

RO.tllRl++
tARO++,RO
RO,RI
do_init
tARO++,1IR
tARO++,RO
I.RI

~

stack

1?
;;t

s:

•done;

STI
II lOl
lOl
BNID
lOl
lOl
SUBl

; Get first count
1 If 0, nothing to do
; Get Hst address
f Get first .ord
; Count - 1

do..init:

~
....
if
"1

........~
!.
....
a....
!!oil
Q

=

I9

~

I

STF

HffHHHHHfHHffHtHHHHtHfHfHIIII •• IIIIII •• IIIII.I.1

~
I-

BT30 - TI1S320C30 adaptive transversal filter lIIIith

u.s

STF

•

g'

~

tin) = din) - yIn)

RPTB

~

Whtre

~

~:::.

wlk)

:!l

=IIII1k1
IlIf

u~(nlfx(n-kl

k=O,1,2, •••• tH

ARO and ARt should point to x[O] and .. (O].
2) Data Itllory u should contain step size.
3) Dlta .ellory order should contain M-2 ••bere N is filter ordtr.
4) Data It.orits d. Y. and t should bt d.fin.d in cal1.r routine.

~

§:

s.

Ch.n. Ch.in-Chung I1irch. 1989

PIPF
PIl'
PIPF
PIPF
PIl'

R3
R3
R2
Rl
Rl

.global

l.ftS3O.u.d.y.f.order

.0

PERf~ ADAPTIVE FILTER
HHHHfH+HHflfffHHUftHfffHHHHHHffH+

•text
UIS30

.Sft
PUSH
P\JSIF

PUSIF
PUSH
PUSIF

~
e.i
IV

R3

, R3

=0.0

II'YF3

_IIIZ.IM1++IIIX.Rl
forder
iARO++I 1IX. IMl++11 IX.Rl
Rl.R3.R3
, ylnl = .£I.x[J
Rl,R3
; includt lut result

ADIf

="Un)

; U,date last

III

+ .(n)

f

u

I-

xln-i)

>=
~

-C"
CICl_

.0-C
~'<

.end

0"0"

=

OQ"'1
rIl S5° 5°
CICl ~
-Q

0"...,

="'1

c

•

; wilnt!)

==~

OOrIl

OOrIl
t,H~
N~

R3

0.0.R3

If'YF3

;
;

;fori=N-2

~

t""'0

==g;

lDF

II ADlf3

; Rt

~"'1

Rl
Rl
R2

C

RPTS

=0, N-3
=.(n) * u t x(n-j-l)
R2 = IIIHn) + tin) .. u * xln-i)
lIIi1n+1) =lIIitn) + tin) oJ u f xln-ii

; Do i

~ ~

$

CMUTE FILTER OOTPUT yin I

Q

~

f

00

RETS

tHfHHtH++f++IHftHfHfIHlfHfHfffHHHlfH+

=tIn)
=.(n)

u
Rt
f u f xln)
Ini hal iz. repnt counter

lordtr,RC

ADIF3
STF
ADIF3
STF

II'YF3
AOOF3
STF
II'VF3

R3

lu.R3
_UIX.R3.Rl
1.Rt
UIS
tARO++(UX,R3,Rl
IMl.Rl.R2
R2.IM1++llIX
iARO;R3.Rl
IMl.Rl.R2
R2,tAR1++(UI
IMl.Rl.R2
R2.IM1++I1IX

HfHff+tH+""UHHH+H+UHHH+"HHHHfffH

'"

~

"
UIS

"

1)

~

v.
<::>
....
S.

+

us. filter order = N and au = 0.01.

Initial condition:

~

Q

:s

Q..

IPYF
1I'YF3
LDI
SUBI

C

~

lPDATE !EIGHTS oil lIND SHIFT xil

N-l
ylnl =SIJI1 .lkliXln-kl k=O.1.2 ..... N-l
k=O

~
e.i
IV

>

"CI

algoritt. asseltbly subroutine.

AlgorHhl:

is

'"

st ... ylnl
,Inl =dlnl - ylnl
Store .In)

R3.1y
h.R3
R3.h

catPUTE ERRIJ! SIIl'W. ,Inl IWD STOOE yin I AlII ,Inl


Appendix H5. Linker Command File for Assembly Main Program
Calling the TMS320C30 Adaptive LMS
Transversal Filter Subroutine


.title

'CUIS'

~

HHHltHffHHfltHHHHHflHHtfHtHHHffHfHfHllftHtHlfH

~

CUIS: Ada.ptive Filter C subroutine using Trl.nsversal structure
Ind UIS Algoritha, looped Code

s

Algoritll.:

"G

~

g"
;:...

§-

f(n) =

'g.

.Ik)

~
~

~

;;J
~

~
So

~

Note:

d(n) -

=.Ik)

So
~

~

~

N
C

ac

_lis

y(nl -

+ utoln)txln-k) k=O,1,2, ••• ,N-l

. text
SAR
SAR
SAR
SAR
SST
SSTI

SPII

.used

~

ff-

ACe

SUB!(

1
OllIE!

,IIUER=N-l

SACL

FRSTIIP
AIRST

; Store address of h.st tip

LAC

f-

SACL

SACL

_1.s

.ustct

.lIsed
.unct
.usect

.usect
.usect
.usect
,useet
,useet
.used
,useet
.vstet

f-

SACL

D
t-,O,ARJ
ARJ,FRSTIIP

LAC

SACL

$

All)

ClEFFICIENTSS

a ::.

O~

°

pointer for getting pi-pueter

~. C

=N

:I

(JQ

If'VK

°

lJU(

1,15

lJIR
FIR

ARJ,AlR.ST
IIUER

APT
I1ACII
CNFD
APAC
SACH

a:£FFP,f-

tD

:I

~~

~~
oor:ll

, Got and ,to... tht D

~D:!

N-

=00
('"J ....

N"

UI?$....

; Configure BO i.5 progru anory
; tlnl' the P register
; Using rounding

.,==

; Point to the oldest supJ.
; Repeat N tiMS
; Estillih Y{n)

tD

~
....
....
:s"

; Configurf BO as dtta . .ory
; Store the fi 1tel' output

t.EG

~

.... .,~
:s"D:!

ESTIIIIITE 11£ SIGNAl. V

aItPUIE 11£
IEfIt.E AIJOOESSES OF IlfFER

:s"==

P re,ister shift .ode
sign extension lode
overflollll .ode
data pogt =

; Insert newest supJe

CNFP

==

~r::;'
....
.,
.... C

; Gtt and store the KJ

LAC

LRLK
'pataMten',!
'parueters',1
'pirueters' ,I
·pa.rueters· ,I
'puueters' ,I
'pvueters' ,I
'pltueters" ,I
'pariMters' ,I
'pltueters",1
'parutters" ,I
'p&tilltt.rs" , 1
'pvueters' ,I
'p&rueter5',1
'parueters' ,I

(JQ

LAC

l.IfI(

ADlJAESSES Fill PARAIETERS

.usect

>(1
-00

Set
Set
Set
Set
Stt

sovn

HHHfttHHfHHHHfHHtHHHHHHHHfHHH.

DSTO.
I1SW
SAllElI
SAVE2'
SAllE3'
SAllE4'
ClIDER'
X.
D'
U:
V'
EAR:
EARF'
AIRST.

~

~

ARJ,SAIlE3
AR4,SA1lE4
DSTO
DSTl

SSX"

eMn, Chein-Chung February, 1989

~

.

ARl,SAllEl

AR2,SAVE2

AIl.J(

.def

Q..
....
>!!

GET TI£ ADAPTIIIE FILTER PARAI£TERS

Otta ....ry 0200" 0200h+N-1" 030011 0300h.... l ate reserved.

~

c

>

"0
"0
tD
:I

SAllE 11£ VAU£S OF 11£ REGISTERS

las(n,lIU,d,x,'Y,&t1
n - order of fi 1ter
au - convtrgence factor
d - desired sighil

N

9...

.equ

PERFOOII 11£ ADAPTIIIE FILTER

)( - input signal
IIy - a4dr of output signal
lie - addr of errot 5igM,1

~

OffOOb
O2OOh
OJOOh

.equ

fffffHHftHfHffftffHKHHHtH

Where we use filter order = N
USl.ge:

.equ

Hftf,****fflfHfffHHHfHffHHf

N-l
yIn) = su..lk)txln-l<) k=O,1,2, ... ,N-l
k=O

.Q.,

COEFFl':
COEFFD:
FRSTAP:

~

ERR(J!

, ACe

~

=- YIn)

00


~

1

f

Algoritb.;

is"

tlnl

~

~

Wher.

~

- ylnl

1ft

ast filter order = N lnd

~

IU -

= 0.01.

ADIf
I

convergence factor

So
So

'"

eben, Chein-chung Pkrch.

.globol
... t

FP

II'YF
II'YF3
Ull

1989

RPTB

_til'
AR3

II'YF3
:: ADIf3

_~IIo(FiLTER

S;

1HtHHHHt111111111l11111111111111

So

_tll5

'"

~

~

c

Q
c

•

.toxt
.set
1'IJSH
LDI
1'IJSH
1'IJSH
1'IJSH
1'IJSH
PUSIF
1'IJSH
PUSIF
I'\.ISH
PUSIF
PUSIF

GET FILTER _

•

FP
sp.FP
IIRO
AlII
AR2
RI
RI
R2
R2
R4
R6
R7

filter order
pointer for x[]

pointer for

UIS

f

III[]

loop counttr

0.0.R2

, R2

....Q.

=0.0

_+UI.oARl++UI.RI
R4
oARO++U I. oARl++III.RI
RI.R2.R2
, ylnl =III.xll
RI.R2
, Include l .. t ...ult

<-fP121.AR2
R2. t+fP!II.R7
R2.tAR2
<-fP131.AR2
R7.tAR2

llF
:: STF
STF
AIIIF3
STF
PII'f
PII'f
I'(P

PII'f
I'(P

PII'f
I'(P
I'(P
I'(P
I'(P
I'(P

RETS

.end

t+fP121.R7
t-AROIII.R7.RI
R4.RC
UIS
I--AROIII.R7.RI
t-ARIUI.RI.R2
tIIRO.R6
R2.oARl
R6. ttIIROlII
t-AR1U1.RI.R2
R2.tARl
R7
R6
R4
R2
R2
RI
RI
AR2
AlII
IIRO
FP

~

.~
>r':l
-t:n

Get yen) address
tlnl =41nl - ylnl
5on4 out ylnl
Get .(nl address
Send out tin)

~.,

=
=-

~a

=-=a.
a

=

OtD

IJ'DAlE IEIGHTS III AND SHIFT xl]

HHlfHHHIIIIIIIIIIII.IIIII.III.III •• 11111111111

tv

Get
Get
Get
Set

CIItPUTE ERROR SllHIl. oW AND STOllE ylnl AlII olnl

LDI
SUIIF3
:1 STF
LDI
STF

d - desired sign&)
.. - filter coefficients
Ix - input si9MI buffer
&y - addr of output signal
&t - ddr of error Signal

~.

-

IU

Usi.JI= tlls(n,IIU,d,h.Lx,'Y,h'
n - order of fi Jtel'

;;;f

~

II'YF3
RPTS
II'YF3
II ADlf3

.flel = lII(k~ + u*eCnltxh-kl k=O,1,2, ••• ,N-l

~.

Q

=41nl

<-fP121.R4
<-fP161.1IRO
t-FP151.AIII
2.R4

CIItPUTE FILTER OOIPUT ylnl

llF

k=O

~

~

•

N-I
ylnl =SlII IlklOXln-k1 k-o.1.2 ..... N-1

§"

~

LDI
LDI
LDI
SUBI

C130 - 11tS32OC3O C subroutine adiptivt transversal fi Her with
UIS .lgoritllo.

; R7

ia

.(nl

f

u

; Rl I;. e(l) I u f x(n..... ll
; Initiahze I"tpeat counttr
, Do i =I. N-I
; Rt

==-

t(nl

f

u .. x(n-i+1)
f u • xln-j)

; R2 =..Hn) + tin)
, Got xlln+HI+II

; 1IIi1a+U = wUn) + .(n) .. U • xfn-i)
, Shift xCl
; R2 =_Un) + e(n) • u • x{n)

; Update last.


A Collection of Functions
for the TMS320C30

Gary Sitton
Gaslight Software


Introduction
This report presents a collection of efficient machine language programs for advanced
applications with the TMS320C30. These programs provide basic math and transcendental functions. Other routines include vector functions, FFTs, and linear algebra.

Library Overview
The programs fall into six categories:

I.   Normal-precision floating-point math functions,
II.  Extended-precision floating-point math functions,
III. Integer arithmetic routines,
IV.  Vector utility routines,
V.   Radix 2 FFT routines, and
VI.  Linear algebra routines.

Categories I and II are programs that implement a minimal set of elementary
mathematical functions for advanced applications. In these categories, the functions FPINV
and SQRT are improved versions of the programs in the TMS320C3x User's Guide [1].
In category III, IMULT and IDIV are improved versions of the programs EXTMPY and
DIVI in [1]. In category IV, *FMIEEE and *TOIEEE are array versions of the FMIEEE
and TOIEEE scalar programs from the User's Guide.
The names and short descriptions of these routines use some special notation:

Categories I and II:
    xd - indicates that the relative accuracy of the implemented function is x decimal digits.
Categories IV and VI:
    *  - program name prefix; stands for M or R.
    M  - selects the memory-based parameter entry point.
    R  - selects the register-based parameter entry point.
Categories II and VI:
    X  - indicates the extended-precision program version.


Consult the program source listings for more details.
The following are brief descriptions of the programs by category:
I.   Normal floating-point (32-bit) math functions (SMATH.ASM):

     A. SIN      - computes a 7d sine(x) for all x in radians.
     B. COS      - computes a 7d cosine(x) for all x in radians.
     C. EXP      - computes a 7d exp(x) for all |x| ≤ 88.
     D. LN       - computes a 7d ln(x) for all x > 0.
     E. ATAN     - computes a 7d atan(x) in radians for all x.
     F. SQRT     - computes an 8d sqrt(x) for all x ≥ 0.
     G. FPINV    - computes an 8d 1/x for all x ≠ 0.
     H. FDIV     - computes an 8d x/y for all x and all y ≠ 0.

II.  Extended-precision, floating-point (40-bit) math functions (SMATHX.ASM):

     A. SINX     - computes a 9d sine(x) for all x in radians.
     B. COSX     - computes a 9d cosine(x) for all x in radians.
     C. EXPX     - computes a 9d exp(x) for all |x| ≤ 88.
     D. LNX      - computes an 8d ln(x) for all x > 0.
     E. ATANX    - computes an 8d atan(x) in radians for all x.
     F. SQRTX    - computes a 10d sqrt(x) for all x ≥ 0.
     G. FPINVX   - computes a 10d 1/x for all x ≠ 0.
     H. FDIVX    - computes a 10d x/y for all x and all y ≠ 0.
     I. FMULTX   - computes a 10d x*y for all x and y.

III. Integer (32-bit) math routines (SMATHI.ASM):

     A. ILOG2    - computes m = log2(n), n ≤ 2^m, for use with radix 2 FFT programs.
     B. IMULT    - computes the 64-bit product of two 32-bit numbers.
     C. IDIV     - computes the quotient and remainder of two 32-bit numbers.

IV.  Vector utilities (SVECTOR.ASM):

     A. *CORMULT - in-place computation of the complex vector product of two complex
                   arrays using the complex conjugate of the second array.
     B. *CONMULT - in-place computation of the complex vector product of two complex arrays.
     C. *CBITREV - in-place bit reverse permutation on a complex array with separate
                   real and imaginary arrays.
     D. *FMIEEE  - in-place fast conversion of an IEEE array to a TMS320C30 array.
     E. *TOIEEE  - in-place fast conversion of a TMS320C30 array to an IEEE array.
     F. *VECMULT - in-place multiplication of an array by a constant.
     G. *CONMOV  - moves (fills) a constant into an array.
     H. *VECMOV  - moves (copies) an array into another array.

V.   Radix 2 FFT routines ($FFT2.ASM):

     A. CFFFT2   - complex DIF forward radix 2 FFT using separate real and imaginary
                   arrays and a 3/4 cycle sine table.
     B. CIFFT2   - complex DIT inverse radix 2 FFT using separate real and imaginary
                   arrays and a 3/4 cycle sine table (does not include the 1/N scale factor).

VI.  Linear algebra routines ($LINALG.ASM):

     A. *SOLUTN  - solves a well-conditioned system of linear equations with any number
                   of dependent variable sets. Uses no (diagonal) pivoting, with
                   normal-precision floating-point math.
     B. *SOLUTNX - solves a well-conditioned system of linear equations with any number
                   of dependent variable sets. Uses no (diagonal) pivoting, with
                   extended-precision floating-point math.

Extended vs. Normal Precision
Categories I, II, and VI represent a dual collection of programs implemented with
32-bit single- or normal-precision TMS320C30 floating-point arithmetic, and with 40-bit
extended-precision TMS320C30 floating-point arithmetic. Some of the normal-precision
programs (category I, for example) have been written using the TMS320C30 RND instruction for rounding to obtain the optimal precision from the standard floating-point
TMS320C30 instruction set. This has been done with a slight loss of speed. Such rounding can be carefully eliminated by the user if the additional speed is necessary at the expense of some accuracy.
Extended precision was implemented on the TMS320C30 by means of a 40-by-40 floating-point multiply routine, FMULTX. This was necessary since
the TMS320C30 has 40-bit addition and subtraction instructions, but the multiply operates
only on 32-bit inputs. By using the native add and subtract instructions, FMULTX, and the extended-precision registers R0 to R7, 40-bit floating-point math was effected. All 40-bit constants
are stored in two consecutive words in memory. The first word is the normal truncated
32-bit floating-point number. The least significant byte of the second word contains the
remaining bottom 8 bits of the extended mantissa. The programs are coded to properly
load extended-precision registers with these double-word constants.
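The two-word storage of a 40-bit constant described above can be sketched in a few lines of Python. This is only an illustration, not code from the library: the function names `split_extended` and `join_extended` are hypothetical, and the sketch assumes the 40-bit register image is simply the 32-bit word followed by the 8 extra low-order mantissa bits.

```python
def split_extended(bits40):
    """Split a 40-bit pattern into (word0, word1) as described in the text.

    Assumption: the upper 32 bits are the truncated normal-precision number
    and the lowest 8 bits are the extra mantissa bits.
    """
    if not 0 <= bits40 < (1 << 40):
        raise ValueError("expected a 40-bit pattern")
    word0 = bits40 >> 8       # normal truncated 32-bit floating-point number
    word1 = bits40 & 0xFF     # bottom 8 bits of the extended mantissa
    return word0, word1


def join_extended(word0, word1):
    """Rebuild the 40-bit pattern from the two stored words."""
    return ((word0 & 0xFFFFFFFF) << 8) | (word1 & 0xFF)


if __name__ == "__main__":
    w0, w1 = split_extended(0x0123456789)
    print(hex(w0), hex(w1))
```

Only the least significant byte of the second word carries information, which is why the join step masks it to 8 bits.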

The extended-precision versions of the programs in this report may be slower than
their normal-precision counterparts. When using extended-precision results in R0 from
category II programs, note that the results may be stored in memory with or without rounding. A more accurate normal-precision result will generally be obtained by rounding. You
should never round before using an extended-precision result as input to another extended-precision program unless special circumstances exist. Note that truncation, not rounding,
will occur if an extended-precision register is moved to any 32-bit register or any memory
location. This will generally cause a loss of accuracy of up to the value of the least
significant bit of the mantissa.

Program Utilization
Since all programs in this collection are intended to be invoked by a CALL instruction, you must have the stack pointer (SP register) appropriately set to an available memory
area, preferably in internal RAM. Programs in categories I and II save and restore the
data page register DP by using the stack area pointed to by SP. Programs in category
III do not alter or use the DP register at all. The programs in categories IV through VI
alter but do not restore the DP register.
All of the programs in categories I through III, except for ILOG2, are implemented
as straight-line code. You may wish to disable the instruction cache while these programs
are executing. This will cause no loss of execution speed and will avoid flushing
out potentially reusable instructions in the cache. It is beneficial to have the cache enabled
when using most of the remaining programs (categories IV through VI) as they generally
contain multi-instruction loops.
Programs in categories IV through VI allow input through externally defined variable
addresses. The .global references indicate these addresses, where the input variable values
and/or addresses are located. The starting address of these memory locations is given by
the external variable $PARAMS. All of the addresses are assumed to be in the same
TMS320C30 memory page as $PARAMS. If this is not the case, the addresses or the
programs should be changed to assure that the DP register gets set properly.
Programs in categories IV and VI also allow the use of registers to hold input
parameters. The exact registers to be used are found in the program source listings. When
using the register input entry point, refer to the program using the R prefix on the program name, e.g. RSOLUTN. The memory based parameter input entry uses the M prefix,
e.g. MSOLUTN. The .global references to the R prefix entry points may be deleted if
they are not needed.


Function Approximation Techniques
Categories I and II are made up of a collection of elementary mathematical functions numerically approximated using two basic methods. The functions SIN, COS, EXP,
LN, and ATAN are approximated by using polynomials fitted to the various functions
over a limited range of the independent variable. The functions SQRT and FPINV are
approximated by iteratively solving a particular non-linear equation. The extended precision versions of these programs (category II) use the same approach with extended-precision
arithmetic and resort to more accurate polynomials or more iterations to achieve the desired
precision.

Polynomial Approximations
The polynomial approximation method is fundamentally very simple. A limited part
of a function is approximated by a polynomial of some order sufficient to obtain the desired
accuracy. The polynomial is generally a series of the form:

$$P(n, x) = \sum_{i=0}^{n} a[i]\,x^{i}, \tag{1}$$


where x is the independent variable, n the polynomial order (a fixed integer), and a[i]
is a set of n + 1 fixed coefficients.
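
A polynomial of the form (1) is normally evaluated by Horner's rule, which needs only n multiplies and n adds. The following Python sketch shows the idea; the name `poly_eval` is hypothetical and the listing does not claim to mirror how the library routines are coded.

```python
def poly_eval(coeffs, x):
    """Evaluate P(n, x) = sum(a[i] * x**i) by Horner's rule.

    coeffs[i] holds a[i], so len(coeffs) == n + 1.
    """
    result = 0.0
    for a in reversed(coeffs):   # start from a[n] and work down to a[0]
        result = result * x + a
    return result


if __name__ == "__main__":
    # 1 + 2x + 3x^2 evaluated at x = 2
    print(poly_eval([1.0, 2.0, 3.0], 2.0))
```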
The desired function, say f(x), is then approximated by a particular P(n, x) such that:

$$f(x) = P(n, x) + e(x), \qquad x_l < x < x_u, \tag{2}$$

where $x_l$ and $x_u$ are the limits of the domain of x, and e(x) or e(x)/f(x) is the error function, which is usually minimized in the min-max (equi-ripple) sense. This is done
by selecting an appropriate means of calculating the coefficients a[i].
Various techniques and schemes are used in the selection of:

•  the approximation interval,
•  transformations on the function,
•  the polynomial form,
•  error minimization criteria, and
•  calculation of the coefficients.

See Hastings [2] for an excellent tutorial on this numerical methodology. All of the
polynomial approximations used here were obtained from the National Bureau of Standards reference edited by Abramowitz and Stegun [3].


Non-Linear Equation Approximation
The second method of approximation, using the solution of non-linear equations,
is easier to understand. This method requires that a solution for the equation g(x) = 0
be found. One means for solving this equation is by Newton-Raphson iteration. This can
be understood by considering the Taylor series expansion for g(x):

$$g(x + h) = g(x) + h\,g'(x) + r(x, h), \tag{3}$$

where r(x, h) is the remainder of the series (which can be assumed to be small), and g'(x)
is the derivative of the function g(x). Leaving off the remainder in (3) we get, in terms
of incremental values of x, the approximation:

$$g(x[i+1]) = g(x[i]) + \bigl[x[i+1] - x[i]\bigr]\,g'(x[i]). \tag{4}$$

Solving for x[i+1] in (4) with g(x[i+1]) = 0 yields the approximation:

$$x[i+1] = x[i] - g(x[i])/g'(x[i]). \tag{5}$$

Thus, x[i+1] will converge to a solution of g(x) = 0. Convergence can be shown
to be quadratic, i.e., the error in the approximation at each iteration is proportional to the
square of the error in the previous iteration. Minimally, this requires a sufficiently close
starting value for x[0] and the condition that |g'(x)| > 0 for all iterated values of x.
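
As a minimal sketch of iteration (5), the Python fragment below applies a fixed number of Newton-Raphson steps to a user-supplied g and g'. The helper name `newton` and the test function g(x) = x^2 - 2 are illustrative assumptions, chosen only to make the quadratic convergence visible.

```python
def newton(g, dg, x0, iterations=6):
    """Apply x[i+1] = x[i] - g(x[i]) / g'(x[i]) a fixed number of times."""
    x = x0
    for _ in range(iterations):
        x = x - g(x) / dg(x)
    return x


# Illustrative problem: solve g(x) = x^2 - 2 = 0 starting near the root.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5)

if __name__ == "__main__":
    print(root)
```

A fixed iteration count, rather than a convergence test, matches the straight-line style of code described for the category I through III routines.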

Math Functions Details
The approximation techniques can be applied to each of the classes of functions.
The following sections describe the approximations as they are applied to each function.

Inverse and Square Root Functions
To compute good approximations to sqrt(c) (SQRT and SQRTX
routines) and 1/c (FPINV and FPINVX routines), both g(x) and g'(x) must be derived
and then used in the iteration of equation (5). This is complicated by the restriction that division should be avoided, since the TMS320C30 has no divide instructions. For the iteration
to find the inverse of c, you can write:

$$g(x[i]) = 1/x[i] - c = 0, \tag{6}$$

which is solved when 1/x = c or x = 1/c. Taking the derivative of (6), substituting
into (5), and simplifying gives:

$$x[i+1] = x[i]\,\bigl[2 - c\,x[i]\bigr], \tag{7}$$

which needs no division.
Thus, (7) will converge to 1/c with the accuracy (in digits) for each iteration equal
to twice that of the preceding one. If x[0] approximates 1/c to 3 bits of precision,
only three iterations of (7) will yield about 24 = 3(2^3) bits of accuracy.
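
The division-free reciprocal iteration (7) can be tried out with a short Python sketch. The function name `reciprocal` and the starting value used in the example are assumptions made for illustration; the library routines obtain their seed differently.

```python
def reciprocal(c, x0, iterations=3):
    """Approximate 1/c with x[i+1] = x[i] * (2 - c * x[i]); no division used."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - c * x)
    return x


if __name__ == "__main__":
    # Seed 0.3 for 1/3 has a relative error of 0.1; each step squares it.
    for n in range(5):
        print(n, reciprocal(3.0, 0.3, n))
```

The relative error 1 - c x[i] is squared at every step, which is the doubling of accuracy noted above.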


A similar iteration for sqrt(c), based on f(x) = x^2, can be derived from the formulation:

$$g(x[i]) = x[i]^{2} - c = 0, \tag{8}$$

which is solved when x^2 = c or x = sqrt(c). The solution for (8) leads to the classic square
root formula:

$$x[i+1] = 0.5\,\bigl[c/x[i] + x[i]\bigr], \tag{9}$$

but this equation uses division. However, the iteration from f(x) = 1/x^2 for 1/sqrt(c) can
be shown to be:

$$x[i+1] = x[i]\,\bigl[1.5 - c'\,x[i]^{2}\bigr], \tag{10}$$

where c' = c/2 = 0.5c. Though (10) needs no division, the final desired result must be
transformed by an extra multiplication by the input c because:

$$\mathrm{sqrt}(c) = c\,\bigl[1/\mathrm{sqrt}(c)\bigr]. \tag{11}$$

Formula (10) will also converge, in the precision-doubling fashion of the Newton-Raphson iteration, given a suitably close starting value for x[0] and the use of sufficiently
accurate arithmetic. Note that the extended-precision routines FPINVX and SQRTX
both use an extra iteration (for a total of 4) to achieve the needed 32-bit accuracy for the
40-bit format.
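
The division-free square root built from (10) and (11) can be sketched as follows. The names `inv_sqrt` and `sqrt_no_divide`, and the seed passed in the example, are hypothetical; this is an illustration of the two formulas, not a transcription of SQRT or SQRTX.

```python
def inv_sqrt(c, x0, iterations=4):
    """Approximate 1/sqrt(c) with x[i+1] = x[i] * (1.5 - 0.5*c*x[i]**2)."""
    half_c = 0.5 * c              # c' in equation (10)
    x = x0
    for _ in range(iterations):
        x = x * (1.5 - half_c * x * x)
    return x


def sqrt_no_divide(c, x0, iterations=4):
    """Recover sqrt(c) as c * (1/sqrt(c)), equation (11)."""
    return c * inv_sqrt(c, x0, iterations)


if __name__ == "__main__":
    print(sqrt_no_divide(2.0, 0.7))   # seed 0.7 is close to 1/sqrt(2)
```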
The initial guess x[0], for the iterations of 1/sqrt(c) and 1/c, may be obtained using
an interesting approximation. A TMS320C30 floating-point number is c = (1 + m)2^e, where
0 ≤ m < 1 and -127 ≤ e ≤ 127. The extra 1, added to the fractional mantissa m,
is the implied bit. Then we can write the inverse of c as:

$$1/c = \frac{1}{1 + m}\,2^{-e}. \tag{12}$$

An excellent approximation for the inverse of the mantissa is:

$$\frac{1}{1 + m} \approx 1 - m/2, \tag{13}$$

which is exact at the end points: m = 0 and m = 1. Then the approximation for the
reciprocal would be:

$$1/c \approx (1 - m/2)\,2^{-e}. \tag{14}$$


It turns out that this approximation can be achieved in a single logical operation.
If you compute the unlikely value of c' = c XOR 0FF7FFFFFFFh, you complement all bits in c except the sign bit. Including the implied bit and taking the effect of
one's complement arithmetic into account results in a final value of:

$$c' = \bigl[1 + (1 - m)\bigr]\,2^{-(e + 1)}, \tag{15}$$

or the desired approximation:

$$c' = (1 - m/2)\,2^{-e} \approx 1/c. \tag{16}$$

c' gives about 3 bits of precision, which is an excellent seed x[0] for the 1/c iteration.
Using e/2, you have a start for the 1/sqrt(c) iteration as well.
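
The quality of the seed (1 - m/2)2^(-e) can be checked numerically. The Python sketch below computes the seed arithmetically from the mantissa and exponent instead of performing the XOR on a TMS320C30 bit pattern, because the register layout is not modeled here; the name `reciprocal_seed` is hypothetical and only positive c is handled.

```python
import math


def reciprocal_seed(c):
    """Return (1 - m/2) * 2**(-e) for c = (1 + m) * 2**e, with c > 0."""
    if c <= 0.0:
        raise ValueError("this sketch handles positive c only")
    frac, exp = math.frexp(c)      # c = frac * 2**exp, 0.5 <= frac < 1
    m = 2.0 * frac - 1.0           # fractional mantissa, 0 <= m < 1
    e = exp - 1                    # exponent for the (1 + m) normalization
    return math.ldexp(1.0 - 0.5 * m, -e)


if __name__ == "__main__":
    for c in (1.0, 1.5, 3.0, 10.0):
        seed = reciprocal_seed(c)
        print(c, seed, abs(seed * c - 1.0))   # relative error of the seed
```

The relative error of the seed is (m/2)(1 - m), which peaks at 1/8 for m = 0.5; that bound is the "about 3 bits of precision" quoted above.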

Sine and Cosine Functions
The SIN, COS, SINX, and COSX (sine and cosine) routines all use the same basic
approximation (section 4.3.98, p. 76 in [3]). The series is for sin(x)/x but is
transformed by multiplying by x. The polynomial of even terms then is of the form:

$$\sin(x) = x \sum_{i=0}^{5} a[2i]\,x^{2i} + x\,e(x), \tag{16}$$

where $|x| \le \pi/2$ and $|x\,e(x)| \le 2(10^{-9})$. Instead of using another power series for cos(x),
you can use the fact that:

$$\cos(x) = \sin(x + \pi/2). \tag{17}$$

The series given by (16) is only accurate in the 1st and 4th quadrants, i.e., $|x| \le \pi/2$.
Sin(x) in the other two quadrants is found from:

$$\sin(x) = \sin(\pi - x). \tag{18}$$

The case for x < 0 is expediently handled by using |x| for all calculations except
for the final multiply by x in (16).
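
The range reduction described for the sine and cosine routines can be sketched in Python as shown below. Two points are assumptions of this sketch and not taken from the library: the coefficients are plain Taylor coefficients of sin(x)/x, used as stand-ins for the min-max coefficients of [3] (so the accuracy is lower than the quoted bound), and the names `sin_poly` and `cos_poly` are hypothetical.

```python
import math

# Stand-in coefficients a[0], a[2], ..., a[10]: Taylor series of sin(x)/x.
# The library uses min-max coefficients from [3], which are not reproduced here.
_COEFFS = [1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040, 1.0 / 362880, -1.0 / 39916800]


def sin_poly(x):
    """sin(x) via a polynomial in x**2 that is valid only for |x| <= pi/2."""
    sign = -1.0 if x < 0.0 else 1.0
    r = math.fmod(abs(x), 2.0 * math.pi)   # work with |x| in [0, 2*pi)
    if r > math.pi:                        # sin(r) = -sin(r - pi)
        r -= math.pi
        sign = -sign
    if r > math.pi / 2.0:                  # equation (18): sin(r) = sin(pi - r)
        r = math.pi - r
    x2 = r * r
    p = 0.0
    for a in reversed(_COEFFS):            # Horner's rule in x**2
        p = p * x2 + a
    return sign * r * p


def cos_poly(x):
    """Equation (17): cos(x) = sin(x + pi/2)."""
    return sin_poly(x + math.pi / 2.0)


if __name__ == "__main__":
    for x in (-2.0, 0.5, 3.0, 10.0):
        print(x, sin_poly(x), math.sin(x))
```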

Exponential Functions
The EXP and EXPX (exponential) routines use an approximation (see Section 4.2.45,
p. 71, in [3]). The expansion is of the form

$$\exp(x) = \sum_{i=0}^{7} a[i]\,x^{i} + e(x), \tag{19}$$

where $0 \le x \le \ln(2)$ and $|e(x)| \le 2(10^{-10})$. The series for $2^{y}$ is found by substituting
y = x/ln(2) since:

$$\exp(x) = \exp(\ln(2)\,y) = 2^{y}. \tag{20}$$

The new expansion then becomes:

$$2^{y} = \sum_{i=0}^{7} b[i]\,y^{i} + e(x), \tag{21}$$

where $b[i] = a[i]\,\ln(2)^{i}$. See the coefficients in the EXP routine.

Values of exp(x) for x outside the convergent range are found by two means. First
for x < 0, note the relationship:

$$\exp(-x) = 1/\exp(x), \tag{22}$$

which does require an inverse (see the FPINV and FPINVX routines). For y > 1, let
y = n + f where n = 1, 2, ... and 0 ≤ f < 1. By substituting y in (20), you get

$$\exp(x) = 2^{n+f} = (2^{f})(2^{n}). \tag{23}$$
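
Equations (20) through (23) translate into the following Python sketch. It is illustrative only: the coefficients a[i] are taken as the Taylor values 1/i! (stand-ins for the min-max coefficients of [3], so the error is larger than the quoted bound), and the name `exp_poly` is hypothetical.

```python
import math

LN2 = math.log(2.0)
# b[i] = a[i] * ln(2)**i, with a[i] = 1/i! as stand-in Taylor coefficients.
_B = [LN2 ** i / math.factorial(i) for i in range(8)]


def exp_poly(x):
    """exp(x) = 2**f * 2**n with y = x/ln(2) = n + f and 0 <= f < 1."""
    if x < 0.0:
        return 1.0 / exp_poly(-x)      # equation (22)
    y = x / LN2                        # equation (20): exp(x) = 2**y
    n = int(y)                         # integer part (y is non-negative here)
    f = y - n                          # fractional part, 0 <= f < 1
    p = 0.0
    for b in reversed(_B):             # equation (21) by Horner's rule
        p = p * f + b
    return math.ldexp(p, n)            # equation (23): scale by 2**n


if __name__ == "__main__":
    for x in (-1.0, 0.3, 1.0, 10.0):
        print(x, exp_poly(x), math.exp(x))
```

Scaling by 2^n only changes the exponent field of the result, so it costs no additional multiplies in a floating-point format.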

Natural Log Functions
The LN and LNX (natural or base e logarithm) routines use the approximation from
[3] (section 4.1.44, p. 69). The expansion comes in the form:

8

In(1

+ x) =

E

+ e(x),

[a[i]xi)

(24)

i=1
where 0 S x S 1 and le(x) I s 3(10- 8). The expansion for In(y) can be used if the
transformation y = x - I is applied.
Values of ln(x) for x outside the convergent range are found in the following way.
First, make the substitution $x = f(2^{n})$ for $1 \le f < 2$ and n = 0, 1, ..., and then write:

$$\log_2(x) = \log_2(f\,2^{n}) = n + \log_2(f), \tag{25}$$

where $\log_2(x)$ is the log base 2 of x. Using the relationship that $\log_2(x) = \ln(x)/\ln(2)$,
you get the equation

$$\ln(x) = \ln(f) + n\ln(2). \tag{26}$$
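A Python model of (24) through (26) is sketched below. The function name is hypothetical; the eight coefficients are read from the LN listing's constant table (with the digits that the scan garbled filled in from the handbook formula the text cites, which is an assumption worth checking against a clean copy). The exponent n is allowed to be negative here, which extends the stated range to 0 < x < 1.

```python
import math

LN2 = 0.6931471806   # LN(2), as in the LN constant table

# a[1]..a[8] for ln(1 + t), 0 <= t <= 1 (assumed reading of the table).
A = [0.9999964239, -0.4998741238, 0.3317990258, -0.2407338084,
     0.1676540711, -0.0953293897, 0.0360884937, -0.0064535442]

def ln_any(x):
    """ln(x) = ln(f) + n*ln(2) with x = f * 2**n and 1 <= f < 2."""
    if x <= 0.0:
        raise ValueError("ln is only defined here for x > 0")
    m, e = math.frexp(x)       # x = m * 2**e with 0.5 <= m < 1
    f, n = 2.0 * m, e - 1      # renormalize so that 1 <= f < 2
    t = f - 1.0                # series argument, 0 <= t < 1
    acc = 0.0
    for a in reversed(A):      # Horner's rule; the series starts at i = 1
        acc = (acc + a) * t
    return acc + n * LN2
```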

Arctangent Functions
The ATAN and ATANX (arc or inverse tangent) routines use the approximation
from section 4.4.49, p. 81 in [3]. The series with only even terms for atan(x)/x is transformed to

$$\operatorname{atan}(x) = x \sum_{i=0}^{8} a[2i]\,x^{2i} + x\,e(x), \tag{27}$$

where $-1 \le x \le 1$ and $|x\,e(x)| \le 2(10^{-8})$. Values for atan(x) for x outside the convergent range are obtained by noting the following identity:

$$\operatorname{atan}(x) = \operatorname{atan}((x - 1)/(x + 1)) + \pi/4. \tag{28}$$

Using the bilinear transformation y = (x - 1)/(x + 1) assures, at the expense of
a divide operation, that $y \le 1$ for $x \ge 1$. The case for x < 0 is expediently handled
by using $|x|$ for all calculations except for the final multiply by x in (27).
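The reduction in (27) and (28) can be modeled in Python as below. The names are hypothetical, and the coefficient values are the ones tabulated with the handbook formula cited above, quoted here as an assumption because the decimal table in the ATAN listing is not legible in this copy.

```python
import math

# a[0], a[2], ..., a[16] for atan(x)/x on -1 <= x <= 1.
# Assumption: handbook values, to be checked against the ATAN table.
C = [1.0, -0.3333314528, 0.1999355085, -0.1420889944, 0.1065626393,
     -0.0752896400, 0.0429096138, -0.0161657367, 0.0028662257]

def atan_series(x):
    """Evaluate x * sum(a[2i] * x**(2i)) for |x| <= 1, as in (27)."""
    x2 = x * x
    acc = 0.0
    for c in reversed(C):          # Horner's rule in x**2
        acc = acc * x2 + c
    return x * acc

def atan_any(x):
    """Use |x| throughout; apply (28) when |x| > 1, then restore the sign."""
    ax = abs(x)
    if ax > 1.0:
        y = (ax - 1.0) / (ax + 1.0)          # bilinear map, costs one divide
        r = atan_series(y) + math.pi / 4.0
    else:
        r = atan_series(ax)
    return math.copysign(r, x)
```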

Divide and Multiply Functions
The last group of routines in category I and II are those for the additional arithmetic
functions FDIV and FDIVX (floating-point divides), and FMULTX (extended-precision
floating-point multiply). The divide operation for the TMS320C30, a = b/c, is done by
calculating the reciprocal or inverse of the divisor c. Then you compute

$$a = b(1/c). \tag{29}$$

For a normal-precision divide, FDIV finds 1/c by a call to FPINV. A subsequent
normal TMS320C30 floating-point multiply of the rounded inverse provides a suitable
quotient. For an extended-precision divide, FDIVX finds 1/c by a call to FPINVX. The
inverse is then extended-precision multiplied by the dividend using FMULTX.
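The divide-by-reciprocal scheme in (29) can be sketched in Python. The starting estimate and the Newton step follow the comments in the FPINV listing (X[0] = (1 - M/2) * 2**-E for F = (1 + M) * 2**E, then X <= X * (2 - F * X)); the function names, the sign handling, and the iteration count are assumptions made for this model.

```python
import math

def fpinv(c, iterations=4):
    """Reciprocal by Newton iteration; hypothetical model of FPINV."""
    if c == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    f = abs(c)
    m, e = math.frexp(f)             # f = m * 2**e with 0.5 <= m < 1
    # With 1 + M = 2m and E = e - 1, (1 - M/2) * 2**-E = (1.5 - m) * 2**(1 - e).
    x = math.ldexp(1.5 - m, 1 - e)   # estimate, relative error at most 1/8
    for _ in range(iterations):
        x = x * (2.0 - f * x)        # each step squares the relative error
    return math.copysign(x, c)

def fdiv(b, c):
    """(29): a = b * (1/c)."""
    return b * fpinv(c)
```

Because the error is squared at every step, the 1/8 worst-case starting error falls to about 6e-8 after three steps and below double-precision rounding after four.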
The extended-precision floating-point multiply simulated by FMULTX is the key
to the implementation of virtually all of the extended-precision functions. The extended
multiply is achieved using the normal floating-point multiply of the TMS320C30. For two
extended-precision numbers xa and xb, you can represent each as the sum of two floating-point numbers: $xa = a + ea(2^{-24})$ and $xb = b + eb(2^{-24})$. The quantities ea and eb
are the one-byte extensions of xa and xb respectively.
Thus the complete product xc = (xa)(xb) can be expanded and written as

$$xc = (a)(b) + [(a)(eb) + (b)(ea)]\,2^{-24} + (ea)(eb)\,2^{-48}. \tag{30}$$

The last term in (30) is always less than the 32-bit precision in the mantissa of the
final result. Therefore, you need only to compute the first two terms in the product xc.
Also, note that all the indicated products in (30) may be computed using a normal-precision
native TMS320C30 multiply as long as the terms are collected in extended-precision
registers. The additions are also done using the native TMS320C30 add as it is implemented
in extended-precision.
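The argument that the last term of (30) can be dropped is easy to check with exact integer arithmetic. The sketch below is a hypothetical model, not FMULTX itself: each 32-bit mantissa is treated as an integer whose top 24 bits play the role of a and whose low byte plays the role of ea, so the powers of two are rescaled relative to (30).

```python
def split(x32):
    """Split a 32-bit mantissa into its 24-bit head and one-byte extension."""
    return x32 >> 8, x32 & 0xFF

def fmultx_model(xa, xb):
    """Return (kept, dropped): the first two terms of (30) and the last term."""
    a, ea = split(xa)
    b, eb = split(xb)
    kept = ((a * b) << 16) + ((a * eb + b * ea) << 8)
    dropped = ea * eb                  # always below 2**16
    return kept, dropped
```

For normalized mantissas (top bit set) the full product is at least 2**62, so the dropped term is smaller than 2**-46 of the result, well under the 32-bit precision that is retained.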


Integer Arithmetic Program Details
Integer routines differ from the floating-point versions because they produce only
integer results. If the computation can produce fractional values, then the fraction must
be truncated to leave only the integer result.

Integer Result Log Base 2
The routine ILOG2 is a useful utility for computing the integer value m of the log base
2 of the integer n. The result is computed by successive multiplies by 2 (implemented
as shifts by 1). The resulting relationship is $n \le 2^{m}$, such that if log2(n) is not an exact
integer, m is rounded up to the next largest integer. This is useful as it allows the determination of m from any value n > 0 (e.g. not a power of two) which might require the
padding of additional values (zeros) for a radix 2 FFT. This program is very fast because
of a delayed branch loop and internally requires only 4(m + 1) cycles (cached) to do the
calculation.
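The behavior described for ILOG2 can be modeled in a few lines of Python. This is a sketch of the stated relationship (the smallest m with n <= 2**m, found by repeated doubling), not a transcription of the routine; the function name is hypothetical.

```python
def ilog2_ceil(n):
    """Smallest integer m such that n <= 2**m, for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    m = 0
    p = 1
    while p < n:       # successive multiplies by 2, done as shifts by 1
        p <<= 1
        m += 1
    return m
```

For an FFT of 1000 samples, `ilog2_ceil(1000)` gives 10, so the data would be zero-padded to 1024 points.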

Extended Precision Integer Multiply
The IMULT routine is a modified version of the program EXTMPY in the
TMS320C3x User's Guide [1]. It has been modified and slightly sped up. The negation
of the final 64-bit product is done in two instructions by direct two's complement negation rather than by using one's complement to simulate the same result. The product is
computed by breaking the multiplier and multiplicand up into two 16-bit integers each.
Thus the full product c of the numbers $a = au(2^{16}) + al$ and $b = bu(2^{16}) + bl$ is

$$c = (au)(bu)\,2^{32} + [(au)(bl) + (bu)(al)]\,2^{16} + (al)(bl), \tag{31}$$

where the powers of two indicated are accomplished by shifts. Note that each product
in (31) must be represented as a 32-bit integer. The adds in the sum must be done with
care to facilitate the carry between the two final 32-bit components of the product.
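The decomposition in (31), including the carry between the two 32-bit halves of the result, can be sketched in Python for unsigned magnitudes. The function name is hypothetical and sign handling (the final negation mentioned above) is left out.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def imult(a, b):
    """Multiply two unsigned 32-bit integers; return (high word, low word)."""
    au, al = a >> 16, a & MASK16
    bu, bl = b >> 16, b & MASK16
    low = al * bl                      # weight 2**0
    mid = au * bl + bu * al            # weight 2**16, may need 33 bits
    high = au * bu                     # weight 2**32
    lo = low + ((mid & MASK16) << 16)  # low half of mid lands in the low word
    carry = lo >> 32                   # carry out of the low word (0 or 1)
    lo &= MASK32
    hi = (high + (mid >> 16) + carry) & MASK32
    return hi, lo
```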

Integer Divide
The IDIV routine is a modified version of the program DIVI in the TMS320C3x
User's Guide [1]. It has been modified to return the absolute value of the remainder of
the integer division. The remainder was originally computed, but was discarded during
the extraction process for the quotient. A few more instructions allow the extraction of
both the quotient and remainder from the result of the SUBC process. The program IDIV
may be used for the computation of the modulo function. The output of IDIV is the pair
[q, |r|] = a/b, with the property:

$$0 \le r = (a \bmod b) < b, \tag{32}$$

for a > 0 and b > 0. The complete relationship is, by definition, a = bq + r, for positive
a and b.
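A shift-and-subtract division that yields both quotient and remainder, in the spirit of the SUBC process mentioned above, can be sketched in Python. This is a generic restoring-division model with a hypothetical name, restricted to a >= 0 and b > 0; it is not a transcription of IDIV.

```python
def idiv(a, b, bits=32):
    """Return (q, r) with a = b*q + r and 0 <= r < b, for 0 <= a < 2**bits."""
    if a < 0 or b <= 0:
        raise ValueError("this sketch only handles a >= 0 and b > 0")
    q = 0
    r = 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)   # bring down the next dividend bit
        q <<= 1
        if r >= b:                      # conditional subtract
            r -= b
            q |= 1
    return q, r
```

The remainder is what a modulo function needs, so `idiv(a, b)[1]` plays the role of a modulo b.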

Vector Utility Routines
Vector utilities are functions which operate on arrays of numbers. Some utilities,
like dot products and convolutions, are simple. Other utilities, like those presented here,
are more involved.

Complex and Complex Conjugate Array Multiplies
The array routine *CORMULT computes the point-by-point complex conjugate
multiply of two complex arrays. If the arrays are c1 and c2, and are of length n, then:

$$c1[k] \leftarrow c1[k]\,\operatorname{conj}(c2[k]), \quad k = 1, \ldots, n, \tag{33}$$

where $\leftarrow$ means replaces. Each complex array is assumed to be stored as two separate
arrays, i.e. {c1} = {x1, y1} and {c2} = {x2, y2}. In cartesian complex representation, (33)
becomes

$$(x1 + i\,y1) \leftarrow (x1 + i\,y1)(x2 - i\,y2), \tag{34}$$

where i represents the imaginary constant sqrt(-1). Separating the real and imaginary
parts, we have:

$$x1 \leftarrow x1\,x2 + y1\,y2, \qquad y1 \leftarrow y1\,x2 - y2\,x1. \tag{35}$$

This operation can be used for the frequency domain correlation of two FFTs to implement time domain correlation.
On the other hand, the array routine *CONMULT computes the point-by-point complex multiply of two complex arrays. If the arrays are c1 and c2, and are each of length
n, then

$$c1[k] \leftarrow c1[k]\,(c2[k]), \quad k = 1, \ldots, n. \tag{36}$$

In cartesian complex representation, (36) becomes

$$(x1 + i\,y1) \leftarrow (x1 + i\,y1)(x2 + i\,y2). \tag{37}$$

Separating the real and imaginary parts results in

$$x1 \leftarrow x1\,x2 - y1\,y2, \qquad y1 \leftarrow y1\,x2 + y2\,x1. \tag{38}$$

This operation can be used for the frequency domain convolution of two FFTs to implement digital filtering.
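The two in-place array multiplies in (35) and (38) can be written out in Python on split real and imaginary lists. The function names are hypothetical stand-ins for *CORMULT and *CONMULT; note that both new values must be computed from the old x1[k] and y1[k] before either is overwritten.

```python
def cormult(x1, y1, x2, y2):
    """(35): c1[k] <- c1[k] * conj(c2[k]), arrays stored as real/imag pairs."""
    for k in range(len(x1)):
        re = x1[k] * x2[k] + y1[k] * y2[k]
        im = y1[k] * x2[k] - y2[k] * x1[k]
        x1[k], y1[k] = re, im

def conmult(x1, y1, x2, y2):
    """(38): c1[k] <- c1[k] * c2[k], arrays stored as real/imag pairs."""
    for k in range(len(x1)):
        re = x1[k] * x2[k] - y1[k] * y2[k]
        im = y1[k] * x2[k] + y2[k] * x1[k]
        x1[k], y1[k] = re, im
```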


Complex Array Bit Reversal
The array routine *CBITREV executes an in-place bit reverse permutation on two
arrays simultaneously. This operation is generally used for index scrambling before a DIT
FFT (decimation in time, see CIFFT2), or after a DIF FFT (decimation in frequency,
see CFFFT2) for index unscrambling. Therefore, *CBITREV is useful in permuting complex arrays stored as two separate arrays which are associated with radix 2 FFTs. The
program uses the bit reverse indexing feature of the TMS320C30 to achieve this function.
The loop in *CBITREV is nearly as efficient in permuting two arrays together as permuting one array alone. This is due to the use of parallel load and store instructions and
a delayed (single cycle) conditional branch.
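The permutation performed by *CBITREV can be modeled in Python as follows. The TMS320C30 gets the reversed index from its bit-reverse addressing mode; this sketch, with hypothetical names, computes it in software and swaps both arrays in the same pass.

```python
def reverse_bits(i, bits):
    """Reverse the low `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def cbitrev(xr, xi):
    """In-place bit-reverse permutation of two equal-length arrays (length 2**m)."""
    n = len(xr)
    bits = n.bit_length() - 1
    if n != 1 << bits or len(xi) != n:
        raise ValueError("lengths must match and be a power of two")
    for i in range(n):
        j = reverse_bits(i, bits)
        if i < j:                       # swap each pair only once
            xr[i], xr[j] = xr[j], xr[i]
            xi[i], xi[j] = xi[j], xi[i]
```

Applying `cbitrev` twice restores the original order, which is why the same routine serves both for scrambling before a DIT FFT and for unscrambling after a DIF FFT.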

Floating Point Conversions
The array routines *FMIEEE and *TOIEEE are vectorized versions of their original
scalar counterparts FMIEEE and TOIEEE. Both routines do fast conversions from or
to IEEE format by avoiding dealing with special rare cases. Also, both programs convert
the numbers in the arrays in-place which destroys the original data. These array versions
of the format conversion routines are much faster than calling the scalar version routines
in a special loop. These routines also have their own internal, shared constant table for
conversions.

Vector Primitives
The array routines *VECMULT, *CONMOV, and *VECMOV are a useful suite
of efficient programs for simple array operations. The first routine, *VECMULT, performs the simple operation $x[k] \leftarrow x[k]\,c$, which is a scalar-vector multiply useful in uniformly scaling an array by a constant c. You can use this for scaling arrays after an inverse
FFT by choosing c = 1/n. The next routine, *CONMOV, performs the operation
$x[k] \leftarrow c$, which is useful in filling or initializing any portion of an array to a single constant c. The last routine, *VECMOV, performs the simple operation $x[k] \leftarrow y[k]$, an array
move, and is, therefore, generally useful.

FFT Routines
This category contains the two complementary radix 2 complex FFT programs
CFFFT2 and CIFFT2. These programs differ from previously available TMS320C30 FFT
programs in that they operate on complex arrays which are stored as two separate and
independent real arrays. Both routines do the FFTs in-place and do no index permutations
or constant scaling (multiplication). Also these programs require only a 3/4 cycle external, pre-computed sine table. As with previous FFT programs, these, too, have a special
multiply-less butterfly loop for the occurrence of unity twiddle or complex rotation factors.

The routine CFFFT2 is a DIF radix 2 complex forward FFT program and thus
assumes a normally indexed pair of input arrays. The output array is bit-reverse permuted
and normally must be unscrambled to be of any use (see *CBITREV). The routine CIFFT2
is a DIT radix 2 inverse FFT program and thus assumes a bit-reverse indexed pair of
input arrays. A normally indexed complex frequency spectrum must be bit-reverse scrambled before using CIFFT2 (again, see *CBITREV). On the other hand, the output from
this inverse FFT is in normal indexed order, but lacks the traditional scaling by the factor
of 1/n. Therefore, back-to-back calls of CFFFT2 and CIFFT2 will return the original
complex array (in proper order) but multiplied by a factor of n. Consult the handbook
by Burrus and Parks [4] for additional FFT algorithm details.
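The pairing of a DIF forward transform with a DIT inverse transform can be illustrated with a small Python model. These are textbook radix 2 butterflies on split real and imaginary lists, with twiddles computed on the fly instead of read from a sine table; the function names are hypothetical and nothing here is transcribed from CFFFT2 or CIFFT2.

```python
import math

def cfft_dif(xr, xi):
    """Forward DIF FFT, in place: normal-order input, bit-reversed output."""
    n = len(xr)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                ang = -math.pi * k / span          # forward twiddle angle
                wr, wi = math.cos(ang), math.sin(ang)
                i, j = start + k, start + k + span
                tr, ti = xr[i] - xr[j], xi[i] - xi[j]
                xr[i] += xr[j]
                xi[i] += xi[j]
                xr[j] = tr * wr - ti * wi          # (a - b) * W
                xi[j] = tr * wi + ti * wr
        span //= 2

def cifft_dit(xr, xi):
    """Inverse DIT FFT, in place: bit-reversed input, normal-order output, no 1/n."""
    n = len(xr)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                ang = math.pi * k / span           # inverse twiddle angle
                wr, wi = math.cos(ang), math.sin(ang)
                i, j = start + k, start + k + span
                tr = xr[j] * wr - xi[j] * wi       # b * W
                ti = xr[j] * wi + xi[j] * wr
                xr[j], xi[j] = xr[i] - tr, xi[i] - ti
                xr[i] += tr
                xi[i] += ti
        span *= 2
```

Calling `cfft_dif` and then `cifft_dit` on the same arrays needs no permutation in between and leaves each element multiplied by n, which is the behavior described above for back-to-back calls.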

Linear Algebra Routines
The routines *SOLUTN and *SOLUTNX are the normal- and extended-precision
implementations of the algorithm for solving simultaneous linear equations. This algorithm
is the modified Gauss-Jordan elimination without (off diagonal) pivoting. This is a simple
algorithm which is intended for use with well-conditioned systems of dense linear equations of moderate size. Well conditioned means that the system of linear equations is linearly
independent or non-singular. This subject and further algorithm details are to be found
in chapter 2 of [5] by Press et al, or any other book on the numerical techniques of linear
algebra. This algorithm is suitable for a wide range of problems requiring the solution
of a system of linear equations, e.g. exact or least squares polynomial fitting.
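Gauss-Jordan elimination without pivoting can be sketched in a few lines of Python. This is a generic textbook version with a hypothetical function name, working on an augmented copy of the coefficients; the actual *SOLUTN routine may order its operations differently.

```python
def solve_gauss_jordan(a, y):
    """Solve A x = y by Gauss-Jordan elimination with no pivoting."""
    n = len(a)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]   # augmented matrix
    for k in range(n):
        p = m[k][k]
        if p == 0.0:
            raise ZeroDivisionError("zero diagonal element; pivoting would be needed")
        m[k] = [v / p for v in m[k]]                      # scale row k
        for i in range(n):
            if i != k:
                f = m[i][k]                               # clear column k elsewhere
                m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    return [m[i][n] for i in range(n)]
```

Because no rows are exchanged, a zero (or very small) diagonal element stops the method, which is why the text limits it to well-conditioned systems.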
A simple system of linear equations has the form:

$$\begin{aligned}
A[1,1]\,x[1] + A[1,2]\,x[2] + \cdots + A[1,n]\,x[n] &= y[1], \\
A[2,1]\,x[1] + A[2,2]\,x[2] + \cdots + A[2,n]\,x[n] &= y[2], \\
&\;\;\vdots \\
A[n,1]\,x[1] + A[n,2]\,x[2] + \cdots + A[n,n]\,x[n] &= y[n].
\end{aligned} \tag{39}$$

Symbolically, you may write A = A[i, j] as the n x n matrix of coefficients, and
x = x[i] as the unknown independent variable (column) vector, and y = y[i] as the dependent variable (row) vector. Thus (39) can be written in short hand form as Ax = y or
Ax - y = 0, where the multiplication indicated is a matrix-vector multiply. The fundamental problem in linear algebra, then, is to find the solution vector x. In fact, you
may desire to find the m different solutions to m sets of linear equations which share the



*****************************************************
*
*   PROGRAM: SIN
*
*   WRITTEN BY: GARY A. SITTON
*               GAS LIGHT SOFTWARE
*               HOUSTON, TEXAS
*               MARCH 1989.

AND

LIIF

RO
RO,R4

; ROUND X
; R4 (= X

COSIII: ENTRY POINT
ECOS;
SCALE AND I1AP VARIABlE X

*
*   SINE FUNCTION: R0 <= SIN(R0).
*
*   APPROXIMATE ACCURACY: 7 DECIMAL DIGITS.
*   INPUT RESTRICTIONS: NONE.
*   REGISTERS FOR INPUT: R0 (ARGUMENT IN RADIANS).
*   REGISTERS USED AND RESTORED: DP AND SP.
*   REGISTERS ALTERED: AR0, IR0, AND R0-4.
*   REGISTERS FOR OUTPUT: R0.
*   ROUTINES NEEDED: NONE.
*   EXECUTION CYCLES (MIN, MAX): 44, 44.
*
*****************************************************

EXTERNAL PROGRAII NAHES
.1lLOIIL SIN
.GI.OIIL ECOS
INTERNAL CONSTANTS

CDF

; CI (PII2l
1.570796327
-o.6459b40968 ;C3
0.07969260878 ;CS
-0.00468166687 ; C7
0.00016025884 ;C9
-0.000003433338; C11
CDF

~

(4.

C·

;:

'0'"...>
So

.1llRlI

.FLOAT -1.0, 0.0, 1.0, 0.0 ; /lAPPING CONSTANTS

ACON

•WORD

C

; ADOOESS OF CONSTS •

START OF SIN PROGfIAII

~

C

CON

• TEXT

~

Q

; ADOOESS OF COEFFS.

ACOF
CON

'"
N

; 2/PI

•FLOAT
.FLOAT
.FLOAT
.FLOAT
.FLOAT
.FLOAT

~
;:

.FLOAT 0.636619n2

POLYNOItIAL COEFFS. FOO SIN(X*2/PIl, -! ( X ( I
SIFT

SIN'
PUSH
LOP

LIIF
tlPYF
FIX
FLOAT
WBF
NEGF
ADDI
AND
TSTS
LllFNZ
LDI
ADDF
NEGF
LOI
POP

RO
RO,RI
_,R!
RI,IRO
lRO,R2
R2,R!,RO
RO,R3
I,IRO
3,100
2,IRO
R3,RO
@ACIJIi,ARO
*+ARO(JOOl,RO
RO,R3
@ACOF,ARO

DP

;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;

RO {= ;X:
RI {= RHO IX:
R! (= H2IPI
lRO (= INTEGER QUADRANT Q
R2 (= FLOATINJ QUADRANT Q
RO (= X, -! ( X { I
R3 (= -X
lRO (= Q + I
IRO {= TABLE INDEX
LOOK AT 2ND LSB
IF I THEN RO (= -X
ARI -) CONST. TABlE
FINAL HAPPING. RO (= X + C
R3 (= -X
ARO -) COEFF. TABLE
L!lSAVE DP

EVALUATE TIlI.N:ATEO IODDl SERIES

.DATA

NOR"

ABSF

DP
@ACOF

,SAYEDP
; LOAD DATA PAGE POINTER

tlPVF
RND

00, RO,R2
R2

; R2

I1PYF
ADDF

*ARO-,R2,RI
*ARO--,RI

; Rl (= X**2*Cl1
; RI (= C9 + RI

I1PVF

R2,RI
IARO-,Rl

; Rl {= X**2*(C9 + Rll
;Rl(=C7+Rl

R2,Rl
fARO-,Rl

; Rl {= X**2*(C7 + Rll
; Rl (= CS + Rl

ADDF

Rl
R2,Rl
tARO-,R!

ROOND BEFOOE *
Rl (= X"2*(CS + Rll
Rl(=C3+RI

RHO
tlPVF
ADDF

Rl
R2,Rl
tARO,Rl

ROUND BEFORE *
Rl {= X"2«C3 + Rll
RI(=Cl+Rl

ADDF
IPVF

ADDF
RND
tlPYF

(=

X**2

; ROOND X"2


FINISH UP SERIES AND

R£T~

LDF
LlIFN

R4,R4
R3,RO

POP
!IUD

R2
R2

TEST ORIGItW. X
IF X <0 TIEN RO <= -x
R2 <= RE~ ADmESS
~ (lELAVED)

RHO

RO
RI
RI,RO

ROOND BEFORE •
ROOND BEFORE •
RI <= X'ICI + RIl

RHD
MPYF

*****************************************************
*
*   PROGRAM: COS
*
*   WRITTEN BY: GARY A. SITTON
*               GAS LIGHT SOFTWARE
*               HOUSTON, TEXAS
*               MARCH 1989.
*
*   COSINE FUNCTION: R0 <= COS(R0).
*
*   APPROXIMATE ACCURACY: 7 DECIMAL DIGITS.
*   INPUT RESTRICTIONS: NONE.
*   REGISTERS FOR INPUT: R0 (ARGUMENT IN RADIANS).
*   REGISTERS USED AND RESTORED: DP AND SP.
*   REGISTERS ALTERED: AR0, IR0, AND R0-4.
*   REGISTERS FOR OUTPUT: R0.
*   ROUTINES NEEDED: ECOS (SIN).
*   EXECUTION CYCLES (MIN, MAX): 40, 40.
*
*   NOTE: USES SHFT CONSTANT FROM SIN PROGRAM!
*
*****************************************************
*   EXTERNAL PROGRAM NAMES

        .GLOBL  COS
        .GLOBL  ECOS
        .TEXT

*   START OF COS PROGRAM

COS:    PUSH    DP              ; SAVE DP
        LDP     @ACOF           ; LOAD DATA PAGE POINTER
        BRD     ECOS            ; R0 <= COS(X) = SIN(X'), (DELAYED)
        RND     R0              ; ROUND X
        ADDF    @SHFT,R0        ; R0 <= X' = X + PI/2
        LDF     R0,R4           ; R4 <= X'

*   RETURN OCCURS FROM SIN !

***HfHHfH+HfffH**HffHHHftH+tH+ffHHHHHHf

PUSH

<

UP

PROGRM' EXP

AND
<

·
*

•
•
•<
<

IEGf
LDF

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
MARCH 1989.

UIFN

I'I"IF

EXPONENTIAL FUNCTION: R0 <= EXP(R0).
APPROXIMATE ACCURACY: 7 DECIMAL DIGITS.
INPUT RESTRICTIONS: |R0| <= 88.0.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0-4.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: FPINV.
EXECUTION CYCLES (MIN, MAX): 44 (R0 <= 0), 70.

FIX
FLDAT
SUBF
IEGI
LSH
PUSH
POPF
lOI
POP

lI'
tIa
RO
RO,R2
RO,RI
R2,RI
_,RI
RI,R3
R3,RO
RO,RI
R3
24,R3
R3
R3
tIa,ARO
lI'

; SAVEli'
; LOAD DATA PAIl£ POINTER
;RD.l'..

~

~
~
C·
;:s

SCALING COEFF, FOR 2*<-1

00"

.FLDAT 1,442695041

'c'".,>

C7

.FLOAT
.FLOAT
.FLOAT
.FLOAT
•FLOAT
.FLOAT
.FLOAT
•FLOAT

;;.

f{;7

.IIlRD

~
;:s
~

~.

;:s

n.

~

C

C

1.0000000000
-0.693147180
0.240220469
-o.re5503I>54·
0.009615978
-0.001328240
0.000147491
-0.000010863
C7

• TEXT
START OF EXP _

~

Q

I'I'VF
• I/LN(2)

POLYNO"IAL ClEFFS. FOR 2**-X, 0 (= X ( I.

~

tv

ADDF

EXP.
SCALE VARIAlM..E X

CO
CI
C2
C3

C4
C5
C6
C7

ADDF
RND
I'I'YF

ADIF
AND
I'I'VF

lEST FOR X ( 0 AND RETlRI
LOF
lIND

ADDF
AND
I'I"IF
RETS

R2,R2
FPINV


INTERNAl. CIIlSTANTS

Q
<::>

.DATA
SCALING CDEFFS. FIlR LHII+X)
LHRK

CO

.FLDAT 0.6931471806
.FLOAT 1.0000000000

, LHI21
, CO (1.0)

POL YIIlI!IAI. COEFFS. Fl1l LHI ltXl, 0 C= X ( I.
0.9999964239
-O.4m741238

C8

.FLllAT
.FLOAT
.FLllAT
,FLOAT
.FLllAT
• FLOAT
.FLllAT
.FLOAT

AC8

.WOOD

C8

0.3317990258
-0.2407338084
0.1676540711
-0.0953293897
0.0360884937
-0.0064535442

, TIP OF CI
, TOP OF C2
,TIPOFC3
; TIP OF C4
, TIP OF C5
, TOP OF C6
, TOP OF C7
; TOP OF C8

LN'
UlF
RO,RO
RETSLE

ASH
FLIIAT
LDF

LDE
SUBRF

UlF
KPVF
LOF
LDI
POP

IP
lACS
RO
R3
-24,R3
R3,RI
@CO,R2
R2,RO
RO,R2
ILNlK,RO
RI,RO
RO,R3
IACS,ARO
IP

,SAVEIP
, LOAD DATA PAGE POINTER
, SAVE AS FLT. PT.
, R3 C= INTEGER FfJRlIAT
; R3 C= E = SIGNED EXP.
; RI C= FLT. PT. E VALUE
, R2 C= 1.0
; EXP. RO C= 0 (I C= X <2)
; R2 c= X - I 10 C= X C Il
; RO C= LN(2)
, RO c= EARD-,RI,RO
>ARD-,RD

, RI <= RND 10'2
, RD <= XH2tCI7
; RO<=CI5+RD

I'I'YF

RI,RD
>ARD-,RD

, RD <- XH2'ICIS + RD)
,RD<=CI3+RD

ADlIF

RI,RD
>ARD-,RD

, RD (= IH20ICI3 + RD)
,RD<=CII+RD

.!IORD

CI7

rt'YF
ADlIF

RI,RD
>ARD-,RD

, RD <= XH20ICII + RD)
,RD(=C9+RD

RND

RD
RI,RD
>ARD-,RD

RWm IIEFORE •
RD <= IH20IC9 + RD)
RD<-C7+RD

START IF ATAN _

Q

, -1'114
, PII4
, ZERO

POl_1M. COEFFS. FOR ATANIlI, -I (= X (. I.

.TEXT

N
C

C

, RI <; Rl

.MTA

-

So

, SAVE RND X

,
,
,
,
,

RND

i

RI
RD,RI
ICI,RI
R2,RD
FDIY

R2
SKIP
RO,R3
RD,RI
I,IRO

POPF
BGED

INTER«. ClllSTANTS

SAVEli'
I.MD MTA PAGE POINTER
R2 {= :x:
R2 <= :x: - I
IF IX: ) I THEIl SCAlE I!£lAVED)
R3 <= RND X
RI <- RND X
IRD <= 0, POST SCAlE Ir«X

:x: ) I

TEST FOR X'

.GlOIL ATAN
.GlOIL FDIY

•~

lI'
MCI7
RD.R2
1C1,R2
SKIP
RD,R3
RD,RI
O,IRO

CI
C3

C5
C7
C9
CII
CI3
CIS
CI7

ADlIF
I'I'YF

I'I'YF
ADlIF

I!£lAVED)


; ROOOl BEFlJ~ I
; RO (= Xu211C7 • ROl
; RO (= C5 • RO

AND
II'YF
ADlf

RO
RI,RO
llIRO-,RO

AND

RO
RI,RO
1AR0-,RO

RtWD BEFORE I

ADDF

AND
II'YF
ADlf

RO
RI,RO
IARO-- ,RO,RI

RtWD BEFDRE I
RO <= Xu21!C3 • ROl
RI <= CI • RO

II'YF

RO

(=

Xff2f(CS + ROI

RO

(=

C3 • RO

FINISH UP, POST SCALE BY C AND RETURN
PIJ'
BUD
AND
II'YF

ADlf

R2
R2
RI
R3,RI,RO
H+AROIIROl, RO

R2 <= RET~ ADDRESS
RETURN IlE.AYEDl
ROOOl BEFORE I
RO <= ATAll(Xl = XIII. ROl
RO <= ATANIXl • C 10.0, Pli4 OR -P1/4l

*****************************************************
*
*   PROGRAM: SQRT
*
*   WRITTEN BY: GARY A. SITTON
*               GAS LIGHT SOFTWARE
*               HOUSTON, TEXAS
*               MARCH 1989.
*
*   SQUARE ROOT FUNCTION: R0 <= SQRT(R0).
*
*   APPROXIMATE ACCURACY: 8 DECIMAL DIGITS.
*   INPUT RESTRICTIONS: R0 >= 0.0.
*   REGISTERS FOR INPUT: R0.
*   REGISTERS USED AND RESTORED: DP AND SP.
*   REGISTERS ALTERED: R0-4.
*   REGISTERS FOR OUTPUT: R0.
*   ROUTINES NEEDED: NONE.
*   EXECUTION CYCLES (MIN, MAX): 49, 49.
*
*****************************************************

~

EXTERNAl PROGRAII NAi'fS

~

.Gl.081. SQRT

N

C

ac

INTERNAl CONSTANTS
•DATA
CNST!
CNST2
CNST3
CNST4

SMSK

• SET
• SET
.FLDAT
.FLOAT

0.5
1.5
1.103553391
0.780330006

• WORD

OFF7FFFFFH

; ADJUSTED 1.0
; ADJ.JSTED SIIlTI1/2l

• TEXT
START OF SQRT PROORIIII.
SliRT:
LDF
RETSLE

RO,R3

, TEST AND SAYE V
; RET~ tDI IF V <= 0

GET APPROXIMATION TO IIV. FOR V = 11+1t112"E
AND 0 (= " < I, FOR E EI'EN' no] = IH1I2112H-£/2
AND FOR E ODD' no] = SIIlTl1I2III H1/2l12H-El2

~

PUSH
LIIP
PUSIF
PIJ'
lOR
lOl

DP

tsI1SK
RO

R2
tsI1SK,R2
R2,RI

SAVE DP
LOAD DATA PAG£ POINTER
SAVE V AS FLT. PT. V = 11+lt112HE
R2 (= V AS INTEGER
R2 <= rnf'LEI£NT AlL BUT SIGN

RI

<=

I H1I2l12H-£

~

LDI
LSH
ASH
PUSH
PIFF
lIE
LIF

LSH
UFIfI
ff'YF

PIP

R2,R4
8,RI
-I,R2
R2
R2
R2,RI
1CIIST3,R2
7,R4
1CHST4,R2
R2,RI
lP

,R4<-RI
I RI <- RI EXP. _
, R2 <. R2 WITH -EI2 EXP.
I SAVE R2 AS INlEliER
, R2 <= FLT. PT.
, RI <- (I-III2)'2H-El2
, R2 <= 1.1 ... Fill OOD E
, TEST lSII !J' E (AS SIGN)
, IFE EVEIIR2 <=0.78...
, RI <= CIHIECTEO ESTIItATE
, \IISAIIE lP

•

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
MARCH 1989.

FLOATING POINT INVERSE: R0 <= 1/R0.

APPROXIMATE ACCURACY: 8 DECIMAL DIGITS.
INPUT RESTRICTIONS: R0 != 0.0.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0-2 AND R4.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: NONE.
EXECUTION CYCLES (MIN, MAX): 33, 33.

_TE Yl2 UJS£S ItPYF).
I

ff'YF
RND

CNSTI,RO
RO

I RO <= VI2 TRI.I«:.
, RO <= RND Vl2

NEWTGN ITERATIGN Fill VIXl

=X-

VH-2

R2
R2
R2
RI

<=
<=
<=
<=

X[01"2
(VI2) • X[01H2
1.5 - (Vl2) • X[01H2
XCII = X[OI • 11.5 - (V/2)'XCOllf2)

R2
R2
R2
RI

<= XCIlH2
<= (VI2) • XCI]H2
<= 1.5 - (Yl2) • mlH2
<= XC2] = XCIl • 11.5 - (Vl2)'XCIlH2)

_.FPINY

•
I

=0 ...

HHHIIIIIIIIIIIIIIIIIIIII •• IIIIIIIIIIIIIIIIIIIIIIIIHH


ff'YF
ff'YF
stJIIRf
ff'YF

RI,RI.R2
RO,R2
CNST2,R2
R2,RI

ff'YF
ff'YF
stJIIRf
IIPVf

RI.RI,R2
RO,R2
CllST2,R2
R2,RI

,
,
,
,

EXTERNAL PROGRAII NMES

.GL08L FPINY
INTERNAL C()ISTANTS
.DATA
ONE

ff'YF

II'VF
stJIIRf
ff'YF

RI,RI,R2
RO,R2
CNST2,R2
R2,RI

,
,
,
,

R2 (= X[21 ..2
R2 (= (Yl2) • XC2]H2
R2 <= 1.5 - (Vl2) • X[2lH2
RI (= XC31 = XC21 • 11.5 - (Yl2)'X[21"2)

TI/O
/lSI(

.SET
.SET

1.0
2.0

• WORD

0FF7FFFFFH

.TEXT
RND
ff'YF
RND
ff'YF
SUBRF
RND
IFiF

RI
RI,RI,R2
R2
RO,R2
CNST2,R2
R2
R2,RI

, ROOHD 8EF!IlE I
, R2 (= X[31H2
,RIlJlD8EFORE •
• R2 (= (Yl2) • XC31"2
, R2 (= 1.5 - (VI2) • X[3]1I2
, _ 8EF0RE'
, RI

(=

X[4]

=

X[31 • 11.5 - (Vl2)'X[3]H2)

INVERT FINAL RESlLT AND REMN

PIP
BUll
RND
RND
ff'YF

R2
R2
R3
RI
RI,R3,RO

R2 (= RETURN ADDRESS
RETlIlN (lELAVED)
ROlNl 8EF!IlE •
8EF!IlE
_

*

RO = SQRT(V) = _ ( I I V )

START !J' FPINY PROGRAIt
FPINV'
LIF

RO,RO

RETSZ

• TEST F
, RETlIlN N!JI IF F =0

GET II'fIIOXIIfATlGN TO IIF. fill F = II+IU • 2ttE
AND 0 (= " < I, USE. XCO] = (1-It/2) • 2tt-E
PUSH

LDP
PUSHF

PIP
XIII
PUSH
I'OPF

PIP

lP

• SAVE lJATA PAGE POINTER
, LlJAD lJATA PAGE POINTER

RO
RI
_,R!
RI
RI
lP

•
,
,
•
,
,

SAVE AS FLT. PT. F = 11+") I 2ttE
FETCH SACK AS INTEGER
ctII'LEIIENT E ~ " BUT NOT SIGN BIT
SAVE AS INTEGER, AND BY I1AGIC...
RI (= X[OI =11-11/2) • 2H-E.
\IISAIIE lP


_'FDIV
I'I'YF
SUBRF
I'I'YF

Rl,RO,R4
TIIl,R4
R4,Rl

, R4 <= FIno]
, R4 (= 2 - F • XW]
, Rl <= xm = HO] 1 12 - FIno])

I'I'YF

Rl,RO,R4
TWO,R4
R4,Rl

, R4 (= F • xm
, R4 <= 2 - F • xm
, Rl {= xm = xm * 12 - F •

x[m

•

FLOATING POINT DIVIDE FlKTION: RO (= ROfRl,

,R4<=Flx[2]
, R4 (= 2 - F • X[2]
, RI <= X[3] = xm • 12 - F • H2])

1

I'I'YF

Rl,RO,R4
TIIl,R4
R4,Rl

FOR

LAST ITERATION:

X[4]

RO,R4
Rl,RO
RO,R4

, ROUND F BEFORE LAST I'IJL TlPLY
, RruND H3] BEFORE ItILTlPLIES
, R4 {= F • X[3] = I + EPS

APPROXIMATE ACCURACY: 8 DECIMAL DIGITS.
INPUT RESTRICTIONS: R1 != 0.0.
REGISTERS FOR INPUT: R0 (DIVIDEND) AND R1 (DIVISOR).
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0-4.
REGISTERS FOR OUTPUT: R0 (QUOTIENT).
ROUTINES NEEDED: FPINV.
EXECUTION CYCLES (MIN, MAX): 43, 43.

SUIIRF

I'I'YF
ItPYF
SUIIRF

1

1
1

THE

= m3] •

11 -

IF.

X[3J1) I

+ X[3]

I
1

RHO
RHO
I'I'YF

1
1

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
APRIL 1989.

Hl-fHfHffHtffHfffHftHflHlfHfHfHHHHHHHHff

FINISH ITERATION AND RETURN
p(f>

C

!IUD
SUIIRF

C

t1'YF
AIlIF

Q

....1H+fH*"***4HfHHHf4HI+fHftf+lH+,.......................... ..

NEWTON ITERATION FOR: YIX) = X - IfF = 0 ...

R2
R2
0IIE,R4
RO,R4
R4,RI,RO

EXTERNIi.. 'PROGRAII NAI1ES

R2 (= RETURN ADDRESS
RETtIlN (DELAYED)
R4 (= 1 - F • H3] = EPS
R4 <= X[3] • EPS
RO (= X[4] = (1[3]111 - (FIX[3]»1 + H3]

.GL08L FDIV
.GLOBL FPINV
•TEXT
START IF FOIV PROGRAI1
FDIV:
RNO
l.DF
Cli..L
RHO
tl'YF
RETS
•END

~

RO,R3
RI,RO
FPINV
RO
R3,RO

R3 (=
RI (=
RO {=
ROlIlD
RO (=

RHO X
Y
lIY
BEFlRE
X

,RETURN

1

8

1IIIIIIIIIIIIIIIIIIIIIIIIIIftHHfHHIIIIIIIIIIIIIIIHH

•

_'SINX

•

IIlITTEN BY' GARY A. SInON
GAS LIGHT S(fTWARE
HOUSTON, TEXAS

PROGRAM: MATHX.ASM
EXTENDED-PRECISION, FLOATING-POINT (40-BIT) MATH FUNCTIONS
MATHX.ASM CONSISTS OF

ItARCH 1m.

THE FOLLOWING ROUTINES:

SINX   - COMPUTES A 9D SIN(X) FOR ALL X IN RADIANS.

COSX   - COMPUTES A 9D COSINE(X) FOR ALL X IN RADIANS.

EXPX   - COMPUTES A 9D EXP(X) FOR ALL |X| <= 88.

LNX    - COMPUTES AN 8D LN(X) FOR ALL X > 0.

EXTENDED PRECISION SINE Al'«:TION' RO
•
•
I
I
I

ATANX  - COMPUTES AN 8D ATAN(X) FOR ALL X IN RADIANS.

SQRTX  - COMPUTES A 10D SQRT(X) FOR ALL X >= 0.

FPINVX - COMPUTES A 10D 1/X FOR ALL X != 0.

FDIVX  - COMPUTES A 10D X/Y FOR ALL X AND ALL Y != 0.

FMULTX - COMPUTES A 10D X*Y FOR ALL X AND ALL Y.

*****************************************************

EXTERNAL PROGRAM NAMES

.GLOBL SINX
.GLOBL ECOSX
.GLOBL FMULTX

....fHHHHHHffHfHfHffHfftHfHfHHfff*"fH**HflffffHtftHHHHf**

INTERNAL CONSTANTS

~

•DATA

~

a
§.

<= SIN(RO).

APPROXIMATE ACCURACY: 9 DECIMAL DIGITS.
INPUT RESTRICTIONS: NONE.
REGISTERS FOR INPUT: R0 (ARGUMENT IN RADIANS).
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: AR0, IR0, AND R0-7.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: FMULTX.
EXECUTION CYCLES (MIN, MAX): 160, 160.

SCALING COEFFS. FIll SIN(X)
NR112

•WRD

~1

.WORD

00G0OOO6FH
OFF22F9S3H

; BGT~ OF 21PI
; TDP OF 2/PI

.Q,
~
;:s

SHF2

~

• WORD

0000000A3H

StiFl

.WRD
.WORD
• mD
•WORD
• mD
• WORD
.I«lRD
•WORD

OOG49OFDAH
OOOOOOODIH
OFFDAA218H
OOOGOOGElH
OFC2335EOH
OF8Eb9754H
OF3280B28H
0ED9997B4H

;
;
;
;
;
;
;
;
;

COF

; ADDRESS OF COEFFS.

po"Y~IAL

~.

'c>
...

if
~

COF

COEFFS. FOR SIN(X)
BOTTON
TDP OF
BOTTON
TDP OF
BOTTIJI
TIP OF
TDP (F
TOP (F
TDP (F

IF Cl (PII2)
Cl (PII2)
OF C3
C3
OF C5
C5
C7
C9
Cll

ACOF

.1«lRD

~
tv

CON

.FUlAT -1.0, 0.0, 1.0, 0.0 ; tilPPING CONSTS.

Q

ACON

.1IlID

C

c

.TEXT

CON

; ADORESS OF CONSTS.

~

g

START OF SINX PROGRAI1

ADDF

[

PUSH

LDP

li
~

r
~

~.
~
..,

So
~

~
~
N

C

ac

CAU.
LDF
(ll

SINX:

IP
IINRIII

, SAVE IP
, LOAD MTA PAGE POINTER

COSX ENTRY POINT

SCALE AND rIAl' VMlABLE X
PUSIF

AllSF
LDF
(ll
CAU.
fIX
flOAT

SUBf
NEGI'

ADDI
AND
ISTB
LOFN!

LDP
LDI
ADDF
NEGI'

LDI

00
00
_I,RI
_,RI
FllULTX
00,100
IOO,RI
RI,RO
OO,R3
I,IRO
3,100
2,100
R3,OO
lACON

IACtW,ARO
<+ARO(IRO),RO
OO,R3
!ACOf,ARO

, SAllE (IUGINAI. X
, 00 (= IX:
, RI (= TOP OF 2IPI
, OR IN 1I011!»1 OF 2/PI
, 00 (= :X:<2/PI
, 100 (= INTEGER IlliWlANT Q
, RI <= flOATING QUADRANT Q
, 00 <= x, -I < X < I
, R3 <= -x
,R2<=Q+I
, IRO <= TABLE INDEX
, LOOK AT 2ND LSB
, IF I TI£N 00 <= -X
, LOAD OATA PAGE POINTER
, AIIO -) cam. TABLE
, fiNAL rlAl'PING, RO <= X + C
,R3<=-X
, ARO -) COEFf. TABLE

EVALUATE TRWCATED SERIES

~

LDF
CAU.
LDF

RO,RI
fltLTX
OO,RI

, RI (= X
, 00 <= X"2
, RI (= X..2

MPYF
ADDF



.WIJID
.I«lRD

• TEXT
START IF EIFX PROGRAIt

; co

<= X<1.

(1.0)
; IIOTTON IF Cl
, TIl' OF Cl
, BOTTON IF C2
, TOP OF C2
, BOTTDN IF C3
, Tll'1F C3
, TIl' IF C4
, Tll'1F C5
,TII'IFC6
, TIl' IF C7

~

EJPI.

g

taIL£ YMIMLE I

~

PUll

B.

§

~

~

;:r

B.

..'a-§
...

S.
I\>

~

I..>
N
C

Q
C

LII'

IIIf
UF
UFII
UF
(Jt

II'
IICT
fIO,R2
fIO,RI
R2,fIO
_,R!
IEIRI2,RI

au

FlU.TI

FII
FUlAT

RO,R3
R3,HI
RI,IIO,RI
R3
24,R3
R3
R3
IICT,IIRII
II'

SIB'
IEDI
LSH
PUll

I'O'F
Ull
POP

,SAllEII'
I LDAIl INITA !'ME POINTER
; R2 (III -I
, RI <= I
, IF I < 0 TIEN RI <= IX:
, RI <- TIl' (f IILN(21
, (Jt IN IIOTTOI (f IILN(21
, 110 <= X• :1:/LN(21
, R3 <= I = INTEGER (f X
, RI <= FLT. PT. I
, R! <= FRM:TI()I (f :·X:, 0 <= X < I
, R3 <= -I
,1lIIIE-I TO EIP,
, SAllE AS INT.
, R3 <= FLT. PT, 2H-1
, IIRII -) COEFF. TABLE
; l.IISAI'ElIP

EWlLUATE TRlKATED SfRIES
IPYF
A11IIF

tARO-,RI,RO
tARO-,RO

, 110 <= XtC7
,1IO<=C6+1IO

IPYF
AIIIIF

RI,RO
tARO-,RO

, 110 <= X'(C6 + ROI
,RO<=CS+RO

IPYF
AIIIIF

RI,IIO
tARO-,RO

, 110 <= X'(CS + ROI
,RO<=I:4+RO

IPYF

RI,IIO
tARO-,R4
tARO-,R4
R4,RO

, 110 <= X«1:4 + ROI
,R4<=TII'(feJ
,IIlINBIlTTOII(feJ
,RO<=eJ+RO

RI,RO
tARO-,R4
tARO-,R4
R4,RO

, RO <= X'(eJ + ROI
, R4 <= TOP·(f C2
,IIlINBOTTOII(fC2
,RO<=C2+RO

CII.L

FlU.TI

UF
AIlIF

tARO-,R4
tARO-,R4
".RO

, l1li <= X'(C2 + ROI
, R4 <= TOP IF CI
, III IN IIOTTOI (f CI
,IIII<=CI+RO

CII.L

FllLTX

; RO

UF
(Jt

A11IIF
IPYF

UF
(Jt

AIlIF

(Jt

(=

If(Cl +- ROI

TEST FIll X< 0 AIID RErulH
~

8

UF

R2,R2

, TEST ORIGINAL -X

BIID
ADIIF
IPYF
LII'I
RETS

FPINYX
 IF C2
,CltINIIlTmIIFC2
IRO(aC2+RO
I
,
,
I

RO <= X.IC2 • ROI
R2 <= TQ> IF CI
CIt IN IIITTIJIIF CI
RO <= CI+ RO

, RO <= X>ICI • ROI
E_T,
, RO <= LNIXl + E
*
*
*
*
>
•

EXTENDED PRECISION ARC TANGENT: R0 <= ATAN(R0).
APPROXIMATE ACCURACY: 8 DECIMAL DIGITS.
INPUT RESTRICTIONS: NONE.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: AR0, IR0, AND R0-7.
REGISTERS FOR OUTPUT: R0 (IN RADIANS).
ROUTINES NEEDED: FMULTX, AND FDIVX.
EXECUTION CYCLES (MIN, MAX): 210 (|ATANX| <= 1), 332.

HHfHffHlHfHffffffHHfHflfHflHHIIHHHHHittH

, RET\Rj

EXTERNAL PROGRAII NAI'ES

~
t\,)

•GLOBL ATANX
.GLOBL FIlULTX
.GLOBL FDIVX

c

Q
c

INTERNAL CDNSTANTS
• DATA
SCALING COEFFS. FOR ATANIXl
.IIJRD_

.WORD
•WORD
.WORD
.WORD
.WORD

0FtB6F025H
0000000A2H
OFF490FIi1H
OOOOOOOOGH
OSOOOOOOOH

BOTIOM
TOP IF
BOTIOM
TQ> IF
BOTTOM
TOP IF

IF -f'I/4'
-Pl/4
IF PI/4
Pl/4
OF ZERO
ZERO

POLVNIJIIAL COEFFS. FOR ATANIXl, -I (= X (= I,
C1

~

.WDRO
.IIJRD

•WORD
•WORD
•WORD
•WORD
.WDRO
•WORD
•WORD
•WORD
•WORD
•WORD
• WORD

OOOOOOOOOH
000OOOO6EH
0FED55SlI4H
OOOOOOOD9H
OF04CBIlE4H
OOOOOOOFFH
OFOEEB038H

-

OFC5A3D83H

000000093H
OFCE5CE811H

OOOOOOOBFH
OFB2FCIFIIH

TOP IF CI 11.01
BOTTOM OF C3
TOP IF C3
BOTTIJI OF C5
TOP IF C5
BOTTOM OF C7
TDP IF C7
BOTIOM OF C9
TOP IF C9
BOTIOMIF ell
TOP IF CII
BOTTOM OF CI3
TOP OF CI3

~

CI1
IItI1

.1IlRD
.1IlRD
.1IlRD

; TIP OF CI5
; TIP OF CI1

OFIIFB9IFEH
0F13BD144IH

C11

LDI

;
I
;
;
;
;

II'
MeI1
RO,R2
1t1,R2
SKIP
RO,Rl
RO,RI
O,IRO

SAVEII'
LOAD MTA P11GE POINTER
R2 (= IX:
R2 (. IX: - I
IF IX: ) I TIEN SCAL£ UELAYED)
Rl (= X

; R1

<= x

I IRO (- 0, POST SCAL£ INDEX

SCAL£ FCR :X: ) I

I'IISIF
AIISF-

4IIIF
LIF

:..
~

au.

PII'F
88ED

§

LIF
LIF

~

01

~
:::

IEIIF

B.
C
i:;

01

8CIPI

au.

I lET (IIIGI"" X
; IF X ( 0 TIEN RO (- -X' UELAYED)
I Rl (- X'
1 RI (= X'
I IRO (. -2, (PII4)

RO,Rl
2,IRO

; R3 (- -I'
1 IRO (. -4, l-P1I4)

S-

MIllIE _lED

~
~

N

C

Q
C

...
LIF

JII¥f

IP/F
LIF
III

I RO (- XH2
1 ARO -) COEFF. TAIILE
1 lJISA\IE II'

FllLTX
MeI1,ARO
II'

,.

"'

FIllTX
tARO-,R2
tARO-,R2
R2,RO

; RO <= XH2*IC9 + ROI
; R2 (= Til' OF C7
;CRINIIOTTIIIOFC7
;RO(=C7+RO

FIllTX
'ARO-,R2
tARO-,R2
R2,RO

; RO

ADIF

FIllTX
tARO-,R2
tARO-,R2
R2,RO

; RO (= XH2*IC5 + RO)
; R2 (= TIP OF Cl
;CRINIIOTTIIIOFCl
;RO(=Cl+RO

CAU.

FIllTX

f

(l1li)

SERIES

RO,RI
_ _,RI,RO
_ _ ,RO

; R1 (= IH2
I RO (- XH2*C11
IRO(=CIS+RO

II,RO

; 110 (= XH2*ICI5 + 110)
I R2 (= TIP OF CI3
,IIIIlIIIOTTlIIOFCI3

--,rq
_ _,R2-

CAU.

LDF
III

ADIF
CAU.

LDF
III

ADIF

LDF
III

(= XH2I(C7 + ROJ
;R2(=TII'OFC5
;CRINIIOTTlIIOFC5
;RO(=C5+RO

RO

<= XH2*(C3 + ROJ

FINISH If'

R4
SKIP
RO,Rl
RO,RI
2,IRO

'C...>

LDI

; RO (= XH2*(Cl1 + RO)
; R2 (= TIP OF C9
;CRIMIIOTTlIIOFC9
;RO(=C9+RO

()ILL

;SAYEX
; RI (= :x:
1 RI (. IX: + I
IRO(-II:-1
I RO (. m: - 111m: + 11

RO
RO,RI
ItI,RI
R2;RO
FDIYX

TEST FCR I' ( 0

~

B.

FIllTX
tARO-,R2
tARO-,R2
R2,RO

LDF
CR
ADIF

SCAL£ VMIAIILE X

UF

; RO (= X"2*(CI3 + RO)
; R2 (= TIP OF til
; CRIM IIOTTlII OF ell
;RO(=CII+RO

CAU.

ATIIIIX'

AIISF

; RO (= CI3 + RO

RI,RO
tARO--,R2
tARO-,R2
R2,RO

ADIF

ST4IRT OF ATIIIIX _

SIIIf
IUD
LDF

R2,RO

III

.TEXT

PUSH
LIP

ADIF

II'YF
LDF

ADIF
UF
CAU.
10'

tARO-,RO,RI
Rl,RO
FIllTX


I
I
I
I
I

I·

L.IFI.W
/lPYF

APPROXIMATE ACCURACY: 10 DECIMAL DIGITS.
INPUT RESTRICTIONS: R0 >= 0.0.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0-7.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: FMULTX.
EXECUTION CYCLES (MIN, MAX): 138, 138.

;;.

EITEIM. _

~

.GLOIL SQRTI
.GLOIL FIIl.Tl

">
C

llIIERNAL CONSTANTS

~YF
~

~YF

SUBRF
~YF

~YF
~YF

S()IlRF

.DATA

~YF

CllST4

0.5
1.5
.FLOAT 1.103553391
.FLOAT 0.7S033OO8b

IIPYF
SUBRF
It>YF

5IISI(

.1I1RD

c

CllSTI
CIIST2
CIlST3

.SET
.SET

~YF

, ADJ.ISTED 1.0
, ADJ.IS1£D SllRTIlIZ)

OFF7FFFFFH

LDF
LDF
CALL
LDF

.I£XT
START OF SQRTX PROGRAH.

LIF
LIF
CALL
SUBRF
LIF

SllRTI.

LIF
RETSLE

RO,R3

, TEST AND SAVE V
, RETI.I!N NOW IF V (= 0

CALL

GET APPROXII'IATIOO TO I/V.

FOR V = O+M)*Z**E
AND 0 (= " { I, FOR E EVEN' I[OJ = (1-II/Z)*Z**-E/2
AND FOR E ODD: IIOJ =SQRTH/Z)*0-M/2)*2>*-E/Z

PUSH
LDP
PUSIF
POP

9

lOR
lOI
lOI

IF
@SItS!(

RO
R4
ts/tSK,R4
R4,RI
R4,R5

<=

RI EXP. REI\OYED

, R4 <= R4 WITH -E12 EIP,
, SAVE R4 AS IN1£GER
, R4 <= FLT. PT.
,
,
,
,
,

CNSTl, RO
R3,RO

RI <= II-ItIZ)IZH-E12
R2 (= 1.1 ... FOR ODD E
TEST LS8 OF E (AS SIGN)
IF E EVEN R2 (= O. 7S ...
RI <= COORECTED ESTII'IATE

, RO
, RO

NEWTON lTERATIOO FOR Y(I)

NIllES

~YF

Q

, RI

GElERAI£ VIZ !USES IIPYF).



~

S,RI
-1,R4
R4
R4
R4,RI
@CNST3,R2
7,R5
tcNST4,R2
R2,RI

=I R2
R2
R2
RI

RI,RI,R2
RO,R2
CNST2,R2
R2,RI

<= YlZ TRUNC.
<= VI2 Fill PREC.
I'H-Z

=0 ...

(= I[OJHZ
<= (YlZ) 1 XCOJ**Z
<= 1.5 - (VI2) 1 I[OJ**Z
<- 1m = X[O] f 0.5 - (YlZ)fXCO]HZ)

RI,RI,R2
RO,R2
CNST2,R2
R2,RI

; R2 (= X[1]H2
, R2 <= (VI2) • IlllHZ
, R2 (= 1.5 - (VI2I • XllIH2
'. RI {= I[Z] = XIlJ • 0.5 - (YlZ)IIll]ffZ)

RI,RI,R2
RO,R2
CNSTZ,R2
R2,RI

,
,
,
,

RO,R2
RI,RO
FI1lI.TX
RI,R4
R2,RI
R4,R2
FNJLTX
CNST2,RO
R2,RI
FI1lI.TX

; R2

,
,
,
,
,
,
,
,
,

R2
R2
R2
RI

RO
RO
R4
RI
R2
RO
RO
RI
RO

{=
{=
(=
{=

X[2]H2
(VI2) • 1[211'2
1.5 - (V/2) • X[2]**2
X[3J = 1[2J * U.5 - (VI2)'II2JI.2)

(= V/2
{= 1[31
(= 1[3]"2
{= 1[3J

{=

Yl2

{= 1[31
(= (V/2) • X[31**2

1.5 - (VI2) • X[3JH2
1[3]
(= 1[4] = 1[3] • (I.S - (V/2)fX[3]**2)
(=

(=

INVERT FINAL RESll T AND RETURN
BAD
LDF
POP

SAVE IF
LOAD DATA PAGE POINTER
SAVE V AS FLT. PT. V = (1~)*ZnE
R4 (= V AS INTEGER
R4 (= COII'LE~T ALL BUT SIGN
RI (= I H1/2)*Z"-E
R5 (= RI

NOP

FMllTX
R3,RI
DP

RO = SQRTlV) = VlSIlRTH/V) (DELAYED)
RI (= V
LWSAVE IF
DEAD CYCLE

RETURN octll5 FROM FMIJLTX '

-

------

--

~

HfHIIIIIIIIIIIIIIIIIHHHffHffHHHftltHHHftffHH.

_ . FPIMiX

IIPYF
f

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
~1989.

EXTENDED PREC. FLT. PT. INVERSE: R0 <= 1/R0.

f

APPROXIMATE ACCURACY: 10 DECIMAL DIGITS.
INPUT RESTRICTIONS: R0 != 0.0.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0-1 AND R4-7.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: FMULTX.
EXECUTION CYCLES (MIN, MAX): 76, 76.

I

f

IIIII .. IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH

EXTERNAL _

NIllES

INTEJIIAL CllilSTANTS

lIE

~.

TIll

§"

III1C

'a...>

;;.

FPIIIIIJI

LDF
IIETSl

~

tv

C

ac

RO,RO

; TEST F
, RETlRI tOe IF F = 0

GET _IMTIII! TlI IIF, FOO F = II~) _ 2HE
IIIID 0 (. " ( I, USE' 1[01 = 11-11/2) • 2H-E

~

~

0FF7FFFFFH

START IF FPINIII _

~
~.

1.0
2.0

,TEIT

~
;:s
~

..SET
.SET

PUIIII

..

-

.....
... ..
UP
PUIIIIF
PUIIII

I'II'F

RO
RI
_,RI
R1
R1

,SAI'E[I'
I LOCID MTA PAGE POlmR
; SAI'E AS FlT. PT. F • II~) f 2HE
, FETOt BID( AS INTEGER
; aII'l.EI£HT E • " BUT MIT SIGN BIT
, SAI'E AS INTEGER, AlII BY MGle•••
, R1 (. no] = 11-11/2) f 2H-E.
;-[1'

=X -

IIF

=0 •••

R4

{=

R4
RI

(=

F _ xtO]
2 - F • no]

(=

XII]

F f X[1]
2 - F - X[1]
X12] = X[1] _ 12 - F • XCI])

=

no] - 12 - F f X[Q])

RI,RO,R4
TIIl,R4
R4,RI

R4

(=

R4

{=
(=

IIPYF
SUBRF
IIPYF

RI,RO,R4
TI«),R4
R4,RI

R4 {= F f 1[2]
R4{=2-F_X[2]
RI (= 1[3] = 1[2] f 12 - F f 1[2])

RI

FOR Tl£ LAST ITERATlOO' X[4]
CAll.
SUBRF
CAll.
ADDF

.END

,MTA

RI,RO,R4
TWO, R4
R4,RI

SUBRF
PlPYF

RETS

.GLOlIL FPINIIJ
.GLOlIL FIIJl.TX

::t..

SUBRF
IV'YF
rPYF

f

f

~

NEWTOO ITERATION Foo, VI X)

FII.LTX
M,RO
FIIlLTX
RI,RO

= (x[3] •

11 - IF • 1[3]))) • X[3]

RO {= F • X[3] = I • EPS
RO{=I-F'XC31=EPS
RO {= J[3] - EPS
RO {= XC4] = (X[3]-1I - IF-X[3l))) • X[3]
; RETlRI
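
The FPINVX comments describe a Newton iteration for the reciprocal: an initial estimate X[0] = (1 - M/2) * 2**-E for F = (1 + M) * 2**E, refined by X[n+1] = X[n] * (2 - F * X[n]). The recurrence can be sketched in Python as below. This is only an illustration in ordinary double-precision arithmetic, not the 'C30 extended-precision code; the function name, the sign handling, and the default of four iterations are assumptions made for the sketch.

```python
import math

def newton_reciprocal(f, iterations=4):
    """Approximate 1/f with the Newton recurrence x <- x * (2 - f * x)."""
    if f == 0.0:
        raise ZeroDivisionError("input must be nonzero")
    sign = math.copysign(1.0, f)
    mant, exp = math.frexp(abs(f))      # abs(f) = mant * 2**exp, 0.5 <= mant < 1
    m = 2.0 * mant - 1.0                # rewrite as (1 + m) * 2**e with 0 <= m < 1
    e = exp - 1
    x = math.ldexp(1.0 - m / 2.0, -e)   # initial estimate (1 - m/2) * 2**-e
    for _ in range(iterations):
        x = x * (2.0 - abs(f) * x)      # each step roughly squares the relative error
    return sign * x
```

With this starting value the initial relative error is at most about 12.5 percent, so a handful of iterations is enough to reach double-precision accuracy.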

:...
~

PROGRAM: FDIVX

: .. I:::::.:!lunIIIlIIUIUIIUIIUIIIIfHf

~

WRITTEN BY: GARY A. SITTON

§.
~
;::

ti.

~

'C....i'
So

HH+Hf-U*************4***HlHt-*****UHHUH4HHHHH

•

PROGRAM: FMULTX

•

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
MARCH 1989.

•

EXTENDED PRECISION MULTIPLY: R0

•

APPROXIMATE ACCURACY: 10 DECIMAL DIGITS.
INPUT RESTRICTIONS: NONE.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0 AND R4-7.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: NONE.
EXECUTION CYCLES (MIN, MAX): 20, 20.

GAS LIGHT SOFTWARE
HOUSTON, TEXAS
MARCH 1989.

Q

~

..............

•

EXTENDED PRECISION DIVIDE: R0

•

APPROXIMATE ACCURACY: 10 DECIMAL DIGITS.
INPUT RESTRICTIONS: R1 != 0.0.
REGISTERS FOR INPUT: R0 (DIVIDEND) AND R1 (DIVISOR).
REGISTERS USED AND RESTORED: DP AND SP.
REGISTERS ALTERED: R0-7.
REGISTERS FOR OUTPUT: R0 (QUOTIENT).
ROUTINES NEEDED: FMULTX AND FPINVX.
EXECUTION CYCLES (MIN, MAX): 107, 107.

•
•

(=

RO/RI.

~

•
•
•
•
•
•

(=

ROtRI.

***H**H*****************,*****************************

~

ElTERNAI. _

~

N

C

o
C

NIllES

EXTERNAL PROGRAM NAMES

.GLOBL FDIVX
.GLOBL FPINVX
.GLOBL FMULTX

.GLOBL FMULTX

.TEXT

START OF FHUlIX PROGRAM

• TEXT

START OF FDIVl PROOAAII

FHUlTXI

FDIVX'
LlF
LlF
CALL
LlF

RO,R3
RI,RO
FPIN'lX
R3,RI

IIR

FlU.TX

R3
RI
RO
RI
RD

RETURN OCClllS FRil'I FlU.IX '

(=
(=

(=
(=
(=

X
Y
I/Y
X
X/Y

ABSF
XOO
ABSF
MFYF
LOF
ANON
SUBRF
MFYF
ADDF
LDF
ANDN
SUIIRF

MFYF
ADDF
NEGF

RO,R4
RI,RO
RI,R7
R4,R7,Rb
R4,RS
OFFH,RS
R4,RS
R7.AS
Rb,RS
R7,R6
OFFH,R6
R7,Rb
R4,Rb
Rb,AS
AS,Rb

; R4

(=

:XA!

; RO
; R7

(=

(=

SIGN INFO.
:XB:
A'B
: XA:
A = XA - EA*2H-24
EAf2*f-24
B*EA*2**-24
A*B + B*EAI2H-24
:XB:
B = XB - EB*2**-24
EB*2**-24
AAEB*2u-24
:XAtXB: = A*S + IB'EA+A*EB)'2ff-24
- :XA*XB:

; Rb

(=

; RS

(=

; f?5

(=

;
;
;
;

RS
R5
R5
Rb

(=

; Rb

(=

;
;
,
;

(=
(=

R6
R6
AS
R6

(=
{=
(=

(=
(=

TEST FOO XAIlB ( 0 AND RETURN
pop
BUD

g

LOF
LDFN
LOF

R4

R4 (= RETURN ADDRESS

R4
RO,RD
Rb,AS
RS. RO

RETURN IDELAYED)
TEST ORIGINAL (lA' XS)
IF XAteXB (

RO

(=

XA*XB

(J

THEN R5

(=

-:XA*XB:

....
....
o

fHtoHHHIIIIIIIIIIIIIJIIH"fUIHf:HHHHHUHHH*H***ffHIH*fl**H**H

••

"HH**I********I"***I***********************+*********
-II-

PROGRAM: ILOG2

•

WRITTEN

PROGRAM: $MATHI.ASM
BY: GARY A. SITTON

INTEGER (32-BIT) MATH ROUTINES

GAS LIGHT SOFTWARE

$MATHI.ASM CONSISTS OF THE FOLLOWING ROUTINES:

MARCH 1989.

HOUSTON, TEXAS
ILOG2 - COMPUTES M = LOG2(N), N <= 2**M FOR USE WITH RADIX 2 FFT

•

INTEGER LOG BASE 2: R0

•
•
•
•

INPUT RESTRICTIONS: R0 > 0.
REGISTERS FOR INPUT: R0.
REGISTERS USED AND RESTORED: SP.
REGISTERS ALTERED: IR0-1 AND R0.
REGISTERS FOR OUTPUT: R0.
ROUTINES NEEDED: NONE.

<=

(INTEGER) LOG2(R0).

PROGRAMS.
MULT - COMPUTES A 64-BIT PRODUCT OF TWO 32-BIT NUMBERS.
IDIV - COMPUTES THE QUOTIENT AND REMAINDER OF TWO 32-BIT NUMBERS.

t

tHfHf*~I*HfHHHH**HH*H*I***"HI**H**HH

EXTERNrL PROGR!III NAl'lES
•GLOBL ILOG2
.TEXT
START Of ILOG2 PROGRAM
ILOG2:


LOOP:

LDI
LDI

1,1110
-1,1R1

, IRO
, IRI

ClV'I
LSH
ADDI
ClV'I

1110,110
LiltP
1,1110
1,IRI
1110,110

,COIIPAREITON
, LOOP IF N ) I IDELAYED)
, I (= 2*1
, M= M+ 1
,COIIPAREITON

LDI

IR1,RO

BOTD

RETS

, RO

(=

I tlNIT. 11
UNIT. -1)

(= "

(=

, RETURN

L0G2IN)
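
The ILOG2 header states that the routine computes M = LOG2(N) with N <= 2**M, and the loop comments show I being doubled and M incremented until I reaches N. A minimal Python sketch of that behavior follows. The function name and the error handling are assumptions; the listing appears to start M at -1 because its delayed branch always performs one extra update, whereas this sketch starts at 0 and stops as soon as I reaches N.

```python
def ilog2(n):
    """Return the smallest m such that n <= 2**m (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    i, m = 1, 0
    while i < n:        # keep doubling i until it reaches or passes n
        i <<= 1
        m += 1
    return m
```

For a radix-2 FFT of length N = 2**M this returns M exactly; for other lengths it returns the exponent of the next larger power of two.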

~

2

• _mil BY'

§.

•

HIlUsnJI, ' TElAS

•

1IIIRtII1!I89.

~

•
•

~
~

~

~
§.
~
..,'"
S.
flo

llIEl
LSH
ADDI
AIlllC

...... IIILT

GMY A. SITT.
IlASLlIlHnilfTlIME

••
•

~

SlJBRI

SlJBRB
OOIIE'

IRY RESTRICTIIlIIS' IDE.

I
II1II116 IEEIEIlI JOE•
111111111111 .. 1111111111111111111111111111111111111111111

ElTEJllll.PROORIIII_
.1l.IlIl. IIILT

~

.TEXT
STIIRT

c

(f

IIILT PROORIIII

IIILTI
XlII
AIlSI
AIlSI

RO,RI,ARO
110
RI

ARO <= SIGlUI (_II
110 (= IX:
RI (. :V:

SEl'MATE lU.T1PLIER IINIllllLTlPLICANIl IN TIll PARTS

LDI
LSH
AlII

LSH
IINIl

-16,MI
MI,RO,R2
OFFFFH,RO
MI,RI,R3
OFFFFH,RI

MI <. -16 (FOR SIIIFTSI
R2 (. XI • lFPER 16 BITS (f IX:
110 (= XO • LDWER 16 BITS Of IX:
R3 <= VI = II'PER 16 BITS (f :V:
RI <= VO = I.0IO 16 BITS (f :V:

CMRY WI TIE IILTlPLICATIOi

IPYI
IPYI
IPYI
ADDI
IPYI

RO,RI,R4
R3,RO
R2,RI
RO,RI
R2,R3

R4 <= XOtVO = PI
110 <= IOtVI =P2
RI (= XlIVO = P3
RI <= P2+P3
R3 <= Illy! =P4

P\lT TIE PROIltTS TOGETl£R

-....
~

LDI
LSH
ClI'I

RI,R2
16,R2
O,ARO

)0

0 TIEN DOlE (lElAY£DI

<= II'PER 16 BITS (f P2+P3
<= MO = LOIB _ (f TIE PROru:T

<= WI =II'PER _ . (f TIE PROIU:T

R2
R2

<= P2+P3
(=

tIEl(

0,110
O,RI

; RO
; RI

LIllER 16 BITS (f P2+P3
TIE SIGN (f TIlE PROII.tT

RfTS

(f

OPPOSITE SIGN

<= -wo

<= -WI

I

REGISTERS FOR INPUT: R0 AND R1.
REGISTERS USED AND RESTORED: SP.
REGISTERS ALTERED: AR0-1 AND R0-4.
REGISTERS FOR OUTPUT: R1 (UPPER) AND R0 (LOWER).

Q

IF
RI
RO
RI

NEGATE THE PRODUCT IF NUMBERS WERE

INTEGER 32 X 32 MULTIPLY: R1, R0 <= R0*R1.
RESULT IS THE 64 BIT PRODUCT OF TWO 32 BIT INPUTS.

~

DOlE
MI,RI
R4,R2,RO
R3,RI

;RfTI.IlII

(WITH_I
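
The MULT comments split each operand magnitude into upper and lower 16-bit halves (X1, X0, Y1, Y0), form the partial products P1 = X0*Y0, P2 = X0*Y1, P3 = X1*Y0, and P4 = X1*Y1, combine them, and negate the result if the inputs had opposite signs. The arithmetic can be sketched in Python as follows. The function name and the use of Python's unbounded integers for carry handling are assumptions; the listing itself propagates carries explicitly between 32-bit registers.

```python
MASK16 = 0xFFFF

def mult_32x32(x, y):
    """Return (upper, lower) 32-bit words of the signed 64-bit product x * y."""
    negative = (x < 0) != (y < 0)          # sign of the product
    ax, ay = abs(x), abs(y)
    x1, x0 = ax >> 16, ax & MASK16         # upper / lower 16 bits of |x|
    y1, y0 = ay >> 16, ay & MASK16         # upper / lower 16 bits of |y|
    p1 = x0 * y0
    mid = x0 * y1 + x1 * y0                # P2 + P3
    p4 = x1 * y1
    product = (p4 << 32) + (mid << 16) + p1
    if negative:
        product = -product
    product &= (1 << 64) - 1               # two's-complement wrap to 64 bits
    return product >> 32, product & 0xFFFFFFFF
```

Each partial product involves only 16-bit factors, which is what lets the routine build the result from multiplies narrower than 32 bits.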

w

IHHfHHHHHHHHfHftHHfffHffH-fHHHffHitHfH

N

-

_'IDIV

_

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
MARCH 1989.

SUBI
LSH

IOO,IRI
IRI,RI

: IRI (= DIFFERENCE IN ElPOIENTS
; RI (= ALIGNED DIVISOR WITH DIVIDEND

00 IR1+1 SUBTRACT l SHIfTS.

RPTS

suee

IRI
RI,OO

; REPEAT IRI+l TII1ES
; RO (= 2-100 - RII

I1ASK OFF Tf£ LOWER IRI +l BITS IF 00

•

-

-

INTEGER 32 / 32 DIVIDE: R0, R1 <= R0/R1.
RESULT IS A 32 BIT QUOTIENT AND |REMAINDER|.

LDI
SUBRI
LSH
NEGI
LSI!
SUBRI
LSH

INPUT RESTRICTIONS: R1 != 0.
•
REGISTERS FOR INPUT: R0 (DIVIDEND) AND R1 (DIVISOR).
REGISTERS USED AND RESTORED: SP.
REGISTERS ALTERED: IR0-1 AND R0-3.
REGISTERS FOR OUTPUT: R0 (QUOTIENT) AND
R1 (|REMAINDER|).
ROUTINES NEEDED: NONE.

OO,RI
31, IRI
IRI,RO
IRI
IRI,OO
-32,IRI
IRI,Rl

Rl (= : REMAINDER, QUOTIENTi
IRI (= 32 - IIR1+1I
00 (= RO SHIFT LEFT IRI
IRI (= -IRI
RO (= II:IIY:
IRI (= -IIRI+!}
RI (= : REMAINDER:

CHECK SIGN AND NEGATE RESllT IF NECESSARY.

tHIIIIIIIIIIIIIIIIIIIIIIII ••• I.IHfHHHfHHfHttHtH

EXTEIINAl _

NEGI

NAI£S

ASH

LDINZ
at'I
RETS

•GI.OBL IDIV

OO,R3
-31,R2
R3,OO
0,00

R3 (= -:XlI;V:
TEST SIGN BIT
IF SET 00 (= -RO
SET STATUS FRO! RESULT
RETIJlN

START IF IDIV _
RETIJlN ZERO QUOTIENT.

~

~

a
~.

.TEXT
ZERO:
IDlY:
IETERllIIE SIGN IF RESULT. GET ABSa.UTE VAlUE OF IFERIINDS.

•END

~

XIIl
ABSI
ABSI

~.

at'l
IIIIID

~
;:s

'C....>
;;.
~

~
~

RO,RI,R2
RO
RI

R2

(=

00

(=

SIGNUIt IRO/RII
IX:

RI

(=

:Y:

TEST INPtIT VAl.I£S

OO,Rl

; cat'ARE DIVISOR TO DIVIDEND

ZERO

; IF RI ) RO TIEN RETIJlN 0 IJELAVEDI

-'11£ _ . USE DlFFEREN:E IN EXPOIENTS AS
SHIFT COOIT Fill DIVISOR, AND AS REPEAT ~T Fill SUBe.
FtIlAT

PUSIF
....
LSII

RO,R3
R3
IRI
-24,IRI

R3 (= NI!IIAlIZED DIVIDEND
PUSH AS FtIlAT
IRI (= INTEGER
IRI (= DIVIDEND EXPl>'£NT

RI,R3
R3
IRO
-24,IRO

R3 (= NI!IIAlIZED DIVISOR
PUSH AS FtIlAT
IRO (= INTEGER
lRO (= DIVISOR EXPOIENT

tv

C

ac

LDI
LDI
RETS

FtIlAT
PUSIF

....
LSII

RO,RI
0,00

RI (= : REMAINDER:
RO (= 0 QUOTIENT
RETIJlN

;:...

IUIIUIIIIIIIIIIIIIIIIIIIIIIIIIII •• IIKIIUIfHHHHftffHfHfUHffffUHHt

IfHUH**f**fffHUHfffHfHfH+HMI.I.I.llllllllllllllfH
t

PROGRAM: 'fCMl'lUlT

*

WRITTEN BV: GARV A. SITTOO

g

I'IIOOIWU 'YECT(Il.ASII

~

\£tT(Il UTILITIES

GAS LI IlHT SlFTIIIIRE

tYECTOO.ASII mlSISTS OF 11£ FOU.I*IJ«; ROJTINES:

HOUSTOO, TEXAS
FE8RUARV 1989.

'"§


*CBITREV - IN-PLACE BIT REVERSE PERMUTATION ON A COMPLEX ARRAY WITH
SEPARATE REAL AND IMAGINARY ARRAYS.

~

*FMIEEE - IN-PLACE FAST CONVERSION OF AN IEEE ARRAY TO A TMS320C30

MCORMULT ENTRY PROTOCOL:
VARIABLES FOR INPUT:
$IAD1 -> X1[0], $IAD2 -> Y1[0],
$SAD1 -> X2[0], $SAD2 -> Y2[0],
$N = N (LENGTH), $PARMS = DATA PAGE.
INPUT RESTRICTIONS: $N > 0.
REGISTERS ALTERED: RC, DP, AR0-3 AND R0-3.

*
*TOIEEE - IN-PLACE FAST CONVERSION OF A TMS320C30 ARRAY TO AN IEEE

RCORMULT ENTRY PROTOCOL:
REGISTERS FOR INPUT:
AR0 -> X1[0], AR1 -> Y1[0], AR2 -> X2[0],
AR3 -> Y2[0], RC = N (LENGTH).
INPUT RESTRICTIONS: RC > 0.
REGISTERS ALTERED: RC, AR0-3 AND R0-3.

*
*
*

REGISTERS USED AND RESTORED: SP.
REGISTERS FOR OUTPUT: NONE.
ROUTINES NEEDED: NONE.

~

~

IV

C

_V.

tnt'lEX _VS.

_V.

-v.

[}

*VECMULT - IN-PLACE MULTIPLIES A CONSTANT TIMES AN ARRAY.

*CONMOV - MOVES (FILLS) A CONSTANT INTO AN ARRAY.
*VECMOV - MOVES (COPIES) AN ARRAY INTO ANOTHER ARRAY.
H . .tHffHffH* ....n+ffHH**ftHffHHHHHftHHHfUH.. t-HtffHffHffHf

ffHttUffHHH***fHUHHft*****fHHHHfHHfHHI-tfHH

EXTERNAL MEMORV
• GL08LtPARMS

AD~SSES

, PARAMETER PAGE ADDRESS

EXTERNAL VARIABLE ADDRESSES
.Gl.08L
.GL08L
.GL08L
•GL08L
•GL08L

ARRAV LENGTH N
ADDRESS OF I tflIT
ADOOESS OF INPUT
ADDRESS OF ItfliT
AD~SS OF ltf'lIT

$N

'lAD!
'IAD2
'SAD!
'SAD2

X!
Yl
X2
V2

EXTERNAL PROGRAII NAMES
.GL08L MCDRtIlILT
.GL08L RCIJRI1ULT

,MEtl)RV ENTRV FOR COIIPLEX (CORR.) MlILTlPLV
,REGISTER ENTRY FOO CO'IPLEX (CGAR.) IfillPLV

START Of PROORAI1 AREA

IN

• TEXT

IN

MEMORV BASED PARAMETER ENTRV

W
..j::o.

~T:

l.IIf'
LDI
LOI
LOI
LOI
LOI

HfUHHHf ... IIIIIUII ••• IlIiHHHHfHfU,IIII.U'IUlfff

*
LOAD DATA PAGE POINTER
Re (= N
ARO -) mOl
ARI -} mOl
AR2 -} X2[01
AR3 -) ¥2IOI

HPMIIS
HN,RC
HIADI,ARO
HIAD2,ARI
HSADI,AR2
HSAD2,AR3

REGISTER BASED PARAIETER ENTRY

_:.aHlULT

*

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
APRIL 1989.

•
•
•

COMPLEX IN-PLACE FREQUENCY DOMAIN CONVOLUTION:
C1 <= C1*C2, C1 AND C2 ARE BOTH OF LENGTH
N, AND C1 = (X1 + I*Y1) AND C2 = (X2 + I*Y2).

::

If'YF
ADIF

If'YF
SUBf

LOIJ>I: STF

I:
~

D

STF

RETS

(=

N- I

REPEAT BLOCK N TIMES
R1 <= X1[I]*X2[I]
R3 <= Y1[I]*Y2[I]
R0 <= Y1[I]*X2[I], INCR. AR2 AND ...
R2 <= X1[I]*X2[I] + Y1[I]*Y2[I]
R1 <= X1[I]*Y2[I], INCR. AR3
R3 <= Y1[I]*X2[I] - X1[I]*Y2[I]
X1[I] <= R2, INCR. AR0 AND ...
Y1[I] <= R3, INCR. AR1

; RETIJlN

I

~

~.

~

EXTERNAL IElOlY ADOOESSfS
•ROlli. $PMIIS

~

EXTERNAL

i'

.ROlL
.ROlli.
.ROlli.
.ROlII.
.ROlli.

s.
'"

EXTERNAL _

;::

'0'
....

~

~

N

; PARAIETER PAGE ADmoss

~ARIABI..E

ADmoSSES
ARRAY lfNGTH N
ADmoSS IF INPUT
ADmoSS IF INPUT
ADOOESS IF INPUT
ADmoSS IF INPUT

$N

$IADI
$IAD2
$SAD!
$SAD2

.ROlli. /OJIIPU.T
.ROlII. RCCH1llT
START IF _

NIllES
; IElOlY ENTRY FOO COIf'LEX (CONY.) IU.TIPLY
; REGISTER ENTRY FOO ClIIf'l.EX (CONY.) IU.TIPLY
AREA

C

Q
c

Xl
YI
X2
¥2

.TEXT
IElOlY BASED PARAl£TER ENTRY

~

II:IIIU.TO

~
~
~
§.

UIP
LOI
LOI
LOI
LOI
LOI

~

~
;:
~

ctltPlfl IlLTIPLV (COIMLUTION) LOOP

So
'I>

Q
c

l$SAD2,1IR3

*

IIUTTEN BV.

*
*
*

BIT REVERSE INDEX MAP TWO REAL ARRAYS AS A SINGLE
COMPLEX ARRAY WITH THE SWAPPING DONE IN-PLACE.
X[I], Y[I] <-> X[J], Y[J], WHERE J = BITREV(I).
LENGTH OF ARRAYS N >= 4 IS ABSOLUTELY REQUIRED.

SUBI

I,RC

;RC<=N-I

*

IQITREV ENTRY PROTOCOL'
VMIABlfS FIJI ,INPUT'
SIADI -) 1[01, SIAD2 -) V[ol,
SN = N (LENGTH), $PARI!S = DATA PAGE.
INPUT RESTRICTIONS' SN >= 4.
*
REGISTERS I'LTERED' Re, lIP, lRO, ARO-3 AND RO-3. *

RPTB

LOOP2
1ARO,*IlR2,RI
IMI,IM3,R3
*IlR2++,IMI,RO
R3,RI,R2
1ARO,*IIR3++,RI
Rl,RO,R3
R2,1MO++
R3, 4Ml++

;
;
;
;
;
;

II'YF
II'YF
II'YF
II

SUIF
II'YF
ADDf

L.Oil'2' STF
II
STF

RETS

REPEAT BLOCK N TIMES
R1 <= X1[I]*X2[I]
R3 <= Y1[I]*Y2[I]
R0 <= Y1[I]*X2[I], INCR. AR2 AND ...
R2 <= X1[I]*X2[I] - Y1[I]*Y2[I]
R1 <= X1[I]*Y2[I], INCR. AR3
; R3 <= Y1[I]*X2[I] + X1[I]*Y2[I]
; X1[I] <= R2, INCR. AR0 AND ...
; Y1[I] <= R3, INCR. AR1
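
The loop comments for the two complex multiply routines differ only in two signs: the convolution form computes X1*X2 - Y1*Y2 and Y1*X2 + X1*Y2, while the correlation form computes X1*X2 + Y1*Y2 and Y1*X2 - X1*Y2, which amounts to multiplying by the conjugate of the second array. The following Python sketch shows both in-place updates on separate real and imaginary lists. The function names and the use of Python lists are assumptions for illustration; the listings operate on memory through auxiliary registers.

```python
def conmult(x1, y1, x2, y2):
    """In place: (x1 + j*y1)[i] <- (x1 + j*y1)[i] * (x2 + j*y2)[i]."""
    for i in range(len(x1)):
        re = x1[i] * x2[i] - y1[i] * y2[i]
        im = y1[i] * x2[i] + x1[i] * y2[i]
        x1[i], y1[i] = re, im

def cormult(x1, y1, x2, y2):
    """In place: multiply by the conjugate of (x2 + j*y2), as in correlation."""
    for i in range(len(x1)):
        re = x1[i] * x2[i] + y1[i] * y2[i]
        im = y1[i] * x2[i] - x1[i] * y2[i]
        x1[i], y1[i] = re, im
```

Both results are computed into temporaries before being stored, mirroring the way the listing holds the real and imaginary parts in R2 and R3 before writing them back.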

*

RCB ITREV ENTRV PROTOCOL'
REGISTERS FIJI Itf'Ur:
ARO -) nOl, IIRI -) V[ol, Re = N (LENGTH), *
INPUT RESTRICTIONS' RC >= 4,
REGISTERS I'LTEREG' Re, lRO, IIRO-3 AND RO-3.

; RETURN

*
I

REGISTERS USED' AHD RESTORED' 51'.
REGISTERS FIJI OOTPUr: NONE.
ROOTiNES NEEDED' NONE,

A. SITTON
GAS LIGHT SlFTIIME
HOUSTON, TEXAS

fJN('{

IMCH 1989.

RaIfllTl

'&
.,

~
~

UlAD DATA PAGE POINTER
RC (= N
ARO -) mOl
MI -) mOl
M2 -) X2[01
IIR3 -) Y2[ol

ItII,RC
I$IADI,ARO
1$1AD2,MI
_1,M2

REGISTER BASED Pt1RAIETER ENTRY

~.

~

-

*

*

HHHfHftHHHHHHHHHH+I••• lllllllllllltH+HHf

EXTERNAL IEIIIRY ADDRESSES
.GlOIIl SPAIII!S

; PMA/ETER PI\Ii: ADDRESS

EXTERNAL VARIABLE ADDRESSES

,GlDBL SN
,GlOBL SIADI

ARRAY LENGTH N
ADDRESS OF Itf'UT X
ADIJIESS OF INPUT V

,GLQBL SIAD2

ElTERNI'L PROGRAII _S
,GlOBL rl:BITREV
,GlDBL RCBITREV

• I'EIKIlY ENTRY FIJI ctltPLEX BIT REVERSE
; REGISTER ENTRY FIJI ctltPLEX BIT REVERSE

START OF PROGRAII AREA
• TEXT

-

rEllJRY BASED PARAltETER ENTRY

w

VI

~

MCBlTREV'

w

LIIP
LDI
LDI
LDI

0\

-

LOAD DATA PAGE POINTER
RC (= N
ARO -) ARRAY X
ARI -) ARRAY Y

lSH,RC
HIADI,ARO
ISIAD2,ARI

fHHfttHHHHt**ttHHfIHfHfHtHHHHtfffIHftHtt

*

PROGRAI'1: ...FPllEEE

•

II1InEN BY: GARY A. SInON
GAS LIGHT SOFTWARE
HOOSTON, TEXAS

•
•

CONVERT AN ARRAY OF IEEE FLOATING-POINT NUMBERS TO
TMS320C30 FLOATING-POINT FORMAT. ASSUMES NO: INF.,
NAN, OR DENORMALIZED NUMBERS.

REGISTER BASED PIlRAllETER ENTRY

rtARCH 1989.
RCBITREV:
LDI
SUBI
LSH
LDI
MI'
MI'

LDI

RC,IRO
,
3,RC
,
-I,IRO
,
ARC,AR2
,
>AR2++!lRO)B
tARO++

ARI,AR3

IRO (= N
RC (= N - 3
IRO <= NIZ FOR BIT REVERSE
AR2 -) ARRAY X (SIT REV.)
,INCR. JIR(AR2) (OOTSIDE LOOO)
, Ita. ARC (OOTSIDE LOOP)
, AR3 -) ARRAY Y (SIT REV.)

•

•

tF"IEEE ENTRY PROTOC!X.:
VARIABLES FOR INPUT:
SIADI -} X[OI, SN =N (LENGTH),
SPARHS =DATA PAGE.
INPUT RESTRICTIONS: SN ) 0,
REGISTERS ALTERED' RC, lIP, ARO-I AND RO-I.

•

RFHIEEE ENTRY PROTOCOL:
REGISTERS FOR It= AR2, LOOP (DELAYED)
INCR. ARI
• INCR. BR{AR3)
RO <= XIll, INCR. ARO

tAR2,R2
, R2 <= l[JI
tARI,RI
, RI <= YIll
tAR3,R3
, R3 <= Y[JI
RO,tAR2
, x[JI <= RO
R2,.,XIll (= R2
RI,tAR3
, Y[JI (= RI
R3, tARl
, YIll <= R3
_!lRO)S
,INCR. BR(AR2)
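
The *CBITREV header describes an in-place swap of X[I], Y[I] with X[J], Y[J], where J = BITREV(I), for arrays of length N >= 4. A small Python sketch of that permutation is given below. It computes the reversed index explicitly instead of using the 'C30 bit-reversed addressing mode seen in the listing; the function name and the power-of-two length check are assumptions.

```python
def cbitrev(x, y):
    """Swap x[i], y[i] with x[j], y[j] in place, where j is i bit-reversed."""
    n = len(x)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 4")
    bits = n.bit_length() - 1
    for i in range(n):
        j = int(format(i, "0{}b".format(bits))[::-1], 2)   # reverse the bits of i
        if i < j:                                            # swap each pair once
            x[i], x[j] = x[j], x[i]
            y[i], y[j] = y[j], y[i]
```

Swapping only when i < j ensures that each pair is exchanged exactly once and that indices equal to their own reversal are left alone.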

tHHHfffHHHHfHHHf+HHHHHIIIIII •••• IIIIHHH

EXTERNAL I1EIIlRY ADIIlESSES
• GLOBL SPARItS

, I'AIWtETER PAGE ADIlRESS

EXTERNAL VIIRIABLE ADIIlESSES
.GLOBL SN
.GLOBL SIADI

, RETIRl

;:

B.
c

• ARRAf LENGTH N
, ADDRESS IF INPUT X

EXTERNAL PROGRAIt NIII£S

;:

'('"
3">

.GLOBL tFMIEEE
.GLOBL RFHIEEE

....
So

CONSTIi'!TS FOR BOTH CONVERSIONS

'"

~

f:j
N

C

ac

, I£IIlRY ENTRY FOR IEEE -) 'C30 CONVERSION
, REGISTER ENTRY FOR IEEE -) 'C30 CONYERSION

•DATA
CTAB

.WORD
,WORD
•WORD
,WORD
.WORD

OFFBOOOOOH
OFFOOOOOOH
07FOOOOOOH

06II000000H
081000000H

::t..

TAIIA

~

.-

~

.TEXT

§'

_

~

~

ClAS

STIIRT IF _

Hl+fHffHfHfflfHHfHfHHHfHHfffHHfHHHfHfH

AREA

IAlIEEE'
LIf

ISPAIIIIS

is'

LDI

lIN. lit
ISIADI,ARO

un

; UWI INITA MGE POINTER
, lit <= N
, ARO -) IEEE MAAY

~

'"
'c>
...
So

'"

~

~

tv
C

ClC

_ , ITOIEEE

•

WRlmN BY' GARY A. SITTOO
GAS LIGHT SOFTWARE
HOiJSTOO, TEXAS
APRIL 1989.

•
•
•

CONVERT AN ARRAY OF TMS320C30 FLOATING-POINT
NUMBERS TO IEEE FLOATING-POINT FORMAT. ZERO
IS THE ONLY SPECIAL CASE.

•

tlTOlEEE ENTRY PROTOCOL'
VARIABLES FOR INPUT'
'IADI -) X[O], $N =N (LENGTH),
$PAIIIIS =OIITA PAGE •
INPUT RESTRIClIOOS' $N ) O.
REGISTERS ALTEIIEG' lit, DP, ARO-I AND 00-1.

•

RTOIEEE ENTRY PROTOCOL'
REGISTERS FOR INPUT'
ARO -) 1[0], lit =N (LENGTH).
INPUT RESTRICTIONS' lit ) O.
REGISTERS ALTERED' lit, ARO-I AND RO-l.

•
•
•

REGISTERS USED AND RESTORED' SP.
REGISTERS FOR OUTPUT' NONE.
ROUTINES NEEDED' NONE.

•

NOTE; ITOIEEE SHARES TI£ CTAB TABLE FROH tFI1IEEE

BASED PMAI£TER ENTRY

~
~
~

•

REGISTER BASED PMAI£TER ENTRY
AFltIEEE'
SUBI
LIf
LDI

1,1It
I!CTAB
ITABA,ARI

,RC<=N-I
, LDAD INITA PAGE POINTER
; ARI -) ClIISTANT TABlE

IEEE -) 'C30 COOERSIOO LOIP
RPTB

POPF

00
00

REPEAT LOIP N TIllES
REPlACE FRACTION WITH 0
SHIFT SIGN AND EXI'(JIENT INSEATING 0
IF ILL ZERO, LOAD 'C30 0.0
TEST ORIGINAl. IUIBER
IF)= 0, STORE IUIBER (DELAYED)
; RElI7YE EXPOOENT BIAS 1127)
; SAo,£ AS AN INTEGER
, UNSAVE AS A FLT. PT. NlMIIER

NEGF

00

; NEGATE 'C30 IUIBER

.GLOBL $PARIIS

; STORE 'C30 IUIBER, INCR. ARO

EXTERNAL YARIABLE ADORESSES

, RETlRl

.GLOliL $N
.GLOBL SIAD!

AND
ADDI
LDIZ
LDI
IIGED
SUBI
PUSH

LOOP4' STF

RETS

LOOP4
IARO, fAAI, 00
IARO,OO
fflIRIII),RO
IARO,RI
LOIP4
fflIR1(2),RO

RO,+ARO++

,
;
;
;
;
;

fHfffHHffffHHf.l-tHHHHfffHHfffHfffHHtfHffH

EXTERNAL IIEI10RY ADORESSES
; PARli'ETER MGE ADffiESS

; ARRAY LENGTH N
; ADffiESS OF INPUT X

EXTERNAl. PROGRAII NAllES
.GLOBL KTOlEEE
• GLOBL RTOIEEE

; KEKORY ENTRY FOR 'C30 -) IEEE CON\IERSlOO
; REGISTER ENTRY FOR 'C30 -) IEEE CONYERSIOO

START OF PROGRAK AREA
• TEXT

-

KEMQRY

I.»

- .I

KTOIEEE'

BASED PARAKETER ENTRY

~

\I)

LlI'

00

LDI
LDI

-

LOAD DATA PAGE POINTER
RC (= N
ARO -) '1:30 ARRAY

ISII,RC
IfIADI,ARO

+fHllfHHHfHH***fftfffHHHHHHHHHffHtHHHHHf

•

PRiJGRIiII' Il'ECII..lT

•

WRITTEN BY' GllRY A. SlTIlIII
GAS LIGHT SOFTWARE

RElliSTER BASED PARlIIETER ENTRY

I«JUSTIIII, TEXAS
FEBRUARY 1989.

RTOIEEfI
SIIII
LDP

LDI

RC<-N-I
LDAD DATA PAlE POINTER
ARI -) C!IISTANT TABLE

I,RC
teTAS
ITASA,ARI

•
•

PlllECIlLT ENTRY PROTOC!lL'
VARIABLES FOR IN'UTI
SIADI -) X[o], fN

'1:30 -) IEEE crtIIIERSllIiI LlXI'

RPTB

l.lD'5

ABSF
LDFZ

LDF

tARO,RO
ttARl(4),RO
I,RO
RO
tARO,RI

BGED

l.lD'5

POP

ADDI
LSH

RO
ttARl(2),RO
-I,RO

REPEAT LOOP NTIllES
TEST INJIIIER I
IF = 0, LOAD FAKE 0.0
SHIFT OFF SIIlN BIT
SAVE AS A FLT. PT.
TEST ORIGINAl. tuIBER
IF >= 0, STORE i'UIBER IIElAYED)
; lNSA'IE AS AN INTEGER
; ADD EXI'IJEIIT BIAS (27)
, ADJUST FOR SIGN BIT

OR

ttARl(3),RO

; NEGATE IEEE Nli1IIER

LSH
PUSIf

::...

~
~

~.

l.lD'5' STI
RETS

RO,_

,
,
;
;
;
;
;

SCALAR - VECTOR MULTIPLY: X[I] <= X[I]*C, C IS A
CONSTANT AND THE ARRAY X IS OF LENGTH N >= 1.

•

•
•

R'IECIlLT ENTRY PROTOC!lL'
REGISTERS FOR IN'UT'
AND -) X[o], RO =c, RC = N (LENGTH).
IN'UT RESTRICTIIIIS' RC ) O.
REGISTERS ALTEREll' RC, ARO AND RI.
REGISTERS USED AND RESTORED' SP.
REGISTERS FOR OUTPUT' NIIiIE.
ROUTINES NEEDED' NONE.

; STORE IEEE tuIBER, ItG. AND

; RETlIIII-

EXTERNAL IEItORY ADftSSES

•GLOBL fPARIIS

.GLOBL fN

§'

.GLOBL SIADI

~.

'a...>
So

"'

~
~

N

C

Q

c

; PAR/ItETER PAlE ADDRESS

EXTERNAL VARIABLE ADDRESSES

~
~

=N (LENGTH),

fCNST =c, fPARIIS =DATA PAlE.
IN'UT RESTRICTIIIIS' $II ) O.
REGISTERS ALTEREll' RC, 11". ARO AND RO-I.

ARRAY l.EIIGTHN
ADDRESS OF CllilSTANT C
ADIlRESS OF IN'UT X

•Gl.OBL fCNST

EXTERNAL PROGRIiII NAIIES
•GLOBL PIIIECIlLT
.GLOBL R'IECIlLT
START OF _

,1EItORY ENTRY FOR SCALAR - IETOR IlLTIPLY
; RElliSTER ENTRY FOR SCALAR - IETOR IlLTIPLY

AREA

.TEXT
IEItORY BASED PARlIIETER ENTRY

II'IECIU.TI
LlI'

LDI

ISII,RC

; LDAD DATA PAlE POINTER

, RC

<= N

~

LDI
lIF

~

~

§'

REGISTER BASED PAIW£TER ENTRY

Cll'I

lilT

~.

'"

~

~
N

*

IIIITTEN BY: G/Iirf A. SITTrIl
GAS
SOF11IARE
_LIGHT
, TEXAS
FEBRliIRY 1989.

SUBI
IPYf

~

So

; ARO -) xto]
;RO<=C

1MtIU.T:

.s;,
~
'C...>

IfIADI,ARO
ItCNST,RO

2,RC
RO, 

•TEXT

~

ac

; I'EIIl!Y ENTRY FIll I'£CTm 111 I/EtTOI lINE
; REGISTER ENTRY FIll I/EtTOI 111 I/EtTOI lINE

STIIIT OF_AREA

s.
c

_Y LENGTH N
ADDRESS OF INPUT I
ADDRESS OF INPUT Y

.GLOBL $N

.s;..

~~

, PARAIETER PA(E ADDRESS

EXTERNAl WlRIABLE ADDRESSES

I'EIIl!Y BASED PARAIETER ENTRY
11I'£CIIOY'
LIP

1$1'_

WI
LDI

UN,RC
I$IADI,ARO

lDAD DATA PAGE POINTER
RC <. N
ARO -) X[O]

~

LDI

~
::::

1$1AD2,ARI

; ARI -}

no]


•

REGISTER BASED PARAI'ETER OORI

PROCoR!lll; SfFT2. ASI1

SUBI
LlF
Clt'I
lILT

SfFT2.ASI1 CONSISTS

~

g'

**H******HfftHfHf***"*"HHfftHfHfftHHfHHHHHI.IIIIIIII.III •• 111

RADIX 2 FFT ROUTINES

R'IEOOI;

I'ECT~

RPTS
LlF
STF
SKIP2' STF
RETS
.00

2,RC
tAROt+,RO
O,RC
SKIP2

RC<=N-2
RO (= llO]
CfJIf'ARE RC TO 0
IF RC < 0 TI£N SKIP LW'

11M: LOOP
RC
tAROt+,RO
RO, tARl++

; RfPEAT INST, N-I T1tES
; RO <= X[/+l]
; 11M: xm TO 1m

RO, tARl

; 11M: UN-ll TO YlN-I]
; RETlIlN

[f

TIE FOI..Lc.lING ROOTINES;

CFFFT2 - COMPLEX DIF FORWARD RADIX 2 FFT USING SEPARATE REAL AND
IMAGINARY ARRAYS AND 3/4 CYCLE SINE TABLE.
CIFFT2 - COMPLEX DIT INVERSE RADIX 2 FFT USING SEPARATE REAL AND
IMAGINARY ARRAYS AND 3/4 CYCLE SINE TABLE (DOES NOT INCLUDE
THE 1/N SCALE FACTOR).
.IHIH***** ... lnn****unfffHU*HnHIHHHfft.Hff+ftH+fffftHHHftftf

w

~

....... H-Hfl**t+HI+H ... ll'HfHH-IffH*lfH*H4fH*H*ff*IHHit

LD!
LSH

CFFFT2

I

_:

I

WRITTEN BY: GARY A. SITTON
GAS LIGHT SOfTWARE
HOUSTON, TEXAS
IWiCH 1989.

I

SPECIAL VERSION USES 3/4 SINE TABLE LOOKUP WITH
THE PARAMETERS PASSED IN PREDEFINED MEMORY LOCATIONS.
COMPLEX RADIX-2 DIF FORWARD FFT FOR THE TMS320C30.
THIS _
ASSlIIES _ _ ORDERED DATA AS INPUT,
BUT LEAVES THE OUTPUT INDEXED IN BIT REVERSED ORDER.
TWO POINTERS ARE USED FOR SEPARATE REAL AND IMAGINARY
ARRAYS.

LDI
LDr

I

I

•
I
I

I

•

•
t
t

VARIABLES FOR INPUT:
$IAD1 -> REAL[0], $IAD2 -> IMAG[0],
$N = N (LENGTH), $M
(LOG2(N)),
$SINE -> SINE TABLE, $PARMS = DATA PAGE.
INPUT RESTRICTIONS: $N > 1.
REGISTERS ALTERED: RC, DP, IR0-1, AR0-7, AND R0-7.
REGISTERS USED AND RESTORED: SP.
REGISTERS FOR OUTPUT: NONE.

*

ROUTINES NEEDED: NONE.

LSH
LD!

OUTER LOOP
fLOOP:

I

="

I

~

•G.GBL CFFFT2

EXTERNAl MEMORY ADIlRESSES

~

.G.0iII. $Slit:
•G.Oa. $PARItS

~
;:s

FIRST INNER LOOP l
N

LDr
LSH
LDI
LDI
LDI
LSH
LDI

~

IRI (= N
IRI (= N14, OFFSET FOR COSINE
ARb (= K (INlT, M)
R7 (= N2 IINlT, 11
R5 (= N
R5 (= IE I INlT, NI2)
IRO (= NI (INlT, 2)

IRO,IRI
-2,IRI
I1$l1,AIlb
I,R7
IRO,R5
-1,R5
2,IRO

MPVF
SUBf
MPYF

LDI
ADDI
LDI
ADDI
LDI
SUBI

@$IADI,AIlO
R7,ARQ,ARI
@$IAD2,AR2
R7,AR2,AR3
R5,Re
1,Re

;
;
;
;
;
,

AIlO -) XIO)
AIlI -) XILI
AR2 -) VIO)
AR3 -) VILl
SETUP 1ST INNER LOOP REPEAT roJNTER,
RC lONE LESS 1HAIITHE DESIRED #)

IBLK2:

"

FIRST INNER LOOP IUNITV TWIDDLE FACTOR)

RPlB

;:..

g
~

9-.
<:>

ADDF
SUBF
ADDF
SUBF
STF
::
STF
IBLKI: STF
::
STF
CllPI
SEQD

IBLKI
>ARO, *ARl,RQ
*ARl, +!IRQ, Rl
1AR2, *AR3, R2
*AR3, *AR2,R3
RO,*ARO++IIRQ)
Rl,fAR1++(IRO)
R2, 1AR2++1 IRO)
R3, .M3++(JRO)

;
;
,
,
,
;
,
;
,

REPEAT BLOCK IE TIMES
RQ (= XI!) + XlLI
Rl (= XI!) - XILI
R2 (= VII) + VILl AND",
R3 (= VI!) - VILI
XII) (= RQ, INCR, AIlO AND",
XILI (= Rl, INCR, AIll
VII) (= R2, INCH. AR2 AND".
VILI (= R3, INCR. AIl3

"",AR6
SKIP

,COI1PAREMTDK
; IF K = MTHEN SKIP TWIDDLED LOOP


ADDF
SUBf
STF
ADDF
STF
ADDF
STF
STF
CMPI

R7,AIl7

; COMPARE N2 10 J

BLTD
LDI
LDI
ADD!

IINLOP
AR7,ARO
AIl7,AR2
I,AR)

; IF J ( N2 THEN LOOP lDELAYED)
; ARQ (= J
, AR2 (= J

SUBI
CMPI

I,ARb
O,ARb
ILOOP
-1,R5
IRO,R)
I,IRQ

, K (= K - 1
,COI1PAREOTOK
, IF K ) 0 THEN LOOP lDELAYED)
, IE (= IEI2
, N2 (= Nl
: Nl (= 2+Nl

8610
LSH
LDI
LSH

lOI
LDI
LDI
LDI

InLIJ': ADIII
LDF
ADDI
ADDI
ADDI
ADDI
ADDI
lOi
SUBI

2,AR7
1,ARQ
1,AR2
!SSII'E,AR5

, J (= 2, IPRE-iNCREllENTED)
; ARQ (= I IINlT. 11
, AR2 (= I (INlT. 11
; AR5 (= IA IINlT. 0)

R5,AR5
iIIR5,R6
AR5, IRl, AIl4
!SIADl,ARQ
ISIAD2,AR2
R7,ARQ,AIll
R7,AR2,AR3
R5,Re
1,Re

;
;
,
,
,
;
,
,
,

AR5 -) SINTABIIA (= IA + IE]
R6 (= SINIXl, IX = 12

HfHfHIHfHHHfHHHHfffHfHfHHHfHfffHflHflffftfHfHHfHffftfff

~

~
N
C

Q
c

PROGRIIII: 1S00UTN

•

WRITTEN BY: GARY A. SITTON
GAS LIGHT S!FTWARE
HOUSTON, TEXAS
/lAY 1989,

*

SOLVES A SYSTEM OF LINEAR EQUATIONS A*X = Y IN THE
TABLEAU FORMAT B = A|-Y, AN M X N MATRIX. THIS
MEANS THAT A IS AN M X M SQUARE MATRIX OF
COEFFICIENTS, AND -Y IS AN M X N-M RECTANGULAR MATRIX
OF N-M VECTORS EACH HAVING M ELEMENTS. EACH
DEPENDENT VARIABLE COLUMN VECTOR IS NEGATED AND APPENDED
TO THE COEFFICIENT MATRIX A. THE SET OF N-M
INDEPENDENT SOLUTION VECTORS X WILL APPEAR IN PLACE OF
THE ORIGINAL APPENDED COLUMNS WHEN SOLUTN FINISHES.
ROW MAJOR MATRIX STORAGE FORMAT IS ASSUMED PLUS
THE PROGRAM ASSUMES N > M > 1 AND B[0, 0] != 0.0
SINCE THE METHOD USES DIAGONAL PIVOTING AND STARTS
WITH B[0, 0]. ANY PIVOT ELEMENT < 10**[illegible] IN ITS
ABSOLUTE VALUE WILL IMPLY AN "ILL CONDITIONED"
SYSTEM OF EQUATIONS, I.E. NOT HAVING SUFFICIENT
LINEAR INDEPENDENCE, AND WILL RESULT IN AN
INCOMPLETE SOLUTION. AN INCOMPLETE SOLUTION WILL BE
INDICATED BY THE VALUE OF R3 = 0.0 ON EXIT, ELSE
R3 != 0.0 AND EQUALS THE LAST PIVOT ELEMENT VALUE.

SLIIIALG.ASI1 CONSISTS OF TI£ F1LLIIIIMl ROOTII£S:

~.

if

•

PII'OTIMl WITH EXTENIED-PRECISION FLOATlMl-POINT HATH.

INORIIAL PRECISION VERSION)

•
•
o
•

•
•
•
•
•

*
•
•
•
•
•
•
•
•

MSOLUTN ENTRY PROTOCOL:
VARIABLES FOR INPUT:
$IAD1 -> B[0, 0], $NROW = M,
$NCOL = N, $PARMS = DATA PAGE.
INPUT RESTRICTIONS: N > M > 1.
REGISTERS ALTERED: RC, DP, AR0-7, IR0-1,
AND R0-7.

*

RSOLUTN ENTRY PROTOCOL:
REGISTERS FOR INPUT:
AR0 -> B[0, 0], AR1 = M, AR2 = N.
INPUT RESTRICTIONS: AR2 > AR1 > 1.
REGISTERS ALTERED: RC, AR0-7, IR0-1, AND R0-7.

*

REGISTERS USED AND RESTORED: SP.
REGISTERS FOR OUTPUT: R3.
ROUTINES NEEDED: FPINV (SEE $MATH).

,
•
•

NOTE: COMMENTED OUT RND INSTRUCTIONS MAY BE ACTIVATED
FOR ADDITIONAL ACCURACY WITH LOSS OF SPEED.

•

'"************"******Hff**********Uf*,"**itH*********..
f.»

~

EXTERNAL PRDGRRK NAllES


•Gl.OBl IISOWTN

0'\

.GlOB!. RS01.UTN
.Gl.OBl FPINV

t,,)

rEJ10RV BASED ENTRY
REGISTER BASED ENTRY
RECII'ROCI'l ROOTII£

CAll
RND

FPINV
RO

, RO (= -I/BIK, KI
, ROUND INVERSf

DIVIDE RIGHT PART OF PIVOT ROW BY -PIVOT EWENT
ElTERNIIL PARMETER NAllES

.1iI.IlIl._
.1iI.OB1. 'IADI
.GlOB!. SNROI
.1iI.OB1. SNCOl

, _TER SPACE ADDRESS
, POINTER TO MATRIX B, ADDRESS OF B[o, 01
, NlIIBER OF ROWS IN B, VALLE OF "
,NlII1IIEROFC!l.UI'RSIMB,VIILlEOFN

INTERNAL ctt6TANTS

ADDI
LOI

RPTB
If'YF
RHO
1iI.O!P: STF

ZERO

.FlDAT I.Of~
.SET
0.0
START

, AR7 -) BIK, KI
, RC <= N-i(-2

1il.0!P

,
,
;
,

RO,*++M7,R2
R2
R2.*AR7

, SIMlUI..ARlTY CRITERIQIj
, SIMlUI..ARlTY FLAG

5O.UTN_

IlO!P:

.TEXT

LOI
LOI

O,IRI
ARO,AR4

, IRI <= I (INIT. 0)
, AR4 -) BIO. 01

CllPI

lRO.IRI
SKIP

,COIf'AREITOK
, IF I = K THEN SKIP PIVOT ROW

BEQ

Clli'IPlETE PIVOTING OPERATION

IIEDY BASED PARMETER ENTRY
I!SU.U1lII

•g

LIP
LOI
LOI
LOI


fSIAIlI.ARO
_.ANI
tsNaI..AA2

AR4 .IRO, AR5
1AR5.RO
AR6.RC
I.Re
JUI1P

,
,
,
,
,

SUBI
ADDI
II'YF

I,RC
AR3.IRO,AA7
RO,t++M7.RI

, RC <= N-i(-3
, AA7 -) BIK. J1
, RI <= BIK. Ktll
So

"'

~

~

~

IIR3 -) B[K B[O,

~

LOI
LOI

g'

REGISTER BASED PARIIIETER ENTRY

R

.s;,

§'
R
~'

~
So

"

~~

c

ac

01

(= "

<= N

RSII.IITIII •

_
LOAD MTA PAGE POINTER

LOI
LOI
SIIII
LOI
SIIII

,.
aFf

kT

1110 (= K (lNIT, 01
AR3 -) B10, 01
ARI <. 1'1-1
AR6 <= N
AR6 <- N-2

0,1110
1IRO,AR3
I,ARI
AR2,M6
2,M6

ItAIN LID'

KLID'X' lIF

IRO,IRI
SKIPX

,CCII'AR£ITOK
, IF I = K TI£N SKIP PIVOT ROW

ADDI
LDF
lOl
ClPI

AR4,IRO,AR5
tARS,RO
AR6,Re
I,Re

BLTD

.A.tf'X

AR5 -) BlI, K1
RO (= B[I, K1
RC (= N-i(-2
CCII'AR£ RC TO I
IF RC ( I TI£N Nl RPTB UIEIAYEDI

SUBI
ADDI
MPYF

I,Re
AR3,IRO,AR7
RO,0++AR7,RI

RC (= N-i(-3
AR7 -) B1K, J1
RI (= B[K, K+l1tBlI, K1

START INNER-INNER LOOP (J IIIElI

SETII' LID' REGISTERS

LIP

, IRI (= I (INIT, 01
, AR4 -) B[o, 01

CCII'LETE PIVOTlt«l OPERATION

IEIIJ!Y 'BASED PARlllETER ENTRY

IISWJTNX.
LIP

O,IRI
ARO,AR4

(K

RHO

JLDOPX: STF

R3 (= B1K, K1, tEXT PIVOT
110 <- 1R31
CCII'AR£ IB[K, Kli TO EPS
IF IBl:K, Kli 


During write operations, as shown in Figure 5, the RAMs' outputs do not turn on at all, because
chip-select-controlled write cycles are used. These cycles arise because R/W goes active (low) before the STRB term of the chip select input. Because
the RAMs' output drivers are disabled whenever the WE input is low (regardless of the state of the
OE input), bus conflicts with the TMS320C30 are automatically avoided with this interface. The circuit's data setup and hold times (t1 and t2 in the timing diagram) of approximately 50 and 20 ns,
respectively, also easily meet the RAMs' timing requirements of 10 and 0 ns.
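
The write-cycle argument above can be checked with a small behavioral model. The Python sketch below is an illustration only: it assumes CS is driven directly by STRB, WE by R/W, and OE held low as a worst case, and the order in which STRB and R/W are released at the end of the cycle is also an assumption. The function and variable names are hypothetical.

```python
def ram_outputs_on(cs_n, oe_n, we_n):
    """True when the RAM drives the bus (all inputs are active-low levels)."""
    return (not cs_n) and (not oe_n) and we_n

# Write cycle as described: R/W (driving WE) falls before STRB (driving CS).
# Each step is (strb_n, rw); OE is assumed held low as a worst case.
WRITE_CYCLE = [(True, True), (True, False), (False, False), (True, False), (True, True)]

bus_conflict = any(ram_outputs_on(cs_n=strb_n, oe_n=False, we_n=rw)
                   for strb_n, rw in WRITE_CYCLE)

# Slack between the approximate circuit timings and the RAM requirements (ns).
setup_slack = 50 - 10
hold_slack = 20 - 0
```

Because WE is already low whenever CS is asserted, the model never enables the output drivers during the write, and both timing figures leave positive slack.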

Figure 5. Write Operations Timing
[Timing diagram: H1, A23-0, STRB, R/W, and D31-0 waveforms for a write cycle, with the data setup time t1 and hold time t2 marked.]

If more complex chip select decode is required than can be accomplished in time to meet
zero-wait state timing, wait states or bank switching techniques (discussed in a later section) should
be used.

TMS320C30 Hardware Applications

Note that the CY7C186's OE control is gated internally with CS; therefore, the
RAMs' outputs are not enabled unless the device is selected. This is critical if any other
devices are connected to the same bus; if no other devices are connected to the bus, OE need
not be gated internally with chip select.

RAMs without OE controls can also be easily interfaced to the TMS320C30 using an
approach similar to that used for RAMs with OE controls. If only one bank of memory is implemented, and no other devices are present on the bus, the memories' CS input may often be connected
to STRB directly. If several devices must be selected, however, a gate is generally required to AND
the device select and STRB to drive the CS input and generate the chip-select-controlled write cycles.
In either case, the WE input is driven by the TMS320C30 R/W signal. Provided sufficiently fast
gating is used, 25-ns RAMs may still be used.
As with the case of RAMs with OE control lines, this approach works well if only a few banks
of memory are implemented where the chip select decode can be accomplished with only one level
of gating. If many banks are required to implement very large memory spaces, bank switching can
be used to provide for multiple bank select generation while still maintaining full speed accesses
within each bank. Bank switching is discussed in detail in a later section.

Ready Generation
The use of wait states can greatly increase system flexibility and reduce hardware requirements compared with systems without wait state capability. The TMS320C30 can generate wait states on either the primary bus or the expansion bus, and both buses have independent sets of ready control logic. Ready generation is discussed in this subsection from the perspective of the primary bus interface; however, wait state operation on the expansion bus is similar to that of the primary bus, so these discussions pertain equally well to expansion bus operation. Thus, ready generation is not included in the specific discussions of the expansion bus interface.
Wait states are generated on the basis of the internal wait state generator, the external ready input (RDY), or the logical AND or OR of the two. When enabled, internally generated wait states affect all external cycles, regardless of the address accessed. If different numbers of wait states are required for various external devices, the external RDY input may be used to tailor wait state generation to specific system requirements.
If the logical OR (or electrical AND since the signals are true low) of the external and wait
count ready signals is selected, the earlier of either of the two signals will generate a ready condition
and allow the cycle to be completed. It is not required that both signals be present.
The OR of the two ready signals can be used to implement wait states for devices that require
a greater number of wait states than are implemented with external logic (up to seven). This feature
is useful, for example, if a system contains some fast and some slow devices. In this case, fast devices can generate a ready signal externally with a minimum of logic, and slow devices can use the
internal wait counter for larger numbers of wait states. Thus, when fast devices are accessed, the
external hardware responds promptly with a ready signal that terminates the cycle. When slow devices are accessed, the external hardware does not respond, and the cycle is appropriately terminated after the internal wait count.
The OR of the two ready signals may also be used if conditions occur that require termination
of bus cycles prior to the number of wait states implemented with external logic. In this case, a
shorter wait count is specified internally than the number of wait states implemented with the external ready logic, and the bus cycle is terminated after the wait count. This feature may also be used
as a safeguard against inadvertent accesses to nonexistent memory that would never respond with
ready and therefore lock up the TMS320C30.
If the OR of the two ready signals is used, however, and the internal wait state count is less
than the number of wait states implemented externally, the external ready generation logic must
have the ability to reset its sequencing to allow a new cycle to begin immediately following the end
of the internal wait count. This requires that, under these conditions, consecutive cycles must be
from independently decoded areas of memory and that the external ready generation logic be capable of restarting its sequence as soon as a new cycle begins. Otherwise, the external ready generation logic may lose synchronization with bus cycles and therefore generate improperly timed wait
states.
If the logical AND (electrical OR) of the wait count and external ready signals is selected,
the later of the two signals will control the internal ready signal, and both signals must occur. Accordingly, external ready control must be implemented for each wait state device in addition to the
wait count ready signal being enabled.
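
The OR and AND combinations described above can be summarized with a small behavioral model. This is a minimal sketch rather than a description of the device's internal logic: the function name and the convention of expressing each ready source as "the number of wait states after which it signals ready" are assumptions made for illustration.

```python
from typing import Optional


def cycles_until_ready(mode: str, wait_count: int,
                       external_ready_at: Optional[int]) -> Optional[int]:
    """Return the number of wait states before a bus cycle completes.

    mode              -- "OR" or "AND" (logical combination of the two sources)
    wait_count        -- wait states programmed in the internal generator
    external_ready_at -- wait states after which the external logic signals
                         ready, or None if it never responds
    Returns None when the cycle would never terminate.
    """
    if mode == "OR":
        # The earlier of the two signals ends the cycle; only one is needed.
        if external_ready_at is None:
            return wait_count
        return min(wait_count, external_ready_at)
    if mode == "AND":
        # The later of the two signals ends the cycle; both must occur.
        if external_ready_at is None:
            return None
        return max(wait_count, external_ready_at)
    raise ValueError("mode must be 'OR' or 'AND'")
```

In this model, the OR mode with a silent external source still terminates after the internal count, which corresponds to the safeguard against accesses to nonexistent memory, while the AND mode with a silent external source never completes.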

This feature is useful if there are devices in a system that are equipped to provide a ready signal but cannot respond quickly enough to meet the TMS320C30's timing requirements. In particular, if these devices normally indicate a ready condition and, when accessed, respond with a wait until they become ready, the logical AND of the two ready signals can be used to save hardware in the system. In this case, the internal wait counter can be used to provide wait states initially, and become ready after the external device has had time to send a not-ready indication. The internal wait counter then remains ready until the external device also becomes ready, which terminates the cycle.
Additionally, the AND of the two ready signals may be used for extending the number of wait states for devices that already have external ready logic implemented but require additional wait states under certain unique circumstances.
In the implementation of external ready generation hardware, the particular technique employed depends heavily on the specific characteristics of the system. The optimum approach to ready generation varies depending on the relative number of wait state and non-wait state devices in the system and the maximum number of wait states required for any one device. The approaches discussed here are intended to be general enough for most applications, and are easily modified to accommodate many different system configurations.
In general, ready generation involves the following three functions:
1) Segmentation of the address space in some fashion to distinguish fast and slow devices.
2) Generating properly timed ready indications.
3) Logically ORing all of the separate ready timing signals together to connect to the physical ready input.

Segmentation of the address space is required so that a unique indication of each of the particular areas within the address space that require wait states can be obtained. This segmentation is commonly implemented in a system in the form of chip select generation. Chip select signals may be used to initiate wait states in many cases; occasionally, however, chip select decoding considerations may provide signals that will not allow ready input timing requirements to be met. In this case, coarse address space segmentation may be made on the basis of a small number of address lines, where simpler gating allows signals to be generated more quickly. In either case, the signal indicating that a particular area of memory is being addressed is normally used to initiate a ready or wait state indication.
Once the region of address space being accessed has been established, a timing circuit of
some sort is normally used to provide a ready indication to the processor at the appropriate point
in the cycle to satisfy each device's unique requirements.
Finally, since indications of ready status from multiple devices are typically present, the signals are logically ORed using a single gate to drive the RDY input.
One of two basic approaches may be taken in the implementation of ready control logic depending upon the state in which the ready input is to be between accesses. If RDY is low between
accesses, the processor is always ready unless a wait state is required; if RDY is high between accesses, the processor will always enter a wait state unless a ready indication is generated.

If RDY is low between accesses, control of full-speed devices is straightforward; no action is necessary since ready is always active unless otherwise required. Devices requiring wait states, however, must drive ready high fast enough to meet the input timing requirements. Then, after an appropriate delay, a ready indication must be generated. This can be quite difficult in many circumstances since wait state devices are inherently slow and often require complex select decoding.
If RDY is high between accesses, zero wait state devices, which tend to be inherently fast,
can usually respond immediately with a ready indication. Wait state devices may simply delay their
select signals appropriately to generate a ready. Typically, this approach results in the most efficient
implementation of ready control logic. Figure 6 shows a circuit of this type which can be used to
generate 0, 1, or 2 wait states for multiple devices in a system.
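
The "RDY high between accesses" approach can be sketched behaviorally: each device select is passed to the ready input after a delay equal to the number of wait states that device needs. The device names and the table below are hypothetical and serve only to illustrate the idea; they do not describe the actual gating of Figure 6.

```python
from typing import Dict, List

# Hypothetical device map: name -> wait states the device needs.
DEVICE_WAIT_STATES: Dict[str, int] = {
    "fast_sram": 0,
    "bank_memory": 1,
    "slow_peripheral": 2,
}


def ready_samples(device: str, cycles: int = 4) -> List[bool]:
    """Ready indication (True = ready) on successive H1 cycles of one access.

    Ready is inactive between accesses. The select of the addressed device
    is delayed by that device's wait-state count before it reaches the
    single ready input, where all delayed selects are ORed together.
    """
    delay = DEVICE_WAIT_STATES[device]
    return [cycle >= delay for cycle in range(cycles)]


def wait_states_seen(device: str) -> int:
    """Count the cycles sampled not-ready before the first ready."""
    return ready_samples(device).index(True)
```

A zero-wait-state device is ready on the first sample, while the two slower classes are held off for one and two cycles, respectively.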

Figure 6. Circuit For Generation of 0, 1, or 2 Wait States for Multiple Devices

[Schematic: device selects derived from the TMS320C30 address bus and STRB, grouped into zero-, one-, and two-wait-state device selects and combined with 74AS32 gates; a 4.7-kΩ pull-up to +5 V is also shown.]

[Schematic: bank memory RAMs connected to the buffered signals BA12-BA0, BSTRB, and BR/W, with each RAM selected by one of the BANKSEL signals.]

A wait state is required with this implementation of bank memory because of the added propagation delay presented by the address bus buffers used in the circuit. The wait state is not a function of the fact that the memory is organized as multiple banks or of the use of bank switching. When bank switching is used, memory access speeds are the same as without bank switching once bank boundaries are crossed. Therefore, no speed penalty is paid when using bank switching except for the occasional extra cycle inserted when bank boundaries are crossed. Note, however, that if the extra cycle inserted when crossing bank boundaries does impact software performance significantly, code can often be restructured to minimize bank boundary crossings, thereby reducing the effect of these boundary crossings on software performance.
The wait state for this bank memory is generated using the wait state generator circuit presented in the previous section. Because A23 is the signal which enables the entire bank memory system, the inverted version of this signal is ANDed with STRB to derive a one wait state device select.
This signal is then connected in the circuit along with the other one wait state device selects. Thus,
any time a bank memory access is made, one wait state is generated.
Each of the four banks in this circuit is selected using a decode of A15-A13 generated by the 74AS138 (see Figure 8). With the BNKCMPR register set to 0Bh, the banks will be selected on even 8K-word boundaries starting at location 080A000h in external memory space.

Figure 8. Bank Memory Control Logic

[Schematic: two 74ALS2541 buffers produce BA0-BA12 and BR/W from the address lines and R/W; a 74AS138 decodes A15-A13 into BANKSEL0-BANKSEL3; a 74AS04 is also shown, and STRB is buffered as BSTRB.]

The 74ALS2541 buffers used on the address lines are necessary in this design because the total capacitive load presented to each address line is a maximum of 20 x 5 pF, or 100 pF (bank memory plus zero-wait-state static RAM), which exceeds the TMS320C30 rated capacitive loading of 80 pF. The manufacturer's derating curves for these devices at a load of 80 pF (the load presented by the bank memory) predict propagation delays at the output of the buffers of a maximum of 16 ns. The access time of a read cycle within a bank of the memory is therefore the sum of the memory access time and the maximum buffer propagation delay, or 25 + 16 = 41 ns, which, since it falls between 30 and 90 ns, requires one wait state on the TMS320C30.
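
The loading and access-time arithmetic above can be checked with a short calculation. This is a sketch under stated assumptions: the 30-ns zero-wait-state limit and the 80-pF rating are taken from the text, while the 60-ns step per wait state is inferred from the 30-to-90-ns window quoted for one wait state. The function names are illustrative only.

```python
import math

RATED_LOAD_PF = 80.0        # TMS320C30 rated capacitive load (from the text)
ZERO_WAIT_ACCESS_NS = 30.0  # longest access served with zero wait states
WAIT_STATE_STEP_NS = 60.0   # assumed: each wait state adds one 60-ns H1 cycle


def address_line_load_pf(inputs: int, pf_per_input: float) -> float:
    """Total capacitive load on one address line."""
    return inputs * pf_per_input


def primary_bus_wait_states(access_ns: float) -> int:
    """Wait states needed on the primary bus for a given total access time."""
    if access_ns <= ZERO_WAIT_ACCESS_NS:
        return 0
    return math.ceil((access_ns - ZERO_WAIT_ACCESS_NS) / WAIT_STATE_STEP_NS)


load = address_line_load_pf(20, 5.0)       # 100 pF, above the 80-pF rating
buffered_access_ns = 25 + 16               # RAM access plus buffer delay
print(load > RATED_LOAD_PF, primary_bus_wait_states(buffered_access_ns))
```

With the values from the text, the unbuffered load exceeds the rating and the buffered 41-ns access lands in the one-wait-state window.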
The 74ALS2541 buffers offer one additional system performance enhancement in that they
include 25-ohm resistors in series with each individual buffer output. These resistors greatly improve the transient response characteristics of the buffers especially when driving CMOS loads
such as the memories used here. The effect of these resistors is to reduce overshoot and ringing
which is common when driving predominantly capacitive loads such as CMOS. The result of this
is reduced noise and increased immunity to latchup in the circuit, which in turn results in a more
reliable memory system. Having these resistors included in the buffers eliminates the need to put
discrete resistors in the system which is often required in high speed memory systems.
This circuit could not have been implemented without bank switching, since the data outputs' turn-on and turn-off delays would have caused bus conflicts. Here, the propagation delay of the 74AS138 is only involved during bank switches, where there is sufficient time between cycles to allow new chip selects to be decoded.
The timing of this circuit for read operations using bank switching is shown in Figure 9. With the BNKCMPR register set to 0Bh, when a bank switch occurs, the bank address on address lines A23-A13 is updated during the extra H1 cycle while STRB is high. Then, after chip select decodes have stabilized and the previously selected bank has disabled its outputs, STRB goes low for the next read cycle. Further accesses occur at normal bus timings with one wait state as long as another bank switch is not necessary. Write cycles do not require bank switching due to the inherent address setup provided in their timings.
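
The bank-compare behavior described above can be sketched as a comparison of the upper address bits of consecutive reads. This is an illustrative model only: it assumes that the BNKCMPR value gives the number of most significant address bits compared (so 0Bh compares A23-A13), which is consistent with the 8K-word banks described in the text, and the function names are hypothetical.

```python
ADDRESS_BITS = 24  # primary bus address lines A23-A0


def bank_of(address: int, bnkcmpr: int) -> int:
    """Bank identifier: the upper `bnkcmpr` bits of a 24-bit address."""
    return (address & 0xFFFFFF) >> (ADDRESS_BITS - bnkcmpr)


def read_needs_bank_switch(prev_read: int, next_read: int,
                           bnkcmpr: int = 0x0B) -> bool:
    """True when consecutive reads fall in different banks, which is when
    the extra H1 cycle is inserted."""
    return bank_of(prev_read, bnkcmpr) != bank_of(next_read, bnkcmpr)


def a15_a13(address: int) -> int:
    """Raw value of address lines A15-A13, the field the 74AS138 decodes."""
    return (address >> 13) & 0x7
```

For example, reads at 080A000h and 080BFFFh stay within one 8K-word bank, while a read at 080C000h following one at 080BFFFh crosses a bank boundary.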

Figure 9. Timing For Read Operations Using Bank Switching

[Timing diagram: H1, A23-13, A12-0, STRB, and D31-0 during a bank switch, showing bank 0 and then bank 1 on the bus, with intervals t1 through t5 marked.]

The timing for this interface is summarized in Table 1.

Table 1. Bank Switching Interface Timing

Time Interval   Event                               Time Period
t1              H1 falling to address/STRB valid    14 ns
t2              Address to select delay             10 ns
t3              Memory disable from STRB            10 ns
t4              H1 falling to STRB                  10 ns
t5              Memory output enable delay          3 ns

Expansion Bus Interface
The TMS320C30's expansion bus interface provides a second complete parallel bus which can be used to implement data transfers concurrently with and independent of operations on the primary bus. The expansion bus comprises two mutually exclusive interfaces controlled by the MSTRB and IOSTRB signals, respectively. This section discusses interfacing to the expansion bus using IOSTRB cycles; MSTRB cycles are essentially equivalent in timing to primary bus cycles, and are discussed in the previous section.
Unlike the primary bus, both read and write cycles on the I/O portion of the expansion bus are two H1 cycles in duration and exhibit the same timing. The XR/W signal is high for reads and low for writes. Since I/O accesses take two cycles, many peripherals that require wait states if interfaced either to the primary bus or using MSTRB may be used in a system without the need for wait states. Specifically, in cases where there is only one device on the expansion bus, devices with access times greater than the 30 ns required by the primary bus, but not more than 59 ns, can be interfaced to the I/O bus without wait states.
A/D Converter Interface
A/D and D/A converters are components that are commonly required in DSP systems and interface efficiently to the I/O expansion bus. These devices are available in many speed ranges and with a variety of features, and while some may be used at full speed on the I/O bus, others may require one or more wait states.
Figure 10 shows an interface to an Analog Devices AD1678 analog-to-digital converter. The AD1678 is a 12-bit, 5-µs converter allowing sample rates up to 200 kHz and with an input voltage range of 10 volts bipolar or unipolar. The converter is connected according to the manufacturer's specifications to provide 0 to +10 volt operation. This interface illustrates a common approach to connecting devices such as this to the TMS320C30. Note that the interface requires only a minimum amount of control logic.

Figure 10. Interface to AD1678 A/D Converter

[Schematic: IOSTRB and XR/W are gated with 74AS32 and 74AS04 devices to produce IOR and IOW, which drive the AD1678 OE and SC inputs; XA12 drives CS; the converter data outputs reach XD0-XD11 through two 74LS244 buffers; EOC drives INT0; SYNC and 12/8 are tied high and EOCEN is grounded.]

The AD1678 is a very flexible converter and is configurable in a number of different operating modes. These operating modes include byte or word data format, continuous or non-continuous
conversions, enabled or disabled chip select function, and programmable end of conversion indication. This interface utilizes 12-bit word data format, rather than byte format to be compatible with
the TMS320C30. Non-continuous conversions are selected, so that variable sample rates may be
used, since continuous conversions occur only at a rate of 200 kHz. With non-continuous conversions, the host processor determines the conversion rate by initiating conversions through write operations to the converter.
The chip select function is enabled, so the chip select input is required to be active when accessing the device. Enabling the chip select function is necessary to allow a mechanism for the
AD1678 to be isolated from other peripheral devices connected to the expansion bus. To establish
the desired operating modes, the SYNC and 12/8 inputs to the converter are pulled high and EOCEN is grounded, as specified in the AD1678 data sheet.
In this application, the converter's chip select is driven by XA12, which maps this device at 804000h in I/O address space. Conversions are initiated by writing any data value to the device, and the conversion results are obtained by reading from the device after the conversion is completed. To generate the device's Start Conversion (SC) and Output Enable (OE) inputs, IOSTRB is ANDed with XR/W. Therefore, the converter is selected whenever XA12 is low, and OE is driven when reads are performed, while SC is driven when writes are performed.
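
The control decode for the converter can be expressed as a small truth function. This is a behavioral sketch, not the gate-level circuit: it assumes that IOSTRB is active low, that XR/W is high for reads, and it returns "asserted" flags rather than pin levels, so the polarity of the OE and SC pins is deliberately left out. The function name is hypothetical.

```python
from typing import Dict


def adc_controls(xa12: bool, iostrb: bool, xrw: bool) -> Dict[str, bool]:
    """Decode one I/O bus state into converter actions.

    Arguments are logic levels (True = high). IOSTRB is taken to be
    active low (assumption); XR/W is high for reads and low for writes.
    """
    strobe_active = not iostrb
    selected = not xa12          # converter is selected when XA12 is low
    return {
        "output_enable": strobe_active and xrw and selected,         # read
        "start_conversion": strobe_active and not xrw and selected,  # write
    }
```

A write with XA12 low starts a conversion, a read with XA12 low enables the outputs, and accesses with XA12 high leave the converter untouched.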
As with many A/D converters, at the end of a read cycle the AD1678 data output lines enter a high-impedance state. This occurs after the Output Enable (OE) or read control line goes inactive. Also common with these types of devices is that the data output buffers often require a substantial amount of time to actually attain a full high-impedance state. When used with the TMS320C30, devices must have their outputs fully disabled no later than 65 ns following the rising edge of IOSTRB, since the TMS320C30 will begin driving the data bus at this point if the next cycle is a write. If this timing is not met, bus conflicts between the TMS320C30 and the AD1678 may occur, potentially causing degraded system performance and even failure due to damaged data bus drivers. The actual disable time for the AD1678 can be as long as 80 ns; therefore, buffers are required to isolate the converter outputs from the TMS320C30. The buffers used here are 74LS244s that are enabled when the AD1678 is read, and turned off 30.8 ns following IOSTRB going high. Therefore, the TMS320C30 requirement of 65 ns is met.
When data is read following a conversion, the AD1678 takes 100 ns after its OE control line
is asserted to provide valid data at its outputs. Thus, including the propagation delay of the 74LS244
buffers, the total access time for reading the converter is 118 ns. This requires two wait states on
the TMS320C30 expansion I/O bus.
The two wait states required in this case are implemented using software wait states; however, depending on the overall system configuration, it may be necessary to implement a separate wait state generator for the expansion bus (refer to the section on ready generation). This would be the case if multiple devices that required different numbers of wait states were connected to the expansion bus.
Figure 11 shows the timing for read operations between the TMS320C30 and the AD1678. At the beginning of the cycle, the address and XR/W lines become valid t1 = 10 ns following the falling edge of H1. Then, after t2 = 10 ns from the next rising edge of H1, IOSTRB goes low, beginning the active portion of the read cycle. After t3 = 5.8 ns, the control logic propagation delay, the IOR signal goes low, asserting the OE input to the AD1678. The '74LS244 buffers take t4 = 30 ns to enable their outputs, and then, following the converter's access delay and the buffer propagation delay (t5 = 100 + 18 = 118 ns), data is provided to the TMS320C30. This provides approximately 46 ns of data setup before the rising edge of IOSTRB. Therefore, this design easily satisfies the TMS320C30's requirement of 15 ns of data setup time for reads.
Figure 11. Read Operations Timing Between the TMS320C30 and AD1678

[Timing diagram: H1, address and XR/W, IOSTRB, IOR, and data for a read of the AD1678, with intervals t1 through t5 marked.]

Unlike the primary bus, read and write cycles on the I/O expansion bus are timed the same
with the exception that XR/W is high for reads and low for writes and that the data bus is driven
by the TMS320C30 during writes. When writing to the AD1678, the '74LS244 buffers do not turn
on and no data is transferred. The purpose of writing to the converter is only to generate a pulse
on the converter's SC input, which initiates a conversion cycle. When a conversion cycle is completed, the AD1678's EOC output is used to generate an interrupt on the TMS320C30 to indicate
that the converted data may be read.
Note that for different applications, use of the TLC1225 or TLC1550 A/D converters from Texas Instruments may be beneficial. The TLC1225 is a self-calibrating 12-bit-plus-sign bipolar or unipolar converter which features 10-µs conversion times. The TLC1550 is a 10-bit, 6-µs converter with a high-speed DSP interface. Both converters are parallel-interface devices.

D/A Converter Interface
In many DSP systems, the requirement for generating an analog output signal is a natural consequence of sampling an analog waveform with an A/D converter and then processing the signal digitally. Interfacing D/A converters to the TMS320C30 on the expansion I/O bus is also quite straightforward.
As with A/D converters, D/A converters are also available in a number of varieties. One of the major distinctions between various types of D/A converters is whether or not the converter includes latches to store the digital value to be converted to an analog quantity, and the interface to control those latches. With latches and control logic included with the converter, interface design is often simplified; however, internal latches are often included only in slower D/A converters.
Because slower converters limit signal bandwidths, the converter chosen for this design was
selected to allow a reasonably wide range of signal frequencies to be processed, in addition to illustrating the technique of interfacing to a converter using external data latches.
Figure 12 shows an interface to an Analog Devices AD565A digital-to-analog converter. This device is a 12-bit, 250-ns current-output DAC with an on-board 10-volt reference. Using an off-board current-to-voltage conversion circuit connected according to the manufacturer's specifications,
the converter exhibits an output signal range of 0 to +10 volts, which is compatible with the conversion range of the A/D converter discussed in the previous section.

Figure 12. Interface Between the TMS320C30 and the AD565A
[Schematic: AD565A with +12 V and -12 V supplies, REF OUT, the 10 V SPAN and 20 V SPAN pins, and digital inputs Bit 1 (MSB) through Bit 12 (LSB) driven by the external latches; XA12 provides the device select, and the analog output is taken from the off-board current-to-voltage circuit.]

Because this DAC essentially performs continuous conversions based on the digital value
provided at its inputs, periodic sampling is maintained by periodically updating the value stored
in the external latches. Therefore, between sample updates, the digital value is stored and maintained at the latch outputs that provide the input to the DAC. This results in the analog output remaining stable until the next sample update is performed.
The external data latches used in this interface are '74LS377 devices that have both clock and enable inputs. These latches serve as a convenient interface with the TMS320C30; the enable inputs provide a device select function, and the clock inputs latch the data. Therefore, with the enable input driven by inverted XA12 and the clock input driven by IOW, which is the AND of IOSTRB and XR/W, data will be stored in the latches when a write is performed to I/O address 805000h. Reading this address has no effect on the circuit.
Figure 13 shows a timing diagram of a write operation to the D/A converter latches.

Figure 13. Write Operation to the D/A Converter Timing Diagram

[Timing diagram: H1, XA12-XA0, inverted XA12, IOSTRB, IOW, and XD31-XD0 for a write to the latches, with intervals t1 through t6 marked.]

Because the write is actually being performed to the latches, the key timings for this operation are the timing requirements for these devices. For proper operation, these latches require simply a minimal setup and hold time of data and control signals with respect to the rising edge of the clock input. Specifically, the latches require a data setup time of 20 ns, enable setup of 25 ns, disable setup of 10 ns, and data and enable hold times of 5 ns. This design provides approximately 60 ns of enable setup, 30 ns of data setup, and 7.2 ns of data hold time. Therefore, the setup and hold times provided by this design exceed those required by the latches. The key timing parameters for this interface are summarized in Table 2.

Table 2. Key Timing Parameters for D/A Converter Write Operation

Time Interval   Event                            Time Period
t1              H1 falling to address valid      10 ns
t2              XA12 to inverted XA12 delay      5 ns
t3              H1 rising to IOSTRB falling      10 ns
t4              IOSTRB to IOW delay              5.8 ns
t5              Data setup to IOW                30 ns
t6              Data hold from IOW               7.2 ns
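
The latch timing margins implied by the figures above can be tabulated with a short calculation. The required and provided values are those quoted in the text; the dictionary keys and the function name are illustrative assumptions.

```python
from typing import Dict

# Values quoted in the text for the '74LS377 latches and this design (ns).
REQUIRED_NS: Dict[str, float] = {
    "data_setup": 20.0, "enable_setup": 25.0, "data_hold": 5.0,
}
PROVIDED_NS: Dict[str, float] = {
    "data_setup": 30.0, "enable_setup": 60.0, "data_hold": 7.2,
}


def timing_margins(required: Dict[str, float],
                   provided: Dict[str, float]) -> Dict[str, float]:
    """Margin (provided minus required) for each parameter, in ns."""
    return {name: round(provided[name] - required[name], 1)
            for name in required}


margins = timing_margins(REQUIRED_NS, PROVIDED_NS)
print(margins, all(m > 0 for m in margins.values()))
```

All three margins are positive; the data hold margin is the smallest of the three, so it is the one to recheck if slower control gating is substituted.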

System Control Functions
There are several aspects of TMS320C30 system hardware design that are critical to overall system operation. These include such functions as clock and reset signal generation and interrupt control.
Clock Oscillator Circuitry
An input clock may be provided to the TMS320C30 either from an external clock input or using the on-board oscillator. Unless special clock requirements exist, using the on-board oscillator is generally a convenient method of clock generation. This method requires few external components and can provide stable, reliable clock generation for the device.

Figure 14 shows a clock generator circuit using the internal oscillator. This circuit is designed
to operate at 33.33 MHz and since crystals with fundamental oscillation frequencies of 30 MHz
and above are not readily available, a parallel-resonant third-overtone circuit is used.

Figure 14. Crystal Oscillator Circuit

[Schematic: 33.33-MHz crystal connected between the X1 and X2/CLKIN pins of the TMS320C30, with capacitors and the LC network.]

In a third-overtone oscillator, the crystal fundamental frequency must be attenuated so that
oscillation is at the third harmonic. This is achieved with an LC circuit that filters out the fundamental, thus allowing oscillation at the third harmonic. The impedance of the LC circuit must be inductive at the crystal fundamental and capacitive at the third harmonic. The impedance of the LC circuit is given by:
$$ z(\omega) = \frac{L/C}{j\left[\omega L - \dfrac{1}{\omega C}\right]} \qquad (1) $$
Therefore, the LC circuit has a pole at:
$$ \omega_p = \frac{1}{\sqrt{LC}} \qquad (2) $$
At frequencies significantly lower than $\omega_p$, the $1/(\omega C)$ term in (1) becomes the dominating term, while $\omega L$ can be neglected. This gives:
$$ z(\omega) = j\omega L \quad \text{for } \omega < \omega_p \qquad (3) $$
In (3), the LC circuit appears inductive at frequencies lower than $\omega_p$. On the other hand, at frequencies much higher than $\omega_p$, the $\omega L$ term is the dominant term in (1), and $1/(\omega C)$ can be neglected. This gives:
$$ z(\omega) = \frac{1}{j\omega C} \quad \text{for } \omega > \omega_p \qquad (4) $$
The LC circuit in (4) appears increasingly capacitive as frequency increases above $\omega_p$. This is shown in Figure 15, which is a plot of the magnitude of the impedance of the LC circuit of Figure 14 versus frequency.

Figure 15. Magnitude of the Impedance of the Oscillator LC Network
[Plot: |z(ω)| versus ω (rad/s), with the capacitive region marked.]

Based on the discussion above, the design of the LC circuit proceeds as follows:
1) Choose the pole frequency $\omega_p$ approximately halfway between the crystal fundamental and the third harmonic.
2) The circuit now appears inductive at the fundamental frequency and capacitive at the third harmonic.
In the oscillator of Figure 14, choose $\omega_p$ = 22.2 MHz, which is approximately halfway between the fundamental and the third harmonic. Choose C = 20 pF. Then, using (2), L = 2.6 µH.
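
The component selection above can be reproduced numerically. This sketch assumes that the 22.2-MHz pole is a frequency in hertz (so the angular frequency is 2π times that value) and that the crystal fundamental is one third of 33.33 MHz; the function names are illustrative.

```python
import math


def inductance_for_pole(f_pole_hz: float, c_farads: float) -> float:
    """Solve the pole condition w_p = 1/sqrt(L*C) for L."""
    w_p = 2.0 * math.pi * f_pole_hz
    return 1.0 / (w_p ** 2 * c_farads)


def lc_reactance(f_hz: float, l_henries: float, c_farads: float) -> float:
    """Imaginary part of z = (L/C) / (j[wL - 1/(wC)]).

    Positive values are inductive, negative values are capacitive.
    """
    w = 2.0 * math.pi * f_hz
    return -(l_henries / c_farads) / (w * l_henries - 1.0 / (w * c_farads))


C = 20e-12                                # 20 pF, from the text
L = inductance_for_pole(22.2e6, C)        # about 2.57 uH, rounded to 2.6 uH
f_third = 33.33e6
f_fund = f_third / 3.0
print(L, lc_reactance(f_fund, L, C) > 0, lc_reactance(f_third, L, C) < 0)
```

The computed inductance rounds to the 2.6 µH quoted above, and the network reactance is positive (inductive) at the fundamental and negative (capacitive) at the third harmonic, as the design procedure requires.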

Reset Signal Generation
The reset input controls initialization of internal TMS320C30 logic and also causes execution of the system initialization software. For proper system initialization, the reset signal must be applied for at least ten H1 cycles, i.e., 600 ns for a TMS320C30 operating at 33.33 MHz. Upon power-up, however, it can take 20 ms or more before the system oscillator reaches a stable operating state. Therefore, the power-up reset circuit should generate a low pulse on the reset line for 100 to 200 ms. Once a proper reset pulse has been applied, the processor fetches the reset vector from location zero, which contains the address of the system initialization routine. Figure 16 shows a circuit that will generate an appropriate power-up reset signal.

Figure 16. Reset Circuit
[Schematic: +5 V, R1 = 100 kΩ, C1, and DGND forming the RC network, with two 74LS14 gates and the TMS320C30 RESET input.]

The voltage on the reset pin (RESET) is controlled by the R1C1 network. After a reset, this voltage rises exponentially according to the time constant R1C1, as shown in Figure 17.

Figure 17. Voltage on the TMS320C30 Reset Pin.
[Plot: voltage versus time, rising exponentially toward VCC according to V = VCC (1 - e^(-t/τ)).]

The duration of the low pulse on the reset pin is approximately t1, which is the time it takes
for the capacitor C1 to be charged to 1.5 V. This is approximately the voltage at which the reset input
switches from a logic 0 to a logic 1. The capacitor voltage is given by:

$$V = V_{CC}\left[1 - e^{-t/\tau}\right] \qquad (5)$$

where τ = R1C1 is the reset circuit time constant. Solving (5) for t gives:

$$t = -R_1 C_1 \ln\left[1 - \frac{V}{V_{CC}}\right] \qquad (6)$$

Setting the following:

R1 = 100 kΩ
C1 = 4.7 µF
VCC = 5 V
V = V1 = 1.5 V

gives t = 167 ms. Therefore, the reset circuit of Figure 16 provides a low pulse of long enough
duration to ensure the stabilization of the system oscillator.
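As a quick check of (6), the following sketch evaluates the reset pulse duration for the component values listed above. The function name `reset_low_time` is illustrative and not part of the original report.

```python
import math

def reset_low_time(r_ohms, c_farads, vcc, v_threshold):
    """Time for the RC node to charge from 0 V to v_threshold, per equation (6)."""
    tau = r_ohms * c_farads                       # time constant R1 * C1
    return -tau * math.log(1.0 - v_threshold / vcc)

# R1 = 100 kOhm, C1 = 4.7 uF, VCC = 5 V, switching threshold V1 = 1.5 V
t1 = reset_low_time(100e3, 4.7e-6, 5.0, 1.5)
print(f"t1 = {t1 * 1e3:.1f} ms")
```

The result falls inside the 100 to 200 ms window recommended above and is well beyond the 20 ms oscillator start-up time.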
Note that if synchronization of multiple TMS320C30s is required, all processors should be
provided with the same input clock and the same reset signal. After powerup, when the clock has
stabilized, all processors may then be synchronized by generating a falling edge on the common
reset signal. Because it is the falling edge of reset that establishes synchronization, reset must
initially be high for a period of time (at least ten H1 cycles). Following the falling edge, reset should
remain low for at least ten H1 cycles and then be driven high. This sequencing of reset may be accomplished using additional circuitry, based on either RC time delays or counters.

Serial Port Interface to AIC
For applications such as modems, speech, control, instrumentation, and analog interface for
DSPs, a complete analog-to-digital (A/D) and digital-to-analog (D/A) input/output system on a
single chip may be desired. The TLC32044 analog interface circuit (AIC) integrates on a single
monolithic CMOS chip a bandpass, switched-capacitor, antialiasing input filter, 14-bit resolution
A/D and D/A converters, and a lowpass, switched-capacitor, output-reconstruction filter. The
TLC32044 offers numerous combinations of master clock input frequencies and conversion/sampling rates, which can be changed via digital processor control.
Four serial port modes on the TLC32044 allow direct interface to TMS320C30 processors.
When the transmit and receive sections of the AIC are operating synchronously, it can interface to
two SN54299 or SN74299 serial-to-parallel shift registers. These shift registers can then interface
in parallel to the TMS320C30, other TMS320 digital processors, or to external FIFO circuitry. Output data pulses are emitted to inform the processor that data transmission is complete or to allow
the DSP to differentiate between two transmitted bytes. A flexible control scheme is provided so
that the functions of the AIC can be selected and adjusted coincidentally with signal processing via
software control. Refer to the TLC32044 data sheet for detailed information.
When interfacing the AIC to the TMS320C30 via one of the serial ports, no additional logic
is required. This interface is shown in Figure 18. The serial data, control, and clock signals connect
directly between the two devices, and the AIC's master clock input is driven from TCLK0, one of
the TMS320C30's internal timer outputs. The AIC's WORD/BYTE input is pulled high, selecting
16-bit serial port transfers to optimize the serial port data transfer rate. The TMS320C30's XF0, configured as an output, is connected to the AIC's reset (RST) input to allow the AIC to be reset by the
TMS320C30 under program control. This allows the TMS320C30 timer and serial port to be initialized before beginning conversions on the AIC.

Figure 18. AIC to TMS320C30 Interface
[Schematic: the TMS320C30 serial port 0 signals (FSX0, DX0, FSR0, DR0, CLKX0, CLKR0) connect directly to the TLC32044 signals FSX, DX, FSR, DR, and SHIFT CLK. TCLK0 drives MSTR CLK, XF0 drives RST, and WORD/BYTE is tied to +5 V. The analog side shows IN+, IN-, OUT+, OUT-, the supply pins, AGND, and DGND.]

To provide the master clock input for the AIC, the TCLK0 timer is configured to generate
a clock signal with a 50% duty cycle at a frequency of H1/4 or 4.167 MHz. To accomplish this, the
timer 0 global control register is set to the value 3C1h, which establishes the desired operating
modes. The timer 0 period register is set to 1, which sets the required division ratio for the H1 clock.
To properly communicate with the AIC, the TMS320C30 serial port must be configured appropriately. To configure the serial port, several TMS320C30 registers and memory locations must
be initialized. First, the serial port should be reset by setting the serial port global control register
to 2170300h. (The AIC should also be reset at this time. See the description below of resetting the AIC
using XF0.) This resets the serial port logic, configures the serial port operating modes including data transfer lengths, and enables the serial port interrupts. This also configures another important aspect of serial port operation: polarity of serial port signals. Because the active polarity of all serial port signals is programmable, it is critical that the bits in the serial port global control register
that control this be set appropriately. In this application, all polarities are set to positive except FSX
and FSR, which are driven by the AIC and are true low.
The serial port transmit and receive control registers must also be initialized for proper serial
port operation. In this application, both of these registers are set to 111h, which configures all of
the serial port pins in the serial port mode, rather than the general-purpose digital I/O mode.
With the operations described above completed, interrupts are enabled, and provided the serial port interrupt vector(s) are properly loaded, serial port transfers may begin after the serial port
is taken out of reset. This is accomplished by loading E170300h into the global control register.
To begin conversion operations on the AIC and subsequent transfers of data on the serial port,
the AIC is first reset by setting XF0 to zero at the beginning of the TMS320C30 initialization routine. Setting XF0 to zero is accomplished by setting the TMS320C30 IOF register to 2. This sets
the AIC to a default configuration and halts serial port transfers and conversion operations until
reset is set high. Once the TMS320C30 serial port and timer have been initialized as described
above, XF0 is set high by setting the IOF register to 6. This allows the AIC to begin operating in
its default configuration, which in this application is the desired mode. In this mode, all internal filtering is enabled, the sample rate is set at approximately 6.4 kHz, and the transmit and receive sections
of the device are configured to operate synchronously. Conveniently, this mode of operation is appropriate for a variety of applications, and if a 5.184 MHz master clock input is used, the default
configuration results in an 8 kHz sample rate, which makes this device ideal for speech and telecommunications applications.
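The clock arithmetic in the preceding paragraphs can be sketched as follows. The default master-clock-to-sample-rate ratio is not stated in the text; here it is inferred from the quoted pair "5.184 MHz gives 8 kHz", and the 60 ns H1 period is taken from the reset discussion (ten H1 cycles = 600 ns). All names are illustrative.

```python
H1_PERIOD_S = 60e-9                  # ten H1 cycles = 600 ns at 33.33 MHz
H1_HZ = 1.0 / H1_PERIOD_S            # about 16.67 MHz
MCLK_HZ = H1_HZ / 4                  # TCLK0 output programmed to H1/4

# Assumption: ratio inferred from "5.184 MHz master clock -> 8 kHz sample rate".
DEFAULT_DIVIDE = 5.184e6 / 8.0e3

def aic_sample_rate(master_clock_hz):
    """Default-configuration sample rate for a given AIC master clock."""
    return master_clock_hz / DEFAULT_DIVIDE

print(f"master clock = {MCLK_HZ / 1e6:.3f} MHz")
print(f"sample rate  = {aic_sample_rate(MCLK_HZ) / 1e3:.2f} kHz")
```

With the timer output at H1/4, the inferred ratio reproduces the "approximately 6.4 kHz" figure given above.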
In addition to the benefit of a convenient default operating configuration, the AIC can also
be programmed for a wide variety of other operating configurations. Sample rates and filter characteristics may be varied, in addition to which, numerous connections in the device may be configured
to establish different internal architectures, by enabling or disabling various functional blocks.
To configure the AIC in a fashion different from the default state, the device must first be
sent a serial data word with the two LSBs set to one. The two LSBs of a transmitted data word are
not part of the transferred data information and are not set to one during normal operation. This condition indicates that the next serial transmission will contain secondary control information, not
data. This information is then used to load various internal registers and specify internal configuration options. There are four different types of secondary control words distinguished by the state
of the two LSBs of the control information transferred. Note that each secondary control word
transferred must be preceded by a data word with the two LSBs set to one.
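The secondary-communication rule above can be illustrated with a small packing routine. This is a sketch only: it assumes the 14-bit sample occupies the upper 14 bits of the 16-bit word in two's-complement form and that the two LSBs are zero in normal operation. The function names are hypothetical, and the control word contents must come from the TLC32044 data sheet.

```python
def primary_word(sample, request_secondary=False):
    """Pack a signed 14-bit sample into bits 15..2 of a 16-bit serial word.

    Setting both LSBs to one tells the AIC that the next transmission
    carries secondary control information rather than data.
    """
    if not -(1 << 13) <= sample < (1 << 13):
        raise ValueError("sample does not fit in 14 bits")
    word = (sample & 0x3FFF) << 2
    if request_secondary:
        word |= 0b11
    return word

def words_to_send(sample, control_word=None):
    """Return the serial words for one sample, optionally followed by a control word."""
    if control_word is None:
        return [primary_word(sample)]
    return [primary_word(sample, request_secondary=True), control_word & 0xFFFF]

print([hex(w) for w in words_to_send(1)])
print([hex(w) for w in words_to_send(1, control_word=0x1234)])
```

Because every control word must be preceded by its own flagged data word, sending several control words means repeating the two-word pair for each one.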
The TMS320C30 can communicate with the AIC either synchronously or asynchronously
depending on the information in the control register. The operating sequence for synchronous communication with the TMS320C30, shown in Figure 19, is as follows:
1) The FSX or FSR pin is brought low.
2) One 16-bit word is transmitted or one 16-bit word is received.
3) The FSX or FSR pin is brought high.
4) The EODX or EODR pin emits a low-going pulse.

Figure 19. Synchronous Timing of TLC32044 to TMS320C30
[Timing diagram: SHIFT CLK, DR, DX, and EODR/EODX waveforms.]

For asynchronous communication, the operating sequence is similar, but FSX and FSR do
not occur at the same time (see Figure 20). After each receive and transmit operation, the
TMS320C30 asserts an internal receive (RINT) and transmit (XINT) interrupt, which may be used
to control program execution.

Figure 20. Asynchronous Timing of TLC32044 to TMS320C30
[Timing diagram; the waveforms did not survive extraction.]

XDS1000 Target Design Considerations
The TMS320C30 Emulator is an eXtended Development System (XDS1000) which has all
the features necessary for full-speed emulation. The TMS320C30 uses a revolutionary technology
to allow complete emulation via a serial scan path. If users provide a 12-pin header on their target
system, realtime emulation can be performed using the TMS320C30 in their target system.
To use the emulation connector of the XDS1000, the signals shown in Figure 21 should be
provided to a 12-pin header (two rows of six pins) with pin 8 cut out to provide keying. Table 3 describes the pins and signals present on the header.

Figure 21. 12-Pin Header Signals (Top View)

Signal      Pin   Pin   Signal
EMU1         1     2    GND
EMU0         3     4    GND
EMU2         5     6    GND
PD (+5 V)    7     8    no pin (key)
EMU3         9    10    GND
H3          11    12    GND

Header dimensions:
Pin-to-pin spacing: 0.100 inches (X, Y)
Pin width: 0.025 inches square post
Pin length: 0.235 inches nominal

Table 3. Signal Description

Signal Name   Description
EMU0          Emulation pin 0.
EMU1          Emulation pin 1.
EMU2          Emulation pin 2.
EMU3          Emulation pin 3.
H3            TMS320C30 H3.
GND           Ground.
PD            Presence detect. It indicates that the cable is connected and the target system is powered up. It should be tied to +5 volts in the target system.


In addition to the signals required at the emulation connector, the EMU4 through EMU6 signals on the TMS320C30 must also be appropriately connected to ensure proper emulation operation.
The EMU4 signal must be tied to +5 volts, and EMU5 and EMU6 must be left unconnected. Also,
the RSV0 through RSV10 signals must be tied to +5 volts as described in the Third-Generation
TMS320 User's Guide (literature number SPRU031).

Summary
The TMS320C30 is a high-performance 32-bit floating-point digital signal processor. Its
dual parallel-interface busses and serial ports, along with a wide variety of additional support interfaces make the device an extremely flexible system-level DSP microprocessor. Using the techniques described in this report, the TMS320C30 can be used to implement sophisticated signal processing applications with the high precision and dynamic range provided by 32-bit floating-point
arithmetic.
This application report has described the use of external interfaces on the TMS320C30 to
connect it to memories, A/D and D/A converters, and numerous other peripheral devices, as well
as the generation of wait states and other system functions.
The interfaces described in this report have all been built and tested to verify proper operation, and the techniques described can be extended to encompass design of more complex systems.


TMS320C30-IEEE Floating-Point
Format Converter

Randy Restle, Regional Technology Center, Waltham, MA
Adam Cron, Digital Signal Processor Products-Semiconductor Group
Texas Instruments


Introduction
Certain applications require the exceptionally high arithmetic throughput inherent in the
TMS320C30 Digital Signal Processor but must use the IEEE floating-point number format, which
differs from the TMS320C30's number format. The TMS320C30 uses a 2's complement format
for the mantissa and exponent. Besides making the device more compatible with analog to digital
converters, it is computationally more efficient in both speed and die size than the IEEE format.
Applications requiring the IEEE format can benefit from the use of a custom chip for this conversion. For this reason, a chip has been designed, built, and tested. This report describes that chip.
The TMS320C30-IEEE Floating-Point Number Format Converter is a peripheral that performs floating-point number conversions between the native format of the TMS320C30 and the
Single-Precision IEEE Standard 754-1985. This conversion is performed in hardware and can convert an incoming (IEEE-formatted) or outgoing (TMS320C30-formatted) floating-point number
in less than one TMS320C30 instruction cycle. Normally, the part is placed between memory and
the TMS320C30.
This peripheral has two operating modes.
• Mode 1 does not pipeline any data through the chip. Instead, one wait state is automatically generated to compensate for the converter's propagation delays. This mode is equivalent in performance to equipping the TMS320C30 with a single-cycle convert instruction.
In those applications where speed is of utmost importance, the pipeline mode is provided.
• Mode 2 enables the converter's built-in pipeline.
Because propagation delays through the chip reduce the access time required for
TMS320C30 external memory, the pipeline mode allows conversions to take place on one data value while a previously converted value is being read, or written, by the TMS320C30. Depending
on the TMS320C30 instruction cycle time and the access time of memories being used, the pipeline
mode can eliminate degradation in TMS320C30 throughput entirely. However, it should be noted
that values fed through the pipeline appear at the output in the next cycle. Therefore, an extra read
or write (i.e., the same operation that was being performed) must be performed to flush the pipeline.
Consequently, when pipeline mode is used, data values and their addresses are skewed from one
another. This mode is intended for high-speed block transfer/conversion, and the address skew
should be acceptable.
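The one-access skew of pipeline mode can be modeled in a few lines. This is a behavioral sketch only, not a description of the chip's internal logic: the class name, the `convert` callback, and the block-read helper are hypothetical, and the conversion itself is passed in as an arbitrary function.

```python
class PipelinedConverter:
    """Behavioral model: each access returns the result of the previous one."""

    def __init__(self, convert):
        self._convert = convert
        self._held = None            # models the holding register

    def access(self, value):
        out = self._held
        self._held = self._convert(value)
        return out

def read_block(memory, converter):
    """Read and convert a block; one extra access flushes the pipeline."""
    if not memory:
        return []
    results = []
    for i in range(len(memory) + 1):
        # The final access repeats the last address purely to flush the pipeline.
        src = memory[min(i, len(memory) - 1)]
        out = converter.access(src)
        if i > 0:                    # the first output predates the block
            results.append(out)
    return results

print(read_block([1, 2, 3], PipelinedConverter(lambda x: x * 10)))
```

A block of N values therefore costs N + 1 accesses, and the value returned by access i belongs to address i - 1, which is the address skew described above.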
All control signals to and from the converter are compatible with TMS320C30 signals so that
no extra circuitry is required to use this chip. In fact, it has been designed to appear as much as possible like a simple bus transceiver (e.g., SN74LS245). Consequently, it has two data buses. Data bus
A (pins DA31 through DAO) should be connected directly to one of the TMS320C30's data buses
and the other to memory. Its direction pin (DIR) should be tied to the read/write pin (R/W), and
its output enable pin (OE) can be tied to either STRB or MSTRB of the TMS320C30, depending
on where in the TMS320C30 memory map IEEE numbers are stored.

Key Features
This device is designed to fit into systems equipped with TMS320C30 external memory into
which IEEE formatted numbers are stored. Below is a list of some specific features of the
TMS320C30-IEEE Floating-Point Converter:

• Automatic wait-state generation during conversions
• Automatic interrupt generation when IEEE NaNs are encountered
• Automatic pipeline mode for single-cycle conversions
• Built-in SCOPE (i.e., JTAG) testability logic

Report Overview
• External Interfaces - Describes the external interfaces of this chip, the pinout, and pins.
• Architectural Overview - Describes the functions of the converter. Gives an overview of
the TMS320C30 and IEEE Standard 754-1985 number formats and the scope of numbers
that can be converted.
• Converter Operating Modes - Describes the converter's operating modes.
• Interrupts - Describes the Not a Number interrupt generated by the converter.
• Software Application Examples - Contains software application examples.
• Hardware Application Examples - Contains hardware application examples.
• JTAG/IEEE-1149.1 Scan Interface - Contains the JTAG/IEEE scan interface description.

Typographical Conventions
In this report, buses are signified with the bus name in capital letters, followed by the range
of signals (bits) enclosed in parentheses and separated by a colon. For example, TI(31:0) is bus
"TI", bits 31 through 0 (31 is the most significant bit, 0, the least). Table 1 shows the symbols and
their corresponding meaning that are used in sections of the report concerning control logic, algorithm overview, and bit-specific conversion algorithms.

Table 1. Symbols and Meanings

Symbol   Name                Meaning
+        plus                arithmetic summation
|        pipe                logical OR
&        ampersand           logical AND
!        exclamation point   one's complement
-        minus               two's complement
^        caret               EXCLUSIVE OR

External Interfaces
Packaging
The converter is housed in an 84-pin package. This pinout was chosen for efficient
flow-through connection to the buses. The TMS320C30-IEEE Converter's pin assignments are
shown in Table 2, and the pin locations are shown in Figure 1.

Table 2. Pin Assignments

Pin  Name    Pin  Name    Pin  Name
 1   GND     29   DA3     57   DA29
 2   DB15    30   DA4     58   DA30
 3   DB14    31   DA5     59   DA31
 4   DB13    32   DA6     60   TDI
 5   DB12    33   DA7     61   TMS
 6   DB11    34   DA8     62   TCK
 7   DB10    35   DA9     63   VCC
 8   DB9     36   DA10    64   GND
 9   DB8     37   DA11    65   TDO
10   DB7     38   DA12    66   TIP
11   DB6     39   DA13    67   RST
12   DB5     40   DA14    68   DB31
13   DB4     41   DA15    69   DB30
14   DB3     42   VCC     70   DB29
15   DB2     43   GND     71   DB28
16   DB1     44   DA16    72   DB27
17   DB0     45   DA17    73   DB26
18   WAIT    46   DA18    74   DB25
19   PIPE    47   DA19    75   DB24
20   CLK     48   DA20    76   DB23
21   VCC     49   DA21    77   DB22
22   GND     50   DA22    78   DB21
23   NAN     51   DA23    79   DB20
24   DIR     52   DA24    80   DB19
25   OE      53   DA25    81   DB18
26   DA0     54   DA26    82   DB17
27   DA1     55   DA27    83   DB16
28   DA2     56   DA28    84   VCC

Figure 1. Pin Locations
[Package drawing, top view of the 84-pin TMS320C30-IEEE Floating-Point Format Converter: pins 11 down to 1 and 84 down to 75 run along the top edge, pins 12 through 32 (DB5 through DA6) along the left edge, pins 33 through 53 along the bottom edge, and pins 74 down to 54 (DB25 through DA26) along the right edge.]

Pinout Description
Table 3 describes the pin functions.

Table 3. Converter Signals

Signal     Pins  Type          Description
DIR        1     Input         Direction. This pin determines what type of conversion should take place. When it is high, data on bus B is converted from IEEE to TMS320C30 format and output on bus A. When it is low, data on bus A is converted from TMS320C30 to IEEE format and output on bus B. This pin is normally tied directly to the TMS320C30 read/write pin.
OE         1     Input         Output Enable (active low). In combination with the DIR pin, this pin disables the currently driven bus (i.e., bus A or B).
WAIT       1     Output        This pin is driven high in nonpipelined operations to signal the TMS320C30 to extend its external memory access to allow the conversion to complete. It can be tied directly to the TMS320C30 ready line. It is appropriately driven for both read and write operations, but is always low in pipelined mode of operation.
PIPE       1     Input         Pipeline Enable. When this is high, the converter is configured in pipeline mode. It must be tied low for nonpipeline mode.
CLK        1     Input         Clock. This clock drives the wait-state generator and the pipeline. It should be connected directly to the TMS320C30 H1 clock pin.
NAN        1     Output        Not-a-Number Interrupt. This pin is driven low for 1.5 CLK cycles and signals an attempted conversion of the IEEE format Not-a-Number. This pin can be tied directly to one of the TMS320C30 interrupt pins and can signal command or message passing in multiprocessor, shared-memory-type designs.
DA(31:0)   32    Input/Output  Data Bus A. This 32-bit bus should be tied to either one of the two TMS320C30 data buses (i.e., the primary or expansion buses).
DB(31:0)   32    Input/Output  Data Bus B. This 32-bit bus is normally connected to a memory array containing IEEE-formatted data.
TCK        1     Input         Test Clock.
TMS        1     Input         Test Mode Select.
RST        1     Input         Reset (active low). This pin resets all logic on the device.
TDI        1     Input         Test Data In.
TDO        1     Output        Test Data Out.
TIP        1     Output        Test Instruction Register Parity. During instruction register scan, when paused, this output reflects instruction register even parity.


Architectural Overview
Figure 2 shows the block diagram of the converter.

Figure 2. Converter Block Diagram

[Block diagram: a TMS320C30 interface and a memory interface are joined by two conversion paths, an IEEE-to-TMS320C30 converter and a TMS320C30-to-IEEE converter, each with holding registers. Scan logic sits on both interfaces, and the diagram also shows the converter control logic and the test control logic.]

Introduction
The TMS320C30 attains a peak performance of 33 MFLOPS, largely due to the floating-point format that it uses. In this format, both exponent and mantissa are represented in 2's-complement form.
In the IEEE format, the mantissa is represented in signed-magnitude form, and the exponent
includes a bias (i.e., an offset). Additionally, values of numbers are not determined by the same formula. Instead, the exponent is used to flag numbers that are encoded differently. For example, if
the exponent is 255, the value is either not a number (NaN) or infinity. Another exception is signaled
when the exponent is zero. In this case, the mantissa is defined to be denormalized.
The TMS320C30's floating-point format is considerably simpler; most numbers can be converted
to it without any loss of precision. However, some denormalized IEEE numbers are smaller than
can be represented in TMS320C30 format. When these numbers are converted, they are translated
to the closest TMS320C30 values. The error is less than ±2^-127.

IEEE Floating-Point Format Overview
IEEE Standard 754-1985 defines formats for single-, single-extended-, double-, and
double-extended-precision floating-point numbers. The single-precision format fits entirely within 32 bits, which is the bus width of the TMS320C30, and is the only format supported by the converter.
The format of the single-precision IEEE Standard 754-1985 is shown below:

Figure 3. Single-Precision IEEE Standard 754-1985 Format

Bit 31        S
Bits 30-23    EXPONENT (MSB at bit 30, LSB at bit 23)
Bits 22-0     FRACTION (MSB at bit 22, LSB at bit 0)

In this format,
S is the sign bit of the mantissa (0 = positive, 1 = negative).
EXPONENT is an unsigned 8-bit field that determines the location of the binary point
of the number being encoded.
FRACTION is a 23-bit field containing the fractional part of the mantissa.
LSB is the least significant bit of a field.
MSB is the most significant bit of a field.

The decimal value (v) of some number X is defined by one of five separate cases shown below:
Case 1: If EXPONENT = 255 and FRACTION ≠ 0, then v is NaN.
Case 2: If EXPONENT = 255 and FRACTION = 0, then v = ± infinity.
Case 3: If 0 < EXPONENT < 255, then $v = (-1)^{S} \, 2^{EXP-127} \, (1.FRAC)$
where:
S is either 0 or 1
FRAC is the decimal equivalent of FRACTION
EXP is the decimal equivalent of EXPONENT
Note that an implied 1 exists to the left of the binary point as shown above. This means
the mantissa of an IEEE-encoded value has 24 bits of precision.
Case 4: If EXPONENT = 0 and FRACTION ≠ 0, then v is a denormalized number and
$v = (-1)^{S} \, 2^{-126} \, (0.FRAC)$
where
S is either 0 or 1
FRAC is the decimal equivalent of FRACTION
Note that an implied 0 exists to the left of the binary point as shown above. This means
the mantissa of a denormalized value has at most 23 bits of precision.
Case 5: If EXPONENT = 0 and FRACTION = 0, then v = ± zero.
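The five cases can be exercised directly on a 32-bit pattern. The sketch below is an illustration of the definitions above, not part of the converter design; the function name `decode_ieee_single` is hypothetical, and Python's `struct` module is used only to cross-check the result against the host's own IEEE decoding.

```python
import struct

def decode_ieee_single(word):
    """Return (case number, value) for a 32-bit IEEE single-precision pattern."""
    s = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if exponent == 255:
        if fraction:
            return 1, float("nan")                 # Case 1: NaN
        return 2, sign * float("inf")              # Case 2: +/- infinity
    if exponent > 0:                               # Case 3: implied leading 1
        return 3, sign * 2.0 ** (exponent - 127) * (1 + fraction / 2 ** 23)
    if fraction:                                   # Case 4: implied leading 0
        return 4, sign * 2.0 ** -126 * (fraction / 2 ** 23)
    return 5, sign * 0.0                           # Case 5: +/- zero

for word in (0x3F800000, 0xC0490FDB, 0x00000001, 0x7F800000):
    case, value = decode_ieee_single(word)
    host = struct.unpack(">f", struct.pack(">I", word))[0]
    print(f"{word:08X}: case {case}, value {value!r}, host {host!r}")
```

The arithmetic is exact in double precision, so for every non-NaN pattern the computed value should match the host's interpretation.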

TMS320C30 Floating-Point Format Overview
TMS320C30 single-precision floating-point format uses a 2's-complement exponent and
mantissa and is shown in Figure 4.

Figure 4. TMS320C30 Single-Precision Floating-Point Format

Bits 31-24    EXPONENT (MSB at bit 31, LSB at bit 24)
Bit 23        S
Bits 22-0     FRACTION (MSB at bit 22, LSB at bit 0)

The decimal value (v) of some number X is determined as follows:
$v = \{(-2)^{S} + (.FRAC)\} \, 2^{EXP}$
where S is either 0 or 1
FRAC is the decimal equivalent of FRACTION
EXP is the decimal equivalent of EXPONENT

An alternate way of describing the TMS320C30 mantissa is as follows:

s s'.fraction (s is the sign bit and s' is its complement)

Note that the bit to the left of the binary point is implied and is the complement of the sign
bit. This gives the TMS320C30's mantissa 24 bits of precision and not 23 bits as might be expected.
For example:
The most positive TMS320C30 mantissa is
01.1111 1111 1111 1111 1111 111 = 2 - 2^-23
The least positive TMS320C30 mantissa is
01.0000 0000 0000 0000 0000 000 = 1
The most negative TMS320C30 mantissa is
10.0000 0000 0000 0000 0000 000 = -2
The least negative TMS320C30 mantissa is
10.1111 1111 1111 1111 1111 111 = -1 - 2^-23

Note that zero is uniquely identified when the TMS320C30 exponent is -128.
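The TMS320C30 value formula can be checked against the mantissa examples above with a short decoder. This is an illustrative sketch; the function name `decode_c30_single` is hypothetical.

```python
def decode_c30_single(word):
    """Decode a 32-bit TMS320C30 single-precision pattern to a Python float."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:
        exponent -= 0x100                 # 8-bit two's-complement exponent
    s = (word >> 23) & 1
    fraction = word & 0x7FFFFF
    if exponent == -128:                  # reserved encoding for zero
        return 0.0
    mantissa = (-2.0) ** s + fraction / 2 ** 23   # 01.f when s = 0, 10.f when s = 1
    return mantissa * 2.0 ** exponent

# Exponent 0 with the four extreme mantissas listed in the text.
for word in (0x007FFFFF, 0x00000000, 0x00800000, 0x00FFFFFF):
    print(f"{word:08X} -> {decode_c30_single(word)!r}")
```

Note that an all-zero word decodes to 1.0, not 0.0; zero requires the exponent field 80h.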

IEEE Number Conversion
This section describes the classifications of IEEE numbers, how they are decoded, and the
algorithms necessary to translate them to TMS320C30 format.

IEEE Dynamic Range
Table 4 shows the dynamic range of IEEE numbers. This chart can be used to quickly determine the case classification of an IEEE number.

Table 4. IEEE Range of Numbers

Sign  Exponent  Mantissa      Value                              Type                    Case
0     FF        ≠ 0           not applicable                     NaN                     1
0     FF        0.000...000   + infinity                         + Infinity              2A
0     FE        1.111...111   (2 - 2^-23) x 2^127                + Normalized Number     3A
0     FE        1.111...110   (2 - 2^-22) x 2^127                + Normalized Number     3A
0     FE        1.111...101   (2 - 2^-21 + 2^-23) x 2^127        + Normalized Number     3A
0     FE        1.111...100   (2 - 2^-21) x 2^127                + Normalized Number     3A
...
0     FE        1.000...000   2^127                              + Normalized Number     3A
0     FD        1.111...111   (2 - 2^-23) x 2^126                + Normalized Number     3A
0     FD        1.111...110   (2 - 2^-22) x 2^126                + Normalized Number     3A
0     FD        1.111...101   (2 - 2^-21 + 2^-23) x 2^126        + Normalized Number     3A
0     FD        1.111...100   (2 - 2^-21) x 2^126                + Normalized Number     3A
...
0     01        1.000...000   2^-126                             + Normalized Number     3A
0     00        0.111...111   (1 - 2^-23) x 2^-126               + Denormalized Number   4A
0     00        0.111...110   (1 - 2^-22) x 2^-126               + Denormalized Number   4A
0     00        0.111...101   (1 - 2^-21 + 2^-23) x 2^-126       + Denormalized Number   4A
0     00        0.111...100   (1 - 2^-21) x 2^-126               + Denormalized Number   4A
...
0     00        0.100...000   2^-127                             + Denormalized Number   4A
0     00        0.011...111   (1 - 2^-22) x 2^-127               + Denormalized Number   4B
0     00        0.011...110   (1 - 2^-21) x 2^-127               + Denormalized Number   4B
0     00        0.011...101   (1 - 2^-20 + 2^-22) x 2^-127       + Denormalized Number   4B
...
0     00        0.000...011   (1 + 2^-1) x 2^-148                + Denormalized Number   4B
0     00        0.000...010   2^-148                             + Denormalized Number   4B
0     00        0.000...001   2^-149                             + Denormalized Number   4B
0     00        0.000...000   +0.0                               + Zero                  5
1     00        0.000...000   -0.0                               - Zero                  5
1     00        0.000...001   -(2^-149)                          - Denormalized Number   4D
1     00        0.000...010   -(2^-148)                          - Denormalized Number   4D
1     00        0.000...011   -(1 + 2^-1) x 2^-148               - Denormalized Number   4D
...
1     00        0.011...111   -(1 - 2^-22) x 2^-127              - Denormalized Number   4D
1     00        0.100...000   -(2^-127)                          - Denormalized Number   4D
1     00        0.100...001   -(1 + 2^-22) x 2^-127              - Denormalized Number   4C
1     00        0.100...010   -(1 + 2^-21) x 2^-127              - Denormalized Number   4C
1     00        0.100...011   -(1 + 2^-21 + 2^-22) x 2^-127      - Denormalized Number   4C
...
1     00        0.111...111   -(1 - 2^-23) x 2^-126              - Denormalized Number   4C
1     01        1.000...000   -(2^-126)                          - Normalized Number     3C
1     01        1.000...001   -(1 + 2^-23) x 2^-126              - Normalized Number     3B
1     01        1.000...010   -(1 + 2^-22) x 2^-126              - Normalized Number     3B
1     01        1.000...011   -(1 + 2^-22 + 2^-23) x 2^-126      - Normalized Number     3B
...
1     01        1.111...111   -(2 - 2^-23) x 2^-126              - Normalized Number     3B
1     02        1.000...000   -(2^-125)                          - Normalized Number     3C
1     02        1.000...001   -(1 + 2^-23) x 2^-125              - Normalized Number     3B
1     02        1.000...010   -(1 + 2^-22) x 2^-125              - Normalized Number     3B
1     02        1.000...011   -(1 + 2^-22 + 2^-23) x 2^-125      - Normalized Number     3B
...
1     FE        1.111...100   -(2 - 2^-21) x 2^127               - Normalized Number     3B
1     FE        1.111...101   -(2 - 2^-21 + 2^-23) x 2^127       - Normalized Number     3B
1     FE        1.111...110   -(2 - 2^-22) x 2^127               - Normalized Number     3B
1     FE        1.111...111   -(2 - 2^-23) x 2^127               - Normalized Number     3B
1     FF        = 0           - infinity                         - Infinity              2B

IEEE-to-TMS320C30 Control Logic
The control logic that classifies incoming IEEE data in order to perform correct translation
to TMS320C30 format is shown below. The form of the expressions was chosen to minimize propagation delay through the device.
The logic is simplified if the following three factors are used (refer to typographical definitions for symbols used):

EXPFF = IEEE(30) & IEEE(29) & IEEE(28) & IEEE(27) &
        IEEE(26) & IEEE(25) & IEEE(24) & IEEE(23)

EXP00 = !( IEEE(30) | IEEE(29) | IEEE(28) | IEEE(27) |
           IEEE(26) | IEEE(25) | IEEE(24) | IEEE(23) )

MANT0 = !( IEEE(21) | IEEE(20) | IEEE(19) | IEEE(18) |
           IEEE(17) | IEEE(16) | IEEE(15) | IEEE(14) |
           IEEE(13) | IEEE(12) | IEEE(11) | IEEE(10) |
           IEEE(9)  | IEEE(8)  | IEEE(7)  | IEEE(6)  |
           IEEE(5)  | IEEE(4)  | IEEE(3)  | IEEE(2)  |
           IEEE(1)  | IEEE(0) )

Then
Case 1: NaN
= EXPFF & ( IEEE(22) | !MANT0 )

Case 2A: positive infinity
= !IEEE(31) & EXPFF & !( IEEE(22) | !MANT0 )

Case 2B: negative infinity
= IEEE(31) & EXPFF & !( IEEE(22) | !MANT0 )

Case 3A: positive normalized numbers
= !IEEE(31) & !EXP00 & !EXPFF

Case 3B: negative normalized numbers with fraction ≠ 0
= IEEE(31) & !EXP00 & !EXPFF & ( !MANT0 | IEEE(22) )

Case 3C: negative normalized numbers with fraction = 0
= IEEE(31) & !EXP00 & !EXPFF & !( !MANT0 | IEEE(22) )

Case 4A: positive denormalized numbers ≥ 2^-127
= !IEEE(31) & EXP00 & IEEE(22)

Case 4B: positive denormalized numbers < 2^-127
= !IEEE(31) & EXP00 & !IEEE(22) & !MANT0

Case 4C: negative denormalized numbers ≤ (-1 - 2^-23) x 2^-127
= IEEE(31) & EXP00 & IEEE(22) & !MANT0

Case 4D: negative denormalized numbers > (-1 - 2^-23) x 2^-127
= IEEE(31) & EXP00 & ( IEEE(22) ^ !MANT0 )

Case 5: positive and negative zero
= EXP00 & !IEEE(22) & MANT0
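The case equations can be transcribed almost literally into software to confirm that every bit pattern falls into exactly one case. The sketch below follows the expressions as reconstructed in this section, reading the three factors as "exponent = FFh", "exponent = 00h", and "fraction bits 21 through 0 all zero"; the function name `classify_ieee` is hypothetical.

```python
def classify_ieee(word):
    """Return the case label ('1', '2A', ... '5') for a 32-bit IEEE pattern."""
    def bit(n):
        return (word >> n) & 1

    expff = all(bit(n) for n in range(23, 31))       # exponent == FFh
    exp00 = not any(bit(n) for n in range(23, 31))   # exponent == 00h
    mant0 = not any(bit(n) for n in range(0, 22))    # bits 21..0 all zero
    s, b22 = bool(bit(31)), bool(bit(22))
    frac_nonzero = b22 or not mant0                  # IEEE(22) | !MANT0

    cases = {
        "1":  expff and frac_nonzero,
        "2A": (not s) and expff and not frac_nonzero,
        "2B": s and expff and not frac_nonzero,
        "3A": (not s) and not exp00 and not expff,
        "3B": s and not exp00 and not expff and frac_nonzero,
        "3C": s and not exp00 and not expff and not frac_nonzero,
        "4A": (not s) and exp00 and b22,
        "4B": (not s) and exp00 and (not b22) and not mant0,
        "4C": s and exp00 and b22 and not mant0,
        "4D": s and exp00 and (b22 != (not mant0)),  # exclusive OR
        "5":  exp00 and (not b22) and mant0,
    }
    hits = [name for name, hit in cases.items() if hit]
    return hits[0]

for word in (0x7FC00000, 0x3F800000, 0xBF800000, 0x00400000, 0x80400000):
    print(f"{word:08X} -> case {classify_ieee(word)}")
```

Note how the pattern 80400000h, exactly -(2^-127), lands in Case 4D rather than 4C, in agreement with Table 4.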

IEEE-to-TMS320C30 Conversion Algorithm Overview
Table 5 shows the conversion algorithms used on the sign, exponent, and mantissa fields of
IEEE numbers to produce the corresponding TMS320C30 fields. These fields are broken down into
bit-specific algorithms in the following section.

Table 5. Conversion Algorithms from IEEE to TMS320C30 Format

        TMS320C30      TMS320C30   TMS320C30
Case    Exponent       Sign        Fraction
1       eIEEE          sIEEE       fIEEE
2A      7Fh            sIEEE       7FFFFFh
2B      7Fh            sIEEE       000000h
3A      eIEEE + 81h    sIEEE       fIEEE
3B      eIEEE + 81h    sIEEE       -fIEEE
3C      eIEEE + 80h    sIEEE       -fIEEE
4A      81h            sIEEE       2 x fIEEE
4B      80h            sIEEE       000000h
4C      81h            sIEEE       2 x -fIEEE
4D      80h            0           000000h
5       80h            0           000000h

Note: Fraction, above, has only 23 bits.
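A software rendering of Table 5 makes the field arithmetic concrete. The sketch below is a behavioral model under the table as reconstructed here, not a description of the gate-level design; `ieee_to_c30` and `c30_value` are hypothetical helper names, and all field arithmetic is truncated to the field width (8-bit exponent, 23-bit fraction).

```python
import struct

def ieee_to_c30(word):
    """Convert a 32-bit IEEE single pattern to a TMS320C30 pattern per Table 5."""
    s = (word >> 31) & 1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    neg_f = (-f) & 0x7FFFFF                  # 23-bit two's complement of fIEEE
    bit22, low = f >> 22, f & 0x3FFFFF
    if e == 0xFF and f:                      # Case 1: fields pass through
        exp, sign, frac = e, s, f
    elif e == 0xFF:                          # Cases 2A/2B: largest magnitude
        exp, sign, frac = 0x7F, s, (0 if s else 0x7FFFFF)
    elif e and not s:                        # Case 3A
        exp, sign, frac = e + 0x81, s, f
    elif e and f:                            # Case 3B
        exp, sign, frac = e + 0x81, s, neg_f
    elif e:                                  # Case 3C
        exp, sign, frac = e + 0x80, s, neg_f
    elif bit22 and not s:                    # Case 4A
        exp, sign, frac = 0x81, s, 2 * f
    elif bit22 and low:                      # Case 4C (sign is 1 here)
        exp, sign, frac = 0x81, s, 2 * neg_f
    else:                                    # Cases 4B, 4D, 5: zero
        exp, sign, frac = 0x80, 0, 0
    return ((exp & 0xFF) << 24) | (sign << 23) | (frac & 0x7FFFFF)

def c30_value(word):
    """Decode a TMS320C30 pattern using v = {(-2)^S + .FRAC} * 2^EXP."""
    exp = (word >> 24) & 0xFF
    exp = exp - 0x100 if exp >= 0x80 else exp
    if exp == -128:
        return 0.0
    return ((-2.0) ** ((word >> 23) & 1) + (word & 0x7FFFFF) / 2 ** 23) * 2.0 ** exp

for x in (1.0, -1.0, -1.5, 3.14159, -0.1):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    c30 = ieee_to_c30(bits)
    print(f"{bits:08X} -> {c30:08X} ({c30_value(c30)!r})")
```

For normalized inputs the decoded TMS320C30 value equals the original single-precision value exactly, which matches the statement that most numbers convert without loss of precision.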

IEEE-to-TMS320C30 Bit-Specific Conversion Algorithms
These circuits were designed by examining Table 5 and finding all possible choices for each
bit. The different choices were fed into data selectors, whose addresses were derived from the
case-identifying logic described in the preceding section on control logic.
For maximum performance, all data selectors were designed from NAND gates. This
permitted minimization by eliminating all NAND gates that had an input of 0 and by reducing the
number of NAND inputs where a bit was always 1. However, for clarity, no minimization is shown
here. Instead, that detail can be seen in the following figures.
The following bit algorithms are shown in bit descending order, starting with IEEE bit 31.

Figure 5. IEEE Bit 31 to TMS320C30 Bit 23
[Logic diagram: IEEE(31) is gated by CASE4D and CASE5 to produce TMS320C30(23).]

Figure 6. IEEE Bit 30 to TMS320C30 Bit 31
[Data selector diagram. Data inputs: IEEE(30) (combined with CASE3C), "1", "0", and IEEEBIAS(30). Select terms: ab, aB, Ab, AB. Output: TMS320C30(31).]
b = CASE1 | CASE2A | CASE2B | CASE3C
B = !b
A = CASE2A | CASE2B | CASE3A | CASE3B
a = !A

Figure 7. IEEE Bit n to TMS320C30 Bit n+1, Where $29 \geq n \geq 24$
[Data selector diagram. Data inputs: "1", "0", IEEE(n), and IEEEBIAS(n). Select terms: ab, aB, Ab, AB. Output: TMS320C30(n+1).]
b = CASE2A | CASE2B | CASE3A | CASE3B
B = !b
a = CASE2A | CASE2B | CASE1 | CASE3C
A = !a

Figure 8. IEEE Bit 23 to TMS320C30 Bit 24
[Data selector diagram. Data inputs: "1", "0", IEEE(23), and IEEEBIAS(23). Select terms: ab, aB, Ab, AB. Output: TMS320C30(24).]
b = CASE1 | CASE3C | CASE4B | CASE4D | CASE5
B = !b
A = CASE4B | CASE4D | CASE5 | CASE3A | CASE3B
a = !A

Figure 9. IEEE Bit n to TMS320C30 Bit n, Where $22 \geq n \geq 1$
[Data selector diagram. Data inputs: IEEE(n), "1", "0", IEEENEG(n), IEEE(n-1), and IEEENEG(n-1). Select terms: abc, abC, aBc, aBC, Abc. Output: TMS320C30(n).]
C = CASE2A | CASE3B | CASE3C | CASE4C
c = !C
b = CASE1 | CASE2A | CASE3A | CASE4A | CASE4C
B = !b
A = CASE4A | CASE4C
a = !A

Figure 10. IEEE Bit 0 to TMS320C30 Bit 0
[Data selector diagram. Data inputs: IEEE(0), "1", and "0". Output: TMS320C30(0).]
B = CASE2A
b = !B
A = CASE1 | CASE2A | CASE3A | CASE3B | CASE3C
a = !A

TMS320C30 Number Conversion
This section describes the classifications of TMS320C30 numbers, how they are decoded,
and the algorithms necessary to translate them to IEEE format.

TMS320C30 Dynamic Range
Table 6 shows the dynamic range of TMS320C30 numbers. As with Table 4, this table
can be used to quickly determine the case classification of a TMS320C30 number.

Table 6. TMS320C30 Range of Numbers

Exponent  Sign  Mantissa       Value                                       Type             Case
7F        0     1.111...111    $(2-2^{-23})\times 2^{127}$                 Positive Number  6
7F        0     1.111...110    $(2-2^{-22})\times 2^{127}$                 Positive Number  6
7F        0     1.111...101    $(2-2^{-21}+2^{-23})\times 2^{127}$         Positive Number  6
7F        0     1.111...100    $(2-2^{-21})\times 2^{127}$                 Positive Number  6

7F        0     1.000...000    $2^{127}$                                   Positive Number  6
7E        0     1.111...111    $(2-2^{-23})\times 2^{126}$                 Positive Number  6
7E        0     1.111...110    $(2-2^{-22})\times 2^{126}$                 Positive Number  6
7E        0     1.111...101    $(2-2^{-21}+2^{-23})\times 2^{126}$         Positive Number  6

00        0     1.000...000    $1$                                         Positive Number  6
FF        0     1.111...111    $1-2^{-24}$                                 Positive Number  6
FF        0     1.111...110    $1-2^{-23}$                                 Positive Number  6
FF        0     1.111...101    $1-2^{-22}+2^{-24}$                         Positive Number  6

FF        0     1.000...000    $2^{-1}$                                    Positive Number  6
FE        0     1.111...111    $(2-2^{-23})\times 2^{-2}$                  Positive Number  6
FE        0     1.111...110    $(2-2^{-22})\times 2^{-2}$                  Positive Number  6
FE        0     1.111...101    $(2-2^{-21}+2^{-23})\times 2^{-2}$          Positive Number  6

82        0     1.000...000    $2^{-126}$                                  Positive Number  6
81        0     1.111...111    $(2-2^{-23})\times 2^{-127}$                Positive Number  7 (note 1)
81        0     1.111...110    $(2-2^{-22})\times 2^{-127}$                Positive Number  7 (note 1)
81        0     1.111...101    $(2-2^{-21}+2^{-23})\times 2^{-127}$        Positive Number  7 (note 1)
81        0     1.111...100    $(2-2^{-21})\times 2^{-127}$                Positive Number  7 (note 1)

81        0     1.000...010    $(1+2^{-22})\times 2^{-127}$                Positive Number  7 (note 1)
81        0     1.000...001    $(1+2^{-23})\times 2^{-127}$                Positive Number  7 (note 1)
81        0     1.000...000    $2^{-127}$                                  Positive Number  7 (note 1)

80        0     0.111...111    (note 2)                                    Implied Zero     8
80        0     0.111...110    (note 2)                                    Implied Zero     8
80        0     0.111...101    (note 2)                                    Implied Zero     8

80        0     0.000...001    (note 2)                                    Implied Zero     8
80        0     0.000...000    0.0                                         Zero             8

80        1     10.111...111   (note 2)                                    Implied Zero     (note 3)
80        1     10.111...110   (note 2)                                    Implied Zero     (note 3)
80        1     10.111...101   (note 2)                                    Implied Zero     (note 3)

80        1     10.000...011   (note 2)                                    Implied Zero     (note 3)
80        1     10.000...010   (note 2)                                    Implied Zero     (note 3)
80        1     10.000...001   (note 2)                                    Implied Zero     (note 3)
80        1     10.000...000   (note 2)                                    Implied Zero     8

81        1     10.111...111   $(-1-2^{-23})\times 2^{-127}$               Negative Number  9 (note 1)
81        1     10.111...110   $(-1-2^{-22})\times 2^{-127}$               Negative Number  9 (note 1)
81        1     10.111...101   $(-1-2^{-21}+2^{-23})\times 2^{-127}$       Negative Number  9 (note 1)

81        1     10.000...010   $(-2+2^{-22})\times 2^{-127}$               Negative Number  9 (note 1)
81        1     10.000...001   $(-2+2^{-23})\times 2^{-127}$               Negative Number  9 (note 1)

81        1     10.000...000   $-(2^{-126})$                               Negative Number  10
82        1     10.111...111   $(-1-2^{-23})\times 2^{-126}$               Negative Number  11
82        1     10.111...110   $(-1-2^{-22})\times 2^{-126}$               Negative Number  11
82        1     10.111...101   $(-1-2^{-21}+2^{-23})\times 2^{-126}$       Negative Number  11

FF        1     10.000...001   $-1+2^{-24}$                                Negative Number  11
FF        1     10.000...000   $-1$                                        Negative Number  10
00        1     10.111...111   $(-1-2^{-23})\times 2^{0}$                  Negative Number  11
00        1     10.111...110   $(-1-2^{-22})\times 2^{0}$                  Negative Number  11
00        1     10.111...101   $(-1-2^{-21}+2^{-23})\times 2^{0}$          Negative Number  11

00        1     10.000...001   $-2+2^{-23}$                                Negative Number  11
00        1     10.000...000   $-2$                                        Negative Number  10
01        1     10.111...111   $-2-2^{-22}$                                Negative Number  11
01        1     10.111...110   $-2-2^{-21}$                                Negative Number  11
01        1     10.111...101   $-2-2^{-20}+2^{-22}$                        Negative Number  11

7F        1     10.000...001   $(-2+2^{-23})\times 2^{127}$                Negative Number  11
7F        1     10.000...000   $-(2^{128})$                                Negative Number  12

Notes:

1) Numbers converted to IEEE denormalized values lose one least significant bit of accuracy.
2) The TMS320C30 does not produce these numbers under normal arithmetic operations. Because the exponent
of these numbers is -128, the TMS320C30 considers them zero. TMS320C30 Boolean operations are capable of
producing numbers of these forms. Because of this, proper conversion to IEEE format is unclear and
should be avoided. See note 3.
3) Case 8 and Case 9 are activated simultaneously. This is the only instance where the cases are not mutually
exclusive. The TMS320C30 does not produce these numbers under normal arithmetic operations. Because the
exponent of these numbers is -128, the TMS320C30 considers them zero. TMS320C30 Boolean operations
are capable of producing numbers of these forms. Because of this, proper conversion to IEEE format is
unclear. This dilemma can be resolved with a minor modification to the case qualifier logic. See note 2.

TMS320C30-to-IEEE Control Logic
Conversion from TMS320C30 format to IEEE format is qualified with a different set of
Boolean equations. To eliminate confusion between IEEE and TMS320C30 cases, different case
numbers are used.
The logic is simplified if the following three factors are used. EXP80_81 is true when the
exponent field is 80h or 81h, EXP7F is true when the exponent field is 7Fh, and MANT0 is true
when all 23 fraction bits are zero:
EXP80_81 = !( !C30(31) | C30(30) | C30(29) | C30(28) |
              C30(27)  | C30(26) | C30(25) )

EXP7F    = !C30(31) & C30(30) & C30(29) & C30(28) &
           C30(27)  & C30(26) & C30(25) & C30(24)

MANT0    = !( C30(22) | C30(21) | C30(20) | C30(19) |
              C30(18) | C30(17) | C30(16) | C30(15) |
              C30(14) | C30(13) | C30(12) | C30(11) |
              C30(10) | C30(9)  | C30(8)  | C30(7)  |
              C30(6)  | C30(5)  | C30(4)  | C30(3)  |
              C30(2)  | C30(1)  | C30(0) )

Then,
Case 6: positive numbers $\geq 2^{-126}$
= !EXP80_81 & !C30(23)
Case 7: positive numbers N such that
$(2-2^{-23}) \times 2^{-127} \geq N \geq 2^{-127}$
= EXP80_81 & C30(24) & !C30(23)
Case 8: zero
= EXP80_81 & !C30(24)

Case 9: negative numbers N such that
$(-1-2^{-23}) \times 2^{-127} \geq N \geq (-2+2^{-23}) \times 2^{-127}$
= EXP80_81 & C30(23) & !MANT0

Case 10: negative numbers N such that
$-(2^{-126}) \geq N \geq -(2^{127})$ and whose fraction is 0
= !(EXP80_81 & !C30(24)) & !EXP7F & C30(23) & MANT0

Case 11: negative numbers N such that
$-(2^{-126}) > N > -(2^{128})$ and whose fraction $\neq 0$
= !EXP80_81 & C30(23) & !MANT0

Case 12: negative $2^{128}$
= EXP7F & C30(23) & MANT0
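
The case qualifiers for Cases 6 through 12 can be exercised with a small software model such as the following sketch. It assumes the three factors are active-high (true for exponent 80h or 81h, for exponent 7Fh, and for an all-zero fraction, respectively) and that Case 8 tests for C30(24) = 0, which is what note 3 of Table 6 implies. The function returns a set because Cases 8 and 9 can be active together. The function name is hypothetical.

```python
def c30_cases(word):
    """Return the set of active case numbers (6..12) for a 32-bit TMS320C30 word."""
    exp = (word >> 24) & 0xFF
    sign = bool((word >> 23) & 1)            # C30(23)
    b24 = bool((word >> 24) & 1)             # C30(24)
    exp80_81 = exp in (0x80, 0x81)           # EXP80_81 (assumed active-high)
    exp7f = exp == 0x7F                      # EXP7F
    mant0 = (word & 0x7FFFFF) == 0           # MANT0 (assumed active-high)

    qualifiers = {
        6: not exp80_81 and not sign,
        7: exp80_81 and b24 and not sign,
        8: exp80_81 and not b24,
        9: exp80_81 and sign and not mant0,
        10: not (exp80_81 and not b24) and not exp7f and sign and mant0,
        11: not exp80_81 and sign and not mant0,
        12: exp7f and sign and mant0,
    }
    return {case for case, active in qualifiers.items() if active}
```

For instance, `c30_cases(0x80800001)` returns `{8, 9}`, the overlap described in note 3, while every other input tried here selects exactly one case.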

TMS320C30-to-IEEE Conversion Algorithm Overview
Table 7 shows the conversion algorithms used on the sign, exponent, and mantissa fields of
TMS320C30 numbers to produce the corresponding IEEE fields. These fields are broken down into
bit-specific algorithms in the next section.

Table 7. Conversion Algorithms from TMS320C30 to IEEE Format

          IEEE
Case      Sign      Exponent        Fraction
6         sC30      eC30 + 7Fh      fC30
7         sC30      00              (fC30/2) + 400000h
8         0         00              000000h
9         sC30      00              ((!fC30 + 1)/2) + 400000h
10        sC30      eC30 + 80h      000000h
11        sC30      eC30 + 7Fh      !fC30 + 1
12        sC30      FFh             000000h

Here !fC30 + 1 denotes the two's-complement negation of the 23-bit fraction.
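
A software model of Table 7 might look like the sketch below. It is illustrative only and rests on several assumptions: the negative-number fraction entries are taken to be the two's-complement negation of the 23-bit fraction, the divisions by 2 are taken to be right shifts that discard the least significant bit, and any word with exponent 80h (including the ambiguous forms of notes 2 and 3) is treated as zero. The function name is hypothetical.

```python
def c30_to_ieee(word):
    """Model of Table 7: 32-bit TMS320C30 word -> 32-bit IEEE single."""
    e = (word >> 24) & 0xFF
    s = (word >> 23) & 1
    f = word & 0x7FFFFF
    neg_f = (-f) & 0x7FFFFF                  # two's-complement negation, 23 bits

    if e == 0x80:                            # Case 8: zero (assumed priority)
        sign, exp, frac = 0, 0x00, 0
    elif e == 0x81 and s == 0:               # Case 7: becomes IEEE denormalized
        sign, exp, frac = s, 0x00, (f >> 1) + 0x400000
    elif e == 0x81 and f != 0:               # Case 9: becomes IEEE denormalized
        sign, exp, frac = s, 0x00, (neg_f >> 1) + 0x400000
    elif s == 0:                             # Case 6
        sign, exp, frac = s, (e + 0x7F) & 0xFF, f
    elif f != 0:                             # Case 11
        sign, exp, frac = s, (e + 0x7F) & 0xFF, neg_f
    elif e == 0x7F:                          # Case 12: negative infinity
        sign, exp, frac = s, 0xFF, 0
    else:                                    # Case 10
        sign, exp, frac = s, (e + 0x80) & 0xFF, 0
    return (sign << 31) | (exp << 23) | frac
```

Under these assumptions, the TMS320C30 word `0x00C00000` (-1.5) converts to the IEEE word `0xBFC00000`, and `0x81000000` and `0x81000001` both convert to `0x00400000`, which shows the lost least significant bit.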

TMS320C30-to-IEEE Bit-Specific Conversion Algorithms
These circuits were designed by examining Table 7 and finding all possible choices for each
bit. The different choices were fed into data selectors whose addresses were derived from the
case-identifying logic described in the preceding section on TMS320C30 to IEEE control logic.
Just as in the IEEE case-identifying logic, all data selectors were designed from NAND gates
for maximum performance. This also permitted minimization by eliminating all NAND gates having an input of 0 and by reducing the number of NAND inputs where a bit was always 1. However,
for clarity, no minimization is shown here. Instead, that detail can be seen in the following figures.

The following bit algorithms are shown in bit-descending order, starting with TMS320C30
bit 31.

Figure 11. TMS320C30 Bit 31 to IEEE Bit 30
[Data selector diagram. Labeled data inputs: TMS320C30BIAS(31) (ab), "1" (aB), and "0" (Ab); the AB input is not labeled. Output: IEEE(30).]
B = CASE10 | CASE12
b = !B
a = CASE6 | CASE11 | CASE12
A = !a

Figure 12. TMS320C30 Bit n to IEEE Bit n-1, Where $31 \geq n \geq 24$
[Data selector diagram. Data inputs: TMS320C30BIAS(n), "1" (aB), "0" (Ab), and TMS320C30(n) (AB). Output: IEEE(n-1).]
B = CASE10 | CASE12
b = !B
a = CASE6 | CASE11 | CASE12
A = !a

Figure 13. TMS320C30 Bit 23 to IEEE Bit 31
[Logic diagram: TMS320C30(23) is gated by CASE8 to produce IEEE(31).]

Figure 14. TMS320C30 Bit 22 to IEEE Bit 22
[Data selector diagram. Data inputs: TMS320C30(22) (ab), "1" (aB), "0" (Ab), and TMS320C30NEG(22) (AB). Output: IEEE(22).]
B = CASE7 | CASE9 | CASE11
b = !B
a = CASE6 | CASE7 | CASE9
A = !a

Figure 15. TMS320C30 Bit n to IEEE Bit n, Where $21 \geq n \geq 1$
[Data selector diagram. Data inputs: TMS320C30(n+1), TMS320C30(n), "0", TMS320C30NEG(n+1), and TMS320C30NEG(n). Output: IEEE(n).]
C = CASE6 | CASE9
c = !C
b = CASE6 | CASE7 | CASE11
B = !b
A = CASE11
a = !A

Figure 16. TMS320C30 Bit 0 to IEEE Bit 0
[Data selector diagram. Data inputs: TMS320C30(0), TMS320C30(1) (aB), "0" (Ab), and TMS320C30NEG(1) (AB). Output: IEEE(0).]
B = CASE7 | CASE9
b = !B
a = CASE6 | CASE7 | CASE11
A = !a

Scope of Conversion
This section describes the actions taken by the converter when it converts to and from the
IEEE format. When there is not a match between formats, the converter forces the translated number to the closest approximation.

IEEE-to-TMS320C30 Exceptions
The match is not exact in translating from four sets of IEEE numbers to TMS320C30 numbers. They are: NaN, ± infinity, ± zero and denormalized numbers too small to represent.

NaN (Not a Number)
The NaN format is especially useful in passing commands to another process. So that commands can be passed through the converter, NaNs are not converted. However, the bit positions of
the sign and exponent bits are altered. That is, the sign bit of the IEEE number is transferred to the
sign bit of the TMS320C30 format. Likewise, the exponent field is transferred. In this way, the sign
of the NaN is preserved which may aid in quick detection of the code. In other words, the
TMS320C30 Branch on Positive instruction (BP) or Branch on Negative instruction (BN) are effective. So that the command can be acted on quickly, a NaN interrupt is generated.
± Infinity
When positive or negative infinity is passed through the converter, the most positive or negative TMS320C30 number is produced.

Denormalized numbers whose magnitude $< 2^{-126}$
Half of the denormalized IEEE numbers are out of range of TMS320C30 numbers. These
denormalized numbers have very small magnitudes and are therefore forced to zero when converted.
± Zero

The IEEE format includes representations for positive and negative zero, but the
TMS320C30 format does not. The converter forces each of these numbers to the singular
TMS320C30 zero format.

TMS320C30-to-IEEE Exceptions
There are two sets of TMS320C30 numbers that do not perfectly match IEEE numbers. One
set consists of a single value ($-2^{128}$). The other consists of numbers converted to IEEE denormalized numbers.
$-2^{128}$

The single value, $-2^{128}$, is a very large negative number. When this number is translated, negative infinity is produced.
Numbers Translated to Denormalized Values
When the exponent is -127, denormalized IEEE numbers are produced, and one least significant bit of accuracy is lost. This occurs because the TMS320C30 mantissa must be right-shifted
one bit in order that the exponent be increased to -126, which is the most negative exponent the
IEEE format can use.
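
The size of the lost bit can be worked out exactly. The sketch below is a numerical illustration, not part of the converter: it assumes a positive TMS320C30 number with exponent -127 and 23-bit fraction f, models the right shift as the Case 7 entry of Table 7 ((fC30/2) + 400000h with the shifted-out bit discarded), and compares exact rational values. All function names are hypothetical.

```python
from fractions import Fraction

def c30_value_exp_m127(f):
    """Exact value of a positive TMS320C30 number with exponent -127, fraction f."""
    return (1 + Fraction(f, 2**23)) * Fraction(1, 2**127)

def to_denormal_fraction(f):
    """Right-shift the mantissa one bit; the implied 1 becomes IEEE bit 22."""
    return (f >> 1) + 0x400000

def ieee_denormal_value(m):
    """Exact value of a positive IEEE denormalized number with fraction m."""
    return Fraction(m, 2**23) * Fraction(1, 2**126)

def conversion_error(f):
    """Difference between the TMS320C30 value and its IEEE denormalized image."""
    return c30_value_exp_m127(f) - ieee_denormal_value(to_denormal_fraction(f))
```

With this model, fractions with an even value convert exactly, and fractions with an odd value are off by $2^{-150}$, which is one least significant bit of the TMS320C30 mantissa at exponent -127.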

Converter Operating Modes
The converter is controlled by the TMS320C30. Conversions occur when the converter's
output enable pin (OE) is active (i.e., low) and the TMS320C30 performs a read or write over its
primary (STRB active) or expansion (MSTRB active) buses. This requires the converter to be
placed directly between the TMS320C30 and external memory. That memory is where IEEE data
will be stored. If direct (i.e., no conversion wanted) access to that memory is desired, transceivers
like the SN74LS245 should be added in parallel with the converter. However, doing so requires that
only one data path be enabled at a time. If unused, one of the XF pins of the TMS320C30 can be
dedicated to perform this selection.
During a read, data is converted from IEEE format to TMS320C30 format. During a write,
data is converted from TMS320C30 format to IEEE format. This will happen if the TMS320C30
R/W or XR/W pin is tied to the converter's direction (DIR) pin. Table 8 shows how to put the converter into its two operating modes and briefly describes each mode.

Table 8. Converter Operating Modes

Mode      Pin     Description
Memory    PIPE=0  Flow-Through Conversion Enabled - In this mode, the converter essentially
                  behaves like a simple bus transceiver, such as an SN74LS245, except with an
                  integrated floating-point format converter. When this mode is used, conversions
                  take two cycles. Because of this, the converter automatically generates a
                  wait state, which will halt the TMS320C30 for one cycle until the conversion
                  is complete.
Pipeline  PIPE=1  Converter's Pipeline Registers Enabled Internally - This mode permits
                  single-cycle conversion. As one data value is being converted, a previously
                  converted value is output.

Memory Mode Operation
In this mode, one wait cycle is automatically generated during conversions from
• IEEE format to TMS320C30 format (reads)
• TMS320C30 format to IEEE format (writes)
The converter will not generate wait cycles of any other length and requires that the
TMS320C30 H1 clock pin be tied to the converter's CLK pin. Figure 17 shows the timing diagram
for this mode of operation.

Figure 17. Memory Mode Timing Diagram
[Timing diagram. Signals: CLK, OE, DIR, WAIT, DA(31:0), and DB(31:0). Bus labels: 'C30 OUT, IEEE IN, TMS320C30 IN, and IEEE OUT.]

Pipelined Operation
Pipeline mode permits consecutive conversions every instruction cycle without wait cycles.
However, because the pipeline has two internal stages, it takes two consecutive occurrences of the
same operation (i.e., two reads or two writes) before it is filled. Therefore, the first read after a transition from a write will not provide properly converted data, and vice versa.
There is an address skew of one address when consecutive data values are converted. This
should not be a major problem when blocks of memory are converted. The only added task will
be to perform one extra transfer (read or write) to convert the last value remaining in the pipeline.
With this exception, operation is identical to the Memory mode. Figure 18 shows a timing diagram
for this mode of operation.
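
The address skew can be pictured with a simple behavioral model. The sketch below is a deliberate simplification and not a description of the device: it assumes that each read returns the value converted on the previous read (a one-access delay as seen by software), and it uses a stand-in `convert` callable instead of the real format conversion. The class and function names are hypothetical.

```python
class PipelinedConverter:
    """Toy model: each read returns the result of the previous read's conversion."""

    def __init__(self, convert):
        self.convert = convert      # stand-in for the format conversion
        self.stage = None           # value converted on the previous access
        self.reads = 0              # number of bus reads performed

    def read(self, raw):
        self.reads += 1
        out, self.stage = self.stage, self.convert(raw)
        return out


def convert_block(converter, raw_values):
    """Convert a block with one priming read and one extra read to flush."""
    results = []
    converter.read(raw_values[0])                   # prime; returned value is junk
    for raw in raw_values[1:]:
        results.append(converter.read(raw))         # result lags by one access
    results.append(converter.read(raw_values[-1]))  # extra read; address immaterial
    return results
```

In this model, a block of N values costs N + 1 reads, which matches the single extra transfer described above.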

Figure 18. Pipeline Mode Timing Diagram
[Timing diagram. Signals: CLK, DIR, WAIT, A(31:0), and B(31:0). Bus labels: IEEE IN 1, IEEE IN 2, IEEE IN 3, 'C30 OUT 1, 'C30 OUT 2, 'C30 IN 1, 'C30 IN 2, and IEEE OUT 1.]

Interrupts
The converter automatically generates an interrupt whenever the conversion of an IEEE
number classified as Not a Number (NaN) is attempted. The interrupt pulse is 1.5 H1 cycles wide.
This is compatible with the TMS320C30 edge-triggered interrupt types. Table 9 shows this interrupt and its trigger. Note that the converter does not change the value of the NaN, but it does alter
its bit positions. This assures that the sign bit of the IEEE number remains a sign bit in the
TMS320C30 format. The same is true of the exponent field. The fractional field is left unchanged.
If NaN is used to pass a code or command to the TMS320C30, interpretation of the code requires
only the alteration of the comparison mask in software. For more information, refer to the previous
subsection NaN (Not a Number).

Table 9. NaN Interrupt

Name    Function        Sources
NAN     Not a Number    IEEE CASE1: NaN

Software Application Examples
Simple Nonpipelined Conversion
If an external device (e.g., RAM, ROM, dual-bus RAM, or a latch) contains a single-precision
IEEE floating-point number and the corresponding TMS320C30 number is needed, the following
TMS320C30 code will perform the required conversion:

EXTD    .word   0800000h        ; put address of external device here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDF     *AR0,R0         ; R0=C30 formatted number

The following example performs TMS320C30-to-IEEE format conversion:

EXTD    .word   0800000h        ; put address of external device here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        STF     R0,*AR0         ; location pointed to by AR0=IEEE formatted
                                ; number
*

Simple Pipelined Conversion
This example illustrates the overhead when the converter's pipeline mode is used. Since a
single value will be converted, it is necessary to read the converter one extra time to flush the
pipeline. Once again, assume that an external device (e.g., RAM, ROM, dual-bus RAM, or a latch)
contains a single-precision IEEE floating-point number, and the corresponding TMS320C30 number
is needed.

EXTD    .word   0800000h        ; put address of external device here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDF     *AR0,R0         ; ignore loaded value, 1st load queues
                                ; pipeline
        LDF     *AR0,R0         ; R0=C30 formatted number, address is
                                ; immaterial
*

The following example performs TMS320C30-to-IEEE format conversion:

EXTD    .word   0800000h        ; put address of external device here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        STF     R0,*AR0         ; value stored not correct until 2nd store
        STF     R0,*AR0         ; location pointed to by AR0=IEEE formatted
                                ; number
*

Pipelined Block Conversions
In the previous subsection, the pipeline was used, but not efficiently. This example shows a
more typical application of pipeline mode. Again, external memory contains IEEE formatted data.

N       .set    03FFh           ; N = # of values to convert - 1
EXTD    .word   0800000h        ; put external address here
DADR    .word   0809800h        ; put destination address here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDI     @DADR,AR1       ; load AR1 w/destination address
        LDF     *AR0++,R0       ; prime (preload) the converter's pipeline
        LDI     N,RC            ; block will be repeated N+1 (0400h) times
        RPTB    RCR             ; specify end address of block repeat
        LDF     *AR0++,R0       ; read converted values into R0
RCR:    STF     R0,*AR1++       ; store converted values into on-chip
                                ; memory
*

This is more efficient:

N       .set    03FEh           ; N = # of values to convert - 2
EXTD    .word   0800000h        ; put external address here
DADR    .word   0809800h        ; put destination address here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDI     @DADR,AR1       ; load AR1 w/destination address
        LDF     *AR0++,R0       ; prime (preload) the converter's pipeline
        LDF     *AR0++,R0       ; read 1st converted value for 1st STF
        RPTS    N               ; repeat next instruction N+1 (03FFh)
                                ; times, extra loop is to store last
                                ; value converted
        LDF     *AR0++,R0       ; read converted values into R0
||      STF     R0,*AR1++       ; store converted values into on-chip
                                ; memory, 1st store will save junk
*

The following example performs TMS320C30-to-IEEE format conversion:

N       .set    0400h           ; N equals number of values to convert
EXTD    .word   0800000h        ; put external address here
SADR    .word   0809800h        ; put source data address here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDI     @SADR,AR1       ; load AR1 w/source data address
        LDI     N,RC            ; block will be repeated N+1 (0401h) times,
                                ; extra loop is to store last value
                                ; converted
        RPTB    AC              ; specify end address of block repeat
        LDF     *AR1++,R0       ; read TMS320C30 format numbers into R0
AC:     STF     R0,*AR0++       ; store converted values into external
                                ; device

*
This is more efficient:

N       .set    03FFh           ; N equals number of values to convert - 1
EXTD    .word   0800000h        ; put external address here
SADR    .word   0809800h        ; put source data address here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDI     @SADR,AR1       ; load AR1 w/source data address
        LDF     *AR1++,R0       ; read 1st value for 1st STF
        RPTS    N               ; repeat next instruction N+1 (0400h) times,
                                ; extra loop is to store last value
                                ; converted
        LDF     *AR1++,R0       ; read TMS320C30 format numbers into R0
||      STF     R0,*AR0++       ; store converted values into external
                                ; device
        STF     R0,*AR0++       ; store last value

Using TMS320C30 External Flag 0 (XFO)
As mentioned in the section on converter operating modes, one of the TMS320C30's XF pins
can be tied to the converter's output enable (OE) pin to enable the data path through the converter
or to bypass it, as the case may be. The following TMS320C30 code uses the TMS320C30 XF0
pin to do this (see the Hardware Application Examples section later in this report for the hardware
configuration). Nonpipelined mode is assumed.

N       .set    03FFh           ; N equals number of values to convert - 1
EXTD    .word   0800000h        ; put external address here
SADR    .word   0809800h        ; put source data address here
*
        LDI     @EXTD,AR0       ; load AR0 w/address of external device
        LDI     @SADR,AR1       ; load AR1 w/source data address
        LDI     2,IOF           ; XF0=output=0, select the converter
        LDF     *AR0++,R0       ; read 1st converted value for 1st STF
        RPTS    N               ; repeat next instruction N+1 (0400h) times
        LDF     *AR0++,R0       ; read converted values into R0
||      STF     R0,*AR1++       ; store converted values into on-chip
                                ; memory, 1st store will save junk
        LDI     6,IOF           ; XF0=output=1, deselect the converter

Using the TMS320C30 DMA Capability
The built-in TMS320C30 DMA controller can be used to read converted IEEE values. The
TMS320C30 assembly code to set up the DMA is shown below. Nonpipelined mode is assumed.

DMA     .word   0808000h        ; base address of DMA registers
GLBL    .word   0C53h           ; DMA global register init value
N       .set    0400h           ; N equals number of values to convert
EXTD    .word   0800000h        ; put external address here
DADR    .word   0809800h        ; put destination data address here
*
*       DMA controller setup
*
        LDI     @DMA,AR0        ; AR0 -> DMA control registers
        LDI     @EXTD,R0        ; R0 = address of IEEE data
        LDI     @DADR,R1        ; R1 = converted data destination address
        LDI     N,R2            ; R2 = DMA transfer count
        LDI     @GLBL,R3        ; R3 = DMA Global register initial value
        STI     R0,*+AR0(4)     ; DMA will transfer from external device
        STI     R1,*+AR0(6)     ; DMA will transfer to RAM block 0
        STI     R2,*+AR0(8)     ; DMA will transfer N values
        STI     R3,*AR0         ; start the DMA

Hardware Application Examples
IEEE Data Stored in TMS320C30 External MSTRB Memory
Below is shown an example of interfacing the converter to TMS320C30 external memory
containing only IEEE formatted data. In this configuration, it is likely that the memory would be
dual bus RAM to enable a second processor to share data with the TMS320C30 through this
memory. Figure 19 shows an interface to a static RAM (SRAM) bank.

Figure 19. Interface to Static RAM
[Block diagram: the IEEE converter placed between the TMS320C30 expansion bus and an 8K x 32 SRAM. TMS320C30 signals: XA(12:0), MSTRB, H1, XRDY, XD(31:0), and XR/W. Converter signals: OE, CLK, WAIT, DA(31:0), DB(31:0), and DIR. SRAM signals: ADDR(12:0), CS, DATA(31:0), WE, and OE. An SN74ALS04 inverter is also shown.]

Bypassing the Converter
A previous subsection (Using TMS320C30 External Flag 0) showed TMS320C30 assembly
code that used the TMS320C30 XFO pin either to steer data through the converter or to bypass the
converter for direct, or unconverted, access to that memory. Figure 20 shows a circuit that can be
used with that code.

Figure 20. Steered Access to the Memory
[Block diagram: the circuit of Figure 19 with four SN74ALS245 transceivers (pins A(8:1), B(8:1), G, and DIR) added alongside the converter, an SN74ALS32 gate, and the TMS320C30 XF0 pin. TMS320C30 signals: XA(12:0), MSTRB, XF0, XD(31:0), XRDY, H1, and XR/W. Converter signals: OE, DA(31:0), WAIT, DB(31:0), CLK, and DIR. SRAM (8K x 32) signals: ADDR(12:0), CS, DATA(31:0), WE, and OE. An SN74ALS04 inverter is also shown.]

JTAG/IEEE-1149.1 Scan Interface
Integrated circuit and board-level testing is increasingly important. JTAG or IEEE-1149.1
is a standard test methodology. It is based on a 4-wire connection to a device and provides access
to all I/O buffers (boundary scan) of a device. This permits stimulation and observation of internal
logic. By allowing stimulation of output pins and observation of input pins, external circuitry can
also be tested. If implemented completely, this can eliminate "bed of nails" test rigs.
The TMS320C30-IEEE Floating-Point Format Converter is equipped with a JTAG/
IEEE-1149.1 compatible scan interface. The internal architecture is based on Texas Instruments'
SCOPE(TM) design specifications. This provides for boundary-scanning of the device and inclusion
of an eight-bit instruction register.
Figure 21 shows the internal scan architecture and gives the naming conventions used to describe the device blocks:

Figure 21. Scan Architecture
[Block diagram. Blocks: Boundary Data Register, Bypass Data Register, Instruction Register, and TITAP. Pins: TDI, TDO, TMS, TCK, and TIP.]

I/O Pin Description

TCK
The TCK input clock signal is the scan clock. It typically will be generated off-board by a
test controller. All tests of the device are controlled by an external controller and proceed at the scan
clock (TCK) speed.

TMS
The TMS input signal is clocked in by TCK. TMS controls the test mode of the device. Using
TMS and TCK, a test controller can scan registers through the device, perform tests, or place the
device in a normal functional mode.


TDI
The TDI input signal is used to input serial data through the registers in the device. All data
is clocked in by TCK and shifts according to the state of the test logic set up by an external test controller using TMS and TCK.

TDO
The TDO output signal is used to scan serial test data out of the device under the control of
the test host. While shifting data, TDO is active, shifting data out on the falling edge of TCK. When
through shifting data, TDO is tri-stated.

TIP
TIP is an output indicating good or bad parity in the instruction register. The indication defaults to good if the external controller does not check for parity. To check parity, the test controller
places the device in the instruction register pause state. While in this state, the device will output
the actual (i.e., hardware-determined) parity of the device's instruction register. A high logic level
indicates good parity, while a low logic level indicates bad parity.

Architectural Elements
TITAP
The Texas Instruments' Test Access Port (TITAP) is a 16-state state-machine designed according to the JTAG and IEEE-1149.1 specifications. The TITAP controls the test logic and is controlled by the TMS and TCK inputs to the device from an external test host controller.

Instruction Register
The Instruction Register is eight bits in length. Table 10 lists the instructions available for
this device.

Table 10. Test Instructions

msb -> lsb    Instruction
00000000      Boundary Scan
10000001      ID Register Scan
10000010      Sample Boundary Scan
00000011      Boundary Scan
00000110      Control Boundary HI-Z
10000111      Control Boundary 1/0
00001010      Read Boundary-Normal
10001011      Read Boundary-Test
00001100      Boundary Selftest
11111111      Bypass Scan
All Others    Bypass Scan

The Instruction Register is preloaded with 00000001 (msb -> lsb) in the instruction register
capture state of the TITAP. This is not per the JTAG/IEEE-1149.1 standards.
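
A test host could decode the instruction register contents with a lookup like the sketch below. The opcode-to-name mapping is copied from Table 10, and unlisted opcodes fall back to Bypass Scan as the table states. The parity helper reflects only an observation about the table (every listed opcode contains an even number of 1 bits); whether the TIP pin checks exactly this is an assumption. All names are hypothetical.

```python
INSTRUCTIONS = {
    0b00000000: "Boundary Scan",
    0b10000001: "ID Register Scan",
    0b10000010: "Sample Boundary Scan",
    0b00000011: "Boundary Scan",
    0b00000110: "Control Boundary HI-Z",
    0b10000111: "Control Boundary 1/0",
    0b00001010: "Read Boundary-Normal",
    0b10001011: "Read Boundary-Test",
    0b00001100: "Boundary Selftest",
    0b11111111: "Bypass Scan",
}


def decode_instruction(opcode):
    """Return the Table 10 instruction name; all other opcodes select Bypass Scan."""
    return INSTRUCTIONS.get(opcode & 0xFF, "Bypass Scan")


def has_even_parity(opcode):
    """True if the 8-bit opcode contains an even number of 1 bits."""
    return bin(opcode & 0xFF).count("1") % 2 == 0
```

For example, `decode_instruction(0b00000110)` returns "Control Boundary HI-Z", and an opcode that is not in the table, such as `0b01010101`, decodes as "Bypass Scan".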

Boundary Scan Instruction
This instruction places the device in test mode: all function inputs and outputs are controlled
by the test logic. Function inputs and outputs are sampled in the data register capture state of the
TITAP, and the boundary data register is selected in the data register scan path during data register
scans.

ID Register Scan Instruction
This instruction places the device in normal mode: all function inputs and outputs operate
in their normal modes. The bypass data register is selected in the data register scan path during data
register scans.

Sample Boundary Scan Instruction
This instruction places the device in normal mode: all function inputs and outputs operate
in their normal modes. Function inputs and outputs are sampled in the data register capture state
of the TITAP, and the boundary data register is selected in the data register scan path during data
register scans.

Control Boundary HI-Z Instruction
This instruction places the device in test mode: all function outputs are tri-stated (if possible),
while all function inputs operate in their normal mode. The bypass data register is selected in the
data register scan path during data register scans.

Control Boundary 1/0 Instruction
This instruction places the device in test mode: all function inputs and outputs are controlled
by the test logic. The bypass data register is selected in the data register scan path during data register scans.

Read Boundary - Normal Instruction
This instruction places the device in normal mode: all function inputs and outputs operate
in their normal modes. The boundary data register retains its current state in the data register capture
state of the TITAP, and the boundary data register is selected in the data register scan path during
data register scans.

Read Boundary - Test Instruction
This instruction places the device in test mode: all function inputs and outputs are controlled
by the test logic. The boundary data register retains its current state in the data register capture state
of the TITAP, and the boundary data register is selected in the data register scan path during data
register scans.

Boundary Self-Test Instruction
This instruction places the device in normal mode: all function inputs and outputs operate
in their normal modes. The boundary data register contents are toggled in the data register capture
state of the TITAP, and the boundary data register is selected in the data register scan
path during data register scans.

Bypass Scan Instruction

This instruction places the device in normal mode: all function inputs and outputs operate
in their normal modes. The bypass data register is selected in the data register scan path during data
register scans.

Boundary Data Register
The boundary data register contains 70 bits and is ordered according to Figure 22.

Figure 22. Scan Path Bit Order
TDI -> DIR -> PIPE -> CLK -> OEZ -> NAN -> WAIT ->
DA31 -> DA30 -> ... -> DA1 -> DA0 ->
DB31 -> DB30 -> ... -> DB1 -> DB0 -> TDO
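
The 70-bit length follows directly from the scan path order: six control cells plus two 32-bit data buses. The short sketch below builds the cell list in the order given in Figure 22, which can be handy when mapping shifted bits back to pins. The function name is hypothetical, and which end of the list is shifted out first depends on the test equipment, so that is left to the reader.

```python
def boundary_scan_order():
    """Return the boundary data register cells in scan order, from TDI to TDO."""
    control = ["DIR", "PIPE", "CLK", "OEZ", "NAN", "WAIT"]
    bus_a = [f"DA{n}" for n in range(31, -1, -1)]   # DA31 down to DA0
    bus_b = [f"DB{n}" for n in range(31, -1, -1)]   # DB31 down to DB0
    return control + bus_a + bus_b
```

Calling `len(boundary_scan_order())` gives 70, in agreement with the stated register length.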

Bypass Data Register
The Bypass Data Register is one bit in length and is operated in accordance with the JTAG/
IEEE-1149.1 specifications.

Scan References
Refer to the following documents for further descriptions of the test logic of this device:
1) A Test Access Port and Boundary Scan Architecture; Technical Sub-Committee of the
Joint Test Action Group (JTAG).
2) IEEE Standard 1149.1- IEEE Standard Test Access Port and Boundary-Scan Architecture.


Part IV. Telecommunications
11. Implementation of a CELP Speech Coder for the TMS320C30 Using SPOX
(Mark D. Grosen)


Implementation of a CELP
Speech Coder for the TMS320C30
Using SPOX

Mark D. Grosen
Spectron Microsystems, Inc.


Introduction
Speech coders are critical to many speech transmission and store-and-forward systems. With
the emergence of universal standards, it is possible to develop systems that are interoperable. Quality and bit rate for speech coders vary from toll quality at 32 kilobits/second (kbps) (CCITT
ADPCM) to intelligible quality at 2.4 kbps (DoD LPC-10). Recently, a new standard for 4.8 kbps
with near toll-quality has been proposed and is based on code-excited linear prediction (CELP)
techniques [1,2]. Unfortunately, products based on new coding algorithms are often slow to appear
because of the considerable time and effort required to develop real-time implementations.
The purpose of this article is to demonstrate how a CELP coder based on this new standard
can be quickly developed using SPOX. Utilizing the power of the TMS320C30 DSP plus the ease
of use provided by C and the SPOX DSP library, an efficient and portable coder can be written in
a much shorter period of time than that required by conventional assembly language methods. Because of the portability of SPOX and C, the coder can also be compiled and executed on a variety
of hardware platforms.

A 4.8-kbps CELP Coder
CELP coders were first introduced by Atal and Schroeder in 1984 [3]. These coders offer
high quality at low bit rates, but at a high computational cost. Implementing the original systems
directly required several hundred million instructions per second (MIPS). Much of the research on
CELP techniques has concentrated on reducing this computational load to facilitate real-time implementations.
The proposed U. S. Federal Standard 4.8-kbps CELP coder (USFS CELP), Version 2.3, uses
several techniques to reduce the complexity to a level where a one- or two-processor implementation is possible. These are the main characteristics of the coder:
• 240-sample frame size at 8-kHz sampling rate
• Tenth-order short-term predictor
  - Calculated once per frame, open loop
  - Autocorrelation with Hamming window
  - LSP quantization
• Four subframes (60 samples)
  - One-tap pitch predictor
    1) Closed-loop analysis
    2) Even/odd subframe delta search method
  - 1024-element codebook
    1) Overlapped by 2 (see Pitch and Codebook Search)
    2) 75% of elements are zero
Block diagrams of the decoder and encoder are shown in Figure 1.

Figure 1. USFS CELP Decoder and Encoder Structures

[Block diagrams not reproduced. Decoder labels: FROM CHANNEL, INDEX, CODEBOOK, ADAPTIVE POSTFILTER, SYNTHESIZED SPEECH. Encoder labels: INPUT SPEECH, LSP CODES, DELAY/GAIN, INDEX/GAIN, TO CHANNEL.]

Bit allocations are given in Table 1 [2,4].

Table 1. 4.8-kbps CELP Parameters

              Spectrum               Pitch              Codebook
Update        30 ms (240 samples)    7.5 ms (60)        7.5 ms (60)
Parameters    10 LSP                 1 delay, 1 gain    1 of 1024 index, 1 gain
Bps           1133.3                 1466.7             2000

Remaining 200 bps reserved for expansion, error protection, and synchronization
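
The rates in Table 1 can be cross-checked against the 4.8-kbps total by converting each to bits per 30-ms frame. The C sketch below does only that arithmetic; the function names are hypothetical, and the per-frame bit counts it produces are derived from the table's rates rather than quoted from the standard.

```c
#include <assert.h>

#define FRAME_MS 30   /* frame length from Table 1 */

/* Convert a rate in bits per second to bits per 30-ms frame,
 * rounded to the nearest whole bit. */
int bits_per_frame(double bps)
{
    assert(bps >= 0.0);
    return (int)(bps * FRAME_MS / 1000.0 + 0.5);
}

/* Sum the four allocations of Table 1, in bits per frame. */
int total_bits_per_frame(void)
{
    return bits_per_frame(1133.3)   /* spectrum (LSPs) */
         + bits_per_frame(1466.7)   /* pitch delay and gain */
         + bits_per_frame(2000.0)   /* codebook index and gain */
         + bits_per_frame(200.0);   /* reserved */
}
```

The four entries come to 34, 44, 60, and 6 bits, or 144 bits per frame, and 144 bits every 30 ms is 4800 bps.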


The standard also specifies an error protection scheme utilizing forward error-correcting
Hamming code and parameter smoothing.
The major computational parts of the algorithm are the pitch search and the codebook search,
both of which are performed four times per frame. An important technique to reduce the computations is the end-correction convolution technique (see Pitch and Codebook Search). This is a recursive convolution method that reduces the number of multiply-adds by an order of magnitude.
In addition, the codebook is designed to have approximately 75% of the samples equal to
zero. This allows many of the convolution updates in the codebook search to be reduced to a simple
shift of a vector of samples. On DSP processors with circular addressing, this shift can be replaced
by using circular buffers.
To further reduce complexity, the pitch search is limited in range for every other subframe.
During even-numbered subframes, the search for the optimal pitch value is performed over the range 20 to 147
(128 values). On the odd subframes, the search is only over a range of 16 from the previous pitch
value. This also decreases the bit rate with a negligible effect on speech quality.
If adequate processing power is not available, you can implement an interoperable coder by
using a subset of the full codebook. For example, if only the first 128 vectors from the codebook
could be used, the sub-optimal coder would work with an optimal coder if the same frame structure
and bit rate were used.
These techniques produce complexity estimates for the USFS CELP coder ranging from 5.3
MIPS to 16.0 MIPS for a 128-vector and 1024-vector codebook, respectively[4].

Using SPOX in Development
The computational complexity of CELP coders, even with use of the various techniques to
reduce it, has made real-time implementations impractical on first- and second-generation DSPs.
The recent introduction of the third-generation TMS320C30 [5], however, makes it feasible to implement the USFS CELP coder with one or two processors. Furthermore, because of the general-purpose capabilities of the TMS320C30 and the availability of a C compiler and SPOX, development of a real-time coder can be significantly expedited.
In particular, SPOX provides the following functions to facilitate software development.
• C standard I/O functions
  - printf(), scanf()
  - fopen(), fread(), fwrite()
• Stream I/O to move data efficiently
• Standard set of DSP math functions
  - Filters
  - Vector operations
  - Windows
  - Levinson-Durbin algorithm
• Processor independence
Both FORTRAN and C versions of the Version 2.3 USFS CELP coder were available as starting points for the real-time implementation. The initial development was done on a Sun workstation equipped with SPOX/SUN [6] and the usual UNIX programming tools, such as the symbolic
debugger dbx. SPOX/SUN is a library of SPOX DSP math functions that can be used for developing SPOX applications on Sun workstations. The new version of the coder utilizing SPOX was
checked against the existing implementation for correctness. After the new version was debugged
on the workstation, the source code was recompiled employing the Texas Instruments TMS320C30
C compiler and linked with the SPOX/XDS library for the XDS1000 development system.
The same facilities for testing the code on the workstation were available on the XDS1000.
A SPOX stream function (see Input/Output section) read digitized speech from a disk file. Status
information was printed to the console screen. Command line arguments were used to vary the encoder's parameters such as the codebook size.
The software development process for the USFS CELP coder followed three evolutionary
steps:
• C program using standard I/O
• C program using SPOX functions for faster math and I/O
• C program using SPOX and assembly language optimizations
The first step was taken because an existing C implementation was available. The C standard
I/O provided by SPOX made it possible to run the application code written in C directly on the
XDS1000. For example, functions such as fscanf() that read control information from a disk file on the
Sun also worked on the XDS1000 using the PC's hard disk.
In general, it would have been easier to start with the SPOX library functions to implement
some of the common operations contained in the coder. Many of the functions needed (filtering,
correlation, dot-product) are in the SPOX DSP library. In this case, the C implementations of these
standard vector and filter functions in the existing program were replaced with the corresponding
SPOX functions. The SPOX functions, written in optimized assembly language, execute several
times faster than the corresponding C functions.
The last step was needed to meet real-time constraints. XDS1000 timing capabilities allowed
the identification of two time-critical sections of the code which were then rewritten in
TMS320C30 assembly code. Since the interface to the SPOX math functions is open, new math
functions can be written that work with SPOX data structures such as vectors and filters.

Implementation
Several major parts of the USFS CELP encoder are implemented with a mixture of C, SPOX,
and TMS320C30 assembly language functions. The decoder can be easily constructed from the
material presented here. An adaptive postfilter for the decoder is not described here.
The framework of the resulting encoder is shown in Figure 2. A description of the major
functions performed can be found in the following sections. Appendix A provides a short summary
of the SPOX functions employed in the next four sections (Input/Output, Spectrum Analysis, Filters, and Pitch and Codebook Search).


Figure 2. Structure of the Encoder Function
encoder(instream, outstream)
    SS_Stream   instream;
    SS_Stream   outstream;
{
    while ( SS_get(instream, SV_array(speech)) ) {

        /* Apply a high pass filter to the input speech */
        SF_apply(hpfilter, speech, speech);

        /* Find the coefficients of the short-term prediction filter */
        calculateLP(speech, invcoeffs);

        /*
         * Convert the direct form coefficients to line spectrum pairs.
         * Then quantize the LSP's and convert back to direct form.
         */
        SV_a2lsp(invcoeffs, lsps);
        quantizeLSP(lsps, qntzlsps);
        SV_lsp2a(qntzlsps, invcoeffs);

        /*
         * For each of the 4 subframes, determine the pitch prediction
         * parameters and codebook (excitation) parameters
         */
        for (i = 0; i < 4; i++) {
            genShortResidual(s[i], res[i]);   /* generate short term residual */
            pitchSearch(s[i], res[i]);        /* find optimum pitch predictor */
            genFullResidual(s[i], res[i]);    /* generate residual */
            codeSearch(res[i], reshat);       /* find best codebook vector */
            updateFilters(reshat);            /* update filter states */
        }

        packParams();               /* pack parameters into output array */
        SS_put(outstream, params);
    }
}

Input/Output

Input speech samples are obtained by employing a function (SS_get()) which reads data
from a named stream (instream). The creation of instream during program initialization determines the source of the data. During development, the easiest source is a disk file with digitized
speech. When real-time testing is needed, a codec connected to a TMS320C30 serial port could be
utilized. For example, instream could be created to read from standard input with the following
code segment.
#define FRAMESIZE   (240 * sizeof(Float))

instream = SS_create(OF_FILE, OF_STDIN, FRAMESIZE, NULL);

The output stream (outstream) consists of the packed frame parameters. It could also go to
a disk file or a serial port by using SS_put().

Spectrum Analysis
After preconditioning the signal with a highpass filter (see the Filters section), the coefficients of the short-term prediction filter can be found by using the function calculateLP() shown
below.

SV_Vector   window, rc, error, cor, gammavec;

calculateLP(s, coeffs)
    SV_Vector   s, coeffs;
{
    SV_window(s, window, s);              /* window the speech in-place */
    SV_corr(s, s, cor);                   /* autocorrelation */
    SV_autorc(cor, coeffs, rc, error);    /* Levinson-Durbin */
    SV_mul2(gammavec, coeffs);            /* bandwidth expansion */
}

The vector window is initialized to contain the desired window; in this case, a Hamming window is used. The autocorrelation terms are stored in the vector cor that has the same length as the
order of the short term filter. SV_autorc() uses a Levinson-Durbin type algorithm to compute the
inverse filter coefficients. As a side effect, the reflection coefficients are also stored in rc. Finally,
a 15-Hz bandwidth expansion is produced by the multiplication of the inverse filter coefficient vector by a vector (gammavec) consisting of the terms

$$ g[i] = 0.994^{i} \quad \text{for } i = 0, 1, \ldots, m-1 $$
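
For readers without the SPOX library at hand, the Levinson-Durbin and bandwidth-expansion steps can be sketched in portable C. This is not the SPOX implementation of SV_autorc() or SV_mul2(): the function names, the sign convention of the reflection coefficients, and the assumption that m equals the filter order plus one are all choices made for this illustration.

```c
#include <assert.h>

#define ORDER 10   /* predictor order from the text */

/*
 * Levinson-Durbin recursion. r[0..ORDER] are autocorrelation lags.
 * On return a[0..ORDER] holds the inverse filter A(z) with a[0] = 1,
 * and rc[0..ORDER-1] holds the reflection coefficients (the sign
 * convention here is an assumption). Returns the final prediction
 * error energy.
 */
double levinson(const double r[ORDER + 1], double a[ORDER + 1], double rc[ORDER])
{
    double err = r[0];
    double tmp[ORDER + 1];
    int i, j;

    assert(r[0] > 0.0);
    a[0] = 1.0;
    for (i = 1; i <= ORDER; i++)
        a[i] = 0.0;

    for (i = 1; i <= ORDER; i++) {
        double acc = r[i];
        for (j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        rc[i - 1] = -acc / err;

        /* Update a[1..i-1] from the previous-order solution. */
        for (j = 1; j < i; j++)
            tmp[j] = a[j] + rc[i - 1] * a[i - j];
        for (j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = rc[i - 1];
        err *= 1.0 - rc[i - 1] * rc[i - 1];
    }
    return err;
}

/* Bandwidth expansion: scale a[i] by 0.994^i, following the g[i]
 * definition in the text. */
void expand_bandwidth(double a[ORDER + 1])
{
    double g = 1.0;
    int i;

    for (i = 0; i <= ORDER; i++) {
        a[i] *= g;
        g *= 0.994;
    }
}
```

As a sanity check, autocorrelation lags of the form 0.5^k describe a first-order process, so the recursion should return a[1] = -0.5 with all higher coefficients zero, and the expansion then scales a[1] to -0.497.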

Efficient quantization is obtained by:
• Transforming the prediction coefficients into line spectrum pairs (LSPs)
• Then quantizing the LSPs
The conversions between prediction coefficients and LSPs are not currently in the SPOX library. The existing C implementation evaluates cosine values directly, which is too expensive computationally. A more efficient routine (SV_a2lsp()), which employs table lookup of cosine values,
has been written utilizing the algorithm outlined in [7]. The quantized LSPs are transformed back
to direct-form coefficients for use in the short-term predictor.

Filters
Three filters in the encoder can be realized by use of SPOX filter objects. The inverse filter
A(z) and the short-term predictor 1/A(z) share the same filter coefficients. The former is an FIR filter
and the latter an all-pole filter. The final filter is the all-pole weighting filter W(z) with coefficients
given by 1/A(λz), with λ = 0.8.
During the initialization of the encoder, the filters are created with the code fragment shown
below.
#define FILTERSIZE  (11 * sizeof(Float))

SF_Filter   invfilter, predfilter, wgtfilter;
SV_Vector   invcoeffs, wgtcoeffs;
SA_Array    array;

array = SA_create(SG_CHIP, FILTERSIZE, NULL);
invfilter = SF_create(array, NULL, NULL);
SF_bind(invfilter, invcoeffs, NULL);

array = SA_create(SG_CHIP, FILTERSIZE, NULL);
predfilter = SF_create(NULL, array, NULL);
SF_bind(predfilter, NULL, invcoeffs);

array = SA_create(SG_CHIP, FILTERSIZE, NULL);
wgtfilter = SF_create(NULL, array, NULL);
SF_bind(wgtfilter, NULL, wgtcoeffs);

Note that the inverse and prediction filters are both bound to the same coefficient vector. For
each new frame of speech, this vector is updated when it is passed to calculateLP().
An important consideration is that the filters are used more than once during a frame. A different signal is filtered each time, but the state (history) of the filter must be the same. This is accomplished before each filter operation by using the

• SF_getstate() function to recover a vector with the state of the filter at the end of the previous frame

• SF_setstate() function to restore the filter's state
The following code segment shows how the short term prediction residual is generated for
the pitch search.
SF_setstate(predfilter, NULL, predstate);
SV_fill(residual, 0.0);
SF_apply(predfilter, residual, residual);   /* zero input of filter */
SV_sub3(residual, speech, residual);        /* speech - history */

SF_setstate(invfilter, invstate, NULL);
SF_apply(invfilter, residual, residual);    /* filter with inverse */

SF_setstate(wgtfilter, NULL, wgtstate);
SF_apply(wgtfilter, residual, residual);    /* filter with weighting */
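
The first step of the segment above, removing the predictor's zero-input response from the speech, can be sketched in portable C. This is an illustration under stated assumptions, not the SPOX filter object: the function names are hypothetical, the state layout (most recent output first) is chosen for this example, and the filter is run from a copy of the saved state so that the saved state can be restored again later in the frame.

```c
#include <assert.h>
#include <string.h>

#define ORDER 10   /* predictor order from the text */

/* All-pole filter 1/A(z): y[n] = x[n] - sum_{k=1..ORDER} a[k] * y[n-k].
 * state[k-1] holds y[n-k] on entry and is updated on return.
 * x and y may point to the same buffer. */
void allpole(const float a[ORDER + 1], float state[ORDER],
             const float *x, float *y, int len)
{
    int n, k;

    assert(len >= 0);
    for (n = 0; n < len; n++) {
        float acc = x[n];
        for (k = 1; k <= ORDER; k++)
            acc -= a[k] * state[k - 1];
        for (k = ORDER - 1; k > 0; k--)
            state[k] = state[k - 1];
        state[0] = acc;
        y[n] = acc;
    }
}

/* Subtract the filter's ringing from `speech`: run the filter on zeros
 * starting from a COPY of the saved state, then form speech minus that
 * zero-input response. `saved` itself is left untouched. */
void remove_zero_input(const float a[ORDER + 1], const float saved[ORDER],
                       const float *speech, float *out, int len)
{
    float state[ORDER];
    int n;

    memcpy(state, saved, sizeof state);
    for (n = 0; n < len; n++)
        out[n] = 0.0f;
    allpole(a, state, out, out, len);      /* zero-input response, in place */
    for (n = 0; n < len; n++)
        out[n] = speech[n] - out[n];
}
```

Working from a copy plays the role of calling SF_setstate() before each use: every pass through the filter starts from the same history.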

Pitch and Codebook Search
After the program finds the short-term predictor and generates the corresponding residual,
the pitch predictor and codebook parameters are found for each of the four subframes. The pitch
and codebook search functions are similar: both search over a set of values to minimize an error
term. In this section, only the codebook search is illustrated (see Figure 3). Many of the functions,
however, can be applied to the pitch predictor calculations.

Figure 3. Codebook Search Block Diagram
[Block diagram not reproduced. Labels: CODEBOOK, GAIN, h (WEIGHTING FILTER IMPULSE RESPONSE), r (RESIDUAL), ERROR.]

The search in Figure 3 minimizes the distance between the input vector and one of many generated vectors. The quantity being minimized is the Euclidean norm:

$$ e = \| r - \hat{r} \|^2 = r^t r - 2 r^t \hat{r} + \hat{r}^t \hat{r} \qquad (1) $$

where
$r$ = the original residual
$\hat{r}$ = the synthesized residual

It can be seen from the vector definition that only two terms need to be computed: the correlation of $r$ and $\hat{r}$, and the energy of $\hat{r}$. This is because the energy of the original residual is invariant over all the generated residuals. It appears that there would be N convolutions and 2N dot products to perform for each subframe. Implemented directly, the codebook search would thus require 66 MIPS if N = 256 and a subframe length of 60 are specified.
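
One way to arrive at a figure close to the quoted 66 MIPS is to count multiply-adds. The C sketch below makes two assumptions that the text does not spell out: each convolution is truncated to the subframe length, costing len(len+1)/2 multiply-adds, and one multiply-add is counted as one instruction. The function name is hypothetical.

```c
#include <assert.h>

/* Multiply-adds per second for a direct codebook search: each of the
 * n_vectors needs one truncated convolution of two length-len
 * sequences (len*(len+1)/2 MACs) and two dot products (2*len MACs),
 * repeated once per subframe. */
double direct_search_macs_per_sec(int n_vectors, int len, double subframe_ms)
{
    double per_vector;

    assert(n_vectors > 0 && len > 0 && subframe_ms > 0.0);
    per_vector = len * (len + 1) / 2.0 + 2.0 * len;
    return per_vector * n_vectors * (1000.0 / subframe_ms);
}
```

With N = 256, a subframe length of 60, and a 7.5-ms subframe, this gives 1950 multiply-adds per vector and about 66.6 million per second.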

Instead, the USFS CELP coder uses a specially structured codebook that greatly reduces the
computational load. The biggest savings comes from the elimination of all but one of the convolutions for each subframe. The codebook is overlapped, as shown in Figure 4.

Figure 4. Structure of Overlapped Codebook

[Diagram not reproduced: successive codebook vectors X0, X1, X2, and X3 drawn as overlapping segments.]

This structure permits a recursive convolution computation. The first codebook vector is
convolved normally with the weighting filter. Subsequent convolutions, however, make use of the
following relationships.

$$ V_{i+1}(z) = z^{-1} R_i(z) + x_{i+1}[1]\, H(z) $$
$$ R_{i+1}(z) = z^{-1} V_{i+1}(z) + x_{i+1}[0]\, H(z) \qquad (2) $$

where $R_i(z)$ is the Z-transform of the generated residual. Given the convolution of the previous codebook vector with the weighting filter, the convolution employing the next vector can be
found with only 120 (2 x 60) multiplies and adds.
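
The recursion in (2) can be checked against a direct convolution in a few lines of portable C. This sketch uses plain arrays rather than SPOX vectors, and the function names are hypothetical. It assumes the layout implied by an overlap of 2: vector i+1 begins two samples before vector i in the codebook array.

```c
#include <assert.h>

#define LEN 60   /* subframe length */

/* Direct truncated convolution: y[n] = sum_{k=0..n} c[k] * h[n-k]. */
void convolve_direct(const float *c, const float *h, float *y)
{
    int n, k;

    assert(c != 0 && h != 0 && y != 0);
    for (n = 0; n < LEN; n++) {
        float acc = 0.0f;
        for (k = 0; k <= n; k++)
            acc += c[k] * h[n - k];
        y[n] = acc;
    }
}

/* One step of the recursion in (2): delay y by one sample and add
 * x * h, where x is the sample newly prepended to the code vector. */
void update_step(float x, float *y, const float *h)
{
    int n;

    for (n = LEN - 1; n > 0; n--)
        y[n] = y[n - 1] + x * h[n];
    y[0] = x * h[0];
}

/* Turn the convolution of vector i into that of vector i+1.
 * `cur` points at vector i inside the codebook array; vector i+1
 * starts two samples earlier (assumed layout). */
void next_vector(const float *cur, float *y, const float *h)
{
    update_step(cur[-1], y, h);   /* x_{i+1}[1] */
    update_step(cur[-2], y, h);   /* x_{i+1}[0] */
}
```

Each call to update_step() costs at most LEN multiply-adds, so moving to the next vector costs 2 x 60 = 120, matching the count given above.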
This number can be further reduced by another property of the codebook. The vectors are
generated by center-clipping a Gaussian noise source, which causes approximately 75% of the elements to be zero. Thus, 75% of the updates to the convolutions require no multiplications or additions; however, the convolution elements must still be shifted. The following function update()
implements the recursive update operation. Note that it must be called twice per codebook vector,
once for each new term.

update(x, res, wgtimpulse)
    Float       x;
    SV_Vector   res, wgtimpulse;
{
    Float   *rptr, *rptrm1, *wptr;
    Int     len;

    len = SV_getlength(res);
    rptr = (Float *) SV_loc(res, len - 1);
    rptrm1 = rptr - 1;
    if ( x == 0.0 ) {                   /* no input, so just shift */
        for (; len > 1; len--) {
            *rptr-- = *rptrm1--;
        }
        *rptr = 0.0;
    }
    else {                              /* update using new input */
        wptr = (Float *) SV_loc(wgtimpulse, len - 1);
        for (; len > 1; len--) {
            *rptr-- = *rptrm1-- + x * *wptr--;
        }
        *rptr = x * *wptr;
    }
}


Once the convolution has been determined, the corresponding error and gain can be found.
The following function calculates the error and gain terms.
Float error(res, reshat, gain)
    SV_Vector   res, reshat;
    Float       *gain;
{
    Float   cor, energy;

    SV_dotp(reshat, reshat, &energy);
    SV_dotp(reshat, res, &cor);
    *gain = cor / energy;
    return( *gain * cor );
}
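
The value returned by error() is best read as a match score. If the candidate is scaled by a gain g, the squared error is r'r - 2g(cor) + g^2(energy), which is smallest at g = cor/energy, where it equals r'r - cor^2/energy. Since r'r is the same for every candidate, the vector with the largest cor^2/energy gives the smallest error. The portable C sketch below illustrates this with plain arrays; the names are hypothetical and the zero-energy guard is an addition not present in the listing above.

```c
#include <assert.h>

#define LEN 60   /* subframe length */

/* Dot product of two length-LEN vectors. */
static float dotp(const float *a, const float *b)
{
    float acc = 0.0f;
    int n;

    for (n = 0; n < LEN; n++)
        acc += a[n] * b[n];
    return acc;
}

/* For a candidate `reshat`, store the gain that minimizes
 * ||res - gain * reshat||^2 and return cor^2 / energy, the amount by
 * which that choice reduces the error. Larger is better. */
float match_score(const float *res, const float *reshat, float *gain)
{
    float energy = dotp(reshat, reshat);
    float cor = dotp(reshat, res);

    assert(gain != 0);
    if (energy == 0.0f) {       /* guard added in this sketch */
        *gain = 0.0f;
        return 0.0f;
    }
    *gain = cor / energy;
    return *gain * cor;
}
```

This is why the search that follows keeps the entry with the largest returned value rather than the smallest.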

The codebook search function with update( ) and error( ) functions is shown below. The
first convolution must be calculated directly, so it is done outside of the main for loop. The error
for each entry is compared against the current maximum; if it is greater than the maximum, this
entry becomes the new best vector. The process is repeated for each of the N vectors.
SV_Vector   codebook, wgtimpulse;

codeSearch(res, reshat)
    SV_Vector   res, reshat;
{
    Float   errmax, gain, err;
    Float   *cbptr;
    Int     i, best;

    findImpulse(wgtimpulse);
    SV_setbase(codebook, FIRSTVEC);
    convolve(codebook, wgtimpulse, reshat);
    errmax = error(res, reshat, &gain);
    best = 0;
    cbptr = (Float *) SV_loc(codebook, 0) - 1;
    for (i = 1; i < N; i++) {
        update(*cbptr--, reshat, wgtimpulse);
        update(*cbptr--, reshat, wgtimpulse);
        if ( (err = error(res, reshat, &gain)) > errmax ) {
            errmax = err;
            best = i;
        }
    }
}

After the search is completed, the gain of the best vector is recomputed and quantized. The
corresponding gain index and index of the codebook element can then be readied for transmission.

Assembly Language Enhancements
The codebook and pitch searches require the largest share of the computation cycles in the
encoder. One way to increase performance is to recode critical parts of these functions in assembly
language. One such function is the update( ) function described above for the recursive convolution computation.
An assembly language version of update( ) was written to take advantage of the parallel instructions and repeat block capabilities of the TMS320C30. The assembly language function utilizes the same calling structure as the C version. The function was written using the assembly language macros provided with SPOX to work with the vector, matrix, and filter objects in the DSP
library[8]. The new version of update( ) is listed in Figure 5.


Figure 5. Update Function Written in TMS320C30 Assembly Language

*
* Synopsis:
*
*       void update(x, res, wgtimpulse)
*           Float       x;
*           SV_Vector   res, wgtimpulse;
*

#include

FP      .set    ar3
        .global _update
        .text

_update:
        push    FP
        ldi     sp, FP
*
*  Set the following registers by using vector object macros
*       ar0 - SV_loc(wgtimpulse, 0)
*       ar1 - SV_loc(res, 0)
*       rc  - the length of the vectors
*       r1  - x
*
        ldi     *-FP(2), ar2
        SV_get1 ar2, SV_LOC0, ar0
        ldi     *-FP(3), ar2
        SV_get2 ar2, SV_LEN|SV_LOC0, rc, ar1
*
        ldf     *-FP(4), r1             ; x
        bzd     shift                   ; x is 0 so just shift
        subi    1, rc
        addi    rc, ar1                 ; ar1 -> res[l - 1]
        ldi     ar1, ar2                ; ar2 -> res[l - 1]
*
*  General case when x != 0.0
*
        addi    rc, ar0                 ; ar0 -> wgt[l - 1]
        subi    2, rc                   ; set loop count

        mpyf    r1, *ar0--, r2          ; x * wgt[i]
        addf    r2, *--ar2, r0

        rptb    lp20
        mpyf    r1, *ar0--, r2          ; x * wgt[i]
lp20:   addf    r2, *--ar2, r0
||      stf     r0, *ar1--

        bud     end
        stf     r0, *ar1--
        mpyf    r1, *ar0, r0            ; x * wgt[0]
        stf     r0, *ar1                ; res[0]
*
*  Case for x == 0.0
*
shift:
        subi    2, rc                   ; loop l - 1 times
        ldf     *--ar2, r0              ; prime the pipe

        rptb    slp
slp:    ldf     *--ar2, r0
||      stf     r0, *ar1--

        stf     r0, *ar1--              ; final store
        ldf     0.0, r0                 ; first term = 0.0
        stf     r0, *ar1
*
end:
        pop     FP
        rets


Performance
A complete CELP encoder was implemented as described above. Two versions were tested:
• One encompassing C and standard SPOX functions
• One having C, SPOX, and two custom TMS320C30 assembly language functions
Table 2 shows the execution times for different combinations of codebook size, processor,
and implementation. To achieve near real-time performance for a codebook with 128 vectors, the
codebook and pitch search functions were completely rewritten in assembly language. Each function required approximately 130 lines of assembly code.

Table 2. Timing of Various Implementations of the CELP Encoder for One Frame of Speech

Codebook Size    Sun (C/SPOX)    C30 (C/SPOX)    C30 (C/SPOX/ASM)
128              16,000 ms       88.2 ms         39.0 ms
256              24,000 ms       114.6 ms        54.3 ms

Memory requirements for the program on the TMS320C30 were approximately 14,000
words for instructions and approximately 6,000 words for data. The application code required approximately 4500 words of instructions. The SPOX operating system and DSP math functions consumed the remaining 9500 words of memory. This figure reflects many functions that are essential
for easing development but unnecessary for a real-time implementation.
Once a real-time implementation has been achieved, the SPOX memory requirements can
be greatly reduced by porting (or customizing) SPOX to a custom hardware implementation. In this
case, the SPOX memory requirements can be reduced to approximately 4000 words, making a
12K-word implementation feasible (both data and instruction memory requirements).
These timings show that a real-time CELP coder can be implemented on a single
TMS320C30. They also illustrate the power of the TMS320C30 compared to a standard microprocessor. Note that a TMS320C30 implementation has approximately 500,000 instruction cycles
available in a 30-ms frame.
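
The cycle budget and the timings in Table 2 can be related with a short calculation. The C sketch below assumes the 60-ns instruction cycle quoted for the TMS320C30 elsewhere in this volume; the function names are hypothetical.

```c
#include <assert.h>

/* Instruction cycles available in one frame, rounded to nearest. */
long cycles_per_frame(double frame_ms, double cycle_ns)
{
    assert(frame_ms > 0.0 && cycle_ns > 0.0);
    return (long)(frame_ms * 1.0e6 / cycle_ns + 0.5);
}

/* Fraction of real time used by the encoder: a value of 1.0 or less
 * means a frame is processed within its own duration. */
double realtime_load(double measured_ms, double frame_ms)
{
    assert(frame_ms > 0.0);
    return measured_ms / frame_ms;
}
```

A 30-ms frame at 60 ns per cycle gives 500,000 cycles. The 39.0-ms entry in Table 2 corresponds to a load of 1.3 on that scale, which is consistent with the "near real-time" wording used for the 128-vector codebook.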
Version 3.0 of the USFS CELP coder has significant improvements in computational complexity, including:
• Ternary codebook to eliminate multiplications
• Shorter codebook
• Faster LSP conversion and quantization
Work to bring the SPOX implementation up to Version 3.0 is continuing. An investigation
of a two-processor implementation is also being performed.


Summary
A 4.8-kbps CELP coder based on a Department of Defense-proposed standard has been implemented on a TMS320C30. Several of the functions used in the encoder were illustrated. A suboptimal implementation of the encoder using a 128-vector codebook is possible on only one
TMS320C30. Work is continuing on both the algorithm and the software implementation to improve the coder's real-time performance.
With SPOX, the encoder was developed in less than one month. The resulting source (with
the exception of two TMS320C30 assembly language functions) can be compiled and run on a Sun
workstation, a PC, or a TMS320C30 system such as the Texas Instruments XDS1000. This represents a considerable improvement in development time and effort over previous implementation
methods.

References
1) Kemp, D. P., Sueda, R. A., and Tremain, T. E., "An Evaluation of 4800 bps Voice Coders," Proceedings of ICASSP '89, IEEE, May 1989.
2) Campbell, J. P., Welch, V. C., and Tremain, T. E., "An Expandable Error-Protected 4800 bps CELP Coder," Proceedings of ICASSP '89, IEEE, May 1989.
3) Atal, B. S., and Schroeder, M. R., "Stochastic Coding of Speech at Very Low Bit Rates," Proceedings of ICC '84, pages 1610-1613, 1984.
4) Tremain, T. E., Campbell, J. P., and Welch, V. C., "A 4.8 kbps Code Excited Linear Predictive Coder," Proceedings of Mobile Satellite Conference, pages 491-496, May 1988.
5) Texas Instruments, Inc., Third-Generation TMS320 User's Guide, 1988.
6) Spectron MicroSystems, Inc., SPOX/SUN User's Guide, April 1989.
7) Soong, F. K., and Juang, B. H., "Line Spectrum Pair (LSP) and Speech Data Compression," Proceedings of ICASSP '84, pages 1.10.1-1.10.4, IEEE, 1984.
8) Spectron MicroSystems, Inc., Adding Math Functions to SPOX, March 1989.


Appendix A
The SPOX functions used in the code examples are briefly described below. Complete descriptions can be found in Getting Started With SPOX and the SPOX Programming Reference Manual. These manuals are supplied with the XDS1000. They are also available from Spectron MicroSystems, Inc.

Stream Functions
SS_get - get data from a stream into an array
    Int SS_get(stream, array)
        SS_Stream   stream;
        SA_Array    array;
SS_put - put data from an array to a stream
    Int SS_put(stream, array)
        SS_Stream   stream;
        SA_Array    array;

Vector Functions
SV_autorc - perform inverse filter calculations
    void SV_autorc(cor, inv, rc, alpha)
        SV_Vector   cor;
        SV_Vector   inv;
        SV_Vector   rc;
        SV_Vector   alpha;
SV_corr - calculate correlation of two vectors
    SV_Vector SV_corr(src1, src2, dst)
        SV_Vector   src1;
        SV_Vector   src2;
        SV_Vector   dst;
SV_dotp - calculate the dot product of two vectors
    SV_Vector SV_dotp(src1, src2, result)
        SV_Vector   src1;
        SV_Vector   src2;
        Float       *result;
SV_fill - fill a vector with a value
    SV_Vector SV_fill(vector, value)
        SV_Vector   vector;
        Float       value;
SV_getlength - return the length of a vector
    Int SV_getlength(vector)
        SV_Vector   vector;
SV_loc - return the address of a vector element
    Ptr SV_loc(vector, num)
        SV_Vector   vector;
        Int         num;
SV_mul2 - multiply elements of two vectors
    SV_Vector SV_mul2(src, dst)
        SV_Vector   src;
        SV_Vector   dst;
SV_setbase - set the base of a vector
    void SV_setbase(vector, base)
        SV_Vector   vector;
        Int         base;
SV_sub3 - subtract elements of two vectors and store results in a third vector
    SV_Vector SV_sub3(src1, src2, dst)
        SV_Vector   src1;
        SV_Vector   src2;
        SV_Vector   dst;
SV_window - apply a symmetric window to a vector
    SV_Vector SV_window(src, wnd, dst)
        SV_Vector   src;
        SV_Vector   wnd;
        SV_Vector   dst;

Filter Functions
SF_apply - apply a filter to a vector
    SV_Vector SF_apply(filter, input, output)
        SF_Filter   filter;
        SV_Vector   input;
        SV_Vector   output;
SF_bind - bind coefficient vectors to a filter
    Void SF_bind(filter, num, den)
        SF_Filter   filter;
        SV_Vector   num;
        SV_Vector   den;
SF_getstate - copy filter state arrays into vectors
    Void SF_getstate(filter, hisinv, hisoutv)
        SF_Filter   filter;
        SV_Vector   hisinv;
        SV_Vector   hisoutv;
SF_setstate - copy vectors into filter state arrays
    Void SF_setstate(filter, hisinv, hisoutv)
        SF_Filter   filter;
        SV_Vector   hisinv;
        SV_Vector   hisoutv;


Part V. Computers
12. A DSP-Based Three-Dimensional Graphics System
(Nat Seshan)


A DSP-Based
Three-Dimensional Graphics System

Nat Seshan
Digital Signal Processor Products-Semiconductor Group
Texas Instruments


This application report is based on the author's bachelor's thesis at the Massachusetts Institute of Technology.
The placement of a high-performance computational engine, such as an advanced digital signal processor, between the host processor and the video controller in a graphics system can improve
performance tremendously. Several factors make the Texas Instruments TMS320C30 Digital Signal Processor well-suited to this task:
• 32-bit floating-point arithmetic provides both high resolution and a large dynamic range in
calculation.
• Single-cycle, 60-ns instruction execution and parallel bus access greatly improve system
throughput.
• A hardware single-cycle multiplier facilitates the matrix arithmetic, which is frequently
required in 3D graphics.
• The ease of programmability allows the design of flexible and expandable systems.
• Software tools, such as simulators[1], assembler/linkers[2], and high-level language debuggers/compilers[3], decrease product development time.
• In-circuit scan-path emulators [4] decrease hardware prototyping and debugging time.
• The use of a standard device lowers the overall system cost.
With the use of the TMS320C30, the host processor can request higher-level commands of
the rest of the system. Instead of issuing requests for line-draws or screen clears, it can, for example,
request that a 3D object be rotated 90 degrees and then be redrawn. In addition, a rendering element
(usually a video controller or graphics system processor) can devote its resources solely to screen
management rather than doing some portion of the computationally intensive processing. The following pages provide a description of how a 3D graphics system used the TMS320C30 to compute
object transformations.
The digital signal processor resides on the TMS320C30 Application Board (C30AB) designed for the IBM PC/AT or compatible. The PC's 80x86 acts as the host processor and communicates to the C30AB through an 8-bit bus slot. Also resident on the bus is a Texas Instruments
TMS34010 Software Development Board (SDB)[5,6]. The SDB contains a TMS34010 Graphics
System Processor (GSP) [7], which manages the screen memory and drives the video display.
Overall, this system is meant to serve as an instructional model of how a graphics system can be
designed using an advanced digital signal processor.

The Potential for Graphics Pipelines
A mechanical engineer for an automobile manufacturer wants to design a robot arm for plant
automation. Before building a prototype machine, he wishes to compare the ways in which various
designs can pick up and assemble components. To do this,the engineer needs a CAD system capable
of creating, storing, and adjusting representations of 3D objects and then rendering the images on
a video display. The CAD system has four basic aspects:
1) A user interface for command entry.
2) A data management system to store objects and their screen representations.
3) One or more computational engines to perform high-speed calculations for applications
such as transformations, clipping, lighting/shading, and fractal graphics.
4) A rendering engine to control the video memory and to drive the video display.

These four tasks are common to many graphics systems, whether they be intended for CAD/CAM,
fractal graphics, heads-up displays in fighter aircraft, or PostScript printer control. If one or
more processors are assigned to each function, the resulting pipeline will achieve greatly improved
system throughput.
In a single-processor system, the CPU is directly responsible for all computations. It must
write to video memory, perform all necessary computations, interface to the user, and manage all
data storage and recovery. Although additions to the system, such as a video-memory controller
or a floating-point coprocessor, may speed up the system, the CPU remains overly burdened as the
only intelligent component of the system.

Independent Screen Management
A two-processor system can use a GSP to drive the CRT and to control the video memory.
To control the display, the GSP either must interface to an analog monitor through a color palette
or must directly drive a digital monitor. If the video memory is volatile, the processor needs a refresh controller that runs in parallel with other processor actions. Special hardware can be developed for screen clears and polygon fills. For flexibility of data representation, the processor should
be able to access pixels of varying bit-widths. At the instruction level, specialized operations
could be created to speed pixel processing. Libraries of subroutines for windowing, drawing, and
text management enable the rendering engine to execute higher-level commands. Overall, these
features allow the CPU to send more powerful directives to the GSP.

A Multiprocessor Pipeline
Adding more links in the graphics pipeline can further relieve the CPU of burdensome tasks.
Performance improvements result from each stage being optimized for a particular function. In addition, throughput increases with the number of stages. The pipeline may also contain multiple processors running in parallel at a particular stage to further improve the latency of that stage. Figure
1 shows a full-scale implementation of a graphics pipeline for 3D graphics.

Figure 1. A Full Scale Graphics Pipeline
[Block diagram. Labeled blocks: General Purpose CPU, Digital Signal Processor, Transform Engine, Lighting Engine, Clipping and Perspective Engine, Graphics System Processor.]

In a large-scale graphics pipeline, the host processor runs the applications program. The user
may be trying to use a CAD program, model the formation of galaxies, animate 3D objects, etc.
The host runs these programs at the top level, provides the user interface, and communicates to all
I/O devices, including mass storage systems. For numerically intensive applications it may be appropriate to have a digital signal processor as this host. For example, modeling the formation of
galaxies requires numerical solutions to systems of differential equations. But even in such a case,
it would be reasonable to have a more general-purpose CPU act as a user front end to the digital
signal processor.
The purpose of the object manager is to communicate with the host by receiving data and
transferring it to other processors in the system. It manages the global representation of all screen
parameters and objects. A Reduced Instruction Set Computer (RISC) processor would be
well-suited as either the host or the object manager because of its high-performance general-purpose architecture.
Because a DSP has a highly parallel architecture, a fast execution cycle time, an instruction
set optimized for numerical processing, and several development tools, it would perform well as
any of the computational stages in a graphics pipeline. For example, a DSP could act as a transform
manager that calculates the new universal coordinates of globally stored objects according to rotation, translation, and scaling commands from the object manager. Also, the DSP could act as a lighting manager that accepts parameters of environmental lighting settings from the object manager
and applies them to the transformed objects. For example, the user may set ambient intensities as
well as other sources of varying geometries, intensities, and colors. The lighting manager then applies these light sources to the surfaces of the objects, which may have varying degrees of specular
or diffuse reflection, to compute the necessary shading.
Although the perspective and clipping stage of the system is represented in Figure 1 by a
single processing unit, the task may be further partitioned to several DSPs working in series. The
perspective calculation takes viewing parameters from the object manager, such as direction of
view, location of viewer, and zoom, and produces a two-dimensional projection for the screen. Objects that are too high, too low, or too far right or left can be clipped automatically because the resulting two-dimensional coordinates are off screen. However, clipping objects fully or partially obscured by other objects may require additional stages. Also, objects behind the viewer and those
too far away for the user to recognize should be clipped appropriately.
Although digital signal processors are well-suited to be the computational stages of a graphics pipeline, a processor optimized to be a rendering engine might serve better to drive the video
display and manage the video memory. Such a processor could also help with the clipping tasks
described above. A z-buffer could hold the transformed z-coordinate of each pixel that is projected
onto the x-y plane of the screen to facilitate hidden surface removal. A device such as the Texas Instruments TMS34010 or the recently introduced TMS34020 could serve as the rendering engine in a
full scale system. Both these processors have 32-bit general-purpose architectures with instruction
sets and external memory interfaces optimized for graphics.
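
The z-buffer idea mentioned above can be illustrated with a minimal sketch in C. It is not taken from the system described in this report: the buffer dimensions, the type name zbuffer, and the functions zb_clear and zb_plot are hypothetical, and the sketch assumes that a smaller transformed z-coordinate means a point closer to the viewer.

```c
#include <assert.h>
#include <float.h>

#define ZB_WIDTH  4   /* tiny illustrative screen, not a real resolution */
#define ZB_HEIGHT 4

/* Hypothetical z-buffer: one depth and one color value per pixel. */
typedef struct {
    float         depth[ZB_HEIGHT][ZB_WIDTH];
    unsigned char color[ZB_HEIGHT][ZB_WIDTH];
} zbuffer;

/* Mark every pixel as infinitely far away and set it to color 0. */
void zb_clear(zbuffer *zb)
{
    int x, y;
    for (y = 0; y < ZB_HEIGHT; y++) {
        for (x = 0; x < ZB_WIDTH; x++) {
            zb->depth[y][x] = FLT_MAX;
            zb->color[y][x] = 0;
        }
    }
}

/* Plot a pixel only if it is nearer than what is already stored there.
   Returns 1 if the pixel was written, 0 if it was hidden or off screen. */
int zb_plot(zbuffer *zb, int x, int y, float z, unsigned char c)
{
    if (x < 0 || x >= ZB_WIDTH || y < 0 || y >= ZB_HEIGHT)
        return 0;                      /* clipped: outside the screen */
    if (z >= zb->depth[y][x])
        return 0;                      /* hidden behind a nearer surface */
    zb->depth[y][x] = z;
    zb->color[y][x] = c;
    return 1;
}
```

Because each pixel keeps only the nearest depth seen so far, primitives can be drawn in any order and surfaces hidden behind nearer ones are discarded pixel by pixel.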

An Overview of This Implementation
The system shown in Figure 2 is not intended to be a marketable product. Rather, it is targeted
toward those who have the intention of designing products in the graphics market. Firms having
experience in graphics will be able to resolve the tougher issues of graphics system design without
presentation of the described system. The system shown in this report illustrates an attractive option
for designing a fast, reliable, portable graphics system with quick turn-around time.

Figure 2. A Simple Three-Processor Graphics Pipeline
[Block diagram. Labeled blocks: Intel 80X86/7 on the IBM PC XT bus; TI TMS320C30 (transform and perspective); TI TMS34010.]


One strength of this system is its complete use of standard, commercially available parts. In
general, use of standard parts allows for faster design and manufacturing, as well as a more reliable,
easier-to-support product. Even the three hardware subsystems can be found on the market:
1) The IBM PC compatible host
2) The TMS320C30 Application Board object manager and transform engine subsystem
3) The TMS34010 Software Development Board rendering subsystem

Another strength of this system is the complete use of portable software. Use of portable software often speeds design times because system software can be mostly debugged before the actual
target hardware is available. All software for this system was written in Kernighan and Ritchie C.
The command and rendering routine was first debugged on the PC and GSP with the intermediary
stage removed. Once debugged, the computationally intensive portion of the software was ported
to the DSP, which then assumed control of the GSP. The software on the TMS34010 SDB used
many of the graphics routines in the TMS34010 Graphics/Math Library. These routines have been
used in many other graphics systems using the TMS34010.

System Hardware
The IBM PC was chosen as the host because of its extensive support by TI development tools.
In addition, a large amount of documentation is available concerning interfacing to the PC bus. The
system described in this report is designed to run best on an 80386-based IBM PC compatible with
an AT power supply and an 80387 floating-point coprocessor. However, either Intel 8086 or 80286
general-purpose microprocessors can also act as the host to the computational engine. The host
computer sends commands to
• Load and delete objects
• Target an object for adjustment
• Adjust a particular object
• Recalculate the perspective or
• Redraw the screen.
The 80X87 floating-point coprocessor is not absolutely necessary but greatly improves the
time to generate floating-point parameters for the next stage.
This graphics demonstration was the first application developed using the TMS320C30
Application Board (C30AB). Since that time, the C30AB has been included as a part of the
XDS1000 emulation system for the TMS320C30 Digital Signal Processor. The TMS320C30's features include
• 60-ns single-cycle execution time (more than 33 MFLOPS)
• 2K x 32-bit dual-access RAM
• 4K x 32-bit dual-access ROM
• 64 x 32-bit instruction cache
• Two 32-bit external memory expansion buses
• Single-cycle floating-point multiply/accumulate
• Two external 32-bit memory ports
• On-chip DMA controller
• Zero-overhead loops and single-cycle branches
• Two on-chip timers and two serial ports
• Floating-point/integer and logical 32/40-bit ALU
• 16M-word memory space
• Register-based CPU
• Development tools, including a simulator, assembler/linker, optimizing C compiler, C source debugger, and an in-circuit emulator/debugger
• On-chip scan-path emulation logic
• Low-power CMOS technology
The TMS320C30 executes commands from the 80X86 to transform objects, load objects into
or delete objects from the system, and compute the projection of 3D objects on the 2D screen. When
given a directive to draw the screen, it sends a command to the rendering engine to clear the current
screen. Then, the TMS320C30 transfers lists of lines, points, and polygons for the next stage to render.
The TMS34010 Software Development Board (SDB) has been used in TMS34010 development support since 1987. It is configurable for a variety of monitors. The board supports the
TMS34010 Graphics/Math Function Library [8] (a library of high-level routines callable from any
C program). This board was slightly modified to receive commands from the C30AB as well as
from the PC host. Program loaders, C compilers [9], assemblers, and C language standard I/O library support have been developed for this board, as well as for the C30AB. Both cards interface
to an IBM PC through an 8-bit slot on the AT bus. The TMS34010 GSP on the SDB is an advanced
high-performance CMOS 32-bit microprocessor optimized for graphics display systems. Its key
features include:
• 160-ns instruction cycle time
• Fully programmable 32-bit general-purpose processor with a 128M-byte address range
• Pixel processing, X-Y addressing, and window clip/pick built into the instruction set
• Programmable pixel size with 16 boolean and 6 arithmetic pixel processing options (Raster-Ops)
• 31 general purpose 32-bit registers
• 256-byte LRU on-chip instruction cache
• Direct interfacing to both conventional DRAM and multiport video RAM
• Dedicated 8/16-bit host processor interface and HOLD/HLDA interface
• Programmable CRT control (HSYNC, VSYNC, BLANK)
• Full line of hardware and software development tools, including a C compiler
The TMS34010 GSP receives commands from the TMS320C30, along with arrays of points,
lines, and filled polygons to be drawn. It then uses library routines to render these images on the
video display.

System Limitations
The system described here is an instructional system built in a limited development time. Aspects of the system could be optimized for speed and for memory usage. A high-speed 3D graphics
system has many features that were not implemented.
This design is non-optimal in several ways. The C routines could be hand-coded to execute
faster. A 32-bit host bus interface would allow word-at-a-time data transfers to the TMS320C30.
The GSP could be interfaced to faster video memory. At the time of this writing, the TMS34020
second-generation graphics system processor is available. The entire TMS320C30 program could
be configured to run from internal memory. Many of these optimizations were not realized because
of the limited time available for developing the system.
Many operations that an advanced digital signal processor could easily perform were not designed into this system. These tasks include curved and textured surface generation, lighting, shading, and front and back clipping. For demonstrative purposes, only the endpoint transformation and
perspective calculations were implemented.
Similarly, the capabilities of the GSP are clearly underutilized in this pipeline. The GSP is
adept at managing multiple windows for display. It can also display text in various fonts. The presented system simply requires that the GSP manage a single graphics-only (no text) window.

Representation of Graphics Elements
Any graphics system must have a method of representing the image to be portrayed on the
screen. This method requires a system that is able to store and display primitive elements. These
elements could range in complexity from three coordinates describing a point to a set of parametric
equations representing an irregular three-dimensional surface. However, simply defining a set of
primitive drawing structures does not result in an adequate graphics data representation. The engineer designing the robot does not think of the system as several sheet-metal polygons welded together. He more likely conceives of the arm as a clamp attached to a hand, which, in turn, is attached
to an arm, etc. A powerful graphics system must not only describe the primitives to be rendered
on the CRT, but also how the primitives are organized or related.
Frames of reference play the central role in the organization of graphics primitives. Any set
of graphics primitives rigid with respect to each other can be said to exist in the same, constant
frame. When the primitives move, they move as a single unit and remain in the same orientation
with respect to each other. In this system, any such set of primitives is called an object. The transformational state of any object is determined by three sets of three parameters each. These sets of the
object correspond to the
• Translation
• Scale
• Rotation
Translation of an object within its frame simply amounts to moving all locations in that frame
a specified distance along the x-, y-, and z-axes. Thus, each object must hold a set of translation
factors, denoted in this system's software by dx, dy, and dz (see Listing 1 in the Appendix). Similarly, sx, sy, and sz determine the scale of an object. These factors determine how many units of
the untransformed object's coordinates are represented by one unit of the transformed object's
coordinates. The three parameters shown in Appendix Listing 1 that represent all possible orientations of an object (theta, phi, and omega) are described in Table 1.
Table 1. Angles of Rotation

Angle   Axis Rotation Is Around   Direction of Positive Rotation   Zero Value
θ       z                         x to y                           Positive x-axis
φ       x                         y to z                           Positive y-axis

\begin{align}
r_{21} &= s_x(\sin\theta\cos\phi + \cos\theta\sin\omega\sin\phi) \tag{2.5}\\
r_{22} &= s_y(\cos\theta\cos\phi - \sin\theta\sin\omega\sin\phi) \tag{2.6}\\
r_{23} &= -s_z\cos\omega\sin\phi \tag{2.7}\\
r_{24} &= \sin\phi\,(\sin\omega\,(d_x\cos\theta - d_y\sin\theta) - d_z\cos\omega) + \cos\phi\,(d_x\sin\theta + d_y\cos\theta) \tag{2.8}\\
r_{31} &= s_x(\sin\theta\sin\phi - \cos\theta\sin\omega\cos\phi) \tag{2.9}\\
r_{32} &= s_y(\cos\theta\sin\phi + \sin\theta\sin\omega\cos\phi) \tag{2.10}\\
r_{33} &= s_z\cos\omega\cos\phi \tag{2.11}\\
r_{34} &= \cos\phi\,(\sin\omega\,(-d_x\cos\theta + d_y\sin\theta) + d_z\cos\omega) + \sin\phi\,(d_x\sin\theta + d_y\cos\theta) \tag{2.12}
\end{align}

Note that there also exists a matrix p[3][4] (see Listing 1 in the Appendix) that represents the product of all the ancestral transform matrices of an object and that object's R matrix. This matrix represents the object's transformation from the absolute origin of the system.

The Host Processor's Access to Objects
The 80X86 host can exert its control over objects in the following ways:
1) Target Objects - The host can set the target object for adjustment, deletion, or insertion of a child object by either targeting the parent object or a particular child object of the currently targeted object.
2) Load and Delete Objects - The host has the ability to add objects to the system with initial transform parameters. In addition, it can remove objects from the system (including all objects within the deleted objects). When the targeted object is deleted, the new target object defaults to being the object's parent.
3) Adjust Objects - By specifying the nine transform parameters, the host can adjust an object in its parent's frame.
4) Change Perspective - To change the viewing perspective, the host must request that the *universe be adjusted.
5) Update Screen Representation - The host can request that the targeted object and its child objects have their location array's screen representations updated.
6) Redraw View - Once all adjustments and updates of screen coordinates are re-specified, the host can request that the view be updated.
Overall, the object structure serves well as a data representation for 3D graphics.
A single set of locations is available to be referenced by the points, line segments, and filled polygons to be rendered on the screen. Each object contains parameters and matrices that specify the transformed state of the object. Thus, at any time these matrices could be applied to the original coordinates loaded into the system to calculate the transformed location of the point. Therefore, as the transformation and the projection onto two-dimensional coordinates are done in one step, the original 3D coordinates can be retained and only the final modified two-dimensional screen representation need be updated. The point of view can simply be modified by adjusting the *universe as one would adjust any other object. Overall, the hierarchical object structure provides a powerful and flexible way to manage graphical data.

DSP Command Execution
The digital signal processor assumes the role of the object manager and keeps track of the representations. Before examining the precise manner in which the TMS320C30 processes the commands from the host, one needs to understand the underlying hardware of this subsystem. A description of the TMS320C30 Application Board can be found in the application report TMS320C30 Application Board Functional Description, located in this book. The report describes the avenues of communication between the C30AB and the PC over the PC's bus. An examination of how the TMS320C30 receives and processes data and commands from the 80X86/7 follows.

Initialization
As its first initialization task, the PC maps the dual-port SRAM of the C30AB into its address space by writing the 8 MSBs of address to the mapping register. It then brings the C30AB out of reset by writing a 1 to the SWRESET in the C30AB's control register. The PC then loads the TMS320C30 application program into the dual-port SRAM.
Loader support software on the C30AB EEPROM moves the code to the proper location in the TMS320C30's address space. Finally, the PC switches the TMS320C30's memory map into run mode to start program execution.
The first part of the main routine initializes the system (see Listing 8 in the Appendix). For the system software to run properly, the DSP software must initialize several different items.
1) It enables the on-chip instruction cache.
2) It sets the external flag bit on the C30AB target connector to transfer control of the rendering system from the PC to the C30AB (this assumes that the PC loaded the rendering software before it started up the C30AB).
3) It configures both the primary and the expansion bus with zero software wait-states. Thus, all wait states are generated by the address-decoding PALs on the C30AB.
In addition, the linker configures
1) Primary bus SRAM as program storage
2) Expansion bus SRAM as heap memory allocation
3) Zeroth page of internal RAM as space for system constants
4) First page of internal RAM as the system stack.
This configuration maximizes the potential for parallel data and instruction accesses.
The initialization procedure then appropriates several local variables for system use, including
1) Two registered looping variables, i and j
2) The constant 2 PI
3) Registered pointers to the communication registers of the rendering subsystem, *hstdata and *hstcntl
The TMS320C30 initially sets the contents of these GSP registers to indicate that the computational stage does not have any requests of the rendering stage.
The TMS320C30 system software contains the global variables shown in Listing 7 of the Appendix. The dual-port SRAM pointer dual_port is initialized to point to the lowest location on the I/O expansion bus. This pointer points to an integer array that contains all data and commands from the PC.
Another pointer to the currently targeted object (*to) is set to reference the universe. The *universe is set as its own parent with an obnum of 0, indicating no internal objects are loaded. During the final part of initialization, the C30AB software waits for the PC to load the static *universe object. To understand how the PC loads objects into the system, you must comprehend the general communications protocol between the TMS320C30 and the 80X86.

Host to DSP Communication
A two-way polling scheme arbitrates access of the dual-port SRAM. The software allocates the first two words of the SRAM as COMMAND and ACKNOWLEDGE signals, respectively (see Listing 6 in the Appendix). Remember that the TMS320C30 must mask off the 24 MSBs of dual-port data to receive the proper 8-bit value. The processors poll and write to these two words in order to send requests and acknowledgments. During initialization, the TMS320C30 clears both the COMMAND and ACKNOWLEDGE locations of the dual-port SRAM. The PC graphics application software must run after this point to ensure that this phase of the initialization does not clear a command from the PC.
Once the system software starts executing on both the PC and the TMS320C30, the following sequence enables the PC to send a command to the C30AB:
1) The PC waits for the dual-port SRAM to become free by polling the ACKNOWLEDGE word for a zero.
2) The PC loads all command parameters into the dual-port SRAM.
3) The PC then loads the appropriate command byte into COMMAND.
4) Once the TMS320C30 returns to its command detection loop, it acknowledges a received command by writing the same byte into the ACKNOWLEDGE word.
5) The PC sees that the TMS320C30 has acknowledged the command and writes 00h into COMMAND to withdraw its command. The PC thereby relinquishes control of the dual-port SRAM.
6) The TMS320C30 reads all necessary parameters into its main memory.
7) The TMS320C30, by writing a zero to the ACKNOWLEDGE word, indicates that the PC can request another command. This returns the sequence to step (1).
The TMS320C30 treats all of its data types as 32-bit values, but it can read only one byte of valid data from the dual-port SRAM. Thus, the TMS320C30 must mask and concatenate the bytes that the PC maps into contiguous locations to form multibyte words. In addition, since Intel and the TMS320C30 have different standards, floating-point values from the PC must be converted before the TMS320C30 can use them.
The TMS320C30 can receive either unsigned 8-bit chars or unsigned 16-bit short integers from the PC. The macros shown in Listing 6 of the Appendix are used to access these data types from the dual-port SRAM. The DPLONG macro takes a certain location in the dual-port, finds the short integer located there, and concatenates it into a 32-bit value for the TMS320C30. The word LONG in the macro indicates that all integers, whether chars, shorts, or longs, are represented as 32-bit values by the TMS320C30.

Table 7. Comparison of Intel and TMS320C30 32-Bit Floating-Point Formats

Standard    Exponent Field Bits   Exponent Format    Sign Bit   Mantissa Field   Mantissa Format
TMS320C30   31-24                 Two's Complement   23         22-0             Two's Complement
Intel       30-23                 Offset Binary      31         22-0             Magnitude

Table 7 illustrates the differences between the TMS320C30 and the Intel single-precision floating-point formats. For every floating-point value that the TMS320C30 receives, it must extract the appropriate fields, convert the fields to the appropriate numerical representation, and then reassemble the fields in TMS320C30 floating-point format. The dpfloat routine shown in Listing 9 of the Appendix uses the union structure fllong shown in Listing 6 of the Appendix to allow manipulations normally available only for integers on the floating-point value.
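
The field conversion summarized in Table 7 can be sketched as follows. This is not the dpfloat routine from Listing 9; it is an illustrative version written with C99 fixed-width types so that it can be tried on any host. The function names dp_join32 and ieee_to_c30 are hypothetical, the byte order in the dual-port SRAM is assumed to be least significant byte first (the Intel convention), and the handling of denormals, infinities, and NaNs (flush to zero and saturate) is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Join four consecutive dual-port words into one 32-bit value. Only the low
   byte of each word is valid, so the upper 24 bits are masked off.
   Assumption: the PC stores the least significant byte first. */
uint32_t dp_join32(const volatile uint32_t *dp)
{
    return  (dp[0] & 0xFFu)
         | ((dp[1] & 0xFFu) << 8)
         | ((dp[2] & 0xFFu) << 16)
         | ((dp[3] & 0xFFu) << 24);
}

/* Convert an Intel (IEEE 754) single-precision bit pattern into the
   TMS320C30 layout: exponent in bits 31-24 (two's complement), sign in
   bit 23, fraction in bits 22-0 (two's-complement mantissa). */
uint32_t ieee_to_c30(uint32_t ieee)
{
    uint32_t sign = ieee >> 31;
    int32_t  exp  = (int32_t)((ieee >> 23) & 0xFFu) - 127;  /* remove offset */
    uint32_t frac = ieee & 0x7FFFFFu;

    if (exp == -127)            /* zero or denormal: C30 zero has exponent -128 */
        return 0x80000000u;
    if (exp == 128)             /* infinity or NaN: saturate (assumption) */
        return sign ? 0x7F800000u : 0x7F7FFFFFu;

    if (sign) {
        /* Negate the magnitude mantissa 1.f into two's-complement form 10.f' */
        if (frac == 0)
            exp -= 1;           /* -1.0 * 2^e equals -2.0 * 2^(e-1) */
        else
            frac = 0x800000u - frac;
    }
    return (((uint32_t)exp & 0xFFu) << 24) | (sign << 23) | frac;
}
```

For example, the Intel pattern 3F800000h (1.0) maps to 00000000h, and BF800000h (-1.0) maps to FF800000h, because the TMS320C30 represents -1.0 as a mantissa of -2.0 with an exponent of -1.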
The program first concatenates the four-byte value in the dual-port SRAM into a single 32-bit integer and then converts this word to TMS320C30 format.

Computational Subsystem Software
Using the communication techniques described in the last section, the TMS320C30 processes the graphics command from the PC. After performing C30AB initialization, the program main enters a command detection/execution loop. For each valid value of the COMMAND byte, a C case statement executes the appropriate code. Since these routines are, in general, too long to be discussed in exhaustive detail, the rest of this section merely summarizes how they work.
When the PC wants to load an object, it first loads the initial nine floating-point transformation parameters into the dual-port SRAM. It then loads the number of
1) Locations
2) Drawn points
3) Lines
4) Filled polygons
These values are limited to 16 bits, thereby allowing for only 65,535 primitives of each type. The size of the dual-port SRAM further limits the array sizes in this implementation. Then the PC loads three floating-point parameters (x, y, and z) for each location. The size of the dual port limits the number of locations to 377. Once these parameters are loaded into the memory, the host places the command byte for an object load into COMMAND. Upon reception of these parameters, the TMS320C30 allocates space for the object as a child of the current target object and also allocates space for the location, point, and line arrays. Because the size of each polygon varies, space is allocated as each polygon is read.
After allocating global space for the new object and loading the locations, the TMS320C30 requests more data from the PC. It first requests the points, then the lines, then each polygon. The dual-port SRAM limits the primitive arrays to 2047 points and 1364 lines. In addition, each polygon is limited to 4092 vertices.
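
The seven-step command handshake described under Host to DSP Communication can be sketched as a pair of non-blocking polling routines. This is an illustration only, not the code in the Appendix listings: the function names, the dsp_state type, and the PARAMS offset are hypothetical, and the dual-port SRAM is modeled as an ordinary array so that both sides can be stepped from one test program.

```c
#include <assert.h>

/* Word offsets in the dual-port SRAM. COMMAND and ACKNOWLEDGE are the first
   two words, as in the text; the PARAMS offset is an assumption. */
enum { COMMAND = 0, ACKNOWLEDGE = 1, PARAMS = 2 };

/* Only the low 8 bits of each dual-port word are valid. */
#define DP_BYTE(dp, i) ((dp)[i] & 0xFFu)

typedef enum { DSP_IDLE, DSP_WAIT_RELEASE } dsp_state;

/* PC side, steps 1-3: if the SRAM is free, load parameters, then the command.
   Returns 1 if the command was posted, 0 if the SRAM is still busy. */
int pc_try_send(volatile unsigned *dp, unsigned cmd,
                const unsigned char *params, int n)
{
    int i;
    assert(cmd != 0 && cmd <= 0xFFu);
    if (DP_BYTE(dp, ACKNOWLEDGE) != 0)
        return 0;
    for (i = 0; i < n; i++)
        dp[PARAMS + i] = params[i];
    dp[COMMAND] = cmd;
    return 1;
}

/* PC side, step 5: once the command is echoed in ACKNOWLEDGE, withdraw it. */
int pc_try_withdraw(volatile unsigned *dp, unsigned cmd)
{
    if (DP_BYTE(dp, ACKNOWLEDGE) != cmd)
        return 0;
    dp[COMMAND] = 0;
    return 1;
}

/* DSP side, steps 4 and 6: acknowledge a new command, then wait for the PC
   to withdraw it. Returns the command byte once the parameters may be read,
   otherwise 0. */
unsigned dsp_poll(volatile unsigned *dp, dsp_state *st)
{
    unsigned cmd = DP_BYTE(dp, COMMAND);
    if (*st == DSP_IDLE) {
        if (cmd != 0) {
            dp[ACKNOWLEDGE] = cmd;      /* step 4: echo the command byte */
            *st = DSP_WAIT_RELEASE;
        }
        return 0;
    }
    if (cmd != 0)
        return 0;                       /* PC has not withdrawn the command yet */
    *st = DSP_IDLE;
    return DP_BYTE(dp, ACKNOWLEDGE);    /* step 6: parameters can now be read */
}

/* DSP side, step 7: free the SRAM for the next command. */
void dsp_release(volatile unsigned *dp)
{
    dp[ACKNOWLEDGE] = 0;
}
```

Because ACKNOWLEDGE stays nonzero until dsp_release is called, a second pc_try_send fails while the TMS320C30 is still copying parameters, which is the arbitration property the polling scheme relies on.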
The TMS320C30 makes a data request by replacing the current COMMAND byte that it wrote in ACKNOWLEDGE with 127, the flag for the PC to load more data. Although the roles of ACKNOWLEDGE and COMMAND are reversed in this case, the TMS320C30 requests data in much the same way the PC requests commands. Once the TMS320C30 completes loading the object, it selects the object as the new target object. Finally, using the equations in Table 6, the TMS320C30 calculates the initial value of the object's transformation matrix.
The target object is the object in the hierarchy selected for adjustment, deletion, or calculation of screen coordinates. The PC can either target an object's parent or one of the object's child objects. The command to target a child requires the PC to specify either the child object's sibling number or subnum. Thus, when selecting objects for adjustment, the PC must remember where it loaded objects into the hierarchy.
To adjust the transformation parameters of a given object, the PC simply loads the new parameters into the dual-port SRAM. The TMS320C30 adds the values of the new angles of rotation and translation factors to the previous ones. In addition, the TMS320C30 multiplies the old scaling factors by the new ones. Then, the TMS320C30 calculates the transformation matrix of the object by using the equations in Table 6. It does not recalculate screen locations, however, until this is specifically requested by the PC. The TMS320C30 can thus avoid calculating screen coordinates until all adjustments have been made.
Once the PC requests all the changes for a frame on the display, it requests recalculation of screen coordinates at each node it changed. The PC can request recalculation for a particular object and thus update its internal objects as well. This allows the TMS320C30 to avoid recalculating screen coordinates of unchanged locations.
For maximum efficiency, the PC must request recalculation in the highest node that it adjusted along any particular path. Thus, in the planetary example given earlier, if, in a period of time, only Pluto and its moon Charon were moved (the other bodies miraculously standing still), only Pluto would need to be targeted for recalculation.
To calculate transformations, the TMS320C30 multiplies the object's transformation matrix by its parent's parent transformation matrix to obtain its own parent transformation matrix, p[3][4]. The TMS320C30 right-multiplies all locations within that object by this matrix to achieve the transformation from the absolute origin of the system. The computational engine calculates perspective by dividing the transformed x- and y-coordinates by the transformed z-coordinate so that locations farther away appear closer together. The plane z=0 is defined to be the plane of the screen. This also has the feature that objects behind the viewer appear upside-down in front of the viewer because the objects' z-coordinates are negative. Thus, the program running on the PC must maintain all objects in front of the viewer. Then, the TMS320C30 recursively executes this procedure for each object within the targeted object.
Unlike the recalculation of screen coordinates, the redrawing of objects is done for all objects within the system. Thus, the draw_object routine is called with the *universe as the argument. The precise manner in which the TMS320C30 uses this program to redraw the screen is described in the TMS320C30 Drawing Routine section found later in this report.

Summary of DSP Command Execution
The dual-port SRAM on the C30AB provides all means of communication between the PC and the TMS320C30. A two-way polling scheme arbitrates the TMS320C30's and the PC's access to this SRAM.
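
The recalculation procedure described before this summary (concatenating an object's matrix with its parent's p matrix, transforming each location, and dividing by z) can be sketched in C. The function names concat_matrix and project_location are hypothetical, the 3x4 matrices are treated as 4x4 matrices with an implied bottom row of 0 0 0 1, and the sketch simply rejects points whose transformed z is not positive, which is one possible way to respect the requirement that objects stay in front of the viewer.

```c
#include <assert.h>

/* out = parent * r, treating each 3x4 matrix as a 4x4 matrix whose bottom
   row is (0 0 0 1). The out array must not alias either input. */
void concat_matrix(float parent[3][4], float r[3][4], float out[3][4])
{
    int i, j, k;
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (k = 0; k < 3; k++)
                sum += parent[i][k] * r[k][j];
            if (j == 3)
                sum += parent[i][3];    /* contribution of the implied row */
            out[i][j] = sum;
        }
    }
}

/* Transform one 3D location by p and apply the perspective division.
   Returns 1 and fills *sx and *sy on success, or 0 if the transformed
   point is not in front of the viewer (z <= 0). */
int project_location(float p[3][4], const float loc[3], float *sx, float *sy)
{
    float t[3];
    int i;
    for (i = 0; i < 3; i++)
        t[i] = p[i][0] * loc[0] + p[i][1] * loc[1] + p[i][2] * loc[2] + p[i][3];
    if (t[2] <= 0.0f)
        return 0;
    *sx = t[0] / t[2];   /* farther points move toward the screen center */
    *sy = t[1] / t[2];
    return 1;
}
```

A recursive routine would call concat_matrix once per object, call project_location for each of that object's locations, and then repeat the procedure for every child object with the new p matrix as the parent.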
Using this protocol, the PC can request object loading, deletion, or adjustment, but can request only modification of the object currently targeted for these changes. Also, at the host's request, the computational engine may recalculate the screen representation of all locations within the targeted object. Once all updates for a particular view are made, the PC may request a redrawing of the display. The description of the rendering subsystem, presented next, facilitates a better understanding of how the TMS320C30 requests rendering commands of the GSP.

The Rendering Subsystem
A modified version of the TMS34010 Software Development Board serves as the rendering stage of this graphics pipeline. A complete overview of this PC-based card can be found in the TMS34010 Software Development Board User's Guide [2]. Because only minor modifications were made to the commercially available SDB, the hardware aspects of the rendering subsystem are discussed in less detail than the computational stage. The same holds true for many software routines taken from the TMS34010 Math/Graphics Function Library [8]. After presenting overviews of the TMS34010 and the SDB, this section focuses on the C30AB/SDB interface and the communications protocol used for command and data transfer between the TMS320C30 and the GSP.

The TMS34010 Graphics System Processor
The TMS34010 combines the best features of general-purpose processors and graphics controllers in one powerful and flexible Graphics System Processor. Key features of the TMS34010 are its speed, high degree of programmability, and efficient manipulation of hardware-supported data types, such as pixels and two-dimensional pixel arrays.
The TMS34010's unique memory interface reduces the time needed to perform tasks such as bit alignment and masking. The 32-bit architecture supplies the large blocks of continuously-addressable memory that are necessary in graphics applications.
TMS34010 system designs can take advantage of video RAM technology to facilitate applications such as high-bandwidth frame buffers; this circumvents the bottleneck often encountered when conventional DRAMs are used in graphics systems.

The TMS34010's instruction set includes a full complement of general-purpose instructions, as well as graphics functions from which you can construct efficient high-level functions. The instructions support arithmetic and Boolean operations, data moves, conditional jumps, plus subroutine calls and returns.

The TMS34010 architecture supports a variety of pixel sizes, frame buffer sizes, and screen sizes. On-chip functions have been carefully selected so that no functions tie the TMS34010 to a particular display resolution. This enhances the portability of graphics software and allows the TMS34010 to adapt to graphics standards such as MIT's X, CGI/CGM, GKS, NAPLPS, PHIGS, and other evolving industry and display management standards.

TMS34010 Software Development Board

Figure 4 shows the block diagram of the modified TMS34010 SDB. The graphics SDB is a single card designed around the IBM PC/XT Expansion Bus and serves as a software development tool for programmers writing application software for the TMS34010 Graphics System Processor. The development of a high-performance bit-mapped graphics display in this application report demonstrates the simplicity of hardware design using the TMS34010 SDB.

Figure 4. Modified TMS34010 Software Development Board Block Diagram
[Block diagram not reproduced. Labeled blocks include the TMS34010, the TMS34070 color palette, VRAM (128K bytes), optional VRAM (128K bytes), DRAM (512K bytes), PROM (1K bytes), USART, data transceiver, address latch and decode, control buffer, system clock, the target connector, and the PC XT bus connector.]

This board comes with interactive debug software. Its features include software breakpoints, software single-step, and run with count. At the same time, current machine status is displayed on the top half of the host monitor. The SDB contains 512K bytes of program RAM for the TMS34010 to execute drawing functions, application programs, and displays. Both the program RAM and the frame buffer are accessible to the host through the TMS34010's memory-mapped host port.

The frame buffer consists of eight SIP memory modules organized into four color planes. This allows 16 colors per frame from the digital monitor. The TMS34070 color palette incorporates a 12-bit color lookup table to give you a choice of 16 colors in a frame from a 4096-color palette. Furthermore, the palette incorporates a variety of unique line load features to allow the color lookup table to be reloaded on every line; this means that 16 of 4096 colors can be displayed per line.

The TMS34010 Host Interface

The GSP has two 16-bit buses: one interfaces with the video and program memory, and a second interfaces to a host processor. The host can access the GSP by writing and reading four internal memory-mapped GSP 16-bit registers:
• HSTADRL and HSTADRH together form a 32-bit pointer to a location in the GSP's address space.
• HSTCNTL contains several programmable fields that control host interface functions.
• HSTDATA buffers data that is transferred through the host interface between the GSP's local memory and the host processor.

Several signals are available for communications between the host and the GSP.
• HD15 through HD0 are the actual data lines.
• HCS is the interface select signal strobe from the host.
• HSF1 and HSF0 select which host register is being addressed.
• HREAD and HWRITE are, respectively, the read and write strobes from the host. Table 8 shows how the above signals address the four host registers.
• HLDS and HUDS, respectively, select the low byte or the high byte of the host interface registers.
• HRDY informs the host when the GSP is ready to complete a transaction.
• HINT is the interrupt signal from the GSP to the host.

Table 8. TMS34010 Signals Controlling Host Port Interface

HCS   HSF1 & HSF0   HREAD   HWRITE   Operation
 1        XX          X       X      No operation
 0        00          0       1      HSTADRL read
 0        00          1       0      HSTADRL write
 0        01          0       1      HSTADRH read
 0        01          1       0      HSTADRH write
 0        10          0       1      HSTDATA read
 0        10          1       0      HSTDATA write
 0        11          0       1      HSTCNTL read
 0        11          1       0      HSTCNTL write

The fields in HSTCNTL control host interrupt processing, auto-incrementing of the host address register, and protocol in byte-at-a-time accesses to the 16-bit host port (whether the lower or the higher byte comes first). HSTCNTL also contains the status of interrupts from the host to the GSP and from the GSP to the host, and a three-bit message word in either direction. These control bits are shown in Table 9.

Table 9. TMS34010 Host Control Register Fields

Bits   Name     Purpose                               Write Access
0-2    MSGIN    Input Message Buffer                  Host Only
3      INTIN    Input Interrupt Bit                   Host Only
4-6    MSGOUT   Output Message Buffer                 GSP Only
7      INTOUT   Output Interrupt Bit                  GSP Only
8      NMI      Nonmaskable Interrupt                 Host Only
9      NMIM     Nonmaskable Interrupt Mode            GSP and Host
10     Unused   Unused                                Neither
11     INCW     Increment Pointer Address on Write    GSP and Host
12     INCR     Increment Pointer Address on Read     GSP and Host
13     LBL      Lower Byte Last                       GSP and Host
14     CF       Cache Flush                           GSP and Host
15     HLT      Halt TMS34010 Processing              GSP and Host

TMS320C30 Application Board Interface

In its unmodified form, the SDB communicates to the PC host through a single transceiver.
A PAL decodes the PC address into the appropriate register selection signals. The registers are mapped redundantly into blocks of PC memory address space, as shown in Table 10. The board was modified by the addition of a connector to a cable from the C30AB's target connector. The TMS320C30 sends the following to the modified SDB:
• The TMS320C30's expansion bus address
• The TMS320C30's data signals
• I/O address space access strobe
• Expansion bus read and write strobes

These signals map the GSP's host interface registers into the TMS320C30's address space (also shown in Table 10). The TMS320C30 mapping is actually replicated in four-word blocks until location 8057FFh.

Table 10. Mapping of TMS34010 Host Control Registers

Register   PC Mapping          TMS320C30 Mapping
HSTDATA    C7000h - C7CFFh     805002h
HSTCNTL    C7D00h - C7DFFh     805003h
HSTADRL    C7E00h - C7EFFh     805000h
HSTADRH    C7F00h - C7FFFh     805001h

The modified SDB must be able to select either the PC or the C30AB as its host. The C30AB target connector makes the two external flag bits XF0 and XF1 available to the SDB. The TMS320C30 can configure these flags as either input or output pins. Upon leaving reset, these pins default to inputs and remain in the high-impedance state. XF0 is pulled low on the SDB to appear off when the TMS320C30 is in reset. After the PC loads the rendering software into the GSP, it activates the C30AB and loads the TMS320C30's software. As discussed earlier, the TMS320C30, during initialization, configures XF0 as an output and loads it with a one. The address-decoding PALs on the SDB use this signal to select the C30AB as the SDB's host.

When the TMS320C30 controls the SDB, it communicates through a full 16-bit interface to the GSP. Thus, before the integer screen coordinates are sent in two's-complement form to the GSP, they must be clipped to a range of -32,768 to 32,767.
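A minimal sketch of that clipping step is shown below. The function name clip_to_int16 is hypothetical, and the sketch assumes the projected coordinate has already been converted to an integer wider than 16 bits.

```c
#include <stdint.h>

/* Clamp a projected screen coordinate to the signed 16-bit range
   so that it survives the 16-bit host interface unchanged. */
int16_t clip_to_int16(long v)
{
    if (v > 32767L)
        return 32767;
    if (v < -32768L)
        return -32768;
    return (int16_t)v;
}
```

Clamping rather than truncating matters here: simply discarding the upper bits of an out-of-range value would wrap it to an unrelated position on the opposite side of the origin.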
Fortunately, this range is still two orders of magnitude greater than the resolution of most monitors.

In general, the above interface is fairly straightforward. The only complication is that the designers of the GSP expected a relatively slow microcoded general-purpose processor as a host. This allows the GSP to assert its HRDY line 80 ns before it is actually ready to process a transaction. When interfacing to the TMS320C30, PALs become necessary as state machines to create the appropriate number of wait states on host reads and writes and thus ensure proper interprocessor communication.

DSP to GSP Communication

The TMS320C30 loads all commands and data into a command buffer contained within a space not usually mapped by the SDB's C compiler configuration. This portion of GSP address space, the shadow RAM, is normally reserved for optional PROMs. However, writing a 1 to an RS latch in the GSP's memory space causes this area to be occupied by the topmost portion of program/data DRAM. Before the TMS320C30 starts writing to HSTDATA to access this memory, it configures the host address to auto-increment. Once the GSP finishes processing data in the shadow RAM, it resets the address registers to point to the beginning of the shadow RAM so that the TMS320C30 can properly load its next command and data.

The communication protocol between the TMS320C30 and the GSP closely resembles the protocol between the PC and the TMS320C30. The MSGIN and MSGOUT fields, respectively, replace the COMMAND and ACKNOWLEDGE words. However, rather than these fields containing a particular value for a command, the value of 3 (binary 011) in either of these fields indicates that a command or an acknowledge exists. Upon reception of a command request, the GSP refers to the first location of the shadow RAM for a command word from the TMS320C30.
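To make the role of the MSGIN and MSGOUT fields concrete, the following sketch packs and unpacks them in a 16-bit HSTCNTL value. The bit positions follow Table 9 (MSGIN in bits 0-2, MSGOUT in bits 4-6, INCW in bit 11); the macro and function names are hypothetical and serve only this illustration.

```c
#include <stdint.h>

#define MSGIN_SHIFT   0u          /* MSGIN occupies bits 0-2 */
#define MSGOUT_SHIFT  4u          /* MSGOUT occupies bits 4-6 */
#define MSG_MASK      0x7u        /* both message fields are three bits wide */
#define INCW_BIT      (1u << 11)  /* auto-increment host address on write */
#define MSG_PENDING   3u          /* binary 011: command or acknowledge present */

/* Extract the host-to-GSP message field. */
unsigned hst_msgin(uint16_t hstcntl)
{
    return (hstcntl >> MSGIN_SHIFT) & MSG_MASK;
}

/* Extract the GSP-to-host message field. */
unsigned hst_msgout(uint16_t hstcntl)
{
    return (hstcntl >> MSGOUT_SHIFT) & MSG_MASK;
}

/* Compose an HSTCNTL image for modeling purposes. On the real device
   MSGOUT is writable only by the GSP, so a host write cannot change it. */
uint16_t hst_compose(unsigned msgin, unsigned msgout, int auto_inc_write)
{
    unsigned v = ((msgin & MSG_MASK) << MSGIN_SHIFT)
               | ((msgout & MSG_MASK) << MSGOUT_SHIFT);
    if (auto_inc_write)
        v |= INCW_BIT;
    return (uint16_t)v;
}
```

With auto-increment on write enabled, a pending request corresponds to a register image of 0x0803 and an idle interface to 0x0800.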
Thus, the overall command scheme proceeds as follows:
1) The TMS320C30 waits until it sees that the MSGOUT field contains a 0.
2) The TMS320C30 stores all command and data into the shadow RAM.
3) The TMS320C30 writes a 3 to the MSGIN field and waits for acknowledgment.
4) The GSP acknowledges the reception of a command by writing a 3 to the MSGOUT field.
5) The TMS320C30 withdraws its request by writing a 0 to MSGIN.
6) The GSP reads the first word of the shadow RAM for the command and jumps to the appropriate case to process it.
7) Once the GSP is finished with all data in the shadow RAM, it resets the values of the host address registers and then writes a 0 to the MSGOUT field, indicating that the TMS320C30 is free to request another command.

The TMS320C30 Drawing Routine

When the TMS320C30 receives a redraw-screen request from the PC, it sends a command to the GSP to clear the screen after the monitor has drawn the bottom line; this ensures that the last view was drawn in its entirety. The TMS320C30 then calls its draw_object routine with *universe as an argument. For each array of primitives within the object, the TMS320C30 sends the size of the array and the array of screen representations of the primitives themselves to the TMS34010. Thus, the TMS320C30 can request the GSP to draw arrays of points, lines, or filled polygons. Once all arrays are drawn, draw_object recursively executes for all child objects within the universe. In this manner, all objects defined within the system are drawn.

GSP System Initialization

Several initialization routines are provided in the TMS34010 Math/Graphics Function Library User's Guide [8]. The GSP executes these programs to properly configure the system before it begins its command detection loop:
• The call to init_video configures the graphics buffer for an NEC Multisync Monitor displaying 640 x 480 resolution.
• The init_graphics function initializes the graphics environment by setting up the data structures for the graphics functions and assigning default values to system parameters.
• The init_screen command initializes the screen. The entire frame buffer is cleared, and a color lookup table is loaded with the default color palette.
• The init_vuport function initializes the viewport data structures and opens viewport 0, the system, or root window.
• The set_origin command sets the origin of the system to the center of the screen.

Drawing Routines

Several drawing routines are also provided in the TMS34010 Math/Graphics Function Library User's Guide [8]:
• For each primitive in an array sent from the TMS320C30, the GSP sets the proper drawing color with the set_color command.
• The TMS320C30 commands the GSP to execute clear_screen before it starts to request drawing of primitives for the next view.
• The TMS320C30 requests a wait_scan execution from the GSP to ensure that the GSP has fully displayed the last view before drawing the current view.
• The GSP uses the draw_point(x,y) function to render a point on the display.
• Similarly, it uses the draw_line(x1,y1,x2,y2) command to draw a line. The arguments are the screen coordinates of the two end-points of the segment.
• The fill_polygon(n, linelist, ptlist) function takes as arguments the number of vertices, an array of the line segments forming the sides of the polygon, and a list of screen coordinates referenced by the linelist.

Summary

The TMS34010 Software Development Board provides a good rendering module for this graphics system. The support hardware has been debugged and used in industry since 1987 and thus makes a reliable rendering subsystem. The target connector to the C30AB provides access to the TMS320C30 as an alternate host.
Three PALs and two transceivers allow the TMS320C30 to assume control of the GSP once both have started running their software. The draw_object program on the TMS320C30 can command the GSP to draw graphics primitives. Functions in the TMS34010 Math/Graphics Function Library User's Guide [8] allow the GSP to initialize the monitor interface, clear the screen, ensure that an entire screen has been drawn, and draw the graphics primitives. Overall, the TMS34010 development tools provide an easy means to develop a rendering subsystem for this graphics pipeline.

Possible Improvements

Several changes may be incorporated into the system to improve performance. Some simple enhancements involve modifications of the computational subsystem's software to allow faster and more transparent command execution. Restructuring the method in which the data and commands pass through the pipeline, a more complex modification, can greatly increase throughput. Additional features such as more complex primitives, lighting, windowing, and text display would require major software modifications to the system. However, any such modifications would not need to change the communication protocols or the command detection loops significantly. Finally, although the TMS320C30 represents the state of the art in digital signal processing, the host processor and the rendering engine may be improved.

Computational Subsystem Software

The drawing routine currently sends the primitive arrays of an object one at a time to the GSP. Instead, it should send all primitive arrays for all objects to be redrawn in a single pass. The GSP should then process the contents of this stack of commands and data.

Currently, as soon as the PC finishes requesting object adjustments, it must request recalculations of the screen coordinates of location arrays.
The screen_object routine must operate on all objects that have been adjusted directly or indirectly by having their ancestors adjusted. Instead, this routine should be called once with *universe as the argument. The object structure should contain a flag that is set when an object is adjusted and reset when it is drawn. Thus, the new screen_object procedure would recursively search down the hierarchy of objects until it encounters an object that has been adjusted and would then recalculate all the screen coordinates for it and for its internal objects. Upon completion, it should search the rest of the hierarchy for adjusted objects. Thus, the host would have to request only adjustment, targeting, and draw commands. Screen representations would be automatically recalculated whenever a draw command is executed.

Rendering Subsystem Software

Rendering subsystem drawing routines could be improved by designing functions coded to handle the primitive arrays rather than individual programming elements. These functions may be able to fit in the GSP's instruction cache and improve execution time.

Improved Data Flow

One problem consistent at all stages of the system is the method of buffering. A single buffer usually contains all data and commands to be transferred from one stage to the next. Thus, during command execution one processor may wait for the other to relinquish control of the command buffer.

The first of two methods to improve the dual-port SRAM connecting the PC and the DSP is to divide the SRAM into two buffers. The PC writes the current command to one buffer, while the TMS320C30 processes commands and data stored in the other. This prevents contention for the dual-port SRAM. The particular buffer which each processor controls is swapped on each command request. Second, adding three more 4K x 8 dual-port SRAMs in parallel would allow the PC to communicate with the TMS320C30 using full 32-bit-wide words.
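With the present 8-bit-wide path, each 32-bit value has to be assembled from four successive byte locations. The sketch below illustrates that step under two assumptions that are not stated in the text here: each dual-port location is read as a full word of which only the low 8 bits are valid, and the PC stores the least significant byte first. The function name dp_read_u32 is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Assemble a 32-bit value from four consecutive dual-port locations.
   Only the low 8 bits of each location are meaningful, so each one is
   masked before being shifted into place (least significant byte first). */
uint32_t dp_read_u32(const uint32_t *dual_port, size_t offset)
{
    return  (dual_port[offset]     & 0xFFu)
         | ((dual_port[offset + 1] & 0xFFu) << 8)
         | ((dual_port[offset + 2] & 0xFFu) << 16)
         | ((dual_port[offset + 3] & 0xFFu) << 24);
}
```

Each 32-bit transfer therefore costs four reads, four masks, and three shifts on the DSP side.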
Thus, the masking and concatenation necessary to receive larger data types would become unnecessary. On the original design the potential addition of these RAMs consumed a prohibitive amount of board space. Full word size is possible only if space constraints are eased.

Splitting the command buffer between the TMS320C30 and the GSP allows the GSP to draw the current screen while the TMS320C30 sends the primitive arrays for the next. Similarly, two display buffers allow one buffer to be displayed on the monitor while the GSP draws the next view to the other.

Computational Features

The DSP is suited to perform many other types of computational features. Because these functions are more complex, they were not implemented in the limited design time available.

This system truncates objects that are too high, too low, too far right, or too far left by using the GSP's drawing routines that automatically clip coordinates outside the screen boundaries. However, the system cannot determine whether one object is in front of another and draw the objects appropriately. Functions to do this hidden-surface removal require complex algorithms to determine whether one 3D surface obscures another. Simpler routines could be made to clip objects that are too far away to see or objects that are behind the viewer.

A lighting feature would allow appropriate factors of light intensity and reflection to determine the shading of surfaces. Lighting may be ambient (equal everywhere) or come from several possible source geometries. Reflections could either be diffuse and scatter light equally in all directions, or be specular like those off any shiny surface. With these parameters, the TMS320C30 can compute the appropriate shading of a given pixel. In this scenario, the GSP is reduced to drawing single points with a given color. Thus, any lighting function would slow rendering time.
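As a rough idea of the per-pixel arithmetic such a lighting feature would add, the sketch below combines an ambient term with a diffuse (Lambertian) term. This is a standard textbook model swapped in for illustration, not something the system implements; the names vec3 and shade, the coefficient kd, and the assumption that both vectors are unit length are all hypothetical. Specular reflection is omitted.

```c
/* Simple three-component vector; both vectors passed to shade()
   are assumed to be normalized to unit length. */
typedef struct { float x, y, z; } vec3;

/* Return an intensity in [0, 1] from an ambient term plus a diffuse term.
   n  = surface normal, l = direction from the surface toward the light,
   ambient = ambient intensity, kd = diffuse reflection coefficient. */
float shade(vec3 n, vec3 l, float ambient, float kd)
{
    float ndotl = n.x * l.x + n.y * l.y + n.z * l.z;
    float intensity;

    if (ndotl < 0.0f)
        ndotl = 0.0f;            /* surface faces away from the light */

    intensity = ambient + kd * ndotl;
    return (intensity > 1.0f) ? 1.0f : intensity;
}
```

Even this minimal model costs three multiplies and several adds and compares per shaded point, in addition to the per-point transfer to the GSP.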
More complex primitives can be produced by using the TMS320C30 to generate arrays of pixels representing solutions to equations. The PC could dispatch a command to draw a primitive based on a particular type of equation (such as the parametric equations representing a sphere) and then load the appropriate parameters for that equation. The DSP would generate the appropriate set of pixels for that object and send it to the GSP as arrays of points.

Rendering Features

The TMS34010 Math/Graphics Function Library [8] permits the user to create and select various windows for display. Once a window is selected, the DSP can run the existing system software within that window. Thus, the host would also need to be able to direct the DSP to tell the GSP how to manipulate its windows. The Library also enables the GSP to print text on the screen. This feature also would not be very difficult to implement.

A More Advanced Host

A more advanced host could be a high-speed RISC processor such as SPARC. This unit could communicate with the DSP at faster rates, so command transfers would consume less time. In addition, SPARC is a 32-bit machine, which could allow word transfers between host and DSP in a single instruction.

A More Advanced Rendering Engine

The TMS34010's performance as a rendering engine could be improved. If the GSP could be ready to complete a transaction when the HRDY line is asserted and not some period of time later, the C30AB to SDB interface would be more straightforward and not require as many wait states. This problem is corrected in the second-generation GSP, the TMS34020, which was not available at the time of the design of this system. In addition, the TMS34020 allows the host to transparently access the GSP's bus while the GSP continues processor functions.

Conclusion

Despite its shortcomings, this system still demonstrates the dataflow in a graphics pipeline using a digital signal processor as a computational element.
One main benefit of the digital signal processor is the availability of development tools such as C compilers, assembler/linkers, software development boards, and in-circuit emulators that accelerate design time. The TMS320C30 also provides speeds comparable to many bit-slice processors that require programmers to develop extensive microcode routines. The hardware multiplier, floating-point capability, RISC architecture, and parallel bus access facilitate fast, precise graphics calculations. Overall, a digital signal processor provides an attractive option to the graphics system designer interested in making high-performance systems with quick turnaround time.

References

1) TMS320C30 Simulator User's Guide (literature number SPRU017), Texas Instruments, 1989.
2) Third-Generation TMS320 User's Guide (literature number SPRU031), Texas Instruments, 1988.
3) TMS320C30 C Compiler User's Guide (literature number SPRU034), Texas Instruments, 1988.
4) TMS320C30 Assembly Language Tools User's Guide (literature number SPRU035), Texas Instruments, 1989.
5) TMS34010 Software Development Board Schematics (literature number SPVU003), Texas Instruments, 1986.
6) TMS34010 Software Development Board User's Guide (literature number SPVU002A), Texas Instruments, 1987.
7) TMS34010 User's Guide (literature number SPVU001A), Texas Instruments, 1988.
8) TMS34010 Math/Graphics Function Library User's Guide (literature number SPVU006), Texas Instruments, 1987.
9) TMS34010 C Compiler Reference Guide (literature number SPVU005A), Texas Instruments, 1986.
10) Foley, J.D. and Van Dam, A., Fundamentals of Interactive Computer Graphics, Addison Wesley, 1984.
Appendix A

Graphics Programs

Listing   Name
1         TMS320C30 C Structure Representing an Object
2         TMS320C30 C Structure Representing a Location
3         TMS320C30 C Structure Representing a Point
4         TMS320C30 C Structure Representing a Line
5         TMS320C30 C Structure Representing a Filled Polygon
6         TMS320C30 Communications Macros
7         TMS320C30 Global Variables
8         TMS320C30 Main Command Execution Loop
9         TMS320C30 Floating-Point Conversion Routine
10        TMS320C30 Object Loading Routine
11        TMS320C30 Screen Coordinate Calculation Routine
12        TMS320C30 Transformation Matrix Evaluation Routine
13        TMS320C30 Object Deletion Routine
14        TMS320C30 Request for Additional Data in Object Load
15        TMS320C30 Object Drawing Routine
16        TMS34010 Point Structure
17        TMS34010 Line Structure
18        TMS34010 Color Array
19        TMS34010 Color Palette
20        TMS34010 Main Command Execution Routine
21        PC Object Loading Data Structure
22        PC Communications Macros
23        PC Global Variables
24        PC Targeted Object Adjustment Routine
25        PC Routine to Set Parameters for an Object Load
26        PC Routine to Target Parent of Current Target Object
27        PC Routine to Target a Child of Current Target Object
28        PC Routine to Redraw Screen
29        PC Routine to Load the Primitives of a Wireframe Cube
30        PC Main Routine to Draw a "Planetary System" of Cubes

/**************************************************************************
 -->Listing 1: TMS320C30 C Structure Representing an Object
**************************************************************************/
struct object
{
    struct object *parent;          /* object within whose frame the object is defined */
    long subnum;                    /* sibling number of object */
    long locnum;                    /* number of locations */
    long ptnum;                     /* number of points */
    long lnnum;                     /* number of lines */
    long pgnum;                     /* number of polygons */
    long obnum;                     /* number of daughter objects */
    float sx; float sy; float sz;   /* scale factors */
    float dx; float dy; float dz;   /* offsets */
    float theta;                    /* angle of rotation around z-axis (x to y) */
    float phi;                      /* angle of rotation around x-axis (y to z) */
    float omega;                    /* angle of rotation around y-axis (z to x) */
    float m[3][4];                  /* matrix formed by scale, then offset, then rotate */
    float p[3][4];                  /* ascending product of all ancestral m matrices */
    loc *locs;                      /* pointer to location array */
    point *points;                  /* pointer to point array */
    line *lines;                    /* pointer to line array */
    polygon *polygons;              /* pointer to polygon array */
    struct object *objects[MAXOB];  /* pointer to array of pointers to child objects */
};

/**************************************************************************
 -->Listing 2: TMS320C30 C Structure Representing a Location
**************************************************************************/
typedef struct
{
    float x; float y; float z;      /* world coordinates */
    long a; long b;                 /* screen coordinates */
} loc;

/**************************************************************************
 -->Listing 3: TMS320C30 C Structure Representing a Point
**************************************************************************/
typedef struct
{
    long color;
    long locn;                      /* number of location in location array */
} point;

/**************************************************************************
 -->Listing 4: TMS320C30 C Structure Representing a Line
**************************************************************************/
typedef struct
{
    long color;
    long startlocn;                 /* start loc number */
    long endlocn;                   /* end loc number */
} line;

/**************************************************************************
 -->Listing 5: TMS320C30 C Structure Representing a Filled Polygon
**************************************************************************/
typedef struct
{
    long color;
    long vertnum;                   /* number of vertices */
    long *vertlocn;                 /* array of vertices loc numbers */
} polygon;
*********IHfI-HHfHHf*HlfHHffffll'lHIIHfHHffHHHHlfffHHIHHHH :a:.. 111111111.1111 ........ 11111111 ...... 11 •• 11111111111.11111HHIIIIIIIIIIII.IIII ~ -)Listing 6: 11IS32OC3O CoaunicltioRs ""r05 ';ti ~ ~ ;;I ~ "6 I' t ~ {l ::s~. ~ ~ -)listing 81 TIIS32OC3O .... in CoMInd Execution Loop /.'---------- /. /. ....fi .. Idefinl .... fi .. Htfi •• .defi.. ./ ./ CllllUICATlIIIS IIICftOS TO GSP -----------0/ CIlFREE CTlREQ C1l.ACK CTUlITH IIlSTCNTL 0x0800 0x0803 0x0830 I'hsteotl • 0x00fFF1 IF INTERIill IIIJECTS ./ ./ /' .... fi .. I!I1XOB 10 11---------- /. ---*1 PC CllllUICATllIIlWlTlIIIS ./ /1------------------------------1/ I4tfi •• COI1ANII 1..... Lp.'t • 0x0FF1 Htfi .. IlCKNOII.EIiGE duaLp.,tI11 /.---- DATA R££OIoBl\' FROI1 T1£ DUlL parr Htfi •• 1F0111 .... fi .. 0'1111 H.fi .. 1P2111 dUlLp"tro1 Idefint lP3tal Idefine IFLDNGtal dUlI_porUa + 3] =6.283185308, ..Ica LDI 02h.I(f'II); /. set XFO Ind lSSU" control of 340SDB */ ,. set for zero internl.l lIII.it stiltn on both busts " '«(unsigned long *) 0x809060) =0; I«unsigned long .) Ox808Ob4J =OxlOOO; Instentl =CTLFREE; ,. turn off Iny request to 11tS34010 1/ .fdull-port IE 0; 'I turn off any request from the PC ./ ACKtOLEOOE II 0; /f turn off a.ny ackno.leg.lHnt to the PC 1/ II allocate space for the internal object /0-----------------------------1/ /. retister fl.lt hI.pi I"fgist.r long i, j; ;I -----1/ PlAXIIUI _ ( register long 'hstdat. = 110ng .) 0x805002; ,. 340 host dati register */ .-.gist.r long 'hstcntl -= (Jong f) 0x805003; /. 340 host control registerll dual_port (uAsigned long f) 0x804000; ISI(I (II: OSOOh,ST1J;I* enlbl. c,cbe ., 0x0833 /. /. void Kine) ./ -1/ universe = (struct object to universe; to-)subnu. 
= 0; to-)parent to; "'i1etCOtltAND != 11; ACIOOl.EDGE 1; whil.(CQfNND != 0); load_objedO; I¥XtOLEOOE = 0; IN.trixO; fortH I = = = duaLporUa + 11 dUll-p.rtrl + 21 1I1.0gl 110'1101 • 0x00FF1 « 8 I 11F0111 • 0x00FF1I1 HHfHHHIIIIIIIIIII.IIIIIIIIIIIIHffHHHHfHlffHHHIHHHHHIIHffH II II 1N1Ioc (sizeof(struct objectll; 1* target universe II 1* set universe sibling nuber to 0 1/ /* universal object is its Ollm parent 1/ /1 first cOlIHnd est bf a load object */ II lCknolllledge that c30 is ready 1/ II lIMit for pc to witbdraw request 1/ /* IMd uniVf'rse 1/ /* sbow that dual port is free */ 1* calculate transfor....tion .trix I' 1* infinite loop for PC cHMnd detection*' { HlllllllllllllllfHHHflllllllllllllllllllllllllllllllllllllll1IIIIIIIHUH -)Luti.g 7: TftS32OC3O Gl.bal V... ilbl.5 long k,l; struct object luniverst. Ito. *no; unsigned long IduaLport; union ( fl ••t I, unsigned long i; 1* te.porary lnd looping variables *' II universe. target object. next object 41 /. dual p.rt _ 1* Y&tiable to cOlstruct I. c30 for.t 1* floet fro. intel fOrMt allowing 'I bit lNlIipulation on I float ./ 4' II I' J f11.og, HltHHHHlfHHHHfHHHHlfHfHHHfHffffHtHllHHHHHffH*lHffH ~ VI lIhil.I_=OI, j • COI'ItANII, IlCKNOILEOOE • j, lIIhileUn'WID != 01; .. itch Ijl { c... 11 ./ ./ /f lIIlit for PC to request service If save CHMnd /1 acknowledge request '* lIIIi t for PC to wi thdl'llil request /1 execute requested cOIRnd nUIINr ./ /. LOAD A DAOOHTER OBJECT ./ ./ ./ if tto-)obnua = ""XOBI brelk; 1* abort if) lMXilUlii objects .1 j ++to-)obnul; '* increue lutllber of daughter objects *1 1* Illoclte spice for RftI object 11 to-)objects[jl (struct object I) .11oe (sizeof(struct object I); no = to-)objects[jl; '* next object is daughter object t1 no-)subnul jt II set sibling nUliber of next object *1 no-)parent to; 1* usign current object as no's paret *' to = no; " target daughter object 1oad-objecttI; 'I load da.ughter object 1/ ACICtOl.EDGE 0; II shOll! 
tbi.t dua.l port is free Iltrix(l; It cilculate transfor. Iiltrix ./ break; = = = = = I' I' clSe 21 j lI'UNl121, /. TMIlET A _TER 1II.ET ./ 1* get daughter object n.er to target 1/ = ~ - M:ICNIIII..EIll • 0, / ••h.. thot ....1 port is frtt ./ if (j ) lo-)Obnuol b....k; /. (III only Ilrg.t existing objtel ./ to = to->obj.ct.ljl; /. tvget dlught.r object ./ bNlk; /'TAR6ET_~ cue 3= ~=O; /. Ih.. thll d..1 port is fret to = to-)Plrent, /. $It targeted object to parent ""il.U«ISTCIIll. !=-CTlFREEI; / . . .it for 340 to b. fr.. ibstd&ta • 6; /. enter co_nd for I. SCiolirtf' Ihstcntl • CTlREQ; , .. request SIf'vict f"ol 340 IIfMltUtOSTCNTl != CTlAClO; /1 lAit froa IcknOfliledgtlltnt tII.leoll • CTlWITH, ./ ./ ./ cu. 71 '* +ttWARNIMl+++ /. C41i.CtlATE SCREEN CIXIIIIINAlES ./ sc!"ten cOIIMnd to I' tn. PC user MIst execute • It screen 111 objects tt.at hive been adusted since- the lut JI dro before the next dru. HowYfr, if an object is/. screened all daughter o.bjects ve is wll. ~ 11: 0; shOll that dual port is fret scNtft..o.bjectUo), /. calcucl.te sCretA coordinltes break; 4' /. IEl.ET£ TAR6ET£II O&£CT ./ /. ,hoo thlt r.qu..1 dU&1 po.1 is fr.. ./ if (to = uni.,.".) bro.k;/' don't 11101i1 deletioft- of universe'/ j • to-)subnua + 1; I. get nua.1" of Dlxt sibling 1/ no =to-)par'At; II SIt ntxt object to parent 1/ dllete_objectCto); /f dellt. current object ., to • no; II tltgtt pvtnt object 11 1 =to-}obnulI If filt' totll nUllbtr of siblings 1/ ~ 1/ II II */ break; brtlk; m. / ••llhd..... requllt ./ '; 0; '* *1 ., */ 1/ 1/ deflultl ADOKkEOOE b...Ie; dtcr-eHnt sibling ftuM,r Oft all younger sib) ings 1:/ forCi = j; i <~ 1; ++U -to-nbjectIUl-)subnua; -to-nitA", /1 dec....nt total RuMer of dl'gMtr objects */ c 0; /* Iholll tba.t dual port b ff'te 1/ /1 brflk; :.. ~ ~ ra. ~ ~ 1\ S1 ~. ~ [ ~ itl ~ ~ 5' Io->sx .. to-).y .. 10-)" .. to-)dX - -Ie lo-)dy t= to-)dl .. to->thtll" Io-)phl -Ie t ...>ootgl . . CIS. ~ /. AlWST TAR6ET£II O&£CT dpflOlU21, /. Idju'l sal.. 
dpll ..I(61; 4pll ..I(l0I, dpflOltU41, /. &dju.1 off.ets dpfl ..tU81; dpfIOlI(22I; dpfl ..I(UI, /. &dj ••t 111910. dpflOlII30I; dpfIOlU34I; ~ 0; t* Ihow that dual port is fret ./ ./ HllHHHttHHftHHHHHHHftHlHltHHtHHlltHHlHHHf.HHHfIHH --)listing 9; TrIS32OC3O FIOIting.ofoint Conversion RoutiAt ./ float dpflo.1 (al I'"tgist.r unsigned lo.ng a; theta • faod(to-)ttlet., tltOpU; lo-)phi • food(lo-)Phi , toopil, b-)•• = fMdne-}aeegt, btopi); .trixCto); . /f Nealcul.t. trans-fora .trix brtlk; ,a clSe 6r ~=O; /. _ OOYERSE /* show that dUll port is frH ./ 1/ (DP3(al « 24 Unl.1 • 0x00Ff1 « 16 UI'1(11 • 0x00Ff) 8 (lI'O(l) • 0x00Ff1l; .Ign • (I • 0xS00000001 » 8, ex • ( ( I ' 0.71'8000001 if (sign) ./ 1/ ./ « I, /I.xtract and "'position sign bit / ••xtract exponent ./ ./ ./ /1 (oov.rh to 2/ s cc.pldent ( Knt = (- a) if (Hnt ./ ibstdat. :a 4; It tnter cOlland for I screen clor */ tbltcntl ~ CTlAEQ; /t request service fro. 340 t/ lIIhileOllSTCNTl !.- CllACK)l /t wit fOl' uknfMIIledgtMnt 1/ 'bllenll • CTlWITH; / ••itbd_ req.est ./ /t concatenate 4-byt. value « - Ox3f800000) ""n.O«ISlam. !=-CTlFREEI; / . . .it fo. 340 10 be frtt drULObj.ctlunivers.); -/f dru aniverse /. offl.t fro. 51••1 of d..1 port SlIM ./ ( . /t taitt'S 2~s co.p·ltHnt of ..ntissl *f 0x01OOOOOO; 1* chtcks for input Mntis'" of -2 ./ Ir: 0x007FFFFF; = OJ ex -z else Mnt == a " 0x007FFFfF; a =sign + Mnt + IXJ Fllong.i = at /1 otheMlist leav...ntiSSt 110ne /t reconstruct flN:ting-point fields ./ ./ return fl1on,.f; /t rt"turo reconstructed float ./ ./ HlHHIHfHHHHIHHtHHfIIIIIIIIIIIIIIIIIIIIIIHlHHHHHHHHfH*Ht •~ ~ -llisli., 10' TIIS32OC3O OIIjlci Loading Rouli .. [ ;;I r.,hter 10e Ntist.r 1iM It.,loc; *tapln; "b poly,.. poi.I 10., Ic 10., pi ottIPP" ~ I' i ~ {l ~ 5- .. ~ ~ l!! ( ( rtgist.r 10lllg i,j; r.gisttr struct objKt to; '1_1, =1fLOIIl121, • 1fLOIIl141; I••, I. 10., PI • 1fLOIIl161; =1fLOIIl181; /1 tapor.ry and looping varial.s poi.lor to I ... 
,.t ebjoct /t teaporary location pointer /t tnporary 1iAf poiftter t ..poruy polygon polnlor 'I 'I't " tapor-ary point point.r n"" or coordinate locations /1 nUliber of points /t n••r of liReS nUlb,r of polygons 'f 1/ 1/ 1/ 't ., /t initialize priaitiw nabtts and trlosfor. ,arueters o-)locn.. I: Ie; o-)ptnul • ptl o-)lmu. = lA; o-lpgo" • PII o-)obftUli • -I, 0-l4> • dpIlOlI(22I; .-)IlIttl • dpll ..II34I; o->.y =dpfl ..III41; o-ldy =dpflo.I!261; o->IIM • dpll.,II38I; ., o-)loCl o-)points = Che II .11oe O-)1iMS "' (Sizeof (loe) lid; *) .11oc (sizeof (point ) t pt); *) M,nOC hizeof (line I I: In); (polygon f:) .)Joe hizeoF (polygon) I pg}; = (point "' = 'tforLOAD =lI'TO0, 'Injll46,LOCATJCHIIe; FeR O&£CT ++i, += 12) i ( (i c /f 60) =IfLOIIlljl, lopln-)color ttlpln-)startiocn c Ift..CNI(j + 2); tapln-)endlocn • I:fLCI«I(j + 4): "' _1__Ian, tc 4. ( I_I I~>c.IOI' • " ...".loIsUlI; ,. set t..,.1'OI'Y poiot locati•••, • Il'UllGtj 1\ , . . .I point col.,. ., I_t->IKt • III'UIIBIj + 21\ set topor-Iry lint If: get color "' if (p,1 (i = 0; i vtrtlocnU] = DPUWG(j); taporary location 1/ /_ 1m wrld coordinates 1/ CcCO:'")locsU)), loploc->X =dpfl ..lljl; I..,lo<-)y • Oot,. = dpfl ..t(421 '" ALI.OCATE SPACE FQl O&£CT PRI"1TI1o£S ( ., I =dplloaillOI; o-lsx lore_dih. (); for (i :II 0, j=2 ; i " /t Ht t."get objtct as object for loading I to; 0:11 "' ,. LOAD lI'TO 1364 LII\£S if (lnl ..id 10000000)Kto ,. If! poi.I locali.. ., ffHtHfHffHfIHfHHIHHHHtHflHffHftHlHfHHfHl-ftHHHflHHHHf ~ 00 (k) 32000) k' 32000; ehe if (k (1) 32000) I. 32000; tis. if (1 if if ->Lilti", III TIIS3ZOC3I) <-32000) <-32000) k • -32000; I • -32000, SC.... eo.,dinott Clle.),ti•• Routi .. ./ 1* set lertln coordinat.s ttIlP1oc->t. s: k; void 1C.........jtCt(.) rttilt., ItrllCt o.jtet 10, taploc-)b :& 1; ( ,. toporvy ud looping writb 1., /1 tnporvy location point.r regis-tel' ltft9 itj; register lie ItaplOCI 't t,..orvy object pointtr ttgisttr struct object oftopob; ...,ilt., float X,Y; /. 
CIIIPUlE PARENT IIITRIX /1 if object is universe set parent Nttix to trlnsfor. IIltrix r if '0 - vRints,) =0, i <3; /1 serHn ..11 intel'M) objtcts ++il fo,lj =0; j <4; ./ IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHHHHHHHHIHUHHH ./ ->Listing 12: TJIS32OC3O TrusforJlltion ItI.trix Evaluation RoutiM! ++j) o-)p[ilIj]' o-),[ilIjl; ...ttlx(} { I"tgisttf float /f otbtrwist P Mtrix is product of r Mtrix arrd piJ'ent's p Mtrix elst hr(i == 0; i < 3; ++U ;.. o-)P[iJ[OI • 0-),[01[01 • + o-)r[2][OI o-)p[i][II • 0-)r[011ll • + 0-)r[21111 0-)p[iJ[21 • 0-),[01121 • + 0-)r[21121 0-)P[i1131 = 0-)r[01[3I • + 0-)r[2][31 ~ [ ;;l b I' ~' ~ o@ ~ ~' .. ~ " :! 1/ cost, sint; float coso, sino, register .tract objtct *0; C05P~ /f transfer. t...orary sinp;I'vt.riables ./ II { ~ ~ ./ j = o-)obnu, for fi =0; i (11 j; ++il scrttlLobj.ct(o->Obj.ctsU]); /. set toporery object.to Pl.rtnt object ./ tlllPob • o-)porent; fo,H I' /1 co-ordindt flolting point values 1/ /1 Ind perspective constant 1/ floot I,d, ( */ ./ t..,0b-)p[iII01 + o-)r[ll[O] • t ..pob-)p[ilIU • t..,ob-)p[i1121, tlllPob-)pUIIOI + 0-)r[l1l1l • t..,ob-)p[illll • t ..pob-)p[i1121; t..,0b-)p[iII01 + 0-)r1ll[2] • t ..pob-)plillll • t ..pob-)p[i112]; teopob-)pIil[OI + 0-)r[ll[31 • t..,ob-)plil[ll • t"pob-)p[i112] + ttopob-)p[iJ[31; 0= to; cost· cos(o->tbttaJ; SiAt =sinCo->thetl}; coso· C05(0-). .,I,); sino sin(o-)OHgI,); cos, II: = cos(o-)phi); sin, • 5in(o-)phi l; o-)r[O](O] -= o-}r[O][1] 0-)5X ., ./ /. get nUlber of locations 1/ /. CIIIPUlE SCR£EII COORDINATES j = o-)IO(,u_; for (i • 0; i < j; ++i) ( teiploc II ,. set t.poruy location l!'o-)locdiU; 'I' llYe ,1 . .1 coordinates X" ttllPloc->X; *' ./ y ~ te!aploc-)y; z = te.,-loc-n; ,. calculab Z valut, 1441 offset of 5, and invert for ,.rspectiw ., d = I/(x • o-)P[21101 + y • 0-)P[2][I] + _ • 0-)P[21121 + 0-)P[2l[31 + 10); ,. cllcullt. transforMd x and y, add perspective, and sCil. 
to screen" k = n.ng) (Ix' 0-}J[01l01 + Y • o-)P[OHII + z ••-)p[OIl21 + 0-)p[01l31) • d • 200); ) = n ••g) (Ix' o-)p[l][OI .+ Y • o-)pllllll + _ • 0-)P[1J[21 + 0-)p[l][3]) • d • 200), /. clip· to • 16 bit int.ger ./ cost =- o->sy f 0-}r(0][2] = 0-)5Z f I sint coso; I coso; sino; 0-}r[0][3] == (o-)dx I cost - o-}dy f sir.t) f coso + o-}dz f sino; o-)rU][O] =o-}sx • (sint f cosp + cost .. sino * sinp); o-)r[1][1] = O-)Sf I (cost f cosp - sint f sino. sinp); 0-},,(1)[2] =- o-)sz t coso. sinp; 0-)r[1][3] = (Co-)lIIx f cost - o-}dy f sint) f sino - o-}dz * coso) • sinp +- (o-)dx • sint + o-)dy • cost) • cosp; 0-)r(2J[O] =o-)sx • (sint .. s.inp - cost. sino I cosp); o-)r(2](1l z 0-)5Y I (cost. sinp + 5int • sino' tosp); 0-)r(2](2] =o-)u • coso. cosp, 0-)1'(2][3] = 1(- o-)'x. cost + o-)dy' sint) .. Soino + o-Xtz • C050) • cos, + (o->dx' sint + o-)dy • cost) • sin,; HKfHfHHfHfHHffffHfHHHHHHfHHfHfHfHfHfffffHfHfHHHHflf ~ tl ~ r ~ fffffffHftHfHHffffHfff**IfHfffftHIHff:llfffiHffHHHHfHHfffHHfH UHf.I*ff*****nHHIIIIIIIIIIIIIIHfHHHfuHHHHHfHHtH+HffHflHHf --)lishng 13: TJItS32OC3O Object Deletion Routine --}Usting 15; TI1S320C30 Object DI'.lllling Routine void delete_object (0) registtl' str-uet object *0; VOId drllll_object (0) { ,. ttltporary. 
looping variables *1 register long i, j; ~ free (o-)Iocs); /1 ~elete location array free (o->ooints.); /1 delete point array fret (o-)lines)~ /* de lett line array j =o~ )pgnllll; /1 get nUlier of po Iygons for Ii = 0; i {= j; ++i) frte lo-}polygonstil.vertlocn); /1 delete for Ii =0; i (= j; +til Frte lo-)polygonsl; /1 polygons j =o-)obnulI; /1 get nulllber of daughter objects for Ii =0; i (= i;++U delete_object(o->tbjects[il); /1 delete objects free (0); /* delete object ~ 6 I" t reglSter struet object *0; { 1/ */ 1/ *1 1/ 1/ 1/ II 1/ register register o01nt register polygon register tE!glSter register 1/ 1/ ./ 1/ 1/ 1/ ., II 0' ( c:'.l flffffHftHftHHHllfHHffHUHHfHHfffffffHlHflHIHtfHffHffHHHI ~ HHHHtHHHHtHHHHHtIHfIHlIIHIHfHfHHlHfHHfffHfHHfHHHf whilt II«ISTCNTL ~= CTLFREEI, Ihstdata = 123; ll'tstcntl =CTlREQ; *hstdi.ta = j; forListing 141 Tl'tS320C30 Request for Additional Utta in Objtct Load tupln = &:lo-)1ines[i]); Ihstdlta =hlpln-)coior; fhstdata =o-}locs[telplR-)startlocnl.a; *hstdata =o-)locs[topln-}startlocnl.b; Ihstdata II o-)locs[te.pln->tndlocnl.a; Ihstdata =o-)Iocs[tellpln->tndlocnl.b; ( ACIlisting 17: TI'634010 Line Structure ( = 0; i ( 1; forCi /f draw polygons Hi) ( toppg = &:(o-)polygons.[i]); j = tsppg-}vertnua; wa.it ''I*, '* If till 340 is free send (GllEnd to drakl object IIhilt UfOSTCNTL != CTLFREE); request snvice from 340 fnstdah. = 5; /1 send nu.ber of points fnstcntl = CTlREQ; /f send p9ints fbstdah. = tellppg-)color; fnstdata = j; *' *, *,*' "*, *'*' send color If send nUlber of vertec!s typtdtf struct 1* LII'£ H ( short color; /1 short xl; If short yl; short ><2; /1 short y2; /f } line; ,* line color x co-ordinilte of starting point Y co-ordinate of starting point x co-ordinate of end point Y co-orllinate of fno point "*, *'" " Hff. . . . fflffHH.H4H.HHHHIHHf***** .....*f-H•• HfHHHH. . .UHffHfHf /f Sfno point connect list (0,1 , 1,2 , 2,3 .... 
j-2,j-l , j-l,O 1/ = 0; forek =.1; k ( j.; ++kl fhstdata ffffl4f+1UHf.fHfffUHffHHHHHffHHfHfffUHffUffHfffffHffHHHfH ( +hstdata Ihstdata = k; *hstdah = k; --)Listing = 0; 10n(l color(16) /f send vertex location list for!k = 0, k j, .HI II < ( ~ ~ ~ r[ c:"l 3 ~ 5' ~ ~ II IIHtllHHHfHfltHtHflHfHtttHtHfHtHHfIHllffflltfflHlffffHllllfl1 tIIIhi1@(HQSTCNTL != CTLACK1; /1 wait for' 340 to acknolweodge requestfl ~ ~ ~ ={ ceo, eCl, CC2, CC3, CC4, CC5, CU, C(7, cca, et9, etl0, CCll, et12, CCB, CC14, CC15}; HIH**II**I**IHlflHHHlfffflttl**IHHffHfHffttfHffffHIIHHfffHIIHf =&:(o-)locs(te.ppg-)v@ftl'lcn(k))); /1 save point thstdah = tellploc-)a; *hstdah = hllploc-)b; tellpl'lc t;::, 18; TMS34010 C'llor Af'f'ay fhstcntl = CTLWITH; If Ifithdralll request 1/ I-)Listing 19: lltS34010 Col'lr Palette short lIIypaletrlbl = { '* *' IJlAII IVf( DAUGHTER OBJECTS j = o-)obnua; II g!t dalJghter objects II for Ii = 0; i (= j; Hil dralf_objectlo-)obj!ctsUl); HfffftHIHHHHtHIHHfffHHltHHHfHffHHHHtflHHfHHffHfffHIH ff**HffHffHHfffHHIHfffffftHfHffHffffffffHHffHHHffHffHfHfffff --)Listing 16: TI'tS34010 Point StrlJcture '* typedef struct short color; short Xt short y; } DOlnt; If point color If x (o-ordinate II Y co-ordinat~ POINT *, ., *'*' HfffHlfHffHHHffffHHlffffffffHllHfffflHttfHHffHffffHfHfHffffff Oxoooo, OxFOOO, OxOOFO, OxFOfO, OxOFOO, OxFFoo, OxOFFO, OxFFFO, OxOAFO, 0x0900, OxFA70, 0xF4AO, Ox178O, Oxb6bO, Oxmo, OxBB8O J, HHffHHt*HflfffHfHH4HfHHIHlfffffIHffffftf+tfHfHfHHfflffHlfflf ~ tl ~ ~ a ~ ~ t[ c:':l a ~ ~. ~ ~ ffffHfIHf**ffHfHHfHffHHffHHHHffHfHHHttHHHHHffHfHHHHf --)LlShng 201 TMS34010 Main COlINno Execution Routine ( reglSter line 'templn; re~llster point ttellppt; regISter short tellpint; /f t.aporary line pointer /f te.porary point pointer 'I get RUllber ofII /1 points II 'f t ••portry integer /f looping variable If pointer to line arra.y registtr short i; 1/ 1/ 1/ hne flines; 1/ tooint .points; /f pointer to point array 1/ short fhstadrh, Instadr I, f hstctll, ",uabt,. 
1 *pgnulII, .pointer 1 adrl, adrh; f«short I) Ox04000(00) = 0x0001; 1* turn 031'1 shadolll 1'0 1/ t«short *l OxCOOOOOBOl Ie= Ox7FFF; If ena.ble cache *1 hstctll = Ishort *1 = (sllort tl 1* host control register 10111 byte 1/ /f host address register high IdOrd *1 1* host addttss register low lIord 1/ pOInter (short II OxFFFOOOOO; /1 pointer to beginning of shadow rill 1/ lines = (line II (OxFFFOOO20I; /1 starting point of line array 1/ points (point II (OxFFFOOO2O); /1 starting pOInt of point array 1/ D{lnUIi (short II (OxFFFOOO20); /flocation of nUlber of polygon vertecesfl nUllifier (short Il (OxFFFOOOI01; II nUlllber of prilitivts to dralll */ adrl = (short I «(Iongl pointer) It OxOOOOFFFFI; adrh = (short) ««(long) pointer) » 16) It OxOOOOFFFFI inlt-video!!); /1 configure for a r£C I'U..TISYNC, non-interlaced, 60Hz I' inlLgraflx(); " initialize graphics enviroJtlfnt f' Init-screenl); /1 initialize screen II inlLvuoort(); /f initialize vieliling window fl seLorigin(320,2401; If place origin at center of screen II *hstadrh adrh; 'I reset start dati address II Ihsh.drl = adrl; Ihstctll 0; / .. 
turn off any cOllllli.nd to tlte 340 II for In) hstadrh OxCOOOOOFO; OxCOOOOOEO; hstadrl = (short II OxCOOOOODO; = = = = !lihUe Clnstct11' != OXOOO3); = 0x0030, ",t'2, t ..p1n->y2), Instadrh = adrh; *hstadrl adrl; II reset stut data lddress II Ih,tetll II turn off any ' ....nd to the 340 II = =0, telppt s: lr:(pointsHl); set_col ort (col or[teappt-)co lor]); drallll_point( toppt-)x, temppt-)yl; I' ( ~ /1 MAW POINTS 1/ ( IMinO = = - break; case 11 telpint = InUliber; for Ci=O~ i < te.pint; ++il II save point 'I set colors II draw point 'II I' 1/ = = = II' reset start data address II *nstadrh adrh; *hstadr I adr 1; II turn off any cOllMnd to the 340 II Ihstctl1 0; break; /1 SET SCREEN JlACI1 = 2, DATASIIJRTI181 = 3, DATASIIJRTI201 = " llATASIIJRTl221 = 3, llATASIDlTl241 = 0, DATASIIJRTI2bl = DATASIIJRTl2S1 = 4, DATASIDlTl301 = 5, llATASIIJRTl321 = DATASIIJRTl341 =5, IlATASIDlTl3l>1 =b, llATAS!IlRTI381 = " llATASIDlTl401 = b, llAT_Tl42) = 7, IlATAS!IlRTI441 = IlATASIIJRTI461 = 7, llATASIIJRTl481 = 4, DATASIIJRTlSOI = " llATASIDlTl521 = 0, llATASIDlTl541 = 4, llATASIIJRTl5I>1 = IlATASIIJRTI581 = 1, llATASIIJRTlbOl = 5, llATASIOlTl/,21 = llATASIIJRTll>41 = 2, DAT_TlObI = b, DATASIIJRTlbSl = " IlATASIIJRTl701 = 3, DAT_Tl72I = 7, Whl le IACKNCIl.EOOE != 1>; If Mit for C30 to rfSUM- INding II CCfIW/D=O, II shOM no requests data- )otnul = B; dati-)dtnua = 0; '* HfffHffHIH-HHfffHffHHtffHffHfffHffffHHftHKHftHHHHfHHHH t ~. 'I, 1* tuget 1st daughttr object 1/ /f COlINnd to target daughter object 1/ /f wit for C30 to ackno.l ege requestf/ /f withdraw request 1/ *' -1, -1, -1, -1, -1, -1, _=0, '* =" *' c, c, c, c, c, *' fflfHfIIHfHfttH+Hlfffl-HtHHHfHfHfIH+ffHtHfHHtHHHfHftfHffH $ l ••••••••• II................................ II •• IIIIIIIIIIIHHHHItHftHHH ->LiJliIIt 301 PC ""iR._iot to Dr• • 'Pl ...t&rySyst.. of' CIIbt. adj••t ..... j.ctll. ,I, ,I. ,0, ,0, ,0., .4,0. ,j}.I; Ilrttt-Pll'tftlll; torJOl-Pl.ontl I; .iln . 
41rt1uc....U, IcrHA..o~jtct()1 ( N.ilftt ilt XI ...1_1 • Ichlt.l - . /. lKlti•• of d." ,..t .... dall • II.... '1

TMS320C30 Applications Board Functional Description

Figure 5. Single Write-Cycle Timing Diagram (signals: H3, H1, ADDR, STRB, REN, ROWSEL, MADD, RDY, WR, RAS, CAS, DATA)

Figure 6. Single Read-Cycle Timing Diagram (signals: H3, H1, ADDR, STRB, RAS, ROWSEL, MADD, CAS, RDY, DATA)

Expansion Interface

The APPB's two expansion connectors contain the signals from the TMS320C30 expansion port, serial ports, flag pins, etc. Each 50-pin connector (P3 and P4 of Figure 7) is composed of a dual row of 25 pins located on 0.1-inch centers. These expansion connectors provide easy connection to other hardware via standard 50-wire flat ribbon cable. Figure 7 shows the orientation of the connectors. See schematic sheet 7 of Appendix C for pinout details.

Figure 7. TMS320C30 Applications Board (board drawing with PIN 1 markings)

Dual-Port SRAM Interface

All communications between the TMS320C30 and the host occur through the dual-port SRAM, which is 4K bytes deep, with 8 dedicated semaphore registers. On the host side, the dual-port memory array is memory-mapped, while the semaphores are I/O-mapped. On the TMS320C30 side, the dual-port SRAM is located on the expansion bus with the memory array mapped from 0x00804000-0x00804FFF and the semaphores mapped from 0x00805FF8-0x00805FFF.
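Because the TMS320C30 performs only 32-bit accesses and the upper 24 bits of a word read from the dual-port SRAM are undefined, a 32-bit value has to be split across four consecutive locations and reassembled with masking. The following is a minimal sketch in portable C of that idea, not the routines of Appendix A. The macro and function names (`DPRAM_MEM_FIRST`, `dp_put_long`, `dp_get_long`, and so on) are hypothetical, the address constants are simply the ranges quoted in the text, and the low-byte-first ordering is an assumption chosen to suit an Intel-format host. An ordinary array stands in for the dual-port memory so that the code can be tried on any machine.

```c
#include <stdint.h>

/* Address ranges quoted in the text (TMS320C30 view); the names are hypothetical. */
#define DPRAM_MEM_FIRST 0x00804000UL   /* first location of the memory array */
#define DPRAM_MEM_LAST  0x00804FFFUL   /* last location of the memory array  */
#define DPRAM_SEM_FIRST 0x00805FF8UL   /* first semaphore register           */
#define DPRAM_SEM_LAST  0x00805FFFUL   /* last semaphore register            */

/* Write a 32-bit value as four byte-wide locations, low byte first (assumed order). */
static void dp_put_long(volatile uint32_t *dst, uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        *dst++ = (value >> shift) & 0xFFu;      /* only bits 7-0 are meaningful */
}

/* Read four byte-wide locations and rebuild the 32-bit value.
   The mask discards the undefined upper 24 bits of every word read. */
static uint32_t dp_get_long(const volatile uint32_t *src)
{
    uint32_t value = 0;
    for (int shift = 0; shift < 32; shift += 8)
        value |= (uint32_t)(*src++ & 0xFFu) << shift;
    return value;
}
```

For example, after `dp_put_long(buf, 0x12345678UL)` the four locations hold 0x78, 0x56, 0x34 and 0x12, and `dp_get_long(buf)` returns 0x12345678 even if the upper 24 bits of each location contain arbitrary values. On the real board the pointer would refer to a location inside the memory-array range rather than to a local array.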
The host can directly access the dual-port SRAM without having to compensate for byte-wide access limitations. However, because the TMS320C30 can perform only 32-bit accesses, the upper 24 bits of a data word are undefined. The TMS320C30 must therefore format data written to and read from the dual-port SRAM. A software example is given later in this report.

While dual-port SRAMs provide an excellent means for multiprocessor communications, a certain amount of software overhead is required to coordinate data flow. There are numerous methods for coordinating data flow; this application report presents a set of primitives that have been developed to form a basic communications protocol. The primitives are written entirely in C and have been tested on the XDS1000 with the simple test routine provided. The method shown in this report is not the best for all applications; it is simply a method that makes good use of the capability of the dual-port SRAM.

The following are basic ideas of the communications protocol developed for this applications report.

1) The dual-port memory is broken into eight equal segments. The first segment is used only for control structures and command passing. The remaining seven segments are used entirely for data passing. Segment size is set to 512 bytes. The number and size of segments can be changed at compile time if desired.

2) Each of the seven data segments is totally independent from any other data segment. However, only one processor can own a particular segment at any given time. The TMS320C30 and host can simultaneously access the dual-port SRAM as long as both are not trying to access the same segment.

3) The host is the master; the TMS320C30 is the slave. The TMS320C30 polls the dual-port control segment to determine if the host has deposited a command.
If a command is present, the TMS320C30 executes the command and then returns to polling.

4) Only the first semaphore register is used in the dual-port. Each processor uses this semaphore to gain access to the control segment. Access to the seven data memory segments is coordinated via the control structures, not the semaphores.

5) There are seven control structures in the control segment, one for each data segment. Each control structure consists of 22 bytes and is defined as follows:

Byte     Name       Definition
0        pflag      Buffer present (i.e., being used)
1        command    Command to execute
2        buf_stat   Status of the data buffer
3        nc         Reserved
4-7      count      Number of 32-bit words to transfer
8-11     addr       TMS320C30 address to read/write data
12-21    message    Ten bytes reserved for message passing

Appendix A contains routines for the communication primitives used by the host and the TMS320C30. Appendix A1 contains routines for the PC side, Appendix A2 routines for the TMS320C30 side. Note that the routines on both sides have the same names and perform essentially the same function. Appendix A3 contains a memory map and description (TMS320C30 view).

After the code has been compiled, use the following sequence to execute the test program:

1) Reset the XDS1000:

   xreset [RETURN]
   c30reset [RETURN]

2) Get into the emulator and load the TMS320C30 dual-port code.

   emu30               load emulator
   xr                  reset the c30
   lo 'file name'      load the object file
   xd                  execute disconnect
   [esc]               escape to main menu
   q 'yes' [RETURN]    quit emulator

   At this point, your dual bus code should be executing and waiting for a host input.

3) Execute host dual-port code.

   'file name'

   The host code will then print the numbers 0 through 25 to the screen.

Conclusion

This report has provided basic functional details of the TMS320C30 APPB. Because of their complexity, the DRAM and dual-port SRAM interfaces have been discussed.
The features of the TMS320C30 allow it to encompass a wide range of interfaces. The TMS320C30 bank-switch mode and continuous strobe signal on back-to-back read cycles overcome traditional DSPIDRAM problems of interface difficulty and limited processor address space. A set of communications primitives routines to use with dual-port SRAM have been provided in Appendix A. These routines are written in C for ease of understanding and modification to meet individual needs. TMS320C30 Applications Board Functional Description 481 Appendix A TMS320C30 Application Board Routines, Memory Map and Description Al A2 A3 482 TMS320C30 Application Board Routines - PC Side TMS320C30 Application Board Routines - TMS320C30 Side Memory Map and Description (TMS320C30 View) TMS320C30 Applications Board Functional Description l*'"ftf*tfffH**U***UHHfHUf*Hff*fHffU*UHffHHHff**ffHfH*fHHfl ~ \'I> ~ ~ tv C a C ~ ~ ~. 50 i:l ~ $:l a ~ ;:: ~ [ ,*,,* ,-,-''t* ,*,* ,*,* ,*,* ,*,* APPENDIX Al TItS32OC30 APPllCATlOO BOARD ROOTII£S - PC SIDE Texas Instrulltnts Inc. 10'25'89 Functions; int int int int lnt int int tnt rt ,.,.'* /. ,.,* ,.,- APPB_resetO APPB-dpinlt() APPB-getsell() APPB_relselllO APPB_9.tctlblkll APPB_relctlblkO APPB_get...blkll APPB-putaellllblkil *,*'*, 1* /ltffU**f*fl-ffHHIHf***f***ffIHffffffH***,*HlfHffHf***f"*fffff+H***/ <"l ,* ,* §. /- Constant definitions for the T/'IS320C30 Applications Board. .j>o. *,t'*, IfH******HHf************"********llfflfflIfIHIIHIH*IIHII*****,ffffHf/ outport inoort SEII-BASE I'IAP-REG CTLREG • deflne idefine idefine 'o.fine #define .define CINT XlNTCLR_ JlPSEL SWRESEL lINT CINTCLR_ ~def1l\e MJIANI( .define .define .define tdthne .dlfinl lffiIILSlZE DPRAlLBLKS DPRAlLBlLSIZE NII1..SEIIS tlALSElLTlI1E Oxlooo 7 512 8 10000 .define .dehne BUF..EIIPTY .define .deflne tdeHne NIlP OXOO HOSTJ'IEII..IoIl HOST_I1EM_RD 0xB0 0xB1 ty.. d.f typedef typedef unsigned chir LOIAR; uns I gned snort UINT; unSlgntd long LlONG; typedef struet > "0 "0 t!) _. = c:l. 
BUF..FULL ~ > ~ ~ s= 00 tN N ( UC/lAR I.OIAR I.OIAR UC/lAR lLINl ULOHG I.OIAR lOPC/UL, pflag; cOlIINnd; buLstat; nCr count; addr; aessageU01; <= <1 tN <= > -...._._. "0 "0 (') ~ jl*ff****+lI*HffHl**"HffHHtfH**fffl*****fHf***H-HfHHfHoHHifffflf/ Idefine .define .deflne #define 'definE' 00 W *,"' *,*' *,*'t' *' *'*'*' *'*' *'*' ., ~ '" Reset APPB Irltialize APPB. Ott iccess to sellipnore bit N Release acctss to stophore oi t N Get i control block in rFRAft Relns-t control block in f.IPRAI'I Get a to Iock of INory from IPRAM Put a block of melory to DPRAI1 t' t' All code lAS cOlpiled lIIith I'Ilerosoft C compiler version 5.1 using tlH! II large lIol1el. If snll IIIOdl1 is used, then pointers used to access the 1/ dual port SRAtt would have to be declared and used as ~far/ pointers Il.e. 32-bit pOinter). Under the large modtl. all pointers are defaulted to 32 bits. ilnclude (stdio.h) ~o *,t' outp inp Ox0330 0x0338 0x0339 0x01 0><02 Ox04 0><08 OxIO 0><20 Ox4Q 0><80 Ideflne HSWAP Idefine .define Idefine IIPRAILCTL 0xC9000000 JlPRIlILSEG 0xC9 DPRAIU1EIIBASE 0xC9000200 0 = t:= C"Il 0 ~ "'1 c:l. ~ ....==_. 0 = t!) t <1 _. 00 c:l. ~ ItfUHH**fHHHHHfHHHfHHHHHff,*fHHHIHHHHHHfflfHffHtH/ i H ,.,- ,'* ,-,.,- Test progra•. seQuence: ~ I "r 1tt i bloCk ~f Itt.ory to the dua I port. 2) Rtid back the block of dita. fro. the dual port. -,W-, -, -,*' W -, /fHHflHlllff,*IfHfHftfHfHfHII******HHHfHfHffffltfH**********H/ JtfffHHHUHftHHHfHHtHHtfffHfnftfHfHIfHHfHHffH....fHHHH/ ,. W /f APPi..rtsttO ,PC side ,-,,-,'*'* ''** Reset APPB. Sequence: U Cltar control "tgister. 2) Stt SHSEL to I. 1/ -, *'-, -,*' *' *' *' /HfHHHHfflHt+HHHfHHHIotfftfHof*tH*****ttt-ft-H****ffIHHHHfHH/ HlnO int APPILreset{ J lII~T ( st.nUl[~RAI'LBlKSl; outpOl't (CTLREG, 0); int i; ULONG outportlCTLREG. 
SWRESET _); returnlOI; IItlirray[251,lU2arriy[ZSl; APPB_dpinttl; /HfUUHH*HfH*H*UHHHHHUIUtIUHH+HHU**HfU*lHHHHfH*H/ ~ ~ ~ N C Q c :i hrH=O;1<25;i++) {HlIlrray(i] = (Ul.oMiJi; fIflI2array[i] = OUL;) i fIAPPB_putr"llllb 1k(2Sll, MHrray, OxOO8099OOJ) printH Dfai ltd Mllory write\n"); if (APPB_getltelBb 1k12&.l, OxOO8099OO,.ta2arri,yl ) prlntf{UfailtQ IrItflOry f'ud\nn); for{i=O;i{25;i++) printfl"valut read tXlttoh Xd\n",Ite~rri)'[i]); ,. '* ,-'* '* ,-'* ''** APPB_dpintO, PC side Sequence; II 1) Set IPRAf'I stMpbores to 1 Ifree). 2) Set DPRM IIlPping register. 3) Stt IfRAIt globll fnUl. bIt t. 1. /fHfHHHHHfIHIIIIIIllllllllf*ffI'lHfffU*HHHfHU****ItHHHUfHU/ i.t lII'PII_dpi.tIl ( int 1.; UINT ....dd' =SEILBASE, lCHAR fdprn = ; 1!lse retur'o(O); r ~ ~ N ~ ~ ~ i:r... ~. ra. ~ t ~ ~. §' /*ffH".HfHffHfftHt-HHfffHlfHftHfHfHHfHfHHHfHHHfHHHlHf/ A ,. ,.,. ,.,. ,.,. ,. ',.* " ., APPlLpul...blkO, PC side Write block of ....6ry to tbe dual port. i 0 if successful, i. -1 if failed. Return Sequence U Find free block of dual port b write lMao-ry. 21 Write the IItlory. 3) Write HIIory paruetus to control block. A .,., .,.,., ., .,.,., " /111111111 ••• IHHIIIIIIIII •• lllftIHHtffHHHHHfHffHHfHHHHHfHH/ /fHiHfftffHffHffHffHfffHHfHHfHiiHHHffofHHfHfffHfHffHHffH/ A '* '* '* '* '* '* '* ,t ,. '* 't'* Al'f'B..geloeoblkO, PC ,ide RHd block of lIBOry to tbf dual port. Return I 0 if successful, a -1 If ftiled. Sequence 1) Find free block of dUll port for lOory. 2) Write IItllOry plf'ueters to control block. 3) Wait for DIS32OC30 to put requested reeaory into the dual port. 41 Read data f,.. tilt du.l "",t, 5) Release block of dIAl port ulIOry. 
'* int APPB_putee.blk(cClt.stc,dstl l.l.fHicnt: lJl.CNj fsre; LtONil d.l, DPCNTl UlONG UINT tnt /UHHtHHHHHIIIIIIIIIIIIIIIHHH+fffHHfHH4Hf:lfffH:tfHHHH:lHH/ int APPB_gft"lblk(cnt.src.dst) 1l.(Hi cnt; t.lCt«1S1'e; UI..ON3 fdst; *docll = Ilf'CNTL *llll'RAll..CTL, Idpru; .pblk, i; if(APPB_getctlbikCltdpblk» return 1-1 I; DPCNTL *dpctl = IDPCNTL 'IDfRArl..CTL, lI..ONG fdpru; UINT dpblk, int i; UINT Ii...ut =tlALStlLTlIIE, returBl-ll, 'p'" = WL(I1;tHDPRJlIUEnB41SE + Idpblk * DfRArI..IIU<..Sll£lI, ifIAPPB..getcllblkl~dpblkll for "0 "0 pflog, c...ad; hoi_sIll, of. WANt 1/ CINT XIIITCUL IlPSEI. SIIEET_ JlNT· CIIITCUL struct WtIIR WtIIR WtIIR WtIIR WtIIR UCHAR WtIIR )IIPCIITL, jl 14th... 14th .. lyot4tf ( -/ /- aR5ifM'e cltu' WtIIR, uRsi9aH short UINT; uDsilMd loftg. t.l.IJ&, */ A)) coat .,.,. ,.-pUH with TIIS32OC3O C COitfU I". vtorsion 2.1, usin, the 1/ 1/ ...11_1. Constlat dtfiaiti.". for tile tJtS32OC3O AppHedioas. 8ov4. 1"'4t1 typo4tf typ.4tf 1/ t/ IltffHtHHHHHffHtHHfHfHHHlHflIHHtHHfIHHHHHffHHHHfHI /- 1IlST..I£JUII 1IlST..I£IUID *' block .f ......y t. _ Rood • long lit fr •• tilt _ Read .. cOMlIId and Watters fro. - IQ' Hlfi .. Hlfi .. Hlfi .. Iblk, .cId, kAt, III.ddrl ('Da: ~~ ~~ ~n N CN == O~ ="0 00 .... - .... n Q"I» tD ::t. Q = I'll t:C Q I» a /HH. .UHfllffHftHHHfffHHHHfHHftHftHI ••• IIIIIII ... II ........ ,III/ ~ ~ ~ N Q c i ~r ~. rit i H ~ /t /t Tut ,'-091'_, TftS320C30 Sldt. /f /f St41J.ACt: ;t 11 In1h"hzt ta. tnt' 4al port SRAI't. 2) Poll dUll ~Ol't fOt' c... ndl~ 3) Extcut. C..utJI U ,lIIcount.,.td. /f /f /f *I /f Sequtnce l *' f/ /. 0/ /. /. t! f/ 1) Stt lWW'I SfMfh'Of't5 to 1 (fret), 21 Cltll' Inhrt dtN.l port RAIt. ./ ./ ./ IfHHHff-HH4HHH+HHHfIHKHftHHH4HfHHfHHHHHfHtfHtHHHI int APf'I).4I1ntl) { Mnt) int i; lO¥iA fftfN.ddr • (t.QtAR flsat.8ASE; ltHAR tdp,... = (LOW!: .)IfMtLCTl; ( int I; I'PI\RftS ep&1'1K; forH=Ol i<8l i++l fJtMddr++ :I: 1; f.r(ioO,i(l)ffiAlLSIZE;I++1 .4pr..... N'PL4p ..t( I, fot(" I rtturntOll ( Al'fL9ttC.....(Iopa... 
" switcb'.,.tU.IICM) ( i wt'" s /. f/ ~ 11f'f'ILdpintli. TIIS32OC3O .i4o. /f ./ 0/ CUt M)); i'§. H (f f/ ItffHIfHIIHfHHftHHHffHHHfH.,II ......... , ••••• IHHHI. . ."HHHI/ ~ [ t/ t/ /HtfHHHHHfHHIHHfftHt. . .tHftflHlHfffHHffHtffffH+tftlttfHfH/ btHk; mt HOSTJlElLIoR' I1fI'B..got_lk(.,..... lICnt •.".,......44r .......... lkl, CIS. HOST..I£ILRD' Af'PLput_lk(.,.,.... o \kit till read 1. ./ ./ If /. /. /. /f ,.,. /. ~ ~B_getstl(), +, llIS32OC3O- side Atttapts to gain .CCtSS of stMph6r-e "stanya' / ,Sequence ./ ./ ., 11 Write 0 to selN.phor-e. 2) Wiit till rud a O. .1 ~ Q c t ~ §. ~. ~ a ~ ~ [ ~ q '6. §' ,. ~lMphore at_ IseanuI" ~ ~ /. H ./ ./ .1 IHHHfHffHfHfHHHtHfHHHfffHfHHfHfHfHHHfHfHHHHHftfHf/ int APPB-getsea( seanu.) UINT stinUl; lnt APPB_relseaCselDnull UINT silma; t stllnull; tSeMddr- :: 0; Ifhile('lsellador If lULl; returnWl; ~ . ~ APPB-relseaO, Tt1S320C30 side /. /fflfHHH+HHHf'll+HHfflHfHfHfHffHHfHHHHHf****fHHHfHHtH/ UCWIR fselliddr· :: (OCHAR f)(SEtLBASE ~ ~ . ./ ./ ~ *stwddr :: H£HAR f)(SEI'lBASE + sflflnua); tselli.ddr = 1; .,hile(!(fselMddr- &: lUL»; r-eturntOl; ~ ~ ~ N C Q 1*******HfHf***"******ffH********HfH*Uf*******fH+fHHffHHffff*******/ /f**H********HHfHfHffff****HHftff***flfffffff****ffffffff**ffff"H****/ ~ ~ ~ *' /* '* /* APPB_getctltlkO, T11S320C3(l slde. ~ /. ;. /. /* ;. Sequence *' /* ;* II ~ /. 21 If block free, set /* 3) 2 ~. ~ !:l $Eoarctl control structures for free block Else, SEllnUil return -1 Hailed to hnd blocki. <, <, 'I '* Releasi block of IfIill'lory In the dual port. Returo a 0 If successful, o! -1 if failed. if ,. ,.,.'* If Sequince Null out the control structure. Returo. int APPB_reldlbHdsellnuli DPCNTL i5- g- ~ i* /+ /f APF'B-gttl ol'lgO , Tf'lS32OC30 sid!!. /*HHHHlffHtfHffHHHHHHftHHfHffHftHffHHHffHfHfHfHfHUfl H ~ */ +/ /+ /+ /+ / fHHHf************fHfffHfHHHHI-HHHfHHHf*****f**Hff******HHIf/ 1+ lnt APPB_9ttlon91src,dstl /+ /+ /+ +/ Get i long lIIord of data fUIII the dUi.l port. 
LlLON(; Isre: ULONG fdst; f~l'\j;J)~j(32;j+=S) retundOi: lost := (('5rt++) "Qx!XI()OOOffl « j; Starch the (lual port control structures for cOfllMnds. *'+/ /+ /+ Get access hi dual port stlN.phore O. 2) If at end of control structures, reset currenLblk. 3) Seuch control structures for a. cOlltind. 4) If found, fOl'lrIit parUteters, return. /. 5) */ +/ +/ +/ +/ 11 Elst, starch to the tr,d of hst, return. H 1+ /HfffHHfHHfHfHHHHUHtHHtfH****HHHH******************t****H/ iot APPB_gttco..and(lIIpults) MPARI'IS "par·ls; DPCNTl fdpctl = (oPCNTL tlDPRArLCTL; sh.hc int cu ... r-~nLblk = -1; APPB_g~tse .. (ol ; if((urrtnLblk )= DPRAPLBLKS) currerlt_olk = -1: IiIhile(currtnLblk++ (rf'RAtLBlt(S) if(dpctl[currtot-blkl.pflag &: lUll IIIl)arIllS-)II;CIII<3 = dpctJ(curr·ent_bJl::l.collllind & OxOOOOOOff; Klparllls->mbik = current-ttlk; APPB_getl oog(&:dpctl [curr·tot_OJ k]. count ,&:aparills-)acnt); AP?B_9tt 1oog (tcdpct 1(currenLb 1U. ",ddr, &rJpurJs-)li.ddr); APPB_relst'/I"I(O); returrdO); APPB_rtlstm(OJ; mparllls-)lIcmd t..., */ +/ */ +/ StQuence 1+ lnt jl fdst = (Ill; H APPB_getcoltWnd 1), 111S32OC30 5 ide. =NOP; f·eturn(O); APPENDIX A3. Memory Map and Description (TMS320C30 View) Listed below is a summary of the APPB memory map. 
Address Range      Description
000000-003FFF      EPROM (Boot EPROM/remappable)
004000-3FFFFF      Unused
400000-4FFFFF      DRAM space
400000-43FFFF      256K-word DRAM minimum configuration
440000-47FFFF      256K-word DRAM minimum configuration
480000-4BFFFF      256K-word DRAM option bank 2
4C0000-4FFFFF      256K-word DRAM option bank 3
500000-7FFFFF      Unused
800000-801FFF      SRAM space 1 (16K-byte zero wait-state SRAM)
802000-805FFF      Reserved by TI
804000-805FFF      I/O Devices
804000-804FFF      4K-byte dual-port SRAM
805000-805FF6      I/O Expansion Bus
805FF7             Control Register R
805FF8-805FFF      Dual-port RAM Semaphores (D0 only)
806000-807FFF      Reserved by TI
808000-8097FF      Memory mapped Peripherals
809800-809BFF      RAM Block 0
809C00-809FFF      RAM Block 1
80A000-EFFFFF      Unused
F00000-F03FFF      SRAM space 0 (16K-byte zero wait-state SRAM, remappable)
F00800-FFFFFF      Unused

Appendix B
Modules

Appendix   Name
B1         Module U5 - TMS320C30 Software Development Board
B2         Module U6 - TMS320C30 Software Development Board
B3         Module RAMDEC - TMS320C30 Software Development Board
B4         Module RDYEN - TMS320C30 Software Development Board
B5         Module RAMCONTROL - TMS320C30 SWDS DRAM Module
B6         Module RAMDEC - TMS320C30 SWDS DRAM Module

Appendix B1.
TMS320C30 Software Development Board

Module U5
title 'DWG NAME   TMS320C30 SOFTWARE DEVELOPMENT BOARD
       DWG #      2554377
       COMPANY    TEXAS INSTRUMENTS INCORPORATED
       ENGR       NATSESHAN
       DATE       10/01/88'

XSUC8     device 'P20L8';

SA0       Pin 1;      "PC XT ADDRESS LINES - INPUTS
SA1       Pin 2;
SA2       Pin 3;
SA3       Pin 4;
SA4       Pin 5;
SA5       Pin 6;
SA6       Pin 7;
SA7       Pin 8;
SA8       Pin 9;
SA9       Pin 10;
NSMEMW    Pin 11;     "PC XT MEMORY WRITE STROBE
GND       Pin 12;
NSMEMR    Pin 13;     "PC XT MEMORY READ STROBE - INPUT
NSIOW     Pin 14;     "PC XT IO WRITE STROBE - INPUT
NSGBA     Pin 15;     "SDB READ STROBE - OUTPUT
NPQ       Pin 16;     "DUAL-PORT ADDRESS RANGE STROBE - INPUT
XAEN      Pin 17;     "PC XT BUS TRANSACTION DISABLE - INPUT
NRG       Pin 18;     "SDB CONTROL REGISTER R ENABLE - OUTPUT
NQG       Pin 19;     "SDB DUAL-PORT ADDRESS LATCH ENABLE - OUTPUT
NDPSEML   Pin 20;     "DUAL-PORT SEMAPHORE SELECT - OUTPUT
NDPCEL    Pin 21;     "DUAL-PORT SRAM CHIP ENABLE - OUTPUT
SGAB      Pin 22;     "HOST DATA BUS INPUT ENABLE - OUTPUT
NSIOR     Pin 23;     "PC XT IO READ STROBE - INPUT
VCC       Pin 24;

SA = [SA9, SA8, SA7, SA6, SA5, SA4, SA3, SA2, SA1, SA0];
X  = .x.;

equations

!NQG     = !XAEN & (SA == ^h338);
!NRG     = !XAEN & (SA == ^h339);
!NDPSEML = !XAEN & SA9 & SA8 & !SA7 & !SA6 & SA5 & SA4 & !SA3 & !NSIOW
         # !XAEN & SA9 & SA8 & !SA7 & !SA6 & SA5 & SA4 & !SA3 & !NSIOR;
!NDPCEL  = !XAEN & !NPQ;
SGAB     = !NSIOW & !XAEN
         # !NSMEMW & !XAEN;
!NSGBA   = !XAEN & !NSIOR & (SA == ^h339)
         # !XAEN & !NSIOR & SA9 & SA8 & !SA7 & !SA6 & SA5 & SA4 & !SA3
         # !XAEN & !NSMEMR & !NPQ;

end U5

Appendix B2.
Module U6

Module U6
title 'DWG NAME  TMS320C30 SOFTWARE DEVELOPMENT BOARD
       DWG #     2554377
       COMPANY   TEXAS INSTRUMENTS INCORPORATED
       ENGR      NAT SESHAN
       DATE      10/01/88'

XSUF10 device 'P20L8';

CIOA0     Pin 1;
CIOA1     Pin 2;
CIOA2     Pin 3;
CIOA3     Pin 4;
CIOA4     Pin 5;
CIOA5     Pin 6;
CIOA6     Pin 7;
CIOA7     Pin 8;
CIOA8     Pin 9;
CIOA9     Pin 10;
CIOA10    Pin 11;
GND       Pin 12;
CIOA11    Pin 13;
CIOA12    Pin 14;
TIOW      Pin 15;
NSRANGE   Pin 16;
CIORNW    Pin 17;
NFR       Pin 18;
NFG       Pin 19;
NDPMEMGR  Pin 20;
NDPSEMGR  Pin 21;
TIOR      Pin 22;
NCIOSTRB  Pin 23;
VCC       Pin 24;

X = .x.;
C = .c.;

CIOA = [CIOA12, CIOA11, CIOA10, CIOA9, CIOA8, CIOA7, CIOA6,
        CIOA5, CIOA4, CIOA3, CIOA2, CIOA1, CIOA0];

equations

!NSRANGE  = !NCIOSTRB & !CIOA12
          # !NCIOSTRB & (CIOA >= ^h1FF7);
!NDPMEMGR = !NCIOSTRB & !CIOA12;
!NDPSEMGR = !NCIOSTRB & (CIOA >= ^h1FF8);
!NFG      = !NCIOSTRB & !CIORNW & (CIOA == ^h1FF7);
!NFR      = !NCIOSTRB & CIORNW & (CIOA == ^h1FF7);
!TIOR     = NCIOSTRB # (CIOA >= ^h1FF7) # !CIOA12 # !CIORNW;
!TIOW     = NCIOSTRB # (CIOA >= ^h1FF7) # !CIOA12 # CIORNW;

test_vectors
([CIOA, NCIOSTRB, CIORNW] -> [TIOR, TIOW, NSRANGE, NFG, NFR, NDPMEMGR, NDPSEMGR]);

"READ OR WRITE TO A SEMAPHORE
[^h1FF8, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FF9, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FFA, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FFB, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FFC, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FFD, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FFE, 0, X] -> [0, 0, 0, 1, 1, 1, 0];
[^h1FFF, 0, X] -> [0, 0, 0, 1, 1, 1, 0];

"WRITE TO F REGISTER
[^h1FF7, 0, 0] -> [0, 0, 0, 0, 1, 1, 1];

"READ FROM F REGISTER
[^h1FF7, 0, 1] -> [0, 0, 0, 1, 0, 1, 1];

"NCIOSTRB DISABLED
[X, 1, X] -> [0, 0, 1, 1, 1, 1, 1];

"EXTERNAL READS
[^b1000000000000, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000001, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000010, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000011, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000100, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000101, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000110, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000000111, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001000, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001001, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001010, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001011, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001100, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001101, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001110, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^b1000000001111, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF0, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF1, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF2, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF3, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF4, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF5, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];
[^h1FF6, 0, 1] -> [1, 0, 1, 1, 1, 1, 1];

"EXTERNAL IO WRITES
[^b1000000000000, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000001, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000010, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000011, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000100, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000101, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000110, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000000111, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001000, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001001, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001010, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001011, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001100, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001101, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001110, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^b1000000001111, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF0, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF1, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF2, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF3, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF4, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF5, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];
[^h1FF6, 0, 0] -> [0, 1, 1, 1, 1, 1, 1];

test_vectors
([CIOA12, NCIOSTRB, CIORNW] -> [TIOR, TIOW, NSRANGE, NFG, NFR, NDPSEMGR, NDPMEMGR]);

"DUAL-PORT SRAM READ OR WRITE
[0, 0, X] -> [0, 0, 0, 1, 1, 1, 0];

end U6

Appendix B3.
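Module RAMDEC selects between the boot EPROM, the zero wait-state SRAM, and the DRAM expansion space, with the m_swap bit exchanging the EPROM and SRAM windows. The following Python sketch restates the RAMDEC equations of this appendix; the names ramdec and madd_of are hypothetical helpers introduced only for illustration, and outputs are modeled as 0/1 levels (0 = selected for the three active-low selects).

```python
def madd_of(address):
    """Upper 12 address bits (a23..a12) of a 24-bit TMS320C30 address."""
    return (address >> 12) & 0xFFF


def ramdec(madd, m_swap):
    """Illustrative model of the RAMDEC PAL equations (hypothetical helper).

    Returns [sram, eprom, memen, busen] as pin levels.
    """
    low = 0x000 <= madd <= 0x003     # 000000-003FFF
    high = 0xF00 <= madd <= 0xF03    # F00000-F03FFF
    sram = 0 if (low and m_swap) or (high and not m_swap) else 1
    eprom = 0 if (low and not m_swap) or (high and m_swap) else 1
    memen = 0 if 0x400 <= madd <= 0x4FF else 1   # DRAM space 400000-4FFFFF
    # busen = !(!eprom # !memen), which is low whenever EPROM or DRAM
    # is selected.
    busen = 1 if (eprom and memen) else 0
    return [sram, eprom, memen, busen]


if __name__ == "__main__":
    for address, swap in [(0x000000, 0), (0x000000, 1), (0x400000, 0)]:
        print(hex(address), swap, ramdec(madd_of(address), swap))
```

With m_swap = 0 the EPROM answers at 000000-003FFF and the SRAM at F00000-F03FFF; with m_swap = 1 the two windows trade places, which matches the comment table in the listing.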
Module RAMDEC

module RAMDEC
title 'DWG NAME  TMS320C30 SOFTWARE DEVELOPMENT BOARD
       DWG #     2554377
       COMPANY   TEXAS INSTRUMENTS INCORPORATED
       ENGR      TONY COOMES
       DATE      10/01/88'

XSUB4 device 'P16L8';

a12     Pin 1;    "c30 address inputs
a13     Pin 2;
a14     Pin 3;
a15     Pin 4;
a16     Pin 5;
a17     Pin 6;
a18     Pin 7;
a19     Pin 8;
a20     Pin 9;
a21     Pin 11;
a22     Pin 13;
a23     Pin 14;
m_swap  Pin 15;   "sram/eprom swap bit
vss     Pin 10;

memen   Pin 18;   "dram expansion select
sram    Pin 17;   "sram select
eprom   Pin 16;   "eprom select
busen   Pin 12;   "eprom/dram data buffer select
vcc     Pin 20;

madd = [a23, a22, a21, a20, a19, a18, a17, a16, a15, a14, a13, a12];

equations

"On reset the eprom and sram maps are swapped
"          m_swap = 0        m_swap = 1
"sram      F00000-F03FFF     000000-003FFF
"eprom     000000-003FFF     F00000-F03FFF

sram  = !(((madd >= ^h000) & (madd <= ^h003) & m_swap)
        # ((madd >= ^hF00) & (madd <= ^hF03) & !m_swap));
eprom = !(((madd >= ^h000) & (madd <= ^h003) & !m_swap)
        # ((madd >= ^hF00) & (madd <= ^hF03) & m_swap));
memen = !((madd >= ^h400) & (madd <= ^h4FF));
busen = !(!eprom # !memen);

test_vectors
([madd, m_swap] -> [sram, eprom, memen, busen])
[^h000, 1] -> [0, 1, 1, 1];
[^h000, 0] -> [1, 0, 1, 0];
[^h004, 1] -> [1, 1, 1, 1];
[^hF00, 1] -> [1, 0, 1, 0];
[^hF00, 0] -> [0, 1, 1, 1];
[^hFF0, 1] -> [1, 1, 1, 1];
[^hF00, 1] -> [1, 0, 1, 0];
[^h400, 0] -> [1, 1, 0, 0];
[^h4CF, 1] -> [1, 1, 0, 0];
[^hB00, 1] -> [1, 1, 1, 1];

end RAMDEC

Appendix B4.
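Module RDYEN generates the data buffer enables and the EPROM ready signal. The registered prdy output feeds back into its own equation, so during a sustained EPROM read it alternates between 1 and 0 on successive clocks. The Python sketch below steps that one registered equation through the prdy test vectors of this appendix. The names next_prdy and run are hypothetical, and the power-up value prdy = 1 is an assumption made only for this illustration.

```python
def next_prdy(prdy, strb, busen, rd_wr, eprom, bhiz):
    """Next registered value of prdy (hypothetical helper).

    Models prdy := !(!busen & !strb & rd_wr & prdy & !eprom & bhiz),
    with every signal given as a 0/1 pin level.
    """
    pull_low = ((not busen) and (not strb) and rd_wr
                and prdy and (not eprom) and bhiz)
    return 0 if pull_low else 1


def run(vectors, prdy=1):
    """Clock next_prdy once per vector; prdy=1 at start is an assumption."""
    trace = []
    for strb, busen, rd_wr, eprom, bhiz in vectors:
        prdy = next_prdy(prdy, strb, busen, rd_wr, eprom, bhiz)
        trace.append(prdy)
    return trace


if __name__ == "__main__":
    # (strb, busen, rd_wr, eprom, bhiz) taken from the prdy test vectors;
    # the clk and oe columns are omitted because the model is stepped per clock.
    eprom_read = [
        (1, 1, 1, 1, 1),
        (0, 0, 1, 0, 0),
        (0, 0, 1, 0, 1),
        (0, 0, 1, 0, 1),
        (0, 0, 1, 0, 1),
        (1, 0, 1, 0, 1),
        (1, 0, 1, 0, 1),
    ]
    print(run(eprom_read))
```

The second vector holds prdy at 1 because bhiz is low, which is the bus turn-off cycle described in the note that accompanies the equations.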
Module RDYEN

module RDYEN
title 'DWG NAME  TMS320C30 SOFTWARE DEVELOPMENT BOARD
       DWG #     2554377
       COMPANY   TEXAS INSTRUMENTS INCORPORATED
       ENGR      TONY COOMES
       DATE      10/01/88'

XSUC3 device 'P16R4';

clk      Pin 1;
busen    Pin 2;    "eprom/dram data bus enable
eprom    Pin 3;    "eprom select
strb     Pin 4;    "c30 strobe
rd_wr    Pin 5;    "c30 read/write
bhiz     Pin 7;    "dram expansion bus hold
oe       Pin 11;
vss      Pin 10;

dat_rd   Pin 19;   "data read enable
dat_wr   Pin 18;   "data write enable
prdy     Pin 17;   "eprom ready
epromcs  Pin 12;   "eprom chip select
vcc      Pin 20;

c = .C.;

equations

"note: bhiz is active for 1 TMS320C30 clock cycle at the end of a dram
"      access. This provides the necessary turn off time between
"      dram/eprom accesses.

dat_rd  =  !(!busen & !strb & rd_wr & bhiz);
dat_wr  =   (!busen & !strb & !rd_wr & bhiz);
epromcs =  !(!busen & rd_wr & !strb & !eprom & bhiz);
prdy    := !(!busen & !strb & rd_wr & prdy & !eprom & bhiz);

test_vectors
([clk, strb, busen, rd_wr, eprom, oe, bhiz] -> prdy)
[c, 1, 1, 1, 1, 0, 1] -> 1;
[c, 0, 0, 1, 0, 0, 0] -> 1;
[c, 0, 0, 1, 0, 0, 1] -> 0;
[c, 0, 0, 1, 0, 0, 1] -> 1;
[c, 0, 0, 1, 0, 0, 1] -> 0;
[c, 1, 0, 1, 0, 0, 1] -> 1;
[c, 1, 0, 1, 0, 0, 1] -> 1;

test_vectors
([strb, busen, rd_wr, eprom, bhiz] -> [dat_rd, dat_wr, epromcs])
[1, 1, 1, 1, 1] -> [1, 0, 1];
[0, 0, 1, 1, 1] -> [0, 0, 1];
[0, 0, 0, 1, 1] -> [1, 1, 1];
[0, 1, 1, 1, 1] -> [1, 0, 1];
[1, 0, 1, 1, 1] -> [1, 0, 1];
"check eprom
[1, 0, 1, 0, 1] -> [1, 0, 1];
[0, 0, 1, 0, 1] -> [0, 0, 1];
[0, 0, 1, 0, 0] -> [1, 0, 1];
[0, 0, 0, 0, 1] -> [1, 1, 1];
[0, 1, 1, 0, 1] -> [1, 0, 1];
[1, 0, 1, 1, 0] -> [1, 0, 1];

end RDYEN

Appendix B5.
Module RAMCONTROL

Module RAMCONTROL
title 'DWG NAME  320C30 SWDS DRAM MODULE
       DWG #     2554397
       COMPANY   TEXAS INSTRUMENTS INCORPORATED
       ENGR      TONY COOMES
       DATE      10/01/88'

XDUE5 device 'P16R8';

clk refre[output,ren])
[c, 0, 0, 1, 0, 1 ]->[idle, 1];
[c, 0, 1, 1, 1, 1 ]->[ras0, 1];
[c, 0, 1, 1, 1, 1 ]->[cas0, 1];
[c, 0, 1, 1, 1, 1 ]->[cas1, 1];
[c, 0, 1, 1, 1, 1 ]->[cas0, 1];
[c, 1, 1, 1, 1, 1 ]->[cas1, 1];
[c, 1, 1, 1, 1, 1 ]->[trp , 1];
[c, 1, 1, 1, 1, 1 ]->[ref1, 1];
[c, 1, 1, 1, 1, 1 ]->[ref2, 1];
[c, 1, 1, 1, 1, 1 ]->[ref3, 1];
[c, 0, 1, 1, 1, 1 ]->[ref4, 1];
[c, 0, 1, 1, 1, 1 ]->[idle, 1];
[c, 0, 1, 1, 1, 1 ]->[ras0, 1];
[c, 0, 1, 1, 1, 1 ]->[cas0, 1];
[c, 0, 1, 1, 1, 1 ]->[cas1, 1];
[c, 0, 1, 1, 1, 1 ]->[cas0, 1];
[c, 0, 1, 1, 1, 1 ]->[cas1, 1];
[c, 0, 0, 1, 1, 1 ]->[trp , 1];
[c, 0, 0, 1, 0, 1 ]->[idle, 1];

test_vectors "write cycle
([clk, refreq, strb, rd, memen, be ]->[output, ren])
[c, 0, 0, 0, 0, 1 ]->[idle, 1];
[c, 0, 1, 0, 1, 1 ]->[ras0, 0];
[c, 0, 1, 0, 1, 1 ]->[cas0, 0];
[c, 0, 1, 0, 1, 1 ]->[whld, 0];
[c, 0, 1, 0, 1, 1 ]->[whld, 0];
[c, 0, 1, 0, 1, 1 ]->[whld, 0];
[c, 0, 0, 0, 1, 1 ]->[idle, 1];
[c, 0, 0, 1, 0, 1 ]->[idle, 1];

"write cycle /ref
[c, 0, 0, 0, 0, 1 ]->[idle, 1];
[c, 0, 1, 0, 1, 1 ]->[ras0, 0];
[c, 1, 1, 0, 1, 1 ]->[cas0, 0];
[c, 1, 1, 0, 1, 1 ]->[whld, 0];
[c, 1, 1, 0, 1, 1 ]->[ref1, 0];
[c, 1, 1, 0, 1, 1 ]->[ref2, 0];
[c, 1, 0, 0, 0, 1 ]->[ref3, 1];
[c, 0, 0, 1, 0, 1 ]->[ref4, 1];
[c, 0, 0, 1, 0, 1 ]->[idle, 1];

end RAMCONTROL

Appendix B6.
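The DRAM module's RAMDEC PAL turns address bits a19 and a18 into one of four registered RAS selects, and drives all four low while refclr is low. The Python sketch below restates the ras0-ras3 equations of this appendix; next_ras is a hypothetical helper name, and the returned list gives the levels the registers would take on the next clock.

```python
def next_ras(refclr, memen, strb, a19, a18):
    """Next registered [ras0, ras1, ras2, ras3] levels (hypothetical helper).

    Models rasN := !(!refclr # (bank N selected & !memen & !strb)),
    where the bank number is the 2-bit value a19:a18.
    """
    if not refclr:
        # refclr low drives all four RAS lines low together.
        return [0, 0, 0, 0]
    access = (not memen) and (not strb)   # both inputs are active low
    bank = (a19 << 1) | a18
    return [0 if access and bank == n else 1 for n in range(4)]


if __name__ == "__main__":
    for inputs in [(1, 0, 0, 0, 0), (1, 0, 0, 1, 1), (0, 0, 1, 1, 1)]:
        print(inputs, next_ras(*inputs))
```

Because each bank is 256K words, a19:a18 = 00 corresponds to 400000-43FFFF in the board memory map and 11 to 4C0000-4FFFFF.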
Module RAMDEC

module RAMDEC
title 'DWG NAME  320C30 SWDS DRAM MODULE
       DWG #     2554397
       COMPANY   TEXAS INSTRUMENTS INCORPORATED
       ENGR      TONY COOMES
       DATE      10/01/88'

XDUD5 device 'P16R4';

clk     Pin 1;
refclr  Pin 2;    "clear refresh stat
a18     Pin 3;    "c30 address 18
a19     Pin 4;    "c30 address 19
memen   Pin 5;    "dram board memory enable
strb    Pin 6;    "c30 strobe
mux     Pin 7;    "address mux
oe      Pin 11;   "pal output enable
vss     Pin 10;

ras0    Pin 17;   "ras select 0
ras1    Pin 16;   "ras select 1
ras2    Pin 15;   "ras select 2
ras3    Pin 14;   "ras select 3
rowsel  Pin 13;   "row address select
vcc     Pin 20;

c = .C.;

equations

ras0 := !(!refclr # (!a19 & !a18 & !memen & !strb));
ras1 := !(!refclr # (!a19 &  a18 & !memen & !strb));
ras2 := !(!refclr # ( a19 & !a18 & !memen & !strb));
ras3 := !(!refclr # ( a19 &  a18 & !memen & !strb));

rowsel = mux;

test_vectors "page mode read, ref, page mode read
([clk, refclr, memen, strb, a19, a18, oe]->[ras0, ras1, ras2, ras3])
[ c, 1, 1, 1, 0, 0, 0]->[ 1, 1, 1, 1 ];
[ c, 1, 0, 0, 0, 0, 0]->[ 0, 1, 1, 1 ];
[ c, 1, 0, 0, 0, 1, 0]->[ 1, 0, 1, 1 ];
[ c, 1, 0, 0, 1, 0, 0]->[ 1, 1, 0, 1 ];
[ c, 1, 0, 0, 1, 1, 0]->[ 1, 1, 1, 0 ];
[ c, 1, 1, 0, 1, 1, 0]->[ 1, 1, 1, 1 ];
[ c, 1, 0, 1, 1, 1, 0]->[ 1, 1, 1, 1 ];
[ c, 0, 0, 1, 1, 1, 0]->[ 0, 0, 0, 0 ];
[ c, 1, 0, 1, 1, 1, 0]->[ 1, 1, 1, 1 ];
[ c, 0, 0, 0, 1, 1, 0]->[ 0, 0, 0, 0 ];
[ c, 1, 0, 0, 1, 1, 0]->[ 1, 1, 1, 0 ];

test_vectors "rowsel
(mux -> rowsel)
1 -> 1;
0 -> 0;

end RAMDEC

Appendix C
TMS320C30 Application Board Schematics

Appendix   Title
C1         TMS320C30 Software Development Schematics
C2         TMS320C30 SWDS DRAM Module Schematics

Appendix C1. TMS320C30 Software Development Schematics
[Schematic drawings. The artwork did not survive text extraction; only the legible title-block and note text is summarized here.]

TMS320C30 Software Development Board, drawing 2554377 (Texas Instruments). Sheet contents: title page; PC XT connectors, buffers, and drivers; dual-port SRAM and expansion bus control circuitry; expansion bus 16K x 32 SRAM; TMS320C30 busses; TMS320C30 timer signals, 12-pin interface, and pull-up packages; address bus and control signal buffers; data bus drivers; target connectors and memory expansion connectors; parallel bus memory expansion; parallel bus control circuitry and EPROM; decoupling capacitors.

Appendix C2. TMS320C30 SWDS DRAM Module Schematics

[Schematic drawings.] 320C30 SWDS DRAM Module, drawing 2554397 (Texas Instruments). Sheet contents: DRAM control; DRAM bank 0; DRAM bank 1; DRAM bank 2; DRAM bank 3; expansion headers; address mux.

Notes printed on both title sheets (unless otherwise specified):
1. All AS, ALS, and LS devices are prefixed with an SN74.
2. VCC is applied to pin 8 of all 8-pin ICs, pin 14 of all 14-pin ICs, pin 16 of all 16-pin ICs, pin 20 of all 20-pin ICs, etc.
3. Ground is applied to pin 4 of all 8-pin ICs, pin 7 of all 14-pin ICs, pin 8 of all 16-pin ICs, pin 10 of all 20-pin ICs, etc.
4. Device type, pin numbers, and reference designator of gates are shown on each symbol.
5. Resistance values are in ohms.
6. Resistors are 1/4 watt, 5%.
7. Capacitance values are in microfarads.
• 1A 8 CA le (:142 5 [3.4.5.1>] AS257 ----cA'9 'C:A"1 ~.:?-:~. ____ ._2~_~_~ ! 1! " 2',,' 3't 4" 16_ SEL 12 oc llAl ;;l '" ~ ~ tv C - AS257 CAB CA 12 ----cA4 CAi:;:: -- .:~. '"E ~-1§.' ..£----- ~~-­ 2A? '3 2B 2'1 ------- S ~. __LA 14 314:3 --::-- [3 BB 4A 8',;' I 4 0:. 2 MA 5 ....!!'i.e tlA 4 2 MA Eo r--~--- RH2 12 4" -- 7 ' SEL C oc ~ UA2 '15 2 §' "" ~ a. ~ ;::: CAE. --CATSCA?- i:AT'6-- ~ ..CA 17 I. 2.,. AS257 1 'l 3~ 3B 4A ROWSEL Q .~ 33 .! I •• SEL 4'1112 I 4~ ~ n 2B 3A ~-~RH3 oc o· llA3 ADDRESS HUX ~ ~ ~ ~. §' ~:--__~_,- ..._..._.._... _ .:~Al:'88 S>~:L:;~~;~a,~~~t::?~' BB . TMS320 Bibliography Since the TMS32010 was disclosed in 1982, the TMS320 family has received an ever-increasing amount of recognition. The number of outside parties contributing to the extensive development support offered by Texas Instruments is rapidly growing. Many technical articles are being written about TMS320 applications in the field of digital signal processing. The following articles and papers have been published since 1982 regarding the Texas Instruments TMS320 Digital Signal Processors. Readers who are interested in gaining further information about these processors and their applications may obtain copies of these articles/papers from their local or university library. The articles are broken down into 12 different application categories. Articles in each category are in reverse chronological order (most recent first). Articles having the same publication date are shown in alphabetical order by authors name. The application categories are: 1) General Purpose DSP 2) Graphics/lmaging 3) Instrumentation 4) Voice/Speech 5) Control 6) Military 7) Telecommunications 8) Automotive 9) Consumer 10) Industrial 11) Medical 12) Development Support General Purpose DSP 1) R. Chassaing, "A Senior Project Course in Digital Signal Processing with the TMS320," IEEE Transactions on Education, USA, Volume 32, Number 2, pages 139-145, May 1989. 2) P.E. Papamichalis, C.S. 
Burrus, "Conversion of Digit-Reversed to Bit-Reversed Order in FFT Algorithms," Proceedings of ICASSP 89, USA, pages 984-987, May 1989.
3) P.E. Papamichalis, "Application, Progress and Trends in Digital Signal Processing," Proceedings of Mikroelektronik Conference, Baden-Baden, March 1989.
4) R. Chassaing, "Adaptive Filtering with the TMS320C25 Digital Signal Processor," Proceedings of 1989 ASEE Conference, USA, pages 215-217, 1989.
5) P.E. Papamichalis, R. Simar, Jr., "The TMS320C30 Floating-Point Digital Signal Processor," IEEE Micro Magazine, USA, pages 13-29, December 1988.
6) K. Rogers, "The Real-Time Thing (Digital Signal Controller)," Electronic Engineering Times, USA, Number 506, page 85, October 1988.
7) P.E. Papamichalis, "Impact of DSP Devices on Fast Algorithms," Proceedings of the 1988 IEEE DSP Workshop, USA, September 1989.
8) G. Umamaheswari, C. Eswaran, A. Jhunjhunwala, "Signal Processing with a Dual-Bank Memory," Microprocessor Microsystems, Great Britain, Volume 12, Number 4, pages 206-210, May 1988.
9) G. Castellini, P. Luigi, E. Liani, L. Pierucci, F. Pirri, S. Rocchi, "A Multiprocessor Structure Based on Commercial DSP," Proceedings of ICASSP 88, USA, Volume V, page 2096, April 1988.
10) M.R. Civanlar, R.A. Nobakht, "Optimal Pulse Shape Design Using Projections onto Convex Sets," Proceedings of ICASSP 88, USA, Volume D, page 1874, April 1988.
11) L.J. Eriksson, M.C. Allie, C.D. Bremigan, R.A. Greiner, "Active Noise Control Using Adaptive Digital Signal Processing," Proceedings of ICASSP 88, USA, Volume A, page 2594, April 1988.
12) G. Mirchandani, D.D. Ogden, "Experiments in Partitioning and Scheduling Signal Processing Algorithms for Parallel Processing," Proceedings of ICASSP 88, USA, Volume D, page 1690, April 1988.
13) P. Papamichalis, "FFT Implementation on the TMS320C30," Proceedings of ICASSP 88, USA, Volume D, page 1399, April 1988.
14) A.C. Rotger-Mora, "An N-Dimensional SIMD Ring Architecture for Implementing Very Large Order Adaptive Digital Filters," Proceedings of ICASSP 88, USA, Volume V, page 2140, April 1988.
15) J. Santos, J. Parera, M. Veiga, "A Hypercube Multiprocessor for Digital Signal Processing Algorithm Research," Proceedings of ICASSP 88, USA, Volume D, page 1698, April 1988.
16) R. Simar, A. Davis, "The Application of High-Level Languages to Single-Chip Digital Signal Processors," Proceedings of ICASSP 88, USA, Volume D, page 1678, April 1988.
17) K. Bala, "Running on Embedded Power. (Dedicated 32-Bit Microprocessors Used in New Microcontrollers) (Technology Trends: Microprocessors and Peripherals)," Electronic Engineering Times, USA, Number 478, page 34, March 1988.
18) J. Cooper, "DSP Chip Speeds VME Transfer," ESD: Electronic Systems Design, USA, Volume 18, Number 3, pages 47, 48, 50, 51, March 1988.
19) L. Vieira de Sa, F. Perdigao, "A Microprocessing System for the TMS32020," Microprocessing Microprogramming, Netherlands, Volume 23, Number 1-5, pages 221-225, March 1988.
20) G. Wade, "Offset FFT and Its Implementation on the TMS320C25 Processor," Microprocessing Microsystems, Great Britain, Volume 12, Number 2, pages 76-82, March 1988.
21) R. Chassaing, "Digital Broadband Noise Synthesis by Multirate Filtering Using the TMS320C25," Proceedings of 1988 ASEE Conference, USA, pages 394-397, 1988.
22) R. Chassaing, "A Senior Project Course on Applications in Digital Signal Processing with the TMS320," Proceedings of 1988 ASEE Conference, USA, pages 354-359, 1988.
23) L.N. Bohs, R.C. Barr, "Real-Time Adaptive Sampling with the Fan Method," Proceedings of the Ninth Annual Conference of the IEEE Engineering in Medicine and Biology Society, USA, Volume 4, pages 1850-1851, November 1987.
24) T. Kimura, Y. Inabe, T. Hayashi, K. Uchimura, K. Hamazato, "Dual-Chip SLIC Using VLSI Technology," Conference Record of GLOBECOM Tokyo '87, Volume 3, pages 1766-1770, November 1987.
25) W.S. Gass, R.T. Tarrant, T. Richard, B.I. Pawate, M. Gammel, P.K. Rajasekaran, R.H. Wiggins, C.D. Covington, "Multiple Digital Signal Processor Environment for Intelligent Signal Processing," Proceedings of the IEEE, USA, Volume 75, Number 9, pages 1246-1259, September 1987.
26) L. Johnson, R. Simar, Jr., "A High Speed Floating Point DSP," Conference Record of MIDCON/87, USA, pages 396-399, September 1987.
27) K.S. Lin, G.A. Frantz, R. Simar, Jr., "The TMS320 Family of Digital Signal Processors," Proceedings of the IEEE, USA, Volume 75, Number 9, pages 1143-1159, September 1987.
28) S.L. Martin, "Wave of Advances Carry DSPs To New Horizons. (Digital Signal Processing)," Computer Design, USA, Volume 26, Number 17, pages 69-82, September 1987.
29) C. Murphy, A. Coats, J. Conway, P. Colditz, P. Rolfe, "Doppler Ultrasound Signal Analysis Based on the TMS320 Signal Processor," 27th Annual Scientific Meeting of the Biological Engineering Society, Great Britain, Volume 10, Number 2, pages 127-129, September 1987.
30) G.S. Kang, L.J. Fransen, "Experimentation With An Adaptive Noise-Cancellation Filter," IEEE Transactions on Circuits and Systems, USA, Volume CAS-34, Number 7, pages 753-758, July 1987.
31) R. Chassaing, "Applications in Digital Signal Processing with the TMS320 Digital Signal Processor in an Undergraduate Laboratory," Proceedings of the 1987 ASEE Annual Conference, USA, Volume 3, pages 1320-1324, June 1987.
32) D.W. Horning, "An Undergraduate Digital Signal Processing Laboratory," Proceedings of the 1987 ASEE Annual Conference, USA, Volume 3, pages 1015-1020, June 1987.
33) D. Locke, "Digitising In The Gigahertz Range," IEE Colloquium on Advanced A/D Conversion Techniques, Great Britain, Digest Number 48, 10/1-4, April 1987.
34) S. Orui, M. Ara, Y. Orino, E.
Sazuki, H. Makino, "Realization of IIR Filter using the TMS320," Resident Reports of Kogakuin University, Japan, Number 62, pages 195-204, April 1987.
35) R. Simar, T. Leigh, P. Koeppen, J. Leach, J. Potts, D. Blalock, "A 40 MFLOPS Digital Signal Processor: The First Supercomputer on a Chip," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 1, pages 535-538, April 1987.
36) R. Simar, "TMS320: Texas Instruments Family of Digital Signal Processors," Proceedings of SPEECH TECH 87, USA, pages 42-47, April 1987.
37) G.Y. Tang, B.K. Lien, "A Multiple Microprocessor System For General DSP Operation," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 2, pages 1047-1050, April 1987.
38) L. Vieira de Sa, "Second MicroProcessor Enhances TMS32020 System," EDN: Electronic Design News, USA, Volume 32, Number 9, pages 230-232, April 1987.
39) T.J. Moir, T.G. Vishwanath, D.R. Campbell, "Real-Time Self-Tuning Deconvolution Filter and Smoother," International Journal of Control, Great Britain, Volume 45, Number 3, pages 969-985, March 1987.
40) R. Simar, M. Hames, "CMOS DSP Packs Punch of a Supercomputer," EDN: Electronic Design News, USA, Volume 35, Number 7, pages 103-106, March 1987.
41) S. Sridharan, "On Improving the Performance of Digital Filters Designed Using the TMS32010 Signal Processor," Journal of Electrical and Electronic Engineers of Australia, Australia, Volume 7, Number 1, pages 80-82, March 1987.
42) R. McCammon, "Software Routine Probes TMS32010 Code," EDN: Electronic Design News, USA, Volume 32, Number 4, pages 200, 202, February 1987.
43) J. Prado, R. Alcantara, "A Fast Square-Rooting Algorithm Using A Digital Signal Processor," Proceedings of IEEE, USA, Volume 75, Number 2, pages 262-264, February 1987.
44) T.G. Vishwanath, D.R. Campbell, T.J. Moir, "Real-Time Implementation Using a TMS32010 Microprocessor," IEEE Transactions on Industrial Electronics, USA, Volume IE-34, Number 1, pages 115-118, February 1987.
45) R. Chassaing, "Applications in Digital Signal Processing with the TMS320 Digital Signal Processor in an Undergraduate Laboratory," Proceedings of 1987 ASEE Conference, USA, pages 1320-1324, 1987.
46) R.M. Sovacool, "EPROM Enhances TMS32020 Mu C's Memory," EDN: Electronic Design News, USA, Volume 32, Number 1, page 231, 1987.
47) F. Kocsis, F. Marx, "Fast DFT Modules For The TMS32010 Digital Signal Processor," Meres and Automation, Hungary, Volume 35, Number 1, pages 6-11, 1987.
48) Y.V.V.S. Murty, W.J. Smolinski, "Digital Filters for Power System Relaying," International Journal of Energy Systems, USA, Volume 7, Number 3, pages 125-129, 1987.
49) S. Wang, "The TMS32010 High Speed Processor and Its Applications," Mini-Micro Systems, China, Volume 8, Number 3, pages 24-32, 1987.
50) G.A. Frantz, K.S. Lin, J.B. Reimer, J. Bradley, "The Texas Instruments TMS320C25 Digital Signal Microcomputer," IEEE Microelectronics, USA, Volume 6, Number 6, pages 10-28, December 1986.
51) P. Renard, "A/D Converters: The Advantage of a Mixture of Techniques," Mesures, France, Volume 51, Number 16, pages 80-81, December 1986.
52) M. Ara, E. Suzuki, "Design of Real Time Filter Using DSP," Resident Reports of Kogakuin University, Japan, Number 61, pages 115-127, October 1986.
53) J. Reidy, "Connection of a 12-Bit A/D Converter to Fast DSPs," Electronik, Germany, Volume 35, Number 22, pages 132-134, October 1986.
54) G.R. Steber, "Implementation of Adaptive Filters on the TMS32010 DSP Microcomputer," Proceedings of IECON 86, Catalog Number 86CH2334-1, Volume 2, pages 653-656, September/October 1986.
55) D. Collins, M.A. Rahman, "Digital Filter Design Using The TMS320 Digital Signal Processor," Proceedings of EUSIPCO-86, Volume 1, pages 163-166, September 1986.
56) R. Simar, Jr., J.B. Reimer, "The TMS320C25: A 100 ns CMOS VLSI Digital Signal Processor," 1986 Workshop on Applications of Signal Processing to Audio and Acoustics, September 1986.
57) J. Dudas, A. Stipkovits, E. Simonyi, "On The Recursive Momentary Discrete Fourier Transform," Proceedings of EUSIPCO-86, Volume 1, pages 303-306, September 1986.
58) E. Feder, "Digital Signal Processor - General Purpose or Dedicated?," Electronics Industry, France, Number 111, pages 74-82, September 1986.
59) K. Herberger, "The Use of Signal Processors For Simulating Data Circuits," Proceedings of EUSIPCO-86, Volume 2, pages 1109-1112, September 1986.
60) K. Kassapoglou, P. Hulliger, "Implementation of Recursive Least Squares Identification Algorithm on The TMS320," Proceedings of EUSIPCO-86, Volume 2, pages 1263-1266, September 1986.
61) G. Lucioni, "General Processor Application; CAD Tool For Filter Design," Proceedings of EUSIPCO-86, Volume 2, pages 1335-1338, September 1986.
62) R. Schapery, "A 10-MIP Digital Signal Processor From Texas Instruments," Conference Record of Midcon 86, USA, 1/2/1-11, September 1986.
63) "DSP Microprocessors," In! Elettronica, Italy, Volume 14, Number 7-8, pages 21-28,
64) R.L. Barnes, S.H. Ardalan, "Multiprocessor Architecture For Implementing Adaptive Digital Filters," Conference Record of ICC-86, Catalog Number 86CH2314-3, Volume 1, pages 180-185, June 1986.
65) A.D.E. Brown, "EPROMS Simplify TMS32010 Memory System," EDN: Electronic Design News, USA, Volume 31, Number 13, page 230, June 1986.
66) T. Kolehamainen, T. Saramaki, M. Renfors, Y. Neuvo, "Signal Processor Implementation of Computationally Efficient FIR Filter Structures-Theory and Practice," 2nd Nordic Symposium on VLSI in Computers and Communications, 10 pages, June 1986.
67) T.G. Marshall Jr., "Transform Methods For Developing Parallel Algorithms For Cyclic-Block Signal Processing," Conference Record of ICC-86, Catalog Number 86CH2314-3, Volume 1, pages 288-294, June 1986.
68) S. Abiko, M. Hashizume, Y. Matsushita, K. Shinozaki, T. Takamizawa, C. Erskine, S. Magar, "Architecture and Applications of a 100-ns CMOS VLSI Digital Signal Processor," Proceedings of ICASSP 86, USA, Catalog Number 86CH2243-4, Volume 1, pages 393-396, April 1986.
69) T.P. Barnwell, "Algorithm Development and Multiprocessing Issues for DSP Chips," Proceedings of Speech Technology 86, April 1986.
70) W. Gass, "TMS32020 - The Quick and Easy Solution to DSP Problems," Proceedings of Speech Technology 86, April 1986.
71) M. Hashizume, S. Abiko, Y. Matsushita, K. Shinozaki, T. Takamizawa, S. Magar, J. Reimer, "A 100-ns CMOS VLSI Digital Signal Processor Using Double Level Metal Structure," Semiconductor Group 1986 Technical Meeting, April 1986.
72) R.E. Morley, A.M. Engebretson, and J.G. Trotta, "A Multiprocessor Digital Signal Processing System for Real-Time Audio Applications," IEEE Transactions on Acoustics, Speech and Signal Processing, USA, Volume ASSP-34, Number 2, April 1986.
73) S.G. Smith, A. Fitzgerald, P.R. Denyer, D. Renshaw, N.P. Wooten, R. Creasey, "A Comparison of Micro-DSP And Silicon Compiler Implementations of a Polyphase-Network Filter Bank," Proceedings of ICASSP 86, USA, Catalog Number 86CH2243-4, Volume 3, pages 2207-2210, April 1986.
74) J. Reimer, M. Hames, "Next Generation CMOS Chip Stakes High-Performance Claim on 10-MIPS DSP Operations," Electronic Design, USA, Volume 34, Number 8, pages 141-146, April 1986.
75) W.W. Smith, "Playing to Win: Product Development with the TMS320 Chip," Speech Technology Magazine, March/April 1986.
76) D. Essig, C. Erskine, E. Caudel, and S. Magar, "A Second-Generation Digital Signal Processor," IEEE Journal of Solid-State Circuits, USA, Volume SC-21, Number 1, pages 86-91, February 1986.
77) W.K. Anakwa, T.L. Stewart, "TMS320 Microprocessor-Based System For Signal Processing," Proceedings of the ISMM International Symposium, pages 64-65, February 1986.
78) M. Omenzetter, "Universal Signal Processors Offers High Data Throughput," Electronik, Germany, Volume 35, Number 4, pages 71-77, February 1986.
79) P.F. Regamey, "Matched Filtering Using a Signal Microprocessor TMS320," Mitt. AGEN, Switzerland, Number 42, pages 31-35, February 1986.
80) "TI Set To Show 2nd-Generation DSP," Electronics, USA, pages 23-24, February 3, 1986.
81) "TI Preps CMOS Versions of Signal-Processor Chips," Electronics Engineering Times, USA, page 6, February 3, 1986.
82) D. Wilson, "Digital Signal Processing Moves on Chip," Digital Design, USA, Volume 16, Number 2, pages 33-34, February 1986.
83) "TI Chip Heads for Fast Lane of Digital Signal Processing," Electronics, USA, page 9, January 27, 1986.
84) R.D. Campbell and S.R. McGeoch, "The TMS32010 Digital Signal Processor-An Educational Viewpoint," International Journal for Electrical Engineering Education, Great Britain, Volume 23, Number 1, pages 21-31, January 1986.
85) P. Eckelman, "The Cascadable Signal Processor For Digital Signal Processing," Electronics Industry, Germany, Volume 17, Number 10, pages 26-27, 1986.
86) R. Cook, "Digital Signal Processors," High Technology, USA, Volume 5, Number 10, pages 25-30, October 1985.
87) C.F. Howard, "A High-Level Approach to Digital Processing Design," Proceedings of MILCOMP/85, USA, October 1985.
88) H.E. Lee, "Versatile Data-Acquisition System Based on the Commodore C-64/C-128 Microcomputer," Proceedings of the Symposium of Northeastern Accelerator Personnel, USA, Volume 57, Number 5, pages 983-985, October 1985.
89) N.K. Riedel, D.A. McAninch, C. Fisher, and N.B. Goldstein, "A Signal Processing Implementation for an IBM PC-Based Workstation," IEEE Micro, USA, Volume 5, Number 5, pages 52-67, October 1985.
90) K.E. Marrin, "VLSI and Software Move DSP Into Mainstream," Computer Design, USA, Volume 24, Number 9, pages 69-72, September 1985.
91) "Signal Processor ICs: Highly Integrated ICs Making DSP More Attractive," Electronics Engineering Times, USA, pages 37-38, September 2, 1985.
92) K.E. Marrin, "VLSI and Software Move DSP Techniques into Mainstream," Computer Design, USA, September 1985.
93) "High-Speed Four-Channel Input Board," Electronics Weekly, USA, Number 1277, page 31, July 24, 1985.
94) "4-Channel Analog-Input Board Puts Signal-Processing on VME Bus," EDN: Electronic Design News, USA, Volume 30, Number 17, page 74, July 1985.
95) R.H. Cushman, "Third-Generation DSPs Put Advanced Functions On-Chip," EDN: Electronic Design News, USA, July 1985.
96) W.W. Smith, Jr., "Agile Development System, Running on PCs, Builds TMS320-Based FIR Filter," Electronic Design, USA, Volume 33, Number 13, pages 129-138, June 6, 1985.
97) S. Magar, S.J. Robertson, and W. Gass, "Interface Arrangement Suits Digital Processor to Multiprocessing," Electronic Design, USA, Volume 33, Number 5, pages 189-198, March 7, 1985.
98) G. Kropp, "Signal Processor Offers Multiprocessor Capability," Elektronik, Germany, Volume 34, Number 6, pages 53-58, March 1985.
99) S. Magar, D. Essig, E. Caudel, S. Marshall and R. Peters, "An NMOS Digital Signal Processor with Multiprocessing Capability," Digest of IEEE International Solid-State Circuits Conference, USA, February 1985.
100) "TI 'Shiva' Chip Outlined," Electronics Engineering Times, USA, page 15, February 18, 1985.
101) S. Magar, E. Caudel, D. Essig, and C. Erskine, "Digital Signal Processor Borrows from µP to Step up Performance," Electronic Design, USA, Volume 33, Number 4, pages 175-184, February 21, 1985.
102) C. Erskine, S. Magar, E. Caudel, D. Essig, and A. Levinspuhl, "A Second-Generation Digital Signal Processor TMS32020: Architecture and Applications," Traitement de Signal, France, Volume 2, Number 1, pages 79-83, January-March 1985.
103) S. Baker, "TI 'Shiva' Chip Outlined," Electronic Engineering Times, USA, Number 317, page 15, February 1985.
104) S. Baker, "Silicon Bits," Electronic Engineering Times, USA, Number 316, page 42, February 1985.
105) H. Bryce, "Board Arrives For Digital Signal Processing on the VMEbus," Electronic Design, USA, Volume 33, Number 2, page 266, 1985.
106) K. Marrin, "VME-Compatible DSP System Incorporates TMS320 Chip," EDN: Electronic Design News, USA, Volume 30, Number 2, page 122, January 1985.
107) C. Erskine and S. Magar, "Architecture and Applications of A Second-Generation Digital Signal Processor," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, USA, 1985.
108) D.P. Morgan and H.E. Silverman, "An Investigation into the Efficiency of a Parallel TMS320 Architecture: DFT and Speech Filterbank Applications," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, USA, Volume 4, pages 1601-1604, 1985.
109) P. Harold, "VME Bus Meeting Sparks Change in Standard, New Products," EDN: Electronic Design News, USA, Volume 29, Number 26, page 18, December 1984.
110) W. Loges, "A Code Generator Sets up the Automatic Controller Program for the TMS320," Elektronik, Germany, Volume 33, Number 22, pages 154-158, November 1984.
111) H. Volkers, "Fast Fourier Transforms with the TMS320 as Coprocessor," Elektronik, Germany, Volume 33, Number 23, pages 109-112, November 1984.
112) Keun-Ho Ryoo, "On the Recent Digital Signal Processors," Journal of South Korean Institute of Electrical Engineering, South Korea, Volume 33, Number 9, pages 540-549, September 1984.
113) D. Wilson, "Editor's Comment," Digital Design, USA, Volume 14, Number 9, page 14, September 1984.
114) "Signal Processors Will Squeeze Into One Chip, Says TI's French," Electronics, USA, Volume 57, Number 9, pages 14, 20, May 1984.
115) S. Mehrgardt, "32-Bit Processor Produces Analog Signals," Elektronik, Germany, Volume 33, Number 7, pages 77-82, April 1984.
116) S. Magar, "Signal Processing Chips Invite Design Comparisons," Computer Design, USA, Volume 23, Number 4, pages 179-186, April 1984.
117) S. Mehrgardt, "General-Purpose Processor System for Digital Signal Processing," Elektronik, Germany, Volume 33, Number 3, pages 49-53, February 1984.
118) T. Durham, "Chips: Familiarity Breeds Approval," Computing, Great Britain, page 26, January 1984.
119) J. Bradley and P. Ehlig, "Applications of the TMS32010 Digital Signal Processor and Their Tradeoffs," Midcon/84 Electronic Show and Convention, USA, 1984.
120) J. Bradley and P. Ehlig, "Tradeoffs in the Use of the TMS32010 as a Digital Signal Processing Element," Wescon/84 Conference Record, USA, 1984.
121) E. Fernandez, "Comparison and Evaluation of 32-Bit Microprocessors," Mini/Micro Southeast Computer Conference and Exhibition, USA, 1984.
122) D. Garcia, "Multiprocessing with the TMS32010," Wescon/84 Conference Record, USA, 1984.
123) S. Magar, "Architecture and Applications of a Programmable Monolithic Digital Signal Processor - A Tutorial Review," Proceedings of IEEE International Symposium on Circuits and Systems, USA, 1984.
124) D. Quarmby (Editor), "Signal Processor Chips," Granada, England, 1984.
125) R. Steves, "A Signal Processor with Distributed Control and Multidimensional Scalability," Proceedings of IEEE National Aerospace and Electronics Conference, USA, 1984.
126) V. Vagarshakyan and L. Gustin, "On A Single Class of Continuous Systems - A Solution to the Problem on the Diagnosis of Output Signal Characteristics Recognition Procedures," IZV. AKAD. NAUK ARM. SSR, SER. TEKH. NAUK, USSR, Volume 37, Number 3, pages 22-27, 1984.
127) J.
So, "TMS320 - A Step Forward in Digital Signal Processing," Microprocessors and Microsystems, Great Britain, Volume 7, Number 10, pages 451-460, December 1983.
128) J. Elder and S. Magar, "Single-Chip Approach to Digital Signal Processing," Wescon/83 Electronic Show and Convention, USA, November 1983.
129) M. Malcangi, "VLSI Technology for Signal Processing. III," Elettronica Oggi, Italy, Number 11, pages 129-138, November 1983.
130) P. Strzelcki, "Digital Filtering," Systems International, Great Britain, Volume 11, Number 11, pages 116-117, November 1983.
131) W. Loges, "Digital Controls Using Signal Processors," Elektronik, Germany, Volume 32, Number 19, pages 51-54, September 1983.
132) "TI's Voice Chip Makes Debut," Computerworld, USA, Volume 17, Number 15, page 91, April 1983.
133) L. Adams, "TMS320 Family 16/32-Bit Digital Signal Processor, An Architecture for Breaking Performance Barriers," Mini/Micro West 1983 Computer Conference and Exhibition, USA, 1983.
134) R. Blasco, "Floating-Point Digital Signal Processing Using a Fixed-Point Processor," Southcon/83 Electronics Show and Convention, USA, 1983.
135) R. Dratch, "A Practical Approach to Digital Signal Processing Using an Innovative Digital Microcomputer in Advanced Applications," Electro '83 Electronics Show and Convention, USA, 1983.
136) C. Erskine, "New VLSI Co-Processors Increase System Throughput," Mini/Micro Midwest Conference Record, USA, 1983.
137) L. Kaplan, "Flexible Single Chip Solution Paves Way for Low Cost DSP," Northcon/83 Electronics Show and Convention, USA, 1983.
138) L. Kaplan, "The TMS32010: A New Approach to Digital Signal Processing," Electro '83 Electronics Show and Convention, USA, 1983.
139) S. Mehrgardt, "Signal Processing with a Fast Microcomputer System," Proceedings of EUSIPCO-83 Second European Signal Processing Conference, Netherlands, 1983.
140) L. Morris, "A Tale of Two Architectures: TI TMS 320 SPC VS. DEC Micro/J-11," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, USA, 1983.
141) L. Pagnucco and D. Garcia, "A 16/32 Bit Architecture for Signal Processing," Mini/Micro West 1983 Computer Conference and Exhibition, USA, 1983.
142) J. Potts, "A Versatile High Performance Digital Signal Processor," Ohmcon/83 Conference Record, USA, 1983.
143) J. Potts, "New 16/32-Bit Microcomputer Offers 200-ns Performance," Northcon/83 Electronics Show and Convention, USA, 1983.
144) R. Simar, "Performance of Harvard Architecture in TMS320," Mini/Micro West 1983 Computer Conference and Exhibition, USA, 1983.
145) K. McDonough, E. Caudel, S. Magar, and A. Leigh, "Microcomputer with 32-Bit Arithmetic Does High-Precision Number Crunching," Electronics, USA, Volume 55, Number 4, pages 105-110, February 1982.
146) K. McDonough and S. Magar, "A Single Chip Microcomputer Architecture Optimized for Signal Processing," Electro/82 Conference Record, USA, 1982.
147) L. Kaplan, "Signal Processing with the TMS320 Family," Midcon/82 Conference Record, USA, 1982.
148) S. Magar, "Trends in Digital Signal Processing Architectures," Wescon/82 Conference Record, USA, 1982.

Graphics/Imaging

1) J.A. Lindberg, "Color Cell Compression Shrinks NTSC Images," ESD: Electronic Systems Design Magazine, USA, Volume 17, Number 10, pages 91-96, October 1987.
2) S. Ganesan, "A Digital Signal Processing Microprocessor Based Workstation For Myoelectric Signals," Fifth International Conference on System Engineering, USA, Catalog Number 87CH2480-2, pages 427-438, September 1987.
3) JU. Pokovny, O. Skoloud, "Digitisation of a Video Signal From a Television For a Microcomputer," Sdelovaci Tech., Czechoslovakia, Volume 35, Number 6, pages 207-211, June 1987.
4) M.E. Bukaty, "A Vehicle Identification System For Surveillance Applications," Topical Meeting on Machine Vision.
Technical Digest Series, USA, Volume 12, pages 106-109, March 1987.
5) K.N. Ngan, A.A. Kassim, H.S. Singh, "Parallel Image-Processing System Based on the TMS32010 Digital Signal Processor," IEE Proceedings in Electronics, Great Britain, Volume 134, Number 2, pages 119-124, March 1987.
6) K.N. Ngan, A.A. Kassim, H. Singh, "A TMS32010-Based Fast Parallel Vision Processor," Proceedings of the International Workshop on Industrial Applications of Machine Vision and Machine Intelligence, Catalog Number 87TH0166-9, pages 156-161, February 1987.
7) P. Bellamah, "Hardware-Software Increases Video Storage Capacity," PC Week, USA, Volume 4, Number 4, page 15, January 27, 1987.
8) J.M. Younse, "Motion Detection Using the Statistical Properties of a Video Image," Proceedings of SPIE International Society of Optical Engineering, USA, Volume 697, pages 233-243, August 1986.
9) T. Gehrels, B.G. Marsden, R.S. McMillan, J.V. Scotti, "Astrometry With a Scanning CCD," Astronomy Journal, USA, Volume 91, Number 5, pages 1242-1248, May 1986.
10) S. Srinivasan, A.K. Jain, T.M. Chin, "Cosine Transform Block Codec For Images Using TMS32010," IEEE International Symposium on Circuits and Systems, USA, Catalog Number 86CH2255-8, Volume 1, pages 299-302, May 1986.
11) D.M. Holburn and I.D. Sommerville, "A High-Speed Image Processing System Using the TMS32010," Software and Microsystems, Great Britain, Volume 4, Number 5-6, pages 102-108, October-December 1985.
12) C.D. Crowell and R. Simar, "Digital Signal Processor Boosts Speed of Graphics Display Systems," Electronic Design, USA, Volume 33, Number 7, pages 205-209, March 1985.
13) J. Reimer and A. Lovrich, "Graphics with the TMS32020," WESCON/85 Conference Record, USA, 1985.
14) H. Megal and A. Heiman, "Image Coding System - A Single Processor Implementation," MILCOM/85 IEEE Military Communications Conference Record, USA, 1985.
15) G. Gaillat, "The CAPITAN Parallel Processor: 600 MIPS for Use in Real Time Imagery," Traitement de Signal, France, Volume 1, Number 1, pages 19-30, October-December 1984.

Instrumentation

1) G.R. Halsall, D.R. Burton, M.J. Lalor, C.A. Hobson, "A Novel Real-Time Opto-Electronic Profilometer Using FFT Processing," Proceedings of ICASSP 89, USA, pages 1634-1637, May 1989.
2) A.J. Pratt, R.E. Gander, B.R. Brandell, "Real-Time Median Frequency Estimator," Proceedings of the Ninth Annual Conference of the IEEE Engineering in Medicine and Biology Society, USA, Volume 4, pages 1840-1841, November 1987.
3) D.Y. Cheng, A. Gersho, "A Fast Codebook Search Algorithm For Nearest-Neighbor Pattern Matching," Proceedings of ICASSP 86, USA, Catalog Number 86CH2243-4, Volume 1, pages 265-268, April 1986.
4) Y. Chikada, M. Ishiguro, H. Hirabayashi, M. Morimoto, K. Morita, T. Kanazawa, H. Iwashita, K. Nakazima, S. Ishikawa, T. Takahashi, K. Handa, T. Kazuga, S. Okumura, T. Miyazawa, K. Miura, S. Nagasawa, "A Very Fast FFT Spectrum Analyzer For Radio Astronomy," Proceedings of ICASSP 86, USA, Catalog Number 86CH2243-4, Volume 4, pages 2907-2910, April 1986.
5) R.C. Wittenberg, "Four Microprocessors Power Multifunction Analyzer," Electronic Engineering Times, USA, Number 306, page 30, November 1984.
6) D. Lee, T. Moran, and R. Crane, "Practical Considerations for Estimating Flaw Sizes from Ultrasonic Data," Materials Evaluation, Volume 42, Number 9, pages 1150-1158, August 1984.
7) S. Magar, R. Hester, and R. Simpson, "Signal-Processing µC Builds FFT-Based Spectrum Analyzer," Electronic Design, USA, Volume 30, Number 17, pages 149-154, August 1982.

Voice/Speech

1) A. Aktas, H. Hoge, "Multi-DSP and VQ-ASIC Based Acoustic Front-End for Real-Time Speech Processing Tasks," Proceedings of EUROSPEECH 89, pages 586-589, September 1989.
2) D. Bergmann, D. Boillon, F. Bonifacio, R.
Breitschadel, "Experimental Speech Input/Output System," Proceedings of ICASSP 89, USA, pages 1138-1141, May 1989.
3) J. DellaMorte, P.E. Papamichalis, "Full-Duplex Real-Time Implementation of the FED-STD-1015 LPC-10e Standard V.52 on the TMS320C25," Proceedings of SPEECH TECH 89, pages 218-221, May 1989.
4) B.I. Pawate, G.R. Doddington, "Implementation of a Hidden Markov Model-Based Layered Grammar Recognizer," Proceedings of ICASSP 89, USA, pages 801-804, May 1989.
5) P.E. Papamichalis, "High Quality Speech Coding: Some Recent Algorithms," Proceedings of SPEECH TECH 89, pages 329-333, May 1989.
6) J.C. Ventura, "Digital Audio Gain Control for Hearing Aids," Proceedings of ICASSP 89, USA, pages 2049-2052, May 1989.
7) N. Matsui, H. Ohasi, "DSP-Based Adaptive Control of a Brushless Motor," IEEE Industry Application Society Annual Meeting, USA, October 1988.
8) A. Albarello, R. Breitschaedel, A. Ciaramella, E. Lenormand, "Implementation of an Acoustic Front-End For Speech Recognition," CSELT Technical Report, Italy, Volume 16, Number 5, pages 455-459, August 1988.
9) D. Curl, "Voice Over Data Means More For Your Money," Communications, Great Britain, Volume 5, Number 8, pages 27-29, August 1988.
10) H. Hanselman, H. Henrichfreise, H. Hostmann, A. Schwarte, "Hardware/Software Environment for DSP-Based Multivariable Control," 12th IMACS World Congress, July 1988.
11) J.B. Attili, M. Savic, J.P. Campbell, Jr., "A TMS32020-Based Real Time Text-Independent, Automatic Speaker Verification System," Proceedings of ICASSP 88, USA, Volume S, page 599, April 1988.
12) D. Chase, A. Gersho, "Real-Time VQ Codebook Generation Hardware For Speech Processing," Proceedings of ICASSP 88, USA, April 1988.
13) T. Kohonen, K. Torkkola, M. Shozaki, J. Kangas, O. Venta, "Phonetic Typewriter for Finnish and Japanese," Proceedings of ICASSP 88, USA, Volume S, page 607, April 1988.
14) I. Lecomte, M. Lever, L.
Lelievre, M. Delprat, A. Tassy, "Medium Band Radio Communications," Proceedings of ICASSP 88, USA, April 1988.
15) J.B. Reimer, K.S. Lin, "TMS320 Digital Signal Processors in Speech Applications," Proceedings of SPEECH TECH '88, April 1988.
16) M. Smmendorfer, D. Kopp, H. Hackbarth, "A High-Performance Multiprocessor System for Speech Processing Applications," Proceedings of ICASSP 88, USA, Volume V, page 2108, April 1988.
17) P. Vary, K. Hellwig, R. Hoffmann, R.J. Sluyter, C. Garland, M. Russo, "Speech Codec for the European Mobile Radio System," Proceedings of ICASSP 88, USA, Volume S, page 227, April 1988.
18) A. Hunt, "A Speaker-Independent Telephone Speech Recognition System: The VCS TeleRec," Speech Technology, USA, Volume 4, Number 2, pages 80-82, March-April 1988.
19) R.A. Sukkar, J.L. LoCicero, J.W. Picone, "Design and Implementation of a Robust Pitch Detector Based on a Parallel Processing Technique," IEEE Journal of Selected Areas of Communications, USA, Volume 6, Number 2, pages 441-451, February 1988.
20) A.Z. Baraniecki, "Digital Coding of Speech Algorithms and Architecture," Proceedings of IECON '87, November 1987.
21) G.R. Steber, "Audio Frequency DSP Laboratory on a Chip-TMS32010," Proceedings of IECON '87, Volume 2, pages 1047-1051, November 1987.
22) S.H. Kim, K.R. Hong, H.B. Han, W.H. Hong, "Implementation of Real Time Adaptive Lattice Predictor on Digital Signal Processor," Proceedings of TENCON 87, South Korea, Volume 3, pages 1131-1135, August 1987.
23) J.B. Reimer, M.L. McMahan, W.W. Anderson, "Speech Recognition For a Low Cost System Using a DSP," Digest of Technical Papers for 1987 International Conference on Consumer Electronics, June 1987.
24) A. Ciaramella, G. Venuti, "Vector Quantization Firmware For an Acoustical Front-End Using the TMS32020," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 4, pages 1895-1898, April 1987.
25) G.A. Frantz, K.S.
Lin, "A Low Cost Speech System Using the TMS320C17," Proceedings of SPEECH TECH '87, pages 25-29, April 1987.
26) Z. Gorzynski, "Realtime Multitasking Speech Application on the TMS320," Microprocessors and Microsystems, Great Britain, Volume 11, Number 3, pages 149-156, April 1987.
27) P. Papamichalis, D. Lively, "Implementation of the DOD Standard LPC-10/52E on the TMS320C25," Proceedings of SPEECH TECH '87, pages 201-204, April 1987.
28) B.I. Pawate, M.L. McMahan, R.H. Wiggins, G.R. Doddington, P.K. Rajasekaran, "Connected Word Processor on a Multiprocessor System," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 2, pages 1151-1154, April 1987.
29) S. Roucos, A. Wilgus, W. Russell, "A Segment Vocoder Algorithm For Real-Time Implementation," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 4, pages 1949-1952, April 1987.
30) H. Yeh, "Adaptive Noise Cancellation For Speech With a TMS32020," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 2, pages 1171-1174, April 1987.
31) R. Conover, D. Gustafson, "VLSI Architecture For Cepstrum Calculations," 1987 IEEE Region 5 Conference, USA, Catalog Number 87CH2383-8, pages 63-64, March 1987.
32) K. Field, A. Derr, L. Cosell, C. Henry, M. Kasner, J. Tiao, "A Single Board Multirate APC Speech Coding Terminal," Proceedings of ICASSP 87, USA, Catalog Number 87CH2396-0, Volume 2, pages 960-963, April 1987.
33) H. Brehm, W. Stammler, "Description and Generation of Spherically Invariant Speech-Model Signals," Signal Processing, Netherlands, Volume 12, Number 2, pages 119-141, March 1987.
34) A.Z. Baraniecki, "Digital Coding of Speech Algorithms and Architectures," Proceedings of IECON '87, Volume 2, pages 977-984, 1987.
35) B. Flocon, P. Lockwood, J. Sap, L.
Sauter, "MARIPA: Speaker Independent Recognition of Speech on IBM-PC," Eighth International Conference on Pattern Recognition, Catalog Number 86CH2342-4, pages 893-895, October 1986.
36) M.T. Reilly, "A Hybridized Linear Prediction Code Speech Synthesizer," Conference Records for MILCOM 86, USA, Catalog Number 86CH2323-4, Volume 2, 32.5/1-5, October 1986.
37) K. Torkkola, H. Riittinen, T. Kohonen, "Microprocessor-Based Word Recognizer For a Large Vocabulary," Eighth International Conference on Speech Recognition Proceedings, Catalog Number 86CH2342-4, pages 814-816, October 1986.
38) C.H. Lee, D.Y. Cheng, D.A. Russo et al., "An Integrated, Voice-Controlled Voice Messaging System," Proceedings of Speech Technology 86, April 1986.
39) Kun-Shan Lin and G.A. Frantz, "A Survey of Available Speech Hardware for Computer Systems," Proceedings of Speech Technology 86, April 1986.
40) L.R. Morris, "Software Engineering for an IBM PC/TI-SPEECH Realtime Digital Speech Spectrogram Production System," Proceedings of Speech Technology 86, April 1986.
41) K. Torkkola, H. Riittinen, "A Microprocessor-Based Recognition System For Large Vocabularies," Proceedings of ICASSP 86, USA, Catalog Number 86CH2243-4, Volume 1, pages 333-337, April 1986.
42) Z. Gorzynski, "Real Time Software Engineering on the TMS320: Application in a Pitch Detector Implementation," International Conference on Speech Input/Output; Techniques and Applications, Conference Publication Number 258, pages 270-275, March 1986.
43) S. Ganesan, M.O. Ahmad, "A Real Time Speech Signal Processor," Proceedings of the ISMM Internal Symposium, pages 46-49, February 1986.
44) L. Gutcho, "DECtalk-a Year Later," Speech Technology, Volume 3, Number 1, pages 98-102, August-September 1985.
45) B. Bryden, H.R. Hassanein, "Implementation of a Hybrid Pitch-Excited/Multipulse Vocoder for Cost-Effective Mobile Communications," Proceedings of Speech Technology 85, April 1985.
46) M.
McMahan, "A Complete Speech Application Development Environment," Proceedings of SPEECH TECH 85, pages 293-295, April 1985.
47) H. Hassanein and B. Bryden, "Implementation of the Gold-Rabiner Pitch Detector in a Real Time Environment Using an Improved Voicing Detector," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, USA, 1985.
48) K. Lin and G. Frantz, "Speech Applications with a General Purpose Digital Signal Processor," IEEE Region 5 Conference Record, USA, March 1985.
49) K. Lin and G. Frantz, "Speech Applications Created by a Microcomputer," IEEE Potentials, USA, December 1985.
50) M. Malcangi, "Programmable VLSI's for Vocal Signals," Electronica Oggi, Italy, Number 10, pages 103-113, October 1984.
51) V. Kroneck, "Conversing with the Computer," Elektrotechnik, Germany, Volume 66, Number 20, pages 16-18, October 1984.
52) P.K. Rajasekaran and G.R. Doddington, "Real-Time Factoring of the Linear Prediction Polynomial of Speech Signals," Digital Signal Processing - 1984: Proceedings of the International Conference, pages 405-410, September 1984.
53) M. Hutchins and L. Dusek, "Advanced ICs Spawn Practical Speech Recognition," Computer Design, USA, Volume 23, Number 5, pages 133-139, May 1984.
54) E. Catier, "Listening Cards or Speech Recognition," Electronique Industrielle, France, Number 67, pages 72-76, March 1984.
55) O. Ericsson, "Special Processor Did Not Meet Requirements - Built Own Synthesizer," Elteknik Aktuell Elektronik, Sweden, Number 3, pages 32-36, February 1984.
56) H. Strube, "Synthesis Part of a 'Log Area Ratio' Vocoder Implemented on a Signal-Processing Microcomputer," IEEE Transactions on Acoustics, Speech and Signal Processing, USA, Volume ASSP-32, Number 1, pages 183-185, February 1984.
57) B. Bryden and H.
Hassanein, "Implementation of Full Duplex 2.4 Kbps LPC Vocoder on a Single TMS320 Microprocessor Chip," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, USA, 1984.
58) M. Dankberg, R. Iltis, D. Saxton, and P. Wilson, "Implementation of the RELP Vocoder Using the TMS320," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, USA, 1984.
59) A. Holck and W. Anderson, "A Single-Processor LPC Vocoder," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, USA, 1984.