Guide to Assembly Language
Programming in Linux

Sivarama P. Dandamudi

Springer

This eBook does not include ancillary media that was packaged with the
printed version of the book.

Sivarama P. Dandamudi
School of Computer Science
Carleton University
Ottawa, ON K1S5B6
Canada
sivarama@scs.carleton.ca

Library of Congress Cataloging-in-Publication Data
A CIP catalogue record for this book is available
from the Library of Congress.
ISBN-10: 0-387-25897-3 (SC)
ISBN-13: 978-0387-25897-3 (SC)

ISBN-10: 0-387-26171-0 (e-book)
ISBN-13: 978-0387-26171-3 (e-book)
Printed on acid-free paper.

© 2005 Springer Science+Business Media, Inc.
All rights reserved. This work may not be translated or copied in whole or in part
without the written permission of the publisher (Springer Science+Business
Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief
excerpts in connection with reviews or scholarly analysis. Use in connection with
any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar
terms, even if they are not identified as such, is not to be taken as an expression of
opinion as to whether or not they are subject to proprietary rights.
Printed in the United States of America.
9 8 7 6 5 4 3 2 1
springeronline.com

SPIN 11302087

To
my parents, Subba Rao and Prameela Rani,
my wife, Sobha,
and
my daughter, Veda

Preface

The primary goal of this book is to teach IA-32 assembly language programming under
the Linux operating system. A secondary objective is to provide a gentle introduction to the
Fedora Linux operating system. Linux has evolved substantially since its first appearance in
1991. Over the years, its popularity has grown as well. According to an estimate posted on
http://counter.li.org/, there are about 18 million Linux users worldwide. Hopefully,
this book encourages even more people to switch to Linux.
The book is self-contained and provides all the necessary background information. Since
assembly language is very closely linked to the underlying processor architecture, a part of the
book is dedicated to giving computer organization details. In addition, the basics of Linux are
introduced in a separate chapter. These details are sufficient to work with the Linux operating
system.
The reader is assumed to have had some experience in a structured, high-level language such
as C. However, the book does not assume extensive knowledge of any high-level language—only
the basics are needed.

Approach and Level of Presentation
The book is targeted at software professionals who would like to move to Linux and get a comprehensive introduction to the IA-32 assembly language. It provides detailed, step-by-step instructions to install Linux as a second operating system.
No previous knowledge of Linux is required. The reader is introduced to Linux and its commands. Four chapters are dedicated to Linux and the NASM assembler (installation and usage). The
accompanying DVD-ROMs provide the necessary software to install the Linux operating system
and learn assembly language programming.
The assembly language is presented from the professional viewpoint. Since most professionals
are full-time employees, the book takes their time constraints into consideration in presenting the
material.

Summary of Special Features
Here is a summary of the special features that set this book apart:
• The book includes the Red Hat Fedora Core 3 Linux distribution (a total of two DVD-ROMs
are included with the book). Detailed step-by-step instructions are given to install Linux on
a Windows machine. A complete chapter is used for this purpose, with several screenshots
to help the reader during the installation process.
• The free NASM assembler is provided so that readers can get hands-on assembly language
programming experience.
• Special I/O software is provided to simplify assembly language programming. A set of input
and output routines is provided so that the reader can focus on writing assembly language
programs rather than spending time in understanding how the input and output are done
using the basic I/O functions provided by the operating system.
• Three chapters are included on computer organization. These chapters provide the necessary
background to program in the assembly language.
• Presentation of material is suitable for self-study. To facilitate this, extensive programming
examples and figures are used to help the reader grasp the concepts. Each chapter contains
a simple programming example in "Our First Program" section to gently introduce the concepts discussed in the chapter. This section is typically followed by "Illustrative Examples"
section, which gives more programming examples.
• This book does not use fragments of code in examples. All examples are complete in
the sense that they can be assembled and run, giving a better feeling as to how these programs work. These programs are on the accompanying DVD-ROM (DVD 2). In addition,
you can also download these programs from the book's Web site at the following URL:
http://www.scs.carleton.ca/~sivarama/linux_book.
• Each chapter begins with an overview and ends with a summary.

Overview of the Book
The book is divided into seven parts. Part I provides an introduction to the assembly language and
gives reasons for programming in the assembly language. Assembly language is a low-level language. To program in the assembly language, you should have some basic knowledge about the
underlying processor and system organization. Part II provides this background on computer organization. Chapter 2 introduces digital logic circuits. The next chapter gives details on memory
organization. Chapter 4 describes the Intel IA-32 architecture.
Part III covers the topics related to Linux installation and usage. Chapter 5 gives detailed
information on how you can install the Fedora Core Linux provided on the accompanying DVD-ROMs. It also explains how you can make your system dual bootable so that you can select the
operating system (Windows or Linux) at boot time. Chapter 6 gives a brief introduction to the
Linux operating system. It gives enough details so that you feel comfortable using the Linux
operating system. If you are familiar with Linux, you can skip this chapter.
Part IV also consists of two chapters. It deals with assembling and debugging assembly language programs. Chapter 7 gives details on the NASM assembler. It also describes the I/O routines
developed by the author to facilitate assembly language programming. The next chapter looks at
the debugging aspect of program development. We describe the GNU debugger (gdb), which
is a command-line debugger. This chapter also gives details on Data Display Debugger (DDD),
which is a nice graphical front-end for gdb. Both debuggers are included on the accompanying
DVD-ROMs.
After covering the setup and usage details of Linux and NASM, we look at the assembly language in Part V. This part introduces the basic instructions of the assembly language. To facilitate
modular program development, we introduce procedures in the third chapter of this part. The remaining chapters describe the addressing modes and other instructions that are commonly used in
assembly language programs.
Part VI deals with advanced assembly language topics. It deals with topics such as string
processing, recursion, floating-point operations, and interrupt processing. In addition, Chapter 21
explains how you can interface with high-level languages. By using C, we explain how you can call
assembly language procedures from C and vice versa. This chapter also discusses how assembly
language statements can be embedded into high-level language code. This process is called inline
assembly. Again, by using C, this chapter shows how inline assembly is done under Linux.
The last part consists of five appendices. These appendices give information on number systems and character representation. In addition, Appendix D gives a summary of the IA-32 instruction set. A comprehensive glossary is given in Appendix E.

Acknowledgments
I want to thank Wayne Wheeler, Editor, and Ann Kostant, Executive Editor, at Springer for suggesting the project. I am also grateful to Wayne for seeing the project through.
My wife Sobha and daughter Veda deserve my heartfelt thanks for enduring my preoccupation
with this project! I also thank Sobha for proofreading the manuscript. She did an excellent job!
I also express my appreciation to the School of Computer Science at Carleton University for
providing a great atmosphere to complete this book.

Feedback
Works of this nature are never error-free, despite the best efforts of the authors and others involved
in the project. I welcome your comments, suggestions, and corrections by electronic mail.
Ottawa, Canada
January 2005

Sivarama P. Dandamudi
sivarama@scs.carleton.ca
http://www.scs.carleton.ca/~sivarama

Contents
Preface

PART I Overview
1 Assembly Language
Introduction
What Is Assembly Language?
Advantages of High-Level Languages
Why Program in Assembly Language?
Typical Applications
Summary

PART II Computer Organization
2 Digital Logic Circuits
Introduction
Simple Logic Gates
Logic Functions
Deriving Logical Expressions
Simplifying Logical Expressions
Combinational Circuits
Adders
Programmable Logic Devices
Arithmetic and Logic Units
Sequential Circuits
Latches
Flip-Flops
Summary

3 Memory Organization
Introduction
Basic Memory Operations
Types of Memory
Building a Memory Block
Building Larger Memories
Mapping Memory
Storing Multibyte Data
Alignment of Data
Summary

4 The IA-32 Architecture
Introduction
Processor Execution Cycle
Processor Registers
Protected Mode Memory Architecture
Real Mode Memory Architecture
Mixed-Mode Operation
Which Segment Register to Use
Input/Output
Summary

PART III Linux
5 Installing Linux
Introduction
Partitioning Your Hard Disk
Installing Fedora Core Linux
Installing and Removing Software Packages
Mounting Windows File System
Summary
Getting Help

6 Using Linux
Introduction
Setting User Preferences
System Settings
Working with the GNOME Desktop
Command Terminal
Getting Help
Some General-Purpose Commands
File System
Access Permissions
Redirection
Pipes
Editing Files with Vim
Summary

PART IV NASM
7 Installing and Using NASM
Introduction
Installing NASM
Generating the Executable File
Assembly Language Template
Input/Output Routines
An Example Program
Assembling and Linking
Summary
Web Resources

8 Debugging Assembly Language Programs
Strategies to Debug Assembly Language Programs
Preparing Your Program
GNU Debugger
Data Display Debugger
Summary

PART V Assembly Language
9 A First Look at Assembly Language
Introduction
Data Allocation
Where Are the Operands
Overview of Assembly Language Instructions
Our First Program
Illustrative Examples
Summary

10 More on Assembly Language
Introduction
Data Exchange and Translate Instructions
Shift and Rotate Instructions
Defining Constants
Macros
Our First Program
Illustrative Examples
When to Use the XLAT Instruction
Summary

11 Writing Procedures
Introduction
What Is a Stack?
Implementation of the Stack
Stack Operations
Uses of the Stack
Procedure Instructions
Our First Program
Parameter Passing
Illustrative Examples
Summary

12 More on Procedures
Introduction
Local Variables
Our First Program
Multiple Source Program Modules
Illustrative Examples
Procedures with Variable Number of Parameters
Summary

13 Addressing Modes
Introduction
Memory Addressing Modes
Arrays
Our First Program
Illustrative Examples
Summary

14 Arithmetic Instructions
Introduction
Status Flags
Arithmetic Instructions
Our First Program
Illustrative Examples
Summary

15 Conditional Execution
Introduction
Unconditional Jump
Compare Instruction
Conditional Jumps
Looping Instructions
Our First Program
Illustrative Examples
Indirect Jumps
Summary

16 Logical and Bit Operations
Introduction
Logical Instructions
Shift Instructions
Rotate Instructions
Bit Instructions
Our First Program
Illustrative Examples
Summary

PART VI Advanced Assembly Language
17 String Processing
String Representation
String Instructions
Our First Program
Illustrative Examples
Testing String Procedures
Summary

18 ASCII and BCD Arithmetic
Introduction
Processing in ASCII Representation
Our First Program
Processing Packed BCD Numbers
Illustrative Example
Decimal Versus Binary Arithmetic
Summary

19 Recursion
Introduction
Our First Program
Illustrative Examples
Recursion Versus Iteration
Summary

20 Protected-Mode Interrupt Processing
Introduction
A Taxonomy of Interrupts
Interrupt Processing in the Protected Mode
Exceptions
Software Interrupts
File I/O
Our First Program
Illustrative Examples
Hardware Interrupts
Direct Control of I/O Devices
Summary

21 High-Level Language Interface
Introduction
Calling Assembly Procedures from C
Our First Program
Illustrative Examples
Calling C Functions from Assembly
Inline Assembly
Summary

22 Floating-Point Operations
Introduction
Floating-Point Unit Organization
Floating-Point Instructions
Our First Program
Illustrative Examples
Summary

APPENDICES
A Number Systems
Positional Number Systems
Conversion to Decimal
Conversion from Decimal
Binary/Octal/Hexadecimal Conversion
Unsigned Integers
Signed Integers
Floating-Point Representation
Summary

B Character Representation
Character Representation
ASCII Character Set

C Programming Exercises

D IA-32 Instruction Set
Instruction Format
Selected Instructions

E Glossary

Index

PART I
Overview

1
Assembly Language
The main objective of this chapter is to give you a brief introduction to the assembly language. To
achieve this goal, we compare and contrast the assembly language with high-level languages you
are familiar with. This comparison enables us to take a look at the pros and cons of the assembly
language vis-a-vis high-level languages.

Introduction
A user's view of a computer system depends on the degree of abstraction provided by the underlying software. Figure 1.1 shows a hierarchy of levels at which one can interact with a computer
system. Moving to the top of the hierarchy shields the user from the lower-level details. At the
highest level, the user interaction is limited to the interface provided by application software such
as spreadsheet, word processor, and so on. The user is expected to have only a rudimentary knowledge of how the system operates. Problem solving at this level, for example, involves composing
a letter using the word processor software.
At the next level, problem solving is done in one of the high-level languages such as C and
Java. A user interacting with the system at this level should have detailed knowledge of software
development. Typically, these users are application programmers. Level 4 users are knowledgeable
about the application and the high-level language that they would use to write the application
software. They may not, however, know internal details of the system unless they also happen to
be involved in developing system software such as device drivers, assemblers, linkers, and so on.
Both levels 4 and 5 are system independent, that is, independent of a particular processor used
in the system. For example, an application program written in C can be executed on a system with
an Intel processor or a PowerPC processor without modifying the source code. All we have to
do is recompile the program with a C compiler native to the target system. In contrast, software
development done at all levels below level 4 is system dependent.
Assembly language programming is referred to as low-level programming because each assembly language instruction performs a much lower-level task compared to an instruction in a
high-level language. As a consequence, to perform the same task, assembly language code tends
to be much larger than the equivalent high-level language code.
Assembly language instructions are native to the processor used in the system. For example,
a program written in the Intel assembly language cannot be executed on the PowerPC processor.

    Level 5  Application program level (Spreadsheet, Word Processor)   System independent
    Level 4  High-level language level (C, Java)                        System independent
    Level 3  Assembly language level                                    System dependent
    Level 2  Machine language level                                     System dependent
    Level 1  Operating system calls                                     System dependent
    Level 0  Hardware level                                             System dependent
    (The level of abstraction increases toward Level 5.)

Figure 1.1 A user's view of a computer system.

Programming in the assembly language also requires knowledge about system internal details such
as the processor architecture, memory organization, and so on.
Machine language is a close relative of the assembly language. Typically, there is a one-to-one
correspondence between the assembly language and machine language instructions. The processor
understands only the machine language, whose instructions consist of strings of 1s and 0s. We say
more on these two languages in the next section.

Even though assembly language is considered a low-level language, programming in assembly
language will not expose you to all the nuts and bolts of the system. Our operating system hides
several of the low-level details so that the assembly language programmer can breathe easy. For
example, if we want to read input from the keyboard, we can rely on the services provided by the
operating system.
Well, ultimately there has to be something to execute the machine language instructions. This
is the system hardware, which consists of digital logic circuits and the associated support electronics. A detailed discussion of this topic is beyond the scope of this book. Books on computer
organization discuss this topic in detail.

What Is Assembly Language?
Assembly language is directly influenced by the instruction set and architecture of the processor.
In this book, we focus on the assembly language for the Intel 32-bit processors like the Pentium.
The assembly language code must be processed by a program in order to generate the machine
language code. The assembler is the program that translates the assembly language code into the
machine language.
NASM (Netwide Assembler), MASM (Microsoft Assembler), and TASM (Borland Turbo Assembler) are some of the popular assemblers for the Intel processors. In this book, we use the
NASM assembler. There are two main reasons for this selection: (i) It is a free assembler; and
(ii) NASM supports a variety of formats including the formats used by Microsoft Windows, Linux,
and a host of others.
Are you curious as to what assembly language instructions look like? Here are some examples:

    inc   result
    mov   class_size,45
    and   mask1,128
    add   marks,10

The first instruction increments the variable result. This assembly language instruction is equivalent to

    result++;

in C. The second instruction initializes class_size to 45. The equivalent statement in C is

    class_size = 45;

The third instruction performs the bitwise and operation on mask1 and can be expressed in C as

    mask1 = mask1 & 128;

The last instruction updates marks by adding 10. In C, this is equivalent to

    marks = marks + 10;

These examples illustrate several points:

1. Assembly language instructions are cryptic.
2. Assembly language operations are expressed by using mnemonics (like and and inc).
3. Assembly language instructions are low level. For example, we cannot write the following
in the assembly language:

        add   marks,value

   This instruction is invalid because an IA-32 instruction cannot specify two memory operands,
   so the two variables marks and value cannot both appear in a single instruction.
We can appreciate the readability of assembly language instructions by comparing them with the equivalent machine language instructions:

    Assembly language     Operation           Machine language (in hex)
    nop                   No operation        90
    inc   result          Increment           FF060A00
    mov   class_size,45   Copy                C7060C002D00
    and   mask1,128       Logical and         80260E0080
    add   marks,10        Integer addition    83060F000A

In the above table, machine language instructions are written in the hexadecimal number system. If you are not familiar with this number system, see Appendix A for a quick review of number
systems.
It is obvious from these examples that understanding the code of a program in the machine
language is almost impossible. Since there is a one-to-one correspondence between the instructions of the assembly language and the machine language, it is fairly straightforward to translate
instructions from the assembly language to the machine language. As a result, only a masochist
would consider programming in machine language. However, life was not so easy for some of
the early programmers. When microprocessors were first introduced, some programming was in
fact done in machine language!

Advantages of High-Level Languages
High-level languages are preferred to program applications, as they provide a convenient abstraction of the underlying system suitable for problem solving. Here are some advantages of programming in a high-level language:
1. Program development is faster.
Many high-level languages provide structures (sequential, selection, iterative) that facilitate
program development. Programs written in a high-level language are relatively small compared to the equivalent programs written in an assembly language. These programs are also
easier to code and debug.
2. Programs are easier to maintain.
Programming a new application can take from several weeks to several months and the
lifecycle of such an application software can be several years. Therefore, it is critical that
software development be done with a view of software maintainability, which involves activities ranging from fixing bugs to generating the next version of the software. Programs
written in a high-level language are easier to understand and, when good programming practices are followed, easier to maintain. Assembly language programs tend to be lengthy and
take more time to code and debug. As a result, they are also difficult to maintain.
3. Programs are portable.
High-level language programs contain very few processor-specific details. As a result, they
can be used with little or no modification on different computer systems. In contrast, assembly language programs are processor-specific.

Why Program in Assembly Language?
The previous section gives enough reasons to discourage you from programming in the assembly language. However, there are two main reasons why programming is still done in assembly
language: (i) efficiency, and (ii) accessibility to system hardware.
Efficiency refers to how "good" a program is in achieving a given objective. Here we consider
two objectives based on space (space-efficiency) and time (time-efficiency).
Space-efficiency refers to the memory requirements of a program, that is, the size of the executable code. Program A is said to be more space-efficient if it takes less memory space than
program B to perform the same task. Very often, programs written in the assembly language tend
to be more compact than those written in a high-level language.
Time-efficiency refers to the time taken to execute a program. Obviously a program that runs
faster is said to be better from the time-efficiency point of view. If we craft assembly language
programs carefully, they tend to run faster than their high-level language counterparts.
As an aside, we can also define a third objective: how fast a program can be developed (i.e.,
write code and debug). This objective is related to programmer productivity, and assembly
language loses the battle to high-level languages as discussed in the last section.
The superiority of assembly language in generating compact code is becoming increasingly
less important for several reasons. First, the savings in space pertain only to the program code
and not to its data space. Thus, depending on the application, the savings in space obtained by
converting an application program from some high-level language to the assembly language may
not be substantial. Second, the cost of memory has been decreasing and memory capacity has
been increasing. Thus, the size of a program is not a major hurdle anymore. Finally, compilers are becoming "smarter" in generating code that is both space- and time-efficient. However,
there are systems such as embedded controllers and handheld devices in which space-efficiency is
important.
One of the main reasons for writing programs in an assembly language is to generate code
that is time-efficient. The superiority of assembly language programs in producing efficient code
is a direct manifestation of specificity. That is, assembly language programs contain only the
code that is necessary to perform the given task. Even here, a "smart" compiler can optimize the
code that can compete well with its equivalent written in the assembly language. Although the
gap is narrowing with improvements in compiler technology, assembly language still retains its
advantage for now.
The other main reason for writing assembly language programs is to have direct control over
system hardware. High-level languages, on purpose, provide a restricted (abstract) view of the
underlying hardware. Because of this, it is almost impossible to perform certain tasks that require
access to the system hardware. For example, writing a device driver for a new scanner on the
market almost certainly requires programming in assembly language. Since assembly language
does not impose any restrictions, you can have direct control over the system hardware. If you are
developing system software, you cannot avoid writing assembly language programs.

Typical Applications
We have identified three main advantages to programming in an assembly language.
1. Time-efficiency
2. Accessibility to hardware
3. Space-efficiency
Time-efficiency: Applications for which the execution speed is important fall under two categories:
1. Time convenience (to improve performance)
2. Time critical (to satisfy functionality)
Applications in the first category benefit from time-efficient programs because it is convenient or
desirable. However, time-efficiency is not absolutely necessary for their operation. For example,
a graphics package that scales an object instantaneously is more pleasant to use than the one that
takes noticeable time.
In time-critical applications, tasks have to be completed within a specified time period. These
applications, also called real-time applications, include aircraft navigation systems, process control systems, robot control software, communications software, and target acquisition (e.g., missile
tracking) software.
Accessibility to hardware: System software often requires direct control over the system hardware.
Examples include operating systems, assemblers, compilers, linkers, loaders, device drivers, and
network interfaces. Some applications also require hardware control. Video games are an obvious
example.
Space-efficiency: As mentioned before, for most systems, compactness of application code is not
a major concern. However, in portable and handheld devices, code compactness is an important
factor. Space-efficiency is also important in spacecraft control systems.

Summary
We introduced assembly language and discussed where it fits in the hierarchy of computer languages. Our discussion focused on the usefulness of high-level languages vis-a-vis the assembly
language. We noted that high-level languages are preferred, as their use aids in faster program
development, program maintenance, and portability. Assembly language, however, provides two
chief benefits: faster program execution, and access to system hardware. We give more details on
the assembly language in Parts V and VI.

PART II
Computer Organization

2
Digital Logic Circuits
Viewing computer systems at the digital logic level exposes us to the nuts and bolts of the basic
hardware. The goal of this chapter is to cover the necessary digital logic background. Our discussion can be divided into three parts. In the first part, we focus on the basics of digital logic
circuits. We start off with a look at the basic gates such as AND, OR, and NOT gates. We introduce Boolean algebra to manipulate logical expressions. We also explain how logical expressions
are simplified in order to get an efficient digital circuit implementation.
The second part introduces combinational circuits, which provide a higher level of abstraction
than the basic circuits discussed in the first part. We review several commonly used combinational
circuits including multiplexers, decoders, comparators, adders, and ALUs.
In the last part, we review sequential circuits. In sequential circuits, the output depends both
on the current inputs as well as the past history. This feature brings the notion of time into digital
logic circuits. We introduce system clock to provide this timing information. We discuss two types
of circuits: latches and flip-flops. Each of these devices can store a single bit of data. Thus,
they provide the basic capability to design memories: they can be combined to build larger
memories, a topic covered in detail in the next chapter.

Introduction
A computer system has three main components: a central processing unit (CPU) or processor,
a memory unit, and input/output (I/O) devices. These three components are interconnected by
a system bus. The term bus is used to represent a group of electrical signals or the wires that
carry these signals. Figure 2.1 shows details of how they are interconnected and what actually
constitutes the system bus. As shown in this figure, the three major components of the system bus
are the address bus, data bus, and control bus.
Figure 2.1 Simplified block diagram of a computer system: the processor, memory, and I/O device are interconnected by the system bus, which consists of the address bus, data bus, and control bus.

The width of the address bus determines the memory addressing capacity of the processor. The
width of the data bus indicates the size of the data transferred between the processor and memory or
an I/O device. For example, the 8086 processor had a 20-bit address bus and a 16-bit data bus. The
amount of physical memory that this processor can address is $2^{20}$ bytes, or 1 MB, and each data
transfer involves 16 bits. The Pentium processor, for example, has 32 address lines and 64 data
lines. Thus, it can address up to $2^{32}$ bytes, or 4 GB of memory. Furthermore, each data transfer can
move 64 bits. In comparison, the Intel 64-bit processor Itanium uses 64 address lines and 128 data
lines.
The control bus consists of a set of control signals. Typical control signals include memory
read, memory write, I/O read, I/O write, interrupt, interrupt acknowledge, bus request, and bus
grant. These control signals indicate the type of action taking place on the system bus. For example, when the processor is writing data into the memory, the memory write signal is asserted.
Similarly, when the processor is reading from an I/O device, the I/O read signal is asserted.
The system memory, also called main memory or primary memory, is used to store both program instructions and data. I/O devices such as the keyboard and display are used to provide the user
interface. I/O devices are also used to interface with secondary storage devices such as disks.
The system bus is the communication medium for data transfers. Such data transfers are called
bus transactions. Some examples of bus transactions are memory read, memory write, I/O read,
I/O write, and interrupt. Depending on the processor and the type of bus used, there may be other
types of transactions. For example, the Pentium processor supports a burst mode of data transfer
in which up to four 64-bit data items can be transferred in a single burst cycle.
Every bus transaction involves a master and a slave. The master is the initiator of the transaction and the slave is the target of the transaction. For example, when the processor wants to read
data from the memory, it initiates a bus transaction, also called a bus cycle, in which the processor
is the bus master and memory is the slave. The processor usually acts as the master of the system
bus, while components like memory are usually slaves. Some components may act as slaves for
some transactions and as masters for other transactions.
When there is more than one master device, which is typically the case, the device requesting
the use of the bus sends a bus request signal to the bus arbiter using the bus request control line.
If the bus arbiter grants the request, it notifies the requesting device by sending a signal on the
bus grant control line. The granted device, which acts as the master, can then use the bus for data
transfer. The bus-request-grant procedure is called bus protocol. Different buses use different bus
protocols. In some protocols, permission to use the bus is granted for only one bus cycle; in others,
permission is granted until the bus master relinquishes the bus.
The hardware that is responsible for executing machine language instructions can be built
using a few basic building blocks. These building blocks are called logic gates. These logic gates
implement the familiar logical operations such as AND, OR, NOT, and so on, in hardware. The
purpose of this chapter is to introduce the basics of digital hardware. The next two chapters
introduce memory organization and the architecture of the Intel IA-32 processors.
Our discussion of digital logic circuits is divided into three parts. The first part deals with the
basics of digital logic gates. Then we look at two higher levels of abstraction: combinational and
sequential circuits. In combinational circuits, the output of the circuit depends solely on the current
inputs applied to the circuit. The adder is an example of a combinational circuit: its output
depends only on the current inputs. On the other hand, the output of a sequential circuit
depends not only on the current inputs but also on the past inputs. That is, the output depends both on
the current inputs and on how the circuit got to its current state. For example, in a binary counter, the
output depends on the current value. The next value is obtained by incrementing the current value
(in a way, the current state represents a snapshot of the past inputs). That is, we cannot say what
the output of a counter will be unless we know its current state. Thus, the counter is a sequential
circuit. We review both combinational and sequential circuits in this chapter.
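To make the distinction concrete, here is a minimal C sketch (hypothetical names, not from the book). The 8-bit adder is modeled as a pure function whose result depends only on its arguments, while the 4-bit counter must keep state between calls, so the same "tick" produces different outputs depending on what happened before.

```c
#include <stdint.h>

/* Combinational: the result depends only on the current inputs. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);   /* 8-bit result; any carry out is discarded */
}

/* Sequential: a 4-bit binary counter. Its next output depends on the
   stored state, which summarizes all earlier ticks. */
typedef struct {
    uint8_t state;             /* current count, 0..15 */
} counter4;

uint8_t counter4_tick(counter4 *c)
{
    c->state = (uint8_t)((c->state + 1) & 0x0F);   /* increment, wrap after 15 */
    return c->state;
}
```

Calling `add8(3, 4)` always yields 7, whereas two successive calls to `counter4_tick` on a counter holding 14 yield 15 and then 0.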

Simple Logic Gates
You are familiar with the three basic logical operators: AND, OR, and NOT. Digital circuits to
implement these and other logical functions are called gates. Figure 2.2a shows the symbol notation used to represent the AND, OR, and NOT gates. The NOT gate is often referred to as the
inverter. We have also included the truth table for each gate. A truth table is a list of all possible
input combinations and their corresponding output. For example, if you treat a logical 0 as
representing false and a logical 1 as representing true, you can see that the truth table for the AND gate represents
the logical AND operation.
Even though the three gates shown in Figure 2.2a are sufficient to implement any logical function, it is convenient to implement certain other gates. Figure 2.2b shows three popularly used
gates. The NAND gate is equivalent to an AND gate followed by a NOT gate. Similarly, the NOR
gate is equivalent to an OR gate followed by a NOT gate. The exclusive-OR (XOR) gate generates a 1
output whenever the two inputs differ. This property makes it useful in certain applications such
as parity generation.
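The gates of Figure 2.2 can be modeled on single bits in C. This is a sketch with hypothetical function names; it builds NAND and NOR literally as AND/OR followed by NOT, and uses a chain of XOR gates for parity generation, assuming the parity bit is 1 when the input has an odd number of 1s.

```c
/* Single-bit logic gates; inputs and outputs are 0 or 1. */
int and_gate(int a, int b)  { return a & b; }
int or_gate(int a, int b)   { return a | b; }
int not_gate(int a)         { return !a; }
int nand_gate(int a, int b) { return not_gate(and_gate(a, b)); }  /* AND then NOT */
int nor_gate(int a, int b)  { return not_gate(or_gate(a, b)); }   /* OR then NOT */
int xor_gate(int a, int b)  { return a ^ b; }                     /* 1 when inputs differ */

/* Parity generation with a chain of XOR gates: returns 1 when the byte
   holds an odd number of 1s, making the total count of 1s even. */
int even_parity_bit(unsigned char byte)
{
    int p = 0;
    for (int i = 0; i < 8; i++)
        p = xor_gate(p, (byte >> i) & 1);
    return p;
}
```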
Logic gates are in turn built using transistors. One transistor is enough to implement a NOT
gate. But we need three transistors to implement the AND and OR gates. It is interesting to note
that, contrary to our intuition, implementing the NAND and NOR gates requires only two transistors. In this sense, transistors are the basic electronic components of digital hardware circuits. For
example, the Pentium processor introduced in 1993 consists of about 3 million transistors. It is
now possible to design chips with more than 100 million transistors.

Figure 2.2 Simple logic gates: logic symbols and truth tables for (a) the basic logic gates (AND, OR,
NOT) and (b) some additional logic gates (NAND, NOR, XOR).

There is a propagation delay associated with each gate. This delay represents the time required
for the output to react to an input. The propagation delay depends on the complexity of the circuit
and the technology used. Typical values for TTL gates are in the range of a few nanoseconds
(about 5 to 10 ns). A nanosecond (ns) is $10^{-9}$ second.
In addition to propagation delay, other parameters should be taken into consideration in designing and building logic circuits. Two such parameters are fanin and fanout. Fanin specifies
the maximum number of inputs a logic gate can have. Fanout refers to the driving capacity of an
output. Fanout specifies the maximum number of gates that the output of a gate can drive.
A small set of independent logic gates (such as AND, NOT, NAND, etc.) is packaged into
an integrated circuit (IC) chip, or "chip" for short. These ICs are called small-scale integrated
(SSI) circuits and typically consist of about 1 to 10 gates. Medium-scale integrated (MSI) circuits
represent the next level of integration (typically between 10 and 100 gates). Both SSI and MSI
were introduced in the late 1960s. LSI (large-scale integration), introduced in early 1970s, can
integrate between 100 and 10,000 gates on a single chip. The final degree of integration, VLSI
(very large scale integration), was introduced in the late 1970s and is used for complex chips such
as microprocessors that require more than 10,000 gates.

Table 2.1 Truth tables for the majority and even-parity functions

    Majority function        Even-parity function
    A  B  C  F               A  B  C  F
    0  0  0  0               0  0  0  0
    0  0  1  0               0  0  1  1
    0  1  0  0               0  1  0  1
    0  1  1  1               0  1  1  0
    1  0  0  0               1  0  0  1
    1  0  1  1               1  0  1  0
    1  1  0  1               1  1  0  0
    1  1  1  1               1  1  1  1

Logic Functions
Logic functions can be specified in a variety of ways. In a sense their expression is similar to
problem specification in software development. A logical function can be specified verbally. For
example, a majority function can be specified as: Output should be 1 whenever the majority of
the inputs is 1. Similarly, an even-parity function can be specified as: Output (parity bit) is 1
whenever there is an odd number of 1s in the input. The major problem with verbal specification
is its imprecision and the scope for ambiguity.
We can make this specification precise by using a truth table. In the truth table method, for
each possible input combination, we specify the output value. The truth table method makes sense
for logical functions as the alphabet consists of only 0 and 1. The truth tables for the 3-input
majority and even-parity functions are shown in Table 2.1.
The advantage of the truth table method is that it is precise. This is important if you are
interfacing with a client who does not understand other more concise forms of logic function
expression. The main problem with the truth table method is that it is cumbersome, as the number
of rows grows exponentially with the number of logical variables. Imagine writing a truth table
for a 10-variable function: it requires $2^{10} = 1024$ rows!
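Because the inputs are just binary numbers, a truth table can be generated by counting from 0 to $2^n - 1$. The C sketch below (hypothetical names) does this for the two 3-input functions of Table 2.1, taking A as the most significant bit so that the rows come out in the table's order.

```c
#include <stdio.h>

/* 3-input majority: 1 when at least two inputs are 1. */
int majority3(int a, int b, int c) { return (a + b + c) >= 2; }

/* 3-input even-parity: parity bit is 1 when the inputs hold an odd number of 1s. */
int even_parity3(int a, int b, int c) { return (a + b + c) & 1; }

/* An n-variable truth table has 2^n rows. */
unsigned long truth_table_rows(unsigned n) { return 1UL << n; }

/* Print a 3-variable truth table by enumerating rows 0..7;
   bit 2 of the row number is A, bit 1 is B, bit 0 is C. */
void print_truth_table(const char *name, int (*f)(int, int, int))
{
    printf("%s\nA B C | F\n", name);
    for (int row = 0; row < 8; row++) {
        int a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        printf("%d %d %d | %d\n", a, b, c, f(a, b, c));
    }
}
```

Calling `print_truth_table("Majority", majority3)` reproduces the left half of Table 2.1, and `truth_table_rows(10)` returns the 1024 rows mentioned above.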
We can also use logical expressions to specify a logical function. Logical expressions use the
dot, +, and overbar to represent the AND, OR, and NOT operations, respectively. For example,
the output of the AND gate in Figure 2.2 is written as $F = A \cdot B$. Assuming that single letters are
used for logical variables, we often omit the dot and write the previous AND function as $F = AB$.
Similarly, the OR function is written as $F = A + B$. The output of the NOT gate is expressed as
$F = \overline{A}$. Some authors use a prime to represent the NOT operation, as in $F = A'$, mainly because of
problems with typesetting the overbar.

Figure 2.3 Logical circuit to implement the 3-input majority function.

The logical expressions for our 3-input majority and even-parity functions are shown below:
• 3-input majority function = $AB + BC + AC$,
• 3-input even-parity function = $\overline{A}\,\overline{B}C + \overline{A}B\overline{C} + A\overline{B}\,\overline{C} + ABC$.
An advantage of this form of specification is that it is compact while it retains the precision of
the truth table method. Another major advantage is that logical expressions can be manipulated to
come up with an efficient design. We say more on this topic later.
The final form of specification uses a graphical notation. Figure 2.3 shows the logical circuit
to implement the 3-input majority function. As with the last two methods, it is also precise but is
more useful for hardware engineers to implement logical functions.
A logic circuit designer may use all the three forms during the design of a logic circuit. A
simple circuit design involves the following steps:
• First we have to obtain the truth table from the input specifications.
• Then we derive a logical expression from the truth table.
• We do not want to implement the logical expression derived in the last step as it often
contains some redundancy, leading to an inefficient design. For this reason, we simplify the
logical expression.
• In the final step, we implement the simplified logical expression. To express the implementation, we use the graphical notation.
The following sections give more details on these steps.

Figure 2.4 Logic circuit for the 3-input majority function using the bubble notation.

Bubble Notation

In large circuits, drawing inverters can be avoided by following what is known as the "bubble"
notation: a small circle (bubble) drawn at a gate input indicates that the signal is complemented
before it enters the gate. The use of the bubble notation simplifies the circuit diagrams. To appreciate
the reduced complexity, compare the bubble notation circuit for the 3-input majority function in
Figure 2.4 with that in Figure 2.3.

Deriving Logical Expressions
We can write a logical expression from a truth table in one of two forms: sum-of-products (SOP)
and product-of-sums (POS) forms. In sum-of-products form, we specify the combination of inputs
for which the output should be 1. In product-of-sums form, we specify the combinations of inputs
for which the output should be 0.
Sum-of-Products Form

In this form, each input combination for which the output is 1 is expressed as an and term. This
is the product term as we use • to represent the AND operation. These product terms are ORed
together. That is why it is called sum-of-products as we use + for the OR operation to get the
final logical expression. In deriving the product terms, we write the variable if its value is 1 or its
complement if 0.
Let us look at the 3-input majority function. The truth table is given in Table 2.1. There are
four 1 outputs in this function. So, our logical expression will have four product terms. The first
product term we write is for row 4 with a 1 output. Since A has a value of 0, we use its complement
in the product term while using B and C as they have 1 as their value in this row. Thus, the product
term for this row is $\overline{A}BC$. The product term for row 6 is $A\overline{B}C$. Product terms for rows 7 and 8
are $AB\overline{C}$ and $ABC$, respectively. ORing these four product terms gives the logical expression as
$\overline{A}BC + A\overline{B}C + AB\overline{C} + ABC$.
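The sum-of-products procedure is mechanical enough to automate. The C sketch below (hypothetical names) writes one product term for each row whose output is 1, using the variable when its bit is 1 and its complement when its bit is 0. Because plain text has no overbar, it marks a complement with a trailing prime, the alternative notation mentioned earlier.

```c
#include <string.h>

/* Longest possible result: 8 terms of 6 chars ("A'B'C'") joined by 7 " + ". */
#define SOP_BUF_SIZE (8 * 6 + 7 * 3 + 1)

/* Build the sum-of-products expression for a 3-variable truth table.
   outputs[row] holds F for row = 4A + 2B + C (the order of Table 2.1). */
void sop_from_truth_table(const int outputs[8], char buf[SOP_BUF_SIZE])
{
    const char *lit[3][2] = { {"A'", "A"}, {"B'", "B"}, {"C'", "C"} };
    buf[0] = '\0';
    for (int row = 0; row < 8; row++) {
        if (!outputs[row])
            continue;                        /* only rows whose output is 1 */
        if (buf[0] != '\0')
            strcat(buf, " + ");              /* OR the product terms together */
        for (int v = 0; v < 3; v++) {
            int bit = (row >> (2 - v)) & 1;  /* A is the most significant bit */
            strcat(buf, lit[v][bit]);        /* variable if 1, complement if 0 */
        }
    }
}
```

Fed the majority column {0, 0, 0, 1, 0, 1, 1, 1}, it produces `A'BC + AB'C + ABC' + ABC`, the same four product terms derived above.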

Product-of-Sums Form

This is the dual form of the sum-of-products form. We essentially complement what we have done
to obtain the sum-of-products expression. Here we look for rows that have a 0 output. Each such
row input variable combination is expressed as an OR term. In this OR term, we use the variable
if its value in the row being considered is 0 or its complement if 1. We AND these sum terms to
get the final product-of-sums logical expression. The product-of-sums expression for the 3-input
majority function is $(A + B + C)(A + B + \overline{C})(A + \overline{B} + C)(\overline{A} + B + C)$.
This logical expression and the sum-of-products expressions derived before represent the same
truth table. Thus, despite their appearance, these two logical expressions are logically equivalent.
We can prove this logical equivalence by using the algebraic manipulation method described in
the next section.
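Before doing the algebra, the equivalence can also be confirmed by brute force. Note that this is a different technique from the algebraic proof the text refers to: it simply evaluates each expression on all eight input rows (sometimes called perfect induction). The sketch below uses hypothetical names and writes the complement of a bit x as `1 - x`.

```c
/* Sum-of-products form of the 3-input majority function. */
int maj_sop(int a, int b, int c)
{
    int na = 1 - a, nb = 1 - b, nc = 1 - c;   /* complements */
    return (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c);
}

/* Product-of-sums form: one sum term per row whose output is 0. */
int maj_pos(int a, int b, int c)
{
    int na = 1 - a, nb = 1 - b, nc = 1 - c;
    return (a | b | c) & (a | b | nc) & (a | nb | c) & (na | b | c);
}

/* Simplified form AB + BC + AC. */
int maj_simplified(int a, int b, int c)
{
    return (a & b) | (b & c) | (a & c);
}

/* Returns 1 if all three forms agree on every one of the 8 input rows. */
int majority_forms_equivalent(void)
{
    for (int row = 0; row < 8; row++) {
        int a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        int s = maj_sop(a, b, c);
        if (s != maj_pos(a, b, c) || s != maj_simplified(a, b, c))
            return 0;
    }
    return 1;
}
```

Exhaustive checking is only practical for a few variables, since the number of rows doubles with each one, which is why the algebraic and map-based methods that follow matter.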

Simplifying Logical Expressions
The sum-of-products and product-of-sums logical expressions can be used to come up with a
crude implementation that uses only the AND, OR, and NOT gates. The implementation process
is straightforward. We illustrate the process for sum-of-products expressions. Figure 2.3 shows the
brute force implementation of the sum-of-products expression we derived for the 3-input majority
function. If we simplify the logical expression, we can get a more efficient implementation (see
Figure 2.5).
Let us now focus on how we can simplify the logical expressions obtained from truth tables.
Our focus is on sum-of-products expressions. There are three basic techniques: the algebraic manipulation, Karnaugh map, and Quine-McCluskey methods. Algebraic manipulation uses Boolean
laws to derive a simplified logical expression. The Karnaugh map method uses a graphical form
and is suitable for simplifying logical expressions with a small number of variables. The last
method is a tabular method and is particularly suitable for simplifying logical expressions with a
large number of variables. In addition, the Quine-McCluskey method can be used to automate
the simplification process. In this section, we discuss the first two methods (for details on the last
method, see Fundamentals of Computer Organization and Design by Dandamudi).
Algebraic Manipulation

In this method, we use the Boolean algebra to manipulate logical expressions. We need Boolean
identities to facilitate this manipulation. These are discussed next. Following this discussion, we
show how the identities developed can be used to simplify logical expressions.
Table 2.2 presents some basic Boolean laws. For most laws, there are two versions: an and
version and an or version. If there is only one version, we list it under the and version. We can
transform a law from the and version to the or version by replacing each 1 with a 0, 0 with a 1, +
with a •, and • with a +. This relationship is called duality.
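Each law in Table 2.2 can be checked by evaluating both sides for every combination of 0s and 1s, and the duality rule can be applied mechanically to a law written as text. The C sketch below is illustrative only: the function names are hypothetical, C's `&`, `|`, and `!` stand in for •, +, and the overbar, and `dual` assumes single-letter variables so that only the operators and the constants 0 and 1 are rewritten.

```c
/* Check several laws from Table 2.2 on every assignment of x, y, z.
   Returns 1 if all of them hold. */
int boolean_laws_hold(void)
{
    for (int v = 0; v < 8; v++) {
        int x = (v >> 2) & 1, y = (v >> 1) & 1, z = v & 1;
        int nx = !x, ny = !y;
        int ok =
            ((x & (y | z)) == ((x & y) | (x & z))) &&   /* distribution (and) */
            ((x | (y & z)) == ((x | y) & (x | z))) &&   /* distribution (or)  */
            ((x & (x | y)) == x) && ((x | (x & y)) == x) &&   /* absorption */
            (!(x & y) == (nx | ny)) && (!(x | y) == (nx & ny)) && /* de Morgan */
            ((x & nx) == 0) && ((x | nx) == 1) &&       /* complement */
            ((x & 0) == 0) && ((x | 1) == 1) &&         /* null */
            (!!x == x);                                  /* involution */
        if (!ok)
            return 0;
    }
    return 1;
}

/* Form the dual of a law written with '&', '|', '0', '1' by swapping
   & with | and 0 with 1, in place. */
void dual(char *law)
{
    for (char *p = law; *p != '\0'; p++) {
        switch (*p) {
        case '&': *p = '|'; break;
        case '|': *p = '&'; break;
        case '0': *p = '1'; break;
        case '1': *p = '0'; break;
        default: break;
        }
    }
}
```

For example, applying `dual` to the identity law `"x & 1 = x"` turns it into `"x | 0 = x"`, its or version.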
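We can use the Boolean laws to simplify the logical expressions. We illustrate this method by
looking at the sum-of-products expression for the majority function. Combining the last two terms
with the distribution, complement, and identity laws ($AB\overline{C} + ABC = AB(\overline{C} + C) = AB \cdot 1 = AB$)
leads us to the following expression:
$$\text{Majority function} = \overline{A}BC + A\overline{B}C + \underbrace{AB\overline{C} + ABC}_{AB} = \overline{A}BC + A\overline{B}C + AB.$$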

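Table 2.2 Boolean laws
$$\begin{array}{lll}
\textbf{Name} & \textbf{and version} & \textbf{or version} \\
\hline
\text{Identity} & x \cdot 1 = x & x + 0 = x \\
\text{Complement} & x \cdot \overline{x} = 0 & x + \overline{x} = 1 \\
\text{Commutative} & x \cdot y = y \cdot x & x + y = y + x \\
\text{Distribution} & x \cdot (y + z) = (x \cdot y) + (x \cdot z) & x + (y \cdot z) = (x + y) \cdot (x + z) \\
\text{Idempotent} & x \cdot x = x & x + x = x \\
\text{Null} & x \cdot 0 = 0 & x + 1 = 1 \\
\text{Involution} & \overline{\overline{x}} = x & \text{(none)} \\
\text{Absorption} & x \cdot (x + y) = x & x + (x \cdot y) = x \\
\text{Associative} & x \cdot (y \cdot z) = (x \cdot y) \cdot z & x + (y + z) = (x + y) + z \\
\text{de Morgan} & \overline{x \cdot y} = \overline{x} + \overline{y} & \overline{x + y} = \overline{x} \cdot \overline{y}
\end{array}$$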

Do you know if this is the final simplified form? This is the hard part in applying algebraic
manipulation (in addition to the inherent problem of deciding which rule should be applied). This method
definitely requires good intuition, which often implies that one needs experience to know if the
final form has been derived. In our example, the expression can be further simplified. We start by
rewriting the original logical expression with two extra copies of the term $ABC$ (permitted by the
idempotent law, $x + x = x$) and then simplifying the expression as shown below.
$$\begin{aligned}
\text{Majority function} &= \overline{A}BC + A\overline{B}C + AB\overline{C} + ABC + \underbrace{ABC + ABC}_{\text{added extra}} \\
&= \underbrace{\overline{A}BC + ABC}_{BC} + \underbrace{A\overline{B}C + ABC}_{AC} + \underbrace{AB\overline{C} + ABC}_{AB} \\
&= BC + AC + AB.
\end{aligned}$$
This is the final simplified expression. In the next section, we show a simpler method to derive
this expression. Figure 2.5 shows an implementation of this logical expression.
We can see the benefits of implementing the simplified logical expressions by comparing this
implementation with the one shown in Figure 2.3. The simplified version reduces not only the gate
count but also the gate complexity.
Karnaugh Map Method

This is a graphical method and is suitable for simplifying logical expressions with a small number
of Boolean variables (typically six or fewer). It provides a straightforward method to derive minimal sum-of-products expressions. This method is preferred to the algebraic method as it takes
the guesswork out of the simplification process. For example, in the previous majority function
example, it was not obvious that we had to add two extra copies of the term $ABC$ in
order to get the final logical expression.

Figure 2.5 An implementation of the simplified 3-input majority function.

Figure 2.6 Maps used for simplifying 2-, 3-, and 4-variable logical expressions using the Karnaugh
map method: (a) two-variable K-map; (b) three-variable K-map; (c) four-variable K-map.

The Karnaugh map method uses maps to represent the logical function output. Figure 2.6
shows the maps used for 2-, 3-, and 4-variable logical expressions. Each cell in these maps represents a particular input combination. Each cell is filled with the output value of the function
corresponding to the input combination represented by the cell. For example, the bottom left-hand
cell represents the input combination A = 1 and B = 0 for the two-variable map (Figure 2.6a),
A = 1, B = 0, and C = 0 for the three-variable map (Figure 2.6b), and A = 1, B = 0, C = 0, and
D = 0 for the four-variable map (Figure 2.6c).
The basic idea behind this method is to label cells such that neighboring cells differ in only
one input bit position. This is the reason why the cells are labeled 00, 01, 11, 10 (notice the change
in the order of the last two labels from the normal binary number order). What we are doing is
labeling with a Hamming distance of 1. Hamming distance is the number of bit positions in which
two binary numbers differ. This labeling is also called Gray code. Why are we so interested in this
Gray code labeling? Simply because we can then eliminate a variable, as the following holds:
$$ABCD + AB\overline{C}D = ABD(C + \overline{C}) = ABD.$$

Figure 2.7 Three-variable logical expression simplification using the Karnaugh map method: (a)
majority function; (b) even-parity function.
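The Gray-code labels can be generated rather than memorized. The C sketch below uses the standard reflected binary Gray code formula $g(i) = i \oplus (i \gg 1)$, which is not given in the text and is stated here as an outside assumption; the function names are hypothetical. For two bits it yields the labels 00, 01, 11, 10, and a Hamming distance helper confirms that each label differs from the next in exactly one bit, including the wraparound from 10 back to 00.

```c
/* i-th label of the reflected binary Gray code. */
unsigned gray(unsigned i)
{
    return i ^ (i >> 1);
}

/* Hamming distance: number of bit positions in which a and b differ. */
int hamming(unsigned a, unsigned b)
{
    int d = 0;
    for (unsigned x = a ^ b; x != 0; x >>= 1)
        d += (int)(x & 1u);   /* count the 1 bits of a XOR b */
    return d;
}
```

The wraparound distance `hamming(gray(3), gray(0))` is 1, which is why the first and last columns of a map count as neighbors; by contrast, plain binary order places 11 next to 00 at distance 2.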
Figure 2.7 shows how the maps are used to obtain minimal sum-of-products expressions for
three-variable logical expressions. Notice that each cell is filled with the output value of the
function corresponding to the input combination for that cell. After the map of a logical function
is obtained, we can derive a simplified logical expression by grouping neighboring cells with 1 into
areas. Let us first concentrate on the majority function map shown in Figure 2.7a. The two cells
in the third column are combined into one area. These two cells represent inputs $\overline{A}BC$ (top cell)
and $ABC$ (bottom cell). We can, therefore, combine these two cells to yield a product term $BC$.
Similarly, we can combine the three 1s in the bottom row into two areas of two cells each. The
corresponding product terms for these two areas are $AC$ and $AB$ as shown in Figure 2.7a. Now we
can write the minimal expression as $BC + AC + AB$, which is what we got in the last section using
the algebraic simplification process. Notice that the cell for $ABC$ (third cell in the bottom row)
participates in all three areas. This is fine. What this means is that we need two extra copies of
this term to simplify the expression. This is exactly what we did in our algebraic simplification
procedure.
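The layout of Figure 2.7a can be reproduced from Table 2.1 by placing truth-table rows into map cells in Gray-code column order. This C sketch is illustrative (hypothetical names); it assumes rows are indexed as 4A + 2B + C, with A selecting the map row and BC the column.

```c
#include <stdio.h>

/* Column order of a three-variable K-map: BC = 00, 01, 11, 10 (Gray code). */
static const unsigned BC_ORDER[4] = {0, 1, 3, 2};

/* Majority function indexed by truth-table row 4A + 2B + C (Table 2.1). */
int majority_row(unsigned row)
{
    static const int out[8] = {0, 0, 0, 1, 0, 1, 1, 1};
    return out[row & 7u];
}

/* Value in the map cell at row a (0 or 1) and column col (0..3). */
int kmap3_cell(int (*f)(unsigned), unsigned a, unsigned col)
{
    return f(((a & 1u) << 2) | BC_ORDER[col & 3u]);
}

/* Print the map: one line per value of A, columns in BC_ORDER. */
void print_kmap3(int (*f)(unsigned))
{
    printf("A\\BC 00 01 11 10\n");
    for (unsigned a = 0; a < 2; a++) {
        printf("%4u", a);
        for (unsigned col = 0; col < 4; col++)
            printf(" %2d", kmap3_cell(f, a, col));
        printf("\n");
    }
}
```

For the majority function, `print_kmap3(majority_row)` shows 0 0 1 0 in the A = 0 row and 0 1 1 1 in the A = 1 row: the two 1s in the third column form the $BC$ area, and the three 1s in the bottom row give the $AC$ and $AB$ areas.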
We now have the necessary intuition to develop the required rules for simplification. These
simple rules govern the simplification process:
1. Form regular areas that contain $2^i$ cells, where $i \ge 0$. By a regular area we mean one
that is either a rectangle or a square. For example, we cannot use an "L" shaped
area.
2. Use a minimum number of areas to cover all cells with 1. This implies that we should form
as large an area as possible and redundant areas should be eliminated.
Once minimal areas have been formed, we write a logical expression for each area. These represent terms in the sum-of-products expressions. We can write the final expression by connecting
the terms with OR.

Figure 2.8 An example Karnaugh map that uses the fact that the first and last columns are adjacent.

In Figure 2.7a, we cannot form a regular area with four cells. Next we have to see if we can
form areas of two cells. The answer is yes. Let us assume that we first formed a vertical area
(labeled $BC$). That leaves two 1s uncovered by an area. So, we form two more areas to cover
these two 1s. We also make sure that we indeed need all three areas to cover all 1s. Our next
step is to write the logical expression for these areas.
When writing an expression for an area, look for variables that take the value 0 in some cells of
the area and 1 in others. For example, for the area identified by $BC$, the variable A has 0 and 1. That is, the two cells we
are combining represent $\overline{A}BC$ and $ABC$. Thus, we can eliminate variable A. The variables B and
C have the same value for the whole area. Since they both have the value 1, we write $BC$ as the
expression for this area. It is straightforward to see that the other two areas are represented by $AC$
and $AB$.
If we look at the Karnaugh map for the even-parity function (Figure 2.7b), we find that we
cannot form areas bigger than one cell. This tells us that no further simplification is possible for
this function.
Note that, in the three-variable maps, the first and last columns are adjacent. We did not need
this fact in our previous two examples. You can visualize the Karnaugh map as a tube, cut open to
draw in two dimensions. This fact is important because we can combine these two columns into a
square area as shown in Figure 2.8. This square area is represented by $\overline{C}$: both A and B take
the values 0 and 1 within the area, while C is 0 in both columns (labeled 00 and 10).
You might have noticed that we can eliminate $\log_2 n$ variables from the product term, where n
is the number of cells in the area. For example, the four-cell square in Figure 2.8 eliminates two
($\log_2 4 = 2$) variables from the product term that represents this area.
Figure 2.9 Different minimal expressions will result depending on the groupings.

Figure 2.9 shows an example of a four-variable logical expression simplification using the
Karnaugh map method. It is important to remember that the first and last columns as well
as the first and last rows are adjacent. Then it is not difficult to see why the four corner cells form
a regular area and are represented by the expression $\overline{B}\,\overline{D}$. In writing an expression for an area,
look at the input variables and ignore those that assume both 0 and 1. For example, for this unusual
square area, looking at the first and last rows, we notice that variable A has 0 for the first row and
1 for the last row. Thus, we eliminate A. Since B has a value of 0, we use $\overline{B}$. Similarly, by looking
at the first and last columns, we eliminate C. We use $\overline{D}$ as D has a value of 0. Thus, the expression
for this area is $\overline{B}\,\overline{D}$. Following our simplification procedure to cover all cells with 1, we get the
following minimal expression for Figure 2.9a:
$\overline{B}\,\overline{D}$ + ACD + ABD.
We also note from Figure 2.9 that a different grouping leads to a different minimal expression.
The logical expression for Figure 2.9b is
$\overline{B}\,\overline{D}$ + ABC + ABD.
Even though this expression is slightly different from the logical expression obtained from Figure 2.9a, both expressions are minimal and logically equivalent.
The best way to understand the Karnaugh map method is to practice until you develop your
intuition. After that, it is unlikely you will ever forget how this method works even if you have not
used it in years.

Combinational Circuits
So far, we have focused on implementations using only the basic gates. One key characteristic of
the circuits that we have designed so far is that the output of the circuit is a function of the inputs.
Such devices are called combinational circuits as the output can be expressed as a combination of
the inputs. We continue our discussion of combinational circuits in this section.
Although gate-level abstraction is better than working at the transistor level, a higher level of
abstraction is needed in designing and building complex digital systems. We now discuss some
combinational circuits that provide this higher level of abstraction.
Higher-level abstraction helps the digital circuit design and implementation process in several
ways. The most important ones are the following:
1. Higher-level abstraction helps us in the logical design process as we can use functional
building blocks that typically require several gates to implement. This, therefore, reduces
the complexity.

2. The other equally important point is that the use of these higher-level functional devices
reduces the chip count to implement a complex logical function.

Figure 2.10 A 4-data input multiplexer block diagram and truth table.

The second point is important from the practical viewpoint. If you look at a typical motherboard,
these low-level gates take a lot of area on the printed circuit board (PCB). Even though the low-level
gate chips were introduced in the 1970s, you still find them sprinkled on your PCB along
with your Pentium processor. In fact, together they often seem to take more space than the processor. Thus, reducing the chip count
is important to make your circuit compact. The combinational circuits provide one mechanism to
incorporate a higher level of integration.
The reduced chip count also helps in reducing the production cost (fewer ICs to insert and solder) and improving the reliability. Several combinational circuits are available for implementation.
Here we look at a sampler of these circuits.
Multiplexers

A multiplexer (MUX) is characterized by $2^n$ data inputs, n selection inputs, and a single output.
The block diagram representation of a 4-input multiplexer (4-to-1 multiplexer) is shown in Figure 2.10. The multiplexer connects one of its $2^n$ inputs, selected by the selection inputs, to the output.
Treating the selection input as a binary number, data input $I_i$ is connected to the output when the
selection input is i, as shown in Figure 2.10.
Figure 2.11 shows an implementation of a 4-to-1 multiplexer. If you look closely, it somewhat
resembles our logic circuit used by the brute force method for implementing sum-of-products
expressions (compare this figure with Figure 2.3 on page 16). This visual observation is useful in
developing our intuition about one important property of the multiplexers: we can implement any
logical function using only multiplexers. The best thing about using multiplexers in implementing
a logical function is that you don't have to simplify the logical expression. We can proceed directly
from the truth table to implementation, using the multiplexer as the building block.
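One gate-level way to express a 4-to-1 multiplexer is as a sum of products in which each product term ANDs one data input with the decoded select lines. The C sketch below follows that shape; whether it matches Figure 2.11 gate for gate is an assumption, and the name `mux4` is hypothetical.

```c
/* 4-to-1 multiplexer as a sum of products of the select lines:
   out = S1'S0'I0 + S1'S0 I1 + S1 S0'I2 + S1 S0 I3.
   All inputs are single bits (0 or 1). */
int mux4(int i0, int i1, int i2, int i3, int s1, int s0)
{
    int ns1 = !s1, ns0 = !s0;   /* inverters on the select lines */
    return (ns1 & ns0 & i0)     /* select 00 passes I0 */
         | (ns1 & s0 & i1)      /* select 01 passes I1 */
         | (s1 & ns0 & i2)      /* select 10 passes I2 */
         | (s1 & s0 & i3);      /* select 11 passes I3 */
}
```

Exactly one product term can be enabled for any select value, so the output simply follows the chosen data input.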
How do we implement a truth table using the multiplexer? Simple. Connect the logical variables in the logical expression to the selection inputs and apply the function outputs as constants to the
data inputs. To follow this straightforward implementation, we need a $2^b$ data input multiplexer
with b selection inputs to implement a b-variable logical expression. The process is best illustrated
by means of an example.
Figure 2.12 shows how an 8-to-1 multiplexer can be used to implement our two running examples: the 3-input majority and 3-input even-parity functions. From these examples, you can see
that the data input is simply a copy of the output column in the corresponding truth table. You just
need to take care in how you connect the logical variables: connect the most significant variable in
the truth table to the most significant selection input of the multiplexer as shown in Figure 2.12.
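The same idea in C: a behavioral 8-to-1 multiplexer whose data inputs are wired to the output column of Table 2.1, with A, B, and C driving S2, S1, and S0. This is a sketch with hypothetical names; note that no simplification of the logical expression is needed.

```c
/* 8-to-1 multiplexer: connects data[sel] to the output, where
   sel = 4*S2 + 2*S1 + S0 and S2 is the most significant select input. */
int mux8(const int data[8], unsigned s2, unsigned s1, unsigned s0)
{
    unsigned sel = ((s2 & 1u) << 2) | ((s1 & 1u) << 1) | (s0 & 1u);
    return data[sel];
}

/* Data inputs are copies of the output columns of Table 2.1. */
static const int MAJORITY_COLUMN[8]    = {0, 0, 0, 1, 0, 1, 1, 1};
static const int EVEN_PARITY_COLUMN[8] = {0, 1, 1, 0, 1, 0, 0, 1};

/* A, B, C go to S2, S1, S0 (most significant variable to S2). */
int majority_mux(unsigned a, unsigned b, unsigned c)
{
    return mux8(MAJORITY_COLUMN, a, b, c);
}

int even_parity_mux(unsigned a, unsigned b, unsigned c)
{
    return mux8(EVEN_PARITY_COLUMN, a, b, c);
}
```

Swapping which variable drives S2 would index the column in the wrong order, which is exactly the wiring mistake the text warns about.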

Figure 2.11 A 4-to-1 multiplexer implementation using the basic gates.

Figure 2.12 Two example implementations using an 8-to-1 multiplexer: the majority function and the
even-parity function, with A, B, and C connected to the selection inputs S2, S1, and S0.

Demultiplexers
The demultiplexer (DeMUX) performs the complementary operation of a multiplexer. As in the
multiplexer, a demultiplexer has n selection inputs. However, the roles of data input and output are
reversed. In a demultiplexer with n selection inputs, there are $2^n$ data outputs and one data input.

Figure 2.13 Demultiplexer block diagram and its implementation.

Depending on the value of the selection input, the data input is connected to the corresponding
data output. The block diagram and the implementation of a 4-data out demultiplexer are shown in
Figure 2.13.
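A behavioral C sketch of the 4-output demultiplexer follows. The name `demux4` is hypothetical, and it assumes (as in a typical AND-gate implementation) that the outputs not selected are held at 0.

```c
/* 1-to-4 demultiplexer: route data_in to out[sel], sel = 2*S1 + S0.
   Every output that is not selected is 0. */
void demux4(int data_in, unsigned s1, unsigned s0, int out[4])
{
    unsigned sel = ((s1 & 1u) << 1) | (s0 & 1u);
    for (unsigned i = 0; i < 4; i++)
        out[i] = (i == sel) ? data_in : 0;
}
```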
Decoders
The decoder is another basic building block that is useful in selecting one out of N lines. The
input to a decoder is an n-bit binary (i.e., encoded) number and the output is $N = 2^n$ bits of decoded
data. Figure 2.14 shows a 2-to-4 decoder and its logical implementation. Among the $2^n$ outputs
of a decoder, only one output line is 1 at any time as shown in the truth table (Figure 2.14). In the
next chapter we show how decoders are useful in designing system memory.
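A 2-to-4 decoder can be modeled with one inverter per input and one AND gate per output. This is a common decoder structure; assuming Figure 2.14 uses the same gates is an assumption, and `decode2to4` is a hypothetical name. The four outputs are packed into the low bits of the return value for convenience.

```c
/* 2-to-4 decoder: exactly one output is 1, the one whose index equals
   the 2-bit input I1 I0. Returns outputs packed as bits O3 O2 O1 O0. */
unsigned decode2to4(unsigned i1, unsigned i0)
{
    i1 &= 1u;
    i0 &= 1u;
    unsigned n1 = !i1, n0 = !i0;   /* inverters */
    unsigned o0 = n1 & n0;         /* one AND gate per output */
    unsigned o1 = n1 & i0;
    unsigned o2 = i1 & n0;
    unsigned o3 = i1 & i0;
    return (o3 << 3) | (o2 << 2) | (o1 << 1) | o0;
}
```

Because each returned value has a single 1 bit, the result is always 1, 2, 4, or 8, reflecting the one-hot behavior shown in the truth table.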
Comparators
Comparators are useful for implementing relational operations such as =, <, >, and so on. For
example, we can use XOR gates to test whether two numbers are equal. Figure 2.15 shows a 4-bit
comparator that outputs 1 if the two 4-bit input numbers $A = A_3A_2A_1A_0$ and $B = B_3B_2B_1B_0$
match. Each XOR gate outputs 0 when its two input bits are equal, so the numbers match exactly when
all four XOR outputs are 0. However, implementing < and > is more involved than testing for equality. While equality
can be established by comparing bit by bit, positional weights must be taken into consideration
when comparing two numbers for < and >. We leave it as an exercise to design such a circuit.
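The equality test can be sketched in C as one XOR per bit pair followed by a check that no pair differs. This sketch assumes the XOR outputs are combined by a NOR (an OR followed by a NOT); `equal4` is a hypothetical name, and only the low four bits of each argument are compared.

```c
/* 4-bit equality comparator: XOR each bit pair (1 means the bits differ),
   OR the four XOR outputs, then invert, giving the NOR of the XOR outputs. */
int equal4(unsigned a, unsigned b)
{
    unsigned differ = 0;
    for (int i = 0; i < 4; i++)
        differ |= ((a >> i) & 1u) ^ ((b >> i) & 1u);
    return !differ;
}
```

Since only bits 0 to 3 are examined, `equal4(0x19, 0x09)` reports a match, just as a 4-bit hardware comparator would ignore higher-order wires it is not connected to.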

Figure 2.14 Decoder block diagram and its implementation.

Figure 2.15 A 4-bit comparator implementation using XOR gates.

Adders
We now look at adder circuits that provide the basic capability to perform arithmetic operations.
The simplest of the adders is called a half-adder, which adds two bits and produces a sum and a
carry output, as shown in Figure 2.16a. From the truth table it is straightforward to see that the
carry output $C_{out}$ can be generated by a single AND gate and the sum output by a single XOR
gate.
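The half-adder equations can be checked against its truth table with a few lines of Python. `half_adder` is an illustrative name, and the bitwise operators `^` and `&` stand in for the XOR and AND gates.

```python
def half_adder(a, b):
    """Add two bits. Returns (sum, carry_out)."""
    return a ^ b, a & b        # sum from an XOR gate, carry from an AND gate


if __name__ == "__main__":
    print("A B | Sum Cout")
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            print(f"{a} {b} |  {s}    {c}")
```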
The problem with the half-adder is that we cannot use it to build adders that can add more than
two 1-bit numbers. If we want to use the 1-bit adder as a building block to construct larger adders
that can add two $N$-bit numbers, we need an adder that takes the two input bits and a potential
carry generated by the previous bit position. This is what the full-adder does. A full-adder takes
three bits and produces two outputs, as shown in Figure 2.16b, which also gives an implementation
of the full-adder.
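A full adder can be modelled the same way. The sketch below assumes a common formulation, $C_{out} = AB + C_{in}(A \oplus B)$. The gate arrangement in Figure 2.16b may differ while computing the same function. The longest path in this formulation runs through an XOR, an AND and an OR, which is three gate levels.

```python
def full_adder(a, b, cin):
    """Add bits a, b and carry-in cin. Returns (sum, carry_out)."""
    s = a ^ b ^ cin                       # sum is 1 when an odd number of inputs are 1
    cout = (a & b) | (cin & (a ^ b))      # carry generated here, or carry-in passed on
    return s, cout


if __name__ == "__main__":
    print("A B Cin | Sum Cout")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, c = full_adder(a, b, cin)
                print(f"{a} {b}  {cin}  |  {s}    {c}")
```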
Figure 2.16 Full- and half-adder truth tables and implementations: (a) half-adder truth table and
implementation; (b) full-adder truth table and implementation.

Using full adders, it is straightforward to build an adder that can add two $N$-bit numbers. An
example 16-bit adder is shown in Figure 2.17. Such adders are called ripple-carry adders as the
carry ripples through bit positions 1 through 15. Let us assume that this ripple-carry adder is using
the full adder shown in Figure 2.16b. If we assume a gate delay of 5 ns, each full adder takes three
gate delays (= 15 ns) to generate $C_{out}$. Thus, the 16-bit ripple-carry adder shown in Figure 2.17
takes 16 × 15 = 240 ns. If we were to use this type of adder circuit in a system where each addition
takes one clock cycle, the system could not run faster than 1/(240 ns) ≈ 4 MHz.
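The following Python sketch chains full adders into a ripple-carry adder and reproduces the delay estimate above. The 5 ns gate delay and the three gate delays per stage are the assumptions stated in the text. The helper names are hypothetical.

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    return s, (a & b) | (cin & (a ^ b))


def ripple_carry_add(a, b, width=16, cin=0):
    """Add two width-bit numbers by chaining full adders, LSB first.
    The carry out of bit i becomes the carry in of bit i + 1.
    Returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


GATE_DELAY_NS = 5        # gate delay assumed in the text
DELAYS_PER_STAGE = 3     # gate delays for one full adder to produce Cout


def ripple_delay_ns(width=16):
    """Worst case: the carry has to pass through every stage in turn."""
    return width * DELAYS_PER_STAGE * GATE_DELAY_NS


if __name__ == "__main__":
    print(ripple_carry_add(0xFFFF, 0x0001))   # (0, 1): carry ripples through all 16 bits
    d = ripple_delay_ns()
    print(d, "ns ->", round(1e3 / d, 2), "MHz")   # 240 ns -> 4.17 MHz
```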
How can we speed up multibit adders? If we analyze the reasons for the "slowness" of the
ripple-carry adders, we see that carry propagation is causing the delay in producing the final $N$-bit
output. If we want to improve the performance, we have to remove this dependency and determine
the required carry-in for each bit position independently. Such adders are called carry lookahead
adders. The main problem with these adders is that they are complex to implement for long words.
To see why this is so, and also to give you an idea of how each full adder can generate its own
carry-in bit, let us look at the logical expression that should be implemented to generate the carry-in.
Carry-out from the rightmost bit position, $C_0$, is obtained as

$$C_0 = A_0 B_0 .$$

$C_1$ is given by

$$C_1 = C_0 (A_1 + B_1) + A_1 B_1 .$$

By substituting $A_0 B_0$ for $C_0$, we get

$$C_1 = A_0 B_0 A_1 + A_0 B_0 B_1 + A_1 B_1 .$$

Figure 2.17 A 16-bit ripple-carry adder using the full adder building blocks.

Similarly, we get $C_2$ as

$$
\begin{aligned}
C_2 &= C_1 (A_2 + B_2) + A_2 B_2 \\
    &= A_2 A_0 B_0 A_1 + A_2 A_0 B_0 B_1 + A_2 A_1 B_1 \\
    &\quad + B_2 A_0 B_0 A_1 + B_2 A_0 B_0 B_1 + B_2 A_1 B_1 + A_2 B_2 .
\end{aligned}
$$

Using this procedure, we can generate the necessary carry-in inputs independently. The logical
expression for $C_k$ is a sum-of-products expression involving only $A_i$ and $B_i$, $0 \le i \le k$. Thus,
independent of the length of the word, only two gate delays are involved, assuming a single gate
can implement each product term. However, the number and size of the product terms grow rapidly
with $k$ ($C_2$ already has seven terms), which makes such a circuit impractical for more than 8-bit
words. Typically, carry lookahead is implemented at the 4- or 8-bit level. We can apply our
ripple-carry method of building higher word length adders by using these 4- or 8-bit carry
lookahead adders.
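The Python sketch below checks that the flattened expressions compute the same carries as the rippled ones. It builds each $C_k$ directly from the input bits and compares the result with the carries produced by rippling. It uses the shorthand $G_i = A_iB_i$ and $P_i = A_i + B_i$ (our notation, not the text's), so that $C_k = G_k + P_kG_{k-1} + P_kP_{k-1}G_{k-2} + \cdots + P_k \cdots P_1 G_0$. Expanding each $G$ and $P$ gives the sum-of-products forms of $C_1$ and $C_2$ above. This is a logic-level model only; it says nothing about gate timing.

```python
def bits(x, width):
    return [(x >> i) & 1 for i in range(width)]   # LSB first


def lookahead_carries(a, b, width=4):
    """Each C_k = G_k + P_k G_(k-1) + ... + P_k ... P_1 G_0, from A and B bits only."""
    A, B = bits(a, width), bits(b, width)
    G = [A[i] & B[i] for i in range(width)]       # carry generated at bit i
    P = [A[i] | B[i] for i in range(width)]       # carry propagated through bit i
    carries = []
    for k in range(width):
        c = 0
        for j in range(k + 1):                    # one product term per generating bit j
            term = G[j]
            for m in range(j + 1, k + 1):
                term &= P[m]
            c |= term                             # OR of all product terms
        carries.append(c)
    return carries


def ripple_carries(a, b, width=4):
    """Reference: C_i = A_i B_i + C_(i-1) (A_i + B_i), with no carry into bit 0."""
    A, B = bits(a, width), bits(b, width)
    carries, c = [], 0
    for i in range(width):
        c = (A[i] & B[i]) | (c & (A[i] | B[i]))
        carries.append(c)
    return carries


if __name__ == "__main__":
    same = all(lookahead_carries(a, b) == ripple_carries(a, b)
               for a in range(16) for b in range(16))
    print("lookahead matches ripple for all 4-bit inputs:", same)
```

The loops only enumerate product terms. In hardware, each product term would be one AND gate and each $C_k$ one OR gate.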

Programmable Logic Devices
We have seen several ways of implementing sum-of-products expressions. Programmable logic
devices provide yet another way to implement these expressions. There are two types of these
devices that are very similar to each other. The next two subsections describe these devices.
Programmable Logic Arrays (PLAs)

A PLA is a field-programmable device that implements sum-of-products expressions. It consists of an
AND array and an OR array as shown in Figure 2.18. A PLA takes $N$ inputs and produces $M$
outputs. Each input is a logical variable. Each output of a PLA represents a logical function output.
Internally, each input is complemented, and a total of $2N$ inputs is connected to each AND gate
in the AND array through a fuse. The example PLA, shown in Figure 2.18, is a 2 × 2 PLA with
two inputs and two outputs. Each AND gate receives four inputs: $I_0$, $\overline{I_0}$, $I_1$, and $\overline{I_1}$. The fuses are
shown as small white rectangles. Each AND gate can be used to implement a product term in the
sum-of-products expression.

Figure 2.18 An example PLA with two inputs and two outputs.

The OR array is organized similarly except that the inputs to the OR gates are the outputs of
the AND array. Thus, the number of inputs to each OR gate is equal to the number of AND gates
in the AND array. The output of each OR gate represents a function output.
When the chip is shipped from the factory, all fuses are intact. We program the PLA by
selectively blowing some fuses (generally by passing a high current through them). The chip
design guarantees that an input with a blown fuse acts as 1 for the AND gates and as 0 for the OR
gates.
Figure 2.19 shows an example implementation of functions $F_0$ and $F_1$. The rightmost AND
gate in the AND array produces the product term $AB$. To produce this output, the inputs of this
gate are programmed by blowing the second and fourth fuses, which connect inputs $\overline{A}$ and $\overline{B}$,
respectively. Programming a PLA to implement a sum-of-products function involves implementing each
product term by an AND gate. Then a single OR gate in the OR array is used to obtain the final
function. In Figure 2.19, we are using two product terms generated by the middle two AND gates
($P_1$ and $P_2$) as inputs to both OR gates, as these two terms appear in both $F_0$ and $F_1$.
To simplify specification of the connections, the notation shown in Figure 2.20 is used. Each
AND and OR gate input is represented by a single line. An × is placed where the corresponding input
is connected to the AND or OR gate, as shown in this figure.
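The fuse semantics described above can be captured in a small Python model of a PLA. Everything in this sketch is an illustrative assumption:

- Fuse maps are lists of 1s (intact) and 0s (blown).
- The literal order per input is the true form followed by the complement, as in Figure 2.18.
- The programmed functions, $F_0 = A \oplus B$ and $F_1 = A + B$, are hypothetical examples, not the $F_0$ and $F_1$ of Figure 2.19.

Two features do follow the text: the rightmost AND gate produces $AB$ by blowing its second and fourth fuses, and both outputs share the two middle product terms.

```python
def pla(inputs, and_fuses, or_fuses):
    """Evaluate a PLA given its input bits and fuse maps (1 = intact, 0 = blown)."""
    literals = []
    for x in inputs:                        # each input and its complement: I, not I
        literals += [x, 1 - x]
    products = []
    for row in and_fuses:                   # one row per AND gate
        p = 1
        for fuse, lit in zip(row, literals):
            p &= lit if fuse else 1         # blown fuse looks like 1 to an AND gate
        products.append(p)
    outputs = []
    for row in or_fuses:                    # one row per OR gate
        f = 0
        for fuse, p in zip(row, products):
            f |= p if fuse else 0           # blown fuse looks like 0 to an OR gate
        outputs.append(f)
    return outputs


# Hypothetical 2 x 2 PLA program; literal order is A, not A, B, not B.
AND_FUSES = [
    [0, 1, 0, 1],   # P0 = (not A)(not B), unused
    [0, 1, 1, 0],   # P1 = (not A) B
    [1, 0, 0, 1],   # P2 = A (not B)
    [1, 0, 1, 0],   # P3 = A B: second and fourth fuses blown
]
OR_FUSES = [
    [0, 1, 1, 0],   # F0 = P1 + P2      (A XOR B)
    [0, 1, 1, 1],   # F1 = P1 + P2 + P3 (A OR B)
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, pla([a, b], AND_FUSES, OR_FUSES))
```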
Figure 2.19 Implementation of functions $F_0$ and $F_1$ using the example PLA.

Figure 2.20 A simplified notation to show implementation details of a PLA.

Programmable Array Logic Devices (PALs)

PLAs are very flexible in implementing sum-of-products expressions. However, the cost of providing
a large number of fuses is high. For example, a 12 × 12 PLA with a 50-gate AND array and a 12-gate
OR array requires 24 × 50 = 1200 fuses for the AND array (each AND gate sees the 12 inputs and
their complements) and 50 × 12 = 600 fuses for the OR array, for a total of 1800 fuses. We can
reduce this complexity by noting that we can retain most of the flexibility by cutting down the set
of fuses in the OR array. This is the rationale for PALs. Due to their cost advantage, most
manufacturers produce only PALs.
PALs are very similar to PLAs except that there is no programmable OR array. Instead, the
OR connections are fixed. Figure 2.21 shows a PAL with the bottom OR gate connected to the
leftmost two product terms and the other OR gate connected to the other two product terms.

Figure 2.21 Programmable array logic device with fixed OR gate connections. We have used the
simplified notation to indicate the connections in the AND array.

As a result of these fixed connections, we cannot implement the two functions shown in Figure 2.20.
This loss of flexibility may sometimes cause problems, but in practice it is not a big one. The
advantage of PAL devices is that they eliminate all the OR array fuses that are present in a PLA.
In the last example, this reduces the number of fuses by a third, from 1800 to 1200.
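The fuse arithmetic from the 12 × 12 example can be written as two small Python functions. They assume, as the example does, that every AND gate sees each input and its complement. For the PLA, they also assume that every OR gate can connect to every AND gate. The function names are hypothetical.

```python
def pla_fuses(n_inputs, n_and, n_outputs):
    """PLA: every AND gate sees 2N literals; every OR gate sees every AND gate."""
    and_array = 2 * n_inputs * n_and
    or_array = n_and * n_outputs
    return and_array + or_array


def pal_fuses(n_inputs, n_and):
    """PAL: same programmable AND array, but the OR connections are hardwired."""
    return 2 * n_inputs * n_and


if __name__ == "__main__":
    pla_count, pal_count = pla_fuses(12, 50, 12), pal_fuses(12, 50)
    print(pla_count, pal_count, f"{1 - pal_count / pla_count:.0%} fewer")   # 1800 1200 33% fewer
```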

Arithmetic and Logic Units
We are now ready to design our own arithmetic and logic unit. The ALU forms the computational
core of a processor, performing basic arithmetic and logical operations such as integer addition,
subtraction, and logical AND and OR functions. Figure 2.22 shows an example ALU that can perform two arithmetic functions (addition and subtraction) and two logical functions (AND and OR).
We use a multiplexer to select one of the four functions. The implementation is straightforward
except that we implement the subtractor using a full adder by negating the B input.
To see why this is so, you need to understand the 2's complement representation for negative
numbers. A detailed discussion of this number representation is given in Appendix A (see
page 468). Here we give a brief explanation. The operation $(x - y)$ is treated as adding $-y$ to $x$.
That is, $(x - y)$ is implemented as $x + (-y)$ so that we can use an adder to perform subtraction.
For example, 12 - 5 is implemented by adding -5 to 12. In 4-bit 2's complement notation, -5 is
represented as 1011B, which is obtained by complementing the bits of the number 5 (0101B) and adding 1.
This operation produces the correct result as shown below:

    12D = 1100B
    -5D = 1011B
          0111B = 7D (the carry out of the most significant bit is ignored)

Figure 2.22 A simple 1-bit ALU that can perform addition, subtraction, AND, and OR operations,
selected by the function inputs F1 and F0. The carry output of the circuit is incomplete in this
figure, as a better and more efficient circuit is shown in the next figure. Note: "+" and "-"
represent arithmetic addition and subtraction operations, respectively.

To implement the subtract operation, we first convert B to -B in 2's complement representation. We
get the 2's complement representation by complementing the bits and adding 1. We need an inverter
to complement the bits; the required 1 is added via $C_{in}$.
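The 12 - 5 example can be reproduced in Python by masking results to 4 bits. The mask plays the role of discarding the carry out of the most significant bit. The word width and the function names are assumptions made for this sketch.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1          # 0b1111 for 4-bit words


def twos_complement(x):
    """Negate x in WIDTH-bit 2's complement: complement the bits, then add 1."""
    return ((~x) + 1) & MASK


def subtract(x, y):
    """Compute x - y as x + (-y), discarding any carry out of the MSB."""
    return (x + twos_complement(y)) & MASK


if __name__ == "__main__":
    print(format(twos_complement(5), "04b"))   # 1011
    print(format(subtract(12, 5), "04b"))      # 0111
```

If y is larger than x, the 4-bit result is the 2's complement form of the negative difference. For example, subtract(5, 12) gives 1001B, which is -7.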
Since the only difference between the adder and the subtractor is the negation of one input,
we can get a better circuit by using a programmable inverter. Figure 2.23 shows the final design
with the XOR gate acting as a programmable inverter. Remember that, when one of the inputs
is 1, the XOR gate acts as an inverter for the other input, and when it is 0, the XOR gate passes
the other input unchanged. We can use these 1-bit ALUs to get word-length ALUs. Figure 2.24 shows
an implementation of a 16-bit ALU using the 1-bit ALU of Figure 2.23.
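The following Python sketch models the 1-bit ALU of Figure 2.23 and chains 16 copies as in Figure 2.24. Two details are assumptions. First, the function-select encoding is F1 F0 = 00 for AND, 01 for OR, 10 for add and 11 for subtract; only the fact that F0 supplies the subtract carry-in comes from the text. Second, the F0-controlled XOR inverter sits only on the adder's B input. The names are hypothetical.

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    return s, (a & b) | (cin & (a ^ b))


# Assumed function-select codes (F1, F0).
AND, OR, ADD, SUB = (0, 0), (0, 1), (1, 0), (1, 1)


def alu_1bit(a, b, cin, f1, f0):
    """One ALU slice. Returns (result_bit, carry_out)."""
    b_adder = b ^ f0                       # XOR as programmable inverter: complement B when F0 = 1
    s, cout = full_adder(a, b_adder, cin)
    mux = {(0, 0): a & b, (0, 1): a | b,   # 4-to-1 multiplexer selected by F1 F0
           (1, 0): s, (1, 1): s}
    return mux[(f1, f0)], cout


def alu_16bit(a, b, op, width=16):
    """Chain width slices; F0 is also the carry-in to bit 0 (1 for subtraction)."""
    f1, f0 = op
    result, carry = 0, f0
    for i in range(width):
        r, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, f1, f0)
        result |= r << i
    return result                          # carry out of the MSB is ignored


if __name__ == "__main__":
    A = 0b1001_1110_1101_1110
    B = 0b0110_1011_0110_1101
    print(format(alu_16bit(A, B, SUB), "016b"))   # 0011001101110001
```

The logical operations ignore the carries, so setting the carry-in to 1 for the OR code is harmless.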
Figure 2.23 A better 1-bit ALU that uses a single full adder for both addition and subtraction operations.

Figure 2.24 A 16-bit ALU built with the 1-bit ALU: The $F_0$ function bit sets $C_{in}$ to 1 for the subtract
operation. Logical operations ignore the carry bits.

To illustrate how the circuit in Figure 2.24 subtracts two 16-bit numbers, let us consider an
example with A = 1001 1110 1101 1110 and B = 0110 1011 0110 1101. Since B is internally
complemented, the adder receives $\overline{B}$ = 1001 0100 1001 0010. Now we add A and $\overline{B}$ with
the carry-in to the rightmost bit set to 1 (through the $F_0$ bit):

    A                = 1001 1110 1101 1110
    complement of B  = 1001 0100 1001 0010
    carry-in from F0 =                   1
    A - B            = 0011 0011 0111 0001

which is the correct value. If B is larger than A, we get a negative number. In this case, the result
will be in the 2's complement form. Also note that, in the 2's complement representation, we
ignore any carry generated out of the most significant bit.

Figure 2.25 Main components of a sequential circuit: a combinational circuit whose outputs are fed
back to its inputs through a feedback circuit.

Sequential Circuits
The output of a combinational circuit depends only on the current inputs. In contrast, the output
of a sequential circuit depends on both the current input values and the past inputs. This
dependence on past inputs gives sequential circuits the property of "memory."
In general, the sequence of past inputs is encoded into a set of state variables. There is a feedback
path that feeds these variables to the input of a combinational circuit as shown in Figure 2.25.
Sometimes, this feedback consists of a simple interconnection of some outputs of the combinational
circuit to its inputs. For the most part, however, the feedback circuit consists of elements
such as flip-flops that we discuss later. These elements themselves are sequential circuits that can
remember or store the state information. Next we introduce the system clock to incorporate time into
digital circuits.
System Clock

Digital circuits can operate in asynchronous or synchronous mode. Circuits that operate in
asynchronous mode are independent of each other. That is, the time at which a change occurs in one
circuit has no relation to the time a change occurs in another circuit. Asynchronous operation
causes serious problems in a typical digital system in which the output of one circuit goes as
input to several others. Similarly, a single circuit may receive outputs of several circuits as inputs.
Asynchronous operation implies that the required inputs to a circuit may not all be valid at
the same time.
To avoid these problems, circuits are operated in synchronous mode. In this mode, all circuits
in the system change their state at some precisely defined instants. The clock signal provides such
a global definition of time instants at which changes can take place. Implicit in this definition is
the fact that the clock signal also specifies the speed at which a circuit can operate.

Figure 2.26 Three types of clock signals with the same clock period: (a) symmetric; (b) smaller ON
period; (c) smaller OFF period. Panel (a) marks clock cycle n and its rising and falling edges.
A clock is a sequence of 1s and 0s as shown in Figure 2.26. We refer to the period during
which the clock is 1 as the ON period and the period with 0 as the OFF period. Even though we
normally use symmetric clock signals with equal ON and OFF periods as in Figure 2.26a, clock
signals can take asymmetric forms as shown in Figures 2.26b and c.
The clock signal edge going from 0 to 1 is referred to as the rising edge (also called the positive
or leading edge). Analogously, we can define a falling edge as shown in Figure 2.26a. The falling
edge is also referred to as a negative or trailing edge.
A clock cycle is defined as the time between two successive rising edges as shown in Figure 2.26. You can also treat the period between successive falling edges as a clock cycle.
Clock rate or frequency is measured in number of cycles per second; the unit is the hertz (Hz).
The clock period is defined as the time represented by one clock cycle. All three clock signals in
Figure 2.26 have the same clock period:

$$\text{Clock period} = \frac{1}{\text{Clock frequency}}$$

For example, a clock frequency of 1 GHz yields a clock period of

$$\frac{1}{1 \times 10^{9}\ \text{Hz}} = 1\ \text{ns}.$$

Note that one nanosecond (ns) is equal to $10^{-9}$ second.
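The period-frequency relation is easy to evaluate in code. This Python helper (a hypothetical name) returns the period in nanoseconds.

```python
def clock_period_ns(freq_hz):
    """Clock period = 1 / clock frequency, converted from seconds to nanoseconds."""
    return 1e9 / freq_hz     # 1 s = 1e9 ns


if __name__ == "__main__":
    print(clock_period_ns(1e9))     # 1 GHz   -> 1.0 ns
    print(clock_period_ns(250e6))   # 250 MHz -> 4.0 ns
```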
The clock signal serves two distinct purposes in a digital circuit. It provides the global
synchronization signal for the entire system. Each clock cycle provides three distinct epochs: start of
a clock cycle, end of a clock cycle, and an intermediate point at which the clock signal changes
levels. This intermediate point is in the middle of a clock cycle for symmetric clock signals. The
other equally important purpose is to provide timing information in the form of a clock period.

Figure 2.27 A NOR gate implementation of the SR latch: (a) circuit diagram; (b) logic symbol;
(c) truth table:

    S R | Q(n+1)
    0 0 | Qn
    0 1 | 0
    1 0 | 1
    1 1 | (avoided)

Latches
It is time to look at some simple sequential circuits that can remember a single bit value. We
discuss latches in this section. Latches are level-sensitive devices in that the device responds to the
input signal levels (high or low). In contrast, flip-flops are edge-triggered. That is, the output changes
only at either the rising or falling edge. We look at flip-flops in the next section.
SR Latch
The SR latch is the simplest of the sequential circuits that we consider. It requires just two NOR
gates. The feedback in this latch is a simple connection from the output of one NOR gate to the
input of the other NOR gate as shown in Figure 2.27a. The logic symbol for the SR latch is shown
in Figure 2.27b.
A simplified truth table for the SR latch is shown in Figure 2.27c. The outputs of the two
NOR gates are labeled $Q$ and $\overline{Q}$ because these two outputs should be complementary in normal
operating mode. We use the notation $Q_n$ to represent the current value (i.e., current state) and
$Q_{n+1}$ to represent the next value (i.e., next state).
Let us analyze the truth table. First consider the two straightforward cases. When S = 0 and
R = 1, we can see that independent of the current state, output $Q$ is forced to be 0 as R is 1. Thus,
the two inputs to the upper NOR gate are 0. This leads $\overline{Q}$ to be 1. This is a stable state. That is, $Q$
and $\overline{Q}$ can stay at 0 and 1, respectively. You can verify that when S = 1 and R = 0, another stable
state, $Q = 1$ and $\overline{Q} = 0$, results.
When both S and R are zero, the next output depends on the current output. Assume that the
current output is $Q = 1$ and $\overline{Q} = 0$. Thus, when you change inputs from S = 1 and R = 0 to S = R = 0,
the next state $Q_{n+1}$ remains the same as the current state $Q_n$. Now assume that the current state
is $Q = 0$ and $\overline{Q} = 1$. It is straightforward to verify that changing inputs from S = 0 and R = 1 to
S = R = 0 leaves the output unchanged. We have summarized this behavior by placing $Q_n$ as the
output for S = R = 0 in the first row of Figure 2.27c.
What happens when both S and R are 1? As long as these two inputs are held high, both
outputs are forced to 0. We struck this state from the truth table to indicate that this input
combination is undesirable. To see why this is the case, consider what happens when the S and R
inputs are changed from S = R = 1 to S = R = 0. Only in theory can we assume that both
inputs change simultaneously. In practice, there is always some finite time difference between the
two signal changes. If the S input goes low earlier than the R signal, the sequence of input changes
is SR = 11 → 01 → 00. Because of the intermediate state SR = 01, the output will be $Q = 0$ and
$\overline{Q} = 1$.
If, on the other hand, the R signal goes low before the S signal does, the sequence of input
changes is SR = 11 → 10 → 00. Because the transition goes through the SR = 10 intermediate
state, the output will be $Q = 1$ and $\overline{Q} = 0$. Thus, when the input changes from 11 to 00, the output
is indeterminate. This is the reason we want to avoid this input combination.

Figure 2.28 Clocked SR latch: (a) circuit diagram; (b) logic symbol.
The inputs S and R stand for "Set" and "Reset," respectively. When the set input is high (and
reset is low), Q is set (i.e., Q = 1). On the other hand, if set is 0 and reset is 1, Q is reset or cleared
(i.e., Q = 0).
From this discussion, it is clear that this latch is level sensitive. The outputs respond to changes
in input levels. This is true for all the latches.
We notice that this simple latch has the capability to store a bit. To write 1 into this latch,
set SR = 10; to write 0, use SR = 01. To retain a stored bit, keep both S and R inputs at 0. In
summary, we can write 0 or 1 and retain it as long as there is power to the circuit.
This is the basic 1-bit cell that static RAMs use. Once we have the design to store a single bit, we
can replicate this circuit to store wider data as well as multiple words. We look at memory design
issues in the next chapter.
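The latch behavior described above can be explored with a small Python model of the two cross-coupled NOR gates. The function names are hypothetical. The sketch is idealized: it assumes both gates have identical delays and switch in lockstep. Under that assumption, releasing S = R = 1 to S = R = 0 never settles, because neither gate wins the race. Real gates have unequal delays, so one of the two stable states results, and which one cannot be predicted.

```python
def nor(x, y):
    return 1 - (x | y)


def sr_latch(s, r, q, qbar, max_steps=8):
    """Let a NOR-gate SR latch settle from its current outputs (q, qbar).

    Idealized model: both gates switch in lockstep each step, with
    Q = NOR(R, Qbar) and Qbar = NOR(S, Q). Returns the stable (q, qbar),
    or None if the outputs are still changing after max_steps.
    """
    for _ in range(max_steps):
        nq, nqbar = nor(r, qbar), nor(s, q)
        if (nq, nqbar) == (q, qbar):
            return q, qbar               # stable state reached
        q, qbar = nq, nqbar
    return None                          # oscillating: no stable state in this model


if __name__ == "__main__":
    state = (0, 1)                       # start with Q = 0
    state = sr_latch(1, 0, *state)       # SR = 10 writes 1
    state = sr_latch(0, 0, *state)       # SR = 00 holds it
    print(state)                         # (1, 0)
    both_high = sr_latch(1, 1, *state)   # SR = 11 forces both outputs to 0
    print(both_high)                     # (0, 0)
    print(sr_latch(0, 0, *both_high))    # None: 11 -> 00 has no winner here
```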
Clocked SR Latch

A basic problem with the SR latch is that the output follows the changes in the input. If we want
to make the output respond to changes in the input at specific instants in order to synchronize with
the rest of the system, we have to modify the circuit as shown in Figure 2.28a. The main change is
that a clock input is used to gate the S and R inputs. These inputs are passed on to the original SR
latch only when the clock signal is high. The inputs have no effect on the output when the clock
signal is low. When the clock signal is high, the circuit implements the truth table of the SR latch
given in Figure 2.27c. This latch is level sensitive as well. As long as the clock signal is high, the
output responds to the SR inputs.

Figure 2.29 D latch uses an inverter to avoid the SR = 11 input combination: (a) circuit diagram;
(b) logic symbol.

Figure 2.30 Logic symbol notation for latches and flip-flops: (a) high level-sensitive latch; (b) low
level-sensitive latch; (c) positive edge-triggered flip-flop; (d) negative edge-triggered flip-flop.

D Latch

A problem with both versions of SR latches is that we have to avoid the SR = 11 input combination.
This problem is solved by the D latch shown in Figure 2.29a. We use a single inverter so that the
S and R inputs of the clocked SR latch always receive complementary values. To retain the value, we
maintain the clock input at 0. The logic symbol and the truth table for the D latch clearly show
that it can store a single bit.
Storing a bit in the D-latch is straightforward. All we have to do is feed the data bit to the D
input and apply a clock pulse to store the bit. Once stored, the latch retains the bit as long as the
clock input is zero. This simple circuit is our first 1-bit memory. In the next chapter, we show how
we can use this basic building block to design larger memories.
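A Python sketch of the D latch repeats the idealized NOR-gate model from the SR latch and adds the clock gating and the inverter. As a result, S and R are never both 1. It assumes the same lockstep gate updates as before. The names are hypothetical.

```python
def nor(x, y):
    return 1 - (x | y)


def d_latch(d, clk, q, qbar, max_steps=8):
    """D latch: a clocked SR latch whose R input is the complement of D.

    The clock gates both inputs, so with clk = 0 we get S = R = 0 and the
    stored bit is held. Returns the settled (q, qbar).
    """
    s = d & clk                          # S = D AND clock
    r = (1 - d) & clk                    # R = (NOT D) AND clock; never both 1
    for _ in range(max_steps):           # same lockstep NOR-gate model
        nq, nqbar = nor(r, qbar), nor(s, q)
        if (nq, nqbar) == (q, qbar):
            return q, qbar
        q, qbar = nq, nqbar
    raise RuntimeError("latch did not settle")


if __name__ == "__main__":
    state = (0, 1)
    state = d_latch(1, 1, *state)        # clock high: store D = 1
    state = d_latch(0, 0, *state)        # clock low: D is ignored
    print(state)                         # (1, 0)
    print(d_latch(0, 1, *state))         # clock high: store D = 0 -> (0, 1)
```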

Flip-Flops
We have noted that flip-flops are edge-triggered devices whereas latches are level sensitive. In the
logic symbol, we use an arrowhead on the clock input to indicate a positive edge-triggered flip-flop
as shown in Figure 2.30c. The absence of this arrowhead indicates a high level-sensitive latch (see
Figure 2.30a). We add a bubble in front of the clock input to indicate a negative edge-triggered
flip-flop (Figure 2.30d) or a low level-sensitive latch (Figure 2.30b).
As is obvious from the bubble notation, we can convert a high level-sensitive latch to a low
level-sensitive one by feeding the clock signal through an inverter. Recall that the bubble represents
an inverter (see page 17). Similarly, we can invert the clock signal to change a negative
edge-triggered flip-flop to a positive edge-triggered one.

Figure 2.31 Truth table and logic symbol of the JK flip-flop: (a) truth table; (b) logic symbol. The
logic symbol is for a negative edge-triggered flip-flop. For a positive edge-triggered flip-flop,
delete the bubble on the clock input.

    J K | Q(n+1)
    0 0 | Qn
    0 1 | 0
    1 0 | 1
    1 1 | complement of Qn (toggle)

Figure 2.32 A 4-bit shift register using JK flip-flops (serial in, parallel out).

In this section, we look at JK flip-flops. The truth table and logic symbol of this flip-flop are
shown in Figure 2.31. Unlike the SR latch, the JK flip-flop allows all four input combinations.
When JK = 11, the output toggles. This characteristic is used to build counters. Next we show
a couple of example sequential circuits that use JK flip-flops.
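The JK truth table can be expressed by the standard characteristic equation $Q_{n+1} = J\overline{Q_n} + \overline{K}Q_n$. The Python sketch below evaluates it at each active clock edge. The function name is hypothetical.

```python
def jk_next(j, k, q):
    """Next state of a JK flip-flop: Q(n+1) = J·(not Q(n)) + (not K)·Q(n)."""
    return (j & (1 - q)) | ((1 - k) & q)


if __name__ == "__main__":
    names = {(0, 0): "hold", (0, 1): "reset", (1, 0): "set", (1, 1): "toggle"}
    for (j, k), name in names.items():
        # next state for current Q = 0 and Q = 1
        print(f"JK = {j}{k} ({name}):", [jk_next(j, k, q) for q in (0, 1)])
```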
Shift Registers

Shift registers, as the name suggests, shift data left or right with each clock pulse. Designing a
shift register is relatively straightforward as shown in Figure 2.32. This shift register, built with
positive edge-triggered JK flip-flops, shifts data to the right. For the first JK flip-flop, we need an
inverter so that the K input is the complement of the data coming in ("Serial in" input). The data
out, taken from the Q output of the rightmost JK flip-flop, is a copy of the input serial signal except
that this signal is delayed by four clock periods. This is one of the uses of shift registers.

Figure 2.33 A binary ripple counter implementation using negative edge-triggered JK flip-flops:
(a) circuit diagram; (b) timing diagram.

We can also use a shift register for serial-to-parallel conversion. For example, a serial signal,
given as input to the shift register in Figure 2.32, produces a parallel 4-bit output (taken from the
four Q outputs of the JK flip-flops) as shown in Figure 2.32. Even though we have not shown it
here, we can design a shift register that accepts input in parallel (i.e., parallel load) as well as serial
form. Shift registers are also useful in implementing logical bit shift operations in the ALU of a
processor.
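The following Python sketch simulates the right-shifting register of Figure 2.32 at the logic level. All flip-flops are clocked together. Stage 0 gets J = serial input and K = its complement (the inverter). Each later stage gets J = Q and K = $\overline{Q}$ of its left neighbour. Each model step is one clock edge, and every stage samples its inputs before any output changes. The names and the list layout ([Q0, Q1, Q2, Q3], leftmost stage first) are our own choices.

```python
def jk_next(j, k, q):
    return (j & (1 - q)) | ((1 - k) & q)


def shift_register(serial_bits, width=4):
    """Right-shifting register of JK flip-flops, all clocked together.

    Every edge copies the value on the left into each stage.
    Returns [Q0, ..., Q(width-1)] after each clock edge.
    """
    q = [0] * width
    history = []
    for bit in serial_bits:
        left = [bit] + q[:-1]                     # values sampled just before the edge
        q = [jk_next(x, 1 - x, qi) for x, qi in zip(left, q)]   # J = x, K = not x
        history.append(q[:])
    return history


if __name__ == "__main__":
    bits_in = [1, 0, 1, 1, 0, 0, 0, 0]
    hist = shift_register(bits_in)
    print("serial out:", [s[-1] for s in hist])    # [0, 0, 0, 1, 0, 1, 1, 0]
    print("parallel out after 4 clocks:", hist[3])  # [1, 1, 0, 1]
```

The first input bit is sampled at the first clock edge and appears at the serial output after the fourth edge. At that point, hist[3] holds the first four bits in parallel, with the earliest bit in the rightmost stage.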
Counters
A counter is another example of a sequential circuit that is often used in digital circuits. To see
how we can build a counter, let us consider the simplest of all counters: the binary counter. A
binary counter with B bits can count from 0 to $2^B - 1$. For example, a 3-bit binary counter can
count from 0 to 7. After the count value reaches 7, the next count wraps around to zero. Such a
counter is called a modulo-8 counter.
We know that a modulo-8 counter requires 3 bits to represent the count value. In general, a
modulo-$2^B$ counter requires B bits (i.e., $\log_2 2^B$ bits). To develop our intuition, it is helpful to
look at the values 0 through 7, written in binary form in that sequence. If you look at the
rightmost bit, you will notice that it changes with every count. The middle bit changes whenever
the rightmost bit changes from 1 to 0. The leftmost bit changes whenever the middle bit changes
from 1 to 0. These observations can be generalized to counters that use more bits. There is a simple
rule that governs the counter behavior: a bit changes (flips) its value whenever its immediately
preceding right bit goes from 1 to 0. This observation gives the necessary clue to design our
counter. Suppose we have a negative edge-triggered JK flip-flop. We know that this flip-flop
changes its output with every negative edge on the clock input, provided we hold both J and K
inputs high. Well, that is the final design of our 3-bit counter as shown in Figure 2.33.

Figure 2.34 A synchronous modulo-8 counter.
We operate the JK flip-flops in the "toggle" mode with JK = 11. The Q output of one flip-flop
is connected as the clock input of the next flip-flop. The input clock, which drives our counter,
is applied to FF0. When we write the counter output as $Q_2Q_1Q_0$, the count value represents
the number of negative edges in the clock signal. For example, the dotted line in Figure 2.33b
represents $Q_2Q_1Q_0$ = 011. This value matches the number of falling edges to the left of the dotted
line in the input clock.
Counters are also useful in generating clocks with different frequencies by dividing the input
clock. For example, the frequency of the clock signal at the $Q_0$ output is half that of the input clock.
Similarly, the frequencies of the signals at $Q_1$ and $Q_2$ are one-fourth and one-eighth of the counter
input clock frequency.
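The following Python sketch simulates the ripple counter edge by edge. Every flip-flop is a negative edge-triggered toggle stage (JK = 11). FF0 is clocked by the input clock, and each later stage is clocked by the Q output to its right, so the toggle ripples along only while stages fall from 1 to 0. This is a logic-level model with hypothetical names; it does not model propagation delay.

```python
def ripple_counter(n_edges, n_bits=3):
    """Count value Q(n_bits-1)...Q0 after each falling edge of the input clock."""
    q = [0] * n_bits                  # q[0] is Q0 (rightmost bit)
    counts = []
    for _ in range(n_edges):
        i = 0
        while i < n_bits:
            q[i] ^= 1                 # this stage saw a falling edge: toggle
            if q[i] == 1:             # 0 -> 1 is a rising edge: the ripple stops here
                break
            i += 1                    # 1 -> 0 is a falling edge for the next stage
        counts.append(sum(bit << k for k, bit in enumerate(q)))
    return counts


if __name__ == "__main__":
    counts = ripple_counter(10)
    print(counts)                     # [1, 2, 3, 4, 5, 6, 7, 0, 1, 2]
    print([c & 1 for c in counts])    # Q0 flips on every input edge
```

Q0 completes one full cycle every two input edges, so its frequency is half the input clock frequency. In the same way, Q1 and Q2 divide by four and eight.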
The counter design shown in Figure 2.33 is called a ripple counter as the count bits ripple from
the rightmost to the leftmost bit (i.e., in our example, from FF0 to FF2). A major problem with
ripple counters is that they take a long time to propagate the count value. We have had a similar
discussion about ripple carry adders on page 28.
How can we speed up the operation of the ripple binary counters? We apply the same trick
that we used to derive the carry lookahead adder on page 28. We can design a counter in which
all output bits change more or less at the same time. These are called synchronous counters. We
can obtain a synchronous counter by manipulating the clock input to each flip-flop. We observe
from the timing diagram in Figure 2.33b that a clock input should be applied to a flip-flop if all the
previous bits are 1. For example, a clock input should be applied to FF1 whenever the output of
FF0 is 1. Similarly, a clock input for FF2 should be applied when the outputs of FF0 and FF1 are
both 1. A synchronous counter based on this observation is shown in Figure 2.34.
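The synchronous design can be sketched in Python as well. It follows the rule in the text: the clock edge reaches a flip-flop only when all lower-order bits are 1. Every flip-flop is in toggle mode. All gating decisions are taken from the state before the edge, so the selected bits flip together rather than one after another. This is a logic-level model only, with hypothetical names and no gate timing.

```python
def sync_counter_step(q):
    """Apply one clock edge to a synchronous binary counter; q[0] is Q0 (LSB)."""
    clock_enable = 1                     # FF0 always receives the clock
    nxt = []
    for bit in q:
        nxt.append(bit ^ clock_enable)   # a clocked toggle flip-flop flips; others hold
        clock_enable &= bit              # FF(i+1) is clocked only if Q0..Q(i) are all 1
    return nxt


def count_sequence(n_edges, n_bits=3):
    """Count values after each clock edge, starting from 0."""
    q, values = [0] * n_bits, []
    for _ in range(n_edges):
        q = sync_counter_step(q)
        values.append(sum(bit << k for k, bit in enumerate(q)))
    return values


if __name__ == "__main__":
    print(count_sequence(9))   # [1, 2, 3, 4, 5, 6, 7, 0, 1]
```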
Sequential circuit design is relatively more complex than designing a combinational circuit.
A detailed discussion of this topic is outside the scope of this book. If you are interested in this
topic, you can refer to Fundamentals of Computer Organization and Design by Dandamudi for
more details.


Summary
A computer system consists of three main components: processor, memory, and I/O. These three
components are glued together by a system bus. The system bus consists of three buses: data
bus, address bus, and control bus. The address bus is used to carry the address information. The
width of this bus determines the memory address space of the processor. The data bus is used
for transferring data between these components (e.g., from memory to processor). The data bus
width determines the size of the data moved in one transfer cycle. The control bus provides
several control signals to facilitate a variety of activities on the system bus. These activities include
memory read, I/O write, and so on.
The remainder of the chapter looked at the digital logic circuits in detail. We introduced
several simple logic gates such as AND, OR, NOT gates as well as NAND, NOR, and XOR gates.
Although the first three gates are considered as the basic gates, we often find that the other three
gates are useful in practice.
We described three ways of representing logical functions: truth table, logical expression, and
graphical form. The truth table method is cumbersome for logical expressions with more than
a few variables. Logical expression representation is useful to derive simplified expressions by
applying Boolean identities. The graphical form is useful to implement logical circuits.
Logical expressions can be written in one of two basic forms: sum-of-products or product-of-sums.
From either of these expressions, it is straightforward to obtain logic circuit implementations.
However, such circuits are not the best designs as simplifying logical expressions can
minimize the component count. Several methods are available to simplify logical expressions. We
have discussed two of them: the algebraic and Karnaugh map methods.
Combinational circuits provide a higher level of abstraction than the basic logic gates. Higher-level
logical functionality provided by these circuits helps in the design of complex digital circuits.
We have discussed several commonly used combinational circuits including multiplexers,
demultiplexers, decoders, comparators, adders, and ALUs.
We also presented details about two types of programmable logic devices: PLAs and PALs.
These devices can also be used to implement any logical function. Both these devices use internal fuses that can be selectively blown to implement a given logical function. PALs reduce the
complexity of the device by using fewer fuses than PLAs. As a result, most commercial implementations of programmable logic devices are PALs.
Our discussion of ALU design suggests that complex digital circuit design can be simplified
by using the higher level of abstraction provided by the combinational circuits.
In combinational circuits, the output depends only on the current inputs. In contrast, the output of
a sequential circuit depends on both the current inputs and the past history. In other words,
sequential circuits are state-dependent whereas combinational circuits are stateless.
Design of a sequential circuit is relatively more complex than designing a combinational circuit. In sequential circuits, we need a notion of time. We introduced the clock signal to provide this
timing information. Clocks also facilitate synchronization of actions in a large, complex digital
system that has both combinational and sequential circuits.
We discussed two basic types of circuits: latches and flip-flops. The key difference between
these two devices is that latches are level sensitive whereas flip-flops are edge-triggered. These
devices can be used to store a single bit of data. Thus, they provide the basic capability to design
memories. We discuss memory design in the next chapter.


We presented some example sequential circuits—shift registers and counters—that are commonly used in digital circuits. There are several other sequential circuit building blocks that are
commercially available.

3
Memory Organization
In the last chapter, we saw how flip-flops and latches can be used to store a bit. This chapter
builds on this foundation and explains how we can use these basic devices to build larger memory
blocks and modules. We start off with an overview of memory operations and the types of memory.
The following section discusses how larger memories can be built using memory chips. The design
process is fairly intuitive. The basic technique involves using a two-dimensional array of memory
chips. A characteristic of these designs is the use of chip select. Chip select input can be used to
select or deselect a chip or a memory module. Chip select allows us to connect multiple devices
to the system bus. Appropriate chip select signal generation facilitates communication among the
entities connected to the system bus.
Chip select logic is also useful in mapping memory modules to memory address space. We
present details about two ways of mapping a memory module to the address space. Before ending
the chapter, we describe how multibyte data are stored in memory and explain the reasons why
data alignment leads to improved application performance. We end the chapter with a summary.

Introduction
The memory of a computer system consists of tiny electronic switches, with each switch set in
one of two states: open or closed. It is, however, more convenient to think of these states as 0
and 1 rather than open and closed. A single such switch can be used to represent two (i.e., binary)
numbers: a zero and a one. Thus, each switch can represent a binary digit or bit, as it is known.
The memory unit consists of millions of such bits. In order to make memory more manageable,
bits are organized into groups of eight bits called bytes. Memory can then be viewed as consisting
of an ordered sequence of bytes. Each byte in this memory can be identified by its sequence
number starting with 0, as shown in Figure 3.1. This is referred to as the memory address of the
byte. Such memory is called byte addressable memory.
The amount of memory that a processor can address depends on the address bus width. Typically, 32-bit processors support 32-bit addresses. Thus, these processors can address up to 4 GB
(2^32 bytes) of main memory as shown in Figure 3.1. This number is referred to as the memory address space. The actual memory in a system, however, is always less than or equal to the memory
address space. The amount of memory in a system is determined by how much of this memory
address space is populated with memory chips.

Figure 3.1 Logical view of the system memory. (Byte addresses run from 0, or 00000000 in hex, up to 2^32 - 1, or FFFFFFFF in hex.)

This chapter gives details about memory organization. In the next section we give details about
the two basic memory operations—read and write. Memory can be broadly divided into read-only
and read/write types. Details about the types of memory are given next. After giving these details,
we look at the memory design issues. Towards the end of the chapter, we describe two ways of
storing multibyte data and the reasons why data alignment results in improved performance.

Basic Memory Operations
The memory unit supports two fundamental operations: read and write. The read operation retrieves
previously stored data, and the write operation stores a value in memory. Both of these operations
require an address in memory from which to read a value or to which to write a value. In addition,
the write operation requires specification of the data to be written. The block diagram of the
memory unit is shown in Figure 3.2. The address and data of the memory unit are connected to
the address and data buses of the system bus, respectively. The read and write signals come from
the control bus.
Two metrics are used to characterize memory. Access time refers to the amount of time required
by the memory to retrieve the data at the addressed location. The other metric is the memory cycle
time, which refers to the minimum time between successive memory operations. Memory transfer
rates can be measured by the bandwidth metric, which specifies the number of bytes transferred
per second.
The read operation is nondestructive in the sense that one can read a location of the memory
as many times as one wishes without destroying the contents of that location. The write operation,
on the other hand, is destructive, as writing a value into a location destroys the old contents of that
memory location.


Figure 3.2 Block diagram of the system memory. (The address and data lines connect to the system bus; the read and write signals come from the control bus.)

Steps in a typical read cycle
1. Place the address of the location to be read on the address bus;
2. Activate the memory read control signal on the control bus;
3. Wait for the memory to retrieve the data from the addressed memory location and place it
on the data bus;
4. Read the data from the data bus;
5. Drop the memory read control signal to terminate the read cycle.
For example, a simple Pentium read cycle takes three clock cycles. During the first clock
cycle, steps 1 and 2 are performed. The processor waits until the end of the second clock and
reads the data and drops the read control signal. If the memory is slower (and therefore cannot
supply data within the specified time), the memory unit indicates its inability to the processor and
the processor waits longer for the memory to supply data by inserting wait cycles. Note that each
wait cycle introduces a waiting period equal to one system clock period and thus slows down the
system operation.
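The timing rule above reduces to simple arithmetic: a cycle lasts (base clocks + wait states) clock periods. The Python sketch below assumes the three-clock simple cycle quoted in the text and a hypothetical 100 MHz bus clock; the function name `bus_cycle_ns` is illustrative and does not come from any library.

```python
def bus_cycle_ns(clock_mhz, base_clocks=3, wait_states=0):
    """Length of one simple memory read or write cycle, in nanoseconds.

    base_clocks defaults to the three-clock cycle quoted in the text;
    every wait state stretches the cycle by one full clock period.
    """
    clock_period_ns = 1000.0 / clock_mhz   # e.g., 100 MHz -> 10 ns
    return (base_clocks + wait_states) * clock_period_ns


# Hypothetical 100 MHz bus clock: a slow memory adds wait states.
for ws in range(3):
    print(f"{ws} wait state(s): {bus_cycle_ns(100, wait_states=ws):.0f} ns")
```

Each wait state adds a full clock period, so a memory that needs two wait states makes every access about 67% longer in this example.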
Steps in a typical write cycle
1. Place the address of the location to be written on the address bus;
2. Place the data to be written on the data bus;
3. Activate the memory write control signal on the control bus;
4. Wait for the memory to store the data at the addressed location;
5. Drop the memory write signal to terminate the write cycle.
As with the read cycle, Pentium requires three clock cycles to perform a simple write operation.
During the first clock cycle, steps 1 and 3 are done. Step 2 is performed during the second clock
cycle. The processor gives memory time until the end of the second clock and drops the memory
write signal. If the memory cannot write data at the maximum processor rate, wait cycles can be
introduced to extend the write cycle.


Types of Memory
The memory unit can be implemented using a variety of memory chips—different speeds, different
manufacturing technologies, and different sizes. The two basic types of memory are the read-only
memory and read/write memory.
A basic property of these memory systems is that they are random access memories: accessing
any memory location (for reading or writing) takes the same time. Contrast this with data stored
on a magnetic tape, where the access time depends on the location of the data.
Volatility is another important property of a memory system. A volatile memory requires
power to retain its contents. A nonvolatile memory can retain its values even in the absence of
power.
Read-Only Memories Read-only memory (ROM) allows only read operations to be performed.
As the name suggests, we cannot write into this memory. The main advantage of ROM is that it is
nonvolatile. Most ROM is factory programmed and cannot be altered. The term programming in
this context refers to writing values into a ROM. This type of ROM is cheaper to manufacture in
large quantities than other types of ROM. The program that controls the standard input and output
functions (called BIOS), for instance, is kept in ROM. Current systems use flash memory rather
than ROM for this purpose (see our discussion later).
Other types include programmable ROM (PROM) and erasable PROM (EPROM). PROM is
useful in situations where the contents of ROM are not yet fixed. For instance, when the program
is still in the development stage, it is convenient for the designer to be able to program the ROM
locally rather than at the time of manufacture.
In PROM, a fuse is associated with each bit cell. If the fuse is on, the bit cell supplies a 1
when read. The fuse has to be burned to read a 0 from that bit cell. When PROM is manufactured,
its contents are all set to 1. To program a PROM, selective fuses are burned (to introduce 0s) by
sending high current. This is the writing process and is not reversible (i.e., a burned fuse cannot be
restored). EPROM offers further flexibility during system prototyping. Contents of EPROM can
be erased by exposing them to ultraviolet light for a few minutes. Once erased, EPROM can be
reprogrammed.
Electrically erasable PROMs (EEPROMs) allow further flexibility. Exposing an EPROM to ultraviolet
light erases all of its contents. EEPROMs, on the other hand, allow the user to
selectively erase contents. Furthermore, erasing can be done in place; there is no need to place the chip
in a special ultraviolet chamber.
Flash memory is a special kind of EEPROM. One main difference between the EEPROM and
flash memory lies in how the memory contents are erased. The EEPROM is byte-erasable whereas
the flash memory is block-erasable. Thus, writing in the flash memory involves erasing a block
and rewriting it.
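To make the block-erase point concrete, here is a toy Python model of a block-erasable memory in which changing one byte means reading its block, erasing the block, and writing it back. The 8-byte block size and the erased value 0xFF (all 1s) are assumptions chosen for illustration; real flash blocks are much larger, and the class `FlashSketch` is hypothetical.

```python
ERASED = 0xFF      # assumption: an erased cell reads as all 1s
BLOCK_SIZE = 8     # hypothetical block size; real devices use far larger blocks


class FlashSketch:
    """Toy model of block-erasable memory: no in-place byte updates."""

    def __init__(self, num_blocks):
        self.cells = bytearray([ERASED] * (num_blocks * BLOCK_SIZE))
        self.erase_count = 0

    def erase_block(self, block):
        start = block * BLOCK_SIZE
        self.cells[start:start + BLOCK_SIZE] = bytes([ERASED] * BLOCK_SIZE)
        self.erase_count += 1

    def write_byte(self, addr, value):
        """Change one byte by copying its block, erasing, and rewriting."""
        block = addr // BLOCK_SIZE
        start = block * BLOCK_SIZE
        saved = bytearray(self.cells[start:start + BLOCK_SIZE])  # 1. read block
        saved[addr - start] = value                               # 2. modify copy
        self.erase_block(block)                                   # 3. erase block
        self.cells[start:start + BLOCK_SIZE] = saved              # 4. rewrite block


flash = FlashSketch(num_blocks=2)
flash.write_byte(3, 0x41)
flash.write_byte(4, 0x42)
print(flash.cells[:BLOCK_SIZE].hex(), flash.erase_count)
```

In this model, two single-byte updates cost two full block erasures, whereas an EEPROM could erase just the affected bytes.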
Current systems use flash memory for BIOS so that changing BIOS versions is fairly
straightforward (you just have to "flash" the new version). Flash memory is also becoming very
popular as removable media. SmartMedia, CompactFlash, and Sony's Memory Stick are all
examples of various forms of removable flash media.
Flash memory, however, is slower than the RAMs we discuss next. For example, flash memory
cycle time is about 80 ns whereas the corresponding value for RAMs is about 10 ns. Nevertheless,
since flash memories are nonvolatile, they are used in applications where this property is important.
Apart from BIOS, we see them in devices like digital cameras and video game systems.


Read/Write Memory Read/write memory is commonly referred to as random access memory
(RAM), even though ROM is also a random access memory. This terminology is so entrenched in
the literature that we follow it here with a cautionary note that RAM actually refers to RWM.
Read/write memory can be divided into static and dynamic categories. Static random access
memory (SRAM) retains the data, once written, without further manipulation so long as the source
of power holds its value. SRAM is typically used for implementing the processor registers and
cache memories.
The bulk of main memory in a typical computer system, however, consists of dynamic random
access memory (DRAM). DRAM is a complex memory device that uses a tiny capacitor to store a
bit. A charged capacitor represents a 1. Since capacitors slowly lose their charge due to leakage,
they must be periodically refreshed to replace the charges representing 1s. A typical refresh
period is about 64 ms. Reading from DRAM involves testing to see if the corresponding bit cells
are charged. Unfortunately, this test destroys the charges on the bit cells. Thus, DRAM is a
destructive read memory.
For proper operation, a read cycle is followed by a restore cycle. As a result, the DRAM cycle
time, the actual time necessary between accesses, is typically about twice the read access time,
which is the time necessary to retrieve a datum from the memory.
Several types of DRAM chips are available. We briefly describe some of the most popular types of
DRAM next.
FPM DRAMs Fast page mode (FPM) DRAMs are an improvement over the previous generation
DRAMs. FPM DRAMs exploit the fact that, most of the time, we access memory sequentially. To
see how this access pattern is exploited, we have to look at how the memory is
organized. Internally, the memory is organized as a matrix of bits. For example, a 32 Mb memory
could be organized as 8 K rows (i.e., 8192 since K = 1024) and 4 K columns. To access a bit,
we have to supply a row address and a column address. In the FPM DRAM, a page represents
part of the memory with the same row address. To access a page, we specify the row address only
once; we can read the bits in the specified page by changing the column addresses. Since the row
address is not changing, we save on the memory cycle time.
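The row/column split can be sketched in Python for the 32 Mb example (8 K rows by 4 K columns). The sketch assumes that the high-order 13 bits of a bit address form the row address and the low-order 12 bits form the column address. This is one plausible mapping, not a documented one for any particular chip. The sketch then counts how often a new row address must be supplied.

```python
ROWS, COLS = 8 * 1024, 4 * 1024      # 32 Mb organized as 8 K rows x 4 K columns
COL_BITS = COLS.bit_length() - 1     # 12 column-address bits


def split(bit_addr):
    """Split a bit address into (row, column).

    Assumption: the high-order bits form the row address and the
    low-order 12 bits form the column address.
    """
    return bit_addr >> COL_BITS, bit_addr & (COLS - 1)


def row_activations(addresses):
    """Count how many times a new row address has to be supplied.

    In fast page mode, accesses that stay in the currently open row
    (page) only need a new column address.
    """
    count, open_row = 0, None
    for addr in addresses:
        row, _ = split(addr)
        if row != open_row:
            count += 1
            open_row = row
    return count


sequential = range(1024)                      # 1024 bits in one row
scattered = [i * COLS for i in range(1024)]   # same count, each in a different row
print(row_activations(sequential), row_activations(scattered))
```

The sequential pattern needs the row address once. The scattered pattern needs it for every access, so it gains nothing from page mode.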
EDO DRAMs Extended Data Output (EDO) DRAM is another type of FPM DRAM. It also exploits the fact that we access memory sequentially. However, it uses pipelining to speed up memory
access. That is, it initiates the next request before the previous memory access is completed. A
characteristic of pipelining inherited by EDO DRAMs is that individual memory requests
are not sped up. However, by overlapping multiple memory access requests, EDO DRAM improves the
memory bandwidth.
SDRAMs Both FPM DRAMs and EDO DRAMs are asynchronous in the sense that their data
output is not synchronized to a clock. The synchronous DRAM (SDRAM) uses an external clock
to synchronize the data output. This synchronization reduces delays and thereby improves the
memory performance. The SDRAM memories are used in systems that require memory satisfying
the PC100/PC133 specification. SDRAMs are dominant in the low-end PC market and are inexpensive.
DDR SDRAMs The SDRAM memories are also called single data rate (SDR) SDRAMs as they
supply data once per memory cycle. However, with increasing processor speeds, the processor
bus (also called front-side bus or FSB) frequency is also going up. For example, PCs now have a
533 MHz FSB that supports a transfer rate of about 4.2 GB/s. To satisfy this transfer rate, SDRAMs
have been improved to provide data at both rising and falling edges of the clock. This effectively
doubles the memory bandwidth and satisfies the high data transfer rates of faster processors.


Figure 3.3 Tristate buffer: (a) logic symbol; (b) it acts as an open circuit when the enable input is
inactive (E = 0); (c) it acts as a closed circuit when the enable input is active (E = 1); (d) truth table
(X = don't care input, and Z = high impedance state).

RDRAMs Rambus DRAM (RDRAM) takes a completely different approach to increase the
memory bandwidth. RDRAM is a technology developed and licensed by Rambus. It is a memory subsystem
that consists of the RAM, RAM controller, and a high-speed bus called the Rambus channel. Like
the DDR DRAM, it also performs two transfers per cycle. In contrast to the 8-byte wide data bus
of DRAMs, the Rambus channel is a 2-byte data bus. However, by using multiple channels, we can
increase the bandwidth of RDRAMs. For example, a dual-channel RDRAM operating at 533 MHz
provides a bandwidth of 533 MHz x 2 transfers per cycle x 4 bytes (two 2-byte channels) = 4.2 GB/s, sufficient for the 533 MHz FSB systems.
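The bandwidth figures above follow from clock rate x transfers per clock x bytes per transfer. The small Python check below uses decimal megabytes. It treats 533 MHz as the rate at which the FSB moves 8-byte words, matching the 64-bit data bus mentioned later in this chapter. The helper name is hypothetical.

```python
def bandwidth_mb_s(clock_mhz, transfers_per_clock, bytes_per_transfer):
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bytes_per_transfer


fsb = bandwidth_mb_s(533, 1, 8)        # 533 MHz FSB moving 8 bytes per transfer
rdram = bandwidth_mb_s(533, 2, 2 * 2)  # dual channel: 2 x 2-byte channels, 2 transfers/clock
print(fsb, rdram)                      # 4264 4264, i.e., about 4.2 GB/s each
```

The same formula shows why DDR helps. Moving from one to two transfers per clock doubles the result without changing the clock or the bus width.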
From this brief discussion it should be clear that DDR SDRAMs and RDRAMs compete with
each other in the high-end market. The race between these two DRAM technologies continues as
Intel boosts its FSB to 800 MHz.

Building a Memory Block
In the last chapter, we discussed several basic building blocks such as flip-flops, multiplexers, and
decoders. For example, flip-flops provide the basic capability to store a bit of data. These devices
can be replicated to build larger memory units. For example, we can place 16 flip-flops together
in a row to store a 16-bit word. All 16 flip-flops would have their clock inputs tied together to
form a single common clock to write a 16-bit word. We can place several such rows in a memory
chip to store multiple words of data. In this organization, each row supplies a word. To build even
larger memories, we can use multiple chips such that all their data lines are connected to the data
bus. This implies that we need to find a way to connect these outputs together. Tristate buffers are
used for this purpose.
Tristate Buffers

The logic circuits we discussed in the last chapter have two possible states: 0 or 1. The
devices we discuss here are called tristate buffers as they can be in three states: 0, 1, or Z state. A
tristate buffer output can be in state 0 or 1 just as with a normal logic gate. In addition, the output
can also be in a high impedance (Z) state, in which the output floats. Thus, even though the output
is physically connected to the bus, it behaves as though it is electrically and logically disconnected
from the bus.
Tristate buffers use a separate control signal to float the output independent of the data input
(see Figure 3.3a). This particular feature makes them suitable for bus connections. Figure 3.3a
shows the logic symbol for a tristate buffer. When the enable input (E) is low, the buffer acts as an
open circuit (i.e., output is in the high impedance state Z) as shown in Figure 3.3b; otherwise, it
acts as a closed circuit (Figure 3.3c). The enable input must be high in order to pass the input data
to output, as shown in the truth table (see Figure 3.3d).
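The following Python sketch models the truth table of Figure 3.3d. It also shows why tristate outputs can share a bus line: disabled buffers contribute nothing, so the enabled one alone sets the line. Representing Z as `None` is a modeling choice made here, not part of the text. So is treating disagreeing enabled drivers as an error.

```python
Z = None  # high-impedance (floating) output


def tristate(data_in, enable):
    """Tristate buffer: pass the input when E = 1, float the output when E = 0."""
    return data_in if enable else Z


def resolve_bus(outputs):
    """Value seen on a shared bus line driven by several tristate outputs.

    Returns Z if nobody drives the line and raises ValueError if enabled
    drivers disagree (bus contention, which correct chip select logic avoids).
    """
    driven = [v for v in outputs if v is not Z]
    if not driven:
        return Z
    if any(v != driven[0] for v in driven):
        raise ValueError("bus contention: enabled drivers disagree")
    return driven[0]


# Three devices share one data line; only the device with E = 1 drives it.
print(resolve_bus([tristate(1, 0), tristate(0, 1), tristate(1, 0)]))
```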
Memory Design with D Flip-Flops
We begin our discussion with how one can build memories using the D flip-flops. Recall that we
use flip-flops for edge-triggered devices and latches for level-sensitive devices. The principle of
constructing memory out of D flip-flops is simple. We use a two-dimensional array of D flip-flops,
with each row storing a word. The number of rows is equal to the number of words the memory
should store. Thus, this organization uses "horizontal" expansion to increase the word width and
"vertical" expansion to increase the number of words.
In general, the number of columns and the number of rows are powers of two. We use the
notation M x N memory to represent a memory that can store M words, where each word is
N bits long.
Figure 3.4 shows a 4 x 3 memory built with 12 D flip-flops organized as a 4 x 3 array. Since all
flip-flops in a row store a word of data, each row of flip-flops has its clock inputs tied together
to form a single clock signal for that row. All flip-flops in a column receive input from the same
input data line. For example, the D inputs in the rightmost column are connected to the input data line D0.
This memory requires two address lines to select one of the four words. The two address lines
are decoded to select a specific row by using a 2-to-4 decoder. The low-active write signal (WR)
is gated through an AND gate as shown in Figure 3.4. Depending on the address, only one of the
four decoder output lines will be high, permitting the WR signal to clock the selected row to write
the 3-bit data present on lines D0 to D2. Note that the decoder along with the four AND gates
forms a demultiplexer that routes the WR signal to the row selected by the address lines A1 and
A0.
The design we have done so far allows us to write a 3-bit datum into the selected row. To
complete the design, we have to find a way to read data from this memory. As each bit of data is
supplied by one of the four D flip-flops in a column, we have to find a way to connect these four
outputs to a single data out line. A natural choice for the job is a 4-to-1 multiplexer. The MUX
selection inputs are connected to the address lines to place the appropriate data on the output lines D0
through D2. The final design is shown in Figure 3.4.
We need to pass the outputs of the multiplexers through tristate buffers as shown in Figure 3.4.
The enable input signal for these output tristate buffers is generated by ANDing the chip select
and read signals. Two inverters are used to provide low-active chip select (CS) and memory read
(RD) inputs to the memory block.
With the use of the tristate buffers, we can tie the corresponding data in and out signal lines together to satisfy the data bus connection requirements. Furthermore, we can completely disconnect
the outputs of this memory block by making CS high.
We can represent our design using the logic symbol shown in Figure 3.5. Our design uses
separate read and write signals. These two signals are part of the control bus (see Figure 2.1). It
is also possible to have a single line to serve as a read and write line. For example, a 0 on this
line can be interpreted as write and a 1 as read. Such signals are represented as the WR/RD line,
indicating low-active write and high-active read.
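A behavioral sketch of the 4 x 3 design in Python may help tie the pieces together. The 2-to-4 decoder gates WR to one row. The read path selects one word with the address lines before the tristate outputs. For readability the sketch uses active-high signals, whereas the figure uses low-active CS, WR, and RD. Gating writes with chip select is also an assumption of this model, and the class name is hypothetical.

```python
class Memory4x3:
    """Behavioral model of the 4 x 3 D flip-flop memory (Figure 3.4).

    Signals are active-high here (1 = asserted) for readability; the
    figure uses low-active CS, WR, and RD. Gating writes with chip
    select is an assumption of this sketch.
    """

    def __init__(self):
        self.rows = [[0, 0, 0] for _ in range(4)]   # 4 words of 3 flip-flops

    @staticmethod
    def decode(a1, a0):
        """2-to-4 decoder: exactly one output line is 1."""
        index = (a1 << 1) | a0
        return [1 if i == index else 0 for i in range(4)]

    def cycle(self, a1, a0, data=None, cs=1, wr=0, rd=0):
        """One memory cycle; data is a list of bits [D2, D1, D0]."""
        select = self.decode(a1, a0)
        if cs and wr:
            for r in range(4):
                if select[r]:            # decoder line AND WR clocks only this row
                    self.rows[r] = list(data)
        if cs and rd:
            # Each bit column goes through a 4-to-1 MUX driven by A1 A0,
            # then through a tristate buffer enabled by CS AND RD.
            return list(self.rows[(a1 << 1) | a0])
        return None                      # outputs float (high impedance)


mem = Memory4x3()
mem.cycle(1, 0, data=[1, 0, 1], wr=1)          # write 101 into word 2
print(mem.cycle(1, 0, rd=1), mem.cycle(1, 0, rd=1, cs=0))
```

The second read returns `None` because chip select is inactive. This corresponds to the floating outputs that let several such blocks share one data bus.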


Figure 3.4 A 4 x 3 memory design using D flip-flops.

Building Larger Memories
Now that we know how to build memory blocks using devices that can store a single bit, we move
on to building larger memory units using these memory blocks. We explain the design process
by using an example. Before discussing the design procedure, we briefly present details about
commercially available memory chips.


Figure 3.5 Block diagram representation of a 4 x 3 memory. (Inputs: address lines, write, read, and chip select; bidirectional data lines D0 to D2.)

Memory Chips

Several commercial memory chips are available to build larger memories. Here we look at two
example chips, an SRAM and a DRAM, from Micron Technology.
The SRAM we discuss is an 8-Mb chip that comes in three configurations: 512 K x 18,
256 K x 32, or 256 K x 36. Note that, in the first and last configurations, word length is not a
multiple of 8. These additional bits are useful for error detection/correction. These chips have an
access time of 3.5 ns. The 512 K x 18 chip requires 19 address lines, whereas the 256 K x 32/36
versions require 18 address lines.
An example DRAM (it is a synchronous DRAM) is the 256-Mb capacity chip that comes in
word lengths of 4, 8, or 16 bits. That is, this memory chip comes in three configurations: 64 M x 4,
32 M x 8, or 16 M x 16. The cycle time for this chip is about 7 ns.
In the days when data bus widths were small (8 or 16 bits), DRAM chips were available in 1-bit
widths. Current chips use a word width of more than 1 bit, as it becomes impractical to string together 64 1-bit
chips to get 64-bit word memories for processors such as the Pentium.
From the details of these two example memory chips, we see that the bit capacity of a memory
chip can be organized into several configurations. If we focus on the DRAM chip, for example,
what are the pros and cons of the various configurations? The advantage of wider memory chips
(i.e., chips with larger word size) is that we require fewer of them to build a larger memory. As
an example, consider building memory for your Pentium-based PC. Even though the Pentium is
a 32-bit processor, it uses a 64-bit wide data bus. Suppose that you want to build a 16 M x 64
memory. We can build this memory by using four 16 M x 16 chips, all in a single row. How do
we build such a memory using, for example, the 32 M x 8 version of the chip? Because our word
size is 64, we have to use 8 such chips in order to provide 64-bit wide data. That means we get
32 M x 64 memory as the minimum instead of the required 16 M x 64. The problem becomes
even more serious if we use the 64 M x 4 version of the chip. We have to use 16 such chips, and
we end up with a 64 M x 64 memory. This example illustrates the tradeoff between using "wider"
memories versus "deeper" memories.
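The wider-versus-deeper tradeoff can be checked with a few lines of Python. The sketch assumes the 256-Mb chip above and a 64-bit word. It computes how many chips one row needs and hence the smallest memory each configuration can provide. `minimum_memory` is a hypothetical helper.

```python
CHIP_BITS = 256 * 2**20          # capacity of the 256-Mb DRAM chip


def minimum_memory(chip_width, word_width=64):
    """Return (chips per row, words per chip) for one row of word_width bits."""
    depth = CHIP_BITS // chip_width            # words stored by each chip
    chips = -(-word_width // chip_width)       # ceiling division
    return chips, depth


for width in (16, 8, 4):
    chips, depth = minimum_memory(width)
    m = depth // 2**20
    print(f"{m} M x {width} chips: {chips} chips -> at least {m} M x 64")
```

Because every configuration holds the same 256 Mb, halving the chip width doubles both the chips per row and the minimum memory size.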
Larger Memory Design

Before proceeding with the design of a memory unit, we need to know if the memory address space
(MAS) supported by the processor is byte addressable or not. In a byte-addressable space, each
address identifies a byte. All popular processors, including the Pentium, PowerPC, SPARC, and MIPS,
support byte-addressable space. Therefore, in our design examples, we assume byte-addressable
space.
We now discuss how one can use memory chips, such as the ones discussed before, to build
system memory. The procedure is similar to the intuitive steps followed in the previous design
example.
First we have to decide on the configuration of the memory chip, assuming that we are using the
DRAM chip described before. As described in the last section, independent of the configuration,
the total bit capacity of a chip remains the same. That means the number of chips required remains
the same. For example, if we want to build a 64 M x 32 memory, we need eight chips. We can
use eight 64 M x 4 chips in a single row, eight 32 M x 8 chips in a 2 x 4 array, or eight 16 M x 16 chips in a 4 x 2 array.
Although we have several alternatives for this example, there may be situations where the choice
is limited. For example, if we are designing a 16 M x 32 memory, we have no choice but to use
the 16 M x 16 chips.
Once we have decided on the memory chip configuration, it is straightforward to determine the
number of chips and the organization of the memory unit. Let us assume that we are using D x W
chips to build an M x N memory. Of course, we want to make sure that D ≤ M and W ≤ N.

$$\text{Number of chips required} = \frac{M \times N}{D \times W}$$

$$\text{Number of rows} = \frac{M}{D}$$

$$\text{Number of columns} = \frac{N}{W}$$

The read and write lines of all memory chips should be connected to form a single read and write
signal. These signals are connected to the control bus memory read and write lines. For simplicity,
we omit these connections in our design diagrams.
Data bus connections are straightforward. Each chip in a row supplies a subset of data bits. In
our design, the right chip supplies D0 to D15, and the left chip supplies the remaining 16 data bits
(see Figure 3.6).
For each row, connect all chip select inputs as shown in Figure 3.6. Generating appropriate
chip select signals is the crucial part of the design process. To complete the design, partition the
address lines into three groups as shown in Figure 3.7.
The least significant Z address bits, where Z = log2(N/8), are not connected to the memory
unit. This is because each address going into the memory unit will select an N-bit value. Since
we are using a byte-addressable memory address space, we can leave out the Z least significant bits that
identify a byte out of N/8 bytes. In our example, N = 32, which gives us Z = 2. Therefore, the
address lines A0 and A1 are not connected to the memory unit.
The next Y address bits, where Y = log2(D), are connected to the address inputs of all the chips.
Since we are using 16 M chips, Y = 24. Thus, address lines A2 to A25 are connected to all the
chips as shown in Figure 3.6.
The remaining most significant address bits X are used to generate the chip select signals. This
group of address bits plays an important role in mapping the memory to a part of the memory address space. We discuss this mapping in detail in the next section. The design shown in Figure 3.6
uses address lines A26 and A27 to generate four chip select signals, one for each row of chips. We
are using a low-active 2-to-4 decoder to generate the CS signals.
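The chip-count formulas and the X/Y/Z address partition can be combined into one Python sketch. Like the text, it assumes power-of-two sizes and a byte-addressable space. It reproduces the Figure 3.6 example (64 M x 32 from 16 M x 16 chips). The helper names `design` and `decode_address` are illustrative only.

```python
def design(M, N, D, W):
    """Chip count and address partition for an M x N memory built from D x W chips.

    Assumes power-of-two sizes and a byte-addressable address space.
    """
    rows, cols = M // D, N // W
    z = (N // 8).bit_length() - 1      # Z = log2(N/8): byte-within-word bits
    y = D.bit_length() - 1             # Y = log2(D): bits wired to every chip
    return {"chips": rows * cols, "rows": rows, "cols": cols, "Z": z, "Y": y}


def decode_address(addr, z, y):
    """Split a byte address into (X bits, chip address, byte offset)."""
    byte_offset = addr & ((1 << z) - 1)
    chip_addr = (addr >> z) & ((1 << y) - 1)
    x_bits = addr >> (z + y)
    return x_bits, chip_addr, byte_offset


MEG = 2**20
d = design(64 * MEG, 32, 16 * MEG, 16)   # Figure 3.6 example
print(d)
x_bits, chip_addr, offset = decode_address(0x0A000005, d["Z"], d["Y"])
print(f"A27 A26 = {x_bits & 0b11:02b}, chip address = {chip_addr:X}H, byte = {offset}")
```

For address 0A000005H, A27 A26 = 10, so the decoder activates the CS output for that row of chips. The chips see internal address 800001H, and A1 A0 = 01 picks a byte within the 32-bit word.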


Figure 3.6 Design of a 64 M x 32 memory using 16 M x 16 memory chips. (Address lines A2 to A25 drive the A0 to A23 inputs of all eight chips; A26 and A27 drive a 2-to-4 decoder that generates one CS signal per row of chips; in each row, one chip supplies D0 to D15 and the other D16 to D31.)

The top row of chips in Figure 3.6 is mapped to the first 64-MB address space (i.e., from
addresses 0 to 2^26 - 1). The second row is mapped to the next 64-MB address space, and so on.
After reading the next section, you will realize that this is a partial mapping.

Figure 3.7 Address line partition. (From MSB to LSB, the address lines are divided into the X, Y, and Z groups.)

Figure 3.8 Full address mapping. (Two 16 M x 32 memory modules, A and B, have their A0 to A23 inputs connected to address lines A2 to A25 of the address bus A0 to A31, and their D0 to D31 lines connected to the data bus.)

Mapping Memory
Memory mapping refers to the placement of a memory unit in the memory address space (MAS).
For example, the IA-32 architecture supports 4 GB of address space (i.e., it uses 32 bits for addressing a byte in memory). If your system has 128 MB of memory, it can be mapped to one of
several address subspaces. This section describes how this mapping is done.
Full Mapping
Full mapping refers to a one-to-one mapping function between the memory address and the address
in MAS. This means that for each address value in MAS that has a memory location mapped, there is
one and only one memory location responding to the address.
Full mapping is done by completely decoding the higher-order X bits of memory (see Figure 3.7) to generate the chip select signals. Two example mappings of 16 M x 32 memory modules are shown in Figure 3.8. Both these mappings are full mappings as all higher-order X bits
participate in generating the CS signal.
Logically we can divide the 32 address lines into two groups. One group, consisting of the Y and Z
address lines, locates a byte in the selected 16 M x 32 memory module. The remaining higher-order
bits (i.e., the X group) are used to generate the CS signal. Given this delineation, it is simple
to find the mapping.

Figure 3.9 Partial address mapping.
We illustrate the technique by using the two examples shown in Figure 3.8. Since the memory
modules have a low-active chip select input, a given module is selected if its CS input is 0. For
Module A, the NAND gate output is low when A26 and A29 are low and the remaining four address lines are high. Thus, this memory module responds to memory read/write activity whenever
the higher-order six address bits are 110110. From this, we can get the address locations mapped
to this module as D8000000H to DBFFFFFFH. For convenience, we have expressed the addresses
in the hexadecimal system (as indicated by the suffix letter H). The address D8000000H is mapped
to the first location and the address DBFFFFFFH to the last location of Module A. For addresses
that are outside this range, the CS input to Module A is high and, therefore, it is deselected.
For Module B, the same inputs are used except that the NAND gate is replaced by an OR gate.
Thus, the output of this OR gate is low when the higher-order six address bits are 001001. From
this, we can see that mapping for Module B is 24000000H to 27FFFFFFH. As these two ranges
are mutually exclusive, we can keep both mappings without causing conflict problems.
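The full-mapping calculation amounts to placing the six X bits at the top of a 32-bit address and letting the remaining 26 bits range freely. Here is a minimal Python sketch; the function name is hypothetical.

```python
def mapped_range(x_pattern, x_bits=6, addr_bits=32):
    """First and last byte address selected when the top x_bits equal x_pattern."""
    low_bits = addr_bits - x_bits            # Y + Z bits decoded inside the module
    start = x_pattern << low_bits
    end = start | ((1 << low_bits) - 1)
    return start, end


for name, pattern in (("Module A", 0b110110), ("Module B", 0b001001)):
    start, end = mapped_range(pattern)
    print(f"{name}: {start:08X}H to {end:08X}H")
```

This reproduces the two ranges derived above: D8000000H to DBFFFFFFH for Module A and 24000000H to 27FFFFFFH for Module B.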
Partial Mapping

Full mapping is useful in mapping a memory module; however, often the complexity associated
with generating the CS signal is not necessary. For example, we needed a 6-input NAND or OR
gate to map the two memory modules in Figure 3.8. Partial mapping reduces this complexity by
mapping each memory location to more than one address in MAS. We can obtain simplified CS
logic if the number of addresses a location is mapped to is a power of 2.
Let us look at the mapping of Module A in Figure 3.9 to clarify some of these points. The
CS logic is the same except that we are not connecting the A26 address line to the NAND gate.
Because A26 is not participating in generating the signal, it becomes a don't care input. In this
mapping, Module A is selected when the higher-order six address bits are 110110 or 110111.
Thus, Module A is mapped to the address space D8000000H to DBFFFFFFH and DC000000H
to DFFFFFFFH. That is, the first location in Module A responds to addresses D8000000H and
DC000000H. Since we have left out one address bit A26, two (i.e., 2^1) addresses are mapped to
a memory location. In general, if we leave out k address bits from the chip select logic, we map
2^k addresses to each memory location. For example, in our memory design of Figure 3.6, four
address lines (A28 to A31) are not used. Thus, 2^4 = 16 addresses are mapped to each memory
location.

Figure 3.10 Two byte ordering schemes: (a) 32-bit data F498B70FH, shown from MSB to LSB as 11110100 10011000 10110111 00001111; (b) little-endian byte ordering; (c) big-endian byte ordering.

Address   (b) Little-endian   (c) Big-endian
103       11110100            00001111
102       10011000            10110111
101       10110111            10011000
100       00001111            11110100
We leave it as an exercise to verify that each location in Module B is mapped to eight addresses
as there are three address lines that are not used to generate the CS signal.
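Partial mapping can be modeled by comparing only the address bits that are wired into the chip select logic. In the following C sketch, which uses hypothetical names, a mask records which high-order lines take part in the comparison. Leaving A26 out of Module A's mask makes both D8000000H and DC000000H select its first location, and counting the unused lines gives the $2^k$ addresses per location.

```c
#include <stdbool.h>
#include <stdint.h>

/* Partial mapping: only the address bits set in care_mask take part in
   the chip select comparison; the others are "don't care" inputs. */
bool partial_select(uint32_t addr, uint32_t pattern, uint32_t care_mask)
{
    return (addr & care_mask) == (pattern & care_mask);
}

/* high_lines: address lines above the module's own address inputs.
   care_mask:  the subset actually wired into the chip select logic.
   Returns 2^k, where k is the number of unused high-order lines
   (assumes k < 32). */
uint32_t aliases_per_location(uint32_t high_lines, uint32_t care_mask)
{
    uint32_t unused = high_lines & ~care_mask;
    unsigned k = 0;
    for (; unused != 0; unused &= unused - 1)  /* clear lowest set bit */
        k++;
    return 1u << k;
}
```

For Module A, the high-order lines are A31 to A26 (mask FC000000H). Dropping A26 leaves the mask F8000000H, so `aliases_per_location(0xFC000000u, 0xF8000000u)` returns 2.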

Storing Multibyte Data
Storing data often requires more than a byte. For example, we need four bytes of memory to store
an integer variable that can take a value between 0 and $2^{32} - 1$. Let us assume that the value to be
stored is the one shown in Figure 3.10a.
Suppose that we want to store these 4-byte data in memory at locations 100 through 103. How
do we store them? Figure 3.10 shows two possibilities: least significant byte (Figure 3.10b) or
most significant byte (Figure 3.10c) is stored at location 100. These two byte ordering schemes
are referred to as little endian and big endian, respectively. In either case, we always refer to such multibyte
data by specifying the lowest memory address (100 in this example).
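The two schemes can be reproduced in C. This sketch stores the Figure 3.10a value, 11110100 10011000 10110111 00001111 (F498B70FH), into a 4-byte array whose elements 0 to 3 stand in for locations 100 to 103. Because it uses shifts rather than pointer casts, the result does not depend on the byte order of the machine running it. The function names are hypothetical.

```c
#include <stdint.h>

/* Little endian: least significant byte at the lowest address. */
void store_little_endian(uint8_t mem[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(v >> (8 * i));
}

/* Big endian: most significant byte at the lowest address. */
void store_big_endian(uint8_t mem[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(v >> (8 * (3 - i)));
}
```

After `store_little_endian(mem, 0xF498B70Fu)`, `mem[0]` (location 100) holds 00001111B, as in Figure 3.10b. After `store_big_endian`, it holds 11110100B, as in Figure 3.10c.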
Is one byte ordering scheme better than the other? Not really! It is largely a matter of choice
for the designers. For example, the IA-32 processors use the little-endian byte ordering. However,
most processors leave it up to the system designer to configure the processor. For example, the
MIPS and PowerPC processors use the big-endian byte ordering by default, but these processors
can be configured to use the little-endian scheme.

Figure 3.11 Byte-addressable memory interface to the 32-bit data bus.
Data bus lines   Byte addresses
D24-D31          3   7   11   15   19   23
D16-D23          2   6   10   14   18   22
D8-D15           1   5    9   13   17   21
D0-D7            0   4    8   12   16   20
(The CPU is connected to memory through the data bus D0-D31.)

The particular byte-ordering scheme used does not pose any problems as long as you are
working with machines that use the same byte-ordering scheme. However, difficulties arise when
you want to transfer data between two machines that use different schemes. In this case, conversion
from one scheme to the other is required. For example, the IA-32 instruction set provides two
instructions to facilitate such conversion: one to perform 16-bit data conversions and the other for
32-bit data. Later chapters give details on these instructions.
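In software, converting between the two schemes amounts to reversing the order of the bytes. The following C functions (names hypothetical) are a portable sketch of the 16-bit and 32-bit conversions. They are not the IA-32 instructions mentioned above, whose details appear in later chapters.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24)                  /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000FF00u)   /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00FF0000u)   /* byte 1 -> byte 2 */
         | (x << 24);                 /* byte 0 -> byte 3 */
}
```

Applying `swap32` to F498B70FH yields 0FB798F4H. Applying it twice returns the original value.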

Alignment of Data
We can use our memory example to understand why data alignment improves the performance of
applications. Suppose we want to read 32-bit data from the memory shown in Figure 3.6. If the
address of these 32-bit data is a multiple of four (i.e., address lines A0 and A1 are 0), the 32-bit
data are stored in a single row of memory. Thus the processor can get the 32-bit data in one read
cycle. If this condition is not satisfied, then the 32-bit data item is spread over two rows. Thus the
processor needs to read two 32-bit words and extract the required 32-bit data. This scenario is
clearly demonstrated in Figure 3.11.
In Figure 3.11, the 32-bit data item stored at address 8 (shown by hashed lines) is aligned. Due
to this alignment, the processor can read this data item in one read cycle. On the other hand, the
data item stored at address 17 (shown shaded) is unaligned. Reading this data item requires two
read cycles: one to read the 32 bits at address 16 and the other to read the 32 bits at address 20.
The processor can internally assemble the required 32-bit data item from the 64-bit data read from
the memory.
You can easily extend this discussion to the Pentium's 64-bit data bus. It should be clear to
you that aligned data improve system performance.
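The cost of unaligned access can be counted directly. The following C sketch assumes, as in Figure 3.11, that each bus cycle transfers one whole row of memory (4 bytes on a 32-bit data bus, 8 bytes on a 64-bit bus). It returns how many rows a data item touches. The function name is hypothetical.

```c
#include <stdint.h>

/* Number of read cycles needed to fetch `size` bytes (size >= 1)
   starting at `addr`, when each cycle reads one row of `bus_bytes`
   bytes. Assumes addr + size does not overflow 32 bits. */
unsigned read_cycles(uint32_t addr, unsigned size, unsigned bus_bytes)
{
    uint32_t first_row = addr / bus_bytes;
    uint32_t last_row  = (addr + size - 1) / bus_bytes;
    return (unsigned)(last_row - first_row + 1);
}
```

`read_cycles(8, 4, 4)` returns 1 and `read_cycles(17, 4, 4)` returns 2, matching the aligned and unaligned items in Figure 3.11.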

• 2-Byte Data: A 16-bit data item is aligned if it is stored at an even address (i.e., addresses
that are multiples of two). This means that the least significant bit of the address must be 0.
• 4-Byte Data: A 32-bit data item is aligned if it is stored at an address that is a multiple of
four. This implies that the least significant two bits of the address must be 0 as discussed in
the last example.
• 8-Byte Data: A 64-bit data item is aligned if it is stored at an address that is a multiple
of eight. This means that the least significant three bits of the address must be 0. This
alignment is important for Pentium processors, as they have a 64-bit wide data bus. On
80486 processors, since the data bus is 32 bits wide, a 64-bit data item is read in two bus
cycles and alignment at 4-byte boundaries is sufficient.
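Each of these rules reduces to testing the low-order address bits. The following minimal C check assumes that the item size is a power of two (2, 4, or 8). The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An item of `size` bytes is aligned when the low log2(size) address
   bits are zero, i.e., when addr is a multiple of size. */
bool is_aligned(uintptr_t addr, uintptr_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);  /* power of two */
    return (addr & (size - 1)) == 0;
}
```

For example, address 20 is aligned for 4-byte data but not for 8-byte data, because its lowest three bits are 100.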
The IA-32 processors allow both aligned and unaligned data items. Of course, unaligned data
cause performance degradation. Alignment constraints of this type are referred to as soft alignment
constraints. Because of the performance penalty associated with unaligned data, some processors
do not allow unaligned data. This alignment constraint is referred to as the hard alignment constraint.

Summary
We have discussed the basic memory design issues. We have shown how flip-flops can be used
to build memory blocks. Interfacing a memory unit to the system bus typically requires tristate
buffers. We have described by means of an example how tristate buffers are useful in connecting
the memory outputs to the data bus.
Building larger memories requires both horizontal and vertical expansion. Horizontal expansion is used to expand the word size, and vertical expansion provides an increased number of
words. We have shown how one can design memory modules using standard memory chips. In all
these designs, chip select plays an important role in allowing multiple entities to be attached to the
system bus.
Chip select logic also plays an important role in mapping memory modules into the address
space. Two basic mapping functions are used: full mapping and partial mapping. Full mapping
provides a one-to-one mapping between memory locations and addresses. In partial mapping, each
memory location is mapped to a number of addresses equal to a power of 2. The main advantage
of partial mapping is that it simplifies the chip select logic.
We have described the big-endian and little-endian formats to store multibyte data. We have also
discussed the importance of data alignment. Unaligned data can lead to performance degradation,
and we have discussed why alignment improves performance.

4
The IA-32 Architecture
When you are programming in a high-level language like C, you don't have to know anything
about the underlying processor and the system. However, when programming in an assembly
language, you should have some understanding of how the processor is organized and the system
is put together. This chapter provides these details for the Intel IA-32 architecture. The Pentium
processor is an implementation of this architecture. Of course, several other processors such as
Celeron, Pentium 4, and Xeon also belong to this architecture. We present details of its registers
and memory architecture. It supports two memory architectures: protected-mode and real-mode.
Protected-mode architecture is the native mode and the real-mode is provided to mimic the 16-bit
8086 memory architecture. Both modes support segmented memory architecture. It is important
for the assembly language programmer to understand the segmented memory organization. Other
details of this architecture are given in later chapters.

Introduction
Intel introduced microprocessors way back in 1969. Their first 4-bit microprocessor was the 4004.
This was followed by the 8080 and 8085 processors. The work on these early microprocessors led
to the development of the Intel architecture (IA). The first processor in the IA family was the 8086
processor, introduced in 1979. It has a 20-bit address bus and a 16-bit data bus.
The 8088 is a less expensive version of the 8086 processor. The cost reduction is obtained by
using an 8-bit data bus. Except for this difference, the 8088 is identical to the 8086 processor. Intel
introduced segmentation with these processors. These processors can access up to four segments
of 64 KB each at any one time. This IA segmentation is referred to as the real-mode segmentation and is discussed
later in this chapter.
The 80186 is a faster version of the 8086. It also has a 20-bit address bus and 16-bit data bus,
but has an improved instruction set. The 80186 was never widely used in computer systems. The
real successor to the 8086 is the 80286, which was introduced in 1982. It has a 24-bit address
bus, which implies 16 MB of memory address space. The data bus is still 16 bits wide, but the
80286 has some memory protection capabilities. It introduced the protected mode into the IA
architecture. Segmentation in this new mode is different from the real-mode segmentation. We
present details on this new segmentation later. The 80286 is backward compatible in that it can
run the 8086-based software.

Intel introduced its first 32-bit processor, the 80386, in 1985. It has a 32-bit data bus and
32-bit address bus. It implements Intel's 32-bit architecture, known as IA-32. The memory address space
has grown substantially (from 16 MB address space to 4 GB). This processor introduced paging
into the IA architecture. It also allowed definition of segments as large as 4 GB. This effectively
allowed for a "flat" model (i.e., effectively turning off segmentation). Later sections present details
on this aspect. Like the 80286, it can run all the programs written for 8086 and 8088 processors.
The Intel 80486 processor was introduced in 1989. This is an improved version of the 80386.
While maintaining the same address and data buses, it integrated the floating-point coprocessor
functions on the chip. The 80486 processor has added more parallel execution
capability to instruction decode and execution units to achieve a scalar execution rate of one instruction per clock. It has an 8 KB on-chip L1 cache. Furthermore, support for the L2 cache and
multiprocessing has been added. Later versions of the 80486 processors incorporated features such
as energy saving mode for notebooks.
The latest in the family is the Pentium series. It is not named 80586 because Intel found
belatedly that numbers couldn't be trademarked! The first Pentium was introduced in 1993. The
Pentium is similar to the 80486 but uses a 64-bit wide data bus. Internally, it has 128- and 256-bit
wide datapaths to speed up internal data transfers. However, the Pentium instruction set supports
32-bit operands like the 80486 processor. It has added a second execution pipeline to achieve
superscalar performance by having the capability to execute two instructions per clock. It has also
doubled the on-chip L1 cache, with 8 KB for data and another 8 KB for instructions. Branch
prediction has also been added.
The Pentium Pro processor has a three-way superscalar architecture. That is, it can execute
three instructions per clock cycle. The address bus has been expanded to 36 bits, which gives it an
address space of 64 GB. It also provides dynamic execution including out-of-order and speculative
execution. In addition to the L1 caches provided by the Pentium, the Pentium Pro has a 256 KB
L2 cache in the same package as the CPU.
The Pentium II processor has added multimedia (MMX) instructions to the Pentium Pro architecture. It has expanded the L1 data and instruction caches to 16 KB each. It has also added more
comprehensive power management features including Sleep and Deep Sleep modes to conserve
power during idle times.
The Pentium III processor introduced the streaming SIMD extensions (SSE), which use a single-instruction multiple-data (SIMD) architecture for concurrent execution of multiple floating-point operations, along with cache prefetch instructions and memory fences. The Pentium 4 enhanced these features
further.
Intel's 64-bit Itanium processor is targeted for server applications. For these applications, the
32-bit memory address space is not adequate. The Itanium uses a 64-bit address bus to provide
substantially larger address space. Its data bus is 128 bits wide. In a major departure, Intel has
moved from the CISC designs used in their 32-bit processors to a RISC orientation for their 64-bit Itanium processors. The Itanium also incorporates several advanced architectural features to
provide improved performance for the high-end server market.
In the rest of the chapter, we look at the basic architectural details of the IA-32 architecture.
Our focus is on the internal registers and memory architecture. Other details are covered in later
chapters.

Figure 4.1 Execution cycle of a typical computer system: each execution cycle consists of fetch, decode, and execute steps, repeated over time.

Processor Execution Cycle
The processor acts as the controller of all actions or services provided by the system. It can be
thought of as executing the following cycle forever:
1. Fetch an instruction from the memory;
2. Decode the instruction (i.e., identify the instruction);
3. Execute the instruction (i.e., perform the action specified by the instruction).
This process is often referred to as the fetch-decode-execute cycle, or simply the execution cycle.
The execution cycle of a processor is shown in Figure 4.1. As discussed in the last chapter,
fetching an instruction from the main memory involves placing the appropriate address on the
address bus and activating the memory read signal on the control bus to indicate to the memory
unit that an instruction should be read from that location. The memory unit requires time to read
the instruction at the addressed location. The memory then places the instruction on the data bus.
The processor, after instructing the memory unit to read, waits until the instruction is available on
the data bus and then reads the instruction.
Decoding involves identifying the instruction that has been fetched from the memory. To facilitate the decoding process, machine language instructions follow a particular instruction-encoding
scheme.
To execute an instruction, the processor contains hardware consisting of control circuitry and
an arithmetic and logic unit (ALU). The control circuitry is needed to provide timing controls as
well as to instruct the internal hardware components to perform a specific operation. As described
in Chapter 2, the ALU is mainly responsible for performing arithmetic operations (such as add
and divide) and logical operations (such as and, or) on data.
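The fetch-decode-execute cycle can be mimicked by a small interpreter loop. The following C sketch runs a made-up accumulator machine with three hypothetical instructions (HALT, LOAD, ADD). Its encoding has nothing to do with IA-32, and the variable `pc` plays the role of the instruction pointer described later in this chapter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding (not IA-32):
     0x00      HALT
     0x01 n    LOAD n   (acc = n)
     0x02 n    ADD  n   (acc += n)                                  */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02 };

/* Runs the program in mem[0..len-1]. Returns the accumulator on HALT,
   or -1 on an unknown opcode or a fetch past the end of memory. */
int run(const uint8_t *mem, size_t len)
{
    size_t pc = 0;   /* address of the next instruction */
    int acc = 0;
    for (;;) {
        if (pc >= len) return -1;
        uint8_t opcode = mem[pc++];      /* 1. fetch, then advance pc */
        switch (opcode) {                /* 2. decode */
        case OP_HALT:
            return acc;
        case OP_LOAD:                    /* 3. execute */
            if (pc >= len) return -1;
            acc = mem[pc++];
            break;
        case OP_ADD:
            if (pc >= len) return -1;
            acc += mem[pc++];
            break;
        default:
            return -1;                   /* not a valid instruction */
        }
    }
}
```

The program {01H, 5, 02H, 7, 02H, 30, 00H} loads 5, adds 7 and 30, and halts, so `run` returns 42.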
In practice, most of the time instructions and data are not fetched from the main memory.
There is a high-speed cache memory that provides faster access to instructions and data than the
main memory. For example, the Pentium processor provides a 16 KB on-chip cache. This is
divided equally into data cache and instruction cache. The presence of on-chip cache is transparent
to application programs—it helps improve application performance.

Processor Registers
The IA-32 architecture provides ten 32-bit and six 16-bit registers. These registers are grouped
into general, control, and segment registers. The general registers are further divided into data,
pointer, and index registers as shown in Figures 4.2 and 4.3.

Figure 4.2 Data registers (the 16-bit registers are shown shaded).
32-bit register   16-bit register (bits 15-0)   8-bit registers (bits 15-8, 7-0)   Name
EAX               AX                            AH, AL                             Accumulator
EBX               BX                            BH, BL                             Base
ECX               CX                            CH, CL                             Counter
EDX               DX                            DH, DL                             Data

Figure 4.3 Index and pointer registers.
Index registers: ESI (Source index), EDI (Destination index).
Pointer registers: ESP (Stack pointer), EBP (Base pointer).
Each is a 32-bit register whose lower 16 bits (bits 15-0) can also be used on their own.

Data Registers

There are four 32-bit data registers that can be used for arithmetic, logical, and other operations
(see Figure 4.2). These four registers are unique in that they can be used as follows:
• Four 32-bit registers (EAX, EBX, ECX, EDX); or
• Four 16-bit registers (AX, BX, CX, DX); or
• Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL).
As shown in Figure 4.2, it is possible to use a 32-bit register and access the lower half of its data by
the corresponding 16-bit register name. For example, the lower 16 bits of EAX can be accessed by
using AX. Similarly, the lower two bytes can be individually accessed by using the 8-bit register
names. For example, the lower byte of AX can be accessed as AL and the upper byte as AH.
The data registers can be used without constraint in most arithmetic and logical instructions.
However, some registers in this group have special functions when executing specific instructions.
For example, when performing a multiplication operation, one of the two operands should be in
the EAX, AX, or AL register depending on the operand size. Similarly, the ECX or CX register is
assumed to contain the loop count value for iterative instructions.
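The register aliasing can be modeled with masks and shifts. In the following C sketch (function names hypothetical), a `uint32_t` stands for EAX. The helpers extract the portions named AX, AH, and AL, and `write_low8` shows that updating AL leaves the other 24 bits unchanged.

```c
#include <stdint.h>

/* Portions of a 32-bit data register such as EAX. */
uint16_t reg16(uint32_t r)     { return (uint16_t)(r & 0xFFFFu); }       /* AX: bits 15-0 */
uint8_t  reg8_high(uint32_t r) { return (uint8_t)((r >> 8) & 0xFFu); }   /* AH: bits 15-8 */
uint8_t  reg8_low(uint32_t r)  { return (uint8_t)(r & 0xFFu); }          /* AL: bits 7-0  */

/* Writing AL replaces bits 7-0 only; bits 31-8 of EAX are preserved. */
uint32_t write_low8(uint32_t r, uint8_t v)
{
    return (r & 0xFFFFFF00u) | v;
}
```

If EAX holds 12345678H, then AX is 5678H, AH is 56H, and AL is 78H. Writing ABH to AL makes EAX 123456ABH.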

Figure 4.4 Flags and instruction pointer registers.
Flags register: EFLAGS is 32 bits wide; its lower 16 bits form the FLAGS register. Bits not listed below are reserved.
Status flags: CF = Carry flag (bit 0), PF = Parity flag (bit 2), AF = Auxiliary carry flag (bit 4), ZF = Zero flag (bit 6), SF = Sign flag (bit 7), OF = Overflow flag (bit 11).
Control flag: DF = Direction flag (bit 10).
System flags: TF = Trap flag (bit 8), IF = Interrupt flag (bit 9), IOPL = I/O privilege level (bits 12-13), NT = Nested task (bit 14), RF = Resume flag (bit 16), VM = Virtual 8086 mode (bit 17), AC = Alignment check (bit 18), VIF = Virtual interrupt flag (bit 19), VIP = Virtual interrupt pending (bit 20), ID = ID flag (bit 21).
Instruction pointer: EIP is 32 bits wide; its lower 16 bits form the IP register.

Pointer and Index Registers

Figure 4.3 shows the four 32-bit registers in this group. These registers can be used either as 16- or 32-bit registers. The two index registers play a special role in the string processing instructions
(these instructions are discussed in Chapter 17). In addition, they can be used as general-purpose
data registers.
The pointer registers are mainly used to maintain the stack. Even though they can be used as
general-purpose data registers, they are almost exclusively used for maintaining the stack. The
stack implementation is discussed in Chapter 11.
Control Registers

This group of registers consists of two 32-bit registers: the instruction pointer register and the flags
register (see Figure 4.4). The processor uses the instruction pointer register to keep track of the
location of the next instruction to be executed. The instruction pointer register is sometimes called the
program counter register. The instruction pointer can be used either as a 16-bit register (IP), or as
a 32-bit register (EIP). The IP register is used for 16-bit addresses and the EIP register for 32-bit
addresses.

Figure 4.5 The six segment registers support the segmented memory architecture.
Each segment register is 16 bits wide (bits 15-0): CS (code segment), DS (data segment), SS (stack segment), and ES, FS, and GS (extra segments).

When an instruction is fetched from memory, the instruction pointer is updated to point to the
next instruction. This register is also modified during the execution of an instruction that transfers
control to another location in the program (such as a jump, procedure call, or interrupt).
The flags register can be considered as either a 16-bit FLAGS register, or a 32-bit EFLAGS
register. The FLAGS register is useful in executing 8086 processor code. The EFLAGS register
consists of 6 status flags, 1 control flag, and 10 system flags, as shown in Figure 4.4. Bits of this
register can be set (1) or cleared (0). The IA-32 instruction set has instructions to set and clear
some of the flags. For example, the clc instruction clears the carry flag, and the stc instruction
sets it.
The six status flags record certain information about the most recent arithmetic or logical
operation. For example, if a subtract operation produces a zero result, the zero flag (ZF) would be
set (i.e., ZF = 1). Chapter 14 discusses the status flags in detail.
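To illustrate what the status flags record, the following C sketch computes them for a 32-bit subtraction a - b and packs them at their EFLAGS bit positions (CF bit 0, PF bit 2, AF bit 4, ZF bit 6, SF bit 7, OF bit 11). It is a software model that follows the usual flag definitions, not a description of the processor's circuitry, and the names are hypothetical.

```c
#include <stdint.h>

/* EFLAGS bit positions of the six status flags. */
enum {
    FLAG_CF = 1 << 0, FLAG_PF = 1 << 2, FLAG_AF = 1 << 4,
    FLAG_ZF = 1 << 6, FLAG_SF = 1 << 7, FLAG_OF = 1 << 11
};

/* Status flags left by the 32-bit subtraction a - b. */
uint32_t sub_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    uint32_t f = 0;
    if (a < b)                     f |= FLAG_CF;  /* borrow out of bit 31 */
    if (r == 0)                    f |= FLAG_ZF;  /* zero result */
    if (r & 0x80000000u)           f |= FLAG_SF;  /* sign bit of result */
    if (((a ^ b) & (a ^ r)) >> 31) f |= FLAG_OF;  /* signed overflow */
    if ((a ^ b ^ r) & 0x10u)       f |= FLAG_AF;  /* borrow out of bit 3 */

    /* PF = 1 when the low byte of the result has an even number of 1s. */
    uint8_t low = (uint8_t)r;
    int ones = 0;
    for (; low != 0; low &= (uint8_t)(low - 1))   /* clear lowest set bit */
        ones++;
    if ((ones & 1) == 0)           f |= FLAG_PF;
    return f;
}
```

`sub_flags(5, 5)` sets ZF, as in the example above. It also sets PF, because a zero result has an even number (zero) of 1 bits.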
The control flag is useful in string operations. This flag determines whether a string operation
should scan the string in the forward or backward direction. The function of the direction flag is
described in Chapter 17, which discusses the string instructions.
The 10 system flags control the operation of the processor. A detailed discussion of all 10
system flags is beyond the scope of this book. Here we briefly discuss a few flags in this group.
The two interrupt enable flags—the trap enable flag (TF) and the interrupt enable flag (IF)—
are useful in interrupt-related activities. For example, setting the trap flag causes the processor
to single-step through a program, which is useful in debugging programs. These two flags are
covered in Chapter 20, which discusses the interrupt processing mechanism.
The ability to set and clear the identification (ID) flag indicates that the processor supports the
CPUID instruction. The CPUID instruction provides information to software about the vendor
(Intel chips use a "GenuineIntel" string), processor family, model, and so on. The virtual-8086
mode (VM) flag, when set, emulates the programming environment of the 8086 processor.
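The vendor string is also an example of little-endian storage. With EAX = 0, CPUID returns the 12-character string in EBX, EDX, and ECX, in that order. The first character of each group of four is in the least significant byte. The following C sketch (function names hypothetical) only unpacks such register values; it does not execute CPUID.

```c
#include <stdint.h>

/* Copy the four characters held in a register, least significant
   byte first. */
static void put4(char *out, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xFFu);
}

/* Rebuild the vendor string from the EBX, EDX, and ECX values. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    put4(out, ebx);
    put4(out + 4, edx);
    put4(out + 8, ecx);
    out[12] = '\0';
}
```

With EBX = 756E6547H, EDX = 49656E69H, and ECX = 6C65746EH, the function produces "GenuineIntel".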
The last flag that we discuss is the alignment check (AC) flag. When this flag is set, the
processor operates in alignment check mode and generates exceptions when a reference is made
to an unaligned memory address. We discussed data alignment in the last chapter.

Figure 4.6 Logical to physical address translation process in the protected mode: logical address → segment translation → 32-bit linear address → page translation → 32-bit physical address.

Segment Registers

The six 16-bit segment registers are shown in Figure 4.5. These registers support the segmented
memory organization. In this organization, memory is partitioned into segments, where each segment is a small part of the memory. The processor, at any point in time, can only access up to
six segments of the main memory. The six segment registers point to where these segments are
located in the memory.
A program is logically divided into two parts: a code part that contains only the instructions,
and a data part that keeps only the data. The code segment (CS) register points to where the
program's instructions are stored in the main memory, and the data segment (DS) register points
to the data part of the program. The stack segment (SS) register points to the program's stack
segment (further discussed in Chapter 11).
The last three segment registers—ES, FS, and GS—are additional segment registers that can
be used in a similar way as the other segment registers. For example, if a program's data could
not fit into a single data segment, we could use two segment registers to point to the two data
segments. We will say more about these registers later.

Protected Mode Memory Architecture
The IA-32 architecture supports a sophisticated memory architecture under real and protected
modes. The real mode, which uses 16-bit addresses, is provided to run programs written for the
8086 processor. In this mode, it supports the segmented memory architecture of the 8086 processor. The protected mode uses 32-bit addresses and is the native mode of the IA-32 architecture. In
the protected mode, both segmentation and paging are supported. Paging is useful in implementing virtual memory; it is transparent to the application program, but segmentation is not. We do
not look at the paging features here. We discuss the real-mode memory architecture in the next
section, and devote the rest of this section to describing the protected-mode segmented memory
architecture.
In the protected mode, a sophisticated segmentation mechanism is supported. In this mode,
the segment unit translates a logical address into a 32-bit linear address. The paging unit translates
the linear address into a 32-bit physical address, as shown in Figure 4.6. If no paging mechanism
is used, the linear address is treated as the physical address. In the remainder of this section, we
focus on the segment translation process only.
Protected mode segment translation process is shown in Figure 4.7. In this mode, contents of
the segment register are taken as an index into a segment descriptor table to get a descriptor. Segment descriptors provide the 32-bit segment base address, its size, and access rights. To translate
a logical address to the corresponding linear address, the offset is added to the 32-bit base address.
The offset value can be either a 16-bit or 32-bit number.
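The final addition step can be sketched in C. The sketch assumes that the descriptor has already been fetched and reduced to a base address and a largest valid offset (hypothetical names), and it treats only an ordinary expand-up segment. An offset beyond the limit is reported as a failure; in that case, the processor would raise a general-protection exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* What translation needs from a segment descriptor (simplified). */
typedef struct {
    uint32_t base;        /* 32-bit segment base address */
    uint32_t max_offset;  /* largest valid offset, derived from the limit */
} Segment;

/* Logical address (segment, offset) -> 32-bit linear address.
   Returns false if the offset lies beyond the segment limit. */
bool to_linear(Segment seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg.max_offset)
        return false;
    *linear = seg.base + offset;   /* 32-bit add, wraps modulo 2^32 */
    return true;
}
```

With a base of 0 and a 4 GB limit, as in the flat model discussed below, the linear address equals the offset. If paging is not used, it is also the physical address.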

Figure 4.7 Protected mode address translation. The segment selector (bits 15-3 INDEX, bit 2 TI, bits 1-0 RPL) selects a segment descriptor (access rights, limit, base address) in the descriptor table; an adder combines the descriptor's 32-bit base address with the OFFSET to form the 32-bit linear address.

Figure 4.8 Visible and invisible parts of segment registers. For each of the six segment registers, the visible part holds the segment selector and the invisible part holds the segment base address, size, access rights, etc.

Segment Registers
Every segment register has a "visible" part and an "invisible" part, as shown in Figure 4.8. When
we talk about segment registers, we are referring to the 16-bit visible part. The visible part is
referred to as the segment selector. There are direct instructions to load the segment selector.
These instructions include mov, pop, lds, les, lss, lgs, and lfs. Some of these instructions
are discussed in later chapters and in Appendix D. The invisible part of the segment registers is
automatically loaded by the processor from a descriptor table (described next).

Figure 4.9 A segment descriptor.
Upper doubleword: BASE 31:24 (bits 31-24), G (bit 23), D/B (bit 22), 0 (bit 21), AVL (bit 20), LIMIT 19:16 (bits 19-16), P (bit 15), DPL (bits 14-13), S (bit 12), TYPE (bits 11-8), BASE 23:16 (bits 7-0).
Lower doubleword: BASE ADDRESS 15:00 (bits 31-16), SEGMENT LIMIT 15:00 (bits 15-0).
As shown in Figure 4.7, the segment selector provides three pieces of information:
• Index: The index selects a segment descriptor from one of two descriptor tables: a local
descriptor table or a global descriptor table. Since the index is a 13-bit value, it can select
one of $2^{13} = 8192$ descriptors from the selected descriptor table. Since each descriptor,
shown in Figure 4.9, is 8 bytes long, the processor multiplies the index by 8 and adds the
result to the base address of the selected descriptor table.
• Table Indicator (TI): This bit indicates whether the local or global descriptor table should
be used.
0 = Global descriptor table,
1 = Local descriptor table.
• Requested Privilege Level (RPL): This field identifies the privilege level to provide protected
access to data: the smaller the RPL value, the higher the privilege level. Operating systems
don't have to use all four levels. For example, Linux uses level 0 for the kernel and level 3
for the user programs. It does not use levels 1 and 2.
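Decoding a selector is a matter of shifts and masks. The following C sketch (names hypothetical) splits a 16-bit selector into the three fields described above. It then computes where the selected descriptor lies by scaling the index by 8 and adding the table's base address.

```c
#include <stdint.h>

/* Fields of a 16-bit segment selector. */
typedef struct {
    unsigned index;  /* bits 15-3: 0..8191 */
    unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1-0: 0 (most privileged) to 3 */
} Selector;

Selector decode_selector(uint16_t sel)
{
    Selector s;
    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;
    return s;
}

/* Each descriptor is 8 bytes long, so scale the index by 8. */
uint32_t descriptor_address(uint32_t table_base, Selector s)
{
    return table_base + s.index * 8u;
}
```

Selector 002BH, for example, decodes to index 5, TI = 0 (global descriptor table), and RPL = 3. Its descriptor starts 40 (28H) bytes past the table base.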
Segment Descriptors

A segment descriptor provides the attributes of a segment. These attributes include its 32-bit base
address, 20-bit segment size, as well as control and status information, as shown in Figure 4.9.
Here we provide a brief description of some of the fields shown in this figure.
• Base Address: This 32-bit address specifies the starting address of a segment in the 4 GB
linear address space. This 32-bit value is added to the offset value to get the linear address
(see Figure 4.7).
• Granularity (G): This bit indicates whether the segment size value, described next, should be
interpreted in units of bytes or 4 KB. If the granularity bit is zero, segment size is interpreted
in bytes; otherwise, in units of 4 KB.
• Segment Limit: This is a 20-bit number that specifies the size of the segment. Depending on
the granularity bit, two interpretations are possible:
1. If the granularity bit is zero, segment size can range from 1 byte to 1 MB (i.e., $2^{20}$
bytes), in increments of 1 byte.
2. If the granularity bit is 1, segment size can range from 4 KB to 4 GB, in increments of
4 KB.
• D/B Bit: In a code segment, this bit is called the D bit and specifies the default size for
operands and offsets. If the D bit is 0, default operands and offsets are assumed to be 16
bits; for 32-bit operands and offsets, the D bit must be 1.
In a data segment, this bit is called the B bit and controls the size of the stack and stack
pointer. If the B bit is 0, stack operations use the SP register and the upper bound for the
stack is FFFFH. If the B bit is 1, the ESP register is used for the stack operations with
a stack upper bound of FFFFFFFFH. Recall that numbers expressed in the hexadecimal
number system are indicated by suffix H (see Appendix A).
Typically, this bit is cleared for the real-mode operation and set for the protected-mode
operation. Later we describe how 16- and 32-bit operands and addresses can be mixed in a
given mode of operation.
• S Bit: This bit identifies whether the segment is a system segment or an application segment.
If the bit is 0, the segment is identified as a system segment; otherwise, as an application
(code or data) segment.
• Descriptor Privilege Level (DPL): This field defines the privilege level of the segment. It is
useful in controlling access to the segment using the protection mechanisms of the processor.
• Type: This field identifies the type of segment. The actual interpretation of this field depends
on whether the segment is a system or application segment. For application segments, the
type depends on whether the segment is a code or data segment. For a data segment, type
can identify it as a read-only, read-write, and so on. For a code segment, type identifies it as
an execute-only, execute/read-only, and so on.
• P bit: This bit indicates whether the segment is present. If this bit is 0, the processor
generates a segment-not-present exception when a selector for the descriptor is loaded into
a segment register.
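Because the base address and limit are scattered over several fields, decoding a descriptor requires some bit manipulation. The following C sketch (names hypothetical) treats the 8-byte descriptor as a 64-bit little-endian value, so bytes 0 to 3 form the lower doubleword of Figure 4.9. It also converts the limit into the largest valid offset. Note that the limit field holds the segment size minus one.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;    /* 32-bit base, gathered from three fields */
    uint32_t limit;   /* raw 20-bit limit */
    unsigned g;       /* granularity: 0 = bytes, 1 = 4 KB units */
    unsigned db;      /* D/B bit */
    unsigned p;       /* present bit */
    unsigned dpl;     /* descriptor privilege level, 0..3 */
    unsigned s_bit;   /* 0 = system segment, 1 = code or data */
    unsigned type;    /* 4-bit type field */
} SegDesc;

SegDesc decode_descriptor(uint64_t d)
{
    uint32_t lo = (uint32_t)d;          /* lower doubleword */
    uint32_t hi = (uint32_t)(d >> 32);  /* upper doubleword */
    SegDesc r;
    r.base  = (lo >> 16)                /* BASE 15:00 */
            | ((hi & 0xFFu) << 16)      /* BASE 23:16 */
            | (hi & 0xFF000000u);       /* BASE 31:24 */
    r.limit = (lo & 0xFFFFu) | (hi & 0x000F0000u);  /* LIMIT 19:00 */
    r.g     = (hi >> 23) & 1u;
    r.db    = (hi >> 22) & 1u;
    r.p     = (hi >> 15) & 1u;
    r.dpl   = (hi >> 13) & 3u;
    r.s_bit = (hi >> 12) & 1u;
    r.type  = (hi >> 8) & 0xFu;
    return r;
}

/* Largest valid offset. With G = 1 the limit counts 4 KB units,
   so the low 12 offset bits are always allowed. */
uint32_t max_offset(SegDesc r)
{
    return r.g ? (r.limit << 12) | 0xFFFu : r.limit;
}
```

The descriptor 00CF9A000000FFFFH, for instance, decodes to base 0, limit FFFFFH, and G = 1. That gives a 4 GB code segment of the kind used in the flat model.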
Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors shown in Figure 4.9. There are three
types of descriptor tables:
• The global descriptor table (GDT);
• Local descriptor tables (LDT);
• The interrupt descriptor table (IDT).
All three descriptor tables are variable in size from 8 bytes to 64 KB. The interrupt descriptor table
is used in interrupt processing and is discussed in Chapter 20. Both LDT and GDT can contain up
to $2^{13} = 8192$ 8-byte descriptors. As shown in Figure 4.7, the upper 13 bits of a segment selector
are used as an index into the selected descriptor table. Each table has an associated register that
holds the 32-bit linear base address and a 16-bit size of the table. The LDTR and GDTR registers
are used for this purpose. These registers can be loaded using the lldt and lgdt instructions.
Similarly, the values of the LDTR and GDTR registers can be stored by the sldt and sgdt
instructions. These instructions are typically used by the operating system.

Figure 4.10 Segments in a multisegment model: code, stack, and data segments, each located in memory by the base address, limit, and access rights in its descriptor.

The global descriptor table contains descriptors that are available to all tasks within the system.
There is only one GDT in the system. Typically, the GDT contains code and data used by the
operating system. The local descriptor table contains descriptors for a given program. There
can be several LDTs, each of which may contain descriptors for code, data, stack, and so on. A
program cannot access a segment unless there is a descriptor for the segment in either the current
LDT or GDT.
Segmentation Models

The segments can span the entire memory address space. As a result, we can effectively make the
segmentation invisible by mapping all segment base addresses to zero and setting the size to 4 GB.
Such a model is called the flat model and is used in programming environments such as UNIX and
Linux.
Another model that uses the capabilities of segmentation to the full extent is the multisegment
model. Figure 4.10 shows an example mapping of six segments. A program, in fact, can have
more than just six segments. In this case, the segment descriptor table associated with the program
will have the descriptors loaded for all the segments defined by the program. However, at any
time, only six of these segments can be active. Active segments are those that have their segment
selectors loaded into the six segment registers. A segment that is not active can be made active
by loading its selector into one of the segment registers, and the processor automatically loads the
associated descriptor (i.e., the "invisible part" shown in Figure 4.8). The processor generates a
general-protection exception if an attempt is made to access memory beyond the segment limit.
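The effect of the segment limit, and why the flat model makes segmentation invisible, can be sketched in a few lines of Python. This is a simplified model under our own assumptions: byte granularity, expand-up segments only, and no access-rights, privilege, or paging checks. GeneralProtectionFault is a stand-in name, not a real API, and the 4 KB data segment is hypothetical.

```python
from dataclasses import dataclass


class GeneralProtectionFault(Exception):
    """Stand-in for the processor's general-protection exception."""


@dataclass(frozen=True)
class SegmentDescriptor:
    base: int   # 32-bit linear base address
    limit: int  # largest valid offset, assuming byte granularity


def to_linear(desc, offset):
    """Translate segment:offset to a linear address, enforcing the limit."""
    if offset > desc.limit:
        raise GeneralProtectionFault(
            f"offset {offset:#x} exceeds limit {desc.limit:#x}")
    return (desc.base + offset) & 0xFFFFFFFF  # stay within the 4 GB space


# Flat model: base 0 and a 4 GB limit, so the offset *is* the linear address.
FLAT = SegmentDescriptor(base=0, limit=0xFFFFFFFF)

# Multisegment model: a hypothetical 4 KB data segment placed at 0x00400000.
small_data = SegmentDescriptor(base=0x00400000, limit=0xFFF)

print(hex(to_linear(FLAT, 0x08048000)))    # 0x8048000
print(hex(to_linear(small_data, 0x10)))    # 0x400010
```

Calling to_linear(small_data, 0x1000) raises GeneralProtectionFault, mirroring the exception the processor generates for an access beyond the segment limit.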

Figure 4.11 Relationship between the logical and physical addresses of memory in the real mode
(all numbers are in hex).

Real Mode Memory Architecture
In the real mode, an IA-32 processor such as the Pentium behaves like a faster 8086. The memory
address space of the 8086 processor is 1 MB. To address a memory location, we have to use a
20-bit address. The address of the first location is 00000H; the last addressable memory location
is at FFFFFH.
Since all registers in the 8086 are 16 bits wide, the address space is limited to 2^16, or 65,536
(64 K) locations. As a consequence, the memory is organized as a set of segments. Each segment
of memory is a linear contiguous sequence of up to 64 K bytes. In this segmented memory organization, we have to specify two components to identify a memory location: a segment base and an
offset. This two-component specification is referred to as the logical address. The segment base
specifies the start address of a segment in memory and the offset specifies the address relative to
the segment base. The offset is also referred to as the effective address. The relationship between
the logical and physical addresses is shown in Figure 4.11.
It can be seen from Figure 4.11 that the segment base address is 20 bits long (11000H). So
how can we use a 16-bit register to store the 20-bit segment base address? The trick is to store the
most significant 16 bits of the segment base address and assume that the least significant four bits
are all 0. In the example shown in Figure 4.11, we would store 1100H as the segment base. The
implied four least significant zero bits are not stored. This trick works but imposes a restriction on
where a segment can begin. Segments can begin only at those memory locations whose address
has the least significant four bits as 0. Thus, segments can begin at 00000H, 00010H, 00020H, ...,
FFFE0H, FFFF0H. Segments, for example, cannot begin at 00001H or FFFEEH.
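The storage trick amounts to a pair of conversions between a 20-bit base and a 16-bit register value. In this minimal Python sketch (the function names are illustrative), a base is accepted only if its low four bits are zero, and the register value is simply the base shifted right by four bits.

```python
def segment_register_value(base):
    """16-bit value a segment register holds for a 20-bit real-mode base."""
    if not 0 <= base <= 0xFFFFF:
        raise ValueError("real-mode bases lie in the 1 MB (20-bit) space")
    if base & 0xF:
        raise ValueError(f"{base:05X}H is not on a 16-byte boundary")
    return base >> 4            # keep the most significant 16 of the 20 bits


def segment_base(register_value):
    """Inverse: append the four implied zero bits again."""
    return (register_value & 0xFFFF) << 4


print(f"{segment_register_value(0x11000):04X}H")  # 1100H
print(f"{segment_base(0xFFFF):05X}H")             # FFFF0H
```

Both 00001H and FFFEEH are rejected with ValueError, because their low four bits are not zero.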

Figure 4.12 Physical address generation in the real mode.

In the segmented memory organization, a memory location can be identified by its logical address. We use the notation segment:offset to specify the logical address. For example, 1100:450H
identifies the memory location 11450H, as shown in Figure 4.11. This latter value, which identifies a
memory location directly, is referred to as the physical memory address.
Programmers have to be concerned with the logical addresses only. However, when the processor accesses the memory, it has to supply the 20-bit physical memory address. The conversion
of logical address to physical address is straightforward. This translation process, shown in Figure 4.12, involves appending four zero bits to the right of the segment base value and then adding
the offset value. When using the hexadecimal number system, simply append a zero to the right of the segment
base address and add the offset value. As an example, consider the logical address
1100:450H. The physical address is computed as follows.
      11000   (add 0 to the 16-bit segment base value)
    +   450   (offset value)
      -----
      11450   (physical address)
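The shift-and-add rule fits in a one-line function. This Python sketch assumes 8086 behaviour: with only 20 address lines, a sum above FFFFFH wraps around to the bottom of memory (later processors can behave differently at that boundary).

```python
ADDRESS_MASK_8086 = 0xFFFFF  # 20 address lines: sums above FFFFFH wrap around


def physical_address(segment, offset):
    """Real-mode translation: (segment * 16) + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & ADDRESS_MASK_8086


print(f"{physical_address(0x1100, 0x450):05X}H")  # 11450H, as in the example above
```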

For each logical memory address, there is a unique physical memory address. The converse,
however, is not true. More than one logical address can refer to the same physical memory address.
This is illustrated in Figure 4.13, where logical addresses 1000:20A9H and 1200:A9H refer to the
same physical address 120A9H. In this example, the physical memory address 120A9H is mapped
to two segments.
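A brute-force search makes the many-to-one mapping visible. The sketch below (illustrative names; aliases produced by wrap-around above FFFFFH are ignored) lists every segment:offset pair that reaches a given physical address. For 120A9H it finds 4096 pairs, including both pairs from Figure 4.13.

```python
def physical_address(segment, offset):
    """Real-mode translation, truncated to the 8086's 20 address bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def logical_aliases(physical):
    """All segment:offset pairs (without wrap-around) that name `physical`."""
    aliases = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:          # offset must fit in 16 bits
            aliases.append((segment, offset))
    return aliases


pairs = logical_aliases(0x120A9)
print(len(pairs))                 # 4096
print((0x1000, 0x20A9) in pairs)  # True
print((0x1200, 0x00A9) in pairs)  # True
```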

Figure 4.13 Two logical addresses map to the same physical address (all numbers are in hex).

In our discussion of segments, we never said anything about the actual size of a segment. The
main factor limiting the size of a segment is the 16-bit offset value, which restricts the segments
to at most 64 KB in size. In the real mode, the processor sets the size of each segment to exactly
64 KB. At any instant, a program can access up to six segments. The 8086 actually supported
only four segments: segment registers FS and GS were not present in the 8086 processor.
Assembly language programs typically use at least two segments: code and stack segments. If
the program has data (which almost all programs do), a third segment is also needed to store data.
Those programs that require additional memory can use the other segments.
The six segment registers point to the six active segments, as shown in Figure 4.14. As described earlier, segments must begin on 16-byte memory boundaries. Except for this restriction,
segments can be placed anywhere in memory. The segment registers are independent and segments
can be contiguous, disjoint, partially overlapped, or fully overlapped.
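Because every real-mode segment is exactly 64 KB, the relationship between two segments follows from their segment register values alone. The following Python sketch (the function name is ours) classifies a pair of segments into the four cases above; it ignores the wrap-around at the top of the 1 MB space.

```python
SEGMENT_SIZE = 0x10000  # every real-mode segment is exactly 64 KB


def classify_segments(seg_a, seg_b):
    """Relationship between two real-mode segments, given their register values."""
    lo, hi = sorted((seg_a << 4, seg_b << 4))   # 20-bit base addresses
    if lo == hi:
        return "fully overlapped"
    if hi < lo + SEGMENT_SIZE:
        return "partially overlapped"
    if hi == lo + SEGMENT_SIZE:
        return "contiguous"
    return "disjoint"


print(classify_segments(0x1000, 0x1800))  # partially overlapped
print(classify_segments(0x2000, 0x1000))  # contiguous
```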

Mixed-Mode Operation
Our previous discussion of protected and real modes of operation suggests that we can use either
16-bit or 32-bit operands and addresses. The D/B bit indicates the default size. The question is:
Is it possible to mix these two? For instance, can we use 32-bit registers in the 16-bit mode of
operation? The answer is yes!
The instruction set provides two size override prefixes—one for the operands and the other for
the addresses—to facilitate such mixed mode programming. Details on these prefixes are provided
in Chapter 13.
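As a rough model of how the two override prefixes interact with the default size, the sketch below assumes the standard encodings 66H (operand-size override) and 67H (address-size override), each of which switches its size between 16 and 32 bits relative to the D/B default. It deliberately ignores byte-sized operands, whose size is fixed by the opcode, and every other aspect of instruction decoding.

```python
OPERAND_SIZE_PREFIX = 0x66
ADDRESS_SIZE_PREFIX = 0x67


def _toggle(size):
    return 16 if size == 32 else 32


def effective_sizes(default_32, prefixes=b""):
    """Return (operand size, address size) in bits.

    default_32 models the D/B bit of the code segment (True = 32-bit default);
    prefixes is the sequence of prefix bytes that precede the opcode.
    """
    operand = address = 32 if default_32 else 16
    if OPERAND_SIZE_PREFIX in prefixes:
        operand = _toggle(operand)
    if ADDRESS_SIZE_PREFIX in prefixes:
        address = _toggle(address)
    return operand, address


# Using a 32-bit register such as EAX in 16-bit code requires the 66H prefix.
print(effective_sizes(default_32=False, prefixes=bytes([0x66])))  # (32, 16)
```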

Figure 4.14 The six active segments of the memory system.

Which Segment Register to Use
This discussion applies to both real and protected modes of operation. In generating a physical
memory address, the processor uses different segment registers depending on the purpose of the
memory reference. Similarly, the offset part of the logical address comes from a variety of sources.
Instruction Fetch: When the memory access is to read an instruction, the CS register provides
the segment base address. The offset part is supplied either by the IP or EIP register, depending
on whether we are using 16-bit or 32-bit addresses. Thus, CS:(E)IP points to the next instruction
to be fetched from the code segment.
Stack Operations: Whenever the processor is accessing the memory to perform a stack operation
such as push or pop, the SS register is used for the segment base address, and the offset value
comes from either the SP register (for 16-bit addresses) or the ESP register (for 32-bit addresses).
For other operations on the stack, the BP or EBP register supplies the offset value. A lot more is
said about the stack in Chapter 11.
Accessing Data: If the purpose of accessing memory is to read or write data, the DS register is
the default choice for providing the data segment base address. The offset value comes from a
variety of sources depending on the addressing mode used. Addressing modes are discussed in
Chapter 13.
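The default choices can be summarized as a small lookup function. This is a sketch: the access-type strings are our own labels. It also applies the related rule that memory operands whose base register is BP, EBP, or ESP default to SS, which is how the stack accesses through BP/EBP described above use the stack segment. Segment override prefixes and the ES destination used by string instructions are not modelled.

```python
STACK_BASE_REGISTERS = {"BP", "EBP", "ESP"}


def default_segment(access, base_register=None):
    """Default segment register for a memory reference (no override prefix)."""
    if access == "fetch":   # instruction fetch: CS:(E)IP
        return "CS"
    if access == "stack":   # push/pop: SS:(E)SP
        return "SS"
    if access == "data":    # data operand: DS unless the base is a stack register
        if base_register and base_register.upper() in STACK_BASE_REGISTERS:
            return "SS"
        return "DS"
    raise ValueError(f"unknown access type: {access!r}")


print(default_segment("data", "EBP"))  # SS
print(default_segment("data", "EBX"))  # DS
```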

Figure 4.15 Block diagram of a generic I/O device interface.

Input/Output
Input/Output (I/O) devices provide the means by which a computer system can interact with the
outside world. An I/O device can be a purely input device (e.g., keyboard, mouse), a purely output
device (e.g., printer, display screen), or both an input and output device (e.g., disk). Here we
present a brief overview of the I/O device interface. Chapter 20 provides more details on I/O
interfaces.
Computers use I/O devices (also called peripheral devices) for two major purposes—to communicate with the outside world, and to store data. I/O devices such as printers, keyboards, and
modems are used for communication purposes and devices like disk drives are used for data storage. Regardless of the intended purpose of an I/O device, all communications with these devices
must involve the system bus. However, I/O devices are not directly connected to the system bus.
Instead, there is usually an I/O controller that acts as an interface between the system and the I/O
device.
There are two main reasons for using an I/O controller. First, different I/O devices exhibit
different characteristics and, if these devices were connected directly, the processor would have to
understand and respond appropriately to each I/O device. This would cause the processor to spend
a lot of time interacting with I/O devices and spend less time executing user programs. If we use
an I/O controller, this controller could provide the necessary low-level commands and data for
proper operation of the associated I/O device. Often, for complex I/O devices such as disk drives,
there are special I/O controller chips available.
The second reason for using an I/O controller is that the amount of electrical power used to
send signals on the system bus is very low. This means that the cable connecting the I/O device
has to be very short (a few centimeters at most). I/O controllers typically contain driver hardware
to send current over long cables that connect the I/O devices.
I/O controllers typically have three types of internal registers—a data register, a command
register, and a status register—as shown in Figure 4.15. When the processor wants to interact with
an I/O device, it communicates only with the associated I/O controller.
To focus our discussion, let us consider printing a character on the printer. Before the processor
sends a character to be printed, it has to first check the status register of the associated I/O controller
to see whether the printer is online/offline, busy or idle, out of paper, and so on. In the status
register, three bits can be used to provide this information. For example, bit 4 can be used to
indicate whether the printer is online (1) or offline (0), bit 7 can be used for busy (1) or not busy
(0) status indication, and bit 5 can be used for out of paper (1) or not (0).
The data register holds the character to be printed and the command register tells the controller
the operation requested by the processor (for example, send the character in the data register to the
printer). The following summarizes the sequence of actions involved in sending a character to the
printer:
• Wait for the controller to finish the last command;
• Place a character to be printed in the data register;
• Set the command register to initiate the transfer.
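The three-step sequence can be simulated in Python. Everything here is hypothetical: FakePrinterController stands in for a real controller (which would be reached through I/O ports), the status-bit layout is the example given in the text, and the SEND_CHAR command code is invented for illustration.

```python
ONLINE = 1 << 4        # bit 4: 1 = online, 0 = offline
OUT_OF_PAPER = 1 << 5  # bit 5: 1 = out of paper
BUSY = 1 << 7          # bit 7: 1 = busy
SEND_CHAR = 0x01       # hypothetical "print the data register" command code


class FakePrinterController:
    """Hypothetical controller with status, data, and command registers."""

    def __init__(self, busy_polls=2, online=True):
        self.busy_polls = busy_polls  # how many status reads still report BUSY
        self.online = online
        self.data = 0
        self.printed = []

    def read_status(self):
        status = ONLINE if self.online else 0
        if self.busy_polls > 0:
            self.busy_polls -= 1
            status |= BUSY
        return status

    def write_data(self, value):
        self.data = value & 0xFF

    def write_command(self, command):
        if command == SEND_CHAR:
            self.printed.append(chr(self.data))


def print_char(ctrl, ch, max_polls=10_000):
    # 1. Wait for the controller to finish the last command (polling).
    for _ in range(max_polls):
        status = ctrl.read_status()
        if not status & ONLINE:
            raise RuntimeError("printer is offline")
        if status & OUT_OF_PAPER:
            raise RuntimeError("printer is out of paper")
        if not status & BUSY:
            break
    else:
        raise TimeoutError("printer stayed busy")
    # 2. Place the character to be printed in the data register.
    ctrl.write_data(ord(ch))
    # 3. Set the command register to initiate the transfer.
    ctrl.write_command(SEND_CHAR)


ctrl = FakePrinterController(busy_polls=3)
for ch in "Hi":
    print_char(ctrl, ch)
print("".join(ctrl.printed))  # Hi
```

The loop busy-waits on the status register; Chapter 20 discusses interrupt-driven alternatives to this kind of polling.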
The processor accesses the internal registers of an I/O controller through what are known as I/O
ports. An I/O port is simply the address of a register associated with an I/O controller.
There are two ways of mapping I/O ports. Some processors such as the MIPS map I/O ports
to memory addresses. This is called memory-mapped I/O. In these systems, writing to an I/O port
is similar to writing to a memory address. Other processors like the Pentium have an I/O address
space that is separate from the memory address space. This technique is called isolated I/O. In
these systems, special I/O instructions are needed to access the I/O address space. The IA-32
instruction set provides two instructions, in and out, to access I/O ports. The in instruction
can be used to read from an I/O port and the out instruction to write to an I/O port. Chapter 20 gives more
details on these instructions.
The IA-32 architecture provides 64 KB of I/O address space. This address space can be used
for 8-bit, 16-bit, and 32-bit I/O ports. However, the ports together cannot occupy more than the 64 KB I/O
address space. For example, we can have 64 K 8-bit ports, 32 K 16-bit ports, 16 K 32-bit ports, or
a combination of these that fits the 64 K address space.
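The 64 KB budget can be checked with simple arithmetic: an 8-bit port occupies one address, a 16-bit port two, and a 32-bit port four. This sketch (hypothetical function name) only counts addresses. It does not model where the ports are placed or how they are aligned.

```python
IO_SPACE = 64 * 1024  # IA-32 I/O address space: 64 K byte-wide addresses


def fits_io_space(ports_8, ports_16, ports_32):
    """True if the ports fit, counting 1, 2, or 4 consecutive addresses per port."""
    used = ports_8 * 1 + ports_16 * 2 + ports_32 * 4
    return used <= IO_SPACE


print(fits_io_space(65536, 0, 0), fits_io_space(0, 32768, 0), fits_io_space(0, 0, 16384))
print(fits_io_space(1024, 1024, 15616))  # True: 1024 + 2048 + 62464 = 65536
print(fits_io_space(1024, 1024, 15872))  # False: 66560 addresses needed
```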
Systems designed with processors supporting isolated I/O have the flexibility of using
either memory-mapped I/O or isolated I/O. Typically, both strategies are used. For instance,
devices like the printer or keyboard could be mapped to the I/O address space using the isolated I/O strategy;
the display screen could be mapped to a set of memory addresses using memory-mapped I/O.
Accessing I/O Devices As a programmer, you can have direct control over any of the I/O devices
(through their associated I/O controllers) when you program in assembly language. However,
it is often a difficult task to access an I/O device without any help. Furthermore, it is a waste of
time and effort if everyone has to develop their own routines to access I/O devices (called device
drivers). In addition, system resources could be abused, either unintentionally or maliciously. For
instance, an improper disk driver could erase the contents of a disk due to a bug in the driver
routine.
To avoid these problems and to provide a standard way of accessing I/O devices, operating
systems provide routines to conveniently access I/O devices. Linux provides a set of system calls
to access system I/O devices. In Windows, access to I/O devices can be obtained from two layers
of system software: the basic I/O system (BIOS), and the operating system. BIOS is ROM resident
and is a collection of routines that control the basic I/O devices. Both provide access to routines
that control the I/O devices through a mechanism called interrupts. Interrupts are discussed in
detail in Chapter 20.


Summary
We described the Intel IA-32 architecture in detail. Implementations of this architecture include
processors such as Pentium, Celeron, Pentium 4, and Xeon. These processors can address up to
4 GB of memory. This architecture provides protected- and real-mode memory architectures. The
protected mode is the native mode of this architecture. In this mode, it supports both paging and
segmentation. Paging is useful in implementing virtual memory and is not discussed here.
In the real mode, 16-bit addresses and the memory architecture of the 8086 processor are supported. We discussed the segmented memory architecture in detail, as these details are necessary
to program in the assembly language.

PART III
Linux

5
Installing Linux
This chapter gives detailed information on installing Fedora Linux on your system. If your system
already has another operating system such as Windows XP, you can install Fedora Linux as the
second operating system. At boot time, you can select one of the operating systems to boot.
Such systems are called dual-boot systems. If you want to install it as the only operating system,
you can skip some of the steps described in this chapter.
The default software packages installed do not include the compilers and assemblers that we
need for the assembly language programming. We show how software packages can be installed
and removed by using the package management tool provided by Fedora Linux.
We also discuss how files can be shared between the Windows and Linux operating systems. To
share files between these two operating systems, you need to mount a Windows partition so that it
is accessible under Linux. We provide detailed instructions to mount Windows partitions. Toward
the end of the chapter, we give information on how you can get help if you run into installation
problems.

Introduction
This chapter describes the Fedora Core 3 Linux operating system installation process. The book
comes with two DVD-ROMs. The first DVD-ROM (DVD 1) contains the complete Fedora 3 distribution. It is a copy of the distribution available at Red Hat's Fedora Web site (www.fedora.redhat.com). The second DVD-ROM (DVD 2) contains the source code and CD-ROM images.
If you have a DVD-ROM drive, you can install Fedora Core 3 using DVD 1.
If your system does not have a DVD-ROM drive, you can make installation CD-ROMs from the
image files on DVD 2. This DVD-ROM contains three CD-ROM ISO image files: FC3-i386-disc1.iso, FC3-i386-disc2.iso, and FC3-i386-disc3.iso. You can use these files
to burn three CDs. Note that you should not copy these ISO files onto the CDs as if they were data
files. Instead, you have to let the CD writer software know that these are ISO image files. If you
do not have a CD writer application that can burn CD image files, several utilities
are available in the public domain. For example, the BurnCDCC utility from Terabyte Unlimited
(http://www.terabyteunlimited.com/utilities.html) is a freeware utility that allows
you to burn an ISO file to a CD or DVD. In the rest of the chapter, we assume that you are using
DVD 1 to install Fedora Linux.


To install the Linux operating system from the accompanying DVD-ROM, you need to have
a DVD-ROM drive supported by Linux. Linux supports a variety of DVD-ROM drives. In all
probability, your drive is supported. In this chapter, we describe installation of Personal Desktop,
which is a compact system that is targeted for new users. Unfortunately, it does not install all
the software we need. For example, development tools like compilers, assemblers, and debuggers
are not installed. It is, however, simple to add additional software packages using the package
management tool provided by the Fedora 3 distribution. We give detailed instructions on how the
missing packages can be installed.
The installation can be done in several different ways depending on the state of your current
system. If you want to install Linux as the only operating system, it is relatively straightforward.
In fact, you will perform only some of the steps described here.
The most likely scenario is that you want to keep your current Windows operating system such
as XP. This is what we assume in the remainder of this chapter. The steps we describe here will
add Linux as the second operating system. At boot time you can select the operating system you
want to start.
The installation process involves two steps: (i) create enough disk space for the Linux operating system, and (ii) install the Linux system. Of these two steps, the first is the critical
one. Several scenarios are possible here. You may want to isolate your Windows system from
Linux by using a second hard drive. In this case, creating space for Linux is not a problem. Often,
you find that there is a lot of disk space in your existing hard drive. This is typically the case if
you have a recent system with a large disk drive. In this case, you may want to partition your hard
drive to make room for Linux. This is the scenario we describe here. If your situation is different
from what is described here, you may want to get on the Internet for the information that applies
to your system configuration. You can refer to the "Getting Help" section at the end of the chapter
for details on where you can get help. This chapter gives detailed instructions on how you can
partition your hard disk, install the Fedora distribution, and add additional software packages we
need.
When you have more than one operating system, it is often convenient to share files between
the operating systems. One way to share the files is to explicitly copy using a removable medium
such as a memory stick or floppy disk. However, it would be better if we could share the files without
such explicit copying. Before closing the chapter, we describe the procedure involved in mounting
a Windows partition under the Linux operating system to facilitate file sharing.

Partitioning Your Hard Disk
If you decide to partition your existing hard disk for Linux, you can use a commercial product
such as PartitionMagic. It allows you to create new partitions or resize an existing partition.
If your file system is FAT32 (not NTFS), you can also use the parted utility provided on the
accompanying DVD-ROM. If you decide to follow this path, make sure to read the parted
documentation.
Important
Irrespective of how you plan to partition your hard disk, make sure to back up all your files in
case you run into problems. Before you proceed, ensure that the backup is readable. If you
want some degree of added safety, you may want to make two backup copies.

Figure 5.1 Fedora Core 3 initial screenshot. Type linux rescue to access the parted utility to
partition your hard disk.

In this section, we describe three ways of partitioning your hard disk. The first one uses the
parted utility that comes with the Fedora Core Linux distribution. Next, we describe how you
can use the QTparted utility on DVD 2. Lastly, we describe how to use PartitionMagic to partition
your hard disk. You can use parted to resize FAT32 partitions. If your file system uses NTFS,
you can use either QTparted or PartitionMagic.
Using PARTED

To use parted, insert DVD 1 into your DVD-ROM drive and reboot your system. For this to
work, your system should be bootable from the DVD-ROM drive. If not, get into your system's
BIOS to change the boot sequence so that the DVD-ROM drive comes first or right after the floppy drive A (see the
boxed note on page 93 for details on making your system bootable from the DVD-ROM drive).
To access parted, you need to boot in the rescue mode. After booting off the DVD, you will
see the boot prompt screen shown in Figure 5.1. To enter the rescue mode, type linux rescue.
After this, you will be prompted for some hardware choices (keyboard, mouse, and so on). Finally,
when you get the prompt, type parted. You get the (parted) prompt after the GNU
copyright information is displayed, as shown here:
[root@veda root]# parted
GNU Parted 1.6.3
Copyright (C) 1998, 1999, 2000, 2001, 2002 Free Software
Foundation, Inc.
This program is free software, covered by the GNU General
Public License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Using /dev/hda
Information: The operating system thinks the geometry on
/dev/hda is 784/255/63.
(parted)
At the parted prompt, type p or print to see the current partition information. In our
example system, we got the following:
(parted) p
Disk geometry for /dev/hda: 0.000-6149.882 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.031   2000.280  primary   fat32       boot
2       2000.281   6142.038  extended
5       2000.312   4000.561  logical   fat32
6       4000.592   4102.536  logical   ext3
7       4102.567   5828.269  logical   ext3
8       5828.300   6142.038  logical   linux-swap
(parted)
The partition information consists of a minor number, start and end along with the type of
partition and the file system. In our example, Windows XP is on the primary partition (minor 1).
The file system on this partition is FAT32 (this is our drive C:). The other FAT32 partition (drive
D:) is about 2 GB. Let's assume that this is the partition that we want to resize to make room for
Linux. We can use the resize command for this purpose. You can type help to get a command
list:
(parted) help
  check MINOR                            do a simple check on the filesystem
  cp [FROM-DEVICE] FROM-MINOR TO-MINOR   copy filesystem to another partition
  help [COMMAND]                         prints general help, or help on COMMAND
  mklabel LABEL-TYPE                     create a new disklabel (partition table)
  mkfs MINOR FS-TYPE                     make a filesystem FS-TYPE on partititon MINOR
  mkpart PART-TYPE [FS-TYPE] START END   make a partition
  mkpartfs PART-TYPE FS-TYPE START END   make a partition with a filesystem
  move MINOR START END                   move partition MINOR
  name MINOR NAME                        name partition MINOR NAME
  print [MINOR]                          display the partition table, or a partition
  quit                                   exit program
  rescue START END                       rescue a lost partition near START and END
  resize MINOR START END                 resize filesystem on partition MINOR
  rm MINOR                               delete partition MINOR
  select DEVICE                          choose the device to edit
  set MINOR FLAG STATE                   change a flag on partition MINOR
(parted)
You can also get information on a specific command. For example, if you want to know the
format of resize, you can type help resize as shown here.
(parted) help resize
  resize MINOR START END                 resize filesystem on partition MINOR

      MINOR is the partition number used by Linux. On msdos disk labels, the
      primary partitions number from 1-4, and logical partitions are 5
      onwards.
      START and END are in megabytes
(parted)

Figure 5.2 QTparted provides a nice, user-friendly interface similar to the PartitionMagic tool.

To create space for Linux, we resize the minor 5 partition from 2 GB to about 1 GB. This is
done by the following resize command.
(parted) resize 5 2000.312 3000

Notice that we specify 5 as the minor identifying the partition, and its start and end points. To
verify that the partition size has been reduced, we use the print command:
(parted) p
Disk geometry for /dev/hda: 0.000-6149.882 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.031   2000.280  primary   fat32       boot
2       2000.281   6142.038  extended
5       2000.312   2996.499  logical   fat32
6       4000.592   4102.536  logical   ext3
7       4102.567   5828.269  logical   ext3
8       5828.300   6142.038  logical   linux-swap
(parted)

Clearly, the partition has been reduced in size to about 1 GB. Now we can use the freed space for
installing another operating system. Of course, in our example system, Linux is already installed.
But you get the idea of what is involved in resizing a partition to create free space.
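The two listings can be cross-checked with a little arithmetic. In this Python sketch the start and end values of minor 5 are copied from the two print outputs. The cylinder calculation is our own interpretation, not something the parted output states: with the reported geometry of 255 heads, 63 sectors per track, and (assumed) 512-byte sectors, one cylinder is about 7.844 MB, and the requested end of 3000 MB appears to have been rounded down to the nearest cylinder boundary.

```python
# Start and end (in MB) of minor 5, copied from the two "print" listings.
BEFORE = (2000.312, 4000.561)
AFTER = (2000.312, 2996.499)

# Assumed cylinder size from the 784/255/63 geometry and 512-byte sectors.
CYLINDER_MB = 255 * 63 * 512 / 2**20   # about 7.844 MB per cylinder


def size_mb(extent):
    start, end = extent
    return end - start


def round_down_to_cylinder(end_mb):
    """Assumed alignment rule: snap an end point down to a cylinder boundary."""
    return int(end_mb // CYLINDER_MB) * CYLINDER_MB


print(f"before: {size_mb(BEFORE):.3f} MB, after: {size_mb(AFTER):.3f} MB")
print(f"freed:  {size_mb(BEFORE) - size_mb(AFTER):.3f} MB")
print(f"3000 MB snapped to a cylinder boundary: {round_down_to_cylinder(3000):.3f} MB")
```

Under this assumption, 784 cylinders also account for the 6149.882 MB disk size reported by parted.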
Using QTparted

The QTparted partitioning tool provides a nice user interface to parted and other partitioning
programs (see Figure 5.2). The best way to get QTparted is to get the SystemRescueCD
ISO image. For your convenience, this ISO image is on DVD 2. It is distributed under the GNU
General Public License given on page 539. If you want to download the latest version of this
image, which is about 100 MB, you can do so from www.sysresccd.org. Irrespective of how
you got the ISO file, you need to create a CD by burning this image.

Figure 5.3 When the resize operation is selected, this pop-up window allows us to specify the new
partition size.
You can use this CD to boot into a variety of tools, including QTparted. After booting
is complete, enter run_qtparted to launch QTparted. It displays the drives found in your
system. Once you select a drive, it gets the partition information. In our example system, there
are two hard disks, /dev/hda and /dev/hdb, as shown in Figure 5.2. By selecting the second
hard disk hdb, we get its partition information shown in this figure. As shown in this screenshot,
the window is divided into three parts: the left side gives a list of disk drives and information on
the selected disk drive (in our example, on /dev/hdb). The partition information is displayed in
the main window.
The Operations pull-down menu can be used to select an operation. Some of the common
operations are also shown on the toolbar. To illustrate the working of QTparted, we split the
NTFS partition /dev/hdb6 to create about 30 GB of free space. To do this task, we select the
/dev/hdb6 partition and apply the Resize operation from the Operations menu. You could
also apply the resize operation from the toolbar by selecting the icon <->. This pops up the Resize
partition window shown in Figure 5.3. This window shows the free space before as well as after the
partition. In our example, there is no free space on either end. We can specify the new size of
the partition by changing the value or by sliding the size window at the top. In our example, we
reduce the partition to about 35 GB, leaving about 30 GB of free space as shown in Figure 5.4.
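The sizes in the QTparted screenshots are consistent with one another if QTparted counts 1 GB as 1024 MB, which is our assumption. This short Python check uses the 64.75 GB size of /dev/hdb6 shown in Figure 5.2 and the 36036.58 MB entered in the Resize partition window.

```python
MB_PER_GB = 1024  # assumption: QTparted reports 1 GB as 1024 MB

original_gb = 64.75        # /dev/hdb6 before resizing (Figure 5.2)
new_size_mb = 36036.58     # size entered in the Resize partition window

new_size_gb = new_size_mb / MB_PER_GB
freed_gb = original_gb - new_size_gb
print(f"new size ~ {new_size_gb:.2f} GB, freed ~ {freed_gb:.2f} GB")
```

The results, about 35.19 GB and 29.56 GB, match the partition and free-space sizes that QTparted shows after the resize in Figure 5.6.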
Once you click OK, the main window shows the new partition information. However, actual
partitioning is not done. The necessary operations are queued for execution. If you want to proceed
with the resizing operation, you have to commit the changes by selecting Commit from the File
pull-down menu. You can undo the changes by selecting the Undo command from this menu. In
our case, we proceed to commit the resize operation. After this, we get one last chance to change
our mind. Before proceeding to resize the partition, QTparted gives us the warning message
shown in Figure 5.5. If we click "Yes" the operations are executed to resize the partition. The
screenshot in Figure 5.6 clearly shows the free space created by this operation.

Figure 5.4 The resize partition window shows that we want to reduce the NTFS partition to about
36036 MB, or about 35 GB.

Figure 5.5 When we want to proceed with the partition operations, this warning is given before
committing the changes.

[Screenshot: QTparted main window after the resize. Device /dev/hdb, Model ST380011A, Capacity 76319.1 MB, 156301488 sectors. Partitions: hdb1 (extended, 74.52 GB); hdb5 (fat32, 9.77 GB, label SHARE); hdb6 (ntfs, 35.19 GB); free space (29.56 GB).]

Figure 5.6 This screenshot clearly shows the reduced NTFS partition.

Assembly Language Programming in Linux

[Screenshot: PartitionMagic main window. Left panes: Pick a Task (Create a new partition, Create a backup partition, Install another operating system, Resize a partition, Redistribute free space, Merge partitions, Copy a partition) and Partition Operations (Create partition, Delete partition, Resize/Move partition, Convert partition, Split partition, Undelete partition, Properties). Disk 1 holds C: (2,000.2 MB, NTFS) and D: Local Disk (4,149.6 MB, NTFS).]

Figure 5.7 A screenshot of PartitionMagic showing the tasks that it can perform.

Using PartitionMagic

The PartitionMagic tool provides a convenient and friendly interface for partitioning your
hard disk. The QTparted interface is designed to be a clone of PartitionMagic. We can
use PartitionMagic to create a new partition; resize, delete, or move a partition; and so on. In
this section, we describe how an NTFS partition is divided to create free space to install the Linux
operating system.
The initial screenshot of PartitionMagic is shown in Figure 5.7. The left part of the
screen is divided into three panes that can be used to select the tasks, partition operations, and
pending operations. The first pane allows you to pick a task such as resizing a partition. As we
shall see shortly, depending on the task you pick, a wizard will guide you through the process.
We will show this process for the resizing task.
The second pane gives the available partition operations. The third pane shows the pending
operations. PartitionMagic collects all the necessary information before actually performing an operation; until then, the operation remains pending and is listed in this pane.
If you change your mind, you can undo these operations easily. If you want to go ahead
with the pending operations, click Apply to implement them.
In our example, we use a 6 GB disk that contains two NTFS partitions, as shown in Figure 5.7.


[Screenshot: Resize Partitions wizard, introduction page: "This wizard resizes a partition and lets you specify how the resize will affect other partitions on the same disk. If resizing a partition larger, the wizard can take free space automatically from other partitions on that disk. If resizing a partition smaller, the wizard can give free space automatically to other partitions on that disk. Click Tips at any time for helpful information." Buttons: Tips, Back, Next, Cancel.]

Figure 5.8 The Resize wizard helps you with the resizing task.

[Screenshot: Resize Partitions wizard, "Select partition" page: "Indicate which partition you want to resize. You can click on a partition either in the diagram or in the list beneath."]

Figure 5.9 The Resize wizard allows you to select the partition.

In the remainder of this section, let us focus on dividing the second partition (Local Disk D:) to
make free space. To do this, we select the Resize task in the "Pick a Task..." pane. This
selection invokes the Resize wizard shown in Figure 5.8.
The wizard lets you select the partition that is to be resized. In our example, we select the D:
partition (see Figure 5.9). Any time you need help, you can select Tips... for information and
help.

[Screenshot: Resize Partitions wizard, "Specify new partition size" page for D: Local Disk. Current size: 4149.6 MB; Minimum size: (illegible in the screenshot); Maximum size: 4604.6 MB; New size for partition: 2000 MB.]

Figure 5.10 The wizard gives the partition information, including the minimum and maximum sizes
of the partition.

[Screenshot: Resize Partitions wizard, "Give space to which partitions?" page: "Decreasing this partition's size will free up space on the disk. In the list below, check the partitions that the space can be given to."]

Figure 5.11 The space obtained from resizing a partition can be given to other partitions.

The wizard then asks for the size of the new partition. To help you with the selection, it
specifies the minimum and maximum sizes possible for the given partition along with the current
partition size. In our example, the current partition size is 4,149.6 MB (about 4 GB). We can resize this partition
to any size between the minimum and maximum sizes given in Figure 5.10. We selected a new
size of 2,000 MB for the partition (i.e., we are reducing it from about 4 GB to about 2 GB).


[Screenshot: Resize Partitions wizard, "Confirm partition resize" page showing the disk before the resize (C: 2,000.2 MB NTFS; D: Local Disk 4,149.6 MB NTFS) and after it (D: Local Disk 2,000.2 MB NTFS), with the note "The partitions on your disk will be resized as shown above. Click Finish to confirm partition resize."]

[Screenshot: PartitionMagic main window with one pending operation.]

Figure 5.13 Main window with a pending operation to resize the partition.


[Screenshot: Apply Changes dialog reporting that 1 operation is pending and asking "Apply changes now?"]

Figure 5.14 Confirmation window before applying the changes.

We can give the free space obtained by the resizing operation to other partitions. The next
window lets us specify this information. In our example, there is only one other partition (partition
C:), which appears in the window (see Figure 5.11). In our case, we do not want to give the free
space to any other partition, as we want to keep the free space for Linux. Therefore, we deselect
the checkbox next to the C: partition. The final confirmation window shows the "Before" and
"After" picture of the partition (see Figure 5.12). As shown in this figure, we have more than 2 GB
of free space.
Note that the wizard did not actually change anything; it only collected the necessary information in preparation for resizing the partition. As can be seen from Figure 5.13, the resize operation is pending.
If we change our mind, we can undo this operation. On the other hand, if we want to go ahead,
we can apply the changes by clicking the Apply button. Before these
changes are permanently applied, we get one last chance to confirm (see Figure 5.14). The main
window in Figure 5.15 shows the resulting unallocated space in which we can install Linux. It is clear from this
description that QTparted is a clone of the PartitionMagic tool.
We have looked at one particular task that PartitionMagic can perform. As mentioned
before, it provides many more services to manage partitions. For complete details, you should
consult the PartitionMagic user's manual.

Installing Fedora Core Linux
Before the installation, you need to collect certain details on your system hardware. Information
on the following devices is useful during the installation process:
• Keyboard type
• Mouse type
• Video card
• Monitor
• Sound card
• Network card
If you have Windows on your system, you can get most of this information from the Control
Panel. In the Control Panel, select System and then the Hardware tab. In this window, click
Device Manager. Don't worry if you don't have all the information mentioned above. Most of the
time you don't need it. The Fedora installer will do its best to detect your hardware,
but sometimes it fails to recognize a device. In that case, it helps to have this information
handy.


[Screenshot: PartitionMagic main window after the resize, showing C: (2,000.2 MB, NTFS), D: Local Disk (2,000.2 MB, NTFS), and the newly unallocated space; 0 operations pending.]

Figure 5.15 This window clearly shows the unallocated space for installing Linux.

Booting

You begin the installation process by inserting DVD 1 into the DVD-ROM drive and starting your
computer. Note that you must be able to boot from the DVD-ROM drive for the installation
process to proceed. If the boot succeeds, you will see the boot screen shown in Figure 5.1.

No Boot Screen?
If you don't see the boot screen, it is likely that your computer cannot boot from the DVD-ROM
drive. In this case, if you have Windows on your system, the computer proceeds to boot Windows. To make the DVD-ROM drive bootable, restart your computer. As it
starts, look for a message that tells you how to enter the BIOS setup (e.g., by pressing the Del, F1,
or F2 key). Once you are in the BIOS setup, look for "Boot Options" or something similar,
which specifies the order in which devices are tried for booting. It probably lists "A" first and
then "C". That is, it first tries to boot from the floppy drive; if no floppy disk is present, it
boots from the hard disk (drive C). Move the DVD-ROM drive either to the first position
in the list or to just after the floppy drive. This should make your system boot from the DVD-ROM.


[Screenshot: "CD Found" dialog: "To begin testing the CD media before installation press OK. Choose Skip to skip the media test and start the installation."]

Figure 5.16 Media check option screen.

At the boot prompt shown in Figure 5.1, press the Enter key to start the installation in graphical mode. The boot screen also lists some of the other options available to you. If you are having
problems with the graphical mode (e.g., a garbled screen), you may want to start in text mode
by typing linux text at the boot prompt. Note that the graphical mode requires a minimum
of 192 MB of memory (256 MB is recommended), while the text mode requires only
64 MB. In the following description, we assume the graphical mode.

Installation Problems?
Sometimes the installation process hangs, particularly if you have an LCD monitor or are
installing on a laptop. If this happens, try booting with linux nofb to turn off the frame buffer.
For more details, see the release notes at www.fedora.redhat.com/docs/release-notes/fc3/x86/.
Once the mode is selected, you will see a flurry of messages, and the boot process stays in
text mode for a while. During this time, it performs some simple checks and determines your
basic hardware (keyboard, mouse type, video card). It then launches the graphical mode to begin
the media check process.
Media Check

Before proceeding with the installation process, you are given an option to check the media (see
Figure 5.16). If you are using the media for the first time, you should click OK to allow the media
check. It may take several minutes to complete. At the end of the test, it reports
the result (PASS or FAIL). If the media check fails, you need to get a replacement. If
you know that the media is not defective, you can skip this check.
Once the media has passed the test (or if you skipped it), you can press Continue to proceed
with the installation. Next you will see the installation welcome screen shown in Figure 5.17. If
your hardware (mouse, monitor, and video card) is not properly recognized, the Fedora installer
uses default settings that should work, though they may not give the best performance.


[Screenshots: the Fedora Core welcome screen, whose help text explains that you can use the mouse or keyboard to navigate (the Tab key moves around the screen, and the up and down arrow keys scroll through lists), and the keyboard layout list (Czech, Danish, ..., Swiss French, Swiss German, ...).]

Figure 5.19 Keyboard selection screen.

Select the Keyboard

The installer presents you with the screen shown in Figure 5.19 to allow you to select the keyboard
layout of your choice. The selection in this screenshot is the default generic 101-key U.S. English
layout. After selecting the keyboard layout for your system, click Next to proceed.
Select the Type of Installation

Having selected the basic devices, your next step is to select the type of installation you want. At
this point, the installer looks for an existing version. If there is one (e.g., Fedora Core 1), you are
given the option of either upgrading the previous version or installing a new version. If you have a
previous version, select the upgrade option, as it preserves your current data in the system. If
you select the new install option, you lose all your existing data. Make
your selection and click Next to proceed. Here we assume that you did not have a previous version
of Fedora and proceed with the new install option.
Next you have to decide on the type of installation you want. The installer supports the following four types (see Figure 5.20):
• Personal Desktop: This type of installation is suitable for a home PC or laptop. It requires
2.3 GB of disk space and installs the GNOME desktop and other tools appropriate for a
home PC. This is the installation type we use. However, it does not install development
tools such as compilers, assemblers, and debuggers, which we need for
assembly language programming. We will install these packages later.
• Workstation: This install type is similar to the personal desktop installation except that it
installs software development and system administration tools. It requires about 3 GB of
disk space.
• Server: This type installs packages that are needed to run the machine as a server (such
as a file server, print server, and Web server). By default, it does not install the graphical
environment. It needs about 1.1 GB of disk space.


[Screenshots: the Installation Type screen, which warns that an installation destroys any previously saved information on the selected partitions, and the Disk Partitioning Setup screen, which offers automatic partitioning or manual partitioning with Disk Druid.]

Figure 5.21 Disk partition strategy selection screen.

[Screenshots: the Automatic Partitioning screen (choices: remove all Linux partitions on this system, remove all partitions on this system, or keep all partitions and use existing free space); the Boot Loader Configuration screen (GRUB is installed by default, with an optional boot loader password and a choice of the default operating system to boot); and the Network Configuration screen (network devices configured automatically via DHCP).]

Figure 5.26 Network configuration screen.

[Screenshot: Firewall Configuration screen with No firewall and Enable firewall options, checkboxes for services to allow (such as Mail Server (SMTP)), and the Security Enhanced Linux (SELinux) setting.]

Figure 5.27 Firewall setup screen.

Enable firewall: If you are connected to the Internet or to a public network, you should
select this option. With this option, incoming network traffic is refused unless you explicitly
allow a specific service. However, if the system refused all incoming traffic, it could not even receive
the responses it needs to set up its network connection and look up host names. Thus, to
allow basic network setup and Web browsing, the firewall accepts DHCP and DNS replies.


[Screenshot: Additional Language Support screen with a menu for the default language (English (USA)), a list of checkboxes for additional languages, and Select Default Only and Select All buttons.]

Figure 5.28 Additional language selection screen.

No firewall: This option is appropriate, for example, if your computer is not connected to
the Internet.
Additional Language Selection

In addition to the installation language selected before, you can install support for additional languages by clicking their check boxes. Clicking the Select All button
installs all supported languages on your system (see Figure 5.28).
Time Zone Selection

This screen allows you to select the time zone of your location. You can specify the location in one
of two ways. You can click a yellow dot on the interactive map to identify your city (these dots
appear white in Figure 5.29); a red X then appears to indicate your selection. Alternatively,
you can scroll through the location list to select your location.
Root Password Selection

This screen can be used to select a password for the root account (see Figure 5.30). The root
account is special in that it can be used for system administration. It is always a good idea to
keep another account for your day-to-day activities and reserve the root account for administration
purposes only.
Package Selection

The next step in the installation process involves the selection of the packages you want to install.
The installer selects a default set of packages depending on the installation type. The default
package selection for the Personal Desktop type installation is shown in Figure 5.31.


[Screenshots: the Time Zone Selection screen (interactive map, location list, and a "System Clock uses UTC" option) and the root password screen (Root Password and Confirm fields, with help text advising you to use the root account only for administration and to create a non-root account for general use).]

Figure 5.30 Root password selection screen.

If you want to select a different set of packages, or if you want to add some extra packages,
you can choose the "Customize software packages to be installed" option. For example, the default
selection does not install NASM or DDD, which we need. However, in our installation, we stick with
the default selection, as we can easily add the missing packages later.


[Screenshot: Package Installation Defaults screen listing the default package groups (GNOME desktop shell, OpenOffice.org office suite, Web browser, Evolution email, instant messaging, ...) and offering "Accept the current package list" and "Customize software packages to be installed".]

Figure 5.31 Default package selection screen.

[Screenshots: the About to Install screen, which cautions that once you click Next, the installer begins writing the operating system to the hard drive(s) and that this cannot be undone, and the installation progress screen.]

Figure 5.34 Installation process begins by formatting file systems.

The installer has collected the necessary information and is ready to start the installation. This
is your last chance to safely abort the installation process. Click "Next" to proceed with the
installation. After this, the installation proceeds automatically. If you are using the CDs, the
installer prompts you to change the CD a couple of times.
The installation begins by formatting the file system (Figure 5.34). Once the installation is
done, you are prompted to reboot (Figure 5.35). Make sure to remove the media (DVD or CD)
before clicking the reboot button.
Post-Install Configuration

After rebooting the system, you are presented with the screen shown in Figure 5.36. There are a
few more steps to go through before the system is ready for use. These steps are:


[Screenshot: "Congratulations, the installation is complete." Remove any installation media and reboot your system.]

Figure 5.35 You get the congratulatory message when the installation is completed.

[Screenshot: post-install Welcome screen, which asks you to click Next to continue.]

Figure 5.36 Post-install configuration is the last step involved in completing the installation process.

• Accepting the license agreement;
• Setting or confirming the time and date information;
• Setting the display properties, including the resolution;
• Creating a system user: You should not use the root account created during the installation as your regular account. This account should be reserved for system maintenance


[Screenshots: the System User screen, and the GNOME desktop menus (Internet, Office, Preferences, Programming, Sound & Video, System Tools, File Browser, Help, ...) with System Settings entries such as Authentication, Date & Time, Display, Keyboard, Language, Login Screen, Network, Printing, Red Hat Network Configuration, Root Password, Security Level, Soundcard Detection, and Users and Groups.]

Figure 5.38 Invoking the package management tool from the GNOME desktop.

several standard packages and some extra packages. The standard packages are always selected
by default. These packages, which include the gcc C compiler and the gdb debugger, are always
available when the group is installed. In addition, several extra packages are also selected by default. However, nasm and ddd are not among the extra packages selected by default.
To see the package details for the selected group, click Details. This opens a Package Details window that lists the standard packages selected and the extra packages available,
along with their default selection. We scroll down this list to select nasm and ddd, as shown in
Figure 5.41.
To install these packages, close the Package Details window and click Update in the Package
Management window. The tool collects the necessary information and prepares to install the
packages. During this stage, it checks for package dependencies and builds a list of packages to install.
This list includes the packages you have selected and any other packages that are required
by the selected packages. Once this analysis is done, you will
see the prompt shown in Figure 5.42. It gives you the number of packages selected
and the amount of disk space required. If for some reason you want to abort the installation, this
is a good time to do so. If you want to see the packages selected, click the Show Details button.


[Screenshot: Add or Remove Packages, Package Details list (e.g., patchutils - A collection of programs for manipulating patch files) with Package Information for nasm: Size: 608 Kilobytes.]

Figure 5.41 Adding the nasm and ddd packages.

[Screenshot: Completed System Preparation dialog: "97 packages are queued for installation. This will take 309,004 Kilobytes of disk space." Buttons: Show Details, Cancel, Continue.]

Figure 5.42 Once the packages are ready to install, you can view the details by clicking the Show
Details button.

Once you click Continue, the installation process begins. During the installation of the
selected packages, the tool will prompt you for the appropriate Fedora CD. Since we are using the
DVD, you can ignore this message. That's it!
To remove packages that have been installed, you follow the same procedure, except that you
uncheck the packages or groups that you want to remove from the system.

Mounting Windows File System
When you have two operating systems, you will often want to share files between them.
Of course, you can always use a removable medium such as a floppy disk or a memory stick to
transfer files between the two systems. There is a better solution that eliminates the need for file
copying. In this section, we show how you can mount a Windows partition so that Linux can access
it.

[Screenshot: Hardware Browser showing the Hard Drives entry. Drives: /dev/hda (Model: Maxtor 6Y160P0) and /dev/hdb (Geom: 9729/255/63, Model: ST380011A). Disk information for /dev/hdb: hdb1, extended, 76309 MB; hdb5, fat32, 10001 MB; hdb6, ntfs, cylinders 1277 to 5870, 36036 MB; free space, cylinders 5871 to 9729, 30271 MB.]

Figure 5.43 You can use the hardware browser to get information on the partitions.
There is one restriction: Linux is not able to read NTFS partitions, at least not yet. So the partition that you want to share between the Windows and Linux operating systems has to be a FAT32 partition. Even if your Windows system partition uses FAT32, you should not share it, for security reasons. For example, a single command in Linux can wipe out all the Windows files. Windows deliberately hides some system files that you should not normally access, such as the boot.ini file that manages the boot process. In Linux, you see all the Windows files, and you may accidentally modify their contents or delete a file. This is particularly likely during the initial learning stages. Therefore, it is a good idea to create a separate partition for sharing. In our example system, we created a 10 GB partition for this purpose. We mount this partition under Linux so that Linux can access the files stored there by the Windows system.
As a first step, we have to find out the device name assigned to the shared partition. We can use the Hardware Browser to get this information. The Hardware Browser can be invoked from the Applications menu by selecting System Tools and then Hardware Browser. We can use this browser to get information on the system hardware such as CD-ROM drives, floppy disks, hard drives, keyboard, and so on.
In order to run this browser, you need administrative privileges. That means, if you are not running it as root, you will be asked for the root password. To get the partition information that we are interested in here, we select Hard Drives as shown in Figure 5.43. From this information we notice that the 10 GB FAT32 shared partition is assigned /dev/hdb5. To share this partition, we need to mount it.


Mounting a partition involves creating a mount point, which is a directory. In our example, we create a mount point called share in the /mnt directory. Since we have not yet discussed the Linux commands, simply type the following command in the command terminal window¹ to create this directory:
mkdir /mnt/share
After creating this directory, we can mount the partition using the following command:
mount -t vfat /dev/hdb5 /mnt/share
Both commands require root privileges. Of course, you have to replace /dev/hdb5 with the device name of your partition. It is most likely going to be /dev/hdaX, where X is a number. To verify that the partition has been mounted, you can issue the ls command (similar to dir in Windows):
ls /mnt/share
This command displays the files and directories in this partition.
The mount command is not permanent: the partition is no longer mounted after the system is restarted. Of course, you can issue the mount command every time you start the system, but we can avoid this by modifying the fstab file in the /etc directory. You need to append the following line to this file:
/dev/hdb5    /mnt/share    vfat    auto,umask=0    0 0
Once this step is done, the partition is mounted automatically, because the system reads this file every time it boots.
To edit the /etc/fstab file, use the text editor available under the Applications pull-down menu by following Accessories => Text Editor. This is a simple text editor that resembles the Windows WordPad editor. We discuss this editor in the next chapter.
To modify the fstab file you need administrative privileges: only root can open this file in read-write mode, while all other users can open it in read-only mode. So be sure to log in as root to modify this file.
You can use the Open icon to open a file for editing (see Figure 5.44). This pops up the Open File... window to select the file (see Figure 5.45). You can start by double-clicking File system, then the etc directory, and finally the fstab file to open it for editing. The contents of the fstab file in our example system are shown in Figure 5.44.
There are several other editors available in Linux. Some of the popular ones are the vi and emacs editors. We describe the vi editor in the next chapter.

Summary
We have provided a detailed step-by-step description of the Fedora Core 3 installation process. The
installation is a two-step process: creating sufficient disk space for the Fedora system and installing
the operating system. The first step is not required if Linux is the only operating system you want
to install. However, if you want to keep the existing Windows operating system and install Linux
as the second operating system, the first step is necessary. It often involves partitioning the disk to
make room for Linux.
We have introduced three partitioning tools for this purpose:
¹The command terminal can be invoked from the Applications menu under the System Tools submenu. More details on the command terminal are on page 132.


[Screenshot: gedit window showing the modified /etc/fstab file:]

    # This file is edited by fstab-sync - see 'man fstab-sync' for details
    /dev/VolGroup00/LogVol00 /             ext3    defaults        1 1
    LABEL=/boot              /boot         ext3    defaults        1 2
    none                     /dev/pts      devpts  gid=5,mode=620  0 0
    none                     /dev/shm      tmpfs   defaults        0 0
    none                     /proc         proc    defaults        0 0
    none                     /sys          sysfs   defaults        0 0
    /dev/VolGroup00/LogVol01 swap          swap    defaults        0 0
    /dev/hdc                 /media/cdrom  auto    pamconsole,fscontext=system_u:object_r:removable_t,ro,exec,noauto,managed 0 0
    /dev/fd0                 /media/floppy auto    pamconsole,fscontext=system_u:object_r:removable_t,exec,noauto,managed 0 0
    /dev/hdb5                /mnt/share    vfat    auto,umask=0    0 0
Figure 5.44 Contents of the /etc/fstab file after adding the last line to mount the shared partition.

• The parted tool that comes with Fedora is a text-based partitioning tool. It can be used on FAT32 and other types of partitions but not on NTFS partitions. For NTFS partitions, you can use one of the other two tools.
• The second tool, QTparted, works on NTFS partitions as well as others. It provides a user-friendly graphical interface and uses a variety of partitioning tools, including parted. Its user interface closely resembles that of the PartitionMagic tool.
• The last tool we presented in this chapter, PartitionMagic, is a commercial partitioning tool. This tool works with different file systems, including NTFS.
Fedora Core 3 Linux can be installed from the DVD-ROM accompanying this book. If your system does not have a DVD-ROM drive, you can burn CDs from the CD image files provided on the second DVD. We have given detailed instructions to install the Personal Desktop system, which is suitable for new users.
The default software packages selected for this installation type do not include all the software we need. Specifically, the Personal Desktop installation excludes the development tools group. This group includes the C compilers, assemblers, and debuggers that we need for our assembly language programming. However, using Fedora's package management tool, it is rather straightforward to install these development tools. We have given detailed instructions on how you can do this.
Finally, we presented details on sharing files between the Windows and Linux operating systems. The Linux operating system can see FAT32 partitions but not NTFS partitions. For this reason, we suggested creating a separate FAT32 partition for sharing files between the two operating systems. In our example, we set this partition to 10 GB, but you can set it to whatever size is appropriate in your case. We have given step-by-step instructions to mount such shared partitions.


Figure 5.45 Selection of the /etc/fstab file using the Open File window.

Getting Help
We have installed Fedora Linux on five different systems (desktops and laptops) using the procedure described in this chapter. On all these systems, the installation proceeded smoothly. Even though we have given detailed instructions to install the Fedora Linux operating system, it is still possible that you will encounter installation problems. There are several places you can turn to for help.
A good starting point is the extensive and detailed bug report database maintained by Red Hat at http://bugzilla.redhat.com. Here you can enter the bug number (if you know it) or keywords to search for information on your problem.
Several online sources are also available to help resolve installation problems. For example, LinuxQuestions.org maintains several forums for Linux-related issues, including installation problems, at http://www.linuxquestions.org/questions. Another good source is the mailing list maintained by Red Hat at http://www.redhat.com/mailman/listinfo/fedora-list.
You can also use a good search engine such as Google (http://www.google.com) to search the Internet for how others solved your installation problem.

6
Using Linux
Now that you have installed Fedora Linux on your system, it is time to learn the basics of the Linux operating system. This chapter assumes that you are familiar with another operating system such as Windows XP. Our focus is on Fedora 3 Linux. We look at both the graphical user interface (GUI) and the command line interface (CLI) provided by the system. For new users, the GUI provides an easy-to-use, point-and-click interface. However, as you become familiar with the system, the command line interface tends to be more efficient. We discuss the basics of the command line interface and several simple but useful commands. The overview presented here is sufficient to proceed with our goal of learning assembly language programming.

Introduction
Assuming that you are new to the Linux operating system, this chapter gives more details on using Fedora 3 Linux. You have to log in to an account in order to use the Linux system. To log into the system, the login screen first prompts you for your username. Then you will be asked to enter the password for the account. This brings up the GNOME desktop shown in Figure 6.1. This is the default desktop in Fedora 3. The panel at the top contains two pull-down menus: Applications and Actions. The Applications menu provides various applications and system tools. It provides several useful GUI applications including games, graphics, system tools, and system settings (see Figure 6.2a). The Actions menu can be used to run applications, search for files, lock the screen, and log out, as shown in Figure 6.2b.
The icons next to the Actions menu can be used to launch applications quickly. You can click these launch panel icons to launch a Web browser, email reader, word processor, presentations creator, or a spreadsheet. You can customize the launch panel by adding applications of your choice. For example, we have added the command terminal to the launch panel shown in Figure 6.1 (see the icon next to the Actions menu).
The workspace, which appears black in Figure 6.1, displays four shortcuts: Computer, your home directory, Trash, and a USB hard drive labeled PORTABLE. By clicking Computer, you will see the various drives (floppy drive, CD-ROM drives, and hard disks), your file system, and networks. It is a good idea to get familiar with the desktop by playing with the various menus and icons. Later we describe some of the applications available to perform commonly required tasks.


Figure 6.1 Initial Fedora screen with a USB hard drive (PORTABLE).

To log out of your account, you can use the Actions menu as shown in Figure 6.2b. When you select Log Out from the Actions menu, a popup window appears with three options: log out of the account, shut down the system, or restart the system. If you opt for logout, the login screen appears. The other options can be selected to either shut down or restart the system.

Which Account to Use?
During the installation, you created two accounts for yourself: a root account and a system user account. Always use your system user account for non-administrative activities and reserve the root account for special administrative tasks. Note that most administrative tasks can also be done from your system user account. If a task requires administrative privileges, the system will ask you for the root password. On the other hand, if you log in as root, the system gives you permission to do whatever you want. This can lead to mishaps that you did not anticipate.

Chapter 6 • Using Linux


[Screenshot: (a) the Applications menu, with the entries Accessories, Games, Graphics, Internet, Office, Preferences, Programming, Sound & Video, System Settings, System Tools, File Browser, Help, and Network Servers; (b) the Actions menu, with the entries Run Application..., Search for Files..., Recent Documents, Take Screenshot..., Lock Screen, and Log Out.]

Figure 6.2 The Applications and Actions pull-down menus of the GNOME desktop.

You can run your programs either by using the desktop or from the command line interface. The first part of the chapter describes several applications to manage the system. The later part concentrates on the command line interface.

Setting User Preferences
When you installed Fedora Linux, you already configured several of your system devices such as the display and keyboard. It is also straightforward to configure these devices after the installation. This configuration can be done from the Applications pull-down menu under the Preferences menu as shown in Figure 6.3. Next we look at some of these tools.
Keyboard Configuration Tool
Figure 6.4 shows the keyboard configuration tool window. It provides four functional areas: Keyboard, Layouts, Layout Options, and Typing Break. In the Keyboard area, you can set two main options:
• You can decide if you want the repeat-key functionality when a key is pressed and held down. To enable this functionality, select the first checkbox as shown in Figure 6.4. If this option is enabled, you can select the initial delay and the rate of repetition by using the two sliders.


[Screenshot: the Preferences submenu, with entries including Accessibility, More Preferences, Desktop Background, Font, Keyboard, Keyboard Shortcuts, Menus & Toolbars, Mouse, Network Proxy, Password, Remote Desktop, Removable Storage, Screen Resolution, Screensaver, Sound, Theme, and Windows.]

Figure 6.3 The Preferences menu.

• The second checkbox allows you to make the cursor blink in fields and text boxes. You can use the slider to select the cursor blink frequency. You can test the setting by typing sample text in the test area.
The Layouts tabbed window can be used to select your keyboard model. The default is the generic 105-key PC keyboard. This window also allows you to add or remove keyboard layouts. By default, the U.S. English layout is selected.
The Layout Options window allows you to select several options for the behavior of various keys such as Alt and CapsLock.
The Typing Break tabbed window can be used to set typing break preferences. You can set how long you want to work and how long the breaks should be. For example, you can choose to work 30 minutes and take a 3-minute break. The system will lock the screen to force you to take the 3-minute break after 30 minutes of typing. There is also a checkbox that allows you to postpone the breaks.


Figure 6.4 The keyboard configuration window.

Mouse Configuration Tool

The mouse configuration tool window is shown in Figure 6.5. It has three functional areas to set the preferences. The Buttons window can be used to set the mouse orientation (left-handed or right-handed) as well as the double-click timeout period.
Use the Cursors tabbed window to select the cursor size (small, medium, or large). The changes you make take effect the next time you log in. You can also select the option that highlights the cursor when you press the Ctrl key. This option is helpful for locating the cursor.
The Motion window can be used to set the motion preferences. It provides two sliders to set the speed of the mouse pointer and the sensitivity of the mouse pointer to the movement of the mouse. It also has a third slider to specify the distance you must move an item in order for the move to be interpreted as a drag-and-drop action.
Screen Resolution Configuration

You can use the screen resolution tool to set the resolution of your screen. It allows you to select the resolution from the drop-down list (see Figure 6.6). You can also set the refresh rate for your screen. Once the selection is made, you can click the Apply button. The screen reflects your selection and asks whether you want to keep the new resolution or revert to the previous one. In general, the installer does a good job of selecting the resolution and refresh rate appropriate for your screen during the installation.


Figure 6.5 The mouse configuration window.


Figure 6.6 The screen resolution window.

Changing Password

You can use the password tool to change the user password. If you want to change your root account password, you should use the root password change tool available in the System Settings menu. The tool first requests your current password (see Figure 6.7). It then prompts you to enter the new password as shown in Figure 6.8. You are asked to reenter the new password to make sure that you did not make a mistake in entering it (Figure 6.9). The new password will be effective for your next login.


Figure 6.7 Changing the user password—screen 1.


Figure 6.8 Changing the user password—screen 2.

Screensaver

The Screensaver tool can be used to control the behavior of the screensaver, display power management, and so on. The functionality is divided into two groups, Display Modes and Advanced, as shown in Figure 6.10. The Display Modes tabbed window is used to enable and control the behavior of the screensaver. The screensaver is activated either when the system is idle (when there is no mouse or keyboard activity) for a specified period of time, or when the screen is locked. Note that you can lock your screen by using the Actions menu (see Figure 6.2b on page 117).
The Mode drop-down menu gives you four options:
• Disable Screen Saver: Select this option if you don't want the screensaver.


Figure 6.9 Changing the user password—screen 3.


Figure 6.10 The Display Modes tabbed window can be used to enable and control the screensaver behavior.

• Blank Screen Only: This option not only enables the screensaver but also selects a blank screen as your screensaver. This option is shown in Figure 6.10.
• Only One Screen Saver: This option allows you to select a single screensaver from the scroll-down display list. The selected screensaver is displayed in the test area. The Settings button allows you to customize the parameters of the selected screensaver. You can preview the selected screensaver by clicking the Preview button. You can exit the preview mode by pressing any key or clicking a mouse button.
• Random Screen Saver: Select this option if you want the system to rotate among several screensavers selected from the scroll-down display list. The Cycle After field allows you to select how long each screensaver is used before switching to another one.
When the screensaver display is enabled (i.e., if you select any of the last three options), you can specify the idle time period before the screensaver is activated. You can set this period in minutes in the Blank After field.


Figure 6.11 The Advanced tabbed window can be used to select the display power management
options.

If you want to lock your screen after the screensaver is activated, select the Lock Screen After checkbox and enter the delay between screensaver activation and locking of the screen.
The Advanced tabbed window can be used to specify the display power management options as well as the others shown in Figure 6.11. If you enable power management, you can specify the standby, suspend, and off periods. In the standby mode, the screen is blank. In the suspend mode, the display enters a power-saving mode. The off period indicates the waiting time before the display is turned off.

System Settings
The system settings menu provides several services to control the system behavior. Since most of
the tools in this menu control the behavior at the system level, these tools require root privileges.
If you are logged into the system using your system user account, you will be prompted for the
root account password before proceeding with the changes.


Figure 6.12 The System Settings menu.

The tools provided by the System Settings menu are shown in Figure 6.12. In the previous chapter, you saw how the Add/Remove Applications tool can be used to install new software packages. This tool makes managing packages easy by checking the package dependencies and automatically installing all the necessary packages. Using the System Settings menu, you can also change the root password, specify the security level, manage user accounts, and so on. In this section we show how the date and time as well as the display properties can be set. We leave the other services in this menu for you to explore.
Setting Date and Time

You can set the date and time by using the Date & Time properties tool. It provides a convenient calendar interface to set these properties (see Figure 6.13). You saw this type of interface during the post-installation setup. As shown in this figure, there are three tabbed windows. The first window can be used to set the time and date. The date can be specified by using the left and right arrows on the month and year. The time can be set by entering its three components: hour, minute, and second.
The second tabbed window lets you use the Network Time Protocol to synchronize your computer clock with a time server. This synchronization is useful because your computer clock drifts away from the actual time. The amount of drift depends on various factors, including the temperature. Drift is measured in PPM (parts per million); one PPM corresponds to 0.0001%. Since a day has 86,400 seconds, a drift of approximately 11.57 PPM means a difference of 1 second per day.
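The 11.57 PPM figure follows from a short calculation. Here r is a symbol introduced only for illustration, standing for the drift rate in PPM:

```latex
\begin{aligned}
1\ \text{PPM} &= 10^{-6} = 0.0001\,\% \\
1\ \text{day} &= 24 \times 60 \times 60\ \text{s} = 86\,400\ \text{s} \\
\text{error per day} &= 86\,400\ \text{s} \times r \times 10^{-6} \\
r &= \frac{10^{6}}{86\,400} \approx 11.57\ \text{PPM} \;\Longrightarrow\; \text{error per day} \approx 1\ \text{s}
\end{aligned}
```

For example, a clock drifting at 50 PPM is off by about 86,400 × 50 × 10⁻⁶ = 4.32 seconds per day.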
The Network Time Protocol (NTP) is designed to synchronize computer clocks, which is important when communicating with other computers. NTP uses UTC (Coordinated Universal Time) as the reference time. UTC is an official standard that evolved from GMT (Greenwich Mean Time). You can use this tabbed window to specify several options, including whether you want to use NTP at all.


Figure 6.13 Setting the date and time.

The third tabbed window can be used to specify the time zone. You set the time zone during the installation (see Figure 5.29 on page 103). This window provides the same screen as that in Figure 5.29.
Setting Display

As with the Date & Time tool, setting the display properties requires root privileges. The tool has three tabbed windows, Settings, Hardware, and Dual head, as shown in Figure 6.14. The Settings window lets you specify the screen resolution and color depth. You saw a similar screen during the post-installation setup. You have also set the display resolution at the user level before (see Figure 6.6).
The Hardware tabbed window allows you to configure the monitor type and the video card. The Configure... button displays a large list of monitors and video cards supported by Fedora 3. If your display or video card is not supported, use a generic type that closely matches your hardware. In general, though, the installer does a pretty good job of detecting your monitor type and video card or selecting appropriate generic settings.
The third tabbed window can be used to enable and configure two displays. This window lets you configure the second video card and set the second screen resolution and color depth. In addition, you can select a desktop layout for the two screens: either individual desktops or a spanned desktop. In the spanned desktop, your desktop is split between the two screens.


Figure 6.14 The display configuration window.

Working with the GNOME Desktop
Fedora 3 supports two types of desktops: GNOME and KDE. The GNOME desktop is the default
desktop and this is the one you have installed. Let's get familiar with this desktop before looking
at the command line interface details.
If you have used My Computer in Windows XP, you have an equivalent one here (see the
Computer icon in Figure 6.1). You can launch the Nautilus graphical tool by double-clicking
the Computer icon. This tool provides an intuitive interface to manage the file system and other
resources in your computer. In our example system, this tool shows four icons as we have a USB
hard disk drive (PORTABLE) attached to the system (see Figure 6.15).
Browsing the File System
You invoke the Nautilus file manager by selecting File Browser from the Applications main menu (see Figure 6.2a). The file manager is useful to navigate and manage the file system; you can also use it to browse Web pages and play multimedia content.
This interface looks somewhat similar to the Windows Explorer you are familiar with (see Figure 6.16). The File menu allows you to create a new folder or a document, open or browse a folder, and so on. The Edit menu supports the standard editing actions such as cut, copy, paste, rename, and so on. In addition, you can use Preferences to set the file management preferences. For example, it is possible to select single-click or double-click to activate an item. Similarly, you can specify whether executable files should run when they are clicked.

Figure 6.15 The Nautilus file manager can be used to manage resources in your computer system.
You can customize the window by using the View menu. This menu lets you specify how the contents of the window should be viewed (as a list, icons, or catalog). In addition, it allows you to specify whether you want the Location bar above the main area, the Side pane on the left-hand side, and the Status bar at the bottom of the window. The Go menu lets you visit different locations in your file system and various Web sites, create CDs, and so on. The Bookmarks menu can be used to add and edit bookmarks.
You can use the Location bar to specify the location you want to go to. You can enter the URL of a Web site or a location in your file system. For example, in Figure 6.16, the location bar shows /home/sivarama and the main window shows the contents of this location.
The icons in the toolbar let you move around the directories and Web sites you have visited. The Up arrow can be used to move up in the directory structure. The Back and Forward buttons work as in a typical Web browser. The Reload button refreshes the content. The Home button takes you to your home directory (in our example, /home/sivarama is the home directory). The Computer button displays the content shown in Figure 6.15.
Editing with GEDIT

gedit is a simple text editor that provides functionality somewhat similar to WordPad in Windows. It can be invoked from the Accessories submenu available from the Applications main menu as shown in Figure 6.17.
The gedit window, shown in Figure 6.18, consists of the following components.
The Menubar at the top of the window contains several pull-down menus that provide commands to open and edit text files. The File menu has commands to manipulate files (open, create, save, or save as), print files, page setup, print preview, and quit. The Edit menu has the standard edit commands such as undo, redo, cut, paste, copy, and delete. In addition,


Figure 6.19 The Run Application window.


Figure 6.20 The office applications software suite.

application allows you to open and process Word documents conveniently without going back to your Windows system.
OpenOffice Calc is a spreadsheet application that can import and modify Microsoft Excel spreadsheets. As with OpenOffice Writer, Calc can also save a spreadsheet in the Excel format. When a spreadsheet is stored in the native format, you can password-protect the file.

Chapter 6 • Using Linux

131

Figure 6.21 The Mozilla Firefox Web browser.

Are you wondering if there is a Microsoft PowerPoint equivalent? The answer is OpenOffice Impress, which lets you create presentations. It can read PowerPoint files, and you can save your presentations in the PowerPoint format as well. As with the last two applications, you can password-protect the files stored in the native format.
As shown in Figure 6.20, there are also other applications such as Draw for drawings, Math for equations, Dia for flowcharts, and Project Planner. For example, Dia is convenient for drawing technical diagrams such as UML diagrams, flowcharts, and so on.
Connecting to the Internet

The applications to connect to the Internet are available under the Internet submenu of the Applications menu (see Figure 6.2a on page 117). Here we briefly mention two commonly used applications: a Web browser and an email client. The system installs Mozilla Firefox as the default Web browser. You can invoke it either from the Panel or from the Internet submenu. To invoke it from the Panel, click the globe-and-mouse icon at the top of the desktop. This Web browser is a derivative of the Netscape Web browser (see Figure 6.21). Because of this relationship, you will see many similarities between the Mozilla and Netscape browsers. You can also run Firefox on your Windows system by downloading the Windows version from http://www.mozilla.org.
The other application we mention here is the Evolution email client, which you can use to access your email (see Figure 6.22 for its screenshot). Again, you will see similarities between this client and Netscape's email client. If you are interested, Mozilla has its own email client called Thunderbird. You can download Thunderbird from the Mozilla site mentioned before.

Figure 6.22 The Evolution email client.

Command Terminal
Once you are familiar with the Linux operating system, you are likely to spend more time with the terminal emulator shown in Figure 6.23. This is the equivalent of the Command Prompt in Windows. The terminal window can be invoked from the System Tools submenu of the Applications menu. Since you are likely to prefer this interface as you gain experience with the system and its commands, you may want to add it to the panel for single-click invocation, as in Figure 6.1. Note that you can add an application to the panel by right-clicking on it and selecting the Add this launcher to panel option.

Figure 6.23 The terminal emulator window.

The terminal emulator is much more flexible than the Windows Command Prompt. Each terminal can be defined to have its own profile. A default profile is used to open the initial terminal. A profile defines the various characteristics of the terminal window, including the colors, font, scrollbar type, and so on. The File menu can be used to open a new terminal, define a new profile, and close a window.
The terminal emulator supports a tabbed window feature that allows multiple terminals to share a single window. For example, Figure 6.23 has three terminals sharing the same window: List is used to display a program's source code, Run is used to execute the program and check its output, and Man is used to look at help information ("man" pages are discussed in the next section). You can easily switch from one terminal to another by selecting its tab. You can use the File menu to close each individual terminal as well as to open a tabbed terminal.
The Edit menu can be used to edit the current profile, copy and paste, and manage profiles and keyboard shortcuts. The View menu is useful to specify the font size (zoom in, zoom out, normal size), whether the menubar should appear, and whether the terminal should occupy the full screen. You can use the Terminal menu to change the profile and title (for example, we used List as the title of a terminal in Figure 6.23). In addition, you can use this menu to specify the character encoding and to reset the state of the terminal if you are having problems with it. The Tabs menu lists all the tabbed terminals and allows you to navigate through them.
The terminal window is useful for entering commands that invoke both GUI and non-GUI applications. For example, you can invoke gedit to edit the sample.txt file by entering the following command in the terminal window:
gedit sample.txt
Here is another example. The command
gnome-terminal
launches another terminal window. Since the terminal emulator requires commands to specify the work to be done, this interface is often called the command-line interface (CLI). Thus, we have two main interfaces to interact with the system: GUI and CLI.
What are the pros and cons of these interfaces? For beginners, the GUI is easier to use than the CLI because of its point-and-click strategy. The main problem with the CLI is the learning curve associated with it: you need to remember various commands and their syntax. In contrast, the GUI makes the available options visible to the user. However, it is time consuming, as the selection of these options often requires traversing a hierarchy of menus. In particular, if you know which command to use and its syntax, it is faster to type the command than to use menus. This is typically the case with experienced users. As we shall see in the rest of the chapter, it is fairly straightforward to develop simple scripts that combine several commands to accomplish a complex task. For example, you can feed the output of one command as input to another command. In general, experienced users tend to prefer the CLI, whereas new users prefer the GUI for its ease of use.
In the remainder of the chapter, we focus on the command-line interface and look at various commands you can use in the terminal window.

Getting Help
Help on the Linux commands is particularly needed with the command-line interface. The Linux manual pages ("man pages") provide information on the various commands. These man pages are divided into several sections, as shown in Table 6.1. Most of the commands executed by users are described in Section 1. Section 2 gives information on the system calls provided by the kernel. Section 3 describes the library functions of languages such as C and FORTRAN. Special files in the /dev directory are described in Section 4. Section 5 describes file formats and protocols. Section 6 gives information on the games available. Section 7 describes conventions, character set standards, the file system layout, and other miscellaneous items. The system administration commands, described in Section 8, are generally meant for the root, or superuser.

Table 6.1 Sections of the Linux manual

Section   Description
1         User commands
2         System calls
3         Library calls
4         Devices
5         File formats
6         Games
7         Miscellaneous
8         System administration tools
The man command can be used to access the man pages. Its syntax is simple: just type man followed by the command name. For example, to get information on gedit, you can enter the man command as
man gedit
Of course, you can use
man man
to get information on how to use the man command itself. This command displays the information shown in Figure 6.24. You can use the Spacebar key to scroll forward through the document and the b key to scroll backward (the up and down arrow keys also work). You can use the Enter key to scroll line by line. To quit the document, press the q key.

Figure 6.24 Manual page entry for the man command.

Some commands appear in more than one section. For example, the passwd command can be used to change your account password, so information on this command is included in Section 1. There is also another entry for passwd in Section 5, which describes the /etc/passwd file maintained by the system. To resolve this ambiguity, you can include the section number in the man command. For example, to get information on the passwd file in Section 5, we enter the man command as
man 5 passwd
All man pages follow a very simple format. Often, the description given is very cryptic. As a new user, you may not find the man pages all that helpful. But as you get used to the various commands, you will find the man pages useful as a reference that gives concise information on the command syntax and the various options available.

Some General-Purpose Commands
In this section, we introduce some common commands that are useful for a beginner. Our description of these commands is rather brief. Of course, you can use the man command to get more information on them.
Before we proceed further, we need to introduce the shell. For our purposes, the shell can be thought of as the user's interface to the operating system: it acts as the command-line interpreter. Several popular shells are available, including the Bourne shell (sh), C shell (csh), Korn shell (ksh), and Bourne Again shell (bash). Since bash is the preferred shell in Linux systems, we assume that you are using it. Furthermore, bash is the default shell in Fedora 3.


When you type a command, you don't have to specify the location of its executable program. The shell searches for the program associated with the command in the directories listed in a special environment variable, PATH. This variable essentially defines your search path. Later we show how you can look at the contents of your PATH variable.
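The search just described can be sketched as a small bash function. This is a simplified illustration, not how bash is actually implemented: find_in_path is a hypothetical name, and the real shell also checks aliases, functions, builtins, and its table of remembered (hashed) locations before it walks through PATH.

```bash
# find_in_path: hypothetical helper that mimics the shell's PATH search.
# Prints the full path of the first executable file named $1, or returns 1.
find_in_path() {
    local name=$1 dir
    local IFS=:                          # PATH entries are separated by colons
    for dir in $PATH; do                 # directories are tried from left to right
        [ -z "$dir" ] && dir=.           # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"   # the first match wins
            return 0
        fi
    done
    return 1                             # not found in any PATH directory
}

find_in_path ls    # for example /bin/ls or /usr/bin/ls, depending on the system
```

Because the first match wins, the order of the directories in PATH matters: a program in an earlier directory hides one with the same name in a later directory.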
Entering and Editing Commands

Command Line Completion The bash shell provides a command-line completion feature that helps us greatly. Using this feature, you don't need to type the complete command, just enough for the shell to identify it uniquely. The shell completes the command when you press the Tab key. For example, when we type
ged<Tab>
the shell completes the command name as gedit. Suppose we have a file sample.txt in our directory. If there is no other file whose name starts with s, we can save a few keystrokes by typing
ged<Tab> s<Tab>
to enter the command
gedit sample.txt
Recalling a Command The shell maintains a record of your commands in a history file. Every time you enter a command, the complete command is added to this history. The commands are numbered in the order in which they were entered, so the most recent command is at the end of the list. You can take a peek into this list by using the history command. When this command is used without any options, it gives the full list of commands in the history. If you want to see only the most recent n commands, enter history n. For example, the command
history 4
displays the last four commands, including the current history command:
   93  man man
   94  man 5 passwd
   95  gedit sample.txt
   96  history 4
Each command is displayed with its number in the history list. You can use these numbers to execute the corresponding command. For example, to run the man 5 passwd command, you type !94 at the prompt. You can run the last command by typing !!. The form !string runs the most recent command that starts with string, whereas !?string? runs the most recent command that contains string anywhere (the closing ? can be omitted at the end of the line). For example, given the previous history, the command
!dit
results in the following error, because no command starts with dit:
bash: !dit: event not found
However, by modifying the command to
!?dit
the shell successfully executes the command
gedit sample.txt

You can also access the commands in the history list with keys. Here are some examples:
• Use the up (↑) and down (↓) arrow keys to navigate the history list. Alternatively, use Ctrl-p to go to the previous command and Ctrl-n to go to the next command.
• You can use Ctrl-r to search the command history incrementally in reverse. Once you press Ctrl-r, you are prompted for a search string. As you enter the search string, a matching command appears. This is why it is called "incremental search": it does not wait for the complete string to be typed.
Sometimes you don't want to execute the command as is. You may want to modify it before
running it again. To do this, you need to edit the command. This is what we are going to discuss
next.
Editing Commands The shell provides several shortcuts for editing a command line. Use the left (←) and right (→) arrow keys to move the cursor on your command line. You can also use Ctrl-b to move the cursor back by one character and Ctrl-f to move it forward by one character. When you enter text, it is inserted at the current cursor position. The Backspace key erases the character before the cursor. For example, suppose you typed the following command:
gedit samples.txt
Then you notice that you entered the wrong file name (samples instead of sample). To delete the s, use the left arrow key to move the cursor to the period and press the Backspace key. Then you can simply press Enter to execute the command. Table 6.2 gives a list of keystrokes that allow you to navigate and edit command lines.
Changing Password
You have seen how your password can be changed by using the Applications main menu of the GNOME desktop. You can also change your password from the command-line interface. To change your current account password, just type the command passwd. It first asks for your current password and then prompts you to enter the new password twice. If you are the root, you can also specify a user name; thus, the root can change the password of any account in the system.
Locating a Program
Two commands are available to find the location of a program. The which command finds the location of a file within the directories listed in your PATH variable. The whereis command looks for the files in a set of standard directories; it is not restricted to searching only the directories listed in your PATH variable.
Table 6.2 Some of the keystrokes for navigating and editing command lines

Keystroke   Action
Ctrl-b      Move cursor back by one character
Ctrl-f      Move cursor forward by one character
Alt-b       Move cursor back by one word
Alt-f       Move cursor forward by one word
Ctrl-a      Move cursor to the beginning of the command line
Ctrl-e      Move cursor to the end of the command line
Ctrl-l      Clear the screen and leave the command line at the top of the screen
Ctrl-d      Delete the character at the cursor position
Backspace   Delete the character before the cursor position
Ctrl-t      Transpose the current and previous characters

Miscellaneous Commands
If you want to find out which users are logged into your system, use the who command. The uname command gives the name of the operating system running on your system. The echo command displays a line of text. For example, the command
echo $PATH
can be used to see the directories listed in your PATH variable.
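Because PATH is a single colon-separated string, the output of echo $PATH can be hard to read. A minimal sketch, assuming the standard tr utility (not otherwise covered in this chapter), prints one directory per line. The lines appear in the order in which the shell searches the directories.

```bash
# Print each directory in PATH on its own line, in search order.
# tr translates every colon (the PATH separator) into a newline.
echo "$PATH" | tr ':' '\n'
```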
The ps command can be used to see the processes running on the system. By default, it lists the processes that have the same user id as the current user and are associated with the same terminal. It displays the process id (PID), the terminal associated with the process (TTY), the cumulative CPU time (TIME), and the command name (CMD). You can also specify several options to get more detailed information.
The last command we discuss here allows you to become the superuser (i.e., root) without explicitly logging in as the root. Often, when you are in your regular user account, you may need to do a small administrative chore that requires root privileges. Instead of logging out of your current account and logging in as the root, you can use the su command to assume the root identity. For example, you can use the su command as shown below:
$ su
Password:
#
The su command asks for the root password. If you give the correct password, it changes the prompt from $ to # to indicate that you are now the root. Then, for example, you can edit the fstab file. Recall from Chapter 5 that only the root can modify this file. To edit the file, you can use the following command:
gedit /etc/fstab
To leave the superuser shell and return to your previous shell, use either exit or Ctrl-d. As usual, you can get more details on this command by using the man command.

Figure 6.25 The file system is a hierarchy of directories. The root directory is represented by a slash (/).

File System
The Fedora 3 file system provides the necessary structure to store information. While the file system supports several types of files, here we focus on ordinary files and directories. The file system is organized as a hierarchy of directories (similar to that in Windows). Since you are familiar with hierarchical file systems, we present only brief details of the Fedora file system.
The root directory of the file system is represented by a slash (/), as shown in Figure 6.25. At the next level, you see a set of common directories such as bin/, etc/, home/, and so on. The /home directory contains the user directories. In the example, we show three user directories: sivarama/, veda/, and sobha/. Each of these directories may have other directories or files. Figure 6.25 shows the subdirectories under the sivarama directory.
Path Names

You can uniquely specify each file and directory in the file system by its path. A path is simply the list of directories from the root directory (/) down to the file. For example, the path of sample.txt in Figure 6.25 is
/home/sivarama/sample.txt
This is called the absolute path because it specifies where the sample.txt file is within the file system, starting from the root. An absolute path always begins with the root directory (/). In contrast, a relative path specifies the path relative to your current directory. We discuss later how you can specify a relative path.
You can always find your home directory by displaying the value of the HOME environment variable, as in the following command:
$ echo $HOME
/home/sivarama


On the command line, the tilde (~) represents your home directory. For example, you can specify the path of sample.txt as
~/sample.txt
Next we look at a few directory commands.
Directory Commands
To find out your current directory, use the pwd command. For example, you will see
$ pwd
/home/sivarama
if you are currently in the sivarama directory. Use the cd command to change the current directory. The current directory is represented by a dot (.) and the parent of the current directory by two dots (..). For example, cd .. makes the parent directory your current directory. Here is another example. If your current directory is /bin, you can refer to the sample.txt file as
../home/sivarama/sample.txt
This is a relative path, as opposed to the absolute path we gave before.
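To try absolute and relative paths without touching the real /home, the following sketch rebuilds a tiny part of Figure 6.25 in a scratch directory created by mktemp -d (a utility not covered in this chapter). The variable top plays the role of the root directory. The file contents are made up, and mkdir -p is an option that also creates any missing parent directories.

```bash
# Scratch copy of part of Figure 6.25; 'top' stands in for the root (/).
top=$(mktemp -d)
mkdir -p "$top/home/sivarama" "$top/bin"   # -p also creates missing parents
echo "hello from sample.txt" > "$top/home/sivarama/sample.txt"

cd "$top/bin" || exit 1                    # current directory: .../bin
pwd                                        # absolute path of the current directory
cat ../home/sivarama/sample.txt            # relative path: up one level, then down
cat "$top/home/sivarama/sample.txt"        # absolute path to the same file

cd ..                                      # .. is the parent directory
pwd                                        # now the same as $top

echo ~                                     # the tilde expands to your home directory
```

Both cat commands print the same line, because the relative and absolute paths name the same file.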
Next we look at some commands to navigate and access the directories and files. The ls command lists the contents of a directory. If you don't specify a directory, the current directory is the default.
To create a new directory, you can use the mkdir (make directory) command. For example,
mkdir courses
creates the courses directory in your current directory. If you want to remove an empty directory, you can do so with the rmdir (remove directory) command. For example, the command
rmdir courses
deletes the directory we just created. If the specified directory is not empty, rmdir will not delete it. To delete a non-empty directory, you can first empty the directory by deleting its contents (files and other subdirectories) and then delete the directory. There is also a convenient way of deleting a non-empty directory by using the rm command, which is discussed in the next section.
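The following sketch runs in a scratch directory created with mktemp -d, which is an assumption of this example. It shows rmdir succeeding on an empty directory, refusing a non-empty one, and succeeding again once the directory has been emptied. The file name notes.txt is made up, and touch simply creates an empty file.

```bash
cd "$(mktemp -d)" || exit 1     # work in a scratch directory

mkdir courses                   # create an empty directory
rmdir courses                   # succeeds: courses is empty

mkdir courses
touch courses/notes.txt         # touch creates an empty file, so courses is not empty
if ! rmdir courses 2>/dev/null; then
    echo "rmdir refused: courses is not empty"
fi

rm courses/notes.txt            # first empty the directory...
rmdir courses                   # ...then rmdir succeeds
```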
File Commands
Several commands are available to view the contents of files. The cat (concatenate) command displays the contents of the specified files. You can specify more than one file. For example, the command
cat sample.txt test
displays the contents of sample.txt followed by the contents of test.
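Here is a small sketch of cat's concatenation behavior. It creates two throwaway files in a scratch directory using echo and output redirection (>), neither of which has been discussed yet in this chapter, so treat those lines as setup only.

```bash
cd "$(mktemp -d)" || exit 1                     # scratch directory
echo "first line, from sample.txt" > sample.txt
echo "second line, from test" > test

cat sample.txt test     # prints both lines: sample.txt first, then test
```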
If the file is large, you may want to control how its contents are displayed. Several commands give you this control. The more command displays the contents of a file one screen at a time. To scroll the screen by a single line, press the Enter key. To scroll to the next screen, use the Spacebar key.
A problem with more is that it allows only forward movement: you cannot go back. This is remedied by the less command, which allows both forward and backward movement. In addition, the less command doesn't wait to read the whole file before displaying its contents, so it starts faster if the file is very large. For large files, you can also use head to view the first part of the file and tail to view the last part. You can use the man command to find out more details on these commands.
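The more and less commands are interactive, so they are hard to show in a script, but head and tail are not. The sketch below assumes GNU coreutils, which provides seq for printing a sequence of numbers. It uses the -n option to choose how many lines to show. Without -n, both head and tail show 10 lines.

```bash
cd "$(mktemp -d)" || exit 1    # scratch directory
seq 1 1000 > big.txt           # a 1000-line file: the numbers 1 to 1000, one per line

head -n 3 big.txt              # the first three lines: 1, 2, 3
tail -n 2 big.txt              # the last two lines: 999, 1000
head big.txt                   # without -n, head shows the first 10 lines
```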
The cp (copy) command copies files and has the following format:
cp from to
A path can be specified for from and to. If no path is given, the current directory is the default. Here is an example that copies sample.txt to test:
cp sample.txt test
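As a sketch in a scratch directory (the file contents and the backup directory name are made up), the example below copies a file under a new name. It then copies the same file into a directory. When to names an existing directory, the copy keeps the original file name.

```bash
cd "$(mktemp -d)" || exit 1     # scratch directory
echo "some text" > sample.txt

cp sample.txt test              # from = sample.txt, to = test (both in the current directory)
cat test                        # same contents as sample.txt

mkdir backup
cp sample.txt backup/           # to is a directory, so the copy is backup/sample.txt
ls backup                       # lists sample.txt
```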
Instead of copying a file, sometimes you may want to move it. The mv (move) command performs this job. For example, the command
mv test test1
moves the file test to test1. This operation effectively renames the file. Thus, you can use mv both to move and to rename files. To delete a file, use rm (remove), as in the following example:
rm test
This command deletes the test file. To specify a group of files, you can use wildcards: * to match zero or more characters, and ? to match a single character. The command
rm *
deletes all the files in the current directory. It does not delete the directories. For that, you need to use the -r option mentioned below.
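The renaming, wildcard, and rm behavior described above can be tried safely in a scratch directory. In the sketch below, the file names are made up for illustration. The rm * line discards rm's complaint about the directory, which rm leaves in place.

```bash
cd "$(mktemp -d)" || exit 1         # scratch directory
touch test a1.txt a2.txt b10.txt    # four empty files
mkdir subdir                        # and one directory

mv test test1                       # rename: test is gone, test1 exists
ls a?.txt                           # ? matches exactly one character: a1.txt and a2.txt
rm a*                               # * matches zero or more characters: removes a1.txt and a2.txt
rm * 2>/dev/null                    # removes test1 and b10.txt; subdir is not deleted
ls                                  # only subdir remains
```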
This last command (rm *) can be quite dangerous: it silently deletes all the files. If you want the delete process to be interactive, use the -i (interactive) option. With this option, the rm command asks whether to delete each file. Depending on your answer (y or n), it either deletes the file or skips it.
The mv command works on directories as well as files. However, cp and rm cannot be used on a directory without options. To work on directories, you have to use the -r (recursive) option. As an example, if you want to remove a non-empty directory (say, courses), you can use
rm -r courses
Similarly, if you want to copy a directory, the cp command with -r does the job.
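A sketch of the -r option in a scratch directory is shown below. The subdirectory name comp2003 is made up, and mkdir -p creates the nested directories in one step. Without -r, cp refuses to copy the directory, while cp -r and rm -r act on the whole tree.

```bash
cd "$(mktemp -d)" || exit 1               # scratch directory
mkdir -p courses/comp2003                 # comp2003 is a made-up subdirectory name
touch courses/comp2003/notes.txt

cp courses copy 2>/dev/null || echo "cp without -r refuses to copy a directory"
cp -r courses courses-backup              # copies the whole directory tree
rm -r courses                             # deletes courses and everything in it
ls                                        # only courses-backup remains
```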

Access Permissions
Linux provides a sophisticated security mechanism to control access to individual files and directories. Each file and directory has certain access permissions that indicate who can access it and in what mode (read-only, read/write, and so on). With these permissions, the system can, for example, prevent users from accessing other users' files. However, we sometimes do need to share files. For example, a group of software developers working on a project may need to share each other's files. If we strictly disallowed any sharing of files, the group members would have to share passwords so that one could log in as another user to access the files, or they would have to copy files explicitly between user accounts.

Figure 6.26 Details of the access permissions: a file-type letter (d in this example) followed by three rwx fields giving the user, group, and other permissions.
To avoid these problems, each Linux user belongs to a group of users, as determined by the system administrator when the account is created. You can verify this information on your system by going to the Applications → System Settings → Users and Groups menu. If you are not logged in as root, it asks you for the root password and then opens a tabbed window. The Users tab gives information on the user accounts in the system. If you click the Groups tab, it gives the group information: the group name that identifies the group, the group id, and the group members. The toolbar of this window has icons to add groups and to modify group membership. The group id is an integer. Fedora reserves group ids less than 500 for system groups. Thus, user group ids start at 500.
Typically, a user belongs to a single group. However, a user may belong to multiple groups. From the access permission point of view, there are three types of users: owner, group, and others. The last category represents everyone else.
Linux, like UNIX systems, associates three types of access permissions with files and directories in the file system: read (r), write (w), and execute (x). As the names indicate, the read permission allows read access and the write permission allows writing into the file or directory. The execute permission is required to execute a file and, for obvious reasons, should be used only with binary and script files that contain executable code or commands.
The Linux system uses nine bits to keep the access permissions, as there are three types of users, each of which can have three types of permissions. The ls command with the -l (long) option gives the access permission information, as in the following example:
$ ls -l
drwxr-xr-x 2 sivarama project1 4096 Dec 24 13:56 Desktop
-rw-rw-r-- 1 sivarama project1 5610 Dec 30 12:53 sample.txt
-rw-r--r-- 1 sivarama project1 5610 Dec 30 12:53 test
Each line in this list contains the following information (from left to right):
• The first column displays the permissions for each file or directory. Figure 6.26 shows details of this column. The first letter before the nine permission letters identifies the file type. In our example, the d in the first line identifies Desktop as a directory. A dash (-) is used for a regular file, as in lines 2 and 3. The next nine letters are divided into three fields.
- The first three letters give the permissions for the user (that is, the owner).
- The second set of three letters gives the permissions for the user's group.
- The last three letters give the permissions for everyone else.
If a permission is off, it is indicated by a dash (-); otherwise, the corresponding letter is used.
• The integer in the second column gives the number of (hard) links to the file, that is, the number of directory entries that refer to it. For most files, the link count is 1. A directory such as Desktop has a count of at least 2, because its own . entry also refers to it.
• The next column (sivarama in our example) gives the owner of the file. This is usually the person who created the file.
• The next entry (project1 here) is the group that has group access to the file or directory.
• The next number gives the size of the file in bytes (characters). In our example, the sample.txt file is 5610 characters long.
• The date and time stamp of the file (when it was last modified) is given next.
• The last column gives the name of the file or directory.

In our example, the first line indicates that the owner can read, write, and execute the Desktop directory. The group and others have read and execute permissions but not write permission.
Note that read permission on a directory allows you to read its contents (the names of the entries in it). Write permission on a directory means you can write into the directory (e.g., create a subdirectory in it). What does execute permission on a directory mean? For a directory, the execute permission has a different meaning than for a file. If a directory has the execute permission, it allows you to use the cd command to make it your current directory and to access the files in that directory. However, execute permission alone does not allow you to read or write the directory itself. For example, if you have execute but not read permission, the ls command will not list the files in the directory. However, if you know the name of a file, you can still get details about it or look at its contents.
In the second line, the dash in the file type indicates that sample.txt is a regular file. Of course, we know that it is a text file. Therefore, it does not make sense to give it execute permissions. On this file, the owner and the group have read and write permissions, whereas others have only read permission. From the third line in this example, we can gather that test is also a regular file. In addition, only the owner has read/write access. All the others can only read this file.
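To see a long listing like the one above on your own system, you can create a directory and a file in a scratch directory. This sketch assumes a umask of 022, a setting not covered in this chapter that removes group and other write permission from newly created files. It also assumes GNU stat, whose -c option prints selected fields. With these assumptions, the permissions match the Desktop and test lines of the example.

```bash
cd "$(mktemp -d)" || exit 1       # scratch directory
umask 022                         # assumed mask: new files rw-r--r--, new directories rwxr-xr-x
mkdir Desktop
touch test

ls -l                             # Desktop shows drwxr-xr-x, test shows -rw-r--r--
stat -c '%A %h %n' Desktop test   # GNU stat: permission string, link count, name
```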
Setting Access Permissions
The chmod (change mode) command changes the access permissions. The owner of a file can
determine who can access the file. There are two ways of specifying the access permissions: in
octal or symbolic mode.
In the octal mode, you convert the three permission bits for each user type into an octal number.
In this method, the 9-bit permissions can range from 000 to 777. The permissions are first written
in binary, with a 1 for each permission bit that is on and a 0 for each bit that is off; each group of
three bits then gives one octal digit. Following this procedure, the Desktop directory permissions
from our previous example (rwxr-xr-x) are represented as 111 101 101, which is 755 in octal.
Similarly, the permissions for the sample.txt file (rw-rw-r--) can be expressed in octal as 664. The
octal string 644 expresses the permissions (rw-r--r--) for the test file in our example. Since you
specify the actual permissions, this mode is also called the absolute mode.

Table 6.3 Values for the symbolic mode fields

Field        Value   Description
Who          u       User
             g       Owner's group
             o       All others not in the group
             a       All users
Operator     +       Add the permission
             -       Remove the permission
             =       Set the permission
Permission   r       Set the read permission
             w       Set the write permission
             x       Set the execute permission
             u       Set to the file owner's current permissions
             g       Set to the file group's current permissions
             o       Set to the file others' current permissions
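The octal (absolute) mode arithmetic described above can be checked with a short POSIX shell sketch. It assumes a hypothetical helper, perm_to_octal, which is not a standard Linux command: each three-character group of the permission string contributes one octal digit, with r worth 4, w worth 2, x worth 1, and a dash worth 0.

```shell
#!/bin/sh
# perm_to_octal PERMS: hypothetical helper that converts a 9-character
# permission string such as rwxr-xr-x into its octal form (e.g. 755).
# Each user type (owner, group, others) yields one octal digit.
perm_to_octal() {
    perms=$1
    result=""
    for start in 0 3 6; do
        digit=0
        # Extract the three characters belonging to this user type
        triple=$(printf '%s' "$perms" | cut -c$((start + 1))-$((start + 3)))
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read bit
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write bit
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute bit
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x   # Desktop directory
perm_to_octal rw-rw-r--   # sample.txt
perm_to_octal rw-r--r--   # test
```

Running the script prints 755, 664, and 644, the same values worked out by hand in the text.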
In the symbolic mode, mode control words are used to express the access privileges, mostly relative to the current privileges. For example, you may add the write privilege to your group. Mode
control words consist of three fields written together in the form who operator permission (for example, g+w).
These fields can take the values shown in Table 6.3.
The format of chmod is
chmod access-mode file-list
The access-mode can be expressed in the octal or symbolic mode. Here are some examples.
The command
chmod 660 test
changes the permissions of the test file to rw-rw----. This means that only the owner and
his/her group can read or write test; all others cannot access the file. If you use the * wildcard
as the file-list, the permissions of all the files and directories in the current directory are changed.
You can also use other metacharacters like ? to specify file-list. If you want to allow others to read
the test file, you can do so by the following command:
chmod o+r test
To change the permissions of all the files in a directory and in all of its subdirectories, use the -R
(recursive) option. For example, if temp is a directory, the command
chmod -R 764 temp
recursively changes the permissions for all the files and directories in temp and its subdirectories.
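To see the absolute and symbolic modes side by side, the following sketch runs the three chmod examples above on scratch files in a temporary directory. It assumes GNU coreutils, whose stat -c '%a' prints a file's permissions in octal; the directory layout (temp/sub, a.txt, b.txt) is made up for the demonstration.

```shell
#!/bin/sh
# Try the chmod examples on scratch files in a temporary directory.
# "stat -c %a" (GNU coreutils) prints a file's permissions in octal.
set -e
workdir=$(mktemp -d)
cd "$workdir"

touch test
chmod 660 test                      # absolute mode: rw-rw----
after_660=$(stat -c '%a' test)
echo "after chmod 660:    $after_660"

chmod o+r test                      # symbolic mode: add read for others
after_or=$(stat -c '%a' test)
echo "after chmod o+r:    $after_or"

mkdir -p temp/sub
touch temp/a.txt temp/sub/b.txt
chmod -R 764 temp                   # rwxrw-r-- on temp and everything in it
# Collapse the four identical results into a single line
after_r=$(stat -c '%a' temp temp/sub temp/a.txt temp/sub/b.txt | sort -u)
echo "after chmod -R 764: $after_r"

cd / && rm -rf "$workdir"
```

The script reports 660, then 664, then 764 for every entry under temp.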


Redirection
In Linux, three standard files are automatically opened for you. These default files are used by
your command to read its input and to send its output and error messages. The stdin (standard
input) file supplies the data needed by the command. This file is mapped to your keyboard. The
stdout (standard output) file receives the program's output. The error messages are directed to
the stderr (standard error) file. These last two standard files are mapped to the terminal running the
command. This default association with files can be changed using redirection operators.
To redirect output of a command to a file, use the > (greater-than) symbol as shown here:
command > out-file
As an example, consider the following command:
ls -l > list
This command sends the output to the list file. Here is a simple way to create a text file without
using a text editor.
cat > simple.txt
Since we did not specify an input file in the cat command, it expects the input to come from the default
input file (stdin). The output of this command is redirected to the simple.txt file. You can
terminate the input by typing Ctrl-d.
To redirect the input of a command, you can use the < (less-than) symbol as shown below:
command < in-file
Before giving an example of the input redirection, let's first look at a new command. The word
count (wc) command can be used to print the line, word, and byte counts of a file. In fact, you
can specify more than one file on the command line. If no file is specified on the command line, it
reads from the standard input file stdin. For example, the following command
$ wc < simple.txt
22 191 1327
uses input redirection to print the three counts for the simple.txt file. The three numbers give
the line count (22), word count (191), and byte count (1327).
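The sketch below repeats the wc examples on a small file created with printf, which stands in for typing the text after cat > simple.txt. The file contents are invented for illustration so that the counts are easy to verify by hand (2 lines, 5 words, 24 bytes); only the redirection operators come from the text.

```shell
#!/bin/sh
# Input redirection (<) feeds simple.txt to wc on stdin; output
# redirection (>) stores wc's result in the file count.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# printf stands in for typing the text after "cat > simple.txt"
printf 'one two three\nfour five\n' > simple.txt

wc < simple.txt             # 2 lines, 5 words, 24 bytes
wc < simple.txt > count     # same counts, written to count instead
counts=$(cat count)
echo "count contains: $counts"

cd / && rm -rf "$workdir"
```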
Both input and output redirections can be used in a single command. For example, if you want
to store the output of the previous command in a file (say, count), the following command will
do the job.
$ wc < simple.txt > count
When we use the output redirection, if the output file already exists, the contents are erased and the
command's output is placed in the file. Instead, if you want the command output to be appended
to the file contents, use the append output symbol (>>). The command sequence
$ cat < sample1.txt > test
$ cat < sample2.txt >> test
copies the contents of the files sample1.txt and sample2.txt into the test file.
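The difference between > and >> can be seen with two small files. In this sketch the file names follow the text, while their one-line contents are made up; the last redirection shows that a plain > erases what was in the file before.

```shell
#!/bin/sh
# > creates or truncates its output file; >> appends to it.
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf 'first file\n'  > sample1.txt
printf 'second file\n' > sample2.txt

cat < sample1.txt > test      # test holds only the first file
cat < sample2.txt >> test     # the second file is added at the end
combined=$(cat test)
printf '%s\n' "$combined"     # two lines: first file, second file

cat < sample2.txt > test      # a plain > erases the earlier contents
overwritten=$(cat test)
printf '%s\n' "$overwritten"  # one line: second file

cd / && rm -rf "$workdir"
```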
Before closing this section, we note that the output redirect command (>) overwrites the file
with the command output. This has the unfortunate side effect of overwriting files by accident
(for example, if a wrong file name is given). You can set the noclobber shell option to avoid this
problem. You can set this option by using the set command as shown below:
set -o noclobber
When the noclobber option is set, > refuses to overwrite an existing file. You can still force
overwriting by placing a pipe symbol (discussed next) immediately after the redirection symbol (>|).
The append symbol (>>) is not affected by noclobber. To unset the noclobber option, you can use
the following command:
set +o noclobber
This command allows overwriting of files as before.
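The effect of noclobber can be tried safely in a temporary directory. This sketch assumes a POSIX-compatible shell such as bash, in which >| forces an overwrite and >> keeps working while noclobber is set; the file name notes.txt is arbitrary.

```shell
#!/bin/sh
# With noclobber set, > refuses to overwrite an existing file,
# >| overrides that check, and >> still appends.
workdir=$(mktemp -d)
cd "$workdir"

echo original > notes.txt
set -o noclobber

# Try a plain > in a subshell so that its failure does not stop the script
if (echo replaced > notes.txt) 2>/dev/null; then
    refused=no
else
    refused=yes                     # expected: the file already exists
fi
echo "plain > refused: $refused"

echo appended >> notes.txt          # appending is still allowed
echo forced >| notes.txt            # >| overwrites despite noclobber
final=$(cat notes.txt)
echo "notes.txt now contains: $final"

set +o noclobber                    # restore the default behavior
cd / && rm -rf "$workdir"
```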

Pipes
As we have seen, the Linux system provides several commands. These commands can be treated
as the basic building blocks. While a simple task can be done by using a single command, we may
need several commands to accomplish a complicated task. We may have to feed the output of one
command as input to another to accomplish the task. Of course, we can store the output of the first
command in a temporary file and use this file as the input to the next command. The shell provides
the pipe operator (|) to achieve this without any temporary files. The syntax is
command1 | command2
The output of the first command (command1) is fed as input to the second command (command2).
The output of command2 is the final output. Of course, we can connect several commands using
the pipes:
command1 | command2 | command3 | command4 | command5
Here is an example that uses a pipe to sort the output of the ls command.
ls | sort

As another example, let's look at a different way to get the three counts (line, word, and byte) for
the simple.txt file. We can use the cat and wc commands connected by a pipe as shown
below:
cat simple.txt | wc
grep is another useful command that allows you to find a string in one or more files. For example,
the command
ls -l | grep simple
displays the lines in the output of the ls -l command that contain the string simple.
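Here is a small sketch that runs the pipelines from this section in a scratch directory, plus a three-stage pipeline to show that pipes chain as command1 | command2 | command3. The file names and contents are invented for the demonstration.

```shell
#!/bin/sh
# Pipes feed one command's output to the next without temporary files.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf 'one two three\nfour five\n' > simple.txt
touch notes.txt report.txt

ls | sort                            # file names, sorted
pipe_counts=$(cat simple.txt | wc)   # same three counts as "wc < simple.txt"
echo "cat simple.txt | wc: $pipe_counts"
matches=$(ls -l | grep simple)       # only the ls -l lines mentioning simple
echo "$matches"
txt_files=$(ls | grep txt | wc -l)   # three stages: list, filter, count
echo "files with txt in their name: $txt_files"

cd / && rm -rf "$workdir"
```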
We have briefly introduced several basic commands. However, this is only a small sample
of the commands that are available. If you are intrigued by this introduction, you can get more
information from several online resources. You can also visit your favorite bookstore for books
dedicated to the Linux operating system.


Editing Files with Vim
Two text editors, vi and emacs, are commonly used in the Linux system. The Fedora system
you installed has an improved version of vi called vim (vi improved). In this section we briefly
describe the vim text editor.
You can invoke vim to edit a file (say, simple.txt) by typing vim simple.txt. The
vim editor works in two modes:
• Command Mode: In this mode, the input is interpreted as a command to the editor. Some
examples of these commands are: save the file, exit vim, move the cursor, delete and search
for text.
• Input Mode: This mode allows you to input text. When you start vim, it is in the command mode. You can switch to the input mode by several commands. For example, the i
command switches it to the input mode.
If the editor is in the insert mode, the bottom line indicates this (see Figure 6.27). The empty lines
are indicated with the tilde characters (~). You can exit vim in one of several ways as shown here:
ZZ      Save the buffer and quit
:x      Save the buffer and quit (same as ZZ)
:wq     Save the buffer and quit (same as ZZ)
:q      Quit (works only if you don't have any unsaved changes)
:q!     Quit without saving the changes in the buffer

The first three commands perform the same action—write the changes in the buffer and quit. The
vim editor has the following commands to write the buffer.
:w              Save the buffer to the current file
:w filename     Save the buffer to filename;
                it does not overwrite if the file exists
:w! filename    Save the buffer to filename;
                it overwrites if the file exists

The first command saves the buffer to the current file that vim is editing. The second and the third
commands allow you to write the buffer to a new file.
You can move the cursor using the four arrow keys. You can also use the h, j, k, and l keys
to move the cursor left, down, up, and right, respectively. In addition, the following commands are
available to move the cursor:
1G          Move cursor to the first line of the file
G           Move cursor to the last line of the file
0 (zero)    Move cursor to the first character of the current line
$           Move cursor to the last character of the current line
w           Move cursor forward by one word
b           Move cursor backward by one word

Note that you have to be in the command mode to issue commands to vim. Also in the command
mode, you can do simple text editing using the following commands:

Figure 6.27 The VIM editor in the input mode.

x       Delete the character at the cursor
X       Delete the character before the cursor
dd      Delete the line at the cursor
u       Undo the most recent change
r       Replace the character at the cursor by the character typed next

The replace command places vim in the Input mode and the character you type after the r command replaces the current character. After that the editor returns to the Command mode.
In addition to the replace command, you can put vim in the Input mode by any of the following
commands: insert (i), append (a), or open (o). When you are done entering the text, press the
Esc (Escape) key to return to the command mode. The insert command places vim in the Input
mode and the text entered will go before the cursor. The append command is similar to the i
command except that it places the text after the cursor. The open command opens a blank line below
the current line and places the cursor at the beginning of the blank line.
To search forward, you can use the / command. For example, /text looks for the string
text in the forward direction (that is, from the current cursor position to the end of the file). To
do the reverse search, use ? in place of the slash. For example, ?text searches backward from
the current cursor position to the beginning of the file.
The last command we discuss here is the substitute (s) command. It lets you replace text
conveniently. The format of this command is


:[range]s/old_string/new_string/option
The old_string is substituted by new_string in the range of lines specified by the optional
range. The range is specified in the format "from,to". If no range is given in the command,
the current line is the default. The option is a modifier to the command. Usually, g is used
for global substitutions. The following examples give an idea of how this command works. The
command
:s/test/text
replaces the first occurrence of test in the current line by text. If you want to replace all
occurrences in the current line, use the g option as in the following command:
:s/test/text/g
The command
:1,10s/test/text
replaces the first occurrence of test in each of the ten lines specified (i.e., lines 1 through 10) by
text. To change all occurrences in these ten lines, add the g option to the previous command.
We have covered only the basic commands available in the vim editor. It has several very
powerful and sophisticated commands. If you decide to use vim you can look at these advanced
commands after you gain some degree of familiarity with the editor.

Summary
This chapter introduced the basics of the Linux system. If you are new to Linux, the material
presented here should get you started with the Fedora 3 system you have installed. We started the
chapter with a discussion of the graphical user interface provided by the system. Specifically, we
focused on the GNOME desktop. For new users, GUI provides an easy, point-and-click interface.
However, as you get familiar with the system, the command line interface tends to be more efficient. We have provided the basics of the command line interface and discussed several basic
commands that are useful. The material presented in this chapter is sufficient to proceed with our
main goal of learning assembly language programming using the Linux tools.

PART IV
NASM

7 Installing and Using NASM
In this chapter, we introduce the necessary mechanisms to write and execute assembly language
programs. We begin by taking a look at the structure of assembly language programs we use in
this book. To make the task of writing assembly language programs easier, we provide a simple
template to structure the stand-alone assembly language programs used in this book.
Unlike the high-level languages, assembly language does not provide a convenient mechanism
to do input and output. To overcome this deficiency, we have developed a set of I/O routines
to facilitate character, string, and numeric input and output. These routines are described after
introducing the assembly language template.
Once we have written an assembly language program, we have to transform it into its executable form. Typically, this takes two steps: we use an assembler to translate the source program
into what is called an object program and then use a linker to transform the object program into an
executable version. We give details of these steps in the "Assembling and Linking" section. However, this section uses an assembly language program example. Since we have not yet discussed
the assembly language, you may want to skip this section on the first reading and come back to it
after you have read Chapters 9 and 10, which provide an overview of the assembly language.

Introduction
Writing an assembly language program is a complicated task, particularly for a beginner. We make
this daunting task simple by hiding those details that are irrelevant. A typical assembly language
program consists of three parts. The code part of the program defines the program's functionality
by a sequence of assembly language instructions. The code part of the program, after translating
it to the machine language code, is placed in the code segment. The data part reserves memory
space for the program's data. The data part of the program is mapped to the data segment. Finally,
we also need the stack data structure, which is mapped to the stack segment. The stack serves
two main purposes: it provides temporary storage, and acts as the medium to pass parameters in
procedure calls. We introduce a template for writing stand-alone assembly language programs,
which are written completely in the assembly language.


We rarely write programs that do not input and/or output data. High-level languages provide
facilities to input and output data. For example, C provides the scanf and printf functions to
input and output data, respectively. Typically, high-level languages can read numeric data (integers, floating-point numbers), characters, and strings.
Assembly language, however, does not provide a convenient mechanism to input and output
data. The operating system provides some basic services to read and write data, but these are fairly
limited. For example, there is no function to read an integer from the user.
In order to facilitate I/O in assembly language programs, we have developed a set of I/O
routines to read and display characters, strings, and signed integers. Each I/O routine call looks
like an assembly language instruction. This similarity is achieved by using macros. Each macro
call typically expands to several assembly language statements and includes a call to an appropriate
procedure. These macros are all defined in the io.mac file and the assembled procedures are in
the io.o file. We use an example program to illustrate the use of these I/O routines as well as
the assembly language template.

Installing NASM
NASM, which stands for netwide assembler, is a portable, public-domain, IA-32 assembler that
can generate a variety of object file formats. In this chapter, we restrict our discussion to a Linux
system running on an Intel PC.
The accompanying CD-ROM has a copy of NASM. If you followed the Linux installation
directions given in Chapter 5, it is already installed. However, if you did not install NASM as part
of the Linux installation, or if you want the latest version, this section explains how you can install
it.
The latest version of NASM can be downloaded from several sources (see the book's Web
page for details). The NASM manual has clear instructions on how to install NASM under Linux.
(To get the NASM manual, see the "Web Resources" section at the end of this chapter.) Here is a
summary extracted from this manual:
1. Download the Linux source archive nasm-X.XX.tar.gz, where X.XX is the NASM
version number in the archive.
2. Unpack the archive into a directory, which creates a subdirectory nasm-X.XX.
3. cd to nasm-X.XX and type ./configure. This shell script will find the best C compiler
to use and set up Makefiles accordingly.
4. Type make to build the nasm and ndisasm binaries.
5. Type make install to install nasm and ndisasm in /usr/local/bin and to install
the man pages.
This should install NASM on your system. Alternatively, you can use an RPM distribution for the
Fedora Linux. This version is simpler to install—just double-click the RPM file.

Generating the Executable File
The NASM assembler supports several object file formats including ELF (Executable and Linkable Format)
used by Linux. The assembling and linking process is simple. For example, to assemble the
sample.asm program, we use
        nasm -f elf sample.asm
This generates the sample.o object file. To generate the executable file sample, we have to
link this file with our I/O routines. This is done by
        ld -s -o sample sample.o io.o
Note that nasm requires the io.mac file and ld needs the io.o file. Make sure that you have
these two files in your current directory. We give details about the assembly process towards the
end of the chapter.

;brief title of program                               file name
;
;       Objectives:
;           Inputs:
;          Outputs:
%include  "io.mac"

.DATA
        (initialized data go here)

.UDATA
        (uninitialized data go here)

.CODE
        .STARTUP                ; setup
        (code goes here)
        .EXIT                   ; returns control

Figure 7.1 Template for the assembly language programs used in the book.

Assembly Language Template
To simplify writing assembly language programs, we use the template shown in Figure 7.1. We
include the io.mac file by using the %include directive. This directive allows us to include
the contents of io.mac in the assembly language program. If you have used other assemblers such as
TASM or MASM, note that NASM is case-sensitive.
The data part is split into two: the .DATA macro is used for initialized data and the .UDATA
macro for uninitialized data. The code part is identified by the .CODE macro. The .STARTUP macro
handles the code for setup. The .EXIT macro returns control to the operating system.
Now let us dissect the statements in this template. This template consists of two types of
statements: executable instructions and assembler directives. Executable instructions generate
machine code for the processor to execute when the program is run. Assembler directives, on
the other hand, are meant only for the assembler. They provide information to the assembler on
the various aspects of the assembly process. In this book, all assembler directives are shown in
uppercase letters, while the instructions are shown in lowercase.
The %include directive causes the assembler to include the source code from another file
(io.mac in our case). This file contains macros for the I/O routines we will discuss in the next
section.
The data section is used to define the program's variables. It is divided into two parts: initialized and uninitialized. The .DATA macro is used to define initialized variables while the .UDATA
macro is used to define uninitialized variables of the assembly language program. Chapter 9 discusses various assembler directives to define and initialize variables used in assembly language
programs.
The .CODE macro terminates the data segment and starts the code section. The .STARTUP
macro sets up the starting point. If you want, you can use the following code in its place.
        global  _start
_start:
To return control from the assembly program, we use the .EXIT macro, which places the code
needed to return control to the operating system. In place of the .EXIT macro, you can write
your own code to invoke the Linux exit system call through int 0x80, as shown below.
        mov     EAX,1           ; exit system call number
        xor     EBX,EBX         ; exit status 0 (normal termination)
        int     0x80
Control is returned to the operating system by system call 1 (exit), which is requested through
interrupt 0x80. The system call required is indicated by moving its number (1) into the EAX
register. This call also returns an exit status to the operating system, given in the EBX register.
It is good practice to set EBX to 0 to indicate normal termination of the program. We discuss
interrupts in Chapter 20.

Input/Output Routines
This section describes the I/O routines we developed to input and output characters, strings, and
signed integers. A summary of these routines is given in Table 7.1.

Table 7.1 Summary of I/O routines defined in the io.mac file

Name      Operand(s)         Operand location          Size      What it does
PutCh     source             value, register, memory   8 bits    Displays the character located at source
GetCh     dest               register, memory          8 bits    Reads a character into dest
nwln      none               -                         -         Displays a newline
PutStr    source             memory                    variable  Displays the NULL-terminated string at source
GetStr    dest [,buf_size]   memory                    variable  Reads a carriage-return-terminated string into
                                                                 dest and stores it as a NULL-terminated string.
                                                                 Maximum string length is buf_size-1.
PutInt    source             register, memory          16 bits   Displays the signed 16-bit number located at source
GetInt    dest               register, memory          16 bits   Reads a signed 16-bit number into dest
PutLInt   source             register, memory          32 bits   Displays the signed 32-bit number located at source
GetLInt   dest               register, memory          32 bits   Reads a signed 32-bit number into dest

Character I/O
Two macros are defined to input and output characters: PutCh and GetCh. The format of PutCh
is
        PutCh   source
where source can be any general-purpose, 8-bit register, or a byte in memory, or a character
value. Some examples follow.
        PutCh   'A'             ; displays character A
        PutCh   AL              ; displays the character in AL
        PutCh   [response]      ; displays the character located in
                                ; memory (labeled response)
The format of GetCh is
        GetCh   destination
where destination can be either an 8-bit, general-purpose register or a byte in memory. Some
examples are given here.
        GetCh   DH
        GetCh   [response]

In addition, a nwln macro is defined to display a newline. It takes no operands.
String I/O
The PutStr and GetStr macros are defined to display and read strings, respectively. The strings
are assumed to be in the NULL-terminated format. That is, the last character of the string is the
NULL character, which signals the end of the string. Strings are discussed in Chapter 17.
The format of PutStr is
        PutStr  source
where source is the name of the buffer containing the string to be displayed. For example,
        PutStr  message


displays the string stored in the buffer message. Strings are limited to 80 characters. If the buffer
does not contain a NULL-terminated string, a maximum of 80 characters are displayed.
The format of GetStr is
        GetStr  destination [, buffer_size]
where destination is the buffer name into which the string from the keyboard is read. The
input string is terminated by a carriage return. You can also specify an optional value for
buffer_size. If it is not specified, a buffer size of 81 is assumed. Thus, in the default case,
a maximum of 80 characters are read into the string. If a value is specified, at most buffer_size-1
characters are read. The string is stored as a NULL-terminated string. While entering a string, you
can backspace to correct the input. Here are some examples.
        GetStr  in_string               ; reads at most 80 characters
        GetStr  TR_title,41             ; reads at most 40 characters
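The buffer_size rule can be mimicked in a few lines of shell. This is only an analogy, not the io.mac implementation: getstr_sketch is a hypothetical function that reads one line and keeps at most buffer_size-1 characters, reserving the remaining byte for the NULL terminator that GetStr stores. It assumes plain ASCII input and a buffer size of at least 2.

```shell
#!/bin/sh
# getstr_sketch [buffer_size]: hypothetical shell analogy of GetStr's
# length limit. One byte of the buffer is reserved for the NULL that
# terminates the stored string, so at most buffer_size-1 characters of
# the typed line are kept. Assumes ASCII input and buffer_size >= 2.
getstr_sketch() {
    buffer_size=${1:-81}                 # 81 when no size is given
    max_chars=$((buffer_size - 1))
    IFS= read -r line                    # one line, ended by Enter
    printf '%s\n' "$line" | cut -c1-"$max_chars"
}

# Mirrors the two GetStr examples above:
printf '%0100d\n' 0 | getstr_sketch          # keeps 80 characters
printf '%s\n' "Guide to Assembly Language Programming in Linux" \
    | getstr_sketch 41                        # keeps 40 characters
```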

Numeric I/O

There are four macros for performing integer I/O: two are used for 16-bit integers and the other
two for 32-bit integers. First we look at the 16-bit integer I/O routines, PutInt and GetInt.
The formats are
        PutInt  source
        GetInt  destination
where source and destination can be a 16-bit, general-purpose register or the label of a
memory word.
The PutInt macro displays the signed number at source. It suppresses all leading 0s. The
GetInt macro reads a 16-bit signed number into destination. You can backspace while entering
a number. The valid range of input numbers is -32,768 to +32,767. If an invalid input (such as
typing a nondigit character) or out-of-range number is given, an error message is displayed and
the user is asked to type a valid number. Some examples are given below.
        PutInt  AX
        PutInt  [sum]
        GetInt  CX
        GetInt  [count]
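The acceptance rule for GetInt can be expressed as a small check. The sketch below illustrates that rule under stated assumptions and is not the io.mac code: fits_signed16 is a hypothetical shell function that accepts an optional sign followed by decimal digits and then tests the value against the 16-bit signed range, -32,768 to +32,767.

```shell
#!/bin/sh
# fits_signed16 STRING: hypothetical helper (not part of io.mac) that
# succeeds only if STRING is an optional + or - sign followed by decimal
# digits whose value lies in the 16-bit signed range -32768..32767.
fits_signed16() {
    input=$1
    sign=
    case $input in
        [+-]*) sign=${input%"${input#?}"}    # remember the leading sign
               input=${input#?} ;;
    esac
    case $input in
        ''|*[!0-9]*) return 1 ;;             # empty, or a nondigit character
    esac
    while [ "${#input}" -gt 1 ] && [ "${input#0}" != "$input" ]; do
        input=${input#0}                     # drop leading zeros
    done
    [ "${#input}" -le 5 ] || return 1        # six or more digits: out of range
    if [ "$sign" = "-" ]; then
        [ "$input" -le 32768 ]               # magnitude of -32768
    else
        [ "$input" -le 32767 ]
    fi
}

for n in 32767 -32768 +42 32768 -32769 12a -; do
    if fits_signed16 "$n"; then
        printf '%s accepted\n' "$n"
    else
        printf '%s rejected, asking again\n' "$n"
    fi
done
```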

Long integer I/O is similar except that the source and destination must be a 32-bit register
or a label of a memory doubleword (i.e., 32 bits). For example, if total is a 32-bit number in
memory, we can display it by
        PutLInt [total]
and read a long integer from the keyboard into total by
        GetLInt [total]
Some examples that use registers are:
        PutLInt EAX
        GetLInt EDX

An Example Program
Program 7.1 gives a simple example to demonstrate how some of these I/O routines can be used
to facilitate input and output. The program requests the user for a name and a repeat count. After
confirming the repeat count, it displays a welcome message repeat count times.
The program uses the db (define byte) assembly language directive to declare several strings
(lines 11-15). All these strings are terminated by a 0, which is the ASCII value for the NULL
character. Similarly, in the uninitialized data area, we use the resb directive to allocate 16 bytes
for a buffer to store the user name and another byte to store the user response to the repeat count
confirmation message (lines 18 and 19). These assembler directives are discussed in Chapter 9.
We use PutStr on line 23 to prompt the user for her or his name. The name is read as a string
using GetStr into the user_name buffer (line 24). Since we allocated only 16 bytes for the
buffer, the name cannot be more than 15 characters. We enforce this by specifying the optional
buffer size parameter in the GetStr macro. The PutStr on line 26 requests a repeat count,
which is read by GetInt on line 27.
Program 7.1 An example assembly program (for now, you can safely ignore the assembly language
statements on lines 32, 33, and 38)
 1: ;An example assembly language program                    SAMPLE.ASM
 2: ;
 3: ;        Objective: To demonstrate the use of some I/O
 4: ;                   routines and to show the structure
 5: ;                   of assembly language programs.
 6: ;           Inputs: As prompted.
 7: ;          Outputs: As per input.
 8: %include  "io.mac"
 9:
10: .DATA
11: name_msg       db   'Please enter your name: ',0
12: query_msg      db   'How many times to repeat welcome message? ',0
13: confirm_msg1   db   'Repeat welcome message ',0
14: confirm_msg2   db   ' times? (y/n) ',0
15: welcome_msg    db   'Welcome to Assembly Language Programming ',0
16:
17: .UDATA
18: user_name      resb  16          ; buffer for user name
19: response       resb  1
20:
21: .CODE
22:       .STARTUP
23:       PutStr  name_msg              ; prompt user for his/her name
24:       GetStr  user_name,16          ; read name (max. 15 characters)
25: ask_count:
26:       PutStr  query_msg             ; prompt for repeat count
27:       GetInt  CX                    ; read repeat count
28:       PutStr  confirm_msg1          ; confirm repeat count
29:       PutInt  CX                    ; by displaying its value
30:       PutStr  confirm_msg2
31:       GetCh   [response]            ; read user response
32:       cmp     byte [response],'y'   ; if 'y',
33:       jne     ask_count             ; display welcome message
34: display_msg:                        ; otherwise, request repeat count
35:       PutStr  welcome_msg           ; display welcome message
36:       PutStr  user_name             ; display the user name
37:       nwln
38:       loop    display_msg           ; repeat count times
39:       .EXIT

The confirmation message is displayed by lines 28-30. The user's response (y or n) is read
by GetCh on line 31. If the response is y, the loop (lines 34-38) displays the welcome message
repeat count times. A sample interaction with the program is shown below.
Please enter your name: Veda
How many times to repeat welcome message? 5
Repeat welcome message 5 times? (y/n) y
Welcome to Assembly Language Programming Veda
Welcome to Assembly Language Programming Veda
Welcome to Assembly Language Programming Veda
Welcome to Assembly Language Programming Veda
Welcome to Assembly Language Programming Veda

Assembling and Linking
Figure 7.2 shows the steps involved in converting an assembly language program into executable code. It uses the sample.asm file as an example. The source assembly language file
sample.asm is given as input to the assembler. The assembler translates the assembly language
program into an object program sample.o. The linker takes one or more object programs (in
our example the sample.o and io.o files) and combines them into an executable program
sample. The following subsections describe each of these steps in detail.
The Assembly Process
The general format to assemble a program is
        nasm -f format source-file [-o object-file] [-l list-file]
where the specification of fields in [ ] is optional. If we specify only the source file, NASM
produces only the object file. Thus to assemble our example source file sample.asm, we can
use the command
        nasm -f elf sample.asm
After successfully assembling the source program, NASM generates an object file with the same
file name as the source file but with the .o extension. Thus, in our example, it generates the sample.o
file. You can also specify a file name for the object file using the -o option.
If you want the assembler to generate the listing file, you can use
        nasm -f elf sample.asm -l sample.lst
This command produces two files: sample.o and sample.lst. The list file contains detailed
information as we shall see next.

Figure 7.2 Assembling and linking assembly language programs (Editor: creates sample.asm; Assembler: translates sample.asm into the object program sample.o and, optionally, the list file sample.lst; Linker: links sample.o with other object files into the executable program sample).

The List File
Program 7.2 gives a simple program that reads two signed integers from the user
and displays their sum if there is no overflow; otherwise, it displays an error message. The input
numbers should be in the range -2,147,483,648 to +2,147,483,647, which is the range of a 32-bit
signed number. The program uses PutStr and GetLInt to prompt for and read the input numbers
(see lines 22, 23 and 26, 27). The sum of the input numbers is computed on lines 30-32.
If the resulting sum is outside the range of a signed 32-bit integer, the add instruction sets the
overflow flag. In this case, the program displays the overflow message (line 36). If there is
no overflow, the sum is displayed (lines 42 and 43).
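The overflow condition that add signals can be stated precisely: a signed sum overflows exactly when both operands have the same sign and the result's sign differs from it. The following C sketch (C is used only for illustration; the function name add_with_of is hypothetical) computes the wrapped 32-bit sum the way add does and derives the value the overflow flag would take, which is the flag jno tests.

```c
#include <stdint.h>

/* Hypothetical helper: adds two 32-bit signed integers with wraparound,
   as the IA-32 add instruction does, and stores in *overflow the value
   the overflow flag (OF) would take (1 = overflow, 0 = no overflow). */
int32_t add_with_of(int32_t a, int32_t b, int *overflow)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;  /* unsigned arithmetic wraps modulo 2^32 */

    /* Bit 31 of ~(ua ^ ub) is 1 when the operands have the same sign;
       bit 31 of (ua ^ sum) is 1 when the result's sign differs from a's. */
    *overflow = (int)((~(ua ^ ub) & (ua ^ sum)) >> 31);

    /* Reinterpret the bits as two's complement (implementation-defined
       before C23 for values above INT32_MAX; mainstream compilers wrap). */
    return (int32_t)sum;
}
```

For example, adding 2,147,483,647 and 1 yields -2,147,483,648 with the overflow indication set, so sumprog would fall through jno and display error_msg instead of jumping to no_overflow.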
The list file for the source program sumprog.asm is shown in Program 7.3. In addition to
the original source code lines, it contains a lot of useful information about the results of the assembly. This additional information includes the actual machine code generated for the executable
statements and the offset of each statement.

Assembly Language Programming in Linux

Program 7.2 An assembly language program to add two integers sumprog.asm
 1  ;Assembly language program to find sum        SUMPROG.ASM
 2  ;
 3  ;        Objective: To add two integers.
 4  ;           Inputs: Two integers.
 5  ;           Output: Sum of input numbers.
 6  %include "io.mac"
 7
 8  .DATA
 9  prompt1_msg   db   'Enter first number: ',0
10  prompt2_msg   db   'Enter second number: ',0
11  sum_msg       db   'Sum is: ',0
12  error_msg     db   'Overflow has occurred!',0
13
14  .UDATA
15  number1       resd  1          ; stores first number
16  number2       resd  1          ; stores second number
17  sum           resd  1          ; stores sum
18
19  .CODE
20        .STARTUP
21        ; prompt user for first number
22        PutStr  prompt1_msg
23        GetLInt [number1]
24
25        ; prompt user for second number
26        PutStr  prompt2_msg
27        GetLInt [number2]
28
29        ; find sum of two 32-bit numbers
30        mov     EAX,[number1]
31        add     EAX,[number2]
32        mov     [sum],EAX
33
34        ; check for overflow
35        jno     no_overflow
36        PutStr  error_msg
37        nwln
38        jmp     done
39
40        ; display sum
41  no_overflow:
42        PutStr  sum_msg
43        PutLInt [sum]
44        nwln
45  done:
46        .EXIT


List File Contents The format of each line in the list file is

line#   offset   machine-code   nesting-level   source-line

line#: is the listing file line number. These numbers are different from the line numbers in the
source file. This can be due to include files, macros, and so on, as shown in Program 7.3.
offset: is an 8-digit hexadecimal offset value of the machine code for the source statement.
For example, the offset of the first instruction (line 187) is 00000000H, and that of the add
instruction on line 219 is 00000035H. Source lines such as comments do not generate any offset.
machine-code: is the hexadecimal representation of the machine code for the assembly language instruction. For example, the machine language encoding of

mov    EAX,[number1]

is A1[00000000] (line 218) and requires five bytes. The value zero in [ ] is the offset of
number1 in the uninitialized data (.bss) segment (see line 173).
Similarly, the machine language encoding of

jmp    done

is E91D000000 (line 231), requiring five bytes.
nesting-level: is the level of nesting of "include files" and macros.
source-line: is a copy of the original source code line. As you can see from Program 7.3, the
number of bytes required for the machine code depends on the source instruction. When operands
are in memory (e.g., number1), their relative address is used in the instruction encoding. The
actual value is fixed up by the linker after all the object files (including io.o in
our example) are combined. Also note that macro invocations are expanded. For example, the PutStr on
line 186 is expanded on lines 187 through 190.
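The machine-code field can be checked by hand for relative jumps. On IA-32, the displacement encoded in a relative jmp or conditional jump is added to the offset of the instruction that follows the jump. The C sketch below (the function name branch_target is hypothetical, and C is used only for illustration) applies that rule to two instructions in Program 7.3: jno no_overflow at offset 40H, encoded as 7116 (2 bytes, displacement 16H), and jmp done at offset 53H, encoded as E91D000000 (5 bytes, displacement 1DH).

```c
#include <stdint.h>

/* Hypothetical helper: target offset of a relative jmp/jcc.
   insn_offset  - offset of the jump instruction (listing "offset" column)
   insn_length  - number of bytes in the jump's machine code
   displacement - signed displacement encoded in the instruction */
uint32_t branch_target(uint32_t insn_offset, uint32_t insn_length,
                       int32_t displacement)
{
    /* The displacement is relative to the next instruction's offset;
       unsigned arithmetic makes negative displacements wrap correctly. */
    return insn_offset + insn_length + (uint32_t)displacement;
}
```

With these inputs the function returns 58H and 75H, which are the offsets of the first instruction after the no_overflow label and of the mov EAX,1 generated by .EXIT, respectively.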
Program 7.3 The list file for the example assembly program sumprog.asm
    1                                  ;Assembly language program to find sum ...
    2                                  ;
    3                                  ;        Objective: To add two integers.
    4                                  ;           Inputs: Two integers.
    5                                  ;           Output: Sum of input numbers.
    6                                  %include "io.mac"
    7                              <1> extern   proc_nwln, proc_PutCh, proc_PutStr
    8                              <1> extern   proc_GetStr, proc_GetCh
    9                              <1> extern   proc_PutInt, proc_GetInt
   10                              <1> extern   proc_PutLInt, proc_GetLInt
   11                              <1>
   12                              <1> ;;
   13                              <1> %macro  .STARTUP  0
   14                              <1> ;group dgroup .data .bss
   15                              <1>         global   _start
   16                              <1> _start:
   17                              <1> %endmacro
   18                              <1> ;;
                                       . . .
                                   <1> %macro  .EXIT  0
                                   <1>         mov    EAX,1
                                   <1>         xor    EBX,EBX
                                   <1>         int    0x80
                                   <1> %endmacro
                                       . . .
                                   <1> %macro  .DATA  0
                                   <1>         segment .data
                                   <1> %endmacro
                                       . . .
                                   <1> %macro  .UDATA  0
                                   <1>         segment .bss
                                   <1> %endmacro
                                       . . .
  158                                  .DATA
  159                              <1>  segment .data
  160 00000000 456E74657220666972      prompt1_msg  db  'Enter first number: ',0
  161 00000009 7374206E756D626572
  162 00000012 3A2000
  163 00000015 456E74657220736563      prompt2_msg  db  'Enter second number: ',0
  164 0000001E 6F6E64206E756D6265
  165 00000027 723A2000
  166 0000002B 53756D2069733A2000      sum_msg      db  'Sum is: ',0
  167 00000034 4F766572666C6F7720      error_msg    db  'Overflow has occurred!',0
  168 0000003D 686173206F63637572
  169 00000046 7265642100
  170
  171                                  .UDATA
  172                              <1>  segment .bss
  173 00000000                         number1  resd  1     ; stores first number
  174 00000004                         number2  resd  1     ; stores second number
  175 00000008                         sum      resd  1     ; stores sum
  176
  177                                  .CODE
  178                              <1>  segment .data
  179                              <1>  segment .bss
  180                              <1>  segment .text
  181                                      .STARTUP
  182                              <1> ;group dgroup .data .bss
  183                              <1>  global _start
  184                              <1> _start:
  185                                      ; prompt user for first number
  186                                      PutStr  prompt1_msg
  187 00000000 51                  <1>  push ECX
  188 00000001 B9[00000000]        <1>  mov ECX,%1
  189 00000006 E8(00000000)        <1>  call proc_PutStr
  190 0000000B 59                  <1>  pop ECX
  191                                      GetLInt [number1]
  192                              <1>  %ifnidni %1,EAX
  193 0000000C 50                  <1>  push EAX
  194 0000000D E8(00000000)        <1>  call proc_GetLInt
  195 00000012 A3[00000000]        <1>  mov %1,EAX
  196 00000017 58                  <1>  pop EAX
  197                              <1>  %else
  198                              <1>  call proc_GetLInt
  199                              <1>  %endif
  200
  201                                      ; prompt user for second number
  202                                      PutStr  prompt2_msg
  203 00000018 51                  <1>  push ECX
  204 00000019 B9[15000000]        <1>  mov ECX,%1
  205 0000001E E8(00000000)        <1>  call proc_PutStr
  206 00000023 59                  <1>  pop ECX
  207                                      GetLInt [number2]
  208                              <1>  %ifnidni %1,EAX
  209 00000024 50                  <1>  push EAX
  210 00000025 E8(00000000)        <1>  call proc_GetLInt
  211 0000002A A3[04000000]        <1>  mov %1,EAX
  212 0000002F 58                  <1>  pop EAX
  213                              <1>  %else
  214                              <1>  call proc_GetLInt
  215                              <1>  %endif
  216
  217                                      ; find sum of two 32-bit numbers
  218 00000030 A1[00000000]                mov     EAX,[number1]
  219 00000035 0305[04000000]              add     EAX,[number2]
  220 0000003B A3[08000000]                mov     [sum],EAX
  221
  222                                      ; check for overflow
  223 00000040 7116                        jno     no_overflow
  224                                      PutStr  error_msg
  225 00000042 51                  <1>  push ECX
  226 00000043 B9[34000000]        <1>  mov ECX,%1
  227 00000048 E8(00000000)        <1>  call proc_PutStr
  228 0000004D 59                  <1>  pop ECX
  229                                      nwln
  230 0000004E E8(00000000)        <1>  call proc_nwln
  231 00000053 E91D000000                  jmp     done
  232
  233                                      ; display sum
  234                                  no_overflow:
  235                                      PutStr  sum_msg
  236 00000058 51                  <1>  push ECX
  237 00000059 B9[2B000000]        <1>  mov ECX,%1
  238 0000005E E8(00000000)        <1>  call proc_PutStr
  239 00000063 59                  <1>  pop ECX
  240                                      PutLInt [sum]
  241 00000064 50                  <1>  push EAX
  242 00000065 A1[08000000]        <1>  mov EAX,%1
  243 0000006A E8(00000000)        <1>  call proc_PutLInt
  244 0000006F 58                  <1>  pop EAX
  245                                      nwln
  246 00000070 E8(00000000)        <1>  call proc_nwln
  247                                  done:
  248                                      .EXIT
  249 00000075 B801000000          <1>  mov EAX,1
  250 0000007A 31DB                <1>  xor EBX,EBX
  251 0000007C CD80                <1>  int 0x80
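Two details of Program 7.3 can be reproduced with a little arithmetic. First, each db string with a trailing ,0 occupies its length plus one byte, so the four messages start at offsets 00H, 15H, 2BH, and 34H of the data segment. Second, the four bytes NASM shows inside [ ] are printed in memory order, and IA-32 stores 32-bit values little-endian, so [34000000] is the value 34H (the offset of error_msg) and [04000000] is 4 (the offset of number2 in the .bss segment). The C sketch below computes both; the helper names are hypothetical and C is used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: offsets assigned to consecutive "db '...',0"
   definitions that start at offset 0 of a segment. Each string takes
   strlen(s) bytes plus one byte for the explicit 0 terminator. */
void layout_db_strings(const char *const strs[], size_t count,
                       uint32_t offsets[])
{
    uint32_t next = 0;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;
        next += (uint32_t)strlen(strs[i]) + 1;
    }
}

/* Hypothetical helper: decodes four bytes listed lowest address first
   (as inside [ ] in the list file) as a little-endian 32-bit value. */
uint32_t read_le32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Passing the four message strings of sumprog.asm to layout_db_strings gives 0x00, 0x15, 0x2B, and 0x34, and read_le32 applied to the bytes 34 00 00 00 gives the last of these.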


Linking Object Files

The linker is a program that takes one or more object programs as its input and produces executable
code. In our example, since the I/O routines are defined separately, we need two object files,
sample.o and io.o, to generate the executable file sample (see Figure 7.2). To do this,
we use the command

ld -s -o sample sample.o io.o

The -s option tells ld to strip all symbol information from the executable. If you intend to debug
your program using gdb, do not use -s; instead, use the stabs option during the assembly in order
to export the necessary symbolic information. We discuss this option in the next
chapter, which deals with debugging.

Summary
We presented details about the NASM assembler. We also presented the template used to write
stand-alone assembly language programs. Since the assembly language does not provide a convenient mechanism to do input and output, we defined a set of I/O routines to help us in performing
simple character, string, and numeric input and output. We used simple examples to illustrate the
use of these I/O routines in a typical stand-alone assembly language program.
To execute an assembly language program, we have to first translate it into an object program
by using an assembler. Then we have to pass this object program, along with any other object
programs needed by the program, to a linker to produce the executable code. We used NASM to
assemble the programs. Note that NASM can also produce additional files that provide information on the
assembly process. The list file is the one we often use to see the machine code and other details.

Web Resources
Documentation (including the NASM manual) and download information on NASM are available
from http://sourceforge.net/projects/nasm.

8
Debugging Assembly Language Programs
Debugging assembly language programs is more difficult and time-consuming than debugging
high-level language programs. However, the fundamental strategies that work for high-level languages also work for assembly language programs. We start this chapter with a discussion of these
strategies. Since you are familiar with debugging programs written in a high-level language, this
discussion is rather brief.
The following section discusses the GNU debugger (GDB). This is a command-line debugger.
A nice visual interface to GDB is provided by the Data Display Debugger (DDD), which is described
toward the end of the chapter. We use a simple example to explain some of the commands of GDB
and DDD. The chapter concludes with a summary.
As we have not yet covered assembly language programming, you may want to read this
chapter in two passes. In the first pass, your goal is to get an overview of the two debuggers and
some hands-on experience in invoking and using them. In this pass, you can skip the material that
specifically deals with assembly language program statements. In the second pass, you can look
at the skipped material. Ideally, you can come back to this chapter after you are familiar with the
material presented in Chapters 9 through 11.

Strategies to Debug Assembly Language Programs
Programming is a complicated task. Loosely speaking, a program can be thought of as mapping a
set of input values to a set of output values. The mapping performed by a program is given as the
specification for the programming task. It goes without saying that when the program is written,
it should be verified to meet the specifications. In programming parlance, this activity is referred
to as testing and validating the program.
Testing a program itself is a complicated task. Typically, test cases, selected to validate the
program, should test each possible path in the program, boundary cases, and so on. During this
process, errors ("bugs") are discovered. Once a bug is found, it is necessary to find the source code
causing the error and fix it. This process is known by its colorful name, debugging.


Debugging is not an exact science. We have to rely on our intuition and experience. However,
there are tools that can help us in this process. Several debuggers are available to help us in the
debugging process. We will look at two such tools in this chapter—GDB and DDD. Note that our
goal here is to introduce the basics of the debugging process, as the best way to get familiar with
debugging is to use a debugger.
Finding bugs in a program is very much dependent on the individual program. Once an error
is detected, there are some general ways of locating the source code lines causing the error. The
basic principle that helps you in writing the source program in the first place, the divide and conquer technique, is also useful in the debugging process. Structured programming methodology
facilitates debugging greatly.
A program typically consists of several modules, where each module may have several procedures. When developing a program, it is best to do incremental development. In this methodology,
a few procedures are added to the program to add some specific functionality. The program must
be tested before adding other functions to the program. In general, it is a bad idea to write the
whole program and then test it, unless the program is small. The best strategy is to write code
that has as few bugs as possible. This can be achieved by using pseudocode and verifying the logic
of the pseudocode even before you attempt to translate it into the assembly language program.
This is a good way of catching many of the logical errors and saves a lot of debugging time. Never
write assembly language code while the pseudocode exists only in your head! Furthermore, don't be in a
hurry to write assembly language code that appears to work. This is shortsighted, as we end up
spending more time in the debugging phase.
To isolate a bug, program execution should be observed in slow motion. Most debuggers
provide a command to execute a program in single-step mode. In this mode, a program executes
a single statement and pauses. Then we can examine the contents of registers, data in memory, stack
contents, and so on. In the single-step mode, a procedure call is treated as a single statement
and the entire procedure is executed before pausing the program. This is useful if you know that
the called procedure works correctly. Debuggers also provide another command to trace even the
statements of a procedure call, which is useful in testing procedures.
Often we know that some parts of the program work correctly. In this case, it is a sheer waste of
time to single step or trace the code. What we would like is to execute this part of the program and
then stop for more careful debugging (perhaps by single stepping). Debuggers provide commands
to set up breakpoints. The program execution stops at breakpoints, giving us a chance to look at
the state of the program.
Another helpful feature that most debuggers provide is the watch facility. By using watches,
it is possible to monitor the state (i.e., values) of the variables in the program as the execution
progresses.
In the rest of the chapter, we discuss two debuggers and show how they are useful in debugging assembly language programs. Our debugging sessions use the following program, which is
discussed in Chapter 11.

Chapter 8 • Debugging Assembly Language Programs

Program 8.1 An example program used to explain debugging
 1  ;Parameter passing via registers           PROCEX1.ASM
 2  ;
 3  ;        Objective: To show parameter passing via registers.
 4  ;            Input: Requests two integers from the user.
 5  ;           Output: Outputs the sum of the input integers.
 6  %include "io.mac"
 7  .DATA
 8  prompt_msg1  DB  "Please input the first number: ",0
 9  prompt_msg2  DB  "Please input the second number: ",0
10  sum_msg      DB  "The sum is ",0
11
12  .CODE
13        .STARTUP
14        PutStr  prompt_msg1   ; request first number
15        GetInt  CX            ; CX = first number
16
17        PutStr  prompt_msg2   ; request second number
18        GetInt  DX            ; DX = second number
19
20        call    sum           ; returns sum in AX
21        PutStr  sum_msg       ; display sum
22        PutInt  AX
23        nwln
24  done:
25        .EXIT
26
27  ;-----------------------------------------------------------
28  ;Procedure sum receives two integers in CX and DX.
29  ;The sum of the two integers is returned in AX.
30  ;-----------------------------------------------------------
31  sum:
32        mov     AX,CX         ; sum = first number
33        add     AX,DX         ; sum = sum + second number
34        ret

Preparing Your Program
The assembly process described in the last chapter works fine if we just want to assemble and run
our program. However, we need to prepare our program slightly differently to debug the program.
More specifically, we would like to pass the source code and symbol table information so that
we can debug using the source-level statements. This source-level debugging is much better than
debugging using disassembled code.
To facilitate such symbolic debugging, we need to export symbolic information to the GNU
debugger. This debugger expects the symbolic information in the stabs format. More details on
this format are available in the GDB manual available online (see the "Web Resources" section at the
end of the chapter).
We can assemble and link a program (say, procex1.asm) for debugging as follows:

nasm -f elf -g -F stabs procex1.asm
ld -o procex1 procex1.o io.o

The executable program procex1 will have the necessary symbolic information to help us in
the debugging process. Note that we need to include the I/O file io.o because our programs use
the I/O routines described in the last chapter.


GNU Debugger
This section describes the GNU debugger gdb. It is typically invoked by
gdb file_name
For example, to debug the procex1 program, we can use
gdb procex1
We can also invoke gdb without giving the filename. We can specify the file to be debugged by
using the file command inside gdb. Details on the file command are available in the
GDB manual. You know that gdb is running the show when you see the (gdb) prompt. At this
prompt, it can accept one of several commands. Tables 8.1 and 8.2 show some of the commands
useful in debugging programs.
Display Group

Displaying Source Code When debugging, it is handy to keep a printed copy of the source code
with line numbers. However, gdb has list commands that allow us to look at the source code. A
simple list command takes no arguments. The command
list
displays the default number of lines. The default is 10 lines. If we issue this command again, it
displays the next 10 lines. We can use list - to print lines before the last printed lines. We can
abbreviate the list command to l.
We can specify a line number as an argument. In this case, it displays 10 lines centered on the
specified line number. For example, the command
l 20

displays lines 15 through 24, as shown in Program 8.2 on page 178. The list command can also
take other arguments. For example,
l first,last
displays the lines from first to last.
The default number of lines displayed can be changed to n with the following command:
set listsize n

The command show listsize gives the current default value.
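The line range printed by list linenum can be modeled with simple arithmetic. The C sketch below is a hypothetical model, not gdb's source code: it assumes the window starts listsize/2 lines before linenum (never before line 1) and spans listsize lines, which reproduces the example above (l 20 with the default size of 10 shows lines 15 through 24). It ignores the end of the file.

```c
/* Hypothetical model of the range shown by "list linenum".
   Assumption: the window begins listsize/2 lines before linenum,
   is clamped at line 1, and contains listsize lines. */
void list_window(int linenum, int listsize, int *first, int *last)
{
    *first = linenum - listsize / 2;
    if (*first < 1)
        *first = 1;              /* there is nothing before line 1 */
    *last = *first + listsize - 1;
}
```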
Displaying Register Contents When debugging an assembly language program, we often need
to look at the contents of the registers. The info command can be used for this purpose. The command
info registers

displays the contents of the integer registers. To display all registers, including the floating-point
registers, use
info all-registers

Often we are interested in a select few registers. To avoid cluttering the display, gdb allows
specification of the registers in the command. For example, we can use
info registers eax ecx edx

to check the contents of the eax, ecx, and edx registers.

Table 8.1 Some of the GDB display commands

Display Commands
Source code display commands
  list                  Lists the default number of source code lines following the
                        last displayed lines (default is 10 lines). It can be
                        abbreviated as l.
  list -                Lists the default number of source code lines preceding the
                        last displayed lines (default is 10 lines).
  list linenum          Lists the default number of lines centered around the
                        specified line number linenum.
  list first,last       Lists the source code lines from first to last.
Register display commands
  info registers        Displays the contents of registers except floating-point
                        registers.
  info all-registers    Displays the contents of all registers.
  info registers register ...
                        Displays the contents of the specified registers.
Memory display commands
  x address             Displays the contents of memory at address (uses defaults).
  x/nfu address         Displays the contents of memory at address.
Stack frame display commands
  backtrace             Displays a backtrace of the entire stack (one line for each
                        stack frame). It can be abbreviated as bt.
  backtrace n           Displays a backtrace of the innermost n stack frames.
  backtrace -n          Displays a backtrace of the outermost n stack frames.
  frame n               Selects frame n (frame zero is the innermost frame, i.e., the
                        currently executing frame). It can be abbreviated as f.
  info frame            Displays a description of the selected stack frame (details
                        include the frame address, program counter saved in it,
                        addresses of local variables and arguments, addresses of the
                        next and previous frames, and so on).
Displaying Memory Contents We can examine memory contents by using the x command (x
stands for examine). It has the following syntax:

x/nfu address

where n, f, and u are optional parameters that specify the amount of memory to be displayed
starting at address and its format. If the optional parameters are not given, the x command can
be written as

x address

In this case, the default values are used for the three optional parameters. Details about these
parameters are given in Table 8.3.

Table 8.2 Some of the GDB commands

Execution Commands
Breakpoint commands
  break linenum         Sets a breakpoint at the specified line number in the current
                        source file.
  break function        Sets a breakpoint at entry to the specified function in the
                        current source file.
  break *address        Sets a breakpoint at the specified address. This command is
                        useful if the debugging information or the source files are
                        not available.
  info breakpoints      Gives information on the breakpoints set. The information
                        includes the breakpoint number, where the breakpoint is set in
                        the source code, address, status (enabled or disabled), and
                        so on.
  delete                Deletes all breakpoints. By default, gdb asks for confirmation
                        before deleting them. We can also specify a range as arguments
                        (delete range). This command can be abbreviated as d.
  tbreak arg            Sets a breakpoint as in break. The arg can be a line number,
                        function name, or address as in the break command. However,
                        the breakpoint is deleted after the first hit.
  disable range         Disables the specified breakpoints. If no range is given, all
                        breakpoints are disabled.
  enable range          Enables the specified breakpoints. If no range is given, all
                        breakpoints are enabled.
  enable once range     Enables the specified breakpoints once, i.e., when the
                        breakpoint is hit, it is disabled. If no range is given, all
                        breakpoints are enabled once.
Program execution commands
  run                   Executes the program under gdb. To be useful, you should set
                        up appropriate breakpoints before issuing this command. It can
                        be abbreviated as r.
  continue              Continues execution from where the program last stopped
                        (e.g., due to a breakpoint). It can be abbreviated as c.
Single stepping commands
  step                  Single-steps execution of the program (i.e., one source line
                        at a time). In the case of a procedure call, it single-steps
                        into the procedure code. It can be abbreviated as s.
  step count            Single-steps program execution count times. If it encounters
                        a breakpoint before reaching the count, it stops execution.
  next                  Single-steps as the step command does; however, a procedure
                        call is treated as a single statement (it does not jump into
                        the procedure code). As in the step command, we can specify
                        a count value. It can be abbreviated as n.
  next count            Single-steps program execution count times. If it encounters
                        a breakpoint before reaching the count, it stops execution.
  stepi                 Executes one machine instruction. Like the step command, it
                        single-steps into the procedure body. For assembly language
                        programs, both step and stepi tend to behave the same. As in
                        the step command, we can specify a count value. It can be
                        abbreviated as si.
  nexti                 Executes one machine instruction. Like the next command, it
                        treats a procedure call as a single machine instruction and
                        executes the whole procedure. As in the next command, we can
                        specify a count value. It can be abbreviated as ni.
Miscellaneous Commands
  set listsize n        Sets the default list size to n lines.
  show listsize         Shows the default list size.
  q                     Quits gdb.
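To see what some of the format letters listed in Table 8.3 mean for the same bits, the C sketch below renders one 32-bit word in the x, d, u, o, and t styles. It is an illustration only: the function name format_word is hypothetical, and gdb's exact output (prefixes, padding, grouping) may differ from these strings.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the 32-bit word w into out (size outsz)
   using one of the format letters x, d, u, o, or t. Returns 0 on success,
   -1 for an unsupported letter or a buffer that is too small. */
int format_word(uint32_t w, char fmt, char *out, size_t outsz)
{
    int n;
    switch (fmt) {
    case 'x': n = snprintf(out, outsz, "0x%" PRIx32, w); break;
    /* signed view of the same bits (two's complement on common compilers) */
    case 'd': n = snprintf(out, outsz, "%" PRId32, (int32_t)w); break;
    case 'u': n = snprintf(out, outsz, "%" PRIu32, w); break;
    case 'o': n = snprintf(out, outsz, "0%" PRIo32, w); break;
    case 't':                         /* binary, one digit per bit */
        if (outsz < 33)
            return -1;
        for (int i = 0; i < 32; i++)
            out[i] = ((w >> (31 - i)) & 1u) ? '1' : '0';
        out[32] = '\0';
        return 0;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For example, the word 0xFFFFFFFF prints as -1 with d but as 4294967295 with u: the bits are the same, and only the interpretation changes.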
Next we look at some examples of the x command. When gdb is invoked with Program 8.1,
we can examine the contents of the memory at prompt_msg1 by using the following x command:

(gdb) x/1sb &prompt_msg1
0x80493e4 <prompt_msg1>:   "Please input the first number: "

This command specifies the three optional parameters as n = 1, f = s, and u = b. We get the
following output when we change the n value to 3:

(gdb) x/3sb &prompt_msg1
0x80493e4 <prompt_msg1>:   "Please input the first number: "
0x8049404 <prompt_msg2>:   "Please input the second number: "
0x8049425 <sum_msg>:       "The sum is "

As you can see from the program listing, these match the three strings we declared in the procex1.asm
program.

Table 8.3 Details about the optional parameters

n   Repeat count (decimal integer)
    Specifies the number of units (in u) of memory to be displayed.
    Default value is 1.
f   Display format
    x   displays in hexadecimal
    d   displays in decimal
    u   displays in unsigned decimal
    o   displays in octal
    t   displays in binary (t for two)
    a   displays address both in hexadecimal and as an offset
        from the nearest preceding symbol
    c   displays as a character
    s   displays as a null-terminated string
    f   displays as a floating-point number
    i   displays as a machine instruction
    Initial default is x. The default changes each time x is used.
u   Unit size
    b   bytes
    h   halfwords (2 bytes)
    w   words (4 bytes)
    g   giant words (8 bytes)
    Initial default is w. The default changes when a unit is specified
    with an x command.
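The addresses in the x/3sb output follow from the string lengths: x with the s format prints bytes up to a null byte, and the next string starts at the byte after that null. The C sketch below walks a copy of the three procex1.asm strings the same way. The function name string_addresses is hypothetical, and the base address is taken from the output above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mimicking x/Ns: given memory mem whose first byte
   lives at address base, records in addr[] the start address of each of
   the first count null-terminated strings. */
void string_addresses(const char *mem, uint32_t base, int count,
                      uint32_t addr[])
{
    size_t pos = 0;
    for (int i = 0; i < count; i++) {
        addr[i] = base + (uint32_t)pos;
        pos += strlen(mem + pos) + 1;  /* skip the text and its null byte */
    }
}
```

The first string is 31 characters long, so the second one starts 32 (20H) bytes later at 0x8049404; the second is 32 characters long, so the third starts 33 (21H) bytes after that, at 0x8049425.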
Displaying Stack Frame Contents This group of display commands helps us trace the history
of procedure invocations. The backtrace command gives a list of procedure invocations at that
point. This list consists of one line for each frame of the stack. As an example, consider a
program that calls a procedure sum that calls another procedure compute, which in turn calls a
third procedure get_values. If we stop the program in the get_values procedure and issue
a backtrace command, we see the following output:

(gdb) backtrace
#0  get_values () at testex.asm:50
#1  0x080480bc in compute () at testex.asm:41
#2  0x080480a6 in sum () at testex.asm:27

This output clearly shows the invocation sequence of procedure calls with one line per invocation.
The innermost stack frame is labelled #0, the next stack frame #1, and so on. Line #0 gives the
current location; each of the other lines gives the source code line that invoked the next inner
procedure. For example, the call instruction on line 27
(in the source file testex.asm) invoked the compute procedure. The program counter value
0x080480a6 gives the return address. As we shall discuss in Chapter 11, this is the address of
the instruction following the

call compute

instruction in the sum procedure. Similarly, the call instruction on line 41 in the compute procedure invoked the get_values procedure. The return address for the get_values procedure
is 0x080480bc.

We can also restrict the number of stack frames displayed in the backtrace command by
giving an optional argument. Details on this optional argument are given in Table 8.1. For example,
bt 2 gives the innermost two stack frames as shown below:

(gdb) bt 2
#0  get_values () at testex.asm:50
#1  0x080480bc in compute () at testex.asm:41
(More stack frames follow...)

To display the outermost two stack frames, we can issue bt -2. This command produces the
following output for our example program:

(gdb) bt -2
#1  0x080480bc in compute () at testex.asm:41
#2  0x080480a6 in sum () at testex.asm:27

The frame and info frame commands allow us to examine the contents of a frame. We
can select a frame by using the frame command. For our test program, frame 1 gives the
following output:

(gdb) frame 1
#1  0x080480bc in compute () at testex.asm:41
41          call get_values

Once a frame is selected, we can issue the info frame command to look at the contents of this
stack frame. Note that if no frame is selected using the frame command, it defaults to frame 0.
The output produced for our example is shown below:

(gdb) info f
Stack level 1, frame at 0xbffffa00:
 eip = 0x80480bc in compute (testex.asm:41); saved eip 0x80480a6
 called by frame at 0xbffffa08, caller of frame at 0xbffff9f8
 source language unknown.
 Arglist at 0xbffffa00, args:
 Locals at 0xbffffa00, Previous frame's sp is 0x0
 Saved registers:
  ebp at 0xbffffa00, eip at 0xbffffa04
(gdb)

In our example, each stack frame consists of the return address (4 bytes) and the EBP value stored
by the enter 0,0 instruction on entering a procedure. The details given here indicate that the
current stack frame is at 0xbffffa00 and the previous and next frames are at 0xbffffa08 and
0xbffff9f8, respectively. It also shows where the arguments and locals are located as well as
the registers saved on the stack. In our example, only the return address (EIP) and the base pointer
(EBP) are stored on the stack, for a total of 8 bytes.
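The frame addresses reported by info frame follow from how call and enter 0,0 use the stack. The C sketch below is a simplified model, not how gdb is implemented: it simulates a small IA-32 stack in which each call pushes a return address and enter 0,0 pushes the caller's EBP and points EBP at that saved copy, and then it walks the saved-EBP chain the way a frame-pointer backtrace does. The starting ESP (0xbffffa10) and the return address into the caller of sum (0x08048085) are hypothetical; the other addresses come from the output above, and all names are hypothetical.

```c
#include <stdint.h>

#define STACK_LOW   0xbffff9f0u  /* hypothetical lowest address modelled */
#define STACK_WORDS 8            /* 32 bytes: room for three 8-byte frames */

typedef struct {
    uint32_t mem[STACK_WORDS];   /* mem[i] holds the word at STACK_LOW + 4*i */
    uint32_t esp;                /* stack pointer */
    uint32_t ebp;                /* frame (base) pointer; 0 = no caller frame */
} Stack;

static uint32_t *word_at(Stack *s, uint32_t addr)
{
    return &s->mem[(addr - STACK_LOW) / 4];
}

void stack_init(Stack *s)
{
    s->esp = STACK_LOW + 4 * STACK_WORDS;  /* 0xbffffa10: empty stack */
    s->ebp = 0;
    for (int i = 0; i < STACK_WORDS; i++)
        s->mem[i] = 0;
}

static void push(Stack *s, uint32_t value)
{
    s->esp -= 4;                 /* IA-32 stacks grow toward lower addresses */
    *word_at(s, s->esp) = value;
}

/* Models "call proc" in the caller followed by "enter 0,0" in the callee. */
void call_and_enter(Stack *s, uint32_t return_address)
{
    push(s, return_address);     /* call pushes the return address (saved EIP) */
    push(s, s->ebp);             /* enter 0,0 saves the caller's EBP ... */
    s->ebp = s->esp;             /* ... and points EBP at that saved copy */
}

/* Follows the saved-EBP chain from the innermost frame outward.
   Returns the number of frames found (at most max). */
int walk_frames(Stack *s, uint32_t frame_addr[], uint32_t saved_eip[], int max)
{
    int n = 0;
    for (uint32_t fp = s->ebp; fp != 0 && n < max; fp = *word_at(s, fp)) {
        frame_addr[n] = fp;                  /* saved EBP is stored at [fp]  */
        saved_eip[n] = *word_at(s, fp + 4);  /* return address is at [fp+4] */
        n++;
    }
    return n;
}
```

After simulating the three calls (into sum, then compute, then get_values), the walk finds frames at 0xbffff9f8, 0xbffffa00, and 0xbffffa08. These are 8 bytes apart because each frame holds only a 4-byte return address and a 4-byte saved EBP. The frame at 0xbffffa00 holds the saved EIP 0x080480a6, as in the info frame output.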


Execution Group

Breakpoint Commands Breakpoints can be inserted using the break commands. As indicated
in Table 8.2, breakpoints can be specified using the source code line number, function name, or
the address. For example, the following commands insert a breakpoint at line 20 and another at the
function sum (line 32) in the procex1.asm program:

(gdb) b 20
Breakpoint 1 at 0x80480b0: file procex1.asm, line 20.
(gdb) b sum
Breakpoint 2 at 0x80480db: file procex1.asm, line 32.
(gdb)

Note that each breakpoint is assigned a sequence number in the order we establish them.
We can use info breakpoints (or simply info b) to get a summary of the breakpoints
and their status. For example, after establishing the above two breakpoints, if we issue the info b
command, we get the following output:

(gdb) info b
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080480b0 procex1.asm:20
2   breakpoint     keep y   0x080480db procex1.asm:32
(gdb)

The Disp (Disposition) column indicates the action to be taken (keep, disable, or delete)
when the breakpoint is hit. By default, all breakpoints are of 'keep' type, as in our example here. The Enb column
indicates whether the breakpoint is enabled or disabled. A 'y' in this column indicates that the
breakpoint is enabled.
We can use the tbreak command to set a breakpoint with 'delete' disposition as shown below:

(gdb) tbreak 22
Breakpoint 3 at 0x80480c1: file procex1.asm, line 22.
(gdb) info b
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080480b0 procex1.asm:20
2   breakpoint     keep y   0x080480db procex1.asm:32
3   breakpoint     del  y   0x080480c1 procex1.asm:22
(gdb)

We can use the enable and disable commands to enable or disable the breakpoints. The
following example disables breakpoint 2:

(gdb) disable 2
(gdb) info b
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080480b0 procex1.asm:20
2   breakpoint     keep n   0x080480db procex1.asm:32
3   breakpoint     del  y   0x080480c1 procex1.asm:22
(gdb)

If we want to enable this breakpoint, we do so by the following command:

(gdb) enable 2


We use the enable once command to set a breakpoint with 'disable' disposition as shown
below:

(gdb) enable once 2
(gdb) info b
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080480b0 procex1.asm:20
2   breakpoint     dis  y   0x080480db procex1.asm:32
3   breakpoint     del  y   0x080480c1 procex1.asm:22
(gdb)
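The three dispositions seen in the Disp column (keep, dis, and del) differ only in what happens after the breakpoint is hit. The C sketch below is a toy model of that behavior as described in this section, not gdb's internal data structure; all type and function names are hypothetical.

```c
/* Hypothetical model of the Disp and Enb columns of "info b". */
typedef enum { DISP_KEEP, DISP_DISABLE, DISP_DELETE } Disposition;

typedef struct {
    int number;        /* the "Num" column */
    int enabled;       /* the "Enb" column: 1 = y, 0 = n */
    int deleted;       /* 1 once a tbreak-style breakpoint has been removed */
    Disposition disp;  /* keep (break), dis (enable once), del (tbreak) */
} Breakpoint;

/* Called when execution reaches the breakpoint's address. Returns 1 if
   the program stops there and 0 if the breakpoint is ignored; after a
   stop, the breakpoint's disposition is applied. */
int breakpoint_hit(Breakpoint *bp)
{
    if (bp->deleted || !bp->enabled)
        return 0;
    switch (bp->disp) {
    case DISP_KEEP:                      /* stays enabled */
        break;
    case DISP_DISABLE:                   /* "enable once": disabled after one hit */
        bp->enabled = 0;
        break;
    case DISP_DELETE:                    /* "tbreak": deleted after one hit */
        bp->deleted = 1;
        break;
    }
    return 1;
}
```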

Program Execution Commands The program execution command run is used to start the execution of a program. To be able to debug the program, breakpoints must be established before issuing
the run command.
The continue command resumes program execution from the last stop point (typically due
to a breakpoint).
Single-Stepping Commands

The gdb debugger provides two basic single-stepping commands: step and next. The step command executes one source line at a time. In the case of a procedure call, it traces procedure execution in single-step mode. The next command is similar to the step command except that it does not single-step through the procedure body; instead, it executes the entire procedure. Both step and next commands can take a count argument as shown in Table 8.2 on page 173. This table also gives details on the machine-instruction versions of these commands (see the stepi and nexti commands).
Miscellaneous Group

The miscellaneous commands in Table 8.2 are used to change the list size and to exit gdb.
An Example

A sample gdb session on procex1.asm is shown in Program 8.2. The l 20 (list) command on line 9 displays the source code centered on source code line 20. Before issuing the r command on line 22, we insert a breakpoint at source code line 20 using the break command on line 20. The run command executes the program until it hits line 20. It then stops and prints breakpoint information. Note that we entered the two input numbers (1234 and 5678) before hitting the breakpoint.
To check that these two input numbers have been read into the ECX and EDX registers, we issue the info registers command and specify these two registers (see line 28). The output of this command shows that these registers have indeed received the two input numbers.
We run the sum procedure in single-step mode (see the commands on lines 31, 33, and 35). To see if the result in EAX is the sum of the two input values, we display the contents of the three registers (lines 38-40) using the info registers command on line 37. After verifying the result, we let the program continue its execution using the continue command on line 41. Finally, on line 46, we use the quit command to exit gdb.


Assembly Language Programming in Linux

Program 8.2 A sample gdb session
 1: GNU gdb Red Hat Linux (5.2.1-4)
 2: Copyright 2002 Free Software Foundation, Inc.
 3: GDB is free software, covered by the GNU General Public License, and
 4: you are welcome to change it and/or distribute copies of it under
 5: certain conditions. Type "show copying" to see the conditions.
 6: There is absolutely no warranty for GDB.
 7: Type "show warranty" for details.
 8: This GDB was configured as "i386-redhat-linux"...
 9: (gdb) l 20
10: 15          GetInt  CX              ;CX = first number
11: 16
12: 17          PutStr  prompt_msg2     ;request second number
13: 18          GetInt  DX              ;DX = second number
14: 19
15: 20          call    sum             ;returns sum in AX
16: 21          PutStr  sum_msg         ;display sum
17: 22          PutInt  AX
18: 23          nwln
19: 24  done:
20: (gdb) break 20
21: Breakpoint 1 at 0x80480b0: file procex1.asm, line 20.
22: (gdb) r
23: Starting program: /mnt/hgfs/winXP_D/temp/gdb_test/procex1
24: Please input the first number: 1234
25: Please input the second number: 5678
26: Breakpoint 1, _start () at procex1.asm:20
27: 20          call    sum             ;returns sum in AX
28: (gdb) info registers ecx edx
29: ecx            0x4d2    1234
30: edx            0x162e   5678
31: (gdb) si
32: 32          mov     AX,CX           ;sum = first number
33: (gdb) si
34: 33          add     AX,DX           ;sum = sum + second number
35: (gdb) si
36: 34          ret
37: (gdb) info registers eax ecx edx
38: eax            0x1b00   6912
39: ecx            0x4d2    1234
40: edx            0x162e   5678
41: (gdb) c
42: Continuing.
43: The sum is 6912
44:
45: Program exited normally.
46: (gdb) q


Figure 8.1 DDD window at the start of procex1 program. (Labeled regions in the screenshot: Menu bar, Debugger console, Status line.)

Data Display Debugger
The Data Display Debugger (DDD) acts as a front-end to a command-line debugger. DDD supports several command-line debuggers, including GDB, DBX, and JDB. Our interest here is in using DDD as a front-end for the GDB debugger discussed in the last section.
If you installed your Linux following the directions given in Chapter 5, DDD is already installed. However, if you did not install it as part of the Linux installation, or if you want the latest version, you can install it using the Linux package manager. The DDD Web page also has details on the installation process (see the Web Resources section at the end of the chapter).
Because DDD is a front-end to GDB, we prepare our program exactly as we do for GDB (see the "Preparing Your Program" section on page 169). We can invoke DDD on the procex1 executable by
ddd procex1

Figure 8.1 shows the initial screen that appears after invoking DDD. The screen consists of the Source Window that displays the source program, the Debugger Console, the Status Line, the Command Tool window, the Menu Bar, and the Tool Bar. The debugger console acts as the program's input/output console to display messages, to receive input, and so on.


Figure 8.2 DDD window at the breakpoint on line 20. This screenshot also shows the machine code window and the source code line numbers.

We can insert a breakpoint using the Tool Bar. For example, to insert a breakpoint on line 20, place the cursor to the left of line 20 and click the Break button (red stop sign) on the Tool Bar. This inserts a breakpoint on line 20, which is indicated by the red stop sign on line 20 as shown in Figure 8.2. This figure also shows the source code line numbers and the Machine Code window. Both of these can be selected from the Source pull-down menu in the Menu Bar.
Once this breakpoint is inserted, we can run the program by clicking Run in the Command Tool. The big arrow next to the stop sign (on line 20) indicates that program execution stopped at that line. Before reaching the breakpoint on line 20, the program reads the two input numbers, as shown in the Debugger Console (see Figure 8.2). We can get information on the breakpoints set in the program by selecting Breakpoints... in the Source pull-down menu. For our example program, it gives details on the single breakpoint we set on line 20 (see Figure 8.3). The details provided in this window are the same as those discussed in the last section. The breakpoint information also includes the number of hits, as shown in Figure 8.3.
All the execution commands of gdb discussed in the last section are available in the Program pull-down menu (see Figure 8.4). Figure 8.5 shows the screen after single stepping through the sum procedure. The program is stopped at the ret instruction on line 34. To verify the functionality of the procedure, we can display the contents of the registers. This is done by selecting Registers... in the Status pull-down menu. The contents of the registers, shown in Figure 8.6, clearly indicate that the sum of the two input numbers (in the ECX and EDX registers) is in the EAX register.

Figure 8.3 Breakpoints window.

Figure 8.4 Details of the Program pull-down menu. (Visible menu items: Run..., Run Again, Run in Execution Window, Step, Step Instruction, Next Instruction, Until, Finish, Continue, Continue Without Signal, Kill, Interrupt.)
The examination commands of gdb are available under the Data pull-down menu. A sample memory examination window is shown in Figure 8.7. This window allows us to specify the memory location, the format to be used to display the contents, the size of the data, and the number of data items to be examined. In the window of Figure 8.7, we specified &prompt_msg1 as the location and string as the output format. The size is given as bytes, and the number of strings to be examined is set to 1.
By clicking Display, the contents are displayed in the Data Window that appears above the Source Window, as shown in Figure 8.8. We can pick the windows we want to see by selecting them from the View pull-down menu. The View menu lets us select any of the four windows: Debugger Console Window, Machine Code Window, Source Window, and Data Window.

Figure 8.5 DDD window after single stepping from the breakpoint on line 20. (The Debugger Console shows three stepi commands issued after Breakpoint 1.)

Figure 8.6 Register window after the single stepping shown in Figure 8.5. (eax = 0x1b00 = 6912, ecx = 0x4d2 = 1234, edx = 0x162e = 5678.)

Figure 8.7 Memory examination window to display three strings.

Figure 8.8 Memory examination window.
We can also choose to display the contents in the Debugger Console Window by using the Print command. Figure 8.9 shows how we can display the three strings in our program in the Console window. This Examine Memory window is similar to the one shown in Figure 8.7, except that we set the number of strings to be displayed to 3. The result of executing this x command is shown in Figure 8.10, which shows the three strings in our program.
Both gdb and DDD provide several other features that are useful in debugging programs. Our intent here is to introduce some of the basic features of these debuggers. More details on these debuggers are available from their Web sites; we provide pointers to these Web sites at the end of this chapter.
Figure 8.9 Memory examination window.

(gdb) x /3sb &prompt_msg1
0x80493e4:   "Please input the first number: "
0x8049404:   "Please input the second number: "
0x8049425:   "The sum is "

Figure 8.10 CPU window after executing Goto... command.

Summary
We started this chapter with a brief discussion of the basic debugging techniques. Since assembly
language is a low-level programming language, debugging tends to be even more tedious than
debugging a high-level language program. It is, therefore, imperative to follow good programming
practices in order to simplify debugging of assembly language programs.
There are several tools available for debugging programs. We discussed two debuggers, gdb and DDD, in this chapter. While gdb is a command-line debugger, DDD provides a graphical front-end to it. The best way to learn to use these debuggers is through hands-on experience.

Web Resources
Details on gdb are available from http://www.gnu.org/software/gdb. The GDB User Manual is available from http://www.gnu.org/software/gdb/documentation.
Details on DDD are available from http://www.gnu.org/software/ddd. The DDD Manual is available from http://www.gnu.org/manual/ddd/.

PART V
Assembly Language

9
A First Look at Assembly Language
The objective of this chapter is to introduce the basics of assembly language. Assembly language statements can either instruct the processor to perform a task or direct the assembler during the assembly process. The latter statements are called assembler directives. We start this chapter with a discussion of the format and types of assembly language statements. A third type of assembly language statement, called macros, is covered in the next chapter.
Assemblers provide several directives to reserve storage space for variables. These directives
are discussed in detail. The instructions of the processor consist of an operation code to indicate
the type of operation to be performed, and the specification of the data required (also called the
addressing mode) by the operation. Here we describe a few basic addressing modes. A thorough
discussion of this topic is in Chapter 13.
The IA-32 instruction set can be divided into several groups of instructions. This chapter
provides an overview of some of the instructions while the next chapter gives details on some
more instructions. Later chapters discuss these instructions in more detail. The chapter concludes
with a summary.

Introduction
Assembly language programs are created out of three different classes of statements. Statements in the first class tell the processor what to do. These statements are called executable instructions, or instructions for short. Each executable instruction consists of an operation code (opcode for short). Executable instructions cause the assembler to generate machine language instructions. As stated in Chapter 1, each executable statement typically generates one machine language instruction.
The second class of statements provides information to the assembler on various aspects of the assembly process. These statements are called assembler directives or pseudo-ops. Assembler directives are nonexecutable and do not generate any machine language instructions.
The last class of statements, called macros, provides a sophisticated text substitution mechanism. Macros are discussed in detail in the next chapter.


Assembly language statements are entered one per line in the source file. All three classes of assembly language statements use the same format:
[label]    mnemonic    [operands]    [;comment]

The fields in square brackets are optional in some statements. Because of this format, it is common practice to align the fields to aid the readability of assembly language programs. The assembler does not care how many spaces separate the fields.
Now let us look at some sample assembly language statements.
repeat:    inc    result    ;increment result by 1

The label repeat can be used to refer to this particular statement. The mnemonic inc indicates that an increment operation is to be performed on the data stored in memory at the location identified by result. Certain reserved words that have special meaning to the assembler are not allowed as labels. These include mnemonics such as inc.
The fields in a statement must be separated by at least one space or tab character. More spaces
and tabs can be used at the programmer's discretion, but the assembler ignores them.
It is a good programming practice to use blank lines and spaces to improve the readability of
assembly language programs. As a result, you rarely see in this book a statement containing all
four fields in a single line. In particular, we almost always write labels on a separate line unless
doing so destroys the program structure. Thus, our first example assembly language statement is
written on two lines as
repeat:
        inc     result          ;increment result by 1

The NASM assembler provides several directives to reserve space for variables. These directives are discussed in the next section. Assembly language instructions typically require one or
more operands. These operands can be at different locations. There are several different ways we
can specify the location of the operands. These are referred to as the addressing modes. We introduce four simple addressing modes in this chapter. These addressing modes are sufficient to write
simple but meaningful assembly language programs. Chapter 13 gives complete details on the
addressing modes available in 16- and 32-bit modes. Following our discussion of the addressing
modes, we give an overview of some of the instructions available in the IA-32 instruction set.
Starting with this chapter, we give several programming examples in each chapter. We give
a simple example in the "Our First Example" section. A later "Illustrative Examples" section
gives more examples. To understand the structure of these programs, you need to understand the
information presented in Chapter 7. That chapter gives details about the structure of the assembly
language programs presented in this book, the I/O routines we use, and how you can assemble and
link them to create an executable file. If you have skipped that chapter, it is a good time to go back
and review the material presented there.

Data Allocation
In high-level languages, allocation of storage space for variables is done indirectly by specifying
the data types of each variable used in the program. For example, in C, the following declarations
allocate different amounts of storage space for each variable.

Chapter 9 • A First Look at Assembly Language

char     response;   /* allocates 1 byte */
int      value;      /* allocates 4 bytes */
float    total;      /* allocates 4 bytes */
double   temp;       /* allocates 8 bytes */

These variable declarations not only specify the amount of storage required, but also indicate how the stored bit pattern should be interpreted. As an example, consider the following two statements in C:
unsigned    value_1;
int         value_2;

Both variables use four bytes of storage. However, the bit pattern stored in them would be interpreted differently. For instance, the bit pattern (8FF08DB9H)
1000 1111 1111 0000 1000 1101 1011 1001

stored in the four bytes allocated for value_1 is interpreted as representing +2.4149 × 10^9, while the same bit pattern stored in value_2 would be interpreted as -1.88006 × 10^9.
In the assembly language, allocation of storage space is done by the define assembler directive.
The define directive can be used to reserve and initialize one or more bytes. However, no interpretation (as in a C variable declaration) is attached to the contents of these bytes. It is entirely up to
the program to interpret the bit pattern stored in the space reserved for data.
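The C sketch below makes the point concrete. It is our illustration and not one of the book's programs. It reads the same 32-bit pattern once as an unsigned integer and once as a signed integer. It assumes a C99 compiler with <stdint.h>, and the names as_unsigned and as_signed are hypothetical. The code uses memcpy so that the reinterpretation copies raw bits instead of relying on an implementation-defined value conversion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the pattern read as an unsigned 32-bit integer
   is simply the pattern itself. */
uint32_t as_unsigned(uint32_t bits)
{
    return bits;
}

/* Hypothetical helper: the same pattern read as a 2's complement
   signed 32-bit integer. memcpy copies the raw bits unchanged. */
int32_t as_signed(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For the pattern 8FF08DB9H, as_unsigned returns 2414906809 (about +2.4149 × 10^9) and as_signed returns -1880060487 (about -1.88006 × 10^9). These are the two interpretations given above for value_1 and value_2.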
The general format of the storage allocation statement for initialized data is
[variable-name]    define-directive    initial-value [,initial-value],...

The square brackets indicate optional items. The variable-name is used to identify the storage space allocated. The assembler associates an offset value with each variable name defined in the data segment. Note that no colon (:) follows the variable name (unlike a label identifying an executable statement).
The define directive takes one of the five basic forms:
DB    Define Byte          ; allocates 1 byte
DW    Define Word          ; allocates 2 bytes
DD    Define Doubleword    ; allocates 4 bytes
DQ    Define Quadword      ; allocates 8 bytes
DT    Define Ten Bytes     ; allocates 10 bytes

Let us look at some examples now.
sorted    DB    'y'

This statement allocates a single byte of storage and initializes it to the character y. Our assembly language program can refer to this character location by its name, sorted. We can also use numbers to initialize. For example,
sorted    DB    79H
or
sorted    DB    1111001B
is equivalent to
sorted    DB    'y'

Note that the ASCII value for y is 79H. The following data definition statement allocates two bytes of contiguous storage and initializes them to 25159.
value    DW    25159

The decimal value 25159 is automatically converted to its 16-bit binary equivalent (6247H). Since the processor uses little-endian byte ordering (see Chapter 3), this 16-bit number is stored in memory as
address:    x    x+1
contents:   47   62

You can also use negative values, as in the following example:
balance    DW    -29255

Since the 2's complement representation is used to store negative values, -29,255 is converted to 8DB9H and is stored as
address:    x    x+1
contents:   B9   8D
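Here is a minimal C sketch to check this conversion, assuming C99 <stdint.h>. C defines the conversion of a negative int to uint16_t as reduction modulo 2^16, and that reduction yields exactly the 16-bit 2's complement pattern. The helpers low_byte and high_byte are hypothetical names. They pick out the bytes that little-endian ordering places at addresses x and x+1.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit 2's complement pattern of v.
   Conversion to an unsigned type reduces the value modulo 2^16. */
uint16_t twos_complement16(int v)
{
    return (uint16_t)v;
}

/* Byte stored at address x (low-order byte comes first). */
uint8_t low_byte(uint16_t w)
{
    return (uint8_t)(w & 0xFFu);
}

/* Byte stored at address x+1 (high-order byte). */
uint8_t high_byte(uint16_t w)
{
    return (uint8_t)(w >> 8);
}
```

For example, twos_complement16(-29255) gives 8DB9H. Its low byte B9H is stored at x and its high byte 8DH at x+1.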

The statement
total    DD    542803535

would allocate four contiguous bytes of memory and initialize them to 542803535 (205A864FH), as shown below:
address:    x    x+1    x+2    x+3
contents:   4F   86     5A     20

Short and long floating-point numbers are represented using 32 or 64 bits, respectively (see Appendix A for details). We can use the DD and DQ directives to assign real numbers, as shown in the following examples:
float1    DD    1.234
real2     DQ    123.456
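To see the actual bits that DD and DQ store for real numbers, we can copy a C float or double into an unsigned integer of the same size. This sketch assumes a C11 compiler and IEEE 754 single- and double-precision formats for float and double, which is the case with gcc on IA-32 Linux. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The sketch assumes 32-bit float and 64-bit double (IEEE 754). */
_Static_assert(sizeof(float) == 4, "float must be 32 bits");
_Static_assert(sizeof(double) == 8, "double must be 64 bits");

/* Hypothetical helper: raw 32-bit pattern of a short real,
   i.e., the four bytes a DD definition would hold. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical helper: raw 64-bit pattern of a long real,
   i.e., the eight bytes a DQ definition would hold. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Under these assumptions, float_bits(1.234f) yields 3F9DF3B6H. NASM's DD 1.234 should encode the same single-precision value, stored in little-endian order like any other doubleword.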

Uninitialized Data

To reserve space for uninitialized data, we use RESB, RESW, and so on. Each reserve directive
takes a single operand that specifies the number of units of space (bytes, words,...) to be reserved.
There is a reserve directive for each define directive.
RESB    Reserve a Byte
RESW    Reserve a Word
RESD    Reserve a Doubleword
RESQ    Reserve a Quadword
REST    Reserve Ten Bytes


Here are some examples:
response    RESB    1
buffer      RESW    100
total       RESD    1

The first statement reserves a byte, while the second reserves space for an array of 100 words. The last statement reserves space for a doubleword.
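The size arithmetic behind the define and reserve directives fits in a few lines of C. This sketch uses hypothetical function names. It takes the unit size from the directive's last letter. That works because each define directive and its reserve counterpart end in the same letter (DB/RESB, DW/RESW, DD/RESD, DQ/RESQ, DT/REST). Only uppercase names are recognized.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: unit size in bytes for a define (DB, DW, ...)
   or reserve (RESB, RESW, ...) directive, keyed on its last letter.
   Returns 0 for an unrecognized name. */
size_t unit_size(const char *directive)
{
    size_t len = strlen(directive);
    if (len == 0)
        return 0;
    switch (directive[len - 1]) {
    case 'B': return 1;   /* byte */
    case 'W': return 2;   /* word */
    case 'D': return 4;   /* doubleword */
    case 'Q': return 8;   /* quadword */
    case 'T': return 10;  /* ten bytes */
    default:  return 0;
    }
}

/* Hypothetical helper: total bytes reserved by a statement such as
   "buffer RESW 100". */
size_t reserved_bytes(const char *directive, size_t count)
{
    return unit_size(directive) * count;
}
```

For example, reserved_bytes("RESW", 100) returns 200, the number of bytes set aside for buffer above.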
Multiple Definitions
Assembly language programs typically contain several data definition statements. For example,
look at the following assembly language program fragment:
sort     DB    'y'           ; ASCII of y = 79H
value    DW    25159         ; 25159D = 6247H
total    DD    542803535     ; 542803535D = 205A864FH

When several data definition statements are used as above, the assembler allocates contiguous memory for these variables. The memory layout for these three variables is
address:    x    x+1  x+2  x+3  x+4  x+5  x+6
contents:   79   47   62   4F   86   5A   20
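The contiguous, little-endian layout above can be reproduced by a small C function that appends each value's bytes to a buffer, low-order byte first. The buffer stands in for the data segment. This is an illustrative model only, not a description of how NASM is implemented. The names struct definition and layout are hypothetical, and the sketch assumes C99 <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one data definition, i.e., its initial value and
   its size in bytes (1 for DB, 2 for DW, 4 for DD). */
struct definition {
    uint32_t value;
    size_t   size;
};

/* Hypothetical helper: lay the definitions out back to back in source
   order, each in little-endian byte order, and return the number of
   bytes written to seg. */
size_t layout(const struct definition *defs, size_t n, uint8_t *seg)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = defs[i].value;
        for (size_t b = 0; b < defs[i].size; b++) {
            seg[offset++] = (uint8_t)(v & 0xFFu);  /* low byte first */
            v >>= 8;
        }
    }
    return offset;
}
```

Passing the three definitions (79H, 1), (25159, 2), and (542803535, 4) fills seven bytes with 79 47 62 4F 86 5A 20, which is the layout shown above.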

Multiple data definitions can be abbreviated. For example, the following sequence of eight DB directives
message    DB    'W'
           DB    'E'
           DB    'L'
           DB    'C'
           DB    'O'
           DB    'M'
           DB    'E'
           DB    '!'

can be abbreviated as
message    DB    'W','E','L','C','O','M','E','!'

or even more compactly as
message    DB    'WELCOME!'

Here is another example showing how abbreviated forms simplify data definitions. The definition
message    DB    'B'
           DB    'y'
           DB    'e'
           DB    0DH
           DB    0AH

can be written as
message    DB    'Bye',0DH,0AH
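NASM differs from C on one point here. It does not append a terminating zero byte to a string operand, so the definition above occupies exactly five bytes. If a zero byte is wanted, it must be written explicitly, as in DB 'Bye',0DH,0AH,0. The C declarations below contrast the two cases. The array names are illustrative, and '\r' and '\n' are assumed to be the ASCII codes 0DH and 0AH.

```c
/* Byte-for-byte counterpart of:  message DB 'Bye',0DH,0AH
   Exactly five bytes; nothing is appended. */
static const unsigned char message[] = { 'B', 'y', 'e', 0x0D, 0x0A };

/* A C string literal with the same characters gets a terminating
   zero byte appended by the compiler, so it occupies six bytes. */
static const char c_message[] = "Bye\r\n";
```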

Similar abbreviated forms can be used with the other define directives. For instance, a marks array of size 8 can be defined and initialized to zero by
marks    DW    0
         DW    0
         DW    0
         DW    0
         DW    0
         DW    0
         DW    0
         DW    0

which can be abbreviated as
marks    DW    0,0,0,0,0,0,0,0

The initialization values of define directives can also be expressions, as shown in the following example.
max_marks    DW    7*25

This statement is equivalent to
max_marks    DW    175

The assembler evaluates such expressions at assembly time and assigns the resulting value. Using expressions to specify initial values is generally not preferred, because it can affect the readability of programs. However, there are certain situations where using an expression actually helps clarify the code. In our example, if max_marks represents the sum of seven assignment marks where each assignment is marked out of 25, it is preferable to use the expression 7*25 rather than 175.
Multiple Initializations
In the previous example, if the class size is 90, it is inconvenient to define the array as described. The TIMES directive allows multiple initializations to the same value. Using TIMES, the marks array can be defined as
marks    TIMES 90 DW 0

The TIMES directive is useful in defining arrays and tables.
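In C terms, the TIMES definition corresponds to a zero-initialized array of 90 16-bit elements. The sketch below assumes <stdint.h> for int16_t, and CLASS_SIZE is a hypothetical name. The array therefore occupies 90 × 2 = 180 bytes, the same amount that TIMES 90 DW 0 allocates.

```c
#include <stdint.h>

/* C counterpart of:  marks TIMES 90 DW 0
   CLASS_SIZE is a hypothetical name for the class size. */
#define CLASS_SIZE 90

/* 90 words (16 bits each), all initialized to zero: 180 bytes. */
static int16_t marks[CLASS_SIZE] = { 0 };
```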
Symbol Table
When we allocate storage space using a data definition directive, we usually associate a symbolic name with it. During the assembly process, the assembler assigns an offset value to each symbolic name. For example, consider the following data definition statements:


.DATA
value      DW           0
sum        DD           0
marks      TIMES 10 DW  0
message    DB           'The grade is:',0
char1      DB           ?

As noted before, the assembler assigns contiguous memory space for the variables. The assembler also uses the same ordering of variables that is present in the source code. Finding the offset value of a variable is then a simple matter of counting the number of bytes allocated to all the variables preceding it. For example, the offset value of marks is 6 because value and sum are allocated 2 and 4 bytes, respectively. The symbol table for the data segment is shown below:
Name       Offset
value        0
sum          2
marks        6
message     26
char1       40
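The offset calculation is a running sum of sizes in source order. Here is a minimal C sketch that uses the sizes from the definitions above: value (DW, 2 bytes), sum (DD, 4), marks (TIMES 10 DW, 20), message (13 characters plus the 0 byte, 14), and char1 (DB, 1). The function name assign_offsets is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: compute symbol-table offsets the way the text
   describes. Each variable's offset is the total size of all
   variables defined before it, in source order. */
void assign_offsets(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = next;
        next += sizes[i];
    }
}
```

With sizes {2, 4, 20, 14, 1}, the function produces offsets 0, 2, 6, 26, and 40, matching the symbol table.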

Where Are the Operands
Most assembly language instructions require operands. There are several ways to specify the
location of the operands. These are called the addressing modes. This section is a brief overview of
some of the addressing modes required to do basic assembly language programming. A complete
discussion is given in Chapter 13.
An operand required by an instruction may be in any one of the following locations:
• in a register internal to the processor;
• in the instruction itself;
• in main memory (usually in the data segment);
• at an I/O port (discussed in Chapter 20).

Specification of an operand that is in a register is called register addressing mode, while immediate addressing mode refers to specifying an operand that is part of the instruction. Several
addressing modes are available to specify the location of an operand residing in memory. The motivation for providing these addressing modes comes from the perceived need to efficiently support
high-level language constructs (see Chapter 13 for details).
Register Addressing Mode
In this addressing mode, the processor's internal registers contain the data to be manipulated by the instruction. For example, the instruction
mov    EAX,EBX

requires two operands, and both are in processor registers. The syntax of the mov instruction is
mov    destination,source


The mov instruction copies the contents of source to destination. The contents of source are not destroyed. Thus,
mov    EAX,EBX

copies the contents of the EBX register into the EAX register. Note that the original contents of EAX are lost. In this example, the mov instruction operates on 32-bit data. However, it can also work on 16- and 8-bit data, as shown below:
mov    BX,CX
mov    AL,CL

Register addressing mode is the most efficient way of specifying operands because the operands are within the processor and, therefore, no memory access is required.
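The 16- and 8-bit examples work because these registers are parts of the 32-bit ones. BX is the low 16 bits of EBX and AL is the low 8 bits of EAX. As a result, mov BX,CX and mov AL,CL replace only those bits and leave the rest of EBX and EAX unchanged. The C sketch below models registers as plain uint32_t values and ignores all other processor state. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of "mov AL,CL": only the low 8 bits of EAX
   are replaced; bits 8-31 keep their old values. */
uint32_t mov_al(uint32_t eax, uint8_t cl)
{
    return (eax & 0xFFFFFF00u) | cl;
}

/* Hypothetical model of "mov BX,CX": only the low 16 bits of EBX
   are replaced; bits 16-31 keep their old values. */
uint32_t mov_bx(uint32_t ebx, uint16_t cx)
{
    return (ebx & 0xFFFF0000u) | cx;
}
```

For example, if EAX holds 12345678H and CL holds ABH, then after mov AL,CL the model gives EAX = 123456ABH.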
Immediate Addressing Mode

In this addressing mode, data is specified as part of the instruction itself. As a result, even though the data is in memory, it is located in the code segment, not in the data segment. This addressing mode is typically used in instructions that require at least two data items to manipulate. In this case, this mode can specify only the source operand, and immediate data is always a constant, either given directly or via the EQU directive (discussed in the next chapter). Thus, instructions typically use another addressing mode to specify the destination operand.
In the following example,
mov    AL,75

the source operand 75 is specified in the immediate addressing mode and the destination operand is specified in the register addressing mode. Such instructions are said to use mixed-mode addressing.
The remainder of the addressing modes we discuss here deal with operands that are located in
the data segment. These are called the memory addressing modes. We discuss two memory addressing modes here: direct and indirect. The remaining memory addressing modes are discussed
in Chapter 13.
Direct Addressing Mode

Operands specified in a memory addressing mode require access to main memory, usually to the data segment. As a result, they tend to be slower than either of the two previous addressing modes.
Recall that to locate a data item in the data segment, we need two components: the segment start address and an offset value within the segment. The start address of the segment is typically found in the DS register. Thus, the various memory addressing modes differ in the way the offset value of the data is specified. The offset value is often called the effective address.
In the direct addressing mode, the offset value is specified directly as part of the instruction. In an assembly language program, this value is usually indicated by the variable name of the data item. The assembler translates this name into its associated offset value during the assembly process. To facilitate this translation, the assembler maintains a symbol table. As discussed before, the symbol table stores the offset values of all variables in the assembly language program.


This addressing mode is the simplest of all the memory addressing modes. A restriction associated with the memory addressing modes is that these can be used to specify only one operand.
The examples that follow assume the following data definition statements in the program.
response    DB    'Y'             ; allocates a byte, initializes to Y
table1      TIMES 20 DW 0         ; allocates 40 bytes, initializes to 0
name1       DB    'Jim Ray'       ; 7 bytes are initialized to Jim Ray

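Here are some examples of the mov instruction:
mov    AL,[response]          ; copies Y into AL register
mov    BYTE [response],'N'    ; N is written into response
mov    BYTE [name1],'K'       ; write K as the first character of name1
mov    WORD [table1],56       ; 56 is written in the first element

This last statement is equivalent to table1[0] = 56 in the C language. The BYTE and WORD size specifiers are needed because, when the source is an immediate value and the destination is a memory location, NASM cannot infer the operand size.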
Indirect Addressing Mode

The direct addressing mode can be used in a straightforward way but is limited to accessing simple variables. For example, it is not useful in accessing the second element of table1 as in the following C statement:
table1[1] = 99

The indirect addressing mode remedies this deficiency. In this addressing mode, the offset or effective address of the data is in one of the general registers. For this reason, this addressing mode is sometimes referred to as the register indirect addressing mode.
The indirect addressing mode is not required for variables having only a single element (e.g., response). But for variables like table1 that contain several elements, the starting address of the data structure can be loaded into, say, the EBX register, and then EBX acts as a pointer to an element in table1. By manipulating the contents of the EBX register, we can access different elements of table1.
The following code assigns 100 to the first element and 99 to the second element of table1. Note that EBX is incremented by 2 because each element of table1 requires two bytes.
mov
mov
add
mov

EBX,tablel
[EBX],100
EBX, 2
[EBX],99

copy address of tablel to EBX
tablel[0]=100
EBX = EBX + 2
tablel[l] = 99

Chapter 13 discusses other memory addressing modes that can perform this task more efficiently.
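The four instructions above can be mirrored in a minimal Python sketch. It assumes memory is a flat byte array, words are stored little-endian as on IA-32, and `table1` starts at a made-up offset (`TABLE1 = 1`, hypothetical). The sketch shows why EBX must advance by 2 to reach the next word.

```python
# Minimal model of register indirect addressing. Assumptions: memory is a
# flat bytearray, words are stored little-endian (as on IA-32), and
# TABLE1 is a hypothetical offset chosen only for illustration.

memory = bytearray(64)
TABLE1 = 1  # hypothetical offset of table1

def store_word(mem, address, value):
    """Write a 16-bit value at address, low byte first (like mov WORD [EBX],value)."""
    mem[address:address + 2] = (value & 0xFFFF).to_bytes(2, "little")

def load_word(mem, address):
    """Read a 16-bit little-endian value from address."""
    return int.from_bytes(mem[address:address + 2], "little")

ebx = TABLE1                   # mov EBX,table1
store_word(memory, ebx, 100)   # mov WORD [EBX],100  -> table1[0] = 100
ebx += 2                       # add EBX,2 (each element is 2 bytes)
store_word(memory, ebx, 99)    # mov WORD [EBX],99   -> table1[1] = 99

print(load_word(memory, TABLE1), load_word(memory, TABLE1 + 2))  # 100 99
```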
The effective address can also be loaded into a register by the lea (load effective address)
instruction. The syntax of this instruction is

        lea     register,source

Thus,

        lea     EBX,[table1]

can be used in place of the

        mov     EBX,table1

instruction.

Assembly Language Programming in Linux

The difference is that lea computes the offset value at run time, whereas the mov
version resolves the offset value at assembly time. For this reason, we will try to use the latter
whenever possible. However, lea offers more flexibility as to the types of source operands.
For example, we can write

        lea     EBX,[array+ESI]

to load EBX with the address of an element of array whose index is in the ESI register. However,
we cannot write

        mov     EBX,array+ESI           ; illegal

as the contents of ESI are not known at assembly time.

Overview of Assembly Language Instructions
This section briefly reviews some of the remaining assembly language instructions. The discussion
presented here would provide sufficient exposure to the assembly language so that you can write
meaningful assembly language programs.
The MOV Instruction

We have already introduced the mov instruction, which requires two operands and has the syntax

        mov     destination,source

The data is copied from source to destination and the source operand remains unchanged. Both operands should be of the same size. The mov instruction can take one of the
following five forms:

        mov     register,register
        mov     register,immediate
        mov     memory,immediate
        mov     register,memory
        mov     memory,register

There is no move instruction to transfer data from memory to memory. However, as we will
see in Chapter 17, memory-to-memory data transfer is possible using the string instructions.
Here are some example mov statements:

        mov     AL,[response]
        mov     DX,[table1]
        mov     BYTE [response],'N'
        mov     BYTE [name1+4],'K'


Ambiguous Moves Moving an immediate value into memory sometimes causes ambiguity as to the
type of operand. For example, in the statements

        mov     EBX,table1
        mov     ESI,name1
        mov     [EBX],100
        mov     [ESI],100

it is not clear whether a word (2 bytes) or a byte equivalent of 100 is to be written
in the memory. We can clarify this ambiguity by using a type specifier. For example, we can use the
WORD type specifier to identify a word operation and BYTE for a byte operation. Using the type
specifiers, we can write

        mov     WORD [EBX],100
        mov     BYTE [ESI],100

We can also write these statements as

        mov     [EBX],WORD 100
        mov     [ESI],BYTE 100

Some of the type specifiers available are given below:

        Type specifier      Bytes addressed
        BYTE                       1
        WORD                       2
        DWORD                      4
        QWORD                      8
        TBYTE                     10
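The type specifier decides how many bytes a store of an immediate value overwrites. The following Python sketch shows the effect of writing 100 (64H) with each specifier that mov accepts for an immediate in 32-bit code. It assumes a little-endian memory model that already holds FFH bytes. The helper `mov_immediate` is hypothetical.

```python
# Sketch: how many bytes 'mov <type> [address],100' overwrites.
# Assumptions: memory is a little-endian bytearray, and only the sizes
# that mov accepts with an immediate operand in 32-bit code are modelled.
SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4}

def mov_immediate(mem, address, type_spec, value):
    size = SIZES[type_spec]
    # Truncate the immediate to the operand size, then store low byte first.
    mem[address:address + size] = (value % (1 << (8 * size))).to_bytes(size, "little")

for spec in ("BYTE", "WORD", "DWORD"):
    mem = bytearray(b"\xff" * 4)        # pretend memory already holds FFH bytes
    mov_immediate(mem, 0, spec, 100)    # 100 = 64H
    print(spec, mem.hex())
# BYTE 64ffffff
# WORD 6400ffff
# DWORD 64000000
```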

Simple Arithmetic Instructions

The instruction set provides several instructions to perform simple arithmetic operations. In this
section, we describe a few instructions to perform addition and subtraction. We defer a full discussion until Chapter 14.
The INC and DEC Instructions These instructions can be used to either increment or decrement the operands by one. The inc (INCrement) instruction adds one to its operand and the
dec (DECrement) instruction subtracts one from its operand. Both instructions require a single
operand. The operand can be either in a register or in memory. It does not make sense to use an
immediate operand such as inc 55 or dec 109.
The general format of these instructions is

        inc     destination
        dec     destination

where destination may be an 8-, 16- or 32-bit operand. For example:

        inc     EBX                     ; increment 32-bit register
        dec     DL                      ; decrement 8-bit register


Let us assume that EBX and DL have 1057H and 5AH, respectively. After executing the above
two instructions, EBX and DL would have 1058H and 59H, respectively. If the initial values of
EBX and DL are FFFFFFFFH and 00H, after executing the two statements the contents of EBX and DL
are changed to 00000000H and FFH, respectively.
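The wraparound point depends on the operand size. A 32-bit register such as EBX wraps to 0 only after FFFFFFFFH, whereas FFFFH + 1 wraps to 0 only in a 16-bit register such as BX. The following Python sketch models only the arithmetic (not flags or encoding), with illustrative helper names `inc` and `dec`.

```python
# Sketch: inc/dec on an operand of the given width; the result wraps
# around modulo 2**bits, just as the register contents do.
def inc(value, bits):
    return (value + 1) & ((1 << bits) - 1)

def dec(value, bits):
    return (value - 1) & ((1 << bits) - 1)

print(hex(inc(0x1057, 32)), hex(dec(0x5A, 8)))      # 0x1058 0x59
print(hex(inc(0xFFFFFFFF, 32)), hex(dec(0x00, 8)))  # 0x0 0xff
print(hex(inc(0xFFFF, 16)), hex(inc(0xFFFF, 32)))   # 0x0 0x10000
```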
Now consider the following program:

        .DATA
count   DW      0
value   DB      25

        .CODE
        inc     [count]         ;unambiguous
        dec     [value]         ;unambiguous
        mov     EBX,count
        inc     [EBX]           ;ambiguous
        mov     ESI,value
        dec     [ESI]           ;ambiguous

In the above example,

        inc     [count]
        dec     [value]

are unambiguous in principle because the definitions of count and value make them
WORD and BYTE operands. However,

        inc     [EBX]
        dec     [ESI]

are ambiguous because EBX and ESI merely point to an object in memory but the actual object
type (whether a WORD, BYTE, etc.) is not clear. We have to use a type specifier to clarify, as
shown below:

        inc     WORD [EBX]
        dec     BYTE [ESI]

Keep in mind that NASM does not store variable types, so it also rejects inc [count] and
dec [value] with an "operation size not specified" error. With NASM, write these as
inc WORD [count] and dec BYTE [value].

The ADD Instruction The add instruction can be used to add two 8-, 16- or 32-bit operands.
The syntax is

        add     destination,source

As with the mov instruction, add can also take the five basic forms depending on how the two
operands are specified. The semantics of the add instruction are

        destination = destination + source

Some examples of the add instruction are given in Table 9.1. In general,

        inc     EAX

is preferred to

        add     EAX,1

as the inc version improves readability and requires less memory space to store the instruction.
However, both instructions execute at the same speed.


Table 9.1 Some examples of the add instruction

                                   Before add                        After add
        Instruction                source          destination       destination
        add  AX,DX                 DX = AB62H      AX = 1052H        AX = BBB4H
        add  BL,CH                 CH = 27H        BL = 76H          BL = 9DH
        add  BYTE [value],10H      -               value = F0H       value = 00H
        add  DX,[count]            count = 3746H   DX = C8B9H        DX = FFFFH

Table 9.2 Some examples of the sub instruction

                                   Before sub                        After sub
        Instruction                source          destination       destination
        sub  AX,DX                 DX = AB62H      AX = 1052H        AX = 64F0H
        sub  BL,CH                 CH = 27H        BL = 76H          BL = 4FH
        sub  BYTE [value],10H      -               value = F0H       value = E0H
        sub  DX,[count]            count = 3746H   DX = C8B9H        DX = 9173H

The SUB and CMP Instructions The sub (SUBtract) instruction can be used to subtract two 8-,
16- or 32-bit numbers. The syntax is

        sub     destination,source

The source operand is subtracted from the destination operand and the result is placed in
the destination.

        destination = destination - source

Table 9.2 gives examples of the sub instruction.
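Every row of Tables 9.1 and 9.2 can be checked with plain modular arithmetic. An n-bit add or sub keeps only the low n bits of the true result. The Python sketch below models only the destination value, not the flags. The helper names `add` and `sub` are illustrative.

```python
# Sketch: add/sub on n-bit operands (results wrap modulo 2**bits),
# checked against the rows of Tables 9.1 and 9.2. Flags are not modelled.
def add(dest, src, bits):
    return (dest + src) & ((1 << bits) - 1)

def sub(dest, src, bits):
    return (dest - src) & ((1 << bits) - 1)

# (destination, source, width, expected after add, expected after sub)
rows = [
    (0x1052, 0xAB62, 16, 0xBBB4, 0x64F0),  # AX,DX
    (0x76,   0x27,    8, 0x9D,   0x4F),    # BL,CH
    (0xF0,   0x10,    8, 0x00,   0xE0),    # value,10H
    (0xC8B9, 0x3746, 16, 0xFFFF, 0x9173),  # DX,count
]

for dest, src, bits, want_add, want_sub in rows:
    assert add(dest, src, bits) == want_add
    assert sub(dest, src, bits) == want_sub
print("Tables 9.1 and 9.2 check out")
```

In the third row, F0H + 10H = 100H does not fit in a byte, so only 00H remains in value.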
The cmp (CoMPare) instruction is used to compare two operands (equal, not equal, and so on).
The cmp instruction performs the same operation as the sub except that the result of subtraction
is not saved. Thus, cmp does not disturb the source and destination operands. The cmp instruction
is typically used in conjunction with a conditional jump instruction for decision making. This is
the topic of the next section.
Conditional Execution

The instruction set has several branching and looping instructions to construct programs that require conditional execution. In this section, we discuss a subset of these instructions. A detailed
discussion is in Chapter 15.


Unconditional Jump The unconditional jump instruction jmp, as its name implies, tells the
processor that the next instruction to be executed is located at the label that is given as part of the
instruction. This jump instruction has the form

        jmp     label

where label identifies the next instruction to be executed. The following example

        mov     EAX,1
inc_again:
        inc     EAX
        jmp     inc_again
        mov     EBX,EAX

results in an infinite loop incrementing EAX repeatedly. The instruction

        mov     EBX,EAX

and all the instructions following it are never executed!
From this example, the jmp instruction appears to be useless. Later, we show some examples
that illustrate the use of this instruction.
Conditional Jump In conditional jump instructions, program execution is transferred to the target instruction only when the specified condition is satisfied. The general format is

        j<cond> label

where <cond> identifies the condition under which the target instruction at label should be
executed. Usually, the condition being tested is the result of the last arithmetic or logic operation.
For example, the following code

        mov     CL,0            ;initialize the character count
read_char:
        (code for reading a character into AL)
        cmp     AL,0DH          ;compare the character to CR
        je      CR_received     ;if equal, jump to CR_received
        inc     CL              ;otherwise, increment CL and
        jmp     read_char       ;go back to read another
                                ;character from keyboard
CR_received:
        mov     DL,AL

reads characters from the keyboard until the carriage return (CR) key is pressed. The character
count is maintained in the CL register. The two instructions

        cmp     AL,0DH          ;0DH is ASCII for carriage return
        je      CR_received     ;je stands for jump on equal

perform the required conditional execution. How does the processor remember the result of the
previous cmp operation when it is executing the je instruction? One of the purposes of the flags
register is to provide such short-term memory between instructions. Let us look at the actions
taken by the processor in executing these two instructions.
Remember that the cmp instruction subtracts 0DH from the contents of the AL register. While
the result is not saved anywhere, the operation sets the zero flag (ZF = 1) if the two operands are
the same. If not, ZF = 0. The zero flag retains this value until another instruction that affects ZF is
executed. Note that not all instructions affect all the flags. In particular, the mov instruction does
not affect any of the flags.
Thus, at the time of the je instruction execution, the processor checks ZF and program execution jumps to the labeled instruction if and only if ZF = 1. To cause the jump, the processor loads
the EIP register with the target instruction address. Recall that the EIP register always points to
the next instruction to be executed. Therefore, when the input character is CR, instead of fetching
the instruction

        inc     CL

it will fetch the

        mov     DL,AL

instruction. Here are some of the conditions tested by the conditional jump instructions:
        je      jump if equal
        jg      jump if greater
        jl      jump if less
        jge     jump if greater than or equal
        jle     jump if less than or equal
        jne     jump if not equal

Conditional jumps can also test the values of flags. Some examples are

        jz      jump if zero (i.e., if ZF = 1)
        jnz     jump if not zero (i.e., if ZF = 0)
        jc      jump if carry (i.e., if CF = 1)
        jnc     jump if not carry (i.e., if CF = 0)

Example 9.1 Conditional jump examples.
Consider the following code.

go_back:
        inc     AL
        cmp     AL,BL
        statement_1
        mov     BL,77H

Table 9.3 shows the actions taken depending on statement_1.


Table 9.3 Some conditional jump examples

        statement_1       AL     BL     Action taken
        je   go_back      56H    56H    Program control is transferred to inc AL
        jg   go_back      56H    55H    Program control is transferred to inc AL
        jg   go_back      56H    56H    No jump; executes mov BL,77H
        jl   go_back
        jle  go_back      56H    56H    Program control is transferred to inc AL
        jge  go_back
        jne  go_back      27H    26H    Program control is transferred to inc AL
        jg   go_back
        jge  go_back

These conditional jump instructions assume that the operands compared were treated as signed
numbers. There is another set of conditional jump instructions for operands that are unsigned
numbers. But until these instructions are discussed in Chapter 15, these six conditional jump
instructions are sufficient for writing simple assembly language programs.
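The signed behaviour of these jumps comes from the flags that cmp leaves behind. The Python sketch below computes ZF, SF and OF for an 8-bit cmp and applies the standard IA-32 tests: jg jumps if ZF = 0 and SF = OF; jge if SF = OF; jl if SF ≠ OF; jle if ZF = 1 or SF ≠ OF. It reproduces the rows of Table 9.3 and shows that 0FFH compares as -1. The helper names `cmp8` and `taken` are illustrative.

```python
# Sketch: flags set by an 8-bit 'cmp a,b' and the signed jumps that test them.
def cmp8(a, b):
    """Return (ZF, SF, OF) after 'cmp a,b'; the difference itself is discarded."""
    result = (a - b) & 0xFF
    zf = result == 0
    sf = bool(result & 0x80)
    # Signed overflow on subtraction: a and b differ in sign and the
    # result's sign differs from a's sign.
    of = bool((a ^ b) & (a ^ result) & 0x80)
    return zf, sf, of

JUMPS = {
    "je":  lambda zf, sf, of: zf,
    "jne": lambda zf, sf, of: not zf,
    "jg":  lambda zf, sf, of: (not zf) and sf == of,
    "jge": lambda zf, sf, of: sf == of,
    "jl":  lambda zf, sf, of: sf != of,
    "jle": lambda zf, sf, of: zf or sf != of,
}

def taken(jump, al, bl):
    """True if 'cmp AL,BL' followed by the given jump transfers control."""
    return JUMPS[jump](*cmp8(al, bl))

print(taken("jg", 0x56, 0x56), taken("jge", 0x56, 0x56))  # False True
print(taken("jl", 0xFF, 0x01))  # True: 0FFH is -1 when treated as signed
```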
When you use these conditional jump instructions, your assembler sometimes complains that
the destination of the jump is "out of range". If you find yourself in this situation, you can use the
trick described on page 326.
Iteration Instruction

Iteration can be implemented with jump instructions. For example, the following code can be used
to execute <loop body> 50 times.

        mov     CL,50
repeat1:
        <loop body>
        dec     CL
        jnz     repeat1         ;jumps back to repeat1 if CL is not 0

The instruction set, however, includes a group of loop instructions to support iteration. Here we
describe the basic loop instruction. The syntax of this instruction is

        loop    target

where target is a label that identifies the target instruction as in the jump instructions.
This instruction assumes that the ECX register contains the loop count. As part of executing
the loop instruction, it decrements the ECX register and jumps to the target instruction if
ECX ≠ 0. Using this instruction, we can write the previous example as


        mov     ECX,50
repeat1:
        <loop body>
        loop    repeat1
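Both loop forms run the body 50 times. In each, the counter is decremented after the body and tested for zero. The Python sketch below counts the iterations of each form, with registers modelled as masked integers. The function names are illustrative.

```python
# Sketch: how many times the loop body runs for each loop form.
def dec_jnz_iterations(cl):
    runs = 0
    while True:
        runs += 1                # <loop body>
        cl = (cl - 1) & 0xFF     # dec CL (8-bit wraparound)
        if cl == 0:              # jnz falls through once CL reaches 0
            return runs

def loop_iterations(ecx):
    runs = 0
    while True:
        runs += 1                      # <loop body>
        ecx = (ecx - 1) & 0xFFFFFFFF   # loop first decrements ECX ...
        if ecx == 0:                   # ... and jumps back only if ECX != 0
            return runs

print(dec_jnz_iterations(50), loop_iterations(50))  # 50 50
```

The same model exposes a common pitfall. If ECX starts at 0, the first decrement wraps it to FFFFFFFFH, so the body would run 2^32 times. Do not try that value with this sketch.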

Logical Instructions

The instruction set provides several logical instructions including and, or, xor and not. The
syntax of these instructions is

        and     destination,source
        or      destination,source
        xor     destination,source
        not     destination

The first three are binary operators and perform bitwise and, or and xor logical operations,
respectively. The not is a unary operator that performs bitwise complement operation. Truth
tables for the logical operations and, or and xor are shown in Table 9.4. Some examples that
explain the operation of these logical instructions are shown in Table 9.5. In this table, all numbers
are expressed in binary.
Logical instructions set some of the flags and therefore can be used with conditional jump instructions to implement high-level language decision structures in the assembly language. Until
we fully discuss the flags in Chapter 14, the following usage should be sufficient to write and
understand the assembly language programs.
In the following example, we test the least significant bit of the data in the AL register, and the
program control is transferred to the appropriate code depending on the value of this bit.

        and     AL,01H
        je      bit_is_zero
        <code to be executed when the bit is one>
        jmp     skip1
bit_is_zero:
        <code to be executed when the bit is zero>
skip1:
        <rest of the code>

To understand how the jump is effective in this example, let us assume that AL = 10101110B. The
instruction

        and     AL,01H

would make the result 00H, which is stored in the AL register. At the same time, the logical operation
sets the zero flag (i.e., ZF = 1) because the result is zero. Recall that je tests the ZF and jumps to
the target location if ZF = 1. In this example, it is more appropriate to use jz (jump if zero). Thus,

        jz      bit_is_zero

can replace the

        je      bit_is_zero

instruction. In fact, the conditional jump je is an alias for jz.

Table 9.4 Truth tables for the logical operations

          Input bits                    Output bit
        source b_i   destination b_i    and    or    xor
            0               0            0      0     0
            0               1            0      1     1
            1               0            0      1     1
            1               1            1      1     0
A problem with using the and instruction for testing, as used in the previous example, is that
it modifies the destination operand. For instance, in the last example,

        and     AL,01H

changes the contents of AL to either 0 or 1 depending on whether the least significant bit is 0 or 1,
respectively. To avoid this problem, a test instruction is provided. The syntax is

        test    destination,source

The test instruction performs a logical bitwise and operation like the and instruction except that
the source and destination operands are not modified. However, test sets the flags just like the
and instruction. Therefore, we can use

        test    AL,01H

instead of

        and     AL,01H

in the last example.

Table 9.5 Some logical instruction examples

        AL          BL          and AL,BL   or AL,BL    xor AL,BL   not AL
        10101110    11110000    10100000    11111110    01011110    01010001
        01100011    10011100    00000000    11111111    11111111    10011100
        11000110    00000011    00000010    11000111    11000101    00111001
        11110000    00001111    00000000    11111111    11111111    00001111

Our First Program
This is a simple program that adds up to 10 integers and outputs the sum. The program shown
below follows the assembly language template given in Chapter 7 (see page 155). The program
reads up to 10 integers from the user using GetLInt on line 20. Each input integer is read as a
long integer into the EDX register. The maximum number of input values is enforced by the loop
instruction on line 28. The loop iteration count is initialized to 10 in ECX on line 16. The user can
terminate the input earlier by entering a zero. Each input value is compared with zero (line 21) and
if it is equal to zero, the conditional branch instruction (je) on line 22 terminates the read loop.
When the read loop terminates, the sum in EAX is output using PutLInt on line 32.
Program 9.1 An example program to find the sum of a set of integers

 1: ;Adds a set of integers                        ADDITION.ASM
 2: ;
 3: ;        Objective: To find the sum of a set of integers.
 4: ;            Input: Requests integers from the user.
 5: ;           Output: Outputs the sum of the input numbers.
 6: ;
 7: %include "io.mac"
 8:
 9: .DATA
10: input_prompt  db  "Please enter at most 10 numbers: ",0
11: end_msg       db  "No more numbers? Enter 0 to end: ",0
12: sum_msg       db  "The sum is: ",0
13: .CODE
14:         .STARTUP
15:         PutStr  input_prompt    ; prompt for input numbers
16:         mov     ECX,10          ; loop count = 10
17:         sub     EAX,EAX         ; sum = 0
18:
19: read_loop:
20:         GetLInt EDX             ; read the input number
21:         cmp     EDX,0           ; is it zero?
22:         je      reading_done    ; if yes, stop reading input
23:         add     EAX,EDX
24:         cmp     ECX,1           ; if 10 numbers are input
25:         je      skip_msg        ;  skip displaying end_msg
26:         PutStr  end_msg
27: skip_msg:
28:         loop    read_loop
29:
30: reading_done:
31:         PutStr  sum_msg
32:         PutLInt EAX             ; write the sum
33:         nwln
34:         .EXIT

Note that after reading each input value, the program displays "No more numbers? Enter 0 to
end:" message to inform the user of the other termination condition. However, if 10 numbers have
been read, this message is skipped. This skipping is implemented by the code on lines 24 and 25.
Another point to note is that we used the loop count directly to initialize the ECX register on
line 16. However, from the program maintenance point of view, it is better if we define this as a
constant using the EQU directive, which is discussed in the next chapter.
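The control flow of Program 9.1 can be traced in a short Python sketch. Here `inputs` stands in for the values GetLInt would read, and the function `add_numbers` is a hypothetical name. It assumes the input ends with 0 or supplies at least 10 numbers, and it ignores 32-bit overflow of the sum. The second value returned records, for each accepted number, whether the "No more numbers?" message would be displayed.

```python
# Sketch of Program 9.1's logic (not a translation of the io.mac macros).
def add_numbers(inputs, max_count=10):
    total = 0                       # sub EAX,EAX
    prompts = []
    ecx = max_count                 # mov ECX,10
    numbers = iter(inputs)
    while True:
        value = next(numbers)       # GetLInt EDX
        if value == 0:              # cmp EDX,0 / je reading_done
            break
        total += value              # add EAX,EDX
        prompts.append(ecx != 1)    # cmp ECX,1 / je skip_msg
        ecx -= 1                    # loop read_loop: decrement ECX ...
        if ecx == 0:                # ... and exit once it reaches zero
            break
    return total, prompts

print(add_numbers([5, 7, 0]))       # (12, [True, True])
```

With ten nonzero inputs, the message is suppressed after the tenth number only, which is what lines 24 and 25 arrange.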

Illustrative Examples
This section presents several examples that illustrate the use of the assembly language instructions
discussed in this chapter. In order to follow these examples, you should be able to understand the
difference between binary values and character representations. For example, when using a byte
to store a number, number 5 is stored as
        00000101B

On the other hand, character 5 is stored as

        00110101B

Character manipulation is easier if you understand this difference and the key characteristics of
ASCII, as discussed in Appendix A.
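A quick way to see the two byte patterns is to print them from Python. Here `ord` gives the ASCII code of a character, and the difference shows that the digit characters start at 30H.

```python
number = 5
char = ord("5")                     # ASCII code of the character '5'
print(format(number, "08b"))        # 00000101
print(format(char, "08b"))          # 00110101
print(hex(char - number))           # 0x30: the digit characters start at 30H
```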


Example 9.2 Conversion of lowercase letters to uppercase.
This program demonstrates how indirect addressing can be used to access elements of an array. It
also illustrates how character manipulation can be used to convert lowercase letters to uppercase.
The program receives a character string from the user and converts all lowercase letters to uppercase and displays the string. Characters other than the lowercase letters are not changed in any
way. The pseudocode of Program 9.2 is as follows:
main()
    display prompt message
    read input string
    index := 0
    char := string[index]
    while (char ≠ NULL)
        if ((char ≥ 'a') AND (char ≤ 'z'))
        then
            char := char + 'A' - 'a'
        end if
        display char
        index := index + 1
        char := string[index]
    end while
end main
You can see from Program 9.2 that the compound if condition requires two cmp instructions
(lines 27 and 29). Also, the program uses the EBX register in indirect addressing mode; EBX always
holds the pointer to the character to be processed. In Chapter 13 we will see a better way of
accessing the elements of an array. The end of the string is detected by

        cmp     AL,0            ; check if AL is NULL
        je      done

and is used to terminate the while loop (lines 25 and 26).

Program 9.2 Conversion to uppercase by character manipulation

 1: ;Uppercase conversion of characters            TOUPPER.ASM
 2: ;
 3: ;        Objective: To convert lowercase letters to
 4: ;                   corresponding uppercase letters.
 5: ;            Input: Requests a char. string from the user.
 6: ;           Output: Prints the input string in uppercase.
 7: %include "io.mac"
 8:
 9: .DATA
10: name_prompt   db  "Please type your name: ",0
11: out_msg       db  "Your name in capitals is: ",0
12:
13: .UDATA
14: in_name       resb  31
15:
16: .CODE
17:         .STARTUP
18:         PutStr  name_prompt     ; request character string
19:         GetStr  in_name,31      ; read input character string
20:
21:         PutStr  out_msg
22:         mov     EBX,in_name     ; EBX = pointer to in_name
23: process_char:
24:         mov     AL,[EBX]        ; move the char. to AL
25:         cmp     AL,0            ; if it is the NULL character
26:         je      done            ;  conversion done
27:         cmp     AL,'a'          ; if (char < 'a')
28:         jl      not_lower_case  ;  not a lowercase letter
29:         cmp     AL,'z'          ; if (char > 'z')
30:         jg      not_lower_case  ;  not a lowercase letter
31: lower_case:
32:         add     AL,'A'-'a'      ; convert to uppercase
33: not_lower_case:
34:         PutCh   AL              ; write the character
35:         inc     EBX             ; EBX points to the next char.
36:         jmp     process_char    ; go back to process next char.
37: done:
38:         nwln
39:         .EXIT
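The same character-manipulation idea is shown here in Python. Only codes in the range 'a' to 'z' are changed, by adding 'A' - 'a' (that is, -20H). This is a sketch of the logic in Program 9.2, not of its I/O macros. The function name `to_upper` is illustrative.

```python
def to_upper(text):
    out = []
    for ch in text:
        code = ord(ch)
        if ord("a") <= code <= ord("z"):     # cmp/jl and cmp/jg tests
            code += ord("A") - ord("a")      # add AL,'A'-'a'
        out.append(chr(code))                # PutCh AL
    return "".join(out)

print(to_upper("Jim Ray, 2005"))   # JIM RAY, 2005
```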

Example 9.3 Sum of the individual digits of a number.
This last example shows how decimal digits can be converted from their character representations
to the equivalent binary. The program receives a number (maximum 10 digits) and displays the
sum of the individual digits of the input number. For example, if the input number is 45213, the
program displays 15. Since ASCII assigns a special set of contiguous values to the digit characters,
it is straightforward to get their numerical value (as discussed in Appendix A). All we have to do
is to mask off the upper half of the byte, as is done in Program 9.3 (line 28) by
        and     AL,0FH

Alternatively, we can also subtract the character code for 0

        sub     AL,'0'

instead of masking the upper half byte. For the sake of brevity, we leave writing the pseudocode
of Program 9.3 as an exercise.

Program 9.3 Sum of individual digits of a number

 1: ;Add individual digits of a number             ADDIGITS.ASM
 2: ;
 3: ;        Objective: To find the sum of individual digits of
 4: ;                   a given number. Shows character to binary
 5: ;                   conversion of digits.
 6: ;            Input: Requests a number from the user.
 7: ;           Output: Prints the sum of the individual digits.
 8: %include "io.mac"
 9:
10: .DATA
11: number_prompt db  "Please type a number (<11 digits): ",0
12: out_msg       db  "The sum of individual digits is: ",0
13:
14: .UDATA
15: number        resb  11
16:
17: .CODE
18:         .STARTUP
19:         PutStr  number_prompt   ; request an input number
20:         GetStr  number,11       ; read input number as a string
21:
22:         mov     EBX,number      ; EBX = address of number
23:         sub     DX,DX           ; DX = 0 -- DL keeps the sum
24: repeat_add:
25:         mov     AL,[EBX]        ; move the digit to AL
26:         cmp     AL,0            ; if it is the NULL character
27:         je      done            ;  sum is done
28:         and     AL,0FH          ; mask off the upper 4 bits
29:         add     DL,AL           ; add the digit to sum
30:         inc     EBX             ; update EBX to point to next digit
31:         jmp     repeat_add
32: done:
33:         PutStr  out_msg
34:         PutInt  DX              ; write sum
35:         nwln
36:         .EXIT
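The digit-summing loop of Program 9.3 can be traced in Python. The sketch assumes the input string holds only digit characters, as the program does. It also checks that masking with 0FH and subtracting '0' give the same value for every decimal digit. The function name `digit_sum` is illustrative.

```python
def digit_sum(digits):
    total = 0                       # sub DX,DX
    for ch in digits:               # the program stops at the NULL byte instead
        total += ord(ch) & 0x0F     # and AL,0FH keeps the low 4 bits
    return total

print(digit_sum("45213"))           # 15
```

At most 10 digits of value 9 add up to 90, so the sum always fits in the 8-bit DL register.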

Summary
The structure of the stand-alone assembly language program is described in Chapter 7. In this
chapter, we presented the basics of assembly language programming. We discussed two types of
assembly language statements: (a) executable statements that instruct the CPU as to what to do;
(b) assembler directives that facilitate the assembly process.
We have discussed the assembler directives to reserve space for variables. For initialized variables, we can use a define directive (DB, DW, and so on). To reserve space for uninitialized data,
we use RESB, RESW, and so on. The TIMES directive can be used for multiple initializations.
We introduced some simple addressing modes to specify the location of the operands. The
register addressing mode specifies the operands located in a processor register. The immediate addressing mode is used to specify constants. The remaining addressing modes specify the operands
located in the memory. We discussed two memory addressing modes—direct and indirect. The
remaining addressing modes are discussed in Chapter 13.


The instruction set consists of several groups of instructions—arithmetic, logical, shift, and
so on. We presented a few instructions in each group so that we can write meaningful assembly
language programs. We will introduce some more instructions in the next chapter.

10
More on Assembly Language
This chapter continues the assembly language overview from the last chapter. After the introduction, we discuss the data exchange and translate instructions. Then we describe the assembler
directives to define constants, both numeric and string constants. Next we discuss macros supported by NASM. Macros provide a sophisticated text substitution mechanism and are useful in
program maintenance. NASM allows definition of macros with parameters. We use several examples to illustrate the application of the instructions discussed here. The performance advantage
of the translation instruction is demonstrated in the last section. The chapter concludes with a
summary.

Introduction
As mentioned in the last chapter, three types of statements are used in assembly language programs: instructions, assembler directives, and macros. We have discussed several instructions and
directives in the last chapter. For example, we used assembler directives to allocate storage space
for variables. This chapter continues our discussion from the last chapter and covers a few more
processor instructions, some assembler directives to define constants, and macros.
We present some more instructions of the IA-32 instruction set. We describe two instructions
for data exchange and translation: xchg and xlat. The xchg instruction exchanges two data
values. These values can be 8-, 16-, or 32-bit values. This instruction is particularly useful in sort
applications. The xlat instruction translates a byte value. We also discuss the shift and rotate
family of instructions. We illustrate the use of these instructions by means of several examples.
Next we discuss the NASM directives to define constants. If you have used the C language,
you already know the utility of #define in program maintenance. We describe three NASM
directives: EQU, %assign and %define. The EQU can be used to define numeric constants.
This directive does not allow redefinition. For example, the following assembler directive defines
a constant CR. The ASCII carriage-return value is assigned to it by the EQU directive.

CR      EQU     0DH             ;carriage-return character


As mentioned, we cannot redefine CR to a different value later in the program. The %assign can
also be used to define numeric constants. However, it allows redefinition. The %define directive
can be used to define both string and numeric constants.
The last topic introduces the macros supported by the NASM assembler. Macros are used as a
shorthand notation for a group of statements. Macros permit the assembly language programmer
to name a group of statements and refer to the group by the macro name. During the assembly
process, each macro is replaced by the group of statements that it represents and assembled in
place. This process is referred to as macro expansion. We use macros to provide the basic input
and output capabilities to our stand-alone assembly language programs.

Data Exchange and Translate Instructions
This section describes the data exchange (xchg) and translation (xlat) instructions. Other data
transfer instructions such as movsx and movzx are discussed in Chapter 14.
The XCHG Instruction

The xchg instruction exchanges 8-, 16-, or 32-bit source and destination operands. The syntax is
similar to that of the mov instruction. Some examples are

        xchg    EAX,EDX
        xchg    [response],CL
        xchg    [total],DX

As in the mov instruction, both operands cannot be located in memory. Note that this restriction is
applicable to most instructions. Thus,

        xchg    [response],[name1]      ; illegal

is invalid. The xchg instruction is convenient because we do not need a third register to hold a
temporary value in order to swap two values. For example, we need three mov instructions

        mov     ECX,EAX
        mov     EAX,EDX
        mov     EDX,ECX

to perform xchg EAX,EDX. This instruction is especially useful in sorting applications. It is
also useful to swap the two bytes of 16-bit data to perform conversions between little-endian and
big-endian forms, as in the following example:

        xchg    AL,AH

Another instruction, bswap, can be used to perform such conversions on 32-bit data. The
format is

        bswap   32-bit register

This instruction works only on the data located in a 32-bit register.
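The two byte-order conversions can be modelled on plain integers. xchg AL,AH swaps the two bytes of AX, and bswap reverses the four bytes of a 32-bit register. The helper names in this Python sketch are illustrative.

```python
def xchg_al_ah(ax):
    """Value of AX after 'xchg AL,AH'."""
    return ((ax & 0xFF) << 8) | ((ax >> 8) & 0xFF)

def bswap32(reg):
    """Value of a 32-bit register after 'bswap': byte order reversed."""
    return int.from_bytes(reg.to_bytes(4, "little"), "big")

print(hex(xchg_al_ah(0x1234)))     # 0x3412
print(hex(bswap32(0x12345678)))    # 0x78563412
```

Applying either operation twice restores the original value, as expected of a pure byte-order swap.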

Chapter 10 • More on Assembly Language


The XLAT Instruction

The xlat (translate) instruction can be used to perform character translation. The format of this
instruction is shown below:

        xlatb

To use this instruction, the EBX register must be loaded with the starting address of the
translation table and AL must contain an index value into the table. The xlat instruction adds the
contents of AL to EBX and reads the byte at the resulting address. This byte replaces the index
value in the AL register. Since the 8-bit AL register provides the index into the translation table, the
number of entries in the table is limited to 256. An application of xlat is given in Example 10.6.
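In effect, xlatb performs an array lookup with an unsigned 8-bit index. This Python sketch models AL <- table[AL] using a hypothetical table of hexadecimal digit characters. It is not the table used in Example 10.6. Python raises an error for an index beyond the table, whereas the processor would simply read whatever byte lies at that address. A real table should therefore cover every index AL can hold.

```python
# Sketch of xlatb: AL <- byte at address EBX + AL.
# hex_table is a hypothetical 16-entry translation table.
hex_table = b"0123456789ABCDEF"

def xlatb(table, al):
    """Return the byte the processor would load into AL."""
    return table[al & 0xFF]      # AL is an unsigned 8-bit index (0..255)

al = 0x0B
al = xlatb(hex_table, al)
print(chr(al))   # B
```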

Shift and Rotate Instructions
This section describes some of the shift and rotate instructions supported by the instruction set.
The remaining instructions in this family are discussed in Chapter 16.
Shift Instructions

The instruction set provides several shift instructions. We discuss the following two instructions
here: shl (SHift Left) and shr (SHift Right). The shl instruction can be used to left shift a
destination operand. Each shift to the left by one bit position causes the leftmost bit to move to the
carry flag (CF). The vacated rightmost bit is filled with a zero. The bit that was in CF is lost as a
result of this operation.

        SHL:    CF <-- [ bit 7 ... bit 0 ] <-- 0

The shr instruction works similarly but shifts bits to the right as shown below:

        SHR:    0 --> [ bit 7 ... bit 0 ] --> CF

The general formats of these instructions are

        shl     destination,count
        shl     destination,CL

        shr     destination,count
        shr     destination,CL

The destination can be an 8-, 16- or 32-bit operand stored either in a register or in memory.
The second operand specifies the number of bit positions to be shifted. The first format specifies
the shift count directly. The shift count can range from 0 to 31. The second format can be used
to indirectly specify the shift count, which is assumed to be in the CL register. The CL register
contents are not changed by either the shl or shr instructions. In general, the first format is
faster!
Even though the shift count can be between 0 and 31, it does not make sense to use count
values of zero or greater than 7 (for an 8-bit operand), or 15 (for a 16-bit operand), or 31 (for a
32-bit operand). As indicated, the shift count cannot be greater than 31. If a greater value is specified,
only the least significant 5 bits of the number are taken as the shift count. Table 10.1 shows some
examples of the shl and shr instructions.

Table 10.1 Some examples of the shift instructions

                                Before shift            After shift
        Instruction             AL or AX                AL or AX
        shl     AL,1            1010 1110               0101 1100
        shr     AL,1            1010 1110               0101 0111
        mov     CL,3            0110 1101               0110 1000
        shl     AL,CL
        mov     CL,5            1011 1101 0101 1001     0000 0101 1110 1010
        shr     AX,CL
The following code shows another way of testing the least significant bit of the data in the AL
register.

        shr    AL,1
        jnc    bit_is_zero
        ; code for the case where the bit is 1
        jmp    skip1
bit_is_zero:
        ; code for the case where the bit is 0
skip1:

If the value in the AL register has a 1 in the least significant bit position, this bit will be in
the carry flag after the shr instruction has been executed. Then we can use a conditional jump
instruction that tests the carry flag. Recall that jc (jump if carry) causes a jump if
CF = 1 and jnc (jump if no carry) causes a jump only if CF = 0. Unlike the test instruction,
this method destroys the original contents of AL, because shr modifies its destination.
Rotate Instructions

A drawback with the shift instructions is that the bits shifted out are lost. There may be situations
where we want to keep these bits. The rotate family of instructions provides this facility. These
instructions can be divided into two types: rotate without involving the carry flag, or through the
carry flag. We will briefly discuss these two types of instructions next.


Table 10.2 Some examples of the rotate instructions

Instruction          Before execution (AL or AX)    After execution (AL or AX)
rol    AL,1          1010 1110                      0101 1101
ror    AL,1          1010 1110                      0101 0111
mov    CL,3
rol    AL,CL         0110 1101                      0110 1011
mov    CL,5
ror    AX,CL         1011 1101 0101 1001            1100 1101 1110 1010

Rotate Without Carry

There are two instructions in this group:

    rol (Rotate Left)
    ror (Rotate Right)

The format of these instructions is similar to the shift instructions and is given below:

    rol    destination,count
    rol    destination,CL

    ror    destination,count
    ror    destination,CL

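The rol instruction performs left rotation with the bits falling off on the left placed on the
right side, as shown below:

ROL:   the leftmost bit leaves on the left, re-enters as the rightmost bit, and is also copied into CF

The ror instruction performs right rotation as shown below:

ROR:   the rightmost bit leaves on the right, re-enters as the leftmost bit, and is also copied into CF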

For both of these instructions, the CF will catch the last bit rotated out of the destination. The
examples in Table 10.2 illustrate the rotate operation.
Rotate Through Carry The instructions

    rcl (Rotate through Carry Left)
    rcr (Rotate through Carry Right)

include the carry flag in the rotation process. That is, the bit that is rotated out at one end goes into
the carry flag and the bit that was in the carry flag is moved into the vacated bit, as shown below:

RCL:   the leftmost bit goes into CF and the old CF enters the rightmost bit position
RCR:   the rightmost bit goes into CF and the old CF enters the leftmost bit position

Table 10.3 Some rotate through carry examples

Instruction          Before execution               After execution
                     AL or AX              CF       AL or AX              CF
rcl    AL,1          1010 1110             0        0101 1100             1
rcr    AL,1          1010 1110             1        1101 0111             0
mov    CL,3
rcl    AL,CL         0110 1101             1        0110 1101             1
mov    CL,5
rcr    AX,CL         1011 1101 0101 1001   0        1001 0101 1110 1010   1

Some examples of the rcl and rcr instructions are given in Table 10.3.
The rcl and rcr instructions provide flexibility in bit rearranging. Furthermore, among the shift
and rotate instructions, these are the only two that take the carry flag bit as an input. This feature
is useful in multiword shifts. As an example, suppose that we want to right shift the 64-bit number
stored in EDX:EAX (the lower 32 bits are in EAX) by one bit position. This can be done by
    shr    EDX,1
    rcr    EAX,1

The shr instruction moves the least significant bit of EDX into the carry flag. The rcr instruction copies this carry flag value into the most significant bit of EAX. Chapter 16 introduces two
double-shift instructions to facilitate shifting of 64-bit numbers.


Defining Constants
NASM provides several directives to define constants. In this section, we discuss three directives:
EQU, %assign and %define.
The EQU Directive
The syntax of the EQU directive is

    name    EQU    expression

which assigns the result of the expression to name. For example, we can use

    NUM_OF_STUDENTS    EQU    90

to assign 90 to NUM_OF_STUDENTS. It is customary to use capital letters for these names in order
to distinguish them from variable names. Then, we can write

    mov    ECX,NUM_OF_STUDENTS
    cmp    EAX,NUM_OF_STUDENTS

to move 90 into the ECX register and to compare EAX with 90. Defining constants this way has
two advantages:
1. Such definitions increase program readability. This can be seen by comparing the statement

    mov    ECX,NUM_OF_STUDENTS

with

    mov    ECX,90

The first statement clearly indicates that we are moving the class size into the ECX register.
2. Multiple occurrences of a constant can be changed from a single place. For example, if the
class size changes from 90 to 100, we just need to change the value in the EQU statement.
If we didn't use the EQU directive, we would have to scan the source code and make appropriate
changes: a risky and error-prone process!
The operand of an EQU statement can be an expression that evaluates at assembly time. We
can, for example, write

    NUM_OF_ROWS    EQU    50
    NUM_OF_COLS    EQU    10
    ARRAY_SIZE     EQU    NUM_OF_ROWS * NUM_OF_COLS

to define ARRAY_SIZE to be 500.
The symbols that have been assigned a value cannot be reassigned another value in a given
source module. If such redefinitions are required, you should use the %assign directive, which is
discussed next.


The %assign Directive

This directive can be used to define numeric constants like the EQU directive. However, %assign
allows redefinition. For example, we define

    %assign    i    j+1

and later in the code we can redefine it as

    %assign    i    j+2

Like the EQU directive, it is evaluated once when %assign is processed.
The %assign directive is case sensitive. That is, i and I are treated as different. We can use
%iassign for case insensitive definition.
Both EQU and %assign directives can be used to define only numeric constants. The next directive
removes this restriction.
The %define Directive

This directive is similar to the #define in C. It can be used to define numeric as well as string
constants. For example

    %define    XI    [EBP+4]

replaces XI by [EBP+4]. Like the last directive, it allows redefinition. For example, we can
redefine XI as

    %define    XI    [EBP+20]

The %define directive is case sensitive. If you want the case insensitive version, you should use
the %idefine directive.

Macros
Macros provide a means by which a block of text (code, data, etc.) can be represented by a name
(called the macro name). When the assembler encounters that name later in the program, the
block of text associated with the macro name is substituted. This process is referred to as macro
expansion. In simple terms, macros provide a sophisticated text substitution mechanism.
In NASM, macros can be defined with %macro and %endmacro directives. The macro text
begins with the %macro directive and ends with the %endmacro directive. The macro definition
syntax is
    %macro    macro_name    para_count
              <macro body>
    %endmacro

The para_count specifies the number of parameters used in the macro. The macro_name is the
name of the macro that, when used later in the program, causes macro expansion. To invoke or
call a macro, use the macro_name and supply the necessary parameter values.


Example 10.1 A parameterless macro.
Here is our first macro example that does not require any parameters. Since using left-shift to
multiply by a power of two is more efficient than using multiplication, let us write a macro to do
this.

    %macro  multEAX_by_16  0
            sal    EAX,4
    %endmacro

The macro code consists of a single sal instruction, which will be substituted whenever the macro
is called. (NASM requires the parameter count, here 0, on the %macro line.) Now we can invoke
this macro by using the macro name multEAX_by_16, as in the following example:

            mov    EAX,27
            multEAX_by_16

When the assembler encounters the macro name multEAX_by_16, it is replaced (i.e., text substituted) by the macro body. Thus, after the macro expansion, the assembler finds the code

            mov    EAX,27
            sal    EAX,4
•

Macros with Parameters Just as with procedures, using parameters with macros aids in writing
more flexible and useful macros. The previous macro always multiplies EAX by 16. By using
parameters, we can generalize this macro to operate on a byte, word, or doubleword located either
in a general-purpose register or memory. The modified macro is

    %macro  mult_by_16  1
            sal    %1,4
    %endmacro

This macro takes one parameter, which can be any operand that is valid in the sal instruction.
Within the macro body, we refer to the parameters by their number as in %1. To multiply a byte in
the DL register by 16,

            mult_by_16    DL

can be used. This causes the following macro expansion:

            sal    DL,4

Similarly, a memory variable count, whether it is a byte, word, or doubleword, can be multiplied
by 16. In NASM, a memory operand must be written in square brackets and, because sal has no
register operand to fix the size, it also needs a size specifier (byte, word, or dword) that matches
the variable. For a doubleword count we write

            mult_by_16    dword [count]

Such a macro call will be expanded as

            sal    dword [count],4

Now, at least superficially, mult_by_16 looks like any other assembly language instruction,
except that it is defined by us. These are referred to as macro-instructions.
Example 10.2 Memory-to-memory data transfer macro.
We know that memory-to-memory data transfers are not allowed. We have to use an intermediate
register to facilitate such a data transfer. We can write a macro to perform memory-to-memory data
transfers using the basic instructions of the processor. Let us call this macro mxchg; it exchanges
the doubleword values of two memory variables.

    %macro  mxchg  2
            xchg   EAX,%1
            xchg   EAX,%2
            xchg   EAX,%1
    %endmacro

For example, when this macro is invoked as

            mxchg  [value1],[value2]

it exchanges the memory doublewords value1 and value2 while leaving EAX unaltered.

•

To end this section, we give a couple of examples from the io.mac file.
Example 10.3 PutInt macro definition from io.mac file.
This macro is used to display a 16-bit integer, which is given as the argument to the macro, by
calling the proc_PutInt procedure. The macro definition is shown below:

    %macro  PutInt  1
            push   AX
            mov    AX,%1
            call   proc_PutInt
            pop    AX
    %endmacro

The proc_PutInt procedure expects the integer to be in AX. Thus, in the macro body, we move
the input integer to AX before calling the procedure. Note that by using the push and pop, we
preserve the AX register.
•
Example 10.4 GetStr macro definition from io.mac file.
This macro takes one or two parameters: a pointer to a buffer and an optional buffer length. The
input string is read into the buffer. If the buffer length is given, it reads a string of at most one
less than the buffer length (one byte is reserved for the NULL character). If the buffer length is not
specified, a default value of 81 is assumed. This macro calls the proc_GetStr procedure to read
the string. This procedure expects the buffer pointer in EDI and the buffer length in ESI. The
macro definition is given below:

    %macro  GetStr  1-2 81
            push   ESI
            push   EDI
            mov    EDI,%1
            mov    ESI,%2
            call   proc_GetStr
            pop    EDI
            pop    ESI
    %endmacro

This macro is different from the previous one in that the number of parameters can be between 1
and 2. This condition is indicated by specifying the range of parameters (1-2 in our example). A
further complication is that, if the second parameter is not specified, we have to use the default
value (81 in our example). As shown in our example, we include this default value in the macro
definition. Note that this default value is used only if the buffer length is not specified.
•
Our coverage of macros is a small sample of what is available in NASM. You should refer to
the latest version of the NASM manual for complete details on macros.

Our First Program
This program reads a key from the input and displays its ASCII code in binary. It then queries the
user as to whether he/she wants to quit. Depending on the response, the program either requests
another character input from the user, or terminates.
To display the binary value of the ASCII code of the input key, we test each bit starting with
the most significant bit (i.e., leftmost bit). The mask is initialized to 80H (=10000000B), which
tests only the most significant bit of the ASCII value. If this bit is 0, the instruction on line 28

    test    AL,mask

sets the zero flag (assuming that the ASCII value is in the AL register; in Program 10.1 the mask is
kept in AH, so the actual instruction is test AL,AH). In this case, a 0 is displayed by directing the
program flow using the jz instruction (line 29). Otherwise, a 1 is displayed. The mask is then
divided by 2, which is equivalent to right shifting mask by one bit position. Thus, we are ready
for testing the second most significant bit. The process is repeated for each bit of the
ASCII value. The pseudocode of the program is given below:
main()
read_char:
    display prompt message
    read input character into char
    display output message text
    mask := 80H              {AH is used to store mask}
    count := 8               {ECX is used to store count}
    repeat
        if ((char AND mask) = 0)
        then
            write 0
        else
            write 1
        end if
        mask := mask/2       {can be done by shr}
        count := count - 1
    until (count = 0)
    display query message
    read response
    if (response = 'Y')
    then
        goto done
    else
        goto read_char
    end if
done:
    return
end main
The assembly language program, shown in Program 10.1, follows the pseudocode in a straightforward way. Note that the instruction set provides an instruction to perform integer division.
However, to divide an unsigned number by 2, shr is much faster than the divide instruction. More details
about the division instructions are given in Chapter 14.

Program 10.1 Conversion of ASCII to binary representation

 1  ;Binary equivalent of characters              BINCHAR.ASM
 2  ;
 3  ;        Objective: To print the binary equivalent of
 4  ;                   ASCII character code.
 5  ;            Input: Requests a character from the user.
 6  ;           Output: Prints the ASCII code of the
 7  ;                   input character in binary.
 8  %include "io.mac"
 9
10  .DATA
11  char_prompt  db  "Please input a character: ",0
12  out_msg1     db  "The ASCII code of '",0
13  out_msg2     db  "' in binary is ",0
14  query_msg    db  "Do you want to quit (Y/N): ",0
15
16  .CODE
17        .STARTUP
18  read_char:
19        PutStr  char_prompt  ; request a char. input
20        GetCh   AL           ; read input character
21
22        PutStr  out_msg1
23        PutCh   AL
24        PutStr  out_msg2
25        mov     AH,80H       ; mask byte = 80H
26        mov     ECX,8        ; loop count to print 8 bits (loop uses ECX)
27  print_bit:
28        test    AL,AH        ; test does not modify AL
29        jz      print_0      ; if tested bit is 0, print it
30        PutCh   '1'          ; otherwise, print 1
31        jmp     skip1
32  print_0:
33        PutCh   '0'          ; print 0
34  skip1:
35        shr     AH,1         ; right shift mask bit to test
36                             ; next bit of the ASCII code
37        loop    print_bit
38        nwln
39        PutStr  query_msg    ; query user whether to terminate
40        GetCh   AL           ; read response
41        cmp     AL,'Y'       ; if response is not 'Y'
42        jne     read_char    ; read another character
43  done:                      ; otherwise, terminate program
44        .EXIT

Illustrative Examples
This section presents two examples that perform ASCII to hex conversion. One example uses
character manipulation for the conversion while the other uses the xlat instruction.
Example 10.5 ASCII to hexadecimal conversion using character manipulation.
The objective of this example is to show how numbers can be converted to characters by using
character manipulation. In order to get the least significant hex digit, we have to mask off the
upper half of the byte and then perform integer to hex digit conversion. The example shown below
assumes that the input character is L, whose ASCII value is 4CH.

    'L'  --ASCII-->  0100 1100B  --mask off-->  0000 1100B  --convert-->  'C'

Similarly, to get the most significant hex digit we have to isolate the upper half of the byte and
move these four bits to the lower half, as shown below:

    'L'  --ASCII-->  0100 1100B  --mask off-->  0100 0000B  --shift right-->  0000 0100B  --convert-->  '4'
Notice that shifting right by four bit positions is equivalent to performing integer division by 16.
The pseudocode of the program shown in Program 10.2 is as follows:
main()
read_char:
    display prompt message
    read input character into char
    display output message text
    temp := char
    char := char AND F0H          {mask off lower half}
    char := char/16               {shift right by 4 positions}
                                  {The last two steps can be done by shr}
    convert char to hex equivalent and display
    char := temp                  {restore char}
    char := char AND 0FH          {mask off upper half}
    convert char to hex equivalent and display
    display query message
    read response
    if (response = 'Y')
    then
        goto done
    else
        goto read_char
    end if
done:
    return
end main
To convert a number between 0 and 15 to its equivalent in hex, we have to divide the process
into two parts depending on whether the number is below 10 or not. The conversion using character
manipulation can be summarized as follows:

    if (number <= 9)
    then
        write (number + '0')
    else
        write (number + 'A' - 10)
    end if
If the number is between 0 and 9, we add the ASCII value for character 0 to convert the number
to its character equivalent. For instance, if the number is 5 (0000 0101B), it should be converted
to character 5, whose ASCII value is 35H (0011 0101B). Therefore, we have to add 30H, which is
the ASCII value of character 0. This is done in Program 10.2 by

    add    AL,'0'

on line 31. If the number is between 10 and 15, we have to convert it to a hex digit between A and
F. You can verify that the required translation is achieved by

    number - 10 + ASCII value for character A

In Program 10.2, this is done by

    add    AL,'A'-10

on line 34.

Program 10.2 Conversion to hexadecimal by character manipulation

 1  ;Hex equivalent of characters              HEX1CHAR.ASM
 2  ;
 3  ;        Objective: To print the hex equivalent of
 4  ;                   ASCII character code.
 5  ;            Input: Requests a character from the user.
 6  ;           Output: Prints the ASCII code of the
 7  ;                   input character in hex.
 8  %include "io.mac"
 9
10  .DATA
11  char_prompt  db  "Please input a character: ",0
12  out_msg1     db  "The ASCII code of '",0
13  out_msg2     db  "' in hex is ",0
14  query_msg    db  "Do you want to quit (Y/N): ",0
15
16  .CODE
17        .STARTUP
18  read_char:
19        PutStr  char_prompt  ; request a char. input
20        GetCh   AL           ; read input character
21
22        PutStr  out_msg1
23        PutCh   AL
24        PutStr  out_msg2
25        mov     AH,AL        ; save input character in AH
26        shr     AL,4         ; move upper 4 bits to lower half
27        mov     ECX,2        ; loop count - 2 hex digits to print
28  print_digit:
29        cmp     AL,9         ; if greater than 9
30        jg      A_to_F       ; convert to A through F digits
31        add     AL,'0'       ; otherwise, convert to 0 through 9
32        jmp     skip
33  A_to_F:
34        add     AL,'A'-10    ; subtract 10 and add 'A'
35                             ; to convert to A through F
36  skip:
37        PutCh   AL           ; write the hex digit
38        mov     AL,AH        ; restore input character in AL
39        and     AL,0FH       ; mask off the upper half byte
40        loop    print_digit
41        nwln
42        PutStr  query_msg    ; query user whether to terminate
43        GetCh   AL           ; read response
44
45        cmp     AL,'Y'       ; if response is not 'Y'
46        jne     read_char    ; read another character
47  done:                      ; otherwise, terminate program
48        .EXIT


Example 10.6 ASCII to hexadecimal conversion using the xlat instruction.
The objective of this example is to show how the use of xlat simplifies the solution of the last
example. In this example, we use the xlat instruction to convert a number between 0 and 15 to
its equivalent hex digit. The program is shown in Program 10.3. To use xlat we have to construct
a translation table, which is done by the following statement (line 17):

    hex_table    db    '0123456789ABCDEF'

We can then use the number as an index into the table. For example, 10 points to A, which is the
equivalent hex digit. In order to use the xlat instruction, EBX should point to the base of the
hex_table and AL should have the number. The instruction on line 29 loads the hex_table
address into EBX. The rest of the program is straightforward to follow.

Program 10.3 Conversion to hexadecimal by using the xlat instruction

 1  ;Hex equivalent of characters              HEX2CHAR.ASM
 2  ;
 3  ;        Objective: To print the hex equivalent of
 4  ;                   ASCII character code. Demonstrates
 5  ;                   the use of xlat instruction.
 6  ;            Input: Requests a character from the user.
 7  ;           Output: Prints the ASCII code of the
 8  ;                   input character in hex.
 9  %include "io.mac"
10
11  .DATA
12  char_prompt  db  "Please input a character: ",0
13  out_msg1     db  "The ASCII code of '",0
14  out_msg2     db  "' in hex is ",0
15  query_msg    db  "Do you want to quit (Y/N): ",0
16  ; translation table: 4-bit binary to hex
17  hex_table    db  "0123456789ABCDEF"
18
19  .CODE
20        .STARTUP
21  read_char:
22        PutStr  char_prompt  ; request a char. input
23        GetCh   AL           ; read input character
24
25        PutStr  out_msg1
26        PutCh   AL
27        PutStr  out_msg2
28        mov     AH,AL        ; save input character in AH
29        mov     EBX,hex_table ; EBX = translation table
30        shr     AL,4         ; move upper 4 bits to lower half
31        xlatb                ; replace AL with hex digit
32        PutCh   AL           ; write the first hex digit
33        mov     AL,AH        ; restore input character to AL
34        and     AL,0FH       ; mask off upper 4 bits
35        xlatb
36        PutCh   AL           ; write the second hex digit
37        nwln
38        PutStr  query_msg    ; query user whether to terminate
39        GetCh   AL           ; read response
40
41        cmp     AL,'Y'       ; if response is not 'Y'
42        jne     read_char    ; read another character
43  done:                      ; otherwise, terminate program
44        .EXIT

When to Use the XLAT Instruction
The xlat instruction is convenient to perform character conversions. Proper use of xlat would
produce an efficient assembly language program. In this section, we demonstrate by means of two
examples when xlat is, and is not, beneficial from the performance point of view.
In general, xlat is not really useful if, for example, there is a straightforward method or a
"formula" for the required conversion. This is true for conversions that exhibit a regular structure.
An example of this type of conversion is the case conversion between uppercase and lowercase
letters in ASCII. As you know, the ASCII encoding makes this conversion rather simple. Experiment 1 takes a look at this type of example.
The use of the xlat instruction, however, produces efficient code if the conversion does not
have a regular structure. Conversion from EBCDIC to ASCII is one example that can benefit from
using the xlat instruction. Conversion to hex is another example, as shown in Examples 10.5 and
10.6. This example is used in Experiment 2 to show the performance benefit that can be obtained
from using the xlat instruction for the conversion.
Experiment 1

In this experiment, we show how using the xlat instruction for case conversion of letters degrades the performance. We have transformed the code of Example 9.2 to a procedure that can be
called from a C main program. This program keeps track of the execution time. All interaction
with the display is suppressed for these experiments. This case-conversion procedure is called
several times to convert a string of lowercase letters. The string length is fixed at 1000 characters.
We used two versions of the case conversion procedure. The first version does not use the
xlat instruction for case conversion. Instead, it uses the statement

    add    AL,'A'-'a'

as shown in Program 9.2.
The other version uses the xlat instruction for case conversion. In order to do so, we have to
set up the following conversion table in the data section:

    upper_table    db    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

Furthermore, after initializing EBX to upper_table, the following code

    sub    AL,'a'
    xlatb

replaces the code

    add    AL,'A'-'a'

[Figure 10.1 Performance of the case conversion program. Curves: with xlat, without xlat. x-axis: Number of calls (in thousands), 100 to 600.]

You can clearly see the disadvantage of the xlat version of the code. First of all, it requires
additional space to store the translation table upper_table. More important than this is the fact
that the xlat version requires additional time. Note that the add and sub instructions take the
same amount of time to execute. Therefore, the xlat version requires additional time to execute
xlat, which generates a memory read to get the byte from upper_table located in the data
segment.
The performance superiority of the first version (i.e., the version that does not use the xlat
instruction) is clearly shown in Figure 10.1. These results were obtained on a 2.4-GHz Pentium 4
system. In this plot, the x-axis gives the number of times the case conversion procedure is called
to convert a lowercase string of 1,000 characters. The data show that using the xlat instruction
degrades the performance by about 35 percent! For the reasons discussed before, this is clearly
a case in which the xlat instruction should not be used.
Experiment 2

In this experiment, we use the hex conversion examples presented in the last section to show the
benefits of the xlat instruction. As shown in Example 10.5, without using xlat, we have
to test the input number to see if it falls in the range of 0-9 or 10-15. However, such testing and
hence the associated overhead can be avoided by using a translation table along with xlat.
The two programs of Examples 10.5 and 10.6 have been converted to C callable procedures as
in the last experiment. Each procedure receives a string and converts the characters in the input
string to their hex equivalents. However, the hex code is not displayed. The input test string in this
experiment consists of lowercase and uppercase letters, digits, and special symbols for a total of
100 characters.

[Figure 10.2 Performance of the hex conversion program. Curves: without xlat, with xlat. x-axis: Number of calls (in thousands), 100 to 600.]

The results, obtained on a 2.4-GHz Pentium 4 system, are shown in Figure 10.2. The data
presented in this figure clearly demonstrate the benefit of using xlat in this example. The
procedure that does not use the xlat instruction is about 45% slower!
The moral of the story is that judicious use of assembly language instructions is necessary in
order to reap the benefits of the assembly language.

Summary
We presented two instructions for data exchange and translation: xchg and xlat. The first
instruction, which exchanges two data values, is useful in sort applications. The xlat instruction
translates a byte value. We also discussed the shift and rotate family of instructions.
We presented the NASM directives to define constants, both numeric and string. We described three NASM directives: EQU, %assign and %define. The EQU directive can be used
to define numeric constants. This directive does not allow redefinition. The %assign directive can also be
used to define numeric constants. However, it allows redefinition. The %define directive can be
used to define both string and numeric constants.
We introduced the macros supported by the NASM assembler. Macros permit the assembly
language programmer to name a group of statements and refer to the group by the macro name.
The NASM assembler supports macros with parameters to allow additional flexibility. We used
several examples to illustrate how macros are defined in the assembly language programs.
We also demonstrated the performance advantage of the xlat instruction under certain conditions. The results show that judicious use of the xlat instruction provides significant performance
advantages.

11
Writing Procedures
The last two chapters introduced the basics of the assembly language. Here we discuss how procedures are written in the assembly language. Procedure is an important programming construct that
facilitates modular programming. In the IA-32 architecture, the stack plays an important role in
procedure invocation and execution. We start this chapter by giving details on the stack, its uses,
and how it is implemented. We also describe the assembly language instructions to manipulate the
stack.
After this introduction to the stack, we look at the assembly language instructions for procedure
invocation and return. Unlike high-level languages, there is not much support in the assembly
language. For example, we cannot include the arguments in the procedure call. Thus parameter
passing is more involved than in high-level languages. There are two parameter passing methods:
one uses the registers and the other the stack. We discuss these two parameter passing methods in
detail. The last section provides a summary of the chapter.

Introduction
A procedure is a logically self-contained unit of code designed to perform a particular task. These
are sometimes referred to as subprograms and play an important role in modular program development. In high-level languages, there are two types of subprograms: procedures and functions.
A function receives a list of arguments and performs a computation based on the arguments passed
to it and returns a single value. In this sense, these functions are very similar to the mathematical
functions.
Procedures also receive a list of arguments just as the functions do. However, procedures, after
performing their computation, may return zero or more results back to the calling procedure. In
the C language, both these subprogram types are combined into a single function construct.
In the C function
int sum (int x, int y)
{
    return (x + y);
}

the parameters x and y are called formal parameters or simply parameters and the function body
is defined based on these parameters. When this function is called (or invoked) by a statement like


total = sum(number1, number2);

the actual parameters or arguments, number1 and number2, are used in the computation of
the sum function.
There are two types of parameter passing mechanisms: call-by-value and call-by-reference.
In the call-by-value mechanism, the called function (sum in our example) is provided only the
current values of the arguments for its use. Thus, in this case, the values of these arguments are
not changed in the called function; these values can only be used as in a mathematical function.
In our example, the sum function is invoked by using the call-by-value mechanism, as we simply
pass the values of number1 and number2 to the called sum function.
In the call-by-reference mechanism, the called function actually receives the addresses (i.e.,
pointers) of the parameters from the calling function. The function can change the contents of these
parameters—and these changes will be seen by the calling function—by directly manipulating the
argument storage space. For instance, the following swap function
void swap (int *a, int *b)
{
    int temp;
    temp = *a;
    *a = *b;
    *b = temp;
}

assumes that it receives the addresses of the two parameters from the calling function. Thus, we
are using the call-by-reference mechanism for parameter passing. Such a function can be invoked
by

swap(&data1, &data2);

Often both types of parameter passing mechanisms are used in the same function. As an
example, consider finding the roots of the quadratic equation

$$ax^2 + bx + c = 0.$$

The two roots are defined as

$$\text{root1} = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad \text{root2} = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$

The roots are real if $b^2 \ge 4ac$, and imaginary otherwise.
Suppose that we want to write a function that receives a, b, and c and returns the values of
the two roots (if real) and indicates whether the roots are real or imaginary (see Figure 11.1). The
roots function receives parameters a, b, and c using the call-by-value mechanism, and root1
and root2 parameters are passed using the call-by-reference mechanism. A typical invocation of
roots is

root_type = roots(a, b, c, &root1, &root2);

#include <math.h>    /* for sqrt */

int roots (double a, double b, double c,
           double *root1, double *root2)
{
    int root_type = 1;

    if (4 * a * c <= b * b){   /* roots are real */
        *root1 = (-b + sqrt(b*b - 4*a*c))/(2*a);
        *root2 = (-b - sqrt(b*b - 4*a*c))/(2*a);
    }
    else                       /* roots are imaginary */
        root_type = 0;
    return (root_type);
}

Figure 11.1 C function for the quadratic equation

In summary, procedures receive a list of arguments, which may be passed either by the call-by-value
or by the call-by-reference mechanism. If a called procedure must return more than one result,
the call-by-reference mechanism should be used.
In assembly language we do not get as much help as we do in high-level languages. The
instruction set provides only the basic support to invoke a procedure; there is no support for
passing arguments in the procedure call. If we want to pass arguments to the called procedure, we
have to use some space shared by the caller and the callee. Typically, we use either registers or
the stack for this purpose. This leads to the two basic parameter passing mechanisms: register-based
and stack-based. Later we give more details on these mechanisms along with some examples.
Our goal in this chapter is to introduce assembly language procedures. We continue our discussion
of procedures in the next chapter, which discusses passing a variable number of arguments,
local variables, and multimodule programs.

What Is a Stack?
Conceptually, a stack is a last-in-first-out (LIFO) data structure. The operation of a stack is analogous to the stack of trays you find in cafeterias. The first tray removed from the stack of trays
would be the last tray that had been placed on the stack. There are two operations associated with
a stack: insertion and deletion. If we view the stack as a linear array of elements, stack insertion
and deletion operations are restricted to one end of the array. Thus, the only element that is directly accessible is the element at the top-of-stack (TOS). In stack terminology, insert and delete
operations are referred to as push and pop operations, respectively.
There is another related data structure, the queue. A queue can be considered as a linear array
with insertions done at one end of the array and deletions at the other end. Thus, a queue is a
first-in-first-out (FIFO) data structure.
As an example of a stack, let us assume that we are inserting numbers 1000 through 1003 into
a stack in ascending order. The state of the stack can be visualized as shown in Figure 11.2. The
arrow points to the top-of-stack. When the numbers are deleted from the stack, the numbers will
come out in the reverse order of insertion. That is, 1003 is removed first, then 1002, and so on.
After the deletion of the last number, the stack is said to be in the empty state (see Figure 11.3).


Assembly Language Programming in Linux

[Figure: stack states: Empty stack; After inserting 1000; After inserting 1001; After inserting 1002; After inserting 1003]

Figure 11.2 An example showing stack growth: Numbers 1000 through 1003 are inserted in ascending order. The arrow points to the top-of-stack.

[Figure: stack states: Initial stack; After removing 1003; After removing 1002; After removing 1001; After removing 1000 (Empty stack)]

Figure 11.3 Deletion of data items from the stack: The arrow points to the top-of-stack.

In contrast, a queue maintains the order. Suppose that the numbers 1000 through 1003 are
inserted into a queue as in the stack example. When removing the numbers from the queue, the
first number to enter the queue would be the one to come out first. Thus, the numbers deleted from
the queue would maintain their insertion order.
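The difference in removal order can be demonstrated with a small C sketch. The Stack and Queue types and their helper functions are hypothetical, fixed-capacity structures written only for this illustration (the queue is deliberately simple and not circular); both receive 1000 through 1003 in ascending order.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 16

/* Stack: insertion (push) and deletion (pop) at the same end, so LIFO. */
typedef struct { int items[CAPACITY]; size_t top; } Stack;

static void stack_push(Stack *s, int v)
{
    assert(s->top < CAPACITY);          /* overflow check */
    s->items[s->top++] = v;
}

static int stack_pop(Stack *s)
{
    assert(s->top > 0);                 /* underflow check */
    return s->items[--s->top];
}

/* Queue: insert at the tail, delete at the head, so FIFO. */
typedef struct { int items[CAPACITY]; size_t head, tail; } Queue;

static void queue_insert(Queue *q, int v)
{
    assert(q->tail < CAPACITY);
    q->items[q->tail++] = v;
}

static int queue_delete(Queue *q)
{
    assert(q->head < q->tail);          /* queue must not be empty */
    return q->items[q->head++];
}

/* Insert 1000..1003 in ascending order into both structures,
   then record the order in which the values come back out. */
void removal_order(int from_stack[4], int from_queue[4])
{
    Stack s = { .top = 0 };
    Queue q = { .head = 0, .tail = 0 };
    for (int v = 1000; v <= 1003; v++) {
        stack_push(&s, v);
        queue_insert(&q, v);
    }
    for (int i = 0; i < 4; i++) {
        from_stack[i] = stack_pop(&s);
        from_queue[i] = queue_delete(&q);
    }
}
```

After removal_order runs, the stack yields 1003, 1002, 1001, 1000, while the queue yields 1000, 1001, 1002, 1003.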

Implementation of the Stack
The stack is implemented in the memory space reserved for the stack segment, using the SS and
ESP registers. The top-of-stack, which points to the last item inserted into the stack, is indicated
by SS:ESP: the SS register points to the beginning of the stack segment, and the ESP register
gives the offset of the last item inserted.
The key stack implementation characteristics are as follows:
• Only words (i.e., 16-bit data) or doublewords (i.e., 32-bit data) are saved on the stack, never
a single byte.
• The stack grows toward lower memory addresses. Since we graphically represent memory
with addresses increasing from the bottom of a page to the top, we say that the stack grows
downward.


[Figure: (a) Empty stack (256 bytes), ESP = 256; (b) After pushing 21ABH, ESP = 254; (c) After pushing 7FBD329AH, ESP = 250]

Figure 11.4 Stack implementation in the IA-32 architecture: SS:ESP points to the top-of-stack.

• Top-of-stack (TOS) always points to the last data item placed on the stack. The TOS always
points to the lower byte of the last word pushed onto the stack. For example, when we push
21ABH onto the stack, the TOS points to the ABH byte, as shown in Figure 11.4.
Figure 11.4a shows an empty stack with 256 bytes of memory for stack operations. When the
stack is initialized, TOS points to a byte just outside the reserved stack area. It is an error to read
from an empty stack, as this causes a stack underflow.
When a word is pushed onto the stack, ESP is first decremented by two, and then the word is
stored at SS:ESP. Since the IA-32 processors use the little-endian byte order, the higher-order byte
is stored at the higher memory address. For instance, when we push 21ABH, the stack expands by
two bytes, and ESP is decremented by two to point to the last data item, as shown in Figure 11.4b.
The stack shown in Figure 11.4c results when we expand the stack further by four more bytes by
pushing the doubleword 7FBD329AH onto the stack.
The stack full condition is indicated by the zero offset value (i.e., ESP = 0). If we try to
insert a data item into a full stack, stack overflow occurs. Both stack underflow and overflow are
programming errors and should be handled with care.
Retrieving a 32-bit data item from the stack causes the offset value to increase by four to
point to the next data item on the stack. For example, if we retrieve a doubleword from the stack
shown in Figure 11.5a, we get 7FBD329AH from the stack and ESP is updated, as shown in
Figure 11.5b. Notice that the four memory locations retain their values. However, since TOS is
updated, these locations will be reused by the next data item pushed onto the stack, as
shown in Figure 11.5c.
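The byte-level behavior described above can be modeled in C. This is a toy model, not processor code: the ToyStack type and the toy_init, push16, push32, and pop32 helpers are hypothetical names, mem[] stands in for the 256-byte stack area of Figure 11.4a, and esp is an offset into it.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256

/* Toy model of a 256-byte stack area; esp is an offset into mem[]. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;
} ToyStack;

/* Empty stack: TOS points just outside the reserved area. */
void toy_init(ToyStack *s)
{
    s->esp = STACK_BYTES;
}

/* push word: ESP -= 2, then store little-endian (low byte at lower address). */
void push16(ToyStack *s, uint16_t v)
{
    assert(s->esp >= 2);                      /* else stack overflow */
    s->esp -= 2;
    s->mem[s->esp]     = (uint8_t)(v & 0xFF);
    s->mem[s->esp + 1] = (uint8_t)(v >> 8);
}

/* push doubleword: ESP -= 4, then store the four bytes little-endian. */
void push32(ToyStack *s, uint32_t v)
{
    assert(s->esp >= 4);                      /* else stack overflow */
    s->esp -= 4;
    for (int i = 0; i < 4; i++)
        s->mem[s->esp + i] = (uint8_t)(v >> (8 * i));
}

/* pop doubleword: read four bytes at ESP, then ESP += 4.
   The bytes stay in memory; only the TOS moves. */
uint32_t pop32(ToyStack *s)
{
    assert(s->esp <= STACK_BYTES - 4);        /* else stack underflow */
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)s->mem[s->esp + i] << (8 * i);
    s->esp += 4;
    return v;
}
```

Pushing 21ABH and then 7FBD329AH leaves esp at 250 with 9AH at offset 250; pop32 then returns 7FBD329AH and moves esp back to 254 while the bytes at offsets 250 to 253 remain unchanged. A following push16 of 5689H moves esp to 252, matching Figure 11.5.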


[Figure: (a) Initial stack (two data items), ESP = 250; (b) After removing 7FBD329AH, ESP = 254; (c) After pushing 5689H, ESP = 252]

Figure 11.5 An example showing stack insert and delete operations.

Stack Operations
Basic Instructions

The stack data structure allows two basic operations: insertion of a data item into the stack (called
the push operation) and deletion of a data item from the stack (called the pop operation). These
two operations are allowed on word or doubleword data items. The syntax is
    push    source
    pop     destination

The operand of these two instructions can be a 16- or 32-bit general-purpose register, segment
register, or a word or doubleword in memory. In addition, source for the push instruction can be
an immediate operand of size 8, 16, or 32 bits. Table 11.1 summarizes the two stack operations.
On the empty stack shown in Figure 11.4a, the statements
    push    word 21ABH
    push    dword 7FBD329AH
would result in the stack shown in Figure 11.5a. (The word and dword size specifiers tell the
assembler how many bytes each immediate value occupies on the stack.) Executing the statement
    pop     EBX
on this stack would result in the stack shown in Figure 11.5b, with the register EBX receiving
7FBD329AH.


Table 11.1 Stack operations on 16- and 32-bit data

Instruction      Semantics              Description
push source16    ESP = ESP - 2          ESP is first decremented by 2 to modify TOS. Then the 16-bit
                 SS:ESP = source16      data from source16 is copied onto the stack at the new TOS.
                                        The stack expands by 2 bytes.
push source32    ESP = ESP - 4          ESP is first decremented by 4 to modify TOS. Then the 32-bit
                 SS:ESP = source32      data from source32 is copied onto the stack at the new TOS.
                                        The stack expands by 4 bytes.
pop dest16       dest16 = SS:ESP        The data item located at TOS is copied to dest16. Then ESP is
                 ESP = ESP + 2          incremented by 2 to update TOS. The stack shrinks by 2 bytes.
pop dest32       dest32 = SS:ESP        The data item located at TOS is copied to dest32. Then ESP is
                 ESP = ESP + 4          incremented by 4 to update TOS. The stack shrinks by 4 bytes.

Additional Instructions

The instruction set supports two pairs of special instructions for stack manipulation. These
instructions can be used to save or restore the flags and general-purpose registers.
Stack Operations on Flags The push and pop operations cannot be used to save or restore the
flags register. For this, two special versions of these instructions are provided:
    pushfd      (push 32-bit flags)
    popfd       (pop 32-bit flags)
These instructions do not need any operands. For operating on the 16-bit flags register (FLAGS),
we can use the pushfw and popfw instructions. If we use pushf, the default operand size selects
either pushfd or pushfw. In our programs, since our default is 32-bit operands, pushf is used
as an alias for pushfd. However, we use pushfd to make the operand size explicit. Similarly,
popf can be used as an alias for either popfd or popfw.
Stack Operations on All General-Purpose Registers The instruction set also has special pusha
and popa instructions to save and restore the eight general-purpose registers. The pushad instruction
saves the 32-bit general-purpose registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI. These
registers are pushed in the order specified, so the last register pushed is EDI. The popad instruction
restores these registers, except that it does not copy the saved ESP value (i.e., the saved value is not
loaded into the ESP register as part of the popad instruction). The corresponding instructions for the
16-bit registers are pushaw and popaw. These instructions are useful in procedure calls, as we
will show later. Like the pushf and popf instructions, we can use pusha and popa as aliases.
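Because pushad pushes its eight registers in a fixed order, the location of each saved register relative to the final ESP follows directly from that order. The hypothetical helper pushad_offset below computes these offsets in C: EDI, pushed last, ends up at [ESP], ESI at [ESP+4], and so on up to EAX at [ESP+28].

```c
#include <assert.h>
#include <string.h>

/* The eight registers in the order pushad pushes them (EDI is pushed last). */
static const char *const PUSHAD_ORDER[8] = {
    "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
};

/* Byte offset of a saved register relative to ESP after pushad,
   or -1 if reg is not one of the eight names above. */
int pushad_offset(const char *reg)
{
    assert(reg);
    for (int i = 0; i < 8; i++)
        if (strcmp(PUSHAD_ORDER[i], reg) == 0)
            return (7 - i) * 4;   /* 4 bytes per register; EDI ends at [ESP] */
    return -1;
}
```

The slot returned for "ESP" holds a saved stack pointer value that popad skips rather than loading back into ESP.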


Uses of the Stack
The stack is used for three main purposes: as a scratchpad to temporarily store data, for transfer of
program control, and for passing parameters during a procedure call.
Temporary Storage of Data
The stack can be used as a scratchpad to store data on a temporary basis. For example, consider
exchanging the contents of two 32-bit variables that are in memory: value1 and value2.
We cannot use
    xchg    [value1],[value2]       ; illegal
because both operands of xchg are in memory. The code
    mov     EAX,[value1]
    mov     EBX,[value2]
    mov     [value1],EBX
    mov     [value2],EAX
works, but it uses two 32-bit registers. This code requires four memory operations. However,
due to the limited number of general-purpose registers, finding spare registers that can be used for
temporary storage is often difficult in real programs.
What if we need to preserve the contents of the EAX and EBX registers? In this case, we need
to save these registers before using them and restore them later, as shown below:
    ;save EAX and EBX registers on the stack
    push    EAX
    push    EBX
    ;EAX and EBX registers can now be used
    mov     EAX,[value1]
    mov     EBX,[value2]
    mov     [value1],EBX
    mov     [value2],EAX
    ;restore EAX and EBX registers from the stack
    pop     EBX
    pop     EAX

This code requires eight memory accesses. Because the stack is a LIFO data structure, the sequence of pop instructions is a mirror image of the push instruction sequence.
An elegant way of exchanging the two values is
    push    dword [value1]
    push    dword [value2]
    pop     dword [value1]
    pop     dword [value2]

Notice that the above code does not use any general-purpose registers and, like the previous
example, requires eight memory operations. Another point to note is that push and pop instructions
allow movement of data from memory to memory (i.e., between data and stack segments). This
is a special case because mov instructions do not allow memory-to-memory data transfer. Stack
operations are an exception. String instructions, discussed in Chapter 17, also allow memory-to-memory
data transfer.
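The push/pop exchange can be mimicked in C to show why four stack instructions swap the two values: the pops return the values in reverse order. The array-based stack and the push, pop, and exchange_via_stack functions are hypothetical stand-ins for the processor stack, written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DWORDS 8

/* Toy doubleword stack standing in for the processor stack. */
static uint32_t stack_area[STACK_DWORDS];
static int stack_top = 0;               /* number of doublewords on the stack */

static void push(uint32_t v)
{
    assert(stack_top < STACK_DWORDS);   /* overflow check */
    stack_area[stack_top++] = v;
}

static uint32_t pop(void)
{
    assert(stack_top > 0);              /* underflow check */
    return stack_area[--stack_top];
}

/* Mirrors:  push value1 / push value2 / pop value1 / pop value2 */
void exchange_via_stack(uint32_t *value1, uint32_t *value2)
{
    push(*value1);
    push(*value2);
    *value1 = pop();    /* last in, first out: receives the old value2 */
    *value2 = pop();    /* receives the old value1 */
}
```

As in the assembly version, no temporary variable is needed beyond the stack itself, and the stack is empty again when the exchange finishes.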
The stack is frequently used as a scratchpad to save and restore registers. The need often arises
when we must free up a set of registers so they can be used by the current code. This is often
the case with procedures, as we will show later.
It should be clear from these examples that the stack grows and shrinks during the course of a
program execution. It is important to allocate enough storage space for the stack, as stack overflow
and underflow could cause unpredictable results, often causing system errors.
Transfer of Control
The previous discussion concentrated on how we, as programmers, can use the stack to store data
temporarily. Some instructions also use the stack to store data temporarily. In particular,
when a procedure is called, the return address (the address of the instruction following the call)
is stored on the stack so that control can be transferred back to the calling program. A detailed
discussion of this topic is in the next section.
Parameter Passing
Another important use of the stack is to act as a medium to pass parameters to the called procedure.
The stack is extensively used by high-level languages to pass parameters. A discussion on the use
of the stack for parameter passing is deferred to a later section.

Procedure Instructions
The instruction set provides call and ret (return) instructions to write procedures in assembly language. The call instruction can be used to invoke a procedure, and has the format
    call    proc-name
where proc-name is the name of the procedure to be called. The assembler replaces proc-name
by the offset value of the first instruction of the called procedure.
How Is Program Control Transferred?
The offset value provided in the call instruction is not the absolute value (i.e., the offset is not relative
to the start of the code segment pointed to by the CS register), but a relative displacement in bytes
from the instruction following the call instruction. Let us look at the example in Figure 11.6.
After the call instruction of main has been fetched, the EIP register points to the next
instruction to be executed (i.e., EIP = 00000007H). This is the instruction that should be executed
after completing the execution of the sum procedure. The processor makes a note of this by pushing
the contents of the EIP register onto the stack.
Now, to transfer control to the first instruction of the sum procedure, the EIP register would
have to be loaded with the offset value of the
    push    EBP
instruction in sum. To do this, the processor adds the 32-bit relative displacement found in the
call instruction to the contents of the EIP register. Proceeding with our example, the machine

offset      machine code
(in hex)    (in hex)
                            main:
00000002    E816000000              call    sum
00000007    89C3                    mov     EBX,EAX
                            ; end of main procedure

                            sum:
0000001D                            push    EBP
                            ; end of sum procedure
                            ;*********************************************

                            avg:
00000028    E8F0FFFFFF              call    sum
0000002D    89D8                    mov     EAX,EBX
                            ; end of avg procedure
                            ;*********************************************

Figure 11.6 An example to illustrate the transfer of program control.

language encoding of the call instruction, which requires five bytes, is E816000000H. The first
byte, E8H, is the opcode for the call, and the next four bytes give the (signed) relative displacement in
bytes. In this example, it is the difference between 0000001DH (offset of the push EBP
instruction in sum) and 00000007H (offset of the mov EBX,EAX instruction in main). Therefore,
the displacement should be 0000001DH - 00000007H = 00000016H. This is the displacement value
encoded in the call instruction. Note that the displacement is stored in little-endian order (bytes
16H, 00H, 00H, 00H), which, read as a doubleword, equals 00000016H. Adding this displacement to the
contents of the EIP register leaves the EIP register pointing to the first instruction of sum.
Note that the procedure call in main is a forward call, and therefore the relative displacement
is a positive number. As an example of a backward procedure call, let us look at the sum procedure
call in the avg procedure. In this case, the program control has to be transferred back. That is, the
displacement is a negative value. Following the explanation given in the last paragraph, we can
calculate the displacement as 0000001DH - 0000002DH = FFFFFFF0H. Since negative numbers
are expressed in 2's complement notation, FFFFFFF0H corresponds to -10H (i.e., -16D), which
is the displacement value in bytes.
The following is a summary of the actions taken during a procedure call:
    ESP = ESP - 4                        ; push return address onto the stack
    SS:ESP = EIP
    EIP = EIP + relative displacement    ; update EIP to point to the procedure
The relative displacement is a signed 32-bit number to accommodate both forward and backward
procedure calls.
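The displacement arithmetic of Figure 11.6 can be checked with a short C sketch. The names encode_call and call_destination are hypothetical helpers; the sketch covers only the five-byte E8 form used in this section and uses unsigned 32-bit arithmetic, whose wraparound matches 2's complement.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_LENGTH 5   /* E8 opcode + 4-byte displacement */

/* Encode "call target" located at offset call_addr as E8 + rel32.
   The displacement is measured from the instruction after the call. */
void encode_call(uint32_t call_addr, uint32_t target, uint8_t out[CALL_LENGTH])
{
    assert(out);
    uint32_t next_eip = call_addr + CALL_LENGTH;    /* EIP after fetching the call */
    uint32_t rel32 = target - next_eip;             /* wraps mod 2^32: 2's complement */
    out[0] = 0xE8;                                  /* opcode of the relative call */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel32 >> (8 * i));   /* little-endian byte order */
}

/* What the processor does on a call: EIP = EIP + relative displacement. */
uint32_t call_destination(uint32_t next_eip, int32_t rel32)
{
    return next_eip + (uint32_t)rel32;
}
```

encode_call(0x02, 0x1D, buf) produces E8 16 00 00 00, and encode_call(0x28, 0x1D, buf) produces E8 F0 FF FF FF, the two encodings shown in Figure 11.6.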


The ret Instruction

The ret (return) instruction is used to transfer control from the called procedure to the calling
procedure. Return transfers control to the instruction following the call (the mov EBX,EAX
instruction in our example). How will the processor know where this instruction is located? Remember that the processor made a note of this when the call instruction was executed. When
the ret instruction is executed, the return address is recovered from the stack. The actions taken
during the execution of the ret instruction are
    EIP = SS:ESP     ; pop return address at TOS into EIP
    ESP = ESP + 4    ; update TOS by adding 4 to ESP
An optional integer may be included in the ret instruction, as in
    ret     8
The details on this optional number are covered later.

Our First Program
In our first procedure example, two parameter values are passed to the called procedure via the
general-purpose registers. The procedure sum receives two integers in the CX and DX registers
and returns the sum of these two integers via AX. No check is done to detect the overflow condition. The main program, shown in Program 11.1, requests two integers from the user and displays
the sum on the screen.
Program 11.1 Parameter passing by call-by-value using registers
;Parameter passing via registers            PROCEX1.ASM
;
;     Objective: To show parameter passing via registers.
;         Input: Requests two integers from the user.
;        Output: Outputs the sum of the input integers.
%include "io.mac"
.DATA
prompt_msg1  DB   "Please input the first number: ",0
prompt_msg2  DB   "Please input the second number: ",0
sum_msg      DB   "The sum is ",0

.CODE
      .STARTUP
      PutStr  prompt_msg1     ; request first number
      GetInt  CX              ; CX = first number

      PutStr  prompt_msg2     ; request second number
      GetInt  DX              ; DX = second number

      call    sum             ; returns sum in AX
      PutStr  sum_msg         ; display sum
      PutInt  AX
      nwln
done:
      .EXIT

;Procedure sum receives two integers in CX and DX.
;The sum of the two integers is returned in AX.
sum:
      mov     AX,CX           ; sum = first number
      add     AX,DX           ; sum = sum + second number
      ret

Parameter Passing
Parameter passing in assembly language is different and more complicated than that used in high-level
languages. In assembly language, the calling procedure first places all the parameters
needed by the called procedure in a mutually accessible storage area (usually registers or memory).
Only then can the procedure be invoked. There are two common methods, depending on the type
of storage area used to pass parameters: the register method and the stack method. As their names imply,
the register method uses general-purpose registers to pass parameters, and the stack method uses
the stack.
Register Method
In the register method, the calling procedure places the necessary parameters in the general-purpose
registers before invoking the procedure, as we did in the last example. Next, let us look at
the advantages and disadvantages of passing parameters using the register method.
Pros and Cons of the Register Method
The register method has its advantages and disadvantages. These are summarized here.
Advantages
1. The register method is convenient and easier for passing a small number of arguments.
2. This method is also faster because all the arguments are available in registers.
Disadvantages
1. The main disadvantage is that only a few arguments can be passed by using registers, as
there are a limited number of general-purpose registers available in the CPU.
2. Another problem is that the general-purpose registers are often used by the calling procedure
for some other purpose. Thus, it is necessary to temporarily save the contents of these
registers on the stack to free them for use in parameter passing before calling a procedure,
and restore them after returning from the called procedure. In this case, it is difficult to
realize the second advantage listed above, as the stack operations involve memory access.


[Figure: stack contents, from higher to lower addresses: number1, number2, Return address (TOS, pointed to by ESP)]

Figure 11.7 Stack state after the sum procedure call: Return address is the EIP value pushed onto
the stack as part of executing the call instruction.

Stack Method

In this method of parameter passing, all arguments required by a procedure are pushed onto the
stack before the procedure is called. As an example, let us consider passing the two parameters
required by the sum procedure shown in Program 11.1. This can be done by
    push    number1
    push    number2
    call    sum

After executing the call instruction, which automatically pushes the EIP contents onto the stack,
the stack state is shown in Figure 11.7.
Reading the two arguments, number1 and number2, is tricky. Since the parameter values
are buried inside the stack, we first have to pop the EIP value to read the two arguments. This, for
example, can be done by
    pop     EAX
    pop     EBX
    pop     ECX
in the sum procedure. Since we have removed the return address (EIP) from the stack, we will
have to restore it by
    push    EAX
so that TOS is pointing to the return address.
The main problem with this code is that we need to set aside general-purpose registers to copy
parameter values. This means that the calling procedure cannot use these registers for any other
purpose. Worse still, what if you want to pass 10 parameters? One way to free up registers is to
copy the parameters from the stack to local data variables, but this is impractical and inefficient.
The best way to get parameter values is to leave them on the stack and read them from the stack
as needed. Since the stack is a sequence of memory locations, ESP + 4 points to number2, and
ESP + 6 to number1. Note that both number1 and number2 are 16-bit values. For instance,
    mov     EBX,[ESP+4]


[Figure: (a) Stack after saving EBP; (b) Stack after pop EBP; (c) Stack after ret]

Figure 11.8 Changes in stack state during a procedure execution.

can be used to access number2, but this causes a problem. The stack pointer register is updated
by the push and pop instructions. As a result, the relative offset changes with the stack operations
performed in the called procedure. This is not a desirable situation.
There is a better alternative: we can use the EBP register instead of ESP to specify an offset
into the stack segment. For example, we can copy the value of number2 into the EAX register by
    mov     EBP,ESP
    mov     EAX,[EBP+4]

This is the usual way of pointing to the parameters on the stack. Since every procedure uses
the EBP register to access parameters, the EBP register should be preserved. Therefore, we should
save the contents of the EBP register before executing the
    mov     EBP,ESP
statement. We, of course, use the stack for this. Note that
    push    EBP
    mov     EBP,ESP
causes the parameter displacement to increase by four bytes, as shown in Figure 11.8a.
The information stored in the stack (parameters, return address, and the old EBP value) is
collectively called the stack frame. As we show on page 256, the stack frame also includes local
variables if the procedure uses them. The EBP value is referred to as the frame pointer (FP). Once
the EBP value is known, we can access all items in the stack frame.
Before returning from the procedure, we should use
    pop     EBP
to restore the original value of EBP. The resulting stack state is shown in Figure 11.8b.
The ret statement causes the return address to be placed in the EIP register, and the stack
state after ret is shown in Figure 11.8c.
Now the problem is that the four bytes of the stack occupied by the two arguments are no longer
useful. One way to free these four bytes is to increment ESP by four after the call statement, as
shown below:
    push    number1
    push    number2
    call    sum
    add     ESP,4

For example, C compilers use this method to clear parameters from the stack. The above
assembly language code segment corresponds to the
sum(number2, number1);

function call in C.
Rather than having the calling procedure adjust the stack, the called procedure can also clear
the stack. Note that we cannot write
    add     ESP,4
    ret
because when ret is executed, ESP should point to the return address on the stack. The solution
lies in the optional operand that can be specified in the ret statement. The format is
    ret     optional-value
which results in the following sequence of actions:
    EIP = SS:ESP
    ESP = ESP + 4 + optional-value
The optional-value should be a 16-bit immediate value. Since the purpose of
the optional value is to discard the parameters pushed onto the stack, this operand takes a positive
value.
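The frame offsets and the value to give ret follow from simple byte counting. The sketch below uses hypothetical helpers param_offset and cleanup_bytes and assumes a 4-byte return address and a 4-byte saved EBP; the argument sizes are passed in because each push stores either a word (2 bytes) or a doubleword (4 bytes), depending on its operand.

```c
#include <assert.h>
#include <stddef.h>

/* After   push arg[0] ... push arg[count-1] ; call proc ; push EBP ; mov EBP,ESP
   the stack holds, from EBP upward: saved EBP (4 bytes), return address
   (4 bytes), then the arguments, the last one pushed being closest.
   sizes[k] is the number of bytes arg[k] occupies (2 or 4). */
size_t param_offset(const size_t sizes[], size_t count, size_t index)
{
    assert(index < count);
    size_t offset = 8;                  /* skip saved EBP and return address */
    for (size_t k = count - 1; k > index; k--)
        offset += sizes[k];             /* arguments pushed after arg[index] */
    return offset;
}

/* Bytes to discard after the call: the n in "ret n" (or "add ESP,n"). */
size_t cleanup_bytes(const size_t sizes[], size_t count)
{
    size_t total = 0;
    for (size_t k = 0; k < count; k++)
        total += sizes[k];
    return total;
}
```

For sum, with number1 pushed first and number2 second, word arguments place number2 at EBP+8 and number1 at EBP+10 and require ret 4; doubleword arguments give EBP+8, EBP+12, and ret 8.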
Who Should Clean Up the Stack?

We have discussed the following ways of discarding the unwanted parameters on the stack:
1. clean-up is done by the calling procedure, or
2. clean-up is done by the called procedure.
If procedures require a fixed number of parameters, the second method is preferred. In this
case, we write the clean-up code only once in the called procedure independent of the number
of times this procedure is called. We follow this convention in our assembly language programs.
However, if a procedure receives a variable number of parameters, we have to use the first method.
We discuss this topic in detail in a later section.


Preserving Calling Procedure State
It is important to preserve the contents of the registers across a procedure call. The necessity for
this is illustrated by the following code:
            mov     ECX,count
    repeat:
            call    compute
            loop    repeat

The code invokes the compute procedure count times. The ECX register maintains the number
of remaining iterations. Recall that, as part of the loop instruction execution, the ECX register is
decremented by 1 and, if not 0, starts another iteration.
Suppose, now, that the compute procedure uses the ECX register during its computation.
Then, when compute returns control to the calling program, ECX would have changed, and the
program logic would be incorrect.
Since there are only a limited number of registers, and keeping values in registers is essential for
writing efficient code, registers should be preserved across procedure calls. The stack is used to
save registers temporarily.
Which Registers Should Be Saved?
The answer to this question is simple: Save those registers that are used by the calling procedure
but changed by the called procedure. This leads to the following question: Which procedure, the
calling or the called, should save the registers?
Usually, one or two registers are used to return a value by the called procedure. Therefore,
such register(s) do not have to be saved. For example, the EAX register is often used to return
integer values.
To avoid selecting which registers to save, we could blindly save all registers each time a
procedure is invoked. For instance, we could use the pushad instruction (see page 237). But such
an action results in unnecessary overhead.
If the calling procedure were to save the necessary registers, it would need to know the registers used
by the called procedure. This causes two serious difficulties:
1. Program maintenance would be difficult because, if the called procedure were modified later
on and a different set of registers used, every procedure that calls this procedure would have
to be modified.
2. Programs tend to be longer because if a procedure is called several times, we have to include
the instructions to save and restore the registers each time the procedure is called.
For these reasons, we assume that the called procedure saves the registers that it uses and restores
them before returning to the calling procedure. This also conforms to the modular program design
principles.
When to Use pusha
The pusha instruction is useful in certain instances, but not all. We identify some instances where
pusha is not useful. First, what if some of the registers saved by pusha are used for returning
results? For instance, the EAX register is often used to return integer results. In this case pusha is
not really useful, as popa destroys the result to be returned to the calling procedure. Second, since
pusha introduces more overhead, it may be worthwhile to use the push instruction if we want
to save only one or two registers. Of course, the other side of the coin is that pusha improves
readability of code and reduces memory required for the instructions.

[Figure: stack contents after pusha, from number1 and number2 down to the saved registers; EBP and ESP point to the saved EDI]

Figure 11.9 Stack state after pusha.
When pusha is used to save registers, it modifies the offset of the parameters. Note that
    pusha
    mov     EBP,ESP
causes the stack state, shown in Figure 11.9, to be different from that shown in Figure 11.8a on
page 244. You can see that the offsets of number1 and number2 increase by 28 bytes, because
pusha pushes 32 bytes instead of the 4 bytes pushed by push EBP.
ENTER and LEAVE Instructions

The instruction set has two instructions to facilitate stack frame allocation and release on procedure entry and exit. The enter instruction can be used to allocate a stack frame on entering a
procedure. The format is
    enter   bytes,level
The first operand bytes specifies the number of bytes of local variable storage we want on the
new stack frame. We discuss local variables in the next chapter. Until then, we set the first operand
to zero. The second operand level gives the nesting level of the procedure. If we specify a
nonzero level, it copies level stack frame pointers into the new frame from the preceding stack
frame. In all our examples, we set the second operand to zero. Thus the statement
    enter   XX,0


is equivalent to
    push    EBP
    mov     EBP,ESP
    sub     ESP,XX
The leave instruction releases the stack frame allocated by the enter instruction. It does
not take any operands. The leave instruction effectively performs the following:
    mov     ESP,EBP
    pop     EBP

We use the leave instruction before the ret instruction, as shown in the following template for
procedures:
    proc-name:
            enter   XX,0
            procedure body
            leave
            ret     YY
As we show in the next chapter (page 259), the XX value is nonzero only if our procedure needs
some local variable space on the stack frame. The value YY is used to clear the arguments passed
on to the procedure.
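The equivalences for enter XX,0 and leave can be checked with a toy C model. The Cpu type and the push32, pop32, enter0, and leave helpers are hypothetical; the model tracks only ESP, EBP, and doubleword stack slots, assumes ESP stays 4-byte aligned, and covers only a nesting level of 0, as in our examples.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 1024

/* Toy register/stack model: only doubleword pushes and pops. */
typedef struct {
    uint32_t esp, ebp;
    uint32_t mem[STACK_BYTES / 4];      /* one slot per 4-byte stack offset */
} Cpu;

static void push32(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);                /* else stack overflow */
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop32(Cpu *c)
{
    assert(c->esp <= STACK_BYTES - 4);  /* else stack underflow */
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* enter xx,0  ==  push EBP ; mov EBP,ESP ; sub ESP,xx */
void enter0(Cpu *c, uint32_t xx)
{
    push32(c, c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= xx);
    c->esp -= xx;
}

/* leave  ==  mov ESP,EBP ; pop EBP */
void leave(Cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop32(c);
}
```

Starting from ESP = 1024, enter0 with XX = 16 sets EBP to 1020 and ESP to 1004; leave then restores ESP to 1024 and EBP to its original value, which is what allows ret to find the return address.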

Illustrative Examples
In this section, we use several examples to illustrate register-based and stack-based parameter
passing.
Example 11.1 Parameter passing by call-by-reference using registers.
This example shows how parameters can be passed by call-by-reference using the register method.
The program requests a character string from the user and displays the number of characters in the
string (i.e., string length). The string length is computed by the str_len function. This function
scans the input string for the NULL character while keeping track of the number of characters in
the string. The pseudocode is shown below:
s t r _ l en (string)
index := 0
length := 0
while (string[index] ^/^ NULL)
index := index + 1
{ AX is used for string length}
length := length + 1
end while
return (length)
end s t r _ l e n
The s t r _ l e n function receives a pointer to the string in EBX and returns the string length in
the EAX register. The program listing is given in Program 11.2. The main procedure executes

Chapter 11 • Writing Procedures

mov

249

EBX,string

to place the address of s t r i n g in EBX (line 22) before invoking the procedure on line 23. Note
that even though the procedure modifies the EBX register during its execution, it restores the
original value of EBX by saving its value initially on the stack (line 35) and restoring it (line 44)
before returning to the main procedure.

Program 11.2 Parameter passing by call-by-reference using registers
     1: ;Parameter passing via registers          PROCEX2.ASM
     2: ;
     3: ;        Objective: To show parameter passing via registers.
     4: ;            Input: Requests a character string from the user.
     5: ;           Output: Outputs the length of the input string.
     6:
     7: %include "io.mac"
     8: BUF_LEN  EQU  41                 ; string buffer length
     9:
    10: .DATA
    11: prompt_msg  db  "Please input a string: ",0
    12: length_msg  db  "The string length is ",0
    13:
    14: .UDATA
    15: string      resb  BUF_LEN        ;input string < BUF_LEN chars.
    16:
    17: .CODE
    18:       .STARTUP
    19:       PutStr  prompt_msg         ; request string input
    20:       GetStr  string,BUF_LEN     ; read string from keyboard
    21:
    22:       mov     EBX,string         ; EBX = string address
    23:       call    str_len            ; returns string length in AX
    24:       PutStr  length_msg         ; display string length
    25:       PutInt  AX
    26:       nwln
    27: done:
    28:       .EXIT
    29:
    30: ;
    31: ;Procedure str_len receives a pointer to a string in EBX.
    32: ;String length is returned in AX.
    33: ;
    34: str_len:
    35:       push    EBX
    36:       sub     AX,AX              ; string length = 0
    37: repeat:
    38:       cmp     byte [EBX],0       ; compare with NULL char.
    39:       je      str_len_done       ; if NULL we are done
    40:       inc     AX                 ; else, increment string length
    41:       inc     EBX                ; point EBX to the next char.
    42:       jmp     repeat             ; and repeat the process
    43: str_len_done:
    44:       pop     EBX
    45:       ret
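For comparison, here is a C sketch of the str_len procedure from Program 11.2. It is an illustrative translation, not part of the book's program: the pointer parameter p stands in for EBX, the local length stands in for AX, and the 16-bit return type is an assumption chosen to mirror the AX result.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of str_len: `p` plays the role of EBX and `length` the role
   of AX; the uint16_t return type mirrors the 16-bit AX result. */
uint16_t str_len(const char *p)
{
    uint16_t length = 0;          /* sub  AX,AX                          */
    while (*p != '\0') {          /* cmp  byte [EBX],0 / je str_len_done */
        length++;                 /* inc  AX                             */
        p++;                      /* inc  EBX                            */
    }
    return length;                /* AX holds the length on return       */
}
```

Because C passes the pointer by value, the caller's copy is never modified; the assembly version has to obtain the same guarantee explicitly with push EBX and pop EBX.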

Example 11.2 Parameter passing by call-by-value using the stack.
This is the stack counterpart of Program 11.1, which passes two integers to the procedure sum. The procedure returns the sum of these two integers in the AX register. The program listing is given in Program 11.3.
The program requests two integers from the user. It reads the two numbers into the CX and DX registers using GetInt (lines 16 and 19). Since the stack is used to pass the two numbers, we have to place them on the stack before calling the sum procedure (see lines 21 and 22). The state of the stack after control is transferred to sum is shown in Figure 11.7 on page 243.
As discussed before, the EBP register is used to access the two parameters from the stack. Therefore, we have to save EBP itself on the stack. We do this by using the enter instruction (line 35), which changes the stack state to that in Figure 11.8a on page 244.
The original value of EBP is restored at the end of the procedure using the leave instruction (line 38). Accessing the two numbers follows the explanation given in Section 11. Note that the first number is at EBP + 10, and the second one at EBP + 8: the saved EBP and the 4-byte return address occupy EBP through EBP + 7, and each 16-bit push takes two bytes, so the second number (pushed last) sits just above the return address. As in our first example on page 241, no overflow check is done by sum. Control is returned to main by

    ret     4

because sum has received two parameters requiring a total space of four bytes on the stack. This ret statement clears number1 and number2 from the stack.

Program 11.3 Parameter passing by call-by-value using the stack
     1: ;Parameter passing via the stack          PROCEX3.ASM
     2: ;
     3: ;        Objective: To show parameter passing via the stack.
     4: ;            Input: Requests two integers from the user.
     5: ;           Output: Outputs the sum of the input integers.
     6: %include "io.mac"
     7: .DATA
     8: prompt_msg1  db  "Please input the first number: ",0
     9: prompt_msg2  db  "Please input the second number: ",0
    10: sum_msg      db  "The sum is ",0
    11:
    12: .CODE
    13:       .STARTUP
    14:
    15:       PutStr  prompt_msg1        ; request first number
    16:       GetInt  CX                 ; CX = first number
    17:
    18:       PutStr  prompt_msg2        ; request second number
    19:       GetInt  DX                 ; DX = second number
    20:
    21:       push    CX                 ; place first number on stack
    22:       push    DX                 ; place second number on stack
    23:       call    sum                ; returns sum in AX
    24:       PutStr  sum_msg            ; display sum
    25:       PutInt  AX
    26:       nwln
    27: done:
    28:       .EXIT
    29:
    30: ;
    31: ;Procedure sum receives two integers via the stack.
    32: ;The sum of the two integers is returned in AX.
    33: ;
    34: sum:
    35:       enter   0,0                ; save EBP
    36:       mov     AX,[EBP+10]        ; sum = first number
    37:       add     AX,[EBP+8]         ; sum = sum + second number
    38:       leave                      ; restore EBP
    39:       ret     4                  ; return and clear parameters
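The offsets EBP + 8 and EBP + 10 can be checked with a small C model of the stack. The sketch below is hypothetical: mem[] stands in for the stack segment, esp for ESP, and the return address and saved EBP are dummy values. It replays the two 16-bit pushes, the call, enter 0,0, the body of sum, leave, and ret 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stack: mem[] is byte addressed and `esp` indexes it. Like the
   IA-32 stack, it grows toward lower indices. */
static uint8_t mem[64];
static uint32_t esp = sizeof mem;

static void push16(uint16_t v) { esp -= 2; memcpy(&mem[esp], &v, 2); }
static void push32(uint32_t v) { esp -= 4; memcpy(&mem[esp], &v, 4); }

static uint16_t load16(uint32_t addr)
{
    uint16_t v;
    memcpy(&v, &mem[addr], 2);
    return v;
}

/* Replays main's two pushes, the call, and the body of sum. */
uint16_t model_sum(uint16_t first, uint16_t second)
{
    push16(first);                /* push CX                          */
    push16(second);               /* push DX                          */
    push32(0x08048000u);          /* call sum: dummy return address   */
    push32(0u);                   /* enter 0,0: push EBP (dummy)...   */
    uint32_t ebp = esp;           /* ...then mov EBP,ESP              */

    uint16_t ax = load16(ebp + 10);           /* mov AX,[EBP+10]      */
    ax = (uint16_t)(ax + load16(ebp + 8));    /* add AX,[EBP+8]; wraps
                                                 like a 16-bit add    */

    esp = ebp + 4;                /* leave: mov ESP,EBP; pop EBP      */
    esp += 4;                     /* ret 4: pop the return address... */
    esp += 4;                     /* ...then discard 4 parameter bytes */
    return ax;
}
```

After model_sum returns, esp is back at its starting value, which is what ret 4 guarantees in the real program.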

Example 11.3 Parameter passing by call-by-reference using the stack.
This example shows how the stack can be used for parameter passing using the call-by-reference
mechanism. The procedure swap receives two pointers to two characters and interchanges them.
The program, shown in Program 11.4, requests a string from the user and displays the input string
with the first two characters interchanged.

Program 11.4 Parameter passing by call-by-reference using the stack
     1: ;Parameter passing via the stack          PROCSWAP.ASM
     2: ;
     3: ;        Objective: To show parameter passing via the stack.
     4: ;            Input: Requests a character string from the user.
     5: ;           Output: Outputs the input string with the first
     6: ;                   two characters swapped.
     7:
     8: BUF_LEN  EQU  41                 ; string buffer length
     9: %include "io.mac"
    10:
    11: .DATA
    12: prompt_msg  db  "Please input a string: ",0
    13: output_msg  db  "The swapped string is: ",0
    14:
    15: .UDATA
    16: string      resb  BUF_LEN        ;input string < BUF_LEN chars.
    17:
    18: .CODE
    19:       .STARTUP
    20:       PutStr  prompt_msg         ; request string input
    21:       GetStr  string,BUF_LEN     ; read string from the user
    22:
    23:       mov     EAX,string         ; EAX = string[0] pointer
    24:       push    EAX
    25:       inc     EAX                ; EAX = string[1] pointer
    26:       push    EAX
    27:       call    swap               ; swaps the first two characters
    28:       PutStr  output_msg         ; display the swapped string
    29:       PutStr  string
    30:       nwln
    31: done:
    32:       .EXIT
    33:
    34: ;
    35: ;Procedure swap receives two pointers (via the stack) to
    36: ;characters of a string. It exchanges these two characters.
    37: ;
    38: .CODE
    39: swap:
    40:       enter   0,0
    41:       push    EBX                ; save EBX - procedure uses EBX
    42:       ; swap begins here. Because of xchg, AL is preserved.
    43:       mov     EBX,[EBP+12]       ; EBX = first character pointer
    44:       xchg    AL,[EBX]
    45:       mov     EBX,[EBP+8]        ; EBX = second character pointer
    46:       xchg    AL,[EBX]
    47:       mov     EBX,[EBP+12]       ; EBX = first character pointer
    48:       xchg    AL,[EBX]
    49:       ; swap ends here
    50:       pop     EBX                ; restore registers
    51:       leave
    52:       ret     8                  ; return and clear parameters

In preparation for calling swap, the main procedure places the addresses of the first two characters of the input string on the stack (lines 23 to 26). The swap procedure, after saving the EBP register as in the last example, can access the pointers to the two characters at EBP + 8 and EBP + 12. Since the procedure uses the EBX register, we save it on the stack as well. Note that once EBP has been pushed onto the stack and the ESP value copied to EBP, the two parameters (i.e., the two character pointers in this example) are available at EBP + 8 and EBP + 12, irrespective of any other push operations in the procedure. This is important from the program maintenance point of view.
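The three-xchg sequence in swap can be traced in C. In this sketch, xchg() is a hypothetical helper that models xchg AL,[EBX], and the local al stands in for AL; the assertion checks the listing's claim that AL ends up unchanged.

```c
#include <assert.h>

/* Models "xchg reg,mem": exchanges a register and a memory byte. */
static void xchg(char *reg, char *mem)
{
    char tmp = *reg;
    *reg = *mem;
    *mem = tmp;
}

/* Sketch of swap: `first` and `second` are the two character pointers
   the caller pushed (found at EBP+12 and EBP+8 in the assembly code). */
void swap(char *first, char *second)
{
    char al = '?';              /* whatever AL held on entry          */
    const char saved = al;

    xchg(&al, first);           /* AL = *first,  *first  = old AL     */
    xchg(&al, second);          /* AL = *second, *second = old *first */
    xchg(&al, first);           /* AL = old AL,  *first = old *second */

    assert(al == saved);        /* three exchanges leave AL unchanged */
}
```

Three exchanges through one register swap the two memory bytes without needing a second scratch register, which is why the procedure has to save only EBX.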

Summary
The stack is a last-in-first-out data structure that plays an important role in procedure invocation and execution. It supports two operations: push and pop. Only the element at the top-of-stack is directly accessible through these operations. The stack segment is used to implement the stack. The top-of-stack is represented by SS:ESP. In the implementation, the stack grows toward lower memory addresses (i.e., grows downward).
The stack serves three main purposes: temporary storage of data, transfer of control during a
procedure call and return, and parameter passing.
When writing procedures in the assembly language, parameter passing has to be explicitly
handled. Parameter passing can be done via registers or the stack. Although the register method is
efficient, the stack-based method is more general. We have used several examples to illustrate the
register-based and stack-based parameter passing.

12
More on Procedures
We introduced the basics of assembly language procedures in the last chapter. We discussed the two parameter passing mechanisms used in invoking procedures. However, we did not discuss how local variables, declared in a procedure, are handled in the assembly language. We start this chapter with a discussion of this topic.
Although short assembly language programs can be stored in a single file, real application programs are likely to be broken into several files, called modules. The issues involved in writing and assembling multiple source program modules are discussed in detail.
Most high-level languages use procedures that receive a fixed number of arguments. However, languages like C support a variable number of arguments. By means of an example, we look at how we can pass a variable number of arguments to a procedure. It turns out that passing a variable number of arguments is straightforward using the stack. The last section provides a summary of the chapter.

Introduction
This chapter builds on the material presented in the last chapter. Specifically, we focus on three issues: handling local variables, splitting a program into multiple modules, and passing a variable number of arguments.
In the last chapter, we did not consider how local variables can be used in a procedure. To focus our discussion, let us look at the following C code:

    int compute(int a, int b)
    {
        int temp, N;
    }

The variables temp and N are local variables whose scope is limited to the compute procedure. These variables come into existence when the compute procedure is invoked and disappear when the procedure terminates. Like the parameter passing mechanism, we can use either registers or the stack to store the local variables. We discuss these two methods and their pros and cons in the next section.


In the assembly language programs we have seen so far, the entire assembly language program is in a single file. This is fine for short example programs. Real application programs, however, tend to be large, consisting of hundreds of procedures. Rather than keeping such a massive source program in a single file, it is advantageous to break it up into several small pieces, where each piece of the source code is stored in a separate file or module. There are three advantages associated with multimodule programs:
• The chief advantage is that, after modifying a source module, it is only necessary to reassemble that module. On the other hand, if you keep only a single file, the whole file has
to be reassembled.
• Making modifications to the source code is easier with several small files.
• It is safer to edit a short file; any unintended modifications to the source file are limited to a
single small file.
After discussing the local variable issues, we describe in detail the mechanism involved in creating
programs with multiple modules.
Most of the procedures we write receive a fixed number of arguments: every call passes the same number. However, procedures in C can be defined with a variable number of parameters. In these procedures, the number of arguments passed can vary from call to call. For example, a procedure may receive only two arguments in one call but five arguments in another. The input and output functions scanf and printf are two common procedures that take a variable number of arguments. In procedures of this type, the called procedure does not know how many arguments have been passed to it. Usually, the first argument conveys this number, either directly or, as with the format string of printf, indirectly. Using an example, we show how we can write assembly language procedures that can receive a variable number of arguments.
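For reference, this is how a C procedure whose first argument gives the count can be written with the standard stdarg.h macros. The name sum_ints is hypothetical; an assembly version would instead read the arguments directly from the stack, using the count to know where to stop.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: `count` says how many int arguments follow. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);               /* start after the last fixed parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);      /* fetch the next int argument          */
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) passes four arguments in total; the callee relies entirely on the first one to know how many more to read.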

Local Variables
In the compute procedure, the local variables temp and N are dynamic. How do we store them in our assembly language programs? One alternative is to use the processor registers. Even though this method is efficient, it is not suitable for all procedures. The register method can be used for leaf procedures.¹ Even here, the limited number of registers may cause problems.
To avoid these problems, we could reserve space for the local variables in our data segment. However, such a space allocation is not desirable for two main reasons:
1. Space allocation done in the data segment is static and remains active even when the procedure is not. However, these local variables are supposed to disappear when the procedure is terminated.
2. More importantly, it does not work with nonleaf and recursive procedures. Note that recursive procedures call themselves either directly or indirectly. We discuss recursive procedures in Chapter 19.
For these reasons, space for local variables is reserved on the stack. For the C compute function, Figure 12.1 shows the contents of the stack frame. In high-level languages, it is also referred to as the activation record because each procedure activation requires all this information. The EBP value, also called the frame pointer, allows us to access the contents of the stack frame.
¹ A leaf procedure is a procedure that does not call another procedure, while a nonleaf procedure does.

    EBP + 12  -->  a                   (parameters)
    EBP + 8   -->  b                   (parameters)
    EBP + 4   -->  return address
    EBP       -->  old EBP
    EBP - 4   -->  temp                (local variables)
    EBP - 8   -->  N        <-- ESP    (local variables)

Figure 12.1 Activation record for the compute function.

For example, parameters a and b can be accessed at EBP + 12 and EBP + 8, respectively. Local variables temp and N can be accessed at EBP - 4 and EBP - 8, respectively.
To aid program readability, we can use the %define directive to name the stack locations. Then we can write

    mov     EBX,a
    mov     temp,EAX

instead of

    mov     EBX,[EBP+12]
    mov     [EBP-4],EAX

after establishing the temp and a labels by using

    %define  a     dword [EBP+12]
    %define  temp  dword [EBP-4]
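One way to double-check the offsets in Figure 12.1 is to describe the frame as a C struct laid out from ESP upward and measure each member's distance from the saved-EBP slot. This is only a model: the struct name and the EBP_OFFSET macro are hypothetical, and it assumes there is no padding between the 4-byte members, which holds on common ABIs. It also follows Figure 12.1 in placing a above b; a C compiler using the common cdecl convention pushes arguments right to left, which would put a at EBP + 8 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of compute's activation record, lowest address (ESP) first.
   All members are 4 bytes wide. */
struct compute_frame {
    int32_t  N;          /* local variable         */
    int32_t  temp;       /* local variable         */
    uint32_t old_ebp;    /* EBP points here        */
    uint32_t ret_addr;   /* pushed by call         */
    int32_t  b;          /* last argument pushed   */
    int32_t  a;          /* first argument pushed  */
};

/* Signed distance of a frame member from the saved-EBP slot. */
#define EBP_OFFSET(member)                              \
    ((long)offsetof(struct compute_frame, member) -     \
     (long)offsetof(struct compute_frame, old_ebp))
```

The offsets it yields (12, 8, 4, -4, -8) are exactly the displacements used in the %define statements above.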

Next we look at an example that computes the Fibonacci numbers.

Our First Program
In this example, we write a procedure to compute the largest Fibonacci number that is less than or
equal to a given input number. The Fibonacci sequence of numbers is defined as
    fib(1) = 1,
    fib(2) = 1,
    fib(n) = fib(n - 1) + fib(n - 2)   for n > 2.

In other words, the first two numbers in the Fibonacci sequence are 1. The subsequent numbers are obtained by adding the previous two numbers in the sequence. Thus,

    1, 1, 2, 3, 5, 8, 13, 21, 34, ...

is the Fibonacci sequence of numbers.


The listing for this example is given in Program 12.1. The main procedure requests the input number and passes it on to the fibonacci procedure. The fibonacci procedure keeps the last two Fibonacci numbers in local variables. We use the stack for storing these two Fibonacci numbers. The variable FIB_LO corresponds to fib(n - 1) and FIB_HI to fib(n).
The fib_loop on lines 43-50 successively computes the Fibonacci numbers until one exceeds the input number. Then the previous Fibonacci number, which is in EAX, is returned to the main procedure.
Program 12.1 Fibonacci number computation with local variables mapped to the stack
     1: ;Fibonacci numbers                        PROCFIB.ASM
     2: ;
     3: ;        Objective: To compute Fibonacci number using the stack
     4: ;                   for local variables.
     5: ;            Input: Requests a positive integer from the user.
     6: ;           Output: Outputs the largest Fibonacci number that
     7: ;                   is less than or equal to the input number.
     8: %include "io.mac"
     9:
    10: .DATA
    11: prompt_msg   db  "Please input a positive number (>1): ",0
    12: output_msg1  db  "The largest Fibonacci number less than "
    13:              db  "or equal to ",0
    14: output_msg2  db  " is ",0
    15:
    16: .CODE
    17:       .STARTUP
    18:       PutStr  prompt_msg         ; request input number
    19:       GetLInt EDX                ; EDX = input number
    20:       call    fibonacci
    21:       PutStr  output_msg1        ; print Fibonacci number
    22:       PutLInt EDX
    23:       PutStr  output_msg2
    24:       PutLInt EAX
    25:       nwln
    26: done:
    27:       .EXIT
    28:
    29: ;
    30: ;Procedure fibonacci receives an integer in EDX and computes
    31: ;the largest Fibonacci number that is less than or equal to
    32: ;the input number. The Fibonacci number is returned in EAX.
    33: ;
    34: %define FIB_LO  dword [EBP-4]
    35: %define FIB_HI  dword [EBP-8]
    36: fibonacci:
    37:       enter   8,0                ; space for two local variables
    38:       push    EBX
    39:       ; FIB_LO maintains the smaller of the last two Fibonacci
    40:       ; numbers computed; FIB_HI maintains the larger one.
    41:       mov     FIB_LO,1           ; initialize FIB_LO and FIB_HI to
    42:       mov     FIB_HI,1           ;   first two Fibonacci numbers
    43: fib_loop:
    44:       mov     EAX,FIB_HI         ; compute next Fibonacci number
    45:       mov     EBX,FIB_LO
    46:       add     EBX,EAX
    47:       mov     FIB_LO,EAX
    48:       mov     FIB_HI,EBX
    49:       cmp     EBX,EDX            ; compare with input number in EDX
    50:       jle     fib_loop           ; if not greater, find next number
    51:       ; EAX contains the required Fibonacci number
    52:       pop     EBX
    53:       leave                      ; clears local variable space
    54:       ret

The code

    push    EBP
    mov     EBP,ESP
    sub     ESP,8

saves the EBP value and copies the ESP value into EBP as usual. It also decrements ESP by 8, thus creating 8 bytes of storage space for the two local variables FIB_LO and FIB_HI. This three-instruction sequence can be replaced by the

    enter   8,0

instruction (line 37). As mentioned before, the first operand specifies the number of bytes reserved for local variables. At this point, the stack allocation is

    EBP + 8   -->  ??
    EBP + 4   -->  return address
    EBP       -->  old EBP
    EBP - 4   -->  FIB_LO              (local variables)
    EBP - 8   -->  FIB_HI   <-- ESP    (local variables)

The two local variables can be accessed at EBP - 4 and EBP - 8. The two %define statements, on lines 34 and 35, conveniently establish labels for these two locations. We can clear the local variable space and restore the EBP value by the

    mov     ESP,EBP
    pop     EBP

instructions. The leave instruction performs exactly these two operations. Thus, the leave instruction on line 53 automatically clears the local variable space. The rest of the code is straightforward to follow.
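The fibonacci procedure translates almost line for line into C. In this sketch fib_lo and fib_hi play the part of the two stack-resident locals, and eax and ebx mirror the registers. It assumes limit >= 1 and that the next Fibonacci number still fits in an int32_t, which holds for limit < 1836311903.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of fibonacci: returns the largest Fibonacci number <= limit. */
int32_t fibonacci(int32_t limit)
{
    int32_t fib_lo = 1, fib_hi = 1;   /* first two Fibonacci numbers      */
    int32_t eax = 0, ebx = 0;

    do {                              /* fib_loop:                        */
        eax = fib_hi;                 /* mov EAX,FIB_HI                   */
        ebx = fib_lo + eax;           /* mov EBX,FIB_LO / add EBX,EAX     */
        fib_lo = eax;                 /* mov FIB_LO,EAX                   */
        fib_hi = ebx;                 /* mov FIB_HI,EBX                   */
    } while (ebx <= limit);           /* cmp EBX,EDX / jle fib_loop       */

    return eax;                       /* last value that did not exceed limit */
}
```

When the loop exits, ebx is the first Fibonacci number larger than the input, so eax (its predecessor) is the answer; for an input of 10 the function returns 8.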

Multiple Source Program Modules
We discussed the advantages of multimodule programs at the beginning of this chapter. If we want
to write multimodule assembly language programs, we have to precisely specify the intermodule
interface. For example, if a procedure is called in the current module but is defined in another
module, we have to state this fact so that the assembler does not flag such procedure calls as errors.
Assemblers provide two directives, global and extern, to facilitate separate assembly of source modules. These two directives are discussed next.
GLOBAL Directive The global directive makes the associated label(s) available to other modules of the program. The format is

    global  label1, label2, ...

Almost any label can be made public. This includes procedure names, memory variables, and equated labels, as shown in the following example:

    global  error_msg, total, sample

    .DATA
    error_msg  db  'Out of range!',0
    total      dw  0

    .CODE
    sample:
          ret

Microsoft and Borland assemblers use the PUBLIC directive for this purpose.
EXTERN Directive The extern directive can be used to tell the assembler that certain labels are not defined in the current source file (i.e., module), but can be found in other modules. Thus, the assembler leaves "holes" in the corresponding object file that the linker will fill in later. The format is

    extern  label1, label2, ...

where label1 and label2 are labels that are made public by a global directive in some other module.


Illustrative Examples
We present two examples to show how the global and extern directives are used to create multimodule programs in the assembly language.
Example 12.1 A two-module example to find string length.
We now present a simple example that reads a string from the user and displays the string length (i.e., number of characters in the string). The source code consists of two procedures: main and string_length. The main procedure is responsible for requesting and displaying the string length information. It uses the GetStr, PutStr, and PutInt I/O routines. The string_length procedure computes the string length.
The source program is split into two modules: the main procedure is in the module1.asm file, and the string_length procedure is in the module2.asm file. Program 12.2 gives a listing of module1.asm. Notice that on line 18, we declare string_length as an externally defined procedure by using the extern directive.
Program 12.2 The main procedure defined in module1.asm calls the string_length procedure defined in module2.asm
     1: ;Multimodule program for string length     MODULE1.ASM
     2: ;
     3: ;        Objective: To show parameter passing via registers.
     4: ;            Input: Requests a string from the user.
     5: ;           Output: Outputs the length of the input string.
     6:
     7: BUF_SIZE  EQU  41                ; string buffer size
     8: %include "io.mac"
     9:
    10: .DATA
    11: prompt_msg  db  "Please input a string: ",0
    12: length_msg  db  "String length is: ",0
    13:
    14: .UDATA
    15: string1     resb  BUF_SIZE
    16:
    17: .CODE
    18:       extern  string_length
    19:       .STARTUP
    20:       PutStr  prompt_msg         ; request a string
    21:       GetStr  string1,BUF_SIZE   ; read string input
    22:
    23:       mov     EBX,string1        ; EBX := string pointer
    24:       call    string_length      ; returns string length in AX
    25:       PutStr  length_msg         ; display string length
    26:       PutInt  AX
    27:       nwln
    28: done:
    29:       .EXIT

Program 12.3 gives the module2.asm program listing. This module consists of a single procedure. By using the global directive, we make this procedure global (line 10) so that other modules can access it. The string_length procedure receives a pointer to a NULL-terminated string in EBX and returns the length of the string in AX. The procedure preserves all registers except AX.

Program 12.3 This module defines the string_length procedure called by main
     1: ;String length procedure                  MODULE2.ASM
     2: ;
     3: ;        Function: To write a procedure to compute string
     4: ;                  length of a NULL-terminated string.
     5: ;        Receives: String pointer in the EBX register.
     6: ;         Returns: Returns string length in AX.
     7:
     8: %include "io.mac"
     9: .CODE
    10: global string_length
    11: string_length:
    12:       ; all registers except AX are preserved
    13:       push    ESI                ; save ESI
    14:       mov     ESI,EBX            ; ESI = string pointer
    15: repeat:
    16:       cmp     byte [ESI],0       ; is it NULL?
    17:       je      done               ; if so, done
    18:       inc     ESI                ; else, move to next character
    19:       jmp     repeat             ;   and repeat
    20: done:
    21:       sub     ESI,EBX            ; compute string length
    22:       mov     AX,SI              ; return string length in AX
    23:       pop     ESI                ; restore ESI
    24:       ret
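A C sketch of string_length shows the pointer-difference idea used in module2.asm: advance a scratch pointer to the NULL character and subtract the start address. The C version is illustrative only. It also hints at the C counterpart of global and extern: a function like this one has external linkage by default, and another source file would only need a prototype such as uint16_t string_length(const char *); to call it.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of string_length: `p` plays the role of ESI and `start` the
   role of EBX; the result is truncated to 16 bits, like AX. */
uint16_t string_length(const char *start)
{
    const char *p = start;              /* mov ESI,EBX                     */
    while (*p != '\0')                  /* cmp byte [ESI],0 / je done      */
        p++;                            /* inc ESI                         */
    return (uint16_t)(p - start);       /* sub ESI,EBX / mov AX,SI         */
}
```

Unlike str_len in Program 11.2, this version keeps no running count inside the loop; the length falls out of a single subtraction at the end.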

We can assemble each source code module separately, producing the corresponding object file. We can then link the object files together to produce a single executable file. For example, using the NASM assembler, the following sequence of commands

    nasm -f elf module1.asm                     (produces module1.o)
    nasm -f elf module2.asm                     (produces module2.o)
    ld -s -o module module1.o module2.o io.o    (produces module)

produces the executable file module. Note that the above sequence assumes that you have the io.o file in your current directory.
Example 12.2 Bubble sort procedure.
There are several algorithms to sort an array of numbers. The algorithm we use here is called the
bubble sort algorithm. We assume that the array is to be sorted in ascending order. The bubble sort algorithm consists of several passes through the array. Each pass scans the array, performing the following actions:
• Compare adjacent pairs of data elements;
• If they are out of order, swap them.
The algorithm terminates if, during a pass, no data elements are swapped. Even if a single swap is done during a pass, it will initiate another pass to scan the array.

    Initial state:           4  3  5  1  2
    After 1st comparison:    3  4  5  1  2    (4 and 3 swapped)
    After 2nd comparison:    3  4  5  1  2    (no swap)
    After 3rd comparison:    3  4  1  5  2    (5 and 1 swapped)
    End of first pass:       3  4  1  2  5    (5 and 2 swapped)

Figure 12.2 Actions taken during the first pass of the bubble sort algorithm.

    Initial state:           4  3  5  1  2
    After 1st pass:          3  4  1  2  5    (5 in its final position)
    After 2nd pass:          3  1  2  4  5    (4 in its final position)
    After 3rd pass:          1  2  3  4  5    (array in sorted order)
    After the final pass:    1  2  3  4  5    (final pass to check)

Figure 12.3 Behavior of the bubble sort algorithm.
Figure 12.2 shows the behavior of the algorithm during the first pass. The algorithm starts
by comparing the first and second data elements (4 and 3). Since they are out of order, 4 and
3 are interchanged. Next, the second data element 4 is compared with the third data element 5,
and no swapping takes place as they are in order. During the next step, 5 and 1 are compared
and swapped and finally 5 and 2 are swapped. This terminates the first pass. The algorithm has
performed N - 1 comparisons, where N is the number of data elements in the array. At the end
of the first pass, the largest data element 5 is moved to its final position in the array.
Figure 12.3 shows the state of the array after each pass. Notice that after the first pass, the
largest number (5) is in its final position. Similarly, after the second pass, the second largest
number (4) moves to its final position, and so on. This is why this algorithm is called the bubble
sort: during the first pass, the largest element bubbles to the top, the second largest bubbles to the
top during the second pass, and so on. Even though the array is in sorted order after the third pass,
one more pass is required by the algorithm to detect this.
The number of passes required to sort an array depends on how unsorted the initial array is.
If the array is in sorted order, only a single pass is required. At the other extreme, if the array is
completely unsorted (i.e., elements are initially in the descending order), the algorithm requires
the maximum number of passes equal to one less than the number of elements in the array. The
pseudocode for the bubble sort algorithm is shown in Figure 12.4.
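The pseudocode in Figure 12.4 can be sketched in C as follows. The function returns the number of passes that performed comparisons, a hypothetical addition so that the pass counts described above can be checked; the comparisons > 1 test plays the role of the early exit (dec ECX followed by jz sort_done) in the assembly procedure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of Figure 12.4: sorts `array` in ascending order and returns
   the number of passes made. */
int bubble_sort(int32_t *array, size_t size)
{
    size_t comparisons = size;
    bool unsorted = true;                    /* status := UNSORTED          */
    int passes = 0;

    while (unsorted && comparisons > 1) {
        comparisons--;                       /* one fewer comparison per pass */
        unsorted = false;                    /* status := SORTED            */
        passes++;
        for (size_t i = 0; i < comparisons; i++) {
            if (array[i] > array[i + 1]) {   /* out of order: swap them     */
                int32_t tmp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = tmp;
                unsorted = true;             /* status := UNSORTED          */
            }
        }
    }
    return passes;
}
```

For the array 4 3 5 1 2 it performs four passes, matching Figure 12.3; an already sorted array needs a single pass, and a descending array of N elements needs N - 1.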
The bubble sort program requests a set of up to 20 nonzero integers from the user and displays
them in sorted order. The input can be terminated earlier by typing a zero.
We divide the bubble sort program into four modules, surely an overkill but it gives us an
opportunity to practice multimodule programming. The main program calls three procedures to
perform the bubble sort:
• read_array procedure: This procedure reads the input numbers into the array to be sorted.
• output_array procedure: This procedure outputs the sorted array.
• bubble_sort procedure: This procedure sorts the array in ascending order using the bubble sort algorithm.

    bubble_sort (arrayPointer, arraySize)
       status := UNSORTED
       #comparisons := arraySize
       while (status = UNSORTED)
          #comparisons := #comparisons - 1
          status := SORTED
          for (i = 0 to #comparisons - 1)
             if (array[i] > array[i+1])
                swap ith and (i+1)th elements of the array
                status := UNSORTED
             end if
          end for
       end while
    end bubble_sort

Figure 12.4 Pseudocode for the bubble sort algorithm.
The main program listing is shown in Program 12.4. It first calls the read_array procedure to fill the array with nonzero integers. The read_array procedure returns the actual number of values read into the array in the EAX register. If this value is zero, implying that no input was given, the program terminates after displaying an appropriate message. Otherwise, the array pointer and its size are passed to the bubble_sort procedure. After returning from this procedure, the output_array procedure is called to display the sorted array.
Program 12.4 Main program of the bubble sort program
    ;Bubble sort procedure                    BBLMAIN.ASM
    ;        Objective: To implement the bubble sort algorithm.
    ;            Input: A set of nonzero integers to be sorted.
    ;                   Input is terminated by entering zero.
    ;           Output: Outputs the numbers in ascending order.

    %define  CRLF  0DH,0AH
    MAX_SIZE  EQU  20
    %include "io.mac"

    .DATA
    prompt_msg  db  "Enter nonzero integers to be sorted.",CRLF
                db  "Enter zero to terminate the input.",0
    output_msg  db  "Input numbers in ascending order:",0
    error_msg   db  "No input entered.",0

    .UDATA
    array       resd  MAX_SIZE         ; input array for integers

    .CODE
          extern  bubble_sort
          extern  read_array
          extern  output_array
          .STARTUP
          PutStr  prompt_msg           ; request input numbers
          nwln
          mov     EBX,array            ; EBX = array pointer
          mov     ECX,MAX_SIZE         ; ECX = array size
          call    read_array           ; reads input into the array;
                                       ; returns the number of values read in EAX
          cmp     EAX,0                ; if no input is given
          ja      input_OK
          PutStr  error_msg            ; display error message
          nwln
          jmp     short done
    input_OK:
          push    EAX                  ; push array size onto stack
          push    array                ; place array pointer on stack
          call    bubble_sort
          PutStr  output_msg           ; display sorted input numbers
          nwln
          mov     EBX,array
          mov     ECX,EAX              ; EAX has the number count
          call    output_array
    done:
          .EXIT
The read_array procedure, shown in Program 12.5, receives the array pointer in EBX and the maximum array size in the ECX register. It reads at most ECX (the maximum array size) values. The loop instruction on line 24 takes care of this condition. The input can also be terminated earlier by entering a zero. The zero input condition is detected and the loop is terminated by the statements on lines 19 and 20. The EDX register is used to keep track of the number of input values received from the user. This value is returned to the main program via the EAX register (line 26).
Program 12.5 Read array procedure
     1: ;Array read procedure                     BBLREAD.ASM
     2: ;        Function: To read a set of nonzero integer values
     3: ;                  into an array.
     4: ;                  Input is terminated by entering zero.
     5: ;        Receives: EBX = array pointer
     6: ;                  ECX = array size
     7: ;         Returns: EAX = number of values read.
     8: ;
     9: %include "io.mac"
    10:
    11: .CODE
    12: global read_array
    13: read_array:
    14:       push    EDX
    15:       push    EBX
    16:       sub     EDX,EDX            ; number count = 0
    17: read_loop:
    18:       GetLInt EAX                ; read input number
    19:       cmp     EAX,0              ; if the number is zero
    20:       je      read_done          ;   no more numbers to read
    21:       mov     [EBX],EAX          ; copy the number into array
    22:       add     EBX,4              ; EBX points to the next element
    23:       inc     EDX                ; increment number count
    24:       loop    read_loop          ; reads a max. of MAX_SIZE numbers
    25: read_done:
    26:       mov     EAX,EDX            ; returns the # of values read
    27:       pop     EBX
    28:       pop     EDX
    29:       ret
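Here is a C sketch with the same stopping rules as read_array: at most max_size values, or fewer if a zero is entered. To keep it self-contained and testable, it parses integers from a string instead of the keyboard; that input source, like the function's parameter list, is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch of read_array: stores up to max_size nonzero integers parsed
   from `input` into `array`, stops early at a zero, and returns the
   number of values stored. */
size_t read_array(const char *input, int *array, size_t max_size)
{
    size_t count = 0;                         /* sub EDX,EDX              */
    int value, consumed;

    while (count < max_size &&                /* loop read_loop (ECX)     */
           sscanf(input, "%d%n", &value, &consumed) == 1) {
        if (value == 0)                       /* cmp EAX,0 / je read_done */
            break;
        array[count++] = value;               /* store, advance, count    */
        input += consumed;                    /* skip past the number read */
    }
    return count;                             /* mov EAX,EDX              */
}
```

The %n conversion reports how many characters sscanf consumed, which lets the loop step through the string one number at a time.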

The output_array procedure (Program 12.6) receives the array pointer in the EBX register and the array size in the ECX register. It uses the loop on lines 14-18 to display the sorted array.
Program 12.6 Output array procedure
     1: ;Array output procedure                   BBLOUTPUT.ASM
     2: ;        Function: To output the values of an array.
     3: ;        Receives: EBX = array pointer
     4: ;                  ECX = array size
     5: ;         Returns: None.
     6:
     7: %include "io.mac"
     8:
     9: .CODE
    10: global output_array
    11: output_array:
    12:       push    EBX
    13:       push    ECX
    14: print_loop:
    15:       PutLInt [EBX]
    16:       nwln
    17:       add     EBX,4
    18:       loop    print_loop
    19:       pop     ECX
    20:       pop     EBX
    21:       ret

The bubble_sort procedure receives the array size and a pointer to the array via the stack. In the bubble_sort procedure, the ECX register is used to keep track of the number of comparisons per pass, while EDX maintains the status information. The EDI register counts the comparisons remaining in the current pass, and the ESI register points to the ith element of the input array.
Program 12.7 Bubble sort procedure to sort integers in ascending order
     1: ;
     2: ;This procedure receives a pointer to an array of integers
     3: ;and the size of the array via the stack. It sorts the
     4: ;array in ascending order using the bubble sort algorithm.
     5: ;
     6:
     7: %include "io.mac"
     8:
     9: SORTED    EQU  0
    10: UNSORTED  EQU  1
    11:
    12: .CODE
    13: global bubble_sort
    14: bubble_sort:
    15:       pushad
    16:       mov     EBP,ESP
    17:
    18:       ; ECX serves the same purpose as #comparisons in the
    19:       ; pseudocode of Figure 12.4: it keeps the number of
    20:       ; comparisons to be done in each pass. Note that ECX
    21:       ; is decremented by 1 after each pass.
    22:       mov     ECX,[EBP+40]       ; load array size into ECX
    23: next_pass:
    24:       dec     ECX                ; if # of comparisons is zero
    25:       jz      sort_done          ;   then we are done
    26:       mov     EDI,ECX            ;   else start another pass
    27:
    28:       ; EDX is used to keep SORTED/UNSORTED status
    29:       mov     EDX,SORTED         ; set status to SORTED
    30:
    31:       mov     ESI,[EBP+36]       ; load array address into ESI
    32:       ; ESI points to element i and ESI+4 to the next element
    33: pass:
    34:       ; This loop represents one pass of the algorithm.
    35:       ; Each iteration compares elements at [ESI] and [ESI+4]
    36:       ; and swaps them if ([ESI]) > ([ESI+4]).
    37:       mov     EAX,[ESI]
    38:       mov     EBX,[ESI+4]
    39:       cmp     EAX,EBX
    40:       jg      swap
    41:
    42: increment:
    43:       ; Increment ESI by 4
    44:       add     ESI,4
    45:       dec     EDI
    46:       jnz     pass
    47:
    48:       cmp     EDX,SORTED         ; if status remains SORTED
    49:       je      sort_done          ;   then sorting is done
    50:       jmp     next_pass          ;   else initiate another pass
    51:
    52: swap:
    53:       ; swap elements at [ESI] and [ESI+4]
    54:       mov     [ESI+4],EAX        ; copy [ESI] in EAX to [ESI+4]
    55:       mov     [ESI],EBX          ; copy [ESI+4] in EBX to [ESI]
    56:       mov     EDX,UNSORTED       ; set status to UNSORTED
    57:       jmp     increment
    58:
    59: sort_done:
    60:       popad
    61:       ret     8

The while loop condition is tested by lines 48 to 50. The for loop body corresponds to lines
37 to 46 and 54 to 57. The rest of the code follows the pseudocode. Note that the array pointer is
available in the stack at EBP+36 and its size at EBP+40: pushad saves all eight 32-bit registers
(32 bytes) on the stack, and the 4-byte return address lies just above them.
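For comparison, here is a minimal C sketch of the same algorithm, written for this discussion rather than taken from the book's C procedure. The variable end_index plays the role of ECX, and the flag sorted plays the role of the SORTED/UNSORTED status in DL. The early return for arrays with fewer than two elements is an addition of the sketch: Program 12.7 would mishandle an empty array, because decrementing a zero ECX yields 0FFFFFFFFH rather than zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sorts array[0..size-1] in ascending order with bubble sort.
 * end_index mirrors ECX (comparisons per pass) in Program 12.7,
 * and sorted mirrors the SORTED/UNSORTED status kept in DL. */
void bubble_sort(int *array, size_t size)
{
    if (size < 2)
        return;                              /* nothing to sort */

    size_t end_index = size - 1;             /* comparisons in the first pass */
    while (end_index > 0) {
        bool sorted = true;                  /* status = SORTED */
        for (size_t i = 0; i < end_index; i++) {
            if (array[i] > array[i + 1]) {   /* like "jg swap" */
                int tmp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = tmp;
                sorted = false;              /* status = UNSORTED */
            }
        }
        if (sorted)
            break;                           /* a pass with no swaps: done */
        end_index--;                         /* largest element is in place */
    }
}
```

Each pass moves the largest remaining element to the end, which is why one fewer comparison is needed in the next pass.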

Procedures with Variable Number of Parameters
In assembly language procedures, a variable number of parameters can be easily handled by the
stack method of parameter passing. Only the stack size imposes a limit on the number of arguments
that can be passed. The next example illustrates the use of the stack to pass a variable number of
arguments in assembly language programs.
Example 12.3 Passing a variable number of arguments via the stack.
In this example, the variable_sum procedure receives a variable number of integers via the
stack. The actual number of integers passed is the last argument pushed onto the stack before
calling the procedure. The procedure finds the sum of the integers and returns this value in the
EAX register.
The main procedure in Program 12.8 requests input from the user. Only nonzero values are
accepted as valid input (entering a zero terminates the input). The read_number loop (lines 24
to 30) reads input numbers using GetLInt and pushes them onto the stack. The ECX register
keeps a count of the number of input values, which is passed as the last parameter (line 32) before
calling the variable_sum procedure. The state of the stack at line 53, after executing the
enter instruction, is shown in Figure 12.5.
The variable_sum procedure first reads the number of parameters passed to it from the
stack at EBP+8 into the ECX register. The add_loop (lines 60 to 63) successively reads each
integer from the stack and computes their sum in EAX. Note that on line 61 we use a segment
override prefix. If we write

      add     EAX,[EBX]

the contents of EBX are treated as the offset value into the data segment. However, our parameters are located in the stack segment. Therefore, it is necessary to indicate that the offset in EBX
is relative to SS (and not DS) by using the SS: segment override prefix (line 61). The segment
override prefixes (CS:, DS:, ES:, FS:, GS:, and SS:) can be placed in front of a memory operand
to indicate a segment other than the default segment.

  Stack contents (higher addresses at the top)    Address
  parameter N
  parameter N-1
  ...                                             (N parameters in all)
  parameter 2                                     EBP+16
  parameter 1                                     EBP+12
  Number of parameters                            EBP+8
  Return address                                  EBP+4
  EBP                                             EBP  <-- ESP, EBP
Figure 12.5 State of the stack after executing the enter statement.
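The offsets in Figure 12.5 can be checked with a small C model of the stack. This is only a sketch under stated assumptions: the stack is simulated as an array of 32-bit words (toy_stack, push, variable_sum_model, and sum_via_stack are hypothetical names), a push decrements the stack index just as push decrements ESP, and word offset k from EBP stands for byte offset 4k.

```c
#include <stdint.h>

/* Hypothetical toy model of the IA-32 stack: an array of 32-bit words.
 * Index 0 is the lowest address, and the stack grows toward index 0,
 * just as ESP decreases on every push. Word index ebp+k stands for
 * byte address EBP+4k, so [EBP+8] is word ebp+2. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;                          /* word index of the top of stack */
    int ebp;                          /* frame pointer set by "enter"   */
} toy_stack;

static void push(toy_stack *s, uint32_t value)
{
    s->mem[--s->esp] = value;         /* like push: decrement, then store */
}

/* Mimics variable_sum after "enter 0,0": the count is at [EBP+8] and
 * the first integer to add is at [EBP+12]. */
static int32_t variable_sum_model(const toy_stack *s)
{
    uint32_t count = s->mem[s->ebp + 2];      /* mov ECX,[EBP+8]          */
    int ebx = s->ebp + 3;                     /* EBX = EBP+12             */
    int32_t sum = 0;                          /* sub EAX,EAX              */
    for (uint32_t k = 0; k < count; k++)
        sum += (int32_t)s->mem[ebx++];        /* add EAX,[EBX]; add EBX,4 */
    return sum;
}

/* Caller side, as in main: push the values, push the count, then model
 * the call (return address) and "enter 0,0" (push EBP; mov EBP,ESP).
 * Returns 0 if n is too large for this small toy stack. */
int32_t sum_via_stack(const int32_t *values, uint32_t n)
{
    toy_stack s = { .esp = STACK_WORDS, .ebp = 0 };
    if (n > (uint32_t)(STACK_WORDS - 3))
        return 0;
    for (uint32_t k = 0; k < n; k++)
        push(&s, (uint32_t)values[k]);        /* push EAX                 */
    push(&s, n);                              /* push ECX (the count)     */
    push(&s, 0xDEADBEEFu);                    /* stand-in return address  */
    push(&s, (uint32_t)s.ebp);                /* enter: push EBP          */
    s.ebp = s.esp;                            /* enter: mov EBP,ESP       */
    return variable_sum_model(&s);
}
```

In this model the last value pushed sits at EBP+12, directly above the count, which matches the position of "parameter 1" in the figure.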
Program 12.8 A program to illustrate passing a variable number of parameters
 1: ;-----------------------------------------------------------
 2: ;Variable number of parameters passed via stack   VARPARA.ASM
 3: ;
 4: ; Objective: To show how variable number of parameters
 5: ;            can be passed via the stack.
 6: ;     Input: Requests variable number of nonzero integers.
 7: ;            A zero terminates the input.
 8: ;    Output: Outputs the sum of input numbers.
 9: ;-----------------------------------------------------------
10: %define CRLF 0DH,0AH    ; carriage return and line feed
11: %include "io.mac"
12: 
13: .DATA
14: prompt_msg  db  "Please input a set of nonzero integers.",CRLF
15:             db  "You must enter at least one integer.",CRLF
16:             db  "Enter zero to terminate the input.",0
17: sum_msg     db  "The sum of the input numbers is: ",0
18: 
19: .CODE
20:       .STARTUP
21:       PutStr  prompt_msg     ; request input numbers
22:       nwln
23:       sub     ECX,ECX        ; ECX keeps number count
24: read_number:
25:       GetLInt EAX            ; read input number
26:       cmp     EAX,0          ; if the number is zero
27:       je      stop_reading   ; no more numbers to read
28:       push    EAX            ; place the number on stack
29:       inc     ECX            ; increment number count
30:       jmp     read_number
31: stop_reading:
32:       push    ECX            ; place number count on stack
33:       call    variable_sum   ; returns sum in EAX
34:       ; clear parameter space on the stack
35:       inc     ECX            ; increment ECX to include count
36:       add     ECX,ECX        ; ECX = ECX * 4 (space in bytes)
37:       add     ECX,ECX
38:       add     ESP,ECX        ; update ESP to clear parameter
39:                              ; space on the stack
40:       PutStr  sum_msg        ; display the sum
41:       PutLInt EAX
42:       nwln
43: done:
44:       .EXIT
45: 
46: ;-----------------------------------------------------------
47: ;This procedure receives variable number of integers via the
48: ; stack. The last parameter pushed on the stack should be
49: ; the number of integers to be added. Sum is returned in EAX.
50: ;-----------------------------------------------------------
51: variable_sum:
52:       enter   0,0
53:       push    EBX            ; save EBX and ECX
54:       push    ECX
55: 
56:       mov     ECX,[EBP+8]    ; ECX = # of integers to be added
57:       mov     EBX,EBP
58:       add     EBX,12         ; EBX = pointer to first number
59:       sub     EAX,EAX        ; sum = 0
60: add_loop:
61:       add     EAX,[SS:EBX]   ; sum = sum + next number
62:       add     EBX,4          ; EBX points to the next integer
63:       loop    add_loop       ; repeat count in ECX
64: 
65:       pop     ECX            ; restore registers
66:       pop     EBX
67:       leave
68:       ret                    ; parameter space cleared by main


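  Kernel virtual memory (code, data, heap, stack)
  ------------------------------------------------  0xC0000000
  User stack (top at ESP; grows downward)

  Shared libraries                                  0x40000000

  Run time heap (grows upward)
  Read/write segment (.data, .bss)   \  loaded from
  Read-only segment (.text)          /  executable file
  ------------------------------------------------  0x08048000
                                                    0
Figure 12.6 Memory layout of a Linux process.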

A Few Notes

1. If you are running this program on a Linux system, you don't need the segment override
prefix. The reason is that Linux and UNIX systems do not use the physical segmentation
provided by the IA-32 architecture. Instead, these systems treat the memory as a single
physical segment, which is partitioned into various logical segments. Because DS and SS
both cover this same flat address space, an offset in EBX refers to the same location whether
it is interpreted relative to DS or to SS. Figure 12.6 shows the memory layout for Linux. The
bottom two segments are used for the code and data. For example, the code segment (.text) is
placed in the bottom segment, which is a read-only segment. The next segment stores the data
part (.data and .bss). The stack segment is placed below the kernel space.
2. In this example, we deliberately used EBX to illustrate the use of segment override
prefixes. We could have used EBP itself to access the parameters; because memory operands
that use EBP as the base register default to the SS segment, no override prefix would be
needed. For example, the code

      add     EBP,12
      sub     EAX,EAX
add_loop:
      add     EAX,[EBP]
      add     EBP,4
      loop    add_loop

can replace the code at lines 58 to 63. A disadvantage of this modified code is that, since
we have modified EBP, we can no longer access, for example, the parameter count value
in the stack. In addition, because leave copies EBP into ESP, EBP must be restored to its
original value before leave executes. With that precaution, this method works for this
example. A better way is to use an index register to represent the offset relative to EBP.
We defer this discussion to the next chapter, which discusses the addressing modes.
3. Another interesting feature is that the parameter space on the stack is cleared by main.
Since we pass a variable number of parameters, we cannot use ret to clear the parameter
space: the operand of ret is a constant fixed at assembly time. This is done in main by lines
35 to 38. ECX is first incremented to include the count parameter (line 35). The byte count
of the parameter space is computed on lines 36 and 37; these two additions effectively
multiply ECX by four. This value is added to the ESP register to clear the parameter space
(line 38).
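In C, the same idea is expressed with a variadic function. The following minimal sketch uses the standard <stdarg.h> macros; the name variable_sum is reused only for illustration. Here the count is the first named argument. Under the 32-bit x86 Linux calling convention (the i386 System V ABI), arguments are placed on the stack from right to left, so the count ends up nearest the return address, typically at EBP+8 in a function with a standard EBP frame, and, as in note 3, the caller removes the arguments after the call.

```c
#include <stdarg.h>

/* Sums the count int arguments that follow count itself.
 * On 32-bit x86, va_arg typically reads successive stack slots above
 * the named argument, much as EBX walks upward from EBP+12 in
 * variable_sum. */
int variable_sum(int count, ...)
{
    va_list ap;
    int sum = 0;

    va_start(ap, count);              /* start just after the named argument */
    for (int k = 0; k < count; k++)
        sum += va_arg(ap, int);       /* fetch the next int argument */
    va_end(ap);
    return sum;
}
```

For example, variable_sum(3, 5, -3, 10) adds the three integers that follow the count.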

Summary
We started this chapter with a discussion of local variables. Such variables are dynamic because
they come into existence when the procedure is invoked and disappear when the procedure
terminates. As with parameter passing, local variables of a procedure can be stored either in
registers or on the stack. Due to the limited number of registers available, only a few local variables
can be mapped to registers. The stack avoids this limitation, but stack access is slower than register
access. Furthermore, we cannot use the registers for local variable storage in nonleaf and recursive
procedures.
Real application programs are unlikely to be short enough to keep in a single file. It is advantageous to break large source programs into more manageable chunks. Then we can keep each
chunk in a separate file (i.e., modules). We have discussed how such multimodule programs are
written and assembled into a single executable file.
We have also discussed how a variable number of arguments can be passed to procedures in
the assembly language. When the stack is used for parameter passing, passing a variable number
of arguments is straightforward. We have demonstrated this by means of an example.

13
Addressing Modes
In assembly language, specification of data required by instructions can be done in a variety of
ways. In Chapter 9 we discussed four different addressing modes: register, immediate, direct, and
indirect. The last two addressing modes specify operands in memory. However, such memory
operands can be specified by several other addressing modes. Here we give a detailed description
of these memory addressing modes.
Arrays are important for organizing a collection of related data. Although one-dimensional
arrays are straightforward to implement, multidimensional arrays are more involved. This chapter
discusses these issues in detail. Several examples are given to illustrate the use of the addressing
modes in processing one- and two-dimensional arrays.

Introduction
Addressing mode refers to how we specify the location of an operand that is required by an instruction. An operand can be at any of the following locations: in a register, in the instruction itself, in
the memory, or at an I/O port. Chapter 20 discusses how operands located at an I/O port can be
specified. Here we concentrate on how we can specify operands located in the first three locations.
The three addressing modes are:
• Register Addressing Mode: In this addressing mode, as discussed in Chapter 9, processor
registers provide the input operands and results are stored back in registers. Since the IA-32
architecture uses a two-address format, one operand specification acts as both source and
destination. This addressing mode is the best way of specifying operands, as the delay in
accessing the operands is minimal.
• Immediate Addressing Mode: This addressing mode can be used to specify at most one
source operand. The operand value is encoded as part of the instruction. Thus, the operand
is available as soon as the instruction is read.
• Memory Addressing Modes: When an operand is in memory, a variety of addressing modes
is provided to specify it. Recall that we have to specify the logical address in order to
specify the location of a memory operand. The logical address consists of two components:
segment base and offset. Note that the offset is also referred to as the effective address.
Memory addressing modes differ in how they specify the effective address.

Memory addressing modes for 16-bit addresses:
  Based                                [BX + disp], [BP + disp]
  Indexed                              [SI + disp], [DI + disp]
  Based-Indexed with no displacement   [BX + SI], [BP + SI], [BX + DI], [BP + DI]
  Based-Indexed with displacement      [BX + SI + disp], [BX + DI + disp],
                                       [BP + SI + disp], [BP + DI + disp]
Figure 13.1 Memory addressing modes for 16-bit addresses.

Addressing modes: Register, Immediate, Memory
Memory addressing modes for 32-bit addresses:
  Based                                [Base + disp]
  Indexed                              [(Index * scale) + disp]
  Based-Indexed with no scale factor   [Base + Index + disp]
  Based-Indexed with scale factor      [Base + (Index * scale) + disp]
Figure 13.2 Addressing modes of the Pentium for 32-bit addresses.

We have already discussed the direct and register indirect addressing modes in Chapter 9. The direct addressing mode gives the effective address directly in the instruction. In the register indirect addressing mode, the effective address is in one of the general-purpose registers. This chapter discusses
the remaining memory addressing modes.

Chapter 13 • Addressing Modes

Memory Addressing Modes
The primary motivation for providing different addressing modes is to efficiently support high-level
language constructs and data structures. The actual memory addressing modes available
depend on the address size used (16 bits or 32 bits). The memory addressing modes available
for 16-bit addresses are the same as those supported by the 8086. Figure 13.1 shows the default
memory addressing modes available for 16-bit addresses. A more flexible set of addressing modes
is supported for 32-bit addresses. These addressing modes are shown in Figure 13.2 and are
summarized below:

  Segment  +  Base  +  (Index * Scale)  +  displacement

  CS          EAX      EAX       1          No displacement
  SS          EBX      EBX       2          8-bit displacement
  DS          ECX      ECX       4          32-bit displacement
  ES          EDX      EDX       8
  FS          ESI      ESI
  GS          EDI      EDI
              EBP      EBP
              ESP

Table 13.1 Differences between 16-bit and 32-bit addressing
                   16-bit addressing    32-bit addressing
  Base register    BX, BP               EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
  Index register   SI, DI               EAX, EBX, ECX, EDX, ESI, EDI, EBP
  Scale factor     None                 1, 2, 4, 8
  Displacement     0, 8, 16 bits        0, 8, 32 bits
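The 32-bit effective-address computation summarized above can be written out as a short C sketch. The function name effective_address is hypothetical. The sketch assumes 32-bit wraparound arithmetic, as the processor performs, and leaves out the segment base, which is zero in the flat model that Linux uses (as noted for Program 12.8, Linux treats memory as a single segment).

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes base + (index * scale) + displacement modulo 2^32.
 * Returns false if scale is not one of the encodable values 1, 2, 4, 8.
 * For [Base + disp] pass index = 0; for [(Index * scale) + disp]
 * pass base = 0. */
bool effective_address(uint32_t base, uint32_t index, uint32_t scale,
                       int32_t disp, uint32_t *ea)
{
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8)
        return false;
    /* Unsigned arithmetic wraps modulo 2^32; converting the signed
     * displacement to uint32_t makes a negative value subtract. */
    *ea = base + index * scale + (uint32_t)disp;
    return true;
}
```

For instance, a base of 1000H, an index of 9, a scale of 4, and a displacement of 20H give 1000H + 24H + 20H = 1044H.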

The differences between 16-bit and 32-bit addressing are summarized in Table 13.1. How does
the processor know whether to use 16- or 32-bit addressing? As discussed in Chapter 4, it uses the
D bit in the CS segment descriptor to determine if the address is 16 or 32 bits long (see page 70).
It is, however, possible to override these defaults by using the size override prefixes:
      66H    Operand size override prefix
      67H    Address size override prefix

By using these prefixes, we can mix 16- and 32-bit data and addresses. Remember that our assembly language programs use 32-bit data and addresses. This, however, does not restrict us from
using 16-bit data and addresses. For example, when we write
      mov     EAX,123

the assembler generates the following machine language code:

      B8 0000007B

However, when we use a 16-bit operand as in

      mov     AX,123

the following code is generated by the assembler:

      66 | B8 007B

The assembler automatically inserts the operand size override prefix (66H). Similarly, we can use
16-bit addresses. For instance, consider the following example:

      mov     EAX,[BX]

The assembler automatically inserts the address size override prefix (67H) as shown below:

      67 | 8B 07

It is also possible to mix both override prefixes as demonstrated by the following example. The
assembly language statement

      mov     AX,[BX]

causes the assembler to insert both operand and address size override prefixes:

      66 | 67 | 8B 07
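The first two encodings above can be reproduced with a small C sketch for this one instruction form. The sketch assumes a 32-bit code segment (D bit set) and handles only mov to EAX or AX with an immediate operand (opcode B8); the function name encode_mov_acc_imm is hypothetical. It also makes explicit a detail that the listing notation hides: the immediate is stored in little-endian order, so B8 0000007B occupies memory as the bytes B8 7B 00 00 00.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes "mov EAX,imm32" (operand_bits == 32) or "mov AX,imm16"
 * (operand_bits == 16) for a 32-bit code segment. out must have room
 * for 5 bytes. Returns the number of bytes written, or 0 if
 * operand_bits is neither 16 nor 32. */
size_t encode_mov_acc_imm(uint32_t imm, int operand_bits, uint8_t *out)
{
    size_t n = 0;
    int imm_bytes;

    if (operand_bits == 32) {
        imm_bytes = 4;
    } else if (operand_bits == 16) {
        out[n++] = 0x66;              /* operand size override prefix */
        imm_bytes = 2;
    } else {
        return 0;
    }
    out[n++] = 0xB8;                  /* mov eAX,imm (register code 0) */
    for (int k = 0; k < imm_bytes; k++)
        out[n++] = (uint8_t)(imm >> (8 * k));   /* least significant byte first */
    return n;
}
```

With 16-bit operands the prefix makes the same opcode take a 2-byte immediate, which is why the 16-bit form is one byte shorter overall.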
Based Addressing

In the based addressing mode, one of the registers acts as the base register in computing the
effective address of an operand. The effective address is computed by adding the contents of the
specified base register with a signed displacement value given as part of the instruction. For 16-bit
addresses, the signed displacement is either an 8- or a 16-bit number. For 32-bit addresses, it is
either an 8- or a 32-bit number.
Based addressing provides a convenient way to access individual elements of a structure. Typically, a base register can be set up to point to the base of the structure and the displacement can
be used to access an element within the structure. For example, consider the following record of a
course schedule:
  Field                      Type               Size
  Course number              Integer             2 bytes
  Course title               Character string   38 bytes
  Term offered               Single character    1 byte
  Room number                Character string    5 bytes
  Enrollment limit           Integer             2 bytes
  Number registered          Integer             2 bytes
  Total storage per record                      50 bytes

In this example, suppose we want to find the number of available spaces in a particular course.
We can let the EBX register point to the base address of the corresponding course record and use
displacement to read the number of students registered and the enrollment limit for the course to
compute the desired answer. This is illustrated in Figure 13.3.
This addressing mode is also useful in accessing arrays whose element size is not 2, 4, or 8
bytes. In this case, the displacement can be set equal to the offset to the beginning of the array,
and the base register holds the offset of a specific element relative to the beginning of the array.
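The base-plus-displacement access to the course records can be sketched in C by treating the records as raw bytes. The offsets below are derived from the field sizes in the table above; the function available_spaces, the helper read_i16, and the *_OFF constants are hypothetical names, and the sketch assumes the 2-byte integers are stored in the machine's native byte order.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets (displacements) of the fields in one 50-byte record,
 * derived from the field sizes in the course record table. */
enum {
    COURSE_NO_OFF  = 0,    /* 2 bytes                    */
    TITLE_OFF      = 2,    /* 38 bytes                   */
    TERM_OFF       = 40,   /* 1 byte                     */
    ROOM_OFF       = 41,   /* 5 bytes                    */
    LIMIT_OFF      = 46,   /* 2 bytes: enrollment limit  */
    REGISTERED_OFF = 48,   /* 2 bytes: number registered */
    RECORD_SIZE    = 50
};

static int16_t read_i16(const uint8_t *p)
{
    int16_t v;
    memcpy(&v, p, sizeof v);           /* safe even if p is unaligned */
    return v;
}

/* base plays the role of EBX (start of the chosen record); the field
 * offsets play the role of the displacement in [EBX+disp]. */
int available_spaces(const uint8_t *base)
{
    return read_i16(base + LIMIT_OFF) - read_i16(base + REGISTERED_OFF);
}
```

Selecting the second record only changes the base (record start + 50); the displacements stay the same, which is exactly what makes based addressing convenient here.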

Figure 13.3 Course record layout in memory. (Two consecutive 50-byte course records: the first begins at SSA, the structure starting address, and the second at SSA + 50, with the next record at SSA + 100. Each record holds the Course #, Title, Term, Room #, Enrollment, and # registered fields; the figure marks a displacement of 46 bytes from the start of the record.)

Indexed Addressing

In this addressing mode, the effective address is computed as
(Index * scale factor) + signed displacement.
For 16-bit addresses, no scale factor is allowed (see Table 13.1 on page 275). For 32-bit addresses, a scale factor of 2, 4, or 8 can be specified. Of course, we can still use a scale factor in
16-bit code by using the address size override prefix to switch to 32-bit addressing.
The indexed addressing mode is often used to access elements of an array. The beginning of
the array is given by the displacement, and the value of the index register selects an element within
the array. The scale factor is particularly useful to access arrays whose element size is 2, 4, or 8
bytes.
The following are valid instructions using the indexed addressing mode to specify one of the
operands.
      add     EAX,[EDI+20]
      mov     EAX,[marks_table+ESI*4]
      add     EAX,[table1+ESI]

In the second instruction, the assembler would supply a constant displacement that represents the
offset of marks_table in the data segment. Assume that each element of marks_table takes
four bytes. Since we are using a scale factor of four, ESI should have the index value. For example,
if we want to access the tenth element, ESI should have nine, because the index value starts at zero.
If no scale factor is used as in the last instruction, ESI should hold the offset of the element
in bytes relative to the beginning of the array. For example, if table1 is an array of four-byte
elements, the ESI register should have 36 to refer to the tenth element. By using the scale factor, we
avoid such byte counting.
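The difference between the two forms can be sketched in C. In this minimal illustration (load_scaled and load_unscaled are hypothetical names), indexing a pointer to 4-byte integers corresponds to the scaled form [marks_table+ESI*4], because C multiplies the index by the element size, while adding a raw byte offset to a byte pointer corresponds to the unscaled form [table1+ESI].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scaled form: index is an element number; C scales it by 4. */
int32_t load_scaled(const int32_t *table, size_t index)
{
    return table[index];                  /* address = table + index*4 */
}

/* Unscaled form: byte_offset must already equal index * 4. */
int32_t load_unscaled(const int32_t *table, size_t byte_offset)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)table + byte_offset, sizeof value);
    return value;
}
```

Both calls reach the tenth element when given 9 and 36, respectively.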
Based-Indexed Addressing

Based-Indexed with No Scale Factor In this addressing mode, the effective address is computed
as
Base + Index + signed displacement.
The displacement can be a signed 8- or 16-bit number for 16-bit addresses; it can be a signed 8- or
32-bit number for 32-bit addresses.
This addressing mode is useful in accessing two-dimensional arrays with the displacement
representing the offset to the beginning of the array. This mode can also be used to access arrays
of records where the displacement represents the offset to a field in a record. In addition, this
addressing mode is used to access arrays passed to a procedure. In this case, the base register
could point to the beginning of the array, and an index register can hold the offset to a specific
element.
Assuming that EBX points to table1, which consists of four-byte elements, we can use the
code

      mov     EAX,[EBX+ESI]
      cmp     EAX,[EBX+ESI+4]

to compare two successive elements of table1. This type of code is particularly useful if the
table1 pointer is passed as a parameter.
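When the array pointer arrives as a parameter, the base register holds that pointer and the index register holds a byte offset. The following minimal C sketch mirrors the comparison above; count_descents is a hypothetical function that counts the adjacent pairs found out of ascending order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts adjacent pairs that are out of ascending order.
 * ebx plays the role of EBX (the array pointer passed in) and esi
 * the role of ESI (a byte offset, not an element number). */
size_t count_descents(const int32_t *base, size_t n)
{
    const unsigned char *ebx = (const unsigned char *)base;
    size_t count = 0;

    for (size_t esi = 0; esi + 4 < n * sizeof(int32_t); esi += 4) {
        int32_t eax, next;
        memcpy(&eax, ebx + esi, sizeof eax);         /* mov EAX,[EBX+ESI]   */
        memcpy(&next, ebx + esi + 4, sizeof next);   /* cmp EAX,[EBX+ESI+4] */
        if (eax > next)
            count++;
    }
    return count;
}
```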
Based-Indexed with Scale Factor In this addressing mode, the effective address is computed as
Base + (Index * scale factor) + signed displacement.
This addressing mode provides an efficient indexing mechanism into a two-dimensional array
when the element size is 2, 4, or 8 bytes.

Arrays
Arrays are useful in organizing a collection of related data items, such as test marks of a class,
salaries of employees, and so on. We have used arrays of characters to represent strings. Such
arrays are one-dimensional: only a single subscript is necessary to access a character in the array. High-level languages support multidimensional arrays. In this section, we discuss both one-dimensional and multidimensional arrays.
One-Dimensional Arrays

A one-dimensional array of test marks can be declared in C as
      int test_marks[10];

In C, the subscript always starts at zero. Thus, test_marks[0] gives the first student's mark
and test_marks[9] gives the last student's mark.
Array declaration in high-level languages specifies the following five attributes:
• Name of the array (test_marks),
• Number of elements (10),
• Element size (4 bytes),
• Type of element (integer), and
• Index range (0 to 9).

From this information, the amount of storage space required for the array can be easily calculated.
Storage space in bytes is given by
Storage space = number of elements * element size in bytes.
In our example, it is equal to 10 * 4 = 40 bytes. In the assembly language, arrays are implemented
by allocating the required amount of storage space. For example, the test_marks array can be
declared as

test_marks    resd    10

An array name can be assigned to this storage space. But that is all the support you get in assembly
language! It is up to you as a programmer to "properly" access the array taking the element size
and the range of subscripts into account.
You need to know how the array is stored in memory in order to access elements of the array.
For one-dimensional arrays, representation of the array in memory is rather direct: array elements
are stored linearly in the same order as shown in Figure 13.4. In the remainder of this section, we
use the convention used for arrays in C (i.e., subscripts are assumed to begin with 0).
To access an element we need to know its displacement value in bytes relative to the beginning
of the array. Since we know the element size in bytes, it is rather straightforward to compute the
displacement from the subscript value:
displacement = subscript * element size in bytes.
For example, to access the sixth student's mark (i.e., subscript is 5), you have to use 5 * 4 = 20 as
the displacement value into the test_marks array. Later we present an example that computes
the sum of a one-dimensional integer array. If the array element size is 2, 4, or 8 bytes, we can use
the scale factor to avoid computing displacement in bytes.
Multidimensional Arrays

Programs often require arrays of more than one dimension. For example, we need a two-dimensional
array of size 50 x 3 to store test marks of a class of 50 students taking three tests during a semester.
For most programs, arrays of up to three dimensions are adequate. In this section, we discuss how
two-dimensional arrays are represented and manipulated in the assembly language. Our discussion
can be generalized to higher-dimensional arrays.
For example, a 5 x 3 array to store test marks can be declared in C as
      int class_marks[5][3];    /* 5 rows and 3 columns */

  High memory
      test_marks[9]
      test_marks[8]
      test_marks[7]
      test_marks[6]
      test_marks[5]
      test_marks[4]
      test_marks[3]
      test_marks[2]
      test_marks[1]
      test_marks[0]   <-- test_marks
  Low memory
Figure 13.4 One-dimensional array storage representation.

Storage representation of such arrays is not as direct as that for one-dimensional arrays. Since the
memory is one-dimensional (i.e., linear array of bytes), we need to transform the two-dimensional
structure to a one-dimensional structure. This transformation can be done in one of two common
ways:
• Order the array elements row-by-row, starting with the first row, or
• Order the array elements column-by-column, starting with the first column.
The first method, called the row-major ordering, is shown in Figure 13.5a. Row-major ordering is used in most high-level languages including C. The other method, called the column-major
ordering, is shown in Figure 13.5b. Column-major ordering is used in FORTRAN. In the remainder of this section, we focus on the row-major ordering scheme.
Why do we need to know the underlying storage representation? When we are using a high-level language, we really do not have to bother about the storage representation. Access to arrays
is provided by subscripts: one subscript for each dimension of the array. However, when using assembly language, we need to know the storage representation in order to access individual
elements of the array for reasons discussed next.
In the assembly language, we can allocate storage space for the class_marks array as

class_marks    resd    5*3

This statement simply allocates the 60 bytes required to store the array. Now we need a formula to
translate row and column subscripts to the corresponding displacement. In the C language, which
uses row-major ordering and subscripts start with zero, we can express displacement of an element
at row i and column j as

displacement = (i * COLUMNS + j) * ELEMENT_SIZE,

where COLUMNS is the number of columns in the array and ELEMENT_SIZE is the number
of bytes required to store an element. For example, displacement of class_marks[3,1] is
(3 * 3 + 1) * 4 = 40. Later we give an example to illustrate how two-dimensional arrays are
manipulated.

(a) Row-major order, from low to high memory, starting at class_marks:
    class_marks[0,0], class_marks[0,1], class_marks[0,2],
    class_marks[1,0], class_marks[1,1], class_marks[1,2],
    class_marks[2,0], class_marks[2,1], class_marks[2,2],
    class_marks[3,0], class_marks[3,1], class_marks[3,2],
    class_marks[4,0], class_marks[4,1], class_marks[4,2]
(b) Column-major order, from low to high memory, starting at class_marks:
    class_marks[0,0], class_marks[1,0], class_marks[2,0], class_marks[3,0], class_marks[4,0],
    class_marks[0,1], class_marks[1,1], class_marks[2,1], class_marks[3,1], class_marks[4,1],
    class_marks[0,2], class_marks[1,2], class_marks[2,2], class_marks[3,2], class_marks[4,2]
Figure 13.5 Two-dimensional array storage representation.
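The displacement formula can be checked against the layout a C compiler gives class_marks. In this sketch the function names are hypothetical, and the column-major variant is an addition not given in the text: in column-major order whole columns are stored one after another, so the number of rows takes the place of COLUMNS.

```c
#include <stddef.h>

/* Byte displacement of element [i][j] in row-major order
 * (rows stored one after another, as in C). */
size_t row_major_disp(size_t i, size_t j, size_t columns, size_t elem_size)
{
    return (i * columns + j) * elem_size;
}

/* Byte displacement of element [i][j] in column-major order
 * (columns stored one after another, as in FORTRAN). */
size_t col_major_disp(size_t i, size_t j, size_t rows, size_t elem_size)
{
    return (j * rows + i) * elem_size;
}
```

For class_marks[3,1] with 4-byte elements, the row-major displacement is 40, matching the text, while the column-major displacement would be (1 * 5 + 3) * 4 = 32.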

Our First Program
This example demonstrates how one-dimensional arrays can be manipulated. Program 13.1 finds
the sum of the test_marks array and displays the result.
Program 13.1 Computing the sum of a one-dimensional array
 1: ;-----------------------------------------------------------
 2: ;Sum of a long integer array                 ARRAY_SUM.ASM
 3: ; Objective: To find sum of all elements of an array.
 4: ;     Input: None.
 5: ;    Output: Displays the sum.
 6: ;-----------------------------------------------------------
 7: %include "io.mac"
 8: .DATA
 9: test_marks    DD   90,50,70,94,81,40,67,55,60,73
10: NO_STUDENTS   EQU  ($-test_marks)/4   ; number of students
11: sum_msg       DB   'The sum of test marks is: ',0
12: 
13: .CODE
14:       .STARTUP
15:       mov     ECX,NO_STUDENTS   ; loop iteration count
16:       sub     EAX,EAX           ; sum := 0
17:       sub     ESI,ESI           ; array index := 0
18: add_loop:
19:       mov     EBX,[test_marks+ESI*4]
20:       PutLInt EBX
21:       nwln
22:       add     EAX,[test_marks+ESI*4]
23:       inc     ESI
24:       loop    add_loop
25: 
26:       PutStr  sum_msg
27:       PutLInt EAX
28:       nwln
29:       .EXIT

Each element of the test_marks array, declared on line 9, requires four bytes. The array
size NO_STUDENTS is computed on line 10 using the predefined location counter symbol $. The
predefined symbol $ is always set to the current offset in the segment. Thus, on line 10, $ points to
the byte after the array storage space. Therefore, ($ - test_marks) gives the storage space in
bytes, and dividing this by four gives the number of elements in the array. We are using the indexed
addressing mode with a scale factor of four on lines 19 and 22. Remember that the scale factor is
only available with 32-bit addresses.
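For comparison, here is a C sketch of Program 13.1 (the function name array_sum is hypothetical). The sizeof expression plays the role of ($ - test_marks)/4, and indexing a pointer to 4-byte integers performs the same scaling as [test_marks+ESI*4].

```c
#include <stddef.h>
#include <stdint.h>

/* Same data as line 9 of Program 13.1. */
static const int32_t test_marks[] = {90, 50, 70, 94, 81, 40, 67, 55, 60, 73};

/* Element count: total bytes divided by bytes per element. */
#define NO_STUDENTS (sizeof test_marks / sizeof test_marks[0])

int32_t array_sum(const int32_t *marks, size_t n)
{
    int32_t sum = 0;                          /* sub EAX,EAX: sum := 0       */
    for (size_t esi = 0; esi < n; esi++)      /* ESI = array index           */
        sum += marks[esi];                    /* add EAX,[test_marks+ESI*4]  */
    return sum;
}
```

Calling array_sum(test_marks, NO_STUDENTS) adds the ten marks, giving 680.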

Illustrative Examples
We now present several examples to illustrate the usefulness of the various addressing modes. The
first example sorts an array of integers using the insertion sort algorithm, and the second example
implements a binary search to locate a value in a sorted array. Our last example demonstrates how
two-dimensional arrays are manipulated in the assembly language.
Example 13.1 Sorting an integer array using the insertion sort.
This example requests a set of integers from the user and displays these numbers in sorted order.
The main procedure reads a maximum of MAX_SIZE integers (lines 20 to 28). It accepts only
nonnegative numbers. Entering a negative number terminates the input (lines 24 and 25).


The main procedure passes the array pointer and its size (lines 30 to 34) to the insertion sort
procedure. The remainder of the main procedure displays the sorted array returned by the sort
procedure. Note that the main procedure uses the indirect addressing mode on lines 26 and 41.
The basic principle behind the insertion sort is simple: insert a new number into the sorted
array in its proper place. To apply this algorithm, we start with an empty array. Then insert
the first number. Now the array is in sorted order with just one element. Next insert the second
number in its proper place. This results in a sorted array of size two. Repeat this process until all
the numbers are inserted. The pseudocode for this algorithm, shown below, assumes that the array
index starts with 0:
insertion_sort (array, size)
    for (i = 1 to size-1)
        temp := array[i]
        j := i - 1
        while ((temp < array[j]) AND (j >= 0))
            array[j+1] := array[j]
            j := j - 1
        end while
        array[j+1] := temp
    end for
end insertion_sort
Here, index i points to the number to be inserted. The array to the left of i is in sorted order.
The numbers to be inserted are the ones located at or to the right of index i. The next number to
be inserted is at i. The implementation of the insertion sort procedure, shown in Program 13.2,
follows the pseudocode.
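Before turning to the assembly version, here is the pseudocode transcribed into a minimal C sketch (not the book's code). One detail matters in C: the while condition must test j >= 0 before reading array[j], because C evaluates && from left to right and array[-1] does not exist.

```c
#include <stddef.h>

/* Insertion sort following the pseudocode (indices start at 0).
 * i, temp, and j correspond to ESI, EDX, and EDI in Program 13.2,
 * except that the assembly version keeps byte offsets (multiples of 4). */
void insertion_sort(int *array, size_t size)
{
    for (size_t i = 1; i < size; i++) {
        int temp = array[i];                  /* number to insert        */
        ptrdiff_t j = (ptrdiff_t)i - 1;
        while (j >= 0 && temp < array[j]) {   /* test j first            */
            array[j + 1] = array[j];          /* shift larger value right */
            j--;
        }
        array[j + 1] = temp;                  /* drop temp into its slot */
    }
}
```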
Program 13.2 Insertion sort
 1: ;-----------------------------------------------------------
 2: ;Sorting an array by insertion sort          INS_SORT.ASM
 3: ; Objective: To sort an integer array using insertion sort.
 4: ;     Input: Requests numbers to fill array.
 5: ;    Output: Displays sorted array.
 6: ;-----------------------------------------------------------
 7: %include "io.mac"
 8: .DATA
 9: MAX_SIZE      EQU  100
10: input_prompt  db   "Please enter input array: "
11:               db   "(negative number terminates input)",0
12: out_msg       db   "The sorted array is:",0
13: 
14: .UDATA
15: array         resd MAX_SIZE
16: 
17: .CODE
18:       .STARTUP
19:       PutStr  input_prompt   ; request input array
20:       mov     EBX,array
21:       mov     ECX,MAX_SIZE
22: array_loop:
23:       GetLInt EAX            ; read an array number
24:       cmp     EAX,0          ; negative number?
25:       jl      exit_loop      ; if so, stop reading numbers
26:       mov     [EBX],EAX      ; otherwise, copy into array
27:       add     EBX,4          ; increment array address
28:       loop    array_loop     ; iterates a maximum of MAX_SIZE
29: exit_loop:
30:       mov     EDX,EBX        ; EDX keeps the actual array size
31:       sub     EDX,array      ; EDX = array size in bytes
32:       shr     EDX,2          ; divide by 4 to get array size
33:       push    EDX            ; push array size & array pointer
34:       push    array
35:       call    insertion_sort
36:       PutStr  out_msg        ; display sorted array
37:       nwln
38:       mov     ECX,EDX
39:       mov     EBX,array
40: display_loop:
41:       PutLInt [EBX]
42:       nwln
43:       add     EBX,4
44:       loop    display_loop
45: done:
46:       .EXIT
47: 
48: ;-----------------------------------------------------------
49: ; This procedure receives a pointer to an array of integers
50: ; and the array size via the stack. The array is sorted by
51: ; using insertion sort. All registers are preserved.
52: ;-----------------------------------------------------------
53: %define SORT_ARRAY EBX
54: insertion_sort:
55:       pushad                       ; save registers
56:       mov     EBP,ESP
57:       mov     EBX,[EBP+36]         ; copy array pointer
58:       mov     ECX,[EBP+40]         ; copy array size
59:       mov     ESI,4                ; array left of ESI is sorted
60: for_loop:
61:       ; variables of the algorithm are mapped as follows.
62:       ; EDX = temp, ESI = i, and EDI = j
63:       mov     EDX,[SORT_ARRAY+ESI] ; temp = array[i]
64:       mov     EDI,ESI              ; j = i-1
65:       sub     EDI,4
66: while_loop:
67:       cmp     EDX,[SORT_ARRAY+EDI] ; temp < array[j]
68:       jge     exit_while_loop
69:       ; array[j+1] = array[j]
70:       mov     EAX,[SORT_ARRAY+EDI]
71:       mov     [SORT_ARRAY+EDI+4],EAX
72:       sub     EDI,4                ; j = j-1
73:       cmp     EDI,0                ; j >= 0
74:       jge     while_loop
75: exit_while_loop:
76:       ; array[j+1] = temp
77:       mov     [SORT_ARRAY+EDI+4],EDX
78:       add     ESI,4                ; i = i+1
79:       dec     ECX
80:       cmp     ECX,1                ; if ECX = 1, we are done
81:       jne     for_loop
82: sort_done:
83:       popad                        ; restore registers
84:       ret     8

Since the sort procedure does not return any value to the main program in registers, we can use
pushad (line 55) and popad (line 83) to save and restore registers. Because pushad saves all eight
32-bit registers (32 bytes) on top of the 4-byte return address, the array pointer is found at EBP+36
and the array size at EBP+40 (lines 57 and 58).
The while loop is implemented by lines 66 to 74, and the for loop is implemented by lines
60 to 81. Note that the array pointer is copied to EBX (line 57), and line 53 assigns a convenient
label to this register. We have used the based-indexed addressing mode on lines 63, 67, and 70 without
any displacement and on lines 71 and 77 with displacement. Based addressing is used on lines 57
and 58 to access parameters from the stack.
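For comparison, here is a C sketch of the same algorithm. It is an illustration, not part of the book's code: it assumes plain `int` elements and uses element indices where the assembly version keeps byte offsets in ESI and EDI. Unlike the listing, the sketch returns early for arrays with fewer than two elements, a case the assembly procedure does not check.

```c
#include <stddef.h>

/* Insertion sort mirroring insertion_sort: temp, i and j play the
   roles of EDX, ESI and EDI (indices here, byte offsets there). */
void insertion_sort(int array[], size_t size)
{
    if (size < 2)                          /* nothing to sort */
        return;
    for (size_t i = 1; i < size; i++) {    /* array left of i is sorted */
        int temp = array[i];               /* temp = array[i] */
        ptrdiff_t j = (ptrdiff_t)i - 1;    /* j = i-1 */
        while (j >= 0 && temp < array[j]) {
            array[j + 1] = array[j];       /* array[j+1] = array[j] */
            j--;                           /* j = j-1 */
        }
        array[j + 1] = temp;               /* array[j+1] = temp */
    }
}
```

Calling `insertion_sort(a, 5)` on `{5, 2, 9, 1, 7}` leaves `{1, 2, 5, 7, 9}` in `a`.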
Example 13.2 Binary search procedure.
Binary search is an efficient algorithm to locate a value in a sorted array. The search process starts
with the whole array. The value at the middle of the array is compared with the number we are
looking for: if there is a match, its index is returned. Otherwise, the search process is repeated
either on the lower half (if the number is less than the value at the middle), or on the upper half
(if the number is greater than the value at the middle). The pseudocode of the algorithm is given
below:
binary_search (array, size, number)
    lower := 0
    upper := size - 1
    while (lower <= upper)
        middle := (lower + upper)/2
        if (number = array[middle])
        then
            return (middle)
        else
            if (number < array[middle])
            then
                upper := middle - 1
            else
                lower := middle + 1
            end if
        end if
    end while
    return (0)    {number not found}
end binary_search
The listing of the binary search program is given in Program 13.3. The main procedure is similar
to that in the last example. In the binary search procedure, the lower and upper index variables are
mapped to the AX and CX registers, respectively. The number to be searched is stored in DX
and the array pointer is in EBX. Register SI keeps the middle index value.
Program 13.3 Binary search
  1: ;Binary search of a sorted integer array          BIN_SRCH.ASM
  2: ;
  3: ;        Objective: To implement binary search of a sorted
  4: ;                   integer array.
  5: ;            Input: Requests numbers to fill array and a
  6: ;                   number to be searched for from user.
  7: ;           Output: Displays the position of the number in
  8: ;                   the array if found; otherwise, not found
  9: ;                   message.
 10: ;
 11: %include "io.mac"
 12:
 13: .DATA
 14: MAX_SIZE       EQU  100
 15: input_prompt   db   "Please enter input array (in sorted order): "
 16:                db   "(negative number terminates input)",0
 17: query_number   db   "Enter the number to be searched: ",0
 18: out_msg        db   "The number is at position ",0
 19: not_found_msg  db   "Number not in the array!",0
 20: query_msg      db   "Do you want to quit (Y/N): ",0
 21:
 22: .UDATA
 23: array          resw MAX_SIZE
 24:
 25: .CODE
 26:       .STARTUP
 27:       PutStr  input_prompt  ; request input array
 28:       nwln
 29:       sub     ESI,ESI       ; set index to zero
 30:       mov     ECX,MAX_SIZE
 31: array_loop:
 32:       GetInt  AX            ; read an array number
 33:       cmp     AX,0          ; negative number?
 34:       jl      exit_loop     ; if so, stop reading numbers
 35:       mov     [array+ESI*2],AX  ; otherwise, copy into array
 36:       inc     ESI           ; increment array index
 37:       loop    array_loop    ; iterates a maximum of MAX_SIZE
 38: exit_loop:
 39: read_input:
 40:       PutStr  query_number  ; request number to be searched for
 41:       GetInt  AX            ; read the number
 42:       push    AX            ; push number, size & array pointer
 43:       push    SI
 44:       push    array
 45:       call    binary_search
 46:       ; binary_search returns in AX the position of the number
 47:       ; in the array; if not found, it returns 0.
 48:       cmp     AX,0          ; number found?
 49:       je      not_found     ; if not, display number not found
 50:       PutStr  out_msg       ; else, display number position
 51:       PutInt  AX
 52:       jmp     user_query
 53: not_found:
 54:       PutStr  not_found_msg
 55: user_query:
 56:       nwln
 57:       PutStr  query_msg     ; query user whether to terminate
 58:       GetCh   AL            ; read response
 59:       cmp     AL,'Y'        ; if response is not 'Y'
 60:       jne     read_input    ; repeat the loop
 61: done:                       ; otherwise, terminate program
 62:       .EXIT
 63:
 64: ;-----------------------------------------------------------
 65: ; This procedure receives a pointer to an array of integers,
 66: ; the array size, and a number to be searched via the stack.
 67: ; It returns in AX the position of the number in the array
 68: ; if found; otherwise, returns 0.
 69: ; All registers, except AX, are preserved.
 70: ;-----------------------------------------------------------
 71: binary_search:
 72:       enter   0,0
 73:       push    EBX
 74:       push    ESI
 75:       push    CX
 76:       push    DX
 77:       mov     EBX,[EBP+8]     ; copy array pointer
 78:       mov     CX,[EBP+12]     ; copy array size
 79:       mov     DX,[EBP+14]     ; copy number to be searched
 80:       xor     AX,AX           ; lower = 0
 81:       dec     CX              ; upper = size-1
 82: while_loop:
 83:       cmp     AX,CX           ; lower > upper?
 84:       jg      end_while       ; (signed test: upper may reach -1)
 85:       sub     ESI,ESI
 86:       mov     SI,AX           ; middle = (lower + upper)/2
 87:       add     SI,CX
 88:       shr     SI,1
 89:       cmp     DX,[EBX+ESI*2]  ; number = array[middle]?
 90:       je      search_done
 91:       jg      upper_half
 92: lower_half:
 93:       dec     SI              ; middle = middle-1
 94:       mov     CX,SI           ; upper = middle-1
 95:       jmp     while_loop
 96: upper_half:
 97:       inc     SI              ; middle = middle+1
 98:       mov     AX,SI           ; lower = middle+1
 99:       jmp     while_loop
100: end_while:
101:       sub     AX,AX           ; number not found (clear AX)
102:       jmp     skip1
103: search_done:
104:       inc     SI              ; position = index+1
105:       mov     AX,SI           ; return position
106: skip1:
107:       pop     DX              ; restore registers
108:       pop     CX
109:       pop     ESI
110:       pop     EBX
111:       leave
112:       ret     8
Since the binary search procedure returns a value in the AX register, we cannot use the pushad and
popad instructions as in the last example: popad would overwrite the return value in AX. On line 89,
we use a scale factor of two to convert the index value in SI to a byte count. Also, a single comparison
(line 89) is sufficient to test multiple conditions (i.e., equal to, greater than, or less than). If the
number is found in the array, its position (the index value in SI plus one) is returned via AX (line 105).
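The logic of binary_search can be sketched in C as follows. This is an illustration, not the book's code; it assumes 16-bit `short` elements (matching the `resw` array) and keeps the same return convention: position = index + 1, or 0 when the number is absent. The bounds are signed so that `upper` can drop to -1 when the number is smaller than every element.

```c
#include <stddef.h>

/* Returns the 1-based position of number in the sorted array,
   or 0 if it is not present (same convention as binary_search). */
size_t binary_search(const short array[], size_t size, short number)
{
    long lower = 0;                      /* lower = 0 */
    long upper = (long)size - 1;         /* upper = size-1, may be -1 */
    while (lower <= upper) {
        long middle = (lower + upper) / 2;
        if (number == array[middle])
            return (size_t)middle + 1;   /* position = index+1 */
        if (number < array[middle])
            upper = middle - 1;          /* search the lower half */
        else
            lower = middle + 1;          /* search the upper half */
    }
    return 0;                            /* number not found */
}
```

With `{3, 8, 15, 21, 40}`, searching for 21 returns 4 and searching for 1 returns 0.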
Example 13.3 Finding the sum of a column in a two-dimensional array.
This example illustrates how two-dimensional arrays are manipulated in the assembly language.
It also demonstrates the use of advanced addressing modes in accessing multidimensional arrays.
Consider the class_marks array representing the test scores of a class. For simplicity,
assume that there are only five students in the class. Also, assume that the class is given three
tests. As we have discussed before, we can use a 5 x 3 array to store the marks. Each row
represents the three test marks of a student in the class. The first column represents the marks of
the first test; the second column represents the marks of the second test, and so on. The objective
of this example is to find the sum of the last test marks for the class. The program listing is given
below.
Program 13.4 Finding the sum of a column in a two-dimensional array
 1: ;Sum of a column in a 2-dimensional array       TEST_SUM.ASM
 2: ;
 3: ;        Objective: To demonstrate array index manipulation
 4: ;                   in a two-dimensional array of integers.
 5: ;            Input: None.
 6: ;           Output: Displays the sum.
 7: %include "io.mac"
 8:
 9: .DATA
10: NO_ROWS       EQU  5
11: NO_COLUMNS    EQU  3
12: NO_ROW_BYTES  EQU  NO_COLUMNS * 2   ; number of bytes per row
13: class_marks   dw   90,89,99
14:               dw   79,66,70
15:               dw   70,60,77
16:               dw   60,55,68
17:               dw   51,59,57
18:
19: sum_msg       db   "The sum of the last test marks is: ",0
20:
21: .CODE
22:       .STARTUP
23:       mov     ECX,NO_ROWS       ; loop iteration count
24:       sub     AX,AX             ; sum = 0
25:       ; ESI = index of class_marks[0,2]
26:       sub     EBX,EBX
27:       mov     ESI,NO_COLUMNS-1
28: sum_loop:
29:       add     AX,[class_marks+EBX+ESI*2]
30:       add     EBX,NO_ROW_BYTES
31:       loop    sum_loop
32:
33:       PutStr  sum_msg
34:       PutInt  AX
35:       nwln
36: done:
37:       .EXIT

To access individual test marks, we use based-indexed addressing with a displacement on
line 29. Note that even though we have used

        [class_marks+EBX+ESI*2]

it is translated by the assembler as

        [EBX+(ESI*2)+constant]

where constant is the offset of class_marks. For this to work, EBX should store the
offset of the row in which we are interested. For this reason, after initializing EBX to zero
to point to the first row (line 26), NO_ROW_BYTES is added in the loop body (line 30). The ESI
register is used as the column index. This works for row-major ordering.
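The address arithmetic behind `[class_marks+EBX+ESI*2]` can be mimicked in C by walking a byte pointer over a row-major array, as in this sketch. The `column_sum` helper is hypothetical; it relies on the fact that C stores `class_marks[5][3]` in row-major order, so adding `NO_ROW_BYTES` to the row offset moves to the next row exactly as line 30 does.

```c
#include <stdint.h>
#include <string.h>

#define NO_ROWS      5
#define NO_COLUMNS   3
#define NO_ROW_BYTES (NO_COLUMNS * 2)    /* 2 bytes per 16-bit mark */

static const int16_t class_marks[NO_ROWS][NO_COLUMNS] = {
    {90, 89, 99}, {79, 66, 70}, {70, 60, 77}, {60, 55, 68}, {51, 59, 57}
};

/* Sums one column: address = base + row offset (EBX) + column * 2 (ESI*2). */
int column_sum(int column)
{
    const unsigned char *base = (const unsigned char *)class_marks;
    size_t row_offset = 0;                 /* plays the role of EBX */
    int sum = 0;
    for (int row = 0; row < NO_ROWS; row++) {
        int16_t mark;
        memcpy(&mark, base + row_offset + (size_t)column * 2, sizeof mark);
        sum += mark;
        row_offset += NO_ROW_BYTES;        /* move to the next row */
    }
    return sum;
}
```

For the last column, `column_sum(2)` returns 99 + 70 + 77 + 68 + 57 = 371.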

Summary
The addressing mode refers to the specification of operands required by an assembly language
instruction. We discussed several memory addressing modes supported by the IA-32 architecture.
We showed by means of examples how these addressing modes are useful in supporting features
of high-level languages.
Arrays are useful for representing a collection of related data. In high-level languages, programmers do not have to worry about the underlying storage representation used to store arrays in
the memory. However, when manipulating arrays in the assembly language, we need to know this
information. This is so because accessing individual elements of an array involves computing the
corresponding displacement value. Although there are two common ways of storing a multidimensional array—row-major or column-major order—most high-level languages, including C, use the
row-major order. We presented examples to illustrate how one- and two-dimensional arrays are
manipulated in the assembly language.

14
Arithmetic Instructions
We start this chapter with a detailed discussion of the six status flags—zero, carry, overflow, sign,
parity, and auxiliary flags. We have already used these flags in our assembly language programs.
The discussion here helps us understand how the processor executes some of the conditional jump
instructions. The next section deals with multiplication and division instructions. The IA-32
instruction set includes multiplication and division instructions for both signed and unsigned integers. We then present several examples to illustrate the use of the instructions discussed in this
chapter. The chapter concludes with a summary.

Introduction
We have discussed the flags register in Chapter 4. Six flags in this register are used to monitor
the outcome of the arithmetic, logical, and related operations. By now you are familiar with the
purpose of some of these flags. The six flags are the zero flag (ZF), carry flag (CF), overflow flag
(OF), sign flag (SF), auxiliary flag (AF), and parity flag (PF). For obvious reasons, these six flags
are called the status flags.
When an arithmetic operation is performed, some of the flags are updated (set or cleared) to
indicate certain properties of the result of that operation. For example, if the result of an arithmetic
operation is zero, the zero flag is set (i.e., ZF = 1). Once the flags are updated, we can use the conditional branch instructions to alter flow control. We have discussed several types of conditional
jump instructions, including jump on less than or equal, greater than, and so on. However, we have
not described how the jumps test for the condition. We discuss these details in this chapter.
The IA-32 instruction set provides several arithmetic instructions. We have already used some
of these instructions (e.g., add and sub). The instruction set supports the four basic operations:
addition, subtraction, multiplication, and division. The addition and subtraction operations do not
require separate instructions for signed and unsigned numbers. In fact, we do not even need the
subtract instructions, as the subtract operation can be treated as adding a negative value.
Multiplication and division operations, however, need separate instructions. In addition, the
format of these instructions is slightly different in the sense that they typically specify only a single
operand. The other operand is assumed to be in a designated register. Since we have covered the
addition and subtraction instructions in Chapter 9, we will focus on the multiplication and division
instructions in this chapter.


Status Flags
The six status flags are affected by most of the arithmetic instructions we discuss in this chapter.
You should note that once a flag is set or cleared, it remains in that state until another instruction
changes its value. Also note that not all assembly language instructions affect all the flags. Some
instructions affect all six status flags, whereas other instructions affect none of the flags. And
there are other instructions that affect only a subset of these flags. For example, the arithmetic
instructions add and sub affect all six flags, but inc and dec instructions affect all but the carry
flag. The mov, push, and pop instructions, on the other hand, do not affect any of the flags.
Here is an example illustrating how the zero flag changes with instruction execution.
                                ; initially, assume that ZF is 0
        mov     EAX,55H         ; ZF is still 0
        sub     EAX,55H         ; result is zero
                                ; Thus, ZF is set (ZF = 1)
        push    EBX             ; ZF remains 1
        mov     EBX,EAX         ; ZF remains 1
        pop     EDX             ; ZF remains 1
        mov     ECX,0           ; ZF remains 1
        inc     ECX             ; result is 1
                                ; Thus, ZF is cleared (ZF = 0)

As we show later, these flags can be tested either individually or in combination to affect the flow
control of a program.
In understanding the workings of these status flags, you should know how signed and unsigned integers are represented. At this point, it is a good idea to review the material presented in
Appendix A.
The Zero Flag

The purpose of the zero flag is to indicate whether the execution of the last instruction that affects
the zero flag has produced a zero result. If the result was zero, ZF = 1; otherwise, ZF = 0. This is
slightly confusing! You may want to take a moment to see through the confusion.
Although it is fairly intuitive to understand how the sub instruction affects the zero flag, it is
not so obvious with other instructions. The following examples show some typical cases.
The code
        mov     AL,0FH
        add     AL,0F1H

sets the zero flag (i.e., ZF = 1). This is because, after executing the add instruction, the AL would
contain zero (all eight bits zero). In a similar fashion, the code
        mov     AX,0FFFFH
        inc     AX

also sets the zero flag. The same is true for the following code:
        mov     EAX,1
        dec     EAX


Related Instructions The following two conditional jump instructions test this flag:
jz      jump if zero (jump is taken if ZF = 1)
jnz     jump if not zero (jump is taken if ZF = 0)
Usage There are two main uses for the zero flag: testing for equality, and counting to a preset
value.
Testing for Equality: The cmp instruction is often used to do this. Recall that cmp performs
subtraction. The main difference between cmp and sub is that cmp does not store the result of
the subtract operation. The cmp instruction performs the subtract operation only to set the status
flags. Here are some examples:
        cmp     byte [char],'$'   ; ZF = 1 if char is '$'

Similarly, two registers can be compared to see if they both have the same value.
        cmp     EAX,EBX           ; ZF = 1 if EAX = EBX

Counting to a Preset Value: Another important use of the zero flag is shown below. Consider the
following code:
sum = 0
for (i = 1 to M)
    for (j = 1 to N)
        sum = sum + 1
    end for
end for
The equivalent code in the assembly language is written as follows (assume that both M and
N are at least 1):
        sub     EAX,EAX         ; EAX = 0 (EAX stores sum)
        mov     EDX,M
outer_loop:
        mov     ECX,N
inner_loop:
        inc     EAX
        loop    inner_loop
        dec     EDX
        jnz     outer_loop
exit_loops:
        mov     [sum],EAX

In the above example, the inner loop count is placed in the ECX register so that we can use the
loop instruction to iterate. Incidentally, the loop instruction does not affect any of the flags.
Since we have two nested loops to handle, we are forced to use another register to keep the
outer loop count. We use the dec instruction and the zero flag to see if the outer loop has executed
M times. This code is more efficient than initializing the EDX register to one and using the code
        inc     EDX
        cmp     EDX,M
        jle     outer_loop
in place of the dec/jnz instruction combination.
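As a rough C analogue, here is a hypothetical helper that assumes m and n are at least 1, as the text does. The two counters are written as do-while loops that stop when the decremented count reaches zero, which is exactly the condition the zero flag reports after dec:

```c
#include <stdint.h>

/* Mirrors the nested loops: ecx is the inner count (loop instruction),
   edx the outer count (dec/jnz). Assumes m >= 1 and n >= 1. */
uint32_t nested_sum(uint32_t m, uint32_t n)
{
    uint32_t eax = 0;              /* sum = 0 */
    uint32_t edx = m;              /* outer loop count */
    do {
        uint32_t ecx = n;          /* inner loop count */
        do {
            eax++;                 /* inc EAX */
        } while (--ecx != 0);      /* loop inner_loop */
    } while (--edx != 0);          /* dec EDX / jnz: stop once ZF = 1 */
    return eax;
}
```

`nested_sum(3, 4)` returns 12, that is, M times N increments.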


The Carry Flag

The carry flag records the fact that the result of an arithmetic operation on unsigned numbers is
out of range (too big or too small) to fit the destination register or memory location. Consider the
example
        mov     AL,0FH
        add     AL,0F1H

The addition of 0FH and F1H would produce a result of 100H that requires 9 bits to store, as
shown below.
          00001111B    (0FH = 15D)
        + 11110001B    (F1H = 241D)
        1 00000000B    (100H = 256D)

Since the destination register AL is only 8 bits long, the carry flag would be set to indicate that the
result is too big to be held in AL.
To understand when the carry flag would be set, it is helpful to remember the range of unsigned
numbers that can be represented. The range is given below for easy reference.

Size (bits)    Range
     8         0 to 255
    16         0 to 65,535
    32         0 to 4,294,967,295
Any operation that produces a result that is outside this range sets the carry flag to indicate an
underflow or overflow condition. It is obvious that any negative result is out of range, as illustrated
by the following example:
        mov     EAX,12AEH       ; EAX = 4782D
        sub     EAX,12AFH       ; EAX = 4782D - 4783D

Executing the above code will set the carry flag because 12AEH - 12AFH produces a negative
result (i.e., the subtract operation generates a borrow), which is too small to be represented using
unsigned numbers. Thus, the carry flag is set to indicate this underflow condition.
Executing the code
        mov     AL,0FFH
        inc     AL

or the code
        mov     EAX,0
        dec     EAX

does not set the carry flag as we might expect, because the inc and dec instructions do not affect the
carry flag.
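A small C sketch makes the rule concrete. The helpers are hypothetical, not a processor or library API: for an 8-bit add the carry is the ninth bit of the untruncated sum, and for a subtraction it is the borrow, which occurs exactly when the first operand is smaller than the second.

```c
#include <stdint.h>

/* 8-bit add: returns the truncated result and stores CF. */
uint8_t add8(uint8_t a, uint8_t b, int *cf)
{
    unsigned wide = (unsigned)a + b;   /* true 9-bit sum */
    *cf = wide > 0xFF;                 /* carry out of bit 7 */
    return (uint8_t)wide;              /* what AL would hold */
}

/* 32-bit subtract: CF is set when a borrow is needed (a < b). */
uint32_t sub32(uint32_t a, uint32_t b, int *cf)
{
    *cf = a < b;
    return a - b;                      /* wraps modulo 2^32 */
}
```

`add8(0x0F, 0xF1, &cf)` returns 0 with `cf` set to 1, and `sub32(0x12AE, 0x12AF, &cf)` returns FFFFFFFFH with `cf` set to 1, matching the two examples above.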
Related Instructions The following two conditional jump instructions test this flag:
jc      jump if carry (jump is taken if CF = 1)
jnc     jump if not carry (jump is taken if CF = 0)


Usage The carry flag is useful in several situations:
• To propagate carry or borrow in multiword addition or subtraction operations.
• To detect overflow/underflow conditions.
• To test a bit using the shift/rotate family of instructions.
To Propagate Carry/Borrow: The assembly language arithmetic instructions can operate on 8-,
16-, or 32-bit data. If two operands, each more than 32 bits, are to be added, the addition has to
proceed in steps by adding two 32-bit numbers at a time. The following example illustrates how
we can add two 64-bit unsigned numbers. For convenience, we use the hex representation.
                    1            <- carry from lower 32 bits
        x =  3710 26A8  1257 9AE7H
        y =  489B A321  FE60 4213H
             ---------------------
             7FAB C9CA  10B7 DCFAH

To accomplish this, we need two addition operations. The first operation adds the least significant (lower half) 32 bits of the two operands. This produces the lower half of the result. This
addition operation could produce a carry that should be added to the upper 32 bits of the input.
The other add operation performs the addition of the most significant (upper half) 32 bits and any
carry generated by the first addition. This operation produces the upper half of the 64-bit result.
As an example, consider adding two 64-bit numbers in the registers EBX:EAX and EDX:ECX,
with EAX and ECX holding the lower 32-bit values of the two numbers. Then we can use the
following code to add these two values:
        add     EAX,ECX
        adc     EBX,EDX

It leaves the 64-bit result in the EBX:EAX register pair. Notice that we use adc to do the second
addition as we want to add any carry generated by the first addition. An overflow occurs if there is
a carry out of the second addition, which sets the carry flag.
We can extend this process to larger numbers. For example, adding two 128-bit numbers
involves a four-step process, where each step adds two 32-bit words. The first addition can be done
using add, but the remaining three additions must be done with the adc instruction. Similarly, the
sub and other operations also require multiple steps when the numbers require more than 32 bits
(for subtraction, sbb plays the role of adc).
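The same two-step addition can be sketched in C with 32-bit halves. The helper name and argument order are illustrative assumptions; the carry out of the low half is recovered by checking whether the truncated sum wrapped around, and the carry out of the high half is the 64-bit overflow indication.

```c
#include <stdint.h>

/* Adds hi2:lo2 to *hi:*lo like add EAX,ECX / adc EBX,EDX.
   Returns the carry out of the upper half (CF after adc). */
int add64(uint32_t *hi, uint32_t *lo, uint32_t hi2, uint32_t lo2)
{
    uint32_t low = *lo + lo2;                      /* add EAX,ECX */
    uint32_t carry = low < lo2;                    /* CF from the first add */
    uint64_t high = (uint64_t)*hi + hi2 + carry;   /* adc EBX,EDX */
    *lo = low;
    *hi = (uint32_t)high;
    return (int)(high >> 32);                      /* 1 means 64-bit overflow */
}
```

Using the values of x and y above, the pair becomes 7FABC9CAH:10B7DCFAH with no final carry.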
To Detect Overflow/Underflow Conditions: In the previous example, if the second addition
produces a carry, the result is too big to be held by 64 bits. In this case, the carry flag would be set
to indicate the overflow condition. It is up to the programmer to handle such error conditions.
Testing a Bit: When using shift and rotate instructions (introduced in Chapter 9), the bit that has
been shifted or rotated out is captured in the carry flag. This bit can be either the most significant
bit (in the case of a left-shift or rotate), or the least significant bit (in the case of a right-shift
or rotate). Once the bit is in the carry flag, conditional execution of the code is possible using
conditional jump instructions that test the carry flag: jc (jump on carry) and jnc (jump if no
carry).


Why inc and dec Do Not Affect the Carry Flag We have stated that the inc and dec instructions do not affect the carry flag. The rationale for this is twofold:
1. The instructions inc and dec are typically used to maintain iteration or loop count. Using
32 bits, the number of iterations can be as high as 4,294,967,295. This number is sufficiently
large for most applications. What if we need a count that is greater than this? Do we have
to use add instead of inc? This leads to the second, and the main, reason.
2. The condition detected by the carry flag can also be detected by the zero flag. Why? Because
inc and dec change the number only by 1. For example, suppose that the ECX register
has reached its maximum value 4,294,967,295 (FFFFFFFFH). If we then execute
        inc     ECX

we would normally expect the carry flag to be set to 1. However, we can detect this condition
by noting that ECX = 0, which sets the zero flag. Thus, setting the carry flag is really
redundant for these instructions.
The Overflow Flag
The overflow flag, in some respects, is the carry flag counterpart for the signed number arithmetic.
The main purpose of the overflow flag is to indicate whether an operation on signed numbers has
produced a result that is out of range. It is helpful to recall the range of signed numbers that can
be represented using 8, 16, and 32 bits. For your convenience, this range is given below:
Size (bits)    Range
     8         -128 to +127
    16         -32,768 to +32,767
    32         -2,147,483,648 to +2,147,483,647

Executing the code
        mov     AL,72H          ; 72H = 114D
        add     AL,0EH          ; 0EH = 14D

will set the overflow flag to indicate that the result 80H (128D) is too big to be represented as an
8-bit signed number. The AL register will contain 80H, the correct result if the two 8-bit operands
are treated as unsigned numbers. But AL contains an incorrect answer for 8-bit signed numbers
(80H represents -128 in signed representation, not +128 as required).
Here is another example that uses the sub instruction. The AX register is initialized to -5,
which is FFFBH in 2's complement representation using 16 bits.
        mov     AX,0FFFBH       ; AX = -5
        sub     AX,7FFDH        ; subtract 32,765 from AX

Execution of the above code will set the overflow flag as the result
        (-5) - (32,765) = -32,770
which is too small to be represented as a 16-bit signed number.
Note that the result will not be out of range (and hence the overflow flag will not be set) when
we are adding two signed numbers of opposite sign or subtracting two numbers of the same sign.
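The rule for subtraction can be expressed with bitwise operations, as in this C sketch of a hypothetical helper: overflow occurs only when the operands have different signs and the sign of the result differs from that of the first operand, which is consistent with the observation above.

```c
#include <stdint.h>

/* 16-bit a - b with OF: the operands' signs differ (a ^ b) and the
   result's sign differs from a's (a ^ r); test bit 15 of both. */
uint16_t sub16(uint16_t a, uint16_t b, int *of)
{
    uint16_t r = (uint16_t)(a - b);
    *of = ((a ^ b) & (a ^ r) & 0x8000) != 0;
    return r;
}
```

`sub16(0xFFFB, 0x7FFD, &of)` reproduces the example: the 16-bit result is 7FFEH and `of` is 1.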


Signed or Unsigned: How Does the System Know? The values of the carry and overflow flags
depend on whether the operands are unsigned or signed numbers. Given that a bit pattern can be
treated both as representing a signed and an unsigned number, a question that naturally arises is:
How does the system know how your program is interpreting a given bit pattern? The answer is
that the processor does not have a clue. It is up to our program logic to interpret a given bit pattern
correctly. The processor, however, assumes both interpretations and sets the carry and overflow
flags. For example, when executing
        mov     AL,72H
        add     AL,0EH

the processor treats 72H and 0EH as unsigned numbers. And since the result 80H (128) is within
the range of 8-bit unsigned numbers (0 to 255), the carry flag is cleared (i.e., CF = 0). At the same
time, 72H and 0EH are also treated as representing signed numbers. Since the result 80H (128) is
outside the range of 8-bit signed numbers (-128 to +127), the overflow flag is set.
Thus, after executing the above two lines of code, CF = 0 and OF = 1. It is up to our program
logic to test whichever flag is appropriate. If you are indeed representing unsigned numbers,
disregard the overflow flag. Since the carry flag indicates a valid result, no exception handling is
needed.
        mov     AL,72H
        add     AL,0EH
        jc      overflow
no_overflow:
        (no overflow code here)
overflow:
        (overflow code here)

If, on the other hand, 72H and 0EH are representing 8-bit signed numbers, we can disregard
the carry flag value. Since the overflow flag is 1, our program will have to handle the overflow
condition.
        mov     AL,72H
        add     AL,0EH
        jo      overflow
no_overflow:
        (no overflow code here)
overflow:
        (overflow code here)
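To see that a single addition yields both interpretations at once, this C sketch computes CF and OF side by side for an 8-bit add. The structure and function names are illustrative assumptions; the OF test checks whether the sign of the result differs from the signs of both operands.

```c
#include <stdint.h>

struct flags8 { int cf; int of; };

/* One 8-bit addition, two readings: CF for unsigned, OF for signed. */
struct flags8 add8_flags(uint8_t a, uint8_t b, uint8_t *result)
{
    unsigned wide = (unsigned)a + b;
    uint8_t r = (uint8_t)wide;
    struct flags8 f;
    f.cf = wide > 0xFF;                       /* unsigned result outside 0..255 */
    f.of = ((a ^ r) & (b ^ r) & 0x80) != 0;   /* same-sign operands, sign flipped */
    *result = r;
    return f;
}
```

For 72H + 0EH the sketch gives a result of 80H with CF = 0 and OF = 1, exactly as described above.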

Related Instructions The following two conditional jump instructions test this flag:
jo      jump on overflow (jump is taken if OF = 1)
jno     jump on no overflow (jump is taken if OF = 0)
In addition, a special software interrupt instruction
into    interrupt on overflow
is provided to test the overflow flag. Interrupts are discussed in Chapter 20.


Usage The main purpose of the overflow flag is to indicate whether an arithmetic operation on
signed numbers has produced an out-of-range result. The overflow flag is also affected by shift,
multiply, and divide operations. More details on some of these instructions can be found in later
sections of this chapter.
The Sign Flag

As the name implies, the sign flag indicates the sign of the result of an operation. Therefore, it
is useful only when dealing with signed numbers. Recall that the most significant bit is used to
represent the sign of a number: 0 for positive numbers and 1 for negative numbers. The sign
flag gets a copy of the sign bit of the result produced by arithmetic and related operations. The
following sequence of instructions
        mov     EAX,15
        add     EAX,97

will clear the sign flag (i.e., SF = 0) because the result produced by the add instruction is a positive
number: 112D (which is 01110000 in binary).
The result produced by
        mov     EAX,15
        sub     EAX,97

is a negative number and sets the sign flag to indicate this fact. Remember that negative numbers
are represented in 2's complement notation (see Appendix A). As discussed in Appendix A, the
subtract operation can be treated as the addition of the corresponding negative number. Thus,
15 - 97 is treated as 15 + (-97), where, as usual, -97 is expressed in 2's complement form.
Therefore, after executing the above two instructions, the low-order byte of EAX (the AL register)
contains AEH (EAX as a whole holds FFFFFFAEH), as shown below:
          00001111B    (8-bit signed form of 15)
        + 10011111B    (8-bit signed number for -97)
          10101110B

Since the sign bit of the result is 1, the result is negative and is in 2's complement form. You
can easily verify that AEH is the 8-bit signed form of -82, which is the correct answer.
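The 2's complement view of subtraction and the sign flag can be checked with a short C sketch. Both helpers are hypothetical illustrations: one forms 15 + (-97) in 8 bits, the other copies bit 7 of a result, which is what the sign flag records.

```c
#include <stdint.h>

/* 8-bit a - b computed as a + (2's complement of b). */
uint8_t sub8_as_add(uint8_t a, uint8_t b)
{
    uint8_t neg_b = (uint8_t)(~b + 1);   /* 2's complement of b */
    return (uint8_t)(a + neg_b);         /* keep only the low 8 bits */
}

/* SF is a copy of the most significant (sign) bit of the result. */
int sign_flag8(uint8_t result)
{
    return (result >> 7) & 1;
}
```

`sub8_as_add(15, 97)` returns AEH, for which `sign_flag8` returns 1; reinterpreted as `int8_t`, AEH is -82.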
Related Instructions The following two conditional jump instructions test this flag:
js      jump on sign (jump is taken if SF = 1)
jns     jump on no sign (jump is taken if SF = 0)
The js instruction causes the jump if the last instruction that updated the sign flag produced a
negative result. The jns instruction causes the jump if the result was nonnegative.
Usage The main use of the sign flag is to test the sign of the result produced by arithmetic and
related instructions. Another use for the sign flag is in implementing counting loops that should
iterate until (and including) the control variable is zero. For example, consider the following code:
for (i = M downto 0)
< loop body >
end for


This loop can be implemented without using a cmp instruction as follows:
        mov     ECX,M
for_loop:
        . . .
        < loop body >
        . . .
        dec     ECX
        jns     for_loop

If we do not use the jns instruction, we have to use
        cmp     ECX,0
        jge     for_loop

in its place.
From the user point of view, the sign bit of a number can be easily tested by using a logical or
shift instruction. Compared to the other three flags we have discussed so far, the sign flag is used
relatively infrequently in user programs. However, the processor uses the sign flag when executing
conditional jump instructions on signed numbers (details are in Chapter 15 on page 322).
The Auxiliary Flag

The auxiliary flag indicates whether an operation has produced a result that has generated a carry
out of or a borrow into the low-order four bits of 8-, 16-, or 32-bit operands. In computer jargon,
four bits are referred to as a nibble. The auxiliary flag is set if there is such a carry or borrow;
otherwise it is cleared.
In the example
        mov     AL,43
        add     AL,94

the auxiliary flag is set because there is a carry out of bit 3, as shown below:
                  1          <- carry generated from lower to upper nibble
          43D  =  00101011B
        + 94D  =  01011110B
         137D  =  10001001B

You can verify that executing the following code clears the auxiliary flag:
        mov     AL,43
        add     AL,84

Since the following instruction sequence
        mov     AL,43
        sub     AL,92

generates a borrow into the low-order 4 bits, the auxiliary flag is set. On the other hand, the
instruction sequence
        mov     AL,43
        sub     AL,87

clears the auxiliary flag.
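One common way to compute the auxiliary flag in software, shown here as a C sketch with hypothetical helpers, is to look at bit 4 of `a ^ b ^ result`. Each result bit is the XOR of the two operand bits and the incoming carry (or borrow), so bit 4 of that expression is exactly the carry or borrow between the two nibbles.

```c
#include <stdint.h>

/* AF after an 8-bit add: carry out of the low nibble into bit 4. */
int aux_flag_add8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a + b);
    return ((a ^ b ^ r) & 0x10) != 0;
}

/* AF after an 8-bit subtract: borrow from bit 4 into the low nibble. */
int aux_flag_sub8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);
    return ((a ^ b ^ r) & 0x10) != 0;
}
```

The four examples above give 1 for 43 + 94, 0 for 43 + 84, 1 for 43 - 92, and 0 for 43 - 87.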


Related Instructions and Usage There are no conditional jump instructions that test the auxiliary flag. However, arithmetic operations on numbers expressed in decimal form or binary coded
decimal (BCD) form use the auxiliary flag. Some related instructions are as follows:
aaa     ASCII adjust for addition
aas     ASCII adjust for subtraction
aam     ASCII adjust for multiplication
aad     ASCII adjust for division
daa     Decimal adjust for addition
das     Decimal adjust for subtraction

For details on these instructions and BCD numbers, see Chapter 18.
The Parity Flag

This flag indicates the parity of the 8-bit result produced by an operation; if this result is 16 or 32
bits long, only the lower-order 8 bits are considered to set or clear the parity flag. The parity flag is
set if the byte contains an even number of 1 bits; if there are an odd number of 1 bits, it is cleared.
In other words, the parity flag indicates an even parity condition of the byte.
Thus, executing the code
        mov     AL,53
        add     AL,89

will set the parity flag because the result contains an even number of 1s (four 1 bits), as shown
below:
           53D  =  00110101B
        +  89D  =  01011001B
          142D  =  10001110B

The instruction sequence
        mov     AX,23994
        sub     AX,9182

on the other hand, clears the parity flag, as the low-order 8 bits contain an odd number of 1s (five
1 bits), as shown below:
           23994D  =  01011101 10111010B
        +  -9182D  =  11011100 00100010B
           14812D  =  00111001 11011100B
Related Instructions The following two conditional jump instructions test this flag:
jp      jump on parity (jump is taken if PF = 1)
jnp     jump on no parity (jump is taken if PF = 0)

The jp instruction causes the jump if the last instruction that updated the parity flag produced an
even parity byte; the jnp instruction causes the jump for an odd parity byte.
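As a C sketch (a hypothetical helper, not a processor API), the parity flag can be modeled by counting the 1 bits of the low-order byte only and reporting whether that count is even:

```c
#include <stdint.h>

/* PF = 1 when the low-order byte of the result has an even number of 1 bits. */
int parity_flag(uint32_t result)
{
    uint8_t byte = (uint8_t)result;    /* only the low 8 bits count */
    int ones = 0;
    while (byte) {
        ones += byte & 1;
        byte >>= 1;
    }
    return (ones % 2) == 0;
}
```

`parity_flag(53 + 89)` returns 1 and `parity_flag(23994 - 9182)` returns 0, matching the two examples; a value such as 100H, whose low byte is zero, also gives 1.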


Usage This flag is useful for writing data encoding programs. As a simple example, consider
transmission of data via modems using the 7-bit ASCII code. To detect simple errors during data
transmission, a single parity bit is added to the 7-bit data. Assume that we are using even parity
encoding. That is, every 8-bit character code transmitted will contain an even number of 1 bits.
Then, the receiver can count the number of 1s in each received byte and flag a transmission error if
the byte contains an odd number of 1 bits. Such a simple encoding scheme can detect single-bit
errors (in fact, it can detect any odd number of bit errors).
To encode, the parity bit is set or cleared depending on whether the remaining 7 bits contain
an odd or even number of 1s, respectively. For example, if we are transmitting character A, whose
7-bit ASCII representation is 41H, we set the parity bit to 0 so that there is an even number of 1s.
In the following examples, the parity bit is the leftmost bit:
A = 01000001

For character C, the parity bit is set because its 7-bit ASCII code is 43H.
C = 11000011

Here is a procedure that encodes the 7-bit ASCII character code present in the AL register. The
most significant bit (i.e., leftmost bit) is assumed to be zero.
parity_encode:
      shl    AL,1               ; data now in bits 7..1; PF reflects the 7 data bits
      jp     parity_zero        ; even number of 1s: parity bit = 0
      stc                       ; CF = 1
      jmp    move_parity_bit
parity_zero:
      clc                       ; CF = 0
move_parity_bit:
      rcr    AL,1               ; rotate CF into bit 7, restoring the data bits
      ret
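The parity rules above can be checked with a small C model. This is a sketch and is not part of the book's code. `parity_flag` mimics how PF examines only the low-order byte of a result. `parity_encode7` and `parity_ok` are hypothetical helpers that apply the even-parity scheme to 7-bit ASCII codes.

```c
#include <assert.h>
#include <stdint.h>

/* Count the 1 bits in a byte. */
static int ones_in_byte(uint8_t b) {
    int n = 0;
    for (; b != 0; b >>= 1)
        n += b & 1;
    return n;
}

/* Model of PF: only the low-order 8 bits of a result are examined;
   PF = 1 when they contain an even number of 1 bits. */
int parity_flag(uint32_t result) {
    return ones_in_byte((uint8_t)(result & 0xFF)) % 2 == 0;
}

/* Hypothetical even-parity encoder: bit 7 must be 0 on entry (as the
   procedure assumes) and becomes 1 only if the 7 data bits have odd parity. */
uint8_t parity_encode7(uint8_t code) {
    assert(code < 0x80);
    return parity_flag(code) ? code : (uint8_t)(code | 0x80);
}

/* Hypothetical receiver check: a byte passes when it has even parity. */
int parity_ok(uint8_t received) {
    return parity_flag(received);
}
```

For example, `parity_encode7('C')` yields C3H (11000011B). Flipping any single bit of that byte makes `parity_ok` fail.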

Flag Examples
Here we present two examples to illustrate how the status flags are affected by the arithmetic
instructions. You can verify the answers by using a debugger (see Chapter 8 for information on
debuggers).
Example 14.1 Add/subtract example.
Table 14.1 gives some examples of add and sub instructions and how they affect the flags. Updating of ZF, SF, and PF is easy to understand. The ZF is set whenever the result is zero; SF is
simply a copy of the most significant bit of the result; and PF is set whenever there is an even
number of 1s in the result. In the rest of this example, we focus on the carry and overflow flags.
Example 1 performs -5 - 123. Note that -5 is represented internally as FBH, which is treated
as 251 in unsigned representation. Subtracting 123 (=7BH) leaves 80H (=128) in AL. Since the
result is within the range of unsigned 8-bit numbers, CF is cleared. For the overflow flag, the
operands are interpreted as signed numbers. Since the result is -128, OF is also cleared.
Example 2 subtracts 124 from -5. For reasons discussed in the previous example, the CF is
cleared. The OF, however, is set because the result is -129, which is outside the range of signed
8-bit numbers.


Assembly Language Programming in Linux

Table 14.1 Examples illustrating the effect on flags

                Code               AL    CF  ZF  SF  OF  PF
Example 1       mov   AL,-5
                sub   AL,123       80H    0   0   1   0   0
Example 2       mov   AL,-5
                sub   AL,124       7FH    0   0   0   1   0
Example 3       mov   AL,-5
                add   AL,132       7FH    1   0   0   1   0
                add   AL,1         80H    0   0   1   1   0
Example 4       sub   AL,AL        00H    0   1   0   0   1
Example 5       mov   AL,127
                add   AL,129       00H    1   1   0   0   1

In Example 3, the first add instruction adds 132 to -5. When treating them as unsigned numbers, 132 is actually added to 251, which results in a number that is greater than 255D.
Therefore, CF is set. When treating them as signed numbers, 132 is internally represented as 84H
(=-124). The result, -129, is smaller than -128, so OF is also set. After
executing the first add instruction, AL will have 7FH. The second add instruction increments
7FH to 80H. This sets OF (+127 + 1 = +128 is outside the signed range), but not CF.
Example 4 causes the result to be zero irrespective of the contents of the AL register. This sets
the zero flag. Also, since the number of 1s is even (zero), PF is also set in this example.
The last example adds 127D to 129D. Treating them as unsigned numbers, the result 256D
is just outside the range, and sets CF. However, if we treat them as representing signed numbers,
129D is stored internally as 81H (=-127). The result, therefore, is zero and OF is cleared.
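To check answers like these without a debugger, the carry and overflow rules can be modeled in C. The sketch below uses hypothetical functions `add8` and `sub8` and does not model AF. It computes each result twice. Treated as unsigned bytes, the result decides CF; treated as signed bytes, it decides OF. The sketch assumes the usual two's complement conversion from `uint8_t` to `int8_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Status flags after an 8-bit add or sub (AF is not modeled). */
typedef struct { int cf, zf, sf, of, pf; } Flags8;

static int even_ones(uint8_t b) {
    int ones = 0;
    for (; b != 0; b >>= 1)
        ones += b & 1;
    return ones % 2 == 0;
}

static Flags8 make_flags(uint8_t r, int cf, int of) {
    Flags8 f = { cf, r == 0, (r >> 7) & 1, of, even_ones(r) };
    return f;
}

/* add: CF = carry out of bit 7 (unsigned view); OF = signed sum outside -128..127. */
Flags8 add8(uint8_t a, uint8_t b, uint8_t *result) {
    unsigned sum = (unsigned)a + b;
    int ssum = (int8_t)a + (int8_t)b;
    assert(result != 0);
    *result = (uint8_t)sum;
    return make_flags(*result, sum > 0xFF, ssum < -128 || ssum > 127);
}

/* sub: CF = borrow (unsigned a < b); OF = signed difference outside -128..127. */
Flags8 sub8(uint8_t a, uint8_t b, uint8_t *result) {
    int sdiff = (int8_t)a - (int8_t)b;
    assert(result != 0);
    *result = (uint8_t)(a - b);
    return make_flags(*result, a < b, sdiff < -128 || sdiff > 127);
}
```

With these functions, `add8(127, 129, &al)` reports CF = 1 and OF = 0, matching Example 5.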
Example 14.2 A compare example.
This example shows how the status flags are affected by the compare instruction discussed in
Chapter 9 on page 199. Table 14.2 gives some examples of executing the
      cmp    AL,DL
instruction. We leave it as an exercise to verify (without using a debugger) the flag values.
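Once you have worked the exercise by hand, you can check your answers with a small C model of `cmp AL,DL`. The model computes the flags of AL - DL and then discards the difference. The name `cmp8` is hypothetical. The AF rule used here is the usual one for subtraction: a borrow out of the low four bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int cf, zf, sf, of, pf, af; } CmpFlags;

static int even_parity8(uint8_t b) {
    int ones = 0;
    for (; b != 0; b >>= 1)
        ones += b & 1;
    return ones % 2 == 0;
}

/* Flags after `cmp AL,DL`; al and dl may be given as signed (-128..127)
   or unsigned (0..255) values, since only their 8-bit patterns matter. */
CmpFlags cmp8(int al, int dl) {
    uint8_t a = (uint8_t)al, d = (uint8_t)dl;
    uint8_t diff = (uint8_t)(a - d);
    int sdiff = (int8_t)a - (int8_t)d;
    CmpFlags f;
    f.cf = a < d;                        /* borrow in unsigned subtraction */
    f.zf = diff == 0;
    f.sf = (diff >> 7) & 1;
    f.of = sdiff < -128 || sdiff > 127;  /* signed difference out of range */
    f.pf = even_parity8(diff);
    f.af = (a & 0x0F) < (d & 0x0F);      /* borrow out of the low nibble */
    return f;
}
```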

Arithmetic Instructions
For the sake of completeness, we list the arithmetic instructions supported by the IA-32 instruction
set:
      Addition: add, adc, inc
      Subtraction: sub, sbb, dec, neg, cmp
      Multiplication: mul, imul
      Division: div, idiv
      Related instructions: cbw, cwd, cdq, cwde, movsx, movzx


Table 14.2 Some examples of cmp AL,DL

       AL      DL     CF  ZF  SF  OF  PF  AF
       56      57      1   0   1   0   1   1
      200     101      0   0   0   1   1   0
      101     200      1   0   1   1   0   1
      200     200      0   1   0   0   1   0
     -105    -105      0   1   0   0   1   0
     -125    -124      1   0   1   0   1   1
     -124    -125      0   0   0   0   0   0

We have already looked at the addition and subtraction instructions in Chapter 9. Here we discuss
the remaining instructions. There are a few other arithmetic instructions that operate on decimal
and BCD numbers. Details of these instructions can be found in Chapter 18.
Multiplication Instructions

Multiplication is more complicated than the addition and subtraction operations for two reasons:
1. First, multiplication produces double-length results. That is, multiplying two n-bit values
produces a 2n-bit result. To see that this is indeed the case, consider multiplying two 8-bit
numbers. Assuming unsigned representation, FFH (255D) is the maximum number that the
source operands can take. Thus, the multiplication produces the maximum result, as shown
below:
      11111111 x 11111111 = 11111110 00000001
      (255D)     (255D)     (65025D)
Similarly, multiplication of two 16-bit numbers requires 32 bits to store the result, and two
32-bit numbers require 64 bits for the result.
2. Second, unlike the addition and subtraction operations, multiplication of signed numbers
should be treated differently from that of unsigned numbers. This is because the resulting
bit pattern depends on the type of input, as illustrated by the following example:
We have just seen that treating FFH as an unsigned number results in multiplying 255D x
255D.
      11111111 x 11111111 = 11111110 00000001.
Now, what if FFH is representing a signed number? In this case, FFH is representing -1D
and the result should be 1, as shown below:
      11111111 x 11111111 = 00000000 00000001.


As you can see, the resulting bit patterns are different for the two cases.
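The same point can be made in C by multiplying one pair of bit patterns under both interpretations. The two helpers below are hypothetical illustrations, not processor instructions. The signed version assumes two's complement conversion to `int8_t`.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit x 8-bit -> 16-bit product, operands read as unsigned (as mul does). */
uint16_t product_unsigned8(uint8_t a, uint8_t b) {
    return (uint16_t)((unsigned)a * b);
}

/* Same bit patterns read as signed (as imul does); the 16-bit pattern is returned. */
uint16_t product_signed8(uint8_t a, uint8_t b) {
    int p = (int8_t)a * (int8_t)b;
    return (uint16_t)p;
}
```

For FFH x FFH, the first helper gives FE01H (65025) and the second gives 0001H (1).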
Thus, the instruction set provides two multiplication instructions: one for unsigned numbers
and the other for signed numbers. We first discuss the unsigned multiplication instruction, which
has the format
      mul    source
The source operand can be in a general-purpose register or in memory. Immediate operand
specification is not allowed. Thus,
      mul    10                 ; invalid
is an invalid instruction. The mul instruction works on 8-, 16-, and 32-bit unsigned numbers. But,
where is the second operand? The instruction assumes that it is in the accumulator. If the source
operand is a byte, it is multiplied by the contents of the AL register. The 16-bit result is placed in
the AX register, as shown below:
      AL  x  8-bit source  -->  AH (high-order 8 bits) : AL (low-order 8 bits)
If the source operand is a word, it is multiplied by the contents of the AX register and the
doubleword result is placed in DX:AX, with the AX register holding the lower-order 16 bits, as
shown below:
      AX  x  16-bit source  -->  DX (high-order 16 bits) : AX (low-order 16 bits)
If the source operand is a doubleword, it is multiplied by the contents of the EAX register and
the 64-bit result is placed in EDX:EAX, with the EAX register holding the lower-order 32 bits, as
shown below:
      EAX  x  32-bit source  -->  EDX (high-order 32 bits) : EAX (low-order 32 bits)

The mul instruction affects all six status flags. However, it updates only the carry and overflow
flags. The remaining four flags are undefined. The carry and overflow flags are set if the upper
half of the result is nonzero; otherwise, they are both cleared.
Setting of the carry and overflow flags does not indicate an error condition. Instead, this
condition implies that AH, DX, or EDX contains significant digits of the result.
For example, the code
      mov    AL,10
      mov    DL,25
      mul    DL
clears both the carry and overflow flags, as the result of the mul instruction is 250, which can be
stored in the AL register (and the AH register contains 00000000). On the other hand, executing
      mov    AL,10
      mov    DL,26
      mul    DL
sets the carry and overflow flags indicating that the result is more than 255.
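The flag rule for mul can be expressed as a short C model of the 8-bit form. The name `mul8` is hypothetical. The 16- and 32-bit forms follow the same pattern, with DX or EDX holding the upper half.

```c
#include <assert.h>
#include <stdint.h>

/* Result of `mul source` with an 8-bit source: AX = AL * source. */
typedef struct { uint16_t ax; int cf, of; } Mul8Result;

Mul8Result mul8(uint8_t al, uint8_t src) {
    Mul8Result r;
    r.ax = (uint16_t)((unsigned)al * src);  /* at most FE01H, so AX never overflows */
    r.cf = r.of = (r.ax >> 8) != 0;         /* set exactly when AH is nonzero */
    return r;
}
```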
For signed numbers, we have to use the imul (integer multiplication) instruction, which has
the same format* as the mul instruction:
      imul   source
The behavior of the imul instruction is similar to that of the mul instruction. The only difference
to note is that the carry and overflow flags are set if the upper half of the result is not the sign extension of the lower half. To understand sign extension in signed numbers, consider the following
example. We know that -66 is represented using 8 bits as
10111110.
Now, suppose that we can use 16 bits to represent the same number. Using 16 bits, -66 is represented as
1111111110111110.
The upper 8 bits are simply sign-extended (i.e., the sign bit is copied into these bits), and doing so
does not change the magnitude.
Following the same logic, the positive number 66, represented using 8 bits as
01000010
can be sign-extended to 16 bits by adding eight leading zeros as shown below:
0000000001000010.
As with the mul instruction, setting of the carry and overflow flags does not indicate an error
condition; it simply indicates that the result requires double length.
Here are some examples of the imul instruction. Execution of the following code
      mov    DL,0FFH            ; DL = -1
      mov    AL,42H             ; AL = 66
      imul   DL
causes the result
      1111111110111110          (-66)
to be placed in the AX register. The carry and overflow flags are cleared, as AH contains the sign
extension of the AL value. This is also the case for the following code:
      mov    DL,0FFH            ; DL = -1
      mov    AL,0BEH            ; AL = -66
      imul   DL
*The imul instruction supports several other formats, including specification of an immediate value. We do not discuss
these details; see Intel's IA-32 Architecture Software Developer's Manual.


which produces the result
      0000000001000010          (+66)
in the AX register. Again, both the carry and overflow flags are cleared.
In contrast, both flags are set for the following code:
      mov    DL,25              ; DL = 25
      mov    AL,0F6H            ; AL = -10
      imul   DL
which produces the result
      1111111100000110          (-250).
Here AH (11111111) is not the sign extension of AL (00000110), whose sign bit is 0.
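For imul, the test is whether the upper half is the sign extension of the lower half. The hypothetical `imul8` below models the 8-bit form. It forms the exact signed product, then checks whether the low byte, read as a signed value, still equals that product. It assumes two's complement conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Result of `imul source` with an 8-bit source: AX = AL * source (signed). */
typedef struct { uint16_t ax; int cf, of; } Imul8Result;

Imul8Result imul8(uint8_t al, uint8_t src) {
    Imul8Result r;
    int product = (int8_t)al * (int8_t)src;  /* exact: -16256..16384 fits in 16 bits */
    int8_t low = (int8_t)(product & 0xFF);   /* AL read as a signed byte */
    r.ax = (uint16_t)product;
    r.cf = r.of = (product != low);          /* AH is not the sign extension of AL */
    return r;
}
```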

A Note on Multiplication The multiplication instruction is an expensive one in the sense that it takes
more time than the other arithmetic instructions like add and sub. (Of course, the division instructions take even more time.) Thus, for some multiplications, we get better performance by not
using the multiplication instructions. For example, to multiply the value in EAX by 2, we do better
by using
      add    EAX,EAX
The add instruction takes only one clock cycle whereas the multiplication instruction takes 10+
clock cycles (exact counts vary from one IA-32 processor to another).
As another example, consider multiplication by 10, which is often needed in number conversion routines. We can do this multiplication by using a sequence of additions more efficiently than
the multiplication instruction. For example, if we want to multiply y (in EAX) by 10, we can use
the following code:
      add    EAX,EAX            ; EAX = 2y
      mov    EBX,EAX            ; EBX = 2y
      add    EAX,EAX            ; EAX = 4y
      add    EAX,EAX            ; EAX = 8y
      add    EAX,EBX            ; EAX = 10y

Since the mov and add instructions take only one clock cycle, this sequence takes only 5 clocks
compared to 10+ clocks for the multiplication instruction. We can do even better by using a mix
of shift and add instructions. If we want to multiply a number by a power of 2, it is better to use
the shift instructions (see our discussion in Chapter 16 on page 351).
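The add sequence, and the shift-and-add variant mentioned above, can be checked in C. This sketch uses 32-bit unsigned arithmetic to imitate the wrap-around of EAX. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 10y using only additions, step for step like the instruction sequence. */
uint32_t times10_adds(uint32_t y) {
    uint32_t eax = y, ebx;
    eax += eax;          /* add EAX,EAX ; EAX = 2y  */
    ebx = eax;           /* mov EBX,EAX ; EBX = 2y  */
    eax += eax;          /* add EAX,EAX ; EAX = 4y  */
    eax += eax;          /* add EAX,EAX ; EAX = 8y  */
    eax += ebx;          /* add EAX,EBX ; EAX = 10y */
    return eax;
}

/* Shift-and-add variant: 10y = 8y + 2y = (y << 3) + (y << 1). */
uint32_t times10_shifts(uint32_t y) {
    return (y << 3) + (y << 1);
}
```

Because both functions work modulo 2^32, they also give the correct bit pattern for negative values held in two's complement form.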
Division Instructions

The division operation is even more complicated than multiplication for two reasons:
1. Division generates two result components: a quotient and a remainder.
2. In multiplication, by using double-length registers, overflow never occurs. In division, divide overflow is a real possibility. The processor generates a special software interrupt when
a divide overflow occurs.
As with the multiplication instruction, two versions of the divide instruction are provided to work
on unsigned and signed numbers.

      div    source             (unsigned)
      idiv   source             (signed)

The source operand specified in the instruction is used as the divisor. As with the multiplication
instruction, both division instructions can work on 8-, 16-, or 32-bit numbers. All six status flags
are left undefined after a division; none of them records anything meaningful about the result. We first consider the unsigned
version.
If the source operand is a byte, the dividend is assumed to be in the AX register and 16 bits
long. After the division, the quotient is returned in the AL register and the remainder in the AH
register, as shown below:
      AX (16-bit dividend) / 8-bit source (divisor)  -->  AL = quotient, AH = remainder
For word operands, the dividend is assumed to be 32 bits long and in DX:AX (upper 16 bits
in DX). After the division, the 16-bit quotient will be in AX and the 16-bit remainder in DX, as
shown below:
      DX:AX (32-bit dividend) / 16-bit source (divisor)  -->  AX = quotient, DX = remainder
For 32-bit operands, the dividend is assumed to be 64 bits long and in EDX:EAX. After the
division, the 32-bit quotient will be in the EAX and the 32-bit remainder in the EDX, as shown
below:
      EDX:EAX (64-bit dividend) / 32-bit source (divisor)  -->  EAX = quotient, EDX = remainder


Example 14.3 8-bit division.
Consider dividing 251 by 12 (i.e., 251/12), which produces 20 as the quotient and 11 as the remainder. The code
      mov    AX,251
      mov    CL,12
      div    CL
leaves 20 (14H) in the AL register and 11 (0BH) in the AH register.

Example 14.4 16-bit division.
Consider the 16-bit division: 5147/300. Executing the code
      xor    DX,DX              ; clear DX
      mov    AX,141BH           ; AX = 5147D
      mov    CX,012CH           ; CX = 300D
      div    CX
leaves 17 (11H) in the AX and 47 (2FH) in the DX.
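The register conventions and the divide overflow case can be summarized in a small C model of the 8-bit div. The name `div8` and its -1 error return are assumptions of this sketch. The real instruction does not return a status; as described above, the processor generates an interrupt instead. Example 14.4 follows the same pattern, using DX:AX, AX, and DX.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `div source` with an 8-bit source: AX / source -> AL = quotient,
   AH = remainder. Returns 0 on success, -1 where the processor would instead
   raise the divide interrupt (zero divisor, or quotient too big for AL). */
int div8(uint16_t ax, uint8_t src, uint8_t *al, uint8_t *ah) {
    unsigned quotient, remainder;
    if (src == 0)
        return -1;                   /* division by zero */
    quotient = ax / src;
    remainder = ax % src;
    if (quotient > 0xFF)
        return -1;                   /* divide overflow: quotient does not fit in AL */
    *al = (uint8_t)quotient;
    *ah = (uint8_t)remainder;
    return 0;
}
```

For instance, 1000/2 cannot be done with an 8-bit divisor, because the quotient 500 does not fit in AL.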
Now let us turn our attention to the signed division operation. The idiv instruction has the
same format and behavior as the unsigned div instruction, including the registers used for the
dividend, quotient, and remainder.
The idiv instruction introduces a slight complication when the dividend is a negative number.
For example, assume that we want to perform the 16-bit division: -251/12. Since -251 = FF05H,
the AX register is set to FF05H. However, the DX register has to be initialized to FFFFH by sign-extending the AX register. If the DX is set to 0000H as we did in the unsigned div operation,
the dividend 0000FF05H is treated as a positive number 65285D. The 32-bit equivalent of -251
is FFFFFF05H. If the dividend is positive, DX should have 0000H.
To aid sign extension in instructions such as idiv, the instruction set provides several instructions:
      cbw                       (convert byte to word)
      cwd                       (convert word to doubleword)
      cdq                       (convert doubleword to quadword)
These instructions take no operands. The first instruction can be used to sign-extend the AL
register into the AH register and is useful with the 8-bit idiv instruction. The cwd instruction
sign-extends the AX into the DX register and is useful with the 16-bit idiv instruction. The cdq
instruction sign-extends the EAX into the EDX. In fact, both cwd and cdq use the same opcode
99H, and the operand size determines whether to sign-extend the AX or EAX register.
For completeness, we mention three other related instructions. The cwde instruction sign-extends
the AX into EAX, much as the cbw instruction does for AL. Just like the cwd and cdq, the same
opcode 98H is used for both cbw and cwde instructions. The operand size determines which
one should be applied. Note that cwde is different from cwd in that the cwd instruction uses the
DX:AX register pair, whereas cwde uses the EAX register as the destination.
The instruction set also provides the following two move instructions:
      movsx  dest,src           (move sign-extended src to dest)
      movzx  dest,src           (move zero-extended src to dest)
In both these instructions, dest has to be a register, whereas the src operand can be in a register
or memory. If the source is an 8-bit operand, the destination has to be either a 16- or 32-bit register.
If the source is a 16-bit operand, the destination must be a 32-bit register.
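Sign extension and zero extension map directly onto C integer conversions. The hypothetical helpers below mirror cbw or movsx, movzx, and the DX value that cwd produces. They assume two's complement conversion to the signed types.

```c
#include <assert.h>
#include <stdint.h>

/* cbw, or movsx with an 8-bit source: copy the sign bit into the upper byte. */
uint16_t sign_extend8to16(uint8_t b) {
    return (uint16_t)(int16_t)(int8_t)b;
}

/* movzx with an 8-bit source: fill the upper byte with zeros. */
uint16_t zero_extend8to16(uint8_t b) {
    return b;
}

/* cwd: the value DX receives when AX is sign-extended into DX:AX. */
uint16_t cwd_dx(uint16_t ax) {
    return (ax & 0x8000) ? 0xFFFF : 0x0000;
}
```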
Here are some examples of the idiv instruction.
Example 14.5 Signed 8-bit division.
The following sequence of instructions performs the signed 8-bit division -95/12:
      mov    AL,-95
      cbw                       ; AH = FFH
      mov    CL,12
      idiv   CL
The idiv instruction leaves -7 (F9H) in the AL register and -11 (F5H) in the AH register.
Example 14.6 Signed 16-bit division.
Suppose that we want to divide -5147 by 300. The instruction sequence
      mov    AX,-5147
      cwd                       ; DX = FFFFH
      mov    CX,300
      idiv   CX
performs this division and leaves -17 (FFEFH) in the AX register and -47 (FFD1H) in the DX
register as the remainder.
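The signed case can be modeled the same way. This sketch uses a hypothetical `idiv16`. It sign-extends AX as cwd does, then divides. Since C99, C defines integer division to truncate toward zero, with the remainder taking the sign of the dividend. That matches the idiv results in Examples 14.5 and 14.6. As in the unsigned sketch, the -1 return stands in for the processor's divide interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `cwd` followed by `idiv source` with a 16-bit source:
   DX:AX / source -> AX = quotient, DX = remainder. */
int idiv16(int16_t ax, int16_t src, int16_t *quot, int16_t *rem) {
    int32_t dividend = ax;            /* cwd: sign-extend AX into DX:AX */
    int32_t q;
    if (src == 0)
        return -1;                    /* division by zero */
    q = dividend / src;               /* truncates toward zero */
    if (q < INT16_MIN || q > INT16_MAX)
        return -1;                    /* quotient does not fit in AX */
    *quot = (int16_t)q;
    *rem = (int16_t)(dividend % src); /* same sign as the dividend */
    return 0;
}
```

Dividing -32768 by -1 is the classic signed overflow case: the quotient, +32768, does not fit in 16 bits.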

Our First Program
In the previous chapters, we looked at how the add and subtract instructions are used in assembly
language programs. Since we introduced the multiplication instructions in this chapter, we look
at how they are used in assembly language programs. Program 14.1 is a simple program to
multiply two 32-bit integers and display the result.
Program 14.1 Multiplication program to multiply two 32-bit signed integers
 1: ;Multiplies two 32-bit signed integers            MULT.ASM
 2: ;
 3: ;        Objective: To use the multiply instruction.
 4: ;            Input: Requests two integers N and M.
 5: ;           Output: Outputs N*M if no overflow.
 6: %include "io.mac"
 7:
 8: .DATA
 9: prompt_msg  db   "Enter two integers: ",0
10: output_msg  db   "The product = ",0
11: oflow_msg   db   "Sorry! Result out of range.",0
12: query_msg   db   "Do you want to quit (Y/N): ",0
13:
14: .CODE
15:       .STARTUP
16: read_input:
17:       PutStr  prompt_msg
18:       GetLInt EAX
19:       GetLInt EBX
20:       imul    EBX              ; signed multiply
21:       jc      overflow
22:       PutStr  output_msg       ; no overflow
23:       PutLInt EAX              ; display result
24:       nwln
25:       jmp     short user_query
26: overflow:
27:       PutStr  oflow_msg
28:       nwln
29: user_query:
30:       ; query user whether to terminate
31:       PutStr  query_msg
32:       GetCh   AL
33:       cmp     AL,'Y'           ; if response is not 'Y'
34:       jne     read_input       ; repeat the loop
35: done:
36:       .EXIT

An example interaction with the program is shown below:
Enter two integers: 65535
32768
The product = 2147450880
Do you want to quit (Y/N): n
Enter two integers: 65535
32769
Sorry! Result out of range.
Do you want to quit (Y/N): Y

If there is no overflow, the result is displayed; otherwise, an error message is displayed. In both
cases, the user is queried if the program is to be continued.
The two input numbers are read into the EAX and EBX registers using GetLInt on lines 18
and 19. Since the two values are signed integers, we use imul to multiply these two integers.
Recall that the multiply instructions set the carry flag if the result requires more than 32 bits.
While this condition is technically not an error, for practical purposes we treat this as an overflow.
We use the conditional jump instruction on line 21 to detect this overflow condition. If there is
no overflow, we display the 32-bit result (line 23). The rest of the program is straightforward to
follow.
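The overflow test in Program 14.1 amounts to asking whether the full 64-bit product fits in 32 bits. Below is a C sketch of that check. `multiply32` is a hypothetical name; it returns 1 when the result can be displayed and 0 when the program would take the overflow branch.

```c
#include <assert.h>
#include <stdint.h>

/* Form the full 64-bit product, as imul EBX does in EDX:EAX, and accept it
   only if it fits in 32 bits (the case in which imul leaves CF = OF = 0). */
int multiply32(int32_t n, int32_t m, int32_t *product) {
    int64_t wide = (int64_t)n * m;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return 0;                    /* corresponds to the jc overflow path */
    *product = (int32_t)wide;
    return 1;
}
```

With the inputs from the sample run, 65535 x 32768 fits (2147450880). However, 65535 x 32769 = 2147516415 exceeds 2147483647 and is rejected.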

Illustrative Examples
To demonstrate the application of the arithmetic instructions and flags, we write two procedures
to input and output signed 8-bit integers in the range of -128 to +127. These procedures are as
follows:
      PutInt8    Displays a signed 8-bit integer that is in the AL register;
      GetInt8    Reads a signed 8-bit integer from the keyboard into the AL register.

The following two subsections describe these procedures in detail.
Example 14.7 PutInt8 procedure.
Our objective here is to write a procedure that displays the signed 8-bit integer that is in the AL
register. In order to do this, we have to separate individual digits of the number to be displayed
and convert them to their ASCII representation. The steps involved are illustrated by the following
example, which assumes that AL has 108.
      separate 1 -> convert to ASCII (31H) -> display
      separate 0 -> convert to ASCII (30H) -> display
      separate 8 -> convert to ASCII (38H) -> display
Separating individual digits is the heart of the procedure. This step is surprisingly simple! All
we have to do is repeatedly divide the number by 10, as shown below (for a related discussion, see
Appendix A):
                  Quotient   Remainder
      108/10         10          8
       10/10          1          0
        1/10          0          1
The only problem with this step is that the digits come out in the reverse order. Therefore,
we need to buffer them before displaying. The pseudocode for the PutInt8 procedure is shown
below:
PutInt8 (number)
      if (number is negative)
      then
            display '-' sign
            number = -number {reverse sign}
      end if
      index = 0
      repeat
            quotient = number/10 {integer division}
            remainder = number % 10 {% is the modulo operator}
            buffer[index] = remainder + 30H
                  {save the ASCII character equivalent of remainder}
            index = index + 1
            number = quotient
      until (number = 0)
      repeat
            index = index - 1
            display digit at buffer[index]
      until (index = 0)
end PutInt8


Program 14.2 The PutInt8 procedure to display an 8-bit signed number (in getput.asm file)
 1: ;-----------------------------------------------------------
 2: ;PutInt8 procedure displays a signed 8-bit integer that is
 3: ;in the AL register. All registers are preserved.
 4: ;-----------------------------------------------------------
 5: PutInt8:
 6:       enter   3,0              ; reserves 3 bytes of buffer space
 7:       push    AX
 8:       push    BX
 9:       push    ESI
10:       test    AL,80H           ; negative number?
11:       jz      positive
12: negative:
13:       PutCh   '-'              ; sign for negative numbers
14:       neg     AL               ; convert to magnitude
15: positive:
16:       mov     BL,10            ; divisor = 10
17:       sub     ESI,ESI          ; ESI = 0 (ESI points to buffer)
18: repeat1:
19:       sub     AH,AH            ; AH = 0 (AX is the dividend)
20:       div     BL               ; AX/BL
21:                                ; leaves AL = quotient & AH = remainder
22:       add     AH,'0'           ; convert remainder to ASCII
23:       mov     [EBP+ESI-3],AH   ; copy into the buffer
24:       inc     ESI
25:       cmp     AL,0             ; quotient = zero?
26:       jne     repeat1          ; if so, display the number
27: display_digit:
28:       dec     ESI
29:       mov     AL,[EBP+ESI-3]   ; display digit pointed by ESI
30:       PutCh   AL
31:       jnz     display_digit    ; if ESI = 0, done displaying
32: display_done:
33:       pop     ESI              ; restore registers
34:       pop     BX
35:       pop     AX
36:       leave                    ; clears local buffer space
37:       ret

The PutInt8 procedure shown in Program 14.2 follows the logic of the pseudocode. Some
points to note are the following:
• The buffer is considered as a local variable. Thus, we reserve three bytes on the stack using
the enter instruction (see line 6).
• The code
      test    AL,80H
      jz      positive
tests whether the number is negative or positive. Remember that the sign bit (the leftmost
bit) is 1 for a negative number.
• Reversal of sign is done by the
      neg     AL
instruction on line 14.
• Note that we have to initialize AH with 0 (line 19), as the div instruction assumes a 16-bit
dividend in the AX register when the divisor is an 8-bit number.
• Conversion to ASCII character representation is done on line 22 using
      add     AH,'0'
• The ESI register is used as the index into the buffer, which starts at [EBP - 3]. Thus,
[EBP + ESI - 3] points to the current byte in the buffer (line 29).
• The repeat-until condition (index = 0) is tested by
      jnz     display_digit
on line 31.
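The PutInt8 logic can also be sketched in C. The hypothetical `put_int8` below writes the digits into a string instead of displaying them, but it follows the pseudocode: take the magnitude, peel off digits with / and %, buffer them, and emit them in reverse order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the decimal text of n into out, which must hold at least
   5 chars ("-128" plus the terminating NUL). */
void put_int8(int8_t n, char out[5]) {
    char buffer[3];                   /* like the 3 bytes reserved by enter 3,0 */
    int index = 0, pos = 0;
    unsigned number = (unsigned)(n < 0 ? -(int)n : (int)n);  /* -128 -> 128 */

    assert(out != NULL);
    if (n < 0)
        out[pos++] = '-';
    do {                              /* digits come out least significant first */
        buffer[index++] = (char)('0' + number % 10);
        number /= 10;
    } while (number != 0);
    while (index > 0)                 /* emit them in reverse order */
        out[pos++] = buffer[--index];
    out[pos] = '\0';
}
```

Taking the magnitude in an `unsigned` avoids the one awkward case. The magnitude of -128 is +128, which does not fit in a signed byte, just as `neg AL` leaves 80H, which div then treats as unsigned 128.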
Example 14.8 GetInt8 procedure.
The GetInt8 procedure reads a signed integer and returns the number in the AL register. Since
only 8 bits are used to represent the number, the range is limited to -128 to +127 (both inclusive).
The key part of the procedure converts a sequence of input digits received in the character form
to its binary equivalent. The conversion process, which involves repeated multiplication by 10, is
illustrated for the number 158:
      Input digit      Numeric value    Number = number * 10 + numeric value
      Initial value         -           0
      '1' (31H)             1           0 * 10 + 1 = 1
      '5' (35H)             5           1 * 10 + 5 = 15
      '8' (38H)             8           15 * 10 + 8 = 158

The pseudocode of the GetInt8 procedure is as follows:
GetInt8 ()
      read input character into char
      if ((char = '-') OR (char = '+'))
      then
            sign = char
            read the next character into char
      end if
      number = char - '0' {convert to numeric value}
      count = 2 {number of remaining digits to read}
      repeat
            read the next character into char
            if (char ≠ carriage return)
            then
                  number = number * 10 + (char - '0')
            else
                  goto convert_done
            end if
            count = count - 1
      until (count = 0)
convert_done:
      {check for out-of-range error}
      if ((number > 128) OR ((number = 128) AND (sign ≠ '-')))
      then
            out of range error
            set carry flag
      else {number is OK}
            clear carry flag
      end if
      if (sign = '-')
      then
            number = -number {reverse sign}
      end if
end GetInt8

Program 14.3 The GetInt8 procedure to read a signed 8-bit integer (in getput.asm file)
 1: ;-----------------------------------------------------------
 2: ;GetInt8 procedure reads an integer from the keyboard and
 3: ;stores its equivalent binary in AL register. If the number
 4: ;is within -128 and +127 (both inclusive), CF is cleared;
 5: ;otherwise, CF is set to indicate out-of-range error.
 6: ;No error check is done to see if the input consists of
 7: ;digits only. All registers are preserved except for AX.
 8: ;-----------------------------------------------------------
 9: GetInt8:
10:       push    BX               ; save registers
11:       push    CX
12:       push    DX
13:       push    ESI
14:       sub     DX,DX            ; DX = 0
15:       sub     BX,BX            ; BX = 0
16:       GetStr  number,5         ; get input number
17:       mov     ESI,number
18: get_next_char:
19:       mov     DL,[ESI]         ; read input from buffer
20:       cmp     DL,'-'           ; is it negative sign?
21:       je      sign             ; if so, save the sign
22:       cmp     DL,'+'           ; is it positive sign?
23:       jne     digit            ; if not, process the digit
24: sign:
25:       mov     BH,DL            ; BH keeps sign of input number
26:       inc     ESI
27:       jmp     get_next_char
28: digit:
29:       sub     AX,AX            ; AX = 0
30:       mov     BL,10            ; BL holds the multiplier
31:       sub     DL,'0'           ; convert ASCII to numeric
32:       mov     AL,DL
33:       mov     CX,2             ; maximum two more digits to read
34: convert_loop:
35:       inc     ESI
36:       mov     DL,[ESI]
37:       cmp     DL,0             ; NULL?
38:       je      convert_done     ; if so, done reading the number
39:       sub     DL,'0'           ; else, convert ASCII to numeric
40:       mul     BL               ; multiply total (in AL) by 10
41:       add     AX,DX            ; and add the current digit
42:       loop    convert_loop
43: convert_done:
44:       cmp     AX,128
45:       ja      out_of_range     ; if AX > 128, number out of range
46:       jb      number_OK        ; if AX < 128, number is valid
47:       cmp     BH,'-'           ; if AX = 128, must be a negative;
48:       jne     out_of_range     ; otherwise, an invalid number
49: number_OK:
50:       cmp     BH,'-'           ; number negative?
51:       jne     number_done      ; if not, we are done
52:       neg     AL               ; else, convert to 2's complement
53: number_done:
54:       clc                      ; CF = 0 (no error)
55:       jmp     done
56: out_of_range:
57:       stc                      ; CF = 1 (range error)
58: done:
59:       pop     ESI              ; restore registers
60:       pop     DX
61:       pop     CX
62:       pop     BX
63:       ret

The assembly language code for the GetInt8 procedure is given in Program 14.3. The
procedure uses GetStr to read the input digits into a buffer number. This buffer is 5 bytes long
so that it can hold the sign, 3 digits, and a null character. Thus, we specify 5 in GetStr on line 16.
• The character input digits are converted to their numeric equivalent by subtracting '0' on
lines 31 and 39.
• The multiplication is done on line 40, which produces a 16-bit result in AX. Note that the
current digit is added (line 41) as the 16-bit value in DX rather than as the 8-bit value in DL,
so that the running total in AX can exceed 255 and the out-of-range condition can still be detected.
• When the conversion is done, AX will have the absolute value of the input number. Lines 44
to 48 perform the out-of-range error check. To do this check, the following conditions are
tested:


      AX > 128  =>  out of range
      AX = 128  =>  input must be a negative number to be a valid
                    number; otherwise, out of range
The ja (jump if above) and jb (jump if below) on lines 45 and 46 are conditional jumps
for unsigned numbers.
• If the input is a negative number, the value in AL is converted to the 2's complement representation by using the neg instruction (line 52).
• The clc (clear CF) and stc (set CF) instructions are used to indicate the error condition
(lines 54 and 57).
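A C sketch of the same conversion and range check follows. `get_int8` is a hypothetical name. It reads from a NUL-terminated string rather than calling GetStr, and it returns -1 where GetInt8 would set CF. It also adds one check that the original lacks: an input with no digits is rejected. Like the original, it does not check whether the characters are digits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 and stores the value on success; -1 for out-of-range input. */
int get_int8(const char *s, int8_t *value) {
    char sign = 0;
    unsigned number;

    while (*s == '-' || *s == '+')      /* like the jump back to get_next_char */
        sign = *s++;
    if (*s == '\0')
        return -1;                      /* added check: no digits at all */

    number = (unsigned)(*s - '0');      /* first digit */
    for (int count = 2; count > 0 && s[1] != '\0'; count--) {
        s++;                            /* at most two more digits */
        number = number * 10 + (unsigned)(*s - '0');
    }

    /* a magnitude of 128 is acceptable only as -128 */
    if (number > 128 || (number == 128 && sign != '-'))
        return -1;
    *value = (int8_t)(sign == '-' ? -(int)number : (int)number);
    return 0;
}
```

As in the worked example above, "158" is converted to 158 and then rejected by the range check.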

Summary
The status flags register the outcome of arithmetic and logical operations. Of the six status flags,
zero flag, carry flag, overflow flag, and sign flag are the most important. The zero flag records
whether the result of an operation is zero or not. The sign flag monitors the sign of the result. The
carry and overflow flags record the overflow conditions of the arithmetic operations. The carry
flag is set if the result on unsigned numbers is out of range; the overflow flag is used to indicate
the out-of-range condition on the signed numbers.
The IA-32 instruction set includes instructions for addition, subtraction, multiplication, and
division. While the add and subtract instructions work on both unsigned and signed data, separate instructions are required for signed and unsigned numbers for performing multiplication and
division operations.
The arithmetic instructions can operate on 8-, 16-, or 32-bit operands. If numbers are represented using more than 32 bits, we need to devise methods for performing the arithmetic operations on multiword operands. We gave an example to illustrate how multiword addition could be
implemented.
We demonstrated that multiplication by special values (for example, multiplication by 10) can
be done more efficiently by using addition. Chapter 16 discusses how the shift operations can be
used to implement multiplication by a power of 2.

15
Conditional Execution
Assembly language provides several instructions to facilitate conditional execution. We have discussed some of these instructions like jmp and loop in Chapter 9. Our discussion here complements that discussion. In this chapter, we give more details on these instructions including how the
target address is specified, how the flags register is used to implement conditional jumps, and so
on. The jump instructions we have used so far specify the target address directly. It is also possible
to specify the target of a jump indirectly. We describe how the target can be specified indirectly and
illustrate the use of such indirect jumps by means of an example.

Introduction
Modern high-level languages provide a variety of decision structures. These structures include
selection structures such as if-then-else and iterative structures such as while and for
loops. Assembly language, being a low-level language, does not provide these structures directly.
However, assembly language provides several basic instructions that could be used to construct
these high-level language selection and iteration structures. These assembly language instructions
include the unconditional jump, compare, conditional jump, and loop. We briefly introduced some
of these instructions in Chapter 9. In this chapter, we give more details on these instructions.
As we have seen in the previous chapters, we can specify the target address directly. In assembly language programs, we do this by specifying a label associated with the target instruction. The
assembler replaces the label with the address. In general, this address can be a relative address
or an absolute address. If the address is relative, the offset of the target is specified relative to
the current instruction. In the absolute address case, the target address itself is given. We start with a
discussion of these details.
We can also specify the target address indirectly, just like the address given in the indirect
addressing mode. In these indirect jumps, the address is specified via a register or memory. We
describe the indirect jump mechanism toward the end of the chapter. We also illustrate how the
indirect jump instructions are useful in implementing multiway switch or case statements.
The IA-32 instruction set provides three types of conditional jump instructions. These include
the jump instructions that test the individual flag values, jumps based on signed comparisons, and
jumps based on unsigned comparisons. Our discussion of these conditional jump instructions on
page 322 throws light on how the processor uses the flags to test for the various conditions.


Unconditional Jump
We introduced the unconditional jump instruction in Chapter 9. It unconditionally transfers control
to the instruction located at the target address. The general format, as we have seen before, is
jmp

target

There are several versions of the j mp instruction depending on how the target address is specified
and where the target instruction is located.
Specification of Target
There are two distinct ways by which the target address of the jmp instruction can be specified:
direct and indirect. The vast majority of jumps are of the direct type. We have used these types
of unconditional jumps in the previous chapters. Therefore, we focus our attention on the direct
jump instructions and discuss the indirect jumps toward the end of the chapter.
Direct Jumps In the direct jump instruction, the target address is specified directly as part of the
instruction. In the following code fragment

            mov    CX,10
            jmp    CX_init_done
    init_CX_20:
            mov    CX,20
    CX_init_done:
            mov    AX,CX
    repeat1:
            dec    CX
            jmp    repeat1

both the jmp instructions directly specify the target. As an assembly language programmer, you
only specify the target address by using a label; the assembler figures out the exact value by using
its symbol table.
The instruction

    jmp    CX_init_done

transfers control to an instruction that follows it. This is called a forward jump. On the other
hand, the instruction

    jmp    repeat1

is a backward jump, as the control is transferred to an instruction that precedes the jump instruction.
Relative Address The address specified in a jump instruction is not the absolute address of the
target. Rather, it specifies the relative displacement in bytes between the target instruction and the
instruction following the jump instruction (and not from the jump instruction itself!).


In order to see why this is so, we have to understand how jumps are executed. Recall that the
EIP register always points to the next instruction to be executed (see Chapter 4). Thus, after fetching the jmp instruction, the EIP is automatically advanced to point to the instruction following the
jmp instruction. Execution of jmp involves changing the EIP from where it is currently pointing to the target instruction location. This is achieved by adding the difference (i.e., the relative
displacement) to the EIP contents. This works fine because the relative displacement is a signed
number—a positive displacement implies a forward jump and a negative displacement indicates a
backward jump.
The specification of relative address as opposed to absolute address of the target instruction is
appropriate for dynamically relocatable code (i.e., for position-independent code).
Where Is the Target? If the target of a jump instruction is located in the same segment as the
jump itself, it is called an intrasegment jump; if the target is located in another segment, it is called
an intersegment jump.
Our previous discussion has assumed an intrasegment jump. In this case, the jmp simply
performs the following action:
EIP = EIP + relative-displacement
In the case of an intersegment jump, called a far jump, the CS register is also changed to point to the
target segment, as shown below:
CS = target-segment
EIP = target-offset
Both target-segment and target-offset are specified directly in the instruction. Thus, for 32-bit
segments, the instruction encoding for the intersegment jump takes seven bytes: one byte for the
specification of the opcode, two bytes for the target-segment, and four bytes for the target-offset
specification.
The majority of jumps are of the intrasegment type. Therefore, more flexibility is provided to
specify the target in intrasegment jump instructions. These instructions can have short and near
format, depending on the distance of the target location from the instruction following the jump
instruction—that is, depending on the value of the relative displacement.
If the relative displacement, which is a signed number, can fit in a byte, a jump instruction is
encoded using just two bytes: one byte for the opcode and the other for the relative displacement.
This means that the relative displacement should be within -128 to +127 (the range of a signed
8-bit number). This form is called the short jump.
If the target is outside this range, 2 or 4 bytes are used to specify the relative displacement. A
two-byte displacement is used for 16-bit segments, and 4-byte displacement for 32-bit segments.
As a result, the jump instruction requires either 3 or 5 bytes to encode in the machine language.
This form is called the near jump.
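The short/near size rule can be written down in a few lines of C. This is only a sketch under a simplifying assumption: the displacement is treated as already known, whereas a real assembler must also account for the fact that the displacement depends on the instruction length it chooses. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a relative displacement fits in a signed byte (-128..+127). */
bool fits_short(int32_t disp)
{
    return disp >= -128 && disp <= 127;
}

/* Encoded length, in bytes, of an intrasegment jmp with the given
   displacement: short form (EB disp8) when possible, otherwise the near
   form (E9 followed by a 2-byte or 4-byte displacement). */
int jmp_length(int32_t disp, int segment_bits)
{
    if (fits_short(disp))
        return 2;                         /* opcode + 1-byte displacement */
    return segment_bits == 16 ? 3 : 5;    /* opcode + 2 or 4 bytes */
}
```

For instance, a displacement of 12 gives a 2-byte short jump, while -217 needs the near form: 3 bytes in a 16-bit segment and 5 bytes in a 32-bit segment.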
If you want to use the short jump form, you can inform the assembler of your intention by
using the operator SHORT, as shown below:

    jmp    SHORT CX_init_done

The question that naturally arises at this point is: What if the target is not within -128 to +127
bytes? The assembler will inform you with an error message that the target can't be reached with
a short jump.
In fact, specification of SHORT in a statement like

    jmp    SHORT repeat1

in the example code on page 318 is redundant, as the assembler can automatically select the
SHORT jump, if appropriate, for all backward jumps. However, for forward jumps, the assembler needs your help. This is because the assembler does not know the relative displacement of
the target when it must decide whether to use the short form. Therefore, use the SHORT operator
only for forward jumps if appropriate.

    Line  Offset  Machine code           Source
       8  0005    EB 0C                  jmp    SHORT CX_init_done
       9  0007    B9 000A                mov    CX,10
      10  000A    EB 07 90               jmp    CX_init_done
      11                         init_CX_20:
      12  000D    B9 0014                mov    CX,20
      13  0010    E9 00D0                jmp    near_jump
      14                         CX_init_done:
      15  0013    8B C1                  mov    AX,CX
      16                         repeat1:
      17  0015    49                     dec    CX
      18  0016    EB FD                  jmp    repeat1
       . . .
      84  00DB    EB 03                  jmp    SHORT short_jump
      85  00DD    B9 FF00                mov    CX,0FF00H
      86                         short_jump:
      87  00E0    BA 0020                mov    DX,20H
      88                         near_jump:
      89  00E3    E9 FF27                jmp    init_CX_20

Figure 15.1 Example encoding of jump instructions.
Example 15.1 Example encodings of short and near jumps.
Figure 15.1 shows some example encodings for short and near jump instructions. The forward
short jump on line 8 is encoded in the machine language as EB 0C, where EB represents the
opcode for the short jump. The relative offset to target CX_init_done is 0CH. From the code,
it can be seen that this is the difference between the address of the target (address 0013H) and
the instruction following the jump instruction on line 9 (address 0007H). Another example of a
forward short jump is given on line 84.
The backward jump instruction on line 18 also uses the short jump form. In this case, the
assembler can decide whether the short or near jump is appropriate. The relative offset is given by
FDH, which is -3 in decimal. This is the offset from the instruction following the jump instruction
at address 0018H to repeat1 at 0015H.
For near jumps, the opcode is E9H; because this code uses 16-bit segments, the relative offset is a 16-bit signed integer. The
relative offset of the forward near jump on line 13 is 00D0H, which is equal to 00E3H - 0013H.
The relative offset of the backward near jump on line 89 is given by 000DH - 00E6H = FF27H,
which is equal to -217 in decimal.
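The displacement arithmetic of this example can be checked mechanically. The C sketch below (function names are hypothetical) assumes, as in Figure 15.1, 16-bit offsets, a 2-byte short jump, and a 3-byte near jump. It subtracts the address of the next instruction from the target and keeps only as many low-order bits as the displacement field holds, which yields the two's complement values shown in the listing.

```c
#include <stdint.h>

/* Displacement byte of a 2-byte short jump (EB disp8) at jmp_addr. */
uint8_t short_disp8(uint16_t jmp_addr, uint16_t target)
{
    uint16_t next = (uint16_t)(jmp_addr + 2);   /* instruction after the jump */
    return (uint8_t)(target - next);            /* keep the low 8 bits */
}

/* Displacement word of a 3-byte near jump (E9 disp16) at jmp_addr. */
uint16_t near_disp16(uint16_t jmp_addr, uint16_t target)
{
    uint16_t next = (uint16_t)(jmp_addr + 3);
    return (uint16_t)(target - next);           /* keep the low 16 bits */
}
```

For example, short_disp8(0x0005, 0x0013) gives 0x0C (line 8), short_disp8(0x0016, 0x0015) gives 0xFD (line 18), and near_disp16(0x00E3, 0x000D) gives 0xFF27 (line 89). The line 10 jump is encoded in the 2-byte short form, so short_disp8(0x000A, 0x0013) gives its displacement of 07.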

The jump instruction encoding on line 10 requires some explanation. Since this is a forward
jump and we have not specified that it could be a short jump, the assembler reserves 3 bytes for a near
jump (the worst-case scenario). At the time of actual encoding, the assembler knows the target
location and therefore uses the short jump version. Thus, EB 07 represents the encoding, and the
third byte is not used and contains a nop (no operation, opcode 90H).
•

Compare Instruction
Implementation of high-level language decision structures like if-then-else in assembly language is a two-step process:
1. An arithmetic or comparison instruction updates one or more arithmetic flags;
2. A conditional jump instruction causes selective execution of the appropriate code fragment
based on the values of the flags.
We discussed the compare (cmp) instruction on page 199. The main purpose of the cmp
instruction is to update the flags so that a subsequent conditional jump instruction can test these
flags.
Example 15.2 Some examples of the compare instruction.
The four flags that are useful in establishing a relationship (<, ≤, >, and so on) between two
integers are CF, ZF, SF, and OF. Table 15.1 gives some examples of executing the

    cmp    AL,DL

instruction.

Table 15.1 Some examples of cmp AL,DL

      AL      DL    CF  ZF  SF  OF  PF  AF
      56      57     1   0   1   0   1   1
     200     101     0   0   0   1   1   0
     101     200     1   0   1   1   0   1
     200     200     0   1   0   0   1   0
    -105    -105     0   1   0   0   1   0
    -125    -124     1   0   1   0   1   1
    -124    -125     0   0   0   0   0   0

Recall that CF is set if the result is out of range when treating the operands as unsigned
numbers. Since the operands are 8 bits in our example, this range is 0 to 255D. Similarly, the OF
is set if the result is out of range for signed numbers (for our example, this range is -128D to
+127D).
In general, the values of ZF and SF can be obtained in a straightforward way. Therefore, let us
focus on the carry and overflow flags. In the first example, since 56 - 57 = -1, CF is set but not OF.
The second example is not so simple. Treating the operands in AL and DL as unsigned numbers,
200 - 101 = 99, which is within the range of unsigned numbers. Therefore, CF = 0. However, when
treating 200D (= C8H) as a signed number, it represents -56D. Therefore, compare performs
-56 - 101 = -157, which is out of range for signed numbers, resulting in setting OF. We will leave
verification of the rest of the examples as an exercise.
•
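The flag values in Table 15.1 can be reproduced with a small C model of cmp on 8-bit operands. It is a sketch, not a description of the hardware: it computes AL - DL, discards the result as cmp does, and derives each flag from the operands and the 8-bit result. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of, pf, af; } Flags;

/* Flags set by "cmp AL,DL" on 8-bit operands. */
Flags cmp8(uint8_t al, uint8_t dl)
{
    uint8_t r = (uint8_t)(al - dl);          /* result that cmp discards */
    Flags f;
    f.cf = al < dl;                          /* borrow: unsigned out of range */
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;                  /* bit 7 of the result */
    /* signed overflow: operands have different signs and the result's
       sign differs from the sign of AL */
    f.of = (((al ^ dl) & (al ^ r)) & 0x80) != 0;
    int ones = 0;
    for (int b = 0; b < 8; b++)
        ones += (r >> b) & 1;
    f.pf = (ones % 2) == 0;                  /* even number of 1 bits */
    f.af = (al & 0x0F) < (dl & 0x0F);        /* borrow out of the low nibble */
    return f;
}
```

Calling cmp8(200, 101), for example, yields CF = 0 and OF = 1, matching the second row of the table; negative operands are passed as their 8-bit patterns, e.g., (uint8_t)-125.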

Conditional Jumps
Conditional jump instructions can be divided into three groups:
1. Jumps based on the value of a single arithmetic flag;
2. Jumps based on unsigned comparisons;
3. Jumps based on signed comparisons.
Jumps Based on Single Flags
The IA-32 instruction set provides two conditional jump instructions—one for jumps if the flag
tested is set, and the other for jumps when the flag is cleared—for each arithmetic flag except the
auxiliary flag. These instructions are summarized in Table 15.2.
As shown in Table 15.2, the jump instructions that test the zero and parity flags have aliases
(e.g., je is an alias for jz). These aliases are provided to improve program readability. For
example,

    if (count = 100)
    then
        <statement 1>
    end if

can be written in the assembly language as

            cmp    count,100
            jz     S1
            . . .
    S1:     <statement 1>

But our use of jz does not convey that we are testing for equality. This meaning is better conveyed
by

            cmp    count,100
            je     S1
            . . .
    S1:     <statement 1>

The assembler, however, treats both jz and je as synonymous instructions.
The only surprising instruction in Table 15.2 is the jecxz instruction. This instruction does
not test any flag but tests the contents of the ECX register for zero. It is often used in conjunction
with the loop instruction. Therefore, we defer a discussion of this instruction to a later section
that deals with the loop instruction.


Table 15.2 Jumps based on single flag value

    Mnemonic   Meaning                        Jumps if

    Testing for zero:
    jz         jump if zero                   ZF = 1
    je         jump if equal
    jnz        jump if not zero               ZF = 0
    jne        jump if not equal
    jecxz      jump if ECX = 0                ECX = 0 (no flags tested)

    Testing for carry:
    jc         jump if carry                  CF = 1
    jnc        jump if no carry               CF = 0

    Testing for overflow:
    jo         jump if overflow               OF = 1
    jno        jump if no overflow            OF = 0

    Testing for sign:
    js         jump if (negative) sign        SF = 1
    jns        jump if no (negative) sign     SF = 0

    Testing for parity:
    jp         jump if parity                 PF = 1
    jpe        jump if parity is even
    jnp        jump if not parity             PF = 0
    jpo        jump if parity is odd

Jumps Based on Unsigned Comparisons
When comparing two numbers

    cmp    num1,num2

it is necessary to know whether these numbers num1 and num2 represent signed or unsigned numbers in order to establish a relationship between them. As an example, assume that AL = 10110111B
and DL = 01101110B. Then the statement

    cmp    AL,DL

should appropriately update flags to yield that AL > DL if we treat their contents as representing
unsigned numbers. This is because, in unsigned representation, AL = 183D and DL = 110D. However, if the contents of AL and DL registers are treated as representing signed numbers, AL < DL
as the AL register has a negative number (-73D) while the DL register has a positive number
(+110D).

Table 15.3 Jumps based on unsigned comparison

    Mnemonic   Meaning                        Condition tested
    je         jump if equal                  ZF = 1
    jz         jump if zero
    jne        jump if not equal              ZF = 0
    jnz        jump if not zero
    ja         jump if above                  CF = 0 and ZF = 0
    jnbe       jump if not below or equal
    jae        jump if above or equal         CF = 0
    jnb        jump if not below
    jb         jump if below                  CF = 1
    jnae       jump if not above or equal
    jbe        jump if below or equal         CF = 1 or ZF = 1
    jna        jump if not above
Note that when using a cmp statement like

    cmp    num1,num2

we compare num1 to num2 (e.g., num1 < num2, num1 > num2, and so on). There are six possible
relationships between two numbers:

    num1 = num2
    num1 ≠ num2
    num1 > num2
    num1 ≥ num2
    num1 < num2
    num1 ≤ num2

For the unsigned numbers, the carry and the zero flags record the necessary information in order
to establish one of the above six relationships.
The six conditional jump instructions (along with six aliases) and the flag conditions tested
are shown in Table 15.3. Note that "above" and "below" are used for > and < relationships for
the unsigned comparisons, reserving "greater" and "less" for signed comparisons, as we shall see
next.
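The following C sketch (all names are hypothetical) makes the AL/DL example concrete. The bit pattern 10110111B reads as 183 when unsigned but as -73 when signed. The four unsigned conditions of Table 15.3 can then be evaluated from the CF and ZF values that cmp leaves behind.

```c
#include <stdbool.h>
#include <stdint.h>

/* The same 8-bit pattern read as unsigned and as two's complement signed. */
int as_unsigned8(uint8_t bits) { return bits; }
int as_signed8(uint8_t bits)   { return bits < 0x80 ? bits : bits - 256; }

/* CF and ZF left behind by "cmp num1,num2" on 8-bit operands. */
void cmp8_cf_zf(uint8_t num1, uint8_t num2, bool *cf, bool *zf)
{
    *cf = num1 < num2;                     /* borrow in unsigned subtraction */
    *zf = (uint8_t)(num1 - num2) == 0;
}

/* Conditions tested by the unsigned jumps of Table 15.3. */
bool cond_ja(bool cf, bool zf)  { return !cf && !zf; }
bool cond_jae(bool cf)          { return !cf; }
bool cond_jb(bool cf)           { return cf; }
bool cond_jbe(bool cf, bool zf) { return cf || zf; }
```

With AL = 0xB7 and DL = 0x6E, cmp8_cf_zf leaves CF = 0 and ZF = 0, so the ja condition holds: as unsigned numbers, AL is above DL. Looping over all 256 × 256 operand pairs and comparing each condition with C's own unsigned comparison is a quick way to convince yourself that the table is complete.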


Jumps Based on Signed Comparisons

The = and ≠ comparisons work with either signed or unsigned numbers, as we essentially compare
the bit pattern for a match. For this reason, je and jne also appear in Table 15.4 for signed
comparisons.
For signed comparisons, three flags record the necessary information: the sign flag (SF), the
overflow flag (OF), and the zero flag (ZF). Testing for = and ≠ simply involves testing whether
the ZF is set or cleared, respectively. With signed numbers, establishing < and > relationships
is somewhat tricky. Let us assume that we are executing the cmp instruction

    cmp    Snum1,Snum2

Conditions for Snum1 > Snum2 The following table shows several examples in which Snum1 > Snum2 holds.

    Snum1   Snum2   ZF  OF  SF
      56      55     0   0   0
      56     -55     0   0   0
     -55     -56     0   0   0
      55     -75     0   1   1

It appears from these examples that Snum1 > Snum2 if

    ZF = 0 and OF = 0 and SF = 0
        or
    ZF = 0 and OF = 1 and SF = 1

That is, ZF = 0 and OF = SF. We cannot use just OF = SF because if two numbers are equal,
ZF = 1 and OF = SF = 0. In fact, these conditions do imply the "greater than" relationship between
Snum1 and Snum2. As shown in Table 15.4, these are the conditions tested for the jg conditional
jump.
Conditions for Snum1 < Snum2 Again, as in the previous case, we develop our intuition by
means of a few examples. The following table shows several examples in which Snum1 <
Snum2 holds.

    Snum1   Snum2   ZF  OF  SF
      55      56     0   0   1
     -55      56     0   0   1
     -56     -55     0   0   1
     -75      55     0   1   0

It appears from these examples that Snum1 < Snum2 holds if the following conditions are
true:

    ZF = 0 and OF = 0 and SF = 1
        or
    ZF = 0 and OF = 1 and SF = 0

That is, ZF = 0 and OF ≠ SF. In this case, ZF = 0 is redundant: if the two numbers are equal,
ZF = 1 and OF = SF = 0, so OF ≠ SF already rules out equality. The condition therefore reduces to
OF ≠ SF. As indicated in Table 15.4, this is the condition tested by the jl conditional jump
instruction.

Table 15.4 Jumps based on signed comparison

    Mnemonic   Meaning                          Condition tested
    je         jump if equal                    ZF = 1
    jz         jump if zero
    jne        jump if not equal                ZF = 0
    jnz        jump if not zero
    jg         jump if greater                  ZF = 0 and SF = OF
    jnle       jump if not less or equal
    jge        jump if greater or equal         SF = OF
    jnl        jump if not less
    jl         jump if less                     SF ≠ OF
    jnge       jump if not greater or equal
    jle        jump if less or equal            ZF = 1 or SF ≠ OF
    jng        jump if not greater
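The signed conditions can be checked exhaustively for 8-bit operands with a short C sketch (hypothetical names). It derives ZF, SF, and OF the way cmp does, then evaluates the conditions of Table 15.4 from those flags alone.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, sf, of; } SignedFlags;

/* ZF, SF and OF left by "cmp Snum1,Snum2" on 8-bit operands. */
SignedFlags scmp8(int8_t snum1, int8_t snum2)
{
    uint8_t a = (uint8_t)snum1, b = (uint8_t)snum2;
    uint8_t r = (uint8_t)(a - b);                 /* 8-bit result of cmp */
    SignedFlags f;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;
    /* overflow: operand signs differ and the result's sign differs from a's */
    f.of = (((a ^ b) & (a ^ r)) & 0x80) != 0;
    return f;
}

/* Conditions tested by the signed jumps of Table 15.4. */
bool cond_jg(SignedFlags f)  { return !f.zf && f.sf == f.of; }
bool cond_jge(SignedFlags f) { return f.sf == f.of; }
bool cond_jl(SignedFlags f)  { return f.sf != f.of; }
bool cond_jle(SignedFlags f) { return f.zf || f.sf != f.of; }
```

scmp8(55, -75) reproduces the last row of the "greater than" table (OF = SF = 1), and scmp8(-75, 55) the last row of the "less than" table (OF = 1, SF = 0). Comparing each cond_* function with C's signed comparison over all 65,536 operand pairs confirms the table entries.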
A Note on Conditional Jumps

In their short form, conditional jump instructions are encoded into the machine language using only 2 bytes (like
the short jump instruction). With this form, the target instruction of a conditional jump must be within 128 bytes before or 127 bytes after the instruction
following the conditional jump instruction. (Processors from the 80386 onward also provide a near form of the conditional jumps with a 16- or 32-bit displacement; the discussion here assumes that only the short form is used.)
What if the target is outside this range? If the target is not reachable by using a short jump,
you can use the following trick to overcome this limitation of the conditional jump instructions.
For example, in the instruction sequence

    target:
            . . .
            cmp    EAX,EBX
            je     target        ; target is not a short jump
            mov    ECX,10

if target is not reachable by a short jump, it should be replaced by

    target:
            . . .
            cmp    EAX,EBX
            jne    skip1         ; skip1 is a short jump
            jmp    target
    skip1:
            mov    ECX,10

What we have done here is negated the test condition (je becomes jne) and used an unconditional
jump to transfer control to target. Recall that the jmp instruction has both short and near versions.

Looping Instructions
Instructions in this group use the CX or ECX register to maintain repetition count. The CX register
is used if the address size is 16 bits; ECX is used if it is 32 bits. In the following discussion,
we assume 32-bit addressing. The three loop instructions decrement the ECX register
before testing it for zero. Decrementing ECX does not affect any of the flags. The format of these
instructions along with the action taken is shown below.

    Mnemonic        Meaning                 Action
    loop target     loop                    ECX = ECX - 1
                                            if ECX ≠ 0, jump to target
    loope target    loop while equal        ECX = ECX - 1
    loopz target    loop while zero         if (ECX ≠ 0 and ZF = 1), jump to target
    loopne target   loop while not equal    ECX = ECX - 1
    loopnz target   loop while not zero     if (ECX ≠ 0 and ZF = 0), jump to target

The destination specified in these instructions should be reachable by a short jump. This is
a consequence of using the two-byte encoding with a single byte indicating the relative displacement, which should be within -128 to +127.
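The decrement-then-test behaviour summarized in the table can be modelled in C. In this sketch (hypothetical names), each function performs one step of a loop instruction on a simulated ECX and reports whether the branch back to the target is taken. The zero flag is passed in unchanged, because the decrement does not touch it.

```c
#include <stdbool.h>
#include <stdint.h>

/* loop: ECX = ECX - 1; branch if ECX != 0. */
bool step_loop(uint32_t *ecx)
{
    *ecx -= 1;
    return *ecx != 0;
}

/* loope/loopz: ECX = ECX - 1; branch if ECX != 0 and ZF = 1. */
bool step_loope(uint32_t *ecx, bool zf)
{
    *ecx -= 1;
    return *ecx != 0 && zf;
}

/* loopne/loopnz: ECX = ECX - 1; branch if ECX != 0 and ZF = 0. */
bool step_loopne(uint32_t *ecx, bool zf)
{
    *ecx -= 1;
    return *ecx != 0 && !zf;
}
```

A loop body closed by loop corresponds to do { body } while (step_loop(&ecx)); in C, so entering it with ECX = 3 runs the body three times.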


We have seen how the loop instruction is useful in constructing loops. The other two loop
instructions are useful in writing loops that require two termination conditions. The following
example illustrates this point.

Our First Program
Let us say that we want to write a loop that reads a series of nonzero integers into an array. The
input can be terminated either when the array is full, or when the user types a zero, whichever
occurs first. The program is given below.

Program 15.1 A program to read long integers into an array

     1: ;Reading long integers into an array       READ_ARRAY.ASM
     2: ;
     3: ;        Objective: To read long integers into an array;
     4: ;                   demonstrates the use of loopne.
     5: ;            Input: Requests nonzero values to fill the array;
     6: ;                   a zero input terminates input.
     7: ;           Output: Displays the array contents.
     8: ;
     9: %include "io.mac"
    10:
    11: MAX_SIZE      EQU   20
    12:
    13: .DATA
    14: input_prompt  db    "Enter at most 20 nonzero values "
    15:               db    "(entering zero terminates input):",0
    16: out_msg       db    "The array contents are: ",0
    17: empty_msg     db    "The array is empty. ",0
    18: query_msg     db    "Do you want to quit (Y/N): ",0
    19:
    20: .UDATA
    21: array         resd  MAX_SIZE
    22:
    23: .CODE
    24:       .STARTUP
    25: read_input:
    26:       PutStr  input_prompt  ; request input array
    27:       xor     ESI,ESI       ; ESI = 0 (ESI is used as an index)
    28:       mov     ECX,MAX_SIZE
    29: read_loop:
    30:       GetLInt EAX
    31:       mov     [array+ESI*4],EAX
    32:       inc     ESI           ; increment array index
    33:       cmp     EAX,0         ; number = zero?
    34:       loopne  read_loop     ; iterates a maximum of MAX_SIZE
    35: exit_loop:
    36:       ; if the input is terminated by a zero,
    37:       ; decrement ESI to keep the array size
    38:       jne     skip
    39:       dec     ESI
    40: skip:
    41:       mov     ECX,ESI       ; ESI has the actual array size
    42:       jecxz   empty_array   ; if ECX = 0, empty array
    43:       xor     ESI,ESI       ; initialize index to zero
    44:       PutStr  out_msg
    45: write_loop:
    46:       PutLInt [array+ESI*4]
    47:       nwln
    48:       inc     ESI
    49:       loop    write_loop
    50:       jmp     short user_query
    51: empty_array:
    52:       PutStr  empty_msg     ; output empty array message
    53:       nwln
    54: user_query:
    55:       PutStr  query_msg     ; query user whether to terminate
    56:       GetCh   AL
    57:       cmp     AL,'Y'        ; if response is not 'Y'
    58:       jne     read_input    ; repeat the loop
    59: done:
    60:       .EXIT

The program has two loops: a read loop and a write loop. The read loop consists of lines 29-34. The loop termination conditions are implemented by the loopne instruction on line 34.
To facilitate termination of the loop after reading a maximum of MAX_SIZE integers, the ECX
register is initialized to MAX_SIZE on line 28. The other termination condition is tested on line 33.
The write loop consists of the code on lines 45-49. It uses the loop instruction (line 49) to
iterate the loop where the loop count in ECX is the number of valid integers given by the user.
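To see how loopne combines the two termination conditions, here is a C model of lines 28-39 of Program 15.1. It is a sketch under one stated assumption: the values that GetLInt would read from the keyboard come from a caller-supplied input[] array instead (a hypothetical stand-in), and reading past its end yields 0, as if the user had typed zero. The function returns the final ESI, that is, the number of valid values stored.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SIZE 20   /* same limit as Program 15.1 */

size_t read_array(const int32_t *input, size_t input_len,
                  int32_t array[MAX_SIZE])
{
    size_t esi = 0;
    uint32_t ecx = MAX_SIZE;                              /* mov ECX,MAX_SIZE */
    int zf;
    do {
        int32_t eax = esi < input_len ? input[esi] : 0;  /* GetLInt EAX (simulated) */
        array[esi] = eax;                                 /* mov [array+ESI*4],EAX */
        esi++;                                            /* inc ESI */
        zf = (eax == 0);                                  /* cmp EAX,0 */
        ecx--;                                            /* loopne: ECX = ECX - 1 ... */
    } while (ecx != 0 && !zf);                            /* ... branch if ECX != 0, ZF = 0 */
    if (zf)                    /* jne skip / dec ESI: do not count the zero */
        esi--;
    return esi;
}
```

With the input 5, -3, 7, 0 the loop stops at the zero and reports three values; with 25 nonzero values it stops after MAX_SIZE = 20 reads.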
However, we have a problem with the loop instruction: if the user did not enter any nonzero
integers, the count in ECX is zero. In this case, the write loop iterates the maximum number of
times (2^32 times, not zero times) because loop decrements ECX before testing it for zero, so ECX wraps
around from 0 to FFFFFFFFH. This is not what we want!
The instruction jecxz provides a remedy for this situation by testing the ECX register. The
syntax of this instruction is

    jecxz  target

which tests the ECX register and if it is zero, control is transferred to the target instruction. Thus,
it is equivalent to

    cmp    ECX,0
    jz     target

except that jecxz does not affect any of the flags, while the cmp/jz combination affects the
status flags. If the address size is 16 bits, we can use the jcxz instruction instead of jecxz.
Both instructions, however, use the same opcode E3H; the address size determines the register
(CX or ECX) used. We use this instruction on line 42 to test for an empty array. The rest of the
code is straightforward to follow.
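The wrap-around problem and its jecxz remedy can be quantified with a short C sketch (hypothetical names). It counts how many times the body of a loop-terminated loop runs for a given starting ECX, using 32-bit unsigned arithmetic to mimic the register.

```c
#include <stdint.h>

/* Passes through the body of a loop closed by "loop" when entered with
   the given ECX: the body runs once, then each decrement that leaves
   ECX nonzero adds another pass. */
uint64_t loop_passes(uint32_t ecx)
{
    uint32_t after_first = ecx - 1;       /* wraps to 0xFFFFFFFF when ecx == 0 */
    return (uint64_t)after_first + 1;     /* first pass + remaining passes */
}

/* The same loop guarded by "jecxz empty_array", as in Program 15.1. */
uint64_t guarded_passes(uint32_t ecx)
{
    if (ecx == 0)
        return 0;                         /* jecxz skips the loop entirely */
    return loop_passes(ecx);
}
```

loop_passes(3) is 3, but loop_passes(0) is 4,294,967,296 (2^32), whereas guarded_passes(0) is 0.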


Notes on Execution Times of loop and jecxz Instructions
1. The functionality of the loop instruction can be replaced by

       dec    ECX
       jnz    target

   (except that dec, unlike loop, updates the arithmetic flags). Surprisingly, the loop instruction
   is slower than the corresponding dec/jnz instruction pair.
2. Similarly, the jecxz instruction is slower than the code shown below:

       cmp    ECX,0
       jz     target

Thus, for code optimization, these complex instructions should be avoided. However, for
illustrative purposes, we use these instructions in the following examples.

Illustrative Examples
In this section, we present two examples to show the use of the selection and iteration instructions
discussed in this chapter. The first example uses linear search for locating a number in an unsorted
array, and the second example sorts an array of integers using the selection sort algorithm.
Example 15.3 Linear search of an integer array.
In this example, the user is asked to input an array of non-negative integers and then query whether
a given number is in the array or not. The program, shown below, uses a procedure that implements
the linear search to locate a number in an unsorted array.
The main procedure initializes the input array by reading a maximum of MAX_SIZE number
of non-negative integers into the array. The user, however, can terminate the input by entering a
negative number. The l o o p instruction (line 36), with ECX initialized to MAX_SIZE (line 29),
is used to iterate a maximum of MAX_SIZE times. The other loop termination condition (i.e.,
entering a negative number) is tested on lines 32 and 33. The rest of the main program queries
the user for a number and calls the linear search procedure to locate the number. This process is
repeated as long as the user appropriately answers the query.

Program 15.2 Linear search of an integer array

     1: ;Linear search of integer array           LIN_SRCH.ASM
     2: ;
     3: ;        Objective: To implement linear search on an integer
     4: ;                   array.
     5: ;            Input: Requests numbers to fill array and a
     6: ;                   number to be searched for from user.
     7: ;           Output: Displays the position of the number in
     8: ;                   the array if found; otherwise, not found
     9: ;                   message.
    10: %include "io.mac"
    11:
    12: MAX_SIZE      EQU   100
    13:
    14: .DATA
    15: input_prompt  db    "Please enter input values "
    16:               db    "(a negative value terminates input):",0
    17: query_number  db    "Enter the number to be searched: ",0
    18: out_msg       db    "The number is at position ",0
    19: not_found_msg db    "Number not in the array!",0
    20: query_msg     db    "Do you want to quit (Y/N): ",0
    21:
    22: .UDATA
    23: array         resw  MAX_SIZE
    24:
    25: .CODE
    26:       .STARTUP
    27:       PutStr  input_prompt  ; request input array
    28:       xor     ESI,ESI       ; array index = 0
    29:       mov     ECX,MAX_SIZE
    30: array_loop:
    31:       GetInt  AX
    32:       cmp     AX,0          ; negative number?
    33:       jl      read_input    ; if so, stop reading numbers
    34:       mov     [array+ESI*2],AX
    35:       inc     ESI           ; increment array index
    36:       loop    array_loop    ; iterates a maximum of MAX_SIZE
    37: read_input:
    38:       PutStr  query_number  ; request a number to be searched
    39:       GetInt  AX
    40:       push    AX            ; push number, size & array pointer
    41:       push    ESI
    42:       push    array
    43:       call    linear_search
    44:       ; linear_search returns in AX the position of the number
    45:       ; in the array; if not found, it returns 0.
    46:       cmp     AX,0          ; number found?
    47:       je      not_found     ; if not, display number not found
    48:       PutStr  out_msg       ; else, display number position
    49:       PutInt  AX
    50:       jmp     SHORT user_query
    51: not_found:
    52:       PutStr  not_found_msg
    53: user_query:
    54:       nwln
    55:       PutStr  query_msg     ; query user whether to terminate
    56:       GetCh   AL
    57:       cmp     AL,'Y'        ; if response is not 'Y'
    58:       jne     read_input    ; repeat the loop
    59: done:
    60:       .EXIT
    61:
    62: ;-----------------------------------------------------------
    63: ; This procedure receives a pointer to an array of integers,
    64: ; the array size, and a number to be searched via the stack.
    65: ; If found, it returns in AX the position of the number in
    66: ; the array; otherwise, returns 0.
    67: ; All registers, except EAX, are preserved.
    68: ;-----------------------------------------------------------
    69: linear_search:
    70:       enter   0,0
    71:       push    EBX           ; save registers
    72:       push    ECX
    73:       mov     EBX,[EBP+8]   ; copy array pointer
    74:       mov     ECX,[EBP+12]  ; copy array size
    75:       mov     AX,[EBP+16]   ; copy number to be searched
    76:       sub     EBX,2         ; adjust pointer to enter loop
    77: search_loop:
    78:       add     EBX,2         ; update array pointer
    79:       cmp     AX,[EBX]      ; compare the numbers
    80:       loopne  search_loop
    81:       mov     AX,0          ; set return value to zero
    82:       jne     number_not_found
    83:       mov     EAX,[EBP+12]  ; copy array size
    84:       sub     EAX,ECX       ; compute array index of number
    85: number_not_found:
    86:       pop     ECX           ; restore registers
    87:       pop     EBX
    88:       leave
    89:       ret     10

The linear search procedure receives a pointer to an array, its size, and the number to be
searched via the stack. The search process starts at the first element of the array and proceeds
until either the element is located or the array is exhausted. We use the loopne instruction on
line 80 to test these two conditions for the termination of the search loop. ECX is initialized
(line 74) to the size of the array. In addition, a compare (line 79) tests if there is a match between
the two numbers. If so, the zero flag is set and loopne terminates the search loop. If the number
is found, the index of the number is computed (lines 83 and 84) and returned in the EAX register.
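The procedure's logic, including how the position is recovered from ECX, can be restated as a C sketch. The function name and the int16_t element type are assumptions chosen to mirror the 16-bit array. It returns the 1-based position, or 0 if the number is absent.

```c
#include <stdint.h>

int linear_search(const int16_t *array, uint32_t size, int16_t number)
{
    if (size == 0)
        return 0;            /* guard added in this sketch; see note below */
    uint32_t ecx = size;     /* mov ECX,[EBP+12] */
    uint32_t i = 0;          /* index of the element EBX points to */
    int zf;
    do {
        zf = (number == array[i]);   /* cmp AX,[EBX] */
        i++;                         /* advance to next element (add EBX,2) */
        ecx--;                       /* loopne decrements ECX ... */
    } while (ecx != 0 && !zf);       /* ... and repeats if ECX != 0 and ZF = 0 */
    return zf ? (int)(size - ecx) : 0;   /* sub EAX,ECX, or the 0 from mov AX,0 */
}
```

If the match is at 0-based index k, ECX has been decremented k + 1 times, so size - ECX is the 1-based position k + 1. The size == 0 guard has no counterpart in the listing. Without it, loopne entered with ECX = 0 would wrap ECX around just as described earlier for loop, so the assembly procedure would appear to scan past the end of the array when the user ends the input before entering any number.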
Example 15.4 Sorting of an integer array using the selection sort algorithm.
The main program is very similar to that in the last example, except for the portion that displays
the sorted array. The sort procedure receives a pointer to the array to be sorted and its size via the
stack. It uses the selection sort algorithm to sort the array in ascending order. The basic idea is as
follows:
1. Search the array for the smallest element;
2. Move the smallest element to the first position by exchanging values of the first and smallest
element positions;
3. Search the array for the smallest element from the second position of the array;
4. Move this element to position 2 by exchanging values as in Step 2;
5. Continue this process until the array is sorted.
The selection sort procedure implements the following pseudocode:


    selection_sort (array, size)
        for (position = 0 to size-2)
            min_value := array[position]
            min_position := position
            for (j = position+1 to size-1)
                if (array[j] < min_value)
                then
                    min_value := array[j]
                    min_position := j
                end if
            end for
            if (position ≠ min_position)
            then
                array[min_position] := array[position]
                array[position] := min_value
            end if
        end for
    end selection_sort
The selection sort procedure, shown in Program 15.3, implements this pseudocode with the
following mapping of variables: position is maintained in ESI, and EDI is used for the index
variable j. The min_value variable is maintained in DX and min_position in EAX. The
number of elements to be searched for finding the minimum value is kept in ECX.
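For comparison, here is the same pseudocode as a C sketch. The int16_t element type mirrors the 16-bit array. The comments indicate the register that Program 15.3 uses for each variable, but the C version is a restatement of the algorithm, not a translation of the listing.

```c
#include <stddef.h>
#include <stdint.h>

void selection_sort(int16_t *array, size_t size)
{
    if (size <= 1)                                   /* cmp ECX,1 / jle */
        return;
    for (size_t position = 0; position <= size - 2; position++) {   /* ESI */
        int16_t min_value = array[position];                        /* DX  */
        size_t min_position = position;                             /* EAX */
        for (size_t j = position + 1; j <= size - 1; j++) {         /* EDI */
            if (array[j] < min_value) {
                min_value = array[j];
                min_position = j;
            }
        }
        if (position != min_position) {              /* exchange only if needed */
            array[min_position] = array[position];
            array[position] = min_value;
        }
    }
}
```

The early return for size <= 1 also keeps size - 2 from wrapping around for an empty array, since size_t is unsigned.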

Program 15.3 Sorting of an integer array using the selection sort algorithm

    ;Sorting an array by selection sort          SEL_SORT.ASM
    ;
    ;        Objective: To sort an integer array using
    ;                   selection sort.
    ;            Input: Requests numbers to fill array.
    ;           Output: Displays sorted array.
    %include "io.mac"

    MAX_SIZE        EQU   100

    .DATA
    input_prompt    db    "Please enter input array (a negative "
                    db    "number terminates the input):",0
    out_msg         db    "The sorted array is:",0
    empty_array_msg db    "Empty array!",0

    .UDATA
    array           resw  MAX_SIZE

    .CODE
          .STARTUP
          PutStr  input_prompt   ; request input array
          xor     ESI,ESI        ; array index = 0
          mov     ECX,MAX_SIZE
    array_loop:
          GetInt  AX
          cmp     AX,0           ; negative number?
          jl      exit_loop      ; if so, stop reading numbers
          mov     [array+ESI*2],AX
          inc     ESI            ; increment array index
          loop    array_loop     ; iterates a maximum of MAX_SIZE
    exit_loop:
          push    ESI            ; push array size & array pointer
          push    array
          call    selection_sort
          mov     ECX,ESI        ; ECX = array size
          jecxz   empty_array    ; check for empty array
          PutStr  out_msg        ; display sorted array
          nwln
          mov     EBX,array
          xor     ESI,ESI
    display_loop:
          PutInt  [array+ESI*2]
          nwln
          inc     ESI
          loop    display_loop
          jmp     short done
    empty_array:
          PutStr  empty_array_msg
          nwln
    done:
          .EXIT

    ;-----------------------------------------------------------
    ; This procedure receives a pointer to an array of integers
    ; and the array size via the stack. The array is sorted by
    ; using the selection sort. All registers are preserved.
    ;-----------------------------------------------------------
    %define SORT_ARRAY EBX
    selection_sort:
          pushad                 ; save registers
          mov     EBP,ESP
          mov     EBX,[EBP+36]   ; copy array pointer
          mov     ECX,[EBP+40]   ; copy array size
          cmp     ECX,1
          jle     sel_sort_done
          sub     ESI,ESI        ; array left of ESI is sorted
    sort_outer_loop:
          mov     EDI,ESI
          ; DX is used to maintain the minimum value and EAX
          ; stores the pointer to the minimum value
          mov     DX,[SORT_ARRAY+ESI*2]  ; min. value is in DX
          mov     EAX,ESI                ; EAX = pointer to min. value
          push    ECX
          dec     ECX                    ; size of array left of ESI
    sort_inner_loop:
          inc     EDI                    ; move to next element
          cmp     DX,[SORT_ARRAY+EDI*2]  ; less than min. value?
          jle     skip1                  ; if not, no change to min. value
          mov     DX,[SORT_ARRAY+EDI*2]  ; else, update min. value (DX)
          mov     EAX,EDI                ; & its pointer (EAX)
    skip1:
          loop    sort_inner_loop
          pop     ECX
          cmp     EAX,ESI                ; EAX = ESI?
          je      skip2                  ; if so, element at ESI is in its place
          mov     EDI,EAX                ; otherwise, exchange
          mov     AX,[SORT_ARRAY+ESI*2]  ; exchange min. value
          xchg    AX,[SORT_ARRAY+EDI*2]  ; & element at ESI
          mov     [SORT_ARRAY+ESI*2],AX
    skip2:
          inc     ESI                    ; move ESI to next element
          dec     ECX
          cmp     ECX,1                  ; if ECX = 1, we are done
          jne     sort_outer_loop
    sel_sort_done:
          popad                          ; restore registers
          ret     8

Indirect Jumps
So far, we have used only the direct jump instruction. In direct jump, the target address (i.e., its
relative offset value) is encoded into the jump instruction itself (see Figure 15.1 on page 320). We
now look at indirect jumps. We limit our discussion to jumps within a segment.
In an indirect jump, the target address is specified indirectly either through memory or a
general-purpose register. Thus, we can write

    jmp    ECX

if the ECX register contains the offset of the target. In indirect jumps, the target offset is an
absolute value (unlike the direct jumps, which use a relative offset value). The next example
shows how indirect jumps can be used with a jump table stored in memory.
Example 15.5 An example with an indirect jump.
The objective here is to show how we can use the indirect jump instruction. To this end, we show a
simple program that reads a digit from the user and prints the corresponding choice represented by
the input. The listing is shown in Program 15.4. An input between 0 and 9 is valid. If the input is
0, 1, or 2, it displays a simple message to indicate the class selection. Other digit inputs terminate
the program. If a nondigit input is given to the program, it displays an error message and requests
a valid digit input.

Program 15.4 An example demonstrating the use of the indirect jump

 1: ;Sample indirect jump example                    IJUMP.ASM
 2: ;
 3: ;        Objective: To demonstrate the use of indirect jump.
 4: ;            Input: Requests a digit character from the user.
 5: ;           Output: Appropriate class selection message.
 6: %include "io.mac"
 7:
 8: .DATA
 9: jump_table   dd  code_for_0      ; indirect jump pointer table
10:              dd  code_for_1
11:              dd  code_for_2
12:              dd  default_code    ; default code for digits 3-9
13:              dd  default_code
14:              dd  default_code
15:              dd  default_code
16:              dd  default_code
17:              dd  default_code
18:              dd  default_code
19:
20: prompt_msg   db  "Type a digit: ",0
21: msg_0        db  "Economy class selected.",0
22: msg_1        db  "Business class selected.",0
23: msg_2        db  "First class selected.",0
24: msg_default  db  "Not a valid code!",0
25: msg_nodigit  db  "Not a digit! Try again.",0
26:
27: .CODE
28:       .STARTUP
29: read_again:
30:       PutStr  prompt_msg          ; request a digit
31:       sub     EAX,EAX             ; EAX = 0
32:       GetCh   AL                  ; read input digit and
33:       cmp     AL,'0'              ; check to see if it is a digit
34:       jb      not_digit
35:       cmp     AL,'9'
36:       ja      not_digit
37:                                   ; if digit, proceed
38:       sub     AL,'0'              ; convert to numeric equivalent
39:       mov     ESI,EAX             ; ESI is index into jump table
40:       jmp     [jump_table+ESI*4]  ; indirect jump based on ESI
41: test_termination:
42:       cmp     AL,2
43:       ja      done
44:       jmp     read_again
45: code_for_0:
46:       PutStr  msg_0
47:       nwln
48:       jmp     test_termination
49: code_for_1:
50:       PutStr  msg_1
51:       nwln
52:       jmp     test_termination
53: code_for_2:
54:       PutStr  msg_2
55:       nwln
56:       jmp     test_termination
57: default_code:
58:       PutStr  msg_default
59:       nwln
60:       jmp     test_termination
61: not_digit:
62:       PutStr  msg_nodigit
63:       nwln
64:       jmp     read_again
65: done:
66:       .EXIT

In order to use the indirect jump, we have to build a jump table of pointers (see lines 9-18).
The input is tested for its validity on lines 33 to 36. If the input is a digit, it is converted to act as
an index into the jump table and stored in ESI. This value is used in the indirect jump instruction
(line 40); the scale factor of 4 is needed because each table entry is a 4-byte doubleword (dd).
The rest of the program is straightforward to follow.
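As a point of comparison, the same table-driven dispatch can be sketched in C. C has no portable indirect jump, so this sketch uses an array of function pointers instead; compilers typically turn the call into an indirect call through the table rather than a jmp. The handler names mirror the labels of Program 15.4, select_class is a hypothetical name, and the termination logic of the program is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers standing in for code_for_0 ... default_code. */
static const char *code_for_0(void)   { return "Economy class selected."; }
static const char *code_for_1(void)   { return "Business class selected."; }
static const char *code_for_2(void)   { return "First class selected."; }
static const char *default_code(void) { return "Not a valid code!"; }

/* One entry per digit 0-9, laid out like jump_table in Program 15.4. */
static const char *(*const jump_table[10])(void) = {
    code_for_0, code_for_1, code_for_2,
    default_code, default_code, default_code, default_code,
    default_code, default_code, default_code
};

/* Returns the message for a digit character, or NULL for a nondigit. */
const char *select_class(char ch)
{
    if (ch < '0' || ch > '9')              /* same range test as lines 33-36 */
        return NULL;

    unsigned index = (unsigned)(ch - '0'); /* like sub AL,'0' */
    assert(index < 10);
    return jump_table[index]();            /* call through the table entry */
}
```

For example, select_class('7') lands on default_code, just as digits 3 to 9 do in the assembly version.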
Multiway Conditional Statements

In high-level languages, a two- or three-way conditional execution can be controlled easily by
using if statements. For large multiway conditional execution, writing the code with nested if
statements is tedious and error prone. High-level languages like C provide a special construct for
multiway conditional execution. In this section we look at the C switch construct for multiway
conditional execution.
Example 15.6 Multiway conditional execution in C.
As an example of the switch statement, consider the following code:

switch (ch)
{
    case 'a':
        count[0]++;   /* increment count[0] */
        break;
    case 'b':
        count[1]++;
        break;
    case 'c':
        count[2]++;
        break;
    case 'd':
        count[3]++;
        break;
    case 'e':
        count[4]++;
        break;
    default:
        count[5]++;
}

The semantics of the switch statement are as follows: If character ch is a, it executes the
count[0]++ statement. The break statement is necessary to escape out of the switch statement.
Similarly, if ch is b, count[1] is incremented, and so on. The default case statement
is executed if ch is not one of the values specified in the other case statements.
The assembly language code generated by gcc (with the -S option) is shown below. Note that
gcc uses AT&T syntax, which is different from the syntax we have been using here; the code is shown
in Intel syntax and embellished for easy reading. We will discuss the AT&T syntax in Chapter 21 (see
page 434).
 1: main:
 2:       . . .
 3:
 4:       mov    EAX,ch                ; EAX = character to be tested
 5:       sub    EAX,97                ; 97 = ASCII 'a'
 6:       cmp    EAX,4
 7:       ja     default
 8:       jmp    [jump_table+EAX*4]
 9:
10: section .rodata
11:       .align 4
12: jump_table:
13:       dd     case_a
14:       dd     case_b
15:       dd     case_c
16:       dd     case_d
17:       dd     case_e
18: .text
19: case_a:
20:       inc    dword ptr [EBP-56]
21: end_switch:
22:       leave
23:       ret
24: case_b:
25:       inc    dword ptr [EBP-52]
26:       jmp    end_switch
27: case_c:
28:       inc    dword ptr [EBP-48]
29:       jmp    end_switch
30: case_d:
31:       inc    dword ptr [EBP-44]
32:       jmp    end_switch
33: case_e:
34:       inc    dword ptr [EBP-40]
35:       jmp    end_switch
36: default:
37:       inc    dword ptr [EBP-36]
38:       jmp    end_switch

The character to be tested is moved to the EAX register. The subtract and compare instructions
on lines 5 and 6 check if the character is within the range of the case values (i.e., between a and
e). If not, the conditional jump instruction on line 7 transfers control to the default case. Because
ja is an unsigned comparison, a character below a produces a large unsigned value after the
subtraction, so this single test catches characters on both sides of the range. If the character
is one of the five lowercase letters, the indirect jump instruction on line 8 transfers control to the
appropriate case using the jump table on lines 12-17. Since each entry in this jump table is four
bytes long, we use a scale factor of 4 in this jump instruction.
•
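The range check that gcc generates can be mimicked in C. In this sketch (count_char is a hypothetical name), every case simply increments count[i] with i = ch - 'a', so the jump table collapses into direct array indexing; the point being illustrated is that one unsigned comparison, like the ja on line 7, rejects characters on both sides of the range.

```c
#include <assert.h>

/* Sketch of the switch on ch: 'a'..'e' map to count[0]..count[4],
   anything else goes to count[5] (the default case). */
void count_char(char ch, int count[6])
{
    /* Like sub EAX,97: 'a'..'e' become 0..4. Converting to unsigned makes
       any character below 'a' wrap around to a very large value. */
    unsigned index = (unsigned)(ch - 'a');

    if (index > 4)          /* like cmp EAX,4 / ja default */
        count[5]++;         /* default case */
    else
        count[index]++;     /* cases 'a'..'e' */
}
```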

Summary
We discussed unconditional and conditional jump instructions as well as compare and loop instructions in detail. These assembly language instructions are useful in implementing high-level
language selection and iteration constructs such as if-then-else and while loops. Through
detailed examples, we have shown how these instructions are used in the assembly language.
In the previous chapters, we extensively used direct jump instructions. In this chapter, we
introduced the indirect jump instruction, in which the target of the jump is specified
indirectly. Indirect jumps are useful to implement multiway conditional statements such as the
switch statement in C. By means of an example, we have shown how such multiway statements
of high-level languages are implemented in the assembly language.

16
Logical and Bit Operations
Bit manipulation is an important aspect of many high-level languages. This chapter discusses the
logical and bit manipulation instructions supported by the assembly language. Assembly language
provides several logical instructions to implement logical expressions. These instructions are also
useful in implementing bitwise logical operations. In addition, several shift and rotate instructions are provided to facilitate bit manipulation. A few instructions are also provided to test and
modify bits. These four types of instructions are discussed in this chapter. After describing these
instructions, we give several examples to illustrate their application. The chapter concludes with a
summary.

Introduction
Modern high-level languages provide several conditional and loop constructs. These constructs
require Boolean or logical expressions for specifying conditions. Assembly language provides
several logical instructions to express these conditions. These instructions manipulate logical data
just like the arithmetic instructions manipulate arithmetic data (e.g., integers) with operations such
as addition and subtraction. The logical data can take one of two possible values: true or false.
As the logical data can assume only one of two values, a single bit is sufficient to represent
these values. Thus, all logical instructions that we discuss here operate on a bit-by-bit basis. By
convention, a bit value of 0 represents false, and a value of 1 represents true.
Since we have discussed the assembly language logical instructions in Chapter 9, we devote part of
this chapter to their typical uses. The assembly language also
provides several shift and rotate instructions. The shift instructions are very efficient in performing
multiplication and division of signed and unsigned integers by a power of 2. We use examples to
illustrate how this can be done using the shift instructions. Several bit manipulation instructions
are also provided by the assembly language. These instructions can be used to test a specific bit,
to scan for a bit, and so on. A detailed discussion of these instructions is provided in the later part
of this chapter.


Logical Instructions
Assembly language provides a total of five logical instructions: and, or, not, xor, and test.
Except for the not operator, all of the logical operators are binary operators (i.e., they require two
operands). These instructions operate on 8-, 16-, or 32-bit operands.
All of these logical instructions except not affect the status flags (not leaves the flags unchanged).
Since operands of these instructions are treated as a sequence of independent bits, these
instructions do not generate carry or overflow. Therefore, the carry (CF) and overflow (OF) flags
are cleared, and the status of the auxiliary flag (AF) is undefined.
Only the remaining three arithmetic flags, namely the zero flag (ZF), the sign flag (SF), and the parity
flag (PF), record useful information about the results of these logical instructions. Since we
discussed these instructions in Chapter 9, we look at their typical use in this chapter.
The logical instructions are useful in implementing logical expressions of high-level languages.
For example, C provides the following two logical operators:

C operator     Meaning
&&             AND
||             OR

These logical operators can be implemented using the corresponding assembly language logical
instructions.
Some high-level languages provide bitwise logical operators. For example, C provides bitwise
and (&), or (|), xor (^), and not (~) operators. These can be implemented by using the logical
instructions provided in the assembly language.
Table 16.1 shows how the logical instructions are used to implement the bitwise logical operators of the C language. The variable mask is assumed to be in the ESI register.

Table 16.1 Examples of C bitwise logical operators

C statement                             Assembly language instruction
mask = ~mask       (complement mask)    not    ESI
mask = mask & 85   (bitwise and)        and    ESI,85
mask = mask | 85   (bitwise or)         or     ESI,85
mask = mask ^ 85   (bitwise xor)        xor    ESI,85

The and Instruction

The and instruction is useful mainly in three situations:

1. To support compound logical expressions and bitwise and operations of high-level languages;
2. To clear one or more bits;
3. To isolate one or more bits.
As we have already discussed the first use, here we concentrate on how and can be used to clear
and isolate selected bits of an operand.
Clearing Bits If you look at the truth table of the and operation (see page 204), you will notice
that the source bit b_i acts as a masking bit: if the masking bit is 0, the output is 0 no matter what the
other input bit is; if the masking bit is 1, the other input bit is passed to the output.
Consider the following example:

and

AL =11010110
even_number:

If AL has an even number, the least significant bit of AL is 0. Therefore,

      and    AL,1

would produce a zero result in AL and set the zero flag. The jz instruction is then used to test
the status of the zero flag and to selectively execute the appropriate code fragment. This example
shows the use of and to isolate a bit, the least significant bit in this case.
•
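In C, the same masking idioms look like this. The sketch below assumes unsigned 8-bit values; and_mask and is_even are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* and with a mask: 0 bits in the mask clear, 1 bits pass the value through. */
uint8_t and_mask(uint8_t value, uint8_t mask)
{
    return value & mask;
}

/* Isolate the least significant bit, as "and AL,1" does;
   a zero result (ZF = 1 in assembly) means the number is even. */
int is_even(uint8_t value)
{
    return (value & 1) == 0;
}
```

For instance, and_mask(0xD6, 0xFC) clears the two low-order bits of 11010110B and yields 11010100B (0xD4).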
The or Instruction
The or instruction is useful mainly in two situations:
1. To support compound logical expressions and bitwise or operations of high-level languages;
2. To set one or more bits.
The use of the or instruction to express compound logical expressions and to implement bitwise
or operations has been discussed before. We now discuss how the or instruction can be used to
set a given set of bits.
As you can see from the truth table for the or operation (see page 204), when the source bit b_i is
0, the other input is passed on to the output; when the source bit b_i is 1, the output is forced to take
a value of 1 irrespective of the other input. This property is used to set bits in the output. This is
illustrated in the following example.

         AL = 11010110B   <- operand to be manipulated
         BL = 00000011B   <- mask byte
or    AL,BL = 11010111B

The mask value in the BL register forces the least significant two bits to 1. Here is
another example.
Example 16.3 Even-parity encoding (partial code).
Consider the even-parity encoding discussed in Example 16.1. If the number of 1's in the least
significant 7 bits is odd, we have to make the parity bit 1 so that the total number of 1's is even.
This is done by

      or     AL,80H

assuming that the byte to be parity-encoded is in the AL register. This or operation forces the
parity bit to 1 while leaving the remainder of the byte unchanged.
•
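Combining the and and or steps, even-parity encoding can be sketched in C. This assumes, as in the text, a 7-bit ASCII code in the low bits with bit 7 as the parity bit; even_parity_encode is a hypothetical name, and counting the 1's with a loop is just one simple way to decide whether the parity bit must be set.

```c
#include <assert.h>
#include <stdint.h>

/* Even-parity encoding of a 7-bit code; bit 7 is the parity bit. */
uint8_t even_parity_encode(uint8_t ch)
{
    uint8_t data = ch & 0x7F;        /* clear bit 7 (an and with 7FH) */
    int ones = 0;

    for (int i = 0; i < 7; i++)      /* count the 1's in bits 0..6 */
        ones += (data >> i) & 1;

    /* Odd count: set the parity bit, like "or AL,80H". */
    return (ones % 2 != 0) ? (uint8_t)(data | 0x80) : data;
}
```

For example, 'A' (01000001B) already has an even number of 1's and is returned unchanged, while 'C' (01000011B) becomes 11000011B.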
Cutting and Pasting Bits The and and or instructions can be used together to "cut and paste"
bits from two or more operands. We have already seen that and can be used to isolate selected
bits, which is analogous to the "cut" operation. The or instruction can be used to "paste" the bits. For
example, the following code creates a new byte in AL by combining odd bits from AL and even
bits from BL registers.

      and    AL,55H      ; cut odd bits
      and    BL,0AAH     ; cut even bits
      or     AL,BL       ; paste them together

The first and instruction selects only the odd bits from the AL register by forcing all even bits
to 0 by using the mask 55H (01010101B). The second and instruction selects the even bits by
using the mask AAH (10101010B). The or instruction simply pastes these two bytes together to
produce the desired byte in the AL register. (Here "odd" and "even" count bits from 1 at the right;
in terms of bit positions numbered from 0, mask 55H keeps positions 0, 2, 4, and 6, and mask AAH
keeps positions 1, 3, 5, and 7.)
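The same cut-and-paste can be written in C with the two masks from the text; paste_bits is a hypothetical name, and both bytes are assumed to be unsigned 8-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Keep bit positions 0, 2, 4, 6 of al and 1, 3, 5, 7 of bl, then combine. */
uint8_t paste_bits(uint8_t al, uint8_t bl)
{
    uint8_t cut_al = al & 0x55;   /* and AL,55H  */
    uint8_t cut_bl = bl & 0xAA;   /* and BL,0AAH */
    return cut_al | cut_bl;       /* or  AL,BL   */
}
```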
The xor Instruction

The xor instruction is useful mainly in three different situations:
1. To support compound logical expressions of high-level languages;
2. To toggle one or more bits;
3. To initialize registers to zero.
The use of the xor instruction to express compound logical expressions has been discussed
before. Here we focus on the use of xor to toggle bits and to initialize registers to zero.
Toggling Bits Using the xor instruction, we can toggle a specific set of bits. To do this, the
mask should have 1 in the bit positions that are to be flipped. The following example illustrates
this application of the xor instruction.
Example 16.4 Parity conversion.
Suppose we want to change the parity encoding of incoming data: if even parity, change to odd
parity and vice versa. To accomplish this change, all we have to do is flip the parity bit, which can
be done by

      xor    AL,80H

Thus, an even-parity encoded ASCII character A (01000001B) is transformed into its odd-parity
encoding, as shown below:

       01000001B   <- even-parity encoded ASCII character A
xor    10000000B   <- mask byte
       11000001B   <- odd-parity encoded ASCII character A

Shift Operations Some high-level languages provide left- and right-shift operations. For example, the C language provides two shift operators: left shift (<<) and right shift (>>). These
operators can be implemented with the assembly language shift instructions.
Table 16.2 shows how the shift instructions are used to implement the shift operators of the C
language. The variable mask is assumed to be in the SI register.

mask = mask>>2   (right-shift mask by two bit positions)    shr    SI,2
mask = mask<<4   (left-shift mask by four bit positions)    shl    SI,4
Bit Manipulation The shift operations provide flexibility to manipulate bits, as illustrated by the
following example.
Example 16.6 Another encryption example.
Consider the encryption example discussed on page 346. In this example, we use the following
encryption algorithm: encrypting a byte involves exchanging the upper and lower nibbles (i.e.,
4 bits). This algorithm also allows the recovery of the original data by applying the encryption
twice, as in the xor example on page 346.
Assuming that the byte to be encrypted is in the AL register, the following code implements
this algorithm:
                          ; AL contains the byte to be encrypted
      mov    AH,AL
      shl    AL,4         ; move lower nibble to upper
      shr    AH,4         ; move upper nibble to lower
      or     AL,AH        ; paste them together
                          ; AL has the encrypted byte

To understand this code, let us trace the execution by assuming that AL has the ASCII character
A. Therefore,

      AH = AL = 01000001B

The idea is to move the upper nibble to lower in the AH register, and the other way around in
the AL register. To do this, we use shl and shr instructions. The shl instruction fills the vacated
bits on the right with 0's, and after the shl

      AL = 00010000B

Similarly, shr introduces 0's in the vacated bits on the left. Thus, after the shr instruction

      AH = 00000100B


Table 16.3 Doubling and halving of unsigned numbers

Binary number     Decimal value
00011100           28
00111000           56
01110000          112
11100000          224

10101000          168
01010100           84
00101010           42
00010101           21

The or instruction pastes these two bytes together, as shown below:

      AL       = 00010000B
      AH       = 00000100B
or    AL,AH    = 00010100B

We show later that this encryption can be done better by using a rotate instruction (see Example 16.7 on page 353).
•
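A C rendering of the shift-based nibble swap follows the assembly step by step. swap_nibbles is a hypothetical name, and the byte is assumed to be an unsigned 8-bit value so that the right shift fills with 0's, like shr.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the upper and lower nibbles of a byte using two shifts and an or. */
uint8_t swap_nibbles(uint8_t al)
{
    uint8_t ah = al;             /* mov AH,AL */
    al = (uint8_t)(al << 4);     /* shl AL,4: lower nibble to upper */
    ah = (uint8_t)(ah >> 4);     /* shr AH,4: upper nibble to lower */
    return (uint8_t)(al | ah);   /* or  AL,AH */
}
```

Applying swap_nibbles twice returns the original byte, which is what makes the scheme usable for both encryption and decryption.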
Multiplication and Division Shift operations are very effective in performing doubling or halving
of unsigned binary numbers. More generally, they can be used to multiply or divide unsigned
binary numbers by a power of 2.
In the decimal number system, we can easily perform multiplication and division by a power
of 10. For example, if we want to multiply 254 by 10, we simply append a 0 at the right
(analogous to shifting left by a digit with the vacated digit receiving a 0). Similarly, division of
750 by 10 can be accomplished by throwing away the 0 on the right (analogous to right shift by a
digit).
Since computers use the binary number system, they can perform multiplication and division
by a power of 2 in the same way. This point is further clarified in Table 16.3. The first half of this table shows how
shifting a binary number to the left by one bit position results in multiplying it by 2. Note that the
vacated bits are replaced by 0's. This is exactly what the shl instruction does. Therefore, if we
want to multiply a number by 8 (i.e., 2^3), we can do so by shifting the number left by three bit
positions, as long as the result still fits in the operand.
Similarly, as shown in the second half of the table, shifting the number right by one bit position
is equivalent to dividing it by 2. Thus, we can use the shr instruction to perform division by a
power of 2. For example, to divide a number by 32 (i.e., 2^5), we right shift the number by five
bit positions. Remember that this division process corresponds to integer division, which discards
any fractional part of the result.
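In C, the corresponding operations on unsigned values are the << and >> operators. The sketch below assumes 32-bit unsigned operands and a shift count k below 32 (shifting by 32 or more is undefined in C); mul_pow2 and div_pow2 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* x * 2^k, like shl; bits shifted out of the top are lost. */
uint32_t mul_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x << k;
}

/* x / 2^k, like shr; the fractional part is discarded. */
uint32_t div_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}
```

For example, mul_pow2(28, 3) gives 224 (the first half of Table 16.3 applied three times), and div_pow2(1000, 5) gives 31, since 1000/32 = 31.25.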


Table 16.4 Doubling of signed numbers

Signed binary number    Decimal value
00001011                +11
00010110                +22
00101100                +44
01011000                +88

11110101                -11
11101010                -22
11010100                -44
10101000                -88

Arithmetic Shift Instructions

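This set of shift instructions

      sal    (Shift Arithmetic Left)
      sar    (Shift Arithmetic Right)

can be used to shift signed numbers left or right. The sal instruction moves the bits left, fills
the vacated bit 0 with 0, and moves the bit shifted out of the most significant position into CF.
The sar instruction moves the bits right, copies the sign bit into the vacated most significant
position, and moves the bit shifted out of bit 0 into CF.
As with the logical shift instructions, the CL register can be used to specify the count value. The
general format is

      sal    destination,count
      sal    destination,CL

      sar    destination,count
      sar    destination,CL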

Doubling Signed Numbers Doubling a signed number by shifting it left by one bit position may
appear to cause problems because the leftmost bit is used to represent the sign of the number. It
turns out that this is not a problem, as long as the doubled value still fits in the operand. See the
examples presented in Table 16.4 to develop your intuition. The first group presents the doubling
effect on positive numbers and the second group on negative numbers. In both cases, a 0 replaces
the vacated bit. Why isn't shifting the sign bit out causing problems? The reason is that signed
numbers are sign-extended to fit a larger-than-required number of bits. For example, if we want to
represent numbers in the range of +3 and -4, 3 bits are sufficient to represent this range. If we use
a byte to represent the same range, the number is sign-extended by copying the sign bit into the
higher order five bits, as shown below.

+3 = 00000011B    (sign bit 0 copied into the upper five bits)
-3 = 11111101B    (sign bit 1 copied into the upper five bits)

Clearly, doubling a signed number is no different than doubling an unsigned number. Thus, no
special shift left instruction is needed for the signed numbers. In fact, sal and shl are one and
the same instruction: sal is an alias for shl.
Halving Signed Numbers Can we also forget about treating the signed numbers differently in
halving a number? Unfortunately, we cannot! When we right shift a signed number, the vacated
left bit should be replaced by a copy of the sign bit. This rules out the use of shr for signed
numbers. See the examples presented in Table 16.5. The sar instruction does precisely this: it
copies the sign bit into the vacated bit on the left.

Table 16.5 Division of signed numbers by 2

Signed binary number    Decimal value
01011000                +88
00101100                +44
00010110                +22
00001011                +11

10101000                -88
11010100                -44
11101010                -22
11110101                -11

Remember that the shift right operation performs integer division. For example, right shifting
00001011B (+11D) by a bit results in 00000101B (+5D). For negative numbers, sar rounds toward
minus infinity: right shifting 11110101B (-11D) by a bit gives 11111010B (-6D), whereas idiv
truncates toward zero and gives -5.
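The difference between sar and signed division shows up with negative odd numbers. The sketch below assumes that >> on a negative int32_t is an arithmetic shift, which the C standard leaves implementation-defined (GCC and Clang on x86 behave this way). sar_div_pow2 rounds toward minus infinity like sar; trunc_div_pow2, a hypothetical helper, first adds a bias of 2^k - 1 to negative values so that the result truncates toward zero like idiv and C's / operator.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^k, like sar (assumes arithmetic >> on negatives). */
int32_t sar_div_pow2(int32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* Division by 2^k truncating toward zero, like idiv. Requires k <= 30. */
int32_t trunc_div_pow2(int32_t x, unsigned k)
{
    assert(k <= 30);
    /* x >> 31 is -1 (all 1's) for negative x and 0 otherwise,
       so bias is 2^k - 1 for negative x and 0 otherwise. */
    int32_t bias = (x >> 31) & ((INT32_C(1) << k) - 1);
    return (x + bias) >> k;
}
```

With these definitions, sar_div_pow2(-11, 1) is -6 while trunc_div_pow2(-11, 1) is -5; for non-negative inputs the two agree.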
Why Use Shifts for Multiplication and Division?

Shifts are more efficient than the corresponding multiplication and division instructions. As an
example, consider dividing an unsigned 16-bit number in the AX register by a power of 2 stored in
BX. Using the div instruction, we can write

                          ; dividend is assumed to be in DX:AX (DX = 0)
      div    BX

Figure 16.1 Execution time comparison of implementing division by a power of 2 using the shift
and divide instructions (x-axis: number of calls, in millions).

Now let us look at how we can perform this division with the shr instruction. If we place the
bit shift count in the CL register, we can use this shift instruction to perform the division operation.
In the following code

      bsr    CX,BX
      shr    AX,CL

the bsr instruction places this shift count (the position of the single 1 bit in BX, i.e., log2 of the
divisor) in the CX register. We give details of this instruction on page 355.
Figure 16.1 shows the execution times of these two versions on a 2.8 GHz Pentium 4 machine running
Red Hat Linux. The x-axis gives the number of times (in millions) the division operation is
performed. The y-axis gives the execution time in seconds. The "Shift" line is the execution time
of the version that uses shr to perform the division 40000/1024. The corresponding execution
time for the div version is shown by the "Divide" line. Clearly, the shift version is much more
efficient than the divide version.
Doubleshift Instructions

The IA-32 instruction set also provides two doubleshift instructions that shift across a pair of
operands, effectively giving 32-bit and 64-bit shifts. These two instructions operate on either word
or doubleword operands and produce a word or doubleword result, respectively. The doubleshift
instructions require three operands, as shown below:

      shld   dest,src,count     ; left shift
      shrd   dest,src,count     ; right shift

dest and src can be either a word or a doubleword. While the dest operand can be in a register
or memory, the src operand must be in a register. The shift count can be specified as in the shift
instructions, either as an immediate value or in the CL register.


A significant difference between the shift and doubleshift instructions is that, in the doubleshift
instructions, the src operand supplies the bits that are shifted into dest:

shld   dest (register or memory, bits 15/31 to 0) is shifted left; its vacated low-order bits
       are filled from the high-order end of src (register).
shrd   dest (register or memory, bits 15/31 to 0) is shifted right; its vacated high-order bits
       are filled from the low-order end of src (register).

Note that although the bits that enter dest come from src, the src operand itself is not modified
by the doubleshift instructions. Only the dest operand is updated. As in the shift instructions, the
last bit shifted out of dest is stored in the carry flag. Later we present an example that
demonstrates the use of the doubleshift instructions (see Example 16.8 on page 354).
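The effect of shld and shrd on dest can be modeled in C for 32-bit operands. This is a sketch with hypothetical names; it ignores the carry flag and assumes a count between 1 and 31, since shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* shld dest,src,n: dest moves left by n; the vacated low-order bits
   receive the top n bits of src. src itself is not changed. */
uint32_t shld32(uint32_t dest, uint32_t src, unsigned n)
{
    assert(n > 0 && n < 32);
    return (dest << n) | (src >> (32 - n));
}

/* shrd dest,src,n: dest moves right by n; the vacated high-order bits
   receive the low n bits of src. */
uint32_t shrd32(uint32_t dest, uint32_t src, unsigned n)
{
    assert(n > 0 && n < 32);
    return (dest >> n) | (src << (32 - n));
}
```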

Rotate Instructions
A drawback with the shift instructions is that the bits shifted out are lost. There are situations where
we want to keep these bits. While the doubleshift instructions provide this capability on word
and doubleword operands, the rotate instructions remedy this drawback for a variety of operands.
These instructions can be divided into two types: rotate without involving the carry flag (CF), or
rotate through the carry flag. Since we presented these two types of rotate instructions in Chapter 9,
we discuss their typical usage next.
Rotate Without Carry

The rotate instructions are useful in rearranging bits of a byte, word, or doubleword. This is
illustrated below by revisiting the data encryption example given on page 348.
Example 16.7 Encryption example revisited.
In Example 16.6, we encrypted a byte by interchanging the upper and lower nibbles. This can be
done easily either by

      mov    CL,4
      ror    AL,CL

or by

      mov    CL,4
      rol    AL,CL

Rotating an 8-bit value by four bit positions in either direction swaps its two nibbles. This is a
much simpler solution than the one using shifts.
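A byte rotate can be expressed in C by combining two shifts; many compilers recognize this pattern and emit a rotate instruction, though that is not guaranteed. ror8 and rol8 are hypothetical names, and the count is reduced modulo 8 in this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value right by n: bits leaving bit 0 re-enter at bit 7. */
uint8_t ror8(uint8_t x, unsigned n)
{
    n &= 7;
    return (uint8_t)((x >> n) | (x << ((8 - n) & 7)));
}

/* Rotate an 8-bit value left by n: bits leaving bit 7 re-enter at bit 0. */
uint8_t rol8(uint8_t x, unsigned n)
{
    n &= 7;
    return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
}
```

Either ror8(x, 4) or rol8(x, 4) performs the nibble swap of Example 16.7.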

Rotate Through Carry

The rcl and rcr instructions provide flexibility in bit rearranging. Furthermore, these are the
only two shift and rotate instructions that take the carry flag bit as an input. This feature is useful
in multiword shifts, as illustrated by the following example.


Example 16.8 Shifting 64-bit numbers.
We have seen that multiplication and division by a power of 2 is faster if we use shift operations
rather than multiplication or division instructions. Shift instructions operate on operands of size
up to 32 bits. What if the operand to be manipulated is bigger?
Since the shift instructions do not take the carry flag as input, we have two alternatives:
either use the rcl or rcr instructions, or use the doubleshift instructions for such multiword shifts.
As an example, assume that we want to multiply a 64-bit unsigned number by 16. The 64-bit
number is assumed to be in the EDX:EAX register pair, with EAX holding the least significant 32
bits.
Rotate version:

      mov    ECX,4          ; 4-bit shift (loop counts down ECX)
shift_left:
      shl    EAX,1          ; moves leftmost bit of EAX to CF
      rcl    EDX,1          ; CF goes to rightmost bit of EDX
      loop   shift_left

Doubleshift version:

      shld   EDX,EAX,4      ; EAX is unaffected by shld
      shl    EAX,4

Similarly, if we want to divide the same number by 16, we can use the following code:
Rotate version:

      mov    ECX,4          ; 4-bit shift (loop counts down ECX)
shift_right:
      shr    EDX,1          ; moves rightmost bit of EDX to CF
      rcr    EAX,1          ; CF goes to leftmost bit of EAX
      loop   shift_right

Doubleshift version:

      shrd   EAX,EDX,4      ; EDX is unaffected by shrd
      shr    EDX,4

Clearly, the doubleshift instruction avoids the need for a loop.

•
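The doubleshift versions of Example 16.8 can be modeled in C by holding the 64-bit value as two 32-bit halves, hi (standing for EDX) and lo (standing for EAX). The function names are hypothetical; with a compiler that has a 64-bit integer type you would normally just shift a uint64_t, which is also an easy way to check the result.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply hi:lo by 16, like "shld EDX,EAX,4" followed by "shl EAX,4". */
void mul16_64(uint32_t *hi, uint32_t *lo)
{
    *hi = (*hi << 4) | (*lo >> 28);   /* top 4 bits of lo enter hi */
    *lo <<= 4;
}

/* Divide hi:lo by 16, like "shrd EAX,EDX,4" followed by "shr EDX,4". */
void div16_64(uint32_t *hi, uint32_t *lo)
{
    *lo = (*lo >> 4) | (*hi << 28);   /* low 4 bits of hi enter lo */
    *hi >>= 4;
}
```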

Bit Instructions
The IA-32 instruction set includes several bit test and modification instructions as well as bit scan
instructions. This section discusses these two groups of instructions. The programming examples
given later illustrate the use of these instructions.
Bit Test and Modify Instructions

There are four bit test instructions. Each instruction takes the position of the bit to be tested. The
least significant bit is considered as bit position zero. A summary of the four instructions is given
below:


Instruction                        Effect on Selected Bit
bt  (Bit Test)                     No effect
bts (Bit Test and Set)             Selected bit <- 1
btr (Bit Test and Reset)           Selected bit <- 0
btc (Bit Test and Complement)      Selected bit <- NOT(Selected bit)

All four instructions copy the original value of the selected bit into the carry flag. The format
of all four instructions is the same. We use the bt instruction to illustrate the format of these
instructions.

      bt     operand,bit_pos

where operand can be a word or doubleword located either in a register or in memory. The
bit_pos specifies the bit position to be tested. It can be specified as an immediate value or in
a 16- or 32-bit register. Instructions in this group change only the carry flag: the zero flag is
unaffected, and the other four status flags (OF, SF, AF, and PF) are undefined following a bit
test instruction.
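The behavior summarized in the table can be modeled in C for a 32-bit operand. Each of these hypothetical helpers returns the original value of the selected bit (what the instruction copies into CF) and then updates the operand as the instruction would; bit positions are assumed to be 0 to 31.

```c
#include <assert.h>
#include <stdint.h>

/* bt: report the selected bit, leave the operand alone. */
int bt32(const uint32_t *operand, unsigned pos)
{
    assert(pos < 32);
    return (int)((*operand >> pos) & 1);
}

/* bts: report the bit, then set it. */
int bts32(uint32_t *operand, unsigned pos)
{
    int cf = bt32(operand, pos);
    *operand |= UINT32_C(1) << pos;       /* selected bit <- 1 */
    return cf;
}

/* btr: report the bit, then clear it. */
int btr32(uint32_t *operand, unsigned pos)
{
    int cf = bt32(operand, pos);
    *operand &= ~(UINT32_C(1) << pos);    /* selected bit <- 0 */
    return cf;
}

/* btc: report the bit, then complement it. */
int btc32(uint32_t *operand, unsigned pos)
{
    int cf = bt32(operand, pos);
    *operand ^= UINT32_C(1) << pos;       /* selected bit <- NOT(selected bit) */
    return cf;
}
```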
Bit Scan Instructions
Bit scan instructions scan the operand for a 1 bit and return its bit position in a register. There are
two instructions: one to scan forward and the other to scan backward. The format is

      bsf    dest_reg,operand     ;bit scan forward
      bsr    dest_reg,operand     ;bit scan reverse

where operand can be a word or doubleword located either in a register or in memory. The
dest_reg receives the bit position. It must be a 16- or 32-bit register. The zero flag is set if all
bits of operand are 0, in which case the contents of dest_reg are undefined; otherwise, ZF is
cleared and dest_reg is loaded with the bit position of the first 1 bit found while scanning forward
from bit 0 (for bsf), or in reverse from the most significant bit (for bsr). These two instructions
affect only the zero flag; the other five status flags are undefined.
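A straightforward C model of the two scans follows. The functions are hypothetical; they return 0 for a zero operand (the ZF = 1 case, where *dest is left unwritten) and 1 otherwise. GCC and Clang also offer __builtin_ctz and __builtin_clz for related counts, but those builtins are undefined for a zero argument.

```c
#include <assert.h>
#include <stdint.h>

/* bsf: position of the lowest 1 bit. Returns 0 if operand is zero. */
int bsf32(uint32_t operand, unsigned *dest)
{
    if (operand == 0)
        return 0;
    unsigned pos = 0;
    while (((operand >> pos) & 1) == 0)   /* scan upward from bit 0 */
        pos++;
    *dest = pos;
    return 1;
}

/* bsr: position of the highest 1 bit. Returns 0 if operand is zero. */
int bsr32(uint32_t operand, unsigned *dest)
{
    if (operand == 0)
        return 0;
    unsigned pos = 31;
    while (((operand >> pos) & 1) == 0)   /* scan downward from bit 31 */
        pos--;
    *dest = pos;
    return 1;
}
```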

Our First Program
As our first program, we look at how we can use the sar instruction to perform signed integer
division. In this program, we divide a signed 32-bit integer by a power of 2. The program listing
is given in Program 16.1. It requests two numbers from the user. The numerator can be a signed
32-bit integer. This is read using GetLInt on line 20. The user is then prompted to enter the
denominator. After validating the denominator, the program outputs the result of the division
operation. After displaying the result, it queries whether the user wants to quit. Based on the
response received, the program either terminates or repeats the process.


Program 16.1 Integer division using the shift instruction
 1: ;Division using shifts                    SAR_DIVIDE.ASM
 2: ;
 3: ;        Objective: To divide a 32-bit signed number
 4: ;                   by a power of 2 using SAR.
 5: ;            Input: Requests two numbers from the user.
 6: ;           Output: Prints the division result.
 7: %include "io.mac"
 8: .DATA
 9: prompt1    db  'Please input numerator: ',0
10: prompt2    db  'Please input denominator: ',0
11: out_msg1   db  'The integer division result is: ',0
12: query_msg  db  'Do you want to quit (Y/N): ',0
13: error_msg  db  'Denominator is zero. '
14:            db  'Enter a nonzero value: ',0
15:
16: .CODE
17:       .STARTUP
18: read_input:
19:       PutStr  prompt1        ; request numerator
20:       GetLInt EAX
21:       PutStr  prompt2        ; request denominator
22: read_denom:
23:       GetLInt EBX
24:       bsr     ECX,EBX        ; ECX receives the position of
25:                              ; the leftmost 1 bit in EBX
26:                              ; bsr clears ZF if there is at least 1 bit
27:                              ; in denominator; ZF = 1 if all the bits are zero
28:       jnz     nonZero
29:       PutStr  error_msg      ; if denominator is zero,
30:       jmp     read_denom     ; read again
31: nonZero:
32:       sar     EAX,CL
33:       PutStr  out_msg1       ; output the result
34:       PutLInt EAX
35:       nwln
36:       PutStr  query_msg      ; query whether to terminate
37:       GetCh   AL
38:       cmp     AL,'Y'         ; if response is not 'Y'
39:       jne     read_input     ; repeat the loop
40: done:
41:       .EXIT


The division is done by the sar instruction. To do this, we need to find out the number of bit
positions by which the numerator needs to be shifted right. If we assume that the denominator is a
power of 2, it will have a single 1 bit. We use bsr to find the position of this 1 bit. The instruction

      bsr    ECX,EBX

scans the denominator in EBX from the most significant bit (i.e., it scans the value in EBX from
left to right). The position of the first 1 bit found is returned in the ECX register. If the denominator
is zero, the bsr instruction sets the zero flag (ZF = 1); otherwise, it clears it. We use this condition to
detect a zero denominator (line 28). If it is zero, an error message is displayed and the user
is prompted for a nonzero value. If the denominator is not a power of 2, bsr returns the position of
its most significant 1 bit, so the program divides by the largest power of 2 that does not exceed the
denominator. For example, if the denominator is 10, it divides the
numerator by 8. Also note that, for a negative numerator, sar rounds the quotient toward minus
infinity, so the result can be one less than what idiv would produce (for example, -11 divided by 2
gives -6 rather than -5).
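The logic of Program 16.1 can be mirrored in C. In this sketch (sar_divide is a hypothetical name) a loop plays the role of bsr, and >> on a negative int32_t is assumed to be an arithmetic shift, which is implementation-defined in C but is what GCC and Clang do on x86. Note how a denominator of 10 ends up dividing by 8.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 if the denominator is zero (the ZF = 1 case after bsr);
   otherwise stores numerator >> (position of leftmost 1 bit) and returns 1. */
int sar_divide(int32_t numerator, uint32_t denominator, int32_t *result)
{
    if (denominator == 0)
        return 0;

    unsigned shift = 31;                          /* like bsr ECX,EBX */
    while (((denominator >> shift) & 1) == 0)
        shift--;

    *result = numerator >> shift;                 /* like sar EAX,CL */
    return 1;
}
```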

Illustrative Examples
This section presents two examples that use the instructions introduced in this chapter.
Example 16.9 Multiplication using only shifts and adds.
The objective of this example is to show how multiplication can be done entirely by using the shift
and add operations. We consider multiplication of two unsigned 8-bit numbers. In order to use the
shift operation, we have to express the multiplier as a power of 2. For example, if the multiplier is
64, the result can be obtained by shifting the multiplicand left by six bit positions because 2^6 = 64.
What if the multiplier is not a power of 2? In this case, we have to express this number as a
sum of powers of 2. For example, if the multiplier is 10, it can be expressed as 8+2, where each
term is a power of 2. Then the required multiplication can be done by two shifts and one addition.
The question now is: How do we express the multiplier in this form? If we look at the binary
representation of the multiplier (10D = 00001010B), there is a 1 in the bit positions with weights 8
and 2. Thus, for each 1 bit in the multiplier, the multiplicand should be shifted left by a number
of positions equal to the bit position number. In the above example, the multiplicand should be
shifted left by 3 and 1 bit positions and then added. This procedure is formalized in the following
algorithm:
mult8 (numberl,number2)
result = 0
for (/ = 7 downto 0)
if (bit(number2, /) = 1)
result = result + number 1 * 2*
end if
end for
end mult8
The function bit returns the ith bit of number2. The program listing is given in Program 16.2. The main program requests two numbers from the user, calls the procedure mult8, and displays the result. As in the previous program, it then asks the user whether to quit and proceeds according to the response.
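The mult8 algorithm can be sketched in C as follows. This illustrates the pseudocode rather than translating Program 16.2 line by line: the function name mult8 is reused for clarity, and the bit test uses a shift and mask instead of rol.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiplication of two 8-bit unsigned numbers:
   for every 1 bit i of number2, add number1 shifted left by i. */
static uint16_t mult8(uint8_t number1, uint8_t number2)
{
    uint16_t result = 0;
    for (int i = 7; i >= 0; i--) {
        if ((number2 >> i) & 1)                    /* bit(number2, i) = 1 */
            result += (uint16_t)(number1 << i);    /* number1 * 2^i */
    }
    return result;
}
```

For a multiplier of 10, only bits 3 and 1 are set, so mult8(10, 10) adds 10 << 3 and 10 << 1 to give 100. The largest possible product, 255 * 255 = 65025, still fits in 16 bits.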

358

Assembly Language Programming in Linux

Program 16.2 Multiplication of two 8-bit numbers using only shifts and adds
 1: ;8-bit multiplication using shifts        SHL_MLT.ASM
 2: ;
 3: ;        Objective: To multiply two 8-bit unsigned numbers
 4: ;                   using SHL rather than MUL instruction.
 5: ;            Input: Requests two unsigned numbers.
 6: ;           Output: Prints the multiplication result.
 7: %include "io.mac"
 8: .DATA
 9: input_prompt db  'Please input two short numbers: ',0
10: out_msg1     db  'The multiplication result is: ',0
11: query_msg    db  'Do you want to quit (Y/N): ',0
12:
13: .CODE
14:       .STARTUP
15: read_input:
16:       PutStr  input_prompt    ; request two numbers
17:       GetInt  AX              ; AX = first number
18:       GetInt  BX              ; BX = second number
19:       call    mult8
20:       PutStr  out_msg1
21:       PutInt  AX              ; mult8 leaves result in AX
22:       nwln
23:       PutStr  query_msg       ; query whether to terminate
24:       GetCh   AL
25:       cmp     AL,'Y'          ; if the response is not 'Y'
26:       jne     read_input      ;  repeat the loop
27: done:
28:       .EXIT
29:
30: ;-----------------------------------------------------------
31: ; mult8 multiplies two 8-bit unsigned numbers passed on
32: ; to it in AL and BL. The 16-bit result is returned in AX.
33: ; This procedure uses the SHL instruction to do the
34: ; multiplication. All registers, except AX, are preserved.
35: ;-----------------------------------------------------------
36: mult8:
37:       push    CX              ; save registers
38:       push    DX
39:       push    SI
40:       xor     DX,DX           ; DX = 0 (keeps mult. result)
41:       mov     CX,7            ; CX = # of shifts required
42:       mov     SI,AX           ; save original number in SI
43: repeat1:                      ; multiply loop - iterates 7 times
44:       rol     BL,1            ; test number2 bits from left
45:       jnc     skip1           ; if 0, do nothing
46:       mov     AX,SI           ; else, AX = number1*bit weight
47:       shl     AX,CL
48:       add     DX,AX           ; update running total in DX
49: skip1:
50:       dec     CX
51:       jnz     repeat1
52:       rol     BL,1            ; test the rightmost bit of BL
53:       jnc     skip2           ; if 0, do nothing
54:       add     DX,SI           ; else, add number1
55: skip2:
56:       mov     AX,DX           ; move final result into AX
57:       pop     SI              ; restore registers
58:       pop     DX
59:       pop     CX
60:       ret

The mult8 procedure multiplies two 8-bit unsigned numbers and returns the result in AX. It follows the algorithm discussed on page 357. The multiply loop (lines 43-51) tests the most significant 7 bits of the multiplier. The least significant bit is tested on lines 52 and 53. Notice that the procedure uses rol rather than shl to test each bit (lines 44 and 52). Because rol moves each bit into the carry flag and also back into the register, the BL register is automatically restored after 8 rotates.
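The claim that rol leaves BL unchanged after eight rotates can be checked with a small C model. Both functions are hypothetical helpers: rol8 mimics rol r8,1 by returning the bit that the instruction copies into CF, and bits_via_rol reads a multiplier's bits from left to right, as the multiply loop and lines 52-53 do together.

```c
#include <assert.h>
#include <stdint.h>

/* Models "rol r8,1": rotates *reg left by one bit and returns the
   bit rotated out of bit 7 (the value rol copies into CF). */
static int rol8(uint8_t *reg)
{
    int carry = (*reg >> 7) & 1;
    *reg = (uint8_t)((*reg << 1) | carry);
    return carry;
}

/* Reads the 8 bits of *bl from left to right with 8 rotates and
   collects them, most significant first, into an integer. */
static unsigned bits_via_rol(uint8_t *bl)
{
    unsigned collected = 0;
    for (int n = 0; n < 8; n++)
        collected = (collected << 1) | (unsigned)rol8(bl);
    return collected;
}
```

After the eight calls, the collected bits reproduce the original value and *bl has returned to its starting value, which is why the procedure does not need to save BL.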
•
Example 16.10 Multiplication using only shifts and adds (version 2).
In this example, we rewrite the mult8 procedure of the last example by using the bit test and scan instructions. In the previous version, we used a loop (see lines 43-51) to test each bit. Since we are interested only in the 1 bits, we can use a bit scan instruction to do this job. The modified mult8 procedure is shown below.
 1: ;-----------------------------------------------------------
 2: ; mult8 multiplies two 8-bit unsigned numbers passed on
 3: ; to it in AL and BL. The 16-bit result is returned in AX.
 4: ; This procedure uses the SHL instruction to do the
 5: ; multiplication. All registers except AX and BX are preserved.
 6: ; Demonstrates the use of bit instructions BSF and BTC.
 7: ;-----------------------------------------------------------
 8: mult8:
 9:       push    CX              ; save registers
10:       push    DX
11:       push    SI
12:       xor     DX,DX           ; DX = 0 (keeps mult. result)
13:       mov     SI,AX           ; save original number in SI
14: repeat1:
15:       bsf     CX,BX           ; CX = first 1 bit position
16:       jz      skip1           ; if ZF=1, no 1 bit in BX
17:       mov     AX,SI           ; else, AX = number1*bit weight
18:       shl     AX,CL
19:       add     DX,AX           ; update running total in DX
20:       btc     BX,CX           ; complement the bit found by BSF
21:       jmp     repeat1
22: skip1:
23:       mov     AX,DX           ; move final result into AX
24:       pop     SI              ; restore registers
25:       pop     DX
26:       pop     CX
27:       ret

The modified loop (lines 14-21) replaces the loop in the previous version. This code is more efficient because the number of times the loop iterates is equal to the number of 1s in BX. The previous version, on the other hand, always iterates seven times. Also note that we can replace the btc instruction on line 20 by a btr instruction. Similarly, the bsf instruction on line 15 can be replaced by a bsr instruction, because the order in which the 1 bits are processed does not affect the sum.
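A C sketch of the bit-scan version, assuming the multiplier fits in 8 bits. The function name mult8_bsf is hypothetical, and bsf is emulated by a loop that finds the lowest 1 bit; the pass counter makes the "one iteration per 1 bit" property visible.

```c
#include <assert.h>
#include <stdint.h>

/* Models the BSF/BTC version of mult8: each pass finds the lowest
   1 bit of the multiplier, adds the shifted multiplicand, and then
   complements (clears) that bit. *iterations reports the passes. */
static uint16_t mult8_bsf(uint8_t number1, uint8_t number2,
                          unsigned *iterations)
{
    uint16_t result = 0;
    unsigned passes = 0;
    while (number2 != 0) {                       /* bsf sets ZF when 0 */
        unsigned pos = 0;
        while (((number2 >> pos) & 1) == 0)      /* bsf: lowest 1 bit */
            pos++;
        result += (uint16_t)(number1 << pos);    /* number1*bit weight */
        number2 ^= (uint8_t)(1u << pos);         /* btc: flip that bit */
        passes++;
    }
    *iterations = passes;
    return result;
}
```

For a multiplier of 10 (two 1 bits) the loop runs twice, whereas the rol version always makes its seven passes plus the final bit test.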
•

Summary
We discussed logical, shift, and rotate instructions available in the assembly language. Logical instructions are useful to implement bitwise logical operators and Boolean expressions. However, in some instances Boolean expressions can also be implemented by using conditional jump instructions without using the logical instructions.
Shift and rotate instructions provide flexibility to bit manipulation operations. There are two types of shift instructions: one works on logical and unsigned values, and the other is meant for signed values. There are also two types of rotate instructions: rotate without carry and rotate through carry. Rotate through carry is useful in shifting multiword data.
The instruction set also provides two double-shift instructions that work on either word or doubleword operands. In addition, four instructions for testing and modifying bits and two instructions to scan for a bit are available.
We discussed how the logical and shift instructions are used to implement logical expressions and bitwise logical operations in high-level languages. Shift instructions can be used to multiply or divide by a number that is a power of 2. We have demonstrated that, for such arithmetic operations, the shift instructions are much more efficient than the corresponding arithmetic instructions.

PART VI
Advanced Assembly Language

17
String Processing
String manipulation is an important aspect of any programming task. Strings are represented in a
variety of ways. We start the chapter with a discussion of the two representation schemes used to
store strings. The IA-32 instruction set supports string processing by a special set of instructions.
We describe these instructions in detail. Several examples are presented to illustrate the use of
string instructions in developing procedures for string processing. We also describe a program to
test the procedures developed here. A novelty of this program is that it demonstrates the use of
indirect procedure calls. Even though these instructions are called string instructions, they can be
used for processing other types of data. We demonstrate this aspect by means of an example. The
chapter concludes with a summary.

String Representation
A string can be represented either as a fixed-length string or as a variable-length string. In the
fixed-length representation, each string occupies exactly the same number of character positions.
That is, each string has the same length, where the length of a string refers to the number of
characters in the string. In this representation, if a string has fewer characters, it is extended by
padding, for example, with blank characters. On the other hand, if a string has more characters, it
is usually truncated to fit the storage space available.
Clearly, if we want to avoid truncation of larger strings, we need to fix the string length carefully so that it can accommodate the largest string that the program will ever handle. A potential problem with this representation is that we have to anticipate this value in advance, which may cause difficulties with program maintenance. A further disadvantage of using fixed-length representation is that memory space is wasted if the majority of the strings are shorter than the length used.
The variable-length representation avoids these problems. In this scheme, a string can have as
many characters as required (usually, within some system-imposed limit). Associated with each
string, there is a string length attribute giving the number of characters in the string. This length
attribute is given in one of two ways:
1. Explicitly storing string length, or
2. Using a sentinel character.
These two methods are discussed next.


Explicitly Storing String Length

In this method, the string length attribute is explicitly stored along with the string, as shown in the following example:

string    DB   'Error message'
str_len   DW   $-string

where $ is the location counter symbol that represents the current value of the location counter. In this example, $ points to the byte after the last character of string. Therefore,

$-string

gives the length of the string. Of course, we could also write

string    DB   'Error message'
str_len   DW   13

However, if we modify the contents of string later, we have to update the string length value as well. On the other hand, by using $-string, we let the assembler do the job for us at assembly time.
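For comparison, C offers a similar build-time length computation through sizeof. The identifiers in this sketch are chosen only to mirror the assembly example. A C string literal carries a terminating NUL that the NASM db directive above does not, so 1 is subtracted.

```c
#include <assert.h>
#include <string.h>

/* C counterpart of "str_len DW $-string": the compiler derives the
   length from the array size, so editing the text keeps it correct. */
static const char string[] = "Error message";
enum { STR_LEN = sizeof string - 1 };   /* drop the implicit NUL */
```

As with $-string, changing the literal updates STR_LEN automatically.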
Using a Sentinel Character

In this method, strings are stored with a trailing sentinel character to delimit a string. Therefore, there is no need to store the string length explicitly. The assumption here is that the sentinel character is a special character that does not appear within a string. We normally use a special, nonprintable character for this purpose. We have been using the ASCII NULL character (00H) to terminate strings. Such NULL-terminated strings are called ASCIIZ strings. Here are some examples:

string1   DB   'This is OK',0
string2   DB   'Price = $9.99',0

The C language, for example, uses this representation to store strings. In the remainder of this chapter, we use this representation for strings.

String Instructions
There are five main string-processing instructions. These can be used to copy a string, to compare two strings, and so on. It is important to note that these instructions are not just for strings. We can use them for other types of data. For example, we could use them to copy arrays of doublewords, as we shall see later. The five basic instructions are shown in Table 17.1.
Specifying Operands

As indicated, each string instruction may require a source operand, a destination operand, or both. For 32-bit segments, string instructions use the ESI and EDI registers to point to the source and destination operands, respectively. The source operand is assumed to be at DS:ESI in memory, and the destination operand at ES:EDI in memory. For 16-bit segments, the SI and DI registers are used instead of ESI and EDI. If both the operands are in the same data segment, we can let both DS and ES point to the data segment to use the string instructions.

Chapter 17 • String Processing

365

Table 17.1 String Instructions

Mnemonic   Meaning            Operand(s) required
LODS       LOaD String        source
STOS       STOre String       destination
MOVS       MOVe String        source & destination
CMPS       CoMPare Strings    source & destination
SCAS       SCAn String        destination

Variations

Each string instruction can operate on 8-, 16-, or 32-bit operands. As part of execution, string
instructions automatically update (i.e., increment or decrement) the index register(s) used by them.
For byte operands, source and destination index registers are updated by 1. These registers are
updated by 2 and 4 for word and doubleword operands, respectively. In this chapter, we focus
mostly on byte operand strings.
String instructions derive much of their power from the fact that they can accept a repetition
prefix to repeatedly execute the operation. These prefixes are discussed next. The direction of
string processing—forward or backward—is controlled by the direction flag (discussed later).
Repetition Prefixes

String instructions can be repeated by using a repetition prefix. As shown in Table 17.2, the three prefixes are divided into two categories: unconditional or conditional repetition. None of the flags is affected by these prefixes.
Table 17.2 Repetition Prefixes

unconditional repeat
    rep            REPeat
conditional repeat
    repe/repz      REPeat while Equal / REPeat while Zero
    repne/repnz    REPeat while Not Equal / REPeat while Not Zero

rep This is an unconditional repeat prefix and causes the instruction to repeat according to the value in the ECX register. Note that for 16-bit addresses, the CX register is used. The semantics of the rep prefix are

while (ECX ≠ 0)
    execute the string instruction;
    ECX := ECX-1;
end while

The ECX register is checked first, and the string instruction is executed only if ECX is not 0. Thus, if ECX is 0 to start with, the string instruction is not executed at all. This is in contrast to the loop instruction, which first decrements and then tests if ECX is 0. Thus, with loop, ECX = 0 results in the maximum number of iterations (ECX wraps around to 0FFFFFFFFH), and usually a jecxz check is needed.
repe/repz This is one of the two conditional repeat prefixes. Its operation is similar to that of rep except that the repetition is also conditional on the zero flag (ZF), as shown below:

while (ECX ≠ 0)
    execute the string instruction;
    ECX := ECX-1;
    if (ZF = 0)
    then
        exit loop
    end if
end while

The maximum number of times the string instruction is executed is determined by the contents of ECX, as in the rep prefix. But the actual number of times the instruction is repeated is determined by the status of ZF. Conditional repeat prefixes are useful with the cmps and scas string instructions.
repne/repnz This prefix is similar to the repe/repz prefix except that the condition tested is ZF = 1, as shown below:

while (ECX ≠ 0)
    execute the string instruction;
    ECX := ECX-1;
    if (ZF = 1)
    then
        exit loop
    end if
end while
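The three loop descriptions can be captured in one C model of a prefixed cmpsb, assuming DF = 0 and byte operands. The names rep_cmpsb and enum rep_prefix are hypothetical. ZF is modeled as a bool that is left untouched when ECX starts at 0, matching the point that the instruction is then not executed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum rep_prefix { REP, REPE, REPNE };

/* Follows the pseudocode: test ECX, execute cmpsb, decrement ECX,
   then (repe/repne only) test ZF. Returns how many times cmpsb ran;
   *ecx and *zf hold the final ECX and ZF values. */
static uint32_t rep_cmpsb(enum rep_prefix prefix,
                          const unsigned char *esi,
                          const unsigned char *edi,
                          uint32_t *ecx, bool *zf)
{
    uint32_t executed = 0;
    while (*ecx != 0) {                 /* while (ECX != 0) */
        *zf = (*esi++ == *edi++);       /* cmpsb: ZF = 1 if equal */
        (*ecx)--;                       /* ECX := ECX-1 */
        executed++;
        if (prefix == REPE && !*zf)     /* repe: exit when ZF = 0 */
            break;
        if (prefix == REPNE && *zf)     /* repne: exit when ZF = 1 */
            break;
    }
    return executed;
}
```

Comparing "abcdfghi" with "abcdefgh" under repe with ECX = 9 stops after the fifth comparison (f against e), leaving ZF = 0 and ECX = 4.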
Direction Flag

The direction of string operations depends on the value of the direction flag. Recall that this is one of the bits of the flags register (see Figure 4.4 on page 65). If the direction flag (DF) is clear (i.e., DF = 0), string operations proceed in the forward direction (from head to tail of a string); otherwise, string processing is done in the opposite direction.
Two instructions are available to explicitly manipulate the direction flag:

std     ; set direction flag (DF = 1)
cld     ; clear direction flag (DF = 0)


Neither instruction requires an operand, and each is encoded in a single byte.
Usually, it does not matter whether a string is processed in the forward or backward direction. For sentinel character-terminated strings, the forward direction is preferred. However, there are situations where one particular direction is mandatory. For example, if we want to shift a string right by one position, we have to start with the tail and proceed toward the head (i.e., move backward), as in the following example.

Initial string       a  b  c  0  ?
After one shift      a  b  c  0  0
After two shifts     a  b  c  c  0
After three shifts   a  b  b  c  0
Final string         a  a  b  c  0
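A C sketch of the backward shift shown above; shift_right_backward is a hypothetical name. Copying from the tail toward the head moves each character before its old position is overwritten. Copying in the forward direction instead would propagate the first character into every position.

```c
#include <assert.h>
#include <string.h>

/* Shifts a NUL-terminated string right by one position in place,
   copying from tail to head (the DF = 1 direction). buf must have
   room for strlen(buf) + 2 bytes. buf[0] is left unchanged. */
static void shift_right_backward(char *buf)
{
    size_t n = strlen(buf) + 1;        /* include the NUL */
    for (size_t i = n; i > 0; i--)     /* from tail toward head */
        buf[i] = buf[i - 1];
}
```

Applied to "abc", the steps match the table: the NUL moves first, then c, b, and a, producing "aabc".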

String Move Instructions

There are three basic instructions in this group: movs, lods, and stos. Each instruction can take one of four forms. We start our discussion with the first instruction.
Move a String (movs) The format of the movs instruction is:

movs    dest_string,source_string
movsb
movsw
movsd

Using the first form, we can specify the source and destination strings. This specification is sufficient to determine whether it is a byte, word, or doubleword operand. However, this form is not used frequently.
In the other three forms, the suffix b, w, or d is used to indicate byte, word, or doubleword operands. This format applies to all the string instructions of this chapter.
The movs instruction is used to copy a value (byte, word, or doubleword) from the source string to the destination string. As mentioned earlier, the source string value is pointed to by DS:ESI and the destination string location is indicated by ES:EDI in memory. After copying, the ESI and EDI registers are updated according to the value of the direction flag and the operand size. Thus, before executing the movs instruction, all four registers should be set up appropriately. (This is necessary even if you use the first format.) Note that our focus is on 32-bit segments. For 16-bit segments, we use the SI and DI registers.
movsb - move a byte string
    (ES:EDI) := (DS:ESI)    ; copy a byte
    if (DF = 0)             ; forward direction
    then
        ESI := ESI+1
        EDI := EDI+1
    else                    ; backward direction
        ESI := ESI-1
        EDI := EDI-1
    end if
Flags affected: none
For word and doubleword operands, the index registers are updated by 2 and 4, respectively.
This instruction, along with the rep prefix, is useful to copy a string. More generally, we can use them to perform memory-to-memory block transfers. Here is an example that copies string1 to string2.

.DATA
string1  db    'The original string',0
strLen   EQU   $-string1
.UDATA
string2  resb  80
.CODE
         .STARTUP
         mov   ECX,strLen      ; strLen includes NULL
         mov   ESI,string1
         mov   EDI,string2
         cld                   ; forward direction
         rep   movsb

Since the movs instruction does not change any of the flags, conditional repeat (repe or repne) should not be used with this instruction.
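The following C model of rep movsb shows where ESI and EDI end up after a copy in either direction. It assumes a flat memory array (mem) addressed by 32-bit offsets that stand in for ESI and EDI; the function name rep_movsb is hypothetical. Unsigned offsets wrap around just as the 32-bit registers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Models "rep movsb": copies *ecx bytes from mem[*esi] to
   mem[*edi], stepping both offsets by +1 (DF = 0) or -1 (DF = 1),
   and leaves the registers where the instruction would. */
static void rep_movsb(unsigned char mem[], uint32_t *esi,
                      uint32_t *edi, uint32_t *ecx, bool df)
{
    while (*ecx != 0) {
        mem[*edi] = mem[*esi];          /* (ES:EDI) := (DS:ESI) */
        if (!df) { (*esi)++; (*edi)++; }
        else     { (*esi)--; (*edi)--; }
        (*ecx)--;
    }
}
```

With DF = 1 and overlapping ranges, the same routine performs the right shift illustrated earlier: copying 4 bytes backward from offset 3 to offset 4 turns "abc" into "aabc".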
Load a String (lods) This instruction copies the value from the source string (pointed to by DS:ESI) in memory to AL (for byte operands, lodsb), AX (for word operands, lodsw), or EAX (for doubleword operands, lodsd).
lodsb - load a byte string
    AL := (DS:ESI)          ; copy a byte
    if (DF = 0)             ; forward direction
    then
        ESI := ESI+1
    else                    ; backward direction
        ESI := ESI-1
    end if
Flags affected: none
Use of the rep prefix does not make sense, as it would leave only the last value in AL, AX, or EAX. This instruction, along with the stos instruction, is often used when processing is required while copying a string. This point is elaborated after we describe the stos instruction.
Store a String (stos) This instruction performs the complementary operation. It copies the value in AL (for stosb), AX (for stosw), or EAX (for stosd) to the destination string (pointed to by ES:EDI) in memory.


stosb - store a byte string
    (ES:EDI) := AL          ; copy a byte
    if (DF = 0)             ; forward direction
    then
        EDI := EDI+1
    else                    ; backward direction
        EDI := EDI-1
    end if
Flags affected: none
We can use the rep prefix with the stos instruction if our intention is to initialize a block of memory with a specific character, word, or doubleword value. For example, the following code initializes array1 with -1.

.UDATA
array1   resw  100
.CODE
         .STARTUP
         mov   ECX,100
         mov   EDI,array1
         mov   AX,-1
         cld                   ; forward direction
         rep   stosw
Apart from such initialization, the rep prefix is not useful with the lods and stos instructions. These two instructions are often used in a loop to do value conversions while copying data. For example, if string1 only contains letters and blanks, the following code

         mov   ECX,strLen-1    ; characters only (exclude NULL)
         mov   ESI,string1
         mov   EDI,string2
         cld                   ; forward direction
loop1:
         lodsb
         or    AL,20H
         stosb
         loop  loop1
         movsb                 ; copy the NULL unchanged
done:

can convert it to a lowercase string. Note that blank characters are not affected because 20H represents blank in ASCII, and the

         or    AL,20H

instruction does not have any effect on it. The NULL character, however, would be changed to a blank, which is why it is excluded from the loop count and copied separately. The advantage of lods and stos is that they automatically update (increment or decrement, depending on DF) the ESI and EDI registers.
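A C sketch of the lodsb / or AL,20H / stosb loop, assuming ASCII input that contains only letters and blanks. The function name to_lower_copy is hypothetical. Because or AL,20H would turn a NULL (00H) into a blank (20H), the sketch excludes the NULL from the count and copies it unchanged afterward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src to dst, setting bit 5 of every character on the way:
   this lowercases ASCII letters and leaves blanks (20H) as is. */
static void to_lower_copy(char *dst, const char *src)
{
    size_t ecx = strlen(src);              /* count excludes the NULL */
    while (ecx != 0) {                     /* loop loop1 */
        unsigned char al = (unsigned char)*src++;   /* lodsb */
        al |= 0x20;                                 /* or AL,20H */
        *dst++ = (char)al;                          /* stosb */
        ecx--;
    }
    *dst = *src;                           /* copy the NULL unchanged */
}
```

For example, "Hello World" becomes "hello world"; characters other than letters and blanks are outside what this bit trick handles.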


String Compare Instruction
The cmps instruction can be used to compare two strings.
cmpsb - compare two byte strings
    Compare the two bytes at DS:ESI and ES:EDI and set flags
    if (DF = 0)             ; forward direction
    then
        ESI := ESI+1
        EDI := EDI+1
    else                    ; backward direction
        ESI := ESI-1
        EDI := EDI-1
    end if
Flags affected: As per cmp instruction
The cmps instruction compares the two bytes, words, or doublewords at DS:ESI and ES:EDI and sets the flags just like the cmp instruction. Like the cmp instruction, cmps performs

    (DS:ESI) - (ES:EDI)

and sets the flags according to the result. The result itself is not stored. We can use conditional jumps like ja, jg, jc, etc., to test the relationship of the two values. As usual, the ESI and EDI registers are updated according to the value of the direction flag and the operand size. The cmps instruction is typically used with the repe/repz or repne/repnz prefix.
The following code

.DATA
string1  db    'abcdfghi',0
strLen   EQU   $-string1
string2  db    'abcdefgh',0
.CODE
         .STARTUP
         mov   ECX,strLen
         mov   ESI,string1
         mov   EDI,string2
         cld                   ; forward direction
         repe  cmpsb           ; ESI points to g in string1 and EDI to f in string2
         dec   ESI
         dec   EDI

leaves ESI and EDI pointing to the first pair of characters that differ (f in string1 and e in string2). Then we can use, for example,

         ja    str1Above

to test if string1 is greater (in the collating sequence) than string2. This, of course, is true in this example. A more concrete example is given later (see the string comparison procedure on page 375).
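The effect of the forward repe cmpsb example, including the final dec ESI and dec EDI, can be modeled in C as a search for the first differing position. first_mismatch is a hypothetical name; *above reports the unsigned (ja-style) relationship of the two differing bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first position (within n bytes) where
   s1 and s2 differ, which is where ESI/EDI point after the dec
   instructions, or n if all n bytes match. *above is 1 when the
   differing byte of s1 is greater (unsigned), as ja would test. */
static size_t first_mismatch(const unsigned char *s1,
                             const unsigned char *s2,
                             size_t n, int *above)
{
    *above = 0;
    for (size_t i = 0; i < n; i++) {
        if (s1[i] != s2[i]) {             /* repe stops on ZF = 0 */
            *above = s1[i] > s2[i];
            return i;
        }
    }
    return n;
}
```

For the strings in the example, the first mismatch is at index 4 (f against e), and *above is 1.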
The repne/repnz prefix can be used to continue comparison as long as the comparison fails, and the loop terminates when a matching value is found. For example,


.DATA
string1  db    'abcdfghi',0
strLen   EQU   $-string1-1
string2  db    'abcdefgh',0
.CODE
         .STARTUP
         mov   ECX,strLen
         mov   ESI,string1 + strLen - 1
         mov   EDI,string2 + strLen - 1
         std                   ; backward direction
         repne cmpsb
         inc   ESI
         inc   EDI

leaves ESI and EDI pointing to the first character that matches in the backward direction (d, at offset 3 in both strings).
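Similarly, a C sketch of the backward repne cmpsb example; last_match_backward is a hypothetical name. It scans from index n-1 toward the head and returns the index where the inc ESI and inc EDI adjustment would leave the pointers, or -1 if no pair of bytes matches.

```c
#include <assert.h>
#include <stddef.h>

/* Scans n bytes backward (DF = 1) and returns the index of the
   first equal pair found, where repne cmpsb stops on ZF = 1. */
static long last_match_backward(const unsigned char *s1,
                                const unsigned char *s2, size_t n)
{
    for (size_t i = n; i > 0; i--) {
        if (s1[i - 1] == s2[i - 1])
            return (long)(i - 1);
    }
    return -1;
}
```

With strLen = 8, the pairs i/h, h/g, g/f, and f/e differ, so the scan stops at index 3, where both strings hold d.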
Scanning a String

The scas (scan string) instruction is useful in searching for a particular value or character in a string. The value should be in AL (for scasb), AX (for scasw), or EAX (for scasd), and ES:EDI should point to the string to be searched.
scasb - scan a byte string
    Compare AL to the byte at ES:EDI and set flags
    if (DF = 0)             ; forward direction
    then
        EDI := EDI+1
    else                    ; backward direction
        EDI := EDI-1
    end if
Flags affected: As per cmp instruction
As with the cmps instruction, the repe/repz or repne/repnz prefix can be used.

.DATA
string1  db    'abcdefgh',0
strLen   EQU   $-string1
.CODE
         .STARTUP
         mov   ECX,strLen
         mov   EDI,string1
         mov   AL,'e'          ; character to be searched for
         cld                   ; forward direction
         repne scasb
         dec   EDI

This program leaves the EDI register pointing to e in string1. The following example can be used to skip the initial blanks.
.DATA
string1  db    '   abc',0
strLen   EQU   $-string1
.CODE
         .STARTUP
         mov   ECX,strLen
         mov   EDI,string1
         mov   AL,' '          ; character to be searched
         cld                   ; forward direction
         repe  scasb
         dec   EDI

This program leaves the EDI register pointing to the first nonblank character in string1, which is a in our example.
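Both scasb examples can be modeled in C with a pair of hypothetical helpers, assuming DF = 0 and byte operands. Each scans at most ecx bytes and returns the index EDI would hold after the final dec EDI. Like the assembly code, they do not by themselves report whether the search succeeded; a caller would also need to check the remaining count or the last comparison.

```c
#include <assert.h>
#include <stddef.h>

/* repne scasb; dec EDI: stops at the first byte equal to al. */
static size_t scan_until_equal(const unsigned char *edi,
                               unsigned char al, size_t ecx)
{
    size_t i = 0;
    while (ecx != 0) {
        ecx--;
        if (edi[i++] == al)      /* ZF = 1 ends repne */
            break;
    }
    return i - 1;                /* dec EDI */
}

/* repe scasb; dec EDI: stops at the first byte different from al. */
static size_t scan_while_equal(const unsigned char *edi,
                               unsigned char al, size_t ecx)
{
    size_t i = 0;
    while (ecx != 0) {
        ecx--;
        if (edi[i++] != al)      /* ZF = 0 ends repe */
            break;
    }
    return i - 1;                /* dec EDI */
}
```

Searching "abcdefgh" for 'e' yields index 4, and skipping the blanks of "   abc" yields index 3, the position of a.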

Our First Program
The string instructions we have discussed so far are not restricted to string operations only. For example, they can be used for general-purpose memory-to-memory copy operations. To demonstrate
this aspect, we write a program to perform a memory-to-memory copy operation. In this program,
we copy the contents of a doubleword array to another array. Of course, we can do this without
the string instructions. Program 17.1 shows how this can be done using the string instructions.
Program 17.1 Memory-to-memory copy using the string instructions
 1: ;Memory-to-memory copy                     MEM_COPY.ASM
 2: ;
 3: ;        Objective: To demonstrate memory-to-memory copy
 4: ;                   using the string instructions.
 5: ;            Input: None.
 6: ;           Output: Outputs the copied array.
 7:
 8: %include "io.mac"
 9:
10: .DATA
11: in_array    dd   10,20,30,40,50,60,70,80,90,100
12: ARRAY_SIZE  EQU  ($-in_array)/4
13: out_msg     db   'The copied array is: ',0
14:
15: .UDATA
16: out_array   resd ARRAY_SIZE
17:
18: .CODE
19:       .STARTUP
20:       mov     ECX,ARRAY_SIZE      ; ECX = array size
21:       mov     ESI,in_array        ; ESI = in array pointer
22:       mov     EDI,out_array       ; EDI = out array pointer
23:       cld                         ; forward direction
24:       rep     movsd
25:
26:       PutStr  out_msg
27:       mov     ECX,ARRAY_SIZE
28:       mov     ESI,out_array
29: repeat1:
30:       lodsd
31:       PutLInt EAX
32:       nwln
33:       loop    repeat1
34:       .EXIT

This program's structure follows the example we have seen in Chapter 13 (see Example 13 on page 281). The source array (in_array) is initialized with 10 values, each a 32-bit value. The array size is determined on line 12 by using the predefined location counter symbol $. For a discussion of how the array size is computed, see Example 13 on page 281.
To copy the array, we store the array size in ECX (line 20) and the source and destination array pointers in the ESI and EDI registers, respectively (lines 21 and 22). Once these registers are set up, we clear the direction flag using cld on line 23. Copying of the array is done using the movsd instruction along with the rep prefix on line 24.
In operating systems that use segmentation provided by the IA-32 architecture, we have to make sure that the ES segment register points to the data segment. This, for example, can be done by the following code:

         mov   AX,DS
         mov   ES,AX

We have to resort to an indirect means to copy the DS contents to ES as

         mov   ES,DS

is not a valid instruction. Since the Linux operating system does not use segmentation and initializes the DS and ES registers to the same value, we don't need this code in our programs.
To display the contents of the destination array (out_array), we use the lodsd instruction, which loads the value into the EAX register. This value is displayed using PutLInt on line 31. We cannot use the rep prefix with the lodsd instruction as we need to display each value. Instead, we use a loop to display the array values.

Illustrative Examples
We now give some examples to illustrate the use of the string instructions discussed in this chapter. These procedures, along with several others, are available in the string.asm file. These procedures receive their parameters via the stack. The pointer to a string is received in segment:offset form. A string pointer is loaded into either DS and ESI or ES and EDI using the lds or les instruction, the details of which are discussed next.
LDS and LES Instructions The syntax of these instructions is

lds    register,source
les    register,source

where register is a 32-bit general-purpose register, and source is a pointer to a 48-bit memory operand. The instructions perform the following actions:

lds    register := (source)
       DS := (source+4)
les    register := (source)
       ES := (source+4)

The 32-bit value at source in memory is copied to register and the next 16-bit value (i.e., at source+4) is copied to the DS or ES register. Neither instruction affects the flags. By specifying ESI as the register operand, lds can be conveniently used to set up a source string. Similarly, a destination string can be set up by specifying EDI with les. For completeness, you should note that the lfs, lgs, and lss instructions are available to load the other segment registers.
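A small C model of what lds and les read from the 48-bit memory operand, assuming the little-endian byte order of IA-32. The struct far_ptr and the function load_far_ptr are hypothetical names. This layout (offset at the lower address, segment value 4 bytes above it) is what results when a caller pushes ES and then EDI, as str_cmp does before calling str_len.

```c
#include <assert.h>
#include <stdint.h>

struct far_ptr {
    uint32_t offset;     /* goes to the named register (ESI or EDI) */
    uint16_t selector;   /* goes to DS (lds) or ES (les) */
};

/* Decodes the 6 bytes at source the way lds/les interpret them. */
static struct far_ptr load_far_ptr(const uint8_t source[6])
{
    struct far_ptr p;
    p.offset = (uint32_t)source[0]
             | ((uint32_t)source[1] << 8)
             | ((uint32_t)source[2] << 16)
             | ((uint32_t)source[3] << 24);        /* register := (source) */
    p.selector = (uint16_t)(source[4] | (source[5] << 8)); /* (source+4) */
    return p;
}
```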
Examples

We will next present two simple string processing procedures. These functions are available in high-level languages such as C. All procedures use the carry flag (CF) to report an input error (not a string). This error is reported if the input passed is not a string whose length is less than the STR_MAX constant defined in string.asm. The carry flag is set (i.e., CF = 1) if there is an input error; otherwise, the carry flag is cleared.
The following constants are defined in string.asm:

STR_MAX   EQU      128
%define   STRING1  [EBP+8]
%define   STRING2  [EBP+16]

Example 17.1 String length procedure to return the length of string1.
String length is the number of characters in a string, excluding the NULL character. We use the scasb instruction and search for the NULL character. Since scasb works on the destination string, les is used to load the string pointer to the ES and EDI registers from the stack. STR_MAX, the maximum length of a string, is moved into ECX, and the NULL character (i.e., 0) is moved into the AL register. The direction flag is cleared to initiate a forward search. The string length is obtained by taking the difference between the end of the string (pointed to by EDI) and the start of the string available at [EBP+8]. The EAX register is used to return the string length value. This procedure is similar to the C function strlen.

;-----------------------------------------------------------
; String length procedure. Receives a string pointer
; (seg:offset) via the stack. If not a string, CF is set;
; otherwise, string length is returned in EAX with CF = 0.
; Preserves all registers except EAX.
;-----------------------------------------------------------
str_len:
      enter   0,0
      push    ECX
      push    EDI
      push    ES
      les     EDI,STRING1     ; copy string pointer to ES:EDI
      mov     ECX,STR_MAX     ; need to terminate loop if EDI
                              ;  is not pointing to a string
      cld                     ; forward search
      mov     AL,0            ; NULL character
      repne   scasb
      jecxz   sl_no_string    ; if ECX = 0, not a string
      dec     EDI             ; back up to point to NULL
      mov     EAX,EDI
      sub     EAX,[EBP+8]     ; string length in EAX
      clc                     ; no error
      jmp     SHORT sl_done
sl_no_string:
      stc                     ; carry set => no string
sl_done:
      pop     ES
      pop     EDI
      pop     ECX
      leave
      ret     8               ; clear stack and return
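The control flow of str_len can be traced with a C model that mirrors its register usage. The C function str_len here is written only for illustration, with STR_MAX set to 128 as in string.asm, and a true return value standing in for CF = 1. Tracing it shows that a NULL found in the last of the STR_MAX bytes also leaves ECX at 0, so the longest string this check accepts has STR_MAX-2 = 126 characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STR_MAX 128

/* Returns true (CF = 1) if no NULL is found while ECX is nonzero;
   otherwise stores the length (excluding NULL) and returns false. */
static bool str_len(const char *string1, uint32_t *len)
{
    uint32_t ecx = STR_MAX;
    const char *edi = string1;
    while (ecx != 0) {                  /* repne scasb with AL = 0 */
        bool zf = (*edi == '\0');
        edi++;
        ecx--;
        if (zf)
            break;
    }
    if (ecx == 0)                       /* jecxz sl_no_string */
        return true;
    edi--;                              /* back up to point to NULL */
    *len = (uint32_t)(edi - string1);   /* sub EAX,[EBP+8] */
    return false;
}
```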

Example 17.2 String compare procedure to compare two strings.
This function uses the cmpsb instruction to compare two strings. It returns in EAX a negative value if string1 is lexicographically less than string2, 0 if string1 is equal to string2, and a positive value if string1 is lexicographically greater than string2.
To implement this procedure, we have to find the first occurrence of a character mismatch between the corresponding characters in the two strings (when scanning strings from left to right). The relationship between the strings is the same as that between these two differing characters. When we include the NULL character in this comparison, this algorithm works correctly even when the two strings are of different length.
The str_cmp procedure finds the length of string2 using the str_len procedure. It does not really matter whether we find the length of string2 or string1. We use this value (plus one to include NULL) to control the number of times the cmpsb instruction is repeated. Conditional jump instructions are used to test the relationship between the differing characters to return an appropriate value in the EAX register. The corresponding function in C is strcmp, which can be invoked by strcmp(string1,string2). This function also returns the same values (negative, 0, or positive value) depending on the comparison.

;-----------------------------------------------------------
; String compare procedure. Receives two string pointers
; (seg:offset) via the stack - string1 and string2.
; If string2 is not a string, CF is set;
; otherwise, string1 and string2 are compared and a
; value is returned in EAX with CF = 0 as shown below:
;    EAX = negative value if string1 < string2
;    EAX = zero           if string1 = string2
;    EAX = positive value if string1 > string2
; Preserves all registers except EAX.
;-----------------------------------------------------------
str_cmp:
      enter   0,0
      push    ECX
      push    EDI
      push    ESI
      push    DS
      push    ES
      ; find string length first
      les     EDI,STRING2     ; string2 pointer
      push    ES
      push    EDI
      call    str_len
      jc      sm_no_string
      mov     ECX,EAX         ; string2 length in ECX
      inc     ECX             ; add 1 to include NULL
      lds     ESI,STRING1     ; string1 pointer
      cld                     ; forward search
      repe    cmpsb
      je      same
      ja      above
below:
      mov     EAX,-1          ; EAX = -1 => string1 < string2
      clc
      jmp     SHORT sm_done
same:
      xor     EAX,EAX         ; EAX = 0 => string match
      clc
      jmp     SHORT sm_done
above:
      mov     EAX,1           ; EAX = 1 => string1 > string2
      clc
      jmp     SHORT sm_done
sm_no_string:
      stc                     ; carry set => no string
sm_done:
      pop     ES
      pop     DS
      pop     ESI
      pop     EDI
      pop     ECX
      leave
      ret     16              ; clear stack and return

In addition to these two functions, several other string processing functions such as string copy
and string concatenate are available in the string.asm file.

Testing String Procedures
Now let us turn our attention to testing the string procedures developed in the last section. A
partial listing of this program is given in Program 17.2. The full program can be found in the
str_test.asm file.
Our main interest in this section is to show how an indirect procedure call can substantially simplify calling the appropriate procedure according to the user request. Let us first look
at the indirect call instruction for 32-bit segments.


Program 17.2 Part of string test program str_test.asm

.DATA
proc_ptr_table   dd   str_len_fun,str_cpy_fun,str_cat_fun
                 dd   str_cmp_fun,str_chr_fun,str_cnv_fun
MAX_FUNCTIONS    EQU  ($ - proc_ptr_table)/4

choice_prompt    db   'You can test several functions.',CR,LF
                 db   '             To test         enter',CR,LF
                 db   'String length                  1',CR,LF
                 db   'String copy                    2',CR,LF
                 db   'String concatenate             3',CR,LF
                 db   'String compare                 4',CR,LF
                 db   'Locate character               5',CR,LF
                 db   'Convert string                 6',CR,LF
                 db   'Invalid response terminates program.',CR,LF
                 db   'Please enter your choice: ',0

.UDATA
string1          resb  STR_MAX
string2          resb  STR_MAX

.CODE
      .STARTUP
query_choice:
      xor     EBX,EBX
      PutStr  choice_prompt          ; display menu
      GetCh   BL                     ; read response
      sub     BL,'1'
      cmp     BL,0
      jb      invalid_response
      cmp     BL,MAX_FUNCTIONS
      jb      response_ok
invalid_response:
      PutStr  invalid_choice
      nwln
      jmp     SHORT done
response_ok:
      shl     EBX,2                  ; multiply EBX by 4
      call    [proc_ptr_table+EBX]   ; indirect call
      jmp     query_choice
done:
      .EXIT

Indirect Procedure Call

In our discussions so far, we have been using only direct procedure calls, where the offset of
the target procedure is provided directly. Recall that, even though we write only the procedure
name, the assembler generates the appropriate offset value at assembly time.
In indirect procedure calls, this offset is given with one level of indirection. That is, the call
instruction specifies either a memory location (through a label) or a 32-bit general-purpose
register. The actual offset of the target procedure is obtained from the memory doubleword or the register
referenced in the call instruction. For example, we could use

      call    EBX

if EBX contains the offset of the target procedure. As part of executing this call instruction,
the contents of the EBX register are loaded into EIP to transfer control to the target procedure.
Similarly, we can use

      call    [target_proc_ptr]

if the doubleword at target_proc_ptr contains the offset of the target procedure. As we have
seen in Chapter 15, jmp is another instruction that can be used for indirect jumps in exactly
the same way as the indirect call.
Back to the Example We maintain a procedure pointer table proc_ptr_table to facilitate
calling the appropriate procedure. The user's response is used as an index into this table to get
the target procedure offset, and the EBX register holds this index. Because each table entry is a
4-byte offset, the response is first converted to a zero-based value (sub BL,'1') and then multiplied
by 4 (shl EBX,2). The instruction

      call    [proc_ptr_table+EBX]

causes the indirect procedure call. The rest of the program is straightforward to follow.

Summary
We started this chapter with a brief discussion of various string representation schemes. Strings
can be represented as either fixed-length or variable-length. Each representation has advantages
and disadvantages. Variable-length strings can be stored either by explicitly storing the string
length or by using a sentinel character to terminate the string. High-level programming languages
like C use the NULL-terminated storage representation for strings. We have also used the same
representation to store strings.
There are five basic string instructions: movs, lods, stos, cmps, and scas. Each of
these instructions can work on byte, word, or doubleword operands. These instructions do not
require the specification of any operands. Instead, the required operands are assumed to be at
DS:ESI and/or ES:EDI for 32-bit segments. For 16-bit segments, SI and DI registers are used
instead of the ESI and EDI registers, respectively. In addition, the direction flag is used to control
the direction of string processing (forward or backward). Efficient code can be generated by
combining string instructions with the repeat prefixes. Three repeat prefixes are provided: rep,
repe/repz, and repne/repnz.
We also demonstrated, by means of an example, how indirect procedure calls can be used.
Indirect procedure calls give us a powerful mechanism by which, for example, we can pass a
procedure to be executed as an argument using the standard parameter passing mechanisms.

18
ASCII and BCD Arithmetic
In the previous chapters, we used the binary representation and discussed several instructions that
operate on binary data. In this chapter, we present two alternative representations, ASCII and
BCD, that avoid or reduce the conversion overhead. We start this chapter with a brief introduction
to these two representations. The next two sections discuss how arithmetic operations can be done
in these two representations.
While the ASCII and BCD representations avoid or reduce the conversion overhead, processing
numbers in these two representations is slower than in the binary representation. This inherent
tradeoff between conversion overhead and processing overhead among the three representations is
explored toward the end of the chapter. The chapter ends with a summary.

Introduction
We normally represent the numeric data in the binary system. We have discussed several arithmetic
instructions that operate on such data. The binary representation is used internally for manipulation
(e.g., arithmetic and logical operations).
When numbers are entered from the keyboard or displayed, they are in the ASCII form. Thus,
it is necessary to convert numbers from ASCII to binary at the input end; we have to convert from
binary to ASCII to output results as shown below:
Input data (in ASCII) -> ASCII to binary conversion -> Process in binary -> Binary to ASCII conversion -> Output data (in ASCII)

We used GetInt/GetLInt and PutInt/PutLInt to perform these two conversions, respectively. These conversions represent an overhead, but we can process numbers much more
efficiently in the binary form.


In some applications where processing of numbers is quite simple (for example, a single addition), the overhead associated with the two conversions might not be justified. In this case, it is
probably more efficient to process numbers in the decimal form.
Another reason for processing numbers in decimal form is that we can use as many digits as
necessary, and we can control round-off errors. This is important when representing dollars
and cents for financial records.
Decimal numbers can be represented in one of two forms: ASCII or binary-coded-decimal
(BCD). These two representations are discussed next.
ASCII Representation

In this representation, numbers are stored as strings of ASCII characters. For example, 1234 is
represented as
31 32 33 34H
where 31H is the ASCII code for 1, 32H for 2, and so on. As you can see, arithmetic on decimal numbers
represented in the ASCII form requires special care. There are two instructions to handle these
numbers:
aaa: ASCII adjust after addition
aas: ASCII adjust after subtraction
We discuss these two instructions after introducing the BCD representation.
BCD Representation

There are two types of BCD representation: unpacked BCD and packed BCD. In the unpacked
BCD representation, each digit is stored in a byte, while two digits are packed into a byte in the
packed representation.
Unpacked BCD This representation is similar to the ASCII representation except that each byte
stores the binary equivalent of a decimal digit. Note that the ASCII codes for digits 0 through
9 are 30H through 39H. Thus, if we mask off the upper four bits, we get the unpacked BCD
representation. For example, 1234 is stored in this representation as
01 02 03 04H
We deal with only positive numbers in this chapter. Thus, there is no need to represent the sign.
But if a sign representation is needed, an additional byte can be used for the sign. The number is
positive if this byte is 00H and negative if 80H.
There are two instructions to handle these numbers:
aam: ASCII adjust after multiplication
aad: ASCII adjust before division
Since this representation is similar to the ASCII representation, all four instructions (aaa, aas,
aam, and aad) can be used with the ASCII and unpacked BCD representations.


Packed BCD In the last two representations, each digit of a decimal number is stored in a byte.
The upper four bits of each byte contain redundant information. In packed BCD representation,
each digit is stored using only four bits. Thus, two decimal digits can be packed into a byte. This
reduces the memory requirement by half compared to the other two representations. For example,
the decimal number 1234 is stored in the packed BCD representation as
12 34H
which requires only two bytes as opposed to four in the other two representations. There are only
two instructions that support addition and subtraction of packed BCD numbers:
daa: decimal adjust after addition
das: decimal adjust after subtraction
There is no support for multiplication or division operations. These two instructions are discussed later.

Processing in ASCII Representation
As mentioned before, four instructions are available to process numbers in the ASCII representation:
aaa: ASCII adjust after addition
aas: ASCII adjust after subtraction
aam: ASCII adjust after multiplication
aad: ASCII adjust before division
These instructions do not take any operands. They assume that the required operand is in the AL
register.
ASCII Addition

To understand the need for the aaa instruction, look at the next two examples.
Example 18.1 An ASCII addition example.
Consider adding two ASCII numbers 4 (34H) and 5 (35H).
34H = 00110100B
35H = 00110101B
69H = 01101001B
The sum 69H is not correct. The correct value should be 09H in unpacked BCD representation. In
this example, we get the right answer by setting the upper four bits to 0. This scheme, however,
does not work in cases where the result digit is greater than 9, as shown in the next example.
•
Example 18.2 Another ASCII addition example.
In this example, consider the addition of two ASCII numbers, 6 (36H) and 7 (37H).
36H = 00110110B
37H = 00110111B
6DH = 01101101B


Again, the sum 6DH is incorrect. We would expect the sum to be 13 (01 03H). In this case, we ignore
the upper digit 6 as in the last example, but we also have to add 6 to D to get 13 (DH + 6 = 13H).
We add 6 because that is the difference between the bases of the hex and decimal number systems.
•
The aaa instruction performs these adjustments. This instruction is used after performing an
addition operation by using either an add or adc instruction. The resulting sum in AL is adjusted
to unpacked BCD representation. The aaa instruction works as follows.
1. If the least significant four bits of AL are greater than 9 or if the auxiliary flag is set, it adds
6 to AL and 1 to AH. Both CF and AF are set. Otherwise, CF and AF are cleared.
2. In all cases, the most significant four bits of AL are cleared (i.e., zeroed).
Here is an example that illustrates the use of the aaa instruction.
Example 18.3 A typical use of the aaa instruction.
      sub     AH,AH      ; clear AH
      mov     AL,'6'     ; AL = 36H
      add     AL,'7'     ; AL = 36H+37H = 6DH
      aaa                ; AX = 0103H
      or      AL,30H     ; AL = 33H

To convert the result in AL to an ASCII result, we have to insert 3 into the upper four bits of the
AL register.
•
To add multidigit decimal numbers, we have to use a loop that adds one digit at a time starting
from the rightmost digit. Program 18.1 shows how the addition of two 10-digit decimal numbers
is done in ASCII representation.
ASCII Subtraction

The aas instruction is used to adjust the result of a subtraction operation (sub or sbb) and works
like aaa. The actions taken by aas are
1. If the least significant four bits of AL are greater than 9 or if the auxiliary flag is set, it
subtracts 6 from AL and 1 from AH. Both CF and AF are set. Otherwise, CF and AF are cleared.
2. In all cases, the most significant four bits of AL are cleared (i.e., zeroed).
It is straightforward to see that the adjustment is needed only when the result is negative, as shown
in the following examples.
Example 18.4 ASCII subtraction (positive result).
      sub     AH,AH      ; clear AH
      mov     AL,'9'     ; AL = 39H
      sub     AL,'3'     ; AL = 39H-33H = 06H
      aas                ; AX = 0006H
      or      AL,30H     ; AL = 36H

Notice that the aas instruction does not change the contents of the AL register, as the result is a
positive number.
•


Example 18.5 ASCII subtraction (negative result).
      sub     AH,AH      ; clear AH
      mov     AL,'3'     ; AL = 33H
      sub     AL,'9'     ; AL = 33H-39H = FAH
      aas                ; AX = FF04H
      or      AL,30H     ; AL = 34H

The digit in AL is the result after borrowing 10 (13 - 9 = 4); the aas instruction sets the carry
flag to indicate that a borrow has been generated.
•
Is the last result, FF04H, generated by aas useful? It is when you consider multidigit subtraction. For example, if we are subtracting 29 from 53 (i.e., 53 - 29), the first loop iteration performs
3 - 9 as in the last example. This gives us the result 4 in AL, and the carry flag is set. Next we
perform 5 - 2 using sbb to include the borrow generated by the previous subtraction. This leaves
2 as the result. After ORing with 30H, we will have 32 34H, which is the correct answer (24).
ASCII Multiplication

The aam instruction is used to adjust the result of a mul instruction. Unlike addition and subtraction, multiplication should not be performed on ASCII numbers but on unpacked BCD numbers.
The aam instruction works as follows: AL is divided by 10, and the quotient is stored in AH and the remainder
in AL.
Example 18.6 ASCII multiplication.
      mov     AL,3       ; multiplier in unpacked BCD form
      mov     BL,9       ; multiplicand in unpacked BCD form
      mul     BL         ; result 001BH is in AX
      aam                ; AX = 0207H
      or      AX,3030H   ; AX = 3237H

Notice that the multiplication should be done using unpacked BCD numbers, not ASCII numbers! If the digits in AL and BL are in ASCII as in the following code, we have to mask off the
upper four bits.
      mov     AL,'3'     ; multiplier in ASCII
      mov     BL,'9'     ; multiplicand in ASCII
      and     AL,0FH     ; multiplier in unpacked BCD form
      and     BL,0FH     ; multiplicand in unpacked BCD form
      mul     BL         ; result 001BH is in AX
      aam                ; AX = 0207H
      or      AL,30H     ; AL = 37H

The aam instruction works only with the mul instruction, not with the imul instruction.

•

ASCII Division

The aad instruction adjusts the numerator in AX before dividing two unpacked decimal numbers.
The denominator has to be a single byte unpacked decimal number. The aad instruction multiplies
AH by 10 and adds it to AL and sets AH to zero. For example, if AX = 0207H before aad, AX
changes to 001BH after executing aad. As you can see from the last example, aad reverses the
operation of aam.


Example 18.7 ASCII division.
Consider dividing 27 by 5.
      mov     AX,0207H   ; dividend in unpacked BCD form
      mov     BL,05H     ; divisor in unpacked BCD form
      aad                ; AX = 001BH
      div     BL         ; AX = 0205H

The aad instruction converts the unpacked BCD number in AX to the binary form so that div
can be used. The div instruction leaves the quotient (05H) in the AL register and the remainder
(02H) in the AH register.
•

Our First Program
As our first example of the chapter, let us see how we can perform multidigit ASCII addition.
Addition of multidigit numbers in the ASCII representation is done one digit at a time starting
with the rightmost digit. To illustrate the process involved, we discuss how addition of two 10-digit numbers is done (see the program listing below).
Program 18.1 ASCII addition of two 10-digit numbers

 1  ;Addition of two integers in ASCII form        ASCIIADD.ASM
 2  ;
 3  ;      Objective: To demonstrate addition of two integers
 4  ;                 in the ASCII representation.
 5  ;          Input: None.
 6  ;         Output: Displays the sum.
 7  %include "io.mac"
 8
 9  .DATA
10  sum_msg   db   'The sum is: ',0
11  number1   db   '1234567890'
12  number2   db   '1098765432'
13  sum       db   '          ',0     ; add NULL char. to use PutStr
14
15  .CODE
16        .STARTUP
17        ; ESI is used as index into number1, number2, and sum
18        mov     ESI,9              ; ESI points to rightmost digit
19        mov     ECX,10             ; iteration count (# of digits)
20        clc                        ; clear carry (we use ADC not ADD)
21  add_loop:
22        mov     AL,[number1+ESI]
23        adc     AL,[number2+ESI]
24        aaa                        ; ASCII adjust
25        pushf                      ; save flags because OR
26        or      AL,30H             ; changes CF that we need
27        popf                       ; in the next iteration
28        mov     [sum+ESI],AL       ; store the sum byte
29        dec     ESI                ; update ESI
30        loop    add_loop
31        PutStr  sum_msg            ; display sum
32        PutStr  sum
33        nwln
34        .EXIT

The program adds two numbers number1 and number2 and displays the sum. We use ESI
as an index into the input numbers, which are in the ASCII representation. The ESI register is
initialized to point to the rightmost digit (line 18). The loop count 10 is set up in ECX (line 19).
The addition loop (lines 21-30) adds one digit at a time, taking any carry generated during the previous
iteration into account. This is done by using the adc rather than the add instruction. Since the
adc instruction is used, we have to make sure that the carry is clear initially. This is done on
line 20 using the clc (clear carry) instruction.
Note that the aaa instruction produces the result in unpacked BCD form. To convert to the
ASCII form, we have to or the result with 30H (line 26). This ORing, however, clears the carry flag
set by aaa, which we need in the next iteration. Therefore, it is necessary to
save (line 25) and restore (line 27) the flags.
The overhead in performing the addition is obvious. If the input numbers were in binary, only
a single add instruction would have performed the required addition. This conversion-overhead
versus processing-overhead tradeoff is discussed later.

Processing Packed BCD Numbers
In this representation, as indicated earlier, two decimal digits are packed into a byte. There are
two instructions to process packed BCD numbers:
daa: Decimal adjust after addition
das: Decimal adjust after subtraction
There is no support for multiplication or division. For these operations, we will have to unpack
the numbers, perform the operation, and repack them.
Packed BCD Addition

The daa instruction can be used to adjust the result of an addition operation to conform to the
packed BCD representation. To understand the sort of adjustments required, let us look at some
examples next.
Example 18.8 A packed BCD addition example.
Consider adding two packed BCD numbers 29 and 69.
29H = 00101001B
69H = 01101001B
92H = 10010010B
The sum 92 is not the correct value. The result should be 98. We get the correct answer by adding
6 to 92. We add 6 because the carry generated from bit 3 (i.e., the auxiliary carry) represents an
overflow at 16, not at 10 as is required in BCD.
•


Example 18.9 Another packed BCD addition example.
Consider adding two packed BCD numbers 27 and 34.
27H = 00100111B
34H = 00110100B
5BH = 01011011B
Again, the result is incorrect. The sum should be 61. The result 5B requires correction, as the first
(rightmost) digit is greater than 9. To correct the result, add 6, which gives us 61.
•
Example 18.10 A final packed BCD addition example.
Consider adding two packed BCD numbers 52 and 61.
52H = 01010010B
61H = 01100001B
B3H = 10110011B
This result also requires correction. The first digit is correct, but the second digit requires a correction. The solution is the same as that used in the last example: add 6 to the second digit (i.e.,
add 60H to the result). This gives us 13 as the result with a carry (effectively equal to 113).
•
The daa instruction exactly performs adjustments like these to the result of add or adc instructions. More specifically, the following actions are taken by daa:
• If the least significant four bits of AL are greater than 9 or if the auxiliary flag is set, it adds
6 to AL and sets AF;
• If the most significant four bits of AL are greater than 9 or if the carry flag is set, it adds 60H
to AL and sets CF.
Example 18.11 Code for packed BCD addition.
Consider adding two packed BCD numbers 71 and 43.
      mov     AL,71H
      add     AL,43H     ; AL = B4H
      daa                ; AL = 14H and CF = 1

As indicated, the daa instruction restores the result in AL to the packed BCD representation. The
result including the carry (i.e., 114H) is the correct answer in packed BCD.
•
As in the ASCII addition, multibyte BCD addition requires a loop. After discussing the packed
BCD subtraction, we present an example to add two 5-byte (10-digit) packed BCD numbers.
Packed BCD Subtraction

The das instruction can be used to adjust the result of a subtraction (i.e., the result of sub or
sbb). It works like daa and performs the following actions:
• If the least significant four bits of AL are greater than 9 or if the auxiliary flag is set, it
subtracts 6 from AL and sets AF;
• If the most significant four bits of AL are greater than 9 or if the carry flag is set, it subtracts
60H from AL and sets CF.
Here is an example that illustrates the use of the das instruction.
Example 18.12 Code for packed BCD subtraction.
Consider subtracting 43 from 71 (i.e., 71 - 43).
      mov     AL,71H
      sub     AL,43H     ; AL = 2EH
      das                ; AL = 28H

The das instruction restores the result in AL to the packed BCD representation.

•

Illustrative Example
In this example, we consider multibyte packed BCD addition. As in the ASCII representation,
when adding two multibyte packed BCD numbers, we have to use a loop that adds a pair of
decimal digits in each iteration, starting from the rightmost pair. The program, given below, adds
two 5-byte (10-digit) packed BCD numbers, number1 and number2.
Program 18.2 Packed BCD addition of two 10-digit numbers

 1  ;Addition of integers in packed BCD form        BCDADD.ASM
 2  ;
 3  ;      Objective: To demonstrate addition of two integers
 4  ;                 in the packed BCD representation.
 5  ;          Input: None.
 6  ;         Output: Displays the sum.
 7
 8  %define SUM_LENGTH  10
 9
10  %include "io.mac"
11
12  .DATA
13  sum_msg     db    'The sum is: ',0
14  number1     db    12H,34H,56H,78H,90H
15  number2     db    10H,98H,76H,54H,32H
16  ASCIIsum    db    '          ',0    ; add NULL char.
17
18  .UDATA
19  BCDsum      resb  5
20
21  .CODE
22        .STARTUP
23        mov     ESI,4
24        mov     ECX,5              ; loop iteration count
25        clc                        ; clear carry (we use ADC)
26  add_loop:
27        mov     AL,[number1+ESI]
28        adc     AL,[number2+ESI]
29        daa                        ; decimal adjust
30        mov     [BCDsum+ESI],AL    ; store the sum byte
31        dec     ESI                ; update index
32        loop    add_loop
33        call    ASCII_convert
34        PutStr  sum_msg            ; display sum
35        PutStr  ASCIIsum
36        nwln
37        .EXIT
38
39  ;-----------------------------------------------------------
40  ; Converts the packed decimal number (10 digits) in BCDsum
41  ; to ASCII representation and stores it in ASCIIsum.
42  ; All registers are preserved.
43  ;-----------------------------------------------------------
44  ASCII_convert:
45        pushad                     ; save registers
46        ; ESI is used as index into ASCIIsum
47        mov     ESI,SUM_LENGTH-1
48        ; EDI is used as index into BCDsum
49        mov     EDI,4
50        mov     ECX,5              ; loop count (# of BCD bytes)
51  cnv_loop:
52        mov     AL,[BCDsum+EDI]    ; AL = two BCD digits
53        mov     AH,AL              ; save the BCD digits
54        ; convert right digit to ASCII & store in ASCIIsum
55        and     AL,0FH
56        or      AL,30H
57        mov     [ASCIIsum+ESI],AL
58        dec     ESI
59        mov     AL,AH              ; restore the BCD digits
60        ; convert left digit to ASCII & store in ASCIIsum
61        shr     AL,4               ; right-shift by 4 positions
62        or      AL,30H
63        mov     [ASCIIsum+ESI],AL
64        dec     ESI
65        dec     EDI                ; update EDI
66        loop    cnv_loop
67        popad                      ; restore registers
68        ret

The two numbers to be added are initialized on lines 14 and 15. The space for the sum
(BCDsum) is reserved using resb on line 19.
The code is similar to that given in Program 18.1. However, since we add two decimal digits
during each loop iteration, only five iterations are needed to add the 10-digit numbers. Thus,
processing numbers in the packed BCD representation is faster than in the ASCII representation.
In any case, both representations are considerably slower in processing numbers than the binary
representation.


Table 18.1 Tradeoffs associated with the three representations

Representation   Storage overhead   Conversion overhead   Processing overhead
Binary           Nil                High                  Nil
Packed BCD       Medium             Medium                Medium
ASCII            High               Nil                   High

At the end of the loop, the sum is stored in BCDsum as a packed BCD number. To display
this number, we have to convert it to the ASCII form (an overhead that is not present in the ASCII
version).
The procedure ASCII_convert takes BCDsum, converts it to the equivalent ASCII string,
and stores it in ASCIIsum. For each byte read from BCDsum, two ASCII digits are generated.
Note that the conversion from packed BCD to ASCII can be done by using only logical and shift
operations. On the other hand, conversion from binary to ASCII requires a more expensive division
operation (thus increasing the conversion overhead).

Decimal Versus Binary Arithmetic
Now you know three representations to perform arithmetic operations: binary, ASCII, and BCD.
The majority of operations are done in binary. However, there are tradeoffs associated with these
three representations.
First we will look at the storage overhead. The binary representation is compact and the most
efficient one. The ASCII and unpacked BCD representations incur high overhead as each decimal
digit is stored in a byte (see Table 18.1). The packed BCD representation, which stores two decimal
digits per byte, reduces this overhead by approximately half. For example, using two bytes, we can
represent numbers from 0 to 65,535 in the binary representation and from 0 to 9999 in the packed
BCD representation, but only from 0 to 99 in the ASCII and unpacked BCD representations.
In applications where the input data is in ASCII form and the output is required to be in ASCII,
binary arithmetic may not always be the best choice. This is because there are overheads associated
with the conversion between ASCII and binary representations. However, processing numbers in
binary can be done much more efficiently than in either ASCII or BCD representations. Table 18.1
shows the tradeoffs associated with these three representations.
When the input and output use the ASCII form and there is little processing, processing numbers in ASCII is better. This is because the ASCII version does not incur any conversion overhead.
By contrast, due to the high overhead of converting numbers between ASCII and binary, the
binary version takes more time than the ASCII version. The BCD version also takes substantially
more time than the ASCII version but performs better than the binary version, mainly because
conversions between BCD and ASCII are simpler.
When there is significant processing of numbers, the binary version tends to perform better than
the ASCII and BCD versions. In this scenario, the ASCII version provides the worst performance
as its processing overhead is high (see Table 18.1). The BCD version, while slower than the binary
version, performs much better than the ASCII version.
The moral of the story is that a careful analysis of the application should be done before
deciding on the choice of representation for numbers in some applications. This is particularly
true for business applications, where the data might come in the ASCII form.


Summary
In previous chapters we converted decimal data into binary for storing internally as well as for manipulation. This chapter introduced two alternative representations for storing decimal data:
ASCII and BCD. The BCD representation can be either unpacked or packed.
In the ASCII and unpacked BCD representations, one decimal digit is stored per byte, whereas
the packed BCD representation stores two digits per byte. Thus, the storage overhead is substantial
in ASCII and unpacked BCD. Packed BCD representation uses the storage space more efficiently
(typically requiring half as much space). The binary representation, on the other hand, does not
introduce any overhead.
There are two main overheads that affect the execution time of a program: conversion overhead
and processing overhead. When the ASCII form is used for data input and output, the data should
be converted between ASCII and binary/BCD. This conversion overhead for the binary representation can be substantial, as multiplication and division are required. There is much less overhead
for the BCD representations, as only logical and shift operations are needed. On the other hand,
number processing in binary is much faster than in ASCII or BCD representations. Packed BCD
representation is better than ASCII representation, as each byte stores two decimal digits.

19
Recursion
We can use recursion as an alternative to iteration. This chapter first introduces the basics of
recursion. After that we give some examples to illustrate how recursive procedures are written
in the assembly language. The advantages and pitfalls associated with a recursive solution as
opposed to an iterative solution are discussed toward the end of the chapter. The last section gives
a summary.

Introduction
A recursive procedure calls itself, either directly or indirectly. In direct recursion, a procedure calls
itself directly. In indirect recursion, procedure P makes a call to procedure Q, which in turn calls
procedure P. The sequence of calls could be longer before a call is made to procedure P.
Recursion is a powerful tool that allows us to express our solution elegantly. Some solutions
can be naturally expressed using recursion. Computing a factorial is a classic example. Factorial
n, denoted n!, is the product of positive integers from 1 to n. For example,
5! = 1 × 2 × 3 × 4 × 5 = 120.
The factorial can be formally defined as
factorial(0) = 1
factorial(n) = n * factorial(n - 1)    for n > 0.
Recursion shows up in this definition as we define factorial(n) in terms of factorial(n - 1). Every
recursive function should have a termination condition to end the recursion. In this example, when
n = 0, recursion stops. How do we express such recursive functions in programming languages?
Let us first look at how this function is written in C:
int fact(int n)
{
    if (n == 0)
        return(1);
    return(n * fact(n-1));
}


Figure 19.1 Recursive computation of factorial(3). (a) Call sequence: A (n = 3) computes factorial(3) = 3 * factorial(2); B (n = 2) computes factorial(2) = 2 * factorial(1); C (n = 1) computes factorial(1) = 1 * factorial(0); D (n = 0) returns factorial(0) = 1 (recursion termination). The returns then yield factorial(1) = 1, factorial(2) = 2, and factorial(3) = 6. (b) Stack contents: activation records for A, B, C, and D.

This is an example of direct recursion. How is this function implemented? At the conceptual level,
its implementation is not any different from implementing other procedures. Once you understand
that each procedure call instance is distinct from the others, the fact that a recursive procedure calls
itself does not make a big difference.
Each active procedure call maintains an activation record, which is stored on the stack. The activation record, as explained on page 256, consists of the arguments, return address, and local
variables. The activation record comes into existence when a procedure is invoked and disappears
after the procedure is terminated. Thus, for each procedure call that has not yet terminated, an activation
record that contains the state of that call is stored. The number of activation records, and
hence the amount of stack space required to run the program, depends on the depth of recursion.
Figure 19.1 shows the stack activation records for factorial(3). As you can see from this figure,
each call to the factorial function creates an activation record, and the stack holds these activation
records until the corresponding calls return.

Our First Program
To illustrate the principles of recursion, we look at an example that computes the factorial function.
An implementation of the factorial function is shown in Program 19.1. The main function provides
the user interface. It requests a positive number n from the user. If a negative number is given as
input, the user is prompted to try again (lines 20-24). The positive number, which is read into the
BX register, is passed on to the fact procedure (line 27). This procedure returns factorial(n) in
the AX register, which is output with an appropriate message (lines 29-31).

Program 19.1 Recursive computation of factorial(N)

 1  ;Factorial - Recursive version            FACTORIAL.ASM
 2  ;
 3  ;        Objective: To compute factorial using recursion.
 4  ;            Input: Requests an integer N from the user.
 5  ;           Output: Outputs N!
 6  ;
 7  %include "io.mac"
 8
 9  .DATA
10  prompt_msg  db  "Please enter a positive integer: ",0
11  output_msg  db  "The factorial is: ",0
12  error_msg   db  "Not a positive number. Try again.",0
13
14  .CODE
15        .STARTUP
16        PutStr  prompt_msg     ; request the number
17
18  try_again:
19        GetInt  BX             ; read number into BX
20        cmp     BX,0           ; test for a positive number
21        jge     num_ok
22        PutStr  error_msg
23        nwln
24        jmp     try_again
25
26  num_ok:
27        call    fact
28
29        PutStr  output_msg     ; output result
30        PutInt  AX
31        nwln
32
33  done:
34        .EXIT
35
36  ;-----------------------------------------------------------
37  ;Procedure fact receives a positive integer N in BX.
38  ;It returns N! in the AX register.
39  ;-----------------------------------------------------------
40
41  fact:
42        cmp     BL,1           ; if N > 1, recurse
43        jg      one_up
44        mov     AX,1           ; return 1 for N < 2
45        ret                    ; terminate recursion
46
47  one_up:
48        dec     BL             ; recurse with (N-1)
49        call    fact
50        inc     BL
51        mul     BL             ; AX = AL * BL
52
53        ret

The fact procedure receives the number n in the BL register. It essentially implements the
C code given before. One minor difference is that this procedure terminates when n is less than or
equal to 1 rather than when n = 0, which saves one recursive call: when the value in BL is at most 1, the
AX register is set to 1 to terminate the recursion. The activation record in this example consists
of only the return address pushed onto the stack by the call instruction. Since we are using the BL
register to pass n, it is decremented before the call (line 48) and restored after the call (line 50).
The multiply instruction

      mul     BL

multiplies the contents of the BL and AL registers and places the 16-bit result in the AX register.
This is the value returned by the fact procedure.

Illustrative Examples
We give two examples to further illustrate the principles of recursion. The first one computes a
Fibonacci number and the second one implements the popular quicksort algorithm.
Example 19.1 Computes the Nth Fibonacci number.
The Fibonacci sequence of numbers is defined as
fib(1) = 1,
fib(2) = 1,
fib(n) = fib(n - 1) + fib(n - 2) for n > 2.
In other words, the first two numbers in the Fibonacci sequence are 1. The subsequent numbers
are obtained by adding the previous two numbers in the sequence. Thus,
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...,
is the Fibonacci sequence of numbers. From this definition, you can see the recursive nature of the
computation.
Program 19.2 shows the program to compute the Nth Fibonacci number. The value N is
requested from the user as in the last program. The main program checks the validity of the input
value. If the number is less than 1, an error message is displayed and the user is asked to enter a
valid number (lines 20-24). If the input number is a valid one, it calls the fib procedure, which
returns the Nth Fibonacci number in the EAX register. This value is output using PutLInt on
line 30.

Program 19.2 A program to compute the Fibonacci numbers

 1  ;Fibonacci number - Recursive version      FIB.ASM
 2  ;
 3  ;        Objective: To compute the Fibonacci number.
 4  ;            Input: Requests an integer N from the user.
 5  ;           Output: Outputs fib(N).
 6  ;
 7  %include "io.mac"
 8
 9  .DATA
10  prompt_msg  db  "Please enter a number > 0: ",0
11  output_msg  db  "fib(N) is: ",0
12  error_msg   db  "Not a valid number. Try again.",0
13
14  .CODE
15        .STARTUP
16        PutStr  prompt_msg     ; request the number
17
18  try_again:
19        GetInt  BX             ; read number into BX
20        cmp     BX,0           ; test if N>0
21        jg      num_ok
22        PutStr  error_msg
23        nwln
24        jmp     try_again
25
26  num_ok:
27        call    fib
28
29        PutStr  output_msg     ; output result
30        PutLInt EAX
31        nwln
32
33  done:
34        .EXIT
35
36  ;-----------------------------------------------------------
37  ;Procedure fib receives a positive integer N in BX.
38  ;It returns fib(N) in the EAX register.
39  ;-----------------------------------------------------------
40
41  fib:
42        cmp     BX,2           ; if N > 2, recurse
43        jg      one_up
44        mov     EAX,1          ; return 1 if N = 1 or 2
45        ret                    ; terminate recursion
46
47  one_up:
48        push    EDX
49        dec     BX             ; recurse with (N-1)
50        call    fib
51        mov     EDX,EAX        ; save fib(N-1) in EDX
52        dec     BX             ; recurse with (N-2)
53        call    fib
54        add     EAX,EDX        ; EAX = fib(N-2) + fib(N-1)
55
56        add     BX,2           ; restore BX and EDX
57        pop     EDX
58
59        ret

The fib procedure uses recursion to compute the required Fibonacci number. The N value is
received in the BX register. The recursion termination condition is implemented by lines 42-44.
This procedure returns 1 in EAX if N is 1 or 2.
The recursion is implemented on lines 47-57. It decrements BX by one before calling the
fib procedure to compute fib(N-1). The value returned by this call is stored in the EDX
register (line 51). The BX value is decremented again before calling fib on line 53 to compute
fib(N-2). The two fib values are added on line 54 to compute the fib(N) value. The
procedure preserves both BX (line 56) and EDX (lines 48 and 57). Preserving EDX matters because
the caller may itself be holding its fib(N-1) value in EDX across its second call.
Example 19.2 Implementation of the quicksort algorithm using recursion.
Quicksort is one of the most popular sorting algorithms; it was proposed by C.A.R. Hoare in 1960.
Once you understand the basic principle of the quicksort, you will see why recursion naturally
expresses it.
At its heart, quicksort uses a divide-and-conquer strategy. The original sort problem is reduced
to two smaller sort problems. This is done by selecting a partition element x and partitioning
the array into two subarrays: all elements less than x are placed in one subarray and all elements
greater than x are in the other. Now, we have to sort these two subarrays, which are smaller than
the original array. We apply the same procedure to sort these two subarrays. This is where the
recursive nature of the algorithm shows up. The quicksort procedure to sort an N-element array is
summarized below:
1. Select a partition element x.
2. Assume that we know where this element x should be in the final sorted array. Let it be at
array[i]. We give details of this step shortly.
3. Move all elements that are less than x into positions array[0] ... array[i-1].
Similarly, move those elements that are greater than x into positions array[i+1] ...
array[N-1]. Note that these two subarrays are not sorted.
4. Now apply the quicksort procedure recursively to sort these two subarrays until the array is
sorted.
How do we know the final position of the partition element x without sorting the array? We don't
have to sort the array; we just need to know the number of elements either before or after it. To
clarify the working of the quicksort algorithm, let us look at an example. In this example, and
in our quicksort implementation, we pick the last element as the partition value. Obviously, the
selection of the partition element influences performance of the quicksort. There are several better
ways of selecting the partition value; you can get these details in any textbook on sorting.

Initial state:     2 9 8 1 3 4 7 6    (6 is the partition element)
After 1st pass:    2 1 3 4 6 7 9 8    (partition element 6 is in its final place)

The second pass works on the following two subarrays:
1st subarray:      2 1 3 4;
2nd subarray:      7 9 8.

To move the partition element to its final place, we use two pointers i and j. Initially, i points
to the first element and j points to the second-to-the-last element. Note that we are using the last
element as the partition element. The index i is advanced until it points to an element that is greater
than or equal to x. Similarly, j is moved backward until it points to an element that is less than or
equal to x. Then we exchange the two values at i and j. We continue this process until i is greater
than or equal to j. The quicksort pseudocode is shown below:
quick_sort (array, lo, hi)
    if (hi > lo)
        x := array[hi]
        i := lo
        j := hi
        while (i < j)
            while (array[i] < x)
                i := i + 1
            end while
            j := j - 1
            while (j > i and array[j] > x)
                j := j - 1
            end while
            if (i < j)
                array[i] <=> array[j]       /* exchange values */
            end if
        end while
        array[i] <=> array[hi]              /* exchange values */
        quick_sort (array, lo, i-1)
        quick_sort (array, i+1, hi)
    end if
end quick_sort

The quicksort program is shown in Program 19.3. The input values are read by the read
loop (lines 25 to 31). This loop terminates if the input is zero. As written, this program can
cause problems if the user enters more than 200 integers. You can easily remedy this problem by
initializing ECX with 200 and using the loop instruction on line 31. The three arguments
are placed in the EBX (array pointer), ESI (lo), and EDI (hi) registers (lines 35 to 37). After the
quicksort call on line 38, the program outputs the sorted array (lines 41 to 50).

Program 19.3 Sorting integers using the recursive quicksort algorithm

  1  ;Sorting integers using quicksort             QSORT.ASM
  2  ;
  3  ;        Objective: Sorts an array of integers using
  4  ;                   quick sort. Uses recursion.
  5  ;            Input: Requests integers from the user.
  6  ;                   Terminated by entering zero.
  7  ;           Output: Outputs the sorted array.
  8  ;
  9  %include "io.mac"
 10
 11  .DATA
 12  prompt_msg  db  "Please enter integers. ",0DH,0AH
 13              db  "Entering zero terminates the input.",0
 14  output_msg  db  "The sorted array is: ",0
 15
 16  .UDATA
 17  array1      resw  200
 18
 19  .CODE
 20        .STARTUP
 21        PutStr  prompt_msg       ; request the number
 22        nwln
 23        mov     EBX,array1       ; EDI keeps a count of
 24        xor     EDI,EDI          ; input numbers
 25  read_more:
 26        GetInt  AX
 27        mov     [EBX+EDI*2],AX   ; store input # in array
 28        cmp     AX,0             ; test if it is zero
 29        je      exit_read
 30        inc     EDI
 31        jmp     read_more
 32
 33  exit_read:
 34        ; prepare arguments for procedure call
 35        mov     EBX,array1
 36        xor     ESI,ESI          ; ESI = lo index
 37        dec     EDI              ; EDI = hi index
 38        call    qsort
 39
 40        PutStr  output_msg       ; output sorted array
 41  write_more:
 42        ; since qsort preserves all registers, we will
 43        ; have valid EBX and ESI values.
 44        mov     AX,[EBX+ESI*2]
 45        cmp     AX,0
 46        je      done
 47        PutInt  AX
 48        nwln
 49        inc     ESI
 50        jmp     write_more
 51
 52  done:
 53        .EXIT
 54
 55  ;-----------------------------------------------------------
 56  ;Procedure qsort receives a pointer to the array in EBX.
 57  ;LO and HI are received in ESI and EDI, respectively.
 58  ;It preserves all the registers.
 59  ;-----------------------------------------------------------
 60
 61  qsort:
 62        pushad
 63        cmp     EDI,ESI          ; end recursion if hi <= lo
 64        jle     qsort_done
 65
 66        ; save hi and lo for later use
 67        mov     ECX,ESI
 68        mov     EDX,EDI
 69
 70        mov     AX,[EBX+EDI*2]   ; AX = xsep
 71  lo_loop:                       ;
 72        cmp     [EBX+ESI*2],AX   ;
 73        jge     lo_loop_done     ; LO while loop
 74        inc     ESI              ;
 75        jmp     lo_loop          ;
 76  lo_loop_done:
 77
 78        dec     EDI              ; hi = hi-1
 79
 80  hi_loop:                       ;
 81        cmp     EDI,ESI          ;
 82        jle     sep_done         ;
 83        cmp     [EBX+EDI*2],AX   ; HI while loop
 84        jle     hi_loop_done     ;
 85        dec     EDI              ;
 86        jmp     hi_loop          ;
 87  hi_loop_done:
 88
 89        xchg    AX,[EBX+ESI*2]   ;
 90        xchg    AX,[EBX+EDI*2]   ; x[i] <=> x[j]
 91        xchg    AX,[EBX+ESI*2]   ;
 92        jmp     lo_loop
 93
 94  sep_done:
 95        xchg    AX,[EBX+ESI*2]   ;
 96        xchg    AX,[EBX+EDX*2]   ; x[i] <=> x[hi]
 97        xchg    AX,[EBX+ESI*2]   ;
 98
 99        dec     ESI
100        mov     EDI,ESI          ; hi = i-1
101        ; We modify the ESI value in the next statement.
102        ; Since the original ESI value is in EDI, we use
103        ; EDI to get i+1 value for the second qsort call.
104        mov     ESI,ECX
105        call    qsort
106
107        ; EDI has the i-1 value
108        inc     EDI
109        inc     EDI
110        mov     ESI,EDI          ; lo = i+1
111        mov     EDI,EDX
112        call    qsort
113
114  qsort_done:
115        popad
116        ret

The quicksort procedure follows the pseudocode. Since we are not returning any values, we
use pushad to preserve all registers (line 62). The two inner while loops are implemented by
lo_loop and hi_loop. The exchange of elements is done by using three xchg instructions
(lines 89 to 91 and 95 to 97), which swap two array elements while leaving the partition value
in AX. The rest of the program is straightforward to follow.

Recursion Versus Iteration
In theory, every recursive function has an iterative counterpart. To see this, let us write in C the
iterative version to compute the factorial function.
int fact_iterative(int n)
{
    int i, result;

    if (n == 0)
        return (1);
    result = 1;
    for (i = 1; i <= n; i++)
        result = result * i;
    return (result);
}

Comparing this code with the recursive version given on page 391, it is obvious that the recursive
version is concise and reflects the mathematical definition of the factorial function. Once you
get through the initial learning problems with recursion, recursive code is easier to understand
for those functions that are defined recursively. Some such examples are the factorial function,
Fibonacci number computation, binary search, and quicksort.
This leads us to the question of when to use recursion. To answer this question, we need to
look at the potential problems recursion can cause. There are two main problems with recursion:


• Inefficiency: In most cases, recursive versions tend to be inefficient. You can see this point
by comparing the recursive and iterative versions of the factorial function. The recursive
version incurs the overhead of invoking and returning from procedure calls. To compute N!,
we need to call the factorial function about N times. In the iterative version, the loop iterates
about N times, but without any call overhead.
Recursion could also introduce duplicate computation. For example, to compute the Fibonacci number fib(5) as
    fib(5) = fib(4) + fib(3),
a recursive procedure computes fib(3) twice, fib(2) three times, and so on.
• Increased memory requirement: Recursion tends to demand more memory. This can be
seen from the simple factorial example. For large N, the demand for stack memory can be
excessive. In some cases, the limit on the available memory may make the recursive version
unusable.
On the positive side, however, note that recursion leads to better understanding of the code for
those naturally recursive problems. In this case, recursion aids in program maintenance.

Summary
We can use recursive procedures as an alternative to iterative ones. A procedure that calls itself,
whether directly or indirectly, is called a recursive procedure. In direct recursion, a procedure calls
itself, as in our factorial example. In indirect recursion, a procedure may initiate a sequence of
calls that eventually results in calling the procedure itself.
For some applications, we can write an elegant solution because recursion is a natural fit.
We illustrated the principles of recursion using a few examples: factorial, Fibonacci number, and
quicksort. We presented recursive versions of these functions in the assembly language. In the last
section we identified the tradeoffs associated with recursion as opposed to iteration.

20
Protected-Mode Interrupt Processing
Interrupts, like procedures, can be used to alter a program's control flow to a procedure called
an interrupt service routine. Unlike procedures, which can be invoked by a call instruction,
interrupt service routines can be invoked either in software (called software interrupts) or by hardware (called hardware interrupts). After introducing the interrupts, we discuss the taxonomy of the
IA-32 interrupts. We describe the interrupt invocation mechanism in the protected mode before
describing the exceptions. The next two sections deal with software interrupts and file I/O. We use
the Linux system calls to illustrate how we can access I/O devices like the keyboard and display.
Hardware interrupts along with the I/O instructions are briefly introduced toward the end of the
chapter. The last section summarizes the chapter.

Introduction
An interrupt is a mechanism by which a program's flow of control can be altered. We have seen two
other mechanisms to do the same: procedures and jumps. While jumps provide a one-way transfer
of control, procedures provide a mechanism to return control to the point of calling when the called
procedure is completed.
Interrupts provide a mechanism similar to that of a procedure call. Causing an interrupt transfers control to a procedure, which is referred to as an interrupt service routine (ISR). An ISR is
sometimes called a handler. When the ISR is completed, the interrupted program resumes execution as if it were not interrupted. This behavior is analogous to a procedure call. There are,
however, some basic differences between procedures and interrupts that make interrupts almost
indispensable.
One of the main differences is that interrupts can be initiated by both software and hardware.
In contrast, procedures are purely software-initiated. The fact that interrupts can be initiated by
hardware is the principal factor behind much of the power of interrupts. This capability gives us an
efficient way by which external devices (outside the processor) can get the processor's attention.

[Figure 20.1 diagram: Interrupts branch into Software Interrupts, Exceptions (including Traps), and Hardware Interrupts (Maskable and Nonmaskable).]

Figure 20.1 A taxonomy of the IA-32 interrupts.

Software-initiated interrupts (called simply software interrupts) are caused by executing the
int instruction. Thus these interrupts, like procedure calls, are anticipated or planned events.
For example, when you are expecting a response from the user (e.g., Y or N), you can initiate
an interrupt to read a character from the keyboard. What if an unexpected situation arises that
requires immediate attention of the processor? For example, suppose you have written a program to display
the first 90 Fibonacci numbers on the screen. While running the program, however, you realize
that your program never terminates because of a simple programming mistake (e.g., you forgot
to increment the index variable controlling the loop). Obviously, you want to abort the program
and return control to the operating system. As you know, this can be done by Ctrl-c in Linux
(Ctrl-Break on Windows). The important point is that this is not an anticipated event, so it
cannot be effectively programmed into the code.
The interrupt mechanism provides an efficient way to handle such unanticipated events. Referring to the previous example, the Ctrl-c could cause an interrupt to draw the attention of the
processor away from the user program. The interrupt service routine associated with Ctrl-c can
terminate the program and return control to the operating system.
Another difference between procedures and interrupts is that ISRs are normally memory-resident. In contrast, procedures are loaded into memory along with application programs. Some
other differences, such as using numbers to identify interrupts rather than names and using an invocation mechanism that automatically pushes the flags register onto the stack, are
pointed out in later sections.

A Taxonomy of Interrupts
We have already identified two basic categories of interrupts: software-initiated and
hardware-initiated (see Figure 20.1). The third category is called exceptions. Exceptions handle
instruction faults. An example of an exception is the divide error fault, which is generated whenever division by 0 is attempted. This error condition occurs during the div or idiv instruction
execution if the divisor is 0. We discuss exceptions later.
Software interrupts are written into a program by using the int instruction. The main use of
software interrupts is in accessing I/O devices such as the keyboard, printer, display screen, disk
drive, etc. Software interrupts can be further classified into system-defined and user-defined.
Hardware interrupts are generated by hardware devices to get the attention of the processor.
For example, when you strike a key, the keyboard hardware generates an external interrupt, causing
the processor to suspend its present activity and execute the keyboard interrupt service routine to
process the key. After completing the keyboard ISR, the processor resumes what it was doing
before the interruption.
Hardware interrupts can be either maskable or nonmaskable. The processor always attends to
the nonmaskable interrupt (NMI) immediately. One example of an NMI is a RAM parity error
indicating a memory malfunction.
Maskable interrupts can be delayed until execution reaches a convenient point. As an example,
let us assume that the processor is executing a main program. An interrupt occurs. As a result, the
processor suspends main as soon as it finishes the current instruction and transfers control to the
ISR1 interrupt service routine. If ISR1 has to be executed without any interruption, the processor
can mask further interrupts until it is completed. Suppose that, while executing ISR1, another
maskable interrupt occurs. Service to this interrupt would have to wait until ISR1 is completed.
We discuss hardware interrupts toward the end of the chapter.

Interrupt Processing in the Protected Mode
Let's now look at interrupt processing in the protected mode. Unlike procedures, where a name is
given to identify a procedure, interrupts are identified by a type number. The IA-32 architecture
supports 256 different interrupt types. The interrupt type ranges from 0 to 255. The interrupt type
number, which is also called a vector, is used as an index into a table that stores the addresses of
ISRs. This table is called the interrupt descriptor table (IDT). Like the global and local descriptor
tables GDT and LDT (discussed in Chapter 4), each descriptor is essentially a pointer to an ISR
and requires eight bytes. The interrupt type number is scaled by 8 to form an index into the IDT.
The IDT may reside anywhere in physical memory. The location of the IDT is maintained in
an IDT register IDTR. The IDTR is a 48-bit register that stores the 32-bit IDT base address and
a 16-bit IDT limit value as shown in Figure 20.2. However, the IDT does not require more than
2048 bytes, as there can be at most 256 descriptors. In a system, the number of descriptors could
be much smaller than the maximum allowed. In this case, the IDT limit can be set to the required
size. If the referenced descriptor is outside the IDT limit, the processor enters the shutdown mode.
In this mode, instruction execution is stopped until either a nonmaskable interrupt or a reset signal
is received.
There are two special instructions to load (lidt) and store (sidt) the contents of the IDTR
register. Both instructions take the address of a 6-byte memory operand.
The IDT can have three types of descriptors: interrupt gate, trap gate, and task gate. We
will not discuss task gates, as they are not directly related to the interrupt mechanism that we are
interested in. The format of the other two gates is shown in Figure 20.3. Both gates store identical
information: a 16-bit segment selector, a 32-bit offset, a descriptor privilege level (DPL), and a P
bit to indicate whether the segment is present or not.
When an interrupt occurs, the segment selector is used to select a segment descriptor that is in
either the GDT or the current LDT. Recall from our discussion in Chapter 4 that the TI bit of the
segment selector identifies whether the GDT or the current LDT should be used. The segment
descriptor provides the base address of the segment that contains the interrupt service routine, as shown
in Figure 20.4. The offset part comes from the interrupt gate.
What happens when an interrupt occurs depends on whether there is a privilege change or not.
In the remainder of the chapter, we look at the simple case of no privilege change. In this case, the
following actions are taken on an interrupt:

[Figure 20.2 diagram: the IDT holds one 8-byte gate (offset and segment selector) for each of interrupts 0 through N; the 48-bit IDTR holds the 32-bit IDT base address in bits 47-16 and the 16-bit IDT limit in bits 15-0.]

Figure 20.2 Organization of the IDT. The IDTR register stores the 32-bit IDT base address and a
16-bit value indicating the IDT size.

[Figure 20.3 diagram. Each gate occupies two 32-bit words. Lower word: segment selector (bits 31-16) and offset 15-0 (bits 15-0). Upper word: offset 31-16 (bits 31-16), P (bit 15), DPL (bits 14-13), bits 12-8 = 0 1 1 1 0 for an interrupt gate or 0 1 1 1 1 for a trap gate, bits 7-5 = 0 0 0, and bits 4-0 not used.]

Figure 20.3 The IA-32 interrupt descriptors.

[Figure 20.4 diagram: the interrupt type selects a gate in the IDT; the gate's segment selector picks a descriptor in the GDT or LDT, whose base address is added to the gate's offset to locate the ISR in the code segment.]

Figure 20.4 Protected-mode interrupt invocation.

1. Push the EFLAGS register onto the stack;
2. Clear the interrupt and trap flags;
3. Push CS and EIP registers onto the stack;
4. Load CS with the 16-bit segment selector from the interrupt gate;
5. Load EIP with the 32-bit offset value from the interrupt gate.
On receiving an interrupt, the flags register is automatically saved on the stack. The interrupt and
trap flags are then cleared: clearing the interrupt flag disables further maskable interrupts, and clearing
the trap flag stops single-stepping. ISRs usually set the interrupt flag again unless there is a special
reason to keep other interrupts disabled. The interrupt flag can be set by the sti and cleared by the
cli assembly language instructions. Neither instruction takes an operand. There are no
special instructions to manipulate the trap flag; we have to use pushf and popf to modify the
trap flag. We give an example of this in the next section.

[Figure 20.5 diagram. (a) After an interrupt, the stack holds EFLAGS, CS, and EIP, with ESP pointing to the saved EIP. (b) For exceptions that push an error code, the error code is pushed after EIP and ESP points to it.]
Figure 20.5 Stack state after an interrupt invocation.
The current CS and EIP values are pushed onto the stack. The CS and EIP registers are loaded
with the segment selector and offset from the interrupt gate, respectively. Note that when we load
the CS register with the 16-bit segment selector, the invisible part consisting of the base address,
segment limit, access rights, and so on is also loaded. The stack state after an interrupt is shown in
Figure 20.5a.
Interrupt processing through a trap gate is similar to that through an interrupt gate except for
the fact that trap gates do not modify the IF flag.
While the previous discussion holds for all interrupts and traps, some types of exceptions also
push an error code onto the stack, as shown in Figure 20.5b. The exception handler can use this error
code to identify the cause of the exception.
Returning from an interrupt handler Similar to procedures, ISRs should end with a return statement to send control back to the interrupted program. The interrupt return (iret) instruction is used for this
purpose. The last instruction of an ISR should be the iret instruction. It serves the same purpose
as ret for procedures. The actions taken on iret are
1. Pop the 32-bit value on top of the stack into the EIP register;
2. Pop the next 32-bit value from the stack and load its lower 16 bits into the CS register;
3. Pop the 32-bit value on top of the stack into the EFLAGS register.

Exceptions
The exceptions are classified into faults, traps, and aborts depending on the way they are reported
and whether the interrupted instruction is restarted. Faults and traps are reported at instruction
boundaries. Faults use the boundary before the instruction during which the exception was detected. When a fault occurs, the system state is restored to the state before the current instruction
so that the instruction can be restarted. The divide error, for instance, is a fault detected during
the div or idiv instruction. The processor, therefore, restores the state to correspond to the one
before the divide instruction that caused the fault. Furthermore, the instruction pointer is adjusted
to point to the divide instruction so that, after returning from the exception handler, the divide
instruction is reexecuted.

Table 20.1 The First Five Dedicated Interrupts

Interrupt type    Purpose
      0           Divide error
      1           Single-step
      2           Nonmaskable interrupt (NMI)
      3           Breakpoint
      4           Overflow

Another example of a fault is the segment-not-present fault. This exception is caused by a
reference to data in a segment that is not in memory. Then, the exception handler must load the
missing segment from the disk and resume program execution starting with the instruction that
caused the exception. In this example, it clearly makes sense to restart the instruction that caused
the exception.
Traps, on the other hand, are reported at the instruction boundary immediately following the
instruction during which the exception was detected. For instance, the overflow exception (interrupt 4) is a trap. Therefore, no instruction restart is done. User-defined interrupts are also examples
of traps.
Aborts are exceptions that report severe errors. Examples include hardware errors and inconsistent values in system tables.
There are several predefined interrupts. These are called dedicated interrupts. These include
the first five interrupts as shown in Table 20.1. The NMI is a hardware interrupt and is discussed
in Section 20. A brief description of the remaining four interrupts is given here.
Divide Error Interrupt The processor generates a type 0 interrupt whenever executing a divide
instruction, either div (divide) or idiv (integer divide), results in a quotient that is larger
than the destination specified; a divisor of 0 is one such case. The default interrupt handler on Linux displays a Floating point
exception message and terminates the program.
Single-Step Interrupt Single-stepping is a useful debugging tool to observe the behavior of a
program instruction by instruction. To start single-stepping, the trap flag (TF) bit in the flags
register should be set (i.e., TF = 1). When TF is set, the CPU automatically generates a type 1
interrupt after executing each instruction. Some exceptions do exist, but we do not worry about
them here.
The interrupt handler for the type 1 interrupt can be used to display relevant information about
the state of the program. For example, the contents of all registers could be displayed.
To end single-stepping, the TF should be cleared. The instruction set, however, does not have
instructions to directly manipulate the TF bit. Instead, we have to resort to an indirect means: push
the flags register using pushf, pop it into a general-purpose register, manipulate the TF bit there, push the
result, and use popf to store this value back in the flags register. Here is an example code fragment that sets the trap flag:

      pushf                ; copy the flag register
      pop     AX           ; into AX
      or      AX,100H      ; set the trap flag bit (TF = 1)
      push    AX           ; copy the modified flag bits
      popf                 ; back into the flags register
Recall that bit 8 of the flags register is the trap flag (see Figure 4.4 on page 65). We can use the
following code to clear the trap flag:
      pushf                ; copy the flags register
      pop     AX           ; into AX
      and     AX,0FEFFH    ; clear trap flag bit (TF = 0)
      push    AX           ; write back into
      popf                 ; the flags register

Breakpoint Interrupt If you have used a debugger, which you should have by now, you already
know the usefulness of inserting breakpoints while debugging a program. The type 3 interrupt is
dedicated to breakpoint processing. This type of interrupt can be generated by using the special
single-byte form of int 3 (opcode CCH). Writing int 3 in the source automatically causes the
assembler to encode the instruction in the single-byte form. Note that the standard encoding
for the int instruction is two bytes long.
Inserting a breakpoint in a program involves replacing a program code byte by CCH while
saving the original byte for later restoration to remove the breakpoint. The standard 2-byte version
of int 3 can cause problems in certain situations, as there are instructions that require only a
single byte to encode: writing two bytes over such an instruction would also overwrite the first
byte of the instruction that follows it.
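The save-and-restore bookkeeping can be sketched in C on an ordinary byte array standing in for program code. This is only a model under that assumption: a real debugger patches the code of the process being debugged, and struct breakpoint, insert_breakpoint, and remove_breakpoint are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC   /* single-byte encoding of int 3 */

/* Hypothetical record of one breakpoint: where it is and the byte it replaced. */
struct breakpoint {
    size_t offset;
    uint8_t saved_byte;
};

/* Replace the code byte at offset with CCH, remembering the original byte. */
static struct breakpoint insert_breakpoint(uint8_t *code, size_t offset)
{
    struct breakpoint bp = { offset, code[offset] };
    code[offset] = INT3_OPCODE;
    return bp;
}

/* Put the saved byte back, which removes the breakpoint. */
static void remove_breakpoint(uint8_t *code, const struct breakpoint *bp)
{
    code[bp->offset] = bp->saved_byte;
}
```

Because only one byte is replaced, a breakpoint on a one-byte instruction (such as 40H, inc EAX, in the usage below) leaves the neighboring instructions intact.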
Overflow Interrupt The type 4 interrupt is dedicated to handling overflow conditions. There are
two ways by which a type 4 interrupt can be generated: either by int 4 or by into. Like the
breakpoint interrupt, into requires only one byte to encode, as it does not require the specification
of the interrupt type number as part of the instruction. Unlike int 4, which unconditionally
generates a type 4 interrupt, into generates a type 4 interrupt only if the overflow flag is set. We
do not normally use into, as the overflow condition is usually detected and processed by using
the conditional jump instructions jo and jno.
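The condition that into and jo test can be expressed in C. This minimal sketch (add_sets_of is a hypothetical name) computes the sum with unsigned arithmetic, which wraps instead of invoking undefined behavior, and reports overflow with the usual rule for OF after an addition: both operands have the same sign and the sign of the result differs.

```c
#include <limits.h>
#include <stdbool.h>

/* Return true if a + b overflows as a signed int, the condition that sets
   OF after an add. The wrapped sum is stored in *sum. */
static bool add_sets_of(int a, int b, int *sum)
{
    unsigned int ua = (unsigned int)a, ub = (unsigned int)b;
    unsigned int us = ua + ub;          /* unsigned addition wraps */
    *sum = (int)us;                     /* two's complement on GCC and Clang */
    /* Sign bit of this value is 1 when the operands agree in sign
       but the result does not. */
    unsigned int of = ~(ua ^ ub) & (ua ^ us);
    return (of >> (sizeof(unsigned int) * CHAR_BIT - 1)) != 0;
}
```

For example, INT_MAX + 1 and INT_MIN + (-1) both report overflow, while -1 + 1 does not, even though it carries out of the sign bit.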

Software Interrupts
Software interrupts are initiated by executing an interrupt instruction. The format of this instruction is
        int     interrupt-type

where interrupt-type is an integer in the range 0 through 255 (both inclusive). Thus a total
of 256 different types is possible. This is a sufficiently large number, as each interrupt type can be
parameterized to provide several services. For example, Linux provides a large number of services
via int 0x80. In fact, it provides more than 180 different system calls! All these system calls
are invoked by int 0x80. The required service is identified by placing the system call number
in the EAX register. Arguments, if any, are placed in other registers: EBX, ECX, EDX, ESI, EDI,
and EBP, in that order, for up to six arguments. The system call returns its result in EAX. We give
details on some of the file access services provided by int 0x80 in the next section.
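From C on Linux, the same select-a-service-by-number idea is visible through the C library's generic syscall() function, which takes the system call number followed by the arguments. In this sketch (write_by_number is a hypothetical name), SYS_write from <sys/syscall.h> supplies the number for the architecture the program is built for: 4 on IA-32, matching the value loaded into EAX before int 0x80, but a different value on x86-64. The glibc wrapper reports errors as -1 with errno set, rather than as the negative error code the kernel leaves in EAX.

```c
#define _GNU_SOURCE          /* make syscall() visible in <unistd.h> */
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall() */

/* Invoke the write system call by number: the number comes first, then
   the arguments in order (file descriptor, buffer, byte count).
   Returns the number of bytes written, or -1 with errno set. */
static long write_by_number(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

For example, write_by_number(1, "hi\n", 3) writes three bytes to standard output.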

Chapter 20 • Protected-Mode Interrupt Processing


Linux System Calls Of the 256 interrupt vectors available, Linux uses the first 32 vectors (i.e.,
from 0 to 31) for exceptions and nonmaskable interrupts. The next 16 vectors (from 32 to 47)
are used for hardware interrupts generated through interrupt request lines (IRQs) (discussed in the
next chapter). It uses one vector (128 or 0x80) for the software interrupt that provides system
services. Even though only one interrupt vector is used for system services, Linux provides many
services through this interrupt, selected by the system call number in EAX.
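The vector assignment just described can be summarized as a small C lookup. This is a sketch of the layout as given in the text (enum vector_use and classify_vector are hypothetical names); the exact use of vectors above 47 varies between kernel versions and interrupt controllers.

```c
/* Hypothetical summary of the Linux vector layout described above. */
enum vector_use {
    VEC_EXCEPTION,   /* 0-31: exceptions and the nonmaskable interrupt */
    VEC_IRQ,         /* 32-47: hardware interrupts from IRQ lines */
    VEC_SYSCALL,     /* 128 (0x80): system calls via int 0x80 */
    VEC_OTHER,       /* any other vector in 0-255 */
    VEC_INVALID      /* not a vector number at all */
};

static enum vector_use classify_vector(int vector)
{
    if (vector < 0 || vector > 255)
        return VEC_INVALID;
    if (vector <= 31)
        return VEC_EXCEPTION;
    if (vector <= 47)
        return VEC_IRQ;
    if (vector == 0x80)
        return VEC_SYSCALL;
    return VEC_OTHER;
}
```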

File I/O
In this section we give several examples of file I/O operations. In Linux, as in UNIX, the
keyboard and display are treated as stream files. So reading from the keyboard is no different
from reading a file from the disk. If you have done some file I/O in C, it is relatively easy to
understand the following examples. Don't worry if you are not familiar with file I/O; we give
enough details here.
The system sees the input and output data as a stream of bytes. It does not make any logical
distinction whether the byte stream is coming from a disk file or the keyboard. This makes it easy
to interface with I/O devices like the keyboard and display. Three standard file streams are defined:
standard input (stdin), standard output (stdout), and standard error (stderr). The default
association for the standard input is the keyboard; for the other two, it is the display.
File Descriptor
For each open file, a small non-negative integer is assigned as a file id. These magic numbers are
called the file descriptors. Before accessing a file, it must first be opened or created. To open or
create a file, we need the file name, the mode in which it should be opened or created, and so on.
The file descriptor is returned by the file open or create system calls. Once a file is open or
created, we use the file descriptor to access the file.
We don't have to open the three standard files mentioned above. They are automatically opened
for us. These files are assigned the lowest three integers: stdin (0), stdout (1), and stderr (2).
File Pointer
A file pointer is associated with each open file. The file pointer specifies an offset in bytes into the
file relative to the beginning of the file. A file itself is viewed as a sequence of bytes or characters.
The file pointer specifies the location in the file for the subsequent read or write operation.
When a file is opened, the file pointer of that file is set to zero. In other words, the file pointer
points to the first byte of the file. Sequential access to the file is provided by updating the file
pointer to move past the data read or written. Direct access, as opposed to sequential access, to a
file is provided by simply manipulating the file pointer.
File System Calls
System calls described in this section provide access to the data in disk files. As discussed previously, before accessing the data stored in a file, we have to open the file. We can only open a file
if it already exists. Otherwise, we have to create a new file, in which case there is no data and our
intent should be to write something into the file. Linux provides two separate functions: one to
open an existing file (system call 5) and the other to create a new file (system call 8).


Once a file is opened or created, the data from that file can be read or data can be written into
the file. We can use system call 3 to read data from a file, and data can be written to a file by using
system call 4. In addition, since disks allow direct access to the data stored, data contained in a
disk file can be accessed directly or randomly. To provide direct access to the data stored in a file,
the file pointer should be moved to the desired position in the file. System call 19 facilitates
this process. Finally, when processing of the data is completed, we should close the file. We use
system call number 6 to close an open file.
A file name (which can include a path) is needed only to open or create a file. Once a
file is opened or created, a file descriptor is returned and all subsequent accesses to the file should
use this file descriptor.
The remainder of this section describes some of the file system calls.
System call 8: Create and open a file
Inputs:    EAX = 8
           EBX = filename
           ECX = file permissions
Returns:   EAX = file descriptor
Error:     EAX = error code

This system call can be used to create a new file. EBX should point to the file name string,
which can include the path. ECX should be loaded with the file permissions for owner, group,
and others, as you would specify them with the Linux chmod command. File
permissions are represented by three groups of three bits as shown below:
        Bit:    8 7 6   5 4 3   2 1 0
                R W X   R W X   R W X
                User    Group   Other

For each group, you can specify read (R), write (W), and execute (X) permissions. For example, if you want to give read, write, and execute permission to the owner but no access to anyone else, set the
three owner permission bits to 1 and the other bits to 0. Using the octal number system, we represent
this number as 0700. If you want to give read, write, and execute permission to the owner, read permission to
the group, and no access to others, you can set the permissions as 0740. (Here, as in C, octal numbers
are indicated by prefixing them with a zero. NASM does not follow this convention: it treats a number
with a leading zero as decimal, so octal constants in NASM source need a q or o suffix, as in 700q.)
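Because each group occupies three bits, one octal digit maps exactly onto one group, which is why octal notation is convenient here. A minimal C sketch (make_mode is a hypothetical helper) builds the value from the three digits; in C, unlike NASM, 0740 is an octal literal.

```c
/* Build a permission value from three 3-bit groups, each holding
   R = 4, W = 2, X = 1: owner in bits 8-6, group in bits 5-3,
   other in bits 2-0. */
static unsigned int make_mode(unsigned int user, unsigned int group,
                              unsigned int other)
{
    return ((user & 7u) << 6) | ((group & 7u) << 3) | (other & 7u);
}
```

For example, make_mode(7, 4, 0) equals 0740, which is 480 in decimal (7*64 + 4*8).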
The file is opened in write-only access mode, and a file descriptor (a non-negative integer) is returned
in EAX if there is no error. In case of an error, the error code (a negative integer) is placed in EAX.
For example, a create error may occur due to a nonexistent directory in the specified path, missing
write permission for the directory, or device access problems. Note that an existing file with the same
name is not an error: it is truncated to zero length. As we see next, we
can also use file open to create a file.
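The C library's creat() function wraps this system call, and its behavior with an existing file is easy to check. In this sketch, create_with is a hypothetical helper; the paths and the 0600 permission value in the usage are illustrative only, and the final permissions are also filtered by the process umask.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) path with the given permissions and write len bytes
   of msg into it. creat(path, perms) behaves like
   open(path, O_WRONLY | O_CREAT | O_TRUNC, perms), so an existing file is
   emptied rather than reported as an error. Returns 0 on success, -1 on
   failure (for example, when a directory in the path does not exist). */
static int create_with(const char *path, mode_t perms,
                       const char *msg, size_t len)
{
    int fd = creat(path, perms);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, msg, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Calling create_with twice on the same path succeeds both times, and the file then holds only the second message.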


System call 5: Open a file
Inputs:    EAX = 5
           EBX = filename
           ECX = file access mode
           EDX = file permissions
Returns:   EAX = file descriptor
Error:     EAX = error code

This function can be used to open an existing file. It takes the file name and file permissions
as in the file-create system call; the permissions in EDX are used only if the call creates the file.
In addition, it takes the file access mode in the ECX register. This field gives information on how
the file can be accessed. Some interesting values are read-only (0), write-only (1), and read-write (2).
Why is the access mode specification important? The simple answer is to provide security. A file that
is used as an input file to a program can be opened as a read-only file. Similarly, an output file can
be opened as a write-only file. This prevents accidental writes or reads. This specification also makes
it possible, for example, to open files for which you have only read permission.
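We can also use this system call to create a file by including 0100 in the access mode (the flag
values in this paragraph are combined with the access mode by a bitwise OR). Combining 0100 with
write-only access (1) and the truncate flag 01000 described next, 01101 in all, gives the same effect
as the file-create system call we discussed before. We can erase the contents of an existing file by
including 01000 in the access mode. This leaves the file pointer at the beginning of the file. If we
want to append to the existing contents, we can include 02000 instead, which makes every write go
to the end of the file.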
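In C, these values have symbolic names in <fcntl.h>: O_RDONLY, O_WRONLY, and O_RDWR for the access modes, and O_CREAT, O_TRUNC, and O_APPEND for the 0100, 01000, and 02000 flags. The octal values quoted above are the ones Linux uses on IA-32; portable code should use the names. This sketch (append_to is a hypothetical helper) opens an existing file write-only in append mode.

```c
#include <fcntl.h>
#include <unistd.h>

/* Append len bytes of data to the existing file path. O_WRONLY requests
   write-only access, and O_APPEND makes each write go to the current end
   of the file. Returns 0 on success, -1 on failure. */
static int append_to(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;                 /* e.g., the file does not exist */
    ssize_t n = write(fd, data, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Because O_CREAT is not included, append_to fails on a file that does not exist instead of creating it.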
As with the create system call, file descriptor and error code values are returned in the EAX
register.
System call 3: Read from a file
Inputs:    EAX = 3
           EBX = file descriptor
           ECX = pointer to input buffer
           EDX = buffer size (maximum number of bytes to read)
Returns:   EAX = number of bytes read
Error:     EAX = error code

Before calling this function to read data from a previously opened or created file, the number of
bytes to read should be specified in EDX and ECX should point to a data buffer into which the
data read from the file is placed. The file is identified by giving its descriptor in EBX.
The system attempts to read EDX bytes from the file starting from the current file pointer
location. Thus, by manipulating the file pointer (see the lseek system call discussed later), we can
use this function to read data from a random location in a file.
After the read is complete, the file pointer is updated to point to the byte after the last byte
read. Thus, successive calls give us sequential access to the file.
Upon completion, if there is no error, EAX contains the actual number of bytes read from the
file. For a disk file, if this number is less than that specified in EDX, the end of file has been
reached (a value of 0 means the file pointer was already at the end). Thus, we can use this condition
to detect end-of-file. Reads from the keyboard, however, normally return as soon as a line has been
entered, so a short count there does not by itself signal end-of-file.
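A common way to express this in C is a loop that keeps reading until read() returns 0, which signals end-of-file for disk files, pipes, and terminals alike. The sketch below (count_bytes is a hypothetical helper) counts the bytes remaining from the current file pointer; each read advances the file pointer as described above.

```c
#include <unistd.h>

/* Count the bytes that remain from the current file position of fd.
   read() returns 0 at end of file and a negative value on error.
   Returns the byte count, or -1 on error. */
static long count_bytes(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;              /* file pointer has moved past these bytes */
    return n < 0 ? -1 : total;
}
```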


System call 4: Write to a file
Inputs:    EAX = 4
           EBX = file descriptor
           ECX = pointer to output buffer
           EDX = buffer size (number of bytes to write)
Returns:   EAX = number of bytes written
Error:     EAX = error code

This function can be used to write to a file that is open in write or read/write access mode. A file
created with the file-create system call is opened in write-only access mode, so it can be written to
right away. The input parameters have similar meanings as in the read system call. On return, if there
is no error, EAX contains the actual number of bytes written to the file. This number should normally
be equal to that specified in EDX. If it is smaller, the write was incomplete, possibly because the disk is full.
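Since a short write is possible, C code commonly wraps write() in a loop that resumes where the previous call stopped. This is a minimal sketch (write_all is a hypothetical helper); it retries when a signal interrupts the call and gives up on any other error.

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, continuing after short writes.
   Returns 0 on success, -1 on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;
        }
        if (n == 0)
            return -1;             /* no progress; avoid looping forever */
        buf += n;                  /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```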
System call 6: Close a file
Inputs:    EAX = 6
           EBX = file descriptor
Returns:   EAX = 0
Error:     EAX = error code

This function can be used to close an open file. It is not usually necessary to check for errors after
closing a file. The only reasonable error scenario is when EBX contains an invalid file descriptor.
System call 19: lseek (Updates file pointer)
Inputs:    EAX = 19
           EBX = file descriptor
           ECX = offset
           EDX = whence
Returns:   EAX = byte offset from the beginning of file
Error:     EAX = error code

Thus far, we have processed files sequentially. The file pointer remembers the position in the file. As we
read from or write to the file, the file pointer is advanced accordingly. If we want random
access to a file rather than sequential access, we need to manipulate the file pointer.
This system call allows us to reposition the file pointer. As usual, the file descriptor is loaded
into EBX. The offset, given in ECX, is measured relative to a reference point: the beginning of
the file, the current position, or the end of the file. The whence value in EDX specifies this
reference point:
        Reference position      whence value
        Beginning of file            0
        Current position             1
        End of file                  2
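The same three reference points are available in C as SEEK_SET, SEEK_CUR, and SEEK_END, whose values on Linux are 0, 1, and 2. In this sketch, byte_at is a hypothetical helper that repositions the file pointer with lseek() and then reads one byte; note that the read itself moves the pointer forward by one, which matters for a following SEEK_CUR request.

```c
#include <fcntl.h>
#include <unistd.h>

/* Move the file pointer of fd by offset relative to whence
   (SEEK_SET, SEEK_CUR, or SEEK_END), then read one byte.
   Returns the byte, or -1 on error or end of file. */
static int byte_at(int fd, off_t offset, int whence)
{
    unsigned char c;
    if (lseek(fd, offset, whence) == (off_t)-1)
        return -1;
    if (read(fd, &c, 1) != 1)    /* read advances the pointer by 1 */
        return -1;
    return c;
}
```

With a file holding "abcdef", byte_at(fd, 0, SEEK_SET) returns 'a', a following byte_at(fd, 1, SEEK_CUR) returns 'c', and byte_at(fd, -1, SEEK_END) returns 'f'.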
These system calls allow us to write file I/O programs. Since keyboard and display are treated
as files as well, we can write assembly language programs to access these I/O devices.


Our First Program
As our first example, we look at the PutCh procedure we used to write a character to the display.
This is done by using the write system call. We specify stdout as the file to be written. The
procedure is shown in Program 20.1. Since the character to be displayed is received in the AL
register, we store it in temp_char before loading EAX with system call number 4. We load
the temp_char pointer in ECX. Since we want to write just one character, we load 1 into EDX
(line 10). We preserve the registers by using pusha and popa on lines 5 and 12.
Program 20.1 Procedure to write a character
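 1: ;-----------------------------------------------------------
 2: ; Put character procedure receives the character in AL.
 3: ;-----------------------------------------------------------
 4: putch:
 5:        pusha
 6:        mov     [temp_char],AL
 7:        mov     EAX,4             ; 4 = write
 8:        mov     EBX,1             ; 1 = std output (display)
 9:        mov     ECX,temp_char     ; pointer to char buffer
10:        mov     EDX,1             ; # bytes = 1
11:        int     0x80
12:        popa
13:        ret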
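For comparison, a C counterpart of putch needs only one write() call. In this sketch, putch_fd is a hypothetical name; it takes the file descriptor as a parameter (pass 1, or STDOUT_FILENO, for the display) and passes the address of its argument as the one-byte buffer, playing the role of temp_char.

```c
#include <unistd.h>

/* Write the single character c to file descriptor fd.
   Returns 1 on success, -1 on error. */
static ssize_t putch_fd(int fd, char c)
{
    return write(fd, &c, 1);     /* buffer = &c, byte count = 1 */
}
```

For example, putch_fd(STDOUT_FILENO, 'A') displays the letter A.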

Illustrative Examples
We present two examples that use the file I/O system calls described before. As in the last example,
the first one is taken from the I/O routines we have used (see Chapter 7 for details).
Example 20.1 Procedure to read a string.
In this example, we look at the string read function getstr. We can read a string by using a single
file read system call as shown in Program 20.2. Since we use the dec instruction, which modifies
the flags register, we preserve its contents by saving and restoring the flags register using pushf
(line 7) and popf (line 16). Since the file read system call returns the number of characters read in
EAX, we can add this value (after decrementing) to the buffer pointer to append a NULL character
(line 15). Decrementing makes the NULL overwrite the last character read, which for keyboard input
is the newline produced by the Enter key. This returns the string in the NULL-terminated format.
Program 20.2 Procedure to read a string
 1: ;-----------------------------------------------------------
 2: ; Get string procedure receives input buffer pointer in
 3: ; EDI and the buffer size in ESI.
 4: ;-----------------------------------------------------------
 5: getstr:
 6:       pusha
 7:       pushf
 8:       mov     EAX,3             ; file read service
 9:       mov     EBX,0             ; 0 = std input (keyboard)
10:       mov     ECX,EDI           ; pointer to input buffer
11:       mov     EDX,ESI           ; input buffer size
12:       int     0x80
13:       dec     EAX
14: done_getstr:
15:       mov     byte[EDI+EAX],0   ; append NULL character
16:       popf
17:       popa
18:       ret
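A C version of getstr can be sketched the same way. This sketch (getstr_fd is a hypothetical name) differs from Program 20.2 in deliberate ways: it reads at most size - 1 bytes so the NULL character always fits, it removes the last character only when that character is a newline, and it never writes in front of the buffer when the read returns 0 or fails.

```c
#include <unistd.h>

/* Read one line from fd into buf (size bytes) and NULL-terminate it.
   Returns the string length, or -1 on error. */
static ssize_t getstr_fd(int fd, char *buf, size_t size)
{
    if (size == 0)
        return -1;                          /* no room even for the NULL */
    ssize_t n = read(fd, buf, size - 1);    /* leave room for the NULL */
    if (n < 0)
        return -1;
    if (n > 0 && buf[n - 1] == '\n')
        n--;                                /* drop the Enter key's newline */
    buf[n] = '\0';                          /* append NULL character */
    return n;
}
```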

Example 20.2 A file copy program.
This example uses file copy to show how disk files can be manipulated using the file I/O system
calls. The program requests the input and output file names (lines 27-31). It opens the input file in
read-only mode using the open file system call (lines 33-39). If the call is successful, it returns the
file descriptor (a non-negative integer) in EAX. In case of an error, a negative value is returned in EAX.
This error check is done on line 41. If there is an error in opening the file, the program displays the
error message and quits. Otherwise, it creates the output file (lines 47-53). A similar error check
is done for the output file (lines 55-59).
File copy is done by reading a block of data from the input file and writing it to the output file.
The block size is determined by the buffer size allocated for this purpose (see line 23). The copy
loop on lines 61-79 consists of three parts:
• Read a block of BUF_SIZE bytes from the input file (lines 62-67);
• Write the block to the output file (lines 69-74);
• Check to see if the end of file has been reached. As discussed before, this check is done by
comparing the number of bytes read by the file-read system call (which is copied to EDX) to
BUF_SIZE. If the number of bytes read is less than BUF_SIZE, we know we have reached
the end of file (lines 76 and 77).
After completing the copying process, we close the two open files (lines 81-85).
Program 20.3 File copy program using the file I/O services
 1: ;A file copy program                      file_copy.asm
 2: ;
 3: ;        Objective: To copy a file using the int 0x80 services.
 4: ;            Input: Requests names of the input and output files.
 5: ;           Output: Creates a new output file and copies contents
 6: ;                   of the input file.
 7:
 8: %include "io.mac"
 9:
10: %define  BUF_SIZE  256
11:
12: .DATA
13: in_fn_prompt      db  'Enter the input file name: ',0
14: out_fn_prompt     db  'Enter the output file name: ',0
15: in_file_err_msg   db  'Input file open error.',0
16: out_file_err_msg  db  'Cannot create output file.',0
17:
18: .UDATA
19: in_file_name      resb  30
20: out_file_name     resb  30
21: fd_in             resd  1
22: fd_out            resd  1
23: in_buf            resb  BUF_SIZE
24:
25: .CODE
26:       .STARTUP
27:       PutStr  in_fn_prompt        ; request input file name
28:       GetStr  in_file_name,30     ; read input file name
29:
30:       PutStr  out_fn_prompt       ; request output file name
31:       GetStr  out_file_name,30    ; read output file name
32:
33:       ;open the input file
34:       mov     EAX,5               ; file open
35:       mov     EBX,in_file_name    ; input file name pointer
36:       mov     ECX,0               ; access bits (read only)
37:       mov     EDX,700q            ; file permissions (octal)
38:       int     0x80
39:       mov     [fd_in],EAX         ; store fd for use in
40:                                   ; read routine
41:       cmp     EAX,0               ; open error if fd < 0
42:       jge     create_file
43:       PutStr  in_file_err_msg
44:       nwln
45:       jmp     done
46: create_file:
47:       ;create output file
48:       mov     EAX,8               ; file create
49:       mov     EBX,out_file_name   ; output file name pointer
50:       mov     ECX,700q            ; r/w/e by owner only (octal)
51:       int     0x80
52:       mov     [fd_out],EAX        ; store fd for use in
53:                                   ; write routine
54:
55:       cmp     EAX,0               ; create error if fd < 0
56:       jge     repeat_read
57:       PutStr  out_file_err_msg
58:       nwln
59:       jmp     close_exit          ; close input file & exit
60:
61: repeat_read:
62:       ;read input file
63:       mov     EAX,3               ; file read
64:       mov     EBX,[fd_in]         ; file descriptor
65:       mov     ECX,in_buf          ; input buffer
66:       mov     EDX,BUF_SIZE        ; size
67:       int     0x80
68:       ;write to output file
69:       mov     EDX,EAX             ; byte count
70:       mov     EAX,4               ; file write
71:       mov     EBX,[fd_out]        ; file descriptor
72:       mov     ECX,in_buf          ; input buffer
73:       int     0x80
74:
75:       cmp     EDX,BUF_SIZE        ; EDX = # bytes read
76:                                   ; EDX < BUF_SIZE
77:       jl      copy_done           ; indicates end-of-file
78:       jmp     repeat_read
79: copy_done:
80:       mov     EAX,6               ; close output file
81:       mov     EBX,[fd_out]
82:       int     0x80
83: close_exit:
84:       mov     EAX,6               ; close input file
85:       mov     EBX,[fd_in]
86:       int     0x80
87: done:
88:       .EXIT
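The same copy loop can be sketched in C with open(), creat(), read(), write(), and close(). This is a minimal sketch (copy_file is a hypothetical name), not a line-by-line translation: it keeps the 256-byte buffer and the 0700 permissions of Program 20.3, but it stops when read() returns 0 and treats a negative return as an error, instead of comparing the byte count with BUF_SIZE.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 256

/* Copy in_name to out_name in BUF_SIZE blocks.
   Returns 0 on success, -1 on any error. */
static int copy_file(const char *in_name, const char *out_name)
{
    char buf[BUF_SIZE];
    int rc = -1;
    int fd_in = open(in_name, O_RDONLY);        /* read-only access */
    if (fd_in < 0)
        return -1;                              /* input file open error */
    int fd_out = creat(out_name, 0700);         /* r/w/e by owner only */
    if (fd_out < 0)
        goto close_in;                          /* cannot create output file */
    for (;;) {
        ssize_t n = read(fd_in, buf, BUF_SIZE);
        if (n < 0)
            goto close_out;                     /* read error */
        if (n == 0)
            break;                              /* end of file */
        if (write(fd_out, buf, (size_t)n) != n)
            goto close_out;                     /* write error or disk full */
    }
    rc = 0;
close_out:
    close(fd_out);
close_in:
    close(fd_in);
    return rc;
}
```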

Hardware Interrupts
We have seen how interrupts can be caused by the software instruction int. Since these instructions are placed in a program, software interrupts are called synchronous events. Hardware
interrupts, on the other hand, are of hardware origin and asynchronous in nature. These interrupts
are used by I/O devices such as the keyboard to get the processor's attention.
As discussed before, hardware interrupts can be further divided into either maskable or nonmaskable interrupts (see Figure 20.1). A nonmaskable interrupt (NMI) can be triggered by applying an electrical signal on the NMI pin of the processor. This interrupt is called nonmaskable
because the CPU always responds to this signal. In other words, this interrupt cannot be disabled
under program control. The NMI causes a type 2 interrupt.
Most hardware interrupts are of maskable type. To cause this type of interrupt, an electrical
signal should be applied to the INTR (INTerrupt Request) input of the processor. The processor
recognizes the INTR interrupt only if the interrupt enable flag (IF) bit of the flags register is set to
1. Thus, these interrupts can be masked or disabled by clearing the IF bit. Note that we can use
sti and cli to set and clear this bit in the flags register, respectively.
How Does the Processor Know the Interrupt Type? Recall that every interrupt should be identified by a vector (a number between 0 and 255), which is used as an index into the interrupt vector
table to obtain the corresponding ISR address. This interrupt invocation procedure is common to
all interrupts, whether caused by software or hardware.
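Vectored dispatch can be modeled in C as an array of 256 function pointers indexed by the vector number. This is a sketch of the idea only (vector_table, dispatch, and the two demo handlers are hypothetical); the real protected-mode table holds gate descriptors rather than plain addresses, and the processor performs the lookup itself.

```c
#include <stddef.h>

/* Hypothetical model: the vector number indexes a table of 256 handlers. */
typedef void (*isr_t)(void);

static isr_t vector_table[256];     /* all entries start out NULL */
static int last_handled = -1;       /* set by the demo handlers below */

static void breakpoint_isr(void) { last_handled = 3; }
static void syscall_isr(void)    { last_handled = 0x80; }

/* Use vector as an index into the table and call the handler found
   there; return -1 if no handler is installed for that vector. */
static int dispatch(unsigned char vector)
{
    isr_t handler = vector_table[vector];
    if (handler == NULL)
        return -1;
    handler();
    return 0;
}
```

After vector_table[3] = breakpoint_isr, the call dispatch(3) runs breakpoint_isr, whether the 3 came from an int 3 instruction or, in the hardware case, from the data bus.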


In response to a hardware interrupt request on the INTR pin, the processor initiates an interrupt
acknowledge sequence. As part of this sequence, the processor sends out an interrupt acknowledge
(INTA) signal, and the interrupting device is expected to place the interrupt vector on the data bus.
The processor reads this value and uses it as the interrupt vector.
How Can More Than One Device Interrupt? From the above description, it is clear that all
interrupt requests from external devices should be input via the INTR pin of the processor. While
it is straightforward to connect a single device, computers typically have more than one I/O device
requesting interrupt service. For example, the keyboard, hard disk, and floppy disk all generate
interrupts when they require the attention of the processor.
When more than one device interrupts, we have to have a mechanism to prioritize these interrupts (if they come simultaneously) and forward only one interrupt request at a time to the
processor while keeping the other interrupt requests pending for their turn. This mechanism can
be implemented by using a special APIC (Advanced Programmable Interrupt Controller) chip.
Hardware interrupts provide direct access to the I/O devices. The next section discusses some
of the instructions available to access I/O ports.

Direct Control of I/O Devices
When we want to access an I/O device that the operating system does not support, or when we
want nonstandard access, we have to access the device directly.
At this point, it is useful to review the material presented in Chapter 4. As described in that
chapter, the IA-32 architecture uses a separate I/O address space of 64K. This address space can
be used for 8-bit, 16-bit, or 32-bit I/O ports. However, the combination cannot be more than the
total I/O space. For example, we can have 64K 8-bit ports, 32K 16-bit ports, 16K 32-bit ports, or
a combination of these that fits the I/O address space.
Devices that transfer data 8 bits at a time can use 8-bit ports. These devices are called 8-bit
devices. An 8-bit device can be located anywhere in the I/O space without any restrictions. On
the other hand, a 16-bit port should be aligned to an even address so that 16 bits can be simultaneously transferred in a single bus cycle. Similarly, 32-bit ports should be aligned at addresses
that are multiples of four. The architecture, however, supports unaligned I/O ports, but there is a
performance penalty (see page 59 for a related discussion).
Accessing I/O Ports

To facilitate access to the I/O ports, the instruction set provides two types of instructions: register
I/O instructions and block I/O instructions. Register I/O instructions are used to transfer data
between a register and an I/O port. Block I/O instructions are used for block transfer of data
between memory and I/O ports.
Register I/O Instructions There are two register I/O instructions: in and out. The in instruction is used to read data from an I/O port, and the out instruction to write data to an I/O
port. A port address can be any value in the range 0 to FFFFH. The first 256 ports are directly
addressable: the address is given as part of the instruction.
Both instructions can be used to operate on 8-, 16-, or 32-bit data. Each instruction can take
one of two forms, depending on whether a port is directly addressable or not. The general format
of the in instruction is


        in      accumulator,port8       ; direct addressing format
        in      accumulator,DX          ; indirect addressing format

The first form uses the direct addressing mode and can only be used to access the first 256
ports. In this case, the I/O port address, which is in the range 0 to FFH, is given by the port8
operand. In the second form, the I/O port address is given indirectly via the DX register. The
contents of the DX register are treated as the port address.
In either form, the first operand accumulator must be AL, AX, or EAX. This choice determines whether a byte, word, or doubleword is read from the specified port.
The format for the out instruction is
        out     port8,accumulator       ; direct addressing format
        out     DX,accumulator          ; indirect addressing format

Notice the placement of the port address. In the in instruction, it is the source operand and in the
out instruction, it is the destination operand, signifying the direction of data movement.
Block I/O Instructions The instruction set has two block I/O instructions: ins and outs. These
instructions can be used to move blocks of data between I/O ports and memory. These I/O instructions are, in some sense, similar to the string instructions discussed in Chapter 17. For this reason,
block I/O instructions are also called string I/O instructions. Like the string instructions, ins
and outs do not take any operands. Also, we can use the repeat prefix rep as in the string
instructions.
For the ins instruction, the port address should be placed in DX and the memory address
should be pointed to by ES:(E)DI. The address size determines whether the DI or EDI register is
used (see Chapter 4 for details). Block I/O instructions do not allow the direct addressing format.
For the outs instruction, the memory address should be pointed to by DS:(E)SI, and the I/O port
should be specified in DX. You can see the similarity between the block I/O instructions and the
string instructions.
You can use the rep prefix with the ins and outs instructions. However, you cannot use the
other two prefixes, repe and repne, with the block I/O instructions. The semantics of rep
are the same as those in the string instructions. The direction flag (DF) determines whether the
index register in the block I/O instruction is decremented (DF is 1) or incremented (DF is 0). The
increment or decrement value depends on the size of the data unit transferred. For byte transfers,
the index register is updated by 1. For word and doubleword transfers, the corresponding values
are 2 and 4, respectively. The size of the data unit involved in the transfers can be specified as
in the string instructions. Use insb and outsb for byte transfers, insw and outsw for word
transfers, and insd and outsd for doubleword transfers.
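The effect of rep combined with insb, insw, or insd can be modeled in C. In this sketch, rep_ins, port_reader, demo_port, and demo_next are hypothetical: the port is simulated by a caller-supplied function, count plays the role of (E)CX, and the returned index plays the role of (E)DI, which moves forward or backward by the element size according to DF.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a hardware input port: returns the next size-byte value. */
typedef uint32_t (*port_reader)(uint16_t port, unsigned int size);

/* Model of rep insb/insw/insd: read count elements of size bytes
   (1, 2, or 4) from port and store them in mem starting at index.
   The index advances by size when df is 0 and moves back by size
   when df is 1. Returns the final index. */
static size_t rep_ins(uint8_t *mem, size_t index, uint16_t port,
                      unsigned int size, size_t count, int df,
                      port_reader read_port)
{
    while (count-- > 0) {                        /* like (E)CX counting down */
        uint32_t value = read_port(port, size);
        for (unsigned int i = 0; i < size; i++)  /* little-endian store */
            mem[index + i] = (uint8_t)(value >> (8 * i));
        index = df ? index - size : index + size;
    }
    return index;
}

/* Hypothetical demo port that returns 1, 2, 3, ... on successive reads. */
static uint32_t demo_next = 0;

static uint32_t demo_port(uint16_t port, unsigned int size)
{
    (void)port;
    (void)size;
    return ++demo_next;
}
```

With DF = 1, the caller must start the index high enough that it never moves below the start of the buffer, just as (E)DI must point at the last element of the block.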

Summary
Interrupts provide a mechanism to transfer control to an interrupt service routine. The mechanism is similar to that of a procedure call. However, while procedures can be invoked only by a
procedure call in software, interrupts can be invoked by both hardware and software.
Software interrupts are generated using the int instruction. Hardware interrupts are generated
by I/O devices. These interrupts are used by I/O devices to interrupt the processor to service their
requests.


Software interrupts are often used to support access to the system I/O devices. Linux provides
a high-level interface to the hardware with software interrupts. We introduced Linux system calls
and discussed how these calls can be used to access I/O devices. The system calls are invoked
using int 0x80. We used several examples to illustrate the utility of these calls in reading
from the keyboard, writing to the screen, and accessing files.
All interrupts, whether hardware-initiated or software-initiated, are identified by an interrupt
type number that is between 0 and 255. This interrupt number is used as an index into the interrupt
vector table to get the address of the associated interrupt service routine.

21
High-Level Language Interface
Thus far, we have written standalone assembly language programs. This chapter considers mixed-mode programming, which refers to writing parts of a program in different programming languages. We use the C and assembly languages to illustrate how such mixed-mode programs are
written. We begin the chapter with a discussion of the motivation for writing mixed-mode programs.
Next we give an overview of mixed-mode programming, which can be done either by inline assembly code or by separate assembly modules. We describe both methods with some example
programs. The last section summarizes the chapter.

Introduction
In this chapter we focus on mixed-mode programming that involves C and assembly languages.
Thus, we write part of the program in C and the other part in the assembly language. We use
the gcc compiler and NASM assembler to explain the principles involved in mixed-mode programming. This discussion can be easily extended to a different set of languages and compilers/assemblers.
In Chapter 1 we discussed several reasons why one would want to program in the assembly
language. Although it is possible to write a program entirely in the assembly language, there are
several disadvantages in doing so. These include
• Low productivity
• High maintenance cost
• Lack of portability
Low productivity is due to the fact that assembly language is a low-level language. As a result,
a single high-level language instruction may require several assembly language instructions. It has
been observed that programmers tend to produce the same number of lines of debugged and tested
source code per unit time irrespective of the level of the language used. As the assembly language
requires more lines of source code, programmer productivity tends to be low.


Programs written in the assembly language are difficult to maintain. This is a direct consequence of its being a low-level language. In addition, assembly language programs are not
portable. On the other hand, the assembly language provides low-level access to system hardware.
It may also help us reduce the execution time.
As a result of these pros and cons, some programs are written in mixed mode using both
high-level and low-level languages. System software often requires mixed-mode programming.
In such programs, it is possible for a high-level procedure to call a low-level procedure, and vice
versa. The remainder of the chapter discusses how mixed-mode programming is done in C and
assembly languages. Our goal is to illustrate only the principles involved. Once these principles
are understood, the discussion can be generalized to any type of mixed-mode programming.

Overview
There are two ways of writing mixed-mode C and assembly programs: inline assembly code
or separate assembly modules. In the inline assembly method, the C program module contains
assembly language instructions. Most C compilers including gcc allow embedding assembly
language instructions within a C program by prefixing them with asm to let the compiler know
that it is an assembly language instruction. This method is useful if you have only a small amount
of assembly code to embed. Otherwise, separate assembly modules are preferred. We discuss the
inline assembly method later (see page 434).
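As a preview of the inline method, here is a minimal GCC sketch (add_inline is a hypothetical function). GCC spells the keyword __asm__ (plain asm also works in the GNU C modes) and by default expects AT&T syntax, in which the destination operand comes last, the reverse of the NASM syntax used in this book. The code falls back to plain C on processors other than IA-32 and x86-64.

```c
/* Return a + b, computed by an embedded add instruction on IA-32/x86-64. */
static int add_inline(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: "addl src, dst" computes dst += src.
       %0 is a (read and written), %1 is b; "cc" says the flags change. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;   /* plain C on other targets */
#endif
}
```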
When separate modules are used for C and assembly languages, each module can be translated
into the corresponding object file. To do this translation, we use a C compiler for the C modules
and an assembler for the assembly modules, as shown in Figure 21.1. Then the linker can be used
to produce the executable file from these object files.
Suppose our mixed-mode program consists of two modules:
• One C module, file sample1.c, and
• One assembly module, file sample2.asm.
The process involved in producing the executable file is shown in Figure 21.1. We can invoke the
NASM assembler as
nasm -f elf sample2.asm

This creates the sample2.o object file. We can compile and link the files with the following
command:
gcc -o sample1.out sample1.c sample2.o

This command instructs the compiler to first compile sample1.c to sample1.o. The linker
is automatically invoked to link sample1.o and sample2.o to produce the executable file
sample1.out. (On a 64-bit Linux system, add the -m32 option to the gcc command so that the
C code is compiled for IA-32 and can be linked with the 32-bit object file produced by nasm -f elf.)

Calling Assembly Procedures from C
Let us now discuss how we can call an assembly language procedure from a C program. The
first thing we have to know is what communication medium is used between the C and assembly
language procedures, as the two procedures may exchange parameters and results. You are right if
you guessed it to be the stack.
Given that the stack is used for communication purposes, we still need to know a few more
details as to how the C function places the parameters on the stack, and where it expects the
assembly language procedure to return the result. In addition, we should also know which registers
we can use freely without worrying about preserving their values. Next we discuss these issues in
detail.

Chapter 21 • High-Level Language Interface

Figure 21.1 Steps involved in compiling mixed-mode programs. (Figure not reproduced; on the
assembly side, its input is the assembly source file sample2.asm.)
Parameter Passing There are two ways in which arguments (i.e., parameter values) are pushed
onto the stack: from left to right or from right to left. Most high-level languages push the arguments from left to right. These are called left-pusher languages. C, on the other hand, pushes
arguments from right to left. Thus, C is a right-pusher language. The stack state after executing

sum(a,b,c,d)

is shown in Figure 21.2. From now on, we consider only right-pushing of arguments, as we focus
on the C language.

Assembly Language Programming in Linux

Figure 21.2 Two ways of pushing arguments onto the stack. (Figure not reproduced. It contrasts a
left-pusher, where a is pushed first and d ends up next to the return address, with a right-pusher,
where d is pushed first and a ends up next to the return address. In both cases the return address
EIP is at the top of the stack (TOS), pointed to by ESP.)

To see how gcc pushes arguments onto the stack, take a look at the following C program (this
is a partial listing of Program 21.1 on page 428):

int main(void)
{
    int    x=25, y=70;
    int    value;
    extern int test(int, int, int);

    value = test(x, y, 5);

The assembly language translation of the procedure call (use the -S option to generate the assembly source code) is shown below:¹

        push    5
        push    70
        push    25
        call    test
        add     ESP,12
        mov     [EBP-12],EAX

This program is compiled with -O2 optimization. This optimization is the reason for pushing
constants 70 and 25 instead of variables x and y. If you don't use this optimization, gcc produces
the following code:

        push    5
        push    dword [EBP-8]
        push    dword [EBP-4]
        call    test
        add     ESP,12
        mov     [EBP-12],EAX

It is clear from this code fragment that the compiler assigns space for variables x, y, and value
on the stack at EBP-4, EBP-8, and EBP-12, respectively. When the test function is called,
the arguments are pushed from right to left, starting with the constant 5. Also notice that the stack
is cleared of the arguments by the C program after the call by the following statement:

        add     ESP,12

¹ Note that gcc uses AT&T syntax for the assembly language, not the Intel syntax we have been using in this book.
To avoid any confusion, the contents are reported in our syntax. The AT&T syntax is introduced on page 434.

So, when we write our assembly procedures, we should not clear the arguments from the
stack inside the procedure, as we did in our programs in the previous chapters; the calling C function
removes them. This convention is used because C allows a variable number of arguments to be passed
in a function call (see our discussion on page 268): only the caller knows how many arguments it pushed.
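To see why only the caller can do the cleanup, consider a C function that takes a variable number of arguments. The sketch below uses the standard <stdarg.h> macros; sum_ints is a hypothetical name used only for illustration. Its first argument says how many int values follow, so on IA-32 a call such as sum_ints(3, 1, 2, 3) pushes 16 bytes while sum_ints(1, 7) pushes 8. The amount to remove is fixed at each call site, not inside the function.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: 'count' tells how many int
   arguments follow it. The callee cannot know this on its own. */
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    int total = 0;
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next int argument */
    va_end(args);
    return total;
}
```

Each call site knows exactly how many arguments it pushed, which is why the C convention leaves the add ESP,n to the caller.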
Returning Values We can see from the previous assembly language code that the EAX register
is used to return the function value. In fact, the EAX register is used to return 8-, 16-, and 32-bit
values. To return a 64-bit value, use the EDX:EAX pair with the EDX holding the upper 32 bits.
We have not discussed how floating-point values are returned. For example, if a C function
returns a double value, how do we return this value? We discuss this issue in the next chapter.
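The EDX:EAX convention for 64-bit results can be mimicked in portable C. In this sketch, reg_pair, split64, and join64 are hypothetical helpers (not part of any API); they only show which half of a 64-bit value ends up in which register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EDX:EAX register pair. */
typedef struct { uint32_t edx, eax; } reg_pair;

/* Split a 64-bit return value the way IA-32 code returns it:
   EDX gets the upper 32 bits, EAX the lower 32 bits. */
reg_pair split64(uint64_t v) {
    reg_pair r;
    r.edx = (uint32_t)(v >> 32);
    r.eax = (uint32_t)(v & 0xFFFFFFFFu);
    return r;
}

/* What the C caller effectively reassembles from EDX:EAX. */
uint64_t join64(reg_pair r) {
    return ((uint64_t)r.edx << 32) | r.eax;
}
```

For example, the value 0x0000000100000002 comes back with EDX = 1 and EAX = 2.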
Preserving Registers In general, the called assembly language procedure can use the registers
as needed, except that the following registers should be preserved:
EBP, EBX, ESI, EDI
The other registers, if needed, must be preserved by the calling function.
Globals and Externals Mixed-mode programming involves at least two program modules: a
C module and an assembly module. Thus, we have to declare those functions and procedures
that are not defined in the same module as external. Similarly, those procedures that are accessed
by another module should be declared as global, as discussed in Chapter 11. Before proceeding
further, you may want to review the material on multimodule programs presented in Chapter 11
(see our discussion on page 260). Here we mention only those details that are specific to the
mixed-mode programming involving the C and assembly languages.
In most C compilers, external labels should start with an underscore character (_). The C and
C++ compilers automatically prefix the required underscore character to all external functions
and variables. A consequence of this characteristic is that when we write an assembly procedure
that is called from a C program, we have to make sure that we prefix an underscore character to
its name. However, gcc on Linux does not follow this convention by default. Thus, we don't have to worry
about the underscore.

Our First Program
To illustrate the principles involved in writing mixed-mode programs, we look at a simple example
that passes three parameters to the test1 assembly language function. The C code is shown
in Program 21.1 and the assembly code in Program 21.2. The function test1 is declared as
external in the C program (line 12) and global in the assembly program (line 8). Since C clears
the arguments from the stack, the assembly procedure uses a simple ret to transfer control back
to the C program. Other than these differences, the assembly procedure is similar to several others
we have written before.

Program 21.1 An example illustrating assembly calls from C: C code (in file hll_ex1c.c)

 1  /**************************************************************
 2   * A simple program to illustrate how mixed-mode programs
 3   * are written in C and assembly languages. The main C
 4   * program calls the assembly language procedure test1.
 5   **************************************************************/
 6  #include <stdio.h>
 7
 8  int main(void)
 9  {
10      int    x = 25, y = 70;
11      int    value;
12      extern int test1 (int, int, int);
13
14      value = test1(x, y, 5);
15      printf("Result = %d\n", value);
16
17      return 0;
18  }

Program 21.2 An example illustrating assembly calls from C: assembly language code (in file
hll_test.asm)

 1  ;-----------------------------------------------------------
 2  ; This procedure receives three integers via the stack.
 3  ; It adds the first two arguments and subtracts the
 4  ; third one. It is called from the C program.
 5  ;-----------------------------------------------------------
 6
 7  segment .text
 8  global  test1
 9
10  test1:
11          enter   0,0
12          mov     EAX,[EBP+8]     ; get argument1 (x)
13          add     EAX,[EBP+12]    ; add argument 2 (y)
14          sub     EAX,[EBP+16]    ; subtract argument3 (5)
15          leave
16          ret
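The offsets [EBP+8], [EBP+12], and [EBP+16] follow directly from the order of the pushes. The sketch below models a 32-bit stack in plain C, assuming 4-byte slots and a stack that grows downward; every name in it (sim_mem, push32, pop32, simulate_test1) and the return address value are hypothetical. It pushes test1's arguments right to left, pushes a return address as call does, performs enter 0,0, reads the arguments relative to EBP, and then unwinds as leave, ret, and the caller's add ESP,12 would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IA-32 stack model: 4-byte slots, growing downward.
   An "address" is a byte offset into sim_mem. */
#define STACK_WORDS 64

static uint32_t sim_mem[STACK_WORDS];
static uint32_t esp = STACK_WORDS * 4;   /* empty stack */
static uint32_t ebp = 0;

static void push32(uint32_t value) {
    esp -= 4;                    /* PUSH first decrements ESP ... */
    sim_mem[esp / 4] = value;    /* ... then stores at [ESP] */
}

static uint32_t pop32(void) {
    uint32_t value = sim_mem[esp / 4];
    esp += 4;
    return value;
}

static uint32_t load32(uint32_t addr) {
    return sim_mem[addr / 4];
}

/* Hypothetical simulation of value = test1(x, y, z). */
int simulate_test1(int x, int y, int z) {
    /* caller: push arguments right to left, then CALL */
    push32((uint32_t)z);
    push32((uint32_t)y);
    push32((uint32_t)x);
    push32(0x08048000u);         /* made-up return address pushed by CALL */

    /* callee: enter 0,0 is push EBP / mov EBP,ESP */
    push32(ebp);
    ebp = esp;

    int result = (int)load32(ebp + 8)      /* mov EAX,[EBP+8]  (x) */
               + (int)load32(ebp + 12)     /* add EAX,[EBP+12] (y) */
               - (int)load32(ebp + 16);    /* sub EAX,[EBP+16] (z) */

    /* callee: leave is mov ESP,EBP / pop EBP; then RET pops EIP */
    esp = ebp;
    ebp = pop32();
    (void)pop32();

    /* caller: add ESP,12 removes the three arguments */
    esp += 12;
    return result;
}
```

Because the first argument is pushed last, it always lands just above the saved EBP and return address, at EBP+8, no matter how many arguments follow it.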

Illustrative Examples
In this section, we give two more examples to illustrate the interface between C and assembly
language programs.

Example 21.1 An example to show parameter passing by call-by-value as well as call-by-reference.
This example shows how pointer parameters are handled. The C main function requests three
integers and passes them to the assembly procedure. The C program is given in Program 21.3.
The assembly procedure min_max, shown in Program 21.4, receives the three integer values and
two pointers to variables minimum and maximum. It finds the minimum and maximum of the
three integers and returns them to the main C function via these two pointers. The minimum value
is kept in EAX and the maximum in EDX. The code given on lines 28 to 31 in Program 21.4 stores
the return values by using the EBX register in the indirect addressing mode.
Program 21.3 An example with the C program passing pointers to the assembly program: C code
(in file hll_minmaxc.c)

/**************************************************************
 * An example to illustrate call-by-value and
 * call-by-reference parameter passing between C and
 * assembly language modules. The min_max function is
 * written in assembly language (in hll_minmaxa.asm).
 **************************************************************/
#include <stdio.h>

int main(void)
{
    int    value1, value2, value3;
    int    min, max;
    extern void min_max (int, int, int, int*, int*);

    printf("Enter number 1 = ");
    scanf("%d", &value1);
    printf("Enter number 2 = ");
    scanf("%d", &value2);
    printf("Enter number 3 = ");
    scanf("%d", &value3);

    min_max(value1, value2, value3, &min, &max);
    printf("Minimum = %d, Maximum = %d\n", min, max);
    return 0;
}

Program 21.4 An example with the C program passing pointers to the assembly program: assembly
language code (in file hll_minmaxa.asm)

 1  ;-----------------------------------------------------------
 2  ; Assembly program for the min_max function - called from
 3  ; the C program in the file hll_minmaxc.c. This function
 4  ; finds the minimum and maximum of the three integers it
 5  ; receives.
 6  ;-----------------------------------------------------------
 7  global  min_max
 8  min_max:
 9          enter   0,0
10          push    EBX             ; EBX must be preserved for the C caller
11          ; EAX keeps minimum number and EDX maximum
12          mov     EAX,[EBP+8]     ; get value 1
13          mov     EDX,[EBP+12]    ; get value 2
14          cmp     EAX,EDX         ; value 1 < value 2?
15          jl      skip1           ; if so, do nothing
16          xchg    EAX,EDX         ; else, exchange
17  skip1:
18          mov     ECX,[EBP+16]    ; get value 3
19          cmp     ECX,EAX         ; value 3 < min in EAX?
20          jl      new_min
21          cmp     ECX,EDX         ; value 3 < max in EDX?
22          jl      store_result
23          mov     EDX,ECX
24          jmp     store_result
25  new_min:
26          mov     EAX,ECX
27  store_result:
28          mov     EBX,[EBP+20]    ; EBX = &min
29          mov     [EBX],EAX
30          mov     EBX,[EBP+24]    ; EBX = &max
31          mov     [EBX],EDX
32          pop     EBX             ; restore EBX
33          leave
34          ret
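To check the comparison logic of Program 21.4 without an assembler, here is a C rendering of it. It is only a sketch: min_max_c is a hypothetical name, and the comments map each step to the corresponding instructions. The results leave through the two pointer parameters, just as the assembly version writes them through EBX.

```c
#include <assert.h>

/* Hypothetical C mirror of min_max (Program 21.4). */
void min_max_c(int v1, int v2, int v3, int *min, int *max) {
    assert(min && max);
    int lo = v1, hi = v2;              /* EAX keeps minimum, EDX maximum */
    if (!(lo < hi)) {                  /* cmp EAX,EDX / jl skip1         */
        int t = lo; lo = hi; hi = t;   /* xchg EAX,EDX                   */
    }
    if (v3 < lo)                       /* cmp ECX,EAX / jl new_min       */
        lo = v3;                       /* new_min: mov EAX,ECX           */
    else if (!(v3 < hi))               /* cmp ECX,EDX / jl store_result  */
        hi = v3;                       /* mov EDX,ECX                    */
    *min = lo;                         /* mov EBX,[EBP+20] / mov [EBX],EAX */
    *max = hi;                         /* mov EBX,[EBP+24] / mov [EBX],EDX */
}
```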

Example 21.2 Array sum example.
This example illustrates how arrays, declared in C, are accessed by assembly language procedures.
The array value is declared in the C program, as shown in Program 21.5 (line 12). The assembly
language procedure computes the sum as shown in Program 21.6. As in the other programs in this
chapter, the C program clears the parameters off the stack. We will redo this example using inline
assembly on page 439. In addition, a floating-point version of this example is given in the next
chapter.

Program 21.5 An array sum example: C code (in file hll_arraysumc.c)

 1  /**************************************************************
 2   * This program reads 10 integers into an array and calls
 3   * an assembly language program to compute the array sum.
 4   * The assembly program is in "hll_arraysuma.asm" file.
 5   **************************************************************/
 6  #include <stdio.h>
 7
 8  #define SIZE 10
 9
10  int main(void)
11  {
12      int    value[SIZE], sum, i;
13      extern int array_sum(int*, int);
14
15      printf("Input %d array values:\n", SIZE);
16      for (i = 0; i < SIZE; i++)
17          scanf("%d",&value[i]);
18
19      sum = array_sum(value,SIZE);
20      printf("Array sum = %d\n", sum);
21
22      return 0;
23  }

Program 21.6 An array sum example: assembly language code (in file hll_arraysuma.asm)

;-----------------------------------------------------------
; This procedure receives an array pointer and its size
; via the stack. It computes the array sum and returns it.
;-----------------------------------------------------------
segment .text
global  array_sum

array_sum:
        enter   0,0
        push    EBX              ; EBX must be preserved for the C caller
        mov     EDX,[EBP+8]      ; copy array pointer to EDX
        mov     ECX,[EBP+12]     ; copy array size to ECX
        sub     EBX,EBX          ; array index = 0
        sub     EAX,EAX          ; sum = 0 (EAX keeps the sum)
add_loop:
        add     EAX,[EDX+EBX*4]
        inc     EBX              ; increment array index
        cmp     EBX,ECX
        jl      add_loop
        pop     EBX              ; restore EBX
        leave
        ret

Calling C Functions from Assembly
So far, we have considered how a C function can call an assembler procedure. Sometimes it is
desirable to call a C function from an assembler procedure. This scenario often arises when we
want to avoid writing assembly language code for a complex task. Instead, a C function could
be written for those tasks. This section illustrates how we can access C functions from assembly
procedures. Essentially, the mechanism is the same: we use the stack as the communication
medium, as shown in the next example.
Example 21.3 An example to illustrate a C function call from an assembly procedure.
In previous chapters, we used simple I/O routines to facilitate input and output in our assembly
language programs. If we want to use the C functions like printf() and scanf(), we have to
pass the arguments as required by the function. In this example, we show how we can use these
two C functions to facilitate input and output of integers. This discussion can be generalized to
other types of data.
Here we compute the sum of an array passed to the array_sum assembly language procedure. This example is similar to Example 21.2, except that the C program does not read the
array values; instead, the assembly program does this by calling the printf() and scanf()
functions as shown in Program 21.8. In this program, the prompt message is declared as a string
on line 9 (including the newline). The assembly language version implements the equivalent of
the following printf statement we used in Program 21.5:
printf("Input %d array values:\n", SIZE);

Before calling the printf function on line 21, we push the array size (which is in ECX) and the
string onto the stack. The stack is cleared on line 22.
The array values are read using the read loop on lines 26 to 36. It calls the scanf function to do
the equivalent of the following statement:
scanf("%d", &value[i]);
The required arguments (array and format string pointers) are pushed onto the stack on lines 28
and 29 before calling the scanf function on line 30. Because ECX and EDX are not preserved by the
called C function, the loop saves them on the stack around each call (lines 27, 32, and 33). The array
sum is computed using the add loop on lines 41 to 45 as in Program 21.6.
Program 21.7 An example to illustrate C calls from assembly programs: C code (in file
hll_arraysum2c.c)

/**************************************************************
 * This program calls an assembly program to read the
 * array input and compute its sum. It prints the sum.
 * The assembly program is in "hll_arraysum2a.asm" file.
 **************************************************************/
#include <stdio.h>

#define SIZE 10

int main(void)
{
    int    value[SIZE];
    extern int array_sum(int*, int);

    printf("sum = %d\n", array_sum(value,SIZE));
    return 0;
}

Program 21.8 An example to illustrate C calls from assembly programs: assembly language code
(in file hll_arraysum2a.asm)

 1  ;-----------------------------------------------------------
 2  ; This procedure receives an array pointer and its size
 3  ; via the stack. It first reads the array input from the
 4  ; user and then computes the array sum.
 5  ; The sum is returned to the C program.
 6  ;-----------------------------------------------------------
 7  segment .data
 8  scan_format     db   "%d",0
 9  printf_format   db   "Input %d array values:",10,13,0
10
11  segment .text
12  global  array_sum
13  extern  printf,scanf
14
15  array_sum:
16          enter   0,0
17
18          mov     ECX,[EBP+12]     ; copy array size to ECX
19          push    ECX              ; push array size
20          push    dword printf_format
21          call    printf
22          add     ESP,8            ; clear the stack
23
24          mov     EDX,[EBP+8]      ; copy array pointer to EDX
25          mov     ECX,[EBP+12]     ; copy array size to ECX
26  read_loop:
27          push    ECX              ; save loop count
28          push    EDX              ; push array pointer
29          push    dword scan_format
30          call    scanf
31          add     ESP,4            ; clear stack of one argument
32          pop     EDX              ; restore array pointer in EDX
33          pop     ECX              ; restore loop count
34          add     EDX,4            ; update array pointer
35          dec     ECX
36          jnz     read_loop
37
38          mov     EDX,[EBP+8]      ; copy array pointer to EDX
39          mov     ECX,[EBP+12]     ; copy array size to ECX
40          sub     EAX,EAX          ; EAX = 0 (EAX keeps the sum)
41  add_loop:
42          add     EAX,[EDX]
43          add     EDX,4            ; update array pointer
44          dec     ECX
45          jnz     add_loop
46          leave
47          ret

Inline Assembly
In this section we look at writing inline assembly code. In this method, we embed assembly
language statements within the C code. We identify assembly language statements by using the
asm construct. (You can use __asm__ if asm causes a conflict, e.g., for ANSI C compatibility.)
We now have a serious problem: the gcc syntax for the assembly language statements is
different from the syntax we have been using so far. We have been using the Intel syntax (NASM,
TASM, and MASM use this syntax). The gcc compiler uses the AT&T syntax, which is used by
GNU assemblers. It is different in several aspects from the Intel syntax. But don't worry! We give
an executive summary of the differences so that you can understand the syntactical differences
without spending too much time.
The AT&T Syntax

This section gives a summary of some of the key differences from the Intel syntax.
Register Naming In the AT&T syntax, we have to prefix register names with %. For example,
the EAX register is specified as %eax.
Source and Destination Order The order of the source and destination operands is reversed in the
AT&T syntax. In this format, the source operand is on the left-hand side. For example, the instruction
        mov     eax,ebx
is written as
        movl    %ebx,%eax

Operand Size As demonstrated by the last example, the instructions specify the operand size.
The instructions are suffixed with b, w, and l for byte, word, and longword operands, respectively.
With this specification, we don't have to use byte, word, and dword to clarify the operand size
(see our discussion on page 197).
The operand size specification is not strictly necessary when a register operand implies the size;
you can let the assembler infer it. However, if you specify it, you take the guesswork out and don't
have to worry about the assembler making an incorrect guess. Here are some examples:
        movb    %bl,%al         # moves contents of bl to al
        movw    %bx,%ax         # moves contents of bx to ax
        movl    %ebx,%eax       # moves contents of ebx to eax

Immediate and Constant Operands In the AT&T syntax, immediate and constant operands are
specified by prefixing with $. Here are some examples:
        movb    $255,%al
        movl    $0xFFFFFFFF,%eax
The following statement loads the address of the C global variable total into the EAX register:
        movl    $total,%eax
This works only if total is declared as a global variable. Otherwise, we have to use the extended
asm construct that we discuss later.
Addressing To specify indirect addressing, the AT&T syntax uses parentheses (not square brackets). For example, the instruction
        mov     eax,[ebx]
is written in AT&T syntax as
        movl    (%ebx),%eax
The full 32-bit protected-mode addressing format is shown below:
        imm32(base,index,scale)
The address is computed as
        imm32 + base + index * scale
If marks is declared as a global array of integers, we can load marks[5] into the EAX register
using
        movl    $5,%ebx
        movl    marks(,%ebx,4),%eax
Similarly, if the pointer to marks is in the EAX register, we can load marks[5] into the
EAX register using
        movl    $5,%ebx
        movl    (%eax,%ebx,4),%eax
We use a similar technique in the array sum example discussed later. We have covered enough
details to work with the AT&T syntax.
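The address arithmetic behind imm32(base,index,scale) can be written out directly. The helper below is a hypothetical illustration, not part of any library; it uses 32-bit unsigned arithmetic, so the result wraps modulo 2^32 as an IA-32 address calculation does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: effective address of the AT&T operand
   imm32(base,index,scale) = imm32 + base + index * scale. */
uint32_t effective_address(uint32_t imm32, uint32_t base,
                           uint32_t index, uint32_t scale) {
    /* IA-32 can only encode scale factors 1, 2, 4, and 8 */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return imm32 + base + index * scale;
}
```

For instance, assuming marks starts at address 0x1000, the operand marks(,%ebx,4) with EBX = 5 corresponds to effective_address(0x1000, 0, 5, 4), which is 0x1014, the address of marks[5]. The scale of 4 is simply sizeof(int) on IA-32.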

Simple Inline Statements
At the basic level, introducing assembly statements is not difficult. Here is an example that increments the EAX register contents:
        asm("incl %eax");
Multiple assembly statements like these
        asm("pushl %eax");
        asm("incl %eax");
        asm("popl %eax");
can be grouped into a single compound asm statement as shown below:
        asm("pushl %eax; incl %eax; popl %eax");
If you want to add structure to this compound statement, you can write the above statement as
follows:
        asm("pushl %eax;"
            "incl %eax;"
            "popl %eax");

We have one major problem in accessing the registers as we did here: How do we know if gcc is
not keeping something useful in the register that we are using? More importantly, how do we get
access to C variables that are not global to manipulate in our inline assembly code? The answers
are provided by the extended asm statement. This is where we are going next.
Extended Inline Statements

The format of the asm statement consists of four components as shown below:
asm(assembly code
:outputs
:inputs
:clobber list);

Each component is separated by a colon (:). The last three components are optional. These four
components are described next.
Assembly Code This component consists of the assembly language statements to be inserted
into the C code. This may have a single instruction or a sequence of instructions, as discussed
in the last subsection. If the compiler must not optimize this code (for example, by moving or deleting it), add the keyword
volatile after asm (i.e., use asm volatile). The instructions typically use the operands
specified in the next two components.
Outputs This component specifies the output operands for the assembly code. The format for
specifying each operand is shown below:
"=op-constraint" (C-expression)

The first part specifies an operand constraint, and the part in parentheses is a C expression. The =
identifies that it is an output constraint. For some strange reason we have to specify = even though
we separate inputs and outputs with a colon. The following example
        "=r" (sum)
specifies that the C variable sum should be mapped to a register as indicated by r in the constraint.
Multiple operands can be specified by separating them with commas. We give some examples later.
Depending on the processor, several other choices are allowed including m (memory), i (immediate), rm (register or memory), ri (register or immediate), or g (general). The last one is
typically equivalent to rmi. You can also specify a particular register by using a, b, and so on.
The following table summarizes the register letters that specify which registers gcc may
use:
Letter   Register set
a        EAX register
b        EBX register
c        ECX register
d        EDX register
S        ESI register
D        EDI register
r        Any of the eight general registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP)
q        Any of the four data registers (EAX, EBX, ECX, EDX)
A        A 64-bit value in EAX and EDX
f        Floating-point registers
t        Top floating-point register
u        Second top floating-point register

The last three letters are used to specify floating-point registers. We discuss floating-point
operations in the next chapter.
Inputs The inputs are also specified in the same way, except for the = sign. The operands
specified in the output and input parts are assigned sequence numbers 0, 1, 2, ... starting with the
leftmost output operand. There can be a total of 10 operands, inputs and outputs combined. Thus,
9 is the maximum sequence number allowed.
In the assembly code, we can refer to the output and input operands by their sequence number
prefixed with %. In the following example
        asm("movl %1,%0"
            :"=r"(sum)          /* output */
            :"r"(number1)       /* input */
            );
the C variables sum and number1 are both mapped to registers. In the assembly code statement,
sum is identified by %0 and number1 by %1. Thus, this statement copies the value of number1
to sum.
Sometimes, an operand provides input and receives the result as well (e.g., x in x = x + y).
In this case, the operand should be in both lists. In addition, you should use its output sequence
number as its input constraint specifier. The following example clarifies what we mean.
        asm("addl %1,%0"
            :"=r"(sum)                  /* output */
            :"r"(number1),"0"(sum)      /* inputs */
            );
In this example, we want to perform sum = sum + number1. In this expression, the variable
sum provides one of the inputs and also receives the result. Thus, sum is in both lists. However,
note that the constraint specifier for it in the input list is "0", not "r".
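A compilable version of this matching-constraint example is sketched below. It assumes GCC (or a compatible compiler) on an x86 target; the preprocessor guard falls back to plain C elsewhere. add_with_asm is a hypothetical name, and the __asm__ spelling mentioned earlier is used so the code also builds in strict ISO C modes. The int operands are placed in 32-bit registers, so addl is the right instruction.

```c
#include <assert.h>

/* Hypothetical function: sum = sum + number1 via extended asm.
   The "0" constraint puts the input copy of sum in operand %0's register. */
int add_with_asm(int sum, int number1) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1,%0"
            : "=r"(sum)                 /* output: %0 */
            : "r"(number1), "0"(sum)    /* inputs: %1, and sum tied to %0 */
            : "cc");                    /* addl modifies the flags */
#else
    sum = sum + number1;                /* portable fallback */
#endif
    return sum;
}
```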
The assembly code can use specific registers by prefixing the register name with %. Since the AT&T
syntax prefixes registers with %, we end up using %% as in %%eax to refer to the EAX register.
(In the simple asm statements shown earlier, which have no operand lists, a single % is used.)
Clobber List This last component specifies the list of registers modified by the assembly instructions in the asm statement. This lets gcc know that it cannot assume that the contents of
these registers are valid after the asm statement. The compiler may use this information to reload
their values after executing the asm statement.
In case the assembly code modifies the memory, use the keyword "memory" to indicate this
fact. Even though it may not be needed, you may want to specify "cc" in the clobber list if the
flags register is modified (e.g., by an arithmetic instruction). Here is an example that includes the
clobber list:
        asm("movl %0,%%eax"
            : /* no output */
            :"r"(number1)       /* inputs */
            :"%eax"             /* clobber list */
            );
In this example, there is no output list; thus, the input operand (number1) is referred to by %0.
Since we copy the value of number1 into the EAX register, we specify EAX in the clobber list so
that gcc knows that our asm statement modifies this register.
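The clobber list can be exercised with a similar sketch, again assuming GCC on x86 with a plain C fallback elsewhere. copy_through_eax is a hypothetical function. Because the template names %%eax directly, EAX appears in the clobber list, and gcc therefore will not assign either operand to EAX.

```c
#include <assert.h>

/* Hypothetical function: copies number1 to the result through EAX. */
int copy_through_eax(int number1) {
    int result;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("movl %1,%%eax\n\t"
            "movl %%eax,%0"
            : "=r"(result)          /* %0 */
            : "r"(number1)          /* %1 */
            : "%eax");              /* EAX is overwritten by the template */
#else
    result = number1;               /* portable fallback */
#endif
    return result;
}
```

Without "%eax" in the clobber list, gcc could keep a live value in EAX across the asm statement and silently lose it.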
Inline Examples

We now give some examples to illustrate how we can write mixed-mode programs using the inline
assembly method.
Example 21.4 Our first inline assembly example.
As our first example, we rewrite the code of the example given on page 428 using inline assembly.
The inline code is given in Program 21.9. The procedure test1 is written using inline assembly
code. We use the EAX register to compute the sum as in Program 21.2 (see lines 22-24). Since
there are no output operands, we explicitly state this by the comment on line 25. The three input
operands x, y, and z, specified on line 26, are referred to in the assembly code as %0, %1, and
%2, respectively. The clobber list consists of the EAX register and the flags register ("cc"), as
the add and sub instructions modify the flags register. Since the result is available in the EAX
register, we simply return from the function.

Program 21.9 Our first inline assembly code example (in file hll_ex1_inline.c)

 1  /**************************************************************
 2   * A simple program to illustrate how mixed-mode programs
 3   * are written in C and assembly languages. This program
 4   * uses inline assembly code in the test1 function.
 5   **************************************************************/
 6  #include <stdio.h>
 7
 8  int main(void)
 9  {
10      int    x = 25, y = 70;
11      int    value;
12      extern int test1 (int, int, int);
13
14      value = test1(x, y, 5);
15      printf("Result = %d\n", value);
16
17      return 0;
18  }
19
20  int test1(int x, int y, int z)
21  {
22      asm("movl %0,%%eax;"
23          "addl %1,%%eax;"
24          "subl %2,%%eax;"
25          :/* no outputs */                /* outputs */
26          :"r"(x),"r"(y),"r"(z)            /* inputs */
27          :"cc","%eax");                   /* clobber list */
28  }
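Program 21.9 relies on the value left in EAX becoming the function's return value, but the compiler is not told about that; in C, using the value of a non-void function that ends without a return statement is undefined behavior, so the result may not survive once optimization is turned on. A more robust sketch, assuming GCC on x86 (with a plain C fallback), returns the result through an output operand instead. test1_out is a hypothetical name. The & in "=&r" marks the output as early-clobber: movl writes it before y and z have been read, so it must not share a register with them.

```c
#include <assert.h>

/* Hypothetical variant of test1 that returns x + y - z through an
   output operand rather than relying on EAX. */
int test1_out(int x, int y, int z) {
    int result;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("movl %1,%0\n\t"
            "addl %2,%0\n\t"
            "subl %3,%0"
            : "=&r"(result)              /* %0, written before inputs are done */
            : "r"(x), "r"(y), "r"(z)     /* %1, %2, %3 */
            : "cc");                     /* addl/subl modify the flags */
#else
    result = x + y - z;                  /* portable fallback */
#endif
    return result;
}
```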

Example 21.5 Array sum example—inline version.
This is the inline assembly version of the array sum example we did in Example 21.2. The program
is given in Program 21.10. In the array_sum procedure, we replace the C statement
        sum += value[i];
by the inline assembly code. The output operand specifies sum. The input operand list consists
of the array value, array index variable i, and sum. Since sum is also in the output list, we use
"0" as explained before. Since we use the add instruction, we specify "cc" in the clobber list
as in the last example.
The assembly code consists of a single addl instruction. The source operand of this add
instruction is given as (%1,%2,4). From our discussion on page 435 it is clear that this operand
refers to value[i]. The rest of the code is straightforward to follow.

Program 21.10 Inline assembly version of the array sum example (in file hll_arraysum_inline.c)

/**************************************************************
 * This program reads 10 integers into an array and calls
 * an assembly language program to compute the array sum.
 * It uses inline assembly code in array_sum function.
 **************************************************************/
#include <stdio.h>

#define SIZE 10

int main(void)
{
    int    value[SIZE], sum, i;
    int    array_sum(int*, int);

    printf("Input %d array values:\n", SIZE);
    for (i = 0; i < SIZE; i++)
        scanf("%d",&value[i]);

    sum = array_sum(value,SIZE);
    printf("Array sum = %d\n", sum);

    return 0;
}

int array_sum(int* value, int size)
{
    int    i, sum=0;
    for (i = 0; i < size; i++)
        asm("addl (%1,%2,4),%0"
            :"=r"(sum)                    /* output */
            :"r"(value),"r"(i),"0"(sum)   /* inputs */
            :"cc");                       /* clobber list */
    return(sum);
}

Example 21.6 Array sum example—inline version 2.
In the last example, we just replaced the statement
        sum += value[i];
of the array_sum function by the assembly language statement. In this example, we rewrite
the array_sum function completely in the assembly language. The rewritten function is shown
in Program 21.11. This code illustrates some of the features we have not used in the previous
examples.
As you can see from line 10, we receive the two input parameters (value and size) in
specific registers (value in EBX and size in ECX). We compute the sum directly in the EAX
register, so there are no outputs in the asm statement (see line 9). We don't use "%0" and "%1"
to refer to the input operands. Since these are mapped to specific registers, we can use the register
names in our assembly language code (see lines 5 and 6).
We use the EAX register to keep the sum. This register is initialized to zero on line 3. We use
jecxz to test if ECX is zero. This is the termination condition for the loop. This code also shows
how we can use jump instructions and labels.
Program 21.11 Another inline assembly version of the array_sum function (This function is in file
hll_arraysum_inline2.c)

 1  int array_sum(int* value, int size)
 2  {
 3      asm("   xorl %%eax,%%eax;"              /* sum = 0 */
 4          "rep1: jecxz done;"
 5          "      decl %%ecx;"
 6          "      addl (%%ebx,%%ecx,4),%%eax;"
 7          "      jmp  rep1;"
 8          "done: "
 9          : /* no outputs */
10          :"b"(value),"c"(size)               /* inputs */
11          :"%eax","cc");                      /* clobber list */
12  }
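The loop in Program 21.11 walks the array from the last element to the first, because ECX is decremented before it is used as the index. The plain C sketch below mirrors that order; array_sum_countdown is a hypothetical name, and its comments map each line to the inline instructions. Note also that the asm statement changes ECX even though size is declared only as an input; GCC's documentation warns against modifying input-only operands, so declaring it as read-write (for example, "+c"(size) in the output list) would be the safer form.

```c
#include <assert.h>

/* Hypothetical C mirror of Program 21.11's countdown loop. */
int array_sum_countdown(const int *value, int size) {
    assert(size >= 0);
    int eax = 0;                       /* xorl %%eax,%%eax  (sum = 0)    */
    unsigned ecx = (unsigned)size;     /* "c"(size) loads size into ECX  */
    while (ecx != 0) {                 /* rep1: jecxz done               */
        ecx--;                         /* decl %%ecx                     */
        eax += value[ecx];             /* addl (%%ebx,%%ecx,4),%%eax     */
    }                                  /* jmp rep1                       */
    return eax;                        /* done: the sum is in EAX        */
}
```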

Summary
We introduced the principles involved in mixed-mode programming. We discussed the main motivation for writing mixed-mode programs. This chapter focused on mixed-mode programming
involving C and the assembly language. Using the gcc compiler and NASM assembler, we
demonstrated how assembly language procedures are called from C, and vice versa. Once you
understand the principles discussed in this chapter, you can easily handle any type of mixed-mode
programming activity.

22
Floating-Point Operations
In this chapter we introduce the floating-point instructions. After giving a brief introduction to the
floating-point numbers, we describe the registers of the floating-point unit. The floating-point unit
supports several floating-point instructions. We describe a subset of these instructions in detail.
We then give a few examples to illustrate the application of these floating-point instructions. We
conclude the chapter with a summary.

Introduction
In the previous chapters, we represented numbers using integers. As you know, these numbers
cannot be used to represent fractions. We use floating-point numbers to represent fractions. For
example, in C, we use the float and double data types for the floating-point numbers.
One key characteristic of integers is that operations on these numbers are always precise, as long
as the result does not overflow. For example, when we add two integers, we always get the exact
result. In contrast, operations on floating-point numbers are subject to rounding errors. This tends
to make the result approximate, rather than precise. However, floating-point numbers have several advantages.
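The rounding just described is easy to observe from C. In the sketch below, sums_to is a hypothetical helper. The values 0.5, 0.25, and 0.75 are exact binary fractions, so their sum compares equal; 0.1, 0.2, and 0.3 have no exact binary representation, so the rounded sum of the first two does not equal the rounded value of the third.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if a + b compares equal to expected. */
int sums_to(double a, double b, double expected) {
    return a + b == expected;
}
```

This is why floating-point results are usually compared against a tolerance rather than tested for exact equality.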
Floating-point numbers can be used to represent both very small numbers and very large numbers. To achieve this, these numbers use the scientific notation to represent numbers. The number
is divided into three parts: the sign, the mantissa, and the exponent. The sign bit identifies whether
the number is positive (0) or negative (1). The magnitude is given by
$$\text{magnitude} = \text{mantissa} \times 2^{\text{exponent}}$$
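The decomposition can be sketched in C by repeatedly halving or doubling until the mantissa lands in the range [1, 2), the normalized form used by IEEE 754. fp_parts and decompose are hypothetical names; the sketch handles only nonzero finite values and ignores how the hardware actually encodes the exponent bits.

```c
#include <assert.h>

/* Hypothetical sign/mantissa/exponent triple for a double. */
typedef struct {
    int sign;          /* 0 = positive, 1 = negative */
    double mantissa;   /* normalized: 1.0 <= mantissa < 2.0 */
    int exponent;      /* magnitude = mantissa * 2^exponent */
} fp_parts;

fp_parts decompose(double v) {
    fp_parts p = { v < 0, v < 0 ? -v : v, 0 };
    assert(p.mantissa != 0.0);              /* zero has no normalized form */
    assert(p.mantissa - p.mantissa == 0.0); /* rejects infinity and NaN */
    /* Dividing or multiplying by 2 only changes the binary exponent,
       so these steps introduce no rounding. */
    while (p.mantissa >= 2.0) { p.mantissa /= 2.0; p.exponent++; }
    while (p.mantissa < 1.0)  { p.mantissa *= 2.0; p.exponent--; }
    return p;
}
```

For example, -12.0 decomposes into sign 1, mantissa 1.5, and exponent 3, since 1.5 × 2^3 = 12.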
Implementations of floating-point numbers on computer systems vary from this generic
format, usually for efficiency reasons or to conform to a standard. The Intel 32-bit processors,
like most other processors, follow the IEEE 754 floating-point standard. Such standards are useful, for example, to exchange data among several different computer systems and to write efficient
numerical software libraries.

The floating-point unit (FPU) supports three formats for floating-point numbers. Two of these
are for external use and one for internal use. The external format defines two precision types:
the single-precision format uses 32 bits while the double-precision format uses 64 bits. In C, we
use float for single-precision and double for double-precision floating-point numbers. The
internal format uses 80 bits and is referred to as the extended format. As we see in the next
section, all internal registers of the floating-point unit are 80 bits so that they can store floating-point numbers in the extended format. More details on the floating-point numbers are given in
Appendix A.
The number-crunching capability of a processor can be enhanced by using special hardware to perform floating-point operations. The 80X87 numeric coprocessors were designed to
work with the 80X86 family of processors. The 8087 coprocessor was designed for the 8086
and 8088 processors to provide extensive high-speed numeric processing capabilities. The 8087,
for example, provided about a hundredfold improvement in execution time compared to that of
an equivalent software function on the 8086 processor. The 80287 and 80387 coprocessors were
designed for use with the 80286 and 80386 processors, respectively. Starting with the 80486 processor, the floating-point unit has been integrated into the processor itself, avoiding the need for
an external numeric coprocessor.
In the remainder of this chapter, we discuss the floating-point unit organization and its instructions. Toward the end of the chapter, we give a few example programs that use the floating-point
instructions.

Floating-Point Unit Organization
The floating-point unit provides several registers, as shown in Figure 22.1. These registers are
divided into three groups: data registers, control and status registers, and pointer registers. The
last group consists of the instruction and data pointer registers, as shown in Figure 22.1. These
pointers provide support for programmed exception handlers. Since this topic is beyond the scope
of this book, we do not discuss details of these registers.
Data Registers

The FPU has eight floating-point registers to hold the floating-point operands. These registers
supply the necessary operands to the floating-point instructions. Unlike the processor's general-purpose registers such as EAX and EBX, these registers are organized as a register
stack. In addition, we can access these registers individually using ST0, ST1, and so on.
Since these registers are organized as a register stack, these names are not statically assigned.
That is, ST0 does not refer to a specific register; it refers to whichever register is acting as the
top-of-stack (TOS) register. The next register is referred to as ST1, and so on; the last register as
ST7. A 3-bit top-of-stack pointer in the status register identifies the TOS register.
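The renaming of ST0 through ST7 can be modeled in a few lines of C. This is a hypothetical software model, not a way of accessing the real registers: the eight physical registers form a circular buffer, a push decrements the 3-bit TOS value modulo 8, and STi simply names physical register (TOS + i) mod 8. Values are held as double here, whereas the hardware registers are 80 bits wide.

```c
/* Hypothetical software model of the x87 register stack:
 * eight physical registers used as a circular buffer plus a 3-bit TOS. */
struct fpu_stack {
    double reg[8];      /* physical registers (80-bit in real hardware) */
    unsigned top;       /* 3-bit top-of-stack pointer, 0..7             */
};

/* Push: decrement TOS modulo 8, then store at the new top (as fld does). */
void fpu_push(struct fpu_stack *s, double v)
{
    s->top = (s->top - 1u) & 7u;
    s->reg[s->top] = v;
}

/* Pop: increment TOS modulo 8 (as the pop variants of instructions do). */
void fpu_pop(struct fpu_stack *s)
{
    s->top = (s->top + 1u) & 7u;
}

/* STi is not a fixed register: it names physical register (TOS + i) mod 8. */
unsigned st_physical(const struct fpu_stack *s, unsigned i)
{
    return (s->top + i) & 7u;
}

double st(const struct fpu_stack *s, unsigned i)
{
    return s->reg[st_physical(s, i)];
}
```

Starting from TOS = 0 (the value after finit), the first push lands in physical register 7, so the same value is ST0 after one push and ST1 after the next.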
Each data register can hold an extended-precision floating-point number. This format uses 80
bits as opposed to the single-precision (32 bits) or double-precision (64 bits) formats. The rationale is
that these registers typically hold intermediate results and using the extended format improves the
accuracy of the final result.
The status and contents of each register are indicated by a 2-bit tag field. Since we have eight
registers, we need a total of 16 tag bits. These 16 bits are stored in the tag register (see Figure 22.1).
We discuss the tag register details a little later.

Chapter 22 • Floating-Point Operations
Figure 22.1 FPU registers. [Figure: eight 80-bit FPU data registers, ST7 down to ST0, each holding a sign bit (bit 79), an exponent (bits 78 to 64), and a mantissa (bits 63 to 0); the 16-bit control, status, and tag registers; and the instruction pointer and data pointer registers.]

Control and Status Registers

This group consists of three 16-bit registers: the control register, the status register, and the tag
register, as shown in Figure 22.1.
FPU Control Register This register gives the programmer control over several
processing options. Details about the control word are given in Figure 22.2. The least significant
six bits contain masks for the six floating-point exceptions. The PC and RC fields control precision
and rounding, respectively. Each uses two bits to specify four possible options. The options for the rounding
control are
• 00: Round to nearest (ties to even)
• 01: Round down (toward −∞)
• 10: Round up (toward +∞)
• 11: Truncate (round toward zero)
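The four RC settings differ only in which neighboring integer they pick. The C sketch below models them in software for an integer conversion such as fist would perform; round_rc is a hypothetical helper that relies only on C's truncating cast, so it needs no math library and does not touch the real control register. It assumes the value fits in a long.

```c
/* Hypothetical software model of the x87 rounding-control (RC) field
 * applied to a float-to-integer conversion.
 * rc: 0 = nearest (ties to even), 1 = down, 2 = up, 3 = truncate. */
long round_rc(double x, unsigned rc)
{
    long t = (long)x;              /* C casts truncate toward zero       */
    double frac = x - (double)t;   /* exact: signed fractional part      */

    switch (rc & 3u) {
    case 1:                        /* round down, toward -infinity       */
        return (frac < 0.0) ? t - 1 : t;
    case 2:                        /* round up, toward +infinity         */
        return (frac > 0.0) ? t + 1 : t;
    case 3:                        /* truncate, toward zero              */
        return t;
    default:                       /* round to nearest, ties to even     */
        if (frac > 0.5)   return t + 1;
        if (frac < -0.5)  return t - 1;
        if (frac == 0.5)  return (t % 2 != 0) ? t + 1 : t;
        if (frac == -0.5) return (t % 2 != 0) ? t - 1 : t;
        return t;
    }
}
```

With RC = 00, 2.5 and -2.5 both round to the even neighbor (2 and -2), while 3.5 rounds to 4; with RC = 01, -2.3 becomes -3.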

The precision control can be used to set the internal operating precision to less than the default
precision. These bits are provided for compatibility with earlier FPUs that had less precision. The
options for precision are

Figure 22.2 FPU control register details (the shaded bits are not used). [Figure: the 16-bit control word holds the rounding control field RC (bits 11 and 10), the precision control field PC (bits 9 and 8), and six exception masks in bits 5 to 0: PM = Precision, UM = Underflow, OM = Overflow, ZM = Divide-by-zero, DM = Denormalized operand, IM = Invalid operation.]

• 00: 24 bits (single precision)
• 01: Not used
• 10: 53 bits (double precision)
• 11: 64 bits (extended precision)
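The 24- and 53-bit figures above are the significand widths of single and double precision, so they bound the integers each format can hold exactly. The following C sketch (hypothetical helpers, assuming IEEE 754 float and double) checks whether an integer survives a round trip; volatile forces the value through an actual 32- or 64-bit variable instead of a wider register.

```c
#include <stdint.h>

/* Does n survive conversion to single precision (24-bit significand)? */
int exact_in_float(int64_t n)
{
    volatile float f = (float)n;     /* rounded to 24 significant bits */
    return (int64_t)f == n;
}

/* Does n survive conversion to double precision (53-bit significand)? */
int exact_in_double(int64_t n)
{
    volatile double d = (double)n;   /* rounded to 53 significant bits */
    return (int64_t)d == n;
}
```

2^24 = 16777216 fits in single precision but 16777217 does not; double precision is exact up to 2^53.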

FPU Status Register This 16-bit register keeps the status of the FPU (see Figure 22.3). The four
condition code bits (C0 to C3) are updated to reflect the result of floating-point arithmetic operations. These bits are similar to the flags register of the processor. The correspondence between
three of these four bits and the flags register is shown below:

FPU flag   CPU flag
C0         CF
C2         PF
C3         ZF

The missing C1 bit is used to indicate stack underflow/overflow (discussed below). These bits are
used for conditional branching just like the corresponding CPU flag bits.
To facilitate this branching, the status word should be copied into the CPU flags register. This
copying is a two-step process. First, we use the fstsw instruction to store the status word in the
AX register. We can then load the upper byte of the status word (AH), which contains the condition code bits, into the flags register by using the sahf instruction.
Once loaded, we can use conditional jump instructions. We demonstrate an application of this in
Example 22.1.
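The effect of fstsw AX followed by sahf can be written as plain bit manipulation. In this hypothetical C model, the high byte of the status word plays the role of AH; sahf copies AH bit 0 (C0) to CF, bit 2 (C2) to PF, and bit 6 (C3) to ZF. C1 sits in AH bit 1, which corresponds to a reserved EFLAGS bit and is not transferred.

```c
#include <stdint.h>

/* Hypothetical model of "fstsw AX" followed by "sahf". */
struct cpu_flags { int cf, pf, zf; };

struct cpu_flags sahf_from_status(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);   /* fstsw AX: high byte lands in AH */
    struct cpu_flags f;

    f.cf = (ah >> 0) & 1;   /* C0 (status bit 8)  -> CF */
    f.pf = (ah >> 2) & 1;   /* C2 (status bit 10) -> PF */
    f.zf = (ah >> 6) & 1;   /* C3 (status bit 14) -> ZF */
    return f;
}
```

Because C0 ends up in CF and C3 in ZF, the jumps designed for unsigned comparisons (jb, jbe, ja, jae, je) are the ones to use after a floating-point compare.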
The status register uses three bits to maintain the top-of-stack (TOS) information. The eight
floating-point registers are organized as a circular buffer. The TOS identifies the register that is at
the top. Like the CPU stack, this value is updated as we push and pop from the stack.
The least significant six bits give the status of the six exceptions shown in Figure 22.3. The
invalid operation exception may occur due to either a stack operation or an arithmetic operation.
The stack fault (SF) bit identifies the cause of an invalid operation. If this bit is 1, the
invalid operation was caused by a stack operation that resulted in a stack overflow or underflow; otherwise, it was caused by an arithmetic instruction encountering an invalid operand.

Figure 22.3 FPU status register details. The busy bit is included for 8087 compatibility only. [Figure: from bit 15 down to bit 0, the status word holds B (busy), C3, TOS (top-of-stack, bits 13 to 11), C2, C1, C0, ES (error status), SF (stack fault), and the six exception flags PE = Precision, UE = Underflow, OE = Overflow, ZE = Divide-by-zero, DE = Denormalized operand, IE = Invalid operation.]

We can use the C1 bit to further distinguish between stack underflow (C1 = 0) and overflow
(C1 = 1).
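Combining IE, SF, and C1 gives a three-way diagnosis of an invalid-operation exception. The C sketch below decodes a 16-bit status word this way; the enum and classify_invalid are hypothetical names, and the bit positions (IE = bit 0, SF = bit 6, C1 = bit 9) follow the status word layout of Figure 22.3.

```c
#include <stdint.h>

/* Hypothetical classification of an invalid-operation exception. */
enum invalid_cause { NO_INVALID, STACK_UNDERFLOW, STACK_OVERFLOW, INVALID_OPERAND };

enum invalid_cause classify_invalid(uint16_t sw)
{
    if (!(sw & 0x0001u))                  /* IE clear: no invalid operation     */
        return NO_INVALID;
    if (sw & 0x0040u)                     /* SF set: caused by a stack operation */
        return (sw & 0x0200u) ? STACK_OVERFLOW : STACK_UNDERFLOW;   /* C1 decides */
    return INVALID_OPERAND;               /* SF clear: arithmetic on a bad operand */
}
```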
The overflow and underflow exceptions occur if the result is too large or too small in magnitude to be represented. These
exceptions usually occur when we execute floating-point arithmetic instructions.
The precision exception indicates that the result of an operation could not be represented exactly. This, for example, would be the case when we want to represent a fraction like 1/3. This
exception indicates that we lost some accuracy in representing the result. In most cases, this loss
of accuracy is acceptable.
The divide-by-zero exception is similar to the divide error exception generated by the processor
(see our discussion on page 409). The denormal exception is generated when an arithmetic instruction attempts to operate on a denormal operand (denormals are explained later; see page 452).
Tag Register This register stores information on the status and content of the data registers. The
tag register details are shown in Figure 22.4. For each register, two bits are used to give the
following information:
• 00: valid
• 01: zero
• 10: special (invalid, infinity, or denormal)
• 11: empty

The least significant two bits are used for the ST0 register, the next two bits for the ST1
register, and so on. This tag field identifies whether the associated register is empty or not. If it is not
empty, the tag identifies the contents: valid number, zero, or some special value like infinity.
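Reading a tag is a shift and a mask. The hypothetical C helper below extracts the 2-bit tag for data register r from a 16-bit tag word, using the encoding listed above and the two-bits-per-register layout of Figure 22.4; after finit the tag word is FFFFH, meaning every register is empty.

```c
#include <stdint.h>

/* Tag encodings, two bits per data register. */
enum fpu_tag { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* Hypothetical helper: register 0 uses bits 1-0, register 7 uses bits 15-14. */
enum fpu_tag get_tag(uint16_t tag_word, unsigned r)
{
    return (enum fpu_tag)((tag_word >> (2u * (r & 7u))) & 3u);
}
```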

Floating-Point Instructions
The FPU provides several floating-point instructions for data movement, arithmetic, comparison,
and transcendental operations. In addition, there are instructions for loading frequently used constants like π as well as processor control words. In this section we look at some of these instructions.

Figure 22.4 FPU tag register details. [Figure: the 16-bit tag register holds a 2-bit tag for each data register, ST7 in bits 15 and 14, ST6 in bits 13 and 12, and so on down to ST0 in bits 1 and 0.]

Unless otherwise specified, these instructions affect the four FPU flag bits as follows: the flag
bits C0, C2, and C3 are undefined; the C1 flag is updated as described before to indicate the stack
overflow/underflow condition. Most instructions we discuss next, except the compare instructions,
affect the flags this way.
Data Movement

Data movement is supported by two types of instructions: load and store. We start our discussion
with the load instructions. The general load instruction has the following format:

    fld     src

This instruction pushes src onto the FPU stack. That is, it decrements the TOS pointer and
stores src at ST0. The src operand can be in a register or in memory. If the source operand
is in memory, it can be a single-precision (32-bit), double-precision (64-bit), or extended (80-bit)
floating-point number. Since the registers hold the numbers in the extended format, a single- or
double-precision number is converted to the extended format before storing it in ST0.
There are also instructions to push constants onto the stack. These instructions do not take any
operands. Here is a list of these instructions:

Instruction   Description
fldz          Push +0.0 onto the stack
fld1          Push +1.0 onto the stack
fldpi         Push π onto the stack
fldl2t        Push log₂ 10 onto the stack
fldl2e        Push log₂ e onto the stack
fldlg2        Push log₁₀ 2 onto the stack
fldln2        Push logₑ 2 onto the stack

To load an integer, we can use

    fild    src

The src operand must be a 16-, 32-, or 64-bit integer located in memory. The instruction converts the
integer to the extended format and pushes it onto the stack (i.e., loads it into ST0).
The store instruction has the following format:

    fst     dest

It stores the top-of-stack value at dest. The destination can be one of the FPU registers or
memory. A memory operand can be a single- or double-precision floating-point number; as usual,
the register value is converted to the destination format. (Storing in the 80-bit extended format
is possible only with the pop version, fstp, described next.) It is important to note that this
instruction does not remove the value from the stack; it simply copies its value. If you want the
value to be copied as well as popped off the stack, use the following instruction (i.e., use the suffix
p):

    fstp    dest
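The conversion performed by a single-precision store can lose information. This hypothetical C sketch models fst with a 32-bit memory operand by rounding a double to a float and reading it back; it assumes IEEE 754 float and double, and volatile keeps the compiler from skipping the narrowing step.

```c
/* Hypothetical model of "fst dword [mem]" followed by reloading the value:
 * the register value is rounded to single precision on the way out. */
double store_single(double st0)
{
    volatile float mem = (float)st0;   /* the 32-bit memory operand      */
    return mem;                        /* widening back is always exact  */
}
```

0.5 survives the store unchanged, while 0.1, which has no exact binary representation, comes back slightly different.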

There is an integer version of the store instruction. The instruction

    fist    dest

converts the value in ST0 to a signed integer and stores it at dest in memory. It uses the RC
(rounding control) field in the conversion (see the available rounding options on page 445).
The pop version of this instruction

    fistp   dest

performs the same conversion as the fist instruction; the difference is that it also pops the value
from the stack. Unlike fist, it also accepts a 64-bit integer destination.
Addition

The basic add instruction has the following format:

    fadd    src

It adds the floating-point number in memory (at src) to that in ST0 and stores the result back in
ST0. The value at src can be a single- or double-precision number. This instruction does not pop
the stack.
The two-operand version of the instruction allows us to specify the destination register:

    fadd    dest,src

In this instruction, both src and dest must be FPU registers, and one of them must be ST0. Like the last add instruction, it
does not pop the stack. For this, you have to use the pop version:

    faddp   dest,src

We can add integers using the following instruction:

    fiadd   src

Here src is a memory operand that is either a 16- or 32-bit integer.
Subtraction

The subtract instruction has a format similar to that of the add instruction. The subtract instruction

    fsub    src

performs the following operation:

    ST0 = ST0 - src

As in the add instruction, we can use the two-operand version to specify two registers. The
instruction

    fsub    dest,src

performs dest = dest - src. We can also have a pop version of this instruction:

    fsubp   dest,src
Since subtraction is not commutative (i.e., A - B is not the same as B - A), there is a reverse
subtract operation. It is reverse in the sense that operands of this instruction are reversed from the
previous subtract instructions. The instruction

    fsubr   src

performs the operation ST0 = src - ST0. Note that fsub performs ST0 - src. Now you
know why this instruction is called the reverse subtract! Like the fsub instruction, there is a
two-operand version as well as a pop version (for the pop version, use the fsubrp opcode).
If you want to subtract an integer, you can use fisub for the standard subtraction, or fisubr
for reverse subtraction. As in the fiadd instruction, the 16- or 32-bit integer must be in memory.
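The difference between the normal and reverse forms is easiest to see on a concrete stack. The C sketch below is a hypothetical model (index 0 plays the role of ST0, and the caller keeps the depth below 8) of NASM's fsubp ST1, which computes ST1 = ST1 - ST0 and pops, and fsubrp ST1, which computes ST1 = ST0 - ST1 and pops.

```c
/* Hypothetical model of the register stack: st[0] is ST0, st[1] is ST1, ... */
struct rstack { double st[8]; int depth; };

/* fld: push a value; existing entries move one slot deeper. */
void r_fld(struct rstack *s, double v)
{
    for (int i = s->depth; i > 0; i--)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
}

/* Pop ST0; every remaining entry moves one slot toward the top. */
static void r_pop(struct rstack *s)
{
    for (int i = 0; i < s->depth - 1; i++)
        s->st[i] = s->st[i + 1];
    s->depth--;
}

/* fsubp ST1:  ST1 = ST1 - ST0, then pop. */
void r_fsubp(struct rstack *s)  { s->st[1] = s->st[1] - s->st[0]; r_pop(s); }

/* fsubrp ST1: ST1 = ST0 - ST1, then pop. */
void r_fsubrp(struct rstack *s) { s->st[1] = s->st[0] - s->st[1]; r_pop(s); }
```

Pushing 10 and then 3 and executing fsubp ST1 leaves 7; the same pushes followed by fsubrp ST1 leave -7.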
Multiplication
The multiplication instruction has several versions similar to the fadd instruction. We start with
the memory operand version:

    fmul    src

where the source (src) can be a 32- or 64-bit floating-point number in memory. It multiplies this
value with that in ST0 and stores the result in ST0.
As in the add and subtract instructions, we can use the two-operand version to specify two
registers. The instruction

    fmul    dest,src

performs dest = dest * src. The pop version of this instruction is also available:

    fmulp   dest,src

There is also a special pop version that does not take any operands. The operands are assumed to
be the top two values on the stack. The instruction

    fmulp

multiplies ST1 by ST0, stores the product in ST1, and then pops the stack, leaving the product in ST0.
To multiply the contents of ST0 by an integer stored in memory, we can use

    fimul   src

The value at src can be a 16- or 32-bit integer.

Division

The division instruction has several versions like the subtract instruction. The memory version of
the divide instruction is

    fdiv    src

It divides the contents of ST0 by src and stores the result in ST0:

    ST0 = ST0/src

The src operand can be a single- or double-precision floating-point value in memory.
The two-operand version

    fdiv    dest,src

performs dest = dest/src. As in the previous instructions, both operands must be in the
floating-point registers, and one of them must be ST0. The pop version uses fdivp instead of fdiv. To divide ST0 by an
integer, use the fidiv instruction.
Like the subtract instruction, there is a reverse variation for each of these divide instructions.
The rationale is simple: A/B is not the same as B/A. For example, the reverse divide instruction

    fdivr   src

performs

    ST0 = src/ST0

As shown in this instruction, we get the reverse version by suffixing r to the opcode.
Comparison

This instruction can be used to compare two floating-point numbers. The format is

    fcom    src

It compares the value in ST0 with src and sets the FPU flags. The src operand can be in memory
or in a register. As mentioned before, the C1 bit is used to indicate a stack overflow/underflow
condition. The other three flags (C0, C2, and C3) are used to indicate the relationship as follows:

Relation          C3 C2 C0
ST0 > src          0  0  0
ST0 = src          1  0  0
ST0 < src          0  0  1
Not comparable     1  1  1
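The table can be restated as a small function. This hypothetical C helper returns the C3, C2, and C0 bits that fcom would set, packed into bits 2, 1, and 0 of the result; the "not comparable" row corresponds to a NaN operand, detected here with the standard isnan macro. Comparing against 0.0 gives the codes ftst produces.

```c
#include <math.h>

/* Hypothetical helper: condition codes of "fcom src", packed as C3:C2:C0. */
unsigned fcom_codes(double st0, double src)
{
    if (isnan(st0) || isnan(src)) return 7u;   /* 1 1 1: not comparable (unordered) */
    if (st0 > src)                return 0u;   /* 0 0 0                              */
    if (st0 < src)                return 1u;   /* 0 0 1: only C0 set                 */
    return 4u;                                 /* 1 0 0: equal, only C3 set          */
}
```

Since sahf copies C0 into CF, the ST0 < src case (code 001) makes jb jump; the unordered case sets C0 as well, so it also takes the jb branch.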

If no operand is given in the instruction, the top two values are compared (i.e., ST0 is compared
with ST1). The pop version is also available (fcomp).
The compare instruction also comes in a double-pop flavor. The instruction

    fcompp

takes no operands. It compares ST0 with ST1 and updates the FPU flags as discussed before.
In addition, it pops the two values off the stack, effectively removing the two numbers it just
compared.
To compare the top of stack with an integer value in memory, we can use

    ficom   src

The src can be a 16- or 32-bit integer. There is also the pop version of this instruction (ficomp).
A special case of comparison that is often required is the comparison with zero. The instruction

    ftst

can be used for this purpose. It takes no operands, compares the stack top value to 0.0, and updates
the FPU flags as in the fcom instruction.
The last instruction we discuss here allows us to examine the type of number. The instruction

    fxam

examines the number in ST0 and returns its sign in the C1 flag bit (0 for positive and 1 for negative).
In addition, it returns the following information in the remaining three flag bits (C0, C2, and C3):

Type          C3 C2 C0
Unsupported    0  0  0
NaN            0  0  1
Normal         0  1  0
Infinity       0  1  1
Zero           1  0  0
Empty          1  0  1
Denormal       1  1  0

The unsupported type is a format that is not part of the IEEE 754 standard. The NaN represents
Not-a-Number, as discussed in Appendix A. The meaning of Normal, Infinity, and Zero does not
require an explanation. A register that does not hold a number is identified as Empty.
Denormals are used for numbers that are very close to zero. Recall that normalized numbers
have 1.XX...XX as the mantissa. In single- and double-precision numbers, the integer bit 1 is not
explicitly stored (it is implied to save a bit). Thus, we store only XX...XX in the mantissa. This
integer bit is explicitly stored in the extended format.
When the number is very close to zero, we may underflow the exponent when we try to normalize it. Therefore, in this case, we leave the integer bit as zero. Thus, a denormal has the following
two properties:
• The (biased) exponent field is zero;
• The integer bit of the mantissa is zero as well.
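C's fpclassify macro distinguishes the same value categories that fxam reports, so the fxam table can be expressed as a mapping. In this hypothetical C sketch the C3, C2, and C0 bits are packed into bits 2, 1, and 0 of the result; the Unsupported and Empty rows have no C counterpart because they describe register encodings and states rather than double values. DBL_MIN / 2 serves as an example denormal: its exponent field is zero.

```c
#include <float.h>
#include <math.h>

/* Hypothetical mapping from fpclassify() categories to fxam's C3:C2:C0 codes. */
unsigned fxam_codes(double x)
{
    switch (fpclassify(x)) {
    case FP_NAN:       return 1u;   /* 0 0 1 */
    case FP_NORMAL:    return 2u;   /* 0 1 0 */
    case FP_INFINITE:  return 3u;   /* 0 1 1 */
    case FP_ZERO:      return 4u;   /* 1 0 0 */
    case FP_SUBNORMAL: return 6u;   /* 1 1 0: denormal */
    default:           return 0u;   /* 0 0 0: unsupported */
    }
}

/* fxam reports the sign separately, in C1: 1 for negative, including -0.0. */
int fxam_sign(double x)
{
    return signbit(x) != 0;
}
```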
Miscellaneous

We now give details on some of the remaining floating-point instructions. Note that there are
several other instructions that are not covered in our discussion here. The NASM manual gives a
complete list of the floating-point instructions implemented in NASM.
The instruction

    fchs

changes the sign of the number in ST0. We use this instruction in our quadratic roots example to
invert the sign. A related instruction

    fabs

replaces the value in ST0 with its absolute value.
Two instructions are available for loading and storing the control word. The instruction

    fldcw   src

loads the 16-bit value in memory at src into the FPU control word register. To store the control
word, we use

    fstcw   dest

Following this instruction, all four flag bits (C0 to C3) are undefined.
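Because fldcw and fstcw move the whole 16-bit word, changing one field is normally done as fstcw, modify the copy, then fldcw. The hypothetical C helpers below show the bit arithmetic for the RC field (bits 11 and 10) and the PC field (bits 9 and 8) from Figure 22.2. The example value 037FH is the control word set by finit: all exceptions masked, 64-bit precision, round to nearest.

```c
#include <stdint.h>

#define CW_RC_SHIFT 10   /* rounding control: bits 11-10 */
#define CW_PC_SHIFT 8    /* precision control: bits 9-8  */

/* Replace the RC field of a control word (hypothetical helper). */
uint16_t cw_set_rc(uint16_t cw, unsigned rc)
{
    return (uint16_t)((cw & ~(3u << CW_RC_SHIFT)) | ((rc & 3u) << CW_RC_SHIFT));
}

/* Replace the PC field of a control word (hypothetical helper). */
uint16_t cw_set_pc(uint16_t cw, unsigned pc)
{
    return (uint16_t)((cw & ~(3u << CW_PC_SHIFT)) | ((pc & 3u) << CW_PC_SHIFT));
}

unsigned cw_get_rc(uint16_t cw) { return (cw >> CW_RC_SHIFT) & 3u; }
unsigned cw_get_pc(uint16_t cw) { return (cw >> CW_PC_SHIFT) & 3u; }
```

Setting RC to 11 (truncate) in 037FH gives 0F7FH. Compiled x87 code commonly loads such a value with fldcw before an fistp so that a C float-to-int cast truncates, then restores the saved word.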
To store the status word, we can use the instruction

    fstsw   dest

It stores the status word at dest. Note that dest can be a 16-bit memory location or the AX
register. Combining this instruction with sahf, which copies AH into the processor flags register,
gives us the ability to use the conditional jump instructions. We use these two instructions in the
quadratic roots example given later. After executing this instruction, all four flag bits (C0 to C3)
are undefined.

Our First Program
All the examples in this chapter follow the mixed-mode programs discussed in the last chapter.
Thus, you need to understand the material presented in the last chapter in order to follow these
examples.
As our introductory floating-point example, we write an assembly language program to compute the sum of an array of doubles. We have done an integer version of this program in the last
chapter (see Example 21.2 on page 430). Here we use a separate assembly language module. In
the next section, we will redo this example using the inline assembly method.
The C program, shown in Program 22.1, takes care of the user interface. It requests values to
fill the array and then calls the array_fsum assembly language procedure to compute the sum.
The array_fsum procedure is given in Program 22.2. It copies the array pointer to EDX
(line 11) and the array size to ECX (line 12). We initialize ST0 to zero by using the fldz instruction on line 13. The add loop consists of the code on lines 14-18. We use the jecxz instruction
to exit the loop when the index (ECX) is zero at the start of the loop.
We use the fadd instruction to compute the sum in ST0. Also note that the based-indexed
addressing mode with a scale factor of 8 is used to read the array elements (line 17), since each double occupies 8 bytes. Since C
programs expect floating-point return values in ST0, we simply return from the procedure as the
result is already in ST0.
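For comparison, here is the same loop written in C. This is a sketch for illustration (array_fsum_c is a hypothetical name chosen so it does not clash with the assembly procedure): like the assembly code, it starts the index at size, tests it before each iteration as jecxz does, and decrements it before the access, so the elements are added from last to first.

```c
/* Hypothetical C counterpart of array_fsum; the comments name the matching instructions. */
double array_fsum_c(const double *value, int size)
{
    double sum = 0.0;          /* fldz                              */
    int i = size;              /* mov ECX,[EBP+12]                  */

    while (i > 0) {            /* jecxz done (exit when index is 0) */
        i--;                   /* dec ECX                           */
        sum += value[i];       /* fadd qword[EDX+ECX*8]             */
    }
    return sum;                /* result returned in ST0            */
}
```

Because floating-point addition is not associative, adding in this order can give a result that differs in the last bits from a first-to-last sum.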


Program 22.1 Array sum program—C program
/**********************************************************
 * This program reads SIZE values into an array and calls
 * an assembly language program to compute the array sum.
 * The assembly program is in the file "arrayfsuma.asm".
 **********************************************************/

#include <stdio.h>
#define SIZE 10     /* array size (assumed value) */

int main(void)
{
    double value[SIZE];
    int    i;
    extern double array_fsum(double*, int);

    printf("Input %d array values:\n", SIZE);
    for (i = 0; i < SIZE; i++)
        scanf("%lf", &value[i]);
    printf("Array sum = %lf\n", array_fsum(value, SIZE));
    return 0;
}

Program 22.2 Array sum program—assembly language procedure
1:  ;-----------------------------------------------------------
2:  ; This procedure receives an array pointer and its size
3:  ; via the stack. It computes the array sum and returns
4:  ; it via ST0.
5:  ;-----------------------------------------------------------
6:  segment .text
7:  global array_fsum
8:
9:  array_fsum:
10:     enter   0,0
11:     mov     EDX,[EBP+8]        ; copy array pointer
12:     mov     ECX,[EBP+12]       ; copy array size
13:     fldz                       ; ST0 = 0 (sum is in ST0)
14: add_loop:
15:     jecxz   done
16:     dec     ECX                ; update the array index
17:     fadd    qword[EDX+ECX*8]   ; ST0 = ST0 + array_element
18:     jmp     add_loop
19: done:
20:     leave
21:     ret

Illustrative Examples
To further illustrate the application of the floating-point instructions, we give a couple of examples
here. The first example uses separate assembly language modules as in the last example. The
second example uses inline assembly code.
Example 22.1 Quadratic equation solution.
In this example, we find the roots of the quadratic equation

$$ax^2 + bx + c = 0.$$

The two roots are defined as follows:

$$\text{root1} = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad \text{root2} = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$

The roots are real if $b^2 \ge 4ac$, and complex otherwise.
As in the last example, our C program takes care of the user interface (see Program 22.3). It requests the user to input constants a, b, and c. It then passes these three values to the quad_roots
assembly language procedure along with two pointers to root1 and root2. This procedure returns 0 if the roots are not real; otherwise it returns 1. If the roots are real, the two roots are
returned in root1 and root2.
The assembly language procedure, shown in Program 22.4, receives five arguments: three
constants and two pointers to return the two roots. These five arguments are assigned convenient
labels on lines 7-11. The comments included in the code make it easy to follow the body of the
procedure. In the comment on each line, we show the contents of the FPU stack, with the leftmost value at the
top of the stack.
We use the ftst instruction to see if $(b^2 - 4ac)$ is negative (line 30). We move the FPU flag
bits to AX and then to the processor flags register using the fstsw and sahf instructions on
lines 31 and 32. Once these bits are copied into the flags register, we can use the conditional jump
instruction jb (line 33), which jumps when CF (the copy of C0) is set, that is, when the value is negative. The rest of the procedure body is straightforward to follow.
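The stack comments in the first part of quad_roots can be checked by doing the same arithmetic in C. In this hypothetical sketch (quad_discriminant is an illustrative name), each statement corresponds to a group of FPU instructions, and the return value mirrors the ftst, fstsw, sahf, jb sequence: the jump to no_real_roots is taken when C0 is set, that is, when b*b-4ac is negative or the comparison is unordered (a NaN), so real roots are reported only when the discriminant is at least zero.

```c
/* Hypothetical C trace of quad_roots up to the jb instruction.
 * Returns 1 when real roots exist; stores 2a and b*b-4ac through the pointers. */
int quad_discriminant(double a, double b, double c,
                      double *two_a, double *disc)
{
    double t2a = a + a;          /* fld a; fadd ST0               -> 2a       */
    double t = c * a;            /* fld a; fld c; fmulp ST1       -> ac       */
    t = t + t;                   /* fadd ST0                      -> 2ac      */
    t = t + t;                   /* fadd ST0                      -> 4ac      */
    t = -t;                      /* fchs                          -> -4ac     */
    double d = b * b + t;        /* fld b; fld b; fmulp; faddp    -> b*b-4ac  */

    *two_a = t2a;
    *disc = d;
    return d >= 0.0;             /* ftst/sahf/jb: false for negative or NaN   */
}
```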
Program 22.3 Quadratic equation solution—C program
/**********************************************************
 * This program reads three constants (a, b, c) and calls
 * an assembly language program to compute the roots of
 * the quadratic equation.
 * The assembly program is in the file "quada.asm".
 **********************************************************/

#include <stdio.h>

int main(void)
{
    double a, b, c, root1, root2;
    extern int quad_roots(double, double, double,
                          double*, double*);

    printf("Enter quad constants a, b, c: ");
    scanf("%lf %lf %lf", &a, &b, &c);

    if (quad_roots(a, b, c, &root1, &root2))
        printf("Root1 = %lf and root2 = %lf\n",
               root1, root2);
    else
        printf("There are no real roots.\n");
    return 0;
}

Program 22.4 Quadratic equation solution—assembly language procedure
1:  ;-----------------------------------------------------------
2:  ; It receives three constants a, b, c and pointers to two
3:  ; roots via the stack. It computes the two real roots if
4:  ; they exist and returns them in root1 & root2. In this
5:  ; case, EAX = 1. If no real roots exist, EAX = 0.
6:  ;-----------------------------------------------------------
7:  %define  a      qword[EBP+8]
8:  %define  b      qword[EBP+16]
9:  %define  c      qword[EBP+24]
10: %define  root1  dword[EBP+32]
11: %define  root2  dword[EBP+36]
12:
13: segment .text
14: global quad_roots
15:
16: quad_roots:
17:     enter   0,0
18:     fld     a              ; a
19:     fadd    ST0            ; 2a
20:     fld     a              ; a,2a
21:     fld     c              ; c,a,2a
22:     fmulp   ST1            ; ac,2a
23:     fadd    ST0            ; 2ac,2a
24:     fadd    ST0            ; 4ac,2a
25:     fchs                   ; -4ac,2a
26:     fld     b              ; b,-4ac,2a
27:     fld     b              ; b,b,-4ac,2a
28:     fmulp   ST1            ; b*b,-4ac,2a
29:     faddp   ST1            ; b*b-4ac,2a
30:     ftst                   ; compare (b*b-4ac) with 0
31:     fstsw   AX             ; store status word in AX
32:     sahf
33:     jb      no_real_roots
34:     fsqrt                  ; sqrt(b*b-4ac),2a
35:     fld     b              ; b,sqrt(b*b-4ac),2a
36:     fchs                   ; -b,sqrt(b*b-4ac),2a
37:     fadd    ST1            ; -b+sqrt(b*b-4ac),sqrt(b*b-4ac),2a
38:     fdiv    ST2            ; (-b+sqrt(b*b-4ac))/2a,sqrt(b*b-4ac),2a
39:     mov     EAX,root1
40:     fstp    qword[EAX]     ; store root1
41:     fchs                   ; -sqrt(b*b-4ac),2a
42:     fld     b              ; b,-sqrt(b*b-4ac),2a
43:     fsubp   ST1            ; -b-sqrt(b*b-4ac),2a
44:     fdivrp  ST1            ; (-b-sqrt(b*b-4ac))/2a
45:     mov     EAX,root2
46:     fstp    qword[EAX]     ; store root2
47:     mov     EAX,1          ; real roots exist
48:     jmp     short done
49: no_real_roots:
50:     fstp    ST0            ; discard b*b-4ac
51:     fstp    ST0            ; discard 2a (leave the FPU stack empty)
52:     sub     EAX,EAX        ; EAX = 0 (no real roots)
53: done:
54:     leave
55:     ret

Example 22.2 Array sum example—inline version.
In this example we rewrite the code for the array_fsum procedure using the inline assembly
method. Remember that when we use this method, we have to use AT&T syntax. In this syntax,
the operand size is explicitly indicated by suffixing a letter to the opcode. For the floating-point
instructions, the following suffixes are used:

s    Single-precision
l    Double-precision
t    Extended-precision

The inline assembly code, shown in Program 22.5, is similar to that in Program 22.2. You
will notice that on line 10 we use the =t output specifier to indicate that variable sum is mapped to the
top floating-point register (see page 437 for a discussion of these specifiers). Since we map value to
EBX (line 11) and size to ECX (line 10), we use these registers in the assembly language code to access
the array elements (see line 7). Because the loop decrements ECX, size is given as an input/output
operand ("+c") rather than as a plain input: GCC assumes that registers used only as inputs are not
modified by the assembly code. The rest of the code is straightforward to follow.

Program 22.5 Array sum example—inline version

1:  double array_fsum(double* value, int size)
2:  {
3:      double sum;
4:      asm("     fldz;                  "    /* sum = 0 */
5:          "add_loop: jecxz done;       "
6:          "     decl   %%ecx;          "
7:          "     faddl  (%%ebx,%%ecx,8);"
8:          "     jmp    add_loop;       "
9:          "done:                       "
10:         :"=t"(sum), "+c"(size)           /* outputs */
11:         :"b"(value)                      /* input */
12:         :"cc");                          /* clobber list */
13:     return(sum);
14: }

Summary
We presented a brief description of the floating-point unit organization. Specifically, we concentrated on the registers provided by the FPU. It provides eight floating-point data registers that are
organized as a stack. The floating-point instructions include several arithmetic and nonarithmetic
instructions. We discussed some of these instructions. Finally, we presented some examples that
used the floating-point instructions discussed.

APPENDICES

A
Number Systems
This appendix introduces background material on various number systems and representations.
We start the appendix with a discussion of various number systems, including the binary and
hexadecimal systems. When we use multiple number systems, we need to convert numbers from
one system to another. We present details on how such number conversions are done. We then give
details on integer representations. We cover both unsigned and signed integer representations. We
close the appendix with a discussion of floating-point numbers.

Positional Number Systems
The number systems that we discuss here are based on positional number systems. The decimal
number system that we are already familiar with is an example of a positional number system. In
contrast, the Roman numeral system is not a positional number system.
Every positional number system has a radix, or base, and an alphabet. The base is a positive
integer greater than 1. For example, the decimal system is a base-10 system. The number of symbols in the
alphabet is equal to the base of the number system. The alphabet of the decimal system is 0
through 9, a total of 10 symbols or digits.
In this appendix, we discuss four number systems that are relevant in the context of computer
systems and programming. These are the decimal (base-10), binary (base-2), octal (base-8), and
hexadecimal (base-16) number systems. Our intention in including the familiar decimal system is
to use it to explain some fundamental concepts of positional number systems.
Computers internally use the binary system. The remaining two number systems—octal and
hexadecimal—are used mainly for convenience to write a binary number even though they are
number systems on their own. We would have ended up using these number systems if we had 8
or 16 fingers instead of 10.
In a positional number system, a sequence of digits is used to represent a number. Each digit in
this sequence should be a symbol in the alphabet. There is a weight associated with each position.
If we count position numbers from right to left starting with zero, the weight of position n in a base
b number system is $b^n$. For example, the number 579 in the decimal system is actually interpreted
as

$$5 \times 10^2 + 7 \times 10^1 + 9 \times 10^0.$$

(Of course, $10^0 = 1$.) In other words, 9 is in the units place, 7 in the 10s place, and 5 in the 100s place.
More generally, a number in the base b number system is written as

$$d_n d_{n-1} \ldots d_1 d_0,$$

where $d_0$ represents the least significant digit (LSD) and $d_n$ represents the most significant digit
(MSD). This sequence represents the value

$$d_n b^n + d_{n-1} b^{n-1} + \cdots + d_1 b^1 + d_0 b^0. \tag{A.1}$$
Each digit $d_i$ in the string can be in the range $0 \le d_i \le (b - 1)$. When we use a number system
with $b \le 10$, we use the first b decimal digits. For example, the binary system uses 0 and 1 as
its alphabet. For number systems with $b > 10$, the initial letters of the English alphabet are used
to represent digits greater than 9. For example, the alphabet of the hexadecimal system, whose
base is 16, is 0 through 9 and A through F, a total of 16 symbols representing the digits of the
hexadecimal system. We treat lowercase and uppercase letters used in a number system such as
the hexadecimal system as equivalent.
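Equation (A.1) translates directly into code. The C sketch below (positional_value is a hypothetical name) evaluates a digit string in any base from 2 to 16 using Horner's rule, which rearranges (A.1) as (...((d_n b + d_{n-1}) b + ...) b + d_0 so that each digit costs one multiply and one add. Letters A to F in either case stand for 10 to 15, and a symbol outside the alphabet of the base is rejected.

```c
#include <ctype.h>

/* Hypothetical helper: value of a digit string in base 2..16, or -1 if a
 * symbol is not in the alphabet of that base. Assumes the result fits in a long. */
long positional_value(const char *digits, int base)
{
    long value = 0;

    for (; *digits != '\0'; digits++) {
        int ch = toupper((unsigned char)*digits);   /* 'a' and 'A' are equivalent */
        int d;

        if (isdigit(ch))
            d = ch - '0';
        else if (ch >= 'A' && ch <= 'F')
            d = ch - 'A' + 10;
        else
            return -1;
        if (d >= base)              /* e.g., 2 is not a binary digit */
            return -1;
        value = value * base + d;   /* Horner step */
    }
    return value;
}
```

positional_value("100", 8) returns 64, and positional_value("1021", 2) returns -1 because 2 is not a binary digit.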
The number of different values that can be represented using n digits in a base b system is $b^n$.
Consequently, since we start counting from 0, the largest number that can be represented using n
digits is $(b^n - 1)$. This number is written as

$$\underbrace{(b-1)(b-1)\cdots(b-1)(b-1)}_{\text{total of } n \text{ digits}}.$$

The minimum number of digits (i.e., the length of a number) required to represent X different
values is given by $\lceil \log_b X \rceil$, where $\lceil\,\rceil$ represents the ceiling function. Note that $\lceil m \rceil$ represents
the smallest integer that is greater than or equal to m.
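The ceiling of log_b X can be computed without floating point by finding the smallest n with b^n >= X. This hypothetical C helper does that with repeated multiplication, assuming X >= 1, b >= 2, and that b^n does not overflow an unsigned long.

```c
/* Hypothetical helper: minimum number of base-b digits needed for x distinct
 * values, i.e., the smallest n such that b^n >= x. */
unsigned min_digits(unsigned long x, unsigned long b)
{
    unsigned n = 0;
    unsigned long reach = 1;     /* b^n: distinct values with n digits */

    while (reach < x) {
        reach *= b;
        n++;
    }
    return n;
}
```

min_digits(256, 2) is 8 and min_digits(256, 16) is 2; integer arithmetic avoids the rounding that a floating-point log() could introduce at exact powers.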
Notation The commonality in the alphabet of several number systems gives rise to confusion.
For example, if we write 100 without specifying the number system in which it is expressed,
different interpretations can lead to assigning different values, as shown below:

Number   Number system   Decimal value
100      binary          4
100      decimal         100
100      octal           64
100      hexadecimal     256

Thus, it is important to specify the number system (i.e., specify the base). One common notation is
to append a single letter, uppercase or lowercase, to the number to specify the number system.
For example, D is used for decimal, B for binary, Q for octal, and H for hexadecimal number
systems. Using this notation, 10110111B is a binary number and 2BA9H is a hexadecimal number.
Some assemblers use prefix 0x for hexadecimal and prefix 0 for octal.
Decimal Number System We use the decimal number system in everyday life. This is a base-10 system,
presumably because we have 10 fingers and toes to count. The alphabet consists of 10
symbols, digits 0 through 9.

Appendix A • Number Systems


Binary Number System The binary system is a base-2 number system that is used by computers
for internal representation. The alphabet consists of two digits, 0 and 1. Each binary digit is called
a bit (standing for binary digit). Thus, 1021 is not a valid binary number. In the binary system,
using n bits, we can represent numbers from 0 through $(2^n - 1)$ for a total of $2^n$ different values.
Octal Number System This is a base-8 number system with the alphabet consisting of digits
0 through 7. Thus, 181 is not a valid octal number. The octal numbers are often used to express
binary numbers in a compact way. For example, we need 8 bits to represent 256 different values.
The same range of numbers can be represented in the octal system by using only 3 digits.
For example, the number 230Q is written in the binary system as 10011000B, which is difficult
to read and error prone. In general, we can reduce the length by a factor of 3. As we show later,
it is straightforward to go back to the binary equivalent, which is not the case with the decimal
system.
Hexadecimal Number System This is a base-16 number system. The alphabet consists of digits
0 through 9 and letters A through F. In this text, we use capital letters consistently, even though
lowercase and uppercase letters can be used interchangeably. For example, FEED is a valid hexadecimal number, whereas GEFF is not.
The main use of this number system is to conveniently represent long binary numbers. The
length of a binary number expressed in the hexadecimal system can be reduced by a factor of
4. Consider the previous example again. The binary number 10011000B can be represented as
98H. Debuggers, for example, display information (addresses, data, and so on) in hexadecimal
representation.

Conversion to Decimal
When we are dealing with several number systems, there is often a need to convert numbers from
one system to another. Let us first look at how a number expressed in the base-b system can
be converted to the decimal system. To do this conversion, we merely perform the arithmetic
calculations of Equation A.1 given on page 462; that is, multiply each digit by its weight, and add
the results. Here is an example.
Example A.1 Conversion from binary to decimal.
Convert the binary number 10100111B into its equivalent in the decimal system.
$10100111B = 1 \cdot 2^7 + 0 \cdot 2^6 + 1 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 167D$
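Example A.1 multiplies each digit by its power of 2. An equivalent way to evaluate Equation A.1, and the one assumed in this C sketch, is Horner's rule: multiply the running total by b before adding each new digit, so no powers need to be computed separately. The function name to_decimal is hypothetical, the sketch assumes ASCII letter codes, and it does not check for overflow.

```c
#include <ctype.h>
#include <stdint.h>

/* Evaluates a digit string in base b (2 <= b <= 36) using Horner's rule:
   ((d_n*b + d_{n-1})*b + ...)*b + d_0, which equals Equation A.1.
   Letters A-Z (either case) stand for digit values 10-35.
   Returns -1 for a symbol outside the base-b alphabet; no overflow check. */
static int64_t to_decimal(const char *s, int b)
{
    int64_t value = 0;
    for (; *s != '\0'; s++) {
        int c = toupper((unsigned char)*s);
        int d;
        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'A' && c <= 'Z')
            d = c - 'A' + 10;          /* assumes ASCII letters are contiguous */
        else
            return -1;
        if (d >= b)
            return -1;                 /* e.g. 2 in binary, 8 in octal, G in hex */
        value = value * b + d;         /* shift earlier digits one place left */
    }
    return value;
}
```

For instance, to_decimal("10100111", 2) should give 167, while to_decimal("1021", 2) reports -1 because 2 is not a binary digit.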

Conversion from Decimal
There is a simple method that allows conversions from the decimal to a target number system. The
procedure is as follows:


Divide the decimal number by the base of the target number system and
keep track of the quotient and remainder. Repeatedly divide the successive
quotients while keeping track of the remainders generated until the quotient
is zero. The remainders generated during the process, written in the reverse
order of generation from left to right, form the equivalent number in the
target system.
Let us look at an example now.
Example A.2 Conversion from decimal to binary.
Convert the decimal number 167 into its equivalent binary number.

Division   Quotient   Remainder
167/2      83         1
83/2       41         1
41/2       20         1
20/2       10         0
10/2       5          0
5/2        2          1
2/2        1          0
1/2        0          1

The desired binary number can be obtained by writing the remainders generated in the reverse
order from left to right. For this example, the binary number is 10100111B. This agrees with the
result of Example A.1.
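The repeated-division procedure maps directly onto a loop. In this C sketch (an illustration with a hypothetical function name, from_decimal, assuming inputs of at most 64 bits and target bases up to 16), each remainder is stored from the end of the buffer toward the front, which produces the reverse order of generation without a separate reversal step.

```c
#include <stdint.h>

/* Converts n to base b (2 <= b <= 16) by repeated division. Each remainder
   is the next digit from the right, so digits are written from the end of
   buf toward the front; the returned pointer is the most significant digit.
   buf must hold at least 65 chars (64 binary digits plus the terminator). */
static const char *from_decimal(uint64_t n, unsigned b, char buf[65])
{
    static const char digit_chars[] = "0123456789ABCDEF";
    char *p = buf + 64;
    *p = '\0';
    do {
        *--p = digit_chars[n % b];   /* remainder = next digit, LSD first */
        n /= b;                      /* the quotient is divided again */
    } while (n != 0);                /* stop once the quotient is zero */
    return p;
}
```

Calling from_decimal(167, 2, buf) retraces the eight divisions of Example A.2 and yields the string 10100111.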

Binary/Octal/Hexadecimal Conversion
Conversion among binary, octal, and hexadecimal number systems is relatively easy and more
straightforward. Conversion from binary to octal involves converting three bits at a time, whereas
binary to hexadecimal conversion requires converting four bits at a time.
Binary/Octal Conversion To convert a binary number into its equivalent octal number, form
3-bit groups starting from the right. Add extra 0s at the left-hand side of the binary number if the
number of bits is not a multiple of 3. Then replace each group of 3 bits by its equivalent octal
digit. Why three-bit groups? Simply because $2^3 = 8$. Here is an example.
Example A.3 Conversion from binary to octal.
The following examples illustrate this conversion process.
1000101B = 001 000 101B = 105Q.
10100111B = 010 100 111B = 247Q.
Note that we have added leading 0s so that the number of bits is 9. Adding 0s on
the left-hand side does not change the value of a number. For example, in the decimal system, 35
and 0035 represent the same value.
We can use the reverse process to convert numbers from octal to binary. For each octal digit,
write the equivalent 3 bits. You should write exactly 3 bits for each octal digit even if there are
leading 0s. For example, for octal digit 0, write the three bits 000.
Example A.4 Conversion from octal to binary.
The following two examples illustrate conversion from octal to binary:
105Q = 001 000 101B,
247Q = 010 100 111B.
If you want an 8-bit binary number, throw away the leading 0 in the binary number.
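Because $2^3 = 8$, the grouping step needs no division: shifting right by 3 exposes the next group, and masking with 7 (111B) extracts it. The C sketch below is illustrative only; the names binary_to_octal and octal_to_binary are our own, and octal_to_binary assumes its input contains only the digits 0 through 7.

```c
#include <stdint.h>

/* Binary -> octal: octal digit i is the 3-bit group (x >> 3*i) & 7,
   since 2^3 = 8. Digits are written right to left; buf needs 23 chars
   (22 octal digits cover 64 bits, plus the terminator). */
static const char *binary_to_octal(uint64_t x, char buf[23])
{
    char *p = buf + 22;
    *p = '\0';
    do {
        *--p = (char)('0' + (x & 7u));   /* lowest remaining 3-bit group */
        x >>= 3;                          /* move on to the next group */
    } while (x != 0);
    return p;
}

/* Octal -> binary: each octal digit contributes exactly 3 bits.
   Assumes s holds only the digits 0-7. */
static uint64_t octal_to_binary(const char *s)
{
    uint64_t x = 0;
    for (; *s != '\0'; s++)
        x = (x << 3) | (uint64_t)(*s - '0');
    return x;
}
```

Since C (before C23) has no binary literals, 1000101B from Example A.3 would be passed as 69 or 0x45; binary_to_octal then yields 105.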

Binary/Hexadecimal Conversion The process for conversion from binary to hexadecimal is
similar except that we use 4-bit groups instead of 3-bit groups because $2^4 = 16$. For each group
of 4 bits, replace it by the equivalent hexadecimal digit. If the number of bits is not a multiple of
4, pad 0s at the left. Here is an example.
Example A.5 Binary to hexadecimal conversion.
Convert the binary number 1101011111 into its equivalent hexadecimal number.
1101011111B = 0011 0101 1111B = 35FH.
As in the binary to octal example, we have added two 0s on the left to make the total number of
bits a multiple of 4 (i.e., 12).
The process can be reversed to convert from hexadecimal to binary. Each hex digit should be
replaced by exactly four binary bits that represent its value. An example follows:
Example A.6 Hex to binary conversion.
Convert the hexadecimal number B01D into its equivalent binary number.
B01DH = 1011 0000 0001 1101B.
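The same shift-and-mask idea works for hexadecimal with 4-bit groups and the mask 0xF (1111B). Again this is a hypothetical C sketch (binary_to_hex and hex_to_binary are our names) rather than code from the book; hex_to_binary simply returns 0 on an invalid character, which is a simplification.

```c
#include <stdint.h>

/* Binary -> hexadecimal: hex digit i is the 4-bit group (x >> 4*i) & 0xF,
   since 2^4 = 16. buf needs 17 chars (16 hex digits cover 64 bits). */
static const char *binary_to_hex(uint64_t x, char buf[17])
{
    static const char hex_chars[] = "0123456789ABCDEF";
    char *p = buf + 16;
    *p = '\0';
    do {
        *--p = hex_chars[x & 0xFu];   /* lowest remaining 4-bit group */
        x >>= 4;
    } while (x != 0);
    return p;
}

/* Hexadecimal -> binary: each hex digit is replaced by exactly 4 bits.
   Accepts 0-9, A-F, a-f; returns 0 if any other character appears. */
static uint64_t hex_to_binary(const char *s)
{
    uint64_t x = 0;
    for (; *s != '\0'; s++) {
        unsigned d;
        if (*s >= '0' && *s <= '9')      d = (unsigned)(*s - '0');
        else if (*s >= 'A' && *s <= 'F') d = (unsigned)(*s - 'A' + 10);
        else if (*s >= 'a' && *s <= 'f') d = (unsigned)(*s - 'a' + 10);
        else return 0;
        x = (x << 4) | d;             /* append exactly 4 bits */
    }
    return x;
}
```

With these helpers, 1101011111B (863) converts to 35F as in Example A.5, and hex_to_binary("B01D") gives the bit pattern of Example A.6.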


Unsigned Integers
Now that you are familiar with different number systems, let us turn our attention to how integers
(numbers with no fractional part) are represented internally in computers. Of course, we know that
the binary number system is used internally. Still, there are a number of other details that need to
be sorted out before we have a workable internal number representation scheme.
We begin our discussion by considering how unsigned numbers are represented using a fixed
number of bits. We then proceed to discuss the representation for signed numbers in the next
section.
The most natural way to represent unsigned (i.e., nonnegative) numbers is to use the equivalent
binary representation. As discussed before, a binary number with n bits can represent $2^n$ different
values, and the range of the numbers is from 0 to $(2^n - 1)$. Padding of 0s on the left can be used
to make the binary conversion of a decimal number equal exactly N bits. For example, we can
represent 16D as 10000B using 5 bits. However, this can be extended to a byte (i.e., N = 8) as
00010000B or to 16 bits as 0000000000010000B. This process is called zero extension and
is suitable for unsigned numbers.
A problem arises if the number of bits required to represent an integer in binary is more than
the N bits we have. Clearly, such numbers are outside the range of numbers that can be represented
using N bits. Recall that using N bits, we can represent any integer X such that $0 \le X \le 2^N - 1$.
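In C, converting an unsigned value to a wider unsigned type performs exactly this zero extension; on IA-32 the corresponding instruction is movzx. The small sketch below (hypothetical function names, with N limited to 1 through 63 so that the shift is well defined) also checks the N-bit range $0 \le X \le 2^N - 1$.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x lies in the N-bit unsigned range 0 .. 2^N - 1.
   N is limited to 1..63 here so that the shift below is well defined. */
static bool fits_unsigned(uint64_t x, unsigned n)
{
    return x <= (UINT64_C(1) << n) - 1;
}

/* Zero extension: converting a narrower unsigned value to a wider unsigned
   type in C fills the new high-order bits with 0s. */
static uint16_t zero_extend_byte(uint8_t b)
{
    return (uint16_t)b;      /* 00010000B -> 0000000000010000B */
}
```

So fits_unsigned(16, 5) holds (16D = 10000B fits in 5 bits), whereas fits_unsigned(32, 5) does not.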

Signed Integers
There are several ways in which signed numbers can be represented. These include
• Signed magnitude,
• Excess-M,
• 1's complement, and
• 2's complement.
Signed Magnitude Representation

In signed magnitude representation, one bit is reserved to represent the sign of a number. The
most significant bit is used as the sign bit. Conventionally, a sign bit value of 0 is used to represent
a positive number and 1 for a negative number. Thus, if we have N bits to represent a number,
(N - 1) bits are available to represent the magnitude of the number. For example, when N is
4, Table A.1 shows the range of numbers that can be represented. For comparison, the unsigned
representation is also included in this table. The range of n-bit signed magnitude representation is
$-2^{n-1} + 1$ to $+2^{n-1} - 1$. Note that in this method, 0 has two representations: +0 and -0.
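The encoding rule can be stated in a few lines of C. This is a sketch for illustration (the function names sm_encode and sm_decode are our own), assuming $2 \le n \le 31$ and a magnitude that fits in n - 1 bits; for n = 4 it reproduces the signed magnitude column of Table A.1.

```c
/* n-bit signed magnitude (2 <= n <= 31): the most significant bit is the
   sign (1 = negative) and the low n-1 bits hold the magnitude.
   sm_encode assumes |v| <= 2^(n-1) - 1. */
static unsigned sm_encode(int v, unsigned n)
{
    unsigned sign = (v < 0) ? 1u : 0u;
    unsigned magnitude = (unsigned)((v < 0) ? -v : v);
    return (sign << (n - 1)) | magnitude;
}

static int sm_decode(unsigned bits, unsigned n)
{
    int magnitude = (int)(bits & ((1u << (n - 1)) - 1u));
    /* 1000B decodes to -0, which an int can only hold as plain 0 */
    return ((bits >> (n - 1)) & 1u) ? -magnitude : magnitude;
}
```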
Excess-M Representation

In this method, a number is mapped to a nonnegative integer so that its binary representation can
be used. This transformation is done by adding a value called bias to the number to be represented.
For an n-bit representation, the bias should be such that the mapped number is less than $2^n$.
To find out the binary representation of a number in this method, simply add the bias M to the
number and find the corresponding binary representation. That is, the representation for number
X is the binary representation for the number X + M, where M is the bias. For example, in the
excess-7 system, -3D is represented as
$-3 + 7 = +4 = 0100B$.
Numbers represented in excess-M are called biased integers for obvious reasons. Table A.1
gives examples of biased integers using 4-bit binary numbers. This representation, for example,
is used to store the exponent values in the floating-point representation (discussed in the next
section).

Table A.1 Number representation using 4-bit binary (All numbers except Binary column in decimal)

Unsigned        Binary    Signed      Excess-7  1's          2's
representation  pattern   magnitude             complement   complement
0               0000      0           -7        0            0
1               0001      1           -6        1            1
2               0010      2           -5        2            2
3               0011      3           -4        3            3
4               0100      4           -3        4            4
5               0101      5           -2        5            5
6               0110      6           -1        6            6
7               0111      7           0         7            7
8               1000      -0          1         -7           -8
9               1001      -1          2         -6           -7
10              1010      -2          3         -5           -6
11              1011      -3          4         -4           -5
12              1100      -4          5         -3           -4
13              1101      -5          6         -2           -3
14              1110      -6          7         -1           -2
15              1111      -7          8         -0           -1
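Excess-M encoding is just an addition on the way in and a subtraction on the way out. A minimal C sketch, with hypothetical names and the assumption that the caller keeps v + M within 0 to $2^n - 1$:

```c
/* Excess-M (biased) representation: the stored pattern is the unsigned
   binary value of v + M. For n bits the caller must keep 0 <= v + M < 2^n. */
static unsigned excess_encode(int v, int bias)
{
    return (unsigned)(v + bias);
}

static int excess_decode(unsigned bits, int bias)
{
    return (int)bits - bias;
}
```

With M = 7, excess_encode(-3, 7) yields 4 (0100B), and decoding the 4-bit patterns 0000B through 1111B gives the Excess-7 column of Table A.1, from -7 to 8.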
1's Complement Representation

As in the excess-M representation, negative values are biased in 1's complement and 2's complement representations. For positive numbers, the standard binary representation is used. As in
the signed magnitude representation, the most significant bit indicates the sign (0 = positive and
1 = negative). In 1's complement representation, negative values are biased by $b^n - 1$, where b
is the base or radix of the number system. For the binary case that we are interested in here, the
bias is $2^n - 1$. For the negative value -X, the representation used is the binary representation for
$(2^n - 1) - X$. For example, if n is 4, we can represent -5 as
$(2^4 - 1) - 5 = 1111B - 0101B = 1010B$.
As you can see from this example, the 1's complement of a number can be obtained by simply
complementing individual bits (converting 0s to 1s and vice versa) of the number. Table A.1 shows
1's complement representation using 4 bits. In this method also, 0 has two representations. The
most significant bit is used to indicate the sign. To find the magnitude of a negative number in this
representation, apply the process used to obtain the 1's complement (i.e., complement individual
bits) again.
Representation of signed numbers in 1's complement representation allows the use of simpler
circuits for performing addition and subtraction than the other two representations we have seen
so far (signed magnitude and excess-M). Some older computer systems used this representation
for integers. An irritant with this representation is that 0 has two representations. Furthermore,
the carry bit generated out of the sign bit will have to be added to the result. The 2's complement
representation avoids these pitfalls. As a result, 2's complement representation is the representation
of choice in current computer systems.
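Both operations described above, complementing the bits and adding back the carry out of the sign bit (the end-around carry), fit in a short C sketch. The names ones_negate and ones_add are our own, n is assumed to be between 1 and 31, and the inputs are assumed to be valid n-bit patterns.

```c
#include <stdint.h>

/* n-bit 1's complement negation (1 <= n <= 31): (2^n - 1) - x, which is
   the same as flipping every bit within the low n bits. */
static uint32_t ones_negate(uint32_t x, unsigned n)
{
    uint32_t mask = (UINT32_C(1) << n) - 1;   /* 2^n - 1, e.g. 1111B */
    return ~x & mask;
}

/* n-bit 1's complement addition: a carry out of the sign bit must be
   added back into the result (the end-around carry). */
static uint32_t ones_add(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (UINT32_C(1) << n) - 1;
    uint32_t sum = a + b;                     /* a, b <= mask, so no wraparound */
    if (sum > mask)                           /* carry out of bit n-1 */
        sum = (sum & mask) + 1;
    return sum;
}
```

For example, 6 + (-2) in 4 bits is 0110B + 1101B = 10011B; folding the carry back gives 0011B + 1 = 0100B, which is 4.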
2's Complement Representation

In 2's complement representation, positive numbers are represented the same way as in the signed
magnitude and 1's complement representations. The negative numbers are biased by $2^n$, where n
is the number of bits used for number representation. Thus, the negative value -A is represented
by $(2^n - A)$ using n bits. Since the bias value is one more than that in the 1's complement
representation, we have to add 1 after complementing to obtain the 2's complement representation
of a negative number. We can, however, discard any carry generated out of the sign bit. For
example, -5 can be represented as
5D = 0101B → complement → 1010B → add 1 → 1011B
Therefore, 1011B represents -5D in 2's complement representation. Table A.1 shows the 2's
complement representation of numbers using 4 bits. Notice that there is only one representation
for 0. The range of an n-bit 2's complement integer is $-2^{n-1}$ to $+2^{n-1} - 1$. For example, using
8 bits, the range is -128 to +127.
To find the magnitude of a negative number in the 2's complement representation, as in the
1's complement representation, simply reverse the sign of the number. That is, use the same
conversion process (i.e., complement and add 1) and discard any carry generated out of the leftmost
bit.
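A C sketch of the complement-and-add-1 rule follows; the mask discards the carry out of the sign bit, as the text describes. The function names are hypothetical, n is assumed to be between 1 and 31, and twos_value assumes its input is an n-bit pattern.

```c
#include <stdint.h>

/* n-bit 2's complement negation (1 <= n <= 31): complement the bits, add 1,
   and discard any carry out of bit n-1 (the mask does the discarding). */
static uint32_t twos_negate(uint32_t x, unsigned n)
{
    uint32_t mask = (UINT32_C(1) << n) - 1;
    return (~x + 1u) & mask;            /* == (2^n - x) mod 2^n */
}

/* Value of an n-bit 2's complement pattern: a set sign bit means the
   pattern stands for -(2^n - bits), found by negating it again. */
static int32_t twos_value(uint32_t bits, unsigned n)
{
    uint32_t sign_bit = UINT32_C(1) << (n - 1);
    if (bits & sign_bit)
        return -(int32_t)twos_negate(bits, n);
    return (int32_t)bits;
}
```

Negating 0 produces a carry out of the sign bit that the mask throws away, which is why 0 has only one representation here.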
Sign Extension

How do we extend a signed number? For example, we have shown that -5 can be represented
in the 2's complement representation as 1011B. Suppose we want to save this as a byte. How
do we extend these four bits into eight bits? We have seen on page 466 that, for unsigned integers,
we add zeros on the left to extend the number. However, we cannot use this technique for signed
numbers because the most significant bit represents the sign. To extend a signed number, we have
to copy the sign bit. In our example, -5 is represented using eight bits as
-5D = 11111011B   (the sign bit 1 is copied into the four new high-order bits)
We have copied the sign bit to extend the four-bit value to eight bits. Similarly, we can express -5
using 16 bits by extending it as follows:
-5D = 1111111111111011B
This process is referred to as sign extension.
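Sign extension amounts to copying bit n-1 into every higher position. The following C sketch (hypothetical name sign_extend16, with n limited to 1 through 15) does this with a mask; on IA-32 the movsx instruction performs the same operation in hardware.

```c
#include <stdint.h>

/* Sign-extends the low n bits of x (1 <= n <= 15) to 16 bits by copying
   bit n-1, the sign bit, into every higher bit position. */
static uint16_t sign_extend16(uint16_t x, unsigned n)
{
    unsigned low_mask = (1u << n) - 1u;
    unsigned low = x & low_mask;
    if (low & (1u << (n - 1)))
        return (uint16_t)(low | ~low_mask);   /* negative: fill with 1s */
    return (uint16_t)low;                     /* nonnegative: fill with 0s */
}
```

Extending 1011B from 4 bits gives FFFBH, that is 1111111111111011B; its low byte FBH is the 8-bit form 11111011B.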

Floating-Point Representation
Using the decimal system for a moment, we can write very small and very large numbers in
scientific notation as follows:
1.2345 X 10^^
9.876543 x 10"^'^.
Numbers expressed using the positional number notation are difficult to write and understand,
error-prone, and require more space. In a similar fashion, binary numbers can be written in the
scientific notation. For example,
$+1101.101 \times 2^{+11010} = 13.625 \times 2^{26} \approx 9.14358 \times 10^{8}$.
As indicated, numbers expressed in this notation have two parts: a mantissa (or significand), and
an exponent. There can be a sign (+ or - ) associated with each part.
Numbers expressed in this notation can be written in several equivalent ways, as shown below
(for any exponent e):
$1.2345 \times 10^{e}$
$123.45 \times 10^{e-2}$
$0.00012345 \times 10^{e+4}$
Having several equivalent forms causes problems when implementing arithmetic operations, comparisons, and the like.
These problems can be avoided by introducing a standard form called the normal form. Reverting to
the binary case, a normalized binary form has the format
$\pm 1.X_1 X_2 \cdots X_{M-1} X_M \times 2^{\pm Y_{N-1} Y_{N-2} \cdots Y_1 Y_0}$,
where $X_i$ and $Y_j$ represent a bit, $1 \le i \le M$, and $0 \le j < N$. The normalized form of
$+1101.101 \times 2^{+11010}$
is
$+1.101101 \times 2^{+11101}$.
We normally write such numbers as
+1.101101E11101.
To represent such normalized numbers, we might use the format shown below:

Sm      Se      exponent    mantissa
1 bit   1 bit   N bits      M bits

where Sm and Se represent the sign of mantissa and exponent, respectively.

Figure A.1 Floating-point formats (a) Single-precision: sign (1 bit, bit 31), exponent (8 bits, bits 30-23), mantissa (23 bits, bits 22-0). (b) Double-precision: sign (1 bit, bit 63), exponent (11 bits, bits 62-52), mantissa (52 bits, bits 51-0).
Implementation of floating-point numbers varies from this generic format, usually for efficiency reasons or to conform to a standard. From here on, we discuss the format of the IEEE 754
floating-point standard. Such standards are useful, for example, to exchange data among several
different computer systems and to write efficient numerical software libraries.
The single-precision and double-precision floating-point formats are shown in Figure A.1.
Certain points are worth noting about these formats:
1. The mantissa stores only the fractional part of a normalized number. The 1 to the left of
the binary point is not explicitly stored but implied to save a bit. Since this bit is always 1,
there is really no need to store it. However, representing 0.0 requires special attention, as
we show later.
2. There is no sign bit associated with the exponent. Instead, the exponent is converted to an
excess-M form and stored. For the single-precision numbers, the bias used is 127D (= 7FH),
and for the double-precision numbers, 1023 (= 3FFH).
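The field layout of Figure A.1(a) can be inspected from C by copying a float's bytes into a 32-bit integer and masking. This sketch assumes that float is the IEEE 754 single-precision format (true on IA-32 and most current platforms, but not guaranteed by the C standard); the struct and function names are our own.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision value (Figure A.1a). */
struct f32_fields {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30-23, stored in excess-127 */
    uint32_t fraction;  /* bits 22-0; the implied leading 1 is not stored */
};

/* Assumes float is the IEEE 754 binary32 format. */
static struct f32_fields f32_split(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bit pattern without aliasing issues */
    struct f32_fields r;
    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For -13.625, which is -1101.101B = -1.101101B × 2^3, the sketch should report sign 1, biased exponent 3 + 127 = 130, and fraction bits 101101 followed by 17 zeros (5A0000H).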
Special Values The representations of 0 and infinity (∞) require special attention. Table A.2
shows the values of the three components to represent these values. Zero is represented by a
zero exponent and fraction. We can have a -0 or +0 depending on the sign bit. An exponent
of all ones indicates a special floating-point value. An exponent of all ones with a zero mantissa
indicates infinity. Again, the sign bit indicates the sign of the infinity. An exponent of all ones
with a nonzero mantissa represents a not-a-number (NaN). The NaN values are used to represent
the results of operations like 0/0 and $\sqrt{-1}$.
Table A.2 Representation of special values in the floating-point format

Special number   Sign   Exponent (biased)   Mantissa
+0               0      0                   0
-0               1      0                   0
+∞               0      FFH                 0
-∞               1      FFH                 0
NaN              0/1    FFH                 ≠0
Denormals        0/1    0                   ≠0

The last entry in Table A.2 shows how denormalized values are represented. The denormals are
used to represent values smaller than the smallest value that can be represented with normalized
floating-point numbers. For denormals, the implicit 1 to the left of the binary point becomes a
0. The smallest normalized number has a 1 for the exponent (note zero is not allowed) and 0
for the fraction. Thus, the smallest normalized number is $1 \times 2^{-126}$. The largest denormalized number has
a zero exponent and all 1s for the fraction. This represents approximately $0.9999999 \times 2^{-126}$.
The smallest denormalized number would have zero as the exponent and a 1 in the last bit position
(i.e., position 23). Thus, it represents $2^{-23} \times 2^{-126}$, which is approximately $10^{-45}$. For a thorough
discussion of floating-point numbers, see D. Goldberg, "What Every Computer Scientist Should
Know About Floating-Point Arithmetic," ACM Computing Surveys, Vol. 23, No. 1, March 1991,
pp. 5-48.
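The rules of Table A.2 translate into a short decision on the exponent and fraction fields. The C sketch below works on a raw 32-bit single-precision pattern and uses names of our own; portable C99 code would normally call the fpclassify macro from math.h instead.

```c
#include <stdint.h>

/* Classification of a binary32 bit pattern following Table A.2. */
enum f32_class { F32_ZERO, F32_DENORMAL, F32_NORMAL, F32_INFINITY, F32_NAN };

static enum f32_class f32_classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* biased exponent */
    uint32_t fraction = bits & 0x7FFFFFu;       /* stored mantissa bits */
    if (exponent == 0)
        return fraction == 0 ? F32_ZERO : F32_DENORMAL;
    if (exponent == 0xFFu)
        return fraction == 0 ? F32_INFINITY : F32_NAN;
    return F32_NORMAL;                          /* exponent 1..254: implied leading 1 */
}
```

For example, 00800000H (exponent 1, fraction 0) classifies as normal and is the smallest normalized value, $2^{-126}$, while 00000001H is the smallest denormal, $2^{-149}$.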

Summary
We discussed how numbers are represented using the positional number system. Positional number
systems are characterized by a base and an alphabet. The familiar decimal system is a base-10 system with the alphabet 0 through 9. Computer systems use the binary system for internal
storage. This is a base-2 number system with 0 and 1 as the alphabet. The remaining two number
systems—octal (base-8) and hexadecimal (base-16)—are mainly used for convenience to write a
binary number. For example, debuggers use the hexadecimal numbers to display address and data
information.
When we use several number systems, there is often a need to convert numbers from one system to another. Conversion among binary, octal, and hexadecimal systems is simple and straightforward. We also discussed how numbers are converted from decimal to binary and vice versa.
The remainder of the chapter was devoted to internal representation of numbers. Representation of unsigned integers is straightforward and uses binary representation. There are, however,
several ways of representing signed integers. We discussed four methods to represent signed integers. Of these four methods, current computer systems use the 2's complement representation.
Floating-point representation on most computers follows the IEEE 754 standard. There are
three components of a floating-point number: mantissa, exponent, and the sign of the mantissa.
There is no sign associated with the exponent. Instead, the exponent is stored as a biased number.

B
Character Representation
This appendix discusses character representation. We identify some desirable properties that a
character-encoding scheme should satisfy in order to facilitate efficient character processing. Our
focus is on the ASCII encoding; we don't discuss other character sets such as UCS and Unicode.
The ASCII encoding, which is used by most computers, satisfies the requirements of an efficient
character code.

Character Representation
As computers can store and understand only the alphabet 0 and 1, characters should
be assigned a sequence over this alphabet (i.e., characters should be encoded using this alphabet).
For efficient processing of characters, several guidelines have been developed. Some of these are
mentioned here:
1. Assigning a contiguous sequence of numbers (if treated as unsigned binary numbers) to
letters in alphabetical order is desired. Upper and lowercase letters (A through Z and a
through z) can be treated separately, but a contiguous sequence should be assigned to each
case. This facilitates efficient character processing such as case conversion, identifying
lowercase letters, and so on.
2. In a similar fashion, digits should be assigned a contiguous sequence in the numerical order.
This would be useful in numeric-to-character and character-to-numeric conversions.
3. A space character should precede all letters and digits.
These guidelines allow for efficient character processing including sorting by names or character strings. For example, to test if a given character code corresponds to a lowercase letter, all
we have to do is to see if the code of the character is between that of a and z. These guidelines
also aid in applications requiring sorting—for instance, sorting a class list by last name.
Since computers are rarely used in isolation, exchange of information is an important concern. This leads to the necessity of having some standard way of representing characters. Most


computers use the American Standard Code for Information Interchange (ASCII) for character
representation. The standard ASCII uses 7 bits to encode a character. Thus, $2^7 = 128$ different
characters can be represented. This number is sufficiently large to represent uppercase and lowercase characters, digits, special characters such as ! and ", and control characters such as CR (carriage
return), LF (linefeed), etc.
Since we store the bits in units of a power of 2, we end up storing 8 bits for each character—
even though ASCII requires only 7 bits. The eighth bit is put to use for two purposes.
1. To parity encode for error detection: The eighth bit can be used to represent the parity bit.
This bit is made 0 or 1 such that the total number of 1s in a byte is even (for even parity) or
odd (for odd parity). This can be used to detect simple errors in data transmission.
2. To represent an additional 128 characters: By using all eight bits we can represent a total of
$2^8 = 256$ different characters. This is referred to as the extended ASCII. Special graphics
symbols, Greek letters, etc. make up the additional 128 characters.
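Even parity can be computed by counting the 1 bits in the 7-bit code and setting bit 7 when the count is odd. The C sketch below assumes an ASCII execution character set; the name add_even_parity is hypothetical. For odd parity, the parity bit would instead be set when the count is even.

```c
#include <stdint.h>

/* Puts an even-parity bit in bit 7 of a 7-bit ASCII code: the bit is set
   exactly when the low 7 bits contain an odd number of 1s, so the whole
   byte ends up with an even number of 1s. */
static uint8_t add_even_parity(uint8_t code)
{
    uint8_t c = code & 0x7Fu;
    unsigned ones = 0;
    for (unsigned t = c; t != 0; t >>= 1)
        ones += t & 1u;                          /* count the 1 bits */
    return (uint8_t)(c | ((ones & 1u) << 7));
}
```

For example, 'C' (1000011B) has three 1s, so its even-parity byte is C3H; 'A' (1000001B) already has two and stays 41H.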
The standard ASCII character code is presented in two tables on the next two pages. You
will notice from these tables that ASCII encoding satisfies the three guidelines mentioned earlier.
For instance, successive bit patterns are assigned to uppercase letters, lowercase letters, and digits.
This assignment leads to some good properties. For example, the difference between the uppercase
and lowercase characters is constant. That is, the difference between the character codes of a and
A is the same as that between n and N, which is 32. This characteristic can be exploited for efficient
case conversion.
Another interesting feature of ASCII is that the character codes are assigned to the 10 digits
such that the lower order four bits represent the binary equivalent of the corresponding digit.
For example, digit 5 is encoded as 0110101. If you take the rightmost four bits (0101), they
represent 5 in binary. This feature, again, helps in writing efficient code for character-to-numeric conversion. Such a conversion, for example, is required when you type a number as a
sequence of digit characters.
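Both properties translate into one-line C functions. This sketch assumes the execution character set is ASCII (C itself guarantees contiguous codes only for the digits, not for letters), and the function names are our own. Clearing bit 5 (c & ~0x20) is an equivalent way to turn a lowercase letter into uppercase.

```c
/* Uses two ASCII properties: a lowercase letter's code is 32 (20H) more
   than its uppercase partner, and digit codes 30H-39H carry the digit's
   value in their low four bits. Assumes an ASCII execution character set. */
static char ascii_to_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - 32) : c;
}

static int ascii_digit_value(char c)
{
    return (c >= '0' && c <= '9') ? (c & 0x0F) : -1;   /* '5' = 35H -> 5 */
}
```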

ASCII Character Set
The next two pages give the standard ASCII character set. We divide the character set into control
and printable characters. The control character codes are given on the next page and the printable
ASCII characters are on page 476.

Appendix B • Character Representation


Control Codes

Hex   Decimal   Character   Meaning
00    0         NUL         NULL
01    1         SOH         Start of heading
02    2         STX         Start of text
03    3         ETX         End of text
04    4         EOT         End of transmission
05    5         ENQ         Enquiry
06    6         ACK         Acknowledgment
07    7         BEL         Bell
08    8         BS          Backspace
09    9         HT          Horizontal tab
0A    10        LF          Line feed
0B    11        VT          Vertical tab
0C    12        FF          Form feed
0D    13        CR          Carriage return
0E    14        SO          Shift out
0F    15        SI          Shift in
10    16        DLE         Data link escape
11    17        DC1         Device control 1
12    18        DC2         Device control 2
13    19        DC3         Device control 3
14    20        DC4         Device control 4
15    21        NAK         Negative acknowledgment
16    22        SYN         Synchronous idle
17    23        ETB         End of transmission block
18    24        CAN         Cancel
19    25        EM          End of medium
1A    26        SUB         Substitute
1B    27        ESC         Escape
1C    28        FS          File separator
1D    29        GS          Group separator
1E    30        RS          Record separator
1F    31        US          Unit separator
7F    127       DEL         Delete


Printable Character Codes

Hex  Decimal  Character    Hex  Decimal  Character    Hex  Decimal  Character
20   32       Space        40   64       @            60   96       `
21   33       !            41   65       A            61   97       a
22   34       "            42   66       B            62   98       b
23   35       #            43   67       C            63   99       c
24   36       $            44   68       D            64   100      d
25   37       %            45   69       E            65   101      e
26   38       &            46   70       F            66   102      f
27   39       '            47   71       G            67   103      g
28   40       (            48   72       H            68   104      h
29   41       )            49   73       I            69   105      i
2A   42       *            4A   74       J            6A   106      j
2B   43       +            4B   75       K            6B   107      k
2C   44       ,            4C   76       L            6C   108      l
2D   45       -            4D   77       M            6D   109      m
2E   46       .            4E   78       N            6E   110      n
2F   47       /            4F   79       O            6F   111      o
30   48       0            50   80       P            70   112      p
31   49       1            51   81       Q            71   113      q
32   50       2            52   82       R            72   114      r
33   51       3            53   83       S            73   115      s
34   52       4            54   84       T            74   116      t
35   53       5            55   85       U            75   117      u
36   54       6            56   86       V            76   118      v
37   55       7            57   87       W            77   119      w
38   56       8            58   88       X            78   120      x
39   57       9            59   89       Y            79   121      y
3A   58       :            5A   90       Z            7A   122      z
3B   59       ;            5B   91       [            7B   123      {
3C   60       <            5C   92       \            7C   124      |
3D   61       =            5D   93       ]            7D   125      }
3E   62       >            5E   94       ^            7E   126      ~
3F   63       ?            5F   95       _

Note that 7FH (127 in decimal) is a control character listed on the previous page.

C
Programming Exercises
This appendix gives several programming exercises. These exercises can be used to practice writing programs in the assembly language.
1. Modify the addigits.asm program given in Example 9.3 such that it accepts a string
from the keyboard consisting of digit and nondigit characters. The program should display
the sum of the digits present in the input string. All nondigit characters should be ignored.
For example, if the input string is
ABC1?5wy76:~2

the output of the program should be
sum of individual digits is: 21

2. Write an assembly language program to encrypt digits as shown below:
input digit: 0 1 2 3 4 5 6 7 8 9
encrypted digit: 4 6 9 5 0 3 1 8 7 2
Your program should accept a string consisting of digit and nondigit characters. The encrypted string should be displayed in which only the digits are affected. Then the user
should be queried whether he/she wants to terminate the program. If the response is either
'y' or 'Y' you should terminate the program; otherwise, you should request another input
string from the keyboard.
The encryption scheme given here has the property that when you encrypt an already encrypted string, you get back the original string. Use this property to verify your program.
3. Write a program to accept a number in the hexadecimal form and display the decimal equivalent of the number. A typical interaction of your program is (the user types A10F and Y):
Please input a positive number in hex (4 digits max.): A10F
The decimal equivalent of A10FH is 41231
Do you want to terminate the program (Y/N): Y
You can refer to Appendix A for an algorithm to convert from base b to decimal. You should
do the required multiplication by the left shift instruction. Once you have converted the hex
number into the equivalent in binary, you can use the print_int system call to display
the decimal equivalent.


4. Write a program that reads an input number (given in decimal) between 0 and 65,535 and
displays the hexadecimal equivalent. You can read the input using the read_int system call.
5. Modify the above program to display the octal equivalent instead of the hexadecimal equivalent of the input number.
6. Write a procedure locate to locate a character in a given string. The procedure receives
a pointer to a NULL-terminated character string and the character to be located. When the
first occurrence of the character is located, its position is returned to main. If no match
is found, a negative value is returned. The main procedure requests a character string and
a character to be located and displays the position of the first occurrence of the character
returned by the locate procedure. If there is no match, a message should be displayed to
that effect.
7. Write a procedure that receives a string and removes all leading blank characters in the
string. For example, if the input string is (␣ indicates a blank character)
␣␣␣␣␣Read␣␣my␣lips.

it will be modified by removing all leading blanks as
Read␣␣my␣lips.
Write a main program to test your procedure.
8. Write a procedure that receives a string and removes all leading and duplicate blank characters in the string. For example, if the input string is (␣ indicates a blank character)
␣␣␣␣␣Read␣␣␣my␣␣␣␣␣lips.

it will be modified by removing all leading and duplicate blanks as
Read␣my␣lips.
Write a main program to test your procedure.
9. Write a procedure to read a string, representing a person's name, in the format
first-name␣MI␣last-name
and display the name in the format
last-name,␣first-name␣MI
where ␣ indicates a blank character. As indicated, you can assume that the three names
(first name, middle initial, and last name) are separated by single spaces. Write a main
program to test your procedure.
10. Modify the last exercise to work on an input that can contain multiple spaces between the
names. Also, display the name as in the last exercise but with the last name in all capital
letters.
11. Write a complete assembly language program to read two matrices A and B and display the
result matrix C, which is the sum of A and B. Note that the elements of C can be obtained
as

C[i,j] = A[i,j] + B[i,j].
Your program should consist of a main procedure that calls the read_matrix procedure
twice to read data for A and B. It should then call the matrix_add procedure, which
receives pointers to A, B, C, and the size of the matrices. Note that both A and B should
have the same size. The main procedure calls another procedure to display C.

Appendix C • Programming Exercises


12. Write a procedure to perform multiplication of matrices A and B. The procedure should
receive pointers to the two input matrices (A of size l x m, B of size m x n), the product
matrix C, and values l, m, and n. Also, the data for the two matrices should be obtained
from the user. Devise a suitable user interface to read these numbers.
13. Modify the program of the last exercise to work on matrices stored in the column-major
order.
14. Write a program to read a matrix (maximum size 10 x 10) from the user and display the
transpose of the matrix. To obtain the transpose of matrix A, write rows of A as columns.
Here is an example:
If the input matrix is
12 34 56 78
23 45 67 89
34 56 78 90
45 67 89 10

the transpose of the matrix is
12 23 34 45
34 45 56 67
56 67 78 89
78 89 90 10
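The transpose is a single double loop that stores element a[i][j] at t[j][i]. The C sketch below assumes row-major storage in fixed 10 x 10 arrays to match the exercise's maximum size. The names transpose and MAX are hypothetical.

```c
/* Hypothetical sketch: matrices are stored row-major in MAX x MAX arrays,
   matching the exercise's 10 x 10 limit. */
#define MAX 10

/* Writes the transpose of the rows x cols matrix a into t (cols x rows):
   row i of a becomes column i of t. */
void transpose(int a[][MAX], int t[][MAX], int rows, int cols)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            t[j][i] = a[i][j];
}
```

In assembly, the inner loop works the same way. The source address advances by one element, and the destination address advances by one row (MAX elements).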

15. Write a program to read a matrix (maximum size 10 x 15) from the user and display the
subscripts of the maximum element in the matrix. Your program should consist of two procedures: main is responsible for reading the input matrix and for displaying the position of
the maximum element. Another procedure mat_max is responsible for finding the position
of the maximum element. For example, if the input matrix is
12 34 56 78
23 45 67 89
34 56 78 90
45 67 89 10

the output of the program should be
The maximum element is at (2,3)
Here (2,3) gives the row and column subscripts, counting from 0, of the largest value (90 in our example).
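A C version of mat_max could track the subscripts of the best element seen so far and return them through pointers. This sketch assumes the 10 x 15 row-major layout from the exercise and 0-based subscripts. On ties it keeps the first maximum found. The parameter names and COLS_MAX are our own choices.

```c
/* Hypothetical layout for this sketch: up to 10 rows of COLS_MAX columns. */
#define COLS_MAX 15

/* Finds the row and column (0-based) of the largest element of a
   rows x cols matrix; on ties the first one in row-major order is kept. */
void mat_max(int a[][COLS_MAX], int rows, int cols, int *max_row, int *max_col)
{
    *max_row = 0;
    *max_col = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            if (a[i][j] > a[*max_row][*max_col]) {
                *max_row = i;
                *max_col = j;
            }
}
```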
16. Write a program to read a matrix of integers, perform cyclic permutation of rows, and display the result matrix. Cyclic permutation of a sequence a_0, a_1, a_2, ..., a_{n-1} is defined as
a_1, a_2, ..., a_{n-1}, a_0. Apply this process to each row of the matrix. Your program should
be able to handle up to 12 x 15 matrices. If the input matrix is
12 34 56 78
23 45 67 89
34 56 78 90
45 67 89 10


Assembly Language Programming Under Linux

the permuted matrix is
34 56 78 12
45 67 89 23
56 78 90 34
67 89 10 45
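One way to handle both this exercise and the generalization in Exercise 17 is to rotate each row left by k positions through a temporary row buffer. The C sketch below does this; k = 1 gives the permutation shown above. The function name rotate_rows is hypothetical, as are the normalization of negative k (treated as a right rotation) and the COLS_MAX limit of 15 (taken from the 12 x 15 bound).

```c
/* Hypothetical sketch; COLS_MAX follows the exercise's 12 x 15 limit. */
#define COLS_MAX 15

/* Cyclically permutes every row left by k positions:
   a_0, ..., a_{n-1} becomes a_k, ..., a_{n-1}, a_0, ..., a_{k-1}.
   cols must be positive; negative k rotates to the right. */
void rotate_rows(int a[][COLS_MAX], int rows, int cols, int k)
{
    int tmp[COLS_MAX];

    k = ((k % cols) + cols) % cols;          /* bring k into 0..cols-1 */
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++)       /* gather the rotated row */
            tmp[j] = a[i][(j + k) % cols];
        for (int j = 0; j < cols; j++)       /* copy it back in place */
            a[i][j] = tmp[j];
    }
}
```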

17. Generalize the last exercise to cyclically permute by a user-specified number of elements.
18. Write a complete assembly language program to do the following:
• Read the names of students in a class into a one-dimensional array.
• Read test scores of each student into a two-dimensional marks array.
• Output a letter grade for each student in the format:
student name    letter grade
You can use the following information in writing your program:
• Assume that the maximum class size is 20.
• Assume that the class is given four tests of equal weight (i.e., 25 points each).
• Test marks are rounded to the nearest integer so you can treat them as integers.
• Use the following table to convert percentage marks (i.e., sum of all four tests) to a
letter grade.

Marks range    Grade
85-100         A
70-84          B
60-69          C
50-59          D
0-49           F
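The grade table maps directly onto a chain of comparisons from the highest threshold down. The C sketch below implements that chain and also tallies the class summary requested in Exercise 19. The function names, the NUM_TESTS constant, and the order of the counts array (A, B, C, D, F) are hypothetical choices made for this sketch.

```c
/* Hypothetical sketch: four tests per student, as the exercise states. */
#define NUM_TESTS 4

/* Converts percentage marks (sum of the four tests, 0..100) to a grade
   using the table above. */
char letter_grade(int marks)
{
    if (marks >= 85) return 'A';
    if (marks >= 70) return 'B';
    if (marks >= 60) return 'C';
    if (marks >= 50) return 'D';
    return 'F';
}

/* Class summary: counts[0..4] receive the number of students who got
   A, B, C, D, and F. marks[s][t] is student s's score on test t. */
void grade_summary(int marks[][NUM_TESTS], int students, int counts[5])
{
    static const char grades[] = "ABCDF";

    for (int g = 0; g < 5; g++)
        counts[g] = 0;
    for (int s = 0; s < students; s++) {
        int total = 0;
        for (int t = 0; t < NUM_TESTS; t++)   /* percentage = sum of tests */
            total += marks[s][t];
        char grade = letter_grade(total);
        for (int g = 0; g < 5; g++)
            if (grades[g] == grade)
                counts[g]++;
    }
}
```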

19. Modify the program for the last exercise to also generate a class summary stating the number
of students receiving each letter grade in the following format:
A = number of students receiving A,
B = number of students receiving B,
C = number of students receiving C,
D = number of students receiving D,
F = number of students receiving F.
20. If we are given a square matrix (i.e., a matrix with the number of rows equal to the number of
columns), we can classify it as a diagonal matrix if only its diagonal elements are nonzero;
as an upper triangular matrix if all the elements below the diagonal are 0; and as a lower
triangular matrix if all elements above the diagonal are 0. Some examples are:
Diagonal matrix:
28 0 0 0
0 87 0 0
0 0 97 0
0 0 0 65


Upper triangular matrix:
19 26 35 98
0 78 43 65
0 0 38 29
0 0 0 82
Lower triangular matrix:
76 0 0 0
44 38 0 0
65 28 89 0
87 56 67 54
Write an assembly language program to read a matrix and output the type of matrix.
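A single pass over the matrix is enough. It records whether any nonzero element appears below the diagonal and whether any appears above it. A matrix with neither is both upper and lower triangular, which is exactly the diagonal case, so that case is tested first. The C sketch below assumes a hypothetical maximum size N_MAX of 10, because the exercise gives no limit. The function name matrix_type is also our own.

```c
/* Hypothetical maximum size; the exercise does not specify one. */
#define N_MAX 10

/* Classifies a square n x n matrix. */
const char *matrix_type(int a[][N_MAX], int n)
{
    int upper = 1, lower = 1;   /* assume both until disproved */

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            if (i > j && a[i][j] != 0) upper = 0;   /* nonzero below diagonal */
            if (i < j && a[i][j] != 0) lower = 0;   /* nonzero above diagonal */
        }
    if (upper && lower) return "diagonal";
    if (upper) return "upper triangular";
    if (lower) return "lower triangular";
    return "none of these";
}
```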
21. In Appendix A, we discussed the format of the single-precision floating-point numbers.
Write a program that reads the floating-point internal representation from the user as a string
of eight hexadecimal digits and displays the three components (mantissa, exponent, and
sign) in binary. For example, if the input to the program is 429DA000, the output should
be:
sign = 0
mantissa = 1.0011101101
exponent = 110
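The example output becomes clear once the bit fields are separated. 429DA000 has sign bit 0 and stored exponent 10000101 (133). The exponent shown is therefore the unbiased value 133 - 127 = 6, which is 110 in binary. The 23 fraction bits follow the implied leading 1. The C sketch below performs the same shifts and masks an assembly solution would use. The function names are hypothetical, and zero, denormals, infinities, and NaNs are not treated specially.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: splits a single-precision bit pattern, given as
   eight hex digits, into sign, unbiased exponent, and 23 fraction bits. */
void decompose(const char *hex, int *sign, int *exponent, uint32_t *fraction)
{
    uint32_t bits = (uint32_t)strtoul(hex, NULL, 16);

    *sign = (int)(bits >> 31);                      /* bit 31 */
    *exponent = (int)((bits >> 23) & 0xFF) - 127;   /* bits 30..23, bias 127 */
    *fraction = bits & 0x7FFFFF;                    /* bits 22..0 */
}

/* Prints the mantissa as 1.xxxx in binary, dropping trailing zero bits. */
void print_mantissa(uint32_t fraction)
{
    int low = 0;

    while (low < 23 && !(fraction & (1u << low)))   /* find lowest 1 bit */
        low++;
    printf("mantissa = 1.");
    if (low == 23) {                                /* all fraction bits 0 */
        printf("0\n");
        return;
    }
    for (int i = 22; i >= low; i--)
        putchar((fraction >> i) & 1u ? '1' : '0');
    putchar('\n');
}
```

For 429DA000, decompose yields sign 0, exponent 6, and fraction bits 0x1DA000, and print_mantissa then prints the mantissa line shown above.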
22. Modify the program for the last exercise to work with the double-precision floating-point
representation.
23. Ackermann's function A(m, n) is defined for m ≥ 0 and n ≥ 0 as

$$
A(m, n) =
\begin{cases}
n + 1 & \text{for } m = 0,\ n \ge 0 \\
A(m - 1, 1) & \text{for } m \ge 1,\ n = 0 \\
A(m - 1, A(m, n - 1)) & \text{for } m \ge 1,\ n \ge 1.
\end{cases}
$$

Write a recursive procedure to compute this function. Your main program should handle the
user interface to request m and n and display the final result.
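The definition translates case by case into a recursive function. The C sketch below is a direct translation (the name ackermann is ours), and a recursive assembly procedure would follow the same three cases. The values grow extremely fast and the recursion gets very deep, so only small arguments (say, m ≤ 3 with small n) are practical.

```c
/* Hypothetical direct translation of the three cases of the definition. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0)
        return n + 1;                          /* A(0, n) = n + 1 */
    if (n == 0)
        return ackermann(m - 1, 1);            /* A(m, 0) = A(m-1, 1) */
    return ackermann(m - 1, ackermann(m, n - 1));
}
```

As a check, ackermann(2, 3) is 9 and ackermann(3, 3) is 61.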
24. Write a program to solve the Towers of Hanoi puzzle. The puzzle consists of three pegs and
N disks. Disk 1 is smaller than disk 2, which is smaller than disk 3, and so on. Disk N is
the largest. Initially, all N disks are on peg 1 such that the largest disk is at the bottom and
the smallest at the top (i.e., in the order N, N - 1, ..., 3, 2, 1 from bottom to top). The
problem is to move these N disks from peg 1 to peg 2 under two constraints: you can move
only one disk at a time, and you must not place a larger disk on top of a smaller one. We can
express a solution to this problem by using recursion. The function
move(N, 1, 2, 3)
moves N disks from peg 1 to peg 2 using peg 3 as the extra peg. There is a simple solution
if you concentrate on moving the bottom disk on peg 1. The task move(N, 1, 2, 3) is
equivalent to
move(N-1, 1, 3, 2)
move the remaining disk from peg 1 to 2
move(N-1, 3, 2, 1)


Even though the task appears to be complex, we can write a very elegant and simple solution
to this puzzle. Here is a version in C.

#include <stdio.h>

/* Moves n disks from peg x to peg y, using peg z as the extra peg. */
void move(int n, int x, int y, int z)
{
    if (n == 1)
        printf("Move the top disk from peg %d to %d\n", x, y);
    else {
        move(n-1, x, z, y);   /* park the top n-1 disks on peg z */
        printf("Move the top disk from peg %d to %d\n", x, y);
        move(n-1, z, y, x);   /* move them from peg z onto peg y */
    }
}

int main(void)
{
    int disks;

    scanf("%d", &disks);
    move(disks, 1, 2, 3);
    return 0;
}

Test your program for a very small number of disks (say, less than 6). Even for 64 disks, it
takes hundreds of years on whatever PC you have!
25. Write a procedure str_str that receives two pointers to strings string and substring
and searches for substring in string. If a match is found, it returns the starting position of the first match. Matching should be case sensitive. A negative value is returned if no
match is found. For example, if
string = Good things come in small packages.
and
substring = in
the procedure should return 8 indicating a match of in in things.
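The expected result of 8 implies that positions are counted from 1: "in" starts at the eighth character of "Good things...". The brute-force C sketch below tries every starting position and compares characters until the substring ends or a mismatch occurs. It follows that 1-based convention and returns -1 when there is no match. The choice of -1 as the negative value is our assumption.

```c
/* Hypothetical sketch: returns the 1-based position of the first
   occurrence of substring in string, or -1 (assumed "negative value")
   if there is none. Matching is case sensitive. */
int str_str(const char *string, const char *substring)
{
    for (int i = 0; string[i] != '\0'; i++) {
        int j = 0;
        /* stops at the end of substring or at the first mismatch;
           the end of string always mismatches a nonzero character */
        while (substring[j] != '\0' && string[i + j] == substring[j])
            j++;
        if (substring[j] == '\0')
            return i + 1;                  /* convert to 1-based */
    }
    return -1;
}
```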
26. Write a procedure str_ncpy to mimic the strncpy function provided by the C library.
The function str_ncpy receives two strings, string1 and string2, and a positive
integer num. Of course, the procedure receives only the string pointers but not the actual
strings. It should copy at most the first num characters from string2 to string1.
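To mimic strncpy exactly, two details of the standard C function matter. If string2 is shorter than num, the remaining positions up to num are filled with null characters. If string2 has num or more characters, no terminating null is written. The C sketch below shows both behaviors. Returning string1 mirrors strncpy's return value.

```c
#include <stddef.h>

/* Sketch of strncpy's behavior: copies at most num characters of string2
   into string1, pads with '\0' up to num if string2 is shorter, and does
   NOT null-terminate string1 when string2 has num or more characters. */
char *str_ncpy(char *string1, const char *string2, size_t num)
{
    size_t i = 0;

    for (; i < num && string2[i] != '\0'; i++)   /* copy characters */
        string1[i] = string2[i];
    for (; i < num; i++)                         /* pad with nulls */
        string1[i] = '\0';
    return string1;
}
```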
27. A palindrome is a word, verse, sentence, or a number that reads the same both backward
and forward. Blanks, punctuation marks, and capitalization do not count in determining
palindromes. Here are some examples:
1991
Able was I ere I saw Elba
Madam! I'm Adam
Write a program to determine if a given string is a palindrome. The procedure returns 1 if
the string is a palindrome; otherwise, it returns 0.
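A common approach uses two indices that move toward each other. Each index skips characters that do not count, and the two characters are compared ignoring case. The C sketch below takes "do not count" to mean anything other than a letter or digit (tested with isalnum). That interpretation is our assumption. The name is_palindrome is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: returns 1 if s is a palindrome when everything but
   letters and digits is ignored and case is ignored; 0 otherwise. */
int is_palindrome(const char *s)
{
    size_t len = strlen(s);
    if (len == 0)
        return 1;

    size_t i = 0, j = len - 1;
    while (i < j) {
        if (!isalnum((unsigned char)s[i])) { i++; continue; }   /* skip left */
        if (!isalnum((unsigned char)s[j])) { j--; continue; }   /* skip right */
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)s[j]))
            return 0;
        i++;
        j--;
    }
    return 1;
}
```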
28. Write an assembly language program to read a string of characters from the user and
print the vowel count. For each vowel, the count includes both uppercase and lowercase
letters. For example, the input string


Advanced Programming in UNIX Environment
produces the following output:
Vowel     Count
a or A    3
e or E    3
i or I    4
o or O    2
u or U    1
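Converting each character to lowercase first means only five comparisons are needed per character. The C sketch below works this way and reproduces the counts in the table for the sample input. The function name and the counts-array order (a, e, i, o, u) are hypothetical.

```c
#include <ctype.h>

/* Hypothetical sketch: counts[0..4] receive the number of a/A, e/E, i/I,
   o/O, and u/U characters in s. */
void vowel_count(const char *s, int counts[5])
{
    static const char vowels[] = "aeiou";

    for (int k = 0; k < 5; k++)
        counts[k] = 0;
    for (; *s != '\0'; s++) {
        int c = tolower((unsigned char)*s);   /* fold case once */
        for (int k = 0; k < 5; k++)
            if (c == vowels[k])
                counts[k]++;
    }
}
```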
29. Merge sort is a technique to combine two sorted arrays. Merge sort takes two sorted input
arrays X and Y (say of size m and n) and produces a sorted array Z of size m + n that
contains all elements of the two input arrays. The pseudocode of merge sort is as follows:
merge_sort (X, Y, Z, m, n)
    i := 0          {index variables for arrays X, Y, and Z}
    j := 0
    k := 0
    while ((i < m) AND (j < n))
        if (X[i] < Y[j])          {find smaller of two}
        then
            Z[k] := X[i]          {copy and update indices}
            k := k+1
            i := i+1
        else
            Z[k] := Y[j]          {copy and update indices}
            k := k+1
            j := j+1
        end if
    end while
    if (i < m)                    {copy remainder of input array}
        while (i < m)
            Z[k] := X[i]
            k := k+1
            i := i+1
        end while
    else
        while (j < n)
            Z[k] := Y[j]
            k := k+1
            j := j+1
        end while
    end if
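Here is the same merge step as a C sketch. It takes the smaller front element at each step and then copies whatever remains of the array that was not exhausted. The function name merge and the parameter order are our own choices. The caller must supply a z array with room for m + n elements.

```c
/* Hypothetical sketch of the merge pseudocode: x has m sorted elements,
   y has n sorted elements, and z must have room for m + n elements. */
void merge(const int x[], int m, const int y[], int n, int z[])
{
    int i = 0, j = 0, k = 0;

    while (i < m && j < n) {      /* take the smaller front element */
        if (x[i] < y[j])
            z[k++] = x[i++];
        else
            z[k++] = y[j++];
    }
    while (i < m)                 /* at most one of these two loops runs */
        z[k++] = x[i++];
    while (j < n)
        z[k++] = y[j++];
}
```

Writing both remainder loops without the enclosing if is equivalent, because after the main loop at least one of the two arrays is exhausted.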

Appendix D • IA-32 Instruction Set

cmp — Compare two operands
Format:      cmp dest,src
Flags (C O Z S P A): M M M M M M
Description: Compares the two operands specified by performing dest - src. However, the result of this subtraction is not stored (unlike the sub instruction);
only the flags are updated to reflect the result of the subtract operation.
This instruction is typically used in conjunction with conditional jumps.
If an operand greater than 1 byte is compared to an immediate byte, the
byte value is first sign-extended. Clock cycles: 1 if no memory operand is
involved; 2 if one of the operands is in memory.
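It helps to see how one subtraction feeds both the unsigned and the signed conditional jumps. Unsigned jumps such as jb test CF, while signed jumps such as jl test whether SF differs from OF. The C sketch below computes CF, ZF, SF, and OF the way cmp would for 8-bit operands. PF and AF are omitted, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of cmp on 8-bit operands (PF and AF omitted). */
struct cmp_flags { int cf, zf, sf, of; };

struct cmp_flags cmp8(uint8_t dest, uint8_t src)
{
    uint8_t r = (uint8_t)(dest - src);    /* computed, then discarded by cmp */
    struct cmp_flags f;

    f.cf = dest < src;                    /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r & 0x80) != 0;               /* sign bit of the result */
    /* signed overflow: operands differ in sign and the result's sign
       differs from dest's sign */
    f.of = ((dest ^ src) & (dest ^ r) & 0x80) != 0;
    return f;
}
```

For example, comparing 80H with 01H gives CF = 0 (128 is not below 1 unsigned) but OF = 1 and SF = 0, so jl is taken (-128 < 1 signed) while jb is not.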

cmps — Compare string operands
Format:      cmps dest,src
             cmpsb
             cmpsw
             cmpsd
Flags (C O Z S P A): M M M M M M
Description: Compares the byte, word, or doubleword pointed to by the source index register (SI or ESI) with an operand of equal size pointed to by the destination
index register (DI or EDI). If the address size is 16 bits, SI and DI registers
are used; ESI and EDI registers are used for 32-bit addresses. The comparison is done by subtracting the operand pointed to by the DI or EDI register
from the one pointed to by the SI or ESI register. That is, the cmps instruction performs
either [SI]-[DI] or [ESI]-[EDI]. The result is not stored but used to update the flags, as in the cmp instruction. After the comparison, both source
and destination index registers are automatically updated. Whether these
two registers are incremented or decremented depends on the direction flag
(DF). The registers are incremented if DF is 0 (see the cld instruction to
clear the direction flag); if DF is 1, both index registers are decremented
(see the std instruction to set the direction flag). The two registers are
incremented or decremented by 1 for byte comparisons, 2 for word comparisons, and 4 for doubleword comparisons.
Note that the specification of the operands in cmps is not really required
as the two operands are assumed to be pointed to by the index registers. The
cmpsb, cmpsw, and cmpsd instructions are synonyms for the byte, word, and doubleword cmps instructions, respectively.
The repeat prefix instructions (i.e., rep, repe, or repne) can precede the
cmps instructions for array or string comparisons. See the rep instruction for
details. Clock cycles: 5.


cwd — Convert word to doubleword
Format:      cwd
Description: Converts the signed word in AX to a signed doubleword in DX:AX by
copying the sign bit of AX (the most significant bit) to all bits of DX.
In fact, cdq and this instruction use the same opcode (99H). Which one is
executed depends on the default operand size. If the operand size is 16 bits,
cwd is performed; cdq is performed for 32-bit operands. Clock cycles: 2.

cwde — Convert word to doubleword
Format:      cwde
Description: Converts the signed word in AX to a signed doubleword in EAX by copying the sign bit of AX (the most significant bit) to all bits of the upper word
of EAX. In fact, cbw and cwde are the same instructions (i.e., share the
same opcode of 98H). The action performed depends on the operand size.
If the operand size is 16 bits, cbw is performed; cwde is performed for
32-bit operands. Clock cycles: 3.

daa — Decimal adjust after addition
Format:      daa
Flags (C O Z S P A): M * M M M M
Description: The daa instruction is useful in BCD arithmetic. It adjusts the AL register
to contain the correct two-digit packed decimal result. This instruction
should be used after an addition instruction, as described in Chapter 18.
Both AF and CF flags are set if there is a decimal carry; these two flags are
cleared otherwise. The ZF, SF, and PF flags are set according to the result.
Clock cycles: 3.
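The adjustment daa performs can be written as two checks: one on the low digit and one on the high digit. The C sketch below follows the steps given in Intel's instruction-set reference for DAA. Its inputs are the AL value left by an add of two packed BCD bytes and the CF and AF values that add produced. The function name daa_model and the pointer interface are our own.

```c
#include <stdint.h>

/* Hypothetical model of daa. al is the sum left in AL by an add of two
   packed BCD bytes; *cf and *af hold CF and AF from that add. Returns the
   corrected AL; *cf reports a decimal carry out of the two digits. */
uint8_t daa_model(uint8_t al, int *cf, int *af)
{
    uint8_t old_al = al;
    int old_cf = *cf;

    if ((al & 0x0F) > 9 || *af) {      /* low digit out of range */
        *cf = old_cf || al > 0xF9;     /* carry out of al + 6 */
        al = (uint8_t)(al + 0x06);
        *af = 1;
    } else {
        *af = 0;
    }
    if (old_al > 0x99 || old_cf) {     /* high digit out of range */
        al = (uint8_t)(al + 0x60);
        *cf = 1;
    } else {
        *cf = 0;
    }
    return al;
}
```

For example, adding 38H and 45H leaves 7DH in AL with AF = 0. daa_model turns this into 83H with CF = 0, which is the BCD result of 38 + 45.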


das — Decimal adjust after subtraction
Format:      das
Flags (C O Z S P A): M * M M M M
Description: The das instruction is useful in BCD arithmetic. It adjusts the AL register
to contain the correct two-digit packed decimal result. This instruction
should be used after a subtract instruction, as described in Chapter 18.
Both AF and CF flags are set if there is a decimal borrow; these two flags
are cleared otherwise. The ZF, SF, and PF flags are set according to the
result. Clock cycles: 3.

dec — Decrement by 1
Format:      dec dest
Flags (C O Z S P A): - M M M M M
Description: The dec instruction decrements the dest operand by 1. The carry flag is
not affected. Clock cycles: 1 if dest is a register; 3 if dest is in memory.

div — Unsigned divide
Format:      div divisor
Flags (C O Z S P A): * * * * * *
Description: The div instruction performs unsigned division. The divisor can be an
8-, 16-, or 32-bit operand, located either in a register or in memory. The
dividend is assumed to be in AX (for byte divisor), DX:AX (for word
divisor), or EDX:EAX (for doubleword divisor). The quotient is stored
in AL, AX, or EAX for 8-, 16-, and 32-bit divisors, respectively. The
remainder is stored in AH, DX, or EDX for 8-, 16-, and 32-bit divisors,
respectively. It generates interrupt 0 if the result cannot fit in the quotient
register (AL, AX, or EAX), or if the divisor is zero. See Chapter 14 for
details. Clock cycles: 17 for an 8-bit divisor, 25 for a 16-bit divisor, and
41 for a 32-bit divisor.
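The two ways div can fault are easier to see with the 64-bit dividend made explicit. The C sketch below models the 32-bit form and returns -1 where the processor would generate interrupt 0. The function name and interface are hypothetical. The sketch also shows why code that divides a 32-bit value in EAX normally clears EDX first (or uses cdq before idiv): whatever happens to be in EDX becomes the high half of the dividend.

```c
#include <stdint.h>

/* Hypothetical model of the 32-bit div: divides EDX:EAX by divisor.
   Returns 0 and stores the quotient (EAX) and remainder (EDX) on success;
   returns -1 where the processor would generate interrupt 0. */
int div32(uint32_t edx, uint32_t eax, uint32_t divisor,
          uint32_t *quot, uint32_t *rem)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;

    if (divisor == 0)
        return -1;                      /* divide by zero */
    uint64_t q = dividend / divisor;
    if (q > UINT32_MAX)
        return -1;                      /* quotient does not fit in EAX */
    *quot = (uint32_t)q;
    *rem = (uint32_t)(dividend % divisor);
    return 0;
}
```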


enter — Allocate stack frame
Format:      enter bytes,level
Description: This instruction creates a stack frame at procedure entry. The first operand
bytes specifies the number of bytes for the local variable storage in the
stack frame. The second operand level gives the nesting level of the
procedure. If we specify a nonzero level, it copies level stack frame
pointers into the new frame from the preceding stack frame. In all our
examples, we set the second operand to zero. Thus the

    enter   XX,0

statement is equivalent to

    push    EBP
    mov     EBP,ESP
    sub     ESP,XX

See Chapter 11 for more details on its usage. Clock cycles: 11 if level
is zero.

hlt — Halt
Format:      hlt
Description: This instruction halts instruction execution indefinitely. An interrupt or a
reset will enable instruction execution. Clock cycles: ∞.


idiv — Signed divide
Format:      idiv divisor
Flags (C O Z S P A): * * * * * *
Description: Similar to the div instruction except that idiv performs signed division. The
divisor can be an 8-, 16-, or 32-bit operand, located either in a register or in
memory. The dividend is assumed to be in AX (for byte divisor), DX:AX
(for word divisor), or EDX:EAX (for doubleword divisor). The quotient
is stored in AL, AX, or EAX for 8-, 16-, and 32-bit divisors, respectively.
The remainder is stored in AH, DX, or EDX for 8-, 16-, and 32-bit divisors,
respectively. It generates interrupt 0 if the result cannot fit in the quotient
register (AL, AX, or EAX), or if the divisor is zero. See Chapter 14 for
details. Clock cycles: 22 for an 8-bit divisor, 30 for a 16-bit divisor, and
46 for a 32-bit divisor.

imul — Signed multiplication
Format:      imul src
             imul dest,src
             imul dest,src,constant
Flags (C O Z S P A): M M * * * *
Description: This instruction performs signed multiplication. The number of operands
for imul can be between 1 and 3, depending on the format used. In the
one-operand format, the other operand is assumed to be in the AL, AX,
or EAX register depending on whether the src operand is 8, 16, or 32
bits long, respectively. The src operand can be either in a register or in
memory. The result, which is twice as long as the src operand, is placed
in AX, DX:AX, or EDX:EAX for 8-, 16-, or 32-bit src operands, respectively. In the other two forms, the result is of the same length as the input
operands.
The two-operand format specifies both operands required for multiplication. In this case, src and dest must both be either 16-bit or 32-bit
operands. While src can be either in a register or in memory, dest must
be a register.
In the three-operand format, a constant can be specified as an immediate
operand. The result (src × constant) is stored in dest. As in the
two-operand format, the dest operand must be a register. The src can
be either in a register or in memory. The immediate constant can be an 8-,
16-, or 32-bit value. For additional restrictions, refer to the Pentium data
book. Clock cycles: 10 (11 if the one-operand format is used with either
8- or 16-bit operands).


in — Input from a port
Format:      in dest,port
             in dest,DX
Description: This instruction has two formats. In both formats, dest must be the AL,
AX, or EAX register. In the first format, it reads a byte, word, or doubleword from port into the AL, AX, or EAX register, respectively. Note that
port is an 8-bit immediate value. This format is restrictive in the sense
that only the first 256 ports can be accessed. The other format is more
flexible and allows access to the complete I/O space (i.e., any port between
0 and 65,535). In this format, the port number is assumed to be in the DX
register. Clock cycles: varies (see the Pentium data book).

inc — Increment by 1
Format:      inc dest
Flags (C O Z S P A): - M M M M M
Description: The inc instruction increments the dest operand by 1. The carry flag is
not affected. Clock cycles: 1 if dest is a register; 3 if dest is in memory.

ins — Input from a port to string
Format:      insb
             insw
             insd
Description: This instruction transfers 8-, 16-, or 32-bit data from the input port specified in the DX register to a location in memory pointed to by ES:(E)DI. The
DI index register is used if the address size is 16 bits and the EDI index register
for 32-bit addresses. Unlike the in instruction, the ins instruction does
not allow the specification of the port number as an immediate value. After the data transfer, the index register is updated automatically. The index
register is incremented if DF is 0; it is decremented if DF is 1. The index
register is incremented or decremented by 1, 2, or 4 for byte, word, or doubleword operands, respectively. The repeat prefix can be used along with the
ins instruction to transfer a block of data (the number of data transfers is
indicated by the (E)CX register; see the rep instruction for details). Clock
cycles: varies (see the Pentium data book).


int — Interrupt
Format:      int interrupt-type
Description: The int instruction calls an interrupt service routine or handler associated
with interrupt-type. The interrupt-type is an immediate 8-bit
operand. This value is used as an index into the Interrupt Descriptor Table
(IDT). See Chapter 20 for details on the interrupt invocation mechanism.
Clock cycles: varies (see the Pentium data book).

into — Interrupt on overflow
Format:      into
Description: The into instruction is a conditional software interrupt identical to int
4 except that the interrupt type (4) is implicit and the interrupt handler is invoked conditionally only when the overflow flag is set. Clock cycles: varies (see the
Pentium data book).

iret — Interrupt return
Format:      iret
             iretd
Flags (C O Z S P A): M M M M M M
Description: The iret instruction returns control from an interrupt handler. In real
address mode, it loads the instruction pointer and the flags register with
values from the stack and resumes the interrupted routine. Both iret and
iretd are synonymous (and use the opcode CFH). The operand size in
effect determines whether the 16-bit or 32-bit instruction pointer (IP or
EIP) and flags (FLAGS or EFLAGS) are to be used. See Chapter 20 for
more details. This instruction affects all flags as the flags register is popped
from the stack. Clock cycles: varies (see the Pentium data book).


jcc — Jump if condition cc is satisfied
Format:      jcc target
Description: The jcc instruction alters program execution by transferring control conditionally to the target location in the same segment. The target
operand is a relative offset (relative to the instruction following the conditional jump instruction). The relative offset can be a signed 8-, 16-, or
32-bit value. The most efficient instruction encoding results if 8-bit offsets are
used. With 8-bit offsets, the target should be within -128 to +127 of the
first byte of the next instruction. For 16- and 32-bit offsets, the corresponding ranges are -2^15 to 2^15 - 1 and -2^31 to 2^31 - 1, respectively. When the
target is in another segment, test for the opposite condition and use the unconditional jmp instruction, as explained in Chapter 15. See Chapter 15
for details on the various conditions tested, such as ja, jbe, etc. The jcxz
instruction tests the contents of the CX or ECX register and jumps to the
target location only if (E)CX = 0. The default operand size determines
whether CX or ECX is used for comparison. Clock cycles: 1 for all conditional jumps (except jcxz, which takes 5 or 6 cycles).

jmp — Unconditional jump
Format:      jmp target
Description: The jmp instruction alters program execution by transferring control unconditionally to the target location. This instruction allows jumps to
another segment. In direct jumps, the target operand is a relative offset
(relative to the instruction following the jmp instruction). The relative offset can be an 8-, 16-, or 32-bit value as in the conditional jump instruction.
In addition, the target can be specified indirectly via a register or
memory location. See Chapter 15 for an example. For other forms of the
jmp instruction, see the Pentium data book. Note: Flags are not affected
unless there is a task switch, in which case all flags are affected. Clock cycles: 1 for direct jumps, 2 for indirect jumps (more clock cycles for other
types of jumps).


lahf — Load flags into AH register
Format:      lahf
Description: The lahf instruction loads the AH register with the low byte of the flags
register. AH := SF, ZF, *, AF, *, PF, *, CF where * represents an indeterminate
value. Clock cycles: 2.

lds/les/lfs/lgs/lss — Load full pointer
Format:      lds dest,src
             les dest,src
             lfs dest,src
             lgs dest,src
             lss dest,src
Description: These instructions read a full pointer from memory (given by the src
operand) and load the corresponding segment register (e.g., DS register
for the lds instruction, ES register for the les instruction, etc.) and the
dest register. The dest operand must be a 16- or 32-bit register. The first
2 or 4 bytes (depending on whether dest is a 16- or 32-bit register) at
the effective address given by the src operand are loaded into the dest
register and the next 2 bytes into the corresponding segment register. Clock
cycles: 4 (except lss).

lea — Load effective address
Format:      lea dest,src
Description: The lea instruction computes the effective address of a memory operand
given by src and stores it in the dest register. The dest must be either
a 16- or 32-bit register. If the dest register is a 16-bit register and the
address size is 32, only the lower 16 bits are stored. On the other hand,
if a 32-bit register is specified when the address size is 16 bits, the effective
address is zero-extended to 32 bits. Clock cycles: 1.


leave — Procedure exit
Format:      leave
Description: The leave instruction takes no operands. Effectively, it reverses the actions of the enter instruction. It performs two actions:
• Releases the local variable stack space allocated by the enter instruction;
• Pops the old frame pointer into the (E)BP register.
This instruction is typically used just before the ret instruction. Clock
cycles: 3.

lods — Load string operand
Format:      lodsb
             lodsw
             lodsd
Description: The lods instruction loads the AL, AX, or EAX register with the memory
byte, word, or doubleword at the location pointed to by DS:SI or DS:ESI. The
address size attribute determines whether the SI or ESI register is used.
The lodsw and lodsd instructions share the same opcode (ADH). The
operand size is used to load either a word or a doubleword. After loading,
the source index register is updated automatically. The index register is
incremented if DF is 0; it is decremented if DF is 1. The index register
is incremented or decremented by 1, 2, or 4 for byte, word, or doubleword
operands, respectively. The rep prefix can be used with this instruction
but is not useful, as explained in Chapter 17. This instruction is typically
used in a loop (see the loop instruction). Clock cycles: 2.


loop/loope/loopne — Loop control
Format:      loop target
             loope/loopz target
             loopne/loopnz target
Description: The loop instruction decrements the count register (CX if the address
size attribute is 16 and ECX if it is 32) and jumps to target if the count
register is not zero. This instruction decrements the (E)CX register without
changing any flags. The operand target is a relative 8-bit offset (i.e., the
target must be in the range -128 to +127 bytes).
The loope instruction is similar to loop except that it also checks the ZF
value to jump to the target. That is, control is transferred to target
if, after decrementing the (E)CX register, the count register is not zero and
ZF = 1. The loopz is a synonym for the loope instruction.
The loopne instruction is similar to loope except that it transfers control to target if ZF is 0 (instead of 1 as in the loope instruction). See
Chapter 15 for more details on these instructions. Clock cycles: 5 or 6 for
loop and 7 or 8 for the other two.
Note that the plain loop instruction takes longer to execute than
a functionally equivalent two-instruction sequence that decrements the
(E)CX register and jumps conditionally.

mov — Copy data
Format:      mov dest,src
Description: Copies data from src to dest. Clock cycles: 1 for most mov instructions except when copying into a segment register, which takes more clock
cycles.


movs — Copy string data
Format:      movs dest,src
             movsb
             movsw
             movsd
Description: Copies the byte, word, or doubleword pointed to by the source index register
(SI or ESI) to the byte, word, or doubleword pointed to by the destination
index register (DI or EDI). If the address size is 16 bits, SI and DI registers
are used; ESI and EDI registers are used for 32-bit addresses. The default
segment for the source is DS and ES for the destination. A segment override
prefix can be used only for the source operand. After the move, both source
and destination index registers are automatically updated as in the cmps
instruction.
The rep prefix instruction can precede the movs instruction for block
movement of data. See the rep instruction for details. Clock cycles: 4.

movsx — Copy with sign extension
Format:      movsx reg16,src8
             movsx reg32,src8
             movsx reg32,src16
Description: Copies the sign-extended source operand src8/src16 into the destination reg16/reg32. The destination can be either a 16-bit or 32-bit register only. The source can be a register or memory byte or word operand.
Note that reg16 and reg32 represent a 16- and 32-bit register, respectively. Similarly, src8 and src16 represent a byte and word operand,
respectively. Clock cycles: 3.

movzx — Copy with zero extension
Format:      movzx reg16,src8
             movzx reg32,src8
             movzx reg32,src16
Description: Similar to the movsx instruction except movzx copies the zero-extended
source operand into the destination. Clock cycles: 3.
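The only difference between the two instructions is what fills the new upper bits. movsx fills them with copies of the source's sign bit, and movzx fills them with zeros. The same idea underlies cbw, cwde, and cwd. The C sketch below spells this out with explicit bit operations, which avoids relying on implementation-defined signed conversions. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical models of movsx/movzx into a 32-bit register. */
uint32_t movsx32_8(uint8_t src)
{
    /* copy the sign bit (bit 7) into bits 8..31 */
    return (src & 0x80) ? (0xFFFFFF00u | src) : (uint32_t)src;
}

uint32_t movsx32_16(uint16_t src)
{
    /* copy the sign bit (bit 15) into bits 16..31 */
    return (src & 0x8000) ? (0xFFFF0000u | src) : (uint32_t)src;
}

uint32_t movzx32_8(uint8_t src)
{
    return (uint32_t)src;              /* bits 8..31 are cleared */
}
```

For example, the byte F0H (-16 as a signed byte) becomes FFFFFFF0H with movsx but 000000F0H with movzx.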


mul — Unsigned multiplication
Format:      mul AL,src8
             mul AX,src16
             mul EAX,src32
Flags (C O Z S P A): M M * * * *
Description: Performs unsigned multiplication of two 8-, 16-, or 32-bit operands. Only
one of the operands needs to be specified; the other operand, matching in
size, is assumed to be in the AL, AX, or EAX register.
• For an 8-bit multiplication, the result is in the AX register. CF and
OF are cleared if AH is zero; otherwise, they are set.
• For a 16-bit multiplication, the result is in the DX:AX register pair.
The higher-order 16 bits are in DX. CF and OF are cleared if DX is
zero; otherwise, they are set.
• For a 32-bit multiplication, the result is in the EDX:EAX register
pair. The higher-order 32 bits are in EDX. CF and OF are cleared if
EDX is zero; otherwise, they are set.
Clock cycles: 11 for 8- or 16-bit operands and 10 for 32-bit operands.
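In practice, CF and OF after mul tell you whether the product still fits in the lower register alone. The C sketch below models the 32-bit case by forming the full 64-bit product and splitting it into EDX and EAX. The function name and the combined cf_of output are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the 32-bit mul: EDX:EAX := EAX * src.
   *cf_of is 1 exactly when the high half (EDX) is nonzero. */
void mul32(uint32_t eax_in, uint32_t src,
           uint32_t *edx, uint32_t *eax, int *cf_of)
{
    uint64_t product = (uint64_t)eax_in * src;   /* full 64-bit product */

    *eax = (uint32_t)product;                    /* low 32 bits */
    *edx = (uint32_t)(product >> 32);            /* high 32 bits */
    *cf_of = *edx != 0;
}
```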

neg — Negate sign (two's complement)
Format:      neg operand
Flags (C O Z S P A): M M M M M M
Description: Performs 2's complement negation (sign reversal) of the operand specified.
The operand specified can be 8, 16, or 32 bits in size and can be located in
a register or memory. The operand is subtracted from zero and the result is
stored back in the operand. The CF flag is set for a nonzero result; cleared
otherwise. Other flags are set according to the result. Clock cycles: 1 for
register operands and 3 for memory operands.

nop — No operation
Format:      nop
Description: Performs no operation. Interestingly, the nop instruction is an alias for the
xchg (E)AX,(E)AX instruction. Clock cycles: 1.


not — Logical bitwise not
Format:      not operand
Description: Performs 1's complement bitwise not operation (a 1 becomes 0 and vice
versa). Clock cycles: 1 for register operands and 3 for memory operands.

or — Logical bitwise or
Format:      or dest,src
Flags (C O Z S P A): 0 0 M M M *
Description: Performs bitwise or operation. The result (dest or src) is stored in
dest. Clock cycles: 1 for register and immediate operands and 3 if a
memory operand is involved.

out — Output to a port
Format:      out port,src
             out DX,src
Description: Like the in instruction, this instruction has two formats. In both formats,
src must be in the AL, AX, or EAX register. In the first format, it outputs
a byte, word, or doubleword from src to the I/O port specified by the first
operand port. Note that port is an 8-bit immediate value. This format
limits access to the first 256 I/O ports in the I/O space. The other format is
more general and allows access to the full I/O space (i.e., any port between
0 and 65,535). In this format, the port number is assumed to be in the DX
register. Clock cycles: varies (see the Pentium data book).


outs — Output from a string to a port
Format:      outsb
             outsw
             outsd
Description: This instruction transfers 8-, 16-, or 32-bit data from a string (pointed to
by the source index register) to the output port specified in the DX register.
Similar to the ins instruction, it uses the SI index register for 16-bit addresses and the ESI register if the address size is 32. The (E)SI register is
automatically updated after the transfer of a data item. The index register
is incremented if DF is 0; it is decremented if DF is 1. The index register
is incremented or decremented by 1, 2, or 4 for byte, word, or doubleword
operands, respectively. The repeat prefix can be used with outs for block
transfer of data. Clock cycles: varies (see the Pentium data book).

pop — Pop a word from the stack
Format:      pop dest
Description: Pops a word or doubleword from the top of the stack. If the address size
attribute is 16 bits, SS:SP is used as the top of the stack pointer; otherwise,
SS:ESP is used. dest can be a register or memory operand. In addition,
it can also be a segment register DS, ES, SS, FS, or GS (e.g., pop DS).
The stack pointer is incremented by 2 (if the operand size is 16 bits) or 4
(if the operand size is 32 bits). Note that pop CS is not allowed. This can
be done only indirectly by the ret instruction. Clock cycles: 1 if dest is
a general register; 3 if dest is a segment register or memory operand.

popa — Pop all general registers
Format:      popa
             popad
Description: Pops all eight 16-bit (popa) or 32-bit (popad) general registers from the
top of the stack. The popa instruction loads the registers in the order DI, SI, BP,
(discards the next two bytes to skip loading into SP), BX, DX, CX, and AX.
That is, DI is popped first and AX last. The popad instruction follows the
same order on the 32-bit registers. Clock cycles: 5.


Assembly Language Programming Under Linux

popf — Pop flags register
Format: popf
        popfd
Description: Pops the 16-bit (popf) or 32-bit (popfd) flags register (FLAGS or EFLAGS) from the top of the stack. Bits 16 (RF flag) and 17 (VM flag) of the EFLAGS register are not affected by this instruction. Clock cycles: 6 in real mode and 4 in protected mode.
Flags:  C  O  Z  S  P  A
        M  M  M  M  M  M

push — Push a word onto the stack
Format: push src
Description: Pushes a word or doubleword onto the top of the stack. If the address size attribute is 16 bits, SS:SP is used as the top-of-stack pointer; otherwise, SS:ESP is used. src can be (i) a register, (ii) a memory operand, (iii) a segment register (CS, SS, DS, ES, FS, or GS), or (iv) an immediate byte, word, or doubleword operand. The stack pointer is decremented by 2 (if the operand size is 16 bits) or 4 (if the operand size is 32 bits). The push ESP instruction pushes the value ESP had before it was decremented by the push instruction. On the Pentium, push SP likewise pushes the value SP had before the decrement; only the 8086 pushed the decremented SP value. Clock cycles: 1 (except when the operand is in memory, in which case it takes 2 clock cycles).

pusha — Push all general registers
Format: pusha
        pushad
Description: Pushes all eight 16-bit (pusha) or 32-bit (pushad) general registers onto the stack. pusha pushes the registers in the order AX, CX, DX, BX, SP, BP, SI, and DI; that is, AX is pushed first and DI last. The value pushed for SP is the value it had before pusha started. The pushad instruction follows the same order on the 32-bit registers. pusha decrements SP by 16, and pushad decrements ESP by 32. Clock cycles: 5.
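The push order of pushad, and the matching pop order of popad, can be checked with a small C model. This is a sketch under stated assumptions, not processor code: the `Cpu` struct, its 64-doubleword stack area, and the helper names are hypothetical, and only the 32-bit forms are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Register indices in the order pushad pushes them (hypothetical model). */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, NREGS };

typedef struct {
    uint32_t reg[NREGS];
    uint32_t mem[64];   /* tiny stack area: doubleword at byte address a is mem[a / 4] */
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    c->reg[ESP] -= 4;                        /* decrement ESP first, then store */
    assert(c->reg[ESP] / 4 < 64);
    c->mem[c->reg[ESP] / 4] = v;
}

static uint32_t pop32(Cpu *c) {
    assert(c->reg[ESP] / 4 < 64);
    uint32_t v = c->mem[c->reg[ESP] / 4];    /* load first, then increment ESP */
    c->reg[ESP] += 4;
    return v;
}

/* pushad: EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, EDI. */
static void pushad(Cpu *c) {
    uint32_t original_esp = c->reg[ESP];
    for (int r = EAX; r <= EDI; r++)
        push32(c, r == ESP ? original_esp : c->reg[r]);
}

/* popad: EDI first, EAX last; the saved ESP is popped but discarded. */
static void popad(Cpu *c) {
    for (int r = EDI; r >= EAX; r--) {
        uint32_t v = pop32(c);
        if (r != ESP)
            c->reg[r] = v;
    }
}
```

In the model, pushad lowers ESP by 32, and popad brings it back because the saved ESP slot is skipped rather than loaded.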


pushf — Push flags register
Format: pushf
        pushfd
Description: Pushes the 16-bit (pushf) or 32-bit (pushfd) flags register (FLAGS or EFLAGS) onto the stack. pushf decrements SP by 2, and pushfd decrements ESP by 4. Clock cycles: 4 in real mode and 3 in protected mode.
Flags:  C  O  Z  S  P  A
        -  -  -  -  -  -

rep/repe/repz/repne/repnz — Repeat instruction
Format: rep         string-inst
        repe/repz   string-inst
        repne/repnz string-inst
Description: These three prefixes repeat the specified string instruction until their termination conditions are met. The rep prefix decrements the count register (CX or ECX) each time the string instruction is executed, and the string instruction is executed repeatedly until the count register is zero. The repe (repeat while equal) prefix has an additional termination condition: ZF = 0. repz is an alias for repe. The repne (repeat while not equal) prefix is similar to repe except that its additional termination condition is ZF = 1. repnz is an alias for repne. Because only the cmps and scas string instructions update ZF, the repe and repne forms are used with them. For more details, see Chapter 17. Clock cycles: varies; see the Pentium data book for details.

ret — Return from a procedure
Format: ret
        ret value
Description: Transfers control to the instruction following the corresponding call instruction. The optional immediate value specifies the number of bytes to be cleared from the stack after the return. This parameter is usually used to clear the input parameters from the stack. See Chapter 11 for more details. Clock cycles: 2 for a near return and 3 for a far return; if the optional value is specified, add one more clock cycle. Changing privilege levels takes more clocks; see the Pentium data book.
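The termination rules of repe can be made concrete by modeling repe cmpsb in C. This is a minimal sketch, assuming DF = 0 (forward processing) and byte operands; `RepResult` and `repe_cmpsb` are hypothetical names. As in the rep description, the count is decremented after each comparison, and the loop stops when the count reaches zero or ZF becomes 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t ecx;   /* count register after the loop */
    int zf;       /* ZF from the last comparison */
} RepResult;

static RepResult repe_cmpsb(const uint8_t *esi, const uint8_t *edi, size_t ecx) {
    /* Assumption: ZF is reported as 1 if no comparison runs (ECX = 0);
       the processor would leave ZF at its previous value. */
    int zf = 1;
    while (ecx != 0) {
        zf = (*esi++ == *edi++);   /* cmpsb sets ZF when [ESI] equals [EDI] */
        ecx--;                     /* count is decremented after each comparison */
        if (!zf)
            break;                 /* repe stops as soon as ZF = 0 */
    }
    RepResult r = { ecx, zf };
    return r;
}
```

After the loop, ZF = 0 signals a mismatch; because both pointers have already advanced, the differing bytes are one position behind them.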


rol/ror/rcl/rcr — Rotate instructions
Format: rol/ror/rcl/rcr src,1
        rol/ror/rcl/rcr src,count
        rol/ror/rcl/rcr src,CL
Description: This group of instructions supports rotation of 8-, 16-, or 32-bit data. The rol (rotate left) and ror (rotate right) instructions rotate the src data as explained in Chapter 16. The second operand gives the number of times src is to be rotated. This operand can be given as an immediate value (a constant 1 or a byte value count) or preloaded into the CL register. The other two rotate instructions, rcl (rotate left including CF) and rcr (rotate right including CF), rotate the src data with the carry flag (CF) included in the rotation process, as explained in Chapter 16. The OF flag is affected only for single-bit rotates; it is undefined for multibit rotates. Clock cycles: rol and ror take 1 (if src is a register) or 3 (if src is a memory operand) for the immediate mode (constant 1 or count) and 4 for the CL version; the other two instructions can take as many as 27 clock cycles (see the Pentium data book for details).
Flags:  C  O  Z  S  P  A
        M  M  -  -  -  -
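The difference between the plain rotates and the rotate-through-carry forms is easiest to see bit by bit. The following C sketch models 8-bit operands only, rotating one position per loop iteration; the function names are hypothetical, and OF is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit models; *cf holds CF (0 or 1) and is updated in place.
   The processor uses only the low five bits of the count, hence the assert. */

static uint8_t rol8(uint8_t v, unsigned count, int *cf) {
    assert(count < 32);
    for (unsigned i = 0; i < count; i++) {
        *cf = v >> 7;                          /* bit 7 goes to CF ... */
        v = (uint8_t)((v << 1) | *cf);         /* ... and into bit 0 */
    }
    return v;
}

static uint8_t ror8(uint8_t v, unsigned count, int *cf) {
    assert(count < 32);
    for (unsigned i = 0; i < count; i++) {
        *cf = v & 1;                           /* bit 0 goes to CF ... */
        v = (uint8_t)((v >> 1) | (*cf << 7));  /* ... and into bit 7 */
    }
    return v;
}

static uint8_t rcl8(uint8_t v, unsigned count, int *cf) {
    assert(count < 32);
    for (unsigned i = 0; i < count; i++) {
        int out = v >> 7;                      /* bit 7 goes to CF */
        v = (uint8_t)((v << 1) | *cf);         /* old CF enters bit 0 */
        *cf = out;
    }
    return v;
}

static uint8_t rcr8(uint8_t v, unsigned count, int *cf) {
    assert(count < 32);
    for (unsigned i = 0; i < count; i++) {
        int out = v & 1;                       /* bit 0 goes to CF */
        v = (uint8_t)((v >> 1) | (*cf << 7));  /* old CF enters bit 7 */
        *cf = out;
    }
    return v;
}
```

With CF initially 0, rotating 0x81 left by one gives 0x03 with rol but 0x02 with rcl, because rcl feeds the old CF, not bit 7, into bit 0.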

sahf — Store AH into flags register
Format: sahf
Description: The AH register bits 7, 6, 4, 2, and 0 are loaded into flags SF, ZF, AF, PF, and CF, respectively. Clock cycles: 2.
Flags:  C  O  Z  S  P  A
        M  -  M  M  M  M
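The bit-to-flag mapping performed by sahf, written out in C as a sketch; the `Flags` struct and the function name are hypothetical, and the struct holds only the five flags that sahf loads.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int sf, zf, af, pf, cf; } Flags;

/* sahf: AH bits 7, 6, 4, 2, 0 -> SF, ZF, AF, PF, CF; AH bits 5, 3, 1 are ignored. */
static Flags sahf(uint8_t ah) {
    Flags f;
    f.sf = (ah >> 7) & 1;
    f.zf = (ah >> 6) & 1;
    f.af = (ah >> 4) & 1;
    f.pf = (ah >> 2) & 1;
    f.cf = ah & 1;
    return f;
}
```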


sal/sar/shl/shr — Shift instructions
Format: sal/sar/shl/shr src,1
        sal/sar/shl/shr src,count
        sal/sar/shl/shr src,CL
Description: This group of instructions supports shifting of 8-, 16-, or 32-bit data. The format is similar to that of the rotate instructions. The sal (shift arithmetic left) instruction and its synonym shl (shift left) shift the src data left. The bit shifted out goes into CF and the vacated bit is cleared, as explained in Chapter 16. The second operand gives the number of times src is to be shifted. This operand can be given as an immediate value (a constant 1 or a byte value count) or preloaded into the CL register. The shr (shift right) instruction is similar to shl except for the direction of the shift. The sar (shift arithmetic right) instruction is similar to sal except for two differences: the shift direction is right, and the sign bit is copied into the vacated bits. If the shift count is zero, no flags are affected. The CF flag contains the last bit shifted out. The OF flag is defined only for single-bit shifts; it is undefined for multibit shifts. Clock cycles: 1 (if src is a register) or 3 (if src is a memory operand) for the immediate mode (constant 1 or count) and 4 for the CL version.
Flags:  C  O  Z  S  P  A
        M  M  M  M  M  -
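The fill rules described above (zero fill for shl and shr, sign fill for sar, last bit out in CF) are modeled below for 8-bit operands. This is a minimal sketch; the function names are hypothetical, and counts of 8 or more are rejected because the processor leaves CF undefined for shl and shr in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit models, one bit per iteration; *cf receives the last
   bit shifted out. A zero count leaves the value and *cf unchanged. */

static uint8_t shl8(uint8_t v, unsigned count, int *cf) {   /* sal is identical */
    assert(count < 8);
    for (unsigned i = 0; i < count; i++) {
        *cf = v >> 7;
        v = (uint8_t)(v << 1);                 /* vacated bit 0 is cleared */
    }
    return v;
}

static uint8_t shr8(uint8_t v, unsigned count, int *cf) {
    assert(count < 8);
    for (unsigned i = 0; i < count; i++) {
        *cf = v & 1;
        v = (uint8_t)(v >> 1);                 /* vacated bit 7 is cleared */
    }
    return v;
}

static uint8_t sar8(uint8_t v, unsigned count, int *cf) {
    assert(count < 8);
    for (unsigned i = 0; i < count; i++) {
        *cf = v & 1;
        v = (uint8_t)((v >> 1) | (v & 0x80));  /* sign bit copied into bit 7 */
    }
    return v;
}
```

For example, shifting 0xF0 (-16 as a signed byte) right by one gives 0xF8 (-8) with sar but 0x78 with shr.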

sbb — Subtract with borrow
Format: sbb dest,src
Description: Performs integer subtraction with borrow. dest is assigned the result of dest - (src + CF). Clock cycles: 1-3.
Flags:  C  O  Z  S  P  A
        M  M  M  M  M  M
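A typical use of sbb is multiword arithmetic: the low-order halves are subtracted with sub, then the high-order halves with sbb, so that the borrow propagates through CF. The C sketch below models a 64-bit subtraction on 32-bit halves; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* sub: dest - src; CF is set when a borrow is needed (unsigned dest < src). */
static uint32_t sub32(uint32_t dest, uint32_t src, int *cf) {
    *cf = dest < src;
    return dest - src;
}

/* sbb: dest - (src + CF); CF is set when a borrow is needed. */
static uint32_t sbb32(uint32_t dest, uint32_t src, int *cf) {
    uint64_t subtrahend = (uint64_t)src + (uint64_t)*cf;
    int borrow = (uint64_t)dest < subtrahend;
    uint32_t result = (uint32_t)((uint64_t)dest - subtrahend);  /* wraps modulo 2^32 */
    *cf = borrow;
    return result;
}

/* 64-bit subtraction from 32-bit pieces: sub the low halves, sbb the high halves. */
static uint64_t sub64(uint64_t a, uint64_t b) {
    int cf;
    uint32_t lo = sub32((uint32_t)a, (uint32_t)b, &cf);
    uint32_t hi = sbb32((uint32_t)(a >> 32), (uint32_t)(b >> 32), &cf);
    return ((uint64_t)hi << 32) | lo;
}
```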


scas — Compare string operands
Format: scas operand
        scasb
        scasw
        scasd
Description: Subtracts the memory byte, word, or doubleword pointed to by the destination index register (DI or EDI) from the AL, AX, or EAX register, respectively. The result is not stored but is used to update the flags. The memory operand must be addressable from the ES register; segment override is not allowed in this instruction. If the address size is 16 bits, the DI register is used; the EDI register is used for 32-bit addresses. After the subtraction, the destination index register is updated automatically. Whether the register is incremented or decremented depends on the direction flag (DF): it is incremented if DF is 0 (see the cld instruction to clear the direction flag) and decremented if DF is 1 (see the std instruction to set the direction flag). The amount of increment or decrement is 1 (for byte operands), 2 (for word operands), or 4 (for doubleword operands).
Note that specifying the operand in scas is not really required, as the memory operand is assumed to be pointed to by the index register. scasb, scasw, and scasd are synonyms for the byte, word, and doubleword scas instructions, respectively.
The repeat prefixes (i.e., repe or repne) can precede the scas instructions for array or string comparisons. See the rep instruction for details. Clock cycles: 4.
Flags:  C  O  Z  S  P  A
        M  M  M  M  M  M
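A common idiom combines scasb with repne to find a byte, for example the terminating zero of a C string. The C sketch below models repne scasb with DF = 0, followed by the usual length computation: ECX starts at 0xFFFFFFFF, and the length is the complement of the final ECX minus 1. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models "repne scasb" with DF = 0: compare AL with [EDI], advance EDI,
   decrement ECX; stop when ECX reaches 0 or when ZF = 1 (byte found). */
static uint32_t repne_scasb(uint8_t al, const uint8_t **edi, uint32_t ecx, int *zf) {
    while (ecx != 0) {
        *zf = (al == **edi);   /* scasb computes AL - [EDI]; ZF = 1 if equal */
        (*edi)++;
        ecx--;
        if (*zf)
            break;             /* repne stops as soon as ZF = 1 */
    }
    return ecx;
}

/* String length via ECX = 0xFFFFFFFF, AL = 0, repne scasb; length = ~ECX - 1. */
static uint32_t scasb_strlen(const char *s) {
    const uint8_t *edi = (const uint8_t *)s;
    int zf = 0;
    uint32_t ecx = repne_scasb(0, &edi, 0xFFFFFFFFu, &zf);
    assert(zf == 1);           /* the terminating zero byte must be found */
    return ~ecx - 1;
}
```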

setCC — Byte set on condition operands

Format: setCC dest
Description: Sets the dest byte to 1 if the condition CC is met; otherwise, sets it to zero. The operand dest must be either an 8-bit register or a memory operand. The conditions tested are similar to those of the conditional jump instructions (see the jCC instruction). The conditions are A, AE, B, BE, E, NE, G, GE, L, LE, NA, NAE, NB, NBE, NG, NGE, NL, NLE, C, NC, O, NO, P, PE, PO, NP, S, NS, Z, and NZ. The conditions can specify signed and unsigned comparisons as well as flag values. Clock cycles: 1 for a register operand and 2 for a memory operand.
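How setCC turns flags into a 0/1 byte can be sketched in C by first computing the flags of a 32-bit subtraction, as cmp would, and then applying four of the listed conditions. The `CmpFlags` struct and the function names are hypothetical; the condition formulas (for example, L means SF != OF) are the standard IA-32 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Flags from a 32-bit subtraction a - b, as cmp (or sub) would set them. */
typedef struct { int cf, zf, sf, of; } CmpFlags;

static CmpFlags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    CmpFlags f;
    f.cf = a < b;                              /* borrow: unsigned a < b */
    f.zf = r == 0;
    f.sf = (int)(r >> 31);
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);   /* operand signs differ and result sign differs from a */
    return f;
}

static uint8_t setb(CmpFlags f) { return (uint8_t)f.cf; }                     /* unsigned <: CF = 1 */
static uint8_t seta(CmpFlags f) { return (uint8_t)(!f.cf && !f.zf); }         /* unsigned >: CF = 0 and ZF = 0 */
static uint8_t setl(CmpFlags f) { return (uint8_t)(f.sf != f.of); }           /* signed <: SF != OF */
static uint8_t setg(CmpFlags f) { return (uint8_t)(!f.zf && f.sf == f.of); }  /* signed >: ZF = 0 and SF = OF */
```

Comparing 0xFFFFFFFF with 1 shows why the signed and unsigned conditions differ: setl reports 1 (-1 < 1 as signed numbers) while seta also reports 1 (0xFFFFFFFF > 1 as unsigned numbers).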


shld/shrd — Double precision shift
Format: shld/shrd dest,src,count
Description: The shld instruction performs a left shift of dest by count bit positions. The second operand, src, provides the bits to shift in from the right. In other words, shld shifts dest concatenated with src to the left, and the upper half of the result is copied into dest. dest and src can both be either 16- or 32-bit operands. While dest can be a register or memory operand, src must be a register of the same size as dest. The third operand, count, can be an immediate byte value, or the CL register can be used as in the shift instructions. The contents of the src register are not altered.
The shrd (double precision shift right) instruction is similar to shld except for the direction of the shift.
If the shift count is zero, no flags are affected. The CF flag contains the last bit shifted out. The OF flag is defined only for single-bit shifts; it is undefined for multibit shifts. The SF, ZF, and PF flags are set according to the result. Clock cycles: 4 (5 if dest is a memory operand and the CL register is used for count).
Flags:  C  O  Z  S  P  A
        M  M  M  M  M  *

stc — Set carry flag
Format: stc
Description: Sets the carry flag to 1. Clock cycles: 2.
Flags:  C  O  Z  S  P  A
        1  -  -  -  -  -

std — Set direction flag
Format: std
Description: Sets the direction flag to 1. Clock cycles: 2.

sti — Set interrupt flag
Format: sti
Description: Sets the interrupt flag to 1. Clock cycles: 7.
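The shld/shrd description can be checked with a 32-bit C model. This is a sketch under stated assumptions: the count is masked to 5 bits as for the other shift instructions, CF is the only flag modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* shld dest,src,count (32-bit): dest shifts left; the vacated low-order
   bits are filled from the high-order bits of src, which is unchanged. */
static uint32_t shld32(uint32_t dest, uint32_t src, unsigned count, int *cf) {
    count &= 31;                               /* only the low 5 bits of count are used */
    if (count == 0)
        return dest;                           /* zero count: dest and flags unchanged */
    *cf = (int)((dest >> (32 - count)) & 1);   /* last bit shifted out of dest */
    return (dest << count) | (src >> (32 - count));
}

/* shrd dest,src,count (32-bit): the mirror image, filling from src's low bits. */
static uint32_t shrd32(uint32_t dest, uint32_t src, unsigned count, int *cf) {
    count &= 31;
    if (count == 0)
        return dest;
    *cf = (int)((dest >> (count - 1)) & 1);    /* last bit shifted out of dest */
    return (dest >> count) | (src << (32 - count));
}
```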


stos — store string operand

Format: stosb
        stosw
        stosd
Description: Stores the contents of the AL, AX, or EAX register at the memory byte, word, or doubleword pointed to by the destination index register (DI or EDI), respectively. If the address size is 16 bits, the DI register is used; the EDI register is used for 32-bit addresses. After the store, the destination index register is automatically updated. Whether this register is incremented or decremented depends on the direction flag (DF): it is incremented if DF is 0 (see the cld instruction to clear the direction flag) and decremented if DF is 1 (see the std instruction to set the direction flag). The amount of increment or decrement depends on the operand size (1 for byte operands, 2 for word operands, and 4 for doubleword operands).
The repeat prefix rep can precede the stos instruction to fill a block of CX/ECX bytes, words, or doublewords. Clock cycles: 3.
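A block fill with rep stosw can be modeled as the following C loop. This is a sketch assuming DF = 0, so that EDI moves forward; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models "rep stosw" with DF = 0: store AX at [EDI], advance EDI by one
   word, and repeat until ECX reaches zero. Returns the final EDI. */
static uint16_t *rep_stosw(uint16_t *edi, uint16_t ax, size_t ecx) {
    while (ecx != 0) {
        *edi++ = ax;   /* stosw: [EDI] <- AX, then EDI advances by 2 bytes */
        ecx--;
    }
    return edi;
}
```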

sub — Subtract
Format: sub dest,src
Description: Performs integer subtraction. dest is assigned the result of dest - src. Clock cycles: 1-3.
Flags:  C  O  Z  S  P  A
        M  M  M  M  M  M

test — Logical compare
Format: test dest,src
Description: Performs a logical and operation (dest and src); however, the result of the and operation is discarded. The dest operand can be either in a register or in memory. The src operand can be either an immediate value or a register. Neither dest nor src is modified. Sets the SF, ZF, and PF flags according to the result. Clock cycles: 1 if dest is a register operand and 2 if it is a memory operand.
Flags:  C  O  Z  S  P  A
        0  0  M  M  M  *
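The flags that test produces can be sketched in C for 8-bit operands. Note that PF reflects only the low-order byte of the result: it is 1 when that byte contains an even number of 1 bits. The `TestFlags` struct and `test8` name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int cf, of, zf, sf, pf; } TestFlags;

/* test dest,src (8-bit): AND the operands, set ZF, SF, and PF from the
   result, clear CF and OF, and discard the result itself. */
static TestFlags test8(uint8_t dest, uint8_t src) {
    uint8_t r = dest & src;
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (r >> i) & 1;      /* count 1 bits for the parity flag */
    TestFlags f;
    f.cf = 0;
    f.of = 0;
    f.zf = r == 0;
    f.sf = r >> 7;
    f.pf = (ones % 2) == 0;        /* even number of 1 bits -> PF = 1 */
    return f;
}
```

A common use is test al,1 followed by a conditional jump on ZF: ZF = 1 means bit 0 is clear, that is, AL holds an even number.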


xchg — Exchange data
Format: xchg dest,src
Description: Exchanges the values of the two operands src and dest. Clock cycles: 2 if both operands are registers or 3 if one of them is a memory operand.

xlat — Translate byte
Format: xlat table-offset
        xlatb
Description: Translates the data in the AL register using a table lookup: AL is changed from the table index to the corresponding table contents. The contents of the BX (for 16-bit addresses) or EBX (for 32-bit addresses) register are used as the offset of the translation table base. The contents of the AL register are treated as an unsigned index into this table, and the byte value at this index replaces the index value in AL. The default segment for the translation table is DS in both formats; however, in the operand version, a segment override is possible. Clock cycles: 4.

xor — Logical bitwise exclusive-or
Format: xor dest,src
Description: Performs a logical bitwise exclusive-or (xor) operation (dest xor src), and the result is stored in dest. Sets the SF, ZF, and PF flags according to the result. Clock cycles: 1-3.
Flags:  C  O  Z  S  P  A
        0  0  M  M  M  *
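The xlatb lookup described above amounts to indexing a byte table with AL. A C sketch, assuming a hypothetical 16-entry table of hexadecimal digit characters, shows a classic use: converting a byte into two hex characters with one lookup per nibble.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical translation table: entry i holds the ASCII hex digit for i. */
static const char hex_table[] = "0123456789ABCDEF";

/* xlatb: AL <- [EBX + AL]. Like the processor, this does no bounds check,
   so the caller must keep AL within the table. */
static uint8_t xlatb(const char *ebx, uint8_t al) {
    return (uint8_t)ebx[al];
}

/* Convert one byte to two hex characters, one xlatb per nibble. */
static void byte_to_hex(uint8_t b, char out[3]) {
    out[0] = (char)xlatb(hex_table, (uint8_t)(b >> 4));    /* high nibble */
    out[1] = (char)xlatb(hex_table, (uint8_t)(b & 0x0F));  /* low nibble */
    out[2] = '\0';
}
```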

E
Glossary
Aborts See Exceptions
Access permissions Unix and Linux systems provide a sophisticated security mechanism to
control access to individual files and directories. Each file and directory has certain access permissions that indicate who can access it and in what mode (read-only, read/write, and so on). With these
permissions, the system can, for example, protect users from accessing other users' files. Linux,
like other UNIX systems, associates three types of access permissions with files and directories: read
(r), write (w), and execute (x). As the names indicate, the read permission allows read access and
the write permission allows writing into the file or directory. The execute permission is required
to execute a file and, for obvious reasons, should be used with binary and script files that contain
executable code or commands. The Linux system uses nine bits to keep the access permissions because
there are three types of users (the owner, the group, and others), each of which can have three types of permissions.
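The nine permission bits can be decoded into the familiar rwx string shown by ls -l. This is a sketch using the POSIX mode constants from <sys/stat.h>; `perm_string` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical helper: render the nine permission bits of mode as
   "rwxrwxrwx" (owner, group, others), with '-' for a missing permission. */
static void perm_string(mode_t mode, char out[10]) {
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group  */
        S_IROTH, S_IWOTH, S_IXOTH,   /* others */
    };
    const char letters[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}
```

For example, the octal mode 0754 gives rwxr-xr--: full access for the owner, read and execute for the group, and read-only for others.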
Address bus A group of parallel wires that carry the address of a memory location or I/O port.
The width of the address bus determines the memory addressing capacity of a processor. Typically,
32-bit processors support 32-bit addresses. Thus, these processors can address up to 4 GB (2^32
bytes) of main memory.
Addressing mode Most assembly language instructions require operands. There are several
ways to specify the location of the operands. These are called the addressing modes. A complete
discussion of the addressing modes is given in Chapter 13.
ALU See Arithmetic and logic unit
Arithmetic and logic unit This unit forms the computational core of a processor. It performs the
basic arithmetic and logical operations such as integer addition, subtraction, and logical AND and
OR functions.
Assembler Assembler is a program that translates an assembly language source program to its
machine language equivalent (usually into an object file format such as ELF).
Assembler directives These directives provide information to the assembler on various aspects
of the assembly process. These instructions are also called pseudo-ops. Assembler directives are
nonexecutable and do not generate any machine language instructions.


Auxiliary flag The auxiliary flag indicates whether an operation has produced a result that has
generated a carry out of or a borrow into the low-order four bits of 8-, 16-, or 32-bit operands. The
auxiliary flag is set if there is such a carry or borrow; otherwise it is cleared.
Based addressing mode In this addressing mode, one of the registers acts as the base register
in computing the effective address of an operand. The effective address is computed by adding
the contents of the specified base register with a signed displacement value given as part of the
instruction. For 16-bit addresses, the signed displacement is either an 8- or a 16-bit number. For
32-bit addresses, it is either an 8- or a 32-bit number. Based addressing provides a convenient way
to access individual elements of a structure. Typically, a base register can be set up to point to the
base of the structure and the displacement can be used to access an element within the structure.
Based-indexed addressing mode In this addressing mode, the effective address is computed as
Base + Index + signed displacement.
The displacement can be a signed 8- or 16-bit number for 16-bit addresses; it can be a signed 8- or
32-bit number for 32-bit addresses. This addressing mode is useful in accessing two-dimensional
arrays with the displacement representing the offset to the beginning of the array. This mode can
also be used to access arrays of records where the displacement represents the offset to a field in a
record. In addition, this addressing mode is used to access arrays passed on to a procedure. In this
case, the base register could point to the beginning of the array, and an index register can hold the
offset to a specific element.
Based-indexed addressing mode with a scale factor In this addressing mode, the effective
address is computed as
Base + (Index * scale factor) + signed displacement.
This addressing mode provides an efficient indexing mechanism into a two-dimensional array
when the element size is 2, 4, or 8 bytes.
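The scaled effective-address computation can be written out in C. This sketch uses hypothetical names and a hypothetical layout: an array of 4-byte elements with 10 elements per row, stored at offset 0x1000. The displacement holds the array offset, the base register holds the row offset (row * 40), and the index register holds the column number, scaled by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address = Base + (Index * scale factor) + signed displacement,
   computed modulo 2^32 as for 32-bit addresses. */
static uint32_t effective_address(uint32_t base, uint32_t index, unsigned scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

Element [2][3] of that array is at 0x1000 + 2 * 40 + 3 * 4 = 0x105C.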
Big-endian byte order When storing multibyte data, the big-endian byte order stores the data
from the most-significant byte to the least-significant byte.
Breakpoint Breakpoint is a debugging technique. Often we know that some parts of the program
work correctly. In this case, it is a sheer waste of time to single step or trace the code. What we
would like is to execute this part of the program and then stop for more careful debugging (perhaps
by single stepping). Debuggers provide commands to set up breakpoints. The program execution
stops at breakpoints, giving us a chance to look at the state of the program.
Bus protocol When there is more than one master device, which is typically the case, the device
requesting the use of the bus sends a bus request signal to the bus arbiter using the bus request
control line. If the bus arbiter grants the request, it notifies the requesting device by sending a
signal on the bus grant control line. The granted device, which acts as the master, can then use the
bus for data transfer. The bus-request-grant procedure is called bus protocol. Different buses use
different bus protocols. In some protocols, permission to use the bus is granted for only one bus
cycle; in others, permission is granted until the bus master relinquishes the bus.
Bus transaction A bus transaction refers to the data transfers taking place on the system bus.
Some examples of bus transactions are memory read, memory write, I/O read, I/O write, and
interrupt. Depending on the processor and the type of bus used, there may be other types of
transactions. For example, the Pentium processor supports a burst mode of data transfer in which

Appendix E • Glossary


up to four 64-bit data items can be transferred in a burst cycle. Every bus transaction involves a
master and a slave. The master is the initiator of the transaction and the slave is the target of the
transaction. The processor usually acts as the master of the system bus, while components like
memory are usually slaves. Some components may act as slaves for some transactions and as
masters for other transactions.
Call-by-value parameter passing In the call-by-value mechanism, the called function is provided only the current values of the arguments for its use. Thus, in this case, the values of these
arguments are not changed in the called function; these values can only be used as in a mathematical function.
Call-by-reference parameter passing In the call-by-reference mechanism, the called function
actually receives the addresses (i.e., pointers) of the parameters from the calling function. The
function can change the contents of these parameters (and these changes will be seen by the
calling function) by directly manipulating the argument storage space.
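The two parameter-passing mechanisms can be contrasted in C, where call-by-reference is obtained by passing pointers. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Call-by-value: a and b are copies; swapping them has no effect on the caller. */
static void swap_by_value(int a, int b) {
    int t = a;
    a = b;
    b = t;
    (void)a;   /* the swapped copies are discarded on return */
    (void)b;
}

/* Call-by-reference (emulated in C with pointers): the caller's variables change. */
static void swap_by_reference(int *a, int *b) {
    assert(a != NULL && b != NULL);
    int t = *a;
    *a = *b;
    *b = t;
}
```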
Carry flag The carry flag records the fact that the result of an arithmetic operation on unsigned
numbers is out of range (too big or too small) to fit the destination register or memory location.
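For 8-bit unsigned arithmetic, CF can be computed as below: a sum that needs a ninth bit, or a subtraction that needs a borrow, sets it. This is a sketch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CF after an 8-bit add: set when the true sum does not fit in 8 bits. */
static uint8_t add8(uint8_t a, uint8_t b, int *cf) {
    unsigned sum = (unsigned)a + b;
    *cf = sum > 0xFF;
    return (uint8_t)sum;
}

/* CF after an 8-bit subtract: set when a borrow is needed (unsigned a < b). */
static uint8_t sub8(uint8_t a, uint8_t b, int *cf) {
    *cf = a < b;
    return (uint8_t)(a - b);
}
```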
Clock A clock is a sequence of 1s and 0s. We refer to the period during which the clock is 1 as
the ON period and the period with 0 as the OFF period. Even though we normally use symmetric
clock signals with equal ON and OFF periods, clock signals can take asymmetric forms.
Clock cycle A clock cycle is defined as the time between two successive rising edges or between
successive falling edges.
Clock frequency Clock frequency is the number of clock cycles per second; its unit, one cycle per
second, is called a hertz (Hz). The abbreviation MHz refers to millions of cycles per second.
Clock period The clock period is defined as the time represented by one clock cycle.
Column-major order As the memory is a one-dimensional structure, we need to transform a
multidimensional array to a one-dimensional structure. In the column-major order, array elements
are stored column by column. This ordering is shown in Figure 13.5b. Column-major ordering is
used in FORTRAN.
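The byte offset of an element follows directly from the storage order. This C sketch uses 0-based indices (FORTRAN itself indexes from 1); the function names are hypothetical, and the row-major form is included only for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major: each column of `rows` elements is stored contiguously. */
static size_t column_major_offset(size_t i, size_t j, size_t rows, size_t elem_size) {
    assert(i < rows);
    return (j * rows + i) * elem_size;
}

/* Row-major (the order C uses): each row of `cols` elements is contiguous. */
static size_t row_major_offset(size_t i, size_t j, size_t cols, size_t elem_size) {
    assert(j < cols);
    return (i * cols + j) * elem_size;
}
```

For a 3 x 4 array of 4-byte elements, element [1][2] is at byte offset 28 in column-major order but 24 in row-major order.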
Combinational circuits The output of a combinational circuit depends only on the current inputs
applied to the circuit. The adder is an example of a combinational circuit.
Control bus The control bus consists of a set of control signals. Typical control signals include
memory read, memory write, I/O read, I/O write, interrupt, interrupt acknowledge, bus request,
and bus grant. These control signals indicate the type of action taking place on the system bus. For
example, when the processor is writing data into the memory, the memory write signal is asserted.
Similarly, when the processor is reading from an I/O device, the I/O read signal is asserted.
Data bus A group of parallel wires that carry the data between the processor and memory or I/O
device. The width of data bus indicates the size of the data transferred between the processor and
memory or I/O device.
DDD The Data Display Debugger (DDD) provides a visual interface to command-line debuggers like GDB. For more details on this debugger interface, see Chapter 8.
Decoder A decoder is useful in selecting one-out-of-N lines. The input to a decoder is an I-bit
binary (i.e., encoded) number and the output is 2^I bits of decoded data (so N = 2^I). Among the 2^I outputs of
a decoder, only one output line is 1 at any time.
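A decoder's behavior can be modeled in C by representing the 2^I output lines as the bits of an integer. This is a sketch limited to I <= 5 so that all outputs fit in 32 bits; `decode` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* I-input decoder: output line k is bit k of the result. Exactly one line,
   the one selected by the I-bit input, is 1. */
static uint32_t decode(unsigned input, unsigned i_bits) {
    assert(i_bits <= 5 && input < (1u << i_bits));
    return 1u << input;
}
```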


Define directive In the assembly language, allocation of storage space is done by the define
assembler directive. The define directive can be used to reserve and initialize one or more bytes.
However, no interpretation (as in a C variable declaration) is attached to the contents of these
bytes. It is entirely up to the program to interpret the bit pattern stored in the space reserved for
data.
Demultiplexer A demultiplexer has n selection inputs, 2^n data outputs, and one data input.
Depending on the value of the selection input, the data input is connected to the corresponding
data output.
Direct addressing mode This is a memory addressing mode. In this addressing mode, the offset
value is specified directly as part of the instruction. In an assembly language program, this value
is usually indicated by the variable name of the data item. The assembler translates this name into
its associated offset value during the assembly process. To facilitate this translation, the assembler
maintains a symbol table. This addressing mode is the simplest of all the memory addressing
modes. A restriction associated with the memory addressing modes is that these can be used to
specify only one operand.
Direction flag The direction flag determines the direction of string processing done by the string
instructions. If the direction flag is clear, string operations proceed in the forward direction (from
head to tail of a string); otherwise, string processing is done in the opposite direction.
Effective address To locate a data item in the data segment, we need two components: the
segment start address and an offset value within the segment. The start address of the segment is
typically found in the DS register. The offset value is often called the effective address.
Executable instructions These instructions tell the processor what to do. Each executable
instruction consists of an operation code (opcode for short). Executable instructions cause the
assembler to generate machine language instructions. As stated in Chapter 1, each executable
statement typically generates one machine language instruction.
Exceptions An exception is a type of interrupt that is generated by the processor. The exceptions
are classified into faults, traps, and aborts depending on the way they are reported and whether the
interrupted instruction is restarted. Faults and traps are reported at instruction boundaries. Faults
use the boundary before the instruction during which the exception was detected. When a fault
occurs, the system state is restored to the state before the current instruction so that the instruction can be restarted. The divide error, for instance, is a fault detected during the div or idiv
instruction. Traps are reported at the instruction boundary immediately following the instruction
during which the exception was detected. For instance, the overflow exception (interrupt 4) is
a trap. Aborts are exceptions that report severe errors. Examples include hardware errors and
inconsistent values in system tables.
EXTERN directive The extern directive is used to tell the assembler that certain labels are
not defined in the current source file (i.e., module), but can be found in other modules. Thus,
the assembler leaves "holes" in the corresponding object file that the linker will fill in later. This
directive and the global directive facilitate separate assembly of source modules.
Fanin Fanin specifies the maximum number of inputs a logic gate can have.
Fanout Fanout refers to the driving capacity of an output. Fanout specifies the maximum number
of gates that the output of a gate can drive.
Faults See Exceptions


Fetch-decode-execute cycle See Processor execution cycle
Full mapping Full mapping is useful in mapping a memory module to the memory address
space. It refers to a one-to-one mapping function between the memory address and the address in
memory address space. Thus, for each address value in memory address space that has a memory
location mapped, there is one and only one memory location responding to the address. Full
mapping, however, requires a more complex circuit to generate the chip select signal that is often
not necessary.
GDB GDB is the GNU debugger, a command-line debugger. For more details on this
debugger, see Chapter 8.
GLOBAL directive NASM provides the global directive to make the associated label(s) available to other modules of the program. This directive is useful in writing multimodule programs.
Microsoft and Borland assemblers use the public directive for this purpose. This directive and the
extern directive facilitate separate assembly of source modules.
Hardware interrupts Hardware interrupts are of hardware origin and asynchronous in nature.
These interrupts are used by I/O devices such as the keyboard to get the processor's attention.
Hardware interrupts can be divided into either maskable or nonmaskable interrupts (see Figure 20.1). A nonmaskable interrupt (NMI) can be triggered by applying an electrical signal on
the NMI pin of the processor. This interrupt is called nonmaskable because the processor always
responds to this signal. In other words, this interrupt cannot be disabled under program control.
Most hardware interrupts are of maskable type. To cause this type of interrupt, an electrical signal
should be applied to the INTR (INTerrupt Request) input of the processor. The processor recognizes the INTR interrupt only if the interrupt enable flag (IF) bit of the flags register is set to 1.
Thus, these interrupts can be masked or disabled by clearing the IF bit.
I/O port An I/O port can be thought of as the address of a register associated with an I/O
controller.
Immediate addressing mode In this addressing mode, data is specified as part of the instruction
itself. As a result, even though the data is in memory, it is located in the code segment, not in the
data segment. This addressing mode is typically used in instructions that require at least two data
items to manipulate. In this case, this mode can only specify the source operand and immediate
data is always a constant. Thus, instructions typically use another addressing mode to specify the
destination operand.
Indexed addressing mode In this addressing mode, the effective address is computed as
(Index * scale factor) + signed displacement.
For 16-bit addresses, no scaling factor is allowed (see Table 13.1 on page 275). For 32-bit addresses, a scale factor of 2, 4, or 8 can be specified. Of course, we can use a scale factor in the
16-bit addressing mode by using an address size override prefix. The indexed addressing mode
is often used to access elements of an array. The beginning of the array is given by the displacement, and the value of the index register selects an element within the array. The scale factor is
particularly useful to access arrays whose element size is 2, 4, or 8 bytes.
Indirect addressing mode This is a memory addressing mode. In this addressing mode, the offset
or effective address of the data is in one of the general registers. For this reason, this addressing
mode is sometimes referred to as the register indirect addressing mode.


Interrupt enable flag See Hardware interrupts
Interrupts Interrupt is a mechanism by which a program's flow control can be altered. Interrupts
provide a mechanism similar to that of a procedure call. Causing an interrupt transfers control to a
procedure, which is referred to as an interrupt service routine (ISR). An ISR is sometimes called
a handler. When the ISR is completed, the interrupted program resumes execution as if it were
not interrupted. This behavior is analogous to a procedure call. There are, however, some basic
differences between procedures and interrupts that make interrupts almost indispensable. One of
the main differences is that interrupts can be initiated by both software and hardware. In contrast,
procedures are purely software-initiated. The fact that interrupts can be initiated by hardware is
the principal factor behind much of the power of interrupts. This capability gives us an efficient
way by which external devices can get the processor's attention.
Isolated I/O In isolated I/O, I/O ports are mapped to an I/O address space that is separate from
the memory address space. In architectures such as the IA-32, which use the isolated I/O, special
I/O instructions are needed to access the I/O address space. The IA-32 instruction set provides two
instructions, in and out, to access I/O ports. The in instruction can be used to read from an
I/O port and the out instruction for writing to an I/O port.
Linker Linker is a program that takes one or more object programs as its input and produces
executable code.
Little-endian byte order When storing multibyte data, the little-endian byte order stores the data
from the least-significant byte to the most-significant byte. The Intel 32-bit processors such as the
Pentium use this byte order.
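As a small illustration, the following C sketch writes a 32-bit value byte by byte in little-endian order and checks the host's own byte order. The function names store_le32 and host_is_little_endian are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into buf in little-endian order:
   least-significant byte first, regardless of the host's byte order. */
void store_le32(uint8_t buf[4], uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        buf[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Return 1 if the host stores multibyte data in little-endian order
   (as IA-32 processors do), 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* copy the byte at the lowest address */
    return first == 1;
}
```

For 0x12345678, store_le32 produces the byte sequence 78 56 34 12, which is exactly how a Pentium lays the value out in memory.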
Machine language Machine language is a close relative of assembly language. Typically,
there is a one-to-one correspondence between assembly language and machine language instructions. The processor understands only machine language, whose instructions consist of
strings of 1s and 0s.
Macros Macros provide a sophisticated text substitution mechanism. Macros permit the assembly
language programmer to name a group of statements and refer to the group by the macro name.
During the assembly process, each macro is replaced by the group of statements that it represents
and assembled in place. This process is referred to as macro expansion. Macros are discussed in
detail in Chapter 10.
Maskable interrupts See Hardware interrupts
Memory address space This refers to the amount of memory that a processor can address.
Memory address space depends on the system address bus width. Typically, 32-bit processors
support 32-bit addresses. Thus, these processors can address up to 4 GB ($2^{32}$ bytes) of main
memory. The actual memory in a system, however, is always less than or equal to the memory
address space. The amount of memory in a system is determined by how much of this memory
address space is populated with memory chips.
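The 4 GB figure follows directly from the address width. A minimal C sketch, with the hypothetical helper name address_space_bytes, computes $2^{k}$ bytes for a k-bit address (it assumes k is less than 64):

```c
#include <stdint.h>

/* Number of bytes addressable with an address of the given width.
   64-bit arithmetic is used so that a 32-bit width does not overflow. */
uint64_t address_space_bytes(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;   /* 2^address_bits bytes */
}
```

For a 32-bit address this gives 4,294,967,296 bytes (4 GB); for the 8086's 20-bit address it gives 1,048,576 bytes (1 MB).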
Memory-mapped I/O In memory-mapped I/O, I/O ports are mapped to memory addresses. In
systems that use memory-mapped I/O, writing to an I/O port is similar to writing to a memory
location.
Multiplexer A multiplexer is characterized by $2^n$ data inputs, $n$ selection inputs, and a single
output. It connects one of the $2^n$ inputs, selected by the selection inputs, to the output.
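The selection behavior can be modeled in software. This C sketch, with the hypothetical function mux, treats the n selection inputs as an unsigned number that indexes one of the $2^n$ data inputs; it assumes n is less than 32.

```c
/* Model of a 2^n-to-1 multiplexer: the n-bit select value picks exactly
   one of the 2^n data inputs and routes it to the single output. */
int mux(const int inputs[], unsigned n, unsigned select)
{
    unsigned mask = (1u << n) - 1;   /* only n selection lines exist */
    return inputs[select & mask];
}
```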
Nonmaskable interrupts See Hardware interrupts

Appendix E • Glossary

Offset See Effective address
Overflow flag The overflow flag is the carry flag counterpart for the signed number arithmetic.
The main purpose of the overflow flag is to indicate whether an operation on signed numbers has
produced a result that is out of range.
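A minimal C sketch of how the overflow flag would be set after a 32-bit signed addition, assuming two's complement operands; the function name add_overflow_flag is hypothetical. The rule used is that signed overflow occurs exactly when both operands have the same sign and the result's sign differs from it.

```c
#include <stdint.h>

/* Overflow flag (OF) after the 32-bit signed addition a + b.
   The sum is formed in unsigned arithmetic, which wraps modulo 2^32 like
   the processor's adder and avoids undefined behavior in C. */
int add_overflow_flag(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua + ub;
    /* Bit 31 is set in both (ua ^ r) and (ub ^ r) only when the result's
       sign differs from the sign of both operands. */
    return (int)((((ua ^ r) & (ub ^ r)) >> 31) & 1u);
}
```

For example, adding 1 to the largest positive 32-bit integer sets OF, while adding -1 and 1 does not.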
PALs See Programmable array logic device
Parameter passing Parameter passing in assembly language is different and more complicated
than that used in high-level languages. In assembly language, the calling procedure first places
all the parameters needed by the called procedure in a mutually accessible storage area (usually
registers or memory). Only then can the procedure be invoked. There are two common methods
depending on the type of storage area used to pass parameters: register method or stack method.
As their names imply, the register method uses general-purpose registers to pass parameters, and
the stack is used in the other method.
Parity flag The parity flag indicates the parity of the 8-bit result produced by an operation; if this
result is 16 or 32 bits long, only the lower-order 8 bits are considered to set or clear the parity flag.
The parity flag is set if the byte contains an even number of 1 bits; if there are an odd number of 1
bits, it is cleared. In other words, the parity flag indicates an even parity condition of the byte.
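The rule above can be expressed as a short C sketch; parity_flag is a hypothetical name, and only the low-order byte of the result is examined, as described.

```c
#include <stdint.h>

/* Parity flag (PF): examine only the low-order 8 bits of the result and
   return 1 if they contain an even number of 1 bits, 0 if odd. */
int parity_flag(uint32_t result)
{
    uint8_t low = (uint8_t)result;   /* higher-order bits are ignored */
    int ones = 0;
    while (low != 0) {
        ones += low & 1u;
        low >>= 1;
    }
    return (ones % 2) == 0;
}
```

Note that a result of zero has no 1 bits, an even count, so PF is set.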
Partial mapping Partial mapping is useful in mapping a memory module to the memory address space. This mapping reduces the complexity associated with full mapping by mapping each
memory location to more than one address in the memory address space. Typically, the number of
addresses a location is mapped to is a power of 2.
Path name A path name specifies the location of a file or directory in a hierarchical file system.
A path can be specified as an absolute path or a relative path. In the former specification, you
give the location of a file/directory starting from the root directory. An absolute path always begins
with the root directory (/). In contrast, a relative path specifies the path relative to your current
directory.
Pipe Linux provides several commands, which can be treated as the basic building blocks. Often,
we may need several commands to accomplish a complicated task. We may have to feed the output
of one command as input to another to accomplish a task. The shell provides the pipe operator (|)
to achieve this. The syntax is
command1 | command2
The output of the first command (command1) is fed as input to the second command (command2).
The output of command2 is the final output. Of course, we can generalize this to connect several
commands.
Processor execution cycle The processor execution cycle consists of the following: (i) Fetch
an instruction from the memory; (ii) Decode the instruction (i.e., identify the instruction); (iii)
Execute the instruction (i.e., perform the action specified by the instruction).
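To make the three steps concrete, here is a C sketch of a fetch-decode-execute loop for a hypothetical toy machine, not the IA-32: each instruction is two bytes, an opcode followed by an operand, and the opcode names and the run function are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical toy instruction set used only to illustrate the cycle. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_SUB = 3 };

/* Runs the program stored at address 0 and returns the accumulator. */
int run(const uint8_t memory[], int size)
{
    int pc = 0;     /* program counter: address of the next instruction */
    int acc = 0;    /* accumulator register */

    while (pc + 1 < size) {
        /* (i) Fetch: read the instruction and advance the program counter */
        uint8_t opcode = memory[pc];
        uint8_t operand = memory[pc + 1];
        pc += 2;

        /* (ii) Decode the opcode and (iii) execute the action it specifies */
        switch (opcode) {
        case OP_LOAD: acc = operand; break;
        case OP_ADD:  acc += operand; break;
        case OP_SUB:  acc -= operand; break;
        case OP_HALT:
        default:      return acc;   /* halt, or stop on an unknown opcode */
        }
    }
    return acc;
}
```

For example, the program {OP_LOAD, 5, OP_ADD, 10, OP_SUB, 3, OP_HALT, 0} leaves 12 in the accumulator.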
Programmable array logic device A programmable array logic device is very similar to the
PLA except that there is no programmable OR array. Instead, the OR connections are fixed.
This reduces the complexity by cutting down the set of fuses in the OR array. Due to their cost
advantage, most manufacturers produce only PALs.
Programmable logic array A programmable logic array is a field-programmable device to
implement sum-of-products expressions. It consists of an AND array and an OR array. A PLA
takes N inputs and produces M outputs. Each input is a logical variable. Each output of a PLA
represents a logical function output. Internally, each input is complemented, and a total of 2N
inputs is connected to each AND gate in the AND array through a fuse. Each AND gate can be
used to implement a product term in the sum-of-products expression. The OR array is organized
similarly except that the inputs to the OR gates are the outputs of the AND array. Thus, the number
of inputs to each OR gate is equal to the number of AND gates in the AND array. The output of
each OR gate represents a function output.
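The fuse structure can be modeled in software. This C sketch is hypothetical (the ProductTerm type and the pla_output function are invented for illustration): each AND gate is described by masks of the true and complemented inputs its intact fuses connect, and each OR gate by a mask of the product terms it uses. It assumes at most 32 inputs and 32 product terms.

```c
#include <stdint.h>

/* One AND gate: which inputs are connected uncomplemented or complemented. */
typedef struct {
    uint32_t true_mask;     /* inputs used uncomplemented */
    uint32_t comp_mask;     /* inputs used complemented   */
} ProductTerm;

/* A product term is 1 only if every connected literal is 1. */
static int and_term(ProductTerm t, uint32_t inputs)
{
    return (inputs & t.true_mask) == t.true_mask
        && (~inputs & t.comp_mask) == t.comp_mask;
}

/* One PLA output: the OR of the product terms selected by or_mask. */
int pla_output(const ProductTerm terms[], int nterms,
               uint32_t or_mask, uint32_t inputs)
{
    for (int k = 0; k < nterms; k++) {
        if (((or_mask >> k) & 1u) && and_term(terms[k], inputs))
            return 1;
    }
    return 0;
}
```

For example, a two-input XOR of A (bit 0) and B (bit 1) uses the product terms A AND NOT B and NOT A AND B, with an OR mask selecting both.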
PLA See Programmable logic array
Propagation delay Propagation delay represents the time required for the output of a circuit
to react to an input. The propagation delay depends on the complexity of the circuit and the
technology used.
Protected-mode memory architecture The IA-32 architecture supports a sophisticated memory
architecture under real and protected modes. The protected mode uses 32-bit addresses and is the
native mode of the IA-32 architecture. In the protected mode, both segmentation and paging are
supported. Paging is useful in implementing virtual memory; it is transparent to the application
program, but segmentation is not.
Queue A queue is a first-in-first-out (FIFO) data structure. A queue can be considered as a linear
array with insertions done at one end of the array and deletions at the other end.
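A minimal C sketch of a queue implemented as a circular array with a fixed capacity; the names Queue, queue_init, enqueue, and dequeue are hypothetical.

```c
/* Fixed-capacity FIFO queue of ints stored in a circular array:
   insertions happen at the tail, deletions at the head. */
#define QUEUE_CAPACITY 8

typedef struct {
    int items[QUEUE_CAPACITY];
    int head;    /* index of the oldest element (next to delete) */
    int count;   /* number of stored elements */
} Queue;

void queue_init(Queue *q) { q->head = 0; q->count = 0; }

/* Insert at the tail; returns 0 if the queue is full. */
int enqueue(Queue *q, int value)
{
    if (q->count == QUEUE_CAPACITY) return 0;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
    q->count++;
    return 1;
}

/* Delete from the head; returns 0 if the queue is empty. */
int dequeue(Queue *q, int *value)
{
    if (q->count == 0) return 0;
    *value = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 1;
}
```

The modulo arithmetic lets the tail wrap around to the start of the array, so deleted slots are reused.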
Real-mode memory architecture The IA-32 architecture supports a sophisticated memory architecture under real and protected modes. The real mode, which uses 16-bit addresses, is provided
to run programs written for the 8086 processor. In this mode, it supports the segmented memory
architecture of the 8086 processor.
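The 8086 forms a 20-bit physical address from a 16-bit segment value and a 16-bit offset by multiplying the segment value by 16 and adding the offset. Below is a C sketch with the hypothetical function real_mode_physical; the 20-bit mask models the 8086's wraparound at 1 MB, an assumption that does not hold on later processors with the A20 address line enabled.

```c
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset, kept to 20 bits. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, segment 0x1234 with offset 0x0010 gives physical address 0x12350.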
Register addressing mode In this addressing mode, the processor's internal registers contain the
data to be manipulated by an instruction. Register addressing mode is the most efficient way of
specifying operands because they are within the processor and, therefore, no memory access is
required.
Row-major order As the memory is a one-dimensional structure, we need to transform a multidimensional array to a one-dimensional structure. In the row-major order, array elements are
stored row by row. This ordering is shown in Figure 13.5a. Row-major ordering is used in most
high-level languages including C.
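In row-major order, element [i][j] of an array with ncols columns is stored i*ncols + j elements from the start of the array. This C sketch (the helper names are hypothetical) computes that index and, for comparison, the column-major index; multiplying either by the element size gives the byte displacement an assembly language program would add to the array's base address.

```c
#include <stddef.h>

/* Row-major: rows are stored one after another. */
size_t row_major_index(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;
}

/* Column-major, for comparison: columns are stored one after another. */
size_t column_major_index(size_t i, size_t j, size_t nrows)
{
    return j * nrows + i;
}
```

For an int a[3][4] in C, a[2][1] is element 2*4 + 1 = 9 from the start, that is, 36 bytes in with 4-byte ints.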
Segment descriptors A segment descriptor provides the attributes of a segment. These attributes
include its 32-bit base address, 20-bit segment size, as well as control and status information.
Segment registers In the IA-32 architecture, these registers support the segmented memory
organization. In this organization, memory is partitioned into segments, where each segment is a
small part of the memory. The processor, at any point in time, can only access up to six segments
of the main memory. The six segment registers point to where these segments are located in the
memory.
Sequential circuits The output of a sequential circuit depends not only on the current inputs but
also on the past inputs. That is, output depends both on the current inputs as well as on how it got
to the current state. For example, in a binary counter, the output depends on the current value. The
next value is obtained by incrementing the current value (in a way, the current state represents a
snapshot of the past inputs). That is, we cannot say what the output of a counter will be unless we
know its current state. Thus, the counter is a sequential circuit.

Shell The shell can be thought of as the user's interface to the operating system. It acts as the
command-line interpreter. Several popular shells are available, including the Bourne shell (sh), C shell (csh),
Korn shell (ksh), and Bourne Again shell (bash); bash is the default
shell in Fedora 3.
Sign flag The sign flag indicates the sign of the result of an operation. Therefore, it is useful only
when dealing with signed numbers. Note that the most significant bit is used to represent the sign
of a number: 0 for positive numbers and 1 for negative numbers. The sign flag gets a copy of the
sign bit of the result produced by arithmetic and related operations.
Single-stepping Single-stepping is a debugging technique. To isolate a bug, program execution
should be observed in slow motion. Most debuggers provide a command to execute the program
in single-step mode. In this mode, a program executes a single statement and pauses. Then we can
examine contents of registers, data in memory, stack contents, and so on.
Software interrupts Software interrupts are caused by executing the int instruction. Thus
these interrupts, like procedure calls, are anticipated or planned events. The main use of software
interrupts is in accessing I/O devices such as the keyboard, printer, display screen, disk drive, and
so on.
Stack A stack is a last-in-first-out (LIFO) data structure. The operation of a stack is analogous
to the stack of trays you find in cafeterias. The first tray removed from the stack of trays would be
the last tray that had been placed on the stack. There are two operations associated with a stack:
insertion and deletion. In stack terminology, insert and delete operations are referred to as push
and pop operations, respectively.
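A minimal C sketch of a fixed-capacity stack with push and pop; the names Stack, push, and pop are hypothetical. One simplification: this version grows toward higher array indices, whereas the IA-32 stack grows toward lower memory addresses.

```c
/* Fixed-capacity LIFO stack of ints. */
#define STACK_CAPACITY 8

typedef struct {
    int items[STACK_CAPACITY];
    int top;   /* number of elements; items[top - 1] is the top of stack */
} Stack;

void stack_init(Stack *s) { s->top = 0; }

/* push: returns 0 on stack overflow */
int push(Stack *s, int value)
{
    if (s->top == STACK_CAPACITY) return 0;
    s->items[s->top++] = value;
    return 1;
}

/* pop: returns 0 on stack underflow */
int pop(Stack *s, int *value)
{
    if (s->top == 0) return 0;
    *value = s->items[--s->top];
    return 1;
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1: the last tray placed is the first removed.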
Status flags Status flags are used to monitor the outcome of the arithmetic, logical, and related
operations. There are six status flags. These are the zero flag (ZF), carry flag (CF), overflow flag
(OF), sign flag (SF), auxiliary flag (AF), and parity flag (PF). When an arithmetic operation is
performed, some of the flags are updated (set or cleared) to indicate certain properties of the result
of that operation. For example, if the result of an arithmetic operation is zero, the zero flag is set
(i.e., ZF = 1). Once the flags are updated, we can use conditional branch instructions to alter the flow of
control.
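As a sketch of how some of these flags relate to a result, the following C code computes ZF, SF, and CF after a 32-bit addition. The Flags type and the flags_after_add function are hypothetical, and OF and PF are omitted since they follow the rules given in their own entries.

```c
#include <stdint.h>

typedef struct { int zf, sf, cf; } Flags;

/* Flags after the 32-bit addition a + b; the sum is stored in *result. */
Flags flags_after_add(uint32_t a, uint32_t b, uint32_t *result)
{
    Flags f;
    uint32_t r = a + b;        /* wraps modulo 2^32, like the adder */
    f.zf = (r == 0);           /* zero flag: result is zero */
    f.sf = (int)(r >> 31);     /* sign flag: copy of the result's sign bit */
    f.cf = (r < a);            /* carry flag: carry out of bit 31 */
    *result = r;
    return f;
}
```

For example, 0xFFFFFFFF + 1 wraps to zero and sets both ZF and CF, while 0x7FFFFFFF + 1 sets SF but not CF.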
Symbolic debugging Symbolic debugging allows us to debug using the source-level statements.
However, to facilitate symbolic debugging, we need to pass the source code and symbol table
information to the debugger. The GNU debugger expects the symbolic information in the stabs
format. More details on this topic are given in Chapter 8.
System bus A system bus interconnects the three main components of a computer system: a
central processing unit (CPU) or processor, a memory unit, and input/output (I/O) devices. The
three major components of the system bus are the address bus, data bus, and control bus (see
Figure 2.1).
Top of stack If we view the stack as a linear array of elements, stack insertion and deletion
operations are restricted to one end of the array. The top-of-stack (TOS) identifies the only element
that is directly accessible from the stack.
TOS See Top of stack
Trace Tracing is a debugging technique similar to single-stepping. In the single-step mode, a
procedure call is treated as a single statement and the entire procedure is executed before pausing
the program. This is useful if you know that the called procedure works correctly. Trace, on the
other hand, can be used to single-step even the statements of a procedure call, which is useful to
test procedures.

Traps See Exceptions
Tristate buffers Tristate buffers can be in three states: 0, 1, or Z state. A tristate buffer output
can be in state 0 or 1 just as with a normal logic gate. In addition, the output can also be in a
high impedance (Z) state, in which the output floats. Thus, even though the output is physically
connected to the bus, it behaves as though it is electrically and logically disconnected from the bus.
Tristate buffers use a separate control signal so that the output can be in a high impedance state,
independent of the data input. This particular feature makes them suitable for bus connections.
Web browser A web browser is an Internet application that allows you to surf the web. Netscape Navigator,
Mozilla Firefox, and Microsoft Internet Explorer are some of the popular web browsers.
Zero flag The purpose of the zero flag (ZF) is to indicate whether the execution of the last
instruction that affects the zero flag has produced a zero result. If the result was zero, ZF = 1;
otherwise, ZF = 0.

Index
Symbols
.CODE macro, 156
.DATA macro, 156
.EXIT macro, 156
.STARTUP macro, 156
.UDATA macro, 156
$, location counter, 282, 364
%include directive, 156
1's complement, 467
2's complement, 468
80286 processor, 61
80386 processor, 62
80486 processor, 62
8080 processor, 61
8086 family processors, 61-62

A
aborts, 408
absolute path, 139
access permissions, 141
octal mode, 143
setting, 143
symbolic mode, 144
Ackermann's function, 481
activation record, 256, 392
adders, 26
carry lookahead adders, 28
full-adder, 27
half-adder, 26
ripple-carry adders, 27
address bus, 11
address size override prefix, 275
address translation, 73
protected mode, 67, 68
real mode, 73

addressing modes, 193-196, 273-278
16-bit, 274
32-bit, 274
based addressing mode, 276
based-indexed addressing mode, 278
direct addressing mode, 194
immediate addressing mode, 194
indexed addressing mode, 277
indirect addressing mode, 195
register addressing mode, 193
alignment check flag, 66
ALUs, see arithmetic logic units
AND gate, 13
arithmetic logic units, 32
arrays, 278-289
column-major order, 280
multidimensional, 279
one-dimensional, 278
row-major order, 280
ASCII addition, 381
multidigit, 384
ASCII division, 383
ASCII multiplication, 383
ASCII number representation, 380
ASCII subtraction, 382
ASCIIZ string, 364
assembler directives, 187
assembly language
advantages, 7-8
applications, 8
what is it, 5-6
assembly process, 160
AT&T syntax, 434
addressing, 435
operand size, 434
register naming, 434
auxiliary flag, 299

B
based addressing mode, 276
based-indexed addressing mode, 278
bash, 135
BCD number representation, 380
packed, 381
unpacked, 380
binary numbers, 463
conversion, 464, 465
binary search, 285
bit, 45
bit manipulation, 348
clearing bits, 343
cutting and pasting, 344
isolating bits, 343
toggling, 345
Boolean algebra, 18-19
identities, 18
breakpoint interrupt, 410
bubble notation, 17
bubble sort, 262
building larger memories, 52
burst cycle, 12
bus cycle, 12
bus grant, 13
bus protocol, 13
bus request, 13
bus transactions, 12
byte, 45
byte addressable memory, 45
byte ordering, 58
big-endian, 58
little-endian, 58
C
call-by-reference, 232
call-by-value, 232
calling assembly procedures from C, 424
calling C from assembly, 432
carry flag, 294
cat command, 140
cd command, 140
changing password, 120
character representation, 473-474
extended ASCII, 474
chip select, 51, 54, 57, 58
chmod command, 143
clobber list, 438
clock cycle, 36
clock frequency, 36
clock period, 36
clock signal, 35-37
cycle, 36
falling edge, 36
frequency, 36
period, 36
rising edge, 36
column-major order, 280
command line completion, 136
commands
cat, 140
cd, 140
chmod, 143
cp, 141
echo, 137
grep, 146
head, 141
history, 136
less, 141
ls, 140, 142, 143
man, 134
mkdir, 140
more, 140
mv, 141
passwd, 135, 137
ps, 138
pwd, 140
rm, 140, 141
rmdir, 140
set, 146
sort, 146
su, 138
tail, 141
uname, 137
wc, 145
whereis, 137
comparators, 26
control bus, 12
counters, 41
cp command, 141
CPUID instruction, 66
D
data alignment, 59-60
2-byte data, 60
4-byte data, 60
8-byte data, 60
hard alignment, 60
soft alignment, 60
data allocation, 188-192
define directives, 189-191
multiple definitions, 191-192
multiple initializations, 192
data bus, 11
Data display debugger (DDD), 179-183
DB directive, 189
DD directive, 189
decoders, 26
dedicated interrupts, 409
demultiplexers, 25
denormalized values, 470
direction flag, 366
DQ directive, 189
DT directive, 189
DW directive, 189
E
echo command, 137
effective address, 72, 194, 195
EQU directive, 217
even parity, 343, 344
exceptions, 404, 408
aborts, 408
faults, 408
segment-not-present, 70, 409
traps, 408, 409
excess-M number representation, 466
exclusive-OR gate, 13
executable instructions, 187
execution cycle, 63
EXTERN directive, 260
F
factorial, 391-394
recursive procedure, 392
faults, 408
Fibonacci number, 401
file descriptor, 411
file pointer, 411
file system
browsing, 126
firewall setup, 100
flags register, 66, 292-302
auxiliary flag, 299
carry flag, 294
CF, 294
direction flag, 366, 420
IF flag, 418
OF, 296
overflow flag, 296
parity flag, 300
PF, 300
SF, 298
sign flag, 298
status flags, 292-302
trap flag, 409
zero flag, 292
ZF, 292
flat segmentation model, 71
flip-flops, 39-40
floating-point, 469-471
denormals, 452, 470
formats, 444
IEEE 754, 470
representation, 469
special values, 470
∞, 470
NaN, 470
zero, 470
floating-point unit organization, 444
frame pointer, 245, 256
full-adder, 27
G
GDB, 170-178
commands, 171-173
gedit, 127
GetInt8, 313
getting help, 134
GLOBAL directive, 260
GNOME desktop, 126
grep command, 146
H
half-adder, 26
hardware interrupts, 404, 418
INTA signal, 419
INTR input, 418
maskable, 405, 418
NMI, 418
nonmaskable, 405, 418
head command, 141
hexadecimal numbers, 463
high-level language interface, 423-441
assembling, 424
calling assembly procedures from C, 424
calling C from assembly, 432
externals, 427
globals, 427
inline assembly, 434-441
parameter passing, 425
preserving registers, 427
returning values, 427
high-level language structures
switch, 337
history command, 136
HOME, 139
I
I/O address space, 419
I/O controller, 76
I/O device, 76
I/O ports, 77, 419
16-bit ports, 419
32-bit ports, 419
8-bit ports, 419
accessing, 419
in, 419
ins, 420
out, 420
outs, 420
I/O routines, 157
GetCh, 156
GetInt, 158
GetLInt, 158
GetStr, 157
PutCh, 156
PutInt, 158
PutLInt, 158
PutStr, 157
IA-32 flags register, 66
IA-32 instructions
aaa, 380-382, 488
aad, 380, 383, 488
aam, 380, 383, 488
aas, 380, 382, 488
adc, 489
add, 198, 489
and, 203, 342, 489
arithmetic instructions, 302-309
bit instructions, 354-355
brf, 355
bsf, 355, 489
bsr, 490
bswap, 212, 490
bt, 355, 490
btc, 355, 490
btr, 355, 491
bts, 355, 491
call, 239, 378, 491
cbw, 308, 491
cdq, 308, 492
clc, 492
cld, 366, 492
cli, 407, 418, 492
cmc, 492
cmp, 199, 493
cmps, 370, 493
conditional jump, 500
cwd, 308, 494
cwde, 308, 494
daa, 381, 385, 494
das, 381, 386, 495
dec, 197, 296, 495
div, 306, 409, 495
division instructions, 306
doubleshift instructions, 352
enter, 247, 259, 496
hlt, 496
idiv, 306, 409, 497
imul, 305, 497
in, 419, 498
inc, 197, 296, 498
ins, 420, 498
insb, 498
insd, 498
insw, 498
int, 410, 499
into, 499
iret, 499
repne/repnz, 509
repnz, 366
repz, 366
ret, 241, 245, 509
rol, 215, 510
ror, 215, 510
rotate instructions, 353-354
sahf, 510
sal, 350, 511
sar, 350, 511
sbb, 511
scas, 371, 512
scasb, 371, 512
scasd, 371, 512
scasw, 371, 512
setCC, 512
sgdt, 70
shift instructions, 347-353
shl, 213, 511
shld, 352, 513
shr, 213, 511
shrd, 352, 513
sidt, 405
sldt, 70
stc, 513
std, 366, 513
sti, 407, 418, 513
stos, 368, 514
stosb, 368, 514
stosd, 368, 514
stosw, 368, 514
sub, 199, 514
test, 204, 347, 514
xchg, 212, 515
xlat, 213, 227, 515
xor, 203, 345, 515
IA-32 processor
CPUID instruction, 66
EIP register, 66
flags register, 66
alignment check flag, 66
control flags, 66
EFLAGS, 66
FLAGS, 66
interrupt flag, 66
status flags, 66
system flags, 66
trap flag, 66
VM flag, 66
zero flag, 66
floating-point instructions, 447-453
addition, 449
comparison, 451
data movement, 448
division, 451
miscellaneous, 452
multiplication, 450
subtraction, 449
floating-point registers, 444-447
floating-point unit organization, 444
instruction fetch, 75
IP register, 66
memory architecture, see memory architecture
protected mode, 67
real mode, 72
stack implementation, 234
stack operations, 236
IA-32 registers, 63-67, 444-447
control registers, 65
data registers, 64
floating-point registers, 444-447
index registers, 65
pointer registers, 65
segment registers, see segment registers
IA-32 trap flag, 66
ICs, see integrated circuits
IEEE 754 floating-point standard, 443, 470
indexed addressing mode, 277
indirect procedure call, 378
inline assembly, 434-441, 457
clobber list, 438
input/output
I/O address space, 77
isolated I/O, 77
memory-mapped I/O, 77
insertion sort, 282
installation, 92-107
getting help, 114
instruction decoding, 63
instruction execution, 63
instruction fetch, 63, 75
instruction pointer, 65
int 21H, 156
int 21H DOS services
4CH return control, 156
int 3, 410
int 4, 410
integrated circuits, 14
LSI, 14
MSI, 14
propagation delay, 14
SSI, 14
SSI chips, 14
VLSI, 14
interrupt 1, 409
interrupt 2, 418
interrupt 4, 409
interrupt descriptor table, 405
interrupt flag, 66, 418
interrupt handler, 403
interrupt processing
protected mode, 405
interrupt service routine, 403
interrupts
breakpoint, 410
dedicated, 409
descriptors, 406
divide error, 409
exceptions, 404, 408
handler, 403
hardware, 418
hardware interrupts, 404
IDT organization, 406
ISR, 403
maskable, 405
nonmaskable, 405
overflow, 410
single-step, 409
software interrupts, 404
taxonomy, 404, 407
into, 410
isolated I/O, 77
Itanium processor, 62
J
jump instructions
backward jump, 318
conditional jump, 322-327
far jump, 319
forward jump, 318
indirect jump, 335-339
intersegment jump, 319
intrasegment jump, 319
near jump, 319
SHORT directive, 319
short jump, 319
unconditional jump, 318
direct, 318
K
Karnaugh maps, 19-23
keyboard configuration, 117
L
latches, 37-39
clocked SR latch, 38
D latch, 39
SR latch, 37
ld, 166
left-pusher language, 425
less command, 141
linear address, 67
linear search, 330
linking, 166
Linux, 154
Linux system calls, 411
file system calls, 411
file close, 414
file create, 412
file open, 413
file read, 413
file write, 414
lseek, 414
local variables, 256
logic circuits
adders, 26
ALUs, 32
bubble notation, 17
comparators, 26
counters, 41
decoders, 26
demultiplexers, 25
flip-flops, 39
latches, 37
multiplexers, 24
PALs, 30
PLAs, 29
shift registers, 40
logic gates
fanin, 14
fanout, 14
propagation delay, 14
logical address, 72
logical expressions, 15
derivation, 17
even parity, 16
majority, 16
product-of-sums, 18
simplification, 18-23
Boolean algebra method, 1 i
Karnaugh map method, 19
sum-of-products, 17
ls command, 140, 142, 143
M
machine language, 4
macro directive, 218
macro expansion, 212
macro instructions, 220
macro parameters, 219
macros, 212, 218
instructions, 220
macro directive, 218
parameters, 219
man command, 134
masking bit, 343
MASM, 5
memory
Bandwidth, 46
access time, 46
address, 45
address space, 45
address translation, 73
building a block, 50
building larger memories, 52
byte addressable, 45
chip select, 51, 54, 57, 58
cycle time, 46
design with D flip-flops, 51
DRAM, 49, 53
dynamic, 49
effective address, 72
EPROM, 48
larger memory design, 53
linear address, 67
logical address, 72, 73
memory address space, 53
memory chips, 53
memory mapping, 56
full mapping, 56
partial mapping, 57
nonvolatile, 48
offset, 72
physical address, 72, 73
PROM, 48
RAM, 49
read cycle, 47
read-only, 48
read/write, 48
ROM, 48
SDRAM, 53
segmentation models, 71
segmented organization, 72
SRAM, 49
static, 49
volatile, 48
wait cycles, 47
write cycle, 47
memory access time, 46
memory address space, 45, 53
memory architecture
IA-32, 72-75
protected mode, 67
real mode, 72-74
memory bandwidth, 46
memory cycle time, 46
memory mapping, 56
full mapping, 56
partial mapping, 57
memory read cycle, 47
memory write cycle, 47
memory-mapped I/O, 77
merge sort, 483
mixed mode operation, 74
mixed-mode programs, 423
calling assembly code, 424
calling C from assembly, 432
compiling, 425
externals, 427
globals, 427
inline assembly, 434-441
parameter passing, 425
preserving registers, 427
returning values, 427
mkdir command, 140
more command, 140
mounting file system, 110-112
mouse configuration, 119
multibyte data, 58
multidimensional arrays, 279
multiplexers, 24
multisegment segmentation model, 71
mv command, 141
N
NAND gate, 13
NASM, 5, 154-156, 160-166
NOR gate, 13
NOT gate, 13
number representation
floating-point, 469-471
signed integer, 466
1's complement, 467
2's complement, 468
excess-M, 466
signed magnitude, 466
unsigned integer, 466
number systems, 461
base, 461
binary, 461, 463
conversion, 463-465
decimal, 461, 462
floating-point, 469-471
hexadecimal, 461, 463
notation, 462
octal, 461, 463
radix, 461
O
octal numbers, 463
office applications, 129
one's complement, 467
one-dimensional arrays, 278
operand size override prefix, 275
OR gate, 13
overflow flag, 296
overflow interrupt, 410
override prefix, 74
address size, 275
operand size, 275
segment override, 269
P
package management, 107
packed BCD numbers
addition, 385
processing, 385
subtraction, 386
paging, 67
PALs, see programmable array logic devices
parameter passing, 232, 242-252, 425
call-by-reference, 232
call-by-value, 232
register method, 242
stack method, 243
variable number of parameters, 268-272
parity conversion, 345
parity flag, 300
parted, 83-85
help, 84
print, 85
resize, 85
partitioning hard disk, 82-92
PartitionMagic, 88-92
passwd command, 135, 137
PATH, 136
pathnames, 139
absolute path, 139
relative path, 140
Pentium II processor, 62
Pentium Pro processor, 62
peripheral device, 76
physical address, 72
pipelining
superscalar, 62
pipes, 146
PLAs, see programmable logic arrays
preferences menu, 117
procedure template, 248
procedures
indirect call, 378
local variables, 256
product-of-sums, 18
program counter, 66
programmable array logic devices, 30
programmable logic arrays, 29
programmer productivity, 7
protected mode architecture, 67
ps command, 138
PutInt8, 311
pwd command, 140
Q
QTparted, 85-86
quicksort, 396
algorithm, 397
Pentium procedure, 397
R
real mode architecture, 72-74
real-time applications, 8
recalling a command, 136
recursion, 391-392
activation record, 392
factorial, 391
Fibonacci number, 401
versus iteration, 400
in Pentium
factorial procedure, 392
quicksort procedure, 397
quicksort algorithm, 397
redirection, 145
input, 145
output, 145
relative path, 140
right-pusher language, 425
rm command, 140, 141
rmdir command, 140
root password selection, 102
row-major order, 280
S
screen resolution configuration, 119
Screensaver configuration, 121
segment descriptor, 69-70
segment descriptor tables, 70-71
GDT, 70
IDT, 70
LDT, 70
segment override, 269
segment registers, 67-69
CS register, 67
DS register, 67
ES register, 67
FS register, 67
GS register, 67
SS register, 67
segmentation, 67
segmentation models, 71
flat, 71
multisegment, 71
segmented memory organization, 72
segment base, 72
segment offset, 72
selection sort, 332
set command, 146
setting access permissions, 143
setting date and time, 124
setting display, 125
shell, 135
shift operations, 348
shift registers, 40
SHORT directive, 319
sign bit, 466
sign extension, 305, 469
sign flag, 298
signed integer, 466
1's complement, 467
2's complement, 468
excess-M, 466
signed magnitude representation, 466
single-step interrupt, 409
software interrupts, 404, 410
exceptions, 404
system-defined, 404
user-defined, 404
sort command, 146
space-efficiency, 7
stack, 233-234
activation record, 256
frame pointer, 245, 256
IA-32 processor implementation, 234
operations, 236, 237
operations on flags, 237
overflow, 235, 239
stack frame, 244, 256
top-of-stack, 233, 234
underflow, 235, 239
use, 238
what is it, 233
stack frame, 244, 256
stack operations, 236, 237
stack overflow, 235, 239
stack underflow, 235, 239
status flags, 292-302
string processing
string compare, 375
string length, 374
string representation, 363
fixed-length, 363
variable-length, 363
su command, 138
sum-of-products, 17
superscalar, 62
symbol table, 192, 194
system bus, 11
T
truth table, 13
AND, 13
even parity, 15
majority, 15
NAND, 13
NOR, 13
NOT, 13
OR, 13
XOR, 13
two's complement, 468
type specifier, 197
BYTE, 197
DWORD, 197
QWORD, 197
TBYTE, 197
WORD, 197
types of memory, 48-50
tail command, 141
TASM, 5
time zone selection, 102
time-critical applications, 8
time-efficiency, 7
TIMES directive, 192
top-of-stack, 233, 234
towers of Hanoi, 481
trap flag, 409
traps, 408, 409
tristate buffers, 50
V
variable number of parameters, 268-272
vim editor, 147

U
uname command, 137
unsigned integer representation, 466

W
wait cycles, 47
wc command, 145
whereis command, 137
X
XOR gate, 13
Z
zero extension, 466
zero flag, 66, 292

The GNU General Public License
Version 2, June 1991
Copyright © 1989, 1991 Free Software Foundation, Inc.
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but
changing it is not allowed.

PREAMBLE
The licenses for most software are designed to take away your freedom to share and change it.
By contrast, the GNU General Public License is intended to guarantee your freedom to share and
change free software—to make sure the software is free for all its users. This General Public
License applies to most of the Free Software Foundation's software and to any other program
whose authors commit to using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public
Licenses are designed to make sure that you have the freedom to distribute copies of free software
(and charge for this service if you wish), that you receive source code or can get it if you want it,
that you can change the software or use pieces of it in new free programs; and that you know you
can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights
or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you
if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must
give the recipients all the rights that you have. You must make sure that they, too, receive or can
get the source code. And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license
which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands
that there is no warranty for this free software. If the software is modified by someone else and
passed on, we want its recipients to know that what they have is not the original, so that any
problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid
the danger that redistributors of a free program will individually obtain patent licenses, in effect
making the program proprietary. To prevent this, we have made it clear that any patent must be
licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND
MODIFICATION
0. This License applies to any program or other work which contains a notice placed by the
copyright holder saying it may be distributed under the terms of this General Public License.
The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a
work containing the Program or a portion of it, either verbatim or with modifications and/or
translated into another language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not covered by this License;
they are outside its scope. The act of running the Program is not restricted, and the output
from the Program is covered only if its contents constitute a work based on the Program
(independent of having been made by running the Program). Whether that is true depends
on what the Program does.
1. You may copy and distribute verbatim copies of the Program's source code as you receive
it, in any medium, provided that you conspicuously and appropriately publish on each copy
an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that
refer to this License and to the absence of any warranty; and give any other recipients of the
Program a copy of this License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may at your option
offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus forming a
work based on the Program, and copy and distribute such modifications or work under the
terms of Section 1 above, provided that you also meet all of these conditions:
(a) You must cause the modified files to carry prominent notices stating that you changed
the files and the date of any change.
(b) You must cause any work that you distribute or publish, that in whole or in part contains
or is derived from the Program or any part thereof, to be licensed as a whole at no
charge to all third parties under the terms of this License.
(c) If the modified program normally reads commands interactively when run, you must
cause it, when started running for such interactive use in the most ordinary way, to print
or display an announcement including an appropriate copyright notice and a notice that
there is no warranty (or else, saying that you provide a warranty) and that users may
redistribute the program under these conditions, and telling the user how to view a copy
of this License. (Exception: if the Program itself is interactive but does not normally
print such an announcement, your work based on the Program is not required to print
an announcement.)
These requirements apply to the modified work as a whole. If identifiable sections of that
work are not derived from the Program, and can be reasonably considered independent and
separate works in themselves, then this License, and its terms, do not apply to those sections
when you distribute them as separate works. But when you distribute the same sections as
part of a whole which is a work based on the Program, the distribution of the whole must
be on the terms of this License, whose permissions for other licensees extend to the entire
whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of
derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program
(or with a work based on the Program) on a volume of a storage or distribution medium does
not bring the other work under the scope of this License.
3. You may copy and distribute the Program (or a work based on it, under Section 2) in object
code or executable form under the terms of Sections 1 and 2 above provided that you also
do one of the following:
(a) Accompany it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a medium customarily
used for software interchange; or,
(b) Accompany it with a written offer, valid for at least three years, to give any third party,
for a charge no more than your cost of physically performing source distribution, a
complete machine-readable copy of the corresponding source code, to be distributed
under the terms of Sections 1 and 2 above on a medium customarily used for software
interchange; or,
(c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for
all modules it contains, plus any associated interface definition files, plus the scripts used
to control compilation and installation of the executable. However, as a special exception,
the source code distributed need not include anything that is normally distributed (in either
source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the
executable.
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place
counts as distribution of the source code, even though third parties are not compelled to copy
the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute
the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have
their licenses terminated so long as such parties remain in full compliance.
5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works.
These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your
acceptance of this License to do so, and all its terms and conditions for copying, distributing
or modifying the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program), the recipient
automatically receives a license from the original licensor to copy, distribute or modify the
Program subject to these terms and conditions. You may not impose any further restrictions
on the recipients' exercise of the rights granted herein. You are not responsible for enforcing
compliance by third parties to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement or for any
other reason (not limited to patent issues), conditions are imposed on you (whether by court
order, agreement or otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot distribute so as to satisfy
simultaneously your obligations under this License and any other pertinent obligations, then
as a consequence you may not distribute the Program at all. For example, if a patent license
would not permit royalty-free redistribution of the Program by all those who receive copies
directly or indirectly through you, then the only way you could satisfy both it and this
License would be to refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended
to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or other property
right claims or to contest validity of any such claims; this section has the sole purpose of
protecting the integrity of the free software distribution system, which is implemented by
public license practices. Many people have made generous contributions to the wide range of
software distributed through that system in reliance on consistent application of that system;
it is up to the author/donor to decide if he or she is willing to distribute software through any
other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of
the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries either by patents
or by copyrighted interfaces, the original copyright holder who places the Program under
this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such
case, this License incorporates the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the General
Public License from time to time. Such new versions will be similar in spirit to the present
version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies a version
number of this License which applies to it and "any later version", you have the option of
following the terms and conditions either of that version or of any later version published
by the Free Software Foundation. If the Program does not specify a version number of this
License, you may choose any version ever published by the Free Software Foundation.
10. If you wish to incorporate parts of the Program into other free programs whose distribution
conditions are different, write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free Software Foundation; we
sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing
and reuse of software generally.

NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR
THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO
THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT
NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES
SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS
