TECHNOLOGY INCORPORATED
XENIX™ SYSTEM
TEXT PROCESSING
VOLUME 3
CONTENTS

1.0 INTRODUCTION ........................................ 1-1

2.0 USING THE TEXT EDITORS ED AND SED ................... 2-1
    2.1 ED .............................................. 2-4
        2.1.1 A Summary of Commands and Line Numbers .... 2-4
        2.1.2 More Advanced Editing Techniques .......... 2-6
        2.1.3 Editing Scripts ........................... 2-30
    2.2 SED ............................................. 2-31
        2.2.1 Overall Operation ......................... 2-31
        2.2.2 Command-line Flags ........................ 2-32
        2.2.3 Order of Application of Editing Commands .. 2-32
        2.2.4 Pattern-space ............................. 2-33
        2.2.5 Addresses ................................. 2-33
        2.2.6 Functions ................................. 2-34

3.0 PATTERN RECOGNITION AND FILE COMPARISON UTILITIES ... 3-1
    3.1 GREP ............................................ 3-2
    3.2 AWK ............................................. 3-5
    3.3 DIFF ............................................ 3-16
    3.4 DIFF3 ........................................... 3-18
    3.5 COMM ............................................ 3-19
    3.6 SPELL ........................................... 3-20

4.0 TEXT FORMATTING AND DOCUMENT PREPARATION ............ 4-1
    4.1 FORMATTING PACKAGES ............................. 4-2
    4.2 SUPPORTING TOOLS ................................ 4-3
    4.3 HINTS FOR PREPARING DOCUMENTS ................... 4-4
    4.4 A NOTE ABOUT THE PAPERS ......................... 4-5
        4.4.1 Using the -ms Macros with Troff and Nroff
        4.4.2 A Guide to Preparing Documents with -ms
        4.4.3 NROFF/TROFF User's Manual
        4.4.4 A TROFF Tutorial
        4.4.5 Tbl - A Program to Format Tables
        4.4.6 Typesetting Mathematics - User's Guide
        4.4.7 Some Applications of Inverted Indexes

5.0 COMMAND REFERENCE ................................... 5-1
CHAPTER
1
INTRODUCTION
Users involved in text processing applications, such as typing memos, writing technical reports, and preparing documentation, will soon discover that their primary interface with the computer is through the editors, the various pattern recognition and file comparison utilities, and the text formatting packages. Programmers also use the editors and other utilities described in this volume extensively when writing and revising code. Therefore, all users should learn as much as possible about the tools available to them on the XENIX system and practice using the various commands and functions. The better the user understands which functions work best in which situations, and the more dexterity the user develops with particular commands, the more powerful the editors and related tools become.

This volume contains an introduction to the XENIX text editors, ed and sed. For more detailed tutorial material on the XENIX text editors, read the appropriate sections in The Programmer's Introduction.
This volume also introduces some tools that are extremely useful when preparing documents. They help when you need to locate repeated elements in a single file or group of files in order to make a consistent set of changes, or when you need to compare two or more files to identify the differences between them. Because several of these programs can be used interchangeably, knowing which one will do the job at hand most efficiently is a large part of understanding their use. These programs streamline complicated editing command procedures, locate variations among several versions of text, and can handle large numbers of text files at once. They are:

• the members of the grep family: grep, egrep, and fgrep
• awk
• diff and diff3
• comm
• spell
XENIX Text Processing
The XENIX system also offers two text formatting packages, nroff and troff, which simplify the production of technical reports, memoranda, formal papers, and documentation. They are designed to produce output for the lineprinter and the typesetter, respectively. This volume describes -ms in detail; -ms is a canned package of formatting requests that is much simpler to use than raw nroff and troff requests. It also discusses some supporting programs that aid in document preparation: eqn, which integrates mathematical symbols and equations into the text of a document; tbl, which provides an analogous service for tabular material; and refer, which prepares bibliographic citations from a data base.
CHAPTER
2
USING THE TEXT EDITORS ED AND SED
Most users of a computer system rely heavily on text editors to do their work, whether they are writing programs or preparing data. For users involved in text processing applications, such as typing memos, writing technical reports, or preparing documentation, the editing functions may be their primary interface with the computer. Therefore, the text processing user should learn as much as possible about the editing tools available on the system and practice using the various commands and functions. The better the user understands which functions work best in which situations, and the more dexterity the user develops with particular commands, the more powerful the editors and related tools become. For a more detailed introduction to text editing with XENIX, read the appropriate sections in The Programmer's Introduction.

XENIX offers two text editors: ed, an interactive line editor, and sed, a non-interactive context editor. Although the capabilities of the two editors overlap in many respects, you will soon find that ed is better suited to on-the-spot entry, deletion, and simple modification of text. Sed is better suited to making uniform changes in large files or groups of files, or to cases where the sequence of editing commands needed to make the changes becomes complex.

Because sed is derived from ed, however, the two editors share some characteristics. In particular, they recognize the same class of regular expressions. A regular expression specifies a set of character strings to be matched in the text; such a pattern is sometimes called a context address. In practical terms, these are the patterns you ask the editor to search for and substitute when the text must be changed. These regular expressions include:
1. An ordinary character (not one of those discussed below) is a regular expression, and matches that character.
2. A circumflex '^' at the beginning of a regular expression matches the null character at the beginning of a line.
3. A dollar-sign '$' at the end of a regular expression matches the null character at the end of a line.
4. The characters '\n' match an imbedded newline character, but not the newline at the end of the pattern space.
5. A period '.' matches any character except the terminal newline of the pattern space.
6. A regular expression followed by an asterisk '*' matches any number (including 0) of adjacent occurrences of the regular expression it follows.
7. A string of characters in square brackets '[ ]' matches any character in the string, and no others. If, however, the first character of the string is circumflex '^', the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.
8. A concatenation of regular expressions is a regular expression which matches the concatenation of strings matched by the components of the regular expression.
9. A regular expression between the sequences '\(' and '\)' is identical in effect to the unadorned regular expression, but has side-effects which are described under the s command below and in specification 10 immediately below.
10. The expression '\d' means the same string of characters matched by an expression enclosed in '\(' and '\)' earlier in the same pattern. Here d is a single digit; the string specified is the one beginning with the dth occurrence of '\(' counting from the left. For example, the expression '^\(.*\)\1' matches a line beginning with two repeated occurrences of the same string.
11. The null regular expression standing alone (e.g., '//') is equivalent to the last regular expression compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match an occurrence of itself in the input), precede the special character by a backslash '\'.
For a context address to 'match' the input, the whole pattern within the address must match some portion of the pattern space. The use of these patterns in specific applications within ed and sed is discussed in detail for each editor.
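Specification 10 can be tried out with Python's re module. It supports the same kind of back reference but writes groups as (...) rather than \(...\). This is a minimal sketch for illustration only, and the names REPEATED_PREFIX and repeated_prefix are hypothetical. Because .* may match the empty string, the pattern matches every line; only the length of the repeated string varies.

```python
import re

# ed/sed form:  ^\(.*\)\1      Python re form:  ^(.*)\1
REPEATED_PREFIX = re.compile(r"^(.*)\1")


def repeated_prefix(line: str) -> str:
    """Return the string that occurs twice in a row at the start of `line`.

    Python tries the longest candidate for (.*) first and then backs off,
    so the result is the longest such string, or "" if there is none.
    """
    match = REPEATED_PREFIX.match(line)
    return match.group(1) if match else ""


if __name__ == "__main__":
    for text in ("byebye for now", "aaaa", "hello"):
        print(repr(text), "->", repr(repeated_prefix(text)))
```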
2.1
ED
Ed is one of the text editors on the XENIX system. It is used primarily to create and modify text interactively, whether the text is a document, a program, or data for a program. The most frequently used commands are summarized here, followed by a discussion of editing techniques that are especially useful in text processing applications.
2.1.1
A Summary of Commands and Line Numbers
The general form of an ed command is the command name, optionally preceded by one or two line numbers and, in the case of e, r, and w, followed by a filename. Only one command is allowed per line, but a p command may follow any other command (except e, r, w, and q).
a
Append, that is, add lines to the buffer (at line dot, unless a different line is specified). Appending continues until a period is typed on a new line. The value of dot is set to the last line appended.
c
Change the specified lines to the new text which follows. The new lines are terminated by a period on a new line, as with a. If no lines are specified, replace line dot. Dot is set to the last line changed.
d
Delete the lines specified. If none are specified, delete line dot. Dot is set to the first undeleted line, unless $ is deleted, in which case dot is set to $.
e
Edit a new file. Any previous contents of the buffer are thrown away, so issue a w beforehand.
f
Print the remembered filename. If a name follows f, the remembered name is set to it.
g
The command
g/---/commands
executes the commands on those lines that contain ---, which can be any context search expression.
i
Insert lines before the specified line (or dot) until a single period is typed on a new line. Dot is set to the last line inserted.
m
Move the lines specified to after the line named after m. Dot is set to the last line moved.
p
Print the specified lines. If none are specified, print the line specified by dot. A single line number is equivalent to the command line-number p. A single RETURN prints .+1, the next line.
q
Quit ed. Wipes out all text in the buffer if given twice in a row without a w command.
r
Read a file into the buffer (at the end unless specified elsewhere). Dot is set to the last line read.
s
The command
s/string1/string2/
substitutes the characters string2 for string1 in the specified lines. If no lines are specified, the substitution takes place only on the line specified by dot. Dot is set to the last line in which a substitution took place, so if no substitution takes place, dot remains unchanged. s changes only the first occurrence of string1 on a line; to change all of them, type a g after the final slash.
v
The command
v/---/commands
executes the commands on those lines that do not contain ---.
w
Write out the buffer onto a file.
.=
Print the value of dot. (An equal sign by itself prints the value of $.) Dot remains unchanged.
!command-line
The line !command-line causes command-line to be executed as a XENIX command.
/string/
Context search. Search for the next line which contains this string of characters, and print it. Dot is set to the line where string was found. The search starts at .+1, wraps around from $ to 1, and continues to dot, if necessary.
?string?
Context search in the reverse direction. Start the search at .-1, scan to 1, then wrap around to $.

2.1.2
More Advanced Editing Techniques
There are often several alternative procedures for accomplishing the same editing task, with varying degrees of efficiency. This section explains, with examples, how to use ed to edit with less effort and greater speed. Topics covered include:

• Special characters in search and substitute commands
• Line addressing
• Global commands
• Line moving
• Line copying
2.1.2.1 Special Characters
There are
several
special
characters which facilitate searching and substitution in
ed.
The List Command 'l'
Ed provides two commands for printing the contents of lines. One of these is p, in combinations like
1,$p
to print all the lines in the file, or
s/abc/def/p
to change 'abc' to 'def' on the current line and print the result. Less familiar is the list command l (the letter 'l'), which gives slightly more information than p. In particular, l makes visible characters that are normally invisible, such as tabs and backspaces. In a line which contains these characters, l prints each tab as '>' overstruck on '-' and each backspace as '<' overstruck on '-'. This makes it much easier to correct typing mistakes when extra spaces are adjacent to tabs, or when backspaces are followed by a space.

The l command also 'folds' long lines: any line that exceeds 72 characters is printed on multiple lines, and each printed line except the last is terminated by a backslash '\' to indicate that it was folded. This overcomes the limited width of your terminal screen.

Occasionally, the l command will print a string of digits preceded by a backslash, such as \07 or \16, in a line. These make visible characters that normally do not print, such as form feed, vertical tab, or bell, which are usually typed in error. Each such combination stands for a single character, which may have a special meaning on some terminals.
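The Python sketch below imitates this display so that the rules above can be tried outside ed. It is an approximation, not ed's own code. The function name list_line is hypothetical. The sketch shows a tab as a plain '>' and a backspace as a plain '<' (omitting the overstruck hyphen), writes other control characters as a backslash followed by two octal digits, and folds so that each output line, backslash included, fits in 72 columns.

```python
def list_line(line: str, width: int = 72) -> str:
    """Approximate what ed's l command shows for one buffer line (sketch)."""
    tokens = []
    for ch in line:
        if ch == "\t":
            tokens.append(">")                 # tab (ed overstrikes it on '-')
        elif ch == "\b":
            tokens.append("<")                 # backspace (likewise overstruck)
        elif ord(ch) < 32:
            tokens.append("\\%02o" % ord(ch))  # e.g. bell (7) -> \07
        else:
            tokens.append(ch)

    rows = []
    current = ""
    for tok in tokens:
        # Fold before a token that would leave no room for the backslash.
        if len(current) + len(tok) > width - 1:
            rows.append(current + "\\")
            current = ""
        current += tok
    rows.append(current)
    return "\n".join(rows)


if __name__ == "__main__":
    print(list_line("name:\tvalue\b\b\x07"))
    print(list_line("x" * 150))
```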
The Substitute Command 's'
The substitute command s is the command for changing the contents of individual lines. It is probably the most complicated and powerful of the ed commands.

The most straightforward variation is the trailing g after a substitute command. Of the two commands
s/this/that/
and
s/this/that/g
the first replaces the first 'this' on the line with 'that'. If there is more than one 'this' on the line, the second form, with the trailing g, changes all of them.

Either form of the s command can be followed by p or l to 'print' or 'list' the contents of the line:
s/this/that/p
s/this/that/l
s/this/that/gp
s/this/that/gl
are all slight variations.

Of course, any s command can be preceded by one or two 'line numbers' to specify that the substitution is to take place on a group of lines. Thus
1,$s/mispell/misspell/
changes the first occurrence of 'mispell' to 'misspell' in each line of the file. But
1,$s/mispell/misspell/g
changes every occurrence in each line.

If a p or l is added to the end of any of these substitute commands, only the last line that was changed is printed, not all the lines.
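A ranged s command can be modeled in a few lines of Python. This sketch assumes for simplicity that string1 is a fixed string rather than a regular expression, and the function name ed_substitute and its parameters are hypothetical. It shows the first-occurrence rule, the effect of g, and the rule from the command summary that dot moves to the last line changed, or stays where it was if nothing changed.

```python
from typing import List, Tuple


def ed_substitute(buffer: List[str], first: int, last: int,
                  old: str, new: str, dot: int,
                  global_flag: bool = False) -> Tuple[List[str], int]:
    """Model of ed's  first,last s/old/new/[g]  (sketch, fixed strings only).

    Lines are numbered from 1 as in ed. Returns the new buffer and the new
    value of dot: the last line changed, or the old dot if none changed.
    """
    result = list(buffer)                 # leave the caller's buffer intact
    new_dot = dot
    for number in range(first, last + 1):
        line = result[number - 1]
        if old in line:
            count = -1 if global_flag else 1  # str.replace: -1 means all
            result[number - 1] = line.replace(old, new, count)
            new_dot = number
    return result, new_dot


if __name__ == "__main__":
    buf = ["a mispell and mispell", "fine", "mispell"]
    # Equivalent of 1,$s/mispell/misspell/g
    print(ed_substitute(buf, 1, len(buf), "mispell", "misspell",
                        dot=2, global_flag=True))
```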
The Undo Command 'u'
If a substitution in a line is incorrect,

(1)a\ -- append lines
The a function causes the <text> argument to be written to the output after the line matched by its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line fiction, the interior newlines must be hidden by a backslash character ('\') immediately preceding the newline. The <text> argument is terminated by the first unhidden newline (the first one not immediately preceded by a backslash).

Once an a function is successfully executed, <text> will be written to the output regardless of what later commands do to the line which triggered it. The triggering line may be deleted entirely; <text> will still be written to the output.

The <text> is not scanned for address matches, and no editing commands are attempted on it. It does not cause any change in the line-number counter.
(1)i\ -- insert lines
The i function behaves identically to the a function, except that <text> is written to the output before the matched line. All other comments about the a function apply to the i function as well.
(2)c\ -- change lines
The c function deletes the lines selected by its address(es) and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash, and interior newlines in <text> must be hidden by backslashes.

The c command may have two addresses, and therefore select a range of lines. If it does, all the lines in the range are deleted, but only one copy of <text> is written to the output, not one copy per line deleted. As with a and i, <text> is not scanned for address matches, and no editing commands are attempted on it. It does not change the line-number counter.

After a line has been deleted by a c function, no further commands are attempted on it.

If text is appended after a line by a or r functions, and the line is subsequently changed, the text inserted by the c function is placed before the text of the a or r functions.
Note: within the text put in the output by these functions,
leading blanks and tabs will disappear, as always in sed
commands. To get leading blanks and tabs into the output,
precede the first desired blank or tab by a backslash; the
backslash will not appear in the output.
For example, the list of editing commands:
n
a\
XXXX
d
applied to our standard input, produces:
In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.

In this particular case, the same effect would be produced by either of the two following command lists:
n
i\
XXXX
d

n
c\
XXXX
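To see why the three command lists are interchangeable here, the Python sketch below models only the behavior described above for a\, i\, c\, and d, together with n. It is a toy under stated assumptions, not sed itself. Addresses, ranges, and all other functions are omitted, and the names run, POEM, APPEND, INSERT, and CHANGE are hypothetical. It also assumes that n writes the current line and replaces it with the next input line, and that when there is no next line the script simply ends and the current line is written out normally.

```python
from typing import List, Tuple

# The standard input used in the sed examples.
POEM = [
    "In Xanadu did Kubla Khan",
    "A stately pleasure dome decree:",
    "Where Alph, the sacred river, ran",
    "Through caverns measureless to man",
    "Down to a sunless sea.",
]

Command = Tuple[str, str]  # (function letter, <text> argument or "")

APPEND: List[Command] = [("n", ""), ("a", "XXXX"), ("d", "")]
INSERT: List[Command] = [("n", ""), ("i", "XXXX"), ("d", "")]
CHANGE: List[Command] = [("n", ""), ("c", "XXXX")]


def run(script: List[Command], lines: List[str]) -> List[str]:
    r"""Toy model of unaddressed n, a\, i\, c\ and d commands."""
    out: List[str] = []
    pos = 0                            # index of the next unread input line
    while pos < len(lines):
        space = lines[pos]             # new cycle: read a line into the pattern space
        pos += 1
        appended: List[str] = []       # a\ text, written at the end of the cycle
        deleted = False
        for fn, text in script:
            if fn == "n":
                if pos >= len(lines):  # assumed: no next line ends the script
                    break
                out.append(space)      # write the current line and queued text,
                out.extend(appended)
                appended = []
                space = lines[pos]     # then replace it with the next input line
                pos += 1
            elif fn == "a":
                appended.append(text)  # still written even if the line is deleted
            elif fn == "i":
                out.append(text)       # written at once, before the line
            elif fn in ("c", "d"):
                if fn == "c":
                    out.append(text)   # c writes <text> in place of the line
                deleted = True         # no further commands on this line
                break
        if not deleted:
            out.append(space)          # normal output of the pattern space
        out.extend(appended)
    return out


if __name__ == "__main__":
    for script in (APPEND, INSERT, CHANGE):
        print("\n".join(run(script, POEM)), end="\n\n")
```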
2.2.6.2
Substitute Function
One very important function changes parts of lines selected by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute
The s function replaces part of a line (selected by <pattern>) with <replacement>. It can best be read:
Substitute for <pattern>, <replacement>

The <pattern> argument contains a pattern, exactly like the patterns in addresses. The only difference between <pattern> and a context address is that the context address must be delimited by slash ('/') characters; <pattern> may be delimited by any character other than space or newline.

By default, only the first string matched by <pattern> is replaced, but see the g flag below.

The <replacement> argument begins immediately after the second delimiting character of <pattern>, and must be followed immediately by another instance of the delimiting character. (Thus there are exactly three instances of the delimiting character.)

The <replacement> is not a pattern, and the characters which are special in patterns have no special meaning in <replacement>. Instead, other characters are special:
• & is replaced by the string matched by <pattern>.
• \d (where d is a single digit) is replaced by the dth substring matched by parts of <pattern> enclosed in '\(' and '\)'. If nested substrings occur in <pattern>, the dth is determined by counting opening delimiters ('\(').

As in patterns, special characters may be made literal by preceding them with a backslash ('\').

The <flags> argument may contain the following flags:
g -- substitute <replacement> for all (non-overlapping) instances of <pattern> in the line. After a successful substitution, the scan for the next instance of <pattern> begins just after the end of the inserted characters; characters put into the line from <replacement> are not rescanned.
p -- print the line if a successful replacement was done. The p flag causes the line to be written to the output if and only if a substitution was actually made by the s function. Notice that if several s functions, each followed by a p flag, successfully substitute in the same input line, multiple copies of the line will be written to the output: one for each successful substitution.
w <filename> -- write the line to a file if a successful replacement was done. The w flag causes lines which are actually substituted by the s function to be written to a file named by <filename>. If <filename> exists before sed is run, it is overwritten; if not, it is created. A single space must separate w and <filename>. The possibilities of multiple, somewhat different copies of one input line being written are the same as for p.

A maximum of 10 different file names may be mentioned after w flags and w functions (see below), combined.
Here are some examples. The following command, applied to our standard input,
s/to/by/w changes
produces, on the standard output:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
and, on the file 'changes':
Through caverns measureless by man
Down by a sunless sea.

If the nocopy option is in effect, the command:
s/[.,;?:]/*P&*/gp
produces:
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:
/X/s/an/AN/p
produces (assuming nocopy mode):
In XANadu did Kubla Khan
and the command:
/X/s/an/AN/gp
produces:
In XANadu did Kubla KhAN
2.2.6.3
Input-output Functions
(2)p -- print
The print function writes the addressed lines to the standard output file. They are written at the time the p function is encountered, regardless of what succeeding editing commands may do to the lines.

(2)w <filename> -- write on <filename>
The write function writes the addressed lines to the file named by <filename>. If the file previously existed, it is overwritten; if not, it is created. The lines are written exactly as they exist when the write function is encountered for each line, regardless of what subsequent editing commands may do to them. Exactly one space must separate the w and <filename>. A maximum of ten different files may be mentioned in write functions and w flags after s functions, combined.
(1)r <filename> -- read the contents of a file
The read function reads the contents of <filename> and appends them after the line matched by the address. The file is read and appended regardless of what subsequent editing commands do to the line which matched its address. If r and a functions are executed on the same line, the text from the a functions and the r functions is written to the output in the order that the functions are executed. Exactly one space must separate the r and <filename>. If a file mentioned by an r function cannot be opened, it is considered a null file, not an error, and no diagnostic is given.

NOTE: Since there is a limit to the number of files that can be opened simultaneously, care should be taken that no more than ten files be mentioned in w functions or flags; that number is reduced by one if any r functions are present. (Only one read file is open at one time.)
Here is an example. Assume that the file 'note1' has the following contents:
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and
founder of the Mongol dynasty in China.
Then the following command:
/Kubla/r note1
produces:
In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and
founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
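The r function's behavior, including the rule that an unopenable file counts as a null file, can be sketched in Python as follows. The function name read_after is hypothetical, the pattern uses Python re syntax, and, as an assumption of this sketch, the file is reopened for every matching line.

```python
import os
import re
import tempfile
from typing import List


def read_after(lines: List[str], pattern: str, filename: str) -> List[str]:
    """Sketch of the sed command  /pattern/r filename."""
    out: List[str] = []
    for line in lines:
        out.append(line)
        if re.search(pattern, line):
            try:
                with open(filename) as f:
                    out.extend(f.read().splitlines())
            except OSError:
                pass  # an unopenable file acts as a null file; no diagnostic
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        note = os.path.join(d, "note1")
        with open(note, "w") as f:
            f.write("Note: Kubla Khan (more properly Kublai Khan;\n")
        poem = ["In Xanadu did Kubla Khan", "A stately pleasure dome decree:"]
        print("\n".join(read_after(poem, "Kubla", note)))
```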
2.2.6.4
Multiple Input-line Functions
Three functions, all spelled with capital letters, deal specially with pattern spaces containing imbedded newlines; they are intended principally to provide pattern matches across lines in the input.

(2)N -- Next line
The next input line is appended to the current line in the pattern space; the two input lines are separated by an imbedded newline. Pattern matches may extend across the imbedded newline(s).

(2)D -- Delete first part of the pattern space
Delete up to and including the first newline character in the current pattern space. If the pattern space becomes empty (the only newline was the terminal newline), read another line from the input. In any case, begin the list of editing commands again from its beginning.

(2)P -- Print first part of the pattern space
Print up to and including the first newline in the pattern space.

The P and D functions are equivalent to their lower-case counterparts if there are no imbedded newlines in the pattern space.
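Together, N, P, and D let a script examine two adjacent lines at once. The Python sketch below steps through a widely used sed idiom for squeezing runs of identical adjacent lines, shown in the docstring. The idiom also uses the ! function (see Flow-of-Control Functions) and the $ last-line address. The Python names are hypothetical, and Python's re syntax stands in for sed's.

```python
import re
from typing import List

# Two lines separated by a newline that are identical; fullmatch plays the
# role of sed's ^ and $ anchors.
SAME_TWICE = re.compile(r"(.*)\n\1")


def squeeze_adjacent_duplicates(lines: List[str]) -> List[str]:
    r"""Step-by-step model of the sed script (a sketch):

        $!N
        /^\(.*\)\n\1$/!P
        D
    """
    out: List[str] = []
    if not lines:
        return out
    space = lines[0]      # pattern space
    nxt = 1               # index of the next unread input line
    while True:
        # $!N: unless this is the last line, append the next line.
        if nxt < len(lines):
            space += "\n" + lines[nxt]
            nxt += 1
        # !P: unless both lines are identical, print up to the first newline.
        if not SAME_TWICE.fullmatch(space):
            out.append(space.split("\n", 1)[0])
        # D: with no newline the pattern space empties, and all input has
        # been read, so processing ends. Otherwise delete through the first
        # newline and restart the script without reading input.
        if "\n" not in space:
            break
        space = space.split("\n", 1)[1]
    return out


if __name__ == "__main__":
    print(squeeze_adjacent_duplicates(["x", "y", "y", "y", "z"]))
```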
2.2.6.5
Hold and Get Functions
These functions save and retrieve part of the input for possible later use:
1. (2)h -- hold pattern space
The h function copies the contents of the pattern space into a hold area (destroying the previous contents of the hold area).
2. (2)H -- Hold pattern space
The H function appends the contents of the pattern space to the contents of the hold area; the former and new contents are separated by a newline.
3. (2)g -- get contents of hold area
The g function copies the contents of the hold area into the pattern space (destroying the previous contents of the pattern space).
4. (2)G -- Get contents of hold area
The G function appends the contents of the hold area to the contents of the pattern space; the former and new contents are separated by a newline.
5. (2)x -- exchange
The exchange command interchanges the contents of the pattern space and the hold area.
For example, the commands
1h
1s/ did.*//
1x
G
s/\n/ :/
applied to our standard example, produce:
In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
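The Python sketch below traces the pattern space and hold area through the five commands above, one input line per cycle. It is a model for illustration, not sed. The function name tag_lines is hypothetical, and Python's re.sub stands in for sed's s function.

```python
import re
from typing import List


def tag_lines(lines: List[str]) -> List[str]:
    r"""Python model of the sed commands (sketch):

        1h
        1s/ did.*//
        1x
        G
        s/\n/ :/
    """
    hold = ""                          # the hold area starts out empty
    out: List[str] = []
    for number, line in enumerate(lines, start=1):
        space = line                   # pattern space for this cycle
        if number == 1:
            hold = space                                   # 1h: copy to hold
            space = re.sub(r" did.*", "", space, count=1)  # 1s/ did.*//
            space, hold = hold, space                      # 1x: exchange
        space = space + "\n" + hold                        # G: append hold
        space = space.replace("\n", " :", 1)               # s/\n/ :/
        out.append(space)
    return out


if __name__ == "__main__":
    poem = [
        "In Xanadu did Kubla Khan",
        "A stately pleasure dome decree:",
        "Where Alph, the sacred river, ran",
        "Through caverns measureless to man",
        "Down to a sunless sea.",
    ]
    print("\n".join(tag_lines(poem)))
```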
2.2.6.6
Flow-of-Control Functions
These functions do no editing on the input lines, but control the application of functions to the lines selected by the address part.

(2)! -- Don't
The Don't command causes the next command (written on the same line) to be applied to all and only those input lines not selected by the address part.

(2){ -- Grouping
The grouping command '{' causes the next set of commands to be applied (or not applied) as a block to the input lines selected by the addresses of the grouping command. The first of the commands under control of the grouping may appear on the same line as the '{' or on the next line. The group of commands is terminated by a matching '}' standing on a line by itself. Groups can be nested.

(0):<label> -- place a label
The label function marks a place in the list of editing commands which may be referred to by b and t functions. The <label> may be any sequence of eight or fewer characters; if two different colon functions have identical labels, a compile time diagnostic will be generated, and no execution attempted.
(2)b<label> -- branch to label
The branch function causes the sequence of editing commands being applied to the current input line to be restarted immediately after the place where a colon function with the same <label> was encountered. If no colon function with the same label can be found after all the editing commands have been compiled, a compile time diagnostic is produced, and no execution is attempted.

A b function with no <label> is taken to be a branch to the end of the list of editing commands; whatever should be done with the current input line is done, and another input line is read; the list of editing commands is restarted from the beginning on the new line.

(2)t<label> -- test substitutions
The t function tests whether any successful substitutions have been made on the current input line; if so, it branches to <label>; if not, it does nothing. The flag which indicates that a successful substitution has been executed is reset by:
1. reading a new input line, or
2. executing a t function.

2.2.6.7
Miscellaneous Functions
(1)= -- equals
The = function writes to the standard output the line number of the line matched by its address.

(1)q -- quit
The q function causes the current line to be written to the output (if it should be), any appended or read text to be written, and execution to be terminated.
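The b and t functions make loops possible. The Python sketch below mirrors a three-line sed script, shown in the docstring, that uses t to repeat a substitution until it no longer succeeds. The script and the function name squeeze_blanks are illustrative assumptions. The while loop stands in for the label and the branch, and the substituted variable plays the role of the flag that t tests and resets.

```python
import re


def squeeze_blanks(line: str) -> str:
    """Model of a t-driven loop in sed (sketch):

        :loop
        s/  / /
        t loop

    Each pass replaces the first pair of blanks with a single blank.
    """
    while True:                                    # :loop
        line, n = re.subn("  ", " ", line, count=1)  # s/  / /
        substituted = n > 0                        # the flag that t tests
        if not substituted:                        # t loop: no branch taken
            break
        # t resets the flag and branches back to :loop
    return line


if __name__ == "__main__":
    print(repr(squeeze_blanks("a    b  c")))
```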
CHAPTER
3
PATTERN RECOGNITION AND FILE COMPARISON UTILITIES
When preparing documents, it is often necessary to find a
string repeated in a file or group of files, in order to
make a consistent set of changes, or to compare and contrast
two or more files
in order to identify the differences
between them. In this section, some tools provided by XENIX
to accomplish these tasks are compared. Although several of
these programs may be used interchangeably,
knowing which
one will do the job at hand most efficiently is a large part
of understanding their use.
if the job is planned in
advance.
This chapter introduces further ways to streamline complicated editing command procedures and to handle large numbers of files at once.
Grep, the first and simplest of these tools, merely prints all lines which match a single specified pattern. A variant of grep, egrep, searches for more generalized patterns. Fgrep searches for a set of keywords with a particularly fast algorithm. Grep and its variants are considered in detail here, along with awk, a program which offers some special features, including the capacity to deal with numerics, logical relations, and variables. In addition, awk allows for searching particular fields within lines.

Both grep and awk are based on the same principle of pattern recognition as ed and sed. In each case, a file is searched for occurrences of a given pattern (a character or group of characters, a word, or a string of words), generating a list of contexts where the pattern appears. Grep and the related commands egrep and fgrep are introduced below, followed by a discussion of awk, a programming language for carrying out a wide range of complex text manipulation functions.

Also discussed here are three additional programs, comm, diff, and diff3, which compare two or more files and output the lines that differ. In text processing applications these programs can be extremely useful for locating variations between several versions of text. The last tool introduced in this chapter is spell, which allows the user to locate spelling and typographic errors quickly in large quantities of text. Chapter 5 contains a detailed summary of the options associated with each of these programs.
3.1
GREP
It is often necessary to find all occurrences of some word or pattern in a set of files. The patterns being searched for are the same "regular expressions" recognized by the editors ed and sed. Grep takes its name from the ed command
g/re/p
(globally search for a regular expression and print), and it does exactly this: it searches for and prints every line in a set of files that contains the specified regular expression. Thus,
grep 'thing' file1 file2 file3
finds 'thing' wherever it occurs in any of the files 'file1', 'file2', etc. Grep also indicates the file in which each line was found, so that it can be edited later.

Used as a filter (a command that reads and transforms input), grep can be combined with other commands in shell procedures to become a powerful editing tool. The use of grep in shell procedures is discussed at length in The Programmer's Introduction.
The commands grep, egrep, and fgrep search a file for a specified pattern. They are expressed in the following form:
grep [ option ] ... expression [ file ] ...
egrep [ option ] ... [ expression ] [ file ] ...
fgrep [ option ] ... [ strings ] [ file ] ...
Commands of the grep family search the input files (standard input by default) for lines matching a pattern. Normally, each line found is copied to the standard output; unless the -h flag is used, the file name is shown if there is more than one input file.

The three members of the grep family differ in the patterns they accept. Grep patterns are limited regular expressions in the style of ed; grep uses a compact nondeterministic algorithm. Egrep patterns are full regular expressions; egrep uses a fast deterministic algorithm that sometimes needs exponential space. Fgrep patterns are fixed strings; fgrep is fast and compact.
The following options are recognized:
-v    All lines but those matching are printed.
-c    Only a count of matching lines is printed.
-l    The names of files with matching lines are listed
      (once), separated by newlines.
-n    Each line is preceded by its line number in the file.
-b    Each line is preceded by the block number on which it
      was found. This is sometimes useful in locating disk
      block numbers by context.
-s    No output is produced, only status.
-h    Do not print filename headers with output lines.
-y    Alphabetic letters in the pattern will match letters
      of either case in the input (grep and fgrep only).
-e expression
      Same as a simple expression argument, but useful when
      the expression begins with a -.
-f file
      The regular expression (egrep) or string list (fgrep)
      is taken from the file.
-x    (Exact) only lines matched in their entirety are
      printed (fgrep only).
Care should be taken when using the characters
$ * [ ^ | ? ' " ( ) and \ in the expression, as they are
also meaningful to the shell. It is safest to enclose the
entire expression argument in single quotes ' '.
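The sketch below tries several of these options on invented files. It assumes a modern POSIX grep, where the behaviour may differ slightly from XENIX: the status-only option is spelled -q (-s there merely suppresses error messages), and case-insensitive matching is spelled -i.

```shell
#!/bin/sh
# Sketch: a few grep options on invented files.
# Assumption: a modern POSIX grep (status-only mode is -q there).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a thing\nan apple\n' > file1
printf 'no match\n' > file2
printf 'the thing again\n' > file3

grep -c 'thing' file1 file2 file3   # count of matching lines per file
# file1:1
# file2:0
# file3:1
grep -l 'thing' file1 file2 file3   # names of files with a match
# file1
# file3
grep -n 'thing' file1               # one file, so no name prefix
# 1:a thing
grep -v 'thing' file1               # lines that do NOT match
# an apple
grep -h 'thing' file1 file3         # no file-name headers
# a thing
# the thing again
grep -q 'thing' file1 && echo found # status only
# found
```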
Fgrep searches for lines that contain one of the
(newline-separated) strings.
Egrep accepts extended regular expressions. In the
following description 'character' excludes newline:
1. A \ followed by a single character matches that
   character.
2. The character ^ ($) matches the beginning (end) of a
   line.
3. A . matches any character.
4. A single character not otherwise endowed with special
   meaning matches that character.
5. A string enclosed in brackets [] matches any single
   character from the string. Ranges of ASCII character
   codes may be abbreviated as in 'a-z0-9'. A ] may occur
   only as the first character of the string. A literal -
   must be placed where it can't be mistaken as a range
   indicator.
6. A regular expression followed by * (+, ?) matches a
   sequence of 0 or more (1 or more, 0 or 1) matches of
   the regular expression.
7. Two regular expressions concatenated match a match of
   the first followed by a match of the second.
8. Two regular expressions separated by | or newline match
   either a match for the first or a match for the second.
9. A regular expression enclosed in parentheses matches a
   match for the regular expression.
The order of precedence of operators at the same parenthesis
level is [] then *+? then concatenation then | and newline.
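To illustrate the extended operators and fixed-string matching, here is a sketch on an invented word list. It assumes a modern POSIX system, where egrep is spelled grep -E and fgrep is spelled grep -F (GNU grep now warns that the old names are obsolescent).

```shell
#!/bin/sh
# Sketch: egrep-style and fgrep-style matching on invented words.
# Assumption: "grep -E" stands in for egrep, "grep -F" for fgrep.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'color\ncolour\ncolouur\na.b\naxb\n' > words

grep -E '^colou?r$' words        # ? : zero or one "u"
# color
# colour
grep -E '^colou+r$' words        # + : one or more "u"
# colour
# colouur
grep -E '^(color|a.b)$' words    # | inside parentheses; . is any char
# color
# a.b
# axb
grep -F 'a.b' words              # fixed string: "." is literal
# a.b
grep -Fx 'colour' words          # -x: whole-line matches only
# colour
```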
3.2
AWK: A Pattern Scanning and Processing Language
Awk is a programming language designed to make many common
information retrieval and text-manipulation tasks easy to
state and to perform. The basic operation of awk is to
search input lines consecutively for a match of any patterns
which the user has specified. For each pattern,
an action
can be specified; this action will be performed on each line
that matches the pattern.
In awk the patterns may be more general than in grep, and
the actions allowed are more involved than merely printing
the matching line. For example, the awk program
    {print $3, $2}
prints the third and second columns of a table in that
order. The program
    $2 ~ /A|B|C/
prints all input lines with an A, B, or C in the second
field. The program
    $1 != prev { print; prev = $1 }
prints all lines in which the first field is different from
the previous first field.
3.2.1
Usage
The command
    awk program [files]
executes the awk commands in the string program on the set
of named files, or on the standard input if there are no
files. The statements can also be placed in a file pfile,
and executed by the command
    awk -f pfile [files]
3.2.2
Program Structure
An awk program is a sequence of statements of the form:
    pattern { action }
    pattern { action }
    ...
Each line of input is matched against each of the patterns
in turn.
For each pattern that matches, the associated
action is executed. When all the patterns have been tested,
the next line is fetched and the matching starts over.
Either the pattern or the action may be left out,
but not
both.
If there is no action for a pattern, the matching
line is simply copied to the output.
(Thus a line which
matches several patterns can be printed several times.) If
there is no pattern for an action,
then the action is
performed for every input line. A line which matches no
pattern is ignored.
Since patterns and actions are both optional,
actions must
be enclosed in braces to distinguish them from patterns.
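The introductory example programs and the two invocation forms under Usage can be tried together. This sketch assumes a modern POSIX awk; the files table and pfile and their contents are invented. The program file shows that a pattern with no action copies the line, and that a line matching two patterns is printed twice.

```shell
#!/bin/sh
# Sketch: awk with a program string and with "awk -f pfile".
# The files "table" and "pfile" are invented for illustration.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'x A 1\nx B 2\ny D 3\ny C 4\n' > table

# Program strings are single-quoted so the shell leaves $1 alone.
awk '{ print $3, $2 }' table
# 1 A
# 2 B
# 3 D
# 4 C
awk '$2 ~ /A|B|C/' table                     # pattern only: copy line
# x A 1
# x B 2
# y C 4
awk '$1 != prev { print; prev = $1 }' table
# x A 1
# y D 3

# Two patterns, no actions, kept in a file; "x B 2" matches both.
cat > pfile <<'EOF'
/^x/
$3 >= 2
EOF
awk -f pfile table
# x A 1
# x B 2
# x B 2
# y D 3
# y C 4
```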
3.2.3
Records and Fields
Awk input is divided into "records" terminated by a record
separator. The default record separator is a newline, so by
default awk processes its input a line at a time.
The
number of the current record is available in a variable
named NR.
Each input record is considered to be divided into
"fields." Fields are normally separated by white space
(blanks or tabs), but the input field separator may be
changed, as described below. Fields are referred to as $1,
$2, and so forth, where $1 is the first field, and $0 is the
whole input record itself. Fields may be assigned to. The
number of fields in the current record is available in a
variable named NF.
The variables FS and RS refer to the input field and record
separators; they may be changed at any time to any single
character. The optional command-line argument -Fc may also
be used to set FS to the character c.
If the record separator is empty, an empty input line is
taken as the record separator, and blanks, tabs and newlines
are treated as field separators.
The variable FILENAME contains the name of the current input
file.
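A short sketch of these variables, assuming a modern POSIX awk and two invented input files. It uses a BEGIN action (described later in this chapter) to empty RS before any input is read.

```shell
#!/bin/sh
# Sketch: NR, NF, FILENAME, the -Fc flag, and an empty RS.
# The files "colon" and "paras" and their contents are invented.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'root:x:0\nbin:x:1\n' > colon
printf 'a b\nc\n\nd e f\n' > paras

# -F: sets FS to ":"; NR numbers the records, NF counts fields.
awk -F: '{ print NR, NF, $1, FILENAME }' colon
# 1 3 root colon
# 2 3 bin colon

# Empty RS: a blank line ends a record, and newlines also
# separate fields, so "a b" and "c" form one 3-field record.
awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }' paras
# 1: 3 fields
# 2: 3 fields
```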
3.2.4
Printing
An action may have no pattern, in which case the action is
executed for all lines.
The simplest action is to print some or all of a record;
this is accomplished by the awk command print. The awk
program
    { print }
prints each record, thus copying the input to the output
intact. More useful is to print a field or fields from each
record. For instance,
print $2, $1
prints the first two fields in reverse order.
Items
separated by a comma in the print statement will be
separated by the current output field separator when output.
Items not separated by commas will be concatenated, so
print $1 $2
runs the first and second fields together.
The predefined variables NF and NR can be used; for example
    { print NR, NF, $0 }
prints each record preceded by the record number and the
number of fields.
Output may be diverted to multiple files; the program
    { print $1 >"foo1"; print $2 >"foo2" }
writes the first field, $1, on the file foo1, and the second
field on file foo2. The >> notation can also be used:
    print $1 >>"foo"
appends the output to the file foo.
(In each case, the
output files are created if necessary.) The file name can be
a variable or a field as well as a constant; for example,
print $1 >$2
uses the contents of field 2 as a file name.
Naturally there is a limit on the number of output files;
currently it is 10. Similarly, output can be piped into
another process; for instance,
    print | "mail bwk"
mails the output to bwk.
The variables OFS and ORS may be used to change the current
output field separator and output record separator. The
output record separator is appended to the output of the
print statement.
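The print rules above (commas versus concatenation, OFS and ORS, redirection, and piping) can be checked with a sketch like this one. It assumes a modern POSIX awk; the files data, out1 and out2 are invented, and BEGIN (described later) is used only to set OFS and ORS before input is read.

```shell
#!/bin/sh
# Sketch: print with commas vs. concatenation, OFS/ORS, ">" and
# ">>" redirection, and a pipe.  File names are invented.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a b\nc d\n' > data

awk '{ print $2, $1 }' data       # comma: items joined by OFS
# b a
# d c
awk '{ print $1 $2 }' data        # no comma: items run together
# ab
# cd
awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $2 }' data
# a-b;
# c-d;
awk '{ print $1 > "out1"; print $2 >> "out2" }' data
cat out1 out2
# a
# c
# b
# d
awk '{ print | "sort -r" }' data  # output piped to another process
# c d
# a b
```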
Awk also provides the printf statement for output
formatting:
    printf format expr, expr, ...
formats the expressions in the list according to the
specification in format and prints them. For example,
    printf "%8.2f %10ld\n", $1, $2
prints $1 as a floating point number 8 characters wide, with
two digits after the decimal point, and $2 as a 10-digit
long decimal number, followed by a newline. No output
separators are produced automatically; you must add them
yourself, as in this example. The version of printf is
identical to that used with C.
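Here is a sketch of the printf example on invented input. It assumes a modern POSIX awk and writes %10d in place of the text's %10ld, since the "l" length modifier is not part of modern awk's printf. The "|" characters are added only to make the field widths visible.

```shell
#!/bin/sh
# Sketch: awk printf with fixed field widths.
# Assumption: %10d replaces the text's %10ld.
printf '3.14159 42\n2.5 7\n' |
awk '{ printf "|%8.2f|%10d|\n", $1, $2 }'
# |    3.14|        42|
# |    2.50|         7|
```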
3.2.5
Patterns
A pattern in front of an action acts as a selector that
determines whether the action is to be executed. A variety
of expressions may be used as patterns: regular expressions,
arithmetic relational expressions, string-valued
expressions, and arbitrary boolean combinations of these.
3.2.6
BEGIN and END
The special pattern BEGIN matches the beginning of the
input,
before the first record is read.
The pattern END
matches the end of the input, after the last record has been
processed. BEGIN and END thus provide a way to gain control
before and after processing, for initialization and wrapup.
As an example, the field separator can be set to a colon by
    BEGIN { FS = ":" }
    ... rest of program ...
Or the input lines may be counted by
    END { print NR }
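The two fragments above combine into one small program. This sketch assumes a modern POSIX awk and uses invented colon-separated input.

```shell
#!/bin/sh
# Sketch: BEGIN sets FS before the first record is read; END
# reports NR after the last one.  The input is invented.
printf 'root:x:0\nbin:x:1\ndaemon:x:2\n' |
awk 'BEGIN { FS = ":" }
           { print $1 }
     END   { print NR }'
# root
# bin
# daemon
# 3
```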
If BEGIN is present, it must be the first pattern; END must
be the last if used.
3.2.7
Regular Expressions
The simplest regular expression is a literal string of
characters enclosed in slashes, like
    /smith/
This is actually a complete awk program which will print all
lines which contain any occurrence of the name "smith".
If a line contains "smith" as part of a larger word, it
will also be printed, as in
    blacksmithing
Awk regular expressions include the regular expression forms
found in the XENIX text editor ed and grep (without
back-referencing). In addition, awk allows parentheses for
grouping, | for alternatives, + for "one or more", and ?
for "zero or one", all as in lex. Character classes may
be abbreviated:
    [a-zA-Z0-9]
is the set of all letters and digits. As an example, the awk
program
    /[Aa]ho|[Ww]einberger|[Kk]ernighan/
will print all lines which contain any of the names "Aho,"
"Weinberger" or "Kernighan," whether capitalized or not.
Regular expressions (with the extensions listed above) must
be enclosed in slashes, just as in ed and sed. Within a
regular expression, blanks and the regular expression
metacharacters are significant. To turn off the magic
meaning of one of the regular expression characters, precede
it with a backslash. An example is the pattern
    /\/.*\//
which matches any string of characters enclosed in slashes.
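The three regular-expression programs above can be run on a few invented lines. This sketch assumes a modern POSIX awk.

```shell
#!/bin/sh
# Sketch: awk regular-expression patterns on invented input lines.
lines='smith
blacksmithing
Aho
kernighan
weinberger
jones
see /usr/bin/ here'

printf '%s\n' "$lines" | awk '/smith/'     # also matches inside words
# smith
# blacksmithing
printf '%s\n' "$lines" | awk '/[Aa]ho|[Ww]einberger|[Kk]ernighan/'
# Aho
# kernighan
# weinberger
printf '%s\n' "$lines" | awk '/\/.*\//'    # escaped slashes
# see /usr/bin/ here
```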
One can also specify that any field or variable matches a
regular expression (or does not match it) with the operators
~ and !~. The program
    $1 ~ /[jJ]ohn/
prints all lines where the first field matches "john" or
"John." Notice that this will also match "Johnson",
"St. Johnsbury", and so on. To restrict it to exactly
[jJ]ohn, use
    $1 ~ /^[jJ]ohn$/
The caret ^ refers to the beginning of a line or field; the
dollar sign $ refers to the end.
3.2.8
Relational Expressions
An awk pattern can be a relational expression involving the
usual relational operators <, <=, ==, !=, >=, and >. An
example is
    $2 > $1 + 100
which selects lines where the second field is more than 100
greater than the first field. Similarly,
    NF % 2 == 0
prints lines with an even number of fields.
In relational tests, if neither operand is numeric, a string
comparison is made; otherwise it is numeric. Thus,
$1 >= "s"
selects lines that begin with an s, t,
u,
etc.
In the
absence of any other
information, fields are treated as
strings, so the program
$1 > $2
will perform a string comparison.
3.2.9
Combinations of Patterns
A pattern can be any boolean combination of patterns, using
the operators || (or), && (and), and ! (not). For example,
    $1 >= "s" && $1 < "t" && $1 != "smith"
selects lines where the first field begins with "s", but
is not "smith". && and || guarantee that their operands
will be evaluated from left to right; evaluation stops as
soon as the truth or falsehood is determined.
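The relational, match, and boolean patterns from the last three sections can be exercised together. This sketch assumes a modern POSIX awk; all input is invented.

```shell
#!/bin/sh
# Sketch: relational, match (~) and boolean patterns.
# All input below is invented sample data.
nums='10 200
50 120
7 900'
names='smith 1
stone 2
sam 3
tom 4
john 5
Johnson 6'

# Numeric comparison: second field more than 100 above the first.
printf '%s\n' "$nums" | awk '$2 > $1 + 100'
# 10 200
# 7 900

# Lines with an even number of fields.
printf 'a b\nc\nd e f g\n' | awk 'NF % 2 == 0'
# a b
# d e f g

# String comparisons joined by &&: starts with "s" but is not "smith".
printf '%s\n' "$names" | awk '$1 >= "s" && $1 < "t" && $1 != "smith"'
# stone 2
# sam 3

# ~ matches anywhere in the field; ^ and $ pin it to the whole field.
printf '%s\n' "$names" | awk '$1 ~ /[jJ]ohn/'
# john 5
# Johnson 6
printf '%s\n' "$names" | awk '$1 ~ /^[jJ]ohn$/'
# john 5
```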
3.2.10
Pattern Ranges
The "pattern" that selects an action may also consist of
two patterns separated by a comma, as in
    pat1, pat2 { ... }
In this case, the action is performed for each line between
an occurrence of pat1 and the next occurrence of pat2
(inclusive). For example,
    /start/, /stop/
prints all lines between start and stop, while
    NR == 100, NR == 200 { ... }
does the action for lines 100 through 200 of the input.
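A sketch of both range forms on invented input, assuming a modern POSIX awk. Note that a second "start" opens a new range, which runs to the end of input because no "stop" follows.

```shell
#!/bin/sh
# Sketch: pattern ranges on invented input.
printf 'a\nstart\nb\nstop\nc\nstart\nd\n' | awk '/start/, /stop/'
# start
# b
# stop
# start
# d
printf 'l1\nl2\nl3\nl4\n' |
awk 'NR == 2, NR == 3 { print "line " NR ": " $0 }'
# line 2: l2
# line 3: l3
```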
3.2.11
Actions
An awk action is a sequence of action statements terminated
by newlines or semicolons. These action statements can be
used to do a variety of bookkeeping and string manipulating
tasks.
3.2.12
Built-in Functions
Awk provides a "length" function to compute the length of
a string of characters. This program prints each record,
preceded by its length:
    { print length, $0 }
length by itself is a "pseudo-variable" which yields the
length of the current record; length(argument) is a function
which yields the length of its argument, as in the
equivalent
    { print length($0), $0 }
The argument may be any expression.
Awk also provides the arithmetic functions sqrt, log, exp,
log, exp,
and int, for square root, base e logarithm, exponential, and
integer part of their respective arguments.
The name of one of these built-in functions, without
argument or parentheses, stands for the value of the
function on the whole record. The program
    length < 10 || length > 20
prints lines whose length is less than 10 or greater than
20.
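Here is a sketch of length and the arithmetic functions on invented lines, assuming a modern POSIX awk. In such awks only length may be written without parentheses; the other functions are called with an argument.

```shell
#!/bin/sh
# Sketch: length (pseudo-variable form) and the arithmetic
# built-ins.  The text lines are invented.
text='short
this line is long enough
a medium line'

printf '%s\n' "$text" | awk '{ print length, $0 }'
# 5 short
# 24 this line is long enough
# 13 a medium line
printf '%s\n' "$text" | awk 'length < 10 || length > 20'
# short
# this line is long enough
awk 'BEGIN { print sqrt(16), int(3.9), int(-3.9), exp(0), log(1) }'
# 4 3 -3 1 0
```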
The function substr(s, m, n) produces the substring of
s that begins at position m (origin 1) and is at most n
characters long. If n is omitted, the substring goes to the
end of s. The function index(s1, s2) returns the position
where the string s2 occurs in s1, or zero if it does not.
The function sprintf(f, e1, e2, ...) produces the value of
the expressions e1, e2, etc., in the printf format specified
by f. Thus, for example,
    x = sprintf("%8.2f %10ld", $1, $2)
sets x to the string produced by formatting the values of $1
and $2.
3.2.13
Variables, Expressions, and Assignments
Awk variables take on numeric (floating point) or string
values according to context. For example, in
x = 1
x is clearly a number, while in
x = "smith"
it is clearly a string. Strings are converted to numbers
and vice versa whenever context demands it. For instance,
x = "3" + "4"
assigns 7 to x. Strings which cannot be interpreted as
numbers in a numerical context will generally have numeric
value zero, but it is unwise to count on this behavior.
By default, variables (other than built-ins) are initialized
to the null string, which has numerical value zero; this
eliminates the need for most BEGIN sections. For example,
the sums of the first two fields can be computed by
        { s1 += $1; s2 += $2 }
    END { print s1, s2 }
Arithmetic is done internally in floating point. The
arithmetic operators are +, -, *, /, and % (mod). The C
increment ++ and decrement -- operators are also available,
and so are the assignment operators +=, -=, *=, /=, and %=.
These operators may all be used in expressions.
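The following sketch exercises the conversion and initialization rules above, together with the string built-ins substr, index and sprintf from the Built-in Functions section. It assumes a modern POSIX awk; all values are invented.

```shell
#!/bin/sh
# Sketch: string/number conversion, default initialization, the
# arithmetic operators, and substr/index/sprintf.
awk 'BEGIN { x = "3" + "4"; print x }'
# 7
printf '1 10\n2 20\n3 30\n' |
awk '{ s1 += $1; s2 += $2 } END { print s1, s2 }'
# 6 60
awk 'BEGIN { n = 7; n %= 4; n++; print n, 7 / 2 }'
# 4 3.5
awk 'BEGIN {
    s = "blacksmithing"
    print substr(s, 6, 5)                        # smith
    print substr(s, 11)                          # ing (n omitted)
    print index(s, "smith"), index(s, "jones")   # 6 0
    x = sprintf("[%8.2f]", 3.14159)
    print x                                      # [    3.14]
}'
```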
3.2.14
Field Variables
Fields in awk share essentially all of the properties of
variables: they may be used in arithmetic or string
operations, and may be assigned to. Thus one can replace
the first field with a sequence number like this:
    { $1 = NR; print }
or accumulate two fields into a third, like this:
    { $1 = $2 + $3; print $0 }
or assign a string to a field:
    { if ($3 > 1000)
          $3 = "too big"
      print
    }
which replaces the third field by "too big" when it is,
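A sketch of the three field assignments above on invented records, assuming a modern POSIX awk, where assigning a field rebuilds $0 using OFS (a single blank by default).

```shell
#!/bin/sh
# Sketch: assigning to fields, on invented records.
data='a 5 2000
b 7 3'

printf '%s\n' "$data" | awk '{ $1 = NR; print }'
# 1 5 2000
# 2 7 3
printf '%s\n' "$data" | awk '{ $1 = $2 + $3; print $0 }'
# 2005 5 2000
# 10 7 3
printf '%s\n' "$data" | awk '{ if ($3 > 1000)
                                   $3 = "too big"
                               print
                             }'
# a 5 too big
# b 7 3
```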
Field references may be numerical expressions, as in
    { print $i, $(i+1), $(i+n) }
Whether a field is deemed numeric or string depends on
context; in ambiguous cases like
    if ($1 == $2) ...
fields are treated as strings.
Each input line is split into fields automatically as
necessary. It is also possible to split any variable or
string into fields:
    n = split(s, array, sep)
splits the string s into array[1], ..., array[n]. The
number of elements found is returned. If the sep argument
is provided, it is used as the field separator; otherwise FS
is used as the separator.
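A sketch of computed field references and split(), assuming a modern POSIX awk; the strings and the array names part and w are invented.

```shell
#!/bin/sh
# Sketch: computed field references and split(), on invented data.
echo 'a b c d' | awk '{ i = 2; n = 2; print $i, $(i+1), $(i+n) }'
# b c d
awk 'BEGIN {
    n = split("2024:03:15", part, ":")   # explicit separator
    print n, part[1], part[3]            # 3 2024 15
    m = split("x y  z", w)               # no sep: FS (blanks) is used
    print m, w[2]                        # 3 y
}'
```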
3.2.15
String Concatenation
Strings may be concatenated. For example
    length($1 $2 $3)
returns the length of the first three fields, taken
together. Or in a print statement,
    print $1 " is " $2
prints the two fields separated by " is ". Variables and
numeric expressions may also appear in concatenations.
3.2.16
Arrays
Array elements are not declared; they spring into existence
by being mentioned. Subscripts may have any non-null value,
including non-numeric strings. As an example of a
conventional numeric subscript, the statement
    x[NR] = $0
assigns the current input record to the NR-th element of the
array x. In fact, it is possible in principle (though
perhaps slow) to process the entire input in a random order
with the awk program
        { x[NR] = $0 }
    END { ... program ... }
The first action merely records each input line in the array
x.
Array elements may be named by non-numeric values, which
gives awk a capability rather like the associative memory of
Snobol tables. Suppose the input contains fields with
values like apple, orange, etc. Then the program
    /apple/  { x["apple"]++ }
    /orange/ { x["orange"]++ }
    END      { print x["apple"], x["orange"] }
increments counts for the named array elements, and prints
them at the end of the input.
Any expression can be used as a subscript in an array
reference. Thus
    x[$1] = $2
uses the first field of a record (as a string) to index the
array x.
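As a sketch of associative subscripts, here are the fruit-counting program above and the name-summing program described in the next paragraph, run on invented input with a modern POSIX awk. Because "for (name in amount)" visits elements in no particular order, the sums are piped through sort.

```shell
#!/bin/sh
# Sketch: associative arrays on invented input.
printf 'apple red\norange big\napple green\n' |
awk '/apple/  { x["apple"]++ }
     /orange/ { x["orange"]++ }
     END      { print x["apple"], x["orange"] }'
# 2 1
printf 'ann 3\nbob 4\nann 5\n' |
awk '    { amount[$1] += $2 }
     END { for (name in amount)
               print name, amount[name] | "sort"
         }'
# ann 8
# bob 4
```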
Suppose each line of input contains two fields, a name and a
non-zero value. Names may be repeated; the task is to print
a list of each unique name followed by the sum of all the
values for that name. This can be done with the program
        { amount[$1] += $2 }
    END { for (name in amount)
              print name, amount[name]
        }
To sort the output, replace the print line by
    print name, amount[name] | "sort"
3.2.17
Flow-of-Control Statements
Awk provides the basic flow-of-control statements if-else,
while, for, and statement grouping with braces, as in C.
The if statement was previously introduced without
description. The condition in parentheses is evaluated; if
it is true, the statement following the if is done. The
else part is optional.
The while statement is exactly like that of C. For example,
to print all input fields one per line,
    i = 1
    while (i <= NF) {
        print $i
        ++i
    }
The for statement is also exactly that of C:
    for (i = 1; i <= NF; i++)
        print $i
does the same job as the while statement above.
There is an alternate form of the for statement which is
suited for accessing the elements of an associative array:
    for (i in array)
        statement
does statement with i set in turn to each element
(subscript) of array. The elements are accessed in an
apparently random order. Chaos will ensue if i is altered,
or if any new elements are accessed during the loop.
The expression in the condition part of an if, while or for
can include relational operators like <, <=, >, >=, == ("is
equal to"), and != ("not equal to"); regular expression
matches with the match operators ~ and !~; the logical
operators ||, &&, and !; and of course parentheses for
grouping.
The break statement causes an immediate exit from an
enclosing while or for; the continue statement causes the
next iteration to begin. The statement next causes awk to
skip immediately to the next record and begin scanning the
patterns from the top. The statement exit causes the
program to behave as if the end of the input had occurred.
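The statements above can be combined in a short sketch, assuming a modern POSIX awk and invented input. The second program also shows the "entire input in any order" idea from the Arrays section, here simply reversing the lines.

```shell
#!/bin/sh
# Sketch: while, for, break, next and exit on invented input.
echo 'a b c' | awk '{ i = 1; while (i <= NF) { print $i; ++i } }'
# a
# b
# c
# Store every record, then walk the array backwards in END.
printf 'one\ntwo\nthree\n' |
awk '{ x[NR] = $0 } END { for (i = NR; i > 0; i--) print x[i] }'
# three
# two
# one
awk 'BEGIN { for (i = 1; i <= 10; i++) { if (i > 3) break; print i } }'
# 1
# 2
# 3
printf '# note\ndata\n' | awk '/^#/ { next }   # skip comment lines
                               { print }'
# data
printf '1\n2\nstop\n3\n' | awk '$0 == "stop" { exit } { print }'
# 1
# 2
```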
Comments may be placed in awk programs: they begin with the
character # and end with the end of the line, as in
    print x, y    # this is a comment
3.3
DIFF
Diff is a program to compare two files, using the form:
    diff [ -efbh ] file1 file2
Diff tells what lines must be changed in two files to bring
them into agreement. If file1 (file2) is '-', the standard
input is used. If file1 (file2) is a directory, then a file
in that directory whose file name is the same as the file
name of file2 (file1) is used. The normal output contains
lines of these forms:
    n1 a n3,n4
    n1,n2 d n3
    n1,n2 c n3,n4
These lines resemble ed commands to convert file1 into
file2. The numbers after the letters pertain to file2. In
fact, by exchanging 'a' for 'd' and reading backward one may
ascertain equally how to convert file2 into file1. As in
ed, identical pairs where n1 = n2 or n3 = n4 are abbreviated
as a single number.
Following each of these lines come all the lines that are
affected in the first file flagged by '<', then all the
lines that are affected in the second file flagged by '>'.
The -b option causes trailing blanks (spaces and tabs) to be
ignored and other strings of blanks to compare equal.
The -e option produces a script of a, c and d commands for
the editor ed, which will recreate file2 from file1. The -f
option produces a similar script, not useful with ed, in the
opposite order. In connection with -e, the following shell
program may help maintain multiple versions of a file. Only
an ancestral file ($1) and a chain of version-to-version ed
scripts ($2, $3, ...) made by diff need be on hand. A 'latest
version' appears on the standard output.
    (shift; cat $*; echo '1,$p') | ed - $1
Except in rare circumstances, diff finds a smallest
sufficient set of file differences.
Option -h does a fast, half-hearted job. It works only when
changed stretches are short and well separated, but does
work on files of unlimited length. Options -e and -f are
unavailable with -h.
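A sketch of diff's normal output, the -e script, and -b on invented files. It assumes a modern diff such as GNU diffutils, which has no -h option, and it only prints the ed script rather than applying it.

```shell
#!/bin/sh
# Sketch: normal diff output, an ed script from -e, and -b.
# The files old, new, sp1 and sp2 are invented for illustration.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\nc\n' > old
printf 'a\nB\nc\nd\n' > new

diff old new
# 2c2
# < b
# ---
# > B
# 3a4
# > d

# -e: ed commands that turn old into new, last change first.
diff -e old new
# 3a
# d
# .
# 2c
# B
# .

# -b: differences in the amount of blank space are ignored.
printf 'x  y\n' > sp1
printf 'x y \n' > sp2
diff -b sp1 sp2 && echo same
# same
```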
3.4
DIFF3
Diff3 is a program for 3-way differential file comparison,
stated in the form:
    diff3 [ -ex3 ] file1 file2 file3
Diff3 compares three versions of a file, and publishes
disagreeing ranges of text flagged with these codes:
    ====     all three files differ
    ====1    file1 is different
    ====2    file2 is different
    ====3    file3 is different
The type of change suffered in converting a given range of a
given file to some other is indicated in one of these ways:
    f : n1 a        Text is to be appended after line number
                    n1 in file f, where f = 1, 2, or 3.
    f : n1 , n2 c   Text is to be changed in the range line
                    n1 to line n2. If n1 = n2, the range may
                    be abbreviated to n1.
The original contents of the range follows immediately after
a c indication. When the contents of two files are
identical, the contents of the lower-numbered file is
suppressed.
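A minimal sketch of diff3 on three invented versions in which only the second differs, so the hunk is flagged ====2. It assumes a modern diff3 (for example GNU diffutils). The exact layout of the range lines can differ between implementations, so only the flag line is relied on here.

```shell
#!/bin/sh
# Sketch: diff3 with only the second version different.
# The files v1, v2 and v3 are invented for illustration.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\nc\n' > v1
printf 'a\nB\nc\n' > v2
printf 'a\nb\nc\n' > v3
diff3 v1 v2 v3
# ====2
# (followed by the f : n1 c ranges and the text of each file)
```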
Under the -e option, diff3 publishes a script for the editor
ed that will incorporate into file1 all changes between
file2 and file3, i.e. the changes that normally would be
flagged ==== and ====3. Option -x (-3) produces a script to
incorporate only changes flagged ==== (====3). The following
command will apply the resulting script to 'file1':
    (cat script; echo '1,$p') | ed - file1
3.5
COMM
Comm selects or rejects lines common to two sorted files.
It is expressed in the form:
    comm [ -[123] ] file1 file2
Comm reads file1 and file2, which should be ordered in ASCII
collating sequence, and produces a three-column output:
lines only in file1; lines only in file2; and lines in both
files. The filename '-' means the standard input.
Flags 1, 2, or 3 suppress printing of the corresponding
column. Thus comm -12 prints only the lines common to the
two files; comm -23 prints only lines in the first file but
not in the second; comm -123 is a no-op.
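A sketch of the three flag combinations on two invented word lists. It assumes a modern POSIX system and sets LC_ALL=C so that sort and comm agree on a byte-wise (ASCII) collating order.

```shell
#!/bin/sh
# Sketch: comm on two sorted, invented word lists.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'cherry\napple\nbanana\n' | sort > f1
printf 'date\nbanana\ncherry\n' | sort > f2

comm -12 f1 f2     # lines in both files
# banana
# cherry
comm -23 f1 f2     # lines only in f1
# apple
comm -13 f1 f2     # lines only in f2
# date
```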
3.6
SPELL
Spell collects words from the specified files, and looks
them up in a spelling list. Words that neither occur among
nor are derivable (by applying inflections, prefixes or
suffixes) from words in the spelling list are printed on the
standard output. If no files are named, words are collected
from the standard input. Spell is used with the following
format:
    spell [ option ] ... [ file ] ...
    /usr/src/cmd/spell/spellin [ list ]
    /usr/src/cmd/spell/spellout [ -d ] list
Spell ignores most troff, tbl and eqn constructions. Under
the -v option, all words not literally in the spelling list
are printed, and plausible derivations from spelling list
words are indicated. Under the -b option, British spelling
is checked. Besides preferring centre, colour, speciality,
travelled, etc., this option insists upon -ise in words like
standardise, Fowler and the OED to the contrary
notwithstanding. Under the -x option, every plausible stem
is printed with '=' for each word.
The spelling list is based on many sources, and while more
haphazard than an ordinary dictionary, is also more
effective in respect to proper names and popular technical
words. Coverage of the specialized vocabularies of biology,
medicine and chemistry is light. Pertinent auxiliary files
may be specified by name arguments, indicated below with
their default settings. Copies of all output are
accumulated in the history file. The stop list filters out
misspellings (e.g. thier=thy-y+ier) that would otherwise
pass.
Two routines help maintain the hash lists used by spell.
Both expect a list of words, one per line, from the standard
input.
Spellin adds the words on the standard input to the
preexisting list and places a new list on the standard
output.
If no list is specified, the new list is created
from scratch. Spellout looks up each word in the standard
input and prints on the standard output those that are
missing.
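The basic idea behind spell (collect the words, then report those missing from a list) can be approximated with the tools from this chapter. The following is a crude stand-in, not how spell itself works: it applies no inflection rules, ignores no troff constructs, and uses a plain sorted word list (the invented file dict) instead of spell's hashed lists. It assumes a modern POSIX system.

```shell
#!/bin/sh
# Sketch only: a spell-like check built from tr, sort and comm.
# "dict" and "doc" are invented; no derivation rules are applied.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'cat\nmat\non\nsat\nthe\n' > dict
printf 'The cat sat on teh mat.\n' > doc

# One word per line, lower-cased, sorted and de-duplicated; then
# keep only the words not found in dict (comm column 1).
tr -cs 'A-Za-z' '\n' < doc | tr 'A-Z' 'a-z' | sort -u | comm -23 - dict
# teh
```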
CHAPTER
4
TEXT FORMATTING AND DOCUMENT PREPARATION
In addition to the text editors, and pattern recognition and
file comparison programs that simplify the work of creating
and modifying files for text processing, the XENIX system
offers text formatting packages which simplify the
production of technical reports, memoranda, formal papers,
and documentation, as well as several specialized programs
for specifying the final output of tables, mathematical
equations, and bibliographic references.
There are two major formatting programs available with
XENIX.
These programs produce a text with justified right
margins, automatic page numbering and titling, automatic
hyphenation, and many special features. nroff is designed
to produce output on terminals and line-printers.
troff
(pronounced "tee-roff") instead drives a phototypesetter,
which produces very high quality output on photographic
paper. This document is itself an example of troff output.
4.1
FORMATTING PACKAGES
The basic idea of nroff and troff is that text is
interspersed with "formatting commands" that specify in
detail how the final output is to look. Typically, these
include commands that specify line length, spacing, and
running titles.
Because nroff and troff are relatively hard to learn to use
effectively, several "packages" of canned formatting
requests compatible with nroff and troff have been designed
to allow the user to specify paragraphs, running titles,
footnotes, multi-column output, and so on, with less effort
and without having to learn nroff and troff. In this
chapter, the "manuscript" package known as -ms is
described in detail. To actually produce a document in
standard format using -ms, use the command
    troff -ms files ...
for the typesetter, and
    nroff -ms files ...
for a terminal. The -ms argument tells troff and nroff to
use the manuscript package of formatting requests.
4.2
SUPPORTING TOOLS
In addition to the basic formatters, there are also some
supporting programs that aid in document preparation. For
example, eqn integrates mathematical symbols and equations
into the text of a document. The program tbl provides an
analogous service for preparing tabular material; it does
all the computations necessary to align complicated columns
with elements of varying widths.
Finally, refer prepares
bibliographic citations from a data base, in whatever style
is defined by the formatting package. It looks after all
the details of numbering references in sequence, filling in
page and volume numbers, getting the author's initials and
the journal name right, and so on.
4.3
HINTS FOR PREPARING DOCUMENTS
Most documents go through several revisions before they are
finally finished; some simple measures will make the work of
changing them considerably easier. Since most people change
documents by rewriting phrases and adding, deleting or
rearranging sentences, subsequent editing of text will be
simpler if each sentence starts on a new line, and if each
line is short, and breaks at a natural place, such as after
a comma or semicolon.
Documents should be broken down into individual files of
reasonable size, perhaps ten to fifteen thousand characters.
Operations on larger files are considerably slower, and the
accidental loss of a small file is less catastrophic than a
large one. The files should be split at natural boundaries
in the document, and named with conventions that allow them
to be processed in groups.
One of the advantages of formatting packages like -ms is
that they allow formatting decisions to be delayed until the
document is printed or typeset.
If a document is typed
initially with generalized formatting commands like .PP,
they can be defined appropriately, as necessary, either with
a canned package like -ms, or with user-defined nroff and
troff commands. If the text has been entered in some
systematic way, it is easier to revise.
4.4
A NOTE ABOUT THE PAPERS
What follows is a group of independent papers about -ms, the
formatting packages nroff and troff, and some of the
specialized formatting programs, including tbl, eqn, and
refer. Keep in mind that although these papers were written
about UNIX, the operating system from which XENIX is
derived, all references to UNIX are equally applicable to
XENIX. These papers were written largely by the authors of
the programs, using the tools they describe quite
extensively. Hence the papers are in themselves excellent
examples of the final output of text formatted with these
programs.
Typing Documents on the UNIX System:
Using the -ms Macros with Troff and Nroff
M. E. Lesk
Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
This document describes a set of easy-to-use macros for preparing documents on the UNIX system. Documents may be produced on either the phototypesetter or on a computer terminal, without changing the input.
The macros provide facilities for paragraphs, sections (optionally with
automatic numbering), page titles, footnotes, equations, tables, two-column
format, and cover pages for papers.
This memo includes, as an appendix, the text of the "Guide to Preparing
Documents with -ms" which contains additional examples of features of
-ms.
This manual is a revision of, and replaces, "Typing Documents on
UNIX," dated November 22, 1974.
November 13, 1978
Introduction. This memorandum describes a package of commands to produce papers
using the troff and nroff formatting programs on the UNIX system. As with other roff-derived
programs, text is prepared interspersed with formatting commands. However, this package,
which itself is written in troff commands, provides higher-level commands than those provided
with the basic troff program. The commands available in this package are listed in Appendix A.
Text. Type normally, except that instead of indenting for paragraphs, place a line reading
".PP" before each paragraph. This will produce indenting and extra space.
Alternatively, the command .LP that was used here will produce a left-aligned (block) paragraph. The paragraph spacing can be changed: see below under "Registers."
Beginning. For a document with a paper-type cover sheet, the input should start as follows:
    [optional overall format .RP - see below]
    .TL
    Title of document (one or more lines)
    .AU
    Author(s) (may also be several lines)
    .AI
    Author's institution(s)
    .AB
    Abstract; to be placed on the cover sheet of a paper.
    Line length is 5/6 of normal; use .ll here to change.
    .AE (abstract end)
    text ... (begins with .PP, which see)
To omit some of the standard headings (e.g. no abstract, or no author's institution) just omit
the corresponding fields and command lines. The word ABSTRACT can be suppressed by writing
".AB no" for ".AB". Several interspersed .AU and .AI lines can be used for multiple authors.
The headings are not compulsory: beginning with a .PP command is perfectly OK and will just
start printing an ordinary paragraph. Warning: You can't just begin a document with a line of
text. Some -ms command must precede any text input. When in doubt, use .LP to get
proper initialization, although any of the commands .PP, .LP, .TL, .SH, .NH is good enough.
Figure 1 shows the legal arrangement of commands at the start of a document.
Cover Sheets and First Pages. The first line of a document signals the general format of
the first page. In particular, if it is ".RP" a cover sheet with title and abstract is prepared. The
default format is useful for scanning drafts.
In general -ms is arranged so that only one form of a document need be stored, containing all information; the first command gives the format, and unnecessary items for that format
are ignored.
Warning: don't put extraneous material between the .TL and .AE commands. Processing
of the titling items is special, and other data placed in them may not behave as you expect.
Don't forget that some -ms command must precede any input text.
Page headings. The -ms macros, by default, will print a page heading containing a page
number (if greater than 1). A default page footer is provided only in nroff, where the date is
used. The user can make minor adjustments to the page headings/footings by redefining the
strings LH, CH, and RH which are the left, center and right portions of the page headings,
respectively; and the strings LF, CF, and RF, which are the left, center and right portions of
the page footer. For more complex formats, the user can redefine the macros PT and BT,
which are invoked respectively at the top and bottom of each page. The margins (taken from
registers HM and FM for the top and bottom margin respectively) are normally 1 inch; the page
header/footer are in the middle of that space. The user who redefines these macros should be
careful not to change parameters such as point size or font without resetting them to default
values.
Multi-column formats. If you place the command ".2C" in your document, the document will be printed in double column format beginning at that point. This feature is not too useful in computer terminal output, but is often desirable on the typesetter. The command ".1C" will go back to one-column format and also skip to a new page. The ".2C" command is actually a special case of the command
.MC [column width [gutter width]]
which makes multiple columns with the specified column and gutter width; as many columns as will fit across the page are used. Thus triple, quadruple, ... column pages can be printed. Whenever the number of columns is changed (except going from full width to some larger number of columns) a new page is started.
Headings. To produce a special heading, there are two commands. If you type
.NH
type section heading here
may be several lines
you will get automatically numbered section headings (1, 2, 3, ...), in boldface. For example,
.NH
Care and Feeding of Department Heads
produces
1. Care and Feeding of Department Heads
Alternatively,
.SH
Care and Feeding of Directors
will print the heading with no number added:
Care and Feeding of Directors
Every section heading, of either type, should be followed by a paragraph beginning with .PP or .LP, indicating the end of the heading. Headings may contain more than one line of text.
The .NH command also supports more complex numbering schemes. If a numerical argument is given, it is taken to be a "level" number and an appropriate sub-section number is generated. Larger level numbers indicate deeper sub-sections, as in this example:
.NH
Erie-Lackawanna
.NH 2
Morris and Essex Division
.NH 3
Gladstone Branch
.NH 3
Montclair Branch
.NH 2
Boonton Line
generates:
2. Erie-Lackawanna
2.1. Morris and Essex Division
2.1.1. Gladstone Branch
2.1.2. Montclair Branch
2.2. Boonton Line
An explicit ".NH 0" will reset the numbering of level 1 to one, as here:
.NH 0
Penn Central
1. Penn Central
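The level-numbering rule for .NH can be checked with a small Python sketch. It is a toy model of the behavior described above, not the -ms implementation (which is written in troff macros). The function name number_headings is hypothetical, a bare .NH is passed as level 1, and padding a skipped level with 0 is an assumption of the sketch.

```python
def number_headings(levels):
    """Toy model of .NH labels; levels holds each heading's level argument."""
    counters = []  # counters[i] is the current number at level i + 1
    labels = []
    for level in levels:
        if level == 0:
            # ".NH 0": restart level-1 numbering at one, drop all sub-levels
            counters = [1]
        else:
            # Assumption: a skipped level is padded with 0 before counting
            while len(counters) < level:
                counters.append(0)
            counters = counters[:level]  # deeper levels start over
            counters[-1] += 1
        labels.append(".".join(str(n) for n in counters) + ".")
    return labels


# One earlier .NH, then the railroad example, then ".NH 0"
print(number_headings([1, 1, 2, 3, 3, 2, 0]))
# ['1.', '2.', '2.1.', '2.1.1.', '2.1.2.', '2.2.', '1.']
```

Keeping one counter per level and truncating the list on each heading is what makes a shallower heading (such as .NH 2 after .NH 3) discard the deeper numbers.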
Indented paragraphs. (Paragraphs with hanging numbers, e.g. references.) The
sequence
.IP [1]
Text for first paragraph, typed
normally for as long as you would
like on as many lines as needed.
.IP [2]
Text for second paragraph, ...
produces
[1] Text for first paragraph, typed normally for as long as you would like on
as many lines as needed.
[2] Text for second paragraph, ...
A series of indented paragraphs may be followed by an ordinary paragraph beginning
with .PP or .LP, depending on whether you
wish indenting or not. The command .LP
was used here.
More sophisticated uses of .IP are also
possible. If the label is omitted, for example, a plain block indent is produced.
.IP
This material will
just be turned into a
block indent suitable for quotations or
such matter.
.LP
will produce
This material will just be turned into a
block indent suitable for quotations or
such matter.
If a non-standard amount of indenting is
required, it may be specified after the label
(in character positions) and will remain in
effect until the next .PP or .LP. Thus, the
general form of the .IP command contains
two additional fields: the label and the
indenting length. For example,
.IP first: 9
Notice the longer label, requiring larger
indenting for these paragraphs.
.IP second:
And so forth.
.LP
produces this:
first:    Notice the longer label, requiring
          larger indenting for these paragraphs.
second:   And so forth.
It is also possible to produce multiple nested
indents; the command .RS indicates that the
next .IP starts from the current indentation
level. Each .RE will eat up one level of
indenting so you should balance .RS and
.RE commands. The .RS command should
be thought of as "move right" and the .RE
command as "move left". As an example
.IP 1.
Bell Laboratories
.RS
.IP 1.1
Murray Hill
.IP 1.2
Holmdel
.IP 1.3
Whippany
.RS
.IP 1.3.1
Madison
.RE
.IP 1.4
Chester
.RE
.LP
will result in
1. Bell Laboratories
    1.1 Murray Hill
    1.2 Holmdel
    1.3 Whippany
        1.3.1 Madison
    1.4 Chester
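The "move right"/"move left" rule amounts to a depth counter that must return to zero. The Python sketch below is a toy model under stated assumptions: it looks only at the control lines of the example (text lines are omitted), the name ip_depths is hypothetical, and it models nesting depth rather than the actual indent widths -ms uses.

```python
def ip_depths(commands):
    """Toy model: nesting depth of each .IP label under .RS/.RE."""
    depth = 0
    result = []
    for cmd in commands:
        if cmd == ".RS":
            depth += 1  # "move right": later .IP paragraphs nest deeper
        elif cmd == ".RE":
            if depth == 0:
                raise ValueError(".RE without a matching .RS")
            depth -= 1  # "move left": eat up one level of indenting
        elif cmd.startswith(".IP "):
            result.append((cmd[len(".IP "):], depth))
    if depth != 0:
        raise ValueError("unbalanced .RS and .RE")
    return result


example = [".IP 1.", ".RS", ".IP 1.1", ".IP 1.2", ".IP 1.3", ".RS",
           ".IP 1.3.1", ".RE", ".IP 1.4", ".RE", ".LP"]
for label, depth in ip_depths(example):
    print("    " * depth + label)  # reproduces the indented outline above
```

Raising an error on an unmatched .RE or a leftover .RS mirrors the advice to balance the two commands.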
All of these variations on .LP leave the right
margin untouched. Sometimes, for purposes such as setting off a quotation, a paragraph indented on both right and left is
required.
A single paragraph like this is
obtained by preceding it with
.QP. More complicated material
(several paragraphs) should be
bracketed with .QS and .QE.
Emphasis. To get italics (on the typesetter) or underlining (on the terminal) say
.I
as much text as you want
can be typed here
.R
as was done for these three words. The .R command restores the normal (usually Roman) font. If only one word is to be italicized, it may be just given on the line with the .I command,
.I word
and in this case no .R is needed to restore the previous font. Boldface can be produced by
.B
Text to be set in boldface
goes here
.R
and also will be underlined on the terminal or line printer. As with .I, a single word can be placed in boldface by placing it on the same line as the .B command.
A few size changes can be specified similarly with the commands .LG (make larger), .SM (make smaller), and .NL (return to normal size). The size change is two points; the commands may be repeated for increased effect (here one .NL canceled two .SM commands).
If actual underlining as opposed to italicizing is required on the typesetter, the command
.UL word
will underline a word. There is no way to underline multiple words on the typesetter.
Footnotes. Material placed between lines with the commands .FS (footnote) and .FE (footnote end) will be collected, remembered, and finally placed at the bottom of the current page*. By default, footnotes are 11/12th the length of normal text, but this can be changed using the FL register (see below).
* Like this.
Displays and Tables. To prepare displays of lines, such as tables, in which the lines should not be re-arranged, enclose them in the commands .DS and .DE
.DS
table lines, like the
examples here, are placed
between .DS and .DE
.DE
By default, lines between .DS and .DE are indented and left-adjusted. You can also center lines, or retain the left margin. Lines bracketed by .DS C and .DE commands are centered (and not re-arranged); lines bracketed by .DS L and .DE are left-adjusted, not indented, and not re-arranged. A plain .DS is equivalent to .DS I, which indents and left-adjusts. Thus,
          these lines were preceded
           by .DS C and followed by
              a .DE command;
whereas
these lines were preceded
by .DS L and followed by
a .DE command.
Note that .DS C centers each line; there is a variant .DS B that makes the display into a left-adjusted block of text, and then centers that entire block. Normally a display is kept together, on one page. If you wish to have a long display which may be split across page boundaries, use .CD, .LD, or .ID in place of the commands .DS C, .DS L, or .DS I respectively. An extra argument to the .DS I or .DS command is taken as an amount to indent. Note: it is tempting to assume that .DS R will right adjust lines, but it doesn't work.
Boxing words or lines. To draw rectangular boxes around words the command
.BX word
will print |word| as shown. The boxes will not be neat on a terminal, and this should not be used as a substitute for italics.
Longer pieces of text may be boxed by enclosing them with .B1 and .B2:
.B1
text...
.B2
as has been done here.
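The two-point rule for .LG, .SM, and .NL described under Emphasis can be traced with a short Python sketch. This is an illustration only: the function name track_sizes is hypothetical, the normal size of 10 points is taken from the PS default in the register table, and the sketch ignores any limit on which sizes the typesetter can actually produce.

```python
def track_sizes(commands, normal=10):
    """Toy model of .LG/.SM/.NL; normal=10 assumes the default PS register."""
    size = normal
    sizes = []
    for cmd in commands:
        if cmd == ".LG":
            size += 2      # make larger by two points
        elif cmd == ".SM":
            size -= 2      # make smaller by two points
        elif cmd == ".NL":
            size = normal  # one .NL cancels any number of .LG/.SM
        else:
            raise ValueError(f"not a size command: {cmd}")
        sizes.append(size)
    return sizes


print(track_sizes([".SM", ".SM", ".NL"]))  # [8, 6, 10]
```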
Keeping blocks together. If you wish
to keep a table or other block of lines
together on a page, there are "keep-release" commands. If a block of lines preceded by .KS and followed by .KE does not
fit on the remainder of the current page, it
will begin on a new page. Lines bracketed
by .DS and .DE commands are automatically
kept together this way. There is also a
"keep floating" command: if the block to be
kept together is preceded by .KF instead of
.KS and does not fit on the current page, it
will be moved down through the text until
the top of the next page. Thus, no large
blank space will be introduced in the document.
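The difference between a plain keep and a floating keep can be seen in a small page-filling simulation. This Python sketch is a toy model, not how -ms computes page breaks: the function name paginate and the (kind, label, lines) item format are hypothetical, each running-text item is one line, and every block is assumed to fit on an empty page.

```python
def paginate(items, page_len):
    """Toy model of where .KS and .KF blocks land.

    items: (kind, label, lines) tuples, kind in {"text", "KS", "KF"}.
    Assumes each item fits on an empty page; text is one line per item.
    Returns a list of pages, each a list of labels in output order.
    """
    pages = [[]]
    used = 0
    floating = []  # .KF blocks waiting for the top of the next page

    def start_page():
        nonlocal used
        pages.append([])
        used = 0
        while floating:  # deferred floating keeps go first
            label, n = floating.pop(0)
            pages[-1].append(label)
            used += n

    for kind, label, n in items:
        fits = used + n <= page_len
        if kind == "KF" and (floating or not fits):
            floating.append((label, n))  # text keeps filling this page
            continue
        if not fits:
            start_page()  # a .KS block (or text) moves to a new page
        pages[-1].append(label)
        used += n
    if floating:
        start_page()
    return pages


before = [("text", f"t{i}", 1) for i in range(1, 7)]
after = [("text", f"t{i}", 1) for i in range(7, 10)]
print(paginate(before + [("KS", "table", 5)] + after, page_len=10))
print(paginate(before + [("KF", "table", 5)] + after, page_len=10))
```

With .KS the first page stops after t6 and its last four lines stay blank; with .KF the lines t7 through t9 fill that space and the table moves to the top of the next page.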
Nroff/Troff commands. Among the
useful commands from the basic formatting
programs are the following. They all work
with both typesetter and computer terminal
output:
.bp - begin new page.
.br - "break", stop running text
from line to line.
.sp n - insert n blank lines.
.na - don't adjust right margins.
Date. By default, documents produced
on computer terminals have the date at the
bottom of each page; documents produced
on the typesetter don't. To force the date,
say ".DA". To force no date, say ".ND".
To lie about the date, say ".DA July 4,
1776" which puts the specified date at the
bottom of each page. The command
.ND May 8, 1945
in ".RP" format places the specified date on
the cover sheet and nowhere else. Place
this line before the title.
Signature line. You can obtain a signature line by placing the command .SG in
the document. The authors' names will be
output in place of the .SG line. An argument to .SG is used as a typing identification
line, and placed after the signatures. The
.SG command is ignored in released paper
format.
Registers. Certain of the registers
used by -ms can be altered to change
default settings. They should be changed
with .nr commands, as with
.nr PS 9
to make the default point size 9 point. If
the effect is needed immediately, the normal
troff command should be used in addition to
changing the number register.
Register  Defines           Takes effect  Default
PS        point size        next para.    10
VS        line spacing      next para.    12 pts
LL        line length       next para.    6"
LT        title length      next para.    6"
PD        para. spacing     next para.    0.3 VS
PI        para. indent      next para.    5 ens
FL        footnote length   next FS       11/12 LL
CW        column width      next 2C       7/15 LL
GW        intercolumn gap   next 2C       1/15 LL
PO        page offset       next page     26/27"
HM        top margin        next page     1"
FM        bottom margin     next page     1"
You may also alter the strings LH, CH, and
RH which are the left, center, and right
headings respectively; and similarly LF, CF,
and RF which are strings in the page footer.
The page number on output is taken from
register PN, to permit changing its output
style. For more complicated headers and
footers the macros PT and BT can be
redefined, as explained earlier.
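The fractional defaults in the register table are easy to misread, so here they are worked out for the default line length. This is ordinary Python arithmetic using the table's defaults (LL of 6 inches, VS of 12 points); the variable names mirror the register names, but the sketch does not show how -ms itself evaluates them.

```python
from fractions import Fraction

LL = Fraction(6)   # default line length, inches
VS = Fraction(12)  # default line spacing, points

FL = LL * Fraction(11, 12)  # footnote length: 11/12 LL
CW = LL * Fraction(7, 15)   # column width for .2C: 7/15 LL
GW = LL * Fraction(1, 15)   # gap between columns: 1/15 LL
PD = VS * Fraction(3, 10)   # paragraph spacing: 0.3 VS

print(f"FL={float(FL):g}in CW={float(CW):g}in GW={float(GW):g}in PD={float(PD):g}pt")
# prints: FL=5.5in CW=2.8in GW=0.4in PD=3.6pt

# Two default columns plus one gap exactly fill the default line length
assert 2 * CW + GW == LL
```

Exact fractions are used so that the check 2 CW + GW = LL holds without floating-point rounding.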
Accents. To simplify typing certain
foreign words, strings representing common
accent marks are defined. They precede the
letter over which the mark is to appear.
Here are the strings:
Input   Output     Input   Output
\*'e    é          \*~a    ã
\*`e    è          \*Ce    ě
\*:u    ü          \*,c    ç
\*^e    ê
Use. After your document is prepared
and stored on a file, you can print it on a
terminal with the command*
nroff -ms file
and you can print it on the typesetter with
the command
troff -ms file
(many options are possible). In each case,
if your document is stored in several files,
just list all the filenames where we have
used "file". If equations or tables are used,
eqn and/or tbl must be invoked as preprocessors.
* If .2C was used, pipe the nroff output through
col; make the first line of the input ".pi
/usr/bin/col".
References and further study. If you
have to do Greek or mathematics, see eqn
[1] for equation setting. To aid eqn users,
-ms provides definitions of .EQ and .EN
which normally center the equation and set
it off slightly. An argument on .EQ is taken
to be an equation number and placed in the
right margin near the equation. In addition,
there are three special arguments to EQ: the
letters C, I, and L indicate centered
(default), indented, and left adjusted equations, respectively. If there is both a format
argument and an equation number, give the
format argument first, as in
.EQ L (1.3a)
for a left-adjusted equation numbered
(1.3a).
Similarly, the macros .TS and .TE are
defined to separate tables (see [2]) from text
with a little space. A very long table with a
heading may be broken across pages by
beginning it with .TS H instead of .TS, and
placing the line .TH in the table data after
the heading. If the table has no heading
repeated from page to page, just use the
ordinary .TS and .TE macros.
To learn more about troff see [3] for a
general introduction, and [4] for the full
details (experts only). Information on
related UNIX commands is in [5]. For jobs
that do not seem well-adapted to -ms, consider other macro packages. It is often far
easier to write a specific macro package for
such tasks as imitating particular journals
than to try to adapt -ms.
Acknowledgment. Many thanks are
due to Brian Kernighan for his help in the
design and implementation of this package,
and for his assistance in preparing this
manual.
References
[1] B. W. Kernighan and L. L. Cherry, Typesetting Mathematics - Users Guide (2nd edition), Bell Laboratories Computing Science Report no. 17.
[2] M. E. Lesk, Tbl - A Program to Format Tables, Bell Laboratories Computing Science Report no. 45.
[3] B. W. Kernighan, A Troff Tutorial, Bell Laboratories, 1976.
[4] J. F. Ossanna, Nroff/Troff Reference Manual, Bell Laboratories Computing Science Report no. 51.
[5] K. Thompson and D. M. Ritchie, UNIX Programmer's Manual, Bell Laboratories, 1978.
Appendix A
List of Commands
1C   Return to single column format.
2C   Start double column format.
AB   Begin abstract.
AE   End abstract.
AI   Specify author's institution.
AU   Specify author.
B    Begin boldface.
DA   Provide the date on each page.
DE   End display.
DS   Start display (also CD, LD, ID).
EN   End equation.
EQ   Begin equation.
FE   End footnote.
FS   Begin footnote.
I    Begin italics.
IP   Begin indented paragraph.
KE   Release keep.
KF   Begin floating keep.
KS   Start keep.
LG   Increase type size.
LP   Left aligned block paragraph.
ND   Change or cancel date.
NH   Specify numbered heading.
NL   Return to normal type size.
PP   Begin paragraph.
R    Return to regular font (usually Roman).
RE   End one level of relative indenting.
RP   Use released paper format.
RS   Relative indent increased one level.
SG   Insert signature line.
SH   Specify section heading.
SM   Change to smaller type size.
TL   Specify title.
UL   Underline one word.
Register Names
The following register names are used by -ms internally. Independent use of these
names in one's own macros may produce incorrect output. Note that no lower case letters are
used in any -ms internal name.
#T
IT
AV
CW
IC
2C
Al
A2
A3
A4
OW
EF
FL
FM
FP
A5
AB
AE
AI
AU
B
BG
BT
C
CI
C2
CA
GW
HI
H3
H4
H5
CB
CC
CO
CF
CH
CM
CS
CT
0
DA
DE
DS
Number registers used
IQ
HM
LL
HT
IR
LT
KI
MM
IK
1M
LI
MN
IP
LE
MO
OW
OY
El
E2
E3
E4
E5
EE
EL
EM
EN
EQ
String registers used
I
EZ
II
FA
12
FE
FJ
13
14
FK
15
FN
ID
FO
IE
FQ
1M
FS
FV
IP
FY
IZ
KE
HO
in - ms
NA
NC
NF
NS
01
OJ
PO
PF
PI
PN
in - ms
MR
KF
NO
KQ
NH
KS
NL
LB
NP
LD
00
LG
LP
OK
PP
ME
PT
MF
PY
MH
QF
MN
R
MO
PO
PQ
PX
RO
ST
RI
R2
R3
R4
R5
RC
RE
RF
RH
RP
RQ
RS
T.
TB
TO
TN
TQ
RT
SO
SI
S2
SG
SH
SM
SN
SY
TA
TE
TH
TV
VS
YE
YY
ZN
TL
TM
TQ
TS
TT
UL
WB
WH
WT
XD
XF
XK
[Diagram: boxes for the start-of-document commands TL, AU, AI, AB, AE, NH/SH, and PP/LP, leading to text ...]
Figure 1
A Guide to Preparing
Documents with -ms
M. E. Lesk
Bell Laboratories
August 1978
This guide gives some simple examples of document preparation on Bell Labs computers, emphasizing the use of the -ms macro package. It enormously abbreviates information in
1. Typing Documents on UNIX and GCOS, by M. E. Lesk;
2. Typesetting Mathematics - User's Guide, by B. W. Kernighan and L. L. Cherry; and
3. Tbl - A Program to Format Tables, by M. E. Lesk.
These memos are all included in the UNIX Programmer's Manual, Volume 2. The new user should also have A Tutorial Introduction to the UNIX Text Editor, by B. W. Kernighan.
For more detailed information, read Advanced Editing on UNIX and A Troff Tutorial, by B. W. Kernighan, and (for experts) Nroff/Troff Reference Manual by J. F. Ossanna. Information on related commands is found (for UNIX users) in UNIX for Beginners by B. W. Kernighan and the UNIX Programmer's Manual by K. Thompson and D. M. Ritchie.
© Bell Laboratories
Contents
A TM .................................. 2
A released paper ...................... 3
An internal memo, and headings ........ 4
Lists, displays, and footnotes ........ 5
Indents, keeps, and double column ..... 6
Equations and registers ............... 7
Tables and usage ...................... 8
Commands for a TM
.TM 1978-5b3 99999 99999-11
.ND April 1, 1976
.TL
The Role of the Allen Wrench in Modern
Electronics
.AU "MH 2G-111" 2345
J. Q. Pencilpusher
.AU "MH 1K-222" 5432
X. Y. Hardwired
.AI
.MH
.OK
Tools
Design
.AS
This abstract should be short enough to
fit on a single page cover sheet.
It must attract the reader into sending for
the complete memorandum.
.AE
.CS 10 2 12 5 6 7
.NH
Introduction.
.PP
Now the first paragraph of actual text ...
Last line of text.
.SG MH-1234-JQP/XYH-unix
.NH
References ...
Commands not needed in a particular format are ignored.
[Cover Sheet for TM (sample output): Title- The Role of the Allen Wrench in Modern Electronics; Date- April 1, 1976; TM- 1978-5b3; Other Keywords- Tools, Design; Authors- J. Q. Pencilpusher, X. Y. Hardwired; Location- MH 2G-111; Ext.- 2345; Charging Case- 99999]
The meanings of the key-letters describing the alignment of each entry are:
c   center          n   numerical
r   right-adjust    a   subcolumn
l   left-adjust     s   spanned
The global table options are center, expand, box, doublebox, allbox, tab (x) and linesize (n).
.TS
doublebox, center;
c c
l l.
Name	Definition
.sp
Gamma	$GAMMA (z) = int sub 0 sup inf \
t sup {z-1} e sup -t dt$
Sine	$sin (x) = 1 over 2i ( e sup ix - e sup -ix )$
Error	$roman erf (z) = 2 over sqrt pi \
int sub 0 sup z e sup {-t sup 2} dt$
Bessel	$J sub 0 (z) = 1 over pi \
int sub 0 sup pi cos ( z sin theta ) d theta$
Zeta	$zeta (s) = \
sum from k=1 to inf k sup -s ~~( Re~s > 1)$
.TE
(with delim $$ on, see panel 3)
[Output: a double-boxed table giving the name and definition of the Gamma, Sine, Error, Bessel, and Zeta functions]
          AT&T Common Stock
    Year     Price     Dividend
    1971     41-54     $2.60
       2     41-54      2.70
       3     46-55      2.87
       4     40-53      3.24
       5     45-52      3.40
       6     51-59       .95*
* (first quarter only)
A displayed equation is marked with an equation number at the right margin by adding an argument to the EQ line:
.EQ I (2.2a)
bold V bar sub nu = left [ pile {a above b above c} right ] +
left [ matrix { col {A(11) above . above .}
col {. above . above .} col {. above . above A(33)} } right ]
cdot left [ pile {alpha above beta above gamma} right ]
.EN
.EQ L
F hat ( chi ) ~ mark = ~ | del V | sup 2
.EN
.EQ L
lineup = ~ {left ( {partial V} over {partial x} right )} sup 2 +
{left ( {partial V} over {partial y} right )} sup 2 ~~~~~~ lambda -> inf
.EN
$a dot$, $b dotdot$, $xi tilde times y vec$:
(with delim $$ on, see panel 3).
See also the equations in the second table, panel 8.
Some Registers You Can Change
Line length               .nr LL 7i
Title length              .nr LT 7i
Point size                .nr PS 9
Vertical spacing          .nr VS 11
Column width              .nr CW 3i
Intercolumn spacing       .nr GW .5i
Margins - head and foot   .nr HM .75i
                          .nr FM .75i
Paragraph indent          .nr PI 2n
Paragraph spacing         .nr PD 0
Page offset               .nr PO 0.5i
Page heading              .ds CH Appendix   (center)
                          .ds RH 7-25-76    (right)
                          .ds LH Private    (left)
Page footer               .ds CF Draft
                          (.ds LF and .ds RF similar)
Page numbers              .nr % 3
Usage
Documents with just text:
troff -ms files
With equations only:
eqn files | troff -ms
With tables only:
tbl files | troff -ms
With both tables and equations:
tbl files | eqn | troff -ms
The above generates STARE output on GCOS: replace -st with -ph for typesetter output.
NROFF/TROFF User's Manual
Joseph F. Ossanna
Bell Laboratories
Murray Hill, New Jersey 07974
Introduction
NROFF and TROFF are text processors under the PDP-11 UNIX Time-Sharing System [1] that format text
for typewriter-like terminals and for a Graphic Systems phototypesetter, respectively. They accept lines
of text interspersed with lines of format control information and format the text into a printable,
paginated document having a user-designed style. NROFF and TROFF offer unusual freedom in document styling, including: arbitrary style headers and footers; arbitrary style footnotes; multiple automatic
sequence numbering for paragraphs, sections, etc.; multiple column output; dynamic font and point-size
control; arbitrary horizontal and vertical local motions at any point; and a family of automatic overstriking, bracket construction, and line drawing functions.
NROFF and TROFF are highly compatible with each other and it is almost always possible to prepare
input acceptable to both. Conditional input is provided that enables the user to embed input expressly
destined for either program. NROFF can prepare output directly for a variety of terminal types and is
capable of utilizing the full resolution of each terminal.
Usage
The general form of invoking NROFF (or TROFF) at UNIX command level is
nroff options files
(or troff options files)
where options represents any of a number of option arguments and files represents the list of files containing the document to be formatted. An argument consisting of a single minus (-) is taken to be a
file name corresponding to the standard input. If no file names are given, input is taken from the standard input. The options, which may appear in any order so long as they appear before the files, are:
Option     Effect
-olist     Print only pages whose page numbers appear in list, which consists of comma-separated numbers and number ranges. A number range has the form N-M and means pages N through M; an initial -N means from the beginning to page N; and a final N- means from N to the end.
-nN        Number first generated page N.
-sN        Stop every N pages. NROFF will halt prior to every N pages (default N=1) to allow paper loading or changing, and will resume upon receipt of a newline. TROFF will stop the phototypesetter every N pages, produce a trailer to allow changing cassettes, and will resume after the phototypesetter START button is pressed.
-mname     Prepends the macro file /usr/lib/tmac.name to the input files.
-raN       Register a (one-character) is set to N.
-i         Read standard input after the input files are exhausted.
-q         Invoke the simultaneous input-output mode of the rd request.
NROFF Only
-Tname     Specifies the name of the output terminal type. Currently defined names are 37 for the (default) Model 37 Teletype, tn300 for the GE TermiNet 300 (or any terminal without half-line capabilities), 300S for the DASI-300S, 300 for the DASI-300, and 450 for the DASI-450 (Diablo Hyterm).
-e         Produce equally-spaced words in adjusted lines, using full terminal resolution.
TROFF Only
-t         Direct output to the standard output instead of the phototypesetter.
-f         Refrain from feeding out paper and stopping phototypesetter at the end of the run.
-w         Wait until phototypesetter is available, if currently busy.
-b         TROFF will report whether the phototypesetter is busy or available. No text processing is done.
-a         Send a printable (ASCII) approximation of the results to the standard output.
-pN        Print all characters in point size N while retaining all prescribed spacings and motions, to reduce phototypesetter elapsed time.
-g         Prepare output for the Murray Hill Computation Center phototypesetter and direct it to the standard output.
Each option is invoked as a separate argument; for example,
nroff -o4,8-10 -T300S -mabc file1 file2
requests formatting of pages 4, 8, 9, and 10 of a document contained in the files named file1 and file2,
specifies the output terminal as a DASI-300S, and invokes the macro package abc.
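The -o page-list syntax can be expanded with a short Python sketch that follows the rules given for -olist. It is a toy model, not NROFF's code: the function name parse_page_list is hypothetical, and the last_page argument is an assumption that stands in for "the end" of the document, which the formatter itself does not need to know in advance.

```python
def parse_page_list(spec, last_page):
    """Expand an -o style list such as "4,8-10" into sorted page numbers.

    last_page is an assumption of this sketch: it stands in for
    "the end" of the document when a range is written as "N-".
    """
    pages = set()
    for item in spec.split(","):
        if item.startswith("-"):      # "-N": from the beginning to page N
            lo, hi = 1, int(item[1:])
        elif item.endswith("-"):      # "N-": from page N to the end
            lo, hi = int(item[:-1]), last_page
        elif "-" in item:             # "N-M": pages N through M
            first, last = item.split("-")
            lo, hi = int(first), int(last)
        else:                         # a single page number
            lo = hi = int(item)
        pages.update(range(lo, hi + 1))
    return sorted(pages)


print(parse_page_list("4,8-10", last_page=20))  # [4, 8, 9, 10]
```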
Various pre- and post-processors are available for use with NROFF and TROFF. These include the
equation preprocessors NEQN and EQN [2] (for NROFF and TROFF respectively), and the table-construction preprocessor TBL [3]. A reverse-line postprocessor COL [4] is available for multiple-column
NROFF output on terminals without reverse-line ability; COL expects the Model 37 Teletype escape
sequences that NROFF produces by default. TK [4] is a 37 Teletype simulator postprocessor for printing
NROFF output on a Tektronix 4014. TCAT [4] is a phototypesetter-simulator postprocessor for TROFF that
produces an approximation of phototypesetter output on a Tektronix 4014. For example, in
tbl files | eqn | troff -t options | tcat
the first | indicates the piping of TBL's output to EQN's input; the second the piping of EQN's output to
TROFF's input; and the third indicates the piping of TROFF's output to TCAT. GCAT [4] can be used to
send TROFF (-g) output to the Murray Hill Computation Center.
The remainder of this manual consists of: a Summary and Index; a Reference Manual keyed to the
index; and a set of Tutorial Examples. Another tutorial is [5].
Joseph F. Ossanna
References
[1] K. Thompson, D. M. Ritchie, UNIX Programmer's Manual, Sixth Edition (May 1975).
[2] B. W. Kernighan, L. L. Cherry, Typesetting Mathematics - User's Guide (Second Edition), Bell Laboratories internal memorandum.
[3] M. E. Lesk, Tbl - A Program to Format Tables, Bell Laboratories internal memorandum.
[4] Internal on-line documentation, on UNIX.
[5] B. W. Kernighan, A TROFF Tutorial, Bell Laboratories internal memorandum.
SUMMARY AND INDEX
Request          Initial          If No        Notes#  Explanation
Form             Value*           Argument
1. General Explanation
2. Font and Character Size Control
.ps ±N           10 point         previous     E       Point size; also \s±N.†
.ss N            12/36 em         ignored      E       Space-character size set to N/36 em.†
.cs F N M        off              -            P       Constant character space (width) mode (font F).†
.bd F N          off              -            P       Embolden font F by N-1 units.†
.bd S F N        off              -            P       Embolden Special Font when current font is F.†
.ft F            Roman            previous     E       Change to font F = x, xx, or 1-4. Also \fx, \f(xx, \fN.
.fp N F          R,I,B,S          ignored      -       Font named F mounted on physical position 1≤N≤4.
3. Page Control
.pl ±N           11 in            11 in        v       Page length.
.bp ±N           N=1              -            B‡,v    Eject current page; next page number N.
.pn ±N           N=1              ignored      -       Next page number N.
.po ±N           0; 26/27 in      previous     v       Page offset.
.ne N            -                N=1 V        D,v     Need N vertical space (V = vertical spacing).
.mk R            none             internal     D       Mark current vertical place in register R.
.rt ±N           none             internal     D,v     Return (upward only) to marked vertical place.
4. Text Filling, Adjusting, and Centering
.br              -                -            B       Break.
.fi              fill             -            B,E     Fill output lines.
.nf              fill             -            B,E     No filling or adjusting of output lines.
.ad c            adj, both        adjust       E       Adjust output lines with mode c.
.na              adjust           -            E       No output line adjusting.
.ce N            off              N=1          B,E     Center following N input text lines.
5. Vertical Spacing
.vs N            1/6 in; 12 pts   previous     E,p     Vertical base line spacing (V).
.ls N            N=1              previous     E       Output N-1 Vs after each text output line.
.sp N            -                N=1 V        B,v     Space vertical distance N in either direction.
.sv N            -                N=1 V        v       Save vertical distance N.
.os              -                -            -       Output saved vertical distance.
.ns              space            -            D       Turn no-space mode on.
.rs              -                -            D       Restore spacing; turn no-space mode off.
6. Line Length and Indenting
.ll ±N           6.5 in           previous     E,m     Line length.
.in ±N           N=0              previous     B,E,m   Indent.
.ti ±N           -                ignored      B,E,m   Temporary indent.
7. Macros, Strings, Diversion, and Position Traps
.de xx yy        -                .yy=..       -       Define or redefine macro xx; end at call of yy.
.am xx yy        -                .yy=..       -       Append to a macro.
.ds xx string    -                ignored      -       Define a string xx containing string.
.as xx string    -                ignored      -       Append string to string xx.
* Values separated by ";" are for NROFF and TROFF respectively.
# Notes are explained at the end of this Summary and Index.
† No effect in NROFF.
‡ The use of "'" as control character (instead of ".") suppresses the break function.
.rm xx           -                ignored      -       Remove request, macro, or string.
.rn xx yy        -                ignored      -       Rename request, macro, or string xx to yy.
.di xx           -                end          D       Divert output to macro xx.
.da xx           -                end          D       Divert and append to xx.
.wh N xx         -                -            v       Set location trap; negative is w.r.t. page bottom.
.ch xx N         -                -            v       Change trap location.
.dt N xx         -                off          D,v     Set a diversion trap.
.it N xx         -                off          E       Set an input-line count trap.
.em xx           none             none         -       End macro is xx.
8. Number Registers
.nr R ±N M       -                -            u       Define and set number register R; auto-increment by M.
.af R c          arabic           -            -       Assign format to register R (c=1, i, I, a, A).
.rr R            -                -            -       Remove register R.
9. Tabs, Leaders, and Fields
.ta Nt ...       0.8; 0.5 in      none         E,m     Tab settings; left type, unless t=R (right), C (centered).
.tc c            none             none         E       Tab repetition character.
.lc c            .                none         E       Leader repetition character.
.fc a b          off              off          -       Set field delimiter a and pad character b.
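The five .af formats listed under Number Registers can be illustrated with a Python sketch of how a register value would print in each. This is a toy model with stated assumptions: the function names are hypothetical, zero is assumed to print as 0 in every format, alphabetic numbering is assumed to continue z, aa, ab, ... (bijective base 26), and negative values and zero-padded arabic formats are not modeled.

```python
def to_roman(n):
    """Lower-case roman numerals for n >= 1."""
    table = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in table:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n):
    """a, b, ..., z, aa, ab, ... for n >= 1 (bijective base 26)."""
    out = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out.append(chr(ord("a") + rem))
    return "".join(reversed(out))


def format_register(n, fmt):
    """Toy model of printing register value n after ".af R fmt"."""
    if n == 0 or fmt == "1":
        return str(n)  # assumption: zero is shown as 0 in every format
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    if fmt in ("i", "I"):
        s = to_roman(n)
    elif fmt in ("a", "A"):
        s = to_alpha(n)
    else:
        raise ValueError(f"unknown format {fmt!r}")
    return s.upper() if fmt.isupper() else s


print([format_register(14, f) for f in ("1", "i", "I", "a", "A")])
# ['14', 'xiv', 'XIV', 'n', 'N']
```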
10. Input and Output Conventions and Character Translations
.ec c            \                \            -       Set escape character.
.eo              on               -            -       Turn off escape character mechanism.
.lg N            -; on            on           -       Ligature mode on if N>0.
.ul N            off              N=1          E       Underline (italicize in TROFF) N input lines.
.cu N            off              N=1          E       Continuous underline in NROFF; like ul in TROFF.
.uf F            Italic           Italic       -       Underline font set to F (to be switched to by ul).
.cc c            .                .            E       Set control character to c.
.c2 c            '                '            E       Set nobreak control character to c.
.tr abcd....     none             -            O       Translate a to b, etc. on output.
11. Local Horizontal and Vertical Motions, and the Width Function
12. Overstrike, Bracket, Line-drawing, and Zero-width Functions
13. Hyphenation
.nh              hyphenate        -            E       No hyphenation.
.hy N            hyphenate        hyphenate    E       Hyphenate; N = mode.
.hc c            \%               \%           E       Hyphenation indicator character c.
.hw word1 ...    -                ignored      -       Exception words.
14. Three Part Titles
.tl 'left'center'right'   -       -            -       Three part title.
.pc c            %                off          -       Page number character.
.lt ±N           6.5 in           previous     E,m     Length of title.
15. Output Line Numbering
.nm ±N M S I     -                off          E       Number mode on or off, set parameters.
.nn N            -                N=1          E       Do not number next N lines.
16. Conditional Acceptance of Input
.if c anything   -                -            -       If condition c true, accept anything as input; for multi-line use \{anything\}.
/fNo
Argument
Initial
Value
Request
Form
.if ! c anything
. if N anything
Notes
u
u
. if ! N anything
.if 'string1' string2' anything
.if ! 'string]' string2' anything
. ie c anything
u
.el anything
Explanation
If condition c false, accept anything.
If expression N > 0, accept anything.
If expression N ~. 0, accept anything.
If string 1 identical to string2, accept anything.
If string 1 not identical to string2, accept anything.
Ifportion of if-else~ all above forms (like if) .
Else portion of if-else .
17. Environment Switching
.ev N          N=0   previous      -     Environment switched (push down).
18. Insertions from the Standard Input
.rd prompt     -     prompt=BEL    -     Read insertion.
.ex            -     -             -     Exit from NROFF/TROFF.
19. Input/Output File Switching
.so filename   -     -             -     Switch source file (push down).
.nx filename   -     end-of-file   -     Next file.
.pi program    -     -             -     Pipe output to program (NROFF only).
20. Miscellaneous
.mc c N        -     off           E,m   Set margin character c and separation N.
.tm string     -     newline       -     Print string on terminal (UNIX standard message output).
.ig yy         -     .yy=..        -     Ignore till call of yy.
.pm t          -     all           -     Print macro names and sizes;
                                         if t present, print only total of sizes.
.fl            -     -             B     Flush output buffer.
21. Output and Error Messages
Notes
B         Request normally causes a break.
D         Mode or relevant parameters associated with current diversion level.
E         Relevant parameters are a part of the current environment.
O         Must stay in effect until logical output.
P         Mode must be still or again in effect at the time of physical output.
v,p,m,u   Default scale indicator; if not specified, scale indicators are ignored.
Alphabetical Request and Section Number Cross Reference
ad 4    af 8    am 7    as 7    bd 2    bp 3    br 4    c2 10   cc 10   ce 4
ch 7    cs 2    cu 10   da 7    de 7    di 7    ds 7    dt 7    ec 10   el 16
em 7    eo 10   ev 17   ex 18   fc 9    fi 4    fl 20   fp 2    ft 2    hc 13
hw 13   hy 13   ie 16   if 16   ig 20   in 6    it 7    lc 9    lg 10   li 10
ll 6    ls 5    lt 14   mc 20   mk 3    na 4    ne 3    nf 4    nh 13   nm 15
nn 15   nr 8    ns 5    nx 19   os 5    pc 14   pi 19   pl 3    pm 20   pn 3
po 3    ps 2    rd 18   rm 7    rn 7    rr 8    rs 5    rt 3    so 19   sp 5
ss 2    sv 5    ta 9    tc 9    ti 6    tl 14   tm 20   tr 10   uf 10   ul 10
vs 5    wh 7
Escape Sequences for Characters, Indicators, and Functions
Section     Escape            Meaning
Reference   Sequence
10.1        \\                \ (to prevent or delay the interpretation of \)
10.1        \e                Printable version of the current escape character.
2.1         \'                ' (acute accent); equivalent to \(aa
2.1         \`                ` (grave accent); equivalent to \(ga
2.1         \-                - Minus sign in the current font
7           \.                Period (dot) (see de)
11.1        \(space)          Unpaddable space-size space character
11.1        \0                Digit width space
11.1        \|                1/6 em narrow space character (zero width in NROFF)
11.1        \^                1/12 em half-narrow space character (zero width in NROFF)
4.1         \&                Non-printing, zero width character
10.6        \!                Transparent line indicator
10.7        \"                Beginning of comment
7.3         \$N               Interpolate argument 1 ≤ N ≤ 9
13          \%                Default optional hyphenation character
2.1         \(xx              Character named xx
7.1         \*x, \*(xx        Interpolate string x or xx
9.1         \a                Non-interpreted leader character
12.3        \b'abc...'        Bracket building function
4.2         \c                Interrupt text processing
11.1        \d                Forward (down) 1/2 em vertical motion (1/2 line in NROFF)
2.2         \fx, \f(xx, \fN   Change to font named x or xx, or position N
11.1        \h'N'             Local horizontal motion; move right N (negative left)
11.3        \kx               Mark horizontal input place in register x
12.4        \l'Nc'            Horizontal line drawing function (optionally with c)
12.4        \L'Nc'            Vertical line drawing function (optionally with c)
8           \nx, \n(xx        Interpolate number register x or xx
12.1        \o'abc...'        Overstrike characters a, b, c, ...
4.1         \p                Break and spread output line
11.1        \r                Reverse 1 em vertical motion (reverse line in NROFF)
2.3         \sN, \s±N         Point-size change function
9.1         \t                Non-interpreted horizontal tab
11.1        \u                Reverse (up) 1/2 em vertical motion (1/2 line in NROFF)
11.1        \v'N'             Local vertical motion; move down N (negative up)
11.2        \w'string'        Interpolate width of string
5.2         \x'N'             Extra line-space function (negative before, positive after)
12.2        \zc               Print c with zero width (without spacing)
16          \{                Begin conditional input
16          \}                End conditional input
10.7        \(newline)        Concealed (ignored) newline
-           \X                X, any character not listed above
The escape sequences \\, \., \", \$, \*, \a, \n, \t, and \(newline) are interpreted in copy mode (§7.2).
Predefined General Number Registers
Section     Register   Description
Reference   Name
3           %          Current page number.
11.2        ct         Character type (set by width function).
7.4         dl         Width (maximum) of last completed diversion.
7.4         dn         Height (vertical size) of last completed diversion.
-           dw         Current day of the week (1-7).
-           dy         Current day of the month (1-31).
11.3        hp         Current horizontal place on input line.
15          ln         Output line number.
-           mo         Current month (1-12).
4.1         nl         Vertical position of last printed text base-line.
11.2        sb         Depth of string below base line (generated by width function).
11.2        st         Height of string above base line (generated by width function).
-           yr         Last two digits of current year.
Predefined Read-Only Number Registers
Section     Register   Description
Reference   Name
7.3         .$         Number of arguments available at the current macro level.
-           .A         Set to 1 in TROFF, if -a option used; always 1 in NROFF.
11.1        .H         Available horizontal resolution in basic units.
-           .T         Set to 1 in NROFF, if -T option used; always 0 in TROFF.
11.1        .V         Available vertical resolution in basic units.
5.2         .a         Post-line extra line-space most recently utilized using \x'N'.
-           .c         Number of lines read from current input file.
7.4         .d         Current vertical place in current diversion; equal to nl, if no diversion.
2.2         .f         Current font as physical quadrant (1-4).
4           .h         Text base-line high-water mark on current page or diversion.
6           .i         Current indent.
6           .l         Current line length.
4           .n         Length of text portion on previous output line.
3           .o         Current page offset.
3           .p         Current page length.
2.3         .s         Current point size.
7.5         .t         Distance to the next trap.
4.1         .u         Equal to 1 in fill mode and 0 in nofill mode.
5.1         .v         Current vertical line spacing.
11.2        .w         Width of previous character.
-           .x         Reserved version-dependent register.
-           .y         Reserved version-dependent register.
7.4         .z         Name of current diversion.
REFERENCE MANUAL
1. General Explanation
1.1. Form of input. Input consists of text lines, which are destined to be printed, interspersed with control
lines, which set parameters or otherwise control subsequent processing. Control lines begin with a control character, normally . (period) or ' (acute accent), followed by a one- or two-character name that
specifies a basic request or the substitution of a user-defined macro in place of the control line. The
control character ' suppresses the break function (the forced output of a partially filled line) caused by
certain requests. The control character may be separated from the request/macro name by white space
(spaces and/or tabs) for esthetic reasons. Names must be followed by either space or newline. Control
lines with unrecognized names are ignored.
Various special functions may be introduced anywhere in the input by means of an escape character,
normally \. For example, the function \nR causes the interpolation of the contents of the number register R in place of the function; here R is either a single-character name as in \nx, or a left-parenthesis-introduced, two-character name as in \n(xx.
1.2. Formatter and device resolution. TROFF internally uses 432 units/inch, corresponding to the Graphic
Systems phototypesetter, which has a horizontal resolution of 1/432 inch and a vertical resolution of
1/144 inch. NROFF internally uses 240 units/inch, corresponding to the least common multiple of the
horizontal and vertical resolutions of various typewriter-like output devices. TROFF rounds
horizontal/vertical numerical parameter input to the actual horizontal/vertical resolution of the Graphic
Systems typesetter. NROFF similarly rounds numerical input to the actual resolution of the output device indicated by the -T option (default Model 37 Teletype).
1.3. Numerical parameter input. Both NROFF and TROFF accept numerical input with the appended scale
indicators shown in the following table, where S is the current type size in points, V is the current vertical line spacing in basic units, and C is a nominal character width in basic units.
Scale       Meaning                  Number of basic units
Indicator                            TROFF          NROFF
i           Inch                     432            240
c           Centimeter               432x50/127     240x50/127
P           Pica = 1/6 inch          72             240/6
m           Em = S points            6xS            C
n           En = Em/2                3xS            C, same as Em
p           Point = 1/72 inch        6              240/72
u           Basic unit               1              1
v           Vertical line space      V              V
none        Default, see below
In NROFF, both the em and the en are taken to be equal to the C, which is output-device dependent;
common values are 1/10 and 1/12 inch. Actual character widths in NROFF need not be all the same
and constructed characters such as → (\(->) are often extra wide. The default scaling is ems for the
horizontally-oriented requests and functions ll, in, ti, ta, lt, po, mc, \h, and \l; Vs for the vertically-oriented requests and functions pl, wh, ch, dt, sp, sv, ne, rt, \v, \x, and \L; p for the vs request; and
u for the requests nr, if, and ie. All other requests ignore any scale indicators. When a number register containing an already appropriately scaled number is interpolated to provide numerical input, the
unit scale indicator u may need to be appended to prevent an additional inappropriate default scaling.
The number, N, may be specified in decimal-fraction form but the parameter finally stored is rounded
to an integer number of basic units.
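The scale-indicator table and the rounding rule can be sketched in Python as a lookup followed by a round to whole basic units. This is an illustration, not formatter source: the defaults S = 10 points, V = 72 TROFF units (12 points) or 40 NROFF units (1/6 inch), and an NROFF character width C of 24 units (1/10 inch) are assumptions chosen for the example.

```python
# Basic units per one unit of each scale indicator (section 1.3).
# Assumed defaults, not read from any device table: S = 10 points,
# V = 72 TROFF units (12 points) or 40 NROFF units (1/6 inch), and an
# NROFF character width C of 24 units (1/10 inch).
def units_per(indicator, device="troff", S=10, V=None, C=24):
    if device == "troff":
        V = 72 if V is None else V
        table = {"i": 432, "c": 432 * 50 / 127, "P": 72, "m": 6 * S,
                 "n": 3 * S, "p": 6, "u": 1, "v": V}
    elif device == "nroff":
        V = 40 if V is None else V
        table = {"i": 240, "c": 240 * 50 / 127, "P": 240 / 6, "m": C,
                 "n": C, "p": 240 / 72, "u": 1, "v": V}
    else:
        raise ValueError(f"unknown device: {device!r}")
    return table[indicator]


def to_basic_units(number, indicator, device="troff", **params):
    """Scale a possibly fractional number, then round to whole basic units."""
    return round(number * units_per(indicator, device, **params))


print(to_basic_units(3.2, "c"))          # 3.2 centimeters in TROFF -> 544
print(to_basic_units(1, "p", "nroff"))   # 1 point in NROFF -> 3
```

The NROFF point example shows why rounding matters: 240/72 is not a whole number of basic units, so a single point cannot be represented exactly.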
The absolute position indicator | may be prepended to a number N to generate the distance to the vertical
or horizontal place N. For vertically-oriented requests and functions, |N becomes the distance in basic
units from the current vertical place on the page or in a diversion (§7.4) to the vertical place N. For
all other requests and functions, |N becomes the distance from the current horizontal place on the input
line to the horizontal place N. For example,
.sp |3.2c
will space in the required direction to 3.2 centimeters from the top of the page.
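The |N rule reduces to a subtraction, and the sign of the result gives the direction of motion. The Python sketch below assumes TROFF basic units measured from the top of the page; the function name is hypothetical.

```python
# Section 1.3 absolute position |N: the request receives the signed
# distance from the current place to place N (negative = up or left).
# All positions are TROFF basic units, measured from the top of the page.
def absolute_distance(target, current):
    return target - current


target = round(3.2 * 432 * 50 / 127)            # |3.2c -> 544 basic units
print(absolute_distance(target, current=432))   # 1 inch down: 112 (move down)
print(absolute_distance(target, current=864))   # 2 inches down: -320 (move up)
```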
1.4. Numerical expressions. Wherever numerical input is expected, an expression involving parentheses,
the arithmetic operators +, -, /, *, % (mod), and the logical operators <, >, <=, >=, = (or ==),
& (and), : (or) may be used. Except where controlled by parentheses, evaluation of expressions is
left-to-right; there is no operator precedence. In the case of certain requests, an initial + or - is
stripped and interpreted as an increment or decrement indicator respectively. In the presence of default
scaling, the desired scale indicator must be attached to every number in an expression for which the
desired and default scaling differ. For example, if the number register x contains 2 and the current
point size is 10, then
.ll (4.25i+\nxP+3)/2u
will set the line length to 1/2 the sum of 4.25 inches + 2 picas + 30 points.
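A small Python evaluator shows what "left-to-right, no precedence" means. It is a sketch, not the formatter's parser: it assumes 10-point type (so m is 60 TROFF units), integer division and mod that truncate toward zero, and "true" meaning greater than zero. It also assumes that \nx has already been interpolated, so the manual's example is evaluated as (4.25i+2P+3)/2u.

```python
import re

# Section 1.4 evaluation: strictly left to right, parentheses for grouping,
# every operand rounded to TROFF basic units.  Assumptions (not from the
# manual): 10-point type, 12-point spacing, integer / and % truncating
# toward zero, and "true" meaning > 0.
SCALE = {"i": 432, "c": 432 * 50 / 127, "P": 72, "m": 60, "n": 30,
         "p": 6, "u": 1, "v": 72}

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: int(a / b),
    "%": lambda a, b: a - b * int(a / b),
    "<": lambda a, b: int(a < b),
    ">": lambda a, b: int(a > b),
    "<=": lambda a, b: int(a <= b),
    ">=": lambda a, b: int(a >= b),
    "=": lambda a, b: int(a == b),
    "==": lambda a, b: int(a == b),
    "&": lambda a, b: int(a > 0 and b > 0),
    ":": lambda a, b: int(a > 0 or b > 0),
}

TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)([icPmnpuv]?)|(<=|>=|==|[-+*/%<>=&:()]))")


def tokenize(expr):
    tokens, pos, expr = [], 0, expr.rstrip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"cannot parse {expr[pos:]!r}")
        tokens.append(("num", float(m.group(1)), m.group(2)) if m.group(1)
                      else ("op", m.group(3)))
        pos = m.end()
    return tokens


def evaluate(expr, default="m"):
    tokens, pos = tokenize(expr), 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends early")
        tok = tokens[pos]
        pos += 1
        if tok == ("op", "("):
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ("op", ")"):
                raise ValueError("missing )")
            pos += 1
            return value
        if tok == ("op", "-"):                       # unary minus
            return -operand()
        if tok[0] != "num":
            raise ValueError(f"unexpected {tok[1]!r}")
        return round(tok[1] * SCALE[tok[2] or default])

    def expression():
        nonlocal pos
        value = operand()
        # No precedence: apply each operator as soon as its right operand is read.
        while pos < len(tokens) and tokens[pos] != ("op", ")"):
            kind, op = tokens[pos][:2]
            if kind != "op":
                raise ValueError("missing operator")
            pos += 1
            value = OPS[op](value, operand())
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unbalanced )")
    return result


print(evaluate("(4.25i+2P+3)/2u"))   # 1080 basic units, i.e. 2.5 inches
print(evaluate("2+3*4", "u"))        # 20, not 14: (2+3)*4
```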
1.5. Notation. Numerical parameters are indicated in this manual in two ways. ±N means that the
argument may take the forms N, +N, or -N and that the corresponding effect is to set the affected
parameter to N, to increment it by N, or to decrement it by N respectively. Plain N means that an initial algebraic sign is not an increment indicator, but merely the sign of N. Generally, unreasonable
numerical input is either ignored or truncated to a reasonable value. For example, most requests
expect to set parameters to non-negative values; exceptions are sp, wh, ch, nr, and if. The requests
ps, ft, po, vs, ls, ll, in, and lt restore the previous parameter value in the absence of an argument.
Single-character arguments are indicated by single lower-case letters and one/two-character arguments
are indicated by a pair of lower-case letters. Character string arguments are indicated by multi-character
mnemonics.
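The difference between ±N and plain N can be sketched as two small Python functions. They are hypothetical and take the argument as it appears on the request line; scaling and rounding are left out for brevity.

```python
# Section 1.5 notation.  ±N: a leading + or - increments or decrements
# the current value; without a sign the value is set outright.
def apply_signed(arg, current):
    arg = arg.strip()
    if arg[:1] == "+":
        return current + float(arg[1:])
    if arg[:1] == "-":
        return current - float(arg[1:])
    return float(arg)


# Plain N: a leading sign is simply the sign of the number (e.g. .sp -1).
def apply_plain(arg):
    return float(arg)


print(apply_signed("+2", 10))   # like .ps +2 at 10 points -> 12.0
print(apply_signed("-2", 10))   # like .ps -2 -> 8.0
print(apply_signed("14", 10))   # like .ps 14 -> 14.0
print(apply_plain("-1"))        # a negative value, not a decrement -> -1.0
```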
2. Font and Character Size Control
2.1. Character set. The TROFF character set consists of the Graphic Systems Commercial II character
set plus a Special Mathematical Font character set, each having 102 characters. These character sets
are shown in the attached Table I. All ASCII characters are included, with some on the Special Font.
With three exceptions, the ASCII characters are input as themselves, and non-ASCII characters are input
in the form \(xx where xx is a two-character name given in the attached Table II. The three ASCII
exceptions are mapped as follows:
ASCII Input                   Printed by TROFF
Character   Name              Character   Name
'           acute accent      ’           close quote
`           grave accent      ‘           open quote
-           minus             -           hyphen
The characters ', ` and - may be input by \', \` and \- respectively or by their names (Table II).
The ASCII characters @, #, ", ', `, <, >, \, {, }, ~, ^, and _ exist only on the Special Font and are
printed as a 1-em space if that Font is not mounted.
NROFF understands the entire TROFF character set, but can in general print only ASCII characters,
additional characters as may be available on the output device, such characters as may be able to be
constructed by overstriking or other combination, and those that can reasonably be mapped into other
printable characters. The exact behavior is determined by a driving table prepared for each device. The
characters ', `, and _ print as themselves.
2.2. Fonts. The default mounted fonts are Times Roman (R), Times Italic (I), Times Bold (B), and
the Special Mathematical Font (S) on physical typesetter positions 1, 2, 3, and 4 respectively. These
fonts are used in this document. The current font, initially Roman, may be changed (among the
mounted fonts) by use of the ft request, or by imbedding at any desired point either \fx, \f(xx, or \fN
where x and xx are the name of a mounted font and N is a numerical font position. It is not necessary
to change to the Special font; characters on that font are automatically handled. A request for a named
but not-mounted font is ignored. TROFF can be informed that any particular font is mounted by use of
the fp request. The list of known fonts is installation dependent. In the subsequent discussion of
font-related requests, F represents either a one/two-character font name or the numerical font position,
1-4. The current font is available (as numerical position) in the read-only number register .f.
NROFF understands font control and normally underlines Italic characters (see §10.5).
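The font-selection rules (choose by name or by position, P for the previous font, ignore an unmounted name) can be modeled with a small state object. The class below is a hypothetical Python sketch, not TROFF code. Ignoring an unmounted numeric position, and the font name "CW" used in the example, are assumptions made only for illustration.

```python
# Hypothetical model of section 2.2 font selection; not TROFF source.
class FontState:
    def __init__(self):
        self.mounted = {1: "R", 2: "I", 3: "B", 4: "S"}   # default mounting
        self.current = self.previous = 1                  # initially Roman

    def mount(self, position, name):
        """Like .fp N F: record that font F is mounted on position N."""
        self.mounted[position] = name

    def select(self, f):
        """Like .ft F or \\fF; returns the position the .f register would hold."""
        if f == "P":                          # P means the previous font
            target = self.previous
        elif f.isdigit():                     # numerical font position
            target = int(f)
        else:                                 # font name: find its position
            matches = [pos for pos, name in self.mounted.items() if name == f]
            target = matches[0] if matches else None
        if target not in self.mounted:        # not mounted: request ignored
            return self.current
        self.previous, self.current = self.current, target
        return self.current


fonts = FontState()
print(fonts.select("B"))    # 3
print(fonts.select("P"))    # back to Roman: 1
print(fonts.select("CW"))   # not mounted, ignored: still 1
```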
2.3. Character size. Character point sizes available on the Graphic Systems typesetter are 6, 7, 8, 9, 10,
11, 12, 14, 16, 18, 20, 22, 24, 28, and 36. This is a range of 1/12 inch to 1/2 inch. The ps request is
used to change or restore the point size. Alternatively the point size may be changed between any two
characters by imbedding a \sN at the desired point to set the size to N, or a \s±N (1 ≤ N ≤ 9) to
increment/decrement the size by N; \s0 restores the previous size. Requested point size values that are
between two valid sizes yield the larger of the two. The current size is available in the .s register.
NROFF ignores type size control.
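The size rules (round up to the next valid size, cap at 36, \s0 restores the previous size) can be sketched in Python. The sketch also follows the ps entry's note that the previously requested value is remembered, which is why +N followed by -N returns to the starting size. Treating requests below 6 as 6 is an assumption, and the class name is hypothetical.

```python
# Section 2.3 point sizes available on the typesetter.
VALID_SIZES = [6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 36]


def valid_size(requested):
    """Smallest available size >= the request, capped at 36 (below 6 -> 6, assumed)."""
    for size in VALID_SIZES:
        if size >= requested:
            return size
    return VALID_SIZES[-1]


class PointSize:
    def __init__(self):
        self.requested = self.size = 10      # TROFF default: 10 point
        self.prev = (10, 10)                 # (requested, actual) before last change

    def change(self, arg):
        """Apply \\sN, \\s+N, \\s-N, or \\s0 (arg is the text after \\s)."""
        if arg == "0":
            req = self.prev[0]                   # restore the previous size
        elif arg[0] in "+-":
            req = self.requested + int(arg)      # relative to the last request
        else:
            req = int(arg)
        self.prev = (self.requested, self.size)
        self.requested, self.size = req, valid_size(req)
        return self.size


ps = PointSize()
print(ps.change("+3"))   # 13 is not available, so 14
print(ps.change("-3"))   # 13 - 3 = 10, back to the starting size
```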
Request    Initial     If No      Notes*  Explanation
Form       Value       Argument

.ps ±N     10 point    previous   E       Point size set to ±N. Alternatively imbed \sN or \s±N.
                                          Any positive size value may be requested; if invalid, the
                                          next larger valid size will result, with a maximum of 36.
                                          A paired sequence +N, -N will work because the previous
                                          requested value is also remembered. Ignored in NROFF.
.ss N      12/36 em    ignored    E       Space-character size is set to N/36 ems. This size is the
                                          minimum word spacing in adjusted text. Ignored in NROFF.
.cs F N M  off         -          P       Constant character space (width) mode is set on for font
                                          F (if mounted); the width of every character will be
                                          taken to be N/36 ems. If M is absent, the em is that of
                                          the character's point size; if M is given, the em is M
                                          points. All affected characters are centered in this space,
                                          including those with an actual width larger than this
                                          space. Special Font characters occurring while the
                                          current font is F are also so treated. If N is absent, the
                                          mode is turned off. The mode must be still or again in
                                          effect when the characters are physically printed.
                                          Ignored in NROFF.
.bd F N    off         -          P       The characters in font F will be artificially emboldened by
                                          printing each one twice, separated by N-1 basic units. A
                                          reasonable value for N is 3 when the character size is in
                                          the vicinity of 10 points. If N is missing the embolden
                                          mode is turned off. The column heads above were
                                          printed with .bd I 3. The mode must be still or again in
                                          effect when the characters are physically printed.
                                          Ignored in NROFF.
.bd S F N  off         -          P       The characters in the Special Font will be emboldened
                                          whenever the current font is F. This manual was printed
                                          with .bd S B 3. The mode must be still or again in effect
                                          when the characters are physically printed.
.ft F      Roman       previous   E       Font changed to F. Alternatively, imbed \fF. The font
                                          name P is reserved to mean the previous font.
.fp N F    R,I,B,S     ignored    -       Font position. This is a statement that a font named F is
                                          mounted on position N (1-4). It is a fatal error if F is
                                          not known. The phototypesetter has four fonts physically
                                          mounted. Each font consists of a film strip which can be
                                          mounted on a numbered quadrant of a wheel. The
                                          default mounting sequence assumed by TROFF is R, I, B,
                                          and S on positions 1, 2, 3 and 4.

*Notes are explained at the end of the Summary and Index above.
3. Page control
Top and bottom margins are not automatically provided; it is conventional to define two macros and to
set traps for them at vertical positions 0 (top) and -N (N from the bottom). See §7 and Tutorial
Examples §T2. A pseudo-page transition onto the first page occurs either when the first break occurs or
when the first non-diverted text processing occurs. Arrangements for a trap to occur at the top of the
first page must be completed before this transition. In the following, references to the current diversion
(§7.4) mean that the mechanism being described works during both ordinary and diverted output (the
former considered as the top diversion level).
The useable page width on the Graphic Systems phototypesetter is about 7.54 inches, beginning about
1/27 inch from the left edge of the 8 inch wide, continuous roll paper. The physical limitations on
NROFF output are output-device dependent.
Request    Initial        If No      Notes   Explanation
Form       Value          Argument

.pl ±N     11 in          11 in      v       Page length set to ±N. The internal limitation is about
                                             75 inches in TROFF and about 136 inches in NROFF.
                                             The current page length is available in the .p register.
.bp ±N     N=1            -          B*,v    Begin page. The current page is ejected and a new page
                                             is begun. If ±N is given, the new page number will be
                                             ±N. Also see request ns.
.pn ±N     N=1            ignored    -       Page number. The next page (when it occurs) will have
                                             the page number ±N. A pn must occur before the initial
                                             pseudo-page transition to affect the page number of
                                             the first page. The current page number is in the %
                                             register.
.po ±N     0; 26/27 in†   previous   v       Page offset. The current left margin is set to ±N. The
                                             TROFF initial value provides about 1 inch of paper margin
                                             including the physical typesetter margin of 1/27 inch.
                                             In TROFF the maximum (line-length) + (page-offset) is
                                             about 7.54 inches. See §6. The current page offset is
                                             available in the .o register.
.ne N      -              N=1 V      D,v     Need N vertical space. If the distance, D, to the next
                                             trap position (see §7.5) is less than N, a forward vertical
                                             space of size D occurs, which will spring the trap. If
                                             there are no remaining traps on the page, D is the
                                             distance to the bottom of the page. If D < V, another
                                             line could still be output and spring the trap. In a
                                             diversion, D is the distance to the diversion trap, if any,
                                             or is very large.
.mk R      none           internal   D       Mark the current vertical place in an internal register
                                             (both associated with the current diversion level), or in
                                             register R, if given. See rt request.
.rt ±N     none           internal   D,v     Return upward only to a marked vertical place in the
                                             current diversion. If ±N (w.r.t. current place) is given,
                                             the place is ±N from the top of the page or diversion or,
                                             if N is absent, to a place marked by a previous mk. Note
                                             that the sp request (§5.3) may be used in all cases
                                             instead of rt by spacing to the absolute place stored in an
                                             explicit register; e.g. using the sequence .mk R ...
                                             .sp |\nRu.

*The use of ' as control character (instead of .) suppresses the break function.
†Values separated by ; are for NROFF and TROFF respectively.
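The .ne decision can be sketched in Python: find D, the distance to the next trap (or to the bottom of the page when no trap remains), and space forward by D only if D is less than N. Positions are assumed to be TROFF basic units measured down the page, and the trap list is a hypothetical input rather than formatter state. Diversions are not modeled.

```python
# Sketch of .ne N (section 3); positions are basic units down the page.
def ne_space(needed, current, traps, page_length):
    """Forward space .ne would output at `current` (0 if N already fits)."""
    below = [t for t in traps if t > current]
    # D: distance to the next trap, or to the bottom if no traps remain.
    d = (min(below) if below else page_length) - current
    return d if d < needed else 0


# An 11-inch page (4752 units) with a footer trap 1 inch from the bottom.
traps = [0, 4320]
print(ne_space(288, 4000, traps, 4752))   # D = 320 >= 288: nothing output
print(ne_space(288, 4100, traps, 4752))   # D = 220 < 288: space 220, springing the trap
```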
4. Text Filling, Adjusting, and Centering
4.1. Filling and adjusting. Normally, words are collected from input text lines and assembled into an output text line until some word doesn't fit. An attempt is then made to hyphenate the word in an effort to
assemble a part of it into the output line. The spaces between the words on the output line are then
increased to spread out the line to the current line length minus any current indent. A word is any string
of characters delimited by the space character or the beginning/end of the input line. Any adjacent pair
of words that must be kept together (neither split across output lines nor spread apart in the adjustment
process) can be tied together by separating them with the unpaddable space character "\ "
(backslash-space). The adjusted word spacings are uniform in TROFF and the minimum interword spacing can be
controlled with the ss request (§2). In NROFF, they are normally nonuniform because of quantization
to character-size spaces; however, the command line option -e causes uniform spacing with full output
device resolution. Filling, adjustment, and hyphenation (§13) can all be prevented or controlled. The
text length on the last line output is available in the .n register, and text base-line position on the page
for this line is in the nl register. The text base-line high-water mark (lowest place) on the current page
is in the .h register.
An input text line ending with ., ?, or ! is taken to be the end of a sentence, and an additional space
character is automatically provided during filling. Multiple inter-word space characters found in the
input are retained, except for trailing spaces; initial spaces also cause a break.
When filling is in effect, a \p may be imbedded or attached to a word to cause a break at the end of the
word and have the resulting output line spread out to fill the current line length.
A text input line that happens to begin with a control character can be made to not look like a control
line by prefacing it with the non-printing, zero-width filler character \&. Still another way is to specify
output translation of some convenient character into the control character using tr (§10.5).
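Filling with adjustment on both margins can be sketched in Python using NROFF-style character cells. The sketch simplifies several things, and each simplification is an assumption: there is no hyphenation and no indent, extra spaces are handed out one cell at a time from the left (so the spacing is nonuniform, as in NROFF), and the last line is output unadjusted, as it would be at a break. Sentence ends follow the rule above: a word that ends an input line with ., ?, or ! gets an extra space.

```python
# Sketch of fill mode with "ad b" adjustment (section 4.1), in character cells.
def split_words(input_lines):
    """Collect (word, ends_sentence) pairs from the input text lines."""
    words = []
    for line in input_lines:
        parts = line.split()
        for i, word in enumerate(parts):
            last = i == len(parts) - 1
            words.append((word, last and word[-1] in ".?!"))
    return words


def gaps_for(ws):
    # One space between words, two after a sentence-ending input line.
    return [2 if ends else 1 for _, ends in ws[:-1]]


def natural_width(ws):
    return sum(len(w) for w, _ in ws) + sum(gaps_for(ws))


def render(ws, gaps):
    out = ws[0][0]
    for (word, _), gap in zip(ws[1:], gaps):
        out += " " * gap + word
    return out


def fill_and_adjust(input_lines, line_length):
    lines, current = [], []
    for item in split_words(input_lines):
        # Start a new output line when the next word doesn't fit.
        if current and natural_width(current + [item]) > line_length:
            lines.append(current)
            current = []
        current.append(item)
    out = []
    for ws in lines:
        gaps = gaps_for(ws)
        # Spread the leftover width over the gaps, one cell at a time from the left.
        for i in range(line_length - natural_width(ws)):
            if gaps:
                gaps[i % len(gaps)] += 1
        out.append(render(ws, gaps))
    if current:
        out.append(render(current, gaps_for(current)))   # final line: unadjusted
    return out


for line in fill_and_adjust(["Now is the time", "for all good men."], 12):
    print(repr(line))
```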
4.2. Interrupted text. The copying of an input line in nofill (non-fill) mode can be interrupted by terminating the partial line with a \c. The next encountered input text line will be considered to be a continuation of the same line of input text. Similarly, a word within filled text may be interrupted by terminating the word (and line) with \c; the next encountered text will be taken as a continuation of the interrupted word. If the intervening control lines cause a break, any partial line will be forced out along
with any partial word.
Request    Initial    If No      Notes   Explanation
Form       Value      Argument

.br        -          -          B       Break. The filling of the line currently being collected is
                                         stopped and the line is output without adjustment. Text
                                         lines beginning with space characters and empty text
                                         lines (blank lines) also cause a break.
.fi        fill on    -          B,E     Fill subsequent output lines. The register .u is 1 in fill
                                         mode and 0 in nofill mode.
.nf        fill on    -          B,E     Nofill. Subsequent output lines are neither filled nor
                                         adjusted. Input text lines are copied directly to output
                                         lines without regard for the current line length.
.ad c      adj,both   adjust     E       Line adjustment is begun. If fill mode is not on, adjustment
                                         will be deferred until fill mode is back on. If the type
                                         indicator c is present, the adjustment type is changed as
                                         shown in the following table.

                                         Indicator   Adjust Type
                                         l           adjust left margin only
                                         r           adjust right margin only
                                         c           center
                                         b or n      adjust both margins
                                         absent      unchanged

.na        adjust     -          E       Noadjust. Adjustment is turned off; the right margin will
                                         be ragged. The adjustment type for ad is not changed.
                                         Output line filling still occurs if fill mode is on.
.ce N      off        N=1        B,E     Center the next N input text lines within the current
                                         (line-length minus indent). If N=0, any residual count
                                         is cleared. A break occurs after each of the N input
                                         lines. If the input line is too long, it will be left adjusted.
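The adjustment types in the ad table can be sketched as one Python function that lays out a line of words in a given number of character cells. The function is hypothetical. It assumes single interword spaces before adjustment, and it gives a too-long line no padding, so such a line stays left adjusted, matching the note under ce. Centering puts half the slack on the left.

```python
# Sketch of the ad adjustment types for one line, in character cells.
def adjust_line(words, line_length, mode="b"):
    text = " ".join(words)
    extra = max(line_length - len(text), 0)   # too-long lines get no padding
    if mode == "l":
        return text                           # left margin only: ragged right
    if mode == "r":
        return " " * extra + text             # right margin only: ragged left
    if mode == "c":
        return " " * (extra // 2) + text      # centered
    if mode in ("b", "n"):                    # both margins: widen the gaps
        gaps = [1] * (len(words) - 1)
        for i in range(extra):
            if gaps:
                gaps[i % len(gaps)] += 1
        out = words[0]
        for word, gap in zip(words[1:], gaps):
            out += " " * gap + word
        return out
    raise ValueError(f"unknown adjustment type: {mode!r}")


for mode in "lrcb":
    print(repr(adjust_line(["a", "bb", "c"], 10, mode)))
```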
5. Vertical Spacing
5.1. Base-line spacing. The vertical spacing (V) between the base-lines of successive output lines can be
set using the vs request with a resolution of 1/144 inch = 1/2 point in TROFF, and to the output device
resolution in NROFF. V must be large enough to accommodate the character sizes on the affected output lines. For the common type sizes (9-12 points), usual typesetting practice is to set V to 2 points
greater than the point size; TROFF default is 10-point type on a 12-point spacing (as in this document).
The current V is available in the .v register. Multiple-V line separation (e.g. double spacing) may be
requested with ls.
5.2. Extra line-space. If a word contains a vertically tall construct requiring the output line containing it
to have extra vertical space before and/or after it, the extra-line-space function \x'N' can be imbedded
in or attached to that word. In this and other functions having a pair of delimiters around their parameter (here '), the delimiter choice is arbitrary, except that it can't look like the continuation of a number
expression for N. If N is negative, the output line containing the word will be preceded by N extra
vertical space; if N is positive, the output line containing the word will be followed by N extra vertical
space. If successive requests for extra space apply to the same line, the maximum values are used.
The most recently utilized post-line extra line-space is available in the .a register.
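The rule for combining several \x'N' requests on one output line fits in a few lines of Python. Values are in basic units, and the function name is hypothetical.

```python
# Section 5.2: all \x'N' requests on one output line are combined by
# taking the largest "before" (negative N) and largest "after" (positive N).
def extra_line_space(requests):
    before = max([-n for n in requests if n < 0], default=0)
    after = max([n for n in requests if n > 0], default=0)
    return before, after    # `after` is what the .a register would report


print(extra_line_space([-12, -30, 18, 6]))   # (30, 18)
```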
5.3. Blocks of vertical space. A block of vertical space is ordinarily requested using sp, which honors the
no-space mode and which does not space past a trap. A contiguous block of vertical space may be
reserved using sv.
Request    Initial        If No      Notes   Explanation
Form       Value          Argument

.vs N      1/6in;12pts    previous   E,p     Set vertical base-line spacing size V. Transient extra
                                             vertical space available with \x'N' (see above).
.ls N      N=1            previous   E       Line spacing set to ±N. N-1 Vs (blank lines) are
                                             appended to each output text line. Appended blank lines
                                             are omitted, if the text or previous appended blank line
                                             reached a trap position.
.sp N      -              N=1V       B,v     Space vertically in either direction. If N is negative, the
                                             motion is backward (upward) and is limited to the distance
                                             to the top of the page. Forward (downward) motion is
                                             truncated to the distance to the nearest trap. If the
                                             no-space mode is on, no spacing occurs (see ns, and rs below).
.sv N      -              N=1V       v       Save a contiguous vertical block of size N. If the distance
                                             to the next trap is greater than N, N vertical space is
                                             output. No-space mode has no effect. If this distance is
                                             less than N, no vertical space is immediately output, but
                                             N is remembered for later output (see os). Subsequent sv
                                             requests will overwrite any still remembered N.
.os        -              -          -       Output saved vertical space. No-space mode has no
                                             effect. Used to finally output a block of vertical space
                                             requested by an earlier sv request.
.ns        space          -          D       No-space mode turned on. When on, the no-space mode
                                             inhibits sp requests and bp requests without a next page
                                             number. The no-space mode is turned off when a line of
                                             output occurs, or with rs.
.rs        space          -          D       Restore spacing. The no-space mode is turned off.
Blank text line.          -          B       Causes a break and output of a blank line exactly like
                                             sp 1.
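The save/output pair sv and os can be modeled as a tiny state object. This hypothetical Python sketch makes two assumptions the table does not settle: a distance exactly equal to N is treated like "less than N", and os forgets the block once it has been output.

```python
# Hypothetical model of .sv / .os (section 5.3); units are basic units.
class SavedSpace:
    def __init__(self):
        self.remembered = 0

    def sv(self, n, distance_to_trap):
        """Return the space output now, or remember n for a later .os."""
        if distance_to_trap > n:
            return n
        self.remembered = n        # a later sv overwrites any remembered N
        return 0

    def os(self):
        """Output the remembered block (assumed to be forgotten afterward)."""
        n, self.remembered = self.remembered, 0
        return n


s = SavedSpace()
print(s.sv(100, 500))   # fits before the trap: 100 output now
print(s.sv(300, 200))   # does not fit: 0 now, 300 remembered
print(s.os())           # 300, e.g. after the page break
```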
6. Line Length and Indenting
The maximum line length for fill mode may be set with ll. The indent may be set with in; an indent
applicable to only the next output line may be set with ti. The line length includes indent space but not
page offset space. The line-length minus the indent is the basis for centering with ce. The effect of ll,
in, or ti is delayed, if a partially collected line exists, until after that line is output. In fill mode the
length of text on an output line is less than or equal to the line length minus the indent. The current
line length and indent are available in registers .l and .i respectively. The length of three-part titles produced by tl (see §14) is independently set by lt.
Request    Initial    If No      Notes   Explanation
Form       Value      Argument

.ll ±N     6.5 in     previous   E,m     Line length is set to ±N. In TROFF the maximum
                                         (line-length) + (page-offset) is about 7.54 inches.
.in ±N     N=0        previous   B,E,m   Indent is set to ±N. The indent is prepended to each
                                         output line.
.ti ±N     -          ignored    B,E,m   Temporary indent. The next output text line will be
                                         indented a distance ±N with respect to the current
                                         indent. The resulting total indent may not be negative.
                                         The current indent is not changed.
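How in and ti interact with the line length can be sketched as a small Python class. It is hypothetical and counts in character cells. Clamping a negative total indent to 0 is an assumption, since the table only says the result may not be negative. The delayed effect on a partially collected line is not modeled.

```python
# Hypothetical model of ll / in / ti (section 6), in character cells.
class Indenting:
    def __init__(self, line_length=65, indent=0):
        self.ll, self.indent, self.temp = line_length, indent, None

    def set_indent(self, n):
        """Like .in N (absolute form only): applies to all later lines."""
        self.indent = n

    def ti(self, n):
        """Like .ti ±N: offset from the current indent, next line only."""
        self.temp = max(self.indent + n, 0)    # total indent kept >= 0 (assumed)

    def next_line(self):
        """(indent, room for text) for the next output line."""
        indent = self.indent if self.temp is None else self.temp
        self.temp = None                       # ti lasts for one line only
        return indent, self.ll - indent


ind = Indenting(line_length=65, indent=5)
ind.ti(-5)                # hanging first line
print(ind.next_line())    # (0, 65)
print(ind.next_line())    # (5, 60): back to the current indent
```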
7. Macros, Strings, Diversion, and Position Traps
7.1. Macros and strings. A macro is a named set of arbitrary lines that may be invoked by name or with
a trap. A string is a named string of characters, not including a newline character, that may be interpolated by name at any point. Request, macro, and string names share the same name list. Macro and
string names may be one or two characters long and may usurp previously defined request, macro, or
string names. Any of these entities may be renamed with rn or removed with rm. Macros are created
by de and di, and appended to by am and da; di and da cause normal output to be stored in a macro.
Strings are created by ds and appended to by as. A macro is invoked in the same way as a request; a
control line beginning .xx will interpolate the contents of macro xx. The remainder of the line may
contain up to nine arguments. The strings x and xx are interpolated at any desired point with \*x and
\*(xx respectively. String references and macro invocations may be nested.
7.2. Copy mode input interpretation. During the definition and extension of strings and macros (not by
diversion) the input is read in copy mode. The input is copied without interpretation except that:
• The contents of number registers indicated by \n are interpolated.
• Strings indicated by \* are interpolated.
• Arguments indicated by \$ are interpolated.
• Concealed newlines indicated by \(newline) are eliminated.
• Comments indicated by \" are eliminated.
• \t and \a are interpreted as ASCII horizontal tab and SOH respectively (§9).
• \\ is interpreted as \.
• \. is interpreted as ".".
These interpretations can be suppressed by prepending a \. For example, since \\ maps into a \, \\n
will copy as \n which will be interpreted as a number register indicator when the macro or string is
reread.
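The copy-mode rules in the list above can be sketched as a single regular-expression pass in Python. The registers, strings, and args parameters are hypothetical stand-ins for formatter state, and any escape not in the list (for example \fB) is copied through unchanged. Because \\ is consumed as one unit, \\n comes out as \n, as the text describes. An unset register is assumed to read as 0.

```python
import re

# One alternative per copy-mode rule in section 7.2.
COPY = re.compile(r"""\\(?:
      n(?:\((..)|(.))      # \nx or \n(xx   number register
    | \*(?:\((..)|(.))     # \*x or \*(xx   string
    | \$([1-9])            # \$N            macro argument
    | "[^\n]*              # \"...          comment, up to end of line
    | \n                   # \(newline)     concealed newline
    | ([\\.ta])            # \\  \.  \t  \a
)""", re.VERBOSE)


def copy_mode(text, registers=None, strings=None, args=()):
    registers, strings = registers or {}, strings or {}

    def repl(m):
        reg2, reg1, str2, str1, argn, simple = m.groups()
        if reg2 or reg1:
            return str(registers.get(reg2 or reg1, 0))   # unset -> 0 (assumed)
        if str2 or str1:
            return strings.get(str2 or str1, "")
        if argn:
            n = int(argn)
            return args[n - 1] if n <= len(args) else ""
        if simple:
            return {"\\": "\\", ".": ".", "t": "\t", "a": "\x01"}[simple]
        return ""          # comment or concealed newline: eliminated
    return COPY.sub(repl, text)


print(copy_mode(r"Today is \\$1 the \\$2."))   # Today is \$1 the \$2.
print(copy_mode(r"Page \n%", registers={"%": 3}))   # Page 3
```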
7.3. Arguments. When a macro is invoked by name, the remainder of the line is taken to contain up to
nine arguments. The argument separator is the space character, and arguments may be surrounded by
double-quotes to permit imbedded space characters. Pairs of double-quotes may be imbedded in
double-quoted arguments to represent a single double-quote. If the desired arguments won't fit on a
line, a concealed newline may be used to continue on the next line.
When a macro is invoked the input level is pushed down and any arguments available at the previous
level become unavailable until the macro is completely read and the previous level is restored. A
macro's own arguments can be interpolated at any point within the macro with \$N, which interpolates
the Nth argument (1 ≤ N ≤ 9). If an invoked argument doesn't exist, a null string results. For example, the macro xx may be defined by
.de xx \"begin definition
Today is \\$1 the \\$2.
.. \"end definition
and called by
.xx Monday 14th
to produce the text
Today is Monday the 14th.
Note that the \$ was concealed in the definition with a prepended \. The number of currently available
arguments is in the .$ register.
No arguments are available at the top (non-macro) level in this implementation. Because string
referencing is implemented as an input-level push-down, no arguments are available from within a string.
No arguments are available within a trap-invoked macro.
Arguments are copied in copy mode onto a stack where they are available for reference. The mechanism does not allow an argument to contain a direct reference to a long string (interpolated at copy time)
and it is advisable to conceal string references (with an extra \) to delay interpolation until argument
reference time.
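The argument rules (space separators, double-quoted arguments, "" for a literal double-quote, at most nine arguments, and a null string for a missing \$N) can be sketched in Python. The function names are hypothetical. Silently dropping arguments beyond the ninth is an assumption. The example reruns the xx macro above, taking the body as it reads after copy mode.

```python
import re


def split_args(rest):
    """Split the rest of a macro call line into at most nine arguments."""
    args, i, n = [], 0, len(rest)
    while len(args) < 9:                      # extra arguments dropped (assumed)
        while i < n and rest[i] == " ":       # skip separators
            i += 1
        if i >= n:
            break
        if rest[i] == '"':                    # quoted argument
            i += 1
            chars = []
            while i < n:
                if rest[i] == '"':
                    if rest[i + 1:i + 2] == '"':   # "" stands for one "
                        chars.append('"')
                        i += 2
                        continue
                    i += 1                    # closing quote
                    break
                chars.append(rest[i])
                i += 1
            args.append("".join(chars))
        else:                                 # plain argument
            start = i
            while i < n and rest[i] != " ":
                i += 1
            args.append(rest[start:i])
    return args


def interpolate(body, args):
    """Replace \\$N with argument N; a missing argument yields a null string."""
    def repl(m):
        k = int(m.group(1))
        return args[k - 1] if k <= len(args) else ""
    return re.sub(r"\\\$([1-9])", repl, body)


body = r"Today is \$1 the \$2."               # the xx body after copy mode
print(interpolate(body, split_args("Monday 14th")))   # Today is Monday the 14th.
```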
7.4. Diversions. Processed output may be diverted into a macro for purposes such as footnote processing (see Tutorial §T5) or determining the horizontal and vertical size of some text for conditional changing of pages or columns. A single diversion trap may be set at a specified vertical position. The number registers dn and dl respectively contain the vertical and horizontal size of the most recently ended diversion. Processed text that is diverted into a macro retains the vertical size of each of its lines when reread in nofill mode regardless of the current vertical spacing. Constant-spaced (cs) or emboldened (bd) text that is diverted can be reread correctly only if these modes are again or still in effect at reread time. One way to do this is to imbed in the diversion the appropriate cs or bd requests with the transparent mechanism described in §10.6.
Diversions may be nested and certain parameters and registers are associated with the current diversion level (the top non-diversion level may be thought of as the 0th diversion level). These are the diversion trap and associated macro, no-space mode, the internally-saved marked place (see mk and rt), the current vertical place (.d register), the current high-water text base-line (.h register), and the current diversion name (.z register).
7.5. Traps. Three types of trap mechanisms are available: page traps, a diversion trap, and an input-line-count trap. Macro-invocation traps may be planted using wh at any page position including the top. This trap position may be changed using ch. Trap positions at or below the bottom of the page have no effect unless or until moved to within the page or rendered effective by an increase in page length. Two traps may be planted at the same position only by first planting them at different positions and then moving one of the traps; the first planted trap will conceal the second unless and until the first one is moved (see Tutorial Examples §T5). If the first one is moved back, it again conceals the second trap. The macro associated with a page trap is automatically invoked when a line of text is output whose vertical size reaches or sweeps past the trap position. Reaching the bottom of a page springs the top-of-page trap, if any, provided there is a next page. The distance to the next trap position is available in the .t register; if there are no traps between the current position and the bottom of the page, the distance returned is the distance to the page bottom.
A macro-invocation trap effective in the current diversion may be planted using dt. The .t register works in a diversion; if there is no subsequent trap a large distance is returned. For a description of input-line-count traps, see it below.
Request Form | Initial Value | If No Argument | Notes | Explanation
.de xx yy | - | .yy=.. | - | Define or redefine the macro xx. The contents of the macro begin on the next input line. Input lines are copied in copy mode until the definition is terminated by a line beginning with .yy, whereupon the macro yy is called. In the absence of yy, the definition is terminated by a line beginning with "..". A macro may contain de requests provided the terminating macros differ or the contained definition terminator is concealed. ".." can be concealed as \\.. which will copy as \.. and be reread as "..".
.am xx yy | - | .yy=.. | - | Append to macro (append version of de).
.ds xx string | - | ignored | - | Define a string xx containing string. Any initial double-quote in string is stripped off to permit initial blanks.
.as xx string | - | ignored | - | Append string to string xx (append version of ds).
.rm xx | - | ignored | - | Remove request, macro, or string. The name xx is removed from the name list and any related storage space is freed. Subsequent references will have no effect.
.rn xx yy | - | ignored | - | Rename request, macro, or string xx to yy. If yy exists, it is first removed.
.di xx | - | end | D | Divert output to macro xx. Normal text processing occurs during diversion except that page offsetting is not done. The diversion ends when the request di or da is encountered without an argument; extraneous requests of this type should not appear when nested diversions are being used.
.da xx | - | end | D | Divert, appending to xx (append version of di).
.wh N xx | - | - | v | Install a trap to invoke xx at page position N; a negative N will be interpreted with respect to the page bottom. Any macro previously planted at N is replaced by xx. A zero N refers to the top of a page. In the absence of xx, the first found trap at N, if any, is removed.
.ch xx N | - | - | v | Change the trap position for macro xx to be N. In the absence of N, the trap, if any, is removed.
.dt N xx | - | off | D,v | Install a diversion trap at position N in the current diversion to invoke macro xx. Another dt will redefine the diversion trap. If no arguments are given, the diversion trap is removed.
.it N xx | - | off | E | Set an input-line-count trap to invoke the macro xx after N lines of text input have been read (control or request lines don't count). The text may be in-line text or text interpolated by inline or trap-invoked macros.
.em xx | none | none | - | The macro xx will be invoked when all input has ended. The effect is the same as if the contents of xx had been at the end of the last file processed.
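The page-trap rules of §7.5 can be sketched as a small Python model, together with the way wh treats negative positions. Positions are whole numbers in arbitrary units, for instance the lines of a 66-line page. The function names are hypothetical. The sketch has three limits. It assumes at most one macro per position, so it does not model how ch lets one trap conceal another. It treats a trap exactly at the current position as already passed. It ignores the top-of-page trap that is sprung by reaching the page bottom.

```python
def resolve(pos: int, page_length: int) -> int:
    """Negative trap positions count up from the page bottom, as with .wh -N."""
    return page_length + pos if pos < 0 else pos


def distance_to_next_trap(current: int, traps: dict[int, str], page_length: int) -> int:
    """Rough model of the .t register: distance to the next trap below the
    current position, or to the page bottom if no trap lies in between."""
    ahead = [p for p in (resolve(q, page_length) for q in traps)
             if current < p < page_length]
    return min(ahead, default=page_length) - current


def output_line(current: int, height: int, traps: dict[int, str],
                page_length: int) -> tuple[int, list[str]]:
    """Output one line of the given vertical size; return the new position and
    the macros of traps that the line reaches or sweeps past, in page order."""
    new = current + height
    planted = sorted((resolve(q, page_length), macro) for q, macro in traps.items())
    sprung = [macro for p, macro in planted if current < p <= new and p < page_length]
    return new, sprung
```

Traps planted at or below the page bottom are skipped in both functions, matching the rule that they have no effect until moved within the page.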
8. Number Registers
A variety of parameters are available to the user as predefined, named number registers (see Summary
and Index, page 7). In addition, the user may define his own named registers. Register names are one
or two characters long and do not conflict with request, macro, or string names. Except for certain
predefined read-only registers, a number register can be read, written, automatically incremented or
decremented, and interpolated into the input in a variety of formats. One common use of user-defined
registers is to automatically number sections, paragraphs, lines, etc. A number register may be used
any time numerical input is expected or desired and may be used in numerical expressions (§1.4).
Number registers are created and modified using nr, which specifies the name, numerical value, and
the auto-increment size. Registers are also modified if accessed with an auto-incrementing sequence.
If the registers x and xx both contain N and have the auto-increment size M, the following access
sequences have the effect shown:
Sequence | Effect on Register | Value Interpolated
\nx | none | N
\n(xx | none | N
\n+x | x incremented by M | N+M
\n-x | x decremented by M | N-M
\n+(xx | xx incremented by M | N+M
\n-(xx | xx decremented by M | N-M
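The access sequences in the table above can be mirrored by a small Python class. This is a hypothetical model, not TROFF code. The Register class and its method names are invented for illustration. The sketch leaves the increment unchanged when .nr is given no M; that behavior is an assumption.

```python
from typing import Optional


class Register:
    """Hypothetical model of a named number register with an auto-increment size."""

    def __init__(self, value: int = 0, increment: int = 0) -> None:
        self.value = value
        self.increment = increment

    def nr(self, spec: str, increment: Optional[int] = None) -> None:
        """Like .nr R ±N M: a leading sign makes N relative to the previous value."""
        if spec[:1] in ("+", "-"):
            self.value += int(spec)
        else:
            self.value = int(spec)
        if increment is not None:
            self.increment = increment

    def get(self) -> int:
        r"""\nx or \n(xx: interpolate without changing the register."""
        return self.value

    def get_incremented(self) -> int:
        r"""\n+x or \n+(xx: add the increment, then interpolate."""
        self.value += self.increment
        return self.value

    def get_decremented(self) -> int:
        r"""\n-x or \n-(xx: subtract the increment, then interpolate."""
        self.value -= self.increment
        return self.value
```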
When interpolated, a number register is converted to decimal (default), decimal with leading zeros,
lower-case Roman, upper-case Roman, lower-case sequential alphabetic, or upper-case sequential alphabetic according to the format specified by af.
Request Form | Initial Value | If No Argument | Notes | Explanation
.nr R ±N M | - | - | u | The number register R is assigned the value ±N with respect to the previous value, if any. The increment for auto-incrementing is set to M.
.af R c | arabic | - | - | Assign format c to register R. The available formats are:
    Format | Numbering Sequence
    1 | 0,1,2,3,4,5,...
    001 | 000,001,002,003,004,005,...
    i | 0,i,ii,iii,iv,v,...
    I | 0,I,II,III,IV,V,...
    a | 0,a,b,c,...,z,aa,ab,...,zz,aaa,...
    A | 0,A,B,C,...,Z,AA,AB,...,ZZ,AAA,...
  An arabic format having N digits specifies a field width of N digits (example 2 above). The read-only registers and the width function (§11.2) are always arabic.
.rr R | - | ignored | - | Remove register R. If many registers are being created dynamically, it may become necessary to remove no longer used registers to recapture internal storage space for newer registers.
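A Python sketch of the af numbering sequences listed above follows. The function names are hypothetical and values are assumed to be non-negative. The Roman formats assume standard subtractive numerals (iv, ix, ...), which is consistent with the iv shown in the table.

```python
def to_roman(n: int) -> str:
    """Upper-case Roman numeral for n >= 1."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
                (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
                (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n: int) -> str:
    """Sequential alphabetic form for n >= 1: a..z, aa..zz, aaa, ..."""
    out = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out.append(chr(ord("a") + rem))
    return "".join(reversed(out))


def format_register(n: int, fmt: str) -> str:
    """Render a non-negative register value in an .af format: 1, 001, i, I, a, A."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    if fmt.isdigit():                      # arabic; len(fmt) digits of field width
        return str(n).zfill(len(fmt))
    if fmt not in ("i", "I", "a", "A"):
        raise ValueError(f"unknown format {fmt!r}")
    if n == 0:                             # every sequence in the table starts at 0
        return "0"
    if fmt in ("i", "I"):
        roman = to_roman(n)
        return roman.lower() if fmt == "i" else roman
    alpha = to_alpha(n)
    return alpha.upper() if fmt == "A" else alpha
```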
9. Tabs, Leaders, and Fields
9.1. Tabs and leaders. The ASCII horizontal tab character and the ASCII SOH (hereafter known as the leader character) can both be used to generate either horizontal motion or a string of repeated characters. The length of the generated entity is governed by internal tab stops specifiable with ta. The default difference is that tabs generate motion and leaders generate a string of periods; tc and lc offer the choice of repeated character or motion. There are three types of internal tab stops: left adjusting, right adjusting, and centering. In the following table, D is the distance from the current position on the input line (where a tab or leader was found) to the next tab stop; next-string consists of the input characters following the tab (or leader) up to the next tab (or leader) or end of line; and W is the width of next-string.
Tab Type | Length of Motion or Repeated Characters | Location of next-string
Left | D | Following D
Right | D-W | Right adjusted within D
Centered | D-W/2 | Centered on right end of D
The length of generated motion is allowed to be negative, but that of a repeated character string cannot
be. Repeated character strings contain an integer number of characters, and any residual distance is
prepended as motion. Tabs or leaders found after the last tab stop are ignored, but may be used as
next-string terminators.
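The tab-type table and the rule for repeated characters can be expressed as two small Python functions. The names are hypothetical, and any consistent unit (such as basic units) works. The text above does not say what happens when a repeated-character string would have a negative length, so the sketch's choice there is an assumption.

```python
def tab_length(kind: str, d: float, w: float) -> float:
    """Length of motion (or repeated characters) for a tab, from the table above.

    kind is "L", "R", or "C"; d is the distance to the next tab stop;
    w is the width of next-string.
    """
    if kind == "L":
        return d
    if kind == "R":
        return d - w
    if kind == "C":
        return d - w / 2
    raise ValueError(f"unknown tab type {kind!r}")


def repeated_chars(length: float, char_width: float) -> tuple[float, int]:
    """Split a length into (leading motion, number of repeated characters).

    Only whole characters are generated; the residual distance is prepended
    as motion. A repeated-character string cannot be negative, so this sketch
    generates nothing at all for a non-positive length (an assumption).
    """
    if length <= 0:
        return 0, 0
    count = int(length // char_width)
    return length - count * char_width, count
```

For a leader spanning 100 units with a 30-unit period, repeated_chars(100, 30) gives three periods preceded by 10 units of motion.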
Tabs and leaders are not interpreted in copy mode. \t and \a always generate a non-interpreted tab and
leader respectively, and are equivalent to actual tabs and leaders in copy mode.
9.2. Fields. A field is contained between a pair of field delimiter characters, and consists of sub-strings separated by padding indicator characters. The field length is the distance on the input line from the position where the field begins to the next tab stop. The difference between the total length of all the sub-strings and the field length is incorporated as horizontal padding space that is divided among the indicated padding places. The incorporated padding is allowed to be negative. For example, if the field delimiter is # and the padding indicator is ^, #^xxx^right# specifies a right-adjusted string with the string xxx centered in the remaining space.
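The field mechanism can be sketched in Python using character counts as widths, as on a fixed-pitch device. The function names are hypothetical and the field length is taken as already known. The text above says only that the padding is divided among the padding places; giving any leftover to the earliest places is an assumption.

```python
def field_padding(field: str, pad: str, field_length: int) -> list[int]:
    """Padding (in character widths) placed at each padding indicator of a field.

    field is the text between the two field delimiters; the total padding is the
    field length minus the widths of the sub-strings, and may be negative.
    """
    parts = field.split(pad)
    places = len(parts) - 1
    if places == 0:
        raise ValueError("this sketch needs at least one padding indicator")
    total = field_length - sum(len(p) for p in parts)
    base, extra = divmod(total, places)
    # Assumption: any remainder goes to the earliest padding places.
    return [base + (1 if k < extra else 0) for k in range(places)]


def render_field(field: str, pad: str, field_length: int) -> str:
    """Lay the field out in a fixed-width line (negative padding shown as none)."""
    parts = field.split(pad)
    out = parts[0]
    for gap, part in zip(field_padding(field, pad, field_length), parts[1:]):
        out += " " * max(gap, 0) + part
    return out
```

For the #^xxx^right# example in a 20-character field, the 12 characters of padding split evenly: xxx is centered in the 15 characters left of the right-adjusted "right".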
Request Form | Initial Value | If No Argument | Notes | Explanation
.ta Nt ... | 0.8; 0.5in | none | E,m | Set tab stops and types. t=R, right adjusting; t=C, centering; t absent, left adjusting. TROFF tab stops are preset every 0.5in.; NROFF every 0.8in. The stop values are separated by spaces, and a value preceded by + is treated as an increment to the previous stop value.
.tc c | none | none | E | The tab repetition character becomes c, or is removed specifying motion.
.lc c | . | none | E | The leader repetition character becomes c, or is removed specifying motion.
.fc a b | off | off | - | The field delimiter is set to a; the padding indicator is set to the space character or to b, if given. In the absence of arguments the field mechanism is turned off.
10. Input and Output Conventions and Character Translations
10.1. Input character translations. Ways of inputting the graphic character set were discussed in §2.1. The ASCII control characters horizontal tab (§9.1), SOH (§9.1), and backspace (§10.3) are discussed elsewhere. The newline delimits input lines. In addition, STX, ETX, ENQ, ACK, and BEL are accepted, and may be used as delimiters or translated into a graphic with tr (§10.5). All others are ignored.
The escape character \ introduces escape sequences: it causes the following character to mean another character, or to indicate some function. A complete list of such sequences is given in the Summary and Index on page 6. \ should not be confused with the ASCII control character ESC of the same name. The escape character \ can be input with the sequence \\. The escape character can be changed with ec, and all that has been said about the default \ becomes true for the new escape character. \e can be used to print whatever the current escape character is. If necessary or convenient, the escape mechanism may be turned off with eo, and restored with ec.
Request Form | Initial Value | If No Argument | Notes | Explanation
.ec c | \ | \ | - | Set escape character to \, or to c, if given.
.eo | on | - | - | Turn escape mechanism off.
10.2. Ligatures. Five ligatures are available in the current TROFF character set: fi, fl, ff, ffi, and ffl. They may be input (even in NROFF) by \(fi, \(fl, \(ff, \(Fi, and \(Fl respectively. The ligature mode is normally on in TROFF, and automatically invokes ligatures during input.
Request Form | Initial Value | If No Argument | Notes | Explanation
.lg N | off; on | on | - | Ligature mode is turned on if N is absent or non-zero, and turned off if N=0. If N=2, only the two-character ligatures are automatically invoked. Ligature mode is inhibited for request, macro, string, register, or file names, and in copy mode. No effect in NROFF.
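The effect of .lg N on a word can be sketched as a greedy scan that prefers the longest ligature. The function ligate and the longest-match rule are assumptions made for illustration. The request itself has no effect in NROFF.

```python
ALL_LIGATURES = ("ffi", "ffl", "ff", "fi", "fl")   # longest first
TWO_CHAR_LIGATURES = ("ff", "fi", "fl")


def ligate(word: str, mode: int = 1) -> list[str]:
    """Split word into output glyphs, replacing letter groups by ligatures.

    mode follows .lg N: 0 turns ligatures off, 2 allows only the two-character
    ligatures, any other non-zero value allows all five.
    """
    if mode == 0:
        return list(word)
    candidates = TWO_CHAR_LIGATURES if mode == 2 else ALL_LIGATURES
    glyphs, i = [], 0
    while i < len(word):
        for lig in candidates:
            if word.startswith(lig, i):
                glyphs.append(lig)
                i += len(lig)
                break
        else:                       # no ligature starts here
            glyphs.append(word[i])
            i += 1
    return glyphs
```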
10.3. Backspacing, underlining, overstriking, etc. Unless in copy mode, the ASCII backspace character is
replaced by a backward horizontal motion having the width of the space character. Underlining as a
form of line-drawing is discussed in §12.4. A generalized overstriking function is described in §12.1.
NROFF automatically underlines characters in the underline font, specifiable with uf, normally that on
font position 2 (normally Times Italic, see §2.2). In addition to ft and \fF, the underline font may be
selected by ul and cu. Underlining is restricted to an output-device-dependent subset of reasonable
characters.
Request Form | Initial Value | If No Argument | Notes | Explanation
.ul N | off | N=1 | E | Underline in NROFF (italicize in TROFF) the next N input text lines. Actually, switch to underline font, saving the current font for later restoration; other font changes within the span of a ul will take effect, but the restoration will undo the last change. Output generated by tl (§14) is affected by the font change, but does not decrement N. If N>1, there is the risk that a trap interpolated macro may provide text lines within the span; environment switching can prevent this.
.cu N | off | N=1 | E | A variant of ul that causes every character to be underlined in NROFF. Identical to ul in TROFF.
.uf F | Italic | Italic | - | Underline font set to F. In NROFF, F may not be on position 1 (initially Times Roman).
10.4. Control characters. Both the control character . and the no-break control character ' may be
changed, if desired. Such a change must be compatible with the design of any macros used in the span
of the change, and particularly of any trap-invoked macros.
Request Form | Initial Value | If No Argument | Notes | Explanation
.cc c | . | . | E | The basic control character is set to c, or reset to ".".
.c2 c | ' | ' | E | The nobreak control character is set to c, or reset to "'".
10.5. Output translation. One character can be made a stand-in for another character using tr. All text
processing (e. g. character comparisons) takes place with the input (stand-in) character which appears to
have the width of the final character. The graphic translation occurs at the moment of output (including diversion).
Request Form | Initial Value | If No Argument | Notes | Explanation
.tr abcd.... | none | - | O | Translate a into b, c into d, etc. If an odd number of characters is given, the last one will be mapped into the space character. To be consistent, a particular translation must stay in effect from input to output time.
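The tr rules can be modelled in Python: characters are mapped in pairs, an odd trailing character maps to space, and translation is deferred to output time. The function names are hypothetical. Text is kept untranslated until at_output so that all earlier processing sees the stand-in character.

```python
def translation_table(spec: str) -> dict[str, str]:
    """Build the mapping for .tr spec: a->b, c->d, ...; an odd trailing
    character is mapped to the space character."""
    if len(spec) % 2:
        spec += " "
    return {spec[k]: spec[k + 1] for k in range(0, len(spec), 2)}


def at_output(text: str, table: dict[str, str]) -> str:
    """Apply the translation only when characters are finally output."""
    return "".join(table.get(ch, ch) for ch in text)
```

For example, after .tr ~ a ~ still behaves as an ordinary character while lines are being filled, but prints as a space.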
10.6. Transparent throughput. An input line beginning with a \! is read in copy mode and transparently
output (without the initial \!); the text processor is otherwise unaware of the line's presence. This
mechanism may be used to pass control information to a post-processor or to imbed control lines in a
macro created by a diversion.
10.7. Comments and concealed newlines. An uncomfortably long input line that must stay one line (e.g. a string definition, or nofilled text) can be split into many physical lines by ending all but the last one with the escape \. The sequence \(newline) is always ignored, except in a comment. Comments may be imbedded at the end of any line by prefacing them with \". The newline at the end of a comment cannot be concealed. A line beginning with \" will appear as a blank line and behave like .sp 1; a comment can be on a line by itself by beginning the line with .\".
11. Local Horizontal and Vertical Motions, and the Width Function
11.1. Local Motions. The functions \v'N' and \h'N' can be used for local vertical and horizontal motion
respectively. The distance N may be negative; the positive directions are rightward and downward. A
local motion is one contained within a line. To avoid unexpected vertical dislocations, it is necessary
that the net vertical local motion within a word in filled text and otherwise within a line balance to zero.
The above and certain other escape sequences providing local motion are summarized in the following
table.
Vertical Local Motion | Effect in TROFF | Effect in NROFF
\v'N' | Move distance N | Move distance N
\u | 1/2 em up | 1/2 line up
\d | 1/2 em down | 1/2 line down
\r | 1 em up | 1 line up

Horizontal Local Motion | Effect in TROFF | Effect in NROFF
\h'N' | Move distance N | Move distance N
\(space) | Unpaddable space-size space | Unpaddable space-size space
\0 | Digit-size space | Digit-size space
\| | 1/6 em space | ignored
\^ | 1/12 em space | ignored
As an example, E² could be generated by the sequence E\s-2\v'-0.4m'2\v'0.4m'\s+2; it should be
noted in this example that the 0.4 em vertical motions are at the smaller size.
11.2. Width Function. The width function \w'string' generates the numerical width of string (in basic units). Size and font changes may be safely imbedded in string, and will not affect the current environment. For example, .ti -\w'1. 'u could be used to temporarily indent leftward a distance equal to the size of the string "1. ".
The width function also sets three number registers. The registers st and sb are set respectively to the highest and lowest extent of string relative to the baseline; then, for example, the total height of the string is \n(stu-\n(sbu. In TROFF the number register ct is set to a value between 0 and 3: 0 means that all of the characters in string were short lower case characters without descenders (like e); 1 means that at least one character has a descender (like y); 2 means that at least one character is tall (like H); and 3 means that both tall characters and characters with descenders are present.
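The way the ct register combines its two tests can be shown with a short Python function: a descender contributes 1 and a tall character contributes 2. The character sets here are assumptions, since real classification depends on the font, and ct_value is a hypothetical name.

```python
# Assumed character classes; the real classification depends on the font.
DESCENDERS = set("gjpqy")
TALL = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789bdfhklt")


def ct_value(s: str) -> int:
    """Value the ct register would receive for \\w's': bit 0 for a descender,
    bit 1 for a tall character (so 3 means both are present)."""
    value = 0
    if any(ch in DESCENDERS for ch in s):
        value |= 1
    if any(ch in TALL for ch in s):
        value |= 2
    return value
```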
11.3. Mark horizontal place. The escape sequence \kx will cause the current horizontal position in the input line to be stored in register x. As an example, the construction \kxword\h'|\nxu+2u'word will embolden word by backing up to almost its beginning and overprinting it, resulting in word.
12. Overstrike, Bracket, Line-drawing, and Zero-width Functions
12.1. Overstriking. Automatically centered overstriking of up to nine characters is provided by the overstrike function \o'string'. The characters in string are overprinted with centers aligned; the total width is that of the widest character. string should not contain local vertical motion. As examples, \o'e\'' produces é, and \o'\(mo\(sl' produces ∉.
12.2. Zero-width characters. The function \zc will output c without spacing over it, and can be used to produce left-aligned overstruck combinations. As examples, \z\(ci\(pl will produce ⊕, and \(br\z\(rn\(ul\(br will produce the smallest possible constructed box □.
12.3. Large Brackets. The Special Mathematical Font contains a number of bracket construction pieces (\(lt, \(lb, \(rt, \(rb, \(lk, \(rk, \(bv, \(lf, \(rf, \(lc, \(rc) that can be combined into various bracket styles. The function \b'string' may be used to pile up vertically the characters in string (the first character on top and the last at the bottom); the characters are vertically separated by 1 em and the total pile is centered 1/2 em above the current baseline (1/2 line in NROFF). For example, \b'\(lc\(lf'E\|\b'\(rc\(rf'\x'-0.5m'\x'0.5m' produces [E].
12.4. Line drawing. The function \l'Nc' will draw a string of repeated c's towards the right for a distance N. (\l is \ followed by lower case L.) If c looks like a continuation of an expression for N, it may be insulated from N with a \&. If c is not specified, the _ (baseline rule) is used (underline character in NROFF). If N is negative, a backward horizontal motion of size N is made before drawing the string. Any space resulting from N/(size of c) having a remainder is put at the beginning (left end) of the string. In the case of characters that are designed to be connected, such as baseline-rule _, underrule _, and root-en (\(rn), the remainder space is covered by over-lapping. If N is less than the width of c, a single c is centered on a distance N. As an example, a macro to underscore a string can be written
.de us
\\$1\l'|0\(ul'
..
or one to draw a box around a string
.de bx
\(br\|\\$1\|\(br\l'|0\(rn'\l'|0\(ul'
..
such that
.us "underlined words"
and
.bx "words in a box"
yield underlined words and |words in a box|.
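The \l sizing rules above reduce to simple arithmetic in basic units, sketched here in Python. The name line_draw is hypothetical. The overlapping used for connected characters such as \(ru is not modelled.

```python
def line_draw(n: int, char_width: int) -> tuple[int, int, int]:
    r"""Model \l'Nc' in basic units: return (backward_motion, leading_space, count).

    A negative N first moves backward by |N|; the string is then drawn rightward.
    Whole characters are drawn and the remainder is put at the left end. When N
    is shorter than c, one c is centered on N (leading space may be negative).
    """
    back = 0
    if n < 0:
        back, n = -n, -n
    if n < char_width:
        return back, (n - char_width) // 2, 1
    count, remainder = divmod(n, char_width)
    return back, remainder, count
```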
The function \L'Nc' will draw a vertical line consisting of the (optional) character c stacked vertically apart 1 em (1 line in NROFF), with the first two characters overlapped, if necessary, to form a continuous line. The default character is the box rule | (\(br); the other suitable character is the bold vertical | (\(bv). The line is begun without any initial motion relative to the current base line. A positive N specifies a line drawn downward and a negative N specifies a line drawn upward. After the line is drawn no compensating motions are made; the instantaneous baseline is at the end of the line.
The horizontal and vertical line drawing functions may be used in combination to produce large boxes. The zero-width box-rule and the 1/2-em wide underrule were designed to form corners when using 1-em vertical spacings. For example the macro
.de eb
.sp -1 \"compensate for next automatic base-line spacing
.nf \"avoid possibly overflowing word buffer
\h'-.5n'\L'|\\nau-1'\l'\\n(.lu+1n\(ul'\L'-|\\nau+1'\l'|0u-.5n\(ul'
.fi \"draw box
..
will draw a box around some text whose beginning vertical place was saved in number register a (e.g. using .mk a) as done for this paragraph.
13. Hyphenation.
The automatic hyphenation may be switched off and on. When switched on with hy, several variants may be set. A hyphenation indicator character may be imbedded in a word to specify desired hyphenation points, or may be prepended to suppress hyphenation. In addition, the user may specify a small exception word list.
Only words that consist of a central alphabetic string surrounded by (usually null) non-alphabetic strings are considered candidates for automatic hyphenation. Words that were input containing hyphens (minus), em-dashes (\(em), or hyphenation indicator characters (such as mother-in-law) are always subject to splitting after those characters, whether automatic hyphenation is on or off.
Request Form | Initial Value | If No Argument | Notes | Explanation
.nh | hyphenate | - | E | Automatic hyphenation is turned off.
.hy N | on, N=1 | on, N=1 | E | Automatic hyphenation is turned on for N≥1, or off for N=0. If N=2, last lines (ones that will cause a trap) are not hyphenated. For N=4 and 8, the last and first two characters respectively of a word are not split off. These values are additive; i.e. N=14 will invoke all three restrictions.
.hc c | \% | \% | E | Hyphenation indicator character is set to c or to the default \%. The indicator does not appear in the output.
.hw word1 ... | - | ignored | - | Specify hyphenation points in words with imbedded minus signs. Versions of a word with terminal s are implied; i.e. dig-it implies dig-its. This list is examined initially and after each suffix stripping. The space available is small, about 128 characters.
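The additive .hy N values can be decoded bit by bit. The Python sketch below does this and also filters a list of candidate break positions accordingly. The candidates would come from the hyphenation algorithm itself, which is outside this sketch, and the function names are hypothetical. The N=2 restriction is only reported, since it depends on traps.

```python
def hy_flags(n: int) -> dict[str, bool]:
    """Decode the additive .hy N value."""
    return {
        "on": n >= 1,
        "not_before_trap": bool(n & 2),   # last lines (ones that will cause a trap)
        "keep_last_two": bool(n & 4),     # never split off the last two characters
        "keep_first_two": bool(n & 8),    # never split off the first two characters
    }


def allowed_breaks(word: str, candidates: list[int], n: int) -> list[int]:
    """Filter break positions k (word[:k] + "-" / word[k:]) under .hy N.

    Treating splits that leave one character as forbidden too is an assumption.
    """
    f = hy_flags(n)
    if not f["on"]:
        return []
    return [k for k in candidates
            if not (f["keep_first_two"] and k <= 2)
            and not (f["keep_last_two"] and len(word) - k <= 2)]
```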
14. Three Part Titles.
The titling function tl provides for automatic placement of three fields at the left, center, and right of a line with a title-length specifiable with lt. tl may be used anywhere, and is independent of the normal text collecting process. A common use is in header and footer macros.
Request Form | Initial Value | If No Argument | Notes | Explanation
.tl 'left'center'right' | - | - | - | The strings left, center, and right are respectively left-adjusted, centered, and right-adjusted in the current title-length. Any of the strings may be empty, and overlapping is permitted. If the page-number character (initially %) is found within any of the fields it is replaced by the current page number having the format assigned to register %. Any character may be used as the string delimiter.
.pc c | % | off | - | The page number character is set to c, or removed. The page-number register remains %.
.lt ±N | 6.5 in | previous | E,m | Length of title set to ±N. The line-length and the title-length are independent. Indents do not apply to titles; page-offsets do.
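A fixed-width Python sketch of the tl layout rules follows. It measures in character cells and prints the page number in arabic. The function name is hypothetical, and indents and page offsets are ignored.

```python
def three_part_title(spec: str, length: int, page: int, page_char: str = "%") -> str:
    """Lay out .tl 'left'center'right' in a fixed-width line of `length` characters.

    The first character of spec is the delimiter. Overlapping is allowed; in this
    sketch a later field simply overwrites an earlier one (an assumption).
    """
    delim = spec[0]
    fields = (spec[1:].split(delim) + ["", "", ""])[:3]
    left, center, right = (f.replace(page_char, str(page)) for f in fields)
    line = [" "] * length

    def place(text: str, start: int) -> None:
        for k, ch in enumerate(text):
            if 0 <= start + k < length:
                line[start + k] = ch

    place(left, 0)
    place(center, (length - len(center)) // 2)
    place(right, length - len(right))
    return "".join(line)
```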
15. Output Line Numbering.
Automatic sequence numbering of output lines may be requested with nm. When in effect, a three-digit, arabic number plus a digit-space is prepended to output text lines. The text lines are thus offset by four digit-spaces, and otherwise retain their line length; a reduction in line length may be desired to keep the right margin aligned with an earlier margin. Blank lines, other vertical spaces, and lines generated by tl are not numbered. Numbering can be temporarily suspended with nn, or with an .nm followed by a later .nm +0. In addition, a line number indent I, and the number-text separation S may be specified in digit-spaces. Further, it can be specified that only those line numbers that are multiples of some number M are to be printed (the others will appear as blank number fields).
Request Form | Initial Value | If No Argument | Notes | Explanation
.nm ±N M S I | - | off | E | Line number mode. If ±N is given, line numbering is turned on, and the next output line numbered is numbered ±N. Default values are M=1, S=1, and I=0. Parameters corresponding to missing arguments are unaffected; a non-numeric argument is considered missing. In the absence of all arguments, numbering is turned off; the next line number is preserved for possible further use in number register ln.
.nn N | - | N=1 | E | The next N text output lines are not numbered.
As an example, the paragraph portions of this section are numbered with M=3: .nm 1 3 was placed at the beginning; .nm was placed at the end of the first paragraph; and .nm +0 was placed in front of this paragraph; and .nm finally placed at the end. Line lengths were also changed (by \w'0000'u) to keep the right side aligned. Another example is .nm +5 5 x 3 which turns on numbering with the line number of the next line to be 5 greater than the last numbered line, with M=5, with spacing S untouched, and with the indent I set to 3.
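The numbering layout described in §15 can be sketched in Python, measuring in digit-spaces. The function name is hypothetical. Right-aligning the number in its three-digit field and placing the indent I before it are assumptions about the exact layout.

```python
def number_lines(lines: list[str], start: int = 1, m: int = 1, s: int = 1,
                 i: int = 0) -> list[str]:
    """Prefix output text lines as .nm start m s i would, in digit-space units.

    Each numbered line gets i spaces of indent, a three-digit number (blank unless
    the number is a multiple of m), and s spaces of separation. Blank lines are
    passed through unnumbered and do not use up a number.
    """
    out, n = [], start
    for text in lines:
        if text == "":
            out.append(text)
            continue
        number = f"{n:3d}" if n % m == 0 else "   "
        out.append(" " * i + number + " " * s + text)
        n += 1
    return out
```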
16. Conditional Acceptance of Input
In the following, c is a one-character, built-in condition name, ! signifies not, N is a numerical expression, string1 and string2 are strings delimited by any non-blank, non-numeric character not in the strings, and anything represents what is conditionally accepted.
Request Form | Initial Value | If No Argument | Notes | Explanation
.if c anything | - | - | - | If condition c true, accept anything as input; in multi-line case use \{anything\}.
.if !c anything | - | - | - | If condition c false, accept anything.
.if N anything | - | - | u | If expression N > 0, accept anything.
.if !N anything | - | - | u | If expression N ≤ 0, accept anything.
.if 'string1'string2' anything | - | - | - | If string1 identical to string2, accept anything.
.if !'string1'string2' anything | - | - | - | If string1 not identical to string2, accept anything.
.ie c anything | - | - | u | If portion of if-else; all above forms (like if).
.el anything | - | - | - | Else portion of if-else.
The built-in condition names are:
Condition Name | True If
o | Current page number is odd
e | Current page number is even
t | Formatter is TROFF
n | Formatter is NROFF
If the condition c is true, or if the number N is greater than zero, or if the strings compare identically (including motions and character size and font), anything is accepted as input. If a ! precedes the condition, number, or string comparison, the sense of the acceptance is reversed.
Any spaces between the condition and the beginning of anything are skipped over. The anything can be
either a single input line (text, macro, or whatever) or a number of input lines. In the multi-line case,
the first line must begin with a left delimiter \{ and the last line must end with a right delimiter \}.
The request ie (if-else) is identical to if except that the acceptance state is remembered. A subsequent
and matching el (else) request then uses the reverse sense of that state. ie - el pairs may be nested.
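The acceptance rules for if can be summarized in a small Python predicate. It is a simplified, hypothetical model. N must be a plain integer rather than a full numerical expression, and string comparison ignores motions, size, and font.

```python
def condition_true(cond: str, page: int, is_troff: bool) -> bool:
    """Decide whether .if cond would accept its input.

    Handles the built-in names o, e, t, n; a plain integer N (accepted when N > 0;
    real requests accept full numerical expressions); and 'string1'string2'
    comparisons with any delimiter. A leading ! reverses the sense.
    """
    negate = cond.startswith("!")
    if negate:
        cond = cond[1:]
    builtin = {"o": page % 2 == 1, "e": page % 2 == 0,
               "t": is_troff, "n": not is_troff}
    if cond in builtin:
        result = builtin[cond]
    elif cond[:1].isdigit() or cond[:1] in ("+", "-"):
        result = int(cond) > 0
    else:
        parts = cond[1:].split(cond[0])
        if len(parts) != 3 or parts[2] != "":
            raise ValueError("expected a delimited 'string1'string2' comparison")
        result = parts[0] == parts[1]
    return result != negate
```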
Some examples are:
.if e .tl 'Even Page %'''
which outputs a title if the page number is even; and
.ie \n%>1 \{\
'sp 0.5i
.tl 'Page %'''
'sp |1.2i \}
.el .sp |2.5i
which treats page 1 differently from other pages.
17. Environment Switching.
A number of the parameters that control the text processing are gathered together into an environment, which can be switched by the user. The environment parameters are those associated with requests noting E in their Notes column; in addition, partially collected lines and words are in the environment. Everything else is global; examples are page-oriented parameters, diversion-oriented parameters, number registers, and macro and string definitions. All environments are initialized with default parameter values.
Request Form | Initial Value | If No Argument | Notes | Explanation
.ev N | N=0 | previous | - | Environment switched to environment 0≤N≤2. Switching is done in push-down fashion so that restoring a previous environment must be done with .ev rather than specific reference.
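The push-down behavior of ev can be modelled as a stack over three parameter sets. The class and method names are hypothetical. The sketch silently ignores an .ev that has no previous environment to restore; that behavior is an assumption.

```python
from typing import Optional


class Environments:
    """Hypothetical model of the three push-down environments used by .ev."""

    def __init__(self) -> None:
        self.params: list[dict[str, str]] = [{} for _ in range(3)]
        self.stack = [0]                    # environment 0 is in effect initially

    @property
    def current(self) -> int:
        return self.stack[-1]

    def settings(self) -> dict[str, str]:
        """Parameters (e.g. line length) belonging to the current environment."""
        return self.params[self.current]

    def ev(self, n: Optional[int] = None) -> None:
        """.ev N pushes environment N; .ev with no argument restores the previous one."""
        if n is None:
            if len(self.stack) > 1:         # sketch: ignore an unmatched .ev
                self.stack.pop()
            return
        if not 0 <= n <= 2:
            raise ValueError("environment must be 0, 1, or 2")
        self.stack.append(n)
```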
18. Insertions from the Standard Input
The input can be temporarily switched to the system standard input with rd, which will switch back
when two newlines in a row are found (the extra blank line is not used). This mechanism is intended
for insertions in form-letter-like documentation. On UNIX, the standard input can be the user's keyboard, a pipe, or a file.
Request Form | Initial Value | If No Argument | Notes | Explanation
.rd prompt | - | prompt=BEL | - | Read insertion from the standard input until two newlines in a row are found. If the standard input is the user's keyboard, prompt (or a BEL) is written onto the user's terminal. rd behaves like a macro, and arguments may be placed after prompt.
.ex | - | - | - | Exit from NROFF/TROFF. Text processing is terminated exactly as if all input had ended.
If insertions are to be taken from the terminal keyboard while output is being printed on the terminal, the command line option -q will turn off the echoing of keyboard input and prompt only with BEL. The regular input and insertion input cannot simultaneously come from the standard input.
As an example, multiple copies of a form letter may be prepared by entering the insertions for all the copies in one file to be used as the standard input, and causing the file containing the letter to reinvoke itself using nx (§19); the process would ultimately be ended by an ex in the insertion file.
19. Input/Output File Switching
Request Form | Initial Value | If No Argument | Notes | Explanation
.so filename | - | - | - | Switch source file. The top input (file reading) level is switched to filename. The effect of an so encountered in a macro is not felt until the input level returns to the file level. When the new file ends, input is again taken from the original file. so's may be nested.
.nx filename | - | end-of-file | - | Next file is filename. The current file is considered ended, and the input is immediately switched to filename.
.pi program | - | - | - | Pipe output to program (NROFF only). This request must occur before any printing occurs. No arguments are transmitted to program.
20. Miscellaneous
Request Form | Initial Value | If No Argument | Notes | Explanation
.mc c N | - | off | E,m | Specifies that a margin character c appear a distance N to the right of the right margin after each non-empty text line (except those produced by tl). If the output line is too long (as can happen in nofill mode) the character will
NROFF/TROFF User's Manual
October 11, 1976
be appended to the line. If N is not given, the previous N is used; the initial N is 0.2 inches in NROFF and 1 em in TROFF. The margin character used with this paragraph was a 12-point box-rule.
.tm string | - | newline | - | After skipping initial blanks, string (rest of the line) is read in copy mode and written on the user's terminal.
.ig yy | - | .yy=.. | - | Ignore input lines. ig behaves exactly like de (§7) except that the input is discarded. The input is read in copy mode, and any auto-incremented registers will be affected.
.pm t | - | all | - | Print macros. The names and sizes of all of the defined macros and strings are printed on the user's terminal; if t is given, only the total of the sizes is printed. The sizes are given in blocks of 128 characters.
.fl | - | - | B | Flush output buffer. Used in interactive debugging to force output.
21. Output and Error Messages
The output from tm, pm, and the prompt from rd, as well as various error messages, are written onto UNIX's standard message output. The latter is different from the standard output, where NROFF formatted output goes. By default, both are written onto the user's terminal, but they can be independently redirected.
Various error conditions may occur during the operation of NROFF and TROFF. Certain less serious errors having only local impact do not cause processing to terminate. Two examples are word overflow, caused by a word that is too large to fit into the word buffer (in fill mode), and line overflow, caused by an output line that grew too large to fit in the line buffer; in both cases, a message is printed, the offending excess is discarded, and the affected word or line is marked at the point of truncation with a * in NROFF and a ☜ in TROFF. The philosophy is to continue processing, if possible, on the grounds that output useful for debugging may be produced. If a serious error occurs, processing terminates, and an appropriate message is printed. Examples are the inability to create, read, or write files, and the exceeding of certain internal limits that make future output unlikely to be useful.
TUTORIAL EXAMPLES
Tl. Introduction
Although NROFF and TROFF have by design a
syntax reminiscent of earlier text processors*
with the intent of easing their use, it is almost
always necessary to prepare at least a small set of
macro definitions to describe most documents.
Such common formatting needs as page margins
and footnotes are deliberately not built into
NROFF and TROFF. Instead, the macro and
string definition, number register, diversion,
environment switching, page-position trap, and
conditional input mechanisms provide the basis
for user-defined implementations.
The examples to be discussed are intended to be
useful and somewhat realistic, but won't necessarily cover all relevant contingencies. Explicit
numerical parameters are used in the examples to
make them easier to read and to illustrate typical
values. In many cases, number registers would
really be used to reduce the number of places
where numerical information is kept, and to concentrate conditional parameter initialization like
that which depends on whether TROFF or NROFF
is being used.
*For example: P. A. Crisman, Ed., The Compatible Time-Sharing System, MIT Press, 1965, Section AH9.01 (Description of RUNOFF program on MIT's CTSS system).

T2. Page Margins
As discussed in §3, header and footer macros are usually defined to describe the top and bottom page margin areas respectively. A trap is planted at page position 0 for the header, and at -N (N from the page bottom) for the footer. The simplest such definitions might be

.de hd          \"define header
'sp 1i
..              \"end definition
.de fo          \"define footer
'bp
..              \"end definition
.wh 0 hd
.wh -1i fo

which provide blank 1 inch top and bottom margins. The header will occur on the first page, only if the definition and trap exist prior to the initial pseudo-page transition (§3). In fill mode, the output line that springs the footer trap was typically forced out because some part or whole word didn't fit on it. If anything in the footer and header that follows causes a break, that word or part word will be forced out. In this and other examples, requests like bp and sp that normally cause breaks are invoked using the no-break control character ' to avoid this. When the header/footer design contains material requiring independent text processing, the environment may be switched, avoiding most interaction with the running text.

A more realistic example would be

.de hd                      \"header
.if t .tl '\(rn''\(rn'      \"troff cut mark
.if \\n%>1 \{\
'sp |0.5i-1                 \"tl base at 0.5i
.tl ''- % -''               \"centered page number
.ps                         \"restore size
.ft                         \"restore font
.vs \}                      \"restore vs
'sp |1.0i                   \"space to 1.0i
.ns                         \"turn on no-space mode
..
.de fo                      \"footer
.ps 10                      \"set footer/header size
.ft R                       \"set font
.vs 12p                     \"set base-line spacing
.if \\n%=1 \{\
'sp |\\n(.pu-0.5i-1         \"tl base 0.5i up
.tl ''- % -'' \}            \"first page number
'bp
..
.wh 0 hd
.wh -1i fo

which sets the size, font, and base-line spacing for the header/footer material, and ultimately restores them. The material in this case is a page number at the bottom of the first page and at the top of the remaining pages. If TROFF is used, a cut mark is drawn in the form of root-en's at each margin. The sp's refer to absolute positions to avoid dependence on the base-line spacing. Another reason for this in the footer is that the footer is invoked by printing a line whose vertical spacing swept past the trap position by possibly as much as the base-line spacing. The no-space mode is turned on at the end of hd to render ineffective accidental occurrences of sp at the top of the running text.

The above method of restoring size, font, etc. presupposes that such requests (that set previous value) are not used in the running text. A better scheme is to save and restore both the current and previous values as shown for size in the following:

.de fo
.nr s1 \\n(.s       \"current size
.ps
.nr s2 \\n(.s       \"previous size
. ---               \"rest of footer
..
.de hd
. ---               \"header stuff
.ps \\n(s2          \"restore previous size
.ps \\n(s1          \"restore current size
..

Page numbers may be printed in the bottom margin by a separate macro triggered during the footer's page ejection:

.de bn              \"bottom number
.tl ''- % -''       \"centered page number
..
.wh -0.5i-1v bn     \"tl base 0.5i up

T3. Paragraphs and Headings
The housekeeping associated with starting a new paragraph should be collected in a paragraph macro that, for example, does the desired preparagraph spacing, forces the correct font, size, base-line spacing, and indent, checks that enough space remains for more than one line, and requests a temporary indent.

.de pg              \"paragraph
.br                 \"break
.ft R               \"force font,
.ps 10              \"size,
.vs 12p             \"spacing,
.in 0               \"and indent
.sp 0.4             \"prespace
.ne 1+\\n(.Vu       \"want more than 1 line
.ti 0.2i            \"temp indent
..

The first break in pg will force out any previous partial lines, and must occur before the vs. The forcing of font, etc. is partly a defense against prior error and partly to permit things like section heading macros to set parameters only once. The prespacing parameter is suitable for TROFF; a larger space, at least as big as the output device vertical resolution, would be more suitable in NROFF. The choice of remaining space to test for in the ne is the smallest amount greater than one line (the .V is the available vertical resolution).

A macro to automatically number section headings might look like:

.de sc              \"section
. ---               \"force font, etc.
.sp 0.4             \"prespace
.ne 2.4+\\n(.Vu     \"want 2.4+ lines
.fi
\\n+S.
..
.nr S 0 1

The usage is .sc, followed by the section heading text, followed by .pg. The ne test value includes one line of heading, 0.4 line in the following pg, and one line of the paragraph text. A word consisting of the next section number and a period is produced to begin the heading line. The format of the number may be set by af (§8).
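The numbering in sc relies on an auto-incrementing register (.nr S 0 1, interpolated as \\n+S) whose printed form is controlled by af. As a rough illustration only, the Python sketch below mimics that behavior. `AutoRegister` and `af_format` are hypothetical names, only the af formats 1, 001, i, I, a, and A are modeled, and register values are assumed to be positive.

```python
def roman(n: int) -> str:
    """Lower-case roman numeral (af format i) for a positive integer."""
    if n < 1:
        raise ValueError("this sketch only handles positive values")
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def alpha(n: int) -> str:
    """Lower-case letters (af format a): 1 -> a, 26 -> z, 27 -> aa."""
    if n < 1:
        raise ValueError("this sketch only handles positive values")
    out = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out.append(chr(ord("a") + rem))
    return "".join(reversed(out))


def af_format(n: int, fmt: str) -> str:
    """Render a register value in one of the af formats 1, 001, i, I, a, A."""
    if fmt == "i":
        return roman(n)
    if fmt == "I":
        return roman(n).upper()
    if fmt == "a":
        return alpha(n)
    if fmt == "A":
        return alpha(n).upper()
    # Arabic: the number of digits in fmt is the minimum, zero-filled width.
    return str(n).zfill(len(fmt))


class AutoRegister:
    """Models .nr S 0 1 followed by \\n+S (increment, then interpolate)."""

    def __init__(self, value: int, increment: int) -> None:
        self.value = value
        self.increment = increment

    def pre_increment(self) -> int:
        self.value += self.increment
        return self.value


S = AutoRegister(0, 1)
headings = [af_format(S.pre_increment(), "I") + "." for _ in range(4)]
print(headings)  # ['I.', 'II.', 'III.', 'IV.']
```

With .af S I in effect, the first four .sc headings would begin with the words the `headings` list computes.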
Another common form is the labeled, indented paragraph, where the label protrudes left into the indent space.

.de lp              \"labeled paragraph
.pg
.in 0.5i            \"paragraph indent
.ta 0.2i 0.5i       \"label, paragraph
.ti 0
\t\\$1\t\c          \"flow into paragraph
..

The intended usage is ".lp label"; label will begin at 0.2 inch, and cannot exceed a length of 0.3 inch without intruding into the paragraph. The label could be right adjusted against 0.4 inch by setting the tabs instead with .ta 0.4iR 0.5i. The last line of lp ends with \c so that it will become a part of the first line of the text that follows.
T4. Multiple Column Output
The production of multiple column pages requires the footer macro to decide whether it was invoked by other than the last column, so that it will begin a new column rather than produce the bottom margin. The header can initialize a column register that the footer will increment and test. The following is arranged for two columns, but is easily modified for more.

.de hd              \"header
. ---
.nr cl 0 1          \"init column count
.mk                 \"mark top of text
..
.de fo              \"footer
.ie \\n+(cl<2 \{\
.po +3.4i           \"next column; 3.1+0.3
.rt                 \"back to mark
.ns \}              \"no-space mode
.el \{\
.po \\nMu           \"restore left margin
. ---
'bp \}
..
.ll 3.1i            \"column width
.nr M \\n(.o        \"save left margin

Typically a portion of the top of the first page contains full width text; the request for the narrower line length, as well as another .mk would be made where the two column output was to begin.

T5. Footnote Processing
The footnote mechanism to be described is used by imbedding the footnotes in the input text at the point of reference, demarcated by an initial .fn and a terminal .ef:

.fn
Footnote text and control lines...
.ef

In the following, footnotes are processed in a separate environment and diverted for later printing in the space immediately prior to the bottom margin. There is provision for the case where the last collected footnote doesn't completely fit in the available space.

.de hd              \"header
. ---
.nr x 0 1           \"init footnote count
.nr y 0-\\nb        \"current footer place
.ch fo -\\nbu       \"reset footer trap
.if \\n(dn .fz      \"leftover footnote
..
.de fo              \"footer
.nr dn 0            \"zero last diversion size
.if \\nx \{\
.ev 1               \"expand footnotes in ev1
.nf                 \"retain vertical size
.FN                 \"footnotes
.rm FN              \"delete it
.if "\\n(.z"fy" .di \"end overflow diversion
.nr x 0             \"disable fx
.ev \}              \"pop environment
. ---
'bp
..
.de fx              \"process footnote overflow
.if \\nx .di fy     \"divert overflow
..
.de fn              \"start footnote
.da FN              \"divert (append) footnote
.ev 1               \"in environment 1
.if \\n+x=1 .fs     \"if first, include separator
.fi                 \"fill mode
..
.de ef              \"end footnote
.br                 \"finish output
.nr z \\n(.v        \"save spacing
.ev                 \"pop ev
.di                 \"end diversion
.nr y -\\n(dn       \"new footer position,
.if \\nx=1 .nr y -(\\n(.v-\\nz) \
                    \"uncertainty correction
.ch fo \\nyu        \"y is negative
.if (\\n(nl+1v)>(\\n(.p+\\ny) \
.ch fo \\n(nlu+1v   \"it didn't fit
..
.de fs              \"separator
\l'1i'              \"1 inch rule
.br
..
.de fz              \"get leftover footnote
.fn
.nf                 \"retain vertical size
.fy                 \"where fx put it
.ef
..
.nr b 1.0i          \"bottom margin size
.wh 0 hd            \"header trap
.wh 12i fo          \"footer trap, temp position
.wh -\\nbu fx       \"fx at footer position
.ch fo -\\nbu       \"conceal fx with fo

The header hd initializes a footnote count register x, and sets both the current footer trap position register y and the footer trap itself to a nominal position specified in register b. In addition, if the register dn indicates a leftover footnote, fz is invoked to reprocess it. The footnote start macro fn begins a diversion (append) in environment 1, and increments the count x; if the count is one, the footnote separator fs is interpolated. The separator is kept in a separate macro to permit user redefinition. The footnote end macro ef restores the previous environment and ends the diversion after saving the spacing size in register z. y is then decremented by the size of the footnote, available in dn; then on the first footnote, y is further decremented by the difference in vertical base-line spacings of the two environments, to prevent the late triggering of the footer trap from causing the last line of the combined footnotes to overflow. The footer trap is then set to the lower (on the page) of y or the current page position (nl) plus one line, to allow for printing the reference line. If indicated by x, the footer fo rereads the footnotes from FN in nofill mode in environment 1, and deletes FN. If the footnotes were too large to fit, the macro fx will be trap-invoked to redivert the overflow into fy, and the register dn will later indicate to the header whether fy is empty. Both fo and fx are planted in the nominal footer trap position in an order that causes fx to be concealed unless the fo trap is moved. The footer then terminates the overflow diversion, if necessary, and zeros x to disable fx, because the uncertainty correction together with a not-too-late triggering of the footer can result in the footnote rereading finishing before reaching the fx trap.
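The footer-position bookkeeping in ef is easy to lose track of, so here is a Python sketch of the same arithmetic as we read it. All quantities are machine units measured from the top of the page; `footer_positions` is a hypothetical helper, and each footnote is described only by its diverted height (dn) and the page position (nl) at which its .ef is read. The page and spacing values in the example call are made up for illustration.

```python
from typing import List, Tuple


def footer_positions(page_length: int, bottom_margin: int,
                     main_vs: int, footnote_vs: int,
                     footnotes: List[Tuple[int, int]]) -> List[int]:
    """Return the footer-trap position after each .ef, following the ef macro.

    footnotes holds (dn, nl) pairs: dn is the height of the diverted footnote
    and nl is the page position when its .ef is read. All values are machine
    units measured from the top of the page.
    """
    y = -bottom_margin                      # hd: .nr y 0-\nb
    positions: List[int] = []
    for count, (dn, nl) in enumerate(footnotes, start=1):
        y -= dn                             # ef: .nr y -\n(dn
        if count == 1:
            y -= main_vs - footnote_vs      # ef: uncertainty correction
        trap = page_length + y              # ef: .ch fo \nyu (y is negative)
        if nl + main_vs > trap:             # ef: it didn't fit
            trap = nl + main_vs             # .ch fo \n(nlu+1v
        positions.append(trap)
    return positions


# Hypothetical numbers: 11i page, 1i bottom margin, 12p text and 10p footnotes.
print(footer_positions(11 * 432, 432, 72, 60, [(120, 1000), (180, 2000)]))
# [4188, 4008]
```

Each footnote moves the trap up by its own height, and only the first pays the spacing correction. If a 120-unit footnote were instead read at nl = 4150, the trap would be pushed down to 4222 (nl plus one line), which is the "it didn't fit" case.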
A good exercise for the student is to combine
the multiple-column and footnote mechanisms.
T6. The Last Page
After the last input file has ended, NROFF and
TROFF invoke the end macro (§7), if any, and
when it finishes, eject the remainder of the page.
During the eject, any traps encountered are processed normally. At the end of this last page,
processing terminates unless a partial line, word,
or partial word remains. If it is desired that
another page be started, the end-macro
.de en              \"end-macro
\c
'bp
..
.em en
will deposit a null partial word, and effect
another last page.
Table I
Font Style Examples
The following fonts are printed in 12-point, with a vertical spacing of 14-point, and with non-alphanumeric characters separated by 1/4 em space. The Special Mathematical Font was specially prepared for Bell Laboratories by Graphic Systems, Inc. of Hudson, New Hampshire. The Times Roman, Italic, and Bold are among the many standard fonts available from that company.
[Font samples not reproduced: Times Roman, Times Italic, and Times Bold, each showing the lower-case and upper-case alphabets, the digits, punctuation, fractions, and ligatures; and the Special Mathematical Font, showing Greek letters and mathematical symbols.]
Table II
Input Naming Conventions for ', `, and -
and for Non-ASCII Special Characters

Non-ASCII characters and minus on the standard fonts.

Char   Input Name   Character Name
’      '            close quote
‘      `            open quote
—      \(em         3/4 Em dash
-      -            hyphen or
-      \(hy         hyphen
−      \-           current font minus
•      \(bu         bullet
□      \(sq         square
_      \(ru         rule
¼      \(14         1/4
½      \(12         1/2
¾      \(34         3/4
fi     \(fi         fi
fl     \(fl         fl
ff     \(ff         ff
ffi    \(Fi         ffi
ffl    \(Fl         ffl
°      \(de         degree
†      \(dg         dagger
′      \(fm         foot mark
¢      \(ct         cent sign
®      \(rg         registered
©      \(co         copyright

Non-ASCII characters and ', `, _, +, -, =, and * on the special font.

The ASCII characters @, #, ", ', `, <, >, \, {, }, ~, ^, and _ exist only on the special font and are printed as a 1-em space if that font is not mounted. The following characters exist only on the special font except for the upper case Greek letter names followed by † which are mapped into upper case English letters in whatever font is mounted on font position one (default Times Roman). The special math plus, minus, and equals are provided to insulate the appearance of equations from the choice of standard fonts.

Char   Input Name   Character Name
+      \(pl         math plus
−      \(mi         math minus
=      \(eq         math equals
∗      \(**         math star
§      \(sc         section
´      \(aa         acute accent
`      \(ga         grave accent
_      \(ul         underrule
/      \(sl         slash (matching backslash)
α      \(*a         alpha
β      \(*b         beta
γ      \(*g         gamma
δ      \(*d         delta
ε      \(*e         epsilon
ζ      \(*z         zeta
η      \(*y         eta
θ      \(*h         theta
ι      \(*i         iota
κ      \(*k         kappa
λ      \(*l         lambda
μ      \(*m         mu
ν      \(*n         nu
ξ      \(*c         xi
ο      \(*o         omicron
π      \(*p         pi
ρ      \(*r         rho
σ      \(*s         sigma
ς      \(ts         terminal sigma
τ      \(*t         tau
υ      \(*u         upsilon
φ      \(*f         phi
χ      \(*x         chi
ψ      \(*q         psi
ω      \(*w         omega
Α      \(*A         Alpha†
Β      \(*B         Beta†
Γ      \(*G         Gamma
Δ      \(*D         Delta
Ε      \(*E         Epsilon†
Ζ      \(*Z         Zeta†
Η      \(*Y         Eta†
Θ      \(*H         Theta
Ι      \(*I         Iota†
Κ      \(*K         Kappa†
Λ      \(*L         Lambda
Μ      \(*M         Mu†
Ν      \(*N         Nu†
Ξ      \(*C         Xi
Ο      \(*O         Omicron†
Π      \(*P         Pi
Ρ      \(*R         Rho†
Σ      \(*S         Sigma
Τ      \(*T         Tau†
Υ      \(*U         Upsilon
Φ      \(*F         Phi
Χ      \(*X         Chi†
Ψ      \(*Q         Psi
Ω      \(*W         Omega
√      \(sr         square root
‾      \(rn         root en extender
≥      \(>=         >=
≤      \(<=         <=
≡      \(==         identically equal
≃      \(~=         approx =
∼      \(ap         approximates
≠      \(!=         not equal
→      \(->         right arrow
←      \(<-         left arrow
↑      \(ua         up arrow
↓      \(da         down arrow
×      \(mu         multiply
÷      \(di         divide
±      \(+-         plus-minus
∪      \(cu         cup (union)
∩      \(ca         cap (intersection)
⊂      \(sb         subset of
⊃      \(sp         superset of
⊆      \(ib         improper subset
⊇      \(ip         improper superset
∞      \(if         infinity
∂      \(pd         partial derivative
∇      \(gr         gradient
¬      \(no         not
∫      \(is         integral sign
∝      \(pt         proportional to
∅      \(es         empty set
∈      \(mo         member of
|      \(br         box vertical rule
‡      \(dd         double dagger
☞      \(rh         right hand
☜      \(lh         left hand
(logo) \(bs         Bell System logo
|      \(or         or
○      \(ci         circle
⎧      \(lt         left top of big curly bracket
⎩      \(lb         left bottom
⎫      \(rt         right top
⎭      \(rb         right bot
⎨      \(lk         left center of big curly bracket
⎬      \(rk         right center of big curly bracket
|      \(bv         bold vertical
⌊      \(lf         left floor (left bottom of big square bracket)
⌋      \(rf         right floor (right bottom)
⌈      \(lc         left ceiling (left top)
⌉      \(rc         right ceiling (right top)
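Because each \(xx name in Table II is a fixed-length, two-character escape, converting troff input that contains these names into another character set is a simple lookup. The Python sketch below handles a hand-picked subset of the table. The Unicode code points are our own choice of the closest equivalents (the table itself gives only glyphs and names), and `expand_specials` is a hypothetical helper.

```python
import re

# A hand-picked subset of Table II. The Unicode equivalents are our own
# choice; the table itself gives only the printed glyphs and names.
SPECIAL = {
    "em": "\u2014",  # 3/4 Em dash
    "bu": "\u2022",  # bullet
    "14": "\u00bc",  # 1/4
    "12": "\u00bd",  # 1/2
    "34": "\u00be",  # 3/4
    "de": "\u00b0",  # degree
    "dg": "\u2020",  # dagger
    "*a": "\u03b1",  # alpha
    "*b": "\u03b2",  # beta
    "*p": "\u03c0",  # pi
    "*S": "\u03a3",  # Sigma
    "mu": "\u00d7",  # multiply
    "->": "\u2192",  # right arrow
    "if": "\u221e",  # infinity
    ">=": "\u2265",  # >=
}

# "\(" followed by exactly two characters names one special character.
ESCAPE = re.compile(r"\\\((..)")


def expand_specials(text: str) -> str:
    """Replace known \\(xx escapes with Unicode; leave unknown names as typed."""
    return ESCAPE.sub(lambda m: SPECIAL.get(m.group(1), m.group(0)), text)


print(expand_specials(r"\(14 + \(12 = \(34"))  # ¼ + ½ = ¾
```

Names missing from the dictionary are passed through unchanged, so extending the sketch is just a matter of adding rows from the table.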
6 point: Pack my box with five dozen liquor jugs.
7 point: Pack my box with five dozen liquor jugs.
8 point: Pack my box with five dozen liquor jugs.
9 point: Pack my box with five dozen liquor jugs.
10 point: Pack my box with five dozen liquor
11 point: Pack my box with five dozen
12 point: Pack my box with five dozen
14 point: Pack my box with five
16 point 18 point 20 point
22 24
28
36

If the number after .ps is not one of these legal sizes, it is rounded up to the next valid value, with a maximum of 36. If no number follows .ps, troff reverts to the previous size, whatever it was. troff begins with point size 10, which is usually fine. This document is in 9 point.

The point size can also be changed in the middle of a line or even a word with the in-line command \s. To produce

UNIX runs on a PDP-11/45

type

\s8UNIX\s10 runs on a \s8PDP-\s10 11/45

As above, \s should be followed by a legal point size, except that \s0 causes the size to revert to its previous value. Notice that \s1011 can be understood correctly as 'size 10, followed by an 11', if the size is legal, but not otherwise. Be cautious with similar constructions.

Relative size changes are also legal and useful:

\s-2UNIX\s+2

temporarily decreases the size, whatever it is, by two points, then restores it.

The vertical spacing between lines is set independently of the point size, with .vs. So far this document has used "9 on 11", that is,

.ps 9
.vs 11p

If we changed to

.ps 9
.vs 9p

the running text would look like this. After a few lines, you will agree it looks a little cramped. The right vertical spacing is partly a matter of taste, depending on how much text you want to squeeze into a given space, and partly a matter of traditional printing style. By default, troff uses 10 on 12.

Point size and vertical spacing make a substantial difference in the amount of text per square inch. This is 12 on 14.

Point size and vertical spacing make a substantial difference in the amount of text per square inch. For example, 10 on 12 uses about twice as much space as 7 on 8. This is 6 on 7, which is even smaller. It packs a lot more words per line, but you can go blind trying to read it.

When used without arguments, .ps and .vs revert to the previous size and vertical spacing respectively.

The command .sp is used to get extra vertical space. Unadorned, it gives you one extra blank line (one .vs, whatever that has been set to). Typically, that's more or less than you want, so .sp can be followed by information about how much space you want:

.sp 2i

means 'two inches of vertical space'.

.sp 2p

means 'two points of vertical space'; and

.sp 2

means 'two vertical spaces', two of whatever .vs is set to (this can also be made explicit with .sp 2v); troff also understands decimal fractions in most places, so

.sp 1.5i

is a space of 1.5 inches. These same scale factors can be used after .vs to define line spacing, and in fact after most commands that deal with physical dimensions.

It should be noted that all size numbers are converted internally to 'machine units', which are 1/432 inch (1/6 point). For most purposes, this is enough resolution that you don't have to worry about the accuracy of the representation. The situation is not quite so good vertically, where resolution is 1/144 inch (1/2 point).
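The size arithmetic in this section (legal point sizes, scale factors, and the 1/432-inch machine unit) can be summarized in a short Python sketch. It is an approximation for illustration: the function names are hypothetical, only the i, p, m, v, and u scale factors are modeled, and the defaults assume troff's starting "10 on 12".

```python
# Legal TROFF point sizes, as listed in this section.
LEGAL_SIZES = [6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 36]

UNITS_PER_INCH = 432   # one machine unit is 1/432 inch
UNITS_PER_POINT = 6    # 432 units / 72 points per inch, so 1 unit = 1/6 point
VERTICAL_STEP = 3      # vertical resolution is 1/144 inch = 3 machine units


def legal_size(requested: float) -> int:
    """Round a .ps argument up to the next legal size, with a maximum of 36."""
    for size in LEGAL_SIZES:
        if requested <= size:
            return size
    return LEGAL_SIZES[-1]


def to_units(spec: str, point_size: int = 10, vs_points: float = 12) -> int:
    """Convert a dimension such as '1.5i', '2p', '2v' or '2.5m' to machine units.

    Only the scale factors i, p, m, v and u are modeled; the defaults assume
    troff's starting '10 on 12'.
    """
    value, unit = float(spec[:-1]), spec[-1]
    units_per = {
        "i": UNITS_PER_INCH,
        "p": UNITS_PER_POINT,
        "m": UNITS_PER_POINT * point_size,  # an em in size p is p points
        "v": UNITS_PER_POINT * vs_points,   # one current vertical spacing
        "u": 1,
    }
    return round(value * units_per[unit])


def to_vertical_resolution(units: int) -> int:
    """Round a vertical distance to the nearest multiple of 1/144 inch."""
    return VERTICAL_STEP * round(units / VERTICAL_STEP)


print(to_units("1.5i"), legal_size(13))  # 648 14
```

The vertical rounding step shows why the text calls vertical resolution "not quite so good": any vertical distance ends up as a multiple of three machine units.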
3. Fonts and Special Characters
troff and the typesetter allow four different fonts at any one time. Normally three fonts (Times roman, italic and bold) and one collection of special characters are permanently mounted.

Times roman:   abcdefghijklmnopqrstuvwxyz 0123456789
               ABCDEFGHIJKLMNOPQRSTUVWXYZ
Times italic:  abcdefghijklmnopqrstuvwxyz 0123456789
               ABCDEFGHIJKLMNOPQRSTUVWXYZ
Times bold:    abcdefghijklmnopqrstuvwxyz 0123456789
               ABCDEFGHIJKLMNOPQRSTUVWXYZ

The greek, mathematical symbols and miscellany of the special font are listed in Appendix A.

troff prints in roman unless told otherwise. To switch into bold, use the .ft command

.ft B

and for italics,

.ft I

To return to roman, use .ft R; to return to the previous font, whatever it was, use either .ft P or just .ft. The 'underline' command

.ul

causes the next input line to print in italics. .ul can be followed by a count to indicate that more than one line is to be italicized.

Fonts can also be changed within a line or word with the in-line command \f:

boldface text

is produced by

\fBbold\fIface\fR text

If you want to do this so the previous font, whatever it was, is left undisturbed, insert extra \fP commands, like this:

\fBbold\fP\fIface\fP\fR text\fP

Because only the immediately previous font is remembered, you have to restore the previous font after each change or you can lose it. The same is true of .ps and .vs when used without an argument.

There are other fonts available besides the standard set, although you can still use only four at any given time. The command .fp tells troff what fonts are physically mounted on the typesetter:

.fp 3 H

says that the Helvetica font is mounted on position 3. (For a complete list of fonts and what they look like, see the troff manual.) Appropriate .fp commands should appear at the beginning of your document if you do not use the standard fonts.

It is possible to make a document relatively independent of the actual fonts used to print it by using font numbers instead of names; for example, \f3 and .ft 3 mean 'whatever font is mounted at position 3', and thus work for any setting. Normal settings are roman font on 1, italic on 2, bold on 3, and special on 4.

There is also a way to get 'synthetic' bold fonts by overstriking letters with a slight offset. Look at the .bd command in [1].

Special characters have four-character names beginning with \(, and they may be inserted anywhere. For example,

¼ + ½ = ¾

is produced by

\(14 + \(12 = \(34

In particular, greek letters are all of the form \(*-, where - is an upper or lower case roman letter reminiscent of the greek. Thus to get

Σ(α×β) → ∞

in bare troff we have to type

\(*S(\(*a\(mu\(*b) \(-> \(if

That line is unscrambled as follows:

\(*S    Σ
(       (
\(*a    α
\(mu    ×
\(*b    β
)       )
\(->    →
\(if    ∞

A complete list of these special names occurs in Appendix A.

In eqn [2] the same effect can be achieved with the input

SIGMA ( alpha times beta ) -> inf

which is less concise, but clearer to the uninitiated.

Notice that each four-character name is a single character as far as troff is concerned; the 'translate' command

.tr \(mi\(em

is perfectly clear, meaning to translate the minus sign (\(mi) into an em dash (\(em).

Some characters are automatically translated into others: grave ` and acute ´ accents (apostrophes) become open and close single quotes ‘ and ’; the combination of ``...'' is generally preferable to the double quotes "...". Similarly a typed minus sign becomes a hyphen -. To print an explicit minus sign, use \-. To get a backslash printed, use \e.

4. Indents and Line Lengths
troff starts with a line length of 6.5 inches, too wide for 8½×11 paper. To reset the line length, use the .ll command, as in

.ll 6i

As with .sp, the actual length can be specified in several ways; inches are probably the most intuitive.

The maximum line length provided by the typesetter is 7.5 inches, by the way. To use the full width, you will have to reset the default physical left margin ("page offset"), which is normally slightly less than one inch from the left edge of the paper. This is done by the .po command.

.po 0

sets the offset as far to the left as it will go.

The indent command .in causes the left margin to be indented by some specified amount from the page offset. If we use .in to move the left margin in, and .ll to move the right margin to the left, we can make offset blocks of text:

.in 0.3i
.ll -0.3i
text to be set into a block
.ll +0.3i
.in -0.3i

will create a block that looks like this:

     Pater noster qui est in caelis sanctificetur nomen tuum; adveniat regnum tuum; fiat voluntas tua, sicut in caelo, et in terra. ... Amen.

Notice the use of '+' and '-' to specify the amount of change. These change the previous setting by the specified amount, rather than just overriding it. The distinction is quite important: .ll +1i makes lines one inch longer; .ll 1i makes them one inch long.

With .in, .ll and .po, the previous value is used if no argument is specified.

To indent a single line, use the 'temporary indent' command .ti. For example, all paragraphs in this memo effectively begin with the command

.ti 3

Three of what? The default unit for .ti, as for most horizontally oriented commands (.ll, .in, .po), is ems; an em is roughly the width of the letter 'm' in the current point size. (Precisely, an em in size p is p points.) Although inches are usually clearer than ems to people who don't set type for a living, ems have a place: they are a measure of size that is proportional to the current point size. If you want to make text that keeps its proportions regardless of point size, you should use ems for all dimensions. Ems can be specified as scale factors directly, as in .ti 2.5m.

Lines can also be indented negatively if the indent is already positive:

.ti -0.3i

causes the next line to be moved back three tenths of an inch. Thus to make a decorative initial capital, we indent the whole paragraph, then move the letter 'P' back with a .ti command:

     Pater noster qui est in caelis sanctificetur nomen tuum; adveniat regnum tuum; fiat voluntas tua, sicut in caelo, et in terra. ... Amen.

Of course, there is also some trickery to make the 'P' bigger (just a '\s36P\s0'), and to move it down from its normal position (see the section on local motions).

5. Tabs
Tabs (the ASCII 'horizontal tab' character) can be used to produce output in columns, or to set the horizontal position of output. Typically tabs are used only in unfilled text. Tab stops are set by default every half inch from the current indent, but can be changed by the .ta command. To set stops every inch, for example,
.ta 1i 2i 3i 4i 5i 6i

Unfortunately the stops are left-justified only (as on a typewriter), so lining up columns of right-justified numbers can be painful. If you have many numbers, or if you need more complicated table layout, don't use troff directly; use the tbl program described in [3].

For a handful of numeric columns, you can do it this way: Precede every number by enough blanks to make it line up when typed.

.nf
.ta 1i 2i 3i
  1 tab   2 tab   3
 40 tab  50 tab  60
700 tab 800 tab 900
.fi

Then change each leading blank into the string \0. This is a character that does not print, but that has the same width as a digit. When printed, this will produce

  1    2    3
 40   50   60
700  800  900

It is also possible to fill up tabbed-over space with some character other than blanks by setting the 'tab replacement character' with the .tc command:

.ta 1.5i 2.5i
.tc \(ru        (\(ru is "_")
Name tab Age tab

produces

Name ____________ Age ____________

To reset the tab replacement character to a blank, use .tc with no argument. (Lines can also be drawn with the \l command, described in Section 6.)

troff also provides a very general mechanism called 'fields' for setting up complicated columns. (This is used by tbl.) We will not go into it in this paper.

6. Local Motions: Drawing lines and characters
Remember 'Area = πr²' and the big 'P' in the Paternoster. How are they done? troff provides a host of commands for placing characters of any size at any place. You can use them to draw special characters or to tune your output for a particular appearance. Most of these commands are straightforward, but messy to read and tough to type correctly.

If you won't use eqn, subscripts and superscripts are most easily done with the half-line local motions \u and \d. To go back up the page half a point-size, insert a \u at the desired place; to go down, insert a \d. (\u and \d should always be used in pairs, as explained below.) Thus

Area = \(*pr\u2\d

produces

Area = πr²

To make the '2' smaller, bracket it with \s-2...\s0. Since \u and \d refer to the current point size, be sure to put them either both inside or both outside the size changes, or you will get an unbalanced vertical motion.

Sometimes the space given by \u and \d isn't the right amount. The \v command can be used to request an arbitrary amount of vertical motion. The in-line command

\v'(amount)'

causes motion up or down the page by the amount specified in '(amount)'. For example, to move the 'P' down, we used

.in +0.6i       (move paragraph in)
.ll -0.3i       (shorten lines)
.ti -0.3i       (move P back)
\v'2'\s36P\s0\v'-2'ater noster qui est
in caelis ...

A minus sign causes upward motion, while no sign or a plus sign means down the page. Thus \v'-2' causes an upward vertical motion of two line spaces.
There are many other ways to specify the amount of motion:

\v'0.1i'
\v'3p'
\v'-0.5m'

and so on are all legal. Notice that the scale specifier i or p or m goes inside the quotes. Any character can be used in place of the quotes; this is also true of all other troff commands described in this section.

Since troff does not take within-the-line vertical motions into account when figuring out where it is on the page, output lines can have unexpected positions if the left and right ends aren't at the same vertical position. Thus \v, like \u and \d, should always balance upward vertical motion in a line with the same amount in the downward direction.
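The balancing rule can be checked mechanically. This Python sketch adds up the vertical motions in one input line: \u and \d count as half a point size up or down, as described earlier in this section, and \v'N' counts as N line spaces. It is a hypothetical lint helper, not part of troff, and it deliberately handles only unscaled numeric \v arguments in single quotes; scaled amounts such as \v'3p' are ignored.

```python
import re

# Matches \u, \d, and \v'N' where N is a plain (unscaled) number of line
# spaces. Scale factors (i, p, m, ...) and other delimiters are not handled.
MOTION = re.compile(r"\\(u|d|v'([+-]?\d+(?:\.\d+)?)')")


def net_vertical_motion(line: str, point_size: float = 10,
                        vs: float = 12) -> float:
    """Net downward motion in points (negative means the line ends higher)."""
    total = 0.0
    for match in MOTION.finditer(line):
        kind = match.group(1)
        if kind == "u":
            total -= point_size / 2          # \u: up half a point size
        elif kind == "d":
            total += point_size / 2          # \d: down half a point size
        else:
            total += float(match.group(2)) * vs  # \v'N': N line spaces, down if positive
    return total


def is_balanced(line: str) -> bool:
    """True when the line ends at the same vertical position it started."""
    return net_vertical_motion(line) == 0


print(is_balanced(r"Area = \(*pr\u2\d"), net_vertical_motion(r"x\u2"))  # True -5.0
```

Both the Area example and the Paternoster line \v'2'\s36P\s0\v'-2'ater noster come out balanced, while a stray \u leaves the line 5 points high at the default 10-point size.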
Arbitrary horizontal motions are also available - \h is quite analogous to \ v, except that
the default scale factor is ems instead of line
spaces. As an example.
\h' -O.li'
causes a backwards motion of a tenth of an inch.
As a practical matter, consider printing the
mathematical symbol' > > '. The default spacing
is too wide, so eqn replaces this by
».
>\h'-0.3m'>
to produce a closely spaced pair of > characters.
Frequently \h is used with the 'width function' \w to generate motions equal to the width of some character string. The construction
\w'thing'
is a number equal to the width of 'thing' in machine units (1/432 inch). All troff computations are ultimately done in these units. To move horizontally the width of an 'x', we can say
\h'\w'x'u'
As we mentioned above, the default scale factor for all horizontal dimensions is m, ems, so here we must have the u for machine units, or the motion produced will be far too large. troff is quite happy with the nested quotes, by the way, so long as you don't leave any out.
As a live example of this kind of construction, all of the command names in the text, like .sp, were done by overstriking with a slight offset. The commands for .sp are
.sp\h'-\w'.sp'u'\h'1u'.sp
That is, put out '.sp', move left by the width of '.sp', move right 1 unit, and print '.sp' again. (Of course there is a way to avoid typing that much input for each command name, which we will discuss in Section 11.)
There are also several special-purpose troff commands for local motion. We have already seen \0, which is an unpaddable white space of the same width as a digit. 'Unpaddable' means that it will never be widened or split across a line by line justification and filling. There is also \(blank), which is an unpaddable character the width of a space, \|, which is half that width, \^, which is one quarter of the width of a space, and \&, which has zero width. (This last one is useful, for example, in entering a text line which would otherwise begin with a '.'.)
The command \o, used like
\o'set of characters'
causes (up to 9) characters to be overstruck, centered on the widest. This is nice for accents, as in
syst\o"e\(ga"me t\o"e\(aa"l\o"e\(aa"phonique
which makes
système téléphonique
The accents are \(ga and \(aa, or \` and \'; remember that each is just one character to troff.
You can make your own overstrikes with another special convention, \z, the zero-motion command. \zx suppresses the normal horizontal motion after printing the single character x, so another character can be laid on top of it. Although sizes can be changed within \o, it centers the characters on the widest, and there can be no horizontal or vertical motions, so \z may be the only way to get what you want:
[four squares of increasing size, overstruck at the same starting point]
is produced by
.sp 2
\s8\z\(sq\s14\z\(sq\s22\z\(sq\s36\(sq
The .sp is needed to leave room for the result.
As another example, an extra-heavy semicolon, larger and bolder than the ordinary ;, can be constructed with a big comma and a big period above it:
\s+6\z,\v'-0.25m'.\v'0.25m'\s0
'0.25m' is an empirical constant.
A more ornate overstrike is given by the bracketing function \b, which piles up characters vertically, centered on the current baseline. Thus we can get big brackets, constructing them with piled-up smaller pieces:
{ [ x ] }
by typing in only this:
.sp
\b'\(lt\(lk\(lb' \b'\(lc\(lf' x \b'\(rc\(rf' \b'\(rt\(rk\(rb'
troff also provides a convenient facility for drawing horizontal and vertical lines of arbitrary length with arbitrary characters. \l'1i' draws a line one inch long, like this: ____________. The length can be followed by the character to use if the _ isn't appropriate; \l'0.5i.' draws a half-inch line of dots: ........ The construction \L is entirely analogous, except that it draws a vertical line instead of horizontal.
7. Strings
Obviously if a paper contains a large number of occurrences of an acute accent over a letter 'e', typing \o"e\'" for each é would be a great nuisance.
Fortunately, troff provides a way in which you can store an arbitrary collection of text in a 'string', and thereafter use the string name as a shorthand for its contents. Strings are one of several troff mechanisms whose judicious use lets you type a document with less effort and organize it so that extensive format changes can be made with few editing changes.
A reference to a string is replaced by whatever text the string was defined as. Strings are defined with the command .ds. The line
.ds e \o"e\'"
defines the string e to have the value \o"e\'".
String names may be either one or two characters long, and are referred to by \*x for one character names or \*(xy for two character names. Thus to get téléphone, given the definition of the string e as above, we can say t\*el\*ephone.
If a string must begin with blanks, define it as
.ds xx "      text
The double quote signals the beginning of the definition. There is no trailing quote; the end of the line terminates the string.
A string may actually be several lines long; if troff encounters a \ at the end of any line, it is thrown away and the next line added to the current one. So you can make a long string simply by ending each line but the last with a backslash:
.ds xx this \
is a very \
long string
Strings may be defined in terms of other strings, or even in terms of themselves; we will discuss some of these possibilities later.
8. Introduction to Macros
Before we can go much further in troff, we need to learn a bit about the macro facility. In its simplest form, a macro is just a shorthand notation quite similar to a string. Suppose we want every paragraph to start in exactly the same way, with a space and a temporary indent of two ems:
.sp
.ti +2m
Then to save typing, we would like to collapse these into one shorthand line, a troff 'command' like
.PP
that would be treated by troff exactly as
.sp
.ti +2m
.PP is called a macro. The way we tell troff what .PP means is to define it with the .de command:
.de PP
.sp
.ti +2m
..
The first line names the macro (we used '.PP' for 'paragraph', and upper case so it wouldn't conflict with any name that troff might already know about). The last line .. marks the end of the definition. In between is the text, which is simply inserted whenever troff sees the 'command' or macro call
.PP
A macro can contain any mixture of text and formatting commands.
The definition of .PP has to precede its first use; undefined macros are simply ignored. Names are restricted to one or two characters.
Using macros for commonly occurring sequences of commands is critically important. Not only does it save typing, but it makes later changes much easier. Suppose we decide that the paragraph indent is too small, the vertical space is much too big, and roman font should be forced. Instead of changing the whole document, we need only change the definition of .PP to something like
.de PP    \" paragraph macro
.sp 2p
.ti +3m
.ft R
..
and the change takes effect everywhere we used .PP.
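Strings and simple macros like .PP both work by substitution, and that idea can be sketched in Python. This is a toy model under stated assumptions, not troff's implementation: the function names, the dictionaries standing in for troff's name tables, and the small set of built-in requests are illustrative; macro calls are expanded only one level deep; and copy mode, arguments and all other escapes are ignored.

```python
import re

# \*x names a one-character string, \*(xy a two-character one
STRING_REF = re.compile(r"\\\*(?:\((..)|(.))")

# A tiny, illustrative subset of troff's built-in requests
BUILTIN_REQUESTS = {"sp", "ti", "ft", "in", "nf", "fi", "br", "bp"}


def interpolate_strings(line: str, strings: dict) -> str:
    r"""Replace each \*x or \*(xy with its value; undefined strings vanish."""
    def lookup(match: re.Match) -> str:
        return strings.get(match.group(1) or match.group(2), "")

    return STRING_REF.sub(lookup, line)


def expand(lines: list, strings: dict, macros: dict) -> list:
    """Expand macro calls one level deep and interpolate strings in text."""
    out = []
    for line in lines:
        if line.startswith("."):
            words = line[1:].split()
            name = words[0] if words else ""
            if name in macros:
                out.extend(macros[name])      # macro call: insert its body
            elif name in BUILTIN_REQUESTS or not name:
                out.append(line)              # ordinary request, kept as-is
            # any other name is an undefined macro and is silently ignored
        else:
            out.append(interpolate_strings(line, strings))
    return out


print(interpolate_strings(r"t\*el\*ephone", {"e": "é"}))   # téléphone
print(expand([".PP", "Text.", ".QQ"], {}, {"PP": [".sp", ".ti +2m"]}))
# ['.sp', '.ti +2m', 'Text.']
```

The undefined .QQ disappears from the output, mirroring the rule that undefined macros are simply ignored.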
\" is a troff command that causes the rest of the line to be ignored. We use it in the revised definition of .PP above to add a comment (a wise idea once definitions get complicated).
As another example of macros, consider these two which start and end a block of offset, unfilled text, like most of the examples in this paper:
.de BS    \" start indented block
.sp
.nf
.in +0.3i
..
.de BE    \" end indented block
.sp
.fi
.in -0.3i
..
Now we can surround text like
Copy to
John Doe
Richard Roberts
Stanley Smith
by the commands .BS and .BE, and it will come out as it did above. Notice that we indented by .in +0.3i instead of .in 0.3i. This way we can nest our uses of .BS and .BE to get blocks within blocks.
If later on we decide that the indent should be 0.5i, then it is only necessary to change the definitions of .BS and .BE, not the whole paper.
9. Titles, Pages and Numbering
This is an area where things get tougher, because nothing is done for you automatically. Of necessity, some of this section is a cookbook, to be copied literally until you get some experience.
Suppose you want a title at the top of each page, saying just
left top          center top          right top
In roff, one can say
.he 'left top'center top'right top'
.fo 'left bottom'center bottom'right bottom'
to get headers and footers automatically on every page. Alas, this doesn't work in troff, a serious hardship for the novice. Instead you have to do a lot of specification.
You have to say what the actual title is (easy); when to print it (easy enough); and what to do at and around the title line (harder). Taking these in reverse order, first we define a macro .NP (for 'new page') to process titles and the like at the end of one page and the beginning of the next:
.de NP
'bp
'sp 0.5i
.tl 'left top'center top'right top'
'sp 0.3i
..
To make sure we're at the top of a page, we issue a 'begin page' command 'bp, which causes a skip to top-of-page (we'll explain the ' shortly). Then we space down half an inch, print the title (the use of .tl should be self explanatory; later we will discuss parameterizing the titles), space another 0.3 inches, and we're done.
To ask for .NP at the bottom of each page, we have to say something like 'when the text is within an inch of the bottom of the page, start the processing for a new page.' This is done with a 'when' command .wh:
.wh -1i NP
(No '.' is used before NP; this is simply the name of a macro, not a macro call.) The minus sign means 'measure up from the bottom of the page', so '-1i' means 'one inch from the bottom'.
The .wh command appears in the input outside the definition of .NP; typically the input would be
.de NP
...
..
.wh -1i NP
Now what happens? As text is actually being output, troff keeps track of its vertical position on the page, and after a line is printed within one inch from the bottom, the .NP macro is activated. (In the jargon, the .wh command sets a trap at the specified place, which is 'sprung' when that point is passed.) .NP causes a skip to the top of the next page (that's what the 'bp was for), then prints the title with the appropriate margins.
Why 'bp and 'sp instead of .bp and .sp? The answer is that .sp and .bp, like several other commands, cause a break to take place. That is, all the input text collected but not yet printed is flushed out as soon as possible, and the next input line is guaranteed to start a new line of output. If we had used .sp or .bp in the .NP macro, this would cause a break in the middle of the current output line when a new page is started. The effect would be to print the leftover part of that line at the top of the page, followed by the next input line on a new output line. This is not what we want. Using ' instead of . for a command tells troff that no break is to take place: the output line currently being filled should not be forced out before the space or new page.
The list of commands that cause a break is short and natural:
.bp   .br   .ce   .fi   .nf   .sp   .in   .ti
All others cause no break, regardless of whether you use a . or a '. If you really need a break, add a .br command at the appropriate place.
One other thing to beware of: if you're changing fonts or point sizes a lot, you may find that if you cross a page boundary in an unexpected font or size, your titles come out in that size and font instead of what you intended. Furthermore, the length of a title is independent of the current line length, so titles will come out at the default length of 6.5 inches unless you change it, which is done with the .lt command.
There are several ways to fix the problems of point sizes and fonts in titles. For the simplest applications, we can change .NP to set the proper size and font for the title, then restore the previous values, like this:
.de NP
'bp
'sp 0.5i
.ft R     \" set title font to roman
.ps 10    \" and size to 10 point
.lt 6i    \" and length to 6 inches
.tl 'left'center'right'
.ps       \" revert to previous size
.ft P     \" and to previous font
'sp 0.3i
..
This version of .NP does not work if the fields in the .tl command contain size or font changes. To cope with that requires troff's 'environment' mechanism, which we will discuss in Section 13.
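The trap and the break rule can both be modelled in a few lines of Python. The sketch assumes machine units of 1/432 inch, troff's default 11-inch page, a fixed vertical spacing per output line, and a hypothetical `top` position where text resumes on each new page; it also assumes the trap springs when the position reaches or passes it. The break test encodes only the list of breaking commands given above. All names are illustrative.

```python
UNITS_PER_INCH = 432                        # troff machine units per inch
DEFAULT_PAGE_LENGTH = 11 * UNITS_PER_INCH   # default 11-inch page
BREAKING_REQUESTS = {"bp", "br", "ce", "fi", "nf", "sp", "in", "ti"}


def trap_position(inches: float, page_length: int = DEFAULT_PAGE_LENGTH) -> int:
    """Where '.wh <inches>i' plants a trap; negative means up from the bottom."""
    units = round(inches * UNITS_PER_INCH)
    return page_length + units if units < 0 else units


def paginate(n_lines: int, spacing: int, top: int, trap: int) -> list:
    """Count output lines per page.

    Each printed line advances the vertical position by `spacing`. When the
    position reaches or passes the trap, the new-page macro springs and text
    resumes at `top` on the next page.
    """
    pages, count, position = [], 0, top
    for _ in range(n_lines):
        count += 1
        position += spacing
        if position >= trap:          # trap sprung: .NP runs
            pages.append(count)
            count, position = 0, top
    if count:
        pages.append(count)
    return pages


def causes_break(line: str) -> bool:
    """True if a request line forces out the partially collected output line.

    Only the listed requests break, and only with the '.' control character;
    the same request written with ' never breaks. Text lines return False
    here, although troff has other causes of breaks that are not modelled.
    """
    if not line.startswith("."):
        return False
    words = line[1:].split()
    return bool(words) and words[0] in BREAKING_REQUESTS


print(paginate(100, 72, 432, trap_position(-1)))       # [54, 46]
print(causes_break(".sp 0.5i"), causes_break("'sp 0.5i"))   # True False
```

With 12-point spacing (72 units) and text resuming one inch down, the first page holds 54 lines before the trap one inch from the bottom springs.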
To get a footer at the bottom of a page, you can modify .NP so it does some processing before the 'bp command, or split the job into a footer macro invoked at the bottom margin and a header macro invoked at the top of the page. These variations are left as exercises.
Output page numbers are computed automatically as each page is produced (starting at 1), but no numbers are printed unless you ask for them explicitly. To get page numbers printed, include the character % in the .tl line at the position where you want the number to appear. For example
.tl ''- % -''
centers the page number inside hyphens, as on this page. You can set the page number at any time with either .bp n, which immediately starts a new page numbered n, or with .pn n, which sets the page number for the next page but doesn't cause a skip to the new page. Again, .bp +n sets the page number to n more than its current value; .bp means .bp +1.
10. Number Registers and Arithmetic
troff has a facility for doing arithmetic, and for defining and using variables with numeric values, called number registers. Number registers, like strings and macros, can be useful in setting up a document so it is easy to change later. And of course they serve for any sort of arithmetic computation.
Like strings, number registers have one or two character names. They are set by the .nr command, and are referenced anywhere by \nx (one character name) or \n(xy (two character name).
There are quite a few pre-defined number registers maintained by troff, among them % for the current page number; nl for the current vertical position on the page; dy, mo and yr for the current day, month and year; and .s and .f for the current size and font. (The font is a number from 1 to 4.) Any of these can be used in computations like any other register, but some, like .s and .f, cannot be changed with .nr.
As an example of the use of number registers, in the -ms macro package [4], most significant parameters are defined in terms of the values of a handful of number registers. These include the point size for text, the vertical spacing, and the line and title lengths. To set the point size and vertical spacing for the following paragraphs, for example, a user may say
.nr PS 9
.nr VS 11
The paragraph macro .PP is defined (roughly) as follows:
.de PP
.ps \\n(PS     \" reset size
.vs \\n(VSp    \" spacing
.ft R          \" font
.sp 0.5v       \" half a line
.ti +3m
..
This sets the font to Roman and the point size and line spacing to whatever values are stored in the number registers PS and VS.
Why are there two backslashes? This is the eternal problem of how to quote a quote. When troff originally reads the macro definition, it peels off one backslash to see what's coming next. To ensure that another is left in the definition when the macro is used, we have to put in two backslashes in the definition. If only one backslash is used, point size and vertical spacing will be frozen at the time the macro is defined, not when it is used.
Protecting by an extra layer of backslashes is only needed for \n, \*, \$ (which we haven't come to yet), and \ itself. Things like \s, \f, \h, \v, and so on do not need an extra backslash, since they are converted by troff to an internal code immediately upon being seen.
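One way to picture the two backslashes is as two interpolation passes, sketched below in Python. At definition time (roughly troff's copy mode) single-backslash register references are replaced immediately and each \\ is reduced to \; at call time whatever references remain are replaced. Only \nx and \n(xy are modelled, undefined registers are taken as 0, and the function names are hypothetical.

```python
import re

# \nx or \n(xy that is not itself escaped by a preceding backslash
REGISTER_REF = re.compile(r"(?<!\\)\\n(?:\((..)|(.))")


def interpolate(text: str, registers: dict) -> str:
    r"""Replace unescaped \nx and \n(xy with register values (undefined = 0)."""
    return REGISTER_REF.sub(
        lambda m: str(registers.get(m.group(1) or m.group(2), 0)), text)


def define(body: str, registers: dict) -> str:
    r"""Definition time: interpolate single-backslash references now,
    then reduce each doubled backslash to one so the reference survives."""
    return interpolate(body, registers).replace("\\\\", "\\")


def call(stored: str, registers: dict) -> str:
    """Call time: interpolate whatever references remain in the stored text."""
    return interpolate(stored, registers)


registers = {"PS": 10}
frozen = define(r".ps \n(PS", registers)    # stored as ".ps 10"
live = define(r".ps \\n(PS", registers)     # stored as ".ps \n(PS"
registers["PS"] = 9                          # a later .nr PS 9
print(call(frozen, registers))               # .ps 10
print(call(live, registers))                 # .ps 9
```

The single-backslash version keeps the size in effect when the macro was defined, which is exactly the freezing described above.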
Arithmetic expressions can appear anywhere that a number is expected. As a trivial example,
.nr PS \\n(PS-2
decrements PS by 2. Expressions can use the arithmetic operators +, -, *, /, % (mod), the relational operators >, >=, <, <=, =, and != (not equal), and parentheses.
Although the arithmetic we have done so far has been straightforward, more complicated things are somewhat tricky. First, number registers hold only integers. troff arithmetic uses truncating integer division, just like Fortran. Second, in the absence of parentheses, evaluation is done left-to-right without any operator precedence (including relational operators). Thus
7*-4+3/13
becomes '-1' (7*-4 is -28, adding 3 gives -25, and -25/13 truncates to -1). Number registers can occur anywhere in an expression, and so can scale indicators like p, i, m, and so on (but no spaces). Although integer division causes truncation, each number and its scale indicator is converted to machine units (1/432 inch) before any arithmetic is done, so 1i/2u evaluates to 0.5i correctly.
The scale indicator u often has to appear when you wouldn't expect it, in particular when arithmetic is being done in a context that implies horizontal or vertical dimensions. For example,
.ll 7/2i
would seem obvious enough: 3½ inches. Sorry. Remember that the default units for horizontal parameters like .ll are ems. That's really '7 ems / 2 inches', and when translated into machine units, it becomes zero. How about
.ll 7i/2
Sorry, still no good: the '2' is '2 ems', so '7i/2' is small, although not zero. You must use
.ll 7i/2u
So again, a safe rule is to attach a scale indicator to every number, even constants.
For arithmetic done within a .nr command, there is no implication of horizontal or vertical dimension, so the default units are 'units', and 7i/2 and 7i/2u mean the same thing. Thus
.nr ll 7i/2
.ll \\n(llu
does just what you want, so long as you don't forget the u on the .ll command.
11. Macros with arguments
The next step is to define macros that can change from one use to the next according to parameters supplied as arguments. To make this work, we need two things: first, when we define the macro, we have to indicate that some parts of it will be provided as arguments when the macro is called. Then when the macro is called we have to provide actual arguments to be plugged into the definition.
Let us illustrate by defining a macro .SM that will print its argument two points smaller than the surrounding text. That is, the macro call
.SM TROFF
will produce TROFF.
The definition of .SM is
.de SM
\s-2\\$1\s+2
..
Within a macro definition, the symbol \\$n refers to the nth argument that the macro was called with. Thus \\$1 is the string to be placed in a smaller point size when .SM is called.
As a slightly more complicated version, the following definition of .SM permits optional second and third arguments that will be printed in the normal size:
.de SM
\\$3\s-2\\$1\s+2\\$2
..
Arguments not provided when the macro is called are treated as empty, so
.SM TROFF ),
produces TROFF), while
.SM TROFF ). (
produces (TROFF). It is convenient to reverse the order of arguments because trailing punctuation is much more common than leading.
By the way, the number of arguments that a macro was called with is available in number register .$.
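Argument handling can be sketched the same way. The Python below splits a call into blank-separated arguments, letting double quotes group an argument that contains blanks, replaces \$1 through \$9 in a stored body, and treats missing arguments as empty; the argument count corresponds to register .$. It assumes the body is in its stored form, where \\$ has already become \$, and the helper names are illustrative.

```python
import re

# A quoted argument may contain blanks; otherwise arguments are blank-separated
ARGUMENT = re.compile(r'"([^"]*)"|(\S+)')
# \$1 .. \$9 as they appear in a stored macro body
ARG_REF = re.compile(r"\\\$([1-9])")


def parse_call(line: str) -> tuple:
    """Split a call like '.SM TROFF ). (' into the macro name and arguments."""
    name, _, rest = line[1:].partition(" ")
    args = [m.group(1) if m.group(1) is not None else m.group(2)
            for m in ARGUMENT.finditer(rest)]
    return name, args


def substitute(body: str, args: list) -> str:
    r"""Replace \$n by the n-th argument; arguments not supplied are empty."""
    def arg(m: re.Match) -> str:
        n = int(m.group(1))
        return args[n - 1] if n <= len(args) else ""

    return ARG_REF.sub(arg, body)


SM_BODY = r"\$3\s-2\$1\s+2\$2"           # stored form of the second .SM
name, args = parse_call(".SM TROFF ). (")
print(name, len(args))                    # SM 3   (the value of .$)
print(substitute(SM_BODY, args))          # (\s-2TROFF\s+2).
```

Putting the optional leading punctuation in the third argument, as the text recommends, means the common case needs only one or two arguments.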
The following macro .BD is the one used to make the 'bold roman' we have been using for troff command names in text. It combines horizontal motions, width computations, and argument rearrangement.
.de BD
\&\\$3\f1\\$1\h'-\w'\\$1'u+1u'\\$1\fP\\$2
..
The \h and \w commands need no extra backslash, as we discussed above. The \& is there in case the argument begins with a period.
Two backslashes are needed with the \\$n commands, though, to protect one of them when the macro is being defined. Perhaps a second example will make this clearer. Consider a macro called .SH which produces section headings rather like those in this paper, with the sections numbered automatically, and the title in bold in a smaller size. The use is
.SH "Section title ..."
(If the argument to a macro is to contain blanks, then it must be surrounded by double quotes, unlike a string, where only one leading quote is permitted.)
Here is the definition of the .SH macro:
.nr SH 0           \" initialize section number
.de SH
.sp 0.3i
.ft B
.nr SH \\n(SH+1    \" increment number
.ps \\n(PS-1       \" decrease PS
\\n(SH. \\$1       \" number. title
.ps \\n(PS         \" restore PS
.sp 0.3i
.ft R
..
The section number is kept in number register SH, which is incremented each time just before it is used. (A number register may have the same name as a macro without conflict but a string may not.)
We used \\n(SH instead of \n(SH and \\n(PS instead of \n(PS. If we had used \n(SH, we would get the value of the register at the time the macro was defined, not at the time it was used. If that's what you want, fine, but not here. Similarly, by using \\n(PS, we get the point size at the time the macro is called.
As an example that does not involve numbers, recall our .NP macro which had a
.tl 'left'center'right'
We could make these into parameters by using instead
.tl '\\*(LT'\\*(CT'\\*(RT'
so the title comes from three strings called LT, CT and RT. If these are empty, then the title will be a blank line. Normally CT would be set with something like
.ds CT - % -
to give just the page number between hyphens (as on the top of this page), but a user could supply private definitions for any of the strings.
12. Conditionals
Suppose we want the .SH macro to leave two extra inches of space just before section 1, but nowhere else. The cleanest way to do that is to test inside the .SH macro whether the section number is 1, and add some space if it is. The .if command provides the conditional test that we can add just before the heading line is output:
.if \\n(SH=1 .sp 2i    \" first section only
The condition after the .if can be any arithmetic or logical expression. If the condition is logically true, or arithmetically greater than zero, the rest of the line is treated as if it were text, here a command. If the condition is false, or zero or negative, the rest of the line is skipped.
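Since an .if condition is an ordinary troff expression, the arithmetic rules of Section 10 decide its value. The Python sketch below applies them: strictly left-to-right evaluation with no precedence, truncating integer division, conversion of each number and its scale indicator to machine units before any arithmetic, a default unit for numbers without an indicator, and truth meaning 'greater than zero'. It assumes register references have already been interpolated, leaves out parentheses and the other scale indicators, and takes an em as 6 units per point of point size, as on the 432-unit typesetter; the function names are illustrative.

```python
import re

UNITS_PER_INCH = 432
NUMBER = re.compile(r"(-?)(\d+(?:\.\d*)?|\.\d+)([iupm]?)")
OPERATOR = re.compile(r"<=|>=|!=|==|[-+*/%<>=]")


def units_per(indicator: str, point_size: int) -> int:
    """Machine units in one i(nch), p(oint), m (em) or u(nit)."""
    return {"i": UNITS_PER_INCH,
            "p": UNITS_PER_INCH // 72,   # 6 units per point
            "m": 6 * point_size,         # an em is the point size in points
            "u": 1}[indicator]


def truncating_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as in Fortran."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": truncating_div,
    "%": lambda a, b: a - truncating_div(a, b) * b,
    "<": lambda a, b: int(a < b),        # relations yield 1 or 0
    ">": lambda a, b: int(a > b),
    "<=": lambda a, b: int(a <= b),
    ">=": lambda a, b: int(a >= b),
    "=": lambda a, b: int(a == b),
    "==": lambda a, b: int(a == b),
    "!=": lambda a, b: int(a != b),
}


def evaluate(expr: str, default_unit: str = "u", point_size: int = 10) -> int:
    """Evaluate strictly left to right, with no operator precedence."""
    pos, value, pending = 0, None, None
    while pos < len(expr):
        if value is None or pending is not None:      # expecting a number
            m = NUMBER.match(expr, pos)
            if not m:
                raise ValueError(f"number expected at {expr[pos:]!r}")
            sign, digits, unit = m.groups()
            number = int(float(digits) * units_per(unit or default_unit, point_size))
            number = -number if sign else number
            value = number if value is None else OPS[pending](value, number)
            pending, pos = None, m.end()
        else:                                         # expecting an operator
            m = OPERATOR.match(expr, pos)
            if not m:
                raise ValueError(f"operator expected at {expr[pos:]!r}")
            pending, pos = m.group(), m.end()
    if value is None or pending is not None:
        raise ValueError("incomplete expression")
    return value


def condition_true(cond: str) -> bool:
    """An .if condition holds when its value is greater than zero; ! negates."""
    if cond.startswith("!"):
        return not condition_true(cond[1:])
    return evaluate(cond) > 0


print(evaluate("7*-4+3/13"))                    # -1
print(evaluate("7i/2", default_unit="m"))       # 25, the '.ll 7i/2' surprise
print(condition_true("1=1"), condition_true("!1>1"))   # True True
```

At 10 points, `evaluate("7/2i", default_unit="m")` gives 0 and `evaluate("7i/2u", default_unit="m")` gives 1512 units (3½ inches), reproducing the .ll examples of Section 10.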
It is possible to do more than one command if a condition is true. Suppose several operations are to be done before section 1. One possibility is to define a macro .S1 and invoke it if we are about to do section 1 (as determined by an .if).
.de S1
--- processing for section 1 ---
..
.de SH
...
.if \\n(SH=1 .S1
...
..
An alternate way is to use the extended form of the .if, like this:
.if \\n(SH=1 \{--- processing
for section 1 ---\}
The braces \{ and \} must occur in the positions shown or you will get unexpected extra lines in your output. troff also provides an 'if-else' construction, which we will not go into here.
A condition can be negated by preceding it with !; we get the same effect as above (but less clearly) by using
.if !\\n(SH>1 .S1
There are a handful of other conditions that can be tested with .if. For example, is the current page even or odd?
.if e .tl ''even page title''
.if o .tl ''odd page title''
gives facing pages different titles when used inside an appropriate new page macro.
Two other conditions are t and n, which tell you whether the formatter is troff or nroff.
.if t troff stuff ...
.if n nroff stuff ...
Finally, string comparisons may be made in an .if:
.if 'string1'string2' stuff
does 'stuff' if string1 is the same as string2. The character separating the strings can be anything reasonable that is not contained in either string. The strings themselves can reference strings with \*, arguments with \$, and so on.
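The string comparison can be sketched in Python by taking the first character of the condition as the delimiter. The function name and return value are illustrative assumptions, and any string or argument references inside the two strings are assumed to have been interpolated already.

```python
def string_condition(condition: str) -> tuple:
    """Split "'string1'string2' stuff" into (strings_equal, stuff).

    The first character is the delimiter; str.index raises ValueError if
    either closing delimiter is missing.
    """
    delim = condition[0]
    first_end = condition.index(delim, 1)
    second_end = condition.index(delim, first_end + 1)
    left = condition[1:first_end]
    right = condition[first_end + 1:second_end]
    return left == right, condition[second_end + 1:].lstrip()


print(string_condition("'abc'abc' .sp 2"))      # (True, '.sp 2')
print(string_condition("/it's/it's/ stuff"))    # (True, 'stuff')
```

The second call uses / as the delimiter because the strings themselves contain an apostrophe.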
13. Environments
As we mentioned, there is a potential problem when going across a page boundary: parameters like size and font for a page title may well be different from those in effect in the text when the page boundary occurs. troff provides a very general way to deal with this and similar situations. There are three 'environments', each of which has independently settable versions of many of the parameters associated with processing, including size, font, line and title lengths, fill/nofill mode, tab stops, and even partially collected lines. Thus the titling problem may be readily solved by processing the main text in one environment and titles in a separate one with its own suitable parameters.
The command .ev n shifts to environment n; n must be 0, 1 or 2. The command .ev with no argument returns to the previous environment. Environment names are maintained in a stack, so calls for different environments may be nested and unwound consistently.
Suppose we say that the main text is processed in environment 0, which is where troff begins by default. Then we can modify the new page macro .NP to process titles in environment 1 like this:
.de NP
.ev 1     \" shift to new environment
.lt 6i    \" set parameters here
.ft R
.ps 10
... any other processing ...
.ev       \" return to previous environment
..
It is also possible to initialize the parameters for an environment outside the .NP macro, but the version shown keeps all the processing in one place and is thus easier to understand and change.
14. Diversions
There are numerous occasions in page layout when it is necessary to store some text for a period of time without actually printing it. Footnotes are the most obvious example; the text of the footnote usually appears in the input well before the place on the page where it is to be printed is reached. In fact, the place where it is output normally depends on how big it is, which implies that there must be a way to process the footnote at least enough to decide its size without printing it.
troff provides a mechanism called a diversion for doing this processing. Any part of the output may be diverted into a macro instead of being printed, and then at some convenient time the macro may be put back into the input.
The command .di xy begins a diversion: all subsequent output is collected into the macro xy until the command .di with no arguments is encountered. This terminates the diversion. The processed text is available at any time thereafter, simply by giving the command
.xy
The vertical size of the last finished diversion is contained in the built-in number register dn.
As a simple example, suppose we want to implement a 'keep-release' operation, so that text between the commands .KS and .KE will not be split across a page boundary (as for a figure or table). Clearly, when a .KS is encountered, we have to begin diverting the output so we can find out how big it is. Then when a .KE is seen, we decide whether the diverted text will fit on the current page, and print it either there if it fits, or at the top of the next page if it doesn't. So:
.de KS    \" start keep
.br       \" start fresh line
.ev 1     \" collect in new environment
.fi       \" make it filled text
.di XX    \" collect in XX
..
.de KE    \" end keep
.br       \" get last partial line
.di       \" end diversion
.if \\n(dn>=\\n(.t .bp    \" bp if doesn't fit
.nf       \" bring it back in no-fill
.XX       \" text
.ev       \" return to normal environment
..
Recall that number register nl is the current position on the output page. Since output was being diverted, this remains at its value when the diversion started. dn is the amount of text in the diversion; .t (another built-in register) is the distance to the next trap, which we assume is at the bottom margin of the page. If the diversion is large enough to go past the trap, the .if is satisfied, and a .bp is issued. In either case, the diverted output is then brought back with .XX. It is essential to bring it back in no-fill mode so troff will do no further processing on it.
This is not the most general keep-release, nor is it robust in the face of all conceivable inputs, but it would require more space than we have here to write it in full generality. This section is not intended to teach everything about diversions, but to sketch out enough that you can read existing macro packages with some comprehension.
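The environment stack and the keep decision fit together in one toy Python model, since .KS and .KE use both. It is a sketch under simplifying assumptions, not troff's implementation: sizes are in machine units, a new page starts at position 0 with no header, the only trap is at the bottom, and the class and method names are hypothetical. .KE starts a new page when the diverted height (dn) is at least the distance to the trap (.t), as the .if in the macro does.

```python
class Formatter:
    """Toy model of the environment stack and the .KS/.KE keep."""

    def __init__(self, trap: int):
        self.trap = trap          # position of the bottom-of-page trap
        self.page = 1
        self.nl = 0               # current vertical position on the page
        self.env = 0              # troff starts in environment 0
        self.env_stack = []
        self.diversion = None     # heights of diverted lines, or None

    def ev(self, n=None):
        """.ev n pushes the current environment; a bare .ev pops back."""
        if n is None:
            self.env = self.env_stack.pop()
        elif n in (0, 1, 2):
            self.env_stack.append(self.env)
            self.env = n
        else:
            raise ValueError("environment must be 0, 1 or 2")

    def new_page(self):
        self.page += 1
        self.nl = 0               # simplification: no header space

    def output(self, height: int):
        if self.diversion is not None:
            self.diversion.append(height)   # collected, not printed
            return
        self.nl += height
        if self.nl >= self.trap:            # trap sprung
            self.new_page()

    def keep_start(self):                   # .KS
        self.ev(1)
        self.diversion = []

    def keep_end(self):                     # .KE
        kept, self.diversion = self.diversion, None
        dn = sum(kept)                      # size of the diversion
        distance_to_trap = self.trap - self.nl   # register .t
        if dn >= distance_to_trap:
            self.new_page()                 # .bp if it doesn't fit
        for height in kept:                 # .XX brings the text back
            self.output(height)
        self.ev()                           # return to normal environment


f = Formatter(trap=4320)
f.output(4000)                              # 320 units left before the trap
f.keep_start()
f.output(200)
f.output(200)
f.keep_end()
print(f.page, f.nl, f.env)                  # 2 400 0
```

The 400-unit block does not fit in the 320 units remaining, so it moves intact to the top of page 2.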
Acknowledgements
I am deeply indebted to J. F. Ossanna, the author of troff, for his repeated patient explanations of fine points, and for his continuing willingness to adapt troff to make other uses easier. I am also grateful to Jim Blinn, Ted Dolotta, Doug McIlroy, Mike Lesk and Joel Sturman for helpful comments on this paper.
References
[1] J. F. Ossanna, NROFF/TROFF User's Manual, Bell Laboratories Computing Science Technical Report 54, 1976.
[2] B. W. Kernighan, A System for Typesetting Mathematics - User's Guide (Second Edition), Bell Laboratories Computing Science Technical Report 17, 1977.
[3] M. E. Lesk, TBL - A Program to Format Tables, Bell Laboratories Computing Science Technical Report 49, 1976.
[4] M. E. Lesk, Typing Documents on UNIX, Bell Laboratories, 1978.
[5] J. R. Mashey and D. W. Smith, PWB/MM - Programmer's Workbench Memorandum Macros, Bell Laboratories internal memorandum.
Appendix A: Phototypesetter Character Set
These characters exist in roman, italic, and bold. To get the one on the left, type the four-character name on the right.
ff \(ff     fi \(fi     fl \(fl     ffi \(Fi     ffl \(Fl
_ \(ru      • \(bu      □ \(sq      ° \(de       † \(dg
© \(co      ® \(rg      ′ \(fm      - \(hy       em dash \(em
½ \(12      ¼ \(14      ¾ \(34
(In bold, \(sq is ■.)
The following are special-font characters:
∂ \(pd      ⊃ \(sp      ⊇ \(ip      ` \(ga       ‡ \(dd      § \(sc
× \(mu      ≥ \(>=      ¬ \(no      ∝ \(pt       ↑ \(ua      ∞ \(if
| \(or      │ \(br (box rule)
Bracket-building pieces: \(lt \(lk \(lb (left brace top, middle, bottom) and \(rt \(rk \(rb (right brace top, middle, bottom).
For Greek, precede the roman letter by \(* to get the corresponding Greek letter; for example, \(*a is α.
a b g d e z y h i k l m n c o p r s t u f x q w
α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω
A B G D E Z Y H I K L M N C O P R S T U F X Q W
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
Tbl - A Program to Format Tables
M. E. Lesk
Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
Tbl is a document formatting preprocessor for troff or nroff which makes even fairly complex tables easy to specify and enter. It is available on the PDP-11 UNIX* system and on Honeywell 6000 GCOS. Tables are made up of columns which may be independently centered, right-adjusted, left-adjusted, or aligned by decimal points. Headings may be placed over single columns or groups of columns. A table entry may contain equations, or may consist of several rows of text. Horizontal or vertical lines may be drawn as desired in the table, and any table or element may be enclosed in a box. For example:
       1970 Federal Budget Transfers
         (in billions of dollars)
State              Taxes     Money       Net
               collected     spent
New York           22.91     21.35     -1.56
New Jersey          8.33      6.96     -1.37
Connecticut         4.12      3.10     -1.02
Maine               0.74      0.67     -0.07
California         22.29     22.42     +0.13
New Mexico          0.70      1.49     +0.79
Georgia             3.30      4.28     +0.98
Mississippi         1.15      2.32     +1.17
Texas               9.33     11.13     +1.80
January 16, 1979
* UNIX is a Trademark/Service Mark of the Bell System
Introduction.
Tbl turns a simple description of a table into a troff or nroff [1] program (list of commands) that prints the table. Tbl may be used on the PDP-11 UNIX [2] system and on the Honeywell 6000 GCOS system. It attempts to isolate a portion of a job that it can successfully handle and leave the remainder for other programs. Thus tbl may be used with the equation formatting program eqn [3] or various layout macro packages [4,5,6], but does not duplicate their functions.
This memorandum is divided into two parts. First we give the rules for preparing tbl input; then some examples are shown. The description of rules is precise but technical, and the beginning user may prefer to read the examples first, as they show some common table arrangements. A section explaining how to invoke tbl precedes the examples. To avoid repetition, henceforth read troff as "troff or nroff."
The input to tbl is text for a document, with tables preceded by a ".TS" (table start) command and followed by a ".TE" (table end) command. Tbl processes the tables, generating troff formatting commands, and leaves the remainder of the text unchanged. The ".TS" and ".TE" lines are copied, too, so that troff page layout macros (such as the memo formatting macros [4]) can use these lines to delimit and place tables as they see fit. In particular, any arguments on the ".TS" or ".TE" lines are copied but otherwise ignored, and may be used by document layout macro commands.
The format of the input is as follows:
text
.TS
table
.TE
text
.TS
table
.TE
text
where the format of each table is as follows:
.TS
options ;
format .
data
.TE
Each table is independent, and must contain formatting information followed by the data to be entered in the table. The formatting information, which describes the individual columns and rows of the table, may be preceded by a few options that affect the entire table. A detailed description of tables is given in the next section.
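The input layout above can be sketched as a small Python parser. It copies text, including the .TS and .TE lines, through unchanged, and splits each table into an optional options line (recognized by its final semicolon), the format lines (ended by the line that ends in a period), and the data. It is a simplified model, not tbl itself: it ignores .T& and the other details described in the next section, assumes lower-case option names, and the function names are hypothetical.

```python
import re

# An option is a lower-case name, optionally followed by a parenthesized value
OPTION = re.compile(r"[a-z]+(?:\s*\([^)]*\))?")


def parse_table(body: list) -> dict:
    """Split the lines between .TS and .TE into options, format and data."""
    options = []
    if body and body[0].rstrip().endswith(";"):
        # a single options line comes first and ends with a semicolon
        options = OPTION.findall(body[0].rstrip()[:-1])
        body = body[1:]
    fmt = []
    for k, line in enumerate(body):
        stripped = line.strip()
        if stripped.endswith("."):
            # a period ends the format section; everything after is data
            fmt.append(stripped[:-1].strip())
            return {"options": options, "format": fmt, "data": body[k + 1:]}
        fmt.append(stripped)
    raise ValueError("format section not terminated by a period")


def split_document(lines: list) -> list:
    """Copy ordinary text (and the .TS/.TE lines) through; parse each table."""
    items = []
    i = 0
    while i < len(lines):
        items.append(("text", lines[i]))
        if not lines[i].startswith(".TS"):
            i += 1
            continue
        end = i + 1
        while end < len(lines) and not lines[end].startswith(".TE"):
            end += 1
        if end == len(lines):
            raise ValueError(".TS without a matching .TE")
        items.append(("table", parse_table(lines[i + 1:end])))
        items.append(("text", lines[end]))
        i = end + 1
    return items


doc = ["Some text.", ".TS", "center, box;", "c s s", "l n n.",
       "Title", "Maine\t0.74\t0.67", ".TE", "More text."]
for item in split_document(doc):
    print(item)
```

Keeping the .TS and .TE lines in the text stream reflects the statement above that they are copied so layout macros can still see them.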
Input commands.
As indicated above, a table contains, first, global options, then a format section describing the layout of the table entries, and then the data to be printed. The format and data are always required, but not the options. The various parts of the table are entered as follows:
1) OPTIONS. There may be a single line of options affecting the whole table. If present, this line must follow the .TS line immediately and must contain a list of option names separated by spaces, tabs, or commas, and must be terminated by a semicolon. The allowable options are:
   center        - center the table (default is left-adjust);
   expand        - make the table as wide as the current line length;
   box           - enclose the table in a box;
   allbox        - enclose each item in the table in a box;
   doublebox     - enclose the table in two boxes;
   tab (x)       - use x instead of tab to separate data items;
   linesize (n)  - set lines or rules (e.g. from box) in n point type;
   delim (xy)    - recognize x and y as the eqn delimiters.
The tbl program tries to keep boxed tables on one page by issuing appropriate "need" (.ne) commands. These requests are calculated from the number of lines in the tables, and if there are spacing commands embedded in the input, these requests may be inaccurate; use normal troff procedures, such as keep-release macros, in that case. The user who must have a multi-page boxed table should use macros designed for this purpose, as explained below under 'Usage.'
2) FORMAT. The format section of the table specifies the layout of the columns. Each line in this section corresponds to one line of the table (except that the last line corresponds to all following lines up to the next .T&, if any; see below), and each line contains a key-letter for each column of the table. It is good practice to separate the key-letters for each column by spaces or tabs. Each key-letter is one of the following:
   L or l   to indicate a left-adjusted column entry;
   R or r   to indicate a right-adjusted column entry;
   C or c   to indicate a centered column entry;
   N or n   to indicate a numerical column entry, to be aligned with other numerical entries so that the units digits of numbers line up;
   A or a   to indicate an alphabetic subcolumn; all corresponding entries are aligned on the left, and positioned so that the widest is centered within the column (see example on page 12);
   S or s   to indicate a spanned heading, i.e. to indicate that the entry from the previous column continues across this column (not allowed for the first column, obviously); or
   ^        to indicate a vertically spanned heading, i.e. to indicate that the entry from the previous row continues down through this row. (Not allowed for the first row of the table, obviously.)
When numerical alignment is specified, a location for the decimal point is sought. The rightmost dot (.) adjacent to a digit is used as a decimal point; if there is no dot adjoining a digit, the rightmost digit is used as a units digit; if no alignment is indicated, the item is centered in the column. However, the special non-printing character string \& may be used to override unconditionally dots and digits, or to align alphabetic data; this string lines up where a dot normally would, and then disappears from the final output. In the example below, the items shown at the left will be aligned (in a numerical column) as shown on the right:
13              13
4.2              4.2
26.4.12       26.4.12
abc             abc
abc\&          abc
43\&3.22        433.22
749.12         749.12
Note: If numerical data are used in the same column with wider L or r type table entries, the widest number is centered relative to the wider L or r items (L is used instead of l for readability; they have the same meaning as key-letters). Alignment within the numerical items is preserved. This is similar to the behavior of a type data, as explained above. However, alphabetic subcolumns (requested by the a key-letter) are always slightly indented relative to L items; if necessary, the column width is increased to force this. This is not true for n type entries.
Warning: the n and a items should not be used in the same column.
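The alignment rules above can be restated as a small program. This Python sketch is a model, not tbl's implementation: character counts stand in for typeset widths, the names split_for_alignment and align_column are hypothetical, and the choice that the last \& wins when several appear is an assumption. Run on the example column, it reproduces the right-hand side shown above.

```python
MARK = "\\&"  # tbl's non-printing alignment string, written \& in the input


def split_for_alignment(item):
    """Return (left, right) around the alignment point, or None.

    1. an explicit \\& marks the point and is dropped from the output;
    2. otherwise the rightmost dot adjacent to a digit is the point;
    3. otherwise the point falls just after the rightmost digit;
    4. otherwise there is no alignment and the item is centered.
    """
    i = item.rfind(MARK)
    if i != -1:
        # Assumption of this sketch: with several \& the last one wins.
        return item[:i].replace(MARK, ""), item[i + len(MARK):]
    for i in range(len(item) - 1, -1, -1):
        if item[i] == "." and (
            (i > 0 and item[i - 1].isdigit())
            or (i + 1 < len(item) and item[i + 1].isdigit())
        ):
            return item[:i], item[i:]
    for i in range(len(item) - 1, -1, -1):
        if item[i].isdigit():
            return item[: i + 1], item[i + 1:]
    return None


def align_column(items):
    """Lay out items as one numerical (n) column using spaces only."""
    parts = [split_for_alignment(s) for s in items]
    pairs = [p for p in parts if p is not None]
    left_w = max((len(lf) for lf, _ in pairs), default=0)
    right_w = max((len(rt) for _, rt in pairs), default=0)
    width = max([left_w + right_w] +
                [len(s) for s, p in zip(items, parts) if p is None])
    lines = []
    for s, p in zip(items, parts):
        if p is None:                      # nothing to align on: center it
            lines.append(" " * ((width - len(s)) // 2) + s)
        else:
            left, right = p
            lines.append(" " * (left_w - len(left)) + left + right)
    return lines


for line in align_column(["13", "4.2", "26.4.12", "abc",
                          "abc\\&", "43\\&3.22", "749.12"]):
    print(line)
```

The sketch does not re-center the aligned block when an unaligned item is wider than it; tbl's handling of that case is described in the Note above.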
For readability, the key-letters describing each column should be separated by spaces. The end of the format section is indicated by a period. The layout of the key-letters in the format section resembles the layout of the actual data in the table. Thus a simple format might appear as:
c s s
l n n.
which specifies a table of three columns. The first line of the table contains a heading centered across all three columns; each remaining line contains a left-adjusted item in the first column followed by two columns of numerical data. A sample table in this format might be:
       Overall title
Item-a          34.22   9.1
Item-b          12.65    .02
Items: c,d,e    23      5.8
Total           69.87  14.92
There are some additional features of the key-letter system:
Horizontal lines - A key-letter may be replaced by '_' (underscore) to indicate a horizontal line in place of the corresponding column entry, or by '=' to indicate a double horizontal line. If an adjacent column contains a horizontal line, or if there are vertical lines adjoining this column, this horizontal line is extended to meet the nearby lines. If any data entry is provided for this column, it is ignored and a warning message is printed.
Vertical lines - A vertical bar may be placed between column key-letters. This will cause a vertical line between the corresponding columns of the table. A vertical bar to the left of the first key-letter or to the right of the last one produces a line at the edge of the table. If two vertical bars appear between key-letters, a double vertical line is drawn.
Space between columns - A number may follow the key-letter. This indicates the amount of separation between this column and the next column. The number normally specifies the separation in ens (one en is about the width of the letter 'n').* If the "expand" option is used, then these numbers are multiplied by a constant such that the table is as wide as the current line length. The default column separation number is 3. If the separation is changed the worst case (largest space requested) governs.
* More precisely, an en is a number of points (1 point = 1/72 inch) equal to half the current type size.
Vertical spanning - Normally, vertically spanned items extending over several rows of the table are centered in their vertical range. If a key-letter is followed by t or T, any corresponding vertically spanned item will begin at the top line of its range.
Font changes - A key-letter may be followed by a string containing a font name or number preceded by the letter f or F. This indicates that the corresponding column should be in a different font from the default font (usually Roman). All font names are one or two letters; a one-letter font name should be separated from whatever follows by a space or tab. The single letters B, b, I, and i are shorter synonyms for fB and fI. Font change commands given with the table entries override these specifications.
Point size changes - A key-letter may be followed by the letter p or P and a number to indicate the point size of the corresponding table entries. The number may be a signed digit, in which case it is taken as an increment or decrement from the current point size. If both a point size and a column separation value are given, one or more blanks must separate them.
Vertical spacing changes - A key-letter may be followed by the letter v or V and a number to indicate the vertical line spacing to be used within a multi-line corresponding table entry. The number may be a signed digit, in which case it is taken as an increment or decrement from the current vertical spacing. A column separation value must be separated by blanks or some other specification from a vertical spacing request. This request has no effect unless the corresponding table entry is a text block (see below).
Column width indication - A key-letter may be followed by the letter w or W and a width value in parentheses. This width is used as a minimum column width. If the largest element in the column is not as wide as the width value given after the w, the largest element is assumed to be that wide. If the largest element in the column is wider than the specified value, its width is used. The width is also used as a default line length for included text blocks. Normal troff units can be used to scale the width value; if none are used, the default is ens. If the width specification is a unitless integer the parentheses may be omitted. If the width value is changed in a column, the last one given controls.
Equal width columns - A key-letter may be followed by the letter e or E to indicate equal width columns. All columns whose key-letters are followed by e or E are made the same width. This permits the user to get a group of regularly spaced columns.
Note: The order of the above features is immaterial; they need not be separated by spaces, except as indicated above to avoid ambiguities involving point size and font changes. Thus a numerical column entry in italic font and 12 point type with a minimum width of 2.5 inches and separated by 6 ens from the next column could be specified as
np12w(2.5i)fI 6
Alternative notation - Instead of listing the format of successive lines of a table on consecutive lines of the format section, successive line formats may be given on the same line, separated by commas, so that the format for the example above might have been written:
c s s, l n n.
Default - Column descriptors missing from the end of a format line are assumed to be L. The longest line in the format section, however, defines the number of columns in the table; extra columns in the data are ignored silently.
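The alternative comma notation and the default rule can be modelled together. The Python sketch below turns a format section into rows of column specifications, treats commas like newlines, pads short rows with L, and lets the longest row set the column count. It rests on simplifying assumptions that tbl itself does not need: specifications are separated by white space, vertical bars are separate words, and a word beginning with a digit (such as the 6 in "fI 6") belongs to the specification before it. The name parse_format is hypothetical.

```python
def parse_format(section):
    """Turn a tbl format section into rows of column specifications.

    Sketch assumptions: specifications are separated by white space,
    vertical bars are written as separate words, and a word that starts
    with a digit (such as the 6 in 'fI 6') belongs to the one before it.
    """
    section = section.strip()
    if not section.endswith("."):
        raise ValueError("the format section must end with a period")
    # Alternative notation: a comma separates line formats as a newline does.
    raw_rows = [r for r in section[:-1].replace(",", "\n").split("\n")
                if r.strip()]
    rows = []
    for raw in raw_rows:
        cols = []
        for word in raw.split():
            if set(word) == {"|"}:
                continue                   # vertical rules are not columns
            if word[0].isdigit() and cols:
                cols[-1] += " " + word     # a column-separation value
            else:
                cols.append(word)
        rows.append(cols)
    ncols = max(len(r) for r in rows)      # the longest line sets the count
    return [r + ["l"] * (ncols - len(r)) for r in rows]  # missing ones are L


print(parse_format("c s s, l n n."))
# [['c', 's', 's'], ['l', 'n', 'n']]
```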
3) DATA. The data for the table are typed after the format. Normally, each table line is typed as one line of data. Very long input lines can be broken: any line whose last character is \ is combined with the following line (and the \ vanishes). The data for different columns (the table entries) are separated by tabs, or by whatever character has been specified with the tab option. There are a few special cases:
Troff commands within tables - An input line beginning with a '.' followed by anything but a number is assumed to be a command to troff and is passed through unchanged, retaining its position in the table. So, for example, space within a table may be produced by ".sp" commands in the data.
Full width horizontal lines - An input line containing only the character _ (underscore) or = (equal sign) is taken to be a single or double line, respectively, extending the full width of the table.
Single column horizontal lines - An input table entry containing only the character _ or = is taken to be a single or double line extending the full width of the column. Such lines are extended to meet horizontal or vertical lines adjoining this column. To obtain these characters explicitly in a column, either precede them by \& or follow them by a space before the usual tab or newline.
Short horizontal lines - An input table entry containing only the string \_ is taken to be a single line as wide as the contents of the column. It is not extended to meet adjoining lines.
Repeated characters - An input table entry containing only a string of the form \Rx where x is any character is replaced by repetitions of the character x as wide as the data in the column. The sequence of x's is not extended to meet adjoining columns.
Vertically spanned items - An input table entry containing only the character string \^ indicates that the table entry immediately above spans downward over this row. It is equivalent to a table format key-letter of '^'.
Text blocks - In order to include a block of text as a table entry, precede it by T{ and follow it by T}. Thus the sequence
... T{
block of
text
T} ...
is the way to enter, as a single entry in the table, something that cannot conveniently be typed as a simple string between tabs. Note that the T} end delimiter must begin a line; additional columns of data may follow after a tab on the same line. See the example on page 10 for an illustration of included text blocks in a table. If more than twenty or thirty text blocks are used in a table, various limits in the troff program are likely to be exceeded, producing diagnostics such as 'too many string/macro names' or 'too many number registers.'
Text blocks are pulled out from the table, processed separately by troff, and replaced in the table as a solid block. If no line length is specified in the block of text itself, or in the table format, the default is to use L×C/(N+1), where L is the current line length, C is the number of table columns spanned by the text, and N is the total number of columns in the table. The other parameters (point size, font, etc.) used in setting the block of text are those in effect at the beginning of the table (including the effect of the ".TS" macro) and any table format specifications of size, spacing and font, using the p, v and f modifiers to the column key-letters. Commands within the text block itself are also recognized, of course. However, troff commands within the table data but not within the text block do not affect that block.
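The default width of a text block is a one-line formula, and working one case through shows why wide blocks usually need a w modifier. This Python sketch simply evaluates L×C/(N+1) as stated above; the function name and the 6.5-inch line length are illustrative assumptions.

```python
def default_block_width(line_length, spanned, total_columns):
    """Default line length of a T{ ... T} text block: L * C / (N + 1).

    line_length   -- L, the current line length (any unit, e.g. inches)
    spanned       -- C, the number of table columns the block spans
    total_columns -- N, the total number of columns in the table
    """
    if not 1 <= spanned <= total_columns:
        raise ValueError("a block spans between 1 and N columns")
    return line_length * spanned / (total_columns + 1)


# A 6.5-inch line in a three-column table: a one-column block is set
# 1.625 inches wide unless the block or the format says otherwise.
print(default_block_width(6.5, 1, 3))   # 1.625
```

Because the divisor is N+1 rather than N, a block never claims the full share of the line even when it spans every column.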
Warnings: Although any number of lines may be present in a table, only the first 200 lines are used in calculating the widths of the various columns. A multi-page table, of course, may be arranged as several single-page tables if this proves to be a problem. Other difficulties with formatting may arise because, in the calculation of column widths, all table entries are assumed to be in the font and size being used when the ".TS" command was encountered, except for font and size changes indicated (a) in the table format section and (b) within the table data (as in the entry \s+3\fIdata\fP\s0). Therefore, although arbitrary troff requests may be sprinkled in a table, care must be taken to avoid confusing the width calculations; use requests such as '.ps' with care.
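The data-section rules (backslash continuation, troff request lines, tab-separated entries) and the 200-line width limit can be sketched as follows. This Python model is not tbl's reader. The names read_data and width_sample are hypothetical, and whether tbl counts request lines toward the 200-line limit is an assumption here: the sketch does not count them.

```python
def read_data(lines, tab="\t"):
    """Split the data section of a table into troff requests and rows.

    A line ending in a backslash is joined to the next line and the
    backslash is dropped.  A line starting with '.' followed by anything
    but a digit is passed through as a troff request; any other line is
    split into entries at each tab character.
    """
    def classify(line):
        if line.startswith(".") and not line[1:2].isdigit():
            return ("troff", line)
        return ("data", line.split(tab))

    rows, pending = [], ""
    for line in lines:
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        rows.append(classify(pending + line))
        pending = ""
    if pending:                  # input ended in the middle of a continuation
        rows.append(classify(pending))
    return rows


def width_sample(rows, limit=200):
    """Only the first `limit` table lines take part in width calculation.

    Assumption of this sketch: troff request lines are not counted.
    """
    return [row for row in rows if row[0] == "data"][:limit]


print(read_data(["Name\tPhone", ".sp", "Very long \\", "name\t555-1212"]))
```

Note that a line such as ".5" followed by a tab is data, not a request, because a digit follows the dot.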
4) ADDITIONAL COMMAND LINES. If the format of a table must be changed after many similar lines, as with sub-headings or summarizations, the ".T&" (table continue) command can be used to change column parameters. The outline of such a table input is:
.TS
options;
format.
data
.T&
format.
data
.T&
format.
data
.TE
as in the examples on pages 10 and 12. Using this procedure, each table line can be close to its corresponding format line.
Warning: it is not possible to change the number of columns, the space between columns, the global options such as box, or the selection of columns to be made equal width.
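The outline above has a simple shape: an optional options line, then alternating format and data parts, with each .T& starting a new pair. This Python sketch splits the lines between .TS and .TE along those boundaries. It is a hypothetical helper, not tbl code. It assumes the options line, when present, is the first line and ends with a semicolon, and that each format part ends at the first line whose last character is a period.

```python
def split_table(lines):
    """Split the lines between .TS and .TE into (options, sections).

    Each section is (format_lines, data_lines); a new section starts at
    every .T& line.  Sketch assumptions: the options line, if present,
    is the first line and ends with ';', and each format part ends at
    the first line whose last character is '.'.
    """
    lines = list(lines)
    options = None
    if lines and lines[0].rstrip().endswith(";"):
        options = lines.pop(0)
    sections, fmt, data, in_format = [], [], [], True
    for line in lines:
        if line.strip() == ".T&":          # table continue: new format follows
            sections.append((fmt, data))
            fmt, data, in_format = [], [], True
        elif in_format:
            fmt.append(line)
            if line.rstrip().endswith("."):
                in_format = False
        else:
            data.append(line)
    sections.append((fmt, data))
    return options, sections


opts, parts = split_table(["box;", "c s", "l n.", "Title", "a\t1",
                           ".T&", "l r.", "b\t2"])
print(opts, parts)
```

A fuller checker could compare the column count of every section's format with the first one, since the Warning above says .T& cannot change it.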
Usage.
On UNIX, tbl can be run on a simple table with the command
tbl input-file | troff
but for more complicated use, where there are several input files, and they contain equations and ms memorandum layout commands as well as tables, the normal command would be
tbl file-1 file-2 ... | eqn | troff -ms
and, of course, the usual options may be used on the troff and eqn commands. The usage for nroff is similar to that for troff, but only TELETYPE® Model 37 and Diablo-mechanism (DASI or GSI) terminals can print boxed tables directly.
For the convenience of users employing line printers without adequate driving tables or post-filters, there is a special -TX command line option to tbl which produces output that does not have fractional line motions in it. The only other command line options recognized by tbl are -ms and -mm, which are turned into commands to fetch the corresponding macro files; usually it is more convenient to place these arguments on the troff part of the command line, but they are accepted by tbl as well.
Note that when eqn and tbl are used together on the same file tbl should be used first. If there are no equations within tables, either order works, but it is usually faster to run tbl first, since eqn normally produces a larger expansion of the input than tbl. However, if there are equations within tables (using the delim mechanism in eqn), tbl must be first or the output will be scrambled. Users must also beware of using equations in n-style columns; this is nearly always wrong, since tbl attempts to split numerical format items into two parts and this is not possible with equations. The user can defend against this by giving the delim(xx) table option; this prevents splitting of numerical columns within the delimiters. For example, if the eqn delimiters are $$, then with the option delim($$) a numerical entry such as "1245 $+- 16$" will be divided after 1245, not after 16.
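The effect of delim(xy) on numerical columns can be shown by masking the delimited text before looking for the split point. This Python sketch applies the dot and digit rules from the FORMAT section only to characters outside the eqn delimiters. The name split_numeric is hypothetical, and the \& override is left out to keep the example short.

```python
def split_numeric(item, delim=None):
    """Find the split point of a numerical entry, skipping eqn text.

    With delim=('$', '$'), characters between the delimiters are never
    candidates for the decimal point or units digit, mirroring the
    delim(xy) option.  Returns (left, right), or None if there is no
    split.  The \\& override is not modelled in this sketch.
    """
    inside = [False] * len(item)
    if delim is not None:
        open_ch, close_ch = delim
        in_eqn = False
        for i, ch in enumerate(item):
            if in_eqn:
                inside[i] = True
                if ch == close_ch:
                    in_eqn = False
            elif ch == open_ch:
                inside[i] = True
                in_eqn = True

    def usable_digit(i):
        return 0 <= i < len(item) and not inside[i] and item[i].isdigit()

    # Rightmost dot next to a digit, outside any equation.
    for i in range(len(item) - 1, -1, -1):
        if item[i] == "." and not inside[i] and (
                usable_digit(i - 1) or usable_digit(i + 1)):
            return item[:i], item[i:]
    # Otherwise, just after the rightmost digit outside any equation.
    for i in range(len(item) - 1, -1, -1):
        if usable_digit(i):
            return item[:i + 1], item[i + 1:]
    return None


print(split_numeric("1245 $+- 16$", ("$", "$")))  # split after 1245
print(split_numeric("1245 $+- 16$"))              # split after 16
```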
Tbl limits tables to twenty columns; however, use of more than 16 numerical columns may fail because of limits in troff, producing the 'too many number registers' message. Troff number registers used by tbl must be avoided by the user within tables; these include two-digit names from 31 to 99, and names of the forms #x, x+, x|, ^x, and x-, where x is any lower case letter. The names ##, #-, and #^ are also used in certain circumstances. To conserve number register names, the n and a formats share a register; hence the restriction above that they may not be used in the same column.
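Before using a number register inside a table, it can help to check the name against the reserved forms listed above. This Python sketch encodes exactly those forms as regular expressions. The function name is hypothetical, and the list covers only the names given in this section, not every register troff or the macro packages may use.

```python
import re

# Register names this section says tbl uses; x is any lower-case letter.
_RESERVED_PATTERNS = [
    r"3[1-9]|[4-9][0-9]",                       # two-digit names 31 to 99
    r"#[a-z]", r"[a-z]\+", r"[a-z]\|", r"\^[a-z]", r"[a-z]-",
    r"##", r"#-", r"#\^",                       # used in certain circumstances
]
_RESERVED = re.compile("|".join(f"(?:{p})" for p in _RESERVED_PATTERNS))


def clashes_with_tbl(name):
    """True if a troff number-register name may collide with tbl's own."""
    return _RESERVED.fullmatch(name) is not None


for name in ("31", "30", "#a", "q+", "PS"):
    print(name, clashes_with_tbl(name))
```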
For aid in writing layout macros, tbl defines a number register TW which is the table width; it is defined by the time that the ".TE" macro is invoked and may be used in the expansion of that macro. More importantly, to assist in laying out multi-page boxed tables the macro T# is defined to produce the bottom lines and side lines of a boxed table, and then invoked at its end. By use of this macro in the page footer a multi-page table can be boxed. In particular, the ms macros can be used to print a multi-page boxed table with a repeated heading by giving the argument H to the ".TS" macro. If the table start macro is written
.TS H
a line of the form
.TH
must be given in the table after any table heading (or at the start if none). Material up to the ".TH" is placed at the top of each page of table; the remaining lines in the table are placed on several pages as required. Note that this is not a feature of tbl, but of the ms layout macros.
Examples.
Here are some examples illustrating features of tbl. The symbol ⓣ in the input represents a tab character.
Input:
.TS
box;
c c c
l l l.
LanguageⓣAuthorsⓣRuns on
FortranⓣManyⓣAlmost anything
PL/1ⓣIBMⓣ360/370
CⓣBTLⓣ11/45,H6000,370
BLISSⓣCarnegie-MellonⓣPDP-10,11
IDSⓣHoneywellⓣH6000
PascalⓣStanfordⓣ370
.TE
Output:
Language       Authors           Runs on
Fortran    Many              Almost anything
PL/1       IBM               360/370
C          BTL               11/45,H6000,370
BLISS      Carnegie-Mellon   PDP-10,11
IDS        Honeywell         H6000
Pascal     Stanford          370
Input:
.TS
allbox;
c s s
c c c
n n n.
AT&T Common Stock
YearⓣPriceⓣDividend
1971ⓣ41-54ⓣ$2.60
2ⓣ41-54ⓣ2.70
3ⓣ46-55ⓣ2.87
4ⓣ40-53ⓣ3.24
5ⓣ45-52ⓣ3.40
6ⓣ51-59ⓣ.95*
.TE
* (first quarter only)
Output:
   AT&T Common Stock
Year    Price   Dividend
1971    41-54    $2.60
   2    41-54     2.70
   3    46-55     2.87
   4    40-53     3.24
   5    45-52     3.40
   6    51-59      .95*
* (first quarter only)
Input:
.TS
box;
c s s
c | c | c
l | l | n.
Major New York Bridges
BridgeⓣDesignerⓣLength
BrooklynⓣJ. A. Roeblingⓣ1595
ManhattanⓣG. Lindenthalⓣ1470
WilliamsburgⓣL. L. Buckⓣ1600
QueensboroughⓣPalmer &ⓣ1182
ⓣ  Hornbostel
ⓣⓣ1380
TriboroughⓣO. H. Ammannⓣ_
ⓣⓣ383
Bronx WhitestoneⓣO. H. Ammannⓣ2300
Throgs NeckⓣO. H. Ammannⓣ1800
George WashingtonⓣO. H. Ammannⓣ3500
.TE
Output:
          Major New York Bridges
     Bridge       |    Designer    | Length
Brooklyn          | J. A. Roebling |  1595
Manhattan         | G. Lindenthal  |  1470
Williamsburg      | L. L. Buck     |  1600
Queensborough     | Palmer &       |  1182
                  |   Hornbostel   |
                  |                |  1380
Triborough        | O. H. Ammann   | ______
                  |                |   383
Bronx Whitestone  | O. H. Ammann   |  2300
Throgs Neck       | O. H. Ammann   |  1800
George Washington | O. H. Ammann   |  3500
Input:
.TS
c c
np-2 | n | .
ⓣStack
ⓣ_
PrecambrianⓣReading Prongⓣ>1 billion
PaleozoicⓣManhattan Prongⓣ400 million
MesozoicⓣT{
.na
Newark Basin, incl.
Stockton, Lockatong, and Brunswick
formations; also Watchungs
and Palisades.
T}ⓣ200 million
CenozoicⓣCoastal PlainⓣT{
On Long Island 30,000 years;
Cretaceous sediments redeposited
by recent glaciation.
.ad
T}
.TE
Output:
                  New York Area Rocks
Era           Formation                   Age (years)
Precambrian   Reading Prong               >1 billion
Paleozoic     Manhattan Prong             400 million
Mesozoic      Newark Basin,               200 million
              incl. Stockton,
              Lockatong, and
              Brunswick formations; also
              Watchungs and
              Palisades.
Cenozoic      Coastal Plain               On Long Island
                                          30,000 years;
                                          Cretaceous sedi-
                                          ments redepo-
                                          sited by recent
                                          glaciation.
Input:
.EQ
delim $$
.EN
.TS
doublebox;
c c
l l.
NameⓣDefinition
.vs -2p
.TE
Output:
Name      Definition
Gamma
Sine
Error
Bessel
Zeta      ζ(s) = Σ (k=1 to ∞) k^-s   (Re s > 1)
Input:
.TS
box, tab(:);
cb s s s s
cp-2 s s s s
c || c | c | c | c
c || c | c | c | c
r2 || n2 | n2 | n2 | n.
Readability of Text
Line Width and Leading for 10-Point Type
Line:Set:1-Point:2-Point:4-Point
Width:Solid:Leading:Leading:Leading
9 Pica:\-9.3:\-6.0:\-5.3:\-7.1
14 Pica:\-4.5:\-0.6:\-0.3:\-1.7
19 Pica:\-5.0:\-5.1: 0.0:\-2.0
31 Pica:\-3.7:\-3.8:\-2.4:\-3.6
43 Pica:\-9.1:\-9.0:\-5.9:\-8.8
.TE
Output:
             Readability of Text
   Line Width and Leading for 10-Point Type
 Line   ||  Set  | 1-Point | 2-Point | 4-Point
 Width  || Solid | Leading | Leading | Leading
 9 Pica ||  -9.3 |  -6.0   |  -5.3   |  -7.1
14 Pica ||  -4.5 |  -0.6   |  -0.3   |  -1.7
19 Pica ||  -5.0 |  -5.1   |   0.0   |  -2.0
31 Pica ||  -3.7 |  -3.8   |  -2.4   |  -3.6
43 Pica ||  -9.1 |  -9.0   |  -5.9   |  -8.8
Input:
.TS
c s
cip-2 s
l n
a n.
Some London Transport Statistics
(Year 1964)
Railway route milesⓣ244
Tubeⓣ66
Sub-surfaceⓣ22
Surfaceⓣ156
.sp .5
.T&
l r
a r.
Passenger traffic \- railway
Journeysⓣ674 million
Average lengthⓣ4.55 miles
Passenger milesⓣ3,066 million
.T&
l r
a r.
Passenger traffic \- road
Journeysⓣ2,252 million
Average lengthⓣ2.26 miles
Passenger milesⓣ5,094 million
.T&
l n
a n.
.sp .5
Vehiclesⓣ12,521
Railway motor carsⓣ2,905
Railway trailer carsⓣ1,269
Total railwayⓣ4,174
Omnibusesⓣ8,347
.T&
l n
a n.
.sp .5
Staffⓣ73,739
Administrative, etc.ⓣ5,582
Civil engineeringⓣ5,134
Electrical eng.ⓣ1,714
Mech. eng. \- railwayⓣ4,310
Mech. eng. \- roadⓣ9,152
Railway operationsⓣ8,930
Road operationsⓣ35,946
Otherⓣ2,971
.TE
Output:
      Some London Transport Statistics
                (Year 1964)
Railway route miles                      244
  Tube                                    66
  Sub-surface                             22
  Surface                                156

Passenger traffic - railway
  Journeys                       674 million
  Average length                  4.55 miles
  Passenger miles              3,066 million
Passenger traffic - road
  Journeys                     2,252 million
  Average length                  2.26 miles
  Passenger miles              5,094 million

Vehicles                              12,521
  Railway motor cars                   2,905
  Railway trailer cars                 1,269
  Total railway                        4,174
  Omnibuses                            8,347

Staff                                 73,739
  Administrative, etc.                 5,582
  Civil engineering                    5,134
  Electrical eng.                      1,714
  Mech. eng. - railway                 4,310
  Mech. eng. - road                    9,152
  Railway operations                   8,930
  Road operations                     35,946
  Other                                2,971
Input:
.ps 8
.vs 10p
.TS
center box;
c s s
ci s s
c c c
lB l n.
New Jersey Representatives
(Democrats)
.sp .5
NameⓣOffice addressⓣPhone
.sp .5
James J. Florioⓣ23 S. White Horse Pike, Somerdale 08083ⓣ609-627-8222
William J. Hughesⓣ2920 Atlantic Ave., Atlantic City 08401ⓣ609-345-4844
James J. Howardⓣ801 Bangs Ave., Asbury Park 07712ⓣ201-774-1600
Frank Thompson, Jr.ⓣ10 Rutgers Pl., Trenton 08618ⓣ609-599-1619
Andrew Maguireⓣ115 W. Passaic St., Rochelle Park 07662ⓣ201-843-0240
Robert A. RoeⓣU.S.P.O., 194 Ward St., Paterson 07510ⓣ201-523-5152
Henry Helstoskiⓣ666 Paterson Ave., East Rutherford 07073ⓣ201-939-9090
Peter W. Rodino, Jr.ⓣSuite 1435A, 970 Broad St., Newark 07102ⓣ201-645-3213
Joseph G. Minishⓣ308 Main St., Orange 07050ⓣ201-645-6363
Helen S. Meynerⓣ32 Bridge St., Lambertville 08530ⓣ609-397-1830
Dominick V. Danielsⓣ895 Bergen Ave., Jersey City 07306ⓣ201-659-7700
Edward J. PattenⓣNatl. Bank Bldg., Perth Amboy 08861ⓣ201-826-4610
.T&
ci s s
lB l n.
(Republicans)
.sp .5
Millicent Fenwickⓣ41 N. Bridge St., Somerville 08876ⓣ201-722-8200
Edwin B. Forsytheⓣ301 Mill St., Moorestown 08057ⓣ609-235-6622
Matthew J. Rinaldoⓣ1961 Morris Ave., Union 07083ⓣ201-687-4235
.TE
.ps 10
.vs 12p
Output:
                         New Jersey Representatives
                                (Democrats)

         Name                       Office address                 Phone

James J. Florio       23 S. White Horse Pike, Somerdale 08083   609-627-8222
William J. Hughes     2920 Atlantic Ave., Atlantic City 08401   609-345-4844
James J. Howard       801 Bangs Ave., Asbury Park 07712         201-774-1600
Frank Thompson, Jr.   10 Rutgers Pl., Trenton 08618             609-599-1619
Andrew Maguire        115 W. Passaic St., Rochelle Park 07662   201-843-0240
Robert A. Roe         U.S.P.O., 194 Ward St., Paterson 07510    201-523-5152
Henry Helstoski       666 Paterson Ave., East Rutherford 07073   201-939-9090
Peter W. Rodino, Jr.  Suite 1435A, 970 Broad St., Newark 07102  201-645-3213
Joseph G. Minish      308 Main St., Orange 07050                201-645-6363
Helen S. Meyner       32 Bridge St., Lambertville 08530         609-397-1830
Dominick V. Daniels   895 Bergen Ave., Jersey City 07306        201-659-7700
Edward J. Patten      Natl. Bank Bldg., Perth Amboy 08861       201-826-4610

                               (Republicans)

Millicent Fenwick     41 N. Bridge St., Somerville 08876        201-722-8200
Edwin B. Forsythe     301 Mill St., Moorestown 08057            609-235-6622
Matthew J. Rinaldo    1961 Morris Ave., Union 07083             201-687-4235
This is a paragraph of normal text placed here only to indicate where the left and right margins
are. In this way the reader can judge the appearance of centered tables or expanded tables, and
observe how such tables are formatted.
Input:
.TS
expand;
c s s s
c c c c
l l n n.
Bell Labs Locations
NameⓣAddressⓣArea CodeⓣPhone
HolmdelⓣHolmdel, N. J. 07733ⓣ201ⓣ949-3000
Murray HillⓣMurray Hill, N. J. 07974ⓣ201ⓣ582-6377
WhippanyⓣWhippany, N. J. 07981ⓣ201ⓣ386-3000
Indian HillⓣNaperville, Illinois 60540ⓣ312ⓣ690-2000
.TE
Output:
                          Bell Labs Locations
   Name                  Address                Area Code       Phone
Holmdel         Holmdel, N. J. 07733               201         949-3000
Murray Hill     Murray Hill, N. J. 07974           201         582-6377
Whippany        Whippany, N. J. 07981              201         386-3000
Indian Hill     Naperville, Illinois 60540         312         690-2000
Input:
.TS
box;
cb s s s
c | c | c s
ltiw(1i) | ltw(2i) | lp8 | lw(1.6i)p8.
Some Interesting Places
NameⓣDescriptionⓣPractical Information
T{
American Museum of Natural History
T}ⓣT{
The collections fill 11.5 acres (Michelin) or 25 acres (MTA)
of exhibition halls on four floors. There is a full-sized replica
of a blue whale and the world's largest star sapphire (stolen in 1964).
T}ⓣHoursⓣ10-5, ex. Sun 11-5, Wed. to 9
\ ' inf} x sub n
===0
is
thereafter affects all equations. At the beginning of any equation, you might say, for instance,
.EQ
gsize 16
gfont R
.EN
to set the size to 16 and the font to roman thereafter. In place of R, you can use any of the TROFF font names. The size after gsize can be a relative change with + or -.
Generally, gsize and gfont will appear at the beginning of a document but they can also appear throughout a document: the global font and size can be changed as often as needed. For example, in a footnote* you will typically want the size of equations to match the size of the footnote text, which is two points smaller than the main text. Don't forget to reset the global size at the end of the footnote.
* Like this one, in which we have a few random expressions like xᵢ and π². The sizes for these were set by the command gsize -1.
13. Diacritical Marks
To get funny marks on top of letters, there are several words:
x dot       ẋ
x dotdot    ẍ
x hat       x̂
x tilde     x̃
x vec       x⃗
x dyad      x⃡
x bar       x̄
x under     x̲
The diacritical mark is placed at the right height. The bar and under are made the right length for the entire construct, as in x+y+z with a single bar over all of it; other marks are centered.
14. Quoted Text
Any input entirely within quotes ("...") is not subject to any of the font changes and spacing adjustments normally done by the equation setter. This provides a way to do your own spacing and adjusting if needed:
italic "sin(x)" + sin (x)
is
sin(x)+sin(x)
Quotes are also used to get braces and other EQN keywords printed:
"{ size alpha }"
is
{ size alpha }
and
roman "{ size alpha }"
is
{ size alpha }
The construction "" is often used as a place-holder when grammatically EQN needs something, but you don't actually want anything in your output. For example, to make ²He, you can't just type sup 2 roman He because a sup has to be a superscript on something. Thus you must say
"" sup 2 roman He
To get a literal quote use "\"". TROFF characters like \(bs can appear unquoted, but more complicated things like horizontal and vertical motions with \h and \v should always be quoted. (If you've never heard of \h and \v, ignore this section.)
15. Lining Up Equations
Sometimes it's necessary to line up a series of equations at some horizontal position, often at an equals sign. This is done with two operations called mark and lineup.
The word mark may appear once at any place in an equation. It remembers the horizontal position where it appeared. Successive equations can contain one occurrence of the word lineup. The place where lineup appears is made to line up with the place marked by the previous mark if at all possible. Thus, for example, you can say
.EQ I
x+y mark = z
.EN
.EQ I
x lineup = 1
.EN
to produce
x+y=z
  x=1
For reasons too complicated to talk about, when you use EQN and '-ms', use either .EQ I or .EQ L; mark and lineup don't work with centered equations. Also bear in mind that mark doesn't look ahead;
x mark =1
x+y lineup =z
isn't going to work, because there isn't room for the x+y part after the mark remembers where the x is.
16. Big Brackets, Etc.
To get big brackets [ ], braces { }, parentheses ( ), and bars | | around things, use the left and right commands:
left { a over b + 1 right }
~=~ left ( c over d right )
+ left [ e right ]
is
{ a/b + 1 } = ( c/d ) + [ e ]
The resulting brackets are made big enough to cover whatever they enclose. Other characters can be used besides these, but they are not likely to look very good. One exception is the floor and ceiling characters:
left floor x over y right floor
<= left ceiling a over b right ceiling
produces
⌊x/y⌋ ≤ ⌈a/b⌉
Several warnings about brackets are in order. First, braces are typically bigger than brackets and parentheses, because they are made up of three, five, seven, etc., pieces, while brackets can be made up of two, three, etc. Second, big left and right parentheses often look poor, because the character set is poorly designed.
The right part may be omitted: a "left something" need not have a corresponding "right something". If the right part is omitted, put braces around the thing you want the left bracket to encompass. Otherwise, the resulting brackets may be too large.
If you want to omit the left part, things are more complicated, because technically you can't have a right without a corresponding left. Instead you have to say
left "" ..... right )
for example. The left "" means a "left nothing". This satisfies the rules without hurting your output.
17. Piles
There is a general facility for making vertical piles of things; it comes in several flavors. For example:
A ~=~ left [
pile { a above b above c }
~~ pile { x above y above z }
right ]
will make
    [ a  x ]
A = [ b  y ]
    [ c  z ]
The elements of the pile (there can be as many as you want) are centered one above another, at the right height for most purposes. The keyword above is used to separate the pieces; braces are used around the entire list. The elements of a pile can be as complicated as needed, even containing more piles.
Three other forms of pile exist: lpile makes a pile with the elements left-justified; rpile makes a right-justified pile; and cpile makes a centered pile, just like pile. The vertical spacing between the pieces is somewhat larger for l-, r- and cpiles than it is for ordinary piles.
roman sign (x)~=~
left {
lpile {1 above 0 above -1}
~~ lpile
{if~x>0 above if~x=0 above if~x<0}
makes
          {  1   if x>0
sign(x) = {  0   if x=0
          { -1   if x<0
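The difference between lpile, rpile and cpile is only in how each element is justified against the widest one. This Python sketch models that horizontal behaviour with plain strings, where character counts stand in for typeset widths. The names pile and side_by_side are hypothetical. EQN's vertical spacing, which the text says differs between the pile forms, is not modelled. The usage line rebuilds the two lpiles of the sign(x) example.

```python
def pile(items, justify="c"):
    """Stack items vertically, justified as eqn's lpile/rpile/cpile are.

    justify: 'l' (lpile), 'r' (rpile) or 'c' (cpile and plain pile).
    Character counts stand in for typeset widths in this sketch.
    """
    width = max(len(s) for s in items)
    if justify == "l":
        return [s.ljust(width) for s in items]
    if justify == "r":
        return [s.rjust(width) for s in items]
    if justify == "c":
        return [s.center(width) for s in items]
    raise ValueError("justify must be 'l', 'r' or 'c'")


def side_by_side(*columns, gap="  "):
    """Place piles next to each other, row by row."""
    return [gap.join(row) for row in zip(*columns)]


for row in side_by_side(pile(["1", "0", "-1"], "l"),
                        pile(["if x>0", "if x=0", "if x<0"], "l")):
    print(row)
```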
This works as you might expect - spaces, newlines, and so on are significant in the text, but not in the equation part itself. Multiple equations can occur in a single input line.
Enough room is left before and after a line that contains in-line expressions that something like Σxᵢ does not interfere with the lines surrounding it.
To turn off the delimiters,
.EQ
delim off
.EN
Warning: don't use braces, tildes, circumflexes, or double quotes as delimiters - chaos will result.
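In-line equations work because the delimiters cut each text line into alternating text and equation pieces. This Python sketch shows that split for a delimiter pair such as $$, with delim=None standing for "delim off". It is a model of the idea, not EQN's scanner. The function name is hypothetical, and the sketch ignores quoted text and other cases EQN treats specially.

```python
def split_inline(line, delim="$$"):
    """Split a text line into ('text', s) and ('eqn', s) pieces.

    delim is the two-character string given to 'delim', e.g. '$$';
    delim=None models 'delim off', where the line is all text.
    """
    if delim is None:
        return [("text", line)]
    left, right = delim[0], delim[1]
    pieces, pos = [], 0
    while pos < len(line):
        start = line.find(left, pos)
        if start == -1:
            break
        end = line.find(right, start + 1)
        if end == -1:
            raise ValueError("unterminated in-line equation")
        if start > pos:
            pieces.append(("text", line[pos:start]))
        pieces.append(("eqn", line[start + 1:end]))
        pos = end + 1
    if pos < len(line):
        pieces.append(("text", line[pos:]))
    return pieces


print(split_inline("let $x sub i$ and $y$ vary"))
```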
20. Definitions
EQN provides a facility so you can give a frequently-used string of characters a name, and thereafter just type the name instead of the whole string. For example, if the sequence
x sub i sub 1 + y sub i sub 1
appears repeatedly throughout a paper, you can save re-typing it each time by defining it like this:
define xy 'x sub i sub 1 + y sub i sub 1'
This makes xy a shorthand for whatever characters occur between the single quotes in the definition. You can use any character instead of quote to mark the ends of the definition, so long as it doesn't appear inside the definition.
Now you can use xy like this:
.EQ
f(x) = xy ...
.EN
and so on. Each occurrence of xy will expand into what it was defined as. Be careful to leave spaces or their equivalent around the name when you actually use it, so EQN will be able to identify it as special.
There are several things to watch out for. First, although definitions can use previous definitions, as in
.EQ
define xi ' x sub i '
define xi1 ' xi sub 1 '
.EN
don't define something in terms of itself. A favorite error is to say
define X ' roman X '
This is a guaranteed disaster, since X is now defined in terms of itself. If you say
define X ' roman "X" '
however, the quotes protect the second X, and everything works fine.
EQN keywords can be redefined. You can make / mean over by saying
define / ' over '
or redefine over as / with
define over ' / '
If you need different things to print on a terminal and on the typesetter, it is sometimes worth defining a symbol differently in NEQN and EQN. This can be done with ndefine and tdefine. A definition made with ndefine only takes effect if you are running NEQN; if you use tdefine, the definition only applies for EQN. Names defined with plain define apply to both EQN and NEQN.
21. Local Motions
Although EQN tries to get most things at the right place on the paper, it isn't perfect, and occasionally you will need to tune the output to make it just right. Small extra horizontal spaces can be obtained with tilde and circumflex. You can also say back n and fwd n to move small amounts horizontally. n is how far to move in 1/100's of an em (an em is about the width of the letter 'm'). Thus back 50 moves back about half the width of an m. Similarly you can move things up or down with up n and down n. As with sub or sup, the local motions affect the next thing in the input, and this can be something arbitrarily complicated if it is enclosed in braces.
22. A Large Example
Here is the complete source for the three display equations in the abstract of this guide.
.EQ I
G(z)~mark =~ e sup { ln ~ G(z) }
~=~ exp left (
sum from k>=1 {S sub k z sup k} over k right )
~=~ prod from k>=1 e sup {S sub k z sup k /k}
.EN
.EQ I
lineup = left ( 1 + S sub 1 z +
{ S sub 1 sup 2 z sup 2 } over 2! + ... right )
left ( 1+ { S sub 2 z sup 2 } over 2
+ { S sub 2 sup 2 z sup 4 } over { 2 sup 2 cdot 2! }
+ ... right ) ...
.EN
.EQ I
lineup = sum from m>=0 left (
sum from
pile { k sub 1 ,k sub 2 ,..., k sub m >=0
above
k sub 1 +2k sub 2 + ... +mk sub m =m}
{ S sub 1 sup {k sub 1} } over {1 sup k sub 1 k sub 1 ! } ~
{ S sub 2 sup {k sub 2} } over {2 sup k sub 2 k sub 2 ! } ~
...
{ S sub m sup {k sub m} } over {m sup k sub m k sub m ! }
right ) z sup m
.EN
23. Keywords, Precedences, Etc.
If you don't use braces, EQN will do
operations in the order shown in this list.
dyad vec under bar tilde hat dot dotdot
fwd back down up
fat roman italic bold size
sub sup sqrt over
from to
These operations group to the left:
over sqrt left right
All others group to the right.
Digits, parentheses, brackets, punctuation marks, and these mathematical words
are converted to Roman font when encountered:
sin cos tan sinh cosh tanh arc
max min lim log ln exp
Re Im and if for det
These character sequences are recognized
and translated as shown.
>=        ≥
<=        ≤
==        ≡
!=        ≠
+-        ±
->        →
<-        ←
<<        ≪
>>        ≫
inf       ∞
partial   ∂
half      ½
prime     ′
approx    ≈
nothing   (nothing is printed)
cdot      ·
times     ×
del       ∇
grad      ∇
...       ⋯
,...,     ,...,
sum       Σ
int       ∫
prod      Π
union     ∪
inter     ∩
To obtain Greek letters, simply spell
them out in whatever case you want:
DELTA     Δ
GAMMA     Γ
LAMBDA    Λ
OMEGA     Ω
PHI       Φ
PI        Π
PSI       Ψ
SIGMA     Σ
THETA     Θ
UPSILON   Υ
XI        Ξ
alpha     α
beta      β
chi       χ
delta     δ
epsilon   ε
eta       η
gamma     γ
iota      ι
kappa     κ
lambda    λ
mu        μ
nu        ν
omega     ω
omicron   ο
phi       φ
pi        π
psi       ψ
rho       ρ
sigma     σ
tau       τ
theta     θ
upsilon   υ
xi        ξ
zeta      ζ
These are all the words known to EQN
(except for characters with names), together
with the section where they are discussed.
above     17, 18
back      21
bar       13
bold      12
ccol      18
col       18
cpile     17
define    20
delim     19
dot       13
dotdot    13
down      21
dyad      13
fat       12
font      12
from      11
fwd       21
gfont     12
gsize     12
hat       13
italic    12
lcol      18
left      16
lineup    15
lpile     17
mark      15
matrix    18
ndefine   20
over      9
pile      17
rcol      18
right     16
roman     12
rpile     17
size      12
sqrt      10
sub       7
sup       7
tdefine   20
tilde     13
to        11
under     13
up        21
vec       13
~ ^       4, 6
{ }       8
"..."     8, 14
24. Troubleshooting
If you make a mistake in an equation,
like leaving out a brace (very common) or
having one too many (very common) or
having a sup with nothing before it (common), EQN will tell you with the message
syntax error between lines x and y, file z
where x and y are approximately the lines
between which the trouble occurred, and z is
the name of the file in question. The line
numbers are approximate; look nearby as
well. There are also self-explanatory messages that arise if you leave out a quote or
try to run EQN on a non-existent file.
If you want to check a document
before actually printing it (on UNIX only),
eqn files >/dev/null
will throw away the output but print the
messages.
If you use something like dollar signs
as delimiters, it is easy to leave one out.
This causes very strange troubles. The program checkeq (on GCOS, use ./checkeq
instead) checks for misplaced or missing
dollar signs and similar troubles.
In-line equations can only be so big
because of an internal buffer in TROFF. If
you get a message "word overflow", you
have exceeded this limit. If you print the
equation as a displayed equation this message will usually go away. The message
"line overflow" indicates you have
exceeded an even bigger buffer. The only
cure for this is to break the equation into
two separate ones.
On a related topic, EQN does not break
equations by itself; you must split long
equations up across multiple lines by yourself, marking each by a separate .EQ ... .EN
sequence. EQN does warn about equations
that are too long to fit on one line.
25. Use on UNIX
To print a document that contains
mathematics on the UNIX typesetter,
eqn files | troff
If there are any TROFF options, they go after
the TROFF part of the command. For example,
eqn files | troff -ms
To run the same document on the GCOS
typesetter, use
eqn files | troff -g (other options) | gcat
A compatible version of EQN can be
used on devices like teletypes and DASI and
GSI terminals which have half-line forward
and reverse capabilities. To print equations
on a Model 37 teletype, for example, use
neqn files | nroff
The language for equations recognized by
NEQN is identical to that of EQN, although of
course the output is more restricted.
To use a GSI or DASI terminal as the
output device,
neqn files | nroff -Tx
where x is the terminal type you are using,
such as 300 or 300S.
EQN and NEQN can be used with the
TBL program [2] for setting tables that contain mathematics. Use TBL before [N]EQN,
like this:
tbl files | eqn | troff
tbl files | neqn | nroff
26. Acknowledgments
We are deeply indebted to J. F.
Ossanna, the author of TROFF, for his willingness to extend TROFF to make our task
easier, and for his continuous assistance
during the development and evolution of
EQN. We are also grateful to A. V. Aho for
advice on language design, to S. C. Johnson
for assistance with the YACC compiler-compiler, and to all the EQN users who have
made helpful suggestions and criticisms.
References
[1] J. F. Ossanna, "NROFF/TROFF User's Manual", Bell Laboratories Computing Science Technical Report #54, 1976.
[2] M. E. Lesk, "Typing Documents on UNIX", Bell Laboratories, 1976.
[3] M. E. Lesk, "TBL - A Program for Setting Tables", Bell Laboratories Computing Science Technical Report #49, 1976.
Some Applications of Inverted Indexes on the UNIX System
M. E. Lesk
Bell Laboratories
Murray Hill, New Jersey 07974
1. Introduction.
The UNIX† system has many utilities (e.g. grep, awk, lex, egrep, fgrep, ...) to search through
files of text, but most of them are based on a linear scan through the entire file, using some
deterministic automaton. This memorandum discusses a program which uses inverted indexes¹
and can thus be used on much larger data bases.
As with any indexing system, of course, there are some disadvantages; once an index is
made, the files that have been indexed can not be changed without remaking the index. Thus
applications are restricted to those making many searches of relatively stable data. Furthermore, these programs depend on hashing, and can only search for exact matches of whole keywords. It is not possible to look for arithmetic or logical expressions (e.g. "date greater than
1970") or for regular expression searching such as that in lex.²
Currently there are two uses of this software, the refer preprocessor to format references,
and the lookall command to search through all text files on the UNIX system.
The remaining sections of this memorandum discuss the searching programs and their
uses. Section 2 explains the operation of the searching algorithm and describes the data collected for use with the lookall command. The more important application, refer, has a user's
description in section 3. Section 4 goes into more detail on reference files for the benefit of
those who wish to add references to data bases or write new troff macros for use with refer. The
options to make refer collect identical citations, or otherwise relocate and adjust references, are
described in section 5. The UNIX manual sections for refer, lookall, and associated commands
are attached as appendices.
2. Searching.
The indexing and searching process is divided into two phases, each made of two parts.
These are shown below.
A. Construct the index.
(1) Find keys - turn the input files into a sequence of tags and keys, where each tag
identifies a distinct item in the input and the keys for each such item are the strings
under which it is to be indexed.
(2) Hash and sort - prepare a set of inverted indexes from which, given a set of keys,
the appropriate item tags can be found quickly.
B. Retrieve an item in response to a query.
(3) Search - given some keys, look through the files prepared by the hashing and sorting facility and derive the appropriate tags.
(4) Deliver - given the tags, find the original items. This completes the searching process.
The first phase, making the index, is presumably done relatively infrequently. It should, of
course, be done whenever the data being indexed change. In contrast, the second phase,
retrieving items, is presumably done often, and must be rapid.
† UNIX is a Trademark of Bell Laboratories.
1. D. Knuth, The Art of Computer Programming: Vol. 3, Sorting and Searching, Addison-Wesley, Reading, Mass. (1977). See section 6.5.
2. M. E. Lesk, "Lex - A Lexical Analyzer Generator," Comp. Sci. Tech. Rep. No. 39, Bell Laboratories, Murray Hill, New Jersey (1975).
An effort is made to separate code which depends on the data being handled from code
which depends on the searching procedure. The search algorithm is involved only in steps (2)
and (3), while knowledge of the actual data files is needed only by steps (1) and (4). Thus it is
easy to adapt to different data files or different search algorithms.
To start with, it is necessary to have some way of selecting or generating keys from input
files. For dealing with files that are basically English, we have a key-making program which
automatically selects words and passes them to the hashing and sorting program (step 2). The
format used has one line for each input item, arranged as follows:
name:start,length (tab) key1 key2 key3 ...
where name is the file name, start is the starting byte number, and length is the number of
bytes in the entry.
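The following Python sketch is an illustration only. It splits one such line into its tag and keys, then uses the tag to fetch the original item, which is the job of step (4). Python and the names parse_key_line and deliver are assumptions made for this example, not part of the programs described here. The tag is split at its last colon, so file names that contain colons still parse.

```python
from typing import List, Tuple


def parse_key_line(line: str) -> Tuple[str, int, int, List[str]]:
    """Split 'name:start,length<TAB>key1 key2 ...' into name, start, length, keys."""
    tag, _, keys = line.rstrip("\n").partition("\t")
    name, _, extent = tag.rpartition(":")          # split at the last colon
    start, length = (int(x) for x in extent.split(","))
    return name, start, length, keys.split()


def deliver(name: str, start: int, length: int) -> bytes:
    """Fetch the original item named by a tag (the 'Deliver' step)."""
    with open(name, "rb") as f:
        f.seek(start)                              # byte offset of the item
        return f.read(length)                      # item length in bytes
```

For instance, the line "refs:120,45" followed by a tab and "knuth sortin search" yields the name refs, start 120, length 45 and three keys.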
These lines are the only input used to make the index. The first field (the file name, byte
position, and byte count) is the tag of the item and can be used to retrieve it quickly. Normally, an item is either a whole file or a section of a file delimited by blank lines. After the
tab, the second field contains the keys. The keys, if selected by the automatic program, are any
alphanumeric strings which are not among the 100 most frequent words in English and which
are not entirely numeric (except for four-digit numbers beginning 19, which are accepted as
dates). Keys are truncated to six characters and converted to lower case. Some selection is
needed if the original items are very large. We normally just take the first n keys, with n less
than 100 or so; this replaces any attempt at intelligent selection. One file in our system is a
complete English dictionary; it would presumably be retrieved for all queries.
To generate an inverted index to the list of record tags and keys, the keys are hashed and
sorted to produce an index. What is wanted, ideally, is a series of lists showing the tags associated with each key. To condense this, what is actually produced is a list showing the tags associated with each hash code, and thus with some set of keys. To speed up access and further
save space, a set of three or possibly four files is produced. These files are:
File      Contents
entry     Pointers to posting file for each hash code
posting   Lists of tag pointers for each hash code
tag       Tags for each item
key       Keys for each item (optional)
The posting file comprises the real data: it contains a sequence of lists of items posted under
each hash code. To speed up searching, the entry file is an array of pointers into the posting
file, one per potential hash code. Furthermore, the items in the lists in the posting file are not
referred to by their complete tag, but just by an address in the tag file, which gives the complete tags. The key file is optional and contains a copy of the keys used in the indexing.
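The layout of these files can be sketched in Python as below. This is a hypothetical model, not the inv program:

- the hash function (hash_key) is an arbitrary choice;
- item numbers stand in for addresses in the tag file;
- instead of terminating each posting list, the sketch stores one extra entry offset, so the list for hash code h is posting[entry[h]:entry[h+1]].

```python
from typing import List, Tuple


def hash_key(key: str, table_size: int = 997) -> int:
    """Illustrative string hash (an assumption; not the function inv uses)."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h


def build_index(items: List[Tuple[str, List[str]]], table_size: int = 997):
    """Build (entry, posting, tags) from (tag, keys) pairs."""
    tags = [tag for tag, _ in items]                  # the tag "file"
    lists: List[List[int]] = [[] for _ in range(table_size)]
    for addr, (_, keys) in enumerate(items):
        # Post each item once under every distinct hash code of its keys.
        for code in sorted({hash_key(k, table_size) for k in keys}):
            lists[code].append(addr)                  # addr stands in for a tag-file address
    entry: List[int] = []
    posting: List[int] = []
    for code_list in lists:
        entry.append(len(posting))                    # where this code's list begins
        posting.extend(code_list)
    entry.append(len(posting))                        # sentinel offset after the last list
    return entry, posting, tags


def posted_under(entry: List[int], posting: List[int], code: int) -> List[int]:
    """Items posted under one hash code (possibly including false drops)."""
    return posting[entry[code]:entry[code + 1]]
```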
The searching process starts with a query, containing several keys. The goal is to obtain
all items which were indexed under these keys. The query keys are hashed, and the pointers in
the entry file used to access the lists in the posting file. These lists are addresses in the tag file
of documents posted under the hash codes derived from the query. The common items from
all lists are determined; this must include the items indexed by every key, but may also contain
some items which are false drops, since items referenced by the correct hash codes need not
actually have contained the correct keys. Normally, if there are several keys in the query, there
are not likely to be many false drops in the final combined list even though each hash code is
somewhat ambiguous. The actual tags are then obtained from the tag file, and to guard against
the possibility that an item has false-dropped on some hash code in the query, the original
items are normally obtained from the delivery program (4) and the query keys checked against
them by string comparison.
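Here is a minimal Python sketch of this two-stage search. It assumes the posting lists are available as sets of item numbers keyed by hash code. It also assumes a caller-supplied function (item_keys, hypothetical) that re-derives the keys of an original item for the false-drop check.

```python
from typing import Callable, Dict, List, Set


def search(query_keys: List[str],
           postings: Dict[int, Set[int]],          # hash code -> item numbers
           hash_key: Callable[[str], int],
           item_keys: Callable[[int], Set[str]]) -> List[int]:
    """Intersect the lists for the query's hash codes, then drop false drops."""
    candidates = None
    for key in query_keys:
        posted = postings.get(hash_key(key), set())
        candidates = posted if candidates is None else candidates & posted
    if not candidates:
        return []
    # A candidate can share every hash code with the query without containing
    # the keys themselves, so compare the keys against the original item.
    wanted = set(query_keys)
    return sorted(i for i in candidates if wanted <= item_keys(i))
```

Hash codes are shared by many keys, so the candidate set after intersection can be larger than the answer. Only the final comparison removes the false drops.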
Usually, therefore, the check for bad drops is made against the original file. However, if
the key derivation procedure is complex, it may be preferable to check against the keys fed to
program (2). In this case the optional key file which contains the keys associated with each
item is generated, and the item tag is supplemented by a string
;start,length
which indicates the starting byte number in the key file and the length of the string of keys for
each item. This file is not usually necessary with the present key-selection program, since the
keys always appear in the original document.
There is also an option (-Cn) for coordination level searching. This retrieves items which
match all but n of the query keys. The items are retrieved in the order of the number of keys
that they match. Of course, n must be less than the number of query keys (nothing is
retrieved unless it matches at least one key).
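Coordination-level retrieval can be sketched as follows. To keep the example short, it uses an exact key-to-items index (a Python dict, hypothetical) instead of hash codes, so no false-drop check is shown. The parameter n is the coordination level described above.

```python
from typing import Dict, List, Set, Tuple


def coordination_search(query: List[str], index: Dict[str, Set[int]],
                        n: int = 0) -> List[Tuple[int, int]]:
    """Return (item, keys matched) for items missing at most n query keys."""
    keys = set(query)
    if not 0 <= n < len(keys):
        raise ValueError("n must be at least 0 and less than the number of query keys")
    matched: Dict[int, int] = {}
    for key in keys:
        for item in index.get(key, set()):
            matched[item] = matched.get(item, 0) + 1
    needed = len(keys) - n                          # an item may miss at most n keys
    hits = [(item, count) for item, count in matched.items() if count >= needed]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))    # most keys matched first
    return hits
```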
As an example, consider one set of 4377 references, comprising 660,000 bytes. This
included 51,000 keys, of which 5,900 were distinct keys. The hash table is kept full to save
space (at the expense of time); 995 of 997 possible hash codes were used. The total set of
index files (no key file) included 171,000 bytes, about 26% of the original file size. It took 8
minutes of processor time to hash, sort, and write the index. To search for a single query with
the resulting index took 1.9 seconds of processor time, while to find the same paper with a
sequential linear search using grep (reading all of the tags and keys) took 12.3 seconds of processor time.
We have also used this software to index all of the English stored on our UNIX system.
This is the index searched by the lookall command. On a typical day there were 29,000 files in
our user file system, containing about 152,000,000 bytes. Of these 5,300 files, containing
32,000,000 bytes (about 21%) were English text. The total number of 'words' (determined
mechanically) was 5,100,000. Of these, 227,000 were selected as keys; 19,000 were distinct,
hashing to 4,900 (of 5,000 possible) different hash codes. The resulting inverted file indexes
used 845,000 bytes, or about 2.6% of the size of the original files. The particularly small
indexes are caused by the fact that keys are taken from only the first 50 non-common words of
some very long input files.
Even this large lookall index can be searched quickly. For example, to find this document
by looking for the keys "lesk inverted indexes" required 1.7 seconds of processor time and system time. By comparison, just to search the 800,000 byte dictionary (smaller than even the
inverted indexes, let alone the 32,000,000 bytes of text files) with grep takes 29 seconds of processor time. The lookall program is thus useful when looking for a document which you
believe is stored on-line, but do not know where. For example, many memos from the Computing Science Research Center are in its UNIX file system, but it is often difficult to guess
where a particular memo might be (it might have several authors, each with many directories,
and have been worked on by a secretary with yet more directories). Instructions for the use of
the lookall command are given in the manual section, shown in the appendix to this memorandum.
The only indexes maintained routinely are those of publication lists and all English files.
To make other indexes, the programs for making keys, sorting them, searching the indexes,
and delivering answers must be used. Since they are usually invoked as parts of higher-level
commands, they are not in the default command directory, but are available to any user in the
directory /usr/lib/refer. Three programs are of interest: mkey, which isolates keys from input
files; inv, which makes an index from a set of keys; and hunt, which searches the index and
delivers the items. Note that the two parts of the retrieval phase are combined into one program, to avoid the excessive system work and delay which would result from running these as
separate processes.
These three commands have a large number of options to adapt to different kinds of
input. The user not interested in the detailed description that now follows may skip to section
3, which describes the refer program, a packaged-up version of these tools specifically oriented
towards formatting references.
Make Keys. The program mkey is the key-making program corresponding to step (1) in
phase A. Normally, it reads its input from the file names given as arguments, and if there are
no arguments it reads from the standard input. It assumes that blank lines in the input delimit
separate items, for each of which a different line of keys should be generated. The lines of
keys are written on the standard output. Keys are any alphanumeric string in the input not
among the most frequent words in English and not entirely numeric (except that all-numeric
strings are acceptable if they are between 1900 and 1999). In the output, keys are translated to
lower case, and truncated to six characters in length; any associated punctuation is removed.
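The selection rules just listed can be approximated in a few lines of Python. This is a sketch under stated assumptions, not mkey itself:

- the common-word list is passed in as a set (mkey reads it from /usr/lib/eign);
- the -l option is not modelled;
- max_keys stands in for -kn.

```python
import re
from typing import List, Optional, Set

WORD = re.compile(r"[A-Za-z0-9]+")   # "alphanumeric strings"


def select_keys(item: str, common: Set[str],
                max_keys: Optional[int] = None) -> List[str]:
    """Pick index keys from one item following the rules described for mkey."""
    keys: List[str] = []
    for word in WORD.findall(item):
        if max_keys is not None and len(keys) >= max_keys:
            break                                  # like -kn: at most n keys
        w = word.lower()
        if w in common:
            continue                               # a frequent English word
        if w.isdigit() and not (len(w) == 4 and w.startswith("19")):
            continue                               # numbers dropped, except 1900-1999
        keys.append(w[:6])                         # truncate to six characters
    return keys
```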
The following flag arguments are recognized by mkey:
-c name    Name of file of common words; default is /usr/lib/eign.
-f name    Read a list of files from name and take each as an input argument.
-i chars   Ignore all lines which begin with '%' followed by any character in chars.
-kn        Use at most n keys per input item.
-ln        Ignore items shorter than n letters long.
-nm        Ignore as a key any word in the first m words of the list of common English words. The default is 100.
-s         Remove the labels (file:start,length) from the output; just give the keys. Used when searching rather than indexing.
-w         Each whole file is a separate item; blank lines in files are irrelevant.
The normal arguments for indexing references are the defaults, which are -c /usr/lib/eign,
-n100, and -l3. For searching, the -s option is also needed. When the big lookall index of
all English files is run, the options are -w, -k50, and -f (filelist). When running on textual
input, the mkey program processes about 1000 English words per processor second. Unless the
-k option is used (and the input files are long enough for it to take effect) the output of mkey
is comparable in size to its input.
Hash and invert. The inv program computes the hash codes and writes the inverted files.
It reads the output of mkey and writes the set of files described earlier in this section. It
expects one argument, which is used as the base name for the three (or four) files to be written. Assuming an argument of Index (the default) the entry file is named Index.ia, the posting
file Index.ib, the tag file Index.ic, and the key file (if present) Index.id. The inv program recognizes the following options:
-a          Append the new keys to a previous set of inverted files, making new files if there is no old set using the same base name.
-d          Write the optional key file. This is needed when you can not check for false drops by looking for the keys in the original inputs, i.e. when the key derivation procedure is complicated and the output keys are not words from the input files.
-hn         The hash table size is n (default 997); n should be prime. Making n bigger saves search time and spends disk space.
-i[u] name  Take input from file name, instead of the standard input; if u is present name is unlinked when the sort is started. Using this option permits the sort scratch space to overlap the disk space used for input keys.
-n          Make a completely new set of inverted files, ignoring previous files.
-p          Pipe into the sort program, rather than writing a temporary input file. This saves disk space and spends processor time.
-v          Verbose mode; print a summary of the number of keys which finished indexing.
About half the time used in inv is in the contained sort. Assuming the sort is roughly
linear, however, a guess at the total timing for inv is 250 keys per second. The space used is
usually of more importance: the entry file uses four bytes per possible hash (note the -h
option), and the tag file around 15-20 bytes per item indexed. Roughly, the posting file contains one item for each key instance and one item for each possible hash code; the items are
two bytes long if the tag file is less than 65536 bytes long, and the items are four bytes wide if
the tag file is greater than 65536 bytes long. To minimize storage, the hash tables should be
over-full; for most of the files indexed in this way, there is no other real choice, since the entry
file must fit in memory.
Searching and Retrieving. The hunt program retrieves items from an index. It combines, as mentioned above, the two parts of phase (B): search and delivery. The reason why it
is efficient to combine delivery and search is partly to avoid starting unnecessary processes, and
partly because the delivery operation must be a part of the search operation in any case.
Because of the hashing, the search part takes place in two stages: first items are retrieved which
have the right hash codes associated with them, and then the actual items are inspected to
determine false drops, i.e. to determine if anything with the right hash codes doesn't really
have the right keys. Since the original item is retrieved to check on false drops, it is efficient to
present it immediately, rather than only giving the tag as output and later retrieving the item
again. If there were a separate key file, this argument would not apply, but separate key files
are not common.
Input to hunt is taken from the standard input, one query per line. Each query should be
in mkey -s output format; all lower case, no punctuation. The hunt program takes one argument which specifies the base name of the index files to be searched. Only one set of index
files can be searched at a time, although many text files may be indexed as a group, of course.
If one of the text files has been changed since the index, that file is searched with fgrep; this
may occasionally slow down the searching, and care should be taken to avoid having many out
of date files. The following option arguments are recognized by hunt:
-a          Give all output; ignore checking for false drops.
-Cn         Coordination level n; retrieve items with not more than n terms of the input missing; default C0, implying that each search term must be in the output items.
-F[ynd]     "-Fy" gives the text of all the items found; "-Fn" suppresses them. "-Fd" where d is an integer gives the text of the first d items. The default is -Fy.
-g          Do not use fgrep to search files changed since the index was made; print an error comment instead.
-i string   Take string as input, instead of reading the standard input.
-ln         The maximum length of internal lists of candidate items is n; default 1000.
-o string   Put text output ("-Fy") in string; of use only when invoked from another program.
-p          Print hash code frequencies; mostly for use in optimizing hash table sizes.
-T[ynd]     "-Ty" gives the tags of the items found; "-Tn" suppresses them. "-Td" where d is an integer gives the first d tags. The default is -Tn.
-t string   Put tag output ("-Ty") in string; of use only when invoked from another program.
The timing of hunt is complex. Normally the hash table is overfull, so that there will be
many false drops on any single term; but a multi-term query will have few false drops on all
terms. Thus if a query is underspecified (one search term) many potential items will be examined and discarded as false drops, wasting time. If the query is overspecified (a dozen search
terms) many keys will be examined only to verify that the single item under consideration has
that key posted. The variation of search time with number of keys is shown in the table below.
Queries of varying length were constructed to retrieve a particular document from the file of
references. In the sequence to the left, search terms were chosen so as to select the desired
paper as quickly as possible. In the sequence on the right, terms were chosen inefficiently, so
that the query did not uniquely select the desired document until four keys had been used.
The same document was the target in each case, and the final set of eight keys are also identical; the differences at five, six and seven keys are produced by measurement error, not by the
slightly different key lists.
Efficient Keys
No. keys   Total drops     Retrieved   Search time
           (incl. false)   documents   (seconds)
1          15              3           1.27
2          1               1           0.11
3          1               1           0.14
4          1               1           0.17
5          1               1           0.19
6          1               1           0.23
7          1               1           0.27
8          1               1           0.29

Inefficient Keys
No. keys   Total drops     Retrieved   Search time
           (incl. false)   documents   (seconds)
1          68              55          5.96
2          29              29          2.72
3          8               8           0.95
4          1               1           0.18
5          1               1           0.21
6          1               1           0.22
7          1               1           0.26
8          1               1           0.29
As would be expected, the optimal search is achieved when the query just specifies the answer;
however, overspecification is quite cheap. Roughly, the time required by hunt can be approximated as 30 milliseconds per search key plus 75 milliseconds per dropped document (whether it
is a false drop or a real answer). In general, overspecification can be recommended; it protects
the user against additions to the data base which turn previously uniquely-answered queries into
ambiguous queries.
The careful reader will have noted an enormous discrepancy between these times and the
earlier quoted time of around 1.9 seconds for a search. The times here are purely for the
search and retrieval: they are measured by running many searches through a single invocation
of the hunt program alone. Usually, the UNIX command processor (the shell) must start both
the mkey and hunt processes for each query, and arrange for the output of mkey to be fed to
the hunt program. This adds a fixed overhead of about 1.7 seconds of processor time to any
single search. Furthermore, remember that all these times are processor times: on a typical
morning on our PDP 11/70 system, with about one dozen people logged on, to obtain 1 second
of processor time for the search program took between 2 and 12 seconds of real time, with a
median of 3.9 seconds and a mean of 4.8 seconds. Thus, although the work involved in a single search may be only 200 milliseconds, after you add the 1.7 seconds of startup processor
time and then assume a 4:1 elapsed/processor time ratio, it will be 8 seconds before any
response is printed.
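The rules of thumb in the last two paragraphs can be written down directly. The Python functions below (names hypothetical) only restate the quoted figures:

- 30 ms per key and 75 ms per dropped document;
- about 1.7 seconds of startup processor time;
- an assumed 4:1 ratio of elapsed to processor time.

```python
def hunt_cpu_seconds(keys: int, drops: int) -> float:
    """Processor time inside hunt: 30 ms per search key + 75 ms per dropped document."""
    return 0.030 * keys + 0.075 * drops


def seconds_until_response(search_cpu: float, startup_cpu: float = 1.7,
                           elapsed_per_cpu: float = 4.0) -> float:
    """Real time before output appears when the shell starts mkey and hunt per query."""
    return (search_cpu + startup_cpu) * elapsed_per_cpu
```

For example, hunt_cpu_seconds(1, 15) gives about 1.155 seconds. The measured value in the first row of the efficient-keys table is 1.27 seconds. seconds_until_response(0.2) gives 7.6 seconds, which is the "8 seconds" estimated above.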
3. Selecting and Formatting References for TROFF
The major application of the retrieval software is refer, which is a troff preprocessor like
eqn.³ It scans its input looking for items of the form
.[
imprecise citation
.]
where an imprecise citation is merely a string of words found in the relevant bibliographic citation. This is translated into a properly formatted reference. If the imprecise citation does not
correctly identify a single paper (either selecting no papers or too many) a message is given.
The data base of citations searched may be tailored to each system, and individual users may
specify their own citation files. On our system, the default data base is accumulated from the
publication lists of the members of our organization, plus about half a dozen personal bibliographies that were collected. The present total is about 4300 citations, but this increases steadily.
Even now, the data base covers a large fraction of local citations.
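Locating the citations is the simplest part of refer's job. Below is a hypothetical Python sketch of that scan. It assumes each .[ and .] appears at the start of its own line and that the citation words may span several lines. It is not refer's code, and it does not do the lookup itself.

```python
from typing import Iterator, List, Optional, Tuple


def find_citations(lines: List[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number of '.[', citation words) for each .[ ... .] block."""
    start: Optional[int] = None
    words: List[str] = []
    for number, line in enumerate(lines, 1):
        if start is None:
            if line.startswith(".["):
                start, words = number, []        # open a citation
        elif line.startswith(".]"):
            yield start, " ".join(words)         # close it; hand back the query
            start = None
        else:
            words.append(line.strip())           # citation text may span lines
    # An unterminated .[ at the end of input is silently dropped in this sketch.
```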
For example, the reference for the eqn paper above was specified as
preprocessor like
.I eqn .
.[
kernighan cherry acm 1975
.]
It scans its input looking for items
This paper was itself printed using refer. The above input text was processed by refer as well as
tbl and troff by the command
refer memo-file | tbl | troff -ms
and the reference was automatically translated into a correct citation to the ACM paper on
mathematical typesetting.
The procedure to use to place a reference in a paper using refer is as follows. First, use
the lookbib command to check that the paper is in the data base and to find out what keys are
necessary to retrieve it. This is done by typing lookbib and then typing some potential queries
until a suitable query is found. For example, had one started to find the eqn paper shown
above by presenting the query
$ lookbib
kernighan cherry
(EOT)
lookbib would have found several items; experimentation would quickly have shown that the
query given above is adequate. Overspecifying the query is of course harmless; it is even desirable, since it decreases the risk that a document added to the publication data base in the future
will be retrieved in addition to the intended document. The extra time taken by even a grossly
overspecified query is quite small. A particularly careful reader may have noticed that "acm"
does not appear in the printed citation; we have supplemented some of the data base items with
extra keywords, such as common abbreviations for journals or other sources, to aid in searching.
If the reference is in the data base, the query that retrieved it can be inserted in the text,
between .[ and .] brackets. If it is not in the data base, it can be typed into a private file of
references, using the format discussed in the next section, and then the -p option used to
search this private file. Such a command might read (if the private references are called myfile)
refer -p myfile document | tbl | eqn | troff -ms ...
where tbl and/or eqn could be omitted if not needed. The use of the -ms macros⁴ or some
other macro package, however, is essential. Refer only generates the data for the references;
exact formatting is done by some macro package, and if none is supplied the references will not
be printed.
3. B. W. Kernighan and L. L. Cherry, "A System for Typesetting Mathematics," Comm. Assoc. Comp. Mach. 18 (March 1975).
By default, the references are numbered sequentially, and the -ms macros format references as footnotes at the bottom of the page. This memorandum is an example of that style.
Other possibilities are discussed in section 5 below.
4. Reference Files.
A reference file is a set of bibliographic references usable with refer. It can be indexed
using the software described in section 2 for fast searching. What refer does is to read the
input document stream, looking for imprecise citation references. It then searches through
reference files to find the full citations and inserts them into the document. The format of the
full citation is arranged to make it convenient for a macro package, such as the -ms macros, to
format the reference for printing. Since the format of the final reference is determined by the
desired style of output, which is determined by the macros used, refer avoids forcing any kind
of reference appearance. All it does is define a set of string registers which contain the basic
information about the reference, and provide a macro call which is expanded by the macro
package to format the reference. It is the responsibility of the final macro package to see that
the reference is actually printed; if no macros are used, and the output of refer fed untranslated
to troff, nothing at all will be printed.
The strings defined by refer are taken directly from the files of references, which are in
the following format. The references should be separated by blank lines. Each reference is a
sequence of lines beginning with % and followed by a key-letter. The remainder of that line,
and successive lines until the next line beginning with %, contain the information specified by
the key-letter. In general, refer does not interpret the information, but merely presents it to
the macro package for final formatting. A user with a separate macro package, for example, can
add new key-letters or use the existing ones for other purposes without bothering refer.
The meaning of the key-letters given below, in particular, is that assigned by the -ms
macros. Not all information, obviously, is used with each citation. For example, if a document
is both an internal memorandum and a journal article, the macros ignore the memorandum version and cite only the journal article. Some kinds of information are not used at all in printing
the reference; if a user does not like finding references by specifying title or author keywords,
and prefers to add specific keywords to the citation, a field is available which is searched but not
printed (K).
The key letters currently recognized by refer and -ms, with the kind of information
implied, are:
4. M. E. Lesk, Typing Documents on UNIX and GCOS: The -ms Macros for Troff, Bell Laboratories internal memorandum (1977).
Key          Information specified
A            Author's name
B            Title of book containing item
C            City of publication
D            Date
E            Editor of book containing item
G            Government (NTIS) ordering number
I            Issuer (publisher)
J            Journal name
K            Keys (for searching)
L            Label
M            Memorandum label
N            Issue number
O            Other information
P            Page(s) of article
R            Technical report reference
T            Title
V            Volume number
X or Y or Z  Information not used by refer
For example, a sample reference could be typed as:
%T Bounds on the Complexity of the Maximal
Common Subsequence Problem
%Z ctr127
%A A. V. Aho
%A D. S. Hirschberg
%A J. D. Ullman
%J J. ACM
%V 23
%N 1
%P 1-12
%M abcd-78
%D Jan. 1976
Order is irrelevant, except that authors are shown in the order given. The output of refer is a
stream of string definitions, one for each of the fields of each reference, as shown below.
.]-
.ds [A authors' names ...
.ds [T title ...
.ds [J journal ...
.][ type-number
The refer program, in general, does not concern itself with the significance of the strings. The
different fields are treated identically by refer, except that the X, Y and Z fields are ignored
(see the -i option of mkey) in indexing and searching. All refer does is select the appropriate
citation, based on the keys. The macro package must arrange the strings so as to produce an
appropriately formatted citation. In this process, it uses the convention that the 'T' field is the
title, the 'J' field the journal, and so forth.
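The following minimal Python sketch makes the input and output formats concrete. It is not refer's implementation. It reads one reference in the %-key-letter format and emits string definitions of the kind shown above, bracketed by .]- and .][. The text does not say how refer combines repeated fields such as several %A lines, so the sketch simply joins them with commas. That choice, the continuation-line handling, and the names parse_reference and to_troff are all assumptions. Doubled %% fields are not handled.

```python
def parse_reference(text):
    """Collect '%X value' lines into (key-letter, [values]) pairs, in input order.

    A non-blank line that does not start with '%' continues the previous field.
    """
    fields = {}
    order = []
    current = None
    for line in text.splitlines():
        if line.startswith("%") and len(line) > 1:
            current = line[1]
            if current not in fields:
                fields[current] = []
                order.append(current)
            fields[current].append(line[2:].strip())
        elif current is not None and line.strip():
            fields[current][-1] += " " + line.strip()
    return [(key, fields[key]) for key in order]


def to_troff(fields, type_number):
    """Emit .ds string definitions between .]- and .][ type-number."""
    out = [".]-"]
    for key, values in fields:
        # Assumption: repeated fields (e.g. several authors) are joined by commas.
        out.append(".ds [%s %s" % (key, ", ".join(values)))
    out.append(".][ %d" % type_number)
    return "\n".join(out)


sample = """%T Bounds on the Complexity of the Maximal
Common Subsequence Problem
%A A. V. Aho
%A D. S. Hirschberg
%A J. D. Ullman
%J J. ACM
%V 23"""

print(to_troff(parse_reference(sample), 1))  # type 1 = journal article
```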
The refer program does arrange the citation to simplify the macro package's job, however.
The special macro .]- precedes the string definitions and the special macro .][ follows. These
are changed from the input .[ and .] so that running the same file through refer again is harmless. The .]- macro can be used by the macro package to initialize. The .][ macro, which
should be used to print the reference, is given an argument type-number to indicate the kind of
reference, as follows:
Value  Kind of reference
1      Journal article
2      Book
3      Article within book
4      Technical report
5      Bell Labs technical memorandum
0      Other
The type is determined by the presence or absence of particular fields in the citation (a journal
article must have a 'J' field, a book must have an 'I' field, and so forth). To a small extent,
this violates the above rule that refer does not concern itself with the contents of the citation;
however, the classification of the citation in troff macros would require a relatively expensive
and obscure program. Any macro writer may, of course, preserve consistency by ignoring the
argument to the .][ macro.
The reference is flagged in the text with the sequence
\*([.number\*(.]
where number is the footnote number. The strings [. and .] should be used by the macro
package to format the reference flag in the text. These strings can be replaced for a particular
footnote, as described in section 5. The footnote number (or other signal) is available to the
reference macro .][ as the string register [F. To simplify dealing with a text reference that
occurs at the end of a sentence, refer treats a reference which follows a period in a special way.
The period is removed, and the reference is preceded by a call for the string <. and followed
by a call for the string >. For example, if a reference follows "end." it will appear as
end\*(<.\*([.number\*(.]\*(>.
where number is the footnote number. The macro package should turn either the string >. or
<. into a period and delete the other one. This permits the output to have either the form
"end[31]." or "end.31" as the macro package wishes. Note that in one case the period precedes the number and in the other it follows the number.
In some cases users wish to suspend the searching, and merely use the reference macro
formatting. That is, the user doesn't want to provide a search key between .[ and .] brackets,
but merely the reference lines for the appropriate document. Alternatively, the user may wish
to add a few fields to those in the reference as in the standard file, or override some fields.
Altering or replacing fields, or supplying whole references, is easily done by inserting lines
beginning with %; any such line is taken as direct input to the reference processor rather than
keys to be searched. Thus
.[
key1 key2 key3 ...
%Q New format item
%R Override report name
.]
makes the indicated changes to the result of searching for the keys. All of the search keys must
be given before the first % line.
If no search keys are provided, an entire citation can be provided in-line in the text. For
example, if the eqn paper citation were to be inserted in this way, rather than by searching for
it in the data base, the input would read
preprocessor like
.I eqn.
.[
%A B. W. Kernighan
%A L. L. Cherry
%T A System for Typesetting Mathematics
%J Comm. ACM
%V 18
%N 3
%P 151-157
%D March 1975
.]
It scans its input looking for items
This would produce a citation of the same appearance as that resulting from the file search.
As shown, fields are normally turned into troff strings. Sometimes users would rather
have them defined as macros, so that other troff commands can be placed into the data. When
this is necessary, simply double the control character % in the data. Thus the input
.[
%V 23
%%M
Bell Laboratories,
Murray Hill, N.J. 07974
.]
is processed by refer into
.ds [V 23
.de [M
Bell Laboratories,
Murray Hill, N.J. 07974
The information after %%M is defined as a macro to be invoked by .[M while the information
after %V is turned into a string to be invoked by \*([V. At present -ms expects all information as strings.
5. Collecting References and other Refer Options
Normally, the combination of refer and -ms formats output as troff footnotes which are
consecutively numbered and placed at the bottom of the page. However, options exist to place
the references at the end, to arrange references alphabetically by senior author, and to indicate
references by strings in the text of the form [Name1975a] rather than by number. Whenever
references are not placed at the bottom of a page, identical references are coalesced.
For example, the -e option to refer specifies that references are to be collected; in this
case they are output whenever the sequence
.[
$LIST$
.]
is encountered. Thus, to place references at the end of a paper, the user would run refer with
the -e option and place the above $LIST$ command after the last line of the text. Refer will
then move all the references to that point. To aid in formatting the collected references, refer
writes the references preceded by the line
.]<
and followed by the line
.]>
to invoke special macros before and after the references.
Another possible option to refer is the -s option to specify sorting of references. The
default, of course, is to list references in the order presented. The -s option implies the -e
option, and thus requires a
.[
$LIST$
.]
entry to call out the reference list. The -s option may be followed by a string of letters,
numbers, and '+' signs indicating how the references are to be sorted. The sort is done using
the fields whose key-letters are in the string as sorting keys; the numbers indicate how many of
the fields are to be considered, with '+' taken as a large number. Thus the default is -sAD,
meaning "Sort on senior author, then date." To sort on all authors and then title, specify
-sA+T. And to sort on two authors and then the journal, write -sA2J.
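The -s argument syntax can be modelled with a short Python sketch. It makes two assumptions: a letter with no following count means one field, so -sAD sorts on the senior author only, and '+' is represented by an arbitrary large constant. The names parse_sort_spec, sort_key and LARGE are hypothetical. Real refer also interprets names and dates more carefully than the plain string comparison used here.

```python
import re

LARGE = 1000  # stand-in for "a large number"; the value refer uses is not stated


def parse_sort_spec(spec):
    """Turn a -s argument such as 'AD', 'A+T' or 'A2J' into (key-letter, count) pairs."""
    keys = []
    for letter, count in re.findall(r"([A-Za-z])(\d+|\+)?", spec):
        if count == "+":
            keys.append((letter, LARGE))
        elif count:
            keys.append((letter, int(count)))
        else:
            keys.append((letter, 1))  # assumed: no count means the first field only
    return keys


def sort_key(ref, spec):
    """Build a sort key from the first n values of each named field."""
    return tuple(tuple(ref.get(letter, [])[:n]) for letter, n in parse_sort_spec(spec))


# Authors are stored here as last names only, to keep the comparison meaningful.
refs = [
    {"A": ["Kernighan", "Cherry"], "D": ["1975"]},
    {"A": ["Aho", "Hirschberg", "Ullman"], "D": ["1976"]},
]
for ref in sorted(refs, key=lambda r: sort_key(r, "AD")):
    print(ref["A"][0], ref["D"][0])
```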
Other options to refer change the signal or label inserted in the text for each reference.
Normally these are just sequential numbers, and their exact placement (within brackets, as
superscripts, etc.) is determined by the macro package. The -l option replaces reference
numbers by strings composed of the senior author's last name, the date, and a disambiguating
letter. If a number follows the l, as in -l3, only that many letters of the last name are used in
the label string. To abbreviate the date as well, the form -lm,n shortens the last name to the
first m letters and the date to the last n digits. For example, the option -l3,2 would refer to
the eqn paper (reference 3) by the signal Ker75a, since it is the first cited reference by Kernighan in 1975.
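The following Python sketch shows how an -lm,n signal such as Ker75a could be built. It assumes the input is the list of distinct references in order of first citation, each reduced to a (senior author's last name, year) pair. It also assumes the disambiguating letter counts references that share the same shortened label. The function name make_labels is hypothetical.

```python
from collections import defaultdict


def make_labels(citations, m=None, n=None):
    """citations: (last_name, year) pairs for distinct references, in citation order.

    m keeps the first m letters of the name and n keeps the last n digits of the
    year (n must be at least 1); None keeps the whole name or year.
    """
    used = defaultdict(int)  # shortened label -> how many times it has been issued
    labels = []
    for last_name, year in citations:
        name = last_name if m is None else last_name[:m]
        date = year if n is None else year[-n:]
        base = name + date
        labels.append(base + chr(ord("a") + used[base]))  # a, b, c, ...
        used[base] += 1
    return labels


print(make_labels([("Kernighan", "1975"), ("Aho", "1976"), ("Kernighan", "1975")], 3, 2))
```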
A user wishing to specify particular labels for a private bibliography may use the -k
option. Specifying -kx causes the field x to be used as a label. The default is L. If this field
ends in -, that character is replaced by a sequence letter; otherwise the field is used exactly as
given.
If none of the refer-produced signals are desired, the -b option entirely suppresses
automatic text signals.
If the user wishes to override the -ms treatment of the reference signal (which is normally to enclose the number in brackets in nroff and make it a superscript in troff) this can be
done easily. If the lines .[ or .] contain anything following these characters, the remainders of
these lines are used to surround the reference signal, instead of the default. Thus, for example, to say "See reference (2)." and avoid "See reference.2" the input might appear
See reference
.[ (
imprecise citation ...
.]).
Note that blanks are significant in this construction. If a permanent change is desired in the
style of reference signals, however, it is probably easier to redefine the strings [. and .] (which
are used to bracket each signal) than to change each citation.
Although normally refer limits itself to retrieving the data for the reference, and leaves to
a macro package the job of arranging that data as required by the local format, there are two
special options for rearrangements that cannot be done by macro packages. The -c option
puts fields into all upper case (CAPS-SMALL CAPS in troff output). The key-letters indicating
what information is to be translated to upper case follow the c, so that -cAJ means that
authors' names and journals are to be in caps. The -a option writes the names of authors last
name first, that is, A. D. Hall, Jr. is written as Hall, A. D. Jr. The citation form of the Journal
of the ACM, for example, would require both -cA and -a options. This produces authors'
names in the style KERNIGHAN, B. W. AND CHERRY, L. L. for the previous example. The -a
option may be followed by a number to indicate how many author names should be reversed;
-a1 (without any -c option) would produce Kernighan, B. W. and L. L. Cherry, for example.
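The effect of -a and -an on author names can be illustrated with a small Python sketch that reproduces the examples in the text. Two parts are assumptions: a comma is taken to start a suffix such as Jr., and several names are joined with commas and a final "and". The function names are hypothetical, and the upper-casing done by -c is left out.

```python
def reverse_name(name):
    """'A. D. Hall, Jr.' -> 'Hall, A. D. Jr.' (a suffix is assumed to follow a comma)."""
    suffix = ""
    if ", " in name:
        name, suffix = name.split(", ", 1)
    parts = name.split()
    if len(parts) < 2:
        return (name + " " + suffix).strip()
    result = parts[-1] + ", " + " ".join(parts[:-1])
    return result + (" " + suffix if suffix else "")


def format_authors(names, reverse_count=None):
    """Reverse the first reverse_count names (all of them by default), then join."""
    if reverse_count is None:
        reverse_count = len(names)
    shown = [reverse_name(n) if i < reverse_count else n for i, n in enumerate(names)]
    if len(shown) < 2:
        return "".join(shown)
    return ", ".join(shown[:-1]) + " and " + shown[-1]


print(reverse_name("A. D. Hall, Jr."))
print(format_authors(["B. W. Kernighan", "L. L. Cherry"], 1))
```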
Finally, there is also the previously-mentioned -p option to let the user specify a private
file of references to be searched before the public files. Note that refer does not insist on a previously made index for these files. If a file is named which contains reference data but is not
indexed, it will be searched (more slowly) by refer using fgrep. In this way it is easy for users to
keep small files of new references, which can later be added to the public data bases.
CHAPTER
5
COMMAND REFERENCE
Included in this chapter are the XENIX Programmer's Manual
manual pages for commands discussed in this manual. They
have been included here for completeness.
INTRO(1)    XENIX Text Processing    INTRO(1)
NAME
intro - introduction to commands
DESCRIPTION
This section describes publicly accessible commands in
alphabetic order. Certain distinctions of purpose are made
in the headings:
(1)  Commands of general utility.
(1C) Commands for communication with other systems.
(1G) Commands used primarily for graphics and computer-aided design.
(1M) Commands used primarily for system maintenance.
The word 'local' at the foot of a page means that the command is not intended for general distribution.
SEE ALSO
Section (6) for computer games.
How to get started, in the Introduction.
DIAGNOSTICS
Upon termination each command returns two bytes of status,
one supplied by the system giving the cause for termination,
and (in the case of 'normal' termination) one supplied by
the program; see wait(2) and exit(2). The former byte is 0 for
normal termination; the latter is customarily 0 for successful execution, nonzero to indicate troubles such as erroneous parameters, bad or inaccessible data, or other inability
to cope with the task at hand. It is called variously 'exit
code', 'exit status' or 'return code', and is described only
where special conventions are involved.
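The two status bytes can be separated with simple bit operations. This Python sketch assumes the traditional UNIX layout returned by wait(2): the system's byte sits in the low eight bits and the program's exit code in the high eight bits. The function name decode_status is hypothetical. On a modern UNIX system, Python's os.WIFEXITED and os.WEXITSTATUS perform the same decoding.

```python
def decode_status(status):
    """Split a 16-bit wait status into (system byte, program byte)."""
    system_byte = status & 0xFF          # cause of termination; 0 means normal
    program_byte = (status >> 8) & 0xFF  # exit code; meaningful only if system_byte == 0
    return system_byte, program_byte


for status in (0, 1 << 8, 9):  # success, exit(1), terminated by signal 9
    cause, code = decode_status(status)
    if cause == 0:
        print("normal termination, exit code", code)
    else:
        print("abnormal termination, cause", cause)
```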
AWK(1)
NAME
awk - pattern scanning and processing language
SYNTAX
awk [ -Fc ] [ prog ] [ file ] ...
DESCRIPTION
Awk scans each input file for lines that match any of a set
of patterns specified in prog. With each pattern in prog
there can be an associated action that will be performed
when a line of a file matches the pattern. The set of patterns may appear literally as prog, or in a file specified
as -f file.
Files are read in order; if there are no files, the standard
input is read. The file name '-' means the standard input.
Each line is matched against the pattern portion of every
pattern-action statement; the associated action is performed
for each matched pattern.
An input line is made up of fields separated by white space.
(This default can be changed by using FS; see below.) The
fields are denoted $1, $2, ...; $0 refers to the entire
line.
A pattern-action statement has the form
pattern { action }
A missing { action} means print the line; a missing pattern
always matches.
An action is a sequence of statements. A statement can be
one of the following:
if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
break
continue
{ [ statement ] ... }
variable = expression
print [ expression-list ] [ >expression ]
printf format [ , expression-list ] [ >expression ]
next # skip remaining patterns on this input line
exit # skip the rest of the input
Statements are terminated by semicolons, newlines or right
braces. An empty expression-list stands for the whole line.
Expressions take on string or numeric values as appropriate,
and are built using the operators +, -, *, /, %, and concatenation (indicated by a blank). The C operators ++, --,
+=, -=, *=, /=, and %= are also available in expressions.
Variables may be scalars, array elements (denoted x[i]) or
fields.
Variables are initialized to the null string.
Array subscripts may be any string, not necessarily numeric;
this allows for a form of associative memory.
String constants are quoted "...".
The print statement prints its arguments on the standard
output (or on a file if >file is present), separated by the
current output field separator, and terminated by the output
record separator. The printf statement formats its expression list according to the format (see printf(3)).
The built-in function length returns the length of its argument taken as a string, or of the whole line if no argument.
There are also built-in functions exp, log, sqrt, and int.
The last truncates its argument to an integer.
substr(s, m, n) returns the n-character substring of s that
begins at position m. The function
sprintf(fmt, expr, expr, ...) formats the expressions
according to the printf(3) format given by fmt and returns
the resulting string.
Patterns are arbitrary Boolean combinations (!, ||, &&, and
parentheses) of regular expressions and relational expressions.
Regular expressions must be surrounded by slashes
and are as in egrep.
Isolated regular expressions in a pattern apply to the entire line.
Regular expressions may also
occur in relational expressions.
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines between
an occurrence of the first pattern and the next occurrence
of the second.
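The range-pattern rule is easy to misread, so here is a Python sketch of how a pattern such as /start/, /stop/ behaves. The lines that open and close a range are both included. Following the usual awk behaviour, the closing pattern is also tested on the opening line, and a range that is never closed runs to the end of the input. Python's re module stands in for egrep-style expressions, and the function name range_select is hypothetical.

```python
import re


def range_select(lines, start, stop):
    """Select lines from each match of start through the next match of stop."""
    selected = []
    active = False
    for line in lines:
        if not active and re.search(start, line):
            active = True
        if active:
            selected.append(line)
            if re.search(stop, line):  # also checked on the line that opened the range
                active = False
    return selected


text = ["x", "start 1", "a", "stop 1", "y", "start 2", "b"]
print(range_select(text, "start", "stop"))
```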
A relational expression is one of the following:
expression matchop regular-expression
expression relop expression
where a relop is any of the six relational operators in C,
and a matchop is either ~ (for contains) or !~ (for does not
contain). A conditional is an arithmetic expression, a
relational expression, or a Boolean combination of these.
The special patterns BEGIN and END may be used to capture
control before the first input line is read and after the
last.
BEGIN must be the first pattern, END the last.
A single character c may be used to separate the fields by
starting the program with
BEGIN { FS = "c" }
or by using the -Fc option.
Other variable names with special meanings include NF, the
number of fields in the current record; NR, the ordinal
number of the current record; FILENAME, the name of the
current input file; OFS, the output field separator (default
blank); ORS, the output record separator (default newline);
and OFMT, the output format for numbers (default "%.6g").
EXAMPLES
Print lines longer than 72 characters:
length > 72
Print first two fields in opposite order:
{ print $2, $1 }
Add up first column, print sum and average:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
Print fields in reverse order:
{ for (i = NF; i > 0; --i) print $i }
Print all lines between start/stop pairs:
/start/, /stop/
Print all lines whose first field is different from previous
one:
$1 != prev { print; prev = $1 }
SEE ALSO
lex(1), sed(1)
A. V. Aho, B. W. Kernighan, P. J. Weinberger, Awk - a pattern scanning and processing language
NOTES
There are no explicit conversions between numbers and
strings. To force an expression to be treated as a number,
add 0 to it; to force it to be treated as a string,
concatenate "" to it.
COL(1)
NAME
col - filter reverse line feeds
SYNTAX
col [-bfx]
DESCRIPTION
Col reads the standard input and writes the standard output.
It performs the line overlays implied by reverse line feeds
(ESC-7 in ASCII) and by forward and reverse half line feeds
(ESC-9 and ESC-8). Col is particularly useful for filtering
multicolumn output made with the '.rt' command of nroff and
output resulting from use of the tbl(1) preprocessor.
Although col accepts half line motions in its input, it normally does not emit them on output.
Instead, text that
would appear between lines is moved to the next lower full
line boundary. This treatment can be suppressed by the -f
(fine) option; in this case the output from col may contain
forward half line feeds (ESC-9), but will still never contain either kind of reverse line motion.
If the -b option is given, col assumes that the output device in use is not capable of backspacing.
In this case, if
several characters are to appear in the same place, only the
last one read will be taken.
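The -b rule (the last character read for a position wins) can be sketched in Python for a single line that contains backspaces. This is an illustration, not col's code. It assumes a space only moves the print position without overwriting anything, and it ignores tabs, half-line motions and the SO/SI character sets. The function name is hypothetical.

```python
def overstrike_last_wins(line):
    """Resolve backspace overstrikes in one line, keeping the last character per column."""
    cells = {}  # column -> character
    col = 0
    for ch in line:
        if ch == "\b":
            col = max(col - 1, 0)
        elif ch == " ":
            col += 1  # assumption: a space moves right without erasing
        else:
            cells[col] = ch  # a later character replaces an earlier one
            col += 1
    width = max(cells) + 1 if cells else 0
    return "".join(cells.get(i, " ") for i in range(width))


print(overstrike_last_wins("_\bA_\bB"))  # underlined "AB" becomes plain "AB"
```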
The control characters SO (ASCII code 016) and SI (017) are
assumed to start and end text in an alternate character set.
The character set (primary or alternate) associated with
each printing character read is remembered; on output, SO
and SI characters are generated where necessary to maintain
the correct treatment of each character.
Col normally converts white space to tabs to shorten printing time.
If the -x option is given, this conversion is
suppressed.
All control characters are removed from the input except
space, backspace, tab, return, newline, ESC (033) followed
by one of 7, 8, 9, SI, SO, and VT (013). This last character is
an alternate form of full reverse line feed, for compatibility with some other hardware conventions. All other nonprinting characters are ignored.
SEE ALSO
troff(1), tbl(1), greek(1)
NOTES
Can't back up more than 128 lines.
No more than 800 characters, including backspaces, on a
line.
COMM(1)
NAME
comm - select or reject lines common to two sorted files
SYNTAX
comm [ - [ 123 ] ] file1 file2
DESCRIPTION
Comm reads file1 and file2, which should be ordered in ASCII
collating sequence, and produces a three column output:
lines only in file1; lines only in file2; and lines in both
files. The filename '-' means the standard input.
Flags 1, 2, or 3 suppress printing of the corresponding
column. Thus comm -12 prints only the lines common to the
two files; comm -23 prints only lines in the first file but
not in the second; comm -123 is a no-op.
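The three-column classification can be sketched as a single merge of two sorted lists. This Python version returns (column, line) pairs rather than producing comm's tab-indented layout. It uses Python's string ordering, which agrees with ASCII collating order for ASCII text. The function name comm and its suppress parameter are hypothetical.

```python
def comm(lines1, lines2, suppress=""):
    """Tag each line 1 (only in first), 2 (only in second) or 3 (in both)."""
    i = j = 0
    out = []
    while i < len(lines1) or j < len(lines2):
        if j >= len(lines2) or (i < len(lines1) and lines1[i] < lines2[j]):
            col, line = 1, lines1[i]
            i += 1
        elif i >= len(lines1) or lines2[j] < lines1[i]:
            col, line = 2, lines2[j]
            j += 1
        else:  # equal lines: present in both files
            col, line = 3, lines1[i]
            i += 1
            j += 1
        if str(col) not in suppress:  # like the -1, -2, -3 flags
            out.append((col, line))
    return out


a = ["apple", "banana", "cherry"]
b = ["banana", "cherry", "date"]
print(comm(a, b, "12"))  # the equivalent of comm -12: lines common to both
```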
SEE ALSO
cmp(1), diff(1), uniq(1)
CTAGS(1)
NAME
ctags - create a tags file
SYNTAX
ctags [ -u ] [ -v ] [ -w ] [ -x ] name ...
DESCRIPTION
Ctags makes a tags file for ex(1) from the specified C, Pascal and Fortran sources. A tags file gives the locations of
specified objects (in this case functions) in a group of
files. Each line of the tags file contains the function
name, the file in which it is defined, and a scanning pattern used to find the function definition. These are given
in separate fields on the line, separated by blanks or tabs.
Using the tags file, ex can quickly find these function
definitions.
If the -x flag is given, ctags produces a list of function
names, the line number and file name on which each is
defined, as well as the text of that line and prints this on
the standard output. This is a simple index which can be
printed out as an off-line readable function index.
If the -v flag is given, an index of the form expected by
vgrind(l) is produced on the standard output. This listing
contains the function name, file name, and page number
(assuming 64 line pages). Since the output will be sorted
into lexicographic order, it may be desired to run the output through sort -f. Sample use:
ctags -v files | sort -f > index
vgrind -x index
Files whose name ends in .c or .h are assumed to be C source
files and are searched for C routine and macro definitions.
Others are first examined to see if they contain any Pascal
or Fortran routine definitions; if not, they are processed
again looking for C definitions.
Other options are:
-w
suppressing warning diagnostics.
-u
causing the specified files to be updated in tags, that
is, all references to them are deleted, and the new
values are appended to the file.
(Beware: this option
is implemented in a way which is rather slow; it is
usually faster to simply rebuild the tags file.)
The tag main is treated specially in C programs. The tag
formed is created by prepending M to the name of the file,
with a trailing .c removed, if any, and leading pathname
components also removed. This makes use of ctags practical
in directories with more than one program.
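The naming rule for the main tag is small enough to state as code. Here is a Python sketch, assuming '/'-separated path names; the function name main_tag is hypothetical.

```python
import posixpath


def main_tag(filename):
    """Tag name for a C program's main: M + file name without its path or trailing .c."""
    base = posixpath.basename(filename)  # remove leading pathname components
    if base.endswith(".c"):
        base = base[:-2]                 # remove a trailing .c, if any
    return "M" + base


print(main_tag("src/cmd/ls.c"))  # Mls
print(main_tag("prog"))          # Mprog
```

Two programs such as ls.c and cat.c in the same directory thus get the distinct tags Mls and Mcat instead of two conflicting main entries.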
FILES
tags
output tags file
SEE ALSO
ex(1), vi(1)
AUTHOR
Ken Arnold; FORTRAN added by Jim Kleckner; Bill Joy added
Pascal and -x, replacing cxref.
NOTES
Recognition of functions, subroutines and procedures for
FORTRAN and Pascal is done in a very simple-minded way. No
attempt is made to deal with block structure; if you have
two Pascal procedures in different blocks with the same name
you lose.
The method of deciding whether to look for C or Pascal and
FORTRAN functions is a hack.
DEROFF(1)
NAME
deroff - remove nroff, troff, tbl and eqn constructs
SYNTAX
deroff [ -w ] file ...
DESCRIPTION
Deroff reads each file in sequence and removes all nroff and
troff command lines, backslash constructions, macro definitions, eqn constructs (between '.EQ' and '.EN' lines or
between delimiters), and table descriptions and writes the
remainder on the standard output. Deroff follows chains of
included files ('.so' and '.nx' commands); if a file has
already been included, a '.so' is ignored and a '.nx' terminates execution. If no input file is given, deroff reads
from the standard input file.
If the -w flag is given, the output is a word list, one
'word' (string of letters, digits, and apostrophes, beginning with a letter; apostrophes are removed) per line, and
all other characters ignored. Otherwise, the output follows
the original, with the deletions mentioned above.
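The -w definition of a word can be expressed as a regular expression. This Python sketch applies only that rule, to text whose troff constructs have already been removed. It approximates the stated rule and is not deroff's own scanner. The names WORD and word_list are hypothetical.

```python
import re

# A letter, then any run of letters, digits and apostrophes.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9']*")


def word_list(text):
    """Return one entry per word, with apostrophes removed."""
    return [m.group(0).replace("'", "") for m in WORD.finditer(text)]


print("\n".join(word_list("Don't stop at 2 o'clock, x1!")))
```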
SEE ALSO
troff(1), eqn(1), tbl(1)
NOTES
Deroff is not a complete troff interpreter, so it can be
confused by subtle constructs. Most errors result in too
much rather than too little output.
DIFF(1)
NAME
diff - differential file comparator
SYNTAX
diff [ -efbh ] file1 file2
DESCRIPTION
Diff tells what lines must be changed in two files to bring
them into agreement. If file1 (file2) is '-', the standard
input is used. If file1 (file2) is a directory, then a file
in that directory whose file-name is the same as the filename of file2 (file1) is used.
The normal output contains lines of these forms:
n1 a n3,n4
n1,n2 d n3
n1,n2 c n3,n4
These lines resemble ed commands to convert file1 into
file2. The numbers after the letters pertain to file2. In
fact, by exchanging 'a' for 'd' and reading backward one may
ascertain equally how to convert file2 into file1. As in
ed, identical pairs where n1 = n2 or n3 = n4 are abbreviated
as a single number.
Following each of these lines come all the lines that are
affected in the first file flagged by '<', then all the
lines that are affected in the second file flagged by '>'.
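The Python sketch below shows how the three kinds of lines fit together, producing output of the same shape from difflib.SequenceMatcher. Note that difflib's matching heuristic is not the algorithm diff uses and need not find a smallest set of differences. The commands are printed without spaces (for example 2c2), as diff itself prints them. The sketch also adds the --- line that diff places between the two halves of a change. The helper names rng and normal_diff are hypothetical.

```python
import difflib


def rng(lo, hi):
    """Format a 1-based inclusive range, abbreviating n,n to n."""
    return str(lo) if lo == hi else "%d,%d" % (lo, hi)


def normal_diff(a, b):
    """Return diff-style output lines converting list a into list b."""
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            out.append("%da%s" % (i1, rng(j1 + 1, j2)))
        elif tag == "delete":
            out.append("%sd%d" % (rng(i1 + 1, i2), j1))
        else:  # "replace" corresponds to diff's c
            out.append("%sc%s" % (rng(i1 + 1, i2), rng(j1 + 1, j2)))
        out.extend("< " + line for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in b[j1:j2])
    return out


print("\n".join(normal_diff(["a", "b", "c", "d"], ["a", "x", "c", "d", "e"])))
```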
The -b option causes trailing blanks (spaces and tabs) to be
ignored and other strings of blanks to compare equal.
The -e option produces a script of a, c and d commands for
the editor ed, which will recreate file2 from file1. The -f
option produces a similar script, not useful with ed, in the
opposite order. In connection with -e, the following shell
program may help maintain multiple versions of a file. Only
an ancestral file ($1) and a chain of version-to-version ed
scripts ($2, $3, ...) made by diff need be on hand. A 'latest
version' appears on the standard output.
(shift; cat $*; echo '1,$p') | ed - $1
Except in rare circumstances, diff finds a smallest sufficient set of file differences. Option -h does a fast, half-hearted job. It works only when
changed stretches are short and well separated, but does
work on files of unlimited length. Options -e and -f are
unavailable with -h.
FILES
/tmp/d?????
/usr/lib/diffh for -h
SEE ALSO
cmp(1), comm(1), ed(1)
DIAGNOSTICS
Exit status is 0 for no differences, 1 for some, 2 for trouble.
NOTES
Editing scripts produced under the -e or -f option are naive
about creating lines consisting of a single '.'.
DIFF3(1)
NAME
diff3 - 3-way differential file comparison
SYNTAX
diff3 [ -ex3 ] file1 file2 file3
DESCRIPTION
Diff3 compares three versions of a file, and publishes
disagreeing ranges of text flagged with these codes:
====   all three files differ
====1  file1 is different
====2  file2 is different
====3  file3 is different
The type of change suffered in converting a given range of a
given file to some other is indicated in one of these ways:
f : n1 a       Text is to be appended after line number n1
               in file f, where f = 1, 2, or 3.
f : n1 , n2 c  Text is to be changed in the range line n1
               to line n2. If n1 = n2, the range may be
               abbreviated to n1.
The original contents of the range follows immediately after
a c indication. When the contents of two files are identical, the contents of the lower-numbered file is suppressed.
Under the -e option, diff3 publishes a script for the editor
ed that will incorporate into file1 all changes between
file2 and file3, i.e. the changes that normally would be
flagged ==== and ====3. Option -x (-3) produces a script to
incorporate only changes flagged ==== (====3). The following command will apply the resulting script to 'file1'.
(cat script; echo '1,$p') | ed - file1
FILES
/tmp/d3?????
/usr/lib/diff3
SEE ALSO
diff(1)
NOTES
Text lines that consist of a single ' . ' will defeat -e.
Files longer than 64K bytes won't work.
ED(1)
NAME
ed - text editor
SYNTAX
ed [ - ] [ -x ] [ name ]
DESCRIPTION
Ed is the standard text editor.
If a name argument is given, ed simulates an e command (see
below) on the named file; that is to say, the file is read
into ed's buffer so that it can be edited. If -x is
present, an x command is simulated first to handle an
encrypted file. The optional - suppresses the printing of
character counts by e, r, and w commands.
Ed operates on a copy of any file it is editing: changes
made in the copy have no effect on the file until a w
(write) command is given. The copy of the text being edited
resides in a temporary file called the buffer.
Commands to ed have a simple and regular structure: zero or
more addresses followed by a single character command, possibly followed by parameters to the command. These
addresses specify one or more lines in the buffer. Missing
addresses are supplied by default.
In general, only one command may appear on a line. Certain
commands allow the addition of text to the buffer. While ed
is accepting text, it is said to be in input mode. In this
mode, no commands are recognized: all input is merely collected. Input mode is left by typing a period '.' alone at
the beginning of a line.
Ed supports a limited form of regular expression notation.
A regular expression specifies a set of strings of characters. A member of this set of strings is said to be matched
by the regular expression. In the following specification
for regular expressions the word 'character' means any character but newline.
1. Any character except a special character matches
itself. Special characters are the regular expression
delimiter plus \[. and sometimes ^*$.
2. A . matches any character.
3. A \ followed by any character except a digit or ( )
matches that character.
4. A nonempty string s bracketed [s] (or [^s]) matches any
character in (or not in) s. In s, \ has no special
meaning, and ] may only appear as the first letter. A
substring a-b, with a and b in ascending ASCII order,
stands for the inclusive range of ASCII characters.
5. A regular expression of form 1-4 followed by * matches
a sequence of 0 or more matches of the regular expression.
6. A regular expression, x, of form 1-8, bracketed
\(x\) matches what x matches.
7. A \ followed by a digit n matches a copy of the string
that the bracketed regular expression beginning with
the nth \( matched.
8. A regular expression of form 1-8, x, followed by a regular expression of form 1-7, y, matches a match for x
followed by a match for y, with the x match being as
long as possible while still permitting a y match.
9. A regular expression of form 1-8 preceded by ^
(or followed by $) is constrained to matches that begin at
the left (or end at the right) end of a line.
10. A regular expression of form 1-9 picks out the longest
among the leftmost matches in a line.
11. An empty regular expression stands for a copy of the
last regular expression encountered.
Regular expressions are used in addresses to specify lines
and in one command (see s below) to specify a portion of a
line which is to be replaced. If it is desired to use one
of the regular expression metacharacters as an ordinary
character, that character may be preceded by '\'. This also
applies to the character bounding the regular expression
(often '/') and to '\' itself.
To understand addressing in ed it is necessary to know that
at any time there is a current line. Generally speaking, the
current line is the last line affected by a command; however, the exact effect on the current line is discussed
under the description of the command. Addresses are constructed as follows.
1. The character '.' addresses the current line.
2. The character '$' addresses the last line of the
buffer.
3. A decimal number n addresses the n-th line of the
buffer.
4. 'x addresses the line marked with the name x, which
must be a lower-case letter. Lines are marked with the
k command described below.
5. A regular expression enclosed in slashes '/' addresses
the line found by searching forward from the current
line and stopping at the first line containing a string
that matches the regular expression. If necessary the
search wraps around to the beginning of the buffer.
6. A regular expression enclosed in queries '?' addresses
the line found by searching backward from the current
line and stopping at the first line containing a string
that matches the regular expression. If necessary the
search wraps around to the end of the buffer.
7. An address followed by a plus sign '+' or a minus sign
'-' followed by a decimal number specifies that address
plus (resp. minus) the indicated number of lines. The
plus sign may be omitted.
8. If an address begins with '+' or '-' the addition or
subtraction is taken with respect to the current line;
e.g. '-5' is understood to mean '.-5'.
9. If an address ends with '+' or '-', then 1 is added
(resp. subtracted). As a consequence of this rule and
rule 8, the address '-' refers to the line before the
current line. Moreover, trailing '+' and '-' characters
have cumulative effect, so '--' refers to the current
line less 2.
10. To maintain compatibility with earlier versions of the
editor, the character '^' in addresses is equivalent to
'-'.
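Rules 1 to 3 and 7 to 10 amount to simple arithmetic on line numbers. The Python sketch below evaluates such numeric addresses; it is a simplified model under stated assumptions (the function name resolve_address is hypothetical, and marks, searches and range checking are left out), not ed's own address parser.

```python
import re


def resolve_address(addr, cur, last):
    """Evaluate a numeric ed address (no marks, no searches) to a line number.

    cur is the number of the current line '.', last the number of line '$'.
    """
    m = re.match(r"(\.|\$|\d+)?", addr)        # optional base; always matches
    base = m.group(1)
    if base is None or base == ".":
        value = cur                            # rules 1 and 8
    elif base == "$":
        value = last                           # rule 2
    else:
        value = int(base)                      # rule 3
    rest = addr[m.end():].replace("^", "-")    # rule 10: ^ means -
    for sign, digits, bare in re.findall(r"([+-])(\d*)|(\d+)", rest):
        if bare:
            value += int(bare)                 # rule 7: omitted plus sign
        else:
            step = int(digits) if digits else 1    # rule 9: bare sign is 1
            value += step if sign == "+" else -step
    return value
```

With '.' at line 10 in a 50-line buffer, '--' gives 8, '-5' gives 5, '$-3' gives 47 and '^' gives 9.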
Commands may require zero, one, or two addresses. Commands
which require no addresses regard the presence of an address
as an error. Commands which accept one or two addresses
assume default addresses when too few are given. If
more addresses are given than such a command requires, the
last one or two (depending on what is accepted) are used.
Addresses are separated from each other typically by a comma
','. They may also be separated by a semicolon ';'. In
this case the current line '.' is set to the previous
address before the next address is interpreted. This
feature can be used to determine the starting line for
forward and backward searches ('/', '?'). The second address
of any two-address sequence must correspond to a line
following the line corresponding to the first address.
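The wrap-around searches of rules 5 and 6, and the effect of ';' on where a search starts, can be modeled in a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, lines are numbered from 1, and the patterns are written in Python re syntax rather than ed syntax.

```python
import re


def search_forward(lines, cur, pattern):
    """Model /re/: start after line cur, wrap from $ to line 1 (1-based)."""
    n = len(lines)
    for k in range(1, n + 1):
        lineno = (cur - 1 + k) % n + 1      # the current line is tried last
        if re.search(pattern, lines[lineno - 1]):
            return lineno
    raise ValueError("?")                   # no line matches


def search_backward(lines, cur, pattern):
    """Model ?re?: start before line cur, wrap from line 1 to $."""
    n = len(lines)
    for k in range(1, n + 1):
        lineno = (cur - 1 - k) % n + 1
        if re.search(pattern, lines[lineno - 1]):
            return lineno
    raise ValueError("?")


def address_pair(lines, cur, first, pattern, sep):
    """Model 'first,/re/' (sep=',') versus 'first;/re/' (sep=';')."""
    start = first if sep == ";" else cur    # ';' sets '.' to the first address
    return first, search_forward(lines, start, pattern)
```

In a buffer x, a, b, x, c with '.' at line 4, 2,/x/ finds line 1, which does not follow line 2 and so would be rejected, while 2;/x/ starts the search at line 2 and finds line 4.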
In the following list of ed commands, the default addresses
are shown in parentheses. The parentheses are not part of
the address, but are used to show that the given addresses
are the default.
As mentioned, it is generally illegal for more than one
command to appear on a line. However, most commands may be
suffixed by 'p' or by 'l', in which case the current line is
either printed or listed respectively in the way discussed
below.
(.)a
The append command reads the given text and appends it
after the addressed line. '.' is left on the last line
input, if there were any; otherwise at the addressed
line. Address '0' is legal for this command; text is
placed at the beginning of the buffer.
(.,.)c
The change command deletes the addressed lines, then
accepts input text which replaces these lines. '.' is
left at the last line input; if there were none, it is
left at the line preceding the deleted lines.
(.,.)d
The delete command deletes the addressed lines from the
buffer. The line originally after the last line
deleted becomes the current line; if the lines deleted
were originally at the end, the new last line becomes
the current line.
e filename
The edit command causes the entire contents of the
buffer to be deleted, and then the named file to be
read in. '.' is set to the last line of the buffer.
The number of characters read is typed. 'filename' is
remembered for possible use as a default file name in a
subsequent r or w command. If 'filename' is missing,
the remembered name is used.
E filename
This command is the same as e, except that no diagnostic
results when no w has been given since the last buffer
alteration.
f filename
The filename command prints the currently remembered
file name. If 'filename' is given, the currently
remembered file name is changed to 'filename'.
(1,$)g/regular expression/command list
In the global command, the first step is to mark every
line which matches the given regular expression. Then
for every such line, the given command list is executed
with '.' initially set to that line. A single command
or the first of multiple commands appears on the same
line with the global command. All lines of a multiline
list except the last line must be ended with '\'. The a,
i, and c commands and associated input are permitted; the
'.' terminating input mode may be omitted if it would be
on the last line of the command list. The commands g and
v are not permitted in the command list.
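The two passes matter when the command list changes the buffer. The Python sketch below models g/re/t. (copy each marked line after itself) and, with invert=True, the corresponding v command. It is an illustration under stated assumptions: the name global_copy is hypothetical, the pattern uses Python re syntax, and only this one command list is modeled. Because pass 2 visits only the lines marked in pass 1, the new copies are never processed, even though they also match.

```python
import re


def global_copy(lines, pattern, invert=False):
    """Model g/re/t. (or v/re/t. when invert=True): duplicate marked lines.

    Pass 1 marks lines; pass 2 runs the command once per mark, so the
    copies created in pass 2 are never processed themselves.
    """
    buffer = [[text, False] for text in lines]        # [text, marked?]
    for entry in buffer:                              # pass 1: mark
        entry[1] = (re.search(pattern, entry[0]) is not None) != invert
    i = 0
    while i < len(buffer):                            # pass 2: execute
        text, marked = buffer[i]
        if marked:
            buffer[i][1] = False                      # clear the mark
            buffer.insert(i + 1, [text, False])       # 't.' copies after '.'
            i += 1                                    # step over the copy
        i += 1
    return [text for text, _ in buffer]
```

A naive single pass that re-tested each line as it went would keep finding the fresh copies and never terminate.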
(.)i
This command inserts the given text before the
addressed line. '.' is left at the last line input,
or, if there were none, at the line before the
addressed line. This command differs from the a command only in the placement of the text.
(.,.+1)j
This command joins the addressed lines into a single
line; intermediate newlines simply disappear. '.' is
left at the resulting line.
(.)kx
The mark command marks the addressed line with name x,
which must be a lower-case letter. The address form 'x
then addresses this line.
(.,.)l
The list command prints the addressed lines in an
unambiguous way: non-graphic characters are printed in
two-digit octal, and long lines are folded. The l command
may be placed on the same line after any non-i/o command.
(.,.)ma
The move command repositions the addressed lines after
the line addressed by a. The last of the moved lines
becomes the current line.
(.,.)p
The print command prints the addressed lines. '.' is
left at the last line printed. The p command may be
placed on the same line after any non-i/o command.
(.,.)P
This command is a synonym for p.
q
The quit command causes ed to exit. No automatic write
of a file is done.
Q
This command is the same as q, except that no diagnostic
results when no w has been given since the last buffer
alteration.
($)r filename
The read command reads in the given file after the
addressed line. If no file name is given, the remembered
file name, if any, is used (see e and f commands).
The file name is remembered if there was no remembered
file name already. Address '0' is legal for r and causes
the file to be read at the beginning of the buffer.
If the read is successful, the number of characters read
is typed. '.' is left at the last line read in from the
file.
(.,.)s/regular expression/replacement/
or,
(.,.)s/regular expression/replacement/g
The substitute command searches each addressed line for
an occurrence of the specified regular expression. On
each line in which a match is found, all matched
strings are replaced by the replacement specified, if
the global replacement indicator 'g' appears after the
command. If the global indicator does not appear, only
the first occurrence of the matched string is replaced.
It is an error for the substitution to fail on all
addressed lines. Any character other than space or
new-line may be used instead of '/' to delimit the
regular expression and the replacement. '.' is left at
the last line substituted.
An ampersand '&' appearing in the replacement is
replaced by the string matching the regular expression.
The special meaning of '&' in this context may be
suppressed by preceding it by '\'. The characters '\n',
where n is a digit, are replaced by the text matched by
the n-th regular subexpression enclosed between '\('
and '\)'. When nested, parenthesized subexpressions
are present, n is determined by counting occurrences of
'\(' starting from the left.
Lines may be split by substituting new-line characters
into them. The new-line in the replacement string must
be escaped by preceding it by '\'.
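The replacement rules above ('&', '\&', '\n' and an escaped new-line) can be expressed as a small expansion function layered on Python's re.sub. This is a sketch under stated assumptions: the names expand_replacement and ed_substitute are hypothetical, the search pattern is written in Python re syntax (plain parentheses for groups), and ed's error when no addressed line matches is not modeled.

```python
import re


def expand_replacement(repl, m):
    """Expand an ed-style replacement string against one match object."""
    out = []
    i = 0
    while i < len(repl):
        c = repl[i]
        if c == "&":
            out.append(m.group(0))                   # & is the whole match
        elif c == "\\" and i + 1 < len(repl):
            i += 1
            nxt = repl[i]
            if nxt.isdigit():
                out.append(m.group(int(nxt)) or "")  # \n is the n-th group
            else:
                out.append(nxt)                      # \&, \\ and \<newline>
        else:
            out.append(c)
        i += 1
    return "".join(out)


def ed_substitute(line, pattern, repl, global_flag=False):
    """Apply s/pattern/repl/ (or s/.../.../g) to one line."""
    count = 0 if global_flag else 1                  # 0 means 'all' in re.sub
    return re.sub(pattern, lambda m: expand_replacement(repl, m), line,
                  count=count)
```

For instance, swapping the two halves of "Smith, John" with the replacement \2 \1 gives "John Smith", and replacing "-" with && on the first occurrence only turns "a-b-c" into "a--b-c".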
(.,.)ta
This command acts just like the m command, except that
a copy of the addressed lines is placed after address a
(which may be 0). '.' is left on the last line of the
copy.
(.,.)u
The undo command restores the preceding contents of the
current line, which must be the last line in which a
substitution was made.
(1,$)v/regular expression/command list
This command is the same as the global command g except
that the command list is executed with '.' initially
set to every line except those matching the regular
expression.
(1,$)w filename
The write command writes the addressed lines onto the
given file. If the file does not exist, it is created
mode 666 (readable and writable by everyone). The file
name is remembered if there was no remembered file name
already. If no file name is given, the remembered file
name, if any, is used (see e and f commands). '.' is
unchanged. If the command is successful, the number of
characters written is printed.
(1,$)W filename
This command is the same as w, except that the
addressed lines are appended to the file.
x
A key string is demanded from the standard input.
Later r, e and w commands will encrypt and decrypt the
text with this key by the algorithm of crypt(1). An
explicitly empty key turns off encryption.
($)=
The line number of the addressed line is typed. '.' is
unchanged by this command.
!
The remainder of the line after the '!' is sent to
sh(1) to be interpreted as a command. '.' is
unchanged.
(.+1)
An address alone on a line causes the addressed line to
be printed. A blank line alone is equivalent to
'.+1p'; it is useful for stepping through text.
If an interrupt signal (ASCII DEL) is sent, ed prints a
'?' and returns to its command level.
Some size limitations: 512 characters per line, 256
characters per global command list, 64 characters per file
name, and 128K characters in the temporary file. The limit
on the number of lines depends on the amount of core: each
line takes 1 word.
When reading a file, ed discards ASCII NUL characters and
all characters after the last newline.
It refuses to read
files containing non-ASCII characters.
FILES
/tmp/e*
ed.hup: work is saved here if terminal hangs up
SEE ALSO
B. W. Kernighan, A Tutorial Introduction to the ED Text Editor
B. W. Kernighan, Advanced editing on UNIX
sed(1), crypt(1)
DIAGNOSTICS
'?name' for inaccessible file; '?' for errors in commands;
'?TMP' for temporary file overflow.
To protect against throwing away valuable work, a q or e
command is considered to be in error, unless a w has
occurred since the last buffer change. A second q or e will
be obeyed regardless.
NOTES
The l command mishandles DEL.
A ! command cannot be subject to a g command.
Because 0 is an illegal address for a w command, it is not
possible to create an empty file with ed.
EQN(1)    XENIX Text Processing    EQN(1)
NAME
eqn, neqn, checkeq - typeset mathematics
SYNTAX
eqn [ -dxy ] [ -pn ] [ -sn ] [ -fn ] [ file ] ...
checkeq [ file ] ...
DESCRIPTION
Eqn is a troff(1) preprocessor for typesetting mathematics
on a Graphic Systems phototypesetter, neqn on terminals.
Usage is almost always
eqn file ... | troff
neqn file ... | nroff
If no files are specified, these programs read from the
standard input. A line beginning with '.EQ' marks the start
of an equation; the end of an equation is marked by a line
beginning with '.EN'. Neither of these lines is altered, so
they may be defined in macro packages to get centering,
numbering, etc. It is also possible to set two characters
as 'delimiters'; subsequent text between delimiters is also
treated as eqn input. Delimiters may be set to characters x
and y with the command-line argument -dxy or (more commonly)
with 'delim xy' between .EQ and .EN. The left and right
delimiters may be identical. Delimiters are turned off by
'delim off'. All text that is neither between delimiters
nor between .EQ and .EN is passed through untouched.
The program checkeq reports missing or unbalanced delimiters
and .EQ/.EN pairs.
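The delimiter rule can be pictured as splitting each text line into plain pieces and in-line equation pieces. The Python sketch below does that for single-character delimiters; the name split_inline_eqn is hypothetical, and the assumption that an unmatched opening delimiter is simply left in the text (the kind of imbalance checkeq reports) is this sketch's choice, not documented eqn behavior.

```python
def split_inline_eqn(text, left="$", right="$"):
    """Split text into ('text', ...) and ('eqn', ...) pieces by delimiters."""
    pieces = []
    pos = 0
    while True:
        start = text.find(left, pos)
        if start < 0:
            break
        end = text.find(right, start + 1)
        if end < 0:
            break                    # unbalanced: leave the rest as text
        if start > pos:
            pieces.append(("text", text[pos:start]))
        pieces.append(("eqn", text[start + 1:end]))
        pos = end + 1
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces
```

With 'delim $$' in effect, the line "area $pi r sup 2$ here" yields the text "area ", the equation "pi r sup 2" and the text " here".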
Tokens within eqn are separated by spaces, tabs, newlines,
braces, double quotes, tildes or circumflexes. Braces {}
are used for grouping; generally speaking, anywhere a single
character like x could appear, a complicated construction
enclosed in braces may be used instead. Tilde ~ represents
a full space in the output, circumflex ^ half as much.
SEE ALSO
troff(1), tbl(1), ms(7), eqnchar(7)
B. W. Kernighan and L. L. Cherry, Typesetting Mathematics - User's Guide
J. F. Ossanna, NROFF/TROFF User's Manual
NOTES
To embolden digits, parens, etc., it is necessary to quote
them, as in 'bold "12.3"'.
EX(UCB)    XENIX Text Processing    EX(UCB)
NAME
ex - text editor
SYNTAX
ex [ - ] [ -v ] [ -t tag ] [ -r ] [ +lineno ] name ...
DESCRIPTION
Ex is the root of a family of editors: edit, ex and vi. Ex
is a superset of ed, with the most notable extension being a
display editing facility. Display based editing is the
focus of vi.
If you have not used ed, or are a casual user, you will find
that the editor edit is convenient for you.
It avoids some
of the complexities of ex used mostly by systems programmers
and persons very familiar with ed.
If you have a CRT terminal, you may wish to use a display
based editor; in this case see vi(UCB), which is a command
which focuses on the display editing portion of ex.
DOCUMENTATION
For edit and ex see the Ex/edit command summary - Version
2.0. The document Edit: A tutorial provides a comprehensive
introduction to edit assuming no previous knowledge of
computers or the UNIX system.
The Ex Reference Manual - Version 2.0 is a comprehensive and
complete manual for the command mode features of ex, but you
cannot learn to use the editor by reading it. For an
introduction to more advanced forms of editing using the
command mode of ex see the editing documents written by Brian
Kernighan for the editor ed; the material in the introductory
and advanced documents works also with ex.
An Introduction to Display Editing with Vi introduces the
display editor vi and provides reference material on vi. The
Vi Quick Reference card summarizes the commands of vi in a
useful, functional way, and is useful with the Introduction.
FOR ED USERS
If you have used ed you will find that ex has a number of
new features useful on CRT terminals.
Intelligent terminals
and high speed terminals are very pleasant to use with vi.
Generally, the editor uses far more of the capabilities of
terminals than ed does, and uses the terminal capability
data base termcap(UCB) and the type of the terminal you are
using from the variable TERM in the environment to determine
how to drive your terminal efficiently. The editor makes
use of features such as insert and delete character and line
in its visual command (which can be abbreviated vi) and
which is the central mode of editing when using vi(UCB).
There is also an interline editing open (o) command which
works on all terminals.
Ex contains a number of new features for easily viewing the
text of the file. The z command gives easy access to windows
of text. Hitting ^D causes the editor to scroll a
half-window of text and is more useful for quickly stepping
through a file than just hitting return. Of course, the
screen oriented visual mode gives constant access to editing
context.
Ex gives you more help when you make mistakes. The undo (u)
command allows you to reverse any single change which goes
astray. Ex gives you a lot of feedback, normally printing
changed lines, and indicates when more than a few lines are
affected by a command so that it is easy to detect when a
command has affected more lines than it should have.
The editor also normally prevents overwriting existing files
unless you edited them, so that you don't accidentally
clobber, with a write, a file other than the one you are
editing. If the system (or editor) crashes, or you accidentally
hang up the phone, you can use the editor recover command to
retrieve your work. This will get you back to within a few
lines of where you left off.
Ex has several features for dealing with more than one file
at a time. You can give it a list of files on the command
line and use the next (n) command to deal with each in turn.
The next command can also be given a list of file names, or
a pattern as used by the shell to specify a new set of files
to be dealt with. In general, filenames in the editor may
be formed with full shell metasyntax. The metacharacter '%'
is also available in forming filenames and is replaced by
the name of the current file. For editing large groups of
related files you can use ex's tag command to quickly locate
functions and other important points in any of the files.
This is useful when working on a large program when you want
to quickly find the definition of a particular function.
The command ctags(UCB) builds a tags file for a group of C
programs.
For moving text between files and within a file the editor
has a group of buffers, named a through z. You can place
text in these named buffers and carry it over when you edit
another file.
There is a command & in ex which repeats the last substitute
command. In addition there is a confirmed substitute command. You give a range of substitutions to be done and the
editor interactively asks whether each substitution is
desired.
You can use the substitute command in ex to systematically
convert the case of letters between upper and lower case.
It is possible to ignore case of letters in searches and
substitutions. Ex also allows regular expressions which
match words to be constructed. This is convenient, for
example, in searching for the word "edit"
if your document
also contains the word "editor."
Ex has a set of options which you can set to tailor it to
your liking. One option which is very useful is the autoindent option which allows the editor to automatically supply
leading white space to align text. You can then use the ^D
key as a backtab and space and tab forward to align new code
easily.
Miscellaneous new useful features include an intelligent
join (j) command which supplies white space between joined
lines automatically, commands < and > which shift groups of
lines, and the ability to filter portions of the buffer
through commands such as sort.
FILES
/usr/lib/ex2.0strings     error messages
/usr/lib/ex2.0recover     recover command
/usr/lib/ex2.0preserve    preserve command
/etc/termcap              describes capabilities of terminals
~/.exrc                   editor startup file
/tmp/Exnnnnn              editor temporary
/tmp/Rxnnnnn              named buffer temporary
/usr/preserve             preservation directory
SEE ALSO
awk(1), ed(1), grep(1), sed(1), edit(UCB), grep(UCB),
termcap(UCB), vi(UCB)
AUTHOR
William Joy
NOTES
The undo command causes all marks to be lost on lines
changed and then restored if the marked lines were changed.
Undo never clears the buffer modified condition.
The z command prints a number of logical rather than physical lines. More than a screen full of output may result if
long lines are present.
File input/output errors don't print a name if the command
line '-' option is used.
There is no easy way to do a single scan ignoring case.
Because of the implementation of the arguments to next, only
512 bytes of argument list are allowed there.
The format of /etc/termcap and the large number of capabilities of terminals used by the editor cause terminal type
setup to be rather slow.
The editor does not warn if text is placed in named buffers
and not used before exiting the editor.
Null characters are discarded in input files, and cannot
appear in resultant files.
GREP(1)    XENIX Text Processing    GREP(1)
NAME
grep, egrep, fgrep - search a file for a pattern
SYNTAX
grep [ option ] ... expression [ file ] ...
egrep [ option ] ... [ expression ] [ file ] ...
fgrep [ option ] ... [ strings ] [ file ]
DESCRIPTION
Commands of the grep family search the input files (standard
input default) for lines matching a pattern. Normally, each
line found is copied to the standard output; unless the -h
flag is used, the file name is shown if there is more than
one input file.
Grep patterns are limited regular expressions in the style
of ed(1); it uses a compact nondeterministic algorithm.
Egrep patterns are full regular expressions; it uses a fast
deterministic algorithm that sometimes needs exponential
space. Fgrep patterns are fixed strings; it is fast and
compact.
The following options are recognized.
-v
All lines but those matching are printed.
-c
Only a count of matching lines is printed.
-l
The names of files with matching lines are listed
(once) separated by newlines.
-n
Each line is preceded by its line number in the file.
-b
Each line is preceded by the block number on which it
was found. This is sometimes useful in locating disk
block numbers by context.
-s
No output is produced, only status.
-h
Do not print filename headers with output lines.
-y
Alphabetic letters in the pattern will match letters of
either case in the input (grep and fgrep only).
-e expression
Same as a simple expression argument, but useful when
the expression begins with a -.
-f file
The regular expression (egrep) or string list (fgrep)
is taken from the file.
-x
(Exact) only lines matched in their entirety are
printed (fgrep only).
Care should be taken when using the characters $ * [ ^ | ? '
" ( ) and \ in the expression as they are also meaningful to
the Shell. It is safest to enclose the entire expression
argument in single quotes ' '.
Fgrep searches for lines that contain one of the (newline-separated) strings.
Egrep accepts extended regular expressions. In the following
description 'character' excludes newline:
A \ followed by a single character matches that character.
The character ^ ($) matches the beginning (end) of a line.
A . matches any character.
A single character not otherwise endowed with special
meaning matches that character.
A string enclosed in brackets [] matches any single
character from the string. Ranges of ASCII character
codes may be abbreviated as in 'a-z0-9'. A ] may occur
only as the first character of the string. A literal -
must be placed where it can't be mistaken as a range
indicator.
A regular expression followed by * (+, ?) matches a
sequence of 0 or more (1 or more, 0 or 1) matches of
the regular expression.
Two regular expressions concatenated match a match of
the first followed by a match of the second.
Two regular expressions separated by | or newline match
either a match for the first or a match for the second.
A regular expression enclosed in parentheses matches a
match for the regular expression.
The order of precedence of operators at the same parenthesis
level is [] then *+? then concatenation then | and newline.
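Python's re module happens to use the same precedence for these operators ([] first, then * + ?, then concatenation, then |), so it can stand in for egrep when checking how a pattern groups. The sketch below is only an illustration under stated assumptions: the name egrep is hypothetical, newline-separated alternatives are handled by joining them with |, which is correct only at the top level, and Python extensions such as \d have no egrep counterpart.

```python
import re


def egrep(pattern, lines):
    """Return the lines that contain a match, as egrep would print them."""
    # A newline between two expressions acts like |; join at top level.
    regex = re.compile("|".join(pattern.split("\n")))
    return [line for line in lines if regex.search(line)]
```

Because concatenation binds tighter than |, ab*|c means (a(b*))|c: it matches abbb and xc but not a lone b.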
SEE ALSO
ed(1), sed(1), sh(1)
DIAGNOSTICS
Exit status is 0 if any matches are found, 1 if none, 2 for
syntax errors or inaccessible files.
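The selection logic of fgrep and the first two exit statuses can be sketched in Python as below. The function name and its parameters are hypothetical stand-ins for the -v and -x options. This sketch bases the status on whether any line was selected. Status 2 (syntax errors or inaccessible files) is outside its scope, and the -c count is simply the length of the returned list.

```python
def fgrep(strings, lines, invert=False, exact=False):
    """Select lines as fgrep does and return them with an exit status.

    strings is the list of fixed strings; invert and exact stand in for
    -v and -x. Status 0 means some line was selected, 1 means none.
    """
    def hit(line):
        if exact:
            return line in strings                   # -x: whole-line match
        return any(s in line for s in strings)
    selected = [line for line in lines if hit(line) != invert]  # -v flips
    return selected, (0 if selected else 1)
```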
NOTES
Ideally there should be only one grep, but we don't know a
single algorithm that spans a wide enough range of space-time
tradeoffs.
Lines are limited to 256 characters; longer lines are truncated.
HEAD(UCB)    XENIX Text Processing    HEAD(UCB)
NAME
head - give first few lines of a stream
SYNTAX
head [ -count ] file ...
DESCRIPTION
This filter gives the first count lines of each of the
specified files, or of the standard input. If count is
omitted it defaults to 10.
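The behavior of head is small enough to state as a Python sketch: take lines from the start of a stream and stop after count of them. The function name is hypothetical, and reading files or standard input is left to the caller (any iterable of lines, such as an open file object, works).

```python
from itertools import islice


def head(lines, count=10):
    """Return the first count lines of an iterable (default 10)."""
    return list(islice(lines, count))
```

Because islice stops reading as soon as count lines have been taken, the rest of a long stream is never consumed.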
SEE ALSO
tail(1)
AUTHOR
Bill Joy
PREP(1)    XENIX Text Processing    PREP(1)
NAME
prep - prepare text for statistical processing
SYNTAX
prep [ -dio ] file ...
DESCRIPTION
Prep reads each file in sequence and writes it on the
standard output, one 'word' to a line. A word is a string of
alphabetic characters and embedded apostrophes, delimited by
space or punctuation. Hyphenated words are broken apart;
hyphens at the end of lines are removed and the hyphenated
parts are joined. Strings of digits are discarded.
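The word rules above can be approximated with two regular-expression steps in Python. This is a sketch, not prep's own algorithm: the function name is hypothetical, it assumes an end-of-line hyphen is immediately followed by the newline, and it treats ASCII letters only.

```python
import re


def prep_words(text):
    """Split text into prep-style words, in order of appearance."""
    text = re.sub(r"-\n", "", text)   # join parts hyphenated across lines
    # A word is a run of letters, possibly with embedded apostrophes;
    # other hyphens, punctuation and digits all act as separators here.
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
```

For the input "It's a well-known\nexam-\nple, 1976." this gives It's, a, well, known and example, one per entry, with the digits dropped.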
The following option letters may appear in any order:
-d
Print the word number (in the input stream) with each
word.
-i
Take the next file as an 'ignore' file. These words
will not appear in the output. (They will be counted,
for purposes of the -d count.)
-o
Take the next file as an 'only' file. Only these words
will appear in the output. (All other words will also
be counted for the -d count.)
-p
Include punctuation marks (single nonalphanumeric
characters) as separate output lines. The punctuation
marks are not counted for the -d count.
Ignore and only files contain words, one per line.
SEE ALSO
deroff(1)
PTX(1)    XENIX Text Processing    PTX(1)
NAME
ptx - permuted index
SYNTAX
ptx [ option ] ... [ input [ output ] ]
DESCRIPTION
Ptx generates a permuted index to file input on file output
(standard input and output default). It has three phases:
the first does the permutation, generating one line for each
keyword in an input line. The keyword is rotated to the
front. The permuted file is then sorted. Finally, the
sorted lines are rotated so the keyword comes at the middle
of the page. Ptx produces output in the form:
.xx "tail" "before keyword" "keyword and after" "head"
where .xx may be an nroff or troff(1) macro for user-defined
formatting. The before keyword and keyword and after fields
incorporate as much of the line as will fit around the
keyword when it is printed at the middle of the page. Tail
and head, at least one of which is an empty string "", are
wrapped-around pieces small enough to fit in the unused
space at the opposite end of the line. When original text
must be discarded, '/' marks the spot.
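The first two phases (one rotation per keyword, then a sort) can be sketched in Python as follows. The function name permute is hypothetical, words are split on blanks only, and the third phase (fitting the pieces around the middle of the page, with tail and head wrapping) is not modeled. Each result pairs "keyword and after" with "before keyword".

```python
def permute(lines, ignore=frozenset()):
    """First two ptx phases: one rotation per keyword, then a sort.

    Each result is (keyword and after, before keyword).
    """
    rotated = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if word in ignore:
                continue                  # ignored words are never keywords
            rotated.append((" ".join(words[i:]), " ".join(words[:i])))
    rotated.sort()                        # sort on the rotated text
    return rotated
```

With "the" in the ignore list, the line "the cat sat" contributes two entries, "cat sat" (preceded by "the") and "sat" (preceded by "the cat").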
The following options can be applied:
-f
Fold upper and lower case letters for sorting.
-t
Prepare the output for the phototypesetter; the default
line length is 100 characters.
-w n
Use the next argument, n, as the width of the output
line. The default line length is 72 characters.
-g n
Use the next argument, n, as the number of characters
to allow for each gap among the four parts of the line
as finally printed. The default gap is 3 characters.
-o only
Use as keywords only the words given in the only file.
-i ignore
Do not use as keywords any words given in the ignore
file. If the -i and -o options are missing, use
/usr/lib/eign as the ignore file.
-b break
Use the characters in the break file to separate words.
In any case, tab, newline, and space characters are
always used as break characters.
-r
Take any leading nonblank characters of each input line
to be a reference identifier (as to a page or chapter)
separate from the text of the line. Attach that
identifier as a 5th field on each output line.
The index for this manual was generated using ptx.
FILES
/bin/sort
/usr/lib/eign
NOTES
Line length counts do not account for overstriking or proportional spacing.
PUBINDEX(1)    XENIX Text Processing    PUBINDEX(1)
NAME
pubindex - make inverted bibliographic index
SYNTAX
pubindex [ file ] ...
DESCRIPTION
Pubindex makes a hashed inverted index to the named files
for use by refer(1). The files contain bibliographic references separated by blank lines. A bibliographic reference
is a set of lines that contain bibliographic information
fields.
Each field starts on a line beginning with a '%',
followed by a key-letter, followed by a blank, and followed
by the contents of the field, which continues until the next
line starting with '%'. The most common key-letters and the
corresponding fields are:
A    Author name
B    Title of book containing article referenced
C    City
D    Date
d    Alternate date
E    Editor of book containing article referenced
G    Government (CFSTI) order number
I    Issuer (publisher)
J    Journal
K    Other keywords to use in locating reference
M    Technical memorandum number
N    Issue number within volume
O    Other commentary to be printed at end of reference
P    Page numbers
R    Report number
r    Alternate report number
T    Title of article, book, etc.
V    Volume number
X    Commentary unused by pubindex
Except for 'A', each field should be given only once. Only
relevant fields should be supplied. An example is:
%T 5-by-5 Palindromic Word Squares
%A M. D. McIlroy
%J Word Ways
%V 9
%P 199-202
%D 1976
FILES
x.ia, x.ib, x.ic where x is the first argument.
SEE ALSO
refer(1)
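The reference format described for pubindex (blank-line separated records, each field starting with '%' and a key-letter, continuing until the next '%' line) can be read with a short Python sketch. It models only the input format, not the hashed x.ia, x.ib and x.ic index files. The function name is hypothetical, and each key maps to a list so that repeated %A fields are kept in order.

```python
def parse_references(text):
    """Split %-field references (blank-line separated) into dicts of lists."""
    refs = []
    for block in text.strip().split("\n\n"):
        fields = {}
        key = None
        for line in block.splitlines():
            if line.startswith("%") and len(line) > 1:
                key = line[1]                               # the key-letter
                fields.setdefault(key, []).append(line[3:])  # skip '%X '
            elif key is not None:
                fields[key][-1] += "\n" + line              # field continues
        refs.append(fields)
    return refs
```

Applied to the McIlroy example above, the result has one record whose A field is ["M. D. McIlroy"] and whose P field is ["199-202"].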
REFER(1)    XENIX Text Processing    REFER(1)
NAME
refer, lookbib - find and insert literature references in
documents
SYNTAX
refer [ option ] ...
lookbib [ file ] ...
DESCRIPTION
Lookbib accepts keywords from the standard input and
searches a bibliographic data base for references that contain those keywords anywhere in title, author, journal name,
etc. Matching references are printed on the standard output. Blank lines are taken as delimiters between queries.
Refer is a preprocessor for nroff or troff(1) that finds and
formats references. The input files (standard input
default) are copied to the standard output, except for lines
between .[ and .] command lines, which are assumed to
contain keywords as for lookbib, and are replaced by
information from the bibliographic data base. The user may
avoid the search, override fields from it, or add new fields.
The reference data, from whatever source, are assigned to a
set of troff strings. Macro packages such as ms(7) print the
finished reference text from these strings. A flag is
placed in the text at the point of reference; by default the
references are indicated by numbers.
The following options are available:
-ar
Reverse the first r author names (Jones, J. A. instead
of J. A. Jones). If r is omitted all author names are
reversed.
-b
Bare mode: do not put any flags in text (neither
numbers nor labels).
-cstring
Capitalize (with CAPS SMALL CAPS) the fields whose
key-letters are in string.
-e
Instead of leaving the references where encountered,
accumulate them until a sequence of the form
.[
$LIST$
.]
is encountered, and then write out all references
collected so far. Collapse references to the same
source.
-kx
Instead of numbering references, use labels as
specified in a reference data line beginning %x; by
default x is L.
-lm,n
Instead of numbering references, use labels made from
the senior author's last name and the year of
publication. Only the first m letters of the last name
and the last n digits of the date are used. If either m
or n is omitted the entire name or date respectively
is used.
-p
Take the next argument as a file of references to be
searched. The default file is searched last.
-n
Do not search the default file.
-skeys
Sort references by fields whose key-letters are in the
keys string; permute reference numbers in text
accordingly. Implies -e. The key-letters in keys may be
followed by a number to indicate how many such fields
are used, with + taken as a very large number. The
default is AD which sorts on the senior author and
then date; to sort, for example, on all authors and
then title use -sA+T.
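The -lm,n label rule can be sketched in Python as below. This is an illustration under stated assumptions: the function name refer_label is hypothetical, the senior author's last name is taken to be the last blank-separated word of the first %A field, and the two parts are simply concatenated, since this page does not say what punctuation (if any) refer places between them.

```python
def refer_label(author, date, m=None, n=None):
    """Build a -lm,n style label from the senior author and the date.

    Keeps the first m letters of the last name and the last n digits of
    the date; None stands for an omitted m or n (use the whole part).
    """
    surname = author.split()[-1]
    digits = "".join(ch for ch in date if ch.isdigit())
    name_part = surname if m is None else surname[:m]
    date_part = digits if n is None else digits[max(len(digits) - n, 0):]
    return name_part + date_part
```

For the McIlroy reference from pubindex(1), -l3,2 would give McI76 under these assumptions, and omitting both m and n would give McIlroy1976.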
To use your own references, put them in the format described
in pubindex(1). They can be searched more rapidly by running
pubindex(1) on them before using refer; failure to index
results in a linear search.
When refer is used with eqn, neqn or tbl, refer should be
first, to minimize the volume of data passed through pipes.
FILES
/usr/dict/papers directory of default publication lists and
indexes
/usr/lib/refer directory of programs
SEE ALSO
troff(1)
REV(1)    XENIX Text Processing    REV(1)
NAME
rev - reverse lines of a file
SYNTAX
rev [ file ] ...
DESCRIPTION
Rev copies the named files to the standard output, reversing
the order of characters in every line. If no file is specified, the standard input is copied.
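Reversing each line while leaving its line terminator in place can be sketched in Python as follows. The function name is hypothetical, and the choice to keep a trailing newline at the end of the reversed line (rather than moving it to the front) is this sketch's reading of "reversing the order of characters in every line".

```python
def rev(lines):
    """Reverse the characters of each line, keeping any trailing newline."""
    result = []
    for line in lines:
        body = line[:-1] if line.endswith("\n") else line
        result.append(body[::-1] + line[len(body):])
    return result
```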
ROFF(1)    XENIX Text Processing    ROFF(1)
NAME
roff - format text
SYNTAX
roff [ +!:!.
nroff -ror
troff -ror
-8
option
option
]
[-h] file
file
file
DESCRIPTION
Roff formats text according to control lines embedded in the
text in the given files.
Encountering a nonexistent file
terminates printing.
Incoming inter-terminal messages are
turned off during printing.
The optional flag arguments
mean:
+n
Start printing at the first page with number n.
-n
Stop printing at the first page numbered higher than n.
-s
Stop before each page (including the first) to allow
paper manipulation; resume on receipt of an interrupt
signal.
-h
Insert tabs in the output stream to replace spaces
whenever appropriate.
Input consists of intermixed text lines, which contain
information to be formatted, and request lines, which contain instructions about how to format it. Request lines
begin with a distinguished control character, normally a
period.
Output lines may be filled as nearly as possible with words
without regard to input lineation. Line breaks may be
caused at specified places by certain commands, or by the
appearance of an empty input line or an input line beginning
with a space.
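A simplified Python sketch of filling is shown below. It assumes a greedy strategy and the 65-character default line length given for .ll in the Request Summary. Only the two break rules described above are modeled; adjustment, hyphenation and indentation are not, treating an empty line as also producing a blank output line is an assumption, and the name fill is hypothetical.

```python
def fill(input_lines, width=65):
    """Greedily pack words onto output lines of at most `width` characters.

    Input line boundaries are ignored, except that an empty input line or an
    input line beginning with a space forces a break.  Treating an empty line
    as also producing a blank output line is an assumption of this sketch.
    """
    output, words = [], []

    def flush():
        # Emit the pending words as filled lines, then clear them.
        line = ""
        for word in words:
            if line and len(line) + 1 + len(word) > width:
                output.append(line)
                line = word
            else:
                line = f"{line} {word}" if line else word
        if line:
            output.append(line)
        words.clear()

    for raw in input_lines:
        if raw == "" or raw.startswith(" "):
            flush()
            if raw == "":
                output.append("")
        words.extend(raw.split())
    flush()
    return output


print(fill(["now is the", "time for all"], width=12))
# ['now is the', 'time for all']
```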
The capabilities of roff are specified in the attached
Request Summary. Numerical values are denoted there by n or
+n, titles by t, and single characters by c. Numbers
denoted +n may be signed + or -, in which case they signify
relative changes to a quantity, otherwise they signify an
absolute resetting. Missing n fields are ordinarily taken
to be 1, missing t fields to be empty, and c fields to shut
off the appropriate special interpretation.
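The signed-number convention can be sketched in a few lines of Python. The function apply_number is hypothetical. It treats a missing argument as an absolute 1, which the text above describes only as the ordinary case; the Request Summary shows, for example, that .bp without n means '+1'.

```python
def apply_number(current, arg):
    """Interpret a roff +n argument against the current value.

    '+k' or '-k' is a relative change, a bare 'k' is an absolute reset,
    and a missing argument is taken to be 1 (the ordinary case only).
    """
    if not arg:
        return 1
    if arg[0] in "+-":
        return current + int(arg)   # int() accepts the leading sign
    return int(arg)


indent = 0
for request_arg in ["+5", "-2", "8"]:   # .in +5, then .in -2, then .in 8
    indent = apply_number(indent, request_arg)
    print(indent)   # 5, then 3, then 8
```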
Running titles usually appear at top and bottom of every
page. They are set by requests like
.he 'part1'part2'part3'
Part1 is left justified, part2 is centered, and part3 is
right justified on the page. Any % sign in a title is
replaced by the current page number. Any nonblank may serve
as a quote.
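The three-part title layout can be sketched in Python as follows. The sketch assumes a 65-character line by default, places the left part flush left and the right part flush right, and puts the center part as close to the middle as the left part allows. That centering arithmetic and the name format_title are assumptions, not roff's documented algorithm.

```python
def format_title(title, page, width=65):
    """Lay out a three-part roff title such as 'part1'part2'part3'.

    The first character is the quote; '%' in any part becomes the page number.
    """
    quote = title[0]
    parts = (title[1:].split(quote) + ["", "", ""])[:3]
    left, center, right = (p.replace("%", str(page)) for p in parts)
    start = max((width - len(center)) // 2, len(left))   # column for the center part
    gap_right = max(width - start - len(center) - len(right), 0)
    return left + " " * (start - len(left)) + center + " " * gap_right + right


print(repr(format_title("/A/B/C/", 1, width=9)))   # 'A   B   C'
```

Any nonblank character can serve as the quote, so the example uses '/' instead of an apostrophe.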
ASCII tab characters are replaced in the input by a replacement character, normally a space, according to the column
settings given by a .ta command.
(See .tc for how to convert this character on output.)
Automatic hyphenation of filled output is done under control
of .hy. When a word contains a designated hyphenation character, that character disappears from the output and hyphens
can be introduced into the word at the marked places only.
The -mr option of nroff or troff(1) simulates roff to the
greatest extent possible.
FILES
/usr/lib/suftab
suffix hyphenation tables
/tmp/rtm? temporary
NOTES
Roff is the simplest of the text formatting programs, and is
utterly frozen.
REQUEST SUMMARY
Request     Break  Initial   Meaning
.ad         yes    yes       Begin adjusting right margins.
.ar         no     arabic    Arabic page numbers.
.br         yes    -         Causes a line break: the filling of the
                             current line is stopped.
.bl n       yes    -         Insert n blank lines, on a new page if
                             necessary.
.bp +n      yes    n=1       Begin new page and number it n; no n
                             means '+1'.
.cc c       no     c=.       Control character becomes 'c'.
.ce n       yes    -         Center the next n input lines, without
                             filling.
.de xx      no     -         Define parameterless macro to be invoked
                             by request '.xx' (definition ends on a
                             line beginning '..').
.ds         no     -         Double space; same as '.ls 2'.
.ef t       no     t=''''    Even foot title becomes t.
.eh t       no     t=''''    Even head title becomes t.
.fi         yes    yes       Begin filling output lines.
.fo t       no     t=''''    All foot titles are t.
.hc c       no     none      Hyphenation character becomes 'c'.
.he t       no     t=''''    All head titles are t.
.hx         no     -         Title lines are suppressed.
.hy n       no     n=1       Hyphenation is done if n=1, and is not
                             done if n=0.
.ig         no     -         Ignore input lines through a line
                             beginning with '..'.
.in +n      yes    -         Indent n spaces from left margin.
.ix +n      no     -         Same as '.in' but without break.
.li n       no     -         Literal: treat next n lines as text.
.ll +n      no     n=65      Line length including indent is n
                             characters.
.ls +n      yes    n=1       Line spacing set to n lines per output
                             line.
.m1 n       no     n=2       Put n blank lines between the top of
                             page and head title.
.m2 n       no     n=2       n blank lines put between head title
                             and beginning of text on page.
.m3 n       no     n=1       n blank lines put between end of text
                             and foot title.
.m4 n       no     n=3       n blank lines put between the foot
                             title and the bottom of page.
.na         yes    no        Stop adjusting the right margin.
.ne n       no     -         Begin new page if n output lines cannot
                             fit on present page.
.nn +n      no     -         The next n output lines are not
                             numbered.
.n1         no     -         Add 5 to page offset; number lines in
                             margin from 1 on each page.
.n2 n       no     -         Add 5 to page offset; number lines from
                             n; stop if n=0.
.ni +n      no     n=0       Line numbers are indented n.
.nf         yes    no        Stop filling output lines.
.nx file    no     -         Switch input to 'file'.
.of t       no     t=''''    Odd foot title becomes t.
.oh t       no     t=''''    Odd head title becomes t.
.pa +n      yes    n=1       Same as '.bp'.
.pl +n      no     n=66      Total paper length taken to be n lines.
.po +n      no     n=0       Page offset. All lines are preceded by
                             n spaces.
.ro         no     arabic    Roman page numbers.
.sk n       no     -         Produce n blank pages starting next page.
.sp n       yes    -         Insert block of n blank lines, except at
                             top of page.
.ss         yes    yes       Single space output lines, equivalent to
                             '.ls 1'.
.ta n n ..  no     -         Pseudotab settings. Initial tab settings
                             are columns 9 17 25 ...
.tc c       no     space     Tab replacement character becomes 'c'.
.ti +n      yes    -         Temporarily indent next output line n
                             spaces.
.tr cdef..  no     -         Translate c into d, e into f, etc.
.ul n       no     -         Underline the letters and numbers in
                             the next n input lines.
SED(1)
XENIX Text Processing
SED(1)
NAME
sed - stream editor
SYNTAX
sed [ -n ] [ -e script ] [ -f sfile ] [ file ] ...
DESCRIPTION
Sed copies the named files (standard input default) to the
standard output, edited according to a script of commands.
The -f option causes the script to be taken from file sfile;
these options accumulate.
If there is just one -e option
and no -f's, the flag -e may be omitted. The -n option
suppresses the default output.
A script consists of editing commands, one per line, of the
following form:
[address [, address] ] function [ arguments ]
In normal operation sed cyclically copies a line of input
into a pattern space (unless there is something left after a
'D' command), applies in sequence all commands whose
addresses select that pattern space, and at the end of the
script copies the pattern space to the standard output
(except under -n) and deletes the pattern space.
An address is either a decimal number that counts input
lines cumulatively across files, a '$' that addresses the
last line of input, or a context address, '/regular expression/', in the style of ed(1) modified thus:
The escape sequence '\n' matches a newline embedded in
the pattern space.
A command line with no addresses selects every pattern
space.
A command line with one address selects each pattern space
that matches the address.
A command line with two addresses selects the inclusive
range from the first pattern space that matches the first
address through the next pattern space that matches the
second.
(If the second address is a number less than or
equal to the line number first selected, only one line is
selected.) Thereafter the process is repeated, looking again
for the first address.
Editing commands can be applied only to non-selected pattern
spaces by use of the negation function '!' (below).
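The two-address rule, including the numeric second-address special case and the restart after each range, can be modeled as in the Python sketch below. It is an illustration under stated assumptions: Python's re module stands in for ed-style regular expressions, and select_range is a hypothetical name. Negating its result gives the lines that 'addr1,addr2!function' would act on.

```python
import re


def select_range(lines, addr1, addr2):
    """Report which lines the address pair addr1,addr2 selects.

    An address is an int (line number), "$" (last line) or a compiled
    regular expression (standing in for an ed-style context address).
    """
    last = len(lines)

    def matches(addr, lineno, text):
        if isinstance(addr, int):
            return lineno == addr
        if addr == "$":
            return lineno == last
        return addr.search(text) is not None

    selected, in_range = [], False
    for lineno, text in enumerate(lines, start=1):
        if in_range:
            selected.append(True)
            if matches(addr2, lineno, text):      # range ends at the second address
                in_range = False
        elif matches(addr1, lineno, text):
            selected.append(True)
            # A numeric second address <= this line number selects one line only.
            in_range = not (isinstance(addr2, int) and addr2 <= lineno)
        else:
            selected.append(False)
    return selected


lines = ["a", "BEGIN", "x", "END", "y"]
sel = select_range(lines, re.compile("BEGIN"), re.compile("END"))
print([t for t, s in zip(lines, sel) if not s])   # ['a', 'y'] (what '!' acts on)
```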
In the following list of functions the maximum number of
permissible addresses for each function is indicated in
parentheses.
An argument denoted text consists of one or more lines, all
but the last of which end with '\' to hide the newline.
Backslashes in text are treated like backslashes in the
replacement string of an 's' command, and may be used to
protect initial blanks and tabs against the stripping that
is done on every script line.
An argument denoted rfile or wfile must terminate the command line and must be preceded by exactly one blank. Each
wfile is created before processing begins. There can be at
most 10 distinct wfile arguments.
(1)a\
text
Append. Place text on the output before reading the
next input line.
(2)b label
Branch to the ':' command bearing the label. If label
is empty, branch to the end of the script.
(2)c\
text
Change. Delete the pattern space. With 0 or 1 address
or at the end of a 2-address range, place text on the
output. Start the next cycle.
(2)d Delete the pattern space.
Start the next cycle.
(2)D Delete the initial segment of the pattern space through
the first newline. Start the next cycle.
(2)g Replace the contents of the pattern space by the contents of the hold space.
(2)G Append the contents of the hold space to the pattern
space.
(2)h Replace the contents of the hold space by the contents
of the pattern space.
(2)H Append the contents of the pattern space to the hold
space.
(1)i\
text
Insert. Place text on the standard output.
(2)l List the pattern space on the standard output in an
unambiguous form. Non-printing characters are spelled
in two-digit ASCII, and long lines are folded.
(2)n Copy the pattern space to the standard output. Replace
the pattern space with the next line of input.
(2)N Append the next line of input to the pattern space with
an embedded newline.
(The current line number
changes.)
(2)p Print.
Copy the pattern space to the standard output.
(2)P Copy the initial segment of the pattern space through
the first newline to the standard output.
(1)q Quit. Branch to the end of the script. Do not start a
new cycle.
(2)r rfile
Read the contents of rfile.
Place them on the output
before reading the next input line.
(2)s/regular expression/replacement/flags
Substitute the replacement string for instances of the
regular expression in the pattern space.
Any character
may be used instead of '/'. For a fuller description
see ed(1).
Flags is zero or more of
g
Global.
Substitute for all nonoverlapping
instances of the regular expression rather than
just the first one.
p
Print the pattern space if a replacement was made.
w wfile
Write.
Append the pattern space to wfile if a
replacement was made.
(2)t label
Test.
Branch to the ' : ' command bearing the label if
any substitutions have been made since the most recent
reading of an input line or execution of a 't'. If
label is empty, branch to the end of the script.
(2)w wfile
Write.
Append the pattern space to wfile.
(2)x Exchange the contents of the pattern and hold spaces.
(2)y/string1/string2/
Transform. Replace all occurrences of characters in
string1 with the corresponding character in string2.
The lengths of string1 and string2 must be equal.
(2)! function
Don't. Apply the function (or group, if function is
'{') only to lines not selected by the address(es).
(0): label
This command does nothing; it bears a label for 'b' and
't' commands to branch to.
(1)= Place the current line number on the standard output as
a line.
(2){ Execute the following commands through a matching '}'
only when the pattern space is selected.
(0)
An empty command is ignored.
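To show how the pattern space, the hold space and the '!' negation interact, the Python sketch below simulates the three-line script

1!G
h
$!d

which should print its input lines in reverse order. The simulation is hand-written for this one script and is not a general sed interpreter. The name sed_tac is hypothetical.

```python
def sed_tac(lines):
    """Simulate the sed script '1!G', 'h', '$!d' (one command per line).

    Returns the text printed at the end of the last cycle.
    """
    hold = ""                                # hold space starts empty
    printed = []
    for lineno, line in enumerate(lines, start=1):
        pattern = line                       # next input line enters the pattern space
        if lineno != 1:                      # 1!G: append hold space, except on line 1
            pattern += "\n" + hold
        hold = pattern                       # h: copy pattern space to hold space
        if lineno != len(lines):             # $!d: delete, start next cycle (not on $)
            continue
        printed.append(pattern)              # end of script: default output
    return printed


print(sed_tac(["one", "two", "three"]))   # ['three\ntwo\none']
```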
SEE ALSO
ed(1), grep(1), awk(1)
SORT(1)
XENIX Text Processing
SORT(1)
NAME
sort - sort or merge files
SYNTAX
sort [ -mubdfinrtx ] [ +pos1 [ -pos2 ] ] ... [ -o name ]
[ -T directory ] [ name ] ...
DESCRIPTION
Sort sorts lines of all the named files together and writes
the result on the standard output.
The name '-' means the
standard input. If no input files are named, the standard
input is sorted.
The default sort key is an entire line.
Default ordering is
lexicographic by bytes in machine collating sequence.
The
ordering is affected globally by the following options, one
or more of which may appear.
b
Ignore leading blanks (spaces and tabs) in field comparisons.
d
'Dictionary' order: only letters, digits and blanks are
significant in comparisons.
f
Fold upper case letters onto lower case.
i
Ignore characters outside the ASCII range 040-0176 in
nonnumeric comparisons.
n
An initial numeric string, consisting of optional
blanks, optional minus sign, and zero or more digits
with optional decimal point, is sorted by arithmetic
value. Option n implies option b.
r
Reverse the sense of comparisons.
tx
'Tab character' separating fields is x.
The notation +pos1 -pos2 restricts a sort key to a field
beginning at pos1 and ending just before pos2.
Pos1 and
pos2 each have the form m.n, optionally followed by one or
more of the flags bdfinr, where m tells a number of fields
to skip from the beginning of the line and n tells a number
of characters to skip further.
If any flags are present
they override all the global ordering options for this key.
If the b option is in effect n is counted from the first
nonblank in the field; b is attached independently to pos2.
A missing .n means .0; a missing -pos2 means the end of the
line. Under the -tx option, fields are strings separated by
x; otherwise fields are nonempty nonblank strings separated
by blanks.
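The +pos1 -pos2 rule can be sketched in Python for the default case of blank-separated fields. In this sketch, skipping a field means skipping any blanks and then the nonblank characters, so the leading blanks belong to the field; b, when given for either end, skips blanks before the n characters are counted. The -tx separator is not modeled, and _skip and sort_key are hypothetical names.

```python
BLANKS = " \t"


def _skip(line, m, n, b):
    """Index reached after skipping m blank-separated fields and n characters."""
    i = 0
    for _ in range(m):
        while i < len(line) and line[i] in BLANKS:      # blanks lead each field
            i += 1
        while i < len(line) and line[i] not in BLANKS:
            i += 1
    if b:                                               # b: count n from the first nonblank
        while i < len(line) and line[i] in BLANKS:
            i += 1
    return min(i + n, len(line))


def sort_key(line, pos1, pos2=None, b1=False, b2=False):
    """Key text for +m1.n1 -m2.n2, with pos1 and pos2 given as (m, n) pairs.

    A missing pos2 means the end of the line; b1 and b2 attach b to each end.
    """
    start = _skip(line, pos1[0], pos1[1], b1)
    end = len(line) if pos2 is None else _skip(line, pos2[0], pos2[1], b2)
    return line[start:end] if end > start else ""


print(repr(sort_key("apple  banana cherry", (1, 0), (2, 0))))           # '  banana'
print(repr(sort_key("apple  banana cherry", (1, 0), (2, 0), b1=True)))  # 'banana'
```

The second call corresponds roughly to the key +1b -2.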
When there are multiple sort keys, later keys are compared
only after all earlier keys compare equal.
Lines that otherwise compare equal are ordered with all bytes significant.
These option arguments are also understood:
c
Check that the input file is sorted according to the
ordering rules; give no output unless the file is out
of sort.
m
Merge only; the input files are already sorted.
o
The next argument is the name of an output file to use
instead of the standard output.
This file may be the
same as one of the inputs.
T
The next argument is the name of a directory in which
temporary files should be made.
u
Suppress all but one in each set of equal lines.
Ignored bytes and bytes outside keys do not participate
in this comparison.
Examples. Print in alphabetical order all the unique spellings in a list of words.
Capitalized words differ from
uncapitalized.
sort -u +0f +0 list
Print the password file (passwd(5)) sorted by user id number
(the 3rd colon-separated field).
sort -t: +2n /etc/passwd
Print the first instance of each month in an already sorted
file of (month day) entries.
The options -um with just one
input file make the choice of a unique representative from a
set of equal lines predictable.
sort -um +0 -1 dates
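The password-file example can be approximated in Python to show how option n and the final tie-break combine. The regular expression follows the description of the initial numeric string given for option n; treating a string with no digits as zero is an assumption. The last element of the key tuple applies the rule that lines otherwise equal are ordered with all bytes significant. The names numeric_value and passwd_key and the sample entries are hypothetical.

```python
import re

# Initial numeric string for option n: optional blanks, optional minus sign,
# digits with an optional decimal point.
NUMERIC = re.compile(r"[ \t]*(-?[0-9]*(?:\.[0-9]*)?)")


def numeric_value(text):
    """Arithmetic value of the initial numeric string; no digits counts as 0."""
    try:
        return float(NUMERIC.match(text).group(1))
    except ValueError:          # '', '-', '.' and '-.' contain no digits
        return 0.0


def passwd_key(line):
    """Key for 'sort -t: +2n': skip two ':'-separated fields, compare the rest
    numerically; lines that tie are then compared character by character."""
    fields = line.split(":", 2)
    rest = fields[2] if len(fields) == 3 else ""
    return (numeric_value(rest), line)


passwd = ["root:x:0:0::/:", "bin:x:3:3::/bin:", "adm:x:10:4::/usr/adm:", "daemon:x:1:1::/:"]
for entry in sorted(passwd, key=passwd_key):
    print(entry)
```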
FILES
/usr/tmp/stm*, /tmp/*: first and second tries for temporary
files
SEE ALSO
uniq(1), comm(1), rev(1), join(1)
DIAGNOSTICS
Comments and exits with nonzero status for various trouble
conditions and for disorder discovered under option -c.
NOTES
Very long lines are silently truncated.
SPELL(1)
XENIX Text Processing
SPELL(1)
NAME
spell, spellin, spellout - find spelling errors
SYNTAX
spell [ option ] ... [ file ] ...
/usr/src/cmd/spell/spellin [ list ]
/usr/src/cmd/spell/spellout [ -d ] list
DESCRIPTION
Spell collects words from the named documents, and looks
them up in a spelling list. Words that neither occur among
nor are derivable (by applying certain inflections, prefixes
or suffixes) from words in the spelling list are printed on
the standard output.
If no files are named, words are collected from the standard input.
Spell ignores most troff, tbl and eqn(1) constructions.
Under the -v option, all words not literally in the spelling
list are printed, and plausible derivations from spelling
list words are indicated.
Under the -b option, British spelling is checked. Besides
preferring centre, colour, speciality, travelled, etc., this
option insists upon -ise in words like standardise, Fowler
and the OED to the contrary notwithstanding.
Under the -x option, every plausible stem is printed with
'=' for each word.
The spelling list is based on many sources, and while more
haphazard than an ordinary dictionary, is also more effective in respect to proper names and popular technical words.
Coverage of the specialized vocabularies of biology, medicine and chemistry is light.
Pertinent auxiliary files may be specified by name arguments, indicated below with their default settings. Copies
of all output are accumulated in the history file. The stop
list filters out misspellings (e.g., thier=thy-y+ier) that
would otherwise pass.
Two routines help maintain the hash lists used by spell.
Both expect a list of words, one per line, from the standard
input.
Spellin adds the words on the standard input to the
preexisting list and places a new list on the standard output.
If no list is specified, the new list is created from
scratch. Spellout looks up each word in the standard input
and prints on the standard output those that are missing
from (or present on, with option -d) the hash list.
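The division of labor between spellin and spellout can be sketched in Python. The real commands read words from the standard input and operate on hashed list files; in this sketch a Python set stands in for the hash list, and the functions spellin and spellout are hypothetical stand-ins named after the commands.

```python
def spellin(words, existing=None):
    """Return a new list: the existing list (or an empty one) plus `words`."""
    new_list = set(existing or ())
    new_list.update(words)
    return new_list


def spellout(words, word_list, d=False):
    """Words missing from word_list, or with d=True, the words present on it."""
    return [w for w in words if (w in word_list) == d]


hlist = spellin(["cat", "dog"])          # built from scratch
hlist = spellin(["cow"], hlist)          # added to the preexisting list
print(spellout(["cat", "gnu"], hlist))           # ['gnu']
print(spellout(["cat", "gnu"], hlist, d=True))   # ['cat']
```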
FILES
D=/usr/dict/hlist[ab]: hashed spelling lists, American &
British
S=/usr/dict/hstop: hashed stop list
H=/usr/dict/spellhist: history file
/usr/lib/spell
SEE ALSO
deroff(1), sort(1), tee(1), sed(1)
NOTES
The spelling list's coverage is uneven; new installations
will probably wish to monitor the output for several months
to gather local additions.
British spelling was done by an American.
SPLIT(1)
XENIX Text Processing
SPLIT(1)
NAME
split - split a file into pieces
SYNTAX
split [ -n ] [ file [ name ] ]
DESCRIPTION
Split reads file and writes it in n-line pieces (default
1000), as many as necessary, onto a
set of output files.
The name of the first output file is name with aa appended,
and so on lexicographically. If no output name is given, x
is default.
If no input file is given, or if - is given in its stead,
then the standard input file is used.
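The naming scheme and the n-line grouping can be sketched in Python. The sketch assumes two appended letters (aa through zz, 676 names in all) and raises an error rather than silently dropping input if the names run out. The names piece_names and split_lines are hypothetical, and the sketch returns the pieces instead of writing files.

```python
from itertools import islice
from string import ascii_lowercase


def piece_names(name="x"):
    """Yield name+'aa', name+'ab', ..., name+'zz' in lexicographic order."""
    for first in ascii_lowercase:
        for second in ascii_lowercase:
            yield name + first + second


def split_lines(lines, n=1000, name="x"):
    """Group lines into n-line pieces; return a list of (file name, lines) pairs."""
    source = iter(lines)
    pieces = []
    for piece in piece_names(name):
        chunk = list(islice(source, n))
        if not chunk:
            break
        pieces.append((piece, chunk))
    else:
        # All 676 two-letter suffixes used; refuse to drop remaining input.
        if next(source, None) is not None:
            raise ValueError("input needs more than 676 pieces")
    return pieces


for piece, chunk in split_lines([f"line {i}\n" for i in range(5)], n=2):
    print(piece, len(chunk))   # xaa 2, xab 2, xac 1
```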
WARNING
1000 lines is usually less than 19 pages.
Lpr does not guarantee that it prints the files in the order
given.
SEE ALSO
lpr(1), wc(1)
TAIL(1)
XENIX Text Processing
TAIL(1)
NAME
tail - deliver the last part of a file
SYNTAX
tail ±number[lbc] [ file ]
DESCRIPTION
Tail copies the named file to the standard output beginning
at a designated place.
If no file is named, the standard
input is used.
Copying begins at distance +number from the beginning, or
-number from the end of the input. Number is counted in
units of lines, blocks or characters, according to the
appended option l, b or c. When no units are specified,
counting is by lines.
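The two directions of counting can be sketched in Python for the line-unit case. The sketch assumes that +number starts copying at line number (so +1 copies everything). For -number it keeps a bounded buffer of only the last -number lines, which echoes the buffer mentioned in the NOTES below, although the real buffer has its own fixed limit. The name tail_lines is hypothetical.

```python
from collections import deque
from itertools import islice


def tail_lines(lines, number):
    """number > 0: copy from line `number` onward (tail +number).
    number <= 0: copy the last -number lines (tail -number)."""
    if number > 0:
        return list(islice(lines, number - 1, None))
    return list(deque(lines, maxlen=-number))   # keeps at most -number lines


data = ["1\n", "2\n", "3\n", "4\n", "5\n"]
print(tail_lines(data, -2))   # ['4\n', '5\n']
print(tail_lines(data, 4))    # ['4\n', '5\n']
```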
SEE ALSO
dd(1)
NOTES
Tails relative to the end of the file are treasured up in a
buffer, and thus are limited in length. Various kinds of
anomalous behavior may happen with character special files.
TBL(1)
XENIX Text Processing
TBL(1)
NAME
tbl - format tables for nroff or troff
SYNTAX
tbl [ files ] ...
DESCRIPTION
Tbl is a preprocessor for formatting tables for nroff or
troff(1). The input files are copied to the standard output, except for lines between .TS and .TE command lines,
which are assumed to describe tables and are reformatted.
Details are given in the reference manual.
As an example, letting \t represent a tab (which should be
typed as a genuine tab) the input
.TS
c s s
c c s
c c c
l n n.
Household Population
Town\tHouseholds
\tNumber\tSize
Bedminster\t789\t3.26
Bernards Twp.\t3087\t3.74
Bernardsville\t2018\t3.30
Bound Brook\t3425\t3.04
Branchburg\t1644\t3.49
Bridgewater\t7897\t3.81
Far Hills\t240\t3.19
.TE
yields
    Household Population
    Town         Households
                Number   Size
Bedminster         789   3.26
Bernards Twp.     3087   3.74
Bernardsville     2018   3.30
Bound Brook       3425   3.04
Branchburg        1644   3.49
Bridgewater       7897   3.81
Far Hills          240   3.19
If no arguments are given, tbl reads the standard input, so
it may be used as a filter. When it is used with eqn or
neqn the tbl command should be first, to minimize the volume
of data passed through pipes.
SEE ALSO
troff(1), eqn(1)
M. E. Lesk, TBL.
TR(1)
XENIX Text Processing
TR(1)
NAME
tr - translate characters
SYNTAX
tr [ -cds ] [ string1 [ string2 ] ]
DESCRIPTION
Tr copies the standard input to the standard output with
substitution or deletion of selected characters. Input
characters found in string1 are mapped into the corresponding characters of string2. When string2 is short it is padded to the length of string1 by duplicating its last character. Any combination of the options -cds may be used: -c
complements the set of characters in string1 with respect to
the universe of characters whose ASCII codes are 01 through
0377 octal; -d deletes all input characters in string1; -s
squeezes all strings of repeated output characters that are
in string2 to single characters.
In either string the notation a-b means a range of characters from a to b in increasing ASCII order. The character
'\' followed by 1, 2 or 3 octal digits stands for the character whose ASCII code is given by those digits. A '\' followed by any other character stands for that character.
The following example creates a list of all the words in
'file1' one per line in 'file2', where a word is taken to be
a maximal string of alphabetics. The second string is
quoted to protect '\' from the Shell. 012 is the ASCII code
for newline.
tr -cs A-Za-z '\012' <file1 >file2
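The rules above (ranges, octal escapes, padding string2 with its last character, and the -c, -d and -s options) can be modeled together in a short Python sketch. It follows this page's description rather than tr's source. Ranges whose endpoints are themselves escapes are not handled, and expand and tr are hypothetical names. The final call reproduces the word-list example on a small string.

```python
import re


def expand(spec):
    """Expand a tr string into a list of characters.

    Handles a-b ranges and a backslash followed by 1 to 3 octal digits; a
    backslash before any other character stands for that character.
    """
    chars, i = [], 0
    while i < len(spec):
        if spec[i] == "\\" and i + 1 < len(spec):
            octal = re.match(r"[0-7]{1,3}", spec[i + 1:])
            if octal:
                chars.append(chr(int(octal.group(), 8)))
                i += 1 + len(octal.group())
            else:
                chars.append(spec[i + 1])
                i += 2
        elif i + 2 < len(spec) and spec[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return chars


def tr(text, string1, string2="", c=False, d=False, s=False):
    """Translate, delete (-d) and squeeze (-s) characters as described above."""
    set1 = expand(string1)
    if c:  # -c: complement with respect to codes 001 through 0377
        excluded = set(set1)
        set1 = [chr(x) for x in range(1, 0o400) if chr(x) not in excluded]
    set2 = expand(string2)
    if set2 and len(set2) < len(set1):   # pad string2 with its last character
        set2 += [set2[-1]] * (len(set1) - len(set2))
    table = dict(zip(set1, set2))
    members, squeeze = set(set1), set(set2)
    out = []
    for ch in text:
        if ch == "\0":
            continue                      # NUL is always deleted from the input
        if ch in members:
            if d:
                continue                  # -d: drop characters of string1
            ch = table.get(ch, ch)
        if s and out and ch == out[-1] and ch in squeeze:
            continue                      # -s: squeeze repeated string2 characters
        out.append(ch)
    return "".join(out)


# The word-list example, tr -cs A-Za-z '\012', prints Hello, world and bye
# on separate lines.
print(tr("Hello, world!\nbye", "A-Za-z", "\\012", c=True, s=True))
```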
SEE ALSO
ed(1), ascii(7)
NOTES
Won't handle ASCII NUL in string1 or string2; always deletes
NUL from input.
TROFF(1)
XENIX Text Processing
TROFF(1)
NAME
troff, nroff - text formatting and typesetting
SYNTAX
troff [ option ] ... [ file ] ...
nroff [ option ] ... [ file ] ...
DESCRIPTION
Troff formats text in the named files for printing on a
Graphic Systems C/A/T phototypesetter; nroff for
typewriter-like devices. Their capabilities are described
in the Nroff/Troff user's manual.
If no file argument is present, the standard input is read.
An argument consisting of a single minus (-) is taken to be
a file name corresponding to the standard input. The
options, which may appear in any order so long as they
appear before the files, are:
-olist Print only pages whose page numbers appear in the
comma-separated list of numbers and ranges. A range
N-M means pages N through M; an initial -N means from
the beginning to page N; and a final N- means from N
to the end.
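The -o list syntax can be modeled as a page-number predicate. In this Python sketch parse_olist is a hypothetical name; it simply encodes the three range forms described above.

```python
def parse_olist(spec):
    """Turn a troff -o list such as '1,3-5,9-' into a page-number predicate."""
    ranges = []
    for item in spec.split(","):
        if "-" in item:
            low, high = item.split("-", 1)
            ranges.append((int(low) if low else None, int(high) if high else None))
        else:
            ranges.append((int(item), int(item)))

    def wanted(page):
        # None on either side means "from the beginning" or "to the end".
        return any((low is None or page >= low) and (high is None or page <= high)
                   for low, high in ranges)

    return wanted


keep = parse_olist("1,3-5,9-")
print([page for page in range(1, 12) if keep(page)])   # [1, 3, 4, 5, 9, 10, 11]
```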
-nN
Number first generated page N.
-sN
Stop every N pages. Nroff will halt prior to every N
pages (default N=1) to allow paper loading or changing, and will resume upon receipt of a newline.
Troff will stop the phototypesetter every N pages,
produce a trailer to allow changing cassettes, and
resume when the typesetter's start button is pressed.
-mname Prepend the macro file /usr/lib/tmac/tmac.name to the
input files.
-raN
Set register a (one-character) to N.
-i
Read standard input after the input files are
exhausted.
-q
Invoke the simultaneous input-output mode of the rd
request.
Nroff only
-Tname Prepare output for specified terminal.
Known names
are 37 for the (default) Teletype Corporation Model
37 terminal, tn300 for the GE TermiNet 300 (or any
terminal without half-line capability), 300S for the
DASI-300S, 300 for the DASI-300, and 450 for the
DASI-450 (Diablo Hyterm).
-e
Produce equally-spaced words in adjusted lines, using
full terminal resolution.
-h
Use output tabs during horizontal spacing to speed
output and reduce output character count. Tab settings are assumed to be every 8 nominal character
widths.
Troff only
-t
Direct output to the standard output instead of the
phototypesetter.
-f
Refrain from feeding out paper and stopping phototypesetter at the end of the run.
-w
Wait until phototypesetter is available, if currently
busy.
-b
Report whether the phototypesetter is busy or available. No text processing is done.
-a
Send a printable ASCII approximation of the results
to the standard output.
-pN
Print all characters in point size N while retaining
all prescribed spacings and motions, to reduce phototypesetter elapsed time.
-g
Prepare output for a GCOS phototypesetter and direct
it to the standard output (see gcat(1)).
If the file /usr/adm/tracct is writable, troff keeps phototypesetter accounting records there. The integrity of that
file may be secured by making troff a 'set user-id' program.
FILES
/usr/lib/suftab        suffix hyphenation tables
/tmp/ta*               temporary file
/usr/lib/tmac/tmac.*   standard macro files
/usr/lib/term/*        terminal driving tables for nroff
/usr/lib/font/*        font width tables for troff
/dev/cat               phototypesetter
/usr/adm/tracct        accounting statistics for /dev/cat
SEE ALSO
J. F. Ossanna, Nroff/Troff user's manual
B. W. Kernighan, A TROFF Tutorial
eqn(1), tbl(1)
col(1), tk(1) (nroff only)
tc(1), gcat(1) (troff only)
UNIQ(1)
XENIX Text Processing
UNIQ(1)
NAME
uniq - report repeated lines in a file
SYNTAX
uniq [ -udc [ +n ] [ -n ] ] [ input [ output ] ]
DESCRIPTION
Uniq reads the input file comparing adjacent lines. In the
normal case, the second and succeeding copies of repeated
lines are removed; the remainder is written on the output
file. Note that repeated lines must be adjacent in order to
be found; see sort(1). If the -u flag is used, just the
lines that are not repeated in the original file are output.
The -d option specifies that one copy of just the repeated
lines is to be written. The normal mode output is the union
of the -u and -d mode outputs.
The -c option supersedes -u and -d and generates an output
report in default style but with each line preceded by a
count of the number of times it occurred.
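The four output modes can be sketched in Python with itertools.groupby, which, like uniq, only merges adjacent equal lines. Field and character skipping are not modeled. The name uniq and the four-column count width used for mode 'c' are assumptions of this sketch.

```python
from itertools import groupby


def uniq(lines, mode=None):
    """Collapse runs of adjacent equal lines.

    mode None: one copy of each run; 'u': only lines that are not repeated;
    'd': one copy of repeated lines only; 'c': every run, preceded by its count.
    """
    out = []
    for line, run in groupby(lines):      # groupby only merges adjacent lines
        count = sum(1 for _ in run)
        if (mode == "u" and count > 1) or (mode == "d" and count == 1):
            continue
        out.append(f"{count:4d} {line}" if mode == "c" else line)
    return out


data = ["a", "a", "b", "c", "c", "c", "a"]
print(uniq(data))        # ['a', 'b', 'c', 'a']
print(uniq(data, "u"))   # ['b', 'a']
print(uniq(data, "d"))   # ['a', 'c']
```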
The n arguments specify skipping an initial portion of each
line in the comparison:
-n
The first n fields together with any blanks before
each are ignored. A field is defined as a string of
non-space, non-tab characters separated by tabs and
spaces from its neighbors.
+n
The first n characters are ignored.
Fields are skipped before characters.
SEE ALSO
sort(1), comm(1)
VI (UCB)
XENIX Text Processing
VI (UCB)
NAME
vi - screen oriented (visual) display editor based on ex
SYNTAX
vi [ -t tag ] [ -r ] [ +lineno ] name ...
DESCRIPTION
Vi (visual) is a display oriented text editor based on
ex(UCB).
Ex and vi run the same code; it is possible to get
to the command mode of ex from within vi and vice-versa.
The Vi Quick Reference card and the Introduction to Display
Editing with Vi provide full details on using vi.
FILES
See ex(UCB).
SEE ALSO
ex(UCB), vi(UCB), "Vi Quick Reference" card,
"An Introduction to Display Editing with Vi".
NOTES
Scans with / and ? begin on the next line, skipping the
remainder of the current line.
Software tabs using ^T work only immediately after the
autoindent.
Left and right shifts on intelligent terminals don't make
use of insert and delete character operations in the terminal.
The wrapmargin option can be fooled since it looks at output
columns when blanks are typed.
If a long word passes
through the margin and onto the next line without a break,
then the line won't be broken.
Insert/delete within a line can be slow if tabs are present
on intelligent terminals, since the terminals need help in
doing this correctly.
Occasionally inverse video scrolls up into the file from a
diagnostic on the last line.
Saving text on deletes in the named buffers is somewhat
inefficient.
The source command does not work when executed as :source;
there is no way to use the :append, :change, and :insert
commands, since it is not possible to give more than one
line of input to a : escape.
To use these on a :global you
must Q to ex command mode, execute them, and then reenter
the screen editor with vi or open.
WC(1)
XENIX Text Processing
WC(1)
NAME
wc - word count
SYNTAX
wc [ -lwc ] [ name ... ]
DESCRIPTION
Wc counts lines, words and characters in the named files, or
in the standard input if no name appears. A word is a maximal string of characters delimited by spaces, tabs or newlines.
If the optional argument is present, just the specified
counts (lines, words or characters) are selected by the
letters l, w, or c.
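The three counts can be sketched in Python on raw bytes. Following the text above, lines are counted as newline characters and a word is a maximal run of bytes other than space, tab and newline. The function name wc is hypothetical.

```python
import re


def wc(data: bytes):
    """Count lines, words and characters the way the text above describes.

    Lines are newline characters; a word is a maximal run of bytes other than
    space, tab and newline; characters are counted as bytes.
    """
    words = [w for w in re.split(rb"[ \t\n]+", data) if w]
    return data.count(b"\n"), len(words), len(data)


print(wc(b"one two\nthree\n"))   # (2, 3, 14)
```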