TECHNOLOGY INCORPORATED

XENIX™ SYSTEM
TEXT PROCESSING

VOLUME 3

CONTENTS
1.0 INTRODUCTION

2.0 USING THE TEXT EDITORS ED AND SED
    2.1 ED
        2.1.1 A Summary of Commands and Line Numbers
        2.1.2 More Advanced Editing Techniques
        2.1.3 Editing Scripts
    2.2 SED
        2.2.1 Overall Operation
        2.2.2 Command-line Flags
        2.2.3 Order of Application of Editing Commands
        2.2.4 Pattern-space
        2.2.5 Addresses
        2.2.6 Functions

3.0 PATTERN RECOGNITION AND FILE COMPARISON UTILITIES
    3.1 GREP
    3.2 AWK
    3.3 DIFF
    3.4 DIFF3
    3.5 COMM
    3.6 SPELL

4.0 TEXT FORMATTING AND DOCUMENT PREPARATION
    4.1 FORMATTING PACKAGES
    4.2 SUPPORTING TOOLS
    4.3 HINTS FOR PREPARING DOCUMENTS
    4.4 A NOTE ABOUT THE PAPERS
        4.4.1 Using the -ms Macros with Troff and Nroff
        4.4.2 A Guide to Preparing Documents with -ms
        4.4.3 NROFF/TROFF User's Manual
        4.4.4 A TROFF Tutorial
        4.4.5 Tbl - A Program to Format Tables
        4.4.6 Typesetting Mathematics - User's Guide
        4.4.7 Some Applications of Inverted Indexes

5.0 COMMAND REFERENCE

CHAPTER 1

INTRODUCTION

Users involved in text processing applications such as typing memos, writing technical reports, and preparing documentation will soon discover that their primary interface with the computer is through the editors, the various pattern recognition and file comparison utilities, and the text formatting packages. Programmers also make extensive use of the editors and other utilities described in this volume for writing and revising code. Therefore, it is extremely important that all users learn as much as possible about the tools available to them on the XENIX system, and practice using the various commands and functions. The better the user understands which functions work best in which situations, and the more dexterity the user develops in using particular commands, the more powerful the editors and related tools become.

This volume contains an introduction to the XENIX text editors, ed and sed. For more detailed tutorial material on the XENIX text editors, read the appropriate sections in The Programmer's Introduction.

Also introduced in this volume are some tools which prove extremely useful in preparing documents: when it is necessary to locate repeated elements in a single file or group of files in order to make a consistent set of changes, or to compare two or more files in order to identify the differences between them. Because several of these programs may be used interchangeably, knowing which one will do the job at hand most efficiently is a large part of understanding their use. These programs streamline complicated editing command procedures, locate variations among several versions of text, and can deal with large numbers of text files at once. The tools covered are:
• members of the grep family: grep, egrep, and fgrep
• awk
• diff and diff3
• comm
• spell


XENIX Text Processing

The XENIX system also offers two text formatting packages, nroff and troff, which simplify the production of technical reports, memoranda, formal papers, and documentation; they are designed to produce output for the line printer and the typesetter, respectively. The -ms package, a canned set of formatting requests which is much simpler to use than the underlying nroff and troff requests, is described in detail. Some supporting programs that aid in document preparation are also discussed in this volume: eqn, which integrates mathematical symbols and equations into the text of a document; tbl, which provides an analogous service for tabular material; and refer, which prepares bibliographic citations from a data base.


CHAPTER 2

USING THE TEXT EDITORS ED AND SED

Most users of a computer system rely heavily on text editors in doing their work, whether it be writing programs or preparing data. For users involved in text processing applications, such as typing memos, writing technical reports, or preparing documentation, the various editing functions may be their primary interface with the computer. Therefore, it is extremely important that the text processing user learn as much as possible about the editing tools available on the system, and practice using the various commands and functions. The better the user understands which functions work best in which situations, and the more dexterity the user develops in using particular commands, the more powerful the editors and related tools become. For a more detailed introduction to text editing with XENIX, read the appropriate sections in The Programmer's Introduction.

XENIX offers two text editors: ed, an interactive line editor, and sed, a non-interactive context editor. Although the capabilities of these two editors overlap in many respects, the user will soon find that ed is more appropriate for on-the-spot entry, deletion, and simple modification of text. Sed is more appropriate when uniform changes must be made in large files or groups of files, or when the sequence of editing commands needed to make the changes becomes complex.

Because sed is derived from ed, however, the two editors share some characteristics. In particular, they recognize the same class of regular expressions. A regular expression specifies a set of character strings to be matched in the text; when it is used to select lines, it is sometimes referred to as a context address. In practical terms, regular expressions are the patterns the user asks the editor to search for and substitute when changes in text are required. Regular expressions are formed according to the following rules:

1. An ordinary character (not one of those discussed below) is a regular expression, and matches that character.

2. A circumflex '^' at the beginning of a regular expression matches the null character at the beginning of a line.

3. A dollar-sign '$' at the end of a regular expression matches the null character at the end of a line.

4. The characters '\n' match an imbedded newline character, but not the newline at the end of the pattern space.

5. A period '.' matches any character except the terminal newline of the pattern space.

6. A regular expression followed by an asterisk '*' matches any number (including 0) of adjacent occurrences of the regular expression it follows.

7. A string of characters in square brackets '[ ]' matches any character in the string, and no others. If, however, the first character of the string is circumflex '^', the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.

8. A concatenation of regular expressions is a regular expression which matches the concatenation of strings matched by the components of the regular expression.

9. A regular expression between the sequences '\(' and '\)' is identical in effect to the unadorned regular expression, but has side-effects which are described under the s command below and in rule 10 immediately below.

10. The expression '\d' means the same string of characters matched by an expression enclosed in '\(' and '\)' earlier in the same pattern. Here d is a single digit; the string specified is that beginning with the dth occurrence of '\(' counting from the left. For example, the expression '^\(.*\)\1' matches a line beginning with two repeated occurrences of the same string.

11. The null regular expression standing alone (e.g., '//') is equivalent to the last regular expression compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match an occurrence of itself in the input), precede the special character by a backslash '\'.

For a context address to 'match' the input, the whole pattern within the address must match some portion of the pattern space. The use of these patterns in specific applications within ed and sed is discussed in detail for each editor.
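The rules above can be tried out with a modern regular-expression library. The following is a minimal Python sketch, not part of XENIX: Python's re module writes groups as (...) where ed and sed write \(...\), and its '.' does not match a newline unless re.DOTALL is given, so each ed/sed pattern below has been translated by hand. The table RULE_EXAMPLES and the helper first_match are illustrative names only.

```python
import re

# Hand translations of ed/sed patterns into Python's re syntax.
# Each entry: (ed/sed pattern, Python pattern, sample line, expected match)
RULE_EXAMPLES = [
    ("^In", r"^In", "In Xanadu did Kubla Khan", "In"),           # rule 2
    (r"sea\.$", r"sea\.$", "Down to a sunless sea.", "sea."),     # rule 3, escaped '.'
    ("K.bla", r"K.bla", "In Xanadu did Kubla Khan", "Kubla"),     # rule 5
    ("ab*c", r"ab*c", "xacx abbbc", "ac"),                        # rule 6: zero b's
    ("[^a-z ]", r"[^a-z ]", "in Xanadu", "X"),                    # rule 7: negated class
    (r"^\(.*\)\1", r"^(.*)\1", "abcabcx", "abcabc"),              # rules 9 and 10
]


def first_match(python_pattern, line):
    """Return the leftmost text matched by python_pattern in line, or None."""
    m = re.search(python_pattern, line)
    return m.group(0) if m else None


for ed_pattern, py_pattern, line, expected in RULE_EXAMPLES:
    found = first_match(py_pattern, line)
    print(f"{ed_pattern:12} matches {found!r} in {line!r}")
    assert found == expected
```

Note that '^\(.*\)\1' also matches a line such as 'xyz', because '.*' may match zero characters: first_match(r"^(.*)\1", "xyz") returns the empty string.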


2.1 ED

Ed is one of the text editors on the XENIX system, used primarily to create and modify text interactively, whether it is a document, a program, or data for a program. The most frequently used commands are summarized here, followed by a discussion of editing techniques especially useful in text processing applications.

2.1.1 A Summary of Commands and Line Numbers

The general form of ed commands is the command name, perhaps preceded by one or two line numbers, and, in the case of e, r, and w, followed by a filename. Only one command is allowed per line, but a p command may follow any other command (except for e, r, w, and q).
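As a rough model of this command form, the Python sketch below splits a command line into its optional line numbers, its one-letter command name, and whatever follows (a pattern, a filename, or a suffix such as p). The grammar is a simplification assumed here: only numeric, '.' and '$' addresses are recognized, and context-search addresses and expressions such as .+1 are not handled. The names ED_COMMAND and parse_ed_command are hypothetical.

```python
import re

# Hypothetical, simplified grammar: [address[,address]][command][argument]
# Addresses are only line numbers, '.' (dot) and '$'.
ED_COMMAND = re.compile(
    r"^(?P<first>\d+|[.$])?"             # optional first line number
    r"(?:,(?P<second>\d+|[.$]))?"        # optional second line number
    r"(?P<name>[acdefgilmpqrsuvw=!])?"   # one-letter command name
    r"(?P<rest>.*)$"                     # pattern, filename, or suffix such as p
)


def parse_ed_command(text):
    """Split an ed command line into first, second, name and rest."""
    parts = ED_COMMAND.match(text).groupdict()
    if parts["name"] is None:
        if parts["rest"]:
            raise ValueError(f"unknown command: {text!r}")
        # A bare line number, e.g. 5, acts like 5p. (An empty line, which
        # ed treats as .+1p, also lands here and is not modeled specially.)
        parts["name"] = "p"
    if parts["name"] in "efrw":
        parts["rest"] = parts["rest"].strip()   # e, f, r and w take a filename
    return parts


print(parse_ed_command("1,$s/mispell/misspell/g"))
print(parse_ed_command("w memo"))
```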

a

Append, that is, add lines to the buffer (after line dot, unless a different line is specified). Appending continues until a period is typed by itself on a new line. The value of dot is set to the last line appended.

c

Change the specified lines to the new text which follows. The new lines are terminated by a period on a new line, as with a. If no lines are specified, replace line dot. Dot is set to the last line changed.

d

Delete the lines specified. If none are specified, delete line dot. Dot is set to the first undeleted line, unless $ is deleted, in which case dot is set to $.

e

Edit new file. Any previous contents of the buffer are
thrown away, so issue a w beforehand.

f

Print remembered filename. If a name follows f, then the remembered name is set to it.

g

The command

g/---/commands

will execute the commands on those lines that contain ---, which can be any context search expression.

i

Insert lines before the specified line (or dot) until a single period is typed on a new line. Dot is set to the last line inserted.


m

Move lines specified to after the line named after m. Dot is set to the last line moved.

p

Print specified lines. If none are specified, print the line specified by dot. A single line number is equivalent to that line number followed by p. A RETURN typed by itself prints .+1, the next line.

q

Quit ed. Wipes out all text in the buffer if given twice in a row without a w command.

r

Read a file into the buffer (at the end unless specified elsewhere). Dot is set to the last line read.

s

The command

s/string1/string2/

replaces the characters string1 with string2 in the specified lines. If no lines are specified, the substitution takes place only on the line specified by dot. Dot is set to the last line in which a substitution took place, which means that if no substitution takes place, dot remains unchanged. s changes only the first occurrence of string1 on a line; to change all of them, type a g after the final slash.
v

The command

v/---/commands

executes commands on those lines that do not contain ---.

w

Write out buffer onto a file.

.=

Print value of dot. (An equal sign by itself prints the value of $.) Dot remains unchanged.

The line
!command-line
causes command-line to be executed as a XENIX command.
/string/

Context search. Search for the next line which contains this string of characters, and print it. Dot is set to the line where string was found. The search starts at .+1, wraps around from $ to 1, and continues to dot, if necessary.


?string?

Context search in the reverse direction. Start the search at .-1, scan to 1, and wrap around to $.
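The wrap-around rule for context searches is easy to misremember. The Python sketch below models it for a buffer held as a list of lines; context_search is a hypothetical helper, and the pattern is a Python regular expression rather than an ed one. Where ed reports a failed search as an error, the sketch simply returns None.

```python
import re


def context_search(buffer, dot, pattern, forward=True):
    """Return the 1-based number of the line found by /pattern/ or ?pattern?.

    A forward search starts at .+1, wraps from $ to 1 and ends at dot;
    a reverse search starts at .-1, wraps from 1 to $ and ends at dot.
    Returns None if no line matches.
    """
    n = len(buffer)
    step = 1 if forward else -1
    for k in range(1, n + 1):
        number = (dot - 1 + step * k) % n + 1   # wraps past either end
        if re.search(pattern, buffer[number - 1]):
            return number
    return None


buffer = ["alpha", "beta", "gamma", "delta", "beta again"]
dot = context_search(buffer, 2, "beta")          # /beta/ with dot at line 2
print(dot, buffer[dot - 1])                      # 5 beta again
```

Because the last line examined is dot itself, a search for a string that occurs only on the current line still succeeds and leaves dot where it was.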
2.1.2 More Advanced Editing Techniques

There are often several alternative procedures for accomplishing the same editing task, with varying degrees of efficiency. This section provides explanations and examples of how to use ed to edit with less effort and greater speed. Topics covered include:

• Special characters in search and substitute commands
• Line addressing
• Global commands
• Line moving
• Line copying

2.1.2.1 Special Characters

There are several special characters which facilitate searching and substitution in ed.

The List Command 'l'

Ed provides two commands for printing the contents of lines. One of these is p, in combinations like

1,$p

to print all the lines in the file, or

s/abc/def/p

to change 'abc' to 'def' on the current line. Less familiar is the list command l (the letter 'l'), which gives slightly more information than p. In particular, l makes visible characters that are normally invisible, such as tabs and backspaces: in a line which contains these characters, l prints each tab and each backspace as a distinct visible symbol. This makes it much easier to correct typing mistakes when extra spaces are adjacent to tabs, or backspaces are followed by a space.

The l command also 'folds' long lines, by printing any line that exceeds 72 characters on multiple lines; each printed line except the last is terminated by a backslash '\' to indicate that it was folded. This keeps long lines readable despite the limited width of the terminal screen.

Occasionally, the l command will print in a line a string of numbers preceded by a backslash, such as \07 or \16. These make visible characters that normally do not print, like form feed, vertical tab, or bell, usually typed in error. Each such combination represents a single character; these characters have special meanings on some terminals.
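The following Python sketch approximates what l does to a single line. It is an illustration under stated assumptions, not ed's exact output: tabs and backspaces are shown with placeholder marks passed as arguments (the symbols ed actually prints are not reproduced here), other nonprinting characters are shown as a backslash followed by their octal code (so bell, character 7, appears as \07), and folded pieces are cut at 71 characters so that each printed piece, including its trailing backslash, fits in 72 columns. The function name list_line is hypothetical.

```python
def list_line(line, width=72, tab_mark=">", backspace_mark="<"):
    """Approximate ed's l command for one line; returns the printed pieces."""
    pieces = []
    for ch in line:
        if ch == "\t":
            pieces.append(tab_mark)              # placeholder for the tab symbol
        elif ch == "\b":
            pieces.append(backspace_mark)        # placeholder for the backspace symbol
        elif ord(ch) < 0o40 or ord(ch) == 0o177:
            pieces.append("\\%02o" % ord(ch))    # e.g. bell (7) becomes \07
        else:
            pieces.append(ch)
    visible = "".join(pieces)
    if len(visible) <= width:                    # only lines over 72 columns fold
        return [visible]
    folded, current, length = [], [], 0
    for piece in pieces:                         # never split an escape such as \07
        if length + len(piece) > width - 1:      # leave room for the trailing '\'
            folded.append("".join(current) + "\\")
            current, length = [], 0
        current.append(piece)
        length += len(piece)
    folded.append("".join(current))
    return folded


for printed in list_line("x" * 80 + "\x07"):
    print(printed)
```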
The Substitute Command 's'

The substitute command s is the command for changing the contents of individual lines, and probably the most complicated and powerful of the ed commands. The most straightforward example is the trailing g after a substitute command. With

s/this/that/

and

s/this/that/g

the first form replaces the first 'this' on the line with 'that'. If there is more than one 'this' on the line, the second form, with the trailing g, changes all of them.

Either form of the s command can be followed by p or l to 'print' or 'list' the contents of the line:

s/this/that/p
s/this/that/l
s/this/that/gp
s/this/that/gl

are all slight variations.

Of course, any s command can be preceded by one or two 'line numbers' to specify that the substitution is to take place on a group of lines. Thus

1,$s/mispell/misspell/

changes the first occurrence of 'mispell' to 'misspell' in each line of the file. But

1,$s/mispell/misspell/g

changes every occurrence in each line.

If a p or l is added to the end of any of these substitute commands, only the last line that got changed will be printed, not all the lines.
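The interaction of line ranges, the g flag, and dot can be modeled in a few lines of Python. In this sketch, ed_substitute is a hypothetical helper; old and new use Python's re syntax rather than ed's, and the buffer is a list of lines numbered from 1. It returns the new value of dot, which is also the line a trailing p or l would show.

```python
import re


def ed_substitute(buffer, first, last, old, new, global_flag=False, dot=1):
    """Sketch of ed's  first,last s/old/new/[g]  on a list of lines.

    Returns the new dot: the last line changed, or the old dot if
    no substitution took place.
    """
    for number in range(first, last + 1):
        changed_text, count = re.subn(old, new, buffer[number - 1],
                                      count=0 if global_flag else 1)
        if count:                          # a substitution happened on this line
            buffer[number - 1] = changed_text
            dot = number
    return dot


buffer = ["mispell and mispell", "correct line", "one mispell"]
dot = ed_substitute(buffer, 1, len(buffer), "mispell", "misspell")  # 1,$s/.../.../
print(buffer[dot - 1])    # a trailing p would print only this line: one misspell
```

Without global_flag, only the first 'mispell' on line 1 is changed, just as 1,$s/mispell/misspell/ leaves the second occurrence alone.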
The Undo Command 'u'
If a substitution in a line is incorrect,

(1)a\ <text> -- append lines
The a function causes the argument <text> to be written to the output after the line matched by its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line fiction, the interior newlines must be hidden by a backslash character ('\') immediately preceding the newline. The <text> argument is terminated by the first unhidden newline (the first one not immediately preceded by a backslash).

Once an a function is successfully executed, <text> will be written to the output regardless of what later commands do to the line which triggered it. The triggering line may be deleted entirely; <text> will still be written to the output. The <text> is not scanned for address matches, and no editing commands are attempted on it. It does not cause any change in the line-number counter.

(1)i\ <text> -- insert lines
The i function behaves identically to the a function, except that <text> is written to the output before the matched line. All other comments about the a function apply to the i function as well.

(2)c\ <text> -- change lines
The c function deletes the lines selected by its address(es), and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash; and interior newlines in <text> must be hidden by backslashes.


The c command may have two addresses, and therefore select a range of lines. If it does, all the lines in the range are deleted, but only one copy of <text> is written to the output, not one copy per line deleted. As with a and i, <text> is not scanned for address matches, and no editing commands are attempted on it. It does not change the line-number counter.

After a line has been deleted by a c function, no further commands are attempted on it.

If text is appended after a line by a or r functions, and the line is subsequently changed, the text inserted by the c function will be placed before the text of the a or r functions.

Note: within the text put in the output by these functions, leading blanks and tabs will disappear, as always in sed commands. To get leading blanks and tabs into the output, precede the first desired blank or tab by a backslash; the backslash will not appear in the output.
For example, the list of editing commands:

n
a\
XXXX
d

applied to our standard input, produces:

In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.

In this particular case, the same effect would be produced by either of the two following command lists:

n
i\
XXXX
d

n
c\
XXXX
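The claim that these three command lists are equivalent can be checked with a toy interpreter. The Python sketch below, with the hypothetical name run_sed, handles only the n, a, i, c and d functions, applies every command to every line, and ignores addresses and ranges; it assumes, as the example output implies, that n stops processing when there is no next input line.

```python
def run_sed(script, lines, nocopy=False):
    """Toy interpreter for unaddressed n, a, i, c and d commands.

    `script` is a list of (command, text) pairs, e.g. ("a", "XXXX").
    """
    out = []
    pos = 0
    while pos < len(lines):
        space = lines[pos]                 # pattern space for this cycle
        pos += 1
        appended = []                      # text queued by a, written after the line
        deleted = False
        for command, text in script:
            if command == "n":
                if not nocopy:
                    out.append(space)      # write the current line
                out.extend(appended)
                appended = []
                if pos == len(lines):      # no next line: stop here
                    return out
                space = lines[pos]         # replace it with the next input line
                pos += 1
            elif command == "a":
                appended.append(text)
            elif command == "i":
                out.append(text)           # written before the current line
            elif command in ("c", "d"):
                if command == "c":
                    out.append(text)       # c writes its text in place of the line
                deleted = True             # no further commands on this line
                break
        if not deleted and not nocopy:
            out.append(space)
        out.extend(appended)               # a text survives even a deleted line
    return out


POEM = [
    "In Xanadu did Kubla Khan",
    "A stately pleasure dome decree:",
    "Where Alph, the sacred river, ran",
    "Through caverns measureless to man",
    "Down to a sunless sea.",
]
APPEND = [("n", None), ("a", "XXXX"), ("d", None)]
INSERT = [("n", None), ("i", "XXXX"), ("d", None)]
CHANGE = [("n", None), ("c", "XXXX")]
print("\n".join(run_sed(APPEND, POEM)))
```

Because c writes its text before the queued a text is flushed, the sketch also follows the ordering rule stated above for lines that are both appended to and changed.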

2.2.6.2 Substitute Function

One very important function changes parts of lines selected by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute
The s function replaces part of a line (selected by <pattern>) with <replacement>. It can best be read:

Substitute for <pattern>, <replacement>

The <pattern> argument contains a pattern, exactly like the patterns in addresses. The only difference between <pattern> and a context address is that the context address must be delimited by slash ('/') characters; <pattern> may be delimited by any character other than space or newline. By default, only the first string matched by <pattern> is replaced, but see the g flag below.

The <replacement> argument begins immediately after the second delimiting character of <pattern>, and must be followed immediately by another instance of the delimiting character. (Thus there are exactly three instances of the delimiting character.) The <replacement> is not a pattern, and the characters which are special in patterns do not have special meaning in <replacement>. Instead, other characters are special:
• & is replaced by the string matched by <pattern>.

• \d (where d is a single digit) is replaced by the dth substring matched by parts of <pattern> enclosed in '\(' and '\)'. If nested substrings occur in <pattern>, the dth is determined by counting opening delimiters ('\(').

• As in patterns, special characters may be made literal by preceding them with a backslash ('\').

The <flags> argument may contain the following flags:

g -- substitute <replacement> for all (non-overlapping) instances of <pattern> in the line. After a successful substitution, the scan for the next instance of <pattern> begins just after the end of the inserted characters; characters put into the line from <replacement> are not rescanned.

p -- print the line if a successful replacement was done. The p flag causes the line to be written to the output if and only if a substitution was actually made by the s function. Notice that if several s functions, each followed by a p flag, successfully substitute in the same input line, multiple copies of the line will be written to the output: one for each successful substitution.
w <filename> -- write the line to a file if a successful replacement was done. The w flag causes lines which are actually substituted by the s function to be written to a file named by <filename>. If <filename> exists before sed is run, it is overwritten; if not, it is created. A single space must separate w and <filename>. The possibilities of multiple, somewhat different copies of one input line being written are the same as for p. A maximum of 10 different file names may be mentioned after w flags and w functions (see below), combined.

Here are some examples. The following command, applied to our standard input,

s/to/by/w changes

produces, on the standard output:

In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.

and, on the file 'changes':

Through caverns measureless by man
Down by a sunless sea.

If the nocopy option is in effect, the command:

s/[.,;?:]/*P&*/gp

produces:

A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:

/X/s/an/AN/p

produces (assuming nocopy mode):

In XANadu did Kubla Khan

and the command:

/X/s/an/AN/gp

produces:

In XANadu did Kubla KhAN
2.2.6.3 Input-output Functions

(2)p -- print
The print function writes the addressed lines to the standard output file. They are written at the time the p function is encountered, regardless of what succeeding editing commands may do to the lines.

(2)w <filename> -- write on <filename>
The write function writes the addressed lines to the file named by <filename>. If the file previously existed, it is overwritten; if not, it is created. The lines are written exactly as they exist when the write function is encountered for each line, regardless of what subsequent editing commands may do to them. Exactly one space must separate the w and <filename>. A maximum of ten different files may be mentioned in write functions and w flags after s functions, combined.
(1)r <filename> -- read the contents of a file
The read function reads the contents of <filename>, and appends them after the line matched by the address. The file is read and appended regardless of what subsequent editing commands do to the line which matched its address. If r and a functions are executed on the same line, the text from the a functions and the r functions is written to the output in the order that the functions are executed. Exactly one space must separate the r and <filename>. If a file mentioned by an r function cannot be opened, it is considered a null file, not an error, and no diagnostic is given.

NOTE: Since there is a limit to the number of files that can be opened simultaneously, care should be taken that no more than ten files be mentioned in w functions or flags; that number is reduced by one if any r functions are present. (Only one read file is open at one time.)
Here are some examples. Assume that the file 'note1' has the following contents:

Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and
founder of the Mongol dynasty in China.

Then the following command:

/Kubla/r note1

produces:

In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and
founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
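The r and w functions can be sketched in Python as follows. sed_read and sed_write are hypothetical names, patterns use Python syntax, and the file is read once rather than each time a line matches, which gives the same result as long as the file does not change during the run. As described above, a file that cannot be opened is treated as empty.

```python
import os
import re
import tempfile


def sed_read(lines, pattern, filename):
    """Sketch of  /pattern/r filename: append the file's lines after each match."""
    try:
        with open(filename) as f:
            extra = f.read().splitlines()
    except OSError:
        extra = []                        # null file: no error, no diagnostic
    out = []
    for line in lines:
        out.append(line)
        if re.search(pattern, line):
            out.extend(extra)
    return out


def sed_write(lines, pattern, filename):
    """Sketch of  /pattern/w filename: the file is created or overwritten."""
    with open(filename, "w") as f:        # mode "w" truncates an existing file
        for line in lines:
            if re.search(pattern, line):
                f.write(line + "\n")


POEM = [
    "In Xanadu did Kubla Khan",
    "A stately pleasure dome decree:",
    "Where Alph, the sacred river, ran",
    "Through caverns measureless to man",
    "Down to a sunless sea.",
]
NOTE1 = [
    "Note: Kubla Khan (more properly Kublai Khan;",
    "1216-1294) was the grandson and most eminent",
    "successor of Genghiz (Chingiz) Khan, and",
    "founder of the Mongol dynasty in China.",
]

with tempfile.TemporaryDirectory() as tmp:
    note1 = os.path.join(tmp, "note1")
    with open(note1, "w") as f:
        f.write("\n".join(NOTE1) + "\n")
    result = sed_read(POEM, "Kubla", note1)                       # /Kubla/r note1
    unchanged = sed_read(POEM, "Kubla", os.path.join(tmp, "missing"))
    changes = os.path.join(tmp, "changes")
    sed_write(POEM, "^[TD]", changes)                             # /^[TD]/w changes
    with open(changes) as f:
        written = f.read().splitlines()

print("\n".join(result))
```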
2.2.6.4 Multiple Input-line Functions

Three functions, all spelled with capital letters, deal specially with pattern spaces containing imbedded newlines; they are intended principally to provide pattern matches across lines in the input.

(2)N -- Next line
The next input line is appended to the current line in the pattern space; the two input lines are separated by an imbedded newline. Pattern matches may extend across the imbedded newline(s).

(2)D -- Delete first part of the pattern space
Delete up to and including the first newline character in the current pattern space. If the pattern space becomes empty (the only newline was the terminal newline), read another line from the input. In any case, begin the list of editing commands again from its beginning.

(2)P -- Print first part of the pattern space
Print up to and including the first newline in the pattern space.

The P and D functions are equivalent to their lower-case counterparts if there are no imbedded newlines in the pattern space.
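A common use of N, P and D is to compare each input line with the next. The Python sketch below emulates the widely used sed script $!N, /^\(.*\)\n\1$/!P, D, which removes adjacent duplicate lines; that script and the function name squeeze_duplicates are illustrations chosen here, not examples from this manual, and the $!N step relies on the '!' (Don't) function described in 2.2.6.6. Lines are held without their terminal newline, so "print up to the first newline" becomes "print the text before the first imbedded newline".

```python
import re

# sed: /^\(.*\)\n\1$/  -- two equal lines joined by an imbedded newline
DUPLICATE_PAIR = re.compile(r"^(.*)\n\1$")


def squeeze_duplicates(lines):
    r"""Emulate the sed script  $!N  /^\(.*\)\n\1$/!P  D  on a list of lines."""
    out = []
    pos = 0
    space = None                                  # pattern space (None = empty)
    while True:
        if space is None:                         # D emptied it: read a new line
            if pos == len(lines):
                break
            space = lines[pos]
            pos += 1
        if pos < len(lines):                      # $!N: append unless on the last line
            space += "\n" + lines[pos]
            pos += 1
        if not DUPLICATE_PAIR.match(space):       # !P: print first line unless repeated
            out.append(space.split("\n", 1)[0])
        _, newline, rest = space.partition("\n")  # D: delete through the first newline
        space = rest if newline else None         # then restart the command list
    return out


print(squeeze_duplicates(["a", "a", "b", "b", "b", "a"]))   # ['a', 'b', 'a']
```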
2.2.6.5 Hold and Get Functions

These functions save and retrieve part of the input for possible later use:

1. (2)h -- hold pattern space
The h function copies the contents of the pattern space into a hold area (destroying the previous contents of the hold area).

2. (2)H -- Hold pattern space
The H function appends the contents of the pattern space to the contents of the hold area; the former and new contents are separated by a newline.

3. (2)g -- get contents of hold area
The g function copies the contents of the hold area into the pattern space (destroying the previous contents of the pattern space).

4. (2)G -- Get contents of hold area
The G function appends the contents of the hold area to the contents of the pattern space; the former and new contents are separated by a newline.

5. (2)x -- exchange
The exchange command interchanges the contents of the pattern space and the hold area.

For example, the commands

1h
1s/ did.*//
1x
G
s/\n/ :/

applied to our standard example, produce:

In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
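Tracing the pattern space and hold area step by step shows why each line ends with ' :In Xanadu'. The Python sketch below, with the hypothetical name tag_with_first_words, performs the same five steps in order on a list of lines; the regular expression is written in Python syntax.

```python
import re


def tag_with_first_words(lines):
    r"""Emulate  1h / 1s/ did.*// / 1x / G / s/\n/ :/  on a list of lines."""
    hold = ""                                        # hold area starts empty
    out = []
    for number, line in enumerate(lines, start=1):
        space = line                                 # pattern space
        if number == 1:
            hold = space                             # 1h: copy pattern space to hold
            space = re.sub(r" did.*", "", space, count=1)   # 1s/ did.*//
            space, hold = hold, space                # 1x: exchange the two
        space = space + "\n" + hold                  # G: append hold after a newline
        space = space.replace("\n", " :", 1)         # s/\n/ :/
        out.append(space)
    return out


POEM = [
    "In Xanadu did Kubla Khan",
    "A stately pleasure dome decree:",
    "Where Alph, the sacred river, ran",
    "Through caverns measureless to man",
    "Down to a sunless sea.",
]
print("\n".join(tag_with_first_words(POEM)))
```

After line 1 the hold area keeps 'In Xanadu' for the rest of the run, because only the first line executes the h, s and x steps.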
2.2.6.6 Flow-of-Control Functions

These functions do no editing on the input lines, but control the application of functions to the lines selected by the address part.

(2)! -- Don't
The Don't command causes the next command (written on the same line) to be applied to all and only those input lines not selected by the address part.

(2){ -- Grouping
The grouping command '{' causes the next set of commands to be applied (or not applied) as a block to the input lines selected by the addresses of the grouping command. The first of the commands under control of the grouping may appear on the same line as the '{' or on the next line. The group of commands is terminated by a matching '}' standing on a line by itself. Groups can be nested.
(0) :
