CSL 85 9_Walnut_Storing_Electronic_Mail_in_a_Database 9 Walnut Storing Electronic Mail In A Database

CSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database CSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database

User Manual: CSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database

Open the PDF directly: View PDF PDF.
Page Count: 20

DownloadCSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database CSL-85-9 Walnut Storing Electronic Mail In A Database
Open PDF In BrowserView PDF
Walnut:

·Storing Electronic Mail in a Database

James Donahue and Willie-Sue Orr

Walnut: Storing Electronic Mail in a Database
James Donahue and Willie-Sue Orr

CSL-85-9

November 1985

[P85-00028]

© Copyright 1985 Xerox Corporation. All rights reserved.

Abstract: Walnut is an electronic mail storage and retrieval system developed for the Cedar
environment.

The most novel aspect of Walnut is its use of a general·purpose relational

database for storage of messages. This paper discusses the design and implementation of
Walnut, with particular attention to the problems of using a database system for this type of
application.

CR Categories and Subject Descriptors: H.2 [Database Management]: Systems transaction processing; H.4.3 [Communications Applications]: Electronic mail.
Additional Keywords and
processing.

XEROX

Phrases:

Cedar, integrated applications, background

Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

WALNUT:" STORING ELECTRONIC MAIL IN A DATABASE

3

designers was the emphasis placed on asynchrony. One of the main goals of the Cedar system was
easy switching between tasks and allowing the" user to work on new tasks while previously started
ones completed. This emphasis imposed several requirements on Walnut:
1.

The user can view several components of a Walnut database simultaneously. Thus, Walnut
operations must be prepared to keep the contents of multiple windows up-to-date; we
guarantee that the effect of any user operation on the database will be consistent with the
portion of the database that is currently displayed.

2.

More importantly, Cedar users want to leave Walnut running constantly-they don't want
to have to restart the system to begin reading their mail again. Much of what we do can
be characterized as responding to our mail, either by answering messages directly or
incorporating information from our mail into the programs and papers that we write.
Because we store Walnut databases on a shared file server, it is important to be able to give
up the connection to the server when the user is idle. Providing the consistency guarantee
stated above, while not going to the extremes of either holding onto long transactions or
redisplaying the contents of several Walnut windows when starting up after a long idle
period, was a major design problem.

3.

Finally, having plenty of things to do is no help when the only thing that you really want
to do takes a very long time. Walnut has two operations that are potentially much more
expensive th~n all of the others: reading mail" from mail servers and expunging old mail
from the system. New mail retrieval is the more important, because it is done frequently
by most users. We currently use background processing to add new mail to the system and
have designed (but not yet implemented) the background collection of space consumed by
expunged messages. In this way, we were able to smooth out the performance of the system
substantially; however, as we discuss below, the cost of these improvements in perceived
performance is a rather substantial complication in the internal processing of the system.

2.2 The Mail Distribution Facilities
Walnut is only a mail storage and retrieval system; it relies on the Grapevine message transport
system to provide the new mail to store in a Walnut database and to deliver the messages that a
Walnut user composes. Grapevine is discussed in [Birre1l83]. It is used by Walnut through a
GrapevineUser package that runs on the client workstation; for the purposes of this paper, the
GrapevineUser package provides four operations:
1.

Wait until new mail exists for the current user on qne or more of the Grapevine servers,

2. " Enumerate the servers on which the current user has a mail box (in which there might be
new mail) and establish a connection with each of them in turn,
3.

Copy messages from the current server to a file, and

4.

Flush previously copied messages from the server. (Copying and then flushing allows

XEROX PARCo CSL-85-9. NOVEMBER 1985

WALNUT: STORING ELECTRONIC MAIL IN A DATABASE

protection against crashes.)
It is easy to write a "new mail retrieval" section of Walnut that would perform correctly: when
new mail exists, copy the new mail into the database and then flush the messages from the Grapevine
servers. However, there are some performance characteristics of Grapevine that have to be considered
when building a responsive system.
1.

The Grapevine servers are slower than most of our workstations (and the servers are
generally heavily loaded). Thus, fetching new mail is relatively slow - it can take several
minutes to retrieve new messages when returning from a vacation or a long weekend.

2.

Grapevine server crashes are rare, but they do occur. Unfortunately, a Grapevine server
may crash between the time at which the user has been told that new mail exists and the
time at which he attempts to fetch it. And when this happens, the error indication does
not happen until the connection attempt times out a few minutes later.
Both of these considerations make the cost of retrieving new mail hard to predict. If the servers
are up and lightly loaded, and there aren't many messages to fetch, and you are near the server on
the network, and ... , it may go quickly. On the other hand, there are frequently situations in which
new mail is going to arrive slowly, or possibly not at all. As we discuss later, we use background
processing in Walnut to attempt to minimize the variance in responsiveness this causes.

2.3 File Storage and Databases in Cedar

It generally takes three or more machines to read mail: one or more Grapevine servers (where
the user's mail boxes reside), one or more file servers (where the user's mail database resides), and
the user's workstation (which is running the Walnut program).
" 2.3.1 Alpine
Walnut stores mail files on Alpine file servers [Brown84]. Alpine is a file server program (written
in Cedar) that provides page- and file-level locking and transactions: a general-purpose remote
procedure call package is used to transfer pages to and from Alpine servers. Walnut uses Alpine
transactions to prevent inconsistent updates due to either concurrent access or server crashes.
Although Alpine can "run on a client workstation (it will coexist with the normal Cedar file
system), most Walnut users store their files on separate Alpine server machines. The reasons for
storing files on a server are simple: backup of the server is done by the administration and the
Walnut user is then free to read his mail from any Cedar machine that happens to be available. The
ability to easily move among machines is particularly important for our summer interns, who typically
do not have personal workstations at their disposal but must share a pool of public machines; using
a shared server also makes it possible to have public mail databases. However, using a file server
machine introduces some interesting failure modes and makes it necessary to manage the connections
with the shared resource.

XEROX PARCo CSL-85-9. NOVEMBER 1985

WALNUT:

STORING ELECTRONIC MAIL IN A DATABASE

5

2.3.2 Cypress

Walnut manipulates Cypress databases [Catte1l83].

Cypress is an entity-relationship database

system developed in Cedar. The details of the Cypress data model are not particularly important;
almost any relational system would have served us as well.

Cypress provides a fairly low-level

programmer's interface, which is important since we use the database to record and to manipulate a
large amount of information about the state of Walnut processing. The database is a natural place
to store state information and the Cypress programmer's interface is efficient enough to make this
practical.
Cypress currently runs on the client workstations, i.e., the workstation running Walnut also runs
Cypress. We have plans to change Cypress so that, like Alpine, it can run on a separate server;
when this is done, we will be able either to run Alpine and Cypress on the same machine (as a
"database server") or to run Alpine on one machine, providing only file service and run Cypress on
a separate machine, communicating with both the client .machine (running Walnut) and the Alpine
servers using remote procedure calls.
Cypress uses Alpine transactions in a straightforward manner: instead of doing logical locking
of tuples or relations, Cypress simply locks the pages of the database as tuples are read or written.
Because the mapping from Cypress entities and relationships to Alpine pages is not under the control
of the Cypress client, the client cannot depend on logically independent operations to be physically
independent on the Alpine server. (Cypress attempts to perform some co-location of tuples for the
same entity on the same page, but this is done to improve performance rather than to reduce
transaction conflicts.)

3. The Walnut User Interface
As we mentioned above, the user's view of a Walnut database is relatively straightforward. A
database consists of messages with immutable properties and message sets with changeable contents.
This is reflected in the user interface by having three separate types of windows on a Walnut
database:

1.

The Walnut "control window," which contains the list of all the current message sets (in
Figure 1. the built-in message sets Active and Deleted are highlighted), buttons for basic
operations such as shutdown and retrieval of new mail, and a typescript that records the
history of user interactions.

2.

A Walnut message set displayer lists the messages in a message set and provides a menu of
operations for them (Figure 2). The message names displayed in italics have not been read:
the message name displayed in bold is the "current selection" in the message set. (The
choice of these fonts is a user-specified option.)

The operations in the menu use the

"current selection" as an implied argument: the AddTo and MoveTo operations use the
selected message sets in the Walnut control window as arguments. A message in a message

XEROX PARe. CSL-85-9. NOVEMBER 1985

6

WALNUT: STORING ELECTRONIC MAIL IN A DATABASE

;
, _
1j.i'iialnut 6.0 :
"
Sender NewMail CloseAll IvIs gSetOps NumMs gs Explnfo Find
There is no nevI mail at .November 11, 1985 5:01:47 pm PST
Act i ve De 1 eted Addresses ADForum Alp; ne Arpanet
Banyan Bass b; b 1; ography Bu 11 et; nBoard Cedar6. 0
CedarArch i ve CedarD; scuss i on CedarLang CedarLore
CedarLoreNew CedarP 1 an Chr i s C8L Cypress DBP 1 ans
Disks DocMgmt Dragon finger Grapev i ne Hi ckory icons
I ntegrat; on I nterest I nterscr; pt Junk
Object8erverDiscussion Oct17MeetingMinutes Projects
Recru i t i ng Russe 11 8em i nars 80ftwareForum 8qu; rre 1
TechReports Tr i ps Types UGrant UN I X ~~a 1nut
Wa 1 nut8tats ~~a 1 nut8uggest; ons Wh; teboards ~~; shL ; st

Addin g new mail to the database
Sal v ador .ms delivered 1 message
Msg: Ending backup (file s ... : has been Deleted from Active
Copyright © 1985 by Xerox Corporation. All rights reserved
Figure 1. The Walnut Control Viewer

set can be displayed simply by pointing the mouse at it and clicking one of the mouse
buttons.
3.

A Walnut message displayer. which displays the text of a message (Figure 3). The menu
items in the window include operations to generate message-sending windows to Answer or,
Forward a message. a Print button for producing hardcopy. and operations to manipulate
the contents of the message as a normal Cedar document (the "Places" and "Levels"
buttons). Note that messages in Walnut can be formatted documents-the mail system uses
the same editor as the rest of Cedar.

,
V'~jI'hi te boa.rds IvIessa.ges
' " ". ..-' .
Display Delete AddTo Places Levels MsgOps
7 Aug 85 To: CedarUs...
Whlteboard for CedarChest 6.0
13 Aug 85 PeterKessler.pa WhiteBoardDoc
17 Sep 85 jw@GVAX""
Whiteboards paper
2.4 Sep 85 To: jw@G...
Re: Whiteboards paper
2.5 Sep 85 jw@GVAX....
Found it ...
2.6 Sep 85 jw@GVAXIIII
Version Starnps and The Promised
Paragraph
2.6 Sep 85 To: jw@G...
Re: Version Stamps and The Promised
Paragraph
'/7 Oct 85 To.' Barth
Re.· Core discussion
17 Oct 85 Sprei tzer .pa
Re: Core discu.ssion
11 Oct 8S
Spreitzer.pa
Re: Core discussion
18 Oct 85 Barth.pa
Large text boxes on whiteboards
18 Oct 85 To: Barth
Re: Lar ge text boxes on white boards
25 Oct 85
To.' CedarUse...
Hthiteboard for CedarChest 6.0
>

MoveTo

Copyright © 1985 by Xerox Corporation. All rights reserved
Figure 2. A Walnut Message Set Viewer

XEROX PARC, CSL-85-9. NOVEMBER 1985

•

7

WALNUT: STORING ELECTRONIC MAIL IN A DATABASE

Freeze Categorles Answer Forward Prmt gvID Split Places Levels
Date: Mon~ 4 Nov 85 14:34:43 PST
From: Terry.pa
Suhject: J. Ousterhout on "The Sprite Network Operating System," December 10 at 2. p.m. in the
CSL Commons
To: Compu terResearch t .pa
Reply-to: Terry.pa
I am announcing this now for those of you that may have been planning to drive to IBM to hear
John's talk. I'll send out a reminder in a few weeks ...

Tuesday~ December 10~ 1985
2. p.m.~ CSL Commons (35-2.2.30)

THE SPRITE NETWORK OPERATING SYSTEM
John K. Ousterhout
Computer Science Division
University of California~ Berkeley
Sprite is a new network operating system built by a team of graduate students and myself as part of the
SPUR workstation project. The talk will focus on three parts of Sprite: the filesystem, process
offloading, and the virtual mem.ory system. The filesystem will provide a single· shared Unix-like file
hierarchy distributed across several servers. Clients will use dynamically-constructed prefix tables to
associate file names with servers. Sprite will include a process offloading mechanism that allows jobs to
be run on idle workstations in the same way that jobs may be placed in background in Unix. The
virtual memory system will be Unix-like with simple extensions to permit shared data segments and
synChronization. I'll talk about how changes in technology have influenced the design of Sprite and try
to convince you that a) a simple network filesystem eliminates the need for other network protocols,
RPC, name servers, and the like; b) magnetic disks will soon be obsolete; and c) paging is also about to
be obsolete (sort of).
Host: Doug Terry

Copyright © 1985 by Xerox Corporation. All rights reserved
Figure 3. A Walnut Message Viewer
Something that is missing from the user interface for Walnut is a more general mail database
querying facility than simply being able to display the contents of a message set.

A query .

facility has been built for Walnut. but it is not an integral part of the system (and currently is
not widely used). There are two reasons why we did not build a general query processor as
part of Walnut:

1.

Our first goal was to get the underpinnings of the system right before adding the substantial
complexity that a queryer would entail. We were conservative in our ambitions because of
the importance our users place on high reliability and performance.

2.

More importantly, a good test of the programmer's interfaces to Walnut was whether we
could build a query processor on top of Walnut, rather than inside it. Being able to do so
made it much easier to modify the queryer as we gained experience with it and made it
possible for the Walnut users who did not need or want a query processor to avoid paying
any performance penalties.

We will discuss below the query processor and the other

packages built on top of Walnut.

XEROX PARCo CSL-85-9. NOVEMBER 1985

8

WALNUT: STORING ELECTRONIC MAIL IN A DATABASE

4. The Walnut Internal Structure
4.1 Files
Instead of storing all of the information pertaining to messages and message sets in a Cypress
database. Walnut uses multiple files for storage: a Cypress database file and a collection of log files.
Both the logs and the database are stored on Alpine servers. The most important of the log files is
the "current log" file, which contains both the contents of all of the messages and a history of all of
the operations that a Walnut user has performed (expunging the database compresses this history by
discarding entries for deleted messages); thus, the contents of the database can always be reconstructed
by replaying the current log. The database contains the current association of messages with message
sets and the information necessary to display messages (e.g., the label for a message in a message set
viewer and the position in the log of the message body). Additional log files are used to store
incoming messages and to execute an expunge operation - the details of their use is given below.
This separation of the information into several files was motivated by several concerns:
1.

Cypress, like most database systems, does not handle large varying length strings well. We
knew that Walnut would have to handle large messages, so the use of a separate log to store
the message bodies was necessary. Messages of approximately half a megabyte have been
sent and received by Walnut; the system was designed to handle messages much larger.

2.

Initially, there were some worries about the reliability of Cypress; indeed, a few bugs were
found in Cypress when Walnut use became heavy. Having a separate log meant that one
could always fall back to reconstructing the database rather than having to find ways of
recovering data from damaged databases. Reliability was critical in the acceptance of the
system; our users were willing to have the system crash if the recovery was simple. and
certain.

3.

Having a log also meant that we had more freedom in changing the database schema,
something that we have done during the course of the Walnut development. We can release
a new version with a change in the database schema without issuing complicated instructions
to our users. Instead, we record in the database a "database schema version stamp" that is
checked by each Walnut release; having the wrong version stamp is logically no different
than having a mangled database - in each case, recovery is done by replaying the log. (Still,
rebuilding databases is not something that we do regularly; most Walnut users rebuild only

when forced to do so because of serious hardware or software failures. In particular, we
do not depend on rebuilding daiabases to reclaim file space.)
A "root file" is used to provide a logical connection between the log files an~ the database.
Again, the root file is stored on Alpine. The root file contains the name of the database file and the
list ·of the log files; additionally it records which user may add new mail to the database and a
user-specified "key" that is stored in all of the log files and in the database to help guarantee that
the logs and database mentioned in the root really do belong together (this key is usually some

XEROX PARCo CSL-85-9. NOVEMBER 1985

WALNUT: STORING ELECTRONIC MAIL IN A DATABASE

9

descriptive phrase, such as "Donahue's mail database").
The fact that the information in a Walnut database is spread over several files is carefully hidden
from both the Walnut client programs and (as best we can) from the user. The root file is really
the file that describes a Walnut database-it is the responsibility of the low levels of Walnut to
interpret the contents of the root file and to make the collection of files described therein behave as
if all of the data were stored in a single database.

4.2 Programmer's Interfaces
Walnut makes available three programmer's interfaces. Together, these capture the semantics
of a Walnut database as a collection of "abstract datatypes." The lowest level interface, WalnutOps,
provides all of the operations that can be legally performed on a Walnut database. The user level
interface, WalnutWindow, gives a client program access to the same set of operations that a Walnut
user can perform through the buttons on the screen. Finally, the WalnutRegistry interface provides
a client program with a history of the WalnutOps operations that are performed so that the state
transformations of the database can be monitored.

4.2.1 WalnutOps
The semantics of a Walnut database are completely characterized for Walnut clients by the
WalnutOps interface. WalnutOps includes all of the operations needed for the Walnut user interface
code (such as "get the contents of this message," or "move this message to this message set") and
all of the operations that are used to synchronize the various Walnut processes (more about this will
be given below). The operations in WalnutOps can be separated into the following categories:

1.

Starting and stopping Walnut: The StartUp operation takes the name of a root file and
initializes the system to operate on the database and log files named in the root; the
ShutDown operation shuts down all of the server c

3

III

en

15'
~
D>
'3

c:

CD
D>
~

Co

~

co·

en
c:
CD

~



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
XMP Toolkit                     : Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:56:37
Create Date                     : 2010:11:26 18:52:56-08:00
Modify Date                     : 2010:11:26 19:53:58-07:00
Metadata Date                   : 2010:11:26 19:53:58-07:00
Producer                        : Adobe Acrobat 9.4 Paper Capture Plug-in
Format                          : application/pdf
Document ID                     : uuid:c1e56012-642d-4b74-ae8c-e5fcba670b22
Instance ID                     : uuid:fbcaa75d-77a5-4feb-94e5-3ba9c391cc1d
Page Layout                     : SinglePage
Page Mode                       : UseNone
Page Count                      : 20
EXIF Metadata provided by EXIF.tools

Navigation menu