CSL 85 9_Walnut_Storing_Electronic_Mail_in_a_Database 9 Walnut Storing Electronic Mail In A Database
CSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database CSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database
User Manual: CSL-85-9_Walnut_Storing_Electronic_Mail_in_a_Database
Open the PDF directly: View PDF .
Page Count: 20
Download | |
Open PDF In Browser | View PDF |
Walnut: ·Storing Electronic Mail in a Database James Donahue and Willie-Sue Orr Walnut: Storing Electronic Mail in a Database James Donahue and Willie-Sue Orr CSL-85-9 November 1985 [P85-00028] © Copyright 1985 Xerox Corporation. All rights reserved. Abstract: Walnut is an electronic mail storage and retrieval system developed for the Cedar environment. The most novel aspect of Walnut is its use of a general·purpose relational database for storage of messages. This paper discusses the design and implementation of Walnut, with particular attention to the problems of using a database system for this type of application. CR Categories and Subject Descriptors: H.2 [Database Management]: Systems transaction processing; H.4.3 [Communications Applications]: Electronic mail. Additional Keywords and processing. XEROX Phrases: Cedar, integrated applications, background Xerox Corporation Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, California 94304 WALNUT:" STORING ELECTRONIC MAIL IN A DATABASE 3 designers was the emphasis placed on asynchrony. One of the main goals of the Cedar system was easy switching between tasks and allowing the" user to work on new tasks while previously started ones completed. This emphasis imposed several requirements on Walnut: 1. The user can view several components of a Walnut database simultaneously. Thus, Walnut operations must be prepared to keep the contents of multiple windows up-to-date; we guarantee that the effect of any user operation on the database will be consistent with the portion of the database that is currently displayed. 2. More importantly, Cedar users want to leave Walnut running constantly-they don't want to have to restart the system to begin reading their mail again. Much of what we do can be characterized as responding to our mail, either by answering messages directly or incorporating information from our mail into the programs and papers that we write. Because we store Walnut databases on a shared file server, it is important to be able to give up the connection to the server when the user is idle. Providing the consistency guarantee stated above, while not going to the extremes of either holding onto long transactions or redisplaying the contents of several Walnut windows when starting up after a long idle period, was a major design problem. 3. Finally, having plenty of things to do is no help when the only thing that you really want to do takes a very long time. Walnut has two operations that are potentially much more expensive th~n all of the others: reading mail" from mail servers and expunging old mail from the system. New mail retrieval is the more important, because it is done frequently by most users. We currently use background processing to add new mail to the system and have designed (but not yet implemented) the background collection of space consumed by expunged messages. In this way, we were able to smooth out the performance of the system substantially; however, as we discuss below, the cost of these improvements in perceived performance is a rather substantial complication in the internal processing of the system. 2.2 The Mail Distribution Facilities Walnut is only a mail storage and retrieval system; it relies on the Grapevine message transport system to provide the new mail to store in a Walnut database and to deliver the messages that a Walnut user composes. Grapevine is discussed in [Birre1l83]. It is used by Walnut through a GrapevineUser package that runs on the client workstation; for the purposes of this paper, the GrapevineUser package provides four operations: 1. Wait until new mail exists for the current user on qne or more of the Grapevine servers, 2. " Enumerate the servers on which the current user has a mail box (in which there might be new mail) and establish a connection with each of them in turn, 3. Copy messages from the current server to a file, and 4. Flush previously copied messages from the server. (Copying and then flushing allows XEROX PARCo CSL-85-9. NOVEMBER 1985 WALNUT: STORING ELECTRONIC MAIL IN A DATABASE protection against crashes.) It is easy to write a "new mail retrieval" section of Walnut that would perform correctly: when new mail exists, copy the new mail into the database and then flush the messages from the Grapevine servers. However, there are some performance characteristics of Grapevine that have to be considered when building a responsive system. 1. The Grapevine servers are slower than most of our workstations (and the servers are generally heavily loaded). Thus, fetching new mail is relatively slow - it can take several minutes to retrieve new messages when returning from a vacation or a long weekend. 2. Grapevine server crashes are rare, but they do occur. Unfortunately, a Grapevine server may crash between the time at which the user has been told that new mail exists and the time at which he attempts to fetch it. And when this happens, the error indication does not happen until the connection attempt times out a few minutes later. Both of these considerations make the cost of retrieving new mail hard to predict. If the servers are up and lightly loaded, and there aren't many messages to fetch, and you are near the server on the network, and ... , it may go quickly. On the other hand, there are frequently situations in which new mail is going to arrive slowly, or possibly not at all. As we discuss later, we use background processing in Walnut to attempt to minimize the variance in responsiveness this causes. 2.3 File Storage and Databases in Cedar It generally takes three or more machines to read mail: one or more Grapevine servers (where the user's mail boxes reside), one or more file servers (where the user's mail database resides), and the user's workstation (which is running the Walnut program). " 2.3.1 Alpine Walnut stores mail files on Alpine file servers [Brown84]. Alpine is a file server program (written in Cedar) that provides page- and file-level locking and transactions: a general-purpose remote procedure call package is used to transfer pages to and from Alpine servers. Walnut uses Alpine transactions to prevent inconsistent updates due to either concurrent access or server crashes. Although Alpine can "run on a client workstation (it will coexist with the normal Cedar file system), most Walnut users store their files on separate Alpine server machines. The reasons for storing files on a server are simple: backup of the server is done by the administration and the Walnut user is then free to read his mail from any Cedar machine that happens to be available. The ability to easily move among machines is particularly important for our summer interns, who typically do not have personal workstations at their disposal but must share a pool of public machines; using a shared server also makes it possible to have public mail databases. However, using a file server machine introduces some interesting failure modes and makes it necessary to manage the connections with the shared resource. XEROX PARCo CSL-85-9. NOVEMBER 1985 WALNUT: STORING ELECTRONIC MAIL IN A DATABASE 5 2.3.2 Cypress Walnut manipulates Cypress databases [Catte1l83]. Cypress is an entity-relationship database system developed in Cedar. The details of the Cypress data model are not particularly important; almost any relational system would have served us as well. Cypress provides a fairly low-level programmer's interface, which is important since we use the database to record and to manipulate a large amount of information about the state of Walnut processing. The database is a natural place to store state information and the Cypress programmer's interface is efficient enough to make this practical. Cypress currently runs on the client workstations, i.e., the workstation running Walnut also runs Cypress. We have plans to change Cypress so that, like Alpine, it can run on a separate server; when this is done, we will be able either to run Alpine and Cypress on the same machine (as a "database server") or to run Alpine on one machine, providing only file service and run Cypress on a separate machine, communicating with both the client .machine (running Walnut) and the Alpine servers using remote procedure calls. Cypress uses Alpine transactions in a straightforward manner: instead of doing logical locking of tuples or relations, Cypress simply locks the pages of the database as tuples are read or written. Because the mapping from Cypress entities and relationships to Alpine pages is not under the control of the Cypress client, the client cannot depend on logically independent operations to be physically independent on the Alpine server. (Cypress attempts to perform some co-location of tuples for the same entity on the same page, but this is done to improve performance rather than to reduce transaction conflicts.) 3. The Walnut User Interface As we mentioned above, the user's view of a Walnut database is relatively straightforward. A database consists of messages with immutable properties and message sets with changeable contents. This is reflected in the user interface by having three separate types of windows on a Walnut database: 1. The Walnut "control window," which contains the list of all the current message sets (in Figure 1. the built-in message sets Active and Deleted are highlighted), buttons for basic operations such as shutdown and retrieval of new mail, and a typescript that records the history of user interactions. 2. A Walnut message set displayer lists the messages in a message set and provides a menu of operations for them (Figure 2). The message names displayed in italics have not been read: the message name displayed in bold is the "current selection" in the message set. (The choice of these fonts is a user-specified option.) The operations in the menu use the "current selection" as an implied argument: the AddTo and MoveTo operations use the selected message sets in the Walnut control window as arguments. A message in a message XEROX PARe. CSL-85-9. NOVEMBER 1985 6 WALNUT: STORING ELECTRONIC MAIL IN A DATABASE ; , _ 1j.i'iialnut 6.0 : " Sender NewMail CloseAll IvIs gSetOps NumMs gs Explnfo Find There is no nevI mail at .November 11, 1985 5:01:47 pm PST Act i ve De 1 eted Addresses ADForum Alp; ne Arpanet Banyan Bass b; b 1; ography Bu 11 et; nBoard Cedar6. 0 CedarArch i ve CedarD; scuss i on CedarLang CedarLore CedarLoreNew CedarP 1 an Chr i s C8L Cypress DBP 1 ans Disks DocMgmt Dragon finger Grapev i ne Hi ckory icons I ntegrat; on I nterest I nterscr; pt Junk Object8erverDiscussion Oct17MeetingMinutes Projects Recru i t i ng Russe 11 8em i nars 80ftwareForum 8qu; rre 1 TechReports Tr i ps Types UGrant UN I X ~~a 1nut Wa 1 nut8tats ~~a 1 nut8uggest; ons Wh; teboards ~~; shL ; st Addin g new mail to the database Sal v ador .ms delivered 1 message Msg: Ending backup (file s ... : has been Deleted from Active Copyright © 1985 by Xerox Corporation. All rights reserved Figure 1. The Walnut Control Viewer set can be displayed simply by pointing the mouse at it and clicking one of the mouse buttons. 3. A Walnut message displayer. which displays the text of a message (Figure 3). The menu items in the window include operations to generate message-sending windows to Answer or, Forward a message. a Print button for producing hardcopy. and operations to manipulate the contents of the message as a normal Cedar document (the "Places" and "Levels" buttons). Note that messages in Walnut can be formatted documents-the mail system uses the same editor as the rest of Cedar. , V'~jI'hi te boa.rds IvIessa.ges ' " ". ..-' . Display Delete AddTo Places Levels MsgOps 7 Aug 85 To: CedarUs... Whlteboard for CedarChest 6.0 13 Aug 85 PeterKessler.pa WhiteBoardDoc 17 Sep 85 jw@GVAX"" Whiteboards paper 2.4 Sep 85 To: jw@G... Re: Whiteboards paper 2.5 Sep 85 jw@GVAX.... Found it ... 2.6 Sep 85 jw@GVAXIIII Version Starnps and The Promised Paragraph 2.6 Sep 85 To: jw@G... Re: Version Stamps and The Promised Paragraph '/7 Oct 85 To.' Barth Re.· Core discussion 17 Oct 85 Sprei tzer .pa Re: Core discu.ssion 11 Oct 8S Spreitzer.pa Re: Core discussion 18 Oct 85 Barth.pa Large text boxes on whiteboards 18 Oct 85 To: Barth Re: Lar ge text boxes on white boards 25 Oct 85 To.' CedarUse... Hthiteboard for CedarChest 6.0 > MoveTo Copyright © 1985 by Xerox Corporation. All rights reserved Figure 2. A Walnut Message Set Viewer XEROX PARC, CSL-85-9. NOVEMBER 1985 • 7 WALNUT: STORING ELECTRONIC MAIL IN A DATABASE Freeze Categorles Answer Forward Prmt gvID Split Places Levels Date: Mon~ 4 Nov 85 14:34:43 PST From: Terry.pa Suhject: J. Ousterhout on "The Sprite Network Operating System," December 10 at 2. p.m. in the CSL Commons To: Compu terResearch t .pa Reply-to: Terry.pa I am announcing this now for those of you that may have been planning to drive to IBM to hear John's talk. I'll send out a reminder in a few weeks ... Tuesday~ December 10~ 1985 2. p.m.~ CSL Commons (35-2.2.30) THE SPRITE NETWORK OPERATING SYSTEM John K. Ousterhout Computer Science Division University of California~ Berkeley Sprite is a new network operating system built by a team of graduate students and myself as part of the SPUR workstation project. The talk will focus on three parts of Sprite: the filesystem, process offloading, and the virtual mem.ory system. The filesystem will provide a single· shared Unix-like file hierarchy distributed across several servers. Clients will use dynamically-constructed prefix tables to associate file names with servers. Sprite will include a process offloading mechanism that allows jobs to be run on idle workstations in the same way that jobs may be placed in background in Unix. The virtual memory system will be Unix-like with simple extensions to permit shared data segments and synChronization. I'll talk about how changes in technology have influenced the design of Sprite and try to convince you that a) a simple network filesystem eliminates the need for other network protocols, RPC, name servers, and the like; b) magnetic disks will soon be obsolete; and c) paging is also about to be obsolete (sort of). Host: Doug Terry Copyright © 1985 by Xerox Corporation. All rights reserved Figure 3. A Walnut Message Viewer Something that is missing from the user interface for Walnut is a more general mail database querying facility than simply being able to display the contents of a message set. A query . facility has been built for Walnut. but it is not an integral part of the system (and currently is not widely used). There are two reasons why we did not build a general query processor as part of Walnut: 1. Our first goal was to get the underpinnings of the system right before adding the substantial complexity that a queryer would entail. We were conservative in our ambitions because of the importance our users place on high reliability and performance. 2. More importantly, a good test of the programmer's interfaces to Walnut was whether we could build a query processor on top of Walnut, rather than inside it. Being able to do so made it much easier to modify the queryer as we gained experience with it and made it possible for the Walnut users who did not need or want a query processor to avoid paying any performance penalties. We will discuss below the query processor and the other packages built on top of Walnut. XEROX PARCo CSL-85-9. NOVEMBER 1985 8 WALNUT: STORING ELECTRONIC MAIL IN A DATABASE 4. The Walnut Internal Structure 4.1 Files Instead of storing all of the information pertaining to messages and message sets in a Cypress database. Walnut uses multiple files for storage: a Cypress database file and a collection of log files. Both the logs and the database are stored on Alpine servers. The most important of the log files is the "current log" file, which contains both the contents of all of the messages and a history of all of the operations that a Walnut user has performed (expunging the database compresses this history by discarding entries for deleted messages); thus, the contents of the database can always be reconstructed by replaying the current log. The database contains the current association of messages with message sets and the information necessary to display messages (e.g., the label for a message in a message set viewer and the position in the log of the message body). Additional log files are used to store incoming messages and to execute an expunge operation - the details of their use is given below. This separation of the information into several files was motivated by several concerns: 1. Cypress, like most database systems, does not handle large varying length strings well. We knew that Walnut would have to handle large messages, so the use of a separate log to store the message bodies was necessary. Messages of approximately half a megabyte have been sent and received by Walnut; the system was designed to handle messages much larger. 2. Initially, there were some worries about the reliability of Cypress; indeed, a few bugs were found in Cypress when Walnut use became heavy. Having a separate log meant that one could always fall back to reconstructing the database rather than having to find ways of recovering data from damaged databases. Reliability was critical in the acceptance of the system; our users were willing to have the system crash if the recovery was simple. and certain. 3. Having a log also meant that we had more freedom in changing the database schema, something that we have done during the course of the Walnut development. We can release a new version with a change in the database schema without issuing complicated instructions to our users. Instead, we record in the database a "database schema version stamp" that is checked by each Walnut release; having the wrong version stamp is logically no different than having a mangled database - in each case, recovery is done by replaying the log. (Still, rebuilding databases is not something that we do regularly; most Walnut users rebuild only when forced to do so because of serious hardware or software failures. In particular, we do not depend on rebuilding daiabases to reclaim file space.) A "root file" is used to provide a logical connection between the log files an~ the database. Again, the root file is stored on Alpine. The root file contains the name of the database file and the list ·of the log files; additionally it records which user may add new mail to the database and a user-specified "key" that is stored in all of the log files and in the database to help guarantee that the logs and database mentioned in the root really do belong together (this key is usually some XEROX PARCo CSL-85-9. NOVEMBER 1985 WALNUT: STORING ELECTRONIC MAIL IN A DATABASE 9 descriptive phrase, such as "Donahue's mail database"). The fact that the information in a Walnut database is spread over several files is carefully hidden from both the Walnut client programs and (as best we can) from the user. The root file is really the file that describes a Walnut database-it is the responsibility of the low levels of Walnut to interpret the contents of the root file and to make the collection of files described therein behave as if all of the data were stored in a single database. 4.2 Programmer's Interfaces Walnut makes available three programmer's interfaces. Together, these capture the semantics of a Walnut database as a collection of "abstract datatypes." The lowest level interface, WalnutOps, provides all of the operations that can be legally performed on a Walnut database. The user level interface, WalnutWindow, gives a client program access to the same set of operations that a Walnut user can perform through the buttons on the screen. Finally, the WalnutRegistry interface provides a client program with a history of the WalnutOps operations that are performed so that the state transformations of the database can be monitored. 4.2.1 WalnutOps The semantics of a Walnut database are completely characterized for Walnut clients by the WalnutOps interface. WalnutOps includes all of the operations needed for the Walnut user interface code (such as "get the contents of this message," or "move this message to this message set") and all of the operations that are used to synchronize the various Walnut processes (more about this will be given below). The operations in WalnutOps can be separated into the following categories: 1. Starting and stopping Walnut: The StartUp operation takes the name of a root file and initializes the system to operate on the database and log files named in the root; the ShutDown operation shuts down all of the server c 3 III en 15' ~ D> '3 c: CD D> ~ Co ~ co· en c: CD ~
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.3 Linearized : No XMP Toolkit : Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:56:37 Create Date : 2010:11:26 18:52:56-08:00 Modify Date : 2010:11:26 19:53:58-07:00 Metadata Date : 2010:11:26 19:53:58-07:00 Producer : Adobe Acrobat 9.4 Paper Capture Plug-in Format : application/pdf Document ID : uuid:c1e56012-642d-4b74-ae8c-e5fcba670b22 Instance ID : uuid:fbcaa75d-77a5-4feb-94e5-3ba9c391cc1d Page Layout : SinglePage Page Mode : UseNone Page Count : 20EXIF Metadata provided by EXIF.tools