Information Storage in a Decentralized
Computer System
David K. Gifford
Information Storage in a Decentralized
Computer System
by David K. Gifford
CSL-81-8
June 1981; Revised March 1982
Abstract: See page x.
This report is a slightly revised version of a dissertation submitted to the Department of
Electrical Engineering and the Committee on Graduate Studies of Stanford University in
partial fulfillment of the requirements for the degree of Doctor of Philosophy.
CR categories: 4.33, 4.35, 3.73
Key words and phrases: file system, information storage, reference, transaction, location,
replication, suite, representative, quorum, reconfiguration, protection, cryptographic sealing,
key, seal, unseal.
XEROX
Xerox Corporation
Palo Alto Research Centers
3333 Coyote Hill Road
Palo Alto, California 94304
© Copyright 1981, 1982 by David K. Gifford
All Rights Reserved.
Contents
List of Figures vii
List of Tables viii
Preface ix
Abstract x
1. Method and Principles 1
1.1 Introduction 1
1.2 Method 1
1.3 System Description 3
1.4 Background 4
1.5 Architectural Principles 6
1.6 Summary 7
2. Environment 8
2.1 Hardware Components 8
2.1.1 Processors 8
2.1.2 Communication 9
2.1.3 Storage Devices 9
2.2 Stable Storage 10
2.3 Unique Identifiers 11
2.4 Reliable Remote Evaluation 11
2.4.1 Model 11
2.4.2 Algorithm 13
2.5 Summary 20
3. Transactional Storage 21
3.1 Model 21
3.1.1 References 21
3.1.2 Transactions 23
3.1.3 Volumes 26
3.1.4 Files 27
3.1.5 Mutable and Immutable Objects 28
3.2 Basic Algorithms 29
3.2.1 Single Processor Case 29
3.2.2 Decentralized Case 30
3.2.3 Reed's Algorithm 31
3.2.4 Located References 32
3.3 Refinements 35
3.3.1 Lock Compatibility 35
3.3.2 Lock Granularity 36
3.3.3 Lower Degrees of Consistency 36
3.3.4 Broken Read Locks 37
3.3.5 Releasing Read Locks 37
3.3.6 Deadlock 37
3.3.7 Checkpoints and Save Points 38
3.3.8 Nested Transactions 38
3.3.9 Stable Storage Failure 38
3.4 Summary 38
4. Location 40
4.1 Indexes 40
4.2 Indirection 41
4.2.1 Model 41
4.2.2 Basic Algorithm 41
4.3 Indirect References 43
4.4 Object Location 46
4.5 Choice References 46
4.6 Summary 48
5. Replication 49
5.1 Suites 49
5.2 Basic Algorithm 52
5.3 Correctness Argument 61
5.4 Refinements 62
5.4.1 Weak Representatives 62
5.4.2 Lower Degrees of Consistency 62
5.4.3 Representative Performance 62
5.4.4 Expressive Power of Suites 62
5.4.5 Size of Replicated Objects 62
5.4.6 Total Replacement 63
5.4.7 Simultaneous Suite Update 63
5.4.8 Releasing Read Locks 63
5.4.9 Updating Representatives in Background 63
5.4.10 Guessing a Representative is Current 63
5.4.11 Postponed Creation of File Representatives 63
5.4.12 An Alternative for Replicated Volumes 64
5.5 Related Work 64
5.6 Summary 64
6. Reconfiguration 65
6.1 Reconfigurable Objects 65
6.2 Basic Algorithm 66
6.3 Refinements 74
6.3.1 Reducing Data Movement 74
6.3.2 Eliminating the Volume Index 74
6.4 Summary 75
7. Protection 76
7.1 Preliminaries 77
7.1.1 Framework 77
7.1.2 Environment 77
7.1.2.1 Cryptography 77
7.1.2.2 Threshold Scheme 81
7.1.2.3 Checksums 82
7.2 Cryptographic Sealing 82
7.2.1 Model 82
7.2.2 Basic Algorithm 85
7.2.3 Strength 93
7.2.3.1 Correctness Argument 93
7.2.3.2 Susceptibility to Cryptanalysis 95
7.3 Applications 97
7.3.1 Privilege Establishment 97
7.3.1.1 Key Rings 97
7.3.1.2 Encrypted Objects 97
7.3.1.3 Guarded Objects 98
7.3.1.4 Protected Volumes 99
7.3.2 Common Protection Mechanisms 100
7.3.2.1 Capabilities 100
7.3.2.2 Access Control Lists 102
7.3.2.3 Information Flow Control 102
7.3.3 Secure Processors 104
7.3.3.1 Limiting Remote Evaluation 104
7.3.3.2 Secure Channels 105
7.3.4 Revocation 107
7.3.4.1 Protected Indirection 109
7.3.4.2 Revocation Algorithm 109
7.4 Practical Considerations 111
7.4.1 Changing Protection Controls 111
7.4.2 Authentication in the Large 112
7.4.3 Performance 112
7.4.4 Comments 113
7.4.5 Elimination of Authentication 113
7.5 Comparative Analysis 113
7.6 Summary 114
8. Practical Considerations 115
8.1 Implementation 115
8.1.1 Prototypes 115
8.1.2 Full-Scale Implementations 116
8.2 Configuration 116
8.2.1 Static Configuration 116
8.2.2 Dynamic Configuration 120
8.3 Summary 120
9. Summary of Ideas 121
Appendix A: Exposition Language: EL 123
A.1 Language Extensions 123
A.2 Classes 123
A.3 Records 125
A.4 External Representation 127
A.5 Exception Handling 127
A.6 Processes 128
A.7 Synchronization 128
A.8 Sets 128
A.9 Byte Arrays 129
A.10 Miscellaneous 129
Appendix B: Storage System Summary 130
Appendix C: The Open Function 133
Bibliography 134
Index 138
Figures
Figure 2.1 - Structure of a Processor Class 18
Figure 3.1 - Structure of a Located Class 34
Figure 4.1 - An Indirect Record 42
Figure 4.2 - Structure of an Indirect Class 45
Figure 5.1 - Structure of a Suite Class 54
Figure 6.1 - Structure of a Reconfigurable File or Index 67
Figure 6.2 - Structure of a Reconfigurable Volume 68
Figure 6.3 - File and Index Reconfiguration Algorithm 70
Figure 6.4 - Step 1 of Volume Reconfiguration: Move Files 71
Figure 6.5 - Step 2 of Volume Reconfiguration: Move File Index 72
Figure 6.6 - Step 3 of Volume Reconfiguration: Change Volume Pointer 73
Figure 7.1 - A Passive Protection Mechanism 78
Figure 7.2 - An Active Protection Mechanism 79
Figure 7.3 - Example Sealed Objects 87
Figure 7.4 - Protection System Internal Structure 88
Figure 7.5 - Capability Reference 101
Figure 7.6 - Object Reference: Access Control List 103
Figure 7.7 - Structure of a Secure Located Class 108
Figure 7.8 - A Revocable Capability 110
Figure 8.1 - A Basic Storage Service 117
Figure 8.2 - Client Defined Protected Volumes 118
Tables
Table 3.1 - Typical Lock Compatibility Matrix 35
Table 3.2 - Lock Compatibility Matrix with Intention Locks 36
Table 5.1 - Sample Suite Configurations 51
Preface
This paper is organized so that it can be used in several ways. It can of course be read in its
entirety for a treatment of the general problem of decentralized information storage. It is also
possible to read an isolated chapter for a treatment of a single topic, such as protection. Each
chapter begins with an outline of its organization and ends with a summary of its contents. If the
reader wishes to look at a chapter in isolation I would also suggest that they read Sections 1.3 and
3.1.
There is a comprehensive index at the end of the paper. The index is useful to locate
references to a specific concept. It also indexes the program text, and thus should help a reader
find the definition of a function or a type.
I have tried to present enough detail so a reader can easily implement the ideas I discuss.
However, it is possible to understand the essence of the ideas without reading program text. On a
first reading, a casual reader should probably skip sections titled implementation or refinement.
I would like to thank my patient advisers, Dr. Butler Lampson and Prof. Susan Owicki, for
their help and good advice over the past four years. In addition to my advisers, Peter Deutsch,
Jim Gray, Martin Haeberli, Prof. Larry Manning, Gene McDaniel, Alfred Spector, Larry Stewart,
and Howard Sturgis took the time to carefully read chapters of this paper and suggest
improvements. I would also like to thank Bob Taylor for making it possible for me to do this work
at the Xerox Palo Alto Research Center. The work benefited greatly from daily interactions with
colleagues at Xerox.
The Fannie and John Hertz Fellowship that I held while a graduate student gave me the
freedom to pursue this research. The research was supported in part by the Fannie and John Hertz
Foundation, and by the Xerox Corporation.
This work is dedicated to my parents.
Abstract
This paper describes an architecture for shared information storage in a decentralized computer
system. The issues that are addressed include: naming of files and other objects (naming), reliable
storage of data (stable storage), coordinated access to shared storage (transactional storage), location
of objects (location), use of multiple copies to increase performance, reliability and availability
(replication), dynamic modification of object representations (reconfiguration), and storage security
and authentication (protection).
A complete model of the architecture is presented, which describes the interface to the facilities
provided, and describes in detail the proposed mechanisms for implementing them. The model
presents new approaches to naming, location, replication, reconfiguration, and protection. To verify
the model, three prototypes were constructed, and experience with these prototypes is discussed.
The model names objects with variable length byte arrays called references. References may
contain location information, protection guards, cryptographic keys, and other references. In
addition, references can be made indirect to delay their binding to a specific object or location.
The replication mechanism is based on assigning votes to each copy of a replicated object. The
characteristics of a replicated object can be chosen from a range of possibilities by appropriately
choosing its voting configuration. Temporary copies can be easily implemented by introducing
copies with no votes.
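The voting idea can be made concrete with a toy sketch (Python is used here purely for illustration; the function names and the particular configurations are ours, not part of the model). In weighted-voting schemes of this kind, a read gathers representatives holding at least r votes and a write at least w votes, with the quorum sizes chosen so that quorums must overlap:

```python
def is_valid_configuration(votes, r, w):
    # votes: the vote assigned to each representative; a copy with
    # zero votes is a temporary copy that can never affect a quorum.
    total = sum(votes)
    # r + w > total: every read quorum intersects every write quorum.
    # 2 * w > total: any two write quorums intersect.
    return r + w > total and 2 * w > total

def quorum_reachable(votes, reachable, needed):
    # Can the reachable representatives muster `needed` votes?
    return sum(votes[i] for i in reachable) >= needed

# Three copies with one vote each, r = w = 2: reads and writes each
# tolerate the loss of any single representative.
config = [1, 1, 1]
```

Adding a fourth representative with zero votes changes nothing in the checks above: it can serve as a fast local copy without ever being needed for correctness.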
The reconfiguration mechanism allows the storage that is used to implement an object to
change while the system is operating. A client need not be aware that an object has been
reconfigured.
The protection mechanism is based on the idea of sealing an object with a key. Sealed objects
can only be unsealed with an appropriate set of keys. Complex protection structures can be created
by using such operators as Key-Or and Key-And. The protection mechanism can be employed to
create popular protection mechanisms such as capabilities, access control lists, and information flow
control.
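The access logic of sealing can be mimicked in a small sketch (Python, illustrative only: it models which key sets suffice to unseal, not the cryptography the mechanism actually uses, and all names here are ours):

```python
# A guard is a predicate over the set of keys a client holds.
def Key(k):
    return lambda held: k in held

def Key_Or(*guards):
    # sealed contents open if any constituent guard is satisfied
    return lambda held: any(g(held) for g in guards)

def Key_And(*guards):
    # sealed contents open only if every constituent guard is satisfied
    return lambda held: all(g(held) for g in guards)

def seal(contents, guard):
    return (guard, contents)

def unseal(sealed, held_keys):
    guard, contents = sealed
    return contents if guard(held_keys) else None

# Either the owner's key alone, or both auditors' keys together,
# suffice to unseal this object.
obj = seal("contents", Key_Or(Key("owner"),
                              Key_And(Key("auditor-1"), Key("auditor-2"))))
```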
Chapter 1: Method and Principles
1.1 Introduction
Communication is an essential part of our basic need to cooperate and share with one another.
We have been given the freedom to have distant friends and increased knowledge about our world
by advances in communication technology such as the post office, the telegraph, and the telephone.
Computer systems promise to be another such advance. Large scale community information
systems are likely to play a major role in our future ability to create, organize, process, store, and
share information.
It was once thought that the problem of building large computer systems was that of building
large computers. It is now clear that this is not the case. Instead of employing a single computer,
future large scale computer systems will be composed of thousands, or even millions, of computers.
The goal of this research is to demonstrate an information storage system architecture that can
be used to integrate a collection of computers. By integrate we mean that the architecture permits
information to be easily exchanged between the users of the system. We are not suggesting that
information storage be centralized. Rather, we propose to organize the storage facilities that would
normally exist in a collection of computers into a single decentralized system.
The information storage architecture we present is intended to create a foundation for shared
applications. Example applications include office information systems, programming environments,
and data base systems. The principles of our architecture are best characterized by the desire to
provide fundamental storage facilities that can be flexibly adapted to a wide range of uses.
1.2 Method
Computer system research is more than the invention of new algorithms; part of the work lies
in the synthesis of a collection of ideas into a single package. Furthermore, it is important that a
synthesis be faithful to a single set of coordinated architectural principles. Conceptual integrity
keeps complex interactions from making the system intractable as it increases in size and function.
A clear statement of principal design decisions is central to the overall success of a large system.
The importance of these ideas has been demonstrated by [Belady and Lehman 77] and [Brooks 75].
We suggest a four stage process for system creation that is intended to promote these concepts.
The idea of the process is to emphasize the importance of asking fundamental questions early in the
life of a system, and to postpone secondary decisions. In order, the four stages are:
1. Define the system's architectural principles. The architectural principles of a system are a
set of primary design decisions that consider technical feasibility [Liddle 76]. These
decisions serve to define and delimit the scope of a system. Furthermore, they allow for
orderly growth by providing a single conceptual framework that can accommodate
extensions in system size and function.
For example, what is the nature of the system? Is it intended to provide a general purpose
computing environment? Or is it going to be used exclusively for electronic mail? How large
will the system be? Are different instances of the system going to be interconnected? What
are the capacity and reliability requirements of the system? What are the services that the
system will offer? How will these services be presented to a client? What are the
requirements for accounting? For protection?
2. Formulate a system model. A system model is a design for the system in line with its
architectural principles. A model describes the system's interfaces and mechanisms in
enough detail that it is possible to reason about the correctness of key algorithms. When
the system is constructed, the system model is used as a pattern.
3. Implement the system. A system implementation is a concrete set of hardware and software
components that realize a system model. The implementation of a system normally starts
by making a plan for its construction, testing, and documentation. Naturally, there can be
several implementations of a system model. This is important, as over the life of a system
new implementations of parts of a system will cause new and old components to coexist.
4. Plan a system configuration. A system configuration is an installed set of components from
a system implementation. A system implementation represents a wide spectrum of capacity,
reliability, availability, and performance possibilities; a configuration reflects the decisions
made to meet specific needs.
Our research has been organized according to these stages.
The remainder of this chapter treats the architectural principles of our system and their
background. We discuss how time-sharing systems, personal computing systems, and computer
networks have influenced our goals.
The next six chapters describe our system model by successive refinement. Chapter 2 defines
the environment of the system model. Chapter 3 presents a simple storage system. This storage
system would be ideal if one made the following assumptions:
1. Files, volumes, and other objects never move.
2. It is never necessary to improve the reliability, availability, or performance characteristics of
storage devices.
3. It is never necessary to change the storage that is used to store an object.
4. People are perfectly trustworthy and there is no need for protection.
Chapters 4 through 7 remove these assumptions. We add location (Chapter 4), replication (Chapter
5), reconfiguration (Chapter 6), and protection (Chapter 7) to describe a practical system.
Chapter 8 discusses system implementation and configuration. Three prototypes were built to
test the validity of our system model, and experience with these prototypes is discussed. In view of
these prototypes, we outline our expectations about full scale implementations of the system.
System configuration is discussed briefly, but it is not thoroughly explored.
Chapter 9 concludes the paper with a summary of the major ideas introduced in the paper and
a review of how we have achieved the architectural principles we set forth.
1.3 System Description
This paper includes a large amount of program text. The intent of the program text is to
provide the reader with a detailed understanding of the system model. Considered collectively, the
program text is an implementation of the system model. The program text is also intended to be
used as a pattern for full-scale implementations. Thus, the code that we present is called a model
implementation. The functions and types of the model implementation are indexed at the back of
the paper.
The system model is described in EL ("Exposition Language"), a dialect of Lisp 1.5 [McCarthy
et al. 62]. Lisp was chosen because it includes an Eval function, which allowed us to define the
semantics of operations executed at remote processors. In addition, Lisp lends itself to the
transformation of objects to byte strings and back again (see the descriptions of Encode and Decode
in the appendix). Appendix A should provide enough information to enable the reader to
understand the code in the paper. Much of the technical content of the paper is contained in the
program text, and thus we suggest that the reader take the time now to read Appendix A.
To help the reader understand EL we present a short example program fragment. The
fragment shown below is not intended to be useful, but it does demonstrate some key EL
constructs. We start by defining three record types: Person, Experience, and Experienced-Person.
Experienced-Person is a derived record type that contains the fields and types of Person and
Experience. The function Open-Person creates a new class that services the operations Name and
Parent. Open-Person takes a person record and a person class as inputs. They respectively
represent a person and that person's parent. When a Name request is sent to a class, Person-Name
is invoked, and the name of the person is returned. When a Parent request is sent to a class,
Parent-Name is invoked. Parent-Name gets the name of the class' parent from its superclass, which
was set when the class was created.
Every time that we create a class we include a comment that describes the class' instance
variables. For example, Open-Person creates a class, and Person-Name and Parent-Name can use
the instance variable name. The values of instance variables persist over class activations.
Person ← Record[name: Byte-Array];
Experience ← Record[years: Integer];
Experienced-Person ← Extend[Person, Experience];

Create-Person[name: Byte-Array / p: Person] ← Prog[ [];
    p ← Create[Person];
    p.name ← name;
    Return[p];
];

Open-Person[p: Person, parent: Person-Class / c: Person-Class] ← Prog[
    [name: Byte-Array];
    -- Instance variables: name
    -- copy name
    name ← p.name;
    -- create a new class
    c ← Create-Class[List[
        'Name, 'Person-Name,
        'Parent, 'Parent-Name], parent];
    Return[c];
];

Person-Name[/n: Byte-Array] ← Prog[ [];
    -- Person-Name is evaluated in the environment of Open-Person
    Return[name];
];

Parent-Name[/n: Byte-Array] ← Prog[ [];
    -- ask superclass for its name
    n ← superclass ! Name[];
    Return[n];
];

frank-class ← Open-Person[Create-Person["Frank"], NIL];
alfred-class ← Open-Person[Create-Person["Alfred"], frank-class];
-- dad will be "Frank"
dad ← alfred-class ! Parent[];
We observe the following stylistic conventions in program text. Variables always begin with
lower-case letters. Function names and record types are capitalized. Whenever a new object is
introduced, we follow the same order of presentation as we did in our example. First, we introduce
the operation to create an object instance. Second, we describe an "open" function that returns a
class that will service an object instance. Third, we present the functions that actually implement
the class' operations.
Let us define some terms that we will use repeatedly throughout this paper. A client is a
program that uses the facilities we describe, and a user is a human being who interacts with a client.
The reliability of a system is a measure of the probability that the system will malfunction, and a
system's availability is a measure of the probability that it will be operational when it is needed.
1.4 Background
Early in the 1960's time-sharing was introduced as a way of providing the illusion of a personal
computer to aid in program debugging. Time-sharing systems turned out to provide another benefit
that was not originally anticipated. Users found they could easily share information that was stored
in a time-sharing system. Sharing proved to be easy because it was as if a single file cabinet
simultaneously existed in every user's office. Items placed in one file cabinet immediately appeared
in the rest of the file cabinets. The facilities for information sharing provided by time-sharing soon
found use in large collaborative software projects.
As time-sharing matured, sharing was recognized as a basic facility, and made correspondingly
convenient. The CTSS system [Corbato et al. 62] pioneered multiple access computers, and provided
a simple shared file system. Based on this experience, Multics [Corbato et al. 72] extended its file
system to include a tree-structured naming system and advanced protection facilities.
Late in the 1970's hardware became inexpensive enough that users could be provided their own
computers [Thacker et al. 79]. Placing a large amount of computational power at the man-machine
interface dramatically improved its character. For example, graphics could be introduced, adding an
important element to the domain of things that computers could be used for. Communication
networks [Metcalfe and Boggs 76] allowed the users of personal computers to share resources such
as high-speed printers. However, sharing was not made as convenient as it was in time-sharing
systems.
This research proposes to combine the success of time-sharing's shared storage and the success
of personal computing's man-machine interface into a single system. Unfortunately, it is not
sufficient to simply add conventional storage to personal computing systems. In a decentralized
environment such problems as coordination, protection, reliability, availability, and performance
become much more complicated. For example, in a decentralized system when information is
transferred between computers over an insecure channel it must be encrypted to provide protection.
Furthermore, people's expectations have properly increased. Computers are being used for an
increasingly diverse spectrum of applications, and many computer users are no longer computer
professionals. Our understanding of these requirements is reflected in the principles of our
architecture.
A number of systems have been built that share our primary goal of integrating a collection of
computers with a shared information storage system. These systems fall into three broad categories.
1. Existing time-sharing systems have been modified to access remote files. The RSEXEC
[Thomas 73] system was an early attempt to join TENEX systems in this manner, and the
RSEXEC approach was later adopted by the National Software Works [Forsdick et al. 77].
The Locus project [Popek et al. 81] at UCLA has integrated a number of UNIX systems in
a similar manner. Locus includes facilities for mediating concurrent access to information,
and there are plans to incorporate replicated data as well. These systems all have major
restrictions that are rooted in their time-sharing origins. In addition, they have as a general
rule adopted ad hoc solutions to the intrinsic and environmental problems they faced.
Thus, they do not provide a general framework of the sort we propose.
2. Data base systems have been extended to operate on several computers that are connected
by a network. Examples of such systems are CICS ISC [IBM 80a] and Tandem Computer's
Encompass [Tandem 81]. These systems are intended to provide a specialized service. In
addition, they have not provided general solutions to many of the problems that we
consider.
3. File servers have been constructed for local computer networks, and these file servers have
been used by client computers for shared storage. WFS, and its successor, IFS, are two
such file servers [Swinehart et al. 79]. A file server at the University of Cambridge has
been successfully used as the only storage service of a time-sharing system [Dion 80]. The
Xerox Distributed File System [Israel et al. 78] provides facilities for guaranteeing
information consistency across file servers, and sophisticated facilities for failure recovery.
These file servers are more general than the time-sharing based efforts, and motivated the
system we propose. However, the scope of these servers is limited, and they do not address
many of the problems that we consider.
1.5 Architectural Principles
The following twelve architectural principles describe and delimit the scope of our ideal storage
system. We will return to this list at the end of the paper to review how they have been satisfied.
1. The system should behave in a well defined way. In a large system design there are many
opportunities for selecting a mechanism that works most of the time. Such a mechanism
can only be employed in conjunction with a backup mechanism that is expected to work all
of the time. For example, certain existing systems will undetectably malfunction in unusual
circumstances. We will not consider such designs.
2. The system should provide a basic storage service. The basic unit of storage should be the
file, an uninterpreted array of bytes. Read and write primitives should be provided to
access files. The notion of a volume should also be provided to model storage media. Files
are created and stored on volumes.
3. Storage should be resilient to expected failures. From time to time hardware errors, system
errors, or operator errors will occur. The storage system should expect such errors, and
recover from them without information loss. Furthermore, if unexpected errors occur, the
system should indicate that storage has been damaged instead of providing incorrect
information.
4. Files, volumes, and other objects should be named with unambiguous low-level names. The
storage system should not anticipate how clients might use these names or what naming
environments will be presented to users.
5. The system should mediate concurrent access to storage to ensure consistency.
6. The system should be decentralized, and the location of storage system objects should be
hidden from clients. The system should also allow clients to discover where objects are
located.
7. The system should allow modular expansion. The storage capacity of the system should not
be limited by any design decision, nor should the design intrinsically limit the number of
users that the system can support.
8. It should be possible to improve the performance, reliability, and availability of the storage
system by keeping multiple copies of selected storage system objects. This principle
includes the idea of making temporary copies of objects for rapid access.
9. It should be possible to reconfigure the system while it is operating. Reconfiguration
involves changing the storage resources that are used to implement a storage system object.
10. A mechanism for information secrecy and authentication should be provided. The
mechanism should be general enough that clients can use it to implement a variety of
protection policies. No one should be able to circumvent the protection mechanism. For
example, system administrators should not be able to access information that they are not
authorized to see.
11. It should be possible to construct derived volumes by extending existing volumes with
replication, reconfiguration, and protection structures. Files created on derived volumes will
use the volume's structure as a template. For example, it should be possible to create a
volume R from three other volumes A, B, and C. When a file F is created on R, copies of
F will be automatically maintained on A, B, and C. This allows popular classes of storage
to be directly represented in the system as volumes. Thus, clients may choose to ignore
what facilities are being used to provide the storage they use.
12. A client should be able to select the resources that it uses in such a way that its processor
can remain autonomous from the rest of the system.
1.6 Summary
A new information storage system was proposed. It is intended to be the foundation of diverse
applications such as office information systems, programming environments, and data base systems.
The design of the system was divided into four stages: definition of architectural principles,
formulation of a system model, system implementation, and system configuration. The paper is
structured according to these stages. The system's architectural principles and their background
were then introduced. Chapter 2 begins our consideration of a system model to realize these
principles.
Chapter 2: Environment
The first step in defining the system model is to outline its presumed environment. This
chapter reviews the characteristics of the hardware components we use, introduces the concepts of
stable storage and unique identifiers, and shows how to perform reliable function evaluation at a
remote processor. The material in this chapter represents an integration of ideas that have been
presented before in various forms. Notably, [Lampson and Sturgis 79] previously introduced a
number of the concepts reviewed here.
2.1 Hardware Components
Three types of hardware components comprise the system: processors, communication channels,
and storage devices. In each of the following sections we discuss the characteristics of each type of
hardware component. The hardware model presented is abstract to the extent that it only concerns
itself with device characteristics that will influence later design decisions.
2.1.1 Processors
A processor corresponds to the familiar notion of a stored program digital computer. Processors
are the active elements in a system, and they operate independently from one another. Processors
only communicate with each other through communication channels. There is absolute protection
between processors in the sense that all a malicious processor can do is send messages, which other
processors can choose to ignore. Depending on application needs, processors may be connected to a
wide variety of peripherals, such as storage devices, bit-map displays, pointing devices, bar-code
readers, and laser printers. Although it is not an absolute necessity, we will assume that all
processors in the system have the following ideal capabilities.
Every processor is assigned an identifier that is distinct from any other processor's identifier.
Processor identifiers can be implemented by including a read-only memory in each processor that
contains its identifier.
Fixed length processor identifiers offer simplicity, but for some
implementations the expansion capability offered by variable length identifiers may be attractive. A
processor's identifier can be discovered with the function GetProcessorID.
GetProcessorID[/id: ProcessorID]
GetProcessorID returns the identifier of the processor that the calling process is using.
In addition, it is assumed that a processor can encrypt and decrypt data, and generate true
random numbers. Encryption can be implemented in software, but for efficient operation with high
security codes it is likely that special purpose hardware will be required. True random numbers are
useful for generating hard to guess cryptographic keys. A true random number is not the output of
a pseudo-random number generator, but is derived from a truly random process such as thermal
noise, shot noise, or radioactive decay.
CHAPTER 2: ENVIRONMENT
2.1.2 Communication
Communication and synchronization are accomplished between processors by sending and
receiving messages. A message is an uninterpreted array of bytes that is sent to a destination
processor. Every processor is assigned an address that includes its processor identifier, so different
processors will never be assigned the same address. However, we assume that if a processor is
physically moved its address will change.
The following functions define the interface to the communication system. Note that Processor
includes a processor identifier and fields that are private to the communication system.
Processor = Extend[Record[id: ProcessorID], PrivateFields];
Send[destination: Processor, message: Byte-Array]
Send transmits message to destination.
Receive[/message: Byte-Array]
Receive delays until a message arrives addressed to this processor, and then returns the
incoming message.
GetMyProcessor[/self: Processor]
GetMyProcessor returns the address of this processor.
The communication system can lose, duplicate, or arbitrarily delay messages. Thus, messages
may arrive out of order, more than once, or not at all. We will assume that messages damaged in
transit are detected and discarded by the communication system, and will appear to have been lost.
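The channel semantics above can be made concrete with a minimal Python sketch of a simulated channel. The class name, the loss and duplication rates, and the use of a digest to model damage detection are illustrative assumptions, not part of the text; the point is only that Send may silently drop or duplicate a message, Receive may deliver messages in arbitrary order, and damaged messages are discarded as if lost.

```python
import hashlib
import random

class UnreliableChannel:
    """Simulated channel: may lose, duplicate, or reorder messages,
    and discards messages damaged in transit (modeled here by
    verifying a digest on receipt)."""

    def __init__(self, loss=0.1, dup=0.1, seed=None):
        self.rng = random.Random(seed)
        self.loss, self.dup = loss, dup
        self.queue = []  # undelivered messages, in arbitrary order

    def send(self, destination, message: bytes):
        if self.rng.random() < self.loss:
            return  # lost in transit, sender is not told
        framed = (destination, hashlib.sha256(message).digest(), message)
        self.queue.append(framed)
        if self.rng.random() < self.dup:
            self.queue.append(framed)  # duplicated in transit

    def receive(self, me):
        # Deliver some queued message for `me` whose digest verifies;
        # shuffling models arbitrary delay and reordering.
        self.rng.shuffle(self.queue)
        for i, (dest, digest, msg) in enumerate(self.queue):
            if dest == me and hashlib.sha256(msg).digest() == digest:
                del self.queue[i]
                return msg
        return None  # nothing deliverable yet
```

A client of this interface must therefore tolerate duplicates and gaps itself, which is exactly the burden the reliable remote evaluation algorithm of Section 2.4 takes on.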
At least one distinguished address, Broadcast, is defined by the communication system so that a
processor can discover things about its environment and begin communicating with other processors.
One or more processors can ask to receive messages addressed to Broadcast. How far broadcast
messages will propagate in the communication system is implementation dependent. Chapter 4
discusses how this facility is used.
Broadcast: Processor
Packet switched networks [Metcalfe 73] are well adapted to providing a full connectivity
network at a moderate cost, and are an attractive method for implementing the proposed
communication system. Local packet switched networks, such as the Ethernet [Metcalfe and Boggs
76], provide high capacity and low delay at low cost for a local area. A treatment of local networks
can be found in [Clark et al. 78].
Local networks can be connected together by gateway processors and communication channels
to form an internetwork [Boggs et al. 80]. An internetwork retains the performance and cost
advantages of a local network, while extending the communication system to accommodate modular
growth. In an internetwork, non-local packets are forwarded through a succession of gateways to
eventually arrive at their destination.
2.1.3 Storage Devices
A storage device is a processor peripheral that stores data. Read and write are the two primitive
operations that are used to access and update storage devices, respectively.
We assume that a storage device will indicate when it has failed to accurately store data. Failure
detection is typically implemented with the help of labels and checksums [Lampson and Sproull 79].
Three fundamental characteristics of storage devices are capacity, latency, and transfer rate. The
capacity of a storage device is the maximum number of bytes that it can store on a single medium.
We have been careful to state maximum, because the amount of data that can be stored is often a
function of the type of media in use. The transfer rate is the maximum number of bytes per
second that can be transferred to or from the device. The latency of a storage device is the average
amount of time required to read or write information, ignoring the time the device is actually
transferring data.
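The two quantities combine into a simple estimate of access time: one average positioning delay plus the transfer itself. The sketch below is a direct restatement of that arithmetic; the disk figures in the example are illustrative and do not come from the text.

```python
def access_time(n_bytes: int, latency_s: float, rate_bytes_per_s: float) -> float:
    """Approximate time to read or write n_bytes: average latency
    plus the time spent actually transferring data."""
    return latency_s + n_bytes / rate_bytes_per_s

# Illustrative numbers: a disk with 30 ms average latency and a
# 1 MB/s transfer rate reading a 4 KB page takes about 34 ms,
# so latency dominates for small transfers.
t = access_time(4096, 0.030, 1_000_000)
```

The example makes the familiar point that for small random transfers latency dominates, which is one reason serial access devices remain attractive for large sequential workloads.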
Storage devices have four additional important characteristics.
1. Some storage devices are designed for random access to data, and some are designed for
serial access. These devices will be called random access and serial access, respectively. A
tape drive is an example of a device designed for serial access to data, and an attempt to
use it in a random access mode would result in poor performance. Although the storage
capacity of random access devices is increasing, serial access devices may continue to have a
role in information storage because of their lower cost.
2. Some storage devices allow data to be read or written, but once data is stored it cannot be
overwritten. Such a device will be called write-once. This property is usually due to the
storage media in use, for example a write-once optical disk.
3. Some storage devices do not allow data to be written. Such a device will be called
read-only. As a safety feature some storage devices can be made temporarily read-only.
4. Some storage devices can record on storage media that are interchangeable among other
storage devices of the same type. Such a device will be said to have removable media.
2.2 Stable Storage
Storage that is resilient to a set of expected failures is called stable storage. From time to time
hardware errors, system errors, or human errors will occur. A stable storage system expects such
errors, and recovers from them without information loss. Furthermore, if unexpected errors occur,
a stable storage system indicates that storage has been damaged instead of providing incorrect
information. Expected hardware failures include storage device transfers that malfunction, and
information on a storage device that decays and becomes unreadable. Transfers that malfunction
include transfers that are in progress when their controlling processor fails.
The write-ahead-log protocol [Gray 78] is widely used to implement stable storage. It
implements stable storage by carefully maintaining two copies of data on devices with independent
failure modes.
[Lampson and Sturgis 79] suggest a similar algorithm.
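The idea of maintaining two copies on devices with independent failure modes can be sketched as follows. This is a minimal illustration in the spirit of [Lampson and Sturgis 79], not the write-ahead-log protocol itself: the file paths, the JSON framing, and the checksum scheme are all assumptions made for the sketch. A torn or decayed copy fails its checksum and is treated as bad, and because the two copies are updated one at a time, a crash between the writes still leaves one intact copy.

```python
import hashlib
import json
import os

def _put(path, data: bytes):
    # Write the record together with a checksum; an interrupted
    # write leaves a record whose checksum fails to verify.
    rec = {"sum": hashlib.sha256(data).hexdigest(), "data": data.hex()}
    with open(path, "w") as f:
        json.dump(rec, f)
        f.flush()
        os.fsync(f.fileno())  # force the copy to the device

def _get(path):
    try:
        with open(path) as f:
            rec = json.load(f)
        data = bytes.fromhex(rec["data"])
        if hashlib.sha256(data).hexdigest() == rec["sum"]:
            return data
    except (OSError, ValueError, KeyError):
        pass
    return None  # missing, decayed, or torn

def stable_write(copy_a, copy_b, data: bytes):
    # Update the copies sequentially, never both at once, so at
    # least one copy survives a crash between the two writes.
    _put(copy_a, data)
    _put(copy_b, data)

def stable_read(copy_a, copy_b):
    data = _get(copy_a)
    return data if data is not None else _get(copy_b)
```

A real implementation must also place the two copies on devices with independent failure modes and periodically scrub them, repairing a bad copy from its good twin before the second copy can decay as well.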
It is difficult to precisely characterize the reliability of stable storage. Manufacturers give bit
error rates for their devices, but these figures do not include catastrophic failures. Examples of
catastrophic failures are head crashes on disk drives, media dropped on the floor by operators, and
media mistakenly erased by users.
We shall call storage that is not protected against failures volatile storage. Stable storage will be
used to record long term system state, and volatile storage will be used for intermediate results.
Volatile storage is naturally much more efficient than stable storage because the same precautions
do not have to be observed.
2.3 Unique Identifiers
A unique identifier is defined to be a value that is distinct from every other unique identifier.
We will assume for simplicity that unique identifiers are system generated nonsensical bit strings,
but unique identifiers could be client generated and sensible.
All unique identifiers do not have to be the same length. When it is known that a unique
identifier is going to be used extensively it may be advantageous to use a shorter identifier. Of
course, there are fewer short identifiers than there are long ones.
A common method for generating a unique identifier is to concatenate a processor identifier
with a locally unique identifier. A locally unique identifier can be implemented by a counter in
stable storage. Whenever a locally unique identifier is required the counter is incremented and
returned. An obvious optimization is to withdraw a sequence of locally unique identifiers from
stable storage at once to reduce delay. Another technique for generating locally unique identifiers is
as follows. At processor initialization time create a variable nextId and set it equal to a calendar
clock. A calendar clock is a clock that holds the current date and time and thus monotonically
increases. Every time a locally unique identifier is requested, increment nextId by one and ensure
that it is less than the calendar clock by pausing if necessary. Now nextId is guaranteed to be
locally unique. Thus, the second scheme does not require stable storage.
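Both steps of the section can be sketched directly: the calendar-clock scheme for locally unique identifiers, and concatenation with a processor identifier to make the result globally unique. The microsecond tick granularity and the 64-bit field widths below are illustrative assumptions, not part of the text.

```python
import time

class ClockUniqueIDs:
    """Locally unique identifiers from a calendar clock: initialize
    nextId from the clock, then increment it on each request, pausing
    if necessary so nextId never runs ahead of the clock. No stable
    storage is needed, because after a restart the clock has already
    passed every previously issued identifier."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.next_id = int(clock() * 1_000_000)  # microsecond ticks

    def get_unique_id(self) -> int:
        self.next_id += 1
        while self.next_id >= int(self.clock() * 1_000_000):
            time.sleep(0.000001)  # pause until the clock passes nextId
        return self.next_id

def make_unique_id(processor_id: int, local_id: int) -> int:
    # Concatenate a processor identifier with a locally unique
    # identifier, assuming the local part fits in 64 bits.
    return (processor_id << 64) | local_id
```

Successive identifiers from a single generator are monotonically increasing, which is exactly the property GetUniqueID promises below.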
Although unique identifiers are unique in theory, there is a chance that the unique identifier
mechanism could fail and issue duplicate identifiers. Such a failure could result from two
processors that were mistakenly assigned the same processor identifier, or from a malicious client.
The algorithms we present in many instances check for duplicate unique identifiers. However, to
provide a foundation on which to build, we will assume that unique identifiers are in fact unique.
The following function provides unique identifiers:
GetUniqueID[/id: UniqueID]
GetUniqueID returns a unique identifier. On a single processor subsequent unique
identifiers from GetUniqueID are monotonically increasing.
2.4 Reliable Remote Evaluation
2.4.1 Model
Remote form evaluation is the way one processor requests another processor to evaluate a
function and return a result. A remote evaluation is a generalization of what is commonly referred
to as a remote procedure call. [Spector 80] provides a taxonomy of remote operations and their
semantics.
It is possible to provide precise semantics for remote evaluation because evaluation is formally
defined with respect to an environment that binds values to free variables and functions. We will
assume that remote evaluation is done with respect to a default environment that includes the data
types and functions defined in the paper.
To communicate with a remote processor it is first opened. Opening a processor results in a
class that will service Eval requests. Eval will evaluate an EL form at the remote processor. To
make the remote evaluation of forms easier, we restrict forms to be defined by the following
grammar: