Fifth Generation Computer Systems 1992, Volume 2
FIFTH GENERATION
COMPUTER SYSTEMS 1992
Edited by
Institute for New Generation
Computer Technology (ICOT)
Volume 2
Ohmsha, Ltd.
IOS Press
FIFTH GENERATION COMPUTER SYSTEMS 1992
Copyright © 1992 by Institute for New Generation Computer Technology
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted, in any form or by any means, electronic, mechanical, recording or
otherwise, without the prior permission of the copyright owner.
ISBN 4-274-07724-1 (Ohmsha)
ISBN 90-5199-099-5 (IOS Press)
Library of Congress Catalog Card Number: 92-073166
Published and distributed in Japan by
Ohmsha, Ltd.
3-1 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101, Japan
Distributed in North America by
IOS Press, Inc.
Postal Drawer 10558, Burke, VA 22009-0558, U. S. A.
United Kingdom by
IOS Press
73 Lime Walk, Headington, Oxford OX3 7AD, England
Europe and the rest of the world by
IOS Press
Van Diemenstraat 94, 1013 CN Amsterdam, Netherlands
Far East jointly by
Ohmsha, Ltd., IOS Press
Printed in Japan
CONTENTS OF VOLUME 1
PLENARY SESSIONS
Keynote Speech
Launching the New Era
Kazuhiro Fuchi . . . . . . . . . . . . . . . . . . . . 3
General Report on ICOT Research and Development
Overview of the Ten Years of the FGCS Project
Takashi Kurozumi . . . . . . . . . . . . . . . . . . . . 9
Summary of Basic Research Activities of the FGCS Project
Koichi Furukawa . . . . . . . . . . . . . . . . . . . . 20
Summary of the Parallel Inference Machine and its Basic Software
Shunichi Uchida . . . . . . . . . . . . . . . . . . . . 33
Report on ICOT Research Results
Parallel Inference Machine PIM
Kazuo Taki . . . . . . . . . . . . . . . . . . . . 50
Operating System PIMOS and Kernel Language KL1
Takashi Chikayama . . . . . . . . . . . . . . . . . . . . 73
Towards an Integrated Knowledge-Base Management System: Overview of R&D on
Databases and Knowledge-Bases in the FGCS Project
Kazumasa Yokota and Hideki Yasukawa . . . . . . . . . . . . . . . . . . . . 89
Constraint Logic Programming System: CAL, GDCC and Their Constraint Solvers
Akira Aiba and Ryuzo Hasegawa . . . . . . . . . . . . . . . . . . . . 113
Parallel Theorem Provers and Their Applications
Ryuzo Hasegawa and Masayuki Fujita . . . . . . . . . . . . . . . . . . . . 132
Natural Language Processing Software
Yuichi Tanaka . . . . . . . . . . . . . . . . . . . . 155
Experimental Parallel Inference Software
Katsumi Nitta, Kazuo Taki and Nobuyuki Ichiyoshi . . . . . . . . . . . . . . . . . . . . . . . . . 166
Invited Lectures
Formalism vs. Conceptualism: Interfaces between Classical Software Development Techniques
and Knowledge Engineering
Dines Bjørner . . . . . . . . . . . . . . . . . . . . 191
The Role of Logic in Computer Science and Artificial Intelligence
J. A. Robinson . . . . . . . . . . . . . . . . . . . . 199
Programs are Predicates
C. A. R. Hoare . . . . . . . . . . . . . . . . . . . . 211
Panel Discussion: A Springboard for Information Processing in the 21st Century
PANEL: A Springboard for Information Processing in the 21st Century
Robert A. Kowalski (Chairman) . . . . . . . . . . . . . . . . . . . . 219
Finding the Best Route for Logic Programming
Herve Gallaire . . . . . . . . . . . . . . . . . . . . 220
The Role of Logic Programming in the 21st Century
Ross Overbeek . . . . . . . . . . . . . . . . . . . . 223
Object-Based Versus Logic Programming
Peter Wegner . . . . . . . . . . . . . . . . . . . . 225
Concurrent Logic Programming as a Basis for Large-Scale Knowledge Information Processing
Koichi Furukawa . . . . . . . . . . . . . . . . . . . . 230
Knowledge Information Processing in the 21st Century
Shunichi Uchida . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
ICOT SESSIONS
Parallel VLSI-CAD and KBM Systems
LSI-CAD Programs on Parallel Inference Machine
Hiroshi Date, Yukinori Matsumoto, Kouichi Kimura, Kazuo Taki, Hiroo Kato and
Masahiro Hoshi . . . . . . . . . . . . . . . . . . . . 237
Parallel Database Management System: Kappa-P
Moto Kawamura, Hiroyuki Sato, Kazutomo Naganuma and Kazumasa Yokota . . . . . . . . 248
Objects, Properties, and Modules in QUIXOTe
Hideki Yasukawa, Hiroshi Tsuda and Kazumasa Yokota . . . . . . . . . . . . . . . . . . . . . . 257
Parallel Operating System, PIMOS
Resource Management Mechanism of PIMOS
Hiroshi Yashiro, Tetsuro Fujise, Takashi Chikayama, Masahiro Matsuo, Atsushi Hori
and Kumiko Wada . . . . . . . . . . . . . . . . . . . . 269
The Design of the PIMOS File System
Fumihide Itoh, Takashi Chikayama, Takeshi Mori, Masaki Sato, Tatsuo Kato and
Tadashi Sato . . . . . . . . . . . . . . . . . . . . 278
ParaGraph: A Graphical Tuning Tool for Multiprocessor Systems
Seiichi Aikawa, Mayumi Kamiko, Hideyuki Kubo, Fumiko Matsuzawa and
Takashi Chikayama .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
Genetic Information Processing
Protein Sequence Analysis by Parallel Inference Machine
Masato Ishikawa, Masaki Hoshida, Makoto Hirosawa, Tomoyuki Toya, Kentaro Onizuka
and Katsumi Nitta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Folding Simulation using Temperature Parallel Simulated Annealing
Makoto Hirosawa, Richard J. Feldmann, David Rawn, Masato Ishikawa, Masaki Hoshida
and George Michaels . . . . . . . . . . . . . . . . . . . . 300
Toward a Human Genome Encyclopedia
Kaoru Yoshida, Cassandra Smith, Toni Kazic, George Michaels, Ron Taylor,
David Zawada, Ray Hagstrom and Ross Overbeek . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Integrated System for Protein Information Processing
Hidetoshi Tanaka . . . . . . . . . . . . . . . . . . . . 321
Constraint Logic Programming and Parallel Theorem Proving
Parallel Constraint Logic Programming Language GDCC and its Parallel Constraint Solvers
Satoshi Terasaki, David J. Hawley, Hiroyuki Sawada, Ken Satoh, Satoshi Menju,
Taro Kawagishi, Noboru Iwayama and Akira Aiba . . . . . . . . . . . . . . . . . . . . . . .. . 330
cu-Prolog for Constraint-Based Grammar
Hiroshi Tsuda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Model Generation Theorem Provers on a Parallel Inference Machine
Masayuki Fujita, Ryuzo Hasegawa, Miyuki Koshimura and Hiroshi Fujita . . . . . . . . . . 357
Natural Language Processing
On a Grammar Formalism, Knowledge Bases and Tools for Natural Language Processing in
Logic Programming
Hiroshi Sano and Fumiyo Fukumoto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
Argument Text Generation System (Dulcinea)
Teruo Ikeda, Akira Kotani, Kaoru Hagiwara and Yukihiro Kubo . . . . . . . . . . . . . . . . . 385
Situated Inference of Temporal Information
Satoshi Tojo and Hideki Yasukawa . . . . . . . . . . . . . . . . . . . . 395
A Parallel Cooperation Model for Natural Language Processing
Shigeichiro Yamasaki, Michiko Turuta, Ikuko Nagasawa and Kenji Sugiyama . . . . . . . . . 405
Parallel Inference Machine (PIM)
Architecture and Implementation of PIM/p
Kouichi Kumon, Akira Asato, Susumu Arai, Tsuyoshi Shinogi, Akira Hattori,
Hiroyoshi Hatazawa and Kiyoshi Hirano . . . . . . . . . . . . . . . . . . . . 414
Architecture and Implementation of PIM/m
Hiroshi Nakashima, Katsuto Nakajima, Seiichi Kondo, Yasutaka Takeda, Yu Inamura,
Satoshi Onishi and Kanae Masuda . . . . . . . . . . . . . . . . . . . . 425
Parallel and Distributed Implementation of Concurrent Logic Programming Language KL1
Keiji Hirata, Reki Yamamoto, Akira Imai, Hideo Kawai, Kiyoshi Hirano,
Tsuneyoshi Takagi, Kazuo Taki, Akihiko Nakase and Kazuaki Rokusawa . . . . . . . . . . 436
Author Index . . . . . . . . . . . . . . . . . . . . i
CONTENTS OF VOLUME 2
FOUNDATIONS
Reasoning about Programs
Logic Program Synthesis from First Order Logic Specifications
Tadashi Kawamura . . . . . . . . . . . . . . . . . . . . 463
Sound and Complete Partial Deduction with Unfolding Based on Well-Founded Measures
Bern Martens, Danny De Schreye and Maurice Bruynooghe . . . . . . . . . . . . . . . . . . . . 473
A Framework for Analyzing the Termination of Definite Logic Programs with respect to Call
Patterns
Danny De Schreye, Kristof Verschaetse and Maurice Bruynooghe . . . . . . . . . . . . . . . 481
Automatic Verification of GHC-Programs: Termination
Lutz Plumer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
Analogy
Analogical Generalization
Takenao Ohkawa, Toshiaki Mori, Noboru Babaguchi and Yoshikazu Tezuka . . . . . . . . . . 497
Logical Structure of Analogy: Preliminary Report
Jun Arima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
Abduction (1)
Consistency-Based and Abductive Diagnoses as Generalised Stable Models
Chris Preist and Kave Eshghi . . . . . . . . . . . . . . . . . . . . 514
A Forward-Chaining Hypothetical Reasoner Based on Upside-Down Meta-Interpretation
Yoshihiko Ohta and Katsumi Inoue . . . . . . . . . . . . . . . . . . . . 522
Logic Programming, Abduction and Probability
David Poole . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
Abduction (2)
Abduction in Logic Programming with Equality
P. T. Cox, E. Knill and T. Pietrzykowski . . . . . . . . . . . . . . . . . . . . 539
Hypothetico-Deductive Reasoning
Chris Evans and Antonios C. Kakas . . . . . . . . . . . . . . . . . . . . 546
Acyclic Disjunctive Logic Programs with Abductive Procedures as Proof Procedure
Phan Minh Dung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Semantics of Logic Programs
Adding Closed World Assumptions to Well Founded Semantics
Luis Moniz Pereira, Jose J. Alferes and Joaquim N. Aparicio . . . . . . . . . . . . . . . . . . . 562
Contributions to the Semantics of Open Logic Programs
A. Bossi, M. Gabbrielli, G. Levi and M. C. Meo . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
A Generalized Semantics for Constraint Logic Programs
Roberto Giacobazzi, Saumya K. Debray and Giorgio Levi . . . . . . . . . . . . . . . . . . . . 581
Extended Well-Founded Semantics for Paraconsistent Logic Programs
Chiaki Sakama . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Invited Paper
Formalizing Database Evolution in the Situation Calculus
Raymond Reiter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Machine Learning
Learning Missing Clauses by Inverse Resolution
Peter Idestam-Almquist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 610
A Machine Discovery from Amino Acid Sequences by Decision Trees over Regular Patterns
Setsuo Arikawa, Satoru Kuhara, Satoru Miyano, Yasuhito Mukouchi, Ayumi Shinohara
and Takeshi Shinohara . . . . . . . . . . . . . . . . . . . . 618
Efficient Induction of Version Spaces through Constrained Language Shift
Claudio Carpineto . . . . . . . . . . . . . . . . . . . . 626
Theorem Proving
Theorem Proving Engine and Strategy Description Language
Massimo Bruschi . . . . . . . . . . . . . . . . . . . . 634
A New Algorithm for Subsumption Test
Byeong Man Kim, Sang Ho Lee, Seung Ryoul Maeng and Jung Wan Cho . . . . . . . . . . . 643
On the Duality of Abduction and Model Generation
Marc Denecker and Danny De Schreye . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
Functional Programming and Constructive Logic
Defining Concurrent Processes Constructively
Yukihide Takayama . . . . . . . . . . . . . . . . . . . . 658
Realizability Interpretation of Coinductive Definitions and Program Synthesis with Streams
Makoto Tatsuta . . . . . . . . . . . . . . . . . . . . 666
MLOG: A Strongly Typed Confluent Functional Language with Logical Variables
Vincent Poirriez . . . . . . . . . . . . . . . . . . . . 674
A New Perspective on Integrating Functional and Logic Languages
John Darlington, Yi-ke Guo and Helen Pull . . . . . . . . . . . . . . . . . . . . 682
Temporal Reasoning
A Mechanism for Reasoning about Time and Belief
Hideki Isozaki and Yoav Shoham . . . . . . . . . . . . . . . . . . . . 694
Dealing with Time Granularity in the Event Calculus
Angelo Montanari, Enrico Maim, Emanuele Ciapessoni and Elena Ratto . . . . . . . . . . 702
ARCHITECTURES & SOFTWARE
Hardware Architecture and Evaluation
UNIRED II: The High Performance Inference Processor for the Parallel Inference Machine
PIE64
Kentaro Shimada, Hanpei Koike and Hidehiko Tanaka . . . . . . . . . . . . . . . . . . . . .. 715
Hardware Implementation of Dynamic Load Balancing in the Parallel Inference Machine
PIM/c
T. Nakagawa, N. Ido, T. Tarui, M. Asaie and M. Sugie . . . . . . . . . . . . . . . . . . . . 723
Evaluation of the EM-4 Highly Parallel Computer using a Game Tree Searching Problem
Yuetsu Kodama, Shuichi Sakai and Yoshinori Yamaguchi . . . . . . . . . . . . . . . . . . . . 731
OR-Parallel Speedups in a Knowledge Based System: on Muse and Aurora
Khayri A. M. Ali and Roland Karlsson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
Invited Paper
A Universal Parallel Computer Architecture
William J. Dally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
AND-Parallelism and OR-Parallelism
An Automatic Translation Scheme from Prolog to the Andorra Kernel Language
Francisco Bueno and Manuel Hermenegildo . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . 759
Recomputation based Implementations of And-Or Parallel Prolog
Gopal Gupta and Manuel V. Hermenegildo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
Estimating the Inherent Parallelism in Prolog Programs
David C. Sehr and Laxmikant V. Kale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783
Implementation Techniques
Implementing Streams on Parallel Machines with Distributed Memory
Koichi Konishi, Tsutomu Maruyama, Akihiko Konagaya, Kaoru Yoshida and
Takashi Chikayama . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .791
Message-Oriented Parallel Implementation of Moded Flat GHC
Kazunori Ueda and Masao Morita . . . . . . . . . . . . . . . . . . . . 799
Towards an Efficient Compile-Time Granularity Analysis Algorithm
X. Zhong, E. Tick, S. Duvvuru, L. Hansen, A. V. S. Sastry and R. Sundararajan . . . . . . . . . . 809
Providing Iteration and Concurrency in Logic Programs through Bounded Quantifications
Jonas Barklund and Hakan Millroth . . . . . . . . . . . . . . . . . . . . 817
Extension of Logic Programming
An Implementation for a Higher Level Logic Programming Language
Anthony S. K. Cheng and Ross A. Paterson . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825
Implementing Prolog Extensions: a Parallel Inference Machine
Jean-Marc Alliot, Andreas Herzig and Mamede Lima-Marques . . . . . . . . . . . . . . . . . . 833
Parallel Constraint Solving in Andorra-I
Steve Gregory and Rong Yang . . . . . . . . . . . . . . . . . . . . 843
A Parallel Execution of Functional Logic Language with Lazy Evaluation
Jong H. Nang, D. W. Shin, S. R. Maeng and Jung W. Cho . . . . . . . . . . . . . . . . . . . . 851
Task Scheduling and Load Analysis
Self-Organizing Task Scheduling for Parallel Execution of Logic Programs
Zheng Lin . . . . . . . . . . . . . . . . . . . . 859
Asymptotic Load Balance of Distributed Hash Tables
Nobuyuki Ichiyoshi and Kouichi Kimura . . . . . . . . . . . . . . . . . . . . 869
Concurrency
Constructing and Collapsing a Reflective Tower in Reflective Guarded Horn Clauses
Jiro Tanaka and Fumio Matono . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877
CHARM: Concurrency and Hiding in an Abstract Rewriting Machine
Andrea Corradini, Ugo Montanari and Francesca Rossi . . . . . . . . . . . . . . . . . . . . 887
Less Abstract Semantics for Abstract Interpretation of FGHC Programs
Kenji Horiuchi . . . . . . . . . . . . . . . . . . . . 897
Databases and Distributed Systems
Parallel Optimization and Execution of Large Join Queries
Eileen Tien Lin, Edward Omiecinski and Sudhakar Yalamanchili . . . . . . . . . . . . . . . . . . . . 907
Towards an Efficient Evaluation of Recursive Aggregates in Deductive Databases
Alexandre Lefebvre . . . . . . . . . . . . . . . . . . . . 915
A Distributed Programming Environment based on Logic Tuple Spaces
Paolo Ciancarini and David Gelernter . . . . . . . . . . . . . . . . . . . . 926
Programming Environment
Visualizing Parallel Logic Programs with VISTA
E. Tick . . . . . . . . . . . . . . . . . . . . 934
Concurrent Constraint Programs to Parse and Animate Pictures of Concurrent Constraint
Programs
Kenneth M. Kahn . . . . . . . . . . . . . . . . . . . . 943
Logic Programs with Inheritance
Yaron Goldberg, William Silverman and Ehud Shapiro . . . . . . . . . . . . . . . . . . . . 951
Implementing a Process Oriented Debugger with Reflection and Program Transformation
Munenori Maeda . . . . . . . . . . . . . . . . . . . . 961
Production Systems
A New Parallelization Method for Production Systems
E. Bahr, F. Barachini and H. Mistelberger . . . . . . . . . . . . . . . . . . . . 969
Performance Evaluation of the Multiple Root Node Approach to the Rete Pattern Matcher
for Production Systems
Andrew Sohn and Jean-Luc Gaudiot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . · .. 977
APPLICATIONS & SOCIAL IMPACTS
Constraint Logic Programming
Output in CLP(R)
Joxan Jaffar, Michael J. Maher, Peter J. Stuckey and Roland H. C. Yap . . . . . . . . . . 987
Adapting CLP(R) to Floating-Point Arithmetic
J. H. M. Lee and M. H. van Emden . . . . . . . . . . . . . . . . . . . . 996
Domain Independent Propagation
Thierry Le Provost and Mark Wallace . . . . . . . . . . . . . . . . . . . . 1004
A Feature-Based Constraint System for Logic Programming with Entailment
Hassan Aït-Kaci, Andreas Podelski and Gert Smolka . . . . . . . . . . . . . . . . . . . . 1012
Qualitative Reasoning
Range Determination of Design Parameters by Qualitative Reasoning and its Application to
Electronic Circuits
Masaru Ohki, Eiji Oohira, Hiroshi Shinjo and Masahiro Abe . . . . . . . . . . . . . . . . . . . . 1022
Logical Implementation of Dynamical Models
Yoshiteru Ishida . . . . . . . . . . . . . . . . . . . . 1030
Knowledge Representation
The CLASSIC Knowledge Representation System or, KL-ONE: The Next Generation
Ronald J. Brachman, Alexander Borgida, Deborah L. McGuinness, Peter F. Patel-Schneider
and Lori Alperin Resnick . . . . . . . . . . . . . . . . . . . . 1036
Morphe: A Constraint-Based Object-Oriented Language Supporting Situated Knowledge
Shigeru Watari, Yasuaki Honda and Mario Tokoro . . . . . . . . . . . . . . . . . . . . 1044
On the Evolution of Objects in a Logic Programming Framework
F. Nihan Kesim and Marek Sergot . . . . . . . . . . . . . . . . . . . . 1052
Panel Discussion: Future Direction of Next Generation Applications
The Panel on a Future Direction of New Generation Applications
Fumio Mizoguchi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061
Knowledge Representation Theory Meets Reality: Some Brief Lessons from the CLASSIC
Experience
Ronald J. Brachman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1063
Reasoning with Constraints
Catherine Lassez . . . . . . . . . . . . . . . . . . . . 1066
Developments in Inductive Logic Programming
Stephen Muggleton . . . . . . . . . . . . . . . . . . . . 1071
Towards the General-Purpose Parallel Processing System
Kazuo Taki . . . . . . . . . . . . . . . . . . . . 1074
Knowledge-Based Systems
A Hybrid Reasoning System for Explaining Mistakes in Chinese Writing
Jacqueline Castaing . . . . . . . . . . . . . . . . . . . . 1076
Automatic Generation of a Domain Specific Inference Program for Building a Knowledge
Processing System
Takayasu Kasahara, Naoyuki Yamada, Yasuhiro Kobayashi, Katsuyuki Yoshino and
Kikuo Yoshimura . . . . . . . . . . . . . . . . . . . . 1084
Knowledge-Based Functional Testing for Large Software Systems
Uwe Nonnenmann and John K. Eddy . . . . . . . . . . . . . . . . . . . . 1091
A Diagnostic and Control Expert System Based on a Plant Model
Junzo Suzuki, Chiho Konuma, Mikito Iwamasa, Naomichi Sueda, Shigeru Mochiji and
Akimoto Kamiya . . . . . . . . . . . . . . . . . . . . 1099
Legal Reasoning
A Semiformal Metatheory for Fragmentary and Multilayered Knowledge as an Interactive
Metalogic Program
Andreas Hamfelt and Ake Hansson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 1107
HELIC-II: A Legal Reasoning System on the Parallel Inference Machine
Katsumi Nitta, Yoshihisa Ohtake, Shigeru Maeda, Masayuki Ono, Hiroshi Ohsaki and
Kiyokazu Sakane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1115
Natural Language Processing
Chart Parsers as Proof Procedures for Fixed-Mode Logic Programs
David A. Rosenblueth . . . . . . . . . . . . . . . . . . . . 1125
A Discourse Structure Analyzer for Japanese Text
K. Sumita, K. Ono, T. Chino, T. Ukita and S. Amano . . . . . . . . . . . . . . . . . . . . 1133
Dynamics of Symbol Systems: An Integrated Architecture of Cognition
Kôiti Hasida . . . . . . . . . . . . . . . . . . . . 1141
Knowledge Support Systems
Mental Ergonomics as Basis for New-Generation Computer Systems
M. H. van Emden . . . . . . . . . . . . . . . . . . . . 1149
An Integrated Knowledge Support System
B. R. Gaines, M. Linster and M. L. G. Shaw . . . . . . . . . . . . . . . . . . . . 1157
Modeling the Generational Infrastructure of Information Technology
B. R. Gaines . . . . . . . . . . . . . . . . . . . . 1165
Parallel Applications
Co-HLEX: Co-operative Recursive LSI Layout Problem Solver on Japan's Fifth Generation
Parallel Inference Machine
Toshinori Watanabe and Keiko Komatsu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1173
A Cooperative Logic Design Expert System on a Multiprocessor
Yoriko Minoda, Shuho Sawada, Yuka Takizawa, Fumihiro Maruyama and
Nobuaki Kawato . . . . . . . . . . . . . . . . . . . . 1181
A Parallel Inductive Learning Algorithm for Adaptive Diagnosis
Yoichiro Nakakuki, Yoshiyuki Koseki and Midori Tanaka . . . . . . . . . . . . . . . . . . . . 1190
Parallel Logic Simulator based on Time Warp and its Evaluation
Yukinori Matsumoto and Kazuo Taki . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1198
Invited Paper
Applications of Machine Learning: Towards Knowledge Synthesis
Ivan Bratko . . . . . . . . . . . . . . . . . . . . 1207
Author Index . . . . . . . . . . . . . . . . . . . . i
FOUNDATIONS
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Logic Program Synthesis from First Order Logic
Specifications
Tadashi KAWAMURA
Institute for New Generation Computer Technology
1-4-28 Mita, Minato-ku, Tokyo 108, Japan
tkawamur@icot.or.jp
Abstract
In this paper, a logic program synthesis method from first order logic specifications is described. The specifications are described by Horn clauses extended by universally quantified implicational formulae. Those formulae are transformed into definite clause programs by meaning-preserving unfold/fold transformation. We show some classes of first order formulae which can be successfully transformed into definite clauses automatically by unfold/fold transformation.
1 Introduction
Logic program synthesis based on unfold/fold transformation [1] is a standard method and has been investigated by many researchers [2, 3, 5, 6, 11, 12, 19]. As
for the correctness of unfold/fold rules in logic programming, Tamaki and Sato proposed meaning-preserving
unfold/fold rules for definite clause programs [20]. Then,
Kanamori and Horiuchi proposed unfold/fold rules for a
class of first order formulae [7]. Recently, Sato proposed
unfold/fold rules for full first order formulae [18].
In the studies of program synthesis, unfold/fold rules
are used to eliminate quantifiers by folding to obtain definite clause programs from first order formulae. However, in most of those studies, unfold/fold rules were applied nondeterministically and general methods to derive
definite clauses were not known. Recently, Dayantis [3]
showed a deterministic method to derive logic programs
from a class of first order formulae. Sato and Tamaki [19]
also showed a deterministic method by incorporating the
concept of continuation.
This paper shows another characterization of classes of
first order formulae from which definite clause programs
can be derived automatically. Those formulae are described by Horn clauses extended by universally quantified implicational formulae. As for transformation rules,
Kanamori and Horiuchi's unfold/fold rules are adopted.
A synthesis procedure based on unfold/fold rules is given,
and with some syntactic restrictions, those formulae are
successfully transformed into equivalent definite clause
programs. This study is also an extension of those by
Pettorossi and Proietti [14, 15, 16] on logic program
transformations.
The rest of this paper is organized as follows. Section
2 describes unfold/fold rules and formalizes the synthesis
process. Section 3 describes a program synthesis procedure and proves that definite clause programs can be successfully derived from some classes of first order formulae
using this procedure. Section 4 discusses the relations to
other works and Section 5 gives a conclusion.
In the following, familiarity with the basic terminologies of logic programming is assumed [13]. As syntactical variables, X, Y, Z, U, V are used for variables, A, B, H for atoms and F, G for formulae, possibly with primes and subscripts. In addition, θ is used for a substitution, Fθ for the formula obtained from formula F by applying substitution θ, X for a vector of variables and F_G[G'] for the replacement of an occurrence of subformula G of formula F with formula G'.
2 Unfold/Fold Transformation for Logic Program Synthesis
In this section, preliminary notions of our logic program
synthesis are shown.
2.1 Preliminaries
Preliminary notions are described first.
A formula is called an implicational goal when it is of the form F1 → F2, where F1 and F2 are conjunctions of atoms.
Definition 2.1 Definite Formula
Formula C is called a definite formula when C is of the form
A ← G1 ∧ G2 ∧ ... ∧ Gn (n ≥ 0),
where Gi is a (possibly universally quantified) conjunction of implicational goals for i = 1, 2, ..., n. A is called the head of C, G1 ∧ G2 ∧ ... ∧ Gn is called the body of C and each Gi is called a goal in the body of C.
Note that the notion of a definite formula is a restricted
form of that in [7].
A set of definite formulae is called a definite formula
program, while a set of definite clauses is called a definite
clause program. We may simply say programs instead of
definite formula (or clause) programs when it is obvious
to which we are referring.
Definition 2.2 Definition Formula
Let P be a definite formula program. A definite formula D is called a definition formula for P when all the predicates appearing in D's body are defined by definite clauses in P and the predicate of D's head does not appear in P. The predicate of D's head is called a new
predicate, while those defined by definite clauses in P
are old predicates. A set of formulae D is called a definition formula set for P when every element D of D is
a definition formula for P and the predicate of D's head
appears only once in D.
Atoms with new predicates are called new atoms, while
those with old predicates are called old atoms.
2.2 Unfold/Fold Transformation
In this subsection, unfold/fold transformation rules are shown following [7]. Below, we assume that the logical constant true implicitly appears in the body of every unit clause. Further, we assume that a goal is always deleted from the body of a definite formula when it is the logical constant true, and a definite formula is always deleted when some goal in its body is the logical constant false.
Further, we introduce the reduction of implicational goals with the logical constants true and false, such as ¬true ⇒ false, true ∧ F ⇒ F, and so on. (See [7] for details.) Let G be an implicational goal. The reduced form of G, denoted by G↓, is the normal form of G in the above reduction system.
Variables not quantified in formula F are called global
variables of F. Atoms appearing positively (negatively)
in formula F are called positive (negative) atoms of F.
Definition 2.3 Positive Unfolding
Let Pi be a program, C be a definite formula in Pi, G be a goal in the body of C and A be a positive old atom of G containing no universally quantified variable. Then, let G0 be G_A[false]↓ and C0 be the definite formula obtained from C by replacing G with G0. Further, let C1, C2, ..., Ck be all the definite clauses in Pi whose heads are unifiable with A, say by mgu's θ1, θ2, ..., θk. Let Gj be the reduced form of Gθj after replacing Aθj in Gθj with the body of Cjθj, and C'j be the definite formula obtained from Cθj by replacing Gθj in the body with Gj. (New variables introduced from Cj are global variables of Gj.) Then, Pi+1 = (Pi − {C}) ∪ {C0, C'1, C'2, ..., C'k}.
C0, C'1, C'2, ..., C'k are called the results of positive unfolding C at A (or G).
Example 2.1 Let P be a definite clause program as follows:
C1 : list([]).
C2 : list([X|L]) ← list(L).
C3 : 0 < suc(Y).
C4 : suc(X) < suc(Y) ← X < Y.
C5 : member(U,[U|L]).
C6 : member(U,[V|L]) ← member(U,L).
Let C7 be a definition formula for P as follows:
C7 : less-than-all(X,L) ← list(L) ∧ ∀Y(member(Y,L) → X < Y).
when every descendant formula C' of C in P' satisfies
one of the following:
(a) C' is a definite clause.
(b) There exists a goal G consisting of positive atoms
only in the body of C' such that an old atom in G is
not unifiable with the head of any definite clause in P'.
(c) By successively folding C' by clauses in {C} ∪ D, a definite clause can be obtained.
P ∪ {C} is said to be closed with respect to D when there exists a closed program with respect to <P, C, D> and for every definition formula D in D there exists a closed program with respect to <P, D, D ∪ {C}>.
Example 2.5 Let P and P3 be programs in Example 2.2. Then, P3 is closed w.r.t. <P, C7, ∅>. Further, P ∪ {C7} is closed w.r.t. ∅.
The above framework is an extension of the one shown
in [8], and also a modification of the one Pettorossi and
Proietti proposed [14, 15, 16] in their studies of program
transformation.
Now, our problem can be formalized as follows: for a given definite clause program P and definition formula C for P, find a finite definition formula set D for P such that P ∪ {C} is closed with respect to D.
3 Some Classes of First Order Formulae from Which Logic Programs Can Be Derived
In this section, we specify some classes of first order formulae from which definite clause programs can be derived by unfold/fold transformation.
3.1 A Program Synthesis Procedure
In this subsection, we show a naive program synthesis
procedure. In the following, we borrow some notions
about programs in [15, 16]. We consider definite formula
(clause) programs with predicate =, which have no explicit definition in the programs. Predicate = is called
a base predicate, while other predicates are called defined predicates. Atoms with base predicates are called
base atoms, while those with defined predicates are called
defined atoms. Transformation rules can be applied to
defined atoms only.
A formula containing base atoms can be reduced by
unifying arguments of =. When a universally quantified variable and a global variable are unified, the global
variable is substituted for the universal one. The above
reduction is called the reduction with respect to =. We
assume that no formulae are reduced w.r.t. = unless this
is explicitly mentioned.
Further, we assume that the following operations are always applied implicitly to the results of positive or negative unfolding. A goal G is said to be connected when at most one universally quantified implicational goal G' appears in G and each atom in G' has common universally quantified variables with at least one other atom in G'. Let C be a definite formula such that all the goals in its body are connected. Let C' be one of the results of positive or negative unfolding C at some goal. By logical deduction, definite formulae C'1, C'2, ..., C'm (m ≥ 1) are obtained from C' such that all the goals in the body of C'i are connected. (Note that when some goal G in the body of C' is of the form F1 → F2 or F1 ∨ F2 and no universally quantified variables appear in both F1 and F2, C' can be split into two formulae by replacing G in C' with ¬F1 (or F1) and F2.)
Before showing our program synthesis procedure, a notion is defined.
Definition 3.1 Sound Unfolding
Suppose that positive or negative unfolding is applied
to a definite formula at atom A. Then, the application
of unfolding is said to be sound when no two distinct
universally quantified variables in A are unified when
reducing the result of unfolding with respect to =.
Some syntactic restrictions on programs ensure the
soundness of all possible applications of unfolding. In
fact, the restriction shown in [3] ensures the soundness.
However, in the following, we assume that every application of unfolding is sound, without giving any syntactic
restriction, for simplicity.
Now, we show our program synthesis procedure, which is similar to partial evaluation procedures (cf. [9, 10]). First, a procedure to synthesize new predicates is shown.
Procedure 3.1 Synthesis of New Predicates
Suppose that definite formula program P and definite formula C in P of the form A ← G1 ∧ G2 ∧ ... ∧ Gn are given. Let G'i be the reduced formula obtained from Gi by removing all base atoms and by replacing all universally quantified variables appearing in every base atom with distinct fresh global variables if global variables are substituted for them when reducing Gi w.r.t. =. Let Di be of the form Hi ← G'i for i = 1, 2, ..., n, where Hi is an atom whose predicate does not appear in P or Hj for i ≠ j and whose arguments are all global variables of C appearing in Gi. Then, D1, D2, ..., Dn are returned.
Note that in Procedure 3.1, C can be folded by D1, D2, ..., Dn after reducing it w.r.t. = when C is the result of sound unfolding, and the result of the folding is a definite clause.
Example 3.1 Let P be a program as follows.
C1 : all-less-than(L,M) ← list(L) ∧ list(M) ∧ ∀U,V (member(U,L) ∧ member(V,M) → U < V).
C2 : member(U,[V|X]) ← U = V.
C3 : member(U,[V|X]) ← member(U,X).
The definition of '<' is given in Example 2.1. Suppose that C1's body consists of only one goal. By applying positive unfolding and negative unfolding to C1 successively, the following formulae are obtained. (The reduction w.r.t. = is done when no universally quantified variable appears as an argument of =.)
C4 : all-less-than([],M) ← list(M).
C5 : all-less-than([X|L],M) ← (list(L) ∧ list(M)) ∧
(list(L) ∧ list(M) ∧ ∀U,V (U = X ∧ member(V,M) → U < V)) ∧
(list(L) ∧ list(M) ∧ ∀U,V (member(U,L) ∧ member(V,M) → U < V)).
Then, by Procedure 3.1, the following new predicates are defined from C5.
D1 : new1(X,L,M) ← list(L) ∧ list(M) ∧ ∀V (member(V,M) → X < V).
D2 : new2(L,M) ← list(L) ∧ list(M) ∧ ∀U,V (member(U,L) ∧ member(V,M) → U < V).
Next, the whole procedure for program synthesis is
shown.
Procedure 3.2 A Program Synthesis Procedure
Suppose that definite clause program P and definition formula C for P are given. Let D be the set {C}.
(a) If there exist no unmarked formulae in D, then return P and stop.
(b) Select an unmarked definition formula D from D. Mark D 'selected.' Let P' be the set {D}.
(c) If there exist no formulae in P' which do not satisfy conditions (a) and (b) in Definition 2.8, then P := P ∪ P' and go to (a).
(d) Select a definite formula C' from P'. Apply positive or negative unfolding to C'. Let C1, ..., Cn be the results. Remove C' from P'.
(e) Apply Procedure 3.1 to C1, ..., Cn. Let D1, ..., Dm be the outputs. Add Di to D if it is not a definite clause and there exists no formula in D which is identical to Di except for the predicate of the head. Fold C1, ..., Cn by the formulae in D and add the results to P'.
(f) Go to (c).
Example 3.2 Consider the program in Example 3.1 again. We see that D2 is identical to C1 except for the predicate of the head. C5 can be folded by D1 and C1 after reduction w.r.t. =. The result is as follows.
C6 : all-less-than([X|L],M) ← list(L) ∧ list(M) ∧ new1(X,L,M) ∧ all-less-than(L,M).
Similar operations are applied to D1, and finally, the following clauses are obtained.
D3 : new1(X,L,[]) ← list(L).
D4 : new1(X,L,[Y|M]) ← X < Y ∧ new1(X,L,M).
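Assembled as ordinary Prolog, the synthesized program of this example runs directly. The sketch below is ours, not part of the paper: hyphens in predicate names become underscores, '←' and '∧' become ':-' and ',', and the suc/1-based '<' of Example 2.1 is replaced by Prolog's built-in arithmetic comparison.

```prolog
% Runnable rendering of C4, C6, D3 and D4 above (our transcription).
list([]).
list([_|L]) :- list(L).

all_less_than([], M) :- list(M).                      % C4
all_less_than([X|L], M) :-                            % C6
    list(L), list(M),
    new1(X, L, M),
    all_less_than(L, M).

new1(_X, L, []) :- list(L).                            % D3
new1(X, L, [Y|M]) :- X < Y, new1(X, L, M).             % D4

% ?- all_less_than([1,2], [3,4]).   % succeeds: 1 and 2 are below 3 and 4
% ?- all_less_than([1,5], [3,4]).   % fails, since 5 < 3 does not hold
```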
Note that Procedure 3.2 does not necessarily derive
a definite clause program from a definite formula program. For example, when the following program is given
as input, Procedure 3.2 does not halt.
C1 : p(X,Y) ← p(X,Z) ∧ p(Z,Y)
C2 : h(X,Y) ← ∀Z (p(X,Z) → p(Y,Z))
3.2 Classes of First Order Formulae
In this section, we show some classes of definite formula programs which can be transformed into equivalent definite clause programs by Procedure 3.2.
Throughout this subsection, we assume that unfolding
is always applicable to every definite formula at an atom
when there exist definite clauses whose heads are unifiable with the atom. Note that the above assumption
does not always hold. This problem will be discussed
in 3.3.
After giving a notion, we show a theorem which is an
extension of the results shown in [15]. A simple expression is either a term or an atom.
Definition 3.2 Depth of Symbol in Simple Expression
Let X be a variable or a constant and E be a simple expression in which X appears. The depth of X in E, denoted by depth(X,E), is defined as follows.
(a) depth(X,X) = 1.
(b) depth(X,E) = max{depth(X,ti) | X appears in ti, for i = 1, ..., n} + 1, if E is either f(t1, ..., tn) or p(t1, ..., tn), for any function symbol f or any predicate symbol p.
The depth of the deepest variable or constant in E is denoted by maxdepth(E).
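A small executable reading of Definition 3.2 may help. The following sketch is ours, not the paper's; it assumes ground simple expressions, with a constant standing in for the symbol X whose depth is measured, and it relies on member/2 and max_list/2 from SWI-Prolog's library(lists).

```prolog
% depth(X, E, D): D is the depth of constant X in simple expression E,
% following clauses (a) and (b) of Definition 3.2.
depth(X, X, 1) :- atomic(X).
depth(X, E, D) :-
    compound(E),
    E =.. [_FunctorOrPred|Args],
    findall(Di, (member(T, Args), depth(X, T, Di)), Ds),
    Ds \= [],                 % X must occur in at least one argument
    max_list(Ds, Max),
    D is Max + 1.

% ?- depth(l, f(g(l), l), D).   % D = 3: the occurrence inside g(l) is deepest
```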
Theorem 3.1 Let P be a definite clause program. Suppose that for any definition formula C for P, there exists a U-selection rule R for P ∪ {C} rooted on C such that R is defined for all descendant clauses of C in which at least one defined atom appears. Suppose also that there exist two positive integers H and W such that every descendant clause C' of C in every program P' obtained from P ∪ {C} via R satisfies the following two conditions.
(a) The depth of every term appearing in every goal in the body of C' is less than H.
(b) Let G1, G2, ..., Gn be the connected goals in the body of C'. Then, the number of atoms appearing in Gi is less than W, for i = 1, 2, ..., n.
Then, there exists a finite definition formula set D for P such that P ∪ {C} is closed with respect to D.
Proof. From hypothesis (a), only a finite number of distinct atoms (modulo renaming of variables) can appear in the goals of all the descendant formulae of C. Then, apply Procedure 3.2 to P and C. Note that every goal in the body of every descendant formula of C is connected. Then, for every goal of every descendant formula of C, the number of atoms appearing in the goal is less than W, from hypothesis (b). Hence, only a finite number of distinct goals can appear in all the descendant formulae of C. Thus, we can obtain a finite definition formula set D0 for P such that there exists a closed program P' w.r.t. <P, C, D0>.
The above discussion holds for all the definition formulae in D0, since those formulae are constructed from bodies of the descendant formulae of C. Evidently, only a finite number of distinct definition formulae can be defined. Thus, there exists a finite definition formula set D for P such that P ∪ {C} is closed w.r.t. D. □
Theorem 3.1 shows that Procedure 3.2 can derive a
definite clause program when (a) a term of infinite depth
can not appear, or (b) an infinite number of atoms can
not appear in a connected goal during a transformation
process. In the following, we show some syntactic restrictions on programs which satisfy the above conditions.
Proietti and Pettorossi showed some classes of definite
clause programs which satisfy the conditions in Theorem 3.1 in their studies of program transformation [15].
We show that some extensions of their results are applicable to our problem.
The following definitions are according to [15]. The set
of variables occurring in simple expression E is denoted
by var(E).
Definition 3.3 Linear Term Formula and Program
A simple expression or a formula is said to be linear
when no variable appears in it more than once. A definite
formula (clause) is called a linear term formula (clause)
when every atom appearing in it is linear. A definite
formula (clause) program is called a linear term program
when it consists of linear term formulae (clauses) only.
A linear term formula (clause) is called a strongly linear term formula (clause) when its body is linear. A definite formula (clause) program is called a strongly linear
term program when it consists of strongly linear term
formulae (clauses) only.
Note that the following definite clause is not a linear term clause:
member(X,[X|L]).
However, it is easy to obtain an equivalent linear term clause as follows:
member(X,[Y|L]) ← X = Y.
Definition 3.4 A Relation ⪯ between Linear Simple Expressions
Let E1 and E2 be linear simple expressions. When depth(X,E1) ≤ depth(X,E2) holds for every variable X in var(E1) ∩ var(E2), we write E1 ⪯ E2. (Both E1 ⪯ E2 and E2 ⪯ E1 hold when var(E1) ∩ var(E2) = ∅.)
Definition 3.5 Non-Ascending Formula and Program
Let C be a linear term formula and H be the head of C. C is said to be non-ascending when A ⪯ H holds for every defined atom A appearing in the body of C. A linear term program is said to be non-ascending when it consists of non-ascending formulae only.
A definite formula (clause) is said to be strongly non-ascending when it is a strongly linear term formula (clause) and non-ascending. A definite formula (clause) program is said to be strongly non-ascending when it consists of strongly non-ascending formulae (clauses) only.
Definition 3.6 Synchronized Descent Rule
Let P be a linear term program, R be a U-selection rule for P and C be any descendant formula of the root formula for R. Let A1, A2, ..., An be all the atoms appearing in the body of C. Then, R is called a synchronized descent rule when
(a) R selects the application of positive or negative unfolding to C at Ai if and only if Aj ⪯ Ai holds for j = 1, ..., n, and
(b) R is not defined for C, otherwise.
Note that synchronized descent rules are not necessarily defined uniquely for given programs and definition formulae.
The following theorem is an extension of the one shown
in [15, 16].
Lemma 3.2 Let P be a non-ascending definite clause program, C be a linear term definition formula for P, and R be a synchronized descent rule rooted on C. Let P' be a program obtained from P ∪ {C} via R. For each defined atom A appearing in the body of every descendant clause of C in P', the following holds:
maxdepth(A) ≤ max{maxdepth(B) | B is a defined atom in P ∪ {C}}
Proof. By induction on the number of applications of unfolding. □
Now we show some classes of definite formula programs
which satisfy the hypotheses of Theorem 3.1. In the following, for simplicity, we deal with definition formulae
with only one universally quantified implicational goal
in the body. The results are easily extended to the definite formulae with a conjunction of universally quantified
implicational goals.
The following results are also extensions of those
shown in [15].
Theorem 3.3 Let P be a strongly non-ascending definite clause program and C be a linear term definition formula for P of the form H ← A1 ∧ ∀X(A2 → A3), such that the following hold.
(a) For every clause D in P of the form HD ← B1 ∧ ... ∧ Bn ∧ B'1 ∧ ... ∧ B'm, where B1, ..., Bn are defined atoms and B'1, ..., B'm are base atoms, the following hold.
(a-1) Let tH be any argument of HD. For every argument ti of Bi, if tH contains a common variable with ti, then ti is a subterm of tH.
(a-2) For every argument ti of Bi, if ti is a subterm of an argument tH of HD, then no other argument of Bi is a subterm of tH.
(b) There exist two arguments ti and si of some Ai (ti ≠ si, i = 1, 2 or 3) such that the following hold.
(b-1) There exists an argument tj of Aj (i ≠ j) such that
vars(Ai) ∩ vars(Aj) = vars(ti) ∩ vars(tj), and
either ti is a subterm of tj, tj is a subterm of ti or vars(ti) ∩ vars(tj) = ∅.
(b-2) There exists an argument sk of Ak (k ≠ i, j) such that the same relations as above hold for si and sk.
(b-3) Aj contains no common variable with Ak.
Then, there exists a definition formula set D for P such that P ∪ {C} is closed with respect to D.
Proof. Note that there exists an atom A in the body of C such that an argument of A is a maximal term in the body of C w.r.t. the subterm ordering relation. Let C' be any result of unfolding C at A and G be any connected goal in the body of C' of the form F1 ∧ ∀X(F2 → F3), where Fi is a conjunction of atoms. Then, from the hypothesis, it can be shown that a property similar to hypothesis (b) holds for G. Note that the number of implicational goals does not increase by applying positive unfolding, and no global variables are instantiated by applying negative unfolding. Then, again there exists an atom in the body of C' such that one of its arguments is a maximal term in the body of C' w.r.t. the subterm ordering relation. By induction on the number of applications of unfolding, a synchronized descent rule can be defined for every descendant formula of C. Then, from Lemma 3.2, the depth of every term appearing in every descendant clause of C is bounded. Note that the number of different subterms of a term is bounded. Then, from the hypothesis, the number of atoms appearing in every connected goal in the body of every descendant formula of C is bounded. Thus, P and C satisfy the hypotheses of Theorem 3.1. Hence, there exists a definition formula set D for P such that P ∪ {C} is closed with respect to D. □
Note that Theorem 3.3 holds for any nondeterministic
choice of synchronized descent rules in the above proof.
Note also that any program can be modified to satisfy
hypothesis (a) of Theorem 3.3 by introducing atoms with
= in the body.
Corollary 3.4 Let P be a strongly non-ascending definite clause program and P' be a definite clause program such that no predicate appears in both P and P'. Let C be a linear term definition formula for P ∪ P' of the form H ← A1 ∧ ∀X(A2 → A3), where the predicates of A1 and A2 are defined in P and that of A3 is defined in P'. Suppose that the following hold.
(a) Hypothesis (a) of Theorem 3.3 holds for every clause D in P.
(b) There exist arguments t1 of A1 and t2 of A2 such that the following hold.
(b-1) vars(A1) ∩ vars(A2) = vars(t1) ∩ vars(t2).
(b-2) Either t1 is a subterm of t2, t2 is a subterm of t1 or vars(t1) ∩ vars(t2) = ∅.
(c) No variable in A3 is instantiated by applying positive or negative unfolding to C successively.
Then, there exists a definition formula set D for P ∪ P' such that P ∪ P' ∪ {C} is closed with respect to D.
Proof. Suppose that unfolding is never applied at A3. A synchronized descent rule can be defined by neglecting A3. Since variables in A3 are never instantiated, no other atoms are derived from A3. Thus, the corollary holds. □
In Corollary 3.4, no restrictions are required on the definition of A3. This result corresponds to that in [3]. Note that any program can be modified to satisfy hypothesis (c) of Corollary 3.4 by introducing atoms with = in the body.
Example 3.3 The program and the definition formula in Example 2.1 satisfy the hypotheses of Theorem 3.3 and Corollary 3.4, if clause C5 is replaced with the equivalent clause:
C'5 : member(U,[V|L]) ← U = V.
In fact, a definite clause program can be obtained, as shown in subsection 2.2.
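The derivation referred to lies in the portion of subsection 2.2 that is not reproduced in this excerpt. As a sketch of the kind of result one obtains (ours, analogous to Example 3.2, with the paper's suc/1-based '<' replaced by Prolog's built-in comparison for executability):

```prolog
% less_than_all(X, L): X is smaller than every member of list L.
less_than_all(_X, []).
less_than_all(X, [Y|L]) :- X < Y, less_than_all(X, L).

% ?- less_than_all(2, [3,5,4]).   % succeeds
% ?- less_than_all(2, [3,1]).     % fails, since 2 < 1 does not hold
```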
Next, we show an extension of the results shown in Theorem 3.3. Let P be a non-ascending definite clause program and C be a definition formula for P of the form H ← A ∧ ∀X(F1 → F2), where A is an atom, and F1 and F2 are conjunctions of atoms. Let Di be the definition clause for P of the form Hi ← Fi for i = 1, 2. If Di can be transformed into a set of definite clauses which satisfies the hypotheses of Theorem 3.3, by replacing Fi with Hi, we can show that P ∪ {C} can be transformed into an equivalent definite clause program.
The above problem is related to the foldability problem in [16]. The foldability problem is described informally as follows. Let P be a definite clause program and C be a definition clause for P. Then, find a program P' obtained from P ∪ {C} which satisfies the following: for every descendant clause C' of C in P', there exists an ancestor clause D of C' such that C''s body is an instance of D's.
Proietti and Pettorossi showed some classes of definite clause programs such that the foldability problem can be solved [16]. We show that their results are also available to our problem.
A definite clause program P is said to be linear recursive when at most one defined atom appears in the body
of each clause in P. Note that a linear recursive and
linear term program (clause) is a strongly linear term
program (clause).
Lemma 3.5 Let P be a linear recursive non-ascending program and C be a non-ascending definition clause for P of the form H ← A1 ∧ A2 ∧ B1 ∧ ... ∧ Bn, where A1 and A2 are defined atoms and B1, ..., Bn are base atoms. Suppose that the following hold.
(a) For every clause D in P of the form HD ← AD ∧ B'1 ∧ ... ∧ B'm, where AD is the only defined atom in the body of D, the following hold.
(a-1) Let tH be any argument of HD. For every argument tA of AD, if tH contains a common variable with tA, then tA is a subterm of tH.
(a-2) For every argument tA of AD, if tA is a subterm of an argument tH of HD, then no other argument of AD is a subterm of tH.
(b) There exist arguments t1 of A1 and t2 of A2 such that the following hold.
(b-1) vars(A1) ∩ vars(A2) = vars(t1) ∩ vars(t2).
(b-2) Either t1 is a subterm of t2, t2 is a subterm of t1 or vars(t1) ∩ vars(t2) = ∅.
Then, from P ∪ {C}, we can obtain a linear recursive non-ascending program which defines the predicate of H by unfold/fold transformation.
Proof. As shown in [16], we can get a solution of the foldability problem for P and C. Then, obviously, a linear recursive program is obtained. □
Example 3.4 Let P be a linear recursive non-ascending program as follows.
C1 : subseq([],L).
C2 : subseq([X|L],[Y|M]) ← X = Y ∧ subseq(L,M).
C3 : subseq([X|L],[Y|M]) ← subseq([X|L],M).
Let C be a non-ascending definition clause for P as follows.
C : csub(X,Y,Z) ← subseq(X,Y) ∧ subseq(X,Z).
Then, P ∪ {C} can be transformed into a linear recursive non-ascending program as follows.
csub([],Y,Z).
csub([A|X],[B|Y],Z) ← A = B ∧ cs(A,X,Y,Z).
csub([A|X],[B|Y],Z) ← csub([A|X],Y,Z).
cs(A,X,Y,[B|Z]) ← A = B ∧ csub(X,Y,Z).
cs(A,X,Y,[B|Z]) ← cs(A,X,Y,Z).
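The derived program runs directly as Prolog. The transcription below is ours ('←' and '∧' rendered as ':-' and ','); csub(S,Y,Z) holds when S is a subsequence common to Y and Z.

```prolog
subseq([], _L).                                   % C1
subseq([X|L], [Y|M]) :- X = Y, subseq(L, M).      % C2
subseq([X|L], [_Y|M]) :- subseq([X|L], M).        % C3

% Clauses obtained from C by unfold/fold transformation:
csub([], _Y, _Z).
csub([A|X], [B|Y], Z) :- A = B, cs(A, X, Y, Z).
csub([A|X], [_B|Y], Z) :- csub([A|X], Y, Z).
cs(A, X, Y, [B|Z]) :- A = B, csub(X, Y, Z).
cs(A, X, Y, [_B|Z]) :- cs(A, X, Y, Z).

% ?- csub([a,c], [a,b,c], [x,a,c,y]).   % succeeds: [a,c] is a common subsequence
```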
Though Proietti and Pettorossi showed one more class [16], we will not discuss this here.
Now, we get the following theorem.
Theorem 3.6 Let P be a linear recursive non-ascending program and C be a linear term definition formula for P of the form H ← A1 ∧ ∀X(A2 ∧ B2 → A3 ∧ B3), such that the following hold.
(a) Hypothesis (a) of Lemma 3.5 holds for P.
(b) Let S1 be the set of all the arguments of A1, and Si be the set of all the arguments of Ai and Bi for i = 2, 3. Then, there exist two terms tj and sj in some Sj (tj ≠ sj, j = 1, 2 or 3) such that the following hold.
(b-1) There exists a term tk in Sk (j ≠ k) such that
vars(Sj) ∩ vars(Sk) = vars(tj) ∩ vars(tk), and
either tj is a subterm of tk, tk is a subterm of tj or vars(tj) ∩ vars(tk) = ∅.
(b-2) There exists a term sl of Sl (l ≠ j, k) such that the same relations as above hold for sj and sl.
(b-3) Sk contains no common variable with Sl.
Then, there exists a definition formula set D for P such that P ∪ {C} is closed with respect to D.
Proof. Obvious from Theorem 3.3 and Lemma 3.5. □
unfolding G, or they are unified with terms consisting of constants and global variables by reduction w.r.t. =. We believe that techniques such as mode analysis are available to guarantee that every applicable negative unfolding satisfies the above conditions.
Negative unfolding should be applied without instantiating global variables. In some cases, this restriction may be critical. However, we can deal with most of those cases by adding positive atoms to the formula such that the global variables can be instantiated by applying positive unfolding at those atoms. Atoms with predicates which specify data types (cf. list) are available. For example, with the definitions of 'member' and '<' in Example 2.1, negative unfolding cannot be applied to the definite formula below.
less-than-all(X,L) ← ∀Y(member(Y,L) → X < Y)
if t = f(t1, ..., tn), n > 0, then |t| = 1 + |t1| + ... + |tn|, else |t| = 0.
It is then possible to introduce weight-functions on atoms.
Definition 3.2 Let p be a predicate of arity n and S = {a1, ..., am}, 1 ≤ ak ≤ n, 1 ≤ k ≤ m, a set of argument positions for p. We define |.|_{p,S} : {A | A is an atom with predicate symbol p} → ℕ as follows:
|p(t1, ..., tn)|_{p,S} = |t_{a1}| + ... + |t_{am}|
The next two definitions introduce useful relations on literals and goals in an SLD-tree.
Definition 3.3 Let (G, i) = ((←A1, ..., Aj, ..., An), i) be a node in an SLD-tree T, let R(G) = Aj be the call selected by the computation rule R, let H ← B1, ..., Bm be a clause whose head unifies with Aj and let θ = mgu(Aj, H) be the most general unifier. Then (G, i) has a son (G', k) in T, (G', k) = ((←A1, ..., Aj−1, B1, ..., Bm, Aj+1, ..., An)θ, k). We say that B1θ, ..., Bmθ in G' are direct descendents of Aj in G and that Aj in G is a direct ancestor of B1θ, ..., Bmθ in G'.
The binary relations descendent and ancestor, defined on atoms in goals, are the transitive closures of the direct descendent and direct ancestor relations respectively. For A an atom in G and B an atom in G', A is an ancestor of B is denoted as A >pr B ("pr" stands for proof tree). Notice that we also speak about one goal G' being an ancestor (or descendent) of another goal G. This terminology refers to the obvious relationships between goals in an SLD-tree and should not be confused with the proof-tree based relationships between literals, introduced in the previous definition. The following definition does introduce a relationship between goals, based on definition 3.3.
Definition 3.4 Let G and G' denote two different nodes in an SLD-tree T. Let R be the computation rule used in T. Then G' covers G iff
1. R(G') and R(G) are atoms with the same predicate
2. R(G') >pr R(G)
Notice that G' covers G implies that G' is an ancestor of G.
We need one more piece of terminology.
Definition 3.5 Let G and G' denote two different nodes in an SLD-tree T. We call G' the youngest covering ancestor of G iff
1. G' covers G
2. For any other node G'' such that G'' covers G, we have that G'' covers G'
We are now finally able to formulate the following algorithm:
Algorithm 3.6
Input
a definite program P
a definite goal ←A
Output
a finite SLD-tree T for P ∪ {←A}
Initialisation
T := {(←A, 1)}
Pr := ∅
Terminated := ∅
Failed := ∅
For each recursive predicate p/n in P and for the derivation D in T:
S_{p,D} := {1, ..., n}
While there exists a derivation D in T such that D ∉ Terminated do
Let (G, i) name the leaf of D
Select the leftmost atom p( t 1 , ..• ,t n ) in G
satisfying the following condition:
If p is recursive and there is
a youngest covering ancestor (G', j) of (G, i) in D
then IR(G')lp,Sp,D new > Ip(t 1 , ..• , tn)lp,Sp,D new where
Sp,D new = Sp,D \ Sp,Dremove and
Sp,Dremove
... rev([l,2IXsJ,[],Zs)
... rev([2IXs],[1],a)
=
{ak E Sp,D IIp(t 1, ... , tn)lp,{ak} > IR(G')lp,{ak}}
If such an atom p( t 1 , ••. ,tn ) can be found
then
.... rev(Xs,[2,1],Zs)
R(G) :=P(tl, ... ,tn)
Let Derive( G, i) name the set of all derivation steps
that can be performed
If Derive( G, i) = 0
then
Add D to Terminated and Failed
else
Let Descend(R(G), i) name the set of
all pairs ((R(G), i), (BO,j)), where
- B is an atom in the body of a clause
applied in an element of Derive( G, i)
- 0 is the corresponding m.g. u.
- j is the number of the corresponding
descendent of (G, i)
Expand D in T with the elements of Derive( G, i)
Add the elements of Descend( R( G), i) to Pr
For every newly created extension D' of D and
for every recursive predicate q in P:
if q = p and (G, i) has a covering ancestor in D
new
then Sq,D' := Sq,D
else Sq,D' := Sq,D
else
Add D to Terminated
Endwhile
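To make the weight functions that drive the selection condition in algorithm 3.6 concrete, the following is a minimal Prolog sketch (written for a standard system such as SWI-Prolog; the predicate names term_weight/2 and atom_weight/3 are ours, not the paper's, and variables are simply given weight 0):

% term_weight(+T, -W): the weight |t| of a term: 1 plus the weights of the
% arguments for a compound term, 0 for variables and constants.
term_weight(T, 0) :- var(T), !.
term_weight(T, 0) :- atomic(T), !.
term_weight(T, W) :-
    T =.. [_|Args],
    sum_weights(Args, 0, W0),
    W is W0 + 1.

sum_weights([], Acc, Acc).
sum_weights([T|Ts], Acc, W) :-
    term_weight(T, WT),
    Acc1 is Acc + WT,
    sum_weights(Ts, Acc1, W).

% atom_weight(+Atom, +Positions, -W): the weight |A|p,S of definition 3.2,
% i.e. the sum of the weights of the argument terms at the positions in S.
atom_weight(Atom, Positions, W) :-
    Atom =.. [_|Args],
    findall(WT,
            ( member(I, Positions), nth1(I, Args, T), term_weight(T, WT) ),
            Ws),
    sum_list(Ws, W).

For instance, atom_weight(rev([1,2|Xs],[],Zs), [1], W) measures only the first argument and yields W = 2; restricting the measured positions in this way is exactly what the sets Sp,D in the algorithm manipulate.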
We have the following theorem.
Theorem 3.7 Algorithm 3.6 terminates. If a definite program P and a definite goal ←A are given as inputs, its output T is a finite (possibly incomplete) SLD-tree for P ∪ {←A}.
Proof The theorem is an immediate consequence of proposition 3.1 in [Bruynooghe et al., 1991a]. □
Example 3.8 The SLD-tree generated by algorithm 3.6 for the program and the query from example 2.2 is depicted in figure 1. ("reverse" has been abbreviated to "rev".)
Figure 1: The SLD-tree for example 3.8.

4 Combining These Techniques

4.1 Introduction

In the previous section, we introduced an algorithm for the automatic construction of (incomplete) finite SLD-trees. In this section, we present sound and complete
partial deduction methods, based on it. Moreover, these
methods are guaranteed to terminate. The following example shows that this latter property is not obvious, even
when termination of the basic unfolding procedure is ensured. We use the basic partial deduction algorithm from
[Benkerimi and Lloyd, 1990], together with our unfolding algorithm.
Example 4.1 For the reverse program with accumulating parameter (see example 2.2 for the program and the
starting query), an infinite number of (finite) SLD-trees
is produced (see figure 2). This behaviour is caused by
the constant generation of "fresh" body-literals which,
because of the growing accumulating parameter, are not
an instance of any atom that was obtained before.
In [Benkerimi and Lloyd, 1989], it is remarked that a solution to this kind of problem can be to truncate atoms put into A at some fixed depth bound. However, this again seems to have an ad-hoc flavour to it, and we therefore devised an alternative method, described in the next
section.
4.2 An algorithm for partial deduction
We first introduce some useful definitions and prove a
lemma.
Definition 4.2 Let P be a definite program and p a predicate symbol of the language underlying P. Then a pp'-renaming of P is any program obtained in the following way:
• Take P together with a fresh duplicate copy of the clauses defining p.
• Replace p in the heads of these new clauses by some new (predicate) symbol p' (of the same arity as p).
• Replace p by p' in any number of goals in the bodies of (old and new) clauses.
Figure 2: An infinite number of (finite) SLD-trees.
Lemma 4.3 Let P be a definite program and Pr a pp'-renaming of P. Let G be a definite goal in the language underlying P. Then the following hold:
• Pr ∪ {G} has an SLD-refutation with computed answer θ iff P ∪ {G} does.
• Pr ∪ {G} has a finitely failed SLD-tree iff P ∪ {G} does.
Proof There is an obvious equivalence between SLD-derivations and -trees for P and Pr. □
Definition 4.4 Let P be a definite program and p a
predicate symbol of the language underlying P. Then
the complete pp' -renaming of P is the pp'-renaming of P
where p has been replaced by p' in all goals in the bodies
of clauses.
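Before stating the algorithm, it may help to see how a complete pp'-renaming can be produced mechanically. The following is a hedged Prolog sketch under the assumption that clauses are represented as cl(Head, BodyAtoms) terms; the predicate names are ours, not the paper's:

% complete_renaming(+P, +PPrime, +Clauses, -Renamed): replace P by PPrime in
% every clause body, and add a fresh copy of each clause defining P with its
% head predicate replaced by PPrime (findall/3 supplies the fresh variables).
complete_renaming(P, PPrime, Clauses, Renamed) :-
    findall(cl(H, B1),
            ( member(cl(H, B), Clauses),
              rename_atoms(P, PPrime, B, B1) ),
            Old),
    findall(cl(H1, B1),
            ( member(cl(H, B), Clauses),
              H =.. [P|Args],                % only the clauses defining P
              H1 =.. [PPrime|Args],
              rename_atoms(P, PPrime, B, B1) ),
            New),
    append(Old, New, Renamed).

rename_atoms(_, _, [], []).
rename_atoms(P, PPrime, [A|As], [A1|As1]) :-
    A =.. [F|Args],
    ( F == P -> A1 =.. [PPrime|Args] ; A1 = A ),
    rename_atoms(P, PPrime, As, As1).

Applied to the two reverse clauses (with, say, reverse1 standing in for reverse'), this reproduces the complete renaming shown in the running example below.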
Our method for partial deduction can then be formulated as the following algorithm.
Algorithm 4.5
Input
  a definite program P
  a definite goal ←A = ←p(t1,...,tn) in the language underlying P
  a predicate symbol p', of the same arity as p, not in the language underlying P
Output
  a set of atoms A
  a partial deduction Pr' of Pr, the complete pp'-renaming of P, wrt A
Initialisation
  Pr := the complete pp'-renaming of P
  A := {A} and label A unmarked
While there is an unmarked atom B in A do
  Apply algorithm 3.6 with Pr and ←B as inputs
  Let TB name the resulting SLD-tree
  Form PrB, a partial deduction for B in Pr, from TB
  Label B marked
  Let AB name the set of body literals in PrB
  For each predicate q appearing in an atom in AB
    Let msgq name an msg of all atoms having q as predicate symbol in A and AB
    If there is an atom in A having q as predicate symbol and it is less general than msgq
    then remove this atom from A
    If now there is no atom in A having q as predicate symbol
    then add msgq to A and label it unmarked
  Endfor
Endwhile
Finally, construct the partial deduction Pr' of Pr wrt A:
Replace the definitions of the partially deduced predicates by the union of the partial deductions PrB for the elements B of A.
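The one non-standard operation in the For-loop above is taking an msg. A common way to compute an msg is anti-unification; the following sketch is our own illustration (predicate names ours), not the paper's implementation:

% msg(+A, +B, -G): most specific generalisation of two atoms or terms.
% An association list of already generalised pairs is threaded through so
% that identical pairs of subterms are mapped to the same variable.
msg(A, B, G) :-
    anti_unify(A, B, G, [], _).

anti_unify(A, B, G, S0, S) :-
    (   A == B
    ->  G = A, S = S0
    ;   nonvar(A), nonvar(B),
        A =.. [F|As], B =.. [F|Bs], length(As, N), length(Bs, N)
    ->  anti_unify_args(As, Bs, Gs, S0, S),
        G =.. [F|Gs]
    ;   member(pair(A1, B1, V), S0), A1 == A, B1 == B
    ->  G = V, S = S0                  % reuse the variable for this pair
    ;   S = [pair(A, B, G)|S0]         % otherwise introduce a fresh variable
    ).

anti_unify_args([], [], [], S, S).
anti_unify_args([A|As], [B|Bs], [G|Gs], S0, S) :-
    anti_unify(A, B, G, S0, S1),
    anti_unify_args(As, Bs, Gs, S1, S).

For the two atoms generalised in the example below, msg(r(Xs,[X,2,1],Zs), r(Ys,[X1,X2,2,1],Ws), G) binds G to a term of the form r(A,[B,C,D|E],F), which corresponds to the msg reverse'(Xs,[X,Y,Z|Ys],Zs) computed there.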
We illustrate the algorithm on our running example.
Example 4.6
complete renaming of the reverse program:
reverse([],L,L).
reverse([X|Xs],Ys,Zs) ← reverse'(Xs,[X|Ys],Zs).
reverse'([],L,L).
reverse'([X|Xs],Ys,Zs) ← reverse'(Xs,[X|Ys],Zs).
partial deduction for ←reverse([1,2|Xs],[],Zs):
reverse([1,2],[],[2,1]).
reverse([1,2,X|Xs],[],Zs) ← reverse'(Xs,[X,2,1],Zs).
partial deduction for ←reverse'(Xs,[X,2,1],Zs):
reverse'([],[X,2,1],[X,2,1]).
reverse'([X'|Xs],[X,2,1],Zs) ← reverse'(Xs,[X',X,2,1],Zs).
msg of reverse'(Xs,[X,2,1],Zs) and reverse'(Xs,[X',X,2,1],Zs): reverse'(Xs,[X,Y,Z|Ys],Zs)
partial deduction for ←reverse'(Xs,[X,Y,Z|Ys],Zs):
reverse'([],[X,Y,Z|Ys],[X,Y,Z|Ys]).
reverse'([X'|Xs],[X,Y,Z|Ys],Zs) ← reverse'(Xs,[X',X,Y,Z|Ys],Zs).
resulting set A:
{reverse([1,2|Xs],[],Zs), reverse'(Xs,[X,Y,Z|Ys],Zs)}
resulting partial deduction:
reverse([1,2],[],[2,1]).
reverse([1,2,X|Xs],[],Zs) ← reverse'(Xs,[X,2,1],Zs).
reverse'([],[X,Y,Z|Ys],[X,Y,Z|Ys]).
reverse'([X'|Xs],[X,Y,Z|Ys],Zs) ← reverse'(Xs,[X',X,Y,Z|Ys],Zs).

We can prove the following interesting properties of algorithm 4.5.

Theorem 4.7 Algorithm 4.5 terminates.
Proof Due to space restrictions, we refer to [Martens and De Schreye, 1992]. □

Theorem 4.8 Let P be a definite program, A = p(t1,...,tn) be an atom and p' be a predicate symbol used as inputs to algorithm 4.5. Let A be the (finite) set of atoms and Pr' be the program output by algorithm 4.5. Then the following hold:
• A is independent.
• For any goal G = ←A1,...,Am consisting of atoms that are instances of atoms in A, Pr' ∪ {G} is A-covered.
Proof
• We first prove that A is independent. From the way A is constructed in the For-loop, it is obvious that A cannot contain two atoms with the same predicate symbol. Independence of A is an immediate consequence of this.
• To prove the second part of the theorem, let Pr* be the subprogram of Pr' consisting of the definitions of the predicates in Pr' upon which G depends. We show that Pr* ∪ {G} is A-closed. Let A be an atom in A. Then the For-loop in algorithm 4.5 ensures there is in A a generalisation of any body literal in the computed partial deduction for A in Pr'. The A-closedness of Pr' ∪ {G} now follows from the following two facts:
1. Pr' is a partial deduction of a program (Pr) wrt A.
2. All atoms in G are instances of atoms in A. □

Corollary 4.9 Let P be a definite program, A = p(t1,...,tn) be an atom and p' be a predicate symbol used as inputs to algorithm 4.5. Let A be the set of atoms and Pr' be the program output by algorithm 4.5. Let G = ←A1,...,Am be a goal in the language underlying P, consisting of atoms that are instances of atoms in A. Then the following hold:
• Pr' ∪ {G} has an SLD-refutation with computed answer θ iff P ∪ {G} does.
• Pr' ∪ {G} has a finitely failed SLD-tree iff P ∪ {G} does.
Proof The corollary is an immediate consequence of lemma 4.3 and theorems 2.1 and 4.8. □
Proposition 4.10 Let P be a definite program and A be an atom used as inputs to algorithm 4.5. Let A be the set of atoms output by algorithm 4.5. Then A ∈ A.
Proof A is put into A in the initialisation phase. From definition 4.4, it follows that no clause in Pr contains a condition literal with the same predicate symbol as A. Therefore, A will never be removed from A. □
This proposition ensures us that algorithm 4.5 does not suffer from the kind of specialisation loss mentioned in section 2.1: the definition of the predicate which appears in the query ←A, used as starting input for the partial deduction, will indeed be replaced by a partial deduction for A in P in the program output by the algorithm.
Finally, we have:
Corollary 4.11 Let P be a definite program, A = p(t1,...,tn) be an atom and p' be a predicate symbol used as inputs to algorithm 4.5. Let Pr' be the program output by algorithm 4.5. Then the following hold for any instance A' of A:
• Pr' ∪ {←A'} has an SLD-refutation with computed answer θ iff P ∪ {←A'} does.
• Pr' ∪ {←A'} has a finitely failed SLD-tree iff P ∪ {←A'} does.
Proof The corollary immediately follows from corollary 4.9 and proposition 4.10. □
Theorem 4.7 and corollary 4.11 are the most important results of this paper. In words, their contents can
be stated as follows. Given a program and a goal, algorithm 4.5 produces a program which provides the same
answers as the original program to the given query and
any instances of it. Moreover, computing this (hopefully
more efficient) program terminates in all cases.
5 Discussion and Conclusion
In [Lloyd and Shepherdson, 1991], important criteria ensuring soundness and completeness of partial deduction are introduced. In the present paper, we started
from a recently proposed strategy for finite unfolding
([Bruynooghe et al., 1991a]) and developed a procedure
for partial deduction of definite logic programs. We
proved this procedure produces programs satisfying the
mentioned criteria and, in an important sense, showing
the desired specialisation. Moreover, the algorithm terminates on all definite programs and goals.
The unfolding method as it is presented in section 3
was proposed in [Bruynooghe et al., 1991a], but appears
here for the first time in this detailed and automatisable form, specialised for object level programs. It
tries to maximise unfolding while retaining termination.
We know, however, of two classes of programs where
the first goal is not achieved. First, meta programs
require a somewhat more refined control of unfolding.
This issue is addressed in [Bruynooghe et al., 1991a].
We refer the interested reader to that paper (or to
[Bruynooghe et al., 1991b]) for further comments on this
topic. Second, (datalog) programs where the information
contained in constants appearing in the program text
plays an important role, are not treated in a satisfactory
way. Further research is necessary to improve the unfolding in this case. (A combination of our rule with the Rv
computation rule seems promising.) As far as the used
unfolding strategy does maximise unfolding, however, it
probably diminishes or eliminates the need for dynamic
renaming as proposed in [Benkerimi and Hill, 1989].
We now compare briefly algorithm 4.5 with the partial deduction procedure with static renaming presented
in [Benkerimi and Lloyd, 1990]. First, we showed above
that our procedure terminates for all definite programs
and queries while the latter does not. The culprit
of this difference in behaviour is (apart from the unfolding strategy used) the way in which msg's are
taken. We do this predicatewise, while the authors of
[Benkerimi and Lloyd, 1990] only take an msg when this
is necessary to keep A independent. This may keep more
specialisation (though only for predicates different from
the one in the starting goal), but causes non-termination
whenever an infinite, independent set A is generated (as
illustrated in example 4.1). Observe, moreover, that we
have kept a clear separation between the issues of control
of unfolding and of ensuring soundness and completeness. The use of algorithm 3.6 - or further refinements
(see above) - guarantees that all sensible unfolding - and therefore specialisation - is obtained. The way in
which algorithm 4.5, in addition, ensures soundness and
completeness, takes care that none of the obtained specialisation is undone. Therefore, it does not seem worthwhile to consider more than one msg per predicate. Note
that one can even consider restricting the partial deduc-
tion to the predicate in the starting query and simply
retaining the original clauses for all other predicates in
the result program. This can perhaps be formalised as a
partial deduction where only a 1-step trivial unfolding is
performed for these predicates.
Next, the method in [Benkerimi and Lloyd, 1990] is
formulated in a somewhat more general framework than
the one presented here. A reformulation of the latter
incorporating the concept of L-selectability and allowing more than one literal in the starting query seems
straightforward. However, a generalisation to normal
programs and queries and SLDNF-resolution while retaining the termination property, is not immediate. In
e.g. [Benkerimi and Lloyd, 1990], it is proposed that
during unfolding, negated calls can be executed when
ground and remain in the resultant when non-ground.
This of course jeopardises termination, since termination of "ordinary" ground logic program execution is not
guaranteed in general. One solution is restricting attention to specific subclasses of programs (e.g. acyclic
or acceptable programs, see [Apt and Bezem, 1990],
[Apt and Pedreschi, 1990]). Another might be to use an
adapted version of our unfolding criterion in the evaluation of the ground negative call, and to keep the latter one in the resultant whenever the SLD(NF)-tree produced is not a complete one. Yet a third way might be
offered by the use of more powerful techniques related to
constructive negation (see [Chan and Wallace, 1989]).
Finally, [Gallagher and Bruynooghe, 1990] presents
another approach to partial deduction focusing both on
soundness and completeness and on control of unfolding.
The main difference is the control of unfolding by a condition based on maximal deterministic paths, where our
approach is based on maximal data consumption, monitored through well-founded measures.
References
[Apt and Bezem, 1990] K. R. Apt and M. Bezem. Acyclic programs. In D. H. D. Warren and P. Szeredi, editors, Proceedings ICLP'90, pages 617-633, Jerusalem, June 1990. The MIT Press. Revised version in New Generation Computing, 9(3 & 4):335-364.
[Apt and Pedreschi, 1990] K. R. Apt and D. Pedreschi. Studies in pure Prolog: Termination. In J. W. Lloyd, editor, Proceedings of the Esprit Symposium on Computational Logic, pages 150-176. Springer-Verlag, November 1990.
[Benkerimi and Hill, 1989] K. Benkerimi and P. M. Hill. Supporting transformations for the partial evaluation of logic programs. Technical report, Department of Computer Science, University of Bristol, Great Britain, 1989.
[Benkerimi and Lloyd, 1989] K. Benkerimi and J. W. Lloyd. A procedure for the partial evaluation of logic programs. Technical Report TR-89-04, Department of Computer Science, University of Bristol, Great Britain, May 1989.
[Benkerimi and Lloyd, 1990] K. Benkerimi and J. W. Lloyd. A partial evaluation procedure for logic programs. In S. Debray and M. Hermenegildo, editors, Proceedings NACLP'90, pages 343-358. The MIT Press, October 1990.
[Bruynooghe et al., 1991a] M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding infinite unfolding during partial deduction of logic programs. In V. Saraswat and K. Ueda, editors, Proceedings ILPS'91, pages 117-131, October 1991.
[Bruynooghe et al., 1991b] M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding infinite unfolding during partial deduction. Technical Report CW-126, Departement Computerwetenschappen, K.U.Leuven, Belgium, March 1991.
[Chan and Wallace, 1989] D. Chan and M. Wallace. A treatment of negation during partial evaluation. In H. D. Abramson and M. H. Rogers, editors, Proceedings Meta'88, pages 299-318. MIT Press, 1989.
[Gallagher and Bruynooghe, 1990] J. Gallagher and M. Bruynooghe. The derivation of an algorithm for program specialisation. In D. H. D. Warren and P. Szeredi, editors, Proceedings ICLP'90, pages 732-746, Jerusalem, June 1990. Revised version in New Generation Computing, 9(3 & 4):305-334.
[Gallagher, 1986] J. Gallagher. Transforming logic programs by specialising interpreters. In Proceedings ECAI'86, pages 109-122, 1986.
[Komorowski, 1981] H. J. Komorowski. A specification of an abstract Prolog machine and its application to partial evaluation. Technical Report LSST69, Linkoping University, 1981.
[Komorowski, 1989] H. J. Komorowski. Synthesis of programs in the framework of partial deduction. Technical Report Ser.A, No.81, Departments of Computer Science and Mathematics, Abo Akademi, Finland, 1989.
[Levi and Sardu, 1988] G. Levi and G. Sardu. Partial evaluation of metaprograms in a multiple worlds logic language. New Generation Computing, 6(2 & 3), 1988.
[Lloyd and Shepherdson, 1991] J. W. Lloyd and J. C. Shepherdson. Partial evaluation in logic programming. Journal of Logic Programming, 11(3 & 4):217-242, 1991.
[Martens and De Schreye, 1992] B. Martens and D. De Schreye. Sound and complete partial deduction with unfolding based on well-founded measures. Technical Report CW-137, Departement Computerwetenschappen, K.U.Leuven, Belgium, January 1992.
[Safra and Shapiro, 1986] S. Safra and E. Shapiro. Meta interpreters for real. In Information Processing 86, pages 271-278, 1986.
[Sahlin, 1990] D. Sahlin. The Mixtus approach to automatic partial evaluation of full Prolog. In S. Debray and M. Hermenegildo, editors, Proceedings NACLP'90, pages 377-398, 1990.
[Sterling and Beer, 1986] L. Sterling and R. D. Beer. Incremental flavor-mixing of meta-interpreters for expert system construction. In Proceedings ILPS'86, pages 20-27. IEEE Comp. Society Press, 1986.
[Sterling and Beer, 1989] L. Sterling and R. D. Beer. Metainterpreters for expert system construction. Journal of Logic Programming, pages 163-178, 1989.
[Takeuchi and Furukawa, 1986] A. Takeuchi and K. Furukawa. Partial evaluation of Prolog programs and its application to metaprogramming. In H.-J. Kugler, editor, Information Processing 86, pages 415-420, 1986.
[Venken, 1984] R. Venken. A Prolog meta interpreter for partial evaluation and its application to source to source transformation and query optimization. In Proceedings ECAI'84, pages 91-100. North-Holland, 1984.
[Venken and Demoen, 1988] R. Venken and B. Demoen. A partial evaluation system for Prolog: Some practical considerations. New Generation Computing, 6(2 & 3):279-290, 1988.
A Framework for Analysing the Termination of Definite Logic
Programs with respect to Call Patterns
Danny De Schreye*
Kristof Verschaetse†
Maurice Bruynooghe*
Department of Computer Science, K.U.Leuven,
Celestijnenlaan 200A, B-3001 Heverlee, Belgium.
e-mail: {dannyd,kristof,maurice}@cs.kuleuven.ac.be
Abstract
We extend the notions 'recurrency' and 'acceptability'
of a logic program, which were respectively defined in
the work of M. Bezem and the work of K. R. Apt and
D. Pedreschi, and which were shown to be equivalent
to respectively termination under an arbitrary computation rule and termination under the Prolog computation
rule. We show that these equivalences still hold for the
extended definitions. The main idea is that instead of
measuring ground instances of atoms, all possible calls
are measured (which are not necessarily ground). By
doing so, a more practical technique is obtained, in the
sense that "more natural" measures can be used, which
can easily be found automatically.
1 Introduction
In the last few years, a strong research effort in the field
of logic programming has addressed the issue of termination. From the more theoretical point of view, the results
obtained by Vasak and Potter [1986]' Baudinet [1988]'
Bezem [1989], Cavedon [1989], Apt and Pedreschi [1990],
and Bossi et ai. [1991] have provided several frameworks
and basic techniques to formulate and solve questions
regarding the termination of logic programs in semantically clear and general terms. Other researchers, such
as Ullman and Van Gelder [1988], Plumer [1990], Wang
and Shyamasundar [1990], Verschaetse and De Schreye
[1991], and Solm and Van Gelder [1991] have provided
practical and automatable tecliniques for proving the termination of logic programs with respect to certain classes
of queries at compile time.
In this paper, we propose an extension of the theoretical frameworks for the characterisation of terminating programs and queries proposed in [Bezem 1989] and
[Apt and Pedreschi 1990]. The framework does not only
provide slightly more general results, but also increases
the practicality of the techniques in view of automation.
*Supported by the National Fund for Scientific Research.
†Supported by ESPRIT BRA COMPULOG project nr. 3012.
Let us recall some definitions from [Bezem 1989] in
order to explain our motivation and the intuition behind
our approach.
Definition 1.1 (see [Bezem 1989]; Definition 2.1) A level mapping for a definite logic program P is a mapping |.| : B_P → IN.
Definition 1.2 (see [Bezem 1989]; Definition 2.2) A definite logic program P is recurrent if there exists a level mapping |.|, such that for each ground instance A ←B1,...,Bn of a clause in P, |A| > |Bi|, for each i = 1,...,n.
Definition 1.3 (see [Bezem 1989]; Definition 2.7) A definite logic program P is terminating if all SLD-derivations for (P, ←G), where G is a ground goal, are finite.
One of the basic results of [Bezem 1989] is that a program is recurrent if and only if it is terminating. Although this result is very interesting from a theoretical
perspective, it is not a very practical one in terms of automated detection of terminating programs and queries.
The problem comes from the fact that the definition of
recurrency requires that the level mapping "compares"
the head of each ground instance of a clause with every corresponding atom in the body and imposes a decrease. Intuitively, what would be preferable is to obtain
a well-founding based on a measure function (or level
mapping), which only decreases on each recursive call to
a same predicate. This corresponds better to our intuition, since nontermination (for pure logic programs) can
only be caused by infinite recursion.
As we stated above, the problem is not merely related
to our intuition on the cause of nontermination, but more
importantly to the practicality of level mappings. Consider the following example.
Example 1.4
p([]).
p([H|T]) ← q([H|T]), p(T).
q([]).
q([H|T]) ← q(T).
It is not possible to take as level mapping a function that maps ground instances p(x) and q(x) to the same level, namely list-length(x) if x is a ground list, and 0 otherwise. Instead, the definition of recurrency obliges us to take a level mapping that has an "unnatural" offset (1 in this case):
|p(x)| = list-length(x) + 1
|q(x)| = list-length(x).
In a naive attempt to improve on the results of [Bezem 1989], one could try to start from an adapted definition for a recurrent program, in which the relation |A| > |Bi| would only be required if A and Bi are atoms with the same predicate symbol. However, the equivalence with termination would immediately be lost - even for programs having only direct recursion - as the following example shows.
Example 1.5
append([], L, L).
append([H|S], T, [H|U]) ← append(S, T, U).
p([H|T]) ← append(X, Y, Z), p(T).
An "extended" notion of recurrency, where the level mapping only relates the measure of ground instances of the recursive calls, would hold with respect to the level mapping:
|p(x)| = list-length(x)
|append(x, y, z)| = list-length(x).
On the other hand, the program is clearly not terminating - if it would be terminating, then we would have shown that append/3 terminates for a call with all three arguments free.
The heart of the problem is that in the definition of recurrency, the level mapping is used for two quite distinct purposes at the same time. First, the level mapping does ensure that on each derivation step, the measure of a recursive descending call is smaller than the measure of the ancestor call (or at least: for each ground instance of such a derivation step). Second, since we are only given that the top level goal is ground (or, in a more general version of the theorem, bounded) - but we have no information on the instantiation of any of the descending calls - the level mapping is also used to ensure that we have some upper limit on the measures for the calls of the (independent) recursive subcomputation evoked by the original call. In the current definition, this is done by imposing that the level also decreases between a call and its descendants that are not related through recursion.
The way in which we address the problem here differs from the approach in [Bezem 1989] in three ways:
1. We first compute all atoms that can occur as calls during any SLD-derivation for the top-level goal(s) under consideration.
2. We use an extended notion of level mapping, defined on all such atoms - not only the ground ones.
3. We have an adapted definition of recurrency, with as its most important features:
(a) the condition |A| > |Bi| is not imposed on ground instances of a clause, but instead, on each instance obtained after unification with a (possible) call,
(b) the decrease |A| > |Bi| is only imposed if A and Bi are calls to the same predicate symbol. (This is for direct recursion - in the context of indirect recursion, the condition is more complex.)
One of the side effects of taking this approach is
that there is no more necessity to start the analysis
for one ground or bounded goal. The technique works
equally well when we start from any general set of
atoms. The additional advantage that we gain here is
that in practice, we are usually interested in the termination properties of a program with respect to some
call pattern. Such call patterns can always be specified in terms of abstract properties of the arguments in
the goals through mode information, type information or combined (rigid or integrated) mode and type information (see [Janssens and Bruynooghe 1990]). Any such
call pattern corresponds to a set of atoms in the concrete domain, and can therefore be analysed with our
approach.
The paper is organised as follows. In the next section we extend the equivalence theorem of [Bezem 1989]
in the way described above. In section 3 we take
a completely similar approach to extend results of
[Apt and Pedreschi 1990] on left termination. In section 4, we illustrate the improved practicality of
the new framework.
We also indicate how some
simple extensions are likely to provide full theoretical support for the automated technique proposed in
[Verschaetse and De Schreye 1991].
All proofs have been omitted from the paper. They can be found in [De Schreye and Verschaetse 1992].
2 Recurrency with respect to a set of atoms
We first introduce some conventions and recall some basic terminology. Throughout the paper, P will denote a definite logic program. The extended Herbrand Universe, U_P^E, and the extended Herbrand Base, B_P^E, associated to a program P, were introduced in [Falaschi et al. 1989]. They are defined as follows. Let Term_P and Atom_P denote the sets of respectively all terms and all atoms that can be constructed from the alphabet underlying P. The variant relation, denoted ≈, defines an equivalence. U_P^E and B_P^E are respectively the quotient sets Term_P/≈ and Atom_P/≈. For any term t (or atom A), we denote its class in U_P^E (B_P^E) as t̃ (Ã). There is a natural partial order on U_P^E (and B_P^E), defined as: s̃ ≤ t̃ if there exist representants s' of s̃ and t' of t̃ in Term_P and a substitution θ, such that s' = t'θ. Throughout the paper, S will denote a subset of B_P^E. We define its closure under ≤ as: S^c = {A ∈ B_P^E | ∃B ∈ S : A ≤ B}.
Definition 2.1 P is terminating with respect to S if for any representant A' of any element A of S, every SLD-tree for (P, ←A') is finite.
Denoting the classical notion of a Herbrand Base (of ground atoms) over P as B_P, then with the terminology of [Bezem 1989] we have:
Lemma 2.2 P is terminating if and only if it is terminating with respect to B_P.
Lemma 2.3 If all SLD-derivations for (P, ←A) are finite, and θ is any substitution, then all SLD-derivations for (P, ←Aθ) are finite.
From lemma 2.3 it follows that in order to verify definition 2.1 for a set S ⊆ B_P^E, it suffices to verify the finiteness of the SLD-trees for (P, ←A) for only one representant of each element in S. It also follows that P is terminating with respect to a set S ⊆ B_P^E if and only if it is terminating with respect to S^c. In fact, given that P terminates with respect to S, it will in general be terminating with respect to a larger set of atoms than those in S^c. It is clear that if all SLD-trees for (P, ←A) are finite, and if H ←B1,...,Bn is a clause in P, such that A and H unify, then all SLD-trees for (P, ←Biθ), i = 1,...,n, where θ = mgu(A, H), are finite. We can characterise the complete set of terminating atoms associated to a given set S as follows.
Definition 2.4 For any T ⊆ B_P^E, define T_P^-1(T) = {Biθ ∈ B_P^E | A' is a representant of A ∈ T, H ←B1,...,Bn is a clause in P, θ = mgu(A', H) and 1 ≤ i ≤ n}.
Denote H_S = {T ∈ 2^(B_P^E) | S^c ⊆ T}. H_S is a complete lattice with bottom element S^c.
Definition 2.5 R_S : H_S → H_S : R_S(T) = T ∪ T_P^-1(T)^c.
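To make the operator of definition 2.4 concrete, here is a minimal Prolog sketch (our own names and clause representation cl(Head, BodyAtoms), not the paper's), shown on the program of example 1.4:

% The program of example 1.4.
cl(p([]), []).
cl(p([H|T]), [q([H|T]), p(T)]).
cl(q([]), []).
cl(q([H|T]), [q(T)]).

% one_step_calls(+A, -Calls): the atoms Bi*theta for every clause whose
% (renamed-apart) head unifies with the call A, i.e. one application of
% T_P^-1 to the singleton {A}.
one_step_calls(A, Calls) :-
    findall(B,
            ( cl(H, Body),
              copy_term(cl(H, Body), cl(H1, Body1)),
              H1 = A,                       % theta = mgu(A, H)
              member(B, Body1) ),
            Calls).

For instance, one_step_calls(p([a|T]), Cs) yields, up to variable names, Cs = [q([a|T1]), p(T1)]; iterating this operator and closing under instantiation is what the fixpoint R_S↑ω formalises.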
Lemma 2.6 R_S is continuous.
As a result, the least fixpoint for R_S is R_S↑ω.
Lemma 2.7 P is terminating with respect to S if and only if P is terminating with respect to R_S↑ω.
As a result of our construction (in fact: as the very purpose of it), R_S↑ω contains every call in every SLD-tree for any atomic goal of S. Formally:
Proposition 2.8 Let call(P, S) denote the set of all atoms B, such that B is the subgoal selected by the computation rule in some goal of some SLD-tree for a pair (P, ←A), with A the representant of an element of S. Then, call(P, S) ⊆ R_S↑ω.
We now introduce a variant of the definition of a level mapping, where the mapping is defined on equivalence classes of calls.
Definition 2.9 (level mapping)
A level mapping with respect to a set S ⊆ B_P^E is a function |.| : R_S↑ω → IN. A level mapping |.| is called rigid if for all A ∈ R_S↑ω and for any substitution θ, |A| = |Aθ|, i.e. the level of an atom remains invariant under substitution.
With slight abuse of notation, we will often write |A|, where A is a representant of an element of B_P^E. The associated notion of recurrency with respect to S will not be defined on ground instances of clauses, but instead on all instances (H ←B1,...,Bn)θ of clauses H ←B1,...,Bn of P, such that θ = mgu(A, H), where A is a representant of an element of R_S↑ω. The definition in [Bezem 1989] does not explicitly impose a decrease of the level mapping at each inference step. The level mapping's values should only decrease for ground instances of clauses. By considering more general instances of clauses (as above), we can explicitly impose a decrease of the level mapping's value during (recursive) inference steps. As a result, the adapted level mapping no longer needs to perform different functionalities at once, and we can concentrate on the real structure of the recursion.
Now, concerning this recursive structure, there are a number of different possibilities for a new definition of recurrency, depending on how we aim to deal with indirect recursion. In order not to confuse all issues involved, we first provide a definition for programs P relying only on direct recursion.
Definition 2.10 A (directly recursive) program P is recurrent with respect to S, if there exists a level mapping |.| with respect to S, such that:
• for any A' representant of A ∈ R_S↑ω,
• for any clause H ←B1,...,Bn in P, such that mgu(A', H) = θ exists,
• for any atom Bi, 1 ≤ i ≤ n, with the same predicate symbol as H: |A'| > |Biθ|.
What is expressed in this definition is that for any two recursively descending calls with a same predicate symbol in any SLD-tree for (representants of) atoms in S, the level mapping's value should decrease. This condition has the advantage of being perfectly natural and therefore, of being easy to verify in an automated way. The only possible problem in view of automation is that it requires the computation of R_S↑ω. But, this problem is precisely the type of problem that can easily be solved (or approximated) through abstract interpretation (see section 4).
In the presence of indirect recursion, we need a more complex definition, that deals with the problem that a recursive call with a same predicate symbol as an ancestor call may only appear after a finite number of inference steps (instead of in the body of the particular instance of the applied clause). This can be done in several ways. We first provide a definition related to the concept of a resultant of a finite (incomplete) derivation. Based on this definition, we prove the equivalence with termination. After that, we provide a more practical condition, of which definition 2.10 is an obvious instance for the case of direct recursion.
First, we need some additional terminology.
Definition 2.11 Let A be an atom and (G0 = ←A), G1, G2, ..., Gn, (n > 0), a finite, incomplete SLD-derivation for (P, ←A). Let θ1, ..., θn be the corresponding sequence of substitutions, and let θ = θ1θ2···θn and Gn = ←B1,...,Bm. With the terminology of [Lloyd and Shepherdson 1991] we say that Aθ ←B1,...,Bm is the resultant of the derivation.
Definition 2.12 A resultant Aθ ←B1,...,Bm of a derivation (G0 = ←A), G1, ..., Gn, is a recursive resultant for A if there exists i (1 ≤ i ≤ m), such that Bi has the same predicate symbol as A.
Definition 2.13 (recurrency wrt a set of atoms)
A program P is recurrent with respect to S, if there exists a level mapping |.| with respect to S, such that:
• for any A' representant of A ∈ R_S↑ω,
• for any recursive resultant A'θ ←B1,...,Bm for A',
• for any atom Bi, 1 ≤ i ≤ m, with the same predicate symbol as A': |A'| > |Bi|.
Proposition 2.14 If P is recurrent with respect to S,
then P terminates with respect to S.
Just as in the framework of Bezem, the converse statement holds as well.
Theorem 2.15
P is recurrent with respect to S if and only if it is terminating with respect to S.
One of the nice consequences of this result is that we can now relate the concept of a recurrent program in the sense of [Bezem 1989] to recurrency with respect to a set of (ground) atoms.
Corollary 2.16 P is recurrent if and only if it is recurrent with respect to B_P.
It may seem surprising to the reader that two apparently very different notions such as recurrency and recurrency with respect to B_P coincide. It is our experience from our work in termination of unfolding in the context of partial deduction ([Bruynooghe et al. 1991]) that this is not unusual. The reason is that conditions occurring in these contexts require the "existence" of some well-founded measure. The specific properties of such measures can take totally different forms without losing the termination property. The only real difference lies in the practicality.
We conclude the section by introducing a condition
that implies definition 2.13. This condition has the advantage over definition 2.13 that it does not rely on the
verification of some property for each of a potentially
infinite number of recursive resultants. Instead it only
requires such a verification for a finite number of clauses,
which can be characterised through the minimal, cyclic
collections of P.
Definition 2.17 (minimal cyclic collection)
A minimal cyclic collection of P is a finite sequence of clauses of P:
A_1 ← B^1_1, ..., A'_2, ..., B^1_l1
...
A_m ← B^m_1, ..., A'_{m+1}, ..., B^m_lm
such that:
• for each pair (i ≠ j), the heads of the clauses, A_i and A_j, are atoms with distinct predicate symbols,
• A_i and A'_i have the same predicate symbol (1 < i ≤ m),
• A'_{m+1} has the same predicate symbol as A_1.
Only a finite number of minimal cyclic collections exists.
They can easily be characterised and computed from the
predicate dependency graph for P.
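Since the minimal cyclic collections are determined by the cycles in the predicate dependency graph, their enumeration can be sketched in a few lines of Prolog (our own predicate names; clauses are again assumed to be stored as cl(Head, BodyAtoms) facts, as in the earlier sketch):

% depends(P, Q): some clause with head predicate P has a body atom with
% predicate Q (duplicate solutions are possible, but harmless here).
depends(P, Q) :-
    cl(Head, Body),
    functor(Head, P, _),
    member(B, Body),
    functor(B, Q, _).

% cycle_through(+P, -Cycle): a simple cycle in the dependency graph that
% starts and ends at P; each such cycle underlies a minimal cyclic collection.
cycle_through(P, [P|Path]) :-
    path(P, P, [P], Path).

path(From, To, _, [To]) :-
    depends(From, To).
path(From, To, Visited, [Next|Path]) :-
    depends(From, Next),
    Next \== To,
    \+ member(Next, Visited),
    path(Next, To, [Next|Visited], Path).

For the clauses of example 4.5 in section 4, cycle_through(e, C) enumerates (possibly with duplicates) the cycles [e,e] and [e,f,g,e], which is where the collections listed there come from.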
Proposition 2.18
Let S ⊆ B_P^E and |.| a rigid level mapping with respect to S, such that for any minimal cyclic collection of P (after standardizing apart),
A_1 ← B^1_1, ..., A'_2, ..., B^1_l1
...
A_m ← B^m_1, ..., A'_{m+1}, ..., B^m_lm
and for any A_1, ..., A_m ∈ R_S↑ω, with A'_1, ..., A'_m as their respective representants, and θ_i = mgu(A'_i, A_i), (1 ≤ i ≤ m), the following condition holds:
|A'_1 θ_1| ≥ |A'_2|
...
|A'_{m-1} θ_{m-1}| ≥ |A'_m|
⇒ |A'_1| > |A'_{m+1} θ_m|.
Then, P is recurrent with respect to S.
The conditions in proposition 2.18 seem rather unnatural at first sight and need some clarification. First, observe that in the case of direct recursion - except for the rigidity of the level mapping - the conditions coincide with those of definition 2.10.
For the case of indirect recursion, the conditions that one would intuitively expect, are that for each minimal cyclic collection
A_1 ← B^1_1, ..., A'_2, ..., B^1_l1
...
A_m ← B^m_1, ..., A'_{m+1}, ..., B^m_lm
and each A'_1 representant of A_1 ∈ R_S↑ω, such that θ = mgu(A'_1, A_1) and θ_i = mgu(A'_i, A_i), 1 < i ≤ m, exist and are consistent, we have
|A'_1| > |A'_{m+1} θ θ_2 ··· θ_m|.
The problem is that such a condition is not correct. Consider the clauses:
p(a,[_|X]) ← p(b,X).        (cl1)
p(b,X) ← q(a,[_|X]).        (cl2)
q(b,X) ← p(a,[_|X]).        (cl3)
q(a,[_|X]) ← q(b,X).        (cl4)
There are 4 associated minimal collections: (cl1), (cl2,cl3), (cl3,cl2) and (cl4). Consider for instance the derivation ←p(a,[_,_]), ←p(b,[_]), ←q(a,[_,_]), ←q(b,[_]), ←p(a,[_,_]).
The problem is caused by resultants associated to derivations that start with a clause from one minimal cyclic collection - say (cl2) in the collection (cl2,cl3) - then shift to applying another collection, (cl4), and only after this resume the first collection and apply clause (cl3). The head of the third clause, q(b,X), does not unify with q(a,[_|X']), and therefore, the condition on the cycle (cl2,cl3) can not be applied.
So, we have to impose the condition in proposition 2.18. It states that, even if the next call in the traversal of a minimal collection (A'_i) is not really related - as an instance - to a call we obtained earlier (A'_{i-1}θ_{i-1}), but if - through the intermediate computation in another minimal collection - the level between these two has decreased anyway, then the final conclusion between the original call to the collection and the indirectly depending one must still hold. We will not discuss the condition any further here, but we will return to its practicality in section 4.

3 Acceptability with respect to a set of atoms

All definitions and propositions from the previous section can be specialised for the Prolog computation rule. Following [Apt and Pedreschi 1990], we call an SLD-derivation that uses Prolog's left-to-right computation rule, an LD-derivation.
Definition 3.1 (left termination wrt S) Let S be a subset of B_P^E. A program P is left-terminating with respect to S if for any representant A of any element of S, every LD-derivation for (P, ←A) is finite.
Recall definitions 2.4 and 2.5. The motivation behind these definitions was finding an overestimation of all calls that are possible in any SLD-derivation using an arbitrary computation rule. The fact that no fixed computation rule is used, forces us to take the closure under all possible instantiations in definition 2.5, and hence R_S↑ω contains in general a lot more calls than can really occur when a particular computation rule is chosen.
In this section, we focus our analysis on computations that use Prolog's left-to-right computation rule. Therefore, adapted definitions of the T_P^-1 and R_S functions are needed.
Definition 3.2 For any T ⊆ B_P^E, define: P_P^-1(T) = {Biθσ_1...σ_{i-1} ∈ B_P^E | A' is a representant of A ∈ T, H ←B1,...,Bn is a clause in P, θ = mgu(A', H), 1 ≤ i ≤ n, ∃σ_1,...,σ_{i-1}, such that ∀j = 1,...,i-1: σ_j is an answer for (P, ←Bjθσ_1...σ_{j-1})}.
The answer substitutions σ_j are computed using LD-resolution. Let H_S^lr denote {T ∈ 2^(B_P^E) | S ⊆ T}.
Definition 3.3 R_S^lr : H_S^lr → H_S^lr : R_S^lr(T) = T ∪ P_P^-1(T).
In a completely analogous way as in the previous section, we find that R_S^lr is continuous. Hence, the least fixpoint R_S^lr↑ω contains all atoms that can possibly occur as a call when P is executed under the Prolog computation rule, and when a representant of an element from S is used as query.
Level mappings are now defined on R_S^lr↑ω. Recursive resultants are constructed using the left-to-right computation rule. This allows us to consider only recursive resultants of the form p(s1,...,sn)θ ← p(t1,...,tn), B2,...,Bm. The analogue of recurrency with respect to a set S of atoms is acceptability with respect to S.
Definition 3.4 (acceptability wrt a set of atoms)
A program P is acceptable with respect to S, if there exists a level mapping |.| with respect to S, such that for any p(s1,...,sn), representant of an element in R_S^lr↑ω, and for any recursive resultant p(s1,...,sn)θ ← p(t1,...,tn), B2,...,Bm: |p(s1,...,sn)| > |p(t1,...,tn)|.
Theorem 3.5
P is acceptable with respect to S if and only if it is left-terminating with respect to S.
As in section 2, we provide a more practical, sufficient
condition. The result is completely analogous to proposition 2.18.
Proposition 3.6
Let S ⊆ B_P^E and |.| a level mapping with respect to S, such that for any minimal cyclic collection of P (after standardizing apart),
A_1 ← B^1_1, ..., B^1_i1, A'_2, ..., B^1_l1
...
A_m ← B^m_1, ..., B^m_im, A'_{m+1}, ..., B^m_lm
and for any A_1, ..., A_m ∈ R_S^lr↑ω, with A'_1, ..., A'_m as their respective representants, with θ_j = mgu(A'_j, A_j) (1 ≤ j ≤ m), and with σ^j_k a computed answer substitution for (P, ←B^j_k θ_j σ^j_1 ... σ^j_{k-1}) (1 ≤ k ≤ i_j), the following condition holds:
|A'_1 θ_1 σ^1_1 ... σ^1_i1| ≥ |A'_2|
...
|A'_{m-1} θ_{m-1} σ^{m-1}_1 ... σ^{m-1}_i(m-1)| ≥ |A'_m|
⇒ |A'_1| > |A'_{m+1} θ_m σ^m_1 ... σ^m_im|.
Then, P is acceptable with respect to S.

4 Practicality and automation

A fully automated technique needs to address the following issues:
• safe approximations of R_S↑ω and R_S^lr↑ω must be computed,
• precise and natural level mappings are needed, and
• the conditions in propositions 2.18 and 3.6 must be automatically verifiable.
For left termination, there is one extra issue:
• some properties of the answer substitutions for the atoms in R_S^lr↑ω are needed; in particular, after application of a computed answer substitution we want an estimation of the relationship between the sizes of the arguments of the atoms in R_S^lr↑ω.
Concerning the first issue, observe that in practice, the sets of atoms S in the framework are likely to be specified in terms of call patterns over some abstract domain. The framework contains no implicit restriction on the kind of abstractions that are used for this purpose. They could be either expressing mode or type information, or even combined mode and type information - as in the rigid or integrated types of [Janssens and Bruynooghe 1990]. Abstract interpretation can be applied to automatically infer a safe approximation of R_S↑ω or R_S^lr↑ω (see [Janssens and Bruynooghe 1990]).
Automated techniques for proving termination use various types of norms. A norm is a mapping ||.|| : U_P^E → IN. Several examples of norms can be found in the literature. When dealing with lists, it is often appropriate to use list-length, which gives the depth of the rightmost branch in the tree representation of the term. A more general norm is term-size, which counts the number of function symbols in a term. Another frequently used norm is term-depth, which gives the maximum depth of (the tree representation of) a term. However, we restrict ourselves to semi-linear norms, which were defined in [Bossi et al. 1991].
Definition 4.1 (semi-linear norm)
A norm ||.|| is semi-linear if it satisfies the following conditions:
• ||V|| = 0 if V is a variable, and
• ||f(t1,...,tn)|| = c + ||t_i1|| + ... + ||t_im|| where c ∈ IN, 1 ≤ i1 < ... < im ≤ n and c, i1, ..., im depend only on f/n.
Examples of semi-linear norms are list-length and
term-size.
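As a concrete illustration, list-length and term-size can be written as the following Prolog predicates, both instances of the semi-linear scheme above (predicate names ours):

% list_length(+T, -N): the list-length norm; only the second argument of
% '.'/2 is measured (c = 1) and everything else, including variables, has
% norm 0.
list_length(T, 0) :- var(T), !.
list_length([_|T], N) :- !, list_length(T, M), N is M + 1.
list_length(_, 0).

% term_size(+T, -N): the term-size norm; every argument position is measured
% and c = 1 for each function symbol (constants count as symbols of arity 0).
term_size(T, 0) :- var(T), !.
term_size(T, N) :-
    T =.. [_|Args],
    sum_sizes(Args, 0, M),
    N is M + 1.

sum_sizes([], Acc, Acc).
sum_sizes([A|As], Acc, N) :-
    term_size(A, S),
    Acc1 is Acc + S,
    sum_sizes(As, Acc1, N).

For instance, list_length([a,b,c], N) gives N = 3, and term_size(f(a,g(b)), N) gives N = 4.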
As was pointed out in [Bossi et al. 1991], proving termination is significantly facilitated if the norm of a term remains invariant under substitution. Such terms are called rigid.
Definition 4.2 (rigid term; see [Bossi et al. 1991])
Let ||.|| be a (semi-linear) norm. A term t is rigid with respect to ||.|| if for any substitution σ, ||tσ|| = ||t||.
Rigidity is a generalisation of groundness; by using this concept it is possible to avoid restricting the definition of a norm to ground terms only, a restriction that is often found in the literature.
Given a semi-linear norm and a set of atoms S, a very
natural level mapping with respect to S can be associated
to them.
Definition 4.3 (natural level mapping)
Given is a semi-linear norm ||.|| and a set of atoms S. |.|nat, the natural level mapping induced by S, is defined as follows: for all p(t1,...,tn) ∈ R_S↑ω:
|p(t1,...,tn)|nat = Σ_{i∈I} ||ti||   if I ≠ ∅
                  = 0                otherwise,
with I = {i | ∀p(u1,...,un) ∈ R_S↑ω : ui is rigid}.
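Assuming the rigid argument positions I have already been determined (for instance by abstract interpretation), the natural level mapping is then just a sum of norms over those positions. A minimal sketch, reusing term_size/2 from the previous sketch (predicate names ours):

% natural_level(+Atom, +RigidPositions, -L): |A|nat as the sum of the norms
% of the argument terms at the rigid positions (0 if the position list is empty).
natural_level(Atom, RigidPositions, L) :-
    Atom =.. [_|Args],
    findall(S,
            ( member(I, RigidPositions), nth1(I, Args, T), term_size(T, S) ),
            Ss),
    sum_list(Ss, L).

With RigidPositions = [1], this reproduces the mappings |e(x)|nat = ||x||_t used in example 4.5 below.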
Let us illustrate the practicality of such mappings - and of the framework itself - with some examples.
Example 4.4
Reconsider example 1.4 from the introduction. Assume that S = {p(x) | x is a nil-terminated list}. Let ||.||_l be the list-length norm. The argument positions of all atoms in R_S↑ω are rigid under this norm. So, |p(x)|nat = ||x||_l and |q(x)|nat = ||x||_l. The program is directly recursive, so that it suffices to verify the conditions of definition 2.10.
For the clause p([H|T]) ← q([H|T]), p(T) and for each call p(x) ∈ R_S↑ω, with θ = mgu(x, [H|T]), we have |p(x)|nat > |p(T)θ|nat. By the same argument, the condition on the clause q([H|T]) ← q(T) holds as well. Thus, the program is recurrent with respect to S under the natural, list-length level mapping with respect to S.
As a second example, we take a program with indirect recursion. It defines some form of well-formed expressions built from integers and the function symbols +/2, */2 and -/1.
Example 4.5
e(X + Y) ← f(X), e(Y).      (cl1)
e(X) ← f(X).                (cl2)
f(X * Y) ← g(X), f(Y).      (cl3)
f(X) ← g(X).                (cl4)
g(-(X)) ← e(X).             (cl5)
g(X) ← integer(X).          (cl6)
The obvious choice for a level mapping for this program is term-size. However, the program is not recurrent in the sense of [Bezem 1989] with respect to this norm. Since it is clearly terminating, a level mapping exists. The most natural mapping (in the sense of [Bezem 1989]) we were able to come up with is:
|e(x)| = 3 × term-size(x) + 2
|f(x)| = 3 × term-size(x) + 1
|g(x)| = 3 × term-size(x).
In the context of our framework, consider the set S = {e(x) | x is ground}. Through abstract interpretation, we can find that R_S↑ω ⊆ B_P. Let ||.||_t be the term-size norm. Again, the argument positions of all atoms in R_S↑ω are rigid (even ground) under this norm. Thus, |e(x)|nat = ||x||_t, |f(x)|nat = ||x||_t and |g(x)|nat = ||x||_t. The program contains essentially¹ 6 minimal, cyclic collections: (cl1), (cl3), (cl1, cl3, cl5), (cl1, cl4, cl5), (cl2, cl3, cl5), (cl2, cl4, cl5).
¹ Since collections are sequences of clauses, cyclic permutations should be considered as well.
Let us consider, as an example, the third collection:
e(X + Y) ← f(X), e(Y).
f(X' * Y') ← g(X'), f(Y').
g(-(X'')) ← e(X'').
Assume that e(x), f(y) and g(z) are any atoms with ground terms x, y and z, and that:
θ1 = mgu(e(x), e(X + Y))
θ2 = mgu(f(y), f(X' * Y'))
θ3 = mgu(g(z), g(-(X''))).
Also assume that |f(X)θ1| ≥ |f(y)| and |g(X')θ2| ≥ |g(z)|. We then have |e(x)| > |f(X)θ1| ≥ |f(y)| > |g(X')θ2| ≥ |g(z)| > |e(X'')θ3|, so that |e(x)| > |e(X'')θ3|, and the conditions of proposition 2.18 (for the third cycle) are fulfilled. All other cycles can be verified in a similar way. The conclusion is that the program is recurrent with respect to S and the very natural term-size level mapping.
In the context of left termination, definition 4.3 can be adapted to produce equally natural level mappings with respect to a set S. Obviously, R_S↑ω should be replaced by R_S^lr↑ω. In the context of left termination there is an extra issue, namely, (an approximation of) the set of possible answer substitutions for an atom is needed. The next example illustrates how this is handled.
Example 4.6
p([],[]).
p([H|T], [G|S]) ← d(G, [H|T], U), p(U, S).
d(H, [H|T], T).
d(G, [H|T], [H|U]) ← d(G, T, U).
Assume that S = {p(x, y) | x is a nil-terminated list and y is free}. Notice that R_S↑ω contains the set {p(x, y) | x and y are free variables}. We are not able to define a level mapping on R_S↑ω that can be used to prove recurrency with respect to S. This is not surprising, since P is not terminating with respect to S.
However, program P is left terminating with respect to S. We prove this by showing that P is acceptable with respect to S. The set R_S^lr↑ω is the union of {p(x, y) | x is a nil-terminated list and y is free} and {d(x, y, z) | x and z are free variables and y is a nil-terminated list}. This can be found by using abstract interpretation. Since there is only direct recursion in program P, it suffices to show that: (1) for any p(x,y) ∈ R_S^lr↑ω, |p(x,y)| > |p(U,S)θσ|, where θ = mgu(p(x,y), p([H|T],[G|S])) and σ is a computed answer substitution for (P, ← d(G,[H|T],U)θ), and (2) for any d(x,y,z) ∈ R_S^lr↑ω, |d(x,y,z)| > |d(G,T,U)θ|, where θ = mgu(d(x,y,z), d(G,[H|T],[H|U])).
Now, in practice, the statement "σ is a computed answer substitution for (P, ← d(G,[H|T],U)θ)" can be replaced by "||[H|T]θσ||_l = ||Uθσ||_l + 1". This latter statement is a so-called linear size relation, which expresses a relation between the norms of the arguments of the atoms in the success set of the program. Alternatively, it can also be interpreted as a (non-Herbrand)
model of the program. For more details we refer to [Verschaetse and De Schreye 1992], where we describe an automated technique for deriving linear size relations. By taking this information into account, and by taking |p(x,y)| = ||x||_l for any p(x,y) ∈ R_S^lr↑ω - notice that x is rigid with respect to ||.||_l - we find: |p(x,y)| = ||x||_l = ||[H|T]θ||_l = ||[H|T]θσ||_l = ||Uθσ||_l + 1 > ||Uθσ||_l = |p(U,S)θσ|.
The second inequality, |d(x,y,z)| > |d(G,T,U)θ|, is easier to prove. This time, the list-length of the second argument can be taken as level mapping. Since both inequalities hold, we can conclude that the program is acceptable with respect to the set of atoms that is considered.
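As a small sanity check of the linear size relation invoked above, one can run the d/3 clauses of example 4.6 on ground lists and compare the list-lengths of the second and third arguments; a hypothetical test sketch (names ours, list_length/2 as in the earlier norm sketch):

% The d/3 clauses of example 4.6.
d(H, [H|T], T).
d(G, [H|T], [H|U]) :- d(G, T, U).

% For every solution of d(_, Xs, U) with Xs a ground list, the list-length
% of Xs is one more than that of U, i.e. ||[H|T]θσ|| = ||Uθσ|| + 1.
check_d_size_relation(Xs) :-
    forall(d(_, Xs, U),
           ( list_length(Xs, N1),
             list_length(U, N2),
             N1 =:= N2 + 1 )).

For example, check_d_size_relation([a,b,c]) succeeds, as it checks all three ways of deleting an element from the list.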
Automatic verification of the conditions for recurrency
and acceptability is handled by reformulating them into
a problem of checking the solvability of a linear system of
inequalities. This part of the work is described in more
detail in [De Schreye and Verschaetse 1992].
References
[Apt and Pedreschi 1990] K. R. Apt and D. Pedreschi. Studies in pure Prolog: termination. In Proceedings Esprit symposium on computational logic, pages 150-176, Brussels, November 1990.
[Baudinet 1988] M. Baudinet. Proving termination properties of Prolog programs: a semantic approach. In Proceedings of the 3rd IEEE symposium on logic in computer science, pages 336-347, Edinburgh, July 1988. Revised version to appear in Journal of Logic Programming.
[Bezem 1989] M. Bezem. Characterizing termination of logic programs with level mappings. In Proceedings NACLP'89, pages 69-80, 1989.
[Bossi et al. 1991] A. Bossi, N. Cocco, and M. Fabris. Norms on terms and their use in proving universal termination of a logic program. Technical Report 4/29, CNR, Department of Mathematics, University of Padova, March 1991.
[Bruynooghe et al. 1991] M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding infinite unfolding during partial deduction of logic programs. In Proceedings ILPS'91, pages 117-131, San Diego, October 1991. MIT Press.
[Cavedon 1989] L. Cavedon. Continuity, consistency, and completeness properties for logic programs. In Proceedings ICLP'89, pages 571-584, June 1989.
[De Schreye and Verschaetse 1992] D. De Schreye and K. Verschaetse. Termination analysis of definite logic programs with respect to call patterns. Technical Report CW 138, Department of Computer Science, K.U.Leuven, January 1992.
[Falaschi et al. 1989] M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. Declarative modeling of the operational behaviour of logic languages. Theoretical Computer Science, 69(3):289-318, 1989.
[Janssens and Bruynooghe 1990] G. Janssens and M. Bruynooghe. Deriving descriptions of possible values of program variables by means of abstract interpretation. Technical Report CW 107, Department of Computer Science, K.U.Leuven, March 1990. To appear in Journal of Logic Programming.
[Lloyd and Shepherdson 1991] J. W. Lloyd and J. C. Shepherdson. Partial evaluation in logic programming. Journal of Logic Programming, 11(3 & 4):217-242, October/November 1991.
[Plümer 1990] L. Plümer. Termination proofs for logic programs. Lecture Notes in Artificial Intelligence 446. Springer-Verlag, 1990.
[Sohn and Van Gelder 1991] K. Sohn and A. Van Gelder. Termination detection in logic programs using argument sizes. In Proceedings 10th symposium on principles of database systems, pages 216-226. ACM Press, May 1991.
[Ullman and Van Gelder 1988] J. D. Ullman and A. Van Gelder. Efficient tests for top-down termination of logical rules. Journal of the ACM, 35(2):345-373, April 1988.
[Vasak and Potter 1986] T. Vasak and J. Potter. Characterisation of terminating logic programs. In Proceedings 1986 symposium on logic programming, pages 140-147, Salt Lake City, 1986.
[Verschaetse and De Schreye 1991] K. Verschaetse and D. De Schreye. Deriving termination proofs for logic programs, using abstract procedures. In Proceedings ICLP'91, pages 301-315, Paris, June 1991. MIT Press.
[Verschaetse and De Schreye 1992] K. Verschaetse and D. De Schreye. Automatic derivation of linear size relations. Technical Report CW 139, Department of Computer Science, K.U.Leuven, January 1992.
[Wang and Shyamasundar 1990] B. Wang and R. K. Shyamasundar. Towards a characterization of termination of logic programs. In Proceedings of international workshop PLILP'90, Lecture Notes in Computer Science 456, pages 204-221, Linkoping, August 1990. Springer-Verlag.
Automatic Verification of GHC-Programs:
Termination
Lutz Plümer
Rheinische Friedrich-Wilhelms-Universität Bonn, Institut für Informatik III
D-5300 Bonn 1, Römerstr. 164
lutz@uran.informatik.uni-bonn.de
Abstract
We present an efficient technique for the automatic generation of termination proofs for concurrent logic programs, taking Guarded Horn Clauses (GHC) as an example. In contrast to Prolog's strict left-to-right order of evaluation, termination proofs for concurrent languages are complicated by a more sophisticated mechanism of subgoal selection. We introduce the notion of directed GHC programs and show that for this class of programs goal reductions can be simulated by Prolog-like derivations. We give a sufficient criterion for directedness. Static program analysis techniques developed for Prolog can thus be applied, albeit with some important modifications.
1. Introduction
With regard to termination it is useful to distinguish between
two types of software systems or programs: transformational
and reactive [HAP85]. A transformational system receives
an input at the beginning of its operation and yields an output
at the end. If the problem at hand is decidable, termination of
the process is surely a desirable property. Reactive systems,
on the other hand, are designed to maintain some interaction
with their environment. Some of them, for instance operating systems and database management systems, ideally
never terminate and do not yield a final result at all. Based on the process interpretation of Horn clause logic, concurrent logic programming systems have been designed for many different applications including reactive systems and transformational parallel systems. While for some of them termination is not a desirable property, for others it is. In this paper we discuss how automatic termination proofs for concurrent logic programs can be achieved.
Automatic proof techniques for pure Prolog programs have been described in several papers including [ULG88] and [PLU90a]. Prolog is characterized by a fixed computation rule which always selects the leftmost atom. Deterministic subgoal selection and strict left-to-right order of evaluation cannot be assumed for the concurrent languages. Static program analysis techniques which are well established for sequential Prolog, such as abstract interpretation, inductive assertions and termination proof techniques, substantially depend on the strict left-to-right order of evaluation in most cases and thus cannot easily be applied to concurrent languages. Concurrent languages delay subgoals which are not sufficiently instantiated. Goals which loop forever when evaluated by a Prolog interpreter may deadlock in the context of a concurrent language. These phenomena may suggest that termination proofs for concurrent logic programs require a different approach. This paper, however, shows that techniques which have been established for pure Prolog are still useful in the context of concurrency.

Our starting point is the question under which conditions reductions of a concurrent logic program can be simulated by Prolog-like derivations. We take Guarded Horn Clauses (GHC, see [UED86]) as an example, but our results can easily be extended to other concurrent logic programming languages such as PARLOG, (Flat) Concurrent Prolog or FCP(:). Our basic assumptions are the restriction of unification to input matching, nondeterministic subgoal selection and resuming of subgoals which are not sufficiently instantiated. Since we consider all possible derivations, the commit operator does not need special attention.

In general simulation is not possible: if there is a GHC-derivation of g' from g, g' cannot necessarily be derived with Prolog's computation rule. One could now try to augment simulation by program transformation. Let, for instance, P' be derived from P by including all clause body permutations. Although P' may be exponentially larger than P, there are still derivations which are not captured.

Example 1.1:

Program:   p ← q,r.     q ← s,t.     r ← u,v.
           s.           v.
Goal:      ← p

This goal can be reduced to ← t,u by nondeterministic subgoal selection, but not by a Prolog-like computation, even after adding the following clauses:

           p ← r,q.     q ← t,s.     r ← v,u.

The reason is that in order to derive ← t,u, the subderivations of ← q and ← r have to be interleaved.
The question arises whether there is an interesting subclass for which appropriate simulations can be defined. Such a class of programs will be discussed in Section 3. The main idea is to assume that if a subgoal p may produce some output on which evaluation of another subgoal q depends, then p is smaller w.r.t. some partial ordering. Whether a program maintains such a property, which we will call directedness, is undecidable. We will then introduce the stronger notion of well-formedness, which can be checked syntactically. Well-formedness is related to directionality, which is discussed in [GRE87]. Well-formedness is sufficient but not necessary for directedness, and it will turn out that quite a lot of nontrivial programs (including for instance systolic programs as discussed in [SHA87a] and most of the examples given in [TIC91]) fall into this category. In Section 5 we will demonstrate how termination proof techniques which have been established for pure Prolog can be generalized such that they apply to well-formed GHC programs.

The rest of this paper is organized as follows. Section 2 provides basic notions. Section 3 introduces the notion of directed programs and shows that this property is undecidable. It provides the notion of well-formedness and shows that it is sufficient for directedness. Section 4 discusses oriented and data driven computation and shows that after some simple program transformation derivations with directed GHC programs can be simulated by Prolog-like derivations. Using the notion of S-models introduced in [FLP89], Sections 5 and 6 show how termination proofs can be achieved automatically.
2. Basic Notions
We use standard notation and terminology of Lloyd [LL087] or Apt [APT90]. Following [APP90] we will say LD-resolution (LD-derivation, LD-refutation, LD-tree) for SLD-resolution (SLD-derivation, SLD-refutation, SLD-tree) with the leftmost selection rule characteristic for Prolog.

Next we define GHC programs following [UED87] and [UED88].

A GHC program is a set of guarded Horn clauses of the following form:

    H ← G1, ..., Gm | B1, ..., Bn     (m ≥ 0, n ≥ 0)

where H, G1, ..., Gm and B1, ..., Bn are atomic formulas. H is called a clause head, the Gi's are called guard goals and the Bi's are called body goals. The part of a clause before '|' is called a guard, and the part after '|' is called a body. One predicate, namely '=', is predefined by the language. It unifies two terms.
Declaratively, the commitment operator '|' denotes conjunction, and the above guarded Horn clause is read as "H is implied by G1, ..., Gm and B1, ..., Bn". The operational semantics of GHC is given by parallel input resolution restricted by the following two rules:

Rule of Suspension:
• Unification invoked directly or indirectly in the guard of a clause C called by a goal G (i.e. unification of G with the head of C and any unification invoked by solving the guard goals of C) cannot instantiate the goal G.
• Unification invoked directly or indirectly in the body of a clause C called by a goal G cannot instantiate the guard of C or G until C is selected for commitment.

Rule of Commitment:
• When some clause C called by a goal G succeeds in solving (see below) its guard, the clause C tries to be selected for subsequent execution (i.e., proof) of G. To be selected, C must first confirm that no other clauses in the program have been selected for G. If confirmed, C is selected indivisibly, and the execution of G is said to be committed to the clause C.

An important consequence is that any unification intended to export bindings to the calling goal must be specified in the clause body and use the predefined predicate '='.

The operational semantics of GHC is a sound - albeit not complete - proof procedure for Horn clause programs: if ← B succeeds with answer substitution θ, then ∀(Bθ) is a logical consequence of the program.
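As a small illustration of these rules, consider the standard stream-merge clause known from the GHC literature (it is not one of the programs analyzed in this paper):

    merge([A|Xs], Ys, Zs) ← true | Zs = [A|Zs1], merge(Xs, Ys, Zs1).

Head unification only matches the incoming first argument against [A|Xs]; the binding for the output stream Zs is exported in the body via '=', exactly as the Rule of Suspension requires.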
Subsequently, we may find it convenient to denote a goal g by a pair <G, θ>, i.e. g = Gθ. A single derivation step reducing the i-th atom of G using clause C and applying mgu θ' is denoted by <G, θ> →i;C <G', θθ'>. Subscripts may be omitted.
3. Directed Programs
An annotation dp for an n-ary predicate symbol p is a function from {1, ..., n} to {+,-}, where '+' stands for input and '-' for output. We will write p(+,+,-) in order to state that the first two arguments of p are input and the last is output. A goal atom A generates (consumes) a variable v if v occurs at an output (input) position of A. A is a generator for B if some variable v occurs at an output position of A and at an input position of B; in this case, B is a consumer of A.
Let t̄ denote a tuple of terms. A derivation ← p(t̄) →* G with accumulated substitution θ respects the input annotation of p if vθ = v for every variable v occurring at an input position of p(t̄).
A goal is directed if there is a linear ordering among its atoms such that if Ai is a generator for Aj then Ai precedes Aj in that ordering. A program is directed if all its derivations respect directedness, i.e., all goals derived from a directed goal are directed. Note that directedness of a goal is a static property which can be checked syntactically. Directedness of a program, however, is a dynamic property.
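As a small illustration, anticipating the annotations r(+,-) and s(+,-) used in the proof of Theorem 3.1 below: the goal ← r(X,Y), s(Y,Z) is directed, since r generates Y, which s consumes, so r can precede s in the required ordering. The goal ← r(X,Y), s(Y,X), in contrast, is not directed: r generates Y for s while s generates X for r, so no linear ordering of the two atoms exists.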
Theorem 3.1: It is undecidable whether a program is directed.

Proof: Let tM(X) be a directed GHC simulation of a Turing machine M for a language L which binds X to halt if and only if M applied to the empty tape halts. Such a simulation is for instance described in [PLU90b]. Next consider the following procedures pM and q:

    pM(X,Y) ← tM(A), q(A,X,Y).
    q(halt,X,X).

and the (directed) goal

    ← r(X,Y), s(Y,Z), pM(X,Z).

The following annotations are given:

    tM(-). q(+,-,-). pM(-,-). r(+,-). s(+,-).

If M halts on the empty tape, tM(A) will bind A to 'halt', pM(X,Y) will identify X and Y, and thus the given goal can be reduced to the undirected goal ← r(X,Y), s(Y,X). Decidability of program directedness would thus imply solvability of the halting problem: contradiction. ∎
Next we introduce the notion of well-formedness of a program w.r.t. a given annotation and show that this property is sufficient for directedness.

A goal is well-formed if it is directed, generators precede consumers in its textual ordering, and its output is unrestricted. The output of a goal is unrestricted if all its output arguments are distinct variables which do not occur (i) at an output position of another goal atom and (ii) at an input position of the same atom.

A program P is well-formed if the following conditions are satisfied by each clause H ← G1, ..., Gm | B1, ..., Bn in P:
• ← B1, ..., Bn is well-formed;
• the input variables of H do not occur at output positions of body atoms.

The predicate '=' has the annotation '- = -'. It is convenient to have two related primitives: '==' (test) and '⇐' (matching), which have the same declarative reading as '=' but different annotations, namely '+ == +' and '- ⇐ +'.

Note that the goal ← r(X,Y), s(Y,Z), pM(X,Z) is not well-formed because its output is restricted: Z has two output occurrences.
The next example is taken from [UED86]:

Example 1: Generating primes

primes(Max,Ps)      ← true | gen(2,Max,Ns), sift(Ns,Ps).
gen(N,Max,Ns)       ← N ≤ Max | N1 ⇐ N + 1, gen(N1,Max,Ns1), Ns ⇐ [N|Ns1].
gen(N,Max,Ns)       ← N > Max | Ns ⇐ [].
sift([P|Xs],Zs)     ← filter(P,Xs,Ys), sift(Ys,Zs1), Zs ⇐ [P|Zs1].
sift([],Zs)         ← Zs ⇐ [].
filter(P,[X|Xs],Ys) ← X mod P == 0 | filter(P,Xs,Ys).
filter(P,[X|Xs],Ys) ← X mod P ≠ 0 | filter(P,Xs,Ys1), Ys ⇐ [X|Ys1].
filter(P,[],Ys)     ← Ys ⇐ [].

primes(+,-). gen(+,+,-). sift(+,-). filter(+,+,-).

The call primes(Max,Ps) returns through Ps a stream of primes up to Max. The stream of primes is generated from a stream of integers by filtering out the multiples of primes. For each prime P, a filter goal filter(P,Xs,Ys) is generated which filters out the multiples of P from the stream Xs, yielding Ys.

In this example all input terms are italic and all output terms are bold. It can easily be seen that this program is well-formed.
Another example of a well-formed program is quicksort. The call qsort([H|L],S) returns through S an ordered version of the list [H|L]. To sort [H|L], L is split into two lists L1 and L2 which are themselves sorted by recursive calls to qsort.

Example 2: Quicksort

q1: qsort([],L)            ← L ⇐ [].
q2: qsort([H|L],S)         ← split(L,H,A,B), qsort(A,A1), qsort(B,B1),
                             append(A1,[H|B1],S).
s1: split([],X,L1,L2)      ← L1 ⇐ [], L2 ⇐ [].
s2: split([X|Xs],Y,L1',L2) ← X ≤ Y | split(Xs,Y,L1,L2), L1' ⇐ [X|L1].
s3: split([X|Xs],Y,L1,L2') ← X > Y | split(Xs,Y,L1,L2), L2' ⇐ [X|L2].
a1: append([],L1,L1')      ← L1' ⇐ L1.
a2: append([H|L1],L2,L3)   ← append(L1,L2,L3'), L3 ⇐ [H|L3'].

qsort(+,-). split(+,+,-,-). append(+,+,-).
Theorem 3.2: Let P be a well-formed program, g a well-formed goal and g →* g' a GHC-derivation. Then g' is well-formed.

Proof: See [PLU92].

Well-formed programs respect input annotations:

Theorem 3.3: Let ← p(t̄) →* G be a derivation with accumulated substitution θ and v an input variable of p(t̄). Then vθ = v.

Proof: Goal variables can only be bound by transitions applying '=' or '⇐', since in the other cases matching substitutions are applied. Since both arguments of '=' are output, and '⇐' also binds only output variables, input variables cannot be bound. ∎
4. Oriented and Data Driven Computations

Our next aim is to show that derivations of directed programs can be simulated by derivations which are similar to LD-derivations. In this context we find it convenient to use the notational framework of SLD-resolution and to regard GHC-derivations as a special case.

We say that an SLD-derivation is data driven if for each resolution step with selected atom A, applied clause C and mgu θ either C is the unit clause (X = X ← true.) or C is B ← B1, ..., Bn and A = Bθ. Data driven derivations are the same as GHC derivations of programs with empty guards. The assumption that guards are empty is without loss of generality in this context.

Next we consider oriented computation rules. Oriented computation rules are similar to LD-resolution in the sense that goal reduction strictly proceeds from left to right. They are more general since the selected atom is not necessarily the leftmost one. However, if the selected atom is not leftmost, its left neighbors will not be selected in any future derivation step.

More formally, we define: a computation rule R is oriented if every derivation <G0,θ0> → ... → <Gi,θi> → ... via R satisfies the following property: if in Gi an atom Ak is selected, and Aj (j < k) is an atom on the left of Ak, then no further instantiated version of Aj will be selected in any future derivation step.

Our next aim is to show that, for directed programs, any data driven derivation can be simulated by an equivalent data driven derivation which is oriented. To prove the following theorem, we need a slightly generalized version of the switching lemma given in [LL087]. Here g →i;C;θ g' denotes a single derivation step where the i-th atom of g is resolved with clause C using mgu θ.
Lemma 4.1: Let gk+2 be derived from gk via gk →i;Ck+1;θk+1 gk+1 →j;Ck+2;θk+2 gk+2. Then there is a derivation gk →j;Ck+1';θk+1' gk+1' →i;Ck+2';θk+2' gk+2' such that gk+2' is a variant of gk+2 and Ck+1', Ck+2' are variants of Ck+2 and Ck+1 respectively.

Proof: [LL087]. The difference between this and Lloyd's version is that the latter refers to SLD-refutations, while ours refers to (possibly partial) derivations. His proof, however, also applies to our version. ∎
Theorem 4.2: Let P be a directed program and <G0,θ0> a directed goal. Let D = <G0,θ0> → ... → <Gk,θk> be a data driven derivation using the clause sequence C1, ..., Ck. Then there is another data driven derivation D': <G0,θ0> → ... → <Gk',θk'> using a clause sequence Cj1', ..., Cjk', where <j1, ..., jk> is a permutation of <1, ..., k>, each Ci' is a variant of Ci, Gk'θk' is a variant of Gkθk, and D' is oriented.
Proof: Let gj be the first goal in D where orientation is violated, i.e. an atom R' is selected in an earlier goal gi (i < j), and in gj an atom R is selected which lies to the left of the further instantiated version of R'. Now we switch subgoal selection in gj-1 and gj and get a new derivation D*. In D* we look again for the first goal violating the orientation. After a finite number of iterations, we arrive at a derivation D' which is oriented. It remains to be shown that D* (and thus D') is still data driven.

Note that up to gj-1 both derivations are identical. Moreover, the switching lemma implies that, from gj+1 on, the goals of D' are variants of those of D.

Now let Q be the atom selected in gj-1. Since orientation is violated for the first time in gj, Q is to the right of R. (If i = j-1 then Q = R', and otherwise gj-1 would contain the first violation of orientation.) Since gj-1 = <Gj-1,θj-1> is directed, Qθj-1 is not a generator of Rθj-1, and thus Rθj-1 and Rθj are variants. Let H be the head of the clause applied to resolve R in gj. Since D is data driven, Rθj-1 = Hσ for some σ, and so Rθj = Hσ' for some σ'. Thus D' is data driven. ∎
Corollary 4.3: Let P be a directed program and g a directed goal. Then g has an infinite data driven derivation if and only if it has an infinite data driven derivation which is oriented.

According to Corollary 4.3, in our context it is sufficient to consider data driven derivations which are oriented. Such derivations are still not always LD-derivations, since the selected atom is not necessarily leftmost. If it is not, however, its left neighbors will never be reactivated in future derivation steps; thus w.r.t. termination they can simply be ignored. The same effect can be achieved by a simple program transformation proposed in [FAL88]:

    Pro(P) = {p(X̄) ← | p is an n-ary predicate appearing in the body or the head
              of some clause of P and X̄ is an n-tuple of distinct variables}

    Parto(P) = P ∪ Pro(P)
Simulation Lemma 4.4: Let D = G0 → ... → Gi-1 → Gi be an oriented SLD-derivation of G0 and P, where

    Gi-1 = ← B1, ..., Bj, ..., Bn and
    Gi   = ← (B1, ..., Bj-1, Cj+, Bj+1, ..., Bn)θi.

Cj+ is the body of the clause Cj applied to resolve Bj. Then there is an LD-derivation D' = G0 → ... → Gk-1' → Gk' with Parto(P), where

    Gk-1' = ← Bj, ..., Bn and
    Gk'   = ← (Cj+, Bj+1, ..., Bn)θi.

Proof: Whenever an atom B is selected in D which is not the leftmost one, first the atoms to the left of B are resolved away in D' with clauses in Pro(P), and then D' resolves B in the same way as D. ∎
An immediate implication is the following:

Theorem 4.5: If g has a non-terminating data driven oriented derivation with P, then it has a non-terminating LD-derivation with Parto(P).
The converse, however, is not true. Consider, for instance, the quicksort example from above, extended by the following clauses:

    q0: qsort(_,_).
    s0: split(_,_,_,_).
    a0: append(_,_,_).

While the LD-tree for ← qsort([2,1],X) is finite in the context of the standard definition of qsort, this is no longer true for the extended program. Consider the following infinite LD-derivation:

    ← qsort([2,1],X)
    by q2:  ← split([1],2,A,B), qsort(A,A1), qsort(B,B1), append(A1,[2|B1],X).
    by s0:  ← qsort(A,A1), qsort(B,B1), append(A1,[2|B1],X).
    by q2:  ← split(_,_,_,_), ...
    by s0:  ← qsort(_,_), ...
    ...

This derivation, however, is not data driven: resolving qsort(A,A1) in the third goal with q2 yields an mgu which is not a matching substitution.
For data driven LD-derivations we get a stronger result:

Theorem 4.6: There is a non-terminating data driven oriented derivation for g with P if and only if there is a non-terminating data driven LD-derivation for g with Parto(P).

Proof: The only-if part is implied by the simulation lemma. For the if-part, consider a non-terminating data driven LD-derivation D. By removing all applications of clauses in Pro(P), one gets another derivation D'. D' is a non-terminating data driven oriented derivation. ∎

Restriction to LD-derivations which are data driven enlarges the class of goal/program pairs which do not loop forever. In the general case, termination of quicksort requires that the first argument is a list. Termination of append requires that the first or the third argument is a list. Restriction to data driven LD-derivations implies that no queries of qsort or append (and many other procedures which have finite LD-derivations only for certain modes) loop forever. However, goals like ← append(X,Y,Z) or ← qsort(A,B) deadlock immediately.
5. Termination Proofs
In this section we will give a sufficient condition for terminating data driven LD-derivations. We will concentrate on programs without mutual recursion. In [PLU90b] we have demonstrated how mutual recursion can be transformed into direct recursion. We need some further notions.

For a set T of terms, a norm is a mapping |...| : T → N. The mapping ||...|| : A → N is an input norm on (annotated) atoms if, for all B = p(t1, ..., tn), ||B|| = Σi∈I |ti|, where I is a subset of the input arguments of B.

Let P be a well-formed program without mutual recursion. P is safe if there is an input norm on atoms such that for all clauses c = B0 ← B1, ..., Bi, ..., Bn the following holds: if Bi is a recursive literal (B0 and Bi have the same predicate symbol), σ is a substitution whose domain is a subset of the input variables of B0, and θ is a computed answer for ← (B1, ..., Bi-1)σ, then ||B0σθ|| > ||Biσθ||.
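For instance (a sketch using the list-length norm; Section 7 notes that split and append are structurally recursive), take ||append(t1,t2,t3)|| = |t1|. In the recursive clause a2 of Example 2 the recursive literal is the first body atom, so θ is empty, and for any admissible σ

    ||append([H|L1],L2,L3)σ|| = 1 + |L1σ| > |L1σ| = ||append(L1,L2,L3')σ||,

so append is safe with this input norm.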
We can now state the following theorem:

Theorem 5.1: If P is a safe program and G = ← A is well-formed, then all data driven LD-derivations for G are finite.

Proof: By contradiction. Assume that there is an infinite data driven LD-derivation D. Then there is an infinite subsequence D' of D containing all elements of D starting with the same predicate symbol p. Let di and di+1 be two consecutive elements of D',

    di   = ← p(t1, ..., tr), ...
    di+1 = ← p(t'1, ..., t'r), ...

and let

    p(s1, ..., sr) ← B1, ..., Bk, p(s'1, ..., s'r), ...

be the clause applied to resolve the first literal of di and θi the corresponding mgu. Then there is a computed answer substitution θ' for ← (B1, ..., Bk)θi such that p(t'1, ..., t'r) = p(s'1, ..., s'r)θiθ'.

Since D is data driven, θi is a matching substitution, i.e. p(t1, ..., tr) = p(t1, ..., tr)θi. Since P is well-formed, Theorem 3.3 further implies p(t1, ..., tr) = p(t1, ..., tr)θiθ'. We also have p(t1, ..., tr)θiθ' = p(s1, ..., sr)θiθ'.

Since P is a safe program, ||p(s1, ..., sr)θiθ'|| > ||p(s'1, ..., s'r)θiθ'|| and thus ||p(t1, ..., tr)θiθ'|| > ||p(t'1, ..., t'r)θiθ'||. Since the range of ||...|| is a well-founded set, D' cannot be infinite. Contradiction. ∎
The next question is how termination proofs for data driven LD-derivations can be automated. In [PLU90b] and [PLU91], a technique for automatic termination proofs for Prolog programs is described. It uses an approximation of the program's semantics to reason about its operational behavior. The key concept is that of predicate inequalities, which relate the argument sizes of the atoms in the minimal Herbrand model of the program. Now in any program Parto(P), for every predicate symbol p occurring in P there is a unit clause p(X̄). Thus the minimal Herbrand model of Parto(P) equals the Herbrand base BP of P, a semantics which is not helpful. To overcome this difficulty, we will consider S-models, which have been proposed in [FLP89] in order to model the operational behaviour of logic programs more closely. The S-model of a logic program P can be characterized as the least fixpoint of an operator TS which is defined as follows:

    TS(I) = {B | ∃ B0 ← B1, ..., Bk in P, ∃ B1', ..., Bk' ∈ I,
             ∃ θ = mgu((B1, ..., Bk), (B1', ..., Bk')) and B = B0θ}.
We need some notions defined in [BCF90] and [PLU91]. Let Δ be a mapping from a set of function symbols F to N which is not zero everywhere. A norm |...| for T is said to be semi-linear if it can be defined by the following scheme:

    |t| = 0                     if t is a variable
    |t| = Δ(f) + Σi∈I |ti|      if t = f(t1, ..., tn),

where I ⊆ {1, ..., n} and I depends on f. A subterm ti is called selected if i ∈ I.

A term t is rigid w.r.t. a norm |...| if |t| = |tθ| for all substitutions θ. Let t[v(i)←s] denote the term derived from t by replacing the i-th occurrence of v by s. An occurrence v(i) of a variable v in a term t is relevant w.r.t. |...| if |t[v(i)←s]| ≠ |t| for some s. Variable occurrences which are not relevant are called irrelevant. A variable is relevant if it has a relevant occurrence. rvars(t) denotes the multiset of relevant variable occurrences in the term t.

Proposition 5.2: Let t be a term, tθ a rigid term and V the multiset of relevant variable occurrences in t. Then for a semi-linear norm |...| we have |tθ| = |t| + Σv∈V |vθ|.

Corollary 5.3: |tθ| ≥ |t|.

Proof: [PLU91].
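A familiar instance (not spelled out above) is the list-length norm: Δ([·|·]) = 1 with the second (tail) argument selected, and Δ(f) = 0 with no argument selected for every other functor. Then |[a,b|Xs]| = 2, the occurrence of Xs is relevant, and for θ = {Xs/[c]} the term [a,b,c] is rigid and |[a,b|Xs]θ| = 3 = |[a,b|Xs]| + |Xsθ|, as Proposition 5.2 predicts.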
For an n-ary predicate p in a program P, a linear predicate inequality LIp has the form Σi∈I pi + c ≥ Σj∈J pj, where I and J are disjoint sets of arguments of p, and c, the offset of LIp, is either a natural number, ∞, or a special symbol like γ. I and J are called the input resp. output positions of p (w.r.t. LIp).

Let MS be the S-model of P. LIp is called valid (for a linear norm |...|) if p(t1, ..., tn) ∈ MS implies Σi∈I |ti| + c ≥ Σj∈J |tj|.
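For example, with the list-length norm and the annotation split(+,+,-,-), the inequality split1 + 0 ≥ split3 + split4 is valid: whenever split(t1,t2,t3,t4) is in the S-model, the length of the input list t1 bounds the combined lengths of the two output lists t3 and t4. This is exactly the inequality derived automatically in Section 7.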
Let A = p(t1, ..., tn). With the notations from above we further define:

    Vin(A,LIp)  = ∪i∈I rvars(ti)
    Vout(A,LIp) = ∪j∈J rvars(tj)
    Fin(A,LIp)  = Σi∈I |ti|
    Fout(A,LIp) = Σj∈J |tj|
    F(A,LIp)    = Σi∈I |ti| - Σj∈J |tj| + c.

F(A,LIp) is called the offset of A w.r.t. LIp.

Theorem 5.4: Let Σi∈I pi + c ≥ Σj∈J pj be a valid linear predicate inequality, G = ← p(t1, ..., tn)σ a well-formed goal, V and W the multisets of relevant input resp. output variable occurrences of p(t1, ..., tn), and θ a computed answer for G. Then the following holds:

    i)  Σi∈I |tiσθ| + c ≥ Σj∈J |tjσθ|.
    ii) Σv∈V |vσθ| + F(p(t1, ..., tn),LIp) ≥ Σw∈W |wσθ|.

Proof: According to [FLP89], p(t1, ..., tn)σθ is an instance of an atom p(s1, ..., sn) in the S-model MS of P. Since the output of G is unrestricted, tjσθ = sj for all j ∈ J. Corollary 5.3 implies Σi∈I |tiσθ| ≥ Σi∈I |si|, and Σj∈J |tjσθ| = Σj∈J |sj|, which proves the first part of the theorem. The second part is implied by Proposition 5.2. ∎

Theorem 5.4 gives a valid inequality relating the variables occurring in a single-literal goal. Next we give an algorithm for the derivation of a valid inequality relating variables in a compound goal.

Algorithm 5.5: goal_inequality(G, LI, U, W, Δ, b)

Input:  A well-formed goal G = ← B1, ..., Bn, a set LI with one inequality for each predicate in G, and two multisets U and W of variable occurrences.
Output: A boolean variable b which will be true if a valid inequality relating U and W could be derived, and an integer Δ which is the offset of that inequality.

begin
    M := W; Δ := 0; V := U;
    For i := n downto 1 do:
        If M ∩ Vout(Bi,LIp) ≠ ∅ then
            M := (M \ Vout(Bi,LIp)) ∪ (Vin(Bi,LIp) \ V);
            V := V \ Vin(Bi,LIp);
            Δ := Δ + F(Bi,LIp)  fi
    If M = ∅ then b := true else b := false fi
end.

Next we show that the algorithm is correct:

Theorem 5.6: Assume that the inequalities in LI are valid and b is true, σ is an arbitrary substitution such that Gσ is well-formed, and θ is a computed answer substitution for Gσ. Then Σv∈U |vσθ| + Δ ≥ Σw∈W |wσθ| holds.

Proof: See [PLU92].

Algorithm 5.5 takes time O(m), where m is the length of G. [PLU90b] gives an algorithm for the automatic derivation of inequalities for compound goals based on and/or-dataflow graphs which has exponential runtime in the worst case. Algorithm 5.5 makes substantial use of the fact that G is well-formed: each variable has at most one generator, which makes the derivation of inequalities deterministic.
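To see the algorithm at work (a sketch of the safety check for qsort carried out in Section 7), consider the one-atom goal G = ← split(L,H,A,B), the prefix of the body of clause q2, with U = {L} (the relevant input variable occurrence of the head qsort([H|L],S) under the list-length norm) and W = {A} (that of the recursive call qsort(A,A1)). With the valid inequality split1 + 0 ≥ split3 + split4 we have Vout = {A,B}, Vin = {L} and F(split(L,H,A,B)) = 0, since all four arguments are variables. The single loop iteration turns M = {A} into ∅, so b = true with Δ = 0, and Theorem 5.6 yields |Lσθ| ≥ |Aσθ| for every computed answer θ of Gσ; together with |[H|L]σθ| = 1 + |Lσθ| this gives the inequality ||qsort([H|L],S)σθ|| > ||qsort(A,A1)σθ|| needed for safety.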
6. Derivation of Inequalities for S-models

In Section 5 it has been assumed that linear inequalities are given for the predicates of a program P. We now show how these inequalities can be derived automatically. We assume that P is well-formed and free of mutual recursion. Let p ≺ q if p ≠ q and p occurs in one of the clauses defining q. Absence of mutual recursion in P implies that ≺ defines a partial order which can be embedded into a linear order. Thus there is an enumeration {p1, ..., pn} of the predicates of P such that pi ≺ pj implies i < j. We will process the predicates of P in that order; thus in analyzing p we can assume that valid inequalities have already been derived for all predicates on which the definition of p depends. Note that a trivial inequality with offset ∞ always holds.

Let in(A) and out(A) denote the sets of input resp. output variables of an atom or a set of atoms according to the annotation of the given program.
Algorithm 6.1: predicate_inequalities(P, LI)

Input:  A well-formed program P defining p1, ..., pn.
Output: A set LI of valid inequalities for the predicates of P.

begin
    LI := ∅;
    For i := 1 to n do:
    begin
        Let C1, ..., Cm be the clauses defining pi;
        Let M, N be the input resp. output arguments of pi;
        li := Σμ∈M pμ + γ ≥ Σν∈N pν;
        Wi := true;
        For j := 1 to m do:
        begin
            Let Cj be B0 ← B1, ..., Bk;
            goal_inequality((← B1, ..., Bk), LI ∪ {li}, Vin(B0), Vout(B0), Δj, bj);
            c := Δj + Fout(B0,li) - Fin(B0,li);
            Wi := Wi ∧ bj;
            If c contains '∞'                    then Wi := Wi ∧ false
    (*)     elseif c is an integer               then Wi := Wi ∧ (γ ≥ c)
    (**)    elseif c = γ + d and d ≤ 0           then Wi := Wi ∧ true
            elseif c = γ + d and d > 0           then Wi := Wi ∧ false
    (***)   elseif c = k·γ + n and k > 1         then Wi := Wi ∧ (γ ≤ n/(1-k))  fi
        end
        If Wi is satisfiable then let δi be the smallest value for γ which satisfies Wi,
        else let δi be '∞'.
        Replace γ in li by δi.
        LI := LI ∪ {li}
    end
end
Theorem 6.2: The inequalities derived by the algorithm are valid.

Proof: By induction on the number of predicates n in P. The case n = 0 is immediate. For the inductive case, assume that the derived inequalities for the predicates p1, ..., pn-1 are valid. Let I0 be the minimal S-model of P restricted to the predicates p1, ..., pn-1. In the context of the program which consists of the definition of pn only, let TS^0 = I0 and TS^m = TS(TS^(m-1)). Its limit equals the minimal S-model of P restricted to the predicates p1, ..., pn. Now we have to show that the inequality li derived for pn is valid w.r.t. TS^m. The proof is now by induction on m. The case m = 0 is implied by the induction assumption on n. Assume that the theorem holds for m - 1. We have to show that the inequality for pn holds for the elements of TS^m. Now let B ∈ TS^m and B0 ← B1, ..., Bk be the clause applied to derive B. We have B = B0θ, where θ is a computed answer substitution for ← B1, ..., Bk, which is a well-formed goal. Let V = in(B0) and W = out(B0). Let LI be the set of inequalities derived by Algorithm 6.1, and Δ be the result of calling goal_inequality((← B1, ..., Bk), LI, V, W, Δ, bi). Theorem 5.6 and the induction assumption imply

    (‡)   Σv∈V |vθ| + Δ ≥ Σw∈W |wθ|.

Since B = B0θ, we have Fin(B,li) = Fin(B0,li) + Σv∈V |vθ| and Fout(B,li) = Fout(B0,li) + Σw∈W |wθ|. Let a be the offset of li. We have to show

    (‡‡)  Fin(B,li) + a ≥ Fout(B,li).

If bi is false or Δ is ∞, we are done, since in that case a is ∞. Three more cases remain. (*) and (**) immediately imply

    (‡‡‡) a ≥ Δ + Fout(B0,li) - Fin(B0,li).

(***) implies a ≤ n/(1-k) and thus a ≥ n + k·a for some n such that n + k·a = Δ + Fout(B0,li) - Fin(B0,li). Again (‡‡‡) follows. (‡) and (‡‡‡) together now imply (‡‡). ∎
Note that Algorithm 6.1 again has run-time complexity O(n), where n is the length of the given program P.

Algorithm 6.1 is not yet able to derive p1 ≥ p2 for a unit clause like p(X,Y) with mode p(+,-). This inequality, however, holds, since in a well-formed goal the output argument of p will always be unbound. To overcome this difficulty, we assume that before calling predicate_inequalities(P,LI), P will be transformed to P' in the following way:

Define freevars(B0 ← B1, ..., Bn) =
    (out(B0) \ out(B1, ..., Bn)) ∪ (in(B1, ..., Bn) \ in(B0)).

Now for a clause c = B0 ← B1, ..., Bn in P let freevars(c) = {Y1, ..., Ym}. Replace c by B0 ← q(Y1, ..., Ym), B1, ..., Bn, where the new predicate q is defined by the unit clause q(X1, ..., Xm) with mode q(+, ..., +). Note that, after this transformation, P' is well-formed if P is well-formed, and if an inequality is valid for P' it is valid for P as well. In the example mentioned above, input for Algorithm 6.1 will be the program P = {q(X)., p(X,Y) ← q(Y).} and the output will be {0 ≥ q1, p1 ≥ p2}.

Another improvement can be made by considering subsets of the input arguments in order to achieve stronger inequalities. This, however, makes the algorithm less efficient.
7. Example

We finally discuss how, with the techniques given so far, it can be shown that the GHC program for quicksort specified in Section 3 terminates for arbitrary goals.

Corollary 4.3 and Theorem 4.5 imply that it suffices to consider data-driven LD-derivations of the extended program for qsort including the clauses s0, a0 and q0. According to Theorem 5.1 we only have to show that the three predicates of the program are safe. This is easy to show for split and append; in fact these procedures are structurally recursive. It is more difficult to prove for qsort, because in q2 both recursive calls contain the local variables A and B. For this reason we need a linear predicate inequality for split which has the form split1 + γ ≥ split3 + split4. After the transformation mentioned at the end of the previous section, s0 will have the following form:

    s0: split(L1,L2,L3,L4) ← q(L3,L4).

Now s0 and s1 give γ ≥ 0 (case (*) in Algorithm 6.1), while s2 and s3 give 'true' (case (**)). Thus we get split1 + 0 ≥ split3 + split4. In order to prove safety of qsort, we only have to consider q2. Using this inequality, Algorithm 5.5 immediately shows ||qsort([H|L],S)θ|| > ||qsort(A,A1)θ|| and ||qsort([H|L],S)θ|| > ||qsort(B,B1)θ|| for all answer substitutions θ for split(L,H,A,B). Thus qsort is safe.
Acknowledgment

Part of this work was performed while I was visiting CWI. K. R. Apt stimulated my interest in concurrent logic programming.

References

[APP90]  Apt, K. R., Pedreschi, D., Studies in pure Prolog: Termination, Technical Report CS-R9048, Centre for Mathematics and Computer Science, Amsterdam, 1990.
[APT90]  Apt, K. R., Introduction to logic programming, in Leeuwen (ed.), The Handbook of Theoretical Computer Science, North-Holland 1990.
[BCF90]  Bossi, A., Cocco, N., Fabris, M., Proving Termination of Logic Programs by Exploiting Term Properties, Technical Report, Dip. di Matematica Pura e Applicata, Università di Padova, 1990.
[FAL88]  Falaschi, M., Levi, G., Finite failures and partial computations in concurrent logic languages, Proc. of the Int. Conf. on Fifth Generation Computer Systems, ICOT 1988.
[FLP89]  Falaschi, M., Levi, G., Palamidessi, C., Martelli, M., Declarative Modeling of the Operational Behavior of Logic Languages, Theoretical Computer Science 69, 1989.
[GRE87]  Gregory, S., Parallel Logic Programming in PARLOG, Addison-Wesley, 1987.
[HAP85]  Harel, D., Pnueli, A., On the development of reactive systems, in Apt, K. R. (ed.), Logics and Models of Concurrent Systems, Springer 1985.
[LL087]  Lloyd, J., Foundations of Logic Programming, Springer-Verlag, Berlin, second edition, 1987.
[PLU90a] Plümer, L., Termination proofs for logic programs based on predicate inequalities, in Warren, D. H. D., Szeredi, P. (eds.), Proceedings of the Seventh International Conference on Logic Programming, MIT Press 1990.
[PLU90b] Plümer, L., Termination Proofs for Logic Programs, Springer Lecture Notes in Artificial Intelligence 446, Berlin 1990.
[PLU91]  Plümer, L., Termination proofs for Prolog programs operating on nonground terms, 1991 International Logic Programming Symposium, San Diego, California, 1991.
[PLU92]  Plümer, L., Automatic Verification of GHC-Programs: Termination, Technical Report, Universität Bonn, 1992.
[SHA87]  Shapiro, E., Concurrent Prolog, Collected Papers, MIT Press 1987.
[SHA87a] Shapiro, E., Systolic Programming: A paradigm of parallel processing, in [SHA87].
[TIC91]  Tick, E., Parallel Logic Programming, MIT Press 1991.
[UED86]  Ueda, K., Guarded Horn Clauses, in [SHA87].
[UED88]  Ueda, K., Guarded Horn Clauses: A Parallel Logic Programming Language with the Concept of a Guard, in Nivat, M., Fuchi, K. (eds.), Programming of Future Generation Computers, North-Holland 1988.
[ULG88]  Ullman, J. D., Van Gelder, A., Efficient Tests for Top-Down Termination of Logical Rules, Journal of the ACM 35, 2, 1988.
Analogical Generalization
Takenao OHKAWA    Toshiaki MORI    Noboru BABAGUCHI    Yoshikazu TEZUKA
† Education Center for Information Processing, Osaka University
‡ Dept. of Communication Eng., Faculty of Eng., Osaka University
2-1, Yamadaoka, Suita, Osaka, 565 Japan
e-mail: ohkawa@oucom5.oucom.osaka-u.ac.jp
Abstract
Approaches to learning by examples have focused on generating general knowledge from a lot of examples. In this paper we describe a new learning method, called analogical generalization, which is capable of generating a new rule which specifies a given target concept from a single example and existing rules. Firstly, we formulate analogical generalization based on the similarity between a given example and existing rules from the logical viewpoint. Secondly, we give a new procedure of inductive learning with analogical generalization, called ANGEL. The procedure consists of the following five steps: (1) extending a given example, (2) extracting atoms from the example and selecting a base rule out of the set of existing rules, (3) generalizing the extracted atoms by means of the selected rule as a guide, (4) replacing predicates, and (5) generating a rule. Through an experiment with a system for parsing English sentences, we have clarified that ANGEL is useful for acquiring rules in knowledge-based systems.
1  Introduction

Machine learning makes a great contribution to improving performance through automated knowledge acquisition and refinement, and so far various types of machine learning paradigms have been considered. In particular, learning from examples, which can form general knowledge from specific cases given as input examples, has been well studied and a lot of related methods have been proposed [Mitchell 1977, Dietterich and Michalski 1983, Ohkawa et al. 1991].

Generally, in learning from examples, we have to give a lot of examples to the learner. Why are so many examples required? We think the reason for this is that the bias for restricting the generalization is relatively weak, because it is independent of the domain. However, when a human being acquires new knowledge, he does not always require a lot of examples. As the case may be, he can learn from one example. We think this is because he decides on a strong bias for the generalization according to the domain, and generalizes the examples based on that bias. That is, in order to generalize a few examples appropriately, a strong bias which depends on the domain is indispensable.

It is necessary to consider how the strong bias should be provided. Let us recall the behavior of a human being again. When acquiring new knowledge, he often utilizes similar knowledge which is already known. In other words, the existence of similar knowledge may help him to associate new knowledge. This process is called analogy. Analogy is considered promising for realizing learning from a few examples. Since analogy can be regarded as one of the most effective ways of restricting generalization, modeling its process will make it possible to provide a domain-dependent bias.
In this paper, we propose a new learning method, called ANGEL (ANalogical GEneraLization), which is capable of generating a new rule from a single example. In ANGEL, both the rules and the examples are represented as logical formulas. We introduce the notion of analogy [Winston 1980], namely the similarity between the example and the existing rules, as the bias for the generalization [Mori et al. 1991]. The similarity is determined by comparing the atoms of both the example and the existing rules. Based on the similarity, firstly, ANGEL extracts atoms from the example and selects a rule out of the existing rules; next, it generates a new rule by generalizing the extracted atoms by means of the selected rule as a guide.

The next section describes the definition of analogical generalization. In this section we consider analogical generalization from the logical viewpoint. Section 3 gives the procedure of ANGEL, which is a method for learning based on analogical generalization. In this section, we also give consideration to the experimental result of learning by ANGEL. Finally, in Section 4, we clarify the originality of ANGEL through its comparison to other related works.
2  Analogical generalization

To represent knowledge, we use a form which conforms to first order predicate logic. Two kinds of forms, called a fact and a rule, are provided. A fact is represented as an atom, while a rule is represented as a Horn clause, which is expressed in the form of

    α ← β1, ..., βn

where α, β1, ..., βn are atoms. Letting r be a rule α ← β1, ..., βn, we denote the consequence of rule r, namely α, by cons(r), and the premise of rule r, namely β1, ..., βn, by prem(r).

The underlying notion of analogical generalization is that a new rule is generated by generalizing an input example, which consists of facts, based on the similarity between the example and the existing rules. Before formulating analogical generalization, we define the similarity between two atoms, and next formalize the similarity between two finite sets of atoms.
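For instance (a hypothetical rule, not taken from the paper's experiments), r: parent(x,y) ← father(x,y) is a rule with cons(r) = parent(x,y) and prem(r) = father(x,y), while a ground atom such as brother(Tom,Joe) is a fact.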
2.1  Similarity between two atoms

First, we define some basic notations. A substitution is a finite set of pairs v/t, where v is a variable, t is a term, and the variables are distinct. Let θ = {v1/t1, ..., vn/tn} be a substitution and e be an expression, which is either a literal or a conjunction or disjunction of literals. Then eθ is the expression obtained from e by replacing each occurrence of the variable vi in e by the term ti. If S is a finite set of expressions and θ is a substitution, Sθ denotes the set {eθ | e ∈ S}.

Let θ be a substitution and S be a finite set of atoms. If Sθ is a singleton, S is unifiable by θ and we write unifiable(S).

Now, we give the following two functions, and define the similarity between atoms by means of these functions. Let R be a set of existing rules, and Q and Q' be atoms.

Let us consider the similarity of father(x,y) to mother(Jim,Betty) and brother(Tom,Joe). For each atom, the corresponding R-deducible sets are derived, and the R-similar sets of father(x,y) for mother(Jim,Betty) and brother(Tom,Joe) are as follows:

    Ψ(R1, father(x,y), mother(Jim,Betty)) = {parent(x,y), family(x,y)}
    Ψ(R1, father(x,y), brother(Tom,Joe)) = {family(x,y)}

Accordingly, father(x,y) is more similar to mother(Jim,Betty) than to brother(Tom,Joe) with respect to R1. This result matches our intuition very well.
2.2
Definition 1 ( R-deducible set )
~(R, Q) ~ {fi I R
- [II(1") : II(1')O].
[A: B] !):. [A' : C].
Now, we assume C1 is the following set of atoms:

    C1 = {brother(Tom,Joe), strikes(Joe,Mark)}

A maximally preceding correspondence of A1 to C1 with respect to R1 is shown as

    {(father(x,y), brother(Tom,Joe)), (kills(y,z), strikes(Joe,Mark))},

and therefore,
3. For an arbitrary rule 1'" (E R) and an arbitrary set
of atoms A(~ E"), the following relation does not
hold.
[A: II(1'")]
R
>- [II(r)O : II{1")].
• Significance condition
For a r1..lle 1" which satisfies similarity condition 2, letting
P(x)).
2.1  Preparations

In this paper, we use standard formal logic and notations, while defining the following. An n-ary predicate U is generally expressed by λx.Q, where x is a tuple of n object variables and Q is a formula in which no object variables except variables in x occur free. If t is a tuple of n terms, U(t) stands for the result of replacing each occurrence of (the elements of) x in Q with (the corresponding elements of) t simultaneously. For any formulas A and F, when A ⊢ F and ⊬ F (that is, F is not valid), we say F is a genuine theorem of A and express it simply as A ⊩ F.

We will use a closed formula of first order logic, A, for a theory, (generally n) terms T for a target and (generally n) terms B for a base. A property is expressed by a predicate; for instance, a similarity and a projected property are expressed by the predicates S and P respectively.
2.2  Approach To A Seed of Analogy

We can understand analogical reasoning as follows:

(1) Example-based Information: "An object x' (corresponding to a base) satisfies both properties S and P (∃x'.(S(x') ∧ P(x')))."
(2) Similarity-based Information: "Another object x (corresponding to a target) satisfies a shared property S with x' (S(x))."
(3) Analogical Conclusion: "The object x would satisfy the other property P (P(x))."

Then,

    "Analogical reasoning is to reason (3) from A together with (1)+(2)."    (A)

Let this understanding be our starting point of analysis. As analogy is not, generally, deductive, this starting point may, unfortunately, be expressed only as follows. In the notation of proof theory,

    A, ∃x'.(S(x') ∧ P(x')), S(x) ⊬ P(x).    (1)

As analogy, however, infers P(x) from the premises, it implies that some knowledge is assumed in the premise part of (1). Let the assumed knowledge be F(x), providing that it depends on the x in general. That is,

    A, ∃x'.(S(x') ∧ P(x')), S(x), F(x) ⊢ P(x).    (2)

Thus, the essential information newly obtained by analogy is F(x) in the above rather than the explicit projected property P. Making J(x) stand for the conjunction of the example-based information and F(x), the above meta-sentence is transformed equivalently to

    A ⊢ ∀x.(J(x) ∧ S(x) ⊃ P(x))    (3)

because A is closed. This implies that a rule must be a theorem of A and that the rule concludes any object which satisfies J(x) to satisfy P when it satisfies S. Once J is satisfied, (by reason of (S(x) ⊃ P(x)),) the analogical conclusion ("an object satisfies P") can be deduced from the similarity-based information ("the object satisfies S"). For this reason, this rule will be called the analogy prime rule (it will be specified in more detail later), and J will be called the analogy justification.

Moreover, it is improbable that the analogy prime rule is a valid formula, because, if so, any pair of predicates could be an analogical pair of a similarity and a projected property independently of A. Thus, the analogical prime rule must be a genuine theorem of A,

    A ⊩ ∀x.(J(x) ∧ S(x) ⊃ P(x)).    (4)

Consequently, an object T which satisfies S is concluded to satisfy P from an analogy prime rule by analogical reasoning that assumes that T satisfies the analogy justification (J(T)). That is, our starting point (A) can be specified from two aspects.

    "An analogical conclusion can be obtained from an analogy prime rule together with example-based information and similarity-based information."    (B)

    "A non-deductive jump by analogy, if it occurs, is to assume that the analogy justification of the prime rule is satisfied."    (C)

In the following part of this paper, the analogy justification and non-deductivity will be further explored. Before beginning an abstract discussion, it may be useful to see concrete examples of analogical reasoning. The next section introduces "target" examples of analogical reasoning to be clarified here.

2.3  Examples

Example 1: Determination Rule [3]. "Bob's car (CBob) and Sue's car (CSue) share the property of being 1982 Mustangs (Mustang). We infer that Bob's car is worth about $3500 just because Sue's car is worth about $3500. (We could not, however, infer that Bob's car is painted red just because Sue's car is painted red.)"

Example-based Information:
    Model(CSue, Mustang) ∧ Value(CSue, $3500),    (5)
Similarity-based Information:
    Model(CBob, Mustang).    (6)

Example 2: Brutus and Tacitus [1]. "Brutus feels pain when he is cut or burnt. Also, Tacitus feels pain when he is cut. Therefore, if Tacitus is burnt, he will feel pain."

Example-based Information:
    (Suffer(Brutus, Cut) ⊃ FeelPain(Brutus))    (7)
    ∧ (Suffer(Brutus, Burn) ⊃ FeelPain(Brutus))    (8)

Similarity-based Information:
    Suffer(Tacitus, Cut) ⊃ FeelPain(Tacitus)    (9)

Example 3: Negligent Student¹. "When I discovered that one of the newcomers (StudentT) to our laboratory was a member of an orchestra club (Orch), remembering that another student (StudentB) was a member of the same club and he was often negligent of study (Study), I guessed that the newcomer would be negligent of study, too."

Example-based Information:
    Member_of(StudentB, Orch)
    ∧ Negligent_of(StudentB, Study)    (10)

Similarity-based Information:
    Member_of(StudentT, Orch)    (11)

2.4  Logical Analysis: a rule as a seed of analogy
In treating analogy in a formal system, as the information of a base object being S and P is projected onto a target object, it is desirable to treat such properties as objects so that we can avoid the use of second order language. As an example, the fact that Bob's car is a Mustang is represented by "Model(CBob, Mustang)" rather than simply as "Mustang(CBob)". In the remaining part, we rewrite S(x) to Σ(x, S) and P(x) to Π(x, P). Σ will be called a similar attribute, Π will be a projected attribute, S as an object will be a similar attribute value, and P as an object will be a projected attribute value. Then, (4) is rewritten

    A ⊩ ∀x,s,p.(J(x,s,p) ∧ Σ(x,s) ⊃ Π(x,p)),    (12)

considering the most general case that the analogy justification J depends on all of these factors.

¹ The author thanks Satoshi Sato (Hokuriku Univ.) for showing this challenging example.

Again, when the 3-tuple <object: X, similar attribute value: S, projected attribute value: P> satisfies the analogy justification J, object X is conjectured to satisfy the projected property λx.Π(x, P) (analogical conclusion) just because X has the similarity λx.Σ(x, S).

That is, J(x, s, p) can be considered a condition where x could be concluded to be p from x being s by analogical reasoning.

Now, recalling that an analogical conclusion is obtained from the analogy prime rule with example-based information and similarity-based information, consider what information can be added by the information in relation to the analogy prime rule.
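As an illustration of this rewriting (not spelled out in the text), Example 3 reads: Σ(x,s) = Member_of(x,s) and Π(x,p) = Negligent_of(x,p), with similar attribute value S = Orch, projected attribute value P = Study, base B = StudentB and target T = StudentT.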
1) Example-based Information: This shows that there exists an object as a base which satisfies a similarity and a projected property (∃x'.(Σ(x', S) ∧ Π(x', P))). It seems to be adequate that the base B, satisfying Σ(x', S), can also be derived to satisfy Π(x', P) from the prime rule, because B can be considered a target which has similarity S. That is, the 3-tuple <B, S, P> satisfies the analogy justification. Consequently, from arbitrariness in the selection of an object as a base in this information, what is obtained from this information is ∃x'. J(x', S, P).

2) Similarity-based Information: This shows that an object as a target, T, satisfies the same property S as above. Just by this fact, an analogical conclusion is obtained, by assuming that the object satisfies J by some conjecture. That is, there exists some attribute value p' such that the 3-tuple <T, S, p'> satisfies J (∃p'. J(T, S, p')).

3) Analogical Conclusion: With the above two pieces of information, an analogical conclusion, "T satisfies Π(x, P)", is obtained from the analogy prime rule. Therefore, such a 3-tuple <T, S, P> satisfies J (J(T, S, P)).

In the above discussion, T, S, and P are arbitrary. Therefore, the following relation about the analogy justification turns out to be true:

    ∀x,s,p.( ∃x'.J(x',s,p) ∧ ∃p'.J(x,s,p') ⊃ J(x,s,p) ).    (13)
(13) makes it possible to represent J equivalently as follows:

    J(x,s,p) = Jatt(s,p) ∧ Jobj(x,s),    (14)

where both Jatt and Jobj are predicates, that is, each of them has no free variables other than its arguments. (Indeed, taking Jatt(s,p) := ∃x'.J(x',s,p) and Jobj(x,s) := ∃p'.J(x,s,p'), the implication from right to left is exactly (13), and the converse is trivial.)

The point shown by this result is that any analogy justification can be represented by a conjunction in which variable x and variable p occur separately in different conjuncts.

By (12) and (14), the analogy prime rule can be defined as follows.

Definition 1  Analogy Prime Rule
A rule is called an analogy prime rule w.r.t. <Σ(x,s); Π(x,p)> if it has the following form:

    ∀x,s,p.(Jatt(s,p) ∧ Jobj(x,s) ∧ Σ(x,s) ⊃ Π(x,p)),    (15)

where Jatt, Jobj, Σ and Π are predicates. (That is, each of Jatt(s,p), Jobj(x,s), Σ(x,s) and Π(x,p) is a formula in which no variable other than its arguments occurs free.)  □
In (15), Jatt(s,p) will be called the attribute justification and Jobj(x,s) will be called the object justification.

Also, by the above discussion, the following two conjectures can be considered as causes which make analogy non-deductive.

• Example-based Conjecture (EC): An object shows an existing concrete combination of a similarity and a projected property. This specializes the prime rule and allows it to be applicable to a similar object. Assuming some generally non-deductive inference system under A, "∼A" (we will propose such a system later),

    ∃x.(Σ(x,S) ∧ Π(x,P)) ∼A Jatt(S,P).    (16)

• Similarity-based Conjecture (SC): Just because an object satisfies S, application of the specialized prime rule to the object is allowed.

    Σ(x,S) ∼A Jobj(x,S).    (17)

In the case that the attribute justification (Jatt(s,p)) is a valid formula, example-based information becomes unnecessary in yielding the analogical conclusion. Thus, it could, in general, be essential in analogical reasoning to guess Jatt(s,p) which is not a valid formula. The object justification (Jobj(x,s)) is, still, important in another sense, because it can be considered to express a really significant similarity. It is not an unusual case that a really significant similarity is not observable. Consider the case of Example 2. Having a nervous system will be a sufficient condition for an object to feel pain; thus, whether an object has a nervous system is a significant factor in making a conjecture on feeling pain. In this case, however, we could not, without dissection, obtain direct evidence which shows that Tacitus and Brutus have nervous systems, while we obtain only circumstantial evidence that both feel pain when they are cut. Thus, the similarity-based conjecture is to guess such a really significant but implicit similarity, the object justification (Jobj(x,s)), from an observed similarity Σ(x,s).
To summarize, a logical analysis of analogy draws the following conclusions. Analogical reasoning is possible only if a certain analogy prime rule is a genuine theorem of a given theory, and the process of analogical reasoning can be divided into the following three steps: 1) the attribute justification part of the rule is satisfied by EC from example-based information, 2) the object justification part of the rule is satisfied by SC from similarity-based information, and 3) from similarity-based information and the analogy prime rule specialized by the two preceding steps, an analogical conclusion is obtained by deduction.

A question remains unclear, that is, what inference is EC and what is SC? Though we cannot identify the mechanism underlying each of the conjectures, we can propose a (generally) non-deductive inference system as their candidate. The next section shows this.
3  Non-deductive Inference for Analogy

This section explores a type of generally non-deductive inference by which a conjecture G is obtained from a given theory A with additional information K.

Generally speaking, what properties should be satisfied by a generally non-deductive inference? It might be desirable that a non-deductive inference satisfies at least the following conditions. First, it should subsume deduction, that is, any deductive theorem is one of its theorems, because any deductive conclusion would be desirable. Secondly, any conclusion obtained by it must be able to be used deductively, that is, from such a conclusion it should be possible to yield more conclusions using, at least, deduction. And, thirdly, any conclusion obtained must be consistent with the given information. We define a class of inference systems which satisfy the above three conditions.

Definition 2  An inference system under a theory A (written ∼A) is deductively expansible if the following conditions are satisfied. For any set of sentences A and K and any sentences G and H,

i) Subsuming deduction:
   if A, K ⊢ G then K ∼A G.
ii) Deductive usefulness:
   if K ∼A G and A, K, G ⊢ H, then K ∼A H.
iii) Consistency:
   if K ∼A G and A ∪ K is consistent, then A ∪ K ∪ {G} is consistent.

The following inference system is an example of a deductively expansible system.
Definition 3 G is a conjecture from A based on
(atomic) circumstantial reasoning (written
iff
i) A,K r G,
J{ ~~
J{
by
G)
2.
V:l' .( Job j ( X , 5')
or
r G
if there exists a minimal set of atomic formulas 3 E
s.t.
A, E r K. and Au E i s consistent if
AUK is consistent4 .
ii) A,E
Proposition 1
If K ~A G and K, G r.-~ H, then I{
Corollary 1 If K
r.-: G.
r.- A
H.
IT ( x , P )).
( 19)
Even if similarity-based information 2:,(T, S) is introduced. to obtain analogical conclusion II(T, P) by circumstantial reasoning, some information apart from the
prime rule turns out to be needed in A. And, both EC
and SC are generally needed to accomplish analogical
reasoning. which implies that multiple application of circumstantial reasoning is necessary. Even in such a case,
circumstantial reasoning remains worthwhile (Proposition 1).
then K ~A G.
Corollary 1 shows that circumstantial reasoning is deductively expansible, and proposition 1 (together with
the corollary) shows that inference done by multiple applications of circumstantial reasoning is also deductively
expansible.
Circumstantial reasoning (K ~: G) implies a very
general and useful inference class in that so many types
of inference used in AI can be considered as circumstantial reasoning. Deduction and abduction, for example,
are obviously circumstantial reasoning. Moreover, if we
loosen the condition "atomic formulas" to "clauses", inductive learning from examples is the case where A is
empty in general, K is "examples" and G is inductive
knowledge obtained by "learning,,5 6
Now, we assume that both EC and SC are circumstantial reasoning, but based on different information. Then,
we can see analogical reasoning in more detaiL
Let an analogy prime rule w.r.t. < ~(x,s);II(x,p) >
be a theorem of A. Then, when example-based information, ~(B, S) /\ II(B, P), is introduced, by circumstantial reasoning from the prime rule, some justifications are
satisfied, that is,
'L-(B,S) 1\ II(B,P)
1\ ~ ( x , 5') :)
r-: Jatt(S,P) 1\ Jobj(B,S),
4
Classification of Analogy and
Examples
Each EC and SC has two cases; a deductive one and
a non-deductive one. According to this measure, analogical inference can be divided into 4 types. A typical
example is shown in each class and explored.
4.1  deductive EC + deductive SC

Typical reasoning of this type was proposed by T. Davies and S. Russell [3]. They insisted that, to justify an analogical conclusion and to use information of the base case, a type of rule, called a determination rule, should be a theorem of a given theory. The rule can be written as follows:

    ∀s,p.( ∃x'.(Σ(x',s) ∧ Π(x',p)) ⊃ ∀x.(Σ(x,s) ⊃ Π(x,p)) )    (20)
Example 1 (continued). In this example, the following determination rule is assumed to hold under A:

    ∀s,p.( ∃x'.(Model(x',s) ∧ Value(x',p)) ⊃ ∀x.(Model(x,s) ⊃ Value(x,p)) )    (21)

This rule is an analogy prime rule, because

    J_obj(x,s) = Σ(x,s) = Model(x,s),
    J_att(s,p) = (∃x. Model(x,s) ∧ Value(x,p)),
    Π(x,p) = Value(x,p).

Moreover,

    EC:  Model(C_Sue, Mustang) ∧ Value(C_Sue, $3500) ⊢ J_att(Mustang, $3500),    (22)
    SC:

This illustrates that reasoning based on determination rules belongs to the "deductive EC + deductive SC" type and that it can also be done by circumstantial reasoning.

²Circumstantial reasoning is essentially equivalent to "abduction" + deduction [13, 15]. However, "abduction" has many definitions and various usages in different contexts, so we introduce a new term for the type of inference in Definition 3 to avoid confusion.
³Atoms, that is, formulas which contain only one predicate symbol.
⁴If there exists such a minimal set of atomic formulas E, case ii) apparently involves case i). Thus, case i) can often be neglected in a usual application, for instance, if K is a universal formula of the form ∀x.F(x), where F is quantifier-free. Note that a clause is universal.
⁵In this case, G = E in Definition 3, which implies that G is a minimal set to explain the "examples" K. Indeed, such minimality is very common in this field.
⁶Such a unified aspect of various reasoning in AI was pointed out by Koichi Furukawa (ICOT) in a private discussion; a similar and more intuitive view can be seen in [5].
4.2  deductive EC + non-deductive SC
This type of analogical reasoning was explored by the author [1]. It was concluded that, once we assumed the following two premises for analogical reasoning, it seemed to be an inevitable conclusion that analogical reasoning which infers P(T) from S(T), S(B), and P(B) satisfies the illustrative criterion. And if an inference system satisfies the criterion, the system is called an illustrative analogy.

Premise 1: "Analogy is done by projecting properties (satisfied by a base) from the base onto a target."
Premise 2: "The target is not a special object."

Premise 2 is also assumed in this paper; it is translated into an arbitrary selection of a target object. Premise 1 was translated as follows: J(B) (where J is the justification in (4) and B stands for a base object) must be a theorem of A, because it is essential in analogical reasoning to project J(B) onto a target object T. That is, the non-deductive part in this reasoning is just SC, which conjectures the property of the target object, and EC must be deductive.

Example 2 (continued). By illustrative analogy, a target is conjectured to satisfy properties used in an explanation of why a base satisfies a similarity. In this example, to explain the phenomena of the base case, "Brutus feels pain when he is cut or burnt", the following sentences must be in A:

    ∀x,i.( Nervous_Sys(x) ∧ Destructive(i) ∧ Suffer(x,i) ⊃ FeelPain(x) ),    (24)
    ∧ Nervous_Sys(Brutus)    (25)
    ∧ Destructive(Cut) ∧ Destructive(Burn)    (26)
From (24), the following follows:

    ∀x,s,p.( Nervous_Sys(x) ∧ Destructive(s) ∧ Destructive(p)
             ∧ (Suffer(x,s) ⊃ FeelPain(x))
             ⊃ (Suffer(x,p) ⊃ FeelPain(x)) ),    (27)

which is an analogy prime rule, that is,

    J_obj(x,s) = Nervous_Sys(x),
    J_att(s,p) = Destructive(s) ∧ Destructive(p),
    Σ(x,s) = Suffer(x,s) ⊃ FeelPain(x),
    Π(x,p) = Suffer(x,p) ⊃ FeelPain(x).

J_att(Cut, Burn) ("Both cut and burn are destructive") is a deductive theorem of A, and a non-deductive conjecture, J_obj(Tacitus, Cut) ("Tacitus has a nervous system"), is obtained by circumstantial reasoning from (24) based on the similarity-based information, Suffer(Tacitus, Cut) ⊃ FeelPain(Tacitus).

4.3  non-deductive EC + deductive SC

As far as the author knows, this type of analogy has never been discussed. Example 3 seems to show this type of analogy.

Example 3 (continued). First, let us consider what we know from example-based information in this case. From the fact that a student (StudentB) was a member of the same club (Orch) and often neglected study (Study), we could find that "the orchestra club keeps its members very busy (BusyClub(Orch))" and that "activities of the club are obstructive to one's study (Obstructive_to(Orch, Study))". This implies that we knew some causal rule like "If it is a busy club and its activities are obstructive to something, then any member of the club neglects the thing."

    ∀x,s,p.( BusyClub(s) ∧ Obstructive_to(p,s) ∧ Member_of(x,s)
             ⊃ Negligent_of(x,p) )    (28)

Using this rule, we found the above information. Thus, the above rule is assumed to be a theorem of A. BusyClub(Orch) and Obstructive_to(Orch, Study) are non-deductive conjectures, and they can be obtained by circumstantial reasoning based on the above rule, which is just an analogy prime rule, as follows:

    J_obj(x,s) = Σ(x,s) = Member_of(x,s),
    J_att(s,p) = BusyClub(s) ∧ Obstructive_to(p,s),
    Π(x,p) = Negligent_of(x,p).

4.4  non-deductive EC + non-deductive SC

As an example of this type, we can take Example 2 again. We might know neither "Brutus has a nervous system" nor "Both cut and burn are destructive", which corresponds to the case that (25) and (26) are not in A (nor any deductive theorem of A) in the previous Example 2. However, by circumstantial reasoning from (24) based on example-based information ("Brutus feels pain when he is cut or burnt"), "Both cut and burn are destructive" (and "Brutus has a nervous system") can be obtained, and based on similarity-based information ("Tacitus feels pain when he is cut"), "Tacitus has a nervous system", a really significant but implicit similarity, is obtained similarly to the previous example. Consequently, the analogical conclusion ("Tacitus would feel pain when he is burnt") is derived from (27) (or (24)) together with the above conjectures.
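As a concrete illustration of circumstantial reasoning (Definition 3; per footnote 2, essentially abduction plus deduction), the following minimal Prolog meta-interpreter sketch conjectures the justifications of Example 2. It is not the author's implementation; the consistency and minimality requirements of Definition 3 are omitted, and the predicate names (rule/2, fact/1, abducible/1, prove/3) are illustrative.

    % Rule (24) and the example-based facts, in object-level form.
    rule(feel_pain(X), (nervous_sys(X), destructive(I), suffer(X, I))).
    fact(suffer(brutus, cut)).
    fact(suffer(brutus, burn)).
    fact(suffer(tacitus, cut)).
    abducible(nervous_sys(_)).
    abducible(destructive(_)).

    % prove(Goal, E0, E): prove Goal, accumulating abduced atoms in E
    % (the set E of Definition 3, without the consistency check).
    prove(true, E, E) :- !.
    prove((A, B), E0, E) :- !, prove(A, E0, E1), prove(B, E1, E).
    prove(A, E, E) :- fact(A).
    prove(A, E0, E) :- abducible(A), (member(A, E0) -> E = E0 ; E = [A | E0]).
    prove(A, E0, E) :- rule(A, Body), prove(Body, E0, E).

    % ?- prove(feel_pain(brutus), [], E).
    %    E = [destructive(cut), nervous_sys(brutus)]   (justifications of the base)
    % ?- prove(feel_pain(tacitus), [], E).
    %    E = [destructive(cut), nervous_sys(tacitus)]  (the implicit similarity)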
5  Conclusion and Remarks

• Through a logical analysis of analogy, it is shown to be reasonable that analogical reasoning is possible only if a certain analogy prime rule is a deductive theorem of a given theory. From the rule, together with an example-based conjecture and a similarity-based conjecture, the analogical conclusion is derived. A candidate is shown for a non-deductive inference system which adequately yields both conjectures.

• Results shown here are general and do not depend on particular pragmatic languages like the purpose predicate [10] nor on some numeric similarity measure [20]. These results can be applied to any normal deductive databases (DDB) which consist of logical sentences.
Application of this analogical reasoning to DDB may be one of the most fruitful. It is, generally speaking, very difficult to build a DDB which involves perfect knowledge about an item. Analogical reasoning will increase the chance of answering queries adequately, even when its deductive operation fails to answer. In a DDB, it is very common to see inheritance rules and transitivity(-like) rules, which have the form of the analogy prime rule, for instance,

    Gran_pa(x, y) :- Parent(x, z), Parent(z, y).    (29)

This is an analogy prime rule w.r.t. < Parent(z, y); Gran_pa(x, y) > (z is a variable for the similar attribute value and x is a variable for the projected attribute value). Assume that a query "?- Gran_pa(x, Tom)" is given to a database A which involves the above rule and the following facts:

    Parent(Sue, Tom).      (30)
    Gran_pa(John, Bob).    (31)
    Parent(Sue, Bob).      (32)

The database cannot answer the query deductively, because it does not know who is a parent of Sue. If the database uses the proposed type of analogical reasoning, it is able to guess Gran_pa(John, Tom) from Bob's case just because Tom is similar to Bob in that their parent is the same. Interestingly, a method which discovers an analogy prime rule from the knowledge base CYC is explored independently [17]. Such methods make analogical reasoning more common in DDB.

• By the side effect of this analysis, it becomes possible to compare analogy formally with other reasoning which has been studied vigorously in the area of artificial intelligence. Analogical reasoning differs from other reasoning, abductive and deductive, in that analogical reasoning actually uses example-based information (the base information). Consider, this time, the difference from abduction in the above database case. Even if the database uses (ordinary) abductive reasoning on the query, it cannot specify an adequate grandparent of Tom; the possible answers will be x s.t. Gran_pa(x, Tom), Parent(x, Sue), (∃z)(Parent(x, z), Parent(z, Tom)), or Sue assuming Parent(Sue, Sue), etc. [2, 14, 18, 9]. The reason for this failure is that abduction tries to explain only the target case.

Moreover, compared with enumerative induction and case-based reasoning (CBR), in which the use of examples is essential similarly to analogical reasoning, analogical reasoning has a salient feature in depending more strongly on a background knowledge (a given theory). Analogy can be seen as a single instance generalization, as Davies and Russell pointed out [3]. Take an example, Example 3. From the analogy prime rule (28) and example-based information of a base case (StudentB), some non-deductive inference (e.g. circumstantial reasoning) yields a more specified analogy prime rule,

    ∀x.( Member_of(x, Orch) ⊃ Negligent_of(x, Study) ),    (33)

which is a generalization of the example-based information,

    Member_of(StudentB, Orch) ∧ Negligent_of(StudentB, Study).    (34)

We should note that, in the process of this single instance generalization, an analogy prime rule in a background knowledge is used as an intermediary, and it might be considered the reason why analogy seems more plausible than a simple single instance generalization which yields (33) just from (34). In the research of formal inductive inference [16, 12], a background knowledge does not play such an important role, so plenty of examples are needed until a plausible conclusion is obtained. Concerning CBR [19], though it uses base cases like analogical reasoning and, in order to retrieve the base cases, it uses an index which corresponds to the similarity Σ, the index is assumed to be given rather than derived using background knowledge. Intuitively speaking, these methods will be very useful when a background knowledge is rather poor or difficult to formulate; when the background knowledge is extremely strong or able to be formulated perfectly, deduction will be most useful; on the other hand, the proposed type of analogy will be useful when it is rather strong but difficult to formulate.
• An implementation system for this type of analogy has been developed. Given a theory A, a target T and a projected attribute Π(x,p) (from a query "?- Π(T,p)"), this system finds a base B, a similarity Σ(x,S) and a projected property Π(x,P) (i.e. "Π(T,P)" is the answer to the query) by a backtracking process, according to the following steps:
1) Find a separate rule SepR s.t. A ⊢ SepR, where SepR = Π(x,p) :- G_att(s,p), G_obj(x,s).
2) Take a similar attribute Σ(x,s) s.t. Σ(x,s) |~c_A G_obj(x,s).
3) Obtain the similar attribute value S by the side effect of a proof A ⊢ ∃s.Σ(T,s).
4) Retrieve a base B and obtain the projected attribute value P by the side effect of a proof A ⊢ ∃x,p.(Σ(x,S) ∧ Π(x,p)).
Here, a separate rule (w.r.t. Π(x,p)) is a Horn clause in which the head is Π(x,p), and no variable of x appears in the same conjunct of the body as a variable of p. This system guesses successfully for the examples shown here, though each of them is translated into a set of Horn clauses. Significant restrictions are needed on the time complexity of this process. Details of this system will be reported elsewhere.
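For concreteness, the following is a minimal Prolog sketch of the idea behind these steps, applied to the grandparent database (29)-(32) above. It is not the author's system; the predicate analogical_answer/1 and the similarity test similar/3 are illustrative assumptions.

    % Deductive database (29)-(32).
    parent(sue, tom).                               % (30)
    gran_pa(john, bob).                             % (31)
    parent(sue, bob).                               % (32)
    gran_pa(X, Y) :- parent(X, Z), parent(Z, Y).    % (29), the analogy prime rule

    % Target T and base B are similar if they share a parent S
    % (the similar attribute value of step 3).
    similar(T, B, S) :- parent(S, T), parent(S, B), B \== T.

    % Steps 2-4 collapsed: retrieve a base that has the projected attribute
    % and project it onto the target, conjecturing only what deduction
    % alone cannot already prove.
    analogical_answer(gran_pa(X, T)) :-
        similar(T, B, _S),
        gran_pa(X, B),
        \+ gran_pa(X, T).

    % ?- analogical_answer(gran_pa(X, tom)).   % X = john

The separate rule, base retrieval and projection of the paper's four steps are here specialised to the single rule (29); a general implementation would search for SepR and Σ instead of fixing them.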
Acknowledgment
I especially wish to thank Satoshi Sato for his frank
comments and challenging problems. I am also grateful
to Koichi Furukawa, Hideyuki Nakashima, Natsuki Oka,
and five anonymous referees for their constructive comments, Makoto Haraguchi and members of ANR-WG,
which was supported by ICOT, for discussions on this
topic, Katsumi Inoue and Hitoshi Matsubara for discussions on abduction and CBR respectively, and Kazuhiro
Fuchi for giving me the opportunity to do this work.
References
[1] Arima, J.: A logical analysis of relevance in analogy, in Proc. of Workshop on Algorithmic Learning
Theory (ALT'.91), (1991).
[2] Cox P.T. and Pietrzykowski T.: Causes for events:
their computation and applications, in: Proc. of
Eighth International Conference on Automated Deduction, Lecture Notes in Computer Science 230
(Springer-Verlag, Berlin, 1986) pp. 608-621.
[3] Davies, T. and Russell, S.J.: A logical approach to reasoning by analogy, in IJCAI-87, pp.264-270 (1987).
[4] Evans, T,G.: A program for the solution of a class
of geometric analogy intelligence test questions, in:
M.Minsky (Ed.), Semantic Information Processing
(MIT Press. Cambridge, MA, 1968).
[5] Falkenhainer, B.: A unified approach to explanation and theory formation, in: J. Shrager & P. Langley (Eds.), Computational Models of Scientific Discovery and Theory Formation (Morgan Kaufmann, San Mateo, CA, 1990).
[6] Gentner, D.:
Structure-mapping:
Theoretical
Framework for Analogy, in: Cognitive Science,
Vol.7. No.2, pp.155-170 (1983).
[7] Greiner, R.: Learning by understanding analogy,
Al'tificial Intelligence, Vol. 35, pp.81-125 (1988).
[8] Haraguchi, M. and Arikawa, S: Reasoning by Analogy as a Partial Identity between Models, in Proc. of
Analogical and Inductive Inference (ALL '86), Lecture Notes in Computer Science 265, (SpringerVerlag, Berlin, 1987) pp 61-87.
[9] Inoue. K.: Linear Resolution for ConsequenceFinding, in A rtificial Intelligence (To appear).
[10] Kedar-Cabelli, S.: Purpose-directed analogy, in the
7th Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum Associates,
pp.150-159 (1985).
[11] Kling, R.E.: A paradigm for reasoning by analogy,
A rtificial Intelligence 2 (1971).
[12] Muggleton, S. and Buntine, W.: Machine Invention
of First-Order Predicates by Inverting Resolution,
In: Proc. of 5th International Conference on Machine Learning, pp 339-352 (1988).
[13] Peirce C.S.: Elements of Logic, in: C. Hartshorne
and P. Weiss (eds.), Collected Papers of Charles
Sanders Peirce, Volume 2 (Harvard University
Press, Cambridge, MA, 1932).
[14] Poole D., Goebel R. and Aleliunas R.: Theorist:
a logical reasoning system for defaults and diagnosis, in: N. Cercone and G. McCalla (eds.), The
Knowledge Frontier: Essays in the Representation
of Knowledge (Springer-Verlag, New York, 1987)
331-352.
[15] Pople, H.E. Jr.: On the mechanization of abductive logic, in: Proceedings IJCAI-73, Stanford, CA (1973) 147-152.
[16] Shapiro, E.Y.,: Inductive Inference of Theories
From Facts, TR 192, Yale Univ. Computer Science
Dept. (1981).
[17] Shen ,W.:Discovering Regularities from Knowledge
Bases, Proc. of Knowledge Discovery in Databases
Workshop 1991, pp 95-107.
[18] Stickel M.E.: Rationale and methods for abductive reasoning in natural-language interpretation,
in: R. Studer (ed.), Natural Language and Logic,
Proceedings of the International Scientific Symposium, Hamburg, Germany, Lecture Notes in Artificial Intelligence 459 (Springer-Verlag, Berlin, 1990)
233-252.
[19] Schank, R.C.: Dynamic Memory: A Theory of Reminding and Learning in Computers and People (Cambridge University Press, London, 1982).
[20] Winston, P.H.: Learning Principles from Precedents and Exercises, Artificial Intelligence, Vol. 19, No. 3 (1982).
Appendix

Proposition 1.  If K |~_A G and K, G |~c_A H, then K |~_A H.

Proof of Proposition 1.
For any formula G s.t. K |~_A G and K, G |~c_A H:
case i)  A, K, G ⊢ H (from K, G |~c_A H). From the premises, A, K, G ⊢ H (from Definition 3 i)). Therefore, K, G |~c_A H.
case ii) otherwise, for some minimal set of atomic formulas E s.t. A, E ⊢ K ∧ G: A, E ⊢ K ∧ H (from K, G |~c_A H). Therefore, A, E ⊢ H. Thus, K, G |~c_A H.
iii) Consistency: if K |~c_A H and A ∪ K is consistent, then A ∪ K ∪ {H} is consistent.
(proof)
A ∪ K is consistent
⇒ A ∪ K ∪ {G} is consistent (from K |~_A G)
⇒ A ∪ E is consistent (from K, G |~c_A H)
⇒ A ∪ K ∪ {H} is consistent (because A, E ⊢ K ∧ H).

Corollary 1.  If K |~c_A G, then K |~_A G.

Proof of Corollary 1.
K |~_A K (from subsuming deduction).
If K |~_A K and K, K |~c_A G, then K |~_A G (from Proposition 1).
Therefore, if K |~c_A G, then K |~_A G.
CONSISTENCY-BASED AND ABDUCTIVE DIAGNOSES AS GENERALISED
STABLE MODELS
Chris Preist, Kave Eshghi
Hewlett Packard Laboratories, Filton Road,
Bristol, BS12 6QZ, Great Britain
cwp@hplb.hpl.hp.com
ke@hplb.hpl.hp.com
Abstract
If realistic systems are to be successfully modelled and
diagnosed using model-based techniques, a more
expressive language than classical logic is required. In
this paper, we present a definition of diagnosis which
allows the use of a nonmonotonic construct, negation as
failure, in the modelling language. This definition is
based on the generalised stable model semantics of
abduction.
Furthermore, we argue that, if negation as failure is permitted in the modelling language, the distinction
between abductive and consistency-based diagnosis is
no longer clear. Our definition allows both forms of
diagnosis to be expressed in a single framework. It also
allows a single inference procedure to perform abductive or consistency-based diagnoses, as appropriate.
1 Introduction
Many different definitions of diagnosis have been used
in an attempt to formalise and automate the diagnosis
process. In the so-called 'logical' approach, two frameworks, namely the consistency-based [Reiter 1987] and
abductive [Cox and Pietrzykowski 1986], have attracted
a lot of attention. Typically, the modelling language
used in these frameworks is first order logic (or some
subset of it). In this paper we present a unified framework for diagnosis which brings together these two
styles of diagnosis, as well as providing a non-monotonic modelling language.
We were primarily motivated by the need to incorporate
negation as failure, the non-monotonic construct in
logic programming, into the modelling language. We
first show the need for this construct through some
examples, and then argue that the incorporation of
negation as failure in the modelling language necessitates the inclusion of both consistency-based and
abductive diagnosis within the same framework. We
then present our unified framework, which allows negation as failure in the modelling language and naturally
incorporates both abductive and consistency-based
diagnosis. We then show that in the special cases, our
approach reduces to pure consistency and pure abductive diagnosis, i.e. it is a generalisation of both styles.
Our work is similar in spirit to the work of Console and
Torasso, [1990],[1991], but goes beyond it in many
ways. We will compare our approach to that of Console
and Torasso in a later section. Our proposed framework
is based on the Generalised Stable Model semantics
[Kakas and Mancarella 1990a] of generalised logic programs with abduction, strengthening the link between
logic programming and diagnosis first explored in [Eshghi 1990].
2 Consistency-based and abductive
approaches to diagnosis
In both consistency-based and abductive approaches, a set of axioms SD (called the system description) models the system under investigation, and a set of abnormality assumptions Ab = {ab_1, ab_2, ..., ab_n} represents the possible underlying causes of failure. A set of statements, OBS, represents observations of the behaviour of the system which are to be explained.
In the consistency-based approach, a diagnosis is a set of abnormality assumptions, Δ, such that

    SD ∪ OBS ∪ Δ ∪ { ¬ab_k | ab_k ∈ Ab − Δ } is consistent.    (1)
The consistency-based approach focuses primarily on a
model of the system's correct behaviour. When the
abnormality assumptions relate to the failure of the
components of the system, it attempts to find a set of
normality and abnormality assumptions which can be
assigned to the system's components to give a theory
consistent with the observations.
In the abductive approach, a diagnosis is a set of abnormality assumptions, Δ, such that

    SD ∪ Δ ⊨ OBS,
    SD ∪ Δ is consistent.    (2)
The abductive approach primarily models the behaviour
of a failing system, by using fault models in the system
description, SD.

Figure 1: A pre-charged line

The diagnosis process consists of looking for a set of abnormality assumptions which, when
adopted, will logically predict the observed faulty
behaviour given the system description and the context
of the observation.
In both approaches, a diagnosis Δ is defined to be minimal if there is no other diagnosis, Δ', which is a proper subset of Δ.
3 The Diagnosis Problem
The system description used in model-based diagnosis
takes one of two forms. It is either a causal model, or a
model consisting of the system's structure and the behaviour of individual components. In general, work on
abductive diagnosis has focused on the former, while
work on consistency-based diagnosis has focused on the
latter.
For the purposes of this paper, we adopt a specification
of a diagnosis problem based on those used in [deKleer
and Williams 1987] and [Reiter 1987], which uses a
component-based approach. However, the results hold
equally for a causal model-based approach, and for this
reason, we adopt slightly more general language in the
definition.
Definition:
A diagnosis problem consists of a triple <SD, OBS, C>, where:
(i) The system description, SD, specifies the behaviour of the system.
(ii) The observation set, OBS, specifies a set of observations of the system as unit clauses.
(iii) C consists of constants, c_j, which represent causal clusters within the system.

Causal clusters are groups of causes of abnormal system behaviour which it makes sense to consider together. Each cause, n, within the cluster, c_j, is modelled in SD with two clauses:

    effects_of_cause_n ← ab(c_j, n).
    ab(c_j) ← ab(c_j, n).

Furthermore, if so desired, we can define emergent properties of the system which occur when none of the causes in cluster c_j are present, the 'good behaviour model' of this cluster:

    good_behaviour_model ← not ab(c_j).

In the component-based approach, c_j represents a component, and each cause in the cluster represents a possible fault model of the component. Note that the effects of a cause need not be defined deterministically. For example, the 'arbitrary behaviour' mode of a component, proposed in [deKleer and Williams 1989], is consistent with any behaviour of the component, but predicts nothing.
The logical language adopted to represent SD can vary with the definition of diagnosis adopted. In this paper, we focus on two possible languages: classical logic, as adopted by Reiter [1987], and Horn clauses with negation as failure, as used in the logic programming community.

4 The need for negation as failure in the system description

The desire to integrate consistency-based and abductive diagnosis was motivated primarily by the need to include negation as failure in our models. The following two examples illustrate this need:

RAM modelling

In order to model the behaviour of a random access memory cell, we needed an axiom that says: the content of a cell at time T is X if X was written to this cell at time T', and no other write operation has been performed between T' and T. The most straightforward way of writing this is as the clause

    contents(Cell, X, T) ← written(Cell, X, T'),
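The clause continues beyond what is legible above. Purely as an illustration (not the paper's own completion), the persistence condition can be written with negation as failure in standard Prolog as follows, where overwritten/3 is an assumed helper predicate:

    contents(Cell, X, T) :-
        written(Cell, X, T1), T1 =< T,
        \+ overwritten(Cell, T1, T).      % no later write up to time T

    overwritten(Cell, T1, T) :-
        written(Cell, _, T2), T1 < T2, T2 =< T.

    written(c1, 5, 1).
    written(c1, 7, 3).

    % ?- contents(c1, X, 4).    % X = 7; the earlier value 5 is superseded.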
Here, we briefly recall the definitions of [Kakas and Mancarella 1990a]:

Definition 1
An abductive framework is a triple <P, A, IC> where
1) P is a set of clauses of the form H ← L_1, ..., L_k (k ≥ 0), where H is an atom and each L_i is a literal.
2) A is a set of predicate symbols, the abducible predicates. The abducibles, Ab, are then all ground atoms with predicate symbols in A.
3) IC, the integrity constraints, is a set of closed formulae.

Hence an abductive framework extends a logic program to include integrity constraints and abducibles. The semantics of this framework is based on the stable model semantics for logic programs:

Definition 2
Let P be a logic program, and M a set of atoms from the Herbrand base. Define P_M to be the set of ground Horn clauses formed by taking ground(P), in clausal form, and deleting:
(i) each clause that has a negative literal ¬l in its body, and l ∈ M;
(ii) all negative literals ¬l in the body of clauses, where l ∉ M.
M is a stable model for P if M is the minimal model of P_M.

This definition is extended to give a semantics to abductive frameworks.

Definition 3
Let <P, A, IC> be an abductive framework, and Δ ⊆ atoms(A) be a set of abducibles. Then the set M(Δ) of ground atoms is a generalised stable model (GSM) for <P, A, IC> iff it is a stable model for the logic program P ∪ Δ, it is a model for the integrity constraints IC, and Δ = A ∩ M(Δ).

The above definition is an extension of that in [Kakas and Mancarella 1990a] to allow abducibles to appear in the head of a clause. As a result of this, the set of abducibles chosen as generators can be smaller than Δ, the set of abducibles true in the generalised stable model.

A unit clause, q, representing an observation, has an abductive explanation with hypothesis set Δ if there exists a generalised stable model, M(Δ), in which q is true. Equivalently, we can say that q has an abductive explanation, Δ, within the abductive framework <P, A, IC> if the abductive framework <P, A, IC ∪ {q}> has a generalised stable model M(Δ). Having q in the integrity constraints imposes the condition that q must be true in the generalised stable model, and hence must follow from the logic program together with the set of abducibles chosen.
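For a finite ground program, Definition 2 can be checked directly. The following is a minimal Prolog sketch (illustrative only, not the authors' machinery), with rules encoded as rule(Head, PositiveBody, NegativeBody); the extra GSM conditions of Definition 3 (satisfaction of IC and Δ = A ∩ M(Δ)) are deliberately not checked here.

    :- use_module(library(lists)).
    :- use_module(library(ordsets)).

    % Gelfond-Lifschitz reduct P_M: drop rules whose negative body meets M,
    % then delete the remaining negative literals.
    reduct(Rules, M, Reduct) :-
        findall(Head-Pos,
                ( member(rule(Head, Pos, Neg), Rules),
                  \+ ( member(N, Neg), member(N, M) ) ),
                Reduct).

    % Least model of the definite reduct, by fixpoint iteration.
    least_model(Reduct, M0, M) :-
        findall(H, ( member(H-Pos, Reduct), subset(Pos, M0) ), Hs),
        sort(Hs, M1),
        (   ord_subtract(M1, M0, []) -> M = M0
        ;   ord_union(M0, M1, M2), least_model(Reduct, M2, M)
        ).

    stable_model(Rules, M) :-
        reduct(Rules, M, Reduct),
        least_model(Reduct, [], LM),
        sort(M, MSet),
        MSet == LM.

    % ?- stable_model([rule(p, [], [q]), rule(q, [], [p])], [p]).     % true
    % ?- stable_model([rule(p, [], [q]), rule(q, [], [p])], [p, q]).  % false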
7  Generalised Stable Models and Diagnosis
The generalised stable model semantics for abduction can be applied to diagnosis by mapping a diagnosis problem, <SD, OBS, C>, with multiple observations, onto an abductive framework as follows:
Represent the system description, SD, as a logic program with integrity constraints, <P, IC>. The integrity constraints will usually contain sentences stating that observation points cannot take multiple values at a given time.
Let the abducibles represent the causes within the clusters, {ab(c_i, n) | c_i ∈ C}; hence A = {ab(X, N)}.
Intuitively, given an observation set OBS, represented by a set of unit clauses, we have a choice of how to use it. We either wish to predict it, giving an abductive diagnosis, or make assumptions to restore the theory to consistency, giving a consistency-based diagnosis. By adding OBS to the integrity constraints, only models in which the observations are true, and hence explained by the system description together with selected abducibles, are legal generalised stable models. Hence we get an abductive diagnosis. If, instead, we add OBS to the logic program representing the system description, then a set of assumptions can only be made if they are consistent with the observations; i.e. the observations, system description and assumptions cannot derive anything which violates the integrity constraints. This will give us consistency-based diagnoses. Furthermore, we can partition OBS into two sets, and predict some observations, OBS_P, while maintaining consistency with others, OBS_C. We do this by placing OBS_P in the integrity constraints, and OBS_C in the logic program.
This allows us to give a definition of unified diagnosis
as follows;
Definition 4
Let <SD, OBS_P, OBS_C, C> be a diagnosis problem, where:
SD is a logic program with integrity constraints, <P, IC>.
OBS_P is the set of observations to be predicted by diagnoses.
OBS_C is the set of observations which diagnoses need to be consistent with.
C is the set of causal clusters in the system.
Then:
Δ is a GSM-diagnosis of <SD, OBS_P, OBS_C, C> iff there is a generalised stable model, M(Δ), of the abductive framework <P ∪ OBS_C, A, IC ∪ OBS_P>, where A = {ab(C, N)} represents the set of possible root causes of misbehaviour in SD.
To demonstrate this, we consider a simple example
from the medical domain, that of pericardial tamponade. The heart consists of two parts: the myocardium is
the muscle which beats, while the pericardium is the
protective sac which surrounds this muscle. If this sac is
pierced, instantaneous pain occurs, which can subside
fairly quickly. However, blood slowly flows into the
pericardium over a period of time, increasing the pressure on the myocardium. Later, the myocardium will
become so compressed that blood does not flow round
the arteries, even though the myocardium itself is functioning perfectly.
The model of this phenomenon is given below. For simplicity, we treat time discretely, in units of hours.

    pulse_ok(T) ←
        normal_cardiac_contraction(T),
        not heart_compressed(T).

    no_pulse(T) ←
        heart_compressed(T).

        ab(pericardium, pierced(T)),

Suppose that the observation is no_pulse(12). Let us consider the generalised stable models of the resulting abductive framework.
If we place the observation in the logic program as a unit clause, any set of abducibles can be assumed as long as they do not violate the integrity constraints - i.e. they must not generate a stable model in which pulse_ok(12) is true. If we assume nothing, the resulting stable model contains pulse_ok(12) as true, resulting in a conflict. There are two possible (minimal) ways to restore consistency. We can assume ab(myocardium, failure(10))¹, and cease to contain normal_cardiac_contraction(12) in the stable model. Alternatively, we assume ab(pericardium, pierced(2))¹, which predicts heart compression at time 12. The resulting stable model will therefore not contain pulse_ok(12), and so be a legitimate generalised stable model of <P ∪ {no_pulse(12)}, A, IC>.
If, instead, we place the observation in the integrity constraints, IC, we are restricted to stable models which contain no_pulse(12). In this case, only by assuming ab(pericardium, pierced(2)) do we generate a stable model which contains no_pulse(12). As this also satisfies IC, it is a legitimate GSM for <P, A, IC ∪ {no_pulse(12)}>.
Hence, by making a choice of where to place the observation, we can generate either consistency-based or abductive diagnoses. Furthermore, if we have a second observation, ecg_good(12), we can choose to treat it in a different way from the first. Let OBS_P = {no_pulse(12)} and OBS_C = {ecg_good(12)}. In this case, the only (minimal) GSM of <P ∪ OBS_C, A, IC ∪ OBS_P> is that generated by ab(pericardium, pierced(2)). However, if we swap OBS_P and OBS_C, the only (minimal) GSM is that generated by ab(myocardium, failure(10)).
Note how the model uses negation-as-failure to handle
the frame problem. If we used classical negation
instead, it would be necessary to have extra clauses to
predict ¬heart_compressed at all relevant times,
resulting in a larger, less understandable, and less efficient model.
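As a small, hedged illustration of this point, default persistence needs no extra clauses when negation as failure is available. The ten-hour delay and the helper normal_cardiac_contraction/1 below are assumptions for the example, not the paper's exact model:

    :- dynamic ab/2.

    heart_compressed(T) :-
        ab(pericardium, pierced(T0)),
        T0 + 10 =< T.                      % compression develops some hours later

    pulse_ok(T) :-
        normal_cardiac_contraction(T),
        \+ heart_compressed(T).            % no explicit "not compressed" clauses needed

    normal_cardiac_contraction(_) :-
        \+ ab(myocardium, failure(_)).

    % ?- assertz(ab(pericardium, pierced(2))), pulse_ok(12).   % fails: heart is compressed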
8 Abductive and consistency-based
diagnosis as special cases
If we restrict our attention to the traditional definitions
of diagnosis, we can show that our definition is equivalent to these under certain conditions.
1 Or, of course,
at any other appropriate time instant.
8.1 Abductive Diagnoses as Generalised
Stable Models
If all the observations are to be predicted in the abductive sense, and the system description contains only
Horn clauses, our definition of diagnosis reduces to the
standard definition of abduction given in section 1. This
is achieved as follows:
Given an abductive diagnosis problem <SD, OBS, A>, where SD is a Horn-clause theory, divide the system description into a set of definite clauses, P, and a set of denials, D. Let A be the set of abducibles.
It is easy to show that abductive diagnoses of SD according to formula (2) correspond to generalised stable models of the framework <P, A, D ∪ OBS>.
8.2 Consistency-Based Diagnoses as
Generalised Stable Models
For a certain class of theories, namely almost-Horn theories, we show that our definition of diagnosis is equivalent to the traditional definition of consistency-based diagnosis given in [Reiter 1987]. An almost-Horn theory
is a theory in which negation is used only to represent
the negation of certain predicates. In the context of our
theorem, these correspond to the abnormality assumptions.
A clause is said to be almost-Horn with respect to A, if,
when in disjunctive normal form, it contains at most
one positive literal with a predicate symbol not in A.
Theorem
Let <SD, OBS, A> be a consistency-based diagnosis problem, with SD a theory which is almost-Horn with respect to A = {ab}. Then define the logic program with integrity constraints, SD* = <P, IC>, as follows:

Definition 5
Let a_j ∈ atoms(A), and p, q_j ∉ atoms(A).
1. For every clause of the form p ← ¬a_1, ¬a_2, ..., ¬a_k, a_{k+1}, ..., a_m, q_1, q_2, ..., q_n in SD, there is a program clause p ← not a_1, not a_2, ..., not a_k, a_{k+1}, ..., a_m, q_1, q_2, ..., q_n in P.
2. For every clause of the form a_1 ∨ a_2 ∨ ... ∨ a_k ∨ ¬a_{k+1} ∨ ... ∨ ¬a_m ∨ ¬q_1 ∨ ¬q_2 ∨ ... ∨ ¬q_n in SD there is an identical clause in IC.
Then:
D is a consistency-based diagnosis of <SD, OBS, A> according to formula (1) if and only if D is a GSM-diagnosis of the corresponding problem with system description SD*, OBS_P = ∅ and OBS_C = OBS.
The proof of this theorem is available in an extended version of this paper, available from the authors.
This theorem shows that, if negation is used only to represent the normality assumptions in the system, ¬ab, then the nonmonotonic definition of diagnosis given by us is equivalent to the monotonic definition given in [Reiter 1987]. However, if negation is used elsewhere in the theory, the two definitions diverge. The classical consistency-based definition requires explicit representation of all negative information. The GSM-diagnosis, however, will make the closed-world assumption, and assume information is false unless it can be proved otherwise.

9 Comparison with Console & Torasso [2]

Console & Torasso have defined a framework for a general abduction problem. This framework allows a spectrum of diagnosis styles to be represented within it, including the pure consistency-based and abductive styles described above.
They divide the observations into two sets. One set, OBS_a, is to be explained by the assumptions, while the other set, OBS_c, must be consistent with the assumptions. They then define two sets:

    Γ = OBS_a,
    Ψ = { ¬f(x) | f(y) ∈ OBS_c, x ≠ y }.

A diagnosis is then a set of abducibles which, when added to the theory, allows prediction of all observations in Γ, and is consistent with the negative literals in Ψ.
Our definition is more powerful in several ways.
It extends the definition of Console and Torasso from Horn-clause theories to general logic
programs with integrity constraints. This gives
a sophisticated and expressive language for
modelling, which includes negation as failure.
The inclusion of the consistency-based observations in the object level, rather than their negations in the integrity constraints, means that
these can be used easily during inference. This
can reduce the time to find a conflict, by using
'backwards simulation' of components. In
some cases, such as the example documented in
[van Soest et al. 1990] , certain diagnoses cannot be found without access to the observations
in this way.
Within this framework, it is possible to define
minimal diagnoses model-theoretically. We
will expand on this in section 10.
Placing the consistency-based observations at the object
level potentially gives us more efficient inference.
However, to do this in the context of joint diagnoses can
lead to problems.
It may be possible to conclude that an abductive observation is true, based on the adding of a consistency-based observation to the theory alone:

    SD:    obs1 → obs2
    OBS_a: obs2
    OBS_c: obs1

By adding obs1 to the system description, we can conclude that obs2 is true. Whether this is legitimate depends on how we interpret the consistency-based observations. If we consider them true, but not necessarily explainable, then this is legitimate. This is the case in Reiter's formalisation of diagnosis, and also in the case of the setting factors of Reggia et al. [1983]. However, if we consider them not necessarily true, merely not false, then this is unacceptable. In such circumstances, it is necessary to restrict the model so that consistency-based observations do not appear in the body of clauses, or use the approach proposed by Console and Torasso.

10 Minimality

We now focus attention on component-based diagnosis, and consider the problem of minimal diagnoses. We wish to restrict our attention to those diagnoses which contain a minimal number of failing components. To do this, we introduce minimal generalised stable models:

Definition:
A generalised stable model, M(Δ), for an abductive framework <P, A, IC>, is minimal if there is no other GSM, M(Δ'), such that Δ' ⊂ Δ.

Hence, a minimal generalised stable model contains a minimal set of assumptions which allow the consequences of the logic program P to satisfy the integrity constraints, IC. Note that, because abductive frameworks are nonmonotonic, this does not imply that any superset of Δ, Φ, will have a GSM, M(Φ).
If, in our diagnosis framework, we have a 1-1 correspondence between a hypothesised failed component and an abducible being assumed in the abductive framework, then minimal generalised stable models will correspond to minimal diagnoses. To do this, we must impose two restrictions on the relationship between the frameworks:
(i) There must be no abducible representing the correct behaviour of a component. This must instead be a default behaviour which is used in the absence of abducibles referring to the faulty behaviour of a component.
(ii) It must be illegal to make more than one assumption about a component's behaviour at a time.
Note that the second condition does not force fault modes to be mutually exclusive in real life, merely that they must be mutually exclusive logically. This can easily be achieved by adding an integrity constraint forbidding a component to have two modes:

    false ← ab(ci, mj1), ab(ci, mj2), mj1 ≠ mj2.

The framework provided by Console and Torasso satisfies the second of these conditions, but not the first. Because they work in a monotonic framework, it is not possible to represent the correct behaviour of a component as the default behaviour; instead, it must be explicitly assumed that a component behaves correctly. As a result of this, they must specify a semantic minimisation criterion: a diagnosis is minimal if it contains a minimal set of abducibles corresponding to faulty behaviour. We, however, can specify a model-theoretic criterion:
A diagnosis, Δ, is minimal if its corresponding GSM, M(Δ), is a minimal GSM.

11 Calculating Diagnoses

By providing a uniform model-theoretic framework for consistency-based, abductive and joint diagnoses, we have also provided a method for a uniform implementation. We simply need an algorithm for generating the minimal generalised stable models of an abductive framework, and we can use this for performing a variety of diagnosis tasks.
Much work has been carried out on the generation of
stable models, and several efficient algorithms exist.
However, as general stable models are a newer innovation, these results have yet to be fully exploited and
extended to the GSM case. Currently, the state of the art
in GSM generation is provided by Satoh and Iwayama
[1991]. This work, however, has the drawback that it
does not produce minimal GSMs.
Traditionally, in the abductive community, top-down
algorithms have been used which tend to generate minimal solutions, as they avoid making irrelevant assumptions. (e.g. [Cox and Pietrzykowski 1986] [Kakas and
Mancarella 1990b]) However, non-minimal abductive
diagnoses are still acceptable in the model-theoretic
semantics, and can be generated by the algorithms.
Similarly, in the diagnosis community, generation of
minimal diagnoses has tended to be a consequence of
the algorithm selected (e.g. the ATMS in [deKleer and
Williams 1987]) rather than a model-theoretic restriction.
However, Eshghi [1990] proposes an alternative
approach. He generates a theory in which minimal diagnoses correspond exactly to the stable models of the
theory. This means that non-minimal diagnoses are
excluded by the semantics, rather than the algorithm.
By extending these results beyond the almost-Horn case,
we are able to transform an abductive framework into a
logic program. The stable models of this logic program
correspond exactly to the minimal generalised stable
models of the abductive framework. This means that
minimality is brought into the theory as a necessary
property of each solution, rather than being a selection
criterion between solutions. This work is currently in
progress.
[deKleer and Williams 1989] J. deKleer & B. Williams.
Diagnosis with Behavioural Modes. Proceedings of
the Eleventh International Joint Conference on
Artificial Intelligence, Detroit 1989.
As a result of this, a wider variety of literature can be
used to select appropriate and efficient algorithms,
rather than being restricted to algorithms which have
been developed specifically for the task of diagnosis.
[Dressler 1990] O.Dressler. Computing Diagnoses as
Coherent Assumption Sets. Proceedings of the First
International Workshop on Principles of Diagnosis,
Menlo Park 1990
12
[Eshghi 1990] K. Eshghi. Diagnoses as Stable Models.
Proceedings of the First International Workshop on
Principles of Diagnosis, Menlo Park 1990
Conclusions
By moving to a nonmonotonic logical framework, it is
possible to bring abductive and consistency-based diagnosis together, and use the same inference method to
perform both. We have done this by using generalised
stable models to provide the semantics, which provides
us with a rich and expressive modelling language. It
also gives a link between diagnosis and logic programming, allowing application of theoretical and practical
logic programming results to the domain of diagnosis.
Acknowledgements
Thanks to Bruno Bertolino and Enrico Coiera for their
assistance.
References
[Console et al. 1990] L.Console, D. Theseider Dupre &
P.Torasso. A Completion Semantics for Object-level
Abduction. Proc. AAAI Symposium in Automated
Abduction, 1990.
[Console et al. 1991] L.Console, D. Theseider Dupre &
P.Torasso. On the relationship between abduction
and deduction. Journal of Logic and Computation,
2(5), Sept. 1991.
[Console and Torasso 1990] L.Console & P. Toras so.
Integrating Models of the Correct Behaviour into
Abductive Diagnosis. Proceedings of the 9th
European Conference on Artificial Intelligence,
1990.
[deKleer and Williams 1987] J. deKleer & B. Williams.
Diagnosing Multiple Faults. Artificial Intelligence
32:97 -130, 1987.
[Eshghi and Kowalski 1989] K. Eshghi & R Kowalski.
Abduction compared with Negation as Failure.
Proceedings of the 6th Int. Conf. on Logic
Programming, Lisbon 1989, pp234-254.
[Eshghi and Preist 1992] K. Eshghi and C. Preist. The
Cachebus Experiment: Model Based Diagnosis
applied to a Real Problem in Industrial Applications
of Knowledge-Based Diagnosis, ed Guida and
Stefanini, Elsevier 1992.
[Gelfond and Lifshitz 1988] M. Gelfond & V. Lifshitz.
The Stable Model Semantics for Logic
Programming. Proceedings of the Fifth International
Conference on Logic Programming, 1988.
[Kakas and Mancarella 1990a] A. Kakas & P.
Mancerella. Generalised Stable Models: A
Semantics for Abduction. Proceedings of the 9th
European Conference on Artificial Intelligence,
1990.
[Kakas and Mancarella 1990b] A. Kakas & P.
Mancarella. On the relation between Truth
Maintenance and Abduction. Proceedings of
PRICAI, 1990.
[Reiter 1987] R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence Journal 32, 1987.
[Reggia et al. 1983] J.A. Reggia, D.S. Nau & P.Y. Wang. Diagnostic Expert Systems based on a Set Covering Model. Int. J. of Man-Machine Studies 19, p437-460, 1983.
[Console and Torasso 1991] L.Console & P.Torasso. A
Spectrum of Logical Definitions of Model-Based
Diagnosis. University of Torino Technical Report,
1991.
[Satoh and Iwayama 1991] K. Satoh & N. Iwayama.
Computing Abduction by using the TMS.
Proceedings of the Eighth International Conference
on Logic Programming, 1991.
[Cox and Pietrzykowski 1986] P.T. Cox & T.
Pietrzykowski.
Causes for Events:
their
Computation and Application. Proc. 8th conference
on Computer Aided Design and Engineering, 1986.
[Shanahan 1989] M. Shanahan. Prediction is Deduction
but Explanation is Abduction. Proceedings of the
Eleventh International Joint Conference on
Artificial Intelligence, Detroit 1989.
[Davis 1984] R Davis. Diagnostic Reasoning based on
Structure and Behaviour. Artificial Intelligence
24:347-410, 1984.
[van Soest et al. 1990] D.C. van Soest, R.R. Bakker, F. van Raalte & N.J.I. Mars. Improving effectiveness of model-based diagnosis. Proc. 10th international workshop on expert systems and their applications, Avignon 1990.
[deKleer et al. 1990] J. deKleer, A. Mackworth & R. Reiter. Characterizing Diagnoses. Proceedings of
the Eighth National US Conference on Artificial
Intelligence, Boston 1990.
A Forward-Chaining Hypothetical Reasoner Based on
Upside-Down Meta-Interpretation
Yoshihiko Ohta
Katsumi Inoue
Institute for New Generation Computer Technology
Mita'Kokusai Bldg. 21F, 1-4-28 Mita, Minato-ku, Tokyo 108, Japan
{ohta, inoue}@icot.or.jp
Abstract
A forward-chaining hypothetical reasoner with the
assumption-based truth maintenance system (ATMS)
has some advantages such as avoiding repeated proofs.
However, it may prove subgoals unrelated to proofs of
the given goal. To simulate top-down reasoning on
bottom-up reasoners, we can apply the upside~down
meta-interpretation method to hypothetical reasoning.
Unfortunately, when programs include negative clauses,
it does not achieve speedups because checking the consistency of solutions by negative clauses should be globally
evaluated. This paper describes a new transformation
algorithm of programs for efficient forward-chaining hypothetical reasoning. In the transformation algorithm,
logical dependencies between a goal and negative clauses
are analyzed to find irrelevant negative clauses, so that
the forward-chaining hypothetical reasoners based on the
upside-down meta-interpretation can restrict consistency
checking of negative clauses to those relevant clauses.
The transformed program has been evaluated with a
logic circuit design problem.
1
Introduction
Hypothetical reasoning [Inoue 88] is a technique for proving the given goal from axioms together with a set of hypotheses that do not contradict with the axioms. Hypothetical reasoning is related to abductive reasoning and
default reasoning.
A forward-chaining hypothetical reasoner can be constructed by simply combining a bottom-up reasoner
with the assumption-based truth maintenance system
(ATMS) [de Kleer 86-1] (for example [Flann et al. 87,
Junker 88]). We have implemented a forward-chaining
hypothetical reasoner [Ohta and Inoue 90], called APRICOT /0, which consists of the RETE-based inference
engine [Forgy 82] and the ATMS. With this architecture, we can reduce the total cost of the label computations of the ATMS by giving intermediate justifications to the ATMS at two-input nodes in the RETElike networks. On the other hand, hypothetical rea-
soning based on top-down reasoning has been proposed
in [Poole et al. 87, Poole 91]. Compared with top-down
(backward-chaining) hypothetical reasoning, bottom-up
(forward-chaining) hypothetical reasoning has the advantage of avoiding duplicate proofs of repeated subgoals and duplicate proofs among different contexts. Bottom-up reasoning, however, has the disadvantage of proving unnecessary subgoals that are unrelated to the proofs of the goal.
To avoid the disadvantage of bottom-up reasoning,
Magic Set method [Bancilhon et al. 86] and Alexander
method [Rohmer et al. 86] have been proposed for deductive database systems. Recently, it is shown that
Magic Set and Alexander methods are interpreted as
specializations of the upside-down meta-interpretation
[Bry 90). The upside-down meta-interpretation has been
extended to abduction and deduction with non-Horn
clauses in [Stickel 91]. His abduction, however, does not
require the consistency of solutions.
Since the consistency requirement is crucial for some
applications, we would like to make programs include negative clauses for our hypothetical reasoning. When programs include negative clauses, however, the upside-down meta-interpretation method does not achieve speedups, because checking the consistency of solutions by negative clauses should be globally evaluated.
We present a new transformation algorithm of programs for efficient forward-chaining hypothetical reasoning based on the upside-down meta-interpretation. In
the transformation algorithm, logical dependencies between a goal and negative clauses are analyzed to find
irrelevant negative clauses, so that the forward-chaining
hypothetical reasoners based on the upside-down metainterpretation can restrict consistency checking of negative clauses to those relevant clauses. The transformed
program has been evaluated with a logic circuit design
problem.
In Section 2, our hypothetical reasoning is defined with the default proofs [Reiter 80]. In Section 3, the outline of the ATMS is sketched. Section 4 shows the basic algorithm for hypothetical reasoning based on the bottom-up reasoner MGTP [Fujita and Hasegawa 91] together with
the ATMS. Section 5 presents two transformation algorithms based on the upside-down meta-interpretation.
One is a simple transformation algorithm, the other is
the transformation algorithm with the abstracted dependency analysis. We have implemented the hypothetical
reasoner and these program transformation systems, and
Section 6 shows the result of an experiment for the evaluation of the transformed programs. In Section 7, related
works are considered.
2  Problem Definition

In this section, we define our hypothetical reasoning based on a subset of normal default theories [Reiter 80]. A normal default theory (D, W) and a goal G are given as follows:

• W: a set of Horn clauses. A Horn clause is represented in an implicational form,

    α_1 ∧ ... ∧ α_n → β    (1)
or
    α_1 ∧ ... ∧ α_n → ⊥.   (2)

Here, α_i (1 ≤ i ≤ n; n ≥ 0) and β are atomic formulas, and ⊥ designates falsity. Function symbols are restricted to 0-ary function symbols. All variables in a clause are assumed to be universally quantified in front of the clause. Each Horn clause has to be range-restricted, that is, all variables in the consequent β have to appear in the antecedent α_1 ∧ ... ∧ α_n. A Horn clause of the form (2) is called a negative clause.

• D: a set of normal defaults. A normal default is an inference rule,

    α : β / β    (3)

where α, called the prerequisite of the normal default, is restricted to a conjunction α_1 ∧ ... ∧ α_n of atomic formulas, and β, called its consequent, is restricted to an atomic formula. Function symbols are restricted to 0-ary function symbols. All variables in the consequent β have to appear in the prerequisite α. A normal default with free variables is identified with the set of its ground instances. The normal default can be read as "if α and it is consistent to assume β, then infer β".

• goal G: a conjunction of atomic formulas. All variables in G are assumed to be existentially quantified.

Let Δ̃ be the set of all ground instances of the normal defaults of D. A default proof [Reiter 80] of G with respect to (D, W) is a sequence Δ_0, ..., Δ_k of subsets of Δ̃ if and only if

1. W ∪ CONSEQUENTS(Δ_0) ⊢ G,
2. for 1 ≤ i ≤ k, W ∪ CONSEQUENTS(Δ_i) ⊢ PREREQUISITES(Δ_{i−1}),
3. Δ_k = ∅,
4. W ∪ ⋃_{i=0}^{k} CONSEQUENTS(Δ_i) is consistent,

where PREREQUISITES(Δ_{i−1}) = ∧ α for (α : β / β) ∈ Δ_{i−1}, and CONSEQUENTS(Δ_i) = { β | (α : β / β) ∈ Δ_i }.

A ground instance Gθ of the goal G is an answer to G from (D, W) if

    W ∪ ⋃_{i=0}^{k} CONSEQUENTS(Δ_i) ⊨ Gθ,

where the sequence Δ_0, ..., Δ_k is a default proof of G with respect to (D, W). If Gθ is an answer to G from (D, W), θ is an answer substitution for G from (D, W). A support for an answer Gθ from (D, W) is ⋃_{i=0}^{k} CONSEQUENTS(Δ_i), where the sequence Δ_0, ..., Δ_k is a default proof of Gθ with respect to (D, W). For an answer Gθ from (D, W), the minimal supports for Gθ from (D, W), written as MS(Gθ), is the set of minimal elements in all supports for Gθ from (D, W). The solution to G from (D, W) is the set of all pairs (Gθ, MS(Gθ)), where Gθ is an answer to G from (D, W) and MS(Gθ) is the minimal supports for Gθ. The task of our hypothetical reasoning is defined to find the solution to a given goal from a given normal default theory.
3  ATMS

The ATMS [de Kleer 86-1] is used as one component of our hypothetical reasoner. The following is the outline of the ATMS.
In the ATMS, a ground atomic formula is called a datum. For some datum N, Γ_N designates an assumption. The ATMS treats both ⊥ and Γ_N as special data. The ATMS represents each datum as an ATMS node:

    (datum, label, justifications).
Justifications correspond to ground Horn clauses and are
incrementally input to the ATMS. Each justification is
denoted by:

    (N_1, ..., N_n ⇒ N),

where N_i and N are data. Each datum N_i is called an antecedent, and the datum N is called a consequent. In the slot justifications, the ATMS records the set of antecedents of justifications whose consequents correspond to the datum.
Let H be a current set of assumptions. An assumption set E ⊆ H is called an environment. When we denote an environment by a set of assumptions, each assumption Γ_N is written as N by omitting the letter Γ. Let J be a current set of justifications. An environment E is called nogood if J ∪ E derives ⊥. The label of the datum N is the set of environments {E_1, ..., E_j, ..., E_m} that satisfies the following four properties [de Kleer 86-1]:

1. N holds in each E_j (soundness),
2. every environment in which N holds is a superset of some E_j (completeness),
3. each E_j is not nogood (consistency),
4. no E_j is a subset of any other (minimality).

If the label of a datum is not empty, the datum is believed; otherwise it is not believed. A basic algorithm to compute labels [de Kleer 86-1] is as follows. When a justification is incrementally input to the ATMS, the ATMS updates the labels relevant to the justification in the following procedure.

Step 1: Let L be the current label of the consequent N of the justification and L_i be the current label of the i-th antecedent N_i of the justification. Set L' = L ∪ { x | x = ⋃_{i=1}^{n} E_i, where E_i ∈ L_i }.
Step 2: Let L'' be the set obtained by removing nogoods and subsumed environments from L'. Set the new label of N to L''.
Step 3: Finish this updating if L is equal to the new label.
Step 4: If N is ⊥, then remove all new nogoods from labels of all data other than ⊥.
Step 5: Update labels of the consequents of the recorded justifications which contain N as their antecedents.

4  Hypothetical Reasoner with ATMS and MGTP

The MGTP [Fujita and Hasegawa 91] is a model generation theorem prover for checking the unsatisfiability of a first-order theory P. Each clause in P is denoted by:

    α_1 ∧ ... ∧ α_n → β_1 ∨ ... ∨ β_m,

where α_i (1 ≤ i ≤ n; n ≥ 0) and β_j (1 ≤ j ≤ m; m ≥ 0) are atomic formulas and all variables in β_1 ∨ ... ∨ β_m have to appear in α_1 ∧ ... ∧ α_n. Each clause in P is translated into a KL1 [Ueda and Chikayama 90] clause. Then, model candidates are generated from the set of KL1 clauses. The MGTP works as a bottom-up reasoner on the distributed-memory multiprocessor called Multi-PSI.
As shown in Figure 1, we can construct a hypothetical reasoner by combining the MGTP with the ATMS. The normal default theory (D, W) is translated into a program P,

    P = { α_1 ∧ ... ∧ α_n → assume(β) | (α_1 ∧ ... ∧ α_n : β / β) ∈ D } ∪ W,

where assume is a metapredicate not appearing anywhere in D and W.

Figure 1: Forward-Chaining Hypothetical Reasoner with ATMS and MGTP
procedure R(G, P):
begin
    B_0 := ∅;
    J_0 := { (⇒ β) | (→ β) ∈ P }
         ∪ { (Γ_β ⇒ β) | (→ assume(β)) ∈ P };
    s := 0;
    while J_s ≠ ∅ do
    begin
        s := s + 1;
        B_s := UpdateLabels(J_{s−1}, ATMS);
        J_s := GenerateJustifications(B_s, P, B_{s−1})
    end;
    Solution := ∅;
    for each θ such that Gθ ∈ B_s do
    begin
        L_Gθ := GetLabel(Gθ, ATMS);
        Solution := Solution ∪ {(Gθ, L_Gθ)}
    end;
    return Solution
end.

Figure 2: Reasoning Algorithm with ATMS and MGTP
The reasoning procedure R(G,P) for the MGTP with
the ATMS is shown in Figure 2. The reasoning proce-
dure consists of the part for the UpdateLabels–GenerateJustifications cycles and the part for constructing the solution. The UpdateLabels–GenerateJustifications cycles are repeated while J_s is not empty. The ATMS updates the labels related to a justification set J_{s−1} given by the MGTP. The ATMS returns the set B_s of all the data whose labels are not empty after the ATMS has updated labels with J_{s−1}. The procedure UpdateLabels(J_{s−1}, ATMS) returns a believed data set B_s. The MGTP generates each set J_s of justifications by matching elements of B_s with the antecedent of every clause related to new believed data. The procedure GenerateJustifications(B_s, P, B_{s−1}) returns a new justification set J_s. If any element in (B_s \ B_{s−1}) can match an element of the antecedent of any (α_1 ∧ ... ∧ α_n → X) in P and there exists a ground substitution σ for all α_i such that α_iσ ∈ B_s, then J_s is as follows:

• (α_1σ, ..., α_nσ, Γ_βσ ⇒ βσ) ∈ J_s if X = assume(β).

The procedure GetLabel(Gθ, ATMS) returns the label of Gθ and is used in constructing the solution. Note that the label of Gθ corresponds to the minimal supports for Gθ. The hypothetical reasoner with the ATMS and the MGTP can avoid duplicate proofs among different contexts and repeated proofs of subgoals. However, there may be a lot of unnecessary proofs unrelated to the proofs of the goal.
5 Upside-Down Meta-Interpretation

5.1 Simple Transformation Algorithm
Bottom-up reasoning has the disadvantage of proving unnecessary subgoals that are not related to proofs of the given goal. We introduce a simple transformation of a program P, on the basis of the upside-down meta-interpretation, to speed up bottom-up reasoning by incorporating goal information. A bottom-up reasoner interprets a Horn clause

    α1 ∧ ... ∧ αn → β

in such a way that the fact βσ is derived if facts α1σ, ..., αnσ are present for some substitution σ. On the other hand, a top-down reasoner interprets it in such a way that goals α1σ, ..., αnσ are derived if a goal βσ is present, and the fact βσ is derived if both a goal βσ and facts α1σ, ..., αnσ are present. If we transform the Horn clause into

    goal(β) → goal(αi)   for every αi (1 ≤ i ≤ n), and
    goal(β) ∧ α1 ∧ ... ∧ αn → β,

then a bottom-up reasoner can simulate top-down reasoning. Here, goal is a metapredicate symbol which does not appear in the original program P. After some facts related to the proofs of the goal have been derived with the upside-down meta-interpretation, those facts may derive a contradiction under the bottom-up interpretation of the original program. Thus, we transform each negative clause

    α1 ∧ ... ∧ αn → ⊥

into

    α1 ∧ ... ∧ αn → ⊥   and   → goal(αi)

for every αi (1 ≤ i ≤ n). This means that every subgoal related to negative clauses is evaluated.
Note that (goal(β) → goal(αi)) or (→ goal(αi)) may not satisfy the range-restricted condition. We have some techniques which make every clause in transformed programs range-restricted. Here, we take a very simple technique in which only the predicate symbols are used as the arguments of the metapredicate goal. When γ is an atomic formula, we denote by γ̂ the predicate symbol of γ. The algorithm T1 shown in Figure 3 transforms an original program P into the program P̄ in which the top-down information is incorporated. The solution to G from T1(G, P) is always the same as the solution to G from P because all subgoals related to negative clauses, as well as the given goal, are evaluated and every label of goal(β̂) for any atomic formula β is {∅}.
For example, consider a program

Pb = {
    → penguin(a),
    penguin(X) → bird(X),
    bird(X) → assume(fly(X)),
    fly(X) ∧ notfly(X) → ⊥,
    penguin(X) → notfly(X) }.

By the simple transformation algorithm, we get

T1(fly, Pb) =
    { goal(penguin) → penguin(a),
      goal(bird) ∧ penguin(X) → bird(X),
      goal(bird) → goal(penguin),
      goal(fly) ∧ bird(X) → assume(fly(X)),
      goal(fly) → goal(bird),
      fly(X) ∧ notfly(X) → ⊥,
      → goal(fly),
      → goal(notfly),
      goal(notfly) ∧ penguin(X) → notfly(X),
      goal(notfly) → goal(penguin) }
    ∪ { → goal(fly) }.
Next, consider the goal bird(X). Then, the transformed program T1(bird, Pb) is the program

    T1(bird, Pb) = { ... } ∪ { → goal(bird) },

where only the last element (→ goal(fly)) of T1(fly, Pb) is replaced with (→ goal(bird)). Even if the goal is bird(X), both goal(fly) and goal(notfly) are evaluated because { ... } includes (→ goal(fly)) and (→ goal(notfly)) for the negative clause. Then, the computational cost of R(bird(X), T1(bird, Pb)) is nearly equal to the cost of R(fly(X), T1(fly, Pb)).
procedure T1(G, P):
begin
    P̄ := ∅;
    for each (α1 ∧ ... ∧ αn → X) ∈ P do
    begin
        if X = ⊥ then
        begin
            P̄ := P̄ ∪ {α1 ∧ ... ∧ αn → ⊥};
            for j := 1 until n do
                P̄ := P̄ ∪ { → goal(α̂j)}
        end
        else if X = assume(β) then
        begin
            P̄ := P̄ ∪ {goal(β̂) ∧ α1 ∧ ... ∧ αn → assume(β)};
            for j := 1 until n do
                P̄ := P̄ ∪ {goal(β̂) → goal(α̂j)}
        end
        else if X = β then
        begin
            P̄ := P̄ ∪ {goal(β̂) ∧ α1 ∧ ... ∧ αn → β};
            for j := 1 until n do
                P̄ := P̄ ∪ {goal(β̂) → goal(α̂j)}
        end
    end;
    P̄ := P̄ ∪ { → goal(Ĝ)};
    return P̄
end.

Figure 3: Simple Transformation Algorithm T1
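Purely as an illustration of the clause-level bookkeeping in Figure 3, T1 can be sketched in Prolog over a term representation of P. The representation below (clauses as cl(Antecedents, Consequent) terms with consequent false, assume(B) or an atom, and goal/1 applied to predicate symbols) is our own assumption, not the KL1 encoding evaluated in Section 6.

% Sketch of T1: Program and Transformed are lists of cl(Antecedents,
% Consequent) terms, where Antecedents is a list of atoms.
t1(Goal, Program, Transformed) :-
    foldl(t1_clause, Program, [], Clauses),
    functor(Goal, G, _),
    Transformed = [cl([], goal(G)) | Clauses].

% Negative clause: keep it and demand evaluation of every antecedent.
t1_clause(cl(As, false), Acc, Out) :- !,
    findall(cl([], goal(P)),
            ( member(A, As), functor(A, P, _) ),
            GoalFacts),
    append([cl(As, false) | GoalFacts], Acc, Out).
% assume(B) or an ordinary consequent B: guard the clause by goal of
% B's predicate symbol and propagate goal information downwards.
t1_clause(cl(As, X), Acc, Out) :-
    ( X = assume(B) -> true ; B = X ),
    functor(B, PB, _),
    findall(cl([goal(PB)], goal(PA)),
            ( member(A, As), functor(A, PA, _) ),
            GoalRules),
    append([cl([goal(PB)|As], X) | GoalRules], Acc, Out).

Applied to a term encoding of Pb with goal fly(X), a sketch like this should reproduce the clause set written above for T1(fly, Pb).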
5.2 Transformation Algorithm with Abstracted Dependency Analysis

In this subsection, we describe a static method to find negative clauses that are irrelevant to the evaluation of the goal. If we can find such irrelevant negative clauses, then for every antecedent αi of each irrelevant clause we do not need to add (→ goal(α̂i)) into the transformed program. We try to find them by analyzing logical dependencies between the goal and each negative clause at the abstracted level. We do not care about any argument in the abstracted dependency analysis.

When γ is an atomic formula, we denote by the proposition γ̂ the predicate symbol of γ. For each negative clause C, the proposition false(C) is used as the identifier of C. For every (α → assume(β)), β̂ is called an assumable-predicate symbol. For any environment E, its abstracted environment (denoted by Ê) is {fβ̂ | fβ ∈ E}. The abstracted justifications J with respect to P are defined as:

    J = {(α̂1, ..., α̂n, fβ̂ ⇒ β̂) | (α1 ∧ ... ∧ αn → assume(β)) ∈ P}
      ∪ {(α̂1, ..., α̂n ⇒ β̂) | (α1 ∧ ... ∧ αn → β) ∈ P}
      ∪ {(α̂1, ..., α̂n ⇒ false(C)) | C = (α1 ∧ ... ∧ αn → ⊥), C ∈ P}.

Let Ñ be the set of propositions appearing in J. Note that Ñ consists of all predicate symbols in P and all false(C) for C ∈ P. For each proposition N in Ñ, we compute a set of abstracted environments on which N depends. Now, we show an algorithm to compute the set of abstracted environments. This algorithm is obtained by modifying the label-updating algorithm shown in Section 3. The modified points are as follows.

1. Replace Step 2 with
   Step 2': Set the new label of N to L'.
2. Remove Step 4.

Every proposition in Ñ is labeled with the set of abstracted environments obtained by applying the modified algorithm to the abstracted justifications J. This label is called the abstracted label of the proposition. The system that computes the set of abstracted environments for each proposition is called an abstracted dependency analyzer. The reasons why we have to modify the label-updating algorithm are as follows. Firstly, in the abstracted justifications, every ⊥ is replaced with the proposition false(C) for the negative clause C, so that each abstracted label is always consistent. Thus, we do not need Step 4. Secondly, each abstracted label may not be minimal because we replace Step 2 with Step 2'. Suppose that every abstracted label were minimal. Then, the theorem we present below might not hold. For example, let

    Pe = { → p(a), → p(b), → q(b),
           q(X) → t(X),
           p(X) → assume(r(X)),
           p(X) → assume(s(X)),
           r(a) → g,
           r(X) ∧ s(X) → g,
           r(X) ∧ s(X) ∧ t(X) → ⊥ }.

Consider the problem defined with the goal g and Pe. The abstracted label of g is {{r}, {r, s}}. The abstracted label of the negative clause is {{r, s}}. The abstracted environment {r, s} cannot be omitted for g although the set of minimal elements in the abstracted label of g is {{r}}.
procedure T2(G, P):
begin
    P̄ := ∅;
    J := ∅;
    k := 0;
    for each (α1 ∧ ... ∧ αn → X) ∈ P do
    begin
        if X = ⊥ then
        begin
            k := k + 1;
            P̄ := P̄ ∪ {α1 ∧ ... ∧ αn → ⊥};
            J := J ∪ {(α̂1, ..., α̂n ⇒ false(k))}
        end
        else if X = assume(β) then
        begin
            P̄ := P̄ ∪ {goal(β̂) ∧ α1 ∧ ... ∧ αn → assume(β)};
            J := J ∪ {(α̂1, ..., α̂n, fβ̂ ⇒ β̂)};
            for j := 1 until n do
                P̄ := P̄ ∪ {goal(β̂) → goal(α̂j)}
        end
        else if X = β then
        begin
            P̄ := P̄ ∪ {goal(β̂) ∧ α1 ∧ ... ∧ αn → β};
            J := J ∪ {(α̂1, ..., α̂n ⇒ β̂)};
            for j := 1 until n do
                P̄ := P̄ ∪ {goal(β̂) → goal(α̂j)}
        end
    end;
    UpdateAbstractedLabels(J, ADA);
    La := GetAbstractedLabel(Ĝ, ADA);
    for i := 1 until k do
    begin
        Li := GetAbstractedLabel(false(i), ADA);
        for each Ea ∈ La do
            for each Ei ∈ Li do
                if Ei ⊆ Ea then
                    for each (α̂1, ..., α̂n ⇒ false(i)) ∈ J do
                        for j := 1 until n do
                            P̄ := P̄ ∪ { → goal(α̂j)}
    end;
    P̄ := P̄ ∪ { → goal(Ĝ)};
    return P̄
end.

Figure 4: Transformation Algorithm T2 with Abstracted Dependency Analysis
Theorem: Let P be a normal default theory and G a goal, J the abstracted justifications with respect to P, L(G) the abstracted label of G, and L(false(C)) the abstracted label of false(C) where C ∈ P. If no element in L(false(C)) is a subset of any element in L(G), then the solution to G from P is equivalent to the solution to G from P \ {C}.

Sketch of the proof: Let C be (α → ⊥) and P' be P \ {C}. Assume that θm is any answer substitution for G from P' and σk is any answer substitution for α from P'. Let MS(ασk) be the minimal supports for ασk from P' and MS(Gθm) be the minimal supports for Gθm from P'. Suppose that no element in L(false(C)) is a subset of any element in L(G). From the supposition and the similarity between ATMS labels and abstracted labels, no element in MS(ασk) is a subset of any element in MS(Gθm). Therefore, the solution to G from P' ∪ {C} is the same as the solution to G from P'. ∎
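The theorem's condition is just a subset test over abstracted labels. As an illustration only, and under the assumption that the abstracted labels are available as lists of lists of assumable-predicate symbols, the test can be written as:

% A negative clause C may be excluded from consistency checking when no
% abstracted environment of false(C) is contained in an abstracted
% environment of the goal G (the condition of the theorem).
irrelevant_negative_clause(LabelFalseC, LabelGoal) :-
    \+ ( member(Ei, LabelFalseC),
         member(Ea, LabelGoal),
         subset(Ei, Ea) ).

For Pb with goal bird, irrelevant_negative_clause([[fly]], [[]]) succeeds, matching the conclusion drawn below that the negative clause need not be evaluated.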
On the basis of the theorem, we can omit consistency checking for a negative clause C if the condition of the theorem is satisfied. The transformation algorithm T2(G, P) with the abstracted dependency analysis is shown in Figure 4 for the program P and the goal G. In Figure 4, UpdateAbstractedLabels(J, ADA) denotes the procedure which computes abstracted labels from the abstracted justifications J with the abstracted dependency analyzer ADA, and GetAbstractedLabel(G, ADA) denotes the procedure which returns the abstracted label of G from the abstracted dependency analyzer ADA. The procedure transforms an original program into the program in which the top-down information is incorporated and consistency checking is restricted to those negative clauses relevant to the given goal.
Consider the same example Pb, shown in the previous subsection, in the case that the goal is bird(X). The abstracted justifications Jb are

    { (⇒ penguin), (penguin ⇒ bird), (bird, f_fly ⇒ fly),
      (fly, notfly ⇒ false(1)), (penguin ⇒ notfly) }.

As the result of the abstracted dependency analysis, the abstracted label of false(1) is {{fly}} and the abstracted label of bird is {∅}. Then, no element in the abstracted label of false(1) is a subset of any element in the abstracted label of bird, so we do not need to evaluate this negative clause. As a consequence, we have the transformed program:
T2(bird, Pb) =
    { goal(penguin) → penguin(a),
      goal(bird) ∧ penguin(X) → bird(X),
      goal(bird) → goal(penguin),
      goal(fly) ∧ bird(X) → assume(fly(X)),
      goal(fly) → goal(bird),
      fly(X) ∧ notfly(X) → ⊥,
      goal(notfly) ∧ penguin(X) → notfly(X),
      goal(notfly) → goal(penguin) }
    ∪ { → goal(bird) }.
Since the transformed program does not include (→ goal(fly)) and (→ goal(notfly)), the reasoner can omit solving both the goal fly(X) and the goal notfly(X).
6 Evaluation with Logic Design Problem
We have taken up the design of logic circuits that calculate the greatest common divisor (GCD) of two integers expressed in 8 bits by using the Euclidean algorithm. The solutions are circuits calculating the GCD and satisfying given constraints on area and time [Maruyama et al. 88]. The program Pd contains several kinds of knowledge: datapath design, component design, technology mapping, CMOS standard cells, and constraints on area and time [Ohta and Inoue 90]. The design problem of GCD calculators includes the design of components such as subtracters and adders.

Table 1 shows the experimental result, on a Pseudo-Multi-PSI system, for the evaluation of the transformed programs. The run time of a program P for a goal G is denoted by TR(G,P). The predicate symbol G of each goal G is adder (design of adders), subtracter (design of subtracters) or cGCD (design of calculators for GCD). The run time TR(G,Pd) of each goal G is equal to the others on the original program Pd.
Table 1: Run Time of Programs

    Goal G        TR(G,Pd) [s]   TR(G,P1) [s]   TR(G,P2) [s]
    adder             10.7           17.5            0.4
    subtracter        10.7           17.3            0.6
    cGCD              10.7           17.3           16.8
Let P1 be the simple transformed program of Pd. The experiment on the simple transformation time shows that it takes 6.35 [s] to make P1 from Pd. However, the run time TR(G,P1) for each goal G is nearly equal to the others because the constraints on the area and time of the GCD calculators are represented by negative clauses. Even if we want to design adders or subtracters, the hypothetical reasoner cannot avoid designing GCD calculators for consistency checking.
Let P2 be the program transformed with the abstracted dependency analysis. The experiment on the transformation time with the abstracted dependency analysis shows that it takes 6.63 [s] to make P2 from Pd. The transformation time with the abstracted dependency analysis is slightly longer (by 0.28 [s]) than the simple transformation time. When G is adder or subtracter, the run time TR(G,P2) is much shorter than the run time for the design of GCD calculators. This is because the program can avoid consistency checks for the negative clauses representing constraints on the area and time of the GCD calculators when the design of adders or the design of subtracters is given as a goal. The results show that the total of the transformation time with abstracted dependency analysis and the run time of the transformed program is shorter than the run time of the original program when the problem does not need the whole program.
7 Related Work
The algorithm for first-order Horn-clause abduction with the ATMS is presented in [Ng and Mooney 91]. The system is basically a consumer architecture [de Kleer 86-3] with backward-chaining consumers. The algorithm avoids redundant proofs by introducing goal-directed backward-chaining consumers, and avoids duplicate proofs among different contexts by using the ATMS. Their problem definition is the same as that of [Stickel 90], whose inputs are a goal and a set of Horn clauses without negative clauses. When there are negative clauses in the program, they briefly suggest that a forward-chaining consumer can be used for each negative clause to check consistency. On the other hand, since we only simulate backward chaining by the forward-chaining reasoner, we do not require both types of chaining rules. Moreover, when the program includes negative clauses, it is sometimes difficult to represent the clauses as a set of consumers. For example, suppose that the axioms are
    { a → c, b → d, c ∧ d → g, c → e, d → f, e ∧ f → ⊥ }
and the goal is g. Assume that the set of consumers is

    { (c ⇐ a), (d ⇐ b), (g ⇐ c, d), (e ⇐ c), (f ⇐ d), (e, f ⇒ ⊥) },

where ⇐ denotes a backward-chaining consumer and ⇒ denotes a forward-chaining consumer. Then, we get the solution {(g, {{g}, {a, b}, {a, d}, {c, b}, {c, d}})}. However, the correct solution is {(g, {{g}})} because {a, b}, {a, d}, {c, b} and {c, d} are nogood. To guarantee consistency when the program includes negative clauses, we have to add the corresponding forward-chaining consumer for every Horn clause. Such added consumers would cause the same problem as the program obtained by the simple transformation algorithm.
In [Stickel 91], deduction and abduction with the upside-down meta-interpretation are proposed. This abduction does not require the consistency of solutions. Furthermore, rules may fire repeatedly in different contexts since it does not use the ATMS. This often causes a problem when it is applied to practical programs where heavy procedures are attached to rules.

Another difference between the frameworks of [Ng and Mooney 91, Stickel 91] and ours is that their frameworks treat only hypotheses in the form of normal defaults without prerequisites, whereas we allow normal defaults with prerequisites.
8 Conclusion
We have presented a new transformation algorithm of programs for efficient forward-chaining hypothetical reasoning based on the upside-down meta-interpretation. In the transformation algorithm, logical dependencies between a goal and negative clauses are analyzed at the abstracted level to find irrelevant negative clauses, so that consistency checking of negative clauses can be restricted to the relevant clauses. It has been evaluated with a logic circuit design problem on a Pseudo-Multi-PSI system.

We can also apply this abstracted dependency analysis to transformed programs based on the Magic Set and Alexander methods. Our dependency analysis with only predicate symbols may be extended to an analysis with predicate symbols and some of their arguments.
Acknowledgments
Thanks are due to Mr. Makoto Nakashima of JIPDEC
for implementing the ATMS and combining it with the
MGTP. We are grateful to Prof. Mitsuru Ishizuka of the
University of Tokyo for the helpful discussion. We would
also like to thank Dr. Ryuzo Hasegawa and Mr. Miyuki
Koshimura for providing us the MGTP, and Dr. Koichi
Furukawa for his advise. Finally, we would like to express our appreciation to Dr. Kazuhiro Fuchi, Director
of ICOT Research Center, who provided us with the opportunity to conduct this research.
References
[Bancilhon et al. 86] F. Bancilhon, D. Maier, Y. Sagiv
and J.D. Ullman, Magic Sets and Other Strange
Ways to Implement Logic Programs, Proc. of ACM
PODS, pp.1-15 (1986).
[Bry 90] F. Bry, Query evaluation in recursive databases:
bottom-up and top-down reconciled, Data & Knowledge Engineering, 5, pp.289-312 (1990).
[de Kleer 86-1] J. de Kleer, An Assumption-based TMS,
Artificial Intelligence, 28, pp.127-162 (1986).
[de Kleer 86-2] J. de Kleer, Extending the ATMS, Artificial Intelligence, 28, pp.163-196 (1986).
[de Kleer 86-3] J. de Kleer, Problem Solving with
the ATMS, Artificial Intelligence, 28, pp.197-224
(1986).
[Flann et al. 87] N.S. Flann, T.G. Dietterich and D.R. Corpron, Forward Chaining Logic Programming with the ATMS, Proc. of AAAI-87, pp.24-29 (1987).
[Forgy 82] C.L. Forgy, Rete: A Fast Algorithm for the
Many Pattern/Many Object Pattern Match Problem, Artificial Intelligence, 19, pp.17-37 (1982).
[Fujita and Hasegawa 91] H. Fujita and R. Hasegawa,
A Model Generation Theorem Prover in KL1 Using a Ramified-Stack Algorithm, Proc. of ICLP '91,
pp.494-500 (1991).
[Inoue 88] K. Inoue, Problem Solving with Hypothetical
Reasoning, Proc. of FGCS '88, pp.1275-1281 (1988).
[Junker 88] U. Junker, Reasoning in Multiple Contexts,
GMD Working Paper No.334 (1988).
[Maruyama et al. 88] F. Maruyama, T. Kakuda, Y. Masunaga, Y. Minoda, S. Sawada and N. Kawato, co-LODEX: A Cooperative Expert System for Logic Design, Proc. of FGCS '88, pp.1299-1306 (1988).
[Ng and Mooney 91] H.T. Ng and R.J. Mooney, An Efficient First-Order Abduction System Based on the
ATMS, Technical Report AI 91-151, The University
of Texas at Austin, AI Lab. (1991).
[Ohta and Inoue 90] Y. Ohta and K. Inoue, A Forward-Chaining Multiple-Context Reasoner and Its Application to Logic Design, Proc. of IEEE TAI, pp.386-392 (1990).
[Poole et al. 87] D. Poole, R. Goebel and R. Aleliunas,
Theorist: A logical Reasoning System for Defaults
and Diagnosis, N. Cercone and G. McCalla (Eds.),
The Knowledge Frontier: Essays in the Representation of Knowledge, Springer-Verlag, pp.331-352
(1987).
[Poole 91] D. Poole, Compiling a Default Reasoning System into Prolog, New Generation Computing, 9,
pp.3-38 (1991).
[Reiter 80] R. Reiter, A Logic for Default Reasoning, Artificial Intelligence, 13, pp.81-132 (1980).
[Rohmer et al. 86] J. Rohmer,
R. Lescoeur and
J.M. Kerisit, The Alexander Method - A Technique for The Processing of Recursive Axioms in
Deductive Databases, New Generation Computing,
4, pp.273-285 (1986).
[Stickel 90] M.E. Stickel, Rationale and Methods for Abductive Reasoning in Natural-Language Interpretation, Lecture Notes in Artificial Intelligence, 459, Springer-Verlag, pp.233-252 (1990).
[Stickel 91]
M.E. Stickel, Upside-Down Meta-Interpretation of
the Model Elimination Theorem-Prover Procedure
for Deduction and Abduction, ICOT Technical Report TR-664, ICOT (1991).
[Ueda and Chikayama 90] K. Ueda and T. Chikayama,
Design of the Kernel Language for the Parallel Inference Machine, The Computer Journal, 33, 6, pp.
494-500 (1990).
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Logic Programming, Abduction and Probability
David Poole
Department of Computer Science,
University of British Columbia,
Vancouver, B.C., Canada V6T 1Z2
poole@cs.ubc.ca
telephone: (604) 822 6254
fax: (604) 822 5485
Abstract
Probabilistic Horn abduction is a simple framework to combine probabilistic and logical reasoning into a coherent practical framework.
The numbers can be consistently interpreted
probabilistically, and all of the rules can be interpreted logically. The relationship between
probabilistic Horn abduction and logic programming is at two levels. At the first level
probabilistic Horn abduction is an extension of
pure Prolog, that is useful for diagnosis and
other evidential reasoning tasks. At another
level, current logic programming implementation techniques can be used to efficiently implement probabilistic Horn abduction. This forms
the basis of an "anytime" algorithm for estimating arbitrary conditional probabilities. The
focus of this paper is on the implementation.
1 Introduction
Probabilistic Horn Abduction [Poole, 1991c; Poole, 1991b; Poole, 1992a] is a framework for logic-based abduction that incorporates probabilities with assumptions. It is being used as a framework for diagnosis [Poole, 1991c] that incorporates both pure Prolog and Bayesian networks [Pearl, 1988] as special cases [Poole, 1991b]. This paper is about the relationship of probabilistic Horn abduction to logic programming. This simple extension to logic programming provides a wealth of new applications in diagnosis, recognition and evidential reasoning [Poole, 1992a].

This paper also presents a logic-programming solution to the problem in abduction of searching for the "best" diagnoses first. The main features of the approach are:
• We are using Horn clause abduction. The procedures are simple, both conceptually and computationally (for a certain class of problems). We develop a simple extension of SLD resolution to implement our framework.
• The search algorithms form "anytime" algorithms that can give an estimate of the conditional probability at any time. We do not generate the unlikely explanations unless we need to. We have a bound on the probability mass of the remaining explanations which allows us to know the error in our estimates.

• A theory of "partial explanations" is developed. These are partial proofs that can be stored in a priority queue until they need to be further expanded. We show how this is implemented in a Prolog interpreter in Appendix A.
2 Probabilistic Horn abduction
The formulation of abduction used is a simplified form
of Theorist [Poole et al., 1987; Poole, 1988] with probabilities associated with the hypotheses. It is simplified in being restricted to definite clauses with simple
forms of integrity constraints (similar to that in [Goebel
et al., 1986]). This can also be seen as a generalisation of an ATMS [Reiter and de Kleer, 1987] to be nonpropositional.
The language is that of pure Prolog (i.e., definite
clauses) with special disjoint declarations that specify a
set of disjoint hypotheses with associated probabilities.
There are some restrictions on the forms of the rules and
the probabilistic dependence allowed. The language presented here is that of [Poole, 1992a] rather than that of
[Poole, 1991c; Poole, 1991b].
The main design considerations were to make the language the simplest extension to pure Prolog that also includes probabilities (not just numbers associated with rules, but numbers that follow the laws of probability, and so can be consistently interpreted as probabilities [Poole, 1992a]). We are also assuming very strong independence assumptions; this is not intended as a temporary restriction on the language that we want eventually to remove, but as a feature. We can represent any probabilistic information using only independent hypotheses [Poole, 1992a]; if there is any dependence amongst hypotheses, we invent a new hypothesis to explain that dependency.
2.1 The language

Our language uses the Prolog conventions, and has the same definitions of variables, terms and atomic symbols.

Definition 2.1 A definite clause is of the form a. or a ← a1 ∧ ... ∧ an. where a and each ai are atomic symbols.
Definition 2.2 A disjoint declaration is of the form

    disjoint([h1 : p1, ..., hn : pn]).

where the hi are atoms, and the pi are real numbers 0 ≤ pi ≤ 1 such that p1 + ... + pn = 1. Any variable appearing in one hi must appear in all of the hj (i.e., the hi share the same variables). The hi will be referred to as hypotheses.
Definition 2.3 A probabilistic Horn abduction
theory (which will be referred to as a "theory") is a collection of definite clauses and disjoint declarations such
that if a ground atom h is an instance of a hypothesis
in one disjoint declaration, then it is not an instance of
another hypothesis in any of the disjoint declarations.
Given theory T, we define:

FT, the facts, is the set of definite clauses in T together with the clauses of the form

    false ← hi ∧ hj

where hi and hj both appear in the same disjoint declaration in T, and i ≠ j. Let F'T be the set of ground instances of elements of FT.

HT, the set of hypotheses, is the set of hi such that hi appears in a disjoint declaration in T. Let H'T be the set of ground instances of elements of HT.

PT is a function H'T → [0, 1]. PT(h'i) = pi where h'i is a ground instance of hypothesis hi, and hi : pi is in a disjoint declaration in T.

Where T is understood from context, we omit the subscript.

Definition 2.4 [Poole et al., 1987; Poole, 1987] If g is a closed formula, an explanation of g from (F, H) is a set D of elements of H' such that

• F ∪ D ⊨ g and
• F ∪ D ⊭ false.

The first condition says that D is a sufficient cause for g, and the second says that D is possible.

Definition 2.5 A minimal explanation of g is an explanation of g such that no strict subset is an explanation of g.

2.2 Assumptions about the rule base

Probabilistic Horn abduction also contains some assumptions about the rule base. It can be argued that these assumptions are natural, and do not really restrict what can be represented [Poole, 1992a]. Here we list these assumptions, and use them in order to show how the algorithms work.

The first assumption we make is about the relationship between hypotheses and rules:

Assumption 2.6 There are no rules with head unifying with a member of H.

Instead of having a rule implying a hypothesis, we invent a new atom, make the hypothesis imply this atom, make all of the rules imply this atom, and use this atom instead of the hypothesis.

Assumption 2.7 (acyclicity) If F' is the set of ground instances of elements of F, then it is possible to assign a natural number to every ground atom such that for every rule in F' the atoms in the body of the rule are strictly less than the atom in the head.

This assumption is discussed in [Apt and Bezem, 1990].

Assumption 2.8 The rules in F' for a ground non-assumable atom are covering.

That is, if the rules for a in F' are

    a ← B1
    a ← B2
    ...
    a ← Bm

then if a is true, one of the Bi is true. Thus Clark's completion [Clark, 1978] is valid for every non-assumable. Often we get around this assumption by adding a rule

    a ← some_other_reason_for_a

and making "some_other_reason_for_a" a hypothesis [Poole, 1992a].

Lemma 2.9 [Console et al., 1991; Poole, 1988] Under assumptions 2.6, 2.7 and 2.8, if expl(g, T) is the set of minimal explanations of g from theory T:

    g ↔ ∨_{ei ∈ expl(g,T)} ei

Assumption 2.10 The bodies of the rules in F' for an atom are mutually exclusive.

Given the above rules for a, this means that

    Bi ∧ Bj ⇒ false

is true in the domain under consideration for each i ≠ j. We can make this true by adding extra conditions to the rules to make sure they are disjoint.

Lemma 2.11 Under assumptions 2.6 and 2.10, minimal explanations of atoms or conjunctions of atoms are mutually inconsistent.

See [Poole, 1992a] for more justification of these assumptions.

2.3 Probabilities

Associated with each possible hypothesis is a prior probability. We use this prior probability to compute arbitrary probabilities.

The following is a corollary of lemmata 2.9 and 2.11.

Lemma 2.12 Under assumptions 2.6, 2.7, 2.8, 2.10 and 2.13, if expl(g, T) is the set of minimal explanations of the conjunction of atoms g from probabilistic Horn abduction theory T:

    P(g) = P(∨_{ei ∈ expl(g,T)} ei) = Σ_{ei ∈ expl(g,T)} P(ei)
Thus to compute the prior probability of any g we sum the probabilities of the explanations of g.

To compute arbitrary conditional probabilities, we use the definition of conditional probability:

    P(α|β) = P(α ∧ β) / P(β)

Thus to find arbitrary conditional probabilities P(α|β), we find P(β), which is the sum over the explanations of β, and P(α ∧ β), which can be found by explaining α from the explanations of β. Thus arbitrary conditional probabilities can be computed from summing the prior probabilities of explanations.

It remains only to compute the prior probability of an explanation D of g. We assume that logical dependencies impose the only statistical dependencies on the hypotheses. In particular we assume:

Assumption 2.13 Ground instances of hypotheses that are not inconsistent (with FT) are probabilistically independent. That is, different disjoint declarations define independent hypotheses.

The hypotheses in a minimal explanation are always logically independent. The language has been carefully set up so that the logic does not force any dependencies amongst the hypotheses. If we could prove that some hypotheses implied other hypotheses or their negations, the hypotheses could not be independent. The language is deliberately designed to be too weak to be able to state such logical dependencies between hypotheses.

Under assumption 2.13, if {h1, ..., hn} are part of a minimal explanation, then

    P(h1 ∧ ... ∧ hn) = Π_{i=1}^{n} P(hi)

To compute the prior of the minimal explanation we multiply the priors of the hypotheses. The posterior probability of the explanation is proportional to this.

The following is a corollary of lemmata 2.9 and 2.11.

Lemma 2.14 Under assumptions 2.6, 2.7, 2.8, 2.10 and 2.13, if expl(g, T) is the set of all minimal explanations of g from theory T:

    P(g) = Σ_{ei ∈ expl(g,T)} P(ei)

2.4 An example

In this section we show an example that we use later in the paper. It is intended to be as simple as possible to show how the algorithm works.

Suppose we have the rules and hypotheses:

rule((a :- b, h)).
rule((a :- q, e)).
rule((q :- h)).
rule((q :- b, e)).
rule((h :- b, f)).
rule((h :- c, e)).
rule((h :- g, b)).
disjoint([b:0.3, c:0.7]).
disjoint([e:0.6, f:0.3, g:0.1]).

There are four minimal explanations of a, namely {c, e}, {b, e}, {f, b} and {g, b}. The priors of the explanations are as follows:

    P(c ∧ e) = 0.7 × 0.6 = 0.42.

Similarly P(b ∧ e) = 0.18, P(f ∧ b) = 0.09 and P(g ∧ b) = 0.03. Thus

    P(a) = 0.42 + 0.18 + 0.09 + 0.03 = 0.72

There are two explanations of e ∧ a, namely {c, e} and {b, e}. Thus P(e ∧ a) = 0.60. Thus the conditional probability of e given a is P(e|a) = 0.6/0.72 = 0.833.
What is important about this example is that all of
the probabilistic calculations reduce to finding the probabilities of explanations.
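The arithmetic of this example can be checked with a few lines of Prolog. This is only a sketch of the calculation under the independence assumption of Section 2.3; prob/2 below is a hypothetical table holding the priors from the disjoint declarations.

% Priors of the hypotheses, taken from the disjoint declarations.
prob(b, 0.3).  prob(c, 0.7).
prob(e, 0.6).  prob(f, 0.3).  prob(g, 0.1).

% Prior of an explanation: the product of the priors of its hypotheses.
explanation_prior([], 1.0).
explanation_prior([H|Hs], P) :-
    prob(H, PH),
    explanation_prior(Hs, P1),
    P is PH * P1.

% Prior of a goal: the sum over its minimal explanations.
goal_prior([], 0.0).
goal_prior([E|Es], P) :-
    explanation_prior(E, PE),
    goal_prior(Es, P1),
    P is PE + P1.

% ?- goal_prior([[c,e],[b,e],[f,b],[g,b]], PA).
% ?- goal_prior([[c,e],[b,e]], PEA), PCond is PEA/0.72.
% These queries yield PA ≈ 0.72 and PCond ≈ 0.833, as above,
% up to floating-point rounding.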
2.5 Tasks

The following tasks are what we expect to implement:

1. Generate the explanations of some goal (conjunction of atoms), in order.

2. Determine the prior probability of some goal. This is implemented by enumerating the explanations of the goal.

3. Determine the posterior probabilities of the explanations of a goal (i.e., the probabilities of the explanations given the goal).

4. Determine the conditional probability of one formula given another. That is, determining P(α|β) for any α and β.

All of these will be implemented by enumerating the explanations of a goal, and estimating the probability mass in the explanations that have not been enumerated. It is this problem that we consider for the next few sections, before returning to the tasks we want to compute.

3 A top-down proof procedure
In this section we show how to carry out a best-first search of the explanations. In order to do this we build a notion of a partial proof that we can add to a priority queue, and restart when necessary.
3.1 SLD-BF resolution
In this section we outline an implementation based on
logic programming technology and a branch and bound
search.
The implementation keeps a priority queue of sets
of hypotheses that could be extended into explanations
("partial explanations"). At any time the set of all the
explanations is the set of already generated explanations,
plus those explanations that can be generated from the partial explanations in the priority queue.
Q := {(g ← g, {})};
Π := {};
repeat
    choose and remove best (g ← C, D) from Q;
    if C = true
    then if good(D) then Π := Π ∪ {D} endif
    else Let C = a ∧ R
        for each rule(h ← B) where mgu(a, h) = θ
            Q := Q ∪ {(g ← B ∧ R, D)θ};
        if a ∈ H and good({a} ∪ D)
        then Q := Q ∪ {(g ← R, {a} ∪ D)}
        endif
    endif
until Q = {}
where good(D) ≡ (∀d1, d2 ∈ D  ∄η ∈ NG ∃φ (d1, d2) = ηφ)
              ∧ (∄π ∈ Π ∃φ D ⊇ πφ)

Figure 1: SLD-BF Resolution to find explanations of g in order.

Figure 1 gives an algorithm for finding explanations of q in order of probability (most likely first). At each step we choose an element (g ← C, D) of the priority queue Q with maximum prior probability of D.

We have an explanation when C is the empty conjunction (represented here as true). In this case D is added to the set Π of already generated explanations.

Otherwise, suppose C is the conjunction a ∧ R. There are two operations that can be carried out. The first is a form of SLD resolution [Lloyd, 1987], where for each rule

    h ← b1 ∧ ... ∧ bn

in F, such that h and a have most general unifier θ, we generate the partial explanation

    (g ← b1 ∧ ... ∧ bn ∧ R, D)θ

and add it to the priority queue.

The second operation is used when a ∈ H. In this case we produce the partial explanation

    (g ← R, {a} ∪ D)

and add it to Q. We only do this if {a} ∪ D is consistent, and is not subsumed by another explanation of q. Here we assume the set NG of pairs of hypotheses that appear in the same disjoint declaration (corresponding to nogoods in an ATMS [Reiter and de Kleer, 1987]). Unlike in an ATMS, this set can be built at compile time from the disjoint declarations.

This procedure will find the explanations in order of likelihood. Its correctness is based on the meaning of a partial explanation.

Definition 3.1 A partial explanation is a structure

    (g ← C, D)

where g is an atom (or conjunction of atoms), C is a conjunction of atoms and D is a set of hypotheses.

Definition 3.2 A partial explanation (g ← C, D) is valid with respect to (F, H) if

    F ⊨ D ∧ C ⇒ g

Lemma 3.3 Every partial explanation in the queue Q is valid with respect to (F, H).

Proof: This is proven by induction on the number of times through the loop. It is trivially true initially, as q ⇒ q for any q. There are two cases where elements are added to Q.

In the first case (the "rule" case) we know that (g ← C, D) is valid by the inductive assumption, and so

    F ⊨ (D ∧ R ∧ a ⇒ g)θ.

We also know

    F ⊨ (B ⇒ h)θ.

As aθ = hθ, by a simple resolution step we have

    F ⊨ (D ∧ R ∧ B ⇒ g)θ.

The other case is when a ∈ H. By the induction step

    F ⊨ D ∧ (a ∧ R) ⇒ g

and so

    F ⊨ (D ∧ a) ∧ R ⇒ g.

If D only contains elements of H and a is an element of H, then {a} ∪ D only contains elements of H. □

It is now trivial to show the following:

Corollary 3.4 Every element of Π in Figure 1 is an explanation of q.

Although the correctness of the algorithm does not depend on which element of the queue we choose at any time, the efficiency does. We choose the best partial explanation based on the following ordering of partial explanations. Partial explanation (g1 ← C1, D1) is better than (g2 ← C2, D2) if P(D1) > P(D2). It is simple to show that "better than" is a partial ordering. When we choose a "best" partial explanation we choose a minimal element of the partial ordering; where there are a number of minimal partial explanations, we can choose any one. When we follow this definition of "best", we enumerate the minimal explanations of q in order of probability.
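The good(D) test in Figure 1 combines the nogood check with subsumption against the explanations already collected in Π. A minimal sketch, assuming nogood/2 facts built from the disjoint declarations as in Appendix A, and ignoring the variable-instantiation subtleties discussed in Section 4.3:

% good(+D, +Done): D contains no pair of hypotheses declared nogood
% and is not a superset of an explanation already collected in Done.
good(D, Done) :-
    \+ ( member(H1, D), member(H2, D), nogood(H1, H2) ),
    \+ ( member(Expl, Done), subset(Expl, D) ).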
3.2 Our example

In this section we show how the simple example in Section 2.4 is handled by the best-first proof process. The following is the sequence of values of Q each time through the loop (where there are a number of minimal partial explanations, we choose the element that was added last):

{(a ← a, {})}
{(a ← b ∧ h, {}), (a ← q ∧ e, {})}
{(a ← q ∧ e, {}), (a ← h, {b})}
{(a ← h ∧ e, {}), (a ← b ∧ e ∧ e, {}), (a ← h, {b})}
{(a ← b ∧ f ∧ e, {}), (a ← c ∧ e ∧ e, {}), (a ← g ∧ b ∧ e, {}), (a ← b ∧ e ∧ e, {}), (a ← h, {b})}
{(a ← c ∧ e ∧ e, {}), (a ← g ∧ b ∧ e, {}), (a ← b ∧ e ∧ e, {}), (a ← f ∧ e, {b}), (a ← h, {b})}
{(a ← g ∧ b ∧ e, {}), (a ← b ∧ e ∧ e, {}), (a ← e ∧ e, {c}), (a ← f ∧ e, {b}), (a ← h, {b})}
{(a ← b ∧ e ∧ e, {}), (a ← e ∧ e, {c}), (a ← f ∧ e, {b}), (a ← h, {b}), (a ← b ∧ e, {g})}
{(a ← e ∧ e, {c}), (a ← e ∧ e, {b}), (a ← f ∧ e, {b}), (a ← h, {b}), (a ← b ∧ e, {g})}
{(a ← e, {e, c}), (a ← e ∧ e, {b}), (a ← f ∧ e, {b}), (a ← h, {b}), (a ← b ∧ e, {g})}
{(a ← true, {e, c}), (a ← e ∧ e, {b}), (a ← f ∧ e, {b}), (a ← h, {b}), (a ← b ∧ e, {g})}

Thus the first, and most likely, explanation is {e, c}.

{(a ← e ∧ e, {b}), (a ← f ∧ e, {b}), (a ← h, {b}), (a ← b ∧ e, {g})}
{(a ← f ∧ e, {b}), (a ← h, {b}), (a ← e, {e, b}), (a ← b ∧ e, {g})}
{(a ← h, {b}), (a ← e, {e, b}), (a ← b ∧ e, {g}), (a ← e, {f, b})}
{(a ← b ∧ f, {b}), (a ← c ∧ e, {b}), (a ← g ∧ b, {b}), (a ← e, {e, b}), (a ← b ∧ e, {g}), (a ← e, {f, b})}
{(a ← f, {b}), (a ← c ∧ e, {b}), (a ← g ∧ b, {b}), (a ← e, {e, b}), (a ← b ∧ e, {g}), (a ← e, {f, b})}
{(a ← c ∧ e, {b}), (a ← g ∧ b, {b}), (a ← e, {e, b}), (a ← b ∧ e, {g}), (a ← true, {f, b}), (a ← e, {f, b})}

Here the algorithm effectively prunes the top partial explanation, as {c, b} forms a nogood.

{(a ← g ∧ b, {b}), (a ← e, {e, b}), (a ← b ∧ e, {g}), (a ← true, {f, b}), (a ← e, {f, b})}
{(a ← e, {e, b}), (a ← b ∧ e, {g}), (a ← true, {f, b}), (a ← e, {f, b}), (a ← b, {g, b})}
{(a ← true, {e, b}), (a ← b ∧ e, {g}), (a ← true, {f, b}), (a ← e, {f, b}), (a ← b, {g, b})}

We have now found the second most likely explanation, namely {e, b}.

{(a ← b ∧ e, {g}), (a ← true, {f, b}), (a ← e, {f, b}), (a ← b, {g, b})}
{(a ← true, {f, b}), (a ← e, {f, b}), (a ← e, {g, b}), (a ← b, {g, b})}

We have thus found the third explanation {f, b}.

{(a ← e, {f, b}), (a ← e, {g, b}), (a ← b, {g, b})}
{(a ← e, {g, b}), (a ← b, {g, b})}
{(a ← b, {g, b})}
{(a ← true, {g, b})}

The fourth explanation is {g, b}. There are no more partial explanations and the process stops.

4 Discussion

4.1 Probabilities in the queue

We would like to give an estimate for P(g) after having generated only a few of the most likely explanations of g, and get some estimate of our error. This problem reduces to estimating the probability of partial explanations in the queue.

If (g ← C, D) is in the priority queue, then it can possibly be used to generate explanations D1, ..., Dn. Each Di will be of the form D ∪ D'i. We can place a bound on the probability mass of all of the Di, by

    P(D1 ∨ ... ∨ Dn) = P(D ∧ (D'1 ∨ ... ∨ D'n)) ≤ P(D)

Given this upper bound, we can determine an upper bound for P(g), where {e1, ..., en} is the set of all minimal explanations of g:

    P(g) = P(e1 ∨ e2 ∨ ... ∨ en)
         = P(e1) + P(e2) + ... + P(en)
         = (Σ_{ei found} P(ei)) + (Σ_{ej to be generated} P(ej))

We can easily compute the first of these sums, and can put upper and lower bounds on the second. This means that we can put a bound on the range of probabilities of a goal based on finding just some of the explanations of the goal. Suppose we have goal g, and we have generated explanations Π. Let

    P_Π = Σ_{D ∈ Π} P(D)

    P_Q = Σ_{D : (g ← C, D) ∈ Q} P(D)

where Q is the priority queue. We then have

    P_Π ≤ P(g) ≤ P_Π + P_Q

As the computation progresses, the probability mass in the queue P_Q approaches zero¹ and we get a better refinement on the value of P(g). This thus forms the basis of an "anytime" algorithm for Bayesian networks.

¹Note that the estimate given above does not always decrease. It is possible that the error estimate increases. [Poole, 1992b] considers cases where convergence can be guaranteed.
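The bound P_Π ≤ P(g) ≤ P_Π + P_Q can be computed directly from the priors of the collected explanations and of the partial explanations still queued. A small illustration only, assuming both are supplied as lists of prior probabilities:

% Bounds on P(g) from the explanations found so far (FoundPriors) and
% the partial explanations still on the queue (QueuePriors).
prior_bounds(FoundPriors, QueuePriors, Low, High) :-
    sum_list(FoundPriors, Low),
    sum_list(QueuePriors, QMass),
    High is Low + QMass.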
4.2 Conditional Probabilities
We can also use the above procedure to compute conditional probabilities. Suppose we are trying to compute the conditional probability P(α|β). This can be computed from the definition:

    P(α|β) = P(α ∧ β) / P(β)

We compute the conditional probabilities by enumerating the minimal explanations of α ∧ β and of β. Note that the minimal explanations of α ∧ β are explanations (not necessarily minimal) of β. We can compute the explanations of α ∧ β by trying to explain α from the explanations of β. The above procedure can be easily adapted for this task, by making the task to explain β ∧ α, and making sure we prove β before we prove α, so that we can collect the explanations of β as we generate them.

Let P_β be the sum of the probabilities of the explanations of β enumerated, and let P_{α∧β} be the sum of the explanations of α ∧ β generated. Thus, given our estimates of P(α ∧ β) and P(β), we have

    P_{α∧β} / (P_β + P_Q)  ≤  P(α|β)  ≤  (P_{α∧β} + P_Q) / P_β

The lower bound is the case where all of the partial descriptions in the queue go towards worlds implying β, but none of these also lead to α. The upper bound is the case where all of the elements of the queue go towards implying α, from the explanations already generated for β.
4.3 Consistency and subsumption checking

One problem that needs to be considered is what happens when there are free variables in the hypotheses generated. When we generate the hypotheses, there may be some instances of the hypotheses that are inconsistent, and some that are consistent. We know that every instance is inconsistent if the subgoal is subsumed by a nogood. This can be determined by substituting constants for the variables in the subgoal, and finding if a subset unifies with a nogood.

We cannot prune hypotheses unless every instance is inconsistent. However, as computation progresses, we may substitute a value for a variable that makes the partial explanation inconsistent. This problem is similar to the problem of delaying negation-as-failure derivations [Naish, 1986], and of delaying consistency checking in Theorist [Poole, 1991a]. We would like to notice such inconsistencies as soon as possible. In the algorithm of Figure 1 we check for inconsistency each time a partial explanation is taken off the queue. There are cases where we do not have to check this explicitly, for example when we have done a resolution step that did not assign a variable. There is a trade-off between checking consistency and allowing some inconsistent hypotheses on the queue². This trade-off is beyond the scope of this paper.

Note that the assumptions used in building the system imply that there can be no free variables in any explanation of a ground goal (otherwise we have infinitely many disjoint explanations with bounded probability). Thus delaying subgoals eventually grounds all variables.
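The test that every instance of a set of hypotheses is inconsistent, by substituting constants for the variables and looking for a subset that unifies with a nogood, can be sketched as follows. makeground/1 is assumed to bind each free variable to a fresh constant, as in the meta-interpreter of Appendix A, and nogood/2 holds the pairs from the disjoint declarations.

% Succeeds if every instance of the hypothesis set D is inconsistent:
% after grounding a copy of D with fresh constants, some pair of its
% elements still matches a declared nogood.
every_instance_inconsistent(D) :-
    copy_term(D, D1),
    makeground(D1),
    member(H1, D1),
    member(H2, D1),
    nogood(H1, H2), !.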
4.4 Iterative deepening

In many search techniques we often get much better space complexity and asymptotically the same time complexity by using an iterative deepening version of a search procedure [Korf, 1985]. An iterative deepening version of the best-first search procedure is exactly the same as the iterative deepening version of A* with the heuristic function of zero [Korf, 1985]. The algorithm of procedure 1 is given at a level of abstraction which does not preclude iterative deepening.

For our experimental implementations, we have used an interesting variant of iterative deepening. Our queue is only a "virtual queue" and we only physically store partial explanations with probability greater than some threshold. We remember the mass of the whole queue, including the values we have chosen not to store. When the queue is empty, we decrease the threshold. We can estimate the threshold that we need for some given accuracy. This speeds up the computation and requires less space.

²We have to check the consistency at some time. This could be as late as just before the explanation is added to Π.
4.5 Recomputing subgoals

One of the problems with the above procedure is that it recomputes explanations for the same subgoal. If s is queried as a subgoal many times, then we keep finding the same explanations for s. This has more to do with the notion of SLD resolution used than with the use of branch and bound search.

We are currently experimenting with a top-down procedure where we remember computations that we have already done, forming "lemmata". This is similar to the use of memo functions [Sterling and Shapiro, 1986] or Earley deduction [Pereira and Shieber, 1987] in logic programming, but we have to be very careful with the interaction between making lemmata and the branch and bound search, particularly as there may be multiple answers to any query, and just because we ask a query does not mean we want to solve it (we may only want to bound the probability of the answer).
4.6 Bounding the priority queue
Another problem with the above procedure that is not
solved by lemmatisation is that the bound on the priority queue can become quite large (i.e., greater than one).
Some bottom-up procedures [Poole, 1992b], can have an
accurate estimate of the probability mass of the queue
(i.e., an accurate bound on how much probability mass
could be on the queue based on the information at hand).
See [Poole, 1992b] for a description of a bottom-up procedure that can be compared to the top-down procedure
in this paper. In [Poole, 1992b] an average case analysis
is given on the bottom-up procedure; while this is not
an accurate estimate for the top-down procedure, the
case where the bottom-up procedure is efficient [Poole,
1992b] is the same case where the top-down procedure
works well; that is where there are normality conditions
that dominate the probability of each hypothesis (i.e.,
where all of the probabilities are near one or near zero).
5 Comparison with other systems
There are many other proposals for logic-based abduction schemes (e.g., [Pople, 1973; Cox and Pietrzykowski, 1987; Goebel et al., 1986; Poole, 1987]). These, however, consider that we either find an arbitrary explanation or find all explanations. In practice there are prohibitively many of these. It is also not clear what to do with all of the explanations; there are too many to give to a user, and the cost of determining which of the explanations is the "real" explanation (by doing tests [Sattar and Goebel, 1991]) is usually not outweighed by the advantages of finding the real explanation. This is why it is important to take into account probabilities. We then have a principled reason for ignoring many explanations. Probabilities are also the right tool to use when we really are unsure as to whether something is true or not. For evidential reasoning tasks (e.g., diagnosis and recognition) it is not up to us to decide whether some hypothesis is true or not; all we have is probabilities and evidence to work out what is most likely true. Similar considerations motivated the addition of probabilities to consistency-based diagnosis [de Kleer and Williams, 1989].

Perhaps the closest work to that presented here is that of Stickel [Stickel, 1988]. His is an iterative deepening search for the lowest cost explanation. He does not consider probabilities.
6 Using existing logic programming technology
In this section we show how the branch and bound search
can be compiled into Prolog. The basic idea is that when
we are choosing a partial explanation to explore, we can
choose any of those with maximum probability. If we
choose the last one when there is more than one, we
carry out a depth-first search much like normal Prolog,
except when making assumptions. We only add to the
priority queue when making assumptions, and let Prolog
do the searching when we are not.
6.1 Remaining subgoals
Consider what subgoals remain to be solved when we are trying to solve a goal. Consider the clause:

    h ← b1 ∧ b2 ∧ ... ∧ bm.

Suppose R is the conjunction of subgoals that remain to be solved after h in the proof. If we are using the leftmost reduction of subgoals, then the conjunction of subgoals remaining to be solved after subgoal bi is

    b_{i+1} ∧ ... ∧ bm ∧ R

The total information of the proof is contained in the partial explanation at the point we are in the proof, i.e., in the remaining subgoals, current hypotheses and the associated answer. The idea we exploit is to make this set of subgoals explicit by adding an extra argument to each atomic symbol that contains all of the remaining subgoals.

6.2 Saving partial proofs

There is enough information within each subgoal to prove the top level goal it was created to solve. When we have a hypothesis that needs to be assumed, the remaining subgoals and the current hypotheses form a partial explanation which we save on the queue. We then fail the current subgoal and look for another solution. If there are no solutions found (i.e., the top level computation fails), we can choose a saved subgoal (according to the order given in Section 3.1), and continue the search.

Suppose in our proof we select a possible hypothesis h of cost P({h}), with U being the conjunction of goals remaining to be solved, and T the set of currently assumed hypotheses with cost P(T). We only want to consider this as a possible contender for the best solution if P({h} ∪ T) is the minimal cost of all proofs being considered. The minimal cost proofs will be other proofs of cost P(T). These can be found by failing the current subgoal. Before we do this we need to add U, with hypotheses {h} ∪ T, to the priority queue. When the proof fails we know there is no proof with the current set of hypotheses; we remove the partial proof with minimal cost from the priority queue, and continue this proof.

We do a branch and bound search over the partial explanations, but when the priorities are equal, we use Prolog's search to prefer the last added. The overhead on the resolution steps is low; we only have to do a couple more simple unifications (a free variable with a term). The main overhead occurs when we reach a hypothesis. Here we store the hypotheses and remaining goals on a priority queue and continue our search by failing the current goal. This is quick (if we implement the priority queue efficiently); the overhead needed to find all proofs is minimal.

Appendix A gives code necessary to run the search procedure.

7 Conclusion

This paper has considered a logic programming approach that uses a mix between depth-first and branch-and-bound search strategies for abduction where we want to consider probabilities, and only want to generate the most likely explanations. The underlying language is a superset of pure Prolog (without negation-as-failure), and the overhead of executing pure Prolog programs is small.

A Prolog interpreter

This appendix gives a brief overview of a meta-interpreter. Hopefully it is enough to be able to build a system. Our implementation contains more bells and whistles, but the core of it is here.
A.1 Prove

prove(G, T0, T1, C0, C1, U)

means that G can be proven with current assumptions T0, resulting in assumptions T1, where Ci is the probability of Ti, and U is the set of remaining subgoals. The first rule defining prove is a special purpose rule for the case where we have found an explanation; this reports on the answer found.

prove(ans(A),T,T,C,C,_) :- !,
    ans(A,T,C).
The remaining rules are the real definition, that follow
a normal pattern of Prolog meta-interpreters [Sterling
and Shapiro, 1986].
prove(true,T,T,C,C,_) :- !.
prove((A,B),T0,T2,C0,C2,U) :- !,
    % the subgoals remaining after A include B as well as U
    prove(A,T0,T1,C0,C1,(B,U)),
    prove(B,T1,T2,C1,C2,U).
prove(H,T,T,C,C,_) :-
    % an already assumed hypothesis costs nothing more
    hypothesis(H,_),
    member(H,T), !.
prove(H,T,[H|T],C,C1,U) :-
    % assume a new hypothesis: check it against the nogoods, record the
    % partial proof on the priority queue, then fail so that other
    % proofs with the current assumptions are explored first
    hypothesis(H,PH),
    \+ ( member(H1,T), makeground((H,H1)), nogood(H,H1) ),
    C1 is C*PH,
    add_to_PQ(process([H|T],C1,U)),
    fail.
prove(G,T0,T1,C0,C1,U) :-
    rul(G,B),
    prove(B,T0,T1,C0,C1,U).
A.2 Rule and disjoint declarations

We specify the rules of our theory using the declaration rule(R), where R is the form of a Prolog rule. This asserts the rule produced.

rule((H :- B)) :- !,
    assert(rul(H,B)).
rule(H) :-
    assert(rul(H,true)).
The disjoint declaration forms nogoods and declares
probabilities of hypotheses.
:- ope 500, xfx, : ).
disjoint( [J).
disjoint([H:PIRJ)
assert(hypothesis(H,P»,
make_disjoint(H,R),
disjoint(R).
make_disjoint(_,[]).
make_disjoint(H,[H2: _ I RJ)
assert(nogood(H,H2»,
assert(nogood(H2,H»,
make_disjoint(H,R).
A.3 Explaining

To find an explanation for a subgoal G we execute explain(G). This creates a list of solved explanations and the probability mass found (in "done"), and creates an empty priority queue.

explain(G) :-
    assert(done([],0)),
    initQ,
    ex((G,ans(G)),[],1), !.
We can report the explanations found, the estimates of the prior probability of the hypothesis, etc., by defining ans(G, D, C), which means that we have found an explanation D of G with probability C.

ans(G,[],_) :-
    writeln([G,' is a theorem.']), !.
ans(G,D,C) :-
    allgood(D),
    qmass(QM),
    retract(done(Done,DC)),
    DC1 is DC+C,
    assert(done([expl(G,D,C)|Done],DC1)),
    TC is DC1 + QM,
    writeln(['Probability of ',G,' = [',DC1,',',TC,']']),
    Pr1 is C / TC,
    Pr2 is C / DC1,
    writeln(['Explanation: ',D]),
    writeln(['Prior = ',C]),
    writeln(['Posterior = [',Pr1,', ',Pr2,']']).
more is a way to ask for more answers. It will take
the top priority partial proof and continue with it.
more :- ex(fail,_,_).
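As a usage sketch (not from the paper), the example of Section 2.4 can be loaded through the rule/1 and disjoint/1 declarations of Section A.2 and then queried with explain/1; the exact output depends on the writeln and priority-queue predicates assumed in this appendix.

% Hypothetical session using the example of Section 2.4.
:- rule((a :- b, h)),  rule((a :- q, e)),
   rule((q :- h)),     rule((q :- b, e)),
   rule((h :- b, f)),  rule((h :- c, e)),  rule((h :- g, b)),
   disjoint([b:0.3, c:0.7]),
   disjoint([e:0.6, f:0.3, g:0.1]).

% ?- explain(a).   reports the most likely explanation {e,c}, prior 0.42
% ?- more.         continues with {e,b}, then {f,b}, then {g,b}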
A.4 Auxiliary relations used
The following relations were also used. They can be
divided into those for managing the priority queue, and
those for managing the nogoods.
We assume that there is a global priority queue into
which one can put formulae with an associated cost and
from which one can extract the least cost formulae. We
assume that the priority queue persists over failure of
subgoals. It can thus be implemented by asserting into
a Prolog database, but cannot be implemented by carrying it around as an extra argument in a meta-interpreter
[Sterling and Shapiro, 1986], for example. We would like
both insertion and removal from the priority queue to be
carried out in log n time where n is the number of elements of the priority queue. Thus we cannot implement
it by having the queue asserted into a Prolog database
if the asserting and retracting takes time proportional
to the size of the objects asserted or retracted (which it
seems to in the implementations we have experimented
with).
Four operations are defined:

initQ
initialises the queue to be the empty queue, with zero queue mass.

add_to_PQ(process(D, C, U))
adds assumption set D, with probability C and remaining subgoals U, to the priority queue, and adds C to the queue mass.

remove_from_PQ(process(D, C, U))
if the priority queue is not empty, extracts the element with highest probability (highest value of C) from the priority queue and reduces the queue mass by C. remove_from_PQ fails if the priority queue is empty.

qmass(M)
returns the sum of the probabilities of elements of the queue.

ex(G, D, C) tries to prove G with assumptions D such that the probability of D is C. If G cannot be proven, a partial proof is taken from the priority queue and restarted. This means that ex(G, D, C) succeeds if there is some proof that succeeds.

ex(G,D,C) :-
    prove(G,D,_,C,_,true).
ex(_,_,_) :-
    remove_from_PQ(process(D,C,U)), !,
    ex(U,D,C).
We assume the relation for handling nogoods:

allgood(L)
fails if L has a subset that has been declared nogood.

Acknowledgements

Thanks to Andrew Csinger, Keiji Kanazawa and Michael Horsch for valuable comments on this paper. This research was supported under NSERC grant OGP0044121, and under Project B5 of the Institute for Robotics and Intelligent Systems.
References
[Apt and Bezem, 1990] K. R. Apt and M. Bezem.
Acyclic programs (extended abstract). In Logic Programming: Proceedings of the Seventh International
Conference, pages 617-633. MIT Press, 1990.
[Clark, 1978] K. L. Clark. Negation as failure. In H. Gallaire and J. Minkel', editors, Logic and Databases,
pages 293-322. Plenum Press, New York, 1978.
[Console et al., 1991] L. Console, D. Theseider Dupre,
and P. Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 1991.
[Cox and Pietrzykowski, 1987] P.
T.
Cox
and T. Pietrzykowski. General diagnosis by abductive inference. Technical Report CS8701, Computer
Science, Technical University of Nove Scotia Halifax
April 1987.
'
,
[de Kleer and Williams, 1989] J. de Kleer and B. C.
Williams. Diagnosis with behavioral modes. In Proc.
11th International Joint Con/. on Artificial Intelligence, pages 1324-1330, Detroit, August 1989.
[Goebel et al., 1986] R. Goebel, K. Furukawa, and
D. Poole. Using definite clauses and integrity constraints as the basis for a theory formation approach
to diagnostic reasoning. In E. Shapiro, editor, Proc.
Third International Conference on Logic Programming, pages 211-222, London, July 1986.
[Korf, 1985] K. E. Korf. Depth-first iterative deepening:
an optimal admissable tree search. Artificial Intelligence, 27(1):97-109, September 1985.
[Lloyd, 1987] J. W. Lloyd. Foundations of Logic Programming. Symbolic Computation Series. SpringerVerlag, Berlin, second edition, 1987.
[Naish, 1986] L. Naish. Negation and Control in Pro10g.Lecture Notes in Computcr Scicnce 238. Springer
'
Verlag, 1986.
[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[Pereira and Shieber, 1987] F. C. N. Pereira and S. M.
Shieber. Prolog and Natural-Language Analysis. Center for the Study of Language and Information, 1987.
[Pool~,
1988] D,. Pool~. Representing knowledge for
logIc-based dIagnosIs. In International Conference
on Fifth Generation Computing Systems, pages 12821290, Tokyo, Japan, November 1988.
[Poole, 19~1a] D. Poole. Compiling a default reasoning
system mto Prolog. New Generation Computing Journal, 9(1):3-38, 1991.
[Poole, 1991b] D. Poole. Representing Bayesian networks within probabilistic Horn abduction. In Proc.
Seventh Con/. on Uncertainty in Artificial Intelligence, pages 271-278, Los Angeles, July 1991.
[Poole, 1991c] D. Poole. Representing diagnostic knowledge for probabilistic Horn abduction. In Proc. 12th
International Joint Conf. on Artificial Intelligence,
pages 1129-1135, Sydney, August 1991.
[Poole, 1992a] D. Poole. Probabilistic Horn abduction
and I3ayesian networks. Technical Report 92-2, Department of Computer Science, University of I3ritish
Columbia, January 1992.
[Poole, 1992b] D. Poole. Search for computing posterior
probabilities in I3ayesian networks. Proc. Eighth Con/.
on Uncertainty in Artificial Intelligence, submitted,
Stanford, California, July 1992.
[Pople, 1973] H. E. Pople, Jr. On the mechanization
of abductive logic. In Proc. 3rd International Joint
Conf. on Artificial Intelligence, pages 147-152, Stanford, August 1973.
[Reiter and de Kleer, 1987] R. Reiter and J. de Kleer.
Foundations of assumption-based truth maintenance
systems: preliminary report. In Proc. 6th National
Conference on Artificial Intelligence, pages 183-188,
Seattle, July 1987.
[Sattar and Goebel, 1991] A. Sattar and R. Goebel. Using crucial literals to select better theories. Computational Intelligence, 7(1):11-22, February 1991.
[Sterling and Shapiro, 1986] L. Sterling and E. Shapiro.
The Art of Prolog. MIT Press, Cambridge, MA, 1986.
[Stickel, 1988] M. E. Stickel. A prolog-like inference system for computing minimum-cost abductive explanations in natural language interpretations. Technical Note 451, SRI International, Menlo Park, CA,
September 1988.
Abduction in Logic Programming with Equality
P.T. Cox, E. Knill, T. Pietrzykowski
Technical University of Nova Scotia, School of Computer Science
P.O. Box 1000, Halifax, Nova Scotia
Canada B3J 2X4
Abstract
Equality can be added to logic programming by using
surface deduction. Surface deduction yields interpretations of unification failures in terms of residual hypotheses needed for unification to succeed. It can therefore
be used for abductive reasoning with equality. In surface deduction the input clauses are first transformed to
a flat form (involving no nested terms) and symmetrized
(if necessary). They are then manipulated by binary
resolution, a restricted version of factoring and compression. The theoretical properties of surface deduction,
including refutation completeness and weak deductive
completeness properties (relative to equality), are established in [Cox et al. 1991]. In this paper we show that
these properties imply that an enhancement of surface
deduction will yield all parsimonious hypotheses when
used as an abductive inference engine. The characterization of equational implication for goal clauses given
in [Cox et al. 1991] is shown to yield a uniquely defined
equationally equivalent residuum for every goal clause.
The residuum naturally represents the corresponding abductive hypothesis. An example illustrating the use of
surface deduction in abductive reasoning is presented.
1 Introduction
In abductive reasoning, the task is to explain a
given observation by introducing appropriate hypotheses
([Cox and Pietrzykowski 1987], [Goebel 1990]). Most
presentations of abduction do not include reasoning with
equality, nor do they allow the introduction of equality assumptions to explain an observation. A notable
exception is E. Charniak's work on motivation analysis [Charniak 1988]. Charniak allows the introduction of
certain restricted equality assumptions to determine motivations for observed actions. He shows that the introduction of such equality assumptions is required to successfully abduce motivations. In this paper we consider
the problem of abductive reasoning with Horn clauses in
the presence of equality. We show that surface deduction has the necessary properties for use in an abductive
inference system provided that the input theory contains
the function substitutivity axioms.
In the presence of equality, an abduction problem consists of a theory T and a formula O (the observation). An explanation of (O, T) is a formula E consistent with T such that E together with T equationally implies O. We will assume that O and E are existentially quantified conjunctions of facts and that T is a Horn clause theory.
One way to obtain an explanation E, given an observation O and a theory T, is to deduce ¬E from T and ¬O. Since explanations with less irrelevant information are preferred (the parsimony principle), it is sufficient to deduce a clause ¬E' such that ¬E' implies ¬E. Intuitively, E' is at least as good an explanation as E (see Section 4). It follows that a deduction system adequate for abductive reasoning should satisfy a weak deductive completeness: if the theory T implies a non-tautological clause ¬E, then we must be able to deduce a clause ¬E' from T such that ¬E' implies ¬E. In the absence of equality, SLD-resolution (see [Lloyd 1984]) satisfies this condition.
The problem of introducing equality to Horn clause
logic has been well-studied, see [Holldobler 1989] for an
excellent overview. The simplest approach to this problem involves adding the equality axioms (which are Horn
clauses) to the set of input clauses. However, unrestricted use of these axioms results in inefficiency. Furthermore, this approach does not yield any insights into
the degree to which the equality axioms are needed.
Paramodulation and other term rewriting systems do
not explicitly introduce new equality assumptions into
derivations and therefore do not satisfy the weak deductive completeness condition. Other approaches, such as
the ones in [van Emden and Lloyd 1984] and extended
in [Hoddinott and Elcock 1986] using the homogeneous
form of clauses, require restricting the form of the input
theory. Here, we use the results of [Cox et al. 1991] to
show that if equality is introduced to Horn clause logic
via surface deduction with the function substitutivity axioms, then all preferred explanations for an abduction
problem can be obtained. The need for axioms of equality other than function substitutivity is thus eliminated.
In surface deduction, a set of input clauses is first
transformed to a flat form and symmetrized. The deduction then proceeds using linear input resolution for Horn
clauses (see [Lloyd 1984]) together with a limited use of
factoring and a new rule called compression. The additional deduction rules are equivalent to those restricted
uses of the reflexivity axiom (x ≈ x :-) which preserve
flatness. They are required only at the end of a deduction.
A clause is flat if it has no nested functional expressions, and every variable which appears immediately to
the right of an equality symbol (≈) appears only in such
positions. A stronger version of flatness requires that in
addition the clause is separated. This means that every
variable appears at most once in any given literal and has
only one occurrence inside a functional or relational expression. Symmetrization affects only those clauses with
equalities in their heads (see Section 3).
The idea of using flattening to add equality to theorem proving is due to [Brand 1975] and is applied to logic programming in [Cox and Pietrzykowski 1986], where surface deduction is defined. Flattening is closely related to narrowing. In narrowing the process of flattening is implicit in the deduction rules.
The relationship between the two methods is examined in [Bosco et al. 1988]. Separation of terms is implicit in the transformations to the homogeneous forms
of [Hoddinott and Elcock 1986]. The symmetrization
method used here is similar to the one introduced in
[Chan 1986] and does not increase the number of clauses
in the theory.
In [Cox et al. 1991] it is shown that surface deduction
satisfies a weak deductive completeness provided that the
input clauses are first transformed to separated form. As
an application of this result, equational implication for
goal clauses is found to have a simple syntactic characterization analogous to subsumption.
Once an explanation E is obtained by surface deduction, in what form should E be presented? For example
if ¬E (the actual clause deduced) is given by
:- x ≈ a, y ≈ b, y ≈ c,
then :- y ≈ b, y ≈ c is equationally equivalent to ¬E. Therefore the atom x ≈ a is irrelevant and should be removed. In Section 4 it is shown that the characterization of equational implication for goal clauses given
in [Cox et al. 1991] implies that for every goal clause G
there is a uniquely defined equational residuum RES( G)
which cannot be further reduced without weakening
the corresponding explanation. The notion of equational residuum is related to that of prime implicates
used in switching theory [Kohavi 1978], truth maintenance systems [Reiter and de Kleer 1987] and diagnoses [de Kleer et al. 1988]. RES(G) is an equational prime implicate of a flattening of G.
In Section 2 the terminology is established; in Section 3 surface deduction is defined and the completeness
results needed for abductive reasoning are given. In Section 4 the formalism of abductive reasoning with surface
deduction is discussed; and finally in Section 5 an example is presented of an abductive problem solved by using
surface deduction.
2 Preliminaries
Familiarity with logic programming is assumed (see e.g. [Lloyd 1984]). As in [Holldobler 1990], let ≈ denote the equality predicate symbol. The usual equality symbol = is used exclusively for syntactic equality. If L is an atom and C = {M1, ..., Mn} is a set of atoms, then L :- C denotes the Horn clause L ∨ ¬M1 ∨ ... ∨ ¬Mn. In this expression, L is the head and C is the body of the clause. A clause of the form :- C is a goal clause. The atoms of C are the subgoals of :- C. A clause of the form L :- is a fact. If C1, ..., Cn are sets of atoms and C is the union of the Ci, then L :- C1, ..., Cn means L :- C. When possible, set notation is omitted for one-element sets.
If OP is an operation which maps clauses to clauses and A is a set of clauses, then OP(A) = {OP(C) | C ∈ A}. Let σ be a substitution. If xiσ = ti for i = 1, ..., n and xσ = x for all other variables, then σ is denoted by {x1 ← t1, ..., xn ← tn}. A substitution σ is variable-pure iff xσ is a variable for every variable x.
The expression 'most general unifier' is abbreviated by 'mgu'. An equality is an atom of the form s ≈ t. Let ε be the set of equality axioms other than x ≈ x :-. If A and B are sets of clauses, then A satisfies (or implies) B iff every model of A is a model of B. A equationally satisfies (or implies) B iff A ∪ ε ∪ {x ≈ x :-} satisfies B. A and B are (equationally) equivalent iff each (equationally) satisfies the other. A is equationally inconsistent iff A equationally implies the empty clause.
3 Surface Deduction
In surface deduction, a refutation of a set of input clauses
proceeds by first transforming the input clauses to a flat
form and then refuting the result using resolution, factoring and compression. The transformation subsumes
the equality axioms other than reflexivity. The rules of
factoring and compression subsume reflexivity.
Definition. Let C be a clause and t a term. An occurrence of t on the left-hand side (right-hand side) of an equality t ≈ s (s ≈ t) in C is a root (surface) occurrence of t in C. Every other occurrence of t is an internal occurrence of t. The term t is a root term of C iff it has a root occurrence in C. Surface and internal terms are defined analogously.
Definition. A clause C is flat iff
(i) every atom of C is of the form P(x1, ..., xn), x ≈ f(x1, ..., xn) or x ≈ y, and
(ii) no surface variable of C is a root or internal variable of C.
Definition. Let C be a Horn clause. An elementary flattening of C is obtained by either
(i) replacing some of the non-surface occurrences of a non-variable term t by a new variable y and adding the equality y ≈ t to the body, or
(ii) replacing some of the surface occurrences of a root or internal variable x of C by a new variable y and adding the equality x ≈ y to the body.
An elementary flattening of the set of clauses A is obtained by replacing a clause in A by an elementary flattening of that clause.
Modifying a clause C by successive elementary flattenings eventually results in a flat clause (a flattening of
C) which cannot be flattened any further (Theorem 2
of [Cox and Pietrzykowski 1986]).
Definition. Let C be a clause. Then FLAT(C) denotes an (arbitrary but fixed) flattening of C.
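For instance (an illustrative clause, not one used in the text), the clause
p(f(x)) :- q(x)
is not flat because of the nested term f(x) inside p; one elementary flattening replaces that occurrence of f(x) by a new variable y and adds the corresponding equality, giving the flat clause
p(y) :- y ≈ f(x), q(x),
which cannot be flattened further and so may serve as FLAT of the original clause.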
For any set of clauses A, FLAT(A) is equationally
equivalent to A. In [Cox et al. 1991] it is shown that for
refutation completeness the transformation FLAT subsumes the substitutivity axioms but not transitivity and
symmetry.
In order to subsume transitivity and symmetry, we
need another transformation.
Definition. Let C be a clause with an equality in its head. Then C is symmetric iff C is of the form
x ≈ u :- x ≈ v, s ≈ v, y ≈ u, y ≈ t, M
for some terms s and t and set of atoms M, where x, y, u and v do not occur in M, s or t. The set of clauses A is symmetrized iff every clause C of A with an equality in its head is symmetric.
Definition. Let C be a Horn clause. If C does not have an equality in its head or if C is symmetric, then the symmetrization SYM(C) of C is C. If C is not symmetric and of the form s ≈ t :- M, then SYM(C) is given by
x ≈ u :- x ≈ v, s ≈ v, y ≈ u, y ≈ t, M.
Note that if A is a set of Horn clauses, then SYM(A)
is equationally equivalent to A, and if A is flat, then
SYM(A) is flat. In [Cox et al. 1991] it is shown that
the transformation SYM subsumes transitivity and symmetry. In order to subsume substitutivity, transitivity
and symmetry, the transformations SYM and FLAT are
composed.
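As a small illustration (again not from the text), the flat clause
x1 ≈ a :- x1 ≈ b
has an equality in its head and is not symmetric; instantiating the schema above with s = x1 and t = a gives its symmetrization
x ≈ u :- x ≈ v, x1 ≈ v, y ≈ u, y ≈ a, x1 ≈ b.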
Flattening and symmetrization followed by SLD-resolution using resolution with x ≈ x :- as an additional
deduction rule is refutation complete for logic programming with equality. However, weak deductive completeness is not satisfied [Cox et al. 1991]. In order to obtain
weak deductive completeness an additional transformation is required.
Definition. A positive (negative) root occurrence of
the term t in the clause C is a root occurrence in the
head (body) of C.
Definition. The flat clause C is separated in the variable x iff
(i) every literal of C has at most one occurrence of
x,
(ii) C has at most one internal occurrence of x, and
(iii) if x has an internal occurrence in C, then x has
a negative root occurrence in C.
The clause C is separated iff C is separated in all its
variables.
If A is a set of separated flat Horn clauses, then SYM(A) is separated. Separated clauses can be obtained from a given flat clause by using the transformation SEP:
Definition. Let C be a flat clause and x a variable. The clause SEP(C) is the separated flat clause obtained by applying the following transformation to C: for every variable x such that C is not separated in x, replace each internal occurrence of x by a new variable xi and add the equalities x ≈ y, x1 ≈ y, x2 ≈ y, ... to the body of C (where y is a new surface variable).
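For example (an illustrative flat clause), the clause
p(z) :- z ≈ f(x), q(x)
is not separated in x, since x has two internal occurrences and no negative root occurrence. Applying SEP replaces those internal occurrences by new variables x1 and x2 and links them through a new surface variable y, giving the separated flat clause
p(z) :- z ≈ f(x1), q(x2), x ≈ y, x1 ≈ y, x2 ≈ y.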
The rules of factoring and compression used in surface
deduction are:
(i) Root factoring. The clause C' is a root factor of C iff C' is obtained by factoring two equalities of C with the same root variable.
(ii) Surface factoring. The clause C' is a surface factor of C iff C' is obtained by factoring two equalities of C with the same surface term.
(iii) Root compression. The clause C' is a root compression of C iff C' is obtained by removing an equality x ≈ t from the body of C, where x has only one occurrence in C.
(iv) Surface compression. The clause C' is a surface compression of C iff C' is obtained by removing an equality x ≈ y from the body of C, where y has only one occurrence in C.
A compression is a root or surface compression. A compression of a clause C is a clause C' obtained from C by
a sequence of applications of compression rules.
The soundness of root and surface factoring and
compression (in the presence of equality) is shown
in [Cox and Pietrzykowski 1986]. Observe that binary
resolution, surface and root factoring and compression preserve flatness. The relationship between factoring, compression and resolution with the reflexivity axiom is determined by the following result (proved
implicitly in [Cox and Pietrzykowski 1986] and explicitly in [Cox et al. 1991]; see also [Hoddinott and Elcock
1986]):
Theorem 3.1 Let :- C be a flat goal clause. If :- C' is a flat goal clause obtained from :- C by a sequence of binary resolutions with x ≈ x :-, then :- C' can be obtained from :- C by a sequence of root and surface factorings and compressions.
Definition. Let A be a set of flat Horn clauses. The flat goal clause C is S-deducible from A iff C can be obtained from A by a sequence of binary resolutions, surface and root factorings and compressions. Note that we can assume that the deduction is linear. A is S-refutable iff the empty clause is S-deducible from A.
To state the weak deductive completeness result for flat, separated and symmetrized clauses, we need the transformation defined next.
Definition. Let :- C be a flat goal clause. Then :- C is reduced iff :- C has no surface variables and no two equalities of :- C have the same right-hand sides. A flat reduced clause REDU(:- C) is obtained from :- C by factoring equalities with identical right-hand sides until all right-hand sides are distinct, and by removing all remaining equalities with surface variables by surface compression. Note that for every flat goal clause :- C, REDU(:- C) is equationally equivalent to :- C.
Theorem 3.2 [Cox et al. 1991] Let :- C be a goal clause and A a set of Horn clauses which includes the function substitutivity axioms. Then A equationally implies :- C iff there is a flat goal clause :- C' such that for some variable-pure substitution σ, :- C'σ ⊆ REDU(FLAT(:- C)) and :- C' is S-deducible from SYM(SEP(FLAT(A))).
As an application of this result, the following theorem is proved in [Cox et al. 1991]:
Theorem 3.3 Let :- A and :- B be goal clauses. Then :- A equationally implies :- B iff there is a variable-pure substitution σ such that a compression of FLAT(:- A)σ is included in REDU(FLAT(:- B)).
Definition. Let :- C be a goal clause. An equational residuum of :- C is a minimal subclause of REDU(FLAT(:- C)) which is equationally equivalent to :- C.
Every equational residuum of :- C is equationally equivalent to :- C. The fact that every subclause of a reduced clause is reduced implies that if :- C' is an equational residuum of :- C, then :- C' is reduced. The next theorem shows that the equational residuum is unique.
Theorem 3.4 [Cox et al. 1991] Let :- A' and :- B' be equational residua of the goal clauses :- A and :- B respectively. Then :- A is equationally equivalent to :- B iff :- A' is a variant of :- B'.
4 Abduction Using Surface Deduction
An existential conjunction of facts is a conjunction of facts with all its free variables quantified existentially. The abduction problem for Horn clause logic with equality can be stated as follows:
Abduction Problem: An abduction problem is a pair (A, O), where A is a theory of Horn clauses and O (the observation) is an existential conjunction of facts. An explanation of the abduction problem (A, O) is an existential conjunction of facts E consistent with A such that E and A equationally imply O.
Let ¬O and ¬E denote the disjunctions of the negations of the constituent facts of O and E respectively. Since E and A equationally imply O iff ¬O and A equationally imply ¬E, a solution to an abduction problem can be obtained by deducing a clause C from A and ¬O, and negating C to obtain E.
In general, it is desirable for an explanation E of an abductive problem (A, O) to have certain additional properties (see [Cox and Pietrzykowski 1987]). For example, an explanation E should not contain any facts not required to yield the observation from A (the parsimony principle). Thus if E and E' are explanations of (A, O) and E equationally implies E', E' is preferred over E. (Here 'preferred' is to be understood as 'at least as good as'.)
For abduction, a desirable property of a deduction system is that for every explanation E of an abductive problem (A, O), one can obtain an explanation preferred over E. The weak completeness result of Theorem 3.2 implies that surface deduction with separated clauses and the function substitutivity axioms has this property.
Theorem 4.1 Let (A, O) be an abductive problem, where A contains the function substitutivity axioms. Then for every explanation E of (A, O), there is an explanation E' preferred over E such that ¬E' is S-deducible from SYM(SEP(FLAT(A))) ∪ {SEP(FLAT(¬O))}.
Proof. This follows by Theorem 3.2 and the fact that ¬O is a goal clause, so that it does not need to be symmetrized. •
Fortunately, it appears that the function substitutivity axioms are rarely needed in abductive problems when using surface deduction with separated clauses.
Flattenings of a clause can be viewed as alternate
representations of the clause's term structure and are
therefore essentially equivalent. Without loss of generality we restrict our attention to explanations E such that
¬E is flat (flat explanations).
If E and E' are explanations of (A, O) such that E equationally implies E' but is not equationally equivalent to E', then E' is strictly preferred over E. Given an explanation E of (A, O) there are many equationally equivalent existential conjunctions of facts, all of which are also explanations of (A, O). The preference criteria
introduced so far do not distinguish among equationally
equivalent explanations. Using the intuition that a "simpler" explanation should be preferred, we give a stronger
definition of preference:
Definition. Let E and E' be flat explanations. Then
E' is strictly preferred over E iff either E equationally
implies E' but is not equivalent to E', or E is equationally equivalent to E' and E' has fewer atoms.
Given these preference criteria, we have the following
theorem which determines the most preferred flat explanation among equationally equivalent ones:
Theorem 4.2 For any explanation E, if E' is the negation of the equational residuum of ¬E, then E' is the unique most preferred flat explanation among flat explanations equationally equivalent to E.
Proof. Let :- A be a flat clause equationally equivalent to ¬E. If :- A is not reduced, then REDU(:- A) has fewer atoms than :- A and the corresponding explanation is therefore strictly preferred. Assume that :- A is reduced. If the equational residuum of :- A is not given by :- A, then the equational residuum of :- A has fewer atoms than :- A, so that the corresponding explanation is strictly preferred. The result now follows by the uniqueness theorem for equational residua, Theorem 3.4. •
5 An Application
Examples from the domain of story comprehension and motivation analysis which demonstrate the need for the inclusion of equality in abductive reasoning are given in [Charniak 1988]. Here we give an example from a different domain.
Consider the following (imaginary, but realistic) situation. A researcher X experimentally determines the
value of a quantity associated with a physical object (e.g.
the mass of an isotope of an element) and sends us the
result. We have independently obtained a value for the
same quantity (by theory and/or experiment) and our
value differs from X's value. We believe our value to
be correct and we would like to explain the discrepancy.
We do not know the exact means by which X's value
was obtained, but we know what kinds of experimental
apparatus X might have used. One kind of apparatus
(type A) is notorious for a hard-to-control drift in the
settings which results in a systematic bias in the readings. Thus we can explain the discrepancy between our
and X's values by hypothesizing that X used apparatus
of type A with a systematic bias equal to the difference
between the two values.
The situation is formalized as follows: Let TA(x) mean that x is an apparatus of type A. Let Vt(y) be the true value of quantity y, Vm(z,y) the value of quantity y measured in experiment z, A(u) the apparatus used in experiment u and B(x) the systematic bias of apparatus x. The quantity measured by X is q, and the experiment performed by X is given the name e. With these definitions, our knowledge T consists of the clauses
T1: Vt(q) ≈ 0 :-
T2: Vm(x1, x2) ≈ Vt(x2) + B(A(x1)) :- TA(A(x1))
T3: x1 ≈ 0 + x1 :-
where knowledge about other types of apparatus and theorems about real numbers other than T3 have been omitted. The observation O is given by
O: Vm(e, q) ≈ 2 :-
The first task is to obtain a flattening of T and the negation of the observation:
fT1: x1 ≈ 0 :- x1 ≈ Vt(x2), x2 ≈ q.
fT2: x4 ≈ x5 + x6 :- TA(x3), x6 ≈ B(x3), x4 ≈ Vm(x1, x2), x5 ≈ Vt(x2), x3 ≈ A(x1).
fT3: x1 ≈ x2 + x1 :- x2 ≈ 0.
fO: :- x1 ≈ 2, x1 ≈ Vm(x2, x3), x2 ≈ e, x3 ≈ q.
The clauses fT1 and fO are separated. Separated clauses for fT2 and fT3 are given by
sfT2: x4 ≈ x5 + x6 :- TA(x3), x6 ≈ B(x7), x3 ≈ x8, x7 ≈ x8, x4 ≈ Vm(x1, x2), x5 ≈ Vt(x10), x2 ≈ x9, x10 ≈ x9, x3 ≈ A(x11), x1 ≈ x12, x11 ≈ x12.
sfT3: x1 ≈ x2 + x3 :- x3 ≈ x4, x1 ≈ x4, x2 ≈ 0.
All clauses of T have equalities in their heads and need to be symmetrized. The fully transformed set of clauses is given by
T1': x3 ≈ x4 :- x3 ≈ x5, x1 ≈ x5, x6 ≈ x4, x6 ≈ 0, x1 ≈ Vt(x2), x2 ≈ q.
T2': x13 ≈ x14 :- x13 ≈ x15, x4 ≈ x15, x16 ≈ x14, x16 ≈ x5 + x6, TA(x3), x6 ≈ B(x7), x3 ≈ x8, x7 ≈ x8, x4 ≈ Vm(x1, x2), x5 ≈ Vt(x10), x2 ≈ x9, x10 ≈ x9, x3 ≈ A(x11), x1 ≈ x12, x11 ≈ x12.
T3': x5 ≈ x6 :- x5 ≈ x7, x1 ≈ x7, x8 ≈ x6, x8 ≈ x2 + x3, x3 ≈ x4, x1 ≈ x4, x2 ≈ 0.
The negation of the desired explanation can now be deduced from the goal
0': :- x1 ≈ 2, x1 ≈ Vm(x2, x3), x2 ≈ e, x3 ≈ q.
As is usually the case, the function substitutivity axioms are not needed. The deduction proceeds by a resolution with T2', surface factoring followed by root factoring and compression, a resolution with T1', root and surface factoring and compression, a second resolution with T1', surface factoring and compression, a resolution with T3', surface and root factoring and compression, and finally reduction to the minimal residuum, which yields
:- TA(x6), x9 ≈ B(x6), x9 ≈ 2, x6 ≈ A(x2), x2 ≈ e.
The last clause is the negation of the desired explanation: the apparatus used in experiment e is of type A and has a systematic bias of 2, the difference between the two values. Note how two resolutions with T1' were used to simulate symmetry.
6 Conclusion
From a theoretical perspective, surface deduction is very
appealing in its simplicity. We have seen how (at least
in theory) surface deduction can be applied in situations
such as abductive reasoning where deduction rather than
refutation is the primary goal.
If the equality theory of interest contains function
substitutivity, a problem with using surface deduction
for abduction is that in general the function substitutivity axioms are still required. Current research indicates
that to a large extent, the function substitutivity axioms
can be ignored in abductive problems when using surface
deduction with symmetrized, separated and flat clauses.
We do not know any practical example where this is not
the case.
From a practical point of view, one of the frequently
recognized problems with flattening the clauses of the
input theory is that one loses most of the advantages of
unification, particularly if the input theory contains few
equalities. One can regain some of these advantages in
practice by interpreting the set of equalities in the body
of a clause as a directed graph or hypergraph (with arcs
from the root variables to the surface terms) which defines the set of possible definitions of the main terms
and variables of the clause. Such a directed graph generalizes the usual tree representation of terms. Unification and more generally term rewriting can then be
replaced by (hyper)graph rewriting rules. To implement
this idea, the deduction procedures must be substantially
enhanced. The types of graph rewriting rules and graph
representations needed require further research.
The preference criteria for explanations given in Section 4 are very weak. However, we believe that no matter
what preference criteria are used, RES( C) is at least as
good an explanation as C. One of the most important
problems in abductive reasoning is to determine stronger
preference criteria to avoid combinatorial explosion.
These issues are discussed in [Poole and Provan 1990].
Many of the results used in this paper can be generalized to arbitrary clauses so that the restriction of abductive reasoning to Horn clause theories can be removed.
These generalizations will be the topic of a forthcoming
paper.
References
[Baxter 1976] L. D. Baxter. The Complexity of Unification. Ph.D. Thesis, University of Waterloo, 1976.
[Bosco et al. 1988] P. G. Bosco, E. Giovannetti, and C. Moiso. Narrowing vs. SLD-Resolution. Theoretical Computer Science Vol. 59 (1988), pp. 3-23.
[Brand 1975] D. Brand. Proving Theorems with the Modification Method. SIAM J. Comput. Vol. 4 (1975), pp. 412-430.
[Chan 1986] K. H. Chan. Equivalent Logic Programs and Symmetric Homogeneous Forms of Logic Programs with Equality. Technical Report 86, Dept. of Computer Science, Univ. of Western Ontario, London, Ont., Canada, 1986.
[Charniak 1988] E. Charniak. Motivation Analysis, Abductive Unification, and Nonmonotonic Equality. Artificial Intelligence Vol. 34 (1988), pp. 275-295.
[Colmerauer et al. 1982] A. Colmerauer et al. Prolog II: Reference Manual and Theoretical Model. Groupe d'Intelligence Artificielle, Faculte des Sciences de Luminy, Marseilles, 1982.
[Cox and Pietrzykowski 1985] P. T. Cox and T. Pietrzykowski. Surface Deduction: a Uniform Mechanism for Logic Programming. In Proc. Symp. on Logic Programming, IEEE Press, Washington, 1985. pp. 220-227.
[Cox and Pietrzykowski 1986] P. T. Cox and T. Pietrzykowski. Incorporating Equality into Logic Programming via Surface Deduction. Ann. Pure Appl. Logic, Vol. 31 (1986), pp. 177-189.
[Cox and Pietrzykowski 1987] P. T. Cox and T. Pietrzykowski. General Diagnosis by Abductive Inference. In Proceedings of the Symposium on Logic Programming, IEEE Press, Washington, 1987. pp. 183-189.
[Cox et al. 1991] P. T. Cox, E. Knill, and T. Pietrzykowski. Equality and Abductive Residua for Horn Clauses. Technical Report TR-8-1991, School of Computer Science, Technical University of Nova Scotia, Halifax, NS, Canada, 1991.
[van Emden and Lloyd 1984] M. H. van Emden and J. W. Lloyd. A Logical Reconstruction of Prolog II. In Proc. 2nd Intl. Conf. on Logic Prog., Uppsala, 1984. pp. 35-40.
[Goebel 1990] R. Goebel. A Quick Review of Hypothetical Reasoning Based on Abduction. In AAAI Spring Symposium on Automated Abduction, Stanford University, 1990. pp. 145-149.
[Hoddinott and Elcock 1986] P. Hoddinott and E. W. Elcock. PROLOG: Subsumption of Equality Axioms by the Homogeneous Form. In Proceedings of the Symposium on Logic Programming, 1986. pp. 115-126.
[Holldobler 1989] S. Holldobler. Foundations of Equational Logic Programming. Lecture Notes in Computer Science 353, Springer Verlag, Berlin, 1989.
[Holldobler 1990] S. Holldobler. Conditional Equational Theories and Complete Sets of Transformations. Theoretical Computer Science, Vol. 75 (1990), pp. 85-110.
[de Kleer et al. 1988] J. de Kleer, A. K. Mackworth, and R. Reiter. Characterizing Diagnoses. In Proceedings Eighth National Conference on Artificial Intelligence, 1990. pp. 324-330.
[Kohavi 1978] Z. Kohavi. Switching and Finite Automata Theory. McGraw-Hill, 1978.
[Lloyd 1984] J. W. Lloyd. Foundations of Logic Programming. Springer Verlag, Berlin, 1984.
[Paterson and Wegman 1978] M. S. Paterson and M. N. Wegman. Linear Unification. J. Comput. Syst. Sci. Vol. 16 (1978), pp. 158-167.
[Poole 1988] D. Poole. A Logical Framework for Default Reasoning. Artificial Intelligence Vol. 36 (1988), pp. 27-47.
[Poole and Provan 1990] D. Poole and G. M. Provan. What is an Optimal Diagnosis? In Conference on Uncertainty in AI, Boston, 1990.
[Reiter and de Kleer 1987] R. Reiter and J. de Kleer. Foundations of Assumption-Based Truth Maintenance Systems: Preliminary Report. In Proceedings of the National Conference on Artificial Intelligence, 1987. pp. 183-188.
Hypothetico-deductive Reasoning
Chris Evans * and Antonios C. Kakas t
*Department of Mathematical Studies, Goldsmiths' College, University of London
New Cross, London SE14 6NW, UK. EMAIL: c.evans@gold.lon.ac.uk.
tDepartment of Computer Science, University of Cyprus, 75 Kallipoleos Street,
Nicosia, Cyprus. EMAIL: kakas@cyearn.earn
(Part of the research for this paper was completed while both authors were at Imperial College, London SW7 2BZ)
Abstract
This paper presents a form of reasoning called
"hypothetico-deduction", that can be used to address
the problem of multiple explanations which arises in
the application of abduction to knowledge assimilation
and diagnosis.
In a framework of hypothetico-deductive reasoning
the knowledge is split into the theory T and observable
relations S which may be tested through experiments.
The basic idea behind the reasoning process is to
formulate and decide between alternative hypotheses.
This is performed through an interaction between the
theory and the actual observations. The technique
allows this interaction to be user mediated, permitting the acquisition of further information through experimental tests. Abductive explanations which have
all their empirical consequences observed are said to be
"fully corroborated".
We set up the basic theoretical framework for
hypothetico-deductive reasoning and develop a
corresponding proof procedure. We demonstrate how
hypothetico-deductive reasoning deals with one of the
main characteristics of common-sense reasoning,
namely incomplete information, through the use of
partial corroboration. We study the extension of basic
hypothetico-deductive reasoning applied to theories
that incorporate default reasoning as captured by
negation-as-failure (NAF) in Logic Programming. This is applied to the domain of Temporal Reasoning, where NAF is used to formulate default persistence. We show
how it can be used successfully to tackle typical
problems in this domain.
1 Motivation
Abduction is commonly adopted as an approach to
diagnostic reasoning [Reggia & Nau, 1984], [Poole,
1988]. However, there are frequently many possible
abductive explanations for a given observation. This is
the problem of "multiple explanations". In order to
choose between these explanations it becomes
necessary to collect more information. Consider the
Crime Detection example formalized below (Theory
Tl).
Suppose we arrive at the scene of the crime and the
first observation we make is that someone is dead. We
seek an explanation for this on the basis of the theory
Tl above. Suppose we accept that there are only three
possible causes of death: being strangled, being
stabbed, or drinking arsenic (these are technically
known as the abducibles). Simple abduction starting
from the observation "dead" yields precisely these three
possible explanations. In order to choose between these
multiple explanations, we need to collect more
information. For example, if we examined the corpse
and discovered that there were marks on the neck, we
Theory T1
strangled → dead
stabbed → dead
poisoned → dead
strangled → neck_marks
stabbed → blood_loss
drunk_arsenic → poisoned
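One possible Prolog-style encoding of Theory T1 is sketched below; rule/2, abducible/1 and observable/1 are representation predicates chosen for this sketch, not notation from the paper.
   % rule(Conclusion, Antecedents): the six rules of Theory T1
   rule(dead, [strangled]).
   rule(dead, [stabbed]).
   rule(dead, [poisoned]).
   rule(neck_marks, [strangled]).
   rule(blood_loss, [stabbed]).
   rule(poisoned, [drunk_arsenic]).
   % the three abducibles named in the text
   abducible(strangled).
   abducible(stabbed).
   abducible(drunk_arsenic).
   % empirical relations that can be observed
   observable(dead).
   observable(neck_marks).
   observable(blood_loss).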
might take this as evidence for the first explanation
over the others. Moreover, we know that drinking
arsenic also has the consequence of leaving the victim
with a blue tongue, so we might like to look for that.
One approach to deciding between multiple
explanations is through the performance of crucial
experiments ([Sattar & Goebel, 1989]): pairs of
explanations are examined for contradictory
consequences, and an experiment is performed which
refutes one of them whilst simultaneously
corroborating the other. With n competing
explanations we must thus perform at most (n-1) crucial experiments.
The crucial experiment approach is, however, unable
to choose between explanations when they fail to have
contradictory consequences or when they have
contradictory consequences that are not empirically
determinable (e.g. Tychonic and Copernican world
systems). In our example, for instance, the explanations
"strangled" and "stabbed" are not incompatible. It is
possible that the victim was both strangled and stabbed.
As a result, there can be no crucial experiment that will
decide between the two. However, further evidence
might lead us to accept one explanation, whilst
tentatively rejecting the other. For example, knowledge
that the person exhibits marks on the neck supports the
"strangled" hypothesis. In fact we have all the
theoretically necessary observations to conclude that
the victim was strangled. On the other hand, the
"stabbed" hypothesis implies "blood_loss", which if not
observed might lead us to favour the "strangled"
explanation. Note that later evidence of blood loss
would lead us to return to the "stabbed" hypothesis (in
addition to "strangled"). From our viewpoint, crucial
experiments are the special case of general
hypothetico-deductive reasoning when an hypothesis is
refuted whilst simultaneously corroborating a second.
The process of hypothetico-deductive reasoning
allows the formation and testing of hypotheses within
an interactive framework which is applicable to a wide
class of applications and is implementable using
existing technology for resolution.
The technique of hypothetico-deductive reasoning
has its origin in the Philosophy of Science. It was
primarily proposed by opponents of Scientific
Induction. Its notable contributors were Karl Popper
([Popper, 1959],[Popper, 1965]), and Carl Hempel
[Hempel, 1965]. In its original context, hypothetico-deduction is a method of creating scientific theories by
making an hypothesis from which results already
obtained could have been deduced and which entails
new predictions that can be corroborated or refuted. It
is based on the idea that hypotheses cannot be derived
from observation, but once formulated can be tested
against observation.
The hypothetico-deductive mechanism we formulate,
resembles this method in having the two components of
hypothesis formation and corroboration. It differs from
the accepted usage of the term in philosophy of science
by the status of the hypothesis formation component.
In the philosophy of science, the process of hypothesis
formation is equivalent to theory formation: a creative
process in which a complete theory is constructed to
account for the known observations. By contrast, the
method we describe here starts with a fixed generalized
theory which is assumed to be complete and correct.
The task is to construct some hypotheses which when
added to the theory have the known observations as
logical consequences. The process is more akin to that
used by an engineer when they apply classical
mechanics to a particular situation: they don't seek a
new physical theory, but rather a set of hypotheses
which would explain what they have observed. Since,
for us, hypothesis formation can be mechanized, we do
not have to tackle the traditional issues of the
philosophy of science concerning the basis of theory
formation. We thus avoid (like Poole before us [Poole,
1988, p.28]) one of the most difficult problems of
science.
This paper is organized as follows. We first describe
the reasoning process and present the logical structure
of the reasoning mechanism, indicating how it relates to
classical deduction and model theory. Abductive and
corroborative derivation procedures for implementing
the reasoning process are then defined through
resolution. We indicate how this reasoning technique
relates to current work on abduction and diagnostic
reasoning, and suggest some possible extensions. We
illustrate the features and applicability of this reasoning
method with several examples. We then describe the
extension of hypothetico-deduction to apply to theories
which include some form of default reasoning, using
negation-as-failure as an example. We consider a
typical application of defaults in causal reasoning,
namely default persistence, and provide several further
examples which illustrate this extension.
2 Hypothetico-deductive Framework
Suppose we have a fixed logical theory T about the
world. For example, it might be a medical model of the
anatomy, or a representation of the connections in an
electrical network, or a model of the flow of urban
traffic in Madrid. Let us divide the relations in the
theory into two categories: empirical and theoretical.
How we make this distinction will depend on how we
interpret these relations in the domain for the theory.
An empirical relation is one which can be (or has been)
observed. For example, the blood pressure of a patient,
the status of a circuit-breaker (open or closed), or the
number of cars passing some point. By contrast, a
theoretical relation is in principle not observable.
Examples of theoretical relations might be infection
with an influenza virus, the occurrence of a short-circuit
from the viewpoint of a control centre, or the density of
traffic at some point.
Suppose we want an explanation for G on the basis
of the theory. By this, what we mean is "what relations
(we will call them hypotheses) might be true in order to
have given rise to G?". The answer to this question
could involve either theoretical or empirical relations.
In order to be confident that an explanation is the
correct explanation it is useful to test it. Explanations in
terms of empirical relations are directly testable. In the
simplest case we just consider the other observations we
have already made; in more complicated cases, we may
need to "go and look" or even perform an
"experiment". Explanations in terms of theoretical
relations must be tested indirectly, by deducing their
empirical consequences, and testing these.
Unfortunately, not all hypotheses that might give rise
to the observation G serve as explanations, regardless of whether they pass any tests. Some are too trivial such
as taking G as an explanation for itself. Others we rule
out as unsuitably shallow. For example, suppose we
sought an explanation for the observation "Jo laughed
at the joke"; one possible hypothesis is because "the
joke was funny". However, what we really wanted was a
deeper explanation: Why was the joke funny? We
therefore designate certain types of hypotheses as
explanatory (or, more strictly, "abducible").
The problem of explanation, as far as we are
concerned in this paper, is the problem of constructing
abducible hypotheses which when we add them to T
will have G as a logical consequence. Furthermore,
explanations must pass (direct or indirect) tests.
The process of constructing hypotheses which have
G as a deductive consequence is an example of
hypothesis formation. It is this stage that corresponds
to the "hypothetico-" component of hypotheticodeductive reasoning. The process of testing an
explanation is an example of corroboration. It is this
stage that corresponds to the "deductive" component
of hypothetico-deductive reasoning. This is because we
use deduction to determine the empirical consequences
of a given explanation. The process of hypothetico-deductive reasoning can now be formulated as the
construction of an explanation for an observation
through interleaving hypothesis formation and
corroboration.
3 The Hypothetico-deductive Mechanism
Let us consider the mechanism for hypothetico-deductive reasoning in more detail. To simplify matters we shall require that our theory is composed of rules and no facts. In logical terms, an hypothesis (and thus an explanation) will be a set of ground atomic well-formed formulae.
Suppose we have a (usually causal) theory T, an observation set O, a set of abducible atomic formulae A, and a particular observation G from O which we wish to explain. Let O' = O - G. In addition we define a set S, the observables, containing all the formulae that can occur in O.
There are three components to the reasoning
process: hypothesis formation, hypothesis
corroboration, and explanation corroboration. In
outline, we carry out hypothesis formation on G, and
for each component formula in the resultant
hypothesis. We repeat this process until all that remains
is a set of abducible relations constituting the
explanation. We also carry out hypothesis
corroboration at each formation point. Finally we
reason forwards from the explanation to perform
explanation corroboration.
Hypothesis Formation
From any ground atomic formula F we form an
hypothesis for that formula. This is done by
determining which rules in T might allow F as a
conclusion, and forming an hypothesis from the
antecedents of each such rule (after carrying out the
relevant substitutions dictated by F). Each hypothesis is
thus sufficient to allow the conclusion of F.
Hypothesis Corroboration
An hypothesis for an observation may contain
instances of observables defined by S. For each such
component we check to see whether it is an observation
recorded in O'. If it is a member of O' then it is
corroborated and we can retain it. However, where any
component is not corroborated in this fashion, we reject
the entire hypothesis.
Explanation Corroboration
An hypothesis H which is composed entirely of
instances of abducible predicates defined by A is an
explanatory hypothesis. To corroborate H, we use T to
reason forwards from H as an assumption. Each logical
consequence of H which is also an instance of an
observable is checked against 0' for corroboration
(similar to "hypothesis corroboration"). If it does not
occur in O' then the original hypothesis H is rejected. If
all observable consequences are corroborated, then the
explanation H is said to be corroborated.
In general, rules may have more than one literal in
their antecedent. We must also check the satisfaction of
the other literals in a given rule by reasoning backwards
until we reach either one of the observations in O' or
one of the other explanatory hypotheses. If neither of
these two situations arise, the rule is discarded from the
forward reasoning process.
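The hypothesis-formation step just described can be sketched directly over the rule/2 encoding given earlier (again illustrative, not the authors' implementation):
   % collect the antecedent set of every rule whose conclusion matches F
   hypotheses_for(F, Hypotheses) :-
       findall(Body, rule(F, Body), Hypotheses).
   % ?- hypotheses_for(dead, Hs).
   % Hs = [[strangled], [stabbed], [poisoned]]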
We make a distinction between corroboration failure,
where an hypothesis or prediction does not occur in the
observation set O', and refutation, where the negation of an hypothesis or prediction occurs in O'. Normally the form of O and T means that refutation is impossible
(see the next section for details of this form). Later we
suggest an extension which allows the possibility of
refutation in addition to corroboration failure. In cases
where it is natural to apply the closed world assumption
to O, these two situations will coincide.
4 The Logical Structure of Hypothetico-deductive Reasoning
Suppose we have a theory T composed of definite Horn clauses and an observation set of ground atomic well-formed formulae O. Let the set of ground atomic formulae which can occur in O be S, the observables. Similarly, let us define a set of distinguished ground atomic formulae A, the abducibles, in terms of which all explanations must be constructed. An explanation will be a subset of A. We will assume that the theory T alone does not entail any empirical observation without some other empirical input, i.e. there does not exist any formula φ such that φ ∈ S and T ⊨ φ. Consider also a ground atomic formula G (a member of S) for which we seek an explanation.
Given the 4-tuple <T, O, A, S>, a corroborated explanation Δ for G is a set of ground atomic well-formed formulae which fulfils all of the following criteria:
(1) Each formula in Δ must be a member of A.
(2) T ∪ Δ ⊨ G
(3) If T ∪ Δ ⊨ Ω and Ω ⊆ S, then Ω ⊆ O.
An explanation set Δ which satisfies (1) and (2) but not (3) is said to be uncorroborated.
This formulation is easily generalized to explanation
for multiple observations by simply replacing G with a
conjunction of ground atomic formulae.
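As an illustration (using the crime-detection theory T1 from Section 1 and an assumed observation set), take O = {dead, neck_marks} and G = dead. The set Δ = {strangled} satisfies (1) and (2), and every observable it entails (dead and neck_marks) is in O, so it is a corroborated explanation. The set Δ = {stabbed} also satisfies (1) and (2), but T ∪ {stabbed} ⊨ blood_loss with blood_loss ∈ S and blood_loss ∉ O, so it is uncorroborated.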
We note that since at this stage we have taken our theories to be Horn, a simple extension to hypothetico-deductive reasoning allows us to distinguish between explanation refutation, when a prediction is inconsistent with observation, and merely the failure of corroboration, where a prediction is consistent with known observations but not present in them. Such an extension would allow a hypothetico-deductive system to deal with circumstances where our observations cannot ever be complete (where we know our fault-detection system is itself fallible, for instance). We could then discard only those explanations that are refuted, and order the remaining ones according to their degree of corroboration (corresponding to Popper's notion of verisimilitude, [Popper, 1965]). A later section discusses the extension of hypothetico-deductive reasoning to theories which include negation-as-failure.
This extended version of hypothetico-deductive
reasoning is non-monotonic because later information
might serve to refute a partially corroborated
explanation. To return to our first example for instance,
the observation that the victim does not have a blue
tongue would lead us to reject the hypothesis that they
had drunk arsenic (even if previously this hypothesis
had some observational consequences which had been
observed).
5 Hypothetico-deductive Proof Procedure
A resolution proof procedure which implements hypothetico-deductive reasoning is formally presented below. Basically we define two types of derivation:
abductive derivation and corroboration derivation
which are then interleaved to define the proof
procedure. Abductive derivation corresponds to the
processes of hypothesis formation and corroboration,
deriving hypotheses for goals. Corroboration derivation
corresponds to the process of explanation
corroboration, deriving predictions from goals. There
are two different ways to interleave the abductive and
deductive components of the reasoning mechanism.
One approach is to derive all the abducible literals in
the hypothesis for an observation, before any of them
are corroborated. The second approach attempts
corroboration as soon as an abducible literal is derived,
postponing consideration of other (non-abducible)
literals in the hypothesis. Here we present a proof
procedure based on the second approach.
Definition (safe selection rule)
A safe selection rule R is a (partial) function which,
given a goal ~ Li, ... , Lk k~l returns an atom Li,
i=l, ... ,k such that:
either
i)
Li is not abducible;
Lj is ground.
or
ii)
Definition (Hypothetico-deductive proof procedure)
An abductive derivation from (G1 Δ1) to (Gn Δn) via a safe selection rule R is a sequence
(G1 Δ1), (G2 Δ2), ..., (Gn Δn)
such that for each i ≥ 1, Gi has the form ← L1, ..., Lk, R(Gi) = Lj and (Gi+1 Δi+1) is obtained according to one of the following rules:
A1) If Lj is neither an abducible nor an observable, then Gi+1 = C and Δi+1 = Δi where C is the resolvent of some clause in T with Gi on the selected literal Lj;
A2) If Lj is observable, then Gi+1 = C and Δi+1 = Δi where C is the resolvent of C': ← L1', ..., Lj', ..., Lk' with some clause in T on Lj', where ← L1', ..., Lj-1', Lj+1', ..., Lk' is the resolvent of Gi with some clause (ground assertion) Lj' in O on the selected literal Lj;
A3) If Lj is abducible and Lj ∈ Δi, then Gi+1 = ← L1, ..., Lj-1, Lj+1, ..., Lk and Δi+1 = Δi;
A4) If Lj is abducible and Lj ∉ Δi and there exists a corroboration derivation from ({Lj} Δi ∪ {Lj}) to ({} Δ') then Gi+1 = ← L1, ..., Lj-1, Lj+1, ..., Lk and Δi+1 = Δ'.
Step A1) is an SLD-resolution step with the rules of T. In step A2), under the assumption that observables and abducibles are disjoint, we need to reason backward from the true observables in the goal to find explanations for them, since the definition of an explanation requires that it logically implies G in the theory T alone without the set of observations O. Step A3) handles the case where an abductive hypothesis is required more than once. In step A4) a new abductive hypothesis is required, which is added to the current set of hypotheses provided it is corroborated.
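A much simplified Prolog sketch of the abductive part of this derivation is given below. It covers only the flavour of steps A1), A3) and A4): observables, the safe selection rule and the corroboration phase are all omitted, and rule/2 and abducible/1 are the illustrative encoding introduced after Theory T1 (member/2 is the standard list predicate).
   % solve(Goal, Delta0, Delta): prove Goal from the rules,
   % accumulating abducible hypotheses in Delta.
   solve(A, Delta, Delta) :-              % cf. step A3: hypothesis already assumed
       abducible(A), member(A, Delta), !.
   solve(A, Delta, [A|Delta]) :-          % cf. step A4, minus corroboration
       abducible(A), \+ member(A, Delta).
   solve(A, Delta0, Delta) :-             % cf. step A1: resolve with a rule of T
       \+ abducible(A),
       rule(A, Body),
       solve_all(Body, Delta0, Delta).
   solve_all([], Delta, Delta).
   solve_all([L|Ls], Delta0, Delta) :-
       solve(L, Delta0, Delta1),
       solve_all(Ls, Delta1, Delta).
With the crime-detection encoding, ?- solve(dead, [], Delta) returns Delta = [strangled], then [stabbed], then [drunk_arsenic] on backtracking.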
A corroboration derivation from (F1 Δ1) to (Fn Δn) is a sequence
(F1 Δ1), (F2 Δ2), ..., (Fn Δn)
such that for each i ≥ 1, Fi has the form {H ← L1, ..., Lk} ∪ Fi' and (Fi+1 Δi+1) is obtained according to one of the following rules:
C1) If H is not observable then Fi+1 = C' ∪ Fi' where C' is the set of all resolvents of clauses in T with H ← L1, ..., Lk on the atom H, and Δi+1 = Δi;
C2) If H is a ground observable, H ∉ O and L1, ..., Lk is not empty then Fi+1 = C' ∪ Fi' where C' is ← L1, ..., Lk and Δi+1 = Δi; if H ∈ O then Fi+1 = Fi' and Δi+1 = Δi.
C3) If H is a non-ground observable, O ⊭ ∃x H and L1, ..., Lk is not empty then Fi+1 = C' ∪ Fi' where C' is ← L1, ..., Lk and Δi+1 = Δi;
C4) If H is a non-ground observable and Lj is any non-observable selected literal from L1, ..., Lk then Fi+1 = C' ∪ Fi' where C' is the set of all resolvents of clauses in T ∪ Δi with H ← L1, ..., Lk on the selected literal Lj and Δi+1 = Δi; if Lj is observable the resolutions are done only with clauses in O.
C5) If H is empty, Lj is any selected literal and Lj is not observable then Fi+1 = C' ∪ Fi' where C' is the set of all resolvents of clauses in T ∪ Δi with ← L1, ..., Lk on the literal Lj and the empty clause does not belong to C', and Δi+1 = Δi; if Lj is observable the resolutions are done only with clauses in O.
In step C1) we "reason forward" from the conclusion H, trying to generate a ground observable at the head. Once this happens, if this observable is not "true", steps C2), C3) give the denial of the conditions that imply this observable. Step C4) reasons backward from the conditions, either failing or trying to instantiate further the observable head. Step C5) reasons backward from the denials of steps C2), C3) until every possible such backward reasoning branch fails. Note that in the backward reasoning steps observables are resolved from the observations O and not the theory. More importantly, notice that we do not reason forward from an observable that is true.
Note that we have included the set of hypotheses Δi in the definition of the corroboration derivation although this does not get affected by this part of the procedure. The reason for this is that more efficient extensions of the procedure can be defined by adding extra abducible information to the Δi during the corroboration phase, e.g. the required absence of some abducible A can be recorded by the addition of a new abducible A*.
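To make the flavour of the procedure concrete, the following is a minimal Prolog sketch of the interleaving of hypothesis collection and corroboration, under strong simplifying assumptions: the theory is a finite set of definite rules given as rule(Head, Body) facts, the names rule/2, abducible/1, observable/1 and observed/1 are our own encoding conventions (not the paper's notation), and corroboration is approximated by testing every derivable declared observable against the recorded observations rather than by the full corroboration derivation.

% explain(+Goals, +Delta0, -Delta): Goals are derivable from the rules plus
% the abducible hypotheses in Delta; each newly assumed abducible must pass
% the (simplified) corroboration test.
explain([], Delta, Delta).
explain([G|Gs], Delta0, Delta) :-
    abducible(G), !,
    ( member(G, Delta0) -> Delta1 = Delta0
    ; Delta1 = [G|Delta0], corroborated(Delta1)
    ),
    explain(Gs, Delta1, Delta).
explain([G|Gs], Delta0, Delta) :-
    rule(G, Body),
    append(Body, Gs, Gs1),
    explain(Gs1, Delta0, Delta).

% Every observable consequence of the theory plus the hypotheses must
% actually have been observed.
corroborated(Delta) :-
    forall(( observable(Obs), derivable(Obs, Delta) ), observed(Obs)).

% Plain SLD derivability from the rules plus the hypotheses.
derivable(G, Delta) :- member(G, Delta).
derivable(G, Delta) :- rule(G, Body), derivable_all(Body, Delta).
derivable_all([], _).
derivable_all([G|Gs], Delta) :- derivable(G, Delta), derivable_all(Gs, Delta).

The same encoding conventions are reused below to run the abdominal-pain example of the next section.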
Theorem
Let <T, A, S, O> be a Hypothetico-Deductive framework and G a ground atomic formula. If (← G {}) has an abductive derivation to (□, Δ) then the set Δ is a corroborated explanation for G.
Proof (Sketch)
The soundness of the abductive derivations follows directly from the soundness of SLD resolution for definite Horn theories, as every abductive derivation step of this procedure can be mapped into an SLD resolution step. To show that the explanation Δ is corroborated, let A ∈ S be any ground atomic logical consequence of T ∪ Δ. Since T ∪ Δ is a definite Horn theory, A must belong to its minimal model, which can be constructed in terms of the immediate consequence operator T [van Emden & Kowalski, 1976]. Hence there exists a finite integer n such that A ∈ T(T∪Δ)↑n(∅), and A does not follow from T alone by our assumption on the form of the theory T. The result then follows by induction on the length of the corroboration derivation.
6 Application of Hypothetico-deductive Reasoning
In this section we will illustrate hypothetico-deductive reasoning with some examples. Before this it is worth pointing out that existing abductive diagnosis techniques (e.g. [Poole et al., 1987], [Davis, 1984], [Cox & Pietrzkowski, 1987], [Genesereth, 1984], [Reggia et al., 1983], [Sattar & Goebel, 1989]) can be accommodated within the HD framework. For example, in the diagnosis of faults in electrical circuits hypothetico-deductive reasoning exhibits similar behaviour to [Genesereth, 1984], [Sattar & Goebel, 1989].
Problems and domains which are ideally suited to the
application of hypothetico-deductive reasoning exhibit
two characteristics. Firstly, they have a large number of
possible explanations in comparison to the number of
empirical consequences of each of those explanations.
Secondly, they have a minimal amount of observational
data pertaining to a given explanation so that
corroboration failure is maximized.
To illustrate the manner in which general
hypothetico-deductive reasoning deals with differing
but compatible explanations, let us consider the
example of abdominal pain first presented by [Pople,
1985] and axiomatized in [Sattar & Goebel, 1990]. The
axioms are reproduced below. To allow the possibility
of several diseases occurring simultaneously, the three
expressions which capture the fact that the symptoms
(nausea, irritation_in_bowel, and heartburn) are
incompatible, have been omitted.
Theory T2
    has_abdominal_pain ← abdominal_pain_symp(X)
    abdominal_pain_symp(nausea) ← problem_is(indigestion)
    abdominal_pain_symp(irritation_in_bowel) ← problem_is(dysentry)
    abdominal_pain_symp(heartburn) ← problem_is(acidity)
Now consider the following observations:
Observations O
    has_abdominal_pain
    abdominal_pain_symp(nausea)
Abducibles A
    {problem_is(indigestion), problem_is(dysentry), problem_is(acidity)}
Observables S
    {has_abdominal_pain, abdominal_pain_symp(nausea), abdominal_pain_symp(irritation_in_bowel), abdominal_pain_symp(heartburn)}
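Encoded with the conventions of the earlier Prolog sketch (rule/2, abducible/1, observable/1 and observed/1 are our own names, and the encoding is only an illustration of the framework, not the paper's notation), the example reads:

rule(has_abdominal_pain, [abdominal_pain_symp(_Symptom)]).
rule(abdominal_pain_symp(nausea), [problem_is(indigestion)]).
rule(abdominal_pain_symp(irritation_in_bowel), [problem_is(dysentry)]).
rule(abdominal_pain_symp(heartburn), [problem_is(acidity)]).

abducible(problem_is(_Disease)).

observable(has_abdominal_pain).
observable(abdominal_pain_symp(nausea)).
observable(abdominal_pain_symp(irritation_in_bowel)).
observable(abdominal_pain_symp(heartburn)).

observed(has_abdominal_pain).
observed(abdominal_pain_symp(nausea)).

% ?- explain([has_abdominal_pain], [], Delta).
% Delta = [problem_is(indigestion)].
% The dysentry and acidity hypotheses fail the corroboration test, because
% their observable consequences have not been observed.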
There are three possible potential explanations for the
observation "has_abdominal_pain". Since they are not
mutually incompatible (it is possible to have all three
diseases, for example), there is no crucial literal which
can help us distinguish between the three explanations.
There is thus no "best" explanation from this point of view.
From the point of view of hypothetico-deductive
reasoning however, one of the explanations stands apart
from the others. On the basis of all the currently
available evidence "problem_is(indigestion)" is
completely corroborated. The two remaining
explanations remain possible but uncorroborated; that
is to say there is no supplementary evidence in support
of them. Experiments might be performed (testing for
"abdominal_pain_symp(irritation_in_bowel)", and
"abdominal_pain_symp(heartburn)") which could
corroborate one or both of the others, which would lead
us to extend our explanation. Since physical
incompatibilities are rare in common-sense reasoning,
hypothetico-deductive reasoning has an advantage in
being able to offer a (revisable) "best" explanation
based on the currently available evidence, in spite of the
absence of possible crucial experiments. It is important
to appreciate that it is usually impractical to simply
construct the hypotheses by performing abduction on
all the observations in O, since in general there may be
an extremely large number of them. Moreover, only a
few may be relevant to the particular observation for
which we seek an explanation.
It might be thought that the checking of all the
observational consequences of some explanation might
be equally impractical: there might be an infinite
number of them as well. However, it must be borne in
mind that we are only considering the representation of
common-sense; we would normally ensure that there
are only a small number of observable consequences in
which we would be interested. We would define our set
of observables, S, accordingly. So, for instance, in the
fermentation example below we represent certain
critical times (often referred to as "landmarks") at which
we might perform observations. Similarly, in the
"stolen car" example which we present later, we restrict
observables to events that occurred at some specific
point in time.
One application area in which incomplete
information is intrinsic, is that of temporal reasoning.
Reasoning about time is constrained by the fact that
factual information is only available concerning the
past and the present. By its very nature we must
perform temporal diagnosis with no knowledge about
the future states of the systems we are trying to model.
As an example of temporal diagnosis which
illustrates this characteristic, consider an industrial
process involving the fermentation of wine. Suppose we
are faced with the task of diagnosing whether the
fermentation process has proceeded normally, or that
the extremely rare conditions have occurred under
which we will produce a vintage wine. To do this we
must carry out a test at some time after the winemaking process has begun, such as measuring its pH, its
relative density, or its alcohol content. Suppose further
that we need to decide on this diagnosis before a certain
time, e.g. the bottling-time tomorrow. Let us refer to
some property of the mixture which would be observed for vintage wine by the symbol p1, and that for ordinary wine as p2. These two properties might be entirely compatible: it is perfectly possible for ordinary wine to be produced under conditions which exhibit p1 (as well as p2), but in such a case it is not the fact that the mixture is ordinary wine that causes p1 to be observed. Now suppose we observe p1 before the bottling time, and suppose there are no further observational consequences for the "vintage wine" hypothesis that are observable before tomorrow. Then the "vintage wine" hypothesis is completely corroborated within the defined time-scale. On the other hand, the "ordinary wine" hypothesis remains at best only partially corroborated. Hypothetico-deductive reasoning would then prefer the "vintage wine" hypothesis over the "ordinary wine" one. The
temporal dimension illustrates the ability of
hypothetico-deductive reasoning to form diagnoses on
the basis of incomplete information. Notice that an
extension of the time scale would revise the status of the
observable relations and perhaps the "vintage wine"
hypothesis would become only partially corroborated.
The application of hypothetico-deductive reasoning to
the temporal domain will be discussed in more detail in
the next section as an important special case of the
integration of hypothetico-deductive reasoning and
default reasoning.
7 Hypothetico-deduction with Default Theories
As we discussed above, the aim of hypothetico-deductive reasoning has been to provide a framework in which we can tackle one of the main characteristics of common sense reasoning, namely incomplete information. More specifically it addresses the fact that we are often forced to form hypotheses and explanations on the basis of limited information. Another important form of reasoning that deals with the problem of incomplete (or limited) information is default reasoning (see e.g. [Reiter, 1980]). We can then enhance the capability of each framework separately to deal with this problem of missing information by integrating them together into a common framework.
So far we have only considered the application of hypothetico-deduction to classical theories. In this section we study its application to default theories incorporating negation-as-failure (NAF) from Logic Programming. We will then apply this adaptation of hypothetico-deduction to temporal reasoning problems formulated within the event calculus, where NAF is used to represent default persistence in time ([Kowalski & Sergot, 1987], [Evans, 1989]).
The approach we adopt is to consider only classical theories to which non-monotonic reasoning mechanisms such as default and hypothetico-deductive reasoning are applied (in contrast to non-monotonic logics). The motivation, as before, is to separate representation (classical logic) from reasoning (non-monotonic). Recent formalizations of the semantics of negation-as-failure [Eshghi & Kowalski, 1989], [Kakas & Mancarella, 1990], [Dung, 1991], [Kakas & Mancarella, 1991] have adopted a similar point of view. This approach means that hypothetico-deductive reasoning can be applied to default theories of any system which separates these two components, e.g. circumscription [McCarthy, 1980].
Following this work, we associate to any general logic program, P, (Horn clauses extended with negation-as-failure) a classical theory, P', as follows. Each negative condition, not p, where not denotes the negation-as-failure operator, is regarded as a single new positive atom. This can be made explicit by replacing each such negative literal, not p, by a syntactic variant, say p*, to give the Horn theory P'. The model-theoretic extension of the new symbol is intended to be the complement of the old one, so that we can omit the not. To take a more meaningful example we might replace "not alive" with "dead". These new symbols "p*" or "dead" are then defined to be abducible predicates. The above authors show that with this view it is possible to understand (and generalize) the stable model semantics [Gelfond & Lifschitz, 1989] for NAF in logic programming. (Note that this is also the approach taken more generally in [Poole, 1988] for understanding default reasoning through abduction by naming the defaults and considering these as assumptions.)
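As a hedged illustration of this renaming (the bird example and the name abnormal_star are ours, not the paper's), a general program P and its associated Horn theory P' can be written in the rule/2 representation of the earlier sketch as follows:

:- dynamic observable/1, observed/1.   % none are declared in this tiny example

% P  (with NAF):   flies(X) :- bird(X), not abnormal(X).     bird(tweety).
% P' (Horn form):  "not abnormal(X)" is renamed to the abducible abnormal_star(X).
rule(flies(X), [bird(X), abnormal_star(X)]).
rule(bird(tweety), []).
abducible(abnormal_star(_Individual)).

% ?- explain([flies(tweety)], [], Delta).
% Delta = [abnormal_star(tweety)], i.e. the default assumption "tweety is not
% abnormal" is returned as an abductive hypothesis.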
We can then apply an adapted formulation of hypothetico-deductive reasoning to these classical Horn theories P' corresponding to general logic programs P. As above we have a 4-tuple <P', A, S, O>, where the set, A, of abducibles has been extended with new abducibles, e.g. "p*", "dead", which name the different NAF default assumptions.
Hence given a 4-tuple <P', A, S, O>, a corroborated explanation Δ for an observation G is a set of ground atomic well-formed formulae which fulfils all of the following criteria:
(1) Each formula in Δ is a member of A.
(2) P' ∪ Δ ⊨ G.
(3) If P' ∪ Δ ⊨ n and n ∈ S, then n ∈ O.
Let Δ = ΔD ∪ ΔH, where ΔD denotes the subset of abducibles corresponding to NAF.
(4) There exists a stable model M of P' ∪ ΔH ∪ O such that the negations corresponding to ΔD hold in M (i.e. are contained in the complement of M). (More generally, we can use recent extensions of stable models, e.g. preferred extensions or stable theories as defined in [Dung, 1991] and [Kakas & Mancarella, 1991] respectively.)
This is a direct extension of the previous definition of hypothetico-deductive reasoning. The extra condition (4) captures the default reasoning present in the theory P (or P'). This is clearly separated in this condition, although it does play an important role in the generation of explanations by rejecting explanations that do not satisfy it. This has the effect of adding extra abducibles to the Δ to make it acceptable. For example in the theory,
    G ← p*
    p ← q*
    q ← a
although {p*} is an explanation for G, this is not accepted until the abducible "a" is added to it, which ensures that this default assumption {p*} is valid. In addition, condition (4) also ensures that any default assumption (abducible) in Δ is compatible with the observations O. Note that we could have chosen to put together conditions (2) and (4) as "G is true in a stable model of P ∪ ΔH" for generating the explanations Δ, and use condition (4) solely for the purpose of ensuring that ΔD are compatible with the observations O.
Although at first sight it might seem appropriate to allow default reasoning during the corroboration of an explanation, this is not the case, as indicated by condition (3). The reason for this is clear: if we allow it then the corroboration process will not be for the explanation Δ alone, but for Δ plus any additional default assumptions made in arriving at the observable test. In other words, we would not want to reject an explanation Δ by failure to corroborate an observation that is not a consequence of Δ alone but of Δ with some additional default assumptions.
Let us now indicate how the proof procedure for hypothetico-deductive reasoning, defined earlier, needs to be extended to deal with this more general formulation where our theories are general logic programs. The first thing to notice is that, as indicated by condition (3), the corroboration phase of the procedure remains unchanged apart from the fact that it will also be applied whenever a NAF hypothesis, "p*" (or "not p"), is added to the explanation. Similarly, the abductive derivation phase remains as before with the set of abducibles enlarged to include the NAF default assumptions.
The main extension of the procedure arises from the need to implement the new condition (4). This can be done by adopting the abductive proof procedure developed in [Eshghi & Kowalski, 1989], [Kakas & Mancarella, 1990b], [Kakas & Mancarella, 1990c] for NAF, which is an extension of SLDNF. A new type of derivation, called consistency derivation, is introduced, interleaved with the abductive phase of the procedure whenever a NAF hypothesis, "p*" (or "not p"), is required in the explanation. Its purpose is to ensure that "p*" (or "not p") is a valid NAF assumption by checking that p does not succeed. This involves reasoning backwards from p in all possible ways and showing that each such branch ends in failure.
During this consistency check for some NAF hypothesis, "p*" (or "not p"), it is possible for new
abductive phases to be generated whenever the failure of some consistency branch reduces to showing that some other NAF default assumption, e.g. "q*" (or "not q"), does not hold in the theory P' ∪ Δ. To ensure this the procedure starts a new abductive phase to show that q holds, where it is possible that new hypotheses may be added to the explanation if this is needed to prove q. Then with this enlarged explanation "q*" (or "not q") is not a valid (default) NAF assumption (as q holds) and so the original consistency branch cannot succeed. In the example above the abducible "a" in the explanation {p*, a} for G is generated during the consistency check of p* (or not p) as described here. More details about this extension of the proof procedure can be found in the references above.
8 Application of HD Reasoning to Temporal Reasoning
As an example of the application of the above extended hypothetico-deductive mechanism, let us consider temporal reasoning with the Event Calculus [Kowalski & Sergot, 1987] where NAF is used to express default persistence in time.
The Event Calculus represents properties which hold
over intervals of time. They are initiated and terminated
by events which happen at particular instances of time.
NAF is used to conclude that a property is not
"clipped" or "broken" over an interval of time,
achieving default persistence. Variants of the two main
axioms, which define when a property "holds" and
when a property is "broken", are given below.
holds-at(p,t2) ← happens-at(e,t1) ∧ initiates(e,p) ∧ t1 < t2 ∧ not broken-during(p,(t1,t2))
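For concreteness, a minimal executable Prolog rendering of this axiom might look as follows; the clause for broken_during is our own reconstruction of the usual event-calculus clipping condition (the corresponding axiom of the paper is not reproduced here), and the switch example is purely illustrative.

holds_at(P, T2) :-
    happens_at(E, T1),
    initiates(E, P),
    T1 < T2,
    \+ broken_during(P, T1, T2).       % negation as failure: default persistence

broken_during(P, T1, T2) :-            % assumed form of the "broken" axiom
    happens_at(E, T),
    terminates(E, P),
    T1 < T, T < T2.

% Illustration: a light switched on at time 1 and never switched off
% persists by default.
happens_at(switch_on, 1).
initiates(switch_on, light_on).
terminates(switch_off, light_on).
% ?- holds_at(light_on, 5).     % succeeds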
... M is a stable model of P iff M is a minimal model
of GL(P,M) [Pr2,GL].
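For ground normal programs the Gelfond-Lifschitz construction used in this characterization can be sketched in Prolog as follows (the rule(Head, Pos, Neg) representation and the predicate names are our own; disjunctive heads are not handled by this sketch):

% gl_reduct(+Rules, +M, -Reduct): delete rules with a negative body literal
% contradicted by M, and drop the remaining negative literals.
gl_reduct(Rules, M, Reduct) :-
    findall(rule(H, Pos),
            ( member(rule(H, Pos, Neg), Rules),
              \+ ( member(N, Neg), member(N, M) )
            ),
            Reduct).

% least_model(+DefiniteRules, -M): least Herbrand model by fixpoint iteration.
least_model(Rules, M) :-
    tp(Rules, [], M).
tp(Rules, M0, M) :-
    findall(H, ( member(rule(H, Pos), Rules), subset(Pos, M0) ), Hs),
    sort(Hs, M1),
    ( sort(M0, M1) -> M = M1 ; tp(Rules, M1, M) ).

% M is a stable model iff M is the least model of its reduct.
stable(Rules, M) :-
    gl_reduct(Rules, M, Reduct),
    least_model(Reduct, LM),
    sort(M, LM).

% Example: for the single rule p <- not q,
% ?- stable([rule(p, [], [q])], [p]).    % succeeds
% ?- stable([rule(p, [], [q])], []).     % fails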
It has been shown [DK] that each minimal Herbrand model of a positive disjunctive program is a model of Clark's completion of N(P). In this chapter, we are interested in the more general question about the relationship between the stable models of P and N(P). We introduce now acyclic disjunctive programs. The following theorem shows the equivalence between P and N(P) for acyclic disjunctive programs.
Definition
A disjunctive program P is acyclic if it is possible to decompose the Herbrand base of P into disjoint sets, called strata H0, H1, ..., Hi, ..., where i is a natural number, so that for each ground clause C1 v ... v Cn <- A1, ..., Am, ¬B1, ..., ¬Bk in Gp:
(i) all Ci belong to the same stratum, say Hr;
(ii) all Ai and Bi belong to U { Hj | j < r }.
II
Since acyclic programs are locally stratified, their intended semantics is the perfect model semantics.
3. Transforming Acyclic Disjunctive Programs into
Normal Programs
Let us introduce some new notations. Let D be a disjunction of atoms. D is canonical if the atoms in D are pairwise different. For each disjunction D, the canonical ...
Theorem 1
Let P be an acyclic disjunctive program, and M be a Herbrand interpretation of P. Then M is a stable model of P iff M is a stable model of N(P).
Proof "=>" Let Q = GL(Gp,M). Since M is a stable model of P, M is a minimal model of Q. Since M is a minimal model of Q, for each A ∈ M there is a clause A v A1 v ... v An <- Body in Q such that for each i: Ai ∉ M and Body is true in M. Hence, for each A ∈ M, there is a clause A <- Body' in GN(P) such that Body' is true in M. Thus, there exists a clause C' in GL(GN(P),M) such that head(C') = A and body(C') is true in M. Since P is acyclic, GL(GN(P),M) is acyclic, too. It follows that M is the least Herbrand model of GL(GN(P),M). So M is a stable model of N(P).
"<=" Let M be a stable model of N(P). Since GL(GN(P),M) = GL(N(GL(Gp,M)),M), M is also a stable model of N(GL(Gp,M)). Thus M is a minimal model of GL(Gp,M). Hence M is a stable model of P.
II
Corollary
Let P be an acyclic disjunctive program, N(P) be its normal form. Then a Herbrand interpretation M is a perfect model of P iff M is a stable model of N(P).
II
The following example shows that in general, the above theorem does not hold.
Example Let P:
    a <- b
    b <- a
    a v b
N(P):
    a <- b
    b <- a
    a <- ¬b
    b <- ¬a
It is clear that P is not acyclic. It is easy to see that N(P) has no stable model while the unique minimal model of P is {a,b}.
II
Since each locally stratified disjunctive program possesses at least one perfect model [Pr1,Pr2], it is obvious that there exists at least one stable model for N(P). So
Corollary
If P is acyclic, then N(P) possesses at least one stable model.
II
The following theorems give important characterizations of the normal form of an acyclic disjunctive program.
Theorem 2
Let P be an acyclic disjunctive program. Then each stable model of N(P) is a Herbrand model of comp(N(P)) and vice versa, where comp(N(P)) denotes Clark's predicate completion [Cla,Llo] of N(P).
II
Theorem 3
The three-valued semantics and the two-valued semantics of comp(N(P)) are equivalent in the sense that each three-valued model of comp(N(P)) can be extended into a two-valued one.
II
Let L be a ground literal. We say that L holds with respect to the stable semantics of P, written P ⊨ L, if L is true in each stable model of P. We say P U {L} is stable-consistent if there exists one stable model of P in which L is true.
Summary
Let P be an acyclic disjunctive program, and L be a ground literal.
2) P U {L} is stable-consistent iff N(P) U {L} is stable-consistent iff comp(N(P)) U {L} is consistent.
II
The question of basic interest to us now is:
"Given an acyclic disjunctive program P and a ground literal L, is P U {L} stable-consistent?"   (*)
Eshghi and Kowalski have developed an abductive procedure [EK,Dun] which takes as input a query G and a normal program P, and delivers as output a set of ground negative literals H such that P U H U {G} is stable-consistent. From the above obtained results, it is clear that this abductive procedure can be used as a proof procedure for the question (*).
4. The Eshghi and Kowalski's Abductive Procedure
Before presenting the formal definition of the abductive procedure, let us explain the algorithm informally by an example.
Example
P:
    p <- ¬q
    q <- ¬p
We want to check whether p belongs to some stable model of P, i.e. whether P U {p} is stable-consistent. It is clear that SLDNF-resolution will not terminate for this goal due to the existence of a negative loop. To avoid getting trapped in this loop, the abductive procedure uses a loop check by "storing" all "encountered" negative literals in a set H. If a selected subgoal belongs to H, then the respective goal is simplified by deleting the selected subgoal from it.
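The loop-check idea can be conveyed by a rough propositional Prolog sketch (our own simplification: ground rules are given as rule(Head, Body) facts with body literals either atoms or not(Atom), and the consistency phase is approximated by a simple failure test rather than the full consistency derivation):

% demo(+Goals, +H0, -H): the goals succeed under the negative hypotheses H.
demo([], H, H).
demo([not(A)|Gs], H0, H) :-
    ( member(not(A), H0) -> H1 = H0            % already assumed: loop check
    ; H1 = [not(A)|H0],
      \+ demo([A], H1, _)                      % A must fail under the hypotheses
    ),
    demo(Gs, H1, H).
demo([A|Gs], H0, H) :-
    A \= not(_),
    rule(A, Body),
    append(Body, Gs, Gs1),
    demo(Gs1, H0, H).

% The example program:  p <- not q.   q <- not p.
rule(p, [not(q)]).
rule(q, [not(p)]).
% ?- demo([p], [], H).
% H = [not(q)], so P u {p} is stable-consistent (the stable model {p} exists).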
[Derivation for the example, not fully reproduced: from <- p the rule p <- ¬q gives <- ¬q; ¬q is "stored" in H = {¬q} and the consistency check reduces to <- q; ...]
... such that, for each i, ... Gi+1 = <- L' and Hi+1 = H'.
An abductive refutation is an abductive derivation to a pair ([], H).
A consistency derivation from (F1,H1) to (Fn,Hn) (wrt P) is a sequence ... if there is an abductive derivation from (<- k, Hi) to ([], H') then Fi+1 = Fi' and Hi+1 = H'; else, if L' is not empty, then Fi+1 = {<- L'} U Fi' and Hi+1 = Hi ...
A consistency derivation of the goal ({G}, ...
We say that the abductive procedure is complete with respect to the stable semantics if for each ground literal L, if P U {L} is stable-consistent then there exists a refutation for the goal (<- L, {}).
Proof (Sketch) Let H0, ..., Hi, ... be the strata of P. Let Pi consist of those clauses A1 v ... v An <- Body in Gp such that all Aj belong to Hi. By induction, we can prove that for each i, the stable semantics and preferential semantics [Dun] of Pi coincide. It follows then that the stable and preferential semantics of P coincide. The theorem follows immediately from the fact that the abductive procedure is sound wrt preferential semantics [Dun].
5. Using the Abductive Procedure For Skeptical Reasoning
The question of this chapter is:
"Given a logic program P and a ground literal L,
does L hold with respect to the stable semantics of
P ?"
The following lemma shows that if the abductive procedure is complete, then it can be used as a proof procedure for skeptical reasoning.
Lemma
Let L be a ground literal and assume that the abductive procedure is sound and complete with respect to the stable semantics. If there exists no refutation for (<- ¬L, {}), then L holds with respect to the stable semantics of P.
It is not difficult to see that if P is an acyclic disjunctive program then N(P) is always p-acyclic. Note that p-acyclicity is different from local stratifiability, i.e. there exist programs which are p-acyclic and not locally stratified, and vice versa.
The atom dependency graph of P is a graph with ground atoms as its nodes such that there exists a positive (resp. negative) edge from A to B if A occurs in the head, and B occurs positively (resp. negatively) in the body, of some clause C in Gp.
An infinite path (A1, ..., An, ...) of pairwise different atoms in the atom dependency graph of P is said to be a negative infinite loop if the path contains infinitely many negative edges. P is said to be free of infinite negative loops, written INL-free, if there exists no negative infinite loop in the atom dependency graph of P.
A program P is allowed [Llo] if each clause in P satisfies the condition that each variable appearing in the clause appears also in a positive subgoal in the clause body.
Theorem 5 (Completeness of the Abductive Procedure)
Let P be an allowed, p-acyclic, and INL-free normal program, and L be an arbitrary ground literal. Then the abductive procedure will terminate for the goal (<- L, {}) ...
... true in any model of P, by the first rule, and so, the second rule cannot contradict the assigned meaning. Another way to understand this is that one may safely assume ∼c using a form of CWA on c, since ∼a may not be consistently assumed.
However, when relying on the absence of present evidence about some atom A, we do not always want to assume that ∼A holds, since there may exist consistent assumptions allowing to conclude A. Roughly, we want to define the notion of concluding for the truth of a negative literal ∼A just in case there is no hard nor hypothetical evidence to the contrary, i.e. no consistent set of negative assumptions such that ∼A is untenable.
Consider P = {a ← ∼b; b ← ∼a; c ← a}. If we interpret the meaning of this program as its WFM (which is empty), and as we do not have a, a naive CWA could be tempted to derive ∼c based on the assumption ∼a. There is however an alternative negative assumption ∼b, that if made, defeats the assumption ∼a, i.e. the assumption ∼a may not be sustained since it can be defeated by the assumption ∼b. We will define later more precisely the notions of sustainability and tenability.
Both programs above have empty well founded
models. We argue that WFS is too careful, and something more can safely be added to the meaning of the program, thus reducing the undefinedness of the program, if we are willing to adopt a suitable form of CWA.
We argue that a set CWA(P) of negative literals (assumptions) added to a program model MOD(P) by CWA must obey the four principles:
1. MOD(P) ∪ CWA(P) ⊭ L for any ∼L ∈ CWA(P). This says that the program model added with the set of assumptions identified by the CWA rule must be consistent.
2. There is no other set of assumptions A such that MOD(P) ∪ A ⊨ L for some ∼L ∈ CWA(P). I.e. CWA(P) is sustainable.
3. CWA(P) must be maximal.
4. CWA(P) must be unique.
The paper is organized as follows: in the next
section we present some basic definitions. In section 3 we introduce some new definitions, capturing the concepts behind the semantics, accompanied by examples illustrating them. Models are
defined and organized into a lattice, and the class
of sustainable A-Models is identified. In section 5
we define the O-Semantics of a program P based on
the class of maximal sustainable tenable A-Models.
A unique model is singled out as the O-Model of P.
Afterwards we present some properties of the class
of A-Models. Finally, we relate to other semantics
and present conclusions.
2 Language
Here we give basic definitions and establish notation ([Monteiro, 1991]). A program is a set of rules of the form: H ← B1, ..., Bn, ∼C1, ..., ∼Cm (n ≥ 0, m ≥ 0) or, equivalently, H ← {B1, ..., Bn} ∪ ∼{C1, ..., Cm}, where ∼{A1, ..., An} is a shorthand for {∼A1, ..., ∼An}, and ∼C is short for ∼{C1, ..., Cm}; H, the Bi and the Cj are atoms.
The Herbrand Base B(P) of a program P is defined as usual as the set of all ground atoms. An interpretation I of P is denoted by T ∪ ∼F, where T and F are disjoint subsets of B(P). Atoms in T are said to be true in I, atoms in F false in I, and atoms in B(P) − (T ∪ F) undefined in I.
In an interpretation T ∪ ∼F a conjunction of literals {B1, ..., Bn} ∪ ∼{C1, ..., Cm} is true iff {B1, ..., Bn} ⊆ T and {C1, ..., Cm} ⊆ F, is false iff {B1, ..., Bn} ∩ F ≠ {} or {C1, ..., Cm} ∩ T ≠ {}, and is undefined iff it is neither true nor false.
3 Adding Negative Assumptions to a Program
Here we show how to consistently add negative assumptions to a program P. Informally, it is consistent to add a negative assumption to P if the assumption's atom is not among the consequences of P after adding the assumption. We also define when a set of negative assumptions is defeated by another, and show how the models of a program, for different sets of negative assumptions added to it, are organized into a lattice.
We begin by defining what it means to add assumptions to a program. This is achieved by substituting true for the assumptions, and false for their atoms, in the body of all rules.
Definition 3.1 (P+A) The program P + A obtained by adding to a program P a set of negative assumptions A ⊆ ∼B(P) is the result of:
• Deleting all rules H ← {B1, ..., Bn} ∪ ∼C from P such that some ∼Bi ∈ A
• Deleting from the remaining rules all literals ∼L ∈ A
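A small Prolog sketch of this construction for ground programs (our own representation, not the paper's: a rule is a term rule(Head, PosBody, NegBody) and an assumption set A is given as the list of atoms assumed false):

% add_assumptions(+Rules, +A, -RulesPA): compute the rules of P + A.
add_assumptions(Rules, A, RulesPA) :-
    findall(rule(H, PB, NB1),
            ( member(rule(H, PB, NB), Rules),
              \+ ( member(B, PB), member(B, A) ),   % drop rules whose positive body is assumed false
              subtract(NB, A, NB1)                  % delete the assumed negative literals
            ),
            RulesPA).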
Definition 3.2 (Assumption Model) An Assumption Model of a program P, or A-Model for short, is a pair (A; M) where A ⊆ ∼B(P) and M = WFM(P + A).
Among these models we define the partial order ≤a in the following way: (A1; M1) ≤a (A2; M2) iff A1 ⊆ A2. On the basis of set union and set intersection among the sets A of negative assumptions, the set of all A-Models becomes organized as a complete lattice.
Having defined assumption models we next consider their consistency. According to the CWA principles above, an assumption ∼A cannot be added to a program P if by doing so A is itself a consequence of P, or some other assumption is contradicted.
Definition 3.3 (Consistent A-Model) An A-Model (A; M) is consistent iff A ∪ M is an interpretation, i.e. there exists no assumption ∼L ∈ A such that L ∈ M.
Example 1 Let P = {c ← ∼b; b ← ∼a; a ← ∼a}, whose WFM is empty. The A-Model ({∼a}; {a, b, ∼c}) is inconsistent since by adding the assumption ∼a then a ∈ WFM(P + {∼a}). The same happens with all A-Models containing the assumption ∼a. The A-Model ({∼b, ∼c}; {c}) is also inconsistent. Thus the only consistent A-Models are ({}; {}), ({∼b}; {c}) and ({∼c}; {}). □
Lemma 3.1 If an A-Model AM is inconsistent then any AM' such that AM ≤a AM' is inconsistent.
Proof:[sketch] We prove that for every assumption ∼a', if (A; WFM(P + A)) is inconsistent then (A ∪ {∼a'}; WFM(P + A ∪ {∼a'})) is also inconsistent. By definition of consistent A-Model: ∃ ∼b ∈ A such that b ∈ WFM(P + A), so it suffices to guarantee that: b ∉ WFM(P + A ∪ {∼a'}) → a' ∈ WFM(P + A ∪ {∼a'}).
Consider b ∉ WFM(P + A ∪ {∼a'}). Since P + A ∪ {∼a'} only differs from P + A in rules with a' or ∼a', and since b is true in P + A, it can be shown that a' is also true in P + A. As the truth of an atom in the WFM of any program may rely neither on the truth of itself nor of its complementary, and because the addition of ∼a' to P + A only changes rules with ∼a' or a', the truth value of a' in P + A ∪ {∼a'} remains the same, i.e. a' ∈ WFM(P + A ∪ {∼a'}). ◇
According to the CWA principles above, an assumption ∼A cannot be sustained if there is some set of consistent assumptions that concludes A. We've already expressed the notion of consistency being used. To capture the notion of sustainability we now formally define how an A-Model can defeat another, and define sustainable A-Models as the nondefeated consistent ones.
Definition 3.4 (Defeating)
A consistent A-Model (A; M) is defeated by a consistent (A'; M') iff ∃ ∼a ∈ A such that a ∈ M'.
Definition 3.5 (Sustainable A-Models)
An A-Model (A; M) is sustainable iff it is consistent and not defeated by any consistent A-Model. Equivalently (∼S; M) is sustainable iff:
S ∩ ∪{ Mi | (Ai; Mi) consistent } = {}
Example 2 The only sustainable models in example 1 are ({}; {}) and ({∼b}; {c}). Note that the consistent A-Model ({∼c}; {}) is defeated by ({∼b}; {c}), i.e. the assumption ∼c is unsustainable since there is a set of consistent assumptions (namely {∼b}) that leads to the conclusion c. □
The assumptions part of maximal sustainable A-Models of a program P are maximal sets of consistent Closed World Assumptions that can be safely added to the consequences of P without risking contradiction by other assumptions.
Lemma 3.2 If an A-Model AM is defeated by another A-Model D, then all A-Models AM' such that AM ≤a AM' are defeated by D.
Proof: Similar to the proof of lemma 3.1 above. ◇
Lemma 3.3 The A-Model ({}; WFM(P)) is always sustainable.
Proof: By definition of sustainable. ◇
Theorem 3.4 The set of all sustainable A-Models is nonempty. On the basis of set union and set intersection among its A sets, the A-Models ordered by ≤a form a lower semilattice.
Proof: Follows directly from the above lemmas. ◇
A program may have several maximal sustainable A-Models.
Example 3 Let P = {c ← ∼c, ∼b; b ← a; a ← ∼a}. Its sustainable A-Models are ({}; {}), ({∼b}; {}) and ({∼c}; {}). The last two are maximal sustainable A-Models. We cannot add both ∼b and ∼c to the program to obtain a sustainable A-Model since ({∼b, ∼c}; {c}) is inconsistent. □
4 The O-semantics
This section is concerned with the problem of singling out, among all sustainable A-Models of a program P, one that uniquely determines the meaning of P when the CWA is enforced. This is accomplished by means of a selection criterion that takes a lower semilattice of sustainable A-Models and obtains a subsemilattice of it, by deleting A-Models that in a well defined sense are less preferable, i.e. the untenable ones.
Sustainability of a consistent set of negative assumptions insists that there be no other consistent set that defeats it (i.e. there is no hypothetical evidence whose consequences contradict the sustained assumptions). Tenability requires that a maximal sustainable set of assumptions be not contradicted by the consequences of adding to it another competing (nondefeating and nondefeated) maximal sustainable set.
The selection process is repeated and ends up with a complete lattice of sustainable A-Models, which defines for every program P its O-Semantics. The meaning of P is then specified by the greatest A-Model of the semantics, its O-Model.
To illustrate the problem of preference among
maximal A-Models we give an example.
Example 4
Let P = {c ← ∼c, ∼b; b ← a; a ← ∼a}, whose sustainable A-Models are ({}; {}), ({∼b}; {}), and ({∼c}; {}). Because we wish to maximize the number of negative assumptions we consider the maximal A-Models, which in this case are the last two. The join of these maximal A-Models, ({∼b, ∼c}; {c}), is perforce inconsistent, in this case wrt c. This means that when assuming ∼c there is an additional set of assumptions entailing c, making this A-Model untenable. But the same does not apply to ∼b. Thus the preferred A-Model is ({∼b}; {}), and the A-Model ({∼c}; {}) is said untenable. The rationale for the preference is grounded on the fact that the inconsistency of the join arises wrt c but not wrt b. □
Definition 4.1 (Candidate Structure)
A Candidate Structure CS of a program P is any subsemilattice of the lower semilattice of all sustainable A-Models of P.
Definition 4.2 (Untenable A-Models)
Let {(A1; M1), ..., (An; Mn)} be the set of all maximal A-Models in a Candidate Structure CS. Let J = (AJ; MJ) be the join of all such A-Models, in the complete lattice of all A-Models. An A-Model (Ai; Mi) is untenable wrt CS iff it is maximal in CS and there exists ∼a ∈ Ai such that a ∈ MJ.
The Candidate Structure left after removing all untenable A-Models of a CS may itself have several maximal elements, some of which might not be maximal A-Models in the initial CS. If the removal of untenable A-Models is performed repeatedly on the retained Candidate Structure, a single maximal element is eventually obtained, albeit the bottom element of all the CSs.
Definition 4.3 (Retained CS) The Retained Candidate Structure R(CS) of a Candidate Structure CS is:
• CS if it has a single maximal A-Model, i.e. CS is a complete lattice.
• Otherwise, let Unt be the set of all untenable A-Models wrt CS. Then R(CS) = R(CS − Unt).
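The iteration of Definitions 4.2 and 4.3 can be sketched in Prolog as follows (a rough sketch under our own assumptions: a Candidate Structure is a list of terms am(A, M), with A the assumption atoms and M the true atoms of WFM(P + A), and wfm/2, which computes the well-founded model of P plus a set of assumptions, is left abstract):

% maximal(+CS, -AM): AM is a maximal A-Model of the candidate structure.
maximal(CS, am(A, M)) :-
    member(am(A, M), CS),
    \+ ( member(am(A2, _), CS), A \== A2, subset(A, A2) ).

% untenable(+CS, -AM): AM is maximal and contradicted by the join of all
% maximal A-Models (Definition 4.2).
untenable(CS, am(A, M)) :-
    maximal(CS, am(A, M)),
    findall(Ai, maximal(CS, am(Ai, _)), As),
    foldl(union, As, [], AJ),
    wfm(AJ, MJ),                        % assumed: MJ = true atoms of WFM(P + AJ)
    member(Atom, A), member(Atom, MJ).

% retained(+CS, -R): repeatedly remove the untenable A-Models (Definition 4.3).
retained(CS, CS) :-
    findall(AM, maximal(CS, AM), [_]), !.
retained(CS, R) :-
    findall(AM, untenable(CS, AM), Unt),
    subtract(CS, Unt, CS1),
    retained(CS1, R).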
Definition 4.4 (The O-Semantics)
The O-Semantics of a program P is defined by the
Retained Candidate Structure of the semilattice of
all sustainable A-Models of P.
Let (A; M) be its maximal element. The intended meaning of P is A U M, the O-Model of
P.
Theorem 4.1 (Existence of O-Semantics)
The Retained Candidate Structure of the semilattice of all sustainable A-Models is nonempty.
Proof:[sketch] It suffices to guarantee that at each
iteration with more than one maximal A-Model at
least one is untenable. This is done by contradiction: suppose no maximal A-Model is untenable.
Then their join would be the single maximal sustainable one, and so could not be untenable, in the
previous and final iteration; accordingly the supposed models cannot be maximal.
When there is a single maximal A-Model then the structure is a complete lattice, since at each iteration only maximal A-Models were removed. This lattice is nonempty since its bottom ({}; WFM(P)) is always sustainable and can never be untenable. □
Proposition 4.1 There exists no untenable AModel wrt a Candidate Structure with a single
maximal element.
Proof: Since the join coincides with the unique
maximal A-Model, which is sustainable by definition of CS, then it cannot be untenable. □
5 Examples
In this section we display some examples and their
O-Semantics. Remark that indeed the O-Models
obtained express the safe CWAs compatible with
the WFMs (which are all {}).
Example 5
Let P = {a ← ∼a; b ← a; c ← ∼c, ∼b; d ← c}. The semilattice of all sustainable A-Models CS is: [lattice diagram not reproduced]
The join of its maximal A-Models is ({∼b, ∼c, ∼d}; {c, ∼d}). Consequently, the maximal A-Model on the right is untenable since it contains ∼c in the assumptions, and c is a consequence of the join. So R(CS) = R(CS') where CS' is: [lattice diagram not reproduced]
The join of all maximal elements in CS' is the same as before and the only untenable A-Model is again the maximal one having ∼c in its assumptions. Thus R(CS') = R(CS'') where CS'' is: [lattice diagram not reproduced]
So the O-Model is {∼b, ∼d}. Note that if P is divided into P1 = {c ← ∼c, ∼b; d ← c} and P2 = {a ← ∼a; b ← a}, the O-models of P1 and P2 both agree on the only common literal ∼b. So ∼b rightly belongs to the O-models of P. □
Example 6 Let P = {q ← ∼p; p ← a; a ← ∼b; b ← ∼c; c ← ∼a}. Its only consistent A-Models are ({}; {}), ({∼p}; {q}) and ({∼q}; {}). As this last one is defeated by the second, the only sustainable ones are the first two. Since only one is maximal, these two A-Models determine the O-Semantics, and the meaning of P is {∼p, q}, its O-Model. Note that if the three last rules, forming an "undefined loop", are replaced by another "undefined loop" a ← ∼a, the O-model is the same. This is as it should be, since the first two rules conclude nothing about a. □
Example 7 Let P = {p ← a, b; a ← ∼b; b ← ∼a}. The A-Models with ∼b in their assumptions defeat A-Models with ∼a in their assumptions and vice-versa. Thus the O-Semantics is determined by ({}; {}) and ({∼p}; {}), and the meaning of P is {∼p}, its O-Model. □
Example 8 Let P = {c ← ∼c, ∼b; b ← ∼c, ∼b; b ← a; a ← ∼a}. Its sustainable A-Models are ({}; {}), ({∼b}; {}) and ({∼c}; {}). The join of the maximal ones is ({∼b, ∼c}; {b, c}), and so both are untenable. Thus the Retained Candidate Structure has the single element ({}; {}) and the meaning of P is {}. □
6 Properties of Sustainable A-Models
This section explores properties of sustainable
A-Models that provide a better understanding
of them, and also give hints for their construction without having to previously calculate all AModels.
We begin with properties that show how our
models can be viewed as an extension to Well
Founded Semantics (WFS). As mentioned in
[Kakas and Mancarella, 1991a], negation in WFS
is based on the notion of support, i.e. a literal ∼L
only belongs to an Extended Stable Model (XSM)
if all the rules for L (if any) have false bodies in
the XSM. In contradistinction, we are interested
in negations as consistent hypotheses that cannot
be defeated. To that end we weaken the necessary
(but not sufficient) conditions for a negative literal to belong to a model as explained below. We
still want to keep the necessary and sufficient conditions of support for positive literals. More precisely, knowing that XSMs must obey, among others, the following conditions, cf. [Monteiro, 1991]:
• If there exists a rule p ← B in the program such that B is true in model M then p is also true in M (sufficiency of support for positive literals).
• If an atom p ∈ M then there exists a rule p ← B in the program such that B is true in M (necessity of support for positive literals).
• If all rule bodies for p are false in M then ∼p ∈ M (sufficiency of support for negative literals).
• If ∼p ∈ M then all rules for p have false bodies in M (necessity of support for negative literals).
Our consistent A-models, when understood as the union of their pair of elements, assumptions A and WFM(P + A), need not obey the fourth condition. Foregoing it condones making negative assumptions. In our models an atom might be false even if it has a rule whose body is undefined. Thus, only false atoms with an undefined rule body are candidates for having their negation added to the WFM(P).
Proposition 6.1 Let (A; M) be any consistent A-Model of a program P. The interpretation A ∪ M obeys the first three conditions above.
Proof: Here we prove the satisfaction of the first condition. The remaining proofs are along the same lines.
If ∃ p ← b1, ..., bn, ∼c1, ..., ∼cm ∈ P such that {b1, ..., bn, ∼c1, ..., ∼cm} ⊆ A ∪ M, then bi ∈ M (1 ≤ i ≤ n) and ∼cj ∈ M or ∼cj ∈ A (1 ≤ j ≤ m). Let p ← b1, ..., bn, ∼c1, ..., ∼ck (k ≤ m) be the rule obtained from an existing one by removing all ∼cj ∈ A, which is, by definition, a rule of P + A. Thus there exists a rule p ← B in P + A such that B ⊆ WFM(P + A) = M. Given that the WFM of any program must obey the first condition above, p ∈ WFM(P + A). ◇
Next we state properties useful for more directly finding the sustainable A-Models.
Proposition 6.2 There exists no consistent A-Model (A; M) of P with ∼a ∈ A such that a ∈ WFM(P).
Proof: Let (A; M) be an A-Model such that ∼a ∈ A and a ∈ WFM(P). It is known that the truth of any a ∈ WFM(P) can be supported neither on itself nor on ∼a. If A = {∼a} then, after adding {∼a} to the program, the rules supporting the truth of a remain unchanged, i.e. a ∈ WFM(P + {∼a}), and thus ({∼a}; WFM(P + {∼a})) is inconsistent. It follows, from lemma 3.1, that all A-Models (A; M) such that {∼a} ⊆ A are inconsistent. ◇
Hence, A-Models not obeying the above restriction are not worth considering as sustainable.
Proposition 6.3
If a negative literal ∼L ∈ WFM(P) then there is no consistent A-Model (A; M) of P such that L ∈ M.
Proof:[sketch] We prove that if L ∈ M for a given A-Model (A; M) of P then (A; M) is inconsistent. If L ∈ M there must exist a rule L ← B, ∼C in P such that B ∪ ∼C ⊆ M ∪ A and B ∪ ∼C is false in WFM(P), i.e. there must exist L ← B, ∼C in P with at least one body literal true in M ∪ A and false in WFM(P). If that literal is an element of ∼C, by proposition 6.2, (A; M) is inconsistent (its corresponding atom is true in WFM(P) and false in M ∪ A). If it is an element of B this theorem applies recursively, ending up in a rule with empty body, an atom with no rules, or a loop without an interposing negative literal. By definition of WFM(P + A) the truth value of literals in these conditions can never be changed. ◇
Theorem 6.1 If ∼L ∈ WFM(P) then ∼L ∈ M in every consistent A-Model (A; M) of P.
Proof: Given proposition 6.3, it suffices to prove that L is not undefined in any consistent A-Model of P. The proof is along the lines of that of the proposition above. ◇
Consequently, all supported negative literals in the WFM(P), which includes those without rules for their atom, belong to every sustainable A-Model.
Lemma 6.2 Let WFM(P) = T ∪ ∼F. For any subset S of ∼F, WFM(P) = WFM(P + S).
Proof: This lemma is easily shown using the definition of P + A and the properties of the WFM. ◇
Theorem 6.3 Let WFM(P) = T ∪ ∼F and (A; WFM(P + A)) be a consistent A-Model, and let A' = A ∩ ∼F. Then WFM(P + A) = WFM(P + (A − A')).
Proof: Let P' = P + (A − A'), and WFM(P) = T ∪ ∼F. By theorem 6.1, ∼F ⊆ WFM(P'). So, by lemma 6.2, WFM(P') = WFM(P' + ∼F) = WFM([P + (A − (A ∩ ∼F))] + ∼F). By definition of P + A it follows that (P + A1) + A2 = P + (A1 ∪ A2). Thus WFM(P') is:
WFM(P + [(A − (A ∩ ∼F)) ∪ ∼F]) = WFM(P + A). ◇
This theorem shows that sets of assumptions
including negative literals of W F M(P) are not
worth considering since there exist smaller sets
having exactly the same consequences A ∪ M and,
by proposition 6.3 the larger sets are not defeatable
by reason of negative literals from the WFM(P).
Another important hint for calculating the sustainable A-Models is given by lemma 3.1. According to it one should start by calculating A-Models with smaller assumption sets, so that when an inconsistent A-Model is found, by the lemma, sets of assumptions containing it are not worth considering.
Example 9
Let P = {p ← ∼a, ∼b; a ← c, d; c ← ∼c; d}. The least A-Model is ({}; {d, ∼b}) where {d, ∼b} = WFM(P). Thus sets of assumptions containing ∼d or ∼b are not worth considering. Take now, for example, the consistent A-Model ({∼a}; {d, ∼b, p}), which we retain. Consider ({∼c}; {c, a, ∼p}); as this A-Model is inconsistent we do not retain it nor consider any other A-Models with assumption sets containing ∼c. Now we are left with just two more A-Models worth considering: ({∼p}; {d, ∼b}), which is defeated by ({∼a}; {d, ∼b, p}); and ({∼p, ∼a}; {d, ∼b, p}), which is inconsistent. Thus the only two sustainable A-Models are ({}; {d, ∼b}) and ({∼a}; {d, ∼b, p}). In this case, the latter is the single maximal sustainable A-Model, and thus uniquely determines the intended meaning of P to be A ∪ M = {∼a, d, ∼b, p}. □
7 Relation to other work
Consider the following program ([Van Gelder et al., 1980]):
P = {p ← q, ∼r, ∼s; q ← r, ∼p; r ← p, ∼q; s ← ∼p, ∼q, ∼r}
In [Przymusinska and Przymusinski, 1990] they argue that the intended semantics of this program should be the interpretation {s, ∼p, ∼q, ∼r} due to the mutual circularity of p, q, r. This model is precisely the meaning assigned to the program by the O-Semantics, its O-Model. Note that WFS identifies the (3-valued) empty model as the meaning
of the program. This is also the model provided
by stable model semantics [Gelfond and Lifschitz,
1988]. The weakly perfect model semantics for this
program is undefined as noticed in [Przymusinska
and Przymusinski, 1990].
The EWFS [Baral et al., 1990] is also an extension to the WFM based on the notion of GCWA [Minker, 1987]. Roughly EWFS moves closer than the WFM (in the sense of being less undefined) to being the intersection of all minimal Herbrand models of P [Dix, 1991]:
EWFM(P) =def WFM(P) + (T(WFM(P)), F(WFM(P)))
where T(I) =def True(I-MIN-MOD(P)), F(I) =def False(I-MIN-MOD(P)), and I-MIN-MOD(P) is the collection of all minimal models consistent with the three-valued interpretation I.
For the program P = {a ← ∼a} we have: WFM(P) = {}, MIN-MOD(P) = {a} and EWFM(P) = {a}.
Note this view identifies the intended meaning of rule a ← ∼a as the equivalent logic formula a ← ¬a, i.e. a. The O-Model of P is empty.
The difference between the O-Semantics and EWFS may be noticed in the intended meaning of the two rule program: {a ← ∼b; b ← ∼a}, which is behind the motivation of the extension EWFS of WFM based on GCWA. EWFS wants to identify a ∨ b as the meaning of this program, which also justifies the identification of a ← ∼a with the fact a. The O-Model is empty.
A similar approach based on the notion of stable negative hypotheses (built upon the notion of consistency) is introduced in [Kakas and Mancarella, 1991b], identifying a stable theory associated with a program P as a "skeptical" semantics for P, that always contains the well founded model.
One example showing that their approach is still conservative is: {p ← ∼q; q ← ∼r; r ← ∼p; s ← p}. Stable theories identify the empty set as the meaning of the program; however its O-Model is {∼s}, since it is consistent, maximal, sustainable and tenable. Kakas (personal communication) now also obtains this model, as a result of the investigation mentioned in the conclusions of [Kakas and Mancarella, 1991b].
8 Conclusions
We identify the meaning of a program P as a suitable partial closure of the well founded model of the program, in the sense that it contains the well founded model (and thus always exists). The extension we propose reduces undefinedness (which some authors argue is a desirable property) in the intended meaning of a program P, by an adequate form of CWA based on notions of consistency, sustainability and tenability with regard to alternative negative assumptions. Sustainability of a consistent set of negative assumptions insists that there be no other consistent set that defeats it (i.e. there is no hypothetical evidence whose consequences contradict the sustained assumptions). Tenability requires that a maximal sustainable set of assumptions be not contradicted by the consequences of adding to it another competing (nondefeating and nondefeated) maximal sustainable set.
Acknowledgements
We thank ESPRIT BRA COMPULOG (no. 3012), Instituto Nacional de Investigação Científica, Junta Nacional de Investigação Científica e Tecnológica and Gabinete de Filosofia do Conhecimento for their support. We are indebted to Anthony Kakas and Paolo Mancarella for their previous incursions and intuitions into a similar problem in the setting of their Stable Theories. Luís Monteiro is thanked for helpful discussions.
References
[Baral et al., 1990] C. Baral, J. Lobo, and J. Minker. Generalized well-founded semantics. In M. Stickel, editor, CADE'90. Springer-Verlag, 1990.
[Dix, 1991] J. Dix. Classifying semantics of logic programs. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Logic Programming and NonMonotonic Reasoning'91. MIT Press, 1991.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In R. A. Kowalski and K. A. Bowen, editors, 5th International Conference on Logic Programming, pages 1070-1080. MIT Press, 1988.
[Kakas and Mancarella, 1991a] A. C. Kakas and P. Mancarella. Negation as stable hypothesis. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Logic Programming and NonMonotonic Reasoning'91. MIT Press, 1991.
[Kakas and Mancarella, 1991b] A. C. Kakas and P. Mancarella. Stable theories for logic programs. In Ueda and Saraswat, editors, International Logic Programming Symposium'91. MIT Press, 1991.
[Minker, 1987] J. Minker. On indefinite databases and the closed world assumption. Readings in Nonmonotonic Reasoning. Morgan Kaufmann, 1987.
[Monteiro, 1991] L. Monteiro. Notes on the semantics of logic programs. Technical report, DI/UNL, 1991.
[Pereira et al., 1991a] L. M. Pereira, J. J. Alferes, and J. N. Aparicio. Contradiction Removal within Well Founded Semantics. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Logic Programming and NonMonotonic Reasoning'91. MIT Press, 1991.
[Pereira et al., 1991b] L. M. Pereira, J. J. Alferes, and J. N. Aparicio. The extended stable models of contradiction removal semantics. In P. Barahona, L. M. Pereira, and A. Porto, editors, 5th Portuguese AI Conference'91. Springer-Verlag, 1991.
[Pereira et al., 1991c] L. M. Pereira, J. N. Aparicio, and J. J. Alferes. Counterfactual reasoning based on revising assumptions. In Ueda and Saraswat, editors, International Logic Programming Symposium'91. MIT Press, 1991.
[Pereira et al., 1991d] L. M. Pereira, J. N. Aparicio, and J. J. Alferes. Hypothetical reasoning with well founded semantics. In B. Mayoh, editor, Scandinavian Conference on AI'91. IOS Press, 1991.
[Pereira et al., 1991e] L. M. Pereira, J. N. Aparicio, and J. J. Alferes. Nonmonotonic reasoning with well founded semantics. In International Conference on Logic Programming'91. MIT Press, 1991.
[Przymusinska and Przymusinski, 1990] H. Przymusinska and T. Przymusinski. Semantic Issues in Deductive Databases and Logic Programs. Formal Techniques in Artificial Intelligence. North Holland, 1990.
[Przymusinski, 1990] T. Przymusinski. Extended stable semantics for normal and disjunctive programs. In International Conference on Logic Programming'90, pages 459-477. MIT Press, 1990.
[Van Gelder et al., 1980] A. Van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. Journal of the ACM, 1980.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Contributions to the Semantics of Open Logic Programs
A. Bossi(1), M. Gabbrielli(2), G. Levi(2) and M.C. Meo(2)
1) Dipartimento di Matematica Pura ed Applicata, Università di Padova, Via Belzoni 7, I-35131 Padova, Italy. mat010@IPDUNIVX.UNIPD.IT
2) Dipartimento di Informatica, Università di Pisa, Corso Italia 40, 56125 Pisa. {gabbri,levi,meo}@dipisa.di.unipi.it
Abstract
The paper considers open logic programs originally introduced in [Bossi and Menegus 1991] as a tool to build an OR-compositional semantics of logic programs. We extend the original semantic definitions in the framework of the general approach to the semantics of logic programs described in [Gabbrielli and Levi 1991b]. We first define an OR-compositional operational semantics OΩ(P) modeling computed answer substitutions. We consider next the semantic domain of Ω-interpretations, which are sets of clauses with a suitable equivalence relation. The fixpoint semantics FΩ(P) given in [Bossi and Menegus 1991] is proved equivalent to the operational semantics, by using an intermediate unfolding semantics. From the model-theoretic viewpoint, an Ω-interpretation is mapped onto a set of Herbrand interpretations, thus leading to a definition of Ω-model based on the classical notion of truth. We show that under a suitable partial order, the glb of a set of Ω-models of a program P is an Ω-model of P. Moreover, the glb of all the Ω-models of P is equal to the usual Herbrand model of P while FΩ(P) is a (non-minimal) Ω-model.
1 Introduction
An Ω-open program [Bossi and Menegus 1991] P is a program in which the predicate symbols belonging to the set Ω are considered partially defined in P. P can be composed with other programs which may further specify the predicates in Ω. Such a composition is denoted by ∪Ω. Formally, if Pred(P) ∩ Pred(Q) ⊆ Ω then P ∪Ω Q = P ∪ Q, otherwise P ∪Ω Q is not defined (Pred(P) denotes the predicate symbols in P). A typical partially defined program is a program where the intensional definitions are completely known while extensional definitions are only partially known and can be further specified.
Example 1.1 Let us consider the following program
Q1 = {
    anc(X, Y) :- parent(X, Y).
    anc(X, Z) :- parent(X, Y), anc(Y, Z).
    parent(isaac, jacob).
    parent(jacob, benjamin).
}
New extensional information defining new parent tuples can be added to Q1 as follows
Q2 = {
    parent(anna, elizabeth).
    parent(elizabeth, john).
}
The semantics of open programs must be flcompositional w.r.t. program union, i.e. the semantics of PI Un P2 must be derivable from the semantics
of PI and P 2 • If D contains all the predicates in P,
D-compositionality is the same as compositionality.
The least Herbrand model semantics, as originally proposed [van Emden and Kowalski 1976] and
the computed answer substitution semantics in
[Falaschi et al. 1988,Falaschi et al. 1989a], are not
compositional w.r.t. program union. For example,
in example 1.1, the atom anc( anna, elizabeth) which
belongs to the least Herbrand model semantics of
QI U Q2 cannot be obtained from the least Herbrand
model semantics of QI and Q2 (see also example 2.1).
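As a concrete illustration (this snippet and its queries are ours, not part of the paper), the composed program can simply be loaded as the union of the two clause sets; the queries below succeed in the union even though they are not derivable from the two least Herbrand models taken separately.

% Q1 union Q2, written as ordinary Prolog clauses (comments are ours)
anc(X, Y) :- parent(X, Y).
anc(X, Z) :- parent(X, Y), anc(Y, Z).
parent(isaac, jacob).          % extensional part from Q1
parent(jacob, benjamin).
parent(anna, elizabeth).       % extensional part added by Q2
parent(elizabeth, john).

% ?- anc(anna, elizabeth).     succeeds in Q1 union Q2
% ?- anc(anna, john).          succeeds as well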
In this paper we will introduce a semantics for Ω-open programs following the general approach in [Gabbrielli and Levi 1991b], which leads to semantics definitions which characterize the program operational behavior. This approach leads to the introduction of extended interpretations (π-interpretations) which are more expressive than Herbrand interpretations. The improved expressive power is obtained by accommodating more syntactic objects in π-interpretations, which are (possibly infinite) programs. The semantics in terms of π-interpretations can be computed both operationally and as the least fixpoint of suitable continuous immediate consequence operators on π-interpretations. It can also be characterized from the model-theoretic viewpoint, by defining a set of extended models (π-models) which encompass standard Herbrand models. In the specific case of Ω-open programs, extended interpretations are called Ω-interpretations and are sets of conditional atoms (i.e. clauses such that all the atoms in the body are open). Each Ω-interpretation represents a set of Herbrand interpretations that could be obtained by composing the open program with a definition for the open predicates. Ω-interpretations of open programs are introduced to obtain a unique representative model, computable as the least fixpoint of a suitable continuous operator, in cases where no such representative exists in the set of Herbrand models.
The main contribution of this paper is the definition of an OR-compositional (i.e. compositional w.r.t. program union) semantics of logic programs in the style of [Falaschi et al. 1988, Falaschi et al. 1989b]. Other approaches to OR-compositionality can be found in [Lassez and Maher 1984, Mancarella and Pedreschi 1988, Gaifman and Shapiro 1989a, Gaifman and Shapiro 1989b]. An OR-compositional semantics corresponds to an important program equivalence notion, according to which two programs P1 and P2 are equivalent iff for any program Q a generic goal G computes the same answers in P1 ∪ Q and P2 ∪ Q. An OR-compositional semantics has also some interesting applications. Namely it can be used
• to model logic languages provided with a module-like structure,
• to model incomplete knowledge bases, where new chunks of knowledge can incrementally be assimilated,
• for program transformation (the transformed programs must have the same OR-compositional semantics as the original program),
• for semantics-based "modular" program analysis.
The paper is organized as follows. Subsection 1.1 contains notation and useful definitions on the semantics of logic programs. In section 2 we define an operational semantics O_Ω(P) modeling computed answer substitutions which is OR-compositional. Section 3 introduces a suitable semantic domain for the O_Ω(P) semantics and defines Ω-interpretations, which are sets of clauses modulo a suitable equivalence relation. In section 4 the fixpoint semantics F_Ω(P) is proved equivalent to the operational semantics by using an intermediate unfolding semantics. Section 5 is concerned with model theory. From the model-theoretic viewpoint, an Ω-interpretation is mapped onto a set of Herbrand interpretations, thus leading to a definition of Ω-model based on the classical notion of truth. We show that under a suitable partial order, the glb of a set of Ω-models of a program P is an Ω-model of P. Moreover, the glb of all the Ω-models of P is equal to the usual Herbrand model of P, while F_Ω(P) is a (non-minimal) Ω-model, equivalent to the model-theoretic semantics defined in [Bossi and Menegus 1991] in terms of S_Ω-models. A comparison between Ω-models and S_Ω-models is made in section 6. Section 7 is devoted to some conclusive remarks. All the proofs of the results given here can be found in [Bossi et al. 1991].
1.1 Preliminaries
The reader is assumed to be familiar with the terminology of and the basic results in the semantics of logic programs [Lloyd 1987, Apt 1988]. Let the signature S consist of a set F of function symbols, a finite set P of predicate symbols and a denumerable set V of variable symbols. All the definitions in the following will assume a given signature S. Let T be the set of terms built on F and V. Variable-free terms are called ground. A substitution is a mapping θ : V → T such that the set D(θ) = {X | θ(X) ≠ X} (the domain of θ) is finite. If W ⊆ V, we denote by θ|W the restriction of θ to the variables in W, i.e. θ|W(Y) = Y for Y ∉ W. ε denotes the empty substitution. The composition θσ of the substitutions θ and σ is defined as the functional composition. A renaming is a substitution ρ for which there exists the inverse ρ⁻¹ such that ρρ⁻¹ = ρ⁻¹ρ = ε. The pre-ordering ≤ (more general than) on substitutions is such that θ ≤ σ iff there exists θ' such that θθ' = σ. The result of the application of the substitution θ to a term t is an instance of t denoted by tθ. We define t ≤ t' (t is more general than t') iff there exists θ such that tθ = t'. A substitution θ is grounding for t if tθ is ground. The relation ≤ is a preorder; ≈ denotes the associated equivalence relation (variance). A substitution θ is a unifier of terms t and t' if tθ = t'θ. The most general unifier of t1 and t2 is denoted by mgu(t1, t2). All the above definitions can be extended to other syntactic expressions in the obvious way. An atom is an object of the form p(t1,...,tn) where p ∈ P and t1,...,tn ∈ T. A clause is a formula of the form H :- L1,...,Ln with n ≥ 0, where H (the head) and L1,...,Ln (the body) are atoms. ":-" and "," denote logic implication and conjunction respectively, and all variables are universally quantified. If the body is empty the clause is a unit clause. A program is a finite set of clauses. A goal is a formula L1,...,Lm, where each Li is an atom. By Var(E) and Pred(E) we denote respectively the sets of variables and predicates occurring in the expression E. A Herbrand interpretation I for a program P is a set of ground atoms. The intersection M(P) of all the Herbrand models of a program P is a model (the least Herbrand model). M(P) is also the least fixpoint of a continuous transformation T_P (the immediate consequences operator) on the complete lattice of Herbrand interpretations. If G is a goal, G →θ,P B1,...,Bn denotes an SLD derivation with fair selection rule of B1,...,Bn in the program P, where θ is the composition of the mgu's used in the derivation. G →θ,P □ denotes the refutation of G in the program P with computed answer substitution θ. A computed answer substitution is always restricted to the variables occurring in G. The notations t̃, X̃ will be used to denote tuples of terms and variables respectively, while B̃ denotes a (possibly empty) conjunction of atoms.
2 Computed answer substitution semantics for Ω-open programs
The operational semantics is usually given by means of a set of inference rules which specify how derivations are made. From a purely logical point of view the operational semantics is simply defined in terms of successful derivations. However, from a programming language viewpoint, the operational semantics must be concerned with additional information, namely observable properties. A given program in fact may have different semantics depending on which of its properties can be observed. For instance, in pure logic programs one can observe successes, finite failure, computed answer substitutions, partial computed answer substitutions or any combination of them. A given choice of the observable induces an equivalence on programs, namely two programs are equivalent iff they are observationally indistinguishable. When the semantics correctly captures the observable, two programs are equivalent if they have the same semantics. When also compositionality is taken into account, for a given observable property we can obtain different semantics (and equivalence relations) depending on which kind of program composition we consider. Indeed, the semantics of logic programs is usually concerned with AND-composition (of atoms in a goal or in a clause body). Consider for example logic programs with computed answer substitutions as observable [Falaschi et al. 1989a]. The operational semantics can be defined as
O(P) = { p(X̃)θ | X̃ are distinct variables, p(X̃) →θ,P □ }
where the denotation of a program is a set of non-ground atoms, which can be viewed as a possibly infinite program [Falaschi et al. 1989a]. Since we have syntactic objects in the semantic domain, we need an equivalence relation in order to abstract from irrelevant syntactic differences. If the equivalence is accurate enough the semantics is fully abstract. According to [Gabbrielli and Levi 1991b], Herbrand interpretations are generalized by π-interpretations, which are possibly infinite sets of (equivalence classes of) clauses. The operational semantics of a program P is then a π-interpretation I which has the following property: P and I are observationally equivalent with respect to any goal G. This is the property which allows to state that the semantics does indeed capture the observable behavior [Falaschi et al. 1989a]. The following example shows that when considering OR-composition (i.e. union of sets of clauses), non-ground atoms (or unit clauses) are no longer sufficient to define a compositional semantics.
Example 2.1 Let us consider the following programs
P1 = {  q(X) :- p(X).
        r(X) :- s(X).
        s(b).
        p(a).  }
P2 = {  p(b).  }
According to the previous definition of O(P), O(P1) = {p(a), q(a), r(b), s(b)} and O(P2) = {p(b)}. Since O(P1 ∪ P2) = {p(a), p(b), q(a), q(b), r(b), s(b)}, the semantics of the union of the two programs cannot be obtained from the semantics of the programs.
In order for a semantics to be compositional, it must contain information in the form of a mapping from sets of atoms to sets of atoms. This is indeed the case of the semantics based on the closure operator [Lassez and Maher 1984] and on the T_P operator [Mancarella and Pedreschi 1988]. If we want a semantics expressed by the program syntax, OR-compositionality can only be obtained by choosing as semantic domain a set of (equivalence classes of) clauses. In example 2.1, for instance, the semantics of P1 should also contain the clause q(X) :- p(X).
Let us formally give the definition of the program composition we consider.

Definition 2.2 Let P be a program and Ω be a set of predicate symbols. P is open w.r.t. Ω (or Ω-open) if the information on the predicates in Ω is considered to be partial. Moreover, if P, Q are Ω-open programs and (Pred(Q) ∩ Pred(P)) ⊆ Ω then P ∪Ω Q is the Ω-open program P ∪ Q. If (Pred(Q) ∩ Pred(P)) ⊄ Ω then P ∪Ω Q is not defined.

Note that when considering an Ω-open program P and an Ω'-open program Q, the composition of P and Q is defined only if (Pred(Q) ∩ Pred(P)) ⊆ (Ω ∩ Ω'). Moreover, the composition of P and Q is a W-open program, where W = Ω ∪ Ω'.
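The composability condition of definition 2.2 is a simple test on predicate sets. The following sketch is entirely ours (the clause representation cl(Head, BodyList), the predicate names and the use of standard Prolog list-library predicates are assumptions, not part of the paper); it checks the condition and builds the composed (Ω ∪ Ω')-open program.

% Programs are lists of cl(Head, BodyList); open sets are lists of Name/Arity.
preds_of(Program, Preds) :-
    findall(F/N,
            ( member(cl(H, B), Program),
              ( A = H ; member(A, B) ),
              functor(A, F, N) ),
            L),
    sort(L, Preds).

composable(P1, Omega1, P2, Omega2) :-
    preds_of(P1, D1), preds_of(P2, D2),
    intersection(D1, D2, Shared),          % Pred(P) shared with Pred(Q)
    intersection(Omega1, Omega2, Omega),   % predicates open in both
    subtract(Shared, Omega, []).           % every shared predicate must be open in both

compose(P1, Omega1, P2, Omega2, P, Omega) :-
    composable(P1, Omega1, P2, Omega2),
    append(P1, P2, P),                     % P union_Omega Q = P union Q
    union(Omega1, Omega2, Omega).          % the result is (Omega union Omega')-open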
The definition of any predicate symbol p ∈ Ω in an Ω-open program P can always be extended or refined. For instance, in example 1.1 program Q1 is open w.r.t. the predicate parent and this predicate is refined in program Q2. Therefore, a deduction concerned with a predicate symbol of an Ω-open program P can be either complete (when it takes place completely in the program P) or partial (when it terminates in P with an atom p(t̃) such that p ∈ Ω and p(t̃) does not unify with the head of any clause in P). A partial deduction can be completed by the addition of new clauses. Thus we have a hypothetical deduction, conditional on the extension of predicate p.

Let us consider again the program P1 of example 2.1 and assume Ω = {p}. Then, the goal r(X) produces a complete deduction only, computing the answer substitution {X/b}. The goal q(X) produces a complete deduction, computing the answer substitution {X/a}, and a hypothetical deduction returning any answer that could be computed by a definition of p external to P1. The goal q(b) instead has one hypothetical deduction only, conditional on the provability (outside P1) of p(b). We want to express this hypothetical reasoning, i.e. that q(b) is refutable if p(b) is refutable. Hence we will consider the following operational semantics (recall that by B̃ we denote B1,...,Bn with n ≥ 0).
Definition 2.3 Let Ω be a set of predicate symbols. We define
Id(Ω) = { p(X̃) :- p(X̃) | p ∈ Ω, X̃ are distinct variables }.

Definition 2.4 (Ω-compositional computed answer substitutions semantics) Let P be a program and let P* = P ∪ Id(Ω). Then we define
O_Ω(P) = { A :- B̃2 | p(X̃) →θ1,P B̃1 →θ2,P* B̃2,  X̃ distinct variables,  A = p(X̃)θ1θ2,  Pred(B̃2) ⊆ Ω }.

The set of clauses Id(Ω) in the previous definition is used to delay the evaluation of open atoms. This is a trick which allows to obtain, by using a fixed fair selection rule R, all the derivations p(X1,...,Xn) →θ,P B1,...,Bn which use any selection rule R', for Pred(B1,...,Bn) ⊆ Ω. Note that the first step of the derivation uses a clause in P (instead of one in P*) because we want O_Ω(P) to contain a clause p(X̃) :- p(X̃) if and only if p(X̃) →P p(X̃).
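A small Prolog sketch of the delayed evaluation of open atoms behind O_Ω(P) (the sketch is ours; the representation cl(Head, BodyList) and the predicate name solve_open/3 are assumptions): an atom whose predicate is open may either be left in the residual (conditional) body or be resolved with a program clause, so backtracking enumerates both the complete and the hypothetical deductions.

% solve_open(+Omega, +Goal, -Residual): derivations stop at open predicates,
% which are returned as the residual body of the conditional answer.
solve_open(_, [], []).
solve_open(Omega, [A|Gs], [A|Rest]) :-     % delay an open atom
    functor(A, F, N),
    member(F/N, Omega),
    solve_open(Omega, Gs, Rest).
solve_open(Omega, [A|Gs], Residual) :-     % or resolve it with a clause of P
    cl(A, Body),
    append(Body, Gs, Gs1),
    solve_open(Omega, Gs1, Residual).

% Program P1 of example 2.1, with Omega = [p/1]:
cl(q(X), [p(X)]).
cl(r(X), [s(X)]).
cl(s(b), []).
cl(p(a), []).

% ?- solve_open([p/1], [q(X)], R).
%    yields R = [p(X)] (the conditional answer q(X) :- p(X))
%    and, on backtracking, X = a with R = [] (the complete deduction q(a)).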
Example 2.5 Let P1, P2 be the Ω-open programs of example 2.1, where Ω = {p}. Then O_Ω(P2) = {p(b)} and O_Ω(P1) = {q(X) :- p(X), p(a), q(a), r(b), s(b)}. O_Ω contains enough information to compute the semantics of compositions. Indeed O(P1 ∪ P2) ⊆ O_Ω(P1 ∪ P2) and O_Ω(P1 ∪ P2) = O_Ω(O_Ω(P1) ∪ O_Ω(P2)) (see theorem 2.9).

Example 2.6 Let Ω = {q, r} and let Q1, Q2 be the following programs
Q1 = {  p(X, Y) :- r(X), q(Y).
        r(a).  }
Q2 = {  r(b).  }
Then O_Ω(Q2) = {r(b)}, O_Ω(Q1) = {p(X,Y) :- r(X), q(Y),  p(a,Y) :- q(Y),  r(a)} and
O_Ω(Q1 ∪ Q2) = O_Ω(O_Ω(Q1) ∪ O_Ω(Q2)) = {p(X,Y) :- r(X), q(Y),  p(a,Y) :- q(Y),  p(b,Y) :- q(Y),  r(a),  r(b)} (see theorem 2.9).
Note that O_Ω(P) is essentially the result of the partial evaluation [Lloyd and Shepherdson 1987] of P, where derivations terminate at open predicates. This operational semantics fully characterizes hypothetical deductions, conditional on the extension of the predicates in Ω. Indeed the semantics of a program P can be viewed as a possibly infinite set of clauses, and the partial computed answer substitutions can be obtained by executing the goal in this "program". The equivalence (≈Ω) on programs induced by the computed answer substitution observable, when considering also program union, can be formally defined as follows.

Definition 2.7 Let P1, P2 be Ω-open programs. Then P1 ≈Ω P2 if for every goal G and for every program Q such that Pi ∪Ω Q, i = 1,2, is defined, G →θ1,P1∪ΩQ □ iff G →θ2,P2∪ΩQ □, where θ1 is a renaming of θ2.

O_Ω allows to characterize a notion of answer substitution which enhances the usual one, since also (unresolved) atoms with predicate symbols in Ω are considered. Therefore it is able to model computed answer substitutions in an OR-compositional way. The following results show that O_Ω(P) is compositional w.r.t. ∪Ω and therefore correctly captures the computed answer substitution observable when considering also program union.
Theorem 2.8 Let P be an Ω-open program. Then P ≈Ω O_Ω(P).

Theorem 2.9 Let P1, P2 be Ω-open programs and let P1 ∪Ω P2 be defined. Then O_Ω(O_Ω(P1) ∪Ω O_Ω(P2)) = O_Ω(P1 ∪Ω P2).

Corollary 2.10 Let P1, P2 be Ω-open programs. If O_Ω(P1) = O_Ω(P2) then P1 ≈Ω P2.
3 Semantic domain for Ω-open programs
In this section we formally define the semantic domain which characterizes the above introduced operational semantics O_Ω. Since O_Ω contains clauses (whose body predicates are all in Ω), we have to accommodate clauses in the interpretations we use. Therefore we will define the notion of Ω-interpretation, which extends the usual notion of interpretation since an Ω-interpretation contains conditional atoms. As usual, in the following, Ω is a set of predicates.

Definition 3.1 (Conditional atoms) An Ω-conditional atom is a clause A :- B1,...,Bn such that Pred(B1,...,Bn) ⊆ Ω.

In order to abstract from the purely syntactical details, we use the following equivalence ≈ on conditional atoms.

Definition 3.2 Let c1 = A1 :- B1,...,Bn and c2 = A2 :- D1,...,Dm be clauses. Then c1 ≤ c2 iff there exists θ and there exist {i1,...,in} ⊆ {1,...,m}, with ih ≠ ik for h ≠ k, such that A1θ = A2 and (B1θ,...,Bnθ) = (D_i1,...,D_in). Moreover, we define c1 ≈ c2 iff c1 ≤ c2 and c2 ≤ c1.

Note that in the previous definition bodies of clauses are considered as multisets (considering sets would give the standard definition of subsumption). Equivalent clauses have the same body (considered as a multiset) up to renaming. Considering sets instead of multisets (subsumption equivalence) is not correct when considering computed answer substitutions. The following is a simple counterexample.
Example 3.3 Let c1 = p(X,Y) :- q(X,Y), q(X,Y) and c2 = p(X,Y) :- q(X,Y). Let P1 = {c1} and P2 = {c2} be Ω-open programs, where Ω = {q}. Obviously, considering bodies of clauses as sets, c1 = c2ε, where ε is the empty renaming. However P1 ≉Ω P2 since, by considering Q = {q(X,b), q(a,Y)}, p(X,Y) →θ,P1∪Q □ where θ = {X/a, Y/b}, while the goal p(X,Y) in the program P2 ∪ Q can compute either {X/a} or {Y/b} only.
Definition 3.4 The Ω-conditional base, C_Ω, is the quotient set of all the Ω-conditional atoms w.r.t. ≈.

In the following we will denote the equivalence class of a conditional atom c by c itself, since all the definitions which use conditional atoms do not depend on the element chosen to represent an equivalence class. Moreover, any subset of C_Ω will be considered implicitly as an Ω-open program. Before giving the formal definition of Ω-interpretation, we need the notion of u-closed subset of C_Ω.

Definition 3.5 A subset I of C_Ω is u-closed iff for all H :- B1,...,Bn ∈ I and for all B :- A1,...,Am ∈ I such that there exists θ = mgu(Bi, B), for 1 ≤ i ≤ n, (H :- B1,...,B_{i-1}, A1,...,Am, B_{i+1},...,Bn)θ ∈ I. Moreover, if I ⊆ C_Ω, we denote by Ĩ its u-closure, defined as the least (w.r.t. ⊆) u-closed I' ⊆ C_Ω such that I ⊆ I'.
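A sketch (ours) of the unfolding step underlying the u-closure of definition 3.5: a body atom of a clause is replaced by the body of another clause whose head unifies with it, after renaming apart; in Prolog, copy_term/2 does the renaming and unification computes the mgu.

% unfold_step(+Clause1, +Clause2, -Result)
% Clauses are cl(Head, BodyList); Result is Clause1 with one body atom
% replaced by the (renamed) body of Clause2, under their mgu.
unfold_step(cl(H, Body), Clause2, cl(H, NewBody)) :-
    append(Pre, [B|Post], Body),            % pick a body atom B
    copy_term(Clause2, cl(H2, Body2)),      % rename Clause2 apart
    B = H2,                                 % unify B with the head (the mgu)
    append(Body2, Post, Tail),
    append(Pre, Tail, NewBody).

% ?- unfold_step(cl(q(X), [p(X)]), cl(p(a), []), C).
%    C = cl(q(a), []).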
Proposition 4.5 will show that the previous notion of u-closure is well defined. A u-closed interpretation I is an interpretation which, if viewed as a program, is closed under unfolding of procedure calls. Interpretations need to be u-closed for the validity of the model theory developed in section 5. Therefore, in order to define Ω-interpretations we will consider u-closed sets of conditional atoms only. Let us now give the formal definition of Ω-interpretation.

Definition 3.6 An Ω-interpretation I is any subset of C_Ω which is u-closed. The set of all the Ω-interpretations is denoted by ℑ.

Lemma 3.7 (ℑ, ⊆) is a complete lattice, where the minimal element is ∅ and glb(X) = ⋂_{I∈X} I for any X ⊆ ℑ.

In the following the operational semantics O_Ω will be formally considered as an Ω-interpretation.
4 Fixpoint semantics
In this section we define a fixpoint semantics F_Ω(P) which in the next subsection is proved to be equivalent to the previously defined operational semantics O_Ω(P). This can be achieved by defining an immediate consequence operator T_P^Ω on the lattice (ℑ, ⊆) of Ω-interpretations. F_Ω(P) is the least fixpoint of T_P^Ω.
The immediate consequences operator T_P^Ω is strongly related to the derivation rule used for Ω-open programs and hence to the unfolding rule. Therefore T_P^Ω models the observable properties in an OR-compositional way, and may be useful for modular (i.e. OR-compositional) bottom-up program analysis.

Definition 4.1 Let P be an Ω-open program. Then T_P^Ω(I) = Γ_P^Ω(I), where Γ_P^Ω(I) is the operator defined in [Bossi and Menegus 1991] as follows:
Γ_P^Ω(I) = { (A :- L̃1,...,L̃n)θ ∈ C_Ω | ∃ A :- B1,...,Bn ∈ P, ∃ B'i :- L̃i ∈ I ∪ Id(Ω), i = 1,...,n (each L̃i a possibly empty conjunction), s.t. θ = mgu((B1,...,Bn),(B'1,...,B'n)) }.

Proposition 4.2 T_P^Ω is continuous in the complete lattice (ℑ, ⊆).

The notion of ordinal powers for T_P^Ω is defined as usual, namely T_P^Ω ↑ 0 = ∅, T_P^Ω ↑ (n+1) = T_P^Ω(T_P^Ω ↑ n) and T_P^Ω ↑ ω = ⋃_{n≥0} (T_P^Ω ↑ n). Since T_P^Ω is continuous on (ℑ, ⊆), well known results of lattice theory allow to prove proposition 4.3 and hence to define the fixpoint semantics as follows.

Proposition 4.3 T_P^Ω ↑ ω is the least fixpoint of T_P^Ω in the complete lattice (ℑ, ⊆).

Definition 4.4 Let P be an Ω-open program. The fixpoint semantics F_Ω(P) of P is defined as F_Ω(P) = T_P^Ω ↑ ω.

Remark. The original definition of Γ_P^Ω(I) does not require Ω-interpretations to be u-closed subsets of C_Ω. If we consider an Ω-interpretation as any subset of C_Ω and the Γ_P^Ω operator, even if the intermediate results Γ_P^Ω ↑ n are different, the following proposition 4.5 and theorem 4.6 show that the least fixpoint Γ_P^Ω ↑ ω is a u-closed set and it is equal to F_Ω(P) (Γ_P^Ω is continuous on (℘(C_Ω), ⊆)). Therefore, when considering the fixpoint semantics we can use the Γ_P^Ω operator. Moreover, proposition 4.5 ensures us that the previous notion of u-closure is well defined.

Proposition 4.5 Let I ⊆ C_Ω and let unf^Ω be defined as in definition 4.7. Then the following hold:
1. I is u-closed iff I = unf^Ω_I(I);
2. for any program P, Γ_P^Ω ↑ ω is u-closed;
3. I' = unf^Ω_I ↑ ω is the least (w.r.t. set inclusion) subset of C_Ω which is u-closed and such that I ⊆ I'.

Theorem 4.6 Let P be an Ω-open program. Then Γ_P^Ω ↑ ω = F_Ω(P).
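As a quick check of the construction (this worked iteration is ours, using the reconstruction of T_P^Ω given above), consider again program P1 of example 2.1 with Ω = {p}:
T_P1^Ω ↑ 0 = ∅
T_P1^Ω ↑ 1 = { p(a),  s(b),  q(X) :- p(X) }          (the unit clauses, plus q(X) :- p(X) obtained via Id(Ω))
T_P1^Ω ↑ 2 = T_P1^Ω ↑ 1 ∪ { q(a),  r(b) }             (resolving p(X) with p(a) and s(X) with s(b))
T_P1^Ω ↑ 3 = T_P1^Ω ↑ 2                                (fixpoint)
so F_Ω(P1) = {q(X) :- p(X), p(a), q(a), r(b), s(b)}, in agreement with O_Ω(P1) as computed in example 2.5.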
4.1 Unfolding semantics and equivalence results

To clarify the relations between the operational and the fixpoint semantics, before proving their equivalence, we introduce the intermediate notion of unfolding semantics U_Ω(P) [Levi 1988, Levi and Mancarella 1988]. U_Ω(P) is obtained as the limit of the unfolding process. Since the unfolding semantics can be expressed top-down in terms of the Γ_P^Ω operator, the unfolding semantics can be proved equal to the standard bottom-up fixpoint semantics. On the other hand, since U_Ω(P) and O_Ω(P) are based on the same inference rule (applied in parallel and in sequence respectively), U_Ω(P) and O_Ω(P) can easily be proven equivalent.

Definition 4.7 Let P and Q be Ω-open programs. Then the unfolding of P w.r.t. Q is defined as
unf^Ω_Q(P) = { (A :- L̃1,...,L̃n)θ | ∃ A :- B1,...,Bn ∈ P, ∃ B'i :- L̃i ∈ Q ∪ Id(Ω), i = 1,...,n, s.t. θ = mgu((B1,...,Bn),(B'1,...,B'n)) }.

Note that the only difference between unf^Ω_Q(P) and Γ_P^Ω(Q) is that the latter restricts the resulting set to clauses in C_Ω. Therefore, if I is an Ω-interpretation (i.e. I ⊆ C_Ω), Γ_P^Ω(I) = unf^Ω_I(P) holds. In general, Γ_P^Ω(I) = τ_Ω(unf^Ω_I(P)), where τ_Ω(P) extracts from a program P an Ω-interpretation.

Definition 4.8 Let P be an Ω-open program. Then we define
τ_Ω(P) = { c ∈ P | c ∈ C_Ω }.

Definition 4.9 Let P be an Ω-open program and let τ_Ω(P) be as defined in definition 4.8. Then we define the collection of programs
P0 = P
Pi = unf^Ω_{P_{i-1}}(P).
The unfolding semantics U_Ω(P) of the program P is defined as
U_Ω(P) = ⋃_{i=1,2,...} τ_Ω(Pi).

The following theorem states the equality of the unfolding and the operational semantics.

Theorem 4.10 Let P be an Ω-open program. Then O_Ω(P) = U_Ω(P).

Note that the powers of Γ_P^Ω can themselves be expressed in terms of iterated unfoldings of P. Therefore we have the following theorem.
Theorem 4.11 Let P be a program. Then F_Ω(P) = U_Ω(P).

Corollary 4.12 Let P be a program. Then F_Ω(P) = O_Ω(P).

5 Model Theory
As we have shown, the operational and fixpoint semantics of a program P define an Ω-interpretation I_P, which can be viewed as a syntactic notation for a set of Herbrand interpretations denoted by H(I_P). Namely, H(I_P) represents the set of the least Herbrand models of all programs which can be obtained by closing the program I_P with a suitable set of ground atoms defining the open predicates. Our aim is finding a notion of Ω-model such that O_Ω(P) (and F_Ω(P)) are Ω-models and every Herbrand model is an Ω-model. This can be obtained as follows.

Definition 5.1 Let J be an Ω-interpretation. Then we define
Atom_Ω(J) = { p(t̃) | p ∈ Ω and p(t̃) is a ground instance of an atom in J }.

Example 5.2 Let Ω = {p, q} and J = {p(a) :- q(b)}. Then Atom_Ω(J) = {p(a), q(b)}.

Definition 5.3 Let I be an Ω-interpretation for an Ω-open program. Then we define
H(I) = { M(I ∪ J) | J ⊆ Atom_Ω(I) }
where M(K) denotes the least Herbrand model of K.

Example 5.4 Let I = {p(a) :- q(b)} be an Ω-interpretation. Then
1) for Ω = {q}, Atom_Ω(I) = {q(b)} and H(I) = {∅, {p(a), q(b)}};
2) for Ω = {p, q}, Atom_Ω(I) = {p(a), q(b)} and H(I) = {∅, {p(a)}, {p(a), q(b)}}.

Definition 5.5 Let P be an Ω-open program and I be an Ω-interpretation. I is an Ω-model of P iff for every J ∈ H(I), J is a Herbrand model of P.
Obviously, in general, given a Herbrand model M of a program P, M ∪ N is not any more a model of P for an arbitrary set of ground atoms N. Since we want a notion of Ω-model which encompasses the standard notion of Herbrand model, the "closure" of the interpretation I can be performed by adding only ground atoms which unify with atoms already in I. The following example 5.6 shows that if such a condition is not satisfied, a standard Herbrand model would not any more be an Ω-model.

Example 5.6 Let us consider the Ω-open program P = {p(a) :- q(a)}, where Ω = {q}. Then ∅ is a (the least) Herbrand model of P. If, by violating the J ⊆ Atom_Ω(I) condition, we allowed {q(a)} ∈ H(∅), then, since {q(a)} is not a Herbrand model of P, ∅ would not be an Ω-model of P.
Example 5.7 Let us consider the program P1, where Ω = {p}, of example 2.1. Then
O_Ω(P1) = {q(X) :- p(X), p(a), q(a), r(b), s(b)}
is an Ω-model of P1 since
H(O_Ω(P1)) = {H1, H2, H3, ...},
where, denoting by [p(X)] the set of ground instances of p(X),
H1 = {p(a), q(a), r(b), s(b)}
H2 = {p(a), p(b), q(a), q(b), r(b), s(b)}
...
Hω = {r(b), s(b)} ∪ [p(X)] ∪ [q(X)]
and H1, H2, ..., Hω are Herbrand models of P1.
The following proposition states the mentioned properties of Ω-models.

Proposition 5.8 Let P be an Ω-open program. Then
1. every Herbrand model of P is an Ω-model of P,
2. O_Ω(P) is an Ω-model of P.

A relevant property of standard Herbrand models states that the intersection of a set of models of a program P is always a model of P. This allows to define the model-theoretic semantics of P as the least Herbrand model obtained by intersecting all the Herbrand models of P. The following example shows that this important property does not hold any more when considering Ω-models with set-theoretic operations.

Example 5.9 Let Ω = {q} and P be the following Ω-open program P = {p(b) :- q(b),  p(X),  q(a)}. Then O_Ω(P) = {p(b) :- q(b),  p(x),  q(a)} and M(P) = {q(a)} ∪ { p(t) | t is a ground term }. By proposition 5.8, O_Ω(P) and M(P) are Ω-models of P. However O_Ω(P) ∩ M(P) = {q(a)} is not an Ω-model of P.

The Ω-model intersection property does not hold because set-theoretic operations do not adequately model the operations on conditional atoms. Namely, the information of an Ω-interpretation I1 may be contained in I2 without I1 being a subset of I2. In order to define the model-theoretic semantics for Ω-open programs as a unique (least) Ω-model, we then need a partial order ⊑ on Ω-interpretations which
allows to restore the model intersection property. ⊑ should model the meaning of Ω-interpretations, in such a way that (ℑ, ⊑) is a complete lattice and the greatest lower bound of a set of Ω-models is an Ω-model. As we will show in the following, this can be obtained by taking ⊑ as given in definition 5.10. According to the above mentioned property, there exists a least Ω-model. It is worth noting that such a least Ω-model is the standard least Herbrand model (proposition 5.21). Moreover, note that the most expressive Ω-model O_Ω(P) is a non-minimal Ω-model. The following definitions extend those given in [Falaschi et al. 1989b] for the non compositional semantics of positive logic programs.
Definition 5.10 Let I1, I2 be Ω-interpretations. We define
• I1 ≤ I2 iff for every c1 ∈ I1 there exists c2 ∈ I2 such that c2 ≤ c1,
• I1 ⊑ I2 iff (I1 ≤ I2) and (I2 ≤ I1 implies I1 ⊆ I2).

Proposition 5.11 The relation ≤ is a preorder and the relation ⊑ is an ordering.

Note that if I1 ⊆ I2, then I1 ⊑ I2, since I1 ⊆ I2 implies I1 ≤ I2. The following definitions and propositions will be used to define the model-theoretic semantics.

Definition 5.12 Let I be an Ω-interpretation. We define Min'(I) = { c ∈ I | for every c' ∈ I, c' ≤ c implies c' = c } and Min(I) = the u-closure of Min'(I).

Example 5.13 We show Min and Min' for the following Ω-interpretations I and J. Let
I = { p(x),  q(b),  p(a),  p(a) :- q(b) }
J = { q(x) :- p(x), r(x),   q(b) :- p(b),   q(b) :- p(x),   r(b) }
Then Min'(I) = Min(I) = {p(x), q(b)}, Min'(J) = { r(b),  q(x) :- p(x), r(x),  q(b) :- p(x) } and Min(J) = J.

Definition 5.14 Let A be a set of Ω-interpretations. We introduce the following notations:
• ∨A = ⋃_{I∈A} I,
• Min(A) = Min(∨A),
• ⊔A = the u-closure of Min(A) ∪ ∨{ I ∈ A | Min(A) ⊆ I }.

It is worth noting that, for any I, Min(I) ⊆ I (recall that I is u-closed) and Min(A) = Min(⊔A).

Proposition 5.15 For any set A of Ω-interpretations there exists the least upper bound of A, lub(A), and lub(A) = ⊔A holds.

Proposition 5.16 The set ℑ of all the Ω-interpretations with the ordering ⊑ is a complete lattice. C_Ω is the top element and ∅ is the bottom element.

The model-theoretic construction is possible only if Ω-interpretations can be viewed as representations of Herbrand interpretations. Notice that every Herbrand interpretation is an Ω-interpretation. The following proposition generalizes the standard intersection property of Herbrand models to the case of Ω-models.

Proposition 5.17 Let M be a non-empty set of Ω-models of an Ω-open program P. Then glb(M) is an Ω-model of P.

Corollary 5.18 The set of all the Ω-models of a program P with the ordering ⊑ is a complete lattice.

We are now in the position to formally define the model-theoretic semantics.

Definition 5.19 Let P be a program. The model-theoretic semantics of P is the greatest lower bound of the set of its models, i.e.
M_Ω(P) = glb({ I ∈ ℑ | I is an Ω-model of P }).

Proposition 5.21 shows that the above defined model-theoretic semantics is the standard least Herbrand model. This fact justifies our choice of the ordering relation.

Proposition 5.20 For any Ω-model I there exists a standard Herbrand model I' such that I' ⊑ I.

Proposition 5.21 The least standard Herbrand model is the least Ω-model.

6 S_Ω-models
We will now consider the relation between Ω-models (definition 5.5) and the S_Ω-models defined in [Bossi and Menegus 1991] on the same set of interpretations. Both the Ω-models and the S_Ω-models are intended to capture specific operational properties from a model-theoretic point of view. However, S_Ω-models are based on an ad hoc notion of truth (S_Ω-truth) and the least S_Ω-model is exactly F_Ω(P). Conversely, Ω-models are based on the usual notion of truth in a Herbrand interpretation, through the function H. Moreover, the least Ω-model is the usual least Herbrand model, while F_Ω(P) is a non-minimal Ω-model.
Definition 6.1 [Bossi and Menegus 1991] (S_Ω-truth) Let Ω be a set of predicate symbols and I be an Ω-interpretation. Then
(a) an atom A is Ω-true in I iff A ∈ I;
(b) a definite clause A :- B1,...,Bn is Ω-true in I iff for all B'1 :- L̃1, ..., B'n :- L̃n ∈ I ∪ Id(Ω), if there exists θ = mgu((B1,...,Bn),(B'1,...,B'n)) then (A :- L̃1,...,L̃n)θ ∈ I.

S_Ω-models are defined in the obvious way.

Proposition 6.2 Every S_Ω-model is an Ω-model (according to definition 5.5).

Proposition 6.3 [Bossi and Menegus 1991] If A is a non-empty set of S_Ω-models of an Ω-open program P, then ⋂_{M∈A} M is an S_Ω-model of P.

The previous proposition allows to define the model-theoretic semantics M_SΩ(P) for a program P in terms of the S_Ω-models as follows.

Definition 6.4 [Bossi and Menegus 1991] Let P be an Ω-open program and let S be the set of all the S_Ω-models of P. Then M_SΩ(P) = ⋂_{M∈S} M.

Corollary 6.5 Let A be a non-empty set of S_Ω-models of an Ω-open program P. Then ⋂_{M∈A} M is an Ω-model of P.

By definition and by proposition 6.3, M_SΩ(P) is the least S_Ω-model in the lattice (ℑ, ⊆) (recall that ℑ is the set of all the Ω-interpretations). The following proposition shows that M_SΩ(P) is also the least S_Ω-model in the lattice (ℑ, ⊑).

Proposition 6.6 Let P be a program and let S be the set of all the S_Ω-models of P. Then M_SΩ(P) = glb(S) (according to the ⊑ ordering).

The following theorem shows the equivalence of the fixpoint semantics (definition 4.4) and the model-theoretic semantics M_SΩ(P).

Theorem 6.7 [Bossi and Menegus 1991] Let P be an Ω-open program. Then F_Ω(P) = M_SΩ(P).

Corollary 6.8 Let P be an Ω-open program. Then F_Ω(P) is an Ω-model of P.

It is worth noting that, since O_Ω(P) = F_Ω(P) = M_SΩ(P), theorem 2.9 shows that the model-theoretic semantics M_SΩ(P) is compositional w.r.t. Ω-union of programs when considering computed answer substitutions as observables. This result was already proved in [Bossi and Menegus 1991] for the M_SΩ(P) model. Finally note that, as shown by the following example, T_P^Ω is not monotonic (and therefore it is not continuous) on the complete lattice (ℑ, ⊑). However, proposition 6.10 ensures us that F_Ω(P) is still the least fixpoint of T_P^Ω on (ℑ, ⊑).

Example 6.9 Consider the program P = { r(b),  p(x) :- q(x) }. Let Ω = ∅, I1 = {q(a), q(x)} and I2 = {r(b), q(x)}. Then I1 ⊑ I2 while T_P^Ω(I1) = {p(x), p(a), r(b)} ⋢ T_P^Ω(I2) = {p(x), r(b)}.

Proposition 6.10 T_P^Ω ↑ ω is the least fixpoint of T_P^Ω on the complete lattice (ℑ, ⊑).

7 Related work and conclusions
The result of our semantic construction has several similarities with the proof-theoretic semantics defined in [Gaifman and Shapiro 1989a, Gaifman and Shapiro 1989b]. Our construction however is closer to the usual characterization of the semantics of logic programs. Namely we define a top-down operational and a bottom-up fixpoint semantics, and, last but not least, a model-theoretic semantics which allows us to obtain a declarative characterization of syntactically defined models. The semantics in [Gaifman and Shapiro 1989a] does not characterize computed answer substitutions, while the denotation defined by the fully abstract semantics in [Gaifman and Shapiro 1989b] is not a set of clauses (i.e. a program). The framework of [Gaifman and Shapiro 1989a, Gaifman and Shapiro 1989b] can be useful for defining a program equivalence notion, even if our more declarative (model-theoretic) characterization is even more adequate. Moreover, the presence of an operational and a fixpoint semantics makes our construction useful as a formal basis for program analysis. Another related paper is [Brogi et al. 1991], where Ω-open logic programs are called open theories. Open theories are provided with a model-theoretic semantics which is based on ideas very similar to those underlying our definition 5.3. [Brogi et al. 1991] however does not consider semantic definitions in the style of our O_Ω(P), which gives a unique denotation to any open program.
Let us finally remark some interesting properties of the Ω-model O_Ω(P).
• By means of a syntactic device, we obtain a unique representation for a possibly infinite set of Herbrand models when a unique representative Herbrand model does not exist. A similar device was used in [Dung and Kanchanasut 1989, Kanchanasut and Stuckey 1990, Gabbrielli et al. 1991] to characterize logic programs with negation.
• Operators, such as ∪Ω, are quite easy and natural to define on O_Ω(P).
• O_Ω(P) can be used for modular program analysis [Giacobazzi and Levi 1991] and for studying new equivalences of logic programs, based on computed answer substitutions, which are not considered in [Maher 1988].
• It is strongly related to abduction [Eshghi and Kowalski 1989]. If Ω is the set of abducible predicates, the abductive consequences of any goal G can be found by executing G in O_Ω(P).
• The delayed evaluation of open predicates which is typical of O_Ω(P) can easily be generalized to other logic languages, to achieve compositionality w.r.t. the union of programs. In particular this matches quite naturally the semantics of CLP and concurrent constraint programs given in [Gabbrielli and Levi 1990, Gabbrielli and Levi 1991a].
References
[Apt 1988] K. R. Apt. Introduction to Logic Programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics. Elsevier, Amsterdam and The MIT Press, Cambridge, 1990.
[Bossi et al. 1991] A. Bossi, M. Gabbrielli, G. Levi, and M. C. Meo. Contributions to the Semantics of Open Logic Programs. Technical Report TR 17/91, Dipartimento di Informatica, Università di Pisa, 1991.
[Bossi and Menegus 1991] A. Bossi and M. Menegus. Una Semantica Composizionale per Programmi Logici Aperti. In P. Asirelli, editor, Proc. Sixth Italian Conference on Logic Programming, pages 95-109, 1991.
[Brogi et al. 1991] A. Brogi, E. Lamma, and P. Mello. Open Logic Theories. In P. Kreuger, L.-H. Eriksson, and P. Schroeder-Heister, editors, Proc. of the Second Workshop on Extensions to Logic Programming, Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin, 1991.
[Dung and Kanchanasut 1989] Phan Minh Dung and K. Kanchanasut. A Fixpoint Approach to Declarative Semantics of Logic Programs. In E. Lusk and R. Overbeek, editors, Proc. North American Conf. on Logic Programming '89, pages 604-625. The MIT Press, Cambridge, Mass., 1989.
[Eshghi and Kowalski 1989] K. Eshghi and R. A. Kowalski. Abduction compared with Negation by Failure. In G. Levi and M. Martelli, editors, Proc. Sixth Int'l Conf. on Logic Programming, pages 234-254. The MIT Press, Cambridge, Mass., 1989.
[Falaschi et al. 1988] M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. A new Declarative Semantics for Logic Languages. In R. A. Kowalski and K. A. Bowen, editors, Proc. Fifth Int'l Conf. on Logic Programming, pages 993-1005. The MIT Press, Cambridge, Mass., 1988.
[Falaschi et al. 1989a] M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. Declarative Modeling of the Operational Behavior of Logic Languages. Theoretical Computer Science, 69(3):289-318, 1989.
[Falaschi et al. 1989b] M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. A Model-Theoretic Reconstruction of the Operational Semantics of Logic Programs. Technical Report TR 32/89, Dipartimento di Informatica, Università di Pisa, 1989. To appear in Information and Computation.
[Gabbrielli and Levi 1990] M. Gabbrielli and G. Levi. Unfolding and Fixpoint Semantics of Concurrent Constraint Programs. In H. Kirchner and W. Wechler, editors, Proc. Second Int'l Conf. on Algebraic and Logic Programming, volume 463 of Lecture Notes in Computer Science, pages 204-216. Springer-Verlag, Berlin, 1990. Extended version to appear in Theoretical Computer Science.
[Gabbrielli and Levi 1991a] M. Gabbrielli and G. Levi. Modeling Answer Constraints in Constraint Logic Programs. In K. Furukawa, editor, Proc. Eighth Int'l Conf. on Logic Programming, pages 238-252. The MIT Press, Cambridge, Mass., 1991.
[Gabbrielli and Levi 1991b] M. Gabbrielli and G. Levi. On the Semantics of Logic Programs. In J. Leach Albert, B. Monien, and M. Rodríguez-Artalejo, editors, Automata, Languages and Programming, 18th International Colloquium, volume 510 of Lecture Notes in Computer Science, pages 1-19. Springer-Verlag, Berlin, 1991.
[Gabbrielli et al. 1991] M. Gabbrielli, G. Levi, and D. Turi. A Two Steps Semantics for Logic Programs with Negation. Technical report, Dipartimento di Informatica, Università di Pisa, 1991.
[Gaifman and Shapiro 1989a] H. Gaifman and E. Shapiro. Fully abstract compositional semantics for logic programs. In Proc. Sixteenth Annual ACM Symp. on Principles of Programming Languages, pages 134-142. ACM, 1989.
[Gaifman and Shapiro 1989b] H. Gaifman and E. Shapiro. Proof theory and semantics of logic programs. In Proc. Fourth IEEE Symp. on Logic In Computer Science, pages 50-62. IEEE Computer Society Press, 1989.
[Giacobazzi and Levi 1991] R. Giacobazzi and G. Levi. Compositional Abstract Interpretation of Constraint Logic Programs. Technical report, Dipartimento di Informatica, Università di Pisa, 1991.
[Kanchanasut and Stuckey 1990] K. Kanchanasut and P. Stuckey. Eliminating Negation from Normal Logic Programs. In H. Kirchner and W. Wechler, editors, Proc. Second Int'l Conf. on Algebraic and Logic Programming, volume 463 of Lecture Notes in Computer Science, pages 217-231. Springer-Verlag, Berlin, 1990.
[Lassez and Maher 1984] J.-L. Lassez and M. J. Maher. Closures and Fairness in the Semantics of Programming Logic. Theoretical Computer Science, 29:167-184, 1984.
        M = A + (B - A)/2, x = x1 + x2
        □ int(A, M, x1), int(M, B, x2), E(ε).
E(x)  :- x = 1/n □ N(n).
N(n') :- n' = n + 1 □ N(n).
N(n') :- n' = 1 □.
Let, for instance, c+, c* and cf be the costs of addition, multiplication/division and f respectively. Variables and constants have a zero cost. Thus, denoting by Γτ such a constraint system:
F_Γτ(P) = {  int(A, B, x) :- 4c+ + 3c* + cf,   E(n') :- c*,   N(n') :- 0  }
A space of approximate constraints can be specified by defining an auto-weak morphism ρ which is an upper closure operator (i.e. an idempotent, monotonic and extensive operator) on (C, ≤). As shown in [Cousot and Cousot 79], the approximation process essentially consists in partitioning the space of constraints so that no distinction is made between equivalent constraints, all approximated by a representant of their equivalence class. The equivalence relation is induced by an upper closure operator ρ: c1 =ρ c2 iff ρ(c1) = ρ(c2). In [Cousot and Cousot 79] different equivalent methods for specifying abstract domains (i.e. upper closure operators) are presented. However, there are standard techniques in algebraic specifications that allow the definition of abstract constraint systems. For example, cylindrifications can be interpreted as abstractions on the algebra of constraints.

Proposition 4.3 Let Δ ⊆ V; ∃Δ is an auto-weak morphism and an upper closure operator on (C, ≤).

Existential quantification is then a way to define abstract domains. The space of approximate constraints can also be specified by adding axioms to the underlying constraint system A. These additional axioms extend the meaning of the diagonal elements d_{t,t'} of the algebra, in effect specifying which objects are to be considered "equivalent" from the perspective of the analysis. This is illustrated by the following example:
Example 4.2 Consider the logic program P
p(0).
p(s(x)) :- q(x).
q(s(x)) :- p(x).
and a simple type (parity) analysis for P. Interpreting P as a constraint logic program on the Herbrand constraint system A_H, the type analysis can be specified by extending the axioms specifying the constraint system with the additional axiom s(s(x)) = x. The resulting constraint system, denoted by A'_H, is trivially Noetherian. The semantics of P in A_H is
{ p(x) :- x = 0 ∨ ⋁_{n≥1} x = s^{2n}(0);   q(x) :- ⋁_{n≥1} x = s^{2n-1}(0) },
whereas the interpretation in A'_H returns { p(x) :- x = 0;  q(x) :- x = s(0) }. The meaning of P in A'_H captures the type of the predicates p and q, computing even and odd numbers respectively.
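A minimal Prolog sketch (ours; the predicate name parity_norm/2 is an assumption) of the effect of the added axiom s(s(x)) = x: terms are normalized by cancelling double successors, so every numeral collapses to 0 or s(0), which is exactly the abstraction used by the parity analysis.

% parity_norm(+Term, -Normal): normalize a successor term modulo s(s(x)) = x.
parity_norm(X, X) :- var(X), !.
parity_norm(0, 0).
parity_norm(s(T), N) :-
    parity_norm(T, NT),
    ( NT = s(M) -> N = M ; N = s(NT) ).   % cancel a double successor

% ?- parity_norm(s(s(s(0))), N).        N = s(0)   (odd)
% ?- parity_norm(s(s(s(s(0)))), N).     N = 0      (even)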
A very useful analysis of the relationships among variables of a program can be specified in our framework [Cousot and Halbwachs 78]. The automatic derivation technique in [Verschaetse and De Schreye 91] for linear size relations among variables in logic programs can be suitably specified as a constraint computation. A constraint system of affine relationships (i.e. linear equalities of the form c0 = c1X1 + ... + cnXn) can be defined by specifying intersection, disjunction and cylindrification (restriction) as given in [Verschaetse and De Schreye 91]. Generalizations considering linear inequalities, as proposed in [Cousot and Halbwachs 78], can still be defined in our framework, thus making explicit the strong connection between automatic detection of linear relationships among variables and CLP(ℜ) computations. Applications of this analysis are: compile-time overflow, mutual exclusion, constraint propagation, termination etc. [Jørgensen et al. 91].
4.1 Generalized Rigidity Analysis

There exists a wide class of abstract interpretation techniques for the analysis of ground dependences (also named covering) of pure logic programs [Barbuti et al. 91, Cortesi et al. 91]. In this section we extend the ground dependence notion by means of the notion of rigidity.

A norm is a function weighting terms. Let us recall some basic concepts about norms. For a more accurate treatment of this subject see [Bossi et al. 90].

Definition 4.3 Let T be a term system. A norm on T is a function |·| : T → N, mapping any term t ∈ T into a natural number.

Example 4.3 The following weighting map is a norm on the Herbrand term system: |t|size = 0 if t is a variable or t = [], |t|size = 1 + |tail|size if t = [h|tail].

In order to extend the notion of groundness and ground dependences [Barbuti et al. 91, Cortesi et al. 91] to a more refined one, able to take into account only the relevant subterms of a given (possibly non-ground) term t, we address the notion of rigidity as introduced in [Bossi et al. 90].
Definition 4.4 Let |·| be a norm on the term system T. A term t ∈ T is rigid with respect to |·| iff for any substitution of variables σ: |tσ| = |t|.

The rigidity of terms turns out to be important in simplifying termination proofs. If a term is rigid, its weight will not be modified by further substitutions. Rigidity is then strongly related to groundness. A ground term cannot change its weight by instantiation, thus it is always rigid. This notion allows to identify those subterms which are relevant for the analysis purposes. Notice that given a norm |·| and a non-rigid term t ∈ T, there must exist some variable in t whose instantiation affects the weight of t. In the Herbrand case, results in [Bossi et al. 90] allow to restrict our attention to a particular class of norms: semilinear norms on Herbrand.

Definition 4.5 A norm on T(Σ,V) is semilinear iff it may be defined according to the following structure: |t| = 0 if t is a variable; |t| = c0 + |t_i1| + ... + |t_im| if t = f(t1,...,tn), where c0 ≥ 0 and {i1,...,im} ⊆ {1,...,n}.
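The list-size norm of example 4.3 and the rigidity test of definition 4.4 have a direct executable reading; the Prolog sketch below is ours (the predicate names list_size/2 and size_rigid/1 are assumptions): it computes the size of a possibly non-ground list and checks size-rigidity, which for this norm amounts to the list being nil-terminated.

% list_size(+Term, -Size): |t|size = 0 for a variable or [], 1 + |tail|size otherwise.
list_size(T, 0) :- var(T), !.
list_size([], 0).
list_size([_|Tail], N) :- list_size(Tail, N0), N is N0 + 1.

% size_rigid(+Term): no instantiation can change the list-size weight.
size_rigid(T) :- nonvar(T), size_rigid_nonvar(T).
size_rigid_nonvar([]).
size_rigid_nonvar([_|Tail]) :- size_rigid(Tail).

% ?- list_size([a,b|X], N).   N = 2, but the term is not rigid:
% ?- size_rigid([a,b|X]).     fails (instantiating X can change the size)
% ?- size_rigid([a,b]).       succeeds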
Note that the positions of the subterms which allow the principal term to change its weight by instantiation depend on the outermost term constructor f only. These subterms are then relevant from the analysis viewpoint. All the non-relevant subterms are discarded by the analysis. Semilinear norms allow to reduce the rigidity notion to a syntactical property of terms. Let
Vrel|·|(t) = { v ∈ Var(t) | there exists a substitution σ for v such that |tσ| ≠ |t| }.
As shown in [Bossi et al. 90], given a semilinear norm |·|, a term t ∈ T(Σ,V) is rigid iff Vrel|·|(t) = ∅. The notion of semilinear norms can be generalized to
arbitrary term systems in a straightforward way, as follows: given a term system T, we define a function w : T → N; for each t ∈ T, an associated finite set of functions F_t : t → T; and an associative and commutative function ⊕ : N × N → N.
Intuitively, for any term t, the value of w(t) is the "initial weight" of the term t, the set of functions F_t corresponds to the set of selectors for the "relevant" subterms, and ⊕ indicates how the sizes of the subterms of a term are to be combined. Then, generalized semilinear norms can be defined as follows:
|t| = w(t) + ⊕_{f∈F_t} |f(t)|.

Example 4.4 The "usual" notion of semilinear norms for Herbrand constraint systems can now be generalized as follows; let c0 ∈ N: w(t) = 0 if t is a variable, c0 otherwise; if t is a variable then F_t = ∅; otherwise F_t consists of selectors for the relevant positions of t; ⊕ is summation.
The "depth norm", which could not be expressed as a semilinear norm in the development of [Bossi et al. 90], can be defined as follows: w(t) = 0 if t is a variable, 1 otherwise; if t is a variable then F_t = ∅; otherwise if t = f(t1,...,tn) then F_t = { f_i | 1 ≤ i ≤ n }, where f_i(t) = t_i, i.e. f_i is the selector for the subterm at the i-th position; and ⊕ is max.
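For comparison, a Prolog sketch (ours) of the generalized "depth norm" just described, with w(t) ∈ {0,1}, all argument positions as selectors, and ⊕ = max:

% depth(+Term, -Depth): 0 for a variable, 1 + max of the argument depths otherwise.
depth(T, 0) :- var(T), !.
depth(T, D) :-
    T =.. [_|Args],
    ( Args = []
    -> D = 1                                 % a constant: w(t) = 1, no selectors
    ;  maplist(depth, Args, Ds),
       max_list(Ds, M),
       D is 1 + M ).

% ?- depth(f(g(X), a), D).   D = 2.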
Let us consider the set C(V) of finite conjunctions of variables in V (the empty conjunction is denoted ε) and a term abstraction map α_T : T → C(V) such that, given a semilinear norm |·| and t ∈ T, α_T(t) = x1 ∧ ... ∧ xm where Vrel|·|(t) = {x1,...,xm}. Let T_α be the corresponding abstract term system, where substitutions are performed as usual.
Marriott and Søndergaard have proposed an elegant domain, named Prop, further studied in [Cortesi et al. 91], to represent ground dependences among arguments in atoms. In [Codognet and File 91] an interesting application is introduced. Prop is formalized as a constraint system, and both groundness and definiteness analysis are specified by executing programs in CLP(Bool). The corresponding constraint system does not allow disjunctions of variables, without fully exploiting the expressive power of Prop. The general notion of ground dependence corresponding to an arbitrary Prop formula (including disjunctions) cannot be specified.

Let A_α = (Prop_α, ∨, ∧, T, F, ∃x, t ↔ t')_{x⊆V; t,t'∈T_α} be the algebra of possibly existentially quantified formulas defined on the term system T_α, including the set of connectives ∨, ∧, ↔. Intuitively, the formula x ∧ y ∧ z ↔ w ∧ v represents an equation t = t' where Vrel|·|(t) = {x,y,z} and Vrel|·|(t') = {w,v}; x ∧ y represents a term whose rigidity depends upon variables x and y; while x ∨ y represents a set of terms whose rigidity depends upon variables x or y. Local variables are hidden by existential quantification, projecting away non-global variables in the computation [Codognet and File 91].
Let Bool be a boolean algebraic structure; c ≈_Bool c' iff Bool ⊨ c ↔ c'. It is easy to prove that A_α/≈_Bool is an abstract constraint system.
Example 4.5 Let us consider the semilinear norm "size" and the following constraint logic program on the Herbrand constraint system
append(x1, x2, x3) :- x1 = [] ∧ x2 = x3.
append(x1, x2, x3) :- x1 = [h|y] ∧ x3 = [h|z] □ append(y, x2, z).
The corresponding abstract model is { append(x1, x2, x3) :- (x1 ↔ ε) ∧ (x2 ↔ x3) }, generalizing the standard ground behavior (where Vrel(t) = var(t), and the abstract model is { append(x1, x2, x3) :- x3 ↔ x1 ∧ x2 }) vs. size-rigidity behavior: "the second argument list-size can change iff the third argument does".
5 Machine-level Traces
In this section, we consider an example non-standard semantics for constraint logic programs, that of machine-level traces (for a discussion of similar non-standard semantics in a denotational context, see [Stoy 77]). Such a semantics is essential, for example, if we wish to reason formally about the correctness of a compiler (e.g. see [Hanus 88]) or the behavior of a debugger or profiler. In this section, we show how the semantics described in earlier sections may be instantiated to describe such low-level behaviors. Instead of constrained atoms, where each atom is associated with a constraint, this semantics will associate each atom with a set of machine states (equivalently, instruction sequences) that may be generated on an execution of that atom.

The code generated by a compiler for a constraint language must necessarily depend on both the constraint system and the target machine under consideration. Suppose that each "primitive" constraint op(t1,...,tn) in the language under consideration corresponds to (an instance of) a (virtual) machine instruction op(t1,...,tn). (In an actual implementation, each such virtual machine instruction may, of course, "macro-expand" to a sequence of lower-level machine instructions.) For example, corresponding to a constraint 'X = Y + 5' in the language under consideration, we might have a virtual machine instruction 'eq(X, Y + 5)'. Each such machine instruction defines a transformation on machine states, representing the changes that are performed to the heap, stack, registers, etc. of the machine by the execution of that instruction (e.g., see [Hanus 88] for a discussion of the WAM along these lines). In other words, let S be the set of all possible states of the machine under consideration; then an instruction I denotes a function I : S → S ∪ {fail}, where fail denotes a state where execution has failed.

Given a set S, let S^∞ denote the set of finite and infinite sequences of S. Intuitively, with each execution we want to associate a set of finite and infinite sequences of machine states that might be generated by an OR-parallel interpreter. Thus, we want the universe of our algebra to be 2^{S^∞}, the set of sets of finite and infinite sequences of machine states. One subtlety, however, is that instructions may "fail" at runtime because some constraints may be unsatisfiable. To model this, it is necessary to handle failure explicitly, since "forward" execution cannot continue on failure. To deal with this, we define the notion of concatenation of sequences of machine states as follows: given any two sequences s1 and s2 of states in S ∪ {fail}, their concatenation s1 ⊙ s2 is given by
s1 ⊙ s2 = if s1 contains fail then s1 else concat(s1, s2),
where concat(s1, s2) denotes the "usual" notion of concatenation of finite and countably infinite sequences. Thus, the cylindric closed semiring in this case is (C, ⊗, ⊕, 1, 0, ∃Δ, d_{t,t'})_{Δ⊆V; t,t'∈T} where: C = 2^{(S∪{fail})^∞} is the set of sets of finite and infinite sequences of machine states; for any S1, S2 ∈ C, S1 ⊗ S2 = { s1 ⊙ s2 | s1 ∈ S1, s2 ∈ S2 }; ⊕ = ∪; 1 = {ε}, where ε is the empty sequence; 0 = ∅; ∃Δ corresponds to the function that, given any machine state s, yields the machine state obtained by discarding all information about the variables in Δ; and for any t, t' ∈ T, d_{t,t'} corresponds to the function that, given any machine state s, yields the machine state resulting from constraining t and t' to be equal, and fail if this is not possible.

A simple variation on this semantics is one where failed execution sequences are discarded silently. To obtain such a semantics, it suffices to redefine the operation ⊕ as follows:
S1 ⊕ S2 = { s | s ∈ S1 ∪ S2 and fail is not in s }.
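The two sequence operations just defined have a direct reading on finite sequences; the Prolog sketch below is ours (the predicate names, and the representation of a machine-state sequence as a list possibly containing the atom fail, are assumptions): it implements the failure-propagating concatenation ⊙ and the variant of ⊕ that silently discards failed sequences.

% seq_concat(+S1, +S2, -S): s1 (.) s2 = s1 if s1 contains fail, else their concatenation.
seq_concat(S1, _, S1) :- memberchk(fail, S1), !.
seq_concat(S1, S2, S12) :- append(S1, S2, S12).

% merge_discarding_failures(+Set1, +Set2, -Merged): the modified join,
% keeping only the sequences in S1 union S2 that do not contain fail.
merge_discarding_failures(Set1, Set2, Merged) :-
    append(Set1, Set2, All),
    exclude(contains_fail, All, Merged).

contains_fail(Seq) :- memberchk(fail, Seq).

% ?- seq_concat([s0, fail], [s1], S).                        S = [s0, fail].
% ?- merge_discarding_failures([[s0],[s1,fail]], [[s2]], M).  M = [[s0],[s2]].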
6 Related Work
A related framework is considered in [Codognet and File 91], where an algebraic definition of constraint systems is given. Program analyses based on abstract interpretation techniques are considered, like groundness analysis and definiteness analysis for CLP programs. Only ⊗-composition is considered. The notion of "computation system" is introduced, but it is neither formalized as a specific algebraic structure nor extended with the join operator. In particular, because of the underlying semantics construction, mainly based on a generalization of the top-down SLD semantics, a loop-checker consisting in a "tabled" interpreter is introduced. The use of tabled interpreters allows to keep separate the notion of abstraction from the finiteness required by any static analysis. As a consequence, static analysis can be performed by "running" the program in the standard CLP interpreter with tabulation. In our framework, no tabulation is considered. This makes the semantics construction more general. Finiteness is a specific property of the constraint system (expressed in terms of ⊕-chains), thus allowing to specify non-standard computations as standard CLP computations over an appropriate non-standard constraint system. Both the traditional top-down and bottom-up semantics can then be specified in the standard way, thus allowing the definition of goal-independent static analysis as an abstract fixpoint computation, without loop-checking. If the constraint system is not Noetherian, a widening/narrowing technique [Cousot and Cousot 91] can be applied in the fixpoint computation to get a finite approximation of the fixpoint.

In a related paper, Marriott and Søndergaard consider abstract interpretation of CLP. A metalanguage is defined to specify, in a denotational style, the semantics of logic languages. Abstract interpretation is performed by abstracting such a semantics [Marriott and Søndergaard 90]. In this framework, both standard and non-standard semantics are viewed as instances of the metalanguage specification.
Acknowledgment
The stimulating discussions with Maurizio Gabbrielli, Michael Maher and Nino Salibra are gratefully acknowledged.
References
[Aho et al. 74] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Addison Wesley Publishing Company, 1974.
[Barbuti et al. 92] R. Barbuti, M. Codish, R. Giacobazzi,
and G. Levi. Modelling Prolog Control. In Proc. Nineteenth Annual ACM Symp. on Principles of Programming Languages, pages 95-104, 1992.
[Barbuti et al. 91] R. Barbuti, R. Giacobazzi, and
G. Levi. A General Framework for Semantics-based
Bottom-up Abstract Interpretation of Logic Programs.
Technical Report TR 12/91, Dipartimento di Informatica, Università di Pisa, 1991. To appear in ACM Transactions on Programming Languages and Systems.
[Barbuti and Martelli 83] R. Barbuti and A. Martelli. A
Structured Approach to Semantics Correctness. "Science of Computer Programming", 3:279-311, 1983.
[Bossi et al. 90] A. Bossi, N. Cocco, and M. Fabris. Proving Termination of Logic Programs by Exploiting Term
Properties. In S. Abramsky and T. Maibaum, editors, Proc. TAPSOFT'91, volume 494 of Lecture Notes
in Computer Science, pages 153-180. Springer-Verlag,
Berlin, 1991.
[Cirulis 88] J. Cirulis. An Algebraization of First Order
Logic with Terms. Colloquia Mathematica Societatis
János Bolyai, 54, 1991.
[Codognet and File 91] P. Codognet and G. File. Computations, Abstractions and Constraints. Technical
Report 13, Dipartimento di Matematica Pura e Applicata, Universita di Padova, Italy, 1991.
[Cortesi et al. 91] A. Cortesi, G. File, and W. Winsborough. Prop revisited: Propositional Formulas as Abstract Domain for Groundness Analysis. In Proc. Sixth
IEEE Symp. on Logic in Computer Science, pages 322-327. IEEE Computer Society Press, 1991.
[Cousot and Cousot 77] P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for
Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proc. Fourth ACM Symp.
Principles of Programming Languages, pages 238-252,
1977.
[Cousot and Halbwachs 78] P. Cousot and N. Halbwachs.
Automatic Discovery of Linear Restraints Among Variables of a Program. In Proc. Fifth ACM Symp. Principles of Programming Languages, pages 84-96, 1978.
[Cousot and Cousot 79] P. Cousot and R. Cousot. Systematic Design of Program Analysis Frameworks. In
Proc. Sixth ACM Symp. Principles of Programming
Languages, pages 269-282, 1979.
[Cousot and Cousot 91] P. Cousot and R. Cousot.
Comparing the Galois Connection and Widening/Narrowing Approaches to Abstract Interpretation.
Preliminary draft, ICLP'91 Pre-conference workshop,
Paris, 1991.
[Debray and Ramakrishnan 91] S. Debray and R. Ramakrishnan.
Generalized Horn Clause Programs.
Technical report, Dept. of Computer Science, The University of Arizona, 1991.
[Falaschi et al. 89] M. Falaschi, G. Levi, M. Martelli, and
C. Palamidessi. Declarative Modeling of the Operational Behavior of Logic Languages. Theoretical Computer Science, 69(3):289-318, 1989.
[Gabbrielli and Levi 91] M. Gabbrielli and G. Levi.
Modeling Answer Constraints in Constraint Logic Programs. In K. Furukawa, editor, Proc. Eighth Int'l Conf.
on Logic Programming, pages 238-252. The MIT Press,
Cambridge, Mass., 1991.
[Hanus 88] M. Hanus. Formal Specification of a Prolog Compiler.
In P. Deransart, B. Lorho, and
J. Maluszynski, editors, Proc. International Workshop
on Programming Languages Implementation and Logic
Programming, volume 348 of Lecture Notes in Computer Science, pages 273-282. Springer-Verlag, Berlin,
1988.
[Henkin et al. 85] L. Henkin, J.D. Monk, and A. Tarski.
Cylindric Algebras. Part I and II. North-Holland, Amsterdam, 1971. (Second edition 1985)
[Jaffar and Lassez 87] J. Jaffar and J.-L. Lassez. Constraint Logic Programming. In Proc. Fourteenth Annual ACM Symp. on Principles of Programming Languages, pages 111-119, 1987.
[J0rgensen et al. 91] N. J0rgensen, K. Marriott, and
S. Michaylov. Some Global Compile-Time Optimizations for CLP(R). Technical report, Department of
Computer Science, Monash University, 1991.
[Kemp and Ringwood 90] R. Kemp and G. Ringwood.
An Algebraic Framework for the Abstract Interpretation of Logic Programs. In S. Debray and
M. Hermenegildo, editors, Proc. North American Conf.
on Logic Programming'90, pages 506-520. The MIT
Press, Cambridge, Mass., 1990.
[Marriott and Søndergaard 90] K. Marriott and H. Søndergaard. Analysis of Constraint Logic Programs. In
S. Debray and M. Hermenegildo, editors, Proc. North
American Conf. on Logic Programming'90, pages 531-547. The MIT Press, Cambridge, Mass., 1990.
[Saraswat et al. 91] V.A. Saraswat, M. Rinard, and
P. Panangaden. Semantic foundation of concurrent
constraint programming. In Proc. Eighteenth Annual
ACM Symp. on Principles of Programming Languages,
1991.
[Scott 82] D. Scott. Domains for Denotational Semantics. In Proc. ICALP, volume 140 of Lecture Notes in
Computer Science. Springer-Verlag, Berlin, 1982.
[Stoy 77] J.E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory.
MIT Press, 1977.
[Verschaetse and De Schreye 91] K. Verschaetse and
D. De Schreye. Automatic Derivation of Linear
Size Relations. Technical report, Dept. of Computer
Science, K.U. Leuven, 1991.
Extended Well-Founded Semantics for Paraconsistent Logic Programs
Chiaki Sakama
ASTEM Research Institute
17 Chudoji Minami-machi, Shimogyo, Kyoto 600 Japan
sakama@astem.or.jp
Abstract
This' paper presents a declarative semantics of logic
programs which possibly contain inconsistent information. We introduce a multi-valued interpretation of logic
programs and present the extended well-founded semantics for paraconsistent logic programs. In this setting, a
meaningful information is still available in the presence
of an inconsistent information in a program and an.y fact
which is affected by an inconsistent information is distinguished from the others. The well-founded semantics
is also extended to disjunctive paraconsistent logic programs.
1 Introduction
Recent studies have greatly enriched the expressive power of logic programming as a tool for knowledge representation. Handling classical negation as well as negation by failure in a program is one such extension. An extended logic program, introduced by Gelfond and Lifschitz [GL90], distinguishes these two types of negation and enables us to deal with explicit negation as well as default negation in a program. An extended logic program is, however, possibly inconsistent in general, since it contains negative heads as well as positive ones in program clauses. Practically, an inconsistency is likely to happen when we build a large-scale knowledge base in such a logic program. A knowledge base may contain local inconsistencies that would make a program contradictory, and yet it may have a natural intended global meaning. However, in an inconsistent program, the answer set semantics proposed in [GL90] implies every formula from the program. This is also the case for most traditional logics, in which a piece of inconsistent information might spoil the rest of the whole knowledge base.
To avoid such a situation, the so-called paraconsistent logics have been developed, which are not destructive in the presence of inconsistent information [Co74]. From the point of view of logic programming, a possibly inconsistent logic program is called a paraconsistent logic program. Blair and Subrahmanian [BS87] first developed a fixpoint semantics of such programs by using Belnap's four-valued logic [Be75]. Recent studies such as [KL89, Fi89, Fi91] have also developed logics for possibly inconsistent logic programs and provided frameworks for reasoning with inconsistency. However, from the point of view of logic programming, negation in these approaches is classical in nature, and the treatment of default negation as well as classical negation in paraconsistent logic programming is still left open.
In this paper, we present a framework for paraconsistent logic programming in which classical and default negation are distinguished. The rest of this paper is organized as follows. In section 2, we first present an application of Ginsberg's lattice-valued logic to logic programming and provide a declarative semantics of paraconsistent logic programs by extending the well-founded semantics of general logic programs. Then we show how the extended well-founded semantics isolates inconsistent information and distinguishes meaningful information from others in a program. In section 3, the well-founded semantics is also extended to paraconsistent disjunctive logic programs.
2 Well-Founded Semantics for Paraconsistent Logic Programs

2.1 Multi-valued Logic
Multi-valued Logic
To present the semantics of possibly inconsistent logic
programs, multi-valued logics are often used instead of
the traditional two-valued logic. Among them, Belnap's four-valued logic [Be75] is well-known and several researchers have employed this logic to give the semantics of paraconsistent logic programs [BS87, KL89,
Fi89, Fi91]. In Belnap's logic, truth values consist of
{t, f, T, 1..} in which each element respectively denotes
true, false, contradictory, and undefined. Each element
makes a complete lattice under a partial ordering defined
over these truth values (figure 1).
To represent nonmonotoriic aspect of logic programming, however, we need extra truth values which represent default assumption. Such a logic is firstly introduced by Ginsberg [Gi86] in the context of bilattice for
Figure 1. Four-valued logic
Figure 2. The logic VII
default logic. We use this logic to give the semantics of
paraconsistent logic programs. 1
¹[KL89] has also suggested the extensibility of their logic for handling defaults by using Ginsberg's lattice-valued logic.

The set VII = {t, f, dt, df, *, ⊤, ⊥} is the space of truth values in our seven-valued logic. Here, the additional elements dt, df, and * are read as true by default, false by default, and don't-care by default, respectively. The elements of VII form a complete lattice under the ordering ⪯ such that: ∀x ∈ VII, x ⪯ x and ⊥ ⪯ x ⪯ ⊤; and for x ∈ {t, f}, dx ⪯ * ⪯ x (figure 2).

A program is a (possibly infinite) set of clauses of the form:
A ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn
where m, n ≥ 0, each A, Bi (1 ≤ i ≤ m) and Cj (1 ≤ j ≤ n) are literals, and all the variables are assumed to be universally quantified at the front of the clause. In a program, two types of negation are distinguished; hereafter, ¬ denotes a monotonic classical negation, while not denotes a nonmonotonic default negation. A ground clause (resp. program) is a clause (resp. program) in which every variable is instantiated by the elements of the Herbrand universe of the program. Such an instantiation is called a Herbrand instantiation of a clause (resp. program).

An interpretation I of a program is a function I : HB → VII, where HB is the Herbrand base of the program. (Throughout this paper, HB denotes the Herbrand base of a program.)

A formula is defined as usual: (i) any literal L or ¬L is a formula, (ii) for any literal L, notL and not¬L are formulas, and (iii) for any formulas F and G, ∀F, ∃F, F ∨ G, F ∧ G and F ⊃ G are all formulas. A formula is closed if it contains no free variable. Satisfaction of a formula is defined as follows.

Definition 2.1 Let P be a program and I be its interpretation. Suppose I ⊨ F denotes that I satisfies a formula F. Then:
1. For any atom A ∈ HB,
(a) I ⊨ A if t ⪯ I(A),
(b) I ⊨ ¬A if f ⪯ I(A),
(c) I ⊨ notA if df ⪯ I(A) ⪯ *,
(d) I ⊨ not¬A if dt ⪯ I(A) ⪯ *.
2. For any closed formula ∃F (resp. ∀F), I ⊨ ∃F (resp. I ⊨ ∀F) if I ⊨ F' for some (resp. every) Herbrand instantiation F' of F.
3. For closed formulas F and G,
(a) I ⊨ F ∨ G if I ⊨ F or I ⊨ G,
(b) I ⊨ F ∧ G if I ⊨ F and I ⊨ G,
(c) I ⊨ F ⊃ G if I ⊭ F or I ⊨ G. □
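For illustration only (the encoding and names are ours, derived from the ordering stated above), the knowledge ordering ⪯ on VII and the satisfaction clauses 1(a)-(c) of Definition 2.1 can be sketched in Python as follows:

# Covering relations assumed from the text: bot ⪯ dt, df; dt, df ⪯ *; * ⪯ t, f; t, f ⪯ top.
COVERS = [("bot", "dt"), ("bot", "df"), ("dt", "*"), ("df", "*"),
          ("*", "t"), ("*", "f"), ("t", "top"), ("f", "top")]

def leq(x, y, covers=COVERS):
    # x ⪯ y: reflexive-transitive closure of the covering relations
    if x == y:
        return True
    return any(a == x and leq(b, y, covers) for a, b in covers)

# With an interpretation I given as a dict from atoms to truth values:
#   I ⊨ A      iff leq("t",  I[A])                       -- clause 1(a)
#   I ⊨ ¬A     iff leq("f",  I[A])                       -- clause 1(b)
#   I ⊨ not A  iff leq("df", I[A]) and leq(I[A], "*")    -- clause 1(c)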
The ordering ⪯ on truth values is also defined between interpretations. For interpretations I1 and I2, I1 ⪯ I2 iff ∀A ∈ HB, I1(A) ⪯ I2(A). An interpretation I is called minimal if there is no interpretation J such that J ≠ I and J ⪯ I. An interpretation I is called least if I ⪯ J for every interpretation J.
An interpretation I is called a model of a program if every clause in the program is satisfied in I. Note that in our logic, the notion of model is also defined for an inconsistent set of formulas. For example, a program {p, ¬p} has a model I such that I(p) = ⊤. Especially, an interpretation I of a program is called consistent if for every atom A in HB, I(A) ≠ ⊤. A program is called consistent if it has a consistent model.
2.2 Extended Well-Founded Semantics
The well-founded semantics is known as one of the most powerful semantics which is defined for every general logic program [VRS88, Pr89]. The well-founded semantics has also been extended to programs with classical negation in [Pr90]; however, it is not well-defined for inconsistent programs, in which inconsistent models are all thrown away. In this section, we reformulate the well-founded semantics for possibly inconsistent logic programs.
To compute the well-founded model, we first present
an interpretation of a program by a pair of sets of ground
literals.
Definition 2.2 For a program P, a pair of sets of ground literals I = <σ; δ> presents an interpretation of P in which each literal in I is interpreted as follows:
For a positive literal L,
(i) if L (resp. ¬L) is in σ, L is true (resp. false) in I;
(ii) else if L (resp. ¬L) is in δ, L is false by default (resp. true by default) in I;
(iii) otherwise, if neither L nor ¬L is in σ nor δ, L is undefined.
Especially, if both L and ¬L are in σ (resp. δ), L is contradictory (resp. don't-care by default) in I. □
Intuitively, σ presents proven facts while δ presents default facts, and the interpretation of a fact is defined by the least upper bound of its truth values in the pair.
Now we extend the constructive definition of the well-founded semantics for general logic programs [Pr89] to paraconsistent logic programs.
Definition 2.3 Let P be a program and I = <σ; δ> be an interpretation of P. For sets T and F of ground literals, the mappings Φ_I and Ψ_I are defined as follows:
Φ_I(T) = {A | there is a ground clause A ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn from P s.t. ∀Bi (1 ≤ i ≤ m) Bi ∈ σ ∪ T and ∀Cj (1 ≤ j ≤ n) Cj ∈ δ},
Ψ_I(F) = {A | for every ground clause A ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn from P, either ∃Bi (1 ≤ i ≤ m) s.t. Bi ∈ δ ∪ F or ∃Cj (1 ≤ j ≤ n) s.t. Cj ∈ σ}. □
Definition 2.4 Let I be an interpretation. Then,
T_I↑0 = ∅ and F_I↓0 = HB ∪ ¬HB (where ¬HB = {¬A | A ∈ HB});
T_I↑n+1 = Φ_I(T_I↑n) and F_I↓n+1 = Ψ_I(F_I↓n);
T_I = ∪_{n<ω} T_I↑n and F_I = ∩_{n<ω} F_I↓n, obtained by iterating Φ_I and Ψ_I, respectively. □

Definition 2.5 For every interpretation I, an operator Θ is defined by:
Θ(I) = I ∪ <T_I; F_I>;
I↑0 = <∅; ∅>;
I↑n+1 = Θ(I↑n);
Mp = ∪_{n<ω} I↑n. □

Lemma 2.1 Mp is the least fixpoint of the monotonic operator Θ and is also a model of P. □
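For concreteness, the following Python sketch (not part of the paper; all names and encodings are illustrative) computes Mp for a finite ground program by iterating Φ_I, Ψ_I and Θ exactly as in Definitions 2.3-2.5. Classical negation is written as a leading "~", and a clause A ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn is encoded as the tuple (A, [B1,...,Bm], [C1,...,Cn]):

def neg(l):                      # classical complement of a literal
    return l[1:] if l.startswith("~") else "~" + l

def ewf_model(clauses, atoms):
    lits = set(atoms) | {neg(a) for a in atoms}   # HB ∪ ¬HB
    sigma, delta = set(), set()                   # proven facts / default facts

    def phi(T):      # one application of Phi_I
        return {h for (h, pos, dfl) in clauses
                if all(b in sigma | T for b in pos)
                and all(c in delta for c in dfl)}

    def psi(F):      # one application of Psi_I
        return {l for l in lits
                if all(any(b in delta | F for b in pos) or
                       any(c in sigma for c in dfl)
                       for (h, pos, dfl) in clauses if h == l)}

    while True:      # iterate Theta from <∅; ∅> up to the least fixpoint
        T = set()
        while phi(T) - T:
            T |= phi(T)          # T_I as the limit of T_I↑n
        F = set(lits)
        while F - psi(F):
            F &= psi(F)          # F_I as the limit of F_I↓n
        if T <= sigma and F <= delta:
            return sigma, delta  # the pair <σ; δ> of Definition 2.2
        sigma |= T
        delta |= F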
By definition, Mp is uniquely defined for every paraconsistent logic program. We call such an Mp the extended well-founded model of a program, and the meaning of a program represented by such a model is called the extended well-founded semantics of the program.
Note that the original fixpoint definition of the well-founded semantics in [Pr89] is three-valued and defined for general logic programs, while our extended well-founded semantics is seven-valued and defined for extended logic programs. Compared with the three-valued well-founded semantics, the extended well-founded semantics handles positive and negative literals symmetrically during the computation of the fixpoint. Further, the extended well-founded model is the least fixpoint of a program under the ordering ⪯, while the three-valued well-founded model is the least fixpoint with respect to the ordering f < ⊥ < t, which is basically different from ⪯.²
Example 2.1 (barber's paradox) Consider the following program:
shave(b, X) ← not shave(X, X)
Then shave(b, b) is undefined under the three-valued well-founded semantics, while Mp = <∅; {¬shave(b,b)}>, so shave(b,b) is true by default under the extended well-founded semantics. In other words, the extended well-founded semantics assumes the fact 'the barber shaves himself' without conflicting with the clause in the program. □
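Assuming the ewf_model sketch given above, Example 2.1 can be reproduced directly (the string encoding of shave(b,b) is, again, purely illustrative):

clauses = [("shave(b,b)", [], ["shave(b,b)"])]   # shave(b,X) <- not shave(X,X), ground over b
sigma, delta = ewf_model(clauses, ["shave(b,b)"])
print(sigma, delta)   # set()  {'~shave(b,b)'}  -> shave(b,b) is true by default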
Also it should be noted that the extended well-founded model is the least fixpoint of a program, but not necessarily the least model of the program in general.
Example 2.2 Let P = {¬p ← not p, ¬q ← ¬p, q ←}. Then Mp = <{¬p, q, ¬q}; {p}> and the truth value of each predicate is {p → f, q → ⊤}, while the least model assigns truth values such as {p → ⊥, q → t}. □
In fact, the above least model is not a fixpoint of the program. In this sense, our extended well-founded semantics is different from the least fixpoint model semantics of [BS87] (even for a program without nonmonotonic negation). The difference is due to the fact that in their least fixpoint model semantics each fact which cannot be proved in a program is assumed to be undefined, while it possibly has a default value under the extended well-founded semantics. The above example also suggests that for a consistent program P, Mp is not always consistent.
²This point is also remarked in [Pr89, Pr90]. In terms of the bilattice-valued logic [Gi86, Fi91], the ordering < is called a truth ordering, while the ordering ⪯ is called a knowledge ordering.
The extended well-founded semantics is also different from Fitting's bilattice-valued semantics [Fi89, Fi91].
Example 2.3 Let P = {p ← q, p ← ¬q, q ←}. Then, as is pointed out in [Su90], p is unexpectedly contradictory under Fitting's semantics, while Mp = <{p, q}; {¬p, ¬q}>, so both p and q are true under the extended well-founded semantics. □
Now we examine the behavior of the extended well-founded semantics more carefully in the presence of inconsistent information.
Example 2.4 Let P be the following program:
innocent ← ¬guilty
¬guilty ← charged ∧ not guilty
charged ←
Then Mp is <{charged, innocent, ¬guilty}; {guilty, ¬innocent, ¬charged}>. The truth values of charged and innocent are true, while guilty is false. □
In the above example, when we consider the program P' = P ∪ {¬innocent ←}, the truth value of innocent turns contradictory, while the truth values of charged and guilty are unchanged. That is, meaningful information is still available from the inconsistent program.
On the other hand, when we consider the program P'' = P ∪ {¬charged ←, man ←}, the truth value of charged is now contradictory, while man and innocent are true and guilty is false. Carefully observing this result, however, the truth of innocent is now less credible than the truth of man, since innocent is derived from the fact ¬guilty, which is now supported by the inconsistent fact charged in the program.
Such a situation also happens in Blair and Subrahmanian's fixpoint semantics [BS87], in which a true fact is not distinguished even if it is supported by an inconsistent fact in a program. In the next section, we refine the extended well-founded semantics to distinguish such suspicious true facts from others.
2.3 Reasoning with Inconsistency
When a program contains inconsistent information, it is important to detect the facts affected by such information and distinguish them from other meaningful information in the program. In this section, we present such skeptical reasoning under the extended well-founded semantics.
First we introduce one additional notation. For a program P and each literal L from HB, L^Γ is called a suffixed literal, where Γ is a collection of sets of ground literals (possibly preceded by not). Informally speaking, each element in Γ presents a set of facts which are used to derive L in P (it is defined more precisely below). An interpretation of such a suffixed literal L^Γ is supposed to be the same as the interpretation of L.
Definition 2.6 Let P be a program and I = <σ; δ> be an interpretation in which σ (resp. δ) is a set of suffixed literals (resp. a set of ground literals). For a set T (resp. F) of suffixed literals (resp. ground literals), the mappings Φ'_I and Ψ'_I are defined as follows:
Φ'_I(T) = {A^Γ | there are k ground clauses A ← B_l1 ∧ ... ∧ B_lm ∧ notC_l1 ∧ ... ∧ notC_ln (1 ≤ l ≤ k) from P s.t. ∀B_li (1 ≤ i ≤ m) B_li^{γ_li} ∈ σ ∪ T and ∀C_lj (1 ≤ j ≤ n) C_lj ∈ δ, and Γ = ∪_l { {B_l1, ..., B_lm, notC_l1, ..., notC_ln} ∪ γ_l1 ∪ ... ∪ γ_lm | γ_li ∈ Γ_li } },
Ψ'_I(F) = {A | for every ground clause A ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn from P, either ∃Bi (1 ≤ i ≤ m) s.t. Bi ∈ δ ∪ F or ∃Cj (1 ≤ j ≤ n) s.t. Cj ∈ σ}. □
The least fixpoint M'_P of a program is defined similarly, by using the mappings Φ'_I and Ψ'_I instead of Φ_I and Ψ_I in the previous section. Clearly, M'_P is also a model of P, and we call such an M'_P the suspicious well-founded model.
Example 2.5 Let P = {p ← q ∧ not r, p ← ¬r, q ← s, ¬r ←, s ←}. Then, M'_P = <{p^{{q, s, not r}, {¬r}}, q^{{s}}, ¬r^{{∅}}, s^{{∅}}}; {¬p, ¬q, r, ¬s}>. □
Definition 2.7 Let P be a program and M'_P be its suspicious well-founded model. For a suffixed literal L^Γ in M'_P, if every set in Γ contains a literal L' or ¬L' such that L' is contradictory in M'_P, L is called suspicious. □
We consider a proven fact to be suspicious if every proof of the fact includes inconsistent information. In other words, if there is at least one proof of a fact which contains no inconsistent information, we do not consider such a fact to be suspicious. A proven fact which is not suspicious is called sure.
Note that we do not consider any fact derived from true by default or false by default information to be suspicious, since such don't-care information just indicates that both the positive and negative facts failed to be proven in a program and does not present any inconsistency by itself.
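A minimal sketch of the test in Definition 2.7, assuming the suffix Γ of a proven literal is given as a collection of derivation sets (strings, with "not " prefixes for default literals and "~" for classical negation) and the contradictory atoms of M'_P are already known; all names are illustrative:

def atom_of(l):
    return l[1:] if l.startswith("~") else l      # strip classical negation

def is_suspicious(gamma, contradictory_atoms):
    # suspicious: every derivation set mentions a (non-default) literal whose
    # atom is contradictory in the suspicious well-founded model
    def mentions_contradiction(deriv):
        return any(atom_of(l) in contradictory_atoms
                   for l in deriv if not l.startswith("not "))
    return all(mentions_contradiction(deriv) for deriv in gamma)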
The following lemma states that a fact which is derived using a suspicious fact is also suspicious.
Lemma 2.2 Let P be a program and L^Γ be a suffixed literal in M'_P. If each set in Γ contains a suspicious fact, then the truth value of L is also suspicious.
Proof Suppose that each set γ in Γ contains a suspicious fact A. Then A has its own derivation histories Γ' such that each γ' in Γ' contains a literal which is contradictory in M'_P. By definition, γ' ⊆ γ, so γ also contains the contradictory literal. □

Now reasoning under the suspicious well-founded semantics is defined as follows.
Definition 2.8 Let P be a program and M'_P be its suspicious well-founded model. Then, for each atom A such that A^Γ (resp. ¬A^Γ) is in M'_P, A is called true with suspect (resp. false with suspect) if A (resp. ¬A) is suspicious and ¬A (resp. A) is not sure in M'_P.
On the contrary, if A (resp. ¬A) is suspicious but ¬A (resp. A) is sure in M'_P, then A is false (resp. true) in M'_P without suspect. □

Especially, if A is both true and false with suspect, A is contradictory with suspect.

Example 2.6 Let P be the following program:
innocent ← ¬guilty
¬guilty ← charged ∧ not guilty
charged ←
¬charged ←
man ←
where M'_P is <{charged^{{∅}}, ¬charged^{{∅}}, man^{{∅}}, innocent^{{¬guilty, charged, not guilty}}, ¬guilty^{{charged, not guilty}}}; {guilty, ¬innocent, ¬man}>. Then, man is true, charged is contradictory, while innocent and guilty are true with suspect and false with suspect, respectively. □

In the above example, if a new fact guilty is added to P, this fact now holds for sure, and guilty becomes true without suspect.

2.4 Related Work

Alternative approaches to paraconsistent logic programming based upon the stable model semantics [GL88] have recently been proposed in [PR91, GS92a]. These approaches improve the result of [GL90] in the sense that stable models are well-defined for inconsistent programs. However, these semantics still inherit the problem of the stable model semantics: there exist programs which have no stable model and yet contain meaningful information. For example, the program {p ←, q ← not q} has no stable model, while it has an (extended) well-founded model in which p is true. Wagner [Wa91] has also introduced a logic for possibly inconsistent logic programs with two kinds of negation. His logic is paraconsistent and not destructive in the presence of inconsistent information, but it is still restricted and different from our lattice-valued logic.

Several studies have also been done from the standpoint of contradiction removal in extended logic programs. Kowalski and Sadri [KS90] have extended the answer set semantics of [GL90] to inconsistent programs by giving higher priorities to negative conclusions in a program. This solution is rather ad hoc and is also easily simulated in our framework by giving higher priorities to negative facts in a program. Other approaches such as [PAA91] and [DR91] consider removing contradictions brought about by default assumptions. For instance, consider the program {p ← not q, ¬p ← r, r}. This program has an inconsistent well-founded model; however, it often seems reasonable to prefer the fact ¬p to p, since p is derived by the default assumption not q, while its negative counterpart ¬p is derived from the proven fact r. They then present program transformations for taking back such a default assumption to generate a consistent well-founded model. In our framework, such a distinction is also achieved as follows. Consider the suspicious well-founded model of the program, <{p^{{not q}}, ¬p^{{r}}, r^{{∅}}}; {q, ¬q, ¬r}>, where the fact p has a default fact in its derivation history while ¬p does not; then we can prefer the fact ¬p as the more reliable one. These approaches [PAA91, DR91] further discuss contradiction removal in the context of belief revision or an abductive framework, but from the point of view of paraconsistent logic programming, they provide no solution for an inconsistent program such as {p, ¬p, q}. Other approaches in this direction are [In91, GS92b], in which the meaning of an inconsistent program is taken to be a collection of maximally consistent subsets of the program.

3 Extension to Disjunctive Programs
The semantics of logic programs has recently been extended to disjunctive logic programs, which contain incomplete information in a program. The well-founded semantics has also been extended to disjunctive logic programs by several authors [Ro89, BLM90, Pr90]. In paraconsistent logic programming, [Su90] has also extended the fixpoint semantics of [BS87] to paraconsistent disjunctive logic programs. In this section, we present the extended well-founded semantics for paraconsistent disjunctive logic programs.
A disjunctive program is a (possibly infinite) set of clauses of the form:
A1 ∨ ... ∨ Al ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn
where l > 0, m, n ≥ 0, each Ai, Bj and Ck are literals, and all the variables are assumed to be universally quantified at the front of the clause. The notion of a ground clause (program) is also defined in the same way as in the previous section. Hereafter, we use the term normal program to distinguish a program which contains no disjunctive clause.
As in [Sa89], we consider the meaning of a disjunctive
program by a set of its split programs.
Definition 3.1 Let P be a disjunctive program and G be a ground clause from P of the form:
A1 ∨ ... ∨ Al ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn   (l ≥ 2)
Then G is split into 2^l − 1 sets of clauses G1, ..., G_{2^l−1} such that for each non-empty subset Si of {A1, ..., Al}:
Gi = {Aj ← B1 ∧ ... ∧ Bm ∧ notC1 ∧ ... ∧ notCn | Aj ∈ Si}.
A split program of P is a ground normal program which is obtained from P by replacing each disjunctive clause G with its split clauses Gi. □
Example 3.1 Let P = {p ∨ ¬q ← not r, s ← p, s ← ¬q}. Then there are three split programs of P:
P1 = {p ← not r, s ← p, s ← ¬q},
P2 = {¬q ← not r, s ← p, s ← ¬q},
P3 = {p ← not r, ¬q ← not r, s ← p, s ← ¬q}. □
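The splitting step of Definition 3.1 is easy to make concrete; the following sketch (illustrative only, not from the paper) enumerates the 2^l − 1 head choices of a ground disjunctive clause, reproducing the three choices behind P1, P2 and P3 of Example 3.1:

from itertools import combinations

def split_clause(heads, body_pos, body_not):
    # one clause set per non-empty subset of the head disjuncts
    for k in range(1, len(heads) + 1):
        for subset in combinations(heads, k):
            yield [(h, body_pos, body_not) for h in subset]

# Example 3.1: p ∨ ¬q <- not r gives the head choices {p}, {¬q} and {p, ¬q};
# joining each with the normal clauses s <- p and s <- ¬q gives P1, P2, P3.
for g in split_clause(["p", "~q"], [], ["r"]):
    print(g)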
Intuitively, each split program presents a possible world of the original program in which each disjunction is interpreted in either an exclusive or an inclusive way. The following lemma holds from the definition.
Lemma 3.1 Let P be a disjunctive program and P' be one of its split programs. If I is a model of P', then I is also a model of P. □
The extended well-founded models of a disjunctive program are defined by those of its split programs.
Definition 3.2 Let P be a disjunctive program.
Then Mp is called the extended well-founded model of P
if Mp is the extended well-founded model of some split
program of P.
0
Clearly, the above definition reduces to the extended
well-founded model of a normal program in the absence
of disjunctive clauses in a program.
A disjunctive program has multiple extended well-founded models in general, and each atom possibly has a different truth value in each model. In classical two-valued logic programming, a ground atom is usually assumed to be true (resp. false) if it is true (resp. false) in every minimal model of a program. In our multi-valued setting, we define the interpretation of an atom under the extended well-founded semantics as follows.
Definition 3.3 Let P be a disjunctive program, M_P^1, ..., M_P^n be its extended well-founded models, and M_P^i(A) (i = 1, ..., n) be the truth value of an atom A in M_P^i. Then an atom A in P has a truth value μ under the extended well-founded semantics if M_P^1(A) = ... = M_P^n(A) = μ. □
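Stated operationally, Definition 3.3 is a simple agreement test over the models; a sketch (illustrative only, with models given as functions from atoms to truth values):

def ewf_value(models, atom):
    # the atom has a truth value only when every extended well-founded
    # model assigns it the same one; None means "not uniquely determined"
    values = {m(atom) for m in models}
    return values.pop() if len(values) == 1 else None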
Example 3.2 For the program P in Example 3.1, there are three extended well-founded models: M_P^1 = <{p, s}; {¬p, q, ¬q, r, ¬r, ¬s}>, M_P^2 = <{¬q, s}; {p, ¬p, q, r, ¬r, ¬s}>, and M_P^3 = <{p, ¬q, s}; {¬p, q, r, ¬r, ¬s}>. Then s is true and r is don't-care by default in P under the extended well-founded semantics, while the truth values of p and q are not uniquely determined. □
When a program has inconsistent models as well as
consistent ones, however, it seems natural to prefer consistent models and consider truth values in such models.
Example 3.3 Let P = {p ←, ¬p ∨ q ←}. Then the extended well-founded models of P are M_P^1 = <{p, ¬p}; {q, ¬q}>, M_P^2 = <{p, q}; {¬p, ¬q}>, and M_P^3 = <{p, ¬p, q}; {¬q}>, where only M_P^2 is consistent. □
In the above example, a rational reasoner seems to prefer the consistent model M_P^2 to M_P^1 and M_P^3, and interprets both p and q to be true. The extended well-founded semantics for such a reasoner is defined below.
Definition 3.4 Let P be a disjunctive program such that M_P^1, ..., M_P^n (n ≠ 0) are its consistent extended well-founded models. Then an atom A in P has a truth value μ under the rational extended well-founded semantics if M_P^1(A) = ... = M_P^n(A) = μ. □
Lemma 3.2 Let P be a disjunctive program such that it has at least one consistent extended well-founded model. If an atom A has a truth value μ under the extended well-founded semantics, then A also has the truth value μ under the rational extended well-founded semantics, but not vice versa. □
The suspicious well-founded semantics presented in
section 2.3 is also extensible to disjunctive programs in
a similar way.
4 Concluding Remarks
In this paper, we have presented the extended well-founded semantics for paraconsistent logic programs. Under the extended well-founded semantics, contradictory information is localized and meaningful information is still available in an inconsistent program. Moreover, a suspicious fact which is affected by inconsistent information can be distinguished from others by skeptical well-founded reasoning. The extended well-founded semantics proposed in this paper is a natural extension of the three-valued well-founded semantics, and it is well-defined for every possibly inconsistent extended logic program. Compared with other paraconsistent logics, it treats both classical and default negation in a uniform way and can also simply be extended to disjunctive paraconsistent logic programs.
This paper has centered on a declarative semantics of paraconsistent logic programs, but a proof procedure for the extended well-founded semantics is achieved in a straightforward way as an extension of the SLS-procedure [Pr89]. That is, each fact which is true/false in a program has a successful SLS-derivation in the program, while a default fact in a program has a failed derivation. A fact which is inconsistent in a program has a successful derivation from both its positive and negative goals. The proof procedure for the suspicious well-founded semantics is also achieved by checking the consistency of each literal appearing in a successful derivation. These procedures are sound and complete with respect to the extended well-founded semantics and also computationally feasible.
Acknowledgments I would like to thank V. S.
Subrahmanian and John Grant for useful correspondence
on the subject of this paper.
References
[Be75] Belnap, N. D., A Useful Four-Valued Logic, in
Modern Uses of Multiple- Valued Logic, J. M. Dunn
and G. Epstein (eds.), Reidel Publishing, 8-37, 1975.
[BLM90] Baral, C., Lobo, J. and Minker, J., Generalized Disjunctive Well-Founded Semantics for Logic
Programs, CS-TR-2436, Univ. of Maryland, 1990.
[BS87] Blair, H. A. and Subrahmanian, V. S., Paraconsistent Logic Programming, Proc. Conf. on
Foundations of Software Technology and Theoretical Computer Science (LNCS 287), 340-360, 1987.
[Co74] Costa, N. C. A. da, On the Theory of Inconsistent
Formal Systems, Notre Dame J. of Formal Logic 15,
497-510, 1974.
[DR91] Dung, P. M. and Ruamviboonsuk, P., Well-Founded Reasoning with Classical Negation, Proc.
1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, 120-132, 1991.
[Fi89] Fitting, M., Negation as Refutation, Proc. 4th
Annual Symp. on Logic in Computer Science, 63-69, 1989.
[Fi91] Fitting, M., Bilattices and the Semantics of Logic
Programming, J. of Logic Programming 11, 91-116,
1991.
[Gi86] Ginsberg, M. L., Multivalued Logics, Proc. of
AAAI'86, 243-247, 1986.
[GL88] Gelfond, M. and Lifschitz, V., The Stable Model
Semantics for Logic Programming, Proc. 5th Int.
Conf. on Logic Programming, 1070-1080, 1988.
[GL90] Gelfond, M. and Lifschitz, V., Logic Programs
with Classical Negation, Proc. 7th Int. Conf. on
Logic Programming, 579-597, 1990.
[GS92a] Grant, J. and Subrahmanian, V. S., Reasoning
in Inconsistent Knowledge Bases, draft manuscript,
1992.
[GS92b] Grant, J. and Subrahmanian, V. S., The Optimistic and Cautious Semantics for Inconsistent
Knowledge Bases, draft manuscript, 1992.
[In91] Inoue, K., Extended Logic Programs with Default
Assumptions, Proc. 8th Int. Conf. on Logic Programming, 490-504, 1991.
[KL89] Kifer, M. and Lozinskii, E. L., RI: A Logic for
Reasoning with Inconsistency, Proc. 4th Annual Symp. on Logic in Computer Science, 253-262,
1989.
[KS90] Kowalski, R. A. and Sadri, F., Logic Programs
with Exception, Proc. 7th Int. Conf. on Logic Programming, 598-613, 1990.
[PAA91] Pereira, L. M., Alferes, J. J. and Aparicio,
N., Contradiction Removal within Well-Founded Semantics, Proc. 1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, 105-119,
1991.
[Pr89] Przymusinski, T. C., Every Logic Program has a
Natural Stratification and an Iterated Least Fixed
Point Model, Proc. 8th ACM Symp. on Principles
of Database Systems, 11-21, 1989.
[Pr90] Przymusinski, T. C., Extended Stable Semantics
for Normal and Disjunctive Logic Programs, Proc.
7th Int. Conf. on Logic Programming, 459-477,
1990.
[PR91] Pimentel, S. G. and Rodi, W. L., Belief Revision and Paraconsistency in a Logic Programming
Framework, Proc. 1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, 228-242,
1991.
[Ro89] Ross, K., The Well-Founded Semantics for Disjunctive Logic Programs, Proc. 1st Int. Conf. on
Deductive and Object Oriented Databases, 352-369,
1989.
[Sa89] Sakama, C., Possible Model Semantics for Disjunctive Databases, Proc. 1st Int. Conf. on Deductive and Object Oriented Databases, 337-351, 1989.
[Su90] Subrahmanian, V. S., Paraconsistent Disjunctive
Deductive Databases, Proc. 20th Int. Symp. on
Multiple-valued Logic, 339-345, 1990.
[Su90] Subrahmanian, V. S., V-Logic: A Framework for
Reasoning about Chameleonic Programs with Inconsistent Completions, Fundamenta Informaticae
XIII, 465-483, 1990.
[VRS88] Van Gelder, A., Ross, K. and Schlipf, J. S., Unfounded Sets and Well-Founded Semantics for General Logic Programs, Proc. 7th ACM Symp. on
Principles of Database Systems, 221-230, 1988.
[Wa91] Wagner, G., A Database Needs Two kinds of
Negation, Proc. 3rd Symp. on Mathematical Fundamentals of Database and Knowledge Base Systems
(LNCS 495), 357-371, 1991.
Formalizing Database Evolution in the Situation Calculus
Raymond Reiter
Department of Computer Science
University of Toronto
Toronto, Canada M5S 1A4
and
The Canadian Institute for Advanced Research
email: reiter@ai.toronto.edu
Abstract
We continue our exploration of a theory of database updates (Reiter [21, 23]) based upon the situation calculus.
The basic idea is to take seriously the fact that databases
evolve in time, so that updatable relations should be
endowed with an explicit state argument representing
the current database state. Database transactions are
treated as functions whose effect is to map the current
database state into a successor state. The formalism
is identical to that arising in the artificial intelligence
planning literature and indeed, borrows shamelessly from
those ideas.
Within this setting, we consider several topics, specifically:
1. A logic programming implementation of query evaluation.
2. The treatment of database views.
3. State constraints and the ramification problem.
4. The evaluation of historical queries.
5. An approach to indeterminate transactions.
1 Introduction

Elsewhere (Reiter [21, 23]), we have described how one may represent databases and their update transactions within the situation calculus (McCarthy [13]). The basic idea is to take seriously the fact that databases evolve in time, so that updatable relations should be endowed with an explicit state argument representing the current database state. Database transactions are treated as functions, and the effect of a transaction is to map the current database state into a successor state. The resulting formalism becomes identical to theories of planning in the AI literature (see, for example, (Reiter [18])).

Following a review of some of the requisite basic concepts and results, we consider several topics in this paper:

1. We sketch a logic programming implementation of the axioms defining a database under updates. While we give no proof of its correctness, we observe that under suitable assumptions, Clark completion axioms (Clark [3]) should yield such a proof.

2. We show how our approach can accommodate database views.

3. The so-called ramification problem, as defined in the AI planning literature, arises in specifying database updates. Roughly speaking, this is the problem of incorporating, in the axiom defining an update transaction, the indirect effects of the update as given by arbitrary state constraints. We discuss this problem in the database setting, and characterize its solution in terms of inductive entailments of the database.

4. An historical query is one that references previous database states. We sketch an approach to such queries which reduces their evaluation to evaluation in the initial database state, together with conventional list processing techniques on the list of those update transactions leading to the current database state.

5. The database axiomatization of this paper addresses only determinate transactions; roughly speaking, in the presence of complete information about the current database state, such a transaction determines a unique successor state. By appealing to some ideas of Haas ([7]) and Schubert ([24]), we indicate how to axiomatize indeterminate database transactions.

2 Preliminaries

This section reviews some of the basic concepts and results of (Reiter [23, 21, 19]) which provide the necessary background for presenting the material of this paper. These include a motivating example, a precise specification of the axioms used to formalize update transactions and databases, an induction axiom suitable for proving properties of database states, and a discussion of query evaluation.
2.1 The Basic Approach: An Example

In (Reiter [23]), the idea of representing databases and their update transactions within the situation calculus was illustrated with an example education domain, which we repeat here.

Relations
The database involves the following three relations:
1. enrolled(st, course, s): Student st is enrolled in course course when the database is in state s.
2. grade(st, course, grade, s): The grade of student st in course course is grade when the database is in state s.
3. prerequ(pre, course): pre is a prerequisite course for course course. Notice that this relation is state independent, so is not expected to change during the evolution of the database.

Database Transactions
Update transactions will be denoted by function symbols, and will be treated in exactly the same way as actions are in the situation calculus. For our example, there will be three transactions:
1. register(st, course): Register student st in course course.
2. change(st, course, grade): Change the current grade of student st in course course to grade.
3. drop(st, course): Student st drops course course.

Transaction Preconditions
Normally, transactions have preconditions which must be satisfied by the current database state before the transaction can be "executed". In our example, we shall require that a student can register in a course iff she has obtained a grade of at least 50 in all prerequisites for the course:
Poss(register(st, c), s) ≡ {(∀p).prerequ(p, c) ⊃ (∃g).grade(st, p, g, s) ∧ g ≥ 50}.¹
It is possible to change a student's grade iff he has a grade which is different than the new grade:
Poss(change(st, c, g), s) ≡ (∃g').grade(st, c, g', s) ∧ g' ≠ g.
A student may drop a course iff the student is currently enrolled in that course:
Poss(drop(st, c), s) ≡ enrolled(st, c, s).

Initial Database State
We assume given some first order specification of what is true of the initial state S0 of the database. These will be arbitrary first order sentences, the only restriction being that those predicates which mention a state mention only the initial state S0. Examples of information which might be true in the initial state are:
enrolled(Sue, C100, S0) ∨ enrolled(Sue, C200, S0),
(∃c)enrolled(Bill, c, S0),
(∀p).prerequ(p, P300) ≡ p = P100 ∨ p = M100,
(∀p)¬prerequ(p, C100),
(∀c).enrolled(Bill, c, S0) ≡ c = M100 ∨ c = C100 ∨ c = P200,
enrolled(Mary, C100, S0), ¬enrolled(John, M200, S0), ...
grade(Sue, P300, 75, S0), grade(Bill, M200, 70, S0), ...
prerequ(M200, M100), ¬prerequ(M100, C100), ...

Update Specifications
These are the central axioms in our formalization of update transactions. They specify the effects of all transactions on all updatable database relations. As usual, all lower case roman letters are variables which are implicitly universally quantified. In particular, notice that these axioms quantify over transactions. In what follows, do(a, s) denotes the database state resulting from performing the update transaction a when the database is in state s.
Poss(a, s) ⊃ [enrolled(st, c, do(a, s)) ≡
  a = register(st, c) ∨ enrolled(st, c, s) ∧ a ≠ drop(st, c)],
Poss(a, s) ⊃ [grade(st, c, g, do(a, s)) ≡
  a = change(st, c, g) ∨ grade(st, c, g, s) ∧ (∀g') a ≠ change(st, c, g')].
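To see how the successor state axiom for enrolled determines the evolution of the relation, the following sketch (not from the paper; it ignores the Poss preconditions and uses illustrative encodings) progresses enrolled(st, c) along a list of transactions:

def enrolled_after(st, c, transactions, initially_enrolled):
    # initially_enrolled: truth of enrolled(st, c, S0)
    holds = initially_enrolled
    for a in transactions:
        if a == ("register", st, c):
            holds = True          # a = register(st, c) makes it true
        elif a == ("drop", st, c):
            holds = False         # dropping falsifies it; other transactions persist it
    return holds

# e.g. after drop(John, C100) then register(Mary, C100), John is no longer
# enrolled in C100 even if he was initially:
# enrolled_after("John", "C100",
#                [("drop","John","C100"), ("register","Mary","C100")], True) -> False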
2.2 An Axiomatization of Updates

The example education domain illustrates the general principles behind our approach to the specification of
¹In the sequel, lower case roman letters will denote variables.
All formulas are understood to be implicitly universally quantified
with respect to their free variables whenever explicit quantifiers
are not indicated.
database update transactions. In this section we precisely characterize a class of databases and updates of
which the above example will be an instance.
Unique Names Axioms for Transactions
For distinct transaction names T and T',
T(x̄) ≠ T'(ȳ).
Identical transactions have identical arguments:
T(x1, ..., xn) = T(y1, ..., yn) ⊃ x1 = y1 ∧ ... ∧ xn = yn
for each function symbol T denoting a transaction.

Unique Names Axioms for States
(∀a, s) S0 ≠ do(a, s),
(∀a, s, a', s').do(a, s) = do(a', s') ⊃ a = a' ∧ s = s'.
Definition: The Simple Formulas
The simple formulas are defined to be the smallest set such that:
1. F(t̄, s) and F(t̄, S0) are simple whenever F is an updatable database relation, the t̄ are terms, and s is a variable of sort state.²
2. Any equality atom is simple.
3. Any other atom with predicate symbol other than Poss is simple.
4. If S1 and S2 are simple, so are ¬S1, S1 ∧ S2, S1 ∨ S2, S1 ⊃ S2, S1 ≡ S2.
5. If S is simple, so are (∃x)S and (∀x)S whenever x is an individual variable not of sort state.
In short, the simple formulas are those first order formulas whose updatable database relations do not mention
the function symbol do, and which do not quantify over
variables of sort state.
Definition: Transaction Precondition Axiom
A transaction precondition axiom is a formula of the form
(∀x̄, s).Poss(T(x1, ..., xn), s) ≡ Π_T,
where T is an n-ary transaction function, and Π_T is a simple formula whose free variables are among x1, ..., xn, s.
Definition: Successor State Axiom
A successor state axiom for an (n+1)-ary updatable database relation F is a sentence of the form
(∀a, s).Poss(a, s) ⊃ (∀x1, ..., xn).F(x1, ..., xn, do(a, s)) ≡ Φ_F
where, for notational convenience, we assume that F's last argument is of sort state, and where Φ_F is a simple formula, all of whose free variables are among a, s, x1, ..., xn.

2.3 An Induction Axiom
There is a close analogy between the situation calculus
and the theory of the natural numbers; simply identify
So with the natural number 0, and do(Add1, s) with the
successor of the natural number s. In effect, an axiomatization in the situation calculus is a theory in which each
"natural number" s has arbitrarily many successors.3
Just as an induction axiom is necessary to prove anything interesting about the natural numbers, so also is
induction required to prove general properties of states.
This section is devoted to formulating an induction axiom suitable for this task.
We begin by defining an ordering relation < on states. The intended interpretation of s < s' is that state s' is reachable from state s by some sequence of transactions, each action of which is possible in that state resulting from executing the transactions preceding it in the sequence. Hence, < should be the smallest binary relation on states such that:
1. σ < do(a, σ) whenever transaction a is possible in state σ, and
2. σ < do(a, σ') whenever transaction a is possible in state σ' and σ < σ'.
This can be achieved with a second order sentence, as
follows:
Definitions: s < s', s ≤ s'
(∀s, s').s < s' ≡
  (∀P).{[(∀a, s1).Poss(a, s1) ⊃ P(s1, do(a, s1))] ∧
        [(∀a, s1, s2).Poss(a, s2) ∧ P(s1, s2) ⊃ P(s1, do(a, s2))]}
  ⊃ P(s, s').   (1)
(∀s, s').s ≤ s' ≡ s < s' ∨ s = s'.   (2)
Reiter [20] shows how these axioms entail the following induction axiom, suitable for proving properties of states s when S0 ≤ s:
(∀W).{W(S0) ∧
  [(∀a, s).Poss(a, s) ∧ S0 ≤ s ∧ W(s) ⊃ W(do(a, s))]}
  ⊃ (∀s).S0 ≤ s ⊃ W(s).   (3)
This is our analogue of the standard second order induction axiom for Peano arithmetic.
²For notational convenience, we assume that the last argument
of an updatable database relation is always the (only) argument
of sort state.
³There could even be infinitely many successors whenever an action is parameterized by a real number, as for example move(block, location).
Reiter [23, 20] provides an approach to database integrity constraints in which the concept of a database satisfying its constraints is defined in terms of inductive entailment from the database, using this and other axioms of induction for the situation calculus. In this paper, we shall find other uses for induction in connection with database view definitions (Section 4), the so-called ramification problem (Section 5), and historical queries (Section 6).
2.4 Databases Defined
In the sequel, unless otherwise indicated, we shall only consider background database axiomatizations D of the form:
D = less-axioms ∪ Dss ∪ Dtp ∪ Duns ∪ Dunt ∪ DS0
where
• less-axioms are the axioms (1), (2) for < and ≤.
• Dss is a set of successor state axioms, one for each updatable database relation.
• Dtp is a set of transaction precondition axioms, one for each database transaction.
• Duns is the set of unique names axioms for states.
• Dunt is the set of unique names axioms for transactions.
• DS0 is a set of first order sentences with the property that S0 is the only term of sort state mentioned by the database updatable relations of a sentence of DS0. See Section 2.1 for an example DS0. Thus, no updatable database relation of a formula of DS0 mentions a variable of sort state or the function symbol do. DS0 will play the role of the initial database (i.e. the one we start off with, before any transactions have been "executed").

2.5 Querying a Database

Notice that in the above account of database evolution, all updates are virtual; the database is never physically changed. To query the database resulting from some sequence of transactions, it is necessary to refer to this sequence in the query. For example, to determine if John is enrolled in any courses after the transaction sequence
drop(John, C100), register(Mary, C100)
has been 'executed', we must determine whether
Database ⊨ (∃c).enrolled(John, c, do(register(Mary, C100), do(drop(John, C100), S0))).
Querying an evolving database is precisely the temporal projection problem in AI planning [8].⁴

Definition: A Regression Operator R
Let W be a first order formula. Then R[W] is that formula obtained from W by replacing each atom F(t̄, do(α, σ)) mentioned by W by Φ_F(t̄, α, σ), where F's successor state axiom is
(∀a, s).Poss(a, s) ⊃ (∀x̄).F(x̄, do(a, s)) ≡ Φ_F(x̄, a, s).
All other atoms of W not of this form remain the same.
The use of the regression operator R is a classical plan synthesis technique (Waldinger [25]). See also (Pednault [16, 17]). Regression corresponds to the operation of unfolding in logic programming. For the class of databases of this paper, Reiter [23, 19] provides a sound and complete query evaluator based on regression. In this paper, we shall have a different use for regression, in connection with defining database views (Section 4).

3 Updates in the Logic Programming Context

It seems that our approach to database updates can be implemented in a fairly straightforward way as a logic program, thereby directly complementing the logic programming perspective on databases (Minker [15]). For example, the axiomatization of the education example of Section 2.1 has the following representation as clauses:

Successor State Axiom Translation:
enrolled(st, c, do(register(st, c), s)) ← Poss(register(st, c), s).
enrolled(st, c, do(a, s)) ← a ≠ drop(st, c), enrolled(st, c, s), Poss(a, s).
grade(st, c, g, do(change(st, c, g), s)) ← Poss(change(st, c, g), s).
grade(st, c, g, do(a, s)) ← a ≠ change(st, c, g'), grade(st, c, g, s), Poss(a, s).⁵

Transaction Precondition Axiom Translation:
Poss(register(st, c), s) ← not P(st, c, s).
Q(st, p, s) ← grade(st, p, g, s), g ≥ 50.⁶
Poss(change(st, c, g), s) ← grade(st, c, g', s), g ≠ g'.
Poss(drop(st, c), s) ← enrolled(st, c, s).
⁴This property of our axiomatization makes the resulting approach quite different from Kowalski's situation calculus formalization of updates [9], in which each database update is accompanied by the addition of an atomic formula to the theory axiomatizing the database.
⁵This translation is problematic because it invokes negation-as-failure on a non-ground atom. The intention is that whenever a is bound to a term whose function symbol is change, the call should fail. This can be realized procedurally by retaining the clause sequence as shown, and simply deleting the inequality a ≠ change(st, c, g').
With a suitable clausal form for DS0, it would then be possible to evaluate queries against updated databases, for example
← enrolled(John, C200, do(register(Mary, C100), do(drop(John, C100), S0))).
Presumably, all of this can be made to work under
suitable conditions. The remaining problem is to characterize what these conditions are, and to prove correctness of such an implementation with respect to the logical specification of this paper. In this connection, notice
that the equivalences in the successor state and transaction precondition axioms are reminiscent of Clark's [3]
completion semantics for logic programs, and our unique
names axioms for states and transactions provide part of
the equality theory required for Clark's semantics (Lloyd
[12], pp.79, 109).
4 Views

In our setting, a view is an updatable database relation V(x̄, s) defined in terms of so-called base predicates:
(∀x̄, s).V(x̄, s) ≡ B(x̄, s),   (4)
where B is a simple formula with free variables among x̄ and s, and which mentions only base predicates.⁷ Unfortunately, sentences like (4) pose a problem for us because they are precluded by their syntax from the databases considered in this paper. However, we can accommodate nonrecursive views by representing them as follows:
(∀x̄).V(x̄, S0) ≡ B(x̄, S0),   (5)
(∀a, s).Poss(a, s) ⊃ (∀x̄).V(x̄, do(a, s)) ≡ R[B(x̄, do(a, s))].⁸   (6)
Sentence (5) is a perfectly good candidate for inclusion in DS0, while (6) has the syntactic form of a successor state axiom and hence may be included in Dss.
This representation of views requires some formal justification, which the following theorem provides:
Theorem 1 Suppose V(x̄, s) is an updatable database relation, and that B(x̄, s) is a simple formula which does not mention V and whose free variables are among x̄, s. Suppose further that Dss contains the successor state axiom (6) for V, and that DS0 contains the initial state axiom (5). Then,
D ∪ {3} ⊨ (∀s).S0 ≤ s ⊃ (∀x̄).V(x̄, s) ≡ B(x̄, s).
⁶We have here invoked some of the program transformation rules of (Lloyd [12], p.113) to convert the non-clausal formula {(∀p).prerequ(p, c) ⊃ (∃g).grade(st, p, g, s) ∧ g ≥ 50} ⊃ Poss(register(st, c), s) to a Prolog executable form. P and Q are new predicate symbols.
Theorem 1 informs us that from the initial state and successor state axioms (5) and (6) we can inductively derive the view definition
(∀s).S0 ≤ s ⊃ (∀x̄).V(x̄, s) ≡ B(x̄, s).
This is not quite the same as the view definition (4) with
which we began this discussion, but it is close enough. It
guarantees that in any database state reachable from the
initial state So, the view definition (4) will be true. We
take this as sufficient justification for representing views
within our framework by the axioms (5) and (6).
5 State Constraints and the Ramification Problem
Recall that our definition of a database (Section 2.4) does
not admit state-dependent axioms, except those of Dso
referring only to the initial state So. For example, we
are prevented from including in a database a statement
requiring that any student enrolled in C200 must also be
enrolled in C100.
(∀s, st).S0 ≤ s ∧ enrolled(st, C200, s) ⊃ enrolled(st, C100, s).   (7)
In a sense, such a state-dependent constraint should be
redundant, since the successor state axioms, because
they are equivalences, uniquely determine all future evolutions of the database given the initial database state
So. The information conveyed in axioms like (7) must
already be embodied in Dso together with the successor
state and transaction precondition axioms. We have already seen hints of this observation. Reiter [20J proposes
that dynamic integrity constraints should be viewed as
inductive entailments of the database, and gives several examples of such derivations. Moreover, Theorem
1 shows that the view definition
(∀s).S0 ≤ s ⊃ (∀x̄).V(x̄, s) ≡ B(x̄, s)
is an inductive entailment of the database containing the
initial state axiom (5) and the successor state axiom (6).
These considerations suggest that a state constraint
can be broadly conceived as any sentence of the form
7We do not consider recursive views. Views may also be defined
in terms of other, already defined views, but everything eventually
"bottoms out" in base predicates, so we only consider this case.
⁸Notice that since we are not considering recursive views (i.e., B does not mention V), the formula R[B(x̄, do(a, s))] is well defined.
(∀s1, ..., sn).S0 ≤ si ∧ si ≤ sj ∧ ... ⊃ W(s1, ..., sn),
and that a database is said to satisfy this constraint iff
the database inductively entails it. 9
9See Section 2.3 for a brief discussion of inductively proving
properties of states in the situation calculus.
The fact that state constraints like (7) must be inductive entailments of a database does not of itself dispense
with the problem of how to deal with such constraints
in defining the database. For in order that a state constraint be an inductive entailment, the successor state
axioms must be so chosen as to guarantee this entailment. For example, the original successor state axiom
for enroll (Section 2.1) was:
  Poss(a, s) ⊃ {enrolled(st, c, do(a, s)) ≡
      a = register(st, c) ∨
      enrolled(st, c, s) ∧ a ≠ drop(st, c)}.                   (8)
As one would expect, this does not inductively entail (7). To accommodate the state constraint (7), this successor state axiom must be changed to:

  Poss(a, s) ⊃ {enrolled(st, c, do(a, s)) ≡
      a = register(st, c) ∧ [c = C200 ⊃ enrolled(st, C100, s)]
      ∨
      enrolled(st, c, s) ∧ a ≠ drop(st, c) ∧
      [c = C200 ⊃ a ≠ drop(st, C100)]}.                        (9)

It is now simple to prove that, provided D_S0 contains the unique names axiom C100 ≠ C200 and the initial instance of (7),

  enrolled(st, C200, S0) ⊃ enrolled(st, C100, S0),

then (7) is an inductive entailment of the database.
The example illustrates the subtleties involved in getting the successor state axioms to reflect the intent of a
state constraint. These difficulties are a manifestation
of the so-called ramification problem in artificial intelligence planning domains (Finger [4]). Transactions might
have ramifications, or indirect effects. For the example at hand, the transaction of registering a student in C200 has the direct effect of causing the student to be enrolled in C200, and the indirect effect of causing her to be enrolled in C100 (if she is not already enrolled in C100).
The modification (9) of (8) was designed to capture this
indirect effect. In our setting, the ramification problem
is this: Given a static state constraint like (7), how can
the indirect effects implicit in the state constraint be embodied in the successor state axioms so as to guarantee
that the constraint will be an inductive entailment of
the database? A variety of circumscriptive proposals for
addressing the ramification problem have been proposed
in the artificial intelligence literature, notably by Baker
[1], Baker and Ginsberg [2], Ginsberg and Smith [5], Lifschitz [10] and Lin and Shoham [11]. Our formulation
of the problem in terms of inductive entailments of the
database seems to be new. For the databases of this paper, Fangzhen Lin^10 appears to have a solution to this problem.

^10 Personal communication.
6    Historical Queries

Using the relations < and ≤ on states, as defined in Section 2.3, it is possible to pose historical queries to a database. First, some notation.

Notation: do([a1, ..., an], s)
Let a1, ..., an be transactions. Define do([ ], s) = s; for n = 1, 2, ..., do([a1, ..., an], s) is a compact notation for the state term do(an, do(an−1, ..., do(a1, s) ...)), which denotes the state resulting from performing the transaction a1, followed by a2, ..., followed by an, beginning in state s.
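The nested do-term is mechanical to build from a transaction list; here is a minimal Prolog sketch (the helper name do_list/3 is ours, not the paper's), assuming transactions are listed oldest first.

    % do_list(+Transactions, +S0, -State): State is do(an, ... do(a1, S0) ...)
    % for Transactions = [a1, ..., an], oldest transaction first.
    do_list([], S, S).
    do_list([A|As], S, State) :-
        do_list(As, do(A, S), State).

    % ?- do_list([register(john, c100), drop(john, c100)], s0, S).
    % S = do(drop(john, c100), do(register(john, c100), s0)).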
Now, suppose T is the transaction sequence leading to the current database state (i.e., the current database state is do(T, S0)). The following asks whether the database was ever in a state in which John was simultaneously enrolled in both C100 and M100:

  (∃s).S0 ≤ s ∧ s ≤ do(T, S0) ∧
      enrolled(John, C100, s) ∧ enrolled(John, M100, s).       (10)

Has Sue always worked in department 13?

  (∀s).S0 ≤ s ∧ s ≤ do(T, S0) ⊃ emp(Sue, 13, s).               (11)
The rest of this section sketches an approach to answering historical queries of this kind. The approach is of
interest because it reduces the evaluation of such queries
to evaluations in the initial database state, together
with conventional list processing techniques on the list of
those transactions leading to the current database state.
Begin by considering two new predicates, last and mem-diff. The intended interpretation of last(s, a) is that the transaction a is the last transaction of the sequence s. For example,

  last(do([drop(Mary, C100), register(John, C100)], S0),
       register(John, C100))

is true, while

  last(do([drop(Mary, C100), drop(John, C100)], S0),
       register(John, C100))

is false, assuming unique names axioms for transactions. The following two axioms are sufficient for our purposes:

  ¬last(S0, a).
  last(do(a, s), a') ≡ a = a'.
The intended interpretation of mem-diff(a, s, s') is that transaction a is a member of the "list difference" of s and s', where state s' is a "sublist" of s. For example,

  mem-diff(drop(Mary, C100),
           do([register(John, C100), drop(Bill, C100),
               drop(Mary, C100), drop(John, M100)], S0),
           do([register(John, C100)], S0))

is true, whereas

  mem-diff(register(Mary, C100),
           do([register(John, C100), drop(Bill, C100),
               drop(Mary, C100), drop(John, M100)], S0),
           do([register(John, C100)], S0))

is false (assuming unique names axioms for transactions). The following axioms will be sufficient for our needs:

  ¬mem-diff(a, s, s).
  s ≤ s' ⊃ mem-diff(a, do(a, s'), s).
  mem-diff(a, s, s') ⊃ mem-diff(a, do(a', s), s').
  mem-diff(a, do(a', s), s') ∧ a ≠ a' ⊃ mem-diff(a, s, s').

We begin by showing how to answer query (11). Suppose, for the sake of the example, that the successor state axiom for emp is:

  Poss(a, s) ⊃ {emp(p, d, do(a, s)) ≡ a = hire(p, d) ∨
      emp(p, d, s) ∧ a ≠ fire(p) ∧ a ≠ quit(p)}.

Using this, and the sentences for last and mem-diff together with the induction axiom (3), it is possible to prove:

  S0 ≤ s ⊃ emp(p, d, s) ≡
      {emp(p, d, S0) ∧
       ¬mem-diff(fire(p), s, S0) ∧ ¬mem-diff(quit(p), s, S0)}
      ∨
      {(∃s').S0 ≤ s' ≤ s ∧ last(s', hire(p, d)) ∧
       ¬mem-diff(fire(p), s, s') ∧ ¬mem-diff(quit(p), s, s')}.

Using this and the (reasonable) assumption that the transaction sequence T is legal,^11 it is simple to prove that the query (11) is equivalent to:

  {emp(Sue, 13, S0) ∧
   ¬mem-diff(fire(Sue), do(T, S0), S0) ∧
   ¬mem-diff(quit(Sue), do(T, S0), S0)}
  ∨
  {(∃s').S0 ≤ s' ≤ do(T, S0) ∧
   last(s', hire(Sue, 13)) ∧
   ¬mem-diff(fire(Sue), do(T, S0), s') ∧
   ¬mem-diff(quit(Sue), do(T, S0), s')}.

This form of the original query is of interest because it reduces query evaluation to evaluation in the initial database state, together with simple list processing on the list T of those transactions leading to the current database state. We can verify that Sue has always been employed in department 13 in one of two ways:

1. Verify that she was initially employed in department 13, and that neither fire(Sue) nor quit(Sue) are members of list T.

2. Verify that T has a sublist ending with hire(Sue, 13), and that neither fire(Sue) nor quit(Sue) are members of the list difference of T and this sublist.^12

We now consider evaluating the first query (10) in the same list processing spirit. We shall assume that (8) is the successor state axiom for enrolled. Using the above sentences for last and mem-diff, together with (8) and the induction axiom (3), it is possible to prove:

  S0 ≤ s ⊃ enrolled(st, c, s) ≡
      enrolled(st, c, S0) ∧ ¬mem-diff(drop(st, c), s, S0) ∨
      (∃s').S0 ≤ s' ≤ s ∧ last(s', register(st, c)) ∧
      ¬mem-diff(drop(st, c), s, s').

Then, on the assumption that the transaction sequence T is legal, it is simple to prove that the query (10) is equivalent to:

  (∃s).S0 ≤ s ≤ do(T, S0) ∧
  { {enrolled(John, C100, S0) ∧ enrolled(John, M100, S0) ∧
     ¬mem-diff(drop(John, C100), s, S0) ∧
     ¬mem-diff(drop(John, M100), s, S0)}
    ∨
    {enrolled(John, C100, S0) ∧
     ¬mem-diff(drop(John, C100), s, S0) ∧
     (∃s').S0 ≤ s' ≤ s ∧
     last(s', register(John, M100)) ∧
     ¬mem-diff(drop(John, M100), s, s')}
    ∨
    {enrolled(John, M100, S0) ∧
     ¬mem-diff(drop(John, M100), s, S0) ∧
     (∃s'').S0 ≤ s'' ≤ s ∧
     last(s'', register(John, C100)) ∧
     ¬mem-diff(drop(John, C100), s, s'')}
    ∨
    {(∃s', s'').S0 ≤ s' ≤ s ∧ S0 ≤ s'' ≤ s ∧
     last(s', register(John, M100)) ∧
     last(s'', register(John, C100)) ∧
     ¬mem-diff(drop(John, M100), s, s') ∧
     ¬mem-diff(drop(John, C100), s, s'')} }

^11 Intuitively, T is legal iff each transaction of T satisfies its preconditions (see Section 2.1) in the state resulting from performing all the transactions preceding it in the sequence, beginning with state S0. See (Reiter [19]) for details, and a procedure for verifying the legality of a transaction sequence.
^12 The correctness of this simple-minded list processing procedure relies on some assumptions, notably suitable unique names axioms.
Despite its apparent complexity, this sentence also has a simple list processing reading; we can verify that John was simultaneously enrolled in C100 and M100 in some previous database state as follows. Find a sublist (loosely denoted by s) of T such that one of the following four conditions holds:

1. John was initially enrolled in both C100 and M100, and neither drop(John, C100) nor drop(John, M100) are members of list s.

2. John was initially enrolled in C100, drop(John, C100) is not a member of list s, s has a sublist s' ending with register(John, M100), and drop(John, M100) is not a member of the list difference of s and s'.

3. John was initially enrolled in M100, drop(John, M100) is not a member of list s, s has a sublist s' ending with register(John, C100), and drop(John, C100) is not a member of the list difference of s and s'.

4. There are two sublists s' and s'' of s, s' ends with register(John, M100), s'' ends with register(John, C100), drop(John, M100) is not a member of the list difference of s and s', and drop(John, C100) is not a member of the list difference of s and s''.
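These list-processing checks are easy to prototype. Below is a minimal Prolog sketch (all predicate and fact names, last_tx/2, mem_diff/3, emp0/2 and sue_always_emp/1, are ours, purely illustrative), modelling a database state by the list of transactions, oldest first, that leads to it, and encoding the two-way verification for query (11).

    % last_tx(+T, ?A): A is the last transaction of the non-empty sequence T.
    last_tx(T, A) :- append(_, [A], T).

    % mem_diff(+A, +T, +S): transaction A occurs in the part of T that
    % extends its prefix S (the "list difference" of T and S).
    mem_diff(A, T, S) :- append(S, Rest, T), member(A, Rest).

    emp0(sue, 13).                       % assumed initial database fact

    % Two-way check that Sue has always worked in department 13.
    sue_always_emp(T) :-                 % initially employed, never fired or quit
        emp0(sue, 13),
        \+ mem_diff(fire(sue), T, []),
        \+ mem_diff(quit(sue), T, []).
    sue_always_emp(T) :-                 % some prefix of T ends with hire(sue,13)
        append(S1, _, T),
        last_tx(S1, hire(sue, 13)),
        \+ mem_diff(fire(sue), T, S1),
        \+ mem_diff(quit(sue), T, S1).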
We can even pose queries about the future, for example: is it possible for the database ever to be in a state in which John is enrolled in both C100 and C200?

  (∃s).S0 ≤ s ∧ enrolled(John, C100, s) ∧ enrolled(John, C200, s).

Answering queries of this form is precisely the problem of plan synthesis in AI (Green [6]). For the class of databases of this paper, Reiter [22, 18] shows how regression provides a sound and complete evaluator for such queries.
7    Indeterminate Transactions
A limitation of our formalism is that it requires all transactions to be determinate, by which we mean that in
the presence of complete information about the initial
database state a transaction completely determines the
resulting state.
One way to extend the theory to include indeterminate transactions is by appealing to a simple idea due to Haas [7], as elaborated by Schubert [24]. As an example, consider the indeterminate transaction drop-a-student(c), meaning that some student - we don't know whom - is to be dropped from course c. Notice that we cannot now have a successor state axiom of the form

  Poss(a, s) ⊃ {enrolled(st, c, do(a, s)) ≡ Φ(st, c, a, s)}.

To see why, consider the following instance of this axiom:

  Poss(drop-a-student(C100), S0) ⊃
      {enrolled(John, C100, do(drop-a-student(C100), S0))
       ≡ Φ(John, C100, drop-a-student(C100), S0)}.
Suppose Σ0 is a complete description of the initial database state, and suppose moreover, that

  Σ0 ⊨ Poss(drop-a-student(C100), S0) ∧ enrolled(John, C100, S0).

By the completeness assumption,

  Σ0 ⊨ ±Φ(John, C100, drop-a-student(C100), S0),

in which case

  Σ0 ⊨ ±enrolled(John, C100, do(drop-a-student(C100), S0)).

In other words, we would know whether John was the student dropped from C100, violating the intention of the drop-a-student transaction.
Despite the inadequacies of the axiomatization of Section 2.2 (specifically the failure of successor state axioms for specifying indeterminate transactions), we can represent this setting with something like the following axioms:

  (∃st)enrolled(st, c, s) ⊃ Poss(drop-a-student(c), s).
  enrolled(st, c, s) ⊃ Poss(drop(st, c), s).
  Poss(a, s) ⊃ {a = drop(st, c) ⊃ ¬enrolled(st, c, do(a, s))}.
  Poss(a, s) ⊃ {a = drop-a-student(c) ⊃
      (∃!st)enrolled(st, c, s) ∧ ¬enrolled(st, c, do(a, s))}.^13
  Poss(a, s) ⊃ {¬enrolled(st, c, s) ∧ enrolled(st, c, do(a, s)) ⊃
      a = register(st, c)}.
  Poss(a, s) ⊃ {enrolled(st, c, s) ∧ ¬enrolled(st, c, do(a, s)) ⊃
      a = drop(st, c) ∨ a = drop-a-student(c)}.
The last two formulas are examples of what Schubert
[24] calls explanation closure axioms. For the example
at hand, the last axiom provides an exhaustive enumeration of those transactions (namely drop(st, c) and
drop-a-student(c)) which could possibly explain how it
came to be that st is enrolled in c in the current state
s and is not enrolled in c in the successor state. Similarly, the second last axiom explains how a student could come to be enrolled in a course in which she was not enrolled previous to the transaction.^14 The feasibility of

^13 (∃!st) denotes the existence of a unique st.
^14 It is these explanation closure axioms which provide a succinct alternative to the frame axioms (McCarthy and Hayes [14]) which would normally be required to represent dynamically changing worlds like databases (Reiter [23]).
such an approach relies on a closure assumption, namely that we, as database designers, can provide a finite exhaustive enumeration of such explaining transactions.^15
In the "real" world, such a closure assumption is problematic. The state of the world has changed so that a
student is no longer enrolled in a course. What can explain this? The school burned down? The student was
kidnapped? The teacher was beamed to Andromeda by
extraterrestrials? Fortunately, in the database setting,
such open-ended possible explaining events are precluded
by the database designer, by virtue of her initial choice
of some closed set of transactions with which to model
the application at hand; no events outside this closed
set (school burned down, student kidnapped, etc.) can
be considered in defining the evolution of the database.
This initial choice of a closed set of transactions having
been made, explanation closure axioms provide a natural
representation of this closure assumption.
By appealing to explanation closure axioms, we can
now specify indeterminate transactions. The price we
pay is the loss of the simple regression-based query evaluator of (Reiter [23, 21]); we no longer have a simple
sound and complete query evaluator. Of course, conventional first order theorem-proving does provide a query
evaluator for such an axiomatization. For example, the
following are entailments of the above axioms, together
with unique names axioms for transactions and for John
and Mary:
  enrolled(John, C100, S0) ∧ enrolled(Mary, C100, S0) ⊃
      enrolled(John, C100, do(drop(Mary, C100), S0)) ∧
      ¬enrolled(Mary, C100, do(drop(Mary, C100), S0)).

  {(∀st).enrolled(st, C100, S0) ≡ st = John} ⊃
      (∀st)¬enrolled(st, C100, do(drop-a-student(C100), S0)).

  {(∀st).enrolled(st, C100, S0) ≡ st = John ∨ st = Mary} ⊃
      enrolled(John, C100, do(drop-a-student(C100), S0)) ⊕
      enrolled(Mary, C100, do(drop-a-student(C100), S0)).

Notice that the induction axiom (3) of Section 2.3 does not depend on any assumptions about the underlying database. In particular, it does not depend on successor state axioms. It follows that we can continue to use induction to prove properties of database states and integrity constraints in the more generalized setting of indeterminate transactions. The fundamental perspective on integrity constraints of (Reiter [20]) - namely that they are inductive entailments of the database - remains the same.

^15 This assumption is already implicit in our successor state axioms of Section 2.2.

Acknowledgements

Many of my colleagues provided important conceptual and technical advice. My thanks to Leo Bertossi, Alex Borgida, Craig Boutilier, Charles Elkan, Michael Gelfond, Gosta Grahne, Russ Greiner, Joe Halpern, Hector Levesque, Vladimir Lifschitz, Fangzhen Lin, Wiktor Marek, John McCarthy, Alberto Mendelzon, John Mylopoulos, Javier Pinto, Len Schubert, Yoav Shoham and Marianne Winslett. Funding for this work was provided by the National Science and Engineering Research Council of Canada, and by the Institute for Robotics and Intelligent Systems.

References

[1] A. Baker. A simple solution to the Yale shooting problem. In R. Brachman, H.J. Levesque, and R. Reiter, editors, Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning (KR'89), pages 11-20. Morgan Kaufmann Publishers, Inc., 1989.

[2] A. Baker and M. Ginsberg. Temporal projection and explanation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 906-911, Detroit, MI, 1989.

[3] K.L. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 292-322. Plenum Press, New York, 1978.
[4] J. Finger. Exploiting Constraints in Design Synthesis. PhD thesis, Stanford University, Stanford, CA,
1986.
[5] M.L. Ginsberg and D.E. Smith. Reasoning about
actions I: A possible worlds approach. Artificial Intelligence, 35:165-195, 1988.
[6] C. C. Green. Theorem proving by resolution as a
basis for question-answering systems. In B. Meltzer
and D. Michie, editors, Machine Intelligence 4,
pages 183-205. American Elsevier, New York, 1969.
[7] A. R. Haas. The case for domain-specific frame axioms. In F. M. Brown, editor, The frame problem in
artificial intelligence. Proceedings of the 1987 workshop, pages 343-348, Los Altos, California, 1987.
Morgan Kaufmann Publishers, Inc.
[8] S. Hanks and D. McDermott. Default reasoning,
nonmonotonic logics, and the frame problem. In
Proceedings of the National Conference on Artificial
Intelligence, pages 328-333, 1986.
[9] R. Kowalski. Database updates in the event calculus. Journal of Logic Programming, 12:121-146,
1992.
[10] V. Lifschitz. Toward a metatheory of action. In
J. Allen, R. Fikes, and E. Sandewall, editors, Proceedings of the Second International Conference on
Principles of Knowledge Representation and Reasoning (KR '91), pages 376-386, Los Altos, CA,
1991. Morgan Kaufmann Publishers, Inc.
[11] F. Lin and Y. Shoham. Provably correct theories of
action. In Proceedings of the National Conference
on Artificial Intelligence, 1991.
[12] J.W. Lloyd. Foundations of Logic Programming.
Springer Verlag, second edition, 1987.
[13] J. McCarthy. Programs with common sense. In
M. Minsky, editor, Semantic Information Processing, pages 403-418. The MIT Press, Cambridge, MA, 1968.
[14] J. McCarthy and P. Hayes. Some philosophical
problems from the standpoint of artificial intelligence. In B. Meltzer and D. Michie, editors, Machine Intelligence 4, pages 463-502. Edinburgh University Press, Edinburgh, Scotland, 1969.
[15] J. Minker, editor.
Foundations of Deductive
Databases and Logic Programming. Morgan Kaufmann Publishers, Inc., Los Altos, CA, 1988.
[16] E.P.D. Pednault. Synthesizing plans that contain
actions with context-dependent effects. Computational Intelligence, 4:356-372, 1988.
[17] E.P.D. Pednault.
ADL: Exploring the middle
ground between STRIPS and the situation calculus. In R.J. Brachman, H. Levesque, and R. Reiter, editors, Proceedings of the First International
Conference on Principles of Knowledge Representation and Reasoning (KR'89), pages 324-332. Mor-
gan Kaufmann Publishers, Inc., 1989.
[18] R. Reiter. The frame problem in the situation calculus: a simple solution (sometimes) and a completeness result for goal regression. In Vladimir Lifschitz,
editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359-380. Academic Press, San Diego,
CA, 1991.
[19] R. Reiter. The projection problem in the situation
calculus: A soundness and completeness result, with
an application to database updates. 1992. submitted for publication.
[20] R. Reiter. Proving properties of states in the situation calculus. 1992. submitted for publication.
[21] R. Reiter. On specifying database updates. Technical report, Department of Computer Science, University of Toronto, in preparation.
[22] R. Reiter. A simple solution to the frame problem
(sometimes). Technical report, Department of Computer Science, University of Toronto, in preparation.
[23] R. Reiter. On formalizing database updates: preliminary report. In Proc. 3rd International Conference
on Extending Database Technology, Vienna, March
23 - 27, 1992. to appear.
[24] L.K. Schubert. Monotonic solution of the frame
problem in the situation calculus: an efficient
method for worlds with fully specified actions. In
H.E. Kyberg, R.P. Loui, and G.N. Carlson, editors,
Knowledge Representation and Defeasible Reasoning, pages 23-67. Kluwer Academic Press, 1990.
[25] R. Waldinger. Achieving several goals simultaneously. In E. Elcock and D. Michie, editors, Machine Intelligence 8, pages 94-136. Ellis Horwood, Edinburgh, Scotland, 1977.
Learning Missing Clauses by Inverse Resolution
Peter Idestam-Almquist*
Department of Computer and Systems Sciences
Stockholm University
Electrum 230, 16440 Kista, Sweden
pi@dsv.su.se
Abstract
The incomplete theory problem has been of large interest
both in explanation based learning and more recently in
inductive logic programming. The problem is studied in the
context of Horn clause logic, and it is assumed that there is
only one clause missing for each positive example given.
Previous methods have used either top down or bottom up
induction. Both these induction strategies include some
undesired restriction on the hypothesis space for the missing
clause. To overcome these limitations a method where the
different induction strategies are completely integrated is
presented. The method involves a novel approach to inverse
resolution by using resolution, and it implies some
extensions to the framework of inverse resolution which
makes it possible to uniquely determine the most specific
result of an inverse resolution step.
1 Introduction
Completion of incomplete theories has been of large interest
in machine learning, particularly in the area of explanation
based learning, for which a complete theory is crucial
[Mitchell et al. 1986, Dejong and Mooney 1986]. Research
on augmenting an incomplete domain has been reported in
[Hall 1988, Wirth 1988, Ali 1989]. A new framework for
inductive learning was invented by inverting resolution
[Muggelton and Buntine 1988]. Papers considering
augmentation of incomplete theories in this framework are
[Wirth 1989, Rouveirol and Puget 1990, Rouveirol 1990].
We only consider Horn clause logic, which is a subset of
first order logic, and we follow the notation in logic
programming [Lloyd 1987]. The incomplete theory problem
can then be formulated as follows. Let P be a definite
program (an incomplete theory) and E a definite program
clause which should but does not follow from P (P ⊭ E).
* This research was supported by NUTEK, the Swedish National Board
for Industrial and Technical Development.
Then find a definite program clause H such that:
  (a) P ∪ {E} ⊭ H
  (b) P ∪ {H} ⊨ E
H is an inductive conclusion according to [Genesereth and Nilsson 1987].
Let E = (A ← B1, ..., Bn). Then by top down induction we mean any reasoning procedure, to infer an inductive conclusion, that starts from A. By bottom up induction we mean any inductive reasoning procedure that starts from B1, ..., Bn.
Most previous methods use either top down [Hall 1988,
Wirth 1988, Ali 1989] or bottom up induction [Sammut and
Banerji 1986, Muggelton and Buntine 1988, Rouveirol and
Puget 1990]. Both these induction strategies have some
undesired restrictions on the hypothesis space of H. In
[Wirth 1989] a method that combines top down and bottom
up induction is presented, while in this paper a method
where they are completely integrated will be described. In
the previous methods there are also other undesired
restrictions, namely that the input clause E must be fully
instantiated [Hall 1988, Wirth 1988, Wirth 1989, Ali 1989,
Sammut and Banerji 1986] or a unit clause [Muggelton and
Buntine 1988]. Our method works for full Horn clause
logic.
Logical entailment is used as a definition of generality.
Let E and F be two expressions. Then E is more general
than F, if and only if E logically entails F (E 1= F). We also
say that F is more specific than E.
In the examples, predicate symbols are denoted by p, q,
r, s, t and u. Variables (universally quantified) are denoted
by x, y, z and w. Constants are denoted by a, band c.
Skolem functions are denoted by k.
In section 2 the inductive framework of inverse
resolution is given. In section 3 some extensions to this
framework, which make it possible to determine the most
specific inverse resolvent, are described. In section 4 a new
inverse resolution method is presented, and finally in section 5 related work and contributions are discussed.
2    The Framework of Inverse Resolution

The inductive framework of inverse resolution was first presented in [Muggelton and Buntine 1988]. First, as a background, resolution will be described. Then inverse resolution will be defined, and some problems concerning inverse resolution will be pointed out.
2.1    Resolution
A substitution is a finite set of the form {v1/t1, ..., vn/tn}, where each vi is a variable, each ti is a term distinct from vi, and the variables v1, ..., vn are distinct. Each element vi/ti is called a binding for vi. A substitution is applied by simultaneously replacing each occurrence of the variable vi, in an expression, by the term ti.
An expression is either a term, a literal, a clause or a set of clauses. (A fixed ordering of literals in clauses and a fixed ordering of clauses in sets of clauses are assumed.)
Let E be an expression and V be the set of variables occurring in E. A renaming substitution for E is a substitution {x1/y1, ..., xn/yn} such that y1, ..., yn are distinct variables and (V − {x1, ..., xn}) ∩ {y1, ..., yn} = ∅.
Let E and F be expressions. Then E is a variant of F if there exists a renaming substitution θ such that E = Fθ.
A unifier for two terms or literals t1 and t2 is a substitution θ such that t1θ = t2θ.
A unifier θ for t1 and t2 is called a most general unifier (mgu) for t1 and t2, if for each unifier θ' of t1 and t2 there exists a substitution θ'' such that θ' = θθ''.
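Prolog's built-in unification gives a quick operational reading of this definition; the tiny wrapper below (the name mgu/2 is ours) merely makes the occurs check explicit.

    % mgu(?T1, ?T2): succeeds, binding variables to a most general unifier
    % of T1 and T2 (unify_with_occurs_check/2 is ISO sound unification).
    mgu(T1, T2) :- unify_with_occurs_check(T1, T2).

    % ?- mgu(p(X, f(Y)), p(a, f(b))).   % X = a, Y = b
    % ?- mgu(p(X), p(f(X))).            % fails: occurs check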
Let C and D be two clauses which have no variables in common. Then the clause R is resolved from C and D, denoted (C; D) ⊢R R, if the following conditions hold:
  (a) A is a literal in C and B is a literal in D.
  (b) θ is an mgu of A and B.
  (c) R is the clause ((C−{A}) ∪ (D−{B}))θ.
The clause R is called a resolvent of C and D.
Since C and D have no variables in common, the mgu θ can uniquely be divided into two disjunctive parts θA and θB such that θ = θA ∪ θB and AθA = BθB. Consequently condition (c) can be rewritten as:
  (c) R is the clause (C−{A})θA ∪ (D−{B})θB, where θ = θA ∪ θB and AθA = BθB.
Let R0 be a definite program clause and P a definite program. A linear derivation from R0 and P consists of a sequence R0, R1, ... of definite program clauses and a sequence C1, C2, ... of variants of definite program clauses in P such that each Ri+1 is resolved from Ci+1 and Ri. A linear derivation of Rk from R0 and P is denoted:
  (R0; C1) ⊢R (R1; C2) ⊢R ... ⊢R Rk, or for short (R0; P) ⊢R* Rk.

2.2    Inverse Resolution
A place within an expression is denoted by an n-tuple and defined recursively as follows. The term, literal or clause at place <a1> within f(t1, ..., tn) or {t1, ..., tn} is ta1. The term or literal at place <a1, ..., am> (m > 1) within f(t1, ..., tn) or {t1, ..., tn} is the term or literal at place <a2, ..., am> in ta1.
Let E be an expression. Then for each substitution θ there exists a unique inverse substitution θ⁻¹ such that Eθθ⁻¹ = E. Whereas the substitution θ maps variables in E to terms, the inverse substitution θ⁻¹ maps terms in Eθ to variables. An inverse substitution is a finite set of the form {(t1, {p1,1, ..., p1,m1})/v1, ..., (tn, {pn,1, ..., pn,mn})/vn} where each vi is a variable distinct from the variables in E, each ti is a term distinct from vi, the variables v1, ..., vn are distinct, each pi,j is a place at which ti is found within E, and the places p1,1, ..., pn,mn are distinct. An inverse substitution is applied by replacing all ti at places {pi,1, ..., pi,mi} in the expression E by vi.
Example: If the following inverse substitution {(a, {<1,1,2>, <1,2,1,1>, <2,2,1>})/x} is applied on the expression {(p(a,a)←p(f(a))), (q(a)←r(a))}, the expression {(p(a,x)←p(f(x))), (q(a)←r(x))} is obtained.
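Applying an inverse substitution is just a matter of rewriting the subterms at the listed places by a shared variable. Here is a small Prolog sketch (the names apply_inverse_subst/3, replace_at/4 and set_nth1/4 are ours), where a place is a list of 1-based argument positions and clause or literal positions are treated as ordinary argument positions of an enclosing term.

    % apply_inverse_subst(+Expr, +InvSubst, -Out): InvSubst is a list of
    % entries Places/Var; every subterm at one of the Places is replaced
    % by the (shared) variable Var.
    apply_inverse_subst(Expr, [], Expr).
    apply_inverse_subst(Expr, [Places/Var | Rest], Out) :-
        foldl(replace_place(Var), Places, Expr, Mid),
        apply_inverse_subst(Mid, Rest, Out).

    replace_place(Var, Place, In, Out) :- replace_at(In, Place, Var, Out).

    % replace_at(+Term, +Place, +New, -Out): replace the subterm of Term
    % at the argument-position path Place by New.
    replace_at(_, [], New, New).
    replace_at(Term, [I|Is], New, Out) :-
        compound(Term),
        Term =.. [F|Args],
        nth1(I, Args, Arg),
        replace_at(Arg, Is, New, NewArg),
        set_nth1(I, Args, NewArg, NewArgs),
        Out =.. [F|NewArgs].

    set_nth1(1, [_|T], X, [X|T]).
    set_nth1(N, [H|T], X, [H|T2]) :- N > 1, N0 is N - 1, set_nth1(N0, T, X, T2).

    % ?- apply_inverse_subst(p(a, f(a)), [[[1], [2,1]]/X], T).
    % T = p(X, f(X)).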
Let R, C and D be three clauses. If R can be resolved from C and D, then D can be inverse resolved from R and C. The clause D is inverse resolved from R and C, denoted (R; C) ⊢IR D, if the following conditions hold:
  (a) A is a literal in C.
  (b) θA is a substitution whose variables are variables that occur in A.
  (c) (C−{A})θA is a subset of R.
  (d) Γ is a subset of (C−{A})θA.
  (e) θB⁻¹ is an inverse substitution whose terms are terms that occur in A.
  (f) D is the clause ((R−Γ) ∪ {A}θA)θB⁻¹.
The clause D is called an inverse resolvent of R and C. Given R and C there are four sources of indeterminacy for D, namely: A, θA, Γ and θB⁻¹.
If A is a positive literal then D is forwardly inverse resolved, and if A is a negative literal then D is backwardly inverse resolved.
Example: Suppose we have R = (s(a,z)←q(a),r(b)), C = (p(a,x)←q(a),r(x)) and D = (s(y,z)←p(y,b)). The clause D can be forwardly inverse resolved from R and C, (R; C) ⊢IR D, if A = p(a,x), θA = {x/b}, Γ = (←q(a),r(b)) and θB⁻¹ = {(a, {<1,1>, <2,1>})/y}. The clause C can be backwardly inverse resolved from R and D, (R; D) ⊢IR C, if A = ¬p(y,b), θA = {y/a}, Γ = (s(a,z)←) and θB⁻¹ = {(b, {<1,2>, <3,1>})/x}. (It is assumed that the positive literal is first in the ordering of literals in a clause.)
Let D0 be a definite program clause and P a definite program. An inverse linear derivation from D0 and P consists of a sequence D0, D1, ... of definite program clauses and a sequence C1, C2, ... of variants of definite program clauses in P such that each Di+1 is inverse resolved from Ci+1 and Di. An inverse linear derivation of Dk from D0 and P is denoted:
  (D0; C1) ⊢IR (D1; C2) ⊢IR ... ⊢IR Dk, or for short (D0; P) ⊢IR* Dk.
A backward inverse linear derivation is an inverse linear derivation where each Di is backwardly inverse resolved, and a forward inverse linear derivation is an inverse linear derivation where each Di is forwardly inverse resolved.

2.3    Some Problems

Consider the definition of inverse resolved. The substitution θA can be divided into two disjunctive parts, θA1 including the variables that occur both in A and (C−{A}), and θA2 including the variables that only occur in A (θA = θA1 ∪ θA2). Then, to determine an inverse resolvent D, we have to choose A, θA1, θA2, Γ and θB⁻¹. Only in some special cases is there more than one alternative for A and θA1.
Example: Let R = (p←q(a),r(b)) and C = (p←q(x),r(x)). Then we have either A = ¬q(x) and θA1 = {x/b}, or A = ¬r(x) and θA1 = {x/a}.
For Γ and θB⁻¹ there are limited numbers of alternatives, but for θA2 there are not. The terms in θA2 can be any possible terms. Consequently, it is hard to choose θA2. Unfortunately there are examples when the choice of θA2 is crucial.
Example: Let R = (r←q), C1 = (p(x)←q), C2 = (s←p(a)) and D = (r←s). Then there is a linear derivation of R from D and {C1, C2}:
  (D; C2) ⊢R ((r←p(a)); C1) ⊢R (r←q).
Consequently, there is an inverse linear derivation of D from R and {C1, C2}:
  (R; C1) ⊢IR ((r←p(a)); C2) ⊢IR (r←s).
In the first inverse resolution step θA2 is chosen as {x/a}. With any other choice of θA2 the inverse linear derivation of D would not have been possible.
If R, C, A and θA1 are given, then it is desirable that a unique most specific inverse resolvent can be determined. Unfortunately, in Horn clause logic, this is not possible, due to the substitution θA2.
Example: Let R = (r←q) and C = (p(x)←q). If we seek the most specific clause D such that (R; C) ⊢IR D, then we let Γ = ∅ and θB⁻¹ = ∅, but what should θA2 be? If we let θA2 = ∅, the clause D1 = (r←p(x),q) is obtained. For example, the clauses D2 = (r←p(a),q) and D3 = (r←p(b),q) are more specific than D1, but neither D2 nor D3 is more specific than the other. Consequently, there is no unique most specific inverse resolvent.

3    Extended Inverse Resolution

Our inverse resolution method (see section 4) implies some extensions to the framework of inverse resolution. After these extensions the choices of θA2, Γ and θB⁻¹ in inverse linear derivations can be postponed, and the most specific inverse resolvent can be determined.
3.1 Existentially Quantified Variables
To postpone the choice of SA2, existentially quantified
variables will temporarily be introduced. Any sentence, in
which the existentially quantified variables are replaced by
Skolem functions, is equal to the original sentence with
respect to satisfiability [Genesereth and Nilsson 1987].
Therefore the existentially quantified variables will be
represented by Skolem functions. As a consequence of the
introduction of existentially quantified variables (Skolem
functions), some additional types of substitutions are
needed.
A Skolem function is a term f(x1, ..., xn) where f is a new function symbol and x1, ..., xn are the variables associated with the enclosing universal quantifiers.
A Skolem substitution is a finite set of the form {v1/k1, ..., vn/kn}, where each vi is a variable, each ki is a Skolem function, and the variables v1, ..., vn are distinct.
An inverse Skolem substitution is a finite set of the form {k1/v1, ..., kn/vn}, where each ki is a Skolem function, each vi is a new variable, and the Skolem functions k1, ..., kn are distinct.
Let σ = {x1/k1, ..., xn/kn} be a Skolem substitution and σ⁻¹ = {k1/y1, ..., kn/yn} an inverse Skolem substitution such that the Skolem functions in σ and σ⁻¹ are exactly the same. Then the composition σσ⁻¹ of σ and σ⁻¹ is a renaming substitution {x1/y1, ..., xn/yn} for any expression E.
An existential substitution is a finite set of the form {k1/t1, ..., kn/tn}, where each ki is a Skolem function (existentially quantified variable), each ti is a term (possibly a Skolem function) distinct from ki, and the Skolem functions k1, ..., kn are distinct. While a substitution or a Skolem substitution corresponds to a specialization, an existential substitution corresponds to a generalization.
As an inverse substitution, an inverse existential substitution is specified with respect to an expression E. An inverse existential substitution is a finite set of the form {(t1, {p1,1, ..., p1,m1})/k1, ..., (tn, {pn,1, ..., pn,mn})/kn} where each ki is a Skolem function distinct from the Skolem functions in E, each ti is a term distinct from ki, the Skolem functions k1, ..., kn are distinct, each pi,j is a place at which ti is found within E, and the places p1,1, ..., pn,mn are distinct. An inverse existential substitution is applied by replacing all ti at places {pi,1, ..., pi,mi} in E by ki.
Let σ = {v1/k1, ..., vn/kn} be a Skolem substitution and η = {k1/t1, ..., kn/tn} an existential substitution such that the Skolem functions in σ and η are exactly the same. Then the composition ση of σ and η is the substitution {v1/t1, ..., vn/tn}. In this way Skolem substitutions and existential substitutions can be used to postpone the choice of θA2.
3.2 Most Specific Inverse Resolution
To postpone the choice of Γ, the notion of optional literals will be used. A clause {B1, ..., Bk, Bk+1, ..., Bn}, in which the literals {Bk+1, ..., Bn} are optional, is denoted C[c] = {B1, ..., Bk, [Bk+1, ..., Bn]} where C = {B1, ..., Bk} and c = {Bk+1, ..., Bn}. Consequently, if c = ∅ then C[c] = C.
Example: Let R = (p←q,r,s) and C = (t←q,r,s) be two clauses. Then (R; C) ⊢IR D, where D = (p←t,q,r,s) − Γ and Γ ⊆ {¬q, ¬r, ¬s}. All these alternatives for D can be described in a compact way by using optional literals. Thus, D[d] = (p←t,[q,r,s]).
The definition of inverse resolved can now be modified in such a way that the choices of θA2, Γ and θB⁻¹ are postponed. The clause D is most specific inverse resolved from R[r] (which may include Skolem functions) and C, denoted (R[r]; C) ⊢↓IR D, if the following conditions hold:
  (a) A is a literal in C.
  (b) θA1 is a substitution whose variables are variables that occur both in A and (C−{A}).
  (c) η⁻¹ is an inverse existential substitution whose terms are terms that occur in (C−{A}).
  (d) (C−{A})θA1η⁻¹ is a subset of R[r].
  (e) σ is a Skolem substitution whose variables are all the variables that only occur in A.
  (f) D[d] is the clause D = ((R−Γ) ∪ {A}θA1σ), d = Γ ∪ r, where Γ = (C−{A})θA1η⁻¹.
The clause D ∪ d is called a most specific inverse resolvent of R and C.
Given R and C, there are only two sources of indeterminacy, namely: A and θA1. Consequently, given R, C, A and θA1 there is a unique most specific inverse resolvent D ∪ d.
Example: Let R = (r←q) and C = (p(x)←q). Then the unique most specific inverse resolvent of R and C is the clause D ∪ d = (r←p(k),q) where k is a Skolem function (representing an existentially quantified variable). This is true, since ∀x(r←p(x),q) ⊨ (r←p(t),q), and (r←p(t),q) ⊨ ∃x(r←p(x),q), for any term t.
Let D0[d0] be a definite program clause and P a definite program. A most specific inverse linear derivation from D0[d0] and P consists of a sequence D0[d0], D1[d1], ... of definite program clauses and a sequence C1, C2, ... of variants of definite program clauses in P such that each Di+1[di+1] is most specific inverse resolved from Ci+1 and Di[di]. A most specific inverse linear derivation of Dk[dk] from D0[d0] and P is denoted:
  (D0[d0]; C1) ⊢↓IR (D1[d1]; C2) ⊢↓IR ... ⊢↓IR Dk[dk]
or for short (D0[d0]; P) ⊢↓IR* Dk[dk].
Each result of an inverse linear derivation can be obtained
from the result of some most specific inverse linear
derivation, if we apply an inverse substitution, an existential
substitution, and drop a subset of the optional literals.
Example: Suppose we have the following clauses R = (r←q), C1 = (p(x)←q), C2 = (s←p(a)), C3 = (t(b)←p(b)), D = (r←s,t(x),p(c)) and D'[d'] = (r←s,t(b),[p(k),q]). Then
  (R; {C1, C2, C3}) ⊢IR* D and
  (R; {C1, C2, C3}) ⊢↓IR* D'[d'].
The clause D can be obtained from D'[d'] by application of the inverse substitution {(b, {<3,1>})/x} and the existential substitution {k/c}, and by dropping the optional literal q. The most specific inverse linear derivation of D'[d'] looks as follows:
  (R; C1) ⊢↓IR ((r←p(k),[q]); C2) ⊢↓IR
  ((r←s,[p(k),q]); C3) ⊢↓IR (r←s,t(b),[p(k),q]).
That η⁻¹, in the two last steps, maps the terms a and b respectively to k, and that k then can be replaced by a third term c, may seem inconsistent, but it is not. Consider the corresponding inverse linear derivation of D from R and {C1, C2, C3}:
  ((r←q); C1) ⊢IR ((r←q,p(c)); C1) ⊢IR
  ((r←q,p(b),p(c)); C1) ⊢IR ((r←p(a),p(b),p(c)); C2) ⊢IR
  ((r←s,p(b),p(c)); C3) ⊢IR (r←s,t(x),p(c)).
Note that since k has been used as three different terms (a, b and c) in the most specific inverse linear derivation, three inverse resolution steps are needed to compensate for the step where k is introduced. Note also that θA2 = {x/c} in the first, θA2 = {x/b} in the second and θA2 = {x/a} in the third inverse resolution step. To choose exactly those substitutions is hard, but in a most specific inverse linear derivation it is not necessary.
3.3 Truncation Generalization
A clause C1 θη-subsumes a clause C2 if there exists a substitution θ and an existential substitution η such that C1θ ⊆ C2η. If C1 θη-subsumes C2 then C1 ⊨ C2.
To perform a θη-truncation is to apply some arbitrary existential substitution η, apply some arbitrary inverse substitution θ⁻¹, and drop some arbitrary literals. The generalization technique θη-truncation corresponds to θη-subsumption.
Let P be a definite program (an incomplete theory) and E a definite program clause which should but does not follow from P (P ⊭ E), let 𝔻 be the set of definite program clauses D such that (E; P) ⊢IR* D, and let ℍ be the set of definite program clauses H such that P ∪ {H} ⊨ E. Since resolution is not complete [Robinson 1965], 𝔻 is a subset of ℍ (𝔻 ⊆ ℍ). In particular, each definite program clause D' that θη-subsumes some clause D, where D ∈ 𝔻, will be in ℍ. This is true since P ∪ {D} ⊨ E, and D' ⊨ D, gives us P ∪ {D'} ⊨ E. Consequently, we can perform any θη-truncation on the result D of a most specific inverse linear derivation and still have an inductive conclusion.
4 The Method
In this section a method, which realizes inverse linear derivations in an easy way, will be described. Instead of performing an inverse linear derivation from the example clause E, a variant of ordinary resolution derivation is performed from the complement Ē of E.
4.1 Complement
A definite program clause complement set (dpcc-set) is a set of clauses containing exactly one unit goal and a number of unit clauses.
Let C be a definite program clause (A←B1, ..., Bn), σ⁻¹ an inverse Skolem substitution including all Skolem functions in C, and σ a Skolem substitution including all the universally quantified variables in C. Then the complement C̄ of C is the definite program clause complement set {(←A), (B1←), ..., (Bn←)}σ⁻¹σ. Let S be a dpcc-set {(←A), (B1←), ..., (Bn←)}, σ⁻¹ an inverse Skolem substitution including all Skolem functions in S, and σ a Skolem substitution including all the universally quantified variables in S. Then the complement S̄ of S is the definite program clause (A←B1, ..., Bn)σ⁻¹σ. Thus, the complement of a dpcc-set is a definite program clause and vice versa.
Example: Let C be the clause (p(a,x)←q(k,x,y)). Then the complement C̄ of C is the definite program clause complement set {(←p(a,kx)), (q(xk,kx,ky)←)}, which is obtained by application of the inverse Skolem substitution {k/xk} and the Skolem substitution {x/kx, y/ky} on the set of clauses {(←p(a,x)), (q(k,x,y)←)}. The complement of C̄ is the definite program clause (p(a,x')←q(k',x',y')), which is obtained by application of the inverse Skolem substitution {kx/x', ky/y'} and the Skolem substitution {xk/k'} on the clause (p(a,kx)←q(xk,kx,ky)). This clause is a variant of C, since it equals Cθ where θ is the renaming substitution {x/x', y/y'}.
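A rough operational rendering of this complement operation in Prolog (the representation clause(Head, Body) and the predicate name complement/2 are ours; numbervars/3 plays the role of Skolemization by grounding every variable with a distinct constant, and any Skolem functions already present in the clause are simply left untouched in this simplification):

    % complement(+clause(Head, Body), -DpccSet): DpccSet is a list holding
    % one goal (goal/1) and one unit clause (unit/1) per body atom, with
    % all variables of the input clause replaced by fresh '$VAR'(N)
    % constants acting as Skolem constants.
    complement(clause(Head, Body), [goal(GHead) | Units]) :-
        copy_term(clause(Head, Body), clause(GHead, GBody)),
        numbervars(clause(GHead, GBody), 0, _),
        findall(unit(B), member(B, GBody), Units).

    % ?- complement(clause(p(a, X), [q(k, X, Y)]), S).
    % S = [goal(p(a, '$VAR'(0))), unit(q(k, '$VAR'(0), '$VAR'(1)))].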
4.2 Clause Set Resolution
The notion of optional clauses will be used in a similar way as optional literals. A set of clauses {C1, ..., Ck, Ck+1, ..., Cn}, in which the clauses {Ck+1, ..., Cn} are optional, is denoted S[s] = {C1, ..., Ck, [Ck+1, ..., Cn]} where S = {C1, ..., Ck} and s = {Ck+1, ..., Cn}. Consequently, if s = ∅ then S[s] = S.
An elementary clause set L is a set of clauses containing at most one clause, that is L = ∅ or L = {C} where C is a clause.
Let Si[si] be a clause set and L an elementary clause set. Then Si+1[si+1] is clause set resolved from Si[si] and L, denoted (Si[si]; L) ⊢CSR Si+1[si+1], if the following conditions hold:
  (a) C' is a variant of a clause C in Si[si] ∪ L.
  (b) D is a clause in Si[si].
  (c) R is a resolvent of C' and D.
  (d) Δ is the elementary clause set of unit clauses in {C, D}.
  (e) Si+1[si+1] is the clause set Si+1 = (Si − {C, D}) ∪ {R}, si+1 = si ∪ Δ.
If D is a definite goal then R will also be a definite goal, and we say that Si+1[si+1] is backwardly clause set resolved from Si[si] and L. If both C and D are definite program clauses then R will also be a definite program clause, and we say that Si+1[si+1] is forwardly clause set resolved from Si[si] and L.
Let S0[s0] be a clause set and P a definite program. A clause set derivation from S0[s0] and P consists of a sequence S0[s0], S1[s1], ... of clause sets, and a sequence L1, L2, ... of elementary clause sets, such that each Li is a subset of P and each clause set Si+1[si+1] is clause set resolved from Si[si] and Li+1. A clause set derivation of Sk[sk] from S0[s0] and P is denoted:
  (S0[s0]; L1) ⊢CSR (S1[s1]; L2) ⊢CSR ... ⊢CSR Sk[sk] or (S0[s0]; P) ⊢CSR* Sk[sk].
A backward clause set derivation is a clause set derivation where each Si[si] is backwardly clause set resolved, and a forward clause set derivation is a clause set derivation where each Si[si] is forwardly clause set resolved.
Example: Let S0 = {(←p(k)), (q(k)←), (r(k)←)} and C = (p(x)←r(x),s(x)). Then we have the following backward clause set derivation:
  (S0; {C}) ⊢CSR ({(←r(k),s(k)), (q(k)←), (r(k)←)}; ∅) ⊢CSR
  {(←s(k)), (q(k)←), [(r(k)←)]}.
4.3 The Algorithm
Let P be a definite program and E a definite program clause which should but does not follow from P (P ⊭ E). Our algorithm to produce an inductive conclusion H looks as follows.

Completion of Refutation Proof Algorithm:
1. Compute the complement Ē of E, which is a dpcc-set.
2. Perform a clause set derivation from P and Ē of a dpcc-set H̄'[h̄'].
3. Compute the complement H'[h'] of H̄'[h̄'], which is a definite program clause.
4. Perform a θη-truncation of H'[h'] to obtain H.
The generalization performed in steps 1-3 is called a reformulation generalization, which in fact is equivalent to performing a most specific inverse linear derivation.
Reconsider the definition of most specific inverse resolved in section 3. Let {A1, ..., Am} = C−{A} and {B1, ..., Bn} = R−{A1, ..., Am}θA1η⁻¹. Then the clause D[d] = {B1, ..., Bn} ∪ {A}θA1σS2 ∪ [{A1, ..., Am}θA1η⁻¹] is most specific inverse resolved from R = {A1, ..., Am}θA1η⁻¹ ∪ {B1, ..., Bn} and C = {A} ∪ {A1, ..., Am}.
The corresponding reformulation generalization looks as follows:
1. The complement R̄ of R is the dpcc-set
  ({{A1}, ..., {Am}}θA1η⁻¹ ∪ {{B1}, ..., {Bn}})σS1⁻¹σR
where σS1⁻¹ is an inverse Skolem substitution including all Skolem functions in R and σR is a Skolem substitution including all universally quantified variables in R.
2. The following clause set derivation is performed:
  (R̄; {C}) ⊢CSR* D̄[d̄] where
  D̄ = ({{B1}, ..., {Bn}} ∪ {{A}θA1} and
  d̄ = [{{A1}, ..., {Am}}θA1η⁻¹])σS1⁻¹σR.
3. The complement D[d] of D̄[d̄] is the definite program clause
  ({B1, ..., Bn} ∪ {A}θA1σS2 ∪ [{A1, ..., Am}θA1η⁻¹])θ
where σS2 is a Skolem substitution including all universally quantified variables in D̄[d̄]σS1 and θ is the renaming substitution θ = σS1⁻¹ [...]
[...] by application of the inverse substitution [...]/x} and the existential substitution {kw/c}, and by dropping the optional literals, D1' = (r(x,z')←s(c,z')) is obtained.
If the last negative literal in D1' also is dropped, then D2' = (r(x,z')←) is obtained.
Steps 2 and 4 in the completion of refutation proof
algorithm are indeterministic. The use of a preference bias
can make them deterministic. Such a preference bias must
specify which clause set is the most preferable result of the
clause set derivation (reformulation bias), and which
generalization should be done in the θη-truncation (truncation bias).
The algorithm is implemented in a system, called CRP1, in which a depth first search is used to find the best dpcc-set H̄'[h̄'] according to some given preference bias.
4.4 Integrating Top down and Bottom up
Induction
Backward inverse linear derivations correspond to top down
induction, and forward inverse linear derivations correspond
to bottom up induction. In our method, backward clause set
derivations correspond to top down induction, and forward
clause set derivations correspond to bottom up induction.
Each step in a clause set derivation can be either backwardly
or forwardly clause set resolved. Consequently, in our
method (and in the system CRP1) top down and bottom up
induction are completely integrated.
Example: Let E = (p←q,t,u) and P = {(p←q,r), (s←t,u)}. Then the inductive conclusion H1 = (r←t,u) is inferable by top down induction (backward inverse linear derivation), but not by bottom up induction (forward inverse linear derivation). The inductive conclusion H2 = (p←q,s) is inferable by bottom up induction, but not by top down induction. The inductive conclusion H3 = (r←s) can only be inferred by a method that combines top down and bottom up induction. With our algorithm the clause H3 is constructed as follows:
1. The complement Ē of E is the dpcc-set {(←p), (q←), (t←), (u←)}.
2. The following clause set derivation is performed:
  (Ē; {(p←q,r)}) ⊢CSR
  ({(←q,r), (q←), (t←), (u←)}; ∅) ⊢CSR
  ({(←r), (t←), (u←), [(q←)]}; {(s←t,u)}) ⊢CSR
  ({(←r), (s←u), (u←), [(t←), (q←)]}; ∅) ⊢CSR H̄3[h̄3]
where H̄3[h̄3] = {(←r), (s←), [(u←), (t←), (q←)]}.
3. The complement H3[h3] of H̄3[h̄3] is the definite program clause (r←s,[u,t,q]).
4. By dropping the optional literals H3 = (r←s) is obtained.
The first two steps in the clause set derivation are
backwardly clause set resolved (top down induction) and the
last two steps are forwardly clause set resolved (bottom up
induction).
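Condition (b) of the problem statement, P ∪ {H3} ⊨ E, is easy to check operationally for this example: load P and H3 as a Prolog program, assert the body atoms of E as facts, and ask for its head (a minimal sketch under the obvious propositional reading).

    p :- q, r.        % P
    s :- t, u.        % P
    r :- s.           % the induced clause H3
    q.                % body atoms of E, asserted as facts
    t.
    u.

    % ?- p.            % succeeds, so the head of E follows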
5 Concluding Remarks
Some extensions to the inverse resolution framework and a
new inverse resolution method have been presented. This
method subsumes the previous methods based on inverse
resolution and completely integrates top down and bottom
up induction.
Reconsider the definition of inverse resolved in section 2. If we let A be a positive literal, θA2 = ∅ and Γ = (C−{A})θA, then it is a definition of the absorption operator [Muggelton and Buntine 1988]. If we let A be a positive literal, θA2 = ∅, Γ = ∅ and θB⁻¹ = ∅, then it is a definition of elementary saturation [Rouveirol and Puget 1990]. The saturation operator [Rouveirol and Puget 1990] is equal to an exhaustive forward inverse linear derivation, in which each step is restricted according to elementary saturation. If we let A be a positive literal, θA2 = ∅, Γ = (C−{A})θA and θB⁻¹ = ∅, then it is a definition of the learning procedure called generalize in [Banerji 1991]. If we let A be a negative literal, θA2 = ∅ and Γ = (C−{A})θA, then it is a definition of the identification operator [Muggelton and Buntine 1988].
Since our method performs inverse linear derivations without any restrictions on A, θA2, Γ or θB⁻¹, all the methods mentioned above can be seen as special cases of our method.
Our notion of optional literals is the same as in [Rouveirol and Puget 1990]. Our θη-truncation is similar to the truncation generalization in [Rouveirol and Puget 1990] and the truncation operator in [Muggelton and Buntine 1988], which both correspond to θ-subsumption.
Wirth [Wirth 1989] and Rouveirol [Rouveirol 1991] have
both pointed out the advantages of combining top down and
bottom up induction. In [Wirth 1989], a system called LFP2,
which uses both top down and bottom up induction is
presented. However, the different induction strategies are
separated into different parts of the system. The first part
(top down) is based on completion of partial proof trees,
while the second part (bottom up) is based on operators
performing inverse resolution. The second part uses the
result from the first part, and different types of bias are used
in the different parts. Our method has the major advantage
that the two different induction strategies are completely
integrated, which not only eliminates the restrictions that
they imply when separated, but also makes possible the use
of an overall preference bias.
The main contributions of this research are:
1. A complete integration of top down and bottom up
induction.
2. Introduction of existentially quantified variables, which
makes it possible to uniquely determine the most
specific inverse resolvent.
3. A method to perform inverse resolution for full Horn
clause logic by using resolution.
References

[Ali 1989] K. M. Ali, "Augmenting Domain Theory for
Explanation Based Generalization" in Proceedings of the
6th International Workshop on Machine Learning,
Morgan Kaufmann, 1989.
[Banerji 1991] R. B. Banerji, "Learning Theoretical Terms"
in Proceedings of International Workshop on Inductive
Logic Programming, 1991.
[Dejong and Mooney 1986] G. Dejong and Mooney,
"Explanation-Based Learning: An Alternative View" in
Machine Learning 1: 145-176, 1986.
[Genesereth and Nilsson 1987] Nilsson and Genesereth,
Logic Foundations of Artificial Intelligence, Morgan
Kaufmann, 1987.
[Hall 1988] R. J. Hall, "Learning by Failing to Explain:
Using Partial Explanations to Learn in Incomplete or
Intractable Domains" in Machine Learning 3: 45-77,
1988.
[Lloyd 1987] J. W. Lloyd, Foundations of Logic
Programming (second edition), Springer-Verlag, 1987.
[Mitchell et al. 1986] T. M. Mitchell, S. Kedar-Cabelli and
R. Keller, "Explanation-Based Generalization: A
Unifying View" in Machine Learning 1: 47-80, 1986.
[Muggleton and Buntine 1988] S. Muggleton and W.
Buntine, "Machine Invention of First-order Predicates by
Inverting Resolution" in Proceedings of the 5th
International Conference on Machine Learning, Morgan
Kaufmann, 1988.
[Robinson 1965] J. Robinson, "A Machine-oriented Logic
Based on the Resolution Principle" in Journal of ACM
12(1), 1965.
[Rouveirol 1990] C. Rouveirol, "Saturation: Postponing
Choices when Inverting Resolution" in Proceedings of
the 9th European Conference on Artificial Intelligence,
Pitman, 1990.
[Rouveirol 1991] Céline Rouveirol, "ITOU: Induction of
First Order Theories" in Proceedings of International
Workshop on Inductive Logic Programming, 1991.
[Rouveirol and Puget 1990] C. Rouveirol and J. F. Puget,
"Beyond Inversion of Resolution" in Proceedings of the
7th International Conference on Machine Learning,
Morgan Kaufmann, 1990.
[Sammut and Banerji 1986] C. Sammut and R. Banerji,
"Learning concepts by asking questions" in Michalski,
Carbonell and Mitchell (eds), Machine Learning: an
artificial intelligence approach volume 2, Morgan
Kaufmann, 1986.
[Wirth 1988] R. Wirth, "Learning by Failure to Prove" in
Proceedings of the 3rd European Working Session on
Learning, Pitman, 1988.
[Wirth 1989] R. Wirth, "Completing Logic Programs by
Inverse of Resolution" in Proceedings of the 4th
European Working Session on Learning, Pitman, 1989.
A Machine Discovery from Amino Acid Sequences
by Decision Trees over Regular Patterns
Setsuo Arikawa†    Satoru Kuhara‡    Satoru Miyano†
Yasuhito Mukouchi††    Ayumi Shinohara†    Takeshi Shinohara‡‡

† Research Institute of Fundamental Information Science, Kyushu University 33, Fukuoka 812, Japan.
‡ Graduate School of Genetic Resources Technology, Kyushu University 46, Fukuoka 812, Japan.
†† Department of Information Systems, Kyushu University 39, Kasuga 816, Japan.
‡‡ Department of Artificial Intelligence, Kyushu Institute of Technology, Iizuka 820, Japan.
Abstract
This paper describes a machine learning system that discovered a "negative motif", in transmembrane domain identification from amino acid sequences, and reports its experiments on protein data using the PIR database. We introduce a decision tree whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree for a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system finds reasonable hypotheses very successfully. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.
1    Introduction
Hydrophobic transmembrane domains can be identified by a very simple decision tree over regular patterns. This result was discovered by the machine learning system we developed. The system takes some training sequences of positive and negative examples, and produces a hypothesis explaining them. When a small number of positive and negative examples of transmembrane domains were given as input, our system found a small decision tree over regular patterns as a hypothesis. Although the hypothesis is made from just 10 positive and 10 negative examples, it can explain all data in the PIR database [PIR] with high accuracy of more than 90%. The hypothesis exhibits that "two consecutive polar amino acids" (Arg, Lys, His, Asp, Glu, Gln, Asn) are not included in the transmembrane domains. This indicates that significant motifs are not in the inside of the transmembrane domains but in the outside. We call such motifs "negative motifs."

Email addresses:
arikawa@rifis.sci.kyushu-u.ac.jp
kuhara@grt.kyushu-u.ac.jp
miyano@rifis.sci.kyushu-u.ac.jp
mukouchi@rifis.sci.kyushu-u.ac.jp
ayumi@rifis.sci.kyushu-u.ac.jp
shino@donald.ai.kyutech.ac.jp
This paper describes a machine learning system together with a background theory that discovered such negative motifs, and reports its experiments on knowledge acquisition from amino acid sequences that reveal the importance of negative data. Traditional approaches to motif-searching are to find subsequences common to functional domains by various alignment techniques. Hence the eyes are focused only on positive examples, and negative examples are mostly ignored. Our approach by decision trees over regular patterns provides a new direction and method for discovering motifs.
A regular pattern [Shinohara 1982, Shinohara 1983] is an expression w0 x1 w1 x2 ··· xn wn that defines the sequences containing w0, w1, ..., wn in this order, where each wi is a sequence of symbols and each xi varies over arbitrary sequences. Regular patterns have been used to describe some features of amino acid sequences in the PROSITE database [Bairoch 1991] and DNA sequences [Arikawa et al. 1992, Gusev and Chuzhanova 1990]. Our view to these sequences is through such regular patterns. A decision tree over regular patterns is a tree which describes a decision procedure for determining the class of a given sequence. Each node is labeled with either a class name (1 or 0) or a regular pattern. At a node with a regular pattern, the decision tree tests if the sequence matches the pattern or not. Starting from the root toward a leaf, the decision procedure makes a test at each node and goes down by choosing the left or right branch according to the test result. The reached leaf answers the class name of the sequence. Such decision trees are produced as hypotheses by our machine learning system. Since the system searches a decision tree of smaller size, regular patterns on the resulting decision tree exhibit motifs which play a significant role in classification. Hence, compared with neural network approaches [Holly and Karplus 1989, Wu et al.], our system shows important motifs in a hypothesis more explicitly.
We employ the idea of the ID3 algorithm [Quinlan 1986, Utgoff 1989] for constructing a decision tree since it is sufficiently fast and experiments show that small enough trees are usually obtained. We also devise a new method for constructing a decision tree over regular patterns using another evaluation function. Given two sets of positive and negative examples, our machine learning system finds appropriate regular patterns as node attributes dynamically during the construction of the decision tree. Hence, unlike ID3, we need not assume any concrete knowledge about attributes and can avoid the struggle of defining the attributes of a decision tree beforehand. Our system makes a decision tree just from a small number of training sequences, which we also justify, in some sense, with PAC learning theory [Valiant 1984]. Therefore it may cope with a diversity of classification problems for proteins and DNA sequences.
We made an experiment on raw sequences over the twenty symbols of amino acid residues. The system discovered, just from 20 sequences, a small decision tree with more than 85% accuracy showing that if a sequence contains neither E nor D (both are polar amino acids) then it is very likely to be a transmembrane domain.
A hydropathy plot [Engelman et al. 1986, Kyte and Doolittle 1982, Rao and Argos 1986] has been used generally to predict transmembrane domains from primary sequences. With this knowledge, we first transform the twenty amino acids into three categories (*, +, -) according to the hydropathy index of Kyte and Doolittle [1982]. From 10 randomly chosen positive and 10 negative training examples, our system has successfully produced some small decision trees over regular patterns which are shown to achieve very high accuracy. The regular patterns appearing in these decision trees indicate that two consecutive polar amino acid residues are important negative motifs for transmembrane domains. From the viewpoint of Artificial Intelligence, it is quite interesting that the polar amino acid residues D and E were found by our machine learning system without any knowledge of the hydropathy index.
After knowing the importance of negative motifs, we examined decision trees with a single node with regular patterns x1-x2-···-xn for n ≥ 3. The best is the pattern x1-x2-x3-x4-x5-x6, which gives the sequences containing at least five polar amino acids. The result is very acceptable: the accuracy is 95.4% for positive and 95.0% for negative examples, although it had been believed difficult to define transmembrane domains by such a simple expression as long as the viewpoint was focused only on positive examples.
2 Decision Trees over Regular Patterns
Let Σ be a finite alphabet and X = {x, y, z, x1, x2, ...} be a set of variables. We assume that Σ and X are disjoint. A pattern is an element of (Σ ∪ X)+, the set of all nonempty strings over Σ ∪ X. For a pattern π, the language L(π) is the set of strings obtained by substituting a string in Σ* for each variable in π. We say that a pattern π is regular if each variable occurs at most once in π. For example, xaybza is a regular pattern, but xx is not. Obviously, regular patterns define regular languages, but not vice versa. In this paper we consider only regular patterns. A regular pattern containing at most k variables is called a k-variable regular pattern.
A decision tree over regular patterns is a binary tree such that the leaves are labeled with 0 or 1 and each internal node is labeled with a regular pattern (see Figure 1). For an internal node v, we denote the left and right children of v by left(v) and right(v), respectively. We denote by π(v) the regular pattern assigned to the internal node v. For a leaf u, value(u) denotes the value 0 or 1 assigned to u. The depth of a tree T, denoted by depth(T), is the length of the longest path from the root to a leaf.
For a decision tree T over regular patterns, we define a function f_T : Σ* → {0, 1} as follows. For a string w in Σ*, we determine a path from the root to a leaf and define the value f_T(w) by the following algorithm:

begin
  /* Input: w ∈ Σ* */
  v ← root;
  while v is not a leaf do
    if w ∈ L(π(v)) then v ← right(v)
    else v ← left(v);
  f_T(w) ← value(v)
end
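As a concrete illustration (not part of the original system; the function names are ours), the following Python sketch represents a pattern by the list of its constant parts and assumes patterns that begin and end with variables, as the patterns of the form xay used in Section 4 do:

import re

def matches(constant_parts, s):
    # s is in the language of the pattern iff it contains the constant parts
    # in this order, with arbitrary (possibly empty) strings in between.
    regex = ".*".join(re.escape(w) for w in constant_parts)
    return re.search(regex, s) is not None

def f_T(tree, s):
    # A tree is either a leaf (the integer 0 or 1) or a triple
    # (constant_parts, left_subtree, right_subtree); go right on a match.
    while not isinstance(tree, int):
        pattern, left, right = tree
        tree = right if matches(pattern, s) else left
    return tree

# Single-node tree with the negative motif "--" discussed in Section 4:
example_tree = (["--"], 1, 0)
print(f_T(example_tree, "***+**--*"))   # 0: contains two consecutive polar residues
print(f_T(example_tree, "***+***"))     # 1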
For a decision tree T over regular patterns, we define L(T) = {w ∈ Σ* | f_T(w) = 1}. It is easy to see that L(T) is also a regular language, but the converse is not true. Let L = {a^{2n} | n ≥ 1}. It is straightforward to show that there is no decision tree T over regular patterns with L = L(T). The same holds for the language {a^{2n} b | n ≥ 1}.
3 Constructing Decision Trees
This section gives two kinds of algorithms for constructing decision trees over regular patterns that are used in our machine learning system.
The first algorithm employs the idea of the ID3 algorithm [Quinlan 1986] in the construction of decision trees. The ID3 algorithm assumes data together with explicit attributes in advance. On the other hand, our approach assumes a space of regular patterns which is simply generated from the given positive and negative examples.
Figure 1: A decision tree over regular patterns defining the language {a^m b^n a^l | m, n, l ≥ 1} over Σ = {a, b}
function DT1( P, N : sets of strings ): node;
begin
  if N = ∅ then
    return( CREATE("1", null, null) )
  else if P = ∅ then
    return( CREATE("0", null, null) )
  else begin
    Find a shortest pattern π in Π(P, N) that minimizes E(π, P, N);
    P1 ← P ∩ L(π); P0 ← P − P1;
    N1 ← N ∩ L(π); N0 ← N − N1;
    return( CREATE(π, DT1(P0, N0), DT1(P1, N1)) )
  end
end

Algorithm 1
No extra knowledge about the data is required. Although the space may be large and contain meaningless attributes, our algorithm finds appropriate regular patterns from this space dynamically during the construction of a decision tree in a feasible amount of time. This point makes the approach very well suited to our empirical research.
Let P and N be finite sets of strings with P ∩ N = ∅. Using P and N, we deal with regular patterns of the form w0 x1 w1 x2 ... xk wk such that w0, ..., wk are substrings of some strings in P ∪ N. Let Π(P, N) be some family of such regular patterns made from P and N. The family Π(P, N) is appropriately given and used as a space of attributes.
For a regular pattern π ∈ Π(P, N), the cost E(π, P, N) is the one defined in [Quinlan 1986] by

  E(π, P, N) = ((p1 + n1) / (|P| + |N|)) I(p1, n1) + ((p0 + n0) / (|P| + |N|)) I(p0, n0),

where p1 (resp. n1) is the number of positive examples in P (resp. negative examples in N) that match π, i.e., p1 = |P ∩ L(π)| and n1 = |N ∩ L(π)|, and p0 (resp. n0) is the number of positive examples in P (resp. negative examples in N) that do not match π, i.e., p0 = |P ∩ (Σ* − L(π))| and n0 = |N ∩ (Σ* − L(π))|, and

  I(x, y) = 0   (if x = 0 or y = 0)
  I(x, y) = −(x/(x+y)) log (x/(x+y)) − (y/(x+y)) log (y/(x+y))   (otherwise).

Figure 2: A leaf is replaced by (a) or (b) for some pattern π.
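Transcribed directly into Python (a sketch based on the reconstruction above; the names are ours), the cost of a candidate pattern is:

from math import log2

def info(x, y):
    # I(x, y): entropy of a split into x positive and y negative examples.
    if x == 0 or y == 0:
        return 0.0
    t = x + y
    return -(x / t) * log2(x / t) - (y / t) * log2(y / t)

def cost(p1, n1, p0, n0):
    # E(pi, P, N): expected information after testing pattern pi, where
    # (p1, n1) examples match the pattern and (p0, n0) do not.
    total = p1 + n1 + p0 + n0
    return ((p1 + n1) / total) * info(p1, n1) + ((p0 + n0) / total) * info(p0, n0)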
function DT2( P, N : sets of strings, MaxNode : int ): tree;
begin
  if N = ∅ then
    return( CREATE("1", null, null) )
  else if P = ∅ then
    return( CREATE("0", null, null) )
  else begin
    T ← CREATE("1", null, null);
    while ( nodes(T) < MaxNode and Score(T, P, N) < 1 ) do
    begin
      find Tmax ∈ T(T) that maximizes Score(Tmax, P, N);
      T ← Tmax
    end
  end;
  return( T )
end

Algorithm 2
The first algorithm, DT1(P, N) (Algorithm 1), sketches our decision tree algorithm for Π(P, N), where CREATE(π, T0, T1) returns a new tree with a root labeled with π whose left and right subtrees are T0 and T1, respectively.
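A direct Python rendering of Algorithm 1 might look like the sketch below (ours, not the authors' code); it reuses matches and cost from the earlier sketches, takes the candidate space Π(P, N) as an explicit non-empty list of patterns, and falls back to a majority leaf if no candidate separates the remaining examples:

def split(pattern, examples):
    # Partition examples into those matching the pattern and those not matching it.
    hit = [s for s in examples if matches(pattern, s)]
    miss = [s for s in examples if not matches(pattern, s)]
    return hit, miss

def dt1(P, N, patterns):
    # Algorithm 1 (sketch): grow a decision tree over regular patterns.
    if not N:
        return 1
    if not P:
        return 0
    def key(pat):                                  # low cost first, then short patterns
        p1, p0 = split(pat, P)
        n1, n0 = split(pat, N)
        return (cost(len(p1), len(n1), len(p0), len(n0)),
                sum(len(w) for w in pat))
    pat = min(patterns, key=key)
    P1, P0 = split(pat, P)
    N1, N0 = split(pat, N)
    if (not P1 and not N1) or (not P0 and not N0):
        return 1 if len(P) >= len(N) else 0        # nothing separates: majority leaf
    return (pat, dt1(P0, N0, patterns), dt1(P1, N1, patterns))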
The second algorithm uses a different evaluation function. For a decision tree T over regular patterns, let nodes(T) be the number of nodes in T, and T(T) be the set of trees constructed by replacing a leaf v of T by the tree of Fig. 2 (a) or Fig. 2 (b) for some pattern π.
The score function Score(T, P, N) balances the information gains in classification and is defined as

  Score(T, P, N) = (|P ∩ L(T)| / |P|) · (|N ∩ (Σ* − L(T))| / |N|).
The second algorithm, DT2(P, N, MaxNode) (Algorithm 2), checks all leaves at each phase of node generation using the evaluation function Score(T, P, N).
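In the same style, the score function (again using f_T from the earlier sketch) is:

def score(tree, P, N):
    # Score(T, P, N): fraction of P classified 1 times fraction of N classified 0.
    tp = sum(1 for s in P if f_T(tree, s) == 1)
    tn = sum(1 for s in N if f_T(tree, s) == 0)
    return (tp / len(P)) * (tn / len(N))

DT2 would then repeatedly replace the leaf whose expansion maximizes this score, while nodes(T) < MaxNode and the score is below 1.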
Algorithm 2 is slower than Algorithm 1 since all leaves are checked at each phase of node generation. However, Algorithm 2 constructs decision trees which are finely tuned when the size of the decision tree is large. Moreover, it is noise-tolerant, i.e., it allows conflicts between positive and negative training examples. If the size of Π(P, N) is polynomial with respect to the size of P and N, then these algorithms run in polynomial time.
4 Transmembrane Domain Identification
The problem of transmembrane domain identification is one of the most important protein classification problems, and some methods and experiments have been reported. For example, Hartmann et al. [1989] proposed a method using the hydropathy index for amino acid residues in [Kyte and Doolittle 1982]. The reported success rate is about 75%. Most approaches deal with positive examples, i.e., sequences corresponding to transmembrane domains, and try to find properties common to them.
The sequence in Figure 3 is an amino acid sequence of a membrane protein. There is a tendency to assume that a membrane protein contains several transmembrane domains, each of which consists of 20 ~ 30 amino acid residues. Therefore, if a sequence corresponding to a transmembrane domain is found in an amino acid sequence, it is very likely that the protein is a membrane protein.
Our idea for transmembrane domain identification is to use decision trees over regular patterns for classification. Algorithms 1 and 2 introduced in Section 3 are used to find good decision trees from positive and negative training examples. In order to avoid combinatorial explosion, we restrict the space of attributes to the regular patterns of the form xay, where x and y are variables and a is a substring taken from the given examples.
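In this restricted space a pattern xay is determined by its middle substring a alone, so the attribute space can be generated as in the following sketch (ours; the cap on the substring length is our own choice and is not stated in the paper):

def candidate_patterns(examples, max_len=6):
    # All patterns x a y where a is a substring (of length <= max_len) of some
    # example; each pattern is encoded by its single constant part, as above.
    substrings = set()
    for s in examples:
        for i in range(len(s)):
            for j in range(i + 1, min(i + max_len, len(s)) + 1):
                substrings.add(s[i:j])
    return [[a] for a in sorted(substrings)]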
In our experiments, a positive example is a sequence which is already known to be a transmembrane domain. A negative example is a sequence of length around 30 cut out from the parts other than transmembrane domains. The length 30 is simply due to the reasonable length of a transmembrane domain. From the PIR database our machine learning system randomly chooses two small sets P and N of positive and negative training examples, respectively. Then, at each trial, by using Algorithm 1 or Algorithm 2, the system tries to construct a small decision tree over regular patterns which classifies P and N exactly.
We have evaluated the performance ratio of a produced decision tree in the following way. As the total space of positive examples, we use the set POS of all transmembrane domain sequences (689 sequences) from the PIR database. The total space NEG of negative examples consists of 19256 negative examples randomly chosen from all proteins in PIR. The success rate of a decision tree for positive examples is the percentage of the positive examples from POS recognized as positive (class 1). The success rate for negative examples is counted as the percentage of the negative examples from NEG recognized as negative (class 0).
Figure 5 (a) is one of the smallest decision trees discovered by our system just from 10 positive and 10 negative raw sequences that achieve good accuracy. The performance ratio is (81.8%, 89.6%) for all data in POS and NEG, respectively. This decision tree suggests that if a sequence of length around 30 contains neither D nor E then it is very likely to be a part of a transmembrane domain.
The alphabet of amino acid sequences consists of twenty symbols. It has been shown that the use of the hydropathy index for amino acids is very successful [Arikawa et al. 1992, Hartmann et al. 1989]. According to the hydropathy index of [Kyte and Doolittle 1982], we transform these twenty symbols into three symbols as shown in Table 1. This transformation reduces the size of the search space drastically while, fortunately, little information is lost in classification.
Then, by this transformation table, the sequence in Figure 3 becomes the sequence in Figure 4.
Figure 5 (b), (c) show two of the best decision trees over regular patterns that our machine learning system found from 10 positive and 10 negative training examples. The decision tree (b) recognizes 91.4% of the positive examples and 94.8% of the negative examples. Even the decision tree of (c) can recognize 92.6% of the positive examples and 91.6% of the negative examples. The negative motif "--", which indicates consecutive polar amino acid residues, plays a key role in classification. This may have a close relation to the signal-anchor structure that consists of two parts, the hydrophobic part of a membrane-spanning sequence and the charged residues around the hydrophobic part [Lipp et al. 1989, von Heijne 1988].
The decision tree (a) also shows the importance of a cluster of polar amino acids in transmembrane domain identification, although our machine learning system assumed no knowledge about the hydropathy.
We examined how the performance of our machine learning system changes with respect to the number of training examples. The training examples are chosen randomly ten times in each case, and a point on the graph of Figure 6 is the average of these ten results for each case. Figure 6 shows the results. We may observe the following facts:
1. The hydropathy index of Kyte and Doolittle [Kyte and Doolittle 1982] is very useful. When indexed sequences are used, the system can produce from 40 positive and 40 negative examples a decision tree with only several nodes whose accuracy is more
MDVVNQLVAGGQFRVVKE(PLGFVKVLQWVFAIFAFATCGSY)TGELRLSVECANKTESALNIEVEFEYPFRLHQVYFDA
PSCVKGGTTKIFLVGDYSSSAE(FFVTVAVFAFLYSMGALATYIFL)QNKYRENNK(GPMMDFLATAVFAFMWLVSSSAW
A)KGLSDVKMATDPENIIKEMPMCRQTGNTCKELRDPVTS(GLNTSVVFGFLNLVLWVGNLWFVF)KETGWAAPFMRAPP
GAPEKQPAPGDAYGDAGYGQGPGGYGPQDSYGPQGGYQPDYGQPASGGGGYGPQGDYGQQGYGQQGAPTSFSNQM
Figure 3: An amino acid sequence which contains four transmembrane domains shown by the parenthesized parts.
Amino Acids       Hydropathy Index    New Symbol
A M C F L V I     1.8 ~ 4.5           *
P Y W S T G       -1.6 ~ -0.4         +
R K D E N Q H     -4.5 ~ -3.2         -

Table 1: Transformation rules
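The transformation of Table 1 is a per-residue lookup; the following sketch (ours) reproduces it and can be checked against the sequences of Figures 3 and 4:

HYDROPATHY_SYMBOL = {}
HYDROPATHY_SYMBOL.update({aa: "*" for aa in "AMCFLVI"})   # hydrophobic
HYDROPATHY_SYMBOL.update({aa: "+" for aa in "PYWSTG"})    # intermediate
HYDROPATHY_SYMBOL.update({aa: "-" for aa in "RKDENQH"})   # polar

def to_three_symbols(sequence):
    # Rewrite an amino acid sequence over the three-symbol alphabet of Table 1.
    return "".join(HYDROPATHY_SYMBOL[aa] for aa in sequence)

print(to_three_symbols("MDVVNQLVAGGQFRVVKE"))   # *-**--***++-*-**--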
*-**--***++-*-**--(+*+**-**-+********+*+++)++-*-*+*-**--+-+**-*-*-*-++*-*--*+*-*
++**-++++-****+-++++*-(***+*******++*+***++***)---+-----(++**-***+******+**+++*+
)-+*+-*-**+-+--**--*+**--++-+*--*--+*++(+*-++***+**-***+*+-*+***)--+++**+**-*++
+*+---+*++-*++-*+++-+++++++--++++-+++-+-++-+*++++++++-+-++--+++--+*+++*+--*
Figure 4: The sequence obtained by the transformation
(a) (84.8%, 89.6%)    (b) (91.4%, 94.8%)    (c) (92.6%, 91.6%)
Figure 5: The node label, for example, --, is an abbreviation of x1--x2, which tests whether a given sequence contains the substring --. The leaf label 1 (resp. 0) is the class name of transmembrane domains (resp. non-transmembrane domains). The total space consists of 689 positive examples and 19256 negative examples. Each of the decision trees (a)-(c) is constructed from 10 positive and 10 negative training sequences. The pair [p%, n%] attached to a leaf shows that p% of positive examples and n% of negative examples have reached the leaf. The pair (p%, n%) means that p% of the 689 positive (resp. n% of the 19256 negative) examples are recognized as transmembrane domains (resp. non-transmembrane domains).
than 90% for the total space on average. On the other hand, for raw sequences the accuracy is not so good, but both accuracies approach the same line as the number of training examples increases.
2. The number of nodes of a decision tree is reasonably small. But when the number of training examples is larger, the number of nodes in a decision tree becomes larger while the accuracy is not improved very much. There may arise the problem of overfitting.
A new discovery obtained from these decision trees is that the motif "--" drastically rejects positive examples. After knowing the negative motif "--", we have examined the decision trees with a single node with the patterns of the form x1-x2-···-xn for n ≥ 3. The best is the pattern containing "-" five times. The result is quite acceptable, as shown in Table 2.
(Figure 6 plots accuracy (%) and the number of nodes in the decision tree against the number of training examples, for indexed and raw sequences, positive and negative.)
Figure 6: Relations between the number of training exa.mples, accuracy and the number of nodes in a decision tree
Pattern x1-x2-x3-x4-x5-x6:    POS (689): 95.4%    NEG (19256): 18296 (95.0%)

Table 2: Accuracy of the best single-node decision tree
With these decision trees over regular patterns, we have developed a transmembrane domain predictor that reads an amino acid sequence of a protein as input and predicts, symbol by symbol, whether each location of a symbol is in a transmembrane domain or not. Experiments on all protein sequences in PIR show that the success rate is 85% ~ 90%.
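The paper does not spell out how window classifications are combined into the per-symbol prediction, so the sketch below (ours, reusing to_three_symbols and f_T from the earlier sketches) makes one simple choice: slide a window of 30 residues over the transformed sequence and flag every position covered by a window that the tree classifies as class 1:

def predict_positions(amino_acid_sequence, tree, window=30):
    # Per-position transmembrane prediction; the aggregation rule is one
    # possibility, not necessarily the rule used by the authors' predictor.
    indexed = to_three_symbols(amino_acid_sequence)
    flags = [0] * len(indexed)
    for start in range(max(1, len(indexed) - window + 1)):
        if f_T(tree, indexed[start:start + window]) == 1:
            for i in range(start, min(start + window, len(indexed))):
                flags[i] = 1
    return flags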
5 PAC-Learnable Class
This section provides a theoretical foundation for the classes of sets classified by decision trees over regular patterns, from the viewpoint of algorithmic learning theory [Valiant 1984].
For integers k, d ≥ 0, we consider decision trees T over k-variable regular patterns whose depth is at most d. We denote by DTRP(d, k) the class of languages defined by decision trees over k-variable regular patterns with depth at most d.

Theorem 1 DTRP(d, k) is polynomial-time learnable for all d, k ≥ 0.
We need some terminology for the above theorem. When we are concerned with learning, we call a subset of Σ* a concept. A concept class C is a nonempty collection of concepts. For a concept c ∈ C, a pair (x, c(x)) is called an example of c for x ∈ Σ*, where c(x) = 1 (c(x) = 0) if x is in c (is not in c). For an alphabet Σ and an integer n ≥ 0, Σ^{≤n} denotes the set {x ∈ Σ* | |x| ≤ n}.
A concept class C is said to be polynomial-time learnable [Blumer et al. 1989, Natarajan 1989, Valiant 1984] if there is an algorithm A which satisfies (1) and (2).
(1) A takes a sequence of examples as input and runs in polynomial time with respect to the length of the input.
(2) There exists a polynomial p(·, ·, ·) such that for any integer n ≥ 0, any concept c ∈ C, any real numbers ε, δ (0 < ε, δ < 1), and any probability distribution P on Σ^{≤n}, if A takes p(n, 1/ε, 1/δ) examples which are generated randomly according to P, then A outputs, with probability at least 1 − δ, a representation of a hypothesis h with P(c ⊕ h) < ε.
Theorem 2 [Blumer et al. 1989, Natarajan 1989] A concept class C is polynomial-time learnable if the following conditions hold.
1. C is of polynomial dimension, i.e., there is a polynomial d(n) such that |{c ∩ Σ^{≤n} | c ∈ C}| ≤ 2^{d(n)} for all n ≥ 0.
2. There is a polynomial-time algorithm, called a polynomial-time hypothesis finder for C, which produces a hypothesis from a sequence of examples such that it is consistent with the given examples.
Moreover, the polynomial-time hypothesis finder for C is a learning algorithm satisfying (1) and (2) if C satisfies 1.
The following lemma can be easily shown.

Lemma 1 Let T be a decision tree over regular patterns and Tv be a subtree of T at node v. We denote Tv by π(T0, T1), where π is the label of node v and T0, T1 are the left and right subtrees of Tv, respectively. Let S be a set of strings and let T' be the tree obtained from T by replacing Tv with T0 at node v. If no string in S matches π, then L(T) ∩ S = L(T') ∩ S.
Proof of Theorem 1.
First we show that the concept class DTRP(d, k) is of polynomial dimension. Let DTRP(d, k)_n = {L ∩ Σ^{≤n} | L ∈ DTRP(d, k)} for n ≥ 0. We evaluate the cardinality of DTRP(d, k)_n. Let π be a regular pattern with |π| > n + k; then no string of length at most n matches π. By Lemma 1, we need to consider only regular patterns of length at most n + k. The number of such patterns is roughly bounded by (|Σ| + 1)^{n+k}. Since a tree of depth bounded by d has at most 2^d − 1 internal nodes and at most 2^d leaves, |DTRP(d, k)_n| ≤ ((|Σ| + 1)^{n+k})^{2^d − 1} · 2^{2^d}. This shows that the dimension of DTRP(d, k)_n is O(n).
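Taking logarithms makes the O(n) bound explicit (for fixed d and k):

\log_2 |DTRP(d, k)_n| \le (2^d - 1)(n + k)\log_2(|\Sigma| + 1) + 2^d = O(n).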
Next we show that there is a polynomial-time hypothesis finder for DTRP(d, k). Let P and N be the sets of strings which appear in positive and negative examples, respectively. Let Π(k, P, N) be the set of regular patterns π, up to renaming of variables, such that π contains at most k variable occurrences and π with its variables erased is a substring of some s in P ∪ N. By Lemma 1, we need to consider only patterns in Π(k, P, N) in order to find a decision tree over regular patterns which is consistent with P and N. Then |Π(k, P, N)| ≤ Σ_{s∈P∪N} (|s|^2)^{k+1}. Therefore the number of possible trees is bounded by (|Π(k, P, N)|)^{2^d − 1} · 2^{2^d}, which is bounded by a polynomial with respect to the input length Σ_{s∈P∪N} |s|.
It is known that, given a regular pattern π and a string w, we can decide in polynomial time whether w matches π or not. Therefore, given a string w and a decision tree T over k-variable regular patterns whose depth is at most d, we can decide whether w ∈ L(T) or not in polynomial time.
The required polynomial-time algorithm enumerates decision trees T over regular patterns in Π(k, P, N) with depth at most d. Then it checks whether s ∈ L(T) for each s ∈ P and t ∉ L(T) for each t ∈ N. If such a tree T is found, the algorithm outputs T as a hypothesis. □
We should say that the polynomial-time learning algorithm in the proof of Theorem 1 exhausts an enormous amount of time and is not suited for practical use.
We may understand the relationship of Algorithms 1 and 2 in Section 3 to Theorem 1 in the following way. When we set Π(P, N) to be the family of k-variable regular patterns made from P and N, Algorithms 1 and 2 run sufficiently fast in practical use (of course, in polynomial time) and produce a decision tree over k-variable regular patterns which classifies the given positive and negative examples. But the produced decision tree is not guaranteed to be of depth at most d. Hence, these algorithms are not learning algorithms in the exact sense of (2).
However, experience tells us that these algorithms usually find small enough decision trees over regular patterns in our experiments on transmembrane domains. For the class DTRP(d, k), Theorem 2 asserts that if a polynomial-time algorithm A produces a decision tree over k-variable regular patterns with depth at most d which classifies given positive and negative examples, then it is a polynomial-time learning algorithm. In this sense, we may say that Algorithms 1 and 2 are polynomial-time algorithms for DTRP(d, k) which often produce reasonable hypotheses, although there is no mathematical proof showing how often such small hypotheses are obtained. This aspect is very important and useful when we are concerned with machine discovery.
Ehrenfeucht and Haussler [1989] have considered learning of decision trees of a fixed rank. For learning decision trees over regular patterns, the restriction by rank can be shown to make no sense; instead, we consider the depth of a decision tree. It is also reasonable to put a restriction on the number of variables in a regular pattern. It has been shown that the class of regular pattern languages is not polynomial-time learnable if NP ≠ RP [Miyano et al. 1991]. Therefore, unless restrictions such as a bound on the number of variables in a regular pattern are given, we may not expect any positive results for polynomial-time learning.
6 Conclusion
We have shown that the idea of combining regular patterns and decision trees works quite well for transmembrane domain identification. The experiments have also shown the importance of negative motifs.
A union of regular patterns is regarded as a special form of a decision tree called a decision list. We have reported in [Arikawa et al. 1992] that a union of a small number of regular patterns can also recognize transmembrane domains with high accuracy. However, the time exhausted in finding hypotheses in [Arikawa et al. 1992] is much larger than that reported in this paper.
Our system constructs a decision tree over regular patterns just from strings called positive and negative examples. We need not take care of which attributes to specify, as in ID3. Therefore it can be applied to other classification problems for proteins and DNA sequences. We believe that our approach provides a new application of algorithmic learning to Molecular Biology.
We are now in the process of examining our method on some other related problems such as predicting the secondary structure of proteins.
References

[Arikawa et al. 1992] S. Arikawa, S. Kuhara, S. Miyano, A. Shinohara and T. Shinohara. A Learning Algorithm for Elementary Formal Systems and its Experiments on Identification of Transmembrane Domains. In Proc. 25th Hawaii Int. Conf. on Sys. Sci., IEEE, Hawaii, 1992, pp. 675-684.

[Bairoch 1991] A. Bairoch. PROSITE: A Dictionary of Sites and Patterns in Proteins. Nucleic Acids Res., Vol. 19 (1991), pp. 2241-2245.

[Blumer et al. 1989] A. Blumer, A. Ehrenfeucht, D. Haussler and M.K. Warmuth. Learnability and the Vapnik-Chervonenkis Dimension. JACM, Vol. 36 (1989), pp. 929-965.

[Ehrenfeucht and Haussler 1989] A. Ehrenfeucht and D. Haussler. Learning Decision Trees from Random Examples. Inform. Comput., Vol. 82 (1989), pp. 231-246.

[Engelman et al. 1986] D.M. Engelman, T.A. Steitz and A. Goldman. Identifying Nonpolar Transbilayer Helices in Amino Acid Sequences of Membrane Proteins. Ann. Rev. Biophys. Biophys. Chem., Vol. 15 (1986), pp. 321-353.

[Gusev and Chuzhanova 1990] V. Gusev and N. Chuzhanova. The Algorithms for Recognition of the Functional Sites in Genetic Texts. In Proc. 1st Workshop on Algorithmic Learning Theory, Tokyo, 1990, pp. 109-119.

[Hartmann et al. 1989] E. Hartmann, T.A. Rapoport and H.F. Lodish. Predicting the Orientation of Eukaryotic Membrane-Spanning Proteins. Proc. Natl. Acad. Sci. U.S.A., Vol. 86 (1989), pp. 5786-5790.

[Holly and Karplus 1989] L.H. Holly and M. Karplus. Protein Secondary Structure Prediction with a Neural Network. Proc. Natl. Acad. Sci. U.S.A., Vol. 86 (1989), pp. 152-156.

[Kyte and Doolittle 1982] J. Kyte and R.F. Doolittle. A Simple Method for Displaying the Hydropathic Character of a Protein. J. Mol. Biol., Vol. 157 (1982), pp. 105-132.

[Lipp et al. 1989] J. Lipp, N. Flint, M.T. Haeuptle and B. Dobberstein. Structural Requirements for Membrane Assembly of Proteins Spanning the Membrane Several Times. J. Cell Biol., Vol. 109 (1989), pp. 2013-2022.

[Miyano et al. 1991] S. Miyano, A. Shinohara and T. Shinohara. Which Classes of Elementary Formal Systems are Polynomial-Time Learnable? In Proc. 2nd Workshop on Algorithmic Learning Theory, Tokyo, 1991, pp. 139-150.

[Natarajan 1989] B.K. Natarajan. On Learning Sets and Functions. Machine Learning, Vol. 4 (1989), pp. 67-97.

[PIR] Protein Identification Resource, National Biomedical Research Foundation.

[Quinlan 1986] J.R. Quinlan. Induction of Decision Trees. Machine Learning, Vol. 1 (1986), pp. 81-106.

[Quinlan and Rivest 1989] J.R. Quinlan and R.L. Rivest. Inferring Decision Trees Using the Minimum Description Length Principle. Inform. Comput., Vol. 80 (1989), pp. 227-248.

[Rao and Argos 1986] J.K.M. Rao and P. Argos. A Conformational Preference Parameter to Predict Helices in Integral Membrane Proteins. Biochim. Biophys. Acta, Vol. 869 (1986), pp. 197-214.

[Shinohara 1982] T. Shinohara. Polynomial Time Inference of Pattern Languages and its Applications. In Proc. 7th IBM Symp. Mathematical Foundations of Computer Science, 1982, pp. 191-209.

[Shinohara 1983] T. Shinohara. Polynomial Time Inference of Regular Pattern Languages. In Proc. RIMS Symp. Software Science and Engineering (Lecture Notes in Computer Science, Vol. 147), 1983, pp. 115-127.

[Utgoff 1989] P.E. Utgoff. Incremental Induction of Decision Trees. Machine Learning, Vol. 4 (1989), pp. 161-186.

[Valiant 1984] L. Valiant. A Theory of the Learnable. Commun. ACM, Vol. 27 (1984), pp. 1134-1142.

[von Heijne 1988] G. von Heijne. Transcending the Impenetrable: How Proteins Come to Terms with Membranes. Biochim. Biophys. Acta, Vol. 947 (1988), pp. 307-333.

[Wu et al. 1990] C.H. Wu, G.M. Whiston and G.J. Montllor. PROCANS: A Protein Classification System Using a Neural Network. IJCNN Int. Joint Conf. on Neural Networks, Vol. 2 (1990), pp. 91-96.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992
Efficient Induction of Version Spaces through
Constrained Language Shift
Claudio Carpineto
Fondazione Ugo Bordoni
Via Baldassarre Castiglione 59, 00142 - Rome, ITALY
fubdpt5@itcaspur.bitnet
Abstract
A large hypothesis space makes the version space
approach, like any other concept induction algorithm based
on hypothesis ordering, computationally inefficient.
Working with smaller composable concept languages rather
than one large concept language is one way to attack the
problem, in that it allows us to do part of the induction job
within the more convenient languages and move to the less
convenient languages when necessary. In this paper we
investigate the use of multiple concept languages in a
version space approach. We define a graph of languages
ordered by the standard set inclusion relation, and provide
a procedure for efficiently inducing version spaces while
shifting from small to larger concept languages. We apply
this method to the attribute languages of a typical
conjunctive concept language (i.e., a conjunctive concept
language defined on a tree-structured attribute-based
instance space) and compare its complexity to that of a
standard version space algorithm applied to the full concept
language. Finally we contrast our approach with other
work on language shift, outlining an alternative highly-constrained strategy for searching the space of new
concepts which is not based on constructive operators.
1 Introduction
Of all the algorithms for incremental concept induction that
are based on the partial order defined by generality over the
concept space, the candidate elimination (CE) algorithm
[Mitchell 1982] is the best known exemplar. The CE
algorithm represents and updates the set of all concepts that
are consistent with data (i.e., the version space) by
maintaining two sets, the set S containing the maximally
specific concepts and the set G containing the maximally
general concepts. The procedure to update the version
space is as follows. A positive example prunes concepts in
G which do not cover it and causes all concepts in S which
do not cover the example to be generalized just enough to
cover it. A negative example prunes concepts in S that
cover it and causes all concepts in G that cover the example
to be specialized just enough to exclude it. As more
examples are seen, the version space shrinks; it may
eventually reduce to the target concept provided that the
concept description language is consistent with the data.
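As a concrete illustration of these update rules (ours, not taken from the paper), the following Python sketch represents a tree-structured one-attribute concept language by a child-to-parent map, computes the version space by brute force, and reads off the boundary sets S and G, using the playing-cards languages of Fig. 1:

# Child -> parent maps for the two factor languages of Fig. 1.
SUIT_PARENT = {"spade": "black", "club": "black", "heart": "red", "diamond": "red",
               "black": "anysuit", "red": "anysuit"}
RANK_PARENT = {"J": "face", "Q": "face", "K": "face",
               **{str(i): "numbered" for i in range(1, 11)},
               "face": "anyrank", "numbered": "anyrank"}

def covers(concept, value, parent):
    # A concept covers a value iff the concept is the value or one of its ancestors.
    while value is not None:
        if value == concept:
            return True
        value = parent.get(value)
    return False

def version_space(language, parent, positives, negatives):
    # All concepts consistent with the data (brute force, no boundary-set tricks).
    return [c for c in language
            if all(covers(c, p, parent) for p in positives)
            and not any(covers(c, n, parent) for n in negatives)]

def boundaries(vs, parent):
    # S: maximally specific, G: maximally general members of the version space.
    def strictly_more_general(c1, c2):
        return c1 != c2 and covers(c1, c2, parent)
    S = [c for c in vs if not any(strictly_more_general(c, d) for d in vs)]
    G = [c for c in vs if not any(strictly_more_general(d, c) for d in vs)]
    return S, G

suit_language = ["anysuit", "black", "red", "spade", "club", "heart", "diamond"]
vs = version_space(suit_language, SUIT_PARENT, positives=["spade"], negatives=["heart"])
print(boundaries(vs, SUIT_PARENT))   # (['spade'], ['black'])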
This framework has been later improved along several
directions. The first is that of incorporating the domain
knowledge available to the system in the algorithm; this has
resulted in feeding the CE algorithm with analytically-
generalized positive examples (e.g., [Hirsh 1989],
[Carpineto 1990]), and analytically-generalized negative
examples (e.g., [Carpineto 1991]). Another research
direction is to relax the assumption about the consistency of
the concept space with data. In fact, like many other
learning algorithms, the CE algorithm uses a restricted
concept language to incorporate bias and focus the search
on a smaller number of hypotheses. The drawback is that
the target concept may be contained in the set of concepts
that are inexpressible in the given language, thus being
unlearnable. In this case the sets S and G become empty: to
restore consistency the bias must be weakened adding new
concepts to the concept language [Utgoff 1986]. Thirdly,
the CE algorithm suffers from lack of computational
efficiency, in that the size of S and G can be exponential in
the number of examples and the number of parameters
describing the examples [Haussler 1988]. Changes to the
basic algorithm have been proposed that improve efficiency
for some concept language [Smith and Rosenbloom 1990].
In this paper we investigate the use of multiple concept
languages in a version space approach. By organizing the
concept languages into a graph corresponding to the
relation larger-than implicitly defined over the sets of
concepts covered by the languages, we have a framework
that allows us to shift from small to larger concept
languages in a controlled manner. This provides a powerful
basis to apply a general divide-and-conquer strategy to
improve the efficiency of a standard version space
approach in which the concept description language is
factorizable. The idea is to start out with the smallest
concept languages (i.e., the factor languages) and, once the
version spaces induced over them have become
inconsistent with the data, to move along the graph of
product languages to the maximally small concept
languages that restore consistency. Working with smaller
concept languages may greatly reduce the size of S and G,
thus resulting in a neat improvement in efficiency. On the
other hand, use of several languages in parallel and
language shifts negatively affect complexity. Therefore the
two main objectives of the paper are : (1) define a set of
languages and a procedure for inducing version spaces
after any language shift efficiently, (2) show that in some
cases this method may be applied to reduce the complexity
of the standard CE algorithm. Since this framework
supports version-space induction over a set of concept
languages, it can also be suitable to handle inconsistency
when the original concept language is too small. More
generally, it suggests an alternative approach to inductive
language shift in which the search for useful concepts to be
added to the concept language is not based on constructive
operators. This aspect is also discussed in the paper.
627
Fig. 1. Two concept languages in the playing cards domain: the suit hierarchy (anysuit, with children black and red, which in turn cover the four suits) and the rank hierarchy (anyrank, with children face and numbered, which in turn cover the thirteen ranks).
The rest of the paper is organized as follows. In the
next section we define a graph of conjunctive concept
languages and describe the learning problem with respect to
it. Then we present the learning method. Next, we apply
the method to the factor languages of a conjunctive concept
language defined on a tree-structured attribute-based
instance space, and evaluate its utility. Finally we compare
this work to other approaches to factorization in concept
induction and to inductive language shift.
2 The learning problem
We first introduce the notions that characterize our learning
problem. In the following concepts are viewed as sets of
instances and languages as sets of concepts.
A concept c1 is more general than a concept c2 if the set of instances covered by c1 is a proper superset of the set of
instances covered by c2.
A language L1 is larger than a language L2 if the set of concepts expressible in L1 is a proper superset of the set of
concepts expressible in L2.
In the playing cards domain, which we shall use as an illustration, two possible concept languages are: L1 = {anysuit, black, red, ♠, ♣, ♥, ♦} and L2 = {anyrank, face, numbered, J, Q, K, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
The relation more-general-than over the concepts present in
each language is shown in fig 1.
The product L12 of two factor languages L1 and L2 is the set of concepts formed from the conjunctions of concepts from L1 and L2 (examples of product concepts are 'anyrank-anysuit', 'anyrank-black', etc.). The number of concepts in the product language is therefore the product of the number of concepts in its factors. Also, a concept c1'·c2' in the language L12 is more general than (>) another concept c1''·c2'' if and only if c1' > c1'' and c2' > c2''.
With n initial languages it is possible to generate Σ_{k=1..n} n!/((n−k)! k!) = 2^n − 1 product languages (see Fig. 2). Moreover, given that the superconcept 'any' can always be added to each factor language, the relation larger-than over this set of languages can be immediately established, for each product language is larger than any of its factor languages.
The learning problem can be stated as follows.
Given: a set of factor concept languages, a set of positive instances, and a set of negative instances.
Incrementally find: the version spaces in the set of product concept languages that are consistent with the data and that contain the smallest number of factors.

Fig. 2. The graph of product languages with three factor languages.
3. The learning method
In this approach concept learning and language shift are
interleaved. We process one instance at a time, using a
standard version space approach to induce consistent
concepts over each language of the current set (initially,
the n factor languages). During this inductive phase some
concept languages may become inconsistent with the data.
When every member of the current set of languages has
become inconsistent with data, the language shifting
algorithm is invoked. It iteratively selects the set of
maximally small concept languages that are larger than the
current ones (i.e., the two-factored languages, the three-factored languages, etc.) and computes the new version
spaces in these languages. It halts when it finds a
consistent set of concept languages (i.e. a set in which
there is at least one consistent concept language); then it
returns control to the inductive algorithm to process
additional examples. The whole process is iterated as long
as the set of current languages can be further specialised
(i.e. until the n-factored language has been generated). We
call this algorithm Factored Candidate Elimination (FCE)
algorithm. The top-level FCE algorithm is presented in
table 1.
The core of the algorithm is the procedure to find the
new consistent version spaces in the product languages (in
italics in table 1). The difficulty is that the algorithm for
inducing concepts over a language (the inductive algorithm)
is usually distinct from the algorithm for adding new terms
to the language itself (the language-shifting algorithm). In
Table 1: The top-level FCE algorithm

Input:      An instance set {I}. A set of partially ordered concept languages {L} formed by n given one-factored languages and their products.
Output:     The version spaces in the set of languages {L} that are consistent with {I} and that contain the smallest number of factors.
Variables:  {LS}_k is the subset of (unordered) languages in {L} which have k factors. {VS}_k is a set of version spaces, with |VS_k| = |LS_k|. {LS,VS}_k is the set of pairs obtained by pairing the corresponding elements in {LS}_k and {VS}_k.
Function:   CE(i, ls, vs) takes an instance, a concept language and a version space and returns the updated version space.

FCE({I}, {L})
  K = 1.
  {VS}_1 = {LS}_1.
  For each instance i in {I},
    For each (ls, vs) in {LS,VS}_K,
      vs = CE(i, ls, vs).
    If all the version spaces in {VS}_K are empty
    Then Repeat
        If K = n Then Return failure.
        K = K + 1.
        For each ls in {LS}_K,
          find the new version space vs associated with it.
      Until at least one vs is not empty.
general, the inductive algorithm has to be run again over
the instance set after any change made by the language-shifting algorithm ([Utgoff 1986], [Matheus and Rendell
1989], [Pagallo 1989], [Wogulis and Langley 1989]). In
this case, however, in defining the procedure to induce the
new consistent concepts after any language shift, we take
advantage of the features of the particular inductive learning
algorithm considered (i.e. the CE algorithm) and of the
properties of language "multiplication". The two key facts
are that the CE algorithm makes an explicit use of concept
ordering and that concepts in any product language
preserve the order of concepts in its factors. This makes it
possible to modify the basic CE algorithm with the aim of
computing the set of consistent concepts in a product
language as a function of some appropriate concept sets
induced in its factors.
The concept sets computed in each factor language
which will be utilized during language shift are the
following. First, for each language we compute the set S*.
S* contains the most specific concepts in the language that
cover all positive examples, regardless of whether or not
they include any negative examples. Second, for each
language and each negative example, we compute the set
G*. G* contains the most general concepts in the language
that do not cover the negative example, regardless of
whether or not they include all positive examples.
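Continuing the brute-force sketch above (again ours), S* and G* each drop one of the two consistency requirements:

def s_star(language, parent, positives):
    # S*: most specific concepts covering all positive examples (negatives ignored).
    candidates = [c for c in language
                  if all(covers(c, p, parent) for p in positives)]
    return [c for c in candidates
            if not any(d != c and covers(c, d, parent) for d in candidates)]

def g_star(language, parent, negative):
    # G* for one negative example: most general concepts not covering it
    # (positive examples ignored).
    candidates = [c for c in language if not covers(c, negative, parent)]
    return [c for c in candidates
            if not any(d != c and covers(d, c, parent) for d in candidates)]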
These operations can be better illustrated with an
example. Let us consider again the playing cards domain
and suppose that we begin with the two concept languages
introduced above - rank (L 1) and suit (L2). Let us suppose
the system is given one positive example - the Jack of
spades - and two negative examples - the Jack of hearts and
the Two of spades. We compute the two corresponding
version spaces (one for each language), the sets S* (one for each language), and the sets G* (one for each language and for each negative example) in parallel. In particular, the sets S* and G* can be immediately determined, given the ordering over each language's members. The inductive phase is pictured in Fig. 3 (f stands for face, b for black, etc.).
The three instances cause both of the version spaces to
reduce to the empty set. The next step is therefore to shift
to the set of maximally small concept languages that are
larger than L1 and L2 (in this case the product L12) and check to see if it contains any concepts consistent with the data. The problem of finding the version space in the language L12 can be subdivided into the two tasks of finding the lower boundary set S12 (i.e., the set of the most specific concepts in L12 that are consistent with the data) and the upper boundary set G12 (i.e., the set of the most general concepts in L12 that are consistent with the data).
Computation of S12
Because a product concept contains an instance if and only if all of its factor concepts contain the instance, the product of S1* and S2* returns the most specific product concepts that include all positive instances. By discarding those that also cover negative examples, we get just the set S12. If the set becomes empty, then the product language is also inconsistent with the data. More specific concepts, in fact, cannot be consistent because they would rule out some positive example. More general concepts cannot be consistent either, for they would cover some negative examples. In our example, as there is only one positive example, the result is trivial: S12 = {J♠}.
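In code, S12 is the filtered cross product of the factor S* sets. The sketch below (ours, reusing covers, s_star and the parent maps from the earlier sketches) represents a product concept and an instance as tuples of values, one per factor language:

from itertools import product

def product_covers(concept, instance, parents):
    # A product concept covers an instance iff every factor concept covers
    # the corresponding attribute value.
    return all(covers(c, v, p) for c, v, p in zip(concept, instance, parents))

def compute_s12(factor_s_stars, negatives, parents):
    # S12: products of the factor S* concepts that cover no negative instance
    # (they cover every positive instance by construction of the S* sets).
    return [combo for combo in product(*factor_s_stars)
            if not any(product_covers(combo, neg, parents) for neg in negatives)]

# Example from the text: positive J of spades, negatives J of hearts and 2 of spades.
parents = [RANK_PARENT, SUIT_PARENT]
s_stars = [s_star(list(RANK_PARENT) + ["anyrank"], RANK_PARENT, ["J"]),
           s_star(list(SUIT_PARENT) + ["anysuit"], SUIT_PARENT, ["spade"])]
print(compute_s12(s_stars, [("J", "heart"), ("2", "spade")], parents))   # [('J', 'spade')]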
Fig. 3. Concept sets computed during the inductive phase.
Computation of G12
Rather than generating and testing for consistency all the product concepts more general than the members of S12, the set G12 is computed using the sets G*. As for each negative example there must be at least one factor concept in each consistent product concept which does not cover the negative example, and because we seek the maximally general consistent product concepts, the idea is to use the members of the sets G* as upper bounds to find the factor concepts present in such maximally general product concepts.
The algorithm is as follows. It begins by dropping from the sets G* the elements that cannot generate factor concepts that are more general than those contained in S12. Then, it (a) finds all the conjunctions of concepts in the reduced sets G* such that each negative instance is ruled out by at least one concept, and (b) checks if there are more general consistent conjunctions. Step (a) requires conjoining each factor concept in each G* (it will rule out at least one negative example) with all the combinations of factor concepts in the other G*'s which rule out the remaining negative examples. Step (b) requires generalising (with the value 'any') the factor concepts in the conjunctions found at the end of step (a) which do not contribute to rule out any negative example. The resulting set of conjunctions, if any is found, coincides with the set G12, in that there cannot be more general product concepts consistent with the data. However, it may not be possible to find a consistent concept by conjoining the members of the G*'s. In this case we are forced to specialise the members of the G*'s to the extent required so that they rule out more negative instances, and to iterate the procedure (in the limit, we will get the set S12).
In our example there are just two factors and only two negative instances. The initial sets G* are:

  {n, Q, K}    {f, 1, 3, ..., 10}
  {b, ♦}       {r, ♣}

The simplification with the set S12 returns:

  { }    {f}
  {b}    { }
Step (a) in this case reduces to the union of the conjunction of G1* relative to instance 1 and G2* relative to instance 2, and the conjunction of G1* relative to instance 2 and G2* relative to instance 1. The result ({fb}) does not need to be generalized (step (b)), for both 'f' and 'b' contribute to rule out (at least) one negative example. Also, in this case, the specialisation procedure is not needed because we have been able to find a consistent conjunction: G12 = {fb}. The overall version space in the language L12 is shown in Fig. 4.
Fig. 4. The version space in the product language after the constructive phase.
4 Evaluation
There are two ways in which the factored CE algorithm
(FCE) can be used to reduce the complexity of the standard
CE algorithm. Either we use a graph-factoring algorithm
[Subramanian and Feigenbaum, 1986] to find the factors of
a given concept space (provided that it is factorable), or we
choose a concept language that can be naturally
decomposed into factor languages. Here we evaluate the
utility of the FCE algorithm with respect to a simple but
widely used concept language that has this property. We
consider a conjunctive concept language defined on a tree-structured attribute-based instance space. We assume the number of attributes to be n, each attribute with l levels and branching factor b (the case can be easily extended to nominal and linear attributes, considering that a nominal attribute can be converted into a tree-structured attribute using a dummy root 'any-value', and that a linear attribute can be considered as a tree-structured attribute with branching factor = 1). Each term of the concept space is a conjunction of n values, one for each attribute; the total number of terms in the concept space is [(b^l − 1)/(b − 1)]^n. It is worth
noting that with such a concept language the set S of the
version space will never contain more than one element
[Bundy et al. 1985]. Even in this case, however, Haussler
[1988] has shown that the size of the set G can still be
exponential, due to its fragmentation.
In the following we compare the CE algorithm applied
to this full conjunctive concept language to the FCE
algorithm applied to its attribute languages. While their
relative performances are equivalent, in that in order to
find all the concepts consistent with data in the full concept
language it suffices to eventually compute the boundaries
of the n-factored version space, their time complexity may
strongly vary. The gain/loss in efficiency ultimately
depends on the number of instances that each intermediate
language is able to account for before it becomes
inconsistent. In the best case all the induction is done
within the smallest languages, and language shift to larger
languages is not necessary. In the worst case no consistent
concepts are induced in the smaller languages, so that all
the induction is eventually done within the full concept
language.
To make a quantitative assessment we have to make
assumptions about a number of factors in addition to the
structure of factor and product languages, including target
concept location, training instance distribution, cost of
matching concepts to training instances. We consider the
worst case convergence to the target concept in the full concept language. This amounts to saying that after the first
positive instance (the first instance must be positive in the
CE algorithm) there are only negative instances, and that
each of them causes only one concept to be removed from
the version space until it shrinks to the target concept (i.e.,
the first positive instance). In terms of the full concept
language ordering this means that general concepts are
removed earlier than any of their more specific concepts.
Furthermore, we assume that the generality of the attribute
values in the concepts dropped from the version space
decreases uniformly. More precisely, we assume that if an
attribute value in a dropped concept is placed at level k in
the corresponding attribute tree, then the values of that
attribute in the remaining consistent concepts are placed at
most at level k+ 1. This presentation of training instances
has the effect of maximizing the amount of instances that
each intermediate language can take in before it becomes
inconsistent.
As for the cost of matching concepts to instances and
other concepts we assume that it is the same in all
languages.
We can now analyse the complexity in the two
approaches. As done in [Mitchell 1982], the time
complexity bounds indicate bounds on the number of
comparisons between concepts and instances, and
comparisons between concepts.
CE algorithm with full conjunctive concept language. Let q be the number of negative instances and g the largest size of G. Following [Mitchell 1982], in our case the key term is O(g^2 q). The maximum size of G is given by the largest number of unordered concepts that can be found in the version space after the first positive instance. This number turns out to be O(n^2 l). To illustrate, first we must note that the version space after the first positive instance will contain the concepts more general than the instance; therefore the admissible values for each attribute will be the l values in the attribute tree that are placed in the chain linking the attribute value in the instance to the root of the attribute tree. When n = 2 there are at most l ways to choose a pair of values from two ordered sets of size l in such a way that the pairs are unordered. When n increases, this number comes to be multiplied by n!/((n − 2)! 2!). In fact, considering that two n-factored concepts are unordered if they contain at least two factor concepts with different orderings, all the possible unordered n-factored concepts can be obtained by considering the same combinations as in the l original unordered concepts for each possible way of choosing a pair of attributes from among the n attributes. The maximum size of G is therefore O(n^2 l). The complexity of the CE algorithm is O(n^4 l^2 q).
FCE algorithm with attribute languages. In this case several concept languages are active at once. For each negative instance we have to update in parallel at most max_k [n!/((n−k)! k!)], that is O(n^2), version spaces. Given our hypothesis on instance distribution, the g value of the intermediate version spaces will be 1 for the one-factored languages, 2 for the two-factored languages, ..., n for the n-factored languages. The largest value of g is n, and the relative complexity factor for each version space is therefore O(n^2). Thus the time taken to induce version spaces within the set of active languages is at most O(n^2 · n^2 · q) = O(n^4 q).
The total time complexity can be calculated by adding the time taken by language shift to the time taken by concept induction alone. The cost of shifting the concept languages is given by the number of language shifts (2^n) multiplied by the cost of any single language shift. The time taken by any single language shift becomes constant if we modify the FCE algorithm's inductive phase by labelling each member of each G*, and any of its more specific concepts, with all the negative instances it does not cover. In this way, in fact, the operations described in the procedure to compute the G set in any product language will no longer involve any matching between concepts and instances. On the other hand, the cost of labelling must now be added to the cost of language shift. The labelling we introduced requires matching each negative instance against the members of n G*'s (we keep only the G*'s relative to the initial factor languages), where each G* contains only one member (in our case, in fact, as there is only one positive instance, we can immediately remove the concepts that are not more general than the positive instance from the G*'s, at an additional cost of O(qnbl)), and repeating for all the l more specific concepts of each member of G* (i.e., the concepts contained in the chain of admissible values relative to that G*'s factor language). Therefore labelling takes in all O(qnl) + O(qnbl) = O(qnbl). The time complexity of language shift is O(2^n) + O(qnbl). The overall time complexity is therefore O(n^4 q) + O(2^n) + O(qnbl), which, for practical values of n, b, and l, approximates to O(n^4 q).
In sum, we have O(n^4 l^2 q) in the CE algorithm versus O(n^4 q) in the FCE algorithm. The effect of using the FCE algorithm with the chosen instance distribution appears to be that of blocking the fragmentation of G due to l. It is also worth noting that the factor O(n^2) in the FCE algorithm due to the presence of multiple languages can be reduced by reducing the number of intermediate product languages employed. This would, on the other hand, be counteracted by an increase of the factor O(n^2) due to the g of the intermediate languages. Here is a trade-off between using few concept languages and using many concept languages in a given range. The fewer the concept languages, the less the amount of computation devoted to parallel induction and language shift. The more the concept languages, the more likely it is that a smaller amount of induction will be done within the largest concept languages, which are the least convenient. Experimentation might help investigate this kind of trade-off.
5 Relation to factorization in concept induction
Factorization with smaller concept languages in the CE
algorithm has been first explored in [Subramanian and
Feigenbaum 1986] and [Genesereth and Nilsson 1987].
Although we were inspired by their work, our goals,
methods and assumptions are different. First, in
[Subramanian and Feigenbaum 1986] and [Genesereth and
Nilsson 1987], language factorization has been used with
the aim of improving efficiency during the phase of
experiment generation, whereas we have investigated its
utility during the earlier and more important stage of
version-space induction from given examples and counterexamples. Second, while they have primarily addressed the
problem of factoring a version space and assessing credit
over its factors, we have focussed on language shift
during version-space induction over a set of available factor
and product languages. Third, their approach relies on the
assumption that the given concept language is factorable into independent concept languages¹. By contrast, when
applying the FCE algorithm directly to the attribute
languages of a conjunctive concept language it is not
necessary that the attribute languages be independent. For
example, the two factor languages we have used as an
illustration throughout the paper (Ll and L2) happen to be
¹Two concept languages LA and LB are independent if membership in any of the concepts from LA does not imply or deny membership in any of the concepts in LB. This definition implies that for every concept a in LA and every concept b in LB the intersection of a and b is neither empty nor equal to either concept. Two independent concept languages are unordered with respect to the larger-than relation.
two independent languages²; however, we could well apply the FCE algorithm to the concept language L2 we introduced earlier along with the concept language Lc = {anyrank, odd, even, 1, 3, 5, 7, 9, J, K, 2, 4, 6, 8, 10, Q}, these two languages being not independent (the intersection of the concept "2" in L2 and the concept "odd" in Lc is empty, for instance). Using non-independent
factor languages, as their product may contain a large
number of empty or redundant concepts, may badly affect
the performance when the FeE algorithm is applied to
recover from inconsistency due to use of small concept
languages. But it does not seem to affect the result when
the FeE algorithm is used to improve efficiency with
respect to the full conjunctive concept language.
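The independence condition in the footnote can be checked mechanically when concepts are given by their extensions. The following Prolog fragment is only a minimal sketch under stated assumptions: the predicate names and the sample contents of LB are hypothetical illustrations (only "2" in LB and "odd" in Lc are the concepts named in the text), and a lists library providing member/2 and intersection/3 is assumed. It is not part of the FCE algorithm.

    :- use_module(library(lists)).            % member/2, intersection/3

    lb([two-[2], four-[4], jack-[j]]).        % hypothetical extensions for LB
    lc([odd-[1,3,5,7,9], even-[2,4,6,8,10]]).

    % independent(+L1, +L2): the footnote's condition, checked on extensions -
    % every pairwise intersection is neither empty nor equal to either concept.
    independent(L1, L2) :-
        \+ ( member(_-E1, L1),
             member(_-E2, L2),
             intersection(E1, E2, I),
             sort(I, SI), sort(E1, S1), sort(E2, S2),
             ( SI == [] ; SI == S1 ; SI == S2 ) ).

With these hypothetical extensions, the query independent(B, C) fails for LB and Lc, since the intersection of the extensions of "2" and "odd" is empty, which matches the observation in the text that the two languages are not independent.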
6  Relation to inductive language shift
As mentioned earlier, the FCE algorithm can also be seen as a method for introducing new concepts to overcome the limitations of a set of restricted concept languages (i.e., the factor languages). It does so by creating another set of larger concept languages (i.e., the product languages) to constrain the search for new useful concepts. This is a significant departure from the search strategy usually employed in most approaches to inductive language shift. Regardless of the specific goal pursued (many systems deal with improvement of some quality measure of the learned descriptions rather than with their correctness), "the problem of new terms" [Dietterich et al. 1982] or "constructive induction" [Michalski 1983] is in general tackled by defining a set of appropriate constructive operators and carrying out a depth-first search through the space of the remaining concepts to find useful (e.g., consistent, more concise, more accurate) extensions to be added to the given language. Furthermore, since the number of admissible extensions is generally intractably large, most of the approaches to constructive induction rely on various heuristics to reduce the number of candidate additional concepts and/or to cut down the search (e.g., [Matheus and Rendell 1989], [Pagallo 1989], [Wogulis and Langley 1989]).
By contrast, we compute and keep all the admissible language extensions (in a given set of extensions) that restore consistency with the data, rather than considering one or a few plausible language extensions at a time. Just as the more-general-than relation that is implicitly defined over the terms of a concept language may allow efficient representation and updating of all consistent concepts [Mitchell 1982], so too the larger-than relation that is implicitly defined over a set of languages may provide the framework to efficiently organize the small-to-large breadth-first search of useful languages. These considerations suggest that an alternative abstract model for language shift can be formulated, in which the search for new concepts, rather than being based on the use of constructive operators, is driven by the ordering of a set of candidate concept languages (work in preparation).
²It is often the case that attribute choice reflects independencies in the world, thus giving rise to actual independent factor languages.
7. Conclusion
We have presented the FCE algorithm for efficiently inducing version spaces over a set of partially-ordered concept languages. The utility of this algorithm is twofold: improving the efficiency of version-space induction if the initial concept language is decomposable into a set of factor languages, and inducing consistent version spaces if a set of concept languages inconsistent with data is initially available. In this paper we have focussed on the former. We have applied the FCE algorithm to the task of inducing version spaces over a conjunctive concept language defined on a tree-structured attribute-based instance space, and we have evaluated when it leads to a reduction in complexity.
Acknowledgements
Part of this work was done while I was at the Computing Science Department of the University of Aberdeen, partially supported by CEC SS project SC1.0048.C(H). I would like to thank Derek Sleeman and Pete Edwards for their support and for useful discussions on this topic. The work was carried out within the framework of the agreement between the Italian PT Administration and the Fondazione Ugo Bordoni.
References
[Bundy et al. 1985] A. Bundy, B. Silver, D. Plummer. An analytical comparison of some rule-learning programs. Artificial Intelligence, Vol. 27, No. 2 (1985), pp. 137-181.
[Carpineto 1990] C. Carpineto. Combining EBL from success and EBL from failure with parameter version spaces. In Proc. 9th ECAI, Pitman, London, 1990, pp. 138-140.
[Carpineto 1991] C. Carpineto. Analytical negative generalization and empirical negative generalization are not cumulative: a case study. In Proc. EWSL-1991, Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin, 1991, pp. 81-88.
[Dietterich et al. 1982] T. Dietterich, B. London, K. Clarkson, R. Dromey. Learning and inductive inference. In Cohen & Feigenbaum (Eds.), The Handbook of Artificial Intelligence, Morgan Kaufmann, Los Altos, 1982.
[Genesereth and Nilsson 1987] M. Genesereth, N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, Los Altos, 1987.
[Haussler 1988] D. Haussler. Quantifying inductive bias: Artificial Intelligence learning algorithms and Valiant's learning framework. Artificial Intelligence, Vol. 36, No. 2 (1988), pp. 177-221.
[Hirsh 1989] H. Hirsh. Combining empirical and analytical learning with version spaces. In Proc. 6th Int. Workshop on Machine Learning, Morgan Kaufmann, Los Altos, 1989, pp. 29-33.
[Matheus and Rendell 1989] C. Matheus, L. Rendell. Constructive induction on decision trees. In Proc. 11th IJCAI, Detroit, Morgan Kaufmann, Los Altos, 1989, pp. 645-650.
[Michalski 1983] R. Michalski. A theory and methodology of inductive learning. Artificial Intelligence, Vol. 20, 1983, pp. 111-161.
[Mitchell 1982] T. Mitchell. Generalization as search. Artificial Intelligence, Vol. 18, 1982, pp. 203-226.
[Pagallo 1989] G. Pagallo. Learning DNF by decision trees. In Proc. 11th IJCAI, Morgan Kaufmann, Los Altos, 1989, pp. 639-644.
[Smith and Rosenbloom 1990] B. Smith, P. Rosenbloom. Incremental non-backtracking focusing: a polynomially bounded generalization algorithm for version spaces. In Proc. 8th AAAI, Morgan Kaufmann, Los Altos, 1990, pp. 848-853.
[Subramanian and Feigenbaum 1986] D. Subramanian, J. Feigenbaum. Factorization in experiment generation. In Proc. 5th AAAI, Morgan Kaufmann, Los Altos, 1986, pp. 518-522.
[Utgoff 1986] P. Utgoff. Shift of bias for inductive concept learning. In Michalski et al. (Eds.), Machine Learning II, Morgan Kaufmann, Los Altos, 1986, pp. 107-148.
[Wogulis and Langley 1989] J. Wogulis, P. Langley. Improving efficiency by learning intermediate concepts. In Proc. 11th IJCAI, Morgan Kaufmann, Los Altos, 1989, pp. 657-662.
Theorem Proving Engine and Strategy Description Language
Massimo Bruschi
State University of Milan - Computer Science Department
Via Comelico 39, 20135 Milan, Italy
e-mail: mbruschi@imiucca.csi.unimi.it
Abstract
The concepts of strategy description language (SDL) and theorem proving engine (TPE) are introduced as architectural and applicative tools in the design and use of an automated theorem proving system. Particular emphasis is given to the use of an SDL as a research tool as well as a way to use a prover both as a batch and as an interactive program. In fact, the availability of an interpreter for such a language offers the possibility of having a system able to cover both of these usages, giving the user some way of choosing the granularity of the steps the prover must take. Three examples are given to show possible applications. Their purpose is to show its usefulness for expressing and testing new ideas. Some interesting capabilities of an SDL are applied to highlight how it allows the treatment of self-analysis on the state of the search space. Examples of these are the definition of a self-adaptive search and a tree pruning strategy. All the definitions we give reflect a running Prolog prototype and inherit much from the Prolog style and structure.
1  Introduction
The uses of and the interest in automated theorem proving have grown markedly in the preceding decade. The
cause rests in part with faster computers, easy access to
workstations, portable and powerful automated theorem-proving programs, and successes with answering open
questions. Various researchers in the field conjecture
that far more power is needed to attack the deep problems of mathematics and logic that are currently out of
reach.
Although some of the needed increase in effectiveness
will result from even faster computers, many state that
the real advances will result from the formulation of new
and diverse strategies. Because we feel that the ease of
comparing, analyzing, and formulating such strategies
would be enhanced if an appropriate abstract language
and theory were available, we undertake here the development of such a language. Perhaps the abstraction and
language will lead to needed insights into the nature of
strategy of diverse types. In addition, because of its relation to this language, here we also provide an abstract
treatment of theorem-proving programs as engines. This
abstraction may enable researchers to analyze the differences, similarities, and sources of power among the
radically diverse program designs.
The idea for developing a strategy description language (SDL) usable to define search strategies for a theorem prover was born when we began to study the application of parallelism to ATP. One proposal was to run
many theorem provers on the same problem but with
different search strategies. Having different strategies
expressed as programs would mean having, as input of
each prover process, the couple < theorem, search algorithm >.
The development of a language requires the definition
of an abstract machine to execute its programs, requiring
an interpreter for the language. Our experiences and previous work with Prolog have suggested its use for the realization of a prototype.
One simple way to build an interpreter is to define a
kernel module offering the basic services. This led us to
the definition of a theorem-proving engine (TPE). Next,
we developed a theorem prover having an SDL interpreter and a TPE as basic modules. zSDL is the name
of our SDL.
Generally, we conjecture that an SDL might benefit from having one (or more) of the basic attitudes of being procedural, functional, and logical. It should also be able to focus on the operations with different granularity as well as directing the prover process, controlling details of different levels of complexity. As a sample model we can think of production systems in AI, and say that an SDL could be used to describe the control side of such a system. There can be as many SDL languages as there are different production systems.
The language we defined did not result from a deep analysis of the cited aspects; instead, it has been driven by the underlying structure of the TPE we developed, by the fact that it is realized in Prolog, and by the wish to define the language in the field so that it could be run.
run. One of the nice things about Prolog is that you can
develop executable meta-languages.
2  A theorem proving engine
A TPE is a program module devoted to maintaining and operating a knowledge base (KB) of logical formulas and a set of indexes on them. We think of these indexes as sets of references (or ids) to the formulas. The sets are distinguished by name. Each formula is retained together with various information about it.
A TPE can perform two basic activities: inference and reduction. The object of the first activity is to deduce new knowledge, gathering it by considering various subsets of the formulas in the KB. The object of the second activity is to keep the size (or the weight) of the KB as small as possible, by discarding redundant information. We require that every successful call to the inference process (IP) also calls the reduction process (RP).
To better define the activities of a TPE, we focus on a possible minimal interface to such a module. We assume that the TPE finds the KB initialized with a given input set of formulas and that each operation maintains the indexes appropriately. We shall extend this interface gradually in the paper.
The kernel functions of a TPE can be:

• (TPE.1) - enable(+Rule)
• (TPE.2) - disable(+Rule) :
A TPE is thought to offer a set of inference and reduction rules, each referred to by a name. An IP will then apply the set of all the active inference rules, and the RP will only use the active reduction rules. These two functions are used to control the activity sets. For simplicity we assume the calls can also accept a list of rule names.

• (TPE.3) - superpose(+Id1, +Id2) :
Its purpose is to activate the IP. It will superpose the formula referred to by Id1 on the one referred to by Id2 using all the active inference rules. We use the concept of superposition because it implies the ordering of the arguments, which is sometimes required. In this respect the general form of a single inference (as well as reduction) rule is thought of as one that takes as premises two formula references and produces a set of new references associated to the formulas resulting from the actual application. Consider for example the binary resolution inference rule. It takes two clauses and generates a set of resolvents. So, if we consider the clausal formulas referred to by Id1 and by Id2, the reference set {N1, N2, ..., Nm} will refer to their resolvents (if any). Rules with single premises are called with superpose(Id, Id).

• (TPE.4) - delete(+Id) :
It is used to delete the formula referred to by Id from the KB and from the indexes. This operation, combined with a superposition call, can be used to realize transformation processes on the KB. Consider for example the standard CNF transformation. It replaces a formula with a (satisfiability) equivalent set of clauses. We can model this by calling an inference rule with only one premise to generate the set of clauses and then deleting the premise. As a matter of fact we think of this operation as reversible. See the next operation.

• (TPE.5) - undelete(+Id) :
It is called to recover an earlier deletion of a formula. We can think of it as a special inference rule that uncovers a formula. It can be useful in adaptive searches. Suppose for example we are using a weighting strategy to discard newly generated formulas if they exceed a fixed weight. Using the delete/1 call we can simply hide the formula from the KB and the indexes and later recover it if, for example, the search ends with a consistency status.

As a matter of fact the indexes on the formula KB have a dominant role for understanding the entire idea. In the next section we will make this role clearer.
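As an illustration of the delete/1 and undelete/1 operations described above, the following is a minimal Prolog sketch under the assumption that the KB is kept as dynamic kb_formula/3 facts; this is not the paper's implementation, and index maintenance is omitted.

    :- dynamic kb_formula/3.                 % kb_formula(Id, Formula, Status)

    delete(Id) :-                            % hide the formula: it stays stored,
        retract(kb_formula(Id, F, active)),  % but is no longer visible to IP/RP
        assertz(kb_formula(Id, F, hidden)).

    undelete(Id) :-                          % recover an earlier deletion
        retract(kb_formula(Id, F, hidden)),
        assertz(kb_formula(Id, F, active)).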
3  zSDL: a strategy description language
Indexes, as sets of references to KB formulas, are the basic objects of the language zSDL, which uses id-sets as the basic elements to refer to nodes and to describe the visit of the search tree.
The underlying idea is that an SDL requires some mechanism to represent a proof tree, for the ideal search strategy for proving a given theorem is the description of the precise steps the reasoning module must follow to reach the proof nodes in the search tree. With an SDL we must be able to speak about the nodes of the tree (the formulas) and the relations between them (how to reach the parents of each node, following the ancestor relation, as well as how to reach the children of a node, following the descendants relation). Another useful property might be the ability to know the level of a tree node, in order to define a (partial) ordering between the steps made to reach the proof (a sequence of parallelizable steps).
From these observations we chose to use sets of nodes as the basic description objects. And zSDL turned out to be, in some sense, a set-operations oriented language. We will refer to a generic zSDL set of references to formulas to mean either an id-set or an index. A set is referred to by a (unique) name. It is something like a variable of type id-set.
In zSDL we can apply to the id-sets all of the common operations and relations on sets, plus some special
(procedural) ones like assignments, evaluation, etc. The
following is a list of these functions, giving in addition
some of the syntax of zSDL (recall that it is a Prolog byproduct). In zSDL an id-set is represented as a Prolog
list.
The Prolog variable names implicitly define the types of the operators in the following way:

SetName: the name of the variable that refers to the set.
SetExpr: an expression on sets, which can be an explicit set (a list), a SetName or an expression built up using the defined operations.
Var: a Prolog non-instantiated variable.
ElemOrVar: a Prolog variable (Var) possibly instantiated (Elem).

Notice that the SetExprs are evaluated.
o (zSDL.1) - set operations :

    +SetExprA ..  +SetExprB    % union
    +SetExprA .+  +SetExprB    % weak union
    +SetExprA .*  +SetExprB    % intersection
    +SetExprA .-  +SetExprB    % difference

The weak union makes no checks on repetitions.

o (zSDL.2) - set relations :

    ?ElemOrVar .?  +SetExpr    % membership
    +SetExprA  .=< +SetExprB   % containment
    +SetExprA  .<  +SetExprB   % strict containment
    +SetExprA  .=  +SetExprB   % equality

Notice that, using the Prolog negation, we also have the negations of these relations.

o (zSDL.3) - set procedures :

    +SetName := +SetExpr       % assignment
    -Var .-  +SetName          % extract 1st element
    -Var ..  +SetExpr          % evaluate
    ..  +SetName               % destroy the set

The pop operation treats the set as a stack.

As an example, in a zSDL-Prolog session we could have:

    | ?- a := [1,2,3],
         b := a .- [3,4,5].
    yes
    | ?- A .. a,
         B .. b,
         b += [6],
         X .. b.
    A = [1,2,3],
    B = [1,2],
    X = [1,2,6]

in which you see how zSDL sets are permanent objects, contrary to the classical Prolog variables.

This level of basic operations on (id-)sets must be enriched by statements to permit interaction with the TPE. We will show the basic calls zSDL defines to run an IP by developing the Prolog code that can realize it. We are looking for a statement responsible for executing the actual inference steps applicable on some given id-sets. Consider the zSDL syntax

o (zSDL.4) - directed superposition :

    +SetExprA ++> +SetExprB

After the evaluation of the id-set expressions, the general form of a call can be thought of as a superposition between the elements of the two resulting id-sets {A1, A2, ...} and {B1, B2, ...}. Obviously we expect this search to consider all the pairs, i.e. the TPE must be directed to try all the following superpositions:

    <A1,B1>, <A1,B2>, ...,
    <A2,B1>, <A2,B2>, ...,
    ...

This can be realized by the following straightforward Prolog code:

    SetExprA ++> SetExprB :-
        Ai .? SetExprA,
        Bj .? SetExprB,
        superpose(Ai, Bj),
        stop_search.
    SetExprA ++> SetExprB.
The only new predicate we used is stop_search/0. In fact, one item omitted from the TPE interface we have described is a test to control the status of the KB. Therefore, we extend the TPE interface with

• (TPE.6) - proof_found(-Int) :
Used to ask the status of the KB. The number of found proof(s) is given.

You can think of stop_search/0 as built from a proof_found/1 call followed by an appropriate comparison and by any other (possibly) necessary operations.
In addition to the ++>/2 operator, zSDL also defines the syntax

o (zSDL.5) - superposition :

    +SetExprA <+> +SetExprB

With the <+>/2 operator each couple is also reversed (except for the identical ones).

As we commented, the general form of an inference rule in zSDL is thought to be one that takes two formula references Id1 and Id2 as premises and produces the set of references [N1, N2, ..., Nm] of the derived formulas; the actual application of such a rule is called by superposing Id1 on Id2.

The first missing item is a way to get, in a zSDL program, the id-set of the generated formulas. With a typical Prolog attitude, we can generalize this problem. A superposition goal on id-sets is like evaluating a high-level function on a set. The relation that links the input and the output sets is different from the classical ones, for it is related to some properties of the objects in the sets and not to the sets themselves. This simply implies that the actual module responsible for the evaluation of these relations is not the classical one. And we know that that module must be the TPE. So we are looking for a syntax like

o (zSDL.6) :

    ?Index ::= +TPE_Goal

where a TPE_Goal can be, as an example, a superposition call. Notice that we defined the new operator ::=/2 in order to switch the evaluation to the right module. The call also suggests a possible model for the computation of the goal. In fact a goal of the TPE is generally requested to produce a new index (say a dynamic index) that is updated during the actual evaluation of the goal. Consider the following code.

    Given ::= TPE_Goal :-
        new_dynamic_index(NewSet),
        call(TPE_Goal),
        (  var(Given),
           Given .. NewSet
        ;  Given := NewSet ),
        del_dynamic_index(NewSet).

It asks the TPE to release a new dynamic index that will be updated during the execution of the given TPE_Goal to hold the result of the evaluation. This result is then properly assigned to the input Given argument and finally the dynamic index is cleared. This asks for the extension of the TPE interface with the two following calls:

• (TPE.7) - new_dynamic_index(-SetName) :
Ask the TPE to extend the set of active indexes. SetName will be used to refer to this new dynamic id-set. The complementary call is

• (TPE.8) - del_dynamic_index(+SetName) :
It is used to remove the index referred to by SetName from the set of the dynamic indexes known by the TPE.

With the new zSDL operator we can now use the following statement to sketch the application of an inference rule:

    NewIds ::= [Id1] ++> [Id2].

where NewIds will be instantiated to the right instance of [N1, N2, ..., Nm], even possibly the empty id-set. Notice that the ::=/2 operator works for each TPE goal.
The last extension we will give before going through some examples of an application of zSDL focuses on a way to have a local specification of the inference rules we wish to apply in a TPE goal. The zSDL syntax is:

o (zSDL.7) :

    +TPE_Goal ./ +Inferences

which defines a TPE evaluation modulo a given set of inference rules.
Suppose for example we wish to superpose clauses 3 and 15 only by binary resolution (binary_res). Consider the following code

    TPE_Goal ./ Inferences :-
        Active .. enabled_inferences,
        disable(Active),
        enable(Inferences),
        call(TPE_Goal),
        disable(Inferences),
        enable(Active).

With this new operator we can express the preceding problem as

    Resolvents ::= [3] ++> [15] ./ binary_res.
The code we have given assumes that the enable/1
and disable/1 calls in the TPE interface maintain one
set, called enabled_inferences, collecting the names of
the active inference rules.
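One plausible way to realize this bookkeeping with plain Prolog facts rather than with a zSDL index is sketched below; the fact name enabled_inference/1 and the helper enabled_rules/1 are assumptions for illustration, not the paper's code, and forall/2 and member/2 from the usual libraries are assumed.

    :- dynamic enabled_inference/1.

    enable(Rules)  :- is_list(Rules), !, forall(member(R, Rules), enable(R)).
    enable(Rule)   :-
        ( enabled_inference(Rule) -> true ; assertz(enabled_inference(Rule)) ).

    disable(Rules) :- is_list(Rules), !, forall(member(R, Rules), disable(R)).
    disable(Rule)  :- retractall(enabled_inference(Rule)).

    enabled_rules(Rules) :-                  % snapshot of the active rule names
        findall(R, enabled_inference(R), Rules).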
So in zSDL the more general IP activation call to the TPE is

    NewIds ::= ExprA <+> ExprB ./ Infs.

which will give in NewIds the id-set of all the formulas derivable by applying the chosen inferences to all the pairs of formulas implicitly referred to by the id-set expressions.
4  A simple zSDL program: the breadth-first strategy
Time has come to give the first example of the use of zSDL to describe a classical strategy: the breadth-first search. We suppose that the TPE is already active and some input formulas are present in the KB. An index called input collects the references to those statements. In the breadth-first search the next level of the tree is filled with all the conclusions given by superposing the last level with all the existing levels. The search stops when the search is complete or, for example, when a proof is found. The zSDL program is
    breadth_first :-
        levels := input,
        last   := input,
        while( ( \+ stop_search,
                 \+ last .= [] ),
               ( Next ::= last <+> levels,
                 last := Next,
                 levels += last ) ).
The two indexes, levels and last, refer to the entire tree and to its last level, respectively. The while/2 is the classical cyclic structure found in every procedural language. Its syntax is

o (zSDL.8) - while(+Condition, +Goal)

After the initialization of the values to the input references, the program repeatedly fills the Next level of the search tree, superposing the last level with all the nodes. Then the Next level becomes the last and is also added to the references of the entire tree. The += notation resembles the C language style assignments; similarly, zSDL also accepts the operators -= and *=. Notice also that the instances of the Prolog variable(s) in the while/2 statement are released between the cycles.
The preceding algorithm can be improved by thinking of the cases it generates. When we superpose the last level with the entire tree, we must note that all of the nodes in last are already in levels. Furthermore, if we apply the <+> operator to superpose an id-set on itself, we try all of the pairs twice. So, a better program is

    breadth_first :-
        last   := input,
        others := [],
        while( ( \+ stop_search,
                 \+ last .= [] ),
               ( LL ::= last ++> last,
                 LO ::= last <+> others,
                 others := last .+ others,
                 last := LL .+ LO ) ).

In this definition the last index refers again to the last level of the tree while others refers to the rest of the tree. At each step last is superposed on itself (with the oriented operation ++>) and then with the upper levels of the tree. You might also note that in this way we can substitute the use of the standard union with the weak one (append), as no repetitions are possible in the references in the indexes.
In addition to the while statement, zSDL defines some other basic control structures:

o (zSDL.9) - foreach(+Generator, +Goal) :
Goal is executed for all the solutions of the given Generator.

o (zSDL.10) - repeat(+Goal, +Condition) :
Goal is executed at least once and re-executed while Condition fails.

o (zSDL.11) - if(+Condition, +Goal) :
Goal is executed only if the Condition holds. It always succeeds.

This list is given only for completeness: the reader might note that zSDL programs are basically extended Prolog programs and that all the structures definable on the underlying Prolog machine can be used by zSDL programs.
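As an indication of how thin this layer is on top of Prolog, here is one possible realization of these control structures; this is a sketch under assumptions, not the paper's code, written so that variable bindings made inside a body are released between cycles while side effects on the zSDL sets persist in the database.

    while(Condition, Goal) :-
        (   call(Condition)
        ->  \+ \+ call(Goal),          % run the body once, discard its bindings
            while(Condition, Goal)
        ;   true ).

    foreach(Generator, Goal) :-        % Goal for every solution of Generator
        \+ ( call(Generator), \+ call(Goal) ).

    repeat(Goal, Condition) :-         % at least once, then while Condition fails
        \+ \+ call(Goal),
        (   call(Condition) -> true
        ;   repeat(Goal, Condition) ).

    if(Condition, Goal) :-             % only if Condition holds; always succeeds
        (   call(Condition) -> ( call(Goal) -> true ; true ) ; true ).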
However, we think that one really important aspect of the <TPE, zSDL> Prolog-based architecture comes from its direct executability on a Prolog machine. The global proving system loses the property of being either batch or interactive: a proof search is directed by the execution of goals, and the granularity of these steps can vary from the single superposition to the entire search.
5  More complex applications

The availability of a language like zSDL adds to the ease of implementing and experimenting with new ideas, for example, non-standard search strategies. To illustrate the value of using zSDL, and to introduce some additional features of this language, we now focus on three somewhat complex programs. The first defines an adaptive, weighting-based, search strategy. The second introduces some atypical deletion strategy into the search. The last one shows how to define a strategy tailored to a given inference rule.
5.1  A weight-based adaptive strategy
By weighting (w) strategies we refer to those algorithms structured to consider the length, or weight, of the formulas. Examples of w-functions are: the number of symbols in a formula, the number of (positive, negative, total) literals in a clause, as well as linear functions built on these or other values. The general behavior of a w-strategy is to filter the retention in the KB of a newly generated formula, according to the given w-function. Formulas that are too heavy are discarded. The underlying intuitive idea is that if a proof can be obtained without the use of heavy formulas, then such formulas can be discarded.
We shall not consider the well-known subproblems that the subsumption operation can lead to, which vary with the w-function adopted. Instead, we consider one of the practical difficulties in the application of these strategies, namely, choosing the appropriate threshold (upper bound on weights) to use for deciding which formulas to discard. The solution we propose follows this simple idea: the threshold can be increased, when the search stops generating formulas, and set to the lightest weight template in the set of the w-deleted formulas. In this sense the search is adaptive: it adapts to the performance of the program.
Let us first show the mechanisms provided by the TPE to support w-strategies. Each formula is stored with a weight template. An internal function, namely, weight(+Formula, -W_Template), is used by the TPE to calculate it. Such a template consists of a 4-integer tuple (N-P-T-S) that counts Negative Literals, Positive Literals, Total Literals and Symbols, where the first three values are "0" if the formula is not a clause. The TPE offers some calls in order to define weighting-based strategies:

• (TPE.9) - max_weights(?W_Template) :
The call can be used either to access the current reference w-template (if W_Template is an uninstantiated variable at the call) or to set a new value for it. The new given W_Template will be used by the w-filter operation to decide which new formulas to accept or discard. All the values for the new formulas must be less than or equal to the threshold ones fixed by the given W_Template. The value of a variable will be considered greater than each integer.

• (TPE.10) - formula_weight(+Id, -W_Template) :
Accesses the given formula(s) to get their weight template(s).
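For illustration only (the paper does not show weight/2), here is a minimal sketch of the weight-template computation described above, under the assumption that a clause is represented as a Prolog list of literals with negative literals wrapped in n/1; the symbol-counting convention is likewise illustrative.

    weight(Clause, N-P-T-S) :-
        is_list(Clause), !,                  % a clause: count its literals
        count_literals(Clause, N, P),
        T is N + P,
        symbol_count(Clause, S).
    weight(Formula, 0-0-0-S) :-              % not a clause: only the symbol count
        symbol_count(Formula, S).

    count_literals([], 0, 0).
    count_literals([n(_)|Ls], N, P) :- !, count_literals(Ls, N0, P), N is N0 + 1.
    count_literals([_|Ls],    N, P) :- count_literals(Ls, N, P0), P is P0 + 1.

    symbol_count(T, 1) :- ( var(T) ; atomic(T) ), !.
    symbol_count(T, S) :-
        T =.. [_|Args],
        args_count(Args, 1, S).

    args_count([], S, S).
    args_count([A|As], Acc, S) :-
        symbol_count(A, SA),
        Acc1 is Acc + SA,
        args_count(As, Acc1, S).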
The basic behavior of the strategy we are going to write is straightforward. At each time we choose the lightest not yet used formula in the KB to be superposed with all the already used ones. Then we move the given formula to the set of the used ones (say "done") while the newly generated formulas are added to the first set (say "to_do"). We can express this with the following zSDL program:

    to_do := [],
    done  := [],
    Input .. input,
    add_ordered(Input, to_do),
    while( ( \+ stop_search,
             \+ to_do .= [] ),
           ( Lightest .- to_do,
             add_ordered([Lightest], done),
             New ::= [Lightest] <+> done,
             add_ordered(New, to_do) ) ).
As one sees, we solved the problem of getting the lightest formula in a set by extracting the first element from an ordered set. The expected side effect of an add_ordered(Set, SetName) call is to build an ordered union of Set and SetName (into SetName) according to the weight of the corresponding formulas. We can obtain this with:

    add_ordered([], _SetName).
    add_ordered(Set, SetName) :-
        Xet .. SetName,
        %% gets a list of Count-Id pairs
        get_counts(Set, SetCs),
        get_counts(Xet, XetCs),
        append(SetCs, XetCs, YetCs),
        %% sorts by counts
        keysort(YetCs, ZetCs),
        %% removes the counts
        pop_counts(ZetCs, Zet),
        SetName := Zet.
where the get_counts/2 call accesses the weight templates of the formulas to get the symbol counts (obviously, one can choose different approaches).
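The paper does not show get_counts/2 and pop_counts/2; one plausible definition, pairing each id with the symbol count S of its N-P-T-S template via formula_weight/2 (TPE.10), is:

    get_counts([], []).
    get_counts([Id|Ids], [S-Id|Rest]) :-
        formula_weight(Id, _N-_P-_T-S),      % keep only the symbol count
        get_counts(Ids, Rest).

    pop_counts([], []).
    pop_counts([_S-Id|Rest], [Id|Ids]) :-    % strip the counts after keysort/2
        pop_counts(Rest, Ids).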
To extend our strategy to be self-adaptive we have to solve certain problems:

* how to get information on the deleted formulas;
* how to choose some initial value for the reference w-template.
The first problem rests entirely on the TPE behavior, as the "over-weight" deletions are embedded into its operations. Our system maintains a set of structures, indexed by weight template, to hold the references to the deleted formulas. The call

• (TPE.11) - queue(wdel(?W_Template), ?Queue) :
Queue holds the ids of all the deleted formulas sharing the same W_Template.
We first give the extended program that realizes the self-adaptive search, and then we discuss its main steps.

    self_adaptive :-
        input_weighting,
        to_do := [],
        done  := [],
        Input .. input,
        add_ordered(Input, to_do),
        while( ( \+ stop_search,
                 ( \+ to_do .= []
                 ; q_exists(wdel(_)) ) ),
               once( ( to_do .= [],
                       lightest_deleted(Count),
                       closest_wtemplate(Count, NewWT),
                       max_weights(NewWT),
                       deleted := [],
                       add_deleted(Count, deleted),
                       Unhide .. deleted,
                       Restored ::= undelete(Unhide),
                       add_ordered(Restored, to_do)
                     ; Lightest .- to_do,
                       add_ordered([Lightest], done),
                       New ::= [Lightest] <+> done,
                       add_ordered(New, to_do) ) ) ).
The first difference concerns the while condition: it now considers the possible presence of formulas deleted by weight, so the search is complete only if no deleted formulas remain. The lightest_deleted/1 call accesses the deletion queues, searching for the lightest-weight formula. Its definition can be:

    lightest_deleted(Count) :-
        setof( SymCount,
               N^P^L^Q^queue(wdel(N-P-L-SymCount), Q),
               Deleted ),
        sort(Deleted, [Count|_Others]).
The closest_wtemplate/2 call is responsible for deciding the value for the new reference weight template, or, in other words, for the "size of the adaptation step". The following definition builds the new template in order to accept all the formulas with the given smallest deleted symbol count.

    closest_wtemplate(Count, Template) :-
        setof( N-P-L-Count,
               Q^queue(wdel(N-P-L-Count), Q),
               Deleted ),
        max_weights(CurrentWT),
        max4([CurrentWT|Deleted], Template).

where the call to max4/2 builds the Template given by the maximal values for each count.
The add_deleted/2 call is conceptually similar to the add_ordered/2 call, but works with the deletion queues. Its definition can be:

    add_deleted(Count, SetName) :-
        (  queue(wdel(N-P-L-Count), Queue),
           SetName += Queue,
           q_del(wdel(N-P-L-Count)),
           fail
        ;  true ).

It collects into SetName all the references to the deleted formulas with the given symbol Count and deletes the corresponding queues (q_del/1).
So, in the while loop of our program, the to_do id-set is extended either by newly inferred formulas or by reactivating the lightest deleted ones (if any).
A last point addresses the choice of the initial values for the reference w-template. A strategy that has given us interesting results fixes the values by looking at the counts of the input formulas and choosing the lowest values among them. Its definition is:

    input_weighting :-
        Input .. input,
        formula_weight(Input, WTs),
        max4(WTs, Template),
        max_weights(Template).
5.2  A pruning strategy

This second example of the applications of the zSDL language is given to show how it can be used to define some self-analytical activity for the proving process. In other words we can use it to reason about the current state of the search during the execution.
A well-known problem each ATP program must face is the possible explosion of the search space, which can occur for various reasons. Here we do not study this topic, nor do we suggest that our program has a deep impact on the solution of the general problem. Our goal is only to show how an SDL language can be useful in different research areas of ATP.
We observe that our pruning strategy is based on the nondeterminism in the order of application of the inference steps. On the other hand, the use of reduction rules comes from the wish to have a KB capture the same logical consequences with a smaller representation "size". Consider now a generic search process and suppose a reduction step occurs. With "reduction" we will refer to the results of an operation able to change the structure of a formula, maintaining its logical value. Generally speaking a reduction step will reformulate a formula by "reducing" its complexity and/or size. This transformation will in general involve other formulas used as a base for the logical reformulation. As an example, consider the following steps on two generic clauses

    [1] ¬A | B,   [2] ¬A | ¬B | C
        binary resolution:  [3] ¬A | C
        subsumption:        delete [2]

We can view this step as the application of a reduction rule that uses [1] to transform [2] into [3]. We note that the satisfiability of the overall KB is preserved, i.e. the operation maintains the logical truth of the set of formulas.
Suppose next that such a reduction has occurred during a search, say a formula F has been reduced to F'. There now exists a potential set of formulas whose generation depends on the order in which the search process has been executed: this set consists of all of the descendants of F that have not contributed to the generation of F', or, more precisely, the set

    by_inference(descendants(F)) - ancestors(F').

(We note that we must leave all the descendants of F given by reduction, as those are formulas originally not in the set generated by F.)
Pruning this set (if not empty) could perhaps make the proof longer, as the proof could be reachable rapidly by using one of the formulas we deleted, but it will not preclude the possibility of finding the proof if there is one.
The effectiveness of this pruning strategy depends mainly on the effectiveness and the applicability of reduction steps in a proof, and so it relies directly on the structure of the search space (given by the formulas asserting the theorem).
Let us now see how we can implement this operation by using zSDL and the mechanisms of the TPE. First of all we formalize the calls the TPE defines (and zSDL inherits) to access various relations on the content of the KB. We already announced some of them in Section 2.

• (TPE.12) - parents(+Id, -Parents)
• (TPE.13) - ancestors(+Id, -Ancestors)
• (TPE.14) - children(+Id, -Children)
• (TPE.15) - descendants(+Id, -Descendants) :
With Id the reference to a formula, these calls will respectively return the id-set of its parents, ancestors, children, and descendants, with respect to the current KB. We note that the given id-set may contain references to currently inactive formulas (deleted for some reason). All these relations will consider both inference as well as reduction steps.

• (TPE.16) - by_reduction(+IdSet, -ByRed) :
Given an IdSet this call selects which referred formulas have been produced by application of a reduction rule, building the id-set ByRed with their ids.

• (TPE.17) - replace(?NewId, ?Id) :
The call succeeds if NewId refers to a formula that replaces an old one (referred to by Id) following a reduction step. Otherwise the call fails.
The proposed pruning strategy acts like a filter on the result of a superposition call: at each step it checks if the new formulas are given by reduction, in which case it tries to apply the deletion. So, we are going to extend the superposition control level of zSDL with a meta-call realizing the pruning.

    pruning_derive(SetA, Mode, SetB) :-
        XA .? SetA,
        XB .? SetB,
        once( (
            Given ::= derive([XA], Mode, [XB]),
            by_reduction(Given, ByRed),
            foreach( NId .? ByRed,
                     ( replace(NId, Id),
                       ancestors(NId, NIdAnc),
                       descendants(Id, IdDes),
                       by_reduction(IdDes, IdDesByRed),
                       IdDesByInf .. IdDes .- IdDesByRed,
                       DelSet .. IdDesByInf .- NIdAnc,
                       delete(DelSet) ) ) ) ),
        stop_search.
    pruning_derive(SetA, Mode, SetB).

    derive(SetA, (<+>), SetB) :- SetA <+> SetB.
    derive(SetA, (++>), SetB) :- SetA ++> SetB.
The schema is quite simple. Each by-reduction child (NId) of a superposition call is related to the formula it replaces (Id). Then the set of the by-inference descendants of Id is reduced by the set of the NId ancestors. Notice how the by_inference(descendants(F)) set is evaluable as descendants(F) - by_reduction(descendants(F)).
5.3  A hyperresolution-oriented search strategy

Our last example uses zSDL to define a strategy specifically oriented to work with a given inference rule, namely, the inference rule hyperresolution.
The efficiency of an ATP system comes from the efficiency of all of the different components of the program, from the basic unification and match algorithms to the KB management, and so on. With some "tough" inference rule, it also heavily relies on the ability of the search strategy to control its application, ensuring a complete search without repeating steps. Hyperresolution is one such inference rule.
Hyperresolution considers a basic clause (called nucleus) that has one or more negative literals. An inference step occurs when a set of positive unit clauses (called satellites) is found that simultaneously unify with all of the negative literals of the nucleus. It is simple to see how hyperresolution will not generate new nuclei (for the rule cannot produce a clause containing negative literals) while it can generate new satellites. So, the set of potential satellites changes dynamically during the search, and a good strategy must ensure a complete covering partition (with multiple occurrences) of this set without repeating trials.
We first explain how we implemented the hyperresolution inference rule in our system (we call it hy_p). As usual the rule has two arguments: the first must be a satellite and the second a nucleus. If a unification is found between the given satellite and one of the negative literals in the nucleus, then the set of the current active satellites is partitioned and superposed on the remaining negative literals.
This behavior suggests the development of a search strategy driven by the generation of new satellites. In fact, we can visit the search space by levels, generate all the possible hyperresolvents, choose from them the new satellites, and use those to drive the search in the next level. As those satellites are new, the partitions we will try are new too, and no repetition in the trials occurs. The basic shape of the strategy can be:
The basic shape of the strategy can be:
hyper_strategy :Input .. input,
get_satellites(Input,Sats),
get_nuclei(Input,Nucs),
last_sats := Sats,
nucs :- Nucs,
while( ( \+ stop_search,
\+ last_sats .K [ ] ) , (
New ::- last_sats ++> nucs ./ hy_p,
get_satellites(New,NewSats),
last_sats := NewSats ) ).
The get_satellites/2 and get_nuclei/2 calls are used to choose from an id-set the subset of formula references corresponding, respectively, to valid satellites and nuclei. Notice how these calls can be defined by using the formula_weight/2 call and testing the negative and positive literal counts accordingly; a possible definition is sketched below.
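The following is an illustrative sketch along those lines, not the paper's code; it assumes the N-P-T-S weight template of Section 5.1 and treats a positive unit clause as a satellite and a clause with at least one negative literal as a nucleus, as in the description above.

    get_satellites([], []).
    get_satellites([Id|Ids], Sats) :-
        formula_weight(Id, N-P-_T-_S),
        (   N =:= 0, P =:= 1                 % a positive unit clause
        ->  Sats = [Id|Rest]
        ;   Sats = Rest ),
        get_satellites(Ids, Rest).

    get_nuclei([], []).
    get_nuclei([Id|Ids], Nucs) :-
        formula_weight(Id, N-_P-_T-_S),
        (   N >= 1                           % at least one negative literal
        ->  Nucs = [Id|Rest]
        ;   Nucs = Rest ),
        get_nuclei(Ids, Rest).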
As a matter of fact, the algorithm we have given follows closely the general schema of a breadth-first search. So, it can be simply extended to consider the application of more inference rules, intermixing the searches with the control permitted by the enable/1 and disable/1 operations.
6  Conclusions

This work introduces the concepts of Theorem Proving Engine and Strategy Description Language as architectural and applicative tools in the design and use of an automated theorem-proving program.
The definitions we give reflect a running Prolog system, named zEN2, and, because of this fact, they inherit a Prolog style and structure.
Particular emphasis is given to the use of an SDL as a research tool as well as a way to reinterpret the use of a theorem prover as a batch or as an interactive program. In fact, the availability of an interpreter for such a language offers the possibility of having a system able to cover both of these usages, giving the user some way of choosing the granularity of the steps the prover must take.
Three examples are given to show the possible applications of an SDL. Their purpose is to show its usefulness for expressing and testing new ideas. Some interesting capabilities of zSDL are applied to highlight how it allows the treatment of self-analysis on the state of the search space. Examples of these are the definition of the self-adaptive search and the pruning strategy.
Acknowledgments

The author is very grateful to Larry Wos, Bill McCune and Gianni Degli Antoni for their comments. This work was partially supported by the CEE ESPRIT2 KWICK Project and partially by a grant of the Italian Research Council. Most of the work was done while the author was visiting the Mathematics and Computer Science Division of the Argonne National Laboratory.
A New Algorithm for Subsumption Test
Byeong Man Kim*, Sang Ho Lee**, Seung Ryoul Maeng*, and Jung Wan Cho*
* Department of Computer Science & Center for Artificial Intelligence Research
Korea Advanced Institute of Science and Technology, Dae-Jeon, Korea
** Database Section
Electronics and Telecommunications Research Institute, Dae-Jeon, Korea
Abstract

To reduce the number of generated clauses in resolution-based deduction systems, subsumption has been around for quite a long time in the automated reasoning community. It is well known that the use of subsumption sharply improves the effectiveness of theorem proving. However, subsumption tests can be very expensive because they must be applied repeatedly and are relatively slow. There have been several attempts to overcome the expensiveness of subsumption. One of them is the s-link test based on the connection graph procedure. In the s-link test, it is essential to find a set of pairwise strongly compatible matching substitutions between literals in two clauses. This paper presents an improved algorithm of the s-link test with a new object, called the strongly compatible list. By use of the strongly compatible lists and appropriate bit operations on them, the proposed algorithm reduces the possible combinations of matching substitutions between literals as well as improves the pairwise strongly compatible test itself. Two other subsumption algorithms and our algorithm are analyzed in terms of the estimated maximal number of string comparisons. Our analysis shows that the worst-case time complexity of our algorithm is much lower than that of the other algorithms.
1  Introduction
Logical reasoning (or theorem proving) is the key to solving many puzzles, to solving problems in mathematics, to designing electronic circuits, to verifying programs, and to answering queries in deduction systems. Logical reasoning is a process of drawing conclusions that follow logically from the supplied facts. Since first-order predicate logic is generally sufficient for logical reasoning and offers the advantage of being partially decidable, it is widely used in automated reasoning.
There have been a number of approaches to show that a formula is a logical consequence of a set of formulas. Notable among them is Robinson's resolution principle [Robinson 1965], which is very powerful and uses only one inference rule. Many refinements of the resolution principle based on graphs have been proposed to increase the efficiency [Kowalski 1975, Sickel 1976, Andrew 1981, Bibel 1981, Kowalski 1979]. One of them is Kowalski's connection graph proof procedure [Kowalski 1975, Kowalski 1979], which has some distinct advantages over previous approaches based upon resolution.
1. Once an initial connection graph is constructed, all information is present as to which literals are potentially resolvable, so that no further search for unifiable complementary literals is needed.

2. Application of a deletion operation can result in further deletion operations, thus potentially leading to a snowball effect which reduces the graph rapidly. The probability of this effect rises with the number of deletion rules available.

3. The presence of the complete search space during the connection graph proof procedure suggests the opportunity to use parallel evaluation strategies [Loganantharaj 1986, Loganantharaj 1987, Juang 1988] to improve the efficiency.
Various deletion strategies [Munch 1988, Gottlob and Leitsch 1985, Chang and Lee 1973] have been suggested to reduce the number of clauses generated in theorem proving (automated reasoning). A very powerful deletion rule in resolution-based deduction systems is subsumption [Eisinger 1981, Wos 1986]. Subsumption is used not only to discard a newly deduced clause when a copy already has been retained, but also to discard other types of unneeded information. The use of subsumption sharply improves the effectiveness of theorem proving, as illustrated by the benchmark problem, Sam's Lemma [Wos 1986].
However, the use of subsumption can be quite expensive because it must be repeated very often and is relatively slow [Wos 1986]. There have been two approaches to overcoming the expensiveness of subsumption. One is to reduce the number of necessary subsumption tests [Eisinger 1981], and the other is to improve the subsumption test itself [Gottlob and Leitsch 1985, Stillman 1973]. Eisinger [Eisinger 1981] proposes the s-link test, which is based on the principal ideas of the connection graph proof procedure. His method provides an efficient preselection which singles out clauses D that do not possess the appropriate links to the clause C. Having preselected the candidates, we need to compose matching substitutions from literals in clause C to literals in clause D to find a matcher θ from C to D. In some cases many compositions are possible and hence the search for θ becomes quite expensive. Socher [Socher 1988] improves the search procedure by imposing restrictions on the possible matching substitutions.
In this paper we propose an improved s-link test with a new object, called the strongly compatible list. By use of the strongly compatible lists and appropriate bit operations on them, the proposed algorithm reduces the possible combinations of matching substitutions between literals as well as improves the pairwise strongly compatible test itself. Two subsumption algorithms (Eisinger, Socher) and our algorithm are analyzed in terms of the estimated maximal number of string comparisons. Our analysis shows that the worst-case time complexity of our algorithm is much lower than that of the other algorithms.
In the next chapter, preliminary definitions and the s-link test are presented. A new subsumption algorithm based on strongly compatible lists, its related work, and its analysis are given in Chapter 3 and Chapter 4, respectively. In Chapter 5, our work is summarized.
2  Preliminaries
We assume that the readers are familiar with materials
in [Chang and Lee 1973]. A variable starts with an upper case letter and a constant starts with a lower case
letter.
Definition 2.1  A substitution σ is a mapping from variables to terms.

We represent a substitution σ with siσ = ti for each i (1 ≤ i ≤ n) by the set of pairs {t1/s1, ..., tn/sn}, and represent the composition of the substitutions σ and τ by σ•τ. For convenience, we denote σ1•...•σn by •(i=1..n) σi.

Definition 2.2  Two substitutions σ and τ are strongly compatible if σ•τ = τ•σ.

Definition 2.3  Substitutions σ1, ..., σn are pairwise strongly compatible if any two substitutions σi, σj ∈ {σ1, ..., σn} are strongly compatible.

Definition 2.4  A matching substitution from a term (or a literal) s to a term (or a literal, respectively) t is a substitution μ such that sμ = t.

Definition 2.5  uni(C, li, D) is the set of all matching substitutions mapping a literal li in clause C onto some literal in clause D.

For example, given C = {p(X,Y), q(Y,c)} and D = {p(a,b), p(b,a), q(a,c)}, we have uni(C, p(X,Y), D) = {{a/X, b/Y}, {b/X, a/Y}} and uni(C, q(Y,c), D) = {{a/Y}}.

Definition 2.6  If there is a τ with θ = σ•τ for any other unifier θ for s and t, then σ is a most general unifier (mgu) for s and t.
To reduce the search space in theorem proving, redundant clauses must be removed. A redundant clause is a clause whose removal does not affect the unsatisfiability. Redundant clauses include tautologies and subsumed clauses. Subsumption can be defined in two ways.

Definition 2.7  A clause C1 subsumes another clause C2 if C1 logically implies C2.

Definition 2.8  A clause C1 θ-subsumes another clause C2 if |C1| ≤ |C2| and there is a substitution θ such that C1θ ⊆ C2.

It has been shown [Gottlob and Leitsch 1985, Loveland 1978] that these two definitions are not equivalent. If we use the first definition, then most of the resolution-based proof procedures are not complete, because a clause always subsumes its factors. In this paper we are concerned only with θ-subsumption.
In order to perform a subsumption test on two given clauses, we must find a matcher θ such that Cθ ⊆ D. It is well known that finding such a θ is NP-complete [Gottlob and Leitsch 1985] and the search for θ may become expensive. There have been some efforts to reduce the cost of finding a matcher θ [Gottlob and Leitsch 1985, Socher 1988, Chang and Lee 1973, Eisinger 1981, Stillman 1973]. One of them is the s-link test based on the connection graph procedure. The subsumption test based on the s-link is provided by the following theorem [Eisinger 1981]:

Theorem 2.1  Let C = {l1, ..., ln} and D be clauses. Then C θ-subsumes D if and only if |C| ≤ |D| and there is an n-tuple (σ1, ..., σn) ∈ uni(C, l1, D) × ... × uni(C, ln, D) such that all σi (1 ≤ i ≤ n) are pairwise strongly compatible.
Example 2.1 (of Theorem 2.1 [Socher 1988])  Given a set {C, D1, D2, D3} of clauses with C = {p(X,Y), q(Y,c)}, D1 = {p(a,c), r(b,c)}, D2 = {p(U,V), q(V,W)} and D3 = {p(a,b), p(b,a), q(a,c)}, one wants to find out which clauses are subsumed by C. D1 can be excluded because the literal q(Y,c) in C is not unifiable with any literal in D1, that is, there is no s-link from q(Y,c) to a literal in D1. D2 cannot be a candidate because uni(C, q(Y,c), D2) = {}. For D3 we obtain the two pairs (σ1, τ) and (σ2, τ), where σ1 = {a/X, b/Y}, σ2 = {b/X, a/Y} and τ = {a/Y}. From these two pairs only (σ2, τ) is strongly compatible and thus C subsumes D3. □
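To make Definition 2.2 and this example concrete, the following is a small Prolog sketch under stated assumptions: substitutions are represented as lists of Variable-Term pairs whose terms are constants (as in the example), the clause variables are listed explicitly, and all predicate names are illustrative rather than taken from the paper.

    variable(x).  variable(y).               % variables of C = {p(X,Y), q(Y,c)}

    apply_subst(V, Subst, T) :-
        ( memberchk(V-T0, Subst) -> T = T0 ; T = V ).

    % V under (S1 . S2): apply S1 first, then S2 (the ranges here are constants,
    % so the second step only matters when the first left V unchanged).
    compose_apply(V, S1, S2, T) :-
        apply_subst(V, S1, T1),
        ( variable(T1) -> apply_subst(T1, S2, T) ; T = T1 ).

    strongly_compatible(S1, S2) :-           % S1.S2 = S2.S1 on every variable
        \+ ( variable(V),
             compose_apply(V, S1, S2, T12),
             compose_apply(V, S2, S1, T21),
             T12 \== T21 ).

With this sketch, strongly_compatible([x-b, y-a], [y-a]) succeeds (the pair (σ2, τ) above), while strongly_compatible([x-a, y-b], [y-a]) fails (the pair (σ1, τ)), matching the example.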
As shown in Example 2.1, in order to find the clauses that are subsumed by a clause C = {l1, ..., lm}, first we have to preselect clauses that are connected to every literal in C by s-links of a connection graph. If D is such a clause, then each literal in C is unifiable with some literal in D. For such a candidate D, we need to perform a pairwise strongly compatible test on all elements of uni(C, l1, D) × ... × uni(C, lm, D).

3  A New Subsumption Algorithm Based on Strongly Compatible Lists
The s-link test [Eisinger 1981] for long clauses with more
than one matching substitution for each literal may require an expensive search of all elements of the Cartesian
product.
We define the strongly compatible list of matching
substitutions in order to improve the s-link test. With
the strongly compatible lists, we can single out useless matching substitutions and improve the pairwise
strongly compatible test itself.
The following three bit operations are used in this paper:

    w1 + w2 : bitwise disjunction of w1 and w2
    w1 * w2 : bitwise conjunction of w1 and w2
    ~w      : bitwise complementation of w

where w, w1 and w2 are bit sequences. For convenience, we denote w1 + ... + wn by +(i=1..n) wi. Similarly, we denote w1 * ... * wn by *(i=1..n) wi.
To test whether two given matching substitutions are strongly compatible, we need the following definition.

Definition 3.1  Let {v1, ..., vn} be an ordered set of variables in clause C, and let a matching substitution σ between literals in clauses C and D be {t1/s1, ..., tm/sm}. δ(σ) is an n-length list such that the ith element is tj if vi = sj, ...

... δ(σ3) = (>, a), δ(σ4) = (>, b). We can calculate the following strongly compatible lists of matching substitutions:

    β(σ1) = p1(a) * p2(>) = 1011 * 1111 = 1011
    β(σ2) = p1(b) * p2(>) = 0111 * 1111 = 0111
    β(σ3) = p1(>) * p2(a) = 1111 * 1110 = 1110
    β(σ4) = p1(>) * p2(b) = 1111 * 1101 = 1101.

From these strongly compatible lists, we can obtain γ(σ1) and γ(σ2) as follows:

    γ(σ1) = β(σ1) * ~μ1 = 1011 * 0111 = 0011
    γ(σ2) = β(σ2) * ~μ2 = 0111 * 1011 = 0011.

Since γ(σ1) * ~γ(σ2) = 0, we obtain the relation σ1 ≤σ σ2. Similarly, we obtain the relation σ3 ≤σ σ4. □
Theorem 3.2  Let C = {l1, ..., lm} and D be clauses, σ ∈ uni(C, lk, D) and σ' ∈ uni(C, lk, D) for some k (1 ≤ k ≤ m), σ ≤σ σ', and σi ∈ uni(C, li, D) for each i (1 ≤ i ≤ m, i ≠ k). If there is a θ such that Cθ ⊆ D and θ = σ1 • ... • σk-1 • σ • σk+1 • ... • σm, then there is a θ' such that Cθ' ⊆ D and θ' = σ1 • ... • σk-1 • σ' • σk+1 • ... • σm.

(Proof) Let us suppose that there is a θ such that Cθ ⊆ D and θ = σ1 • ... • σk-1 • σ • σk+1 • ... • σm. Then σ1, ..., σk-1, σ, σk+1, ..., σm are pairwise strongly compatible by Theorem 2.1. Since σ' includes σ, σ' is strongly compatible with σ1, ..., σk-1, σk+1, ..., σm. Thus σ1, ..., σk-1, σ', σk+1, ..., σm are pairwise strongly compatible. Therefore, there is a θ' such that Cθ' ⊆ D and θ' = σ1 • ... • σk-1 • σ' • σk+1 • ... • σm by Theorem 2.1. □

By Theorem 3.2, we do not need to perform a strongly compatible test on the combinations of matching substitutions which contain a matching substitution σ1 such that σ1 ∈ uni(C, li, D), σ2 ∈ uni(C, li, D), and σ1 ≤σ σ2. In Example 3.5, we can remove σ1 and σ3 because σ2 and σ4 include σ1 and σ3, respectively.
As Theorem 3.1 (useless theorem) and Theorem 3.2 (included theorem) suggest, we can remove the useless or included matching substitutions before we take a pairwise strongly compatible test. We call a matching substitution which is either useless or included unnecessary.
One phenomenon we want to point out is that a matching substitution becomes unnecessary due to the propagation of deletion, so needs to be deleted. Therefore we
should keep deleting unnecessary matching substitutions
until there is no more such matching substitution. For
examples, let 0"1, 0"2 and 0"3 be matching substitutions
from literals in C to literals in D, and let the number of
literals in C be 3. Suppose that 0"1 is strongly compatible with 0"2 and 0"3, and 0"2 is not strongly compatible
with 0"3. Then 0"1 is not a useless matching substitution.
However, the removal of useless matching substitutions,
0"2 and 0"3, causes 0"1 to be a useless matching substitution and thus it can be removed.
Let C = {l1, ..., ln} and D be clauses. Then, in the worst case, O(n²) strongly compatible tests will be needed for each combination (σ1, ..., σn) ∈ ×_{i=1}^{n} uni(C, li, D) in order to check whether C subsumes D. However, given β(σi), we can enhance the performance of a subsumption test by the following theorem.
Theorem 3.3 Let C = {l1, ..., lm} and D be clauses, {σ1, ..., σn} be a set of matching substitutions from literals in C to literals in D, and {σ_{x1}, ..., σ_{xm}} be a subset of {σ1, ..., σn}. There is a θ = σ_{x1} • ... • σ_{xm} such that Cθ ⊆ D and σ_{xk} ∈ uni(C, lk, D) for each k (1 ≤ k ≤ m) if and only if *_{k=1}^{m} β(σ_{xk}) * +_{k=1}^{m} μ_{xk} = +_{k=1}^{m} μ_{xk}.
(Proof) (←) Since *_{k=1}^{m} β(σ_{xk}) * +_{k=1}^{m} μ_{xk} = +_{k=1}^{m} μ_{xk}, by Lemma 3.2, σ_{x1}, ..., σ_{xm} are strongly compatible with each of {σ_{x1}, ..., σ_{xm}}. Therefore σ_{x1}, ..., σ_{xm} are pairwise strongly compatible. Thus, by Theorem 2.1, there is a θ = σ_{x1} • ... • σ_{xm} such that Cθ ⊆ D and σ_{xk} ∈ uni(C, lk, D) for each k (1 ≤ k ≤ m).
(→) By Theorem 2.1, σ_{x1}, ..., σ_{xm} are pairwise strongly compatible. Therefore, by Lemma 3.2, the xi-th bit of β(σ_{xk}) for each i, k (1 ≤ i, k ≤ m) is 1. Thus *_{k=1}^{m} β(σ_{xk}) * +_{k=1}^{m} μ_{xk} = +_{k=1}^{m} μ_{xk}. □
Now we can formulate a new algorithm that returns a pairwise strongly compatible set {σ1, ..., σm} such that (σ1, ..., σm) ∈ ×_{i=1}^{m} uni(C, li, D) if one exists, and otherwise returns {}. The detailed algorithm, Pairwise Strongly Compatible Test (PSCT), is described in Figure 1 and can be summarized as follows:
1. Calculate the strongly compatible list for each matching substitution.
2. Remove unnecessary matching substitutions until there is no such matching substitution.
3. Find an m-tuple (σ1, ..., σm) such that *_{k=1}^{m} β(σk) * +_{k=1}^{m} μk = +_{k=1}^{m} μk.
Example 3.6 Given C = {p(X, Y), r(Y, Z), s(X, Z)} and D = {p(b,a), p(a,b), r(a,d), r(b,c), s(a,d), s(a,c)}, we want to find a substitution θ such that Cθ ⊆ D. Let {X, Y, Z} be an ordered set of variables in C. Then, we can obtain that
M_{p(X,Y)} = 110000, M_{r(Y,Z)} = 001100, M_{s(X,Z)} = 000011,
β(σ1) = 101000, β(σ2) = 010111, β(σ3) = 101010,
β(σ4) = 010101, β(σ5) = 011010, β(σ6) = 010101.
Since β(σ1) * M_{s(X,Z)} = 0, σ1 is removed and thus the strongly compatible lists are adjusted as follows:
β(σ2) = 010111, β(σ3) = 001010, β(σ4) = 010101,
β(σ5) = 011010, β(σ6) = 010101.
Since β(σ3) * M_{p(X,Y)} = 0, σ3 is useless. By further removing the useless matching substitution σ3, we can obtain that
β(σ2) = 010111, β(σ4) = 010101, β(σ5) = 010010,
β(σ6) = 010101.
Since β(σ5) * M_{r(Y,Z)} = 0, σ5 is useless and thus removed. Consequently we obtain the following strongly compatible lists:
β(σ2) = 010101, β(σ4) = 010101, β(σ6) = 010101.
Since β(σ2) * β(σ4) * β(σ6) * 010101 = 010101, σ2, σ4 and σ6 are pairwise strongly compatible. Thus, there is a substitution θ = σ2 • σ4 • σ6 = {a/X, b/Y, c/Z}. □
Our Algorithm PSCT
Input: clauses C = {l1, ..., lm} and D
Output: a pairwise strongly compatible set {σ1, ..., σm} such that (σ1, ..., σm) ∈ ×_{i=1}^{m} uni(C, li, D)
1. Calculate β(σ) for all σ ∈ ∪_{i=1}^{m} uni(C, li, D).
2. Let I be an n-bit sequence such that all its bits are 0.
(a) for each σk ∈ ∪_{i=1}^{m} uni(C, li, D), if σk is useless then
  i. remove σk.
  ii. I = I + μk.
(b) for each σk ∈ ∪_{i=1}^{m} uni(C, li, D), if there is a σl such that σk ≤σ σl then
  i. remove σk.
  ii. I = I + μk.
(c) for each σk ∈ ∪_{i=1}^{m} uni(C, li, D), β(σk) = β(σk) * Ī.
3. If uni(C, li, D) = {} for some i, then return {}.
4. Repeat steps 2-3 until there is no unnecessary matching substitution.
5. For each m-tuple (σ_{i1}, ..., σ_{im}) where σ_{ik} ∈ uni(C, lk, D), if *_{k=1}^{m} β(σ_{ik}) * +_{k=1}^{m} μ_{ik} = +_{k=1}^{m} μ_{ik}, then return {σ_{i1}, ..., σ_{im}}.
6. return {}.
Figure 1: Algorithm PSCT
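To make the control flow of Figure 1 concrete, here is a small Python sketch (ours, not the paper's implementation) of the filtering and final phases of PSCT on the data of Example 3.6. It uses Python sets in place of the bit sequences, and the "included" filter of step 2(b) is omitted for brevity; all names are hypothetical.

    from itertools import product

    # Clause C = {p(X,Y), r(Y,Z), s(X,Z)} and clause D as in Example 3.6.
    # Each matching substitution is a dict from variables to terms,
    # grouped by the literal of C it belongs to.
    subs_by_literal = [
        [{'X': 'b', 'Y': 'a'}, {'X': 'a', 'Y': 'b'}],   # p(X,Y) vs p(b,a), p(a,b)
        [{'Y': 'a', 'Z': 'd'}, {'Y': 'b', 'Z': 'c'}],   # r(Y,Z) vs r(a,d), r(b,c)
        [{'X': 'a', 'Z': 'd'}, {'X': 'a', 'Z': 'c'}],   # s(X,Z) vs s(a,d), s(a,c)
    ]

    def compatible(s1, s2):
        """Two matching substitutions are strongly compatible if they agree
        on every variable they both bind."""
        return all(s1[v] == s2[v] for v in s1.keys() & s2.keys())

    def psct(subs_by_literal):
        # Flatten, remembering the literal index of every substitution.
        subs = [(i, s) for i, group in enumerate(subs_by_literal) for s in group]
        alive = set(range(len(subs)))

        def beta(k):
            """Strongly compatible list of subs[k], as a set over 'alive'."""
            return {j for j in alive if compatible(subs[k][1], subs[j][1])}

        # Filtering phase: repeatedly remove 'useless' substitutions, i.e.
        # those with no compatible partner for some literal of C.
        changed = True
        while changed:
            changed = False
            for k in list(alive):
                b = beta(k)
                for lit in range(len(subs_by_literal)):
                    if not any(j in b and subs[j][0] == lit for j in alive):
                        alive.discard(k)
                        changed = True
                        break

        # Final phase: one substitution per literal, pairwise compatible.
        groups = [[k for k in alive if subs[k][0] == i]
                  for i in range(len(subs_by_literal))]
        for combo in product(*groups):
            if all(compatible(subs[a][1], subs[b][1])
                   for a in combo for b in combo if a < b):
                return [subs[k][1] for k in combo]
        return None

    print(psct(subs_by_literal))
    # [{'X': 'a', 'Y': 'b'}, {'Y': 'b', 'Z': 'c'}, {'X': 'a', 'Z': 'c'}]

The returned triple corresponds to σ2, σ4 and σ6, i.e. the matcher θ = {a/X, b/Y, c/Z} of Example 3.6.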
4 Related Works and Analysis
This section compares our algorithm with the two existing s-link tests, namely Eisinger's algorithm and Socher's algorithm. The analysis is based on the number of string comparisons needed to determine whether a clause C = {l1, ..., lm} subsumes a clause D. To measure the complexity of the three algorithms, we use the following symbols:
r: the maximal arity of predicate symbols occurring in literals in clauses C and D.
N_C: the number of distinct variables in a literal in clauses C and D.
N_D: the number of distinct terms which are substituted for a variable in clause C.
N_S: the number of strongly compatible tests needed to see whether m matching substitutions between literals are pairwise strongly compatible.
N_P: the number of pairwise strongly compatible tests needed to find a matcher θ such that Cθ ⊆ D.
To simplify the analysis, we assume that the number of matching substitutions in each uni(C, li, D) (1 ≤ i ≤ m) is equal, and let it be k.
In Eisinger's algorithm, subsumption tests for long clauses with more than one matching substitution for each literal may require an expensive search of all elements of the Cartesian product. Since compositions of substitutions are needed to see whether two given substitutions are strongly compatible, and O(N_C²) string comparisons are needed for each strongly compatible test, O(N_S N_C²) string comparisons are needed for each pairwise strongly compatible test. Thus O(N_P N_S N_C²) string comparisons are needed for the subsumption test. Since 1 ≤ N_P ≤ k^m, 1 ≤ N_S ≤ m(m−1)/2, and 1 ≤ N_C ≤ r, in the worst case N_C = r, N_S = m(m−1)/2 and N_P = k^m, so the worst-case time complexity of Eisinger's algorithm is O(k^m m² r²).
Socher proposes an improvement of the s-link test for subsumption of two clauses [Socher 1988]. He improves the search for a θ such that Cθ ⊆ D by imposing a restriction on the possible matching substitutions. It is based on the idea of giving the variables and literals of a clause a characteristic property, which in fact denotes information about the occurrences of variables in the various argument positions of a literal. An ordering on these characteristics is defined and it is shown that the ordering is compatible with the matching substitutions σ from C to D. Thus all matching substitutions that do not respect the ordering can be singled out. However, he does not improve the pairwise strongly compatible test itself and thus does not reduce the worst-case time complexity of the s-link test.
In some cases Socher's algorithm cannot single out a matching substitution which is either useless or included. For example, let C = {p(X, Y), q(Y, X)} and D = {p(a,e), p(b,d), q(e,b), p(d,a)} be given. No matching substitution is singled out, because the characteristic matrices of the literals p and q in C are equal to those in D. However, all matching substitutions are useless in our approach, so no pairwise strongly compatible test is performed.
By using strongly compatible lists and bit operations, we improve the pairwise strongly compatible test and thus reduce the worst-case time complexity of the s-link test. O(k m² N_C N_D) string comparisons are needed to calculate all the strongly compatible lists. O(m) bit-conjunctions are needed for a pairwise strongly compatible test when m strongly compatible lists are given. Thus, O(k m² N_C N_D) string comparisons and O(m N_P) bit-conjunctions are needed for a subsumption test. Since 1 ≤ N_P ≤ k^m, 1 ≤ N_C ≤ r, and 1 ≤ N_D ≤ km, in the worst case N_P = k^m, N_C = r and N_D = km, so the worst-case time complexity is O(k² r m³ string comparisons + m k^m bit-conjunctions).
Let n be the ratio of the time complexity of a string comparison to the time complexity of a bit-conjunction. Then, in the case that n k² r m³ is greater than m k^m, the worst-case time complexity of our algorithm is O(k² r m³) and we reduce the worst-case time complexity of Eisinger's algorithm by a factor of O(k^{m−2} r / m). In the other case, the worst-case time complexity of our algorithm is O(m k^m) and we reduce the worst-case time complexity of Eisinger's algorithm by a factor of O(m r² n).
5 Conclusions
Subsumption tests for long clauses with more than one matching substitution for each literal may require an excessive search over all elements of the Cartesian product. We have presented a new subsumption algorithm, called the PSCT algorithm, which has a lower worst-case time complexity than the existing methods. The efficiency of our algorithm is based on the following facts.
1. Construction of strongly compatible lists allows us to identify unnecessary matching substitutions at an early stage of the subsumption test. Such matching substitutions are removed and are not involved in the actual pairwise strongly compatible test to come. This filtering process clearly reduces the number of possible combinations of matching substitutions.
2. As for the pairwise strongly compatible test itself, the test is carried out efficiently due to the appropriate bit operations on the strongly compatible lists which are already constructed.
The approaches [Socher 1988, Eisinger 1981] that actually compose the matching substitutions to check pairwise compatibility are considered to be slow and expensive. In most cases our approach outperforms the others [Socher 1988, Eisinger 1981] even though it may involve the overhead of computing the strongly compatible lists of matching substitutions. Furthermore, it should be noted that our subsumption algorithm can be used in a general theorem proving setting even though it is described in the context of the connection graph proof procedure in this paper.
References
[Andrew 1981] P. B. Andrews, Theorem Proving via
General Matings, Journal of ACM 28 (2) (1981)
193-214.
[Bibel 1981] W. Bibel, On Matrices with Connections,
Journal of ACM 28 (4) (1981) 633-645.
[Chang and Lee 1973] C. L. Chang and R. C. T. Lee,
Symbolic Logic and Mechanical Theorem Proving
(Academic Press, New York, 1973).
[Eisinger 1981] N. Eisinger, Subsumption and Connection Graph, in: Proceedings of the IJCAI-81 (1981)
480-486.
[Gottlob and Leitsch 1985] G. Gottlob and A. Leitsch,
On the Efficiency of Subsumption Algorithms,
Journal of ACM 32 (2) (1985) 280-295.
[Juang et al. 1988] J. Y. Juang, T. 1. Huang, and E.
Freeman, Parallelism in Connection Graph-based
Logic Inference, in: Proceedings of the 1988 International Conference on Parallel Processing 2 (1988)
1-8.
[Kowalski 1975] R. Kowalski, A Proof Procedure using
Connection Graphs, Journal of ACM 22 (4) (1975)
572-595.
[Kowalski 1979] R. Kowalski, Logic for Problem Solving
(North Holland, Oxford, 1979).
[Loganantharaj 1986] R. Loganantharaj, Parallel Theorem Proving with Connection Graphs, in: Proceedings of 8th International Conference on Automated
Deduction (1986) 337-352.
[Loganantharaj 1987] R. Loganantharaj, Parallel Link
Resolution of Connection Graph Refutation and its
Implementation, in: Proceedings of International
Conference on Parallel Processing (1987) 154-157.
[Loveland 1978] D. Loveland, Automated Theorem
Proving: A Logical Basis (North-Holland, Amsterdam, 1978).
[Munch 1988] K. H. Munch, A New Reduction Rule for
the Connection Graph Proof Procedure, Journal of
Automated Reasoning 4 (1988) 425-444.
[Robinson 1965] J. A. Robinson, A Machine-Oriented
Logic Based on the Resolution Principle, Journal
of ACM 12 (1) (1965) 23-41.
[Sickel 1976] S. Sickel, A Search Technique for Clause
Interconnectivity Graphs, IEEE Transaction on
Computers 25 (8) (1976) 823-835.
[Socher 1988] R. Socher, A Subsumption Algorithm
Based on Characteristic Matrices, in: Proceedings
of 9th International Conference on Automated Deduction (1988) 573-581.
[Stillman 1973] R. B. Stillman, The Concept of Weak
Substitution in Theorem-Proving; Journal of ACM
20 (4) (1973) 648-667.
[Wos 1986] L. Wos, Automated Reasoning: Basic Research Problems, Argonne National Laboratory,
Technical Memorandum No.67, March 1986.
On the Duality of Abduction and Model Generation
Marc Denecker*
Danny De Schreye†
Department of Computer Science, K.U.Leuven,
Celestijnenlaan 200A, B-3001 Heverlee, Belgium.
e-mail: {marcd,dannyd}@cs.kuleuven.ac.be
Abstract
We present a duality relationship between abduction for definite abductive programs and model generation on the only-if part of these programs. As was pointed out by Console et al., abductive solutions for an abductive program correspond to models of the only-if part. We extend this observation by showing that the procedural semantics of abduction itself can be interpreted dually as a form of model generation on the only-if part. This model generation extends Satchmo with an efficient treatment of equality. It is illustrated how this duality allows us to improve current procedures for both abduction and model generation by transferring technical results known for one of these computational paradigms to the other.
1 Introduction
The work we report on in this paper was motivated by some recent progress made in the field of Logic Programming to formalize abductive reasoning as logic deduction (see [Console et al., 1991] and [Bry, 1990]). In [Kowalski, 1991], R. Kowalski presents the intuition behind this approach. He considers the following simple definite abductive logic program:
P = { wobbly-wheel ← flat-tyre.
      wobbly-wheel ← broken-spokes.
      flat-tyre ← punctured-tube.
      flat-tyre ← leaky-valve. }
where the predicates broken-spokes, punctured-tube and leaky-valve are the abducibles. Given a query Q = ←wobbly-wheel, abductive reasoning allows us to infer the assumptions:
S1 = { punctured-tube },
S2 = { leaky-valve }, and
S3 = { broken-spokes }.
*Supported by the Belgian "Diensten voor Programmatie van Wetenschapsbeleid", under the contract RFO-AI-03
†Supported by the Belgian National Fund for Scientific Research
These sets of assumptions are abductive solutions to the given query ←Q in the sense that for each Si, we have that P ∪ Si ⊨ Q.
Kowalski points out that we can equally well obtain these solutions by deduction, if we first transform the abductive program P ∪ {Q} into a new logic theory T. The transformation consists of taking the only-if part of every definition of a non-abducible predicate in the Clark completion of P and of adding the negation of Q. In the example, we obtain the (non-Horn) theory T:
T = { wobbly-wheel → flat-tyre, broken-spokes.
      flat-tyre → punctured-tube, leaky-valve.
      wobbly-wheel ← }
Minimal models for this new theory T are:
M1 = { wobbly-wheel, flat-tyre, punctured-tube },
M2 = { wobbly-wheel, flat-tyre, leaky-valve }, and
M3 = { wobbly-wheel, broken-spokes }.
Restricting these models to the atoms of the abducible predicates only, we precisely obtain the three abductive solutions S1, S2 and S3 of the original problem.
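As a small illustration (ours, not the authors'), the following Python sketch generates models of the propositional theory T above by forward chaining with case splitting on the disjunctive conclusions; for this theory the three models found coincide with M1, M2 and M3. Predicate names are abbreviated and the helper names are hypothetical.

    # Rules of T: body atom -> list of alternative conclusions; the fact
    # 'wobbly_wheel' plays the role of the added negation of the query.
    RULES = [
        ('wobbly_wheel', ['flat_tyre', 'broken_spokes']),
        ('flat_tyre',    ['punctured_tube', 'leaky_valve']),
    ]
    FACTS = {'wobbly_wheel'}

    def models(interp):
        """Forward chaining with case splitting: return all generated models."""
        for body, heads in RULES:
            if body in interp and not any(h in interp for h in heads):
                result = []
                for h in heads:                  # branch on each conclusion
                    result.extend(models(interp | {h}))
                return result
        return [interp]                          # every rule is satisfied

    for m in models(FACTS):
        print(sorted(m))
    # ['flat_tyre', 'punctured_tube', 'wobbly_wheel']
    # ['flat_tyre', 'leaky_valve', 'wobbly_wheel']
    # ['broken_spokes', 'wobbly_wheel']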
The above observation points to an interesting issue, namely the possibility of linking these dual declarative semantics by completely equivalent dual procedures. Figure 1 shows this duality between an SLD+Abduction tree (see [Cox and Pietrzykowski, 1986]) and the execution tree of Satchmo, a theorem prover based on model generation ([Manthey and Bry, 1987]).
. Figure 1: Procedural Duality of Abduction and Satchmo
Although this example illustrates the potential of using deduction or more precisely, model generation, as a
formalisation of abductive reasoning, an obvious restriction of the example is that it is only propositional. Would
this approach also hold for the general case of definite
abductive programs? An example of a non-propositional
program and its only-if part is given in figure 2.
Abd = {q/2}; P =
{ p(a,b) ←
  p(a,X) ← q(X,V). }
only-if(P) = FEQ ∪
{ p(Y,Z) → (Y=a & Z=b), (∃V: Y=a & q(Z,V)) }
Q = ←p(X,X).
not(Q) = ∃X: p(X,X).
Figure 2: A predicate example
The theory only-if(P) consists not only of the only-if part of the definitions of the predicates, but also comprises the axioms of Free Equality (FEQ), also known as Clark Equality ([Clark, 1978]). The abductive solutions and models of only-if(P) are displayed in figure 3.
Figure 3: Abductive solutions and models
The duals of the abductive solutions are again identical to models of only-if(P). This example suggests that
at least the duality on the level of declarative semantics is maintained. However, on the level of procedural
semantics, some difficulties arise. The SLD+Abduction
derivation tree is given in figure 4.
Figure 4: Abductive derivation tree
After skolemisation of the residue ←q(a, V), we obtain the third abductive solution. With respect to the model generation, the theory only-if(P) is not clausal; however, the extension of Satchmo, Satchmo-1 ([Bry, 1990]), can deal with such formulas directly (without normalisation to clausal form). Without dealing with the technical details of the computation, figure 5 presents the computation tree.

Figure 5: Execution tree of Satchmo-1

Globally, the structure of the SLD+abduction tree of figure 4 can still be seen in the Satchmo-1 tree. Striking is the duality of variables in the abductive derivation and skolem constants in the model generation. However, one difference is that the Satchmo-1 tree comprises many additional inference steps due to the application of the axioms of FEQ¹. In the abductive derivation these additional steps correspond to the unification operation (e.g. on both left-most branches, the failure of the unification of {X=a, X=b} corresponds to the derivation of the inconsistency of the facts {sk1=a, sk1=b}). Another difference is that the generated model
{p(a,a), q(a,sk1), p(sk1,a), p(a,sk1), p(sk1,sk1), q(sk1,sk1), sk1=a, a=sk1, a=a, sk1=sk1}
is much larger than the model which is dual to the abductive solution. Satchmo-1 generates, besides the atoms of this model, also all logical implications of FEQ, comprising all substitutions of a by sk1. It is clear that in general this will lead to an exponential explosion.
However, observe that we obtain the desired model by
contracting sk1 and a in the generated model. Therefore,
extending Satchmo-1 with methods for dynamic contraction of equal elements would solve the efficiency problem
and would restore the duality on the level of declarative
semantics.
Contraction of a model is done by taking one unique
witness out of every equivalence class of equal terms and
¹Improper use of Satchmo-1: equality in the head of a rule.
replacing all terms in the facts of the model by their witnesses. Techniques from Term Rewriting can be used to implement this. The procedural solution is to consider the set of inferred equality facts as a Term Rewriting System (TRS), to transform the set to an equivalent complete TRS, and to normalise all facts in the model using this complete TRS, after each forward derivation step in Satchmo-1.
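To make the contraction step concrete, here is a small Python sketch (our illustration; the names are hypothetical) that treats a set of ground equality facts in solved form as rewrite rules and normalises the facts of a model, collapsing sk1 and a as in the example above.

    # Ground equalities inferred so far, oriented as rewrite rules
    # skolem_constant -> witness (a solved-form TRS).
    REWRITES = {'sk1': 'a'}

    def normalise(fact):
        """Replace every argument by its witness (ground, depth-1 terms here)."""
        name, args = fact
        return (name, tuple(REWRITES.get(x, x) for x in args))

    model = {('p', ('a', 'a')), ('q', ('a', 'sk1')), ('p', ('sk1', 'a')),
             ('p', ('a', 'sk1')), ('p', ('sk1', 'sk1')), ('q', ('sk1', 'sk1'))}

    contracted = {normalise(f) for f in model}
    print(contracted)   # {('p', ('a', 'a')), ('q', ('a', 'a'))}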
This procedure may seem alien to Logic Programming, but the contrary is true. As a matter of fact, the proposed procedure appears to be exactly the dual of techniques used in SLD+Abduction:
• the completion procedure corresponds dually to unification. The dual of the mgu (by replacing variables by skolem constants) is the completion of the set of equality atoms.
• the normalisation corresponds dually to applying the mgu.
Therefore, incorporating these techniques in Satchmo-1 would also restore the duality on the level of procedural semantics.
The research reported in this paper started as a mathematical exercise in duality. However, there are clearly spinoffs. One application is the extension of Satchmo-1 with efficient treatment of equality. We propose a framework for model generation under an arbitrary equality theory and we formally prove the duality of SLD+abduction in the instance of the framework obtained by taking FEQ as the equality theory. Also for abduction there are spinoffs. An illustration of this is found in the context of planning as abduction in the event calculus. The event calculus contains a clause saying that a property holds at a certain moment if there is an earlier event which initiates this property, and the property is not terminated (clipped) in between:
holds_at(P, T) ← happens(E), initiates(E, P), E < T, ¬clipped(E, P, T).
A planner uses this clause to introduce new events which initiate some desired property. Technically this is done by first skolemising and then abducing the happens goal. However, skolemisation requires explicit treatment of the equality predicate as an abducible satisfying FEQ ([Eshghi, 1988]). The techniques proposed in this paper allow efficient treatment of the abduced equality atoms, and provide a declarative semantics for it.
The paper is structured as follows. In section 2, we present the class of theories for which the model generation is designed. Section 3 recalls basic concepts of Term Rewriting. In section 4, the framework for model generation is presented and important semantic results are formulated. In section 5, the duality with abductive reasoning is formalised. Section 6 discusses future and related work. Due to space restrictions, all proofs are omitted. We refer to [Denecker and De Schreye, 1991] for the explicit proofs.

2 Extended programs.

In this section we introduce the formalism for which the model generation will be designed. This formalism should at least contain any theory that can be obtained as the only-if part of the definitions in the Clark completion of definite logic programs. The extended clause formalism introduced below generalises both this kind of formulas and the clausal form.

Definition 2.1 Let L be a first order language. An extended clause or rule is a closed formula of the type:
∀(G1, ..., Gk → E1, ..., El)
where Ei has the general form ... such that all Gi are atoms based on L, all Fi are equality atoms based on L ...
Definition 2.2 An extended program is a set of extended clauses.
Interestingly, the extended clause formalism can be
proven to provide the full expressivity of first order logic.
Any first order logic theory can be translated to a logically equivalent extended program, in the sense that they
share exactly the same models. (Recall that the equivalence between a theory and its clausal form is much
weaker: the theory is consistent iff its clausal form is
consistent. )
In the sequel, the theory of general equality (resp. the theory of Free Equality), for a first order language L, will be denoted EQ(L) (resp. FEQ(L)). A theory T, based on L, is called a theory with equality if it comprises EQ(L). A theory T, based on L, is called an equality theory if it is a theory with equality in which "=" is the only predicate symbol in all formulas except for the substitution axioms of EQ(L).
3 Concepts of Term Rewriting.
The techniques we intend to develop for dealing with equality are inspired by Term Rewriting. However, work in this area is too restricted for our purposes, because the concepts and techniques assume the general equality theory EQ underlying the term rewriting. To be able to deal with FEQ, we extend the basic concepts to the case of an arbitrary underlying equality theory E. In the sequel, equality and identity will be denoted distinctly when ambiguity may occur, resp. by "=" and "≡". We assume the reader to be familiar with basic notions of TRS's (see
e.g. [Dershowitz and Jouannaud, 1989]). We just recall some general ideas. A TRS γ associates to each term s a reduction tree in which each branch consists of successive applications of rewrite rules of γ. If γ is noetherian, these trees are all finite. If moreover γ is Church-Rosser or confluent, all leaves of the reduction tree of any term t contain the same term, called the normalisation of t and denoted t↓. In Term Rewriting, such a TRS is called complete. Below we extend this concept.
Definition 3.1 Let E be an equality theory based on a language L, and γ a Term Rewriting System based on L. γ is complete wrt <L, E> iff γ is noetherian and Church-Rosser and, moreover, has a Least Herbrand Model which consists of all ground atoms s = t constructed from terms in HU(L) such that s↓ ≡ t↓.
This definition extends the normal definition in Term Rewriting by the third condition. However, for E = EQ, it has been proved that this property is implied by the noetherian and Church-Rosser properties (for a proof see [Huet, 1980]). Of course this is not the case for an arbitrary equality theory (such as FEQ).
Definition 3.2 A completion of a TRS γ wrt <L, E> is:
• {} if <L, E>+γ is inconsistent,
• a complete TRS γc such that <L, E> ⊨ γ ↔ γc otherwise.

Our framework for model generation is developed for logical theories consisting of two components, an extended program P and an underlying equality theory E. This distinction reflects the fact that the model generation mechanism applies only to the extended clauses of P, while E is dealt with in a procedural way, using completion and normalisation. However, in order to make this possible, E should satisfy severe conditions, which are formulated in the following definition.

Definition 3.3 An equality theory with completion, E, based on a language L, is a clausal equality theory equipped with a language independent completion procedure.

The latter condition means that if γ is a ground Term Rewriting System based on an extension L' of L by skolem constants, and γc is the completion of γ wrt <L', E>, then for any further extension L'' of L' by skolem constants, γc is still the completion of γ wrt <L'', E>. We denote γc as TRS-comp(γ).

4 A framework for Model Generation

Informally, a model generator constructs a sequence (Cld, jd)_0^n, where Cld is the ground instance of a rule applied after d steps and jd is the index indicating the conclusion of Cld that was selected, an increasing sequence of sets of asserted ground facts (Md)_0^n of non-equality predicates, a sequence of complete Term Rewriting Systems (γd)_0^n, each of which is equivalent with the set of asserted equality facts, and an increasing sequence of sets of skolem constants (Skd)_0^n, obtained by skolemising the existentially quantified variables. Formally:

Definition 4.1 Let L be a language, Lsk an infinite countable alphabet of skolem constants, and T an extended theory based on L consisting of an equality theory with completion E, with completion function TRS-comp, and an extended program P.
A Nondeterministic Model Generator with Equality (NMGE) K is a tuple of four sequences (Skd)_0^n, (Md)_0^n, (γd)_0^n and (Cld, jd)_0^n, where n ∈ IN ∪ {∞}. The sequences satisfy the following conditions:
1. M0 = Sk0 = {}; γ0 = TRS-comp({}).
2. for each d such that 0 < d ≤ n, Cld, jd, Skd, Md and γd are obtained from Skd−1, Md−1 and γd−1 by applying the following steps:
(a) Selection of rule and conclusion
Define LHMd−1 as: ...
... is a tree such that:
• Each node is labeled with a tuple (Sk, M, γ), where Sk is a skolem set, M a set of non-equality facts based on L+Sk, and γ is a ground TRS based on L+Sk.
• To each non-leaf N, a ground instance Cl of a rule of P is associated. For each conclusion with index j in the head of Cl, there is an arc leaving from N which is labeled by (Cl, j).
• The sequences of labels on the nodes and arcs on each branch of T constitute an NMGE.
Definition 4.5 An NMGET is fair if each branch is fair.
Definition 4.6 An NMGET is failed if each branch is failed.
(LHMd)_0^n is a monotonically increasing sequence.
An NMGE performs a fixpoint computation, the result of which can be seen as an interpretation of the language L and, as we later show, a model of <P, E>.
Definition 4.3 The fixpoint of an NMGE K is ∪_0^n LHMd and is denoted by K↑. The skolem set used by K is ∪_0^n Skd and is denoted by Sk(K). K↑ defines an interpretation of L in the following way:
• domain: HU(L+Sk(K))
• for each constant c of L: K↑(c) = c
• for each functor f/n of L: K↑(f/n) is the function which maps terms t1, ..., tn of HU(L+Sk(K)) to f(t1, ..., tn).
• for each predicate p/n of L: K↑(p/n) is the set of p(t1, ..., tn) facts in K↑.
Corollary 4.1 If K is a finite successful NMGE of length n, then K↑ = LHMn.
Theorem 4.1 (Soundness) If K is a fair NMGE, then K↑ is a model for <P, E>, and P+E is consistent (a fortiori).
We say that K↑ is the model generated by K.
To state the completeness result, we require an additional concept: the NMGE-Tree. Analogously with the concept of SLD-Tree, an NMGE-Tree is a tree of NMGE's obtained by applying all different conclusions of one rule in the descendants of a node.
Observe that a failed NMGET contains only a finite
number of nodes. Also if T is inconsistent then because of
the soundness Theorem 4.1, each fair NMGET is failed.
As a completeness result, we want to state that for any
model of P+E, the NMGE contains a branch generating
a smaller model. In a context of Herbrand models, the
smaller-than relation can be expressed by set inclusion.
However, because of the existential quantifiers and the resulting skolem constants, we cannot restrict to Herbrand
models only. In order to define a smaller-than relation
for general models, we must have a mechanism to compare models with a different domain. A solution to this
problem is provided by the concept of homomorphism.
Definition 4.7 Let I1, I2 be interpretations of a language L with domains D1, D2.
A homomorphism from I1 to I2 is a mapping h: D1 → D2 which satisfies the following conditions:
• For each functor f/n (n ≥ 0) of L and x, x1, ..., xn ∈ D1: x ≡ I1(f/n)(x1, ..., xn) ⇒ h(x) ≡ I2(f/n)(h(x1), ..., h(xn))
• For each predicate symbol p/n (n ≥ 0) of L and x1, ..., xn ∈ D1: I1(p/n)(x1, ..., xn) ⇒ I2(p/n)(h(x1), ..., h(xn))
Intuitively a homomorphism is a mapping from one
domain to another, such that all positive information in
the first model is maintained under the mapping. Therefore the homomorphsinlS in the class of models of a theory can be used to represent a " .. .contains less positive
information than ..." relation. We denote the fact that
there exists a homomorphism from interpretation Ii to
12 by Ii ::S 12, This notation captures the intuition that
Ii contains less positive information than 12 ,
For NMGET's we can proof the following powerful
completeness result.
Theorem 4.2 (Completeness) Let E be an equality theory with completion, P an extended program, both based on L. Let Lsk be an alphabet of skolem constants.
1. There exists a fair NMGET for <P, E>.
2. For each model M of <P, E> and each fair NMGET T, there exists a successful branch K of T such that K↑ ≼ M.

We refer to [Denecker and De Schreye, 1991] for a constructive proof of this strong result. As a corollary we obtain the following reformulation of a traditional completeness result.

Corollary 4.2 If <P, E> is consistent then in each fair NMGET there exists a successful branch. If there exists a failed NMGET for <P, E>, then <P, E> is inconsistent, and all fair NMGET's are failed.

The completeness result does not imply that all models are generated. For example, for P = {p←q}, the model {p, q} is not generated by an NMGE. The following example shows that different NMGET's for the same theory might generate different models.

Example P = {
  p, q ←
  p ← }
Depending on which of these clauses is applied first, we get two different nonredundant NMGET's. If p← is applied first, then p, q← holds already and is not applied anymore. So we get an NMGET with one branch of length 1. On the other hand, if p, q← was selected first, then two branches exist and we get the solutions {p} and {p, q}.

Therefore it would be interesting if we could characterize a class of models which are generated by each NMGET. The second item of the completeness Theorem 4.2 gives some indication: for any given model M, some successful branch of the NMGET generates a model with less positive information than M. For the clausal case, models with no redundant positive information are minimal Herbrand Models. From this observation one would expect that for a clausal program, each fair NMGET generates all minimal models. Indeed, the following completeness theorem holds:

Theorem 4.3 (Minimal Herbrand models) If P is clausal, then for each fair NMGET T, each minimal Herbrand model is generated by a branch in T.

We have extended the concept of minimal model for general logic theories and proved the completeness of NMGE in the sense that each fair NMGET T generates all minimal models. We refer to [Denecker and De Schreye, 1991].

5 Duality of SLD+Abduction and Model Generation.

The NMGE framework allows us to formalise the observations that were made in the introduction. We first introduce the notion of a dualisation more formally.

Definition 5.1 Let L be a first order language, Lsk an alphabet of skolem constants, Vsk a dual alphabet of variables such that a bijection D : Lsk → Vsk exists.
The dualisation mapping D can be extended to a mapping from HU(L+Lsk) ∪ HB(L+Lsk) to the set of terms based on L+Vsk by induction on the depth of terms:
• for each constant c of L: D(c) ≡ c
• for each term t = f(t1, ..., tn): D(f(t1, ..., tn)) ≡ f(D(t1), ..., D(tn))
D can be further extended to any formula or set of formulas. Under dualisation, a ground TRS γ based on L+Lsk corresponds to an equation set D(γ) with terms based on L+Vsk. γ is said to be in solved form iff D(γ) is an equation set in solved form.
An equation set is in solved form iff it consists of equations xi = ti such that the xi's are distinct variables and do not occur in the right side of any equation. So a TRS is in solved form if the left terms are distinct skolem constants of Lsk which do not occur at the right. A TRS in solved form can also be seen as the dual of a variable substitution.
Property 5.1 Let γ be a TRS in solved form. Then γ is complete wrt <L, FEQ(L)>.
Theorem 5.1 (Duality completion - unification) FEQ(L) is an equality theory with completion. The completion procedure is dual to unification. The dual of the completion of a ground TRS γ, based on L+Sk, is the mgu of D(γ): D(TRS-comp(γ)) = mgu(D(γ)).
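The following toy Python fragment (ours, with hypothetical names) illustrates this dualisation on the ground TRS of the example in the introduction: reading each rewrite rule sk = t as a binding for the dual variable gives exactly a substitution in solved form, i.e. an mgu of the dualised equations.

    # Dual alphabet: each skolem constant corresponds to one variable.
    DUAL = {'sk1': 'X'}

    def dualise_trs(trs):
        """Map a ground TRS in solved form {sk: term} to the dual substitution
        {variable: dual(term)}, the mgu of the dualised equation set."""
        def dual_term(t):
            return DUAL.get(t, t)          # constants of L are left unchanged
        return {DUAL[lhs]: dual_term(rhs) for lhs, rhs in trs.items()}

    # Completion of {sk1 = a} is the TRS {sk1 -> a}; its dual is the mgu {X/a}.
    print(dualise_trs({'sk1': 'a'}))       # {'X': 'a'}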
As was observed in the introduction, this duality
can be extended further to the complete process of
SLD+abduction. On a procedural level, each resolution
step corresponds dually to a model generation step. The
selection of a goal for resolution corresponds dually to
the selection of the extended rule with its condition instantiated with the dual of the goal. The selection of
the clause in the resolution corresponds dually to the selection of the corresponding conclusion in the extended
rule. The unification of the goal with the head of the clause and the subsequent application of the mgu correspond
to the completion of the dual equations in the conclusion
and the subsequent normalisation.
Now we can formulate the duality theorem for SLD+Abduction ([Cox and Pietrzykowski, 1986]) and Model Generation.
Theorem 5.2 Let L be a first order language with an alphabet of variables Lv, Lsk an alphabet of skolem constants, and D: Lsk → Lv a duality bijection between skolem constants and variables. Let P be a definite abductive program based on L.
For any definite query ←Q, an abductive derivation for ←Q and P can be dually interpreted as a fair NMGE for only-if(P)+∃(Q). The set of atoms of the generated model, restricted to the abducible predicates, is the dual of the abductive solution. The dual of the answer substitution is the restriction of γn to the skolem constants dual to the variables in the query.
The following corollary was proved first by Clark ([Clark, 1978]) for normal programs. For the definite case it follows immediately from the theorem above.
Corollary 5.1 An SLD-refutation for a query ←Q and a definite program P without abducibles is a consistency proof of ∃(Q)+only-if(P). A failed SLD-tree for a ground query ←Q and P is an inconsistency proof of ∃(Q)+only-if(P), and therefore of ∃(Q)+comp(P).
6 Discussion
A current limitation of the duality framework is its restriction to definite abductive programs. In the future we will extend it to the case of normal abductive procedures. The extended framework will then describe a duality between an SLDNF+Abduction procedure and a form of model generation.
The SLDNF+Abduction procedure can be found by
proceeding as for the definite case. There we started
from pure SLD and definite programs without abduction,
we dualised it and obtained the NMGE method, which
under dualisation yields an SLD+Abduction procedure.
At present we have performed (on an informal basis)
the dualisation of SLDNF for normal programs without
abduction. Under dualisation, the resulting model generation procedure gives a natural extension of SLDNF
for abductive programs. The abductive procedure incorporates skolemisation for non-ground abducible goals and efficient treatment of abduced equality atoms by the methods presented earlier. Integrity constraints can be represented by adding, for any integrity constraint IC, the rule "false ← not(IC).", transforming these rules to a normal program using the transformation of Lloyd-Topor ([Lloyd and Topor, 1984]), and adding the literal not false to the query.
A prototype of this method has been implemented. An
interesting experiment was its extension to an abductive
planner based on the event calculus. Our prototype planner was able to solve some hard problems with context
dependent events, problems that are not properly solved
by existing systems ([Shanahan, 1989], [Missiaen, 1991]).
In [Denecker and De Schreye, 1992], we proved the soundness of the procedure with respect to Completion semantics, in the sense that for any query ←Q and generated solution Δ: ...
This implies the soundness of the procedure with respect to the Generalised Stable Model semantics of [Kakas and Mancarella, 1990b]: a generated solution can be extended in a natural way to a generalised stable model of the abductive program. As a completeness result we proved that the procedure generates all minimal solutions when the computation tree is finite.
Related to our work, [Bry, 1990] also indicates a relationship between abduction and model generation. However, while we propose a relationship on the object level,
there it is argued that abductive solutions can be generated by model generation on the abductive program
augmented with a fixed metatheory.
In [Console et al., 1991], another approach is taken for abduction through deduction. An abductive procedure is presented which, for a given normal abductive program P and query ←Q, derives an explanation formula E equivalent with Q under the completion of P:
comp(P) ⊨ (Q ⇔ E)
The explanation formula is built of abducible predicates and equality only. It characterises all abductive solutions in the sense that for any set Δ of abducible atoms, Δ is an abductive solution iff it satisfies E.
Although this approach also starts from the concept of completion, it is of a totally different nature. In the
first place, our approach aims at contributing to the procedural semantics of abduction. This is not the case with
the work in [Console et al., 1991]. Another difference is
that this approach is restricted to queries with a finite
computation tree. If the computation tree contains an
infinite branch, then the explanation formula cannot be
computed.
In [Kakas and Mancarella, 1990a], an abductive procedure for normal abductive programs has been defined. A restriction of this method is that abducible goals can only be selected when they are ground. As argued in section 1, this poses a serious problem for applications such
as planning. The methods presented here allow us to overcome the problem by skolemisation of non-ground goals and efficient treatment of abduced equality facts.
Recently, a planning system based on abduction in the event calculus has been proposed in [Missiaen, 1991].
The underlying abductive system incorporates negation
as failure, skolemisation for non-ground abducible goals
and efficient treatment of abduced equality facts. However, the system shows some problems with respect to
soundness and completeness. Experiments indicated
that these problems are solved by our prototype planner.
Finally, we want to draw attention to an unexpected application of the duality framework. In current work on abduction, the theory of Free Equality is implicitly or explicitly present. What happens if FEQ is replaced by general equality EQ and the equality predicate is abducible? The result is an uncommon form of abduction, illustrated below. Take the program P = {r(a) ←}. For this program, the query ←r(b) has a successful abductive derivation:
←r(b)    Δ = {}
□        Δ = {b = a}
←r(b) succeeds under the abductive hypothesis {b = a}. The duality framework provides the technical support for efficiently implementing this form of abduction. The only difference with normal abduction is that the completion procedure for FEQ (the dual of unification) must be replaced by a completion procedure for EQ, for example Knuth-Bendix completion.
To conclude, we have presented a duality between two computation paradigms. This duality allows us to transfer technical results from one paradigm to the other and vice versa. One application that was obtained was an efficient extension of model generation with equality. Transferring these methods back to abduction, we obtained techniques for dealing with non-ground abducible goals and efficient treatment of abduced equality atoms. We discussed experiments indicating that the extension of the duality framework for the case of normal programs is extremely useful for obtaining an abductive procedure for normal abductive programs.
7 Acknowledgements
We thank Krzysztof Apt, Eddy Bevers, Maurice
Bruynooghe and Francois Bry for helpful suggestions.
References
[Bry, 1990] F. Bry. Intensional updates: Abduction via
deduction. In proc. of the intern. conf. on Logic Programming 90, pages 561-575,1990.
[Clark, 1978] K.L. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and databases, pages
293-322. Plenum Press, 1978.
[Console et al., 1991] L. Console, D. Theseider Dupre,
and P. Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation,
1(5):661-690, 1991.
[Cox and Pietrzykowski, 1986]
P.T. Cox and T. Pietrzykowski. Causes for events:
their computation and application. In proc. of the 8th
intern. conf. on Automated Deduction, 1986.
[Denecker and De Schreye, 1991] Marc Denecker and
Danny De Schreye. A framework for indeterministic
model generation with equality. Technical Report 124,
Department of Computer Science, K. U .Leuven, March
1991.
[Denecker and De Schreye, 1992] Marc Denecker and
Dann,Y De Schreye. A family of abductive procedures
for normal abductive programs, their soundness and
completeness. Technical Report 136, Department of
Computer Science, K.U.Leuven, 1992.
[Dershowitz and Jouannaud, 1989] N. Dershowitz and
J.-P. Jouannaud. Rewrite systems. In Handbook
of Theoretical Computer Science, vol.B, chapter 15.
North-Holland, 1989.
[Eshghi, 1988] K. Eshghi. Abductive planning with
event calculus. In R.A. Kowalski and K.A. Bowen,
editors, proc.of the 5th lCLP, 1988.
[Huet, 1980] G. Huet. Confluent reductions: Abstract properties and applications to term rewriting systems. Journal of the Association for Computing Machinery, 27(4):797-821, 1980.
[Kakas and Mancarella, 1990a] A.C. Kakas and P. Mancarella. Database updates through abduction. In proc.
of the 16th Very large Database Conference, pages
650-661,1990.
[Kakas and Mancarella, 1990b] A.C. Kakas and P. Mancarella. Generalised stable models: a semantics for
abduction. In proc. of ECAI-90, 1990.
[Kowalski, 1991] R.A. Kowalski. Logic programming in artificial intelligence. In proceedings of the IJCAI, 1991.
[Lloyd and Topor, 1984] J.W. Lloyd and R.W. Topor.
Making prolog more expressive. Journal of logic programming, 1(3):225-240, 1984.
[Manthey and Bry, 1987] R. Manthey and F. Bry. A
hyperresolution-based proof procedure and its implementation in prolog. In proc. of the 11 th German
workshop on Artificial Intelligence, pages 221-230.
Geseke, 1987.
[Missiaen, 1991] L. Missiaen. Localized abductive planning with the event calculus. PhD thesis, Department
of Computer Science, K.U.Leuven, 1991.
[Shanahan, 1989] M. Shanahan. Prediction is deduction
but explanation is abduction. In IJCAI-89, page 1055,
1989.
Defining Concurrent Processes Constructively *
Yukihide Takayama
Kansai Laboratory, OKI Electric Industry Co., Ltd.
Crystal Tower, 1-2-27 Shiromi, Chuo-ku, Osaka 540, Japan
takayama@kansai.oki.co.jp, takayama@icot.or.jp
Abstract
This paper proposes a constructive logic in which a concurrent system can be defined as a proof of a specification. The logic is defined by adding stream types and several rules for them to an ordinary constructive logic. The unique feature of the obtained system is the (MPST) rule, which is a kind of structural induction on streams. The (MPST) rule is based on the idea of largest fixed point inductions, but the formulation of the rule is quite different and it allows us to define a concurrent process as a Burge's mapstream function with a good intuition on computation. This formulation is possible when streams are viewed as sequences, not infinite lists. Also, our logic has explicit nondeterminacy, but we do not introduce any extralogical device. Our nondeterminacy rule, (NonDet), is actually a defined rule which uses inherent nondeterminacy in the traditional intuitionistic logic. Several techniques of defining stream based concurrent programs are also presented through various examples.
1 Introduction
Constructive logics give a method for formal development of programs, e.g., [C+86, HN89]. Suppose, for example, the following formula: ∀x : D1. ∃y : D2. A(x, y). This is regarded as a specification of a function, f, whose domain is D1 and codomain is D2, satisfying the input-output relation A(x, y), that is, ∀x : D1. A(x, f(x)) holds. This functional interpretation of formulas is realized mechanically. Namely, if a constructive proof of the formula is given, the function f is extracted from the proof with q-realizability interpretation [TvD88] or with the Curry-Howard correspondence of types and formulas [How80]. This programming methodology will be referred to as constructive programming [SK90] in the following.
Although constructive programming has been studied by many researchers, the constructive systems which can handle concurrency are rather few. This is mainly because most of the constructive logics have been formalized as intuitionistic logics, and the intuitionism itself does not have explicit concurrency besides proof normalization corresponding to the execution of programs [Got85]. For example, QJ [Sat87] is an intuitionistic programming logic for a concurrent language, Quty. However, when we view QJ as a constructive programming system, concurrency only appears in the operational semantics of Quty.
*This work was supported by ICOT as a joint research project on theorem proving and its application.
Linear Logic [Gir87] gives a new formulation of constructive logic which is not based on intuitionism. This is
the first constructive logic which can handle concurrency
at the level of logic. The logic was obtained by refining
logical connectives of traditional intuitionistic or classical logic to introduce drastically new connectives with
the meaning of parallel execution. In Linear Logic, formulas are regarded as processes or resources and every
rule of inference defines the behavior of a concurrent operation. Linear Logic resembles Milner's SCCS [Mil89]
in this respect.
We take an intermediate approach between QJ and Linear Logic in the sense of not throwing away but extending intuitionistic logic. The advantage of this approach is that the functional interpretation of logical connectives in the traditional constructive programming based on intuitionism is preserved, and that both the sequential and concurrent parts of programs are naturally described as constructive proofs. To this end, we take the stream based concurrent programming model [KM74]. We introduce stream types and quantification over stream types. A formula is regarded as a specification of a process when it is universally or existentially quantified over stream types; otherwise it represents a specification of a sequential function, properties of processes, or linkage relations between processes. A typical process, ∀X.∃Y.A(X, Y) where X and Y are stream variables, is regarded as a stream transformer. Most of the rules of inference are those of ordinary constructive programming systems, but rules for nondeterminacy and for stream types are also introduced. Among them, a kind of structural induction on stream types called (MPST) is the heart of our extended system: with (MPST), stream transformers can be defined as Burge's mapstream functions [Bur75].
T. Hagino [Hag87] gave a clear categorical formalization of stream types (infinite list types or lazy types) whose canonical elements are given by a schema of mapstream functions, but the relation between his formulation and logic is not investigated. N. Mendler and others
[PL86] introduced lazy types and the type checking rules
for them into an intuitionistic type theory preserving
the propositions-as-types principle in the sense that an
empty type can exist even in the extended type theory.
However, they do not give sufficient rules of inference for proving specifications of stream handling programs. Reasoning about stream transformers can be handled with a
largest fixed point induction as was demonstrated by P.
Dybjer and H. P. Sander [DS89]. However, their system
is designed as a program verification system not as a constructive programming system. Although q-realizability
interpretation for program extraction can be defined for
the coinduction rule [KT91], the rule seems rather difficult to use for proving specifications. The reason is
that the coinduction rule deeply depends on the notion
of bisimulation, so that in the proof procedure one must
find a stronger logical relation included in the more general logical relation and that is not always an easy task.
The (MPST) rule is based on a similar idea to the
coinduction rule: one must find a new logical relation and
a new function to prove the conclusion. However, what
one must find has a clear intuitive meaning as the components of a concurrent process. Therefore, the (111 PST)
rule shows an intuitive guideline on how to construct a
concurrent process.
Section 2 explains how a concurrent system is specified in logic. A process is specified by the ∀X.∃Y.A(X, Y) type formula as in the traditional constructive programming. The rest of the sections focus on the problem of defining processes which meet the specifications. Section 3 formulates streams and stream types. Streams are viewed as infinite lists or programs which generate infinite lists at the level of the underlying programming language. At the logical reasoning level, streams are sequences, namely, total functions on natural numbers. This two level formulation of streams enables us to introduce (MPST), which will be given in section 4. Section 5 presents the rest of the formalism of the whole system. The realizability interpretation which gives the program extraction algorithm from proofs will be defined. Several examples will be given in section 6 to demonstrate how stream based concurrent programming is performed in our system.
Notational preliminary: We assume first order intuitionistic natural deduction. Equalities of terms, typing relations (M : σ), and T (true) are atomic formulas. The domain of the quantification is often omitted when it is clear from the context. Sequences of variables are denoted as x or X. Mx[N] denotes substitution of N for the variable x occurring freely in M. Mx̄[N̄] denotes simultaneous substitution. FV(M) is the set of free variables in M. (::) denotes the (infinite) list constructor. Function application is denoted ap(M, N) or M(N). M^n(N) denotes M(... M(N) ...), the n-fold application of M.
2 Specifying Concurrent Systems in Logic
The model of concurrent computation in this paper is as follows: A concurrent system consists of processes linked with streams. A process interacts with other processes only through input and output streams. The configuration of processes in a concurrent system is basically static and finite, but in some cases, which will be explained later, infinitely many new processes may be created by already existing processes. A process is regarded as a transformer (stream transformer) of input streams to an output stream, and it is specified by the ∀X : I_{σ1,...,σn}. ∃Y : I_τ. A(X, Y) type of formula, where I_σ denotes the type of streams over the type σ, but its definition will be given later. I_{σ1,...,σn} is an abbreviation of I_{σ1} × ... × I_{σn}, X and Y are input and output streams, and A(X, Y) is the relation definition of input and output streams.
The combination of two processes, ∀X.∃Y. A(X, Y) and ∀P.∃Q. B(P, Q), by linking the streams Y and P, is described by the following proof procedure:

   Σ1
   ∀X.∃Y. A(X, Y)
   --------------- (∀E)
   ∃Y. A(X, Y)              Π0
   ----------------------------- (∃E)(1)
   ∃Y.∃a. A(X, a) & B(a, Y)
   ----------------------------- (∀I)
   ∀X.∃Y.∃a. A(X, a) & B(a, Y)

where Π0 is

   Σ2
   ∀P.∃Q. B(P, Q)
   --------------- (∀E)
   ∃Q. B(Y', Q)             Π1
   ----------------------------- (∃E)(2)
   ∃Y.∃a. A(X, a) & B(a, Y)

and Π1 is

   [A(X, Y')](1)   [B(Y', Q')](2)
   ------------------------------ (&I)
   A(X, Y') & B(Y', Q')
   ------------------------------ (∃I)
   ∃a. A(X, a) & B(a, Q')
   ------------------------------ (∃I)
   ∃Y.∃a. A(X, a) & B(a, Y)

and Σ1 and Σ2 are the definitions of the processes ∀X.∃Y.A(X, Y) and ∀P.∃Q.B(P, Q).
This is a typical proof style to define a composition of two functions. Thus, a concurrent system is also specified by a ∀X.∃Y. A(X, Y) type formula. X and Y are input and output streams of the whole concurrent system, and a is an internal stream.
All these things just realize the idea that functions can be viewed as a special case of processes. In the following, we focus on the problem of how to define a process (stream transformer) as a constructive proof.
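As an informal illustration of this composition (ours, with Python generators standing in for processes), two stream transformers are linked by feeding the output stream of the first to the second; the internal stream corresponds to the existentially quantified a.

    def process_a(xs):
        """A stream transformer: doubles every element of the input stream."""
        for x in xs:
            yield 2 * x

    def process_b(ps):
        """A second transformer: keeps a running sum of its input stream."""
        total = 0
        for p in ps:
            total += p
            yield total

    def naturals():
        n = 0
        while True:
            yield n
            n += 1

    # Linking the output stream of A to the input of B gives the combined system.
    combined = process_b(process_a(naturals()))
    print([next(combined) for _ in range(5)])   # [0, 2, 6, 12, 20]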
3 Formulation of Streams
3.1 Two Level Stream Types
A stream can be viewed in at least three ways: as an infinite list, as an infinite process, and as an output sequence of an infinite process, namely, a total function on natural numbers. The formal theories of lazy functional programming such as [PL86] and [Hag87] can be regarded as theories of concurrent functional programming based on the first two points of view on streams. Our system uses a lazy typed lambda calculus as the underlying programming language and has lazy types as computational stream types. Computational stream types are only used as the type system for the underlying language. In proving specifications of stream transformers, we use logical stream types, which are based on the third point of view on streams. In other words, we have two kinds of streams: computational streams at the programming language level, and logical streams at the logical reasoning level. We denote a computational stream type C_σ and a logical stream type I_σ. The following are the basic rules for computational stream types. The idea behind them is similar to that behind the lazy type rules in [PL86]. We overload the infinite list constructor, (::), and will use it also as an infinite cartesian product constructor. We abbreviate M ≅ N for M = N in C_σ in the following.
   Γ ⊢ M : σ    Γ ⊢ S : C_σ
   -------------------------
   Γ ⊢ (M :: S) : C_σ

   Γ ⊢ M ≅ N    Γ ⊢ S ≅ T
   -------------------------
   Γ ⊢ (M :: S) ≅ (N :: T)

   Γ ⊢ M ≇ N
   -------------------------
   Γ ⊢ (M :: S) ≇ (N :: T)

   Γ ⊢ S ≇ T
   -------------------------
   Γ ⊢ (M :: S) ≇ (N :: T)

   Γ, Z : T ⊢ M : T
   ------------------
   Γ ⊢ νZ. M : T

where T is C_σ or τ → C_σ.
ν is the fixed point operator, used only for describing a stream as an infinite process (an infinite loop program). The reduction rule for ν-terms is defined as expected. hd and tl are the primitive destructor functions on streams.
   Γ ⊢ M : C_σ
   --------------
   Γ ⊢ hd(M) : σ

   Γ ⊢ M : C_σ
   --------------
   Γ ⊢ tl(M) : C_σ

   Γ ⊢ X : C_σ
   --------------------------------
   Γ ⊢ X ≅ (hd(X) :: tl(X))

   Γ ⊢ (M :: S) : C_σ
   --------------------------
   Γ ⊢ hd((M :: S)) ≅ M

   Γ ⊢ (M :: S) : C_σ
   --------------------------
   Γ ⊢ tl((M :: S)) ≅ S

   Γ, n : nat ⊢ hd(tl^n(S)) ≅ hd(tl^n(T))
   ----------------------------------------
   Γ ⊢ S ≅ T

   Γ ⊢ Mz ≅ Nw[z]
   ----------------------
   Γ ⊢ νZ. Mz ≅ νW. Nw
Before giving the definition of logical stream types, note that the type nat → σ is isomorphic to C_σ, namely:
Proposition 1: Let σ be any type; then nat → σ ≅ C_σ.
... Note that X(n) = hd(tl^n(X)) for arbitrary X : I_σ and n : nat. All the rules for hd, tl and (::) in computational streams also hold for these defined functions and the constructor for logical streams.
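To illustrate the two-level view (our sketch, not the paper's notation): a logical stream is a total function on the natural numbers, a computational stream is a lazy process, and hd, tl and the identity X(n) = hd(tl^n(X)) connect the two.

    from itertools import count, islice

    # Logical stream over nat: a total function on the natural numbers.
    def X(n):
        return n * n            # the stream 0, 1, 4, 9, ...

    # Computational stream: a lazy, infinite process producing the same values.
    def as_computational(f):
        return (f(n) for n in count())

    def hd(s):
        return next(s)

    def tl(s):
        next(s)                 # drop the head (destructive on the generator)
        return s

    # X(n) = hd(tl^n(X)): taking n tails and then the head gives the nth element.
    s = as_computational(X)
    for _ in range(3):
        s = tl(s)
    assert hd(s) == X(3)        # both are 9
    print(list(islice(as_computational(X), 5)))   # [0, 1, 4, 9, 16]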
3.2 Quantification over Logical Stream Types
There is a difficulty in defining the meaning of quantification over (logical) stream types. The standard intuitionistic interpretation of, say, existential quantification over a type σ, ∃x : σ. A(x), is that "we can explicitly give the object, a, of type σ such that A(a) holds". However, as a stream is a partial object, we can only give an approximation of the complete object at any moment. Therefore we need to extend the familiar interpretation of quantification over types. In fact, Brouwer's theory of choice sequences [TvD88] in intuitionism provides us with the meaning of quantification over infinite sequences.
There are two principles in Brouwer's theory, the principle of open data and the principle of function continuity. The principle of open data, which informally states that for independent sequences any property which can be asserted must depend only on initial segments of those sequences, gives the meaning of quantification of type ∀X.∃y. A(X, y). That is, for an arbitrary sequence, X, there is a suitable initial finite segment, X₀, of X such that ∃y. A(X₀, y) holds. The principle of function continuity gives the meaning of quantification of type ∀X.∃Y. A(X, Y). Assume the case of natural number streams (total functions between natural number types). Function continuity is stated as follows:
∀X.∃Y. A(X, Y)  ⇒  ∃f : K. ∀X. A(X, f|X)
where f|X = Y is an abbreviation of ∀x : nat. f(x, X) = Y(x), and K is the class of functions that take an initial finite segment of the input sequences and return the values. This means that every element of Y is determined by a suitable initial finite segment of X.
These principles match our intuition about functions on streams and about stream transformers very well. ∀X : I_σ.∃y : σ. A(X, y) represents a function on streams over σ, but we would hardly ever try to define a function which returns a value only after taking all the elements of an input stream. Also, we would expect a stream transformer, ∀X : I_σ.∃Y : I_τ. A(X, Y), to calculate the elements of the output stream, Y, gradually, by taking finitely many elements of the input stream, X, at any step of the calculation.
Note that this semantics also matches the proof method used in [KM74]: to prove a property P(X) on a stream X, we first prove P for an initial finite subsequence, X₀, of X (⊢ P(X₀)) and define ⊢ P(X) to be lim_{X₀→X} P(X₀).
4 Structural Induction on Logical Streams
As streams can be regarded as infinite lists, we would expect to extend the familiar structural induction on finite lists to streams. However, a naive extension of the structural induction on finite lists does not work well. If we allow the rule below,

    Γ, A(tl(X)) ⊢ A(X)
    --------------------- (S1)
    Γ ⊢ ∀X : I_σ. A(X)

the following wrong theorem can be proved:
Wrong Theorem: ∀X : I_nat. B(X)
where B(X) ≝ ∃n : nat. X(n) = 100.
Proof: By (S1) on X : I_nat. Assume B(tl(X)). Then there is a natural number k such that tl(X)(k) = X(k + 1) = 100. Then B(X). ∎
This proof would correspond to the following uninteresting program: foo = λX. foo(tl(X)). This is because the naive extension of the structural rule on finite lists does not maintain the continuity of the function on streams. Therefore, we need a drastically different idea in the case of infinite lists. One candidate is the coinduction rule (a largest fixed point induction) as in [DS89]: (B ⇒ Φ[B]) ⇒ (B ⇒ νP.Φ), where νP.Φ denotes the largest fixed point of Φ. The ∀X : I_σ. A(X) part will be described with νP.Φ type formulas, and one must find a suitable logical relation B to prove the conclusion. But searching for B will not always be an easy task: we wish the searching task to be decomposed into several smaller tasks, each of which has a clear and intuitive meaning in terms of computation. Therefore, we take another approach: the (MPST) rule.
4.1 Mapstream Functions as Stream Transformers
Recall that the motivation for pursuing a kind of structural induction on streams is to define stream transformers as proofs, and stream transformers can be realized as Burge's mapstream functions. A schema of mapstream functions is described in typed lambda calculus as follows:
P = λM^(τ→σ). λN^(τ→τ). λX^τ. ((M X) :: (((P M) N) (N X)))
If we give the procedures M and N, we obtain a mapstream function. Note that, from the viewpoint of continuity, these procedures should be as follows:
M = "Fetch an initial segment, X₀, of the input stream, X, to generate the first element of the output stream."
N = "Prepare for fetching the next finite segment of the input stream, interleaving, if necessary, another stream transformer between the original input stream and the input port."
This suggests that if a way to define M, N, and P as
proof procedures is given, one can define stream transformers as constructive proofs.
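As a rough illustration, Burge's mapstream schema can be transcribed into OCaml as below; this is our sketch (not the paper's), reusing the cstream type, hd and tl from the earlier sketch, with m and n playing the roles of the M and N procedures.

(* Sketch of Burge's mapstream schema:
   m extracts the next output element from (a prefix of) the input,
   n prepares the rest of the input for the following step. *)
let rec mapstream (m : 'a cstream -> 'b) (n : 'a cstream -> 'a cstream)
    (x : 'a cstream) : 'b cstream =
  Cons (m x, lazy (mapstream m n (n x)))

(* For example, mapstream hd tl is the identity stream transformer. *)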
4.2 A Problem of Empty Streams
Before giving the rule of inference for defining stream transformers, a little more observation of stream-based programming is needed. Assume a filter program on natural number streams realized as a mapstream function:

flt_a = λX. if (a | hd(X)) then flt_a(tl(X))
            else (hd(X) :: flt_a(tl(X)))
      = λX. ((M X) :: (((P M) N) (N X)))

where (a | hd(X)) is true when hd(X) can be divided by a (a natural number) and

M ≡ λX. if (a | hd(X)) then M(tl(X)) else hd(X)
N ≡ λX. if (a | hd(X)) then N(tl(X)) else tl(X)

For example, flt₅((5 :: 5 :: 5 :: 5 :: ...)) is an empty sequence because the evaluation of M(5 :: 5 :: 5 :: 5 :: ...) does not terminate. This contradicts the principle of open data explained in 3.2. To handle such a case, we introduce the notion of a complete stream. The idea is to regard flt₅, for example, as always generating some elements even if the input stream is (5 :: 5 :: ...).
Def. 1: Complete types
Let σ be any type other than a stream type; then σ⊥ denotes the type σ together with the bottom element ⊥_σ (often denoted just ⊥), and it is called a complete type.
Def. 2: Complete stream types
A stream type, I_σ or C_σ, is called complete when σ is a complete type.
flt₅ is easily modified to a function from C_nat to C_{nat⊥}, and then flt₅((5 :: 5 :: ...)) will be (⊥ :: ⊥ :: ...), which is practically an empty stream.
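One way to picture the complete-stream device is to make the removed positions visible as an explicit bottom element. The OCaml sketch below (ours; it reuses cstream, hd and tl from the earlier sketch and models ⊥ with None) keeps the filter productive on an input such as (5 :: 5 :: ...).

(* Sketch of flt_a over "complete" streams: the bottom element of nat⊥ is
   modelled by None, so the filter emits something at every step instead of
   diverging on a stream all of whose elements are divisible by a. *)
let rec flt (a : int) (x : int cstream) : int option cstream =
  if hd x mod a = 0
  then Cons (None, lazy (flt a (tl x)))          (* emit ⊥ instead of diverging *)
  else Cons (Some (hd x), lazy (flt a (tl x)))

let rec const (k : int) : int cstream = Cons (k, lazy (const k))

let () =
  (* flt 5 on the constant-5 stream is (⊥ :: ⊥ :: ...), "practically empty" *)
  assert (hd (flt 5 (const 5)) = None)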
4.3 The (MPST) rule
Based on the observations in the previous sections, we introduce a rule (MPST) for defining stream transformers. The rule is formulated in natural deduction style, but the formula, A, in the specification of a stream transformer, ∀X.∃Y. A(X, Y), is restricted. In spite of the restriction, the rule can handle a fairly large class of specifications of stream transformers, as will be demonstrated later.
The rule is as follows:

(a) ∀X : I_σ. ∃a : τ. M(X, a)
(b) ∀X : I_σ. ∀a : τ. ∀S : I_τ. (M(X, a) ⊃ A(0, X, (a :: S)))
(c) ∃f : I_σ → I_σ. ∀X : I_σ. ∀Y : I_τ. ∀n : nat. (A(n, f(X), tl(Y)) ⊃ A(n + 1, X, Y))
---------------------------------------------------------------------------
∀X : I_σ. ∃Y : I_τ. ∀n : nat. A(n, X, Y)
where M is a suitable predicate and A(n, X, Y) must be a rank 0 formula [HN89]. We can easily extend the rule to the multiple input stream version. We do not give the precise definition of rank 0 formulas here, but the intention is that we should not expect to extract any computational meaning from the A(n, X, Y) part. This restriction comes from a purely technical reason, but it does not diminish the expressive power of the rule from the practical point of view, because we usually only need to define a stream transformer program, not the verification code corresponding to the A(n, X, Y) part. The technical reason for the side condition of (MPST) is as follows: (MPST) is in fact a derived rule with (ST) and (CON), so that the q-realizability interpretation defined in the next section is carried out using the interpretation of those rules. The difficulty resides in the interpretation of the (CON) rule, but if we restrict the formula A(n, X) in (CON) to be rank 0, the interpretation is trivial. This condition corresponds to the side condition of (MPST).
The intuitive meaning of (MPST) is as follows. As explained in 4.1, a mapstream function is defined when the M and N procedures are given. (a) is the specification of the M procedure, f_M, and (b) means that f_M certainly generates the right elements of the output stream. The N procedure, f_N, is defined as the value of the existentially quantified variable, f, in (c). (c) together with (b) intuitively means the following: for X : I_σ (input stream) and Y : I_τ (output stream), let us call the pair (f_N^n(X), tl^n(Y)) the n-th f_N-descendant of (X, Y). Then, for arbitrary n : nat, A(n, X, Y) speaks about the n-th f_N-descendant of (X, Y), and A(n, f_N(X), tl(Y)) actually speaks about the (n+1)-th f_N-descendant of (X, Y). If f_N is a stream transformer, this means that the process (stream transformer) defined by (MPST) generates new processes dynamically.
Note that, as we must give a suitable formula, M, to prove the conclusion, (MPST) is essentially a second order rule.
5 The Formal System

This section briefly presents the rest of the formalization of our system.

5.1 Non-deterministic λ-calculus
The non-deterministic λ-calculus is a typed concurrent calculus based on parallel reduction, and it is used as the underlying programming language. The core part is almost the same as that given in [Tak91]. It has natural numbers, booleans (T and F), and L and R as constants. Individual variables, lambda abstractions, applications, sequences of terms ((M₁, ..., Mₙ) where the Mᵢ are terms), if-then-else, and a fixed point operator (μ) are used as terms and program constructs. The reduction rules for terms are defined as expected, and if a term, M, is reducible to a term, N, then M and N are regarded as equal. Also, several primitive functions are provided for arithmetic operations and for the handling of sequences of terms, such as projection of elements or subsequences from a sequence of terms. The type structure of the calculus is almost that of the simply typed λ-calculi. nat (natural number type), bool (boolean type), and 2 (the type of L and R) are primitive types, and × (cartesian product) and → (arrow) are used as type constructors. The type inference rules for this fragment of the calculus are defined as expected. In addition to them, computational streams, computational stream types and a special term called the coin flipper are introduced to describe concurrent computation of streams. For the reduction strategy, the ν-terms of Section 3.1 are lazily evaluated.
The coin flipper is a device for simulating nondeterminacy. It is a term, •, whose computational meaning is given by the following reduction rule:
• ▷ L or R
That is, • reduces to L or R in a nondeterministic way. This is like flipping a coin, or can be regarded as hiding some particular decision procedure whose execution may not always be explained by the reduction mechanism.
• is regarded as an element of 2⁺, a supertype of 2. The elements of 2 have been used to describe the decision procedure of if-then-else programs in the program extraction from constructive proofs in [Tak91], as if T = L then M else N. Nondeterminacy arises when T is replaced by •. The intensional semantics of • is undefined. 2⁺ enjoys the following typing rules:
L : 2⁺    R : 2⁺    • : 2⁺
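For intuition only, the coin flipper can be approximated in OCaml by hiding an arbitrary decision procedure behind a function; the sketch below (ours, not part of the paper) uses the system random generator as that hidden procedure.

(* A minimal sketch of the coin flipper: a term that reduces to L or R
   nondeterministically, with the decision procedure hidden. *)
type two_plus = L | R

let flip () : two_plus = if Random.bool () then L else R

(* Nondeterminacy in if-then-else programs, as in the (NonDet) extraction:
   "if • = L then m else n". *)
let non_det m n = if flip () = L then m else n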
5.2 Rules of Inference

(1) Logical Rules
The rules for logical connectives and quantifiers are those of first order intuitionistic natural deduction with mathematical induction.

(2) Rules for Nondeterminacy

    A        A
    ------------- (NonDet)
         A

(NonDet) is actually a derived rule: it is obtained by proving A by divide and conquer on the axiom • = L ∨ • = R. (NonDet) means that if two distinct proofs of A are given, one of them will be chosen in a nondeterministic way. This is the well-known nondeterminacy present both in classical and in intuitionistic natural deduction.

(3) Auxiliary Rules

    M : σ → σ    a : σ    n : nat
    ------------------------------
    ap(M, n, a) : σ

    f : σ₁ → τ₁    g : σ₂ → τ₂
    ------------------------------
    f × g : σ₁ × σ₂ → τ₁ × τ₂

5.3 Realizability Interpretation

The realizability defined in this section is a variant of q-realizability [TvD88]. A new class of formulas called realizability relations is introduced to define q-realizability.

Def. 3: Realizability relation
A realizability relation is an expression of the form a q A, where A is a formula and a is a finite sequence of variables which does not occur in A. a is called the realizing variables of A. For a term M, M q A, which reads "a term M realizes a formula A", denotes (a q A)_a[M], and M is called a realizer of A.

A type is assigned to each formula, which is actually the type of the realizer of the formula.

Def. 4: type(A)
Let A be a formula. Then the type of A, type(A), is defined as follows:
1. type(A) is empty, if A is rank 0;
2. type(A & B) ≝ type(A) × type(B);
3. type(A ∨ B) ≝ 2⁺ × type(A) × type(B);
4. type(A ⊃ B) ≝ type(A) → type(B);
5. type(∀x : σ. A) ≝ σ → type(A);
6. type(∃x : σ. A) ≝ σ × type(A).

Proposition 2: Let A be a formula with a free variable x. Then type(A) = type(A_x[M]) for any term M of the same type as x.

Def. 5: q-realizability
1. If A is a rank 0 formula, then () q A ≝ A;
2. a q A ⊃ B ≝ ∀b : type(A). (A & b q A ⊃ a(b) q B);
3. (a, b) q ∃x : σ. A ≝ a : σ & A_x[a] & b q A_x[a];
4. a q ∀x : σ. A ≝ ∀x : σ. (a(x) q A);
5. (z, a, b) q A ∨ B ≝ (z = L & A & a q A & b : type(B)) ∨ (z = R & B & b q B & a : type(A)), provided that A and B are distinct, or A = B with A and B not rank 0;
6. • q A ∨ A ≝ A if A is rank 0;
7. (a, b) q A & B ≝ a q A & b q B.

Proposition 3: Let A be any formula. If a q A, then a : type(A).

Theorem: Soundness of realizability
Assume that A is a formula. If A is proved, then there is a term, T, such that T q A can be proved in a trivially extended logic in which realizability relations are regarded as formulas, and FV(T) ⊆ FV(A).

The proof of the theorem gives the algorithm of program extraction from constructive proofs. The program extracted from (NonDet) is if • = L then M else N, where M and N are the programs extracted from the subproofs of the two premises. From a proof by (MPST), the program λX.λm. ap(f_M, f_N^m(X)) is extracted, where f_M and f_N are as explained in Section 4.3. The other parts of the extraction algorithm can be found in [Tak91].

6 Examples

The basic programming technique with (MPST) is demonstrated in this section. In the following, we write X_n for X(n) when X is a stream.
6.1 Simple Examples

A process which doubles each element of the input natural number stream is defined as follows:
SPEC 1: ∀X : I_nat. ∃Y : I_nat. ∀n : nat. Y_n = 2 · X_n
Proof: The proof proceeds by (MPST). Let M(X, a) ≝ a = 2 · hd(X); then (a) and (b) are easily proved, and (c) is proved by letting f = λX. tl(X). ∎
The program extracted from the proof is λX.λm. 2 · hd(tl^m(X)), which is, by the isomorphism φ, extensionally equal to νz.λX. (2 · hd(X) :: z(tl(X))).

A process which takes two successive elements at a time from the input stream and outputs their sum is defined as follows:
SPEC 2: ∀X : I_σ. ∃Y : I_σ. ∀n : nat. Y_n = X_{2n} + X_{2n+1}
Proof: By (MPST). Let M(X, a) ≝ a = hd(X) + hd(tl(X)); (a) and (b) are easily proved, and (c) is proved by letting f ≝ λX. tl(tl(X)) and f_N ≝ λ(X, Y). if • = L then (tl(X), Y) else (Y, tl(X)). ∎

6.3 Dynamic Creation of Processes

The following example, a program which extracts only the prime numbers from the input stream, is one of the typical examples of dynamic creation of new processes:
SPEC 4: ∀X : I_nat. ∃Y : I_nat⊥. ∀n : nat. A(n, X, Y)
where A(n, X, Y)
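For concreteness, the processes specified by SPEC 1 and SPEC 2 can be written with the mapstream sketch given earlier (our OCaml transcription, reusing cstream, hd, tl, mapstream and nats_from from the previous sketches); these have the behaviour the extracted terms describe, but they are not the extracted terms themselves.

(* SPEC 1: Y_n = 2 * X_n, i.e.  nu z. \X. (2 * hd X :: z (tl X)) *)
let double : int cstream -> int cstream =
  mapstream (fun x -> 2 * hd x) tl

(* SPEC 2: Y_n = X_{2n} + X_{2n+1} *)
let sum_pairs : int cstream -> int cstream =
  mapstream (fun x -> hd x + hd (tl x)) (fun x -> tl (tl x))

let () =
  let y = double (nats_from 0) in
  assert (hd (tl (tl y)) = 4)    (* Y_2 = 2 * 2 *)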
7 Conclusion and Future Work
An extension of constructive programming to stream-based concurrent programming was proposed in this paper. The system has lazy types at the level of the programming language and logical stream types, which are types of sequences viewed as streams, at the level of logic. This two-level formulation of streams makes it possible to formulate a purely natural deduction style structural induction on streams, (MPST), in which concurrent processes (stream transformers) are defined as proofs. The (MPST) rule allows one to develop the proof of a specification with a good intuition about the concurrent process to be defined, and the rule seems to be easier to handle than the largest fixed point induction. Also, nondeterminacy was introduced at the level of logic using the inherent nondeterminacy of proof normalization in intuitionistic logic.
As future work, as seen in the example of a merger process, the side condition for (MPST) should be relaxed to handle larger varieties of concurrent processes.
References
[Bur75] W. H. Burge. Recursive Programming Techniques. Addison-Wesley, 1975.
[C+86] R. L. Constable et al. Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, 1986.
[DS89] P. Dybjer and H. P. Sander. A Functional Programming Approach to the Specification and Verification of Concurrent Systems. Formal Aspects of Computing, 1:303-319, 1989.
[Gir87] J.-Y. Girard. Linear logic. Theoretical Computer Science, 50, 1987. North-Holland.
[Got85] S. Goto. Concurrency in proof normalization and logic programming. In International Joint Conference on Artificial Intelligence '85, 1985.
[Hag87] T. Hagino. A Typed Lambda Calculus with Categorical Type Constructors. In Category Theory and Computer Science, LNCS 283, 1987.
[HN89] S. Hayashi and H. Nakano. PX: A Computational Logic. The MIT Press, 1989.
[How80] W. A. Howard. The formulas-as-types notion of construction. In Essays on Combinatory Logic, Lambda Calculus and Formalism, eds. J. P. Seldin and J. R. Hindley. Academic Press, 1980.
[KM74] G. Kahn and D. B. MacQueen. The Semantics of a Simple Language for Parallel Programming. In IFIP Congress 74. North-Holland, 1974.
[KT91] S. Kobayashi and M. Tatsuta. Private communication, 1991.
[Mil89] R. Milner. Communication and Concurrency. Prentice Hall, 1989.
[PL86] N. Mendler, P. Panangaden and R. L. Constable. Infinite Objects in Type Theory. In Symposium on Logic in Computer Science '86, 1986.
[Sat87] M. Sato. Quty: A Concurrent Language Based on Logic and Function. In Fourth International Conference on Logic Programming, pages 1034-1056. The MIT Press, 1987.
[SK90] M. Sato and Y. Kameyama. Constructive Programming in SST. In Proceedings of the Japanese-Czechoslovak Seminar on Theoretical Foundations of Knowledge Information Processing, pages 23-30, INORGA, 1990.
[Tak91] Y. Takayama. Extraction of Redundancy-free Programs from Constructive Natural Deduction Proofs. Journal of Symbolic Computation, 12(1):29-69, 1991.
[TvD88] A. S. Troelstra and D. van Dalen. Constructivism in Mathematics, An Introduction. Studies in Logic and the Foundations of Mathematics 121 and 123. North-Holland, 1988.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Realizability Interpretation of Coinductive Definitions and
Program Synthesis with Streams
Makoto Tatsuta
Research Institute of Electrical Communication,
Tohoku University,
2-1-1 Katahira, Sendai 980, JAPAN
e-mail: tatsuta@riec.tohoku.ac.jp
Abstract
The main aim of this paper is to construct a logic by which properties of programs can be formalized for verification, synthesis and transformation of programs.
This paper has two main points. One point is a realizability interpretation of coinductive definitions of predicates. The other point is the extraction of programs which treat streams.
An untyped predicative theory TID_ν is presented, which has the facility of coinductive definitions of predicates and is based on a constructive logic. Properties defined by the greatest fixed point, such as streams and the extensional equality of streams, can be formalized by the facility of coinductive definitions of predicates in TID_ν.
A q-realizability interpretation for TID_ν is defined, and the realizability interpretation is proved to be sound. By the realizability interpretation, a program which treats streams can be extracted from a proof of its specification in TID_ν. A general program extraction theorem and a stream program extraction theorem are presented.
1
Introduction
Our main aim is to construct a logic by which we can formalize properties of programs for verification, synthesis and transformation of programs. In this paper, we concentrate on the formalization of programs with streams and present a theory TID_ν.
Coinductive definitions are very important for this purpose. Properties of streams are represented semantically by the greatest fixed point. The predicate representing what a stream is and the extensional equality of streams are defined semantically by the greatest fixed point. These properties defined by the greatest fixed point can be formalized by coinductively defined predicates and coinduction.
The μ-calculus has been studied to formalize programs with streams for verification [3]. The μ-calculus has the facility of coinductive definitions of predicates and coinduction, and is based on classical logic.
In this paper, we present a theory TID_ν, which has the facility of coinductive definitions of predicates and coinduction and is based on a constructive logic. By these facilities we can formalize properties of programs with streams in TID_ν.
Our theory TID_ν is based on a constructive logic because we want to use the facility of program extraction by realizability for TID_ν. Program extraction is one of the benefits we get when we use a constructive formal theory to formalize properties of programs. Program extraction is to get a program from a constructive proof of its specification formula. One method of program extraction is to use a realizability interpretation. In PX [4], for example, a LISP program is extracted from a proof of its specification formula by realizability interpretation.
By the facility of coinductive definitions of predicates and the realizability interpretation, we can synthesize programs with streams naturally in TID_ν using theorem proving techniques.
This paper has two main points. One point is the realizability interpretation of coinductive definitions. The other point is the extraction of programs with streams.
We present an untyped predicative theory TID_ν, which has coinductive definitions of predicates and is based on a constructive logic. We define a q-realizability interpretation of TID_ν. We show that the realizability interpretation is sound. We present a general program extraction theorem and a stream program extraction theorem.
The soundness proof is based on the early version of this paper [8]. The soundness theorem was also proved in [5]. Both works are independent.
In Section 2, we define the theory TID_ν. In Section 3, we briefly explain how useful the facility of coinductive definitions of predicates is for formalizing streams. In Section 4, we discuss a model of TID_ν and prove its consistency. In Section 5, we present the q-realizability interpretation of TID_ν and prove the soundness theorem. In Section 6, we give the general program extraction theorem and the stream program extraction theorem for TID_ν, and an example of program synthesis.
2 Theory TID_ν
We present the theory TID_ν in this section. It is the same as Beeson's EON [1] except for the axioms of coinductive definitions of predicates.
In this paper, we choose combinators as the target programming language for simplicity, since we want to concentrate on the topic of coinductive definitions of predicates. We suppose that the evaluation strategy of combinators is lazy or call-by-name, because we represent a stream by an infinite list, which is a non-terminating term. We also omit the formalization of the lazy or call-by-name evaluation strategy in TID_ν for simplicity.
Definition 2.1. (Language of TID_ν)
The language of TID_ν is based on a first order language but extended for coinductive definitions of predicates.
The constants are:
K, S, p, p₀, p₁, 0, s_N, p_N, d.
We choose combinators as the target programming language for simplicity. K and S are the usual basic combinators. We have natural numbers as primitives, which are given by 0, a successor function s_N and a predecessor function p_N. We also have pairing functions p, p₀ and p₁ as built-ins, which correspond to cons, car and cdr in LISP respectively. d is a combinator judging equality of natural numbers and corresponds to an if-then-else statement in a usual programming language.
We have only one function symbol:
App
whose arity is 2. It means functional application of combinators.
Terms are defined in the same way as for a usual first order logic. For terms s, t, we abbreviate App(s, t) as st. For terms s, t, we also use the abbreviations (s, t) ≡ p s t, t₀ ≡ p₀ t and t₁ ≡ p₁ t.
The predicate symbols are:
⊥, N, =.
We have predicate variables, which a first order language does not have. The predicate variables are:
X, Y, Z, ..., X*, Y*, Z*, ....
Each predicate variable has a fixed arity.
We use the abbreviation λx.t, which is constructed from combinators in the usual way. We also abbreviate Y(λx.t) as μx.t, where Y ≡ λf.(λx.f(xx))(λx.f(xx)).
Definition 2.2. (Formula)
We define a formula A, a set S₊(A) of predicate variables which occur positively in A, and a set S₋(A) of predicate variables which occur negatively in A.

1. If a, b are terms, ⊥, N(a), a = b are formulas. Then
   S₊(⊥) = S₋(⊥) = ∅,
   S₊(N(a)) = S₋(N(a)) = ∅,
   S₊(a = b) = S₋(a = b) = ∅.

2. If X is a predicate variable whose arity is n, X(x₁, ..., xₙ) is a formula and
   S₊(X(x₁, ..., xₙ)) = {X},
   S₋(X(x₁, ..., xₙ)) = ∅.

3. A & B, A ∨ B, A → B, ∀xA, ∃xA are formulas if A and B are formulas, in the same way as in a first order language. Then
   S₊(A & B) = S₊(A ∨ B) = S₊(A) ∪ S₊(B),
   S₋(A & B) = S₋(A ∨ B) = S₋(A) ∪ S₋(B),
   S₊(A → B) = S₋(A) ∪ S₊(B),
   S₋(A → B) = S₊(A) ∪ S₋(B),
   S₊(∀xA) = S₊(∃xA) = S₊(A),
   S₋(∀xA) = S₋(∃xA) = S₋(A).

4. (νX.λx₁...xₙ.A)(t₁, ..., tₙ) is a formula where X is a predicate variable whose arity is n, A is a formula, t₁, ..., tₙ are terms and X is not in S₋(A). Then
   S₊((νX.λx₁...xₙ.A)(t₁, ..., tₙ)) = S₊(A) − {X},
   S₋((νX.λx₁...xₙ.A)(t₁, ..., tₙ)) = S₋(A).

⊥ means contradiction. N(a) means that a is a natural number. a = b means that a equals b.
The last case corresponds to coinductively defined predicates. Remark that X and x₁, ..., xₙ may occur freely in A. The intuitive meaning of a formula
(νX.λx₁...xₙ.A(X, x₁, ..., xₙ))(t₁, ..., tₙ)
is as follows: let P be a predicate of arity n such that P is the greatest solution of the equation
P(x₁, ..., xₙ) ↔ A(P, x₁, ..., xₙ).
Then (νX.λx₁...xₙ.A(X, x₁, ..., xₙ))(t₁, ..., tₙ) means P(t₁, ..., tₙ) intuitively.
We abbreviate a sequence by a bold symbol, for example, x₁, ..., xₙ as x.

Example 2.3.
We give an example of a formula. We assume the arity of a predicate variable P is 1. Then
(νP.λx. x = (x₀, x₁) & x₀ = 0 & P(x₁))(x)
is a formula.

Among the axioms and inference rules of TID_ν, we discuss only the inference rules of coinductive definitions of predicates here. The rest of the axioms and inference rules are almost the same as EON [1], and we list them in Appendix A.
Definition 2.4. (Coinductive Definitions)
Let ν ≡ νP.λx.A(P), where x is a sequence of variables whose length is the same as the arity of a predicate variable P, and A(P) is a formula displaying all the occurrences of P in the formula A. Suppose that C(x) is a formula displaying all the occurrences of the variables x in the formula.
We have the following axioms:
∀x(ν(x) → A(ν)),    (ν1)
∀x(C(x) → A(C)) → ∀x(C(x) → ν(x)).    (ν2)
νP.λx.A(P) means the greatest fixed point of the function from a predicate P to the predicate λx.A(P).
We define a theory TID⁻ as the theory TID_ν without the two axioms of coinductive definitions of predicates.

3 Coinductive Definitions of Predicates
We explain coinductive definitions of TID_ν and show some examples of formalization of streams by coinductive definitions.

Proposition 3.1.
Let ν be νX.λx.A(X). Then
∀x(ν(x) ↔ A(ν))    (ν1')
holds.

Proof 3.2.
By (ν1), we get ν(x) → A(ν). By letting C be λx.A(ν) in (ν2), A(ν) → ν(x) holds. □

This proposition shows that νP.λx.A(P) is the solution of the following recursive equation for a predicate P:
P(x) ↔ A(P).
(ν2) says that νP.λx.A(P) is the greatest solution of this equation, i.e., the greatest fixed point of the function λP.λx.A(P).

Streams can be formalized by coinductive definitions [3]. Therefore we can formalize streams in TID_ν. We represent a stream by an infinite list (a, s) constructed by pairing, where a is the first element of the stream and s is the rest of the stream. In this representation, if s is a stream, we can get the first element of s by s₀ and the rest by s₁.

We present an example of bit streams. A bit stream is a stream whose elements are 0 or 1. We will define a predicate BS(x) which means that x is a bit stream. When we write down a formula BS(x) in a naive way, BS itself occurs in the body of the definition as follows:
BS(x) ↔ x = (x₀, x₁) & (x₀ = 0 ∨ x₀ = 1) & BS(x₁).
BS is a solution P of the following equation for a predicate P:
P(x) ↔ x = (x₀, x₁) & (x₀ = 0 ∨ x₀ = 1) & P(x₁),    (1)
or the fixed point of the function
λP.λx. x = (x₀, x₁) & (x₀ = 0 ∨ x₀ = 1) & P(x₁).    (2)
There may be many solutions P for (1). For example, λx.⊥ is one solution of (1), though it is not our intended solution; λx.⊥ is the least solution. Our intended solution is the greatest solution of (1), or the greatest fixed point of (2). Hence we have the solution in TID_ν and it is represented as follows:
BS ≡ νP.λx. x = (x₀, x₁) & (x₀ = 0 ∨ x₀ = 1) & P(x₁).

Let 0̄ be μs.(0, s). 0̄ represents the zero stream whose elements are all 0. We can show BS(0̄) by coinduction (ν2). Let C be λx.(x = 0̄) in (ν2); then we have
∀x(x = 0̄ → x = (x₀, x₁) & (x₀ = 0 ∨ x₀ = 1) & x₁ = 0̄) → ∀x(x = 0̄ → BS(x)).
By the definition of 0̄,
∀x(x = 0̄ → x = (x₀, x₁) & (x₀ = 0 ∨ x₀ = 1) & x₁ = 0̄)
holds, and we have
∀x(x = 0̄ → BS(x)).
Let x = 0̄; then we get BS(0̄).

The coinductive definitions of predicates also play an important role in representing predicates of properties of streams [3, 6]. We will define the extensional equality s ≈ t for streams s and t. This equality can be represented by the coinductive definitions of predicates. ≈ is the greatest solution of the following equation for a predicate P:
P(x, y) ↔ x₀ = y₀ & P(x₁, y₁).
Therefore ≈ can be formalized in TID_ν as follows:
≈ ≡ νP.λxy. x₀ = y₀ & P(x₁, y₁).
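As a concrete picture of this section (our OCaml sketch, not part of the paper): streams as lazy pairs, the zero stream μs.(0, s), and finite approximations of the coinductively defined predicate BS, whose "limit" is the predicate itself. The names bstream, zero_stream and bs_upto are illustrative.

(* Sketch of the bit-stream example. *)
type bstream = B of int * bstream Lazy.t

(* the zero stream  mu s.(0, s) *)
let rec zero_stream = B (0, lazy zero_stream)

(* bs_upto n x holds when the first n elements of x are bits;
   the coinductive predicate BS is the limit of these approximations. *)
let rec bs_upto n (B (x0, x1)) =
  n = 0 || ((x0 = 0 || x0 = 1) && bs_upto (n - 1) (Lazy.force x1))

let () = assert (bs_upto 1000 zero_stream)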
4 Model of TID_ν
We will briefly explain the semantics of TID_ν by giving its intended model.
We will use classical set theory and the well-known greatest fixed point theorem for the model construction in this section.

Theorem 4.1. (Greatest Fixed Point)
Suppose S is a set and 𝒫(S) is the power set of S. If f : 𝒫(S) → 𝒫(S) is a monotone function, then there exists an a ∈ 𝒫(S) such that
1. f(a) = a,
2. for any b ∈ 𝒫(S), if b ⊆ f(b), then b ⊆ a.
a is abbreviated as gfp(f).
We will construct a model M' of TID_ν extending an arbitrary model M of TID⁻. Our intended model of TID⁻ is the closed total term model whose universe is the set of closed terms [1]. We denote the universe by U.
We will define ρ ⊨ A in almost the same way as for a first order logic, where A is a formula and ρ is an environment which assigns to a first order variable an element of U and to a predicate variable of arity n a subset of Uⁿ, and which covers all the free first order variables and all the free predicate variables of A. We present only the definition for the case (νP.λx.A(P))(t).
Define F as follows:
|x| = n,
F : 𝒫(Uⁿ) → 𝒫(Uⁿ),
F(X) = {x ∈ Uⁿ | ρ[P := X] ⊨ A(P)},
where ρ[P := X] is defined as follows:
ρ[P := X](P) = X,
ρ[P := X](x) = ρ(x)   if x is not P.
Then ρ ⊨ (νP.λx.A(P))(t) is defined as t ∈ gfp(F). Note that F is monotone, since the predicate variable P occurs only positively in A(P).
Theorem 4.2.
If TID_ν ⊢ A, then ρ ⊨ A for any environment ρ which covers all the free variables of A.

Theorem 4.3.
TID_ν is consistent.
5 q-Realizability Interpretation of TID_ν
We will explain the motivation of our realizability. We start with a usual q-realizability and try to interpret (νP.λx.A(P))(x). Let ν be νP.λx.A(P); then ν(x) ↔ A(ν, x) holds. We want to treat ν(x) and A(ν, x) in the same manner. So we require (e q ν(x)) ↔ (e q A(ν, x)). Therefore it is very natural to define (e q ν(x)) as ν*(e, x), where ν*(e, x) is the greatest solution of a recursive equation for a predicate variable X*:
X*(e, x) ↔ (e q A(ν, x))[(r q ν(y)) := X*(r, y)],
where [(r q ν(y)) := X*(r, y)] on the right hand side means replacing each subformula (r q ν(y)) by the subformula X*(r, y) in the formula (e q A(ν, x)). We get the following definition of our realizability by describing this idea syntactically.
Our realizability in this paper is an extension of Grayson's realizability. We can also define the usual q-realizability of coinductively defined predicates in the same way as in this paper.
Definition 5.1. (Harrop formula)
1. Atomic formulas ⊥, N(a) and a = b are Harrop.
2. If A and B are Harrop, then A & B, C → B, ∀xA and (νP.λx.A)(t) are also Harrop.
Since a Harrop formula does not have computational meaning, we can simplify the q-realizability interpretation of such formulas.
Definition 5.2. (Abstract)
1. A predicate constant of arity n is an abstract of arity n.
2. A predicate variable of arity n is an abstract of arity n.
3. If A is a formula, λx₁...xₙ.A is an abstract of arity n.
We identify (λx₁...xₙ.A)(t₁, ..., tₙ) with A[x₁ := t₁, ..., xₙ := tₙ], where [x₁ := t₁, ..., xₙ := tₙ] denotes a substitution.
Definition 5.3. (q-realizability Interpretation)
Suppose A is a formula, P₁, ..., Pₙ is a sequence of predicate variables whose arities are m₁, ..., mₙ respectively, and F₁, G₁, ..., Fₙ, Gₙ is a sequence of abstracts whose arities are m₁, m₁ + 1, ..., mₙ, mₙ + 1 respectively.
(e q_{P₁,...,Pₙ}[F₁, G₁, ..., Fₙ, Gₙ] A)
is defined by induction on the construction of A as follows. We abbreviate q_{P₁,...,Pₙ}[F₁, G₁, ..., Fₙ, Gₙ] as q', q_{P₁,...,Pₙ,P}[F₁, G₁, ..., Fₙ, Gₙ, F, G] as q'_P[F, G], F₁, ..., Fₙ as F and P₁, ..., Pₙ as P.
1. (e q' A) ≡ e = 0 & A_P[F]   where A is Harrop.
2. (e q' Pᵢ(t)) ≡ Fᵢ(t) & Gᵢ(e, t).
3. (e q' Q(t)) ≡ Q(t) & Q*(e, t)   where Q ≢ Pᵢ (1 ≤ i ≤ n).
4. (e q' A & B) ≡ (e₀ q' A) & (e₁ q' B).
5. (e q' A ∨ B) ≡ N(e₀) & (e₀ = 0 → (e₁ q' A)) & (e₀ ≠ 0 → (e₁ q' B)).
6. (e q' A → B) ≡ (A → B)_P[F] & ∀q((q q' A) → (eq q' B)).
7. (e q' ∀xA(x)) ≡ ∀x(ex q' A(x)).
8. (e q' ∃xA(x)) ≡ (e₁ q' A(e₀)).
9. (e q' (νX.λx.A(X))(t)) ≡ (νX*.λex.(e q'_X[ν_P[F], X*] A(X)))(e, t)   where ν ≡ νX.λx.A(X).
In the above definition, P₁,...,Pₙ[F₁, G₁, ..., Fₙ, Gₙ] means a substitution. Our realizability interpretation is something like a realizability interpretation with a substitution.
Proposition 5.4.
Let ν ≡ νP.λx.A(P).
1. ∀xr((r q ν(x)) ↔ (r q A(ν))).
2. λxr.r q ∀x(ν(x) → A(ν)).

Proof 5.5.
By the definition of q-realizability and (ν1'). □

Definition 5.6.
For a formula A, a predicate variable P and a term f, we define a term σ_A^{P,f} by induction on the construction of A as follows:
1. If A is a Harrop formula, then σ_A^{P,f} = λr.0.
2. If A = P(t), then σ_A^{P,f} = λr.ftr.
3. If A = Q(t), then σ_A^{P,f} = λr.r if Q ≢ P.
4. If A = A₁ & A₂, then σ_A^{P,f} = λr.(σ_{A₁}^{P,f} r₀, σ_{A₂}^{P,f} r₁).
6. If A = A₁ → A₂, then σ_A^{P,f} = λrq. σ_{A₂}^{P,f}(r(σ_{A₁}^{P,f} q)).
9. If A = (νQ.λx.A₁)(t), then σ_A^{P,f} = (μg.λxr. σ_{A₁}^{Q,g}(σ_{A₁}^{P,f} r)) t, where Q ≢ P.

Proposition 5.7.
Let ν ≡ νP.λx.A(P). Then
λq.μf.λxr. σ_{A(P)}^{P,f}(qxr) q ∀x(C(x) → A(C)) → ∀x(C(x) → ν(x))
holds. We prove it in Appendix B.

Theorem 5.8. (Soundness Theorem)
If TID_ν ⊢ A, we can get a term e from the proof of ⊢ A, and TID_ν ⊢ (e q A) holds, where all the free variables of e are included in the free variables of A.

Proof 5.9.
By induction on the proof of ⊢ A. The case of the axiom (ν1) is proved by Proposition 5.4. The case of the axiom (ν2) is proved by Proposition 5.7. □

6 Program Synthesis with Streams

In this section, we give the general program extraction theorem and the stream program extraction theorem for TID_ν, and an example of program synthesis.
Program synthesis by theorem proving techniques has been studied both in typed theories [2] and in untyped theories [4]. For untyped theories, realizability interpretation is used as the foundation of program synthesis by theorem proving techniques. In Section 3, we showed that streams and programs which treat streams can be formalized in TID_ν by the facility of coinductive definitions of predicates. In Section 5, we showed that a realizability interpretation can be defined for TID_ν and that the interpretation is sound. Hence we can synthesize programs which treat streams by theorem proving techniques in TID_ν using the realizability interpretation.
We represent streams by infinite lists constructed by pairing. We represent a specification of a program by a formula:
∀x(A(x) → ∃yB(x, y))
where x is an input, y is an output, A(x) is an input condition and B(x, y) is an input-output relation.

Theorem 6.1. (Program Extraction)
Suppose that we prove a specification formula ∀x(A(x) → ∃yB(x, y)) of a program in TID_ν and we have a realizer j such that
∀x(A(x) → (jx q A(x))).
Then we can get a program f and a proof of
∀x(A(x) → B(x, fx))
effectively from the proof of the specification formula.

Proof 6.2.
Since the specification formula is proved in TID_ν, by the soundness theorem of the q-realizability interpretation we have a realizer e such that
e q ∀x(A(x) → ∃yB(x, y))
holds. Let f be λx.(ex(jx))₀. Then the claim holds. □

We can synthesize programs in the following steps:
1. We write down a specification formula.
2. We prove the specification formula in TID_ν.
3. We extract a program from the proof.
The program extraction theorem says that the third step can be automated completely.
Example 6.3.
We show an example of a program which takes a stream of natural numbers and returns the stream each of whose elements is the corresponding element of the input stream plus one.
The predicate NS(x), which says that x is a stream of natural numbers, can be represented in TID_ν by the facility of coinductive definitions of predicates as follows:
NS ≡ νX.λx. x = (x₀, x₁) & N(x₀) & X(x₁).
The input condition of the specification is the formula NS(x).
The input-output relation of the specification is the formula ADD1(x, y), which is defined as follows:
ADD1 ≡ νX.λxy. y₀ = x₀ + 1 & X(x₁, y₁).
The specification formula is:
∀x(NS(x) → ∃y ADD1(x, y)).
We have one problem with this program synthesis method. The coinduction cannot be applied to the part ∀x(NS(x) → ...) in the above example: we cannot prove ∃y ADD1(x, y) by coinduction in general. Therefore the realizer of the coinduction cannot give a loop structure for the program. On the other hand, a realizer of the induction principle plays an important role for this approach to program synthesis, since such a realizer corresponds to a loop structure of a program [4, 7]. Therefore we need a new method by which a realizer of the coinduction also corresponds to a loop structure and is useful.
We thus need a more specialized program extraction method for programs with streams, in which the coinduction is useful. We give one solution to this problem in the next theorem.
We put two restrictions on the theorem. One is that the input condition A(x) must be of the form (νX.λx. x = (x₀, x₁) & Ā(x₀) & X(x₁))(x) for some Ā. The other is that the input-output relation B(x, y) must be of the form (νX.λxy. B̄(x, y₀) & X(x₁, y₁))(x, y) for some B̄. These restrictions require that the input condition and the input-output relation be uniform over the data; they are natural when we suppose that the input x and the output y are both streams.
Theorem 6.4. (Stream Program Extraction)
Suppose that the specification formula is ∀x(A(x) → ∃yB(x, y)), with
A ≡ νX.λx. x = (x₀, x₁) & Ā(x₀) & X(x₁),
B ≡ νX.λxy. B̄(x, y₀) & X(x₁, y₁),
and we have a term j such that ∀x(A(x) → (jx q A(x))). Then we define
B° ≡ νX.λx. ∃z B̄(x, z) & X(x₁).
If we have e such that
e q ∀x(A(x) → B°(x)),
we can get a term F such that
∀x(A(x) → B(x, Fx))
where
filter ≡ μf.λx.(x₀₀, f x₁),
F ≡ λx.filter(ex(jx)).
We prove it in Appendix C.

By this theorem, we can synthesize programs in the following steps:
1. We write down a specification formula ∀x(A(x) → ∃yB(x, y)).
2. We prove the corresponding formula ∀x(A(x) → B°(x)) in TID_ν.
3. We extract a program λx.filter(ex(jx)) from the proof, where e is a realizer of the corresponding formula ∀x(A(x) → B°(x)).
In the second step, we can apply the coinduction to prove the part B°(x), since B°(x) is defined by coinductive definitions. Therefore a realizer of the coinduction can correspond to a loop structure of the program.

Example 6.5.
We treat the same example as above again. The specification formula is the formula ∀x(NS(x) → ∃y ADD1(x, y)). Hence the formula ADD1°(x) is:
ADD1° ≡ νX.λx. ∃z(z = x₀ + 1) & X(x₁).    (3)
Therefore the corresponding formula we must prove is:
∀x(NS(x) → ADD1°(x)).    (4)
If we prove this formula in TID_ν, we can get the program which satisfies the specification by the stream program extraction theorem.
The conditions of the theorem hold for this case. We can put j ≡ λx.μs.(0, s), since
∀x(NS(x) → (μs.(0, s) q NS(x))).
We prove (4) in the following way here. Firstly, we prove
∀x(NS(x) → ∃z(z = x₀ + 1) & NS(x₁)).    (5)
This is proved by letting z be x₀ + 1. Secondly, by letting C be NS in (ν2) for ADD1°, we have
∀x(NS(x) → ∃z(z = x₀ + 1) & NS(x₁)) → ∀x(NS(x) → ADD1°(x)).    (6)
Finally, by (5) and (6), we get (4).
We calculate realizers corresponding to the above proofs as follows. The realizer corresponding to the proof of (5) is:
e₁ ≡ λxr.((x₀ + 1, 0), r₁₁),
e₁ q ∀x(NS(x) → ∃z(z = x₀ + 1) & NS(x₁)).
The realizer corresponding to the proof of (6) is:
e₂ ≡ λq.μf.λxr.σ(qxr),
e₂ q ∀x(NS(x) → ∃z(z = x₀ + 1) & NS(x₁)) → ∀x(NS(x) → ADD1°(x))
where
σ ≡ λr.((r₀₀, r₀₁), f x₁ r₁).
The realizer corresponding to the proof of (4) is:
e ≡ e₂e₁,
e q ∀x(NS(x) → ADD1°(x)).
We get
e = μf.λxr.((x₀ + 1, 0), f x₁ r₁₁).
The extracted program is:
Fx = filter(ex(jx))
   = filter(ex(μs.(0, s)))
   = (μg.λx.(x₀ + 1, g x₁)) x.
This is the program we expect.
Remark that the realizer e₂ of the coinduction (6) gives a loop structure of the program F.
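For intuition, the extracted term (μg.λx.(x₀ + 1, g x₁)) behaves like the following OCaml sketch (ours, not part of the paper), where the paper's pairing-based streams are modelled as lazy pairs and the zero stream μs.(0, s) plays the role of the realizer j in the example.

(* Sketch of the extracted ADD1 program over pairing-based streams. *)
type nstream = S of int * nstream Lazy.t

let rec add1 (S (x0, x1)) : nstream = S (x0 + 1, lazy (add1 (Lazy.force x1)))

(* the zero stream  mu s.(0, s) *)
let rec zeros = S (0, lazy zeros)

let () =
  let S (h, t) = add1 zeros in
  assert (h = 1 && (match Lazy.force t with S (h', _) -> h' = 1))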
Acknowledgments
I would like to thank Mr. Satoshi Kobayashi and Mr. Yukiyoshi Kameyama for careful comments. I'm deeply grateful to Prof. Masahiko Sato for invaluable discussions and comments.

References
[1] M. Beeson, Foundations of Constructive Mathematics (Springer, 1985).
[2] R. L. Constable et al., Implementing Mathematics with the Nuprl Proof Development System (Prentice-Hall, 1986).
[3] P. Dybjer and H. P. Sander, A Functional Programming Approach to the Specification and Verification of Concurrent Systems, Formal Aspects of Computing 1 (1989) 303-319.
[4] S. Hayashi and H. Nakano, PX: A Computational Logic (MIT Press, Cambridge, 1988).
[5] S. Kobayashi, Inductive/Coinductive Definitions and Their Realizability Interpretation, Manuscript (1991).
[6] R. Milner, Communication and Concurrency (Prentice Hall, 1989).
[7] M. Tatsuta, Program Synthesis Using Realizability, Theoretical Computer Science 90 (1991) 309-353.
[8] M. Tatsuta, Realizability Interpretation of Greatest Fixed Points, Manuscript (1991).
[9] M. Tatsuta, Monotone Recursive Definition of Predicates and Its Realizability Interpretation, Proceedings of Theoretical Aspects of Computer Software, LNCS 526 (1991) 38-52.

Appendix

A  Axioms and Inference Rules of TID_ν

The logical axioms and inference rules are the same as those of the usual intuitionistic logic.

Axioms for Equality:
∀x(x = x)    (E1)
∀x, y(x = y & A(x) → A(y))    (E2)

Axioms for Combinators:
∀x, y(Kxy = x)    (C1)
∀x, y, z(Sxyz = xz(yz))    (C2)

Axioms for Pairing:
∀x, y(p₀(pxy) = x)    (P1)
∀x, y(p₁(pxy) = y)    (P2)

Axioms for Natural Numbers:
N(0)    (N1)
∀x(N(x) → N(s_N x))    (N2)
∀x(N(x) → p_N(s_N x) = x)    (N3)
∀x(N(x) → s_N x ≠ 0)    (N4)
A(0) & ∀x(N(x) & A(x) → A(s_N x)) → ∀x(N(x) → A(x))    (N5)

Axioms for d:
∀x, y, a, b(N(x) & N(y) & x = y → dxyab = a)    (D1)
∀x, y, a, b(N(x) & N(y) & x ≠ y → dxyab = b)    (D2)

B  Proof of Soundness Theorem

Lemma B.1.
(1) If a predicate variable P occurs only positively in a formula A,
(r q_P[F, λyx.(y q C(x))] A) → (σ_A^{P,f} r q_P[F, λyx.∃r((r q C(x)) & y = fxr)] A).
(2) If a predicate variable P occurs only negatively in a formula A,
(r q_P[F, λyx.∃r((r q C(x)) & y = fxr)] A) → (σ_A^{P,f} r q_P[F, λyx.(y q C(x))] A).

Proof B.2.
We prove (1) and (2) simultaneously by induction on the construction of A. □
Proof B.3. (of 5.7)
Let ν ≡ νP.λx.A(P). Suppose
∀x(C(x) → A(C)),
q q ∀x(C(x) → A(C))
and let
f ≡ μf.λxr.σ_{A(P)}^{P,f}(qxr).
We show
f q ∀x(C(x) → ν(x)).
Let ν*(r, x) ≡ (r q ν(x)). It is sufficient to show
∀xr((r q C(x)) → ν*(fxr, x)).
This is equivalent to
∀xy(∃r((r q C(x)) & y = fxr) → ν*(y, x)).
By (ν2), it is sufficient to show
∀xy(∃r((r q C(x)) & y = fxr) → (y q_P[ν, λyx.∃r((r q C(x)) & y = fxr)] A(P))).
This is equivalent to
∀xr((r q C(x)) → (fxr q_P[ν, λyx.∃r((r q C(x)) & y = fxr)] A(P))).
Fix x and r and assume
r q C(x).
We show
fxr q_P[ν, λyx.∃r((r q C(x)) & y = fxr)] A(P).
By the assumption about q,
qxr q A(C).
Hence
qxr q_P[C, λyx.(y q C(x))] A(P).
By positivity and ∀x(C(x) → ν(x)),
qxr q_P[ν, λyx.(y q C(x))] A(P).
By Lemma B.1,
σ_{A(P)}^{P,f}(qxr) q_P[ν, λyx.∃r((r q C(x)) & y = fxr)] A(P).
By fxr = σ_{A(P)}^{P,f}(qxr), we have
fxr q_P[ν, λyx.∃r((r q C(x)) & y = fxr)] A(P). □

C  Proof of Stream Extraction Theorem

Lemma C.1.
Suppose that
A ≡ νX.λx. x = (x₀, x₁) & Ā(x₀) & X(x₁),
B ≡ νX.λxy. B̄(x, y₀) & X(x₁, y₁),
B° ≡ νX.λx. ∃z B̄(x, z) & X(x₁).
Then
∀f(∀x(A(x) → (fx q B°(x))) → ∀x(A(x) → B(x, filter(fx))))
holds.

Proof C.2.
By only the rules of NJ, the above goal is equivalent to
∀xy(∃f(∀x(A(x) → (fx q B°(x))) & A(x) & y = filter(fx)) → B(x, y)).
By (ν2), it is sufficient to show
∀xy(∃f(∀x(A(x) → (fx q B°(x))) & A(x) & y = filter(fx)) →
  B̄(x, y₀) & ∃g(∀x(A(x) → (gx q B°(x))) & A(x₁) & y₁ = filter(gx₁))).
By only the rules of NJ, it is equivalent to
∀xf(∀x(A(x) → (fx q B°(x))) & A(x) →
  B̄(x, (filter(fx))₀) & ∃g(∀x(A(x) → (gx q B°(x))) &
  A(x₁) & (filter(fx))₁ = filter(gx₁))).    (7)
We will prove it. Fix x and f and assume that
∀x(A(x) → (fx q B°(x))),    (8)
A(x).    (9)
By (8) and (9), (fx q B°(x)) holds. Hence
((fx)₀₁ q B̄(x, (fx)₀₀)) & ((fx)₁ q B°(x₁))    (10)
holds. Therefore
B̄(x, (filter(fx))₀)
holds, since (filter(fx))₀ = (fx)₀₀.
Put g be λy.(f(x₀, y))₁. We will show ∀y(A(y) → (gy q B°(y))). Fix y and assume that A(y). By the definition of A,
A(x) ↔ x = (x₀, x₁) & Ā(x₀) & A(x₁)
and
A((x₀, y)) ↔ Ā(x₀) & A(y)
hold. By this and (9), Ā(x₀) holds. Hence A((x₀, y)) holds. Combining it with (8), we get (f(x₀, y) q B°((x₀, y))). Hence ((f(x₀, y))₁ q B°(y)) and (gy q B°(y)) hold. Therefore we get ∀y(A(y) → (gy q B°(y))).
By (9), A(x₁) holds. Since, in general, (filter(s))₁ = filter(s₁) holds, we get (filter(fx))₁ = filter((fx)₁) = filter(gx₁). Therefore (7) holds. □

Proof C.3. (of Theorem 6.4)
By the assumptions and the definition of q-realizability, ∀x(A(x) → (ex(jx) q B°(x))) holds. Letting f be λx.ex(jx) in Lemma C.1, we get ∀x(A(x) → B(x, Fx)). □
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
MLOG: A STRONGLY TYPED CONFLUENT FUNCTIONAL LANGUAGE
WITH LOGICAL VARIABLES
Vincent Poirriez*
Universite de Reims
INRIA-Ecole Normale Superieure PARIS, FRANCE
Abstract
A new programming language called MLOG is introduced. MLOG is a conservative extension of ML with logical variables. To validate our concepts, a compiler named CAML Light FLUO was implemented. Numerous examples are presented to illustrate the possibilities of MLOG. The pattern-matching of ML is kept for λ-calculus bindings, and a unification primitive is introduced for logical variable bindings. A suspension mechanism allows the cohabitation of pattern-matching and logical variables. Though the evaluation strategy for application is fixed, the order of evaluation of the parts of pairs and applications remains free. MLOG programs can be evaluated in parallel, with the same result obtained irrespective of the particular order of evaluation. This is guaranteed by the Church Rosser property satisfied by the evaluation rules. As a corollary, a strict λ-calculus with explicit substitutions on named variables is shown to be confluent. A completely formal operational semantics of MLOG is given in this paper.
1
Introduction
Many attempts have been made at integrating functional and logical tools in the same language. It seems worthwhile to combine the strengths of the two paradigms, allowing the programmer to choose the most appropriate tool for the problem at hand. The approach we have followed is to add "logical" tools to a well-known strongly typed functional language: ML. To
validate our ideas and to demonstrate that MLOG is a
realistic proposal, we have implemented a compiler for
MLOG named "CAML Light FLUO". It is an extension
of the CAML Light system of X.Leroy[Leroy 90]. Logical variables and unification serve two goals in logical
languages: to handle partially defined values, and to
provide a resolution mechanism. The implementation
of logical variables and unification is a required step to
"Projet Forme! BP 105 Domaine de Voluceau
78153 Rocquencourt Cedex, FRANCE.
poirriez~margaux.inria.fr
implement a resolution mechanism, so we bypass that
second goal and focus on the first one. MLOG is an
extension of ML with built-in logical variables instantiable once, and unification. We allow a fruitful cohabitation of logical variables and ML pattern matching
by introducing a suspension mechanism, thus when an
application cannot be evaluated due to a lack of information, the application is suspended. In the designing
of MLOG, we strive to obtain a conservative extension
of ML. Pure ML programs are not penalized by the
extension. This result is obtained by limiting the domain of logical variables and suspensions to specified
logical types. Moreover, MLOG inherits from ML a
strong system of types and a safety property for the execution of well-typed programs. Thus the programmer
does not waste energy in checking types. In this article, we trace the execution of programs that illustrate
that synchronisation algorithms, demand driven computation, algorithms using potentially infinite data structures or partially instantiated values are easily written
in MLOG. Then we focus on the confluence property.
In MLOG, the strategy for the evaluation of an application is strict evaluation: i.e. we impose the evaluation
of the argument before reducing the application. Nevertheless, some freedom remains in the order of evaluation
of a term: both parts of an application or of a pair for
example. Then MLOG is independent of the implementation choices and it can be implemented on a parallel
machine. As we fix the strategy for the evaluation of
the applications, we can name variables without risking
clashes. A complete operational semantics is given in
appendix. A subset limited to the functional part of
these rules is a strict A-calculus with explicit substitutions and named variables that verify the Church Rosser
property. That calculus is a very simple formalism and
as it is confluent, it is a good candidate to describe any
implementation of strict A-calculus, even a parallel one.
2 MLOG syntax and examples

We describe here the syntax added to ML. As MLOG is an extension of ML, all programs of ML are programs of MLOG. For clarity, we limit ourselves to a mini-ML. All examples are produced by a session of our system CAML Light FLUO. Note that # is the prompt and ;; the terminator of our system.

2.1 Syntax

The language we consider is λ-calculus with pattern-matching, concrete types (either built-in, such as int or string, or declared by the user), constructors, the let construct and the conditional. We first define the set P of programs of MLOG. We assume the existence of a countable set Var of term variables, with typical elements x, y, and a disjoint countable set C of constructors, with typical elements c. Some constructors are predefined: integers, strings, booleans (true, false) and (), the element of type unit. In the following, i ranges over integers and s over strings. The syntax of patterns, with typical element p, is:
p ::= x | c | (p₁, ..., pₙ) | c p

As in ML, we limit ourselves to linear patterns. The syntax of programs, with typical elements a, b, is:

a ::= x | c | a b | (a₁, ..., aₙ) | let x = a in b | a; b
    | (function p₁ -> a₁ | ... | pₙ -> aₙ) | undef | unif

a; b is the ML notation for a sequence: it means evaluate a, then evaluate b and return the value of b. The last two constructs are specific to MLOG: undef is a generator of fresh logical variables; unif is the unification primitive. let_var u in ... is syntactic sugar for let u = undef in ....

2.2 Types

In MLOG, the programmer has to declare specially the types that may contain undefined objects (that is, logical variables and suspensions). The notion of logical type is introduced. We assume given a countable set of type variables TVar, with typical elements 'a, 'b, a disjoint countable set of variables over logical types LTVar, with typical elements 'a?, 'b?, and two countable sets of type constructors with typical elements ident and lident. The sets of logical types L, with typical element τi, and of types T (typical element ti) are recursively defined by:

τi ::= 'a? | [ti] lident
and
ti ::= τi | bool | int | string | unit | ti -> tj | ti * tj | [ti] ident

Note that L is a strict subset of T. Expressions to declare new types are:

type ['a, ..., 'k] ident = c [of ti] | ... | c' [of tj]
type logic ['a, ..., 'k] lident = c [of ti] | ... | c' [of tj]

where [ ] surround optional expressions. A logical type is declared by the new keyword: type logic. The type void below has a unique value void, and logical variables of type void may be declared. The type void is isomorphic to the type unit except that no logical variable can be declared in unit. A value of the type Bool below is True, False, or a free logical variable that will possibly be instantiated later to either True or False.

#type logic void = void;;
Type void defined.
#type logic Bool = True | False;;
Type Bool defined.

The following rules govern type variable instantiations: (1) 'a may be instantiated by any type (including 'b?); (2) 'a? may be instantiated by any logical type; (3) 'a? may not be instantiated by a non-logical type.
We write "a : ti" for the program a of type ti. Thus, the set of MLOG programs is in fact the subset of well-typed programs P_T of P defined by the familiar ML type system. We just have to specify that: (1) undef : 'a?; (2) unif : 'a -> 'a -> void. Fortunately, as far as types are concerned, logical variables and assignable constructs are quite close, so we have adapted to logical variables previous work done for typing assignable objects in ML. We have directly applied the idea of Pierre Weis and Xavier Leroy [LeroyWeis 91] and, using their notion of cautious generalization, we get an extension of the ML type system to logical variables that is sound:

Theorem 1 No evaluation of a well-typed program can lead to a run-time type error.

Thus CAML Light FLUO has a type-checker that infers and checks the types of programs.
2.3 Examples

We give below very simple examples to illustrate the semantics of unification and logical variables in MLOG. First, logical variables are instantiable only once; when a unification fails, the exception Unify is raised:

#let (u:Bool) = undef;;
Value u : Bool  u = ?
#unif u True; unif u False;;
- : void  Uncaught exception: Unify
#u;;
- : Bool  - = True

CAML Light FLUO prints "?" for a free logical variable. Rational trees are allowed; unif does not perform any occur-check. Moreover, unif does not loop when unifying rational trees. The type 'a stream below implements potentially infinite lists.

#type logic 'a stream = Nil | St of 'a * 'a stream;;
Type stream defined.
#let (u:int stream) = undef;;
Value u : int stream  u = ?
#unif u (St(1,u)); u;;
- : int stream  - = St (1, St (1, St (1, St (1, Interrupted.

The printing of u was interrupted by a system break. At that point we can use the classical techniques of logical languages; see for example in the appendix the classical functional quicksort program, except that difference lists are used instead of lists to improve the concatenation of sorted sublists.
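As a rough approximation of these examples in plain OCaml (ours, not CAML Light FLUO, and ignoring variable-variable unification, rational trees and suspensions), a single-assignment logical variable can be modelled by a reference cell, with unif as a fill-or-check operation that raises Unify on conflict.

(* Sketch only: MLOG-style logical variables as single-assignment ref cells. *)
exception Unify

type 'a lvar = 'a option ref

let undef () : 'a lvar = ref None

let unif (u : 'a lvar) (x : 'a) : unit =
  match !u with
  | None -> u := Some x                     (* instantiable once *)
  | Some y -> if y <> x then raise Unify    (* already bound: must agree *)

let () =
  let u = undef () in
  unif u true;
  (try unif u false with Unify -> ());
  assert (!u = Some true)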
2.4 Suspensions: an intuitive semantics

Consider first the example below:

#let neg = function True -> False | False -> True;;
Value neg : Bool -> Bool
#let b,exp = let_var u in (u, neg u);;
Value b : Bool  Value exp : Bool  b = ?  exp = ...

b is a new free logical variable of type Bool. The application cannot match u with True or False: u is free. So what is the meaning of exp? The answer is: the application neg u is suspended. Thus, exp is a suspension of type Bool (1). A suspension is a first class citizen in MLOG. It may be handled in data structures, and used in other expressions.

#let exp' = unif exp False;;
Value exp' : void  exp' = ...

Since exp is a suspension, MLOG cannot perform the unification of exp with False. Therefore this unification is also suspended (2). Let us now instantiate b with True, and look at exp and exp'.

#unif b True; exp,exp';;
Value - : Bool * void  - = (False, void)

We have to clarify when a suspension is awakened. Awakening a suspension could be delayed until it is actually needed. We must define when such an evaluation is needed:

#let (a,b,e) = let_var a,b in
  (a,b,(function True -> (unif a True)) b);;
Value a : Bool  Value b : Bool  Value e : void
a = ?  b = ?  e = ...

e is suspended waiting for the instantiation of b. As b is instantiated, e can be awakened. If we choose to wake up a suspension only if its value is needed, e remains suspended and then a remains free. If the value of a is needed, nothing indicates that the evaluation of e will instantiate a. This motivates our choice to wake up all suspended evaluations that can be awakened. Another motivation is that, if an expression is suspended, it is because its evaluation was needed and unfortunately was stopped by a lack of information. So if we look at a:

#unif b True;;
Value - : void  - = void
#a;;
Value - : Bool  - = True

The example above illustrates the fine control on evaluation allowed by the suspension mechanism. The application is performed, and then a is instantiated, only when b is instantiated.

(1) Note that CAML Light FLUO prints suspensions as "...".
(2) That is why the type of the result of unif has to be a logical type. We do not want to have a suspension in a non-logical type.

3 A confluence result

To give an operational semantics for MLOG we have to deal with bindings of λ-calculus variables, bindings of logical variables, and suspensions. We give here a simple formalism that allows us to keep named parameters, and we show that this calculus is strongly confluent (3). In this section we neglect types.

3.1 A strict calculus with environment

We store bindings of parameters in environments. We call EΛ the set of terms with environments. As our calculus is strict, we specialize a subset Val of EΛ which is the set of the values handled by the language. Typical elements of Val and EΛ are respectively noted v and t.

e ::= [] | (x, v)::e
v ::= c | c(v) | (v, v') | (function ...).e
t ::= c | c(t) | (t, t') | t(t') | a.e
Logical variables, substitutions and suspensions
Now we have to extend the set Val with logical variables. We assume the existence of a countable set U
disjoint with V and C with typical element u( i), distinct logical variables have distinct indexes. We call
LVal and ELA the obtained sets of values and terms
with environments. To manage the bindings of logical
variables we define substitutions as functions from U
to ELA. We will use greek letters to note substitutions. We call the domain of (J' and note dom( (J') the set
{u(i) s.t. (J'(u(i)) -=I u(i)}. We will note (J' 0 a the composition of substitutions. The MLOG pattern matching
algorithm has to deal with logical variables. It has to
3Recall that if no strategy for application is imposed, name
clash may occurs. To avoid that problem, the names of variables
can be replaced by numbers "iI. la De Bruijn"[AbadiCaCuLe 90,
HardinLevy 90]
677
access the pointed value when it checks a bound variable, and it fails with Unknown when it tries to match a free logical variable against a constructed pattern. We define the match of a term t with a pattern pat in the substitution σ, noted Mσ(pat, t), as the list of appropriate bindings of the parameters of pat. Recall that patterns are linear. We now define a sequential pattern matching, without entering into the optimizations of the algorithm.
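Purely as an illustration (this is our OCaml sketch, not the CAML Light FLUO code; every name in it is invented), such a matcher dereferences bound variables through the substitution, raises Unknown when a free logical variable meets a constructed pattern so that the caller can suspend, and otherwise either fails or returns the parameter bindings:

  (* A minimal sketch of matching a term against a linear pattern
     in the presence of logical variables; all names are invented. *)
  type term =
    | Const of string                    (* constant c *)
    | Constr of string * term list       (* constructed value c(t1,...,tn) *)
    | LVar of int                        (* logical variable u(i) *)

  type pattern =
    | PVar of string                     (* pattern parameter, bound on success *)
    | PConst of string
    | PConstr of string * pattern list

  exception Unknown                      (* free logical variable vs constructed pattern *)
  exception Fail                         (* plain matching failure *)

  (* sigma gives the binding of a logical variable, if any; deref follows it. *)
  let rec deref (sigma : int -> term option) t =
    match t with
    | LVar i -> (match sigma i with Some t' -> deref sigma t' | None -> t)
    | _ -> t

  (* Returns the list of bindings of the parameters of pat, as in Msigma(pat, t). *)
  let rec match_pat sigma pat t =
    match pat, deref sigma t with
    | PVar x, t' -> [ (x, t') ]
    | PConst a, Const b when a = b -> []
    | (PConst _ | PConstr _), LVar _ -> raise Unknown      (* must suspend *)
    | PConstr (c, ps), Constr (c', ts)
      when c = c' && List.length ps = List.length ts ->
        List.concat (List.map2 (match_pat sigma) ps ts)
    | _ -> raise Fail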
The reduction rules (given in appendix B) operate on four-tuples < t, σ, α, Γ >, where t is the term to reduce. The substitution σ stores the bindings of unified logical variables and updated suspensions. The valuation α stores the suspensions (recall they are bound to u(j) with j < 0). The substitution Γ stores the suspensions whose evaluations are running. We use the classical notations →* and →n for the reflexive transitive closure of → and for derivations of length n. We first have two lemmas that say that no term of the form (a.e).e' is produced and that the term component of a normal form is a value.
Lemma 1 Let a be a program and < a.[], ∅, ∅, ∅ > →n < t, σ, α, Γ >. For all subterms of t of the form t'.e, t' is a program.

Lemma 2 Let a be a program and < a.[], ∅, ∅, ∅ > →* < t, σ, α, Γ > such that < t, σ, α, Γ > is a normal form. Then t is a value.
We can deduce from these lemmas that all bindings in σ bind a variable to a value. Let us remark now that, if no suspension rule is applied, then, since we do not reduce under a λ and we impose a strict calculus, we have strong confluence for our reduction rules.
Proposition 1 Let < t, σ, α, Γ > → < t1, σ1, α, Γ1 > and < t, σ, α, Γ > → < t2, σ2, α, Γ2 > be two reductions using respectively rules r1 and r2, with ri not a suspension rule. Then, by applying respectively r2 and r1, we have < t1, σ1, α, Γ1 > → < t3, σ3, α, Γ3 > and < t2, σ2, α, Γ2 > → < t3, σ3, α, Γ3 >.
An important corollary of that result is that, if we restrict ourselves to the functional subset of MLOG, we have described a strongly confluent calculus with explicit substitutions and named variables. That calculus is rather simple (all that concerns logical variables and suspensions is unnecessary) and describes all implementations of a strict λ-calculus, even parallel ones.
Remark that → is not strongly confluent on the whole language. That is illustrated by the example below, where the choice is between UnifT and Susp and the diagram cannot be closed in one step: even if UnifT is chosen after Susp, waking up the suspension remains to be done.
< ((fun c → c').[] u(1), unif u(1) c), ∅, ∅, ∅ >
We can see the use of a rule Susp, ASusp or USusp as the translation of a subterm from the term to Γ. From a reduction point of view, we can say that these rules do no work. Thus the idea is to define an equivalence between four-tuples < t, σ, α, Γ > which is stable for these suspension rules, and then to show the strong confluence of → up to that equivalence.
Definition 1 < t, σ, α, Γ > ≡ < t', σ', α', Γ' > iff

1. there exists a permutation P over positive variable indexes such that (σ ∘ α ∘ Γ)*(t) = P(σ' ∘ α' ∘ Γ')*(t'),

2. and for all u(i) in dom(σ) with i > 0, (σ ∘ α ∘ Γ)*(u(i)) = P(σ' ∘ α' ∘ Γ')*(u(P(i))),

3. and for all u(i) in dom(α) ∪ dom(Γ), either there exists j < 0 such that u(j) is in dom(α') ∪ dom(Γ') and (σ ∘ α ∘ Γ)*(u(i)) = P(σ' ∘ α' ∘ Γ')*(u(j)), or there exists a subterm t'k of t' such that (σ ∘ α ∘ Γ)*(u(i)) = P(σ' ∘ α' ∘ Γ')*(t'k), and vice versa for all u(i) in dom(α') ∪ dom(Γ');

or t = t' = failwith(s).
Thus we have verified the Church-Rosser property (the proof is in appendix C):

Theorem 3 If < t, σ, α, Γ > has a normal form for →, then it is unique up to ≡.

Remark that if we add the types defined in the section above, the rules need not be modified and the result still holds.
4 MLOG: a conservative extension of ML
The fact that the type of undef is 'a? ensures that no logical variable occurs in a non-logical type. That is not enough to ensure that no suspension of a non-logical type is built. Fortunately, we handle type information when we compile the pattern matching. Thus we have the following rules for the application. Let f be a function of type t1 → t2: (1) if type t1 is a non-logical type, then do not do any test to check whether the argument is a free variable or a suspension; (2) if type t1 is a logical type, then (21) first test whether the argument is a bound logical variable or an updated suspension, and access the bound value; (22) if type t2 is a non-logical type, test whether the argument is a free variable or a suspension, and if so raise the failure Unknown; (23) if type t2 is a logical type, test whether the argument is a free variable or a suspension, and if so build and return the appropriate suspension.
Example:
#type logic 'a partial = P of 'a;;
Type partial defined.
#(function (P x) -> x) undef;;
Uncaught exception: Unknown
Theorem 4 Let a be a well-typed program. The evaluation of a cannot build a logical variable or a suspension
of a non-logical type.
We can now deduce that MLOG is a conservative extension of ML, as pure ML programs need not know about the extension. However, it is clear that with that rule of failure, our calculus is no longer Church-Rosser. To keep that property, we must not use functions from a logical type to a non-logical type. Let us call MLOG* the subset of MLOG that does not contain such functions. We then have the following result.
Proposition 2 The relation → is confluent on MLOG*.
Remark: The counterpart of the conservative property of MLOG is the need to be cautious with logical variables and "functional types". First, for any instances of 'a and 'b, the type 'a → 'b cannot include a logical variable, as it is a "pure ML" type. However, it is correct to have logical variables of type (int → int) partial, as illustrated below.
#let app (P h) (P x) = P (h x);;
Value app:('a->'b)partial->'a partial->'b partial
#let (g: (int -> int)partial)=undef;;
Value g : (int -> int) partial g = ?
#let e2 = app g (P 2);;
Value e2 : int partial e2 = ...
#unif g (P (fun x -> x*x));;
- : void  - = void
#e2;;
- : int partial  - = P 4

5 Conclusion
We have defined MLOG as an extension of ML. We have shown that it verifies a Church-Rosser property, and thus it may be parallelized or used to simulate parallel processes. Such processes can communicate with each other through shared logical variables, and the suspension mechanism allows synchronization. Partial data are handled by MLOG; for example, potentially infinite lists can be implemented by using free logical variables for the tail of the structure (see the example in the appendix).
MLOG includes a suspension mechanism; let us now compare it to some other integration proposals that have made a similar choice. MLOG is close to the language Qute defined by M. Sato and T. Sakurai in [SatoSakurai 86]. However, it differs from it in the following points: (1) its evaluation strategy ensures that the evaluation of a suspended expression will be retried only when the needed information is provided; (2) the reduction of an application is allowed even if a subexpression of the argument is suspended, the only condition being that pattern matching succeeds; in that case, the binding of the suspension to a logical variable and its storage in α avoid duplication of that suspension.
MLOG is also close to GHC of K. Ueda [Ueda 86]; the main difference (apart from the typing point of view) is that MLOG does not have non-determinism for rule selection, and that we have preferred to keep the functional formalism in place of the predicate one, as the selection of rules is done by pattern matching. However, deterministic GHC programs are easily translated into MLOG (6).

(6) The author has translated all the programs given by G. Huet in [Huet 88]; he found that the use of types and of a functional formalism leads to clearer programs.
The use of a suspension mechanism and the cohabitation of logical variables and functions are common to Le Fun of H. Ait-Kaci [Ait Kaci 89] and MLOG. Here the main differences are that Le Fun provides a resolution mechanism based on backtracking and that MLOG is strongly typed.
Perhaps the main difference between MLOG and these related works is that MLOG is a conservative extension of ML. We demonstrated that the type system of ML can be extended to MLOG and we gave a safety property for well-typed programs. As a side effect, we have described an operational semantics for the strict λ-calculus which uses names for parameters and verifies the Church-Rosser property. It can therefore be used to describe any interpreter of the strict λ-calculus, even a parallel one. If it seems desirable, further work can be done to provide a resolution mechanism in MLOG. Note that the exhaustive search transformation described by K. Ueda in [Ueda 86] is applicable.
We hope that MLOG is an attractive extension of ML: from a "logical paradigm" point of view it allows handling incomplete data structures and controlled parallel evaluation, with the benefit of the ML type system; and from a "functional paradigm" point of view, it respects functional programs, with the added benefits of partial data and a fair control mechanism.
Acknowledgments: We would like to thank all the members of the LIENS-INRIA Formel project for helpful discussions, in particular Therese Hardin for her accurate suggestions to improve our formalism and proofs.
A Appendix: MLOG programs
The program below is the classical functional quicksort program, except that difference lists are used instead of lists to
improve the concatenation of sorted sublists. This is done
by the use of the same variable r in both recursive calls of
qsortrec.
#let partition order x =
  let rec partrec = function
     Nil -> Nil,Nil
   | St(h,t) -> let infl,supl = partrec t in
       if order(h,x) then St(h,infl),supl else infl,St(h,supl)
  in partrec ;;
Value partition :
  ('a*'b->bool)->'b->'a stream->'a stream*'a stream
#let quicksort order l =
  let rec qsortrec = function
     (Nil,result,sorted) -> (unif result sorted); result
   | (St(h,t),presult,sorted) ->
       let infl,supl = partition order h t in
       let_var r in (qsortrec(supl,r,sorted);
                     qsortrec(infl,presult,St(h,r)))
  in qsortrec (l,undef,Nil) ;;
Value quicksort : ('a*'a->bool)->'a stream->'a stream
The following example illustrates the use of potentially infinite lists and demand-driven computation. The confluence property allows parallelizing the evaluation of nested applications in the definition of the Hamming sequence of integers of the form 2^i * 3^j * 5^k [Dijkstra 76].
#let mult (P x,P y) = P(x*y);;
Value mult : int partial * int partial -> int partial
#let rec times (u,St(v,r)) = St(mult(u,v),times(u,r));;
Value times :
  int partial*int partial stream->int partial stream
#let rec merge (St(P x,s),St(P y,r)) =
  if x<y then St(P x,merge(s,St(P y,r))) else
  if x>y then St(P y,merge(St(P x,s),r)) else
  St(P x, merge(s,r));;
Value merge : int partial stream*int partial stream ->
  int partial stream
#let rec copy_stream (St(a,b) as s) (St(h,t)) =
  unif a h; copy_stream b t; s;;
Value copy_stream : 'a stream -> 'a stream -> 'a stream
#let Hamming = let_var r in
  copy_stream
    (St(P 1,merge(merge(times(P 2,r),times(P 3,r)),
                  times(P 5,r)))) r;
  r;;
Value Hamming : int partial stream  Hamming = ?
#let rec increase_stream st = function
   0 -> st
 | n -> let_var tail in unif st (St(undef,tail));
        increase_stream tail (n-1);;
Value increase_stream : 'a? stream -> int -> 'a? stream
#increase_stream Hamming 9; Hamming;;
Value - : int partial stream
- = St(P 1,St(P 2,St(P 3,St(P 4,St(P 5,St(P 6,St(P 8,
    St(P 9,St(P 10,?)))))))))
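As a cross-check only (plain OCaml, ours, not MLOG), an eager computation of the same prefix of the Hamming sequence yields 1, 2, 3, 4, 5, 6, 8, 9, 10, matching the stream printed above:

  (* Naive eager computation of the first n Hamming numbers (2^i * 3^j * 5^k). *)
  let hamming n =
    let rec next acc last =
      if List.length acc = n then List.rev acc
      else
        (* smallest multiple of an already-produced number that is > last *)
        let cand =
          List.fold_left
            (fun best x ->
               List.fold_left
                 (fun best m ->
                    let v = x * m in
                    if v > last && v < best then v else best)
                 best [2; 3; 5])
            max_int (1 :: List.rev acc)
        in
        next (cand :: acc) cand
    in
    next [1] 1

  let () =
    hamming 9 |> List.iter (Printf.printf "%d ")   (* prints: 1 2 3 4 5 6 8 9 10 *)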
B Reduction rules

We assume that we have a function queue such that queue(σ,α)(u(i)) returns all the suspensions in α waiting for the instantiation of u(i). The rule DVar uses a counter c that is increased each time a new logical variable is created; c is initially 1. The rules Susp and USusp use another counter cs, dedicated to suspensions and also initially 1; they increase α with the new suspension. The rules UnifT and AwUpd increase σ with the new bindings and increase Γ with the suspensions waiting for these instantiations or updates. Note that we remain free to choose the order of evaluation of binary constructs (we give in figure 1 the rules for pairs; the rules for unification and application are similar). Moreover, the order of evaluation of the terms bound in Γ is also free (see rule Aw).

Figure 1: Structural rules

Env     < x.((x,t)::e), σ, α, Γ > → < t, σ, α, Γ >
EnvO    < x.((y,t)::e), σ, α, Γ > → < x.e, σ, α, Γ >
Const   < c.e, σ, α, Γ > → < c, σ, α, Γ >
DVar    < undef.e, σ, α, Γ > → < u(c), σ, α, Γ >   and c ← c+1
AEnv    < (t t').e, σ, α, Γ > → < (t.e t'.e), σ, α, Γ >
UEnv    < (unif t t').e, σ, α, Γ > → < (unif t.e t'.e), σ, α, Γ >
PEnv    < (t,t').e, σ, α, Γ > → < (t.e, t'.e), σ, α, Γ >
Pair1, Pair2   reduce respectively the first and the second component of a pair
Pair1F  if < t, σ, α, Γ > → < failwith(s), σ, α, Γ >
        then < (t,t'), σ, α, Γ > → < failwith(s), σ, α, Γ >
Pair2F  if < t', σ, α, Γ > → < failwith(s), σ, α, Γ >
        then < (t,t'), σ, α, Γ > → < failwith(s), σ, α, Γ >

The remaining rules deal with application, unification and suspensions:

β       if < t, σ, α, Γ > is in → normal form, σ*(f) = (fun p1 → a1 | ... | pn → an).e
        and the match of t against some pattern pi succeeds with bindings ei,
        then < f t, σ, α, Γ > → < ai.ei @ e, σ, α, Γ >
Susp    if < t, σ, α, Γ > is in → normal form, cs = k, σ*(f) = (fun p1 → a1 | ... | pn → an).e
        and the matching suspends on a free logical variable,
        then < f t, σ, α, Γ > → < u(-k), σ, (u(-k), σ*(f) t)::α, Γ >   and cs ← k+1
ASusp   if < t, σ, α, Γ > is in → normal form, cs = n and σ*(f) = u(i),
        then < f t, σ, α, Γ > → < u(-n), σ, (u(-n), u(i) t)::α, Γ >   and cs ← n+1
Fail    if < t, σ, α, ∅ > is in → normal form and the match of t against every pattern fails,
        then < f t, σ, α, Γ > → < failwith(Pattern), σ, α, Γ >
UnifT   if < t, σ, α, ∅ > and < t', σ, α, ∅ > are in → normal form and unifσ(t, t') = σ',
        let L = ∅ if σ' = σ or σ'(u(i)) = u(j) for all u(i) ∈ dom(σ')\dom(σ),
        and L = queue(σ',α)(u(i)) in the other cases;
        then < unif t t', σ, α, Γ > → < void, σ', α\L, L ∪ Γ >
UnifF   if < t, σ, α, ∅ > and < t', σ, α, ∅ > are in → normal form and unifσ(t, t') = fail,
        then < unif t t', σ, α, Γ > → < failwith(Unif), σ, α, Γ >
USusp   if < t, σ, α, ∅ > and < t', σ, α, ∅ > are in → normal form, unifσ(t, t') = Susp(u(i)) and cs = n,
        then < unif t t', σ, α, Γ > → < u(-n), σ, (u(-n), unif t t')::α, Γ >   and cs ← n+1
Aw      if u(i) ∈ dom(Γ), Γ(u(i)) = t, < t, σ, α, ∅ > → < t', σ', α', ∅ >
        and < t', σ', α', ∅ > is not in normal form,
        then < t0, σ, α, Γ > → < t0, σ', α', Γ[u(i) ← t'] >
AwUpd   if u(i) ∈ dom(Γ), Γ(u(i)) = t, < t, σ, α, ∅ > → < t', σ', α', Γ'' >,
        < t', σ', α', ∅ > is in normal form and Γ' = queue(σ',α')(u(i)),
        then < t0, σ, α, Γ > → < t0, (u(i), t')::σ', α'\Γ', Γ'' ∪ Γ' ∪ Γ\{(u(i), t)} >
AwFail  if u(i) ∈ dom(Γ), Γ(u(i)) = t and < t, σ, α, ∅ > → < failwith(s), σ, α, ∅ >,
        then < t0, σ, α, Γ > → < failwith(s), σ, α, Γ >

C Demonstration of theorem 3

Let us give preliminary results.

Lemma 3 If < t, σ, α, Γ > → < t', σ', α', Γ' > by application of a suspension rule, then < t, σ, α, Γ > ≡ < t', σ', α', Γ' >.

Proposition 3 If < t1, σ1, α1, Γ1 > → < t'1, σ'1, α'1, Γ'1 > by application of a rule distinct from a suspension rule, and if < t1, σ1, α1, Γ1 > ≡ < t2, σ2, α2, Γ2 >, then there exists < t'2, σ'2, α'2, Γ'2 > such that < t2, σ2, α2, Γ2 > → < t'2, σ'2, α'2, Γ'2 > and < t'1, σ'1, α'1, Γ'1 > ≡ < t'2, σ'2, α'2, Γ'2 >.

Proof: We carefully discuss one case; the others are similar.
Let < t1, σ1, α1, Γ1 > be reduced by β applied on a subterm of t1, and let us note that subterm (fun p1 → a1 | ... | pn → an).e v. By the hypothesis of ≡ we have (σ2 ∘ α2 ∘ Γ2)*(t2) = t1, thus the corresponding subterm of t2 is of one of the following forms: u(i); u(i) u(j); (fun p1 → a1 | ... | pn → an).e w. We examine the first two forms:

(1) u(i). First, as σ2 binds variables to values, we have σ2*(u(i)) = u(j) and u(j) ∉ dom(σ2). The ≡ hypothesis ensures that u(j) ∉ dom(α2), as in that case the application would be suspended when the rule β applies on t1. Thus we have σ2*(Γ2(u(j))) = (fun p1 → a1 | ... | pn → an).e v. The ≡ hypothesis ensures that the same pattern matches in both reductions, and then the application of Aw with the rule β on that term clearly leads to an equivalent four-tuple.

(2) u(i) u(j). The fact that the bindings in α2 and Γ2 bind logical variables to non-value terms ensures that σ2*(u(i)) = (fun p1 → a1 | ... | pn → an).e and σ2*(u(j)) = v; then β applies on u(i) u(j) and leads to an equivalent four-tuple. ◇

We now have the result of strong confluence of → up to ≡:

Theorem 5 For all < t, σ, α, Γ > such that

< t, σ, α, Γ > → < t1, σ1, α1, Γ1 >
< t, σ, α, Γ > → < t2, σ2, α2, Γ2 >

there exist < t'1, σ'1, α'1, Γ'1 > and < t'2, σ'2, α'2, Γ'2 > such that

< t1, σ1, α1, Γ1 > → < t'1, σ'1, α'1, Γ'1 >
< t2, σ2, α2, Γ2 > → < t'2, σ'2, α'2, Γ'2 >
< t'1, σ'1, α'1, Γ'1 > ≡ < t'2, σ'2, α'2, Γ'2 >

Proof: it is illustrated in figure 3. For the cases where at least one reduction uses a suspension rule: if both r1 and r2 use suspension rules, then lemma 3 is enough to conclude; if only one ri uses a suspension rule, then we conclude with proposition 3 and lemma 3. ◇

Figure 3: Strong confluence (the three diagrams cover the cases of two suspensions, one suspension, and no suspension)

Proof of the theorem: We show that the diagram of figure 4 holds, using the theorem above and successive inductions on the lengths of d1 and d2. ◇

Figure 4: Church-Rosser property

Remark that the limitation to a strict calculus is necessary. If we permit reducing an application without reducing the argument, then, as some unification may occur in that argument, different normal forms are possible. Example:

< (fun (x,y) → unif x True).[] (u(1), unif u(1) False), ∅, ∅, ∅ >

has two normal forms:

< void, {(u(1), True)}, ∅, ∅ >

and

< failwith(Unif), {(u(1), False)}, ∅, ∅ >.
References
[AbadiCaCuLe 90] M. Abadi, L. Cardelli, P.-L. Curien, J.-J. Levy, "Explicit Substitutions", Proc. Symp. POPL, 1990.
[Ait Kaci 89] H. Ait-Kaci, R. Nasr, "Integrating Logic and Functional Programming", Lisp and Symbolic Computation, 2, 51-89, 1989.
[DeGrootLindstrom 86] D. DeGroot, G. Lindstrom (eds), "Logic Programming - Functions, Relations and Equations", Prentice-Hall, New Jersey, 1986.
[Dijkstra 76] E. W. Dijkstra, "A Discipline of Programming", Prentice-Hall, New Jersey, 1976.
[HardinLevy 90] T. Hardin, J.-J. Levy, "A Confluent Calculus of Substitutions", third symposium (IZU) on L.A.
[Huet 76] G. Huet, "Resolution d'equations dans les langages d'ordre 1, 2, ..., omega", These d'etat, Universite de Paris 7, 1976.
[Huet 88] G. Huet, "Experiments with GHC prototypes", May 1988, unpublished.
[Laville 88] A. Laville, "Implementation of Lazy Pattern Matching Algorithms", ESOP'88, LNCS 300.
[Leroy 90] X. Leroy, "The ZINC experiment: an economical implementation of the ML language", INRIA Technical Report 117, 1990.
[LeroyWeis 91] X. Leroy, P. Weis, "Polymorphic type inference and assignment", Principles of Programming Languages, 1991.
[Poirriez 91] V. Poirriez, "Integration de fonctionnalites logiques dans un langage fonctionnel fortement type: MLOG une extension de ML", These, Universite Paris 7, 1991.
[Poirriez 92a] V. Poirriez, "FLUO: an implementation of MLOG", Fifth Nordic Workshop on Programming Languages, Tampere, 1992.
[SatoSakurai 86] M. Sato, T. Sakurai, "QUTE: a Functional Language Based on Unification", in [DeGrootLindstrom 86], pp. 131-155.
[PuelSuarez 90] A. Suarez, L. Puel, "Compiling pattern matching by term decomposition", LFP'90.
[Ueda 86] K. Ueda, "Guarded Horn Clauses", Ph.D. Thesis, Information Engineering Course, University of Tokyo, 1986.
A New Perspective on
Integrating Functional and Logic Languages
John Darlington
Yi-ke Guo
Helen Pull
Department of Computing
Imperial College, University of London
180 Queen's Gate London SW7 2BZ U.K.
E-mail: jd.yg.hmp@doc.ic.ac.uk
February 1992
Abstract
Traditionally the integration of functional and
logic languages is performed by attempting to integrate their semantic logics in some way. Many
languages have been developed by taking this approach, but none manages to exploit fully the programming features of both functional and logic languages and provide a smooth integration of the two
paradigms. We propose that improved integrated
systems can be constructed by taking a broader view
of the underlying semantics of logic programming.
A novel integrated language paradigm, Definitional
Constraint Programming (DCP), is proposed. DCP
generalises constraint logic programming by admitting user-defined functions via a purely functional
subsystem and enhances it with the power to solve
constraints over functional programs. This constraint
approach to integration results in a homogeneous unified system in which functional and logic programming features are combined naturally.
1 Introduction

During the past ten years the integration of functional and logic programming languages has attracted much research. An extensive survey and classification of their results can be found in [GLDD90]. Traditionally this integration is performed by attempting to integrate the respective semantic logics of functional and logic languages in some way, resulting in a "super logic language". The conventional understanding is that a logic program defines a logical theory and computation is attempting to prove that a query is a logical consequence of this theory. Taking this view, integration is regarded as enhancing the original logic to cope with functional programming features and results in a new logic programming system. In section 2 we survey the main results of this approach. It seems to us that this approach fails to deliver all the features of both functional and logic programming. The main source of inadequacy appears to stem from the respective "intended semantics" assumed for logic and functional languages. It is this intended semantics which we question, motivating our search for a new way of approaching the problem of integrating functional and logic languages.

We show in later sections that if we regard functional programming as defining a higher-order value space, we can extend the conventional constraint logic programming (CLP) framework by using a functional programming language to define the domain over which relations are defined. Thus we combine functional programming with a general CLP framework rather than with the conventional Prolog-like system. We call the resulting language paradigm Definitional Constraint Programming (DCP). We claim that DCP provides a uniform and elegant integration of functional, constraint and logic programming, while preserving faithfully the essence of each of these language paradigms.

In section 3, constraint systems and constraint programming are investigated at a very general level. A constraint logic programming model is then presented in section 4 as a particular constraint programming paradigm. Section 5 presents constraint functional programming (CFP) as a framework which superimposes a solving capability on the functional programming paradigm. The definitional constraint programming paradigm is developed in section 6. We discuss future work in section 7 and make some concluding comments in section 8.

2 Background and Motivation
From the traditional view of logic programming, integrating functional and logic languages is viewed as enhancing
the original logic to cope with functional programming features. Most approaches take first-order equational logic as
the semantic logic of functional languages and combine it
with Horn clause logic. A comprehensive presentation of
the theory of Horn clause logic with equality may be found
in [GM87] and [Yuk88]. This shows that for every theory
in Horn clause logic with equality, its initial model (called
the least Herbrand E-model in [Yuk88], and the least Herbrand model in [Sny90]) always exists. Crucially, the initial
model is the intended model of a logic programming system,
since, according to the Herbrand theorem, the model is complete with respect to solving a query. For a Horn clause with
equality program Γ and a query ∃x1, ..., xn. A1, ..., An, where Ai is an atom or equation, a computational model must verify Γ ⊨ ∃x1, ..., xn. A1, ..., An by computing an answer substitution θ such that Γ ⊨ ∀(θA1 ∧ ... ∧ θAn). Such models
integrate SLD-resolution with some form of equational deduction such as paramodulation. A complete computational
model was proposed recently by Snyder et. al. [Sny90] as a
goal directed inference system. Systems which aim to sup-
port the full power of Horn clause logic with equality include
Eqlog [GM84], which exploits fully the order-sorted variation of the logic, SLOG [Fri85], in which a completion procedure is used as the computational model, and Yukawa's system [Yuk88], which uses an explicit axiomatization of equality.
The computational difficulties of constructing a practical programming language based on the full Horn clause logic with equality lead us to conclude that this approach is not appropriate. Alternative languages overcome these problems by imposing syntactic and semantic restrictions on the paradigm. They all aim either to restrict the use of, or to weaken, defined equality. An example of the first approach is Jaffar and Lassez's Logic Programming Scheme [JL86], in which the equality part of a program is defined separately from the predicate definitions. A program uses a first-order equational sublanguage to define abstract data types over which a definite clause subprogram is imposed. Operational models are based on SLD-resolution together with an E-unification procedure which solves equations over the equality defined by the equational subprogram.
Another way to restrict the computational explosiveness of general equational deduction is to use equational clauses as directed rewrite rules. A full discussion may be found in [DO88]. Narrowing [Hul80] (resp. conditional narrowing [DO88]) is employed to solve equations in a rewriting system (resp. conditional rewriting system). Many languages have been developed along this line, e.g. RITE [DP86b] and K-Leaf [EGP86]. They represent enhanced Prolog systems in which a "rewrite" relation is defined over the Herbrand space. Syntactic restrictions guarantee the confluence of this rewrite relation, so that equational logic can mimic first-order functional programming. In the case of K-Leaf, the Herbrand space is enhanced to include partial terms, and thus the lazy evaluation of functional languages may be modelled.
These endeavours have led to the development of several
very successful languages and have significantly enriched the
state of the art of declarative language design, semantics
and implementation. However, we believe that the benefits of this combination are arguable and question how
much is gained by enhancing a first-order logic by weakening a higher-order logic. Moreover, even with only first-order equational logic added, the inefficiencies of equational
deduction mean that the resulting system is far from practical. This approach to language integration results in a
sophisticated theorem prover, which we find unsatisfactory.
We suggest, therefore, some fundamental rethinking on the
purpose of integrating functional and logic languages.
In fact, the conventional assumption that a logic program
defines a logical theory has been criticized in many circumstances because: "there is no reference to the models that
the theory is a linguistic device for" [Mes89]. A logical theory may have many models; however, when we are programming we always have a particular intended model in mind. This alternative school of thought regards a program as a linguistic description of the intended model; but the model itself is primary. For a Horn clause program, its least Herbrand model is taken as the intended model. Therefore, if a program is regarded as a linguistic description of this model, the canonical denotation of a program is not a first-order theory but a set of relations over the Herbrand space. This view of logic programming has also been taken by researchers wishing to extend Prolog-like systems. Hagiya and Sakurai [MT84] present a formal system for logic programming based on the theory of iterative inductive definitions. A similar approach is taken by Hallnas and Schroeder-Heister to develop the framework of General Horn Clause Programming [AEHK89]. Paulson and Smith proposed an integrated system in which a logic subprogram is regarded as an inductive definition of relations [PS89].
This definitional view of logic programming suggests the
flexibility to define Horn clauses over arbitrary domains.
Relations become constraints over the domain of discourse,
which coincides with the general framework of Constraint
Logic Programming [SmoS9]. In this paper, we take this
idea one step further by using a functional programming
language to define the domain over which relations are defined. A novel definitional constraint programming system
is induced in which functions and relations are used together
to define constraint systems.
3 Constraint Programming
In this section, we present a framework for constraint programming which has its origins in the seminal work of Steele [Ste80]. From the mathematical point of view, constraints
are associated with well-studied domains in which some privileged predicates, such as equality and various forms of inequalities, are available. Relations formed by applying these
predicates are regarded as constraints. A constraint may be
regarded as a statement of properties of objects; its denotation is the set of objects which satisfy these properties.
Therefore, constraints provide a succinct finite representation of possibly infinite sets of objects. We present a simple
definition of constraint systems to capture these characteristics.
3.1 Constraint System
Definition 3.1 (Constraint System) A constraint system is a tuple < A, V, Φ, I > where

• A is a set of values, called the domain of the system.

• V is a set of variables. We define an A-valuation as a mapping V → A, and ValA as the set of all A-valuations.

• Φ is a set of constraints. A computable function V is used to assign to every constraint φ a finite set V(φ) of variables, which are the variables constrained by φ.

• I is an interpretation which consists of a solution mapping []^I, mapping every basic constraint φ to [φ]^I, a set of A-valuations called the solutions of φ; []^I is solution closed in the sense that the solutions of a constraint depend only on its constrained variables: if α ∈ [φ]^I and β coincides with α on V(φ), then β ∈ [φ]^I.
We now present some examples of constraint systems. The
most familiar constraint system in the context of programming languages is perhaps the Herbrand system, which is a
constraint system over finite labelled trees.
Example 3.1.1 (Herbrand System) Let Σ be a ranked signature of function symbols and V be a set of constant symbols treated as variables. T(Σ) is the ground term algebra consisting of the smallest set of inductively generated Σ-terms. A Herbrand system is a constraint system < T(Σ), V, Φ, I > where Φ consists of all term equations of the form t1 = t2 for t1, t2 ∈ T(Σ, V), where T(Σ, V) is the free term algebra, and [t1 = t2] = {α | αt1 ≡ αt2}, where ≡ denotes the identity of two terms.
Example 3.1.2 (Herbrand E-System) Let Σ, V be as above and E an equational theory over T(Σ, V). Then T(Σ)/E denotes the quotient term algebra consisting of the finest E-congruence over T(Σ) generated by E. The constraint system < T(Σ)/E, V, Φ, I > is called the Herbrand E-system, where Φ consists of all term equations of the form t1 = t2 for t1, t2 ∈ T(Σ, V), and [t1 = t2] = {[α]E | [αt1]E = [αt2]E}, where [t]E stands for the equivalence class of t in T(Σ) and [α]E : V → T(Σ)/E stands for the corresponding equivalence class of ground term substitutions α : V → T(Σ).
Constraint systems on various term structures can be regarded as cases of the following general definition of an algebraic constraint system.

Example 3.1.3 (Algebraic Constraint System) Let A be an algebra equipped with a set of operators Σ and a set of predicates Π. Then the algebra is associated with a constraint system SA : < |A|, V, Φ, I >, where |A| is the carrier of the algebra and every constraint in Φ is of the form p(e1, ..., en), where every ei is an A-expression and p ∈ Π is an n-ary predicate of the algebra. [p(e1, ..., en)]^I = {α | A, α ⊨ p(e1, ..., en)}. Examples of algebraic constraints are constraints over term algebras, constraints over arithmetic expressions and constraint systems in boolean algebra.
Following the idea of associating constraint systems with
algebras, predicate logic can be viewed from the constraint
system perspective.
Let C : < A, V, Φ, I > be a constraint system closed under conjunction, renaming and existential quantification. Given a signature R as a family of user-defined predicates indexed by their arities, a constraint logic program Γ over C is a set of constrained defining rules of the form

P ← c1, ..., ck, B1, ..., Bm

where P is an R-atom of the form p(x1, ..., xn), p ∈ Rn is an n-ary user-defined predicate, and the ci are constraints of C. For a constraint system C and a constraint logic program Γ over C, a sequence of interpretations In can be built which represents a chain in the complete lattice of interpretations of Γ; the limit of the chain is the minimal model of Γ over C.
In this least model semantics of a CLP program the underlying constraint system is extended to a new constraint system
via user-defined constraints. We call this a relational extension of a constraint system.
Definition 4.1 (Relational Extension) Let Γ be a constraint logic program and R be the signature of user-defined predicates in Γ. Γ constructs a constraint system R(C), a relational extension of the underlying constraint system C : < A, V, Φ, I >, obtained by extending the solution mapping with, for every p ∈ Rn,

[p(x1, ..., xn)] = {α | (α(x1), ..., α(xn)) ∈ p^IΓ}

where IΓ is the minimal model of Γ over C.
A solver for a relationally extended constraint system can be constructed by integrating SLD-resolution with the constraint solver of the underlying constraint system, to give constrained SLD-resolution. Constrained SLD-resolution rewrites a goal of the form G = Gc ∪ Gn, where Gn is a finite subset of atoms and Gc a finite subset of constraints:

Semantic Resolution:
  G : ∃X < Gn ∪ {p(s1, ..., sn)} ∪ Gc >
  → G' : ∃X ∪ Y < Gn ∪ {B1, ..., Bm} ∪ Gc ∪ {c1, ..., ck} ∪ {x1 = s1, ..., xn = sn} >
  where ∀Y. p(x1, ..., xn) :- c1, ..., ck, B1, ..., Bm is a variant of a clause in a program Γ.

Constraint Simplification:
  G : ∃X < Gn ∪ Gc > → G' : ∃X < Gn ∪ G'c >
  if ∃X.Gc →c ∃X.G'c, where →c is the simplification derivation realised by the solver of the underlying system.

Finite Failure:
  G : ∃X < Gn ∪ Gc > → false
  if ∃X.Gc →c false.

In this model, semantic resolution generates a new set of constraints whenever a particular program rule is applied. The unification component of SLD-resolution is replaced by solving a set of constraints via the underlying solver. Whenever it can be established that the set of constraints is unsolvable, finite failure results.

For example, the following CLP program [Col87]:

  InCap ([], 0)
  InCap (i : x, c) :- InCap (x, 1.1*c - i)

can be used to compute a series of instalments which will repay capital borrowed at a 10% interest rate. The first rule states that there is no need to pay instalments to repay zero capital. The second rule states that the sequence of N+1 instalments needed to repay capital c consists of an instalment i followed by the sequence of N instalments which repay the capital increased by 10% interest but reduced by the instalment i. When we use the program to compute the value of m required to repay $1000 with the sequence of instalments [m, 2m, 3m], we compute the solved form of the goal constraint InCap ([m, 2m, 3m], 1000). One execution sequence is illustrated below, in which →R denotes a semantic resolution step and →C a constraint simplification step:

  InCap ([m, 2m, 3m], 1000)
  →R InCap (x, 1.1c - i), x=[2m, 3m], i=m, c=1000
  →R InCap (x', 1.1c'-i'), x=i':x', c'=1.1c-i, x=[2m, 3m], i=m, c=1000
  →C InCap (x', 1.1c'-i'), i'=2m, x'=[3m], i=m, c'=1100-m
  →R InCap (x'', 1.1c''-i''), x'=i'':x'', 1.1c'-i'=c'', i'=2m, x'=[3m], i=m, c'=1100-m
  →C InCap (x'', 1.1c''-i''), x''=[], i''=3m, i'=2m, i=m, x'=[3m], c'=1100-m, c''=1210-3.1m
  →R x''=[], 1.1c''-i''=0, i''=3m, i'=2m, i=m, x'=[3m], c'=1100-m, c''=1210-3.1m
  →C 1.1(1210-3.1m)=3m
  →C m = 207+413/641

Constrained SLD-resolution is a sound solver for a relationally extended constraint system and, as proved in [Smo89], it is also well-founded. Therefore, any consistent goal can be simplified to a set of solved forms. Given a goal G0 and its solved forms G1s, G2s, ..., we have G0 ⊢ ∨i≥1 Gis. Moreover, if the underlying constraint system is compact, then G0 ⊢ ∨i=1..n Gis for some n, i.e. the model has the stronger completeness of section 3.2.
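As a quick arithmetic cross-check of the last two simplification steps (a plain OCaml scrap of ours, not part of the CLP example), 1.1*(1210 - 3.1m) = 3m gives 1331 = 6.41m, i.e. m = 1331/6.41 = 207 + 413/641, roughly 207.64:

  (* Check the final ->C steps: 1.1*(1210 - 3.1*m) = 3*m
     <=> 1331 = 6.41*m <=> m = 1331/6.41 = 133100/641. *)
  let () =
    let m = 1331.0 /. 6.41 in
    Printf.printf "m = %.4f (= 207 + 413/641 = %.4f)\n"
      m (207.0 +. 413.0 /. 641.0)
    (* both print 207.6443 *)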
5 Constraint Functional Programming
Constraint functional programming (CFP) is characterized
as functional programming, enhanced with the capability
to solve constraints over the value space defined by a functional program. An intuitive construction of this language
paradigm is presented below.
5.1 Informal CFP
A data type D in a functional program Γ can be associated with a constraint system CD. CD may contain privileged predicates over D. A CFP system may be formed to extend the constraint solver so that any D-valued expression, which may involve user-defined functions, can be admitted in constraints. A D-valued expression must be evaluated to its normal form with respect to Γ to enable the constraint solver to handle that value.
We give a simple example of this paradigm. We assume a
constraint system over lists in which atomic constraints are
equations asserting identity over finite lists. A unification
algorithm is used as the basic solver for the system. Given
a functional program defining the function ++ which concatenates two lists and the function length which computes
the length of a list:
  data [alpha] = [] | alpha : [alpha]
  functions
    ++     :: [alpha] × [alpha] → [alpha]
    length :: [alpha] → Num

    []    ++ z = z
    (x:y) ++ z = x : (y ++ z)

    length []    = 0
    length (x:y) = 1 + length y

An extension to the basic solver may be used to solve the constraint:

  l1 ++ l2 = [a1, a2, ..., an], length l1 = 10

to compute the first 10 elements of the list [a1, a2, ..., an]. The solver must apply the function definitions of ++ and length and must guess appropriate instances of the constrained variables. We will show that this procedure itself may be modelled by some new constraints generated during rule application.
Solving constraints over a functional program significantly
enhances the expressive power of functional programs to incorporate logic programming features. This idea was central to the absolute set abstraction construct which was
originally proposed in [DAP86,DG89] as a means to invoke
constraint solving and collect solutions. Using the absolute
set abstraction notation, the above constraint may be represented as the set-valued expression:
  { l1 | l1 ++ l2 = [a1, a2, ..., an], length l1 = 10 }
Reddy's proposal of "Functional Logic Programming" languages [Red86] also exploits this solving capability in functional programs. However, his description of functional logic
programming as functional syntax with logic operational semantics fails to capture the essential semantic characteristics
of the paradigm. The constraint programming approach, as
we will show in the following, presents a concise semantical
and operational model for the paradigm.
We assume a functional language that is strongly typed, employs a polymorphic type system and algebraic data types,
and supports higher-order functions and lazy evaluation.
Examples of such languages are Miranda [Tur85] and Haskell
[Com90]. To investigate constraint solving we put aside the
static features of a functional language such as its type
system, and concentrate on its dynamic semantics. We use
a kernel functional language with recursion equation syntax
for defining functions. We assume variables ranged over by
x and y, a special set of functional variables (identifiers)
ranged over by f and g, constructors ranged over by d, constants ranged over by a and b, patterns ranged over by t
and s, and expressions ranged over by e. A pattern is assumed to be linear, i.e. having no repeated variables. Data
terms comprise only constants, constructors and first-order
variables. The following syntax defines this tiny functional
language:
  Program ::= Decl in Exp
  Decl    ::= f t = e | Decl; Decl
  Exp     ::= x | a | e1 e2 | e1 op e2 | if e1 then e2 else e3
  Pattern ::= x | a | d t1, ..., tn
The language can be regarded as sugared λ-calculus and a program as a λ-expression. The program shown above is
an instance of this formalism in which the data statement
introduces a list structure with a nullary constructor [] and
a binary constructor :, and functions length and ++ which
are defined by recursion equations.
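For readers who prefer code, the kernel syntax above transcribes directly into an algebraic datatype; the following OCaml rendering is ours (the constructor names are invented), not part of the paper:

  (* Abstract syntax of the tiny kernel functional language. *)
  type pattern =
    | PVar of string                      (* x *)
    | PConst of string                    (* a *)
    | PData of string * pattern list      (* d t1 ... tn *)

  type exp =
    | Var of string                       (* x *)
    | Const of string                     (* a *)
    | App of exp * exp                    (* e1 e2 *)
    | Op of string * exp * exp            (* e1 op e2 *)
    | If of exp * exp * exp               (* if e1 then e2 else e3 *)

  type decl =
    | Def of string * pattern * exp       (* f t = e *)
    | Seq of decl * decl                  (* Decl; Decl *)

  type program = Prog of decl * exp       (* Decl in Exp *)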
The semantics of a functional program is given in the standard way [Sco89]. The semantic domain D of the program is an algebraic CPO which is the minimal solution of the domain equation:

  D = B⊥ + C(D) + (D → D)

D contains the domain B⊥ of basic types (real numbers, boolean values etc., lifted by ⊥, which denotes undefinedness), the domain C(D) of constructed data structures, which consists of partial terms ordered with respect to the monotonicity of constructors, and the domain D → D of all continuous functions. A subdomain A of C(D), A = B⊥ + C(A), is distinguished as the domain of data terms in the language (which is defined by the eq-type of ML [Mil84]). We use T to denote all complete objects of A.
For a functional program, the semantic function P[[]] computes the value of the program in terms of the function D[[]] : Decl → (Var → D) → (Var → D), which maps function definitions to an environment associating each function name with its denotation. The function E[[]] : Exp → (Var → D) → D maps an expression together with an environment η : Var → D (a D-valuation) to an element of D.
5.3 Evaluating Nonground Expressions
Conventional functional programming involves evaluating a
ground expression to its unique normal form by taking a
program as a rewriting system. To superimpose a solving
capability on the functional programming paradigm, we consider first the extension of functional programming to handle
non-ground expressions. The meaning of a non-ground expression is a set of values corresponding to every correctly
typed instantiation of its free variables. Narrowing has been
proposed as the operational model for computing all possible values of a non-ground expression [Red84]. In the theorem proving context, enumerating narrowing derivations
provides a complete E-unification procedure for equational
theories defined by convergent rewriting systems. This use
of narrowing must be refined for the functional programming
context. Due to the lazyness of functional languages, only
those narrowing derivations whose corresponding reduction
derivations are lazy should be enumerated. This notion of
lazy narrowing is mentioned by Reddy in [Red84]. A lazy
narrowing procedure, pattern-driven narrowing, is proposed by Darlington and Guo in [DG90] for evaluating absolute set abstractions. A similar procedure was indepen-
dently developed by You for constructor based equational
programming systems [You88]. Here we present a lazy narrowing model following the constraint solving approach. The
model is central to the CFP paradigm.
Consider reducing a non-ground expression of the form f e by a defining rule f t = e'. The environment η should be enhanced to satisfy E[[e]]η = E[[t]]η, i.e. η is a solution of the rewriting constraint e = t. This equality is the so-called semantic equality, since it is determined by the identity of the denotations of its components. It is not even semidecidable, since it involves verifying the equivalence of partial values. However, since in our problem t is always a linear pattern, a semidecidable solver exists.
Definition 5.1 The solved form of a rewriting constraint e = t is of the form {x1 = t1, ..., xn = tn, y1 = e1, ..., ym = em}, where the xi ∈ V(e) are output variables and the yi ∈ V(t) are input variables. The equation set Θ : {x1 = t1, ..., xn = tn} is an output substitution equation and Θ' : {y1 = e1, ..., ym = em} is an input substitution equation. The substitutions θ and σ corresponding to Θ and Θ' are called output substitutions and input substitutions respectively.

The constraint solver presented below simplifies a rewriting constraint to its solved form. Solving a rewriting constraint realises the bidirectional parameter-passing mechanism for narrowing an outermost function application. The algorithm is called pattern-fitting [DG89].

  Substitution:   {x = r} ∪ G ⇒ {x = r} ∪ ρG   where ρ = {x ↦ r}
  Decomposition:  {d e = d t} ∪ G ⇒ {e = t} ∪ G
  Removing:       {a = a} ∪ G ⇒ G
  Failure:        {d1 e1 = d2 e2} ∪ G ⇒ false   if d1 ≠ d2
  Constrained Narrowing:  {f e = d s} ∪ G ⇒ {r = d s, e = t} ∪ G   where f t = r ∈ Γ

Lemma 5.1.1 The pattern-fitting algorithm is a complete solver for simplifying a rewriting constraint to its solved form.

For any rewriting constraint e = t, a solved form corresponds to a pattern-driven narrowing step f e ⇝θ σe' with respect to a defining rule f t = e', where θ is the output substitution and σ is the input substitution associated with the solved form. A pattern-driven narrowing derivation is defined in the standard way by composing the output substitutions of each of its component steps. Note that a one-step pattern-driven narrowing derivation contains many narrowing steps, due to the need to solve rewriting constraints. Each narrowing step is demand driven and affects an outermost function application. Therefore, we have the following theorem:

Theorem 5.1.1 For any expression e and term t, if e ⇝θ t, then the corresponding reduction derivation θe →* t is always a lazy derivation. Such a reduction derivation is called a standard reduction in [Hue86]. Enumerating pattern-driven derivations is optimal and complete in the sense that any other derivation is subsumed by a pattern-driven derivation.

We conclude that pattern-driven narrowing provides a realisation of lazy narrowing. Lazy narrowing extends functional programming with the capability to find for which values of the variables in a non-ground expression the expression evaluates to a given value. Thus, it introduces the essential solving feature to functional languages. However, on its own it is not enough, because "built in" predicates may exist in functional languages, for example equality and various boolean-valued primitive functions, for which a dedicated constraint solver is required. If we integrate lazy narrowing with a constraint solver over data terms, the solver is then extended to allow general expressions containing user-defined functions. Therefore, querying a functional program becomes possible. This enhanced functional programming framework may be formalized as the paradigm of constraint functional programming.

5.4 Formalizing CFP

We assume a constraint system CT : (T, V, ΦC, IC) over first-order values, where V is the set of variables over first-order types and ΦC are constraints consisting of privileged predicates R. Computing the truth value of a ground relation of data terms with respect to R is decidable. Thus, a predicate w in ΦC can always correspond to a boolean-valued function fw in the language. A functional program may be applied to CT. This introduces a new syntactic category in the functional program for constraints:

  Constraint ::= w(e1, ..., en) | Constraint, Constraint

where w(x1, ..., xn) ∈ ΦC. We use c to range over constraints. Constraints in CT are now enriched to admit general expressions defined by the functional program. A constraint system is admissible if it is closed under negation. In the following, we assume the underlying constraint system is admissible. A CFP program is an extension of a functional program with the syntax:

  Program ::= Decl in e | Decl in c

The semantic function C[[]] : Constraint → P(Env) maps constraints to their solution sets:

  C[[c1, c2]] = C[[c1]] ∩ C[[c2]]
  C[[w(e1, ..., en)]] = {η restricted to ∪V(ei) | T ⊨ w(E[[e1]]η, ..., E[[en]]η)}

This semantics reveals constraint solving over a functional language as "computing the environments" in which expressions, when evaluated, satisfy constraints.

The constraint solving mechanism is formed by integrating the solver of CT with lazy narrowing, thus enhancing CT to handle constraints in the more general universe constructed by a functional program. A scheme for such an integration is presented below. We use the pair (G, C) to represent a goal G ∪ C in which C contains rewriting constraints and G contains constraints from the underlying constraint system.
  Constrained Narrowing:  (G ∪ {w(..., f e, ...)}, C) ⇒ (G ∪ {w(..., r, ...)}, C ∪ {e = t})
                          where f t = r ∈ Γ
  Simplification 1:  (G, C) ⇒ (G', C)   if G ⇒* G', where ⇒* is a simplification derivation
                     computed by the underlying solver
  Simplification 2:  (G, C) ⇒ (G, C')   if C ⇒* C', where ⇒* stands for a simplification
                     derivation computed by the solver of rewriting constraints
  Failure:  (G, C) ⇒ false   if G ⇒* false or C ⇒* false
  Substitution 1:  (G, C ∪ {x = e}) ⇒ (ρG, C ∪ {x = e})
                   where ρ = {x ↦ e}, x ∈ V(G) and C ∪ {x = e} is in solved form
  Substitution 2:  (G ∪ {x = t}, C) ⇒ (G, ρC ∪ {x = t})
                   where ρ = {x ↦ t} and G ∪ {x = t} is in solved form
  Positive Accumulating:  (G, C ∪ {fw e = true}) ⇒ (G ∪ {w(e)}, C)   if w(x) ∈ ΦC
  Negative Accumulating:  (G, C ∪ {fw e = false}) ⇒ (G ∪ {¬w(e)}, C)
                          if w(x) ∈ ΦC and the constraint system is admissible

An initial goal takes the form (G, {}). Its solved form is of the form (Gn, Cn), where Gn is in solved form with respect to the underlying solver, V(Gn) ⊄ V(Cn), and Cn is a set of solved-form rewriting constraints.

The soundness of lazy narrowing guarantees that the enhanced solver is sound. However, it is not in general complete, because a functional program may define some boolean-valued functions which have no corresponding constraints in CT. This problem is similar to that of solving "hard constraints" in general constraint programming. Some ways exist to resolve this problem, such as the "waiting-resuming" approach, in which the solving of a hard constraint is delayed until its variables are sufficiently instantiated [JL87], or defining special simplification rules for such constraints. However, for a program in which all boolean-valued functions are consistent with the underlying constraint system, the scheme provides a complete enhanced solver.

The scheme provides a generic model to enhance a constraint system to solve constraints in functional languages. In [Pulga], Pull uses unification on data terms as the underlying solver and combines it with lazy narrowing to solve equational constraints in lazy functional languages. In [JCGMRA91], a more general constraint system over data terms is adopted, in which disunification is also exploited to deal with negative equational constraints. This model can be regarded as an instantiation of the scheme obtained by providing unification and disunification as the "built-in" solvers.

CFP represents a constraint programming system of the "domain construction" approach of section 3.3. This means that constraints appear only as computational goals; it is not possible to define new constraints in the system. However, the framework significantly enhances the expressive power of both functional programs and the basic constraint system. Moreover, since a CFP program provides a constraint system in which defined functions behave as operators in some algebra, it is perfectly reasonable to define relations over the system following the philosophy of general constraint logic programming. Therefore, CFP is a "building block" for deriving a fully integrated Definitional Constraint Programming system in which both constraints and the domain of discourse are user-definable.

6 Definitional Constraint Programming

We are now in a position to present a unified definitional constraint programming (DCP) framework. A DCP program defines a constraint system by defining its domain of discourse and constraints over this domain. As discussed above, CFP and CLP exhibit, respectively, the power to define domains and the power to define constraints. Therefore we would expect the unification of these two paradigms to result in a full definitional constraint programming system.

We start by superimposing a functional program onto a privileged constraint system. As shown in the previous section, the functional program defining the functions ++ and length can be queried to compute the initial segment of a given list. A further abstraction is possible if we take this CFP-enriched constraint system as the underlying constraint system for a CLP language. Thus, CFP queries can be used to define relations as new constraints. For example we can define the relation front:
  front (n, l, l1) :- l1 ++ l2 = l, length l1 = n

to compute the initial segment with length n of an input list l. This systematic integration of CFP and CLP results in a definitional constraint programming system and can therefore be expressed by the formula DCP = CLP(CFP).
It is straightforward to construct the semantic model of a
DCP program. The semantics for its functional component
are traditional functional language semantics. The intended
model of the relational component is its least model. This
may be constructed by computing all ground atoms generated by the program using the "bottom up" iterative procedure presented in theorem 4.0.1 and taking the functionally
enhanced constraint system as the underlying constraint system. In terms of the semantic functions defined above the
denotation of a defined predicate p in a program r can be
computed by enumerating the inductive closure of r as follows:
  p^I0 = ∅
  p^In+1 = {(α(x1), ..., α(xn)) | α ∈ ∩i C[[ci]] ∩ ∩j [Bj]^In}

for each p(x1, ..., xn) :- c1, ..., ck, B1, ..., Bm ∈ Γ. [B]^I maps B to all solutions of B under the interpretation I for the predicates in B. That is:

  [p(e1, ..., en)]^I = {η | (E[[e1]]η, ..., E[[en]]η) ∈ p^I}
Compared with other functional logic systems, this general
notion of constraint satisfaction permits us, not only to define equational constraints over finite data terms, but also to
introduce more general domain specific constraints. Moreover, partial objects as introduced by lazy functional programming are admissible for constraint solving in the system
as approximations of complete objects. This gives uniform
support for laziness in a fully integrated functional logic programming system.
The computational model of the DCP paradigm is simply
the instantiation of the underlying constraint solver in constrained SLD-resolution to the CFP solver. Soundness and
completeness are a direct result of the properties of these
two components.
Clearly then, DCP represents a supersystem of both these
paradigms. Both the CLP InCap program and the CFP
query which computes the initial segment of a list are valid
DCP programs and queries. Moreover, the expressive power
of each of these individual paradigms is enhanced in the
DCP framework. We will demonstrate this with reference
to some programming examples.
The "built-in" solver manipulates only first-order objects.
In any correctly-typed DCP program, a function-typed variable will never become a constrained variable. Thus, higher-order functional programming features safely inherit their
intended use in functional computation without introducing
computability problems. The following examples illustrate
some of the attractive programming features of this rich language paradigm.
The quicksort algorithm is defined below as a relation
which uses difference lists (which appear as pairs of lists
(x, y)) to perform list concatenation in constant time. The
partitioning of the input list is specified naturally as a function, while the ordering function is passed as an argument
to the quicksort relation. Within the semantics of DCP,
such a functional parameter can be treated as a special constant in relation definitions. A primitive function apply is assumed which is responsible for the application of such function names to arguments.
functions
    partition : (alpha -> alpha -> boolean) × alpha × [alpha] -> ([alpha], [alpha])
relations
    quicksort : (alpha -> alpha -> boolean) × [alpha] × ([alpha], [alpha])

partition (f, n, m : l) = if f (n, m) then (m : l1, l2) else (l1, m : l2)
                          where (l1, l2) = partition (f, n, l)
partition (f, n, []) = ([], [])

quicksort (f, n : l, (x, y)) :-
    partition (f, n, l) = (l1, l2),
    quicksort (f, l1, (x, n : z)), quicksort (f, l2, (z, y))
quicksort (f, [], (x, x))
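For comparison, the same algorithm can be written in an ordinary lazy functional language; the following hypothetical Haskell sketch (not part of the paper) replaces the difference-list pair (x, y) by an explicit accumulator argument, which plays the same constant-time-concatenation role that the shared logic variable plays above.

    -- The ordering function is passed as a parameter, as in the DCP version;
    -- the accumulator stands in for the difference-list logic variable.
    partition :: (a -> a -> Bool) -> a -> [a] -> ([a], [a])
    partition _ _ []     = ([], [])
    partition f n (m:l)
      | f n m            = (m : l1, l2)
      | otherwise        = (l1, m : l2)
      where (l1, l2) = partition f n l

    quicksort :: (a -> a -> Bool) -> [a] -> [a] -> [a]
    quicksort _ []    acc = acc
    quicksort f (n:l) acc = quicksort f l1 (n : quicksort f l2 acc)
      where (l1, l2) = partition f n l

For example, quicksort (>) [3,1,2] [] evaluates to [1,2,3].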
The relation perms below shows an interesting and highly
declarative way of specifying the permutations problem in
terms of constraints over applications of the list concatenation function ++.
relations
    perms : [alpha] × [alpha]

perms (a : l, l1 ++ (a : l2)) :- perms (l, l1 ++ l2)
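A conventional functional language cannot state the ++ constraint directly; a hypothetical Haskell counterpart (shown only for contrast) has to enumerate the splits l1 ++ l2 generatively instead:

    -- Every permutation of a:l is obtained by inserting a into some split
    -- (l1, l2) of a permutation of l.
    perms :: [a] -> [[a]]
    perms []    = [[]]
    perms (a:l) = [ l1 ++ (a : l2) | p <- perms l, (l1, l2) <- splits p ]
      where splits xs = [ splitAt k xs | k <- [0 .. length xs] ]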
The final example shows how the recursive control constructs of higher-order functions may be used to solve problems in the relational component of a DCP language. We use a
reduce function over lists, together with the "back substitution" technique familiar in logic programming, to find the
minimal value in a list and propagate this value to all cells
of the list. This is shown via the relation propagatemin below, which uses the standard list reduce function to find the
minimum value, y, in the input list and construct a list, l1,
which is isomorphic to the input list, in which each element
is a logical variable x.
relations
    propagatemin : [Int] × [Int]

propagatemin (l, l1) :-
    reduce (f x, l, (MaxInt, nil)) = (y, l1), x = y
    where f z n (m, l2) = (min (n, m), z : l2)
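In a lazy functional language the same back-substitution effect can be obtained with a circular program (the classic "repmin" idiom); the sketch below is a hypothetical Haskell rendering in which laziness plays the role of the logical variable x.

    -- One pass computes the minimum while building the output list around a
    -- value that is only "filled in" once the fold has finished, much as x is
    -- bound to y above.
    propagateMin :: [Int] -> [Int]
    propagateMin l = l1
      where
        (y, l1)        = foldr step (maxBound, []) l   -- reduce over the list
        step n (m, l2) = (min n m, y : l2)             -- every cell becomes y

propagateMin [3,1,2] yields [1,1,1]; the single fold both computes the minimum and rebuilds the list around it.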
These examples show that as well as being a systematic and
uniform integration of constraint, logic and functional programming with a sound semantics, the DCP paradigm displays a significant enhancement of programming expressive
power over other integrated language systems. We believe
that this pleasing outcome is a direct result of our strenuous
effort to identify clearly the essential characteristics of the
component language paradigms and to preserve them faithfully in the DCP language construction. We have defined a
concrete DCP language, Falcon [GP91]. Many Falcon programming examples appear in [DGP91].
7    Future Work
A very promising area of future research is the use of DCP
as the foundation for studying declarative parallel programming. The idea is quite simple. If we keep strictly to the
functional computational model for the functional sublanguage of a DCP language, synchronization between functional computation and constraint solving over logic variables becomes possible. Within this concurrent DCP framework, both the logical and the functional sublanguages cooperate to construct objects. The logical component approximates objects by imposing constraints and the functional
component constructs objects explicitly. At each step of the
construction, the functional part asks for more information
and continues the construction if and when that information
is available. Otherwise, it suspends and waits until other
concurrently executing agents provide the required information.
This behaviour is an important generalization of the traditional local propagation model for constraint-based computation [Ste80]. The synchronization mechanism for functional computation obviously follows the data flow school, but the use of constraint computation to enhance incrementally the information of logical variables provides a very attractive general data flow model, i.e. bi-directional data flow. This idea originated from the data flow language Id Nouveau [NPA86] in which an array of logical variables is a
special structure for synchronising functional computation
and constraint solving. This feature is generalised by the
concurrent DCP model as the basic principle of programming. Concurrent DCP may be understood as a further development of the concurrent constraint programming framework proposed by Saraswat et al. [SR90] by exploiting the
elegant concurrent cooperation between functional and logic
computation.
Since computation in its functional sublanguage is deterministic, we would expect the efficiency of the system to be
much better than a logic programming system. Moreover,
since the functional component provides a powerful synchronization mechanism for deduction, with such a "control"
mechanism the overall efficiency of the paradigm is promising. This idea of exploiting deterministic computation in a
non-deterministic system by constraint propagation is also
central to the Andorra model [S.H90] which has been widely
accepted recently in the logic programming community. The
development of concurrent DCP has led to a very interesting convergence of research on language integration, constraint programming and declarative parallel programming
in [GF91].
8    Conclusion
This paper set out to provide an answer to the question of
how and why we should integrate functional and logic programming languages. We believe that this should be done
not only with the goal of building a more powerful programming system but also aiming at diminishing the drawbacks
of the individual language paradigms. An integrated system should not only inherit the features of its components
but also, and equally importantly, it should exhibit new distinguishing features as a result of their combination. We
have developed a methodology for integration which demonstrates how the essential relational and functional features
may be preserved, and have explored the new programming
features which arise. The main idea underpinning this work
comes from clarification of the intended semantics of logic
and functional languages which motivated the insight to use
constraints as the glue for their integration. This led us to
develop the new language paradigm of definitional constraint
programming. We believe that the declarative constraint
programming model is a promising language paradigm for
the design of future programming languages.
9    Acknowledgements
We are indebted first and foremost to Sophia Drossopoulou
and Ross Paterson, our two colleagues on the Phoenix
project at Imperial College, for many valuable discussions.
We also thank our other colleagues on the Phoenix project at Nijmegen University and at GMD Karlsruhe, particularly Maria Fereira for her cooperation and significant contribution to the recent work on concurrent DCP, and Hendrik Lock for his enlightening discussions on the philosophy of language integration. Many thanks are due to Dr. Hassan Ait-Kaci, Prof. J-L. Lassez, Dr. J. Jaffar and Dr. Meseguer for their helpful insights, and to all the people in the Advanced Languages and Architectures Section at Imperial College who provide a stimulating working environment.
This work was carried out under the European Community
ESPRIT funded Basic Research Action 3147 (Phoenix).
References
[AEHK89] M. Aronsson, L-H. Eriksson, L. Hallnäs, and P. Kreuger. A Survey of GCLA: A Definitional Approach to Logic Programming. In Proc. of the International Workshop on Extensions of Logic Programming, volume 475 of Lecture Notes in Computer Science. Springer-Verlag, 1989.

[Col87] A. Colmerauer. Opening the Prolog III universe. Byte, July 1987.

[Com90] Haskell Committee. Haskell: A non-strict, purely functional language. Technical report, Dept. of Computer Science, Yale University, April 1990.

[DAP86] J. Darlington, A.J. Field, and H. Pull. The unification of functional and logic languages. In D. DeGroot and G. Lindstrom, editors, Logic Programming, pages 37-70. Prentice-Hall, Englewood Cliffs, New Jersey, 1986.

[DG89] J. Darlington and Y.K. Guo. Narrowing and Unification in Functional Programming. In Proc. of RTA '89, pages 292-310, 1989.

[DG90] J. Darlington and Y. Guo. Constraint equational deduction. Technical report, Dept. of Computing, Imperial College, March 1990. To be presented at CTRS '90.

[DGP91] J. Darlington, Y.K. Guo, and H. Pull. A new perspective on integrating functional and logic languages. Technical report, Dept. of Computing, Imperial College, December 1991.

[DO88] N. Dershowitz and M. Okada. Conditional equational programming and the theory of conditional term rewriting. In Proc. of FGCS '88, ICOT, 1988.

[DP86] N. Dershowitz and D.A. Plaisted. Equational programming. In Machine Intelligence (Michie, Hayes and Richards, eds.), 1986.

[EGP86] E. Giovannetti, G. Levi, C. Moiso, and C. Palamidessi. Kernel LEAF: An experimental logic plus functional language - its syntax, semantics and computational model. ESPRIT Project 415, Second Year Report, 1986.

[Fri85] Laurent Fribourg. SLOG: A logic programming language interpreter based on clausal superposition and rewriting. In Proceedings of the 2nd IEEE Symposium on Logic Programming, Boston, 1985.

[GF91] Y.K. Guo and M. Fereira. Constraints, Functions and Concurrency. Technical report, Dept. of Computing, Imperial College, September 1991. Working Research Notes.

[GLDD90] Y. Guo, H. Lock, J. Darlington, and R. Dietrich. A classification for the integration of functional and logic languages. Technical report, Dept. of Computing, Imperial College and GMD Forschungsstelle an der Universität Karlsruhe, March 1990. Deliverable for the ESPRIT Basic Research Action No. 3147.

[GM84] Joseph A. Goguen and Jose Meseguer. Equality, types, modules, and (why not?) generics for logic programming. Journal of Logic Programming, 2:179-210, 1984.

[GM87] Joseph Goguen and Jose Meseguer. Models and equality for logical programming. In Proc. of TAPSOFT '87, volume 250 of Lecture Notes in Computer Science. Springer-Verlag, 1987.

[GP91] Y.K. Guo and H. Pull. Falcon: Functional And Logic language with CONstraints - language definition. Technical report, Dept. of Computing, Imperial College, February 1991.

[Hue86] G. Huet. Formal structure for computation and deduction. Technical report, Dept. of Computer Science, Carnegie-Mellon University, May 1986.

[Hul80] Jean-Marie Hullot. Canonical forms and unification. In 5th Conf. on Automated Deduction, LNCS 87, 1980.

[JCGMRA91] M.T. Hortala-Gonzalez, J. Carlos Gonzalez-Moreno, and Mario Rodriguez-Artalejo. A Functional Logic Language with Higher Order Logic Variables. Technical Report, Dpto. de Informatica y Automatica, UCM, 1991.

[JL86] Joxan Jaffar and Jean-Louis Lassez. Logic programming scheme. In D. DeGroot and G. Lindstrom, editors, Logic Programming, pages 441-467. Prentice-Hall, Englewood Cliffs, New Jersey, 1986.

[JL87] Joxan Jaffar and Jean-Louis Lassez. Constraint logic programming. In Proc. of POPL '87, pages 111-119, 1987.

[LM89] J-L. Lassez and K. McAloon. A constraint sequent calculus. Technical report, IBM T.J. Watson Research Center, 1989.

[Mes89] Jose Meseguer. General logics. Technical Report SRI-CSL-89-5, SRI International, March 1989.

[Mil84] Robin Milner. A proposal for Standard ML. In ACM Conference on Lisp and Functional Programming, 1984.

[MT84] M. Hagiya and T. Sakurai. Foundation of Logic Programming Based on Inductive Definition. New Generation Computing, 2(1), 1984.

[NPA86] R. Nikhil, K. Pingali, and Arvind. Id Nouveau. Technical report, M.I.T. Laboratory for Computer Science, 1986. CSG Memo 265.

[PS89] L.C. Paulson and A.W. Smith. Logic Programming, Functional Programming and Inductive Definitions. In Proc. of the International Workshop on Extensions of Logic Programming, volume 475 of Lecture Notes in Computer Science. Springer-Verlag, 1989.

[Pul90] Helen M. Pull. Equation Solving in Lazy Functional Languages. PhD thesis, Dept. of Computing, Imperial College, University of London, November 1990.

[Red84] Uday S. Reddy. Narrowing as the Operational Semantics of Functional Languages. In Proc. of the International Symposium on Logic Programming. IEEE, 1984.

[Red86] Uday S. Reddy. Functional Logic Languages, Part I. In J.H. Fasel and R.M. Keller, editors, Proceedings of a Workshop on Graph Reduction, Santa Fe, number 279 in Lecture Notes in Computer Science, pages 401-425. Springer-Verlag, 1986.

[Sco89] Dana Scott. Semantic domains and denotational semantics. Lecture Notes of the International Summer School on Logic, Algebra and Computation, Marktoberdorf, 1989. To be published in the LNCS series by Springer-Verlag.

[S.H90] S. Haridi. A logic programming language based on the Andorra model. New Generation Computing, 1990.

[Smo89] Gert Smolka. Logic Programming over Polymorphically Order-Sorted Types. PhD thesis, Fachbereich Informatik, Universität Kaiserslautern, May 1989.

[Smo91] Gert Smolka. Residuation and Guarded Rules for Constraint Logic Programming. Research Report RR-91-13, DFKI, 1991.

[Sny90] W. Snyder. The Theory of General Unification. Birkhäuser, Boston, 1990.

[SR90] V.A. Saraswat and M. Rinard. Concurrent Constraint Programming. In Proc. 17th Annual ACM Symp. on Principles of Programming Languages. ACM, 1990.

[Ste80] G.L. Steele. The Definition and Implementation of a Computer Programming Language Based on Constraints. PhD thesis, M.I.T., AI-TR 595, 1980.

[Tur85] David A. Turner. Miranda: A non-strict language with polymorphic types. In Conference on Functional Programming Languages and Computer Architecture, LNCS 201, pages 1-16, 1985.

[You88] Jia-Huai You. Outer Narrowing for Equational Theories Based on Constructors. In Timo Lepistö and Arto Salomaa, editors, 15th Int. Colloquium on Automata, Languages and Programming, LNCS 317, pages 727-741, 1988.

[Yuk88] K. Yukawa. Applicative logic programming. Technical Report LP-5, Logic Programming Laboratory, June 1988.
A Mechanism for Reasoning about Time and Belief
Hideki Isozaki
NTT Basic Research Laboratories
3-9-11, Midoricho, Musashino-shi
Tokyo 180, Japan
Yoav Shoham
Computer Science Department
Stanford University
Stanford, CA 94305, U.S.A.

Abstract
Several computational frameworks have been proposed to maintain information about the evolving world, which embody a default persistence mechanism; examples include time maps and the event calculus. In multi-agent environments, time and belief both play essential roles. Belief interacts with time in two ways: there is the time at which something is believed, and the time about which it is believed. We augment the default mechanisms proposed for the purely temporal case so as to maintain information not only about the objective world but also about the evolution of beliefs. In the simplest case, this yields a two-dimensional map of time, with persistence along each dimension.

Since beliefs themselves may refer to other beliefs, we have to think of a statement referring to an agent's temporal belief about another agent's temporal belief (a nested temporal belief statement). It poses both semantical and algorithmic problems. In this paper, we concentrate on the algorithmic aspect of the problems. The general case involves multi-dimensional maps of time called Temporal Belief Maps.

1    Introduction: Time Maps and Temporal Belief Maps

In multi-agent environments, time and belief both play essential roles. Belief interacts with time in two ways: there is the time at which something is believed, and the time about which it is believed. As in the atemporal treatment of belief, beliefs themselves may refer to beliefs (of other agents, or even the same one). For example, in the framework of Agent Oriented Programming [Shoham 1990], at any time the mental state of an agent contains information about the mental states of other agents at various times.

A statement referring to an agent's temporal belief about another agent's temporal belief will be called a nested temporal belief statement. An example of it is the sentence "On Wednesday John believed that on the previous Monday Jane believed that on the following Saturday they would clean the house." Nested temporal beliefs pose a number of interesting problems, both semantical and algorithmic. In this paper we concentrate on the latter kind; we propose a computational mechanism called a Temporal Belief Map, which functions as a data base of nested temporal beliefs.

Consider a formal language for expressing nested temporal beliefs. A standard construction would extend classical logic with a modal operator B_a^t φ for each agent designator a and time point symbol t, meaning intuitively that at time t the agent a believes φ. To ensure that the modal operator respects the properties of belief (or, more exactly, its crude approximation that has been employed in computer science and AI), various restrictions on this operator have been suggested, and then extensively explored, debated and modified [Hintikka 1962, Griffiths 1967, Konolige 1986]. These include properties such as B_a^t(φ ⊃ ψ) ∧ B_a^t φ ⊃ B_a^t ψ (the 'K' axiom), B_a^t φ ⊃ ¬B_a^t ¬φ (the 'D' axiom), B_a^t φ ⊃ B_a^t B_a^t φ and ¬B_a^t φ ⊃ B_a^t ¬B_a^t φ (the '4' and '5' axioms) [Chellas 1980], and others. In addition, although these have been less well studied, further constraint may be imposed on the change in belief over time.

We will briefly return to these properties in the next section, but they are not the focus of this paper. Instead, we concentrate on algorithmic issues. Consider first the purely temporal case, without an explicit notion of belief. In principle, capturing the truth of facts over time
should pose no problem; we can use standard data base
techniques to capture the fact true at a single point in
time, and repeat it for all points. In practice, though, it
is impossible, and we will need to use some shortcuts.
Figure 1: A simple persistence
The representational aspect of the problem appears in
the form of the well-known frame problem [McCarthy and
Hayes 1969]: when you buy a red bicycle, how do you conclude that a year later it will still be red, regardless of what happens in the meanwhile - the bike is ridden, the tire is fixed, elections are held - unless it is painted. An axiom stating explicitly that the color does not change after each action is called a frame axiom; the problem is to capture the persistence of facts without including the numerous possible frame axioms.

Figure 2: Clipping a persistence
The frame problem and related problems have been
investigated in detail from the logical point of view
(cf. [Shoham 1992]), and most solutions proposed have
made use of nonmonotonic logic. Adding belief yields a
qualitative increase in difficulty, since beliefs (and lack
thereof) tend to persist as well: once you learn something, you will keep it in mind until you forget it or learn
incompatible facts. The formal details of the persistence
of mental state have not yet been studied as deeply; an
initial treatment of it appears in [Lin and Shoham 1992].
As was said, we are interested in the algorithmic aspects of the problem. Computational complexity of knowledge and belief without time was discussed by [Halpern and Moses 1985]. In the purely temporal case, the question is how to efficiently implement the following persistence principle (throughout this article we will assume discrete time, but the discussion can be adapted to the continuous case as well; we also assume propositional facts, with no variables):

    p^{t+1} holds iff either an event which causes p occurred at time t, or else p^t holds and no event which causes ¬p occurred at time t.

Straightforward embodiment of this rule in backward chaining is too inefficient. In order to determine the truth value of p^t, you do not want to have to check p^{t-1}, p^{t-2}, and so on until you discover that p^{t-213857} is true.

Both time maps [McDermott 1982, Dean and McDermott 1987] and event calculus [Kowalski and Sergot 1986] provide better alternatives. In particular, time maps rely on keeping track of only the points at which the truth value of the proposition changes, which are sufficient to determine the truth value of all other points. Each event gives rise to a default persistence, which ends at the first future point about which a contradictory fact is believed. For example, if an event which causes p occurs at time t^[1] (the superscript in [ ] identifies a given time point), and no other information about p is yet present in the time map, then the two points t^[1] and ∞ are associated with p, with a default persistence of p from the first to the second. This may be depicted graphically by Figure 1. If it is subsequently added that at time t^[2] (> t^[1]) an event happened that causes ¬p, t^[2] is associated in addition with p; a default persistence of ¬p is assumed between t^[2] and ∞, and the persistence of p starting at t^[1] is "clipped" at t^[2] (Figure 2).

This is a crude description of the operation of time maps, but it suffices to explain the transition to temporal belief maps (TBM's), which incorporate an explicit notion of belief.

(Note that we have discussed only persistence into the future. Most of the literature in AI does that, and we too will in this paper. However, persistence into the past can make as much sense, especially when one adds an explicit notion of belief. For example, if you find a book on a desk, you will believe that the book was on the desk a few minutes ago. Most researchers manage to avoid this issue by limiting the form of temporal information. In particular, both time maps and the event calculus embody a certain causality principle: the only way new temporal information is added is by a preceding event which causes it. Since an explicit cause is known, there is no reason to posit backward persistence, past the cause. For example, we cannot represent the simple fact that the book was on the table; we must represent a specific event or action that resulted in that state (such as placing the book
there). The closest one gets to backward persistence is
through abductive reasoning, "what would have to be the case previously in order for this fact to hold," positing previous events. In applications such as planning [Allen et al. 1991], this is a reasonable assumption, as in those one is constructing a map of the future based on specific planned events. However, if one is trying to use the mechanism to piece together a map of time on the basis of spotty data, this may prove inappropriate. For example, in a framework such as Agent Oriented Programming [Shoham 1990], a major source of new temporal information are INFORM messages from other agents. As a result of these messages, the agent may possess a rich sample of what is true and false over time, but no causal knowledge of the precipitating events. Nevertheless, we will ignore backward persistence in most of the paper. Unless we explicitly state otherwise, we will use the term persistence to mean forward persistence.)

Suppose we now wish to represent the evolution of an agent's beliefs. Let us first introduce the notion of learning, which will play a role that is analogous to that of an event in time maps. Given this notion, beliefs too will be subject to a persistence rule:

    "The agent believes a fact at time t+1 iff he learned it at time t, or else at time t he believed the fact and did not at that time learn that it became false."

(This rule embodies the assumption that agents have perfect memory.) If, in addition, the "fact" itself is temporal, we end up with persistence along two orthogonal dimensions: the time of belief and the time of the property. This is the simple case of a 2-dimensional TBM. The extension to higher-dimensional TBM's is natural. Such TBM's are obtained by nested belief statements, such as "John believes today that yesterday he did not believe ..." and "John believes today that tomorrow Mary will believe ..."; both of these example statements induce a 3-dimensional TBM. It turns out that resolving contradictions in a multi-dimensional TBM is somewhat more subtle than in standard time maps, as the following sections will describe.

Here then is the problem we will address. Let us use the notation L_a^t φ to mean that agent a learned φ at t (actually formalizing this notion is tricky, but that is not the concern of this paper; we use the notation merely as shorthand for the English sentence). The input to our problem is assumed to be a collection of data points of the form L_{a_1}^{t_1^[i]} L_{a_2}^{t_2^[i]} ... L_{a_{n-1}}^{t_{n-1}^[i]} p_i^{t_n^[i]} and L_{a_1}^{t_1^[j]} L_{a_2}^{t_2^[j]} ... L_{a_{n-1}}^{t_{n-1}^[j]} ¬p_j^{t_n^[j]}. In other words, the sequences of agent indices are identical in all the input data, but the time indices are unconstrained (we will see in section 5 why assuming a fixed sequence of agent indices is not limiting). We also assume that the data is consistent, that is, it does not contain both L_{a_1}^{t_1^[k]} ... L_{a_{n-1}}^{t_{n-1}^[k]} p_k^{t_n^[k]} and L_{a_1}^{t_1^[k]} ... L_{a_{n-1}}^{t_{n-1}^[k]} ¬p_k^{t_n^[k]} for any k. The problem is to define the rules of persistence in this n-dimensional space, that is, to define for any (t_1, t_2, ..., t_n) in the space and each fact p, which (if either) of B_{a_1}^{t_1} B_{a_2}^{t_2} ... B_{a_{n-1}}^{t_{n-1}} p^{t_n} and B_{a_1}^{t_1} B_{a_2}^{t_2} ... B_{a_{n-1}}^{t_{n-1}} ¬p^{t_n} are supported by the data. (In all of the above, both the agent indices and the time indices may contain repetitions.) Furthermore, we will want our definition to support an efficient mechanism for answering such a query about any point in the space.

Note that both the input form and query form are quite constrained. For example, the input form precludes facts such as "John learned that Mary did not believe φ" (L_John ¬B_Mary φ) without making the stronger statement "John learned that Mary learned ¬φ" (L_John L_Mary ¬φ). Similarly, a query "Does John believe that Mary does not believe φ?" (B_John ¬B_Mary φ?) is disallowed; only the stronger query about Mary's believing the negated fact (B_John B_Mary ¬φ?) is allowed. A positive answer to the second (B_John B_Mary ¬φ) would entail a positive one to the first (B_John ¬B_Mary φ), but a negative answer (¬B_John B_Mary ¬φ) would shed no light on the first query (B_John ¬B_Mary φ?). These are extensions we plan to look at in the future.

In the remainder of this paper we will elaborate on this picture. We will explicate the assumptions made about agents, and discuss the multi-dimensional persistence in more detail. The organization is as follows. In section 2 we state the assumptions we make about agents' beliefs, both at single points in time and over periods of time. In section 3 we look closely at persistence in TBM's with a single datum point. In section 4 we look at TBM's with multiple data points. In section 5 we discuss the extension to data with multiple sequences of agent indices. In section 6 we briefly mention the complexity of the query answering, and in section 7 we briefly mention implementation efforts. We conclude with discussion of related and future work.
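As a concrete illustration of the purely temporal case just described, the sketch below (hypothetical Haskell, not the authors' implementation; discrete time, propositional facts) answers a query by consulting only the latest event recorded before the query time, which is exactly the shortcut that time maps exploit.

    import Data.List (sortOn)

    type Time = Int
    -- An event records that a fact became true or false at a time point.
    data Event = Event { evTime :: Time, evFact :: String, evValue :: Bool }

    -- Nothing means the time map records no information about the fact yet.
    -- An event at time t affects times after t, as in the persistence principle.
    holdsAt :: [Event] -> String -> Time -> Maybe Bool
    holdsAt events fact t =
      case [ e | e <- events, evFact e == fact, evTime e < t ] of
        [] -> Nothing
        es -> Just (evValue (last (sortOn evTime es)))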
2    Assumptions about Belief

We mentioned before that various idealizing assumptions about belief have been made and debated by other researchers, and that the focus of this paper is different from them. Nonetheless a few basic assumptions are essential, and we discuss them here. In the spirit of this paper, we discuss these properties in commonsense terms, rather than in a formal logic.

We have already listed some of the more common restrictions on belief: closure of beliefs under tautological implication (as captured by the 'K' axiom), consistency (as captured by the 'D' axiom), and positive and negative introspection (as captured by the '4' and '5' axioms). Since among objective properties (those without a belief operator) we will consider only literals (atomic properties and their negations), the closure property will be irrelevant. Positive and negative introspection will also turn out to impact our results only minimally, as will be discussed in section 5. However, consistency will lie at the heart of the TBM mechanism, and is our first assumption.

Assumption 1 (Consistency) B_a^t φ and B_a^t ¬φ cannot both hold.

This is the only assumption we will make about belief at an instance of time. In addition we have constraints on how beliefs change over time. We first assume that agents do not come to believe facts without explicitly learning them, but that once they learn them, they do not forget them.

Assumption 2 (Causality and Memory) If at t agent a does not learn ¬φ, then B_a^{t+1} φ holds iff B_a^t φ holds.

Our next assumption is that agents are extremely receptive to new information [Gardenfors 1988].

Assumption 3 (Gullibility) If at time t agent a learns φ, then B_a^{t+1} φ holds.

(Of course, in an environment in which agents are supplied with unreliable or dishonest information, this last assumption would be unacceptable, and we would need a more sophisticated criterion to determine which of the two contradictory facts, the previously believed one and the newly learned one, should dominate.)

Our last assumption is that all these properties are 'common knowledge':

Assumption 4 (Common knowledge) Every agent believes that every agent believes the above properties, that every agent believes that every agent believes them, and so on.

3    Multi-Dimensional Persistence of a Single Datum

In this section, we consider TBM's induced by a single datum point. We start by considering the non-nested case, in which the datum has the form L_{a_1}^{t_1^[1]} p^{t_2^[1]} (at time t_1^[1] agent a_1 learns that at time t_2^[1] property p was (is, will be) true). This induces a 2D TBM, in which the persistences along both axes are uninterrupted and thus do not terminate at all. This situation is represented graphically in Figure 3.

Figure 3: Default region (left) and causal region (right)

The hatched quarter plane in the left picture, rooted in the point (t_1^[1], t_2^[1]), is called the default region of (t_1^[1], t_2^[1]). The meaning of this region is that, given only the datum point L_{a_1}^{t_1^[1]} p^{t_2^[1]}, B_{a_1}^{t_1} p^{t_2} holds by default iff (t_1, t_2) lies in that region (i.e., iff t_1^[1] < t_1 and t_2^[1] < t_2).

Similarly, if we focus on an affected point (*), all data points affecting it by their forward persistence are distributed in the opposite quarter plane. This is the dual concept of the default region and is called a causal region of the affected point. It is depicted graphically in the right picture of the above figure. In this paper we will be concerned mostly with default regions.

Finally, although it is only the 2-dimensional case that is so amenable to graphical representation, these concepts extend naturally to the multi-dimensional case. Specifically, given only the datum L_{a_1}^{t_1^[1]} ... L_{a_{n-1}}^{t_{n-1}^[1]} p^{t_n^[1]}, we have that B_{a_1}^{t_1} ... B_{a_{n-1}}^{t_{n-1}} p^{t_n} holds iff it is the case that t_1 > t_1^[1], ..., t_n > t_n^[1].
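The membership test for the default region of a single datum is just this componentwise comparison; the following hypothetical Haskell fragment (illustrative only) spells out the condition t_1 > t_1^[1], ..., t_n > t_n^[1].

    -- A time vector holds the n time coordinates of a datum or a query point.
    type TimeVector = [Int]

    -- The query point lies in the datum's default region iff it is strictly
    -- later in every coordinate.
    inDefaultRegion :: TimeVector -> TimeVector -> Bool
    inDefaultRegion datum query =
      length datum == length query && and (zipWith (<) datum query)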
4    Multiple Data with Incompatible Beliefs

We have so far considered only TBM's induced by a single datum. We now look at the general case in which we have multiple data. We still assume that all data have the form L_{a_1}^{t_1^[i]} ... L_{a_{n-1}}^{t_{n-1}^[i]} p_i^{t_n^[i]} or L_{a_1}^{t_1^[j]} ... L_{a_{n-1}}^{t_{n-1}^[j]} ¬p_j^{t_n^[j]} for some fixed a_1, ..., a_{n-1} (again, see section 5 in this connection), but nothing beyond that.

If for any p_k the collection does not contain more than one occurrence of p_k (whether preceded by ¬ or not), the situation is simple: the persistence of each fact is independent of the others, and so we construct an independent TBM for each one.

The situation in which multiple occurrences of a p_k exist, but all with the same polarity (that is, either in all data containing p_k the p_k is preceded by ¬, or in none), is also simple: the default region is simply the union of the individual regions for each datum containing p_k.

Figure 4: Overlapping default regions (t_1^[1] ≠ t_1^[2], t_2^[1] ≠ t_2^[2])

Figure 5: Consistent default regions (t_1^[1] ≠ t_1^[2], t_2^[1] ≠ t_2^[2])

It is the presence of contradictory data that makes the story more interesting. Our assumption of consistency dictates that persistences of contradictory beliefs may not overlap. Without the strong limitations on the form of input data and queries, we would have two problems - to determine which sets of persistences are contradictory, and to resolve the contradiction. For example, we would have to notice that the three sentences B_a^t (p ∨ q), B_a^t ¬p and B_a^t ¬q are jointly inconsistent, even though all pairs are consistent. Our restrictions remove this first problem. Since we only consider facts of the form B_{a_1}^{t_1} ... B_{a_{n-1}}^{t_{n-1}} p_i^{t_n} and B_{a_1}^{t_1} ... B_{a_{n-1}}^{t_{n-1}} ¬p_j^{t_n}, the only fact contradicting B_{a_1}^{t_1} ... B_{a_{n-1}}^{t_{n-1}} p_k^{t_n} will be B_{a_1}^{t_1} ... B_{a_{n-1}}^{t_{n-1}} ¬p_k^{t_n}, and vice versa. When in future work we relax the restrictions on input and queries, we will need a new criterion for determining incompatibility.

Our restrictions do not only render the problem of determining incompatibility trivial, they also simplify the task of resolving it. Since we always have exactly two beliefs contradicting one another, our task reduces to removing one of them; the question is which.

4.1    The 2D Case

The rule for resolving contradictory beliefs in the two-dimensional case is derived in a straightforward fashion from the assumptions stated in section 2, and is analogous to the clipping of persistences in simple time maps. We will discuss the case of two data points, but the discussion extends easily to multiple points.

Consider the input data consisting of the two points • : L_{a_1}^{t_1^[1]} p^{t_2^[1]} and ∘ : L_{a_1}^{t_1^[2]} ¬p^{t_2^[2]}. Without loss of generality, assume that t_1^[2] ≥ t_1^[1] holds. We consider the two cases - t_2^[2] ≥ t_2^[1] and t_2^[2] < t_2^[1] - and assume for now that neither t_1^[2] = t_1^[1] nor t_2^[2] = t_2^[1] holds. The default regions of the two points in both cases are shown in the left and right portions of Figure 4, respectively.

In both cases the default regions overlap, which is forbidden, and one of them must be trimmed.¹ In deciding which, we recall the assumption of gullibility: right after learning a fact, the agent must believe it. Furthermore, the assumption of memory and causality dictates that the agent must continue to believe it until the next point about which he learns that the fact is false there. This produces the consistent default regions in Figure 5.

¹ Of course, removing both would also restore consistency, but that would violate our assumption about causality and memory.

Example. If John learns on Monday that on Thursday his house will be painted white (•) and on Tuesday he learns that on Friday it will
be painted blue (∘), then from Monday until Tuesday John will believe that his house will be white from Thursday until the end of time, and from Tuesday on he will believe that his house will be white from Thursday until Friday (+45° shading), and blue afterwards (-45° shading) (the left picture). (Of course, on Thursday he will learn that the painter had a wedding in Chicago and couldn't come.)

On the other hand (the right picture), if John learns on Monday that on Thursday his house will be painted white (•) and on Tuesday he learns that on Wednesday it will be painted blue (∘), then from Monday until Tuesday John will still believe that his house will be white from Thursday until the end of time (+45° shading), but from Tuesday he will believe that his house will be blue from Wednesday until Thursday (-45° shading), and leave unaltered his belief that it will be white afterwards (+45° shading). (That will change when the painter, back from Chicago a week later, paints John's house turquoise, since neither white nor blue really go well with the olive tree in the yard.)

Note that in either case, the beliefs from Tuesday onwards would not change even if the two pieces of information were acquired in the opposite order. This is no accident; this Church-Rosser property is true in general of our system.

We now turn to the limiting cases, in which either t_1^[2] = t_1^[1] holds or t_2^[2] = t_2^[1] holds. Note that from our assumption about the consistency of the input, at most one of them can hold. Therefore, if t_1^[2] = t_1^[1] holds, we may assume without loss of generality that t_2^[2] > t_2^[1]. This means that at time t_1^[1] (= t_1^[2]) the agent learned that p first became true (•) and later became false (∘). The agent will therefore believe at time t_1^[1] (= t_1^[2]) that p will be true from the first point until the second, and false afterwards. There will be nothing later to change that belief, and thus the default region of p forms an infinite horizontal strip, and the default region of ¬p occupies the quadrant above it (Figure 6).

Figure 6: Default regions (left: t_1^[1] = t_1^[2], right: t_2^[1] = t_2^[2])

The case in which t_2^[2] = t_2^[1] holds is more interesting, since it provides insight into the higher dimensional case. In this case the agent first learned that p became true at some point, and later learned that p became false at that very point. Now in principle we could imagine quite sophisticated criteria to decide which evidence should be given greater credence. However, our assumption of gullibility forces a "recent is better" policy, leading us to accept the later information and abandon the older one. The resulting default regions are shown in the right figure.

4.2    The General Case

We now extend the previous discussion to higher TBM's. We will unfortunately have to do so without the aid of graphics; instead, we will use the following example.

Example. At t_1^[1] you learn that at time t_2^[1] your son learned that your son's teacher moved to Japan at time t_3^[1] (L_you^{t_1^[1]} L_son^{t_2^[1]} p^{t_3^[1]}). At time t_1^[2] you learn that at time t_2^[2] your son learned that his teacher moved to the US at time t_3^[2] (L_you^{t_1^[2]} L_son^{t_2^[2]} ¬p^{t_3^[2]}), where t_3^[2] > t_3^[1]. Let t_1 > max(t_1^[1], t_1^[2]), t_2 > max(t_2^[1], t_2^[2]), and t_3 > t_3^[2] (> t_3^[1]). Then at t_1 you believe that at t_2 your son believes that his teacher is living in the US at t_3. This is true regardless of the relationship between t_1^[1] and t_1^[2], or the relationship between t_2^[1] and t_2^[2].

Now consider the same scenario, except that t_2^[2] = t_2^[1]. This means that you believe that your son learned two contradictory facts. However, from the assumption that rules of belief change are common knowledge², you know that your son will adopt the latest information (as illustrated in the previous figure).

² Note that this is our first use of the common knowledge assumption!
Therefore your beliefs about your son's beliefs will depend on the relationship between t_1^[1] and t_1^[2]; if t_1^[2] > t_1^[1] then you will believe that your son believes that the teacher lives in the US; otherwise you will believe that your son believes that the teacher lives in Japan.

Finally, what will you believe if t_2^[2] = t_2^[1] and t_3^[2] = t_3^[1]? In this case, you will need to break the tie by comparing t_1^[1] and t_1^[2]. Note that they cannot also be equal, as that would violate the assumption that the input data is consistent.

The lesson from this example is clear. To determine whether a point in the hyper-space lies in a particular default region, you should compare the associated time vectors. This ordering is a reverse lexicographical ordering, the innermost time being the most significant and the outermost time the least significant.

5    Multiple Sequences of Agent Indices

We have all along assumed one fixed sequence of agent indices in the data: a_1, ..., a_{n-1}. However, relaxing this limitation is quite simple. Consider data points with multiple sequences of agent indices. Unless we make further assumptions about belief, data with different index sequences will simply not interact. For example, the truth of B_a^{t_1^[1]} B_b^{t_2^[1]} p^{t_3^[1]} is completely independent from the truth of any statement that is not of the form B_a^{t_1^[2]} B_b^{t_2^[2]} x, where x is an objective sentence (containing no belief operator); in particular, it is consistent with B_a^{t_1^[1]} B_c^{t_2^[1]} ¬p^{t_3^[1]}. Thus we may simply construct separate TBM's for these different sentences, each obeying our restriction.

However, if we do make further assumptions about belief, we must take greater care. We consider here four possible further assumptions about belief. The first two are the familiar assumptions of introspective capability:

Assumption 5 (Positive introspection) B_a^t φ holds iff B_a^t B_a^t φ holds.

Assumption 6 (Negative introspection) ¬B_a^t φ holds iff B_a^t ¬B_a^t φ holds.

The other two have to do with beliefs of the agent at different points in time. The first is that not only do agents have memory (which we have already assumed), but they also have perfect memory of past beliefs:

Assumption 7 (Introspection about past beliefs) B_a^{t+τ} B_a^t φ holds iff B_a^t φ holds, if τ > 0.

The last assumption states that agents do not expect their beliefs to change:

Assumption 8 (Belief about stability of beliefs) B_a^t B_a^{t+τ} φ holds iff B_a^t φ holds, if τ > 0.

(Notice that assumptions 5, 7, and 8 can be unified into B_a^{t_1} B_a^{t_2} φ ≡ B_a^{min(t_1, t_2)} φ.)

We are not arguing on behalf of these assumptions. We list them merely as examples of plausible assumptions one might want to make. The reason we mention them at all is that they violate the property that nested temporal beliefs with different agent indices are independent of one another. For example, under assumption 8, B_a^{t_1} B_a^{t_2} p^{t_3} (with t_2 > t_1) is contradictory with B_a^{t_1} ¬p^{t_3}.

Fortunately, these four assumptions allow an easy solution. We simply keep simplifying the sentences by substitution, until no further simplifications are possible. It turns out that no matter what subset of these four we choose, the result of this substitution process is unique (the Church-Rosser property again). More generally, whenever our assumptions allow us to derive a unique canonical form, we convert the query and the input data to this canonical form, and then revert to our usual procedure. We have not yet investigated the more complex case in which the canonical form is hard to derive or nonexistent.

6    Complexity

Our definition of default regions was constructive, and allows efficient query answering. We briefly discuss the complexity here. If we assume that comparison of a pair of one-dimensional time points is done in one operation, then comparing two n-dimensional time points requires at most n operations. In ordinary applications, n will be a very small integer. Ordinary people will not think of n = 5 cases in their everyday life.

If we have N data points, we can get a sorted list of the data points by the priority based on the reverse lexicographical ordering, as explained. This requires only O(n · N log₂ N) ≈ O(N log N) operations. Since each agent learns information gradually, it is useful to use a heap, a well known balanced tree data structure which can be easily modified to keep ordering.

If we need to identify only the dominant data point in the causal region, even a naive implementation gives it in O(nN) ≈ O(N) operations.
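Putting sections 4-6 together, query answering for a single fact can be sketched as follows (hypothetical Haskell, not the authors' implementation; the limiting cases with equal coordinates discussed in section 4.1 are not handled): among the data points lying in the causal region of the query point, the one that is greatest in the reverse lexicographical ordering determines the answer.

    import Data.List (maximumBy)
    import Data.Ord  (comparing)

    type TimeVector = [Int]
    -- A datum is a time vector plus a polarity (True for p, False for not-p).
    data Datum = Datum { dTimes :: TimeVector, dPolarity :: Bool }

    -- A datum affects a query point when it is componentwise strictly earlier,
    -- i.e. when it lies in the query point's causal region.
    affects :: TimeVector -> TimeVector -> Bool
    affects datum query = and (zipWith (<) datum query)

    -- Reverse lexicographic key: the innermost (last) time is most significant.
    revLexKey :: TimeVector -> [Int]
    revLexKey = reverse

    queryTBM :: [Datum] -> TimeVector -> Maybe Bool
    queryTBM db point =
      case [ d | d <- db, affects (dTimes d) point ] of
        [] -> Nothing
        ds -> Just (dPolarity (maximumBy (comparing (revLexKey . dTimes)) ds))

This is the naive O(nN) per-query procedure mentioned above; sorting the data once by the same reverse lexicographical key gives the O(n · N log N) preprocessing.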
7    Implementation

Our framework can be easily implemented in logic programming languages such as Prolog as well as in ordinary procedural languages such as C. We implemented various versions of this framework in both languages. The backward reasoning mechanism implemented in Prolog employed simplified versions of Kowalski/Sergot's Event Calculus. The forward reasoning mechanism implemented in C employed sorting of an array. As we described before, our algorithm is very fast in simple cases. We intend to implement more complex cases and evaluate their complexity.

As for 2D cases, we have a program which draws a map from a set of data points whose time stamps are given in hours/minutes or minutes/seconds or year/month/day.

Finally, this work has been carried out as part of the research on Agent Oriented Programming. The current simple interpreter, AGENT0 [Shoham 1990], only has a simple version of standard time maps. We have implemented an experimental agent interpreter which incorporates the ideas of this paper, and hope to report on it in the future.

8    Related Work and Conclusions

The only closely related work of which we are aware, other than the work on time maps and event calculus which we have discussed at length, is Sripada's [Sripada 1991], which was independently developed. Both systems can deal with nested temporal beliefs. Sripada represents a nested temporal belief by a Cartesian product of time intervals, and like us assumes that nested temporal beliefs are consistent. However, he does not consider the notion of default persistence, and therefore is not concerned with the resolution of competing default persistences. It would seem that the result of our system could serve as input to his, but we would like to understand his work better before making stronger claims about the relationship to his work.

As should be clear, much more needs to be done. We made it clear that in this work we did not undertake a logical treatment of time, belief and nonmonotonicity. We were also explicit about the limitations of our framework. We hope to do both in the future, as well as demonstrate the practical utility of this work.

Acknowledgments

We would like to thank AOP group members at Stanford University and the referees of this paper who gave us useful comments. The first author would like to thank Koichi Furukawa at ICOT, Shigeki Goto, Hirofumi Katsuno, and other colleagues at NTT, too.

References

[Halpern and Moses 1985] J. Y. Halpern and Y. Moses. A Guide to the Modal Logics of Knowledge and Belief: Preliminary Draft. Proc. of IJCAI, pp. 480-490, 1985.

[Kowalski and Sergot 1986] R. Kowalski and M. Sergot. A Logic-Based Calculus of Events. New Generation Computing, Vol. 4, pp. 67-95, 1986.

[Lin and Shoham 1992] F. Lin and Y. Shoham. Persistence of Knowledge and Ignorance (in preparation), 1992.

[McCarthy and Hayes 1969] J. McCarthy and P. J. Hayes. Some Philosophical Problems from the Standpoint of Artificial Intelligence. In B. Meltzer and D. Michie (Eds.), Machine Intelligence 4, Edinburgh University Press, pp. 463-502, 1969.

[McDermott 1982] D. V. McDermott. A Temporal Logic for Reasoning about Processes and Plans. Cognitive Science, Vol. 6, pp. 101-155, 1982.

[Dean and McDermott 1987] T. L. Dean and D. V. McDermott. Temporal Data Base Management. Artificial Intelligence, Vol. 32, pp. 1-55, 1987.

[Chellas 1980] B. F. Chellas. Modal Logic: An Introduction. Cambridge University Press, 1980.

[Gardenfors 1988] P. Gardenfors and D. Makinson. Revisions of Knowledge Systems Using Epistemic Entrenchment. Proc. of the Second Conference on Theoretical Aspects of Reasoning about Knowledge, pp. 83-95, 1988.

[Shoham 1990] Y. Shoham. Agent-Oriented Programming. Stanford Technical Report CS-1335-90, 1990.

[Shoham 1992] Y. Shoham. Nonmonotonic Temporal Reasoning. In D. Gabbay (Ed.), The Handbook of Logic in Artificial Intelligence and Logic Programming (to appear), 1992.

[Sripada 1991] S. M. Sripada. Temporal Reasoning in Deductive Databases. PhD thesis, Univ. of London, 1991.

[Allen et al. 1991] J. F. Allen, H. A. Kautz, R. N. Pelavin, and J. D. Tenenberg. Reasoning about Plans. Morgan Kaufmann Publishers, 1991.

[Hintikka 1962] J. Hintikka. Knowledge and Belief: An Introduction to the Logic of the Two Notions. Cornell University Press, 1962.

[Griffiths 1967] A. P. Griffiths (ed.). Knowledge and Belief. Oxford University Press, 1967.

[Konolige 1986] K. Konolige. A Deduction Model of Belief. Morgan Kaufmann Publishers, 1986.
Dealing with Time Granularity in the Event
Calculus
Angelo Montanari (+)(*), Enrico Maim (-), Emanuele Ciapessoni (+), Elena Ratto (+)
(+) CISE, Milano, Italy    (-) SYSECA, Paris, France
(*) Current Affiliation: University of Udine, Udine, Italy

Authors' addresses: Angelo Montanari, University of Udine, Mathematics and Computer Science Department, Via Zanon 6, 33100 Udine, ITALY, email: montanari@uduniv.cineca.it; Enrico Maim, SYSECA Temps Reel, Constraint Resolution Research Group, 315 Bureaux de la Colline, 92213 Saint-Cloud Cedex, FRANCE, email: enrico.maim@eurokom.ie; Emanuele Ciapessoni and Elena Ratto, CISE, Artificial Intelligence Section, Division of Systems and Models, Via Reggio Emilia, 39, Segrate (Milano), ITALY, email: kant@sia.cise.it and elena@sia.cise.it. Most work of the first author was done while he was employed at CISE.
Abstract
The paper presents a formalization of the notion of
time granularity in a logic-based approach to knowledge representation and reasoning. The work is based on the Event Calculus [Kowalski,86], a formalism for reasoning about time, events and properties using first-order logic augmented with negation as failure. In the
paper, it is extended to include the concept of time
granularity. With respect to the representation, the
paper defines the basic notions of temporal universe,
temporal decomposition and coarse grain equivalence.
Then, it specifies how to locate events and properties
in the temporal universe and how to pair event and
temporal decompositions. With respect to the reasoning mechanisms, the paper defines two alternative
modalities of performing temporal projection, namely
upward and downward projections, that make it possible to switch among coarser and finer granularities.
1    Introduction
The paper presents a formalization of the notion
of time granularity in a logic-based approach to
knowledge representation and reasoning. The work
is based on the Event Calculus, a formalism for
reasoning about time, events and properties using
first-order logic augmented with negation as failure
[Kowalski and Sergot 1986]. In the paper, it is extended to include the concept of time granularity.
Informally, granularity can be defined as the resolution power of a representation. In general, each level of abstraction at which knowledge can be represented is characterized by a proper granularity. Providing a formalism with the concept of granularity allows it to embed different levels of knowledge in a representation. In such a way, each reasoning task
can refer to the representational level that abstracts
from the domain only those aspects relevant to the
actual goal. We are interested in time granularity.
With respect to the expressive power, it allows one
to maintain the representations of the dynamics of
different processes of the domain that evolve according to different time constants as separate as possible
[Corsetti et al. 1990]. It also allows one to model the dynamics of a process with respect to different time
scales. In such a case time granularity has to be paired
with other refinement mechanisms such as process
decomposition [Allen 1984], [Kautz and Allen 1986], [Corsetti et al. 1991a], [Evans 1990]. Finally, time granularity increases both the temporal distinctions
that a language can make and the distinctions that
it can leave unspecified. This means that considering two events as simultaneous or temporally distinct, or two time dependent relations as temporally overlapped or disjoint, depends on the granularity one refers to. With respect to the computational power, it supports different grains of reasoning to deal with incomplete and uncertain knowledge [Allen 1983], [Dean and Boddy 1988]. It also allows one to tailor the
visibility of the knowledge base and the reasoning process to the needs of the actual task [Fum et al. 1989].
Secondly, it allows one to alternate among different
time granularities during the execution of a task in
order to solve each incoming problem at a time granularity as coarse as possible [Dean et al. 1988]. An example of a limited use of time granularity to expedite
the search of large temporal databases is provided by
[Dean 1989]. Finally, it allows one to solve a problem
at a time granularity coarser than the required one to
cope with the complexity of temporal reasoning. Such
a simplification speeds up the reasoning, but implies a
relaxation of the precision of the solution. The ratio
between the time granularities provides a measurement
of the approximation of the achieved result.
Despite the widespread recognition of its relevance for knowledge representation and reasoning,
there is a lack of a systematic framework for temporal granularity. The main references are the paper
of Hobbs [1985] on the general concept of granularity and the works of Plaisted [1981], Giunchiglia and
Walsh [1989] on abstract theorem proving. Hobbs defines a concept of granularity that supports the construction of simple theories out of more complex ones.
He formally introduces the basic notions of relevant
predicate set, indistinguishability relations, simplification, idealization and articulation. Such notions are
extended and refined by Greer and McCalla [1989],
who identify two orthogonal dimensions along which
granularity can be interpreted, namely abstraction and
aggregation. However, both works pay little or no attention to time granularity. In particular,
Hobbs only sketches out a rather restrictive mapping
of continuous time into discrete times using the situation calculus formalism. Conversely, a set-theoretic
formalization of time granularity is provided by Clifford and Rao [1988], but they do not attempt to relate
the truth value of assertions to time granularity. Finally, Galton [1987] and Shoham [1988] give significant
categorizations of assertions based on their temporal
properties. These categorizations are strictly related
to the concept of time granularity even if it is not explicitly considered.
A first attempt to introduce the notion of time
granularity in the Event Calculus is reported in
[Evans 1990]. Evans defines a macro-events calculus
for dealing with time granularity whose limitations
are discussed in section 4.1. Our paper proposes a
framework to represent and reason about time granularity in the Event Calculus that generalizes these
previous results. It significantly benefits by the work
done to formalize the concept of time granularity
in TRIO, a logic formalism for specifying realtime
systems [Corsetti et al. 1991b], [Corsetti et al. 1991c],
[Montanari et al. 1991], and [Ciapessoni et al. 1992].
[Maim 1991] and [Maim 1992a] present an alternative
approach where the granularity problem is seen as an
issue of dealing with ranges and intervals in constraintbased reasoning.
The paper is organized as follows: section 2 presents
the original Event Calculus together with its basic extensions, namely types, macro-events and continuous
change; section 3 focuses on the representation of time
granularity; section 4 details the modalities of reasoning about time granularity.
2    The Event Calculus
The Event Calculus proposes a general approach
to represent and reason about events and their effects in a logic framework [Kowalski and Sergot 1986], [EQUATOR 1991]. From a description of events that occur in the real world, it allows one to derive various relationships and the time periods for which they hold. It also embodies a notion of default persistence, that is, relationships are assumed to persist until an event occurs which terminates them. As an example, if we know that an aircraft enters a given sector at 10:00hrs and leaves at 10:20hrs, the Event Calculus allows us to infer that it is in that sector at 10:15hrs. More precisely, the Event Calculus takes the notions of event, property, time-point and time-interval as primitives and defines a model of change in which events happen at time-points and initiate and/or terminate time-intervals over which some property holds. So, for instance, the events of entering and leaving the sector initiate and terminate the aircraft's property of being in the sector, respectively. Time-points are unique points in time at which events take place instantaneously. In the previous example, the event of entering the sector occurs at 10:00hrs, while the event of leaving the sector occurs at 10:20hrs. They can be specified at different degrees of explicitness, e.g. "91/5/24:10:00hrs" to include the full date or just "10:00hrs", but belong to a unique domain. Time-intervals are represented by means of tuples of two time-points. With the same example, we can deduce that the aircraft is in the sector during the time-interval starting at 10:00hrs and ending at 10:20hrs.
Formally, the Event Calculus represents domain knowledge by means of initiates and terminates predicates that express the effects of events on properties 1:

initiates(Event, Property)
terminates(Event, Property)
In such a way, domain relations are intensionally
defined in terms of event and property types
[EQUATOR 1991]. Weak forms of the initiates and
terminates predicates, namely weak-initiates and weak-terminates, have been introduced in [Sergot 1990].
The predicate weak-terminates states that a given
event terminates a given property unless this property
has already been terminated. In a similar way, the
predicate weak-initiates states that a given event initiates a given property unless this property has
already been initiated.
Instances of events and properties are obtained by attaching a time-point ⟨event, time-point⟩ and a time-interval ⟨property, time-interval⟩ to event and property types, respectively.

1 We adopt the variable convention of the original Event Calculus, where constants are distinguished from variables by being denoted by names beginning with upper-case characters.
The first Event Calculus axiom we introduce is the
Mholds-for. It allows us to state that the property p
holds maximally (i.e. there is no larger time-interval
for which it also holds) over ⟨start, end⟩ if an event e
occurs at the time start which initiates p, and an event
e' occurs at time end which terminates p, provided
there is no known interruption in between:

Mholds-for(p, ⟨start, end⟩) ←
    happens_at(e, start) ∧ initiates(e, p) ∧
    happens_at(e', end) ∧ terminates(e', p) ∧
    end > start ∧ not broken-during(p, ⟨start, end⟩)
In the above axiom, the negation involving the broken predicate is interpreted using negation-as-failure.
This means that properties are assumed to hold uninterrupted over an interval of time on the basis of
failure to determine an interrupting event. Should we
later record a terminating event within this interval,
we can no longer conclude that the property holds over
the interval. This gives us the non-monotonic character of the Event Calculus which deals with default
persistence 2.
The predicate broken-during is defined as follows:
broken-during(p, ⟨start, end⟩) ←
    happens_at(e, t) ∧ start < t ∧
    end > t ∧ terminates(e, p)

This states that a given property p ceases to hold
at some point during the time-interval ⟨start, end⟩ if
there is an event which terminates p at a time t within
⟨start, end⟩.
Event Calculus also defines an Iholds-for predicate in
terms of Mholds-for to state that a property holds over
each time-interval included in the maximal one:
Iholds-for(p, ⟨start, end⟩) ←
    Mholds-for(p, ⟨a, b⟩) ∧ start ≥ a ∧ end ≤ b
Finally, Event Calculus defines the holds-at predicate
which is similar to Iholds-for except that it relates a
property to a time-point rather than a time-interval:
holds-at(p, t) ←
    Mholds-for(p, ⟨start, end⟩) ∧
    t > start ∧ t < end
In particular, the holds-at predicate states that a property is not valid at the time-points at which the
events that initiate and terminate it occur. This negative
conclusion about the validity of properties at the left
and right ends of time-intervals properly stands for
ignorance. Time granularity will allow us to refine descriptions with respect to finer temporal domains.
2To deal with default persistence, [Maim 1992b] presents
an approach to constructive negation in constraint-based
reasoning.
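To make the behaviour of these axioms concrete, here is a minimal Prolog sketch of the core calculus applied to the aircraft example (an illustration of ours, not code from the paper; the predicate spellings, the representation of intervals as two separate arguments, and the numeric time-points are assumptions):

    % Minimal Event Calculus sketch. Times are plain integers
    % (minutes since midnight), so 600 = 10:00hrs, 620 = 10:20hrs.

    % Domain description: entering/leaving a given sector.
    initiates(enter_sector, in_sector).
    terminates(leave_sector, in_sector).

    % Recorded narrative of event occurrences.
    happens_at(enter_sector, 600).
    happens_at(leave_sector, 620).

    % A property holds maximally over Start..End if an initiating event
    % occurs at Start, a terminating one at End, and no terminating event
    % is known in between (negation as failure gives default persistence).
    mholds_for(P, Start, End) :-
        happens_at(E, Start),  initiates(E, P),
        happens_at(E1, End),   terminates(E1, P),
        End > Start,
        \+ broken_during(P, Start, End).

    broken_during(P, Start, End) :-
        happens_at(E, T), terminates(E, P),
        T > Start, T < End.

    % A property holds at every time-point strictly inside a maximal interval.
    holds_at(P, T) :-
        mholds_for(P, Start, End),
        T > Start, T < End.

    % ?- holds_at(in_sector, 615).   % succeeds: in the sector at 10:15hrs
    % ?- holds_at(in_sector, 600).   % fails: nothing is concluded at the endpoints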
2.1   Macro-events to Model Discrete Processes
To model discrete processes, the basic Event Calculus has been extended with an event decomposition
mechanism that allows us to refine event representations [Evans 1990], [EQUATOR 1991]. Evans introduced the notion of macro-event, which is a finite event
decomposed into a number of sub-events. The connections between a macro-event and its components are
formalized in the Event Calculus as follows:

happens_at(e, t) ←
    happens_at(e1, t1) ∧ part_of(e1, e) ∧
    happens_at(e2, t2) ∧ part_of(e2, e) ∧
    happens_at(e3, t3) ∧ part_of(e3, e) ∧
    happens_at(e4, t4) ∧ part_of(e4, e)
where the predicate part_of is defined by means of appropriate domain axioms.
This axiom allows us to derive the occurrence of a
macro-event from the occurrences of its sub-events. It
can also be used to abduce the occurrence of sub-events
from the occurrence of the macro-event.
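As a further illustration (ours, not the paper's), the decomposition axiom can be exercised with a hypothetical macro-event, invented sub-events and invented times; here the macro-event is stamped, as one possible choice, with the time of its last component, whereas section 3.3 will instead relate it to a coarse grain equivalent:

    % Hypothetical domain axioms: a 'mission' macro-event with two sub-events.
    part_of(take_off, mission).
    part_of(landing,  mission).

    happens_at(take_off, 10).
    happens_at(landing,  70).

    % The macro-event is derived from the occurrences of its components.
    happens_at(mission, T) :-
        happens_at(take_off, _T1), part_of(take_off, mission),
        happens_at(landing,  T),   part_of(landing,  mission).

    % ?- happens_at(mission, T).   % T = 70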
2.2   Continuous Change to Model Continuous Processes
The basic Event Calculus is well-equipped to represent
discrete processes, but is not so good for representing continuous processes, i.e. processes characterized
by a continuous variation in a quantity such as the
height of a falling object or the angular position of a
crankshaft. Modelling a continuous process in terms of
its temporal snapshots, in fact, can be seen as a particular case of event decomposition, but cannot be directly
done by means of macro-events. To model continuous processes, the Event Calculus has been extended with
the idea of the trajectory of a continuously changing
property through a space of values [Shanahan 1990],
[Shanahan 1991], [EQUATOR 1991]. Shanahan introduced the notion of 'dynamic' properties, like moving
of a train. When such a property holds, another property is continuously changing, such as position of the
train. Continuously changing properties are modelled
as trajectories. Formally, the holds-at axiom which
gives value to a continuously changing property is:
holds_at(p, t2) ←
    happens_at(e, t1) ∧ initiates(e, q) ∧ t1 < t2 ∧
    not broken_during(q, ⟨t1, t2⟩) ∧
    trajectory(q, t1, p, t2)
In this axiom, the continuously changing property p
can be assigned a given value at a time point t2 if
an instance of the relevant dynamic property q is initiated at a time point t1 (before t2) and not broken
at some point between t1 and t2. The predicate trajectory describes the functional relationship between
the continuously changing property and the time that
has elapsed since it started to change. It can be seen
as a path plotted against time through the corresponding quantity space. The formula trajectory(q, t1, p, t2)
represents that property p holds at time t2 on the trajectory of the period of continuous change represented
by q which starts at time t1. Such a property p holds
only instantaneously and represents that some quantity varying continuously has a particular value. Its
definition is domain specific. That is, a set of trajectory clauses is also part of the description of the domain, along with the domain's initiates and terminates
clauses.
For example, suppose that the angular position of
a crankshaft increases linearly with time whilst the
shaft is rotating. If w is the angular velocity of the
crankshaft, we have the following domain axiom:

trajectory(rotating, t1, angle(a2), t2) ←
    holds_at(angle(a1), t1) ∧ a2 = w(t2 − t1) + a1
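A small Prolog rendering of this continuous-change machinery may help (again an illustration only, not the paper's code: the event names, the angular velocity of 2 degrees per time unit, the integer time-points, and the use of a separate holds_at_start/2 fact to sidestep the indirect recursion are all assumptions):

    % The crankshaft starts rotating at time 0; no stopping event is asserted.
    initiates(start_rotation, rotating).
    terminates(stop_rotation, rotating).
    happens_at(start_rotation, 0).

    omega(2).                      % assumed angular velocity (degrees per time unit)
    holds_at_start(angle(0), 0).   % initial value of the changing quantity

    % Domain-specific trajectory clause: the angle grows linearly while rotating.
    trajectory(rotating, T1, angle(A2), T2) :-
        holds_at_start(angle(A1), T1),
        omega(W),
        A2 is A1 + W * (T2 - T1).

    % Value of a continuously changing property at T2 (simplified holds_at axiom).
    holds_at(P, T2) :-
        happens_at(E, T1), initiates(E, Q), T1 < T2,
        \+ broken_during(Q, T1, T2),
        trajectory(Q, T1, P, T2).

    broken_during(Q, T1, T2) :-
        happens_at(E, T), terminates(E, Q),
        T > T1, T < T2.

    % ?- holds_at(angle(A), 10).   % A = 20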
3   Representing Time Granularity
This section first introduces the notion of temporal
universe as a set of related, differently grained temporal domains. Such a notion supports the definition of
the relations of indistinguishability and distinguishability among the time-points of the domains. Then, it
precisely states the linkage between events and properties, and time granularity.
3.1   The Temporal Universe
Providing a representation with time granularity requires introducing a finite set of disjoint temporal domains that constitutes the temporal universe of the
representation:
T = T1 ∪ T2 ∪ ... ∪ Tn
The set {T1, T2, ..., Tn} is totally ordered on the basis of
the degree of fineness (coarseness) of its elements and,
for each i, with 1 ≤ i < n, Ti+1 is said to be of a finer granularity than Ti. Each domain is discrete with the possible exception of the finest domain that may be dense.
For the sake of simplicity, we assume that each domain is denumerable. The temporal universe includes
at most one dense domain because each dense domain
is already at the finest level of granularity, since it
allows any degree of precision in measuring time distances. As a consequence, for dense domains we must
distinguish granularity from metric, while for discrete
domains we can define granularity in terms of set cardinality and assimilate it to a natural notion of metric 3 .
For each pair of domains Ti, Ti+1, a mapping is
defined that maps each time-point of Ti into a time-interval of Ti+1 (totality); in mathematical expressions, SUCCi(t) denotes this time-interval. The mapping maps contiguous time-points into contiguous, disjoint time-intervals
(contiguity), preserving the ordering of the domains
(order preserving). Moreover, the union set of the
time-intervals of Ti+1 belonging to its range is equal
to Ti+1 (coverage). Finally, we assume that the length
of the time-intervals into which it maps the time-points
of Ti is constant (homogeneity). This constant, denoted by Δi,i+1, defines the conversion factor between
Ti and Ti+1, which provides a relative measurement of
the granularity of Ti and Ti+1 with respect to each
other. A general mapping between Ti and Tj, with
Ti coarser than Tj, can be easily obtained by a suitable composition of a number of elementary mappings. It is formally defined in a recursive way in
[Corsetti et al. 1991a], where it is also shown that the
properties of totality, contiguity, order preserving, coverage and homogeneity are preserved.
In general, there are several ways to define these mappings, each satisfying the required properties. According to the intended meaning of the mappings as decomposition functions, each time-point of Ti is mapped
into the set of time-points of Ti+1 that compose it.
Nevertheless, we are faced with a number of alternative possibilities in fixing the reference time-point of
each domain. Choosing one or the other is merely
a matter of convention, but it determines the actual
form of the mappings. In the following, we assume that,
for each pair Ti, Tj, the relevant function maps the
reference time-point of Ti into a time-interval of Tj
whose first element is the reference time-point of Tj
(reference time-points alignment assumption).
3 "Mapping, say, a set of reals into another set of reals would only mean changing the unit of measure with no semantic effect. Just in the same way one could decide to describe geometric facts by using, say, Kmeters and centimetres. However, if Kmeters are measured by real numbers, the same level of precision as with centimetres can be achieved. Instead, the key point in time granularity is that saying that something holds for all days in a given interval does not imply that it holds every second within the 'same' interval" [Corsetti et al. 1991c].

To include the notion of temporal universe in the
Event Calculus, we introduce the predicate value-metric, which splits each time-point (1st argument)
into a metric (2nd argument) and a value (3rd argument) component. Moreover, we express metrics as
a subset of the integers. Let us consider a temporal universe consisting of hours, minutes and seconds, and
assign by convention the metric 1 to the domain of
seconds (in general, metric 1 is assigned to the finest
domain), the metric 60 to the domain of minutes (1
minute corresponds to 60 seconds) and the metric 3600
to the domain of hours (1 hour corresponds to 3600 seconds). As an example, value-metric(2hrs30m, 60, 150),
since 2 hours and 30 minutes correspond to 150 minutes. Using the predicate value-metric, decomposition functions can be defined as follows:
fine_grain_of(⟨t1, t2⟩, t) ←
    value_metric(t, m, v) ∧
    value_metric(t1, m1, v1) ∧
    value_metric(t2, m1, v2) ∧ m1 ≤ m ∧
    v1 = v * (m/m1) ∧ v2 = (v + 1) * (m/m1) − 1
Given a pair of domains Ti, Tj, with Ti coarser grained
than Tj, for each time-point tj of Tj we also define as its
coarse grain equivalent on Ti the time-point ti of Ti
such that tj belongs to the time-interval obtained by
applying the corresponding decomposition function to
ti. The uniqueness of the coarse grain equivalents can be
easily deduced from the definition of the decomposition
functions. Coarse grain equivalent functions can also
be defined using the predicate value-metric as follows:
coarse_grain_of(t2, t1) ←
    value_metric(t1, m1, v1) ∧
    value_metric(t2, m2, v2) ∧
    m1 ≤ m2 ∧ v2 = (v1 * m1) // m2

where (v1 * m1) // m2 denotes the integer division of
(v1 * m1) by m2.
The relationships of temporal ordering can be generalized to make it possible to compare two time-points
belonging to different temporal domains as follows:
is_after(t2, t1) ←
    value_metric(t1, m, v1) ∧
    coarse_grain_of(t, t2) ∧
    value_metric(t, m, v) ∧
    v1 < v

is_after(t2, t1) ←
    value_metric(t2, m, v2) ∧
    coarse_grain_of(t, t1) ∧
    value_metric(t, m, v) ∧
    v < v2
The is_before predicate can be easily defined in a similar way.
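The following Prolog sketch (ours, not the paper's) renders these definitions for a temporal universe of hours, minutes and seconds; representing time-points as Value-Metric pairs is purely an illustrative convention:

    % A time-point is written Value-Metric, e.g. 150-60 is "150 minutes"
    % (metric 1 = seconds, 60 = minutes, 3600 = hours).
    value_metric(V-M, M, V).

    % Decomposition: the coarse time-point T maps to the interval T1..T2
    % of a finer domain whose metric M1 is supplied by the caller.
    fine_grain_of(T1-M1, T2-M1, T) :-
        value_metric(T, M, V),
        M1 =< M,
        T1 is V * (M // M1),
        T2 is (V + 1) * (M // M1) - 1.

    % Coarse grain equivalent: integer division yields the unique coarser point.
    coarse_grain_of(V2-M2, T1) :-
        value_metric(T1, M1, V1),
        M1 =< M2,
        V2 is (V1 * M1) // M2.

    % Ordering across domains, via coarse grain equivalents.
    is_after(T2, T1) :-
        value_metric(T1, M, V1),
        coarse_grain_of(V-M, T2),
        V1 < V.
    is_after(T2, T1) :-
        value_metric(T2, M, V2),
        coarse_grain_of(V-M, T1),
        V < V2.

    % ?- coarse_grain_of(X-3600, 150-60).    % X = 2 : 2hrs30m abstracts to 2hrs
    % ?- fine_grain_of(A-60, B-60, 2-3600).  % A = 120, B = 179 (minutes composing hour 2)
    % ?- is_after(3-3600, 165-60).           % succeeds: 3hrs is after 2hrs45m
    % ?- is_after(165-60, 2-3600).           % fails: 2hrs45m is indistinguishable from 2hrs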
The coarse grain equivalent and the decomposition
functions can be viewed as forms of simplification and
articulation along the dimension of temporal aggregation, i.e. shifts in focus through part-whole relationships among time-points, respectively. They define distinguishability and indistinguishability relations between any pair of time-points with respect to each domain of the temporal universe.
3.2   Events and Properties in the Temporal Universe
Let us now locate events and properties in the temporal universe. The idea is to directly associate a time
granularity with events and to derive the granularity
of properties on the basis of the initiates and terminates
relations.
First of all, we give a characterization of events with respect to the temporal universe. With respect to a given
domain, we distinguish instantaneous events, that happen at a time-point, and events with duration, that
take place over a nonpoint time-interval. Such a distinction among events is a relative one, so, for example,
passing from a given domain to a finer (coarser) one
an instantaneous (with duration) event may become
an event with duration (instantaneous).
With respect to the temporal universe, we distinguish
finite and infinitesimal events. An event is said to be finite
if there exists a domain with respect to which it has
duration. A finite event thus identifies an implicit level
of time granularity: at this level and coarser ones, it is
an instantaneous event; at finer levels it is of finite duration. We define such a threshold as the intrinsic time
granularity of the event. An event is said to be infinitesimal
if it is instantaneous with respect to every domain 4.
Infinitesimal events are needed for dealing with continuous change [Shanahan 1990]. Let us consider, for
instance, a process of continuous change such as a sink
filling with water. We might associate the occurrence
of an event with each new level reached by the filling
fluid. If we did this, then there would be no limit to
how fine we might choose our temporal grain in order
for the events to remain instantaneous. Thus, taking
this approach, we have a need for infinitesimal events.
Unlike the previous one, such a distinction
among events is an absolute one.
To be able to deal with instantaneous events only, we
impose that every event is associated with a domain
whose granularity is equal to or coarser than the intrinsic one of the event. In such a way, Event Calculus axioms can still be used to reason within domains. By contrast, they are insufficient by themselves to deal with events associated with different domains (differently grained events). However, reasoning across domains can be brought back to reasoning
within domains provided that there exist some rules to
relate differently grained events to the same domain.
The idea is to integrate the macro-event (section 3.3) and
continuous change (section 3.4) mechanisms with time
granularity, and to define general temporal projection
rules (section 4) that are used by default when neither macro-events nor continuous change decompositions are explicitly given.

4 The absolute instantaneousness of infinitesimal events copes with the same representational problems that suggested to Hayes and Allen the introduction of short time periods (moments) in Allen's Interval Logic [Hayes and Allen 1987].
3.3   Refining Macro-Events
We define a unifying framework for the packaging of
events and the granularity of time to describe the temporal relationships between a macro-event and its components. We require that the intrinsic time granularity
of a macro-event is coarser than those of its sub-events and that its occurrence time is a coarse grain
equivalent of the occurrence times of all its sub-events.
We also define a number of general operators, called
macro-event constructors, for specifying temporal relationships among sub-events [EQUATOR 1991] (we use
the infix notation for macro-event constructors for the
sake of simplicity) 5:
;                 sequence
delay(min, max)   minimum and maximum delay between two events
|                 alternative
||                parallelism
                  sequential repetition (n is optional)
                  parallel repetition (n is optional)
[]                composition
Let us report here the Event Calculus axiomatization
of the basic operators expressing sequence, alternative,
and parallelism.
happens_at(e1; e2, t) ←
    happens_at(e1, t1) ∧ happens_at(e2, t2) ∧
    coarse_grain_of(t, t1) ∧ coarse_grain_of(t, t2) ∧
    is_after(t2, t1)
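Under the same assumed Value-Metric representation used in the earlier sketch, the sequence constructor can be prototyped as follows (our illustration, with invented events; the granularity predicates are repeated so the fragment is self-contained):

    value_metric(V-M, M, V).
    coarse_grain_of(V2-M2, T1) :-
        value_metric(T1, M1, V1),
        M1 =< M2,
        V2 is (V1 * M1) // M2.
    is_after(T2, T1) :-
        value_metric(T1, M, V1),
        coarse_grain_of(V-M, T2),
        V1 < V.
    is_after(T2, T1) :-
        value_metric(T2, M, V2),
        coarse_grain_of(V-M, T1),
        V < V2.

    % Two events on the minutes domain: e1 at 2hrs15m, e2 at 2hrs50m.
    happens_at(e1, 135-60).
    happens_at(e2, 170-60).

    % Sequential macro-event: both components abstract to the same coarser
    % time-point and the second strictly follows the first.
    happens_at(seq(E1, E2), T-M) :-
        happens_at(E1, T1), happens_at(E2, T2),
        coarse_grain_of(T-M, T1),
        coarse_grain_of(T-M, T2),
        is_after(T2, T1).

    % ?- happens_at(seq(e1, e2), T-3600).   % T = 2 : the sequence occurs at 2hrs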
The operator expressing sequence deserves further
consideration. It allows us to deduce the occurrence of
a macro-event at a time-point of a domain coarser than
the domain(s) the occurrence times of its component
events (possibly macro-events in their turn) belong to.
Such a time-point is a coarse grain equivalent of both
the occurrence times of components. Then, the rule for
sequential macro-events first executes a comparison of
time-points with respect to the finer domain and then
it abstracts them into a time-point of the coarser one.
The presence of this switching to a coarser domain
makes the definition of sequential macro-events incomplete. Consider the following example. Given the occurrences of three events e1, e2 and e3 at time-points
2hrs15m, 2hrs42m and 2hrs50m, respectively, we are
not able to deduce the occurrence of a sequential event
e1; [e2; e3] at time-point 2hrs when the temporal universe is {..., hours, minutes, ...}. In fact, there is no
way of strictly ordering e1 and the macro-event into
which e2 and e3 can be abstracted, because the occurrence time of the macro-event is a coarse grain equivalent of the occurrence time of e1. To make it possible
to derive the occurrence of the macro-event e1; [e2; e3],
the temporal universe has to be extended with the
domain of 30-minutes (similar considerations hold for
the macro-event [e1; e2]; e3). However, it is easy to
find another sequence that cannot be abstracted into
a sequential macro-event even with respect to the extended
temporal universe. Such an incompleteness is due
to the fact that mappings between temporal domains
are fixed once and for all and is thus inherent to the
upward temporal projection involved in macro-event
derivation rules (section 4.1).
happens_at(e1|e2, t) ←
    happens_at(e1, t) ∧ not happens_at(e2, t)

happens_at(e1|e2, t) ←
    not happens_at(e1, t) ∧ happens_at(e2, t)

happens_at(e1||e2, t) ←
    happens_at(e1, t) ∧ happens_at(e2, t)

In general, domain axioms include definitions of macro-events in terms of a suitable composition of sub-events.
An example of these domain axioms is:

happens_at(e, t) ←
    happens_at([e1; [e2||e3]], t) ∧
    part_of(e1, e) ∧ part_of(e2, e) ∧ part_of(e3, e)

5 Dealing with the repetition operators may require the addition of a domain composed of a single time-point to the temporal universe (absolutely coarsest domain).

3.4   Refining Continuous Change

The original approach to continuous change makes
the assumptions that the parameters of the trajectory
function are set no later than t1 and are not reset between
t1 and t2. In general, these assumptions are too restrictive. Mechanisms are required for resetting the
parameters of the trajectory function. This allows the trajectory to
be initiated with parameter values at the start of the
property, but also allows the parameters to be changed
during the interval of validity of the property. In this
way, the trajectory may model 'non-linearities' (e.g. a
change in the rate of a linear increase of a temperature) without interrupting the relevant dynamic property (e.g. by splitting a 'temperature rising' property
when the rate of rise changes).

To take into account the resetting of parameters, the
original axiom can be replaced by the following one:
holds_at(p, t) ←
    value_metric(t, m, v) ∧
    happens_at(e, t1) ∧ value_metric(t1, m, v1) ∧
    v1 < v ∧ initiates(e, q) ∧
    not broken_during(q, ⟨t1, t⟩) ∧
    happens_at(e', t2) ∧ value_metric(t2, m, v2) ∧
    v2 < v ∧ initiates(e', par) ∧
    not broken_during(par, ⟨t2, t⟩) ∧
    max(t1, t2, ti) ∧ trajectory(q, par, ti, p, t)
A continuously changing property p can be assigned a
given value at a time point t if an instance of the relevant dynamic property q is initiated at a time point t1
(before t) and not broken at some point between t1 and
t, the relevant parameter par is (re)set at a time point
t2 (before t) and not broken at some point between t2
and t, and the initial value of p is calculated (by the
trajectory predicate) at the time point ti, which is the
maximum between t1 and t2; the 'max' predicate
has the obvious definition. The crankshaft example of
section 2.2 must be rewritten according to the revised
axiom as:

trajectory(rotating, velocity(w), ti, angle(a), t) ←
    value_metric(ti, m, vi) ∧ value_metric(t, m, v) ∧
    holds_at(angle(ai), ti) ∧ a = w(v − vi) + ai
The indirect recursion on the predicate trajectory (or,
equivalently, on the predicate holds_at) stops when the
initial values of the configuration variables, e.g. the
angular position, are reached. They can be explicitly
asserted or derived from the occurrence of independent
events.
The application of the refined axiom for continuous
change is not restricted to discrete resetting of parameters; it can be used to deal with continuously changing parameters too. In such a case, the occurrence of
the continuous events of resetting can be derived from
the continuous change of the configuration variables
by means of appropriate domain axioms.
Continuous events can be either acquired by the external environment or computed according to explicit
laws. In both cases, we generally need to plot them at
regular time intervals to make the model computable.
Choosing the width of the time interval is equivalent to
choosing the time granularity at which the process is described. Then, a change in the frequency of plotting
is equivalent to the switching of a continuous process
from one time granularity to another.
4   Reasoning with Time Granularity
We distinguish two basic modalities of relating differently grained events, namely upward and downward
temporal projection. Upward (downward) projection
determines the temporal relations set up by two events
ei and ej which occur at the time-points ti ∈ Ti and
tj ∈ Tj, respectively, with Ti coarser than Tj, by upward (downward) projecting ej (ei) on Ti (Tj).
4.1   'Naive' Upward Projection
The 'naive' upward projection is a quite straightforward approach to abstractive temporal reasoning. It
states that the upward projection of an event e that
occurs at a time-point t of a domain Tj on a domain Ti,
coarser than Tj, is accomplished by simply replacing
t with its coarse grain equivalent on Ti [Evans 1990].
Then the temporal ordering and distance between two
events ei and ej which occur at the time-points ti ∈ Ti
and tj ∈ Tj, respectively, are determined on the basis
of the relation between ti and the coarse grain equivalent of tj on Ti. Moreover, if ei (ej) precedes ej (ei)
then the properties initiated by ei (ej) and terminated
by ej (ei) hold over the time-interval identified by
ti and the coarse grain equivalent of tj. To formalize
upward projection in the Event Calculus, we first extend the definition of the occurrence time of an event
as follows:
happens_at(e, t1) ←
    happens_at(e, t2) ∧ coarse_grain_of(t1, t2)
In this way, each event is endowed with several occurrence times belonging to different domains, i.e. the
time-point at which it originally occurs and all the
coarse grain equivalents of such a point. Combined
with the macro-event derivation rules, upward projection allows us to deduce the occurrence of parallel and alternative macro-events at time-points of domains coarser than the domains at which their components occur.
Upward projection can be seen as a simplification rule
[Hobbs 1985], because it allows us to derive a relation of temporal indistinguishability, i.e. simultaneity,
among events from the relation of indistinguishability
among time-points defined by coarse grain equivalent
functions.
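A toy Prolog rendering of naive upward projection (ours; the events, the times, and the separation of recorded occurrences into occurs/2 facts, added to keep the program terminating, are assumptions):

    % Granularity predicates as in the earlier sketch.
    value_metric(V-M, M, V).
    coarse_grain_of(V2-M2, T1) :-
        value_metric(T1, M1, V1),
        M1 =< M2,
        V2 is (V1 * M1) // M2.

    % Domains of the temporal universe, identified by their metrics.
    metric(1).      % seconds
    metric(60).     % minutes
    metric(3600).   % hours

    % Recorded (original) occurrences.
    occurs(e_fine,   75-60).    % 1hr15m, on the minutes domain
    occurs(e_coarse, 1-3600).   % 1hr,    on the hours domain

    % An event happens at its recorded time-point ...
    happens_at(E, T) :- occurs(E, T).
    % ... and, by naive upward projection, at every strictly coarser equivalent.
    happens_at(E, V-M) :-
        metric(M),
        occurs(E, V0-M0),
        M > M0,
        coarse_grain_of(V-M, V0-M0).

    % ?- happens_at(e_fine, T), happens_at(e_coarse, T).
    %    T = 1-3600 : the two differently grained events become simultaneous
    %    (indistinguishable) on the hours domain.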
Then, the Mholds-for predicate is redefined to constrain the starting and the ending time-points of the
time-interval to belong to the same domain:
Mholds-for(p, ⟨start, end⟩) ←
    happens_at(e, start) ∧ initiates(e, p) ∧
    value-metric(start, m, vs) ∧
    happens_at(e', end) ∧ terminates(e', p) ∧
    value-metric(end, m, ve) ∧
    vs < ve ∧ not broken-during(p, ⟨start, end⟩)
together with a similar axiom for the predicate broken-during.
In such a way, the predicate Mholds-for identifies several time-intervals of different domains over which the
709
properties initiated and terminated by two differently
grained events hold.
Despite its apparent simplicity, upward projection involves a number of semantic assumptions. The
most relevant one is related to its application to contradictory events, i.e. events that cannot occur simultaneously. We formally define two events as contradictory if they initiate or terminate incompatible properties. The definition of the relation of incompatibility
among properties depends on domain-specific knowledge [Kowalski and Sergot 1986].
Upward projection maintains the weak temporal ordering between events, but it does not always preserve the
strict one. Hence, the logical consistency of the upward
projection cannot be guaranteed in the general case,
because it may force contradictory events to occur
at the same time-point in a coarser domain. As a consequence, if two differently grained events are contradictory the coarse grain equivalent of the occurrence
time of the fine grained event must be different from
the occurrence time of the coarse-grained event. This
is guaranteed by the following integrity constraint 6 :
← happens_at(e1, t) ∧ happens_at(e2, t) ∧
  contradictory(e1, e2)
Moreover, upward projection may change the ratio between the widths of time-intervals. That is, given two
domains Ti and Tj, with Ti coarser than Tj, the coarse
grain equivalents on Ti of two pairs of time-points of
Tj which are at the same temporal distance may be at
a different one, while the coarse grain equivalents on Ti
of two pairs of time-points of Tj that are at a different
temporal distance may be at the same one.
Such a weakness of the 'naive' upward projection will
be overcome by refining upward projection according to
the downward projection schema we are going to define.
4.2   Downward Projection
The downward projection of an event e that occurs at
a time-point t of a domain Ti on a domain Tj finer
than Ti is accomplished by applying the following decomposition scheme: for each event e that occurs at a
time-point t of Ti there exist two infinitesimal events
ei and ef that occur at the time-points ti and tf of
Tj, respectively, and such that (i) ti ≤ tf; (ii) t is the
coarse grain equivalent on Ti of both ti and tf; (iii)
for each property p such that p is terminated by e,
there exists an event ep that occurs at tp of Tj such
that ep terminates p and ti ≤ tp ≤ tf; (iv) for each
property q such that q is initiated by e, there exists
an event eq that occurs at tq of Tj such that eq initiates q and ti ≤ tq ≤ tf; (v) the (type of the) event
e becomes a dynamic property that is initiated by ei
and terminated by ef with respect to Tj. Because
an event is defined by the properties that it initiates
and terminates, such rules provide the definition of the
component events ei, ef, ep and eq.

6 This solution can be generalized by making contradiction dependent on granularities or even on time instants.
In such a way, the simultaneous occurrence of two events can
be classified as contradictory only in certain domains, or even only in
certain time instants of them.
The relevant integrity constraint becomes:

← happens_at(e1, t) ∧ happens_at(e2, t) ∧
  contradictory(e1, e2, t)
Downward projection can be seen as an articulation
rule [Hobbs 1985]. From the relation of distinguishability among the time-points of the finer domain introduced by the decomposition function, in fact, it derives a relation of temporal distinguishability among
the sub-events of a given finite event.
Let us formalize this scheme in the Event Calculus. First of all, we define two functions, begin and
end, that map a given instance of a macro-event into
its initiating and terminating events, respectively. The
occurrence of such events can be deduced from the occurrence of the macro-event by means of the following
axioms:
happens_at(begin(e, t), time(begin(e, t))) ←
    happens_at(e, t) ∧
    coarse_grain_of(t, time(begin(e, t)))

happens_at(end(e, t), time(end(e, t))) ←
    happens_at(e, t) ∧
    coarse_grain_of(t, time(end(e, t)))
together with (condition (ii)):
coarse_grain_of(t, time(begin(e, t)))
coarse_grain_of(t, time(end(e, t)))
where time(begin(e, t)) and time(end(e, t)) denote the
occurrence times of begin(e, t) and end(e, t), respectively.
Condition (i) is expressed by the following integrity
constraint:
← is_after(time(begin(e, t)), time(end(e, t)))
Let us now represent ep and eq by means of two functions, term and in. For each property p (q), term
(in) maps each instance of a given macro-event into
the component event that terminates (initiates) such
a property. Using these functions, conditions (iii) and
(iv) are codified by the following axioms:
7 When ti and tf coincide, the events ei, ep, eq and ef
are merged into the original single event e. This is always
the case for the downward projection of infinitesimal events.
For instance, the infinitesimal event of switching on the
light remains instantaneous with respect to all the domains
of the temporal universe composed of {Day, Hour, Minute}.
terminates(term(e, t, p), p) ← terminates(e, p)
initiates(in(e, t, q), q) ← initiates(e, q)
together with:

← is_after(time(begin(e, t)), time(term(e, t, p))) ∨
  is_before(time(end(e, t)), time(term(e, t, p)))

← is_after(time(begin(e, t)), time(in(e, t, q))) ∨
  is_before(time(end(e, t)), time(in(e, t, q)))
for each property p and q.
Finally, condition (v) is expressed by the following axioms:
initiates(begin(e, t), e)
terminates(end(e, t), e)
They allow us to state that the property e holds
over ⟨time(begin(e, t)), time(end(e, t))⟩ by means of
the Mholds-for axiom. These last axioms provide each
temporal object with a twofold event/property characterization. That is, (the type of) an event e, associated
with a given domain, may become a dynamic property
with respect to a finer domain, and vice versa.
Let us consider, as an example, the event of flying from
Milan to Venice. With respect to the domain TH of
hours it can be modeled as an instantaneous event that
occurs at a time-point t of TH. Such an event terminates the property of being in Milan and initiates the
property of being in Venice. With respect to the domain TM of minutes, it can be decomposed into a pair
of infinitesimal events flying_i and flying_f that occur
at the time-points ti and tf of TM, respectively, with
ti ≤ tf, and such that t is the coarse grain equivalent
of both. Moreover, flying_i terminates the property
of being in Milan and initiates the property of flying,
while flying_f terminates the property of flying and
initiates the property of being in Venice.
4.3   'Revised' Upward Projection
The event/property duality introduced by downward
projection suggests an extension of the upward projection rules to cope with contradictory events without
restrictions. When the coarse grain equivalents of two
contradictory events coincide the downward projection
schema suggests merging and replacing the events by a
macro-event corresponding to the conjunction of the
properties initiated by the first one and terminated by
the second one. Moreover, such a macro-event terminates (initiates) all the properties terminated (initiated) by its first (second) component and every property terminated (initiated) by the second (first) component which is not initiated (terminated) by the first
(second) component.
Let us consider, as an example, the events of leaving
station A and arriving at station B of a train. The
first one terminates the property of the train of being
at station A and initiates the property of moving, while
the second one terminates the property of moving and
initiates the property of being at station B. Let T be a
domain with respect to which the two events are simultaneous. According to the revised upward projection
rules they are merged and replaced by the event of
moving that terminates the property of being at station A and initiates the property of being at station
B.
The actual structure of the corresponding macro-event can be given in terms of a suitable composition of
the component events using macro-event constructors.
Consider two contradictory events e1 and e2. If their
temporal ordering is known and meaningful, e.g. e1
precedes e2, then the corresponding macro-event e is a
sequential one, that is, e1; e2; if their temporal ordering is meaningless (their global effect does not change
even if their ordering changes), and possibly unknown,
then the corresponding macro-event is a parallel one,
that is e1||e2; if their temporal ordering is meaningful
and unknown, then the corresponding macro-event is
[e1||e2] | [[e1; e2] | [e2; e1]]; and so on.
The last one is the case, for instance, of events of rotation around orthogonal axes in the three dimensional
space which are not commutative, that is, the final
configuration of the rotating system depends on the
ordering of their occurrences.
5   Conclusion
The paper has proposed embedding the notion
of time granularity into a logic-based representation
language. Firstly, it enumerated a number of notational and computational reasons that motivate the
introduction of time granularity and briefly surveyed
and discussed the existing relevant literature. Then, it extended the Event Calculus to deal with time
granularity by introducing the concepts of temporal
universe, finite and infinitesimal events, macro-events,
and continuously changing events and properties. Finally, it provided the Event Calculus with the axioms supporting upward and downward temporal projection.
Acknowledgements
We would like to thank Chris Evans of Goldsmiths'
College, University of London, and Murray Shanahan
of Imperial College for the useful discussions we had
with them.
The research for this paper was partially funded by the
European Community ESPRIT Program, EQUATOR
Project no. 2409 [EQUATOR 1991]. Collaborating organizations are CENA (France), CISE (Italy), EPFL
(Switzerland), ERIA (Spain), ETRA (Spain), Ferranti
Computer Systems Ltd. (UK), Imperial College (UK),
LABEN (Italy), Politecnico di Milano (Italy), SWIFT
(Belgium), SYSECA (France), UCL (UK). The work
of CISE was also partially funded by the Automatica Research Center (CRA) of the Electricity Board of Italy
(ENEL) within the VASTA project.
References
[Allen 1983] Allen, J., Maintaining Knowledge about
Temporal Intervals; Communications of the
ACM, 26, 11, 1983.
[Allen 1984] Allen, J., Toward a General Theory of
Action and Time, Artificial Intelligence, Vol.
23, No.2, July 1984.
[Clifford and Rao 1988] Clifford,J., Rao,A., A Simple, General Structure for Temporal Domains
in Temporal Aspects in Information Systems,
Rolland, C., Bodart, F., Leonard, M. (Eds.),
IFIP 1988.
[Ciapessoni et al. 1992] Ciapessoni,E., Corsetti, E.,
Montanari,A., San Pietro,P., Embedding Time
Granularity in a Logical Specification Language
for Synchronous Real-Time Systems; submitted to Science of Computer Programming,
North-Holland, January 1992.
[Corsetti et al. 1990] Corsetti, E., Montanari, A.,
Ratto, E., A Methodology for Real-Time System Specifications based on Knowledge Representation; in Computational Intelligence, III,
N. Cercone, F. Gardin, G. Valle (Eds.), North-Holland, Proc. of the International Symposium, Milan, Italy, 24-28 September, 1990.
[Corsetti et al. 1991a] Corsetti, E., Montanari, A.,
Ratto, E., Dealing with Different Time Granularities in Formal Specification of Real-Time
Systems; The Journal of Real-Time Systems,
Vol. III, Issue 2, June 1991.
[Corsetti et al. 1991b] Corsetti, E., Montanari, A.,
Ratto, E., Time Granularity in Logical Specifications, Proc. 6th Italian Conference on Logic
Programming, Pisa, Italy, June 1991.
[Corsetti et al. 1991c] Corsetti, E., Crivelli, E., Mandrioli, D., Montanari, A., Morzenti, A., San
Pietro, P., Ratto, E., Dealing with Different
Time Scales in Formal Specifications; Proc. 6th
International Workshop on Software Specification and Design, Como, Italy, October 1991.
[Dean and Boddy 1988] Dean, T., Boddy, M., Reasoning about partially ordered events; Artificial
Intelligence, 36, 1988.
[Dean et al. 1988] Dean, T., Firby, R., Miller, D., Hierarchical Planning involving deadlines, travel
time and resources; Computational Intelligence, 4, 1988.
[Dean 1989] Dean, T., Using Temporal Hierarchies to Efficiently Maintain Large Temporal
Databases; Journal of ACM, 36, 4, 1989.
[Evans 1990] Evans, C., The Macro-Event Calculus: Representing Temporal Granularity; Proe.
PRICAI, Japan 1990.
[Fum et al. 1989] Fum, D., Guida, G., Montanari, A.,
Tasso, C., Using Levels and Viewpoints in
Text Representation; in Artificial Intelligence
and Information-Control Systems of Robots-89, North-Holland, I. Plander (Ed.), Proc.
5th International Conference, Strbske Pleso,
Czechoslovakia, 6-10 November, 1989.
[Galton 1987] Galton, A., The Logic of Occurrence; in
Temporal Logics and their applications, Galton
A., (Ed.), Academic Press, 1987.
[Giunchiglia and Walsh 1989] Giunchiglia, F., Walsh,
T., Abstract Theorem Proving; Proc. 11th IJCAI, Detroit, USA, 1989.
[Greer and McCalla 1989] Greer, J., McCalla, G., A
Computational Framework for Granularity and
its Application to Educational Diagnosis; Proc.
11th IJCAI, Detroit, USA 1989.
[EQUATOR 1991] Formal Specification of the GRF
and CRL; CISE, FERRANTI, SYSECA (Eds.),
ESPRIT Project no. 2409 EQUATOR, Deliverable D123-1, 1991.
[Hayes and Allen 1987] Hayes, P., Allen, J., Short
Time Periods; Proc. 10th IJCAI, Milano, Italy
1987.
[Hobbs 1985] Hobbs, J., Granularity; Proc. 9th IJCAI, Los Angeles, USA 1985.
[Kautz and Allen 1986] Kautz, H., Allen, J., Generalized Plan Recognition; Proc. AAAI, 1986.
[Kowalski and Sergot 1986] Kowalski, R., Sergot, M.,
A Logic-based Calculus of Events; New Generation Computing, 4, 1986.
[Maim 1991] Maim, E., Reasoning with Different Granularities in CRL; Proc. IMACS '91, Dublin,
Ireland, July 1991.
[Maim 1992a] Maim, E., Uniform Event Calculus
[Figure 7: average number of active contexts in the sample programs (append 100, nreverse 30, qsort 50, primes 100, 8 queens).]
program, and this is over twice as big as in other programs
such as quick sort 50 and 8 queens. This is because it
takes about 120 instructions to perform an integer division which is required in primes 100. For other similar
programs which require multiplication and/or division
of integer and/or floating point, low performance is also
expected. But, because the management processor has
its own FPU (floating point unit) in the IUs of PIE64,
UNIREDll can pass such calculation to the MP and can
concentrate on reducing goals. However, the evaluation
has not been done yet.
4.3   Tolerance of Remote Access Latency
To evaluate tolerance of remote memory access latency,
we incorporated a pseudo-remote access mechanism in
the simulator, in spite of its single-processor model, as shown in figure 5.

[Figure 8: all sorts of clock cycles vs. remote memory access (8 queens, the maximum number of contexts = 1). Figure 9: the same with the maximum number of contexts = 2. Figure 10: the same with the maximum number of contexts = 4. Figure 11: effects of dereference instructions (speed up of nreverse 30, qsort 50, primes 100 and 8 queens under the conditions none, +derf, +dfcl/dfcc, +dcll/exll).]

In more detail, we change the value
of the IU-identifier field of the pointers included in every
goal when reduction of the goal starts or resumes after
suspension, with the probability which we call remote
pointer ratio. Remote memory access commands issued
by UNIREDII are emulated by the command processor
shown in figure 5 with cycles listed in table 3. Under
these conditions, we varied the maximum number of the
contexts from one to four, and measured the clock cycles
required by all sorts of the pipelined execution of instructions using the 8 queens program. Results are shown in
figures 8 to 10. In these figures, the lowest part (shadowed) of the graph represents the number of executed
instructions, the second part (hatched) represents the
number of invalidated instructions by some jumps, the
third part (lightly shadowed) the number of cycles while
the internal pipeline of UNIREDII holds, and the fourth,
uppermost part (white) the number of cycles while the
pipeline is sleeping because, waiting for some replies,
no contexts can be executed.
In figure 8, the multi-context processing mechanism of
UNIREDII is not activated because the maximum number of active contexts is set to one. Therefore the
pipeline sleeping time (the white part of the graph) cannot be hidden and becomes longer and longer as the remote memory access increases. Moreover, the pipeline
hold time and the amount of invalidated instructions are
large because the pipeline interlock occurs frequently.
In the other two figures (figures 9 and 10), the multi-context processing mechanism works, and works more effectively as the number of contexts increases. The pipeline
sleeping time is least in figure 10 and the pipeline
interlock (the pipeline hold and the instruction invalidation) hardly occurs in that figure. They become a little
longer as the remote memory access increases because
the average number of the active contexts decreases. Figure 9 shows an intermediate state between figures 8 and
10.
4.4   Effects of Dedicated Instructions
Finally, we present the effect of the dereference instructions, which are most characteristic of the instruction
set of UNIREDII. Figure 11 shows the speed-up for
four sample programs (naive reverse 30, quick sort 50,
primes 100, 8 queens) without the dereference instructions
(the dereference instructions are resolved into more basic instructions), with only the basic dereference (derf)
instruction, with the dereference-and-check-list/constant (dfcl/dfcc) instruction, and with all the combined instructions such as the dereference-and-check-list-load-car/execute-on-list-load-car (dcll/exll) instruction, respectively.
In the figure, the speed-up of the basic dereference instruction is about 10% except in the primes 100 program, in which the majority of the executed instructions
are arithmetic ones. In addition, the combined instructions have their effect as shown, and the total effect
of these instructions is about 30%, except for primes 100.
Therefore it can be said that the dereference instructions
have a great effect.
Acknowledgements
We specially thank Prof. J. A. Robinson for much helpful
advice. And we also thank the members of the group
SIGLE in our laboratory, namely Tadashi Saito, Eiichi
Takahashi, Minoru Yoshiada, Takeshi Shimizu, Yasuo
Hidaka, Jun'ichi Tatemura, Hidemoto Nakada, Kei Yamamoto, Hajime Maeda, Shougo Shibauti, and Takashi
Matsumoto. This work was supported by Grant-in-Aid
for Specially Promoted Research (No.62065002), and is
now supported by Grant-in-Aid for Encouragement of
Young Scientists (No.03001269) of the Ministry of Education, Science and Culture.
5   Discussion
In the previous subsection, we presented the effect of the
dereference instructions and the combined ones. One
point is that they are not such complicated instructions.
In the hardware design, the instruction decoder does not
include the critical path which actually determines the
maximum clock rate of UNIREDII. The critical path
is included in reading the general-purpose register file and in the
ALU calculation. Moreover, all of the instructions of UNIREDII are single-cycle instructions because they jump
to themselves recursively when they need more cycles
to complete their action, as described before in section
3.4.1.
Owing to these dedicated instructions, we can compile
Fleng programs so that the number of executed instructions is minimized. As a result, we can achieve high
performance though the clock rate is comparatively slow,
10 MHz.
Finally, we shall mention the effect of the multi-context
processing of UNIREDII. As well as reducing the overhead of
inter-processor synchronization, we can reduce pipeline interlock with it, so that we can turn the pipeline of UNIREDII into an interlock-free one.
6   Conclusion
We have described the architecture of the inference processor UNIREDII and evaluated some aspects of it. We
obtained a performance of about 1 MRPS with a 10 MHz clock,
and confirmed that the multi-context processing of
UNIREDII has a big effect on reducing pipeline interlocking and on reducing the overhead of remote memory
access latency. In the future, we will evaluate it with larger,
real application programs. And, of course, we will make
the real UNIREDII chip work as a PIE64 system element.
Hardware Implementation of Dynamic Load Balancing
in the Parallel Inference Machine PIM/c
T. NAKAGAWA, N. IDO, T. TARUI, M. ASAIE and M. SUGIE
Central Research Laboratory, Hitachi Ltd.
Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan
ABSTRACT

This paper proposes and evaluates the hardware implementation required for dynamic load balancing in the prototype PIM/c of the Parallel Inference Machine (PIM). In fine grain multiprocessing, dynamic load balancing suffers from high overhead due to the frequent access to load information. The proposed hardware can reduce the overhead by speeding up the access to the load information. In order to utilize the high locality of logic programs, PIM/c is configured along a hierarchical structure of network-connected clusters, each of which is a bus-connected multiprocessor. Therefore two kinds of hardware, suitable for each hierarchy, are implemented for dynamic load balancing.

First, in the clusters, we propose a register with a broadcast write feature. The evaluation determines the reduction of overhead due to memory polling which detects a load request. The proposed hardware reduces the execution time of logic programs by 15%.

Second, in the network, we propose the use of a shortcut path to request the value of the total load within a cluster. The evaluation shows that the overhead due to the request of that value is reduced as a result of introducing the shortcut path. The proposed hardware reduces the execution time by 50%.

The results obtained confirm that the use of hardware can reduce the high overhead of dynamic load balancing.
1. INTRODUCTION
Japan's Fifth Generation Computer project [1] has been centered around ICOT (the Institute for New Generation Computer Technology). ICOT has developed the parallel logic programming language KL1 (Kernel Language-1) [2] to describe knowledge and information processing systems. ICOT has also produced software in KL1, including the PIM operating system [3].

We are currently developing the PIM/c [4] as a KL1-based machine. A hierarchical structure of network-connected clusters, each of which is a bus-connected multiprocessor, is introduced to utilize the high access locality of KL1 programs in PIM [5]. Use of locality could restrict the interactions to clusters of several processors and thus reduce the communications among clusters. Therefore, a double hierarchical organization is used in PIM/c.

Dynamic load balancing is one of the main research areas for PIM. As a result of the fact that logical relations are present in a KL1 program and they never define their process of execution with determinacy, dynamic load balancing must be used. For dynamic load balancing it is necessary to require load information, for example, the information about the existence of idle processors or the value of the total load within a cluster. The load information is updated and referenced by distributed processors. In other words, the load information is global, and therefore it has no locality.

A problem exists in that hardware for normal process execution in PIM/c is optimized for access with locality. With this type of hardware the latency in accessing global information is large. In fine grain multiprocessing in KL1 programs, high frequency and large latency in accessing load information produce high overhead. Therefore, extensions in hardware are introduced in order to reduce the latency of load information in PIM/c.

In shared bus multiprocessors, snooping caches are known to reduce the memory latency observed by the processors [6,9]. There are two types of cache coherency protocols for rewriting shared data with copies distributed in plural caches: invalidation-type protocols and broadcast-type protocols. The choice depends on whether it is preferable to invalidate old copies for rewriting by the same processor, or to broadcast the new data for rewriting by other processors.

Eggers [7] defined "per processor locality" as the average number of repeated write references to the same address by the same processor. For normal process execution in the KL1 system, an incremental garbage collection makes the same processor reuse the same address repeatedly for different data references [4]. Thus invalidation protocols are more suitable due to high "per processor locality".

For dynamic load balancing, broadcast protocols are preferable in order to access load information efficiently. Although protocols using both invalidation and broadcast features are known as "competitive snooping protocols" [8], the cache is insufficient to reduce the latency in accessing load information within the cluster of bus-connected multiprocessors. Thus the snooping cache in PIM/c utilizes an invalidation protocol, and the implementation of a broadcast feature is also considered, not for the cache, but for registers, to reduce the latency more efficiently.

In network-based multiprocessors, for normal process execution, it is more important to increase the throughput than to reduce the latency, because the "non-busy-waiting" feature could overcome the large latency [4]. The PIM/c network unit has message queues to increase the throughput, although they produce an increase in latency. For dynamic load balancing, use of the old information may cause wasteful load dispatching. Therefore, a shortcut path to the message queues is introduced to reduce the latency in accessing load information through the network of PIM/c.

Hardware extensions in PIM/c require only a small amount of hardware, because the addressable space for broadcasting is limited in the shared bus, and because the increase in the number of interconnections among clusters is less than that of a system with a special purpose network [10].

2. PIM/c HARDWARE FEATURES

PIM/c has the following hardware features:

A. Hierarchical structure of shared bus multiprocessor and network based multiprocessor.

Figure 1 shows the configuration of PIM/c. It is organized along a hierarchical structure of network-connected clusters to utilize the localities of KL1 programs. Thus, the shared bus hierarchy consists of processors combined in a cluster. Each processor has its own cache, and they share a common bus. Software simulation has proved that the common bus might be a bottleneck. We concluded that the number of processors within a cluster should be limited to around eight, and that a two-way-interleaved common bus [11] should be possible in PIM/c.

We consider that utilizing the access locality makes it possible to reduce the amount of network hardware because of reducing the number of messages transferred among clusters. As a consequence, in PIM/c the network is connected only to cluster controllers (CC) instead of all processors in the cluster.

[Fig. 1. The configuration of PIM/c: 32 clusters per system and 8 PEs per cluster; a crossbar network connects the cluster controllers (CC), while the processor elements (PE) within a cluster share an interleaved shared bus and shared memory. Each cache has a capacity of 80 Kbytes and consists of 20 byte blocks.]
B. Broadcast registers in the shared bus hierarchy.

In order to reduce the access latency of load information in the shared bus hierarchy, registers with a broadcast feature are introduced in PIM (Fig. 2) [12]. We denote these registers as EFRs (Event Flag Registers). They have the following features:

• one-bit wide to indicate an event, and a fast detection feature for control jumps which checks the existence of events.
• a broadcast write feature; therefore, registers indicating the same request event to any processor can be written simultaneously.

The reference and jump can be done within a cycle. When using registers, there is no overhead due to cache misses. Each PIM/c processor has 16 EFRs.

[Fig. 2. Broadcast registers in the cluster (PE: Processing Element, EFR: Event Flag Register). Bold lines show the propagation path of a request event to broadcast registers and the broken lines show the memory polling path without hardware support. The thin lines show the reset action of that event.]

C. Shortcut path in the network hierarchy.

In order to reduce the access latency of load information in the network hierarchy, two kinds of features are introduced: a shortcut path for the specific messages (Fig. 3) [13], and registers that hold the load information, called CIRs (Cluster Information Registers). The hardware has the following features:

• a shortcut path to message queues.
• eight-bit wide registers to indicate the load information in a corresponding cluster. Each register should be written with the load information by its corresponding cluster controller. As the load information is obtained without waiting at message queues and without waiting for the cluster controllers to receive, specified registers can always be read in 11 cycles.

[Fig. 3. Shortcut path in the network (CC: Cluster Controller, CIR: Cluster Info Register). The shortcut paths and the registers exist in the router board of the packet switching network. Broken lines show the normal path through the message queues to increase the network throughput and the bold lines show the shortcut path to bypass the queues.]

3. EVALUATION STRATEGY

We defined the following two strategies to evaluate the effectiveness of the proposed load balancing hardware.

3.1 Evaluation on the Real Hardware

Real hardware was used for evaluation, as software simulation is almost impossible for the following reasons:

• The presence of the cache and the network introduce more parameters.
There are many hardware parameters related to the internal states of the cache and the network. The common bus arbitration time and the message packet switching time are examples. The overhead of cache misses and the network latency is important in this evaluation. Thus, simulating the cache and network effects concurrently with processor activities would have taken a great deal of time in software simulation.

3.2 Evaluation using an Artificial Load Model

With an aim toward further improvement, we evaluated an artificial load model for the following reasons:

• to separate the effect of hardware alone.
An evaluation independent of the specific application is necessary in order to isolate the speedup produced by the proposed hardware mechanisms.
• to separate the effect of load balancing.
The real KL1 execution environment involves many new control sequences in addition to load balancing. For example, handling the priority of loads needs another polling action using EFR registers. The total performance depends on the usage of the proposed hardware in other control sequences.

Loads are kept in distributed load pools rather than every processor using a common load pool [14]. Consequently, an explicit load balancing communication for the distributed load pools is required.

• Receiver-initiated load balancing.
The explicit load balancing communication for the distributed load pools should be initiated by fully idle processors in order to avoid wasteful dispatching. Thus the communication is request based.
• Communication with arbitrary responder.
In order to reduce the response time without interrupting busy processors, a new type of communication, the AR (Arbitrary Responder) communication, is introduced in PIM/c [12]. The request is sent to any processor which has more than one load in its load pool. In order to avoid the high overhead of context switching, every processor polls the request at intervals where the context switch overhead is low. Thus any processor which detects the request first responds to it. As the timing to detect requests differs in each PIM/c processor, this communication method is expected to reduce the response time proportionally to the number of processors in a cluster.

4. EVALUATION RESULTS

We carried out the evaluation of the proposed hardware in both shared bus and network-based hierarchies.
4.1 Evaluation of broadcast registers in the
shared-bus hierarchy
We carried out this evaluation by focusing on the
reduction of the latency to access the information about
the existence of the idle processors.
B. The load model.
This model reflects the following characteristics of
KLI program execution:
• Unit load.
We denote the unit as the reduction. The unit is
assumed to be 200 cycles in PIM/c (Fig. 4).
• Indeterminacy in the granularity of loads.
In order to simulate "Tail Recursion Optimization"
[17], we define the goal as consisting of an
A. The load balancing scheme.
The load balancing scheme is explained below:
• Distributed load pool.
Each processor has its own load pool in order to
avoid implicit data transfers between caches due to
updating a serial link in case of the generator
processor of the load differs from its consumer
arbitrary number of reductions (1 to 16).
• Indeterminacy in the number of goals .
In order to simulate the indeterminacy, we assume
that each processor generates an arbitrary number of
goals (1 to 4096).
• A high write ratio and a high share ratio.
Accesses performed within the reductions have the
727
following parameters: write ratio is 0.5, share ratio
is 0.5, where write ratio is defined as the ratio of
processor by updating their communication areas. The
evaluation measures are i and t, and the reduction cost
write references to total memory references, and
is defmed as follows:
Reduction cost = (T - I - t ) / R
share ratio is defmed as the ratio of references to
shared data area to total memory references .
• A high access locality.
Figure 5 shows the performance increase in reduction
We define the locality as the number of successive
accesses to the same address. The value is set to 4
in order to simulate free-list manipulation, which
consists of allocating, instantiating, referring and
deallocating a memory cell.
using registers. The total reduction cost and the load
request count are varied in 14 simulation cases. In this
figure, request ratio is introduced, which is defmed as
the ratio of the load request count r to the total reduction
count R. The reduction cost is almost independent of
the request ratio. This fact indicates that the memorypolling overhead caused by checking request
N
PEi
occurrences is larger than the overhead due to cache
misses using invalidation protocol. The speedup
obtained is 15% due to the use of EFRs.
300
o :Unit load
Fig. 4. A Load model with varying
granularity.
:
11.--
Ii)
Q)
~
ltll
280
U
Z
C. Results o/the evaluation in a cluster.
0
We control the initial load amount in each processor
~
:::>
270
to vary load balancing conditions. According to the
w
260
0
a:
··········l·············t:i;~··t~~r·········
···········t··~········
deviation of the initial load amounts within processors,
14 cases are simulated with an 8-processor cluster. The
250
0
resulting data are the total elapsed time (1'), the total idle
time (I), the total wait time after requesting for load (i),
:
·············t·············fWithort::EF~........... .
290
~
C/)
0
:
-61-- A._A_~-tAA~,A
i
1
i
i
0.01
0.02
0.03
0.04
0.05
REQUEST RATIO
Fig. 5. The increase in speed using regis-
the total dispatching time (t), the total reduction count
ters. The reduction cost is defined as the number of
(R) and the load request count (r). The total idle time
execution cycles per unit load. The result involves extra
includes the time spent waiting for load dispatching
cycles for probing. The request ratio is defined as the
since requesting a load by updating a bit-map word
number of request per reduction. Using memory
until receiving a load by reading a non-zero value from
its communication area, and the time to wait for
polling the reduction cost is high due to the serial
execution of a memory access and a branch. Using
termination of the whole program. The bit-map word is
EFR, both the access and the branch can be done within
a data array in which each bit corresponds to a
a cycle. The polling is done for three kinds of events;
processor requesting load. The total dispatching time
load request, load dispatching and termination of the
includes the time to select an idle processor by encoding
whole program.
the bit-map word to the address of its communication
area, and the time to dispatch a load to each idle
728
Figure 6 shows the wait time i and the dispatching
wasteful dispatching. In this scheme, the cluster to
time t as a function of request count. It is confirmed that
which goals are dispatched is determined at random
the use of EFR with broadcast feature reduces both the
wait time and dispatching time. The use of EFR reduces
the dispatching time by 20%, and reduces the wait time
by 15%.
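To make the artificial load model of Section 4.1.B and the reduction-cost metric concrete, the following is a minimal Python sketch; the function names, the use of Python's random module and the uniform random choices are illustrative assumptions of this sketch, not part of the PIM/c software.

    import random

    # Parameters taken from the artificial load model above.
    UNIT_CYCLES = 200              # one reduction is assumed to cost 200 cycles
    MAX_REDUCTIONS_PER_GOAL = 16   # goal granularity: 1 to 16 reductions
    MAX_GOALS_PER_PE = 4096        # each processor generates 1 to 4096 goals

    def generate_load_pools(num_pes, rng=random):
        """Build one load pool per processor: a list of goals, each goal being
        its reduction count (indeterminacy of granularity and of goal count)."""
        return [[rng.randint(1, MAX_REDUCTIONS_PER_GOAL)
                 for _ in range(rng.randint(1, MAX_GOALS_PER_PE))]
                for _ in range(num_pes)]

    def reduction_cost(total_elapsed, total_idle, total_dispatch, reductions):
        """Reduction cost = (T - I - t) / R: execution cycles per unit load,
        excluding idle time and dispatching time."""
        return (total_elapsed - total_idle - total_dispatch) / reductions

The 15% speedup reported above corresponds to the drop in this reduction cost when request detection is done through the EFRs instead of memory polling.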
4.2 Evaluation of shortcut paths in the network-based hierarchy
We carried out this evaluation by focusing on the reduction of the latency in accessing the value of the total load in a cluster.

A. The load balancing scheme.
The load balancing scheme is described below:
• Sender-initiated load balancing. A study of the Multi-PSI system disclosed a problem of the receiver-initiated load balancing scheme in large-scale machines, namely that load request contention may arise at busy processors [15]. In order to avoid this contention, an improved sender-initiated scheme, named "Smart Random Load Dispatching" [5], is efficient in reducing wasteful dispatching. In this scheme, the cluster to which goals are dispatched is determined at random, and the goal dispatch is then aborted on the condition that the dispatch target has more loads in its pool than the dispatching cluster.

B. The load model.
The load model among clusters is defined in such a way as to reflect the changes in the amount of loads in the load pool. The load model is as follows:
• An initial goal is denoted by L(16) (Fig. 7 shows L(5)).
• The execution of goal L(i) produces (i-1) subgoals, L(i-1), ..., L(2), L(1). Thus, the goal L(i) has 2^(i-1) reductions.
• Each reduction takes 300 cycles to execute using network messages.
• The message length required for load dispatching is 27 bytes. Thus, it takes 27 cycles to send this message through the one-byte-wide network interface. The length of the message requesting the load amount is 2 bytes.

Fig. 7. A load model with a floating amount of load.

C. Results of the evaluation among clusters.
We control the dispatching rate, which is defined as the ratio of all goals dispatched to other clusters to all executed goals, by changing the interval of the dispatching control. In order to determine the efficiency of load dispatching, the total elapsed time (T), the total idle time (I) and the dispatching rate (d) are measured. Differences result from the latency of load information.

Figure 8 shows the results obtained by applying the smart random load dispatching scheme to an 8-cluster system without the support hardware. The normalized elapsed time, which is defined as the ratio of the elapsed time of the 8-cluster system to the elapsed time of a single cluster, and the utilization of the processors are plotted as a function of the dispatching rate. In order to compare the results in the two cases, we assume that the dispatching rate is controlled to be 0.2, because safe control occurs only on the upper side of the minimum point. Without the support hardware, the resulting increase in speed is approximately 3.3 in an 8-cluster system at a dispatching rate of 0.2.

Figure 9 shows the results after applying the smart random load dispatching scheme with hardware support. The normalized elapsed time and the utilization of the processors are plotted as a function of the dispatching rate. With the support hardware active, a processor can reduce the overhead due to requesting the load amount. The resulting increase in speed is approximately 5.5 in an 8-cluster system at a dispatching rate of 0.2.

Comparing the two results, the use of the proposed hardware halves the normalized elapsed time at a 0.2 dispatching rate, where control of the dispatching rate seems to be possible.

It should be noted that the shortcut path can also be used for other load balancing schemes, including the minimum load distribution scheme [16]. These schemes will be evaluated in future work.

Fig. 8. Smart random dispatching without support hardware (normalized elapsed time and processor utilization versus dispatching rate). The dispatching rate is defined as the ratio of all goals dispatched to other clusters to all executed goals. The normalized elapsed time varies considerably from 0.125 using 8 clusters connected via a network, because the overhead for message handling is visible.

Fig. 9. Smart random dispatching with support hardware. The normalized elapsed time varies near 0.125 using 8 clusters connected via a network, because the overhead for message handling is quite low.
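As a reading aid, the following Python sketch shows one dispatching decision of the smart random scheme described in Section 4.2.A; on the real machine the target cluster's load would be read through the shortcut path and the CIR registers, and the names and signature here are only illustrative assumptions.

    import random

    def smart_random_dispatch(my_load, cluster_loads, rng=random):
        """One decision of "Smart Random Load Dispatching": pick a target
        cluster at random, then abort the dispatch if the target already
        holds more load than the dispatching cluster."""
        target = rng.randrange(len(cluster_loads))
        if cluster_loads[target] > my_load:
            return None            # dispatch aborted: target is busier than us
        return target              # dispatch one goal to this cluster

The benefit of the shortcut path is that cluster_loads is obtained with low latency, so the decision is based on fresh information rather than on values delayed in the message queues.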
5. CONCLUSION

Hardware for dynamic load balancing is implemented in both shared-bus and network-based multiprocessors. We propose a register with a broadcast write feature in shared-bus multiprocessors. Also, in network-based multiprocessors, the network unit uses a shortcut path. The evaluation was carried out using real hardware and an artificial load model.

The evaluation results in the shared-bus hierarchy determine the overhead due to the memory polling which detects a load request. The proposed hardware reduces the execution time of logic programs by 15%.

The evaluation results in the network-based hierarchy show that the overhead due to requesting the load amount is reduced as a result of introducing the shortcut path. The proposed hardware reduces the execution time by 50%.

It is confirmed that the proposed hardware reduces the access latency of load information, and subsequently the overhead produced by dynamic load balancing.

ACKNOWLEDGEMENTS

The authors would like to thank Dr. Shun'ichi Uchida, the manager of the research department of ICOT, for his guidance and support, and Dr. Kazuo Taki, chief of the 1st ICOT laboratory, and Mr. Marius Hancu for helpful discussions. This research was sponsored by ICOT.

REFERENCES

[1] K. Fuchi and K. Furukawa, "The role of logic programming in the fifth generation computer project," Springer-Verlag, 1987, 1(5), pp 3-28.
[2] K. Ueda, "Guarded Horn Clauses: A Parallel Logic Programming Language with the Concept of a Guard," TR-208, ICOT, 1986.
[3] T. Chikayama, H. Sato, T. Miyazaki, "Overview of the Parallel Inference Machine Operating System (PIMOS)," Proc. of the FGCS, Vol. 1, 1988.
[4] A. Goto, M. Sato, K. Nakajima, K. Taki, A. Matsumoto, "Overview of the Parallel Inference Machine Architecture (PIM)," Proc. of the FGCS, Vol. 1, 1988, pp 208-229.
[5] M. Sugie, M. Yoneyama, N. Ido, T. Tarui, "Load Dispatching Strategy on Parallel Inference Machines," Proc. of the FGCS, Vol. 3, 1988.
[6] J. Archibald and J. Baer, "Cache Coherence Protocols: Evaluation using a Multiprocessor Simulation Model," ACM Trans. on Computer Systems, Vol. 4, No. 4, 1986, pp 273-298.
[7] S. J. Eggers and R. H. Katz, "Evaluating the Performance of Four Snooping Cache Coherency Protocols," Proc. of the 16th ISCA, 1989.
[8] A. R. Karlin, M. S. Manasse, L. Rudolph and D. D. Sleator, "Competitive Snoopy Caching," Proc. of the 27th Annual Symposium on Foundations of Computer Science, Toronto, October 1986.
[9] A. Gupta and J. Hennessy, "Comparative Evaluation of Latency Reducing and Tolerating Techniques," Proc. of the 18th ISCA, IEEE, 1991.
[10] H. Koike and H. Tanaka, "Multi Context Processing and Data Balancing Mechanism of the Parallel Inference Machine PIE64," Proc. of the FGCS, Vol. 3, 1988.
[11] L. Rudolph and Z. Segall, "Dynamic Decentralized Cache Schemes for MIMD Parallel Processors," Proc. of the 11th ISCA, June 1984.
[12] T. Nakagawa, A. Goto, T. Chikayama, "Slit-Check Features to Speedup Interprocessor Software Interruption Handling," IEICE SIG Reports, July 1989, pp 17-24 (in Japanese).
[13] N. Ido, H. Maeda, T. Tarui, T. Nakagawa, M. Sugie, "Parallel Inference Machine PIM/c -Load Balancing Support-," the 40th Annual Convention IPS Japan, 2L-4 (in Japanese).
[14] M. Sato and A. Goto, "Evaluation of the KL1 Parallel System on a Shared Memory Multiprocessor," Proc. of IFIP Working Conf. on Parallel Processing, Pisa, April 1988.
[15] M. Furuichi, K. Taki, and N. Ichiyoshi, "A Multi-Level Load Balancing Scheme for OR-Parallel Exhaustive Search Programs on the Multi-PSI," Proc. of the 2nd SIGPLAN Symp. on Principles and Practice of Parallel Programming, pp 50-59, Mar. 1990.
[16] S. Sakai, H. Koike, H. Tanaka, T. Motooka, "Interconnection network with dynamic load balancing facility," Trans. of Information Processing, Vol. 27, No. 5, pp 518-524, 1986 (in Japanese).
[17] D. H. D. Warren, "An Improved Prolog Implementation which Optimises Tail Recursion," Research Paper 156, Dept. of Artificial Intelligence, Univ. of Edinburgh, Scotland, 1980.
Evaluation of the EM-4 Highly Parallel Computer
using a Game Tree Searching Problem
Yuetsu KODAMA Shuichi SAKAI Yoshinori YAMAGUCHI
Electrotechnical Laboratory
1-1-4, Umezono, Tsukuba-shi, Ibaraki 305, Japan
kodama@etl.go.jp
Abstract
EM-4 is a highly parallel computer whose eventual target implementation has more than 1,000 processing elements (PEs). The EM-4 prototype consists of 80 PEs
and has been fully operational at the Electrotechnical Laboratory since April 1990. EM-4 was designed
to execute in parallel not only static or regular problems, but also dynamic and irregular problems. This
paper presents an evaluation of the EM-4 prototype
for dynamic and irregular problems. For this evaluation, we chose a checkers program as an example
of the game tree searching problem. The game tree
is dynamically expanded and its structure is irregular because the number and the depth of subtrees of
each node depend heavily upon the status of the game.
We examine the effects of load balancing by function distribution, data transfer, control of parallelism, and searching algorithms on the EM-4 prototype. The results show that the EM-4 is effective at dynamic load balancing and fine-grain packet communication, and achieves high instruction execution performance.
1 Introduction
Parallel computing has been effective for static or regular problems such as scientific computing and database systems. Parallel computing is, however, still an
active research topic for dynamic or irregular problems.
EM-4 is a highly parallel computer which was developed at the Electrotechnical Laboratory in Japan.
Its target applications include not only static or regular problems, but also dynamic or irregular problems.
EM-4 provides special hardware for parallel computing: high data transfer rate, high data matching performance, dynamic load balancing, and high instruction execution performance.
In this paper, we evaluate the performance of EM-4
on a dynamic and irregular problem. The performance
of EM-4 on some small programs such as recursive fibonacci is presented in [Kodama et al. 1991]. While
the fibonacci program creates many function instances
dynamically, it is not irregular because the tree of calling functions is a binary tree, the depth of each branch
is similar to those of its neighbors, and the size of each
node function is the same and small. We chose a game
tree searching problem as a practical problem. This
class of programs dynamically expands the game tree,
and is irregular because the number of subtrees from
each node of the game tree, the depth of subtrees, and
the execution time of each node depends heavily upon
the status of the game. Furthermore, the α-β searching algorithm is often used for game tree searching,
because it cuts the evaluation of the current tree by
using the evaluation of the previous tree. Tree cutting
makes the program more dynamic and irregular.
This paper presents the evaluation of the EM-4 prototype using a checkers game program as an example
of the game tree searching problem. We examine the
effect of parallel computing on the EM-4 prototype.
Section 2 presents an overview of the EM-4 and its
prototype. Section 3 describes a game tree searching
problem and a checkers game. Section 4 presents evaluation issues for load balancing, data transfer, control
of parallelism, and searching algorithms for the checkers game. Section 5 gives an evaluation and examination of the strategies described in section 4. Section 6
concludes our results and discusses our future plans.
2 The EM-4 Highly Parallel Computer
EM-4 is a highly parallel computer whose eventual target implementation has more than 1,000
PEs[Yamaguchi et al. 1989, Sakai et al. 1989]. The
EM-4 prototype consists of 80 PEs and has been fully
operational since April 1990[Kodama et al. 1990].
Figure 1: The organization of the EM-4 prototype (SU: Switching Unit; IBU: Input Buffer Unit; FMU: Fetch, Matching Unit; EXU: Execution Unit; MCU: Memory Control Unit; MAINT: Maintenance Unit)
2.1 The architecture of EM-4
The organization of the EM-4 prototype is shown in
Figure 1. The prototype consists of 80 PEs; every 5 PEs are grouped and implemented on a single PE board. The PE of the prototype is a single-chip processor called EMC-R, implemented
in a C-MOS gate array. The PE has local memory
and is connected to the other PEs through a circular
omega network.
EMC-R is a RISC processor for fine-grain packet-based parallel processing. EMC-R generates packets
in an execution pipeline, and computation is fired by
the arrival of packets. This is a dataflow mechanism,
but we improved it so that it can operate on a block
which consists of several instructions, executed exclusively from other instructions. This model is called
the "strongly connected arc model", and the block is
a strongly connected block (SCB).
When a packet arrives at a PE, the execution
pipeline is fired and EMC-R executes the SCB indicated by the packet. First, EMC-R checks whether the
partner of the packet has arrived. If the partner exists, it continues to execute the SCB until the end of the block. If the partner does not exist, EMC-R stores the
packet data in a matching memory and waits for the
next packet.
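The firing rule described in this paragraph can be summarized with the following Python sketch; the class and function names are illustrative, and the real matching is of course done in hardware within the EMC-R pipeline.

    class MatchingMemory:
        """Two-operand matching: a strongly connected block (SCB) fires only
        when both of its operand packets have arrived; the first packet to
        arrive is parked in the matching memory."""
        def __init__(self):
            self.waiting = {}                  # block address -> stored operand

        def on_packet(self, block_addr, data, execute_scb):
            if block_addr in self.waiting:     # partner has already arrived
                partner = self.waiting.pop(block_addr)
                execute_scb(block_addr, partner, data)   # run the SCB to the end
            else:                              # no partner yet: store and wait
                self.waiting[block_addr] = data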
The packet size of EMC-R is two fixed words and
there is only one format consisting of one address word
and one data word. It can be generated in a RISC
pipeline of EMC-R. During the data word is calculated in a RISC pipeline, the address word is formed
in a packet generation unit when the packet output is
instructed. Since the network port is only one word
wide, first the address word is sent to network, and
then the data word is sent. In the second clock cycle,
the next instruction can be executed in parallel with
data word transfer.
The circular omega network has the same structure
as an omega network, except that every node of the
network is connected to a PE. The network has the
following features: (1) the required amount of hardware is O(N), where N is the number of PEs; (2) the distance between any two PEs is O(log N). The 3 by 3 packet switching unit is in an EMC-R, and a packet
can be transferred to a neighboring PE independent of
the instruction execution on the PE. Packets are transferred by wormhole routing, and take only M + 1 cycles
between PEs which are distance M apart if there is
no network conflict.
The clock of the EMC-R runs at 12.5 MHz. The RISC pipeline can execute most instructions in one clock cycle; the peak execution performance is 12.5 MIPS. It takes two clock cycles when two-operand matching fails, and three clock cycles when the matching succeeds. The peak synchronization performance is 2.5 Msync/s. It takes two clock cycles to transfer a packet, and the peak network packet transfer performance is 18.75 Mpacket/s. Since the EM-4 prototype consists of 80 PEs, its peak execution performance is 1 GIPS, its peak synchronization performance is 200
Msync/s, and its peak network packet transfer performance is 1.5 Gpacket/s. EMC-R achieves a high
performance in both instruction execution and packet
data transfer/matching.
Figure 2: How to detect the minimum load PE. [LD, [GA, CA]] is the MLPE packet, which shows that PE[GA, CA] has the minimum load LD.
2.2 Dynamic load balancing method
To get high performance in parallel computers, high
utilization of PEs, as well as high performance of PEs
are necessary. If the program has simple loop structure or static data transfer structure such as in diffusion equation applicaitons, the load of the program
can be estimated and the load can be statically balanced at programming or compiling time. But, if the
program is dynamic or irregular structure, static load
balancing is difficult and dynamic load balancing· is
necessary.
In the EM-4, we implemented automatic load balancing mechanisms attached to the circular omega
topology. In the circular omega network, each node
has two circular paths. We use a path to group
the PEs, and use another path to achieve dynamic
load balancing. Suppose that a PE wants to invoke a new function. This PE will send out a special MLPE{Minimum Load PE) packet. The MLPE
packet always holds the minimum load value and the
PE address among the PEs which it goes through.
The load of each PE is evaluated by hardware in the
PE mainly based on the number of packets in the input buffer. At the starting point, the MLPE packet
holds its sender's load value and its PE address; when
it goes through a certain SU in the circular path, the
SU compares the load value of the PE connected to it,
and if the value is less than that in the packet, the data in
the MLPE packet will be automatically rewritten to
the current PE's value; otherwise the MLPE packet
keeps its value and goes to the next SU. This operation is done in one clock cycle of packet transfer.
When the MLPE packet returns to the starting point,
it holds the least loaded PE number and its load value.
Figure 2 shows this. In this figure, PE[1,0] generates an MLPE packet and, after the circulation, it obtains the least loaded PE number [0,2] and its load value.
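The circulation of the MLPE packet can be pictured with the following Python sketch; the list-based ring and the load dictionary are illustrative stand-ins for the switching units and for the hardware load estimation (based on input-buffer occupancy) described above.

    def circulate_mlpe(sender, ring, load_of):
        """One circulation of an MLPE (Minimum Load PE) packet.  The packet
        starts with the sender's own load and PE address; each switching unit
        on the circular path rewrites it if its attached PE is less loaded."""
        min_load, min_pe = load_of[sender], sender
        for pe in ring:                      # PEs visited on the circular path
            if load_of[pe] < min_load:
                min_load, min_pe = load_of[pe], pe
        return min_load, min_pe              # [LD, PE] delivered back to the sender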
By this method, called the circular path load balancing, each MLPE packet scans s different groups,
where s is the number of network stages. When
the total number of the PEs increase, coverage of
PEs by this load balancing method becomes relatively
small. The efficacy of this method is reported in
[Kodama et al. 1991].
Since it takes several cycles for the MLPE packet to
return, the EM-4 resolves this latency by pre-fetching:
it sends a MLPE packet in advance, allocates the new
function instance on the PE specified by the returned
packet of MLPE, and stores the function ID in a special register of the required PE. When a function call is
necessary, the stored function ID is used and another
MLPE packet is sent for the next function call. In the
pre-fetch strategy, the new function ID may have not
yet been stored when a function call is necessary. In
this case, the pre-fetch method uses one of the other
distribution methods to choose the PE.
3 Game Tree Searching Problem
We chose the checkers program as an example of a game tree searching problem in order to evaluate the EM-4 on a dynamic and irregular problem. Since the rules of checkers are very simple, it is easy to characterize the parallel behavior of the program.
The rules of the checkers game are as follows. The players move one of their pieces in turn; a player who has no pieces or no legal moves loses. Pieces can be moved
to a forward diagonal area. If there is an opponent's
piece in a forward diagonal area, and the next diagonal
area is empty, you must jump to the empty area and
remove the enemy piece. If you can jump successively,
you must jump successively. If your piece arrives at
the end of the enemy area, that piece can then move
in all four diagonal directions.
The Min-Max searching algorithm is the simplest algorithm for the game tree searching problem. This algorithm expands the game tree by the possible moves
of each player in turn. When the game tree is expanded to a certain level, each leaf is evaluated. If
the stage corresponds to your turn, the maximum
node is selected; if the stage is your opponent's turn,
the minimum node is selected. Although the Min-Max algorithm is simple, it is not efficient because
it needs to search every branch. The α-β searching algorithm [Slagle 1971] is more efficient than the Min-
Max algorithm, because this algorithm tries to cut off
the evaluation of unnecessary branches.
If the game tree is expanded in a depth-first manner, the resources required to remember the game tree
are small. This expansion makes it easy to cut off the
unnecessary branches, but reduces the parallelism. If
the game tree is expanded in a breadth-first manner,
it results in large parallelism, so this expansion is well suited for parallel computers. However, since the number of nodes increases exponentially as a function of
the depth of the tree, the resources will be exhausted
quickly if the parallelism is not controlled.
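For reference, the sequential α-β algorithm mentioned here can be sketched in a few lines of Python; children and evaluate are assumed helper functions (move generation and leaf evaluation) and are not part of the EM-4 checkers code, which was written in assembly language.

    def alpha_beta(node, depth, alpha, beta, maximizing, children, evaluate):
        """Plain sequential alpha-beta search over a game tree."""
        succ = children(node)
        if depth == 0 or not succ:
            return evaluate(node)
        if maximizing:
            value = float("-inf")
            for child in succ:
                value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                              False, children, evaluate))
                alpha = max(alpha, value)
                if alpha >= beta:        # beta cut-off: prune remaining branches
                    break
            return value
        else:
            value = float("inf")
            for child in succ:
                value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                              True, children, evaluate))
                beta = min(beta, value)
                if beta <= alpha:        # alpha cut-off
                    break
            return value

The cut-offs in this sketch are exactly the pruning that a breadth-first parallel expansion gives up, which is the tension discussed in Section 4.4.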
4 Execution Issues of a Checkers Game
The overheads to parallelize the checkers program are
the following:
1. overhead for allocating new function instances on
other PEs.
2. overhead for transferring the current status of the
table to other PEs.
3. idle PEs caused by an unbalanced load.
4. decline of efficiency caused by cutting branches in the α-β search.
These overheads depend upon implementation strategy decisions. The function distribution strategy affects the function allocation overhead. Packed data
transfer reduces the amount of transfer data. The
idle PE ratio depends upon the load balancing strategy. The searching algorithm changes the branch cutting overhead. These overheads also depend upon the
control of the parallelism and the searching strategy.
Each of these decisions is described in greater detail
in the following subsections.
4.1 Function distribution and load balancing
Load balancing is the most important issue in achieving high performance on parallel computers. Since the
checkers program requires many function instances to
expand the game tree, it distributes them among the
PEs in order to balance the load across the machine.
Our checkers program can distribute function calls
by one of the following two strategies:
round-robin distribution Each PE independently
chooses the PE which will execute the called function in a round-robin manner.
manager distribution A centralized manager PE
chooses the PE which will execute the called function.
We can also combine the two methods: that is, the
manager distribution can be used until a certain level
in the game tree expansion, and the round-robin distribution can be used after that level. In the round-robin distribution, the load might be unbalanced at
the beginning of the program. In the manager distribution, the overhead is larger than round-robin distribution because of packet communication overhead
and concentration of requests.
EM-4 dynamically distributes functions according
to the load of PEs by the circular path load balancing described in section 2.2. The dynamic round-robin
distribution described below is the third function distribution method that we evaluated in our checkers
program.
dynamic round-robin distribution A PE is dynamically chosen by the circular path load balancing method, and in the case that the MLPE
packet has not returned, a PE is chosen by the
round-robin distribution method.
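The dynamic round-robin method can be contrasted with the other two strategies using the following Python sketch; the class name and the way the pre-fetched MLPE result is stored are illustrative assumptions (on the EM-4 the pre-fetched function ID sits in a special register).

    class DynamicRoundRobin:
        """Prefer the PE found by the circular-path load balancing (the
        pre-fetched MLPE result); fall back to plain round-robin when the
        MLPE packet has not returned yet."""
        def __init__(self, num_pes):
            self.num_pes = num_pes
            self.next_rr = 0
            self.prefetched_pe = None      # filled in when an MLPE packet returns

        def choose_pe(self):
            if self.prefetched_pe is not None:
                pe, self.prefetched_pe = self.prefetched_pe, None
                return pe                  # use the dynamically chosen PE
            pe = self.next_rr              # MLPE result not back yet: round-robin
            self.next_rr = (self.next_rr + 1) % self.num_pes
            return pe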
4.2 Data transfer
Since EM-4 is a distributed-memory parallel computer, the checkers program sends the status of the
table and selected moves by packets to functions on
other PEs. The status of the table is represented by
a 64 word array, but each word is only 4 bits. The
following two transfer methods are considered in the
checkers program.
unpacked transfer use packets which have data
representing a position.
packed transfer use packets which have packed
data representing 8 positions.
While the unpacked transfer sends eight times more
packets than the packed transfer, the packed transfer
needs to pack and unpack data.
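The difference between the two transfer methods can be illustrated with a Python sketch of the packing step; the 4-bit-per-position encoding follows the board representation described above, while the word layout (8 positions per word, low bits first) is an assumption of this sketch.

    def pack_board(board):
        """Pack a 64-position board (4 bits per position) into 8 words that
        each carry 8 positions, as in the packed transfer strategy."""
        assert len(board) == 64
        words = []
        for i in range(0, 64, 8):
            word = 0
            for j, cell in enumerate(board[i:i + 8]):
                word |= (cell & 0xF) << (4 * j)
            words.append(word)
        return words                       # 8 packets instead of 64

    def unpack_board(words):
        """Receiver side: recover the 64 positions from the 8 packed words."""
        return [(word >> (4 * j)) & 0xF for word in words for j in range(8)]

The packing and unpacking loops are the extra instructions whose cost is weighed against the reduced packet count in Section 5.2.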
4.3 Control of parallelism
Parallelism has to be controlled to both avoid exhaustion of resources, and to provide sufficient parallelism
to keep all the PEs busy. To control parallelism,
throttling can limit the number of the active functions. If the number of active functions exceeds a certain amount, further requests for calling functions are
buffered until other functions are finished. Throttling, however, introduces the possibility of deadlock.
Another way to control parallelism is to switch
from breadth-first search to depth-first search at some
level of the game tree, where the level can be determined either statically or dynamically. Static switching sets the level by the depth of the game tree. Dynamic switching determines the level using the load
of PEs. Breadth-first searching increases parallelism,
and depth-first searching restrains parallelism.
Our checkers program uses the static switching
strategy to control parallelism, because this strategy
is very simple. We plan to implement the dynamic
switching strategy for the checkers program in the
near future.
4.4 Game tree searching algorithms
The two primary algorithms for the game tree searching problems are the Min-Max algorithm and the α-β algorithm. The Min-Max algorithm provides much parallelism in the breadth-first strategy. The α-β algorithm has high efficiency in the depth-first strategy. If the α-β algorithm is used only with the breadth-first strategy, it ignores the possibility of cutting branches, and it must search more trees than the α-β algorithm on a single processor. Since the ratio of branches cut off relative to the whole tree in the α-β algorithm increases according to the depth of the searching tree, a parallel α-β searching algorithm must be considered to increase the efficiency of branch cutting in the parallel environment.
Parallel α-β searching is complicated because of the dilemma between parallelism and the efficiency of branch cutting. Another important problem is the overhead of terminating functions. Since these function instances are distributed and activated in parallel, the overhead of terminating functions is greater than the overhead of creating them. This difficult trade-off is resolved simply in our checkers program by changing the algorithm between the breadth-first strategy and the depth-first strategy. In the breadth-first strategy, we select the Min-Max algorithm to expand the parallelism, and in the depth-first strategy, we select the α-β algorithm to achieve efficient branch cutting. We call this search the "serial α-β search" in this paper. This search can be easily implemented, but its efficiency of branch cutting is less than that of the parallel α-β search [Oki et al. 1989].
To get more efficiency from branch cutting, the search that uses α-β search from the leaves of the breadth-first strategy is the "partial parallel α-β search". This search algorithm is illustrated in Figure 3. In this search, depth-first search is called in parallel from the leaves of the breadth-first search, but the top node of each serial depth-first search (indicated by B in the figure) gets the α-β value from its parent node (A) every time a child node (C) returns its evaluation result, and checks whether the remaining branch (C') can be cut off or not. The merit of this search is that we can expect sufficient efficiency from branch cutting, and the overhead of terminating the search is nothing, since the child nodes in the depth-first strategy are sequentialized.

Figure 3: Partial parallel α-β search
The checkers program can use the following three searching algorithms.

Min-Max search using the Min-Max algorithm both breadth-first and depth-first.

serial α-β search using the Min-Max algorithm breadth-first, and using the α-β algorithm depth-first.

partial parallel α-β search using the Min-Max algorithm breadth-first until the last level, and using the α-β algorithm in the last level of breadth-first and then depth-first.
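A Python sketch of the serial α-β search (Min-Max breadth-first down to a switchover level, α-β depth-first below it) is given below, reusing the alpha_beta sketch from Section 3; in the partial parallel variant the top node of each depth-first search would additionally re-read its parent's α-β window whenever a child result returns. All names are illustrative; the actual program is written in EM-4 assembly.

    def serial_alpha_beta(node, total_depth, switch_level, children, evaluate):
        """Min-Max expansion (run in parallel on EM-4) down to `switch_level`,
        then sequential alpha-beta for the remaining plies."""
        def minmax(n, depth, maximizing):
            succ = children(n)
            if not succ:
                return evaluate(n)
            if depth == switch_level:      # leaf of the breadth-first phase
                return alpha_beta(n, total_depth - switch_level,
                                  float("-inf"), float("inf"),
                                  maximizing, children, evaluate)
            results = [minmax(c, depth + 1, not maximizing) for c in succ]
            return max(results) if maximizing else min(results)

        return minmax(node, 0, True)

The switch_level parameter plays the role of the static switching level of Section 4.3: raising it exposes more parallelism, lowering it preserves more cut-off information.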
5 Experimental Results on the EM-4
We implemented the checkers program on the EM-4 prototype in an assembly language to evaluate the
performance of the EM-4 for dynamic and irregular
problems. We examine the execution issues discussed
in the previous section.
5.1 Effects of function distribution and load balancing
An unbalanced workload causes idle PEs. Since the
load balancing of the checkers program is performed
at the function level, the function distribution strategy must be evaluated. The alternatives for the function distribution of the checkers program are the manager distribution, the round-robin distribution, the
dynamic round-robin distribution, and combinations
of these.
Figure 4: Effects of function distribution (speedup relative to the round-robin distribution versus depth of the searching tree)

Figure 5: Comparison of data transfer (execution time in ms and executed instructions versus depth of the searching tree, for packed and unpacked transfer)

We executed the checkers program with the partial parallel α-β search using each of the function distribution methods. Figure 4 shows the results. We represent the speedup ratio of each distribution relative to the round-robin distribution. We executed each combination of manager distribution and round-robin distribution, and the fastest combination is shown in the figure. The combination uses the manager distribution
until the third level, and thereafter uses the dynamic
round-robin.
When the level of tree searching is shallow, manager
distribution is better, because the manager distribution allocates functions more evenly. Since the size of
each function is large relative to the whole program,
the heavily loaded PE will become a bottle-neck and
the program cannot achieve sufficient speed-up, even
if the load is only slightly unbalanced. When the level
of the search tree becomes deeper, the dynamic round-robin distribution is better, because the size of each function becomes small relative to the whole program, and a small load imbalance does not affect the execution time much. On the other hand, in the manager
distribution, the requests of PE addresses for the function call concentrate on the manager PE. Because of
the queue of requests, the long turnaround time of
the function call makes the execution time slow. Furthermore, at the sixth level of the search tree in the
manager distribution, the program cannot be executed
because of overflow of the packet queue buffer.
Since the execution of the dynamic round-robin distribution is 15% faster than the round-robin distribution when the searching tree is deep, this indicates
that the dynamic round-robin strategy is effective in
the case that there is sufficient parallelism.
5.2 Effects of data transfer

To parallelize the program, data must be transferred between PEs, whereas within a single PE data is only passed between memory locations. We compared the two data transfer methods, unpacked and packed. The unpacked transfer uses packets which each carry data representing one position, while the packed transfer uses packets which carry packed data representing 8 positions.
Figure 5 shows the results of the checkers program with the partial parallel α-β search, using the combination of the manager and dynamic round-robin methods as the function distribution. This figure shows the execution time and the total number of executed instructions for both data transfer methods. Note that the execution time and the total number of executed instructions are plotted on a logarithmic scale.
In this figure, the number of executed instructions
of the packed packet transfer is 50% more than the
unpacked transfer for each level. The increase of the
executed instructions is caused by the pack and unpack operations. When the level is shallow, the execution of the unpacked transfer is 1.5 times faster
than the packed transfer. This speed-up ratio is the
same as the instruction amount ratio. But when the
level is deep, packed transfer is a little faster than the
unpacked transfer while the instruction count of the
packed transfer is larger than the unpacked transfer.
Figure 6 shows the number of active PEs and overhead PEs in both data transfer strategies. An overhead PE is a PE which is waiting for the network to become ready to send a packet, or which is storing a packet in the memory packet buffer because the on-chip packet buffer has overflowed. An active PE is a PE which is neither an overhead PE nor an idle PE. At the shallow levels, the active PE ratio of both transfer strategies is low.
Figure 6: Examination of the active PE ratio comparing the data transfer strategies (active and overhead PE ratios versus depth of the searching tree, for packed and unpacked transfer)

When the level becomes deep, the active PE ratio of the unpacked transfer is 30% lower than that of the packed transfer,
and the overhead PE ratio of the unpacked transfer is
30% higher than the packed transfer. This high overhead PE ratio of the unpacked transfer is the reason
why it is slower than the packed transfer. Since the
unpacked transfer needs to send more packets than
the packed transfer, the network has many conflicts,
resulting in large overhead.
Although the packed transfer shows the high ratio of the active PEs on the surface, a third of the
instructions are used for packing and unpacking the
packets, and the packed transfer is not so effective.
Since the pipeline of the EM-4 is designed to send
packets quickly, unpacked transfer is suitable for the
EM-4. If there are many conflicts in the network, however, the overhead decreases the performance of sending packets. One way to reduce this overhead is to
avoid the network conflicts by allocating the function
locally. Since the manager and round-robin distributions does not take into account the locality between
the PE which calls the function and the PE which executes the function, it increases the possibility of network conflicts. If the execution PE is selected from the
neighbors of the calling PE, network conflicts do not
occur as frequently. Another way to control the parallelism is by limiting the number of active functions.
This is examined in detail in the next subsection.
Figure 7: Effects of parallelism control (speedup ratio relative to sequential execution versus depth of parallel searching)

5.3 Effects of parallelism control
While parallelism must be exploited to make the program execution faster, as mentioned before, too much
parallelism causes some overhead. It is necessary to
control the parallelism in order to avoid the exhaustion of resources, and to reduce the overhead of parallelization. The checkers program controls parallelism
by switching the searching strategy from a breadth-first manner to a depth-first manner.
Figure 7 shows the speedup ratio to the sequential
execution of the α-β search when the switchover level
of the parallelism control strategy is changed. The
execution uses the combination of manager and dynamic round-robin method as the function distribution strategy and the unpacked method as the data
transfer strategy. Note that the X-axis represents the
depth of the breadth-first searching, while all these executions search the game tree down to the sixth level.
In the Min-Max search, the deeper level of parallel searching results in more parallelism, and the
maximum speedup becomes 49 times. Exploiting
maximum parallelism, however, does not necessarily
achieve speedup. One reason is that at the sixth level,
too many packets are sent and the overhead of network
conflicts becomes much larger than at the shallow levels. Another reason is that excessive parallelism is just
overhead such as data transfer or remote function invocation, since sufficient parallelism is exploited until
the fifth level. It is sufficient to have as much parallelism as needed to activate every PE and hide the
latency of remote access; excessive parallelism is not helpful.
The serial α-β search executes fastest at the second level, and when the level is deeper the performance decreases. This is because parallel searching
uses breadth-first search, and much information that
could be used to cut subtrees is discarded to parallelize
the program. As parallel searching gets deeper, more
information is discarded. As a result, it reduces the
efficiency of cutting excessive branches, and increases
the number of trees to be evaluated. In this respect, the partial parallel α-β search is the same as the serial α-β search.
5.4 Effects of searching algorithms
Figure 7 also shows the effects of searching algorithms.
The execution of the Min-Max search on 80 PEs is
49 times faster than the Min-Max search on a single PE, but only 1.8 times faster than the α-β search
on one PE. This shows that the Min-Max search is
suitable for parallel execution, but that it is difficult
to compensate for the difference of efficiency between
the Min-Max search and the α-β search by parallel
execution.
The α-β search is a very serial algorithm, but it can achieve 16 times speedup via the partial parallel α-β search, while the serial α-β search can achieve 6 times speedup. This is because the partial parallel α-β search uses the tree-cutting information at the last level of parallel searching, and the efficiency of cutting trees in the partial parallel α-β search is higher than that of the serial α-β search.
6 Conclusion and Future plans
To evaluate the highly parallel computer EM-4 on dynamic and irregular programs, we executed the game tree searching problem of checkers on the EM-4 prototype, which consists of 80 PEs. The effects of the
strategies for load balancing, data transfer, parallelism
control and searching algorithm are examined.
Our checkers program achieves 49 times speedup in
the Min-Max search and 16 times speedup in the α-β search on the 80-PE system. In this execution, the combination of the manager distribution until the third
level and the dynamic round-robin distribution thereafter is used as the function distribution method for
load balancing, the unpacked transfer is used as the
data transfer strategy, and the static switching from
the breadth-first to the depth-first at the fifth level in
the Min-Max search and at the second level in the α-β
search is used to control parallelism.
In this evaluation, we demonstrated that the EM-4 is effective for dynamic load balancing and fine-grain packet communication, and provides high instruction execution performance.
In the near future, we plan to implement a dynamic
switching strategy which controls parallelism according to the load of neighboring PEs. We will also implement the full parallel α-β search, compare it with the partial parallel α-β search, and clarify the advantages and disadvantages of each method on the EM-4 for parallel game tree searching.
Furthermore, we are designing a higher performance
parallel computer EM-5. This computer will reduce
the overheads which are found in these evaluations
such as network conflicts.
Acknowledgments
We wish to thank Dr. Toshitsugu Yuba, Director
of the Computer Science Division, Mr. Toshio Shimada, Chief of the Computer Architecture Section for
supporting this research, and the staff of the Computer Architecture Section for their fruitful discussions. Special thanks are due to Dr. Mitsuhisa Sato of
the Computer Architecture Section and Mr. Andrew
Shaw of MIT for their suggestions and careful reading.
References
[Slagle 1971] James R. Slagle, Artificial Intelligence:
The Heuristic Programming Approach, McGrawHill Inc., (1971).
[Oki et al. 1989] H. Oki, K. Taki, S. Sei and S. Huruichi, The parallel execution and evaluation of a
go problem on the multi PSI, Proc. of the Joint
Symp. on Parallel Processing '89, (1989), 351-357 (in Japanese).
[Yamaguchi et al. 1989] Y. Yamaguchi, S. Sakai, K.
Hiraki, Y. Kodama and T. Yuba, An Architectural Design of a Highly Parallel Dataflow Machine, Proc. of IFIP 89, (1989), 1155-1160.
[Sakai et al. 1989] S. Sakai, Y. Yamaguchi, K. Hiraki,
Y. Kodama and T. Yuba, An Architecture of a
Dataflow Single Chip Processor, Proc. of ISCA 89,
(1989), 46-53.
[Kodama et al. 1990] Y. Kodama, S. Sakai and Y.
Yamaguchi, A Prototype of a Highly Parallel
Dataflow Machine EM-4 and its Preliminary Evaluation, Proc. of InfoJapan 90, (1990), 291-298.
[Kodama et al. 1991] Y. Kodama, S. Sakai and Y. Yamaguchi, Load balancing by Function Distribution
on the EM-4 Prototype, to appear in Supercomputing '91, (1991).
OR-Parallel Speedups in a Knowledge Based System:
on Muse and Aurora
Khayri A. M. Ali and Roland Karlsson
Swedish Institute of Computer Science, SICS
Box 1263, S-164 28 Kista, Sweden
khayri@sics.se and roland@sics.se
Abstract
The paper presents experimental results of running a
knowledge based system that applies a set of rules to
a circuit board (or a gate array) design and reports
any design errors, on two OR-parallel Prolog systems,
Muse and Aurora, implemented on a number of shared
memory multiprocessor machines. The knowledge based
system is written in SICStus Prolog, by the Knowledge Based Systems group at SICS in collaboration
with groups from some Swedish companies, without considering parallelism. When the system was tested on
Muse and Aurora, without any modifications, the OR-parallel speedups were very encouraging for a large practical application. The number of processors used in our
experiment is 25 on Sequent Symmetry (S81), 37 on
BBN Butterfly II (TC2000), and 70 on BBN Butterfly
I (GP1000). The results obtained show that the Aurora
system is much more sensitive to the machine architecture than the Muse system, and the latter is faster than
the former on all the three machines used. The real
speedup factors of Muse, relative to SICStus, are 24.3
on S81, 31.8 on TC2000, and 46.35 on GP1000.
1 Introduction
Two main types of parallelism can be extracted from
a Prolog program. The first, AND-parallelism, utilizes
possibilities for simultaneous execution of several subproblems offered by Prolog semantics. The second, ORparallelism, utilizes possibilities for simultaneous search
for multiple solutions to a single problem. This paper
is concerned with two systems exploiting only the latter type of parallelism: Muse [Ali and Karlsson 1990a]
and Aurora [Lusk et ai. 1990]. Both systems support
the full Prolog language with its standard semantics,
and they have been implemented on a number of shared memory
multiprocessor machines, ranging from a few processors up to around 100 processors. Both systems show
good speedups, in comparison with good sequential Prolog systems, for programs with a high degree of ORparallelism. The two systems are based on two dif-
ferent memory models. Aurora is based on the SRI
[Warren 1987] and Muse on incremental copying of the
WAM stacks [Ali and Karlsson 1990a]. The two systems
are implemented by adapting the same sequential Prolog system, SICStus version 0.6. The extra overhead
associated with this adaptation is low and depends on
the Prolog program and the machine architecture. For
a large set of benchmarks, the average extra overhead
for the Muse system on one processor is around 5% on
Sequent Symmetry, 8% on BBN Butterfly GP1000, and
22% on BBN Butterfly TC2000. For the Aurora system with the same set of the benchmarks, it is around
25% on Sequent Symmetry, 30% on BBN Butterfly
GP1000, and 77% on BBN Butterfly TC2000. Earlier results [Ali and Karlsson 1990b, Ali and Karlsson 1990c,
Ali et ai. 1991a, Ali et ai. 1991b] show that the Muse
system is faster than the Aurora system for a large set
of benchmarks and on the above mentioned machines.
In this paper we investigate the performance results
of Muse and Aurora systems on those multiprocessor
machines for a large practical knowledge based system
[Holmgren and Orsvarn 1989, Hagert et ai. 1988]. The
knowledge based system is used to check a circuit board
(or a gate array) design with respect to a set of rules.
These rules may for example be imposed by the development tool, by company standards or testability requirements. The knowledge based system has been written
in SICStus Prolog [Carlsson and Widen 1988], by the
Knowledge Based Systems group at SICS in collaboration with groups from some Swedish companies, without
considering parallelism. The gate array used in our experiment consists of 755 components. The system was
tested on Muse and Aurora without any modifications.
One important goal that has been achieved by Muse
and Aurora systems is running Prolog programs that
have OR-parallelism with almost no user annotations
for getting parallel speedups.
The speedup results obtained are very good on all the
machines used for the Muse system, but not for Aurora
on the Butterfly machines. We found that this application has high OR-parallelism. In this paper we are
going to present and discuss the results obtained from
the Aurora and Muse systems on the three machines
used.
The paper is organized as follows. Section 2 briefly
describes the three machines used in our experiment.
Section 3 briefly describes the two OR-parallel Prolog systems, Muse and Aurora. Section 4 presents the
knowledge based system. Sections 5 and 6 present and
discuss the experimental results. Section 7 concludes
the paper.
2 Multiprocessor Machines
The three machines used in our study are Sequent Symmetry S81, BBN Butterfly TC2000, and BBN Butterfly
GP1000. Sequent Symmetry is a shared memory machine with a common bus capable of supporting up to 30
(i386) processors. Each processor has a 64-KByte cache
memory. The bus supports cache coherence of shared
data and its capacity is 80 MByte/sec. It presents the
user with a uniform memory architecture and an equal
access time to all memory.
The Butterfly GP1000 is a multiprocessor machine
capable of supporting up to 128 processors. The GP1000
is made up of two subsystems, the processor nodes and
the butterfly switch, which connects all nodes. A processor node consists of an MC68020 microprocessor, 4
MByte of memory and a Processor Node Controller
(PNC) that manages all references. A non-local memory
access across the switch takes about 5 times longer than
local memory access (when there is no contention). The
Butterfly switch is a multi-stage omega interconnection
network. The switch on the GP1000 has a hardware
supported block copy operation, which is used to implement the Muse incremental copying strategy. The
peak bandwidth of the switch is 4 MBytes per second
per switch path.
The Butterfly TC2000 is similar to the GP1000
but is a newer machine capable of supporting up to 512
processors. The main differences are that the processors used in the TC2000 are the Motorola 88100s. They
are an order of magnitude faster than the MC68020 and
have two 16-KByte data and instruction caches. Thus
in the TC2000 there is actually a three level memory hierarchy: cache memory, local memory and remote memory. Unfortunately no support is provided for cache coherence of shared data. Hence by default shared data
are not cached on the TC2000. The peak bandwidth of
the Butterfly switch on the TC2000 is 9.5 times faster
than the Butterfly GP1000 (at 38 MBytes per second
per path). The TC2000 switch does not have hardware
support for block copy.
3 OR-Parallel Systems
In Muse and Aurora, OR-parallelism in a Prolog search
tree is explored by a number of workers (processes
or processors). A major problem introduced by OR-parallelism is that some variables may be simultaneously bound by workers exploring different branches of
a Prolog search tree. Two different approaches have
been used in Muse and Aurora systems for solving this
problem. Muse uses incremental copying of the WAM
stacks [Ali and Karlsson 1990a] while Aurora uses the
SRI memory model [Warren 1987].
The idea of the SRI model is to extend the conventional WAM with a large binding array per worker and
modify the trail to contain address-value pairs instead of
just addresses. Each array is used by just one worker to
store and access conditional bindings, i.e. bindings to
variables which are potentially shareable. The WAM
stacks are shared by all workers. The nodes of the
search tree contain extra fields to enable workers to move
around the tree. When a worker finishes a task, it moves
over the tree to take another task. The worker starting
a new task must partially reconstruct its array using the
trail of the worker from which the task is taken.
The incremental copying of the WAM stacks used in
Muse is based on having a number of sequential Prolog
engines, each with its own local address space, and some
global address space shared by all engines. Each sequential Prolog engine is a worker with its own WAM stacks.
The stacks are not shared between workers. Thus, each
worker has bindings associated with its current branch
in its own copy of the stacks. This simple solution
allows the existing sequential Prolog technology to be
used without loss of efficiency. But it requires copying
data (stacks) from one worker to another when a worker
runs out of work. In Muse, workers incrementally copy
parts of the (WAM) stacks and also share nodes with
each other when a worker runs out of work. The two
workers involved in copying will only copy the differing
parts between the two workers' states. The shared memory space stores information associated with the shared
nodes on the search tree. Workers get work from shared
nodes through using the normal backtracking mechanism of Prolog. Each worker having its own copy of the
WAM stacks simplifies garbage collection, and makes it possible to cache the WAM stacks on machines, like the BBN Butterfly TC2000, that do not support cache coherence of shared data.
A node on a Prolog search tree corresponds to a Prolog choicepoint. Nodes are either shared or nonshared
(private). These nodes divide the search tree into two
regions: shared and private. Each worker can be in either engine mode or in scheduler mode. In the engine
mode, the worker works as a sequential Prolog system
on private nodes, but is also able to respond to interrupt
signals from other workers. Anytime a worker has to access the shared region of the search tree, it switches to
the scheduler mode and establishes the necessary coordination with other workers. The two main functions of
a worker in the scheduler mode are to maintain the sequential semantics of Prolog and to match idle workers
with the available work with minimal overhead.
The two systems, Muse and Aurora, have different working schedulers on the three machines used in
our experiment. Aurora has two schedulers: the Argonne scheduler [Butler et al. 1988] and the Manchester scheduler [Calderwood and Szeredi 1989]. According to the reported results, the Manchester scheduler always gives better performance than the Argonne scheduler [Mudambi 1991, Szeredi 1989]. So, the Manchester
scheduler will be used for Aurora in our experiment.
Muse has only one scheduler [Ali and Karlsson 1990c,
Ali and Karlsson 1991], so far.
The main difference between the Manchester scheduler for Aurora and the Muse scheduler is in the strategy used for dispatching work. The strategy used by the
Manchester scheduler is that work is taken from the topmost node on a branch, and only one node at a time is
shared. In Muse, several nodes at a time are shared and
work is taken from the bottommost node on a branch.
The bottommost strategy approximates the execution of
sequential implementations of Prolog within a branch.
Another difference between the two schedulers is in the
algorithms used in the implementation of cut and side
effects to maintain the standard Prolog semantics.
Many optimizations have been made to the implementations of the Aurora and Muse systems on all three machines. The only optimization that has been implemented for Muse and not for Aurora is caching the WAM stacks on the BBN Butterfly TC2000. In Aurora the WAM stacks are shared by all workers while in
Muse each worker has its own copy of the WAM stacks.
Therefore, it is straightforward for Muse to make the
WAM stack areas cachable whereas in Aurora it requires
a complex cache coherence protocol to achieve this effect.
4 Knowledge Based System
One important process in the design of circuit boards
and gate arrays is the checking of the design with respect
to a set of rules. These rules may for example be imposed by the development tool, by company standards
or by testability requirements. Until now, many of these
rules have only been documented on paper. The check is
performed manually by people who know the rules well.
Increasing the number of gates in circuit boards (or in
gate arrays) makes the manual check a very difficult process. Computerizing this process is very useful and may
be the most reliable solution. The knowledge based systems group at SICS, in collaboration with groups from
some Swedish companies, has been developing a knowledge based system that applies a set of rules to a circuit
board (or a gate array) design and reports any design errors [Hagert et al. 1988, Holmgren and Orsvarn 1989].
The groups have developed two versions of the knowledge based system. The first version has been developed
using a general purpose expert system shell while the
second has been developed using SICStus Prolog. The
latter, which will be used in our experiment, is more
flexible and more efficient than the former. It is around
10 times faster than the first version on single processor machines. When it was tested, without any modifications, on the Muse and Aurora systems on Sequent Symmetry, the speedups obtained were linear up to 25 processors.
One reason for the high degree of OR-parallelism in
this kind of application is that all of the rules applied
to the circuit board (or a gate array) design are independent or could be made independent of each other.
The second source of OR-parallelism is the application of
each rule to all instances of a given circuit sub-assembly
on the board. A circuit sub-assembly can be either a
component (like buffer, inverter, nand, and, nor, or, xor, etc.) or a group of interconnected components. The
knowledge based system mainly consists of an inference
engine, design rules, and a database describing the circuit board (or the gate array). The inference engine is
implemented as a metainterpreter with only 8 Prolog
clauses. The gate array used in our experiment consists
of 755 components (Texas gate array family TGC-100),
which is described by around 10000 Prolog clauses. The
design rules part with its interface to the gate array
description is around 200 Prolog clauses. Eleven independent rules are used in this experiment. The metainterpreter applies the set of rules to the gate array description. For a larger gate array more OR-parallelism
is expected. It should be mentioned that people who developed the knowledge based system did not at all consider parallelism, but they tried to make their system
easy to maintain by writing clean code. They avoided
using side effects, but they have used cuts (embedded in
If_Then_Else) and findall constructs. The user interface
part of this application is not included in our experiment.
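The paper does not reproduce the metainterpreter's clauses; the following minimal Prolog sketch (all predicate and rule names are hypothetical) only illustrates the structure that yields the two sources of OR-parallelism described above: the choice among independent rules, and the application of each rule to every instance of a circuit sub-assembly.

    % check_design(-Error): succeeds once for every rule violation found.
    check_design(Error) :-
        design_rule(Rule),                     % choicepoint over independent rules
        apply_rule(Rule, Error).

    % A rule is checked against every instance of the sub-assembly it mentions.
    apply_rule(rule(SubAssembly, Condition), error(SubAssembly, Instance)) :-
        instance_of(SubAssembly, Instance),    % choicepoint over all instances
        violates(Instance, Condition).

    % ?- findall(E, check_design(E), Errors).

Both choicepoints are pure OR-branches, which is why this kind of program parallelizes without user annotations.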
Since Muse and the Aurora system are also running
on larger machines, the BBN Butterfly machines, it was
more natural to test the knowledge based system on
those machines. The speedup results obtained differ for
the Muse and the Aurora system. On 37 TC2000 processors, Muse is 31.8 times faster than SICStus, while
Aurora is only 7.3 times faster than SICStus. Similarly,
on 70 GP1000 processors Muse is 46.35 times faster than
SICStus, while Aurora is only 6.68 times faster than
SICStus. The low speedup for the Aurora system is surprising since this application is rich in OR-parallelism.
Is this a scheduler problem for Aurora or an engine problem? The following two sections are going to present and
analyze the results of Muse and Aurora, in order to try
to answer this question.
5 Timings and Speedups
In this section we present timing and speedup results
obtained from running the knowledge based system on
Muse and Aurora systems. The runtimes given in this
paper are the mean values obtained from eight
runs. On Sequent Symmetry, there is no significant difference between mean and best values, whereas on the
Butterfly machines, mean values are more reliable than
best values due to variations of timing results from one run to another (these variations are due mainly to switch contention). Variations around the mean value will
be shown in the graphs by a vertical line with two short
horizontal lines at each end. The speedups given in this
section are relative to running times of Muse on one
processor on the corresponding machine. The SICStus
one-processor runtime on each machine will also be presented to determine the extra overhead associated with
adapting the SICStus Prolog system to the Aurora and
Muse systems. Sections 5.1, 5.2, and 5.3 present those
results on Sequent Symmetry, GP1000, and TC2000 machines, respectively.
5.1 Sequent Symmetry
Table 1 shows the runtimes of Aurora and Muse on Sequent Symmetry, and the ratio between them. Times are
shown for 1, 5, 10, 15, 20, and 25 workers with speedups
(relative to one Muse worker) given in parentheses. The
SICStus runtime on one Sequent Symmetry processor is
422.39 seconds. This means that for this application and
on the Sequent Symmetry machine the extra overhead
associated with adapting the SICStus Prolog system to
Aurora is 26.3%, and for Muse is only 1.0% (calculated
from Table 1). The performance results that Table 1 illustrates are good for both systems, and Aurora timings
exceed Muse timings by 25% to 26% between 1 and 25
workers. Figure 1 shows speedup curves for Muse and
Aurora on Sequent Symmetry. Both systems show linear speedups with no significant variations around the
mean values.
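For reference, the quoted overheads follow directly from the one-worker times in Table 1 and the SICStus runtime of 422.39 seconds:

\[
\frac{533.69}{422.39} - 1 \approx 26.3\% \ \text{(Aurora)}, \qquad
\frac{426.74}{422.39} - 1 \approx 1.0\% \ \text{(Muse)}.
\]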
Table 1: Runtimes (in seconds) of Aurora and Muse on Symmetry, and the ratio between them.

Workers   Aurora            Muse              Aurora/Muse
   1      533.69 (0.80)     426.74 (1.00)     1.25
   5      106.87 (3.99)      85.67 (4.98)     1.25
  10       53.58 (7.96)      42.94 (9.94)     1.25
  15       36.06 (11.8)      28.73 (14.9)     1.26
  20       27.22 (15.7)      21.65 (19.7)     1.26
  25       21.83 (19.5)      17.39 (24.5)     1.26
Figure 1: Speedups of Muse and Aurora on Symmetry,
relative to 1 Muse worker.
5.2 BBN Butterfly GP1000
Table 2 shows the runtimes of Aurora and Muse on
GP1000 for 1, 10, 20, 30, 40, 50, 60, and 70 workers.
The SICStus runtime on one GP1000 node is 534.4 seconds. So, for this application and on the GP1000 machine the extra overhead associated with adapting the
SICStus Prolog system to Aurora is 66%, and for Muse
is only 7%. Here the performance results are good for
the Muse system but not for the Aurora system. Aurora
timings are longer than Muse timings by 55% to 594%
between 1 and 70 workers.
Figure 2 shows speedup curves corresponding to Table 2, with variations around the mean values. The speedup curve for Aurora levels off beyond around 20 workers. On the other hand, the Muse speedup curve continues to rise as more workers are added.
Table 2: Runtimes (in seconds) of Aurora and Muse on GP1000, and the ratio between them.

Workers   Aurora           Muse             Aurora/Muse
   1      886.4 (0.65)     572.3 (1.00)     1.55
  10      105.3 (5.44)      58.3 (9.82)     1.81
  20       74.1 (7.72)      29.8 (19.2)     2.49
  30       72.7 (7.88)      20.7 (27.7)     3.52
  40       64.3 (8.91)      16.1 (35.5)     3.99
  50       72.4 (7.90)      13.8 (41.6)     5.26
  60       65.7 (8.71)      12.4 (46.1)     5.29
  70       80.0 (7.15)      11.5 (49.6)     6.94
Figure 2: Speedups of Muse and Aurora on GP1000, relative to 1 Muse worker.

5.3 BBN Butterfly TC2000

Table 3 shows the performance results of Aurora and Muse on TC2000 for 1, 10, 20, 30, and 37 workers. The SICStus runtime on one TC2000 node is 100.48 seconds. Thus, for this application and on the TC2000 machine the extra overhead associated with adapting the SICStus Prolog system to Aurora is 80%, and for Muse is only 5%. Here also the performance results are good for the Muse system but not for the Aurora system. Aurora timings are longer than Muse timings by 70% to 319% between 1 and 37 workers.

Table 3: Runtimes (in seconds) of Aurora and Muse on TC2000, and the ratio between them.

Workers   Aurora            Muse              Aurora/Muse
   1      180.55 (0.59)     105.97 (1.00)     1.70
  10       22.12 (4.79)      10.81 (9.80)     2.05
  20       16.02 (6.61)       5.56 (19.1)     2.88
  30       13.66 (7.76)       3.93 (27.0)     3.48
  37       13.79 (7.68)       3.29 (32.2)     4.19

Figure 3: Speedups of Muse and Aurora on TC2000, relative to 1 Muse worker.
Figure 3 shows speedup curves corresponding to Table 3. The speedup curves are similar to the corresponding ones shown in Figure 2.
6 Analysis of Results
From the results presented in Section 5 we found that
the Muse system shows good performance results on the
three machines, whereas the Aurora system shows good
results only on the Sequent Symmetry. In this section,
we try to explain the reason for these results by studying
the Muse and Aurora implementations on one of the
Butterfly machines (TC2000). The TC2000 has better
support for reading the realtime clock than the GP1000.
A worker's time can be divided into the following three
main activities:
1. Prolog: time spent executing Prolog (i.e., engine
time).
2. Idle: time spent waiting for work to be generated
when there is temporarily no work available in the
system.
3. Others: time spent in all the other activities (i.e.,
all scheduling activities) like spin lock, signalling
other workers, performing cut, grabbing work,
sharing work, looking for work, binding installation (and copying in Muse), synchronization between workers, etc.
Table 4 and Table 5 show the time spent in each activity and the corresponding percentage of the total time. Results shown in Table 4 and Table 5 have been obtained from instrumented versions of Muse and Aurora on the TC2000. The times obtained from the instrumented versions are longer than those obtained from the uninstrumented systems by around 19-27%. So, they might not be entirely accurate, but they help in indicating where most of the overhead is accrued.
Table 4: Time (in seconds) spent in the main activities of Muse workers on TC2000.

Muse Workers   Prolog          Idle          Others
     1         128.36 (100)    0             0
     5         128.80 (99.7)   0.09 (0.1)    0.26 (0.2)
    10         129.28 (99.1)   0.40 (0.3)    0.71 (0.5)
    20         129.90 (96.5)   3.56 (2.6)    1.17 (0.9)
    30         130.32 (95.4)   4.17 (3.0)    2.11 (1.5)
Table 5: Time (in seconds) spent in the main activities of Aurora workers on TC2000.

Aurora Workers   Prolog          Idle           Others
      1          210.42 (98.2)   0              2.36 (1.1)
      5          221.24 (98.3)   0.19 (0.1)     2.03 (0.9)
     10          235.34 (98.1)   0.43 (0.2)     2.43 (1.0)
     20          329.60 (98.1)   1.11 (0.3)     3.61 (1.1)
     30          412.97 (94.7)   13.70 (3.2)    7.64 (1.8)
Before analyzing the data in Table 4 and Table 5
we would like to make two remarks on these data. The
first remark is that in the Aurora system the overhead of
checking for the arrival of requests is separated from the
Prolog engine time, while in the Muse system there is no
such separation. This explains why there is scheduling
overhead (Others) in the 1 worker case in Table 5 and
not in Table 4. The other remark is that the figures
obtained from the Aurora system do not total 100% of
time, since a small fraction of the time is not allocated
to any of the three activities. However, these two factors
have no significant impact on the following discussion.
By careful investigation of Table 4 and Table 5 we
find that the total Prolog time of Muse workers is almost
constant with respect to the number of workers whereas
the corresponding time for Aurora grows rapidly as new
workers are added. We also find that the scheduling time
(Others) in Table 5 is not very high in comparison with
the corresponding time in Table 4. Similarly, the difference of Idle time between Muse and Aurora is not so
high. So, the main reason for performance degradation
in Aurora is the Prolog engine speed.
We think that the only factor that slows down the Aurora engine as more workers are added is the high access cost of non-local memory. Non-local memory access takes longer than local memory access, and causes switch contention. Non-local memory accesses can be due to either the global Prolog tables or the WAM stacks. In the Muse and Aurora systems, the global tables are partitioned into parts and each part resides in the local memory of one processor. In Aurora the WAM stacks are shared by all workers while in Muse each worker has its own copy of the WAM stacks. The global Prolog tables have been implemented similarly in both the Muse and Aurora systems. Since the Muse engine does not have any problem with the Prolog tables, the problem should lie in the sharing of the WAM stacks in Aurora, coupled with the fact that this application generates around 9.8 million conditional bindings, and executes around 1.1 million Prolog procedure calls. On average, each procedure call generates around 9 conditional bindings. This may mean that the reason why Aurora slows down lies in the cactus stack approach, which causes a great many non-local accesses to the Prolog stacks. This results in a high amount of switch contention once more than five workers are used. This is avoided in the Muse model, since each worker has its own copy of the WAM stacks in the processor's local memory, and the copy is even cachable. Unfortunately, we could not verify this hypothesis because the current Aurora implementation on the TC2000 does not provide any support for measuring the stack variables' access time.

7 Conclusions

Experimental results of running a large practical knowledge based system on two OR-parallel Prolog systems, Muse and Aurora, have been presented and discussed. The number of processors used in our experiment is 25 on Sequent Symmetry (S81), 37 on BBN Butterfly II (TC2000), and 70 on BBN Butterfly I (GP1000). The knowledge based system used in our study checks a circuit board (or a gate array) design with respect to a set of rules and reports any design errors. It is written in SICStus Prolog, by the Knowledge Based Systems group at SICS in collaboration with groups from some Swedish companies, without considering parallelism. It is used in our experiment without any modifications.
The results of our experiment show that this class of
applications is rich in OR-parallelism. Very good real
speedups, in comparison with SICStus Prolog system,
have been obtained for the Muse system on all three
machines. The real speedup factors for Muse are 24.3
on 25 S81 processors, 31.8 on 37 TC2000 processors,
and 46.35 on 70 GP1000 processors. The obtained real
speedup factors for Aurora are lower (than for Muse)
on Sequent Symmetry, and much lower on the Butterfly
machines. The Aurora timings are longer than Muse
timings by 25% to 26% between 1 and 25 S81 processors, 70% to 319% between 1 and 37 TC2000 processors, and 55% to 594% between 1 and 70 GP1000 processors.
The analysis of the obtained results indicates that
the main reason for this great difference between Muse
timing and Aurora timing (on the Butterfly machines)
lies in the Prolog engine and not in the scheduler. The
Aurora engine is based on the SRI memory model in
which the WAM stacks are shared by all workers.
We think that the only reason why the Aurora engine
slows down as more workers are added is to be found in
the large number of non-local accesses of stack variables.
This results in a high amount of switch contention as more workers are added. This is avoided in the Muse model, since each worker has its own copy of the WAM stacks in the processor's local memory, and that copy is even cachable on the TC2000. Unfortunately, we could not verify this
hypothesis because the current Aurora implementation
on the Butterfly machines does not provide any support
for measuring access time of stack variables.
8 Acknowledgments
We would like to thank the Argonne National Laboratory group for allowing us to use their Sequent Symmetry and Butterfly machines. We thank Shyam Mudambi
for his work on porting Muse and Aurora to the Butterfly machines. We also would like to thank Fredrik
Holmgren, Klas Orsvarn and Ingvar Olsson for discussions and allowing us to use their knowledge based system.
References
[Ali and Karlsson 1990a] Khayri A. M. Ali and Roland
Karlsson. The Muse Approach to OR-Parallel Prolog.
International Journal of Parallel Programming, pages
129-162, Vol. 19, No.2, April 1990.
[Ali and Karlsson 1990b] Khayri A. M. Ali and Roland
Karlsson. The Muse OR-Parallel Prolog Model and its
Performance. In Proceedings of the 1990 North American Conference on Logic Programming, pages 757-776, MIT Press, October 1990.
[Ali and Karlsson 1990c] Khayri A. M. Ali and Roland
Karlsson. Full Prolog and Scheduling OR-Parallelism
in Muse. International Journal of Parallel Programming, pages 445-475, Vol. 19, No.6, Dec. 1990.
[Ali and Karlsson 1991] Khayri A. M. Ali and Roland
Karlsson. Scheduling OR-Parallelism in Muse. In Proceedings of the 1991 International Conference on
Logic Programming, pages 807-821, Paris, June 1991.
[Ali et al. 19"91a] Khayri A. M. Ali, Roland Karlsson and Shyam Mudambi. Performance of Muse on
the BBN Butterfly TC2000. In Proceedings of the
ICLP'91 Pre-Conference Workshop on Parallel Execution of Logic Programs, June 1991. To appear also
in Lecture Notes in Computer Science, Springer Verlag.
[Ali et al. 1991b] Khayri A. M. Ali, Roland Karlsson and Shyam Mudambi. Performance of Muse on
Switch-Based Multiprocessor Machines. Submitted to
the New Generation Computing Journal, 1991.
[Butler et al. 1988] Ralph Butler, Terry Disz, Ewing
Lusk, Robert Olson, Ross Overbeek and Rick Stevens.
Scheduling OR-parallelism: an Argonne perspective.
In Proceedings of the Fifth International Conference
and Symposium on Logic Programming, pages 1590-1605, MIT Press, August 1988.
[Calderwood and Szeredi 1989] Alan Calderwood and
Peter Szeredi. Scheduling OR-parallelism in Aurora - the Manchester scheduler. In Proceedings of the Sixth International Conference on Logic Programming, pages 419-435, MIT Press, June 1989.
[Carlsson and Widen 1988] Mats Carlsson and Johan
Widen. SICStus Prolog User's Manual. SICS Research
Report R88007B, October 1988.
[Hagert et al. 1988] G. Hagert, F. Holmgren, M. Lidell and K. Orsvarn. On Methods for Developing Knowledge Systems - an Example in Electronics. Mekanresultat 88003 (in Swedish), Sveriges Mekanforbund,
Box 5506, 114 85 Stockholm, 1988.
[Holmgren and Orsvarn 1989] Fredrik Holmgren and
Klas Orsvarn. Towards a Domain Specific Shell for
Design Rule Checking. In Proceedings of the IFIP TC
10/WG10.2 Working Conference on CAD Systems Using AI Techniques, pages 221-228, Tokyo,
June 6-7, 1989.
[Lusk et al. 1990] Ewing Lusk, David H. D. Warren,
Seif Haridi, et al. The Aurora OR-parallel Prolog System. New Generation Computing, 7(2,3): 243-271,
1990.
[Mudambi 1991] Shyam Mudambi. Performance of Aurora on NUMA machines. In Proceedings of the
1991 International Conference on Logic Programming, pages 793-806, Paris, June 1991.
[Szeredi 1989] Peter Szeredi. Performance analysis of
the Aurora OR-parallel Prolog System. In Proceedings
of the 1989 North American Conference on Logic Programming, pages 713-732, MIT Press, March 1989.
[Warren 1987] David H. D. Warren. The SRI Model
for OR-parallel Execution of Prolog - Abstract Design and Implementation Issues. In Proceedings of the 1987 Symposium on Logic Programming, pages 92-102, 1987.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992
A Universal Parallel Computer Architecture
William J. Dally
Artificial Intelligence Laboratory and Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139
billd@ai.mit.edu
Abstract
Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction
of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class
of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput
and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes
the technology and architecture trends motivating fine-grain
architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.
1 Introduction
Computer architecture involves balancing the capabilities of components (processors, memories, and communication facilities), organizing the connections between
the components, and choosing the mechanisms that control how components interact. The top-level organization of most computer systems is similar. As shown in
Figure 1, all parallel computers consist of a set of processing nodes each of which contains a processor, some
memory, and a communication interface. The nodes are
interconnected by a communication facility (typically
a network). A sequential processor is the special case
where there is only a single node and the network is
used only to connect to I/O devices.
Figure 1: The structure of a parallel computer or multicomputer. All multicomputers consist of a collection of nodes connected by a network. Each node contains a processor (P), a memory (M), and a communication interface (C). Machines differ in the balance of component performance and in the mechanisms used for communication and synchronization between the nodes.

(The research described in this paper was supported in part by the Defense Advanced Research Projects Agency under contracts N00014-88K-0738 and N00014-87K-0825, in part by a National Science Foundation Presidential Young Investigator Award, grant MIP-8657531, with matching funds from General Electric Corporation and IBM Corporation, and in part by assistance from Intel Corporation.)

At present, the organization of processors and memories is well understood and network technology is rapidly maturing. While these components continue to evolve
with improving technology and incremental architecture
improvements, they do not provide significant differentiation between machines. With a convergence in machine
organization, balance and mechanisms become central
architectural issues and serve as the major points of differentiation.
This paper explores two ideas related to balance and
mechanisms. First, we propose balancing machines by
cost, rather than by capacity to speed ratios. Such cost-balanced machines have a much higher ratio of processor to memory area and hence much greater processor throughput and memory bandwidth per unit cost compared to conventional machines. Cost-balanced machines have a fine-grained physical structure. Each
node is physically small and has a small amount of memory. Efficient operation with this fine-grained structure
depends on high-performance communication between
nodes and low overhead interaction mechanisms.
The mechanisms that control the interaction between
the nodes of a parallel computer determine both the
grain-size and the programming models that can be efficiently supported. By choosing a simple, yet complete,
set of primitive mechanisms, a parallel computer can
support a broad range of programming models and operate at a fine grain size.
A fine-grain parallel computer with fast networks and
efficient mechanisms has the potential to become a universal computer architecture in two respects. First, this
class of machine has the potential to universally displace
conventional (sequential and parallel) coarse-grained computers. Secondly, a simple yet efficient set of interaction
mechanisms serves as the basis for a parallel computer
that is universal in the sense that it runs any parallel
programming system.
The remainder of this paper explores the issues of balance and mechanisms in more detail. The next section
identifies trends in conventional sequential processor architecture that have led to a cost-imbalance between
processors and memory. Section 3 discusses how an opportunity exists to greatly improve the performance/cost
of computer systems by correcting this imbalance. The
next two sections deal with the two enabling technologies: Networks (Section 4) and Mechanisms (Section 5).
Together these enable fine-grain machines to give sequential performance competitive with conventional machines while greatly outperforming them on parallel applications. Our experience in building and operating a
prototype fine-grain computer is described in Section 6.
2 Trends in Sequential Architecture
Two trends are present in the architecture of conventional computers:
1. The size of a processor relative to the size of its
memory system is decreasing exponentially.
2. The time required for a processor to interact with
an external device connected to its memory bus is
increasing.
The first trend is due to an attempt to balance computer systems by ratio of processor performance (i/s)
to memory capacity (bits). In 1967, Amdahl [22] suggested that a system should have 8Mbits of memory
for each Mi/s of processor performance. The processor
performance/size ratio (i/s per cm^2) benefits from technology improvements in both density and speed while the memory capacity/size ratio (bits per cm^2) benefits only
from density improvements. Thus the processor to memory cost ratio for an Amdahl-balanced system scales inversely with speed improvements.
Let K(67) denote the ratio of processor cost to memory cost for such an Amdahl-balanced system in 1967.
Every Y years, the line width of the underlying semiconductor technology has halved. As a result, the area
of both the processor and the memory was reduced by
a factor of four [23]. At the same time, the processor
speed increased by a factor of a. To keep such a system
Amdahl-balanced, the capacity (and hence the size) of
the memory must also be increased by a. Thus, the
processor to memory ratio during year x > 67 is given
by K(x) = K(67) a^((67-x)/Y). For typical values of a = 3 and Y = 5 [23], K(92) = 0.004 K(67).
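Spelling out the arithmetic with the quoted constants (a = 3, Y = 5):

\[
K(92) = K(67)\, a^{(67-92)/Y} = K(67)\, 3^{-5} = \frac{K(67)}{243} \approx 0.004\, K(67).
\]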
The cost of a conventional machine has become largely
insensitive to processor size as a result of this exponential trend in the ratio of processor to memory size.
Thus, processor designers have become lavish in their
use of area. Costly features such as large caches, complex data paths, and complex instruction-issue logic are added even though their marginal effect on processor performance (compared to a small cache and a simple organization) is minor. As long as the size of the machine is dominated by memory, adding area to the processor has a small effect on overall size and cost.
The second trend, the increase in external interaction latency, is due to the first trend, to the increasing difference in on-chip to off-chip signal energies, and
to deepening memory hierarchies. As processors get faster and memory sizes increase, the number of processor cycles required to access memory increases. Modern
microprocessor-based computers have a latency of 5-20
cycles for a main memory access and this number is increasing. At the same time, decreasing on-chip signal
energies require greater amplification to drive off-chip
signals. Also, as more levels of caching are introduced,
the number of cycles expended before initiating an external memory reference increases and the memory interface becomes specialized for the transfer of cache lines.
If a conventional processor is used in a parallel computer, its high external interaction latency limits its
communication performance as the network must typically be accessed via the external memory interface.
Whether this interface uses DMA to transfer data stored
in memory (and possibly cached) or uses writes to a
memory-mapped network port, each word of the message must traverse the external memory bus and the cost
of initiating an external memory operation is incurred
at least once. The slow external memory interface also
contributes to the lack of agility in modern processors
(that is, their slowness in responding to external events
and switching tasks) because a great deal of processor
state must be transferred to and from memory during
these operations.
(As a result of this lavish use of area, processor sizes have scaled slightly slower than predicted by the formula above.)

These trends in conventional processor architecture
make conventional processors ill-suited for use in a parallel computer. Current cost-insensitive processors are
not cost effective in a machine with higher processor
to memory ratio where the cost of the processor is an
important factor. Their high external interaction latency severely limits their communication performance
and their poor agility limits their ability to handle synchronization.
This does not mean, however, that conventional instruction set architectures (ISAs) are unsuitable for parallel computing. Rather it is the cost-insensitive design
style, deep memory hierarchies, and poor agility that are
the problem. As we will see in Section 5, a conventional
ISA can be extended with a few instructions to provide
an efficient set of parallel mechanisms.
Most importantly, the trend toward ever higher memory to processor size ratios has created an enormous
opportunity for parallel computing to improve the performance/ cost of computers. By adding more processors while keeping the amount of memory constant, the
performance of the machine is dramatically increased
with little impact on cost. The current trend, however, of building parallel computers by simply replicating workstation-sized units (increasing processors and
memory proportionally) does not exploit this advantage.
The memory to processor ratio must be decreased to improve efficiency. This theme is explored in more detail
in the next section.
3 Balance
Balance, in the context of computer architecture, refers
to the ratios of throughput, latency, and capacity of
different elements of a computer. In this section we
will explore the balance between processor throughput,
memory capacity, and network throughput in a parallel
computer. A case will be made for balancing machines
based on cost.
Traditionally, machines have been balanced by rules
of thumb such as the one due to Amdahl discussed above.
However, a more economical design results if a machine
is balanced based on cost. A machine is cost-balanced
when the incremental performance increase due to an incremental increase in the cost of each component is equal. Let each component k_i in a machine with performance P have cost c_i; then the machine is cost-balanced if ∂P/∂c_i = ∂P/∂c_j for all i, j [7].
It is difficult to solve these balance equations because
(1) no analytic function exists that relates system performance to component cost and (2) this relationship
varies greatly depending on the application being run.
Also, analyzing existing applications can be misleading as they have been tuned to run on particular machines and hence reflect the balance of those machines.

(Much of the material in this section is based on joint work in progress with Prof. Anant Agarwal of MIT.)
A workable approach is to start from the present
memory-dominated system and increase the processor
and network costs until they reach some fraction of total cost, for example 10%. At this point the system
costs a small fraction more than a conventional system.
If designed with an appropriate communication network
(Section 4) and mechanisms (Section 5), it should provide sequential performance comparable to that of a conventional machine. Applications that are parallelized to
take advantage of the machine can potentially speed up
by the entire increase in processing cost.
To make reasonable balancing decisions, it is important to use manufacturing cost, not component price, as
our measure of cost. This avoids distorting our analysis
due to the widely varying pricing policies of semiconductor vendors. To simplify our analysis of cost, we will use
silicon area normalized to half a minimum line width, λ,
as our measure of cost [27].
First consider the issue of processor to memory balance. There are two issues: (1) how large a processor to
use on each node and (2) how much memory per processor. A 64-bit processor with floating point but no cache
and simple issue logic currently costs about 100Mλ^2,
about the same as 500Kbits of DRAM, and has a performance of 50Mi/s. Making a processor larger than
this gives diminishing returns in performance as heroic
efforts are made to exploit instruction-level parallelism
[20]. A smaller processor may improve efficiency slightly.
If we are allocating 10% of our cost to processors, we
will build one processor for every 5Mbits of memory; rounding up, this gives one processor per MByte. In today's technology a processor of this type with 1MByte
of memory can easily be integrated on a single chip. In
comparison, an Amdahl-balanced machine would provide 64MBytes of memory for each processor and be
packaged in 30-50 chips.
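A rough check of this sizing, assuming (as quoted above) that a processor costs about 100Mλ^2, that this equals the area of 500Kbits of DRAM, and that 10% of total cost goes to processors:

\[
\text{memory per processor} \approx \frac{0.9}{0.1} \times 500\ \text{Kbit} = 4.5\ \text{Mbit} \approx 5\ \text{Mbit} \approx 0.6\ \text{MByte},
\]

which rounds up to the one MByte per processor used above.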
Providing a small cache memory for the processor is
cost effective; however, a large cache and/or a secondary cache are not. Adding a small 4KByte I-cache and D-cache requires about 16Mλ^2 of area and greatly boosts
processor performance achieving hit rates greater than
90% on many codes [35]. Making the cache much larger
or deepening the memory hierarchy would greatly increase processor area with a very small return in performance. Also, using a small co-located memory reduces
processor access time to DRAM memory.
The network to memory balance is achieved in a similar manner, by adding network capability until cost is
increased by a small fraction. A great deal of network
performance comes at very little cost. The PC (printed-circuit) boards on which the processor-memory chips are mounted have a certain wiring capacity, and the periphery of the chips can support a certain number of I/O pads (typical PC boards support 20 wires/cm on each of 4-8 wiring layers; typical ICs support 100 pads/cm along their periphery, with 20-50% of these pads reserved for power). The network can make use of most of these pin and wire resources at a very small cost. The cost of the network router itself is small; a competent router can be built in less than 10Mλ^2 [16]. For example, a router on an integrated processor-memory chip could easily support 6 16-bit wide channels from which a 3-D network can be constructed (Section 4). Conventional PC boards and connectors can easily handle these signals.
Attempting to increase network bandwidth beyond
this level becomes very expensive. To add more channel
pins, the router must be moved to a separate chip or even
split across several chips incurring additional overhead
for communication between the chips. These chips are
pad-limited and most of their area is squandered. If the
amount of memory per node is increased proportionally
to the cost of the network router to hold the memory to
network cost ratio constant, the network bandwidth per
bit of memory decreases (and the processor to memory
ratio is distorted).
A computer design can be approximately cost-balanced
by using technology constraints to determine the processor/memory /network ratios. A simple three step method
gives a well cost-balanced system:
1. Size the processor to the knee of its performance/ cost
curve to get a cost effective processor.
2. Set the processor to memory ratio to allocate a
fixed fraction f (in the example above 0.1) of cost to the processor to get a machine that is within 1/(1 - f) of the optimum cost.
3. Holding processor and memory sizes constant, size
the network to the knee of its performance/cost
curve to get a cost effective network.
Machines that are cost-balanced using this method
offer aggregate processor performance and local memory
bandwidth that is 50 times that of an Amdahl-balanced
machine per unit cost. This performance advantage will
expand by a factor of a every Y years.
Why are coarse-grained Amdahl-balanced machines
widespread both in uniprocessors and parallel computers? In uniprocessors, the number of processors is not
a free variable. Thus the designer is driven to increase
the size and cost of a single processor far past the knee
of its performance/cost curve.
Existing parallel computers are driven to a coarse
grain-size because (1) they are built using processors
that lack appropriate mechanisms for communication
and synchronization, (2) their networks are too slow
to provide fast access to all memory in the machine[2],
and (3) converting software to run in parallel on these machines requires considerable effort [21]. Much of the difficulty associated with (3) is due to the partitioning required to get good performance because of (1) and (2).
For cost-balanced machines to be competitive, increasing the number of processors must (1) not substantially reduce single-processor performance and (2) must
provide the potential for near-linear speedup on certain
problems. To retain single-processor performance on
a machine with a small amount of memory per node,
the network and processor communication mechanisms
must provide a single processor access to any memory location in the machine in time competitive with a
main memory access in a conventional machine. Single-processor performance depends on network latency. To
provide speedup on parallel applications, the processor's
communication and synchronization mechanisms must
provide for low-overhead interaction and the network
throughput must be sufficient to support the parallel
communication demands. Parallel speedup depends on
throughput and agility.
The two key technologies for building cost-balanced
machines are efficient networks, and processor mechanisms for communication and synchronization. The next
two sections explore these technologies in more detail.
4 Network Architecture and Design
The interconnection network is the key component of a
parallel computer. The network accepts messages from
each processing node of a parallel computer and delivers
each message to any other processing node. Latency, T,
and throughput, As, characterize the performance of a
network. Latency is the time (s) from when the first
bit of the message leaves the sending node to when the
last bit of the message arrives at the receiving node.
Aggregate throughput AsN is the rate of message delivery
(bits/s) when the network is fully loaded.
T must be kept low to achieve good performance for
sequential codes and for the portions of parallel codes
where the parallelism is insufficient to keep the machine busy. During these periods performance is latencylimited and execution time is proportional to T. During periods where there is abundant parallelism, performance is throughput limited. Recent developments in
network technology give throughputs and latencies that
approach physical and information theoretic bounds given
pin and wire constraints. A detailed discussion of this
technology is beyond the scope of this paper. This section briefly summarizes the major results.
Figure 2: Insertion of express channels into a k-ary 3-cube gives performance within a small factor of physical limits: (A) One dimension of a regular k-ary 3-cube network, (B) Inserting one level of express channels optimizes the ratio of wire to node delay for messages travelling long distances, (C) Hierarchical express channels also reduce the number of switching decisions to the minimum, log_q N, (D) Adding multiple channels at each level adjusts network bisection bandwidth to maximize throughput.

An interconnection network is characterized by its topology, routing, and flow control [11]. The topology of
throughput.
a network is the arrangement of nodes and channels into
a graph. Routing specifies how a packet chooses a path
in this graph. Flow control deals with the allocation of
channel and buffer resources to a packet as it traverses
this path.
The topology strongly affects T since it determines
(1) how many hops H a message must make, (2) the total wire distance D (cm) that must be traversed, and (3) the channel width W (bits), which is limited by the bisection width of the wiring media divided by the channel bisection of the network (in some small networks, W is constrained by component or module pinout and not by bisection width). The latency seen by a single message in a network with no other traffic (zero-load latency, T0) is directly determined by these three factors:

    T0 = H Tn + D/v + L/(W f)        (1)

where Tn is the propagation delay of a node (s), v is the signal propagation velocity (cm/s), typically a fraction of the speed of light (0.3c <= v <= c), and f is the wire bandwidth (1/s).
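As a rough instantiation of equation (1), using the figures quoted in the 64-node torus example at the end of this section (H Tn = 60 ns and L/(W f) = 60 ns for a six-flit message of 16-bit flits at a 100 MHz network clock) and neglecting the wire-delay term D/v for uniformly short wires:

\[
T_0 \approx H\,T_n + \frac{L}{W f} = 60\ \text{ns} + 60\ \text{ns} = 120\ \text{ns},
\]

which matches the one-way communication time given there.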
Figure 3: Latency as a function of distance for a hierarchical express channel cube with i = 4, l = 3, α = 64, and a flat express channel cube with i = 16, α = 64. In a hierarchical express channel cube latency is logarithmic for short distances and linear for long distances. The crossover occurs between D = α and D = iα log_i α. The flat cube has linear delay, dominated by Tn for short distances and by Tw for long distances.

The three-dimensional express cube topology [12], a k-ary 3-cube with express channels added to skip intermediate hops (Figure 2B) when travelling large distances, can simultaneously optimize H, D, and AsN to achieve performance that is within a small fraction of
physical and information-theoretic limits. The number
of hops H is bounded by log_q N if a q-way decision is made at each step. The express cube network achieves this bound by inserting a hierarchy of interchanges into a k-ary n-cube network (Figure 2C). The wire distance D is kept to within 2^(1/3) of the physical minimum by always following a Manhattan shortest path. Finally, the
number of network channels can be adjusted to use all
available wiring capacity (Figure 2D).
Figure 3 compares the performance of flat and hierarchical express cubes with a regular k-ary n-cube and a
wire with no switching. The ratio of the delay of a node,
Tn, to the delay of a wire between two adjacent nodes, D(1)/v, is denoted α = Tn v / D(1). The figure assumes α = 64. The figure shows that a flat express cube decreases delay to a multiple of wire delay determined by the ratio of α to the interchange spacing, i. Interchange spacing is set to the square root of the distance to balance the delay due to local channels with the delay due to express channels. The hierarchical cube with three levels (l = 3) permits small interchange spacing and allows
local and global delays to be optimized simultaneously.
The advantages of minimum H and maximum AsN
achieved by the express cube topology are important
for very large networks. For smaller networks (less than
4K nodes), however, a simpler three-dimensional torus
or mesh network, k-ary 3-cube, is usually more cost ef-
fective. The 3-D mesh also provides Manhattan shortest
paths in physical space to keep D near minimum, has
a very regular structure, and uses uniformly short wires
simplifying the electrical design of the network.
Three-dimensional networks are required to obtain adequate throughput for machines larger than 256 nodes. As machines grow, the throughput per node varies inversely with the number of nodes in a row, as N^(1/2) for a 2-D network and as N^(2/3) for a 3-D network. 3-D networks provide adequate throughput up to 4K nodes (16 nodes per row). Beyond this point express cubes and/or careful management of locality is required. For machines of 256K nodes or larger, express cubes become bisection-limited and locality must be exploited. No cost-effective network can scale throughput linearly with the size of the machine. Above a certain size, all networks become bisection-width limited and hence have a throughput that grows as N^(2/3).
Routing, the assignment of a path to a message, determines the static load balance of a network. Most routers built to date have used deterministic routing, where the path depends only on the source and destination nodes. Deterministic routers can be made simple and fast, and deadlock avoidance becomes much easier. In particular, deterministic routing in dimension order permits the switch to be cleanly partitioned [17]. For some traffic patterns, deterministic routing results in a degradation in performance due to channel load imbalance. However, for most cases deterministic routing has proved adequate.
Several adaptive routing algorithms have been proposed [14, 4, 25] that are capable of dynamically detecting and correcting channel load imbalance. Adaptive
routers also are able to route around a number of faulty
nodes and channels. Most adaptive routers require much
more complex logic than deterministic routers. The planar adaptive routing algorithm [4] is particularly attractive in that it retains much of the simplicity of dimension-order routing.
Flow control involves dynamically allocating buffer
and channel resources to messages in the network. Most
parallel computer networks use wormhole routing [8] in
which buffers are allocated to messages while channels
are allocated to flow-control digits or flits. To keep
routers small and fast, channel buffers are often shorter
than messages. Thus it is possible for a message to be
blocked on the receiving side of a channel while part
of the message remains on the transmitting side. With
only a single buffer per channel, blocking a message on
the transmitting side would idle the channel wasting network resources.
Virtual-channel flow control permits messages to pass blocked messages and make use of what would otherwise be idle channels [13]. By associating several buffers (virtual channels) with each physical channel and multiplexing them on demand, a network loaded with uniform traffic can operate at 90% of its peak channel capacity. In comparison, the throughput of a network with only a single buffer per node saturates at 20% to 50% of capacity depending on the topology and routing. Virtual channel flow control uses several small, independent buffers in place of a single large queue to more efficiently use valuable router storage. Figures 4 and 5 show the effect of adding virtual channels to the latency and throughput of 2-ary n-fly networks.

Figure 4: Latency as a function of offered traffic for a 2-ary 8-fly network with 1, 2, 4, 8, and 16 virtual channels per physical channel.
The network technology described above is able to
meet the goal of providing global memory access with
a latency comparable to that of a uniprocessor. Compare for example a 64-node 3-D torus with 1MByte per
node with a comparably sized single processor machine
with 64Mbytes. Both of these machines will fit comfortably on a desktop. Since network channels are uniformly
short it is customary to operate them at twice the processor rate [10] (or more [5]). For our comparison we
will use a processor rate of 50MHz and a network clock
of 100MHz.
The 64-node torus requires an average of 6 hops to
reach any node in the machine (H Tn = 60ns). A message of six 16-bit flits (L/(W f) = 60ns) is sent in each direction for a read operation. The composition time of the message and the initiation of the memory access can be overlapped with this L/(W f) time. Thus the one-way communication time is 120ns. The memory access itself takes 100ns. Adding the reply communication time (again terminal operations are overlapped with the L/(W f) time) gives a total access time of 340ns. The uniprocessor requires 1 cycle to get off chip, 2 cycles to get across a bus, and 1 cycle to initiate the memory operation (80ns total). Again the memory read itself is 100ns and the reply across the bus requires another 80ns for a total of
260ns. Thus the uniprocessor is only 80ns or 24% faster. Much of the additional delay can be attributed to the fact that the parallel computer network has more decisions to make during routing and is able to handle many messages simultaneously. While these capabilities have a slight negative effect on latency, they give a significant throughput advantage.

Figure 5: Throughput of 2-ary n-fly networks with virtual channels as a function of the number of virtual channels.
To see the throughput advantage, consider the problem of rotating a matrix about its center row. To perform one 64-bit move, the conventional machine requires
two memory cycles or 520ns for a rate of 123Mbits/s.
With an interleaved memory and a lockup-free memory
interface (which few processors have) it could overlap
operations to complete one every 160ns for a rate of
400Mbits/s. The parallel computer on the other hand
can apply its entire bidirectional bisection bandwidth of
256 16-bit channels to the problem for a total bandwidth
of 409.6GBits/s.
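The quoted rates follow directly from the cycle times and channel counts above:

\[
\frac{64\ \text{bits}}{520\ \text{ns}} \approx 123\ \text{Mbit/s}, \qquad
\frac{64\ \text{bits}}{160\ \text{ns}} = 400\ \text{Mbit/s}, \qquad
256 \times 16\ \text{bits} \times 100\ \text{MHz} = 409.6\ \text{Gbit/s}.
\]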
In summary, modern interconnection network technology gives latency comparable to conventional memory access times with throughput orders of magnitude
higher. Raw network performance solves only half of the
communication problem, however. To use such a network effectively requires efficient communication mechanisms.
5 Mechanisms
Mechanisms are the primitive operations provided by
a computer's hardware and systems software. The abstractions that make up a programming system are built
from these mechanisms [18, 9]. For example, most sequential machines provide some mechanism for a pushdown stack to support the last-in-first-out (LIFO) storage allocation required by many sequential models of
computation. Most machines also provide some form of
memory relocation and protection to allow several processes to coexist in memory at a single time without
interference. The proper set of mechanisms can provide
a significant improvement in performance over a brute-force interpretation of a computational model.
Over the past 40 years, sequential von Neumann processors have evolved a set of mechanisms appropriate for
supporting most sequential models of computation. It
is clear, however, from efforts to build concurrent machines by wiring together many sequential processors,
that these highly-evolved sequential mechanisms are not
adequate to support most parallel models of computation. These mechanisms do not support synchronization
of events, communication of data, or global naming of
objects. As a result, these functions, inherent to any
parallel model of computation, must be implemented
largely in software with prohibitive overhead.
For example, most sequential machines require hundreds of instructions to create a new process or to send
a message. This cost prohibits the use of fine-grain programming models where processes typically last only' a
few tens of instructions and messages contain only a
few words. It is not hard to construct mechanisms that
permit tasks to be created and messages sent in a few
instruction times; however, these mechanisms are not to
be found on conventional processors.
Some parallel computers have been built with mechanisms specialized for a particular model of programming, for example dataflow or parallel logic programming. However, our studies have shown that most programming models require the same basic mechanisms
for communication, synchronization, and naming. More
complex model-specific mechanisms can be built from
the basic mechanisms with little loss in efficiency. Specializing a machine for a particular programming model
limits its flexibility and range of application without any
significant gain in performance. In the remainder of this
section, we will examine mechanisms for communication, synchronization, and naming in turn.
Communication between two processing nodes involves the following steps:
1. Formatting: gathers the message contents together.
2. Addressing: selects the physical destination for the
message.
3. Delivery: transports the message to the destination.
4. Allocation: assigns space to hold the arriving message.
5. Buffering: stores the message into the allocated
space.
6. Action: carries out a sequence of operations to
handle the message.
All programming models use a subset of these basic
steps. A shared memory read operation, for example,
uses all six steps. A read message is formatted, the
address is translated, the message is delivered by the
network, the message is buffered until the receiving node
can process it, and finally a read is performed and a reply message is sent as the action. Some models, such as synchronous message passing, always send messages to
preallocated storage and thus omit allocation (step 4).
In some cases, no action is required to respond to a
message and step 6 can be omitted.
The SEND instruction, first used in the message-driven processor [15, 16], together with translation of destination addresses [19], efficiently handles the first two steps: formatting and addressing. A message is sent with a sequence of SEND instructions followed by a SENDE instruction. A SEND instruction takes a number of arguments
equal to the number of read register ports (typically two)
and appends its arguments to a message. A SENDE instruction is identical to the SEND except that it also signals the end of the message. The first SEND after a SENDE
starts a new message. By making full use of the register bandwidth the SEND instruction reduces formatting
overhead to a minimum. The alternative approaches of
formatting a message (1) in memory or (2) by writing
to a memory mapped network port have much lower
bandwidth and higher latency.
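As an illustration only, the following C fragment models in software the formatting behaviour just described; the buffer, its size, and the network_inject routine are assumptions of this sketch, not the actual MDP instruction set or interface.

    #include <stdint.h>

    #define MAX_MSG_WORDS 16

    static uint32_t out_msg[MAX_MSG_WORDS];  /* message being formatted */
    static int      out_len;                 /* words appended so far */
    static int      in_message;              /* between a SEND and a SENDE? */

    extern void network_inject(const uint32_t *msg, int words);  /* assumed */

    /* SEND: append two register operands to the current message.  The first
       SEND after a SENDE starts a new message, and its first argument is the
       (virtual) destination address.  Bounds checks are omitted. */
    void send2(uint32_t a, uint32_t b) {
        if (!in_message) { out_len = 0; in_message = 1; }
        out_msg[out_len++] = a;
        out_msg[out_len++] = b;
    }

    /* SENDE: like SEND, but also marks the end of the message and hands it
       to the network. */
    void sende2(uint32_t a, uint32_t b) {
        send2(a, b);
        network_inject(out_msg, out_len);
        in_message = 0;
    }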
Translation is achieved by interpreting the first word
of the message stream (the first argument of the first
SEND) as a virtual destination address and translating
it to a physical address when a message is sent. A
simple translation-lookaside buffer (TLB) efficiently performs this translation. This approach of translating virtual network addresses to physical addresses during the
SEND operation permits message sends from user code
to be fully protected without incurring the overhead of
a system call (as is done on many machines today). User code is only permitted to send messages to addresses that are entered in the TLB. Sending a message to any other
address raises an exception.
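The translation step can be pictured as follows; this is a software model with assumed data structures and names (a real network TLB would be a small associative hardware table, not a loop):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t vnode;   /* virtual destination address */
        uint32_t pnode;   /* physical node it maps to */
        bool     valid;
    } net_tlb_entry;

    #define NET_TLB_ENTRIES 64
    static net_tlb_entry net_tlb[NET_TLB_ENTRIES];

    extern void protection_exception(uint32_t vnode);   /* assumed trap hook */

    /* Translate the first word of an outgoing message.  Only destinations
       entered in the TLB are reachable from user code, so sends are
       protected without a system call; anything else traps. */
    uint32_t translate_destination(uint32_t vnode) {
        for (int i = 0; i < NET_TLB_ENTRIES; i++)
            if (net_tlb[i].valid && net_tlb[i].vnode == vnode)
                return net_tlb[i].pnode;
        protection_exception(vnode);
        return 0;   /* not reached if the exception does not return */
    }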
Communication operations that do not require allocation and/or remote action can use a subset of the basic mechanism. A remote write operation, for example, requires neither of these functions. Avoiding allocation and action in this case eliminates the overhead of copying the message from newly allocated storage to its final destination. The first SEND instruction of a message can specify whether allocation (.A suffix) and/or spawning a task (.S suffix) are required [19]. A SEND with no suffix would simply perform a remote write, SEND.A would allocate but not initiate a remote action, and SEND.SA
would do both. The sending node treats these three
SEND operations identically and simply sends along the
two option bits with the message. The receiving node
examines the option bits to determine whether allocation and/or action is required. If an action is required,
the routine to be invoked is specified by the second word
of the message.
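A receiver-side sketch of how the two option bits might be interpreted; the handler-in-the-second-word convention follows the text, but the helper routines and message layout are assumptions:

    #include <stdint.h>

    #define OPT_ALLOC 0x1   /* set by SEND.A and SEND.SA */
    #define OPT_SPAWN 0x2   /* set by SEND.SA */

    extern uint32_t *allocate_segment(void);   /* free-list allocator, sketched below */
    extern void copy_words(uint32_t *dst, const uint32_t *src, int n);
    extern void remote_write(const uint32_t *msg, int n);  /* store payload at the address in the message */
    extern void spawn_task(uint32_t handler, uint32_t *msg, int n);

    void on_message_arrival(uint32_t *msg, int len, unsigned opts) {
        uint32_t *buf = msg;
        if (opts & OPT_ALLOC) {            /* allocation requested */
            buf = allocate_segment();
            copy_words(buf, msg, len);
        } else {
            remote_write(msg, len);        /* e.g. a plain SEND: a remote write, no buffer, no task */
        }
        if (opts & OPT_SPAWN)              /* action requested */
            spawn_task(buf[1], buf, len);  /* second word names the routine to invoke */
    }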
Storage allocation and message buffering must be performed in hardware to achieve adequate performance.
While approaches using stack (LIFO) or queue (FIFO)
based storage are simple to implement [10], they may
require copying if messages are not deallocated in order.
An alternative is to allocate message buffers off a free
list of fixed-sized segments [40]. Management of such
a free list is simple (only a single pointer is required)
and it does not restrict message lifetimes. Messages too
long for the fixed-sized segments can be handled in an
overflow area.
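The free-list management described here is simple enough to sketch directly; the segment size and the overflow hook are assumptions of this illustration:

    #include <stddef.h>
    #include <stdint.h>

    #define SEG_WORDS 16                  /* fixed segment size (assumed) */

    typedef union segment {
        union segment *next;              /* link while on the free list */
        uint32_t       words[SEG_WORDS];  /* message storage while in use */
    } segment;

    static segment *free_head;            /* the single pointer required */

    extern uint32_t *overflow_allocate(void);  /* assumed path to the overflow area */

    uint32_t *allocate_segment(void) {
        segment *s = free_head;
        if (s == NULL)
            return overflow_allocate();   /* free list empty, or message too long */
        free_head = s->next;
        return s->words;
    }

    void free_segment(uint32_t *words) {  /* segments may be freed in any order */
        segment *s = (segment *)words;
        s->next = free_head;
        free_head = s;
    }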
With any allocation scheme, a method for handling
message buffer overflow is required. Because handling
an overflow may require access to other nodes, the network must be usable even when a full buffer is causing
messages to back up into the network. This is accomplished on the J-Machine by using two virtual networks
[10]. The actual overflow handling may be performed
in software as it is a rare event. While many strategies
may be used to handle overflow, a simple one is to return overflowing messages to their senders. With this
scheme each node must guarantee that it has storage to
hold each message it originates until it is acknowledged.
The final step of a communication operation is to initiate a remote action by creating and dispatching a task.
A task or process consists of a thread of control and an
addressing environment. A thread can be created in a
few clock cycles by loading a processor's IP to set the
thread of control and initializing its memory management registers to alter the addressing environment. On
the J-Machine, each message in the message queue is
treated as a thread that is ready to run and threads are
dispatched when they reach the head of the queue. This
dispatching on message arrival also serves as the basis
of a synchronization mechanism.
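The dispatch-on-arrival behaviour can be pictured as the loop below, a software caricature of what the hardware does in a few cycles; the queue and handler-table interfaces are assumptions, and the handler index in the second word follows the convention described above:

    #include <stddef.h>
    #include <stdint.h>

    typedef void (*handler_fn)(uint32_t *msg, int len);

    extern uint32_t  *queue_head(int *len);   /* next buffered message, or NULL */
    extern void       queue_pop(void);
    extern void       set_address_environment(uint32_t *msg);  /* init memory-management registers */
    extern handler_fn lookup_handler(uint32_t id);
    extern void       wait_for_message(void);

    /* Each message at the head of the queue is treated as a ready thread:
       its addressing environment is set up, its handler is entered (the
       hardware does this by loading the IP), and it runs to completion. */
    void dispatch_loop(void) {
        for (;;) {
            int len;
            uint32_t *m = queue_head(&len);
            if (m == NULL) { wait_for_message(); continue; }
            set_address_environment(m);
            handler_fn h = lookup_handler(m[1]);
            h(m, len);
            queue_pop();
        }
    }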
Synchronization enforces an ordering of events in a
program. It is used, for example, to ensure that one
process writes a memory location before another reads
it, to provide mutual exclusion during critical sections
of code, and to require all processes to arrive at a barrier
before any processes leave.
Any synchronization mechanism requires a namespace that processes use to refer to events, a method for
signalling that an event is enabled, and a method for
forcing a processor to wait on an event. Using tags for
synchronization, as with the presence bits on the HEP
[36], uses the memory address space as the synchronization namespace. This provides a large synchronization
namespace with very little cost, as the memory management hardware is reused for this function. It also has the
benefit that when signaling the availability of data, the
data can be written and the event signaled in a single
memory operation. Since it naturally signals the presence of data, we refer to this synchronization using tags
on memory words as data synchronization [40].
With synchronization tags, an event is signaled by
setting the tag to a particular state. A process can
wait on an event by performing a synchronizing access
of the location, which raises an exception if the tag is not in the expected state. A synchronizing access may optionally leave the tag in a different state. Simple producer/consumer synchronization can be performed using
a single state bit. In this case, the producer executes a
synchronizing write which expects the tag to be empty
and leaves it full. A synchronizing read which expects
the location to be full and leaves it empty is performed
by the consumer. If the operations proceed in order,
no exceptions are raised. An attempt to read before a
write or to write twice before a single read raises a synchronization exception. More involved synchronization
protocols require additional states (for example to signal
that a process is waiting on a location) [19].
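The single-bit producer/consumer protocol can be summarized in a few lines of C; here the tag is modelled as an ordinary field and the exception as a call, both assumptions of this sketch:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t value;
        bool     full;          /* the synchronization tag: full/empty */
    } tagged_word;

    extern void synchronization_exception(tagged_word *w);   /* assumed trap hook */

    /* Producer: a synchronizing write expects the tag empty and leaves it full. */
    void sync_write(tagged_word *w, uint32_t v) {
        if (w->full)
            synchronization_exception(w);    /* write twice before a read */
        w->value = v;
        w->full  = true;
    }

    /* Consumer: a synchronizing read expects the tag full and leaves it empty. */
    uint32_t sync_read(tagged_word *w) {
        if (!w->full)
            synchronization_exception(w);    /* read before the write */
        w->full = false;
        return w->value;
    }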
The communication mechanism described above complements data synchronization by providing a means for
a process on one node to signal an event on a remote
node. In the simplest case, a message handler can perform a synchronizing read or write operation. However,
it is often more efficient to move some computation to
the node on which the data is resident. Consider, for example, the problem of adding a value to a remote location (as occurs, for instance, when performing LU decomposition of a matrix). One could perform a remote synchronizing
read that marks the location empty to gain exclusive
access, perform the add, and then perform a remote synchronizing write. Sending a single message to invoke a
handler that performs the read, add, and write on the
remote node, however, reduces the time to perform the
operation, the number of messages required, and the
amount of time the location is locked.
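Building on the tagged_word sketch above, the message handler for such a remote add might look as follows; the message layout is an assumption, but the point is that the read, add, and write all happen locally on the node that owns the word, with a single message in each direction:

    #include <stdint.h>

    /* msg[0]: destination (consumed by the network interface)
       msg[1]: handler index selecting this routine
       msg[2]: local address of the tagged word
       msg[3]: value to add                         (layout assumed) */
    void remote_add_handler(uint32_t *msg, int len) {
        (void)len;
        tagged_word *loc = (tagged_word *)(uintptr_t)msg[2];
        uint32_t addend  = msg[3];
        uint32_t v = sync_read(loc);    /* marks the word empty: exclusive access */
        sync_write(loc, v + addend);    /* word becomes full again with the new value */
    }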
Many machines have implemented some form of global
barrier synchronization. For example, the Caltech Cosmic Cube [32] had four program accessible wire-or lines
for this purpose. While global barrier synchronization
is useful for some models, it can be emulated rapidly using communication and data synchronization. If there
is sufficient slack time from when a process signals that
it has reached the barrier to when it waits on the barrier, this emulation will not affect program performance.
The required amount of slack time varies logarithmically
with the number of processors performing the barrier.
Also, the major use of barrier synchronization (inserting a barrier between code that produces a structure
(e.g., array) and code that consumes the structure) is
eliminated by data synchronization. By synchronizing
in the data space on each individual element of the data
structure, control space synchronization on the program
counter between the producer and consumer is neither
required nor desired. It is more efficient to allow the
producer and consumer to overlap their execution subject to data dependency constraints. Barrier synchronization mechanisms also have the disadvantage that
they require a separate namespace which tends to be
small because of the prohibitive cost of providing many
simultaneous barriers, and they consume pin and wire
resources that could otherwise be used to speed up the
general communication network.
The mechanism that enforces event ordering solves
only half of the synchronization problem. Efficient synchronization also requires an agile processor that can rapidly switch processes and handle events and messages, keeping the exception-handling and context-switching overhead of waiting on an event small. Rapid task switching can be supported by multiple register sets or a named-state register
set [29]. Exception handling is accelerated by specifically vectoring exceptions, providing separate registers
for exception handling, and explicitly passing arguments
to exception handlers [19].
6 Experience
In the Concurrent VLSI Architecture Group at MIT, we
have built the J-Machine [10], a prototype fine-grain parallel computer with a high-speed network and efficient
yet general communication and synchronization mechanisms. The J-Machine was built to test and evaluate
our ideas on mechanisms and networks, as a proof of
concept for this class of machine, and as a testbed for
parallel software research. Small prototypes have been
operational since June of 1991. We expect to have a
1024-processor J-Machine on-line during the summer of
1992.
The J-Machine communication mechanism permits a
node to send a message to any other node in the machine in < 1.5 µs. On message arrival, a task is created
and dispatched in 200ns. A translation mechanism supports a global virtual address space. These mechanisms
efficiently support most proposed models of concurrent
computation and allow parallelism to be exploited at a
grain size of 10 operations. The hardware is an ensemble
of up to 65,536 nodes, each containing a 36-bit processor, 4K 36-bit words of on-chip memory, 256K words
of DRAM, and a router. The nodes are connected by a
high-speed 3-D mesh network with deterministic dimension order routing. The J-Machine has about the grain
size of the cost-balanced machine described in Section 3: one processor per megabyte of memory.

[Figure not reproduced: chip floorplan showing two 512x144-bit (2K-word) SRAM arrays, internal memory interfaces, X, Y, and Z routers, the address arithmetic unit (datapath), network input and output, the external memory interface, registers, and the arithmetic/logic unit (datapath).]

Figure 6: Floorplan and photograph of a Message-Driven Processor chip.
A photograph of the message-driven processor chip
used in the J-Machine is shown in Figure 6. One of these
chips combined with three external DRAM parts forms
a J-Machine node. An array of 64 nodes is packaged on
a single board (Figure 7). These boards are stacked and
connected side-to-side to form larger J-Machines.
Three software systems are currently operational on
the J-Machine. It runs Concurrent Smalltalk (CST)
[24], a version of Id based on the Berkeley TAM system
[37, 6], and a dialect of "C". Execution of these diverse
programming systems has demonstrated the efficiency
and flexibility of the J-Machine mechanisms.
Table 1 shows the advantage of efficient mechanisms.
The left column of the table lists the operations involved in performing a remote memory reference on a
1024-node parallel computer. The next two columns list
the approximate number of instruction times required
to perform each operation on the Intel Paragon [5] and
the J-Machine. Many of these times were derived from the study reported in [38]. The final column of the table shows the times that could be achieved with techniques that are currently understood.

    Operation                  Paragon   J-Machine   Ideal
    Send 4-Word Message            600           3       2
    Network Delay                   32          10      10
    Buffer Allocation               20           0       0
    Switch To Handle Msg          1000          10       1
    Presence Test                    5           0       0
    Send 3-Word Return Msg         600           3       2
    Network Delay                   32          10      10
    Buffer Allocation               20           0       0
    Switch To Handle Msg          1000           3       1
    Switch To Restart Task        1000          10       1
    TOTAL                         4309          49      27

Table 1: The time to perform a remote memory reference on the Intel Paragon (a conventional message-passing multicomputer), on the J-Machine (a fine-grain parallel computer), and the time that could be achieved with current technology (Ideal). Switch refers to a task switch.

Figure 7: Photograph of a 64-node J-Machine board.
The table shows that while both machines have fast
networks, the time to carry out a simple remote action is
many times greater on the conventional machine. The
single largest contributor is the task-switching time7.
The overhead of task switching in a conventional operating system is unacceptable in this environment. Even if
the task switch time were reduced to zero, the overhead
of sending a message8 in a system where this function is
handled in software is still prohibitive. End-to-end hardware support for communication is required to achieve
acceptable latency.
The rightmost column represents times that could be
achieved by making some minor modifications to the
J-Machine. In particular, task switch time could be reduced from 10 cycles (when registers need to be saved)
or 3 cycles (without register save) to a single cycle by providing more support for multithreading [29, 39]. The J-Machine would also benefit from more user registers, automatic destination translation on message send, being able to subset the communication operation, and a non-LIFO message buffer.
7 The estimate of 1000 instruction times or 25 µs for the i860 is extrapolated from other microprocessors and hence very generous; because of the complexity of event handling on this chip the actual number is higher.
8 Some receive time is also included in this number.
7 Related Work
Like the message-driven processor from which the MIT
J-Machine is built, the Caltech MOSAIC [33], Intel
iWARP [3], and INMOS Transputer [26] are integrated
processing nodes that incorporate a processor with memory and communication on a single chip. These integrated nodes, however, lack the efficient mechanisms
of the MDP and thus cannot efficiently support many
different models of computation. Also, the softwarerouted, bit-serial Transputer network does not have adequate performance for many applications.
Many machines built for a specific model of computation have been generalizing their mechanisms. For
example, the MIT Alewife machine [1], while specialized for the shared-memory model, provides an interprocessor interrupt facility that can be used for general
message-passing. Being memory mapped, this operation
is somewhat slower than the register-based send operation described above. Dataflow machines, which once
hard-wired a particular dataflow model into the architecture [30, 34], have also been moving in the direction
of general mechanisms with the EM4 [31] and *T [28].
8 Conclusion
Two enabling technologies, fast networks (Section 4)
and efficient interaction mechanisms (Section 5), make
it possible to build and program fine-grain parallel computers. Fine-grain machines have much less memory per
processor than conventional machines because they are
balanced by cost, rather than by capacity-to-speed ratios. Increasing the processor-to-memory ratio improves
the processor throughput and local memory bandwidth
by a factor of 50 with only a small increase in system
cost.
We expect this dramatic performance/cost advantage
will lead to mechanism-based fine-grain parallel computers becoming universal, replacing sequential computers
in all sizes of systems from personal desktop computers
to institutional supercomputers. This universal parallel computer will not emerge under existing semiconductor price structures, where processor silicon is an order of magnitude more expensive per unit area than memory silicon. Cost-effective fine-grain computing requires a true jellybean (inexpensive and plentiful) processing-node chip.
Low-latency networks enable each node in a fine-grain
machine to access any memory location in the machine
in time competitive with a global memory access in
a conventional machine. Thus, the small memory per
node does not limit either the problem size that can be
handled or sequential execution speed. A fine-grain machine can execute sequential programs with performance
competitive with conventional machines.
High-bandwidth networks and efficient interaction
mechanisms enable fine-grain computers to apply their
high aggregate processor throughput and memory bandwidth with minimum overhead. Reducing interaction overhead to a few instruction times (Table 1) increases the amount of parallelism that can be economically exploited. It also simplifies programming as tasks and
data structures no longer have to be grouped into large
chunks to amortize large communication, synchronization, and task-switching overheads.
At MIT we have built and programmed the J-Machine
to test, evaluate, and demonstrate our network and mechanisms. By running three programming systems on the
machine, we have demonstrated the flexibility of its mechanisms and generated some ideas on how to improve
them. The next step is to work to commercialize this
technology by developing a more integrated and higher-performance processing node in today's technology and by providing bridges of compatibility to existing sequential software.
References
[1] Anant Agarwal et al. The MIT Alewife Machine: A
Large-Scale Distributed-Memory Multiprocessor. In
Scalable Shared Memory Multiprocessors. Kluwer Academic Publishers, 1991.
[2] David Bailey. The NAS Parallel Benchmarks. Presentation given in 1991.
[3] Shekhar Borkar et al. iWARP: An Integrated Solution
to High-Speed Parallel Computing. In Proceedings of
the Supercomputing Conference, pages 330-338. IEEE,
November 1988.
[4] Andrew A. Chien and Jae H. Kim. Planar-Adaptive
Routing: Low-cost Adaptive Networks for Multiprocessors. In Proceedings of the International Symposium
on Computer Architecture, Queensland, Australia, May
1992. IEEE.
[5] Intel Corporation. Paragon XP/S. Product Overview,
1991.
[6] David E. Culler et al. Fine-Grain Parallelism with
Minimal Hardware Support: A Compiler-Controlled
Threaded Abstract Machine. In Proceedings of the
Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 164-175. ACM, April 1991.
[7] William J. Dally. Directions in Concurrent Computing. In Proceedings of the International Conference on
Computer Design, pages 102-106. IEEE, October 1986.
Conference at Port Chester, New York.
[8] William J. Dally. Wire-Efficient VLSI Multiprocessor
Communication Networks. In Paul Losleban, editor,
Proceedings of Stanford Conference on Advanced Research in VLSI, pages 391-415. MIT Press, 1987.
[9] William J. Dally. Mechanisms for Concurrent Computing. In Proceedings of the International Conference
on Fifth Generation Computer Systems, pages 154-156,
December 1988.
[10] William J. Dally.
The J-Machine System.
In
Patrick Winston with Sarah A. Shellard, editor, Artificial Intelligence at MIT: Expanding Frontiers, chapter 21, pages 536-569. MIT Press, 1990.
[11] William J. Dally. Network and Processor Architecture
for Message-Driven Computers. In Suaya and Birtwhistle, editors, VLSI and Parallel Computation. Morgan
Kaufmann, 1990.
[12] William J. Dally. Express Cubes: Improving the Performance of k-ary n-cube Interconnection Networks. IEEE
Transactions on Computers, pages 1016-1023, September 1991.
[13] William J. Dally. Virtual-Channel Flow Control. IEEE
Transactions on Parallel and Distributed Systems, 3(2),
March 1991.
[14] William J. Dally and Hiromichi Aoki. Adaptive Routing using Virtual Channels. IEEE Transactions on Parallel and Distributed Computing, 1992.
[15] William J. Dally et al. Architecture of a MessageDriven Processor. In Proceedings of the 14th International Symposium on Computer Architecture, pages
189-205. IEEE, June 1987.
[16] William J. Dally et al. Design and Implementation of
the Message-Driven Processor. In Proceedings of the
1992 Brown/MIT Conference on Advanced Research in
VLSI and Parallel Systems. MIT Press, March 1992.
[17] William J. Dally and Paul Song. Design of a Self-Timed
VLSI Multicomputer Communication Controller. In
Proceedings of the International Conference on Computer Design, pages 230-234. IEEE, October 1987.
[18] William J. Dally and D. Scott Wills. Universal Mechanisms for Concurrency. In G. Goos and J. Hartmanis, editors, Proceedings of PARLE-89, pages 19-33.
Springer-Verlag, June 1989.
[19] William J. Dally, D. Scott Wills, and Richard Lethin.
Mechanisms for Parallel Computing. In Proceedings
of the NATO Advanced Study Institute on Parallel
Computing on Distributed Memory Multiprocessors.
Springer, 1991.
[20] J. A. Fisher and B. R. Rau. Instruction-Level Parallel
Processing. Science, pages 1233-1241, September 1991.
[21] Geoffrey Fox et al. Solving Problems on Concurrent
Computers. Prentice Hall, 1988.
[22] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, 1990.
[23] John L. Hennessy and Norman P. Jouppi. Computer
Technology and Architecture: An Evolving Interaction.
Computer, pages 18-29, September 1991.
[24] Waldemar Horwat, Andrew Chien, and William J.
Dally. Experience with CST: Programming and Implementation. In Proceedings of the ACM SIGPLAN 89
Conference on Programming Language Design and Implementation, 1989.
[25] S. Konstantinidou and L. Snyder. Chaos router: architecture and performance. In 18th Annual Symposium
on Computer Architecture, pages 212-221, 1991.
[26] InMOS Limited. IMS T424 Reference Manual. Order
Number 72 TRN 00600, November 1984.
[27] Carver A. Mead and Lynn A. Conway. Introduction to
VLSI Systems. Addison-Wesley, Reading, Mass, 1980.
[28] Rishiyur S. Nikhil, Gregory M. Papadopoulos, and
Arvind. *T: A Multithreaded Massively Parallel Architecture. Computation Structures Group Memo 325-1,
Massachusetts Institute of Technology Laboratory for
Computer Science, November 15 1991.
[29] Peter R. Nuth and William J. Dally. A Mechanism
for Efficient Context Switching. In Proceedings of the
International Conference on Computer Design. IEEE,
October 1991.
[30] Gregory M. Papadopoulos and David E. Culler. Monsoon: an Explicit Token-Store Architecture. In The
17th Annual International Symposium on Computer
Architecture, pages 82-91. IEEE, 1990.
[31] S. Sakai et al. An Architecture of a Dataflow Single
Chip Processor. In Proceedings of the 16th Annual Symposium on Computer Architecture, pages 46-53, 1989.
[32] Charles L. Seitz. The Cosmic Cube. Communications
of the ACM, 28(1):22-33, January 1985.
[33] Charles L. Seitz et al. Submicron Systems Architecture. Semiannual Technical Report Caltech-CS-TR-90-05, Department of Computer Science, California Institute of Technology, March 15, 1990.
[34] Toshio Shimada, Kei Hiraki, Kenji Nishida, and Satosi Sekiguchi. Evaluation of a Prototype Data Flow Processor of the Sigma-1 for Scientific Computations. In 13th
Annual International Symposium on Computer Architecture, pages 226-234. IEEE, June 1986.
[35] Alan Jay Smith. Cache Memories. Computing Surveys,
14(3):473-530, September 1982.
[36] Burton J. Smith. Architecture and applications of the
HEP multiprocessor computer system. In SPIE Vol.
298 Real-Time Signal Processing IV, pages 241-248.
Denelcor, Inc., Aurora, Col, 1981.
[37] Ellen Spertus and William J. Dally. Experiments with
Dataflow on a General-Purpose Parallel Computer. In
Proceedings of International Conference on Parallel
Processing, pages II231-II235, Aug 1991.
[38] Brian Totty. Experimental Analysis of Data Management for Distributed Data Structures. Master's thesis,
University of Illinois, 1991.
[39] Wolf-Dietrich Weber and Anoop Gupta. Exploring the
Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In The 16th
Annual International Symposium on Computer Architecture, pages 273-280. IEEE Computer Society Press,
1989.
[40] D. Scott Wills. Pi: A Parallel Architecture Interface
for Multi-Model Execution. PhD thesis, Massachusetts
Institute of Technology, May 1990.
An Automatic Translation Scheme from Prolog
to the Andorra Kernel Language *
Francisco Bueno
bueno@fi.upm.es

Manuel Hermenegildo†
herme@fi.upm.es or herme@cs.utexas.edu

Facultad de Informatica
Universidad Politecnica de Madrid (UPM)
28660-Boadilla del Monte, Madrid, Spain

* This work was funded in part by both ESPRIT project 2471 "PEPMA" and CICYT project 305.90.
† Please direct correspondence to Manuel Hermenegildo at the above address.
Abstract
The Andorra family of languages (which includes the
Andorra Kernel Language -AKL) is aimed, in principle,
at simultaneously supporting the programming styles of
Prolog and committed choice languages. On the other
hand, AKL requires a somewhat detailed specification
of control by the user. This could be avoided by programming in Prolog to run on AKL. However, Prolog
programs cannot be executed directly on AKL. This
is due to a number of factors, from more or less trivial
syntactic differences to more involved issues such as the
treatment of cut and making the exploitation of certain
types of parallelism possible. This paper provides basic guidelines for constructing an automatic compiler
of Prolog programs into AKL, which can bridge those
differences. In addition to supporting Prolog, our style
of translation achieves independent and-parallel execution where possible, which is relevant since this type of
parallel execution preserves, through the translation,
the user-perceived "complexity" of the original Prolog
program.
1 Introduction
A desirable goal in logic programming language design is to support both the don't-know nondeterministic, search-oriented programming style of Prolog and
the don't-care indeterministic, concurrent communicating agents programming style of committed-choice languages. Furthermore, from an implementation point
of view it is interesting to be able to support the or- and independent and-parallelism often exploited in the former (e.g. [Lus88, AK90, Kal87, HG90]) as well as the dependent and-parallelism exploited in the latter (e.g. [Cra90, IMT87, HS86]). The Andorra family of languages is aimed at simultaneously supporting
these two programming paradigms and their associated
modes of parallel execution. The Andorra proposal in
[War] (called the "basic" andorra model, on which the
Andorra-I system [SCWY90] is based) defined a framework which allowed or-parallelism and also the and-parallel execution of deterministic goals (deterministic
"stream and-parallelism"), this now being called the
"Andorra Principle."
An important idea behind the choice of control in
the basic Andorra model is to perform the least possible amount of computation while allowing the maximum amount of parallelism to be exploited. Another
and complementary way of achieving this goal which
has also been identified [HR89, HR90] is to also run in
parallel nondeterministic goals, but provided (or while)
they are independent ("independent and-parallelism", IAP). In order to also include this type of parallelism, the Extended Andorra Model (EAM) [War90, HJ90] defines an execution framework which allows IAP in addition to the forms of parallelism supported in the basic
Andorra model. The EAM defines rules which specify
a series of admissible steps of computation from each
possible given state. Several rules can be admissible
from a given state and this gives rise to both nondeterminism and indeterminism, and also to opportunities
for parallel execution. One important issue within this
framework is thus that of control: i.e. which of the admissible rules should be applied in order to achieve the
most efficient execution while attaining the maximum
parallelism.
Two obvious approaches to treating the above mentioned issue are to put control decisions in the hands
of the programmer or to try to do this automatically
by compile-time and/or run-time analysis. The Andorra Kernel Language (AKL) [HJ90, JH91] uses explicit control. In particular, AKL allows (dependent)
parallel execution of determinate subgoals, as stated
by the Andorra Principle, but it also allows the more
general forms of parallel execution of the EAM, albeit
controlled by the programmer. The specification of control is done, among other mechanisms, by positioning
the goals and constraints before or after a guard operator, in a way that can be reminiscent of the labeling
of unification as input or output (i.e. ask or tell constraints [Sar89]) in the GHC language [Ued87a]. These operators divide clause bodies into two parts, the guard and the actual body. Guards are executed in independent environments and proceed unless they attempt
to perform output unification, while bodies wait until guards are completely solved and goals in the body
promoted. Such goals are then executed concurrently
provided they are deterministic, in the spirit of the Andorra Principle. These properties give a means of control to the programmer which can be used to achieve
parallel execution of general goals.
The AKL is therefore quite a powerful language.
However, it does put quite a burden on the programmer
in requiring certain specification of control. In particular, Prolog programs cannot always be executed directly on the AKL. This is due to a number of factors
from more or less trivial syntactic differences to mor~
involved issues such as the treatment of cut, labeling
of unification, and making the exploitation of certain
types of parallelism, most notably lAP, possible without user involvement and preserving the programmerperceived complexity of the original program.
The objective of this paper is to investigate how the
above mentioned differences can be bridged, through
program analysis and transformation. It points out
the non-trivial problems involved in performing such
a translation, and then provides solutions for these
problems. Although desirable, our aim at this point
is not to provide the best possible translation, which
would take advantage of AKL properties to achieve a
large reduction of search space, but rather to bridge
the gap between Prolog and AKL in a manner that no
increment in the search space is done, and also lAP
can be exploited (with the important result of achieving "stability" in the frame of AKL for these cases).
Building on partial translation approaches presented
in [JH90, Her90] the paper presents a basic algorithm
for constructing a translator from Prolog to AKLI.
An important feature of the translation approach proposed herein is that it automatically detects and allows
the parallel execution of independent goals (as well of
course as or-parallelism, and the parallel execution of
deterministic goals even if they are dependent as per
the Andorra Principle). The execution of independent
goals in parallel has the very desirable properties of preserving the program complexity perceived by the programmer [HR89]. Important requirements for such a
translation are the compile-time detection of goal independence and input/output modes. This requires in
general a global analysis of the program, perhaps us1 Veda [Ved87bj proposed automatic translation from Prolog
to a committed-choice language (GHC, in his case). However,
our aim and target language are quite different.
ing abstract interpretation. In the approach proposed
herein heavy use will be made of our compile-time tools,
developed in. the context of &-Prolog [HG90j. In particular, Prolog programs are first analyzed and annotated
as &-Prolog programs (thus making goal independence
explicit), and then they are translated into AKL.
In the following section, the AKL control model and
its rules are briefly reviewed together with some syntactic conventions. Then transformations for Prolog
constructions for a basic translation are presented in
section 3, and some rules for combining the AKL model with our goal of achieving independent parallelism are shown in section 4. Section 5 will present the
analysis tools and why they are needed in the translation process. In section 6 some results are shown for
the execution of a number of benchmarks automatically
translated, and section 7 presents some conclusions.
2 The Andorra Kernel Language Revisited
In this section we present a brief overview of the AKL model of execution, in order to make the paper self-contained. The purpose is, based on an understanding of this model, to extract the correct rules for a translation of Prolog which achieves the desired results. AKL and its model of execution have been fully described in [JH91, HJ90].
AKL is a language with deep guards. Thus, clauses
are divided into two parts: the guard and the body, separated by a guard operator. Guard operators are: wait
(:), cut (!), and commit (|). The following syntactical
restrictions apply:
• Each clause is expected to have one and only one
guard operator;
• All clauses in the definition of a predicate have to
be guarded by the same guard operator. So, if any
of the clauses is not guarded, the guard operator
of its companions is assumed and positioned just
after the clause neck.
• A wait operator is assumed, in the above-mentioned position, where no other operator can be assumed using the above rules.
Guards are regarded as part of clause selection. This
means that a clause body is not entered unless head
unification succeeds and its guard is completely solved.
Then, execution proceeds by "expansion" of the present
configuration by application of a rule of the computation model. The rules in the AKL model allow rewriting
of configurations (states) leading to valid configurations
from valid ones. They are fully described in [JH91], so
we will simply enumerate them, providing very informally the concept behind the rule, rather than a precise
definition:
1. Local forking: unfolds an atomic goal into a choice of all the alternatives in its definition (but without creating "copies"2 yet of continuation goals).
2. Nondeterminate promotion: promotes one guarded
goal with solved guard in a choice of several of them
(i.e. copies the goal to the parent continuation, applying its constraint/substitution to it, and creates
a "copy" of the continuation environment).
3. Determinate promotion: special case of the above
when there is a single guarded goal in a choice if
its guard is solved (no copying of the continuation
environment is necessary).
4. Failure and synchronization rules: remove or fail
configurations in the usual way.
5. Pruning rules: handle the effects of pruning guard
operators.
6. Distribution and bagof rules: do the distribution of
guards and the bagof operation.
These rules basically represent the allowable transitions of the EAM. The last three rules are less relevant
for our purposes. In addition to these rules there are
three basic control restrictions in the general computation model (meta-rules) which control the application
of the above rules and which are highly relevant to our
independent style translation:
• Pruning in AKL has to be quiet, that is, a solution
for the guard of a cut or commit guarded clause
may not further restrict (or constrain) variables
outside its own configuration.
• Goals in the guard of a clause are completely and
locally executed. This means that execution of
guards is simultaneous but independent of the parent environment.
• Nondeterminate promotion is only admissible
within a stable subgoal of a configuration. A goal is
stable if no rule is applicable to any subgoal, and
no possible changes in its environment will lead to
a situation in which a rule is applicable in the goal.
As we shall soon see, these three restrictions force the
conditions under which translation has to be done if we
want to achieve parallelism and correct pruning in the
translated clauses. But first, we will illustrate the AKL
execution model with a simple example:
partition([],_,Left,Right):- !,
        Left = [],
        Right = [].
partition([E|R],C,Left,Right):-
        E < C, !,
        Left = [E|Left1],
        partition(R,C,Left1,Right).
partition([E|R],C,Left,Right):-
        E >= C, !,
        Right = [E|Right1],
        partition(R,C,Left,Right1).

2 Although we refer to "copying" throughout the paper, part of the continuation goals could in principle be shared [War90].
For a query such as partition([2,1],3,L,R) the initial configuration would be a choice-point with the three clauses for the predicate. Head unification would fail the first alternative ([] = [2,1]), but the second one would succeed ([E|R] = [2,1], C=3). E >= C (i.e. 2>=3) would be executed (and failed) only after promotion. After failure of this branch, determinate promotion of the remaining one would be applicable, and execution would proceed as before.
3 Translating Prolog Constructions
Having the aforementioned rules in mind, we now discuss transformation rules for translating basic Prolog
constructions, disregarding any possible exploitation of IAP. Even this straightforward step is nontrivial, as we
lAP. Even this straightforward step is nontrivial, as we
shall soon see. This is due mainly to the semantics of
cut in both Prolog and AKL, cut being a guard operator in the latter. With the restrictions required for
guard operators to achieve both syntactic and semantic
correctness in AKL, we find problems in the following
constructions:
• syntactical restrictions:
- definitions of predicates in which a pruning
clause appears,
- clauses in which more than one cut appears;
• semantic restrictions:
- if-then-elses, where the cut has a "local" pruning effect,
- pruning clauses where the cut is regarded as
noisy (i.e. attempts to further restrict variables outside its scope),
- side-effects and meta-logical predicates, which
should be sequentialized.
The transformations required to deal with these constructions are proposed in the following subsections.
This is done mainly through examples. The aim is thus
not to provide precise and formal definitions of program
transformations but rather to provide the intuition behind the process of translation. In subsequent sections
we will discuss other issues involved in the process of
translation, such as achievement of IAP, problems in
this, and its relation with the AKL stability conditions.
3.1 Direct translation
First, as all AKL clauses in a definition are forced to
have the same guard operator, we have to ensure this
is achieved. For example:
Example 1   Same guard operator in a definition

% Original Prolog:
p(X,Y):- q(X), r(Y).
p(X,Y):- test(X), !,
         output(Y).
p(X,Y):- a(X,Y).

% AKL translation:
p(X,Y):- q(X), r(Y).
p(X,Y):- pc(X,Y).
pc(X,Y):- test(X), !,
          output(Y).
pc(X,Y):- a(X,Y).
Note that clauses before the pruning one will have an
(assumed) wait operator and clauses after that one (and
that one itself) will have an (assumed) cut operator.
All of them but the pruning one have an empty guard.
Note that, had the program not been rewritten, the
rules for assuming guard operators would have put a
cut operator in the first clause, which is obviously not
the correct translation.
Note also that only one guard operator is allowed in a clause. Therefore repeated cuts in the same
body (which are otherwise strongly discouraged as a
matter of style and declarativeness) have to be "folded"
out using the technique sketched below:
Example 2   Single guard operator in a clause

% Original Prolog:
p(X,Y):- test(X), !,
         test(Y), !,
         accept(X,Y).

% AKL translation:
p(X,Y):- test(X), !,
         foo(X,Y).
foo(X,Y):- test(Y), !,
           accept(X,Y).
Second, the AKL cut operator is regarded as a guard
operator, and, furthermore, it has to be quiet (which is
not the case in some Prolog constructions, which cannot be easily translated to AKL). One of them is local
pruning, i.e. if-then-else. Indeed, an if-then-else can be
viewed as a disjunction containing a cut whose scope is
limited to the disjunction itself, rather than the clause
in which it appears. Thus the following preprocessing
can be done:
Example 3   Local pruning of if-then-else

% Original Prolog:
p(X):- (cond(X) ->
            q(X,Y)
        ;   r(X,Z)
       ), s(Y,Z).

% AKL translation:
p(X):- foo(X,Y,Z), s(Y,Z).
foo(X,Y,_):- cond(X), !,
             q(X,Y).
foo(X,_,Z):- r(X,Z).
Last but not least, we have to ensure the quietness
of all AKL cuts. A cut is quiet if it does not attempt
to bind variables which are seen from outside its own
scope, that is, the clause where they appear. Then,
if this is not the case, we have to make that binding
explicit in the form of an equality constraint (a unification) and place it after the cut itself, i.e. outside the
guarded part of the clause:
Example 4   Making a cut quiet

% Original Prolog:
p(X,Y):- test(X),
         output(Y), !.
p(X,Y):- s(X,Y).

% AKL translation:
p(X,Y):- test(X),
         output(Y1), !,
         Y1=Y.
p(X,Y):- s(X,Y).
Note that knowledge of input/output modes of variables is required for performing this transformation,
and that the transformation may not always be safe3.
This will be discussed in the following subsection.
3.2 Noisiness of cut
The main difference between cut in Prolog and cut in
AKL is that cut is quiet in AKL4. "Quiet" in the context
of a cut means that the solution of the cut's guard is
quiet, that is, it does not add constraints to variables
outside the guarded goals themselves, other than those
which already appear in its environment.
Indeed, a transformation such as the one proposed in Example 4 of section 3.1 can make a noisy cut quiet. What
it does is to delay output unification until the guard is
promoted by making it explicit in the body part of the
clause. We regard a variable to be output in a query
if execution for this query will further constrain it; a
variable will be regarded as input if execution will depend on its state of instantiation (or constraint). In
other words, a variable is an output variable in a literal
if it is further instantiated by the query this literal represents; it is an input variable if it makes a difference for the execution of the literal whether the variable is instantiated or not5. Note that a given variable can be
both input and output, or none of them.
3 Note also that this transformation, when safe, may be of advantage as well in standard Prolog compilers, in order to avoid trailing overhead.
4 Nevertheless, a noisy cut has also been implemented in AKL, which we will discuss later.
5 These definitions are similar to those independently proposed in [SCWY91] (and also in the spirit of those of Gregory [Gre85]), which describes translation techniques from Prolog to Andorra-I, an implementation of the Basic Andorra Model. Although the techniques used in such a translation have some relationship with those involved in Prolog-AKL translation, the latter requires in practice quite different techniques due to AKL being based on the Extended Andorra Model (thus having to deal with the possibility of parallelism among non-deterministic goals and the stability rules) and the rather different way in which the control of the execution model (explicit in AKL and implicit in Andorra-I) is done in each language.
The objective of a transformation such as the one
proposed is to rename apart all output variables in the
head of a pruning clause, and then bind the new variables to the original ones in the body of the clause, leaving input variables untouched. In general, it is unwise to rename apart input variables since, from their own definition, this renaming would make the variable appear uninstantiated and potentially result in growth in the search space of the goals involved. This would not meet our objective of preserving the complexity of the program (and perhaps not even that of preserving its semantics). However, since a variable can be both input and output, a conflict between renaming and not-renaming requirements appears in such cases. For these cases, in which a variable cannot be "moved" after the cut guard operator, a real noisy cut is needed. This operator exists in AKL (!!), together with a sequentialization operator, the sequential conjunction (&). It is necessary that every noisy cut be sequentialized, to ensure that pruning occurs in the same context that it would in Prolog. Thus, every literal call
to the pruning predicate has to be sequentialized to its
right, and every other call to a predicate sequentialized
has in turn to be also sequentialized. For this reason
noisy cut is not very efficient, and thus the translation
tries to minimize its use.
At this point we can summarize the action that
should be taken in every case to transform the pruning clauses of a Prolog program, based on the knowledge of input/output variables, that is, whether they
are "tested" or not and further instantiated or not.
Here we use "noisy" to mean the transformation that
defaults to the AKL noisy cut, and "move" to refer to
the renaming of variables, as in Example 4 of section 3.1.
    Further Instantiated?   Tested?    Action
    yes                     yes        noisy
    yes                     no         move
    yes                     unknown    user
    no                      yes        none
    no                      no         none
    no                      unknown    none
    unknown                 yes        user
    unknown                 no         move
    unknown                 unknown    user
Note that the knowledge of input/output modes in
the Prolog program that is assumed in this transformation requires in general a global analysis of the program
and can only be approximated, the translator having
to make conservative approximations or warn the user
("user" cases above) when insufficient information is
available. Note also that the "user" cases can be replaced by "noisy" cases if a non-interactive transformation is preferred. This subject will be discussed further
in section 5, as well as the type of analysis required.
3.3 Synchronization of side-effects
In general, the purpose of side-effect synchronization is
to prevent a side effect from being executed before other
preceding (in the sense of the sequential operational semantics) side-effects or goals, in the cases when such
adherence to the sequential order is desired. In our
context, if side-effects are allowed within parallel AKL
code and a behaviour of the program identical to that
observable on a sequential Prolog implementation is to
be preserved, then some type of synchronization code
should be added to the program. In general, in order
to preserve the sequential observable behaviour, side-effects can only be executed when every subgoal to their left has been executed, i.e. when they are "leftmost"
in the execution tree. However, a distinction can be
made between soft and hard side-effects (a side-effect is
regarded to be hard if it could affect subsequent execution); see [DeG87] and [MH89]. This distinction allows
more parallelism. It is also convenient in this context to
distinguish between side-effect built-ins and side-effect
procedures, i.e. those procedures that have side-effects
in their clauses or call other side-effect procedures.
To achieve side-effect synchronization, various
compile-time methods are possible:
• To use a chain of variables to pass a "leftmost token", taking advantage of the suspension properties of guards to suspend execution until arrival of
the token [SCWY91].
• To use chains of variables as semaphores with some
compact primitives that test their value. In [MH89]
a solution was proposed along such lines, and its
implementation discussed.
• To use a sequentialization built-in to make the side-effect and the code surrounding it wait; this primitive would be in our case the sequentialization operator "&".
In the first solution, a pair of arguments is added
to the heads of relevant predicates for synchronization.
Side-effects are encapsulated in clauses with a wait (:)
guard containing an "ask" unification of the first argument with some known value (token), to be passed
by the preceding side-effect upon its completion. Upon
successful execution of the current side-effect the second argument is bound ("tell") to the known value and
the token thus passed along. This quite elegant solution
can be optimized in several cases.
The second solution can be viewed as an efficient
implementation of the first one, which allows further
optimization [MH89]. The logical variables which are
passed to procedures in the extra arguments behave as
semaphores, and synchronization primitives operate on
the semaphore values.
In the third solution, every soft side-effect is synchronized to its left with the sequentialization operator "&", and every hard one both to its left and right.
This sequentialization is propagated upwards to the
level needed to preserve correctness. This introduces
some unnecessary restrictions to the parallelism available. However, if side-effects appear close to the top of
the execution tree, this may be quite a good solution.
4 Stability and Achievement of Independent And-Parallelism
In order to achieve more parallelism than that available
by the translations described so far one might think of
translating Prolog into AKL so that every subgoal could
run in parallel unrestricted. However, this can be very
inefficient and would violate the premise of preserving
the results and complexity of the computation expected
by the user. On the other hand, and as mentioned
before, parallel execution of independent goals, even if
they are nondeterministic, is an efficient and desirable
form of parallelism and its addition motivated the development of the EAM, on which the AKL is based.
Nevertheless, in AKL goals known to be independent
have to be explicitly rewritten in order to make sure
that they will be run in parallel. This is because of the
rules that govern the (nondeterminate) promotion, that
is, the stability condition on nondeterminate promotion, which will prevent these goals from being promoted
if they try to bind external variables for output. Therefore, one important issue is the transformation that is
needed to avoid suspension of independent goals. This
is presented in section 4.1. Also, independence detection can and will be used to reduce stability checking,
a potentially expensive operation.
Clearly, an important issue in this context is how
stability/goal independence is detected. In the framework of the &-Prolog system we have already developed
technology and the associated tools for determining independence conditions for goals and partially evaluating many of those conditions at compile-time through
program analysis. Conceptual models for independent
and-parallel execution have been presented and their
correctness and efficiency proved [HR89]; among all of these we focus on the and-parallelism models proposed in [HR90, HR89]. For different but related models the
reader is referred to the references in those papers. As
mentioned before, in the translation process we propose to use algorithms and tools already developed in
the context of &-Prolog. In this context, a series of algorithms used in the &-Prolog compiler for annotating
Prolog programs have been implemented and described
in [MH90]. These algorithms select goals for parallel execution and, using the sufficient rules proposed
in [HR89], generate the conditions under which inde-
pendence is achieved and therefore independent parallel execution ensured. The result is a transformation of a given Prolog clause into an &-Prolog clause containing parallel expressions which achieve such independent
and-parallelism.
The output of this analysis is made available for
the translation process in the form of an annotated
&-Prolog program [HG90], i.e. the program itself expresses which goals are independent and under which
conditions. These conditions are expressed in the form
of if-then-elses which have the intuitive meaning of "if
the conditions hold then run in parallel otherwise sequentially." The parallelism itself is made explicit by
using the "&" operator to denote parallel conjunction
instead of the standard sequential conjunction denoted
by "," 6. Some new issues are involved in the interaction
between the conditions of these parallel expressions and
other goals run in parallel concurrently, as it would be.
the case in AKL. These will be presented in section 4.2.
4.1 The transformation proposed
At this point the &-Prolog conditionals are regarded as
input to the translator. As such, if-then-elses are preprocessed in the form mentioned in the previous sections and the remaining issue is the treatment of the
parallelization operator "&". In implementing this operator we will use the AKL property that allows local
and unrestricted execution of guards, i.e., goals that are
encapsulated in a guard can run in parallel with goals
in other guards even if they are nondeterministic. The
transformation that takes advantage of this will:
• put goals known to be independent in (different)
guards, and
• extract output arguments from the guards, binding
them in the body part of the clauses,
the last step being required so that the execution of
these goals is not suspended because of their attempting to perform output unification. With the guard encapsulation we ensure that those predicates will be executed simultaneously and independently. The following
example illustrates the transformation involved:
Example 5   Encapsulation of independent subgoals

% &-Prolog (annotated) source:
p(X):- (ground(X), indep(Y,Z) ->
            q(X,Y) & r(X,Z)
        ;   q(X,Y), r(X,Z)
       ),
       s(Y,Z).

% AKL translation:
p(X):- pp(X,Y,Z), s(Y,Z).
pp(X,Y,Z):- ground(X),
            indep(Y,Z), !,
            qp(X,Y),
            rp(X,Z).
pp(X,Y,Z):- q(X,Y), r(X,Z).
qp(X,Y):- q(X,Y1) : Y=Y1.
rp(X,Z):- r(X,Z1) : Z=Z1.
6 Note that in AKL these operators have just the opposite meaning!
When the condition is met, both subgoals will be
tried by the local fork rule, then both guards will be
completely and locally solved, and then, as goals are
independent on X (X is ground) and no output is produced in the guard, the nondeterminate promotion rule
is always applicable and all solutions will be tried in
the standard cartesian product way. Thus, parallel execution is ensured for those goals that are identified as
independent.
On the other hand, when the condition fails (the goals
being dependent) they appear together in a body with
an empty guard. This means that the guard will be immediately solved, the clause body promoted, and subgoals tried simultaneously. Then the standard stability
and promotion rules will apply.
It should be noted that, as in the case of cut,
and in addition to detecting goal independence, to be
able to perform this transformation it is necessary to
have inferred mode information regarding the predicate
clauses. In section 5 techniques used in order to infer
this information will be reviewed.
4.2 Cohabitation of dependent and independent and-parallelism and stability checks
When evaluating the conditions of parallel expressions
at run-time within a parallel framework such as that
of the AKL, they may not evaluate to the same value as during a Prolog execution. This is what we have
termed in another context the CGE-condition problem
[GSCYH91]7, and may result in a loss (or increase) of
parallelism. To deal with these issues, different levels
of restrictions can be placed on the translation:
• Disallow any parallel execution except for those
goals found to be independent.
• Allow parallel execution only for goals not binding
variables that appear in the conditions or CGE.
• Allow parallel execution outside a CGE but sequentialize before and after the conditional parallel
expressions.
• Allow unrestricted parallel execution, i.e. no sequentialization is to be done.
The first solution can be implemented by translating
every conjunction as a sequential AKL conjunction, except those joining independent goals. This will lead to
a type of execution where only goals known to be independent are run in parallel and which directly resembles that of &-Prolog [RG90]. The same search space
as &-Prolog will be explored. Nondeterminate (and determinate) promotion will then be restricted to only
independent and sequential goals. Thus, one very important advantage of this translation is that no checks
on stability ever need to be done, as stability is ensured
for sequential and independent execution. This is an
important issue since stability checking is a potentially
expensive operation (and very closely related to independence checking). Thus, in an ideal AKL implementation code translated as above, i.e. free of stability
checks, should run with comparable efficiency to that
of &-Prolog. On the other hand, the transformation
loses determinate dependent and-parallelism and its desirable effect of co-routining, which could be useful in
reducing search space [SCWY90].
The second solution attempts to preserve the environment in which the CGE evaluates while allowing coroutining of goals that don't affect CGE conditions and
goals. Although interesting, this appears quite difficult
to implement in practice as it requires very sophisticated compile-time analysis and will probably incur
run-time overheads for checking of the conditions placed
in the program.
The third solution can be viewed as a relaxation of
the first one to achieve some coroutining, or as an efficient (and feasible) way of partially implementing the
second one. Goals before and after are allowed to execute in parallel using the Andorra Principle, but they
are sequentialized just before and after a CGE. In this
way CGEs evaluate in the same context as in Prolog
and the same level of independent and-parallelism is
achieved. This translation has the good characteristics
regarding search space of the previous one. In addition, some reduction of search space due to coroutining
will be achieved. However, stability checking, although
reduced, cannot in general be eliminated altogether.
The fourth solution will allow every goal to run in
parallel. The full EAM and AKL operational semantics (including stability) has to be preserved. The
execution of goals which are unconditionally independent or depend only on groundness checks (conditionals
in the parallel expressions are composed of ground/1
and indep/2 checks, as in the example of section 4.1)
will be the same as in &-Prolog as eager execution
of other goals cannot affect ground or empty checks
[GSCYH91]. However, independence checks may fail
where they wouldn't in Prolog (therefore losing this
parallelism), but also succeed where they would fail in
Prolog (therefore gaining this parallelism). Also, the
number of parallel steps will always be the same as or fewer than in Prolog (although different from &-Prolog). This solution, as well as the first and second ones, appears to be a quite reasonable compromise, and these options offer different trade-offs. The current translation approach uses this fourth
option, but the others should also be explored.
5 Inferring modes - Abstract Interpretation
We have mentioned in previous sections the need for
inferring modes of clause variables (i.e. whether they
are input or output variables) in Prolog programs. The
main reason for this need is that we have to know which
are the output variables in a clause in order to rename
them apart and place corresponding bindings for them
in the body part of the clause in both
• the pruning clauses (as shown in section 3.2), and
• the remade clauses for parallel execution (as shown
in section 4.1 in example 5).
Much work has been done in global analysis of logic
programs to infer run-time properties, and, in particular, modes, mostly using the technique of abstract interpretation [CC77]. A more sophisticated sort of variable binding analysis (comprising groundness, aliasing,
and freeness information) is instrumental in the process of inferring the independence conditions for literals in a body. While not strictly needed, such an
analysis is extremely useful as it allows the reduction of the number of conditions and therefore the improvement of performance by reducing run-time checking [WHD88, MH91b] (these papers provide references
to the important body of other work in this area).
The standard global analyzer in the &-Prolog compiler,
described in [MH91b], infers groundness and variable sharing/aliasing. Since variable freeness is also needed
for the AKL translator, this analyzer has been extended
to use the algorithm described in [MH91a] and infer
variable freeness information.
It turns out that freeness information is very useful
for many reasons [MH91a]. In the translation process
it is essential for determining input/output arguments.
This we can show by simply expressing the information
required for the table in section 3.2 in terms of information directly available from abstract interpretation.
In order to do this, recall, as defined in section 3.2, that
a program variable (or an argument) is output in a literal if the call to the corresponding predicate further
instantiates this variable, and it is input in a literal if
its state of instantiation is going to be checked in the
execution of the call for that literal. With these definitions in mind the following table shows how the input
or output character of variables can be decided in a
good number of cases based on the information directly
available from global analysis:
    Before    After       Output?    Input?
    -------   ---------   --------   -------
    ground    (ground)    no         *
    free      free        no         *
    free      semi        yes        no
    free      ground      yes        no
    semi1     semi1       no         *
    semi1     semi2       yes        ?
    semi1     ground      yes        ?

From the table we identify cases in which it is clear that the variable is known not to be an input variable, without any further analysis (i.e. when the variable is free). Furthermore, we realize that if a variable is
known not to be an output variable then it doesn't need
to be renamed apart and it is not necessary to determine whether it is an input variable or not ("*" cases).
Reducing the number of cases where it must be known whether a variable is an input variable is quite useful, since inferring whether a variable binding is needed or not requires additional analysis ("?" cases). This analysis seeks to decide if a variable is
crucial in clause selection or checking. Note that the
analysis has to be extended for every child procedure of
the one being analyzed.
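For illustration only (the clause and the analysis results below are hypothetical, not taken from the benchmarks), suppose global analysis tells us that in the clause

    p(X, Y) :- q(X, Y), r(Y).

X is ground and Y is free before the call to q/2, and Y is ground after it. By the table, Y goes from free to ground in the literal q(X,Y), so it is an output (and not an input) variable of that literal and would be renamed apart and bound in the body when the clause is remade as in Example 5; X, being ground both before and after, is not an output variable and so needs no renaming and no further input analysis (a "*" case).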
Finally, we would like to also mention that combining
mode/type analysis (such as the one used in [SCWY91]
or [Jan90]) with the accurate tracking of sharing and
freeness information of [MH91a] could be very helpful
in this process (improving the ability to more accurately
resolve different degrees of partial instantiation such as
the semi1/semi2 cases in the table above) and is part
of our plans for future work.
6 Performance Timings
This section presents some results on the timing of a
number of benchmarks in a prototype AKL system.
The AKL versions of the programs obtained through
automatic compile-time translation are compared with
versions specifically written for AKL. Timings for the
original Prolog versions are also included for comparison and also with the intention of identifying translation paradigms that help efficiency. With this aim
in mind, the set of benchmarks has been chosen so
that performance results are obtained for several different programming paradigms, and a number of different
translation issues are taken into account. The results
show that translation suffices in most cases, provided
state-of-the-art analysis technology is used.
Timings 8 have been done for the Prolog program
(compiled and interpreted), the AKL program obtained
from automatic translation and the "hand-written AKL" version. Execution until the first solution is obtained has been measured. Timings are an average of
ten consecutive executions done after a first one (not
timed) and are given in milliseconds, rounded up to
tens.
8 SICStus 1.8 and a sequential AKL 0.0 prototype system,
made available by SICS, have been used.
We briefly introduce the programming paradigms
represented by each of the benchmarks used. qsort has
been translated in two ways, one that "folds" pruning
definitions, and another one that is able to "extend" the
cut to all clauses; the latter showing an advantage w.r.t.
the former. sort illustrates the advantage of being able
to detect that some cuts are not noisy (as opposed to
defaulting to noisy cut in every case). In fact, in this
case the translated version is slightly faster than the
hand-coded one.
For money we have used three different versions. In
the first version of the program the problem is solved
through extensive backtracking. In the second one the
ordering of goals is improved at the Prolog level. In
the third version the Prolog builtins are translated into
AKL specific ones. As in zebra the difference with the
"hand-written" version is in the use of the arithmetic
predicates: addition is programmed in the hand-coded
AKL version as illustrated by the sum/3 predicate,
sum(X,Y,Z):- plus(X,Y,Z0) | Z = Z0.
sum(X,Y,Z):- minus(Z,Y,X0) | X = X0.
sum(X,Y,Z):- minus(Z,X,Y0) | Y = Y0.
in which the coroutining effect provides a "constraint
solving" behaviour.
Scanner is a program where AKL can take a
large advantage from concurrent execution and the
"determinate-first" principle, even without explicit control, and this is shown in the good performance of the
translated program. On the other hand, in triangle
and knights heavy use of special AKL features has been
made, through hand-optimization.
    (All times in milliseconds.)

    Benchmark    Prolog       Prolog        AKL           AKL
                 compiled     interpreted   translated    "hand"
    qsort1       30           290           750           290
    qsort        30           290           290           290
    sort         20           50            870           910
    money1       66,590       520,190       294,370       530
    money        47,790       391,190       294,070       530
    moneyb       47,790       391,190       187,920       530
    zebra        8,550        43,740        10,380        1,980
    scanner      1,407,450    8,838,000     540           120
    triangle     3,140        7,260         152,230       11,020
    knights      79,960       855,049       1,165,020     480

    Benchmark    Prolog       Prolog        AKL translat.  AKL translat.
                 compiled     interpreted   (encap.)       (direct)
    qsort        30           290           290            290
    matrix       50           400           610            690
    hanoi        10           50            70             310
    query        70           340           370            1,600
    maps         90           540           140            2,240
In matrix, hanoi, query, and maps (and also qsort),
encapsulation of different programming paradigms has
been tried. The results show that encapsulating independent goals which are deterministic provides no improvement, but performance improves when they are
nondeterministic. Performance also improves in the
case of goals which act in producer/consumer fashion
(maps). These results suggest that AKL control similar to that of hand-coded versions can be imposed automatically for paradigms other than independence of
goals.
The automatic transformation achieves reasonably
good results when compared to code specifically written for AKL, provided one takes into account that the
starting point is a Prolog program with little specification of control, and it is being compared to an AKL
program where control has been greatly optimized by
the programmer. The examples where the largest differences show are those in which the control imposed
by hand in the AKL program changes the complexity
of the algorithm, generally through smart use of suspension (as in the sum/3 predicate), something that
the transformation cannot yet do automatically. However, the results also show that it would obviously be
desirable to extend the translation algorithms towards
implementing some of the smart forms of control that
can be provided by an AKL programmer.
When comparing with Prolog, both the interpreted
and compiled Prolog figures should be considered, as
the AKL system prototype used is somewhere in between a compiler and an interpreter. The results show that a variable performance improvement
can be obtained whenever determinism is significant in
the problem (this is quite spectacular in scanner). Also,
the encapsulation transformation can help efficiency in
some cases. In any case the figures are of course preliminary and a more exhaustive study should clearly be
done after improvements in the translation prototype
and the AKL system, and also when an actual parallel
AKL system is available.
7 Conclusions
We have presented an algorithm for translating Prolog into AKL which in addition achieves independent
and-parallel execution of appropriate goals. We have
pointed out a series of non-trivial problems associated
with such a translation and proposed solutions for them
based on existing global analysis technology. We have
shown how to take advantage both of the AKL execution model (the Extended Andorra Model) and the
independence analysis performed in the context of &-Prolog to produce a translation that allows the exploitation of all the forms of parallelism present in AKL
(dependent-and, independent-and, and or-parallelism)
while offering the user the familiar Prolog (or, in general, logic with minimal control) view (and debugging
ease!). Most importantly, this is done while preserving
or improving the user-perceived complexity of the program. The transformation is relevant even in the case
of a sequential AKL implementation since the reduction of stability checking which follows from knowledge
of goal independence can already be of significant advantage, given the expected cost of stability tests. In
the case of a parallel AKL implementation the transformation amounts to a form of automatic parallelization
and search space reducing implementation for Prolog
programs which exploits the EAM, and imposes a particular form of control on it.
A sequential AKL implementation is already being
developed at SICS with a first prototype already running. The translator itself is also being implemented
and a preliminary version is already integrated with
the &-Prolog system compilation tools. The combination has been tested and some sample programs executed successfully on AKL, and compared with their
specific AKL counterparts. Further work is expected
in the translator as better translation algorithms are
developed to take more specific advantage of the AKL
control facilities, in particular coroutining, in more accurately detecting input and output variables, in adapting the algorithms to possible evolutions of the AKL, in
evaluating the performance of the translated programs
with respect to Prolog, and in the formal proof of the
correctness of the transformation and its preservation
of user expected computation size, the latter point being supported already in part by the basic results on
independent and-parallelism.
Acknowledgements

The authors would like to thank Seif Haridi, Sverker Jansson, Johan Montelius, and Mats Carlsson of SICS, and David H.D. Warren, Vitor Santos Costa, and Gopal Gupta of U. of Bristol for many useful discussions. Also thanks to SICS for making the prototype AKL implementation available for experimentation. This work has been performed in the context of the ESPRIT "PEPMA" project and has greatly benefited from discussions with other members of the partner institutions, most significantly from SICS, U. of Bristol, and U.P. Madrid.

References

[AK90] K.A.M. Ali and R. Karlsson. The Muse Or-Parallel Prolog Model and its Performance. In 1990 North American Conference on Logic Programming. MIT Press, October 1990.

[CC77] P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Conf. Rec. 4th ACM Symp. on Prin. of Programming Languages, pages 238-252, 1977.

[Cra90] Jim Crammond. Scheduling and Variable Assignment in the Parallel Parlog Implementation. In 1990 North American Conference on Logic Programming. MIT Press, 1990.

[DeG87] D. DeGroot. Restricted AND-Parallelism and Side-Effects. In International Symposium on Logic Programming, pages 80-89. San Francisco, IEEE Computer Society, August 1987.

[Gre85] S. Gregory. Design, Application and Implementation of a Parallel Logic Programming Language. PhD thesis, Imperial College of Science and Technology, London, England, 1985.

[GSCYH91] G. Gupta, V. Santos-Costa, R. Yang, and M. Hermenegildo. IDIOM: A Model Integrating Dependent-, Independent-, and Or-parallelism. Technical report, University of Bristol, March 1991.

[Her90] M. Hermenegildo. Compile-time Analysis Requirements for the Extended Andorra Model. In Sverker Jansson, editor, Parallel Logic Programming Workshop, Box 1263, S-163 13 Spanga, SWEDEN, June 1990. SICS.

[HG90] M. Hermenegildo and K. Greene. &-Prolog and its Performance: Exploiting Independent And-Parallelism. In 1990 International Conference on Logic Programming, pages 253-268. MIT Press, June 1990.

[HJ90] S. Haridi and S. Janson. Kernel Andorra Prolog and its Computation Model. In Proceedings of the Seventh International Conference on Logic Programming. MIT Press, June 1990.

[HR89] M. Hermenegildo and F. Rossi. On the Correctness and Efficiency of Independent And-Parallelism in Logic Programs. In 1989 North American Conference on Logic Programming, pages 369-390. MIT Press, October 1989.

[HR90] M. Hermenegildo and F. Rossi. Non-Strict Independent And-Parallelism. In 1990 International Conference on Logic Programming, pages 237-252. MIT Press, June 1990.

[HS86] A. Houri and E. Shapiro. A Sequential Abstract Machine for Flat Concurrent Prolog. Technical Report CS86-20, Dept. of Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel, July 1986.

[IMT87] N. Ichiyoshi, T. Miyazaki, and K. Taki. A Distributed Implementation of Flat GHC on the Multi-PSI. In Fourth International Conference on Logic Programming, pages 257-275. University of Melbourne, MIT Press, May 1987.

[Jan90] G. Janssens. Deriving Run-time Properties of Logic Programs by means of Abstract Interpretation. PhD thesis, Dept. of Computer Science, Katholieke Universiteit Leuven, Belgium, March 1990.

[JH90] S. Janson and S. Haridi. Programming Paradigms of the Andorra Kernel Language. Technical Report, PEPMA Project, SICS, Box 1263, S-164 28 KISTA, Sweden, November 1990. Forthcoming.

[JH91] Sverker Janson and Seif Haridi. Programming Paradigms of the Andorra Kernel Language. In 1991 International Logic Programming Symposium, pages 167-183. MIT Press, 1991.

[Kal87] L. Kale. Parallel Execution of Logic Programs: the REDUCE-OR Process Model. In Fourth International Conference on Logic Programming, pages 616-632. Melbourne, Australia, May 1987.

[Lus88] E. Lusk et al. The Aurora Or-Parallel Prolog System. In International Conference on Fifth Generation Computer Systems. Tokyo, November 1988.

[MH89] K. Muthukumar and M. Hermenegildo. Efficient Methods for Supporting Side Effects in Independent And-parallelism and Their Backtracking Semantics. In 1989 International Conference on Logic Programming. MIT Press, June 1989.

[MH90] K. Muthukumar and M. Hermenegildo. The CDG, UDG, and MEL Methods for Automatic Compile-time Parallelization of Logic Programs for Independent And-parallelism. In 1990 International Conference on Logic Programming, pages 221-237. MIT Press, June 1990.

[MH91a] K. Muthukumar and M. Hermenegildo. Combined Determination of Sharing and Freeness of Program Variables Through Abstract Interpretation. In 1991 International Conference on Logic Programming. MIT Press, June 1991.

[MH91b] K. Muthukumar and M. Hermenegildo. Compile-time Derivation of Variable Dependency Using Abstract Interpretation. Journal of Logic Programming, 1991. To appear (also published as Technical Report FIM 59.1/IA/90, Computer Science Dept., Universidad Politecnica de Madrid, Spain, Aug 1990).

[Sar89] Vijay A. Saraswat. Concurrent Constraint Programming Languages. PhD thesis, School of Computer Science, Carnegie Mellon, Pittsburgh, 1989.

[SCWY90] V. Santos-Costa, D.H.D. Warren, and R. Yang. Andorra-I: A Parallel Prolog System that Transparently Exploits both And- and Or-parallelism. In Proceedings of the 3rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, April 1990.

[SCWY91] V. Santos-Costa, D.H.D. Warren, and R. Yang. The Andorra-I Preprocessor: Supporting Full Prolog on the Basic Andorra Model. In 1991 International Conference on Logic Programming, pages 443-456. MIT Press, June 1991.

[Ued87a] K. Ueda. Guarded Horn Clauses. In E.Y. Shapiro, editor, Concurrent Prolog: Collected Papers, pages 140-156. MIT Press, Cambridge MA, 1987.

[Ued87b] K. Ueda. Making Exhaustive Search Programs Deterministic. New Generation Computing, 5(1):29-44, 1987.

[War] D.H.D. Warren. The Andorra Principle. Presented at Gigalips workshop, 1987. Unpublished.

[War90] D.H.D. Warren. The Extended Andorra Model with Implicit Control. In Sverker Jansson, editor, Parallel Logic Programming Workshop, Box 1263, S-163 13 Spanga, SWEDEN, June 1990. SICS.

[WHD88] R. Warren, M. Hermenegildo, and S. Debray. On the Practicality of Global Flow Analysis of Logic Programs. In Fifth International Conference and Symposium on Logic Programming. MIT Press, August 1988.
Recomputation based Implementations of And-Or Parallel Prolog
Gopal Gupta†
Department of Computer Science
Box 30001, Dept. 3CU,
New Mexico State University
Las Cruces, NM 88003-0001
gupta

(... a & b), and assuming that a and b have 3 solutions each (to be executed in or-parallel form) and the query is ?- q, then
the corresponding and-or tree would appear as shown
in figure 1.
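For concreteness, a program of the shape this example assumes could look as follows (this sketch is purely hypothetical, since the program itself is not reproduced here; it simply contains a parallel conjunction of two independent goals, each with three solutions):

    q :- ( true => a(X) & b(Y) ).     % CGE: a and b are independent
    a(a1).  a(a2).  a(a3).
    b(b1).  b(b2).  b(b3).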
Figure 1: And-Or Tree

One problem with such a traditional and-or tree is that bindings made by different alternatives of a are not visible to different alternatives of b, and vice-versa, and hence the correct environment has to be created before the continuation goal of the parallel conjunction can be executed. Creation of the proper environments requires a global operation, for example, Binding Array loading in AO-WAM [GJ89, G91a], the complex dereferencing scheme of PEPSys [BK88], or the "global forking" operation of the Extended Andorra Model [W90]. To eliminate this possible source of overhead in our model, we extend the traditional and-or tree so that the various or-parallel environments that simultaneously exist are always separate.

The extension essentially uses the idea of recomputing independent goals of a parallel conjunction, as in &-Prolog [HG90] (and Prolog!). Thus, for every alternative of a, the goal b is computed in its entirety. Each separate combination of a and b is represented by what we term a composition node (c-node for brevity). Thus, each composition node in the tree corresponds to a different solution for the parallel conjunction, i.e., a different "continuation". Thus the extended tree, called the Composition-tree (C-tree for brevity), for the above query might appear as shown in figure 2: for each alternative of the and-parallel goal a, goal b is entirely recomputed (in fact, the tree could contain up to 9 c-nodes, one for each combination of solutions of a and b). To represent the fact that a parallel conjunction can have multiple solutions we add a branch point (choice point) before the different composition nodes. Note that c-nodes and branch points serve purposes very similar to the Parcall frames and markers of the RAP-WAM [H86, HG90]. The C-tree can represent or- and independent and-parallelism quite naturally: execution of goals in a c-node gives rise to independent and-parallelism while parallel execution of untried alternatives gives rise to or-parallelism.†

Figure 2: Composition Tree (key: choice point, share node, composition node)
Notice the topological similarity of the C-tree with
the purely or-parallel tree shown in figure 3 for the program above. Essentially, branches that are "shared" in
the purely or-parallel tree (i.e. that are "common",
even though different binding environments may still
have to be maintained -we will refer to such branches
and regions for simplicity simply as "shared") are also
shared in the C-tree. This sharing is represented by
means of a share-node, which has a pointer to the
shared branch and a pointer to the composition node
where that branch is needed (figure 2). Due to sharing the subtrees of some independent and-parallel goals
may be spread out across different composition nodes.
Thus, the subtree of goal a is spread out over c-nodes
C1, C2 and C3 in the C-tree of figure 2, the total amount of program-related work being essentially
maintained.
† In fact, a graphical tool capable of representing this tree has shown itself to be quite useful for implementors and users of independent and- and or-parallel systems [CG91].
4.1 And-Or Parallelism & Teams of Processors

We will present some of the implementation issues from the point of view of extending an or-parallel system to support independent and-parallelism. When a purely or-parallel model is extended to exploit independent and-parallelism then the following problem arises: at the end of independent and-parallel computation, all participating processors should see all the bindings created by each other. However, this is completely opposite to what is needed for or-parallelism where processors working in or-parallel should not see the (conditional) bindings created by each other. Thus, the requirements of or-parallelism and independent and-parallelism seem antithetical to each other. The solutions that have been proposed range from updating the environment at the time independent and-parallel computations are combined [RK89, GJ89] to having a complex dereferencing scheme [BK88]. All of these operations have their cost.

We contend that this cost can be eliminated by organising the processors into teams. Independent and-parallelism is exploited among processors within a team while or-parallelism is exploited among teams. Thus a processor within a team would behave like a processor in a purely and-parallel system while all the processors in a given team would collectively behave like a processor in a purely or-parallel system. This entails that all processors within each team share the data structures that are used to maintain the separate or-parallel environments. For example, if binding arrays are being used to represent multiple or-parallel environments, then only one binding array should exist per team, so that the whole environment is visible to each member processor of the team. If copying is used, then processors in the team share the copy. Note that in the limit case there will be only one processor per team. Also note that despite the team arrangement a processor is free to migrate to another team as long as it is not the only one left in the team. Although a fixed assignment of processors to teams is possible a flexible scheme appears preferable. This will be discussed in more detail in section 4.3. The concept of teams of processors has been successfully used in the Andorra-I system [SW91], which extends an or-parallel system to accommodate dependent and-parallelism.

Figure 3: Or-Parallel Tree (key: arrows indicate the end of a's branches)
4.2. C-tree & And-Or Parallelism
The concept of organising processors into teams
also meshes very well with C-trees. A team can work on
a c-node in the C-tree, each of its member processors working on one of the independent and-parallel goals in that c-node. We illustrate this by means of an example. Consider the query corresponding to the and-or tree of figure 1. Suppose we have 6 processors P1, P2, ..., P6, grouped into 3 teams of 2 processors each. Let us suppose P1 and P2 are in team 1, P3 and P4 in team 2, and P5 and P6 in team 3. We illustrate how the C-tree shown in figure 2 would be created.

Execution commences by processor P1 of team 1 picking up the query q and executing it. Execution continues like normal sequential execution until the parallel conjunction is encountered, at which point a choice point node is created to keep track of the information about the different solutions that the parallel conjunction might generate. A c-node is then created (node C1 in figure 2). The parallel conjunction consists of two and-parallel goals a and b, of which a is picked up by processor P1, while b is made available for and-parallel execution. The goal b is subsequently picked up by processor P2, teammate of processor P1. Processors P1 and P2 execute the parallel conjunction in and-parallel producing solutions a1 and b1 respectively.
In the process they leave choice points behind. Since
we allow or-parallelism below and-parallel goals, these
untried alternatives can be processed in or-parallel by
other teams. Thus the second team, consisting of P3
and P4 picks up the untried alternative corresponding
to a2, and the third team, consisting of P5 and P6,
picks up the untried alternative corresponding to a3.
Both these teams create a new c-node, and restart the
execution of and-parallel goal b (the goal to the right
of goal a): the first processor in each team (P3 and
P5, respectively) executes the alternative for a, while
the second processor in each team (P4 and P6, respectively) executes the restarted goal b. Thus, there are
3 copies of b executing, one for each alternative of a.
Note that the nodes in the subtree of a, between c-node
C1 and the choice points from where untried alternatives were picked, are "shared" among different teams (in the same sense as the nodes above the parallel conjunction are; different binding environments still have to be maintained).

Since there are only three teams, the untried alternatives of b have to be executed by backtracking. In
the C-tree, backtracking always takes place from the
right to mimic Prolog's behaviour-goals to the right
are completely explored before a processor can backtrack inside a goal to the left. Thus, if we had only
one team with 2 processors, then only one composition
node would actually need to be created, and all solutions would be found via backtracking, exactly as in
&-Prolog, where only one copy of the Parcall frame exists [H86, HG90]. On the other hand if we had 5 teams
of 2 processors each, then the C-tree could appear as
shown in fig 4. In figure 4, the 2 extra teams steal the
untried alternatives of goal b in c-node C3. This results in 2 new c-nodes being created, C4 and C5, and the subtree of goal b in c-node C3 being spread across c-nodes
C3, C4 and C5. The topologically equivalent purely
or-parallel tree of this C-tree is still the one shown in
figure 3. The most important point to note is that
new c-nodes get created only if there are resources to
execute that c-node in parallel. Thus, the number of c-nodes in a C-tree can vary depending on the availability
of processors.
Figure 4: C-tree for 5 Teams

It might appear that intelligent backtracking, that accompanies independent and-parallelism in &-Prolog, is absent in our abstract and-or parallel C-tree model. This is because if b were to completely fail, then this failure will be replicated in each of the three copies of b. We can incorporate intelligent backtracking by stipulating that an untried alternative be stolen from a choice point, which falls in the scope of a parallel conjunction, only after at least one solution has been found for each goal in that parallel conjunction. Thus, c-nodes C2, C3, C4 and C5 (fig 4) will be created only after the first team (consisting of P1 and P2) succeeds in finding solutions a1 and b1 respectively. In this situation if b were to fail, then the c-node C1 will fail, resulting in the failure of the whole parallel conjunction.

4.3. Processor Scheduling

Since our abstract model of C-trees is dependent upon the number of processors available, some of the processor scheduling issues can be determined at an abstract level, without going into the details of a concrete realization of the C-trees. As mentioned earlier, teams of processors are used to carry out or-parallel work while individual processors within a team perform and-parallel work. Since and-parallel work is shared within a team, a processor can in principle steal and-parallel work only from members of its own team. Or-parallel work is shared at the level of teams, thus only an idle team can steal an untried alternative from a choice point. An idle processor will first look for and-parallel work in its own team. If no and-parallel work is found, it can decide to migrate to another team where there is work, provided it is not the last remaining processor in that team. If no such team exists it can start a new team of its own, perhaps with idle processors of other teams, and the new team can steal or-parallel work. One has to carefully balance the number of teams and the number of processors in each team, to fully exploit all the and- and or-parallelism available in a given Prolog program†.

† Some of the 'flexible scheduling' techniques that have been developed for the Andorra-I system [D91] can be directly adapted for optimal distribution of or- and and-parallel work.

5. Environment Representation

So far we have described and-or parallel execution with recomputation at an abstract level. We have not addressed the crucial problem of environment representation in the C-tree. In this section we discuss how to extend the Binding Arrays (BA) method [W84, W87] and the Stack-copying [AK90] methods to solve this problem. These extensions enable a team of processors to share a single BA without wasting too much space.

5.1 Sharing vs Non-Sharing

In an earlier paper [GJ90] we argued that environment representation schemes that have constant-time task creation and constant-time access to variables, but non-constant time task-switching, are superior to those
methods which have non-constant time task creation
or non-constant time variable-access. The reason being
that the number of task-creation operations and the
number of variable-access operations are dependent on
the program, while the number of task-switches can be
controlled by the implementor by carefully designing
the work-scheduler.
The schemes that have constant-time task creation
and variable-access can be further subdivided into those
that physically share the execution tree, such as the Binding Arrays scheme [W84, W87, LW90] and the Version Vectors scheme [HC87], and those that do not, such as MUSE [AK90] and Delphi [CA88]. Both these kinds of schemes have their advantages. The advantage of non-sharing schemes such as Muse and Delphi is that less synchronization is needed in general since each processor has its own copy of the tree and thus there is less parallel overhead [AK90]. This also means that they
can be implemented on non-shared memory machines
more efficiently. However, operations that may require
synchronization and voluntary suspension such as side
effects, cuts and speculative scheduling are more overhead prone to implement. When an or-parallel system reaches a side effect which is in a non-leftmost
or-branch, it has two choices: (i) it can suspend the
current branch and switch to some other node where
there is work available, the suspended branch would be
woken up when it becomes leftmost; or (ii) it can busy-wait at the current branch until it becomes leftmost.
In case (i) an or-parallel system that does not share
the execution tree, such as Muse, will have to save its
current execution stack in a scratch memory-area since
switching to a new node means that the current stack
would be overwritten due to copying of the branches
corresponding to the new node. Even if modern sophisticated multiprocessor Operating Systems may allow
some memory-saving optimizations, a substantial memory overhead may still be presentt. The same holds for
case (ii), where a modern OS may manage to avoid
busy-waiting, but at the cost of extra memory.
The essential conclusion is that for some applications (those that require processors to synchronize often riue to presence of a large number of side-effects
and cuts) environment representation schemes which
share the or-tree are better, and for some other applications (those that require processors to synchronize
less often) schemes which maintain an independent ortree per processor are better. With this observation
in mind we have extended both types of environment
† Experimental results show that processors may voluntarily suspend as much as 10 to 100s of times for large sized programs [SI91].
representation schemes to accommodate independent
and-parallelism with recomputation of goals. We first
describe an extension of the Binding Arrays scheme,
and then an extension of the stack-copying technique.
Due to space limitations the essence of both approaches
will be presented rather than specifying them in detail
as full models, which is left as future work.
5.2. Environment Representation using BAs
Recall that in the binding-array method [W84,
W87] an offset-counter is maintained for each branch of
the or-parallel tree for assigning offsets to conditional
variables (CVs)† that arise in that branch. The two main
properties of the BA method for or-parallelism are the
following:
(i) The offset of a conditional variable is fixed for its
entire life.
(ii) The offsets of two consecutive conditional variables
in an or-branch are also consecutive.
The implication of these two properties is that conditional variables get allocated space consecutively in
the binding array of a given processor, resulting in optimum space usage in the BA. This is important because
a large number of conditional variables might need to
be created at runtime‡.
Figure 5: BAs and Independent And-Parallelism. (i) Part of a C-tree; (ii) Optimal Space Allocation in the BA.
In the presence of independent and-parallel goals,
each of which has multiple solutions, maintaining contiguity in the BA can be a problem, especially if processors are allowed (via backtracking or or-parallelism)
to search for these multiple solutions. Consider a goal
with a parallel conjunction: a, (true => b & c), d. A part of its C-tree is shown in figure 5(i) (the figure also shows the number of conditional variables that are created in different parts of the tree). If b and c are executed in independent and-parallel by two different processors P1 and P2, then assuming that both have private binding arrays of their own, all the conditional variables created in branch b-b1 would be allocated space in the BA of P1 and those created in branch c-c1 would be allocated space in the BA of P2. Likewise conditional bindings created in b would be recorded in the BA of P1 and those in c would be recorded in the BA of P2. Before P1 or P2 can continue with d after finding solutions b1 and c1, their binding arrays will have to be merged somehow. In the AO-WAM [GJ89, G91a] the approach taken was that one of P1 or P2 would execute d after updating its Binding Array with conditional bindings made in the other branch (known as the BA loading operation). The problem with the BA loading operation is that it acts as a sequential bottleneck which can delay the execution of d, and reduce speedups. To get rid of the BA loading overhead we can have a common binding array for P1 and P2, so that once P1 and P2 finish execution of b and c, one of them immediately begins execution of d since all conditional bindings needed would already be there in the common BA. This is consistent with our discussion in section 4.1 about having teams of processors where all processors in a team would share a common binding array.

† Conditional variables are variables that receive different bindings in different environments [GJ90].
‡ For instance, in Aurora [LW90] about 1 Mb of space is allocated for each BA.
However, if processors in a team share a binding
array, then backtracking can cause inefficient usage of
space, because it can create large unused holes in the
BA. This is because processors in a team, that are working on different independent and-parallel branches, will
allocate offsets in the binding array concurrently. The
exact number of offsets needed by each branch cannot
be allocated in advance in the binding array because
the number of conditional variables that will arise in a
branch cannot be determined a priori. Thus, the offsets
of independent and-branches will overlap: for example,
the offsets of k1 CVs in branch b1 will be intermingled with those of k2 CVs in branch c1. Due to overlapping offsets, recovery of these offsets, when a processor backtracks, requires tremendous book-keeping. Alternatively, if no book-keeping is done, it leads to a large amount of wasted space that becomes unusable
for subsequent offsets (see [GS92, G91, G91a] for more
details).
5.2.1. Paged Binding Array
To solve the above problem we divide the binding
array into fixed sized segments. Each conditional variable is bound to a pair consisting of a segment number
and an offset within the segment. An auxiliary array
keeps track of the mapping between the segment number and its starting location in the binding array. Dereferencing CVs now involves double indirection: given a
conditional variable bound to (i, o), the starting address of its segment in the BA is first found from location i of the auxiliary array, and then the value at offset o from that address is accessed. A set of CVs that have been allocated space in the same logical segment (i.e. CVs which have common i) can reside in any physical page in the BA, as long as the starting address of that physical page is recorded in the ith slot in the auxiliary array. Note the similarity of this scheme to memory management using paging in Operating Systems, hence the name Paged Binding Array (PBA)†. Thus a segment is identical to a page and the auxiliary array is essentially the same as a page table. The auxiliary array and the binding array are common to all the processors in a team. From now on we will refer to the BA as the Paged Binding Array (PBA), the auxiliary array as the Page Table (PT), and our model of and-or parallel execution as the PBA model‡.
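The double indirection involved in dereferencing can be pictured with a small sketch (purely illustrative: the predicate name and the representation of the PBA and Page Table as Prolog terms are invented here and are not part of any actual implementation):

    % The Page Table is a term pt(S1, S2, ...) whose Ith argument gives the
    % starting position of logical page I within the PBA term, and a CV is
    % bound to a pair cv(I, O), with offsets O starting at 0.
    pba_deref(cv(I, O), PageTable, PBA, Value) :-
        arg(I, PageTable, Start),   % first indirection: locate the page
        Pos is Start + O,           % add the offset within the page
        arg(Pos, PBA, Value).       % second indirection: read the binding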
Every time execution of an and-parallel goal in a parallel conjunction is started by a processor, or the current page in the PBA being used by that processor for allocating CVs becomes full, a page-marker node containing a unique integer id i is pushed onto the trail-stack. The unique integer id is obtained from a shared counter (called a pt_counter). There is one such counter per team. A new page is requested from the PBA, and the starting address of the new page is recorded in the ith location of the Page Table. i is referred to as the page number of the new page. Each processor in a team maintains an offset-counter, which is used to assign offsets to CVs within a page. When a new page is obtained by a processor, the offset-counter is reset. Conditional variables are bound to the pair <i, o>, where i is the page number, and o is the value of the offset-counter, which indicates the offset at which the value of the CV would be recorded in the page. Every time a conditional variable is bound to such a pair, the offset counter o is incremented. If the value of o becomes greater than K, the fixed page size, a new page is requested and a new page-marker node is pushed.
† Thanks to David H. D. Warren for pointing out this similarity.
‡ A paged binding array has also been used in the ElipSys system of ECRC [VX91], but for entirely different reasons. In ElipSys, when a choice point is reached the BA is replicated for each new branch. To reduce the overhead of replication, the BA is paged. Pages of the BA are copied in the children branches on demand, by using a "copy-on-write" strategy. In ElipSys, unlike our model, paging is not necessitated by independent and-parallelism.
A list of free pages in the PBA is maintained separately (as a linked list). When a new page is requested,
the page at the head of the list is returned. When a
page is freed by a processor, it is inserted in the freelist. The free-list is kept ordered so that pages higher
up in the PBA occur before those that are lower down.
This way it is always guaranteed that space at the top
of the PBA would be used first, resulting in optimum usage of space in the PBA.
While selecting or-parallel work, if the untried alternative that is selected is not in the scope of any
parallel conjunction, then task-switching is more or
less like in a purely or-parallel system (such as Aurora), modulo allocation/deallocation of pages in the PBA.
If, however, the untried alternative that is selected is
in the and-parallel goal g of a parallel conjunction,
then the team updates its PBA with all the conditional
bindings created in the branches corresponding to goals
which are to the left of g. Conditional bindings created
in g above the choice point are also installed. Goals
to the right of g are restarted and made available to
other member processors in the team for and-parallel
execution. Notice that if a C-tree is folded into an
or-parallel tree according to the relationship shown in
figures 2 and 3, then the behaviour of (and the number of conditional bindings installed/deinstalled during) task switching would closely follow that of a purely
or-parallel system such as Aurora, if the same scheduling order is followed.
Note that the paged binding array technique is a
generalization of the environment representation technique of the AO-WAM [GJ89, G91a], hence some of the optimizations [GJ90a] developed for the AO-WAM, to reduce the number of conditional bindings to be installed/deinstalled during task-switching, will also apply to the PBA model. Lastly, seniority of conditional variables, which needs to be known so that "older" variables never point to "younger ones", can be easily determined with the help of the <i, o> pair. Older variables
will have a smaller value of i; and if i is the same, then
a smaller value of o.
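For illustration only (the predicate name and pair representation are invented), this seniority ordering can be written down directly:

    % cv(I1,O1) is older than cv(I2,O2) if it has a smaller page number,
    % or the same page number and a smaller offset.
    cv_older(cv(I1, O1), cv(I2, O2)) :-
        ( I1 < I2 -> true
        ; I1 =:= I2, O1 < O2
        ).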
More details on Paged Binding Arrays can be
found in [GS92, G91].
5.3. The Stack Copying Approach
An alternative approach to represent multiple environments in the C-tree is to use explicit stack-copying.
Rather than sharing parts of the tree, the shared
branches can be explicitly copied, using techniques similar to those employed by the MUSE system [AK90].
To briefly summarize the MUSE approach, whenever a processor P1 wants to share work with another processor P2, it selects an untried alternative from one
of the choice points in P2's stack. It then copies the
entire stack of P2, backtracks up to that choice point
to undo all the conditional bindings made below that
choice point, and then continues with the execution
of the untried alternative. In this approach, provided
there is a mechanism for copying stacks, the only cells
that need to be shared during execution are those corresponding to the choice points. Execution is otherwise completely independent (modulo side-effect synchronization) in each branch and identical to sequential
execution.
If we consider the presence of and-parallelism in
addition to or-parallelism, then, depending on the actual types of parallelism appearing in the program and
the nesting relation between them, a number of relevant
cases can be distinguished. The simplest two cases are, of course, those where the execution is purely or-parallel
or purely and-parallel. Trivially, in these situations
standard MUSE and &-Prolog execution respectively
applies, modulo the memory management issues, which
will be dealt with in section 5.3.2.
Of the cases when both and- and or-parallelism
are present in the execution, the simpler one represents
executions where and-parallelism appears "under" or-parallelism but not conversely (i.e. no or-parallelism appears below c-nodes). In this case, and again modulo memory management issues, or-parallel execution can still continue as in Muse while and-parallel execution can continue like &-Prolog (or in any other local way). The only or-parallel branches which can be picked
up appear then above any and-parallel node in the tree.
The process of picking up such branches would be identical to that described above for MUSE.
In the presence of or-parallelism under and-parallelism the situation becomes slightly more complicated. In that case, an important issue is carefully
deciding which portions of the stacks to copy. When
an untried alternative is picked from a choice-point,
the portions that are copied are precisely those that
have been labelled as "shared" in the C-tree. Note that
these will be precisely those branches that will also be
copied in an equivalent (purely or-parallel) MUSE execution. In addition, precisely those branches will be
recomputed that are also recomputed in an equivalent
(purely and-parallel) &-Prolog execution.
Consider the case when a processor selects an untried alternative from a choice point created during execution of a goal gj in the body of a goal which occurs
after a parallel conjunction where there has been and-parallelism above the selected alternative, but all
the forks are finished. Then not only will it have to copy
all the stack segments in the branch from the root to
the parallel conjunction, but also the portions of stacks
corresponding to all the forks inside the parallel conjunction and those of the goals between the end of the
parallel conjunction and gj. All these segments have in
principle to be copied because the untried alternative
may have access to variables in all of them and may
modify such variables.
On the other hand, if a processor selects an untried
alternative from a choice point created during execution
of a goal gi inside a parallel conjunction, then it will have to copy all the stack segments in the branch from the root to the parallel conjunction, and it will also have to copy the stack segments corresponding to the goals g1 ... gi-1 (i.e. goals to the left). The stack segments up to the parallel conjunction need to be copied because each different alternative within the gi's might produce a different binding for a variable, X, defined in an ancestor goal of the parallel conjunction. The stack segments corresponding to goals g1 through gi-1 have to be copied because the different alternatives for the goals following the parallel conjunction might bind a variable defined in one of the goals g1 ... gi-1 differently.
5.3.1. Execution with Stack Copying
We now illustrate by means of a simple example
how or-parallelism can be exploited in nondeterministic and-parallel goals through stack copying. Consider the tree shown in figure 1 that is generated as a result of executing a query q containing the parallel conjunction (true => a(X) & b(Y)). For the purpose of illustration we assume that there is an unbounded number of processors, P1 ... Pn.

Execution begins with processor P1 executing the top level query q. When it encounters the parallel conjunction, it picks the subgoal a for execution, leaving b for some other processor. Let's assume that processor P2 picks up goal b for execution (figure 6.(i)). As execution continues P1 finds solution a1 for a, generating 2 choice points along the way. Likewise, P2 finds solution b1 for b.
Since we also allow for full or-parallelism within
and-parallel goals, a processor can steal the untried alternative in the choice point created during execution
of a by P1. Let us assume that processor P3 steals this alternative, and sets itself up for executing it. To do so it copies the stack of processor P1 up to the choice
point (the copied part of the stack is shown by the dotted line; see index at the bottom of figure 6), simulates
failure to remove conditional bindings made below the
choice point, and restarts the goals to its right (i.e. the
goal b). Processor P4 picks up the restarted goal b and finds a solution b1 for it. In the meantime, P3 finds the
solution a2 for a (see figure 6.(ii)). Note that before P3
can commence with the execution of the untried alternative and P4 can execute the restarted goal b, they
have to make sure that any conditional bindings made
by P2 while executing b have also been removed. This
is done by P3 (or P4) getting a copy of the trail stack
of P2 and resetting all the variables that appear in it.
Like processor P3, processor P5 steals the untried
alternative from the second choice point for a, copies
the stack from P1 and restarts b, which is picked up
by processor P6. As in MUSE, the actual choice point
frame is shared to prevent the untried alternative in
the second choice point from being executed twice (once
through P1 and once through P3). Eventually, P5 finds the solution a3 for a and P6 finds the solution b1 for b.
Figure 6: Parallel Execution with Stack Copying (panels (i)-(vi); key: branch executed locally, copied branch, embryonic branch (untried alternative))
Note that now 3 copies of b are being executed,
one for each solution of a. The process of finding the
solution b1 for b leaves a choice point behind. The
untried alternative in this choice point can be picked
up for execution by another processor. This is indeed
what is done by processors P7, P8 and P9 for each copy
of b that is executing. These processors copy the stack
of P2, P4 and P6, respectively, up to the choice point.
The stack segments corresponding to goal a are also
copied (figures 6.(iv), 6.(v), 6.(vi)) from processors P1,
P3 and P5, respectively. The processors P7, P8 and P9
then proceed to find the solution b2 for b.
Execution of the alternative corresponding to the
solution b2 in the three copies of b produces another choice-point. The untried alternatives from these
choice points can be picked up by other idle teams in a
manner similar to that for the previous alternative of b
(not shown in figure 6). Note that if there were no processors available to steal the alternative (corresponding
to solution b3) from b then this solution would have
been found by processors P7, P8 and P9 (in the respective copies of b that they are executing) through
backtracking as in &-Prolog. The same would apply
if no processors were available to steal the alternative
from b corresponding to solution b2.
5.3.2. Managing the Address Space
While copying stack segments we have to make
sure that pointers in copied portions do not need relocation. In Muse this is ensured by having physically separate but logically identical memory spaces
for each of the processors [AK90]. In the presence of
and-parallelism and teams of processors a more sophisticated approach has to be taken.
All processors in a team share the same logical
address space. If there are n processors in the team the
address space is divided up into m memory segments
(m ≥ n). The memory segments are numbered from 1 to m. Each processor allocates its heap, local stacks, trail etc. in one of the segments (this also implies that the maximum number of processors that a team can have is
m). Each team has its own independent logical address
space, identical to the address space of all other teams.
Also, each team has an identical number of segments.
Processors are allowed to switch teams so long as there
is a memory segment available for them to allocate their
stacks in the address space of the other team.
Consider the scenario where a choice point, which
is not in the scope of any parallel conjunction, is picked
up by a team Tq from the execution tree of another
team Tp. Let x be the memory segment number in
which this choice point lies. The root of the Prolog execution tree must also lie in memory segment x since
the stacks of a processor cannot extend into another
memory segment in the address space. Tq will copy
the stack from the xth memory segment of Tp into its
own xth memory segment. Since the logical address
space of each team is identical and is divided into identical segments, no pointer relocation would be needed.
Failure is then simulated and the execution of the un-
tried alternative of the stolen choice point begun. In
fact, the copying of stacks can be done incrementally
as in MUSE [AK90] (other optimizations in MUSE to
save copying should apply equally well to our model,
and are left as future work).
Now consider the more interesting scenario where
a choice point, which lies within the scope of a parallel
conjunction, is picked up by a processor in a team Tq
from another team Tp. Let this parallel conjunction be
the CGE (true => g1 & ... & gn) and let gi be the goal in the parallel conjunction whose sub-tree contains the stolen choice point. Tq needs to copy the stack segments corresponding to the computation from the root up to the parallel conjunction and the stack segments corresponding to the goals g1 through gi. Let us assume these stack segments lie in memory segments of team Tp and are numbered x1, ..., xk. They will be copied into the memory segments numbered x1, ..., xk of team Tq. Again, this copying can be incremental. Failure would then be simulated on gi. We also need to remove the conditional bindings made during the execution of the goals gi+1 ... gn by team Tp. Let xk+1 ... xl be the memory segments where gi+1 ... gn
are executing in team Tp. We copy the trail stacks of
these segments and reinitialize (i.e. mark unbound) all
variables that appear in them. The copied trail stacks
can then be discarded. Once removal of conditional
bindings is done the execution of the untried alternative of the stolen choice point is begun. The execution
of the goals gi+1 ... gn is restarted and these can be executed by other processors which are members of the
team. Note that the copied stack segments occupy the
same memory segments as the original stack segments.
The restarted goals can however be executed in any of
the memory segments.
An elaborate description of the stack-copying approach, with techniques for supporting side-effects, various optimizations that can be performed to improve
efficiency, and implementation details are left as future
work. Preliminary details can be found in [GH91].
6. Conclusions & Comparison with Other Work
In this paper, we presented a high-level approach
capable of exploiting both independent and-parallelism
and or-parallelism in an efficient way. In order to find
all solutions to a conjunction of non-deterministic and-parallel goals in our approach, some goals are explicitly recomputed as in Prolog. This is unlike in other
and-or parallel systems where such goals are shared.
This allows our scheme to incorporate side-effects and
to support Prolog as the user language more easily and
simplifies other implementation issues.
In the context of this approach we also presented
two techniques for environment representation in the
presence of independent and-parallelism which are extensions of highly successful environment representation techniques for supporting or-parallelism. The first
technique, based on Binding Arrays [W84, W87], and
termed the Paged Binding Array technique, yields a system which can be viewed as a direct combination of
the Aurora [LW90] and &-Prolog [HG90] systems. The
second technique based on stack copying [AK90] yields
a system which can be viewed as a direct combination of the MUSE [AK90] and &-Prolog systems. If
an input program has only or-parallelism, then the system based on Paged Binding Arrays (resp. Stack copying) will behave exactly like Aurora (resp. Muse). If
a program has only independent and-parallelism the
two models will behave exactly like &-Prolog (except
that conditional bindings would be allocated in the
binding array in the system based on Paged Binding
Arrays). Our approach can also support the extralogical features of Prolog (such as cuts and side-effects)
transparently [GS91], something which doesn't appear
to be possible in other independent-and/or parallel
models [BK88, GJ89, RK89]. Control in the models
is quite simple, due to recomputation of independent
goals. Memory management is also relatively simpler.
We firmly believe that the approach, in its two versions of Paged Binding Array and stack copying, can
be implemented very efficiently, and indeed their implementation is scheduled to begin shortly. The implementation techniques described in this paper can
be used for even those models that have dependent
and-parallelism, such as Prometheus [SK92], and IDIOM (with recomputation) [GY91]. They can also be
extended to implement the Extended Andorra Model
[W90].
Acknowledgements
Thanks to Vitor Santos Costa for his numerous
comments on this paper, and to Raed Sindaha and
Tony Beaumont for answering many questions about
Aurora and its schedulers. The research presented in
this paper has benefited from discussions with Kish
Shen, Khayri Ali, Roland Carlsson, and David H.D.
Warren, all of whom we would like to thank. This research was supported by U.K. Science and Engineering
Research Council grant GR/F 27420, ESPRIT project
PEPMA and CICYT project TIC90-1105-CE.
References
[AK90]
K. Ali, R. Karlsson, "The Muse Or-parallel
Prolog Model and its Performance," In Proceedings of the North American Conference
on Logic Programming '90, MIT Press.
[AK91]
K. Ali, R. Karlsson, "Full Prolog and
Scheduling Or-parallelism in Muse," To appear in International Journal of Parallel Programming, 1991.
[BK88]
Uri Baron et al., "The Parallel ECRC Prolog System PEPSys: An Overview and Evaluation Results," In Proceedings of FGCS '88,
Tokyo, pp. 841-850.
[CA88]
W. F. Clocksin and H. Alshawi, "A Method
for Efficiently Executing Horn Clause Programs Using Multiple Processors," In New
Generation Computing, 5(1988), 361-376.
[CC89]
S-E. Chang and Y.P. Chiang, "Restricted
And-Parallelism Model with Side Effects,"
Proceedings of North American Conference
on Logic Programming, 1989, MIT Press, pp.
350-368.
[CG91] M. Carro, L. Gomez, and M. Hermenegildo, "VISANDOR: A Tool for Visualizing And/Or-parallelism in Logic Programs," Technical Report, U. of Madrid (UPM), Madrid, Spain, 1991.
[D91] I. Dutra, "Flexible Scheduling in the Andorra-I System," In Proc. ICLP'91 Workshop on Parallel Logic Prog., Springer Verlag, LNCS 569, Dec. 1991.
[G91]
G. Gupta, "Paged Binding Array: Environment Representation in And-Or Parallel Prolog," Technical Report TR-91-24, Department of Computer Science, University of
Bristol, Oct. 1991.
[G91a]
G. Gupta, "And-Or Parallel Execution of
Logic Programs on Shared Memory Multiprocessors," Ph.D. Thesis, University of North
Carolina at Chapel Hill, Nov. 1991.
[GS92]
G. Gupta, V. Santos Costa, "And-Or Parallel Execution of full Prolog based on Paged
Binding Arrays," To appear in Proceedings
of Parallel Languages and Architectures Europe (PARLE '92), June 1992.
[GH91]
G. Gupta and M. Hermenegildo, "ACE:
And/Or-parallel Copying-based Execution of
Logic Programs," Technical Report TR-9125, Department of Computer Science, University of Bristol, Oct. 1991. Also in Springer
Verlag LNCS 569, Dec. '91.
[GJ89]
G. Gupta and B. J ayaraman, "Compiled
And-Or Parallel Execution of Logic Programs," In Proceedings of the North American Conference on Logic Programming '89,
MIT Press, pp. 332-349.
[GJ90]
G. Gupta and B. J ayaraman, "On Criteria for
Or-Parallel Execution Models of Logic Programs," In Proceedings of the North A mer-
782
ican Conference on Logic Programming '90,
MIT Press, pp. 604-623.
[GJ90a] G. Gupta and B. Jayaraman, "Optimizing
And-Or Parallel Implementations," In Proceedings of the North American Conference
on Logic Programming '90, MIT Press, pp.
737-756.
[GS91] G. Gupta, V. Santos-Costa, "Cut and Side
Effects in And-Or Parallel Prolog," Technical
Report TR-91-26, Department of Computer
Science, University of Bristol, Oct. 1991.
[GY91] G. Gupta, V. Santos Costa, R. Yang, M.
Hermenegildo, "IDIOM: A Model for Integrating Dependent-and, Independent-and
and Or-parallelism," In Proc. Int'l. Logic
Programming Symposium '91, MIT Press,
Oct. 1991.
[H86]
M. V. Hermenegildo, "An Abstract Machine
for Restricted And Parallel Execution of
Logic Programs". 3rd International Conference on Logic Programming, London, 1986.
[HG90] M. V. Hermenegildo, K.J. Greene, "&-Prolog
and its performance: Exploiting Independent
And-Parallelism," In Proceedings of the 7th
International Conference on Logic Programming, 1990, pp. 253-268.
[HN86] M. V. Hermenegildo and R. I. Nasr, "Efficient Implementation of backtracking III
AND-parallelism" , 3rd International Conference on Logic Programming, London, 1986.
[HC87] B. Hausman, et. al., "Or-Parallel Prolog
Made Efficient on Shared Memory Multiprocessors," in 1987 IEEE Int. Symp. in Logic
Prog., San Francisco, CA.
[HC88] B. Hausman, A. Ciepielewski, and A. Calderwood, "Cut and Side-Effects in Or-Parallel
Prolog," In International Conference on Fifth
Generation Computer Systems, Tokyo, Nov.
88, pp. 831-840.
[LK88] Y-J. Lin and V. Kumar, "AND-parallel execution of Logic Programs on a Shared Memory Multiprocessor: A Summary of Results" ,
in Fifth International Logic Programming
Conference, Seattle, WA.
[LW90] E. Lusk, D.H.D. Warren, S. Haridi et. al.
"The Aurora Or-Prolog System", In New
Generation Computing, Vol. 7, No. 2,3,1990
pp. 243-273.
[MH89] K. Muthukumar and M. Hermenegildo,
"Complete and Efficient Methods for Supporting Side-effects III Independent/Restricted And-Parallelism," In Proc. of ICLP,
1989.
[MH89a] K. Muthukumar, M. V. Hermenegildo, "Determination of Variable Dependence Information through Abstract Interpretation," In
Proc. of NACLP '89, MIT Press.
[RS87]
M. Ratcliffe, J-C Syre, "A Parallel Logic Programming Language for PEPSys," In Proceedings of IJCAI '87, Milan, pp. 48-55.
[RK89]
B. Ramkumar and L. V. Kale, "Compiled Execution of the REDUCE-OR Process Model," In Proc. of NACLP '89, MIT Press, pp. 313-331.
[S91]
R. Sindaha, "The Dharma Scheduler - Definitive Scheduling in Aurora on Multiprocessor Architecture," Technical Report, Department of Computer Science, University of Bristol, forthcoming.
[S89]
P. Szeredi, "Performance Analysis of the Aurora Or-parallel Prolog System," In Proc. of NACLP, MIT Press, 1989, pp. 713-732.
[SH91]
K. Shen and M. Hermenegildo, "A Simulation Study of Or- and Independent And-Parallelism," In Proc. Int'l. Logic Programming Symposium '91, MIT Press, Oct. 1991.
[SI91]
R. Sindaha, Personal Communication, Sep.
1991.
[SK92]
K. Shen, "Studies of And-Or Parallelism in
Prolog," Ph.D. thesis, Cambridge University,
1992, forthcoming.
[SW91] V. Santos Costa, D. H. D. Warren, R. Yang, "Andorra-I: A Parallel Prolog system that transparently exploits both And- and Or-Parallelism," In Proceedings of Principles &
Practice of Parallel Programming, Apr. '91,
pp. 83-93.
[VX91]
A. Veron, J. Xu, et al., "Virtual Memory
Support for Parallel Logic Programming Systems," In PARLE'91, Springer Verlag, LNCS
506, 1991.
[W84]
D. S. Warren, "Efficient Prolog Memory Management for Flexible Control Strategies," In The 1984 Int. Symp. on Logic Prog., Atlantic City, pp. 198-202.
[W87]
D. H. D. Warren, "The SRI-model for Or-Parallel execution of Prolog - Abstract Design and Implementation Issues," 1987 IEEE Int. Symp. in Logic Prog., San Francisco.
[W90]
D.H.D. Warren, "Extended Andorra Model
with Implicit Control" Talk given at Workshop on Parallel Logic Programming, 7th
ICLP, Eilat.
Estimating the Inherent Parallelism in Prolog Programs
David C. Sehr*
Laxmikant V. Kale†
University of Illinois at Urbana-Champaign
Abstract
In this paper we describe a system for compile time
instrumentation of Prolog programs to estimate the
amount of inherent parallelism. Using this information we can determine the maximum speedup
obtainable through OR- and AND/OR-parallel execution. We present the results of instrumenting a
number of common benchmark programs, and draw
some conclusions from their execution.
1
Introduction
In this paper we describe a method for timing Prolog programs by instrumenting the source code.
The resulting program is run sequentially to estimate the sequential and best possible OR parallel
execution times. This method is then extended to
give the best possible AND/OR parallel execution time. Our instrumentation does not drastically reduce efficiency, and we present the results of a number of programs.
Our AND parallelism estimation method is based upon the work of Kumar [1988] in estimating
the inherent parallelism in Fortran programs. His
method augments the source program with a timestamp for each data item d, which is updated each
time d is written. In order to honor dependences,
each computation that reads d can begin no earlier than the time recorded in d's timestamp. The
largest timestamp computed by such an augmented
program is the optimal parallel time for the original
program. This time can be used to evaluate how
well a given implementation exploits parallelism.
This paper comprises six sections. The remainder of the first presents some terminology. The second describes measuring the amount of OR parallelism in a Prolog program. The third section extends this method to include AND parallelism. The fourth presents the timing methods used for several builtin predicates. The fifth section gives the results of our technique on the UCB Benchmarks. The last section presents some conclusions and suggests some future work.

* Center for Supercomputing Research and Development, 305 Talbot Laboratory, 104 S. Wright St., Urbana, IL 61801, USA. (sehr@csrd.uiuc.edu) This work was supported by Air Force Office of Scientific Research grant AFOSR 90-0044 and a grant from the IBM Corporation to CSRD.
† Department of Computer Science, Digital Computer Laboratory, 1304 W. Springfield Ave., Urbana, IL 61801, USA. (kale@cs.uiuc.edu) This work was supported in part by NSF grant NSF-CCR-89-02496.
1.1
Terminology
A Prolog program consists of a top-level query and
a set of clauses. The top-level query is a sequence
of literals; we shall also use the term query to refer
to any arbitrary sequence of literals. A literal is an
atom or a compound term consisting of a predicate
name and a list of subterms or arguments. Each
subterm is an atom or a compound term. The number of subterms of a compound term is its arity. A
clause has a head literal and zero or more body literals. A clause with no body literals is a fact; others
are rules. Clauses are grouped into procedures by
the predicate name and arity of their head literals. The rest of this paper assumes some working
knowledge of Prolog's execution strategy.
For our timings we model a program's execution
as traversal of its OR tree (SLD tree). Each node in
an OR tree is labeled by a query. The first literal
of the query at node N is the literal at N. The
label of the root is the top-level query¹. Each child
N of a node M is produced by unifying a clause
C's head with the literal L at M. N's query is
formed by replacing L in M's query by the body
of C. The left-to-right order of such children is the
order of the clauses in the source program. A leaf
node with an empty query is a success. Sequential
Prolog systems traverse this tree depth-first and
left to right.
1 Which may have appeared in the source program, or
may have been typed by the user at the read-evaluate-print
prompt.
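As a small illustration (our own example, not one taken from the paper), consider the program

p :- q.
p :- r.
q.

with top-level query ?- p. The root node is labeled with the query p and has two children, one per clause of p, whose queries are q and r respectively. The q node has a single child obtained from the fact q., whose query is empty and which is therefore a success; the r node has no children and is not a success. A sequential Prolog system visits these nodes depth-first, left to right.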
2
Sequential and OR time
The most efficient OR parallel implementations of
Prolog to date [Warren 1987, Ali 1990] have been
based upon the Warren Abstract Machine (WAM)
[Warren 1983]. Because of this, we compute critical path timings in number of WAM instructions
executed. The number of instructions is an approximation to execution time, since each type of
WAM instruction takes a slightly different time.
Variations in execution time come mainly from two
sources: argument unification and backward execution. The former comes from the get_value and
unify_value instructions, whose costs depend on
the size of the terms they unify, which can be substantial. We address this by making the cost of
these instructions the number of unification steps
they perform. Backward execution comes from instruction failure and may perform significant bookkeeping changes, especially for deep backtracking.
Different WAM implementations, particularly parallel ones, have differing costs for backward execution. In the measurements presented here we have
assumed zero backward execution cost, but other
cost assumptions can be used.
The execution time of a program has two components. The literal L at a node N in the OR tree
is a call to a procedure p. Calling p consists of
setting up L's calling arguments by a sequence of
put instructions and performing the call by a call
or execute instruction. The execution time of this
sequence is a statically computable time tp(L) for
L, which we approximate by the number of put
instructions plus one.
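For instance (an illustrative count of our own, matching the instruction sequence sketched in Figure 2), the top-level call fib(3,F) is compiled into two put instructions (put_constant 3,A0 and a put_variable for F) followed by a call fib/2, so under this approximation tp(L) = 2 + 1 = 3.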
Executing a called procedure consists of trying
clauses in succession. If C is being tried for the call,
the call arguments are unified with the arguments
of C's head literal H. This is done by get and
unify instructions and takes a time tu(H). In general the execution time of these instructions cannot
be estimated at compile time, so this head unification is performed by calls to run-time routines for
the corresponding WAM instructions. tu (H) is the
sum of the times computed by these routines.
To represent execution times the OR tree is given
two new labels. First, each node N is labeled with
the time tp(L) for the literal L at N. Second, each
edge (N, M) is labeled with tu(H), where H is the
head literal of the clause C applied to produce node
M. The program's all-solutions sequential execution time is the sum of all the tp's and tu's in the tree's processed region².
2 Predicates such as cut may prevent traversal of parts of
the tree.
fib(0,1).
fib(1,1).
fib(I,F) :-
    I > 1,
    I1 is I - 1,
    fib(I1,F1),
    I2 is I - 2,
    fib(I2,F2),
    F is F1 + F2.
Figure 1: A program to be timed
[Figure 2 sketches the execution of a timed literal L: the put instructions and the call (put_variable A1,A1; put_constant 3,A0; call fib/2), charged as tp(L) starting at Ts(L); head unification (get_variable A0,Y0; get_variable A1,Y1), charged as tu(H) starting at Ts(C); the puts of each timed body literal (e.g. put_value X1,A1; put_variable X2,A2) with their start and end times Ts(Li) and Te(Li); and the clause end time Te(C).]
Figure 2: Execution of a timed literal L
2.1
Pure Prolog
Finding the minimum OR parallel time requires
finding the critical path in the OR tree. For a pure
Prolog program this is done by summing the tp's
and tu's from the root to each leaf. The critical
path has the largest such sum. Programs containing builtins such as read, setof, recorda, and
assert require timing in sequential order. We first
describe the method for pure programs and extend
it to handle these predicates below.
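Abstractly, the pure-program case just described can be sketched as follows, using a representation of our own devising (node(Tp, Children) for a node with its put/call cost, and edge(Tu, Child) for a clause edge with its head-unification cost; this is not the paper's data structure):

:- use_module(library(apply)).

% critical_path(+Tree, -T): T is the largest root-to-leaf sum of tp and tu costs.
critical_path(node(Tp, Children), T) :-
    foldl(longest_branch, Children, 0, MaxBelow),
    T is Tp + MaxBelow.

% longest_branch(+Edge, +MaxSoFar, -Max): keep the costliest child branch.
longest_branch(edge(Tu, Child), Max0, Max) :-
    critical_path(Child, Tc),
    Max is max(Max0, Tu + Tc).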
Figure 1 shows a program to be instrumented,
and Figure 2 shows its execution. The time at which literal L is to be processed is denoted by Ts(L). If L is at the root of the OR tree, then Ts(L) = 0. Otherwise Ts(L) is the time the preceding computation finished. Execution of L begins with the puts and call, which take time tp(L), as we noted above. Thus the earliest time any clause can be tried for L is Ts(L) + tp(L). This is the start time Ts(C) for every clause C applied for L, since all are tried in OR parallel. Head unification for C begins at Ts(C) and is done by get (and unify) instructions. If successful, this completes at time Ts(C) + tu(H). If C is a fact, then the end time Te(C) is Ts(C) + tu(H) + 1, where the 1 is for the proceed instruction.
If C is a rule, each literal Li is processed as L was, begins at time Ts(Li), and ends successful execution at time Te(Li). The first body literal begins at time Ts(L1) = Ts(C) + tu(H). If the call from Li is successful and returns at time Te(Li), then the next literal Li+1 starts at time Ts(Li+1) = Te(Li). This continues until the last literal Ln completes at time Te(Ln), which is also the finish time Te(C) for C.
The time for a success is Te(L) for the last literal L in the top-level query. The time for a failed instruction in C is Ts(C) plus the portion of tu(H) computed before the failure. Most builtins are given a cost of one, and builtin failure takes the same time as a successful call does.
The system maintains a global critical path time Tmax. Whenever a library routine performing head unification fails at time Tf, it examines Tmax, and stores the larger of the two times as the new Tmax. The library routine that computes Ts(C) also updates Tmax, and the top-level query is modified to update it as well.
Figure 3 shows the timed version of Figure 1. Each clause has two new arguments, Ts and Te, and head unification is performed by routines such as get_constant and get_variable. These routines perform the corresponding WAM operation and update the critical path time. The first two clauses are facts, so the end time is computed by an update_time literal for the proceed instruction. The third clause is a rule, so each body literal L has a preceding update_time literal. If L refers to a user-defined predicate this literal computes Ts(L) + tp(L) for use as the start time for the call. If L refers to a builtin predicate (except those in Section 4) the update_time literal adds tp(L), plus one for the builtin's execution time, and uses this as the end time for L.
Each clause also has an initial index literal that enables last call optimization. Moving head unifications to the body made indexing impossible, so this literal is added to perform first argument indexing. If this is not done, last call optimization rarely works. This literal appears sufficient for last call optimization with the Sicstus compiler.

fib(A,B,Ts,Te) :-
    (A == 0 ; var(A)),
    get_constant(A,0, Ts, Tu1),
    get_constant(B,1, Tu1, Tu2),
    update_time(Tu2, 1, Te).

fib(A,B,Ts,Te) :-
    (A == 1 ; var(A)),
    get_constant(A,1, Ts, Tu1),
    get_constant(B,1, Tu1, Tu2),
    update_time(Tu2, 1, Te).

fib(A,B,Ts,Te) :-
    get_variable(A,N, Ts, Tu1),
    get_variable(B,F, Tu1, Tneck),
    update_time(Tneck, 4, Te1),
    N > 1,
    update_time(Te1, 6, Te2),
    N1 is N - 1,
    update_time(Te2, 3, Ts3),
    fib(N1, F1, Ts3, Te3),
    update_time(Te3, 6, Te4),
    N2 is N - 2,
    update_time(Te4, 3, Ts5),
    fib(N2, F2, Ts5, Te5),
    update_time(Te5, 6, Te),
    F is F1 + F2.

Figure 3: Program after instrumentation
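The timing library routines called in Figure 3 are not listed in the paper. The following is a minimal sketch of how they could behave, under the assumption of a dynamic tmax/1 fact for the global critical path time; note_tmax/1 and the simplified costs are our own illustrative choices, not the authors' code.

:- dynamic tmax/1.
tmax(0).

% Record a candidate critical path time in the global maximum.
note_tmax(T) :-
    retract(tmax(Old)),
    New is max(Old, T),
    assertz(tmax(New)).

% update_time(+Tin, +Cost, -Tout): charge Cost WAM instructions and keep
% the global maximum up to date.
update_time(Tin, Cost, Tout) :-
    Tout is Tin + Cost,
    note_tmax(Tout).

% get_constant(?Arg, +Const, +Ts, -Te): timed head unification against a
% constant; if it fails, the time reached so far is still recorded.
get_constant(Arg, Const, Ts, Te) :-
    Te is Ts + 1,
    (   Arg = Const
    ->  true
    ;   note_tmax(Te),
        fail
    ).

% get_variable(?Arg, -Var, +Ts, -Te): timed head unification binding a
% clause variable; the real routine charges the size of the term unified,
% simplified here to a single unit.
get_variable(Arg, Arg, Ts, Te) :-
    Te is Ts + 1.

With these definitions, running the instrumented clauses of Figure 3 as fib(10, F, 0, Te) would yield the end time of the successful derivation in Te, while tmax/1 records the largest time seen, including failed head unifications.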
3
Adding AND parallelism
The critical path time determines the best possible OR parallel execution time. Often segments
of a branch can execute simultaneously, and doing so would reduce that critical path time. This
is AND parallel execution, and unlike OR parallelism, it requires testing for dependences even in
pure Prolog programs. In this section we describe
the application of Kumar's [1988] techniques for
Fortran to estimate the best AND/OR parallel execution time. The method we describe extends
his work to deal with the dynamic data structures
and aliasing present in Prolog. We believe this
framework has the advantage over other methods
[Shen 1986, Tick 1987] of allowing us to extend it to
measure critical path times in programs with user
parallelism.
A program's dependences can only be exactly
determined at execution time, since one execution
may have a dependence while another does not. A
compiler, to ensure legal execution, must assume a
dependence exists unless it can be proven not to. Because of this, compilers often infer many more dependences than are actually present in the program. Another use of the method we propose is to compute exact dependences to test the effectiveness of dependence tests.

[Figure 4 shows the dependence graph for the recursive fib clause: the call node feeds N to the two is goals (N1 is N-1 and N2 is N-2), each is goal feeds N1 or N2 to the corresponding recursive fib computation, and both recursive computations feed F1 and F2 to the final goal F is F1+F2. Arcs are labeled with the variables causing the dependences.]
Figure 4: A dependence graph
There are a number of AND parallel execution
models that differ in their treatment of the dynamic
nature of dependences. The approaches range
from dependence graphs that are static [Kale 1987,
Chang et al 1985, Wise 1986] to partly dynamic
(conditional) [DeGroot 1984, Hermenegildo 1988]
to completely dynamic [Conery and Kibler 1985].
Kale [1984] notes that in some rare situations it
may be beneficial to evaluate dependent literals in
parallel. His Reduce-Or Process Model allows for
dependent AND parallelism, but his implementation [Ramkumar and Kale 1989] supports only independent AND parallelism. Epilog [Wise 1986]
also permits dependent AND parallelism, but provides a primitive (CABO) to curtail it. The model
we have developed includes dynamic, independent
AND parallelism, with a strict sequential ordering
on dependent literals. We are only able to present
the results here for independent AND parallel execution, though, because of a problem in the Prolog
system used to execute the instrumented programs.
In the future we hope to report the timings for the
more general approach.
3.1
Dependences
The third clause in Figure 1 contains six body
literals that might potentially execute in parallel.
The arguments of the > builtin must both be nu-
meric expressions, so to execute correctly the argument I to fib must be an integer. Because neither
writes I, the two is goals can execute independently. Each reads I and produces a binding for I1 or I2, the values of I for the recursive instances. Since all fib clauses read I, the recursive calls can
only begin after their corresponding is. The final
is literal requires the value of both F1 and F2 so
the two fib calls must precede the final is. There
need be no other ordering between literals.
Figure 4 shows the dependence graph for the
clause. There is a node for the initial call to fib
and a node for each body literal. Recursive computations are represented by shaded areas. An
arc between two nodes represents a dependence, or
that the node at the tail must precede the node
at the head of the arc. Dependence arcs are labeled with the variables causing them. Such a variable v causes a dependence δ in one of two ways. First, if the node at the tail of δ binds v, and v is read at the head, then there is a data dependence. Second, if the node at the head of δ binds v and the node at the tail reads v using a meta-logical predicate (var, write, etc.), then there is
an anti-dependence. Anti-dependences arise when
a literal succeeds with a variable v unbound and
would fail or produce incorrect output because v is
subsequently bound.
3.2
Shadow terms
Dependences are detected at run time by shadow terms. Each term t has a shadow term ψ(t) associated with it, which mirrors t's structure. The shadow of an atomic term is the atom a. The shadow term of a compound term t = f(t1, ..., tn) is s(ψ(t1), ..., ψ(tn)), where ψ(ti) is the shadow for ti.
A variable must be bound for a dependence to exist, so the shadow term for a variable keeps the binding times for that variable (there can be multiple bindings, since some may be variable-to-variable). The shadow of an unbound variable is unbound. If v is bound to any term t at time T by a get_variable or unify_variable instruction, the shadow variable ψ(v) is dereferenced and the final variable is bound to the structure w(ψ(t), T). The same operation is performed if v is bound to a non-variable term t by any other instruction. If v is bound to another variable v' by any other instruction at time T, an alias has been created. The two shadows reflect this by dereferencing both ψ(v) and ψ(v') and binding the final variables of both to the term w(ψ'(v), T), where ψ'(v) is a new unbound variable. If v is examined by a meta-logical builtin at time T, ψ(v) is dereferenced, and the final variable is bound to m(ψ'(v), T), where ψ'(v) is a new unbound variable.

fib(A, B, Sa, Sb, Ts, Te) :-
    (A == 0 ; var(A)),
    get_constant(A,0, Sa, Ts, Tu1),
    get_constant(B,1, Sb, Tu1, Tu2),
    update_times(Tu2, 1, Te).

fib(A, B, Sa, Sb, Ts, Te) :-
    (A == 1 ; var(A)),
    get_constant(A,1, Sa, Ts, Tu1),
    get_constant(B,1, Sb, Tu1, Tu2),
    update_times(Tu2, 1, Te).

fib(A, B, Sa, Sb, Ts, Te) :-
    get_variable(A,I, Sa, Sn, Ts, Tu1),
    get_variable(B,F, Sb, Sf, Tu1, Tu),
    max_shadow_time(Tu, [Sn], Tt1),
    update_time(Tt1, 4, Te1),
    I > 1,
    max_shadow_time(Tu, [Sn1,Sn], Tt2),
    update_time(Tt2, 6, Te2),
    I1 is I - 1,
    set_shadows([Sn1], [I1], Te2),
    update_time(Tu, 3, Ts3),
    fib(I1, F1, Sn1, Sf1, Ts3, Te3),
    max_shadow_time(Tu, [Sn2,Sn], Tt4),
    update_time(Tt4, 6, Te4),
    I2 is I - 2,
    set_shadows([Sn2], [I2], Te4),
    update_time(Tu, 3, Ts5),
    fib(I2, F2, Sn2, Sf2, Ts5, Te5),
    max_shadow_time(Tu, [Sf,Sf1,Sf2], Tt6),
    update_time(Tt6, 6, Te6),
    F is F1 + F2,
    set_shadows([Sf], [F], Te6),
    max([Te1,Te2,Te3,Te4,Te5,Te6], Te).

Figure 5: AND/OR instrumented program
3.3
Dependences with shadow terms
Figure 5 shows fib after instrumentation for
AND/OR parallelism. Each variable V in a clause
has a shadow variable Sv, and each head argument
has a shadow argument. The end time for a clause
is the largest end time for any literal in that clause,
as if each literal starts immediately after head unification and suspends until its dependences are satisfied. In Figure 5 the end time is shown as computed
by a max literal at the end of the clause. This is
for clarity of presentation only, because this would
inhibit last call optimization. In the real version a
current maximum is passed to each body literal in
succession.
The head unification routines now include
shadow variables as arguments, since it is in these
instructions that dependences in user-defined predicates are enforced. These routines previously computed their finish time only from the start time
and the cost of the instruction. Now there is the
possibility that the instruction must wait until the
shadow time for a variable causing a dependence
before performing the unification. Hence the completion time is computed by performing the unification and keeping a current time. Whenever a term
is referenced the current time becomes the maximum of the current time and the timestamp. The
unification is then performed and the current time
incremented.
Two other predicates enforce dependences involving builtins. The first, max_shadow_time, computes the earliest time the builtin's arguments are
available3 from the latest time in the arguments'
shadows. This enforces data dependences that have
the builtin as their head. The builtin's end time
is computed by update_time, as before. The second predicate, set_shadows, builds shadows for
changes to the arguments of a builtin. Shadows
are built for those arguments that are bound or
are examined by meta-logicals, and they are constructed from the variable bindings after execution.
This handles builtins at the tail of a dependence.
For some builtins such as =.. this can be fairly
complex.
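A possible sketch of these two predicates, consistent with the description above but not taken from the paper (it reuses the w/2 shadow representation and the shadow/2 predicate from the earlier sketch, and ignores m/2 anti-dependence shadows and nested timestamps):

% Binding time carried by a shadow; an unbound shadow (an argument that
% is not yet bound, e.g. an output variable) contributes no delay.
shadow_time(S, 0) :- var(S), !.
shadow_time(w(_, T), T).

% max_shadow_time(+Tin, +Shadows, -Tout): a builtin may start no earlier
% than Tin and no earlier than the latest binding time of its arguments.
max_shadow_time(Tin, Shadows, Tout) :-
    maplist(shadow_time, Shadows, Times),
    max_list([Tin|Times], Tout).

% set_shadows(+Shadows, +Values, +T): record that each value became
% available at time T by binding its shadow to w(ShadowOfValue, T).
set_shadows([], [], _).
set_shadows([S|Ss], [V|Vs], T) :-
    shadow(V, SV),
    S = w(SV, T),
    set_shadows(Ss, Vs, T).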
4
Builtin predicates
Prolog has several types of builtin predicates, each
with a different set of effects on critical path timing.
We have already noted that meta-logical builtins
(var, write, etc.) can cause anti-dependences. In
this section we describe four other kinds of predicates and methods for timing each of them.
4.1
Predicates involving call
There are four predicates that implicitly use the
meta-logical builtin call. They are bagof, setof,
not, and \+. Timing these predicates requires two kinds of special handling. First, since call's arguments may be constructed at run time, instrumentation is done at run time. This is done by including the instrumentation program in the timed program. Second, setof, bagof, not and \+ traverse an entire OR tree, so their finish times are related to the longest path in that tree. A stack of maximum times is used with nested calls to these predicates to collect a subtree's maximum time. For setof and bagof we also add one for each solution for the cost of building the returned list.

3 This predicate is also used to enforce independent-AND parallel execution, by making every user predicate strict.

[Figure 6 illustrates the stacking of Tmax around a setof call: within the traversed region Te = Tmax, and on completion Tmax = max{popped value, Te}.]
Figure 6: Processing setof

[Figure 7 illustrates the ordering of two writes: Te2 = max{Td(2), Te1} + 1 and last_io = Te2.]
Figure 7: Processing the input/output predicates
Figure 6 shows the processing of a call to setof
that computes all the solutions for the p(X) in region R and collects them in a list L. Since it traverses the whole OR tree R required to compute
p(X), setof's finish time is the longest completion
time in R. The maximum time is maintained by
update_time in the global variable Tmax4. Since
there may be a previous maximum time greater
than the largest completion time in R, Tmax is
pushed on a stack and the start time for the setof
is used as Tmax. R is traversed and the maximum
time is stored in Tmax, as always. The return time
for setof, Te is Tmax. At the end of setof Tmax is
set to the maximum of the stack value and Te, so
again Tmax contains the global largest time.
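The stacking of Tmax just described can be sketched as follows (timed_setof/5, the use of findall/3 in place of the real instrumented traversal, and the tmax/1 fact from the earlier sketch are all illustrative assumptions, not the authors' implementation):

timed_setof(Template, Goal, List, Ts, Te) :-
    retract(tmax(Outer)),          % push the enclosing maximum
    assertz(tmax(Ts)),             % the subtree's clock starts at Ts
    findall(Template, Goal, List), % Goal is assumed to be instrumented,
                                   % so running it updates tmax/1
    retract(tmax(Inner)),          % longest completion time in the subtree
    length(List, N),
    Te is Inner + N,               % one unit per solution for list building
    NewMax is max(Outer, Te),      % pop: merge with the saved maximum
    assertz(tmax(NewMax)).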
4.2
Read and write
Neither setof nor pure Prolog cause dependences between branches in the OR tree. The input/output predicates (read, write, etc.) cause cross-branch dependences, since the observable order of input/output needs to conform to Prolog's left-to-right order. Figure 7 depicts the execution of a program with two writes, W1 and W2. Data dependence would permit each write to start when its arguments were ready (times Td(1) and Td(2) respectively) were it not for the order of output. W1 must write its output before W2, so to determine when input or output can be done we maintain a global variable last_io. In this example, W2 cannot write its output until max{Td(2), last_io}. Writes cost one time unit, so W2 can start no earlier than max{Td(2), Td(1) + 1}. In the instrumented version each input/output predicate is preceded by a literal that updates last_io.

4 In the implementation of our system the maximum time, along with a parallelism histogram, is maintained by several C routines accessed through a foreign function interface, but this is done only for the sake of efficiency.
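A sketch of the last_io bookkeeping described in Section 4.2 (the predicate name pre_io/2 and the use of a dynamic fact are our own illustrative choices):

:- dynamic last_io/1.
last_io(0).

% pre_io(+Td, -Te): an input/output predicate may not run before its
% data-dependence time Td nor before the previous I/O finished; the
% write itself costs one time unit.
pre_io(Td, Te) :-
    retract(last_io(Prev)),
    Te is max(Td, Prev) + 1,
    assertz(last_io(Te)).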
4.3
Recorda and recorded
Prolog also has the builtins recorda, recorded,
and erase to manipulate an internal database.
Parallel accesses to relations in the database must
appear to preserve the sequential execution order.
Accesses to different database relations do not affect one another, so this order is only within a relation. It is not necessary to serialize accesses to each
relation to preserve the appearance of sequential access order. All we need is to guarantee that read
accesses to an element by recorded occur after the
write access that placed that element there, and
that write accesses (recordas and erases) are ordered. The former is enforced by pairing each item
placed in the database with its insertion time. Accesses by recorded wait until the maximum of the
data dependence time Td and the element's insertion time. The write order is enforced by labeling
each relation with a last_modify that is updated just like last_io.

Program       Serial WAM   OR Parallel   AND/OR Parallel
Name          Instr.       Speedup       Speedup
chat_parser   1014791      257           1596
crypt         31787        58            114
divide10      207          1             2
fast_mu       8899         9.1           10.7
flatten       5218         1.25          2.37
log10         119          1             1.2
meta_qsort    38675        2.1           3.7
mu            5925         16.7          17.7
nand          180145       5.4           14.3
nreverse      4460         1             1
ops8          163          1.04          2.8
poly10        307177       1.1           76.3
prover        7159         4.5           14.2
qsort         5770         1.3           1.5
queens8       33821        26.4          69.3
query         17271        243           480
reducer       279220       2             3.3
serialise     3199         1.4           1.9
tak           1431202      1.1           686
times10       207          1             1.9
unify         29490        1.6           3.5
zebra         261858       453           482

Table 1: Instrumented benchmark times
4.4
Assert and retract
Prolog also allows assert and retract to modify the program at run time. These predicates
are timed by the method for call and that for
the internal database. The former is because the
asserted clause can be constructed at run time,
and hence the instrumentation must be done then.
The latter is because predicates modified at run
time must obey the access rules for database updates. The write-write (assert and retract) order is enforced by updating the last_modify for the
predicate. The read-write ordering is maintained
by adding a first literal to each asserted clause
that records when it was added. This is used to determine the earliest time a read (a clause builtin
or call to the modified predicate) can execute.
5
Analysis of programs
Table 1 presents the results obtained by instrumenting 23 of the University of California at Berkeley's UCB benchmarks. These programs range over a variety of sizes and purposes. There are several interesting facts to observe from these programs. First, David H. D. Warren's assertion [Warren 1987] that OR parallelism was likely to produce significant speedups on a range of programs appears to be borne out. Several programs achieved small speedups from OR parallelism, mostly due to shallow backtracking (e.g. flatten, ops8, poly10, qsort, tak, unify). Improved indexing would probably eliminate most of this OR parallelism. A number of programs exhibited essentially no OR parallelism (e.g. divide10, log10, nreverse, times10).
In general, independent AND parallel execution improved the performance of programs already speeded up by OR parallel execution by
a small factor (1-6). These programs have all
shown reasonable speedups in real OR parallel
systems [Szeredi 1989]. Our results show that there
is plenty of parallelism in several of these programs
to extend to much larger machines (e.g. consider
chat_parser, query and zebra). Those with smaller
speedups may profit from the introduction of independent AND parallelism.
Of the programs that were mostly OR-sequential,
the majority get very small speedup by applying independent AND parallel execution. For divide10,
log10, and times10, this is because the AND parallel sub-problems are very unbalanced; that is, one
sub-problem is much larger than the other. For
nreverse, the reason is that independent AND parallel execution is not able to execute the two body
goals of nreverse in parallel. It is a recurrence, and
is hence completely sequential. This can be addressed by replacing the algorithm or applying a
parallel recurrence solver.
The best results for independent AND parallelism come from polyl0 and tak. In both cases
these give rise to fairly large numbers of independent subcomputations. In the case of tak, the
branching factor is approximately three and the
calling depth is large, so a large speedup is obtained. Qsort on a well-chosen input list with a
better partition routine should be able to obtain
similar results.
These results are just the beginning of understanding the parallelizability of programs, as we
would like information on the more general AND
and other sorts of parallelism. However, they can
tell us something about how much speedup we can
reasonably expect from parallel models. Moreover, examining these programs to see where dependences occur should help in designing restructuring transformations.
6
Conclusions
The amount of OR and AND/OR parallelism in
a Prolog program can be effectively measured by
sequentially executing an instrumented version of
that program. The timings obtained this way give
a best-possible speedup under two different parallelism models, and can be used for a number of purposes. First, they can be used to evaluate the ability of a parallel execution model to exploit parallelism. These results can suggest areas of improvement for such models. We intend to instrument a
number of programs for this purpose.
With some relatively simple extensions this technique can measure the amount of a number of
lower-level program characteristics. Among these
are unification parallelism, backtracking properties,
aliasing, data dependences, and dereference costs.
Prolog can also be extended with predicates for
source-level parallelism. With proper timing methods, this instrumentation method can be used to
evaluate restructuring transformations for Prolog.
The instrumentation system we described has been
extended with such predicates and we have begun
to evaluate transformations. In the future we will
describe these extensions to the instrumentation
method as well as the results of our restructuring
transformations.
Acknowledgments
The authors would like to thank David Padua for
his many useful suggestions about this work.
References
[Ali 1990] Khayri Ali. The muse or-parallel prolog
model and its performance. In Proceedings of
the 1990 North American Logic Programming
Conference, pages 757-776, 1990.
[Chang et a/1985] J. Chang, A. M. Despain, and
D. DeGroot. And-parallelism of logic programs based on a static data dependency analysis. In Proceedings of Compcon 85, 1985.
[Conery and Kibler 1985] J.S. Conery and D.F.
Kibler. And parallelism and nondeterminism
in logic programs. New Generation Computing, 3:43-70, 1985.
[DeGroot 1984] D. DeGroot.
Restricted and-parallelism. In Proceedings of the International Conference on Fifth Generation Computer Systems, pages 471-478. North Holland,
1984.
[Hermenegildo 1988] M. V. Hermenegildo. Independent AND-Parallel Prolog and its Architecture. Kluwer Academic Publishers, 1988.
[Kale 1984] Laxmikant V. Kale. Parallel Architectures for Problem Solving. PhD thesis, State
University of New York at Stony Brook, 1985.
[Kale 1987] Laxmikant V. Kale. Parallel execution of logic programs: the reduce-or process
model. In Proceedings of the International
Conference on Logic Programming, pages 616-632, May 1987.
[Kumar 1988] Manoj Kumar.
Measuring parallelism in computation intensive scientific/engineering applications. IEEE Transactions on Computers, 37(9), September 1988.
[Ramkumar and Kale 1989] B. Ramkumar and
L.V. Kale. Compiled execution of the reduce-or process model on multiprocessors. In Proceedings of the 1989 North American Conference on Logic Programming, pages 313-331,
October 1989.
[Shen 1986] Kish Shen. An investigation of the argonne model of or-parallel prolog. Master's
thesis, University of Manchester, 1986.
[Szeredi 1989] Peter Szeredi. Performance analysis of the aurora or-parallel prolog system. In
Proceedings of the 1989 North American Conference on Logic Programming, pages 713-732,
1989.
[Tick 1987] Evan Tick. Studies in Prolog Architectures. PhD thesis, Stanford University, June
1987.
[Warren 1983] David H. D. Warren. An abstract
prolog instruction set. Technical report, SRI
International, October 1983. Technical Note
309.
[Warren 1987] David H.D. Warren. The sri model
for or parallel execution of prolog - abstract
design and implementation. In Proceedings of
the 1987 Symposium on Logic Programming,
pages 92-103. September 1987.
[Wise 1986] Michael Wise. Prolog Multiprocessors.
Prentice-Hall International Publishers, 1986.
Implementing Streams on Parallel Machines with
Distributed Memory
†Koichi Konishi  †Tsutomu Maruyama  †Akihiko Konagaya
‡Kaoru Yoshida  ‡Takashi Chikayama
Abstract
Stream-based concurrent object-oriented programming
languages (SCOOL) to date have been typically implemented in concurrent logic programming languages
(CLL). However, CLLs have two drawbacks when used
to implement message streams on parallel machines with
distributed memory. One is the lack of restriction on the
number of readers of a shared variable. The other is
a cascaded buffer representation of streams. These require many interprocessor communications, which can
be avoided by language systems designed specially for
SCOOLs. The authors have been developing such a
language system named A'UM-90 for A'UM, a SCOOL
with highly abstract stream communication. This paper presents the optimized method used in A'UM-90 to
implement streams on distributed memory. A stream is
represented by a message queue, which migrates to its
reader's processor after the processor becomes known.
The improvement from using this method is estimated
in terms of the number of required interprocessor communications, and is demonstrated by the result of a preliminary evaluation.
1
Introduction
One natural use of concurrent logic programming languages (CLLs) is to implement the Actor or object-oriented programming models. In a CLL, it is easy to specify objects running concurrently, communicating with one another by messages sent in streams [Shapiro and Takeuchi 1983]. Message streams in CLLs are especially useful, as they provide flexibility and modularity, and facilitate the exploitation of parallelism; they allow dynamic re-configuration of communication channels, while each object knows little about the partners with whom it is communicating.

†NEC Corporation, 4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216, Japan. {konishi, maruyama, konagaya}@csl.cl.nec.co.jp
‡Institute for New Generation Computer Technology, 1-4-28, Mita, Minato-ku, Tokyo 108, Japan. {yoshida, chikayama}@icot.or.jp
To support this style of programming, a number of
languages have been proposed ([Furukawa et al. 1984]
[Yoshida and Chikayama 1988]
[Kahn et al. 1986]
[Saraswat et al. 1990]). We call these languages stream-based concurrent object-oriented languages (SCOOL).
Most research on SCOOLs to date has been focused on
providing excellent expressibility. While SCOOLs have
been implemented in CLLs, to our knowledge, no language system dedicated for SCOOLs has been implemented.
A dedicated system for SCOOL can be much more
efficient than those implemented in CLLs when the abstraction and other information in programs are fully
exploited. The authors have been developing such a dedicated system for a kind of SCOOL, A'UM. The system
is named A'UM-90, and is targeted for multiprocessor
systems with distributed memory.
.
In this paper, some drawbacks of CLLs as implementation languages for stream communications are discussed,
then it is shown how A'UM's well-regulated abstract
streams can be efficiently implemented. A brief description of such an implementation is given, its improvement
over a CLL implementation is estimated, and the results
of a preliminary evaluation are given.
The next section describes the implementation of objects and stream communication in CLLs. Section 3 introduces SCOOLs as natural descendants of CLLs. Section 4 explains why CLLs are inadequate for implementing streams. Section 5 describes A'UM and A'UM-90
briefly. Section 6 describes the implementation of stream
communication in A'UM-90 and its costs. Section 7
shows some results of evaluation. The last section gives the conclusion.
2
Objects in CLL
Stream-based concurrent object-oriented programming
languages have evolved from efforts to embody the
Actor or object-oriented programming models in CLLs [Shapiro and Takeuchi 1983]. This style of pro-
object([message(Arguments)|In], State) :- true |
    method(Arguments, State, NewState),
    object(In, NewState).
Figure 1: A clause representing an object
gramming has the virtues of object-oriented programming such as modularity and natural parallelism in an
extended way [Kahn et al. 1989]. For example, an object
implemented in a CLL may have multiple input ports,
and communication ports can be transferred between
processes. Moreover, it can send messages before the
destination is determined. In this chapter, an implementation of object-oriented programming in a CLL is briefly
described.
Many CLLs (FCP, FGHC, Fleng, Oc, Strand, etc.) have been proposed to date. We use FGHC [Ueda 1985]
in the following explanation.
Figure 1 shows a typical example of representing an
object in FGHC. The behavior of an object is defined
by a number of clauses similar to the one above. Given
these clauses, a goal named object represents the state
of an object at a certain moment. The first argument
is a shared variable used as a communication port, from
which the object receives messages. The second argument is the internal state of the object.
When another goal sharing the variable with the first
goal assigns a term [message(Actuals)|Rest] to the
variable, the above clause can be selected, and Rest becomes shared by the two goals. Actuals are bound to
Arguments, and the body of the clause is executed.
A goal named method performs most of the actual work, creating a new state and assigning it to NewState.
A new object goal is created with Rest as the first argument and NewState the second. Thus, an object, or a
process, is represented by the recurring creation of goals
with altered states.
Communication ports are represented by variables
shared by two goals. One goal emits a message by assigning a structure containing a message and a new variable.
When the other goal receives the message by successfully
matching itself with a head of a clause, the new variable
becomes shared, to be used as a new port. By repeating
this procedure, these goals can communicate as many
messages as required, one after another. The connection is closed when a structure containing no variable is
assigned. Communication in this style is called stream
communication.
Basically, stream communication is one-to-one as described above. However, several streams of messages can
easily be merged into one by a simple process. A merger
should have several ports representing the input streams
to be merged and one more for the output. It receives
a message from one of its input ports and forwards it to
the output port.
Many types of mergers with varying policies can be
devised. A merger of one type might receive from an
arbitrary port, utilizing the non-determinism in clause
selection of the CLL. A merger of another type might
concentrate on one port until the connection through it
is closed, then it might move on to another port. We call
the former type a merger, and the latter an appender,
because it effectively appends streams one after another.
3
SCOOL
Programming objects in a CLL has several obvious drawbacks. First, the implementation of stream communication is explicitly described in the program. Streams
are explicitly formed using messages and a variable,
and many to one communications are implemented with
merger processes. Programmers must make sure that
the same conventions are used throughout their programs. Secondly, contentions are apt to happen, due
to the lack of restriction on multiple writers to a variable. Lastly, the verbosity, in particular manipulation of
internal states, is excessive. It is cumbersome to provide
all the details of communication.
Many SCOOLs have been proposed to remove these
drawbacks ([Furukawa et al. 1984] [Kahn et al. 1986]
[Yoshida and Chikayama 1988] [Saraswat et al. 1990]).
These languages have a form for class definition, introduced to make a concise description of object behavior
possible. Stream communication is denoted by dedicated
expressions, with its implementation removed from programs.
To our knowledge, all SCOOLs have been implemented
in CLLs. It is natural and efficient to use CLLs for this
purpose, but is problematic with respect to the resulting
system's performance. CLL systems cannot provide a thoroughly object-oriented view efficiently, such as integers operated on by messages. Another problem is implementing stream communication on a multiprocessor
system with distributed memory. We focus on the latter
problem, and explain the inadequacies of CLLs in the
next section.
4
Problems in implementing
streams in CLLs
Stream communication, and more generally asynchronous communication, uses message buffers to store
pending messages. In distributed memory multiprocessor systems, accessing a message buffer requires interprocessor communications (IPC), unless both the accessing process and the buffer are on the same processor.
While a single IPC suffices to write a message into a
buffer on a remote processor, reading a message requires
two: a request and a reply. Placing the buffer on the
reader's processor can save one IPC for each message
communicated through the buffer.
However, it is difficult for CLL systems to place the buffer on the reader's processor. CLL systems use a shared variable as a message buffer, and they cannot tell
the readers of a variable from the writers. In addition,
there may be multiple readers for a variable. In that
case, there is a relatively small advantage in saving IPCs
for only one reader among many.
Moreover, the number of IPCs required would not be
reduced even if the buffer is placed on the reader's processor. In a CLL, streams are represented as a sequence
of message buffers, and the writer only knows the last
one. When it becomes full, a new buffer is appended to
the sequence, and if it is created on the reader's processor, the address must be propagated to the writer. This
costs an additional IPC for every message sent.
Since CLL systems may not place shared variables on
the reader's processor, implementing these streams in
CLLs results in costly remote reads, repeated for every
buffer.
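To make the counts concrete (a rough estimate implied by the discussion above, not a measurement from the paper): sending n messages through a buffer kept on the writer's processor costs about 2n IPCs, since every remote read needs a request and a reply. Keeping the buffer on the reader's processor would cut this to about n IPCs, one remote write per message; but with the cascaded buffers of a CLL the writer must also be told the address of every new buffer created on the reader's side, adding roughly one more IPC per message and cancelling most of the saving.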
The argument so far prompts the development of a
dedicated system for SCOOLs. A'UM-90 is such a system for A'UM, a SCOOL that thoroughly integrates
streams into its specification. The next section describes
A'UM and gives an overview of A'UM-90.
5
A'UM and A'UM-90
5.1
Behavior of Objects
All A'UM objects run concurrently. They keep internal states called slots, and execute methods according to the messages they receive.
The class an object belongs to defines its behavior. A class definition has the following form, which includes the declaration of the class name, the classes it inherits from, slot names (local state) and definitions of its methods.

class class_name.
    super_class_decl
    slot_decl
    method_defs
end.

An object receives messages from only one stream, called its interface. An object is referenced by connecting a stream to its interface. Streams connected to the object later on will be merged into the interface.
A method is defined by the following form.

selector -> actions.

where selector is the method's name, and actions specify the operations it performs.
The only operations methods are allowed to perform are connecting a stream to another, creating an object, and sending a message to a stream.
5.2
Streams in A'UM
Stream communication in A'UM is highly abstract, providing safe communications and the notion of channels. Directed variables prevent contentions for a stream. The semantics of variables are enhanced so that they denote a set of confluent streams called a channel, a more general concept than a stream.
All variables in A'UM have a stream as their value. The role of streams in A'UM is similar to pointers in Lisp; streams are the sole way of referencing objects.
5.2.1
Operations on Streams
A stream is a sequence of messages, directed to a certain receiver. A message sent to a stream is placed at the end of the stream. Sending is expressed simply by juxtaposing a stream and a message, as follows.

stream message

Connection of two streams is denoted by the following syntax.

receiver = stream

This means that all messages sent to stream flow into receiver.
Closing a stream indicates that no more messages will be sent through it. Closing is always performed automatically, when a stream is discarded.
In addition, messages arriving at an object's interface stream are consumed exclusively by that object. This operation is also performed automatically.
5.2.2
Directed Streams
Stream connection is asymmetric; a stream may only
be connected to another stream once, but many other
streams may be connected to it. In order to assure at
compile-time that streams are connected only once, references to a stream are classified into two types, called
directions. An inlet is a reference to a stream from which
messages flow; an outlet is another kind of reference in
which messages are sent¹. The single connection of a
stream is assured by the restrictions requiring that a
stream has only one inlet and that the right hand value
of a connect expression be an inlet.
Inlets and outlets are distinguished syntactically. Variables referencing inlets are denoted with a variable name
with ^ prepended to it, e.g. ^X. Slots holding inlets and outlets are written as slot names preceded by @ and by !, respectively. Expressions have a value whose direction is determined according to their kind. Messages are distinguished by the directions of their arguments as well as their number, and the message's name.

¹ They are named from an object's point of view.

class account.
    out balance.
    :init ->
        0 = !balance.
    :deposit(^Amount) ->
        !balance + Amount = !balance.
    :withdraw(^Amount, ^Ack) ->
        (Amount < !balance) ? (
            :(true ->
                !balance - Amount = !balance.
            :(false ->
                Ack :overdrawn(!balance).
        ).
    :balance(!balance) -> .
end.

Figure 2: Bank account
5.2.3
Channel Abstraction
Two types of stream confluence, namely mergers and
appenders have special support in the language. As
mentioned earlier, a merger performs non-deterministic
merging, and an appender connects streams one after
another in a specified order.
A channel is a tree formed of these confluences of
streams. Variables represent a channel of a particular
form, consisting of an appender and an arbitrary number of mergers. All outputs of the mergers are connected
to inputs of the appender.
For a variable named Foo, ^Foo is an inlet of the root
stream of the channel. Foo$1, Foo$2, Foo$3, and so on,
are leaf streams. Foo is equivalent to Foo$1. They are
appended into the root in the order of their number.
When there are many expressions having the same number, the streams they denote are merged before being
appended.
Using channels reduces the description of mergers and
appenders in programs, which would be indecipherable
otherwise.
5.3
An Example Program
Figure 2 is an example A'UM program defining a class
for a bank account.
Arguments in a message are connected with values
of the expressions in the selector corresponding to the
message. For example, : deposi t receives an outlet and
connects AAmount to it. : balance receives an inlet and
connects it to the value of !balance.
A binary expression is a macro form. It expands into
a send expression, which sends to the left hand value a
message with two arguments, the right hand value and
an inlet of a new stream. The name of the message is determined according to the operator. A macro form evaluates into an outlet of the new stream. Thus, !balance + Amount is expanded into !balance :add(Amount, ^Result), with Result as its value.
exp ? ( ... ) is an anonymous class definition, which is used to represent a conditional behavior. Either of the methods :(true or :(false is executed by the
instance of the anonymous class, according to the result
of Amount < !balance.
5.4 An Outline of A'UM-90
A'UM-90 is an A'UM language system, independent of
any CLL. It provides efficient stream communication on
a distributed memory multiprocessor system. Moving
stream data structures to their reader's processor saves
many IPCs, which are otherwise required in stream communication.
A'UM-90 manages coarse-grained processes. Specifically, a process executes an instance of a user-defined
class.
An A'UM-90 system consists of a compiler and an
emulator. The compiler generates code for an abstract machine designed for the system, and the emulator executes the code.
Two different types of platform have been used. One is
a Sequent Symmetry with 16 processors, and the other
is a number of Sun Sparc Stations communicating by
Ethernet. Although a Symmetry has shared memory,
we used it as a distributed memory machine. We used
a small part of the memory to implement message communication, and divided the rest among processors.
6 Implementation of Streams in A'UM-90
The implementation described here fully utilizes information on stream abstraction and message flow direction available in A'UM programs. Although the delivery
of messages is somewhat delayed, the number of IPCs
required is significantly reduced when many messages
are sent through a long cascade of streams. Moreover,
the delay is eliminated in many cases by various subtle
optimization methods.
6.1 Streams
A stream is represented by a structure consisting of a
message queue, a pointer to its receiver, and a reference
count. The reference count is necessary for detecting
closed streams and for implementing the appenders correctly. The structure is named M node, where M stands for merging. A merger is simply represented as an M node having more than one pointer referring to it. An appender is represented by a structure consisting of an M node and a pointer to the following stream. The structure is named A node.

Figure 3: Stream location
With these structures, implementing operations on
streams within a processor is straightforward. Sending
a message is simply queuing it. Connecting a stream to
a receiver is making the pointer in the stream point to
the receiver and increasing the reference count of the receiver. When a stream is closed, its reference count is
decreased. Receiving a message is just dequeuing it.
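
To make the representation concrete, here is a minimal C sketch of the M node and A node structures and of the four intra-processor operations just listed; the type and function names are illustrative stand-ins, not the actual A'UM-90 runtime identifiers, and message payloads are omitted.

#include <stddef.h>

/* M node: a stream, acting as a merger when several pointers refer to it. */
typedef struct msg { struct msg *next; /* payload omitted */ } msg;

typedef struct m_node {
    msg *q_head, *q_tail;        /* queue of messages not yet delivered      */
    struct m_node *receiver;     /* stream or object this stream connects to */
    int refcount;                /* used to detect closed streams            */
} m_node;

/* A node: an appender, i.e., an M node plus the stream that follows it. */
typedef struct a_node {
    m_node stream;
    m_node *followed_by;
} a_node;

/* Sending a message is simply queuing it. */
void send_msg(m_node *s, msg *m) {
    m->next = NULL;
    if (s->q_tail) s->q_tail->next = m; else s->q_head = m;
    s->q_tail = m;
}

/* Connecting a stream to a receiver: set the pointer in the stream and
   raise the receiver's reference count. */
void connect_stream(m_node *s, m_node *receiver) {
    s->receiver = receiver;
    receiver->refcount++;
}

/* Closing a stream decreases its reference count; zero means that no
   more messages can arrive on it. */
void close_stream(m_node *s) {
    s->refcount--;
}

/* Receiving a message is just dequeuing it. */
msg *receive_msg(m_node *s) {
    msg *m = s->q_head;
    if (m) { s->q_head = m->next; if (!s->q_head) s->q_tail = NULL; }
    return m;
}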
6.2 Location of Streams
As argued in a previous section, a stream should be
placed on its receiver's processor in order to decrease the
number of IPCs. However, when a stream is created, its
receiver is still unknown. So we place it on the processor
local to its creator at its creation, and let it migrate later to the receiver's processor (see Figure 3).
Since it is always an object that ultimately receives
messages sent to a stream, the stream migrates to the object's processor. When the stream is directly connected
to the object, it migrates immediately. If it is connected
to an intermediate stream, it waits until the intermediate
stream migrates.
Suppose that an address of a stream in a processor is
announced to an object in another processor and that
the stream has not yet migrated. If the object sends
messages to the stream, two series of IPCs occur, one
for sending them to the stream, and another for the migration process of the stream. We eliminate the former
series by putting the messages into a new stream created on the same processor as the sending object and
connecting the new stream to the original.
With this strategy, and assuming that objects do not
migrate, all messages except those used for implementing
the strategy are transferred between processors at most
once. In the next section, a more detailed description of
the stream migration is given.
6.3 Migration Procedure
In the following description, all streams are supposed to
reside in different processors until they move. Operations within a processor are trivial, and are assumed to
cost much less than ones involving IPCs. It is also supposed that streams are connected in a processor other than that of the receiving object. Otherwise, the migration procedure is so simple that it becomes identical to ordinary sending without migration.
1. A stream is placed on the same processor as its cre-
ator object.
2. When the stream is connected, a control message
named where is sent to the specified receiver. The
control message has a pointer to the stream and a
tag showing the type of the stream, i.e., either an M
node or an A node.
3. The where causes the following actions according to
the type of the receiver:
a stream before its migration handles the control message as if it is an ordinary message.
That is, it is put into the receiver's queue.
It will be transferred again when the receiver
eventually migrates, and will be forwarded to
another receiver, which should cause the following case.
an object or a stream after its migration
creates a new node of the type indicated by
the tag in the control message, and reports the
address of the new node by a control message
named here to the stream waiting for the reply. When the type of the immigrant and the
receiver is the same, the receiver creates no new
node, and reports its own address.
4. When the stream receives the here, it migrates to
the specified new residence, in one of the following
manners according to its type:
M node It sends all messages in its queue to the
new residence. If it hasn't been closed yet, it
leaves in the former residence a forwarding pointer to the new location. The original residence will be reclaimed when it is closed.
A node In addition to the procedure for the M
node, the stream to be appended to the migrating one is connected to the same receiver at the
moment when this A node is closed. That is, a
new where with a pointer to the stream is sent
to the receiver.
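
The following C sketch mirrors steps 2-4 above. The node layout and the helper routines (pe_of, ipc_send, enqueue, flush_queue_to, new_local_node) are illustrative stand-ins for the A'UM-90 runtime, not its actual interface; the sketch only illustrates the protocol.

#include <stdlib.h>

typedef enum { TAG_M, TAG_A } node_tag;

typedef struct node {
    node_tag tag;
    int is_object;             /* objects ultimately receive all messages  */
    int migrated;              /* has this stream already moved?           */
    struct node *forward;      /* forwarding pointer left after migration  */
    /* message queue, reference count, appended stream, ... omitted        */
} node;

typedef struct { node *waiting_stream; node_tag tag; } where_msg;
typedef struct { node *new_residence; } here_msg;

/* Stubs standing in for runtime services. */
static int  pe_of(node *n)                    { (void)n; return 0; }
static void ipc_send(int pe, void *m)         { (void)pe; (void)m; }
static void enqueue(node *r, void *m)         { (void)r; (void)m; }
static void flush_queue_to(node *s, node *to) { (void)s; (void)to; }
static node *new_local_node(node_tag t)       { node *n = calloc(1, sizeof *n); n->tag = t; return n; }

/* Step 2: connecting a stream announces it to its receiver with a where. */
void on_connect(node *stream, node *receiver) {
    where_msg *w = malloc(sizeof *w);
    w->waiting_stream = stream;
    w->tag = stream->tag;
    ipc_send(pe_of(receiver), w);
}

/* Step 3: the receiver of a where reacts according to its own type. */
void on_where(node *receiver, where_msg *w) {
    if (!receiver->is_object && !receiver->migrated) {
        enqueue(receiver, w);              /* treated as an ordinary message;   */
        return;                            /* forwarded when the receiver moves */
    }
    node *residence = (!receiver->is_object && receiver->tag == w->tag)
                      ? receiver           /* same type: report its own address */
                      : new_local_node(w->tag);
    here_msg *h = malloc(sizeof *h);
    h->new_residence = residence;
    ipc_send(pe_of(w->waiting_stream), h);
}

/* Step 4, M-node case: migrate to the reported residence. */
void on_here(node *stream, here_msg *h) {
    flush_queue_to(stream, h->new_residence);  /* re-send all queued messages  */
    stream->forward = h->new_residence;        /* kept until the stream closes */
    stream->migrated = 1;
    /* A-node case: when this node is closed, a new where for the stream to be
       appended is sent to the same receiver (omitted here). */
}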
6.4 Migration Cost
Each stream creates a where. It is transferred between
processors twice, once when the stream is connected, and
once when its receiver migrates. The second transfer
doesn't happen if the receiver is an already moved stream
or an object. Suppose a channel connected to an object
consists of n streams, of which nd are connected directly to the object; then the number of IPCs for where is n + (n - nd).
A here is created in correspondence with a where, and
is transferred between processors once. For all here's, n
IPCs occur.
Migration brings about no other transfer of control messages, so the number of IPCs required for migration is n + (n - nd) + n = 3n - nd.
Closing a stream requires another kind of control message. We call it close. Each stream sends its reader one
close when closed. This adds up to n close's requiring n
IPCs.
Ordinary messages are each transferred between processors once. If there are m ordinary messages to be sent, then, in total,

    (3n - nd) + n + m

transfers between processors occur.

How many IPCs occur for stream communication if streams don't move? Neither where nor here is created. A close is still created for each stream. The number of times ordinary messages and close's are transferred depends on the structure of the channel.

A channel is a tree having streams as its nodes. Suppose the i-th node receives mi messages and its depth is di, where the depth of a node is the number of streams on the path from that leaf to the root. For example, the depth of a leaf directly connected to an object is 2. Then the messages sent to the i-th leaf are transferred di - 1 times, and the total number of transfers will be:

    Σ_{i=1..n} (di - 1)(mi + 1)

The condition under which stream communication requires fewer IPCs with migrating streams than without them is therefore:

    Σ_{i=1..n} (di - 1)(mi + 1) > (3n - nd) + m + n

This can be rewritten as:

    Σ_{i=1..n} (di - 2)(mi + 1) > 3n - nd

Since di cannot be smaller than 2, di - 2 never becomes negative. The next term, mi + 1, is the number of messages sent from a node, including a close. The last term, 3n - nd, is the number of control messages used to move all streams.
The above condition says that if the channel has some
intermediate nodes between the root and leaves, and
more than a certain number of messages are sent through
them, then stream migration is beneficial. Conversely, if
all streams in a channel are directly connected to an object, or too few messages are sent, streams should not
be moved. The next section discusses some optimization
based on detecting those cases.
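
As a worked aid, the two sides of the condition can be evaluated for a concrete channel with a small helper like the one below, written in C; the function and parameter names are ours for illustration, not part of A'UM-90.

/* Number of IPCs with migration: where/here traffic, one close per stream,
   and one transfer per ordinary message. */
int ipcs_with_migration(int n, int nd, int m) {
    return (3 * n - nd) + n + m;
}

/* Number of IPCs without migration: each message and each close entering
   the i-th stream travels d[i] - 1 hops. */
int ipcs_without_migration(int n, const int d[], const int mi[]) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += (d[i] - 1) * (mi[i] + 1);
    return total;
}

/* Migration pays off when
   ipcs_without_migration(n, d, mi) > ipcs_with_migration(n, nd, m). */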
6.5 Further Optimization
The left-hand side of the above condition becomes zero
when all streams are directly connected to an object.
When connecting a stream, it is detected at run-time
that the receiver is an object; pointers are tagged to indicate the type of the pointed structure. By not moving
those streams, the right-hand side is also decreased to
zero when the left-hand becomes zero.
When less than two messages are sent through a
stream, the stream does not migrate, i.e., it does not
send out a where. More detailed analysis shows that two
is the least number to make stream migration beneficial.
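
A hypothetical run-time check combining these two observations might look as follows in C; the receiver_t type and its field name are illustrative only.

typedef struct { int is_object; /* other fields omitted */ } receiver_t;

/* A stream is migrated (i.e., sends out a where) only if its receiver is not
   an object and at least two messages will pass through the stream. */
int should_migrate(const receiver_t *receiver, int messages_so_far) {
    if (receiver->is_object) return 0;  /* directly connected: never moved */
    return messages_so_far >= 2;        /* two is the break-even count     */
}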
In addition, various minor optimization methods are
applied to reduce the delay of the first message's delivery. For example, the first message is sent with a where,
packed together in one IPC, if it is available when the
where is sent out. When a where is received by a stream
that only bridges two other streams, receiving no ordinary messages, it immediately forwards this where instead of sending out a new one. Such a stream can be
distinguished by checking its reference count when it receives a where.
7 Evaluation
In order to evaluate performance of the implementation
described in the sections so far, we measured the following three values:
• Delay time
• IPC load
• Total elapsed time for the entire execution of a program
As a control, we measured against an A'UM-90 system
which does not migrate streams. We call this system
NO_WHERE, and the system that performs the migration WHERE in the following sections.
Programs used in the measurement of delay time and
IPC load form a linear channel, a long chain of streams without any branches, and send messages along the channel. Figure 5 shows the objects' configuration. Each PE creates
one stream on itself. When the PE receives a message
connect, it connects its stream to the next lower stream
Figure 4: Objects' configuration

Figure 5: Delay time (in msec, vs. number of hellos)
on another PE. Also, the first PE releases several messages named hello at its stream.
The connect circulates around the PEs, one at a time,
through a channel different from that through which
hellos flow. Two programs which differ in direction of
the circulation were used. We call one of them DOWNSTREAM, in which a connect flows in the same direction
as hellos, and the other UPSTREAM, in which a connect
flows against hellos. The connect in Figure 5 is flowing
UPSTREAM.
The time was measured from after the release of the
hellos and a connect until the arrival of the last hello.
7.1 Delay time
Figure 6 shows the result of the delay time measurement, sending up to ten messages down a channel of
length ten.
The values are elapsed time measured on an unloaded
Sequent Symmetry, using 10 PEs. They include CPU
time and idle time during which PEs were waiting for
messages.
In the DOWNSTREAM case, delay time in the
WHERE is longer than in the NO_WHERE by at most
1000 msec, as expected. In the UPSTREAM case, however, messages arrive earlier in the WHERE than in the
NO_WHERE by 200 msec. The reason for this reversal
is that the migration of streams took place concurrently
with the circulation of the connect in the WHERE. After
the connect reached the uppermost PE, hellos were sent
directly to their final receiver in the WHERE, while, in
the NO_WHERE, they flowed through every PE having
a part of the channel.
From these results, we can expect that the difference
in the delay time of the WHERE and the NO_WHERE
would be smaller than 1000 msec when the connections of
a channel's constituent streams occur in a varying order.
Also, note that the delay time for the first message
in the WHERE is much smaller than those for the later
messages. This results from the optimization, mentioned
in Section 6.5, of sending a where and the first message together whenever possible.
Figure 6: IPC load
7.2 IPC load
Figure 7 shows the result of the IPC load measurement,
sending up to 200 messages down a channel of length 500.
The values are CPU time measured on an unloaded Sequent Symmetry, using 10 PEs. The results confirm that
the IPC load in the NO_WHERE eventually becomes
much larger than that in the WHERE as the number of
released messages grows.
7.3 Total elapsed time
Figure 8 and Figure 9 show the results of measurements using a program PRIME, which enumerates prime numbers by the generate-and-test method. The graphs in Figure 8 are obtained from 10 PEs in a Symmetry, and those in Figure 9 are from an isolated Ethernet network consisting of two Sun Sparc Stations. The top two graphs
in each figure are elapsed time, the next two are average
total CPU time for a PE, and the other one is CPU time
for a PE, spent only for processing other than IPC. The
last one is estimated from CPU time for execution using
1 PE, divided by the number of PEs, i.e., 10.
The graphs for elapsed time show that the WHERE is faster than the NO_WHERE. On a Symmetry, the entire speedup can be explained by the decrease in CPU time. There is up to 40% improvement in the CPU time spent for IPC, which can be read from the difference between the total and the non-IPC portions of CPU time. On Ethernet, the speedup is much larger than the decrease in CPU time, due to much slower communication.

Figure 7: PRIME on shared memory (elapsed and CPU time vs. maximum integer tested)

Figure 8: PRIME on Ethernet

8 Conclusion

Streams in CLLs are difficult to implement efficiently for two reasons:

1. Message buffers are not always placed on their readers' processor, because an arbitrary number of readers are allowed for a buffer. Therefore, inter-processor reading from the buffer takes place with two IPCs, instead of the one required for writing into it.

2. A stream is represented by cascaded message buffers, which CLLs do not treat as a single body. Consequently, even if these buffers are placed on their reader's processor, their address has to be repeatedly sent to their writer.

This is not the case for A'UM. A'UM has abstract stream communication, whose implementation is left as the language system's responsibility. In addition, every stream is restricted to have only one reader. So streams in A'UM can be implemented more efficiently than those in CLLs.

A'UM-90 moves a stream to its reader's processor, and saves about half of the IPCs required in CLLs. In spite of the migration, it delivers the first message through the stream with only a small delay. A prime number generator program runs up to 40% faster in A'UM-90 than in a system that does not migrate streams.

While the optimization method given in this paper tries to reduce the number of IPCs for a given distribution of objects, it is also important to find the best distribution of objects. Of course, those methods have to balance the amount of IPC against the exploitation of parallelism.

Acknowledgments

We thank Shinji Yanagida and Toshio Tange of NEC Scientific Information System Development for developing the A'UM-90 abstract-machine emulator.
Message-Oriented Parallel Implementation of Moded Flat GHC
Kazunori Ueda
Masao Morita
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
ueda@icot.or.jp
Mitsubishi Research Institute
3-6, Otemachi 2-chome, Chiyoda-ku, Tokyo 100, Japan
morita@asdal.mri.co.jp
Abstract
We proposed in [Ueda and Morita 1990] a new,
message-oriented implementation technique for Moded
Flat GHC that compiled unification for data transfer into message passing. The technique was based
on constraint-based program analysis, and significantly
improved the performance of programs that used goals
and streams to implement reconfigurable data structures. In this paper we discuss how the technique
can be parallelized.
We focus on a method for
shared-memory multiprocessors, called the shared-goal
method, though a different method could be used for
distributed-memory multiprocessors. Unlike other parallel implementations of concurrent logic languages
which we call process-oriented, the unit of parallel execution is not an individual goal but a chain of message
sends caused successively by an initial message send.
Parallelism comes from the existence of different chains
of message sends that can be executed independently
or in a pipelined manner. Mutual exclusion based on
busy waiting and on message buffering controls access
to individual, shared goals. Typical goals allow last-send optimization, the message-oriented counterpart of
last-call optimization. We are building an experimental implementation on Sequent Symmetry. In spite of
the simple scheduling currently adopted, preliminary
evaluation shows good parallel speedup and good absolute performance for concurrent operations on binary
process trees.
1. Introduction
Concurrent processes can be used both for programming computation and for programming storage. The
latter aspect can be exploited in concurrent logic programming to program reconfigurable data structures
using the following analogy,
    records   <-->  (body) goals
    pointers  <-->  streams (implemented by lists)
where a (concurrent) process is said to be implemented
by a multiset of goals.
nt([],_,_,L,R) :- true |
    L=[], R=[].
nt([search(K,V)|Cs],K,V1,L,R) :- true |
    V=V1, nt(Cs,K,V1,L,R).
nt([search(K,V)|Cs],K1,V1,L,R) :- K<K1 |
    L=[search(K,V)|L1], nt(Cs,K1,V1,L1,R).
nt([search(K,V)|Cs],K1,V1,L,R) :- K>K1 |
    R=[search(K,V)|R1], nt(Cs,K1,V1,L,R1).
nt([update(K,V)|Cs],K,_,L,R) :- true |
    nt(Cs,K,V,L,R).
nt([update(K,V)|Cs],K1,V1,L,R) :- K<K1 |
    L=[update(K,V)|L1], nt(Cs,K1,V1,L1,R).
nt([update(K,V)|Cs],K1,V1,L,R) :- K>K1 |
    R=[update(K,V)|R1], nt(Cs,K1,V1,L,R1).

t([]) :- true | true.
t([search(_,V)|Cs]) :- true |
    V=undefined, t(Cs).
t([update(K,V)|Cs]) :- true |
    nt(Cs,K,V,L,R), t(L), t(R).
Program 1. A GHC program defining
binary search trees as processes
An advantage of using processes for this purpose is
that it allows implementations to exploit parallelism
between operations on the storage. For instance, a
search operation on a binary search tree (Program 1),
given as a message in the interface stream, can enter
the tree soon after the previous operation has passed
the root of the tree. Programmers do not have to worry
about mutual exclusion, which is taken care of by the
implementation. This suggests that the programming
of reconfigurable data structures can be an important
application of concurrent logic languages. (The verbosity of Program 1 is a separate issue which is out of
the scope of this paper.)
Processes as storage are almost always suspending, but should respond quickly when messages are
sent.
However, most implementations of concurrent logic languages have not been tuned for processes with this characteristic. In our earlier paper [Ueda and Morita 1990], we proposed message-oriented scheduling of goals for sequential implementation, which optimizes goals that suspend and resume
frequently. Although our primary goal was to optimize
storage-intensive (or more generally, demand-driven)
programs, the proposed technique worked quite well
also for computation-intensive programs that did not
use one-to-many communication. However, how to utilize the technique in parallel implementation was yet to
be studied.
Parallelization of message-oriented scheduling can
be quite different from parallelization of ordinary,
process-oriented scheduling. An obvious way of parallelizing process-oriented scheduling is to execute different goals on different processors. In message-oriented
scheduling, the basic idea should be to execute different
message sends on different processors, but many problems must be solved as to the mapping of computation
to processors, mutual exclusion, and so on. This paper
reports the initial study on the subject.
The rest of the paper is organized as follows:
Section 2 reviews Moded Flat GHC, the subset of
GHC we are going to implement. Section 3 reviews
message-oriented scheduling for sequential implementation. Section 4 discusses how to parallelize messageoriented scheduling. Of the two possible methods suggested, Section 5 focuses on the shared-goal method
suitable for shared-memory multiprocessors and discusses design issues in more detail. Section 6 shows
the result of preliminary performance evaluation. The
readers are assumed to be familiar with concurrent
logic languages [Shapiro 1989].
2. Moded Flat GHC and Constraint-Based Program Analysis
Moded Flat GHC [Ueda and Morita 1990] is a subset
of GHC that introduces a mode system for the compile-time global analysis of dataflow caused by unification.
Unification executed in clause bodies can cause bidirectional dataflow in general, but mode analysis tries
to guarantee that it is effectively an assignment to an uninstantiated variable and does not fail (except due to occur check).
Our experience with GHC and KL1 [Ueda and
Chikayama 1990] has shown that the full functionality of bidirectional unification is seldom used and that
programs using it can be rewritten rather easily (if not
automatically) to programs using unification as assignment. These languages are indeed used as general-purpose concurrent languages, which means that it is
very important to optimize basic operations such as
unification and to obtain machine codes close to those
obtained from procedural languages.
For global compile-time analysis to be practical,
it is highly desirable that individual program modules can be analyzed separately in such a way that the results can be merged later. The mode system of Moded Flat GHC is thus constraint-based; the mode of a whole program can be determined by accumulating the mode constraints obtained separately from the
syntactic analysis of each program clause. Another advantage of the constraint-based system is that it allows
programmers to declare some of the mode constraints,
in which case the analysis works as mode checking as
well as mode inference.
The modularity of the analysis was brought by the
rather strong assumption of the mode system: whether
the function symbol at some position (possibly deep in
a data structure) of a goal g is determined by g or by other goals running concurrently is determined solely by that position specified by a path, which is defined as follows. Let Pred be the set of predicate symbols and Fun the set of function symbols. For each p ∈ Pred with the arity np, let Np be the set {1, 2, ..., np}. Nf is defined similarly for each f ∈ Fun. Now the sets of paths Pt (for terms) and Pa (for atoms) are defined using disjoint union as:

    Pt = (Σ_{f ∈ Fun} Nf)*,    Pa = (Σ_{p ∈ Pred} Np) × Pt.
An element of Pa can be written as a string (p,i)(f1,j1) ... (fn,jn), that is, it records the predicate and the
function symbols on the way as well as the argument
positions selected. A mode is a function from Pa to
the set {in, out}, which means that it assigns either of
in or out to every possible position of every possible
instance of every possible goal. Whether some position
is in or out can depend on the predicate and function
symbols on the path down to that position. The function can be partial, because the mode values of many
uninteresting positions that will not come to exist can
be left undefined.
Mode analysis checks if every variable generated in
the course of execution will have exactly one out occurrence (occurrence at an out position) that can determine its top-level value, by accumulating constraints
between the mode values of different paths.
Constraint-based analysis can be applied to analyzing other properties of programs as well. For instance,
if we can assume that streams and non-stream data
structures do not occur at the same position of different goals, we can try to classify all the positions into
(1) those whose top-level values are limited to the list
constructors (cons and nil) and
(2) those whose top-level values are limited to symbols
other than the list constructors,
which is the simplest kind of type inference. Other
applications include the static identification of 'single-reference' positions, namely positions whose values are
not read by more than one goal and hence can be
discarded or destructively updated after use. This
could replace the MRB (multiple-reference bit) scheme
[Chikayama and Kimura 1987], a runtime scheme
adopted in current KL1 implementations for the same
purpose.
3. Message-Oriented (Sequential) Implementation
In a process-oriented sequential implementation of concurrent logic languages, goals ready for execution are
put in a queue (or a stack or a deque, depending on
the scheduling). Once a goal is taken from the queue,
it is reduced as many times as possible, using last-call
optimization, until it suspends or it is swapped out. A
suspended goal is hooked on the uninstantiated variable(s) that caused suspension, and when one of the
variables is instantiated, it is put back into the queue.
Message-oriented implementation has much in
common with process-oriented implementation, but
differs in the treatment of stream communication: It
compiles the generation of stream elements into procedure calls to the consumer of the stream. A stream
is an unbounded buffer of messages in principle, but
message-oriented implementation tries to reduce the
overhead of buffering and unbuffering by transferring
control and messages simultaneously to the consumer
whenever possible. To this end, it tries to schedule
goals so that whenever the producer of a stream sends
a message, the consumer is suspending on the stream
and is ready to handle the message. Of course, this
is not always possible because we can write a program
in which a stream must act as a buffer; messages are
buffered when the consumer is not ready to handle incoming messages.
Process-oriented implementation tries to achieve
good performance by reducing the frequency of costly
goal switching and taking advantage of last-call optimization. Message-oriented implementation tries to reduce the cost of each goal switching operation and the
cost of data transfer between goals.
Suppose two goals, p and q, are connected by a
stream s, and p is going to send a message to q that
is suspending on s. Message-oriented implementation
represents s as a two-field communication cell that
points to (1) the instruction in q's code from which the
processing of q is to be resumed and (2) q's goal record
containing its arguments (Fig. 1). (Throughout the paper, we assume that a suspended goal will resume its
execution from the instruction following the one that
caused suspension, not from the first instruction of the
predicate.) To send a message m, p first loads m on
a hardware register called the communication register,
changes the current goal to the one pointed to by the
communication cell of s, and calls the code pointed to
by the communication cell of s. The goal q gets m
from the communication register and may send other
messages in its turn. Control returns to p when all
the message sends caused directly or indirectly by m
have been processed. However, if m is the last message which p can send out immediately (i.e., without waiting for further incoming messages), control need not return to p but can go directly to the goal that has outstanding message sends. This is called last-send optimization, which we shall see in Section 5.4 in more detail.

Fig. 1. Immediate message send

Fig. 2. Buffered message send
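
A rough C sketch of the communication cell and of an immediate send through it (Fig. 1) is given below. The field names and the global variable standing in for the hardware communication register are illustrative assumptions, not the actual implementation.

typedef struct goal_record goal_record;

/* A stream on which the receiver is suspending: the resumption point in the
   receiver's code and the receiver's goal record. For a buffering stream the
   two fields would instead point to buffering code and a buffer descriptor
   (Fig. 2). */
typedef struct comm_cell {
    void (*resume)(goal_record *g);  /* instruction to resume q from */
    goal_record *env;                /* q's goal record               */
} comm_cell;

/* Stand-in for the hardware communication register. */
int comm_reg;

/* p sends message m through stream s: load the register, switch to the
   receiver, and call its resumption code. Control comes back to p once all
   sends caused directly or indirectly by m have been processed. */
void send_message(comm_cell *s, int m) {
    comm_reg = m;
    s->resume(s->env);
}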
We have observed in GHC/KL1 programming that
the dominant form of interprocess communication is
one-to-one stream communication. It therefore deserves special treatment, even though other forms of
communication such as broadcasting and multicasting
become a little more expensive. One-to-many communication is done either by the repeated sending of messages or by using non-stream data structures.
Techniques mentioned in Section 2 are used to analyze which positions of a predicate and which variables
in a program are used for streams and to distinguish
between the sender and the receiver(s) of messages.
When a stream must buffer messages, the communication cell representing the stream points to the code
for buffering and the descriptor of a buffer. The old entries of the communication cell are saved in the descriptor (Fig. 2). In general, a stream must buffer incoming
messages when the receiver goal is not ready to handle them. The following are the possible reasons [Ueda
and Morita 1990]:
Fig. 3. Binary search tree as a process
(1) (selective message receiving) The receiver is waiting for a message from other input streams.
(2) The receiver is suspending on non-stream data
(possibly the contents of messages).
(3) The sender of a message may run ahead of the receiver.
(4) When the receiver r belongs to a circular process
structure, a message m sent by r may possibly arrive at r itself or may cause another message to be
sent back to r. However, unless m has been sent
by last-send optimization, r is not ready to receive
it.
The receiver examines the buffer when the reason
for the buffering disappears, and handles messages (if
any) in it.
Process-oriented implementation often caches (part
of) a goal record on hardware registers, but this should
not be done in message-oriented implementation in
which process switching takes place frequently.
4. Parallelization
How can we exploit parallelism from message-oriented
implementation? Two quite different methods can be
considered:
Distributed-goal method.
Different processors take
charge of different goals, and each processor handles
messages sent to the goals it is taking charge of.
Consider a binary search tree represented using goals
and streams (Fig. 3) and suppose three processors take
charge of the three different portions of the tree. Each
processor performs message-oriented processing within
its own portion, while message transfer between portions is compiled into inter-processor communication
with buffering.
Shared-goal method. All processors share all the goals.
There is a global, output-restricted deque [Knuth 1973]
of outstanding work to be done in parallel, from which
an idle processor gets a new job. The job is usually to
execute a non-unification body goal or to send a message, the latter being the result of compiling a unification body goal involving streams. The message send
will usually cause the reduction of a suspended goal. If
the reduction generates another unification goal that
has been compiled into a message send, it can be performed by the same processor. Thus a chain of message
sends is formed, and different chains of message sends
can be performed in parallel as long as they do not interfere with each other. In the binary tree example, different processors will take care of different operations
sent to the root. A tree operation may cause subsequent message sends inside the tree, but they should
be performed by the same processor because there is
no parallelism within each tree operation.
Unlike the shared-goal method, the distributed-goal method can be applied to distributed-memory
multiprocessors as well as shared-memory ones to
improve the throughput of message handling. On
shared-memory multiprocessors, however, the shared-goal method is more advantageous in terms of latency
(i.e., responses to messages), because (1) it performs no
inter-processor communication within a chain of message sends and (2) good load balancing can be attained
easily. The shared-goal method requires a locking protocol for goals as will be discussed in Section 5.1, but
it enables more tightly-coupled parallel processing that
covers a wider range of applications. Because of its
greater technical interest, the rest of the paper is focused on the shared-goal method.
5. Shared-Goal Implementation
In this section, we discuss important technicalities in
implementing the shared-goal method. We explain the
method and the intermediate code mainly by examples.
Space limitations do not allow the full description of
the implementation, though we had to solve a number
of subtle problems related to concurrency control.
5.1 Locking of Goals
Consider a goal p(Xs, Ys) defined by the following
single clause:
p([A|Xs1],Ys) :- true |
    Ys=[A|Ys1], p(Xs1,Ys1).
In the shared-goal method, different messages in
the input stream Xs may be handled by different processors that share the goal p(Xs, Ys). Any processor
sending a message must therefore try to lock the goal
record (placed in the shared memory) of the receiver
first and obtain the grant of exclusive access to it. The
receiver must remain locked until it sends a message
through Ys and restores the dormant state.
The locking operation is important in the following
respect as well: In message-oriented implementation,
the order of the elements in a stream is not represented
spatially as a list structure but as the chronological order of message sends. The locking protocol must therefore make sure that when two messages, α and β, are sent in this order to p(Xs, Ys), they are sent to the
receiver of Ys in the same order. This is guaranteed by
locking the receiver of Ys before p(Xs, Ys) is unlocked.
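
This ordering rule can be pictured with the following C sketch, in which the receiver of Ys is locked before p's goal record is released. lock_goal, unlock_goal, deliver and the goal layout are illustrative stand-ins, not the actual implementation.

typedef struct goal {
    volatile unsigned char lock;
    /* arguments, resumption point, ... omitted */
} goal;

void lock_goal(goal *g)   { while (__sync_lock_test_and_set(&g->lock, 1)) while (g->lock) ; }
void unlock_goal(goal *g) { __sync_lock_release(&g->lock); }
void deliver(goal *g, int msg) { (void)g; (void)msg; /* run g's resumption code (stub) */ }

/* p is locked and has just taken msg from Xs; pass it on through Ys. */
void forward_through_p(goal *p, goal *receiver_of_Ys, int msg) {
    lock_goal(receiver_of_Ys);     /* lock the next receiver first ...          */
    unlock_goal(p);                /* ... and only then release p, so no other  */
                                   /* processor can overtake us on this stream  */
    deliver(receiver_of_Ys, msg);  /* stays locked until it is dormant again    */
}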
5.2 Busy Wait vs. Suspension
How should a processor trying to send a message wait
until the receiver goal is unlocked? The two extreme
possibilities are (1) to spin (busy-wait) until unlocked
and (2) to give up (suspend) the sending immediately
and do some other work, leaving a notice to the receiver
that it has a message to receive. We must take the
following observations into account here:
(a) The time each reduction takes, namely the time required for a resumed goal to restore the dormant
state, is usually short (several tens of CISC instructions, say), though it can be considerably long
sometimes.
(b) As explained in Section 5.1, a processor may lock
more than one goal temporarily upon reduction.
This means that busy wait may cause deadlock
when goals and streams form a circular structure.
Because busy wait incurs much smaller overhead
than suspension, Observation (a) suggests that the processor should spin for a period of time within which
most goals can perform one reduction. However, it
should suspend finally because of (b).
Upon suspension, a buffer is prepared as in Fig. 2,
and the unsent message is put in it. Subsequent messages go to the buffer until the receiver has processed
all the messages in the buffer and has removed the
buffer. As is evident from Fig. 2, no overhead is incurred to check if the message is going to the buffer
or to the receiver. The receiver could notice the existence of outstanding messages by checking its input
streams upon each reduction, but it incurs overhead to
(normal) programs which do not require buffering. So
we have chosen to avoid this overhead by letting the
sender spawn and schedule a special routine, called the
retransmitter of the messages, when it creates a buffer.
The retransmitter is executed asynchronously with the
receiver. When executed, it tests if the receiver has
been unlocked, in which case it sends the first message
in the buffer and re-schedules itself.
For the shared resources other than goals (such as
logic variables and the global deque), mutual exclusion should be attained by busy wait, because access to
them takes a short period of time. On the other hand,
synchronization on the values of non-stream variables
(due to the semantics of GHC) should be implemented
using suspension as usual.
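
A hypothetical sketch of this bounded-spin policy in C follows; SPIN_LIMIT, try_lock and the buffering helpers are stand-ins chosen for illustration, not the actual runtime interface.

#define SPIN_LIMIT 100   /* roughly the cost of one typical reduction */

int try_lock(volatile unsigned char *l)     { return __sync_lock_test_and_set(l, 1) == 0; }
void buffer_put(void *receiver, int msg)    { (void)receiver; (void)msg; /* stub */ }
void schedule_retransmitter(void *receiver) { (void)receiver;            /* stub */ }

/* Spin briefly on the receiver's lock; if that fails, buffer the message and
   schedule a retransmitter so the sender can move on to other work. */
void send_with_bounded_spin(void *receiver, volatile unsigned char *lock, int msg) {
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (try_lock(lock)) {
            /* deliver msg here; the receiver stays locked until it has
               restored the dormant state */
            return;
        }
    }
    buffer_put(receiver, msg);         /* give up: suspend instead of spinning */
    schedule_retransmitter(receiver);
}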
5.3 Scheduling
Shared-goal implementation exploits parallelism between different chains of message sends that do not
interfere with each other. For instance, a binary search
tree (Fig. 3) can process different operations on it in
a pipelined manner, as long as there is no dependence
between the operations (e.g., the key of a search operation depending on the result of the previous search
operation). When there is dependency, however, parallel execution can even lower the performance because
of synchronization overhead.
Another example for which parallelism does not
help is a demand-driven generator of prime numbers
which is made up of cascaded goals for filtering out
the multiples of prime numbers. The topmost goal receiving a new demand from outside filters out the multiples of the prime computed in response to the last
demand. However, until the last demand has almost
been processed, the topmost goal doesn't know what
prime's multiples should be filtered out, and hence will
be blocked.
These considerations suggest that in order to avoid
ineffective parallelism, it is most realistic to let programmers specify which chains of message sends should
be done in parallel with others. The simple method we
are using currently is to have (1) a global deque for the
work to be executed in parallel by idle processors and
(2) one local stack for each processor for the work to be
executed sequentially by the current processor. Each
processor obtains a job from the global deque when its
local stack is empty. We use a global deque rather than
a global stack because, if the retransmitter of a buffer
fails to send a message, it must go to the tail of the
deque so it may not be retried soon.
Each job in a stack/deque is uniformly represented as a pair (code, env), where code is the job's entry/resumption point and env is its environment. The
job is usually to start the execution of a goal or to resume the execution of a clause body. In these cases, env
points to the goal record on which code should work.
When the job is to retransmit buffered messages, env
points to the communication cell pointing to the buffer.
When a clause body has several message sends to
be executed in parallel, they will not be put in the deque
separately. Instead, the current processor executing
the clause body performs the first send (and any sends
caused by that send), putting the rest of the work to
the deque after the first send succeeds in locking the
receiver. Then an idle processor will get the rest of
the work and perform the second message send (and
any sends caused by that send), putting the rest of the
rest back to the deque. This procedure is to guarantee
the order of messages sent through a single stream by
different processors. Suppose two messages, α and β, are sent by a goal like Xs = [α, β | Xs1]. Then we have to make sure that the processor trying to send β will not lock the receiver of Xs before the processor trying to send α has done so.
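
The (code, env) job representation and the local-stack-first dispatch described earlier in this section might be sketched in C as follows; the container sizes and names are illustrative simplifications of the actual runtime.

typedef struct { void (*code)(void *env); void *env; } job;   /* (code, env) pair */

#define STACK_MAX 1024
typedef struct { job items[STACK_MAX]; int top; } local_stack;

#define DEQUE_MAX 4096
typedef struct {
    job items[DEQUE_MAX];
    int head, tail;                    /* jobs may be added at the head or the tail */
    volatile unsigned char lock;       /* held only briefly, so busy wait           */
} global_deque;

static int pop_local(local_stack *s, job *out) {
    if (s->top == 0) return 0;
    *out = s->items[--s->top];
    return 1;
}

static int take_global(global_deque *d, job *out) {
    int ok = 0;
    while (__sync_lock_test_and_set(&d->lock, 1)) while (d->lock) ;
    if (d->head != d->tail) {
        *out = d->items[d->head];
        d->head = (d->head + 1) % DEQUE_MAX;
        ok = 1;
    }
    __sync_lock_release(&d->lock);
    return ok;
}

/* A processor's dispatch loop: local work first, then work shared for
   parallel execution. */
static void run(local_stack *s, global_deque *d) {
    job j;
    while (pop_local(s, &j) || take_global(d, &j))
        j.code(j.env);
    /* the real system would look for termination or more work here */
}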
5.4 Reduction
This section outlines what a typical goal should do during one reduction, where by 'typical' we mean goals
that can be reduced by receiving one message. As an
example, consider the distributor of messages defined
as follows,
p([A|Xs],Ys,Zs) :- true |
    Ys=[A|Ys1], Zs=[A|Zs1], p(Xs,Ys1,Zs1).
where we assume A is known, by program analysis or
declaration, to be a non-stream datum. (Otherwise
a somewhat more complex procedure is necessary, because the three occurrences of A will be used for one-to-two communication.) The intermediate code for the above
program is:
entry(p/3)
    rcv_value(A1)
    get_cr(A4)
    send_call(A2)
    put_cr(A4)
    send_call(A3)  }
    execute        }  or send_jmp(A3)
The Ai's are entries of the goal record of the goal
being executed, which contain the arguments of the
goal and temporary variables. Other programs may use
Xi's, which are (possibly virtual) general registers local
to each processor, and GAi's, which are the arguments
of a new goal being created. The label entry(p/3)
indicates the initial entry point of the predicate p with
three arguments.
The instruction rcv_value(A1) waits for a message from the input stream on the first argument. If
messages are already buffered, it takes the first one and
puts it on the communication register. A retransmitter
of the buffer is put on the deque if more messages exist; otherwise the buffer is made to disappear (Section
5.7). If no messages are buffered, which is expected to
be most probable, rcv_value unlocks the goal record,
and suspends until a message arrives. In either case,
the instruction records the address of the next instruction in the communication cell (or, if the communication cell points to a buffer, in the buffer descriptor).
The goal is usually suspending at this instruction.
The instruction get_cr(A4) saves into the goal
record the message in the communication register,
which the previous rcv_value(A1) has received. Then send_call(A2) sends the message in the communication register through the second stream. The instruction send_call(A2) tries to lock the receiver of the
second stream and if successful, transfers control to
the receiver. If the receiver is busy for a certain period of time or it isn't busy but is not ready to handle
the message, the message is buffered. The instruction
send_call does not unlock the current goal record.
When control eventually returns, put_cr(A4) restores
the communication register and send_call(A3) sends
the next message.
When control returns again, execute performs the recursive call by going back to the entry point of the predicate p. Then the rcv_value(A1) instruction will either find no buffered messages or find some. In the former case, rcv_value(A1) obviously suspends. In the latter case, a retransmitter of the buffer must have been scheduled, and so rcv_value(A1) can suspend until the retransmitter sends a message. Moreover, the resumption address of the rcv_value(A1) instruction has been recorded by its previous execution. Thus in either case, execute effectively does nothing but unlock the current goal. This is why last-send optimization can replace the last two instructions with a single instruction, send_jmp(A3).
The instruction send_jmp(A3) locks the receiver of
the third stream, unlocks the current goal, and transfers control to the receiver without stacking the return
address. Last-send optimization enables the current
goal to receive the next message earlier and allows the
pipelined processing of message sends. Note that with
last-send optimization, the rcv_value(A1) instruction will be executed only once, when the goal starts execution. The instructions executed for each incoming message are those from get_cr(A4) through send_jmp(A3).
The above instruction sequence performs the two
message sends sequentially. However, a variant of
send_call called send_fork stacks the return address
on the global deque instead of the local stack, allowing
the continuation to be processed in parallel. Note that
send_fork leaves the continuation to another processor rather than the message send itself for the reason
explained in Section 5.3.
We have established a code generation scheme for
general cases including the spawning and the termination of goals (Section 5.5), explicit control of message buffering (Section 5.6), and suspension on non-stream variables. Several optimization techniques have
been developed as well, for instance for goals whose
input streams are known to carry messages of limited forms (e.g., non-root nodes of a binary search
tree (Fig. 3)). Finally, we note that although processoriented scheduling and message-oriented scheduling
differ in the flow of control, they are quite compatible in the sense that an implementation can use both
in running a single program. Our experimental implementation has actually been made by modifying a
process-oriented implementation.
5.5 An Example
Here we give the intermediate code of a naive reverse
The program:

(1) nreverse([H|T],O) :- true | append(O1,[H],O), nreverse(T,O1).
(2) nreverse([],   O) :- true | O=[].
(3) append([I|J],K,L) :- true | L=[I|M], append(J,K,M).
(4) append([],   K,L) :- true | K=L.

entry(nreverse/2)
  rcv_value(A1)              receive a message from the 1st arg
                             (the program is usually waiting for incoming messages here)
  check_not_eos(101)         if the message is eos then collect the current comm. cell and goto 101
  get_cr(X3)                 save the message H in the comm. reg. to the register of the current PE
  commit                     Clause 1 is selected (no operation)
  put_cc(X4)                 create a comm. cell with a buffer
  push_value(X3)             put the message H into the buffer
  push_eos                   put eos into the buffer
  g_setup(append/3,3)        create a goal record for 3 args and record the name
  put_value(A2,GA3)          set the 3rd arg of append to O
  put_value(X4,GA2)          set the 2nd arg of append to [H]
  put_com_variable(A2,GA1)   create a locked variable O1 and set the 2nd arg of nreverse and the
                             1st arg of append to the pointer to O1,
                             assuming that append will turn O1 into a comm. cell soon
  g_call                     execute append until it suspends
  return                     unlock the current goal and do the job on the local stack top
label(101)
  commit                     Clause 2 is selected (no operation)
  send_call(A2)              send eos in the comm. reg. to the receiver of O
  proceed                    deallocate the goal record and return

entry(append/3)
  deref(A3)                  dereference the 3rd arg L
  rcv_value(A1)              receive a message from the 1st arg
  check_not_eos(102)         if the message is eos then collect the current comm. cell and goto 102
  commit                     Clause 3 is selected (no operation)
  sendn_jmp(A3)              send the received message to the receiver of L, where
                             'n' means that the instruction assumes that L has been dereferenced
label(102)
  commit                     Clause 4 is selected (no operation)
  send_unify_jmp(A2,A3)      make sure that messages sent through K are
                             forwarded to the receiver of L, and return

Fig. 4. Intermediate code for naive reverse
program (Fig. 4). In order for the code to be almost
self-explanatory, some comments are appropriate here.
Suppose the messages m_1, ..., m_n are sent to the goal nreverse(In, Out) through In, followed by the eos (end-of-stream) message indicating that the stream is closed. The nreverse goal generates one suspended append goal for each m_i, creating the structure in Fig. 5. The i-th append has as its second argument a buffer with two messages, m_i and eos. The final eos message to nreverse causes the second clause to forward the eos to the most recent append goal, holding m_n. The append holding m_n, in response, lets different (if available) processors send the two buffered messages m_n and eos to the append holding m_{n-1}. The message m_n is transferred all the way to the append holding m_1 and appears in Out. The following eos causes the next append goal to send m_{n-1} and another eos.
The performance of nreverse hinges on how fast
each append goal can transfer messages. For each incoming message, an append goal checks if the message
is not eos and then transfers both the message and control to the receiver of the output stream. The message
remains on the communication register and need not
be loaded or stored.
The send_unify_jmp(r1,r2) instruction is used for the unification of two streams. Arrangements are made so that the next time a message is sent through r1, the sender is made to point directly to the communication cell of r2. If the stream r1 has a buffer (which is the case with nreverse), the above redirection is made to happen after all the contents of the buffer are sent to the receiver of r2.
It is worth noting that the multiway merging of
streams can transfer messages as efficiently as append.
Fig. 5. Process structure being created by nreverse([m_1, ..., m_n], Out)
5.6 Buffering
As discussed in Section 5.2, the producer of a stream s
creates a buffer when the receiver is locked for a long
time. However, this is a rather unusual situation; a
buffer is usually created by s's receiver when it remains
unready to handle incoming messages after it has unlocked itself. Here we re-examine the four reasons for buffering given in Section 3:
(1) Selective message receiving. This happens, for instance, in a program that merges two sorted streams
of integers into a single sorted stream:
omerge([A|X1],[B|Y1],Z) :- A<B |
    Z=[A|Z1], omerge(X1,[B|Y1],Z1).
omerge([A|X1],[B|Y1],Z) :- A>=B |
    Z=[B|Z1], omerge([A|X1],Y1,Z1).
Two numbers, one from each input stream, are necessary for a reduction. Suppose the first number A arrives through the first stream. Then the goal omerge
checks if the second stream has a buffered value. Since
it doesn't, the goal cannot be reduced. So it records
A in the goal record and changes the first stream to a
buffer, because it has to wait for another number B to
come through the second stream. Suppose B (> A) arrives and the first clause is selected. Then the second stream should become a buffer and B will be put back.
The first stream, now being a buffer, is checked and a
retransmitter is stacked if it contains an element; otherwise the buffer is made to disappear. Finally A is sent to
the receiver of the third stream. The above procedure
is admittedly complex, but this program is indeed one
of the hardest ones to execute in a message-oriented
manner. A simpler example of selective message receiving appears in the append program in Section 5.5;
its second input stream buffers messages until the non-recursive clause is selected.
(2) Suspension on non-stream data. The most likely
case is suspension on the content of a message (e.g.,
the first argument of an update message to a binary
search tree). When a goal receives from a stream s
a message that is not sufficiently instantiated for reduction, it changes s to a buffer and puts the message
back to it. A retransmitter is hooked on the uninstantiated variable(s) that caused suspension, which will be
invoked when any of them are instantiated.
(3) The sender of a stream running ahead of the receiver. It is not always possible to guarantee that the
sender of a stream does not send a message before the
receiver commences execution, though the scheduling
policy tries to avoid such a situation. The simplest solution to this problem is to initialize each stream to an
empty buffer. However, creating and collecting a buffer
incurs certain overhead, while a buffer created for the
above reason will receive no messages in most cases. So
the current scheme defers the creation of a real buffer
until a message is sent. Moreover, when the message is
guaranteed to be received soon, the put_com_variable
instruction (Fig. 4) is generated and lets the sender
busy-wait until the receiver executes rcv_value.
(4) Circular process structure. When the receiver sends
more than one message in response to an incoming
message, sequential implementation must buffer subsequent incoming messages until the last message is sent
out. In parallel implementation, the same effect is automatically achieved by the lock of the goal record, and
hence the explicit control of buffering is not necessary.
The retransmission of a buffer created due to the
reason (1) or (3) is explicitly controlled by the receiver.
When a buffer is created due to the reason (2) or by
the sender of a stream, a retransmitter of the buffer is
scheduled asynchronously with the receiver.
5.7 Mutual Exclusion of Communication Cells
The two fields of a communication cell representing a
stream may be updated both by the sender and the
receiver of the stream. For instance, the sender may
create a buffer and connect it to the cell when the receiver is locked for a certain period of time. The receiver may set or update the cell by the rcv_value
instruction, may create or remove a buffer for the cell
when buffering becomes necessary or unnecessary, may
execute send_unify_jmp and connect the stream to
another, and may move or delete the goal record of its
own.
This of course calls for some method of mutual exclusion for communication cells. The simplest solution
would be to lock a communication cell whenever updating or reading it, but locking both a goal record
and a communication cell for each message send would
be too costly. It is highly desirable that an ordinary
message send, which reads but does not update a communication cell, need not lock the communication cell.
However, without locking upon reading, the following sequence can happen and inconsistency arises:
(1) the sender follows the pointer in the second field
(the environment) of the communication cell,
(2) the receiver starts and completes the updating of
the communication cell (under an appropriate locking protocol), and then
(3) the sender locks the (wrong) record r (the goal record for the receiver or a buffer for the communication cell) obtained in Step (1) and calls the code pointed to by the first field (the code) of the updated communication cell.

Table 1. Performance Evaluation (in seconds)

Language       Processing          binary process tree        naive reverse
                                   (5000 operations)          (1000 elements)
                                   (search)    (update)
GHC            1 PE (no locking)   1.25        1.83           2.23  (225 kRPS*)
GHC            1 PE                1.38        2.10           3.27  (154 kRPS)
GHC            2 PEs               0.78        1.15           2.43  (207 kRPS)
GHC            3 PEs               0.55        0.81           1.71  (294 kRPS)
GHC            4 PEs               0.44        0.63           1.33  (377 kRPS)
GHC            5 PEs               0.36        0.53           1.10  (456 kRPS)
GHC            6 PEs               0.33        0.46           0.96  (523 kRPS)
GHC            7 PEs               0.33        0.39           0.85  (591 kRPS)
GHC            8 PEs               0.33        0.36           0.77  (652 kRPS)
C (recursion)  cc -O               0.71        0.72
C (iteration)  cc -O               0.32        0.35

(* kilo Reductions Per Second)
This can be avoided by not letting the receiver update the second field of the communication cell. The
receiver instead stores into the record r the pointer p
to the right record. The receiver accordingly sets the
first field of the communication cell to the pointer to a
code sequence (to be called by the sender in Step (3))
that notifies the sender of the existence of the pointer
p.
The sender can now access the right record pointed
to by p via the wrong record r, but it is still desirable
that p is finally written into the second field of the communication cell so that the right record can be accessed
directly next time. This update of the communication
cell must be done before the sender is unlocked and the
control is completely transferred to the receiver.
For this purpose, we take advantage of the fact that
the 1-byte lock of a record can take states other than
'locked' and 'unlocked'. When the lock of a record has
one of these other states, a special routine corresponding to that state runs before the goal record of the
sender is unlocked. This feature is being used for updating the second field of a communication cell safely.
6. An Experimental System and Its Performance
We have almost finished the initial version of the
abstract machine instruction set for the shared-goal
method. An experimental runtime system for performance evaluation has been developed on Sequent
Symmetry, a shared-memory parallel computer with
20MHz 80386's. The system is written in an assembly language and C, and the abstract machine instructions are expanded into native codes automatically by
a loader. A compiler from Moded Flat GHC to the
intermediate code is yet to be developed.
The current system employs a simple scheme of
parallel execution as described in Section 5.3. When
the system runs with more than one processor, one
of them acts as a master processor and the others as
slaves. They act in the same manner while the global
deque is non-empty. When the master fails to obtain a
new job from the deque, it tries to detect termination
and exceptions such as stack overflow. The current system does not care about perpetually suspended goals;
they are treated just like garbage cells in Lisp. A slight
overhead of counting the number of goals in the system will be necessary to detect perpetually suspended
goals [Inamura and Onishi 1990] and/or to feature the
shoen construct of KL1 [Ueda and Chikayama 1990],
but it should scarcely affect the result of performance
evaluation described below.
Locking of shared resources, namely logic variables,
goal records, communication cells, the global deque,
etc., is done using the xchg (exchange) instruction as
usual.
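As a rough illustration of such xchg-based locking (our own sketch, not the actual runtime code; the 1-byte lock word and the GCC-style atomic builtins standing in for hand-written 80386 assembly are assumptions):

typedef volatile unsigned char lock_t;    /* 1-byte lock word, as described above */

/* Acquire: atomically exchange 1 into the lock word; a nonzero old value
   means somebody else holds the lock, so spin until it looks free again. */
static void lock_acquire(lock_t *l)
{
    while (__sync_lock_test_and_set(l, 1))
        while (*l)
            ;                              /* spin on plain reads to limit bus traffic */
}

/* Release: store 0 back into the lock word. */
static void lock_release(lock_t *l)
{
    __sync_lock_release(l);
}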
Using Program 1, we measured (1) the processing
time of 5000 update operations with random keys given
to an empty binary tree and (2) the processing time
of 5000 search operations (with the same sequence of
keys) to the resulting tree with 4777 nodes. The number of processors was changed from 1 to 8. For the one-processor case, a version without locking/unlocking operations was tested as well. The numbers include the
execution time of the driver that sends messages to the
tree. The result was compared with two versions of (sequential) C programs using records and pointers, one
using recursion and the other using iteration. The performance of nreverse (Fig. 4) was measured as well.
The results are shown in Table 1.
The results show good (if not ideal) parallel
speedup, though for search operations on a binary
tree, the performance is finally bounded by the sequen-
tial nature of the driver and the root node. Access
contention on the global deque can be another cause
of overhead. Note, however, that the two examples are
indeed harder to execute in parallel than running independent processes in parallel, because different chains
of message sends share goals. Note also that the binary
tree with 4777 nodes is not very deep.
The binary tree program run with 4 processors outperformed the optimized recursive C program. The iterative C program was more than twice as fast as the recursive one and was comparable to the GHC program run with 8 processors. The comparison, however, would have been more favorable to parallel GHC if a larger tree had been used.
The overhead of locking/unlocking was about 30%
in nreverse and about 10% in the binary tree program. Since nreverse is one of the fastest programs
in terms of the kRPS value, we can conclude that the
overhead of locking/unlocking is reasonably small on
average even if we lock such small entities as individual goals.
As for space efficiency, the essential difference between our implementation and C implementations is
that GHC goal records have pointers to input streams
while C records do not consume memory by being
pointed to. The difference comes from the expressive
power of streams; unlike pointers, streams can be unified together and can buffer messages implicitly.
One may suspect that message-oriented implementation suffers from poor locality in general. This is true
for data locality, because a single message chain can
visit many goals. However, streams in process-oriented
implementation cannot enjoy very good locality either,
because a tail-recursive goal can generate a long list of
messages. Both process-oriented and message-oriented
implementations enjoy good instruction locality for the
binary tree program and nreverse.
Comparison of performance between a messageoriented implementation and a process-oriented implementation was reported in [Ueda and Morita 1990] for
the one-processor case.
7. Conclusions and Future Works
The main contribution of this paper is that messageoriented implementation of Moded Flat GHC was
shown to benefit from small-grain, tightly-coupled parallelism on shared-memory multiprocessors. Furthermore, the result of preliminary evaluation shows that
the absolute performance is good enough to be compared with procedural programs.
These results suggest that the programming of reconfigurable storage structures that allow concurrent
access can be a realistic application of Moded Flat
GHC. Programmers need not worry about mutual exclusion necessitated by parallelization, because it is
achieved automatically at the implementation level. In
procedural languages, parallelization may well require
major rewriting of programs. To our knowledge, how to
deal with reconfigurable storage structures efficiently in
non-procedural languages without side effects has not
been studied in depth.
We have not yet fully studied language constructs
and their implementation for more minute control over
parallel execution. The current scheme for the control
of parallelism is a simple extension to the sequential
system; it worked well for the benchmark programs
used, but will not be powerful enough to be able to tune
the performance of large programs. We need a notion
of priority that should be somewhat different from the
priority construct in KL1 designed for process-oriented
parallel execution. The notion of fairness may have to
be reconsidered also. KL1 provides the shoen (manor)
construct as well, which is the unit of execution control,
exception handling and resource consumption control.
How to adapt the shoen construct to message-oriented
implementation is another research topic.
Acknowledgments
The authors are indebted to the anonymous referees
for helpful comments.
References
[Chikayama and Kimura 1987] T. Chikayama and
Y. Kimura, Multiple Reference Management in
Flat GHC. In Proc. 4th Int. Conf. on Logic Programming, MIT Press, 1987, pp. 276-293.
[Inamura and Onishi 1990] Y. Inamura and S. Onishi,
A Detection Algorithm of Perpetual Suspension in
KL1. In Proc. Seventh Int. Conf. on Logic Programming, MIT Press, 1990, pp. 18-30.
[Knuth 1973] D. E. Knuth, The Art of Computer
Programming, Vol. 1 (2nd ed.). Addison-Wesley,
Reading, MA, 1973.
[Shapiro 1989] Shapiro, E., The Family of Concurrent
Logic Programming Languages. Computing Surveys, Vol. 21, No.3 (1989), pp. 413-510.
[Ueda and Morita 1990] K. Ueda and M. Morita, A
New Implementation Technique for Flat GHC. In
Proc. Seventh Int. Conf. on Logic Programming,
MIT Press, 1990, pp. 3-17. A revised, extended
version to appear in New Generation Computing.
[Ueda and Chikayama 1990] K. Ueda and T. Chikayama,
Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, Vol. 33,
No.6 (Dec., 1990), pp. 494-500.
Towards an Efficient Compile-Time
Granularity Analysis Algorithm
X. Zhong, E. Tick, S. Duvvuru,
L. Hansen, A. V. S. Sastry and R. Sundararajan
Dept. of Computer Science
University of Oregon
Eugene, OR 97403
Abstract
We present a new granularity analysis scheme for concurrent logic programs. The main idea is that, instead of
trying to estimate costs of goals precisely, we provide a
compile-time analysis method which can efficiently and
precisely estimate relative costs of active goals given the
cost of a goal at runtime. This is achieved by estimating the cost relationship between an active goal and its
subgoals at compile time, based on the call graph of the
program. Iteration parameters are introduced to handle
recursive procedures. We show that the method accurately estimates cost, for some simple benchmark programs. Compared with methods in the literature, our
scheme has several advantages: it is applicable to any
program, it gives a more precise cost estimation than
static methods, and it has lighter runtime overheads
than absolute estimation methods.
1  Introduction
The importance of grain sizes of tasks in a parallel computation has been well recognized [6, 5, 7]. In practice,
the overhead to execute small grain tasks in parallel may
well offset the speedup gained. Therefore, it is important to estimate the costs of the execution of tasks so
that at runtime, tasks can be scheduled to execute sequentially or in parallel to achieve the maximal speedup.
Granularity analysis can be done at compile time or
runtime or even both [7]. The compile-time approach estimates costs by statically analyzing program structure.
The program is partitioned statically and the partitioning scheme is independent of runtime parameters. Costs
of most tasks, however, are not known until parameters
are instantiated at runtime and therefore, the compiletime approach may result in inaccurate estimates. The
runtime approach, on the other hand, delays the cost
estimation until execution and can therefore make more
accurate estimates. However, the overhead to estimate
costs is usually too large to achieve efficient speedup,
and therefore the approach is usually infeasible. The
most promising approach is to try to get as much cost
estimation information as possible at compile time and
make the overhead of runtime scheduling very slight.
Such an approach has been taken by Tick [10], Debray et al. [2], and King and Soper [4]. In this paper, we adopt
this strategy.
A method for the granularity analysis of concurrent
logic programs is proposed. Although the method can
be well applied to other languages, such as functional
languages, in this paper, we discuss the method only
in the context of concurrent logic programs. The key
observation behind this method is that task spawning
in many concurrent logic program language implementations, such as Flat Guarded Horn Clauses (FGHC)
[13], depends only on the relative costs of tasks. If
the compile-time analysis can provide simple and precise cost relationships between an active goal and its
subgoals, then th~ runtime scheduler can efficiently estimate the costs of the subgoals based on the cost of
the active goal. The method achieves this by estimating, at compile time, the cost relationship based on the
call graph and the introduction of iteration parameters.
We show that for common benchmark programs, the
method gives correct estimates.
2  Motivations
Compile-time granularity analysis is difficult because
most of the information needed, such as size of a data
structure and the number of loop iterations, is not known
until runtime. Sarkar [7] used a profiling method to
get the frequency of recursive and nonrecursive function
calls for a functional language. His method is simple
and does not have runtime overheads, but can give only
a rough estimate of the actual granularity.
In the logic programming community, Tick [10] first
proposed a method to estimate weights of procedures
by analyzing the call graph of a program. The method,
as refined by Debray [1], derives the call graph of the
program, and then combines procedures which are mutually recursive with each other into a single cluster
(i.e., a strongly connected component in the call graph).
Thus the call graph is converted into an acyclic graph.
Procedures in a cluster are assigned the same weight
which is the sum of the weights of the cluster's children
(the weights of leaf nodes are one, by definition). This
method has very low runtime overhead; however, goal
weights are estimated statically and thus cannot capture the dynamic change of weights at runtime. This
problem is especially severe for recursive (or mutually
recursive) procedures.
As an example of the method, consider the naive-reverse procedure in Figure 1. (The clauses in the nrev/2
program do not have guards, i.e., only head unification
is responsible for commit.) Examining the call graph,
we find that the algorithm assigns a weight of one to
append/3 (it is a leaf), and a weight of two to nrev/2
(one plus the weight of its child). Such weights are associated with every procedure invocation and thus cannot
accurately reflect execution time.
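For concreteness, a bottom-up weighting of the cluster DAG might be sketched as follows (our illustration; the graph encoding is hypothetical, and we read the rule as in the nrev example above: a leaf cluster weighs one, any other cluster one plus the sum of its children's weights):

#include <stdio.h>

#define MAXC 32

static int nchild[MAXC];          /* number of child clusters           */
static int child[MAXC][MAXC];     /* child cluster indices              */
static int weight[MAXC];          /* 0 means "not computed yet"         */

static int cluster_weight(int c)
{
    if (weight[c] != 0) return weight[c];
    int w = 1;                                   /* the cluster itself  */
    for (int i = 0; i < nchild[c]; i++)
        w += cluster_weight(child[c][i]);        /* plus its children   */
    return weight[c] = w;
}

int main(void)
{
    /* toy DAG: cluster 0 = nrev/2, cluster 1 = append/3 (a leaf) */
    child[0][0] = 1; nchild[0] = 1;
    printf("append/3 weight = %d\n", cluster_weight(1));  /* prints 1 */
    printf("nrev/2   weight = %d\n", cluster_weight(0));  /* prints 2 */
    return 0;
}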
Debray et al. [2] presented a compile-time method
to derive costs of predicates. The cost of a predicate is
assumed to depend solely on its input argument sizes.
Relationships between input and output argument sizes
in predicates are first derived based on so-called data dependency graphs and then recurrence equations of cost
functions of predicates are set up. These equations are
then solved at compile time to derive closed forms (functions) for the cost of predicates and their input argument
sizes, together with the closed forms (functions) between
the output and input argument sizes. Such cost and argument size functions can be evaluated at runtime to
estimate costs of goals. A similar approach was also
proposed by King and Soper [4]. Such approaches represent a trend toward precise estimation. For nrev/2,
Debray's method gives Cost_nrev(n) = 0.5n^2 + 1.5n + 1,
where n is the size of the input argument. This function
can then be inserted into the runtime scheduler. Whenever nrev/2 is invoked, the cost function is evaluated,
which obviously requires the value n, the size of its first
argument. If the cost is bigger than some preselected
overhead threshold, the goal is executed in parallel; otherwise, it is executed sequentially.
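A runtime test of this kind might look like the sketch below; the cost function is the closed form quoted above for nrev/2, while the threshold value and the spawn/execute entry points are placeholders of our own, not part of the cited systems.

#include <stdio.h>

#define THRESHOLD 100.0                    /* preselected overhead threshold (assumed) */

static double cost_nrev(double n)          /* Cost_nrev(n) = 0.5 n^2 + 1.5 n + 1 */
{
    return 0.5 * n * n + 1.5 * n + 1.0;
}

/* Stand-ins for the scheduler entry points. */
static void spawn_parallel(void)   { puts("spawn as a parallel task"); }
static void run_sequentially(void) { puts("execute in place"); }

static void schedule_nrev(int list_length)
{
    if (cost_nrev(list_length) > THRESHOLD)
        spawn_parallel();                  /* grain is large enough to pay for a task   */
    else
        run_sequentially();                /* too small: task overhead would dominate   */
}

int main(void)
{
    schedule_nrev(5);     /* cost 21: executed sequentially  */
    schedule_nrev(50);    /* cost 1326: spawned in parallel  */
    return 0;
}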
The method described suffers from several drawbacks
(see [11] for further discussion). First, there may be
considerable runtime overhead to keep track of argument sizes, which are essential for the cost estimation
at runtime. Furthermore, the sizes of the initial input
arguments have to be given by users or estimated by the
program when the program begins to execute. Second,
within the umbrella of argument sizes, different metrics
may be used, e.g., list length, term depth, and the value
of an integer argument. It is unclear (from [2, 4]) how to
correctly choose metrics which are relevant for a given
predicate. Third, the resultant recurrence equations for
size relationships and cost relationships can be fairly
complicated.
It is therefore worth remedying the drawbacks of the
above two approaches. It is also clear that there is a
tradeoff between precise estimation and runtime overhead. In fact, Tick's approach and Debray's approach
represent two extremes in the granularity estimation
spectrum. Our intention here is to design a middle-of-the-spectrum method: fairly accurate estimation, applicable to any procedure, without incurring too much
runtime overhead.
3  Overview of the Approach
We argue here, as in our earlier work, that it is sufficient
to estimate only relative costs of goals. This is especially
true for an on-demand runtime scheduler [8]. Therefore,
it is important to capture the cost changes of a subgoal
and a goal, but not necessarily the "absolute" granularity. Obviously the costs of subgoals of a parent goal are
always less than the cost of the parent goal, and the sum
of costs of the subgoals (plus some constant overhead)
is equal to the cost of the parent goal. The challenging
problem here is how to distribute the cost of the parent
goal to its subgoals properly, especially for a recursive
call. For instance, consider the naive reverse procedure
nrev/2 again. Suppose goal nrev([1,2,3,4],R) is invoked (i.e., clause two is invoked) and the cost of this query is given; what are the costs of nrev([2,3,4],R1) and append(R1,[1],R)?
It is clear that the correct cost distribution depends on the runtime state of the program. For example, the percentage of cost distributed to nrev([1,2,3,4],R) (i.e., as one of the subgoals of nrev([1,2,3,4,5],T)) will be different from that of the cost distributed to nrev([1,2],R). To capture the runtime state, we introduce
an iteration parameter to model the runtime state, and
we associate an iteration parameter with every active
goal. Since the cost of a goal depends solely on its entry runtime state, its cost is a function of its iteration
parameter. Several intuitive heuristics are used to capture the relations between the iteration parameter of a
parent goal and those of its children goals. To have a
simple and efficient algorithm, only the AND/OR call
graph of the program, which is slightly different from
the standard call graph, is considered to obtain these
iteration relationships. Such relations are then used in
the derivation of recurrence equations of cost functions
of an active goal and its subgoals. The recurrence equations are derived simply based on the above observation,
i.e., the cost of an active goal is equal to the summation
of the costs of its subgoals.
We then proceed to solve these recurrence equations
for cost functions bottom up, first for the leaf nodes
of the modified AND/OR call graph, which can be obtained in a similar way in Tick's modified algorithm by
clustering those mutually recursive nodes together in the
AND/OR call graph of the program (see Section 2). After we obtain all the cost functions, cost distribution
functions are derived as follows. Suppose the cost of an
active goal is given, we first solve for its iteration parameter based on the cost function derived. Once the iteration parameter is solved, costs of its subgoals, which are
functions of their iteration parameters, can be derived
based on the assumption that these iteration parameters
have relationships with the iteration parameter of their
parent, which are given by the heuristics. This gives the
cost distribution functions desired for the subgoals.
To recap, our compile-time granularity analysis procedure consists of the following steps:

1. Form the call graph of the program and cluster mutually recursive nodes of the modified AND/OR call graph.

2. Associate each procedure (node) in the call graph with an iteration parameter and use heuristics to derive the iteration parameter relations.

3. Form recurrence equations for the cost functions of goals and subgoals.

4. Proceed bottom up in the modified AND/OR call graph to derive cost functions.

5. Solve for iteration parameters and then derive cost distribution functions for each predicate.
4  Deriving Cost Relationships

4.1  Cost Functions and Recurrence Equations
To derive the cost relationships for a program, we use a graph G (called an AND/OR call graph) to capture the program structure. Formally, G is a triple (N, E, A), where N is a set of procedures denoted as {p1, p2, ..., pn} and E is a set of node pairs such that (p1, p2) is in E if and only if p2 appears as one of the subgoals in one of the clauses of p1. Notice that there might be multiple edges (p1, p2) because p1 might call p2 in multiple clauses. A is a partition of the multiple-edge set E such that (p1, p2) and (p1, p3) are in one element of A if and only if p2 and p3 are in the body of the same clause whose head is p1. Intuitively, A denotes which subgoals are AND processes. After applying A to the edges leaving a node, the edges are partitioned into clusters which correspond to clauses, and these clauses are themselves OR processes. Figure 2 shows an example, where the OR branches are labeled with a bar, and AND branches are unmarked. Leaf facts (terminal clauses) are denoted as empty nodes.
As in [1], we modify G so that we can cluster all those recursive and mutually recursive procedures together and form a directed acyclic graph (DAG). This is achieved by traversing G and finding all strongly-connected components. In this traversal, the difference between AND and OR nodes is immaterial, and we simply discard the partition A. A procedure is recursive if and only if the procedure is in a strongly-connected component. After nodes are clustered into strongly-connected components in G, we form a DAG G', whose nodes are those strongly-connected components of G and whose edges are simply the collections of the edges in G. This step can be accomplished by an efficient algorithm proposed by Tarjan [9].
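A compact sketch of this clustering step is given below (our illustration, not code from the paper): Tarjan's algorithm labels every procedure with the strongly-connected component it belongs to, so that mutually recursive procedures share one cluster of G'. The adjacency-array representation and the toy qsort/split/append graph are assumptions.

#include <stdio.h>

#define MAXP 64                       /* maximum number of procedures (assumed) */

static int nsucc[MAXP];               /* number of callees of each procedure */
static int succ[MAXP][MAXP];          /* callee lists (the edge set E)       */
static int idx[MAXP], low[MAXP], comp[MAXP];
static int stk[MAXP], on_stk[MAXP];
static int counter = 1, sp = 0, ncomp = 0;

static void visit(int p)
{
    idx[p] = low[p] = counter++;
    stk[sp++] = p; on_stk[p] = 1;
    for (int i = 0; i < nsucc[p]; i++) {
        int q = succ[p][i];
        if (idx[q] == 0) {                       /* unvisited callee: recurse */
            visit(q);
            if (low[q] < low[p]) low[p] = low[q];
        } else if (on_stk[q] && idx[q] < low[p]) {
            low[p] = idx[q];                     /* edge back into the current SCC */
        }
    }
    if (low[p] == idx[p]) {                      /* p is the root of an SCC: pop it */
        int q;
        do {
            q = stk[--sp]; on_stk[q] = 0;
            comp[q] = ncomp;                     /* procedures in one SCC share a cluster */
        } while (q != p);
        ncomp++;
    }
}

int main(void)
{
    /* toy call graph: 0 = qsort, 1 = split, 2 = append (hypothetical numbering) */
    int n = 3;
    succ[0][0] = 1; succ[0][1] = 0; succ[0][2] = 2; nsucc[0] = 3;  /* qsort calls split, qsort, append */
    succ[1][0] = 1; nsucc[1] = 1;                                   /* split calls split */
    nsucc[2] = 0;                                                   /* append is a leaf  */
    for (int p = 0; p < n; p++)
        if (idx[p] == 0) visit(p);
    for (int p = 0; p < n; p++)
        printf("procedure %d -> cluster %d\n", p, comp[p]);
    return 0;
}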
The cost of an active goal P is determined by two
factors: its entry runtime state s during the program
execution and the structure of the program. We use
an integer n, called the iteration parameter, to approximately represent state s. Intuitively, n can be viewed as
an encoding of a program runtime state. Formally, let
S be the set of program runtime states, M be a mapping from S to the set of natural numbers N such that
M(s) = n for s E S. It is easy to see that the cost of
P is a function of its iteration parameter n. It is also
clear that the iteration parameter of a subgoal of P is
a function of n. Hereafter, suppose p_ij is the jth subgoal in the ith clause of p. We use I_ij(n) to represent
the iteration parameter of Pij. The problem of how to
determine function Iij will be discussed in Section 4.2.
To model the structure of the program, we use the
AND/OR call graph G as an approximation. In other
words, we ignore the attributes of the data, such as size
and dependencies. We first derive recurrence equations
of cost functions between a procedure P and its subgoals
by looking at G. Let Costp (n) denote the cost of p.
Three cases arise in this derivation:
Case 1: P is a leaf node of G' which is nonrecursive. This includes cases where P
is a built-in predicate. In this case, we simply assign a constant c as Costp (n). c is the
cost to execute p. For instance such cost can
be chosen as the number of machine instructions in p.
For the next two cases, we consider a non-leaf node p with clauses C1, ..., Ck (OR processes). Let the cost of each clause be Cost_Cj(n) for 1 <= j <= k. We now distinguish whether or not p is recursive.

Case 2: p is not recursive and not mutually recursive with any other procedures. We can easily see that
    Cost_p(n) <= sum_{j=1..k} Cost_Cj(n).        (1)

Conservatively, we approximate Cost_p(n) as the right-hand side of the above inequality. Notice that in a committed-choice language, the summation in the above inequality can be changed to the maximum (i.e., max) function. However, this increases the difficulty of the algebraic manipulation of the resultant recurrence equations (see [11] for example) and we prefer to use the summation as an approximation.

Case 3: p is recursive or mutually recursive. In this case, we must be careful in the approximation, since minor changes in the recurrence equations can give rise to very different estimation. This can be seen for split in the qsort example in Section 2.

To be more precise, we first observe that some clauses are the "boundary clauses," that is, they serve as the termination of the recursion. The other clauses, whose bodies have some goals which are mutually recursive with p, are the only clauses which will be effective for the recursion. Without loss of generality, we assume for j > u, Cj are all those "mutually recursive" clauses. For a nonzero iteration parameter n (i.e., n > 0), we take the average costs of these clauses as an approximation:

    Cost_p(n) = ( sum_{j=u+1..k} Cost_Cj(n) ) / (k - u),        (2)

and for n = 0, we take the sum of the costs of those "boundary clauses" as the boundary condition of Cost_p(n):

    Cost_p(0) = sum_{j=1..u} Cost_Cj(0).

The above estimation only gives the relations between the cost of p and those of its clauses. The cost of clause Cj can be estimated as

    Cost_Cj(n) = C_Headj + sum_{m=1..nj} Cost_pjm(Ijm(n))        (3)

where C_Headj is a constant denoting the cost for head unification of clause Cj, nj is the number of body goals of Cj, and Ijm(n) is the iteration parameter for the mth body goal. Substituting Equation 3 back into Equation 1 or 2 gives us the recurrence equations for cost functions of predicates.

4.2  Iteration Parameters

There are several intuitions behind the introduction of the iteration parameter. As we mentioned above, iteration parameter n represents an encoding of a program runtime state as a positive integer. In fact, this type of encoding has been used extensively in program verification, e.g., [3], especially in the proof of loop termination. A loop C terminates if and only if it is possible to choose a function M which always maps the runtime states of C to nonnegative integers such that M monotonically decreases for each iteration of C. Such an encoding also makes it possible to solve the problem that once the cost of an active goal is given, its iteration parameter can be obtained. This parameter can be used to derive costs of its subgoals (provided the iteration-parameter functions Ijm are given), which in turn give the cost distribution functions.

Admittedly, the encoding of program states may be fairly complicated. Hence, to precisely determine the iteration-parameter functions for subgoals will be complicated too. In fact, this problem is statically undecidable since this is as complicated as to precisely determine the program runtime behavior at compile time. Fortunately, in practice, most programs exhibit regular control structures that can be captured by some intuitive heuristics.

To determine the iteration-parameter functions, we first observe that there is a simple conservative rule: for a recursive body goal p, when it recursively calls itself back again, the iteration parameter must have been decreased by one (if the recursion terminates). This is similar to the loop termination argument. Therefore, as an approximation, we can use I_im(n) = n - 1 as a conservative estimation for a subgoal p_im which happens to be p itself (self-recursive). Other heuristics are listed as follows:

§1. For a body goal p_im whose predicate only occurs in the body once and is not mutually recursive with p (i.e., not in a strongly-connected component with p), I_im(n) = n.

§2. If p_im is mutually recursive with p and its predicate only occurs once in the body, I_im(n) = n - 1.

§3. If p_im is mutually recursive with p and its predicate occurs l times in the body, where l > 1, I_im(n) = n / l (this is integer division, i.e., the floor function).

The intuitions behind these heuristics are simple. Heuristic §1 represents the case where a goal does not invoke its parent. In almost all programs, this goal will process information supplied by the parent, thus the
iteration parameter remains unmodified. Heuristic §2 is
based on the previous conservative principle. Heuristic
§3 is based on the intuition that the iteration is divided
evenly for multiple callees. Notice for the situation in
heuristic §3, we can also use our conservative principle.
However, we avoid use of the conservative principle, if
possible, because the resultant estimation of Cost p( n)
may be an exponential function of n, which, for most
practical programs, is not correct.
These heuristics have been derived from experimentation with a number of programs, placing a premium
on the simplicity of I (n). A partial summary of these
results is given in Section 6. A remaining goal of future
research is to further justify these heuristics with larger
programs, and derive alternatives.
4.3  An Example: Quicksort
After we have determined the iteration-parameter functions, we have a system of recurrence equations for cost
functions. This system of recurrence equations can be
solved in a bottom-up manner in the modified graph G'.
The problem of systematically solving these recurrence
equations in general is discussed in [11]. Here, we consider a complete example for the qsort/2 program given
in Figure 2.
The boundary condition for Cost_qsort(n) is that Cost_qsort(0) is equal to the constant execution cost d1 of qsort/2 clause one. The following recurrence equations are derived:

    Cost_qsort(0) = d1
    Cost_qsort(n) = Cost_C2(n)

With Heuristic §3, we have

    Cost_C2 = d2 + Cost_split(n) + 2 Cost_qsort(n/2)

where d2 is the constant cost for the head unification of the second clause of qsort/2.

Similarly, the recurrence equations for Cost_split(n) are

    Cost_split(0) = d3
    Cost_split(n) = (Cost_C2 + Cost_C3) / 2

Furthermore,

    Cost_C2 = Cost_C3 = d4 + Cost_split(n - 1)

where d4 is the constant cost for the head unification of the second (and the third) clause of split. We first solve the recurrence equations for split, which is in the lower level in G', and then solve the recurrence equations for qsort. This gives us Cost_split(n) = d3 + d4 n, which can be approximated as d4 n, and Cost_qsort(n) = d1 + d2 log n + d4 n log n, which is the well known average complexity of qsort.
Finally, it should be noted that it is necessary to distinguish between the recursive and nonrecursive clauses
here and take the average of the recursive clause costs
as an approximation. If we simply take the summation
of all clause costs together as the approximation of the
cost function, both cost functions for split and qsort
would be exponential, which are not correct. More precisely, if the summation of all costs of clauses of split
is taken as Costsplit(n), we will have
Costsplit(n)
=
d3 + 2(d4 + Costsplit(n - 1))
The solution of Costsplit (n) is an exponential function,
which is not correct.
5  Distributing Costs
So far, we have derived cost functions of the iteration
parameter for each procedure. However, to know the
cost of a procedure, we need to first know the value of
its iteration parameter. This, as pointed out in our introduction, may require too much overhead. We notice
that, in most scheduling policies (such as on-demand
scheduling), only relative costs are needed. This can be
relatively easily achieved in our theory since cost functions only have a single parameter (iteration parameter).
To derive cost distributing formulae for a given procedure and its body goals, the first step is to solve for the iteration parameter n in Equation 3, assuming that Cost_p(n) is given at runtime as Cp. Assuming that clause i is invoked at runtime, we approximate Cost_Ci(n) as Cp and solve Equation 3 for n. Let n = F(Cp) be the symbolic solution, which depends on the runtime value of Cost_p(n) (i.e., Cp). We can then easily derive the costs of the subgoals of clause i by simply substituting n with F(Cp) in Cost_pim(Iim(n)), which gives rise to the cost distributing functions we need to derive at compile time.
Let's reconsider the nrev/2 procedure. The cost equations are derived as follows:

    Cost_nrev(n)   = Cost_nrev(n - 1) + Cost_append(n)
    Cost_nrev(0)   = C1
    Cost_append(n) = Cost_append(n - 1) + Ca
    Cost_append(0) = C2

We can easily derive the closed forms for these two cost functions as Cost_append(n) = n x Ca + C2, which can be approximated as Ca x n, and Cost_nrev(n) = Ca x n^2 / 2. Now, given Cost_nrev(n) as Cn, we solve for n and have n = sqrt(2 Cn / Ca). Hence, we have Cost_nrev(n - 1) = Ca (sqrt(2 Cn / Ca) - 1)^2 / 2 and Cost_append(n) = Ca sqrt(2 Cn / Ca). These are the desired cost distributing functions.
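Evaluating these cost distributing functions at runtime is cheap; a sketch of the evaluation for nrev/2 is shown below, with an arbitrary value assumed for the constant Ca.

#include <math.h>
#include <stdio.h>

#define CA 1.0                                         /* assumed per-call cost constant */

static void distribute_nrev(double parent_cost,
                            double *nrev_child, double *append_child)
{
    double n = sqrt(2.0 * parent_cost / CA);           /* n = sqrt(2*Cn/Ca)           */
    *nrev_child   = CA * (n - 1.0) * (n - 1.0) / 2.0;  /* Cost_nrev(n-1)              */
    *append_child = CA * n;                            /* Cost_append(n) ~ Ca * n     */
}

int main(void)
{
    double nrev_cost, append_cost;
    distribute_nrev(1250.0, &nrev_cost, &append_cost); /* e.g. nrev of a 50-element list */
    printf("nrev child: %.1f, append child: %.1f\n", nrev_cost, append_cost);
    return 0;
}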
It should be pointed out that in some cases it is
not necessary to first derive the cost functions and then
derive the cost distributing functions since we can simply derive the cost distributing scheme directly from the
cost recurrence equations. For example, consider the Fibonacci function, where the cost equations are
    Cost_fib(n) = Cf + 2 x Cost_fib(n/2)
    Cost_fib(0) = C1

Without actually deriving the cost function Cost_fib(n), we can simply derive the cost distributing relationship from the first equation as Cost_fib(n/2) = (Cost_fib(n) - Cf) / 2.
Also note that at compile time, the cost distributing functions should be simplified as much as possible
to reduce the runtime overhead. It is even worthwhile
sacrificing precision to get a simpler function. Therefore, a conservative approach should be used to derive
the upper bound of the cost functions. In fact, we can
further simplify the cost function derived in the following way. If the cost function is of a polynomial form
such as c0 n^k + c1 n^(k-1) + ... + ck, we simplify it as k c0 n^k, and if the cost function has several exponential components such as c1 a^n + c2 b^n where b > a, we simplify it as (c1 + c2) b^n. This will simplify the solution of the
iteration parameter and the cost distributing function
and hence simplify the evaluation of them at runtime.
5.1  Runtime Goal Management
The above cost relationship estimation is well suited
for a runtime scheduler which adopts an on-demand
scheduling policy (e.g., [8]), where PEs maintain a local queue for active goals and once a PE becomes idle,
it requests a goal from other PEs. A simple way to
distribute a goal to a requesting PE is to migrate an
active goal in the queue. The scheduler should adopt
a policy to decide which goal is going to be sent. It is
obvious that the candidate goal should have the maximal grain size among those goals in the queue. Hence,
we can use a priority queue where weights of goals are
their grain sizes (or costs). The priority is that the bigger the costs are, the higher priority they get. Because
the scheduler only needs to know the relative costs, we
can always assume the weight of the initial goal is some
fixed, big-enough number. Based on this initial cost and
the cost distributing formulae derived at compile time,
every time a new clause is invoked, the scheduler derives
the relative costs of body goals. The body goals are then
enqueued into the priority queue based on their costs.
Some bookkeeping problems arise from this approach.
First, even though we can simplify the cost distributing
functions at compile time to some extent, the runtime
overhead may still be large, since for each procedure
invocation, the scheduler has to calculate the weights
of the body goals. One solution to this problem is to
let the scheduler keep track of a modulo counter and
when the content of the counter is not zero, the scheduler simply lets the costs of the body goals be the same
as that of their parent. Once the content of the counter
becomes zero, the cost-distributing functions are used.
If we can choose an appropriate counting period, this
method is reasonable (one counter increment has less
overhead than the evaluation of the cost estimate).
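The modulo-counter bookkeeping can be sketched as follows (hypothetical names throughout; the priority queue itself is elided and represented by enqueue_goal): only every PERIOD-th reduction pays for evaluating the compile-time cost distributing function, otherwise a body goal simply inherits its parent's cost.

#include <stdio.h>

#define PERIOD 4                           /* assumed counting period */

static unsigned reduction_counter = 0;

/* Placeholder for insertion into the priority queue of active goals. */
static void enqueue_goal(const char *goal, double cost)
{
    printf("enqueue %s with weight %.1f\n", goal, cost);
}

static void spawn_body_goal(const char *goal, double parent_cost,
                            double (*distribute)(double))
{
    double cost;
    if (++reduction_counter % PERIOD == 0)
        cost = distribute(parent_cost);    /* evaluate the compile-time formula        */
    else
        cost = parent_cost;                /* cheap default: inherit the parent's cost */
    enqueue_goal(goal, cost);
}

/* Toy distributing function: give the child half of the parent's cost. */
static double halve(double c) { return c / 2.0; }

int main(void)
{
    for (int i = 0; i < PERIOD; i++)
        spawn_body_goal("child", 100.0, halve);
    return 0;
}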
Another problem in this approach is that for long-running programs, costs may become negative, i.e., the
initial weight is not large enough. Since we require only
relative costs, a solution is to reset all costs (including those in the queue, and in suspended goals), when
some cost becomes too small. Cost resetting requires
the incremental overhead of testing to determine when
to reset.
As stated above, we need to choose the initial cost as
big as possible. However, this can introduce an anomaly
for our relative cost scheme. To see this, consider the
nrev example again. Suppose that the initial query is nrev([1,...,50]). The correct query cost is approximately 50 x 50 = 2500. The correct cost of its immediate append goal is approximately 49, and the correct cost of one of its leaf descendant goals nrev([]) is one (the head unification cost). If we choose the initial cost as a big number, say 10^6, then the corresponding iteration parameter is 10^3. This will give the cost of nrev([]) as (10^3 - 50)^2, which is bigger than the estimated cost of the initial append goal (only around 10^3). In other
words, this gives an incorrect relationship between goals
near the very top and near the very bottom of the proof
tree.
For this particular example, the problem could be finessed by precomputing the "correct" initial value of the
iteration parameter: exactly equal to the weight of the
query. However, in general, a correct initial estimation
is not always possible, and when it is possible, its computation incurs too much overhead. All compile-time
granularity estimation schemes must make this tradeoff. Fortunately, in our scheme, the problem is not as
serious as it first appears. For initial goals with sufficiently large cost, our scheme is still able to give correct
relative cost estimation for sufficiently large goals which
are not close to leaves of the execution call graph. This
can be seen in the nrev example, where the relative costs
among nrev([2,...,50]) through nrev([42,...,50]),
and the initial append are still correct in our scheme.
Correct estimation for the large goals (those near the
root of the proof tree) is more important than that for
small goals (those near the leaves) because the load balance of the system is largely dependent on those big
goals, and so is performance.
6  Empirical Results: Justifying the Heuristics

We applied our three heuristics and the cost estimation formulae to two classes of programs. The first class includes nine widely used benchmark programs [12], containing 32 procedures. The second class consists of 111 procedures comprising the front-end of the Monaco FGHC compiler. The results are summarized in Table 1 and Table 2. For each heuristic, the tables show the number of procedures for which the heuristic is applicable (by the syntactic rules given in Section 4.2), and the number for which the heuristic correctly estimates complexity. The row labeled "all" gives the total number of procedures analyzed. Since more than one heuristic may be applicable in a single procedure, the total number of procedures may be less than the sum of the previous rows.

    Heuristic    Applicable    Correct    Percentage
    §1               24           21        87.5%
    §2               29           26        89.6%
    §3                4            2        50.0%
    all              32           27        84.7%

    Table 1: Statistics for Benchmark Programs

    Heuristic    Applicable    Correct    Percentage
    §1               64           57        89.1%
    §2               49           55        87.3%
    §3                6            4        66.7%
    all             111          101        91.0%

    Table 2: Statistics for a Compiler Front End

From the tables, we see that §1 and §2 apply most frequently. This indicates that most procedures are linear recursive (i.e., have a single recursive body goal) which can be estimated correctly by our scheme. The relatively low percentage of §3 correctness is because the benchmarks are biased towards procedures with exponential time complexity, whereas §3 usually gives polynomial time complexity.

Analysis of the benchmarks indicated two major anomalies in the heuristics. Although §1 may apply, a procedure may distribute a little work (say, the head of a list) to one body goal and the rest of the work (say, the tail of the list) to another goal. This cannot be captured by §1, which essentially treats the head and tail of the list as equal, i.e., a binary tree. A correct cost analysis needs to explore the data structures of the program.

For recursive procedures, §3 can capture only the fixed-degree divide & conquer programming paradigm. However, the compiler benchmark contained procedures which recursively traverse a list (or vector) and the degree of the divide & conquer dynamically depends on the number of top-level elements in the list (or vector). In this situation, the procedure may have to loop on the top level while recursively traversing down for each element (which may be tree structures). Again, this presents inherent difficulty for our scheme because we take the call graph as the sole input information for the program to be analyzed.

To summarize, our statistics show that our scheme achieves a fairly high percentage of correct estimation. However, we need to apply multiply-recursive heuristics §2 and §3 with more finesse. Further quantitative performance studies of the algorithm's utility are presented in Tick and Zhong [11]. Those multiprocessor simulation results quantify the advantage of dynamically scheduling tasks with the granularity information.

7  Conclusions and Future Work
We have proposed a new method to estimate the relative
costs of procedure execution for a concurrent language.
The method is similar to Tick's static scheme [10], but
gives a more accurate estimation and reflects runtime
weight changes. This is achieved by the introduction of an iteration parameter which is used to model recursions.
Our method is based on the idea that it is not the
absolute cost, but rather the relative cost that matters
for an on-demand goal scheduling policy. Our method
is also amenable to implementation. First, our method
can be applied to any program. Second, the resultant
recurrence equations can be solved systematically. In
comparison, it is unclear how to fully mechanically implement the schemes proposed in [2, 4]. Nonetheless,
our method may result in an inaccurate estimation for
some cases. This is because we use only the call graph to
model the program structure, not the data. We admit
that further static analysis of program structure such as
argument-size relationships can give more precise estimations.
Future work in granularity analysis includes the development of a more systematic and precise method to
solve the derived recurrence equations. It is also necessary to examine this method for more practical programs, performing benchmark testing on a multiprocessor to show the utility of the method.
Acknowledgements
E. Tick was supported by an NSF Presidential Young Investigator award, with funding from Sequent Computer
Systems Inc. The authors wish to thank S. Debray and
the anonymous referees for their helpful criticism.
REFERENCES
[1] S. K. Debray. A Remark on Tick's Algorithm for
Compile-Time Granularity Analysis. Logic Programming Newsletter, 3(1):9-10, 1989.
[2] S. K. Debray, N.-W. Lin, and M. Hermenegildo.
Task Granularity Analysis in Logic Programs. In
SIGPLAN Conference on Programming Language
Design and Implementation, pages 174-188. ACM
Press, June 1990.
[3] D. Gries. Science of Programming. Springer-Verlag, 1989.
[4] A. King and P. Soper. Granularity Control for Concurrent Logic Programs. In International Computer
Conference, Turkey, 1990.
[5] B. Kruatrachue and T. Lewis. Grain Size Determination for Parallel Processing. IEEE Software,
pages 23-32, January 1988.
[6] C. McGreary and H. Gill. Automatic Determination of Grain Size for Efficient Parallel Processing.
Communications of the ACM, 32:1073-1078, 1989.
[7] V. Sarkar. Partitioning and Scheduling Parallel
Programs for Execution on .Multiprocessors. MIT
Press, Cambridge MA, 1989.
[8] M. Sato and A. Goto. Evaluation of the KL1 Parallel System on a Shared Memory Multiprocessor. In
IFIP Working Conference on Parallel Processing,
pages 305-318. Pisa, North Holland, May 1988.
[9] R. E. Tarjan. Data Structures and Network Algorithms, volume 44 of Regional Conference Series in
Applied Mathematics. Society for Industrial and
Applied Mathematics, Philadelphia PA, 1983.
[10] E. Tick. Compile-Time Granularity Analysis of
Parallel Logic Programming Languages. New Generation Computing, 7(2):325-337, January 1990.
[11] E. Tick and X. Zhong. A Compile-Time Granularity Analysis Algorithm and its Performance Evaluation. Journal of Parallel and Distributed Computing, submitted to special issue.
[12] E. Tick. Parallel Logic Programming. MIT Press,
Cambridge MA, 1991.
[13] K. Ueda. Guarded Horn Clauses. In E.Y. Shapiro,
editor, Concurrent Prolog: Collected Papers, volume 1, pages 140-156. MIT Press, Cambridge MA,
1987.
nrev( [] ,R) : - R= [] .
nrev([HIT],R) :- nrev(T,R1), append(R1,[H],R).
append([],L,A) ;- A=L.
append ( [H IT] ,L,A) : - A= [H IA1], append(T ,L ,A1) ·back to nrev
back to append
Figure 1: Naive Reverse and its Call Graph
qsort([], S) :- S=[].
qsort([MIT],S) :split(T,M,S,L),
qsort(S,SS),
qsort(L,LS),
append(SS,LS,S).
spli t ( [] ,
M, s, L) : - S= [], L= [] .
split([HIT],M,S,L) :- H < M I
S=[HITS] , split(T,M,TS,L).
split([HIT],M,S,L) :- H >= M I
L=[HITL], split(T,M,S,TL).
back to qsort
back to split
Figure 2: Quick Sort: FGHC Source Code and the AND/OR Call Graph
Providing Iteration and Concurrency in Logic Programs
through Bounded Quantifications
Jonas Barklund and Hakan Millroth, UP MAIL
Computing Science Dept., Uppsala University,
Box 520, S-751 20 Uppsala, Sweden
E-mail: jonas@csd.uu.se or hakanm@csd.uu.se
Abstract
Programs operating on inductively defined data structures, such as lists, are naturally defined by recursive
programs. Millroth has recently shown how many such
programs can be transformed or compiled to iterative
programs operating on arrays. The transformed programs can be run more efficiently than the original programs, particularly on parallel computers.
The paper proposes the introduction of 'bounded
quantifications' in logic programming languages. These
formulas offer a natural way to express programs operating on arrays and other 'indexable' data structures.
'Bounded quantifications' are similar to 'array comprehensions' in functional languages such as Haskell. They
are inherently concurrent and can be run efficiently on
sequential computers as well as on various classes of parallel computers.
1  PROCESSING DATA STRUCTURES
There are two principal ways of building a data structure in a logic program.

A1. Use a recursive relation which defines explicitly the contents of a finite part of the data structure and then uses itself recursively to define the rest of the data structure.

B1. Express directly the contents of each element of the data structure, preferably through an 'indexing' of the elements of the data structure.

Correspondingly there are two principal ways of traversing a data structure in a logic program.

A2. Use a recursive relation which examines explicitly the contents of a finite part of the data structure and then uses itself recursively to traverse the rest of the data structure.

B2. Access directly the contents of each element of the data structure, preferably through an 'indexing' of the elements of the data structure.

(There is, of course, an obvious duality between these operations.)

Method A is often natural when one uses inductively defined data structures, including lists, trees, etc. Method B is often natural when one uses data structures whose elements can be indexed. Some data structures, most importantly lists, fall in both categories and which method is most natural depends on the context.

2  RECURSION

We can broadly classify recursive programs in 'conjunctive' and 'disjunctive' programs (some are a mixture). The former category use recursion to compute a conjunction, like the following lessall program.¹

    lessall(A, [B|X]) ← A < B ∧ lessall(A, X).
    lessall(A, []).

A formula lessall(A, [B1, B2, ..., Bn]) reduces to the finite conjunction

    A < B1 ∧ A < B2 ∧ ... ∧ A < Bn

which could be expressed more briefly as

    ∀i{1 ≤ i ≤ n → A < Bi}.

This reduction can be performed at compile time, except that the value of n is the length of the list actually supplied to the program. Such a program can be run efficiently as an iteration on a sequential computer.

The latter category uses recursion to compute a disjunction, for example the member program.

    member(A, [B|X]) ← A = B.
    member(A, [B|X]) ← member(A, X).

A formula member(A, [B1, B2, ..., Bn]) reduces to the finite disjunction

    A = B1 ∨ A = B2 ∨ ... ∨ A = Bn

¹ Our language consists (initially) of clauses whose bodies may contain conjunctions, disjunctions and negations. We assume "Herbrand" equality except for arithmetic expressions and array elements. All examples can be easily translated into Prolog or Gödel (Hill & Lloyd, 1991).
which could, in turn, be expressed more briefly as

    ∃i{1 ≤ i ≤ n ∧ A = Bi}
which can, similarly, be run efficiently. Millroth's compilation method (1990, 1991), based on Tarnlund's Reform inference system (1992) transforms 'conjunctive'
and 'disjunctive' recursive programs to the iterative programs above.
2.1  Concurrency
The conjunction, or disjunction, in a logic program can
be interpreted as a concurrent operator, as in ANDparallel and OR-parallel logic programming systems.
This does not yield sufficient concurrency for running recursive programs efficiently on parallel computers. Even
using a concurrent connective, work is only initiated on
one 'recursion level' in each step. This implies a linear
run time which can be approximated by an expression
An + B (where A is the overhead for each recursion level,
n is the recursion depth and B is the time spent in each
recursion level). The number of literals in a recursive
clause is typically much smaller than the depth of the
recursion. For recursive programs with simple bodies,
such as lessall or member, the An term will always dominate. Only for small recursion depths and complex bodies
will the B term be significant.
Recursive programs transformed by Millroth's method
have a much larger potential to run efficiently on parallel
computers. The iterative programs can be run in parallel
on n processors unless prohibited by data dependencies
etc. Techniques for parallelizing this kind of iterations
have been developed for, and applied to, FORTRAN programs for some time.
3  EXPLICIT QUANTIFICATION

It is possible to build arrays and other indexable data structures, or express relations over them using recursive programs. It is often more natural to use a universal or existential quantification over the members of the data structure.

We may express the lessall relation over arrays as

    lessall(A, X) ← ∀B∀I{X[I] = B → A < B},

provided that the value of the expression X[I] is the Ith element of the array X.

We may express reversal of the elements in an array:

    reverse(X1, X2) ←
        size(0, X1, L) ∧ size(0, X2, L) ∧
        ∀A∀I{X1[I] = A → X2[L - I - 1] = A}.

(Our notation assumes that the expression L - I - 1 is evaluated and replaced by its value. We also assume that array indices are zero based. Finally, we let size(D, X, S) express that the size of the array X in dimension D is S.)

We may express one generation of Conway's game of Life:

    step(G1, G2) ←
        size(0, G1, S0) ∧ size(0, Q, S0) ∧ size(0, G2, S0) ∧
        size(1, G1, S1) ∧ size(1, Q, S1) ∧ size(1, G2, S1) ∧
        ∀I∀J{Q[I, J] = G1[I - 1 mod S0, J - 1 mod S1] +
                       G1[I - 1 mod S0, J] +
                       G1[I - 1 mod S0, J + 1 mod S1] +
                       G1[I, J - 1 mod S1] +
                       G1[I, J + 1 mod S1] +
                       G1[I + 1 mod S0, J - 1 mod S1] +
                       G1[I + 1 mod S0, J] +
                       G1[I + 1 mod S0, J + 1 mod S1] →
            (Q[I, J] < 2 ∧ G2[I, J] = 0 ∨
             Q[I, J] = 2 ∧ G2[I, J] = 1 ∨
             Q[I, J] = 3 ∧ G2[I, J] = 1 ∨
             Q[I, J] > 3 ∧ G2[I, J] = 0)}.

We can also present a simple example of the use of explicit existential quantifiers. The problem is to find the position I in an array X of some element which is smaller than a given value A.

    small(I, X, A) ← ∃J{X[J] = B ∧ B < A ∧ J = I}.

In all these examples we have quantified over the elements of an indexable data structure. There are other useful relations which can be expressed naturally in this way, and run efficiently. Specifically we want to include all quantifications over the elements of a finite set, whose members are 'obvious'. Below we will be somewhat more precise about what this means.

4  BOUNDED QUANTIFICATION

Consider those universally quantified formulas which are instances of the schema

    ∀x{Θ[x] → Φ[x]}

where Θ is a formula which is "obviously" true for only a finite number of values of x, denoted by, say, {c0, c1, ..., ck-1}. In this case the quantification is clearly equivalent to the finite conjunction

    (Θ[c0] → Φ[c0]) ∧
    (Θ[c1] → Φ[c1]) ∧ ... ∧
    (Θ[ck-1] → Φ[ck-1])

which is, by the definition of Θ, equivalent to

    Φ[c0] ∧ Φ[c1] ∧ ... ∧ Φ[ck-1].

Similarly, a formula which is an instance of the schema ∃x{Θ[x] ∧ Φ[x]} is under the same assumptions equivalent to

    Φ[c0] ∨ Φ[c1] ∨ ... ∨ Φ[ck-1].

We propose to

1. identify a set of formulas which always are true for only a finite number of objects, we call them range formulas,
2. make a system which recognizes those instances of the schema above where Θ is a range formula, we call them bounded quantifications, and

3. interpret bounded quantifications concurrently. The conjuncts obtained from a bounded quantification may be run in any order, even simultaneously, provided that any data dependencies (arising, e.g., from numerical expressions) are satisfied.

Since a range formula is required to hold for a finite number of objects, it is possible to enumerate them (as we have indeed done above with {c0, c1, ..., ck-1}). It will become apparent from examples below that it is very useful to have range formulas relate each object with a unique integer in {0, 1, ..., k - 1}.
In the following sections we will first identify a few useful range formulas and then show how to run bounded
quantifications efficiently on sequential and parallel computers.
5  RANGE FORMULAS
The following is an incomplete set of interesting range
formulas.
5.1  Array and "structure" elements
As we have seen, it is useful to quantify over all elements of a data structure. In an array, each element is associated with a unique integer in the range, say, {0, 1, ..., n}. We could, for example, let X[I] = E (where X is an array, I is a variable and E is a term) be a range formula, and the lessall and reverse programs above are examples of its use. It may be difficult to write a compiler which recognizes precisely this use of an equality as a range formula. One solution would be to predefine, say, the predicate symbol elt by

    elt(I, X, E) ← X[I] = E.

and only recognize predications of the form elt(·,·,·) as range formulas.
5.2  Integer ranges
An obviously useful range formula would be one which
is true for the first k integers ([0, k - 1]). Again, the
formula
0 ≤ X ∧ X < K expresses exactly that relation, but for practical reasons it may be wise to define the binary predication cardinal(X, K) to stand for the binary relation which is true whenever 0 ≤ X < K. Note that the enumeration in this case coincides with the objects themselves.

Note, moreover, that it is trivial to obtain a range formula which is true for all integers in an arbitrary range [I, J] using the binary cardinal predicate.
5.3  Enumerable types
A logic programming language with types is likely to
contain "enumerable" types, for example, finite sets of
distinct constants. One may wish to consider any predication, whose predicate symbol coincides with the name
of such a type, a range relation. For example, suppose
that colour is a type with the elements spades, hearts,
clubs, and diamonds (in that order). Then colour(I, X)
is a range formula which is true if and only if I is 0 and X is spades, I is 1 and X is hearts, I is 2 and X is clubs, or I is 3 and X is diamonds.
Note that in this view an enumerable type of K elements is isomorphic with the integer range [0, K - 1], so it does not really add anything to the language as such.²
5.4  List elements and list suffixes
Lists are usually operated upon by recursively defined
programs. Still, there are occasionally reasons for expressing programs through bounded quantifications. We
propose two range formulas involving lists. The first associates every element of some list with its (zero-based)
position in the list. The second enumerates every (not
necessarily proper) suffix of some list (with the list itself
being suffix 0). We propose to recognize the predication
member(I, L, X) as a range formula which is true if and
only if X is the Ith element of the list L.
The predication suffix(I, L, X) is a range formula
which is true if and only if X is the I th suffix of the
list L. Note that if the length of L is K and [] denotes
an empty list, then suffix(0, L, L) and suffix(K, L, []) are
true formulas. (Since Prolog has no occur check, a programmer in that language could apply these predicates
to cyclic "terms". We leave the behaviour in such a case
undefined. )
5.5  Finite sets
Given that finite sets are provided as a data structure it
would make sense to have range formulas for sets (e.g.,
membership), as has been suggested by Omodeo (personal communication). This is an interesting proposal,
but it is difficult to represent arbitrary sets efficiently in
a way that allows the elements to be enumerated. Multisets (bags) are easier to implement, but these are, on the
other hand, quite similar to lists, except that the order
in which elements occur is irrelevant.
6  SEQUENTIAL ITERATION
Consider a bounded quantification ∀x{Θ[x] → Φ[x]}, such that Θ[x] is true when (and only when) the value of x is one of {c0, c1, ..., ck-1}. We may run the conjuncts Φ[c0] ∧ Φ[c1] ∧ ... ∧ Φ[ck-1] in any order, provided that any data dependencies are satisfied.
2They do, however, seem to make programs easier to understand and debug.
820
We consider now a bounded quantification without
dependencies. Running it on a sequential computer is
straightforward: translate the quantified formula into
an iteration which evaluates, in sequence, the formulas
Φ[c0], Φ[c1], ..., Φ[ck-1].
Since the compiler knows in advance about the possible range formulas, it may generate specialized code for
each kind of range formula. For example, if the range
formula Θ[x] is member(I, X, L) then we can illustrate
the resulting code as
allocate_environment;
y = deref(l);
while (y != NIL)
{
    x = deref(y->head);
    code for Φ[x];
    y = deref(y->tail);
}
deallocate_environment;
using a C-style notation. (Note that we ignore the enumeration of the list elements in this example.) Assuming that the implementation is based on WAM (Warren,
1983) the "code for Φ[x]" may introduce choice points
(and thus be unable to deallocate environments) if there
are alternative solutions for Φ[x].
In the important case that the proof for Φ[x] is deterministic, every pass through the loop will begin in
the same environment. This is more efficient than the
corresponding recursive computation in Prolog (under
WAM) which will allocate and deallocate an environment for each recursive call. Most implementations will
also refer to the symbol table when making the recursive
call. That is somewhat less efficient than the (conditional) jump performed at the end of a loop. We predict
that together these improvements will result in substantial savings, particularly when proofs are deterministic,
the bodies of recursive clauses are small and recursion is
deep. Meier also notes these advantages when compiling
some recursive programs as iterations (1991).
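For comparison, the ordinary recursive Prolog program corresponding to a bounded quantification over the elements of a list looks as follows (a sketch only; phi/1 stands for the body Φ[x]). It is this program's per-call environment handling that the compiled iteration above avoids:

% Succeeds if phi(X) holds for every element X of the list.
all_phi([]).
all_phi([X|Xs]) :-
    phi(X),
    all_phi(Xs).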
7
PARALLEL ITERATION
On sequential computers bounded quantification, when
at all appropriate, is likely to offer significant improvements over the corresponding recursive programs, run
in the usual way. The potential speed-ups on parallel
computers are still more dramatic.
Consider the conjunction obtained from a bounded quantification ∀x{Θ[x] →
Φ[x]}. Since we may run the conjuncts in any order, we
may also run them all in parallel (similarly for disjunctions), provided that we add synchronization to satisfy
dependencies.
7.1
Running deterministic programs
There are several methods for running deterministic iterations in parallel; these ideas have been successfully applied
to FORTRAN programs for a long time. The following
is one of the simplest. If there are k processors, numbered from 0 to k - 1, simply let processor i evaluate
Φ[c_i], for each i, 0 ≤ i < k. If there are fewer than k
processors, say k' processors, simulate k processors by
letting processor i evaluate Φ[c_j], for each j, 0 ≤ j < k,
such that j modulo k' is i. If the computation of each
Φ[c_i] is deterministic, then this is quite straightforward.
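As a concrete sketch of the simulation step (plain Prolog, using the standard between/3), the conjunct indices handled by processor I when k conjuncts are spread over KPrime processors are exactly those J with J mod KPrime = I:

% indices_for(+I, +KPrime, +K, -Js): conjunct indices assigned to processor I.
indices_for(I, KPrime, K, Js) :-
    K1 is K - 1,
    findall(J,
            ( between(0, K1, J),
              I =:= J mod KPrime ),
            Js).

For example, indices_for(1, 3, 10, Js) yields Js = [1, 4, 7].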
7.2
Running nondeterministic programs
Suppose that the formula Φ is such that there is a choice
of two or more potential proofs for some conjunct Φ[c_i].
If no two conjuncts Φ[c_i] and Φ[c_j], i ≠ j, share any
variables, then we have independent parallelism in which
backtracking is 'local' and easily implemented, cf., e.g.,
DeGroot (1984).
This is a special case of the more general situation in
which one can compute the variable assignments satisfying each conjunct independently of each other. For example, the conjuncts may share a variable, whose value
at runtime is an array, and only access distinct elements
of it. In general it is not possible to verify this condition
statically so some run time tests will be necessary.
Consider the other case: that the free variables of
conjuncts interact in such a way that it is not possible to compute variable assignments independently for
each conjunct. In that case the corresponding recursive
program, if run in the usual way using depth-first search
of the proof tree, has to perform deep backtracking to
earlier recursion levels. When investigating this class of
programs we have noted that they occur surprisingly infrequently. Running such programs often leads to a combinatorial explosion of potential proofs which is only feasible when backtracking over a few recursion levels. The
programs also do not behave nicely when running on,
e.g., WAM. They tend to consume stack space rapidly
if choice information prevents environments from being
deallocated.
The problem of simultaneously finding variable assignments for a set of non-independent and non-deterministic
conjuncts is also very difficult. Earlier research on backtracking in AND-parallel logic programming systems by,
e.g., Conery (1987) confirms this claim.
Our current position is therefore to refuse to run in
parallel any bounded quantification for which we cannot show statically, or at least with simple run time
tests, that the conjuncts are independent. In the context of AND-parallel logic programming systems, DeGroot, among others, has investigated appropriate run-time
tests for independence. Note that the overhead for
such tests is lower in our context. One test (say, for determining whether a free variable in a bounded quantification is instantiated at run time) is sufficient for starting
arbitrarily many independent computations.
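For instance, the single run-time test could simply be a groundness check on the shared term before the conjuncts are started. A minimal Prolog sketch follows (run_independently/1 is a hypothetical parallel scheduler, simulated sequentially here):

% Run the goals independently only if the shared term is already ground;
% otherwise fall back to ordinary sequential execution.
run_conjuncts(Shared, Goals) :-
    (   ground(Shared)
    ->  run_independently(Goals)
    ;   run_sequentially(Goals)
    ).

% Placeholder: a real implementation would start each goal on its own processor.
run_independently(Goals) :-
    run_sequentially(Goals).

run_sequentially([]).
run_sequentially([G|Gs]) :-
    call(G),
    run_sequentially(Gs).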
By applying these requirements also when running
bounded quantifications on sequential processors it is
guaranteed that the stack size when starting the proof
of each conjunct will be constant.
8
SIMD AND MIMD PARALLEL COMPUTERS
We believe that bounded quantifications will run efficiently on both SIMD and MIMD parallel computers.
When the bodies of bounded quantifications are simple
and no backtracking is needed inside them, the capabilities of SIMD parallel computers are sufficient. It seems
that most programs belong, or can be made to belong,
to this class.
For those programs which do more complicated processing in the bodies of bounded quantifications, e.g.,
backtracking, not all processors of a SIMD parallel computer will be active simultaneously. This will reduce the
efficiency of such a computer, while it may still be possible to fully utilize a MIMD parallel computer.
9
OTHER OPERATIONS
We think it is also beneficial to predefine certain useful
operations, such as reductions and 'scans' over lists and
arrays. Such operations will make it easy to eliminate
many parallelization problems with variables shared between conjuncts in bounded universal quantifications.
For example, this is a program which computes the
inner product S of two arrays X and Y.
i_p(X, Y, S) ←
  size(0, X, Z) ∧ size(0, Y, Z) ∧ size(0, T, Z) ∧
  ∀I∀Q{Y[I] = Q → T[I] = X[I] × Q} ∧
  reduce(+, T, S).
The arrays X, Y and T are shared between all conjuncts
but they all access distinct elements of the arrays. (The
variable Q was only introduced to maintain the standard
form of bounded quantifications. It seems convenient
and possible to relax the syntax to recognize expressions
such as ∀I {T[I] = X[I] × Y[I]} as bounded quantifications, which is certainly even more elegant.)
Sometimes the partial sums are also needed in the
computation. In this case it is useful to compute a 'scan'
with plus over an array. The result is an array of the
same length but where each element contains the sum of
all preceding elements in the first array.
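For reference, a sequential plus-scan over a list can be written in a few lines of ordinary Prolog (a sketch; the point of the text is that on a parallel machine the same operation can be computed in logarithmic time):

% plus_scan(+Xs, -Ss): element i of Ss is the sum of all elements of Xs
% preceding position i.
plus_scan(Xs, Ss) :-
    plus_scan(Xs, 0, Ss).

plus_scan([], _, []).
plus_scan([X|Xs], Acc, [Acc|Ss]) :-
    Acc1 is Acc + X,
    plus_scan(Xs, Acc1, Ss).

For example, ?- plus_scan([1,2,3,4], S) gives S = [0,1,3,6].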
10
FURTHER EXAMPLES
We now turn to a few more examples written using
bounded quantifications. In the authors' opinion these
formulas express at a high level the essentials of the algorithms they implement. In some cases they contain
formulas reminiscent of what would be (informally expressed) loop invariants when programming in another
language.
10.1
Factorial
The following program computes the factorial of N. The
program shows the use of the cardinal range formula.
factorial(N, F) ←
  size(0, T, N) ∧
  ∀I {cardinal(I, N) → T[I] = I + 1} ∧
  reduce(×, T, F).
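For comparison, an ordinary sequential Prolog program computing the same relation (building the list 1, ..., N in place of the temporary array T and then reducing it with multiplication; numlist/3 as in, e.g., SWI-Prolog) might read:

factorial(0, 1).
factorial(N, F) :-
    N > 0,
    numlist(1, N, Ks),      % Ks plays the role of the array T
    product(Ks, 1, F).      % reduce(×, T, F)

product([], F, F).
product([K|Ks], Acc0, F) :-
    Acc is Acc0 * K,
    product(Ks, Acc, F).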
10.2
Fibonacci
The following program computes the Nth Fibonacci
number. The program is remarkable in being both simple
and efficient, since it does not recompute any Fibonacci
numbers. Similar effects have been accomplished using
'memo' relations and 'bottom-up' resolution, etc., but
this solution appears both simple, elegant and semantically impeccable.
fibonacci(N, F) ←
  size(0, T, N + 1) ∧
  ∀I {cardinal(I, N) →
        I = 0 ∧ T[I] = 1 ∨
        I = 1 ∧ T[I] = 1 ∨
        I > 1 ∧ T[I] = T[I - 1] + T[I - 2]} ∧
  F = T[N - 1].

10.3
Finding roots in oriented forests
Suppose that the array P represents an oriented tree. 3
Each element of P contains the index of the parent of
some node; roots contain their own index. The following program returns a new array in which each element
points immediately to the root of its forest. This is an
example of a parallel-prefix algorithm and it also illustrates how bounded quantifications and recursion can be
used together.
find(P, P) ← ∀I{P[I] = P[I] → P[I] = P[P[I]]}.
find(P0, P) ←
  ∀I∀J{P0[I] = J → (J = P0[J] ∧ P1[I] = J ∨
                     J ≠ P0[J] ∧ P1[I] = P0[J])} ∧
  find(P1, P).
10.4
Matrix transposition
The following little program transposes a matrix.
trans(M1, M2) ←
  size(0, M1, A) ∧ size(1, M1, B) ∧
  size(0, M2, B) ∧ size(1, M2, A) ∧
  ∀I∀J∀Q{M1[I, J] = Q → M2[J, I] = Q}.
3Recall that an oriented tree is a "directed graph with a specified node R such that: each node N ≠ R is the initial node of
exactly one arc; R is the initial node of no arc; R is a root in the
sense that for each node N ≠ R there is an oriented path from N
to R" (Knuth, 1968).
10.5
Numerical integration
The following program computes an approximation to the integral ∫_a^b f(x) dx
using Simpson's method (a quadrature method). In the
program below we let A and B be the limits, N the
number of intervals and I the resulting approximation of
the integral. We assume that the relation r(X, Y) holds
if and only if f(X) = Y, where f is the function being
integrated.
intsimp(A, B, N, I) ←
  W = (B - A)/N ∧
  size(0, G, 2 × N + 1) ∧ size(0, Z, N) ∧
  ∀I∀Y{G[I] = Y → r(A + I × W/2, Y)} ∧
  ∀I∀S{Z[I] = S →
        S = W × (G[2 × I] + 4 × G[2 × I + 1] + G[2 × I + 2])/6} ∧
  reduce(+, Z, I).
The array G is set up to contain the 2 × N + 1 values
f(a), f(a + w/2), f(a + w), ..., f(b - w), f(b - w/2),
f(b). These values are used to compute the area for each
of the intervals, stored in Z. Finally the sum of the areas
is computed.
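As a cross-check of these formulas, a direct sequential Prolog version of the same quadrature can be written as follows (a sketch assuming the user-supplied relation r(X, Y) of the text, and between/3 and sum_list/2 as provided by, e.g., SWI-Prolog):

intsimp(A, B, N, I) :-
    W is (B - A) / N,
    N1 is N - 1,
    findall(S,
            ( between(0, N1, K),
              X0 is A + K * W,        % left endpoint of interval K
              X1 is X0 + W / 2,       % midpoint
              X2 is X0 + W,           % right endpoint
              r(X0, Y0), r(X1, Y1), r(X2, Y2),
              S is W * (Y0 + 4 * Y1 + Y2) / 6
            ),
            Ss),
    sum_list(Ss, I).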
10.6
Linear regression
This is an example of a more involved numeric computation, adapted from Press et al. (1989). The problem is to
fit a set of n data points (x_i, y_i), 0 ≤ i < n, to a straight
line defined by the equation y = A + Bx. We assume
that the uncertainty σ_i associated with each item y_i is
known, and that all x_i (values of the independent variable)
are known exactly.
Let us first define the following sums:

    S  = Σ_{i=0}^{n-1} 1/σ_i²        Sx = Σ_{i=0}^{n-1} x_i/σ_i²

with Sy, Sxx and Sxy defined analogously as the weighted sums of y_i, x_i² and x_i y_i.
The coefficients A and B of the equation above can now
be computed as

    Δ = S × Sxx - Sx²,   A = (Sxx × Sy - Sx × Sxy)/Δ,   B = (S × Sxy - Sx × Sy)/Δ.
The following program computes A and B from three
arrays X, Y and U.
linear_regression(X, Y, U, A, B) ←
  size(0, X, N) ∧ size(0, Y, N) ∧ size(0, U, N) ∧
  size(0, Z, N) ∧ size(0, Zx, N) ∧ size(0, Zy, N) ∧
  size(0, Zxx, N) ∧ size(0, Zxy, N) ∧
  ∀I {cardinal(I, N) →
        Z[I] = 1/(U[I] × U[I]) ∧
        Zx[I] = X[I]/(U[I] × U[I]) ∧
        Zy[I] = Y[I]/(U[I] × U[I]) ∧
        Zxx[I] = (X[I] × X[I])/(U[I] × U[I]) ∧
        Zxy[I] = (X[I] × Y[I])/(U[I] × U[I])} ∧
  reduce(+, Z, S) ∧
  reduce(+, Zx, Sx) ∧ reduce(+, Zy, Sy) ∧
  reduce(+, Zxx, Sxx) ∧ reduce(+, Zxy, Sxy) ∧
  Delta = S × Sxx - Sx × Sx ∧
  A = (Sxx × Sy - Sx × Sxy)/Delta ∧
  B = (S × Sxy - Sx × Sy)/Delta.
It is obvious that this program can be run in O(log n)
time using n processors, dominated by the reductions.
The bounded quantification which computes the intermediate arrays Z, Zx, Zy, Zxx and Zxy runs in constant
time using n processors.
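For reference, a straightforward sequential Prolog transcription of the same computation over three lists is given below (a sketch; the accumulator-passing predicate sums/13 computes the five weighted sums in a single pass):

linear_regression(Xs, Ys, Us, A, B) :-
    sums(Xs, Ys, Us, 0, S, 0, Sx, 0, Sy, 0, Sxx, 0, Sxy),
    Delta is S * Sxx - Sx * Sx,
    A is (Sxx * Sy - Sx * Sxy) / Delta,
    B is (S * Sxy - Sx * Sy) / Delta.

sums([], [], [], S, S, Sx, Sx, Sy, Sy, Sxx, Sxx, Sxy, Sxy).
sums([X|Xs], [Y|Ys], [U|Us], S0, S, Sx0, Sx, Sy0, Sy, Sxx0, Sxx, Sxy0, Sxy) :-
    W is 1 / (U * U),               % the weight 1/sigma_i^2
    S1   is S0 + W,
    Sx1  is Sx0 + X * W,
    Sy1  is Sy0 + Y * W,
    Sxx1 is Sxx0 + X * X * W,
    Sxy1 is Sxy0 + X * Y * W,
    sums(Xs, Ys, Us, S1, S, Sx1, Sx, Sy1, Sy, Sxx1, Sxx, Sxy1, Sxy).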
11
LIST EXAMPLES
The following two examples are presented simply to show
that it is also possible to express list algorithms using bounded quantifications, although the recursive programs are usually more elegant.
11.1
Lessall
The lessall program for lists is of course very similar to
the array program (this makes it easy to change the data
structure) .
lessall(A, L) ← ∀B∀I {member(I, L, B) → A < B}.

11.2
Partition
The partition program, finally, is an example of a program which is much clearer when expressed recursively.
We intend that partition(X, A, L, H) be true if and only
if L contains exactly those elements of X which are
less than or equal to A, and H contains exactly those
which are greater than A. The partition predicate is
usually part of an implementation of Hoare's Quicksort
algorithm. Here is the recursive program:
partition([], A, [], []).
partition([B|X], A, L, [B|H]) ← A ≤ B ∧ partition(X, A, L, H).
partition([B|X], A, [B|L], H) ← A > B ∧ partition(X, A, L, H).
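In standard Prolog syntax the same recursive program reads as follows; a sample query is included for illustration.

partition([], _, [], []).
partition([B|X], A, L, [B|H]) :- A =< B, partition(X, A, L, H).
partition([B|X], A, [B|L], H) :- A > B, partition(X, A, L, H).

% ?- partition([3,1,4,1,5], 2, L, H).
% L = [1, 1], H = [3, 4, 5].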
In the following program which uses bounded quantifications, we have tried to keep some of the structure of
the recursive program.
partition(X, A, L, H) ←
  ∀Fx∀Z∀I{suffix(I, X, Fx) →
      member(I, SL, L) ∧
      member(I, SH, H) ∧
      part(Fx, L, H, A, SL, SH)} ∧
  member(1, SL, L) ∧ member(1, SH, H).

part([], [], [], A, SL, SH).
part([B|X], L, H, A, SL, SH) ←
  J = I + 1 ∧
  member(J, SL, L1) ∧ member(J, SH, H1) ∧
  (A ≤ B ∧ L = L1 ∧ H = [B|H1] ∨
   A > B ∧ L = [B|L1] ∧ H = H1).
The program computes two lists of lists SL and SH which
are scans of partitions on X, picking out those elements
which are less than or greater than A, respectively.
14.1
12
NESTED BOUNDED QUANTIFICATIONS
Consider a bounded quantification whose body is another bounded quantification:

    ∀x{Θ1[x] → ∀y{Θ2[y] → Φ[x, y]}}

Provided that Θ1[x] is true for any x in {c_0, c_1, ..., c_{k-1}},
and that similarly Θ2[y] is true for any y in {d_0, d_1, ...,
d_{l-1}}, the nested bounded quantification is equivalent to
the k × l element conjunction

    Φ[c_0, d_0] ∧ Φ[c_1, d_0] ∧ ... ∧ Φ[c_{k-1}, d_0] ∧
    Φ[c_0, d_1] ∧ Φ[c_1, d_1] ∧ ... ∧ Φ[c_{k-1}, d_1] ∧
    ... ∧
    Φ[c_0, d_{l-1}] ∧ Φ[c_1, d_{l-1}] ∧ ... ∧ Φ[c_{k-1}, d_{l-1}]
As before, provided that all data dependencies are satisfied, all these conjuncts can be run simultaneously.
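In ordinary Prolog this unfolding is exactly what a pair of nested loops performs. A small runnable sketch (forall/2 and between/3 as in, e.g., SWI-Prolog; phi/2 stands for the body Φ[x, y]):

% Succeeds iff phi(I, J) holds for every 0 =< I < K and 0 =< J < L.
all_pairs(K, L) :-
    K1 is K - 1,
    L1 is L - 1,
    forall(( between(0, K1, I),
             between(0, L1, J) ),
           phi(I, J)).

% Example body: every product I*J in a 3-by-4 grid is below 12.
phi(I, J) :- I * J < 12.
% ?- all_pairs(3, 4).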
13
TOLERATING DEPENDENCIES
In all examples shown above the computations of the
conjuncts obtained from a bounded quantification have
been independent. Therefore the conjuncts could be
computed in any order, for example in parallel.
There are interesting computations where the resulting conjuncts are dependent. Consider, for example, the
following program (adapted from a program by Anderson & Hudak [1990]) which defines an n x n matrix A
through a recurrence.
rec(A) ← size(0, A, N) ∧ size(1, A, N) ∧
  ∀I∀J{A[I, J] = X →
        I = 1 ∧ X = 1 ∨
        I > 1 ∧ J = 1 ∧ X = 1 ∨
        I > 1 ∧ J > 1 ∧
          X = A[I - 1, J] + A[I - 1, J - 1] + A[I, J - 1]}.
This program requires a co-routining implementation of
bounded quantification to run on a sequential computer
or synchronization to run on a parallel computer. We are
currently investigating whether automatic generation of
synchronization/ co-routining code is sufficient or if the
programmer should be allowed to annotate the program,
for example, through read-only variables (Shapiro, 1983).
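One way to obtain such co-routining in an ordinary Prolog system is a delay primitive such as freeze/2 (available, e.g., in SICStus Prolog and SWI-Prolog). The sketch below suspends the computation of a cell until the three cells it depends on are bound; the predicate name is illustrative, not taken from the paper:

% cell(Up, UpLeft, Left, X): X becomes Up + UpLeft + Left as soon as the
% three dependencies are instantiated, in whatever order that happens.
cell(Up, UpLeft, Left, X) :-
    freeze(Up,
      freeze(UpLeft,
        freeze(Left,
          X is Up + UpLeft + Left))).

% ?- cell(U, UL, L, X), U = 1, UL = 1, L = 1.
% X = 3.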
14
RELATED WORK
We noted above that M. Meier has suggested (1991) how
to compile some tail recursive (conjunctive as well as
disjunctive) programs to iterative programs on top of
WAM.
Several authors, e.g., Lloyd & Topor (1984) and Sato
& Tamaki (1989), have discussed methods for running
logic programs with arbitrary formulas in bodies. Our
method only covers a limited extension of Horn clauses.
14.1
Array Comprehensions
It is obvious that there are similarities between arrays
and bounded quantifications on one side, and the array
comprehensions proposed for the Haskell language (Hudak & Wadler, 1990) on the other. Both concepts aim
to express the contents of an array, or the relationship
between several arrays, declaratively.
It appears to us, as with most functional programming
language concepts, that when they are at all appropriate
they offer a more compact and occasionally more elegant notation. For example, the factorial program above
could have been expressed more easily if an expression
describing the temporary array T could have been written immediately.
However, when the relationship between the elements
of more than one array is to be described, the bounded
quantifications appear to be more comprehensive.
Array comprehensions are, in general, evaluated by
lazy computation. This can be thought of as a degenerate form of concurrency which suspends part of a computation until it is known that it must be performed.
We do not think lazy computation is necessary, given
unification with the "logical variable" and a more general
form of concurrency.
Futures (Halstead, 1985) are yet another way of giving
names to values which are yet to be fully computed.
14.2
Nova Prolog
The ideas presented above originated as a generalization of the language Nova Prolog (Barklund & Millroth,
1988).4 Here, however, it is appropriate to present Nova
Prolog as a language embodying a subset of bounded
quantifications. The subset is chosen to obtain a language tailored specifically for massively parallel SIMD
computers, such as the Connection Machine. More
specifically, we assume that we can store some data structures in such a way that processor i has particularly efficient access to the ith element of each data structure.
We say that those data structures are distributed.
4Nova Prolog relates to Prolog in much the same way as *LISP
(by Thinking Machines Corp.) relates to Common LISP and C*
(also by Thinking Machines Corp.) to C. That is, it is a sequential
programming language extended with a distributed data structure
and a control structure for expressing computations over each element of the data structure.
We currently limit the distributed data structures to
be compound terms; in fact only those compound terms
whose function symbol is pterm and whose arity is some
fixed value. We shall call them 'pterms.' (This is to help
a compiler distinguish distributed data structures from
other compound terms.)
Since pterms are the only distributed data structures
and they are compound terms, the only range formula we
need is arg( i, t, x). 5 We have chosen a syntax for bounded
quantifications which makes it possible to combine the
range formula with the quantification of variables. In
Nova Prolog a formula
where T is a pterm, is called a 'parall' and has the same
meaning as the bounded quantification
∀I∀A1∀A2 ... ∀An( arg(I, T1, A1) →
      arg(I, T2, A2) ∧ ... ∧ arg(I, Tn, An) ∧ Φ[I] ),
namely that Φ is true for every corresponding element
Ai of Ti, 1 ≤ i ≤ n. We can see that in Nova Prolog
the 'index' I is implicit and is denoted by the constant
symbol self in the body Φ.
All examples above for array computations can be
translated into Nova Prolog. We have recently implemented parts of Nova Prolog in *LISP (Blanck, 1991).
15
CONCLUSION AND FUTURE WORK
We have defined bounded quantifications, a new construct for logic programming languages. We have discussed how they can be efficiently implemented on sequential and parallel computers. They offer clarity as
well as efficiency and we propose that language designers
and implementors consider including them in implementations of, e.g., Prolog, Gödel and KL1.
A natural continuation of this work is to verify experimentally that bounded quantifications can be implemented efficiently in sequential and concurrent languages, and on sequential and parallel computers. It
is also important to investigate how data dependencies
and other synchronization considerations can be handled, when bounded quantifications are interpreted concurrently.
REFERENCES
Anderson, S. & Hudak, P., 1990, Compilation of Haskell
Array Comprehensions for Scientific Computing. In
Proc. SIGPLAN '90 Conf. on Programming Language Design and Implementation. ACM Press, New
York, N.Y.
5The difference from the elt predicate we proposed earlier is
that arg operates on compound terms, rather than arrays, and that
indexing is one-based. This is of course related to the use of the
arg predicate in Prolog.
Barklund, J. & Millroth, H., 1988, Nova Prolog. UPMAIL Tech. Rep. 52. Computing Science Dept., Uppsala University.
Blanck, J., 1991, Abstrakt maskin for Nova Prolog. Internal report. Computing Science Dept., Uppsala University.
DeGroot, D., 1984, Restricted And-Parallelism. In Proc.
Intl. Conf. on Fifth Generation Comp. Systems
1984, pp. 471-8. North-Holland, Amsterdam.
Halstead, R., 1985, Multilisp-a Language for Concurrent Symbolic Computation. ACM TOPLAS, 2,
501-38.
Hill, P. M. & Lloyd, J. W., 1991, The Gödel Report.
Tech. Rep. 91-02. Computer Science Dept., University of Bristol.
Hudak, P. & Wadler, P., 1990, Report on the Programming Language Haskell. Tech. Rep. YALEU/DCS/
RR-777. Dept. of Computer Science, Yale Univ.
Knuth, D. E., 1968, The Art of Computer Programming. Volume 1 / Fundamental Algorithms. Reading, Mass.
Lloyd, J. W. & Topor, R. W., 1984, Making Prolog more
Expressive. J. Logic Programming, 1, 225-40.
Meier, M., 1991, Recursion vs. Iteration in Prolog. In
Proc. 8th Intl. Conf. on Logic Programming (ed.
K. Furukawa), pp. 157-69. MIT Press, Cambridge,
Mass.
Millroth, H., 1990, Reforming Compilation of Logic Programs. Ph.D. thesis. Uppsala Theses in Computing
Science 10. Computing Science Dept., Uppsala University. (A summary will appear in the next item.)
Millroth, H., 1991, Reforming Compilation of Logic
Programs. In Proc. 1991 Intl. Logic Programming
Symp. (ed. V. Saraswat, K. Ueda). MIT Press, Cambridge, Mass.
Press, W. H. et al., 1989, Numerical Recipes. The Art of
Scientific Computing. Cambridge Univ. Press, Cambridge, U.K.
Sato, T. & Tamaki, H., 1989, First Order Compiler: a
Deterministic Logic Program Synthesis Algorithm.
J. Symbolic Computation, 8, 605-27.
Shapiro, E., 1983, A Subset of Concurrent Prolog and
Its Interpreter. Technical Report TR-003. ICOT,
Tokyo.
Tärnlund, S.-Å., 1992, Reform. In Massively Parallel
Reasoning Systems (ed. J. A. Robinson). To be published by MIT Press, Cambridge, Mass.
Warren, D. H. D., 1983, An Abstract Prolog Instruction
Set. SRI Tech. Note 309. SRI International, Menlo
Park, Calif.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
An Implementation for a Higher Level
Logic Programming Language
Ross A. Paterson t
Anthony S.K. Cheng*
Software Verification Research Centre
Department of Computer Science
The University of Queensland
4072, Australia
Abstract
For representing high level knowledge, such as the mathematical knowledge used in interactive theorem provers
and verification systems, it is desirable to extend Prolog's
concept of data object. A basic reason is that Prolog
data objects-Herbrand objects-are terms of a minimal
object language, which does not include its own object
variables, or quantification over those variables.
Qu-Prolog (Quantifier Prolog) is an extended logic
programming concept which takes as its data objects,
object terms which may include free or bound occurrences of object variables and arbitrary quantifiers to
bind those variables. Qu-Prolog is unique in allowing its
data objects to include free occurrences of object variables.
In this paper the design of the abstract machine for
Qu-Prolog is given. The underlying design of the machine reflects the extended data objects and Qu-Prolog's
unification algorithm.
1
Introduction
The extended logic programming language Qu-Prolog
(Quantifier Prolog) [Cheng et ai. 1991, Paterson and
Hazel 1990, Paterson and Staples 1988, Staples et
al. 1988a, Staples et ai. 1988b] has been designed to provide improved support for language processing applications such as interactive proof systems. Its main feature
is that it supports higher level symbolic data types than
does Prolog. In particular, the data objects which Qu-Prolog reasons about are terms of a full first order logic
syntax, which includes both object level variables and
arbitrary bindings of object level variables.
The language λProlog [Miller and Nadathur 1986],
which extends Prolog with typed lambda-terms, may
also be used for these purposes. Qu-Prolog is weaker,
in that its terms correspond to second-order lambdaterms; substitution is provided, but not application of
terms to terms. However, in Qu-Prolog, as in traditional
notation, term variables may refer to open terms, raising
further questions of whether an object level variable occurs free in a term, or whether two object level variables
are distinct.
* e-mail: cheng@cs.uq.oz.au
† present address: Department of Computing, Imperial College, London SW7.
The Qu-Prolog Abstract Machine (QuAM) [Cheng
and Paterson 1990] is designed as the target for compilation of the logic programming language Qu-Prolog.
QuAM is developed from the Warren Abstract Machine (WAM). New mechanisms are introduced to handle
quantified terms and substitutions and flexible programming in Qu-Prolog. This paper presents the basic structure of the language and describes its implementation.
The main features of Qu-Prolog are described in section 2. In section 3, unification is extended to Qu-Prolog
terms. The design of QuAM is given in section 4. Some
examples are given in section 5. It is assumed that the
reader has some knowledge of the design of WAM [Aït-Kaci 1990, Warren 1983] and the compilation of logic
programming languages.
2
Qu-Prolog - the Language
Qu-Prolog has Prolog as a subset, and uses Edinburgh
Prolog syntax for constants and structures, and for ordinary variables which are intended to range over arbitrary
object level terms. These variables will be referred to as
meta variables, in recognition of the meta level status of
the Qu-Prolog language relative to the object language.
In addition, Qu-Prolog introduces syntax to represent
object level variables and quantifiers, as follows.
Qu-Prolog has other features not described here.
These include persistent variables, which are used to
manage incomplete information in the database. For a
description of persistent variables and their implementation, see [Cheng and Robinson 1991].
2.1
Object Variables
Since object level variables are simply part of the object
level syntax, it might seem natural to name them at the
Qu-Prolog (meta) level by constants. Instead, Qu-Prolog
refers to object level variables only by a type of Qu-Prolog (meta) level variable, called object-var variables.
The semantics of object-var variables is that they range
over object level variables. The success of this approach
reflects the common intuition that object level variables
are interchangeable.
The phrase 'object variable' is commonly used to abbreviate 'object-var variable' since it has no other use
in describing Qu-Prolog syntax. For an occasional reference to a variable of the object language, the phrase
'object level variable' will be used.
Qu-Prolog object variables have the same lexical
conventions as constants. In order to distinguish
them, object variable notations must be declared by
object_var/1. The declaration convention is that an
explicit declaration of an object variable name also implicitly declares all variant names derived by appending
an underscore followed by a positive integer. The standard library declares the atoms x, y and z as object
variables.
As each object variable is intended to range over all
object level variables, it is important to know whether
two object variables denote different object level variables. This information can be supplied implicitly or by
explicit use of the predicate distinct_from/2. For example, x distinct_from y asserts that x and y do not
denote the same object level variable. By default, all
object variables occurring in the same clause/query are
distinct from each other.
Remark: In fact Qu-Prolog makes internal use of
some meta level constants representing object level variables. These terms, called local object variables, are
mentioned below but they are not discussed here in detail. Their key role is as 'new' variables, for use when
changing bound variables. This newness is implemented
by a convention that they are excluded from instantiations of user accessible meta variables and object-var
variables.
2.2
Quantifiers
Qu-Prolog can reason about object level terms which
include arbitrary quantifiers, in much the same way that
Prolog can reason about terms which include arbitrary
function symbols. The user declares quantifier notations
as needed. Thus it is possible to have representations of
∫ for integral calculus as well as ∀, ∃ for first order logic.
Distinct quantifier notations in Qu-Prolog represent
distinct object level quantifiers. Qu-Prolog uses the traditional prefix notation for quantified terms. Quantifiers
are declared explicitly by executing
op(Precedence, quant, Q)
where Q is the representation for the quantifier; Q must
have the same lexical structure as a Prolog constant.
2.3
Substitutions
Throughout logical reasoning, the need for substitutions
arises naturally. Qu-Prolog directly supports parallel
substitution for free occurrences of object level variables.
The syntax for substitutions in Qu-Prolog is
[t1/x1, ..., tn/xn] * term
where x1, ..., xn are object variables and t1, ..., tn are
arbitrary Qu-Prolog terms.
Qu-Prolog substitutions are evaluated at unification
time, in accordance with the standard concept of correct substitution into quantified terms, which substitutes
only for free occurrences of variables and which changes
bound variables to avoid capture of free variables from
the substituted terms. For a term s1 * ... * sn * y, where
s1, ..., sn is a sequence of substitutions, the substitutions
are applied from right to left. That is, sn is applied to y
first. The effect of applying a substitution to a term can
be observed with the example s * [t1/x1, ..., tn/xn] * y,
where y is an object variable. After applying the rightmost substitution, the result will
be:
• s * ti if for some i = 1, ..., n, xi = y, or
• s * y if for all i = 1, ..., n, xi distinct_from y.
It is also possible that there is insufficient information
at a particular stage to determine which of these cases
applies. In that case evaluation of the substitution will
be delayed. That may lead to delaying of unification subproblems, perhaps extending beyond the current unification call.
As well as substitutions appearing in user inputs, the
system can generate substitutions via unification. For
example, the problem lambda x A = lambda y B has
the solution A = [x/y] * B.
2.4
Example
As a small example, we give a λ-calculus evaluator in
Qu-Prolog. The terms of the λ-calculus are transcribed
directly, except that we use the infix constructor @ for
application. First, we declare the quantifier lambda and
the application operator:
?- op(700, quant, lambda).
?- op(600, yfx, @).
Now the following predicate defines the structure of Aterms:
lambda_ term(x) .
lambda_term(A(OB)
lambda_term(A),
lambda_term(B).
lambda_term(lambda x A)
lambda_ term (A) .
For example, the following are λ-terms:
    x
    lambda x x
    (lambda x x) @ y
    lambda x (x @ y)
Now we can define a single-step reduction predicate
on λ-terms:
?- op(800, xfx, =>).
(lambda x A) @ B => [B/x]*A.
A @ B => C @ B :- A => C.
A @ B => A @ C :- B => C.
lambda x A => lambda x B :- A => B.
The first clause is the well-known β-rule. The others
allow rewrites anywhere in the expression. If desired, we
could also add the η-rule:
lambda x A @ x => A :- x not_free_in A.
The full reduction relation is the usual reflexive, transitive closure of the single-step reduction predicate:
?- op(800, xfx, =>*).
A =>* C :- A => B, !, B =>* C.
A =>* A.
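For illustration (writing @ for the application constructor as above, and recalling that x and y are pre-declared object variables), one reduction this evaluator performs is the following; the result follows from a single application of the β-rule and evaluation of the substitution [y/x] * x:

?- (lambda x x) @ y =>* R.
R = y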
3
Unification
Qu-Prolog extends Prolog unification to cover the new
data objects in the language. Two terms are unified if
they are equivalent up to changes of bound variables
(α-equivalent). Since unification for Prolog terms is not
changed (except that Qu-Prolog includes occurs checking), our discussion will concentrate on the new features.
Because Qu-Prolog unification is more difficult than
ordinary unification (it is not decidable, but only semi-decidable [Paterson 1989]), we often encounter sub-problems which cannot be solved at that point in the
computation, but we may be able to make further
progress on them later. Such sub-problems are delayed,
waiting for a relevant variable (or variables) to be instantiated, at which point they are re-attempted. If the
sub-problems remain unsolved at the end of query solution, they are displayed as part of the answer. This
approach has proved practical in our implementation.
We have also found it useful to delay sub-problems
to avoid branching. As a simple example, consider the
unification problem [X/y]*Z = c, where c is a constant (a
similar situation arises with structures). The unification
can succeed in one of two ways:
• Projection: Z = y and X = c.
• Imitation: Z = c. Here the substitution has a null effect on Z.
Hence it is impossible to determine a unique most general
unifier. Rather than branch the unification problem, Qu-Prolog delays it until the binding of Z is known.
3.1
Object Variables
Since an object variable is intended to range over object
level variables, and since object variables are the only
Qu-Prolog terms of this type, an object variable can be
instantiated only to another object variable. Further,
unification fails if the object variables denote distinct
object level variables. Also, whenever a meta variable
is unified with an object variable, the meta variable is
bound to the object variable.
3.2
Quantifiers
To motivate the treatment of unification for quantified
terms, consider
    lambda x x = lambda y y
Intuitively, the two terms are unifiable without instantiation of x or y, because the terms are the same up
to change of bound variable. To unify x and y would
be incorrect: the two terms are α-equivalent even if x
and y denote distinct object level variables. Hence during quantifier unification, Qu-Prolog uses substitution
to rename the bound variables to a common bound variable. The bound variable must not appear in the unified
terms. This is where the local object variables mentioned
previously are used. In general, a problem of the form
q x t = q y t' is reduced to
    [v/x] * t = [v/y] * t'
for some new local object variable v, and unification continues. Here is how the approach applies to the example
(v is a local object variable):
    lambda x x          =  lambda y y
    lambda v [v/x] * x  =  lambda v [v/y] * y
    [v/x] * x           =  [v/y] * y
    v                   =  v
    (success)
A substitution containing local object variables, when
applied to a meta variable, may be removed by a rule
called inversion: a problem of the form [v/x] * X = t is
reduced to the two problems
    X = [x/v] * t,   x not_free_in t
For example, we have the following reduction:
    lambda x A  =  lambda y y
    [v/x] * A   =  [v/y] * y
    A           =  [x/v] * [v/y] * y,   x not_free_in [v/y] * y
    A           =  x,                   x not_free_in v
    A           =  x
Unification produces the answer A = x.
As a further example, consider
lambda x A
= lambda y x
Since x does occur free on the right and cannot occur free
on the left, this unification problem should fail. In Qu-Prolog unification, that failure is detected when, at the
time of calculation of A = [x/v] * [v/y] * x, the constraint
x not_free_in [v/y] * x is generated and tested; and after
substitution evaluation, the test fails.
Such not_free_in constraints may be delayed if they
cannot be immediately decided. For example, the unification problem
    lambda x A = lambda y [x/z] * Z
gives the solution
    A = [x/v] * [v/y] * [x/z] * Z
provided:
    x not_free_in [v/y] * [x/z] * Z
In the absence of further information about Z, the
not_free_in test must be delayed.
3.3
Occurs Checking
Unlike Prolog, occurs checking is included as standard in
Qu-Prolog unification. However, it is not always possible
to determine whether a variable occurs in the final form
of a term. For example, it is impossible to determine
whether X occurs in the term [X/y] * Z without knowing
more information about Z. If Z is bound to y, X occurs
in the term. On the other hand, if Z is bound to a
constant c, X does not occur.
Thus, if we are considering a sub-problem of the form
X = t, we cannot always reduce the problem. We use
two conservative syntactic conditions:
• If X occurs in t outside of any substitution, and t is
not of the form s * X, the unification fails, since X
must then appear in t no matter how other variables
are instantiated.
• If X does not appear in t, including substitutions,
X is instantiated to t.
If neither of these conditions is met, the unification sub-problem must be delayed, pending further instantiation
of X.
4
The Qu-Prolog Abstract Machine
One of the design criteria for QuAM is that the efficiency of ordinary Prolog queries within Qu-Prolog must
be maintained wherever possible. Thus, most of the
features of WAM are retained and the description below will concentrate on the differences between QuAM
and WAM. The current implementation of QuAM differs
from the present description in that it uses an experimental representation for structures, intended for future enhancements to the Qu-Prolog language with higher-order
predicates and multiple-place quantifiers. The present
paper focuses on other aspects of the machine, so we omit
these details here, assuming a WAM-like representation
of structures. Because of the difference of the representation of the structures, no performance evaluation will
be given. A description of the current implementation
can be found in [Cheng and Paterson 1990].
4.1
Data Objects
Unbound Variables
Because of the association with delayed problems described below, the representation of a self reference cell
for unbound variables as in WAM is inapplicable. A
data cell with a VARIABLE tag is used to indicate an
unbound variable in Qu-Prolog. The value field of the
data cell contains a pointer to a list of delayed problems associated with the variable (Figure 1). Although
the representation of variables is different to WAM, the
classification into temporary and permanent variables,
the age determining method and the rules of binding a
variable are retained.
Figure 1: An Unbound Qu-Prolog Variable (a VARIABLE-tagged cell pointing to its list of delayed problems)
The REFERENCE tag is retained to indicate that
one variable is bound to another one. When two heap
variables are bound together, the one created more recently points to the one created earlier on the heap. The
delayed problems from the younger one are appended to
those of the older one.
Unbound Object Variables
Figure 2: An Unbound Object Variable (an OBJECT_VARIABLE-tagged cell with pointers to its delayed problems and to its distinct object variables)
Figure 3: x distinct_from y and x distinct_from z
A separate tag OBJECT_VARIABLE is given to the
object variables to distinguish its function from the variables. The value field has the same purpose as the value
field in variables. The second cell in an object variable
points to a list of object variables from which it is distinguished (Figures 2, 3). Rather than record all object
variables in the distinctness list, an ALL_DISTINCT
tag is placed in this cell for local object variables.
The classification method, the binding rules and the
age determining method used for variables are also applied
to object variables.
The OBJECT_REFERENCE tag indicates that an
object variable is bound to another object variable.
When two object variables are bound together, the distinctness information from both object variables is
merged and placed in the older object variable,
and the delayed problems will be woken up.
Quantified Terms
Qu-Prolog currently allows 1-place quantifiers (i.e. quantifiers with one bound variable) only. To represent quantified terms in Qu-Prolog, a tag QUANTIFIER is introduced, analogous to the STRUCTURE tag of the
WAM. Such a value points to three contiguous cells,
containing the quantifier atom, a reference to the bound
object variable, and the body of the quantified term (Figure 4).
Figure 4: Quantified Term q x term (a QUANTIFIER-tagged cell pointing to the quantifier atom q, the bound object variable x, and the body term)
Substitution Operators
In QuAM, an application of one or more substitutions
to a term is represented as a data cell, marked with a
SUBSTITUTION_OPERATOR tag and pointing to a
pair of cells. The first cell contains a pointer to the list
of substitutions, while the second is a data cell denoting
the term (Figure 5). The list of substitutions is stored
in reverse order, with the innermost substitution at the
front, to simplify evaluation.
Figure 5: sub * term (a SUBSTITUTION_OPERATOR-tagged cell pointing to the substitution list sub and the term)
An ordinary parallel substitution is represented as a
data cell with the property tag, containing a pointer to
a pair of cells. The first of the cells is a pointer to the
parallel substitution, while the second represents the rest
of the substitution list. A parallel substitution involving
n pairs of object variables and terms is represented as
a block of 2n + 1 cells; the first contains the size of the
substitution, while the remaining 2n cells refer to the
object variables and terms. Again the substitution pairs
are stored in the reverse order for easy evaluation (Figure 6).
Figure 6: A Parallel Substitution s * [123/x, 456/y] (a block of cells holding the size 2, followed by the pairs referring to y with the integer 456 and to x with the integer 123)
Each substitution list contains a marker describing the
property of the substitution list. It is used during unification to assist the determination of whether or not the
unification can be solved by projection. In general, a
problem of the form s * A = t, where t is a constant,
structure, quantified term or object variable, can always
be reduced by imitation. If s is known not to contain
any terms of the same top-level structure as t, then the
problem cannot be solved by projection. Thus branching
is eliminated and we can proceed by imitation. Otherwise, the unification problem will be delayed to avoid
branching. In most cases, the whole substitution list
must be examined in order to eliminate projection. In
special cases, the marker will contain enough information
to make a complete search unnecessary.
It is also convenient to know if a substitution list consists solely of renamings generated by quantifier unification, as such a list can be safely inverted. Thus, each
substitution list is marked as one of:
• invertible: the substitution list consists solely of renamings.
• object variables only: the substitution list is not invertible, but its range contains only object variables.
• others: the range of the substitution list contains
constants, structures, quantifiers or meta variables.
4.2
Data Areas
QuAM supports the same data areas as in WAM. The
heap provides space to store data objects as well as the
distinctness information and linking cells required for delayed problems. The local stack holds choice points and
environments. The choice points are enlarged to reflect
the extra data areas and registers.
Because the delayed problems list and distinctness information must be reset to their previous value upon
backtracking, the method of trailing (i.e. resetting the
address to null) used in WAM is inapplicable. Each entry in the trail is extended to be a pair of addresses and
previous values to provide extra information for backtracking.
In addition to the standard WAM data areas, a delayed problems stack that holds any delayed problem
generated during unification is provided. Apart from
containing pointers to the arguments for the delayed
problem, it has a type tag and a solved tag. The type tag
indicates whether the delayed problem is a unification or
a not_free_in problem (Figure 7). The solved tag is set
whenever the problem is solved.
Figure 7: Delayed Problem [f(1)/x] * A = f(1) (a UNIFY-tagged entry pointing to its two arguments)
When a query is solved, any delayed problem that remains is printed as a constraint to the solution. Storing
the delayed problems in a separate area allows fast access
to the problems when the solution is printed.
4.3
Registers
There are a few extra registers used in QuAM:
• the top of the delayed problems stack,
• a list of formerly delayed problems that have been
woken up,
• The substitution pointer register points to the entry
in the parallel substitution where the next component is to be added.
As well as the X registers, there is an associated set of registers, known as the XS (X substitution) registers, which hold the substitution of a term
when the substitution and the term of a SUBSTITUTION_OPERATOR data cell are broken up during
dereference. This procedure enables the substitution to
be passed from the outer structure to the inner terms
effectively.
Because each Y register is one data cell in size, and an
OBJECT_VARIABLE is two cells in size, a Y register
cannot hold an OBJECT_VARIABLE directly, and instead contains a reference to an OBJECT_VARIABLE
in the heap.
4.4
Instruction Set
For each new data object provided in QuAM, there are
put and get instructions to build and unify the data object. The new instructions are:
put_object_variable Xi
Create a new object variable on the heap, and place
a reference to it in Xi.
get_object_variable Xi Xj
Copy the object variable reference in Xj into Xi.
put_object_value Xi Xj
Copy the object variable reference in Xi into Xj.
get_object_value Xi Xj
Unify XSj, Xj with the object variable referenced
by Xi.
put_quantifier q Xi Xj Xk
Construct a quantified term, with quantifier q,
bound object variable Xi and body Xj, and place a
reference to it in Xk.
get_quantifier q Xi Xj Xk
Match the term in XSk, Xk with a quantified term,
with quantifier q and bound object variable Xi. The
body is placed in XSj, Xj.
In each of the last two instructions, the register Xi must
have been previously set to an object variable.
Note that some of these instructions use the XS registers, while others ignore them, expecting any substitution to be incorporated into the term in the X register. Thus during head matching substitutions are conveniently accessible in the substitution registers, allowing efficient dereferencing, and sharing of substitutions.
However, if such a value is to be a sub-term, its substitution (if any) must be re-incorporated into the term.
There is a set of put instructions to build substitutions, but no corresponding set of get instructions. This
is because a substitution occurring in the head must be
built in the same way as if it had occurred in the body,
and then the substituted term must be unified with the
corresponding head argument (or component). The instructions available are:
put_subs_operator Xi Xj
Combine XSj and Xj into a SUBSTITUTION_OPERATOR, and place a reference to it
in Xi.
put_empty_subs Xi
Set XSi to an empty substitution.
put_parallel_subs n Xi
Prepend a parallel substitution, consisting of n pairs
(each supplied with the next instruction), to XSi.
put_parallel_subs_pair Xi Xj
Add a pair, substituting Xj for the object variable
referred to by Xi, to the parallel substitution currently under construction.
put_subs Xi Xj
Transfer a substitution from XSi to XSj.
set_object_property Xi
Set the property tag on XSi to "object variables
only".
determine_property Xi
Determine the property tag of XSi.
The only new procedural instructions are:
do_delayed_problems
Solve any woken delayed problems. This instruction
is executed after the head has been matched.
not_free_in
Perform a not_free_in test during quantifier unification.
4.5
Dereference
Because of the presence of substitution, additional operations are included into the dereference algorithm. The
substitutions are evaluated during dereference whenever possible. Given an object variable, the substitution
will map the object variable to its value. Depending
on the type of the data object encountered in the term,
dereference also simplifies the substitution before returning.
5
Examples
A number of small examples are given here to highlight
the design differences between QuAM and WAM.
5.1
Quantified Terms
Quantified terms are constructed in a similar fashion to
the unary structures, except for the object variable. The
following sequence of instructions shows how a quantified
term lambda x x is built in register X1:
put_object_variable X0
put_quantifier lambda X0 X0 X1
Matching a quantified term is slightly more complicated than structure matching. Apart from matching
the term from outside in (i.e. match the quantifier before
matching the body), it must establish that the bound
variable of the quantified term in the head does not occur freely in the body of the quantified term from the
query. Thus, a not_free_in instruction must be executed before the quantifier matching is performed. The
following instructions match the argument X0 with the
head argument (lambda x A)@B:
get_structure @/2 X0
unify_variable X2              % lambda x A
unify_variable X0              % B
put_object_variable X3         % x
put_empty_subs X3
not_free_in X3 X2
get_quantifier lambda X3 X2 X2 % A
5.2
Substitutions
QuAM is designed to create substitutions independently
of the term. The term is created before the substitution. The example [a/x, b/y] * A illustrates this
property.
put_variable X0 X0             % A
put_empty_subs X0
put_object_variable X1         % y
put_atom 'b' X2
put_object_variable X3         % x
put_atom 'a' X4
put_parallel_subs 2 X0         % * A
put_parallel_subs_pair X1 X2   % [b/y] * A
put_parallel_subs_pair X3 X4   % [a/x, b/y] * A
determine_property X0
If the substitution is nested inside another term, an extra step is needed. A SUBSTITUTION_OPERATOR
data object is created to group the substitution and
its associated term together. To construct the term
f([a/x, b/y] * A), the following additional instructions are required:
put_subs_operator X0 X0
put_structure f/1 X1
unify_value X0                 % group together
Whenever a substitution is associated with a term in
the head, that term together with the substitution will
be built by put instructions and general unification will
be called. For example, consider the following clause
from the λ-calculus evaluator:
(lambda x A)@B => [A/x]*B.
In section 5.1 above, we gave the translation of the
matching of the first argument, leaving x in X3, A in
X2 and B in X0. The following instructions match the
second argument (in X1):
put_subs_operator X0 X0        % group together B
put_subs_operator X2 X2        % group together A
put_empty_subs X0              % *B
put_parallel_subs 1 X0         % *B
put_parallel_subs_pair X3 X2   % [A/x]
determine_property X0
get_value X0 X1                % unify with the argument
Note that A and B must both be combined with their
substitutions, if any. In the case of A, this allows the
value to fit into a cell in the substitution pair. In the
case of B, the substitution must be incorporated into the
value, and the substitution register set to empty, so that
the new substitution will be outside any existing substitutions.
If the substitution is nested within another term, the
outer term is matched by the get instructions, while the
substitution is built and unified with the appropriate
component.
6
Conclusions
QuAM has been implemented in C under the SUN 4
environment. The compiler was initially implemented in
NU-Prolog [Naish 1986], and subsequently transferred to
Qu-Prolog, which includes Prolog as a subset.
Qu-Prolog, including the extensions and features mentioned here, has been motivated particularly by the need
to rapidly prototype interactive proof systems, and currently it is the implementation language for a substantial
experimental proof system [Robinson and Tang 1991].
Acknowledgements
John Staples, Peter Robinson, Gerard Ellis and Dan
Hazel have made substantial contributions to the design
and implementation of QuAM. This research was supported by the Australian Research Council.
References
[Aït-Kaci 1990] H. Aït-Kaci, The WAM: a (Real) Tutorial, Report No. 5, Paris Research Laboratory (PRL),
France, 1990.
[Cheng and Robinson 1991] A.S.K. Cheng and P.J.
Robinson, An Implementation for Persistent Variables
in Qu-Prolog 3.0, Software Verification Research Centre, Department of Computer Science, University of
Queensland, 1991.
[Cheng and Paterson 1990] A.S.K. Cheng and R.A. Paterson, The Qu-Prolog Abstract Machine, Technical
Report No. 149, Key Centre for Software Technology, Department of Computer Science, University of
Queensland, February 1990.
[Cheng et al. 1991] A.S.K. Cheng, P.J. Robinson and
J. Staples, Higher Level Meta Programming in QuProlog 3.0, Proc. of 8th International Conference on
Logic Programming, Paris, June 1991.
[Miller and Nadathur 1986] D.A. Miller and G. Nadathur, Higher-order Logic Programming, Proc. of 3rd
International Conference on Logic Programming, London, July 1986.
[Naish 1986] L. Naish, Negation and Quantifiers in NUProlog, Proc. of 3rd International Conference on Logic
Programming, London, July 1986.
[Paterson 1989] R.A. Paterson, Unification of Schemes
of Quantified Terms, Technical Report No. 154, Key
Centre for Software Technology, Department of Computer Science, University of Queensland, Dec. 1989.
[Paterson and Hazel 1990] R.A. Paterson and D. Hazel,
Qu-Prolog 3.0 - Reference Manual, Technical Report
No. 195, Key Centre for Software Technology, Department of Computer Science, University of Queensland,
1990.
[Paterson and Staples 1988] R.A. Paterson and J. Staples, A General Theory of Unification and Solution of
Constraints, Technical Report No. 90, Key Centre for
Software Technology, Department of Computer Science, University of Queensland, 1988.
[Robinson and Tang 1991] P.J. Robinson and T.G. Tang, The Demonstration Interactive Theorem Prover: Demo2.1, Technical Report 91-4, Software Verification Research Centre, University of Queensland, September 1991.
[Staples et al. 1988a] J. Staples, P.J. Robinson, R.A. Paterson, R.A. Hagen, A.J. Craddock and P.C. Wallis, Qu-Prolog: an Extended Prolog for Meta Level Programming, Proc. of the Workshop on Meta Programming in Logic Programming, University of Bristol, June 1988.
[Staples et al. 1988b] J. Staples, R.A. Paterson, P.J. Robinson and G.R. Ellis, Qu-Prolog: Higher Level Symbolic Computation, Key Centre for Software Technology, Department of Computer Science, University of Queensland, 1988.
[Warren 1983] D.H.D. Warren, An Abstract Prolog Instruction Set, Technical Note 309, Artificial Intelligence Center, Computer Science and Technology Division, SRI International, 1983.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Implementing Prolog Extensions: a Parallel Inference
Machine
Jean-Marc Alliot*
Andreas Herzig t
Mamede Lima-Marques+
(alliot@irit.fr)
(herzig@irit.fr)
(mamede@irit.fr)
Institut de Recherche en Informatique de Toulouse
118 Route de Narbonne
31062 Toulouse cedex, France
Abstract
We present in this paper a general inference machine
for building a large class of meta-interpreters. In particular, this machine is suitable for implementing extensions of Prolog with non-classical logics. We give
the description of the abstract machine model and
an implementation of this machine in a fast language
(ADA), along with a discussion on why and how parallelism can easily increase speed, with numerical results
of sequential and parallel implementation.
1
Introduction
In order to get closer to human reasoning, computer
systems, and especially logic programming systems,
have to deal with various concepts such as time, belief, knowledge, contexts, etc ... Prolog is just what is
needed to handle the Horn clause fragment of first order logic, but what about non-classical logics? Just
suppose we want to represent in Prolog time, knowledge, hypotheses, or two of them at the same time; or
to organize our program in modules, to have equational
theories, to treat fuzzy predicates or clauses. All these
cases need different ways of computing a new goal from
an existing one.
Theoretical solutions have been found for each of the
enumerated cases, and particular extensions of Prolog have been proposed in this sense in the literature.
Examples are [BK82], [GL82], Tokio [FKTM086], N-PROLOG [GR84], Context Extension [MP88], Templog [Bau89], Temporal Prolog [Sak89], and [Sak87].
For all these solutions it is possible to write specific meta-interpreters in Prolog that implement these
non-classical systems ([SS86]). But there are disadvantages of a meta-interpreter: lower speed, and compilation is notoriously inefficient.
*Supported by the Centre d'Etudes de la Navigation Aerienne, France
†Supported by the Medlar Esprit Project
‡Supported by CAPES - Brasil
If we want to go a step
further, and to write proper extensions of Prolog, then
the problem is that costs for that are relatively high
(because for each case we would have to write a new extension), and we are bound to specific domains: we can
only do temporal reasoning, but not reasoning about
knowledge (and what if we want to add modules?).
Our aim is to define a framework wherein a superuser can create easily "his" extension of Prolog. This
framework should be as general as possible. Hence,
we must provide a general methodology to implement
non-classical logics.
There are four basic assumptions on which our framework
is built:
1. to keep as a base the fundamental logic programming mechanisms that are backward chaining,
depth first strategy, backtracking, and unification,
2. to parametrize the inference step: it is the superuser who specifies how to compute the new goal
from a given one, and he specifies it in a logic
form.
3. to be able to rewrite goals.
4. to select clauses "by hand".
Points (2) and (3) postulate a more flexible way
of computing goals than that of Prolog, where first a
clause is selected from the program, then the Robinson
unification algorithm is applied to the clause and the
head of the goal, and finally a new goal is produced.
Point (4) introduces a further flexibility: the superuser may select clauses that do not unify exactly with
the current goal, but just "resemble" it in some sense.
Even more, if the current goal contains enough information to produce the next goal, or if we just want to
simplify a goal or to reorder literals we don't need to
select a fact clause at all.
Assumptions (1) and (2) were at the base of the development of a meta-level inference system called MOLOG [FdC86], [ABFdC+86], [BFdCH88], [Esp87b], [Esp87a]. The inference machine that is presented in this paper is a complete rewriting of MOLOG realizing assumption (4). It has been developed at IRIT ([Bri87] and [AG88]).
A formal specification of the inference mechanism, called TIM: Toulouse Inference Machine, together with various examples, has been published in [BHLM91]. Here, in this paper, we present TARSKI: Toulouse Abstract Reasoning System for Knowledge Inference, which is an abstract machine in which the inference mechanism can be implemented. In the preliminary version of this work nothing was said about the abstract machine and its implementation; these specifications are now defined more clearly.
TARSKI was designed to implement parallelism (see sections 6 and 7). For example, for a given definite fact and goal clause, more than one rule may be applicable. In this case it is possible to use a different processor for each rule. The parallel machine has been developed and different solutions have been tried.
2 Horn clauses
The base of the language is that of Prolog. That language can (but need not) be enriched with context operators if one wants to mechanize non-classical logics.
Characteristically, non-classical logics possess symbols with a particular behaviour. These symbols are
• either classical connectors with modified semantics (e.g. intuitionist, minimal, relevant, paraconsistent logics)
• or new connectors called context operators (necessary and possible in modal, knows in epistemic, always in temporal, if in conditional logics).
Example: In epistemic logics, the context operators are knows and comp, and
    knows(a) : P means that agent a knows that P
    comp(a) : P means that it is compatible with a's knowledge that P
Hence inference engines for non-classical logics must account for the particular behaviour of some given symbols. These properties will be handled by built-in features of the inference engine.
The conditio sine qua non for logic programming
languages is that they possess an implicational symbol
to which a procedural sense can be given. To define a
programming language it matters less whether this is material implication or not; it is rather the dynamic aspect of implication that makes the execution of a logic program possible. That is why the TIM language
is built around some arrow-like symbol.
We suppose the usual definition of terms and atomic
formulas of logic programming. Intuitively, TIM Horn
Clauses are formulas built with the above connectors,
such that by dropping the contexts we get a classical Horn clause. Now for each logic programming
language we suppose a particular set of context operators. This set depends on the logic programming
language we want to implement, e.g. in epistemic logic
it is {knows, comp} and in temporal logic it is {always,
sometimes}. Formally we define by mutual recursion:
Definition 2.1 - contexts
m(t1, ..., tn) is a context if m is a context operator, n ≥ 0, and for 1 ≤ i ≤ n every ti is either a term or a definite clause.
Definition 2.2 - goal clauses
?P is a goal clause if P is an atomic formula
?(G ∧ F) is a goal clause if ?G, ?F are goal clauses
?MOD : F is a goal clause if ?F is a goal clause and MOD is a context
Definition 2.3 - definite clauses
P is a definite clause if P is an atomic formula
MOD : F is a definite clause if F is a definite clause and MOD is a context
F ← G is a definite clause if F is a definite clause and G is a goal clause
Definition 2.4 - TIM Horn clause
A TIM Horn clause (or Horn clause for short) is
either a goal clause or a definite clause. Note that
Horn clauses may contain several implication symbols.
We shall also use the term Modal Horn clauses if we
are speaking of a modal logic. A set of definite clauses
is called a database.
In the following sections we shall use the definition
of the head of a Horn clause.
Definition 2.5 - Head of a Horn clause
• H is a head of H.
• H is a head of F ∧ G if H is a head of F.
• H is a head of F ← G if H is a head of F.
• H is a head of MOD : F if H is a head of F.
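For instance, with the epistemic context operators above, knows(a) : (p ← comp(a) : q) is a definite clause whose head is p: dropping the context operators leaves the classical Horn clause p ← q. Correspondingly, ?knows(a) : p is a goal clause.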
3 Writing meta-interpreters

3.1 General Mechanism

Just as in Prolog, to decide whether a given goal follows from the database essentially means to compute step by step new subgoals from given ones. In our case, the computation of the new subgoal is specified by the superuser. The general inference mechanism is described in figure 1. There are five steps:

Clause selection: We select a clause to solve the first sub-goal of the question.

Rule selection: We select a rule to be applied to the current clause and the current question.

Rule execution: The execution of the rule "modifies" the current clause and the current question and builds a resolvent.

Rewriting of the resolvent: When we reach a termination rule, we rewrite the resolvent into a new question.

End of resolution: A resolution is completed when we reach a final form: the goal clause true.

This system is doubly non-deterministic, because we have both a clause selection (as in standard Prolog) and a rule selection.

We are going in the next sections to explain how this mechanism can be implemented. In subsection 3.2, we will discuss rule selection and execution, in subsection 3.3 rewriting and in subsection 3.4 clause selection. In section 6, we will come back to rule selection to show how an efficient mechanism can be used to improve resolution speed.

Figure 1: General mechanism of the TIM machine (clause selection, rule selection, rule execution, rewriting of the resolvent, test for the final form; failure at any step causes backtracking).

3.2 Selecting and Executing Inference Rules

An inference rule is of the form: A, ?B ⊢ ?C, where A is a definite clause and B, C are goal clauses. It can be read: if the current goal clause unifies with B and the selected database clause unifies with A, then a new goal can be inferred that is unified with C. In the style of Gentzen's sequent calculus, inference rules can be defined recursively as follows:

    A, ?B ⊢ ?C
    A', ?B' ⊢ ?C'

where A, A' are definite clauses and B, C, B', C' are goal clauses. As usual in metaprogramming, objects of the object language are represented by variables of the metalanguage¹.
¹To be correct, the real form of an inference rule is a little different: a procedural condition expressed with elementary functions of the abstract machine (see section 5) can be added. This enables a more precise control over execution.

Essentially, what can be tested here is any condition on the form of A, A', B, C, B', C', or on the existence of a database clause of a certain form. E.g. we can let an inference rule depend on the (non-)existence of some clause in some particular module of the database.

In the recursive definition the following conditions must be met²:
• var(A') ⊆ var(A)
• A' is a head of A or A is a head of A'
• C' is a variable
• C' is a head of C
²It is these conditions on the form of the inference rules that warrant the efficiency of the implementation.

A special category of inference rules are reflexive rules:

    true, ?B ⊢ ?C
    A', ?B' ⊢ ?C'

These rules use the special fact true. The conditions that these rules must meet are:
• A' is either:
  - a variable³, or
  - any definite clause constructed from the variables in B and C and constants.
• C' is a variable
• C' is a head of C
³This variable will be unified with a new fact taken from the clause base.

Partial termination rules are written:

    A, ?B ⊢ ?C if Condition

They end the recursion in resolution.

These are some examples: the Prolog rule for goal conjunctions:

    A, ?B ∧ C ⊢ ?D ∧ C
    A, ?B ⊢ ?D
the Prolog rule for implications in database clauses:

    A ← B, ?G ⊢ ?B ∧ D
    A, ?G ⊢ ?D

and the Prolog partial termination rule is:

    p, ?p ⊢ ?true
Note that here we make use of unification. These three
rules are exactly what is needed to implement Prolog.
To summarize, the execution of an inference rule
modifies the current fact and the current question and
constructs a resolvent. The resolvent has the same
structure as the question or any other fact. Partial resolution is achieved when we reach a partial termination rule.
How rules are selected is defined by the user. We will see in section 6 how exactly this is done. For the moment, rules are tried in the order in which they appear in the rule base.
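To make the correspondence concrete, the three Prolog rules above play the same role as the clauses of the classical vanilla meta-interpreter. The following sketch is only an illustration in plain Prolog (solve/1 and the use of clause/2 are ours, not part of TARSKI; clause/2 assumes the program clauses are accessible, e.g. declared dynamic in most systems):

    solve(true).                          % partial termination rule
    solve((A , B)) :- solve(A), solve(B). % rule for goal conjunctions
    solve(Goal) :-                        % rule for implications in database clauses
        clause(Goal, Body),               % select a clause whose head unifies with the goal
        solve(Body).

In TARSKI these three steps are not hard-wired: they are just one possible set of inference and termination rules that the superuser may supply.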
3.3 Rewriting the Resolvent into a New Question
As soon as we have reached a partial termination rule,
we rewrite the resolvent to create the new question to
solve. Rewriting is useful not only in order to simplify
goals, but also in order to eliminate the true predicate
from the new goal clause.
Rewrite rules are of the form:

    G1 ↝ G2

and allow one to replace a term that is matched by G1 in the resolvent with some substitution σ by the term (G2)σ in the new question.
For example, the Prolog rewrite rule is:

    true ∧ A ↝ A

In epistemic logic, the rule:

    knows(a) : knows(a) : A ↝ knows(a) : A

is a useful simplification.
3.4 Selecting Database Clauses
The user can define the way clauses are selected in
the base. But this selection "by hand" must be chosen from a given set (which currently offers only two methods: classical Prolog selection and least-used clause selection).
Using the abstract machine, it is possible to build
another selection mechanism (for example indexing selection on the first operator) but it has not been implemented yet and it is not described in this paper.
4 Examples: Modules

In this section we are going to show how to specify modules with dynamic import. Here, any module name, such as m, m1, m(2), etc., is considered to be
a context.
Module logic:

    C, ?M:G ⊢ ?M:NG
    C, ?G ⊢ ?NG

    M:C, ?M:G ⊢ ?M:NG
    C, ?G ⊢ ?NG

    true ∧ G ↝ true
    M:true ↝ true

Table 1: Rules for Module logics
The goal m1 : m2 : G succeeds if G can be proved using clauses from the modules m1 and m2. The inference rules are those for Prolog, plus two supplementary rules to handle module operators (table 1). The first rule represents the case where a module M is used to compute a new goal, and the second the case where another module name possibly occurring in G is used.
Other types of modules, such as modules with static import or with context extension [MP88], can be specified by just adding new inference rules. In [BHLM91], we have shown how temporal logics, hypothetical reasoning and logics of knowledge and belief can be specified elegantly in our framework.
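As a rough illustration of what such module rules compute, dynamic import can be emulated in ordinary Prolog by threading an explicit list of open modules through a meta-interpreter. The sketch below is ours, not TIM or TARSKI code; solve/2 and clause_in/3 (a table of module clauses) are assumed helper predicates:

    solve(_, true).
    solve(Ms, (A , B)) :- solve(Ms, A), solve(Ms, B).
    solve(Ms, M:G)     :- solve([M|Ms], G).        % entering module M opens it
    solve(Ms, G)       :-                          % an atomic goal may use a clause
        member(M, Ms),                             % from any currently open module
        clause_in(M, G, Body),
        solve(Ms, Body).

Under these assumptions, a query such as solve([], m1:m2:p) behaves like the goal ?m1 : m2 : p above: p may be proved with clauses stored under either m1 or m2.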
5 The abstract machine
The goal of the TARSKI abstract machine is to bridge the gap between the description of inference rules in logical form as shown above, and the real implementation of the rules in an efficient programming language.
Compared to the WAM, the TARSKI abstract machine deals with different objects and has a quite different goal, but on the whole the principles are identical; we will also define our machine in terms of data, stacks, registers and an instruction set. We do not have enough room here to describe the machine completely. So, we shall not speak of the "classical" parts of resolution that are identical, i.e. unification or backtracking. Let us say that the machine relies on classical structure sharing for unification, and on depth-first search and backtracking.
Before going further, we must tell about the Great Lie. TARSKI does not use the classical logic operators ∧ or ←. For consistency and simplicity's sake, all operators, whether modal, temporal or classical, are represented in our formalism in the same way and are treated by the machine in the same way also. Let's see that on an
example: The logical clause written in Prolog:
    A ← B ∧ C
will be written in TARSKI:
    ∧(C) : ∧(B) : A
Here B is the argument of ∧ and A is qualified by ∧(B). All operators have arguments, and qualify an
object. For example, the S4 modal logic⁴ clause:
    □(X) : (□(a) : p ← ◇(a) : p)
will be written:
    □(X) : ∧(◇(a) : p) : □(a) : p
and ◇(a) : p is the argument of ∧ that qualifies □(a) : p.
This could look like reverse Polish notation, but it is not exactly the same. In reverse Polish notation Kpq (that is, p ∧ q) gives the same role to p and q. In ∧(p) : q, p and q play really different parts: p is an operand of ∧ and q is the object qualified by ∧(p). This destroys the symmetry of ∧, but should be considered an advantage here: in classical Prolog, solving p ∧ q is different from solving q ∧ p, so the operator is not symmetric at all.
This formalism was not adopted lightly. The first versions did not use it, and gave a special place to the classical operators: we had a lot of problems describing the inference mechanism correctly. Adopting this structure greatly enhanced the simplicity and the efficiency of the system.
5.1 Data structures
First of all, boolean objects (true, false) with the classical operations associated (not, or, and) are implemented, along with integers and floats with their standard operations.
All data are organized in stacks. There are currently nine basic data types, and nine corresponding stacks.
The objects stack: holds all the objects on which the machine operates. An object can be either an operator⁵, a predicate⁶, a variable, an integer, a float, a cons⁷, or alfree⁸. Elements of this stack will be called ObjectElement⁹.
The operands stack: Objects do not hold their
operands. Each object that has arguments holds
the number of its operands and a pointer to an
element of this stack that holds pointers to all
the operands¹⁰. Elements of this stack are called OperandElement.
⁴From now on, we will only use the S4 modal logic. A classical introduction is [HC72]. We will use the following notations: □ is knows, ◇ is compatible. Modal operators have arguments that must be constants. The new operator ◇I must be added to the original language as shown in ([CH88]).
⁵An operator is an object that has objects as arguments and qualifies another object.
⁶A predicate is an object that has arguments but does not qualify any other object.
⁷The classical LISP cons.
⁸alfree is a special object quite similar in its behaviour to a variable that would always be free (alfree is the abbreviation of always free).
⁹Strings are currently not implemented.
¹⁰The operand stack is probably a technical mistake and will probably be suppressed in future versions of the machine.
The clauses stack: Each element of this stack is composed of:
• a pointer in the object stack to the beginning of the clause
• a pointer to the head predicate¹¹
• the number of free variables in the clause.
Elements of this stack are called ClauseElement.
The environments stack: Each element is a pair composed of a pointer to an object and a pointer into the environment stack in which the object has to be evaluated (classical structure-sharing implementation). Elements of this stack are called EnvironmentElement.
The Trail stack: Pointers to the environment list, for resetting some variables to free when backtracking (classical structure-sharing implementation). Elements of this stack are called TrailElement.
The backtrack stack: Each element holds all the information necessary for backtracking (values of the tops of the stacks). Elements of this stack are called BacktrackElement.
The question stack: Each element is a pair composed of a pointer to an object and a pointer to the environment where this object must be evaluated. The question stack holds the goals to be solved. Elements of this stack are called QuestionElement.
The resolvent stack: stack for the resolvent elements. The resolvent is built with the current question and the current selected fact. When reaching a partial termination rule, the resolvent is rewritten using rewrite rules on the top of the question and becomes the new question. Elements of this stack are called resolventElement.
The predicates stack: Holds predicate structures.
There are also nine other types: pointers¹² to objects in each stack, respectively ObjectPointer, OperandPointer, ClausePointer, EnvironmentPointer, TrailPointer, BacktrackPointer, resolventPointer, QuestionPointer.
Finally, there is the rules array. This array describes how resolution rules behave in the system. We will come back to this later.
5.2 Registers

The registers described here are what we call global registers or main registers (see figure 2). There exist also general-purpose registers that can be temporarily used for calculations. We will note them R0, R1, ... in the following pages.
At time t, the machine is completely defined by the values of its stacks and its registers.

    Register   Description
    Qcurr      Pointer to the current object in the question
    FCurr      Pointer to the current object in the clause
    FEnv       Pointer to the environment of the current clause
    CClause    Pointer to the current clause
    CRule      Index of the current rule used
    TrTop      Pointer to the top of the trail stack
    ObTop      Pointer to the top of the object stack
    BTTop      Pointer to the top of the backtrack stack
    QTop       Pointer to the top of the question stack
    RTop       Pointer to the top of the resolvent stack
    EnvTop     Pointer to the top of the environment stack

Figure 2: Abstract machine registers

¹¹Useful when using classical Prolog clause selection to increase speed.
¹²We usually use the term pointer, which is not exactly appropriate. Our pointers should be thought of as abstract data types that can be implemented as real pointers, or as indexes of an array, or anything similar.

5.3 Instruction set

We describe here the instruction set of the abstract machine. We cannot, because of lack of space, describe it extensively, but the next few lines give an intensive definition of all instructions.
For each type of object, there are twice as many functions as there are components in the object, one for getting the value of the component and one for setting this value.
Moreover, for each of the nine stacks there are 6 basic operations implemented (see figure 3).

    Push(x : object) return pointer
    Read(i : pointer) return object
    Pull return object
    Modify(x : object; i : pointer)
    SetTop(i : pointer)
    Position return pointer

Figure 3: Operations available on each stack

+(p : pointer; i : integer) : pointer   Increments pointer p by i
-(p : pointer; i : integer) : pointer   Decrements pointer p by i
-(p1, p2 : pointer) : integer   Returns the number of elements between p1 and p2.

There are also some classical functions: assignment, equality test, conditional constructions.
This ends the description of atomic functions. We will need in the following lines the classical macro-instruction unify, which unifies (Struct1, Env1) with (Struct2, Env2)¹³.
Let's see on an example how the abstract machine code is used to implement rules¹⁴:

    □(X) : A, ?□(X) : B ⊢ ?□(X) : C
    □(X) : A, ?B ⊢ ?C

is translated into:

    R0 := Read(Qcurr)
    if not unify(FCurr, FEnv, GetNumStruct(R0), GetNumEnv(R0))
    then return false
    else PushResolvent(R0) endif
    Qcurr := Qcurr + 1
    return true

¹³unify is of course written with atomic instructions.
¹⁴Other examples can be found in [Alled]: a full implementation of S4 logic, among others (fuzzy logic, module logic).

6 Rule selection with parallelism

In section 3.2, we said that resolution rules were chosen in the rule base in order of appearance. We are going to show here that this mechanism can be greatly enhanced by indexing the rule base and using parallel execution of rules.

6.1 Indexation of rules

The rules necessary to implement S4 are shown on top of table 2.
Remember that, due to the uniform notation of the abstract machine, the clause ∧(A) : B of the second rule is in fact the implication B ← A. We can see that, for a given fact and a given question, we have to try a lot of different rules. This creates a second non-determinism that greatly slows down the implementation of the language.
But trying all rules is usually not useful, because for a given fact and a given question, only a few rules will match the shape of the fact and the shape of the question. For example, if the fact is □(X) : A and the question ◇I(X,I) : B, only rules 9 and 11 can be used.
So, for a given logic, we can develop extensively all possible cases. For S4, this gives table 2. This way, given a fact and a question, the array gives directly the rules that can be applied, and there is often only one rule that can be applied. This transforms the double non-determinism into an almost simple non-determinism much closer to Prolog complexity. So, in a large number of cases, it is not necessary to backtrack on rule selection.
Table 2 (top) gives the S4 rules:

    Rule 1:  p, ?p ⊢ ?true

    Rule 2:  ∧(A):B, ?C ⊢ ?∧(A):D
             B, ?C ⊢ ?D

    Rule 3:  B, ?∧(A):C ⊢ ?∧(A):D
             B, ?C ⊢ ?D

    Rule 4:  A, ?◇(X):B ⊢ ?C
             A, ?B ⊢ ?C

    Rule 5:  ◇I(X,I):A, ?◇(X):B ⊢ ?◇I(X,I):C
             A, ?◇(X):B ⊢ ?C

    Rule 6:  ◇I(X,I):A, ?◇I(X,I):B ⊢ ?◇I(X,I):C
             A, ?B ⊢ ?C

    Rule 7:  □(X):A, ?□(X):B ⊢ ?□(X):C
             □(X):A, ?B ⊢ ?C

    Rule 8:  □(X):A, ?◇(X):B ⊢ ?◇(X):C
             □(X):A, ?B ⊢ ?C

    Rule 9:  □(X):A, ?◇I(X,I):B ⊢ ?◇I(X,I):C
             □(X):A, ?B ⊢ ?C

    Rule 10: □(X):A, ?◇(X):B ⊢ ?◇(X):C
             A, ?◇(X):B ⊢ ?C

    Rule 11: □(X):A, ?B ⊢ ?C
             A, ?B ⊢ ?C

The bottom of table 2 develops these rules exhaustively: a three-column array (Fact, Question, Usable rules) listing, for each combination of the outermost operators of the fact and of the question (predicate, ∧, ◇, ◇I, □), the instances of rules 1-11 that can be applied.

Table 2: S4 logic rules and their exhaustive development
Table 3: Rules for □ against ◇: the four rules R1-R4 that are applicable when the fact is □(X):A and the question is ◇(X):B.

Figure 4: Parallel execution of S4 rules. Resolution starts on processor P1; when the four rules of table 3 are applicable, each of the processors P1 to P4 continues resolution with a fourth of the resolution tree.
6.2 Parallel rule execution
The abstract machine was designed to enable an easy implementation of parallelism. Sometimes, for a given definite fact and a given goal clause, more than one rule is possible: we can use a different processor for each rule. For example, in the S4 logic, if the fact is □(X) : A and the question is ◇(X) : B, four rules can be used (table 3). With four processors, each one can continue the resolution with a different rule. Figure 4 shows the inference system running originally on processor P1. With four processors P1, P2, P3, P4 available, it is possible to solve, in parallel, the S4 rules described in table 3.
The information transferred from one processor (P1) to its children (P2, P3, P4) is the abstract machine data stacks and the abstract machine registers. Some stacks are never transferred (the backtrack stack, the trail stack) because the child does not need to backtrack over the current resolution point. This parallelism induces no side effects: as soon as one processor has received data, it will not have to communicate
L to all : free
P to L : request
L to P : Ok
P to L : Data
Figure 5: Fully interconnected network
with its parent any more until it has finished its own
resolution. Moreover, there is no overhead in processing time because parallelism is explicit in the language
itself: overhead comes only from communication between processes.
Four models (Master/slaves network, fully interconnected networks, ring networks, top-down networks)
are under development; we just mention them and we will not discuss them in detail¹⁵.
¹⁵On all practical implementation issues, details can be found in [Alled].
Fully interconnected network: Every processor can distribute work to any other processor that is free. A very simple protocol is used to prevent two processors from sending data to the same processor at the same time (figure 5). This protocol will solve problems as represented in figure 4.
Master/slaves network: The master process distributes work to all other processes, which, in turn, cannot distribute any work. This protocol will also solve problems as represented in figure 4.
Ring network: Here each processor can send work to
the next one, and the last processor can send work
to the first.
Top-Down network: In the Top-Down Network,
each processor can only send information to the
following one but the last processor can't send information to the first one. In ring networks and
top-down networks, resolution is not exactly as
represented in figure 4.
7 Implementing Parallelism

7.1 The "classical" machine
The new abstract machine specifications are the result of a development that began with the first implementation of MOLOG, in C, in 1988.
Coding the new machine took less than two months.
Of course, two years spent in coding other abstract
machines (that proved to be unsatisfactory) helped a
lot. From the beginning, the stress was on getting a program as close as possible to the specifications of
the abstract machine. That was the reason why the
ADA language has been chosen: the specifications of
the abstract machine are exactly the specifications of
the main package of the implementation. Moreover,
compared to other implementations previously written
in C, coding and debugging was a lot easier and faster.
We also wanted to be able to implement parallelism easily. So, for example, stacks are implemented with arrays and there is not a single real pointer in the system, only indexes. This has an interesting, well-known
side effect: we never run out of stack space, because if
a stack becomes full, we just have to copy it to a new
larger stack. All indexes are still valid. The mechanism
is invisible to the programmer and the user and very
useful with some very recursive non-classical problems.
This was done at the loss of performance. Accessing any object in a stack requires two function calls
and three tests plus the classical indirection. The TARSKI machine runs about fifteen times slower than C-Prolog¹⁶ on PROLOG problems. This could easily
be enhanced by recoding the machine with efficiency
in mind.
Coding a logic is very easy as soon as it follows the
general framework given in section 3.2. The S4 logic was implemented in one day and tested with the classical "wise men" puzzle. The puzzle is solved in three minutes on an HP-720 workstation with the full amount of knowledge (more than twenty clauses). With only the five clauses necessary to solve the problem, the solution is found in less than a second, a hundred times faster than the MOLOG interpreter.
7.2 The parallel machine
The parallel machine was developed with an Ethernet network as the medium for data transfer. The parallel system is made of many TARSKI machines running on different workstations, linked by Internet sockets¹⁷. The only configuration tested was a top-down network. Results are shown in table 4. It would be too long to discuss them here in detail. Full explanations can be found in [Alled].
We can briefly say that, over three processors, the network is clearly too slow and becomes the bottleneck of the system. A large part of the time is lost in communicating with other processors. There are different solutions that could be used to enhance performance:
• We can use parallelism only for branches that are close to the root of the tree. This will decrease the number of sent packets.
¹⁶It is however faster than some classical PROLOGs written in compiled Common Lisp.
¹⁷It was quite easy to do, because all the necessary packages for communication and parallelism had been developed previously for other projects. Reusability of software is a major advantage of ADA.
    # of Procs   P1       P2       P3      P4
    1            319+1
    2            166+10   145+6
    3            129+24   142+50   77+17
    4            129+26   140+46   46+31   22+9

Table 4: CPU + system time used
• We can try a master/slave network. The master processor would be almost devoted to sending packets, but the slaves would not spend time on this.

• We can improve the amount of sent data; some stacks can only grow, and are never modified under a certain depth. We could send only the new data, and not the whole stack.

• We could try to use a different medium. An Ethernet network is a very slow device for parallelism and, moreover, our network is usually crowded with packets coming from other stations or other X-terminals. It would be very interesting to implement the machine on a multi-processor computer with shared memory segments, or on a transputer network. We have not been able to do it yet because we lack access to such a machine. We are very eager to try such an approach. If we are able to find a machine with many processors, the inference machine could be almost as fast as a standard PROLOG even when solving non-classical logic problems, because the double non-determinism would be almost reduced to classical PROLOG non-determinism.

8 Conclusion

We think the implementation of any logic given by inference rules of the form defined in the earlier sections can be done in a very short amount of time (one or two days at most). The development of an automatic translator from the logical shape of the rules to the abstract machine specifications suggests itself and is a subject of current work.

Now, it is hoped that fast, general and efficient implementations of such logics could bring a new area of development for expert systems. In particular, at the C.E.N.A.¹⁸ a large expert system (3,000 rules) using fuzzy and temporal logics has been developed in Prolog ([AL91]). This expert system could be an excellent test for TARSKI.

¹⁸The CENA is an institution responsible for studies of new systems for Air Traffic Control in France.

Acknowledgements

We wish to thank Luis Farinas Del Cerro for valuable discussions.

9 References

[ABFdC+86] R. Arthaud, P. Bieber, L. Farinas del Cerro, J. Henry, and A. Herzig. Automated modal reasoning. In Proc. of the Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, July 1986.

[AG88] J. M. Alliot and J. Garmendia. Une implementation en "C" de MOLOG. Rapport D.E.A., Universite Paul Sabatier, Toulouse, France, 1988.

[AL91] Jean-Marc Alliot and Marcel Leroux. En route air traffic organizer, un systeme expert pour le controle du trafic aerien. In Proceedings of the International Conference on Expert Systems and their Applications, Avignon, May 1991.

[Alled] Jean-Marc Alliot. Tarski: une machine parallele pour l'implementation d'extensions de Prolog. Master's thesis, Universite Paul Sabatier, to be published.

[Bau89] Marianne Baudinet. Logic Programming Semantics: Techniques and Applications. PhD thesis, Stanford University, February 1989.

[BFdCH88] P. Bieber, L. Farinas del Cerro, and A. Herzig. MOLOG - a modal PROLOG. In E. Lusk and R. Overbeek, editors, Proc. of the 9th Int. Conf. on Automated Deduction, LNCS 310, pages 487-499, Argonne, USA, May 1988. Springer-Verlag.

[BHLM91] P. Balbiani, A. Herzig, and M. Lima-Marques. TIM: The Toulouse Inference Machine for non-classical logic programming. In M. M. Richter and H. Boley, editors, Processing Declarative Knowledge, number 567 in Lecture Notes in Artificial Intelligence, pages 365-382. Springer-Verlag, 1991.

[BK82] K. A. Bowen and R. A. Kowalski. Amalgamating language and metalanguage in logic programming. In K. Clark and S. Tarnlund, editors, Logic Programming, pages 153-172. Academic Press, 1982.

[Bri87] M. Bricard. Une machine abstraite pour compiler MOLOG. Rapport D.E.A., Universite Paul Sabatier - LSI, 1987.

[CH88] Luis Farinas Del Cerro and Andreas Herzig. Linear modal deductions. In E. Lusk and R. Overbeek, editors, Proc. of the 9th Int. Conf. on Automated Deduction. Springer-Verlag, 1988.

[Esp87a] Esprit Project p973 "ALPES". MOLOG Technical Report, May 1987. Esprit Technical Report.

[Esp87b] Esprit Project p973 "ALPES". MOLOG User Manual, May 1987. Esprit Technical Report.

[FdC86] L. Farinas del Cerro. MOLOG: A system that extends PROLOG with modal logic. New Generation Computing, 4:35-50, 1986.

[FKTM086] M. Fujita, S. Kono, H. Tanaka, and T. Moto-Oka. Tokio: Logic programming language based on temporal logic and its compilation to Prolog. In Third Int. Conf. on Logic Programming, pages 695-709, July 1986.

[GL82] M. Gallaire and C. Lasserre. Meta-level control for logic programs. In K. Clark and S. Tarnlund, editors, Logic Programming, pages 173-188. Academic Press, 1982.

[GR84] D. Gabbay and U. Reyle. N-Prolog: An extension of Prolog with hypothetical implications. Journal of Logic Programming, 1:319-355, 1984.

[HC72] G. E. Hughes and M. J. Cresswell. An Introduction to Modal Logic. Methuen & Co. Ltd, 2nd edition, 1972.

[MP88] Luis Monteiro and Antonio Porto. Modules for logic programming based on context extension. In Int. Conf. on Logic Programming, 1988.

[Sak87] Y. Sakakibara. Programming in modal logic: An extension of PROLOG based on modal logic. In Int. Conf. on Logic Programming, 1987.

[Sak89] Takashi Sakuragawa. Temporal PROLOG. In RIMS Conf. on Software Science and Engineering, 1989.

[SS86] L. Sterling and E. Shapiro. The Art of Prolog. The MIT Press, 1986.
Parallel Constraint Solving in Andorra-I
Steve Gregory and Rong Yang
Department of Computer Science
University of Bristol
Bristol BS8 1TR, England
steve/rong@cs.bris.ac.uk
Abstract
The subject of this paper is the integration of two
active areas of research: a parallel implementation of a
constraint logic programming language. Specifically,
we report on some experiments with the and/or-parallel logic programming system Andorra-I extended with support for finite domain constraint solving.
We describe how the language supported by Andorra-I can be extended with finite domain
constraints, and show that the computational model
underlying Andorra-I is well suited to execute such
programs. For example, most constraints are
automatically executed eagerly so as to reduce the
search space; moreover, they are executed
concurrently, using dependent and-parallelism.
We have compared the performance of some
constrained search programs on Andorra-I with that
of conventional generate-and-test programs. The
results show that the use of constraints not only
reduces the sequential execution time, but also
significantly increases the and-parallel speedup.
1 Introduction
Much of the success of Prolog has been due to its
suitability for applications involving search: the
language provides a relational notation which is very
convenient for expressing non-deterministic problems
and it can be implemented with impressive efficiency.
However, the search strategy built into Prolog is a
rather naive one, which tends to perform an
unnecessary amount of search for problems that are
stated in a simple manner. To solve realistic search
problems in Prolog, it is often necessary to perform
additional forward computation in order to reduce
the search space to a manageable size. However,
since this extra computation must be programmed in
Prolog itself, it may be an expensive overhead which
partly offsets the speed benefits of the reduced search.
Moreover, the resulting program is more opaque and
difficult to write than a natural solution in Prolog.
To improve on the search strategy of Prolog while
retaining its advantages is the motivation for the
development of constraint logic programming (CLP)
systems. Most of the CLP languages that have been
proposed are based on Prolog, extended with the
ability to solve constraints in one or more domains.
CLP languages use knowledge specific to their
domain to execute certain goals ("constraints") earlier
than would be possible in Prolog, thus potentially
reducing the search space. Provided that the
constraint solving mechanism is implemented
efficiently and that the language is simple to use, the
search time can be reduced at little cost in either
forward computation time or increased program
complexity. One type of CLP language, which has
proved particularly useful for combinatorial search
problems, is that based on finite domains; this is
described in a little more detail in Section 2.
There have been many projects in recent years to
develop parallel implementations of Prolog. Most of
these systems incorporate either or-parallelism,
independent and-parallelism, or both. In contrast, the
Andorra-I system is an implementation of Prolog that
exploits or-parallelism together with dependent and-parallelism, which is the sole form of parallelism
exploited in most implementations of concurrent logic
programming languages such as Parlog and GHC.
Andorra-I has proved effective in obtaining speedups
in programs that have potential or-parallelism and
those with potential and-parallelism, while in some
programs both forms of parallelism can be exploited.
Andorra-I, and the Basic Andorra model on which it
is based, are described briefly in Section 3.
The subject of this paper is the integration of the
above strands of research: a parallel implementation
of a constraint logic programming language.
Specifically, we report on our experiences with
extending the Prolog-like language supported by
Andorra-I to support finite domain constraint solving.
There are two main reasons why this is of interest:
1. Language. To investigate how easily the required
language extensions can be supported by the Basic
Andorra model.
844
2. Performance. To ensure that the finite domain
extensions can be implemented efficiently in
Andorra-I and that the efficiency is retained in
parallel execution.
Although a prototype or-parallel implementation
of the Chip language has been developed [Van
Hentenryck 1989b], we are not aware of any previous
investigation of and-parallelism with finite domain
constraints. By adding these extensions to Andorra-I
we can experiment with both forms of parallelism and
compare them.
It is particularly interesting to compare the
performance of constrained search programs on the
Basic Andorra model with that of conventional
generate-and-test programs (apart from the expected
reduction in overall execution time). The constraint
solving represents additional forward computation,
so - provided that the constraints can be effectively
solved in parallel- we would expect and-parallelism
to be increased. At the same time, since the search
space is reduced, there may be less scope for or-parallelism. The performance results obtained with
Andorra-I confirm these expectations.
The next two sections describe the background to
the paper. Section 4 discusses the implementation of
finite domain constraints on the Basic Andorra model.
It describes in detail the language extensions that we
have implemented and the structure of programs that
use them. Section 5 presents some results of running
constrained search problems on Andorra-I. Section 6
concludes the paper.
2 Finite domain constraints
The idea of adding finite domain constraints to logic
programming originated with the work of Van
Hentenryck and his colleagues, and was first
implemented in the language Chip [Van Hentenryck
and Dincbas 1986; Dincbas et al. 1988; Van Hentenryck
1989a]. Chip extends Prolog in several ways to
handle constraints; the principal extensions relevant
to finite domains are outlined below.
2.1 Domain variables
Some variables in a program may be designated
domain variables, ranging over any specified finite
domain. Domain variables appear to the programmer
like normal logical variables but are treated
differently by unification and by constraints.
2.2 Constraints on finite domains
Goals for certain constraint relations behave in a special way when they have domain variables as arguments. For example, if X is a domain variable, the goal X ≤ 5 can be executed by removing from the domain of X all items greater than 5. This in turn may reduce the search space that the program explores. A
user-defined predicate may be made a constraint by
using a 'forward' or 'lookahead' declaration, while
some primitives (e.g., inequality) have such
declarations implicitly. (Unification can have a
similar effect: unifying two domain variables reduces
the domain of both to the intersection of their original
domains, while unifying a domain variable and a
constant may fail.)
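As a rough illustration of this kind of domain pruning (ours, not Chip code), a domain represented as a plain Prolog list of values can be restricted by a goal such as X ≤ 5 simply by filtering the list:

    % restrict_leq(+Domain, +Bound, -Pruned): keep only the values =< Bound
    restrict_leq([], _, []).
    restrict_leq([V|Vs], Bound, Pruned) :-
        (   V =< Bound
        ->  Pruned = [V|Rest]
        ;   Pruned = Rest
        ),
        restrict_leq(Vs, Bound, Rest).

For example, restrict_leq([1,3,6,8], 5, D) yields D = [1,3]; in a real finite domain system the pruning is of course performed on the domain variable itself rather than by building a new list.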
2.3 Coroutining
Constraints should be executed as early as possible in
order to reduce the search space. For example, X ≤ Y could be executed as soon as either X or Y has a value and the other is a domain variable. In general, a
coroutining mechanism ensures that control switches
to a constraint goal as soon as it can be executed. The
simplest such control rule is forward checking, used for
forward-declared constraints, whereby a constraint is
executed as soon as its arguments contain at most one
domain variable and are otherwise ground. The
constraint goal is then effectively executed for each
member of its argument's domain and values that
cause failure are removed from the domain.
The lookahead rule, often used for inequality
relations such as '≤', can even execute constraints
whose arguments contain more than one domain
variable; we shall not consider this further in this
paper.
3 The Basic Andorra model
The Basic Andorra model is a computational model
for logic programs which exploits both or-parallelism
and dependent (stream) and-parallelism. The model
works by alternating between two phases:
1. Determinate phase. Determinate goals are
executed in preference to non-determinate goals.
While determinate goals exist they are executed in
parallel, giving dependent and-parallelism. (A
goal is considered determinate if the system can
detect that it can match at most one clause.) This
phase ends when no determinate goals are
available or when some goal fails.
2. Non-determinate phase. When no determinate
goals remain, one goal - namely, the leftmost one
that is not det only (see below) - is selected and
a choicepoint created for it. Or-parallelism can be
obtained by exploring choicepoints in parallel.
The model and its prototype implementation,
Andorra-I, are described in [Santos Costa et al. 1991].
Andorra-I supports the Prolog language
augmented with a few features specific to the model.
For example, det_only declarations allow the
programmer to specify that goals for some predicate
can only be executed in the determinate phase; if such
a goal remains in the non-determinate phase it cannot
be used to create a choicepoint, even if it is the
leftmost goal. Conversely, non_det_only declarations can be used to prevent goals from
executing in the determinate phase even if they are
determinate.
Performance results for Andorra-I show that the
system obtains good speedups from both or-parallelism and and-parallelism. The best and-parallel speedups are obtained for programs that are completely determinate (and therefore have no or-parallelism to exploit). The best or-parallel speedups
come from search programs, especially when
searching for all solutions.
Unfortunately, very little and-parallel speedup has
typically been observed in running standard Prolog
search programs on Andorra-I. One reason for this is
the sequential bottleneck inherent in the Basic
Andorra model: the periods (both during the non-determinate phases and while backtracking) when no
and-parallel execution is performed.
This suggests that the key to obtaining greater
and-parallel speedup is to increase the "granularity"
of the and-parallelism. That is, it is important to
minimize the number of choicepoints created and the
number of goal failures, relative to the total number of
inferences. One way to achieve this in search
programs is by the use of constraint satisfaction
techniques.
4 Implementing finite domains in Andorra-I
In order to experiment with finite domain constraint
solving on Andorra-I, we have defined and
implemented finite domains and a few simple
primitives to operate on them. Our system defines a
new data type, a domain, which exists alongside
numbers, structures, etc. Domains can only be used
as arguments to the domain primitives and have no
meaning elsewhere in a program; for example, they
cannot be printed. A domain is created with a set of
possible values that it may take; eventually it may
become instantiated to one of those values, at which
time we call it an instantiated domain. In contrast with
the Chip concept of domain variables, a domain
instantiated to t is not identical to t. We write a
domain as a set {t1,...,tn}, where t1,...,tn are its current
possible values; {t} represents an instantiated domain.
Our domains are easier to implement than domain
variables because there is no need to change many
basic operations of the system such as unification,
suspension on variables, etc. At the same time, the
efficiency of implementation should be comparable
with that of domain variables, while our primitives
are still quite convenient to use.
We describe our primitives first and then outline
their use and implementation.
4.1 Finite domain primitives
Domains can be created by the primitives make_domain and make_domains. The latter is
potentially more efficient when creating many
domains ranging over the same values since the table
of values can be shared.
All of the other primitives operate on existing
domains; they can only be executed when their first
argument is instantiated and will fail if this is not a
domain. domain_var performs the mapping
between a domain and its ultimate value, while
domain remove allows the removal of values from a
domain.-Either of these may cause the domain to be
instantiated: the first in a positive way, the second by
removing all but one of the values. doma in_gue s s is
the only non-determinate primitive. The last two,
domain_size and domain_values, may yield
different results depending on when they are called
and should therefore be used with care.
make_domain(D, Set)
Can be executed when Set is instantiated to a non-empty list of distinct atomic terms, [t1,...,tn]. D, which should be an unbound variable, is bound to a new domain, {t1,...,tn}.

make_domains(Ds, Set)
Can be executed when Set is instantiated to a non-empty list of distinct atomic terms, [t1,...,tn], and Ds is a list of variables. Each variable in Ds is bound to a new domain, {t1,...,tn}.

domain_var(D, Var)
Unifies Var with the value variable (a normal logical variable) of domain D. Subsequently, if D becomes an instantiated domain {t}, t is unified with Var. Alternatively, if Var becomes instantiated to t, then if t is currently in the domain D, D becomes an instantiated domain {t}; otherwise failure occurs.

domain_remove(D, Value)
Can be executed when Value is ground. If Value is not currently in the domain D, there is no effect. If D is the instantiated domain {Value} the primitive fails. Otherwise Value is removed from the domain; if only one value, t, remains in the domain, D becomes instantiated to {t}.

domain_guess(D)
Instantiates D non-determinately to one of its possible values. If D is the domain {t1,...,tn}, D is instantiated successively to {t1}, ..., {tn}.
Note that domain_guess(D) is non-determinate (unless D is already instantiated) and can therefore be executed only if there are no determinate goals to execute.
domain_size(D, Size)
Size is unified with a positive integer which indicates the number of values currently in domain D.

domain_values(D, Values)
Values is unified with a list of the values currently in domain D.
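The following illustrative query (ours, not from the paper) shows how these primitives interact, under the semantics described above:

    ?- make_domain(D, [1,2,3]),   % D is the domain {1,2,3}
       domain_var(D, X),          % X is D's value variable
       domain_remove(D, 1),       % D becomes {2,3}
       domain_remove(D, 3),       % only 2 remains, so D is instantiated to {2}
       X == 2.                    % ... and the value variable has been unified with 2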
4.2 Finite domain programming
Like Chip, our aim is to provide the programmer with
a language as close as possible to Prolog but with the
extensions necessary for constraint programming.
However, the "Prolog" language supported by the
Basic Andorra model differs in behaviour from that of
regular Prolog, and this affects how the language is
used. In this section we outline how our primitives
can be employed in the context of Prolog on
Andorra-I to solve constraint problems.
Program 1 is our solution to the familiar N-queens
problem. This program is almost identical to the Chip
one on p123 of [Van Hentenryck 1989a], except that
the result of the goal four_queens(Qs) is a list of domains (which can be converted to a numeric value by domain_var). However, it executes differently.
The execution order in Chip is the same as in Prolog,
repeatedly executing a domain_guess goal for one domain followed by a noattack goal to remove inconsistent values from the other domains. On Andorra-I the program executes all of the queens and noattack goals first, since they are determinate, and sets up all '≠' constraints before domain_guess is called to non-determinately generate domain values.
four_queens(Qs) :-
    Qs = [Q1, Q2, Q3, Q4],
    make_domains(Qs, [1,2,3,4]),
    queens(Qs).

queens([]).
queens([Q|Qs]) :-
    domain_guess(Q),
    noattack(Q, Qs, 1),
    queens(Qs).

noattack(_, [], _).
noattack(Q1, [Q2|Qs], N) :-
    Q1 ≠ Q2,
    Q1 ≠ Q2 - N,
    Q1 ≠ Q2 + N,
    N1 is N + 1,
    noattack(Q1, Qs, N1).

Program 1: N-queens
At the end of the first determinate phase, the resolvent contains only the following goals, for domain_guess and the inequality predicate '≠', where each of Q1, Q2, Q3, and Q4 is an uninstantiated domain:
    domain_guess(Q1),
    Q1 ≠ Q2, Q1 ≠ Q2 - 1, Q1 ≠ Q2 + 1,
    Q1 ≠ Q3, Q1 ≠ Q3 - 2, Q1 ≠ Q3 + 2,
    Q1 ≠ Q4, Q1 ≠ Q4 - 3, Q1 ≠ Q4 + 3,
    domain_guess(Q2),
    Q2 ≠ Q3, Q2 ≠ Q3 - 1, Q2 ≠ Q3 + 1,
    Q2 ≠ Q4, Q2 ≠ Q4 - 2, Q2 ≠ Q4 + 2,
    domain_guess(Q3),
    Q3 ≠ Q4, Q3 ≠ Q4 - 1, Q3 ≠ Q4 + 1,
    domain_guess(Q4).
The only goals that can be executed in the non-determinate phase are those for domain_guess, since the '≠' goals are treated as det_only (see Section 3). Selecting the leftmost goal, domain_guess(Q1), Q1 is instantiated non-determinately to the domain {1} and a new determinate phase begins, in which all nine '≠' goals containing Q1 can be executed in parallel.
This example illustrates a difference between our
language and Chip, which follows from the Basic
Andorra model: that the order of goals in a clause is
irrelevant. Constraints and generators can appear in
any order, but the constraints will always be set up
before any non-determinate bindings are made. This
is important, since it results in a smaller search space.
In order to get the same effect (called "generalized
forward checking") in Chip, the structure of the
program has to be changed. However, we do have to
make sure that constraints can be executed
determinately, so that they execute first, whereas
constraints need not be determinate in Chip.
The inequality predicate '≠' used above is an example of a constraint that is to be executed by forward checking. Such predicates can be programmed using the primitives of Section 4.1. As
an example, Program 2 defines a constraint plusorminus(X, Y, C), which means X=Y-C or X=Y+C. This can be executed in a forward checking way when either of the domains X and Y is instantiated and the third argument is ground; it then leaves only (at most) the two values Y-C and Y+C (resp. X-C and X+C) in the domain of X (resp. Y).
In Program 2 we use Pandora syntax [Bahgat and Gregory 1989]. The plusorminus procedure is a "don't-care procedure" in the style of Parlog: the first clause removes the appropriate values from the domain of Y if domain X is instantiated, while the second does the converse. This procedure uses the data primitive to wait for the domain to be instantiated and the operator ':' to commit to the appropriate clause. A sequential conjunction operator '&' is used in the pm procedure, so that the values currently in domain Y are found (by a call to domain_values) only after the other arguments are instantiated. It then filters these values to find which ones must be removed from the domain, and removes them by calling domain_remove.
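The pruning step that Program 2 performs can be sketched roughly as follows in plain Prolog, using the Section 4.1 primitives. This sketch is ours, not the authors' Program 2, and the helper names prune_plusorminus/3 and prune_list/4 are invented for illustration:

    % Once X has value VX and C is known, remove from Y's domain every value
    % other than VX-C and VX+C.
    prune_plusorminus(VX, C, YDomain) :-
        domain_values(YDomain, Vs),
        prune_list(Vs, VX, C, YDomain).

    prune_list([], _, _, _).
    prune_list([V|Vs], VX, C, YDomain) :-
        (   ( V =:= VX - C ; V =:= VX + C )
        ->  true                              % V is still consistent: keep it
        ;   domain_remove(YDomain, V)         % V cannot satisfy the constraint
        ),
        prune_list(Vs, VX, C, YDomain).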
In addition to primitive constraints such as
inequality, Chip allows user-defined constraints.
These are conventional Prolog procedures augmented
with a 'forward' declaration indicating which
arguments should be ground and which should be
domain variables. For example, plusorminus is
defined [Van Hentenryck 1989a: p134] as follows:
forward plusorminus(d,d,g).
plusorminus(X,Y,C) :- X is Y - C.
plusorminus(X,Y,C) :- X is Y + C.
implements the first-fail heuristic (the noattack
procedure is the same as in Program 1), and illustrates
the general structure of such programs. Note that the
"guessing" and "checking" components (the
guess_queens and check_queens procedures)
must be separated, though their order is unimportant.
four_queens(Qs) :-
    Qs = [Q1, Q2, Q3, Q4],
    make_domains(Qs, [1,2,3,4]),
    guess_queens(Qs),
    check_queens(Qs).
The problem with allowing user-defined constraints in Andorra-I is that the procedures may in general be non-determinate and, in any case, a search is required through the elements of a domain. One way to handle such constraints is by transforming the procedure to a determinate, forward checking, equivalent, as we did with plusorminus in Program 2. Another way would be to use a "determinate bagof" primitive which is currently being implemented in Andorra-I. This is similar to the bagof of Prolog but it executes as part of the determinate phase as a new subcomputation, even if it has to create internal choicepoints.
mode plusorminus(?, ?, ?).
plusorminus(X, Y, C)

F1: from(N) ==> [N | from(N+1)].
F2: sieve([P|L]) ==> [P | sieve(filterp(P,L))].
F3: filterp(P, [X|L]) ==> ((X%P) == 0 | filterp(P,L))
F4:                       [X | filterp(P,L)].

In Example 1, a query ":- test(100)." generates 100 consecutive prime numbers as its result. In the course of the refutation of the query, the unification of truncate(100, sieve(from(2))) in C1 and truncate(X, [H|T]) in C3 is tried as follows:

    call E-Unify(sieve(from(2)), [H|T])
    call E-Unify(from(2), [P|L])                        /* by F2 */
    exit E-Unify([2|from(2+1)], [2|from(2+1)])          /* by F1 */
    call E-Unify(sieve([2|from(2+1)]), [H|T])
    exit E-Unify([2|sieve(filterp(2,from(2+1)))],
                 [2|sieve(filterp(2,from(2+1)))])       /* by F2 */

In this E-Unification process, the reduction of a functional term is initiated when a head pattern of a clause or rewrite rule is a non-variable term and the corresponding argument of the caller is a functional term. Note that the functional term is not completely reduced to its normal form, but to WHNF, which makes it possible to handle infinite data structures. The complete description of the E-Unification algorithm, called E-Unification with Lazy Evaluation, is presented in [Nang et al. 1991].
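To see what such lazy reduction computes, here is a toy Prolog rendering of rules F1-F4 (ours, not the Lazy Aflog or FWAM-II implementation): whnf/2 reduces a functional term just far enough to expose a list cell, and take/3 forces a finite prefix of the lazy list.

    whnf(from(N), [N|from(N1)]) :-                      % F1
        N1 is N + 1.
    whnf(sieve(T), [P|sieve(filterp(P, L))]) :-         % F2
        whnf(T, [P|L]).
    whnf(filterp(P, T), W) :-                           % F3 / F4
        whnf(T, [X|L]),
        (   0 =:= X mod P
        ->  whnf(filterp(P, L), W)
        ;   W = [X|filterp(P, L)]
        ).

    take(0, _, []) :- !.
    take(N, T, [X|Xs]) :- whnf(T, [X|L]), N1 is N - 1, take(N1, L, Xs).

For instance, take(5, sieve(from(2)), Ps) binds Ps to [2,3,5,7,11] without ever building the infinite stream.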
FWAM-II, an abstract machine for Lazy Aflog, is an extension of WAM augmented with the manipulation of functional closures. It is characterized by the following:
• it adds the reduction mechanism to the WAM architecture, and
• it employs an environment-based reduction rather than graph reduction.
Since WAM uses an environment for the variables in the body of a clause, the conventional environment-based reduction scheme is more suitable to WAM than graph reduction is in the combination. Therefore, FWAM-II behaves similarly to the WAM in the execution of a clause, whereas it works similarly to an environment-based reduction machine in the reduction of a functional term. This WAM-based approach has also been adopted in other abstract machines for functional logic languages, such as K-WAM [Bosco et al. 1989] for K-LEAF and a WAM model [Nadathur and Jayaraman 1989] for λ-Prolog. The E-unification of Lazy Aflog is realized in FWAM-II via the reducibility checking in the unification instructions, which immediately calls the reduction process if the passed argument is a functional term and the corresponding pattern is a non-variable term. To implement the suspension and reactivation of functional closures, a run-time structure (called the Reduction Stack) is added to the WAM structure. Figure 1 shows the compiled FWAM-II code for the filterp function in Example 1, where mode and eq are predefined strict functions.
Upon the benchmark testing [Nang et al. 1991], the reduction mechanism of FWAM-II is relatively less efficient than WAM executing pure Prolog programs because of its overhead to construct and reference the functional closure, but it can support lazy evaluation in logic at the abstract machine level. Consequently, it is argued that FWAM-II can support not only all the features of a logic language but also the essential features of a functional language, with performance comparable to WAM.

3 A Parallel Computational Model

Although FWAM-II would be an efficient sequential machine for Lazy Aflog, it has a speed limitation because of its sequential nature. A natural way to overcome this obstacle is to extend it in parallel. This
F1: filterp(P, [X|L]) ==> ((X%P) == 0 | filterp(P,L))
F2:                       [X | filterp(P,L)].
3
allocate
% Pattern Matching
'P',X1
fget_value
X2
fget_list
'X'
match_value
'L'
match_value
% Guard Checking
try_me_else_L
F2
X,Xl
put_value
P,X2
put_value
call_P_Arity_N
model2,2
O,X2
put_integer
call_P_Arity_N
eq12,2
% Committing
commit
% Construct WHNF
'filterp/2', Xl
write_function
'P'
write_value
'L'
write_value
Xl
rewrite_value
% Returning
return
trust_me_elseJail
'filterp/2', Xl
write_function
'P'
write_value
'L'
write_value
X2
write_list
'X'
write_value
Xl
write_value
X2
rewrite_value
return
Figme 1 A Compilation Example
section addresses our point of view that adopts the RAP
as our starting point, and presents a parallel computational model for Lazy Aflog.
3.1 Parallelisms in Lazy Aflog Programs
Lazy Aflog has various kinds of parallelism inherited from both the functional and the logic sides, such as AND-Parallelism, OR-Parallelism, and Argument-Parallelism. Among these, we adopt Independent AND-Parallelism as the primary parallelism for the following reasons:
• Ideally, all parallelisms in the Lazy Aflog program can
be exploited in the parallel extension. However, it may
require a complex control mechanism that may
degrade the performance gains obtained through the
parallel execution.
• Since the Argument-Parallelism in the functional
language part can be viewed as a kind of
Independent-AND Parallelism in the logic language
part, we can exploit parallelisms in both of the functional and the logic parts in a simple and coherent
manner if there is a parallelizing method for it.
• An efficient and powerful computational model and abstract machine have already emerged for Independent AND-Parallelism of logic programs: DeGroot's RAP model and the RAP-WAM [Hermenegildo 1986] are such a computational model and abstract machine, respectively.
3.2 A Parallel Computational Model: PR3

A parallel computational model for Lazy Aflog, called PR3 [Nang 1992], is a parallel model which can support both parallel resolution and parallel lazy reduction simultaneously. The basic principles for spawning a parallel task are as follows:
Rule 1) the subgoals in a clause are executed in parallel
when their arguments are independent or ground
Rule 2) the arguments of a functional term are reduced in
parallel when their WHNFs are demanded and
the function is a strict one
Rule 3) the alternative clauses and rewrite rules are tried
sequentially using the top-down strategy
The algorithms for the independent and ground checks are the same as the ones defined in [DeGroot 1984]. This principle can be expressed with an intermediate code, called CGE+ (Conditional Graph Expression+), which is an extension of DeGroot's CGE [DeGroot 1984]. It is used to express the necessary conditions to spawn the subgoals or function reductions in parallel. The body of a clause and the right-hand side of a rewrite rule are expressed by the CGE+, which is informally defined as follows:
1) G : a simple goal (or subgoal) whose argument can be a functional term.
2) (SEQ E1 ... En) : execute expressions E1 through En sequentially.
3) (PAR E1 ... En) : execute expressions E1 through En in parallel.
4) (GPAR (V1 ... Vk) E1 ... En) : if all the variables V1 through Vk are ground, then execute expressions E1 through En in parallel; otherwise, execute them sequentially.
5) (IPAR (V1 ... Vk) E1 ... En) : if all the variables V1 through Vk are mutually independent, then execute expressions E1 through En in parallel; otherwise, execute them sequentially.
6) (IF B E1 E2) : if the expression B is evaluated to true, execute expression E1; otherwise, execute expression E2.
7) F (SEQ F1 ... Fn) : if F is a constructor symbol or non-strict function symbol, then construct the WHNF F(F1 ... Fn) sequentially; otherwise (i.e., F is a strict function symbol), evaluate expressions F1 through Fn sequentially and eventually evaluate F(F1 ... Fn).
8) F (PAR F1 ... Fn) : if F is a constructor symbol or non-strict function symbol, then construct the WHNF F(F1 ... Fn) sequentially; otherwise (i.e., F is a strict function symbol), evaluate expressions F1 through Fn in parallel and eventually evaluate F(F1 ... Fn).
The expressions 1) through 6) are the same as DeGroot's CGE for clauses (actually, they are the improved CGE defined in [Hermenegildo 1986]), while expressions 7) and 8) are new expressions for rewrite rules. Note that there are no conditions to check the groundness of function arguments in expressions 7) and 8), since groundness is automatically ensured by the pattern-matching semantics of the rewrite rules. That is, the arguments of a rewrite rule are always ground, hence they can always be evaluated in parallel. In expression 8), the arguments are reduced in parallel only if the function is strict, which results in its WHNF. Otherwise, it is rewritten to the term in the right-hand side, which is returned as the result. In this case, as it is not a WHNF, it induces another reduction process. The reason for adopting this reduction strategy, rather than directly calling the non-strict function, is to keep the storage optimization based on tail recursion.
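As a rough illustration of how the GPAR and IPAR guards decide between parallel and sequential execution, the following sketch (our own, with assumed helper names run_parallel and run_sequential and a toy term representation) implements the groundness and independence checks in Python; it is not the compiler's actual code.

    # Toy sketch of CGE+ guard evaluation; terms are nested tuples and an
    # upper-case string stands for an unbound variable (an assumed convention).
    def variables(term):
        if isinstance(term, str) and term.isupper():
            return {term}
        if isinstance(term, tuple):
            vs = set()
            for t in term:
                vs |= variables(t)
            return vs
        return set()

    def is_ground(term):
        return not variables(term)

    def are_independent(terms):
        seen = set()
        for t in terms:
            vs = variables(t)
            if vs & seen:
                return False
            seen |= vs
        return True

    def gpar(check_terms, goals, run_parallel, run_sequential):
        # (GPAR (V1 ... Vk) E1 ... En)
        return run_parallel(goals) if all(is_ground(t) for t in check_terms) else run_sequential(goals)

    def ipar(check_terms, goals, run_parallel, run_sequential):
        # (IPAR (V1 ... Vk) E1 ... En)
        return run_parallel(goals) if are_independent(check_terms) else run_sequential(goals)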
Example 2 is a CGE + for a Lazy Aflog program. It
can be automatically generated from the Lazy Aflog program by the parallelizing compiler, or programmed
directly by the programmer.
Example 2 A CGE+ for the Lazy Aflog program
C1: test(X,Y) :- (IPAR (X,Y) p(X,Z) q(Y,W)), r(f(Z), g(W)).
C2: test(X,Y).
C3: p(a,1).      C4: p(b,2).
C5: q(c,3).      C6: q(d,4).
C7: r(2,5).      C8: r(4,72).
F1: f(X) ==> (X == 0) | 0,
             +(PAR fib(X) fib(2*X)).
F2: g(Y) ==> *(PAR factorial(Y), fib(Y)).
In Example 2, as the subgoals p(X,Z) and q(Y,W) generate the values of Z and W that are taken into the terms f(Z) and g(W), the goal r(f(Z), g(W)) should be executed after their evaluation. Figure 2 shows snapshots of the parallel execution of the CGE+ in Example 2 when the query "Q1 :- test(b,d)" is given. In Figure 2, the rectangle, circle and rounded rectangle represent an OR node, an AND node and a reduction node, respectively. The number attached to each node represents the order of execution, while the filled nodes represent the nodes activated at that time. Note that, since Unification Parallelism is not exploited in PR3, the functional terms f(1) and g(3) in step (c) are reduced sequentially, although they could be evaluated in parallel if an innermost-like reduction strategy were used. The backward execution of PR3 is the same as the one presented in [Hermenegildo 1986], because there is no backtracking in the reduction phase after a functional term has eventually been reduced to WHNF. For example, in step (d), the subgoal q(Y,W), which generated the argument W, searches for alternative solutions for Y and W when a failure occurs, rather than generating another WHNF for g(3) or f(1).

Figure 2 The Parallel Execution Snapshots of the Lazy Aflog Program in Example 2
  (b) When the body of C1 is executed, the goals p(X,Z) and q(Y,W) can be executed in parallel.
  (c) The reduction of f(1) causes the reductions of fib(1) and fib(1*2) in parallel.
  (d) Since there is no clause unified with r(2,12), a 'fail' message is sent to q(Y,W), and now C6 is tried.
4 A Parallel Extension of FWAM-II for PR3
The desirable characteristic of a parallel abstract machine is to support parallel execution while retaining the performance optimizations offered by current sequential systems. To achieve this goal, a parallel abstract machine for PR3, called PFWAM-II (Parallel FWAM-II), is designed as an extension of the sequential abstract machine FWAM-II. It is equipped with run-time structures and an instruction set to fork and join parallel executions. We adopted the run-time structures and instructions of RAP-WAM for the extension of FWAM-II, because RAP-WAM is also an extension of WAM, for AND-parallel execution of Prolog, and has a general primitive to fork and join parallel tasks. Figure 3 shows the relationships between WAM, FWAM-II, RAP-WAM, and PFWAM-II.
Figure 3 The Relationships Between WAM, FWAM-II, RAP-WAM and PFWAM-II
4.1 Run-Time Structures for Parallel Execution
The run-time structure of PFWAM-II is an extension of FWAM-II for parallel execution, as shown in Figure 4. It consists of three parts. First, the Heap, Trail, Environment, and Choice Point are structures for the execution of the logic part, inherited from WAM. Secondly, the RS (Reduction Stack) is the structure used only for function reduction, inherited from FWAM-II. Finally, the GS (Goal Stack), ParCall Frame, Local Goal Marker, Input Goal Marker, and Wait Marker are run-time structures for the parallel execution of subgoals or function reductions, inherited from RAP-WAM with slight modifications.
Figure 4 Data Areas and Registers for One PFWAM-II (Heap, Reduction Stack, Trail, Code, and Message Buffer, with their associated registers)
In fact, the run-time structures for parallel execution are almost the same as those of RAP-WAM, except that a parallel task in PFWAM-II can be a reduction of a functional term as well as the evaluation of a subgoal, whereas in RAP-WAM only the evaluation of a subgoal can be a parallel task. The run-time structures for parallel execution are the Goal Frame, ParCall Frame, Input Goal Marker, Local Goal Marker, and Wait Marker. Let us explain them, focusing on the extensions which allow them to be also used for function reduction.

• The Goal Frame :
The subgoals or the functional terms which are ready to be executed in parallel are pushed onto the Goal Stack. Each entry in the GS is also called a Goal Frame, as in RAP-WAM. A Goal Frame contains all the necessary information for the remote execution of tasks. There are two kinds of Goal Frame in PFWAM-II; one is for a subgoal, and the other is for a function reduction. They are distinguished by a special tag in the Goal Frame. When a Goal Frame is the one for a subgoal, the structure of the Goal Frame is the same as in RAP-WAM; otherwise (i.e., it is one for a function reduction), it contains an extra pointer to the functional term to be reduced. In both cases, they are stolen from the Goal Stack by a remote processor and executed remotely in the same way.
• The ParCall Frame:
It is used to keep track of the parallel tasks during the forward and backward executions of PR3. The entries and meanings of the ParCall Frame that is created for each parallel task are the same as in RAP-WAM. If a ParCall Frame is the one for parallel function reductions, it immediately disappears from the Local Stack when the parallel reductions are completed, because there is no backtracking in the reduction process. This is different from the case of parallel subgoal calls, in which it remains in the Local Stack in order to select the appropriate actions during backtracking.

• The entries and meanings of the Input Goal Marker, Local Goal Marker, and Wait Marker are the same as in RAP-WAM. However, they also immediately disappear when the task is a function reduction and it has been reduced to WHNF.
The general execution scenario of PFWAM-II is as follows. As soon as a processor steals a task from another processor's Goal Stack, it creates an Input Goal Marker on the top of its Local Stack, and checks whether the task is a subgoal or a function reduction. If it is a subgoal, the processor starts working on the stolen subgoal by loading its argument registers from the parameter register fields in the Goal Frame and fetching instructions starting at the location (procedure address) received. If the stolen task is a function reduction, the processor loads the arguments and finds the starting address of the corresponding rewrite rule by referencing the functional term stored in the Heap of the parent processor; this reference was recorded in the Goal Frame by the parent processor. In either case, the local stacks of the processor will then grow (and shrink) as indicated by the semantics of FWAM-II.

When a parallel call is reached, a ParCall Frame is created on the top of the Local Stack and tasks are pushed onto the Goal Stack. If there are no idle processors in the system at that time, the processor itself gets the goal from its Goal Stack again, makes a Local Goal Marker, and executes the task locally. If the parallel call is one for subgoals, a Wait Marker is created on the top of the Local Stack as soon as all subgoals succeed; it is used for the backward execution of PFWAM-II. However, if the parallel call is for function reductions, the ParCall Frame, Local Goal Marker, or Input Goal Marker created on the Local Stack can be removed, since there is no backtracking in the reduction process. After the parallel call is finished, the execution can continue normally beyond the parallel call.
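The scheduling loop described above can be pictured with the following highly simplified sketch (assumed names throughout; it omits the markers, backtracking, and all FWAM-II detail): a worker pops or steals a Goal Frame and dispatches on its tag to either call a subgoal or reduce a functional term to WHNF.

    # Simplified work-stealing loop; SUBGOAL/REDUCTION tags and the two
    # execute callbacks are illustrative assumptions, not PFWAM-II code.
    SUBGOAL, REDUCTION = "subgoal", "reduction"

    class GoalFrame:
        def __init__(self, kind, payload):
            self.kind = kind        # SUBGOAL or REDUCTION (special tag)
            self.payload = payload  # goal record, or pointer to a functional term

    def worker(own_stack, other_stacks, call_subgoal, reduce_to_whnf):
        while True:
            frame = own_stack.pop() if own_stack else None
            if frame is None:
                for s in other_stacks:      # steal a task from another processor
                    if s:
                        frame = s.pop(0)
                        break
            if frame is None:
                return                      # no work anywhere
            if frame.kind == SUBGOAL:
                call_subgoal(frame.payload)
            else:
                reduce_to_whnf(frame.payload)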
4.2 Instruction Set

The instruction set of PFWAM-II consists of the FWAM-II instructions and the new instructions implementing RAP, as shown in Table 1. Since the FWAM-II instructions were explained in [Nang et al. 1991] and the instructions to fork and join the parallel call when the tasks are subgoals are almost the same as in RAP-WAM, we only explain the instructions to control parallel reduction. Forking and joining parallel executions is actually the same as in RAP-WAM when the parallel call is a determinate one. However, some attention is required, since the tasks to be forked can be functional terms.

• push_reduce Vn, Slot_Num
It makes a new goal frame on the Goal Stack with the Slot_Num for the functional term pointed to by Vn.
Table 1 The PFWAM-II Instruction Set

WAM Instructions
  Procedure Control: try L; retry L; trust L; try_me_else L; retry_me_else L; trust_me_else; fail
  Get: get_variable Vi,Ai; get_value Vi,Ai; get_constant C,Ai; get_list Ai; get_structure S,Ai; get_nil Ai
  Indexing: switch_on_term Ai,v,c,l,s; switch_on_constant n,ff; switch_on_structure n,ff
  Put: put_variable Vi,Ai; put_value Vi,Ai; put_unsafe_value Yi,Ai; put_constant C,Ai; put_list Ai; put_structure S,Ai; put_nil Ai
  Clause Control: call P/arity; execute; proceed; allocate; deallocate
  Unify: unify_variable Vi; unify_value Vi; unify_unsafe_value Yi; unify_constant C; unify_list; unify_structure S; unify_nil; unify_void

Reduction Instructions
  Fget: fget_value Vi,Ai; fget_constant C,Ai; fget_list Ai; fget_structure S,Ai; fget_nil Ai
  Matching: match_value Vi; match_constant C; match_structure S; match_list
  Reduction Control: commit; return
  Rewriting: rewrite_value Vi
  Writing: write_value; write_constant; write_structure; write_list; write_function; write_structure_value
  Reducing: reduce_value

Parallel Abstract Machine Specific Instructions
  Parallel Reduction: push_call Pid/Arity, Slot_#; push_reduce Vn, Slot_#; allocate_pcall #_of_slot, M; deallocate_pcall; pop_pending_goal; check_ready Vn, Vm; check_independent; check_ground Vn; check_me_else_label Label; waiting_on_siblings
• deallocate_pcall
It is used to join the parallel reductions. It waits until the number of goals to wait on in the current ParCall Frame is 0; then it removes the current ParCall Frame from the Local Stack.
Figure 5 shows the simplified PFWAM-II code for F2 of the CGE+ in Example 2, in which, since '+' and '*' are strict functions, their arguments are reduced directly rather than constructing functional closures.
5 Analysis

5.1 Performance Evaluation
In order to estimate the performance of our parallel extension, a simulator for PFWAM-II was developed. In this simulation, we assumed that there is a common shared memory for the run-time structures of each processor, and that the processors are interconnected by a network. Each processor can access the run-time structures of other processors without additional overheads. The performance of PFWAM-II is estimated by counting the number of memory and register references, where the time for referencing data stored in the shared memory (whether it is local or not) is assumed to be 3 times longer than the time for a register reference, and the times for other operations such as arithmetic are ignored for the sake of simplicity.
We use three benchmark programs: the first one is Fibonacci10, which computes the 10th Fibonacci number; the second is Check50 [Hermenegildo 1986], in which there are 10 parallel tasks each of which calls itself 50
F2: g(Y) ==> *(PAR factorial(Y), fib(Y)).

F2: allocate
    % Pattern Matching
    fget_value      Y1, X1
    % Spawn Parallel Reduction for factorial(Y)
    allocate_pcall  2, 2
    put_value       Y1, X1
    write_function  factorial/1, Y2
    write_value     X1
    push_reduce     Y2, 2
    % Spawn Parallel Reduction for fib(Y)
    put_value       Y1, X1
    write_function  fib/1, Y3
    write_value     X1
    push_reduce     Y3, 1
    % Gather the Results
    pop_pending_goal
    deallocate_pcall
    % Construct WHNF
    put_value       Y2, X1
    put_value       Y3, X2
    call_P_Arity_N  */2, 2, 1
    rewrite_value   X1
    % Returning
    return

Figure 5 A Compilation Example for the CGE+ in Example 2
times, and the third is Symbolic Derivation [Hermenegildo 1986], which finds the derivative of an expression with respect to a variable. There are 176 parallel tasks in Fibonacci10, 10 parallel tasks in Check50, and 152 parallel tasks in Symbolic Derivation. These benchmarks are programmed in both the logic and the functional style. In the simulation of function reduction, the effect of different reduction strategies is also measured. The simulated reduction strategies are Innermost Reduction, in which the innermost functional terms are reduced first before the outer ones are tried, Semi-Lazy, in which only the strict functions are reduced in the innermost fashion, and Lazy Reduction, in which all functions are reduced in the outermost fashion.
Upon the simulation results, the parallelizing overhead, which is defined as the extra execution time for the parallel code running on a single processor, is measured as about 30-60% when the grain size is relatively small (for example, Fibonacci10 and Symbolic Derivation), whereas it is less than about 1% when the grain size of the parallel tasks is large enough to make the overhead negligible (for example, Check50). Figure 6 graphically shows the speedup of the execution time of all benchmark programs as a function of the number of processors. In this figure, since Check50 has only 10 parallel tasks, the speedup does not increase when the number of processors is larger than 10. The speedups of the other benchmark programs are not linear because they have too fine-grained parallelism. The most important fact which can be identified from Figure 6 is that, whether the programs are written in the logic or the functional style, and whether the reduction strategy is innermost or outermost, the speedup behaviour is almost the same. The speedup ratio does not depend on the execution mechanisms, but on the availability and grain size of parallelism in the benchmark programs. In other words, PFWAM-II can support both parallel resolution and parallel reduction with almost the same efficiency.
Figure 7 shows the Working, Waiting, and Idle times for Symbolic Derivation as a function of the number of processors. It can be identified from Figure 7 that the processor utilization ratio decreases in proportion to the number of processors, and that the parallel reduction mechanism permits a higher utilization ratio than parallel resolution, because there is no restriction on stealing a task from other processors when the task is a function reduction (i.e., there is no "garbage slot problem" [Hermenegildo 1986] when executing function reductions).
5.2 Comparison with Related Work

One of the most closely related works is CSELT's work centering around K-LEAF. K-LEAF [Levi and Bosco 1987] is a functional logic language based on transformation. A rewrite rule in a K-LEAF program is transformed into a Prolog clause with an extra argument for the return value, and nested functions are flattened with produced variables for the outermost search strategy. K-WAM is an abstract machine to support outermost-SLD resolution, which is the inference rule of K-LEAF. Accordingly, there is no real reduction mechanism in K-LEAF and K-WAM.

A parallel extension of K-WAM on a distributed memory multiprocessor has also been developed [Bosco et al. 1990]. In this work, K-WAM is extended to control the OR-parallel execution of K-LEAF programs, and the parallelism is restricted to one-solution parallelism. The major difference between the parallel extension of K-WAM and PFWAM-II is that the former is designed for exploiting only OR-parallelism in the flattened K-LEAF programs, while the latter is designed for exploiting only AND-parallelism of Lazy Aflog programs.
Figure 6 Speedup vs. Number of Processors for Benchmark Programs
  (a) Speedup vs. number of processors for Fibonacci10
  (b) Speedup vs. number of processors for Check50
  (c) Speedup vs. number of processors for Symbolic Derivation
  (Curves shown: Resolution, Innermost, Semi-Lazy, and Lazy reduction; x-axis: 2 to 16 processors)
6 Summary
This paper presents a pair of a parallel computational model and its abstract machine for a functional logic language, called Lazy Aflog, which was proposed as a cost-effective mechanism to incorporate functional language features into a logic language. The proposed
computational model underlies De

Figure 7 Working, Waiting, and Idle Times for Symbolic Derivation
  (a) Case of Logic Programming (Parallel Resolution): processor utilization ratio (%) vs. number of processors
  (b) Case of Functional Programming (Parallel Reduction): processor utilization ratio (%) vs. number of processors

References
with 128 processors) since the lazy-splitting is better at coping with irregularly shaped trees. However, the overhead of redundant computation makes the lazy-splitting
rule unsuitable for a shallow search tree such as that
of the 8-queens, the zebra, or the turtles program. The
height of the search trees for these programs is not sufficiently larger than log(128), the level at which each of
the 128 processors commits to its own tasks.
(Figure: the tree programs' search-tree shapes, plotted against log N.)

4.2 Speed-up Factors
Speed-up factor is defined as sequential runtime divided
by the parallel run time. It is a generally accepted indication of how well a parallel system is able to improve
the runtime of a program. Next, we present data showing speed-up factors of the proposed approach on the
selected benchmark programs.
Table 3 lists speed-up factors from a simulation study running on a uniprocessor. In this simulation, the run
time is measured by the number of resolutions performed
in the execution (number of nodes traversed in the proof
tree).
Prog \ Proc.      4      8      16     32     64     128
Eager-splitting
8-queens          3.9    7.5    9.4    17.0   31.9   40.1
9-queens          2.9    4.5    8.6    16.7   22.7   42.2
zebra             3.2    4.0    8.3    9.1    15.3   20.6
turtles           3.0    5.2    8.6    8.6    15.3   27.6
pattern           2.8    5.5    6.2    12.5   21.7   21.7
n-square          2.6    2.8    3.2    3.7    3.7    6.7
tree              2.4    2.4    3.7    6.5    6.8    6.8
Lazy-splitting
n-square          2.2    2.8    4.3    7.2    13.0   17.7
tree              1.6    2.3    2.7    4.9    9.0    14.2

Table 3: Speed-up from Simulation Study. Speed-up is defined as sequential runtime divided by parallel runtime.
Table 4 lists speed-up factors from a parallel emulation
study running on a BBN Butterfly TC2000 with 32 processors. The run time is measured by the physical clock.
We assume that each resolution step takes constant time.
Cost of a real resolution step varies in general. However,
here we are merely interested in the total time of a task
which consists of a large number of resolution steps. The
total time (the sum of the time by all resolution steps)
can be considered as the average cost of each resolution
step times the number of resolutions. In other words,
the difference of time spent on each resolution step is
immaterial. For a given program, the constant can be
regarded as the average cost of each resolution step.
In order to observe the real overhead of task allocation, which is the time to compute the partition of tasks, the resolution speed must be realistic. In the emulation, the resolution engine speed is set equal to that of Aurora Parallel Prolog2, a well-known parallel Prolog implementation, running on one Butterfly processor. Both the eager and the lazy scheduling strategies are implemented in the emulator. The eager-splitting rule was used for the programs n-queens, zebra, pattern and turtles. The lazy-splitting rule was used for the programs n-square and tree. From the emulation study, we were able to verify that the sequential simulation, which measures run time by the number of resolution steps performed, accurately reflects the speed-up result of the parallel emulation, which measures run time by the real clock, for up to 32 processors. The overhead of calculating the task distribution, the only overhead not considered in the simulation, is nearly invisible in the emulation, given that the speed-up factors are almost identical to those from the sequential simulation. Notice that there is no communication involved here.
Program     1 proc   4 proc   8 proc   16 proc   32 proc
Eager-splitting
8-queens    1        4.0      7.6      9.3       16.5
9-queens    1        3.0      4.6      8.4       16.4
zebra       1        3.2      4.0      8.0       8.9
turtles     1        3.1      5.2      8.2       8.2
pattern     1        2.8      5.5      6.1       12.1
Lazy-splitting
n-square    1        2.2      2.8      4.2       6.9
tree        1        1.6      2.3      2.7       4.6

Table 4: Speed-up from Emulation Study.
4.3 Performance Comparison with Aurora Parallel Prolog

The same set of benchmarks was run with Aurora Parallel Prolog on the Butterfly machine. Runtime and speed-up factors (the best out of 10 runs) are listed in Table 5.
The Peak Speed-up Factors: The speed-up curves for all benchmark programs either have reached their peak (bold face numbers) or at least level off with Aurora Parallel Prolog on 32 processors, as shown in Table 5. Using the self-organizing scheduling approach, simulation results (Table 3) on up to 128 processors showed that:

• the peak speed-up factors for the 8-queens, zebra and turtles programs (with fine grain parallelism) exceed, by a margin of at least 200%, the experimental results on Aurora;

• the peak speed-up factor for the 9-queens program is twice that on Aurora;

• the peak speed-up factor for the n-square program (with a very bushy search tree) is about 30% higher than that on Aurora.

2 Aurora 0.6/Foxtrot, patch #8, with the Manchester Scheduler.
Program     1 proc    16 proc     24 proc     32 proc
8-queens    1,620     141/11.5    122/13.3    123/13.2
9-queens    7,500     533/14.1    367/20.4    350/21.4
zebra       2,600     490/5.3     500/5.2     525/4.9
turtles     4,300     550/7.8     580/7.4     569/7.5
pattern     1,084     130/8.3     160/6.8     240/4.5
n-square    2,230     190/11.7    170/13.1    178/12.6

Table 5: Runtime (ms) / Speed-up factors with Aurora Parallel Prolog
Speed-up Comparison: Given the number of processors, the speed-ups achieved by self-organizing scheduling appear to be comparable to those of Aurora, but somewhat lower when the number of processors is small (e.g., < 16). Note that these results are obtained without communication. The same speed-up result is expected to hold regardless of the speed at which the resolution engine is running. Therefore, an absolute speed comparison will favor the self-organizing scheduling scheme.
5 Discussion
In the above experiment we studied the behavior of
the proposed technique without communication among
processors. We demonstrated that the scheme is able
to effectively deal with problems which render mostly
fine-grained parallel tasks under a traditional scheduler.
The loss of processor utilization due to the unevenness in
load distribution can be more than covered by the benefit of reduced scheduling overhead. The advantage of the
proposed technique is its non-communicating nature, which frees it from possible constraints, such as communication bandwidth among processors, that could otherwise limit the ability of a scheduler to function effectively. The limitation, however, is its inability to re-use processors that complete their allocated tasks before the termination of the (parallel) execution. We have shown, in the above simulation study, that this would not necessarily compromise the performance of programs, especially those that generate mostly fine-grained tasks at run time under a traditional scheduler. But the worst-case scenario could happen despite the effort to obtain a better balanced load distribution by removing structural imbalance of the search tree and using a statistically even distribution rule. Below, we discuss options to deal with the problem.
One possible solution to the problem is to resort to dynamic task redistribution as existing schedulers do. As
we know, the overhead of dynamic task redistribution is
relatively small for medium to large-grained tasks, and
it provides us with the adaptiveness necessary to deal
866
with some extraordinarily shaped search spaces. On the other hand, the self-organizing scheduling approach introduces low overhead and thus ensures that, when it does not help improve performance, it is not expected to degrade it either. When the two methods are carefully integrated, the combination can take advantage of what each method is best at. The issue is when and how dynamic task redistribution should be invoked to achieve the best result. Preliminary research has been conducted in this direction and we will present results in a separate paper. Another option that alleviates the problem is to have idle processors collected by a higher-level scheduler (e.g., the operating system) and assigned to other queries. The idea is to use dynamic scheduling only at the level of user queries, which usually offer larger granules. In a multi-user environment, this approach can yield high system throughput given sufficient queries. Global load balancing is involved here. It appears to be an interesting subject for future investigation.
Static program analysis that provides probability of
cut-offs according to given query patterns will be very
helpful to guide task distribution. More research is yet
to be done before this becomes a feasible alternative to
the currently used statistical distribution rule.
Finally, we note that an interesting feature of the self-organizing scheduling approach is that it establishes a linkage between processor mapping and the syntax of a program. This feature provides the user a means to influence the mapping of processors to tasks, which would be particularly helpful for applications in which tasks are clearly defined and dynamic task redistribution is known to be not beneficial (there are many such applications). Again, dynamic task redistribution can be used to guard against abuse of this feature.
We presented data showing the effectiveness of the proposed methods on programs that belong to the generateand-test category. By removing structural imbalances
in a program, it was found that a reasonably balanced
load distribution can be obtained by following a statistically even distribution rule. We discussed two distinct task distribution rules, the eager-splitting rule
and the lazy-splitting rule, and examined their effectiveness.
We showed that the peak speed-up factors with selforganizing scheduling for a set of benchmark programs
exceeds, by a substantial margin, results achieved on the
same programs by Aurora Parallel Prolog, a well-known
parallel Prolog implementation. Given a fixed number
of processors, the speed-up factors by the self-organizing
scheduling scheme are competitive. By experimenting with the two near-extreme task distribution rules, we also demonstrated that adaptability can be gained at the cost of redundant computation within this framework.
We believe that the condition for task distribution derived in the paper can be useful for other scheduling
schemes. Also, the idea of removing structural imbalances in a program will help with a tree-based scheduler that employs the top-most dispatching strategy
[But88, Cald88].
We are currently investigating incorporating traditional task redistribution techniques in order to handle large but highly unevenly shaped search trees. Preliminary results indicate that by allowing limited communication among processors one can substantially improve the efficiency of the execution. Global load balancing, aimed at maximizing the throughput of a system that supports multiple users and multiple queries, is an interesting topic for future research.
6 Conclusion and Future Work
A task scheduling technique, self-organizing scheduling,
is proposed in this paper. The method directs processors
to share the search space, a search tree defined implicitly
by the program, according to universal rules followed by
every processor in the system. Load balance is achieved
by altering the shape of the search tree to remove the
so-called structural imbalance (see section 3), and imposing a statistically even task distribution rule to deal
with the randomness in cut-offs in the tree. Resolution
engines only share the program and the original query. A
condition for task distribution that minimizes the average parallel runtime is given and proved. An advantage
of the method is that it allows all processors to operate independently on private resources both for resolution and task allocation, while being able to maintain a
fairly balanced load distribution among processors. The
effectiveness of the self-organizing scheduling scheme is
independent of the speed of the resolution engine, and
architectural characteristics of the multiprocessor.
References

[Ali90] Ali, K. and Karlsson, R., "The Muse Or-Parallel Prolog Model and its Performance", Proceedings of the North American Conference on Logic Programming, MIT Press, 1990.

[Ali91] Ali, K. and Karlsson, R., "Scheduling Or-Parallelism in Muse", Proceedings of the 8th International Conference on Logic Programming, MIT Press, 1991.

[But88] Butler, R., Disz, T., Lusk, E., Overbeek, R., and Stevens, R., "Scheduling OR-Parallelism: an Argonne Perspective", Logic Programming: Proceedings of the Fifth International Conference and Symposium on Logic Programming, MIT Press, 1988.

[Cald88] Calderwood, A. and Szeredi, P., "Scheduling Or-parallelism in Aurora - the Manchester Scheduler", Proceedings of the Sixth International Conference on Logic Programming, pages 419-435, MIT Press, June 1989.

[Clock88] Clocksin, W. F. and Alshawi, H., "A Method for Efficiently Executing Horn Clause Programs Using Multiple Processors", New Generation Computing, 5, 1988, pp. 361-376, Ohmsha, Ltd. and Springer-Verlag.

[Giul90] Giuliano, M., Kohli, M., Minker, J., and Durand, I., "Prism: A Testbed for Parallel Control", Parallel Algorithms for Machine Intelligence, edited by Kanal, L. and Kumar, V., to appear.

[Kale85] Kale, L. V., "Parallel Architectures for Problem Solving", Technical Report No. UIUCDCS-R-85-1237, Department of Computer Science, University of Illinois at Urbana-Champaign.

[Kumar87] Kumar, V. and Nageshwara Rao, V., "Parallel Depth First Search. Part II. Analysis", International Journal of Parallel Programming, Vol. 16, No. 6, 1987.

[Lloyd84] Lloyd, J. W., "Foundations of Logic Programming", Springer-Verlag, 1984.

[Lusk] Lusk, E., Warren, H. D., Haridi, S., et al., "The Aurora Or-Parallel Prolog System", Argonne internal technical report.

[Mud91] Mudambi, S., "Performance of Aurora on NUMA Machines", Proceedings of the 8th International Conference on Logic Programming, MIT Press, 1991.

[VR90] Van Roy, P. L., "Can Logic Programming Execute as Fast as Imperative Programming", Univ. of California, Berkeley, Technical Report UCB/CSD 90/600, Dec. 1990.
Appendix
We prove the following theorem:
Theorem: Let N be the number of processors, let m
C~ is an integer) be the number of tasks whose sizes are
statistically identical and exhibits the following property:
1. the probability density function is non-increasing, or
2. the probability density function is symmetric with
respect to a positive central point.
then the average parallel runtime is minimized iff identical number of processors are assigned to each of the
tasks.
Before the proof, we describe some basic terminology and notation to be used.

Capital letters X, Y, Z are used for random variables. The probability density function for X is f_X(x), and the cumulative probability distribution function for X is F_X(x); by definition F_X(x) = ∫ from −∞ to x of f_X(t) dt, or, in other words, f_X(x) = F'_X(x). In addition, f_X(x) ≥ 0 and 0 ≤ F_X(x) ≤ 1. F_X(x) is non-decreasing since f_X(x) ≥ 0.

The runtime of a parallel execution is the longest runtime over all processors. Runtime is measured by the size of a task, in our case, the number of nodes to be traversed in a search tree.

N is the number of processors available. T1, T2, ..., Tm are random variables denoting the sizes of m tasks which are statistically identical, that is, with identical probability density and distribution functions f(x) and F(x). Let k1, k2, ..., km be the numbers of processors assigned to T1, ..., Tm, respectively; k1 + k2 + ... + km = N.

We illustrate the proof with the special case m = 2.
Proof:
Let Z be a random variable denoting the runtime obtained by assigning k1 processors to task T1 and k2 processors to task T2. We assume that T1 is processed in time T1/k1 and T2 in time T2/k2:

    Z = max(T1/k1, T2/k2).

The cumulative distribution function for Z is F_Z(x):

    F_Z(x) = probability that Z ≤ x
           = probability that (T1/k1 ≤ x) AND (T2/k2 ≤ x)
           = probability that (T1 ≤ k1·x) AND (T2 ≤ k2·x)
           = F(k1·x) F(k2·x).

The average runtime is the mean of Z. We need to show that the mean of Z is minimized when k1 = k2, given that k1 + k2 = N, a constant.

For fixed k1, k2, define the function G(x) = (F(k1·x) + F(k2·x))/2. We have

    F_Z(x) = F(k1·x) F(k2·x) ≤ G²(x),

since the product of two non-negative numbers is at most the square of their average, given that F(x) is non-negative. Equality holds when k1 = k2.
Case I: the probability density function f(x) is non-increasing.

It can be shown that the curve of F(x) is either of an arch shape or a straight line, as illustrated in Figure 5. The curve of G(x) lies below (or on) that of F(x), because the curve of G(x) is composed of the center points of lines whose two ends are on the curve of F(x). G(x) − F(x) ≤ 0, hence G²(x) − F²(x) = (G(x) − F(x))(G(x) + F(x)) ≤ 0. Therefore,

    ∫ (1 − F_Z(x)) dx  ≥  ∫ (1 − G²(x)) dx  ≥  ∫ (1 − F²(x)) dx,

and equality holds when k1 = k2. Thus the mean of Z is minimized when k1 = k2.

Figure 5: An Arch Shape Distribution
Figure 6: An S Shape Distribution
Case II: the probability density function f(x) is symmetric with respect to a positive center point, denoted by C.

The curve of F(x) is of the shape of an S tilted to the right, as illustrated in Figure 6. The curve of G(x) is another S-shaped curve "contained" in that of F(x). We want to show that

    ∫ (1 − G²(x)) dx ≥ ∫ (1 − F²(x)) dx,

or,

    ∫ (F²(x) − G²(x)) dx ≥ 0.

This is equivalent to showing

    ∫ (F(x) − G(x)) dx ≥ 0,

since

    ∫ (F(x) + G(x)) dx > 0.

Notice that we can no longer have (F(x) − G(x)) ≥ 0 for all x. However, the integral of (F(x) − G(x)) can still be non-negative if we can prove that the shaded area A2 is larger than or equal to A1 in Figure 6. It suffices to show that for any (C − x) and (C + x) on the X axis,

    F(C + x) − G(C + x) ≥ G(C − x) − F(C − x),

and equality holds when k1 = k2.

Observe that (C − x, G(C − x)) is the center point of a line, l1, whose end points are on the curve of F(x), and (C + x, G(C + x)) is the center point of another line, l2, whose end points are on the curve of F(x). Now, rotate the lower part of the S-shaped curve of F(x) by 180 degrees. The two parts of the S match each other, and it can be shown that l1, after the rotation, completely lies above or on l2. Thus,

    F(C + x) − G(C + x) ≥ G(C − x) − F(C − x).

Equality holds when k1 = k2. Proof done for m = 2. □
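As a quick numerical illustration of the theorem for m = 2 (our own check, not part of the paper), one can estimate E[max(T1/k1, T2/k2)] by simulation for task sizes drawn from an exponential distribution, whose density is non-increasing as required by Case I; the even split should give the smallest estimate.

    # Monte Carlo check of the m = 2 case (illustrative only).
    import random

    def expected_runtime(k1, k2, trials=100_000):
        total = 0.0
        for _ in range(trials):
            t1, t2 = random.expovariate(1.0), random.expovariate(1.0)
            total += max(t1 / k1, t2 / k2)
        return total / trials

    N = 8
    for k1 in range(1, N):
        print(k1, N - k1, round(expected_runtime(k1, N - k1), 4))
    # The (4, 4) split should yield the smallest average runtime.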
The same idea can be used to prove the general case.
A formal proof of the general case will not be presented
here, but we note that a property of polygons that is crucial to the proof is that the center of a convex polygon resides inside the polygon.
Acknowledgement: I wish to thank Professor Jack
Minker for his guidance on this work. Thanks to Dr.
Mark Guiliano for his comments on an early draft of this
paper. Also, I would like to express my appreciation
to Argonne National Laboratory for providing parallel
computing facilities.
Asymptotic Load Balance of Distributed Hash Tables
Nobuyuki Ichiyoshi and Kouichi Kimura
Institute for New Generation Computer Technology
1-4-28 Mita, Minato-ku, Tokyo 108, Japan
{ichiyoshi,kokimura}@icot.or.jp
Abstract
The distributed hash table is a parallelization of the hash
table obtained by dividing the table into subtables of
equal size and allocating them to the processors. It can
handle a number of search/insert operations simultaneously, increasing the throughput by up to p times that of
the sequential version, where p is the number of processors. However, in the average case, the peak throughput
is not attained due to load imbalance.
It is clear that the table size m must grow at least
linearly in p to balance the load. In this paper, we
study the rate of growth of m relative to p necessary to
maintain the load balance on the average (or to make
it approach the perfect load balance). It turns out
that linear growth is not enough, but that moderate
growth-namely w(p log2 p )-is sufficient. The probabilistic model we used is fairly general and can be applied
to other load balancing problems.
We a.lso discuss communication overheads, and find
that, in the case of mesh multicomputers, unless the network channel bandwidth grows sufficiently as p grows,
the network will eventually become a performance bottleneck for distributed hash tables.
1 Introduction
Parallel computation achieves speedup over sequential
computation by sharing the computational load among
processors. The load balance between processors is central in determining the parallel runtime (though other
factors also affect performance). Unlike uniform computational tasks in which almost perfect load balance is
achieved by allocating data uniformly to the processors,
non-uniform computational tasks such as search problems pose non-trivial load balancing problems.
In most non-uniform tasks, worst-case computational complexity is far larger than average-case complexity, and the worst case is usually a very rare
case. Thus, the study of average case performance
is important, and it has been conducted for sorting and searching [Knuth 1973], optimization problems [Coffman and Lueker 1991], and many others
[Vitter and Flajolet 1990].
However, there seems to
have been little work on average-case performance analysis in regard to parallel algorithms, especially on
highly-parallel computers, a notable exception being
[Kruskal and Weiss 1985].
In this paper, we study the average-case load balance
of distributed hash tables on highly parallel computers.
A distributed hash table is a parallelization of a hash table, in which the table is divided into subtables of equal
size to be allocated to the processors. It can handle a
number of search/insert operations simultaneously, increasing the throughput up to p times that of the sequential version, where p is the number of processors.
However, in average cases, the peak throughput is not
attained due to load imbalance. Intuitively, the more
buckets allocated to each processor, the better the average load balance becomes. It is clear that under a constant load factor a = n/m (n is the number of elements
in the table, m is the table size), m must grow at least
linearly in p to balance the load. We shall investigate the necessary/sufficient rate of growth of m relative to p so that the load balance factor (the average processor load divided by the maximum processor load) approaches 1 as p → ∞. It turns out that linear growth is not enough, but that moderate growth, namely ω(p log²p), is sufficient. This means that the distributed hash table is a data structure that can exploit the massive computational power of highly parallel computers with problems of a reasonable size.
We also briefly discuss communication overheads on
multicomputers, and find that, in the case of mesh multicomputers, unless the network channel bandwidth grows
sufficiently as p grows, the network will eventually become a performance bottleneck for distributed hash tables.
The rest of the paper is organized as follows. Section 2 describes the distributed hash table and defines the problem we shall analyze. The terminology
of average-case scalability analysis is introduced in Section 3. The analysis of load balance is presented in Section 4. The full proofs of the propositions appear in
[Kimura and Ichiyoshi 1991]. The communication overheads are considered in Section 5. The last section summarizes the paper.
2 Distributed Hash Tables

2.1 Distributed Hash Tables
The distributed hash table is a parallelization of the hash
table. A hash table of size m = pq is divided into subtables of equal size q and the subtables are allocated to p
processors. The two most simple bucket allocations are:
The block allocation
  The k-th bucket (k ≥ 1) belongs to the (⌊(k−1)/q⌋ + 1)-th subtable,¹ and

The modular allocation
  The k-th bucket (k ≥ 1) belongs to the (((k−1) mod p) + 1)-th processor.
At the beginning of a hash operation (search or insert)
for an element x, the hash function is computed for x to
generate a number h (1 ≤ h ≤ m), and the element (or
the key) is dispatched to the processor which contains
the h-th bucket. The rest of the operation is processed
at the target processor.
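The two allocations and the dispatch step can be written down directly; the short sketch below is illustrative only, with an assumed convention that the application-supplied hash function is reduced modulo m to obtain h in 1..m.

    # Block and modular bucket allocations for a table of size m = p*q;
    # buckets and processors are numbered from 1, as in the text.
    def block_processor(h, p, q):
        return (h - 1) // q + 1

    def modular_processor(h, p, q):
        return (h - 1) % p + 1

    def dispatch(x, hash_fn, p, q, allocation=block_processor):
        m = p * q
        h = hash_fn(x) % m + 1            # hash value in 1..m (assumed convention)
        return h, allocation(h, p, q)     # the rest of the operation runs there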
For better performance, it is desirable to maximize the
locality. Thus, when the indirect chaining scheme is employed for hash collision, the entire hash chain for a given
bucket should be contained in the same processor which
contains the bucket. With open addressing, linear probing has the best locality (under the allocation scheme
(1)) but its performance degrades quickly as the load factor increases. Other open addressing schemes have better
sequential performance characteristics [Knuth 1973], but
have less locality. For this reason and also for simplicity of analysis, we choose the indirect chaining scheme.
The bucket allocation scheme does not influence the load
balance analysis in this case.
The absence of a single entry point that can become
a bottleneck makes the distributed hash table a suitable
data structure for highly parallel processing. The peak
throughput increases linearly with the number of processors. The problem is: When does the "real" performance
approach the "peak" performance? When elements are
evenly distributed over the processors, linear growth in
the number of data elements is sufficient for linear growth
in performance. On the other hand, in the worst case,
all elements in the hash table might belong to a single
subtable so that performance does not increase at all.
We are not interested in these two extremes, but in average performance, just as we are more interested in the
average complexity of hash operations in sequential hash
tables rather than worst-case complexity.
¹When p does not divide m, taking q = ⌈m/p⌉ works but it may lead to a sub-optimal load balance (e.g., consider the case m = p + 1). A better load balance can be realized by a mapping
function which is a little more complicated than simple division.
2.2 Problem Definition

There can be a number of uses of hash tables depending on the application. Here we examine the following particular use of the hash table.
Concurrent Data Generation, Search and Insertion
Initially, there is an "old" distributed hash table containing "old elements" and an empty "new" distributed
hash table. The old and new tables are of the same size
m = pq (p is the number of processors and q is the number of buckets assigned to each processor) and use the
same hash function. Also, some "seeds" of new elements
are distributed randomly across the processors.
(1) Concurrent Data Generation
Each processor generates "new elements" from the
allocated seeds. It is assumed that the time it takes
each processor to generate new elements is proportional to the number of generated elements.
(2) Concurrent Data Dispatch
Each processor computes the hash values of the new
elements and dispatches the elements to the target
processors accordingly.
(3) Concurrent Search
Each processor does a search in the old table for
each of the new elements it has received.
(4) Concurrent Insert
Each processor inserts those new elements that are
not found in the old table into the new table. No interprocessor communication arises, because the old
and new hash tables use the same hash function.
The above usage may seem a little artificial, but the
probabilistic model and the analysis for it should be easily applicable to other usages. In the analysis of load
balance, the data dispatch step is ignored (equivalently,
instantaneous communication is assumed). This is discussed in Section 5.
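On each processor, steps (3) and (4) amount to the following sketch (our own, with assumed data structures: a subtable is a list of q buckets, each bucket a list of keys); because the old and new tables share the hash function, the insert stays local, as noted above.

    # Per-processor search-and-insert over the received new elements.
    def bucket_position(h, q):
        return (h - 1) % q                 # position of bucket h inside a subtable

    def search_and_insert(received, old_subtable, new_subtable, q):
        # received: list of (h, key) pairs dispatched to this processor
        for h, key in received:
            j = bucket_position(h, q)
            if key not in old_subtable[j]:     # concurrent search (unsuccessful)
                new_subtable[j].append(key)    # concurrent insert, no communication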
3 Scalability Analysis

Average Speedup and Efficiency  We denote the sequential runtime by T(1) and the parallel runtime using p processors by T(p). The speedup is defined by S(p) = T(1)/T(p), and the efficiency by E(p) = S(p)/p.
(obtained for a particular problem instance) and the
"peak" performance of the parallel computer. In the
absence of speculative computation, the efficiency is less
than or equal to 1.
Since we intend to engage ourselves in an average-case analysis, we need to define the "average speedup" and the "average efficiency".
Definition 1  We define the average collective speedup σ(p) by E(T(1))/E(T(p)) (E(X) denotes the expectation of X) and the average collective efficiency η(p) by σ(p)/p.
The reason why we analyze the above defined average collective speedup, and not the expected speedup in the literal sense, E(T(1)/T(p)), is that: (1) it is much simpler to analyze E(T(1))/E(T(p)) than to analyze E(T(1)/T(p)), and (2) in cases where any average speedup figure is meaningful, our definition is a better indicator of overall speedup. Suppose we run a number of instances I1, I2, ... from some problem class; then the collective speedup defined by Σ_i T(1, I_i) / Σ_i T(p, I_i) (T(1, I_i) and T(p, I_i) are the sequential and parallel runtimes for problem instance I_i) represents the overall speedup. This is more meaningful than any one of the arithmetic mean, geometric mean, or harmonic mean that may be calculated from the individual speedups T(1, I_i)/T(p, I_i).
Scalability Analysis and Isoefficiency  We would like to study the behavior of η(p) as p becomes very large. In general, for a fixed amount of total computation W, η(p) decreases as p increases, because there is only finite parallelism in a fixed problem. On the other hand, in many parallel programs, for a fixed p, η(p) increases as W grows. Kumar and Rao [1987] introduced the notion of isoefficiency: if W needs to grow according to f(p) to maintain an efficiency E, then f(p) is defined to be the isoefficiency function for efficiency E. A rapid rate of growth in the isoefficiency function indicates that near-peak performance of a large-scale parallel computer can be attained only when very large, sometimes unrealistically large, problems are run. Such a parallel algorithm and/or data structure is not suitable for utilizing a large-scale parallel computer. (We will refer to the isoefficiency by this original definition as exact isoefficiency.)
Since it is sometimes impossible to maintain an exact
E because of the discrete nature of the problem, the
following weaker definitions of isoefficiency may be more
suitable or easier to handle.
Asymptotic Isoefficiency  f is an asymptotic isoefficiency function for E if

    lim_{p→∞} η(p) = E   under W = f(p).

Asymptotic Super-Isoefficiency  f is an asymptotic super-isoefficiency function for E if

    lim inf_{p→∞} η(p) ≥ E   under W = f(p).

f is an asymptotic super-isoefficiency function if it is an asymptotic super-isoefficiency function for some E > 0, i.e., the efficiency is bounded away from 0 as p → ∞.

An exact isoefficiency function for E is an asymptotic isoefficiency function for E; and an asymptotic isoefficiency function for E is an asymptotic super-isoefficiency function for E.
In the analysis of load balance, we study the balance of essential computation. Essential computation is
the total computation performed by processors excluding the parallelization overheads. The amount of essential computation is equal to pT(p) minus the total overhead time spent on things such as message handling and
idle time. In the absence of speculative computation, we
can identify the amount of essential computation with
the sequential runtime. 2 The terminology for load balance analysis is defined like that for speedup/efficiency
analysis, except that "essential computation" replaces
"runtime": the total essential computation corresponds
to sequential runtime; maximum processor load corresponds to parallel runtime; and load balance factor3
corresponds to efficiency. We use the same terminology for isoefficiency functions. In the following analysis,
we study asymptotic isoefficiency for 1 and asymptotic
super-isoefficiency. (Since we are not dealing with exact isoefficiency, we drop the adjective "asymptotic" for
brevity.)
4 Analysis of Load Balance

4.1 Assumptions
For the sake of probabilistic analysis, we consider a model in which the following values are treated as random variables (RVs): the number of old and new elements belonging to the j-th bucket on the i-th processor (1 ≤ i ≤ p, 1 ≤ j ≤ q), denoted by A_ij and B_ij respectively, and the number of new elements generated at the i-th processor, denoted by G_i.

First, we make some assumptions on the distributions of these random variables. The two alternative models of hash tables are the Bernoulli model, in which the number of elements n inserted in m buckets is fixed (α = n/m) and the probability that an element has a given hash value is uniformly 1/m, and the Poisson model, in which the occupancy of each bucket is an independent Poisson random variable with parameter α [Vitter and Flajolet 1990]. We choose the Poisson model, because it is simpler to analyze directly, and because, with regard to the distributions of maximum bucket occupancy in which we are interested, those under the Bernoulli model approach those under the Poisson model as m → ∞ [Kolchin et al. 1978].

For a similar reason, we assume that G_i (1 ≤ i ≤ p) are independent identically distributed (i.i.d.) random variables having a Poisson distribution with some parameter γ. It follows that the total number of new elements has a Poisson distribution with parameter pγ, and, by the assumption on the hash function, the B_ij's are i.i.d. random variables having a Poisson distribution with parameter β = pγ/m = γ/q. We assume that the load factors α and β of the old and new hash tables are constant (do not change with p, q).

To summarize, A_ij and B_ij are i.i.d. random variables having a Poisson distribution with parameters α and β, and G_i are i.i.d. random variables with a Poisson distribution with parameter qβ. Note that the G_i's and B_ij's are not independent, because Σ_i G_i = Σ_ij B_ij.

²If we ignore various sequential overheads such as cache misses, process switching, and paging.
³Not to be confused with the load factor of hash tables.
4.2 Essential Computation and Load Balance Factor

Since each data generation is assumed to take the same time, the essential computation of the data generation step is:

    W_gen = Σ_{1≤i≤p} G_i

(ignoring the constant factor).

As for the search step, some searches are successful (the new element is found in the old table) and others are unsuccessful. For simplicity of analysis, we choose a pessimistic estimate of the essential computation and assume that all searches are unsuccessful. We also assume that an unsuccessful search involves comparison of the new element against all the old elements in the bucket. Thus, the number of comparisons made by an unsuccessful search in a bucket with A_ij elements is A_ij + 1 (the number of elements plus one for the hash table slot containing the pointer to the collision chain). Therefore, the essential computation of the search step is:

    W_search = Σ_{1≤i≤p} Σ_{1≤j≤q} (A_ij + 1) B_ij

(again ignoring the constant factor).

We make a similar assumption for the insert step: every insert is done after an unsuccessful search in the new table. Thus, the essential computation of the insert step for bucket j on processor i is:

    Σ_{0≤l≤B_ij−1} (l + 1) = B_ij (B_ij + 1)/2,

and the total essential computation for the insert step is

    W_insert = Σ_{1≤i≤p} Σ_{1≤j≤q} B_ij (B_ij + 1)/2.

Thus, the total essential computation is:

    W(1) = Σ_{1≤i≤p} (W'_i + W''_i + W'''_i),

where W'_i = G_i, W''_i = Σ_{1≤j≤q} (A_ij + 1) B_ij, and W'''_i = Σ_{1≤j≤q} B_ij (B_ij + 1)/2.

The maximum processor load is

    W(p) = max_{1≤i≤p} (W'_i + W''_i + W'''_i).

The average load balance factor η(p) is E(W(1)) / (p E(W(p))). We would like to know what rate of growth of q is necessary/sufficient so that η(p) → 1 as p → ∞.

Since

    E(Σ_{1≤i≤p} (W'_i + W''_i + W'''_i)) = E(Σ_{1≤i≤p} W'_i) + E(Σ_{1≤i≤p} W''_i) + E(Σ_{1≤i≤p} W'''_i)
                                          = p (E(W'_i) + E(W''_i) + E(W'''_i)),

and

    E(max_{1≤i≤p} (W'_i + W''_i + W'''_i)) ≤ E(max_{1≤i≤p} W'_i) + E(max_{1≤i≤p} W''_i) + E(max_{1≤i≤p} W'''_i),

we have

    η(p) ≥ (E(W'_i) + E(W''_i) + E(W'''_i)) / (E(max_{1≤i≤p} W'_i) + E(max_{1≤i≤p} W''_i) + E(max_{1≤i≤p} W'''_i)).

Thus, if

    E(max_{1≤i≤p} W'_i) ∼ E(W'_i),  E(max_{1≤i≤p} W''_i) ∼ E(W''_i),  and  E(max_{1≤i≤p} W'''_i) ∼ E(W'''_i)

(as p → ∞), then η(p) → 1. The above are also necessary conditions, because all three summands are significant as p → ∞.
The random variable Gi , having a Poisson distribution
with parameter qfJ, has the same distribution as the sum
of q i.i.d. random variables Hij (1 ~ j ~ q) with a Poisson distribution with parameter fJ. Thus, we are led to
the study of the average maximum of p sums of q i.i.d.
random variables Wij (1 ~ i ~ p, 1 ~ j ~ q) with a
distribution that does not change with p and q. In our
distributed hash table example, we are interested in the
cases in which each W ij is either a Poisson variable, the
product of two Poisson variables, or a polynomial of a
Poisson variable.
873
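Under these definitions the load balance factor is easy to estimate numerically. The following Python sketch is only an illustration of the formulas above (it is not the simulation program of Section 4.5, and it assumes NumPy is available for Poisson sampling): it draws A_ij and B_ij according to the Poisson model and estimates η(p) from the three per-processor load components W_i', W_i'' and W_i'''.

    import numpy as np

    def estimate_eta(p, q, alpha=4.0, beta=4.0, runs=50, rng=None):
        """Monte Carlo estimate of the load balance factor eta(p).

        A[i, j]: old elements in bucket j on processor i   ~ Poisson(alpha)
        B[i, j]: new elements hashed to bucket j on proc. i ~ Poisson(beta)
        W_i = G_i + sum_j (A_ij + 1) B_ij + sum_j B_ij (B_ij + 1) / 2
        eta(p) ~= E[sum_i W_i] / (p * E[max_i W_i])
        """
        rng = rng or np.random.default_rng(0)
        total, worst = 0.0, 0.0
        for _ in range(runs):
            A = rng.poisson(alpha, size=(p, q))
            B = rng.poisson(beta, size=(p, q))
            G = B.sum(axis=1)                        # data generation step
            search = ((A + 1) * B).sum(axis=1)       # unsuccessful searches
            insert = (B * (B + 1) // 2).sum(axis=1)  # inserts into the new table
            W = G + search + insert                  # per-processor essential work
            total += W.sum()
            worst += W.max()
        return (total / runs) / (p * (worst / runs))

    if __name__ == "__main__":
        for p in (4, 64, 1024):
            q = max(1, int(np.log2(p)) ** 2)         # q = (lg p)^2
            print(p, q, round(estimate_eta(p, q), 3))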
4.3 Average Maximum of Sum of i.i.d. Random Variables

We give sketches of the proofs or cite the results. The details are presented in [Kimura and Ichiyoshi 1991].

4.3.1 Poisson Variable

The asymptotic distribution of the maximum bucket occupancy has been analyzed by Kolchin et al. [1978]. The following is the result as cited in [Vitter and Flajolet 1990].
Theorem 1 (Kolchin et al.) If X_i (1 ≤ i ≤ p) are i.i.d. random variables having a Poisson distribution with parameter μ, the expected maximum bucket occupancy M_μ = E(max_{1≤i≤p} X_i) satisfies M_μ ~ μ if μ = ω(log p), and M_μ ~ b̂ if μ = o(log p), where b̂ is an integer greater than μ such that

    e^(−μ) μ^(b̂+1) / (b̂+1)!  <  1/p  ≤  e^(−μ) μ^b̂ / b̂!.

When μ = Θ(1), b̂ ~ log p / log log p.

The proof is based on the observation that, as p becomes large, P{M_μ > b} as a function of b approaches the step function having value 1 for b smaller than b̂ and 0 for b larger than b̂, and the expectation of M_μ is equal to its summation from b = 0 to b = ∞.

We extend Kolchin's theorem to the product of Poisson variables and polynomials of a Poisson variable.

4.3.2 Product of Two Poisson Variables

We introduce a partial order on the class M of non-negative random variables with a finite mean.

Definition 2 For X, Y ∈ M, we define X ≺ Y iff E(max{X, c}) ≤ E(max{Y, c}) for all c ≥ 0. ∎

There are a number of natural properties concerning this partial order. For example, if X ≺ Y and Z is independent of X, Y, then X + Z ≺ Y + Z, XZ ≺ YZ, max{X, Z} ≺ max{Y, Z}, etc. Note that X ≺ Y_1 and X ≺ Y_2 do not imply 2X ≺ Y_1 + Y_2. The utility of ≺ in analyzing the expected maximum is illustrated by the following lemma.

Lemma 1 Let X_i (1 ≤ i ≤ p) and Y_i (1 ≤ i ≤ p) be i.i.d. random variables distributed as X and Y. If X ≺ Y, then

    E(max_{1≤i≤p} X_i) ≤ E(max_{1≤i≤p} Y_i).

SKETCH OF PROOF: max{X_1, X_2, ..., X_p} ≺ max{Y_1, X_2, ..., X_p} ≺ ... ≺ max{Y_1, ..., Y_p}. □

For the convex sum of i.i.d. variables, we have the following lemma.

Lemma 2 Let X_i (1 ≤ i ≤ p) be i.i.d. random variables distributed as X. For all a_i ≥ 0 (1 ≤ i ≤ p) such that a_1 + ... + a_p = 1, a_1 X_1 + ... + a_p X_p ≺ X.

SKETCH OF PROOF: Let a_1, a_2 ≥ 0 and a_1 + a_2 = 1. For arbitrary c ≥ 0, max{a_1 X_1 + a_2 X_2, c} + max{a_1 X_2 + a_2 X_1, c} ≤ max{X_1, c} + max{X_2, c}. The expectation of the left hand side is equal to 2 E(max{a_1 X_1 + a_2 X_2, c}), and that of the right hand side is

    E(max{X_1, c} + max{X_2, c}) = E(max{X_1, c}) + E(max{X_2, c}) = 2 E(max{X, c}).

Thus, a_1 X_1 + a_2 X_2 ≺ X. The case for p > 2 can be reduced to p − 1 using the above. □

Finally, the following lemma gives an upper bound on the sum of the product of two sets of i.i.d. random variables.

Lemma 3 Let X_i (1 ≤ i ≤ rs) and Y_i (1 ≤ i ≤ rs) be i.i.d. random variables. We have

    X_1 Y_1 + ... + X_{rs} Y_{rs} ≺ (X_1 + ... + X_r)(Y_1 + ... + Y_s).

SKETCH OF PROOF: We can prove X_1 Y_1 + ... + X_t Y_t + Z ≺ X_1 (Y_1 + ... + Y_t) + Z (Z independent of the X_i's and Y_i's) by conditioning on Z and using Lemma 2. By repeatedly "collecting" the X_ij Y_ij's and replacing them with the bracketed terms, we have the desired result. □

Theorem 2 Let X_ij and Y_ij (1 ≤ i ≤ p, 1 ≤ j ≤ q) be i.i.d. having a Poisson distribution with parameters α and β, respectively. If q = ω(log² p), then

    E(max_{1≤i≤p} Σ_{1≤j≤q} X_ij Y_ij) ~ qαβ    (as p → ∞).

SKETCH OF PROOF: Let q = r². Then

    E(max_{1≤i≤p} (X_i1 Y_i1 + ... + X_iq Y_iq))
      ≤ E(max_{1≤i≤p} (X_i1 + ... + X_ir)(Y_i1 + ... + Y_ir))
      ≤ E(max_{1≤i≤p} (X_i1 + ... + X_ir)) E(max_{1≤i≤p} (Y_i1 + ... + Y_ir))

by Lemmas 2 and 3. The sum of r i.i.d. Poisson variables with parameter α is distributed as a Poisson variable with parameter rα. Thus, if r = ω(log p), then

    E(max_{1≤i≤p} (X_i1 + ... + X_ir)) ~ rα = E(X_11 + ... + X_1r)

by Kolchin's theorem. The same holds for the sums of the Y_ij. This is what we needed. □
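As a quick sanity check of Theorem 1, the following sketch (illustrative only; it assumes NumPy and uses plain Monte Carlo estimation) compares E(max of p i.i.d. Poisson(μ) variables) with μ and with the log p / log log p growth rate for a fixed μ.

    import numpy as np

    def expected_max_poisson(p, mu, runs=200, rng=None):
        """Monte Carlo estimate of E[max of p i.i.d. Poisson(mu) variables]."""
        rng = rng or np.random.default_rng(1)
        return float(np.mean([rng.poisson(mu, size=p).max() for _ in range(runs)]))

    if __name__ == "__main__":
        mu = 4.0
        for p in (16, 256, 4096, 65536):
            m = expected_max_poisson(p, mu)
            guide = np.log(p) / np.log(np.log(p))   # growth rate for fixed mu
            print(f"p={p:6d}  E[max]={m:6.2f}  mu={mu}  log p/log log p={guide:5.2f}")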
4.3.3 Polynomial of Poisson Variable

The treatment of upper bounds on the expected maximum of the sums of a polynomial of i.i.d. random variables is more involved. We only list the result.

Theorem 3 Let X_ij (1 ≤ i ≤ p, 1 ≤ j ≤ q) be i.i.d. having a Poisson distribution with parameter α, and let c(X) be a polynomial of degree d > 0 with non-negative coefficients. If q = ω(log^d p), then

    E(max_{1≤i≤p} Σ_{1≤j≤q} c(X_ij)) ~ q c*(α)    (as p → ∞),

where c(X) = a_d X^(d) + ... + a_1 X^(1) + a_0 and c*(X) = a_d X^d + ... + a_1 X + a_0 (X^(k) = X(X−1)···(X−k+1) is the falling power of X).

As for corresponding lower bounds on the necessary growth rate of q, we only know at present that if q = o((log p / log log p)²), the ratio between the expected maximum and the mean tends to ∞ as p → ∞.
4.4 The Isoefficiency for Load Balance

Now, let us suppose q = ω(log² p). Then,

    E(max_{1≤i≤p} W_i') ~ E(W_1')

is immediate from Kolchin's theorem. Also,

    E(max_{1≤i≤p} W_i'') ~ E(W_1'')

by Kolchin's theorem and the proposition for the product of two Poisson variables. Finally, since X(X + 1)/2 is a polynomial of degree 2,

    E(max_{1≤i≤p} W_i''') ~ E(W_1''')

if q = ω(log² p).

We have shown that if q = ω(log² p), the average collective load balance factor η(p) → 1 as p → ∞. Therefore, W = Θ(pq) = ω(p log² p) is a sufficient condition for isoefficiency for load balance.

4.5 Simulation

A simple simulation program was run to test the applicability of the asymptotic analysis for p up to 4096. Fig. 1 shows the results for α = β = 4, p = 4, 16, 64, 256, 1024, and 4096, and q = 1, lg p, lg² p, and lg³ p (lg denotes the logarithm with base 2). The experimental load balance factors (on the vertical axis) are plotted against the number of processors (on the horizontal axis). The experimental load balance factor η_{p,q} for p, q is calculated by

    η_{p,q} = E(Σ_{1≤j≤q} W_1j) / E(max_{1≤i≤p} Σ_{1≤j≤q} W_ij),

where W_ij is one of X_ij, X_ij Y_ij and X_ij^(2) ((a), (b) and (c), respectively, in the figure), and the average max_{1≤i≤p} Σ_{1≤j≤q} W_ij is calculated from the result of 50 simulation runs.

The X_ij and Y_ij are generated according to the Bernoulli model (i.e., a table X[1..pq] is prepared, and n = pqα random numbers x with x ≥ 0 are generated, each x going to the ((x mod pq) + 1)-th table entry, etc.). The coefficient of variation (the ratio of standard deviation to average) of max_{1≤i≤p} Σ_{1≤j≤q} W_ij is larger for X^(2) and XY than for X, and it decreases as p becomes larger or q becomes larger. Table 1 gives the coefficients of variation for p = 64 and 4096.

By and large, the results seem to confirm the asymptotic analysis. For the product and the second falling power, Θ(log² p) appears to be a sufficient rate of growth of q for η to converge to 1. Even logarithmic growth (q = lg p) does not lead to very poor load balance factors, at least up to p = 4096 (approx. 0.5 for XY and approx. 0.4 for X^(2)).

5 Communication Overheads

We briefly discuss the communication overheads when distributed hash tables are implemented on multicomputers. A multicomputer (also referred to as a distributed-memory computer or a message-passing parallel computer) consists of p identical processors connected by some interconnection network. On such computers, the time it takes to transfer a message of length L (in words) from one processor to another which is D hops away, in the absence of network contention⁴, is t_s + t_h D + t_w L, where t_s is the constant start-up time, t_h is the per-hop time, and t_w is the per-word communication time. We choose the mesh architecture for consideration (two-dimensional square meshes in particular) since many of the recent "second generation" multicomputers have such topologies. Examples include the J-Machine, the Intel Paragon, and the parallel inference machines Multi-PSI and PIM/m.

⁴Communication latency in the absence of network contention is called zero-load latency.
[Figure 1: Experimental Load Balance Factors (α = β = 4). Load balance factor versus number of processors for (a) X, (b) XY, and (c) X(X−1), with q = 1, q = lg p, and q = (lg p)².]
Table 1: Coefficients of Variation of Maximum Load (α = β = 4), p = 64

            q = 1     q = 6     q = 36
  X         11.0%     6.3%      2.6%
  XY        17.8%     12.2%     5.0%
  X^(2)     24.8%     13.1%     6.0%
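The experimental load balance factor η_{p,q} of Section 4.5 can be reproduced approximately with a short program. The sketch below is a simplified re-implementation (it assumes NumPy and samples the W_ij directly from Poisson variables instead of building the Bernoulli table used in the original experiment), averaging max_i Σ_j W_ij over 50 runs for the three choices of W_ij.

    import numpy as np

    def eta_pq(p, q, kind="X", alpha=4.0, beta=4.0, runs=50, rng=None):
        """Experimental load balance factor for W_ij in {X, XY, X^(2)}."""
        rng = rng or np.random.default_rng(2)
        maxima = []
        for _ in range(runs):
            X = rng.poisson(alpha, size=(p, q))
            if kind == "X":
                W = X
            elif kind == "XY":
                W = X * rng.poisson(beta, size=(p, q))
            else:                          # "X2": second falling power X(X-1)
                W = X * (X - 1)
            maxima.append(W.sum(axis=1).max())
        # analytic mean of one processor's load, used as the numerator
        mean_load = q * {"X": alpha, "XY": alpha * beta, "X2": alpha ** 2}[kind]
        return mean_load / np.mean(maxima)

    if __name__ == "__main__":
        for p in (64, 1024, 4096):
            q = int(np.log2(p)) ** 2
            print(p, q, [round(eta_pq(p, q, k), 2) for k in ("X", "XY", "X2")])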
We note that the average traveling distance of a random message (a message from a randomly chosen processor i to another randomly chosen processor i', allowing i = i') is (2/3)(√p − 1/√p) ≈ (2/3)√p on the meshes, which is roughly 1/3 of the diameter of the network, 2(√p − 1). We can easily see that W = Ω(p^{3/2}) is a necessary and sufficient condition for super-isoefficiency due to zero-load latency, which is a situation worse than that due to load imbalance.

In real networks, the impact of message collisions must be taken into account. Instead of estimating the time required for data dispatch using a precise model of contention, we compare the amount of traffic generated by random communication with the capacity of the network. The traffic of a message is defined by the product of its traveling distance and its length. It indicates how much network resource (measured by channels × network cycle time) the message consumes. The capacity of a network is defined by the sum of the bandwidth of all network channels (channels that connect routers). It indicates the peak throughput of the network. The basic fact is that the time required for completely delivering a set of messages is at least M/C, where M is the total message traffic and C is the capacity of the network.
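These quantities are simple to compute for a concrete configuration. The sketch below is a back-of-the-envelope model only; the parameter values in the example are hypothetical and do not describe any of the machines mentioned above.

    import math

    def zero_load_latency(ts, th, tw, hops, length):
        """Latency of one message of `length` words over `hops` hops, no contention."""
        return ts + th * hops + tw * length

    def mesh_stats(p):
        """Average random-message distance and diameter of a sqrt(p) x sqrt(p) mesh."""
        side = math.isqrt(p)
        avg = (2.0 / 3.0) * (side - 1.0 / side)     # ~ (2/3) sqrt(p)
        return avg, 2 * (side - 1)

    def delivery_lower_bound(messages, length, p, channel_bw):
        """Lower bound M / C: total traffic over aggregate channel bandwidth."""
        avg_dist, _ = mesh_stats(p)
        traffic = messages * length * avg_dist      # words x hops
        side = math.isqrt(p)
        channels = 2 * 2 * side * (side - 1)        # assume bidirectional 2-D mesh links
        return traffic / (channels * channel_bw)

    if __name__ == "__main__":
        ts, th, tw = 10.0, 1.0, 0.5                 # hypothetical parameters
        p = 4096
        avg_dist, diam = mesh_stats(p)
        print("avg distance", round(avg_dist, 1), "diameter", diam)
        print("latency", zero_load_latency(ts, th, tw, avg_dist, 8))
        print("M/C bound", delivery_lower_bound(p * 100, 8, p, 1.0))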
The average traffic generated by ...
append ]") define an algebra (introduced in t~e next
section) whose terms are the states of a CHARM.
The dynamic behaviour of a CHARM is given
by a collection of rewrite rules. Every rewrite rule R : S → S' maps its left-hand side S = (G, L) to its right-hand side S' = (G, L'), both having the same global part G, which is also called the global part of R. The graphical representation of a rewrite rule can be seen in Figure 1. The idea is that L can be cancelled and L' can be generated provided that G is present. Thus our notion of rewriting is context-dependent, the global part of a rule playing the role
of the context. It is worth stressing that the global
part G is not affected by the application of R, but
it is simply tested for existence. Although this goes
beyond the scope of this paper, this fact should allow
us to define a satisfactory truly concurrent semantics
for CHARM's, since it minimizes the causal dependencies among rewrite rules.
[Figure 1: A rewrite rule.]
Intuitively, the global part G of R contains those
items (processes and variables) which are needed for
the transformation of the state to take place, but
which are not changed by the rewrite rule. For example, we may want to do some operation only if
some data structure contains some given information. In this case, the data structure is considered to
be global and thus it is not affected by the rewrite
rule. It is important at this point to notice that, unlike our approach, many transition systems or abstract machines proposed in the literature (like Petri nets [Reisig 1985] and the Chemical Abstract Machine [Berry and Boudol 1990]) cannot distinguish between the situation where some item is preserved by a rewrite rule and the one where the same item is cancelled and then generated again. For example, the rule "a rewrites to b only if c is present" must be represented in those formalisms as {a, c} → {b, c}, which also represents the rule "a and c rewrite to b and c".
On the other hand, some other formalisms explicitly consider the issue of context-dependent rewriting, and allow one to formally indicate which items
should be present for the application of a rule, but
are not affected by it. For example, in the algebraic approach to graph grammars [Ehrig 1979],
the role of the context is played by the so-called
"gluing graph" of a graph production, while in
concurrent constraint programming [Saraswat 1989]
[Saraswat and Rinard 1990] [Saraswat et al. 1991],
the items the presence of which must be tested are
explicitly mentioned by the use of the "ask" primitive. The relationship between the CHARM and
these two formalisms will be explored deeply in later
sections of this paper.
A rewrite rule R from S = (G,L) to S' = (G,L')
can be thought of as modelling the evolution of a
(small) subsystem, represented by its left hand side
S. To apply this rule to a given state Q = (G Q , L Q ),
one first has to find an occurrence of S in Q, i.e.,
a subsystem of Q "isomorphic" to S (as we shall
see later, this requirement will be relaxed in the formal definitions). Following the usual intuition about
structured systems, it is evident that all the items
which are local for a part of a system are local for the
whole system as well, while items which are global
for a subsystem can be either global or local for any
enclosing system. In the application of a rule like
the one above, this observation is formalized by requiring that the occurrence of Lin Q is contained in
its local part L Q •
The application of R to a system Q yields a new
system Q' = (GQ"LQ'), where GQ' = GQ, i.e., the
global part remains unchanged, and LQ' = (LQ \
L) U L'. In words, the local part of the new state
coincides with the local part of the old one, except
that the occurrence of L has been replaced by an
occurrence of L'. Thus, the part oHhe state Q which
is preserved by the application of R is partitioned in
two parts: the occurrence of G, which is necessary
to apply R, and the rest, which does not take part
in the rewriting. The graphical representation of the
application of R to Q can be seen in Figure 2.
The fact that the application of a rewrite rule preserves the global part of the state can be justified by
interpreting the items contained in the global part
as an interface for a possible composition with other
states. Thus, such an interface cannot be modified
by any rewrite rule, since a rewrite rule is local to
the rewritten state. Notice that, as a consequence of
the above considerations, a closed system, i.e., a system which is not supposed to be composed further,
is represented by a state with no global part.
[Figure 2: The application of a rewrite rule.]
The above construction describes the application
of a single rewrite rule of a CHARM to a state. However, this mechanism is intrinsically concurrent, in
the sense that many rewrite rules may be applied in
parallel to a state, provided that their occurrences
do not interfere. In particular, if the occurrences
of the rules are pairwise disjoint, we have a degree of parallelism which is supported also by many
other models of concurrent computation, like Petri
nets [Reisig 1985], the Chemical Abstract Machine
[Berry and Boudol 1990], and the concurrent rewriting of [Meseguer 1990]. However, our approach provides a finer perception of the causal dependencies among rewrite rules, because rules whose occurrences in a state are not disjoint but intersect only on
their global parts can be considered not to depend
on each other, and thus can be applied concurrently.
This fact reflects the intuition, since such rules interact only on items which are preserved by all of them.
This corresponds to what is called "parallel independence" of production applications in the algebraic
theory of graph grammars [Ehrig 1979], which can in
fact be faithfully implemented within the CHARM
framework, as we will see in Section 4.
From a technical point of view, the application of one or more rules of a CHARM is
modelled by extending the algebra of states to
the rules (for similar approaches in the case
of Petri nets or structured transition systems
see [Meseguer 1990], [Corradini et al. 1990] and
[Meseguer and Montanari 1990]). As we shall see in
Section 3, this is possible because each rule has an
associated global part, just like states. The resulting
algebra, called the algebra of transitions, contains, as elements, all the rewrite rules of the abstract machine, an identity rule S : S → S for each state S, and it is closed w.r.t. parallel composition, hiding, and substitution operations. The left and the right-hand sides of a transition (i.e., of an element of the algebra of transitions) are easily obtained by structural induction from its syntax. For example, if R : S → Q and R' : S' → Q' are two rewrite rules, then R | R' : S | S' → Q | Q' is a new parallel transition. Like rewrite rules, transitions preserve the
global part of the state they are applied to.
In this paper we will assume that the algebra of
transitions is freely generated by the set of rewrite
rules defining a CHARM, and by the identity rules
for all states. As a consequence of this fact, if a
transition can be applied to a state S, then it can be applied to any state containing S as well. Informally, this can be considered as a meta-rule governing the
behaviour of a CHARM, and directly corresponds
to the so-called "membrane law" of the Chemical
Abstract Machine [Berry and Boudol 1990].
Although the choice of a free algebra of transitions
is satisfactory for the formalisms treated in this paper, more general kinds of algebras would be needed
in order to deal with other formalisms, like for example process description languages [Milner 1989]
[Hoare 1985]. In fact, some features of those languages (e.g., the parallel composition of agents with synchronization in the presence of restriction, and the description of atomic sequences of actions, useful to provide a low-level implementation of the non-deterministic choice operator "+" [Gorrieri et al. 1990] [Gorrieri and Montanari 1990]) cannot be modelled adequately within a free algebra of transitions. Nevertheless, as shown in [Ferrari 1990] and
[Gorrieri and Montanari 1990] respectively, both
those aspects can be faithfully modelled in an algebraic framework by labelling transitions with observations which include an error label and by specifying suitable algebraic theories of computations where
the atomic sequences are basic operators. Thus we
are confident that, although this goes beyond the
scope of this paper, these topics could be fruitfully addressed in the algebraic framework introduced here, by slightly generalizing the construction
of the algebra of transitions of CHARM's presented
in the next section.
A computation of a CHARM is a sequence of transitions, starting from a given initial state. Since each
transition preserves the global part of its left-hand
side state, the final state of a computation has the
same global part as the initial state. Thus every
computation is naturally associated with a global
part as well. As for transitions, this will allow us to define an algebra of computations, having the same operations of the algebra of states, plus a sequential composition operation denoted by ";". The elements of the algebra of computations are subject to the same axioms as for states, plus some axioms stating that all the operations distribute over sequential composition. Thus we have a rich language of computations, where some computations can be proved to be equivalent by using the axioms.
The interesting fact is that the algebra of computations allows one to relate the global evolution of
a closed system to the local behaviour of its subsystems. For example, suppose to consider the closed
system P = (S | S')\x, where the two subsystems S and S' cooperate through the common global variable x, which is hidden by the use of the \x operator. Furthermore, consider the computations ρ : S ⇒ Q and ρ' : S' ⇒ Q' for S and S' respectively. Then, by using the algebra of computations it is possible to construct the computation σ = (ρ | ρ')\x which models the evolution of the closed system P, i.e., σ : P ⇒ P', where P' = (Q | Q')\x.

The algebra of computations provides also some
basic mechanisms which should allow one to model process synchronization. In fact, consider for example the two computations σ = (ρ | ρ')\x and σ' = (ρ\x) | (ρ'\x). Now, σ ≠ σ', since ρ and ρ' can synchronize through the common variable x in σ, but not in σ'.

Another relevant advantage of the definition of the algebra of computations of a CHARM consists of the possibility of providing a truly concurrent semantics in a natural way. In fact, computations differing only in the order in which independent rewrite rules are applied fall within the same equivalence class. For example, considering again the computation σ' introduced above, we have that σ' = (ρ\x) | (ρ'\x) = ((ρ\x) | S') ; (Q | (ρ'\x)), where S' and Q stand for the identity computations on such states. This means that, since ρ\x and ρ'\x are independent, they can be performed either in parallel or sequentially, and the two resulting computations are equivalent.
With each equivalence class of computations it
is possible to associate a partial ordering, recording the causal dependencies among the rewrite rules
used in the computations. For the two formalisms
we shall consider in this paper (i.e., graph grammars and concurrent constraint programming) the
truly concurrent semantics obtained via their translation to a CHARM is significant. In fact, it is
possible to show that some of the classical results
about concurrency and parallelism in graph grammars directly derive from the axioms of our algebra.
Also, the truly concurrent semantics proposed in [Montanari and Rossi 1991] for the concurrent constraint programming framework coincides with the one induced by its compilation into a CHARM. However, the true concurrency aspects go beyond the
scope of this paper.
3
Formal Definitions
In this section we present the formal description of a CHARM, following the outline of the informal presentation given in the previous section. After introducing the algebra of states, a CHARM will be defined as a collection of rewrite rules over this algebra which preserve the global part of a term. Next we will introduce the algebra of transitions and the algebra of computations of a CHARM, respectively.

The states of a CHARM are going to be represented by the terms of an algebra S, which is parametric w.r.t. a fixed pair of disjoint infinite collections (P, V), called process instances and variables respectively. The terms of this algebra are subject
to the axioms presented below in Definition 3.
Definition 1 Let P be a set of process instances (ranged over by p, q, ...), and V be a set of variables (ranged over by v, z, ...). Each x ∈ (P ∪ V) is called an item. The algebra of states S is the algebra having as elements the equivalence classes of terms generated by the following syntax, modulo the least equivalence relation induced by the axioms listed in Definition 3:

    S ::= 0 | v | p(v1, ..., vn) | S | S | S[Φ] | S\x

where v, v1, ..., vn ∈ V and p ∈ P; "|" is called parallel composition; Φ is a (finite domain) substitution, i.e., a function Φ : (P ∪ V) → (P ∪ V) such that Φ(V) ⊆ V, Φ(P) ⊆ P, and such that the set of items for which x ≠ Φ(x) is finite; and x is an item (\x is called a hiding operator). Terms of the form 0, v, or p(v1, ..., vn) are called atoms. ∎
Intuitively, 0 is the empty system, v is the system containing only one variable, and p(v1, ..., vn) represents a system with one process which has access to n variables. The term S1 | S2 represents the composition of system S1 and system S2, S[Φ] is the system obtained from state S by renaming its items, and S\x is the system which coincides with system S except that item x is local.
Definition 2 Given a term S ∈ S, its set of free items F(S) is inductively defined as F(0) = ∅; F(v) = {v}; F(p(v1, ..., vn)) = {p, v1, ..., vn}; F(S1 | S2) = F(S1) ∪ F(S2); F(S[Φ]) = Φ(F(S)) = {Φ(x) | x ∈ F(S)}; and F(S\x) = F(S) \ {x} if x ∈ F(S) and F(S\x) = F(S) otherwise. A term S is closed iff F(S) = ∅. A term is concrete if it does not contain any hiding operator. A term is open if no variable appearing in the term is restricted. Formally, all atoms are open; S1 | S2 is open if both S1 and S2 are open; S[Φ] is open if S is open; and S\x is open if S is open and x ∉ F(S). Clearly, all concrete terms are open. ∎
The free items of a term S are the process instances and the variables of the global part of the system represented by S. Thus a closed term represents a system with no global part, while an open term corresponds to a system where everything is global. The above interpretation of the operators of the algebra of states is supported by the following axioms, which determine when two terms are equivalent, i.e., represent the same system.
Definition 3 The terms of algebra S introduced in Definition 1 are subject to the following conditional axioms.

ACI: (S1 | S2) | S3 = S1 | (S2 | S3);  S1 | S2 = S2 | S1;  S | 0 = S

ABS: p(v1, ..., vn) | vi = p(v1, ..., vn), for 1 ≤ i ≤ n;  S | S = S, if S is open

COMP: (ρ; σ)[Φ] = ρ[Φ]; σ[Φ];  (ρ; σ)\x = ρ\x; σ\x.
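To make the algebra of states and the rule-application mechanism more tangible, here is a small Python sketch of our own (not part of the CHARM formalization): a state is a pair of multisets of items (global part, local part), and a rule applies when its global part occurs anywhere in the state and its left-hand local part occurs in the state's local part.

    from collections import Counter

    class Rule:
        """Rewrite rule R: (G, L) -> (G, L'); the global part G is only tested."""
        def __init__(self, glob, left, right):
            self.glob, self.left, self.right = Counter(glob), Counter(left), Counter(right)

    def applicable(rule, state_glob, state_loc):
        # G must occur somewhere in the state; L must occur in the local part.
        present = state_glob + state_loc
        return not (rule.glob - present) and not (rule.left - state_loc)

    def apply_rule(rule, state_glob, state_loc):
        if not applicable(rule, state_glob, state_loc):
            raise ValueError("rule does not apply")
        # Local part: remove the occurrence of L, add L'; global part is unchanged.
        return state_glob, (state_loc - rule.left) + rule.right

    if __name__ == "__main__":
        # "a rewrites to b only if c is present": G = {c}, L = {a}, L' = {b}
        r = Rule(glob=["c"], left=["a"], right=["b"])
        g, l = Counter(), Counter(["a", "c"])       # a closed state with local items a, c
        print(apply_rule(r, g, l))                  # c is preserved, a is replaced by b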
4
Modelling graph grammars
The "theory of graph grammars" studies a variety
of formalisms which extend the theory of formal languages in order to deal with structures more general
than strings, like graphs and maps. A graph grammar allows one to describe finitely a (possibly infinite) collection of graphs, i.e., those graphs which
can be obtained from an initial graph through repeated application of graph productions. In this
section we shortly show how to translate a graph
grammar into a CHARM which faithfully implements its behaviour. Because of space limitations,
the discussion will be very informal: a more formal presentation of this translation can be found in
[Corradini and Montanari 1991].
Following the so-called algebraic approach to graph grammars [Ehrig 1979], a graph production p = (L ← K → R) is a pair of graph monomorphisms having as common source a graph K, the gluing graph, indicating which edges and nodes have to be preserved by the application of the production. Throughout this section, by graph we mean an unlabelled, directed hypergraph, i.e., a triple G = (N, E, c), where N is a set of nodes, E is a set of edges, and c : E → N* is the connection function (thus each edge can be connected to a list of nodes). Production p can be applied to a graph G yielding H (written G ⇒_p H) if there is an occurrence (i.e., a graph morphism) g : L → G, and H is obtained as the result of the double pushout construction of Figure 3.
[Figure 3: Graph rewriting via double pushout construction (top row L ← K → R, bottom row G ← D → H).]
This construction may be interpreted as follows. In order to delete the occurrence of L in G, we construct the "pushout complement" of g and l, i.e., we have to find a graph D (with morphisms k : K → D and d : D → G) such that the resulting square is a "pushout". Intuitively, graph G in Figure 3 is the pushout object of morphisms l and k if it is obtained from the disjoint union of L and D by identifying the images of K in L and in D. Next, we have to embed the right-hand side R in D via a second pushout, which produces graph H. In this case we say that there is a direct derivation from G to H via p.

A graph rewriting system is a set 𝓡 of graph productions. A derivation from G to H over 𝓡 (shortly G ⇒_𝓡 H) is a finite sequence of direct derivations of the form G ⇒_{p1} G_1 ⇒_{p2} ... ⇒_{pn} G_n = H, where p_1, ..., p_n are in 𝓡.
To define the CHARM which implements a given graph rewriting system, we have to define the sets of process instances and of variables (see Definition 1). Quite obviously, we can regard a graph as a distributed system where the edges are processes and the nodes are variables. Thus we consider a CHARM over the pair of sets (E, N), which are two collections including all edges and all nodes, respectively. The precise relationship between the algebra of states of such a CHARM and the graphs introduced above has been explored in [Corradini and Montanari 1991]. It has been shown there that concrete terms of such an algebra (i.e., terms without hiding operators, see Definition 2) faithfully model finite graphs, i.e., if FGraph is the collection of all finite graphs and CS is the sub-algebra of concrete terms of S, there are injective functions Gr : CS → FGraph and Tm : FGraph → CS such that Gr(Tm(G)) ≅ G for each graph G. Furthermore, well-formed terms (see Definition 5) model in a similar way "partially abstract graphs", i.e., suitable equivalence classes of graph monomorphisms, where the target graph is defined up to isomorphism. For our goals, it is sufficient to introduce the function WfT which associates a well-formed term with each graph monomorphism.
Definition 9 Let G = (N, E, c) be a graph, with N = {n_i}_{i≤m}, E = {e_i}_{i≤r}, and c(e_i) = n_{i1}· ... ·n_{ik_i} for all 1 ≤ i ≤ r. Then the concrete term representing G is defined as

    Tm(G) = n_1 | ... | n_m | e_1(n_11, ..., n_{1k_1}) | ... | e_r(n_{r1}, ..., n_{rk_r}) | 0.

Let h : G ↪ H be a graph monomorphism. Then the well-formed term representing h is defined as

    WfT(h) = (Tm(H)\x_1\ ... \x_n)[h⁻¹]

where {x_1, ..., x_n} is the set of items of H which are not in the image of G through h, and h⁻¹ improperly denotes the substitution such that h⁻¹(y) = x if h(x) = y, and h⁻¹(y) = y otherwise (which is well defined because h is injective). ∎
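As a small illustration of Definition 9 (a hypothetical helper, not taken from [Corradini and Montanari 1991]), the sketch below builds the concrete term Tm(G) for a finite hypergraph given as a list of node names and a map from edge names to the lists of nodes they connect.

    def tm(nodes, edges):
        """Concrete term Tm(G): the nodes, then each edge applied to the nodes it
        connects, then 0.  nodes: list of names; edges: dict edge -> list of nodes."""
        atoms = list(nodes)
        atoms += [f"{e}({', '.join(ns)})" for e, ns in edges.items()]
        atoms.append("0")
        return " | ".join(atoms)

    if __name__ == "__main__":
        # a graph with nodes n1, n2 and a binary edge e1 connecting n1 and n2
        print(tm(["n1", "n2"], {"e1": ["n1", "n2"]}))
        # -> "n1 | n2 | e1(n1, n2) | 0"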
From the last definition it can be checked that the global part of the well-formed term representing a monomorphism h : G ↪ H is equivalent to the concrete term representing G, i.e., Tm(G). Using this observation, and since a graph production is a pair of graph monomorphisms with common domain,
it is easy to associate a CHARM rewrite rule (in the sense of Definition 6) with each graph production.

Definition 10 Let 𝓡 be a graph rewriting system. For each graph production p = (L ← K → R) in 𝓡, its associated rewrite rule M(p) is defined as M(p) : WfT(l) → WfT(r). The CHARM implementing 𝓡 is defined as M(𝓡) = {M(p_i) | p_i ∈ 𝓡}. ∎
In order to correctly relate the operational behaviours of a graph rewriting system 𝓡 and of its associated CHARM M(𝓡), we have to take care of the translation of the starting graph of a derivation into a term. In fact, if G is such a graph, it would not be sound to take as starting state of M(𝓡) the concrete term Tm(G). Indeed, we must observe that the graph derivations informally introduced above are defined up to isomorphism, i.e., if G ⇒_p H, then G' ⇒_p H' for each G' ≅ G and H' ≅ H. This is due to the fact that the pushout objects of Figure 3 are defined up to isomorphism. As a consequence, graph derivations actually define a relation among equivalence classes of graphs, rather than among graphs. Such equivalence classes are faithfully represented by closed terms of the algebra of states: using Definition 9, the class of all graphs isomorphic to G is represented as WfT(0_G), where 0_G is the unique (mono)morphism from the empty graph to G.

The next theorem states that the translation of a graph rewriting system into a CHARM is sound and complete. This result is not trivial, and is based on the fact that every transition of the algebra T(M(𝓡)) (see Definition 7) represents a pair of graph monomorphisms with common source, which are the bottom line of a double pushout construction like the one depicted in Figure 3. We refer to [Corradini and Montanari 1991] for the formal proofs.

Theorem 11 Let 𝓡 be a graph rewriting system and M(𝓡) be the associated CHARM. Soundness: If G is a graph and ρ : WfT(0_G) ⇒ Q is a term of the algebra of computations of M(𝓡), i.e., of C(M(𝓡)) (see Definition 8), then there is a derivation G ⇒_𝓡 H such that WfT(0_H) ≅ Q. Completeness: If G ⇒_𝓡 H, then there is a computation ρ in the algebra of computations of M(𝓡) such that ρ : WfT(0_G) ⇒ WfT(0_H). ∎
5
Modelling concurrent constraint programming
The concurrent constraint (cc) programming
paradigm [Saraswat 1989] is a very elegant framework which captures and generalizes most of the
concepts of logic programming [Lloyd 1987], concurrent logic programming [Shapiro 1989], and constraint logic programming [Jaffar and Lassez 1987].
The basic idea is that a program is a collection of
concurrent agents which share a set of variables, over
which they may pose ("tell") or check ("ask") constraints. Agents are defined by clauses as the parallel composition ("||"), or the existential quantification ("∃"), or the nondeterministic choice ("+"),
of other agents. A computation refines the initial
constraint on the shared variables (i.e., the store)
through a monotonic addition of information until
a stable configuration (if any) is obtained, which is
the final constraint returned as the result of such a
computation.
The cc paradigm is parametric w.r.t. the kind of
constraints that are handled. Any choice of the constraint system (i.e., kind of constraints and solution
algorithm) gives a specific cc language. For example, by choosing the Herbrand constraint system we
get concurrent logic programming, and by further
eliminating concurrency we get logic programming.
The constraint system is very simply modelled by a partial information system [Saraswat et al. 1991], i.e., a pair ⟨D, ⊢⟩, where D is the set of the primitive constraints and ⊢ ⊆ ℘(D) × D is the entailment relation, which states which tokens are entailed by which sets of other tokens, and which must be reflexive and transitive. Then, a constraint is a set of primitive constraints, closed under entailment.
In this section we will informally show how any
cc program can be modelled by a CHARM. The
idea is to consider each state as the current collection of constraints (on the shared variables) and of
active agents (together with the variables they involve), and then to represent each computation step
as the application of a rewrite rule. More precisely,
both agents and primitive constraints are going to
be modelled as process instances, while the shared
variables are the variables of the abstract machine.
Basic computation steps are an ask operation, a
tell operation, the decomposition of an agent into
other agents, but also the generation of new constraints by the entailment relation. In the following,
each agent or constraint always comes together with
the variables it involves, even though we sometimes
will not say it explicitly.
In a state Q, the agent A = tell(c) → A1 adds constraint c to Q and then transforms itself into agent A1. This can be faithfully modelled by a rewrite rule R from S = (G, L) to S' = (G, L'), where L contains agent A, L' contains agent A1 and constraint c, and G contains the variables involved in A (since these are the only items connecting A to the rest of the state Q). This rule may be seen in Figure 4. Note that the fact that c is present only in the local part L' of S' does not mean that c is visible only locally. In fact the mechanism of rule application allows one to treat a local item as a global one (see Figure 2).

[Figure 4: The CHARM rewrite rule for the agent A = tell(c) → A1.]

In a state Q, the agent A = ask(c) → A1 transforms itself into A1 if c is in Q and suspends otherwise. The corresponding rewrite rule is R from S = (G, L) to S' = (G, L'), where L contains agent A, L' contains agent A1, and G contains c. In fact, constraints, once generated, are never cancelled, since the accumulation of constraints is monotonic. Since the rewrite rule cannot be applied if there is no occurrence of the left-hand side in Q, the ask suspension is given for free. This rule may be seen in Figure 5.

[Figure 5: The CHARM rewrite rule for the agent A = ask(c) → A1.]

Parallel and nondeterministic composition, as well as existential quantification of agents, are straightforwardly modelled by corresponding rewrite rules. Note that, in an "atomic" interpretation, tell and ask operations fail if c is inconsistent with the constraints in Q. Our rewrite rules model instead the "eventual" interpretation [Saraswat 1989], where inconsistency is discovered sooner or later, but possibly not immediately. Thus immediate failure is not directly modelled. However, since the difference between the two interpretations basically depends on the way the nondeterministic choice is implemented, the specification of suitable algebraic theories, as suggested in Section 2, could be of help for the implementation of the atomic interpretation of the cc framework.

Each pair (C, t) ∈ ⊢ may be modelled by a state change as well. In fact, in a state Q, (C, t) can be interpreted as a tell of t whenever C is in Q, and can thus be represented by a rewrite rule R from S = (G, L) to S' = (G, L'), where L is empty, G contains C, and L' contains t. Note that L is empty, since nothing has to be cancelled, and all items involved are either tested for presence and thus preserved (C) or generated (t). This rule may be seen in Figure 6.

[Figure 6: The CHARM rewrite rule for the pair (C, t) of the entailment relation ⊢.]
In summary, (the eventual interpretation of) a cc program, together with the underlying constraint system, is modelled in a sound and complete way by a CHARM with as many rewrite rules as agents (and subagents) and pairs of the entailment relation (note that, while the number of agents is always finite, in general there may be an infinite number of pairs in the entailment relation). It is important to
stress the naturality of the CHARM as an abstract
machine for cc programming. In fact, the global part
of the rules exactly corresponds to the idea that constraints are never cancelled, and thus, once generated locally (by one of the subsystems), are global
forever. This description of cc programming within
the CHARM framework follows a similar one, given
in [Montanari and Rossi 1991], where the classical
"double-pushout" approach to graph rewriting was
used to model cc programs and to provide them with
a truly concurrent semantics. Thus, the results of
this section are not surprising, given the results in
[Montanari and Rossi 1991] and those of the previous section, which show how to model graph grammars through a CHARM.
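In the same spirit as the state sketch in Section 3 (again our own illustration, with hypothetical helper names), the tell and ask steps described above can be written as rewrite rules whose global parts are, respectively, the variables of the agent and the asked constraint; suspension of an ask then falls out of the applicability test.

    from collections import Counter

    def tell_rule(agent, cont, constraint, agent_vars):
        # tell(c) -> A1: the agent is consumed, A1 and c are generated;
        # the variables of A are only tested (global part).
        return {"glob": Counter(agent_vars),
                "left": Counter([agent]),
                "right": Counter([cont, constraint])}

    def ask_rule(agent, cont, constraint, agent_vars):
        # ask(c) -> A1: c must already be present (global part), so the rule
        # simply cannot fire, i.e. the agent suspends, until c has been told.
        return {"glob": Counter(agent_vars + [constraint]),
                "left": Counter([agent]),
                "right": Counter([cont])}

    def step(rule, local):
        present = local                     # closed system: everything is local
        if (rule["glob"] - present) or (rule["left"] - local):
            return None                     # suspended
        return (local - rule["left"]) + rule["right"]

    if __name__ == "__main__":
        state = Counter(["ask(c)->A1", "tell(c)->A2", "x"])
        ask = ask_rule("ask(c)->A1", "A1", "c", ["x"])
        tell = tell_rule("tell(c)->A2", "A2", "c", ["x"])
        print(step(ask, state))             # None: suspends, c not yet told
        state = step(tell, state)           # tell adds c
        print(step(ask, state))             # now the ask can fire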
6
Future Work
As pointed out in Section 2, one of the subjects which seem most interesting to investigate is the possibility to provide the CHARM with a true-concurrency semantics. Another one is instead the implementation of process description languages onto the CHARM. As briefly discussed in Sections 2 and 3, both these issues seem to be fruitfully addressable within the algebraic framework we have depicted in this paper.
In [Laneve and Montanari 1991] it has been shown that concurrent constraint programming may encode the lazy and the call-by-value λ-calculus.
This encoding exploits a technique similar to the one used by Milner to encode the λ-calculus in the π-calculus [Milner et al. 1989], since the mobility of processes (which is one of the main features of the π-calculus) can be simulated in cc programming via a clever use of the shared logical variables. This result, combined with our implementation of cc programming in the CHARM, described in Section 5, suggests that also higher order aspects of functional languages may be expressed within the CHARM.
References

[Berry and Boudol 1990] G. Berry and G. Boudol. The Chemical Abstract Machine. In Proc. POPL90, ACM, 1990.

[Corradini 1990] A. Corradini. An Algebraic Semantics for Transition Systems and Logic Programming. Ph.D. Thesis TD-8/90, Dipartimento di Informatica, Università di Pisa, Italy, March 1990.

[Corradini et al. 1990] A. Corradini, G. Ferrari, and U. Montanari. Transition Systems with Algebraic Structure as Models of Computations. In Semantics of Systems of Concurrent Processes, I. Guessarian, ed., Springer-Verlag, LNCS 469, 1990.

[Corradini and Montanari 1991] A. Corradini and U. Montanari. An Algebra of Graphs and Graph Rewriting. In Proc. 4th Conference on Category Theory and Computer Science, Springer-Verlag, LNCS, 1991.

[De Boer and Palamidessi 1991] F.S. De Boer and C. Palamidessi. A Fully Abstract Model for Concurrent Constraint Programming. In Proc. CAAP, 1991.

[Ehrig 1979] H. Ehrig. Introduction to the Algebraic Theory of Graph Grammars. In Proc. International Workshop on Graph Grammars, Springer-Verlag, LNCS 73, 1979.

[Ferrari 1990] G. Ferrari. Unifying Models of Concurrency. Ph.D. Thesis, Computer Science Department, University of Pisa, Italy, 1990.

[Gorrieri et al. 1990] R. Gorrieri, S. Marchetti, and U. Montanari. A²CCS: Atomic Actions for CCS. In TCS 72, vol. 2-3, 1990.

[Gorrieri and Montanari 1990] R. Gorrieri and U. Montanari. A Simple Calculus of Nets. In Proc. CONCUR90, Springer-Verlag, LNCS 458, 1990.

[Hoare 1985] C.A.R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985.

[Jaffar and Lassez 1987] J. Jaffar and J.L. Lassez. Constraint Logic Programming. In Proc. POPL, ACM, 1987.

[Laneve and Montanari 1991] C. Laneve and U. Montanari. Mobility in the cc paradigm. Submitted for publication, 1991.

[Lloyd 1987] J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.

[Meseguer 1990] J. Meseguer. Rewriting as a Unified Model of Concurrency. In Proc. CONCUR90, Springer-Verlag, LNCS 458, 1990.

[Meseguer and Montanari 1990] J. Meseguer and U. Montanari. Petri Nets are Monoids. Information and Computation, vol. 88, n. 2, 1990.

[Milner 1989] R. Milner. Communication and Concurrency. Prentice Hall, 1989.

[Milner et al. 1989] R. Milner, J.G. Parrow, and D.J. Walker. A Calculus of Mobile Processes. LFCS Reports ECS-LFCS-89-85/86, University of Edinburgh, 1989.

[Montanari and Rossi 1991] U. Montanari and F. Rossi. True Concurrency in Concurrent Constraint Programming. In Proc. ILPS91, MIT Press, 1991.

[Reisig 1985] W. Reisig. Petri Nets: An Introduction. EATCS Monographs on Theoretical Computer Science, Springer-Verlag, 1985.

[Shapiro 1989] E. Shapiro. The Family of Concurrent Logic Programming Languages. ACM Computing Surveys, vol. 21, n. 3, 1989.

[Saraswat 1989] V.A. Saraswat. Concurrent Constraint Programming Languages. Ph.D. Thesis, Carnegie-Mellon University, 1989. Also 1989 ACM Dissertation Award, MIT Press.

[Saraswat and Rinard 1990] V.A. Saraswat and M. Rinard. Concurrent Constraint Programming. In Proc. POPL, ACM, 1990.

[Saraswat et al. 1991] V.A. Saraswat, M. Rinard, and P. Panangaden. Semantic Foundations of Concurrent Constraint Programming. In Proc. POPL, ACM, 1991.
Less Abstract Semantics
for Abstract Interpretation of FGHC Programs
Kenji Horiuchi
Institute for New Generation Computer Technology
1-4-28, Mita, Minato-ku, Tokyo 108, JAPAN
horiuchi@icot.or.jp
Abstract
In this paper we present a denotational semantics for
Flat GHC. In the semantics, the reactive behavior of a
goal is represented by a sequence of substitutions, which
are annotated with + or - depending on whether the
bindings are given from, or posted to the environment of
the goal. Our objective in investigating the semantics
is to develop a framework for abstract interpretation.
So, the semantics is kept less abstract, just enough to allow an analysis of various properties closely related to program sources. We also demonstrate moded type inference of FGHC programs using abstract interpretation based on
the semantics.
1 Introduction
Various work on the semantics for concurrent logic
languages has been investigated by many researchers
[Gerth et al. 1988][Murakami 1988][Gaifman et al. 1989]
[Gabbrielli and Levi 1990][de Boer and Palamidessi
1990]. One of their main purposes is to identify one
program with another syntactically different program,
or distinguish between syntactically similar programs.
And, since some researchers are interested in properties
like full abstractness, they may want to hide internal
communications from the semantics or want to abstract
even observable behaviors much further.
Since, unlike the above researchers, our main objective is to analyze programs, we want to have a fixpoint semantics suitable as the collecting semantics, on
which our framework of abstract interpretation is based.
But once we try to introduce one of their semantics to
a framework of abstract interpretation, the semantics
may be too abstract to obtain some of the properties
we require.
In this paper we present a denotational fixpoint semantics for Flat GHC. In the semantics the reactive
behavior of a goal is represented by a sequence of substitutions which are annotated with + or - depending
on whether the bindings are given from, or posted to, the environment of the goal. The semantics presented here is less abstract, enough to allow an analysis of various properties closely related to program sources, e.g., of occurrences of symbols in programs or internal communications. We also demonstrate moded type inference
of FGHC programs using abstract interpretation.
We briefly explain the concurrent logic programming
language Flat GHC and its operational semantics in
Section 3 after we introduce the preliminary notions in
the next section. Next we present the fixpoint approach
to the semantics of Flat GHC in Section 4, and then in
Section 5 we show the relationship between the fixpoint
semantics and the operational semantics. After reviewing a general framework for abstract interpretation, we
show examples of analyzing FGHC programs.
2 Preliminaries
In this section, we introduce the following basic notions
used in this paper, many of which are defined as usual
[Lloyd 1987][Palamidessi 1990].
Definition 2.1 (Functor, Term, Atom, Predicate
and Expression)
Let Var be a non-empty set of variables, Func be a set
of functors, Term be a set of all terms defined on Var
and on Func, Pred be a set of all predicates and Atom
be a set of all atoms defined on Term and Pred.
An expression is a term, an atom, a tuple of expressions or a (multi)set of expressions, and we denote a set
of all expressions by Exp. We also denote the set of all
variables appearing in an expression E by var(E).
Definition 2.2 (Substitution)
A substitution θ is a mapping from Var to Term such that the domain of θ is finite, where the domain of θ, denoted by dom(θ), is defined by {V ∈ Var | θ(V) ≠ V}. The substitution θ is also represented by a set of assignments such that {V ← t | V ∈ dom(θ) ∧ θ(V) ≡ t}. The identity mapping on Var, called an identity substitution, is denoted by ∅. The range of θ, denoted by ran(θ), is the set of all variables appearing in the terms at the right hand side of each assignment of θ, i.e., ∪_{V∈dom(θ)} var(θ(V)). var(θ) also denotes the set of variables dom(θ) ∪ ran(θ).

When E is an expression, Eθ (or (E)θ) denotes the expression obtained by replacing each variable V in E with θ(V). The composition of two substitutions θ and σ, denoted by θσ, is defined as usual [Lassez et al. 1987][Palamidessi 1990]. A substitution is assumed to be always idempotent [Lassez et al. 1987] (i.e., dom(θ) ∩ ran(θ) = ∅, where ∅ denotes the empty set), and the result of composing substitutions is also assumed to be idempotent. The set of all idempotent substitutions is denoted by Subst and the set of all renamings is denoted by Ren. The restriction of θ onto var(E) is denoted by θ|_E.
Definition 2.3 (Equivalence Class and Partial Ordering)
A pre-ordering ≼ on Subst, called an instantiation ordering, is defined as follows: θ1 ≼ θ2 iff ∃σ (θ1σ = θ2), where θ1, θ2, σ ∈ Subst. The equivalence relation w.r.t. the instantiation ordering ≼, denoted by ~, is defined as follows: θ1 ~ θ2 iff ∃η (θ1η ≡ θ2), where η ∈ Ren. Substitutions θ1 and θ2 are said to be in an equivalence class when θ1 ~ θ2. The set of the equivalence classes of Subst is denoted by Subst/~. A partial ordering on Subst/~, also denoted by ≼, is naturally induced from the pre-ordering ≼ on Subst. We denote the equivalence class of a substitution θ by θ~, or simply by θ. Given ⊤ as the greatest element on ≼, Subst/~ can be naturally extended to Subst/~ ∪ {⊤}. Then (Subst/~ ∪ {⊤}, ≼) forms a complete lattice.
Definition 2.4 (Most General Unifier)
A substitution θ is a most general unifier (mgu) of expressions E1, E2, denoted by mgu(E1, E2), iff E1θ ≡ E2θ and, for every θ' such that E1θ' ≡ E2θ', θ ≼ θ'. Let U be a set of equations {s1=t1, ..., sn=tn}. Then mgu([s1, ..., sn], [t1, ..., tn]) is also denoted by mgu(U). A substitution θ can also be represented by a set of equations, denoted by Eq(θ), such that Eq(θ) = {X=t | (X←t) ∈ θ}.
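For experimentation with these definitions, here is a small illustrative Python sketch (our own encoding, with terms as nested tuples, variables as capitalized strings, and substitutions as dictionaries; it omits the occurs check) of substitution application, composition, and a textbook most general unifier.

    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()

    def apply_subst(t, s):
        """Apply substitution s (a dict Var -> term) to term t."""
        if is_var(t):
            return apply_subst(s[t], s) if t in s else t
        if isinstance(t, tuple):                    # compound term (functor, arg1, ...)
            return (t[0],) + tuple(apply_subst(a, s) for a in t[1:])
        return t

    def compose(s1, s2):
        """Composition s1 s2: apply s1 first, then s2."""
        out = {v: apply_subst(t, s2) for v, t in s1.items()}
        out.update({v: t for v, t in s2.items() if v not in out})
        return {v: t for v, t in out.items() if t != v}

    def mgu(t1, t2, s=None):
        """Most general unifier of t1 and t2, or None if they do not unify."""
        s = dict(s or {})
        t1, t2 = apply_subst(t1, s), apply_subst(t2, s)
        if t1 == t2:
            return s
        if is_var(t1):
            return compose(s, {t1: t2})
        if is_var(t2):
            return compose(s, {t2: t1})
        if isinstance(t1, tuple) and isinstance(t2, tuple) \
                and t1[0] == t2[0] and len(t1) == len(t2):
            for a, b in zip(t1[1:], t2[1:]):
                s = mgu(a, b, s)
                if s is None:
                    return None
            return s
        return None

    if __name__ == "__main__":
        t1 = ("append", ("cons", "1", ("cons", "2", "nil")), ("cons", "3", "nil"), "A")
        t2 = ("append", "X", "Y", "Z")
        print(mgu(t1, t2))   # binds X, Y to the list terms and A to Z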
Definition 2.5 (Directed)
Let θ1, θ2 be substitutions. Then θ1 and θ2 are said to be directed, denoted by θ1 ▷ θ2, iff ...

... Suppose Δ = Δ'·δ such that |Δ'| ≥ 0.

If G0 is a unification goal, then the theorem is trivial. Otherwise, G0 is a non-unification goal. Now, since (G0, Δ) ⇝_P holds, (G0, Δ') ⇝_P holds. By the induction hypothesis, since |Δ'| < k and (G0, Δ') ⇝_P, we have (G0, Δ') ∈ [P]_{G0}.

Hence, from the definition of T_{P,G0}, there exists θg⁺ such that Δ' = θg⁺·Δ'', (H :- G | B) ∈ P, θg = mgu({A=H} ∪ G), for every Bi ∈ B there is (Bi θg, Δi) ∈ [P]_{G0}, and Δ'' ∈ int(Δ1, ..., Δn)|_{var(G0)}. Now we have (G0, θg⁺·Δ'') ⇝_P and Δ'' ∈ int(Δ1, ..., Δn)|_{var(G0)}. Then we can get Δi such that (Bi θg, Δi) ⇝_P by selecting unit reactions from only the i-th argument (i.e., Δi) in the definition of int.

Suppose that the last transition of (G0, Δ'·δ) ⇝_P is a transition on a sub-goal of Bi θg. Then (Bi θg, Δi δ) ⇝_P. Since k > |Δ'| > |Δ''| ≥ |Δi|, we have k > |Δ'| ≥ |Δi δ|. By the induction hypothesis again, since (Bi θg, Δi δ) ⇝_P, (Bi θg, Δi δ) ∈ [P]_{G0}.

Therefore, from the definition of T_{P,G0}, since Δ'δ ∈ int(Δ1, ..., Δi δ, ..., Δn), (G0, Δ'δ) ∈ [P]_{G0}. ∎
In Theorem 5.1 we show that any most general correct atom reaction (G0, Δ) w.r.t. a program P is in the top-down semantics [P]_{G0}. In general it is necessary to prove the only-if part of the theorem (usually called the Completeness Theorem), and we think this is possible by introducing a kind of downward closure of (A, Δ) using the 'more general than' relation in Section 4.1, as the subsumption relation in [Falaschi et al. 1990]. This, however, is beyond the scope of this paper, because Theorem 5.1 is sufficient to guarantee the correctness of the framework of abstract interpretation based on the top-down semantics, since we want to use this semantics as a collecting semantics.
6 General Framework for Abstract Interpretation
In this section we briefly review a general framework of
abstract interpretation for programs whose semantics
can be defined from a fixpoint approach; and some conditions to guarantee that the abstract interpretation is
'safe' for the semantics.
When a standard semantics is given by the least fixpoint of some semantic function, an abstract semantics
is given by another semantic function obtained by directly abstracting the concrete semantic function such
that the safe relation exists between their two semantics.
6.1 Concrete Fixpoint Semantics
Suppose that the meaning of a program P is given by
the least fixpoint of a (concrete) semantic function T p ,
denoted by lfp(Tp), where Tp : Den --+ Den is a continuous function and Den is a powerset of D, called a
concrete domain, such that each element of D expresses
a concrete computation state of the program. For example, in an ordinary logic program, is an Herbrand Base.
And Den forms a complete lattice with regard to the set
inclusion ordering ~ on Den. Then, the least fixpoint
of Tp exists and we can get it by lfp(Tp) = Tpjw.
Definition 6.1 (Concrete Semantics)
[P] = lfp(Tp) is called the least jixpoint semantics of a
program P. Especially, we call it the concrete semantics
of a program P since the semantics is obtained from the
concrete semantic function Tp
6.2 Abstract Fixpoint Semantics
We define an abstract fixpoint semantics by abstracting
the concrete domain and the concrete semantic function
introduced in 6.1.
Definition 6.2 (Abstract Domain)
Given a concrete domain D, an abstract domain D̂ is a finite set of denotations satisfying the following conditions:
(1) every element of D̂ represents a subset of D,
(2) D̂ forms a complete lattice with respect to an order relation ⊑ defined on D̂, and
(3) there exist two monotonic mappings, that is, an abstraction α : D → D̂ and a concretization γ : D̂ → D, defined as follows: ∀d̂ ∈ D̂ (d̂ = α(γ(d̂))) ∧ ∀d ∈ D (d ⊆ γ(α(d))).
In order to define the abstract semantics of a program P, we should define (or design) a monotonic and continuous mapping of a program P, T̂_P : D̂en → D̂en, called the abstract semantic function, as well as the abstract domain D̂, corresponding to the concrete domain D and the concrete semantic function T_P of P. Then we have to define the abstract versions of various operations, e.g., a composition or an application of substitutions, used in the definition of T_P.
Definition 6.3 (Abstract Semantics)
The least fixpoint semantics [P̂] = lfp(T̂_P), obtained from the abstract semantic function T̂_P, is called the abstract semantics of a program P.

Now we claim the termination property with respect to the abstract fixpoint semantics.

Lemma 6.1 There exists the least fixpoint lfp(T̂_P) of T̂_P such that lfp(T̂_P) = T̂_P↑k for some finite k.

Lastly, we attach the following acceptable relation between the abstract semantics and the concrete semantics:

Definition 6.5 (Safeness Condition)
A safeness condition for the abstract semantics is as follows: [P] ⊆ γ([P̂]).

Lemma 6.2 If T_P(γ(d̂)) ⊆ γ(T̂_P(d̂)) for all d̂ ∈ D̂, then the abstract semantics is safe, i.e., the safeness condition holds, where T_P(γ(d̂)) = {T_P(d) | d ∈ γ(d̂)}.
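Operationally, the framework amounts to Kleene iteration of the abstract semantic function over a finite lattice, which is why Lemma 6.1 guarantees termination. The sketch below is a generic illustration with a toy parity domain; none of its names or domains come from the paper.

    from functools import reduce

    # A toy abstract domain: the parity lattice  bot < even, odd < top.
    BOT, EVEN, ODD, TOP = "bot", "even", "odd", "top"

    def join(a, b):
        if a == BOT: return b
        if b == BOT: return a
        return a if a == b else TOP

    def abstract_tp(d):
        """A hypothetical abstract semantic function: the program produces an even
        value and the successor of anything already produced (top stays top)."""
        succ = {BOT: BOT, EVEN: ODD, ODD: EVEN, TOP: TOP}[d]
        return reduce(join, [EVEN, d, succ])

    def lfp(f, bottom=BOT):
        """Kleene iteration: bottom, f(bottom), f(f(bottom)), ... until stable.
        Termination is guaranteed because the lattice is finite (cf. Lemma 6.1)."""
        x = bottom
        while True:
            nxt = f(x)
            if nxt == x:
                return x
            x = nxt

    if __name__ == "__main__":
        print(lfp(abstract_tp))   # -> 'top': both parities are reachable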
7 Applications for Analysis of FGHC
Programs
In this section we show some examples of analyzing
FGHC programs by using abstract interpretation based
on the top down semantics in Section 4, which is an instance of the general framework in Section 6
7.1 Moded Type Graph
The abstract domain presented here is so similar to the
one based on type graphs in [Bruynooghe and Janssens
1988], that most necessary operations on the abstract
domain will be well-defined similarly to [Bruynooghe and Janssens 1988][Janssens and Bruynooghe 1989].
Here we introduce a moded type graph, and show
briefly that a reaction sequence and an atom reaction
can be abstracted by a moded type graph.
Definition 7.1 (Moded Type Constructor and Generic Types)
A(n n-ary) moded type constructor is a(n n-ary) function symbol f/n ∈ Func with a mode annotation + (or −), denoted by f⁺/n (or f⁻/n) or simply f⁺ (or f⁻), which represents a(n n-ary) function symbol f appearing in input (or output) unit reactions (respectively). Four generic (moded) types are an any type, a variable type, an undefined type and an empty type, denoted by any, Δ, _ and ∅ respectively. An any type represents the set of all moded terms, both Δ and _ represent the set of variables, and ∅ represents the empty set of terms.
Definition 7.2 (Moded Term and Moded Type)
A moded term is a term constructed from moded type
constructors over a set of variables Var. A moded term
represents the same term without all mode annotations
such that a moded type constructor with + (or -) corresponds to a function symbol appearing in an input
(or an output) unit reaction. A moded type is a set of
moded terms.
Definition 7.3 (Moded Type Graph)
A moded type graph is a representation of a moded type,
which is a directed graph such that each node is labeled
with either a moded type constructor, a generic type,
or a special label 'or'.
The relation between a parent node and its (possibly no) child nodes in a moded type graph g is defined as follows:
(1) a node labeled with f⁺/n or f⁻/n (n ≥ 0) has n ordered arcs to n nodes, i.e., has n ordered child nodes,
(2) a node labeled with 'or' has n non-ordered arcs to n nodes (n ≥ 2), i.e., has n non-ordered child nodes,
(3) a node labeled with a generic type has no child node,
(4) there exists at least one node, called a root node, such that there are paths from the root node to every other node in g, and
(5) the number of occurrences of nodes with the same label on each path from the root node of g is bounded by a constant d, called the moded type depth.
Suppose that a node N is to be newly added as a child node of Np in a moded type graph g. If the creation of the node N would violate condition (5) in the above definition, that is, if there would be more than d nodes with the same label as N on the path from the root node to N, then the new node N is not added to g as a new child node of Np but is instead shared with the farthest ancestor node of Np that has the same label as N. In such a case, a circular path is created. (Nodes with the same label are not shared with each other when they are on different paths from the root.) Restriction (5) is the same as the depth restriction in [Janssens and Bruynooghe 1989]. They call a type graph satisfying the depth restriction a restricted type graph, and they have presented an algorithm for transforming a non-restricted type graph into a restricted one.
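The sharing step just described can be sketched as follows (a simplification of our own, not the algorithm of [Janssens and Bruynooghe 1989]; the node representation and names are assumptions made for illustration):

# Hedged sketch of the depth-restricted insertion described above.
class Node:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

def add_child(parent, label, d):
    """Add a node labeled `label` under `parent`, sharing with the farthest
    ancestor carrying the same label if the depth bound d would be exceeded."""
    # Count occurrences of `label` on the path from the root down to `parent`,
    # remembering the ancestor farthest from `parent` with that label.
    count, farthest = 0, None
    anc = parent
    while anc is not None:
        if anc.label == label:
            count += 1
            farthest = anc
        anc = anc.parent
    if count >= d and farthest is not None:
        # Re-use the farthest ancestor: this introduces a circular path.
        parent.children.append(farthest)
        return farthest
    child = Node(label, parent=parent)
    parent.children.append(child)
    return child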
A concretization for a moded type graph with a root node N0, denoted by γ(N0), is defined as follows:
(1) γ(N) is Var if the label of N is Δ or _,
(2) γ(N) is the empty set ∅ if the label of N is ∅,
(3) γ(N) is {f⁺(t1, ..., tn) | ti ∈ γ(Ni) ∧ 1 ≤ i ≤ n} if the label of N is f⁺/n and N1, ..., Nn are the child nodes of N,
(4) γ(N) is {f⁻(t1, ..., tn) | ti ∈ γ(Ni) ∧ 1 ≤ i ≤ n} if the label of N is f⁻/n and N1, ..., Nn are the child nodes of N, or
(5) γ(N) is γ(N1) ∪ ... ∪ γ(Nn) if the label of N is 'or' and N1, ..., Nn are the child nodes of N.
A moded type graph represents a set of moded terms, i.e., a moded type, defined by γ. The set of all moded types is denoted by Term.
A moded type graph g can also be represented by an expression, called a moded type definition, which is like a context-free grammar with (possibly no) non-terminal symbols, called type variables, and one start symbol, called the root type variable, corresponding to the root node of g, as in [Janssens and Bruynooghe 1989]. A moded type or a moded type graph represented by a moded type definition may be referred to by its root type variable.
Example 7.1
The following is a moded type graph g whose root node is labeled with h⁺/2, given here by its moded type definition (the graph itself contains a circular path at the f⁺ node):
T = h⁺(Δ, T1),
T1 = f⁺(T1),
where T and T1 are type variables and T is the root type variable. This moded type definition represents the set of moded terms:
γ(T) = {h⁺(V1, f⁺(V2)), h⁺(V1, f⁺(f⁺(V2))), ...}
An abstraction α for moded terms satisfying condition (3) in Definition 6.2 is also well-defined, in a way similar to [Janssens and Bruynooghe 1989].
A moded type substitution θ̂ is a mapping from Var to Term, and is also represented by a set of assignments of variables to moded types. A concretization and an abstraction for moded type substitutions are defined by:
γ(θ̂) = {θ | ∀X ∈ dom(θ̂) (t ∈ γ(Xθ̂) ⊃ (X ← t) ∈ θ)},
α(θ) = {X ← α(Xθ) | X ∈ dom(θ)}.
An ordering relation ⊑ over moded type substitutions is defined as follows: θ̂1 ⊑ θ̂2 iff γ(Vθ̂1) ⊆ γ(Vθ̂2) for all variables V ∈ Var.
A moded type reaction sequence Δ̂ is a sequence of moded type substitutions θ̂1θ̂2...θ̂n such that ∀i, j (1 ≤ i < j ≤ n) (θ̂i ⊑ θ̂j ∧ dom(θ̂i) = dom(θ̂j)), and dom(Δ̂) is defined as dom(θ̂i). A concretization for a moded type reaction sequence, denoted by γ(θ̂1...θ̂n), is defined as follows:
{θ1...θn | θ1...θn ∈ Rseq ∧ Πθi ∈ γ(θ̂i)},
where Πθi is the composition of substitutions |θ1|...|θi|. An instantiation ordering ⊑ on moded type reaction sequences is defined by: Δ̂1 ⊑ Δ̂2 iff γ(Δ̂1) ⊆ γ(Δ̂2).
Definition 7.4 (Moded Type Atom Reaction)
A moded type atom reaction is a pair of an atom A and a moded type reaction sequence Δ̂ such that dom(Δ̂) ⊆ var(A). Âreact is the set of all moded type atom reactions.
Example 7.2
Consider the reaction sequence {X ← f(Y)}⁺{Y ← g(Z)}⁻. Then α({X ← f(Y)}⁺{Y ← g(Z)}⁻) is {X ← T1}{X ← T2}, where T1 and T2 are defined by the following type definitions:
T1 = f⁺(Δ),
T2 = f⁺(g⁻(Δ)).
An application of a moded type substitution θ̂ to a moded type reaction sequence θ̂1...θ̂n is a moded type reaction sequence θ̂'1...θ̂'n such that θ̂'i = lub(θ̂'i-1, θ̂i) for all i (1 ≤ i ≤ n), where θ̂'0 = θ̂.
A possible interleaving int of moded type reaction sequences can be well-defined by using the definition of possible interleaving on the concrete domain in Section 4.2, and D̂en is the power set of Âreact.
Now we can define the abstract semantic function T̂_P,G : D̂en → D̂en for a program P and a goal G by using the abstract operations and denotations defined above.
7.2 An Example of Detecting Multiple Writers
Consider two goals that try to instantiate a shared variable to (possibly different) symbols. In such a case, the goals may cause inconsistent assignments to the same variable; such goals are called multiple writers. Recently, in the family of concurrent logic languages, several languages have been proposed that do not allow multiple writers, and many advantages have been discussed [Saraswat 1990][Ueda 1990a][Kleinman et al. 1991][Foster and Taylor 1989]. For example, moded FGHC, presented in [Ueda 1990a], has the following advantages: (1) an efficient implementation based on a message-oriented technique, (2) freedom from unification failure, and (3) easy mode analysis. So moded FGHC seems to lead FGHC programmers into a good style of FGHC programming.
Although most programs can be written without using multiple writers, there are a few cases where one may want to use them. A stop signal may be one such example. A stop signal is a programming technique in which, when some goal finds the answer to a search problem, it broadcasts a stop signal to all other goals that are solving the same search problem (or its sub-problems), forcing them to terminate their processes, by instantiating a flag symbol into a variable shared by all goals. Several flaggings may occur on different goals at the same time, or some goal may broadcast a flag even though a flag has already been sent by another goal but not yet received. In such cases, multiple writing problems may occur.
Now we show a method of detecting multiple writers
as an application of the moded type inference in the previous section. The following program implements a very
simple example of 'stop signal'. A subscript number of
each function symbol is used to distinguish occurrences.
main(T,F) :- true | generate(T), search(T,F).
search(t1(_,a1,_),F) :- true | F=f1.
search(_,f2) :- true | true.
search(t2(L,b1,R),F) :- true |
    search(L,F), search(R,F).
generate(T) :- true | T=t3(L,N,R), genNode(N),
    generate(L), generate(R).
genNode(N) :- true | N=a2.
genNode(N) :- true | N=b2.
A goal generate(T) generates a binary tree with each node labeled with a or b, and a goal search(T,F) searches for a node labeled with a. The body goals of search/2 share the second argument as a 'stop signal'. Now we try to analyze the moded type of a goal main(T,F) by computing [P̂]_main(T,F) on the abstract domain of moded types. Here each moded type constructor carries a subscript number to distinguish occurrences. When we apply {X ← a₁} to {X ← a₂}, we get a moded type substitution {X ← a₂₁}. This represents a moded type {X ← a⁻}, obtained by engaging a₂ with a₁. When goals try to engage a moded type constructor with mode - with another moded type constructor with mode -, the goals are multiple writers.
In the above program, we can compute the following moded type atom reaction in [P̂]_main(T,F):
(main(T,F), ... {F ← f₁⁻} ...).
Then we can obtain the information that (1) the goal main(T,F) may cause multiple writers, and (2) the problematic goal is the unification goal writing f₁, i.e., the one in the body of the first clause of search.
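As a rough illustration of this detection rule (our own simplification, not the paper's analysis algorithm; the encoding of engagements as pairs is an assumption made for the example):

# Hedged sketch: flag the multiple-writer condition described above, i.e. a
# moded type constructor with mode '-' engaged with another constructor that
# also has mode '-'.
def is_multiple_writer(engagements):
    """engagements: iterable of ((name1, mode1), (name2, mode2)) pairs
    recording which constructors were engaged during the analysis."""
    return any(m1 == '-' and m2 == '-' for (_, m1), (_, m2) in engagements)

# Example: the two body goals of search/2 both writing f1 to the shared flag.
print(is_multiple_writer([(('f1', '-'), ('f1', '-'))]))  # True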
8 Discussions
Much research has been presented on fixpoint approaches to the semantics of concurrent logic languages.
Atom reactions are essentially the same as the reactive clauses introduced in reactive behavior semantics [Gaifman et al. 1989]. Since reactive behavior semantics is defined by the self-unfolding of reactive clauses, we cannot always define a reasonable abstraction of that semantics when it is applied to abstract interpretation. That is, the same non-termination problem may occur as in the example below. With our semantics, on the other hand, since all possible reaction sequences corresponding to the atoms in a body are computed at one time by int, such a problem does not occur.
Our semantics distinguishes reduction failure from deadlock as well as from unification failure, although the operational semantics of FGHC says nothing about reduction failure; that is, reduction failure is regarded as suspension. There, the case in which a goal can be reduced by no clause is distinguished from failure (unification failure), but not from deadlock. We instead introduce reduction failure as a termination symbol. In a practical FGHC system, reduction failure may be reported to users as a system service if the system happens to detect it at run time. It is helpful to users if reduction failure can be detected, since such failure causes deadlock. So we want to detect the possibility of reduction failure at analysis time too. This is why we must introduce reduction failure into the semantics.
In [de Boer et al. 1989], a denotational and a fixpoint approach to the semantics of (non-flat) GHC is presented. They present a declarative semantics based on a fixpoint approach over a semantic domain similar to our atom reactions. They mention that the fixpoint semantics is sound and complete w.r.t. the operational semantics, giving only the results of finite success computations. Our approach, on the other hand, keeps more information by using the complement of all correct input unit reactions and ⊥rf, so it can be correctly related to the operational semantics including the cases of deadlock and finite failure.
A few works on abstract interpretation for concurrent logic programs have been presented. The approaches of [Codognet et al. 1990] and [Codish et al. 1991] are based on the operational semantics.
In [Codognet et al. 1990], a meta-algorithm for FCP(:) and an abstracted version of it are presented. They also show the correctness relation of the algorithm to the operational semantics, which is defined by a transition system similar to the one in this paper.
In [Codish et al. 1991], a standard transition system semantics is abstracted directly, where a set of configurations is approximated by an abstract configuration. One of the advantages of their approach is that the analysis is simple and easy to prove correct.
These two are essentially the same approach, and in both it is easy to understand the correspondence to the operational semantics.
In the approach of [Codish et al. 1991], the termination of abstract interpretation may not be guaranteed for programs in which a goal may generate more and more sub-goals indefinitely. For example, the following program is taken from [Codish et al. 1991]. They must abstract the domain (i.e., the configuration) rather strongly (by so-called star abstraction) in order to solve this problem. The star abstraction is abstract enough, and not overly abstract, for analyzing suspension, but it may not be suitable for call and/or success pattern analysis. These problems may be solved by adopting some abstraction on goals other than the star abstraction [Codish 1992].
producer(X) :- true |
    X=f(X1,X2), producer(X1), producer(X2).
consumer(X) :- X=f(X1,X2) |
    consumer(X1), consumer(X2).
Our abstract interpretation, on the other hand, can analyze the call pattern of this program, and returns the following moded type atom reaction when the moded type depth is 1:
(producer(X), {X ← τ1}{X ← τ2}{X ← τ3}),
τ1 = f⁺(_,_),
τ2 = f⁺(τ2,_),
τ3 = f⁺(τ3,τ3).
Although our possible interleavings may be a little
difficult to define and understand, these problems can
be solved by the abstraction only on the domain, i.e.,
reaction sequences.
9 Conclusions
We have presented a denotational semantics for FGHC which is less abstract and is suitable as a basis for abstract interpretation. Since the semantics is defined by a fixpoint approach on atom reactions, which represent the reactive behaviors of atoms, a program analysis system can be developed simply by abstracting the (possibly infinite) domain to a finite domain. We have also demonstrated moded type inference for FGHC programs.
Acknowledgments
I thank Kazunori Ueda and Michael Codish for valuable comments and suggestions, and the referees for their helpful comments.
References
[Bruynooghe and Janssens 1988] Bruynooghe, M. and G. Janssens, "An Instance of Abstract Interpretation Integrating Type and Mode Inferencing", Proc. of the 5th International Conference and Symposium on Logic Programming, R. A. Kowalski and K. A. Bowen (eds.), pp.669-683, 1988.
[Codish and Gallagher 1989] Codish, M. and J. Gallagher, "A Semantic Basis for the Abstract Interpretation of Concurrent Logic Programs", Technical Report CS89-26, November, 1989.
[Codish et al. 1991] Codish, M., M. Falaschi and K. Marriott, "Suspension Analysis for Concurrent Logic Programs", Proc. of the 8th International Conference on Logic Programming, Furukawa, K. (ed.), pp.331-345, 1991.
[Codish 1992] Codish, M., personal communication, Feb., 1992.
[Codognet et al. 1990] Codognet, C., P. Codognet and M. M. Corsini, "Abstract Interpretation for Concurrent Logic Languages", Proc. of the North American Conference on Logic Programming, S. Debray and M. Hermenegildo (eds.), pp.215-232, 1990.
[de Boer et al. 1989] de Boer, F. S., J. N. Kok and C. Palamidessi, "Control Flow versus Logic: a denotational and declarative model for Guarded Horn Clauses", Proc. of Mathematical Foundations of Computer Science, A. Kreczmar and G. Mirkowska (eds.), pp.165-176, LNCS 379, Springer-Verlag, 1989.
[de Boer and Palamidessi 1990] de Boer, F. S. and C. Palamidessi, "Concurrent Logic Programming: Asynchronism and Language Comparison", Proc. of the North American Conference on Logic Programming, S. Debray and M. Hermenegildo (eds.), pp.175-194, 1990.
[Falaschi et al. 1989] Falaschi, M., G. Levi, M. Martelli and C. Palamidessi, "A Model-theoretic Reconstruction of the Operational Semantics of Logic Programs", Universita di Pisa, Technical Report TR-32/89, 1989.
[Falaschi et al. 1990] Falaschi, M., M. Gabbrielli, G. Levi and M. Murakami, "Nested Guarded Horn Clauses", International Journal of Foundations of Computer Science, Vol.1, No.3, pp.249-263, 1990.
[Foster and Taylor 1989] Foster, I. and S. Taylor, "Strand: A Practical Parallel Programming Tool", Proc. of the North American Conference on Logic Programming, E. L. Lusk and R. A. Overbeek (eds.), pp.497-512, 1989.
[Gabbrielli and Levi 1990] Gabbrielli, M. and G. Levi, "Unfolding and Fixpoint Semantics for Concurrent Constraint Logic Programs", Proc. of the 2nd International Conference on Algebraic and Logic Programming, LNCS, Nancy, France, 1990.
[Gaifman et al. 1989] Gaifman, H., M. J. Maher and E. Shapiro, "Reactive Behavior Semantics for Concurrent Constraint Logic Programs", Proc. of the North American Conference on Logic Programming, E. L. Lusk and R. A. Overbeek (eds.), pp.551-569, 1989.
[Gerth et al. 1988] Gerth, R., M. Codish, Y. Lichtenstein and E. Shapiro, "Fully Abstract Denotational Semantics for Concurrent Prolog", Proc. of the 3rd Annual Conference on Logic in Computer Science, IEEE, pp.320-335, 1988.
[Janssens and Bruynooghe 1989] Janssens, G. and M. Bruynooghe, "An Application of Abstract Interpretation: Integrated Type and Mode Inferencing", Report CW86, Katholieke Universiteit Leuven, April, 1989.
[Kleinman et al. 1991] Kleinman, A., Y. Moskowitz, A. Pnueli and E. Shapiro, "Communication with Directed Logic Variables", Proc. of the 18th Annual ACM Symposium on Principles of Programming Languages, pp.221-232, 1991.
[Lassez et al. 1987] Lassez, J. L., M. J. Maher and K. Marriott, "Unification Revisited", Foundations of Deductive Databases and Logic Programming, Minker, J. (ed.), Morgan Kaufmann, pp.587-625, 1987.
[Levi 1988] Levi, G., "A New Declarative Semantics of Flat Guarded Horn Clauses", Technical Report, ICOT, 1988.
[Lloyd 1987] Lloyd, J. W., "Foundations of Logic Programming", Second, Extended Edition, Springer-Verlag, 1987.
[Murakami 1988] Murakami, M., "A Declarative Semantics of Parallel Logic Programs with Perpetual Processes", Proc. of the International Conference on FGCS'88, pp.374-388, Tokyo, 1988.
[Palamidessi 1990] Palamidessi, C., "Algebraic Properties of Idempotent Substitutions", Proc. of the 17th ICALP, pp.386-399, 1990.
[Saraswat 1990] Saraswat, V. A., K. Kahn and J. Levy, "Janus: A Step Towards Distributed Constraint Programming", Proc. of the North American Conference on Logic Programming, S. Debray and M. Hermenegildo (eds.), 1990.
[Ueda 1990a] Ueda, K. and M. Morita, "New Implementation Technique for Flat GHC", Proc. of the 7th International Conference on Logic Programming, D. H. D. Warren and P. Szeredi (eds.), pp.3-17, 1990.
[Ueda 1990b] Ueda, K., "Designing a Concurrent Programming Language", Proc. of an International Conference organized by the IPSJ to Commemorate the 30th Anniversary: InfoJapan'90, pp.87-94, Tokyo, 1990.
Parallel Optimization and Execution of Large Join Queries
Eileen Tien Lin*, Sudhakar Yalamanchili, Edward Omiecinski
College of Computing and School of Electrical Engineering
Georgia Institute of Technology, Atlanta, Georgia 30332
Abstract
Optimizing large join queries that consist of many joins has been recognized as NP-hard. In this paper, we examine the feasibility of exploiting the inherent parallelism in optimizing large join queries on a hypercube multiprocessor. This includes using the multiprocessor not only to answer the large join query, but also to optimize it. Two heuristics are provided for generating an initial solution, which is further optimized by an iterative local-improvement method. The entire process of parallel query optimization and execution is simulated on an Intel iPSC/2 hypercube machine.
1 Introduction
A large join query consists of a series of relational database join operations. The order in which these joins are executed has a great impact on the response time. The fundamental problem with optimizing large join queries is searching the large solution space of possible query execution plans.
In [IK84], the optimization of N-relational joins using the nested-loop join method is proven to be NP-complete. In [KBZ86], a generic cost formula is assumed applicable to the join methods used. They extend a polynomial time optimization algorithm for tree queries [IK84] to the more general case. This algorithm is also improved to an O(N²) solution, where N is the number of relations in the query.
Several researchers have been studying the feasibility of applying general combinatorial optimization techniques, such as Simulated Annealing and Iterative Local-Improvement, to avoid exhaustive enumeration of all plans. In [SG88], the solution space consists of only outer linear join processing trees where at most one intermediate result is active and the inner relation is always a base relation. Furthermore, they assumed that the database resides in main memory. Later, in [Swa89], they propose a set of heuristics to be combined with the combinatorial techniques in order to improve the performance. In [IK90], a new Two Phase Optimization algorithm is presented which runs Iterative Local-Improvement for a small period of time and uses the output of this phase as the initial solution for the second phase, which runs Simulated Annealing.
*Currently at IBM Corporation, 555 Bailey Avenue, San Jose, California 95150.
In [DKT90] and [SD90], different strategies for processing large join queries in a parallel environment are
discussed. In [DKT90], the authors study how to execute a large join query on a shared memory parallel
computer. In [SD90], they show how a different representation of a query tree can affect the degree of parallelism within a query and performance. Specifically,
they compare left-deep and right-deep tree representations.
In this paper, we investigate the issue of using the
inherent parallelism in a hypercube multiprocessor to
optimize large join queries. Both inter-join and intra-join parallelism are exploited in forming the plan, which implies that a join can be performed on a subcube of any size and more than one join can be performed at a time.
2 The Parallel Query Processing Model
Our parallel query processing model is predicated on the following parallel architecture model. We have P = 2^d processors interconnected in a d-dimensional binary hypercube. Each processor, with address p_{d-1} p_{d-2} ... p_i ... p_1 p_0, is connected to every processor whose address is p_{d-1} p_{d-2} ... p̄_i ... p_1 p_0, ∀i, where p̄_i is the bit complement of p_i. Communication between nonadjacent processors is realized by routing messages through intermediate nodes. Every processor (node) has its own memory and interacts with other processors via message passing. In this paper, we refer to an n-dimensional hypercube of 2^n nodes as an n-cube. A subcube is a subset of processors that forms a smaller hypercube. For the purposes of our study, we assume the complete hypercube is available for performing the joins.
Based on this architecture model, the query processing model consists of the following steps.
1. The host preprocesses a query and transforms it
into an internal form such as a join graph.
2. The host accesses the global database dictionary
for relevant statistics for each relation and each
join.
3. The host selects a query optimization strategy for
each node. The query and the selected strategy
are sent to all nodes in the system.
4. Each node follows its specified strategy to generate an initial plan and to optimize it to make the
best parallel query execution plan. This plan is
then reported to the host.
5. The host selects the best plan from all of the
nodes, schedules the query, and sends the plan to
all participating nodes. Each node then executes
the plan.
A distinct feature of our research is to exploit the inherent parallelism of the optimization step, instead of
relying on only the host to generate a good plan.
3 Assumptions
1. Each relation is horizontally partitioned over a
subcube within the system. Relations may be
allocated to different subcubes of different sizes.
Tuples are assumed to be uniformly distributed
across nodes within a subcube.
2. The queries considered only involve natural joins,
i.e. equi-joins. For simplicity, we consider only
two-way joins that use the Cube Hybrid-Hash join
algorithm [OL89].
3. The system is assumed to be dedicated to this
application. Every node is available for both optimization and computation of the joins.
4. A join can be performed on any subcube of any
size. More than one join can be simultaneously
performed in disjoint subcubes.
5. The values of attributes are distributed uniformly
and independently of each other. This implies
that the size of R ⋈ S ⋈ T can be estimated by
multiplying the cardinalities of the three relations
and the two join selectivity factors.
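As a small illustrative calculation of this uniformity assumption (all numbers here are made up for the example), the estimated result size is simply the product of the cardinalities and the join selectivity factors:

# Hedged sketch: estimate |R join S join T| under the uniformity assumption
# above.  The cardinalities and selectivity factors are made-up examples.
def estimate_join_size(cardinalities, selectivities):
    size = 1
    for c in cardinalities:
        size *= c
    for s in selectivities:
        size *= s
    return size

# e.g. |R| = 1000, |S| = 2000, |T| = 500, with two join selectivities of 1e-4
print(estimate_join_size([1000, 2000, 500], [1e-4, 1e-4]))  # 10.0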
4 Definitions
Parallelism in the execution of joins requires the allocation/deallocation of subcubes to relations and join computations. In our approach we use the binary buddy system to manage subcubes for relations and join operations. In the binary buddy system the hypercube is recursively partitioned into subcubes. The subcubes can be represented by a binary tree as follows. Associate with each node a status bit that is 1 (0) if the processor is available (busy). The leaf nodes represent the status bits of the processors. The status bit associated with any interior node is 0 if any of the leaf nodes in the corresponding subtree is 0, and 1 otherwise. The root is at level 0 and the nodes at level i are associated with subcubes of dimension n - i. When a request for 2^k processors arrives, nodes at level n - k in the tree are searched to find the first available one. If found, it is allocated and the status bits of all the parent nodes are adjusted accordingly. Similar updates to this structure take place on the deallocation of nodes.
The set of all processors that form a subcube of any size in a hypercube can be identified by a unique cube identifier. A cube identifier is an address mask. For example, the 2-cube consisting of processing elements 0, 2, 8 and 10 is uniquely identified by the mask (*0*0), where * represents a don't care. Subcubes that can be allocated under the binary buddy system can be identified with cube identifiers of the form p_{d-1} p_{d-2} ... * * ... * *, where the number of *'s is equal to the dimension of the subcube. This assumes that the recursive decomposition proceeds highest dimension first, followed by the next highest dimension, and so on.
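The buddy-system bookkeeping described above can be sketched as follows (a minimal sketch of our own, not the paper's implementation, which is in C; the array representation and names are assumptions):

# Hedged sketch of the binary buddy subcube allocator described above.
# The tree of status bits is kept in a heap-ordered array (index 1 is the
# root); 1 means the whole corresponding subcube is free.
class BuddyAllocator:
    def __init__(self, n):
        self.n = n                                 # hypercube dimension
        self.status = [1] * (2 ** (n + 1))

    def allocate(self, k):
        """Allocate a 2^k-processor subcube; return its tree index or None."""
        level = self.n - k                         # nodes at level n-k are k-cubes
        first, last = 2 ** level, 2 ** (level + 1) - 1
        for node in range(first, last + 1):
            if self.status[node] == 1:             # first available k-cube
                self._mark_subtree(node, 0)
                self._update_ancestors(node)
                return node
        return None

    def _mark_subtree(self, node, bit):
        if node < len(self.status):
            self.status[node] = bit
            self._mark_subtree(2 * node, bit)
            self._mark_subtree(2 * node + 1, bit)

    def _update_ancestors(self, node):
        node //= 2
        while node >= 1:
            self.status[node] = self.status[2 * node] & self.status[2 * node + 1]
            node //= 2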
As an example, consider the following query for a 4-cube, qi.
Figure 2: An Example of a Parallel Plan using the Maximum Intra-Join-Parallelism Heuristic.
Figure 3: A Plan Generated by the Maximum Inter-Join-Parallelism Heuristic for qj.
the best order for executing the join operations, but also the best mapping between a join and the subcube where it is to be performed. We provide the following two heuristics for mapping join operations to subcubes in order to produce an initial solution. The Maximum Inter-Join-Parallelism heuristic tries to reduce the height of the tree as much as possible. The Maximum Intra-Join-Parallelism heuristic always uses the largest cube to perform each join. These heuristics can be categorized as greedy heuristics.
5.2.1 Maximum Inter-Join-Parallelism
This heuristic tries to invoke as many joins in parallel as possible at the same time, and therefore increases the degree of inter-join parallelism. By invoking more joins in parallel, each join is allocated a smaller cube; however, the height of the plan is reduced. Even though such a plan may have a smaller height, each join may incur a higher cost.
The parallel plan for query qi in Section 4, which is shown in Figure 1, could be produced by this heuristic. Due to constraints in the query, at most two joins can be performed in parallel. Therefore, j1 and j2 share the cube. Since the following two joins have to be performed sequentially, the largest cube is allocated to each of them.
5.2.2 Maximum Intra-Join-Parallelism
The maximum degree of parallelism is applied to every join to reduce the individual cost, but at the cost
of redistributing the relations prior to all the joins.
Since every join uses the same set of nodes, a chain of
immediate-location-dependencies is formed. As a result, all joins are forced to be performed sequentially.
Figure 2 is a possible parallel plan for the query qi in
Section 4 using this heuristic. Every join is allocated
the largest cube, i.e., the whole system. Note that
not all plans generated by this approach are necessarily binary linear processing trees [KBZ86], as shown in Figure 2, in which at most one intermediate relation is used as an input to a subsequent join operation.
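As a rough sketch of how the two heuristics differ in assigning subcubes to joins (our own simplification, not the paper's plan generator; the grouping of joins is assumed given):

# Hedged sketch contrasting the two greedy mapping heuristics above.
# `independent_groups` is a list of lists of joins that may run concurrently.
def max_inter_join(independent_groups, cube_dim):
    plan = []
    for group in independent_groups:
        # split the cube among the joins of a group, in powers of two
        k = max(cube_dim - (len(group) - 1).bit_length(), 0)
        plan.append([(join, k) for join in group])   # each join gets a k-cube
    return plan

def max_intra_join(joins, cube_dim):
    # every join uses the whole cube, hence all joins run sequentially
    return [[(join, cube_dim)] for join in joins]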
5.3 Combinatorial Optimization
Any plan for a large join query can be thought of as
a state in a solution space which includes all possible
plans. The ultimate goal of any optimization process
is to find a state with the globally optimum cost.
We use a simple combinatorial optimization technique, Iterative Local-Improvement. In this technique,
the current plan is transformed into a new plan by performing one move such as swapping the relative orders
of two joins. If the new plan has a lower cost (as computed by Algorithm 5.1), it becomes the current plan.
This process in general continues until a local optimum is found.
We now discuss how we optimize plans generated by
each of the heuristics described in the previous section.
Note that a local optimum is reached by a node when
further local improvement is not possible. The best
plan chosen by the nodes is selected as the parallel
large join plan.
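A minimal sketch of the iterative local-improvement loop just described (cost_of stands in for the plan cost estimate of Algorithm 5.1, which is not reproduced here; the move generator and stopping rule are assumptions of ours):

import random

# Hedged sketch of Iterative Local-Improvement as described above: repeatedly
# swap the relative order of two joins and keep the new plan if it is cheaper.
def local_improvement(plan, cost_of, max_failures=100):
    current, current_cost = plan, cost_of(plan)
    failures = 0
    while failures < max_failures:          # crude local-optimum test
        i, j = random.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]   # one move
        candidate_cost = cost_of(candidate)
        if candidate_cost < current_cost:
            current, current_cost = candidate, candidate_cost
            failures = 0
        else:
            failures += 1
    return current, current_cost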
5.3.1 Maximum Inter-Join-Parallelism
Following the initial application of this heuristic, there is a limit to the extent to which the height of the plan (tree) can be reduced for a given query. Consider the query qj, joining 7 relations, R0 through R6, where R6 joins with relations R1, R2, R4 and R5, and R3 joins with R2 and with R0. Since R6 is to be joined with R1, R2, R4 and R5, only one of these four joins can be performed at a time. Figure 3 is a possible plan which has j1 and j2 performed in parallel. Following that, j3 and j4 are performed in parallel. Finally, j5 is performed, followed by j6. We optimize plans generated by the application of the Maximum Inter-Join-Parallelism heuristic as follows:
• Globally, each node chooses a different maximal independent set of joins, such that if two joins i and j are in the same independent set, we can perform i and j at the same time on two disjoint cubes.
• Locally, each node can swap the join locations of two randomly chosen joins for every independent set until a local optimum is reached.
5.3.2 Maximum Intra-Join-Parallelism
To optimize plans generated by this heuristic, we take
an approach similar to that described using the Maximum Locality heuristic. However, since the join locations are fixed, i.e., the entire system, there is no need
to alter the join locations.
6 Performance Evaluation
In this section, we present our experimental evaluation of the two heuristics for parallel large join query
optimization.
6.1 System Description
The different heuristics and the entire process were
coded in C and the experiments were run on a 16-node Intel iPSC/2 hypercube. To simulate a disk per
node, we implemented a disk module for every node
based on the single MAXTOR XT-8760S disk in our
hypercube.
Since the Intel hypercube uses the circuit switching
approach, we had to alter our cost model [OL89] used in estimating the cost of a plan. The model [OL89]
assumes a packet switching approach. In addition,
to simplify the evaluation, attributes are not added
with each successive join and sufficient main memory
is assumed to be available to guarantee that hash table
overflow will not occur.
6.2 Query Characteristics
We categorize our queries into four groups based on the number of tuples per relation (the relation cardinality), the relative locations of the relations, and the join selectivity factors:
1. All relation cardinalities are uniformly distributed between 100 and 625, so that every relation is stored on only one node. All join selectivity factors are uniformly distributed between 10⁻⁴ and 10⁻³.
2. All relation cardinalities are uniformly distributed between 5001 and 10,000, so that every relation is stored on the entire 4-cube. All join selectivity factors are uniformly distributed between 10⁻⁵ and 10⁻⁴.
3. All relation cardinalities are uniformly distributed between 100 and 10,000. All join selectivity factors are uniformly distributed between 10⁻⁴ and 10⁻³.
6.3 Experimental Results
In order to compare the average performance of the
two heuristics, we use the scaled cost instead of the real
cost measured in seconds. The scaled cost is the ratio
of the cost of the best plan produced by a heuristic for
a given query, to the minimum cost of plans produced
by the two heuristics for the same query. The reason
we do not compare the actual timing is that the order
of the costs for different queries can vary greatly and
a scaled cost provides an objective measure of the relative advantage of a specific heuristic. A scaled cost
of 1.0 means that this plan has the lowest cost among
the plans produced by the two heuristics.
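For instance (with invented numbers, purely to illustrate the definition above), the scaled cost can be computed per query as:

# Hedged example of the scaled-cost measure defined above; the costs are
# made-up numbers, one per heuristic for the same query.
def scaled_costs(costs_per_heuristic):
    best = min(costs_per_heuristic.values())
    return {h: c / best for h, c in costs_per_heuristic.items()}

print(scaled_costs({"inter-join": 12.0, "intra-join": 18.0}))
# {'inter-join': 1.0, 'intra-join': 1.5}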
The execution costs of these plans on the hypercube are measured and the average scaled costs are
compared to see if the result is consistent with the
estimate produced by Algorithm 5.1. For simplicity,
the execution cost only reflects the duration from the
time a node receives a plan, until it finishes performing the entire join plan. This is the cost predicted by
Algorithm 5.1.
Optimization of each of the individual plans will
take a varying amount of time. The overhead in initiating the optimization process at each of the nodes,
and the transfer of the results back to the host are
approximately equal. Therefore, we only consider the
longest and shortest durations of the optimization algorithms performed by the nodes.
For each table of results, several experiments were run. We compare the average scaled cost of the initial plan, the average scaled cost of the optimized plan, the average scaled execution cost, and the scaled total cost, which includes both the optimization time and the execution time, for the two heuristics.

Table 1: Performance of Queries in Category 1 for 20 Relations.
Cost             Inter-Join-Par.   Intra-Join-Par.
Best Initial           1.00              1.52
Best Optimized         1.00              1.50
Real Execution         1.00              2.20
Total                  1.03              1.14

Table 2: Performance of Queries in Category 2 for 20 Relations.
Cost             Inter-Join-Par.   Intra-Join-Par.
Best Initial           3.42              1.00
Best Optimized         3.28              1.00
Real Execution         2.33              1.00
Total                  1.13              1.16

Table 3: Performance of Queries in Category 3a for 20 Relations.
Cost             Inter-Join-Par.   Intra-Join-Par.
Best Initial           2.42              1.00
Best Optimized         4.65              1.00
Real Execution         6.35              1.03
Total                  2.15              1.17
6.3.1 Comparison of Algorithms
Table 1 shows the performance of the three heuristics when all relations are very small and scattered. The Maximum Inter-Join-Parallelism heuristic has the best performance since it enables many joins to be performed in parallel. In addition, for the Maximum Intra-Join-Parallelism heuristic, the longest time spent for query optimization was 93.17 seconds, while for the Maximum Inter-Join-Parallelism heuristic it was only 87.34 seconds. The shortest time spent for query optimization was 31.49 seconds for the Maximum Intra-Join-Parallelism heuristic and 45.24 seconds for the Maximum Inter-Join-Parallelism heuristic. By assigning the entire cube to every join, the Maximum Intra-Join-Parallelism heuristic has to spend more time resolving all the transfer-dependencies in the beginning, since all relations have to be re-distributed over the entire cube.
When all relations are very large and stored on the entire cube, the performance of the two heuristics is summarized in Table 2. The Maximum Intra-Join-Parallelism heuristic is superior to the Maximum Inter-Join-Parallelism heuristic. Although the Maximum Inter-Join-Parallelism heuristic provides a higher degree of inter-parallelism, for this type of query each join takes a longer time, in addition to the overhead of resolving the transfer-dependencies. In addition, for the Maximum Intra-Join-Parallelism heuristic, the longest time spent for query optimization was 94.11 seconds, while for the Maximum Inter-Join-Parallelism heuristic it was only 79.77 seconds. The shortest time spent for query optimization was 25.51 seconds for the Maximum Intra-Join-Parallelism heuristic and 40.60 seconds for the Maximum Inter-Join-Parallelism heuristic.
Table 3 summarizes the general case where the
relation cardinalities are uniformly distributed. In
general, the Maximum Intra-Join-Parallelism heuristic has the best initial and optimized costs. With respect to the longest and shortest optimization times,
a similar trend appeared as with the previous experiments.
6.3.2 Query Optimization Time
In general, the longest time and the shortest time to
reach a local optimum among the different starting
solutions generated by different nodes are quite far
apart. This makes it possible to improve the performance with the 2PO (Two Phase Optimization)
method described in [IK90]. Those nodes that have
reached a local optimum earlier can use the current
best solution as the input to the second phase, which
uses a modified simulated annealing method. This can
better utilize the idle nodes and further improve the
quality of their solutions.
6.3.3 Query Execution Time
Most of the average scaled costs for the query execution time on the hypercube are shown to be consistent with the estimated plan costs. That is, if a plan produced by a heuristic is shown to have the best estimate, its actual execution cost is most likely to be the best as well.
This is mainly due to the fact that for these two categories, every relation has to be re-distributed over the entire cube. This in turn results in increased link contention, and therefore communication delays.

6.3.4 Total Time
By examining the total time spent in both optimization and execution, we have a better picture of the performance of a heuristic. For category 1 in Table 1, the Maximum Inter-Join-Parallelism heuristic not only has the best execution cost but also the best total time. However, in Table 2, the Maximum Inter-Join-Parallelism heuristic has the best total cost although it has the worst execution cost. One possible reason is that for this heuristic, the difference between the times at which the earliest and latest nodes reach a local optimum is significantly smaller than for the other two heuristics; this compensates for the inferior quality of its optimized plan. Another possible explanation is that this heuristic may be able to handle queries whose relations are of similar sizes better than the other two heuristics. From Table 3 we can see that the Maximum Intra-Join-Parallelism heuristic has the best total time in general, together with the best execution time.

7 Summary
In this paper, we examine the issue of optimizing large join queries on a hypercube multiprocessor. Two heuristics are proposed to produce an initial plan for a given query. We adapt the iterative local improvement procedure to the query optimization process in a hypercube multiprocessor to improve the plan produced by the application of the two heuristics. Our simulation of the query optimization and query execution process shows that the performance of these heuristics depends on the characteristics of the queries.
Optimizing complex queries in parallel will reduce the bottleneck in the query processor and will also improve the quality of the query execution plan. However, it is important to realize that the overhead can be substantial, and therefore the number of processors used in optimizing a query should depend on each individual query.

References
[DKT90] S. M. Deen, D. N. P. Kannanagara, and M. C. Taylor. Multi-Join on Parallel Processors. In Proceedings of the Second International Symposium on Databases in Parallel and Distributed Systems, pages 92-102, 1990.
[IK84] T. Ibaraki and T. Kameda. On the Optimal Nesting Order for Computing N-Relational Joins. ACM Transactions on Database Systems, 9(3):482-502, September 1984.
[IK90] Y. E. Ioannidis and Y. C. Kang. Randomized Algorithms for Optimizing Large Join Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 312-321, May 1990.
[KBZ86] R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of Nonrecursive Queries. In Proceedings of the Twelfth International Conference on Very Large Data Bases, pages 128-137, 1986.
[OL89] E. Omiecinski and E. T. Lin. Hash-Based and Index-Based Join Algorithms for Cube and Ring Connected Multicomputers. IEEE Transactions on Knowledge and Data Engineering, 1(3):329-343, September 1989.
[SD90] D. Schneider and D. J. DeWitt. Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines. In Proceedings of the 16th VLDB Conference, August 1990.
[SG88] A. Swami and A. Gupta. Optimization of Large Join Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 8-17, September 1988.
[Swa89] A. Swami. Optimization of Large Join Queries: Combining Heuristics and Combinatorial Techniques. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 367-376, June 1989.
Towards an Efficient Evaluation of Recursive Aggregates
in Deductive Databases
Alexandre Lefebvre¹
C.I.T., Griffith University
Nathan, QLD 4111, Australia
ajl@cit.gu.edu.au
Abstract
This paper is devoted to the evaluation of aggregates (avg, sum, ...) in deductive databases. Aggregates have proved to be a modeling tool necessary for a wide range of applications in non-deductive relational databases. They also appear to be important in connection with recursive rules, as shown by the bill of materials example. Several recent papers have studied the problem of semantics for aggregate programs. As in these papers, we distinguish between the classes of stratified (non-recursive) and recursive aggregate programs. For each of these two classes, the declarative semantics is recalled and an efficient evaluation algorithm is presented. The semantics and computation of aggregate programs in the recursive case are more complex: we rely on the notion of graph traversal to motivate the semantics and the evaluation method proposed. The algorithms presented here are integrated in the QSQ framework. Our work extends the recent work on aggregates by proposing an efficient algorithm for the recursive case. Recursive aggregates have been implemented in the EKS-V1 system.
1 Introduction
This paper examines an advanced functionality of deductive database systems, namely the ability to express programs involving both recursion and aggregate computations in a declarative manner. The bill of materials application (compute the total cost of a composite part built up recursively from basic components) shows the importance of this feature in real life databases. It is well known that such programs are not expressible in Datalog. We discuss the semantics, evaluation model and implementation of aggregates in the EKS-V1 system [VBKL90].
The recursive aggregate facility is one of the innovative features of the declarative language of EKS-V1, in addition to more standard features like recursion, negation and universal and existential quantifiers. EKS-V1 also provides an extensive integrity checking facility and sophisticated update primitives (hypothetical reasoning, conditional updates). EKS-V1 was developed mainly in 1989 and demonstrated at several database conferences (EDBT, Venice, March 1990; SIGMOD, Atlantic City, May 1990; ICLP, Paris, June 1991; VLDB, Barcelona, September 1991; ...).
¹This work was achieved while the author was at the European Computer-Industry Research Centre in Munich.
The aggregate capabilities we consider are essentially
those of SQL: a grouping primitive (group_by) is used
in association with scalar functions (such as sum, avg,
min) computing aggregate values for each group of tuples. Adding aggregate capabilities to a recursive language causes different problems, depending on the class
of programs accepted. We will consider two such classes:
stratified aggregate programs and non-stratified aggregate programs (this terminology builds on an analogy
with negation that will be explained below).
Our aim here is to provide efficient evaluation algorithms which can be integrated in general evaluation frameworks such as QSQ or Magic Sets. In the case of EKS-V1, this is performed within the top-down QSQ/DedGin* framework which was developed in [Vie86, Vie88, Vie89] and for which compilation and implementation techniques in a set-oriented way were developed in the DedGin* prototype [LV89]. Studying evaluation in this framework does not limit its scope. Indeed, it is now accepted that there is a canonical mapping between an evaluation performed along a Magic Sets like strategy [RLK86, BR87, SZ87] and a "top-down" strategy [Vie86, Vie88, TS86] (see [Bry89b, Sek89, Ull89, Vie89] for a comparison). Hence, anything that we develop here can be adapted to Magic Sets (and vice-versa).
In stratified aggregate programs, aggregate operations
and recursion are not allowed to be interleaved. In other
words, an aggregate value may be specified over the
result of a recursive query, or a recursive query may
be specified over the result of an aggregate operation.
However, an aggregate operation may not be part of
a recursive cycle, i.e. one aggregate predicate can not
recursively refer to itself.
For stratified aggregate programs, both semantics
and evaluation issues are readily solved: 1) the semantics can be defined in a standard proof-theoretic way
and 2) the evaluation problems are essentially those of
top-down constant propagation and of coordination on
the strata. The constant propagation issue is the (classical) problem of making use of constants given in the
query to limit the search space. For a query like "Give
me the average salary for the sales department", one
does not need to consult the entire employee relation.
As for coordination, one has to make sure that all relevant tuples have been computed before performing the
aggregate operation: again, this is a classical and relatively easy problem, which can be solved by appropriately extending the query evaluation method of the
respective system.
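A small illustration of these two points, with made-up data (this is our own example, not code from EKS-V1): the query constant restricts which tuples are fetched (constant propagation), and the aggregate is applied only after all relevant tuples have been collected (coordination).

# Hedged illustration: average salary for one department.
employee = [("ann", "sales", 3000), ("bob", "sales", 3500), ("eve", "hr", 4000)]

def avg_salary(dept):
    salaries = [s for (_, d, s) in employee if d == dept]       # propagate constant
    return sum(salaries) / len(salaries) if salaries else None  # aggregate last

print(avg_salary("sales"))   # 3250.0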
In the case of non-stratified aggregates, interaction of
recursion and aggregate computation raises more difficult problems. As a motivating example, consider the
classical bill of materials application for a bicycle. In
order to compute the total cost of a bicycle, one has to
1) compute the total costs of all its direct subparts (e.g.
a wheel), 2) multiply these costs by the number of occurrences of these subparts (e.g. 2 wheels in a bicycle)
and 3) sum up the resulting costs (aggregate computation). Step 1 consists in a recursive invocation of the
bill of materials query, implying a recursive invocation
of step 3 (aggregate computation). Clearly, aggregate
computation and recursion are intertwined. In the following, we refer to this more general class of programs
either as non-stratified aggregate programs or as recursive aggregate programs.
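To make the three steps concrete, here is a small procedural sketch of the bill of materials computation over acyclic part data (the relation encoding and part names are invented for illustration; the paper itself expresses this declaratively with a recursive aggregate rule):

# Hedged sketch of the bill-of-materials computation described above:
# total cost = sum over direct subparts of (quantity * total cost of subpart),
# with basic parts contributing their own cost.  Data is a made-up example.
basic_cost = {"spoke": 1.0, "rim": 10.0, "frame": 50.0}
subparts = {                      # part -> list of (subpart, quantity)
    "wheel": [("spoke", 32), ("rim", 1)],
    "bicycle": [("wheel", 2), ("frame", 1)],
}

def total_cost(part):
    if part in basic_cost:                    # a basic component
        return basic_cost[part]
    return sum(qty * total_cost(sub)          # steps 1-3: recurse, multiply, sum
               for sub, qty in subparts[part])

print(total_cost("bicycle"))                  # 2*(32*1 + 10) + 50 = 134.0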
The first difficulty concerns semantics. For instance,
suppose that, in the bill of materials example, a composite part is defined in terms of itself (cyclic data).
Clearly, the cycle problem has to be solved in order to
provide semantics for such queries. Our definition of
the semantics of recursive aggregate queries relies on
the two following intuitive choices. 1) We regard recursive aggregate computations as operations on top
of the evaluation of a Datalog program. This underlying program represents a generalized graph (Datalog
allows more than just transitive closure) being traversed
during evaluation [RHDM86]. 2) Semantics should be
definable in a way orthogonal to the semantics of the
aggregate operations themselves: for example, the semantics of a query should be definable whenever min is
replaced by max or vice-versa (of course, the result of
the evaluation would be different!).
In order to give semantics to recursive aggregate programs, we consider the subclass of programs for which
it is possible to associate a reduced program leaving out
the associated computation of aggregates. This program conceptually represents the graph being traversed.
We call such programs reducible aggregate programs. A
query on a reducible program is meaningful only if there
is no cycle in the derivations on the associated reduced
program (we speak then of group stratification). Its semantics can then be defined in a classical proof-theoretic
manner.
The second difficulty is the evaluation of recursive aggregate queries. As in the stratified aggregate case, this
issue is two-fold: constant propagation and coordination. Constant propagation is done in the same way as
in the stratified aggregate case. Coordination is more
difficult than in the stratified aggregate case as one has
to rely on data stratification (there is no predicate stratification any more). Hence, one has to ensure that the
whole group of tuples for a given input value has been
computed before performing the corresponding aggregate operation. However, we are manipulating sets of
tuples: in a given set of tuples at a given time, there
might be a group that has been fully computed, and
another one for which only a partial set of tuples have
been produced. This makes the control over the order
of evaluation more complicated as it now has to be performed at the data level.
In the top-down evaluation scheme of EKS-V1, we introduce the notion of subquery completion. We rely on dependencies between subqueries in order to check whether the evaluation of a given group has been completed. A general solution is proposed which makes use of the reduced associated program in order to provide ranges for the subqueries, so that the resulting subquery dependencies correspond to the group dependencies. In the case of tail-recursive programs, including the bill of materials program, a simplification is possible.
The main contribution of our work is the integration
of recursion and aggregates in a general query evaluation framework. Two independent studies on recursive aggregates [MPR90, CM90] have been developed in
parallel to our work. They take a model-theoretic approach, whereas we consider a proof-theoretic approach to the
semantics of aggregate programs. [MPR90] describes an
algorithm extending the Magic Sets technique to stratified aggregate programs (in fact Magic Stratified aggregate programs). In this paper, we extend the evaluation
algorithm based on QSQ to group stratified aggregate
programs of which the bill of materials program is an
example.
The structure of this article is as follows. The remainder of this section introduces some definitions and
notations. Section 2 examines semantics and evaluation of stratified aggregates. For the recursive aggregate
case, we first analyze the semantics problem in section
3 where we define the class of reducible aggregate programs. We then propose an evaluation method in section 4 which relies on the notion of sub query completion.
Section 5 discusses related work, summarizes the paper
and opens towards future work.
1.1 Definitions and Notations
We assume that a database is composed of base relations and of deduction rules of the form Head ← Body, where the Body is a conjunction of positive and negative literals. All the variables in the Head should appear in a positive literal in the body. Deduction rules define virtual predicates, which are also commonly called views in the classical relational terminology.
Definition 1.1 Aggregate rule
An aggregate predicate agg_pred is syntactically defined, as in [MPR90], by an aggregate rule in the following way:
agg_pred(Out) ← group_by(
    group_pred(In),
    List_of_Grouping_Variables,
    List_of_Aggspecs
).
where:
• List_of_Grouping_Variables is a list of variables, and Out and In are sequences of variables. They are called grouping, output and input variables respectively;
• group_pred is any virtual or base predicate and is called the grouping predicate;
• List_of_Aggspecs is a list of aggregate specifications of the form A isagg funcagg(B) or A isagg count, where funcagg can be 'sum', 'min', 'max' or 'avg', A must be an output variable and B must be an input variable. The variable A is called an aggregate variable and B a variable to-be-aggregated;
• an output variable must either be a grouping variable or an aggregate variable.
Without loss of generality, we assume that an aggregate predicate is defined by one aggregate rule only. □
Note that the aggregate function count has no argument, as it simply counts the number of tuples for a
given group.
We allow the use of e.g. arithmetic predicates in the
body of Datalog rules. Such predicates, not computable
by the basic relational operations, are called external
predicates. We suppose that the external predicates are
used in a safe way (as in [BS89] - finite set of answers
and finite top-down evaluation). As an example, the
bill of materials example uses an external predicate performing a multiplication (see section 3). The use of this
predicate is safe as soon as the data is acyclic.
Definition 1.2 Grouping subtuples and groups of
tuples
Given a tuple for the grouping predicate, its grouping subtuple is its projection over the grouping arguments.
Given a set of tuples S for a grouping predicate, we
partition S into groups of tuples: there is one group
for each different grouping subtuple GST in S. A group
contains those and only those tuples of S having GST as grouping subtuple (and no other tuple). □
We say that a predicate pred1 depends directly (resp. indirectly) on the predicate pred2 if pred2 appears in the body of a rule defining pred1 (resp. if there is a predicate pred3 such that pred1 depends directly on pred3 and pred3 depends indirectly on pred2). We can now give the following definition, inspired by the terminology used in the case of Datalog queries with negation.
Definition 1.3 Stratified aggregate program
An aggregate program is stratified if no aggregate predicate depends directly nor indirectly on itself. □
Note that aggregate programs having recursive predicates which are not mutually recursive with aggregate
predicates are indeed aggregate stratified.
A simple example of a stratified aggregate program
is the following:
Example 1.1 Aggregate on a base relation
Suppose that the database contains a base relation employee with tuples of the form employee(Name, Dept, Salary). One can define a virtual predicate avg_salary_per_dept by the following rule:
avg_salary_per_dept(Dept, AvgSal) ←
    group_by( employee(Name, Dept, Salary),
              [Dept],
              [AvgSal isagg avg(Salary)] ).
If the predicate avg_salary_per_dept is queried with the argument Dept instantiated, it returns one single value. If the query is fully uninstantiated, the result is a binary table with one value per department. ∇
2 Stratified Aggregates
In this section, we first recall the natural semantics of stratified aggregate programs, which rely on the stratification of rules. We then describe their evaluation by extending the QSQ framework.
2.1 Semantics
The stratification of a database ensures the soundness of the following extension of the classical proof-theoretic definition of semantics to stratified aggregate programs.
Similarly to Datalog programs with stratified negation, a stratified aggregate program P can be divided into strata Si, i = 1, ..., n.
Consider a predicate p appearing in the body of a rule R ∈ Si. If R is an aggregate rule and p appears as a grouping predicate in R, then the definition of p is contained in ∪_{j<i} Sj.
We show only the code of the theories user_database_manager, that handles the user's requests in a user database, and main_database_manager, which guarantees the consistency of the main database.
5 Oikos
Oikos is a distributed software development environment based on PoliS and written in ESP [Ambriola et al., 1990b]. Oikos provides a number of standard facilities that can be easily configured using ESP itself. The overall approach consists of offering mechanisms that can be easily composed, in order to easily explore different environment designs.
The ESP blackboard hierarchy offers a natural way of structuring a software development environment. It is used to reflect its decomposition into sub-environments, according to a top-down refinement strategy. The blackboard hierarchy does not really constrain the communication patterns among the agents participating in a software development process, since blackboard names can be exchanged in tuples, and an agent can put tuples in any blackboard, provided that it knows the name of the destination. Therefore, highly dynamic communication patterns can be set up, even connecting blackboards at different levels of the hierarchy, if this is convenient.
5.1 A Prototype Implementation of Oikos
The Oikos prototype has been implemented on top
of a local network connecting some Sun workstations
and a Vax mainframe. Oikos is written in ESP, which
provides the basic mechanisms for physical distribution and dynamic activation of communicating processes. ESP itself is implemented partly in C and
partly in Prolog [Bucci et al., 1991]. The standard set
of Prolog system predicates has been augmented with
IPC mechanisms using Unix Internet sockets [Ambriola et al., 1990a].
The three layers of the Oikos architecture are: the
Oikos runtime support, which is written in ESP and
provides escapes to the underlying operating system;
a collection of separate processes, that implement a
distributed ESP run-time system; and the underlying operating system, UNIX in this case. The processes in
the second layer are depicted by circles: an ESP process is the local interpreter of the ESP language, and
there are as many of them as machines in the network,
eager to interpret pieces of the ESP program. For a
more detailed exposition see [Bucci et al., 1991].
Oikos Services
Oikos provides a set of basic services. A service offers
access to shared resources according to a given protocol. The public interface of a service specifies the protocol of interaction with the service, i.e., which tuples
must be put into its blackboard to obtain its service.
For lack of space, we simply summarize the Oikos
standard services, which play the role that primitive
operators and data types play in a programming language. We discuss here only the most meaningful,
i.e., those that are fundamental in any software development process.
The Tool Kit Server (TKS), the Service and Theory Server (STS) and the History Server (HS) offer
restricted access to databases of system data, e.g.,
those modeling the predefined documents. A User Interface Service (UIS) is used to interact with running
software process programs, whereas the Workspace
Server (WS) allows users to run the tools and the
executable products of the software process. The
DataBase Server (DBS) offers unrestricted access to
a general purpose project database, and is therefore
used to set up specific project databases. Finally, the
Oikos Run Time System (ORTS) can also be seen as
a server offering essential services, like escapes to the
underlying operating system. All these services, except ORTS, can be simultaneously activated several
times in different blackboards.
The user accesses Oikos through a special interactive service called User Interface Service (UIS). It is
a service because several different UIS can coexist,
and their definitions are ESP programs found in STS.
A UIS shows the user the contents of its blackboard
in a window, and acts according to the user's input.
A UIS also offers a flexible way to monitor a software
process, since the user can activate it on a blackboard,
looking at the tuple flow, and even saving some tuples with the history server HS. UISs are the basic
blocks of the role services, i.e., those parts of the process program that allow users to interact with the
software process.
For lack of space, here we do not show how Oikos
is used in a real software development process. The
interested reader can see the example contained in
[Ambriola et al., 1990b].
6 Conclusions
In this paper we have introduced PoliS, a coordination model useful for designing distributed systems. A
programming notation based on PoliS, ESP, has been
used to illustrate the design of Oikos, a distributed
software development environment. The goal of the
ESP/Oikos project is to assess the combination of the
blackboard model with logic programming in the design of distributed programming environments. While
the blackboard model is well known in Artificial Intelligence, its use in software engineering is quite novel.
After completing the implementation of Oikos, our
future plans include the study of the impact of different models of tool coordination in the definition of
planning tools for assisting users in the software process, and the analysis of the role interplay in dealing
with the software process itself.
Acknowledgements P. Ciancarini has been partially supported by C.N.R. Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo, and M.U.R.S.T.
The authors are very grateful to N. Carriero at Yale,
for many discussions on Linda and PoliS, and to the
Shared Prolog and Oikos groups in Pisa, including
V. Ambriola, A. Bucci, T. Castagnetti, M. Danelutto,
and C. Montangero.
References
[Ambriola et al., 1990a] Vincenzo Ambriola, Paolo Ciancarini, and Marco Danelutto. Design and distributed implementation of the parallel logic language Shared Prolog. In Proceedings of ACM Symp. on Principles and Practice of Parallel Programming, volume 25:3 of SIGPLAN Notices, pages 40-49, 1990.
[Ambriola et al., 1990b] Vincenzo Ambriola, Paolo Ciancarini, and Carlo Montangero. Enacting software processes in Oikos. In Proceedings of ACM SIGSOFT Conf. on Software Development Environments, volume 15:6 of ACM SIGSOFT Software Engineering Notes, pages 12-23, 1990.
[Banatre and LeMetayer, 1990] Jean-Pierre Banatre and Daniel LeMetayer. The gamma model and its discipline of programming. Science of Computer Programming, 15:55-77, 1990.
[Barghouti and Kaiser, 1990] Naser Barghouti and Gail Kaiser. Modeling concurrency in rule-based development environments. IEEE Expert, 5(6):15-27, December 1990.
[Brogi and Ciancarini, 1991] Antonio Brogi and Paolo Ciancarini. The concurrent language Shared Prolog. ACM Trans. on Programming Languages and Systems, 13(1):99-123, 1991.
[Bucci et al., 1991] Annamaria Bucci, Paolo Ciancarini, and Carlo Montangero. Extended Shared Prolog: A multiple tuple space logic language. In Proceedings of the 10th Japanese Logic Programming Conference, 1991.
[Carriero and Gelernter, 1991] Nick Carriero and David Gelernter. Coordination languages and their significance. Communications of the ACM, 1991.
[Ciancarini, 1990a] Paolo Ciancarini. Blackboard programming in Shared Prolog. In David Gelernter, Alex Nicolau, and David Padua, editors, Languages and Compilers for Parallel Computing, pages 170-185. MIT Press, 1990.
[Ciancarini, 1990b] Paolo Ciancarini. Coordination languages for open system programming. In Proceedings IEEE Conf. on Computer Languages, pages 120-129, 1990.
[Furukawa et al., 1984] K. Furukawa, A. Takeuchi, S. Kunifuji, H. Yasukawa, M. Ohki, and K. Ueda. Mandala: A logic based knowledge programming system. In Proc. of the Int. Conf. on Fifth Generation Computer Systems, pages 613-622, 1984.
[Gelernter, 1985] David Gelernter. Generative communication in Linda. ACM Trans. on Programming Languages and Systems, 7(1):80-112, 1985.
[Gelernter, 1989] David Gelernter. Multiple tuple spaces in Linda. In Proceedings of PARLE '89, volume 365 of Lecture Notes in Computer Science, pages 20-27, 1989.
[Kaiser et al., 1987] Gail Kaiser, Simon Kaplan, and Josephine Micallef. Multiuser, distributed language based environments. IEEE Software, 4(11):58-67, 1987.
[Matsuoka and Kawai, 1988] S. Matsuoka and S. Kawai. Using tuple-space communication in distributed object-oriented architectures. In Proc. OOPSLA'88, volume 23:11 of ACM SIGPLAN Notices, pages 276-284, November 1988.
[Perry and Kaiser, 1991] Dewayne Perry and Gail Kaiser. Models of software development environments. IEEE Trans. on Software Engineering, 17(3):283-295, 1991.
Visualizing Parallel Logic Programs with VISTA
E. Tick
Dept. of Computer Science
University of Oregon
Eugene OR 97403
ABSTRACT
A software visualization tool is described that transforms
program execution trace data from a multiprocessor into
a single color image: a program signature. The image
is essentially the program's logical procedure-invocation
tree, displayed radially from the root, with possible radial and lateral condensation. An implementation of
the tool was made in X-Windows, and experimentation
with the system was performed with trace data from
Panda, a shared-memory multiprocessor implementation
of FGHC. We demonstrate how the tool helps the programmer develop intuitions about the performance of
long-running parallel logic programs.
1 Introduction
Parallel programming is difficult in two main senses. It
is difficult to create correct programs and furthermore,
it is difficult to exploit the maximum possible performance in programs. One approach to alleviating these
difficulties is to support debugging, visualization, and
environment control tools. However, unlike tools for sequential processors, parallel tools must manage a distinctly complex workspace. The numbers of processes,
numbers of processors, topologies, data and control dependencies, communication, synchronization, and event
orderings multiplicatively create a design space that is
too large for current tools to manage.
The overall goal of our research is to contribute to
processing this massive amount of information so that
a programmer can understand it. There is no doubt
that a variety of visualization tools will be needed (e.g.,
[6, 9, 12, 5]): no one view can satisfy all applications,
paradigms, and users. Yet each view should be considered on its own merits: what are its strong and weak
points, how effective is it in conveying the information
desired, and hiding all else. In this paper we introduce
one view in such a system: based on a new technique,
called "kaleidescope visualization," that summarizes the
execution of a program in a single image or signature.
Unlike scientific visualization, i.e., the graphical rendering of multi-dimensional physical processes, in par-
allel performance analysis there are no "physical" phenomena; rather, abstract interactions between objects.
Thus renderings tend to be more abstract, are less constrained by "reality," and are certainly dealing with many
interacting parameters controlling the design space. Kaleidescope visualization is the graphical rendering of a
dynamic call tree of a parallel program in polar coordinates, to gain maximum utilization of space. To fit the
entire tree into a single workstation window, condensation transformations are performed to shrink the image
without losing visual information.
This paper concentrates on the analysis of parallel
logic programs with VISTA, an X-Windows realization
of kaleidescope visualization. Although we concentrate
on committed-choice reduction-based languages, VISTA
is applicable to a wider class of procedure-based AND-parallel languages. The paper is organized as follows.
Section 2 summarizes similar types of visualization tools,
and Section 3 reviews the VISTA algorithms (summarizing [14]). In Section 4, we describe the parallel logic
programming platform upon which VISTA experimentation was conducted, and analyze the performance of
logic programs to illustrate the power of the tool. In
Section 5, conclusions and future work are summarized.
2 Literature Review
Earlier work on WAMTRACE [2, 3], a visualization tool
for OR-parallel Prolog, has influenced our work a great
deal. WAMTRACE is a trace-driven animator for Aurora Prolog [8] (originally for ANLWAM [2]). Aurora
creates a proof tree over which processors ("workers")
travel in search of work. WAMTRACE shows the tree,
growing vertically from root (top) to leaves (bottom),
with icons representing node and worker types (e.g., live
and dead branchpoints, active and idle workers). The
philosophy of WAMTRACE was that an experimental
tool should present as much information to the programmer as is available. This often results in information
overload, especially because the animation progresses in
time, leaving only the short interval of the near-present
animation frames in the mind of the viewer.
In comparison with WAMTRACE, our goals in VISTA were: (1) to generalize the tool for other language
paradigms. Specifically, AND-parallel execution is more
prevalent in most languages, and needed to be addressed;
(2) to summarize the animation; (3) abstract away information so as not to detract the viewer from understanding one thing at a time. Thus we introduce different views of the same static image, to convey different
characteristics; (4) more advanced use of color to reduce
image complexity and increase viewer intuitions.
Note that the emphasis of WAMTRACE on animation is a feature, not a bug - the animation enables
the gross behavior of the dynamic scheduling algorithms
to be understood. Animation is implemented, but not
stressed in VISTA. In this paper, we analyze a system
with simple on-demand scheduling [10], and so animation is not critical to understanding program behavior.
There are numerous views of performance data, quite
different from WAMTRACE, e.g., [6, 9, 12, 5]. In general, these methods are effective only for large-grain processes, and either do not show logical (process) views
of program execution, or cannot show such views for
large numbers of processes. Voyeur [12] and Moviola
[5] are closest in concept to VISTA. These animators
have great benefit, but this limits the complexity that
can be realistically viewed. Related research concerns
a visual representation [7] and visual debugger [4] for
committed-choice languages, but these are not for performance analysis.
3 Inside VISTA

The main goal of VISTA is to give effective visual feedback to a programmer tuning a program for parallel performance. To achieve this goal, VISTA displays an entire reduction tree in one (workstation) window, with image condensation if needed. Two types of condensation are performed: level and node condensing (described below). In addition, VISTA enables a user to view the tree from different perspectives (PE, time, or procedure) and zoom-up different portions of the tree. Since the tree is usually dense for small-grain parallel programs (even after condensation), and the tree must be redisplayed when the user desires different views, the tree-management algorithm must be efficient and the window space must be utilized effectively.

We now define terminology for describing logical call trees. The level of a node is the path length from the root to the node. The root level is zero. The root is the initial procedure invocation, i.e., the query. The height of a tree is the longest path from the root to a leaf, plus one. The height of a node is the height of the tree minus the level of the node. Level condensing is a mapping from a tree T to a tree T' with the same ancestor and descendant relationships, such that a node n at level l in T is mapped into a node n' at level l' in T', where l' = ⌊l/c⌋ and c is the level-condensing ratio (defined below). Node condensing is the removal of all the descendants of the node n from the tree, if the allocated sector (defined below) for n in the window space is less than one pixel.

With these definitions in hand, VISTA management is now reviewed. There are two inputs to the algorithm: a trace file and a source program. A trace file entry consists of information corresponding to a time-stamped procedure reduction. Although not currently implemented, VISTA could easily be extended to accept arbitrary events logged in the trace, as does WAMTRACE.

There are alternative ways to map an arbitrarily large tree onto a limited window space. We employ an abstraction requiring two passes over the trace: finding the tree height and creating a level-condensed tree. The condensed tree keeps the shape of the original tree (although scaling is not precise). This original shape allows us to carry our intuitions over from the tool's view to tuning performance of actual programs. In order to calculate the level-condensing ratio, c, the maximum tree height to be displayed (i.e., the limitation of the window space) is needed: h_max = ⌊w/2d⌋, where w is the maximum window width, and d is the distance between two adjacent levels. If tree height h ≤ h_max, level condensing is not needed. If h > h_max, level condensing is performed with the c ratio calculated as follows:

    c0 = ⌊h / h_max⌋
    t  = c0·h_max − c0·(h − c0·h_max)
    c  = c0 if l ≤ t, and c = c0 + 1 if l > t

where l is the node level. This condensation scheme puts more emphasis (space) on the levels closer to the root because earlier reductions are generally more "important" than later reductions. The heuristic corresponds to the user's intuition that processes responsible for distribution of many subprocesses should appear larger.
An open question is the categorization of programs
into those which abide by this heuristic, and those which
do not. A program that would "frustrate" VISTA heuristics has the most significant computation near the leaves,
where distribution of this work (near the root) is less important. A trivial example is a tree of parallel tasks,
each a very heavy sequential thread of computation.
VISTA will condense the graph to fit within the window, in the limit (of very long threads) producing a star
shape. Although this may be considered "intuitive," it
abstracts away all information except the threads, which
themselves are difficult to view against the background.
Alternative views, such as condensing each thread into
a polygon, perhaps colored as a function of the condensation, may be more informative because the freed-up
window space would allow the work distribution at the
root to be viewed also. How this and other types of con-
Figure 1: Whole Tree for Qsort: "?- qsort([2,1,4,5,3],X)"

Figure 2: Weight Calculation for Qsort

Figure 3: Node Allocation for Qsort
condensation, while not scaling the tree linearly, can lead
to better understanding of certain programs, is a topic
of future research.
At this stage of the algorithm, the levels to be displayed, and to be discarded, are decided. To illustrate,
consider Fig. 1 showing the original tree for a Quick
Sort program (the trace executed on four PEs and consists of 23 records). Each node is labeled with a triple,
(s_i, pe, index), where s_i is procedure s (abbreviated) invoked at trace index i, pe is the PE number, and index is
the sequence index of that PE. For example, (q5, 3, 1) of
node D denotes that procedure qsort was invoked as the
5th trace record, and reduced on PE = 3 as the first goal
executed by that processor. After level condensing with
c = 2, nodes B, F-L, U, and V are contracted into their
parents. These odd-level nodes are removed because if
(l mod c) ≠ 0, all nodes in level l are condensed.
Each node at level l in the logical tree is displayed
in the window at a locus defined by radius r = d × l,
where d is the (constant) distance between two adjacent
levels. The node is illustrated by a point, however it is
connected to its children (at the next level) by a closed
polygon around the "family" (the polygon degenerates
into a line if there is one child). The polygon itself is
colored, representing an attribute of the parent.

Figure 4: Execution Graph of Qsort: PE View (4 PEs)

After level condensing has completed, the nodes at each level
are allocated to the corresponding locus (a concentric
circle). This is analogous to the pretty printing problem
for text. We solve this problem heuristically by allocating a sector to each node depending on its weight.
The node and its children are then displayed within the
range of the sector only. The weight w for each node
is heuristically defined as the sum of the weights of its
children plus the height of the node. Thus more weight
is put on nodes closer to the root because, the closer
the node is to the root, the fewer nodes the corresponding circle can contain. Fig. 2 shows an example of the
weight calculation for the Quick Sort program.
A sector is defined as the subset of the concentric circle within which a node can be displayed. To formalize
the sector calculation, consider a unique labeling of each
node by a path from the root {x1, x2, ..., xk}, where xi
is the sibling number traversed in the path. For example, in Fig. 2, node J with weight 9 has label {1, 3, 1}.
The sector of a node at path p is represented as a pair
(s_p, a_p), where s_p and a_p are the starting degree and the
allocation degree of the node, respectively. The sector
of the root is defined as (0, 360). The starting degree s_p
for a node at level k is calculated as follows:
    s_p = s_q          if p is the leftmost child of its parent q
    s_p = s_r + a_r    otherwise, where r is p's left sibling
In other words, if the node is the leftmost child, then
the starting degree is equal to the starting degree of
the node's parent. Otherwise, the starting degree of the
node is equal to the sum of the starting and allocation
degrees of its left sibling. The allocation degree a_p for a
node at level k is calculated as follows:

    a_p = ( w_p / (w_1 + w_2 + ... + w_m) ) × a_parent

where w_p is the node's weight, the summation is the
total weight of all m siblings (including the node itself),
and the final factor is the allocation degree of the parent.
Fig. 3 shows the sectors of the nodes for Fig. 2.
After the previous steps, the execution graph is ready
for display in the X-Window System [11] with VISTA as
a client. When drawing the graphic, if the sector calculated is less than one pixel, node condensing is done, i.e.,
the node and its children are not displayed. The exact
position for each node in the window space isn't calculated until the tree is displayed, since the size and the
center of the tree may be changed. The exact node position (x,y) in the window is calculated in the next step
as x = d × l × cos(s + a/2) and y = d × l × sin(s + a/2),
where d is the level distance, l is the level of the node,
and (s, a) are the start/allocation degrees of the node. A
complete description of the internal algorithms is given
in [14]. To put the algorithms into perspective, Fig.
4 shows the VISTA display showing a PE view of the
Quick Sort program (corresponding to Figs. 1-3).
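The weight, sector, and node-placement rules above can be put together in a short sketch (Python, illustrative only; VISTA itself is an X-Windows client). Weights are the sum of the children's weights plus the node height, sectors divide the parent's allocation degree in proportion to weight, and positions follow x = d·l·cos(s + a/2), y = d·l·sin(s + a/2). The example tree and heights below are hypothetical, in the spirit of the Fig. 2 illustration.

    import math

    def weight(node, height_of):
        """Weight = sum of the children's weights + the height of the node."""
        return sum(weight(c, height_of) for c in node["children"]) + height_of(node)

    def layout(node, height_of, sector=(0.0, 360.0), level=0, d=4, out=None):
        """Assign each node a sector (s, a) and a window position (x, y)."""
        out = {} if out is None else out
        s, a = sector
        r = d * level                                   # radius of this level's circle
        theta = math.radians(s + a / 2.0)
        out[node["name"]] = (s, a, r * math.cos(theta), r * math.sin(theta))
        kids = node["children"]
        total = sum(weight(c, height_of) for c in kids)
        start = s                                       # leftmost child starts at s
        for c in kids:
            alloc = a * weight(c, height_of) / total    # share of the parent's degrees
            layout(c, height_of, (start, alloc), level + 1, d, out)
            start += alloc                              # next sibling starts where
        return out                                      # its left sibling ends

    tree = {"name": "root", "children": [
        {"name": "a", "children": []},
        {"name": "b", "children": [{"name": "c", "children": []}]}]}
    heights = {"root": 3, "a": 1, "b": 2, "c": 1}
    print(layout(tree, lambda n: heights[n["name"]]))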
4 Program Analysis with VISTA
Our initial experimental testbed for VISTA is an instrumented version of the parallel FGHC system, Panda
[10, 13]. Tuning a fine-grain parallel FGHC program
for increased performance involves understanding how
much parallelism is available and what portion is being utilized. In experimenting with parallel logic programs using VISTA, we have found a number of approaches useful for understanding performance characteristics. Our experiments consisted of a set of execution
runs on a Sequent Symmetry, and involved both modifying the benchmarks and varying the numbers of PEs.
We examine three such benchmarks here.
4.1 Pascal's Triangle Problem
Pascal's Triangle is composed of the coefficients of (x + y)^n for n ≥ 0. The binomial coefficients of degree n are
computed by adding successive pairs of coefficients of
degree n − 1. A set of coefficients is defined as a row
in Pascal's Triangle. Our first benchmark [13] computes
the 35th row of coefficients, with bignum arithmetic.
The easiest way to understand a program in VISTA
is with a procedure view or graph. Fig. 5 shows the reduction tree from the procedure view. This graph, displayed here without any condensation, has 2,235 nodes
and a height of 56. The interesting snail shape, where
the radial arms correspond to row calculations, indicates
that the rows, and therefore the computations, are growing in size. Near the root, a cyan distribution procedure
spawns the rows, and near the leaves, a sky-blue bignum
procedure adds coefficients. The size of the subtree (i.e.,
one row) is increased by one for every two rows. This
means that pairs of successive rows have the same number of
coefficients (because only the first half of the row is ever
computed, taking advantage of the symmetry of a row).
The two lines at the east side represent the expansion
of the final half row into a full row. This program illustrates how the user can roughly understand execution
characteristics from the procedure view, even without
knowing the precise details of the source code.
To analyze the parallelism of the execution graph,
we first examine load balancing among PEs. Good load
balancing among PEs does not necessarily mean efficient exploitation of parallelism. However, without fair
load balancing, full exploitation of parallelism cannot be
achieved. In VISTA, a fair color distribution in the PE
view or graph represents good load balancing. Figs. 6
(PE graph) and 7 (time graph) represent the execution
on five PEs. In the time view, the RGB color spectrum
from blue to magenta represents the complete execution
time. Because there are few visibly distinct colors in
this range, the same color in the time graph does not
necessarily represent the same time. If some nodes are
represented with the same color within the time graph,
and by the same PE color within the PE graph (i.e.,
all the nodes are executed by the same PE), then the
reductions were executed sequentially.
All five colors are distributed almost evenly in the
PE graph for Pascal, representing good load balancing.
To further analyze parallelism, both PE and time graphs
are used in conjunction. In the time view, the spectrum
is distributed radially, although not perfectly so. This
indicates that most rows were executed in parallel. Although the maximum parallelism is limited by the PEs
at five, again the vagueness of the RGB spectrum can
be misleading, making it appear as if there is more parallelism. This problem can be overcome to some extent
with a subtree display, where the spectrum is recycled
to represent time relative to the selected root.
Fig. 8 shows the single-PE time graph for Pascal. In
this graph, the spectrum is distributed laterally, around
the spiral. This distribution indicates that the nodes
were executed by depth-first search, the standard Panda
scheduling when no suspensions occur. By comparing
the two time graphs, we can infer the manner of scheduling: breadth-first on five PEs, and depth-first on one
PE, but without a PE view, we cannot conclusively infer parallelism. Fig. 6 shows that some rows are not
executed entirely by the same PE (i.e., task switches
938
occur within some rows). These characteristics indicate
that suspensions are occurring due to data dependencies
between successive rows of coefficients. All three figures
in conjunction indicate the "wavelike" parallelism being exploited as the leftmost coefficients of the Triangle
propagate the computation down and to the right.
4.2 Semigroup Problem
The Semigroup Problem is the closure under multiplication of a group of vectors [13]. The benchmark uses an
unbalanced binary hash tree to store the vectors previously calculated so that lookups are efficient when
computing the closure. Fig. 9 (PE graph) and Fig.
10 (time graph) were executed on five PEs. The total
number of nodes in the reduction tree is 15,419, and
the tree height h=174. In this experiment, the window
size was 850 x 850 and the level distance was four. The
maximum tree height to be displayed is calculated as
h_max = ⌊850/(4 × 2)⌋ = 106. Level condensing is performed
because h > h_max. The first 38 levels are not condensed,
but the remaining 136 levels are condensed by 2:1. This
example demonstrates some strong points of VISTA: (1)
After level condensing, the tree keeps its original shape;
(2) The window-space efficiency is very good. If the
tree were represented in a conventional way (propagating from the top of the window), representation would
be difficult, and space efficiency would be poor.
To understand the parallelism characteristics of Semigroup, load balancing among processors is analyzed first.
The most immediate characteristic of the PE graph is
that the reductions form the shape of many spokes or
threads of procedure invocations. Near the root are distribution nodes. Each thread represents a vector multiplication. As the graph shows, almost all threads were
executed without task switch. This indicates few suspensions due to lack of data dependencies, i.e., the vectors are not produced in the pipelined fashion of the
Pascal program. By eye, we judge that the five colors
in the PE graph are evenly distributed, indicating that
load balancing is good.
Lack of data dependencies between nodes is confirmed by a single-PE time graph (not shown). The color
distribution of this graph is similar to that of Fig. 10, indicating that as soon as the first node of the new thread
is spawned, both the child and parent threads were executed in parallel, without any suspensions. The reason that the threads are not executed clockwise or anticlockwise in Semigroup, as in Pascal, is that there were
some initial data dependencies near the root. These
dependencies, caused by hash-tree lookups for avoiding
recomputation of a semigroup member, cause critical
suspensions that "randomize" the growth pattern.
When the PE graph (Fig. 9) is viewed in conjunction
with the time graph (Fig. 10), parallelism can be analyzed in more detail. Threads with the same colors in
the time graph, and different colors in the PE graph, are
executed in parallel. The PE graph still has a fair number of threads per PE, indicating that not all potential
parallelism has been exploited and additional PEs will
improve speedup. These approximations can be refined
by examining subtree displays.
Historically, the Semigroup program analyzed above
was the result of a number of refinements from an original algorithm written by N. Ichiyoshi [13]. Earlier algorithms utilize a pipeline process structure, wherein new
tuples are passed through the pipe, and duplicates are
filtered away. Any tuple surviving the pipe is added as
a new filter at the end, and all of its products with the
"kernel" tuples (the program's inputs) are sent through
the pipeline. Although these algorithms are elegant, the
pipeline structure is a performance bottleneck. The version analyzed above utilizes a binary tree instead of a
pipeline, increasing the parallelism of the checks.
In retrospect, we see how VISTA could have helped
in developing these successive algorithms. Fig. 11 shows
the time graph for an older version of the program that
has the same complexity as the program analyzed above.
Thus the main difference is the pipeline bottleneck, which
is clearly indicated by the signature's snail shape. Unlike Pascal, time is not projecting radially, indicating
lack of wave parallelism. Successive tuples are dependent on previous tuples surviving the pipeline, and this
dependency is seen in the coloring (it could be better
viewed if the RGB spectrum were more distinguished).
The dependency is made explicit by clicking on nodes
to indicate the corresponding procedures. Fig. 10 radiates from the query, indicating the potential parallelism
afforded by a tree vs. a pipeline. The coloring further
indicates that the tree is not bottlenecked.
4.3 Instant Insanity
The Instant Insanity problem is to stack four four-colored
cubes so that the faces of each column of the stack display all four colors. This is a typical all-solutions search
problem with eight solutions. There are several methods for doing the search in a committed-choice language:
most notable are candidates/noncandidates and layered
streams [13]. The candidates method builds an OR-tree
where each node concerns whether the current candidate is consistent with the current partial solution. At
the root, all orientations of all cubes are candidates and
the partial solution is empty. At the leaves, no candidates remain and the partial solutions are complete.
Each node has two branches: one branch contains the
solutions that include the current candidate, and the
other branch contains solutions that do not include the
candidate. Layered streams is a network of filters that
eagerly attempt to produce a stream of solutions of the
form H*T. Here H is the first element shared by a set
of solutions, and T are the tails of these solutions. To
throttle excessive speculative parallelism, a "nil check"
is inserted at each filter to ensure that T must have at
least one element.
Layered streams has 9,094 reductions (nil check) and
9,775 reductions (without nil check). This increase of
7% reductions is because of two factors: the additional
speculative execution and the bloated conversion of the
now largely incomplete layered stream back into normal
form. Both of these effects are seen by comparing the
VISTA graphs (Fig. 12 and 13). The conversion routine is clearly viewed as a significant subtree without
the nil check, compared to a single thread with checking. The user can now appreciate the relative weight of
the conversion with respect to the entire search. The
speculative branches, however, do not stand out. This
would be an interesting application of user-defined trace
records, where a "trace dye" could be introduced with
a nil check that does not throttle the speculation.
The candidates program has 37,687 reductions, so
that VISTA must condense the image. The final image,
shown in Fig. 14, has 25,127 nodes. Examining the
structure and coloring of the layered-streams and candidates programs, there are no obvious parallelism bottlenecks in either (measurements of all three programs
showed equal PE utilization of 93-95%). From the time
graph coloring, the fine-grain parallelism of the filter
structure is apparent in the layered streams program.
The candidates graph shows large-grain structure, although we must view the time and PE graphs together
to ensure that PEs are equally distributed across time.
The simple examples analyzed here facilitate the exposition of VISTA. Intuitions gained for these programs
have been confirmed by timing measurements [13J. Programs without as much parallelism, and on larger numbers of PEs, can be similarly analyzed. As the number
of PEs grows, however, the tool approaches its limitations because the user can no longer distinguish between
the multiple colors representing the PEs. This is an important area of future research.
5 Conclusions and Future Work
This paper described the performance analysis of parallel logic programs using "kaleidescope visualization."
The VISTA system is an X-Windows realization of the
method, and is demonstrated in the context of parallel
FGHC programs. We showed how the user can tune a
large-trace program for performance by examining alternative abstract views of the execution. VISTA, because
of its efficient implementation, proved its merit in enabling rapid analysis of views. This tool complements,
but by no means replaces, other visualization methods,
e.g., animation of PE activity and message passing.
We are currently extending this research in several
areas. First, we need to experiment more with the current VISTA prototype, for various programming languages, to determine its utility. Second, coloration methods for combining the time and processor views need
exploration, e.g., a method of spectral superposition [1].
The author was supported by an NSF Presidential Young
Investigator award, with funding from Sequent Computer Systems Inc. Computer resources were supplied
both by OACIS and Argonne MCS. D.-Y. Park, in an
outstanding effort, implemented VISTA.
References

[1] J. A. Berton. Strategies for Scientific Visualization: Analysis and Comparison of Current Techniques. In Proceedings of Extracting Meaning from Complex Data: Processing, Display, Interaction, SPIE vol. 1259, pages 110-121, February 1990.
[2] T. Disz and E. Lusk. A Graphical Tool for Observing the Behavior of Parallel Logic Programs. In Inter. Symp. on Logic Prog., pages 46-53. IEEE Computer Society, August 1987.
[3] T. Disz et al. Experiments with OR-Parallel Logic Programs. In Inter. Conf. on Logic Prog., pages 576-600. MIT Press, May 1987.
[4] Y. Feldman and E. Shapiro. Temporal Debugging and its Visual Animation. In Inter. Symp. on Logic Prog., pages 3-17. MIT Press, November 1991.
[5] R. Fowler et al. An Integrated Approach to Parallel Program Debugging and Performance Analysis on Large-Scale Multiprocessors. SIGPLAN Notices, 24(1):163-173, January 1989.
[6] M. T. Heath and J. A. Etheridge. Visualizing the Performance of Parallel Programs. IEEE Software, pages 29-39, September 1991.
[7] K. M. Kahn and V. A. Saraswat. Complete Visualization of Concurrent Programs and their Executions. In IEEE Visual Language Workshop. IEEE Computer Society, October 1990.
[8] E. Lusk et al. The Aurora Or-Parallel Prolog System. In Inter. Conf. on Fifth Gen. Comp. Systems, pages 819-830, Tokyo, November 1988. ICOT.
[9] A. D. Malony and D. Reed. Visualizing Parallel Computer System Performance, pages 59-90. Addison-Wesley, 1990.
[10] M. Sato and A. Goto. Evaluation of the KL1 Parallel System on a Shared Memory Multiprocessor. In IFIP Working Conference on Parallel Processing, pages 305-318. Pisa, North Holland, May 1988.
[11] R. Scheifler and J. Gettys. The X Window System. ACM Trans. on Graphics, 5:79-109, April 1986.
[12] D. Socha et al. Voyeur: Graphical Views of Parallel Programs. SIGPLAN Notices, 24(1):206-215, January 1989.
[13] E. Tick. Parallel Logic Programming. MIT Press, Cambridge MA, 1991.
[14] E. Tick and D.-Y. Park. Kaleidescope Visualization of Fine-Grain Parallel Programs. In Hawaii Inter. Conf. on System Sciences, vol. 2, pages 137-148, Kauai, IEEE Computer Society, January 1992.
Figure 5: Graph of Pascal from Procedure View (5 PEs)

Figure 6: Graph of Pascal from PE View (5 PEs)

Figure 7: Graph of Pascal from Time View (5 PEs)

Figure 8: Graph of Pascal from Time View (1 PE)
Figure 9: Graph of Semigroup from PE View (5 PEs)

Figure 10: Graph of Semigroup from Time View (5 PEs)

Figure 11: Graph of Old Semigroup Algorithm from Time View (5 PEs)
Figure 12: Graph of Layered Stream Cubes with Nil
Check from Time View (5 PEs)
Figure 13: Graph of Layered Stream Cubes without Nil
Check from Time View (5 PEs)
Figure 14: Graph of Candidates Cubes from Time View
(5 PEs)
Concurrent Constraint Programs to Parse and Animate
Pictures of Concurrent Constraint Programs
Kenneth M. Kahn
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94304
kahn@parc.xerox.com
Abstract
The design and implementation of a visual programming
environment for concurrent constraint programming is described. The system is implemented in Strand, a commercially available concurrent logic programming language.
Three components are now operational and are described in
detail in this report; they are a parser, fine-grained interpreter, and animator of Pictorial Janus programs. Janus,
a concurrent constraint programming language designed to
support distributed computation, has much in common with
concurrent logic programming languages such as FGHC. The
design of a visual syntax for Janus called Pictorial Janus is
described in [KS90].
Visual programs can be created using any illustration or
CAD tool capable of producing a PostScript description of
the drawing. The Pictorial Janus Parser interprets traces of
PostScript executions and produces a textual clausal version
of the parsed picture which can be converted to Strand and
run as an ordinary Strand program.
The parser can also produce input to the Pictorial Janus Interpreter. The interpreter accepts as input a term representing
the program clauses and query. This term is annotated with
the colors, shapes, fonts, etc. used in the original drawing.
It spawns recurrent agents corresponding to each agent (i.e.
process or goal), rule (clause), message (term), port (variable), link (equality relation), and channel. These agents
interact to do the equivalent of clause reduction. The agents
also produce streams of major events (e.g. that some message moved and rescaled to the location and size of some
other message). These streams are merged and fed into the
Pictorial Janus Animator.
The animator generates a stream of animation frames and
associated sounds. The resulting frames can then be printed;
more importantly, they can be converted to a raster format
and recorded on video tape or animated on a workstation.
The colors, shapes, fonts, and line weights used in the original
drawing are preserved so that the animation displays these
elements in the same graphical terms as they were conceived
and created.
Various lessons were learned in the process of constructing
the system, ranging from parallel performance issues, to
deadlock, to trade-offs between the use of terms and agents.
1 Introduction
This paper presents a software architecture in which concurrent constraint (or logic) programming plays a predominant
role. The structure of this software and the programming
techniques used are described, and problems that arose and
the resulting redesigns are discussed. This paper is primarily about a large concurrent logic program and the fact that
the program is one which supports a programming environment for concurrent constraint programming is unimportant
to this paper. The purpose of this paper is rather to relate experiences in writing a large, complex, and somewhat
unusual application in a concurrent logic programming language. Much of the discussion centers around difficulties
in applying, adapting, and choosing between well-known
concurrent logic programming techniques. Other papers are
in progress which present the visual programming environment.
The software described is part of a "grand plan" in which
parsers, editors, source transformers, visualizers, animators,
and debuggers all work together to support a programmer
in constructing, maintaining, and understanding concurrent
constraint programs in a completely visual manner. This
work is driven by the belief that such an environment can
have a dramatic impact on the way in which software is
developed.
The grand plan is to support whole families of concurrent
constraint languages, including the familiar Herbrand family,
which includes FGHC [Ued85], Strand [FT89], and Andorra
Prolog [HJ90]. We also anticipate supporting constraint systems other than the traditional Herbrand constraints of logic
programming. Initially the system is being built to support
only a pictorial syntax for Janus [SKL90], a concurrent constraint language designed to address some of the needs of
distributed computing. Janus most closely resembles DOC
[Hir86], Strand, and FGHC.
An important aspect of the pictorial syntax of Janus is that
it is a complete syntax (i.e. anything expressible in textual
Janus is expressible in Pictorial Janus) and that the syntax
is based upon the topology of pictures. For example, a port
(i.e. a variable occurrence) is represented by any closed
contour which has no other elements of the picture inside. A
programmer is free to choose any size, color, shape, etc. for
the elements of the program. The syntax of Pictorial Janus
is discussed in greater detail in [KS90]. A simple example
944
program that appends two lists is shown in Figure 1.
Computation is visualized as the reduction of asynchronous agents. The rules for each agent are inside it.
If at least one of these rules can match, then the agent reduces. A matching rule expands and its "ask" devices (its
head and guard) match the corresponding devices attached
to the agent. The rule is then removed, and its body remains
connected to the pre-existing configuration by links. These
links represent equality relations and are collapsed bringing
the ports at each end together.
The matching of an append agent with its recursive rule
is shown in Figure 2. The matching rule contour expands
to match the contour of the agent. The messages and ports
rescale and translate to match the corresponding ports and
messages of the agents. In Figure 3 the commitment of a
rule is depicted. It shows the agent and the matched elements dissolving away leaving the configuration in the body
of the rule connected to the configuration of the computation. Figure 4 shows changes which have no semantic
meaning and are performed to tidy up the picture. Links in
the configuration establish equality relations between ports
and can be shrunk to zero, thereby bringing the equivalent
ports together. Newly spawned agents are scaled.
2 Pictorial Janus System Architecture
Figure 5 attempts to capture the essential modules and data
of a complete Pictorial Janus programming environment. It
depicts the various processing stages which take Pictorial
Janus program drawings to either a textual form for ordinary
compilation or to animations of its execution.
Source programs are drawings in PostScript. PostScript is
well-suited for this because of its ability to describe curves,
colors, fonts, etc. in a flexible and general manner. Since
PostScript is a common page description language for laser
printers, every modern illustration or computer-aided design
program is capable of producing a PostScript description of
a drawing. This is analogous to the situation in textual programming where the text file for a program can be produced
by any text editor. An alternative to PostScript input yet to be
explored is a custom structure editor that only allows the construction of syntactically correct pictorial programs and can
maintain a semantic representation of the program. Another
source of PostScript is from automatic tracing tools such as
Streamline from Adobe which converts scanned images of
hand-made drawings into PostScript strokes.
The problem of discovering the underlying program from
a PostScript description is complicated by the fact that
PostScript is a full programming language. This is analogous to the situation in conventional languages with sources
which require pre-processing. Such sources are not parsed
by a compiler; instead the output of a pre-processor run on
those sources is. We handle this by executing the PostScript
with an ordinary PostScript interpreter in an environment
which redefines the graphical primitives that draw strokes,
rule(a57(272,485,308,521),
     append(port(p61(box(273,489,276,492))),
            port(p59(box(276,516,279,519))),
            port(p63(box(309,501,312,504)))),
     [equal(c6(312,504,324,516),
            port(p63(box(309,501,312,504))),
            $(port(p65(box(324,516,327,519))))),
      equal(m55(262,488,277,493),
            port(p61(box(273,489,276,492))),
            [])],
     [equal(l5(280,518,324,518),
            port(p59(box(276,516,279,519))),
            port(p65(box(324,516,327,519))))],
     [])
Figure 6: Annotated Janus Parse of the base case of Append
show text, etc. to, instead, print a trace of their calls to a file.
The Pictorial Janus Parser is the module which accepts
such traces of calls to PostScript graphical primitives and
produces a parse in a format called "annotated Janus". This
format captures the parse tree of the program picture and
maintains correspondences with the original graphical appearances. These correspondences consist of annotations
which give the animator guidance in choosing the appearance, position, and scale of various program elements. They
are ignored if the program is simply to be compiled and
executed without the production of an animated trace. Annotated Janus is the "lingua franca" of the system. It can be
produced by the parser, by a visualizer from textual Janus
to Pictorial Janus, by a custom structure editor, or by a program transformation tool. It can be used by visual debuggers,
animators, or program transformation tools, or it can be converted to textual Janus for ordinary compilation and execution. Figure 6 contains the annotated Janus for produced by
parsing the base case rule of append in Figure 1. (Constants
such as "p61" also name PostScript drawing procedures.)
A central component of the system is a fine-grained interpreter for annotated Janus. As it interprets the program
it produces a stream of events describing activities for each
element of the computation (i.e. each agent, rule, port, channel, message, etc.). The event descriptions include a start
and end time. By default, the interpreter performs every
reduction as soon as possible. This corresponds to a maximally concurrent scheduler. The scheduler can currently be
customized to some extent. It can follow a schedule based
upon the trace of real execution on a parallel machine or
network.
The third major component is the Pictorial Janus Animator. It accepts the stream of event descriptions from the
fine-grained interpreter and some layout control and produces PostScript describing each individual frame. This
PostScript can be printed, converted to raster for viewing or
video taping, or converted to film.
Other components of the system such as the "visualizer"
which converts textual Janus to Pictorial Janus and a spe-
Figure 1: A Simple Example Program to Append Lists

Figure 2: The Animation of a Successful Rule Match

Figure 3: The Animation of a Rule Commitment

Figure 4: The Animation of Links Shrinking and Agents Rescaling
Figure 5: Overall Architecture of the System
cialized pen-based editor for Pictorial Janus are under development and will be discussed in future papers. Additional
components such as an interactive visual debugger, a program transformation tool, and a partial evaluation system based
upon Pictorial Janus are only in the planning stages.
The three major components (the parser, interpreter and
animator) are operational prototypes and are discussed in
further detail in the rest of this paper.
3 Pictorial Janus Parser
The parser begins with an unordered set of line and curve
segments and located text which is the trace from executing
a PostScript description of a Pictorial Janus drawing (see
Figure 7). These elements are analogous to the "alphabet"
of the language. The first phase of the parser is a sort of
"tokenizing" where abutting curves are joined, closed curves
detected, and arrows recognized. As with most of the phases,
some tolerance for sloppy drawings is allowed.
To reduce the complexity of further operations, a containment tree of the elements is constructed. The containment
tree associates with each closed curve the sets of closed
curves directly contained within, as well as the end points
of open curves and arrows and the text directly contained
within. Since many parsing decisions depend upon which
ps(stroke,4,
   [moveto(496,483),
    curveto(496,487,485,490,472,490),
    curveto(459,490,448,487,448,483),
    curveto(448,479,459,475,472,475),
    curveto(485,475,496,479,496,483)],
   box(448,475,496,490),
   [eofill, setrgbcolor(1,1,1), setdevicelinewidth(1)]).
Figure 7: Sample Trace of PostScript Execution
elements are closest to other elements in the same containment level, the containment tree reduces the amount of search
necessary.
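A containment tree of this kind can be sketched with bounding boxes (Python, with hypothetical data structures; the actual parser works on the joined PostScript curves). Each closed curve is nested under the smallest closed curve that encloses it, and a port then shows up as a closed curve whose node has nothing inside.

    # Hypothetical sketch of the containment tree over closed-curve bounding boxes.
    def contains(outer, inner):
        """True if box outer = (x0, y0, x1, y1) strictly encloses box inner."""
        return (outer[0] < inner[0] and outer[1] < inner[1] and
                outer[2] > inner[2] and outer[3] > inner[3])

    def containment_tree(curves):
        """curves: name -> bounding box.  Returns name -> list of direct children."""
        children = {name: [] for name in curves}
        for name, box in curves.items():
            enclosing = [(o, ob) for o, ob in curves.items()
                         if o != name and contains(ob, box)]
            if enclosing:   # parent = the smallest enclosing curve, if any
                parent = min(enclosing,
                             key=lambda e: (e[1][2] - e[1][0]) * (e[1][3] - e[1][1]))[0]
                children[parent].append(name)
        return children

    curves = {"agent": (0, 0, 100, 100), "rule": (10, 10, 90, 90), "port": (20, 20, 30, 30)}
    tree = containment_tree(curves)
    ports = [n for n, kids in tree.items() if not kids]   # nothing inside => port
    print(tree, ports)   # {'agent': ['rule'], 'rule': ['port'], 'port': []} ['port']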
The parse proceeds by a series of phases.
• Identification of Ports. A port is defined as a closed
contour with nothing inside of it. Once the containment
tree is completely built this is a trivial test.
• Classification of Arrows. Arrows are used for two purposes in Pictorial Janus: channels between ports (i.e.
distinguishing between the asker and teller occurrences
of variables) and the association of agents with rules.
Once ports are identified it is easy to distinguish the two
cases since channels are arrows between ports while
definition arrows connect agents.
• Attachment of Ports. Ports cannot be free-standing;
they must be attached to either an agent contour, a rule
contour, a message contour, or the head of a channel
arrow. Essentially the port attaches to the nearest syntactically correct element. This phase also identifies which
ports are the internal ports of messages. Messages are
identified by having only an internal port and possibly
a label inside.
• Connecting Links. Open curves depict links which connect ports. This phase determines which port is closest
to the end of a link. The connecting port can be up or
down one containment level from the level of the link
end.
After these phases have completed, the parser can generate Annotated Janus or textual Janus by descending from
the top node in the containment tree and collecting information. Agents are distinguished from rules here by alternating
containment levels (i.e. the top level contains agents which
contains rules which contain agents and so on). Ports of
agents, rules and messages are collected into a list by going
clockwise from a distinguished port.
Early versions of the parser represented picture elements
by terms. Initially, little was known about the terms, so they
contained many unbound logic variables for their role, their
attached ports, their label, etc. Lists posed problems since it
can't be known beforehand how many elements they have.
If tails are left uninstantiated then at the phase where no more
elements can be added, some process must find these tails and
bind them to the empty list. The lists are constructed in the
order the elements were discovered; another logic variable
was needed to hold the sorted list. This implementation
became more cumbersome and was forced to rely upon some
questionable primitives.
Because of these problems, the parser was completely
rewritten to represent picture elements by recurrent agents
(processes). Lists are no longer a problem since agents can
simply recur with a different list. Sorting the list is equally
straightforward.
This use of recurrent agents is an object-oriented programming style [ST83]. Unlike traditional object-oriented
programming systems, however, the underlying flexibility of
concurrent logic programming can be used to incrementally
refine the type or class of elements. This might be called
"object-oriented recognition". Closed contours, for example, begin as generic •'vanilla" nodes. As relationships are
discovered, a node may specialize itself to a port or non-port.
Figure 8 is a sketch of the code for determining whether a
node is a port or not.
Similarly non-ports may be further specialized as messages, rule contours, or agent contours depending upon their
local relationships. The locality in this case is between a contour and the elements directly contained within and elements
contained in the same contour.
In most object-oriented systems an instance cannot easily (or at all) change to be the instance of another class, even if that class is a subclass of the original class.

    node(In, Contents, State) :-
        In = [identify_ports|In1],
        Contents = [] |
            port(In1, State).
    node(In, Contents, State) :-
        In = [identify_ports|In1],
        Contents =\= [] |
            non_port(In1, Contents, State).

Figure 8: An Example of Incremental Class Refinement
4 Pictorial Janus Interpreter
A detailed trace of every reduction in a computation is needed
by the animator. A meta-level interpreter is too coarse for
this purpose. Instead a fine-grained interpreter which can
report events such as each subterm match is needed.
The fine-grained interpreter of Pictorial Janus is constructed out of recurrent agents. Agents are spawned which
represent each element of a program or configuration (programs and configurations are treated identically). There are
agents for each rule (clause), port (variable), message (term),
link (equality relation), channel (asker and teller pairs) and
agent (process). They emulate the ordinary execution at a
message-passing level. An agent reduces by spawning an
arbiter and sending a message to each of its rules. The rules
reduce by sending match messages to each of their ports with
streams to the corresponding ports of the agent. If all of the
ports respond with a possible match then the rule sends a
message to the arbiter. The first rule to send a message to
the arbiter then commits and the others eliminate themselves.
A committing rule spawns new agent, port, rule, message,
channel and link agents.
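The commit protocol can be caricatured with a minimal sketch (Python with threads, purely illustrative; the real interpreter is built from Strand agents communicating over streams): each rule reports to an arbiter once its guard has matched, and only the first rule to reach the arbiter commits while the others eliminate themselves.

    # Illustrative arbiter: the first rule whose guard matches wins the commitment.
    import threading

    class Arbiter:
        def __init__(self):
            self._lock = threading.Lock()
            self.winner = None
        def request_commit(self, rule_id):
            with self._lock:                      # only one rule can commit
                if self.winner is None:
                    self.winner = rule_id
                    return True                   # this rule commits
                return False                      # the others eliminate themselves

    def rule(rule_id, guard_matches, arbiter, results):
        if guard_matches and arbiter.request_commit(rule_id):
            results.append(rule_id)               # the rule body would be spawned here

    arbiter, results = Arbiter(), []
    threads = [threading.Thread(target=rule, args=(i, True, arbiter, results))
               for i in range(3)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("committed rule:", results)             # exactly one rule commits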
The agents of the fine-grained interpreter also generate
a stream of events. For example, when a rule commits it
produces two event descriptions. The first indicates that the
rule contour should transform to match the contour of the
agent it is reducing. As with all event descriptions, it also
indicates the start and stop time for this activity. These times
are computed based upon a specification of the scheduler.
The second event describes the removal of the rule. All the
event streams are merged to produce a time-ordered stream
of events.
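The time-ordered merge of per-agent event streams is straightforward; a sketch (Python, with made-up event tuples) using a k-way merge keyed on the event start time:

    # Merge per-agent event streams into one stream ordered by start time.
    import heapq

    stream_a = [(0.0, 1.0, "rule r1 expands to agent a1"),
                (2.5, 3.0, "rule r1 is removed")]
    stream_b = [(1.0, 2.0, "message m1 moves to port p2")]

    merged = list(heapq.merge(stream_a, stream_b, key=lambda e: e[0]))
    for start, end, what in merged:
        print(start, end, what)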
One problem with the fine-grained interpreter is how it
interprets pictorial programs which deadlock. Each rule
agent suspends, waiting to hear from its ports how the match
went. A port in turn passes the match request to its attached
message. The message asks the corresponding port of the
reducing agent for a description of its attached message.
If there is no message there then the whole collection of
rule, message and port agents suspend until a message is
948
connected to the port. In a deadlocked computation there
never will be an attached message. These suspended agents
are unable to produce events, which in turn prevents the
ordered merge process from producing events; the whole
production of the stream of events is cut off.
In Strand it is possible to work around this by relying upon
the questionable "idle" guard that suspends until the whole
system is idle. The message agents waiting for a response
from the corresponding message which are also part of an
idle system can then proceed to return a match failure or
signal an exception and the interpretation can proceed. It
is possible to detect deadlock in a more principled manner
[SWKS88], but the price is a significantly more complex and
verbose program.
A related problem is controlling the arbiter between competing rule commitments. For example, a merge agent with
inputs on both incoming ports can reduce with either rule.
Which rule is chosen depends upon which one is the first
to get a message to the arbiter of the reduction. Consequently, the fine-grained interpreter selects between competing clauses depending upon the scheduler of the underlying
Strand implementation. When run on a single processor this
means that the same rule is always chosen. To make more interesting animations a random number generator was needed
to remove these biases.
5 Pictorial Janus Animator
The Pictorial Janus Animator consumes the stream of events
produced by the fine-grained interpreter. It also can be given
layout and viewpoint instructions. It produces a stream of
animation frames in PostScript. The animator currently models space as a sequence of ten planes. The graphics of lower
planes can be obscured by the graphics of higher planes. The
planes are infinite in extent but only a portion is "viewed"
at any one time.
The animator accepts event descriptions describing events
whose times are described by real numbers. Given a frame
rate (i.e. a sampling rate), these are converted to frame numbers. The animator is like a discrete-time simulator where
on every "tick" every component needs to compute its next
state.
For each kind of event, the animator has methods for depicting it. A typical method might transform one element
to gradually match another (currently the transform involves
translation, scaling, and rotation). For example, a message
matches another message by incrementally changing its position and size until it has the same bounding box as the
other message. The other message may be changing and
the animator needs to adjust the transformation accordingly.
Furthermore, the method must maintain various constraints
on the matching message contour so that it remains in contact with other elements. In order for a method to transform
an element based upon the position of others, the animator
maintains transformation "histories" for each picture ele-
ment. The history of a visual element is a list of transform
matrices, one for each frame. A frame is constructed by
selecting from each history the appropriate transformation
to apply to the appearance of each element.
The histories are also used to deal with graphical interactions between elements. For example, if a port is to transform
itself to match another port which itself is moving, then on
each frame the position of the tracking port is a function of
where it is, where the other port is, and the amount of time
before they meet. An interesting alternative is that it is a
function, not of where the other port is on each frame, but
where it will be at the time of the meeting. As illustrated
in Figure 9, the former corresponds to one port chasing another, while the latter is more like a rendezvous. Generally,
the rendezvous looks better but it requires "knowledge of
the future". With care it is possible to avoid cycles of such
requirements of future knowledge that would lead to a deadlock.
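As a rough sketch of the difference (not the paper's actual formulation): with F frames remaining before the meeting, a chase-style method can move the tracking port's position P toward the other port's current position Q by P := P + (Q - P)/F on each frame, whereas a rendezvous-style method uses the other port's predicted final position Qend instead, P := P + (Qend - P)/F, which is why it needs "knowledge of the future".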
The first time the animator was run on a large problem
(i.e. one requiring several million reductions), it ran out of
memory. Increasing the amount of available memory to
30 or 40 megabytes helped but then it ran out again for
somewhat larger tasks. The cause of this kind of problem
is very difficult to track down. After much experimentation
it was discovered that the problem was that agents inside
the animator were producing information faster than other
agents were able to consume it. Memory was being used
to "buffer" the messages from the producers to the lagging
consumers.
This is a well-known problem and there is a well-known
concurrent logic programming technique called "bounded
buffers" [TF87] for dealing with it. The simple case of a
single producer and a single consumer is rare in the animator
and a more complex variant was needed to deal with consumers of streams that have multiple producers (typically
combined by an ordered merge). This fixed the problem but
significantly increased the complexity of the source code.
Many subtle bugs cropped up which eventually were traceable to some piece of code not following the bounded-buffer
protocol correctly. These were hard to debug because they
resulted in deadlocks of thousands of agents.
Another shortcoming of using bounded-buffers is that it is
difficult to tune for different language implementations and
hardware platforms. Under some schedulers all this complexity is unneeded because the scheduler runs consumers
first. To both simplify the code and increase its flexibility,
the bounded buffer technique was abandoned and instead
each agent was programmed to know the animation frame
number it is contributing to and which was the last frame to
be completed. Producers are now controlled by an integer
indicating how many frames ahead they are allowed to proceed. If this is set to a number larger than the total number of
frames in the animation, then buffering is effectively turned
off. The use of frame numbers to control producers is easy
to generalize to other problem domains such as simulation,
but it is not as general as the bounded-buffer technique.
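A minimal sketch of this producer control, under assumed names (produce, Done and frame/1 are illustrative, not the system's actual code): Ahead is the integer bound on how far producers may run ahead, and Done is a stream on which the number of the last completed frame is reported.

% Emit frames while within the allowed window; otherwise wait for a
% completed-frame report and advance the window.
produce(Frame, Limit, Ahead, Done, Out) :- Frame =< Limit |
    Out = [frame(Frame)|Out1],
    Frame1 := Frame + 1,
    produce(Frame1, Limit, Ahead, Done, Out1).
produce(Frame, Limit, Ahead, Done, Out) :- Frame > Limit, Done = [Last|Done1] |
    Limit1 := Last + Ahead,
    produce(Frame, Limit1, Ahead, Done1, Out).

If Ahead exceeds the total number of frames, the first clause always applies and the throttling disappears, matching the behaviour described above.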
[Figure 9: An Illustration of a Chase in Contrast to a Rendezvous. Top panel ("Meeting"): A moves towards B, who moves towards C. Bottom panel ("Rendezvous"): A moves towards where B will be at the end, and B moves to where C will be at the end.]
When bounded buffers were first introduced the animator
would deadlock sometimes. After a few days of investigation it turned out that the problem was an interaction with
the method for having two ports meet. Recall that there are
two alternatives: "chase" and "rendezvous". Rendezvous
requires knowledge of where the other port will be at the end
of the event. It turned out that the system was deadlocking
whenever the buffer size was smaller than the number of
frames needed to animate a port meeting. Once discovered
it was easy to conditionalize the meet method to use the rendezvous style if the buffers are large enough and otherwise
use the chase style.
Concurrent logic programs can be written without carefully ordering events since the basic computational mechanism reorders events based upon data dependencies. This
greatly simplified the construction of both the fine-grained
interpreter and the animator. Each event can be handled
independently regardless of whether the data it needs from
other concurrent activities has been produced. In a sequential language the programs would have to have been carefully
constructed so that, say, the agent contour changes are computed before the dependent changes on their ports.
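A tiny illustration (not from the paper) of this order independence: the two body goals of demo/1 below may be written in either order, since the arithmetic goal in double/2 simply suspends until make_width/1 has bound W.

% Hedged sketch: goal order in the body does not matter.
demo(A)       :- true | double(W, A), make_width(W).
make_width(W) :- true | W = 3.
double(W, A)  :- true | A := W * 2.   % suspends until W is bound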
6 Preliminary Performance Results
The parser, interpreter and animator are implemented in over
11,000 lines of Strand code. The only important component
in C is a routine which finds the closest point on a Bezier
curve to another point. A typical parse takes a few CPU
minutes (on a SUN Sparc 2). The interpreter typically takes
a few CPU minutes as well. The animator typically takes
tens of CPU minutes (10 to 20 million reductions is not
uncommon).
The sequential execution of the system cannot be sped up
much by optimizing the Strand code or replacing it with C.
For the parser nearly half of its time is spent in the C routine
for finding the closest point to a curve. The Strand code
of the animator takes a third of the total time to produce
an animation; the PostScript rendering to raster takes up the
rest.
It would seem that parallel execution should speed up
the system significantly. Preliminary results have, however,
been disappointing. Possible reasons include:
• Communication costs. The coding style used strived
for maximal parallelism but little attention was paid to
the amount of information passed between agents. On
a good shared-memory implementation, this would not
be a factor. There are many cases where much of this
communication can be programmed away. For example, rather than communicate large shared structures
between agents, each processing node could have its
private copy and the messages between nodes would
just contain tokens referring to elements of these structures. This rewriting has yet to be done. It would also be
counter to the dream of concurrent constraint programming (including the special case of concurrent logic
programming) that straightforward high-level portable
programs can run efficiently in different environments
without major revision. Some rewriting has been done
to enable experimentation with parallelism. For example, the output of the animator previously was a large
PostScript file and now is a set of files, one for each
frame.
• Agent to Processor mappings. Experiments to date have
used agent-to-processor mapping annotations. While a
few different mappings have been tried it is possible
that a good one exists but has yet to be discovered. No
experiments using a load balancing scheduler have been
tried.
Speedups of a factor of 2 to 3 were easily obtained by
spawning Unix-level processes to convert PostScript to raster
format on separate processors in parallel.
7 Conclusions and Future Work
The building of a large prototype visual programming environment in a concurrent logic programming language was
described. The architecture was presented and some experiences and lessons learned were described. These lessons
range from the trade-offs between using messages (terms)
and recurrent agents, to difficulties with producers getting
too far ahead of consumers, to dealing with deadlock.
For sequential executions the overhead of using a concurrent logic programming language was small. For parallel
executions on distributed memory machines, speedups are
not readily available and appear to require program rewriting and/or very clever distributions of agents and data to
processors.
The system is under development. Current plans include
extending the animator to deal with both spatial and temporal
abstractions. The animator needs to deal better with the
layout of elements. The parser needs to be revised to deal
robustly with hand-drawn input. Support for primitives and
foreign procedure calls are needed. The interpreter needs
to be able to accept general scheduler specifications. The
animator is currently able to produce a simple sound track
synchronized with the animation. The sounds depend upon
the kind of activities occurring. This should be extended
to differentiate between different elements involved in the
activities.
A very challenging direction for future development is to
build a "real-time" version of the system that the user can
influence as the computation proceeds. This could lead to
very powerful debugging tools. It could also be the basis
for user interfaces that are simultaneously interactive visual
programs. Such a system would need to run on platforms
capable of many millions of reductions every second.
8 Acknowledgements
The design of this system benefited from discussions with
Vijay Saraswat and Volker Haarslev. I am grateful to Mary
Dalrymple, Vijay Saraswat, and Markus Fromherz for comments on earlier drafts of this paper.
References
[FT89]    Ian Foster and Stephen Taylor. Strand: A practical parallel programming language. In Proceedings of the North American Logic Programming Conference, 1989.

[Hir86]   Masahiro Hirata. Programming language doc and its self-description, or, x=x considered harmful. In 3rd Conference Proceedings of Japan Society for Software Science and Technology, pages 69-72, 1986.

[HJ90]    Seif Haridi and Sverker Janson. Kernel Andorra Prolog and its computation model. In Proceedings of the Seventh International Conference on Logic Programming, June 1990.

[KS90]    Kenneth M. Kahn and Vijay A. Saraswat. Complete visualizations of concurrent programs and their executions. In Proceedings of the IEEE Visual Language Workshop, October 1990.

[SKL90]   Vijay A. Saraswat, Kenneth Kahn, and Jacob Levy. Janus - A step towards distributed constraint programming. In Proceedings of the North American Logic Programming Conference. MIT Press, October 1990.

[ST83]    Ehud Shapiro and A. Takeuchi. Object oriented programming in Concurrent Prolog. New Generation Computing, 1:25-48, 1983.

[SWKS88]  Vijay A. Saraswat, David Weinbaum, Ken Kahn, and Ehud Shapiro. Detecting stable properties of networks in concurrent logic programming languages. In Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing (PODC 88), pages 210-222, August 1988.

[TF87]    A. Takeuchi and K. Furukawa. Bounded buffer communication in Concurrent Prolog. In Concurrent Prolog: Collected Papers, volume 1, chapter Bounded Buffer Communication in Concurrent Prolog, pages 464-476. The MIT Press, 1987.

[Ued85]   K. Ueda. Guarded Horn Clauses. Technical Report TR-103, ICOT, June 1985.
Logic Programs with Inheritance
Yaron Goldberg, William Silverman, and Ehud Shapiro
Department of Applied Mathematics and Computer Science
The Weizmann Institute of Science
Rehovot 76100, Israel
Abstract
It is well known that while concurrent logic
languages provide excellent computational support for object-oriented programming they provide poor notational support for the abstractions
it requires. In an attempt to remedy their main
weaknesses - verbose description of state change
and of communication and the lack of class-like inheritance mechanism - new object-oriented languages were developed on top of concurrent logic
languages.
In this paper we explore an alternative solution: a notational extension to pure logic programs that supports the concise expression of both
state change and communication and incorporates
an inheritance mechanism. We claim that combined with the execution mechanism of concurrent
logic programs this notational extension results
in a powerful and convenient concurrent object-oriented programming language.
The use of logic programs with inheritance had a
profound influence on our programming. We have
found the notation vital in the structuring of a
large application program we are currently building that consists of a variety of objects and interfaces to them.
1 Introduction
We share with the Vulcan language proposal [8] the
view on the utility of concurrent logic languages as
object-oriented languages:
"The concurrent logic programming
languages cleanly build objects with
changeable state out of purely side-effect-free foundations.
[...]
The resulting system has all the fine-grained concurrency, synchronization, encapsulation,
and open-systems abilities of Actors [4].
In addition, it provides unification, logic
variables, partially instantiated messages
and data, and the declarative semantics
of first-order logic.
Abstract machines and corresponding concrete implementations support the
computational model of these languages,
providing cheap, light-weight processes
[... ] Since objects with state are not taken
as a base concept, but are built out of
finer-grained material, many variations
on traditional notions of object-oriented
programming are possible. These include
object forking and merging, direct broadcasting, message stream peeking, prioritized message sending, and multiple message streams per object."
See also [7] for a more recent account of the
object-oriented capabilities of concurrent logic programs.
We also share with the designers of Vulcan the
conclusion that:
"While [concurrent logic languages]
provide excellent computational support
[for object-oriented programming], we
claim they do not provide good notation
for expressing the abstractions of object-oriented programming."
However, we differ in the remedy. While Vulcan and similar proposals [2,13] each offer a new
language, whose semantics is given via translation to concurrent logic languages, we propose a
relatively mild notational extension to pure logic
952
programs, and claim that it addresses quite adequately the needs of the object-oriented programmer. Specifically, our notation addresses the main
drawbacks of logic programs for object-oriented
programming: verbose description of objects with
state, cumbersome notation for message sending
and receiving, and the lack of a class-like inheritance mechanism that allows the concise expression of several variants of the same object. We
explain the drawbacks and outline our solutions.
Inheritance

In certain applications, most notably graphical user interfaces, many variants of the same object are employed to cater to various user needs and to support smooth interaction. In the absence of an inheritance mechanism, a variant of a given object must be defined by manually copying the description of the object and editing it to meet the variant specification. Both development and maintenance are hampered when multiple copies of essentially the same piece of code appear within a system. Class-based inheritance mechanisms provide the standard solution for defining multiple variants of the same basic object in a concise way, without replicating the common parts. In this paper we propose an inheritance mechanism for logic programs, called logic programs with inheritance, or lpi for short.

The idea behind lpi is simple. When a procedure p inherits a procedure q via the inheritance call +q(T1, ..., Tk), add to p's clauses all of q's clauses, with the following two basic modifications:

1. Replace the head predicate q by the head predicate p, with "appropriate arguments".

2. Replace recursive calls to q by "corresponding calls" to p.

The formal definition of lpi will make precise the meaning of the terms "appropriate arguments" and "corresponding calls". The effect of inheritance is that behaviors realizable by q are also realizable by p, as expected.

Using common object-oriented terminology, lpi can be characterized as follows:

• Predicates are classes: Logic program predicates are viewed as classes, and procedures (i.e., predicate definitions) as class definitions. Classes can be executed as well as inherited; that is, superclasses are executable in their own right.

• Clauses are methods: In the concurrent reading of logic programs, each clause in a procedure specifies a possible process behavior. Inheriting a procedure means incorporating appropriate variants of the clauses of the inherited procedure into the inheriting procedure.

• Multiple inheritance: A procedure may inherit several other procedures.

• Parameterized inheritance: Inheritance is specified by "inheritance calls", which may include parameters. Hence an inheriting procedure may inherit the same procedure in several different ways, using different parameters in the inheritance calls.

We shall see examples for these features in the following sections.
We note that the inherent nondeterminism of
logic programs (and the inherent indeterminacy of
concurrent logic programs) can accommodate conflicts in inherited methods with no semantic difficulty. If necessary, an overriding mechanism can
be incorporated to enforce a preference of subclass
methods over superclass ones.
Implicit Arguments
Objects with state, accessible via messages, are realized in concurrent logic programs by recurrent
processes. Typically, such a recurrent process has
one or more shared variables and zero or more private state variables. The expression of a recurrent
process by a concurrent logic program has the general form:
p(... old state of process ...) :-
    ... message receiving and sending, ...
    p(... new state of process ...).
When a process has several variables, where only
a few of them are accessed or changed on each
process reduction, then the notation of plain logic
programs becomes quite cumbersome. This is due
to the need to specify explicitly all the variables
that do not change in a process's state transition
twice: once in the "old" state in the clause head,
and once in the "new" state in the clause body. In
addition, different names need to be invented for
953
the old and new incarnations of a state variable
that did change in the transition.
This verbose syntax introduces the possibility
of trivial errors and reduces the readability of programs, since it does not distinguish the important
parts (what has changed) from other details (repetition of the unchanged parts).
We define a notational extension, independent
of the inheritance notation, called implicit arguments, to support the concise expression of recurrent processes. The notation allows specifying
what has changed in the process's state during a
state transition, rather than the entire old and new
states required by a plain logic program, thus effectively providing a frame axiom for the state of
recurrent processes. The semantics of the extended
notation is given in terms of its translation to plain
logic programs.
Streams are the most commonly used data structure in concurrent logic programs. To describe
sending or receiving a message M on a stream XS
one equates Xs with a list (stream) cell whose head
is M and tail is a new variable, say Xs', as in Xs = [M | Xs']. In this notation the "states" of the
stream before and after the communication have to
be named and referred to explicitly. We propose a
notation that, by exploiting the implicit arguments
notation, refers only once to the stream being used.
In practice we combine the two notational extensions, inheritance and implicit arguments, into one
language. We find the resulting language greatly
superior to the "vanilla" syntax of (concurrent)
logic programs.
In the rest of the paper we formally define the
inheritance notation, the implicit arguments notation and give examples of their use.
2 Logic Programs with Inheritance

2.1 An Example
The inheritance notation is an extension to plain
logic programs which allows inheritance calls,
which are calls of the form +p(T1, ..., Tn). Using
object-oriented terminology, we refer to a predicate
as a class and to a procedure as a class definition.
Each clause (or disjunct in a disjunctive form) of
the procedure is viewed as a method which manipulates the class's state.
As an example of inheritance consider the following well known logic program which manipulates a
simple counter:
counter(In) :-
    counter(In, 0).

counter([clear|In], _) :-
    counter(In, 0).
counter([add|In], C) :-
    C' := C + 1, counter(In, C').
counter([read(C)|In], C) :-
    counter(In, C).
counter([], _).
An alternative representation of a logic program
can be in a disjunctive form, where all clauses of
a predicate are written with separating semicolons
and are under a single, simple head (an atom is
simple if its arguments are distinct variables). The
translation between plain logic programs and logic
programs in disjunctive form is trivial.
Put in disjunctive form, the definition of
counter/2 would appear as:
counter(In, C) :-
    In = [clear|In'], counter(In', 0);
    In = [add|In'], C' := C + 1,
        counter(In', C');
    In = [read(C)|In'], counter(In', C);
    In = [].
We illustrate the inheritance notation by adding
a feature to counter, which enables us to retain a
backup value of the counter, named BackUp. The
checkpoint method backs up the counter value and
restore restores its value from the backup. The
syntactic changes from the previous counter version are an added argument and two new disjuncts.
Using the inheritance notation we would write this
as:
counter2(In) :-
    counter2(In, 0, 0).

counter2(In, C, BackUp) :-
    +counter(In, C);
    In = [checkpoint|In'],
        counter2(In', C, C);
    In = [restore|In'],
        counter2(In', BackUp, BackUp).
This procedure stands for:
counter2(In, C, BackUp) :-
    In = [checkpoint|In'],
        counter2(In', C, C);
    In = [restore|In'],
        counter2(In', BackUp, BackUp);
    In = [clear|In'],
        counter2(In', 0, BackUp);
    In = [add|In'], C' := C + 1,
        counter2(In', C', BackUp);
    In = [read(C)|In'],
        counter2(In', C, BackUp);
    In = [].

2.2 Syntax

A logic program with inheritance, lpi, is a set of procedures, each having a unique head predicate. Each procedure is a disjunctive clause of the form:

    p(X1, X2, ..., Xn) ← α1; ...; αk.

where n, k ≥ 0, the Xi's are distinct variables and each αi is either a conjunctive goal or an inheritance call of the form +q(Xi1, ..., Xim), where the ij's are distinct and 1 ≤ ij ≤ n for every j, 1 ≤ j ≤ m. Note that if p/n inherits q/m the definition implies that m ≤ n.

An inheritance graph for an lpi program P is a directed graph (V, E) where V is the set of predicates defined in P and, for every inheritance call to a predicate q in the procedure of p in P, (p, q) ∈ E. An lpi program P is well-defined if the graph (V, E) is well-defined (i.e., every predicate that occurs in E is a member of V) and acyclic.

For convenience, we employ the following syntactic default. Suppose the predicate q is defined by:

    q(Y1, ..., Ym) ← β1; ...; βr.

Then the inheritance call +q is a shorthand for +q(Y1, ..., Ym).

2.3 Semantics

The semantics of a well-defined logic program with inheritance P is given by the following unfolding rule, whose application to completion to P results in a logic program in disjunctive form. In the following rule p, q are predicates, the S's are terms, X and Y are logic variables, and the αi, βi are disjuncts.

Lpi Rule: Replace the clause:

    p(X1, X2, ..., Xn) ← α1; ...; +q(Xi1, ..., Xim); ...; αk.

where q is defined by the (renamed apart) clause:

    q(Y1, ..., Ym) ← β1; ...; βr.

with the clause:

    p(X1, X2, ..., Xn) ← α1; ...; β1'; ...; βr'; ...; αk.

where βi' is obtained from βi by the following transformation:

1. Apply the substitution θ = {Yj ↦ Xij | 1 ≤ j ≤ m}.

2. Replace each recursive call q(S1, ..., Sm) with the call p(X1, ..., Xn)σ, where σ = {Xij ↦ Sj | 1 ≤ j ≤ m}.

This completes the definition of logic programs with inheritance.

Assume some fixed first-order signature L. Let P be the set of all well-defined logic programs with inheritance over L. Let → : P × P be the relation satisfying P → P' iff P' can be obtained from P by an application of the lpi rule to a clause in P. The pair (P, →) is not strictly a rewrite system according to the standard definitions [3], since logic programs with inheritance are sets, not terms, and since they are not closed under substitution. However, these differences do not affect the applicability of the relevant tools of rewrite systems, so we ignore them.

Lemma 1  The rewrite system (P, →) is terminating and confluent up to clause renaming, i.e. if P' and P'' are two normal forms of P then they are equivalent up to clause renaming. Furthermore, all normal forms are ordinary logic programs.

Proof outline: Termination follows from the fact that any application of the lpi rule eliminates one inheritance call. Normal forms don't have inheritance calls since they can all be reduced by the requirement that P contains only well-defined logic programs with inheritance. Confluence follows from the associativity of substitution composition. □

Corollary 1  The semantics of lpi is well defined.
2.4 An Example of Parameterized Multiple Inheritance
As an example of parameterized inheritance, suppose we have a "show_id" feature which waits on an input port In for a message show_id and then fills the incomplete message with the value Id:

id(In, Id) :-
    In = [show_id(Id)|NewIn],
    id(NewIn, Id).
And suppose we have a class containing two input ports In1 and In2, where on each port the class can receive requests to show its id. Instead of copying the method twice, we shall write:

class(In1, In2, Name) :-
    +id(In1, Name);
    +id(In2, Name);
    «class body»
The result of applying the lpi rule would be:

class(In1, In2, Name) :-
    In1 = [show_id(Name)|NewIn],
        class(NewIn, In2, Name);
    In2 = [show_id(Name)|NewIn],
        class(In1, NewIn, Name);
    «class body»
The same id feature could be used when an object wishes to have different id's on different ports, i.e., return different answers on different ports for the same incomplete message show_id, as in:

class(In1, In2, Name1, Name2) :-
    +id(In1, Name1);
    +id(In2, Name2);
    «class body»
The expansion is as follows:

class(In1, In2, Name1, Name2) :-
    In1 = [show_id(Name1)|NewIn],
        class(NewIn, In2, Name1, Name2);
    In2 = [show_id(Name2)|NewIn],
        class(In1, NewIn, Name1, Name2);
    «class body»
2.5 Integration with a Module System
The power of logic programs with inheritance is enhanced when integrated with a module system. We
have integrated lpi with the hierarchical module
system of Logix [11]. To simplify the description,
we outline the principles behind the integration for
a non-hierarchical module system.
When p in module M inherits q from another
module M', the semantics of inheritance is that
the definitions of all predicates in M', called or inherited directly or indirectly by q, are incorporated
in M, unless they are already defined in M.
This overriding capability, which gives some of
the effects of higher-order programming, proves to
be invaluable in practice. One can easily specify a
variant of a module M by inheriting its top-level
procedure and overriding the definition of one or
more of its subprocedures. For example, by inheriting a sorting module and overriding the comparison routine, one can turn an ordinary sort routine to a sort routine that operates on records with
keys.
We note that although the semantics specifies
"code copying" , the following semantics-preserving
optimization may apply. If M' inherits from M,
P is a set of procedures in M that do not call or
inherit procedures outside of P, and none of the
procedures in P is redefined in M', then the code
for P need not be included in M', and any call
to a procedure in P may be served by M. This
optimization achieves runtime code sharing among
several modules inheriting from the same module.
3 Logic Programs with Implicit Arguments

3.1 Example
We illustrate the notation of implicit arguments
via an example. The counter program (section 2.1)
is a typical logic program specifying a recurrent
process. A logic program with implicit arguments
that corresponds to the plain logic program for
counter is:
counter(In) + (C=0) :-
    In = [clear|In'], C' = 0, self;
    In = [add|In'], C' := C + 1, self;
    In = [read(C)|In'], self;
    In = [].

Similarly, a binary merge can be defined using implicit arguments by:

merge(In1, In2, Out) :-
    In1 = [X|In1'], Out = [X|Out'], self;
    In2 = [X|In2'], Out = [X|Out'], self;
    In1 = [], Out = In2;
    In2 = [], Out = In1.
3.2 Syntax
A logic program with implicit arguments is a set of clauses. A clause is composed of a predicate declaration and a disjunctive body. The predicate declaration has the form:

    p(X1, ..., Xn) + (Xn+1 = V1, ..., Xn+k = Vk) ←

where n, k ≥ 0, the X's are distinct variable names and the V's are terms. We say that the predicate p has n global and k local arguments, denote it by p/n+k (or p/n if k = 0), and call V1, ..., Vk the initial values of the local arguments of p/n+k. There can be at most one clause for any predicate p.

A call to p/n+k in a procedure other than that of p/n+k has the form p(T1, ..., Tm), where m = n or m = n + k, and where the T's are terms. A call to p/n+k that occurs in its own procedure may also have the form p, i.e., the call may have no arguments whatsoever. Such a recursive call is called implicit. In addition, any call to p/n+k that occurs in its own procedure may use the predicate name self as a synonym for p.

Variable names may be suffixed by a prime, e.g. X', Y'. X^n denotes the variable name X suffixed by n primes, n ≥ 0. A primed version of a variable name denotes a new incarnation of the variable, in the sense that the "most primed" occurrence of a variable name is considered the most updated version of the variable and hence is used in implicit recursive calls as explained below. We assume that the predicate =/2 is defined via the single unit clause X = X.
3.3 Semantics

The semantics of a logic program with implicit arguments P is given by the following rewrite rules, whose application to completion results in a disjunctive logic program P'.

Rule 1: Expand local arguments of calls. Replace each procedure call p(T1, ..., Tn) to a procedure p/n+k by the call:

    p(T1, ..., Tn, Vn+1, ..., Vn+k)

where Vn+1, ..., Vn+k are the initial values of the local arguments of p/n+k.

Rule 2: Expand implicit recursive calls. Replace each procedure call p or self in the clause of p/n+k by the call:

    p(U1, ..., Un+k)

where Ui is the most primed version of Xi in the clause. We say that X^n is the most primed occurrence of X in a clause C if X^n occurs in C, and for no k > n does X^k occur in C.

For example, applying the rewrite rules to the merge procedure results in:

merge(In1, In2, Out) :-
    In1 = [X|In1'], Out = [X|Out'],
        merge(In1', In2, Out');
    In2 = [X|In2'], Out = [X|Out'],
        merge(In1, In2', Out');
    In1 = [], Out = In2;
    In2 = [], Out = In1.

3.4 Special Notation for Stream Communication
Streams are the most commonly used data structure in concurrent logic programs. Recurrent processes almost always have one input stream and often have several additional input and/or output streams. Sending and receiving messages on a stream Xs by a process p can be specified by the clause schema:

p(... Xs ...) :-
    Xs = [Message|Xs'],
    ...,
    self.
where the difference between sending and receiving
is expressed using a language-specific synchronization syntax. Since this is such a common case, we
found it worthwhile to provide it with a special
notation. Our notation is reminiscent of CSP [5]
and Occam [6] (and in logic programming the Pool
language [2]):
Xs ! Message,
Xs ? Message,

These constructs specify, respectively, sending and receiving a message Message on a stream Xs. Each is equivalent to Xs = [Message|Xs'] with the appropriate language-specific synchronization syntax added. The construct requires n+4 fewer characters, where n is the length of the stream variable name, and hence is less liable to typing errors and probably also more readable.
Using this notation, a binary stream merger can
be specified by:
merge(In1, In2, Out) :-
    In1 ? X, Out ! X, self;
    In2 ? X, Out ! X, self;
    In1 = [], Out = In2;
    In2 = [], Out = In1.
The predicate append/3 can be specified using the first and third disjuncts of merge/3:

append(In1, In2, Out) :-
    In1 ? X, Out ! X, self;
    In1 = [], Out = In2.
It is interesting to note that this description of append facilitates its process reading. The program can be read as: "append is a process with three streams. Upon receiving an item on its first stream it sends that item on its third stream and iterates. If the first stream is closed, then the second and third streams are connected".
Using multiple primes allows multiple messages to be sent or received on the same stream, as the following example for the filtering of pairs of items on a stream shows:

remove_pairs(In, Out) :-
    In ? X, In' ? X, Out ! X, self;
    In ? X, In' ? Y, X =\= Y,
        Out ! X, Out' ! Y, self.
3.5 Special Notation for Arithmetic
Arithmetic operations are quite common in ordinary and concurrent logic programs. Recurrent processes with a loop counter such as the following are abundant:

list(N, Xs) :-
    N > 0, Xs = [N|Xs'], N' := N - 1,
        list(N', Xs');
    N = 0, Xs = [].
Following C conventions we allow variable names to be suffixed by -- and ++, with the semantics of the expression N-- given by replacing it with N and adding the conjunctive goal N' := N - 1. Using the stream and arithmetic support, the above list generator can be written as:

list(N, Xs) :-
    N-- > 0, Xs ! N, self;
    N = 0, Xs = [].
Similarly, we define += and -=, where N += K stands for N' := N + K, and N -= K stands for N' := N - K.
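As an illustration of how these notations combine, here is a sketch (our own, not taken from the paper) of the counter of Section 3.1 rewritten with the stream notation of Section 3.4 and the arithmetic notation above:

counter(In) + (C=0) :-
    In ? clear, C' = 0, self;
    In ? add, C += 1, self;
    In ? read(C), self;
    In = [].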
4 Implicit Logic Programs with Inheritance

4.1 Concepts
The combination of inheritance and implicit arguments proves to be both highly succinct and more
readable. For example, the program counter2 of
section 2.1 can now be rewritten as:
counter2(In) + (C=0, BackUp=0) :-
    +counter;
    In ? checkpoint, BackUp' = C, self;
    In ? restore, C' = BackUp, self.
The new style differentiates between global and
local (hidden) arguments and also avoids copying
counter's code as well as specifying the arguments
of the two recursive calls.
An implicit logic program with inheritance is translated into a pure logic program by applying the two previously defined rules: that of lpi (Section 2.3) and that of implicit arguments (Section 3.3). Minor changes to the rules are required; those changes depend on the order in which we apply the two transformations.
4.2 Examples
A curious example in which lpi gives us some insight into a program is the redefinition of merge (section 3.4) as:

merge(In1, In2, Out) :-
    +append(In1, In2, Out);
    +append(In2, In1, Out).
which means that merging is actually trying nondeterministically to append in both possible ways.
The following example implements a simple
lookup table as a list of key - value pairs. The
create predicate builds the list (named Table):
create(Table) :-
    Table = [].
i.e., a new table is an empty list. The following
two predicates are not for direct usage. search iterates through the list as long as the key-value pair
at the top of the list does not match a given Key.
find inherits search, and adds a termination clause.
search(Key, Table, Table1) :-
    Table ? Key1 - Value1, Key =\= Key1,
    Table1 ! Key1 - Value1, self.
find(Key, Table, Table1, Ok) :-
    +search;
    Table = [],
    Table1 = [],
    Ok = false('key not found', Key).
The following check and lookup predicates inherit find and add a clause for the case where an
identical key was found. The replace predicate inherits search directly since we want a different error
message.
check(Key, Table, Table1, Ok) :-
    +find;
    Table ? Key - Value,
    Table1 = Table,
    Ok = true.
lookup(Key, Value1, Table, Table1, Ok) :-
    +find;
    Table ? Key - Value,
    Table1 = Table,
    Value1 = Value,
    Ok = true.
replace(Key, Value, NewValue, Table, Table1, Ok) :-
    +search;
    Table ? Key - OldValue,
    Value = OldValue,
    Table1 = [Key - NewValue | Table'],
    Ok = true;
    Table = [],
    Table1 = [],
    Ok = false('key not found', Key - NewValue).
Finally, insert and delete add and remove key-value pairs from the table.
insert(Key, Value, Table, Table1, Ok) :-
    +search;
    Table ? Key - Value1,
    Table1 = Table,
    Ok = false('key already exists', Key - Value);
    Table = [],
    Table1 = [Key - Value],
    Ok = true.
delete(Key, Value, Table, Table1, Ok) :-
    +find;
    Table ? Key - Value1,
    Value = Value1,
    Table1 = Table',
    Ok = true.
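A hedged usage sketch (not from the paper) showing how these predicates compose; the goal test/2, the key apple and the value 3 are hypothetical:

test(Value, Ok) :-
    create(T0),
    insert(apple, 3, T0, T1, _),
    lookup(apple, Value, T1, _, Ok).

With an initially empty table, insert commits to its second disjunct and lookup to its first, binding Value to 3 and Ok to true.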
As a third example we demonstrate the capabilities of the inheritance mechanism in a graphical environment by rewriting the window handling
class from [10].
The first class defines a rectangular area with
methods clear for painting the area specified by
Frame, and ask to retrieve the rectangle's dimensions. In is an input port and Frame is a four-tuple of rectangle coordinates.
rectangular_area(In) + (Frame = {X, Y, W, H}) :-
    In ? clear,
        clear_primitive(Frame),
        self;
    In ? ask(Frame),
        self.
The following class frame is a rectangular area
with some content, which means that apart from
the methods clear and ask, one can draw the area
boundaries, and refresh it. Note that refresh is just
a combination of two previously defined methods
draw and clear. This also fixes a subtle synchronization bug in Shapiro and Takeuchi [10] where
the two methods were simultaneously activated,
one by the class process and one by the independent superclass process, which could have caused
drawing before clearing.
frame(In) + (Frame = {X, Y, W, H}) :-
    +rectangular_area;
    In ? draw,
        draw_lines(Frame), self;
    In ? refresh,
        self([clear, draw | In']).
The final class labeledWindow adds two more
methods: change, to change a label, and show to
show it. In addition we redefine the refresh method
to show the label after refreshing (we thus require a
method override mechanism). Another local variable Label is added.
labeledWindow(In) + (Frame = {X, Y, W, H},
                     Label = default) :-
    +frame;
    In ? change(Label'), self;
    In ? show,
        show_label_primitive(Frame),
        self;
    In ? refresh,
        self([clear, draw, show | In']).
After we have the class labeledWindow we can subclass it to define our own window as in:

my_window(In, ...) + (Frame = ...,
                      Label = ...) :-
    +labeledWindow;
    «my_window_additional_methods».
The generated code derived from the semantics
of lpi and implicit arguments is not shown here due
to space limitations.
5 Conclusions

5.1 Implementation
Both notations, implicit and lpi, have been implemented in FCP within the Logix system [11] by
adding language preprocessors. The lpi preprocessor implements the combined notation of Section 4;
i.e., it translates FCP with inheritance and implicit arguments to FCP with implicit arguments.
Another preprocessor translates implicit FCP to
pure FCP. Each of the preprocessors is about 1000
lines of code. The implicit preprocessor was first
written in FCP(:,?) [12]. That initial version was
then used to bootstrap a new version written in
the implicit notation. The lpi preprocessor is also
written using the notation of implicit arguments.
5.2 Further work
A certain form of overriding is already available
via the integration of lpi and a module system,
described in Section 2.5. However, one may find
useful also the ability of a subclass's method to
override a method of the superclass. This can be
achieved, for example, by stating that if several
methods apply, then textual order dictates precedence. By appropriately placing inheritance calls,
one can achieve the desired override effect.
Additional clarity and conciseness could be
achieved by enabling an overriding method to also
execute the overridden method (apart from doing some processing of its own). This feature,
called send to super in the object oriented terminology, was easily implemented with Shapiro and
Takeuchi's scheme [10] of a subclass having also an
output stream to its super, by putting the method
on the output stream. As an example of the send
to super feature, suppose in my_window (section
4.2) we need to add functionality to the draw routine (e.g. drawing a grid on the rectangular_area),
which means overriding the current draw method.
Instead of copying the whole draw method, we
would write:
In ? draw, send_to_super,
draw_grid(Frame), my_window;
where send_to_super is a macro which copies the
necessary code from the appropriate superclass.
A redundancy problem occurs when we want to
use multiple inheritance but the generated inheritance graph is not a tree. For example, classes
b and c both inherit a, and d inherits both b and
c. Applying the transformation would result in
d having a's methods twice. This (harmless) redundancy could be optimized later, e.g. by the
decision graph compilation method [9].
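A minimal lpi sketch of the diamond just described (our own illustration; the method names ping, beep and click are hypothetical): d ends up with a's ping clause twice, once through b and once through c.

a(In) :- In ? ping, self.
b(In) :- +a; In ? beep, self.
c(In) :- +a; In ? click, self.
d(In) :- +b; +c.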
5.3 Experience
The implicit arguments notation was incorporated
into Logix more than two years ago, and has been
used extensively by all members of our group. All
of us found it preferable to the notation of plain
logic programs.
Logic programs with inheritance were incorporated as an extension to the implicit arguments
notation less than a year ago. It has been used by
all of us extensively, and it has had a major effect
on our programming style. One notable effect is
that inheritance allows us to specify in a modular way processes with a dozen of arguments and
dozens of clauses, by specifying multiple methods,
each referring only to a subset of the process's arguments, and using multiple inheritance to specify
the final process. This programming style meshes
well with the decision graph compilation method
to produce code which is readable, maintainable,
and efficient.
We have implemented two large systems using
lpi, each having several thousand lines of FCP
code, and we find it hard to imagine how we could
have written them without an inheritance notation.
6 Acknowledgments
The notation of implicit arguments was first described in an unpublished paper by Kenneth
Kahn and the last two authors. We thank Yael
Moscowitz and Marilyn Safran for comments on
previous drafts.
References
[1] Clark, K.L., Gregory, S., PARLOG: Parallel Programming in Logic, ACM Trans. on Programming Languages and Systems, 8(1), pp. 1-49, 1986.

[2] Davison, A., POOL: A PARLOG Object Oriented Language, Imperial College.

[3] Dershowitz, N., and Jouannaud, J.-P., Rewrite Systems, in Handbook of Theoretical Computer Science, J. van Leeuwen (Ed.), pp. 243-320, Elsevier Science Publishing, 1990.

[4] Hewitt, C., A Universal, Modular Actor Formalism for Artificial Intelligence, Proc. International Joint Conference on Artificial Intelligence, 1973.

[5] Hoare, C.A.R., Communicating Sequential Processes, Prentice-Hall, New Jersey, 1985.

[6] INMOS Ltd., OCCAM Programming Manual, Prentice-Hall, New Jersey, 1984.

[7] Kahn, K.M., Objects - A Fresh Look, Proceedings of the European Conference on Object-Oriented Programming, Nottingham, England, July 1989.

[8] Kahn, K.M., Tribble, D., Miller, M.S., Bobrow, D.G., Vulcan: Logical Concurrent Objects, in Concurrent Prolog: Collected Papers, Vol 2, Chapter 30, MIT Press, 1987.

[9] Kliger, S., and Shapiro, E., From Decision Trees to Decision Graphs, Proc. of the 1990 North American Conf. on Logic Programming, S. Debray and M. Hermenegildo (Eds.), MIT Press, pp. 97-116, 1990.

[10] Shapiro, E., Takeuchi, A., Object Oriented Programming in Concurrent Prolog, in Concurrent Prolog: Collected Papers, Vol 2, Chapter 29, MIT Press, 1987.

[11] Silverman, W., Hirsch, M., Houri, A., and Shapiro, E., The Logix System User Manual, Version 1.21, in Concurrent Prolog: Collected Papers, Vol 2, Chapter 21, MIT Press, 1987.

[12] Yardeni, E., Kliger, S., and Shapiro, E., The languages FCP(:) and FCP(:,?), New Generation Computing, 7(2-3), pp. 85-87, 1990.

[13] Yoshida, K., Chikayama, T., A'UM - A Stream Based Concurrent Object-Oriented Language, Proc. FGCS, Vol 2, 1988.
Implementing a Process Oriented Debugger with Reflection
and Program Transformation
Munenori MAEDA
International Institute for Advanced Study of Social Information Science,
FUJITSU LABORATORIES LTD.
17-25, Shinkamata 1-chome, Ota-ku, Tokyo 144, Japan
m-maeda@iias.flab.fujitsu.co.jp
Abstract
Programmers writing programs following a typical process and streams paradigm usually have some conceptual
image concerning the program's execution. Conventional
debuggers cannot trace or debug such programs because
they are unable to treat both processes and streams directly. The process oriented GHC debugger we propose
provides high-level facilities, such as displaying processes
and streams in three views and controlling a process's
behavior by interactively blocking or editing data in its
input streams. These facilities make it possible to trace
and check program execution from a programmer's point
of view. We implement the debugger by adopting reflection and program transformation to enhance standard
GHC execution and to treat extended logical terms representing streams.
1 Introduction
Debugging methods for programs in Guarded Horn Clauses (GHC) [Ueda 1985] are classified into those based on algorithmic debugging [Takeuchi 1986] under the denotational semantics of GHC programs, and those based on execution tracing [Goldszmidt et al. 1990] under the operational semantics. (Even though the latter literature is concerned only with the execution tracing of Occam programs, its discussion is generally adaptable to most concurrent or parallel program debugging.) This paper proposes a debugging method belonging to the execution tracing class.

In GHC programming, object-oriented [Shapiro and Takeuchi 1983] and stream-based [Kahn and MacQueen 1977] programming focus on the notion of processes and streams. Individual abstract modules are regarded as processes, some of which are connected by streams, and communicate with each other concurrently. A typical process repeatedly consumes data from a stream, changes its internal state, and generates data for another stream.
In a conventional execution tracer, it is difficult to
capture conceptual execution in terms of processes and
streams, because they are decomposed into GHC primitives and never displayed explicitly. The tracer we propose fully reflects the notion of processes and streams,
and enables both the specific control flow of processes
and the data structure of streams to be processed, making the causality among processes explicit.
2 Process Oriented Programs and Debugging

2.1 Models of Processes and Streams in GHC

2.1.1 Process model
A process can be interpreted either as a goal or as
a set of goals, e.g., an "object" in object-oriented
programming[Shapiro and Takeuchi 1983]. The following sections discuss processes based on the latter.
A process consists of goals for the continuation of the
process or goals for internal procedures defined in the
process. The continuation goal accepts streams in its
arguments one by one, and reserves its internal state in
other arguments. The stream argument takes the role of an I/O port for data migration. The internal state is
not affected by other processes, but is calculated by the
previous state and input data captured from streams.
A process features:
Creation: A process is created by the first call of the
continuation goal.
One-step execution: Reading data from streams, writing data to other streams, and changing the inter-
nal state using internal procedures are regarded as
atomic actions in an execution step.
Continuation and termination: A process will carry on
its computation with a new internal state when the
continuation goal is invoked. Otherwise the process
terminates its execution.
2.1.2 Stream model
A stream is a sequence of logical terms whose operations [Tribble et al. 1987] are limited to reading the first term of a stream and writing a term to the tail of a stream.

A simple notation for streams is first introduced. Streams are constructed by stream-variables SV, stream-functors (SH || ST) and stream-terminators (), where SV is a variable constrained to become either a stream-functor or a stream-terminator, SH is an arbitrary term denoting the first data of the stream, and ST is a stream representing the rest of the stream.
A stream features:

Creation: Streams are created dynamically when a continuation goal of a process is invoked, where they are assigned to the arguments of the goal.

Data access: First data D is read from stream SX by unifying SX with structure (D || ST) in the guard part of a clause at runtime. Data D is written to stream-variable SX by unifying SX with a structure (D || ST) in the body, where ST, called the tail of stream SX, is a stream-variable. When reading or writing is done several times, each operation is done recursively on the rest of the stream, ST.

Connection: Streams Sa and Sb are connected if they are unified in the body. One of the connected streams is regarded as an alias of the other.
An equivalence relation ≈ is defined for the set of streams S; it is used to visualize streams. For a substitution σ, the relation ≅σ is defined on S, the set of streams consisting of terms obtained in the execution:

1. S ≅σ S, for all S ∈ S;
2. (H || S) ≅σ S, for all S ∈ S;
3. Sa ≅σ Sb if Sa and Sb are connected (unified) under σ.

The first, reflexive rule implies that two lexically identical variables satisfy the relation. The second rule implies that a stream and its subpart are elements of the same equivalence class. The third rule means that connected streams are also elements of the same equivalence class. The relation ≈σ is defined as the symmetric and transitive closure of relation ≅σ. Below, relation ≈ is written in place of ≈σ if the substitution σ is clearly understood from the context.

In GHC, a stream is actually implemented by a list in most programs, i.e. the stream-functor (D || S) and the stream-terminator () correspond to the term [D | S] and the atom [], respectively.
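A minimal GHC-style sketch (our own, not from the paper) of this producer/consumer reading: ints/3 writes to the tail of stream S while sum/3 reads from its head, suspending whenever the next cell is not yet bound.

% Produce the stream of integers N..Max, then close it with [].
ints(N, Max, S) :- N =< Max | S = [N|S1], N1 := N + 1, ints(N1, Max, S1).
ints(N, Max, S) :- N > Max  | S = [].
% Consume the stream, accumulating a sum; return it when the stream closes.
sum(S, A, R) :- S = [X|S1] | A1 := A + X, sum(S1, A1, R).
sum(S, A, R) :- S = []     | R = A.
% Example: both goals run concurrently, connected by the shared stream S.
go(R) :- true | ints(1, 5, S), sum(S, 0, R).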
2.2 Process Oriented Debugging
GHC programs based on the process model are called process oriented programs; each goal in the execution trace belongs to a process, being either a continuation or a part of an internal procedure of the process. In tracing and checking process oriented programs, the goals belonging to a target process must first be extracted from
the "chaotic" execution trace where these goals are interleaved.
The data flow must also be checked. Unless a process inputs intended data, the process outputs incorrect
data to its output stream, or becomes permanently suspended. Intended data may not be sent to the process
for two possible reasons. First, an adjacent process corresponding to the producer of the data malfunctions. Or,
second, the input of the process is disconnected from the
output of the producer, an error caused by misuse of a
shared variable, in which case it is easier to detect the
error if the stream connection between processes, called
"a process network," is displayed.
To make the execution traces of process oriented programs easier to read, the process oriented debugger (POD) we propose visualizes process and stream information structured from input/output data, internal state values, internal procedure traces and the stream connection between processes.
Programs can be debugged as follows:
Step 1 A user starts execution of a target program.
Step 2 The internal state and input/output data are displayed and checked at an appropriate interval. The
process network is also checked.
Step 3 The program code corresponding to a process
where an error occurs is checked in detail, with
any adjacent processes possibly contributing to the
anomaly also checked.
Step 4 Input/output data sequences are saved for checking an abnormal process because comparing the sequence of output data before and after a program is
modified makes it easier to check the behavior.
If the process malfunctions in Steps 3 and 4, it is forcibly suspended and overall execution is continued as far as possible, because program reexecution takes much time and cost; i.e., reexecution must be avoided if it will take too much time. Moreover, the program may have nondeterministic transitions, so reexecution may not reproduce the behavior. Reexecution can be avoided either by giving the debugger functions to delete or to modify unexpected data and to insert data into a stream interactively, or by having functions that preserve data in streams automatically up to a sufficient length and execute a process in the preserved environment. Thus the POD requires the following execution control functions:

1. Forcibly suspending, resuming and aborting the execution of each process.

2. Buffering and modifying the data in streams interactively.

3. Reexecuting a process in the preserved environment.

3 Implementing the POD

3.1 Process Declaration
In our process model (Section 2.1.1), goals are classified into those for the continuation and those for internal procedures. They are syntactically the same, and are specified by the user in a process declaration.

The process declaration consists of a predicate specification and continuation marking. The predicate specification begins with the keyword process followed by the name of the predicate, specifying the usage of each argument. The usage of each argument is specified by declaring the keyword state or port in the appropriate order. Annotation state shows that the argument represents a part of the internal state. Annotation port shows that the argument represents a process's I/O port. The continuation mark consists of an @ preceding the goal in a clause. An example of the process declaration is given in Listing 1.
3.2
Streana Treatnaent
As mentioned previously, streams consist of special variables, functors, and terminators.
In the POD, streams are recognized and supported by
introducing tagged data structures. Each variable, functor, and atom that makes up a stream has an auxiliary
field to store the stream identifier. An identifier is associated with each stream equivalence class.
In implementing identifiers, note that if two streams
with different identifiers are unified, their identifiers
should be the same. This is achieved by assigning a
variable to each identifier and unifying the identifiers if
Reflective Extension of Unifier
Section 3.2 identified the need for extended unification; here it is discussed in more detail together with its implementation using reflection.
Tagged structures are implemented using the wrapped terms 'Sm'(Var,ID), 'Sm'(Atom,ID), and 'Sm'({Cons,Head,Tail},ID). The first term represents a fresh variable of a stream: its first argument, Var, corresponds to the original variable, and the second, ID, is a fresh variable that denotes the identifier of its stream. The second term represents the terminator of a stream, whose first argument, Atom, abstracts the terminator []. The third corresponds to [Head|Tail], where Cons is a functor for concatenation.
Terms are thus classified into six types: variable, atom, compound term², stream-variable, stream-functor, and stream-terminator.
New unification rules are needed for stream-term x
stream-term and stream-term x regular-term. The following cases are representative of the extended unification X >< Y:
Case 1 X is stream variable 'Sm'(V,ID), Y is a variable.
Assign Y to X.
Case 2 X is stream variable 'Sm'(V1,ID1), Y is stream variable 'Sm'(V2,ID2).
Assign V2 to V1 and ID2 to ID1.
Case 3 X is stream variable 'Sm'(V,ID), Y is compound term {C,H,T}.
Assign {C,H,'Sm'(N,ID)} to V, and execute 'Sm'(N,ID) >< T, where N is a fresh variable.
Case 4 X is stream functor 'Sm'({C1,H1,T1},ID), Y is compound term {C2,H2,T2}.
Assign H2 to H1 and C2 to C1. Execute T1 >< T2 recursively.
² In KL1, the notation {F,A1,...,An} is allowed to express the compound term F(A1,...,An). We follow this notation for convenience.
The remaining 17 possible cases are omitted here due to
space considerations.
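For concreteness, the following is a minimal Python sketch of Cases 1-4 above. It is illustrative only: the names Var, Sm, Comp, assign and ext_unify are modeling conveniences, not the POD's data structures, and the POD itself implements extended unification with reflective GHC predicates as described next.

    # Sketch of extended unification X >< Y for the four representative cases.
    class Var:
        """An unbound logic variable; 'ref' holds its binding once assigned."""
        def __init__(self):
            self.ref = None

    class Sm:
        """Stream-tagged term 'Sm'(Term, ID)."""
        def __init__(self, term, sid):
            self.term, self.sid = term, sid

    class Comp:
        """Compound term {Cons, Head, Tail}."""
        def __init__(self, c, h, t):
            self.c, self.h, self.t = c, h, t

    def assign(var, term):
        """'Assign term to var' in the sense of the cases above (var assumed unbound)."""
        var.ref = term

    def ext_unify(x, y):
        if isinstance(x, Sm) and isinstance(x.term, Var):
            if isinstance(y, Var):                                  # Case 1
                assign(y, x)
            elif isinstance(y, Sm) and isinstance(y.term, Var):     # Case 2
                assign(x.term, y.term)
                assign(x.sid, y.sid)                                # identifiers become shared
            elif isinstance(y, Comp):                               # Case 3
                n = Var()
                assign(x.term, Comp(y.c, y.h, Sm(n, x.sid)))
                ext_unify(Sm(n, x.sid), y.t)                        # continue down the tail
        elif isinstance(x, Sm) and isinstance(x.term, Comp) and isinstance(y, Comp):
            assign(x.term.h, y.h)                                   # Case 4 (H1, C1 assumed unbound)
            assign(x.term.c, y.c)
            ext_unify(x.term.t, y.t)
        # the remaining cases (terminators, etc.) are omitted, as above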
The variable check is essential when describing the unifier and is done by reflection [Smith 1984]. Because reflection provides functions to manage the memory and the goal queue, it becomes easy to implement streams.
Before developing the POD, we added a reflective feature to GHC similar to that of RGHC [Tanaka 1988].
When a user-defined reflective predicate is invoked, its
arguments are automatically converted from internal
representation to the meta-level ground form. Table
1 shows the correspondence between object-level and
meta-level terms.
Case 3 is described using a reflective predicate whose
second argument Gs is a stream connected with the goal
scheduler.
reflect(vector({atom('Sm'),variable(V),ID}) ><
        vector({C,H,T}), Gs, Mm) :- true |
    Gs = [variable(V) = vector({C,H,vector({atom('Sm'),variable(N),ID})}),
          vector({atom('Sm'),variable(N),ID}) >< T],
    Mm = [malloc(N)].
%% Gs: Goal scheduler, Mm: Memory manager.
The third argument Mm is a stream connected to the
memory manager for the object program. Terms written
in stream Gs are converted from meta-level ground terms
to internal representation and placed in the goal queue.
Terms written in stream Mm are understood as messages
for memory access. Message malloc(N) invokes dynamic
memory allocation, and the reference pointer to allocated
memory is bound to variable N. Extended unification is
defined similarly by the reflective predicate for all cases.
3.4 Tagged Term Transformation

As described above, tagged terms are represented as wrapped functors. The translator converts streams to tagged terms automatically. In the following, we show program examples before and after the conversion, and then explain the translation process in detail. Furthermore, we present additional transformation steps to direct the data migration, in other words, to detect the origin of data.

%Original program
process p(port,port).
boot :- true | p([1,2|_],X), q(X).
p([A|X],Y) :- true | Y=[A|Y1], @p(X,Y1).

%Conversion for streams
boot :- true | p([1,2|_],X,'Sm'(N1,ID1),'Sm'(N2,ID2)),
    q(X,'Sm'(N3,ID3)),
    [1,2|_] >< 'Sm'(N1,ID1), NX >< 'Sm'(N2,ID2),
    NX >< 'Sm'(N3,ID3).
p([A|X],Y,NA1,NA2) :- true | Y=[A|Y1],
    p(X,Y1,'Sm'(N1,ID1),'Sm'(N2,ID2)),
    NA1 >< [DA|DX], NA2 >< [DA|DY1],
    DX >< 'Sm'(N1,ID1), DY1 >< 'Sm'(N2,ID2).

The converted program differs from the original in the following ways:
1. The arity of predicate p doubles, i.e., the third and fourth arguments are new, and the parameters of p are converted to tagged terms for streams such as 'Sm'(N1,ID1).
2. The first and second arguments of the converted p are the same as those of the original, and the corresponding parameters are maintained.
3. Several extended unifications are added in the body.
The above points characterize the transformation: two kinds of variable bindings are treated. One is the same as the original bindings and is used for the execution of the guard goals. The other consists of tagged terms for streams, and is used in extended unifications.
According to GHC semantics, unification invoked in a guard cannot export any bindings to the caller. Furthermore, user-defined predicates cannot be placed in a guard. Because it is not easy to extend the guard execution rule of GHC, we follow the semantics as much as possible.
In our transformation, by maintaining the original bindings, the guard execution involving parameter passing is independent of the term extension, and the extension never causes execution errors. The memory consumption of storing two kinds of bindings is, however, at least twice as much as that of the original.
Transformation processes are detailed as follows:

Step 1 Choose a clause, and erase all the guard unifications by partial evaluation [Ueda and Chikayama 1985]. Replace each nonvariable argument Arg with a fresh variable Var, and add goal Arg = Var in the guard. By applying the replacement to every argument, we get a canonical form such that every argument is a variable and every guard goal is either a unification = of a variable and a nonvariable term, a difference \=, an arithmetic comparison, or a type checker. We write a canonical clause as P(A1,...,An) :- G(A1,...,An) | Q(A1,...,An,B1,...,Bm), where G(A1,...,An) and Q(A1,...,An,B1,...,Bm) represent a conjunction of goals.

Step 2 Rename all variables in the clause and get a clause P(A1',...,An') :- G(A1',...,An') | Q(A1',...,An',B1',...,Bm'). Extract all the unifications from G(A1',...,An'), and replace symbol = of unification with symbol >< of extended
unification. The obtained conjunction is written as G'(A1',...,An').

Table 1: Representations of meta-level terms

  Term               Object level       Meta level
  Unbound variable   (unobservable)     variable(Addr)
  Atom               Atom               atom(Atom)
  Compound           {C1,...,Cn}        vector({C1,...,Cn})

Step 3 For goals defined as processes or as continuations in Q(A1',...,An',B1',...,Bm'), get conjunction Q'(A1',...,An',B1',...,Bm') by repeating the following: replace each port-declared parameter Param with stream 'Sm'(N,ID), where N and ID are fresh variables, and place extended unification Param >< 'Sm'(N,ID) in the body of the new clause.

Step 4 For two goals, B(O1,...,Oi) in Q(A1,...,An,B1,...,Bm), where Oj, 1 ≤ j ≤ i, ranges over {A1,...,An,B1,...,Bm}, and B'(O1',...,Oi') in Q'(A1',...,An',B1',...,Bm'), goal B''(O1,...,Oi,O1',...,Oi') is defined as their concatenation. Conjunction Q''(A1,...,An,B1,...,Bm,A1',...,An',B1',...,Bm') is defined by combining every B''.

Step 5 An objective clause is obtained by combining G, G' and Q'' as follows:

P(A1,...,An,A1',...,An') :- G(A1,...,An) |
    G'(A1',...,An'),
    C1 >< 'Sm'(S1,ID1), ..., Ci >< 'Sm'(Si,IDi),
    Q''(A1,...,An,B1,...,Bm,A1',...,An',B1',...,Bm').
% Replace Cj in A1',...,Bm' with 'Sm'(Sj,IDj)

Detecting the origin of the data is achieved by using a tag similar to that stated above. A tagged functor 'Sb'(Term,PID) is introduced, where Term corresponds to the original term and may include other tagged structures, and PID is an unbound variable used as a process identifier.
We show a modified example program using the 'Sb' tag, and then detail the additional transformation steps.

%Conversion for detecting the origin of the data
boot(PIDself) :- true |
    p([1,2|_],X,'Sm'(N1,ID1),'Sm'(N2,ID2),PID1),
    q(X,'Sm'(N3,ID3),PID2),
    'Sb'(['Sb'(1,PIDself)|
        'Sb'(['Sb'(2,PIDself)|_],PIDself)],PIDself)
        >< 'Sm'(N1,ID1),
    NX >< 'Sm'(N2,ID2), NX >< 'Sm'(N3,ID3).
p([A|X],Y,NA1,NA2,PIDself) :- true |
    Y=[A|Y1],
    p(X,Y1,'Sm'(N1,ID1),'Sm'(N2,ID2),PIDself),
    NA1 >< 'Sb'([DA|DX],PID1),
    NA2 >< 'Sb'([DA|DY1],PIDself),
    DX >< 'Sm'(N1,ID1), DY1 >< 'Sm'(N2,ID2).
%% PID1 specifies the origin of the input list.

Step 6 Add argument PIDself to the head of the selected clause.

Step 7 Select a predicate p/n in the body of the clause and, if p/n is declared as a process, add a new parameter PIDp/n; otherwise add the parameter PIDself.

Step 8 Recursively replace every nonvariable term Ti except streams in G'(A1',...,An',B1',...,Bm') with term 'Sb'(T'i,PIDi). Each PIDi is used to indicate the origin of the corresponding data.

Step 9 Replace every nonvariable parameter T of the extended unifications with term 'Sb'(T',PIDself).

3.5 Execution Control

In the POD, the specific control of a process proposed in Section 2.2 is achieved by introducing a valve inserted into a stream. The valve serves as an intelligent data buffer having two input ports, one output port, and a programmable conditional switch to close the output port. One of the two input ports is connected to the original stream, and the other is connected to the user's console. The user can send commands to the valve. The amount of buffered data and the description of the type of storable data are programmable conditions.
The valve has three states, automatic migration mode, conditional migration mode, and manual edit mode, each changed by a user command or by evaluating the programmable conditions. The valve operates as follows (a sketch of this behavior is given after the list):
• In automatic migration mode, the valve receives
data from its own input port and it stores the data
in its own buffer. Once the buffer becomes full, the
valve outputs the first data in the buffer through the
output port.
• In conditional migration mode, the valve gets data and stores it in the buffer. Once the buffer becomes full, or if incoming data does not satisfy the condition, the valve displays an alert and changes to manual editing mode.
• In manual editing mode, the valve receives no new
data. The number and the description of data to
be stored, and data actually in the buffer can be
referenced and modified using a text editor. After
editing, the mode returns to the previous mode.
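The following Python sketch summarizes the valve behavior described above. It is illustrative only; the names Valve, receive, edit and flush are assumptions, not the POD's GHC implementation.

    class Valve:
        def __init__(self, max_count=100, test=lambda d: True):
            self.mode = "automatic"       # "automatic" | "conditional" | "manual"
            self.prev_mode = "automatic"
            self.buffer, self.max_count, self.test = [], max_count, test
            self.out = []                 # stands in for the output stream

        def _to_manual(self, reason):
            print("alert:", reason)       # alert the user, then accept no new data
            self.prev_mode, self.mode = self.mode, "manual"

        def receive(self, data):
            if self.mode == "manual":     # manual edit mode receives no new data
                return
            if self.mode == "conditional":
                if not self.test(data):
                    self._to_manual("unexpected data")
                    return
                self.buffer.append(data)
                if len(self.buffer) >= self.max_count:
                    self._to_manual("buffer full")
                return
            # automatic migration mode: once the buffer is full, forward the oldest item
            self.buffer.append(data)
            if len(self.buffer) >= self.max_count:
                self.out.append(self.buffer.pop(0))

        def edit(self, new_buffer):
            # manual editing: buffered data is inspected/modified, then the
            # valve returns to the previous mode
            self.buffer = list(new_buffer)
            self.mode = self.prev_mode

        def flush(self):
            self.out.extend(self.buffer)
            self.buffer = []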
4 Examples of Tracing

The POD is developed by extending the GHC interpreter with reflection in Prolog. A user can trace and debug a GHC program with a direct manipulation interface provided by the POD.
The interface provides several control facilities for the target program in a menu, enabling the user to easily manipulate the POD by selecting a facility from the menu with a mouse. The menu currently provides (1) compulsive process suspension, (2) process resumption, (3) valve insertion, (4) valve control, and (5) terminated process deletion.
The POD provides three different views to visualize program execution: the stream graph, the process chart, and the communication flow.
The stream graph uses animated icons and lines to show dynamic changes in a network graph of processes and streams.
The process chart displays, in a structured diagram, the data consumed from and generated to streams by a process and its subprocesses. More specifically, the diagram contains dots, two kinds of lines, and data. A dot represents a process's argument at each execution step. One kind of line connecting two dots is associated with relation ~ between them. Consumed or generated data is located along this line. The other kind of line represents a subprocess fork point.
The communication flow [Shin 1991] shows I/O process causality. When a substitution generated in process X is referenced in a committed clause of process Y, a directed arrow from X to Y is displayed. We say in this case that data from process X makes Y active.
The usage of the menu and the views is described using the program in Listing 1. First, suppose that the program and query prime(10,Ps) are given to the POD.

Listing 1: Primes generator program with process declaration

process gen(state,state,port), sift(port,port),
        filter(port,state,port).

prime(Max,Ps) :- true | gen(2,Max,Ns), sift(Ns,Ps).
gen(N,Max,Ns) :- N>=Max | Ns=[].
gen(N,Max,Ns) :- N<Max  | Ns=[N|Ns1], N1:=N+1, @gen(N1,Max,Ns1).
sift([],Ps) :- true | Ps=[].
sift([P|Fs],Ps) :- true | Ps=[P|Ps1],
    filter(Fs,P,Fs1), @sift(Fs1,Ps1).
filter([],P,Fs) :- true | Fs=[].
filter([N|Ns],P,Fs) :- true | sw(N,P,Fs1,[N|Fs1],Fs),
    @filter(Ns,P,Fs1).
sw(N,P,Fs1,Fs2,Fs) :- N mod P=:=0 | Fs=Fs1.
sw(N,P,Fs1,Fs2,Fs) :- N mod P=\=0 | Fs=Fs2.

(The user-defined goal is a combination of built-in goals.)

Figure 1 shows the initial stream graph. Data in the stream that connects gen and sift can be checked in two ways: by setting a valve either on an output port of gen, after suspending gen to prevent the creation of new data, or on an input port of sift, to avoid consuming data. Let item (1) be selected to suspend gen rather than sift. Selecting item (3), then (2), resumes gen. The generated valve is displayed as an icon in the window, as for a process. Initially, the valve is in automatic migration mode and the default buffer size is set to 100.
After process gen finishes generating data, the information in the valve is displayed in a new dialog window if item (4) is selected. Figure 2 shows that the buffer contents are modified by deleting the number 8. Assume, then, that the window is closed and all buffered data is flushed. Flushing the data causes sift to resume (Figure 3), with the stream graph eventually becoming stable.
Process charts for each process in a window are shown in Figure 4. In this figure:
• Process gen maintains an output stream, specified by a vertical gray line at the left in the window, which connects all third arguments obtained at each execution step. Numbers generated by gen are aligned and displayed along this line.
• Process filter maintains both an input and an output stream, specified by two vertical lines, black and gray, in the middle of the window. The input sequence of numbers beside the black line ranges from 3 to 9, with 8 deleted. Process filter generates or does not generate data at each execution step when the output sequence on the gray line is referenced.
Figure 1: Initial stream graph

Figure 5: Stream graph for int/2 and memo/3
• The difference between the process chart for sift
and others is the presence of a process fork specified by a dashed line. Process sift also has both
input and output streams. The output stream remains unchanged as the input stream is created dynamically. Process sift consumes a number from
the input stream in the first argument, generating a
filter and a prime number for the output stream
in the second argument in an execution step. The
input stream of the created filter is connected to
the original input and the output stream to the new
input stream of sift.
Figure 2: Valve controller display, before and after data editing (test condition integer(E), E<20; count 7 before, 6 after editing)
Listing 2: Bounded buffer program

process int(state,port), memo(port,port,state).

bb(N) :- true | open(N,H,T), int(0,H), memo(H,T,C).
open(0,H,T) :- true | H=T.
open(N,H,T) :- N>0 | N1:=N-1, H=[_|H1],
    open(N1,H1,T).
int(N,[X|S]) :- true | X=s(N), @int(s(N),S).
memo([s(X)|S],T,C) :- true | T=[_|T1],
    @memo(S,T1,s(X)).
Figure 3: Final stream graph
The bounded buffer program is shown in Listing 2. Assume that the program and query bb(5) are given. The query goal invokes processes int and memo, which are connected after internal procedure open terminates. Figure 5 shows the stable stream graph. The communication flow of these processes indicates the alternate transition of two states. At the left in Figure 6, process memo becomes active by consuming data derived by the inactive int and a stream functor derived by the previous memo. At the right, data from the inactive memo activates int.

Figure 4: Process chart display

5 Conclusion

We have proposed a process-oriented debugger (POD) for GHC programs based on a computation model for processes and streams. The POD enables
• the overall behavior of a process to be controlled by manipulating data in streams and by arbitrarily delaying the transmission and reception of data between processes,
Figure 6: Communication flow transition
• Process causality to be shown using animated figures of processes and streams in both stream graph
and communication flow displays,
• Stream connectivity to be organized and shown in a
process chart, as a structure of lines connecting the
arguments of a process.
Because individual goal execution is not a concern, our debugger gives some information, such as input and output substitutions and timing, in less detail; in the future it will be necessary to include a viewpoint that interprets the original sequence of primitives in such a way that the user can follow it.
Our debugger is implemented using reflection and program transformation. Reflection makes it easy to describe extended unification, and program transformation
guarantees the efficient execution of guard goals under
the standard guard execution mechanism.
Acknowledgments
This research has been carried out as part of the Fifth
Generation Computer Project of Japan. Dongwook
Shin and Youji Kohda contributed insightful comments.
Masaki Murakami assisted in formalizing streams. Simon Martin helped with the English. The author would
like to express thanks to them.
The research originated in his postgraduate study, and
he is indebted to Hirotaka Uoi and Nobuki Tokura of
Osaka University for their invaluable advice.
References
[Goldszmidt et al. 1990] G.S.Goldszmidt,
S.Yemini,
S.Katz: "High-level Language Debugging for Concurrent Programs", ACM Transactions on Computer Systems, Vol.8, No.4, pp.311-336, November
1990.
[Kahn and MacQueen 1977] G.Kahn, D.B.MacQueen: "Coroutines and Networks of Parallel Processes", Information Processing 77, North-Holland, pp. 993-998, 1977.
[Maeda et al. 1990] M.Maeda, H.Uoi, N.Tokura: "Process and Stream Oriented Debugger for GHC Programs", Proceedings of Logic Programming Conference 1990, pp. 169-178, ICOT, July 1990.
[Shapiro and Takeuchi 1983]
E.Shapiro, A.Takeuchi: "Object Oriented Programming in Concurrent Prolog", New Generation Computing, Vol.1, No.1, pp.25-48, 1983.
[Shin 1991] D.Shin: "Towards Realistic Type Inference
for Guarded Horn Clauses", Proceedings of Joint
Symposium on Parallel Processing '91, pp.429-436,
1991.
[Smith 1984] B.C.Smith: "Reflection and Semantics in
Lisp", Conference Record of the 11th Annual Symposium on Principles of Programming Languages,
pp.23-35, ACM, January 1984.
[Takeuchi 1986] A.Takeuchi: "Algorithmic Debugging of GHC Programs and its Implementation in GHC", ICOT Tech. Rep. TR-185, ICOT, 1986.
[Tanaka 1988] J.Tanaka: "Meta-interpreters and Reflective Operations in GHC", Proceedings of the International Conference on Fifth Generation Computer
Systems 1988, pp.774-783, ICOT, November 1988.
[Tribble et al. 1987] E.D.Tribble, M.S.Miller, K.Kahn, D.G.Bobrow, C.Abbot and E.Shapiro: "Channels: A Generalization of Streams", Proc. of 4th International Conference on Logic Programming (ICLP '87), Vol. 2, pp. 839-857, 1987.
[Ueda 1985] K.Ueda:"Guarded Horn Clauses", ICOT
Tech. Rep. TR-103, pp.1-12 (1985-06).
[Ueda and Chikayama 1985]
K.Ueda, T.Chikayama: "Concurrent Prolog Compiler on Top of Prolog", in Proc. of Symp. on Logic
Prog., pp. 119-126, 1985.
A New Parallelization Method for Production Systems
E. Bahr, F. Barachini, H. Mistelberger
Alcatel Austria-ELIN Research Center
Ruthnergasse 1-7, A-1210 Vienna, Austria
Abstract
The growing importance of expert systems in real-time applications reveals the necessity of reducing response times. Since uniprocessor optimizations of production systems have been widely explored, only multiple-processor architectures appear to provide further performance gain. Efficient exploitation of the inherent parallelism of production systems, however, requires suitable algorithms for load balancing without simultaneously increasing organization or communication overhead. We present a novel parallel algorithm for PAMELA expert systems, based on dynamic distribution of data processing. The concept is supported by a transputer-based architecture with an advanced interconnection structure.¹
1 Introduction
PAMELA (PAttern Matching Expert system LAnguage) [Barachini and Theuretzbacher 1988, Barachini 1988] was originally designed as a high-performance rule-based expert system language especially suited to treat real-time problems. PAMELA's inference engine is highly optimized and makes the language one of the most efficient platforms for rule-based systems on uniprocessors. Nevertheless, the computational complexity of rule-based programs leads to considerable response times. Significant additional speed-ups are expected from parallel execution of the inference engine.
Parallel PAMELA (P²AMELA) uses a parallel matching scheme not restricted to a specific matching algorithm. The matches are performed concurrently on a number of identical processing elements, requiring only little communication. This is achieved by means of a special scheduling algorithm. The parallelization algorithm is able to incorporate all optimization techniques of the serial PAMELA version.
A transputer-based architecture, the "Parallel PAMELA Research Engine" (PRE), has been developed to support the needs of the parallel version of PAMELA. PRE uses a personal computer as master processor, with a multicast interface from the PC to the processing elements [Kasparec et al. 1989]. PRE is a research architecture and is scalable to 32 transputers. This limitation is not due to the parallelization algorithm but arises from intended cost and complexity restrictions for the hardware architecture. Moreover, it is well known from the literature that the inherent parallelism in typical present-day production systems does not allow speed-up factors of more than 20. Hence, the number of 32 transputers is no obstacle to performing significant run-time experiments.
We discuss in detail the mapping of the fine-grain algorithm onto the coarse-grain PRE architecture. Preliminary performance data of a few hand-coded examples show the efficiency of our algorithm in exploiting inherent parallelism. These experiences serve as a motivation for a full implementation of a parallelizing production system compiler, which is in the final stage of development.

¹ This research is sponsored by the Austrian Innovations- und Technologiefonds as part of the InFACT project.
2 Production Systems
A (forward chaining) production system (PS) consists of a production memory containing rules, and a working memory (WM) containing data (working memory elements, WMEs) representing the system state. Real-time production systems are able to communicate with the outside world, e.g., for sampling data or for sending messages to another system.
A rule resembles the well-known IF ... THEN ... statement. It consists of a left-hand side (LHS, corresponding to the IF-part) and a right-hand side (RHS, corresponding to the THEN-part). The PS execution breaks into a sequence of "recognize-act cycles" (RACs). A single RAC consists of the following steps (a minimal sketch of the cycle follows the list):
• During the "match phase", the LHSs satisfied by the WMEs are determined. For each valid rule a corresponding instantiation enters the "conflict set" (CS).
• During "conflict set resolution" (CSR) one of the rule instantiations in the CS is selected.
• During the "act phase" the RHS statements of the selected rule are executed. These statements usually change the WM or initiate communication with the outside world.
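A minimal Python sketch of this cycle (illustrative only, not PAMELA itself; match, resolve and act are placeholders for the three phases):

    def recognize_act_cycles(rules, wm, match, resolve, act, max_cycles=1000):
        for _ in range(max_cycles):
            conflict_set = match(rules, wm)     # match phase: all satisfied instantiations
            if not conflict_set:
                break                           # no rule is satisfied: halt
            chosen = resolve(conflict_set)      # conflict set resolution (CSR)
            act(chosen, wm)                     # act phase: the RHS changes the WM
        return wm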
3 The Match Algorithm of Sequential PAMELA

The RETE [Forgy 1979, Forgy 1982] and the TREAT [Miranker 1987] algorithms are the best known state-saving algorithms, which avoid recomputation of comparisons done in previous RACs. Both algorithms map the patterns of the LHSs of the rules to nodes of a network. The inference engine of PAMELA uses a modified version [Barachini and Theuretzbacher 1988] of the RETE algorithm. Since we have also chosen RETE for the implementation of our parallelization method, we sketch the basic mechanisms within a RETE network.
When a WME is added to or removed from the WM, a plus-token resp. a minus-token representing this action is passed to the RETE network. In one-input nodes (1INs) attributes of the incoming token are compared against constant values. Two-input nodes (2INs) have a token memory for each input. An incoming plus-token is stored in the token memory; a minus-token removes the corresponding 2IN token from the memory. In 2INs, attributes of each incoming token are compared against attributes of all tokens in the opposite token memory, according to the conditions in the LHS. On each (successful) match, a new token is generated and sent to the successor node. If a token leaves the RETE network, a rule instantiation enters the CS. Figure-1 shows a RETE network with three 1INs and two 2INs; a small sketch of the 2IN behavior is given after the figure.

Figure-1: RETE network of a rule with three patterns
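The following Python sketch illustrates the 2IN behavior just described. It is an illustrative model with assumed names (TwoInputNode, receive), not PAMELA's implementation; the successor is any callable that accepts the propagated token.

    class TwoInputNode:
        def __init__(self, join_test, successor):
            self.left, self.right = [], []            # the two token memories
            self.join_test, self.successor = join_test, successor

        def receive(self, side, token, sign):
            mine = self.left if side == "left" else self.right
            other = self.right if side == "left" else self.left
            if sign == "+":
                mine.append(token)                    # plus-token: store it
            else:
                mine.remove(token)                    # minus-token: remove the stored copy
            for opp in other:                         # compare against the opposite memory
                pair = (token, opp) if side == "left" else (opp, token)
                if self.join_test(*pair):
                    self.successor((pair, sign))      # propagate a new (signed) token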
4 Parallelization of Production Systems

Before discussing parallelization, it seems appropriate first to distinguish several classes of parallel architectures. In the familiar Flynn taxonomy [Flynn 1972], SIMD (single instruction, multiple data), MISD (multiple instruction, single data), and MIMD (multiple instruction, multiple data) architectures constitute the variety of parallel architectures. Although there have been attempts to implement production systems on SIMD machines [Forgy 1980], MIMD architectures obviously better match the needs of production system algorithms. Parallel (distributed) systems of the MIMD class fall into two categories [Bhuyan 1987]: multiprocessors (all processors share main memory) and multicomputers (each processor has its own local memory with its own local address space; a processor cannot directly access another processor's local memory, and communication is accomplished via message passing).
Two performance measures are of particular interest in evaluating parallel systems: speed-up (defined as the ratio of the execution times for one and for n processors) and efficiency (defined as speed-up divided by the number of processors) [Eager et al. 1989]. Efficiency depends on the ratio of communication and computation. Limiting factors are memory contention (with multiprocessors) and communication overhead (with multicomputers), respectively.
Soon after the invention of the state saving algorithms, various investigations were started on parallel architectures for production systems. There are several levels of parallelism inherent to production system algorithms like the one used in PAMELA. Apart from application parallelism (concurrent execution of loosely coupled production system tasks), there exists match parallelism on rule, inter-node, and intra-node level, act parallelism, and CSR parallelism [Gupta 1986]. The usefulness of exploiting a particular type of parallelism depends on the time spent for each phase. Typical numbers for RETE production systems are: match (up to) 90%, act 5%, and CSR 5% [Forgy 1979, Gupta 1986]. Most investigations therefore have concentrated on concurrent execution of the match phase. However, newer studies have shown² that some production systems spend considerably less than 90% in the match phase. With rule-level parallelization, for the time of one RAC, each rule is assigned to a different processing element (PE). With inter-node level parallelization, each node of the RETE network is assigned to a particular PE, whereas with intra-node level parallelization comparisons within a node are assigned to different PEs.
So far, none of the implementations of these ideas [Butler et al. 1988, Gupta 1986, Gupta and Tambe 1988, Kelly and Sevoria 1987, Miranker 1984, Oshisanwo and Dasiewicz 1985, Schreiner and Zimmermann 1987, Shaw 1987, Stolfo 1984, Tenorio 1984, Tien and Raghavendra 1987] has been able to simultaneously cope with bottlenecks due to communication overhead or due to shared resources, and with load balancing problems. The approach presented in this paper is placed among the intra-node parallelizations, but avoids the above-mentioned problems. The algorithm also exploits parallelization of the CSR and is not restricted to RETE but can be applied to TREAT as well.

² Private communication with Daniel Miranker.
5 The Basic Idea of Independent Match Parallelization

Anticipating the very simple overall structure of the architecture (figure-2), we can sketch the steps of a RAC in Parallel PAMELA:
• During the match phase the comparisons are assigned to the PEs by a scheduling algorithm (without inter-PE communication).
• Each PE performs its local CSR (which means also a parallelization of the CSR) and sends its candidate rule instantiation to the "master" processor.
• The master selects one of these candidates (global CSR), executes the RHS of the corresponding rule, and sends the WME changes back to the PEs.
At the beginning of an RAC, each PE therefore must be able to decide independently which partition of the expected comparisons it intends to perform. This decision is made dynamically during run-time by a special scheduler running on each PE.

Figure-2: Hardware architecture

In order to illustrate the idea of independent match parallelism, we consider a 2IN of a RETE network (figure-3). It is assumed that both token memories of node k are known to all PEs. Each PE has physical copies of these memories and in this sense they are global. These memories have been independently generated from the WM, which is also global, i.e., there is a copy of the WM on each PE. The task to be carried out is to compare each token of the left token memory with each token of the right memory, according to the comparison prescription of the 2IN.

Figure-3: Partitioning of comparisons
In order to partition the comparisons among the PEs, either the left or the right memory is divided into a number of blocks equal to the number of PEs (in our example we assume 4 PEs). This partitioning is only "virtual" in the sense that both memories are still global. This is indicated by the dashed lines in figure-3. The partition just means that during the match phase in node k, the m-th PE takes the tokens in the m-th block and compares them against all tokens in the opposite memory. In this way, all comparisons in node k are performed by the PEs m = 1,...,N, but the run-time is reduced by a factor 1/N, provided that the partitioned memory contains enough tokens. Each PE generates its own tokens corresponding to its successful matches. This leads to disjoint parts of the left memory at node k+1. These parts are local to each PE, i.e., part m is only known to PE m (indicated by the solid lines in figure-3). The matches in the subsequent nodes performed by PE m can only be done with its local data. One can easily see that the conjunction of all comparisons gives the whole set of comparisons of the uniprocessor version. This is a necessary condition for consistency; a small sketch of this partitioning follows.
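The following Python sketch illustrates the virtual partitioning just described (assumed data layout, not the PAMELA scheduler): PE m takes the m-th block of the partitioned memory and compares it against the whole opposite memory, so the union of the results over all PEs equals the uniprocessor cross product and no inter-PE communication is needed.

    def matches_on_pe(m, n_pes, partitioned_memory, opposite_memory, join_test):
        block = (len(partitioned_memory) + n_pes - 1) // n_pes   # block size, rounded up
        my_tokens = partitioned_memory[m * block:(m + 1) * block]
        results = []
        for left in my_tokens:
            for right in opposite_memory:            # compare against the whole opposite memory
                if join_test(left, right):
                    results.append((left, right))    # newly generated tokens stay local to PE m
        return results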
We demonstrate the consistency of the partitioning algorithm by an example with four PEs, which should perform all matches in a two-input node. The input memories of this node are assumed to be global. The left memory is represented by the vertical axes of the squares in figure-4 and the right memory by the horizontal axes. Then we have three possibilities to distribute the matches between the PEs: (virtual) partitioning of either the left token memory (first square in figure-4) or the right token memory into four parts (second square), or partitioning of both memories into two parts with a suitable assignment of tokens to PEs (third square in figure-4). The last square in figure-4 represents an assignment of the tokens violating the consistency, since not the whole cross product of matches is performed.

Figure-4: Consistency considerations

This restriction can easily be taken care of by the following procedure. A partitioning number Pk is assigned to the partitioning node k. Pk is the dyadic logarithm of the number of portions into which the matches in node k have been partitioned. Formally we can define Pk = 0 for unpartitioned matching in node k. Furthermore, a maximum partitioning number Pmax is introduced. This number can be calculated as the dyadic logarithm of the number of PEs, which is always a power of two. Then the restriction takes the form Pk ≤ Pmax. In general, the left memory in a RETE node need not be global, due to previous partitionings in predecessor nodes. In this case the right memory must not be fully partitioned, according to the consistency requirements mentioned above.
Since each PE holds the whole RETE network, the PEs can process the data assigned by the partitioning algorithm without communication with other PEs. The matches are performed using data which is local or global in the logical sense but strictly local with respect to the physical PE. During the match phase all required global data items are not accessed by communication with other PEs but are generated from the global WM, which is located on each PE.
This somewhat simplified picture can be applied to all active nodes in the RETE network and shows three major points of our approach:
• the approach relies on data parallelism in token memory rather than on program parallelism,
• a very fine-grained dynamic distribution of matches among the PEs leads to good load balancing,
• no communication is necessary during the match phase, since no PE requires data from another PE.
Compared to a static assignment of partitioning nodes our method is much more flexible, which is crucial if data load varies over time. This is especially the case for real-time production systems communicating with the outside world.
In order to clarify some open questions, a few remarks should be made. So far, we have only considered comparisons in 2INs and have not included those in 1INs. In principle it is possible to split the token flow already in a 1IN. Since the 1IN matches consume less than 5% of the total match time of a typical production system, we decided to discard this possibility in favour of more flexibility in partitioning later RACs.
For simplicity it has been assumed that the token memories contain a sufficient number of tokens, so that partitioning leads to parts with nearly equal numbers of tokens. In real life this may be a bit too optimistic. Assuming we have 4 PEs and only 3 tokens in the memory of a node, a division into four parts excludes one PE from processing this node and all successor nodes for the current path of token flow. Therefore, three is the maximum speed-up factor in this node. Since we need no synchronization point after the execution of a node, the free PE can process another node in the meantime. Nevertheless, it is advisable to partition large memories because this reduces the chance of idle PEs. A special scheduling algorithm is called on each PE at the beginning of a match phase, which estimates in advance the optimum nodes for partitioning.
In reality the matching procedure is more complex, since several token packages may enter the RETE network at different nodes in each RAC. The interference between token packages can easily be handled by processing package by package. All matches in the course of a token package's flow through the network are performed before the next package is processed. Furthermore, a token package can be partitioned at several nodes. This is allowed as long as the generalized consistency requirement

    Σ_{k=first}^{last} P_k  ≤  P_max        (1)

is obeyed. Due to the fact that we can have nodes with partial partitioning, i.e., Pk < Pmax, left memories can hold data of different "degree of locality", generated during previous RACs. Such data is known by fixed subsets of PEs. Therefore, if an incoming token package has to be matched against a left memory, the package might branch into token packages of different locality degrees. In all subsequent nodes these token packages have to be processed separately. A small sketch of condition (1) follows.
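Illustrative Python sketch of condition (1) (assumed names): a further split of a token package into some number of portions at a node is admissible only if the partitioning numbers accumulated along its path stay within P_max.

    import math

    def may_partition_further(path_partitioning_numbers, portions, n_pes):
        p_max = int(math.log2(n_pes))        # n_pes is a power of two
        p_new = int(math.log2(portions))     # dyadic logarithm of the new split
        return sum(path_partitioning_numbers) + p_new <= p_max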
When rule instances finally enter the CS it must be guaranteed that the local CSs are disjoint, in order not to contain duplicated instances. This is achieved by enforcing condition (1) but with "=" instead of "≤". Since the PEs' conflict sets form disjoint sets, most of the CSR is automatically parallelized. Only the CSR for the best candidates received from the PEs is done by the master. CSR parallelism is only possible when the priority of rule instances does not depend on the existence or priority of other rule instances (which would require a global view of the CS). However, for LEX, MEA [Cooper and Wogrin 1988] and many other CSR strategies our algorithm exploits parallelism.
In contrast to rule-level parallelization, our algorithm can take full advantage of RETE-network sharing. This is due to the fact that our algorithm relies on a kind of data-flow splitting which is implicitly controlled by the above-mentioned consistency precautions.
6 The Scheduling Algorithm
At the beginning of each RAC, new tokens enter the RETE network. They are counted and buffered into token packages. These incoming tokens have to be matched against the opposite memories in 2INs, and emerging tokens are passed to the successor nodes. The task of the scheduling algorithm is to predict the match activity within the RETE network for each token package. For this reason, the scheduler needs actual information about the number of entering tokens, the size of the token memories, and statistical data.
Since the scheduler works independently on each PE, it is only allowed to use globally known data. Otherwise, there would be no guarantee that the schedulers on the PEs arrive at the same results (e.g. the decision on the partitioning nodes).
Figure-5: A typical two-input node

Figure-5 shows a typical situation in a 2IN. The number of matches in node k is just the product of the number of incoming tokens and the number of tokens in the opposite memory. The number of emerging tokens (successful comparisons) can be estimated as p_k times this number of matches, where p_k is the probability that a particular match is successful. p_k itself is estimated by the ratio (successful matches)/(matches) of previously performed comparisons in node k and is continuously updated.
Having calculated the number of expected matches for a certain entering token package in all relevant nodes of the RETE network, the scheduler decides upon the optimum partitioning nodes for the considered token package. For this reason, the scheduler searches for the minimum of a special function μ representing a measure for the load balance among the PEs. The argument of this function is the number of the partitioning node part. For example, such a function could have the following form:

    μ(part) = Σ_{i=first}^{part-1} V_i  +  Σ_{i=part}^{last} V_i / F_i

where V_i is the expected number of comparisons in node i. From the entrance node first of the considered token package to node part-1, no parallelization takes place. This contribution is represented by the first sum. The partitioning node part and all its successor nodes are parallelized. Hence, the number of comparisons per PE is only a fraction of the total number of comparisons in each node, leading to the speed-up factors F_i in the second sum. These factors range between 1 and the number N of PEs. F_i = 1 means that one PE performs all matches in node i; F_i = N refers to the most balanced parallelization. It can easily be shown that the speed-up factors cannot increase from one node to its successor, i.e., F_i ≤ F_j for i > j. If, for instance, only one comparison has to be performed in node first then, obviously, F_first = 1 and F_i = 1 for all subsequent nodes. This kind of unbalanced distribution of comparisons has already been mentioned in the previous section. For the minimum function μ to work sufficiently well, the V_i's must represent the major portion of work to be performed during the matching. If it turns out that insertions into token memories take a significant amount of time, appropriate terms have to be added. A small sketch of the minimum search follows.
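Illustrative Python sketch of the minimum search (assumed inputs, not the PAMELA scheduler itself): V[i] is the expected number of comparisons in node i along the token package's path and F[i] the achievable speed-up factor there.

    def best_partitioning_node(V, F):
        def mu(part):
            serial = sum(V[:part])                                   # unpartitioned prefix
            parallel = sum(v / f for v, f in zip(V[part:], F[part:]))
            return serial + parallel
        return min(range(len(V)), key=mu)                            # index of node 'part'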
Of course, the scheduling scheme can be generalized to several partitioning nodes for each token package. This is achieved by iterative application of the minimum search, with updated V_k's for k ≥ part after each step.
After having determined the partitioning node part, the token package actually enters the RETE network.
7 The Parallel PAMELA Research Engine

It is a well-known experience that the performance of (very expensive) shared-memory multiprocessors degrades at higher n (> 4) due to memory contention. The decision therefore has been made to construct a parallel architecture for PAMELA offering the scalability of message-passing machines as well as tightly coupled pairs of processors (figure-6). This tight coupling will facilitate future experiments using small shared-memory subclusters constituting the otherwise loosely coupled architecture. The P²AMELA Research Engine (PRE) is a prototype serving for the evaluation of parallelized PAMELA (P²AMELA) expert systems.
In order to allow the PRE to fit into a standard environment, an industry-standard 386-based PC was selected for the master processor. This allows the use of standard operating systems and tools (Unix, X Windows) as well as interfacing to a variety of networks. The full PAMELA-C expert system shell therefore can be ported to the PRE.

Figure-6: Basic architecture of the PRE

In implementing the PEs, the design is based on the Inmos transputer. In the course of the design, two particular problems were to be solved, namely (i) to effectively update working memory on the PEs, and (ii) to exchange data between a transputer pair on one PE. Up to 16 double transputer boards can be served by the PC.
To solve the first problem indicated above, a 'Link Broadcast Interface' (LIBRO) board was developed. The first version of LIBRO was implemented on a PC add-on card. This PC LIBRO is a quadruple transputer link interface for personal computers; up to four boards per PC can be stacked and treated as a single device. The LIBRO solution allows WM contents and other global information to be broadcast through four to sixteen links under control of a master processor (PC). Each link channel is buffered in both directions, so fast access with string primitive instructions is possible.
The core of the P²AMELA Research Engine consists of Swappable-Memory Transputer Board (SMTB) modules (figure-7) for the PEs. An SMTB incorporates two IMS T805 transputers at 25 MHz with three free links each. The fourth link of each transputer is used to access a common memory swapping controller. The latter controls the access to four 1-MByte blocks of memory. Each memory block is allocated to one of the two transputers at a time. Control information supplied by the two transputers through a dedicated link is used to change the allocation status. Therefore, we have a kind of shared memory between the two transputers on an SMTB.
Figure-7: Swappable Memory Transputer Board
8 Preliminary Run-Time Measurements

In the absence of a production system compiler for the parallelization method described, we could only encode a few examples on a simple commercially available transputer-based architecture. The results have been encouraging, although most of the examples exhibit rather low inherent parallelism. In addition, the scheduling algorithm has not yet been optimized and therefore causes some overhead.
Figure-8: Example run-time measurements (overall and match speed-up for examples A: four rules, B: nine rules, C: M&B with fixed partitioning nodes, D: M&B with dynamic scheduling, E: extended M&B)
Examples A and B in figure-8 are simple production systems, characterized by four and nine rules, respectively. Examples C and D use implementations of "Monkeys and Bananas" with static and dynamic partitioning nodes, respectively. Unfortunately, the activity in the RETE network for these examples is very low, so that the low speed-up is not very surprising. Example E is an extended version of "Monkeys and Bananas" using more WMEs. It reveals the full power of the algorithm, yielding a speed-up factor of about 3.9 for the match phase (four PEs). Taking into account all overheads, the factor of 3.1 is still remarkable. Figure-9 shows the speed-up dependence on the number of PEs. Although these examples do not provide a representative set of production systems, they show the existence of expert systems with speed-ups ranging from minimal (one) to maximal (number of PEs) values. These results do not prove the efficiency of our parallelization algorithm but they serve as a motivation for further investigations.

Figure-9: Speed-up dependence for extended M&B
9 Summary and Future Directions

We presented a new approach to parallel execution of production systems, exploiting data parallelism in token memory. The approach has the following advantages compared to other published parallelization methods that rely on program parallelism:
• high utilization of processing power,
• no need for locking mechanisms for the consistency of the RETE network,
• small communication overhead, no bottleneck on shared resources,
• a scalable architecture.
Possible disadvantages of the method presented may be:
• the memory per PE will be about the size of the monoprocessor version,
• the scheduling algorithm causes additional computation overhead.
After the Parallel PAMELA-C system is fully implemented, measurements on a representative set of production systems will be performed in order to assess the quality of the parallelization method. Various strategies for scheduling on PRE and alternative parallel architectures will be investigated. In this respect, the adaptation of the presented algorithm to a shared-memory architecture is of particular interest. The usage of global data both simplifies the scheduling algorithm and increases its accuracy and flexibility. But in order to avoid memory and bus contention, the access to the global memory must either be infrequent or decoupled between the processing elements. Since the data of the RETE network are frequently accessed, the contention problem does not allow a straightforward solution. Further investigations of this matter will be the subject of future research.
Acknowledgements
We owe thanks to J. Doppelbauer, H. Grabner, F. Kasparec, and T. Mandl, who constructed the multicomputer architecture we are going to use for the execution of production systems. We are especially grateful to J. Doppelbauer for providing us with a photo of the Swappable Memory Transputer Board and with technical information about the hardware.
References
[Barachini and Theuretzbacher 1988] Barachini F., Theuretzbacher N.: "The Challenge of Real-Time Process Control for Production Systems", The Seventh National Conference on Artificial Intelligence (AAAI-88), St. Paul, Minnesota, Vol. II, 1988.
[Barachini 1988] Barachini F.: "PAMELA: A Rule-Based AI Language for Process-Control Applications", Proceedings of the First International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, Vol. 2, pp. 860-867, Tennessee, 1988.
[Bhuyan 1987] Bhuyan L.N.: "Interconnection Networks for Parallel and Distributed Processing", IEEE Computer, June 1987, pp. 9 ff.
[Bhuyan 1989] Bhuyan L.N., Yang Q., Agrawal D.P.: "Performance of Multiprocessor Interconnection Networks", IEEE Computer, February 1989, pp. 25-37.
[Butler et al. 1988] Butler P.L., Allen J.P., Bouldin D.W.: "Parallel Architecture for OPS5", The 15th Annual International Symposium on Computer Architecture, Honolulu, Proceedings pp. 452-457, 1988.
[Gupta 1986] Gupta A.: "Parallelism in Production Systems", CMU-CS-86-122, Ph.D. Thesis, Carnegie-Mellon University, March 1986.
[Gupta 1987] Gupta A. et al.: "Results of Parallel Implementation of OPS5 on the Encore Multiprocessor", CMU-CS-87-146, August 1987.
[Gupta and Tambe 1988] Gupta A., Tambe M.: "Suitability of Message Passing Computers for Implementing Production Systems", Proceedings of AAAI-88, Vol. 2, pp. 687-692, St. Paul, Minnesota, 1988.
[Kasparec et al. 1989] Kasparec F., Doppelbauer J., Grabner H., Mandl T.: "Advanced Transputer Interconnection Techniques", 1st International Conference on the Application of Transputers (SERC/DTI), Univ. of Liverpool, Aug. 1989.
[Kelly and Sevoria 1987] Kelly M.A., Seviora R.E.: "A Multiprocessor Architecture for Production System Matching", Proceedings of the AAAI-87, Vol. 1, pp. 36-41, 1987.
[Miranker 1984] Miranker D.P.: "The Performance Analysis of TREAT: A DADO Production System Algorithm", International Conference on Fifth Generation Computing, Tokyo, 1984; revised article 1986.
[Miranker 1987] Miranker D.P.: "TREAT: A New and Efficient Match Algorithm for AI Production Systems", Ph.D. Thesis, Columbia University, 1987.
[Cooper and Wogrin 1988] Cooper T.A., Wogrin N.: "Rule-based Programming with OPS5", Morgan Kaufmann Publishers, Inc., Palo Alto, USA, 1988.
[Oshisanwo and Dasiewicz 1985] Oshisanwo A.O., Dasiewicz P.P.: "A Parallel Model and Architecture for Production Systems", Proceedings of the 1987 International Conference on Parallel Processing, pp. 147-153, May 1985.
[Eager et al. 1989] Eager D.L., Zahorjan J., Lazowska E.: "Speedup Versus Efficiency in Parallel Systems", IEEE Transactions on Computers, Vol. 38, No. 3, March 1989, pp. 408 ff.
[Schreiner and Zimmermann 1987] Schreiner F., Zimmermann G.: "PESA 1 - A Parallel Architecture for Production Systems", Proceedings of the 1987 International Conference on Parallel Processing, pp. 166-169.
[Flynn 1972] Flynn M.J.: "Some Computer Organizations and Their Effectiveness", IEEE Trans. Computers, Vol. 21, No. 9, Sept. 1972, pp. 948-960.
[Shaw 1987] Shaw D.E.: "NON-VON's Applicability to Three AI Task Areas", IJCAI 1987.
[Forgy 1979] Forgy C.L.: "On the Efficient Implementation of Production Systems", Ph.D. Thesis, Carnegie-Mellon University, 1979.
[Stolfo 1984] Stolfo S.J.: "Five Parallel Algorithms for Production System Execution on the DADO Machine", National Conference on Artificial Intelligence, AAAI-1984.
[Forgy 1980] Forgy C.L.: "Note on Production Systems and ILLIAC IV", Technical Report CMU-CS-80-130, CMU, Pittsburgh, 1980.
[Tenorio 1984] Tenorio M.F.M.: "Parallelism in Production Systems", Ph.D. Thesis, University of California, 1984.
[Forgy 1982] Forgy C.L.: "RETE: A Fast Algorithm for the Many Pattern/Many Object Pattern Matching Problem", Artificial Intelligence, Vol. 19, pp. 17-37, 1982.
[Tien and Raghavendra 1987] Tien S-B.R., Raghavendra C.S.: "A Parallel Algorithm for Execution of Production Systems on HMESH Architecture", Fall Joint Computer Conference, 1987, pp. 349-356.
Performance Evaluation of the Multiple Root Node Approach to the Rete Pattern Matcher for Production Systems†

Andrew Sohn
Department of Computer and Information Science
New Jersey Institute of Technology
Newark, NJ 07102, sohn@cis.njit.edu

Jean-Luc Gaudiot
Department of Electrical Engineering-Systems
University of Southern California
Los Angeles, CA 90089-2563, gaudiot@usc.edu
Abstract - Much effort has been expended on special architectures and algorithms dedicated to efficient processing of the pattern matching step of production systems. In this paper, we investigate the possible improvement on the Rete pattern matcher for production systems. Inefficiencies in the Rete match algorithm have been identified, based on which we introduce a pattern matcher with multiple root nodes. A complete implementation of the multiple root node-based production system interpreter is presented to investigate its relative algorithmic behavior over the Rete-based OPS5 production system interpreter. Benchmark production system programs are executed (not simulated) on a sequential machine, a Sun 4/490, using both interpreters, and various experimental results are presented. Our investigation indicates that the multiple root node-based production system interpreter would give a maximum of up to a 6-fold improvement over the Lisp implementation of the Rete-based OPS5 for the match step.

1 Introduction
The importance of production systems in artificial intelligence (AI) has been repeatedly demonstrated by a large number of expert systems. As the number and size of expert systems grow, there has however been an emerging obstacle in the processing of such an important AI application: the large match time. In rule-based production systems, for example, it is often the case that the rules and the knowledge base needed to represent a particular production system would be on the order of hundreds to thousands. It is thus known that applying a simple matching algorithm to production systems would yield intolerable delays. The need for faster execution of production systems has spurred research in both the software [2,3,7,8] and hardware domains [6,11].
In the software domain, the Rete state-saving match algorithm has been developed for fast pattern matching in production systems [2]. The motivation behind developing the Rete algorithm was based on the observation, called temporal redundancy, which states that there is little change in the database between cycles. By storing the previous match results and using them at a later time, matching time can be reduced [1].
Inefficiencies in the state-saving Rete algorithm were identified, based on which the non-state-saving Treat match algorithm was developed [10]. The motivation behind developing the Treat algorithm was McDermott's conjecture, stating that the retesting cost will be less than the cost of maintaining the network of sufficient tests [9].

† This work is supported in part by the NSF under grant No. CCR-9013965.
In this paper, we further identify the inefficiencies of the Rete algorithm, based on which we introduce a pattern matcher with Multiple Root Nodes (MRN). Section 2 gives a brief introduction to production systems and the Rete match algorithm. Section 3 explicates the inefficiencies of the Rete matcher. A Lisp implementation of the MRN-based production system interpreter is then presented along with the distinctive features of its implementation. Section 4 presents benchmark production system programs and experimental results on both the Rete-based OPS5 interpreter and the MRN-based interpreter. Various statistics gathered both at compile time and runtime are presented as well. Performance evaluation of the two interpreters is made in Section 5 in terms of the number of comparison operations and execution time. The last section concludes this paper.
2 Background
2.1 Production systems
A production system, as shown in Figure 1, consists of a production memory (PM), a working memory (WM), and an inference engine (IE). PM (or rulebase) is composed entirely of conditional statements called productions (or rules). These productions perform some predefined actions when all the necessary conditions are satisfied. The left-hand side (LHS) is the condition part of a production rule, while the right-hand side (RHS) is the action part. The LHS consists of one to many elements, called condition elements (CEs), while the RHS consists of one to many actions.

Figure 1: An architecture of production systems
The productions operate on WM which is a database of
assertions called working memory elements (wmes). Both
condition elements and wmes have a list of elements,
called attribute-value pairs (avps). The value to an attribute can be either constant or variable for CEs and can
be constant only for wmes. A simple production system
with one rule is shown in Figure 2. The inference engine
executes an inference cycle which consists of the following three steps:
o Pattern Matching: The LHSs of all the production
rules are matched against the current wmes to determine the set of satisfied productions.
o Conflict Resolution: If the set of satisfied productions
is non-empty, one rule is selected. Otherwise, the execution cycle simply halts.
o Rule Firing: The actions specified in the RHS of the selected production are performed.
The above three steps are also known as Match-Recognize-Act, or MRA. The inference engine halts the production system either when there are no satisfied productions or when the user stops it.
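As a rough illustration of this cycle (our own sketch, not code from the paper; the function names and the rule representation are invented for exposition), an interpreter can be organized as follows:

    # Minimal sketch of the Match-Recognize-Act cycle, assuming each rule is a
    # pair (match, fire): match(wm) returns a list of instantiations (bindings)
    # and fire(wm, binding) returns the updated working memory.
    def run(rules, wm, select=lambda conflict_set: conflict_set[0], max_cycles=1000):
        for _ in range(max_cycles):
            # Match: collect all rule instantiations against the current wmes.
            conflict_set = [(rule, b) for rule in rules for b in rule[0](wm)]
            if not conflict_set:                  # no satisfied productions: halt
                break
            # Conflict resolution: select one instantiation to fire.
            rule, binding = select(conflict_set)
            # Act: perform the RHS actions of the selected production.
            wm = rule[1](wm, binding)
        return wm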
2.2 The Rete match algorithm
The Rete match algorithm is a highly efficient approach
used in the matching of objects in production systems [2].
The simplest possible matching algorithm would iterate over all the rules and wmes, one by one, to find matches. The Rete algorithm, however, does not iterate over the wmes to match all the rules. Instead, it constructs a condition dependency network like the one shown in Figure 2, saves match results from previous cycles in the network, and reuses them at a later time.
Production Memory
  Rule1:
    [(c X) (d Y)]        ; CE1
    [(b Y)]              ; CE2
    [(p 1) (q 2) (r X)]  ; CE3
    -->
    [Remove (b Y)]       ; Action 1

Working Memory
  wme1: [(p 1) (q 2) (r *)]
  wme2: [(r =) (d +)]
  wme3: [(c *) (d +)]
  wme4: [(b 3)]
  wme5: [(b +)]
  wme6: [(p 1) (q 3) (r 7)]
Given a set of rules, a network is built which contains information extracted from the LHSs of the rules. Figure 2 depicts a network for Rule 1, with the following nodes:
o Root Node (RN): distributes incoming tokens (or wmes) to sequences of children nodes, called one-input nodes.
o One-Input Nodes (OIN): test intra-element features contained in a condition element, i.e., compare the values of the incoming wmes to preset values in the condition element. For example, CE1 of Rule 1 contains 2 intra-element features, and therefore 2 OINs are needed to test them. The test results of the one-input nodes are propagated to nodes called two-input nodes.
o Two-Input Nodes (TIN): test inter-condition features contained in two or more condition elements. The variable X, which appears in both CE1 and CE3, must be bound to the same value for rule instantiation. Attached to the TINs are left and right memories in which wmes matched through OINs are saved. The results from two-input nodes, when successful, are passed to nodes called terminal nodes.
o Terminal Nodes (TN): represent instantiations of rules. Conflict resolution strategies are invoked to select and fire a rule.
There are other variations on the nodes listed above. Given the above network, the Rete algorithm performs pattern matching; we shall not go into detail here. See [1,2,5] for more details.
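For concreteness, the following Python sketch (ours, not the paper's Lisp code) shows the state kept by a two-input node: left and right memories of partial matches, and a join that compares an incoming token against everything stored on the opposite side.

    class TwoInputNode:
        # Sketch of a Rete two-input (join) node; `consistent` is the
        # inter-condition test, e.g. "both tokens bind X to the same value".
        def __init__(self, consistent):
            self.left, self.right = [], []        # memories saved between cycles
            self.consistent = consistent

        def activate_left(self, token):
            self.left.append(token)
            return [(token, r) for r in self.right if self.consistent(token, r)]

        def activate_right(self, token):
            self.right.append(token)
            return [(l, token) for l in self.left if self.consistent(l, token)]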
3 The MRN Matcher and Its Implementation
The multiple root node based interpreter is presented along
with its Lisp implementation.
3.1 The MRN Matcher
The Rete algorithm described earlier presents two apparent bottlenecks: one in the root node and the other in the two-input nodes, as illustrated in Figure 3. Tokens coming into the root node pile up on its input arc, since there is one and only one root node, which distributes tokens one at a time to all CEs. For the network shown in Figure 3, where there are n condition elements, the root node will have to make n × x distributions to the network when x wmes are present on its input arc.

Figure 2: A Rete network for Rule 1.

Figure 3: Two bottlenecks of the Rete. (1) Piling up of wmes on an arc of the root node, resulting in a sequential distribution of wmes to all CEs one at a time. (2) O(n) or O(m) comparisons in TINs.
The second inefficiency can also be seen in Figure 3. Assume that m tokens are stored in the left memory of a two-input node and a token arrives on the right input: the arrival of this token triggers m comparisons with the wmes stored in the left memory. Were the situation reversed, with n tokens in the right memory, a token arriving on the left side would provoke n comparisons. The internal working of a two-input node is therefore purely sequential. To avoid wasting time searching the entire memory, an effective allocation of two-input nodes and one-input nodes should be devised. In this paper, we limit ourselves to the first bottleneck; discussions of the second bottleneck can be found in [4,5].
The first bottleneck described above can be resolved by introducing multiple root nodes (MRN) in the network, as depicted in Figure 4. The introduction of multiple root nodes is based on the observation that a wme that has n AVPs never matches a CE that has m AVPs where n ≠ m.
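The dispatch step can be pictured with the following sketch (again ours, with invented names): root nodes are keyed by AVP count, so a wme is handed only to condition elements that expect the same number of AVPs.

    from collections import defaultdict

    class MRNDispatcher:
        # Sketch of multiple root nodes: one root per AVP count.
        def __init__(self):
            self.roots = defaultdict(list)        # AVP count -> CE entry nodes

        def add_ce(self, ce_node, avp_count):
            self.roots[avp_count].append(ce_node)

        def dispatch(self, wme):
            # A wme (a list of AVPs) is sent only to CEs with the same AVP
            # count, instead of to every CE as a single root node would do.
            for ce_node in self.roots[len(wme)]:
                ce_node.activate(wme)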
π(x1) > ... > π(xN) > π(y1) > ... > π(yM), expressing the relative importance of each variable in the output³. The algorithm ensures that lower-priority variables are represented in terms of higher-priority variables. (We will see later how π is used in the context of functor and nonlinear equations to minimize the number of variables occurring in the output.)
A crucial point for efficiency is that the main loop in Figure 1 iterates N times, and N ≪ M in general; that is, the number of target variables is often far smaller than the total number of variables in the system.

³The priority among the yi's is arbitrary.
Note that the order of variables in the predefined predicate dump([x1, ..., xN]) determines the priority relation over these variables. Hence the user can influence the output representation of the constraints.
3.2 Linear Inequalities
The constraint solver stores the linear inequalities in a Simplex tableau. (See [Jaffar et al. 1990] for details.) Each linear inequality is expressed internally as an equality by introducing a slack variable, one whose value is restricted to be either nonnegative or positive. Our first job, therefore, is the elimination of such slack variables. This is achieved by pivoting the inequality tableau to make all the slack variables basic, so that each appears in exactly one equation. Hence each row can be viewed as s = exp where s ≥ 0 or s > 0, and this equation can now easily be rewritten into the appropriate inequality exp ≥ 0 or exp > 0.
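For illustration only (the representation below is ours, not the CLP(R) solver's internal one), a tableau row whose basic variable is a slack can be turned back into an inequality roughly as follows.

    # A minimal sketch, assuming a row is a dict mapping variable names to
    # coefficients, with the constant term stored under the key None.
    def slack_row_to_inequality(row, slack, strict=False):
        # The row encodes: slack = const + sum(coeff * var), with slack >= 0 (or > 0),
        # which is equivalent to: const + sum(coeff * var) >= 0 (or > 0).
        terms = [f"{c}*{v}" for v, c in row.items() if v not in (slack, None)]
        lhs = " + ".join([str(row.get(None, 0))] + terms)
        return f"{lhs} {'>' if strict else '>='} 0"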
The remainder of this section deals with the problem of eliminating non-target variables which occur in
these inequalities. We use a method based on Fourier's
algorithm [Fourier 1824]. It is well-known that the direct application of this algorithm is impractical because
it generates many redundant constraints. Attempting to
eliminate all redundancy at every step is also impractical
[Lassez et al. 1989]. Adaptations of Fourier's algorithm
due to Cernikov [Cernikov 1963] substantially improve
the performance. We show how to incorporate other redundancy elimination methods with those of Cernikov to
obtain a more practical algorithm for eliminating variables from linear inequalities.
In some circumstances, especially when the constraints written as a matrix are dense, algorithms not based on Fourier, such as [Huynh et al. 1990], can be more efficient; however, typical CLP(R) programs produce sparse matrices.
matrices. In general, the size of projection can grow exponentially in the number of variables eliminated, even
when all redundancy is eliminated. Fourier-based methods have the advantage over other methods that we can
stop eliminating variables at any time, thus computing
a partial projection.
3.2.1 Fourier-based Methods
We begin with some necessary definitions. A labeled constraint is a linear inequality labeled by a set of constraint names. We say that c is label-subsumed by c' if label(c') ⊆ label(c). To simplify the explanation, we will not consider strict inequalities. We assume all constraints are written in the form Σ_{i=1}^{n} α_i x_i ≤ β.

We shall be using some algebraic manipulation of constraints. Let c_j be the constraint Σ_{i=1}^{n} α_{j,i} x_i ≤ β_j, for j = 1, ..., n. Then γ * c_j (or γc_j) denotes the constraint Σ_{i=1}^{n} γ α_{j,i} x_i ≤ γ β_j, where γ is a real number, and c_j + c_k denotes the constraint Σ_{i=1}^{n} (α_{j,i} + α_{k,i}) x_i ≤ β_j + β_k. Similarly, Σ_{j=1}^{n} c_j denotes an iterated sum of constraints. We consider constraints c and c' equal, c = c', if c ≡ γ * c' for some γ > 0.
Let C be a set of labeled constraints. Given a variable x_i, we divide C into three subsets: C⁺_{x_i}, those constraints in which x_i has a positive coefficient (i.e., the c_j such that α_{j,i} > 0); C⁻_{x_i}, those constraints in which x_i has a negative coefficient; and C⁰_{x_i}, those constraints in which the coefficient of x_i is zero. We omit the subscript when the given variable is clear from the context.

Let c_k ∈ C⁺ and c_l ∈ C⁻, and let d = (1/α_{k,i}) * c_k + (−1/α_{l,i}) * c_l. Then, by construction, x_i does not occur in the constraint d. If S_k (S_l) is the label of c_k (c_l), then d has label S_k ∪ S_l. Let V be the collection of all such d. Then V ∪ C⁰ is the result of a Fourier step eliminating x_i. We write fourier_i(C) = V ∪ C⁰. When both C⁺ and C⁻ are non-empty then V is non-empty, and the step is called an active variable elimination. After eliminating x_i the total number of constraints in C increases (possibly decreasing) by measure(x_i, C) = |C⁺| × |C⁻| − |C⁺| − |C⁻|.
Let A be a set of constraints where each constraint is labeled by its own name. Define F_0 = A and F_{i+1} = fourier_{i+1}(F_i). Then {F_i}_{i=0,1,...} is the sequence of constraint sets obtained by Fourier's method, eliminating, in order, y_1, y_2, .... We write F^i for {F_j}_{j=0,1,...,i}. It is straightforward (see [Lassez and Maher 1988], for example) that, if m < n, F_n ↔ ∃y_{m+1}, y_{m+2}, ..., y_n F_m. In particular, F_n ↔ ∃y_1, y_2, ..., y_n A. Thus Fourier's algorithm computes projections.
However, Fourier's algorithm generates many redundant constraints and has doubly-exponential worst-case behavior. Cernikov [Cernikov 1963] (and later Kohler [Kohler 1967]; see also [Duffin 1974]) proposed modifications which address this problem by allowing some redundant constraints to be eliminated during a Fourier step. The first method eliminates, for every n, all constraints generated at the n-th active step which have a label of cardinality n + 2 or greater. A second method retains, at each step, a set S of constraints such that every constraint generated at this step is label-subsumed by a constraint in S⁴. The first method eliminates a subset of the constraints eliminated by the second. These methods are correct in the following sense: if {C_i}_{i=0,1,...} is the sequence generated by such a method, then C_i ↔ ∃y_1 ... y_i A, for every i.
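As an illustration (our own sketch, not the CLP(R) implementation), one active Fourier step with label bookkeeping and the first Cernikov test can be written as follows; a constraint Σ α_i x_i ≤ β is represented by a coefficient dictionary and a bound, carrying a frozenset label, and coefficients are assumed to be exact (int or Fraction).

    from fractions import Fraction

    def fourier_step(constraints, x, n_active):
        # One Fourier step eliminating x. Each constraint is (coeffs, bound, label):
        # sum(coeffs[v] * v) <= bound, with label a frozenset of original names.
        # n_active counts active eliminations performed so far, including this one;
        # combined constraints whose label has cardinality >= n_active + 2 are
        # dropped before being constructed (first Cernikov method).
        pos  = [c for c in constraints if c[0].get(x, 0) > 0]
        neg  = [c for c in constraints if c[0].get(x, 0) < 0]
        zero = [c for c in constraints if c[0].get(x, 0) == 0]
        out = list(zero)
        for (ak, bk, lk) in pos:
            for (al, bl, ll) in neg:
                label = lk | ll
                if len(label) >= n_active + 2:
                    continue
                mk, ml = Fraction(1, ak[x]), Fraction(-1, al[x])   # positive multipliers
                coeffs = {v: mk * ak.get(v, 0) + ml * al.get(v, 0)
                          for v in set(ak) | set(al) if v != x}
                out.append((coeffs, mk * bk + ml * bl, label))
        return out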
Although it appears that the Cernikov modifications
to Fourier's algorithm could be augmented by deleting
additional redundant constraints after each step, this is
incorrect in general [Huynh et al. 1990]. The following
example highlights this point by showing that the first
Cernikov algorithm, augmented by the simplest kind of
redundancy removal, removal of duplicate constraints, is
unsound.
3.2.2 An Example
Let A denote the following labeled constraints. Labels
appear to the left of the constraints. It can be verified
that A contains no redundancy.
{1}    w + x + y + z ≤ 1
{2}    w − x + y + z ≤ 1
{3}   −w + x + y + z ≤ 1
{4}   −w − x + y + z ≤ 1
{5}    v − y         ≤ 0
{6}   −v             ≤ 0
Upon eliminating v (by adding the last two constraints),
we obtain in the first (Fourier) step:
{1}      w + x + y + z ≤ 1
{2}      w − x + y + z ≤ 1
{3}     −w + x + y + z ≤ 1
{4}     −w − x + y + z ≤ 1
{5,6}   −y             ≤ 0
Next we eliminate w obtaining:
{1,3}    x + y + z ≤ 1
{1,4}        y + z ≤ 1
{2,3}        y + z ≤ 1
{2,4}   −x + y + z ≤ 1
{5,6}   −y         ≤ 0
4In the English translation of [Cernikov 1963], this is misstated.
Observe that Cernikov's criterion does not allow us to
delete any constraints. Since the second and third constraints are duplicates, we could delete one. However, we
choose not to in this step. Next x is eliminated to obtain:
{1,2,3,4}   y + z ≤ 1
{1,4}       y + z ≤ 1
{2,3}       y + z ≤ 1
{5,6}      −y     ≤ 0
The first three constraints are identical, and now we
choose to delete the second and third, obtaining:
{1,2,3,4}   y + z ≤ 1
{5,6}      −y     ≤ 0
In the final Fourier step, we eliminate y to obtain:
{1,2,3,4,5,6}   z ≤ 1
Cernikov's criterion allows us to delete this constraint, and so we finally obtain an empty set of constraints. This outcome is incorrect, since it implies that ∃v, w, x, y A is true for all values of z, whereas it is straightforward to verify that, in fact, ∃v, w, x, y A ↔ (z ≤ 1). Observe that we could have achieved the same incorrect outcome if, after eliminating w, one of the duplicate constraints had been deleted.

3.2.3 Combining Fourier-based Methods with Strict Redundancy Elimination

Given a set C of constraints, c ∈ C is redundant in C if C ↔ C − {c}. A subset R of C is redundant if C ↔ C − R. We define C ↠ c iff, for some constraint c', C → c', c' → c, and c ≠ c'. Equivalently (if we are dealing with only non-strict inequalities), C ↠ c means C → c' where c = c' + (0 ≤ ε) for some constraint c' and some ε > 0. (Recall that c' + (0 ≤ ε) denotes the sum of the constraint c' and the constraint (0 ≤ ε).) If also c ∈ C, then c is said to be strictly redundant in C. Geometrically, a strictly redundant constraint c determines a hyperplane which does not intersect the volume defined by C. We write C ↠ C' if C ↠ c for every c ∈ C'. A constraint c ∈ C is said to be quasi-syntactic redundant [Lassez et al. 1989] if, for some c' ∈ C and some ε > 0, c = c' + (0 ≤ ε). Clearly quasi-syntactic redundancy is one kind of strict redundancy.

Numbers denoted by λ's, μ's, ν's and ε's are nonnegative throughout. Thus (0 ≤ ε) is a tautologous constraint. The notation var(c) denotes the set of variables with non-zero coefficient in constraint c.

The following theorem from the folklore underlies all the work below.

Theorem 1  Let C = {c_0, c_1, ..., c_k} be a consistent set of constraints and let c be a constraint. Then C → c iff c = Σ_{i=0}^{k} λ_i c_i + (0 ≤ ε), where the λ's and ε are non-negative. □

The next lemma shows that all strictly redundant constraints can be deleted simultaneously from a consistent set of constraints. Consequently it is meaningful to speak of a strictly redundant subset of C. It also shows that a set of redundant constraints can be deleted simultaneously with a set of strictly redundant constraints. The corresponding results for the class of all redundant constraints do not hold.

Lemma 2  Let C be a consistent set of constraints.
1. If V ⊆ C and each c ∈ V is strictly redundant in C, then C ↔ C − V.
2. If S is strictly redundant in C and R is redundant in C, then S ∪ R is redundant in C. □

We capture Cernikov's modifications of Fourier's algorithm, and others, in the following definition. Let r be a constraint deletion procedure which, at step i, determines a redundant subset of F_i as a function of the sequence F^i. Let a Fourier-based algorithm be one which generates a sequence of constraint sets {C_i}_{i=0,1,...} where C_0 = A and C_{i+1} = fourier_{i+1}(C_i) − r(F_{i+1}). It is important to note that, in general, it is not necessary for a Fourier-based method to compute the sequence {F_j}_{j=0,1,...}. Indeed, these methods are valuable to the extent that they do not compute F_i. All that is required is that C_i can be viewed as being computed using a function of this sequence.

Let r be the function associated with a Fourier-based method, and let s map every constraint set to a subset obtained by deleting some strictly redundant constraints. The sequence of constraint sets {K_i}_{i=0,1,...} where K_0 = A and K_{i+1} = s(fourier_{i+1}(K_i) − r(F_{i+1})) is the result of augmenting the Fourier-based method with the deletion of (some) strictly redundant constraints.

We let V_i denote the set of constraints deleted by s at step i, that is, V_i = (fourier_i(K_{i−1}) − r(F_i)) − s(fourier_i(K_{i−1}) − r(F_i)). A removed constraint in C_i is defined to be a constraint in C_i which uses some constraint in V_j, j ≤ i, during its generation; that is, for some j ≤ i, if we view the generation of C_i as starting from C_j (instead of C_0), then c is generated using constraints from V_j. R_i denotes the set of all removed constraints in C_i. Clearly F_i ⊇ C_i ⊇ K_i, K_i = C_i − R_i, and V_i ⊆ R_i.

We now prove that elimination of strict redundancy does not affect correctness of Fourier-based methods.
C = initial set of inequalities (after linear substitutions);
label(c) = {c} for each c ∈ C;
n = 0;
while (there exists an auxiliary variable x in C) {
    choose a variable x with minimal measure(x, C);
    V = C⁰_x;
    if (|C⁺_x| > 0 and |C⁻_x| > 0) {
        n = n + 1;                                       /* count active eliminations */
        for (each pair c_k ∈ C⁺_x, c_l ∈ C⁻_x) {
            if (|label(c_k) ∪ label(c_l)| ≥ n + 2) continue;   /* first Cernikov method */
            d is the constraint obtained from c_k and c_l by eliminating x;
            label(d) = label(c_k) ∪ label(c_l);
            if (d is not quasi-syntactic redundant wrt V) {
                E = quasi-syntactic redundant constraints in V wrt d;
                V = V ∪ {d} − E;
                /* ** second Cernikov method
                if (d is label-subsumed in V)
                    V = V − {d};
                else {
                    F = constraints in V label-subsumed by d;
                    V = V − F;
                }
                ** */
            }
        }
    }
    C = V;
}
return C;

Figure 2: Linear inequalities
Theorem 3 Suppose A is consistent and {Ci } is correct.
Then {Ki} is correct.
Proof: Note that, since A is consistent, F_n is consistent for every n, and consequently so are C_n and K_n. Suppose c ∈ R_n and c depends on constraints in V_m, m ≤ n; say c = Σ_i λ_i c_i + Σ_j μ_j d_j where d_j ∈ V_m for each j, c_i ∈ C_m − V_m for each i, and, for some j, μ_j > 0. Now for each j, since C_m ↠ V_m (Lemma 2), d_j = Σ_i ν_{ji} c_i + (0 ≤ ε_j) where ε_j > 0 and c_i ∈ C_m − V_m for each i. Hence c = Σ_i (λ_i + Σ_j μ_j ν_{ji}) c_i + (0 ≤ ε') where ε' = Σ_j μ_j ε_j and ε' > 0. Let c' = Σ_i (λ_i + Σ_j μ_j ν_{ji}) c_i, so that c = c' + (0 ≤ ε').

Now F_m → c', since every c_i ∈ F_m. Furthermore F_n → c', since var(c') = var(c) ⊆ {y_{n+1}, y_{n+2}, ...} (since c ∈ R_n). Since {C_i} is correct, C_n → c', that is, C_n ↠ c.

By applying this argument for every c depending on V_m and every m ≤ n, C_n ↠ c for every c ∈ R_n. By Lemma 2, C_n ↔ C_n − R_n. But C_n − R_n = K_n. Hence K_n ↔ C_n and, since {C_i} is correct, K_n ↔ F_n. □
This result extends to sets of constraints containing both strict and non-strict inequalities. Fourier's algorithm and Cernikov's modification extend straightforwardly. The definition of ↠ stands, but it is no longer equivalent to C → c' and c = c' + (0 ≤ ε), for some ε > 0.
Before discussing our algorithm, we briefly outline the costs of various redundancy elimination procedures. Let C be a set of m inequalities involving n variables, obtained in a Fourier-based method from m0 original inequalities. Full redundancy elimination using the simplex algorithm has exponential worst-case complexity, although in the average case it is O(m³n). Strict redundancy elimination has essentially the same cost as full redundancy elimination. Quasi-syntactic redundancy elimination on the constraints C has worst-case complexity O(m²n). The cost of eliminating redundancy in C using the first Cernikov method has worst-case complexity O(m·m0), and it has the important advantage that a constraint can be deleted before the (relatively expensive) process of explicitly constructing it. Application of the second Cernikov method has worst-case complexity O(m²m0).
In [Cernikov 1963] it is recommended that the first,
and then the second Cernikov elimination method be
applied at each step. The variation in which the second method is applied only intermittently is suggested in
[Kohler 1967]. If we want to incorporate the elimination
of strict redundancy, the above complexity analysis suggests that quasi-syntactic redundancy elimination may
be most cost-effective. The analysis also suggests that
this elimination should be performed between the first
and second Cernikov methods.
Our tests tended to support this reasoning. Using the first Cernikov method followed by quasi-syntactic redundancy elimination produced significant improvement over the first method alone. However, further processing in accord with the second Cernikov method only marginally reduced the number of constraints eliminated and led to an overall increase in computation time.
redundancy elimination after each Fourier step, which is
incompatible with the Cernikov methods, slows computation by an order of magnitude. Full strict redundancy
elimination added to the Cernikov method is also unprofitable.
The algorithm is shown in Figure 2. It uses a heuristic
(from [Duffin 1974]) attempting to minimize the number
of new constraints generated. There remains the matter of verifying the correctness of the algorithm. It is
easy to see that a step i is active in {F_i} iff it is active in {K_i} iff it is active in {C_i}. Thus the first Cernikov
method is Fourier-based, and the corresponding part of
the algorithm implements this method, and so is correct.
The second part of the algorithm deletes some remaining
quasi-syntactic redundancies and, by the previous theorem, is correct. If the third part, which is commented
out in Figure 2, is included in the algorithm then theorem 3 does not apply directly. However it is not difficult
to show that this algorithm is equivalent to eliminating some of the constraints eliminable by applying the
second Cernikov method en bloc and then eliminating
some strictly redundant (not necessarily quasi-syntactic
redundant) constraints. Thus the theorem applies and
the algorithm is correct.
4 Constraints over Trees
The constraints at hand are equations involving uninterpreted functors, the functor equations. As in PROLOG
systems a straightforward way of printing these equations is to print an equation between each target variable
and its value.
Consider equivalence classes of variables obtained as
the reflexive, symmetric and transitive closure of the relation: {(x, y) : x is bound to y}; write rep(x) to denote
the variable of highest priority equivalent to x. Now define the printable value of a variable as:
value(f(t_1, ..., t_n)) = f(value(t_1), ..., value(t_n));
value(x) = value(t), if x is bound to a term t;
         = rep(x),   if x is unbound.
The output is a set of equations of the form x = value(x) for each target variable x, excepting those variables x for which value(x) is x itself. We remark that most PROLOG systems do not use equivalence classes as above, and thus, for example, the binding structure x ↦ _1, y ↦ _1 is generally not printed as x = y.
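A small sketch (ours; the term representation is invented: a variable is a string, a compound term a tuple (functor, args...)) of computing printable values with an equivalence-class representative function rep:

    def value(term, bindings, rep):
        # bindings maps a bound variable to the term it is bound to;
        # rep(x) returns the highest-priority variable equivalent to x.
        if isinstance(term, str):                 # a variable
            if term in bindings:                  # bound: print the value of its binding
                return value(bindings[term], bindings, rep)
            return rep(term)                      # unbound: print its representative
        functor, *args = term                     # a compound term f(t1, ..., tn)
        return (functor, *(value(a, bindings, rep) for a in args))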
One well-known drawback of the above output method
is that the output can be exponentially larger than
the original terms involved. For example, the output of
X_1 = f(X_2, X_2), X_2 = f(X_3, X_3), ..., X_{n−1} = f(X_n, X_n), X_n = a, where X_1, ..., X_n are target variables, is such that the binding of X_1 is a term of size O(2^n). This exponential blowup can be avoided by other methods
[Paterson and Wegman 1978], but in practice it occurs
rarely. Hence the binding method is adopted in the
CLP(R) system.
The output of functor equations in the context of
other (arithmetic) constraints raises another issue. Recall that it is not always possible to eliminate non-target variables appearing in functor equations (e.g., eliminating z in x = f(z)). Consequently, arithmetic constraints
which affect these unavoidable non-target variables must
also be output. We resolve this issue by augmenting the
problem description sent to the linear constraint output
module (c.f. Section 3) as follows: the target variables
now consist of the original target variables and the unavoidable non-target variables, with the latter having priority intermediate between the original target variables
and remaining variables.
These secondary target variables are given lower priority than the original target variables in order to minimize their occurrence in the output. The lower priority ensures that such variables appear on the left hand
side of arithmetic equations as much as possible. We can
then substitute the right hand side of the equation for
the variable and omit the equation, thus eliminating the
variable. For example, if x and y are the target variables in x = f(z), y = z + 2, the output is x = f(y − 2). We
discuss this further in section 6.
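Continuing the invented term representation above, the substitution performed for secondary target variables can be sketched as follows; eliminate_secondary and its arguments are illustrative names only.

    def eliminate_secondary(bindings, linear_defs):
        # bindings    : target variable -> functor term, e.g. {'x': ('f', 'z')}
        # linear_defs : secondary target variable -> expression over target
        #               variables, e.g. {'z': 'y - 2'}; such an equation is
        #               substituted everywhere and then dropped from the output.
        def subst(term):
            if isinstance(term, str):
                return linear_defs.get(term, term)
            functor, *args = term
            return (functor, *(subst(a) for a in args))
        return {v: subst(t) for v, t in bindings.items()}

    # eliminate_secondary({'x': ('f', 'z')}, {'z': 'y - 2'}) == {'x': ('f', 'y - 2')}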
5 Nonlinear Constraints
In general all nonlinear constraints need to be printed,
regardless of the target variables, because omitting them
may result in an output which is satisfiable when the
original set of constraints is not. For example, given the
constraints x < 0, y * y = -2 and target variable x, we
cannot simply output x < O. This problem arises since we
have no guarantee that the nonlinear constraints are satisfiable. When the only nonlinear constraints are caused
by multiplication the auxiliary variables in the nonlinear constraints can, in theory, be eliminated. However
this approach is not practical with current algorithms
[Collins 1982] and not possible once trigonometric functions are introduced. Thus, as with functor equations,
the nonlinear equations contribute additional target variables. These are simply all the variables which remain
in the nonlinear constraints, and we give them priority
lower than the target variables but higher than the variables added from functor equations.
However, there is one observation which can significantly reduce the number of nonlinear equations printed and the number of additional target variables: suppose a non-target variable y occurs exactly once in the constraints, say in the constraint c, and p(x) implies ∃y c(x, y) ↔ c'(x), for some constraint c' and some condition p; then c can be replaced by c', provided the remaining constraints imply that p(x) holds. Some specific applications of this observation follow.
If y occurs in the form y = f(x) then this constraint can be eliminated provided f is a total function on the real numbers (this excludes functions such as exponentiation and division⁵). If y occurs as y = x^z then we can delete the constraint, provided we know that x > 0 or z is an integer other than 0. Similarly, we can delete x = z^y provided that x > 0, z > 0 and z ≠ 1, and delete x = y^z provided x > 0 and z ≠ 0. A constraint x = |y| can be replaced by x ≥ 0; x = sin y (and x = cos y) can be replaced by −1 ≤ x ≤ 1; x = min(y, z) can be replaced by x ≤ z (and similarly for max). A constraint x = y * z (equivalently y = x / z) can be eliminated, provided it is known that z ≠ 0. In this latter case, which can be expected to occur more often than most of the other delayed constraints, we can use linear programming techniques on the linear constraints to test whether z is constrained to be non-zero. Specifically, we add the constraint z = 0 to the linear constraint solver, and if the solver finds that the resulting set of constraints is inconsistent then we delete x = y * z. We undo the effects of the additional constraint using the same mechanism as used for backtracking during execution of a goal.

⁵Strictly speaking, division is not a function, since y = x/z is defined to be equivalent to x = y*z and so 0/0 can take any value.
There is a significant complication due to the linear
constraints which are generated as a result of simplifying nonlinear constraints. As each such linear constraint
is generated, it is passed to the linear constraint solver
so that a consistency check can be performed6 . If the resulting constraint system is not consistent then the simplifications are undone and the system backtracks to the
nearest choice-point as it normally does after executing
a failure.
6 Summary of the Output Module
We now present the output algorithm in its entirety, a
collation of the various sub-algorithms described above
corresponding to the different kinds of constraints. Note
that the order in which the sub-algorithms are invoked
is important; essentially, the processing of functor and
nonlinear equations must be done first in order to determine the set of secondary target variables. Then the
linear constraints are processed in such a way as to maximize the number of secondary target variables that can
be eliminated. Step V below, not previously described,
performs this elimination. It suffers the same drawback
as processing functor equations - potentially the size of
output is exponential in the size of the original equations.
Step I
Process the functor equations, in order to obtain the
secondary target variables. These are essentially the
non-target variables appearing in the bindings of the
primary target variables. Obtain a (possibly empty)
collection of functor equations.
Step II
Simplify the nonlinear equations, and expand the
set of secondary target variables to include all the
variables in the simplified collection. Obtain a collection of nonlinear equations. This step might also
produce additional linear equations.
Step III
Process the linear equations (Figure 1) with respect
to the primary and secondary target variables, using
some priority such that the primary variables are
higher priority than the secondary variables and the
auxiliary variables are of the lowest priority. Obtain
a collection of final linear equations involving only
target variables.
Step IV
Process the linear inequalities (Figure 2), and note that these may have been modified as a result of Step III above, using the primary and secondary target variables. Obtain a collection of linear inequalities involving only target variables.

⁶Thus the output module implements a more powerful constraint solver than that used during run-time.
Step V
For each secondary target variable y appearing in a
linear equation of the form y = t, substitute t for y
everywhere, and remove the equation. For each secondary target variable y appearing in a nonlinear
equation of the form y = t, where y appears elsewhere but not in t, substitute t for y everywhere,
and remove the equation.
Step VI
Output all the remaining constraints.
7 Conclusion
The output module of CLP(R) has been described.
While a large part of the problem coincides with the
classical problem of projection in linear constraints, dealing with functor and nonlinear equations, and working
in the context of a CLP runtime structure, significantly
increase the problem difficulty.
The core element of our algorithm deals with projecting linear constraints; it extends the Fourier/Cernikov algorithm with strict redundancy removal. The rest of
the paper deals with functor and nonlinear equations and
how they are output together with the linear constraints.
What is finally obtained is an output module for CLP(R)
which has proved to be both practical and effective.
We finally remark that the introduction of meta-level facilities [Heintze et al. 1989] in a future version of CLP(R) significantly complicates the output problem, since the constraint domain is expanded to include representations/codings of constraints.
References
[Cernikov 1963] S.N. Cernikov. Contraction of Finite Systems of Linear Inequalities (in Russian). Doklady Akademiia Nauk SSSR, Vol. 152, No. 5 (1963), pp. 1075-1078. (English translation in Soviet Mathematics Doklady, Vol. 4, No. 5 (1963), pp. 1520-1524.)
[Collins 1982] G.E. Collins. Quantifier Elimination for Real
Closed Fields: a Guide to the Literature. In Computer
Algebra: Symbolic and Algebraic Computation, Computing Supplement #4, B. Buchberger, R. Loos and
G.E. Collins (Eds), Springer-Verlag, 1982, pp. 79-81.
[Duffin 1974] R.J. Duffin. On Fourier's Analysis of Linear Inequality Systems. Mathematical Programming Study, Vol. 1 (1974), pp. 71-95.
[Fourier 1824] J.-B. J. Fourier. Reported in: Analyse des travaux de l'Académie Royale des Sciences, pendant l'année 1824, Partie mathématique, Histoire de l'Académie Royale des Sciences de l'Institut de France, Vol. 7 (1827), pp. xlvii-lv. (Partial English translation in: D.A. Kohler. Translation of a Report by Fourier on his work on Linear Inequalities. Opsearch, Vol. 10 (1973), pp. 38-42.)
[Heintze et al. 1989] N.C. Heintze, S. Michaylov, P.J.
Stuckey and R. Yap. Meta-programming in CLP(R).
In Proc. North American Conf. on Logic Programming, Cleveland, 1989. pp. 1-19.
[Huynh et al. 1990] T. Huynh, C. Lassez and J-L. Lassez.
Practical Issues on the Projection of Polyhedral Sets.
Annals of Mathematics and Artificial Intelligence, to
appear. (Also: IBM Research Report RC 15872, IBM
T.J. Watson Research Center, 1990.)
[Jaffar et al. 1990] J. Jaffar, S. Michaylov, P. Stuckey and R.
Yap. The CLP(R) Language and System, ACM Transactions on Programming Languages, to appear. (Also:
IBM Research Report RC 16292, IBM T.J. Watson
Research Center, 1990.)
[Jaffar and Lassez 1986] J. Jaffar and J-L. Lassez. Constraint Logic Programming. Technical Report 86/73,
Dept. of Computer Science, Monash University (June
1986). (An abstract appears in: Proc. 14th Principles of Programming Languages, Munich, 1987, pp. 111-119.)
[Kohler 1967] D.A. Kohler. Projections of Polyhedral Sets.
Ph.D. Thesis, Technical report ORC-67-29, Operations Research Center, University of California at
Berkeley (August 1967).
[Lassez et al. 1989] J-L. Lassez, T. Huynh and K. McAloon.
Simplification and Elimination of Redundant Linear Arithmetic Constraints. In Proc. North American
Conference on Logic Programming, Cleveland, 1989.
pp.35-51.
[Lassez and McAloon 1988] J-L. Lassez and K. McAloon.
Generalized Canonical Forms for Linear Constraints
and Applications. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp.
703-710.
[Lassez and Maher 1988] J-L. Lassez and M. Maher. On
Fourier's Algorithm for Linear Arithmetic Constraints.
Journal of Automated Reasoning, to appear.
[Paterson and Wegman 1978] M.S. Paterson and M.N. Wegman. Linear Unification. Journal of Computer and
System Sciences, Vol. 16, No.2 (1978), pp. 158--167.
[Tarski 1951] A. Tarski. A Decision Method for Elementary
Algebra and Geometry. University of California Press,
Berkeley, USA, 1951.
Adapting CLP(R) To Floating-Point Arithmetic
J.H.M. Lee and M.H. van Emden
Logic Programming Laboratory, Department of Computer Science
University of Victoria, Victoria, B.C., Canada V8W 3P6
{jlee,vanemden}@csr.uvic.ca
Abstract
As a logic programming language, Prolog has shortcomings. One of the most serious of these is in arithmetic.
CLP(R), though a vast improvement, assumes perfect arithmetic on reals, an unrealistic requirement for computers, where there is strong pressure to use floating-point arithmetic. We present an adaptation of CLP(R)
where the errors due to floating-point computation are
absorbed by the use of intervals in such a way that the
logical status of answers is not jeopardized. This system
is based on Cleary's "squeezing" of floating-point intervals, modified to fit into Mackworth's general framework
of the Constraint-Satisfaction Problem. Our partial implementation consists of a meta-interpreter executed by
an existing CLP(R) system. All that stands in the way
of correct answers involving real numbers is the planned
addition of outward rounding to the current prototype.
1 Introduction
Mainstream computing holds that programming should
be improved by gradual steps, as exemplified by the
methods of structured programming and languages such
as Pascal and Ada. Revolutionaries such as Patrick
Hayes and Robert Kowalski advocated radical change,
as embodied in Hayes's motto: "Computation is deduction." [Hayes 1973] According to this approach, programs are definitions in a declarative language and every
computation step is a valid inference, so that results are
logical consequences of program and data. Logic programming, Prolog and the CLP scheme are examples of
this radical alternative in programming languages and
method. Where Prolog coincides with logic programming, certainty of knowledge is obtained.
In numerical analysis there is a similar tension between
mainstream thinking and the radicals. The first is satisfied without rigorous control of errors. When successive
approximations differ by a small amount, it is assumed
that a result has been obtained with an error of approximately that amount. Of course, sophisticated error analyses can be made to suggest more certain knowledge.
But such analyses are typically valid only asymptotically. In practice one does not know whether one is close
enough to the true value for the asymptotic analysis to
be applicable at all.
The radical alternative in numerical computation is
represented by interval methods, where the ideal is to be
sure that the true value is contained in an interval. It is
then the purpose of iteration to shrink such an interval
till it is no greater than an acceptable width. Here again
the goal is certainty of knowledge.
In the research reported in this paper, we bring these
two radical streams together. Both streams are, in their
present form, deficient. Logic programming lacks in control of numerical errors. Interval methods rely on conventional algorithmic languages and hence lack computation
as deduction. We show that the two can be combined in
such a way that rigorously justified claims can be made
about the error in numerical computation even if conventional floating-point arithmetic is used.
Problem statement. Logic programming, exemplified by Prolog, is the most successful realization of Hayes's motto. In certain application areas, Prolog can be used to program efficient computations that are also logical deductions. However, Prolog's arithmetic primitives, which are functional in nature, are incompatible with the relational paradigm of logic programming. The advent of CLP(ℜ), an instance of the CLP scheme [Jaffar and Lassez 1987], takes us closer to relational arithmetic, but its implementation CLP(R)¹ [Jaffar et al. 1990] is obtained by substituting each real number by a single floating-point approximation. As a result, round-off errors destroy soundness and disqualify CLP(R) computation as deduction.
Solution. Our solution consists of three parts. First,
we tackle the round-off error problem with interval arithmetic introduced in [Moore 1966]. Instead of operating
on individual floating-point numbers, interval arithmetic
¹In this paper, we use CLP(ℜ) to denote the CLP instance with ℜ being the algebraic structure of finite trees of reals; and CLP(R) is the name of a CLP(ℜ) implementation.
manipulates intervals. The guaranteed inclusion property of interval arithmetic ensures the soundness of computation.
Second, traditional interval arithmetic is functional and has been embedded in functional or imperative languages. To develop the required relational version, we use an interval narrowing operation based on
work in [Cleary 1987] and similar to the one used in
[Sidebottom and Havens 1992].
Finally, we make a modification to the CLP scheme by including an operation that reduces a goal to normal form, and show that interval narrowing is such an operation.
By modifying a meta-interpreter for CLP(R) accordingly, we obtain a prototype implementation.
The paper is organized as follows. Section 2 discusses
related work. Relational interval arithmetic, which consists of interval narrowing and a relaxation algorithm,
is presented in section 3. In section 4, we describe the
semantics of ICLP(R), which is CLP(R) extended with
relational interval arithmetic. We summarize and conclude in section 5.
2 Related Work
Interval arithmetic. While it is important to derive
new and more efficient interval arithmetic algorithms and
ensure delivery of practical interval bounds, recent development in interval arithmetic [Moore 1988] emphasizes:
(1) automatic verification of computed answers, and (2)
clear mathematical description of the problem. Users of
numerical programs are usually only interested in the solution of a problem. They do not want to take the burden
(a) to understand how the problem is solved, (b) to validate the correctness of the answers, and (c) to calculate
error bounds. Logic programming shares these goals.
Constraint interval arithmetic. Constraint interval arithmetic stems from constraint propagation techniques.
It is a form of "label inference," where
the labels are intervals [Davis 1987].
ENVISION
[de Kleer and Brown 1984] performs qualitative reasoning about the behaviour of physical systems over time.
TMM [Dean 1985] is a temporal constraint system that
records and reasons with changes to the world over
time. SPAM [McDermott and Davis 1984] performs spatial reasoning. These systems are based on consistency
techniques [Mackworth 1977], which handle only static
constraint networks. To be able to generate constraints,
the described systems are equipped with programming
languages tailored to the application.
Constraint logic programming. Cleary incorporates a relational version of interval arithmetic, which
he calls Logical Arithmetic [Cleary 1987], into Prolog.
He introduces a new term "interval", which requires an
extension of the unification algorithm. Cleary presents
several "squeezing" algorithms that reduce arithmetic
constraints over intervals. A constraint relaxation cycle coordinates the execution of the squeezing algorithms. However, there is a semantic problem in this
approach. Variables bound to intervals, which are terms in the Herbrand universe, can be re-bound to smaller intervals. This is not part of resolution, where only a variable can be bound. It is not clear in what other, if any, sense this may be a logical inference. BNR Prolog [Older and Vellino 1990] has a partial implementation of logical arithmetic, which handles only closed intervals. The Echidna constraint reasoning system also supports relational interval arithmetic [Sidebottom and Havens 1992]. It is based on hierarchical consistency techniques [Mackworth et al. 1985]. Echidna is close to CHIP [Dincbas et al. 1988], whereas we remain within the CLP framework.
3 Relational Interval Arithmetic
Cleary describes several algorithms to reduce constraints
on intervals [Cleary 1987]. These algorithms work under
a basic principle: they narrow intervals associated with
a constraint by removing values that do not satisfy the
constraint. We study the set-theoretic aspect of the algorithms and generalize them for narrowing intervals constrained by a relation p on Rⁿ. We then discuss interval
narrowing for several common arithmetic relations. Interval narrowing is designed for the reduction of a single
constraint. Typically, several constraints interact with
one another by sharing intervals, resulting in a constraint
network. We present an algorithm that coordinates the
applications of interval narrowing to constraints in the
network.
3.1 Basics of Interval Arithmetic
We use R to denote the set of real numbers and F a set of floating-point numbers. We also distinguish between real intervals and floating-point intervals. The set of real intervals, I(R), is defined by

I(R) = {(a, b] | a ∈ R ∪ {−∞}, b ∈ R} ∪
       {[a, b) | a ∈ R, b ∈ R ∪ {+∞}} ∪
       {[a, b] | a, b ∈ R} ∪
       {(a, b) | a ∈ R ∪ {−∞}, b ∈ R ∪ {+∞}}.

Replacing R by F in the definition of I(R), we obtain the definition of floating-point intervals. The symbols −∞ and +∞ are used to represent intervals without lower and upper bounds respectively. Every interval has the usual set denotation. For example, [e, π) = {x | e ≤ x < π}, (−∞, 4.5] = {x | x ≤ 4.5}, and (−∞, +∞) = R. We impose a partial ordering on real intervals: an interval I1 is smaller than or equal to an interval I2 if and only if I1 ⊆ I2. Given a set of intervals T, I ∈ T is the smallest interval in T if I is smaller than or equal to I' for all I' ∈ T.
Conventionally, real numbers are approximated by floating-point numbers by means of rounding or truncation. We approximate real intervals by floating-point intervals using the outward rounding function e : I(R) → I(F); if I is a real interval, e(I) is the smallest floating-point interval containing I. It follows that e(I) = I for each floating-point interval I. The IEEE floating-point standard [IEEE 1987] provides three user-selectable directed rounding modes: round toward +∞, round toward −∞, and round toward 0. The first two modes are essential and sufficient to implement outward rounding: we round toward +∞ at the upper bound and toward −∞ at the lower bound. In most hardware that conforms to the IEEE standard, performing directed rounding amounts to setting a hardware flag before performing the arithmetic operation.

We state without proof the following properties of the outward rounding function.

Lemma 1: If A ∈ I(R) and a' ∈ e(A), then there exists a ∈ A such that a' ∈ e([a, a]). •

Lemma 2: If A ∈ I(F), a' ∈ A, and a ∈ e([a', a']), then a ∈ A. •
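A crude sketch of outward rounding in Python (ours; it widens each finite bound by one unit in the last place via math.nextafter rather than re-running the arithmetic under IEEE directed rounding, so it is safe but slightly coarser than the function e described above):

    import math

    def outward(lo, hi):
        # Widen a floating-point interval so that it certainly contains the
        # real interval that the bounds were meant to approximate.
        return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))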
3.2 Interval Narrowing
An i-constraint is of the form (p, Ī), where p is a relation on Rⁿ and Ī = (I_1, ..., I_n) is a tuple of floating-point intervals. Note that the number of intervals in the tuple Ī is equal to the arity of p. For any relation p of arity n, we can associate n set-valued functions with p:

F_i(p)(S_1, ..., S_{i−1}, S_{i+1}, ..., S_n) = {s_i | (s_1, ..., s_n) ∈ S(p)} = π_i(S(p)),

where i = 1, ..., n, the S_j's are sets, π_i is the projection function defined by π_i(p) = {s_i | (s_1, ..., s_n) ∈ p}, and S(p) = (S_1 × ... × S_{i−1} × π_i(p) × S_{i+1} × ... × S_n) ∩ p.

To ensure that the result of narrowing is an interval, we consider only relations p on Rⁿ such that each F_i(p) maps intervals to intervals. We now specify interval narrowing as an input-output pair.

Input:  Ī = (I_1, ..., I_n), where each I_i is a floating-point interval (1 ≤ i ≤ n).
Output: Ī' = (I'_1, ..., I'_n), where I'_i = I_i ∩ e(F_i(p)(I_1, ..., I_{i−1}, I_{i+1}, ..., I_n)).

The application of e in the formula ensures that the output intervals are floating-point intervals. If one or more I'_i is empty, then interval narrowing fails and the i-constraint (p, Ī) is i-inconsistent. Otherwise it succeeds with I'_1, ..., I'_n as output. Note that the output interval I'_i is a subset of the corresponding input interval I_i.
For example, the F_i(add)'s of the relation add = {(x, y, z) | x, y, z ∈ R, x + y = z} are

F_1(add)(I_2, I_3) = I_3 ⊖ I_2,   F_2(add)(I_1, I_3) = I_3 ⊖ I_1,   F_3(add)(I_1, I_2) = I_1 ⊕ I_2,

where A ⊕ B = {a + b | a ∈ A, b ∈ B} and A ⊖ B = {a − b | a ∈ A, b ∈ B}.
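For example, narrowing an add i-constraint can be sketched as follows (our code; intervals are closed (lo, hi) pairs of floats and outward rounding is omitted for brevity):

    def narrow_add(X, Y, Z):
        # Narrow (add, (X, Y, Z)), i.e. the constraint x + y = z.
        # Returns the narrowed intervals, or None on i-inconsistency.
        def inter(A, B):
            lo, hi = max(A[0], B[0]), min(A[1], B[1])
            return (lo, hi) if lo <= hi else None
        def add(A, B):
            return (A[0] + B[0], A[1] + B[1])      # interval A (+) B
        def sub(A, B):
            return (A[0] - B[1], A[1] - B[0])      # interval A (-) B
        Xp, Yp, Zp = inter(X, sub(Z, Y)), inter(Y, sub(Z, X)), inter(Z, add(X, Y))
        return None if None in (Xp, Yp, Zp) else (Xp, Yp, Zp)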
The following theorem shows the soundness and completeness of interval narrowing.

Theorem 3: Let C be (p, (I_1, ..., I_n)). If (x_1, ..., x_n) ∈ p and (I'_1, ..., I'_n) are the output intervals obtained from interval narrowing of C, then (x_1, ..., x_n) ∈ I_1 × ... × I_n if and only if (x_1, ..., x_n) ∈ I'_1 × ... × I'_n.

Proof: Since I'_1 × ... × I'_n ⊆ I_1 × ... × I_n, the if-part of the theorem is true. In the following, we prove the only-if part. Suppose (x_1, ..., x_n) ∈ I_1 × ... × I_n ∩ p. We have x_i ∈ I_i for i = 1, ..., n and x_i ∈ F_i(p)(I_1, ..., I_{i−1}, I_{i+1}, ..., I_n) by definition. Therefore x_i ∈ I'_i and (x_1, ..., x_n) ∈ I'_1 × ... × I'_n. •
The next lemma assists in expressing interval narrowing in terms of relational operations.

Lemma 4: For A ∈ I(F) and B ∈ I(R), A ∩ e(B) = e(A ∩ B). •

We rewrite the output of interval narrowing as follows:

I'_i = I_i ∩ e(F_i(p)(I_1, ..., I_{i−1}, I_{i+1}, ..., I_n))
     = e(I_i ∩ F_i(p)(I_1, ..., I_{i−1}, I_{i+1}, ..., I_n))    by Lemma 4
     = e(I_i ∩ π_i(T(p)))
     = e(I_i ∩ {x_i | (x_1, ..., x_n) ∈ T(p)})
     = e({x_i ∈ I_i | (x_1, ..., x_n) ∈ T(p)})
     = e({x_i | (x_1, ..., x_n) ∈ ((I_1 × ... × I_n) ∩ p)})
     = e(π_i((I_1 × ... × I_n) ∩ p)),

where T(p) = (I_1 × ... × I_{i−1} × π_i(p) × I_{i+1} × ... × I_n) ∩ p. In essence, interval narrowing computes the intersection of I_1 × ... × I_n and p, and outward-rounds each projection of the resulting relation.
We show in [Lee and van Emden 1991b] that interval narrowing is an instance of the LAIR rule [Van Hentenryck 1989], which is based on the arc consistency techniques [Mackworth 1977]. Figure 1 illustrates the interval narrowing of the constraint (le, (I_1, I_2)), where le = {(x, y) | x, y ∈ R, x ≤ y}. In the diagram, the initial floating-point intervals are I_1 and I_2. The dotted region denotes the relation le; the region for I_1 × I_2 is shaded with a straight-line pattern. Interval narrowing returns I'_1 and I'_2 by taking the projections of the intersection of the two regions. There is no need to perform outward-rounding in this example since the bounds of I'_1 and I'_2 share those of I_1 and I_2.
Figure 1: Pictorial illustration of interval narrowing.

3.3 Arithmetic Primitives

A useful relational interval arithmetic system should support some primitive arithmetic constraints, such as addition and multiplication. More complex constraints can then be built from these primitives. To ensure that a relation p is suitable for interval narrowing, we need to check that each F_i(p) maps real intervals to real intervals. If p is one of

eq  = {(x, x) | x ∈ R},
add = {(x, y, z) | x, y, z ∈ R, x + y = z},
le  = {(x, y) | x, y ∈ R, x ≤ y},
lt  = {(x, y) | x, y ∈ R, x < y},

we can verify easily that the F_i(p)'s satisfy the criterion. The case of the multiplication relation multiply = {(x, y, z) | x, y, z ∈ R, xy = z} requires further explanation. Consider

F_1(multiply)(I_2, I_3) = I_3 ⊘ I_2 and F_2(multiply)(I_1, I_3) = I_3 ⊘ I_1,

where A ⊘ B = {a/b | a ∈ A, b ∈ B, b ≠ 0}. Note that A ⊘ B is not an interval in general. For example, [1,1] ⊘ [−2,3] = (−∞, −1/2] ∪ [1/3, +∞) is a union of two disjoint intervals. The multiply relation does not satisfy the criterion for interval narrowing.

As suggested in [Cleary 1987], we can circumvent the problem by partitioning multiply into multiply+ and multiply−, where

multiply+ = {(x, y, z) | x, y, z ∈ R, x ≥ 0, xy = z},
multiply− = {(x, y, z) | x, y, z ∈ R, x < 0, xy = z}.

By restricting interval narrowing to one partition or the other, we can guarantee that the result of interval division is an interval. When a multiply constraint is encountered, we choose one of the partitions and perform narrowing; the other partition is visited upon automatic backtracking or under user control.

An advantage of relational interval arithmetic is that we do not have the division-by-zero problem. For example, the i-constraint (multiply+, ((4, +∞), 0, [−3, 5])) is reduced to (multiply+, ((4, +∞), 0, 0)).

Relations induced from transcendental functions and the disequality relation, such as sin = {(x, y) | x, y ∈ R, y = sin(x)} and dif = {(x, y) | x, y ∈ R, x ≠ y}, also suffer from the same problem as the multiply relation. Similarly, we can solve the problem by appropriate partitioning of the relations.

3.4 Constraint Networks
The interval narrowing discussed so far reduces individual i-constraints. In practice, we have more than one
constraint in a problem. These constraints may depend
on one another by sharing intervals. By naming an interval by a variable and by having a variable occur in
more than one constraint, we indicate that constraints
share intervals. Note that the material in this section is not related to logic programming but is in conventional notation with destructive assignment². We define an i-network to be a set of i-constraints. Consider the quadratic equation x² − x − 6 = 0, which can be rewritten as x(x − 1) = 6. Suppose our initial guess for the positive root of the equation is [1, 100]. We can express the equation by the following i-network:

{(add, (V1, [1,1], V)), (multiply+, (V, V1, [6,6]))},

where the variables V and V1 are the intervals [1, 100] and (−∞, +∞) respectively.
Our goal is to use interval narrowing to reduce i-networks. Note the following two observations. First, the reduction of an i-constraint C in the i-network affects other i-constraints that share variables with C. Second, interval narrowing is idempotent, as shown in the following lemma.

Lemma 5: Let Ī = (I_1, ..., I_n), Ī' = (I'_1, ..., I'_n), and Ī'' = (I''_1, ..., I''_n) be tuples of floating-point intervals and p a relation on Rⁿ. If, by interval narrowing, Ī' is obtained from Ī and Ī'' is obtained from Ī', then Ī' = Ī''.

Proof: To prove the equality of Ī' and Ī'', we prove I'_i = I''_i for i = 1, ..., n. By the definition of interval narrowing and Lemma 4, we have

I'_i  = e(I_i ∩ F_i(p)(I_1, ..., I_{i−1}, I_{i+1}, ..., I_n)),
I''_i = e(I'_i ∩ F_i(p)(I'_1, ..., I'_{i−1}, I'_{i+1}, ..., I'_n)).

It is obvious that I''_i ⊆ I'_i. Next we prove I'_i ⊆ I''_i. Suppose a_i ∈ I'_i. By Lemma 1 there exists a'_i ∈ I_i ∩ F_i(p)(I_1, ..., I_{i−1}, I_{i+1}, ..., I_n) such that a_i ∈ e([a'_i, a'_i]). By the definition of F_i(p), there exist a_j ∈ I_j, for j = 1, ..., i−1, i+1, ..., n, such that the resulting tuple is in p. Since a'_i ∈ I_i, we have a_j ∈ I'_j for each j. Thus a'_i ∈ F_i(p)(I'_1, ..., I'_{i−1}, I'_{i+1}, ..., I'_n), which implies that a'_i ∈ I''_i. By Lemma 2, a_i ∈ I''_i. •

²In section 4, we show how we use logical variables to replace the conventional variables.
An i-constraint (p, Ī) is stable if applying interval narrowing on Ī results in Ī. An i-network is stable if every i-constraint in the i-network is stable. The reduction of an i-network amounts to transforming it into a stable one.

A naive approach to the reduction of an i-network is to reduce each i-constraint in the i-network in turn until every i-constraint becomes stable. As suggested by Lemma 5, this method is inefficient since much computation is wasted in reducing stable i-constraints. Algorithm 1, which is based on the constraint relaxation algorithm described in [Cleary 1987], is the pseudocode of a more efficient procedure. The algorithm tries to avoid reductions of stable i-constraints and, in this respect, it is similar to AC-3 [Mackworth 1977] and the Waltz algorithm [Waltz 1975]. Without loss of generality, we assume that every i-constraint in the i-network is of the form (p, (V_1, ..., V_n)), where the V_i's are interval-valued variables.
initialize list A to hold all i-constraints in the i-network
initialize P to the empty list
while A is not empty
    remove the first i-constraint, (p, V), from A
    apply interval narrowing on V to obtain V'
    if interval narrowing fails then
        exit with failure
    else if V ≠ V' then
        V ← V'
        foreach i-constraint (q, W) in P
            if V and W share narrowed variable(s) then
                remove (q, W) from P and append it to A
            endif
        endforeach
    endif
    append (p, V) to the end of P
endwhile

Algorithm 1: A Relaxation Algorithm.
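A Python rendering of Algorithm 1 (our sketch; store, narrow and the constraint representation are invented) might look like this:

    def relax(store, constraints, narrow):
        # store       : dict mapping variable names to intervals
        # constraints : list of i-constraints (p, names)
        # narrow(p, values) -> narrowed tuple of intervals, or None on failure
        active, passive = list(constraints), []
        while active:
            p, names = active.pop(0)
            vals = tuple(store[n] for n in names)
            new = narrow(p, vals)
            if new is None:
                return False                          # i-inconsistent
            narrowed = {n for n, old, v in zip(names, vals, new) if v != old}
            for n, v in zip(names, new):
                store[n] = v
            if narrowed:
                # wake up passive constraints that share a narrowed variable
                woken = [c for c in passive if set(c[1]) & narrowed]
                passive = [c for c in passive if not (set(c[1]) & narrowed)]
                active.extend(woken)
            passive.append((p, names))
        return True                                   # the i-network is stable

Run on the quadratic i-network above, this loop alternates between the add and multiply+ constraints until no interval changes.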
Algorithm 1 resembles a classical iterative numerical-approximation technique called "relaxation" [Southwell 1946], which was first adopted in a constraint system in [Sutherland 1963]. Numerical relaxation may
have numerical stability problems; the procedure may
fail to converge or terminate even when the constraints
have a solution. Algorithm 1 does not suffer from this
problem as shown in the following theorem.
Theorem 6: Algorithm 1 terminates. The resulting i-network is either i-inconsistent or stable. •
The validity of theorem 6 is easy to check since the
precision of a floating-point system is finite and thus interval narrowing cannot occur indefinitely due to the use
of outward rounding.
In the following, we show how Algorithm 1 finds the positive root of the equation x² − x − 6 = 0 with the initial guess [1, 100]. Initially, the passive list P is empty and the active list of i-constraints A is [G1, G2], where

G1 = (add, (V1, [1,1], V)) and
G2 = (multiply+, (V, V1, [6,6])).

We remove the first i-constraint G1 from A and reduce it as shown in Figure 2. The updated values of V and V1 are [1, 100] and [0, 99] respectively. Similar narrowing is performed on the multiply+ i-constraint. The process repeats until the precision of the underlying floating-point system is reached and no more narrowing takes place. The history of the values of A, P, V, and V1, with four significant figures, after each narrowing is summarized in Table 1.
Table 1: Traces of A, P, V, and V1.

A          P          V               V1
[G1, G2]   []         [1, 100]        (−∞, +∞)
[G2]       [G1]       [1, 100]        [0, 99]
[G1]       [G2]       [1, 100]        [0.06, 6]
[G2]       [G1]       [1.06, 7]       [0.06, 6]
[G1]       [G2]       [1.06, 7]       (0.8571, 5.661)
[G2]       [G1]       (1.857, 6.661)  (0.8571, 5.661)
[G1]       [G2]       (1.857, 6.661)  (0.9009, 3.231)
[G2]       [G1]       (1.900, 4.231)  (0.9009, 3.231)
[G1]       [G2]       (1.900, 4.231)  (1.418, 3.157)
...
[]         [G1, G2]   (2.999, 3.001)  (1.999, 2.001)
It is well-known that arc-consistency techniques are
"incomplete" [Mackworth 1977]: a network can be stable but neither a solution nor inconsistency is found. In
the finite domain case, enumeration, instantiation, and
backtracking can be used to find a particular solution after the constraint network becomes stable. This method
is infeasible for interval domains, which are infinite sets.
We use domain splitting [Van Hentenryck 1989] in place
of enumeration and instantiation. When an i-network
becomes stable, we split an interval into two partitions,
choose one partition and visit the other upon automatic
backtracking or user control.
Input intervals  (I1, I2, I3)   = ((−∞, +∞), [1,1], [1,100])
F1(add)(I2, I3) = [1,100] ⊖ [1,1]       = [0, 99]
F2(add)(I1, I3) = [1,100] ⊖ (−∞, +∞)    = (−∞, +∞)
F3(add)(I1, I2) = (−∞, +∞) ⊕ [1,1]      = (−∞, +∞)
Output intervals (I1', I2', I3') = ([0, 99], [1,1], [1,100])

Figure 2: Interval narrowing of an add constraint.
4 ICLP(R)
So far, we have explained how a network of constraints in terms of floating-point intervals can be made stable. We have not considered how such networks can be specified. One language is CLP(R). An i-constraint (p, (I_1, ..., I_n)) can be expressed in CLP(R) as

X_1 ∈ I_1, ..., X_n ∈ I_n, p(X_1, ..., X_n),

where X_i ∈ I_i stands for an appropriate set of inequalities. In this representation, we don't need conventional variables. Constraints share intervals when they share logical variables.

An example. The i-constraint (add, ([0,2], [1,3], [4,6])) is represented by

X ≥ 0, X ≤ 2, Y ≥ 1, Y ≤ 3, Z ≥ 4, Z ≤ 6, X + Y = Z.

However, if we submit the above example as a query to CLP(R) Version 2.02, we get the answer constraint

X = −Y + Z, 2 + Y ≥ Z, Y ≥ 1, 3 ≥ Y, Z ≥ 4, 6 ≥ Z, X ≥ 0.

Examining the answer more carefully, we note that

X = −Y + Z ≡ X + Y = Z   and   2 + Y ≥ Z ≡ X ≤ 2.

Thus the answer constraint is the original query disguised in a slightly different form. CLP(R) only checks the solvability of the constraint but does not remove undesirable values from the intervals. A more useful answer constraint is

X ≥ 1, X ≤ 2, Y ≥ 2, Y ≤ 3, Z ≥ 4, Z ≤ 5, X + Y = Z.

When a query to a logic program is answered according to the CLP scheme, a network of constraints is solved at each step. A special case of such a constraint is a variable's membership of a domain. According to the CLP scheme, it is possible at any stage that the domain inclusions of the current set of constraints are inconsistent in Mackworth's sense of the Constraint-Satisfaction Problem. We modify the basic CLP scheme by inserting a constraint simplification step and show that interval narrowing is a constraint simplification operation.

In principle, any of the constraint-satisfaction algorithms by Mackworth can be used. In this paper we are concerned with real-valued variables. As we argued, the only known way of obtaining correct answers involving real numbers on a machine with floating-point numbers is to use interval arithmetic. Accordingly, we explain the theory of ICLP(R), where the sub-network consisting of i-constraints is narrowed according to the method described above.
The modified CLP scheme ICLP(R) is CLP(R) enhanced with interval narrowing and algorithm 1. The operational semantics of ICLP(R) is based on a generalization of Mx-derivations [Jaffar and Lassez 1986]. Let P be a CLP(X) program, where X is a structure with model Mx, and ← Gi be a goal. ← Gi+1 is M'x-derived from ← Gi if

1. ← G' is Mx-derived from ← Gi, and

2. ← Gi+1 = v(← G'), where v is a normal-form function that maps from goal to goal such that P ⊨Mx ∃(G') ⇔ P ⊨Mx ∃(Gi+1).

An M'x-derivation is a, possibly infinite, sequence of goals G = G0, G1, G2, ... such that Gi+1 is M'x-derived from Gi. An M'x-derivation is successful if it is finite and the last goal contains no atoms. The soundness and completeness of M'x-derivations follow directly from the soundness and completeness of Mx-derivations and the definition of the normal-form function. An M'x-derivation is finitely-failed if it is finite, the last goal has one or more atoms, and condition 1 does not hold.

The M'x-derivation step is not new. In fact, it has been implemented in other CLP systems as a constraint simplification step. The Mx-derivation step only checks the solvability of the constraint accumulated so far. Therefore, the answer constraint of a successful Mx-derivation is usually complex and difficult to interpret. A useful system should simplify the constraint to a more "readable" form. For example, CLP(R) simplifies the constraint {X + Y = 4, X - Y = 1} to {X = 2.5, Y = 1.5}. Suppose the goal ← C', A' is Mx-derived from ← C, A. CLP(R) simplifies C' to C'' such that ⊨Mx ∃(C') ⇔ ⊨Mx ∃(C'') and thus

    P ⊨Mx ∃(C', A') ⇔ P ⊨Mx ∃(C'', A');

CLP(R) is based on M'x-derivation.
Theorem 7: If C = {X1 ∈ I1, ..., Xn ∈ In, p(X1, ..., Xn)} and C' = {X1 ∈ I1', ..., Xn ∈ In', p(X1, ..., Xn)}, where Ii' is obtained from Ii by interval narrowing for i = 1, ..., n, then ⊨Mx ∃(C) ⇔ ⊨Mx ∃(C').
Proof: The theorem follows directly from theorem 3. •
Theorem 7 guarantees that interval narrowing transforms a constraint into a stable constraint with the same
solution space. Algorithm 1, which performs narrowing repeatedly on i-constraints in a network, is thus a
normal-form function.
Partitioning of relations can also be expressed compactly in ICLP(R). For example, the multiply relation can be defined by

    multiply(X,Y,Z) :- X >= 0, multiply+(X,Y,Z).
    multiply(X,Y,Z) :- X < 0,  multiply-(X,Y,Z).
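As an illustration, here is a sketch (ours, not the paper's) of how the running example x^2 - x - 6 = 0 could be posed directly over such relational arithmetic predicates; add/3 is assumed to be defined analogously to multiply/3, and in plain CLP(R) the nonlinear goal would simply be delayed rather than narrowed:

    % Sketch: V1 + 1 = V and V * V1 = 6 with the initial guess V in [1,100].
    root(V, V1) :-
        V >= 1, V =< 100,
        add(V1, 1, V),
        multiply(V, V1, 6).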
A meta-interpreter for ICLP(R) written in CLP(R) is described in [Lee and van Emden 1991a]. We have not yet included outward rounding in the current implementation. Table 1 is derived from a trace produced by our prototype, except that the outward rounding has been added manually.
5 Concluding Remarks
We have developed the essential components of a relational interval arithmetic system. Interval narrowing establishes (1) the criterion that an arithmetic relation has to satisfy to be used as an arithmetic constraint in relational interval arithmetic, and (2) the reduction of an arithmetic constraint using the interval functions induced from the constraint. Algorithm 1 then coordinates the applications of narrowing to transform a constraint network into its stable form.
The incorporation of relational interval arithmetic in CLP(R) makes it possible to describe programs, constraints, queries, intervals, answers, and variables in a coherent and semantically precise language: logic. The semantics of ICLP(R) is based on M'x-derivation, which is a logical deduction. Consequently, numerical computation is deduction in ICLP(R), which is a general-purpose programming language allowing compact description and dynamic growth of constraint networks. One advantage of ICLP(R) over CLP(R) is the ability to handle nonlinear constraints, which are delayed in CLP(R). It is important to note that ICLP(R) is not another instance of the CLP scheme. It is a correct implementation of CLP(R).

The ICLP(R) meta-interpreter shows the feasibility of our approach. Future work includes extending CLP(R), at the source level, to ICLP(R) to improve efficiency. We also plan to investigate applications in such areas as finite element analysis, and spatial and temporal reasoning.
Acknowledgements
We thank Mantis Cheng, Bill Havens, Alan Mackworth,
Clifford Walinsky, and the anonymous referees for helpful suggestions. Paul Strooper gave constructive comments to early drafts of this paper. Generous support
was provided by the British Columbia Advanced Systems
Institute, the Canada Natural Science and Engineering
Research Council, the Canadian Institute for Advanced
Research, the Institute of Robotics and Intelligent Systems, and the Laboratory for Automation, Communication and Information Systems Research.
References
[Cleary 1987] J.G. Cleary. Logical arithmetic. Future
Computing Systems, 2(2):125-149, 1987.
[Davis 1987] E. Davis. Constraint propagation with interval labels. Artificial Intelligence, 32:281-331,
1987.
[de Kleer and Brown 1984] J. de Kleer and J.S. Brown.
A qualitative physics based on confluences. Artificial Intelligence, 24:7-83, 1984.
[Dean 1985] T. Dean. Temporal imagery: An approach
to reasoning about time for planning and problem
solving. Technical Report 433, Yale University, New
Haven, CT, USA, 1985.
[Dincbas et al. 1988] M. Dincbas, P. Van Hentenryck,
H. Simonis, A. Aggoun, T. Graf, and F. Berthier.
The constraint logic programming language CHIP.
In Proceedings of the International Conference on
Fifth Generation Computer Systems (FGCS'88),
pages 693-702, Tokyo, Japan, December 1988.
[Hayes 1973] P.J. Hayes. Computation and deduction.
In Proceedings of the Second MFCS Symposium,
pages 105-118. Czechoslovak Academy of Sciences,
1973.
[Jaffar and Lassez 1986] J. Jaffar and J-L. Lassez. Constraint logic programming. Technical Report 72, Department of Computer Science, Monash University,
Clayton, Victoria, Australia, 1986.
[Jaffar and Lassez 1987] J. Jaffar and J-L. Lassez. Constraint logic programming. In Proceedings of the
14th ACM POPL Conference, pages 111-119, Munich, January 1987.
[Jaffar et al. 1990] J. Jaffar, S. Michaylov, P.J. Stuckey, and R.H.C. Yap. The CLP(R) language and
system. Technical Report CMU-CS-90-181, School
of Computer Science, Carnegie Mellon University,
Pittsburgh, PA, USA, 1990.
[Lee and van Emden 1991a] J.H.M. Lee and M.H. van
Emden. Adapting CLP(R) to floating-point arithmetic. Technical Report LP-18 (DCS-183-IR), Department of Computer Science, University of Victoria, Victoria, B.C., Canada, December 1991.
[Lee and van Emden 1991b] J.H.M. Lee and M.H. van
Emden. Numerical computation can be deduction
in CHIP. Technical Report LP-19 (DCS-184-IR),
Department of Computer Science, University of Victoria, Victoria, B.C., Canada, December 1991. (submitted for publication).
[Mackworth 1977] A.K. Mackworth. Consistency in networks of relations. AI Journal, 8(1):99-118, 1977.
[Mackworth et al. 1985] A.K. Mackworth, J.A. Mulder,
and W.S. Havens. Hierarchical arc consistency: Exploiting structured domains in constraint satisfaction problems. Computational Intelligence, 1:118-126, 1985.
[McDermott and Davis 1984] D.V. McDermott and E.
Davis. Planning routes through uncertain territory. Artificial Intelligence, 22:107-156, 1984.
[Moore 1966] R.E. Moore. Interval Analysis. Prentice-Hall, 1966.
[Moore 1988] R.E. Moore, editor.
Reliability in
Computing-The Role of Interval Methods in Scientific Computing. Academic Press, 1988.
[IEEE 1987] Members of the Radix-Independent
Floating-point Arithmetic Working Group. IEEE
standard for radix-independent floating-point arithmetic. Technical Report ANSI/IEEE Std 854-1987,
The Institute of Electrical and Electronics Engineers, New York, USA, 1987.
[Older and Vellino 1990] W. Older and A. Vellino. Extending Prolog with constraint arithmetics on real intervals. In Proceedings of the Canadian Conference on Computer & Electrical Engineering, Ottawa, Canada, 1990.
[Sidebottom and Havens 1992] G. Sidebottom and W.S.
Havens. Hierarchical arc consistency for disjoint real
intervals in constraint logic programming. Computational Intelligence, 8(2), May 1992.
[Southwell 1946] R.V. Southwell. Relaxation Methods in
Theoretical Physics. Oxford University Press, 1946.
[Sutherland 1963] I.E. Sutherland. SKETCHPAD: a
Man-Machine Graphical Communication System.
PhD thesis, MIT Lincoln Labs, Cambridge, MA,
1963.
[Van Hentenryck 1989] P. Van Hentenryck. Constraint
Satisfaction in Logic Programming. The MIT Press,
1989.
[Waltz 1975] D. Waltz. Understanding line drawings of
scenes with shadows. In P. Winston, editor, The
Psychology of Computer Vision. McGraw-Hill, 1975.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Domain Independent Propagation
Thierry Le Provost, Mark Wallace
ECRC, Arabellastr 17
8000 München 81, Germany
{thierry I mark}@ecrc.de
Abstract
Recent years have seen the emergence of two main
approaches to integrating constraints into logic
programming. The CLP Scheme introduces constraints as basic statements over built-in computation domains. On the other hand, systems such as
CHIP have introduced new inference rules, which
enable certain predicates to be used for propagation thereby pruning the search tree. Unfortunately these two complementary approaches were
up to now incompatible, since propagation techniques appeared intimately tied to the notion of
finite domains. This paper introduces a generalisation of propagation that is applicable to any
CLP computation domain, thereby restoring orthogonality and bridging the gap between two important constraint logic programming paradigms.
The practical interest of this new notion of "domain independent" propagation is demonstrated
by applying a prototype system for solving some
hard search problems.
1 Introduction
There are two main approaches for integrating
constraints into logic programming. The first approach, formalised as CLP(X) [Jaffar and Lassez,
1987], is to replace the usual domain of computation with a new domain X. The computation
domain X specifies a universe of values; a set of
predefined functions and relations on this universe;
and a class of basic constraints, which are formulae
built from predefined predicate and function symbols, and logical connectives. The CLP scheme
requires that an effective procedure decide on the
satisfiability of the basic constraints. The facility
to define new predicates as facts or rules, possibly
involving the built-in's, is carried over from logic
programming. The evaluation of queries involving
such user-defined predicates is performed using an
extension of resolution, where syntactic unification
is replaced with deciding the satisfiability of basic
constraints (constraint solving). As with standard
logic programming the default search method for
evaluating program-defined predicates is depthfirst, based on the ordering of program clauses and
goals.
The second main approach to integrating constraints in logic programming uses the standard,
syntactic, domain of computation, except that
the variables may be restricted to explicitly range
over finite subsets of the universe of values (finite
domain variables) [Van Hentenryck and Dincbas,
1986]. In this approach, inaugurated by CHIP
[Dincbas et al., 1988], it is the proof system that
is extended. The new type of controlled inference is termed constraint propagation or consistency techniques [Van Hentenryck, 1989]. These
techniques combine solution-preserving simplification rules and tree search, and were originally introduced for solving constraint satisfaction problems [Montanari, 1974; Mackworth, 1977].
Informally constraint propagation aims at exploiting program-defined predicates as constraints.
It operates by looking ahead at yet unsolved goals
to see what locally consistent valuations there remain for individual problem variables. Such constraint techniques can have a dramatic effect in
cutting down the size of the search space [Dincbas
et al., 1990].
To date the technique of propagation has only
been defined for search involving finite domain
variables. Each such variable can only take a finite
number of values, and looking ahead is a way of
deterministically ruling out certain locally inconsistent values and thus reducing the domains. This
restriction has prevented the application of propagation to new computation domains introduced
by the CLP(X) approach. In addition propagation, as currently defined, cannot reason on com-
pound terms, thereby enforcing an unnatural and
potentially inefficient encoding of structured data
as collections of constants.
This has meant that the two approaches to integrating constraints into logic programming have
had to remain quite separate. Even in the CHIP
system which utilises both types of integration,
propagation is excluded from those parts of the
programs involving new computation domains,
such as Boolean algebra or linear rational arithmetic.
This paper proposes a generalisation of propagation, which enables it to be applied on arbitrary
computation domains. Generalised propagation
can be applied in CLP(X) programs, whatever the domain X. Furthermore its basic concepts, theoretical foundations, and abstract operational semantics can be defined independently of the computation domain. This allows programmers to reason about the efficiency of CLP programs involving propagation in an intuitive and uniform way.
This generality carries over to the implementation,
where algorithms for executing generalised propagation apply across a large range of basic constraint theories. Last but not least, the declarative
semantics of CLP programs is preserved.
The main idea behind generalised propagation
is to use whatever basic constraints are available
in a CLP(X) language to express restrictions on
problem variables. Goals designated as propagation constraints are repeatedly approximated to
the finest basic constraint preserving their solutions. When no further refinement of the current
resolvent's basic constraint is feasible, a resolution
step is performed and propagation starts again.
The practical relevance of generalised propagation has been tested by implementing it in the
computation domain of Prolog. Programs are just
sets of Prolog rules with annotations identifying
the goals to be used for propagation. The language has enabled us to write programs which are
simple, yet efficient, without the need to resort
to constructs without a clear declarative semantics such as demons. The performance results have
been very encouraging.
In the next section we recall the interest of integrating propagation over finite domains into logic
programming. We then present a logical basis for
propagation that will provide the basis for generalisation. The following section introduces generalised propagation, and sketches its theoretical
basis. The fourth section introduces our prototype system on top of Prolog, and discusses some
of the examples that we tackled with it. In conclusion we identify the directions that this work is
now taking.
2 Propagation over Finite Domains

2.1 Propagation in Constraint Satisfaction Problems
The study of constraint satisfaction problems has
a long history, and we mention here just a few important references. The concept of arc consistency
was introduced in [Mackworth, 1977]; its combination with backtrack search was described in [Haralick and Elliot, 1980]; the notion of value propagation is due to [Sussman and Steele, 1980]; the application of constraint methods to real arithmetic
was surveyed in [Davis, 1987]; finally [Van Hentenryck, 1989] extensively motivates and describes in
detail the integration of finite-domain propagation
methods into logic programming.
A constraint satisfaction problem (CSP) can be represented as

• a set of variables, {X1, ..., Xn}, each Xi ranging over a finite domain Di;

• a set of constraints C1, ..., Cm on these variables, where each constraint Ci is an atomic goal pi(Xi1, ..., Xik) defined by a k-ary predicate pi.
A solution to the problem is an assignment of
values from the domains to the variables (a labelling) such that all the constraints are satisfied. We now briefly recall the main approaches
to solving CSP's in a logic programming setting,
using the following toy example. The problem has
four variables X1, X2, X3, X4, each with domain {a, b, c}. There are four constraints, each involving the same binary predicate p:

    p(X3, X1) ∧ p(X2, X3) ∧ p(X2, X4) ∧ p(X3, X4)

The relation denoted by p has three tuples: <a,b>, <a,c>, <b,c>.
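A possible encoding of the relation p and of the common domain {a, b, c} as plain Prolog facts (the predicate names follow the text; the encoding itself is ours):

    % The relation p and the domain of the toy CSP.
    p(a, b).
    p(a, c).
    p(b, c).

    dom(a).
    dom(b).
    dom(c).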
Generate and Test This approach enumerates
labellings in a systematic way until one is found
that satisfies all the constraints. It is hopelessly inefficient for all but the smallest problem instances.
In our example the system will go through all 27
labellings which begin with an a, before discovering that Xl cannot take this value due to the first
constraint p(X3, Xl). In general reordering the
constraint goals may only bring minor improvements. Analysing the cause for the failure of goals
so as to avoid irrelevant backtrack steps (selective
backtracking) makes the runtime structures more
complex and is insufficient for complex problems
(see for instance [Wolfram, 1989]).1
1 Selective, or intelligent, backtracking [Codognet
and Sola, 1990] addresses the symptom of too many
choice points. Propagation addresses the cause, by reducing the number of choice points in advance.
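Assuming the p/2 and dom/1 facts sketched above, generate and test is just the following clause, in which the constraints are checked only after a complete labelling has been produced:

    % Generate and test (sketch): constraints are checked only after
    % X1..X4 have all been given values.
    csp_gt(X1, X2, X3, X4) :-
        dom(X1), dom(X2), dom(X3), dom(X4),
        p(X3, X1), p(X2, X3), p(X2, X4), p(X3, X4).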
Backtrack Search A first improvement on pure
generate-and-test is to check each constraint goal
as soon as all its variables have received values
[Golomb and Baumert, 1965]. Backtrack search
thus performs an implicit enumeration over the
space of possible labellings, discarding partial labellings as soon as they can be proved locally inconsistent with respect to some constraint goal. Backtrack search demonstrates considerable gains over generate-and-test (the inconsistent assignment X1 = a is detected at once). However this procedure still suffers from "maladies" [Mackworth, 1977], the worst being its repeated discovery of local inconsistencies. For instance it is obvious from p(X3, X1) ∧ p(X2, X3) alone that X1 cannot take the value b. Backtrack search will nonetheless consider all 9 combinations of values for X2 and X3 before rescinding X1 = b.
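The backtrack-search behaviour described here corresponds to interleaving the generators with the constraint goals, so that each constraint is checked as soon as its arguments are bound (again a sketch over the facts above; the particular goal order is ours):

    % Backtrack search (sketch): partial labellings are discarded early.
    csp_bt(X1, X2, X3, X4) :-
        dom(X3), dom(X1), p(X3, X1),
        dom(X2), p(X2, X3),
        dom(X4), p(X2, X4), p(X3, X4).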
Local Propagation The idea behind local propagation methods for CSP's is to work on each constraint independently, and deterministically to extract information about locally consistent assignments. This has led to various consistency algorithms for networks of constraints, the most widely applicable of these being arc-consistency [Montanari, 1974]. Consistency can be applied as a
preliminary to the search steps or interleaved with
them [Haralick and Elliot, 1980]. The application
of these techniques in the constraint logic programming language CHIP was accomplished through
two complementary extensions [Van Hentenryck
and Dincbas, 1986; Van Hentenryck, 1989]
• explicit finite domains of values to allow
the expression of range restrictions, together
with the corresponding extension of unification (FD-resolution)
• new lookahead inference rules to reduce finite
domains in a deterministic way
The effect of applying lookahead on a goal is to
reduce the domains associated with the variables
in the goal, so that the resulting domains approximate as closely as possible the set of remaining
goal solutions. The solutions can be determined
by simply calling the goal repeatedly. Application
of the lookahead rule is repeated on all constraint
goals until no more domain reductions are possible, forming a propagation sequence. Constraint
goals that are satisfied by any combination of values in the domains of their arguments can now be
dropped.
Our example problem can be encoded in a
CHIP-like syntax as follows:
csp(X1,X2,X3,X4) :-
    lookahead p(X3,X1),    /* [1] */
    lookahead p(X2,X3),    /* [2] */
    lookahead p(X2,X4),    /* [3] */
    lookahead p(X3,X4),    /* [4] */
    dom(X1), dom(X2), dom(X3), dom(X4).
The lookahead annotations identify goals that
must be treated by the new inference rule. Annotations can be ignored for a declarative reading.
For our example problem, the initial propagation sequence is sufficient to produce the only solution; domain goals merely check each of the variable bindings already produced. A possible computation sequence is as follows (though the ordering is immaterial for the final result):
lookahead on:                 produces:
p(X3,X1)             [1]      X3::{a,b}, X1::{b,c}
p(X2,X3::{a,b})      [2]      X2=a, X3=b
p(a,X4)              [3]      X4::{b,c}
p(b,X4::{b,c})       [4]      X4=c
p(b,X1::{b,c})       [1]      X1=c
p(a,b)               [2]      succeeds
p(a,c)               [3]      succeeds
p(b,c)               [4]      succeeds
Note that the constraint [1] takes part in two propagation steps before it is solved. In general constraints may be involved in any number (> 0) of
propagation steps.
From this brief summary of consistency techniques for CSP's and their integration into logic
programming, it may appear that finite domain
variables form the cornerstone of propagation.
The purpose of this paper is to show that this is
not the case, and that propagation has a very general, natural and useful counterpart in constraint
logic programming languages that do not feature
finite domains.
2.2 A Logical Basis for Propagation
The effect of (finite domain) propagation on a constraint is to reduce the domains associated with
the variables appearing in the constraint. The resulting domains capture as precisely as possible
the meaning of the constraint. The aim of this
section is to say in what sense the meaning of a
constraint is captured by a set of domains, and to
give a formal characterisation of the qualification
"as precisely as possible" .
A constraint C(X1, ..., Xn) is to be understood as a logical formula with free variables X1, ..., Xn. A constraint formula has the syntactic form:

    (X1 = a11 ∧ ... ∧ Xn = a1n) ∨ ... ∨ (X1 = ak1 ∧ ... ∧ Xn = akn).

A domain formula Dom(X) is a disjunction of equalities involving a single variable X:

    X = a1 ∨ X = a2 ∨ ... ∨ X = an.

Generally many variables are involved in a problem, and we therefore introduce a syntactic class of formulae representing the conjunction of their domains. These are the basic formulae. Thus a basic formula D(X1, ..., Xn) has the form:

    Dom1(X1) ∧ ... ∧ Domn(Xn).
The reduced domains, resulting from propagation on a constraint, approximate the constraint
formula as closely as is possible using only a basic formula. Propagation is "precise" if this basic
formula is logically equivalent to the constraint formula. The problem is that basic formulae have a
limited expressive power, and it is not in general
possible to find one logically equivalent to a given
constraint formula.
For example the constraint formula C(X1, X2),

    (X1 = a ∧ X2 = b) ∨ (X1 = a ∧ X2 = c) ∨ (X1 = b ∧ X2 = c),

is best approximated by the basic formula

    (X1 = a ∨ X1 = b) ∧ (X2 = b ∨ X2 = c).

However there is no basic formula logically equivalent to C(X1, X2).

Definition 1 A propagation step takes a constraint formula C and a basic formula D and yields a "least" basic formula D' which satisfies (C ∧ D) → D'. D' is the least such formula in the sense that for any other basic formula D'' satisfying (C ∧ D) → D'' it is also true that D' → D''.

This definition will be illustrated using the constraint C(X1, X2):

    (X1 = a ∧ X2 = b) ∨ (X1 = b ∧ X2 = c) ∨ (X1 = c ∧ X2 = a).

The input basic formula D(X1, X2) is:

    (X1 = a ∨ X1 = b) ∧ (X2 = a ∨ X2 = b ∨ X2 = c).

Propagation on a constraint involves two steps: the simplification of the constraint and the reduction of the domains associated with its variables. The simplification of the constraint C(X1, X2) with respect to the basic constraint D(X1, X2) is just the calculation of a simplified constraint logically equivalent to C(X1, X2) ∧ D(X1, X2). The result of simplifying is

    C'(X1, X2) ≡ (C(X1, X2) ∧ D(X1, X2)) ≡ (X1 = a ∧ X2 = b) ∨ (X1 = b ∧ X2 = c).

The reduction of the domains is the calculation of a new basic formula which approximates as closely as possible the simplified constraint. The result of reducing is

    D'(X1, X2) ≡ (X1 = a ∨ X1 = b) ∧ (X2 = b ∨ X2 = c).

For this example there is no basic constraint logically equivalent to C'(X1, X2). However D'(X1, X2) is the least basic formula implied by C'(X1, X2) since the domain of X1 must include at least a and b, and the domain of X2 must include at least b and c.
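A propagation step on a binary constraint given extensionally can be computed by projection, as in the following plain Prolog sketch (the representation of tuples as X-Y pairs and the predicate name are ours):

    % One finite-domain propagation step: keep the tuples consistent with
    % the current domains, then project them to obtain the reduced domains.
    propagation_step(Tuples, Dom1, Dom2, NewDom1, NewDom2) :-
        findall(X-Y, ( member(X-Y, Tuples),
                       member(X, Dom1),
                       member(Y, Dom2) ),
                Consistent),
        findall(X, member(X-_, Consistent), Xs), sort(Xs, NewDom1),
        findall(Y, member(_-Y, Consistent), Ys), sort(Ys, NewDom2).

For instance, propagation_step([a-b, a-c, b-c], [a,b,c], [a,b,c], D1, D2) yields D1 = [a,b] and D2 = [b,c], which is the first lookahead step of section 2.1.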
Definition 2 Propagation is the result of applying
a propagation sequence, which is the repeated application of propagation steps on every constraint
until no more domain reductions are possible.
This definition does not mention the order in
which propagation steps are done. In fact the result of performing propagation on a set of constraints is independent of the order. We prove this
as follows.
Lemma 1 If basic formulae are ordered by logical entailment, propagation steps are increasing and monotonic on basic formulae.
This is easily deduced from the definition of a
propagation step.
Lemma 2 Each (ordered) propagation sequence
yields a fixpoint.
This follows from the fact that there are only
finitely many basic formulae greater than a given
basic formula under the logical entailment ordering, and propagation steps are increasing.
Theorem 1 The result of a propagation sequence is independent of the order of the steps.

Suppose fix1 and fix2 were distinct fixpoints of a propagation sequence, resulting from an initial basic formula s0. Since propagation is increasing, fix1 ≥ s0. fix2 results from applying a particular ordered sequence of propagations on s0. By monotonicity this same sequence applied to fix1 yields a result fix3 ≥ fix2. However since fix1 is a fixpoint of the propagation sequence, fix3 = fix1. We conclude that fix1 ≥ fix2. Symmetrically we can conclude that fix2 ≥ fix1, and therefore fix1 = fix2.
It is also possible to show that propagation can
be performed in parallel, and still yield the same
fixpoint. These and other results fall out very naturally when lattice theory is used to describe the
constraints. The lattice theoretic formalisation of
generalised propagation is described in another paper [Le Provost and Wallace, 1992].
3 Generalised Propagation
For finite domain propagation, the basic formulae
express domains associated with the problem variables, and the constraint formulae express membership of tuples in relations. Each class of formulae has a certain limited expressive power. However the definition of a propagation step and a
propagation sequence do not depend on the particular syntactic classes chosen for basic formulae or
constraint formulae. In this section we will explore
the consequences of admitting different classes of
formulae. We shall propose a notion of generalised
propagation parameterised on the classes of formulae.
In the CLP(X) approach a class of basic constraints is identified for each domain X. Generalised propagation on a domain X is the result of
admitting the basic constraints on X as basic formulae as described in the last section. The class of
constraint formulae is the class of goals expressible
in CLP(X).
The basic formulae used for finite domain propagation involve only the equality predicate and
no function symbols. For generalised propagation
over a domain X the basic formulae may include
other predicates, such as < and >, and function
sym boIs such as + and "'. However the purpose
and effects of propagation remain the same. To
detect inconsistencies early and to extract as much
information as possible from a set of goals deterministically before making any choices. The information extracted is expressed as a basic formula,
which is added to the current constraint set, either
yielding inconsistency immediately, or else helping
to prune the remaining search.
As a simple example of generalised propagation, consider CLP(Q) with atomic constraints Var >= num and Var =< num, where num is any rational number. Let us define a predicate p on which we shall perform generalised propagation.

    p(X) :- X >= 3.4, X =< 4.6.
    p(X) :- X >= 2.8, X =< 3.9.

Assume the current constraints include X =< 4.0, and p(X) is a goal. The CLP(X) approach requires us to treat user-defined predicates such as p a la Prolog. One clause in the definition of p is selected, and if that yields an inconsistency the other is tried on backtracking.

Generalised propagation on the predicate p, treated this time as a constraint, deterministically derives the tightest basic constraint C(X) satisfying (p(X) ∧ X =< 4.0) → C(X), and adds C(X) to its current set of constraints. In this case C(X) ≡ (X >= 2.8 ∧ X =< 4.0), which can be used to prune the remaining search tree.
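The effect of this propagation step can be mimicked in plain Prolog by taking the hull of the clause intervals after clipping them with the context, as in this sketch (the i/2 interval representation and predicate names are ours, not part of CLP(Q)):

    % Each clause of p/1 contributes an interval; clip with X =< UpperBound
    % and return the hull of the surviving answer intervals.
    p_interval(i(3.4, 4.6)).
    p_interval(i(2.8, 3.9)).

    propagate_p(UpperBound, i(Lo, Hi)) :-
        findall(i(L, U1), ( p_interval(i(L, U0)),
                            U1 is min(U0, UpperBound),
                            L =< U1 ),
                Clipped),
        findall(L, member(i(L, _), Clipped), Ls), min_list(Ls, Lo),
        findall(U, member(i(_, U), Clipped), Us), max_list(Us, Hi).

The query propagate_p(4.0, C) gives C = i(2.8, 4.0), matching the basic constraint X >= 2.8 ∧ X =< 4.0 derived in the text.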
The case of finite domains can be viewed as an
instance, CLP(FD), of the constraint logic programming scheme, where the basic constraints
are the basic formulae as defined in section 2.2.
Propagation on finite domains can now be seen
as an instance of generalised propagation, just as
CLP(FD) is an instance of CLP(X). Notice that
the expressive power of CLP(FD) is weaker than
that of standard logic programming, since it is
impossible using domains to state that two variables are equal, until their domains are reduced to
one value. This is indeed a weakness of propagation over finite domains, and in the next section
we shall present an implementation of generalised
propagation that overcomes it.
Unfortunately it is not the case that generalised
propagation can be automatically derived for any
computation domain X. There is a practical requirement to constructively define a propagation
step. Specifically, for propagating on a goal the
system requires an efficient way to extract a basic
formula which generalises all the answers to the
goal.
More fundamentally a theoretical problem arises
when we move from finite domain constraints to
arbitrary basic constraints. There are only finitely
many finite domain constraints tighter than a
given constraint. This fact ensures that propagation is bound to reach a fixpoint. However for
many sets of basic constraints, such as inequalities over the rationals as exampled above, there is
no similar guarantee of termination. This problem
has been addressed by introducing a notion of approximate generalised propagation in [Le Provost
and Wallace, 1992].
4 Propia: An Implementation of Generalised Propagation

4.1 An Overview of the Implementation
The behaviour of generalised propagation in practice has proved to be more than satisfactory. An implementation of generalised propagation has been completed based on ECRC's Sepia Prolog system. We call it Propia. The underlying domain is the Herbrand domain of standard logic programming. The built-in relation on this domain is '=', and basic constraints are conjunctions of equalities (or equivalently substitutions).
A simple example of generalised propagation
over this domain, is propagation on the predicate
p defined as follows:
p(g(1),a,b).
p(f(a),a,a).
p(g(2),b,a).
p(f(b),b,b).
The result of generalised propagation on the goal
p(A,X,X), is the deterministic addition of a new
equation, A = f(X). Although there are two different possible values for A, they both have the
form f(X), where X is the same variable occurring as the second and third arguments in the goal.
Using finite domains (even if structured terms were
admitted) it would only be possible to infer that
the domain of X was {a, b} and the domain of A was {f(a), f(b)}, but not that A = f(X). This is the weakness of finite domains pointed out on page 5 above.
Implementationally constraint simplification
with respect to this goal amounts to selecting those
clauses in the definition which unify with the goal,
as done by Prolog. The reduction step, given a set
of answers, finds the set of equations which best
approximates them. The best approximation is, in
fact, their most specific generalisation.
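The most specific generalisation can itself be computed by a small anti-unification procedure. The following plain Prolog sketch is ours, not Propia's implementation; it handles ground terms and maps each mismatching pair of subterms to a variable, reusing the same variable for the same pair, which is exactly what exposes shared structure such as A = f(X):

    % msg(+T1, +T2, -G): G is the most specific generalisation of T1 and T2.
    msg(T1, T2, G) :- msg_(T1, T2, G, [], _).

    % The accumulator holds p(S1, S2, Var) entries for already-seen pairs.
    msg_(T1, T2, G, Pairs, Pairs) :-
        seen(T1, T2, Pairs, G), !.
    msg_(T1, T2, G, Pairs0, Pairs) :-
        nonvar(T1), nonvar(T2),
        T1 =.. [F|Args1], T2 =.. [F|Args2],
        length(Args1, N), length(Args2, N), !,
        msg_args(Args1, Args2, GArgs, Pairs0, Pairs),
        G =.. [F|GArgs].
    msg_(T1, T2, G, Pairs, [p(T1, T2, G)|Pairs]).   % mismatch: new variable

    msg_args([], [], [], Pairs, Pairs).
    msg_args([A|As], [B|Bs], [G|Gs], Pairs0, Pairs) :-
        msg_(A, B, G, Pairs0, Pairs1),
        msg_args(As, Bs, Gs, Pairs1, Pairs).

    seen(T1, T2, [p(S1, S2, G)|_], G) :- T1 == S1, T2 == S2, !.
    seen(T1, T2, [_|Rest], G) :- seen(T1, T2, Rest, G).

For the answers of the example, msg(p(f(a),a,a), p(f(b),b,b), G) yields G = p(f(X),X,X), i.e. the equation A = f(X) derived by Propia.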
Computations interleave the making of choices and propagation. When a propagation sequence terminates, goals are called a la Prolog until a new binding, or set of bindings, occurs, thereby conjoining new equations X = T to the current basic constraint. At this point propagation restarts. When a fixpoint is reached, the propagation sequence is complete and further goals are called a la Prolog.
It would be prohibitively expensive to attempt
propagation on all the constraints at each choice.
In practice the system determines on which variables new equalities have been added and only
propagates on constraints involving those variables. When further equalities are added during
a propagation sequence, then propagation is also
attempted on constraints involving these variables.
The purpose of propagation is to extract as
much information as possible deterministically before making any choices. The Andorra principle [Warren, 1988] has a similar intent: it states that deterministic goals should be executed before other goals. The goal p(A, X, X) in the previous example is clearly not deterministic, yet deterministic information can be extracted from it. Lee Naish coined the term data determinacy for the determinism detected and used by generalised propagation, as opposed to Andorra's weaker control determinacy.
4.2 An Example of Propagation
The behaviour of generalised propagation in the
syntactic equality theory can be illustrated using
a simple example. We shall investigate what propagation is possible for various calls on the 'and' predicate defined as follows:
and(true,true,true).
and(true,false,false).
and(false,true,false).
and(false,false,false).

We treat the goal as a propagation constraint by making the call ?- propagate and(_,_,_). Note that finite domain variables are not part of our chosen propagation language.
For "most specific generalisation" we shall use
the abbreviation msg. First if the call is fully uninstant.iated ?- propagate and(X, Y, Z) the system
finds the first two answers and forms the msg
and(true, Y, Z). After the third answer the msg
becomes and(X, Y, Z), which is as little instantiated as the query, and propagation stops.
Second, if the call has its first argument instantiated to false, ?- propagate and(false, Y, Z), there are two answers whose msg is and(false, _, false). Thus the equality Z = false is returned.

Third, if the call has its first argument instantiated to true, ?- propagate and(true, Y, Z), there are again two answers, and(true, false, false) and and(true, true, true). Our generalisation procedure is able to derive the equality of the last two arguments and the final msg is and(true, Y, Y). Thus the equality Y = Z is returned.

We note that the behaviour is very similar to that obtained by encoding and using "cut guards" in Andorra, GHC rules, or "demons" in CHIP. For example in CHIP we would write:
?- demon and/3.
and(false,Y,Z) :- Z=false.
and(true,Y,Z)  :- Z=Y.
and(X,false,Z) :- Z=false.
and(X,true,Z)  :- Z=X.
and(X,Y,true)  :- X=true, Y=true.
and(X,X,Z)     :- Z=X.
The difference is that the use of propagate enables us to separate the specification of the predicate from its control. When using guards or demons we are forced to mix them together. Indeed generalised propagation allows declarative specifications to be directly used as constraints!

We used Propia for a benchmark set of propositional satisfiability problems distributed by the FAW research institute [Mitterreiter and Radermacher, 1991]. Its behaviour was in general quite comparable to that of CHIP's demons or built-in constraints.
Another application we examined was that of
crossword puzzle compilation. The problem is to
fill up an empty crossword grid using words from
a given (possibly large) lexicon. The propagation
constraints enforce membership of words in the
given lexicon. Intersections are expressed through
shared variables.
The statement of the problem is as follows:
/* some lexicon of available words */
word(a).
word(a,b,a,c,k).

prog :-
    propagate word(A,B,C,D),
    /* Note the shared letter B */
    propagate word(E,F,B,H),
    ... .
The program just comprises a set of propagation constraints. (There is no need for a labelling since Propia itself selects a propagation constraint for resolution when the propagation sequence terminates.) Immediately certain letters are instantiated by the original propagation. Subsequently, each time some letters are instantiated after selecting a word goal for resolution, the affected propagation constraints are re-executed in the hope of instantiating further letters.
The crossword compilation problem has also
been addressed using CLP by Van Hentenryck
[Van Hentenryck, 1989]. Generalised propagation yields a performance improvement of about 15 times on Van Hentenryck's example. However much more significant is the power of generalised propagation for solving large problems. Van Hentenryck's example uses a lexicon which contained precisely the 150 words needed to compile the crossword. With generalised propagation it is possible to compile crosswords from a 25000 word lexicon. It is interesting to note that generalised propagation automatically yields a similar algorithm for generating crosswords as that developed for specialised crossword puzzle generating programs [Berghel, 1987].
A further way to control the evaluation of the
crossword puzzle example is to divide the word
goals into clusters, reflecting connected subareas
of the crossword grid. A predicate cluster can be
defined which combines all the words in a cluster:
cluster(A,B,C,D,E,F) :-
    word(A,B,C), word(A,D,F), word(C,E,F).

Generalised propagation can then be applied to the whole cluster:

    propagate cluster(A,B,C,D,E,F)
In general propagation on cluster yields strictly
more information than propagation on each of the
word goals individually. However the amount of computing required to perform the propagation on cluster is also likely to be greater than propagating on the word goals individually.
If propagation is applied to larger subproblems,
then we term it more "global". Global propagation
is more expensive than local propagation but the
amount of pruning of the search tree that results
can be very significant.
4.3 Topological Branch and Bound
Generalised propagation is based on the idea of
finding all answers to a query and eliciting the
most specific generalisation. However it much
more efficient to alternate the finding of answers
and calculating the most specific generalisation.
We call this "topological branch and bound" .
For example after finding two words which satisfy a word goal in the crossword example, the system immediately attempts to "generalise" by finding common letters within and between words. If
there are no common letters, the propagation process ceases immediately. Only if there are common letters does the system now search for a third
word. As a result, the system very rarely needs
to find more than a few answers to any word goal
during propagation. This is the reason that the
program has such an excellent runtime, even with
a dictionary of 25000 words compiling real crosswords in a minute. It also accounts for Propia's good performance on the propositional satisfiability benchmarks despite its recalculating at runtime propagation information which in the CHIP program was hard coded by the programmer using demons.
Further optimisations can be applied if the predicate being used for propagation is defined by rules
instead of facts. The exploration of a new branch
in the search tree incrementally builds a new set
of equalities. If, when exploring a branch, the partial set of equalities becomes larger than the current most specific generalisation, then the search
on this branch can be stopped. This means that
propagation can terminate even when the actual
search tree is infinite. For example given the definition
p(s(0)).
p(s(X)) :- p(X).

propagation on p(X) terminates after finding two solutions yielding the constraint X = s(_).
5 Conclusion
Constraint logic programming systems offer a
range of tools for writing simple and efficient programs over various computation domains. Unfortunately it is not always possible to use different tools together. For example classical propagation cannot be used in programs working on domains such as Prolog III's trees.
A second drawback is that the logic of the program, when efficiency considerations are taken into
account, has to be transformed extensively, or
parts of it replaced altogether with rules expressed
in some reactive language such as demons. The
result for non-toy programs is a loss of clarity
and, possibly, efficiency. If the programmer is not
extremely competent these problems compound
themselves, too often yielding a result which is not
only inefficient but incorrect.
Generalised propagation makes a contribution
to both problems. Firstly propagation can be used
for arbitrary domains of computation, thereby improving orthogonality. Secondly the propagation
annotations keep the control very simple and quite
separate from the program logic, thereby preserving clarity and correctness.
Current experiments show generalised propagation to be a powerful and flexible tool for expressing control. More global propagation is more
costly but it can bring a drastic reduction of the
search tree. Local propagation is a cheap solution
which is much easier to program and debug than
guarded clauses or demons.
We are continuing to investigate the effectiveness of generalised propagation on a range of applications, studying its practical applicability to other computation domains, and following up the study of its lattice theoretic basis.
6 Acknowledgements

This work was funded by the Esprit 2 project, no. 5291 CHIC. Thanks also to Bull, ICL and Siemens for providing a wonderful working environment at ECRC.

References

[Berghel, 1987] H. Berghel. Crossword compilation with Horn clauses. The Computer Journal, 30(2):183-188, 1987.

[Codognet and Sola, 1990] P. Codognet and T. Sola. Extending the WAM for intelligent backtracking. In Proc. 8th International Conference on Logic Programming. MIT Press, 1990.

[Davis, 1987] E. Davis. Constraint propagation with interval labels. Artificial Intelligence, 32:281-331, 1987.

[Dincbas et al., 1988] M. Dincbas, P. Van Hentenryck, H. Simonis, A. Aggoun, T. Graf, and F. Berthier. The constraint logic programming language CHIP. In Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS'88), pages 693-702, Tokyo, Japan, November 1988.

[Dincbas et al., 1990] M. Dincbas, H. Simonis, and P. Van Hentenryck. Solving large combinatorial problems in logic programming. Journal of Logic Programming, 8:74-94, 1990.

[Golomb and Baumert, 1965] S.W. Golomb and L.D. Baumert. Backtrack programming. Journal of the ACM, 12:516-524, 1965.

[Haralick and Elliot, 1980] R.M. Haralick and G.L. Elliot. Increasing tree search efficiency for constraint satisfaction problems. Artificial Intelligence, 14:263-314, October 1980.

[Jaffar and Lassez, 1987] J. Jaffar and J.-L. Lassez. Constraint logic programming. In Proceedings of the Fourteenth ACM Symposium on Principles of Programming Languages (POPL'87), Munich, FRG, January 1987.

[Le Provost and Wallace, 1992] T. Le Provost and M. Wallace. Generalised propagation. Technical Report ECRC-92-1, ECRC, Munich, 1992.

[Mackworth, 1977] A.K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99-118, 1977.

[Mitterreiter and Radermacher, 1991] I. Mitterreiter and F.J. Radermacher. Experiments on the running time behaviour of some algorithms solving propositional calculus problems. Technical Report Draft, FAW, Ulm, 1991.

[Montanari, 1974] U. Montanari. Networks of constraints: Fundamental properties and applications to picture processing. Information Science, 7(2):95-132, 1974.

[Sussman and Steele, 1980] G.J. Sussman and G.L. Steele. CONSTRAINTS: A language for expressing almost-hierarchical descriptions. Artificial Intelligence, 14(1):1-39, January 1980.

[Van Hentenryck and Dincbas, 1986] P. Van Hentenryck and M. Dincbas. Domains in logic programming. In Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI'86), Philadelphia, PA, August 1986.

[Van Hentenryck, 1989] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. Logic Programming Series. The MIT Press, 1989.

[Warren, 1988] D.H.D. Warren. The Andorra model. Presented at the Gigalips Workshop, Univ. of Manchester, 1988.

[Wolfram, 1989] D.A. Wolfram. Forward checking and intelligent backtracking. Information Processing Letters, 32(2):85-87, July 1989.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
A Feature-based Constraint System
for Logic Programming with Entailment
Hassan Ait-Kaci*
Andreas Podelski*
Abstract
This paper presents the constraint system FT, which we feel
is an intriguing alternative to Herbrand both theoretically
and practically. As does Herbrand, FT provides a universal data structure based on trees. However, the trees of FT
( called feature trees) are more general than the trees of Herbrand (called constructor trees), and the constraints of FT
are finer grained and of different expressivity. The basic notion of FT are functional attributes called features, which
provide for record-like descriptions of data avoiding the overspecification intrinsic in Herbrand's constructor-based descriptions. The feature tree structure fixes an algebraic semantics for FT. We will also establish a logical semantics,
which is given by three axiom schemes fixing the first-order
theory FT.
FT is a constraint system for logic programming, providing a test for unsatisfiability, and a test for entailment
between constraints, which is needed for advanced control
mechanisms.
The two major technical contributions of this paper are
(1) an incremental entailment simplification system that is
proved to be sound and complete, and (2) a proof showing that FT satisfies the so-called "independence of negative
constraints" .
1 Introduction
An important structural property of many logic programming systems is the fact that they factorize into
a constraint system and an extension facility. Colmerauer's Prolog II [8] is an early language design making
explicit use of this property. CLP (Constraint Logic
Programming [10]), ALPS [16], CCP (Concurrent Constraint Programming [21]), and KAP (Kernel Andorra
Prolog [9]) are recent logic programming frameworks
that exploit this property to its full extent by being
parameterized with respect to an abstract class of constraint systems. The basic operation these frameworks
* Digital Equipment Corporation, Paris Research Laboratory
(PRL), 85 avenue Victor Hugo, 92500 Rueil-Malmaison, France
(email: {hak.podelski}...)

φ entails ψ in A [in the theory T] if ∀(φ → ψ) is valid in A [is a consequence of T]. A theory T is complete if for every closed formula φ either φ or ¬φ is a consequence of T.
The feature tree structure T is the S ⊎ F-structure defined as follows:

• the domain of T is the set of all rational feature trees;

• t ∈ A^T iff t(ε) = A (t's root is labeled with A);

• (s, t) ∈ f^T iff f ∈ Ds and t = fs (t is the subtree of s at f).
Next we discuss the expressivity of our constraints
with respect to feature trees (that is, with respect to
the feature tree structure T) by means of examples. The
constraint
,::Jy(xfy)
says that x has no subtree at f, that is, that there is no
edge departing from x's root that is labeled with f. To
say that x has subtree y at path 11'" fn, we can use the
constraint
Now let's look at statements we cannot express (more
precisely, statements of whom the authors believe they
cannot be expressed). One simple unexpressible statement is "y is a subtree of x" (that is, "::Jw: y = wx").
Moreover, we cannot express that x is smaller than y.
Finally, if we assume that the alphabet F of features is
infinite, we cannot say that x has subtrees at features
iI, ... ,fn but no subtree at any other feature. In particular, we then cannot say that x is a primitive feature
tree, that is, has no proper subtree.
The theory FT0 is given by the following two axiom schemes:

(Ax1)  ∀x ∀y ∀z (xfy ∧ xfz → y ≐ z)   (for every feature f)

(Ax2)  ∀x (Ax ∧ Bx → ⊥)   (for every two distinct sorts A and B).
The first axiom scheme says that features are functional and the second scheme says that sorts are mutually disjoint. Clearly, T is a model of FT0. Moreover, FT0 is
incomplete (for instance, ∃x(Ax) is valid in T but invalid in other models of FT0). We will see in the next section
that FT0 plays an important role with respect to basic
constraint simplification.
Next we introduce some additional notation needed in
the rest of the paper. This notation will also allow us to
state a third axiom scheme that, as shown in [6], extends
FTo to a complete axiomatization of T.
Throughout the paper we assume that the conjunction of formulae is an associative and commutative operator that has ⊤ as neutral element. This means that we identify φ ∧ (ψ ∧ θ) with θ ∧ (ψ ∧ φ), and φ ∧ ⊤ with φ (but not, for example, xfy ∧ xfy with xfy). A conjunction of atomic formulae can thus be seen as the finite multiset of these formulae, where conjunction is multiset union, and ⊤ (the "empty conjunction") is the empty multiset. We will write ψ ⊆ φ (or ψ ∈ φ, if ψ is an atomic formula) if there exists a formula ψ' such that ψ ∧ ψ' = φ.

We will use an additional atomic formula xf↑ ("f undefined on x") that is taken to be equivalent to ¬∃y (xfy), for some variable y (other than x).
Only for the formulation of the third axiom we introduce the notion of a solved-clause, which is either ⊤ or a conjunction φ of atomic formulae of the form xfy, Ax or xf↑ such that the following conditions are satisfied:

1. if Ax ∈ φ and Bx ∈ φ, then A = B;

2. if xfy ∈ φ and xfz ∈ φ, then y = z;

3. if xfy ∈ φ, then xf↑ ∉ φ.

Given a solved-clause φ, we say that a variable x is dependent in φ if φ contains a constraint of the form Ax, xfy or xf↑, and use DV(φ) to denote the set of all variables that are dependent in φ.

The theory FT is obtained from FT0 by adding the axiom scheme:

(Ax3)  ∀ ∃X φ   (for every solved-clause φ and X = DV(φ)).

Theorem 2.1 The feature tree structure T is a model of the theory FT.

Proof. We will only show that T is a model of the third axiom. Let X be the set of dependent variables of the solved-clause φ, X = DV(φ). Let α be any T-valuation defined on V(φ) - X; we write the tree α(y) as ty. We will extend α on X such that T, α ⊨ φ.

Given x ∈ X, we define the "punctual" tree tx = {(ε, A)}, where A ∈ S is the sort such that Ax ∈ φ, if it exists, and arbitrary otherwise. Now we are going to use the notion of tree sum of Nivat [19], where w⁻¹t = {(wv, A) | (v, A) ∈ t} ("the tree t translated by w"), and we define:

    α(x) = Σ { w⁻¹ ty | x ⇝ y for some y ∈ V(φ), w ∈ F* }.

Here the "leads-to" relation ⇝ is given by: x ⇝ x, and x ⇝ y if x ⇝ y' and y'fy ∈ φ, for some y' ∈ V(φ) and some f ∈ F. Since, for a node w of α(x), wα(x) = α(y), it follows that α(x) is a rational tree and that T, α ⊨ φ. □

3 Basic Simplification

A basic constraint is either ⊥ or a possibly empty conjunction of atomic formulae of the form Ax, xfy, and x ≐ y. The following five basic simplification rules constitute a simplification system for basic constraints, which, as we will see, decides whether a basic constraint is satisfiable in T.

1.  xfy ∧ xfz ∧ φ  →  xfz ∧ y ≐ z ∧ φ

2.  Ax ∧ Bx ∧ φ  →  ⊥   (A ≠ B)

3.  Ax ∧ Ax ∧ φ  →  Ax ∧ φ

4.  x ≐ y ∧ φ  →  x ≐ y ∧ φ[x ← y]   (x ∈ V(φ) and x ≠ y)

The notation φ[x ← y] is used to denote the formula that is obtained from φ by replacing every occurrence of x with y. We say that a constraint φ simplifies to a constraint ψ by a simplification rule ρ if φ → ψ is an instance of ρ. We say that a constraint φ simplifies to a constraint ψ if either φ = ψ or φ simplifies to ψ in finitely many steps each licensed by one of the five simplification rules given above.
Example 3.1 We have the following basic simplification chain, leading to a solved constraint:

    xfu ∧ yfv ∧ Au ∧ Av ∧ z ≐ x ∧ y ≐ z
⇒  xfu ∧ yfv ∧ Au ∧ Av ∧ z ≐ x ∧ y ≐ x
⇒  xfu ∧ xfv ∧ Au ∧ Av ∧ z ≐ x ∧ y ≐ x
⇒  xfv ∧ Au ∧ Av ∧ u ≐ v ∧ z ≐ x ∧ y ≐ x
⇒  xfv ∧ Av ∧ Av ∧ u ≐ v ∧ z ≐ x ∧ y ≐ x
⇒  xfv ∧ Av ∧ u ≐ v ∧ z ≐ x ∧ y ≐ x

Using the same steps up to the last one, the constraint xfu ∧ yfv ∧ Au ∧ Bv ∧ z ≐ x ∧ y ≐ z simplifies to ⊥ (in the last step, Rule 2 instead of Rule 3 is applied).
Proposition 3.2 If the basic constraint φ simplifies to ψ, then FT0 ⊨ φ ↔ ψ.

Proof. The rules 3, 4 and 5 perform equivalence transformations with respect to every structure. The rules 1 and 2 correspond exactly to the two axiom schemes of FT0 and perform equivalence transformations with respect to every model of FT0. □
We say that a basic constraint φ binds a variable x to y if x ≐ y ∈ φ and x occurs only once in φ. At this point it is important to note that we consider equations as ordered, that is, assume that x ≐ y is different from y ≐ x if x ≠ y. We say that a variable x is eliminated, or bound by φ, if φ binds x to some variable y.
Proposition 3.3 The basic simplification rules are terminating.

Proof. First observe that the simplification rules don't add new variables and preserve eliminated variables. Furthermore, rule 4 increases the number of eliminated variables by one. Hence we know that if an infinite simplification chain exists, we can assume without loss of generality that it only employs the rules 1, 3 and 5. Since rule 1 decreases the number of feature constraints "xfy", which is not increased by rules 3 and 5, we know that if an infinite simplification chain exists, we can assume without loss of generality that it only employs the rules 3 and 5. Since this is clearly impossible, an infinite simplification chain cannot exist. □
A basic constraint is called normal if none of the five simplification rules applies to it. A constraint ψ is called a normal form of a basic constraint φ if φ can be simplified to ψ and ψ is normal. A solved constraint is a normal constraint that is different from ⊥.

So far we know that we can compute for any basic constraint φ a normal form ψ by applying the simplification rules as long as they are applicable. Although the normal form ψ may not be unique for φ, we know that φ and ψ are equivalent in every model of FT0. It remains to show that every solved constraint is satisfiable in T.

Every basic constraint φ has a unique decomposition φ = φN ∧ φG such that φN is a possibly empty conjunction of equations "x ≐ y" and φG is a possibly empty conjunction of feature constraints "xfy" and sort constraints "Ax". We call φN the normalizer and φG the graph of φ.
Proposition 3.4 A basic constraint φ ≠ ⊥ is solved iff the following conditions hold:

1. an equation x ≐ y appears in φ only if x is eliminated in φ;

2. the graph of φ is a solved clause;

3. no primitive constraint appears more than once in φ.
Proposition 3.5 Every solved constraint is satisfiable in every model of FT.

Proof. Let φ be a solved constraint and A be a model of FT. Then we know by axiom scheme Ax3 that the graph φG of a solved constraint φ is satisfiable in an FT-model A. A variable valuation α into A such that A, α ⊨ φG can be extended on all eliminated variables simply by α(x) = α(y) if x ≐ y ∈ φ, such that A, α ⊨ φ. □
Theorem 3.6 Let ψ be a normal form of a basic constraint φ. Then φ is satisfiable in T if and only if ψ ≠ ⊥.

Proof. Since φ and ψ are equivalent in every model of FT₀ and T is a model of FT₀, it suffices to show that ψ is satisfiable in T if and only if ψ ≠ ⊥. To show the nontrivial direction, suppose ψ ≠ ⊥. Then ψ is solved and we know by the preceding proposition that ψ is satisfiable in every model of FT. Since T is a model of FT, we know that ψ is satisfiable in T. □
Theorem 3.7 For every basic constraint φ the following statements are equivalent (∃̃φ denotes the existential closure of φ):

1. T ⊨ ∃̃φ;
2. A ⊨ ∃̃φ for some model A of FT₀;
3. FT ⊨ ∃̃φ.
Proof. The implication 1 ⇒ 2 holds since T is a model of FT₀. The implication 3 ⇒ 1 follows from the fact that T is a model of FT. It remains to show that 2 ⇒ 3. Let φ be satisfiable in some model of FT₀. Then we can apply the simplification rules to φ and compute a normal form ψ such that φ and ψ are equivalent in every model of FT₀. Hence ψ is satisfiable in some model of FT₀. Thus ψ ≠ ⊥, which means that ψ is solved. Hence we know by the preceding proposition that ψ is satisfiable in every model of FT. Since φ and ψ are equivalent in every model of FT₀ ⊆ FT, we have that φ is satisfiable in every model of FT. □

4 Entailment, Independence and Negation
In this section we discuss some general properties of constraint entailment. This prepares the ground for the next section, which is concerned with entailment simplification in the feature tree constraint system.

Throughout this section we assume that A is a structure, γ and φ are formulae that can be interpreted in A, and that X is a finite set of variables.

We say that γ disentails φ in A if γ entails ¬φ in A. If γ is satisfiable in A, then γ cannot both entail and disentail ∃Xφ in A. We say that γ determines φ in A if γ either entails or disentails φ in A.
Given an entailment simplification chain φ, φ₁, …, φₙ with respect to γ and X, there exists a corresponding "basic plus Rule E6" simplification chain γ ∧ φ, γ₁ ∧ φ′₁, …, γ_{n+k} ∧ φ′_{n+k}, where k ≥ 0 is the number of "extra" variable elimination steps. Since, according to Proposition 3.3, basic simplification chains are finite, so are entailment simplification chains. □
So far we know that we can compute for any basic constraint φ a normal form ψ with respect to γ and X by applying the simplification rules as long as they are applicable. Although the normal form ψ may not be unique, we know that γ ∧ φ and γ ∧ ψ are equivalent in every model of FT₀.

Proposition 5.6 For every basic constraint φ one can compute a normal form ψ with respect to γ and X. Every such normal form ψ satisfies: γ ⊨_T ∃Xφ iff γ ⊨_T ∃Xψ, and γ ⊨_FT ∃Xφ iff γ ⊨_FT ∃Xψ.

Proof. Follows from Propositions 5.4, 5.5, 4.2 and 4.1. □
In the following we will show that from the entailment normal form ψ of φ with respect to γ it is easy to tell whether we have entailment, disentailment or neither. Moreover, the basic normal form of γ ∧ φ is exactly γ ∧ ψ in the first case (and in the second, where γ ∧ ⊥ = ⊥), and "almost" in the third case (cf. Lemma 5.3).
Proposition 5.7 A basic constraint φ ≠ ⊥ is normal with respect to γ and X if and only if the following conditions are satisfied:

1. φ is solved, X-oriented, and contains no variable that is bound by γ;
2. if φx = y and xfu ∈ γ, then yfv ∉ φ for every v;
3. if φx = φy and xfu ∈ γ and yfv ∈ γ, then φu = φv;
4. if φx = y and Ax ∈ γ, then By ∉ φ for every B;
5. if φx = φy and Ax ∈ γ and By ∈ γ, then A = B.

Lemma 5.8 If φ ≠ ⊥ is normal with respect to γ and X, then γ ∧ φ is satisfiable in every model of FT.

Proof. Let φ ≠ ⊥ be normal with respect to γ and X. Furthermore, let γ = γN ∧ γG and φ = φN ∧ φG be the unique decompositions into normalizer and graph. Since the variables bound by γN occur neither in γG nor in φ, it suffices to show that γG ∧ φN ∧ φG is satisfiable in every model of FT.

Let φN(γG) be the basic constraint that is obtained from γG by applying all bindings of φN. Then γG ∧ φN ∧ φG is equivalent to φN ∧ φN(γG) ∧ φG, and no variable bound by φN occurs in φN(γG) ∧ φG. Hence it suffices to show that φN(γG) ∧ φG is satisfiable in every model of FT. With the conditions 2-5 of the preceding proposition it is easy to see that φN(γG) ∧ φG is a solved clause. Hence we know by axiom scheme Ax3 that φN(γG) ∧ φG is satisfiable in every model of FT. □

Theorem 5.9 (Disentailment) Let ψ be a normal form of φ with respect to γ and X. Then γ ⊨_T ¬∃Xφ iff ψ = ⊥.

Proof. Suppose ψ = ⊥. Then γ ⊨_T ¬∃Xψ and hence γ ⊨_T ¬∃Xφ by Proposition 5.6.

To show the other direction, suppose γ ⊨_T ¬∃Xφ. Then γ ⊨_T ¬∃Xψ by Proposition 5.6 and hence γ ∧ ψ is unsatisfiable in T by Proposition 4.2. Since T is a model of FT (Theorem 2.1), we know by the preceding lemma that ψ = ⊥ (since ψ is assumed to be normal). □

We say that a variable x is dependent in a solved constraint φ if φ contains a constraint of the form Ax, xfy or x == y. (Recall that equations are ordered; thus y is not dependent in the constraint x == y.) We use DV(φ) to denote the set of all variables that are dependent in a solved constraint φ.

In the following we will assume that the underlying signature S ⊎ F has at least one sort and at least one feature that does not occur in the constraints under consideration. This assumption is certainly satisfied if the signature has infinitely many sorts and infinitely many features.

Lemma 5.10 (Splitting) Let φ₁, …, φₙ be basic constraints different from ⊥, and X₁, …, Xₙ be finite sets of variables disjoint from V(γ). Moreover, for every i = 1, …, n, let φᵢ be normal with respect to γ and Xᵢ, and let φᵢ have a dependent variable that is not in Xᵢ. Then γ ∧ ¬∃X₁φ₁ ∧ … ∧ ¬∃Xₙφₙ is satisfiable in every model of FT.
Proof. Let γ = γN ∧ γG be the unique decomposition of γ into normalizer and graph. Since the variables bound by γN occur neither in γG nor in any φᵢ, it suffices to show that γG ∧ ¬∃X₁φ₁ ∧ … ∧ ¬∃Xₙφₙ is satisfiable in every model of FT. Thus it suffices to exhibit a solved clause δ such that γG ⊆ δ and, for every i = 1, …, n, V(δ) is disjoint with Xᵢ and δ ∧ φᵢ is unsatisfiable in every model of FT.

Without loss of generality we can assume that every Xᵢ is disjoint with V(γ) and V(φⱼ) − Xⱼ for all j. Hence we can pick in every φᵢ a dependent variable xᵢ such that xᵢ ∉ Xⱼ for any j.

Let z₁, …, z_k be all variables that occur on either side of an equation xᵢ == y ∈ φᵢ, i = 1, …, n (recall that xᵢ is fixed for i). None of these variables occurs in any Xⱼ since every φᵢ is Xᵢ-oriented. Next we fix a feature g and a sort B such that neither occurs in γ or any φᵢ.

Now δ is obtained from γ by adding constraints as follows: if Axᵢ ∈ φᵢ, then add Bxᵢ; if xᵢfy ∈ φᵢ, then add a corresponding feature constraint on xᵢ built with the fresh feature g; and, to enforce that the variables z₁, …, z_k are pairwise distinct, add further constraints built from g and B. It is straightforward to verify that these additions to γ yield a solved clause δ as required. □
Proposition 5.11 If φ is solved and DV(φ) ⊆ X, then FT ⊨ ∀̃∃Xφ (where ∀̃ denotes the universal closure).

Proof. Let φ = φN ∧ φG be the decomposition of φ into normalizer and graph. Since every variable bound by φ is in X, it suffices to show that ∀̃∃XφG is a consequence of FT. This follows immediately from axiom scheme Ax3 since φG is a solved clause. □
Theorem 5.12 (Entailment) Let ψ be a normal form of φ with respect to γ and X. Then γ ⊨_T ∃Xφ iff ψ ≠ ⊥ and DV(ψ) ⊆ X.

Proof. Suppose γ ⊨_T ∃Xφ. Then we know γ ⊨_T ∃Xψ by Proposition 5.6, and thus γ ∧ ¬∃Xψ is unsatisfiable in T. Since γ is solved, we know that γ is satisfiable in T and hence that γ ∧ ∃Xψ is satisfiable in T. Thus ψ ≠ ⊥. Since γ ∧ ¬∃Xψ is unsatisfiable in T and T is a model of FT, we know by Lemma 5.10 that DV(ψ) ⊆ X.

To show the other direction, suppose ψ ≠ ⊥ and DV(ψ) ⊆ X. Then FT ⊨ ∀̃∃Xψ by Proposition 5.11, and hence T ⊨ ∀̃∃Xψ. Thus γ ⊨_T ∃Xψ, and hence γ ⊨_T ∃Xφ by Proposition 5.6. □
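Read operationally, Theorems 5.9 and 5.12 yield the three-way test a guard in a concurrent constraint language would use. A minimal sketch (Python), assuming two hypothetical helpers: normalize_wrt, which computes an entailment normal form of φ with respect to γ and X (returning None for ⊥), and dependent_vars, which computes DV:

    def classify(gamma, phi, X, normalize_wrt, dependent_vars):
        """Decide whether gamma entails, disentails, or suspends on ∃X.phi (a sketch).

        normalize_wrt and dependent_vars are hypothetical helpers standing in
        for the entailment simplification of this section and for DV.
        """
        psi = normalize_wrt(gamma, phi, X)
        if psi is None:                          # psi = ⊥
            return "disentailed"                 # Theorem 5.9
        if dependent_vars(psi) <= set(X):        # DV(psi) ⊆ X
            return "entailed"                    # Theorem 5.12
        return "suspended"                       # neither; keep psi as the residue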
Theorem 5.13 Let φ be a basic constraint. Then γ ⊨_T ∃Xφ iff γ ⊨_FT ∃Xφ.

Proof. One direction holds since T is a model of FT. To show the other direction, suppose γ ⊨_T ∃Xφ. Without loss of generality we can assume that φ is normal with respect to γ and X. Hence we know by Theorem 5.12 that φ ≠ ⊥ and DV(φ) ⊆ X. Thus FT ⊨ ∀̃∃Xφ by Proposition 5.11 and hence γ ⊨_FT ∃Xφ. □
Theorem 5.14 (Independence) Let φ₁, …, φₙ be basic constraints, and X₁, …, Xₙ be finite sets of variables. Then

γ ⊨_T ∃X₁φ₁ ∨ … ∨ ∃Xₙφₙ   if and only if   γ ⊨_T ∃Xᵢφᵢ for some i.

Proof. To show the nontrivial direction, suppose γ ⊨_T ∃X₁φ₁ ∨ … ∨ ∃Xₙφₙ. Without loss of generality we can assume that, for all i = 1, …, n, Xᵢ is disjoint from V(γ), φᵢ is normal with respect to γ and Xᵢ, and φᵢ ≠ ⊥. Since γ ∧ ¬∃X₁φ₁ ∧ … ∧ ¬∃Xₙφₙ is unsatisfiable in T and T is a model of FT, we know by Lemma 5.10 that DV(φ_k) ⊆ X_k for some k. Hence γ ⊨_T ∃X_kφ_k by Theorem 5.12. □
6 Conclusion

We have presented a constraint system FT for logic programming providing a universal data structure based on rational feature trees. FT accommodates record-like descriptions, which we think are superior to the constructor-based descriptions of Herbrand.

The declarative semantics of FT is specified both algebraically (the feature tree structure T) and logically (the first-order theory FT given by three axiom schemes). The operational semantics for FT is given by an incremental constraint simplification system, which can check satisfiability of and entailment between constraints. Since FT satisfies the independence property, the simplification system can also check satisfiability of conjunctions of positive and negative constraints.

We see four directions for further research.
We see four directions for further research.
First, FT should be strengthened such that it subsumes the expressivity of rational constructor trees [7, 8].
As is, FT cannot express that ;£ is a tree having direct
subtrees at exactly the features 11, ... ,In. It turns out
that the system CFT [24] obtained from FT by adding
the primitive constraint
(;£ has direct subtrees at exactly the features f1, ... ,fn)
has the same nice properties as FT. In contrast to FT,
CFT can express constructor constraints; for instance,
the constructor constraint ;£ == A( y, z) can be expressed
equivalently as A;£ 1\ ;£{1, 2} 1\ ;£ly 1\ ;£2z, if we assume
that A is a sort and the numbers 1,2 are features.
Second, it seems attractive to extend FT such that it can accommodate a sort lattice as used in [1, 3, 4, 5, 23]. One possibility to do this is to assume a partial order ≤ on sorts and replace sort constraints Ax with quasi-sort constraints [A]x whose declarative semantics is given as

[A]x ≡ ⋁_{B ≤ A} Bx.

Given the assumption that the sort ordering ≤ has greatest lower bounds if lower bounds exist, it seems that the results and the simplification system given for FT carry over with minor changes.
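For illustration, the quasi-sort semantics can be spelled out over a finite sort poset. The sketch below (Python; the poset representation is a hypothetical choice) just enumerates the disjuncts of [A]x and the meet test suggested by the greatest-lower-bound assumption.

    def disjuncts(A, sorts, leq):
        """Sorts B with B <= A, i.e. the disjuncts Bx of the quasi-sort constraint [A]x."""
        return {B for B in sorts if leq(B, A)}

    def meet_quasi_sorts(A1, A2, sorts, leq):
        """Sorts a variable x may take when both [A1]x and [A2]x are imposed.

        An empty result means the two quasi-sort constraints clash; with
        greatest lower bounds the result is the down-set of glb(A1, A2).
        """
        return disjuncts(A1, sorts, leq) & disjuncts(A2, sorts, leq)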
Third, the worst-case complexity of entailment checking in FT should be established. We conjecture it to be quasi-linear in the size of γ and φ, provided the available features are fixed a priori.
Fourth, implementation techniques for FT at the level
of the Warren abstract machine [2] need to be developed.
References
[1] H. Aït-Kaci. An algebraic semantics approach to the
effective resolution of type equations. Theoretical
Computer Science, 45:293-351, 1986.
1021
[2] H. Aït-Kaci. Warren's Abstract Machine: A Tutorial Reconstruction. The MIT Press, Cambridge, MA, 1991.
[3] H. Aït-Kaci and R. Nasr. LOGIN: A logic programming language with built-in inheritance. The Journal of Logic Programming, 3:185-215, 1986.
[4] H. Aït-Kaci and R. Nasr. Integrating logic and functional programming. Lisp and Symbolic Computation, 2:51-89, 1989.
[5] H. Aït-Kaci and A. Podelski. Towards a Meaning of LIFE. Proceedings of the 3rd International
Symposium on Programming Language Implementation and Logic Programming (Passau, Germany),
J. Maluszynski and M. Wirsing, editors. LNCS 528,
pages 255-274, Springer-Verlag, 1991.
[6] R. Backofen and G. Smolka. A complete and decidable feature theory. Draft, German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, 6600 Saarbrücken 11, Germany, 1991.
To appear.
[7] A. Colmerauer. Equations and inequations on finite
and infinite trees. In Proceedings of the 2nd International Conference on Fifth Generation Computer
Systems, pages 85-99, 1984.
[8] A. Colmerauer, H. Kanoui, and M. V. Caneghem.
Prolog, theoretical principles and current trends.
Technology and Science of Informatics, 2(4):255-292, 1983.
[9] S. Haridi and S. Janson. Kernel Andorra Prolog and
its computation model. In D. H. D. Warren and P. Szeredi,
editors, Logic Programming, Proceedings of the 7th
International Conference, pages 31-48, Cambridge,
MA, June 1990. The MIT Press.
[10] J. Jaffar and J.-L. Lassez. Constraint logic programming. In Proceedings of the 14th ACM Symposium on Principles of Programming Languages,
pages 111-119, Munich, Germany, Jan. 1987.
[11] M. Johnson. Attribute- Value Logic and the Theory
of Grammar. CSLI Lecture Notes 16. Center for
the Study of Language and Information, Stanford
University, CA, 1988.
[12] R. M. Kaplan and J. Bresnan. Lexical-Functional
Grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pages 173-381.
The MIT Press, Cambridge, MA, 1982.
[13] M. Kay. Functional grammar. In Proceedings of
the Fifth Annual Meeting of the Berkeley Linguistics Society, Berkeley, CA, 1979. Berkeley Linguistics Society.
[14] J.-L. Lassez, M. Maher, and K. Marriott. Unification
revisited. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming. Morgan
Kaufmann, Los Altos, CA, 1988.
[15] J. L. Lassez and K. McAloon. A constraint sequent
calculus. In Fifth Annual IEEE Symposium on Logic
in Computer Science, pages 52-61, June 1990.
[16] M. J. Maher.
Logic semantics for a class of
committed-choice programs. In J.-L. Lassez, editor,
Logic Programming, Proceedings of the Fourth International Conference, pages 858-876, Cambridge,
MA, 1987. The MIT Press.
[17] K. Mukai. Partially specified terms in logic programming for linguistic analysis. In Proceedings of
the 6th International Conference on Fifth Generation Computer Systems, 1988.
[18] K. Mukai. Constraint Logic Programming and the
Unification of Information. PhD thesis, Tokyo Institute of Technology, Tokyo, Japan, 1991.
[19] M. Nivat. Elements of a theory of tree codes. In
M. Nivat, A. Podelski, editors, Tree Automata (Advances and Open Problems), Amsterdam, NE, 1992.
Elsevier Science Publishers.
[20] W. C. Rounds and R. T. Kasper. A complete logical
calculus for record structures representing linguistic
information. In Proceedings of the 1st IEEE Symposium on Logic in Computer Science, pages 38-43,
Boston, MA, 1986.
[21] V. Saraswat and M. Rinard. Concurrent constraint
programming. In Proceedings of the 7th Annual
A CM Symposium on Principles of Programming
Languages, pages 232-245, San Francisco, CA, January 1990.
[22] G. Smolka. Feature constraint logics for unification grammars. The Journal of Logic Programming,
12:51-87, 1992.
[23] G. Smolka and H. Aït-Kaci. Inheritance hierarchies: Semantics and unification. Journal of Symbolic Computation, 7:343-370, 1989.
[24] G. Smolka and R. Treinen. Relative simplification for and independence of CFT. Draft, German
Research Center for Artificial Intelligence (DFKI),
Stuhlsatzenhausweg 3, 6600 Saarbrücken 11, Germany, 1992. To appear.
Range Determination of Design Parameters by Qualitative
Reasoning and its Application to Electronic Circuits
Masaru Ohki, Eiji Oohira, Hiroshi Shinjo, and Masahiro Abe
Central Research Laboratory, Hitachi, Ltd.
Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan
ohki@crl.hitachi.co.jp
Abstract
There are numerous applications of qualitative
reasoning to diverse fields of engineering. The
main application has been to diagnosis, but there
are a few applications to design. We show a new
application to design, suggesting valid ranges for
design parameters; this application follows the
step of structure determination. The application does not provide more innovative design, but it addresses one of the important steps of design. To implement
it, we use an envisioning mechanism, which
determines all possible behaviors of a system
through qualitative reasoning. Our method: (1)
performs envisioning with design parameters
whose values are initially undefined, (2) selects
preferable behaviors from all possible behaviors
found by the envisioning process, and (3) calculates
the ranges of those design parameters that give the
preferable behaviors.
We built a design-support system Desq (Design support system based on qualitative reasoning) by improving an earlier qualitative reasoning system Qupras (Qualitative physical reasoning system).
We added three new features: envisioning,
calculating the undefined parameters, and
propagating new constraints on constant
parameters. The Desq system can deal with
quantities qualitatively and quantitatively, like
Qupras. Accordingly, we may someday be able to
determine the quantitative ranges, if the
parameters can be expressed quantitatively.
Quantitative ranges are preferable to qualitative
values, to support the determination of design
parameters.
1 Introduction
Recently, many expert systems have been used
in the diverse fields of engineering. However,
several problems still exist. One is the difficulty of
building knowledge bases from the experience of
human experts. The other is that these expert
systems cannot deal with unimaginable situations
[Mizoguchi 87]. Reasoning methods using deep
knowledge, which is the fundamental knowledge of
a domain, are expected to solve these problems.
One reasoning method is qualitative reasoning
[Bobrow 84]. Qualitative reasoning determines
dynamic behaviors, which are the states of a
dynamic system and its state changes, using deep
knowledge of the dynamic system. Another feature
of qualitative reasoning is that it can deal with
quantities qualitatively. So far, there have been
many applications of qualitative reasoning to
engineering [Nishida 88a, Nishida 88b, Nishida
91]. The main application has been to diagnosis
[Yamaguchi 87, Ohwada 88], but recently there
have also been applications to design [Murthy 87,
Williams 90].
In this paper, we show a new application to
design that supports decisions by suggesting valid
ranges for design parameters; it follows the step of
structure determination. This application is not
considered to be more innovative than the previous
applications to design [Murthy 87, Williams 90],
but it is one of the important steps of design
[Chandrasekaran 90].
The key to design support is applying an
envisioning mechanism, which predicts the
behaviors of the dynamic system, to those design
parameters whose values are undefined. If the
envisioning is performed on condition that the
design parameters whose values a designer wants
to determine are undefined, all possible behaviors
under the undefined design parameters can be
predicted by the envisioning process. Some
hypotheses are made to obtain each behavior. The
main reason why hypotheses are made is that
conditions written in the definitions of objects and
physical rules cannot be evaluated because the
design parameters are undefined. Among the
obtained possible behaviors, more than one
behavior desired by the designer is expected to
exist. The designer can select the behaviors which
he/she exactly prefers. Although the designer may
not know the values of the design parameters,
he/she knows the desired behavior. The values of
the undefined parameters can be calculated from
the hypotheses made to obtain the desired behavior.
To sum up, the method of determining valid
ranges for design parameters offers the following:
(1) Performs envisioning for design parameters
whose values are initially undefined,
(2) Selects preferable behaviors from possible
behaviors found by the envisioning process,
and
(3) Calculates the ranges of those design
parameters that give the preferable behaviors.
We used a qualitative reasoning system Qupras
(Qualitative physical reasoning system) [Ohki 86,
Ohki 88, Ohki 92] to construct a decision support
system Desq (Design support system based on qualitative reasoning) that suggests valid ranges
for design parameters.
Qupras, using knowledge about physical rules
and objects after being given an initial state,
determines the followings:
(1) Relations between objects that are
components of physical systems.
(2) The subsequent states of the system following
a transition.
We extended Qupras to construct Desq as
follows:
(1) Envisioning
In Qupras, if a condition of a physical rule
or an object cannot be evaluated, Qupras
asks the user to specify the condition. We
extended Qupras to allow it to continue
assuming an unevaluated condition.
(2) Calculating the undefined parameters
After envisioning all possible behaviors,
Desq calculates the ranges of the undefined
design parameters that give the behavior
specified by the designer.
(3) Propagation of new constraints on constants
In the envisioning process, constraints
related to some constant parameters
become stronger because conditions in the
definitions of physical rules and objects are
hypothesized. The constraints propagate to
the subsequent states.
(4) Parallel constraint solving
Qupras uses a combined constraint solver
consisting of three basic constraint solvers:
a Supinf method constraint solver, an
Interval method constraint solver, and a
Groebner base method constraint solver, all
written in ESP. The processing load of the
combined constraint solver was heavy, so
we converted it to KL1 to speed up
processing.
Desq can deal with quantities qualitatively and
quantitatively like Qupras. Accordingly, we may
someday be able to get quantitative ranges, if the
parameters can be given as quantitative values.
Quantitative ranges may be preferable for decision
support. The usual qualitative reasoning like
[Kuipers 84] gives qualitative ranges.
Section 2 shows how Desq suggests ranges for
design parameters, Section 3 describes the system
organization of Desq, Section 4 shows an example
of Desq suggesting the value of a resistor in a DTL
circuit, Section 5 describes related works and
Section 6 summarizes the paper.
2 Method of determining design parameters
In design, there are many cases in which a
designer does not directly design a new device, but
changes or improves an old device. Sometimes
designers only change parameters of components
in a device to satisfy the requirements. The
designer, in such cases, knows the structure of the
device, and needs only to determine the new values
of the components. This is common for electronic
circuits. We apply qualitative reasoning to the
design decisions.
The key process used to determine design
parameters is envisioning. Our method is as
described in Section 1:
(1) All possible behaviors of a device are found by
envisioning, with design parameters whose
values are initially undefined.
(2) Designers select preferable behaviors from
these possible behaviors.
(3) The ranges of the design parameters that give
the preferable behaviors are calculated using
a parallel constraint solver.
If a condition in the definitions of a physical rule
or an object cannot be evaluated, Desq hypothesizes
one case where the condition is valid and another
where it is not valid, and separately searches each
case to find all possible behaviors. This method is
called envisioning, and is the same as [Kuipers 84].
If a contradiction is detected, the reasoning is
abandoned. If no contradiction is detected, the
reasoning is valid. Finally, Desq finds several
possible behaviors of a device.
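The case splitting just described can be pictured as a small search procedure. The sketch below (Python) is only an illustration of the control flow: the constraint representation and the consistent callback, which stands in for Desq's parallel constraint solver, are hypothetical, and every listed condition is assumed to be one that cannot be evaluated from the current constraints.

    def envision(constraints, open_conditions, consistent, hypotheses=()):
        """Enumerate behaviors by hypothesizing each unevaluable condition.

        constraints     -- relations accumulated for the current state
        open_conditions -- [(condition, relations_if_it_holds), ...]
        consistent      -- callback checking a constraint set (the solver's role)
        Yields (hypotheses, constraints) for every behavior that is not refuted.
        """
        if not open_conditions:
            yield hypotheses, constraints          # one complete behavior
            return
        (cond, relations), rest = open_conditions[0], open_conditions[1:]
        # Hypothesis 1: the condition holds, so its relations become active.
        branch = constraints + [cond] + list(relations)
        if consistent(branch):
            yield from envision(branch, rest, consistent, hypotheses + ((cond, True),))
        # Hypothesis 2: the condition does not hold.
        branch = constraints + [("not", cond)]
        if consistent(branch):
            yield from envision(branch, rest, consistent, hypotheses + ((cond, False),))

For the DTL example of Figure 1, the two open conditions would be "voltage of D1 < 0.7 V" and "base voltage of Tr >= 0.7 V"; the hypothesis sets that survive the consistency check are exactly the behaviors among which the designer then chooses.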
The characteristics of this approach are as
follows:
(1) Only deep knowledge is used to determine
design parameters.
(2) All possible behaviors with regard to
undefined design parameters are found.
Such information may be used in safety
design or danger estimation.
(3) Ranges of design parameters giving
preferable behaviors are found. If a designer
uses numerical CAD systems, for example,
SPICE, he/she need not simulate values outside the ranges.

Figure 1 shows an example of suggesting ranges for a design parameter. This example illustrates the determination of a resistance value in a DTL circuit. The designer inputs the DTL structure and the parameters of the components except for the resistance Rb.

Desq checks the conditions in the definitions of physical rules and objects. If they are satisfied, the equations in their consequences are sent to the parallel constraint solvers. But it is not known what state the diode D1 is in, because the resistance Rb is undefined. The first condition is whether the voltage of D1 is lower than 0.7 volts. Desq hypothesizes two cases; in the first the condition is not satisfied, and in the second it is. The first hypothesis is abandoned because the parallel constraint solver detects a conflict with the other equations. In the second hypothesis, no conflict is detected. After some more hypotheses are made, another state is detected where it is not known whether or not the condition giving the state of the transistor Tr is satisfied. Desq similarly hypothesizes this condition. Finally, Desq finds two possible behaviors for the initial data. Then, Desq calculates the resistance Rb. The resistance must be larger than 473 ohms to give the desired behavior, where the circuit acts as a NOT circuit because the transistor is "on". If the resistance is smaller than 473 ohms, the circuit shows another behavior which is not preferable. Thus, the resistance Rb must be larger than 473 ohms. This proves that Desq can deal with quantities qualitatively and quantitatively.

[Figure 1: An example of deciding an undefined parameter. For the DTL circuit (5 V supply, initial state as input), the unevaluated condition "voltage of D1 < 0.7 V" is split into Hypothesis 1 (voltage of D1 >= 0.7 V, which leads to a conflict) and Hypothesis 2 (voltage of D1 < 0.7 V); the unevaluated condition "base voltage of Tr >= 0.7 V" is then split into Hypothesis 1 (Tr on, the preferable behavior, with undefined parameter Rb >= 473 ohms) and Hypothesis 2 (Tr off, an unpreferable behavior, with 473 ohms > Rb >= 0 ohms).]

3 System organization

This section describes the system organization of Desq. Figure 2 shows that Desq mainly consists of three subsystems:

(1) Behavior reasoner
This subsystem is based on Qupras. It determines all possible behaviors.

(2) Design parameter calculator
This subsystem calculates ranges of design parameters.

(3) Parallel constraint solver
This subsystem is written in KL1, and is executed on PIM, Multi-PSI, or Pseudo Multi-PSI.

[Figure 2: System organization. The designer's initial data goes to the behavior reasoner and design parameter calculator running on PSI, which exchange queries and simultaneous inequalities with the parallel constraint solver running on Pseudo Multi-PSI; the output distinguishes preferable and unpreferable behaviors.]

When the designer specifies initial data, the behavior reasoner builds its model corresponding to the initial state, by evaluating conditions of physical rules and objects. The physical rules and objects are stored in the knowledge base. The model in Desq uses simultaneous inequalities in
the same way as that in Qupras. Simultaneous
inequalities are passed to the parallel constraint
solver to check the consistency and store them. If
an inconsistency is detected, the reasoning process
is abandoned. Conditions in the definitions of
physical rules and objects are checked by the
parallel constraint solver. If the conditions are
satisfied, the inequalities in the consequences of
the physical rules and objects are added to the
model in the parallel constraint solver. If a
condition cannot be evaluated by the parallel
constraint solver, envisioning is performed.
Finally, when all possible behaviors are found, the
design parameter calculator deduces the ranges of
design parameters that give preferable behaviors.
3.1 Behavior reasoner
3.1.1 Qupras Outline
Qupras is a qualitative reasoning system that
uses knowledge from physics and engineering
textbooks.
Qupras has the following
characteristics:
(1) Qupras has three primitive representations:
physical rules (laws of physics), objects and
events.
(2) Qupras determines the dynamic behaviors of
a system by building all equations for the
system using knowledge of physical rules,
objects and events. The user need not enter all
the equations of the system.
(3) Qupras deals with equations that describe
basic laws of physics qualitatively and
quantitatively.
(4) Qupras does not require quantity spaces to be
given in advance. It finds the quantity spaces
for itself during reasoning.
(5) Objects in Qupras can inherit definitions
from their super objects. Thus, physical rules
can be defined generally by specifying the
definitions of object classes with super objects.
Qupras is similar to QPT [Forbus 84], but does
not use influence. The representations describing
relations of values in Qupras are only equations.
Qupras aims to represent laws of physics given in
physics textbooks and engineering textbooks. Laws
of physics are generally described not by using
influences in the textbooks, but by using equations.
Therefore, Qupras uses only equations.
The representation of objects mainly consists of
existential conditions and relations. Existential
conditions correspond to conditions needed for the
objects to exist. Objects satisfying these conditions
are called active objects. The relations are
expressed as relative equations which include
physical variables (hereafter physical quantities
are referred to as physical variables). If existential
conditions are satisfied, their relations become
known as relative equations that hold for physical
variables of the objects specified in the physical
rule definition.
The representation of physical rules mainly
consists of objects, applied conditions and
relations. The objects are those necessary to apply
a physical rule. The representations of applied
conditions and relations are similar to those of
objects. Applied conditions are those required to
activate a physical rule, and relations correspond
to the laws of physics. Physical rules whose
necessary objects are activated and whose
conditions are satisfied are called active physical
rules. If a given physical rule is active, its
relations become known as in the case of objects.
Qualitative reasoning in Qupras involves two
forms of reasoning: propagation reasoning and
prediction reasoning.
Propagation reasoning
determines the state of the physical system at a
given moment (or during a given time interval).
Prediction reasoning determines the physical
variables that change with time, and predicts their
values at the next given point in time. The
propagation reasoning also determines the
subsequent states of the physical system using the
results from the prediction reasoning.
3.1.2 Behavior Reasoner
The behavior reasoner is not much different
from that of Qupras. The two features below are
additions to that of Qupras.
(1) Envisioning
In Qupras, if conditions of physical rules
and objects cannot be evaluated, Qupras
asks the user to specify the conditions. It is
possible for Desq to continue to reason in
such situations by assuming unevaluated
conditions.
(2) Propagation of new constraints on constants
There are two types of parameters
(quantities): constant and variable. In
envisioning, the constraints related to some
constant parameters become stronger by
hypothesizing some conditions in the
definitions of physical rules and objects.
The constraints propagate to the subsequent
states.
Before the reasoning, all initial relations of the
objects defined in the initial state are set as known
relations, which are used to evaluate the conditions
of objects and physical rules. Initial relations are
mainly used to set the initial values of the physical
variables. If there is no explicit change to an
initial relation, the initial relation is held. An
example of an explicit change is the prediction of
the next value in the prediction reasoning.
[Figure 3: Combined constraint solver. Constraints are added to the constituent solvers, which communicate their results among one another; the linear part is handled by the Simplex-based solver.]
Propagation reasoning finds active objects and
physical rules whose conditions are satisfied by the
known relations. If a contradiction is detected, the
propagation reasoning is stopped. If a condition of a physical rule or an object cannot be evaluated, the reasoning process is split by the envisioning mechanism into two processes: one hypothesizing that the condition is satisfied and the other hypothesizing that it is not.
Prediction reasoning first finds the physical
variables changing with time from the known
relations that result from the propagation
reasoning. ,Then, it searches for the new values or
the new intervals of the changing variables at the
next specified time or during the next time
interval. Desq updates the variables according to
the sought values or intervals in the same way as
Qupras. The updated values are used as the initial
relations at the beginning of the next propagation
reasoning.
3.2 Design parameter calculator
The method of
calculating the design
parameters is simple. After finding all possible
behaviors, the designer specifies which design
parameters to calculate. Then, the upper and
lower values of the specified parameters are
calculated by the parallel constraint solver.
3.3 Parallel constraint solver
The parallel constraint solver tests whether the
conditions written in the definitions of physical
rules and objects are proven by the known relations
obtained from active objects and active physical
rules, and from initial relations.
We want to solve nonlinear simultaneous
inequalities to test the conditions in the definitions
of objects, physical rules and events. More
than one algorithm is used to build the
combined constraint solver, because we do
not know of any single efficient algorithm for
nonlinear simultaneous inequalities. We
connected the three solvers as shown in
Figure 3. The combined constraint solver
consists of the following three parts:
(1) Nonlinear inequality solver based on
the interval method [Simmons 86],
(2) Linear inequality solver based on the
Simplex method [Konno 87], and
(3) Nonlinear simultaneous equation
solver based on the Groebner base
method [Aiba 88].
If any one of the three constraint solvers finds new results, the results are passed on to the other constraint solvers by the control parts. This combined constraint solver can solve broader equations than each individual solver can. However, its results are not always valid, because it cannot solve all nonlinear simultaneous inequalities.
The reason why we can get quantitative ranges
is that the combined constraint solver can process
quantities quantitatively as well as qualitatively.
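The cooperation scheme of Figure 3 can be sketched as a fixed-point loop in which any solver that derives new results hands them to the others. The snippet below (Python) is a schematic view only; the solver objects and their propagate interface are hypothetical stand-ins for the interval, Simplex, and Groebner-basis components.

    class Inconsistent(Exception):
        """Raised by a solver when the accumulated constraints admit no solution."""

    def combined_solve(constraints, solvers):
        """Run the cooperating solvers to a fixed point (a sketch).

        Each solver is assumed to expose propagate(store), returning constraints
        it can newly derive from the store (possibly none) or raising Inconsistent.
        """
        store = set(constraints)
        changed = True
        while changed:
            changed = False
            for solver in solvers:          # interval / Simplex / Groebner parts
                new = set(solver.propagate(store)) - store
                if new:                     # pass new results on to the others
                    store |= new
                    changed = True
        return store

The design parameter calculator can then query the same store for the least upper and greatest lower bounds of a chosen parameter under each behavior's hypotheses, which is how a threshold such as the 473 ohms of Figure 1 would be obtained.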
4 Example
4.1 Description of Model
We show another example of the operation of Desq. We use a DTL circuit identical to the one in Figure 1. In this example, however, the input
voltage and the resistance Rb are undefined.
initial_state dtl
  objects
    Rl-resistor ;
    Rg-resistor ;
    Rb-resistor ;
    Tr-transistor ;
    Dl-diode ;
    D2-diode2 ;
  initial_relations
    connect(t1!Rl,t1!Rg) ;
    connect(t2!Rg,t1!Dl,t1!D2) ;
    connect(t2!D2,t1!Rb,tb!Tr) ;
    connect(t2!Rl,tc!Tr) ;
    resistance@Rl = 6000.0 ;
    resistance@Rg = 2000.0 ;
    resistance@Rb >= 0.0 ;
    v@t1!Rl = 5.0 ;
    v@t2!Dl >= 0.0 ;
    v@t2!Dl =< 10.0 ;
    v@te!Tr = 0.0 ;
    v@t2!Rb = 0.0 ;
end.
Figure 4 Initial state for DTL circuit
The initial data is shown in Figure 4. The
"objects" field specifies components and their
classes in the DTL circuit. The "initial_relations"
field specifies the relations holding in the initial
state. For example, "connect(t2!Rg, t1!Dl, t1!D2)"
specifies that the terminal t2 of the resistor Rg, the
terminal t1 of the diode Dl, and the terminal t1 of
the diode D2 are connected. The "!" is a symbol
specifying a part. The "t2!Rg" expresses the
terminal t2 which is one part of Rg. Rb is specified
as a resistor in the "objects" definition. The "@"
indicates a parameter.
The "resistance@Rl"
represents the resistance value of Rl.
The
"resistance@RI = 6000.0" specifies that RI is 6000.0
ohms. The resistance Rb is constrained to be
positive, and the input voltage, which is the voltage
of the terminal t2 in the diode Dl, is constrained to
be between 0.0 and 10.0 volts. Both values are
undefined, and Rb is a design parameter.
Figure 5 shows the definition of a diode. Its
super object is a two_terminal_device, so the diode
inherits the properties of the two_terminal_device,
i.e., it has two parts, both of which are terminals.
Each terminal has two attributes "v" for voltage
and "i" for current. The diode has an initial
object tenninal:Tenninal
attributes
v;
i;
end.
object two_tenninal_device:TTD
parts_of
tI-terminal ;
t2-terminal;
end.
object diode:Di
supers
two_tenninal_device;
attributes
v',
i;
resistance-constant ;
initial_relations
v@Di=v@tl!Di-v@t2!Di ;
state on
conditions
v@Di>=0.7;
relations
v@Di=0.7;
i@Di>=O.O;
state off
condition
v@Di<0.7;
relations
resistance@Di= 100000.0 ;
v@Di=resistance@Di*i@Di ;
end.
Figure 5 Definition of diode
physics three_connect1
  objects
    TTD1 - two_terminal_device ;
    TTD2 - two_terminal_device ;
    TTD3 - two_terminal_device ;
    T1-terminal partname t1 part_of TTD1 ;
    T2-terminal partname t1 part_of TTD2 ;
    T3-terminal partname t1 part_of TTD3 ;
  conditions
    connect(T1,T2,T3) ;
  relations
    v@T1 = v@T2 ;
    v@T2 = v@T3 ;
    i@T1 + i@T2 + i@T3 = 0 ;
end.
Figure 6 Definition of physics
relation, which specifies the voltage difference between its terminals. The diode also has two states: one is the "on" state, where the voltage difference is greater than 0.7, and the other is the "off" state, where the voltage difference is less than 0.7. If the diode is in the "on" state, it behaves like a conductor. In the "off" state, it behaves like a resistor. A transistor is defined like a diode, but it has three states, "off", "on" and "saturated" (in the example of Figure 1, we used a transistor model with two states, "off" and "on").

Figure 6 shows the definition of a physical rule. The rule expresses Kirchhoff's law for the case in which the terminals t1 of three two_terminal_devices are connected. It is assumed that the current into t1 of a two_terminal_device flows to the terminal t2. In fact, three two_terminal_devices can be connected in eight ways depending on how the terminals are connected.
Table 1 All behaviors of DTL circuit

No.  State (D1-D2-Tr)   Range of input (V)    Range of resistance Rb (ohms)   Range of output (V)
1    ON-ON-SAT          1.40081 - 1.5381      486.16 - infinity               0.2
2    ON-ON-ON           1.4 - 1.40081         482.75 - infinity               0.7 - 5.0
3    ON-ON-OFF          0.7 - 1.4             0 - 233,567                     4.94
4    ON-OFF-ON          0 - 1.4007            100,000 - infinity              0.842 - 5.0
5    ON-OFF-OFF         0 - 1.4               0 - 233,567                     4.94
6    OFF-ON-SAT         1.40081 - 10.0        460.9 - infinity                0.2
7    OFF-ON-ON          1.4 - 10.0            457.8 - 488.53                  0.2 - 5.0
8    OFF-ON-OFF         0.7 - 10.0            0 - 484.1                       4.94
9    OFF-OFF-*          conflict
4.2 Results
Table 1 shows all behaviors of the DTL circuit
obtained by envisioning.
The state column
indicates the states of the diode, the diode2 and the
transistor. The following columns show the range
of the input voltage (volts), the range of the
resistance Rb (ohms), and the range of the output
voltage (volts). As is shown, the envisioning found
nine states. Because the input voltage and the
resistance Rb were undefined, the conditions of the
two diodes and the transistor could not be
evaluated.
So, Desq was used to
hypothesize both cases, and to search all
paths. Figure 7 shows the relationship
between the resistance and the input
voltage. The reason why the ranges in
Table 1 overlap is because the models of
the diodes and the transistor are
approximate models.
A designer can decide, by looking at
Figure 7, the resistance Rb for the DTL
circuit to behave as a NOT circuit. It is
desired for Rb to be greater than about 0.5
k ohms, and less than about 100 k ohms,
so that the DTL circuit can output a low
voltage (nearly 0 volts) when the input is
greater than 1.5 volts, or can output a
high voltage (nearly 5 volts) when the
input is less than about 1.5 volts. The
range is shown by the area enclosed by
the dotted lines in Figure 7.
5. Related Works
Desq does not suggest structures of
devices like the methods of [Murthy 87]
and [Williams 90]. Rather, it suggests the
ranges of design parameters for
preferable behaviors. The suggestion is
also useful, because determining values Figure 7
of design parameters is one of the
important steps of design [Chandrasekaran 90].
This approach may be regarded as one
application of constraint satisfaction problem
solving. There are several papers that deal with
electronic circuits as examples, using constraint
satisfaction problem solving [Sussman 80, Heintze
86, Mozetic 91]. Sussman and Steele's system
cannot suggest ranges for design parameters,
because their system uses only equations. Heintze,
Michaylov and Stuckey's work using CLP(R) to
design electronic circuits is the most similar to
Desq, but Desq is different from Heintze's work in
the following points:
(1) Knowledge on objects and laws of physics is
more declarative for Desq.
(2) Desq can design ranges of design parameters
(of devices) that change with time.
(3) Desq can deal with nonlinear inequalities,
and Desq can solve nonlinear inequalities in
some cases.
In Mozetic and Holzbaur's work, numerical and
qualitative models are used. In their view, our
approach uses numerical models rather than
qualitative models. But, if a constraint solver is
used to solve inequalities, it is possible to use both
numerical and qualitative calculations.
[Figure 7: Relationship between resistance and input voltage (input voltage from 1.0 to 10.0 volts; resistance up to about 200 k ohms); the preferable region for Rb is enclosed by dotted lines.]
6. Conclusion
We have described a method of suggesting
ranges for design parameters using qualitative
reasoning, and implemented the method in Desq.
The ranges obtained are quantitative, because our
system deals with quantities quantitatively as well
as qualitatively. In an example utilizing the DTL
circuit, Desq suggested that the range of a
resistance (Rb in Figure 1) should be greater than
about 0.5 k ohms and less than about 100 k ohms to
make the DTL circuit work as a NOT circuit. If the designer wishes for a more detailed design, for example, to minimize the response time by performing numerical calculation, he/she need not calculate outside the range, and thus can save on the calculation cost, which would be much greater if the direct numerical calculation also had to cover values outside the range.
However, there are some possibilities that Desq
cannot suggest valid ranges or the best ranges for
design parameters.
This is because of the
following:
(1) The ability to solve nonlinear inequalities in
Consort is limited
Desq may suggest invalid or weak ranges
because Consort cannot perfectly solve
nonlinear inequalities. But, almost all
results can be obtained by performing more
detailed analysis using numerical analysis
systems, for example, SPICE.
(2) Inexact definitions are used
It may be difficult to describe the
definitions of physical rules and objects.
This is because from inexact definitions,
inexact results may be obtained.
(3) The ability to analyze circuits is limited
The current Desq cannot analyze positive feedback. If there are any positive feedbacks in a circuit, Desq may return wrong results.
The example in this paper does not change with
time. We are currently working on how to
determine ranges of design parameters (of
circuits) that change with time, for example, a
Schmitt trigger circuit. In such a case, we need
to propagate new constraints on constant
parameters. Moreover, we are investigating the
load balancing of the parallel constraint solver to
speed it up.
7. Acknowledgements
This research was supported by ICOT (Institute for New Generation Computer Technology). We wish to express our thanks to Dr.
Fuchi, Director of the ICOT Research Center, who
provided us with the opportunity of performing this
research in the Fifth Generation Computer
Systems Project. We also wish to thank Dr. Nitta,
Mr. Ichiyoshi and Mr. Sakane (Current address:
Nippon Steel Corporation) in the seventh research
laboratory for their many comments and
discussions regarding this study, and Dr. Aiba,
Mr. Kawagishi, Mr. Menjyu and Mr. Terasaki in
the fourth research laboratory of the ICOT for
allowing us to use their GDCC and Simplex
programs, and helping us to implement our
parallel constraint solver. And we wish to thank
Prof. Nishida of Kyoto University for his
discussions, Mr. Kawaguti, Miss Toki and Miss
Isokawa of Hitachi Information Systems for their
help in the implementation of our system, and Mr.
Masuda and Mr. Yokomizo of Hitachi Central
Research Laboratory for their suggestions on
designing electric circuits using qualitative
reasoning.
References
[Aiba 88] Aiba, A., Sakai, K., Sato, Y., and Hawley, D. J.: Constraint Logic Programming Language CAL, pp. 263-276, Proc. of FGCS, ICOT, Tokyo, 1988.
[Bobrow 84] Bobrow, D. G.: Special Volume on Qualitative
Reasoning about Physical Systems, Artificial
Intelligence, 24, 1984.
[Chandrasekaran 90] Chandrasekaran, B.: Design Problem
Solving: A Task Analysis, AI Magazine, pp. 59-71, 1990.
[Forbus 84] Forbus, K. D.: Qualitative Process Theory,
Artificial Intelligence, 24, pp. 85-168, 1984.
[Hawley 91] Hawley, D. J.: The Concurrent Constraint
Language GDCC and Its Parallel Constraint Solver,
Proc. of KL1 Programming Workshop '91, pp. 155-165,
ICOT, Tokyo, 1991.
[Heintze 86] Heintze, N., Michaylov, S., and Stuckey, P.: CLP(R) and some Electrical Engineering Problems, Proc. of the Fourth International Conference on Logic Programming, pp. 675-703, 1986.
[Konno 87] Konno, H.: Linear Programming, NikkaGirren, 1987 (in Japanese).
[Kuipers 84] Kuipers, B.: Commonsense Reasoning about
Causality: Deriving Behavior from Structure, Artificial
Intelligence, 24, pp. 169-203, 1984.
[Mozetic 91] Mozetic, I. and Holzbaur, C. : Integrating
Numerical and Qualitative Models within Constraint
Logic Programming, Proc. of the 1991 International
Symposium on Logic Programming, pp. 678-693, 1991.
[Mizoguchi 87] Mizoguchi, R.: Foundation of expert systems, Expert system - theory and application, Nikkei-McGraw-Hill, pp. 15, 1987 (in Japanese).
[Murthy 87] Murthy, S. and Addanki, S.: PROMPT : An
Innovative Design Tool, Proc. of AAAI-87, pp.637-642,
1987.
[Nishida 88a] Nishida, T.: Recent Trend of Studies with
Respect to Qualitative Reasoning (I) Progress of
Fundamental Technology, pp. 1009-1022, 1988 (in
Japanese).
[Nishida 88b] Nishida, T;: Recent Trend of Studies with
Respect to Qualitative Reasoning (II) New Research Area
and Application, pp. 1322-1333, 1988 (in Japanese).
[Nishida 91] Nishida, T.: Qualitative Reasoning and its
Application to Intelligent Problem Solving, pp. 105-117,
1991 (in Japanese).
[Ohki 86] Ohki, M. and Furukawa, K: Toward Qualitative
Reasoning, Proc. of Symposium of Japan Recognition
Soc., 1986, or ICOT-TR 221, 1986.
[Ohki 88] Ohki, M., Fujii, Y., and Furukawa, K: Qualitative
Reasoning based on Physical Laws, Trans. Inf. Proc.
Soc. Japan, 29, pp. 694-702,1988 (in Japanese).
[Ohki 92] Ohki, M., Sakane, J., Sawamoto, K, and Fujii, Y.:
Enhanced Qualitative Physical Reasoning System:
Qupras, New Generation Computing, 10, 1992 (to appear).
[Ohwada 88] Ohwada, H., Mizoguchi, F., and Kitazawa, Y.: A
Method for Developing Diagnostic Systems based on
Qualitative Simulation, J. of Japanese Soc. for Artif.
Intel., 3, pp. 617 -626, 1988 (in Japanese).
[Simmons 86] Simmons, S.: Commonsense Arithmetic
Reasoning, Proc. of AAAI-86, pp. 118-128, 1986.
[Sussman 80] Sussman, G. and Steele, G.: CONSTRAINTS - A Language for Expressing Almost-Hierarchical Descriptions, Artificial Intelligence, 14, pp. 1-39, 1980.
[Yamaguchi 87] Yamaguchi, T., Mizoguchi, R., Taoka, N.,
Kodaka, H., Nomura, Y., and Kakusho, 0: Basic Design
of Knowledge Compiler Based on Deep Knowledge, J. of
Japanese Soc. for Artif. Intel., 2, pp. 333-340, 1987 (in
Japanese).
[Williams 90] Williams B. C.: Interaction-based Invention:
Designing Novel Devices from First Principles, Proc. of
AAAI-90, pp.349-356, 1990.
Logical Implementation of Dynamical Models
Yoshiteru Ishida
Division of Applied Systems Science
Kyoto University, Kyoto 606 Japan
ishida@kuamp.kyoto-u.ac.jp
Abstract
In this paper, we explore a logical system which reflects the dynamical model. First, we define "causality", which requires a "time reference". Then, we map the causation to a specific type of logical implication which requires the time fragment dt > 0 at each step when causal changes are made. We also propose a set of axioms which reflect the features of the state-space and the relation between time and state-space. With these axioms and the logical implications mapped from the dynamical systems, the dynamical state transition can be deduced logically. We also discuss an alternative way of deducing the dynamical state change using time operators and state-space operators.
1 Introduction

Although dynamical systems and logical systems are considered to be completely different systems, there are several elements in common. We mapped from dynamical systems to logical systems to investigate the following questions: (1) How can the fundamental concepts in dynamical systems, such as observability and stability, be related to those in logical systems, such as completeness and soundness? (2) What is necessary in order to attain dynamical simulation on the mapped logical systems? (3) Can the qualitative simulation be carried out by deducing the future state from the current state and some axioms characterizing time, state-space and their relations?

We consider it crucial to discriminate (physical) causality explicitly from logical deducibility. We studied a causality characterized by "the time reference" other than event dependency for the discussion of physical causality. The physical causality (or equivalently "change" through physical time) is intrinsically embedded in a dynamical model, which states the causal relation between what is changed and what makes the change. In this paper, we treat the physical causality as a specific type of deduction which always requires the fact of the time fragment dt > 0 at each step. By mapping the dynamical model, as well as some meta-rules reflecting that the state-space of dynamical systems is continuous, to logical rules, the qualitative reasoning on dynamical systems can be done by logical deductions.

Section 2 discusses the causality on the dynamical model. The causality is defined in terms of physical time. Then the causation is mapped to the logical implication which requires the time fragment (dt > 0) at each step. A cause-effect sequence is obtained by the deduction where the new fact dt > 0 is required at each step. Section 3 discusses the relation between some concepts on dynamical models and those on logical systems. Section 4 presents a set of axioms from which state transitions are deduced logically. Section 5 discusses an alternative formalization of logical systems for deducing the dynamical changes.

2 Mapping Causality in Dynamical Models to Logical Implications

2.1 Causality referring to time
The causality has the following two requirements, which seem intuitively sound for a causality used in the discussion of dynamical change. When we say "the event A caused the event B", we must admit (1) Time Reference: the event A occurred "before" the event B, and (2) Event Dependency: the occurrence of the event B must be "dependent on" the occurrence of the event A.

The "time reference" plays a crucial role in making a clear distinction between "the causality" and logical deduction. The original dynamical model of the form

dY/dt = X

contains the "built-in causal" direction from the right hand side to the left hand side. We restrict ourselves to interpreting the form dY/dt = X as follows: X > 0, together with the time fragment dt > 0, causes (or is capable of causing) the event of Y increasing (dY > 0). The requirement of the new fact dt > 0 should be claimed to verify the "built-in causality". Thus the form will be mapped to the corresponding logical implication.
2.2 Language for dynamics

In order to logically describe the constraints of a dynamical model, we use the following first-order predicate calculus. We use the predicates p(x,i), n(x,i), z(x,i), which should be interpreted as positive, negative, and zero of the variable x at a certain moment i. p(x,i), for example, is interpreted as follows:

p(x, i) = true, if x (at time i) > 0; false, otherwise.

Since the state must be unique at any moment, these predicates must satisfy the following uniqueness axioms U.

U-(1) ∀x∀i(p(x, i) → (¬n(x, i) ∧ ¬z(x, i)))
U-(2) ∀x∀i(n(x, i) → (¬p(x, i) ∧ ¬z(x, i)))
U-(3) ∀x∀i(z(x, i) → (¬n(x, i) ∧ ¬p(x, i)))

We also use the 2-place inequality predicate >(x, y). Other than these three predicates, we also use functions such as d/dt (time derivative), + (addition), − (subtraction), · (multiplication), and / (division) defined on the time-varying function x(t) in our language. With these predicates, the causality defined from X to Y can be written by:

p(X(t), i) ∧ p(dt, i) → p(dY/dt, i)
n(X(t), i) ∧ p(dt, i) → n(dY/dt, i)
z(X(t), i) ∨ z(dt, i) → z(dY/dt, i)

2.3 Causality in dynamical models

We formalize the "causality" by the propagation of signs in the dynamical model. In the propagation, time reference is included, since p(dt, i) is always needed to conclude the causation.

Example 2.1. In order to compare the simulation results with those done by other qualitative simulation [de Kleer and Brown 1984], we use the same example of a pressure regulator, as shown in Fig. 1. We can identify the causality in the feedback path. The flow also is caused by a driving force and by the available area for the flow. Further, the pressure at a point is caused by the flow through the point. Reflecting these causal paths, the following model is obtained:

dδXs/dt = −a·δPo − d·δXs
dδQ/dt = b·(δPi − δPo − c·(2Xs·Q·δQ − Q²·δXs)/Xs²)
dδPo/dt = e·(2Q·δQ − f·δPo)

where a, b, c, d, e, and f are appropriately chosen positive constants. δx denotes the variance from the equilibrium point of x.

[Fig. 1: A schematic diagram of the pressure regulator (inlet pressure Pi, outlet pressure Po, flow Q, valve position Xs).]

The first equation of the model, for example, is mapped to the logical formulae:

n(δPo(t), i) ∧ p(dt, i) → p(dδXs/dt, i)
p(δPo(t), i) ∧ p(dt, i) → n(dδXs/dt, i)
z(δPo(t), i) ∨ z(dt, i) → z(dδXs/dt, i)

With the set of logical formulae, which are mapped from the dynamical equations, and the following axioms, we can obtain a cause-effect sequence by the causal deduction on this model.

I-(1) ∀x∀i∀j(z(x, i) ∧ p(dx/dt, i) ∧ (j > i) → ∃k((j > k) ∧ (k > i) ∧ p(x, j, k)))
I-(2) ∀x∀i∀j(z(x, i) ∧ n(dx/dt, i) ∧ (j > i) → ∃k((j > k) ∧ (k > i) ∧ n(x, j, k)))

These are the instant change rules [de Kleer and Bobrow 1984], which state that z(x, i) is a point with measure zero.

Suppose Pi is disturbed, p(δPi, 0), when the system is in a stationary state (all the derivatives are zeros);
then the initial sign vector is (δPi, δPo, δQ, δXs) = (+, 0, 0, 0). We will use this sign-vector notation when needed instead of the awkward notation p(δPi, 0), z(δPo, 0), z(δQ, 0), z(δXs, 0).

By the causal deduction, p(δQ, N1) is first obtained (first step). Including this new state as a fact, we can then obtain p(δPo, N2) by the causal deduction again (second step). Including this state as a new result and using a third time fragment dt > 0, we obtain n(δXs, N3) by the causal deduction (third step).
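The three deduction steps above can be reproduced with a small sign-propagation routine. The sketch below (Python) encodes only the sign structure of the pressure-regulator model; the dictionary encoding and variable names are hypothetical, and conflicting influences are simply reported as unknown rather than resolved.

    PLUS, ZERO, MINUS, UNKNOWN = "+", "0", "-", "?"

    def term_sign(coeff_sign, var_sign):
        """Sign of a term: a positive or negative coefficient times a signed variable."""
        if var_sign in (ZERO, UNKNOWN):
            return var_sign
        return PLUS if coeff_sign == var_sign else MINUS

    def sum_sign(signs):
        """Qualitative sign of a sum; mixed + and - terms give no conclusion."""
        nonzero = {s for s in signs if s != ZERO}
        if not nonzero:
            return ZERO
        if nonzero == {PLUS} or nonzero == {MINUS}:
            return nonzero.pop()
        return UNKNOWN

    def causal_step(model, state):
        """One causal deduction step, consuming one time fragment dt > 0.

        model maps each variable to the signed terms of its derivative, so an
        entry var: [(sign, src), ...] stands for d(var)/dt = sum of sign*src.
        A variable at zero whose derivative gets a definite sign moves to that
        sign (the instant change rules I-(1), I-(2)).
        """
        new_state = dict(state)
        for var, rhs in model.items():
            deriv = sum_sign(term_sign(c, state[v]) for c, v in rhs)
            if state[var] == ZERO and deriv in (PLUS, MINUS):
                new_state[var] = deriv
        return new_state

    # Sign structure of the pressure-regulator model (hypothetical encoding):
    model = {
        "dXs": [(MINUS, "dPo"), (MINUS, "dXs")],
        "dQ":  [(PLUS, "dPi"), (MINUS, "dPo"), (MINUS, "dQ"), (PLUS, "dXs")],
        "dPo": [(PLUS, "dQ"), (MINUS, "dPo")],
    }
    state = {"dPi": PLUS, "dPo": ZERO, "dQ": ZERO, "dXs": ZERO}   # (+, 0, 0, 0)
    for _ in range(3):
        state = causal_step(model, state)
    # The three steps yield dQ = +, then dPo = +, then dXs = -, matching the text.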
3 Logical System and Dynamical System
In the previous section, we regarded the causality built into the dynamical model as a logical implication. Then, the dynamical state change can be carried out in a similar manner, deducing the new fact from the logical formulae corresponding to the dynamical model and the time fragment p(dt, i). In order to use the causal relation in the dynamical model, the dynamical model must be the original one. That is, the original dynamical model must reflect the causal paths between physical entities.
In this section, we consider some correspondence between the important concepts in dynamical systems and
those in logical systems.
Theorem 3.1 (observability and deducibility)
The dynamical system is qualitatively observable from an observer y iff the non-zero of the observer y can be deduced in the mapped logical system when the fact that some variables (corresponding to the dynamical system) are non-zero is given.

This result can be used to save some deduction processes when some variables are known to be observable or not. Further, this result can also be used to investigate the qualitative stability, which can be known from the observability of the system [Ishida 1989].
Definition 3.2 (completeness and soundness)
The mapped logical system is called complete (sound) if every state which can (cannot) be attained by the corresponding dynamical system in finite time can (cannot) be deduced in a finite number of steps.
Conjecture 3.3
The mapped logical system is always complete but not
always sound.
This fact is often stated in qualitative reasoning, but it has not been formally proved yet. The most formal discussion may be found in [Kuipers 1985, Kuipers 1986], stating that

Each actual behavior of the system is necessarily among those produced by the simulation.

But,

There are behaviors predicted by qualitative simulation which do not correspond to the behavior of any system satisfying the qualitative structure description.
We will see an example showing the lack of soundness of the mapped logical system in the next section. The lack of soundness is due to the following fact.

Proposition 3.4
Two equivalent dynamical systems may be mapped to different logical systems.

That is, two dynamical systems which can be transformed into each other may be mapped to different logical systems. In fact, a dynamical system is usually mapped to a part of the exact logical system. Therefore, in order to make the mapped logical system close to the dynamical model, we must map from multiple dynamical models which are equivalent as dynamical models, and combine these mapped logical systems. We do not yet know what kinds of equivalent dynamical models suffice to make the mapped logical system exact.
4 Reasoning about State by Deduction
The causal deduction stated in the previous section cannot say anything about changes after some time interval has passed; that is, when many variables are approaching zero, which one reaches zero first. In order to determine this, meta-rules which are implicit in dynamical models must be explicitly introduced. The following axioms reflect the fact that the state-space of the dynamical models is continuous. The lack of a continuous and dense space in the logical system is a fundamental point which discriminates logical systems from dynamical systems.
T-(1) ∀x∀i(p(x, i) ∧ n(dx/dt, i) → ∃j((j > i) ∧ z(x, j)))
T-(2) ∀x∀i(n(x, i) ∧ p(dx/dt, i) → ∃j((j > i) ∧ z(x, j)))

These axioms T-(1),(2) come from the value continuity rule stated in [de Kleer and Bobrow 1984]. Axiom T does not correctly reflect the world of the dynamical model: even if x > 0 and dx/dt < 0, x does not necessarily become zero in finite or infinite time.
M-(1) ∀x∀j1∀j2((p(x, j1) ∧ n(x, j2) ∧ (j2 > j1)) → ∃j3(z(x, j3) ∧ (j3 > j1) ∧ (j2 > j3)))
M-(2) ∀x∀j1∀j2((n(x, j1) ∧ p(x, j2) ∧ (j2 > j1)) → ∃j3(z(x, j3) ∧ (j3 > j1) ∧ (j2 > j3)))
These axioms M-(1),(2) correspond to the well-known intermediate value theorem, which reflects the continuity of the function x. Axioms T and M state the continuity of the state-space and that of the function from time to state-space. Other than the axioms U, I, M, T, we need the following assumption: the state remains the same as the nearest past state unless otherwise deduced. We call this the no change assumption. We have not so far been able to formalize this assumption by a logical formula of our language. This seems to be a common problem for any formalization for reasoning about such dynamic concepts as state change, actions, and events. The situation calculus [McCarthy and Hayes 1969], for example, uses Frame Axioms (collections of statements about what does not change when an action is performed) to avoid this problem.
Example 4.1.
Let us consider the mass-spring system with friction
[de Kleer and Bobrow 1984] (Fig. 2) whose model is of
the form:
(4-1) dx/dt = v
(4-2) dv/dt = -kx - fv
where k and f are positive constants. (4-2) is the original form containing the built-in causality, whereas (4-1) is the definition of v.

Fig. 2  A Schematic Diagram of Mass-Spring System with Friction

As for the initial sign patterns of (x, v, dv/dt), we consider only three cases: (+, -, -), (+, +, -), (+, -, +). Let Gdm denote the set of logical formulae corresponding to the dynamical model, and Gch those corresponding to the axioms U, I, T, M. The sign pattern (+, +, +) and its opposite pattern (-, -, -) are inconsistent, since (p(x, 0) ∧ p(v, 0)) ∪ Gdm → n(dv/dt, 0). This result n(dv/dt, 0) is inconsistent with the initial pattern p(dv/dt, 0) under the uniqueness axiom U. We do not consider initial sign patterns which contain zero for any variable, since such a pattern will change immediately to a sign pattern with only non-zero signs by axiom I. Thus these three patterns cover all the possible sign combinations.

We only show the deduction for the simulation of case 1, when p(x, 0), n(v, 0), n(dv/dt, 0) are given as the initial pattern. Other cases can be deduced in a similar manner to case 1 from the initial sign pattern, the set of logical formulae Gdm, and Gch. By the axiom T,

p(x, 0) ∧ n(v, 0) → ∃N1((N1 > 0) ∧ z(x, N1)).

By the no change assumption, other variables are assumed to retain the nearest past signs; that is, n(v, N1), n(dv/dt, N1). However,

(n(v, N1) ∧ z(x, N1)) ∪ Gdm → p(dv/dt, N1).

Thus, we have z(x, N1), n(v, N1), p(dv/dt, N1). Then by the axiom M,

(N1 > 0) ∧ n(dv/dt, 0) ∧ p(dv/dt, N1) → ∃N2((N2 > 0) ∧ (N1 > N2) ∧ z(dv/dt, N2)).

By the no change assumption, other variables at time N2 are assumed to retain the nearest past signs; that is, p(x, N2), n(v, N2). Since

(n(v, N2) ∧ z(dv/dt, N2)) ∪ Gdm → p(d²v/dt², N2),

and by the axiom I,

p(d²v/dt², N2) ∧ z(dv/dt, N2) ∧ (N1 > N2) → ∃N3((N1 > N3) ∧ (N3 > N2) ∧ p(dv/dt, N3)).

Again by the no change assumption, p(x, N3), n(v, N3).

By applying the axiom I to the state at N1,

n(v, N1) ∧ z(x, N1) → ∃N4((N4 > N1) ∧ n(x, N4)).

n(v, N4) and p(dv/dt, N4) are obtained by the no change assumption. By applying the axiom T to the state N4,

n(v, N4) ∧ p(dv/dt, N4) → ∃N5((N5 > N4) ∧ z(v, N5)).

Again the signs of other variables at N5 remain the same as those at N4. By applying the axiom I to the state N5, we have

z(v, N5) ∧ p(dv/dt, N5) → ∃N6((N6 > N5) ∧ p(v, N6)).

In summary we have deduced the set of states at different times, z(x, N1), n(v, N1), p(dv/dt, N1), p(x, N2), n(v, N2), z(dv/dt, N2), p(x, N3), n(v, N3), p(dv/dt, N3), n(x, N4), n(v, N4), p(dv/dt, N4), n(x, N5), z(v, N5), p(dv/dt, N5), n(x, N6), p(v, N6), p(dv/dt, N6), and the order of times (0 < N2 < N3 < N1 < N4 < N5 < N6). Tables 1 show the state transitions starting from the initial patterns of case 1, case 2 and case 3.

Tables 1  State Transition by Logical Deduction

case 1 (columns t, x, dx/dt, d²x/dt²):
  t = 0: (+, -, -)
  t = 1: (+, -, 0)
  t = 2: (+, -, +)
  t = 3: (0, -, +)
  t = 4: (-, -, +)
  t = 5: (-, 0, +)
  t = 6: (-, +, +)
At step 6, the opposite pattern of the initial pattern comes.

case 2 (columns t, x, dx/dt, d²x/dt²): at step 2, the same pattern as the initial pattern of case 1 comes.

case 3 (columns t, x, dx/dt, d²x/dt²): at step 2, the opposite pattern of the initial pattern of case 2 comes.

In the logical system mapped from the dynamical model (4-1) and (4-2), it is impossible to deduce the state which corresponds to the convergence to the point (x, dx/dt, d²x/dt²) = (0, 0, 0), which is attained when infinite time has passed in the dynamical model. In fact, we only have periodic states, as shown in Tables 1. However, infinite sequences of deduction similar to this convergence can be found. When the initial sign pattern is (x, dx/dt, d²x/dt², ...) = (+, -, +, ...), applying the axiom T to n(dx/dt, 0) we have

∃N1((N1 > 0) ∧ z(dx/dt, N1)).

Then applying the axiom M to this result, we will have

∃N2((N1 > N2) ∧ z(d²x/dt², N2)).

This application of the axiom M proceeds progressively to any higher-order time derivative of x. That is, we have

∃Ni+1((Ni > Ni+1) ∧ z(d^(i+1)x/dt^(i+1), Ni+1)).

This is an interesting correspondence between the dynamical model and the mapped logical system. It may suggest introducing some operation into the logical system (other than deduction) which corresponds to the operation lim(t→∞) x(t).

We will show that this convergence can be deduced even in a finite number of steps using the logical implications mapped from a different (but equivalent) dynamical model. The dynamical model (4-1), (4-2) is equivalent to the dynamical model:

(4-3) E = x² + (1/k)(dx/dt)²
(4-4) dE/dt = -f(dx/dt)²

This states that E, and hence x, will eventually become zero as long as f > 0. Table 2 shows the state transition of the mapped logical system. This convergence of the dynamical system is attained in infinite time, and hence need not be deduced in the mapped logical system. Since the current logical system does not have the concepts of convergence and infinite steps, these concepts are out of the scope of the mapped logical systems.

Table 2  State Transition of Mass-Spring System (Energy Model)
(columns: t, x, dx/dt, E, dE/dt; * denotes any sign + or -)

The results show that the logical system mapped from the dynamical model (4-3), (4-4) is quite different from that mapped from the dynamical model (4-1), (4-2), although these dynamical models are equivalent. Therefore, this example shows the correctness of Proposition 3.4. This point is also a fundamental difference between dynamical systems and logical systems.
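The inconsistency check used in Example 4.1 to discard the sign patterns (+, +, +) and (-, -, -) can likewise be written as a small logic program. The sketch below is illustrative only (the predicate names deriv_sign/3 and inconsistent/3 are hypothetical); it encodes just the sign content of Gdm for equation (4-2), dv/dt = -kx - fv with k, f > 0, together with the uniqueness axiom U.

```prolog
% deriv_sign(Sx, Sv, Sdv): sign of dv/dt forced by the signs of x and v
% whenever the two terms -k*x and -f*v agree (k, f > 0).
deriv_sign(p, p, n).   % x > 0, v > 0  =>  dv/dt < 0
deriv_sign(n, n, p).   % x < 0, v < 0  =>  dv/dt > 0
deriv_sign(p, z, n).
deriv_sign(n, z, p).
deriv_sign(z, p, n).
deriv_sign(z, n, p).
deriv_sign(z, z, z).

% An initial pattern is inconsistent when Gdm forces a sign for dv/dt
% different from the asserted one (uniqueness axiom U).
inconsistent(Sx, Sv, Sdv) :- deriv_sign(Sx, Sv, Forced), Forced \= Sdv.
```

The query ?- inconsistent(p, p, p). succeeds, rejecting (+, +, +), while ?- inconsistent(p, n, n). fails, since with x > 0 and v < 0 the two terms of (4-2) disagree and no sign is forced; this is exactly why the three patterns considered in Example 4.1 remain.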
5 Discussions
We first discuss the temporal logic with the temporal operators F, P [Rescher and Urquhart 1971], where FA (PA) means A will be (was) true at some future (past) time. With the axiom schemata, the features of these temporal operators, and even the features of time (e.g. whether it is transitive, dense, or continuous), can be characterized. However, since the logic does not say anything about the features of the state-space and the relation between the state-space and time, it is not possible to infer the change in the state-space. In fact, the axioms I, T, M given in the preceding sections characterize the features of the state-space. An alternative to our approach is to define space operators similar to the time operators. One way of defining space operators is as follows: Fx, Px, where FxA (PxA) means that A is true at some point where x is larger (smaller) than the current value. With this definition, the previous time operators can be written as Ft, Pt.
With these space operators, the axioms I,T,M may be
written as:
I-(1) z(x) → G(p(x))
I-(2) z(x) → H(n(x))
T-(1) p(x) → Px(z(x))
T-(2) n(x) → Fx(z(x))
M-(1) p(x) ∧ F(n(x)) → Fx(F(z(x)))
M-(2) n(x) ∧ F(p(x)) → Fx(F(z(x)))
Since these axiom schemata I, T, M characterize the features of only the state-space itself, we need the following axioms TS, which characterize the monotonic relation between time and state-space.

TS-(1) p(dx/dt) → ((FA → FxA) ∧ (PA → PxA))
TS-(2) n(dx/dt) → ((PA → FxA) ∧ (FA → PxA))

Here, the time operators are used instead of the time index for the sign predicates p, n, z. The good point of this space operator approach is that it can be discussed as a natural extension of temporal logic with temporal operators. However, its critical point is that although these space/time operators can tell the temporal precedence of events, they cannot describe that different events A and B occurred at the same time. In the approach taken in Section 4, this is described by putting on the same time tags.

When compared with the qualitative reasoning of [de Kleer and Bobrow 1984], our way of qualitative reasoning is different from theirs in the following two points. (1) In reasoning: we defined another causality which refers to time strictly. Causal reasoning is carried out by mapping causality in dynamical models to deduction under the condition dt > 0. Time-independent relations are mapped to deductions only. Then causal reasoning is done by requiring the fact dt > 0 in every step. This logical reasoning can be implemented on a logical reasoning system such as Prolog by providing the axioms so far proposed and the mapped dynamical models. (2) In modeling: since we use the causality built into the dynamical model, we skip the qualitative modeling process. That is, we use the dynamical model as the qualitative model. However, the dynamical models must be carefully selected to ensure that the causal paths in the dynamical models can be reflected in the mapped logical systems.

6 Conclusion

We discussed a mapping from dynamical systems to logical systems to see the correspondence of the fundamental concepts in these two domains, and to implement a causal reasoning system on a logical deduction system. To clearly separate the physical causality from the usual deduction, we defined causality in a physical system by making time explicit.

Many fundamental problems remain, such as: does a complete and sound logical system for a dynamical system exist? If so, how can such a complete and sound logical system be attained?

References

[de Kleer and Brown 1984] de Kleer, J. and Brown, J. S. A qualitative physics based on confluences. Artificial Intelligence, 24, 7-83, 1984.

[Forbus 1984] Forbus, K. D. Qualitative process theory. Artificial Intelligence, 24, 85-168, 1984.

[de Kleer and Bobrow 1984] de Kleer, J. and Bobrow, D. G. Qualitative Reasoning with Higher-Order Derivatives. Proc. of AAAI 84, 86-91, 1984.

[Kuipers 1986] Kuipers, B. Qualitative simulation. Artificial Intelligence, 29, 289-337, 1986.

[Kuipers 1985] Kuipers, B. The Limits of Qualitative Simulation. Proc. of IJCAI 85, pp. 128-136, 1985.

[Ishida 1989] Ishida, Y. Using Global Properties for Qualitative Reasoning: A Qualitative System Theory. Proc. of IJCAI 89, Detroit, pp. 1174-1179, 1989.

[Struss 1988] Struss, P. Global filters for qualitative behaviors. Proc. of AAAI 88, 301-306, 1988.

[Rescher and Urquhart 1971] Rescher, N. and Urquhart, A. Temporal Logic, Springer Verlag, 1971.

[McCarthy and Hayes 1969] McCarthy, J. and Hayes, P. J. Some Philosophical Problems from the Standpoint of Artificial Intelligence, Machine Intelligence 4, Edinburgh University Press, 1969.
The CLASSIC Knowledge Representation System
or,
KL-ONE: The Next Generation
Ronald J. Brachman Alexander Borgida* Deborah L. McGuinness
Peter F. Patel-Schneider Lori Alperin Resnick t
AT&T Bell Laboratories, 600 Mountain Ave.,
Murray Hill, NJ 07974-0636, U. S. A.
Abstract
CLASSIC is a recently developed knowledge representation (KR) system, based on a view of frames as structured descriptions, with several important inferable relationships, including description classification. While
much about CLASSIC is novel and important in its own
right, it is especially interesting to consider the system
in light of its unusual (for Artificial Intelligence) intellectual history: it is the result of over a decade of research and evolution in representation systems that trace their origins back to work on KL-ONE, arguably one of the most long-lived and influential approaches to KR in the history of AI. We outline some of the novel contributions of CLASSIC, but pay special attention to its roots, illustrating the maturation of some of the original features of KL-ONE and the decline and fall of others. A number of key ideas are analyzed, including the interpretation of frames as descriptions, the classification inference, and the role of a knowledge representation system in a knowledge-based application. The rare traceable relationship between CLASSIC and its ancestor gives us an
opportunity to assess progress in a generation of knowledge representation research.
1 Introduction
An unfortunately large fraction of work in Artificial Intelligence is ephemeral, accompanied by much sound and
fury, but, in the end, signifying virtually nothing. Work
on systems with significant longevity to the basic ideas,
such as STRIPS, appears to be the exception rather than
the rule in AI.
In the area of knowledge representation (KR), there
are ideas that have lived on for years, but very few
systems or approaches have seen more than a minimal
number of users for a minimal number of years.¹ The KL-ONE system [7, 11] is different: it was "born" over a dozen years ago, and has had continuous evolution and influence ever since. Its offspring now number at least twenty significant projects worldwide, all based directly on its key ideas of classification and structured inheritance. With well more than a decade behind us, this rich history bears closer examination, especially with the advent of the CLASSIC Knowledge Representation System, a recent development that clarifies and amplifies many of the central ideas that were more crudely approximated in the KL-ONE of 1978. CLASSIC goes substantially beyond KL-ONE in its treatment of individuals and rules, its clarification of subsumption and classification, its integration with its host language, and its concrete stand on the role of a KR system as a limited deductive database management system.

* Also with the Department of Computer Science, Rutgers University, New Brunswick, NJ.
† Electronic mail addresses: rjb@research.att.com, borgida@cs.rutgers.edu, dlm@research.att.com, pfps@research.att.com, resnick@research.att.com.
¹ SNePS and Conceptual Graphs are among the few exceptions.
While a description of the CLASSIC system would be
interesting in its own right, its motivation and contribution are more easily understood by placing it in the
proper context. Thus, rather than describe the system
in isolation, we here briefly explore some of its key properties in light of their intellectual debt to KL-ONE and its
children. Besides making the case for CLASSIC, this will
also provide us an opportunity to assess in retrospect the
impact of some of the original ideas introduced by KL-ONE. This is a chance to see how far we have come in a
"generation" of knowledge representation research.
2 KL-ONE: The First Generation
KL-ONE was the first implementation (ca. 1978) of a representation system developed in Brachman's thesis [7].
It was influenced in part by the contemporary Zeitgeist
of "frames" (e.g., see [20]), with emphasis on structured
objects and complex inheritance relationships. But KL-ONE's roots were really in semantic networks, and it had
a network notation of labeled nodes and links.
Despite its appearance, in some key respects KL-ONE was quite different from both the semantic network systems that preceded it, and the frame systems
that grew up as its contemporaries. Following papers
by Woods [33] and Brachman [6], KL-ONE rejected the
prevailing idea of an open-ended variety of (domain-specific) link- and node-names, and instead embraced a small, fixed set of (non-domain-specific) "epistemological primitives" [8] for constructing complex structured objects. These constructs, which represented basic general relationships like "defines-an-attribute-of" and "is-a-specialization-of," rather than domain-specific relationships like "owns" or "has-employee," were considered to be at a higher level of representation than the data-structuring primitives used to implement them. They could be used as a foundation for building application-dependent conceptual models in a semantically meaningful way (rather than in the ad hoc fashion typical of
semantic nets).
Figure 1: A KL-ONE Concept.
In addition to its clear stand on the semantics of semantic networks, the original KL-ONE introduced a number of important ideas, including these:
• rather than manipulating "slots" -which are in reality low-level data structures--KL-ONE looked at relationships as roles to be played; roles get their meanings from their interrelations-just like the roles in
a drama-and they are not just meaningless labeled
fields of records or indistinguishable empty bins into
which values are dropped;
• a role taxonomy, which allowed roles to be subdivided
into more specific roles; e.g., if child is a more specific
role than relative, then being a child entails something more constrained than being a relative, but includes everything that being a relative in general does;
• structural descriptions, which served to define the relationships between role players; e.g., the difference
between a buyer and a seller in a PURCHASE event
would be specified by reference to other concepts that
specified in which direction money and goods would
flow. These concepts would give substance to the roles,
rather than leaving their meanings open and subject
only to human interpretation of strings like "buyer."
• structured inheritance, which reflected the fact that
concepts (KL-ONE's name for frames/classes) were
complex structured constructs and their parts were not
independent items to be manipulated arbitrarily.
The KL-ONE language showed its semantic-network
heritage rather directly, in that KL-ONE structures
were drawn in diagrams, with different link-types being indicated with different pictorial realizations. For
example, Figure 1 illustrates a typical KL-ONE concept: the "STARFLEET-MESSAGE" concept uses its parent,
"MESSAGE," to create the description corresponding to "a
MESSAGE whose Sender is a STARFLEET-COMMANDER." In
general, a user built a KL-ONE net like this by calling
rather low-level LISP functions, whose actions might be
to "create a role node" or "add a superconcept link."
After a number of years of use and reimplementation,
it gradually became clear that KL-ONE's approach to
structured objects was substantially different than that
of virtually all of its contemporary systems. The primary realization was that those objects had previously
been used for (at least) two purposes [6,9]: (1) to represent statements, usually of some typical properties (e.g.,
"elephants are gray"), and (2) to act as structured descriptions, somewhat like complex mathematical types
(e.g., "a black telephone," rather than "all telephones
are black"). In the KL-ONE community, the structureddescription aspect came to be emphasized over the assertional one.
Viewing frames as descriptional, rather than assertional, emphasized the intensional aspects of knowledge
representation. This had one primary benefit: it yielded
the idea that the central inference to be drawn was
subsumption-whether or not one description is necessarily more general than another. Subsumption in turn
led to the idea of description classification-taking a description and finding its proper place in a partial order of other descriptions, by finding all subsuming (more
general) descriptions and all subsumed (more specific)
descriptions. KL-ONE-based classification systems were
subsequently used in a number of interesting applications, including natural language understanding [11], information retrieval [27], expert systems [22], and more.
Because of this view of frames, the research foci in the
KL-ONE family gradually diverged somewhat from those
of other frame projects, which continued to emphasize
typicality and defaults.
Another key issue in the KL-ONE community has
been the tension between the need for expressiveness
in the language and the desire to keep implementations computationally reasonable. Two somewhat different approaches can be seen: NIKL [17], and subsequently LOOM [19], added expressive power to the original KL-ONE language, and admitted the possibility of incomplete classification. KRYPTON [12], and subsequently
KANDOR [26], on the other hand, emphasized computational tractability and completeness. While neither of
these approaches is right for every situation, they provide an interesting contrast and highlight a significant
current issue in knowledge representation. This topic is
still under active exploration (see Sections 4.5-4.6).
Over the last decade, systems based on the ideas in KL-ONE have proliferated in the United States and Europe
(with significant ESPRIT funding), with at least twenty
related efforts currently underway (see [34]). The work
has also inspired seven workshops, two recently being
held in Germany (in 1991) and one coming soon in the US
(1992). These workshops have attracted both theoretical
and practical scientists from several countries, and made
it clear that the class of "KL-ONE-like" representation
systems has both important theoretical substance and
practical impact.
3 The CLASSIC System
The CLASSIC Knowledge Representation System 2 represents a new generation of KL-ONE-like systems, emphasizing simplicity of the description language, a formal
approach, and tractability of its inference algorithms. In
this regard, it is most like KANDOR (and also BACK [32]),
which, while setting important directions for limited
subsumption-based reasoning, had a number of inadequacies. However, the CLASSIC system goes significantly beyond previous description-based KR systems in many important respects, including its language, integration with the host system, treatment of individuals, and clarity on the role of a KR system.

² CLASSIC stands for "CLASSification of Individuals and Concepts." It has a complete, fully documented implementation in Common Lisp, and runs on SUN workstations, Apple Macintoshes, Symbolics machines, etc. It has been distributed to numerous (> 40) universities for research use.
In CLASSIC's language, there are three types of objects:
• concepts, which are descriptions with potentially complex structure, formed by composing a limited set of
description-forming constructors; concepts correspond
to one-place predicates;
• roles, which are simple formal terms for properties;
roles correspond to two-place predicates; within this
class, CLASSIC distinguishes attributes, which are functional, from multi-roles, which can have multiple fillers;
• individuals, which are simple formal constructs intended to directly represent objects in the domain of
interest; individuals are given properties by asserting
that they are described by concepts (e.g., "Chardonnay
is a GRAPE") and that their roles are filled by other individuals (e.g., "Bell-Labs' parent-company is AT&T").
The CLASSIC description language is uniform and
compositional-the meaning of a complex description is
a simple combination of the meanings of its parts. 3 The
complete description language grammar in Figure 2 illustrates its simplicity. Besides the description language,
the interface to CLASSIC has a small number of operators
on knowledge bases for the creation of new concepts (and
the assignment of names to them), which include defined
concepts, with full necessary and sufficient conditions;
primitive concepts, which have only necessary conditions
(see [9]); and disjoint (primitive) concepts, which cannot
share instances (e.g., MALE and FEMALE). There is also
an operator to explicitly "close" a role; this makes the
assertion that there can be no more fillers for the role
(see below).
It is important to emphasize that the description constructors and knowledge base operators were chosen only
after careful study and extensive experience with numerous KR systems. For example, virtually every object-centered representation system has a way to restrict the
type of an attribute; this yields our ALL constructor. All
KR languages need to assert that a role is filled by an
object; this corresponds to FILLS. CLASSIC's set captures the central core of virtually all KL-ONE-like systems in an elegant way: the constructors are minimal,
in that one can not be reduced to a combination of others; and they have a uniform, prefix notation syntax,
which allows them to be composed in a simple and powerful way. Rules (see Sec. 4.4), procedural tests, numeric
ranges (MAX, MIN) and host language values expand
the scope of KL-ONE-like concepts; these were included
after clear user need was demonstrated. Certain more
complex operators were excluded because they would
have clearly made inference intractable or undecidable.
Thus, CLASSIC's language is arguably the cleanest structured description language that tempers expressiveness
of descriptions with tractability of inference (but see
Section 4.5), elegantly balancing representational needs
and inferential constraints in a uniform, simple, compositional framework.
³ CLASSIC has a formal semantics, but we will not be able
to elaborate on it here. See [4].
CLASSIC has many novel features, and improves on
its predecessors in a number of ways, one of the most
telling of which is its treatment of individuals. Anything that can be said about a concept can be said about
an individual; thus, partial knowledge about individuals is maintained and used for inference. For example,
we can assert that a person has at least three children
((AT-LEAST 3 child)) without identifying them, or
that all of the children-whoever they are-are female
((ALL child FEMALE)). Individuals from the host language (e.g., LISP), such as strings and numbers, can
be freely used where CLASSIC-supported individuals can,
with consistent treatment. When any individual is added
or augmented, or when a new concept is defined, complete propagation of properties is carried out, so that
all individuals are continuously classified properly, and
monotonic updates are treated completely. The rolefillers of an individual are not considered under the usual
closed-world assumption; this better supports the accumulation of partial knowledge about individuals. Roles·
can be "closed" explicitly when all of their fillers are
known. Most crucially, an individual cannot be proven
to satisfy an ALL restriction or an AT-MOST restriction by looking at its fillers for the role unless all of those
fillers are known. Previous systems either treated this
aspect of assertions incompletely or incorrectly.
Rather than delve further into CLASSIC's individual
features, we will attempt to better articulate its more
general contributions by examining its relation to the
issues that started this whole line of thinking over a
decade ago. In that respect we can not only appreciate
gains made in CLASSIC, but understand the strengths
and weaknesses of the original KL-ONE proposals.
4 Key Intellectual Developments
CLASSIC is innovative in a number of ways, and bears
little surface resemblance to KL-ONE. But it is also very
much a descendant of that system, which introduced a
number of key ideas to the knowledge representation
scene. While we will not have an opportunity here to
delve into all of these ideas, we will examine a few of the
more important issues raised by the original system and
its successors.
4.1 Subsumption as a Central Inference
In KL-ONE, as in all semantic networks that preceded
it (and most systems to follow), the backbone of a domain representation was an "IS-A" hierarchy. The IS-A
("superc" in KL-ONE) link served to establish that one
concept was a subconcept of another, and thus deserved
to inherit all of the features of the superconcept. Virtually all of these systems forced the user to state directly
that such a link should be placed between two explicitly
named concepts. This type of user responsibility is still
common in virtually all frame-based systems and expert
system shells.
In the early 1980's we discovered that in a classification-based system this was the wrong way around. In the KL-ONE-descendant languages of KRYPTON and KANDOR, where the meaning of a concept could be determined simply and directly from its structure (because the logic had a compositional semantics and necessary and sufficient definitions), it became clear that IS-A relations were purely derivative from the structure of the concepts. In other words, the subsumption relation⁴ between two descriptions was determined without any need for a complete explicit hierarchy of IS-A connections.

Figure 2: The CLASSIC Description Language (comments in italics). The grammar of concept expressions builds descriptions from THING, CLASSIC-THING, HOST-THING, built-in names and concept names using the constructors AND (conjunction), ALL (universal value restriction), AT-LEAST (minimum cardinality), AT-MOST (maximum cardinality), FILLS (role-filling), SAME-AS (role-filler equality over attribute paths), TEST-C and TEST-H (procedural tests), ONE-OF (set of individuals), and MIN/MAX (numeric range limits); a test function is a function in the host language (COMMON LISP) with a three-valued logical return type.
Of course, it might make a difference to the efficiency of the system if all subsumption relationships that had
been calculated were cached in some kind of structure
that obviated the need to compute them a second time,
and this is now common practice. But in a system like
CLASSIC, it is clear that this is strictly an efficiency issue.
In essence, systems that force a user to think only
in terms of direct IS-A links place the entire burden of
knowledge structuring on that user. Since every IS-A
assertion is taken at its word, the system can provide
no feedback that the correct relationship has been represented; all responsibility is the user's. On the other
hand, the CLASSIC system (and others like it) can reliably decide under which concepts a new concept or individual must fit, since it has a compositional interpretation of the parts of any concept. This provides valuable
help to the user in structuring large knowledge bases,
because it is all too easy for us to assume that just because we know something that a term (e.g., a complex
concept, like RED-WINE) implies, the system will know
it as well. This advantage has been documented in the
LASSIE system [14], which uses classification to support
a software information system. Systems that do not do
classification do not have defined concepts, and therefore
treat everything as primitive [9]. Thus we can be falsely
lulled into assuming that when we assert that a particular WINE has color = Red, the system will know that
it is a RED-WINE; but a non-classification system will not
make that inference. 5
⁴ Subsumption is defined formally in [18] and [4]. Concept a subsumes concept b iff instances of b are instances of a in all possible interpretations.
⁵ Note that CLASSIC and its cousins all do normal inheritance of properties. Most of these systems are strictly monotonic for simplicity, but LOOM [19] has a default component.
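To make the contrast concrete, here is a toy structural subsumption test written as a short logic program. It is only an illustrative sketch, not CLASSIC's algorithm (CLASSIC is implemented in Common Lisp and handles a richer language); descriptions are represented as and/1 terms over the hypothetical restriction forms prim/1, at_least/2 and all/2.

```prolog
% subsumes(G, S): description G subsumes description S iff every conjunct
% of G is entailed by some conjunct of S.
subsumes(and(Gs), and(Ss)) :-
    forall(member(G, Gs), entails(Ss, G)).

entails(Ss, prim(P))        :- member(prim(P), Ss).
entails(Ss, at_least(N, R)) :- member(at_least(M, R), Ss), M >= N.
entails(Ss, all(R, G))      :- member(all(R, S), Ss), subsumes(G, S).

% Example: does (AND WINE (ALL color RED)) subsume
%          (AND WINE (AT-LEAST 1 color) (ALL color (AND RED BRIGHT)))?
% ?- subsumes(and([prim(wine), all(color, and([prim(red)]))]),
%             and([prim(wine), at_least(1, color),
%                  all(color, and([prim(red), prim(bright)]))])).
% true.
```

The point the text makes above falls out directly: no user-supplied IS-A link is needed, because the relation is computed from the structure of the two descriptions alone.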
4.2 From LISP Functions to Languages
The realization that the structure of a concept is the
only source of its meaning, and that any IS-A hierarchy
is induced by such structures, leads to another significant
point of departure for the CLASSIC system. CLASSIC has
a true knowledge representation language-a grammar
of expressions. KL-ONE and even many of its successors treated a knowledge base as a set of data structures
to be more or less directly manipulated by a user, and
thus the user interface was strictly in terms of node- and
link-managing functions. Instead (following KRYPTON)
CLASSIC is really based on a formal logic, with a formal
syntax, rules of inference, and a formal interpretation of
the syntax (see [4]).
Of all of the KL-ONE-like systems, the CLASSIC system
has the cleanest language. As shown in Figure 2, the
language is simple, uniform, and compositional. Figure 3 illustrates the difference in style between KL-ONE structures and the lexical language of the CLASSIC system.⁶ The advantages of a true logic over a set
of data-structure-manipulating programs should be obvious: one can write parsers and syntax checkers for the
language, formal semantics can be specified, inference
mechanisms can be verified to adhere to the semantics,
etc.
4.3 Attached Procedures
One of the more popular features of the early frame
systems was the ability to "attach" programs to pieces
of the data structures. The ultimate incarnation of
this idea was probably KRL [3], which had an elaborate process framework, including "servants," "demons,"
"traps," and "triggers." The program fragments could
be invoked at various times, and cause arbitrary computations to occur. KL-ONE had its own elaborate procedure attachment and invocation framework. However,
arbitrary access to LISP meant that KR systems with this feature ceded control completely to the user: an attached procedure could alter any data structure in any way at any time. The semantics of KL-ONE networks and other frame systems thus became very hazy once attached procedures were utilized.

⁶ The symbols ⊑ and ≐ indicate a primitive concept specification and a defined concept specification, respectively. The KL-ONE community has developed an algebraic notation that includes operators like these for all constructs in CLASSIC and related languages.

MESSAGE ⊑ (AND (AT-LEAST 1 sender)
               (ALL sender PERSON)
               (AT-LEAST 1 recipient)
               (ALL recipient PERSON)
               (AT-LEAST 1 body)
               (AT-MOST 1 body)
               (ALL body TEXT))
PRIVATE-MESSAGE ⊑ (AND MESSAGE
                       (AT-MOST 1 recipient))

Figure 3: CLASSIC Expressions and KL-ONE Diagrams (adapted from [11]).
In CLASSIC, we have invented an important way to
control the use of such "escape hatches." Through the
notion of the TEST-C and TEST-H constructors, we
have isolated the use of procedures in the host language
to testing predicates. As one can see from the grammar, such concepts are treated syntactically uniformly
with other concepts. The procedure simply provides a
primitive sufficiency condition for the concept-it will
be invoked only when trying to recognize an instance.
These test functions are particularly useful when trying to relate individuals from the host language, such as
when two roles are filled with numbers, and one should
be a multiple of another. In their use, the user agrees to
avoid side-effects and to use only monotonic procedures
(i.e., those whose value never changes from true to false
or vice versa in the presence of purely monotonic updates). While under arbitrary circumstances, resorting
to program code for tests renders the semantics of the
language useless, in CLASSIC, if the user abides by this
"contract," the semantics of concepts with tests is manageable, and the inferences that the system draws are
still guaranteed to be sound. Indeed, tests work just like
other restrictions on concepts as far as classification of
individuals goes, but since the procedures are inscrutable
they have the flavor of primitive concepts. While primitive concepts allow primitive necessary conditions, tests
give us primitive sufficient conditions.
Another innovation in CLASSIC is the requirement that
the test functions must be 3-valued. If a system like
CLASSIC says that an individual does not satisfy a concept, then that means only that it cannot be currently
proven to do so. A complementary question can still
be asked-whether it can be proven that the individual
could never satisfy the description (i.e., that it is disjoint
from the concept). For example, if Fred has exactly one
child (i.e., (AND (AT-LEAST 1 child) (AT-MOST
1 child))), but nothing is known about it yet, then he
cannot be proven to satisfy the description (ALL child
FEMALE) . But it is possible that at a later time he could
be, if he were stated to have a known female child. On
the other hand, if it were asserted that his child was
Barney, who was known to be a MALE, and MALE and
FEMALE were disjoint concepts, then it would be provable
that Fred could never satisfy the description. Thus, in
order to fit into the classification framework, procedural
tests must provide the same facility-to differentiate between a guarantee never to satisfy a description and lack
of ability to prove it given the current knowledge base.
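The three answers a test must distinguish can be pictured with a small logic-programming sketch of the Fred example above (hypothetical predicate names throughout; this is not CLASSIC's actual machinery, which lives in Common Lisp):

```prolog
% all_filled(Ind, Role, Concept, Answer): evaluate (ALL Role Concept) on Ind,
% answering yes, no, or unknown under the open-world treatment of fillers.

% Facts for the scenario where Fred's one child is known to be Barney.
filler(fred, child, barney).
closed(fred, child).                 % all fillers of child are known
member_of(barney, male).
disjoint(male, female).

all_filled(Ind, Role, C, no) :-                 % provably never satisfiable:
    filler(Ind, Role, F),                       % some known filler belongs to
    member_of(F, D), disjoint(D, C), !.         % a concept disjoint from C
all_filled(Ind, Role, C, yes) :-                % provable: role closed and all
    closed(Ind, Role),                          % known fillers are instances
    forall(filler(Ind, Role, F), member_of(F, C)), !.
all_filled(_, _, _, unknown).                   % otherwise undecided so far
```

With the facts above, ?- all_filled(fred, child, female, A). gives A = no; with the filler and closed facts removed (the child not yet known), the same query gives A = unknown, which is exactly the distinction CLASSIC's three-valued tests must preserve.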
4.4 Definitions, Assertions, Individuals
As mentioned, KL-ONE ultimately distinguished itself
from other frame languages by its emphasis on structured descriptions and their relationships, rather than on
contingent and typical facts. At one point in its development, the system was in a strange state: there were facilities for building complex concepts, but none for actually
using them to describe individual objects in the domain.
"Individual concepts" were KL-ONE's initial attempt to.
distinguish between generic class descriptions and descriptions that could apply only to single individuals. As
it turned out, these were typically misused: an individual
concept with two parent concepts could only really mean
a conjunctive description. One example that was used often was the conjunction of DRIVING-IN-MASSACHUSETTS
and HAZARDOUS-ACTIVITY, intended to express the fact
that driving in Massachusetts is hazardous. However, in
truth the concept including them both was just a compound concept with no assertional force at all.
While KL-ONE initially correctly distinguished between
the import of different links between concepts, it failed to
distinguish between those and a link that would make a
contingent assertion about some individual. Eventually
an alternative mechanism was proposed-the "nexus,"
to stand for an individual-but this was never really
used. In the end, it took the work on KRYPTON to
get this right. In KRYPTON, it was proposed that terminological knowledge (knowledge about the structure
of descriptions) and assertional knowledge (facts) are
two complementary aspects of knowledge representation
competence, and that they should be maintained by distinct components, with an appropriate logical connection
between them. From this distinction arose the terms
"TBox" and "ABox," which are used extensively in the
KL-ONE community to refer to the two components.
But KRYPTON went too far in another direction, integrating an entire first-order logic theorem-prover as its
assertional component. The CLASSIC system makes what
we think is a better compromise: it has a limited object-centered logic that properly relates descriptions and individuals. As is apparent from the grammar, CLASSIC treats assertions about individuals in a parallel and uniform manner with its treatment of the formation of subconcepts; but it also carefully distinguishes the logical meaning of the different relationships. Thus, for example, while individuals can be used in concept value restrictions (i.e., in a ONE-OF expression, e.g., (ALL wine-color (ONE-OF Red White Blush))), no contingent property of an individual can be used in determining subsumption between two concepts (e.g., if White
just happens to be my favorite color for a wine, that fact
cannot be used in any subsumption inference).
As mentioned, CLASSIC also supports the propagation of information between individuals. If we assert
that some individual is described by a complex description (e.g., that Rebecca is a PERSON whose mother is
a DOCTOR), then that may imply some new properties
about other related individuals (e.g., we should assert
that Rebecca's mother, if known, is a DOCTOR). Such
propagated properties can in turn cause other properties
to propagate (e.g., that Rebecca's mother's office is
a DOCTOR'S-OFFICE).⁷ This type of inference was never
handled in KL-ONE, and only partially handled in some
of its successors. Note that as soon as a property propagates from one individual to another, the latter individual might now fall under some new descriptions. CLASSIC takes care of this re-classification inference as well
(as well as any further propagations that result, etc.).
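A schematic way to picture this propagation-and-reclassification cycle is the following logic-programming sketch; the individual names eve and room12 and all predicate names are hypothetical, and the real CLASSIC system performs this propagation forward and incrementally, in Common Lisp, rather than by backward deduction.

```prolog
% Known role fillers for the example individuals.
filler(rebecca, mother, eve).
filler(eve, office, room12).

asserted(rebecca, person).

% Asserting "Rebecca is a PERSON whose mother is a DOCTOR" propagates the
% DOCTOR description to the known mother, and from there to her office.
derived(M, doctor)         :- filler(rebecca, mother, M).
derived(O, doctors_office) :- derived(M, doctor), filler(M, office, O).

described_by(I, C) :- asserted(I, C).
described_by(I, C) :- derived(I, C).
```

The query ?- described_by(room12, doctors_office). succeeds only because the intermediate fact about the mother was propagated first; any concepts whose definitions are now satisfied by the fillers would then be picked up by re-classification.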
The CLASSIC system has two other features along these
lines that distinguish it from its predecessors. First,
the previously mentioned apparatus does not allow the
expression of general contingent rules about individuals. Thus, given only what is in the CLASSIC concept
grammar, while we could form the concept of, for example, a LATE-HARVEST-WINE, we could not assert that all
LATE-HARVEST-WINEs are SWEET-WINEs. The sweetness
is a derivative property-it is not part of the meaning
of LATE-HARVEST-WINE, but rather a simple contingent
property of such wines. In CLASSIC, one can also express
general rules of a simple form. A rule has a named concept as the left-hand side, with an applicability condition (filter) that limits the rule's firing to the desired subcases (i.e., if x is a <concepta> with property <filter>, then x is a <conceptb>). These rules are used only in reasoning about individuals, and do not affect subsumption relationships.⁸
Most KL-ONE-like systems were unclear about the status of individuals that could easily be expressed in the
host implementation language (i.e., numbers and strings
in LISP). CLASSIC integrates such individuals in a simple and uniform way, and makes it virtually transparent
whether an individual is implemented directly in the host
language, or in the normal complex structure for CLASSIC
individuals. This aspect of CLASSIC has proven critical in
applications that deal with real data (for example, from
a database), as in [29].
4.5 KR and Computational Complexity
Once it was apparent that the clearly defined logical relationship of subsumption was central to the KL-ONE
family, a new factor could be introduced to the analysis of frame-based knowledge representation systems. In
1984, Brachman and Levesque gave a formal analysis of
the complexity of computing subsumption in some frame
languages [10]. That analysis showed that the apparent simplicity of some frame languages could be deceptive, and that the crucial subsumption inference was co-NP-hard. The original paper initiated a sequence of results on the complexity of computing in the KL-ONE family, culminating most recently in two that show that the original language is in fact undecidable [24, 28].

⁷ In order to keep the complexity down, CLASSIC only propagates properties to known individuals. Thus, if Rebecca's mother were unknown, the system would not attempt to create an individual about which to assert the DOCTOR description. If it did, it would then have to do very complex reasoning about existentials.
⁸ Some of the newer KL-ONE derivatives, such as LOOM, have developed similar rule mechanisms.
This line of analysis has caused some major rethinking
of the knowledge representation enterprise. No longer
can we view language features as simply providing more
expressiveness (which was the common view in the early
years of knowledge representation). Rather, as in other
areas of computer science, we must consider how expensive it will be to add a feature to a language. The addition of new features may demand the excision of some
others in order to maintain computational manageability, or the system must be clear on where it is incomplete.
In CLASSIC, subsumption is complete and tractable, but
with respect to a slightly non-standard semantics; that
is, it is clear what CLASSIC computes, and how fast it can
compute it, but it does not compute all the standard logical consequences of a knowledge base. In this regard, we
have opted for a less conservative approach than in KANDOR, but a more limited and disciplined approach than
in LOOM. The consequences of this are explored briefly
in the next section. We should point out that the viability of our approach has been proven in practice: CLASSIC
is the first KL-ONE-derived system to be deployed in a
fielded (AT&T proprietary) product, used every day in
critical business operations. It was expressive enough to
do the job.
4.6 The Role of a KR System
The above developments in the KL-ONE saga give rise to
an important general question that usually goes unasked
in AI: what role is a knowledge representation system
expected to play? There are clearly different approaches
here. On one extreme we have the large commercial systems, or expert system shells, which include substantial
knowledge representation apparatus. The philosophy of
those systems seems to be that a KR system should provide whatever apparatus is necessary to support virtually any AI application. In that regard, such systems are
like very powerful programming languages, with complex
data-structuring facilities.
But this is definitely not the only approach, and in
many respects its requirements are overly demanding.
Given the kind of complexity results mentioned above,
users of such powerful systems must be very careful in
"programming" their KR tools: predicting when a computation will return is difficult or impossible in a very
expressive logic.
In many contexts (but not all, of course), it may be appropriate for a knowledge representation system to act
in a more constrained fashion, rather like the database
component of an application system. This is the point
of view explicitly espoused in CLASSIC. Users cannot expect to program arbitrary computations in CLASSIC, but
in return they get predictable response time and clear
semantics. The burden of programming an application,
such as a medical diagnostician, must be placed on some
other component of the overall system. Since most KR
systems attempt to be application-independent, it is appropriate for them not to be asked to provide general diagnostic, planning, or natural language-specific support.
What is gained in return for certain limitations (and this
in part accounts for the appeal of databases) is a system
that is both complete with respect to an intuitive and
simple semantic model and efficient to use.
Failure to acknowledge this general issue has been a
source of difficulty with knowledge representation systems in AI. KL-ONE, uniformly with its contemporary
KR systems (and subsequently NIKL), never really took
a stand as to the role it should play. This has resulted,
for example, in a pair of recent critiques of NIKL [15, 30],
for failing to live up to a promise it perhaps was never
intended to make. With CLASSIC, on the other hand,
we expect to provide a powerful database service, but
with limited deductive and programming support. This
is a unique kind of database service, as it is both deductive and object-oriented (see [5]). But nevertheless
it is firmly limited. To use the CLASSIC system in the
context of an expert system, for example, it would be
appropriate to use it as a substitute for working memory
in a rule-based programming system like OPS5, not for
all computation to be done by the overall system. Several recent applications ([14], [29], [23], and others) have
shown convincingly that this approach, while not satisfying all needs for all applications, is quite successful in
important cases.
5 Perspective
While CLASSIC is a "KL-ONE-like" system, it differs in
so many ways from the original that it must be treated
in its own right. While KL-ONE began the thinking on
numerous key issues, it has taken us until CLASSIC to
begin to truly understand many of them. Among its
virtues, the CLASSIC Knowledge Representation System
• isolates an important set of language constructs, distilled from many years of use of frame representations,
and knits them together in an elegant, straightforward
language with a compositional interpretation; novel
language features include enumerated sets of individuals treated in a uniform manner with other concepts
(ONE-OF), and limited generic equalities between role
fillers (SAME-AS);
• treats individuals in a more complete way than its
predecessors, supporting propagation of facts and reclassification of individuals;
• allows contingent universal rules that are automatically applied, with the affected individuals being reclassified and any derived facts being propagated;
• offers tight, uniform integration of individuals from
the host language, including numeric range concepts
(MAX, MIN);
• offers a facility for writing procedural 3-valued tests
as primitive sufficiency conditions, and integrates such
tests into the language and semantics in a clean way. 9
⁹ CLASSIC also allows retraction of any asserted fact, with
full dependency maintenance, but we have not had room to
discuss this here.
CLASSIC offers these facilities in the context of complete
computation of subsumption, while remaining computationally tractable. The CLASSIC system can be thought
of as a limited, deductive, object-oriented database management system as well as a knowledge representation
system, and has been used to support several real-world
applications.¹⁰
In this discussion, we have limited ourselves to considering the KL-ONE family and its contributions. Related
work involving manipulation of types and their relations
can be found in programming language research, in some
semantic data modeling work, and in feature logics in
support of (among other things) natural language processing. We do not have room to draw comparisons with
this other work, but in general it is clear that the bulk of
that work does not include classification and description-processing of the sort found so prevalently in KL-ONE-like
systems. Recent work in some of these areas does bear
a strong relationship to ours, but not by accident: work
on KL-ONE and its descendants has had direct influence,
for example, on LOGIN [1] (a programming language), CANDIDE [2] (a DBMS), and feature logics [21].
There are still, of course, many open questions yet
to challenge CLASSIC and its relatives. Technically, the
notion of a "structural description," introduced by KLONE, has still not been treated adequately (although the
SAME-AS construct provides a limited form of relationship between roles). And there are important computational questions to be answered so that CLASSIC can
handle significant-sized databases, involving persistence
of KB's, automatic loading of data from conventional
DBMS's, and complex query processing.
But perhaps chief among the remaining research questions is how exactly to cope with the tradeoff we are
forced to make between expressive power and computational tractability. Is it even possible to provide the
kind of knowledge representation and inference services
demanded by AI applications in a computationally manageable way? The CLASSIC Knowledge Representation
System has provided convincing evidence that this is possible at least for a limited set of applications, but it is
but one point in a large space of possibilities that we
are still mapping out, after more than a dozen years of
research inspired by KL-ONE.
References
[1] Aït-Kaci, H., and Nasr, R. "LOGIN: A Logic Programming Language with Built-in Inheritance," Journal of
Logic Programming, 3:187-215, 1986.
[2] Beck, H. W., Gala, S. K., and Navathe, S. B. "Classification as a Query Processing Technique in the CANDIDE Data Model," Proc. Fifth Intl. Conf. on Data Engineering, Los Angeles, 1989, pp. 572-581.
[3] Bobrow, D. G., and Winograd, T. A., "KRL, A
Knowledge Representation Language," Cognitive Science 1(1), 1977, pp. 3-46.
¹⁰ One testimony to the success of CLASSIC's clean and simple approach is the fact that a group from the University of Calgary has simply picked up a written description of the system and quickly implemented their own version as a C++ library to support their work in knowledge acquisition [16].
[4] Borgida, A., and Patel-Schneider, P. F., "A Semantics and Complete Algorithm for Subsumption in the
CLASSIC Description Logic," unpublished manuscript,
AT&T Bell Laboratories, Murray Hill, NJ, 1992. Submitted for publication.
[5] Borgida, A., Brachman, R. J., McGuinness, D. L., and
Resnick, L. A., "CLASSIC: A Structural Data Model
for Objects," Proc. 1989 ACM SIGMOD Intl. Conf. on
Management of Data, Portland, Oregon, June, 1989,
pp. 59-67.
[6] Brachman, R. J., "What's in a Concept: Structural
Foundations for Semantic Networks," Intl. Journal of
Man-Machine Studies, 9(2), 1977, pp. 127-152.
[7] Brachman, R. J., "A Structural Paradigm for Representing Knowledge," Ph.D. Thesis, Harvard University,
Division of Engineering and Applied Physics, 1977. Revised as BBN Report No. 3605, Bolt Beranek and Newman, Inc., Cambridge, MA, May, 1978.
[8] Brachman, R. J., "On the Epistemological Status of
Semantic Networks." In Associative Networks: Representation and Use of Knowledge by Computers. N. V.
Findler (ed.). New York: Academic Press, 1979, pp. 3-50.
[9] Brachman, R. J., " 'I Lied about the Trees,' or, Defaults and Definitions in Knowledge Representation,"
AI Magazine, Vol. 6, No.3, Fall, 1985.
[10] Brachman, R. J., and Levesque, H. J., "The Tractability of Subsumption in Frame-Based Description Languages," Proc. AAAI-84, Austin, TX, August, 1984, pp. 34-37.
[11] Brachman, R. J., and Schmolze, J. G., "An Overview
of the KL-ONE Knowledge Representation System,"
Cognitive Science, 9(2), April-June, 1985, pp. 171-216.
[12] Brachman, R. J., Fikes, R. E., and Levesque, H. J.,
"Krypton: A Functional Approach to Knowledge Representation," IEEE Computer, Vol. 16, No. 10, October, 1983, pp. 67-73.
[13] Brachman, R. J., McGuinness, D. L., Patel-Schneider,
P. F., Resnick, L. Alperin, and Borgida, A. "Living
with CLASSIC: How and When to Use a KL-ONElike Language." In Principles of Semantic Networks. J.
Sowa (ed.). San Mateo, CA: Morgan Kaufmann, 1991,
pp. 401-456.
[14] Devanbu, P., Brachman, R. J., Selfridge, P. G. and Ballard, B. W., "LaSSIE: A Knowledge-Based Software Information System," CACM, Vol. 34, No.5, May, 1991,
pp. 34-49.
[15] Doyle, J., and Patil, R. S., "Two Theses of Knowledge Representation: Language Restrictions, Taxonomic Classification, and the Utility of Representation
Services," Artificial Intelligence, Vol. 48, No.3, April,
1991, pp. 261-297.
[16] Gaines, B. R., "Empirical Investigation of Knowledge
Representation Servers: Design Issues and Applications Experience with KRS," SIGART Bulletin, Vol. 2,
No.3, pp. 45-56.
[17] Kaczmarek, T. S., Bates, R., and Robins, G., "Recent
Developments in NIKL," Proc. AAAI-86, Philadelphia,
PA, 1986, pp. 978-985.
[18] Levesque, H. J., and Brachman, R. J., "Expressiveness and Tractability in Knowledge Representation and
Reasoning," Computational Intelligence, Vol. 3, No.2,
Spring, 1987, pp. 78-93.
[19] MacGregor, R. M., "A Deductive Pattern Matcher,"
Proc. AAAI-87, St. Paul, MN, pp. 403-408.
[20] Minsky, M., "A Framework for Representing Knowledge." In The Psychology of Computer Vision. P. H.
Winston (ed.). New York: McGraw-Hill Book Company, 1975, pp. 211-277.
[21] Nebel, B., and Smolka, G., "Attributive Description
Formalisms ... and the Rest of the World." In Text Understanding in LILOG. O. Herzog and C.-R. Rollinger
(eds.). Berlin: Springer-Verlag, 1991, pp. 439-452.
[22] Neches, R., Swartout, W. R., and Moore, J., "Enhanced Maintenance and Explanation of Expert Systems Through Explicit Models of Their Development,"
Proc. IEEE Workshop on Principles of Knowledge-Based Systems, Denver, CO, 1984, pp. 173-183.
[23] Nonnenmann, U., and Eddy, J. K., "Knowledge-Based
Functional Testing for Large Software Systems," Proc.
FGCS-92, Intl. Conf. on Fifth Generation Computer
Systems, Tokyo, June, 1992.
[24] Patel-Schneider, P. F., "Undecidability of Subsumption
in NIKL," Artificial Intelligence, Vol. 39, No.2, June,
1989, pp. 263-272.
[25] Patel-Schneider, P. F., "A Four-Valued Semantics for
Terminological Logics," Artificial Intelligence, Vol. 38,
No. 3, April, 1989, pp. 319-351.
[26] Patel-Schneider, P. F., "Small can be Beautiful in
Knowledge Representation," Proc. IEEE Workshop on
Principles of Knowledge-Based Systems, Denver, CO,
December, 1984, pp. 11-16.
[27] Patel-Schneider, P. F., Brachman, R. J., and Levesque,
H. J., "ARGON: Knowledge Representation meets Information Retrieval," Proc. First Con! on Artificial Intelligence Applications, Denver, CO, December, 1984,
pp. 280-286.
[28] Schmidt-Schauss, M., "Subsumption in KL-ONE is Undecidable," Proc. KR '89: The First Intl. Conf. on
Principles of Knowledge Representation and Reasoning, Toronto, May, 1989, pp. 421-431.
[29] Selfridge, P. G., "Knowledge Representation Support
for a Software Information System," Proc. Seventh
IEEE Con! on Artificial Intelligence Applications, Miami Beach, FL, February, 1991, pp. 134-140.
[30] Smoliar, S. W., and Swartout, W., "A Report from
the Frontiers of Knowledge Representation," Technical
Report, USC Information Sciences Institute, Marina
del Rey, CA, 1988.
[31] Vilain, M., "The Restricted Language Architecture of
a Hybrid Representation System," Proc. IJCAI-85, Los
Angeles, 1985, pp. 547-551.
[32] von Luck, K., Nebel, B., Peltason, C., and Schmiedel,
A., "The Anatomy of the BACK System," KIT Report
41, Technical University of Berlin, January, 1987.
[33] Woods, W. A., "What's in a Link: Foundations for Semantic Networks." In Representation and Understanding: Studies in Cognitive Science. D. G. Bobrow and
A. M. Collins (eds.). New York: Academic Press, 1975,
pp. 35-82.
[34] Woods, W. A., and Schmolze, J. G., "The KL-ONE
Family," to appear in Computers and Mathematics with
Applications, Special Issue on Semantic Networks in
Artificial Intelligence.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
Morphe: A Constraint-Based Object-Oriented Language
Supporting Situated Knowledge
Shigeru Watari, Yasuaki Honda, and Mario Tokoro*
e-mail: {watari.honda.mario}@csl.sony.co.jp
Sony Computer Science Laboratory Inc.
3-14-13 Higashi-Gotanda, Shinagawa-ku, Tokyo 141, Japan
Abstract
This article introduces Morphe, a programming language
aimed to support construction of open systems. In open
systems, the programmer cannot completely anticipate
the future use of his programs as components of new
environments. When independently developed systems
are integrated into an open system, we eventually have
inconsistent representations of the same object. This is
because knowledge about the world is partial and relative
to a perspective. We show how Morphe treats relative
(and eventually inconsistent) knowledge by incorporating the notions of situations and perspectives.
1 Introduction
In modeling complex systems, one is often required to
work with multiple representations of some aspects of
reality. The notion of situation has been studied in computer science [Barwise 83][Barwise 89][Cooper 90] as an
important concept in capturing the relative representation of knowledge about the world. The importance of
such a notion stems from the epistemological assumption that any representation of the world is partial and
relative to some perspective-that of the observer. In
the cognitive process, the observer abstracts from reality
only those aspects that he finds relevant; irrelevant portions are discarded. Sometimes this limited, abstracted
representation is sufficient to allow one to perform certain tasks. In such cases we do not need to think about
relative perspectives, and we can work as though our
knowledge were an absolute and unique mapping of the
real world. However, there are plenty of examples that
show this is not true. In order to understand what is
happening in the target world, we are forced to assume
* Also with Keio University. 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223 JAPAN. e-mail: mario@keio.ac.jp
that the representation we are working with is relative,
and furthermore, that we must eventually change perspectives in order to capture the real properties of the
system we are representing. This is often the case when
we have ambiguous representations and we are not able
to resolve this ambiguity until we have some further information at hand.
Typically ambiguity arises when we try to combine
information from different sources. For example, in dialogue understanding the knowledge of the one must
be combined with the knowledge of the other to capture the exact meaning of an utterance [Numaoka 90].
Whenever there is some inconsistent information, the
speakers must exchange further information in order
to resolve the inconsistency. Other examples can be
seen in multi-agent systems [Bond 88][Osawa 91]-where
we have different agents with different knowledge bases
that must be partially shared-and versioning systems
as used in software development tools and engineering
databases [Katz 90]-where we have different versions of
the same object. A ground for extensive use of the notion of situation is in open systems [Hewitt 84], because
in open systems the designer of a program cannot know
a priori the nature of the environments in which their
pieces of knowledge (called objects henceforth) will be
used in the future. Along with its continuous evolution,
an open system must be capable of integrating pieces of
knowledge from different sources, and eventually these
new pieces will conflict with existing ones.
In this paper we formalize the notion of situation as
embedded in Morphe, a knowledge base and programming system which supports construction of open systems. Situation in Morphe is associated with a general
notion of environment of interpretation. It represents a
consistent set of properties (described by formulas) in a
multi-version knowledge base. Rather than being a mere
name for a part of the absolute real world, a situation
has its own representation in Morphe, namely a rooted,
directed, acyclic, and colored graph.
The notion of situation provides for two novel concepts: compositional adaptation and situated polymorphic objects. With compositional adaptation component
objects are grouped within composite objects so that a
component object is made to adapt to the requirements
of the environment represented by the composite object.
Situated polymorphic objects are objects that have multiple representations which depend on the situation they
are used in. Situation is used to disambiguate the ambivalent interpretation of situated polymorphic objects.
The remainder of this paper is organized as follows:
Section 2 gives an overview of Morphe's features through
some examples. Morphe's formal syntax and semantics
are sketched in Section 3 and Section 4, respectively.
In this work we concentrate on the data modeling aspect of Morphe. Some important features (such as set-valued attributes, distinction between local and sharable
attributes, user-defined constraints, and dynamic generation of new situations at update transactions) were not
treated in the presentation for the sake of brevity and
clarity. In Section 4 we give emphasis to showing how the domain of colored dags fits the representation of different perspectives on a shared object. In the last section
we conclude this work.
2 Overview of Morphe

Morphe is a programming language which integrates object-oriented programming, constraint-based logic programming, and situated programming. It features:
• Querying capability for knowledge bases,
• Incremental construction of systems with inheritance and adaptive reuse of existing software,
• Multiple representations,
• Treatment of inconsistent knowledge through the notion of situation.
The basic aim of Morphe is to provide a system that supports easy construction of open information systems. There are two areas of support that are essential:
1. Easy integration of new pieces of knowledge, and
2. Treatment of shared inconsistent knowledge.
The Morphe system is a multi-version knowledge base with multi-versioned objects. We use the term multi-version knowledge base following the notion of multi-version databases as introduced by Cellary and Jomier in [Cellary 90]. Our approach differs from Cellary-Jomier's in that in Morphe even in a single knowledge base version we can have different object versions. The programmer can choose a particular version of the knowledge base through situation descriptors (formulas that index terms) which can be used within programs or in queries. In the development phase of a system, Morphe keeps track of transaction updates and creates consistent versions of the knowledge base. 1

2.1 Example: Mario Joins Sony CSL

We will represent Sony CSL, a computer science laboratory, where Mario works as a director. We know that a representation of Mario already exists in the system and we want to share that representation. The existing representation is of Mario as a professor at a university.

1. person : [
       name : string;
       age : integer;
       sex : {male, female};
       age ≥ 0];
2. laboratory : [
       name : string;
       director : person;
       researcher :: person];
3. mario : person * [
       name : "Mario";
       age : 44];
4. scsl : laboratory * [
       name : "Sony CSL";
       director : person * [
           machine : "NEWS"]];
5. scsl.director = mario.
The first two expressions define the types for person and laboratory, and expressions 3 and 4 define mario and scsl as "instances" of person and laboratory, respectively. Expression 5 makes mario join scsl as its director.
Objects in Morphe are typed. For example, the expressions name: string and age: integer specify that
the name of a person has type string and the age of
a person has type integer. String and integer are
primitive types provided in Morphe. The colon in those
expressions represents a built-in predicate that specifies
the type of the term on its left-hand side. Another built-in predicate is the one represented by the equal sign,
as in director = mario, which specifies that director
1 The operational aspects of manipulating situations are not emphasized in this work. Instead we will emphasize the declarative (or modeling) aspects of objects and situations.
and mario should have the same type. Expressions comprising these built-in predicates are called formulas or
constraints. 2
We can also construct complex types from primitive
ones through object descriptors. An object descriptor
is a set of formulas enclosed in brackets ("[]"). In the
example, the expression person : [...] introduces a new
type named person defined by the object descriptor on
the right hand side of the colon.
As in unification grammar formalisms [Shieber 86]
and some logic based programming languages [Kifer 89]
[Yokota 92], Morphe does not make a distinction between
classes and instances. Strictly speaking, every expression in Morphe is a type expression, and the execution
of a Morphe program consists of finding the appropriate
types for the variables, or in other words, solving the
set of type constraints. Morphe provides domain specific constraint solvers and allows users to define predicates for new domains, as the predicate ;::: in the expression age ;::: o. In this article we concentrate on showing
how Morphe treats the notions of situations and polymorphic objects, leaving the discussion of other forms of constraints for another paper. Expressions using the colon
predicate resemble attribute-value pairs of feature structure grammars and hence we sometimes refer, though
improperly, to terms on the left-hand side of the colon
operator as attributes and those on the right-hand side
as values.
Besides object descriptors, there is another type of
constructor: braces ("{}"). While object descriptors
construct types intensionally, from formulas, braces construct types extensionally, from terms. For example,
the expression sex: {male, female} specifies that the
attribute sex of a person has type male or female.
Stated in another way, the same expression defines a
new type person. sex as a set of two constant types
{male, female}.
A type can be made more and more specific as we
add more restrictive constraints (formulas) into the associated object descriptor, and it becomes an "instance"
when all the attributes are assigned constant types. In
the code above, scsI is an instance of laboratory because the formulas in the object descriptor of the former are more restrictive than those in the object descriptor of the latter. Because all terms are types, even
scsI, which is an "instance", can be made more specific by adding more formulas into its object descriptor. The way to do so is by composing object descrip2The term "constraint" used here follows the terminology of
constraint logic programming framework as formalized by Jaffar
and Lassez in [Jaffar 87J.
tors through C C *' ,, the composition operator. The code
which defines mario composes the type person with the
object descriptor [name: mario j age: 44]. The resulting object descriptor contains all the formulas of both
operand types. The constraint solver then evaluates the
most specific set of formulas in the resulting object descript or , yielding [name: "Mario" j age: 44] as the type
of mario. Determining the most specific sets of formulas is the same as determining the greatest lower bound
of a set of terms. The associated procedure for determining the greatest lower bound is called unification,
following the terminology of feature-structure grammar
formalisms [Shieber 86].
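As a concrete illustration of this unification step, the following is a minimal Prolog sketch (not part of Morphe; all predicate names are hypothetical) of composing two object descriptors, each represented as a list of Attribute:Value pairs, where the most specific value is kept for attributes that occur in both operands. The ordering on atoms is supplied here as a few leq_a/2 facts standing in for Morphe's built-in ordering.

leq_a(X, X).
leq_a(44, integer).
leq_a('Mario', string).

% glb_atom(+X, +Y, -G): greatest lower bound of two atoms, if it exists.
glb_atom(X, Y, X) :- leq_a(X, Y), !.
glb_atom(X, Y, Y) :- leq_a(Y, X).

% compose(+D1, +D2, -D): union of the formulas of both descriptors,
% unifying the values of attributes shared by D1 and D2.
compose([], D2, D2).
compose([A:V1|R1], D2, [A:V|R]) :-
    select(A:V2, D2, Rest2), !,
    glb_atom(V1, V2, V),
    compose(R1, Rest2, R).
compose([F|R1], D2, [F|R]) :-
    compose(R1, D2, R).

% ?- compose([name:string, age:integer], [name:'Mario', age:44], D).
% D = [name:'Mario', age:44]

This is only a rough analogue of the person * [name : "Mario"; age : 44] composition discussed above; it ignores nested descriptors, constraints such as age ≥ 0, and the sharing semantics of "=".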
2.2 Compositional Adaptation
With composition we can refine a type by giving more
specific "values" for the attributes-as in mario aboveor we can add new properties to an existing type. The
type laboratory. director in the example is defined as
a person plus an additional attribute: machine. Morphe
allows for creating new types in a very particular way.
The type director is defined in a specific context: scsI.
This is an essential aspect of what we call compositional
adaptation[Honda 92].
With compositional adaptation we make an object
"adapt" to a new environment by transforming the object so that it obeys the type constraints specified in the
environment. This process takes place when the predicate "=" is evaluated. When the expression director = mario is evaluated, it either succeeds or fails. If it succeeds, the object denoted by scsl.director is unified with the object denoted by mario, and the result of the unification can be accessed from both scsl.director and mario. 3 The object enters a new environment "acquiring" new properties and constraints. In the example, mario acquires the additional attribute machine as specified in the environment scsl, and scsl.director
acquires all the original properties of mario.
2.3 Situated Polymorphic Objects
In programming languages, the term polymorphism has
been traditionally associated with the capability of giving different things the same name. Morphe's notion of
polymorphism follows in the same vein. In Morphe the
3The full version of Morphe allows programmers to specify
which components of the type are private (i.e., local) and which
are public (i.e., sharable). The public part of two objects must be
compatible for the unification to succeed, while the private part is
not affected in the unification.
same object can have different versions, eventually incompatible with each other. Incompatible versions of an
object are called morphes, and objects that have multiple
morphes are called polymorphic objects.
By incompatibility of morphes we mean incompatibility of their types. 4 Different morphes of the same (polymorphic) object may fundamentally mean two things: 1)
different states due to updates, or 2) different representations due to different perspectives. Each morphe of
a polymorphic object is situated. The evaluation of a
polymorphic object is the evaluation of a morphe, the
selection of which is subordinated to the selection of a
situation where the object participates.
Each morphe is a consistent set of constraints that describe the behavior of the object in a given situation.
For instance, a person may exhibit different and eventually contradictory behavior depending on the situation
in which he acts. Inconsistent sets of constraints yield
different values to be assigned to the same attribute. For
example, suppose that the definition of mario, instead of that given in expression 3, had been: mario : person * [name : "Mario"; birthyear : 1947; sex : male; machine : "Mac"]. After mario joins scsl, the attribute machine of mario is assigned the value "NEWS" when he plays his role as scsl.director and a different value, "Mac", in other situations.
2.4 Specifying a Situation
Morphe's notion of situation is tied to the notion of
environment of interpretation. In the domain of interpretation, a situation is a graph representing the program being interpreted. Situations are used to disambiguate inconsistencies in the knowledge base. When an
object participates in different environments (eventually
created by independent programs) and is subject to independent transformations, it is often the case that the
object must behave differently in each of them. Once
the programmer wants a different view (or representation) for the object, the system creates a new version
of the object in such a way that the situation is kept
consistent.
When evaluating an expression within a situation, the
system keeps track of the path through which the object
containing that expression is being accessed. Access to
an object from different perspectives is realized as different paths to the object. A path is a sequence of labels
that allows one to navigate through the entire system,
4 Informally, incompatible types means that the values of one type cannot be the values of the other. We give a formal definition of type incompatibility in the next section.
along the arcs in the graph. For example, if we want
to refer to Mario when he plays his role of a director
at SCSL we use the path scsl.director. Paths can be combined with formulas which filter the morphes of an object referred to from the same path. For example, if we had several versions of Mario distinguished according to his age, we could access the representation of Mario at Sony CSL when he was at the age of 40 by using the expression: scsl.director@[age = 40]. We can also change the perspective by switching the path in the navigation. For example, we can switch the view from mario to scsl.director with the path mario ↑ scsl.director, which gives us the representation of mario from scsl.director's perspective.
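To make the path-based selection above a little more concrete, here is a small Prolog sketch (illustrative only; not Morphe's implementation, and all names are hypothetical) that resolves a path, given as a list of labels, against an object represented as a nested list of Label:Value pairs.

% lookup(+Path, +Object, -Value): follow a sequence of labels down a
% nested attribute structure.
lookup([], Obj, Obj).
lookup([L|Ls], Obj, Val) :-
    member(L:Sub, Obj),
    lookup(Ls, Sub, Val).

% ?- lookup([director, name], [name:'Sony CSL', director:[name:'Mario']], V).
% V = 'Mario'

Path switching and conditional paths would additionally require carrying the current situation along the traversal, which this sketch does not attempt.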
3 Syntax
The alphabet of Morphe consists of: 1) A: a set of atoms, 2) L: a set of labels, 3) X: an infinite set of variables, 4) the distinguished predicate symbols: ":" (colon) and "=" (equal), 5) the composition operator "*", 6) the logical connective ";", 7) the path constructors: ".", "↑", and "@"; 8) the auxiliary symbols "( )", "[ ]", "{ }", ",", and ".".
Atoms denote primitive indivisible objects. Example atoms are: integer, string, 3, and "Mary". Labels are the names of the objects. The distinguished label
Home denotes the topmost object in a particular situation. 5 In the semantic domain, the label names an arc
which allows access to the objects down the (directed)
graph.
3.1 Terms (τ)

Objects are denoted by terms. Terms are defined by:

    τ ::= x | a | p | [f] | τ * τ

where x are variables, a are atoms, p are paths, f are formulas, and τ * τ are compositions.
The terms of the form [f] are called object descriptors. Object descriptors construct complex objects through formulas, which are defined by:

    f ::= p : τ | τ = τ | f ; f
A colon predicate is a typing constraint. An expression e : t, where e is a path and t is a term, specifies that the type of the object denoted by e has at least the properties defined by t. For example, the formula mario : person specifies that mario has at least the properties specified by person.

5 Typically, the object denoted by Home represents the user's "home object", which is the user's entry-point into the Morphe system.
The equal predicate specifies object sharing. Given e1 : t1 and e2 : t2, where e1 and e2 are paths, the expression e1 = e2 states that e1 and e2 denote the same object, and hence they have equal types. The shared object is "viewed" from different perspectives: any change to the object performed from a perspective must be reflected into other perspectives.
Because the atomic predicates colon (":") and equal
("=") impose a structure on the objects in the domain
of interpretation (i.e., graphs), they are called structural
predicates, in contrast to other domain predicates and
user defined predicates. In this article we discuss only
the structural predicates and hence we call them simply
predicates.
A path names an object through a sequence of labels. Paths are defined by:

    p ::= l | l.p | p↑p | p@[f]

where l are labels. When an object is polymorphic due to different access paths, we select a morphe by the associated path. For example, in the subsystem:

    a : [b : [c : x]; d : [c : y]; a.b = a.d]

the polymorphic value of c can be disambiguated through the appropriate path: a.b.c : x, and a.d.c : y.
A path of the form p1 ↑ p2 is a path switch. It allows one to view the same object from a different perspective. For example, the value of a.b ↑ d.c is y, instead of x.
A path of the form p@[f] is called a conditional path. The formula enclosed in brackets on the right hand side of the @ sign is called a situation descriptor, because it specifies a family of situations which entail f. A conditional path has a meaning only in the situations where the formula enclosed in the brackets is entailed. For notational convenience we write l : {t1@[f1], t2@[f2]} instead of l@[f1] : t1; l@[f2] : t2. Conditional paths are used to select version morphes of polymorphic objects. For example, given

    a : [b : {x, y}; c : {w@[b : x], v@[b : y]}]

where a, b, and c are labels and x, y, w, and v are atoms, there are two possible values of a.c, which depend on the possible values of b. The formulas b : x; c : w and b : y; c : v determine two distinct situations of a. The value of a.c can be disambiguated by providing an appropriate conditional path: a.b.c@[b : x] : w, and a.b.c@[b : y] : v.
Composition is a binary operation τ × τ → τ which composes two terms to produce a new term. Given two terms t1 and t2, their composition t1 * t2 is the union of the formulas contained in both terms. For example, [name : "John"; age : integer] * [age : 23] = [name : "John"; age : integer; age : 23].

3.2 Ordering on Terms

We have seen that terms denote objects in the intended domain, and formulas associate terms in order to represent complex structures in that domain. The colon operator specifies the structure of the object denoted by a given path. We can now amplify its use as a binary predicate over two terms to construct a partial ordering on the set of terms. We start with atoms. We assume that the atoms in A are partially ordered according to a binary relation represented by "≤A". For example: "Mary" ≤A string, and 3 ≤A integer.
If x ≤A y and y ≤A x we say that x and y are congruent, and write x ≅A y. The greatest lower bound of a set of elements B ⊆ A, denoted by ↓B, is defined as usual: ↓B = inf ∈ A such that ∀x ∈ B. inf ≤A x. For notational convenience, we will denote the greatest lower bound of two atoms x and y by x ↓ y. The greatest lower bound does not always exist. The elements c of A such that x : c implies x ≅A c are called the constants of A.
We extend the partial ordering to the set of terms with the binary relation ":", defined by the rules below. In these rules, Γ is a set of formulas which defines a situation.

    Γ ⊢ x : y   (if x, y ∈ A and x ≤A y)

    Γ ⊢ t : []

    Γ, (e : t) ⊢ e : t

    Γ, φ ⊢ e : t
    --------------------
    Γ ⊢ e@[φ] : t

    Γ ⊢ t1 : t1'   ...   Γ ⊢ tn : tn'
    --------------------------------------------------------------
    Γ ⊢ [l1 : t1; ...; ln : tn; ...; lm : tm] : [l1 : t1'; ...; ln : tn']

    Γ ⊢ t : t

    Γ ⊢ t1 : t2   Γ ⊢ t2 : t3
    --------------------------
    Γ ⊢ t1 : t3

The congruence relation on the set of terms is defined by: x ≅ y iff x : y and y : x. The operation ↓ that gives the greatest lower bound of a set of atoms is also extended to terms. The rules below describe ⊔, the greatest lower bound of two terms, defined so that t1 ⊔ t2 : t1 and t1 ⊔ t2 : t2.

    [] ⊔ t ≅ t

    x ⊔ y ≅ x ↓ y

    [l : t] ⊔ [l : t'] ≅ [l : (t ⊔ t')]

    [l1 : t1] ⊔ [l2 : t2] ≅ [l1 : t1; l2 : t2]

    [l1 : t1; ...; ln : tn; k1 : s1; ...; kp : sp] ⊔ [l1 : t1'; ...; ln : tn'; k1' : s1'; ...; kq' : sq']
        ≅ [l1 : t1 ⊔ t1'; ...; ln : tn ⊔ tn'; k1 : s1; ...; kp : sp; k1' : s1'; ...; kq' : sq']

    t1 ⊔ t2 ≅ t2 ⊔ t1

    t1 ⊔ (t2 ⊔ t3) ≅ (t1 ⊔ t2) ⊔ t3

    t ⊔ t ≅ t

Two terms t1 and t2 are incompatible iff t1 ⊔ t2 does not exist.

4 Semantics

The formal semantics of Morphe is based on the algebraic approach to graph grammars as described in [Ehrig 86] and [Ehrig 90]. The domain of interpretation of Morphe is a set of colored, rooted, directed, and acyclic graphs. Following [ParisiPresicce 86], 6 we impose a structure on the coloring alphabet in order to represent unification in that domain.

6 F. Parisi-Presicce, H. Ehrig, and U. Montanari allowed variables in graphs (and productions) so that they could represent composition of graphs using relative unification. A. Corradini, U. Montanari, F. Rossi, H. Ehrig, and M. Lowe [Corradini 90] further extended that work to represent general logic programs with hypergraphs and graph productions.

4.1 Definition: Colored Graphs

Let X be an infinite set of variables, A the set of atoms, L the set of labels (as introduced in Section 3), and O a set of identifiers. Let C = (CN, CA) be a pair of alphabets where CN = O ∪ A ∪ X and CA = L. The partial order in A, ≤A, is extended to CN (and denoted ≤N) such that x ≤N y iff x ≤A y or y ∈ X. A C-colored graph (or C-dag, for short) is a graph g over C defined as a tuple (Ng, Ag, colorN, colorA, srcg, tgtg, rootg) where: Ng is the set of nodes; Ag is the set of arcs; colorN : Ng → CN associates a color with each node; colorA : Ag → CA associates a color with each arc; srcg : Ag → Ng associates with each arc a unique source node; tgtg : Ag → Ng associates with each arc a unique target node; rootg is a distinguished node called the root of the graph. It satisfies: tgt⁻¹(rootg) = ∅.
In what follows we refer to C-dags as graphs. A graph g is a subgraph of g' (written g ⊆g g') iff Ng ⊆ Ng', Ag ⊆ Ag', and the functions colorN, colorA, srcg, and tgtg are the restrictions of the corresponding mappings of g'.

4.2 Definition: Graph Morphism

A graph morphism f : g → g' is a pair of functions fN : Ng → Ng' and fA : Ag → Ag' such that:

1. fN and fA preserve the incidence relations: src(fA(a)) = fN(src(a)) and tgt(fA(a)) = fN(tgt(a)),

2. fA preserves the arc colors: ∀a ∈ Ag. colorA'(fA(a)) = colorA(a), and

3. ∀x ∈ Ng. colorN'(fN(x)) ≤N colorN(x).

A graph morphism indicates the occurrence of a graph within another graph. A graph morphism f = (fN, fA) is called injective if both fN and fA are injective mappings, and it is called surjective if both fN and fA are surjective. If f : g → g' is injective and surjective it is called an isomorphism, and there is also an inverse isomorphism f⁻¹ : g' → g. In this case we say that g and g' are congruent and write g ≅g g'.

4.3 Subsumption

Subsumption is an ordering on graphs which corresponds to the relative specificity of their structures. A graph g subsumes h (h ⊑g g) iff there exists a graph morphism f : g → h such that f(rootg) = rooth.
The semantic counterpart of the greatest lower bound of a set of terms (ref. Section 3.2) is the join of two graphs, which is their "most general unifier". The join of graphs g1 and g2 (notated g1 ⊔g g2) is a graph h such that h ⊑g g1 and h ⊑g g2.

4.4 Semantic Structure

The semantic structure of Morphe is a tuple

    A = < G*, ⊑g, ⊔g, ⊤ >

where:

1. G*, the domain of interpretation, is the set of all variable-free (i.e., ground) C-dags.

2. The relation ⊑g and the operation ⊔g are as defined above.

3. Top (⊤) is the distinguished element of G* defined by: ∀g ∈ G*. g ⊑g ⊤.
4.5 Interpretation
A consistent set of formulas is represented with a C-dag with variables. The C-dag representation of a set
of formulas is called a situation. A Morphe program is
mapped by the interpreter into a set of situations which
are ordered according to the subsumption relation. The
evaluation of a query is a mapping from the C-dag representing the query to the set of situations in the hierarchy.
If no situation is specified, the interpreter evaluates in a
default situation. While parsing its input, the interpreter
keeps track of this situation in order to resolve eventual
ambiguities.
Let Iα : A → CN be a function that maps each atom in A to a node color in CN, and Iλ : L → CA another function that maps each label to an arc color in CA.

Variable Assignment

A variable assignment in a situation s is a mapping μ : X → G* which maps variables to ground C-dags. We extend the variable assignment to other terms with the following clauses:

• If a is an atom, μ(s, a) = g s.t. Ng = {x}, Ag = ∅, and colorN(x) = Iα(a).

• If l is a label, μ(s, l) = g ⊆g s s.t. ∃a ∈ As. colorA(a) = Iλ(l) and src(a) = roots and tgt(a) = rootg.

• If l is a label and e is a path, μ(s, l.e) = μ(μ(s, l), e).

• If e is a path and φ is a formula, μ(s, e@[φ]) = μ(s, e) if s ⊨ φ.

• μ(s, [φ]) = g ⊆g s s.t. g ⊨ φ.

Formulas

The "truthness" of a formula is relative to a specific situation. We say that a situation s models a formula φ under a variable assignment μ (written s ⊨μ φ) iff there is a subgraph of s with the properties specified by the formula.

    s ⊨μ e : t iff μ(s, e) ⊑g μ(s, t).
    s ⊨μ e1 = e2 iff μ(s, e1) ≅g μ(s, e2).
    s ⊨μ φ; ψ iff s ⊨μ φ and s ⊨μ ψ.
5 Conclusion
This paper has shown how the notions of situation
and polymorphic objects in Morphe can handle situated
knowledge in open systems. We claim that the Morphe
features shown here are suited to support incremental
development of a complex system. When a set of constraints is added to a situation, the new formulas may
conflict with the old ones. Morphe helps the developer
to find the locus of inconsistency, and in the cases where
the programmer wants a new version of the system, Morphe splits the inconsistent situation into new subsituations whenever it is possible. Some meta-rules based on
domain-dependent heuristics may help the system to decide on which actions to take in the presence of conflict.
Syntactically, a situation was defined as a set of formulas which define a hierarchy of versions of the knowledge
base. Situation descriptors can be used in programs in
order to specify a priori the family of situations in which
the program is expected to work. Once the system is provided with a way to determine the right situation, the
associated morphe can be selected and then passed to the
constraint solver in order to proceed with the evaluation
of the program or the query.
Most existing typed programming languages impose a
distinction between types and values syntactically, and
types are usually associated with the variables in order
to check whether the value assigned to a variable is compatible with the associated type. Morphe does not impose such a distinction at the syntactic level, though it
bears both the notions of "types" and "values". An equal
treatment of types and values was achieved in Morphe by
imposing a partial order on the set of terms. This partial
ordering was identified as the subsumption relation over
directed acyclic graphs in the domain of interpretation.
In this work we have shown only those features that
we find most interesting to capture the intuitive notion
of relative knowledge, perspective, and situations. Problems concerning changes of situations in the presence of
transaction updates, locality of information and sharing
(i.e., unification), database querying facilities, and the
operational semantics were not treated here. We hope
however that the contents of this article have given the
readers an insight on the problems and solutions concerning relative representations of objects in open systems.
Acknowledgments
Sony Computer Science Laboratory has been a privileged
environment for discussing the problems and requirements of open distributed systems. Discussions with the
other members of this laboratory have provided the underlying motivations for developing Morphe. In particular, we wish to thank Ei-Ichi Osawa for his collaboration
at the initial phase of Morphe, and Akikazu Takeuchi
and Chisato Numaoka for their helpful comments on the
formalisms presented in this work. Watari thanks the
members of Next-Generation Database Working Group
promoted by ICOT. Discussions in the group promoted
a better understanding of the requirements for advanced
data base programming languages.
References
[Barwise 83] Jon Barwise and John Perry. Situations and Attitudes. The MIT Press, 1983.
[Barwise 89] Jon Barwise. The Situation in Logic. Center for the Study of Language and Information, 1989.
[Hewitt 84] Carl Hewitt and Peter de Jong. Open
Systems. In J. Mylopoulos and J. W. Schmidt
M. L. Brodie, editors, On Conceptual Modeling,
Springer-Verlag, 1984.
[Honda 92] Yasuaki Honda, Shigeru Watari, and Mario
Tokoro. Compositional Adaptation: A New Method
for Constructing Software for Open-ended Systems.
JSSST Computer Software, Vol. 9, No. 2, March 1992.
[Jaffar 87] Joxan Jaffar and Jean-Louis Lassez. Constraint Logic Programming. In Proceedings of the
Fourteenth ACM Symposium of the Principles of Programming Languages (POPL'87), January 1987.
[Katz 90] Randy H. Katz. Toward a Unified Framework
for Version Modeling in Engineering Databases. ACM
Computing Surveys, Vol. 22, No. 4, December 1990.
[Bond 88] Alan H. Bond and Les Gasser, editors. Readings in Distributed Artificial Intelligence. Morgan
Kaufmann, 1988.
[Kifer 89] Michael Kifer and Georg Lausen. F-Logic: A
Higher-Order Language for Reasoning about Objects,
Inheritance, and Scheme. In Proceedings of the ACM
SIGMOD Conference on Management of Data, ACM,
1989.
[Cellary 90] Wojciech
Cellary
and
Genevieve
Jomier. Consistency of Versions in Object-Oriented
Databases. In Dennis McLeod, Ron Sacks-Davis, and
Hans Schek, editors, Proceedings of 16th International
Conference on Very Large Databases, August 1990.
[Numaoka 90] Chisato Numaoka and Mario Tokoro.
Conversation Among Situated Agents. In Proceedings
of the Tenth International Workshop on Distributed
Artificial Intelligence, October 1990.
[Cooper 90] Robin Cooper, Kuniaki Mukai, and John
Perry, editors. Situation Theory and its Applications
- Volume 1. Center for the Study of Language and
Information, 1990.
[Corradini 90] Andrea Corradini, Ugo Montanari,
Francesca Rossi, Hartmut Ehrig, and Michael Lowe.
Graph Grammars and Logic Programming. In Proc. of
the 4th International Workshop on Graph-Grammars
and Their Application to Computer Science, Springer-Verlag, March 1990.
[Ehrig 86] Hartmut Ehrig. Tutorial Introduction to the
Algebraic Approach of Graph Grammars. In Proc. of
the 3rd International Workshop on Graph-Grammars and Their Application to Computer Science, Springer-Verlag, December 1986.
[Ehrig 90] Hartmut Ehrig, Martin Korff, and Michael Lowe. Tutorial Introduction to the Algebraic Approach
of Graph Grammars Based on Double and Single
Pushouts. In Proc. of the 4th International Workshop
on Graph-Grammars and Their Application to Computer Science, Springer-Verlag, March 1990.
[Osawa 91] Ei-Ichi Osawa and Mario Tokoro. Collaborative Plan Construction for Multiagent Mutual Planning. Technical Report SCSL-TR-91-008, Sony Computer Science Laboratory, August 1991.
[ParisiPresicce 86] Francesco Parisi-Presicce, Hartmut
Ehrig, and Ugo Montanari. Graph Rewriting with
Unification and Composition. In Proc. of the 3rd International Workshop on Graph-Grammars and Their
Application to Computer Science, Springer-Verlag,
December 1986.
[Shieber 86] Stuart M. Shieber. An Introduction to
Unification-Based Approaches to Grammar. Center for
the Study of Language and Information, 1986.
[Yokota 92] Kazumasa Yokota and Hideki Yasukawa.
Towards an Integrated Knowledge Base Management
System. In Proceedings of the FGCS'92, ICOT, June
1992.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992
On the Evolution of Objects
in a Logic Programming Framework
F. Nihan Kesim
Marek Sergot
Department of Computing, Imperial College
180 Queens Gate, London SW7 2BZ, UK
fnk@doc.ic.ac.uk, mjs@doc.ic.ac.uk
Abstract
The event calculus is a general approach to the representation of time and change in a logic programming
framework. We present here a variant which maintains a historical database of changing objects. We
begin by considering changes to the internal state of
an object, and the creation and deletion of objects.
We then present separately the modifications that are
necessary to support the mutation of objects, that is
to say, allowing objects to change class and internal structure without loss of identity. The aims are
twofold: to present the modified event calculus and
comment on its relative merits compared with the
standard versions; and to raise some general issues
about object-orientation in databases which do not
come to light if dynamic aspects are ignored.
1 Introduction
There has been considerable research on combining logic-based and object-oriented systems, and reasoning with complex objects. Many proposals have
been put forward for incorporating features of object-oriented systems into logic programming and deductive databases [Abiteboul and Grumbach 1988, Zaniolo 1985, Chen and Warren 1989, Kifer and Lausen
1989, Dalal and Gangopadhyay 1989, Maier 1986,
Bancilhon and Khoshafian 1986]. But opinions vary
widely as to what are the characteristic and beneficial features of objects and comparatively little attention has been given to the dynamic aspects of objects. Yet change in internal state of an object as it
evolves over time is often seen as a characteristic feature of object-oriented programming; and the ability
of object-oriented representations to cope gracefully
with change has often been cited as a major advantage of this style of representation. It is these dynamic aspects that we wish to address in this paper.
We are not concerned with object-oriented programming, but with object-oriented representation of data
in (deductive) databases. We address such problems
as how objects change state, how deletion and creation of objects can be described and how an evolving
object can change its class over time.
In order to avoid the discussion of destructive assignment, we formulate change in the context of a
historical database which stores all past states of objects in the database. Historical databases are logically simpler than snapshot databases because change
is then simply addition of new input. A snapshot of
the historical database at any given time is an objectoriented database in the sense that it supports an
object-based data model.
In this paper we present an object-based variant
of the event calculus [Kowalski and Sergot 1986] which
is a general approach to the treatment of time and
change within a logic programming framework. We
use this modified event calculus to describe changes
to objects. The objectives of this paper are twofold:
to present the object-based variant of the event calculus; and to raise some general issues about objectorientation in databases that we believe do not come
to light if dynamic aspects are ignored. These more
general points are touched upon in the course of the
presentation, and identified explicitly in the concluding section.
In the following section we give a brief summary
of the original event calculus. Section 3 presents the
basic data model that is supported by the objectbased variant. In section 4 we present this objectbased variant and discuss how it can be applied to
describe changes in objects. In section 5 we address
the mutation of objects, where objects are allowed to
change their classes during their evolution. We conclude the paper by summarising, and making some
remarks about object-based representations in general.
2 The Event Calculus

The primitives of the event calculus are events together with some kind of temporal ordering on them, periods of time, and properties which are the facts and relationships that change over time. Events initiate and terminate periods of time for which properties hold. The effects of each type of event are described by specifying which properties they initiate and terminate. Given a set of events and the times at which they occurred, the event calculus derives (computes) which facts hold at which times. As an example, consider a fragment of a departmental database. An event of type

promote(X, New)

initiates a period of time for which employee X holds rank New and terminates whatever rank X held at the time of the promotion:

initiates(promote(X, New), rank(X, New)).
terminates(promote(X, New), rank(X, _)).

Given a fragment of data:

happens(promote(jim, assistant), 1986).
happens(promote(jim, lecturer), 1988).
happens(promote(jim, professor), 1991).

the event calculus computes answers to queries such as:

?- holds_at(rank(jim, R), 1990).
R = lecturer

?- holds_for(rank(jim, lecturer), P).
P = 1988-1991

The original presentation of the event calculus showed how a computationally useful formulation can be derived from general axioms about the properties of periods. It gave particular attention to the case where events (changes in the world) are not necessarily reported in the order in which they actually occur. For the purpose of this paper, it is sufficient to consider only the simplest case, where the assimilation of events into a database is assumed to keep step with the occurrence of changes in the world, and where the times of all event occurrences are known. Under these simplifying assumptions, the event calculus can be formulated thus:

holds_at(R, T) ←
    happens(Ev, Ts), Ts ≤ T,
    initiates(Ev, R),
    not broken(R, Ts, T).

broken(R, Ts, T) ←
    happens(Ev*, T*),
    Ts < T* ≤ T,
    terminates(Ev*, R).

We have omitted the clauses for holds_for which are similar. The interpretation of not as negation by failure in the last condition for holds_at gives a form of default persistence: property R is assumed to hold at all times after its initiation by event Ev unless there is information to the contrary.

The event calculus has been developed and extended in various different ways (see for instance [Sripada 1991, Eshghi 1988]). But what is important for present purposes is to stress that the underlying data model in all of these applications is relational. The properties that events initiate and terminate are facts like rank(jim, professor). In database terms they are tuples of relations; in logic programming terms they are ground unit clauses or ground atoms or standard first order terms, depending on what is taken as the semantics of holds_at. A snapshot of the historical database at any given time is a relational database. In this paper we modify the event calculus in order to describe changes to a database which supports an object-oriented data model.

Before moving on to present this modification, we wish to make one further remark about the representation of events. One of the most common motivations for introducing object-oriented extensions to logic programming languages [Chen and Warren 1989, Ait-Kaci and Nasr 1986, M. Kifer and Wu 1990] is to overcome the restrictions imposed by the fixed arity of predicates and functors. These restrictions are particularly evident in the representation of events: Jim was promoted to professor in 1989, Jim was promoted from lecturer, Jim was promoted by his department in 1989 could all be descriptions of the same promotion recording different amounts of information about the event. In general, it is difficult or impossible to devise a fixed arity representation for events, because these representations cannot cope gracefully with the range of descriptions that can be expected even for events of the same type. (The philosopher Kenny refers to this phenomenon as the 'variable polyadicity' of events.) The standard way of dealing with 'variable polyadicity' is to employ binary predicates. Thus [Kowalski and Sergot 1986] represents events in the following style:

event(e1).
act(e1, promote).
object(e1, jim).
newrank(e1, prof).
happens(e1, 1989).

Chen and Warren [Chen and Warren 1989] have developed this usage of binary predicates and have given it a formal basis. Their language C-logic allows the use of structured terms which can be decomposed into subparts. These terms are record-like tuples with
named labels. In the syntax of C-logic (also resembling the syntax of LOGIN [Ait-Kaci and Nasr 1986] and O-logic [Maier 1986]) the event e1 can be described thus:

happens(event : e1[act => promote, object => jim,
                   newrank => prof], 1989).
e1 is an identity which uniquely determines the event,
and the labels are used to complete the specification of the event. Chen and Warren give a semantics
to C-logic directly, and also by transformation to an
equivalent first-order (Prolog) formulation that uses
unary predicates for types and binary predicates for
attributes. In this paper we use C-Iogic syntax as a
convenient shorthand for describing events, and we
exploit C-Iogic's transformation to Prolog by mixing
C-Iogic and standard Prolog syntax freely. Thus we
shall also write, for example,
event:e1[act=>promote, object=> jim, newrank=?prof].
happens(e1,1989).
Chen and Warren's transformation to Prolog makes
all of these formulations equivalent.
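As an aside, the practical benefit of the binary-predicate encoding can be seen by querying whatever attributes happen to be recorded for an event. The following is a small, hedged Prolog sketch (attribute_of/3 is a name invented here, not one used in the paper), using the binary facts for e1 shown in Section 2:

act(e1, promote).
object(e1, jim).
newrank(e1, prof).

attribute_of(E, act, V)     :- act(E, V).
attribute_of(E, object, V)  :- object(E, V).
attribute_of(E, newrank, V) :- newrank(E, V).

% ?- findall(A-V, attribute_of(e1, A, V), Description).
% Description = [act-promote, object-jim, newrank-prof]

An event reported with fewer details simply has fewer such facts; no fixed-arity term needs to be redesigned.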
3 The Data Model
Our objective in this paper is to focus attention on
the dynamic aspects of objects. For this purpose, we
take a very simple data model which exhibits only
the most basic features associated with object orientation. As will be illustrated, this simple data model
already raises a number of important problems; further elaborations of this data model are mentioned in
the concluding section of the paper.
The basic building block of the model is the concept of an object. An object corresponds to a real
world entity. Each object has a unique identity to distinguish it from other objects. The objects have attributes whose values can be other objects (Le. their
identities). We assume that all attributes are singlevalued.
Objects are organized into class hierarchies, defined
explicitly by is_a relationships among classes. A class
denotes a set of object identities; the class-subclass
relation (is_a) is the subset relation. A class describes
the internal structure (state) of its instances by attribute names. The state of an instance is determined
by the values assigned to these attributes. A subclass
inherits the structure (attribute names) of its superclass(es). As an example consider the following class
hierarchy:

                      person
          (attributes: name, address)
            /                      \
      student                     employee
  (attributes: section,      (attributes: dept,
   supervisor)                rank)
The instances of the class student have the internal
structure described by the attributes name, address,
section and supervisor. Similarly the state of an employee instance is described by name, address, dept
and rank. The class hierarchy is represented by is_a
relations as:
is_a(student, person).
is_a(employee, person).
The relation between a class and its instances is represented by the instance_of relation. The instances of
a class C are also instances of the superclasses of C.
The instance_of relation can be represented thus:
instance_of(tom, student).
instance_of(mary, employee). etc.
together with
instance_of(X, Class) ← is_a(Sub, Class), instance_of(X, Sub).
These definitions will be adjusted in later sections
when we consider time dependent behaviour.
Multiple inheritance without overriding can be expressed by the is_a and instance_of relationships.
This type of multiple inheritance causes no additional
difficulty and is not mentioned again.
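For readers who want to experiment, the schema above can be rendered directly as Prolog facts and rules; the attribute/2 and has_attribute/2 predicate names below are illustrative inventions, not part of the paper's formulation:

is_a(student, person).
is_a(employee, person).

attribute(person, name).      attribute(person, address).
attribute(student, section).  attribute(student, supervisor).
attribute(employee, dept).    attribute(employee, rank).

% attribute names are inherited down the is_a hierarchy
has_attribute(Class, Attr) :- attribute(Class, Attr).
has_attribute(Class, Attr) :- is_a(Class, Super), has_attribute(Super, Attr).

% ?- has_attribute(student, name).   succeeds, by inheritance from person.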
4 Object-Based Event Calculus
Database applications require an ability to model a
changing world. The world changes as a result of the
occurrence of events and hence it is very natural to
describe such a changing world using a description of
events. Given a description of events, it is possible to
construct the state of the world using the event
calculus.
4.1 State Changes
One way of dealing with the evolution of an object
over time (as suggested to us by several groups, independently) is to view the changing object as a collection of different though related objects. Thus, if we
have an employee object jim in the database, which
changes over time, jim at time ti, jim at time t2, jim
at time ts are all different objects. Their common
time-independent attributes are inherited from jim
by some kind of 'part_of' mechanism. This approach
has a certain appeal, but a moment's thought reveals
it must be rejected for practical reasons. Each time
an object is modified a new object is created. This
new object becomes the most recent state of the object with a different identity. In this case, all other
objects referring to the modified object should also be
modified to refer to the new version. However updating them means creating other new objects in turn,
1055
which results in an explosion in the number of objects
in the database. In [M. Kifer and Wu 1990] a system
of this type is described. They have to use equality in order to make certain denotations (i.e., object
ids) in fact refer to the same object and provide some
navigation methods through versions in order to get
appropriate versions of an object.
The alternative is to have one object jim and to
parametrize its attributes with times at which these
attributes have various values. A state change in an
object now corresponds to changing the value of any
of its attributes. For instance if a person moves to a
new place, the value of the address attribute changes;
if an employee is promoted the rank attribute changes
accordingly. Formulation of this idea in the spirit of
the event calculus is straightforward. Instead of
happens(promote(jim, professor), 1991).
it is convenient to separate out the object that has
been affected by the event :
happens(jim, promote(professor), 1991).
Now events are indexed by object; every object has
associated with it the events that affected it. Events
initiate and terminate periods of time for which a
given attribute of a given object takes a particular
value:
initiates(Obj, promote(NewRank), rank, NewRank).
Given a set of event descriptions which are indexed
by object identities, the modified event calculus constructs the state of an object. We can ask queries to
find out the value of an attribute of an object at a
specific time or we can access the state of an object
at any time by querying all of its attributes :
?- holds_at(jim, rank, R, 1989).
?- holds_at(jim, Attr, Val, 1989).
The following is the basic formulation of the object-based event calculus used to reason about the changing state of objects:
holds_at(Obj, Attr, Val, T) ←
    happens(Obj, Ev, Ts), Ts ≤ T,
    initiates(Obj, Ev, Attr, Val),
    not broken(Obj, Attr, Val, Ts, T).

broken(Obj, Attr, Val, Ts, T) ←
    happens(Obj, Ev*, T*),
    Ts < T* ≤ T,
    terminates(Obj, Ev*, Attr, Val).

terminates(Obj, Ev*, Attr, _) ←
    initiates(Obj, Ev*, Attr, _).
Informally, to find the value of an attribute of an object at time T, we find an event which happened before time T, and initiated the value of that attribute;
and then we check that no other event which terminates that value has happened to the object in the
meantime. The last clause for terminates is to satisfy
the functionality constraint of the attributes. Since
we are considering only single-valued attributes we
can simply state that the value of an attribute is terminated if an event initiates it to another value. (The
usage of the anonymous variable '_' in this clause is
not a mistake).
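For readers who want to run the formulation above directly, the following is an executable Prolog rendering (a sketch: it substitutes :-, =< and \+ for the typeset ←, ≤ and not, and adds illustrative data adapted from the promotion example of Section 2, re-indexed by object as described above):

holds_at(Obj, Attr, Val, T) :-
    happens(Obj, Ev, Ts), Ts =< T,
    initiates(Obj, Ev, Attr, Val),
    \+ broken(Obj, Attr, Val, Ts, T).

broken(Obj, Attr, Val, Ts, T) :-
    happens(Obj, EvStar, TStar),
    Ts < TStar, TStar =< T,
    terminates(Obj, EvStar, Attr, Val).

terminates(Obj, Ev, Attr, _) :-
    initiates(Obj, Ev, Attr, _).

initiates(_Obj, promote(NewRank), rank, NewRank).

happens(jim, promote(assistant), 1986).
happens(jim, promote(lecturer),  1988).
happens(jim, promote(professor), 1991).

% ?- holds_at(jim, rank, R, 1990).
% R = lecturer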
The original event calculus can compute the periods of time for which a property holds. We can have
the same facility for the attributes of objects. The
following compute the periods of time for which an
attribute takes a particular value :
holds_for(Obj, Attr, Val, (S - E)) ←
    happens(Obj, Ev, S),
    initiates(Obj, Ev, Attr, Val),
    terminated(Obj, Attr, Val, S, E).

terminated(Obj, Attr, Val, S, E) ←
    happens(Obj, Ev, E),
    terminates(Obj, Ev, Attr, Val), S < E,
    not broken(Obj, Attr, Val, S, E).
holds_for is used to find the period of time for which
an attribute has a particular value. The time period
is represented by its start (S) and end (E) points. We also require another clause for holds_for to deal with periods that have no end-point (i.e., an attribute is
initiated but there is no event which terminated its
value). This can be written in a similar style, which
we omit.
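One plausible form of the omitted clause, written here as a hedged sketch rather than the authors' own text, uses a symbolic end-point now for periods that are still open (the choice of now is an assumption made here, not something the paper fixes):

holds_for(Obj, Attr, Val, (S - now)) ←
    happens(Obj, Ev, S),
    initiates(Obj, Ev, Attr, Val),
    not terminated(Obj, Attr, Val, S, _).

That is, a period initiated at S for which no terminating event has been recorded is reported as extending to the present.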
Since objects are organized into classes, it is natural and convenient to structure the specification of
the effects of a given event according to the class of
object it affects. If an event is defined to affect the
instances of a class, then the same event specification
applies to the instances of subclasses. For example,
consider a departmental database in which objects
are organized according to the class hierarchy given
in section 3. We can specify the effects of these events
in the following way :
initiates(Obj, move(Address), address, Address) ←
    instance_of(Obj, person).
initiates(Obj, promote(NewRank), rank, NewRank) ←
    instance_of(Obj, employee).
The effects of changing the address are valid for all
persons (i.e., all students and employees as well).
However promotion is a type of event which can happen to employee objects only. In the formulation as
presented here, it is possible to assert that an object
of class person was promoted - but this event has no
effect (does not initiate or terminate anything) unless
the object is also an instance of class employee. An
alternative is to arrange for event descriptions to be
checked and rejected at input if the class conditions
are not satisfied. This alternative requires more explanation than we have space for; it is peripheral to
our main points, and we omit further discussion.
We have discussed how event calculus can be used
to describe changes to the values of attributes of objects. Apart from the events that cause state changes
of existing objects, there are other kinds of events
which cause the creation of new objects or deletion
of objects.
4.2 Creation of Objects
The creation of a new object of a given class means
adding new information about an entity to the
database. In the real world being modeled, there are
events which create new entities. Birth of a person,
manufacturing of a vehicle or hiring a new employee
are examples of such events. We can think of describing object creation by events whose specification will
provide the necessary information about the initial
state of the object.
For creation, we need to say what the class of an
object is and specify somehow its initial state. In
a practical implementation, generation of a unique
identity for a newly created object can be left to the
system; conceptually, all object identities exist, and
the 'creation' of an object is simply assigning it to a
chosen class. Assigning the new identity to the class
initiates a period of time for which the new object
is a member of that class. This makes it necessary
to treat class membership as a time-dependent relationship. We introduce a new predicate assigns to
describe instance addition to classes. For the time
being we assume that once an object is assigned to a
class it remains an instance of this class throughout
its lifetime. Class changes are discussed separately in
section 5.
We can handle creation of objects by specifying
which events assign objects to which classes. We use
the same event description to initialize the state of
the object. As an example consider registration of a
student ali, which causes the creation of a new student object in the database. The specification of the
event and the necessary rules to describe creation are
as follows:
event: e23 [act => register,
            object => ali,
            section => lp,
            supervisor => bob].

assigns(event:E[act => register, object => Obj], Obj, student).

initiates(Obj, E, section, S) ←
    event:E[act => register, object => Obj, section => S].

initiates(Obj, E, supervisor, S) ←
    event:E[act => register, object => Obj, supervisor => S].
The assigns statement is used to assign the identity
of the object Obj to the class student; the initiates
statements are used to initialize the object's state.
Now the occurrence of the event is recorded by :
happens(e23, 1991).
To specify that the event has happened to the object
ali we use the rule:
happens(Obj, Ev, T) +happens(event:Ev[act=> register, object=} Obj}, T).
Note that we have two happens predicates: one binary
(for asserting that events happened at a given time),
and one ternary (to index events by objects affected).
Notice also that creating a new object
of class C creates a new instance of the superclasses
of C as well. There are several ways to formulate this.
The simplest is to write:
assigns(Ev, Obj, Class) ←
    is_a(Sub, Class),
    assigns(Ev, Obj, Sub).
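For instance, assuming the hierarchy of section 3 contains is_a(student, person), the registration event above also makes ali an instance of person; informally (our own illustration):

% given:   assigns(event:e23[act => register, object => ali], ali, student)
% and:     is_a(student, person)
% the rule above derives:
%          assigns(event:e23[act => register, object => ali], ali, person)
% so the query  ?- assigns(Ev, ali, person)  succeeds with Ev bound to e23.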
4.3  Deletion of Objects
There are two kinds of deletions that we are going to
discuss in this paper. One is absolute deletion of an
object where the object is removed from the database.
The other one deletes an object from its class but
keeps it as an instance of another class, possibly one
of the superclasses. The second case is related to
mutation of objects as they change class, which will
be discussed in section 5.
For the purposes of this section, we assume that
when an object is deleted it is removed from the set
of instances of its class and the superclasses, and that
all its attribute values are terminated. For example,
if a person dies, all the information about that person is deleted from the database. We use a new predicate destroys to specify events that delete objects and
write the following:
terminates(Obj, Ev, Attr, _) ←
    destroys(Ev, Obj).
This rule has the effect that all attributes Attr defined
in the class of the object and also those inherited from
superclasses are terminated. If an event destroys an
object 0 which is an instance of class C, then that
event removes 0 from class C and all superclasses of
C.
There is one point to consider when deleting objects in object-oriented databases. If we delete an object x, there might be other objects that have stored
the identity of x as a reference. The deletion therefore can lead to dangling references. A basic choice
for object-oriented databases is whether to support
deletion of objects at all [Zdonik and Maier 1990]. We
choose to allow deletion of objects and we eliminate
dangling references by adding another rule for the
broken predicate:
broken(Obj, Attr, Val, Ts, T) ←
    happens(Val, Ev*, T*),
    Ts < T* ≤ T,
    destroys(Ev*, Val).
We obtain the effect that the value Val of the attribute Attr is terminated by any event which destroys the object Val.
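As an illustration (ours, not from the paper): ali's supervisor attribute was initiated to bob by the registration event e23 in 1991. Suppose a later event, say e99 (a hypothetical identifier), destroys bob:

happens(bob, e99, 1994).
destroys(e99, bob).

With the extra broken clause, broken(ali, supervisor, bob, 1991, T) holds for any T ≥ 1994, so queries about ali's supervisor after 1994 no longer return bob, and no dangling reference to the destroyed object remains visible.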
4.4  Class Membership
As we create and delete objects the instances of a
class change. Class membership, which is described
by the instance_of relation, is a dynamic relation that
changes over time. We can handle the temporal behaviour by adding a time parameter. We now have
events that initiate and terminate periods of time for
which an object 0 is an instance of a class C. The
instance_of relation is affected when a new object is
assigned to a class or when an object is destroyed.
By analogy with holds_at, the following finds the instances of a class at a specific time :
instance_of(Obj, Class, T) ←
    happens(Ev, Ts), Ts ≤ T,
    assigns(Ev, Obj, Class),
    not removed(Obj, Class, Ts, T).

removed(Obj, Class, Ts, T) ←
    happens(Obj, Ev*, T*), Ts < T* ≤ T,
    destroys(Ev*, Obj).
With this time variant class membership we can ask
queries to find the instances of a class at a specific
time. For example:
?- instance_of(Obj, employee, 1980).
We can also write the analogue of holds_for to compute periods, which we omit here.
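One possible rendering of that omitted analogue, in the same style (the predicate name instance_for is our own choice, not the paper's):

instance_for(Obj, Class, (Ts - T)) ←
    happens(Ev, Ts),
    assigns(Ev, Obj, Class),
    happens(Obj, Ev*, T),
    Ts < T,
    destroys(Ev*, Obj).

A second clause, analogous to the open-ended case of holds_for, would cover objects that have not (yet) been destroyed.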
In the example we have been using, we have represented the rank of an employee object by including an
attribute rank whose value might change over time.
But suppose that instead of using an attribute rank,
we had chosen to divide the class of employees into
various distinct subclasses:
is_a(lecturer, employee).
is_a(professor, employee).
It is at least conceivable that this alternative representation might have been chosen, assuming that
all employee objects have roughly the same kind of
internal structure. Is the choice between these two
representations simply a matter of personal preference? Not if we consider the evolution of objects over
time. The first representation allows for change in
an employee's rank straightforwardly, since this just
changes the value of an attribute. The second does
not, since no object can change class in the formulation of this section. The only way of expressing, say,
a promotion from lecturer to professor, is by destroying (deleting) the lecturer object and creating a new
professor object. But then how do we relate the new
professor object to the old lecturer object, and how
do we preserve the values of unchanged attributes
across the change in class? In the next section we
will examine the problem of allowing the class of an
individual object to change.
5  Mutation of Objects: Changing the Class
The ability to change the class of an object provides
support for object evolution [Zdonik 1990]. It lets an
object change its structure and behaviour, and still
retain its identity. For instance, consider an object
that is currently a person. As time passes it might
naturally become an instance of the class student and
then later an instance of employee. This kind of modification is not directly supported by most
systems. It may be possible to create another object of the new class and copy information from the
old object to it, but one loses the identity of the old
object.
We want to describe this kind of evolution by event
specifications. For example graduation causes a student to change class. If we delete student ali from
class student, then he will lose all the attributes he
has by virtue of being a student, but retain the attributes he has by virtue of being a person. The effects of this event can be described by removing ali
only from class student and terminating his attributes
selectively. The attributes that are going to be terminated can be derived from the schema information.
For dealing with this type of class change we use a
new predicate removes in place of the predicate destroys of section 4.3:
removes(event:Ev[act => graduate, object => Obj], Obj, student).

terminates(Obj, Ev, Attr, _) ←
    event:Ev[act => graduate, object => Obj],
    attribute(student, Attr).
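The attribute/2 facts consulted here come from the schema of section 3; a plausible fragment for the running example (the exact facts are our own guess, following the attributes used in the examples above) is:

attribute(person, address).
attribute(student, section).
attribute(student, supervisor).
attribute(employee, dept).
attribute(employee, rank).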
The clauses for the time-dependent instance_of relation must be modified too, to take removes into
account:
removed(Obj, Class, Ts, T) ←
    happens(Obj, Ev*, T*), Ts < T* ≤ T,
    removes(Ev*, Obj, Class).
The graduation of the student ali corresponds to
moving him up the class hierarchy. Now consider
hiring ali as an employee. This will correspond to
moving down the hierarchy. The specification of an
event causing such a change will likely include values to initialize the additional attributes associated
with the subclass. So the effects of hiring ali will be
to assign him to the employee class and initiate his
employee attributes. The event might be:
event: e21 [act => hire,
            object => ali,
            dept => cs,
            rank => lecturer].

And we can declare the following:

assigns(event:E[act => hire, object => Obj], Obj, employee).

initiates(Obj, E, dept, D) ←
    event:E[act => hire, object => Obj, dept => D].

initiates(Obj, E, rank, R) ←
    event:E[act => hire, object => Obj, rank => R].
Note that in changing class first from student to person, then from person to employee, ali retains all the
attributes he has as a person.
We have described this class change by two separate events: graduation and hiring. We can also
imagine a single event which would cause an object
to change its class from student to employee directly,
say hire-student event. We could then describe the
changes using the description of this event:
removes(event:E[act => hire-student, object => Obj], Obj, student).

assigns(event:E[act => hire-student, object => Obj], Obj, employee).
The initial values of the additional attributes will
again be given in the event specification. As in the
case of having two separate events, we have not lost
the values of the attributes as a person, and we have
not removed the object from class person.
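Concretely, the hire-student event would carry the new employee attributes and initiate them in the same way as the hire event above; a sketch in the same style (ours, not spelled out in the paper):

initiates(Obj, E, dept, D) ←
    event:E[act => hire-student, object => Obj, dept => D].

initiates(Obj, E, rank, R) ←
    event:E[act => hire-student, object => Obj, rank => R].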
We have illustrated three kinds of class changes:
changing from a class C to a direct superclass of C,
changing from C to a direct subclass of C and changing from C to a sibling class of C in the hierarchy.
In general, changing an object from class Cl to class
C2 involves removing from Cl and assigning to C2
and specifying in the event description how the initial values of C2 attributes are related to the values
of old Cl attributes.
6  Concluding Remarks
We have presented a variant of the event calculus
which maintains an object-based data model where
the standard versions maintain a relational one. Section 4 considered state changes of objects in this
framework, and the creation and deletion of objects.
Section 5 discussed the modifications that are necessary to support also the mutation of objects - change
of an object's class and its internal structure without
loss of its identity.
There are other object-oriented features that can
usefully be incorporated into the object-based data
model. Removing the restriction that attributes are
all single-valued causes no great complication. We
are currently developing other extensions, such as
the inclusion of methods in classes for defining the
value of one attribute in terms of the values of other
attributes, and we are investigating what additional
complications arise when the schema itself is subject
to change.
In object-oriented terminology, event types - like
promote, change-address, and so on - correspond to
methods: their effects depend on the class of object
that is affected; the predicates initiates and terminates for attribute values, and assigns, destroys and
removes for objects and classes are used to implement the methods (they would be replaced by destructive assignment if we maintained only a changing snapshot database). Of course, execution of this
event calculus in Prolog does not yield an object-oriented style of computation. At the implementational level, objects are not clustered (except by Prolog's first argument indexing), and the computation
has no element of message-passing. The implementation and the computational behaviour can be given
a more object-oriented flavour by using for example
the techniques described by [Chen 1990] for C-logic,
or the class templates of [McCabe 1988]. We are currently investigating what added value is obtained by
adjusting these implementational and computational
details.
The object-oriented version of the event calculus offers some (computational) advantages over the standard relational versions, which we do not go into here
for lack of space. Whatever the merits of the object-based variant of the event calculus, we believe that
its formulation forces attention to be given to important aspects of object-orientation that are otherwise
ignored. We limit ourselves to two general remarks:
1) In the literature, the terms type and class are
often used interchangeably. Sometimes type is used
in its technical sense, but then it is common to see
illustrative examples resembling 'Mary is of type student'. If we consider the dynamics of object-oriented
representations, then these examples are either badly
chosen or the proposals are fundamentally flawed.
'Mary' might be a student now but this will not
hold forever. We could surely not contemplate an
approach where an update to a database requires a
change to the type system, and hence to the syntax of
the representational language. These remarks do not
apply to object-oriented programming where there is
no need to make provision for updates that change
the type of an object.
The static notion of a type corresponds to the
treatment of a class we presented in section 4: an object may or may not exist at a given time, but when it
exists it is always an instance of the same class. If we
wish to go beyond this, to allow objects to mutate,
then a dynamic notion of class is essential. This is
not to say that types have no place in object-oriented
databases. A student can become an employee over
time, but a student cannot become a filing cabinet,
and a filing cabinet cannot become an orange. Both
static types and dynamic notions of class are useful.
The consideration of the dynamics of objects - how
they are allowed to evolve over time - suggests one
immediate and simple criterion for choosing which
notion to use: the type of an object cannot change.
2) In section 4.3, we assumed that all attributes
of an object are terminated when the object is destroyed; in section 5, removal of an object from the
class C terminates all attributes the object has by
virtue of being an instance of the class C. The reasoning behind this assumption is this: attributes are
used to represent the, possibly complex, internal state
of an object. When an object ceases to exist, it is not
meaningful to speak any more of its internal state.
Of course, some information about an object persists
even after it ceases to exist. It is still meaningful to
speak of the father of a person who has died, but it
is not meaningful to ask whether this person likes oranges or is happy or has an address. The development
of these ideas suggests that we should distinguish between what we call 'internal attributes' and 'external
relationships'. Internal attributes describe the state
of a complex object, and they cease to hold when the
object ceases to exist or ceases to be an instance of
the class with which these attributes are associated.
External relationships continue to hold even after the
object ceases to exist. We are being led to a kind of
hybrid data model together with some tentative criteria for choosing between representation as attribute
and representation as relationship with other objects.
The analysis given here is rather superficial, but it
indicates the general directions in which we are planning to pursue this work.
Acknowledgements. F.N. Kesim would like
to acknowledge the financial support by TUBITAK,
the Scientific and Research Council of Turkey.
References
[Abiteboul and Grumbach 1988] S. Abiteboul and
S. Grumbach. COL : A logic-based language for
complex objects. In International Conference on
Extending Database Technology - EDBT'88, pages
271-293, Venice, Italy, March 1988.
[Ait-Kaci and Nasr 1986] H. Ait-Kaci and R. Nasr.
LOGIN: A logic programming language with built-in inheritance. The Journal of Logic Programming, 1986.
[Bancilhon and Khoshafian 1986] F. Bancilhon and
S. Khoshafian. A calculus for complex objects. In
Proceedings of the 5th ACM-SIGACT-SIGMOD
Symposium on Principles of Database Systems,
pages 53-59, Cambridge, Massachusetts, March
1986.
[Chen 1990] Weidong Chen. A General Logic-Based
Approach to Structured Data. PhD thesis, State
University of New York at Stony Brook, 1990.
[Chen and Warren 1989] W. Chen and D. Warren.
C-logic of complex objects. In Proceedings of
the 8th ACM SIGACT-SIGMOD-SIGART Symposium on the Principles of Database Systems,
1989.
[Dalal and Gangopadhyay 1989]
M. Dalal and D. Gangopadhyay. OOLP: A translation approach to object-oriented logic programming. In The First International Conference on
Deductive and Object-Oriented Databases, pages
555-568, Kyoto, Japan, December 4-6, 1989.
[Eshghi 1988] K. Eshghi. Abductive Planning with
the Event Calculus. In Proc. 5th International
Conference on Logic Programming, 1988.
[Kifer and Lausen 1989] M. Kifer and G. Lausen. F-logic: A higher-order language for reasoning
about objects, inheritance, and scheme. In Proceedings of the ACM-SIGMOD Symposium on the
Management of Data, pages 134-146, 1989.
[Kowalski and Sergot 1986] R. A. Kowalski and M. Sergot. A logic-based calculus of events. New
Generation Computing, 4:67-95, 1986.
[Kifer, Lausen and Wu 1990] M. Kifer, G. Lausen and
J. Wu. Logical foundations of object-oriented
and frame-based languages. Technical report, Department of Computer Science, SUNY at Stony
Brook, June 1990.
[Maier 1986] D. Maier. A logic for objects. In Proceedings of the Workshop on Foundations of Deductive Databases and Logic Programming, pages
6-26, Washington D.C., August 1986.
[McCabe 1988] F.G. McCabe. Logic and Objects:
Language Application and Implementation. PhD
thesis, Department of Computing, Imperial College, 1988.
[Sripada 1991] S. M. Sripada. Temporal Reasoning
in Deductive Databases. PhD thesis, Department
of Computing, Imperial College, 1991.
[Zaniolo 1985] C. Zaniolo. The representation and
deductive retrieval of complex objects. In Proceedings of Very Large Databases, page 458,
Stockholm, 1985.
[Zdonik 1990] S. B. Zdonik. Object-oriented type
evolution. In F. Bancilhon and P. Buneman, editors, Advances in Database Programming Languages, pages 277-288. ACM Press, 1990.
[Zdonik and Maier 1990] S. B. Zdonik and D. Maier,
editors. Readings in Object-Oriented Database
Systems, chapter 4, page 239. Morgan Kaufmann,
1990.
The panel on a future direction of new generation applications
Fumio Mizoguchi
Science University of Tokyo
Intelligent System Laboratory
Noda, Chiba 278, Japan
1  Introduction
This paper introduces a panel to be held in the application track of the FGCS'92 conference. The panel will be devoted to future directions of new generation applications. The goal is to discuss applications of the various paradigms which have been explored in the areas of knowledge representation, logic programming, machine learning and parallel processing. It is my hope that, by expressing the different perspectives of the panelists, we will understand the importance of the underlying paradigms, the real problem areas, and a direction for next generation applications. The word paradigm itself originally comes from T. Kuhn's book "The Structure of Scientific Revolutions" (1962). Recently, this word has been taken up by AI researchers because of its meaning, which indicates a current research trend or a future direction. Here, I will use the word in this sense, implying new bases and views for the exploration of applications, without too much philosophical discussion.
In this short paper, I will attempt to outline the perspectives represented by the panelists. Although the ideas and position papers are presented in the following pages of the proceedings, I will try to sketch the rough views which will be necessary for this panel discussion. What follows reflects my subjective impressions of current trends and research directions.
2  KR paradigm
Ronald J. Brachman will talk about his knowledge representation language called Classic and his experiences in using Classic for the development of applications. He may refer to knowledge representation as KR, following his research community. KR may be the starting point for any AI-based application system; it is one of the main paradigms of AI research, including natural language understanding and cognitive science. There were many attempts at the design of KR languages and systems, such as KRL, FRL and KL-ONE, in the late 1970's. The 1980's were the following productive period for KR system developments and theories. The first dedicated international KR conference was held recently, and many important ideas and foundations were presented there. This state of the art was reviewed by R. Brachman at the AAAI meeting in 1990. He presented KR and the issues related to the field: its history, the developments of the 1980's, the future of KR and open research problems. I am especially interested in his highlights for the future of KR, which predict the current trends toward common knowledge bases and ontology. KR should now be standardized for the further development of knowledge systems. The related paper on Classic will be presented in the technical session, and he will talk about his position based upon that paper presentation. The panel will start with KR and related topics.
3  CLP paradigm
Catherine Lassez will represent constraint logic programming (CLP), which is a new approach to handling constraints in Operations Research, Computational Geometry, Robotics and Qualitative Physics. Reasoning with constraints is very important for these application areas. These problems sometimes require heavy computational resources and have combinatorial characteristics. The novel aspect of CLP is a unified framework of knowledge representation for numeric and non-numeric constraints, solution algorithms and data query systems. CLP has also been implemented in programming languages such as CLP(R), CHIP, CAL, Prolog-III and Triton. These languages are used in various application domains which link AI and OR. As for financial applications, CLP has a very good affinity for describing financial equations and relations. Constraints are also useful for handling qualitative knowledge in Computational Geometry and Naive Physics. In order to show the expressive power of CLP, it is necessary to demonstrate its speed and performance on the same problems that OR people have proposed. It is a challenge for AI researchers and logic programmers to persuade researchers in other fields through recent progress in programming, which can avoid brute-force numerical calculation. She will present her experiences with the development of the theories and applications. The details are given in her very intensive, long position paper for this panel.
4  ILP paradigm
Stephen Muggleton will represent his recent notion of inductive logic programming (ILP), which uses inverse resolution and relative least general generalisation. ILP is a newly formed research area in the integration of machine learning and logic programming. Machine learning is a very attractive paradigm for the knowledge acquisition and learning that any AI system must address. With the advent of machine learning research, there have been many developments in tools for classifying large data using concept learning and neural network methods. Muggleton's recent development in ILP is called GOLEM, a first-order induction algorithm for generating rules from given examples. Each example is a first-order ground atom and each rule is a first-order Horn clause. Rules can be used to classify new examples. GOLEM is implemented in C on SUN workstations and is very efficient at inducing rules from examples. Another example of ILP will be presented by the invited speaker, Ivan Bratko, who will talk about learning qualitative models of dynamic systems using the GOLEM learning program. ILP is different from CLP, but its spirit and ideas come from the logic programming paradigm. As is well known, Shapiro's work on the Model Inference System (MIS) is implemented in Prolog and is a very clear logical model of learning. Using the logic programming paradigm, ILP is a unified approach to induction and deduction which provides knowledge systems with more powerful inference facilities. Namely, the inductive component of ILP is very useful for inducing rules from data; then, using the rules, a system can deductively classify data into known diagnostic states. Therefore, ILP is a new approach to applications with very large data that must be classified into categories. These kinds of applications are found in the areas of protein engineering and fault diagnosis for satellites. He was the organizer of the first ILP workshop and of the second workshop, which will be held after the FGCS conference. ILP is a very young paradigm for machine learning, and there will be further exploration in theory and application. He will talk about recent research on its relationship to Valiant's PAC-Learning framework. Machine learning is a most active research area, and at its next stage it will deal with realistic problems.
5  PP paradigm
Kazuo Taki will represent the Parallel Processing (PP) paradigm, which the Fifth Generation Computer Systems project aims to explore by developing both hardware and software derived from concurrent logic programming, which shows affinity both for expressing concurrency and for executing in parallel. With the continuous efforts in language and implementation research in the FGCS project, KL-1 is expressive enough to describe many complex application programs with efficient performance. The most important aspect of using the concurrent system is to build large-scale parallel software, which then accumulates as experience in parallel programming. A new style of programming requires a new way of thinking about programming and about the model of computation. This is also true for the KL-1 language and for applying it to complex applications such as VLSI design, DNA analysis and legal reasoning systems. Based upon these experiences, he will focus on the parallel language culture which is necessary for next generation computers like the multi-PSI and PIM. Hardware progress has been rapid compared with software technology, and the accumulation of parallel programming experience is very important for re-use and for the economy of coding. The current issue in parallel programming is how to transfer the knowledge in software technology developed by the FGCS project in order to spread the culture of the concurrent system. Therefore, as for future directions, the question for the PP paradigm is how to use it in widely adopted computational environments. He will talk about the issue of the parallel programming culture and the experiences in the use of KL-1 for applications.
6  Future directions
I have introduced the various paradigms for knowledge information processing, starting from KR and ending with PP. Each paradigm has distinctive and novel features for the exploration of applications. As for my own position, I am interested in research on the fusion of paradigms, for example the integration of CLP and ILP. I will call this paradigm Inductive Constraint Logic Programming (ICLP, not the conference name!), which is a natural extension of constraint logic programming to inductive inference of constraints in Spatial Geometry and Robotics. This framework is also useful for Naive Physics and qualitative reasoning systems that lack large amounts of background knowledge for rule generation. We will examine our approach with Naive Kinematics and simple image processing for spatial reasoning. At this stage the application domain is very simple, but for research on Robotics that learns, the inductive component is very important for acquiring knowledge of the constraints, which are then used deductively for further moves. The fusion of paradigms will be a necessary foundation for the next generation of applications. We should re-examine the current paradigms for different problem areas such as OR, Robotics and Computational Geometry.
Knowledge Representation Theory Meets Reality:
Some Brief Lessons from the CLASSIC Experience
Ronald J. Brachman
AT&T Bell Laboratories, 600 Mountain Ave.,
Murray Hill, NJ 07974-0636, U.S.A.
rjb@research.att.com
Abstract
Knowledge representation is one of the keys to Artificial
Intelligence, and as a result will play a critical role in
many next generation computer applications. Recent results in the field look promising, but success on paper
may be misleading: there is a significant gap between a
theoretical result or proposal and its ultimate impact in
practice. Our recent experience in converting a fairly
typical knowledge representation design into a usable
system illustrates how many aspects of "reduction" to
practice can significantly influence and force important
changes to the original theoretical foundation. I briefly
motivate our work on the CLASSIC representation system
and outline a handful of ways in which practice had significant feedback to theory. The general lesson for next
generation applications is the need for us in our research
on core technology to take more seriously the influence
of implementation, applications, and users.
1 Knowledge Representation
Representation of knowledge has always been the foundation on which research and development in Artificial
Intelligence has rested. While no single representation
framework has come to dominate the field, and while
there are important challenges to the utility of conventional representation techniques from "connectionists"
and others, it is very likely that the next generation of
AI and AI-related applications will still subscribe to the
hypothesis that intelligent behavior can arise from formal reasoning over explicit symbolic representations of
world knowledge.
The centrality of the need to represent world knowledge in AI systems, expert systems, robots, and Fifth
Generation applications has helped increase interest in
formal systems for representation and reasoning-so
much so that over the last decade, the explicit subfield of "Knowledge Representation" (KR) has taken on
its own identity, with its own international conferences,
IFIP working group, etc. This subfield has been prolific.
It has attracted the attention of the greater AI community with highly visible problems like the "Yale Shooting Problem" and systems like CYC. It has collected its
own set of dedicated researchers, and has increasing numbers of graduate students working on formal logics, nonmonotonic reasoning, temporal reasoning, model-based
diagnosis, and other important issues of representation
and reasoning.
It is probably fair to say that in recent years, formal
and theoretical work has become preeminent in the KR
community.¹ Concomitantly, it appears to be generally believed that when the theory is satisfactory, its reduction to practice will be relatively straightforward. This transition from theory to practice is usually considered uninteresting enough that it is virtually impossible to have a technical paper accepted at a conference that addresses it; it seems to be assumed that all of the "hard" work has been done in developing the theory.

¹This has happened for numerous reasons, and while it may have some negative consequences (as addressed here), it is positive in many respects. The early history of the field was plagued by vague and inadequate descriptions of ad hoc solutions and computer programs; recent emphasis on formality has encouraged more thorough and rigorous work.
This attitude is somewhat defensible: it is common
in virtually all other areas of AI; and there often really
isn't anything interesting to say further about a KR formalism as it is implemented in a system. However, my
own group has had substantial recent experience with
the transition of a knowledge representation system from
theory to practice that contradicts the common wisdom,
and yields an important message for KR research and
its role in next generation applications. In particular,
our view of what we thought was a clean and clear-and
"finished" -formal representation system was substantially influenced by the complexity and constraint of the
process of turning the logic into a usable tool.
2 The CLASSIC Effort
As of several years ago, we had developed a relatively
small, elegant representation logic that was based on
many years of experience with description hierarchies
and a key inference called classification. As described
in a companion paper at this conference [Brachman et
al., 1992], the CLASSIC system was a product of many
years of effort on numerous systems, all descended from
the KL-ONE knowledge representation system. Work on
KL-ONE and its successors grew to be quite popular in
the US and Europe in the 1980's, largely because of the
semantic cleanliness of these languages, the appeal of
object-centered (frame) representations, and their provision for some key forms of inference not available in
other formalisms (e.g., description classification). The
reader familiar with KR research will note that numerous publications in recent years have addressed formal
and theoretical issues in "KL-ONE-like" languages, including formal semantics and computational complexity
of variant languages. However, the key prior efforts all
had some fundamental flaws, and work on CLASSIC was
in large part launched to design a formalism that was
free of these defects.
Another central goal of CLASSIC was to produce a compact logic and ultimately, a small, manageable implemented representation and reasoning system. A small
system has important advantages in a practical setting,
such as portability, maintainability, and comprehensibility. Our intention was to eventually put KR technology
in the hands of non-expert technical employees, to allow
them to build their own domain models and maintain
them. CLASSIC was also designed to fill a small number of application needs. We had had experience with
a form of deductive information retrieval (most recently
in the context of information about a large software system [Devanbu et al., 1991]), and needed a better tool to
support this work. We also had envisioned CLASSIC as a
deductive, object-oriented database system (see [Borgida
et al., 1989]; success on this front was eventually reported
in [Selfridge, 1991]).
After analyzing the applications, assessing recent
progress in KL-ONE-like languages, and solving a number
of the technical problems facing earlier systems, we produced a design for CLASSIC that felt complete; the logic
was presented in a typical academic-style conference paper in 1989 [Borgida et al., 1989]. In this design, some
small concessions were made to potential users, including a procedural test facility that would allow some escape to the host implementation language for cases that
CLASSIC could not handle. Given the clarity and simplicity of this original design of CLASSIC, we ourselves held
the traditional opinion that there was essentially no research left in implementing the system and having users
use it in applications. At that point, we began a typical
AI programming effort, to build a version of CLASSIC in
COMMON LISP.
3 Influences in the "Reduction"
to Practice
As the research LISP version neared completion, we began to confer with colleagues in a development organization about the potential distribution of CLASSIC within
the company. Despite the availability of a number of AI
tools in the marketplace, an internal implementation of
CLASSIC held many advantages: we could maintain it and
extend it ourselves, in particular, tuning it to real users;
we could assure that it integrated with existing, non-AI
environments; and we could guarantee that the system
had a well-understood, formal foundation (in contrast to
virtually all commercially available AI tools). Thus we
undertook a collaborative effort to create a truly practical version of CLASSIC, written in C. Our intention
was to develop the system, maintain it, create a training
course, and eventually find ways to make it useful in the
hands of AI novices.
To make a long story short, it took at least as much
work to get CLASSIC to the point of usability as it did
to create the original logic that we originally thought
was the culmination of our research. Our view of the
language and knowledge base operations supporting it
changed substantially as a result of this undertaking, in
ways that simply could not be anticipated when considering a paper design of the logic.
The factors that influenced the ultimate shape of CLASSIC were quite varied, and in most cases, were not influences that we-or most other typical researchers, I
suspect-would have expected to have forced more research before the logic was truly finished. These ranged
from the need to be reasonable in the release and main-
tenance of the software itself to some specific needs for
key applications that could not really have been anticipated until the system was actually put into practical
use. Here is a brief synopsis of the five main types of
issues that influenced the ultimate shape of the CLASSIC
system:
• the constraints of creating and supporting a system
for real users caused numerous compromises. For one
thing, upward compatibility of future releases is a critical issue with real software, and it meant that any
construct in the language in which we were not completely confident might better be left out of the released system. Issues of run-time performance (which
also dictated the exclusion of some features) also had
surprising effects on what we could realistically include
in the released version.
• certain detailed implementation considerations played
a role in determining what was included in the system.
These included certain tradeoffs that affected the design, such as the tremendous space consequences an inverse relationship ("inverse roles") feature would have
had, or the consequences of certain fine-grained forms
of truth maintenance (to allow for later retraction of
asserted facts). Some features (our SAME-AS construct, for example) were just so complex to implement
that they were better left out of the initial release.
• concern for real users alerted us to issues easily ignored
with a pure logic. These involved the sheer learnability
and usability of the language and the system. Error handling, for example, was of paramount concern to
our real consumers, and yet the very idea never arose
when considering the initial CLASSIC language. Similarly, the uniformity of abstractions and the simplicity of the interface were critical to acceptability of our
system. The potential consequences of user "escapes"
with side-effects was another related concern. Finally,
explanation of the system's behavior-again, not an
issue when we designed the logic-might make the difference between success and failure in using the system.
• as soon as a system is put to any real use, mismatches
in its capabilities and specific application needs become
very evident. In this respect, there seems to be all the
difference in the world between the few small examples
given in typical research papers and the details of real,
sizable knowledge bases. In the case of CLASSIC, our
lack of attention to the details of numbers and strings
in the logic meant substantial more work before implementation. Another issue that plagued us was the lack
of attention to a query language for our KR system (a
common lack in most AI KR proposals).
• finally, what looked good (and complete) on paper did
not necessarily hold up under the fire of real use. Even
with a formal semantics, certain operators prove tricky
to understand in practice, and subtle interactions between operators that arise in practice are rarely evident from the formal work. Simply being forced by
an implementation effort to get every last detail right
certainly caused us to re-examine several things we
thought we had gotten correct in the original logic,
and I suspect this would be the case with virtually every sufficiently complex KR logic that ends up being
implemented.
4 Some Lessons
The main lesson to be learned here is that despite the
ability to publish pure accounts of logics and their theoretical properties, the true theoretical work on knowledge representation systems is not really done until issues
of implementation and especially of use are addressed
head-on. The "theory" can hold up reasonably well in
the transition from paper to system, but the typical KR
research paper misses many make-or-break issues that
determine a proposal's true value in the end. Arguments
about needed expressive power, the impact of complexity results, the naturalness and utility of language constructs, etc., are all relatively hollow until made concrete
with specific applications and implementation considerations.
For example, in our context, the right decision was
clearly to start with a small version of the system for
release, and extend it only as needed. Given the complexity of software maintenance, it may never make sense
to try to anticipate in advance all possible ways that
all possible users might want to express concepts.² A
small core with an extension mechanism might in reality
be better than a large, extraordinarily expressive-and
complex-system. In the case of CLASSIC, we have been
able to place in the hands of relatively naive users a fairly
sophisticated, state-of-the-art inference system with a
formal semantics and well-founded inference mechanism,
and have them use it successfully, needing only to make a
small number of key extensions to meet their real needs.

²Ironically, the ongoing and sometimes virulently argued debate over how much expressive power to allow in KR systems may in the end be settled by simple software engineering considerations.
There are several consequences here for next generation applications of knowledge representation research.
First, it is important that the research community recognize as legitimate and important the class of issues
that arise from implementation efforts-issues relating
to size, for example, that have always been the legitimate concern of the database community; issues relating
to implementation tradeoffs and complexities; and issues
relating to software release and maintenance. Second,
unless our KR proposals are put to the test in real use
on real problems, it is almost impossible to assess their
real value. So much seems to be different when a proposal is reduced to practice that it is unclear what the
original contribution really is. Third, it is quite critical
that at least some fraction of the community address directly the needs of users and the constraints and issues in
their applications. Too much research with only mathematics as its driving force will continue to lead KR (and
other areas of AI research) farther afield. Not only that,
it is clear that truly interesting research questions arise
when driven from real rather than toy or imagined needs.
References
[Borgida et al., 1989] A. Borgida, R. J. Brachman, D. L.
McGuinness, and L. A. Resnick. CLASSIC: A Structural Data Model for Objects. In Proceedings of
the 1989 ACM SIGMOD International Conference on
Management of Data, pages 59-67, June 1989.
[Brachman et al., 1992] R. J. Brachman, A. Borgida,
D. L. McGuinness, P. F. Patel-Schneider, and L. A.
Resnick. The CLASSIC Knowledge Representation System, or, KL-ONE: The Next Generation. In Proceedings
of the International Conference on Fifth Generation
Computer Systems, Tokyo, June 1992.
[Devanbu et al., 1991] P. Devanbu, R. J. Brachman,
P. G. Selfridge, and B. W. Ballard.
LaSSIE:
A Knowledge-Based Software Information System.
CACM, 34(5):34-49, May 1991.
[Selfridge, 1991] P. G. Selfridge. Knowledge Representation Support for a Software Information System. In
Proceedings of the Seventh IEEE Conference on AI
Applications, pages 134-140, Miami Beach, Florida,
February 1991.
Reasoning With Constraints
Catherine Lassez
IBM T.J. Watson Research Center
P.O.Box 704, Yorktown Heights, NY 10598, USA
lassez@watson.ibm.com
Constraints are key elements in areas such as Operations Research, Constructive Solid Geometry, Robotics,
CAD/CAM, Spreadsheets, Model-based Reasoning and
AI. Languages have been designed specifically to solve
constraint problems. More recently, the reverse problem of designing languages that use constraints as primitive elements has been addressed. Constraint handling
techniques have been incorporated in programming languages and systems like CLP(R), CHIP, CAL, CIL, Prolog III, 2LP, BNR-Prolog, Mathematica and Trilogy.
In the rule-based context of Logic Programming, the
CLP scheme [5] provides a formal framework to reason
with and about constraints. The key idea is that the
important semantic properties of Horn clauses do not
depend on the Herbrand Universe or Unification. These
semantic properties and their associated programming
methodology hold for arithmetic constraints and solvability (and in many other domains including strings,
graphs, booleans, ... ). The CLP scheme is a main example of the use of constraints as the primitive building
blocks of a class of programming languages, since logic
formulae can be themselves considered as constraints.
In the same spirit constraints have been introduced
in committed choice languages in Maher [14], and in the
work of Saraswat [15], and in Database querying languages by Kanellakis, Kuper and Revesz [6]. The link
between classical AI work on constraints, and Logic Programming has been described by van Hentenryck [17].
Not surprisingly there are many different paradigms
reflecting the integration of constraints and languages.
The main differences come from the aims of the language: general purpose programming language, database
or knowledge based query language, or a tool for problem
solving. In mathematical programming the focus is on
optimization, in artificial intelligence the focus is on constraint satisfaction and constraint propagation, in program verification the focus is on solvability. This should
be reflected in the design of appropriate languages, but
constraint programming should also have its own focus
and theory.
We have developed a general framework for a systematic treatment of specific domains of constraints. We
recall that a logic formula is viewed as an implicit and
concise representation of its set of logical consequences
and that the answer to a query Q is a set of substitutions
which establish a relationship between the variables of Q,
satisfied if and only if Q is a logical consequence of the
formula. The key point is that a single algorithm, Resolution, is sufficient to answer all queries. These properties
of logic formulae have counterparts in other domains. In
particular, Tarski's theorem for quantifier elimination in
closed fields [16] establishes that an arithmetic formula
can be viewed as representing the set of all its logical
consequences, that is the set of all arithmetic formulae
it entails. Furthermore, a single algorithm, Quantifier
Elimination, is required in analogy with logic formulae
and resolution.
At the design and implementation level, however, the
problems are far more difficult than for logic formulae.
To try and circumvent these problems one must make
heavy use of results and algorithms from symbolic computation, operations research, computational geometry
etc... Also, as in the case of logic formulae, we have
to sacrifice generality to achieve acceptable efficiency by
carefully selecting sets of constraints for which suitable
algorithms can be found.
Parametric queries  Applying the paradigmatic aspects of reasoning with logic formulae to linear arithmetic, we have that:
• a set of constraints is viewed as an implicit representation of the set of all constraints it entails
• there is a query system such that an answer to a
query Q is a relationship that is satisfied if and
only if the query is entailed by the system.
• there exists a single algorithm to answer all queries.
Given a set S of arithmetic constraints as a conjunction of linear equalities, inequalities and negative constraints (disjunctions of inequations), we define a parametric query [7] as:

    ∃α₁, α₂, ..., β  ∀x₁, x₂, ... :  S ⇒ α₁x₁ + α₂x₂ + ... ≤ β
                                     ∧ R(α₁, α₂, ..., β)?

where S is the set of constraints in store and R is a set of linear relations on the parameters α₁, α₂, ..., β.
Parametric queries provide a general formalism to extract information from sets of constraints and to express
standard operations. For instance:
1. is S solvable? If not, what are the causes of unsolvability?
2. does S contain redundancies or implicit equalities?
3. is S equivalent to S'?
4. is it true that x = 2 is implied by S?
5. does there exist a such that x = a is implied by S?
6. does there exist a linear relation αx + βy + ... = γ implied by S?
7. does there exist α₁, α₂, ..., β such that S ⇒ α₁x + α₂y + ... ≤ β and α₁ = 2α₂ − 1?
The solvability query is typical of linear programming
and corresponds to the first phase of the Simplex method.
Finding the causes of unsolvability is a typical problem
of constraint manipulation systems, where the constraints
in store can be modified to restore solvability using feedback information provided by the solver. Queries 2 and
3 both address the problem of constraint representation. Redundancy is a major factor of complexity in
constraints processing and the removal of redundancies
and the detection of implicit equalities are key steps in
building a suitable canonical representation for the constraints [10] [12]. Queries 4 and 5 are classic Constraint
Satisfaction Problems (CSP) and queries 6 and 7 are
generalizations of CSP to linear relations: variables are
bound to satisfy given linear relations instead of simply
values.
A priori, there does not seem to be any real connection between these various queries. However, they
can all be expressed as parametric queries which ask under what conditions on the parameters α₁, α₂, ..., β the
constraint in the query is implied by the constraints in
store. By varying the parameters, specific queries can be
formulated. For instance,
• is x bound to a specific value a?
  ∃α₁, α₂, ..., β s.t. S ⇒ α₁x₁ + α₂x₂ + ... = β, with α₁ = 1, α₂ = 0, ..., β = a.

• is x ground?
  same as above but with β unconstrained.

• does S imply 2x₁ + 3x₂ ≤ 0?
  as above with α₁ = 2, α₂ = 3, ..., β = 0.

• what are the constraints implied by the projection of S on the {x₁, x₂}-plane?
  all parameters except α₁, α₂, β set to 0.

The test for solvability and the classic optimization problem can also be expressed in this way:

• is S solvable?
  as above with all parameters α₁, α₂, ... set to 0, except β ≥ 0.
  (by Fourier's theorem, which states that a set of constraints is solvable if and only if the elimination of all the variables results in a tautology)

• what are the upper and lower bounds of f = x₁ + x₂ + x₃?
  as above with α₁ = 1, α₂ = 1, α₃ = 1, all other parameters set to 0, except β ≥ 0. The answer gives the upper and lower bounds for β, which correspond to the minimum and maximum of f.
Parametric queries generalize logic programming queries
which ask if there exists an assignment of values to the
variables in the query so that the query becomes a logical
consequence of the program clauses. They also generalize
CSP queries, which are restricted to constraints of the
type x = a.
We now must address the problem of finding a finite
representation for the answers to the queries. Parametric queries are more complex than simple conjunctions of
constraints as they involve universal quantifiers, non-linearity and implication. However, by using a result linked
to duality in linear programming [8], we can reduce the
problem to a case of conjunction of linear constraints.
The Subsumption Theorem states that a constraint is
implied by a set of constraints S iff it is a quasi-linear
combination of constraints in S. A quasi-linear combination of constraints is a positive linear combination with
the addition of a positive constant on the right-hand side.
For instance, let S be the set

    {2x + 3y − z ≤ 1,  x − y + 2z ≤ 2,  x − y + z ≤ 0}

and Q be the query

    ∃α, β  ∀x, y :  S ⇒ αx + βy ≤ 1?

The following relations express that the constraint in Q is a quasi-linear combination of the constraints in S:

    2λ₁ + λ₂ + λ₃ = α
    3λ₁ − λ₂ − λ₃ = β
    −λ₁ + 2λ₂ + λ₃ = 0
    λ₁ + 2λ₂ + q = 1
    λ₁ ≥ 0,  λ₂ ≥ 0,  λ₃ ≥ 0,  q ≥ 0

where the λᵢ's are the multipliers of the constraints in S. It is from this simpler formulation that variables are eliminated.
Variable elimination is the key operation to obtain
answers to queries. It plays the role of resolution in Logic
Programming. With inequalities, the complexity problems are far more severe than in Logic Programming,
even in the restricted domain of conjunctions of positive
linear constraints.
Fourier's method  The basic algorithm is Fourier's [2].
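As a reminder of how one elimination step works (a textbook illustration, not an example taken from the paper): to eliminate a variable x, every constraint in which x has a positive coefficient is combined with every constraint in which it has a negative coefficient so that x cancels. For instance, from x + y ≤ 4 and −x + 2y ≤ 2, adding the two inequalities gives 3y ≤ 6, i.e. y ≤ 2; doing this for every such pair, and keeping the constraints that do not mention x, yields a system over the remaining variables. Since each step can roughly square the number of constraints, repeated elimination blows up very quickly.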
The severity of the problem is illustrated by the table below:

    Number of variables    Number of constraints     Actual number of
    eliminated             generated                 constraints needed
    0                      32                        18
    1                      226                       40
    2                      12,744                    50
    3                      39,730,028                19
    4                      390,417,582,083,242       2
The middle column gives the size of the output of Fourier's
method to eliminate between 1 and 4 variables from an initial set of 32 constraints. The rightmost column gives
the minimum size of equivalent outputs. Fourier's elimination is in fact doubly exponential as it generates an
enormous amount of redundant information. Even if we
remove redundancy on the fly, we are still left with exponential size for intermediate computation and potential exponential size for output. To solve this problem,
one must look for output bound algorithms (an important area of study in computational geometry), that will
guarantee an output when its size is small, bypassing the
problem of intermediate swell. Also in the case where
the size of the output is unmanageable, there is no point
in computing it. However, we may sacrifice completeness
and search for an approximation of reasonable size. That
brings us back to avoiding intermediate swell.
The extreme points method This method, derived
from the formalism of parametric queries, is interesting
as it shows that variable elimination can be viewed as
a straightforward generalization of a linear program in
its specification and as a generalization of the simplex
in its execution. Let S = Ax ≤ b and let V be the set of variables to be eliminated; the associated generalised linear program GLP is defined as

    find extr(φ(Λ)),

with the mapping φ and the polytope Λ given by

    φ :  Σᵢ λᵢ aᵢ₁ = α₁,  ...,  Σᵢ λᵢ aᵢₖ = αₖ,  Σᵢ λᵢ bᵢ = β
    Λ :  Σᵢ λᵢ aᵢⱼ = 0 for each eliminated variable xⱼ,  Σᵢ λᵢ = 1,  λᵢ ≥ 0
where extr denotes the set of extreme points. Λ represents the conditions to be satisfied by a combination of constraints of S that eliminates the required variables. The normalization of the λ's ensures that Λ is a polytope. extr(φ(Λ)), the solutions of the GLP, determine a finite set of constraints which defines the projection of S. The coordinates of the extreme points of φ(Λ) are the coefficients of a set of constraints that define the projection.
The objective function in the usual linear program can
be viewed as a mapping from Rn to R, the image of the
polyhedron defined by the constraints being an interval
in R. The optimization consists in finding a maximum
or a minimum, that is one of the extreme points of the
interval. In a GLP, the objective function represents a
mapping from Rn to Rm and instead of looking for one
extreme point, we look for the set of all extreme points.
At the operational level, we can execute this GLP by
generalizing the simplex method. The extreme points of
φ(Λ) are images of extreme points of Λ. So we compute the set of extreme points of Λ, map them by φ and
eliminate the images which are not extreme points. It
is important to note that although the extreme points
method is better than Fourier in general because it elim-
inates the costly intermediate steps, there are still two
main problems: the computation of the extreme points
of Λ can be extremely costly even when the size of the
projection is small and also the method produces a highly
redundant output [1].
The convex hull method Variable elimination has
long been treated as algebraic manipulations based on
the syntax of the constraints rather than their semantics.
Fourier's Procedure and EPM are no exceptions. Consequently, the complexity of these methods is tied to the
initial polyhedral set instead of to the projection itself.
Quantifier elimination can also be viewed as an operation
of projection. Exploiting this remark in a systematic way
leads to more output bound algorithms which guarantee
an output when its size is reasonable and an approximation otherwise [9]. In the bounded case, the idea is trivial: by running linear programs we compute constraints
whose supporting hyperplanes bound both the polytope
to be projected and its projection. The traces of these
hyperplanes on the projection space provide an approximation containing the projection. At the same time the
extreme points provided by the linear programs project
on points of the projection. The convex hull of these
points is a polytope that is included in the projection.
Iterating this process leads to the projection. Whether
we have an output bound algorithm or not will however
depend on the choice of points. The difficulties that remain are that we do not want to make any assumption
on the input polyhedral set which can be bounded or
not, full dimensional or not, redundant or not, empty
or not. Standard linear programming techniques can be
used to determine solvability and to transform the input
if required into a set of equations defining its affine hull
and a set of inequalities defining a full-dimensional polyhedral set in a smaller space. A straightforward variable
elimination in the set of equations gives the affine hull
of the projection which will be part of the final output.
This simplification based on geometrical considerations
allows us to eliminate as many variables as possible by
using only linear programming and gaussian elimination
before getting into the costly part of elimination.
In the bounded case, the algorithm works directly
on the input constraints. The projection is computed
by successive refinements of an initial approximation obtained by computing with linear programs enough extreme points of the projection so that their convex hull is
full-dimensional. Successive refinements consist in adding
new extreme points and updating the convex hull. The
costly convex hull construction is done in the projection
space thus the main complexity of the algorithm is linked
to the size of the output. The process stops when either
the projection has been found or the size of the approximation has reached a user-supplied bound.
In the unbounded case, the problem is reformulated using the generalised linear program representation, which is bounded by definition. phi(Delta) is computed by projection. The output will consist of the convex hull of phi(Delta) together with the set of its extreme points, from which the constraints defining the projection are derived. The advantage over the extreme points method is that we compute directly the extreme points of the projection. We do not need to compute the extreme points of Delta, this computation being the source of enormous intermediate computation and of high redundancy in the output.
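As a rough sketch (ours, not the authors' implementation) of the inner-approximation idea in the bounded case, the function below solves linear programs over a spread of directions, collects the projections of the optimal vertices, and takes their convex hull; restricting to a two-dimensional projection space, the direction sampling, and the helper name are our simplifying choices.

import numpy as np
from scipy.optimize import linprog
from scipy.spatial import ConvexHull

def inner_approximation(A, b, k, n_dirs=16):
    """Approximate the projection of {x : A x <= b} onto its first k coordinates
    by the convex hull of projected LP optima (bounded case; here k is assumed to be 2)."""
    n = A.shape[1]
    pts = []
    for t in np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False):
        d = np.zeros(n)
        d[0], d[1] = np.cos(t), np.sin(t)          # objective direction in the projection space
        res = linprog(-d, A_ub=A, b_ub=b, bounds=[(None, None)] * n)
        if res.success:
            pts.append(res.x[:k])                  # the optimal vertex projects onto the projection
    return ConvexHull(np.array(pts))               # a polytope included in the projection

# Example: project the 3-D unit box {0 <= x_i <= 1} onto (x_1, x_2).
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.concatenate([np.ones(3), np.zeros(3)])
print(inner_approximation(A, b, k=2).volume)       # 1.0, the area of the unit square

Adding further directions (or directions chosen from the facets of the current approximation) refines the inner approximation toward the true projection, which is the iteration described above.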
Implicit equalities and causes of unsolvability
Fourier's algorithm can be used to trace all subsets of
constraints in S that cause unsolvability or that are implicit equalities [11].
By using the quasi-dual formulation, we can achieve the same effects by running linear programs. The quasi-dual formulation which corresponds to Fourier's algorithm is
    Phi:    beta = b^T lambda
    Delta:  { A^T lambda = 0,   sum_i lambda_i = 1,   lambda_i >= 0 for all i }.
Here Phi maps R^m to R, where m is the number of constraints in S. Since we want to compute the minimum of Phi subject to Delta, we need to solve the following linear program D:
    minimize    b^T lambda
    subject to  A^T lambda = 0
                sum_i lambda_i = 1
                lambda_i >= 0 for all i.
It is obvious that, in general, solving S in this manner is
far more efficient than using Fourier's algorithm. Since
D is a variant of the dual simplex in Linear Programming, it inherits nice properties from the standard dual
simplex such as good incremental behavior, no need to
introduce slack variables and no restriction to positive
variables. More importantly, as a side effect of the solvability test we obtain information about the algebraic properties of the constraints and about the geometric structure of the associated polyhedron. The properties of D are summarized in the table below.

  Quasi-dual D          Properties of S
  --------------------  ---------------------------------------------------
  Unsolvable            Strongly solvable; full dimensional;
                        no implicit equalities; unbounded, and
                        no projection has parallel facets
  Minimum positive      Solvable; full dimensional;
                        no implicit equalities; bounded, or
                        there exists a projection with parallel facets
  Minimum zero          Weakly solvable; not full dimensional;
                        implicit equalities exist; an evident minimal
                        subset of implicit equalities
  Minimum negative      Unsolvable; an evident minimal infeasible subset
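One way to run D with an off-the-shelf LP solver is sketched below (ours, not the authors' implementation; scipy's linprog, the tolerance eps, and the return strings are our choices); the sign of the minimum is read off as in the table.

import numpy as np
from scipy.optimize import linprog

def analyse_quasi_dual(A, b, eps=1e-9):
    """Solve D: minimize b^T lambda s.t. A^T lambda = 0, sum(lambda) = 1, lambda >= 0,
    and interpret the outcome as in the table above (a rough sketch)."""
    m, n = A.shape
    A_eq = np.vstack([A.T, np.ones((1, m))])      # A^T lambda = 0 and sum(lambda) = 1
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    res = linprog(c=b, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    if not res.success:                            # D unsolvable
        return "S strongly solvable, full dimensional, no implicit equalities"
    if res.fun > eps:                              # minimum positive
        return "S solvable, full dimensional, no implicit equalities"
    if res.fun >= -eps:                            # minimum zero
        return "S weakly solvable: the support of res.x exhibits implicit equalities"
    return "S unsolvable: the support of res.x is an evident infeasible subset"

# Example: {x <= 1, -x <= -2} is infeasible, so D's minimum is negative.
print(analyse_quasi_dual(np.array([[1.0], [-1.0]]), np.array([1.0, -2.0])))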
Conclusion Much of the existing work on constraints
has been done in diverse domains with their own distinctive requirements. Even in the restricted domain of
linear arithmetic constraints, there is a wealth of knowledge and algorithms. To build systems to reason with
constraints requires borrowing and synthesizing various
notions, and this has led to the emerging concept of a unified framework built on a single representation, the parametric query, and a single solution technique, variable elimination, for handling all the different operations on constraints. This approach shares key aspects with Logic Programming, with variable elimination playing the role of resolution. The viability of this approach, from both the knowledge representation and knowledge processing aspects, is being tested with applications in the domains of spatial reasoning [3] and graphical user interfaces [4]. Empirical results with an initial implementation have shown that a
variety of small (about a hundred inequalities in two dimensions) and fairly large problems (up to about 2,000
inequalities over 70 variables) can be processed in times
ranging from less than a second to a few minutes. Ongoing work includes the design and implementation of an
integrated system based on the proposed framework and
incorporating several solvers. The potential applicability of more recent interior point methods is also being investigated. Many properties of linear arithmetic constraints
hold for constraints in other domains. These properties
have been abstracted and generalized in [13].
References
[1] T. Huynh, C. Lassez and J-L. Lassez, Practical Issues on the Projection of Polyhedral Sets, to appear in Annals of Mathematics and Artificial Intelligence.
[2] T. Huynh, C. Lassez and J-L. Lassez, Fourier Algorithm Revisited, 2nd International Conference on Algebraic and Logic Programming, Springer-Verlag Lecture Notes in Computer Science, 1990.
[3] T. Huynh, L. Joskowicz, C. Lassez and J-L. Lassez, Reasoning About Linear Constraints Using Parametric Queries, in Foundations of Software Technology and Theoretical Computer Science, Springer-Verlag Lecture Notes in Computer Science, vol. 472, December 1990.
[4] R. Helm, T. Huynh, C. Lassez and K. Marriott, A Linear Constraint Technology for User Interfaces, to appear in Proceedings of Graphics Interface '92.
[5] J. Jaffar and J-L. Lassez, Constraint Logic Programming, Proceedings of POPL 1987, Munich.
[6] P. Kanellakis, G. Kuper and P. Revesz, Constraint Query Languages, Proceedings of the ACM Conference on Principles of Database Systems, Nashville, 1990.
[7] J-L. Lassez, Querying Constraints, Proceedings of the ACM Conference on Principles of Database Systems, Nashville, 1990.
[8] J-L. Lassez, Parametric Queries, Linear Constraints and Variable Elimination, Proceedings of DISCO 90, Springer-Verlag Lecture Notes in Computer Science.
[9] C. Lassez and J-L. Lassez, Quantifier Elimination for Conjunctions of Linear Constraints via a Convex Hull Algorithm, IBM Research Report, T.J. Watson Research Center, RC 16779 (1991), to appear, Academic Press.
[10] J-L. Lassez, T. Huynh and K. McAloon, Simplification and Elimination of Redundant Arithmetic Constraints, Proceedings of NACLP 89, MIT Press.
[11] J-L. Lassez and M.J. Maher, On Fourier's Algorithm for Linear Arithmetic Constraints, IBM Research Report, T.J. Watson Research Center, RC 14114 (1988), to appear in Journal of Automated Reasoning.
[12] J-L. Lassez and K. McAloon, A Canonical Form for Generalized Linear Constraints, IBM Research Report, T.J. Watson Research Center, RC 15004 (1989), to appear in Journal of Symbolic Computation.
[13] J-L. Lassez and K. McAloon, A Constraint Sequent Calculus, Proceedings of LICS 90, Philadelphia.
[14] M. Maher, A Logic Semantics for a Class of Committed Choice Languages, Proceedings of ICLP4, MIT Press, 1987.
[15] V. Saraswat, Concurrent Constraint Logic Programming, to appear, MIT Press.
[16] L. van den Dries, Alfred Tarski's Elimination Theory for Real Closed Fields, The Journal of Symbolic Logic, vol. 53, no. 1, March 1988.
[17] P. van Hentenryck, Constraint Satisfaction in Logic Programming, The MIT Press, 1989.
Developments in Inductive Logic Programming
Stephen Muggleton
The Turing Institute,
36 North Hanover Street,
Glasgow G1 2AD,
UK.
Abstract
Inductive Logic Programming (ILP) is a research
area formed at the intersection of Machine Learning
and Logic Programming. ILP systems develop predicate descriptions from examples and background
knowledge. The examples, background knowledge
and final descriptions are all described as logic programs. A unifying theory of Inductive Logic Programming is being built up around lattice-based
concepts such as refinement, least general generalisation, inverse resolution and most specific corrections. In addition to a well established tradition
of learning-in-the-limit results, recently some results
within Valiant's PAC-learning framework have been
demonstrated for ILP systems. Presently successful application areas for ILP systems include the learning of structure-activity rules for drug design, finite-element mesh analysis design rules, primary-secondary prediction of protein structure and fault
diagnosis rules for satellites.
1
Introduction
Deduction and induction have had a long strategic
alliance within science and philosophy. Whereas the
former enables scientists to predict events from theories, the latter builds up the theories from observations. The field of Inductive Logic Programming
[6,8] unifies induction and deduction within a logical
setting, and has already provided notable examples
of the discovery of new scientific knowledge in the
area of molecular biology [5, 7].
2
Theory
In the general setting an ILP system S will be given
a logic program B representing background knowledge and a set of positive and negative examples
(E+, E-), typically represented as ground literals.
In the case in which B does not entail E+, S must construct a clausal hypothesis H such that

    B and H entail E+

where B, H and E- are satisfiable. In some approaches [16, 13] H is found via a general-to-specific
search through the lattice of clauses. This lattice is rooted at the top by the empty clause and is partially ordered by theta-subsumption (H theta-subsumes H' with substitution theta whenever H.theta is a subset of H'). Two clauses are treated as equivalent when they both theta-subsume each other. Following on from work by Plotkin [12], Buntine [1] demonstrated that the equivalence relation over clauses induced by theta-subsumption is generally very fine relative to the equivalence relation induced by entailment between two alternative theories with common background knowledge.
Thus, when searching for the recursive clause for member/2, infinitely many clauses containing the appropriate predicate and function symbols are theta-subsumed by the empty clause. Very few of these entail the appropriate examples relative to the base case for member/2.
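As an illustration of the ordering just described, here is a small sketch (ours, not an ILP system's code) of a theta-subsumption test; clauses are lists of literals over flat terms, strings starting with an upper-case letter are treated as variables, and negated body literals are encoded with a "not_" prefix.

def is_var(t):
    # Convention for this sketch: variables start with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def extend(theta, lit, target):
    """Try to extend substitution theta so that lit maps onto target; None on failure."""
    (pred, args), (tpred, targs) = lit, target
    if pred != tpred or len(args) != len(targs):
        return None
    theta = dict(theta)
    for a, t in zip(args, targs):
        if is_var(a):
            if theta.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return theta

def theta_subsumes(h, h_prime, theta=None):
    """H theta-subsumes H' iff some substitution maps every literal of H into H'."""
    theta = theta or {}
    if not h:
        return True
    for target in h_prime:
        extended = extend(theta, h[0], target)
        if extended is not None and theta_subsumes(h[1:], h_prime, extended):
            return True
    return False

# p(X, Y) :- q(X)   theta-subsumes   p(a, b) :- q(a), r(b).
general = [("p", ("X", "Y")), ("not_q", ("X",))]
specific = [("p", ("a", "b")), ("not_q", ("a",)), ("not_r", ("b",))]
print(theta_subsumes(general, specific))   # True, with theta = {X: a, Y: b}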
Specific-to-general approaches based on Inverse
Resolution [9, 14, 15] and relative least general
generalisation [1, 10] maintain admissibility of the
search while traversing the coarser partition induced
by entailment. For instance Inverse Resolution is
based on inverting the equations of resolution to find
candidate clauses which resolve with the background
knowledge to give the examples. Inverse resolution
can also be used to add new theoretical terms (predicates) to the learner's vocabulary. This process is
known as predicate invention.
Several early ILP authors including Plotkin [12]
and Shapiro [16] proved learning in the limit results.
Recently, ILP learnability results have been proved
within Valiant's PAC framework for learning a single
definite clause [11] and in [3] for learning a multiple
clause predicate definition assuming the examples
are picked from a simple distribution.
3
Applications
ILP is rapidly developing towards being a widely
applied technology. In the scientific area, the ILP
system Golem [10] was used to find rules relating
the structure of drug compounds to their medicinal
activity [5]. The clausal solution was demonstrated
to give meaningful descriptions of the structural factors involved in drug activity with higher acuracy on
an independent test set than standard statistical regression techniques.
In the related area of predicting secondary structure of proteins from primary amino acid sequence
[7] Golem rules had an accuracy of 80% on an independent test set. This was considerably higher than
results of other comparable approaches.
Golem has also been used for building rules for
finite-element-mesh analysis [2] and for building
temporal fault diagnosis rules for satellites [4].
4
Conclusion
Inductive Logic Programming is developing into a
new logic-based technology. The field unifies induction and deduction within a well-founded theoretical
framework. ILP is likely to continue extending the
boundaries of applicability of machine learning techniques in areas which require machine-construction
of structurally complex rules.
References
[1] W. Buntine. Generalised subsumption and its
applications to induction and redundancy. Artificial Intelligence, 36(2):149-176, 1988.
[2] B. Dolsak and S. Muggleton. The application
of Inductive Logic Programming to finite element mesh design. In S.H. Muggleton, editor,
Inductive Logic Programming, London, 1992.
Academic Press.
[3] S. Dzeroski, S. Muggleton, and S. Russell.
PAC-learnability of determinate logic programs.
TIRM, The Turing Institute, Glasgow, 1992.
[4] C. Feng. Inducing temporal fault diagnostic rules
from a qualitative model. In S.H. Muggleton,
editor, Inductive Logic Programming. Academic
Press, London, 1992.
[5] R. King, S. Muggleton, R. Lewis, and M. Sternberg. Drug design by machine learning: The
use of inductive logic programming to model
the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proceedings of the National Academy of
Sciences (to appear), 1992.
[6] S. Muggleton. Inductive logic programming. New Generation Computing, 8(4):295-318, 1991.
[7] S. Muggleton, R. King, and M. Sternberg. Predicting protein secondary-structure using inductive logic programming, 1992. submitted to
Protein Engineering.
[8] S.H. Muggleton. Inductive Logic Programming.
Academic Press, 1992.
[9] S.H. Muggleton and W. Buntine. Machine invention of first-order predicates by inverting
resolution. In S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[10] S.H. Muggleton and C. Feng. Efficient induction of logic programs. In S.H. Muggleton,
editor, Inductive Logic Programming, London,
1992. Academic Press.
[11] D. Page and A. Frisch. Generalization and
learnability: A study of constrained atoms. In
S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[12] G.D. Plotkin. Automatic Methods of Inductive
Inference. PhD thesis, Edinburgh University,
August 1971.
[13] R. Quinlan. Learning logical definitions from
relations. Machine Learning, 5:239-266, 1990.
[14] C. Rouveirol. Extensions of inversion of resolution applied to theory completion. In S.H. Muggleton, editor, Inductive Logic Programming.
Academic Press, London, 1992.
[15] C. Sammut and R.B. Banerji. Learning concepts by asking questions. In R. Michalski, J. Carbonell, and T. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, Vol. 2, pages 167-192. Kaufmann, Los Altos, CA, 1986.
[16] E.Y. Shapiro. Algorithmic program debugging.
MIT Press, 1983.
Towards the General-Purpose Parallel Processing System
Kazuo Taki
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, JAPAN
taki@icot.or.jp
1
Introduction
The processing power of recent microprocessors is growing very rapidly; it has almost overtaken that of mainframe computers. Trends in the continuous improvement of semiconductor technology suggest that the processing power of one-chip processor devices will reach 2000 MIPS by the end of the 1990s, and that a parallel computer system with 1000 processors, installed in a single cabinet, will realize a peak speed of 2 TIPS (tera instructions per second).
Such gigantic hardware power is no longer hard to imagine: recent large-scale parallel computers for scientific processing, which have just appeared on the market, already suggest the trend towards large parallel computers.
However, the software technology on those scientific parallel computers focuses on very limited application domains, and the hardware design is also somewhat shifted towards those applications. The parallel processing paradigm on those systems is data parallelism. Problem modeling, language specification, compiling techniques, part of the OS design, etc. are all based on data parallelism. The characteristic of data-parallel computation is regular computation on uniform data or, in other words, synchronous computation. The coverage of this paradigm is limited to a narrow range of application domains, such as dense matrix computation, image processing, and other problems with regular algorithms on uniform data.
To make full use of the gigantic-power parallel machines of the future, other parallel processing paradigms, covering a much wider range of application domains, need to be developed.
2
New Domain of Parallel Application
Knowledge processing is the target application domain of the FGCS project. The characteristics of knowledge processing problems differ greatly from those of scientific computations based on the data-parallel paradigm.
Dynamic and non-uniform computation often appears in knowledge processing. For example, when a heuristic search problem is mapped onto a parallel computer, the workload of each computation node changes drastically depending on the expansion and pruning of the search tree. Also, when a knowledge processing program is constructed from many heterogeneous objects, each object gives rise to non-uniform computation. The computation loads of these problems can hardly be estimated before execution.
These large computation problems with dynamism and non-uniformity are called the dynamic and non-uniform problems in this paper. When a system supports a new computation paradigm suitable for the dynamic and non-uniform problems, its coverage of application domains must expand not only to knowledge processing but also to some classes of large numerical and symbolic computation that have little data-parallelism.
3
Research Themes
The dynamic and non-uniform problems arise new requirements mainly on the software technology. They
need more complex program structure and more sophisticated load balancing scheme than that of the dataparallel paradigm.
These items, listed below, have not been studied
enough for the dynamic and non-uniform problems with
large computation.
1. Modeling scheme to realize large concurrency
2. Concurrent algorithms
3. Programming techniques
4. Load balancing schemes
5. Language design
6. Language implementation
7. OS implementation
8. Debugging and performance monitoring support
The latter five items belong to the design and implementation of the system layer. The former three items belong to the application layer or to a more general framework of software development.
4
Approach
The approach taken in the FGCS project has been that the system layer (covering topics 5 to 8 in section 3) was carefully tailored to suit the dynamic and non-uniform problems, and the topics of the upper layer (1 to 4) were then studied on top of that system.
Key Features in the System Layer: The system
layer satisfies these items to realize efficient programming
and execution of the target problems.
1. Strong descriptive power for complex concurrent programs
2. Ease of removing bugs
3. Ease of dynamic load balancing
4. Flexibility for changing the load allocation and scheduling schemes, to cope with the difficulty of estimating actual computation loads before execution
Mainly, the language features realize these characteristics, and the language implementation supports efficiency. The key language features are listed below.
• Small-grain concurrent processes: A lot of communicating processes with complex structure can be easily described, realizing large concurrency.
• Implicit synchronization/communication: They are performed between concurrent processes even in remote processors, which helps to write less buggy programs.
• Separation of concurrency description and mapping: Programmers firstly describe the concurrency of the program without concerning themselves with mapping (load allocation). Mapping can be specified with a clearly separated syntax after the concurrency description is finished. Runtime support for implicit remote synchronization enables it.
• Handling of scheduling without destroying the clear semantics of the single-assignment language
• Handling of a group of small-grain processes as a task
The language implementation realizes an efficient execution of these features, including an efficient kernel implementation of memory management, process scheduling, communication, virtual global name space, etc. [Taki 1992]. The other functions, which are written in the language, realize a research and development environment for parallel software, including a programming system, task management functions, etc.
Research for the Upper Layer: Research topics 1 to 4 in section 3 have been studied. After toy problems have been tested enough, R&D on practical large applications becomes important.
Strong cooperation between experts on the application domains and experts on parallel processing is indispensable for such R&D. Several R&D teams have been formed, one for each application development. Firstly, the research topics have been studied focusing on each application; then commonly applicable paradigms and schemes are extracted and supported by the system as libraries, as functions or as programming samples.
5
Current Status
System Implementation: A concurrent logic programming language, KL1, which has the features listed in section 4, has been efficiently implemented on the parallel inference machine PIM. A parallel operating system, PIMOS, which is written in KL1, supports an R&D environment for parallel software.
The very low-cost implementation of those features [Taki 1992] encourages the research of load balancing schemes. The language features help the research of various concurrent algorithms and programming techniques.
Application Development: Practical large applications have been implemented [Nitta 1992], such as:
• LSI-CAD system: logic simulation / placement
• Genome analysis system: protein sequence analysis / folding simulation / structure analysis
• Legal reasoning system
• Go game playing system
• Eight other application programs with different knowledge processing paradigms
Most of them give rise to dynamic or non-uniform computation. Some measurements show very good speedup and absolute speed by parallel processing.
Common Paradigms and Schemes: Efforts on extracting common paradigms and schemes from each application development have been continuing. Categorization of dynamic process structures and of load distribution schemes has been carried on. Performance analysis methodologies have also been studied [Nitta 1992].
A multi-level dynamic load distribution scheme for search problems is already supported as a library program. A modeling, programming and mapping scheme based on a lot of small concurrent objects has been commonly used among several application programs.
6
Conclusion
New paradigms of parallel processing that can cover the dynamic and non-uniform problems are expected to expand the application domains of parallel processing far beyond what they are today.
The dynamic and non-uniform problems must form a large application domain of parallel processing, coming next to the applications based on data-parallelism. Parallel processing systems that support efficient programming and execution of the dynamic and non-uniform problems will get close to the general-purpose parallel processing system.
The KL1 language system, developed in the FGCS project, realizes many useful features for efficient programming and execution in that problem domain. Many application developments have been proving the effectiveness of the language features and of their implementation.
R&D on problem modeling schemes, concurrent algorithms, programming techniques and load balancing schemes for that problem domain has started in the project, and still has to be continued. The accumulation of such software technology will lead to the true general-purpose parallel processing system.
References
[Nitta 1992] K. Nitta, K. Taki and N. Ichiyoshi. Experimental Parallel Inference Software. In Proc. of the Int. Conf. on FGCS, 1992.
[Taki 1992] K. Taki. Parallel Inference Machine PIM. In Proc. of the Int. Conf. on FGCS, 1992.
A Hybrid Reasoning System For Explaining Mistakes
In Chinese Writing
Jacqueline Castaing
Univ. Paris-Nord, LIPN / CSP, Avenue J-B Clement
93430 Villetaneuse France
jc@lipn.univ-paris13.fr
Abstract
We present in this paper a hybrid reasoning system for
Explaining Mistakes In Chinese Writing, called EMICW.
The aim of EMICW is to provide students of the chinese
language with a means to memorize characters. The
students write down from EMICW 's dictation. In case of
graphic errors, EMICW will explain the reasons of this
error by using either the etymology of characters or some
efficient mnemonic techniques.
EMICW has multiple representations associated to
multiple reasoning methods. The coherence of the
reasoning is ensured by means of a common logic
formalism, the FLL-theories, derived from Girard's linear
logic.
1 Introduction
The main aim of the system EMICW is to provide students
of the chinese language with a means to memorize chinese
characters without losing heart. The first obstacle for people
accustomed to an alphabet is indeed the great number of
characters to sink in. We propose to them to write down
from EMICW's dictation. In the case where students are
mistaken about a character, the system will explain the
reasons of this graphic error either by using the origin of
the character [Henshall 1988], [Ryjick 1981], [Wieger
1978], or by invoking an efficient mnemonic technique.
EMICW is a hybrid knowledge representation and
reasoning system [Brachman, et al. 1985], [Kazmareck et
al. 1986], [Nebel 1988]. It has multiple representations - a semantic network associated with inference rules expressed in the formalism of Gentzen's calculus [Gentzen 1969] - associated with multiple reasoning methods. The set of
inference rules defines the main cases of mistakes that the
author of this article and school fellows could make during
their own initiation into the chinese writing. The learning
methods used are given in [Bellassen 1989], [De Francis
1966], [Lyssenko and Weulersse 1987] [Shanghai Press
1982].
To ensure a coherent reasoning, EMICW has a common
logic formalism, the FLL-theories [Castaing 1991],
borrowed from Girard's linear logic [Girard 1987, 1989].
The system essentially performs monotonic abduction [Bylander 1991]. So, let a be the correct Chinese character the student should write down from EMICW's dictation. Let b be the actual answer given by the student. If the student is mistaken, it means that the character a is different from b; the binary predicate Error(a, b) is then set to the value true. An explanation of a graphic error consists in finding a set of first-order formulas Sigma such that a proof of the linear sequent Sigma |- Error(a, b) can be carried out in an FLL-theory. The set of formulas of Sigma shows the different causes of the confusion of the character a with the character b. For example, the two
characters a and b may have the same sound (they are
homophonic), or they may share the same graphic
components, and so on.
In this paper, we first briefly outline the history of
chinese characters [Alleton 1970], [Henshall 1988], [Li
1991], [Ryjick 1981], [Wieger 1978], so that the reader can appreciate how a character is made up and how it acquired its structure, and can form an opinion on the difficulties of Chinese writing. We also give the terminology we use. In the third section, we discuss the problem of character representation and recognition, which explains the limitations of our system. Then, after describing the system EMICW (section 4), we will give, in section 5, an example of explanation in the FLL-theory T. The essential point of section 6 is the proof of the tractability of our system.
2
Chinese Writing
The Chinese characters originated between 3000 and 2000 B.C. in the Yellow River valley of China. They have been the subject of
numerous studies. In this paper, we limit ourselves to
mentioning what is essential for a good understanding of
our work.
The chinese characters, also called sinograms (letters
from China) are written in square form with the help of
strokes, for example, horizontal stroke, vertical stroke. A
set of 24 strokes standardized by the Foreign Languages
Institute of Beijing are now of general use (see section 3.1).
Strokes must be written down according to established
principles of stroke order (generally from top to bottom,
and from left to right) called calligraphic order. A
knowledge of these principles is important in order to
achieve the proper shape and to write in the cursive style or
semi-cursive style (the everyday writing style of the Chinese).
Sinograms are monosyllabic, and each syllable has a
definite tone. There are four basic tones in the official
national language (called mandarin chinese too). The
transliteration used in this article is based on the official
Chinese phonetic system, called pinyin, which is a
representation of the sounds of the language in the Latin
alphabet. We mark tones with numbers from 1 to 4.
Sinograms have traditionally been classified into six
categories. However, in many cases the categorization is
open to differences of opinion, and one sinogram can
legitimately belong to more than one category. We list
below the main categories that shed considerable light on
the nature of sinograms. The students should consider these
categories as guides to remembering sinograms.
1. The simple pictogram: essentially a picture of a single physical object. For example, woman 女 nu3, child 子 zi3.
2. The complex pictogram: a picture of several physical objects normally indissociable. For example, good 好 hao3.
3. The ideogram: a meaningful combination of two or more pictograms chosen for their meanings. For example, from the pictograms sun 日 ri4 and moon 月 yue4, the ideogram intelligent 明 is derived.
4. The ideo-phonogram: the largest category, containing about 90% of the sinograms. Essentially a combination of a semantic element with a phonetic element. For example, the ideo-phonogram seed 籽 zi3 is obtained by combining the semantic element cereal 米 mi3 with the phonetic element child 子 zi3, which gives the character its reading. In fact, only about 30% of sinograms have a real phonetic component as in the example. Chinese (like any other language which is still spoken) has changed since its origin, so the phonetic element has lost its property.
The classification of sinograms in dictionaries can be
done with the help of several methods. The number of
strokes method and the alphabetical order (based on the
pinyin romanization) method are easy to apply. The four
corner method considers particular strokes located at the
four corners of the sinogram. These strokes are codified with the help of four (or five) digits, and the sinogram is located at the position given by its numerical representation. The radical method uses a particular element in a sinogram, the key element, which indicates the general nature of the character. For instance, the ideo-phonogram 籽 zi3 is located under the radical 米. The character dictionary Xin Hua Zi Dian (1979 edition) lists the sinograms with respect to 189 radicals.
About five to seven thousand sinograms of up to ten or
so strokes are needed in order to master the Chinese
writing. The usual technique for learning consists in writing
down a sinogram until it sinks in. We believe that the key
to successful study of sinograms does not lie in rote
learning. We propose a way to make the task a lot easier.
For each case of mistake, our EMICW system gives an
explanation based on the etymology of the characters. For
instance, the character 天 tian1 (sky) can be confused with the character 夫 fu4 (adult), because they have similar graphics. In fact, the character 天 comes from 大 da4 (tall) and from the graphic 一 yi1 (one), which here represents a hat, while the character 夫 comes from 大 and from the graphic 一, which here means a hairpin. The position of the strokes can be meaningful. If such an explanation is
of the strokes can be meaningful. If such an explanation is
given to the students in case of error, they progressively
will be able to correct their own mistakes by reasoning,
without relying heavily on memory. Moreover, they can
consider these explanations as an introduction to the history
of Eastern Asia.
We list below the main cases of mistakes we have met in
our study of the chinese language:
1. Confusion of homophonic sinograms: about
50000 sinograms share four hundred syllables. According
to official statistics each syllable with its tone corresponds
to an average of five distinct sinograms. So, the first
difficulty for students is to distinguish the homophonic
sinograms.
For example, ten 十 shi2, moment 时 shi2, and to know 识 shi2, which are homophonic sinograms, can be confused in a dictation.
2. Confusion of sinograms with similar graphics: For example, 己 ji3, 已 yi3 and 巳 si4 have similar graphics; 天 tian1 and 夫 fu4 (adult) have similar graphics too. It happens that the mistaken graphic is not a sinogram. For example, instead of half 半 ban4, the student (the author of this article) wrote a graphic which is not a sinogram.
3. Confusion of sinograms which share the same components: for example, the sinograms di4 and chi2, which share a common component.
4. Confusion of sinograms which form a word: The sinograms are monosyllabic, but Chinese words are generally dissyllabic. For example, the words 身体 shen1ti3 (body), 共同 gong4tong2 (together), and 说话 shuo1hua4 (to talk). The students usually learn dissyllabic words, so they happen to confuse a sinogram with another.
We can also mention the confusion of simplified forms with non-simplified forms of sinograms, and the case of missing strokes: very complex sinograms may have about thirty strokes, so missing strokes are a very frequent mistake.
3
The Graphics Capture
Students write down sinograms from EMICW's dictation.
A "good" method for representing graphics should allow
the system to rapidly recognize the graphics drawn which
are not automatically sinograms, because students
can be mistaken. The different classification and search techniques in dictionaries that we mentioned in the previous paragraph make it possible to locate a character, but not to correct it. For instance, the four corners method does not take into account all the strokes drawn by the student, so it cannot be used to correct mistakes. The recognition problem of sinograms has been the subject of numerous studies. The latest results can be found in [Wang 1988] and [Yamamoto 1991].
3.1
Data Capture
In our particular application, we have to "understand"
graphics drawn by students in order to help them in case
of error. Each graphic drawn is characterized by the type of
strokes used, the calligraphic order of strokes, and their
positions in a square. In order to capture all these data, the
system displays the set of 24 standardized strokes. In fact,
only six strokes are primary ones: the point l
(pt), the
horizontal stroke (hr), the vertical stroke
I (vt), the
top to left bottom stroke ) (dg), the top to right bottom
stroke ' (dd), and the back up stroke".(rt). All
other strokes derived from these primary ones. These
strokes are implemented by means of graphical primitives
such as line drawing, rectangle and arc drawing. The
students arrange strokes to draw graphics inside a square,
the pictures may be expanded or shrunk to fit their
destination square. For instance, the sinogram ~ tian1
(sky) can be written down in the following square by
means of strokes of types hr, dg, and dd, according to the
calligraphic order of writing (hr hr dg dd):
3.2
Graphic Feature of a Sinogram
As the position of strokes can be meaningful, we propose
to locate each stroke in terms of coordinates on a plane (the
coordinate plane is a two-dimension grid, which
corresponds to the square drawn above, the coordinate
origin (0, 0) being at the left top corner of the square).
We sort out strokes with respect to their coordinates: from
top to bottom (top-down order) , from bottom to top
(bottom-up order), from left to right (left-right order), from
right to left (right-left order). So, every graphic is
characterized by the set of following codifications: the
calligraphic order of strokes, the top-down, the bottom-up,
the left-right and the right-left orders of strokes. For
instance, the graphic feature of the sinogram 天 tian1 is given by the calligraphic order of strokes (hr hr dg dd), the top-down order (hr dg hr dd), the bottom-up order (dg dd hr hr), the left-right order (hr hr dg dd), and the right-left order (hr hr dd dg). We now show how all that knowledge can be used to explain graphic errors.
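To make the encoding concrete, here is a small sketch of ours (not the system's code) that derives the five stroke orders from a list of (stroke-type, x, y) triples given in calligraphic order; representing each stroke by a single reference point and the sort keys below are our simplifying assumptions, since the paper does not fix this detail.

def graphic_feature(strokes):
    """strokes: list of (stroke_type, x, y) in calligraphic order,
    with (0, 0) at the top-left corner of the writing square.
    Returns the five stroke orders used as the graphic feature of a graphic."""
    return {
        "c-o":   [s for s, _, _ in strokes],                                  # calligraphic order
        "o-t-d": [s for s, _, _ in sorted(strokes, key=lambda t: t[2])],      # top-down
        "o-b-u": [s for s, _, _ in sorted(strokes, key=lambda t: -t[2])],     # bottom-up
        "o-l-r": [s for s, _, _ in sorted(strokes, key=lambda t: t[1])],      # left-right
        "o-r-l": [s for s, _, _ in sorted(strokes, key=lambda t: -t[1])],     # right-left
    }

# A hypothetical three-stroke graphic (not a real sinogram), just to show the output shape.
demo = [("hr", 1, 1), ("vt", 2, 2), ("dg", 0, 3)]
print(graphic_feature(demo)["o-t-d"])   # ['hr', 'vt', 'dg']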
4
Knowledge Representation
The representation language of EMICW is a restricted version of the frame-based language KL-ONE [Brachman and Smolze 1985] - for instance, it does not support structural dependency relations.
EMICW has a terminological component, the data base, associated with an assertional component. The assertional component is a set of rules expressed in terms of predicates which are defined in the terminological component. Let us first justify our choice; then we will describe the language.
In order to deal with all the cases of mistakes listed in section 2, we need a representation system which allows us to define all the links of "proximity" between the objects manipulated, i.e. graphics which are (or which are not) sinograms: for instance, homophonic links between two different sinograms, or graphic similarity between a graphic and its components. The inheritance link IS-A (B IS-A A means intuitively that all instances of B are also instances of A) and the properties which correspond to roles fit our problem very well. For efficiency reasons, we have to find a trade-off between the expressive power of the representation language and the computational tractability of the relation IS-A (called the subsumption relation). In [Castaing 1991], we analysed the relation B IS-A A, and we proved that, under some restrictions, a subsumption criterion can be defined. A matching algorithm based on this criterion computes subsumption in polynomial time. In the system EMICW, we increase the expressive power of our language by adding to the system an assertional component, which only deals with existential rules. In section 6, we will discuss the computational complexity of our system.
4.1
Terminological component
Concepts are labelled collections of (attribute, value) pairs. The main concepts are the following ones: Stroke, Graphic-Feature (abbreviated as G-F), Graphic-Meaning (abbreviated as G-M), Graphic-Sound (abbreviated as G-S), Syllable, Meaning.
Individual concepts, denoted by small letters, are instances of concepts denoted by capital letters.
Attributes are classified into the structural link IS-A and properties.
The IS-A link is used for inheritance. So, if two concepts B and A are linked by means of the IS-A link, we say that A subsumes B, and that the concept B is of type A.
Properties are related to the intrinsic features of concepts. The attribute values are concepts too. The main properties in the system are the following ones: c-o (abbreviation of stroke calligraphic order), o-t-d (abbreviation of top-down order), o-b-u (abbreviation of bottom-up order), o-l-r (abbreviation of stroke order from left to right), o-r-l (abbreviation of stroke order from right to left), sound (pronunciation), etymo (abbreviation of etymology).
We give below a general view of the classification of the main concepts in the EMICW taxonomy. To make the presentation clear, we use an ordering graph (semantic network), where the bold arrow -> represents the IS-A relation, and the arrow -> represents the roles.
In the taxonomy given above, there are only individual
concepts of type Meaning. For instance, the words tall and
hat are instances of Meaning. The concepts of type Syllable
correspond to the syllables of the chinese language without
tone. For instance, the concept Tian is of type Syllable. An
instance of the concept Tian may be tian1 (first tone). The
concepts of type Stroke correspond to ordered sequences of
strokes. Let Sa and Sb be two concepts of type Stroke. Sb
IS-A Sa if and only if the strokes in Sa also appear in Sb in
the same order. For instance, the concept Sa which
corresponds to the sequence of strokes (hr dg dd)
subsumes the concept Sb given by the sequence (hr hr dg
dd). Intuitively, this relation means that the graphics drawn
by means of the ordered sequence of strokes (hr hr dg dd )
have been partially drawn by means of the ordered
sequence (hr dg dd) too. The concepts of type G-F give the
graphic features of sinograms. The meaning of a sinogram
is given by the property etymo, and its reading is given by the property sound. It may happen that two different sinograms have the same graphic feature: for instance, to love 好 hao4 and good 好 hao3. So, we define concepts of type G-M (Graphic Meaning) and G-S (Graphic Sound), such that each sinogram in the data base can be considered as an instance of the concepts G-M and G-S.
We now give an example of a sinogram representation.
Example-1:
Let a100 be the sinogram 天 tian1. Its graphic feature can be defined by means of the concept G-F100, which is characterized by the following (attribute, value) pairs:
G-F100 = {(c-o, (hr, hr, dg, dd)), (o-t-d, (hr, dg, hr, dd)), (o-b-u, (dg, dd, hr, hr)), (o-l-r, (hr, hr, dg, dd)), (o-r-l, (hr, hr, dd, dg))}.
The sinogram a100 inherits its meaning (sound) from the concept G-M100 (G-S100), partially defined by the following sets of (attribute, value) pairs:
G-M100 = {(IS-A, G-F100), (etymo, sky)}
G-S100 = {(IS-A, G-F100), (sound, tian)}
So, the sinogram a100 is an individual concept of type G-F100 defined by the (attribute, value) pairs: a100 = {(IS-A, G-F100), (etymo, sky), (sound, tian1)}.
End of Example-1.
The graphics drawn by the student during a dictation are not
automatically sinograms. So, we first consider them as
concepts of type G-F (Graphic Feature). We solve the
recognition problem of graphics by means of a classifier
[Brachman and Levesque 1984].
4.1.1
Classifier
Usually the role of the classifier in a KL-ONE taxonomy
consists in placing automatically a concept at its proper
location. For classifying concepts in EMICW taxonomy,
we proceed in two steps:
1. From the graphic drawn by the student, we define the
concept CG (Complete Graphic) related to the properties
c-o, o-t-d, o-b-u, o-l-r, o-r-l of the components
2. We look for the concepts A and B, such that A subsumes
CG, CG subsumes B, and there does not exist a concept A'
which can be located between A and CG, and a concept B'
which can be located between CG and B. We place CG,
and we say that CG is at its optimal location in EMICW
taxonomy. It means that CG inherits from all its ancestors.
A is said to be a father of CG. B is said to be a son of
CG. In case the concepts A and B are identical, we say that
CG has been identified with A (or with B).
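A minimal sketch of ours of this two-step placement: given a subsumption test (assumed provided) and the current taxonomy, the function finds the fathers and sons of CG and reports identification when a concept is both.

def classify(cg, taxonomy, subsumes):
    """Place cg in the taxonomy: return its fathers (most specific concepts that
    subsume cg), its sons (most general concepts that cg subsumes), and any
    concept cg can be identified with. `subsumes(a, b)` is assumed given."""
    above = [c for c in taxonomy if subsumes(c, cg)]
    below = [c for c in taxonomy if subsumes(cg, c)]
    fathers = [a for a in above
               if not any(a2 is not a and subsumes(a, a2) for a2 in above)]
    sons = [b for b in below
            if not any(b2 is not b and subsumes(b2, b) for b2 in below)]
    identified = [c for c in fathers if c in sons]   # the A = B case of the text
    return fathers, sons, identified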
4.1.2
Recognition Problem
The recognition problem consists in discovering an
individual concept b of type Sino which has the same graphic feature as CG. We proceed as follows:
1. By means of the classifier, we place the concept CG at its optimal location.
2. If CG can be identified with a concept G-Fn of type G-F, it means that there exists at least one sinogram which is an instance of G-Mn and G-Sn. Let cf be this particular instance of G-Mn and G-Sn. We identify CG with cf, and CG "wins" all the properties of cf, for example, the
properties sound, and etymo. We give an example.
Example-2
Let us suppose that the graphic drawn by the student is 天 tian1 (sky). The concept CG has the following properties (after sorting out the strokes with respect to their coordinates):
CG = {(c-o, (hr hr dg dd)), (o-t-d, (hr dg hr dd)), (o-b-u, (dg dd hr hr)), (o-l-r, (hr hr dg dd)), (o-r-l, (hr hr dd dg))}.
The concept CG placed at its optimal location can be identified with the concept G-F100 (see Example-1):
G-F100 = {(c-o, (hr hr dg dd)), (o-t-d, (hr dg hr dd)), (o-b-u, (dg dd hr hr)), (o-l-r, (hr hr dg dd)), (o-r-l, (hr hr dd dg))},
and so can be identified with the instance a100 of G-M100 and G-S100. The concept CG gains the properties sound and etymo of a100.
End of Example-2.
Our recognition procedure is a little drastic. It may
happen in sinograms with multiple components that some
strokes in a component have no link with those in another
component. By sorting out all strokes, we consider that
they are necessarily linked, so, we detect a graphic error
and reject the graphic proposed by the student. Our
recognition procedure suits sinograms (simple or complex)
whose components are specified by the students.
4.2
Rules
The rules of the assertional component deal with the
different cases of error in chinese writing. All the predicates
manipulated are defined in the terminological component
either as unary predicates (concepts) or as binary predicates
(roles), except for the predicates Error, (different), and =
(equivalence). We explain now how the confusion of
sino grams can be interpreted by means of the predicate
Error.
Let a be the sinogram of the dictation, and CG be the
complete concept obtained from the graphic drawn by the
student. The student's answer is considered correct (there is
no error) if and only if:
1. The concept CG is recognized as a sinogram denoted by
b.
2. The individual concepts a and b share exactly the same
properties.
Two cases of error are possible:
1. The concept CG cannot be identified with a concept of
type Graphic- Feature of a sinogram. It means that the
graphic drawn is not a sinogram.
2 The concept CG is recognized as a sinogram denoted by
b, but the sinograms a and b do not share the same
properties.
In the first case, the concept CG is located at its optimal
position, and has a father that we denote by B. We consider
an individual concept b of type B, and we propose to
explain the confusion of a with b. The choice of an individual b may depend on a strategy. For the time being, in our application, we identify CG with an individual
which has the same graphic feature as B. In the second
case, we propose to directly explain the confusion of a with
b. The individual concepts pointed out by our system
during an explanation are the witnesses of the error.
The rules of the assertional component have a limited
syntax. Their general form is: "If there-exists x such that P
(x) then Error (a, b)", where x is a vector of variables,
and P is a finite conjunction of predicates. For instance, the
rule "If there-exists z such that Syllable(z) & Sound(a, z) & Sound(b, z) then Error(a, b)" can be used in order to explain a mistake between two sinograms a and b which are homophonic. We give some examples of rules expressed in the sequent calculus formalism.
rule-1: ∃z Syllable(z) & Sound(a, z) & Sound(b, z) |- Error(a, b)
rule-2: ∃u z m1 m2 G-M(u) & G-M(z) & Meaning(m1) & Meaning(m2) & m1 ≠ m2 & u ≠ a & z ≠ b & Etymo(u, m1) & Etymo(z, m2) & Etymo(a, m1) & Etymo(b, m2) & Error(u, z) |- Error(a, b)
rule-3: ∃s Stroke(s) & c-o(a, s) & c-o(b, s) |- Error(a, b)
The rule-1 deals with errors due to homophonic sinograms. The rule-2 explains that the confusion of a with b may come from a misunderstanding of the etymologies of some components of the sinograms a and b. The rule-3 stresses the importance of the calligraphic order: two sinograms with the same calligraphic order can be confused.
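To make these rules concrete, here is a toy sketch of ours (not EMICW's prover) that collects the witnesses used by rule-1 and rule-3; the record fields mirror the properties sound and c-o of section 4.1, and the message strings are illustrative.

def explain_error(a, b):
    """Collect simple explanations for confusing sinogram a with sinogram b,
    in the spirit of rule-1 (same syllable) and rule-3 (same calligraphic order)."""
    reasons = []
    if a["sound"] == b["sound"]:                      # rule-1: homophonic sinograms
        reasons.append("homophones: both read " + a["sound"])
    if a["c-o"] == b["c-o"]:                          # rule-3: same calligraphic order
        reasons.append("same calligraphic order of strokes: " + " ".join(a["c-o"]))
    return reasons

a100 = {"sound": "tian1", "c-o": ["hr", "hr", "dg", "dd"], "etymo": "sky"}
b100 = {"sound": "fu4",   "c-o": ["hr", "hr", "dg", "dd"], "etymo": "adult"}
print(explain_error(a100, b100))   # ['same calligraphic order of strokes: hr hr dg dd']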
5
Explanation in Terms of Proofs
In this section, we first present a formal description of EMICW by means of the FLL-theory T; then we give an example of explanation. The FLL-theories use a fragment of linear logic (see also [Cerrito 1990] and [Masseron et al. 1990] for some particular applications of this logic). We suppose the reader familiar with sequent calculus. In the next chapter, we will discuss the tractability of EMICW.
5.1
Formal Description of EMICW
The FLL-theories are built from the linear fragment which consists of the connective & (conjunction), the connective γ (disjunction), and the linear negation denoted by ( )°. The essential feature of the fragment used is the absence of the contraction and weakening rules listed below:

    (C-l):  from  Γ, A, A |- Δ    infer  Γ, A |- Δ
    (C-r):  from  Γ |- Δ, A, A    infer  Γ |- Δ, A
    (W-l):  from  Γ |- Δ          infer  Γ, A |- Δ
    (W-r):  from  Γ |- Δ          infer  Γ |- Δ, A

The axiom and the rules of the fragment are the following ones:

    Axiom:   A |- A

    Exchange rules:
    (Ex-l):  from  Γ, A, B |- Δ   infer  Γ, B, A |- Δ
    (Ex-r):  from  Γ |- Δ, A, B   infer  Γ |- Δ, B, A

    Cut:
    (C):     from  Γ |- Δ, A  and  A, Γ' |- Δ'   infer  Γ, Γ' |- Δ, Δ'

    Logical rules:
    (&-l1):  from  Γ, A |- Δ                      infer  Γ, (A & B) |- Δ
    (&-l2):  from  Γ, B |- Δ                      infer  Γ, (A & B) |- Δ
    (&-r):   from  Γ |- A, Δ  and  Γ |- B, Δ      infer  Γ |- (A & B), Δ
    (γ-l):   from  Γ, A |- Δ  and  Γ', B |- Δ'    infer  Γ, Γ', (A γ B) |- Δ, Δ'
    (γ-r):   from  Γ |- A, B, Δ                   infer  Γ |- (A γ B), Δ
    (°-l):   from  Γ |- A, Δ                      infer  Γ, A° |- Δ
    (°-r):   from  Γ, A |- Δ                      infer  Γ |- A°, Δ
    (∀-l):   from  Γ, A(t/x) |- Δ                 infer  Γ, ∀x A |- Δ
    (∀-r):   from  Γ |- A, Δ                      infer  Γ |- ∀x A, Δ
    (∃-l):   from  Γ, A |- Δ                      infer  Γ, ∃x A |- Δ
    (∃-r):   from  Γ |- A(t/x), Δ                 infer  Γ |- ∃x A, Δ

In rules (∀-r) and (∃-l), x must not be free in Γ and Δ.
An FLL-theory can be obtained from the above fragment by adding a finite set of proper axioms S, which are sequents closed under substitution. In the cut rule given above, the formula A is the cut-formula. A proof in an FLL-theory is said to be cut-free if all cut-formulas involved occur in some sequent of S.
In our particular application, the set of proper axioms S which completely defines the FLL-theory T is made up of two subsets S1 and S2. The subset of proper axioms S1 corresponds to the terminological component. These axioms have the general form A |- B, where A and B are literals which interpret either concepts or roles. So, the terminological component of EMICW can be formally described by the FLL-theory T1 limited to the set of proper axioms S1. The subset of proper axioms S2 is given by the rules of the assertional component.
5.2
To Explain is To Prove
EMICW combines the two following different reasoning methods:
1. The classifier, which performs inferences by means of the subsumption operation.
2. A theorem prover, which applies the cut-rule by only using the cut-formulas which appear in the rules of the set S2.
An explanation of a graphic error consists in finding a finite conjunction of ground formulas Sigma = P1 & ... & Pn such that a proof of the linear sequent Sigma |- Error(a, b) can be carried out in the FLL-theory T. Let us show how we proceed generally.
1. First case: the cut-formula does not contain the predicate Error.
    Sigma |- ∃x P(x)        ∃x P(x) |- Error(a, b)   (axiom of S2)
    -------------------------------------------------------------- (Cut)
                      Sigma |- Error(a, b)
The proof of the sequent Sigma f-:3x P(x) consists in
instantiating the existential quantifier. We define a
component called instantiation component which
performs the following operations:
1. it defines a concept CP by using the properties given in
the predicates P.
2. it locates the concept CP at its optimal position with the
help of the classifier, such that there exists a witness c
which satisfies P in the taxonomy of EMICW.
We obtain the new sequent to be proved, Sigma f- P(c).
We "force" the proof of this sequent by setting Sigma =
P(c) & P2 ... &Pn. The proof of the sequent P(c) & P2
... &Pn f- P(c) is now straightforward by means of the
(&-11) rule.
2. Second case: the cut-formula contains the predicate Error. We are left with the following tree:

    Sigma |- ∃x y P(x, y) & Error(x, y)
    -------------------------------------
    Sigma |- Error(a, b)

In the same way as indicated above, we use the instantiation component to point out two witnesses c and d which satisfy P. We obtain the following sequent to be proved: Sigma |- P(c, d) & Error(c, d). We apply the (&-r) rule and we obtain the new tree:

    Sigma |- P(c, d)        Sigma |- Error(c, d)
    ---------------------------------------------- (&-r)
    Sigma |- P(c, d) & Error(c, d)

We set Sigma = P(c, d) & P2 & ... & Pn, so we are now left with the proof of the sequent P(c, d) & P2 & ... & Pn |- Error(c, d). We progressively make all the formulas of Sigma appear by iterating the same process.
The sequent Sigma |- Error(a, b) may have several proofs. In this case, the system can give multiple explanations to the students. The best explanation must allow the students to better memorize the sinogram a. We think that good criteria for the choice of the best explanation can be:
1. the presence of the predicate Etymo in the explanation, with the meanings of the components;
2. the shorter proof (a proof which applies the smaller number of rules).
5.3
An Example of Explanation
Let us explain the confusion of the sinogram 天 tian1 (sky) with the sinogram 夫 fu4 (adult) by means of proofs. Etymologists give the following explanations: the sinogram sky comes from a person standing with arms spread out to look as tall as possible 大, with a big head (or a hat) symbolised by the stroke 一. The sinogram adult comes from tall 大, with an ornamental hairpin through his hair (a sign of adulthood in ancient China) symbolised by the stroke 一. So, we propose the following taxonomy:
1. The concepts G-F90 and G-M90 give the graphic feature of the sinogram 大 da4 (tall), and its etymology:
G-F90 = {(c-o, (hr dg dd)), (o-t-d, (dg hr dd)), (o-b-u, (dg dd hr)), (o-l-r, (hr dg dd)), (o-r-l, (hr dd dg))}.
G-M90 = {(IS-A, G-F90), (etymo, tall)}.
2. The concept G-F01 corresponds to the graphic feature of the sinogram 一 yi1:
G-F01 = {(c-o, (hr)), (o-t-d, (hr)), (o-b-u, (hr)), (o-l-r, (hr)), (o-r-l, (hr))}. As the sinogram yi1 has (at least) two different origins, hat and hairpin, we define two concepts of type G-M:
G-M010 = {(IS-A, G-F01), (etymo, hat)}
G-M011 = {(IS-A, G-F01), (etymo, hairpin)}
3. The concept G-M100, defined as {(IS-A, G-F100), (etymo, sky)} (see Example-1 of section 4.1), can now be located as:
G-M100 = {(IS-A, G-M90), (IS-A, G-M010)}.
The sinogram 天 tian1, represented by the individual concept a100 = {(IS-A, G-M100), (IS-A, G-S100), (sound, tian1), (etymo, sky)}, inherits the properties (etymo, tall) and (etymo, hat) from the concepts G-M90 and G-M010.
In the same way, the sinogram 夫 fu4 (adult) is represented by the individual concept b100 = {(IS-A, G-M110), (IS-A, G-S110), (sound, fu4), (etymo, adult)}, and inherits the properties (etymo, tall) and (etymo, hairpin) from the concepts G-M90 and G-M011.
In order to prove the sequent Sigma |- Error(a100, b100), we propose to apply the cut-rule (C) with the cut-formula appearing in the rule-2:
∃u z m1 m2 G-M(u) & G-M(z) & Meaning(m1) & Meaning(m2) & m1 ≠ m2 & u ≠ a & z ≠ b & Etymo(u, m1) & Etymo(z, m2) & Etymo(a, m1) & Etymo(b, m2) & Error(u, z) |- Error(a, b).
We are left with the following sequent to be proved:
Sigma |- ∃u z m1 m2 G-M(u) & G-M(z) & Meaning(m1) & Meaning(m2) & m1 ≠ m2 & u ≠ a100 & z ≠ b100 & Etymo(u, m1) & Etymo(z, m2) & Etymo(a100, m1) & Etymo(b100, m2) & Error(u, z).
The instantiation component instantiates the variable m1 to hat and the variable m2 to hairpin (it has only this possibility), and defines the two individual concepts uC and vC whose etymologies correspond to these meanings: uC = {(IS-A, G-M010), (etymo, hat)} and vC = {(IS-A, G-M011), (etymo, hairpin)}.
Then, Sigma contains the following main ground formulas: Etymo(uC, hat) & Etymo(vC, hairpin), which shows that the reason for the confusion of a100 with b100 comes from a misunderstanding of the origins of the component 一 yi1 which appears in these two sinograms.
We invite the reader to try to apply the rule-3 in place of the rule-2. He will find that the confusion of a100 with b100 may also come from the fact that these two sinograms have the same calligraphic order.
6 Computational Complexity
In this chapter, we prove that EMICW is tractable. The main problem comes from subsumption. The subsumption operation has been particularly analysed in [Levesque and Brachman 1987] and in [Schmidt-Schaub 1989]. Their approaches are mainly based on semantics. In [Castaing 1991], we have characterized a subsumption criterion by means of proofs in FLL-theories such as T1 (see section 5.1).
We briefly explain how we have proceeded.
6.1 Tractability of Subsumption
Let A and B be two concepts. We interpret A and B by means of first-order formulas, as in Brachman-Levesque's interpretation; then, we replace all classical connectives with linear ones. Let Ac = ∃x A1(x) & ... & An(x) and Bc = ∀z B1(z) & ... & Bm(z) (where z and x can be vectors of variables, and Ai(x) = Ai1(x) ∨ ... ∨ Aip(x), Bj(z) = Bj1(z) ∨ ... ∨ Bjq(z)) be the conjunctive normal forms obtained. A subsumes B iff there exists a cut-free proof in T1 of the sequent Bc |- Ac. In the absence of contraction and weakening, we proved the following result:
Theorem (subsumption criterion): A subsumes B iff Ac and Bc satisfy the following condition (C): there exists a, a substitution for x, such that for each Ai, 1 ≤ i ≤ n, there exists some Bj, 1 ≤ j ≤ m, and b, a substitution for z, such that there exists a cut-free proof of the sequent Bj.b |- Ai.a in the FLL-theory T1.
A matching algorithm can easily be derived from condition (C); a sketch is given below. It computes subsumption in polynomial time, proportional to the length of the concepts and to the cardinality of the set of proper axioms S1.
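A minimal sketch of such a matcher follows. It is our own illustration, restricted to the special case where every Ai and Bj is a single atom and substitutions are omitted; the set S1 of proper axioms is approximated by ground entailments between atoms:

    # Hypothetical proper axiom in S1: "tall" entails "big".
    S1 = {("tall", "big")}

    def atom_provable(atom_b, atom_a):
        # Bj.b |- Ai.a holds if the atoms coincide or S1 supplies the step.
        return atom_b == atom_a or (atom_b, atom_a) in S1

    def subsumes(A_atoms, B_atoms):
        # Condition (C): every conjunct of A must be provable from some
        # conjunct of B; the nested loops cost is proportional to |A| * |B|.
        return all(any(atom_provable(b, a) for b in B_atoms) for a in A_atoms)

    # Hypothetical example: B = {tall, hat}, A = {big}; A subsumes B.
    print(subsumes({"big"}, {"tall", "hat"}))     # True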
Without contraction and weakening, FLL-theories are decidable. There exist other decidable first-order theories which are based on classical logic [Ketonen and Weyhrauch 1984], [Patel-Schneider 1985, 1988]. The originality of our approach comes from the way we deal with the universal quantifiers (or with the existential ones). Let us show how we can explain the rise in complexity of subsumption by means of contraction. We consider the following cases:
1. Bc and Ac satisfy condition (C) (the contraction rule is absent): the sequent Bc |- Ac is provable in polynomial time, so the complexity of subsumption is polynomial.
2. Bc and Ac do not satisfy condition (C): let us suppose that the sequent Bc |- Ac is provable (for example, by means of an approach based on semantics), and that the proof of the sequent Bc |- Ac necessitates the use of the contraction rule (and possibly of the weakening rule). The search procedure for a proof can make sequents of the form ∀z B(z,a) |- Δ (or of the form Γ |- ∃x A(x,a)) appear at the nodes of the search-tree. Let us consider the case where the sequent ∀z B(z,a) |- Δ appears at a node of the search-tree: the search procedure can go back up the tree by applying the universal and contraction rules. We can be left with the following tree:

       B(b/z, a), ∀z B(z,a) |- Δ
    ------------------------------ (∀-l)
     ∀z B(z,a), ∀z B(z,a) |- Δ
    ------------------------------ (C-l)
          ∀z B(z,a) |- Δ

The use of contraction may open a branch which terminates with a failure. Some back-tracking is then necessary. The complexity of subsumption in this case is NP-hard.
3. Bc |- Ac is not provable: then the use of the contraction rule may lead to infinite duplication of the same formulas when the set of instantiation terms (such as b) is infinite (for example, in the presence of functions):

    B(b/z,a), ∀z B(z,a), ∀z B(z,a) |- Δ
    ------------------------------------- (C-l)
       B(b/z,a), ∀z B(z,a) |- Δ
    ------------------------------------- (∀-l)
       ∀z B(z,a), ∀z B(z,a) |- Δ
    ------------------------------------- (C-l)
             ∀z B(z,a) |- Δ

Subsumption then turns out to be undecidable.
6.2 Tractability of EMICW
The terminological component of EMICW has a restricted syntax. The condition (C) defined above gives an adequate subsumption criterion. In order to locate a concept at its optimal location, the classifier performs a number of subsumption operations bounded by the diameter of the semantic network. Its computational complexity is therefore limited. The theorem prover applies the cut-rule, with cut-formulas taken from the sequents of S2 (see section 5.2). Without contraction, the existential formulas which appear are never duplicated, and so are only instantiated by means of the classifier. The cardinality of S2 is finite. Then, the proof of the sequent Sigma |- Error(a,b) can also be carried out in a limited time depending on the cardinality of the set of proper axioms S = S1 + S2. The tractability of our system is thus ensured.
Conclusion

A prototype of our EMICW system is implemented in LISP. For the time being, if the student writes down a graphic which is not recognised as a sinogram, the system has no particular strategy for discovering a "good" witness of the error. We are now investigating a strategy for the choice of witnesses which can take the context of the dictation (the sinograms that the student has already drawn during the dictation) into account. Provided with adequate rules, EMICW can also help students to learn Japanese characters (kanji) with the Chinese or the Japanese reading, or to learn classical Vietnamese characters (nom).
Acknowledgements

I would like to thank the four FGCS referees for their comments, which helped to clarify the presentation of my work. Discussions with my colleagues of PRC-IA were very helpful. The contribution of J-L. Lambert and C. Tollu to this work was invaluable. Thanks to both.
References

[Brachman R.J. and Levesque H.J. 1984]: "The Tractability of Subsumption in Frame-Based Description Languages", Proceedings AAAI-84, August 1984, pp. 34-37.

[Brachman R.J. and Schmolze J.G. 1985]: "An Overview of the KL-ONE Knowledge Representation System", Cognitive Science 9(2) (1985), pp. 171-216.

[Brachman R.J., Gilbert V.P. and Levesque H.J. 1985]: "An Essential Hybrid Reasoning System: Knowledge and Symbol Level Accounts of KRYPTON", Proc. 9th IJCAI (1985), Los Angeles, pp. 532-539.

[Bylander T. 1991]: "The Monotonic Abduction Problem: A Functional Characterization on the Edge of Tractability", Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference, Cambridge, Massachusetts, April 1991.

[Cerrito S. 1990]: "A Linear Semantics for Allowed Logic Programs", Proc. 5th Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, 1990, pp. 219-227.

[Castaing J. 1991]: "A New Formalisation of Subsumption in Frame-Based Representation Systems", Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference, Cambridge, Massachusetts, April 1991.

[Girard J.Y. 1987]: "Linear Logic", Theoretical Computer Science 50 (1987), pp. 1-102.

[Girard J.Y. 1989]: "Towards a Geometry of Interaction", Proc. AMS Conference on Categories, Logic and Computer Science, Contemporary Mathematics 92, AMS (1989).

[Gentzen G. 1969]: "The Collected Papers of Gerhard Gentzen", ed. M. E. Szabo, North-Holland, Amsterdam (1969).

[Kaczmarek T.S., Bates R. and Robbins G. 1986]: "Recent Developments in NIKL", Proc. AAAI-86, Philadelphia, pp. 978-985.

[Ketonen J. and Weyhrauch R. 1984]: "A Decidable Fragment of Predicate Calculus", Theoretical Computer Science 32:3, 1984.

[Levesque H.J. and Brachman R.J. 1987]: "Expressiveness and Tractability in Knowledge Representation and Reasoning", Computational Intelligence 3(2) (1987), pp. 78-93.

[Masseron M., Tollu C. and Vauzeilles J. 1990]: "Generating Plans in Linear Logic", Proc. FST & TCS 10, Bangalore (India), Dec. 1990.

[Nebel B. 1988]: "Computational Complexity of Terminological Reasoning in BACK", Artificial Intelligence 34 (1988), pp. 371-383.

[Patel-Schneider P.F. 1985]: "A Decidable First-Order Logic for Knowledge Representation", Proceedings 9th IJCAI (1985), Los Angeles, pp. 455-458.

[Patel-Schneider P.F. 1988]: "A Four-Valued Semantics for Terminological Logics", Artificial Intelligence 36 (1988), pp. 319-353.

[Schmidt-Schaub M. 1989]: "Subsumption in KL-ONE is Undecidable", First International Conference on Principles of Knowledge Representation, 1989, pp. 421-431.

[Wang P.S.P. 1988]: "On-Line Chinese Character Recognition", 6th IGC Int. Conference on Electronic Image, pp. 209-214, 1988.

[Yamamoto Y. 1991]: "Two-Dimensional Uniquely Parsable Isometric Array Grammars", Proceedings of the International Colloquium on Parallel Image Processing, Paris, June 1991.

[Alleton V. 1970]: "L'Ecriture Chinoise", Que sais-je No. 1374.

[Bellassen 1989]: "Methode d'Initiation a la Langue et a l'Ecriture chinoises", Eds. La Compagnie / Bellassen, 1989.

[De Francis]: "Character Text for Beginning Chinese", Yale Language Series, New Haven and London, Yale University Press.

[Henshall Kenneth G. 1988]: "A Guide to Remembering Japanese Characters", Charles E. Tuttle Company, Inc., Rutland, Vermont & Tokyo, Japan, 1988.

[LI XiuQin 1991]: "Evolution de l'Ecriture Chinoise", Librairie You Feng, Paris, 1991.

[Lyssenko N. and Weulersse D.]: "Methode Programmee du Chinois Moderne", Eds. Lyssenko, Paris, 1987.

[Ryjick K. 1981]: "L'Idiot Chinois", Payot, 1981.

[Shanghai Foreign Language Institute 1982]: "A Concise Chinese Course for Foreign Learners" (Books 1 and 2), Shanghai Foreign Language Institute Press, 1982.

[Wieger S.S. 1978]: "Les caracteres chinois", Taichung, 1978.
Automatic Generation of a Domain Specific Inference Program for Building a Knowledge Processing System

Takayasu Kasahara*, Naoyuki Yamada*, Yasuhiro Kobayashi*, Katsuyuki Yoshino**, Kikuo Yoshimura**

*Energy Research Laboratory, Hitachi, Ltd., 1168 Moriyama-cho, Hitachi-shi, Ibaraki-ken, Japan 316, Tel. (0294) 53-3111
**Software Development Center, Hitachi, Ltd., 549-6 Shinano-cho, Totsuka-ku, Yokohama-shi, Japan 244, Tel. (045) 821-4111
Abstract

We have proposed and developed an expert system tool, ASPROGEN (Automatic Search Program Generator), with a built-in automatic generation function for domain specific inference programs. This function is based on a search-based program specification and an abstract data type of search. ASPROGEN has interfaces for domain knowledge, using an object-oriented approach, and for constraints, which represent control knowledge. The detailed problem solving strategy is described by using the domain knowledge.

We applied ASPROGEN to produce three kinds of scheduling systems. These generated systems have equivalent performance in comparison with knowledge processing systems implemented by a conventional tool. Further, a two-thirds reduction of the program step numbers required as programmers' input was realized.
1. Introduction

Current expert system tools based on production rules and/or frame representation provide an environment to generate expert systems through formalizing and describing problems by production rules. They are powerful tools, and many practical expert systems have been produced by using them.

Industrial field applications of expert system tools have sometimes met problems, the most important one being that tools based on the production system only prepare a rule-based language, not a problem solving strategy. So, mapping the problem solving strategy to production rules is difficult for users who are not knowledge engineers.

Domain shells[1], tools based on the generic task method[2], the half weak method[3], and SOAR[4] have been developed to overcome this difficulty. Domain shells are expert system tools which are restricted to specified problem regions such as diagnosis, scheduling, and design. They have spreadsheet-type user interfaces and problem-specific inference programs. But actual industrial problems include particular conditions, constraints, or problem solving knowledge, and domain shells do not have enough flexibility to cover all of them. This leads to a conflict between tool flexibility and ease of use. In general, the more specific a tool becomes to some region, the easier it is to use, but the more flexibility it loses.
The generic task method and the half weak method also have this conflict. The generic task method classifies problem solving methods into several types which are called generic tasks, and prepares generic task tools to provide them. Tool users select an appropriate generic task and supply domain knowledge to develop the knowledge processing system. The half weak method regards problem solving as a search and provides pre-defined search modules. Tool users select an appropriate search module and add domain knowledge to the module. However, these methods, based on classification, do not necessarily give directions for systematic preparation of the building blocks of knowledge processing systems. So, tool users must reformulate the problem definition according to the prepared building blocks.

SOAR has more flexibility for defining the problem solving strategy. It can generate a search program by defining several search control rules. But its lack of functions to relate the search program to domain knowledge restricts the applicability of SOAR to toy problems.

Therefore, we developed ASPROGEN (Automatic Search Program Generator), an expert system tool with a built-in automatic generation function for domain specific inference programs. To specify the problem to be solved, it has interfaces for describing the problem solving strategy as a search strategy, domain knowledge in an object-oriented way, and the detailed problem solving strategy as constraints among the attribute values of the domain objects.
2. Overview of ASPROGEN

2.1 Building expert systems based on search

ASPROGEN has no embedded inference mechanism. Instead, as shown in Fig. 1, its parts include the search program and a search program generating mechanism which produces inference programs according to user specifications of the search program, domain knowledge, and detailed constraints.

[Fig. 1 Overview of ASPROGEN: a general search program is combined with the expert's domain knowledge, constraints and global search information (unknown constraints, knowledge-base retrieval, heuristic search, operator type, link function) to generate the knowledge processing module.]

The reason why we use search as the inference program specification is that it covers almost every inference mechanism required for expert systems, and it is simple. But a search is not easy to describe, nor is it easy to prepare controls tightly directed to a particular problem by the search strategy alone.
To describe detailed control strategies, ASPROGEN includes an interface for domain knowledge. Using the domain knowledge, the detailed controls or problem solving strategy can be described as constraints between attribute values of the domain knowledge. The detailed control programs are complicated in the case of a scheduling system or CAD systems, and it is important to support their generation. In general, domain specific inference programs which have functional operators have complicated constraints. ASPROGEN combines these constraints with global search strategies, and generates domain specific inference programs. ASPROGEN users develop expert systems by following this procedure:
(1) Users specify a problem solving strategy from the viewpoint of a search strategy.
(2) Users input the search strategy by selecting the classification items of the search classification tree which the tool prepares. This step is executed with the help of the tool interface.
(3) Users input domain knowledge and constraints with the help of the tool interface.
(4) ASPROGEN generates a domain specific inference program and data structures for the domain knowledge.
Although (1) is an interesting problem, we limit the present discussion to (2)-(4).
2.2 Specification of the problem solving strategy

To specify the problem solving strategy as a search, we define a classification tree for the search strategy and a template of the search program.

Figure 2 shows the classification tree. It comes from analyzing the search trees used in various kinds of problem solving. A search tree consists of nodes and operators. We retrieve the classification items from the characteristics of the nodes and operators. The first classification item comes from the characteristics of the operators. There are two operator types. One is the functional operator, which creates new nodes from parent nodes and adds them to the search tree. In the scheduling search program, a functional operator is used. The other type is the link operator. The link operator is used in the diagnosis search program, which selects suitable diagnosis nodes for the observed state.
The second classification item comes from the characteristics of the nodes. They are evaluation functions to select nodes in the search procedure, pruning functions, establishment conditions, and so on. The evaluation functions define a global search strategy; for example, one which prefers the deepest nodes of the search tree corresponds to depth-first search. The characteristics of the search nodes are described by specification values of the nodes in the search tree, namely depth, breadth, parent relations, sibling relations, and node attribute values. The structural values are retrieved from the structure of the search tree, and we can prepare these specification values or functions to calculate them. On the other hand, node attribute values cannot be retrieved from the structure of the search tree, and it is difficult to cover all attribute values of the nodes when specifying the problem solving strategy.
[Fig. 2 Search classification tree: operator type (functional / link); goal type (given as conditions / given as instance node); solution type (optimal / satisfactory); initial node type (given as conditions / given as instance node); node evaluation function type (fixed / not fixed, with parameters taken from the search tree or from domain knowledge); pruning (prune, with a pruning-function parameter from the search tree, / do not prune).]
To mitigate this difficulty, we rank the attribute values from the viewpoint of their relation to the search tree operators. The attributes which the search operators handle directly are called first-order attributes. For example, in the scheduling system the starting time and ending time of each job are first-order, and the resource constraints are not, if the search operators are functions which adjust the job schedule. To describe programs in detail, not only first-order attributes but also multi-order attributes or variables are required. The first-order attributes and the multi-order attributes are domain knowledge. We do not embed detailed domain knowledge in ASPROGEN; instead, an
interface is prepared for describing the domain knowledge, the constraints on the attributes of the domain knowledge, and the global search strategy. By combining the global search strategy, described as a search strategy, with the domain knowledge, ASPROGEN covers not only toy problems but also industrial applications.
2.3 Representation of domain knowledge and constraints
ASPROGEN has an interface for describing the domain knowledge. Domain knowledge is described by objects and their attributes, attribute value ranges, and attribute constraints. There are two types of objects. One is the class object, which defines attributes and relations to other objects. The other type is the instance object, which has instantiated attribute values.
Figure 3 shows the representation scheme of domain knowledge for ASPROGEN. Nodes of the search tree are also objects. Node objects are related to other objects. The relations among objects are of three types, as illustrated in the sketch below.
(1) Class-instance relations: Instance objects have the same attributes as class objects, and the values of the attributes are inherited from the class objects.
(2) Attribute-value relations: The value region of an attribute can be described by a class object. Thus, the attribute value region is the set of instance objects of that class object.
(3) Attribute-object relations: The attributes of an object can be described by class objects. Thus, the attributes of the nodes are instance objects of the class objects, and the attribute values are those of the instance objects.
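The following sketch, in our own notation rather than ASPROGEN's internal format, illustrates the three relation types for a hypothetical scheduling domain:

    class ClassObject:
        def __init__(self, name, attributes):
            self.name = name
            self.attributes = attributes      # attribute -> value range or class

    class InstanceObject:
        def __init__(self, class_object, values):
            self.class_object = class_object  # class-instance relation
            self.values = values              # attribute-value relation

    # Hypothetical domain: jobs have a person count and a time span.
    job_class  = ClassObject("job", {"person": range(1, 10), "time_span": "TimeSpan"})
    span_class = ClassObject("time_span", {"available_person": range(0, 50)})

    span1 = InstanceObject(span_class, {"available_person": 5})
    job1  = InstanceObject(job_class, {"person": 3, "time_span": span1})  # attribute-object relation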
[Fig. 3 Scheme of knowledge representation in ASPROGEN: instance nodes of the search tree are linked to other objects by class-instance, attribute-value and attribute-object relations.]

On the basis of these definitions of domain knowledge, ASPROGEN users describe constraints. ASPROGEN prepares a simplified language which can describe constraints by using object names and attributes.

2.4 Generation of the problem-directed inference program

The inference program generated by ASPROGEN consists of two parts: the search program, which corresponds to the global problem solving strategy, and the constraint satisfaction programs, which correspond to the domain knowledge. Figure 4 shows an outline of the inference program. The control program is embedded in ASPROGEN, and the global search program and the constraint satisfaction programs are generated according to user input. Once the inference program is complete, it behaves as follows. Using the global search strategy, the inference program activates an operator and generates or selects a new node. Then, the constraint satisfaction programs are activated and adjust the attribute values of the objects for every constraint. According to the result of the constraint satisfaction, the operator is activated again. This process continues until the termination conditions are satisfied. The generating process of the inference program consists of three steps.
(1) Generate the search program which represents the global search strategy

ASPROGEN has a general search program which is independent of the domain and includes six search sub-functions, as shown in Fig. 5. When completed, it becomes the global search program of Fig. 4. The constraint satisfaction programs are activated in the sub-function 'Apply operator'. The difference between search strategies is reflected in the difference of the six element functions.

ASPROGEN prepares two reference tables and abstract data types for search [5], [6]. The parent function parent(c,k), which returns the parent of node c in search tree k, and Leftmost_child(c,k), which returns the child node which was first generated or selected, are examples of the abstract data types for search. Here, an abstract data type of search makes up the functions for the search program, as sketched below. The first reference table is a table intended for the generation of search element functions.
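The sketch below illustrates what such an abstract data type of search might look like; the class and method names are our own, chosen to echo parent(c,k) and Leftmost_child(c,k), and are not ASPROGEN's actual interface:

    class SearchTree:
        def __init__(self):
            self.parent_of = {}       # node -> parent
            self.children_of = {}     # node -> children, in creation order

        def add(self, node, parent=None):
            self.parent_of[node] = parent
            self.children_of.setdefault(node, [])
            if parent is not None:
                self.children_of.setdefault(parent, []).append(node)

        def parent(self, node):
            return self.parent_of.get(node)

        def leftmost_child(self, node):
            children = self.children_of.get(node, [])
            return children[0] if children else None

        def depth(self, node):
            d = 0
            while self.parent_of.get(node) is not None:
                node = self.parent_of[node]
                d += 1
            return d

Search element functions can then be assembled from these accessors alone; for instance, a depth-first node evaluation function can simply add depth(c) to a user-defined score.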
[Fig. 4 Outline of the inference program (CSP: constraint satisfaction program).]

[Fig. 5 General search program with its six search sub-functions.]
Search element functions are program parts of the sub-functions, and Fig. 6 shows some of them. They consist of the abstract data type of search and domain-dependent search functions; example functions are node evaluation functions for domain-dependent search control. The reference table relates the corresponding search element functions to the user-specified problem solving strategy.

Figure 7 shows the generating process of the problem-directed inference program. Referring to the problem solving strategies through the reference table, the system decides the search element functions. This is done with domain-dependent search control functions, such as the evaluation function for nodes.
Then, in the same way as in the definition process, the sub-functions are defined by the abstract data types of the search tree.
Functions concerning search tree configuration:
(1) parent(c,k)          : returns parent of c in search tree k.
(2) Leftmost_child(c,k)  : returns eldest son of c in search tree k.
(3) Right_sibling(c,k)   : returns next younger brother of c in search tree k.
(4) Label(c,k)           : returns label of c in search tree k.
(5) Root(k)              : returns root node of k.
(6) Clear(k)             : makes search tree k the null set.
(7) Deep(c,k)            : returns depth of c in search tree k.
(8) Height(c,k)          : returns height of c in search tree k.
(9) Leaf(c,k)            : if c has no children returns yes; otherwise, returns no.
Figure 8 shows an example of the element function generating process, which exemplifies the node evaluation function. According to the user-input problem solving strategy that the depth of the tree has a high evaluation value, the tool selects the depth function from the abstract data types and completes the node evaluation function.
Functions concerning search tree operations (search element functions):
(10) Evaluate(c,k)        : evaluates c, and returns its evaluation value.
(11) Change_e(n_t,S,k)    : changes the evaluation function of node type n_t to S.
(12) Search_state(c,k)    : if c is an open node returns Current; if c is a removed node returns Finished; otherwise returns Yet.
(13) Jumping(c,k)         : returns the node, when c is established.
(14) Back_tracking(c,k)   : returns the node, when c is not established.
(15) Initial(c,k)         : if node c is in the initialized state returns yes, otherwise returns no.
(16) Kill(c,k)            : removes node c from the active nodes.
(17) Active_c(k)          : returns the active nodes of search tree k.
(18) Goal(c,k)            : if c is a goal node returns yes.
(19) Cond_node(n_t,c,k)   : if node c satisfies the establish condition of node type n_t, returns yes, otherwise returns no.
(20) Cond_node_type(n_t,c_t,c,k) : if node c satisfies the establish conditions of node type n_t, returns yes, otherwise returns no.
(21) Establish(c,k)       : if c is established returns yes, otherwise returns no.
Evaluate(c, k)
{
    return ( user_define_func + depth(c, k) );
}

Fig. 8 Generation example for a search element function
Figure 9 shows an example of the sub-function generating process, which exemplifies function (a) in Fig. 5, named SUCCES_END(c,k) here. Since an optimal solution is requested in the problem-solving strategy, the tool generates a checking function for successful termination which terminates the inference program only if an optimal solution is found.

Fig. 6 Search element functions (see the lists above)
[Fig. 9 Generation example for a search sub-function: because the problem solving strategy requests an optimal solution with a required goal number GOAL_NUM, the reference table selects program parts which check whether the set of active nodes is empty and whether the number of found goals has reached GOAL_NUM. The generated checking function is:]

SUCCES_END(c, k)
{
    if (Active_c(c, k) == {} && GOAL_c(c, k) >= GOAL_NUM)
        return (TRUE);
    else
        return (FALSE);
}
[Fig. 7 Generating process of the problem-directed inference program: the template of the general search program is filled in with the search element functions selected through the reference table, yielding the generated inference program.]
(2) Generate the constraint satisfaction programs according to the user specifications in the simplified language

The object names and attribute names of the objects which the tool users input are registered in ASPROGEN as key words of the simplified constraint description language. We call this language SCRL (Simple Constraint Representation Language); the registered names serve as its
terminal symbols. The SCRL compiler accepts only sentences of the following style:
[value clause] [comparing key word] [value clause]
A value clause consists of an object name, an attribute name, and object relation key words.
Table 1 lists the key words and their meanings in SCRL. There are set operation key words, comparison key words, and object relation key words such as 'of'. Figure 10 shows an example of a constraint described in SCRL, in which the number of persons required for each time span must be less than the available personnel number.
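As an illustration of what this personnel constraint amounts to once compiled (the data and function name below are hypothetical; the real tool emits C code via the SCRL compiler), a check in the spirit of Fig. 10 could be:

    jobs = [
        {"person": 2, "time_span": [0, 1]},     # job occupies time spans 0 and 1
        {"person": 3, "time_span": [1, 2]},
    ]
    available_person = {0: 4, 1: 6, 2: 4}

    def personnel_constraint(jobs, available_person):
        # For every time span, the persons required by the jobs scheduled in it
        # must stay below the available personnel.
        for t, limit in available_person.items():
            required = sum(job["person"] for job in jobs if t in job["time_span"])
            if required >= limit:               # violated: as many or more than available
                return 0
        return 1

    print(personnel_constraint(jobs, available_person))   # 1: constraint satisfied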
Figure 11 shows an example of the constraint satisfaction program generating process. The constraint is the personnel constraint of Fig. 10. First, the sentence is parsed into the C language by the SCRL compiler (code C in Fig. 11). Then the attribute value ranges and code C are passed to the allotting function which ASPROGEN prepares, and the constraint satisfaction program is completed.
(3) Synthesize all constraint satisfaction programs and the search program

Finally, ASPROGEN synthesizes all the constraint satisfaction programs and the search program, and generates the domain specific inference program. The key point of the synthesis is to ensure the consistency of the attribute values of the objects which the tool users define. To make the argument clear, we define the identity of the search node and the scope of the attribute values.
Constraint name: personnel
    sum((time_span of job), person) > time_span of available_person

Fig. 10 Example of a constraint
Table 1 Example of the key words of SCRL

Set operation key words:
    A include B : A ⊇ B
    A have e    : e ∈ A
    SUM(A, B)   : sum up attribute value B of all instance objects of A
Comparing key words:
    x > y
    x = y
Object relation key words:
    A of y : value of attribute y of object A
Identity of the search node

The identity of a node is defined by the equality of the value sets of its first-order attributes (cf. Section 2.2). The search tree operators operate on them directly. So, it is possible that the inference programs generate different results even though the problem solving strategies are the same.
{
    for (t = 0; ...; ...) {
        ...
        if (sum >= time_span[t].available_person)
            return (0);
    }
    return (1);
}
Scope of the attribute values

We define the scope of the attribute values in a search tree node. The attribute values of the objects must be consistent within the tree node, and a change of the attribute values in the process of constraint satisfaction must propagate to the other constraints.
[Fig. 11 Generating process of a constraint satisfaction program: in step 1 the SCRL compiler parses the user-input constraint into code C; in the following steps the code C and the attribute value sets of the user-input domain knowledge are combined with the built-in allotting function to produce the constraint satisfaction program.]
ASPROGEN generates the constraint satisfaction programs from a kernel of constraint satisfaction. This kernel comes from the relations of the attributes and their value ranges. The allotting mechanism for the attribute values is built into ASPROGEN. The mechanism selects values from the value range; if the constraints are not satisfied, other values are selected. ASPROGEN generates each constraint satisfaction program by setting the object attribute values and their ranges.

[Fig. 12 Simplified procedure for constraint satisfaction: set the first-order attribute values, then pick the constraints which restrict the first-order attribute values and satisfy them with the allotting mechanism (CSP).]
By the search operator:  f1 = 10,  f2 = 7
Constraints:
    C1. x + y + f1 > 30
    C2. y + z + f2 < 15
    C3. x + z < 10
Region of values:  x ∈ {5, 10, 15, 20},  y ∈ {4, 8},  z ∈ {3, 6}
(x, y, z: multi-order attributes;  f1, f2: first-order attributes;  R(s): suitable value set for s)

Initial set:      R(x) = {5, 10, 15, 20},  R(y) = {4, 8},  R(z) = {3, 6}
Filtered by C1:   R(x) = {10, 15, 20},    R(y) = {4, 8},  R(z) = {3, 6}
Filtered by C2:   R(x) = {10, 15, 20},    R(y) = {4},     R(z) = {3}
Filtered by C3:   R(x) = { },             R(y) = {4},     R(z) = {3}

Fig. 13 Example of the filtering process
Figure 12 shows a simplified mechanism to assure the consistency of the attribute values. At first, using the search tree operator, the first-order attribute values are instantiated. In the next step, the attributes which are constrained by the first-order attributes are instantiated by the allotting mechanism. This process continues until all constraints have been surveyed. If a set of attribute values is found, then the first-order attribute set is suitable; if not, the node is unsuitable. But this simple algorithm has a fatal defect, namely the ineffectiveness of the allotting process. If no globally consistent assignment among the constraints exists, the algorithm searches every combination of the attributes before concluding that no solution exists.

To avoid this ineffectiveness, ASPROGEN deals with the attributes as sets. In the first stage, using the search tree operator, the first-order attribute values are instantiated. Then the available value sets of the multi-order attributes are filtered by the constraints. Figure 13 shows a simple example of the filtering process; a sketch of the computation follows below. At first the whole region of the attribute values is the candidate set for a solution. Filtering by the constraints, inconsistent values are removed from the candidate set. The process continues until no removable value exists, or until no suitable value exists for some attribute.
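The sketch below imitates this set-based filtering with constraints of the same shape as those in Fig. 13; the data and helper names are our own, and the intermediate sets are not meant to reproduce the figure exactly:

    from itertools import product

    f1, f2 = 10, 7                                    # first-order attributes set by the operator
    R = {"x": {5, 10, 15, 20}, "y": {4, 8}, "z": {3, 6}}

    constraints = [
        (("x", "y"), lambda x, y: x + y + f1 > 30),   # C1
        (("y", "z"), lambda y, z: y + z + f2 < 15),   # C2
        (("x", "z"), lambda x, z: x + z < 10),        # C3
    ]

    def filter_once(R, constraints):
        changed = False
        for vars_, pred in constraints:
            for v in vars_:
                others = [w for w in vars_ if w != v]
                keep = set()
                for val in R[v]:
                    # keep val if some combination of the other variables satisfies pred
                    for combo in product(*(R[w] for w in others)):
                        args = dict(zip(others, combo)); args[v] = val
                        if pred(**{w: args[w] for w in vars_}):
                            keep.add(val)
                            break
                if keep != R[v]:
                    R[v] = keep
                    changed = True
        return changed

    while filter_once(R, constraints) and all(R.values()):
        pass
    print(R)    # an empty set for some attribute means the node is unsuitable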
4. Example and Result

Using ASPROGEN, we built three kinds of scheduling systems: a maintenance scheduling system (Problem A), a construction scheduling system (Problem B), and a jobshop scheduling system (Problem C).

Problem A is a scheduling system for the maintenance scheduling of a nuclear power plant[7]. The generated program produces a schedule under constraints of maintenance personnel limitations and interferences between tasks. Problem B is a plant construction scheduling program. The generated program produces a schedule under precedence relations between tasks and personnel limitations. Problem C is a jobshop scheduling system. The generated program produces a schedule under constraints of resource limitations and appointed dates of delivery.

[Table 2 Test problems — for each problem the table gives the characteristics of the solution (maintenance scheduling: optimal; construction scheduling: satisfactory; jobshop scheduling: satisfactory), the variety of resources (1 to 3), the constraints (task interference, working time, task execution order) and the evaluation function.]
The problems are shown in Table 2, and Table 3 summarizes the problem solving strategy for each scheduling problem. These problems differ in solution type and resource numbers.

Figure 14 shows the domain model of each problem. They are the basis of the ASPROGEN input. The framework of these problems is the same, which means that the global search strategies are the same. The first-order attribute values are the starting and ending times of each job. The preference of a node is the total scheduling time. There are interference constraints stating that some jobs cannot be executed simultaneously. The domain knowledge differs: for example, Problem A and Problem B have personnel limitations, and Problem C has machine constraints.
Table 3 Specification of the test problems -- task specific knowledge (definition of the problem solving method)

Items                                          | maintenance scheduling | construction scheduling | jobshop scheduling
Number of goals                                | 1                      | all                     | 1
Initial number                                 | 1                      | 1                       | 1
Global search information                      | none                   | none                    | none
Type of operator                               | function of adjusting the schedule (all three problems)
Type of initial state                          | state representing the work schedule (all three problems)
Type of goal state                             | conditions that satisfy all constraints (all three problems)
Solution type                                  | optimal                | satisfactory            | satisfactory
Establish conditions about tree configuration  | none                   | none                    | none
Evaluation function                            | fixed                  | fixed                   | fixed
Figure 15 shows the program step numbers which the programmers input. Compared with inference programs implemented by using a conventional tool[8], equivalent performance is realized with a two-thirds reduction in the number of program steps required as programmer input. Of course, the reduction rate depends on the application; for example, a diagnosis system has more domain knowledge, and the reduction rate may be smaller than for a scheduling system. But overall, some reduction of the programmers' load will result from the tool.
[Fig. 14 Domain model of the problems: (1) Problem A, (2) Problem B, (3) Problem C.]

[Fig. 15 Program step numbers which programmers input, for Problem A (maintenance scheduling), Problem B (construction scheduling) and Problem C (jobshop scheduling), comparing ASPROGEN with a conventional tool based on a production system; the steps are divided into the inference program, the problem solving strategy, and the task implementation knowledge.]

5. Conclusions
We have proposed and developed an expert system tool, ASPROGEN (Automatic Search Program Generator), with a built-in automatic generation function for domain specific inference programs. This function is based on a search-based program specification and an abstract data type of search. ASPROGEN has interfaces for domain knowledge, using an object-oriented approach, and for constraints, which represent control knowledge. The detailed problem solving strategy is described by using the domain knowledge.

We applied ASPROGEN to produce three kinds of scheduling systems. These generated systems have equivalent performance in comparison with knowledge processing systems implemented by the conventional tool, and a two-thirds reduction of the program step numbers required as programmer input was realized by ASPROGEN.

We have so far applied ASPROGEN only to scheduling systems; we are now going to check its applicability to CAD systems and diagnosis systems.

References
[1] K. Okuda et al.: Model Based Process Monitoring and Diagnosis, Proc. of IEEE Pacific Rim International Conference on Artificial Intelligence '90, pp. 134-139, Nagoya, Japan (1990).
[2] B. Chandrasekaran: Towards a Functional Architecture for Intelligence Based on Generic Information Processing Tasks, Invited Talk of IJCAI-87 (1987).
[3] J. McDermott: Using Problem-Solving Methods to Impose Structure on Knowledge, Proc. of IEEE International Workshop on Artificial Intelligence for Industrial Applications, pp. 7-11, Hitachi, Japan (1988).
[4] J. Laird et al.: Universal Subgoaling and Chunking, Kluwer Academic Publishers (1987).
[5] E. W. Dijkstra et al.: Structured Programming, Academic Press, London (1979).
[6] A. V. Aho et al.: Data Structures and Algorithms, Addison-Wesley Publishing Company, Inc., Reading, Mass. (1983).
[7] T. Kasahara et al.: Maintenance Work Scheduling Aid for Nuclear Power Plants, Proc. of IEEE International Workshop on Artificial Intelligence for Industrial Applications, pp. 161-166, Hitachi, Japan (1988).
[8] S. Tano et al.: Eureka-II: A Programming Tool for Knowledge-Based Real Time Control Systems, International Workshop on Artificial Intelligence for Industrial Applications, pp. 370-378, Hitachi, Japan (1988).
Knowledge-Based Functional Testing for Large Software Systems

Uwe Nonnenmann and John K. Eddy
AT&T Bell Laboratories
600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A.
Abstract
Automated testing of large embedded systems is perhaps one of the most expensive and time-consuming parts of the software life cycle. It requires very complex and heterogeneous knowledge and reasoning capabilities. The Knowledge-based Interactive Test Script System (KITSS) automates functional testing in the domain of telephone switching software. KITSS uses some novel approaches to achieve several desirable goals. Telephone feature tests are specified in English. To support this, KITSS has a statistical parser that is trained in the domain's technical dialect. KITSS converts these tests into a formal representation that is audited for coverage and sanity. To accomplish this, KITSS uses a customized theorem prover-based inference mechanism and a hybrid knowledge base as the domain model, which uses both a static terminological logic and a dynamic temporal logic. Finally, the corrected test is translated into an in-house automated test language that exercises the switch and its embedded software. This paper describes and motivates the approach taken and also provides an overview of the KITSS system.
1 Functional Testing Problem
There is an increasing amount of difficulty, effort, and cost involved in testing large software development projects. It is generally accepted that the development of large-scale software with zero defects is not possible. A corollary is that accurate testing that uncovers all defects is also not possible [Myers, 1979]. This is because of the many inherent problems in the development of large projects [Brooks, 1987]. As just a few examples, a large project provides support for many interacting features, which makes requirements and specifications complex. Also, many people are involved in the project, which makes it difficult to ensure that each person has a common understanding of the meaning and functioning of features. Finally, the project takes a long time to complete, which makes it even harder to maintain a common understanding, because the features change through time as people interact and come to undocumented agreements about the real meaning of features.

The consequence of these problems is that programs that do not function as expected are produced, and therefore extensive and costly testing is required. Once software is developed, even more testing is needed to maintain it as a product. The major cost of maintenance is in re-testing and re-deployment, not in the coding effort. Estimates, as in [Myers, 1976] and [McCartney, 1991], are that at least 50%, and up to as much as 80%, of the cost in the life cycle of a system is spent on maintenance.
We believe that the only practical way to drastically reduce the maintenance cost is to find and eliminate software problems early, within the development process. Therefore, we designed an automated testing system that is well integrated into the current development process [Nonnenmann & Eddy, 1991]. The focus of our system is on "functional testing" [Howden, 1985]. It corresponds directly to uncovering discrepancies in the program's behavior as viewed from the outside world. In functional testing the internal design and structure of the program are ignored. This type of testing has been called black box testing because, like a black box in hardware, one is only interested in the input and how it relates to the output. The resulting tests are then executed in a simulated customer environment. This corresponds to verifying that the system fulfills its intended purpose.
KITSS achieves a good integration into the current development process by using the same expressive and unobtrusive input medium (English functional tests) as is used currently, as well as generating tests in the existing automated test language as output. Additionally, KITSS checks the tests for consistency with its built-in extensive knowledge base of "telephony".

Therefore, KITSS helps the test process by generating more tests of better quality and by allowing more frequent regression testing through automation. Furthermore, tests are generated earlier, i.e., during the development phase, not after, which should lead to detecting problems earlier. The result is higher quality software at a lower cost.

In this section, we motivated the need for the approach
chosen in KITSS. In the next section, we will describe
KITSS in more detail.
2 KITSS Overview
The Knowledge-based Interactive Test Script System (KITSS) was developed at AT&T Bell Laboratories to reduce the increasing difficulty and cost involved in testing the software of DEFINITY PBX switches 1. Although our system is highly domain dependent in its knowledge base and inference mechanisms, the approach taken is a general one and should be applicable to any functional software testing task.

DEFINITY supports hundreds of complex features such as call forwarding, messaging services, and call routing. Additionally, it supports telephone lines, telephone trunks, a variety of telephone sets, and even data lines. At AT&T Bell Laboratories, PBX projects have many frequent and overlapping releases over their multi-year life cycle. It is not uncommon for these projects to have millions of lines of code.
2.1 Testing Process
Before KITSS, the design methodology involved writing test cases in English. They describe the details of the external design and are written before coding begins. The cases, which are written by developers based on the requirements, constitute the only formal description of the external functioning of a switch feature. The idea is to describe how a feature works without having coding in mind.

Figure 1 shows a typical test case. Test cases are structured in part by a goal/action/verify format. The goal statement is a very high-level description of the purpose of the test. It is followed by alternating action/verify statements. An action describes stimuli that the tester has to execute. Each stimulus triggers a switch response that the tester has to verify (e.g., a specific phone rings, a lamp is lit, a display shows a message, etc.).

Overall, there are tens of thousands of test cases for DEFINITY. All these test cases are written manually, just using an editor, and are executed manually in a test lab. This is an error-prone and slow process that limits test coverage and makes regression test intervals too long.
Some 5% of the above test cases have been converted into test scripts written in an in-house test automation language. Tests written in this language are run directly against the switch software. As this software is embedded in the switching system, testing requires large

1 A PBX, or private branch exchange, switch is a real-time system with embedded software that allows many telephone sets to share a few telephone lines in a private company.
GOAL:   Activate CF 2 using CF Access Code.
ACTION: Set station B without redirect notification 3. Station B goes offhook and dials CF Access Code.
VERIFY: Station B receives the second dial tone.
ACTION: Station B dials station C.
VERIFY: Station B receives confirmation tone. The status lamp associated with the CF button at B is lit.
ACTION: Station B goes onhook. Place a call from station A to B.
VERIFY: No ring-ping (redirect notification) is applied to station B. The call is forwarded to station C.
ACTION: Station C answers the call.
VERIFY: Stations A and C are connected.

Figure 1: Example of a Test Case
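Purely as an illustration of the goal/action/verify format (KITSS itself consumes the English text directly, so the structure below is our own), the test case of Figure 1 can be pictured as the following data:

    test_case = {
        "goal": "Activate CF using CF Access Code.",
        "steps": [
            {"action": ["Set station B without redirect notification.",
                        "Station B goes offhook and dials CF Access Code."],
             "verify": ["Station B receives the second dial tone."]},
            {"action": ["Station B dials station C."],
             "verify": ["Station B receives confirmation tone.",
                        "The status lamp associated with the CF button at B is lit."]},
            {"action": ["Station B goes onhook.",
                        "Place a call from station A to B."],
             "verify": ["No ring-ping (redirect notification) is applied to station B.",
                        "The call is forwarded to station C."]},
            {"action": ["Station C answers the call."],
             "verify": ["Stations A and C are connected."]},
        ],
    }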
investments in test equipment (computer simulations are not acceptable as they do not address the real-time aspects of the system). Running and re-running test scripts becomes very time consuming and actually controls the rate at which projects are completed.

Although an improvement over the manual testing process, test automation has several problems. The current tools do not support any automatic semantic checking. The conversion from test case to test script takes a long time and requires the best domain experts. There are only limited error diagnosis facilities available, as well as no automatic update for regression testing. Also, test scripts are cluttered with test language initialization statements and are specific to switch configurations and software releases. Test scripts lack the generality of test cases, which are a template for many test scripts. Therefore, test cases are easier to read and maintain.
2.2 KITSS Architecture
KITSS takes English test cases as its input. It translates all test cases into formal, complete functional test scripts which are run against the DEFINITY switch software. Making KITSS a practical system required novel approaches in two very difficult and different areas.

First, a very informal and expressive language needed

2 CF is an acronym for the call-forwarding feature, which allows the user to send his/her incoming calls to another designated station. The user can activate or deactivate this feature by pressing a button or by dialing an access code.
3 Redirect notification is a feature to notify the user about an incoming call when he/she has CF activated. Instead of the phone ringing, it issues a short "ring-ping" tone.
to be transformed into formal logic. Test cases are written in English. While English is undeniably quite expressive and unobtrusive as a representation medium, it is difficult to process into formal descriptions. It also requires theoretically unbounded amounts of knowledge to satisfactorily resolve incompleteness, vagueness, ambiguity, etc. In practice, however, test cases are written in a style that is considerably more restrictive than most English text. The test case descriptions are circumscribed in terms of the vocabulary and concepts to which they refer. Syntactic and semantic variations do occur, but the language is a technical dialect of English, a naturally occurring "telephonese" language that is less variable and less complex. These limits to a specific domain and style make it possible to transform the informal telephonese representation into a formal one.
Second, incomplete test cases needed to be extended. Even though humans find it easier to write test cases in natural language as opposed to a formal language, they still have difficulties specifying tests that are both complete and consistent. They also have difficulties identifying all of the interactions that can occur in a complex system. This is analogous to the difference between trying to define a word and giving examples of its use. Creating a good definition, like creating a complete test case with all the details, is usually the more challenging task; giving word-usage examples, like describing a test case in general terms, is easier. Therefore, the input test cases need to be translated into a formal representation and then analyzed to be corrected and/or extended.

Both tasks have been attempted for more than a decade [Balzer et al., 1977] with only limited success. Most difficulties arise because of the many possible types of imprecision in unrestricted natural language specifications, as well as from the lack of a suitable corpus of formalized background knowledge to guide automated reasoning tools for most application domains.
To address these two difficulties (see also [Yonezaki, 1989]), KITSS provides a natural language processor that is trained on examples of the telephonese sub-language using a statistical approach. It also provides a completeness and interaction analyzer that audits test coverage. However, these two modules have been feasible only due to the domain-specific knowledge-based approach taken in KITSS [Barstow, 1985]. Therefore, both modules are supported by a hybrid knowledge base (the "K" in KITSS) that contains a model of the DEFINITY PBX domain. Concepts that are used in telephony and testing are available to both processes to reduce the complexity of their interpretive tasks. If, for example, a process gets stuck and cannot disambiguate the possible interpretations of a phrase, it interacts (the "I" in KITSS) with the test author: it presents the context in which the ambiguity occurs, presents its best guesses, and asks the author to pick the correct choice. Finally, KITSS also provides a translator that generates the actual test scripts (the "TS" in KITSS) from the formal representation derived by the analyzer.

[Figure 2: KITSS Architecture]

The two needs described above led to the architecture shown in Figure 2. It shows that KITSS consists of four main modules: the domain model, the natural language processor, the completeness and interaction analyzer, and the translator. The domain model (see Section 3) is in the center of the system and supports all three reasoning modules (see Section 4).
3 Domain Knowledge
A domain model serves as the knowledge base for an application system. Testing is a very knowledge-intensive task. It involves experience with the switch hardware and testing equipment as well as an understanding of the switch software with its several hundred features and many more interactions. There are binders full of papers that describe the features of the DEFINITY PBX software, but no concise formalizations of the domain were available before KITSS. One of the core pieces of KITSS is its extensive domain model. The focus of KITSS and the domain model is on an end-user's point of view, i.e., on the (physical and software) objects that the user can manipulate.
The KITSS domain model consists of three major functional pieces (see Figure 3):
Core PBX model: It is split into two major parts. The static model is used by all reasoning modules. The dynamic model is used mainly by the analyzer.

Test execution model: It includes details about the current switch configuration and all the necessary specifics of the automated test language. This model is used mainly by the translator.

Linguistic model: It is specific to the input language (telephonese) and is used mainly by the natural language processor.

[Figure 3: KITSS Domain Model — the core PBX model comprises a static model (major hardware components, static data, phenomena, processes, logical resources) and a dynamic model (predicates, primitive stimuli, abstract stimuli, observables, integrity constraints: invariants and rules); the test execution model comprises the configuration model and the automated test language model; the linguistic model comprises telephonese statistics and telephonese concepts. The static parts are represented in a terminological logic, the dynamic part in a temporal logic.]

From a knowledge representational point of view, we distinguish between static properties of the domain model and dynamic ones [Brodie et al., 1984]. Static properties include the objects of a domain, attributes of objects, and relationships between objects. All static parts of the domain model are implemented in a terminological logic (see Section 3.1). Dynamic properties include operations on objects, their properties, and the relationships between operations. The applicability of operations is constrained by the attributes of objects. Integrity constraints are also included to express the regularities of a domain. The dynamic part of the core PBX model is represented in temporal logic (see Section 3.2).

3.1 Static Model

This part of the domain model represents the static aspects of KITSS. By static we mean all objects, data, and conditions that do not have a temporal extent but may have states or histories.

The static PBX model includes the following pieces:

• Major hardware components, such as telephones and switch administration consoles, as well as smaller subparts of these components, e.g., buttons, lamps, and handsets.

• Phenomena, such as tones and flashing patterns, which are occurrences at points in time.

• Processes, such as static definitions of types of calls (e.g., voice calls, data calls, priority calls) and types of sessions (e.g., calling sessions, feature sessions).

• Logical resources, such as lines and trunks required by processes.

• Static data, e.g., telephone numbers, routing codes, and administrative data such as available features and current feature settings.

The test execution model is divided as follows:

• The configuration model describes the current test setup, i.e., how many simulated phones and trunk lines are available or which extension numbers belong to which phones/lines, etc. It also contains the dial plan and the default feature assignments.

• The automated test language model defines the vocabulary of the test script language.

The linguistic model supports two pieces:

• Telephonese statistics, which are frequency distributions of syntactic structures, help the natural language processor by disallowing interpretations of phrases and concepts that are possible in English but not likely in telephonese.

• Telephonese concepts make it easier to paraphrase KITSS' representations for user interactions.
We used CLASSIC [Brachman et al., 1989] to represent the knowledge in our domain. CLASSIC belongs to the class of terminological logics (e.g., KL-ONE). It is a frame-based description system that is used to define structured concepts and make assertions about individuals. CLASSIC organizes the concepts and the individuals into a hierarchy by classification and subsumption. Additionally, it permits inheritance and forward-chaining rules. CLASSIC is probably the most expressive terminological logic that is still computationally tractable [Brachman et al., 1990]. Queries to CLASSIC are made by semantics, not by syntax.

The static model incorporates multiple views of an object from the various models into one (e.g., a station might have one name in the English test case, another in the automated test language, and a third in the actual configuration). Thus, although each reasoning module might have a different view on the same object, CLASSIC will always retrieve the same concept correctly.
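A toy sketch of classification in the spirit of a terminological logic is given below; it is our own illustration, not CLASSIC's API, and the concept names are invented:

    # Concepts are defined by required properties; an individual is classified
    # under every concept whose required properties it covers.
    concepts = {
        "device":         {"has": set()},
        "station":        {"has": {"extension"}},
        "analog-station": {"has": {"extension", "analog-port"}},
    }

    def subsumers(properties):
        return {name for name, d in concepts.items() if d["has"] <= properties}

    # The same station under three names (test case, test language, configuration)
    # is one individual with one property set, so each view classifies identically.
    station_b = {"extension", "analog-port"}
    print(subsumers(station_b))   # device, station and analog-station all apply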
3.2 Dynamic Model
This unique part of the domain model represents all dynamic aspects of the switch's behavior. It basically defines the constraints that have to be fulfilled during testing, as well as the predicates they are defined upon.

The dynamic PBX model includes the following pieces:

• Predicates, such as offhook, station-busy, connected, or on-hold, define a state which currently holds for the switch. The different phases of a call are described with predicates such as requesting-connection, denied-connection, or call-waiting-for-timeout. Each of the predicates has defined sorts that relate to objects in the static model. Synonyms (e.g., on-hold is a synonym for call-suspended) are allowed as well.
• Stimuli can be either primitive or abstract. Stimuli
appear in the action statements of test cases.
A primitive stimulus defines an action being performed by the user (e.g., dials-extension, goes-offhook) or by the switch (e.g., timeout-call). The necessary pre- and postconditions (before and after the stimulus) are also specified. For instance, for a station to be able to go offhook the precondition is that the station is not already offhook and the postcondition is that the station is offhook after the stimulus (note the difference between the state of being offhook and the action goes-offhook). A small Prolog sketch of applying such a stimulus is given after this list.
An abstract stimulus is not an atomic action but may
have pre- and postconditions like a primitive stimulus. However, several primitive stimuli are necessary to achieve the goal of a single abstract stimulus (e.g., place-call, busy-out-station, or activate-feature). The steps necessary for an abstract stimulus are defined in one or many abstract stimulus
plans. The abstract stimulus defines the conditions
that need to be true for the goal to succeed whereas
the abstract stimulus plans describe possible ways
of achieving such a goal.
• Observables are states that can be verified such as
receives-tone, ringing, or status-lamp-state. Observables appear in the verify statements of test cases.
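For illustration, and only as a rough sketch with hypothetical predicate names (KITSS itself represents stimuli in its temporal logic), a primitive stimulus with its pre- and postconditions can be modelled over states encoded as lists of facts:

:- use_module(library(lists)).

% goes-offhook requires that the station is not already offhook ...
applicable(goes_offhook(S), State) :-
    \+ member(offhook(S), State).

% ... and results in the station being offhook afterwards.
apply_stimulus(goes_offhook(S), State, [offhook(S)|State]).

perform(Stimulus, State0, State) :-
    applicable(Stimulus, State0),
    apply_stimulus(Stimulus, State0, State).

% ?- perform(goes_offhook(station_b), [], S).   yields S = [offhook(station_b)]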
Additionally, the dynamic model includes two different
types of integrity constraints:
• Invariants are assertions that are true in all states. These are among the most important pieces of domain knowledge as they describe basic telephony behavior as well as the look & feel of the switch. The paraphrases of a few of the invariants are as follows: "Only offhook phones receive tones" or "You only get ringing of any kind when you are alerting" or "A forwarded call always alerts at the forwardee, never at the forwarder" or "You can't be talking to an on-hold call". A sketch of checking one such invariant is given after this list.
• Rules also describe low-level behavior in telephony.
These are mainly state transitions in signaling behavior like "A tone must stop whenever another begins" or "Stop dial-tone after dialing an extension"
or "An idle phone starts to ring when the first incoming call arrives".
Representing the dynamic model required expressive power beyond CLASSIC or terminological logics.
For example, CLASSIC is not well-suited for representing plan-like knowledge, such as sequences of actions to
achieve a goal, or to perform extensive temporal reasoning [Brachman et al., 1990]. But this is required for
the dynamic part of KITSS (see above examples). We
therefore used the WATSON Theorem Prover (see Section 4.2), a linear-time first-order resolution theorem
prover with a weak temporal logic. This non-standard logic has five modal operators, holds, occurs, issues, begins, and ends, which are sufficient to represent all temporal aspects of our domain. For example, the abstract
stimulus plan for activating a feature is represented in
temporal logic as follows.
(abstract-stimulus-plan activate-feature-1
  ((:plan-goal activate-feature)
   (:sorts
     ((station s1) (feature f) (station s2)))
   (:preconditions
     ((holds (onhook s1))))
   (:plan-steps
     (((occurs (initiate-feature-session s1 f))
       (begins (receives-tone s1 second-dial-tone)))
      ((occurs (dials-destination s1 s2))
       (issues (receives-tone s1 confirmation-tone)))
      ((occurs (terminate-feature-session s1 f))
       )))))
The theorem proving is tractable due to the tight integration between knowledge representation and reasoning.
Therefore, we specifically designed the analyzer around the WATSON Theorem Prover and targeted both for this domain. The challenging task in building the dynamic model was to understand and extract what the invariants, constraints, and rules were [Zave & Jackson, 1991]. Representing them in the temporal logic was then much easier.
3.3
Domain Model Benefits
In choosing a hybrid representation, we were able to increase the expressive power of our domain model and to
increase the reasoning capabilities as well. The integration of the hybrid pieces did produce some problems, for
example, deciding which components belonged in which
piece. However, this decision was facilitated because of
our design choice to represent all dynamic aspects of the
system in our temporal logic and to keep everything else
in CLASSIC.
There were other benefits to building a domain model.
It ensures that a standard terminology is used by all of
the test case authors. The domain model also simplifies
the maintenance of test scripts. In automated testing
environments without a domain model, the knowledge is
scattered throughout thousands of scripts. With the domain model a change in the functioning of the software
is made in only one place which makes it possible to
centralize knowledge and therefore centralize the maintenance effort. Additionally, the domain model provides
the knowledge that reduces and simplifies the tasks of
the natural language processor, the analyzer, and the
translator modules.
4
Reasoning Modules
4.1
Natural Language Processor
The existing testing methodology used English as the
language for test cases (see Figure 1) which is also
KITSS' input. Recent research in statistical parsing
approaches [Jones & Eisner, 1991] provided some answers to the difficulty of natural language parsing in restricted domains such as testing languages. In the KITSS
project, the parser uses probabilities (based on training
given by telephonese examples) to prune the number of
choices in syntactic and semantic structures. Unlikely
structures can be ignored or eliminated, which helps to
speed up the processing. For instance, consider the syntax of the following two sentences (an example due to Mark Jones):
Place a call to station troops in Saudi Arabia.
Place a call to station "4623" in two minutes.
Both examples are correct English sentences. Although the second sentence superficially matches the first in many parts, their structure is very different. In the first sentence "station" is a verb, in the second a noun; "to" is an infinitive marker and a preposition, respectively.
"In Saudi Arabia" refers to a location whereas "in two
minutes" refers to time. It is hard to come up with correct parses for both but by restricting ourselves to the
5This example was given by Mark Jones.
telephonese sublanguage this is somewhat easier. In telephonese, the structure of the first sentence is statistically
unlikely and can be ignored while the second sentence is
a common phrase.
Statistical likelihoods were used to limit search not only during parsing but also when assigning meaning to sentences, determining the scope of quantifiers, and resolving references. When choices could not be made statistically, the natural language processor could query the domain model, the analyzer, or the human user for disambiguation. The final output of the natural language processor consists of logical representations of the English sentences, which are passed to the analyzer.
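A minimal sketch of this idea, with invented frequency counts (the actual parser derives its probabilities from training on telephonese examples), is the following:

:- use_module(library(lists)).

% Hypothetical corpus counts for the two readings of "station".
reading_count(station_as_noun, 4120).
reading_count(station_as_verb, 3).

% Prefer the reading with the highest count; readings that are rare in
% telephonese, such as the verb reading, can then be pruned early.
preferred_reading(Best) :-
    findall(Count-Reading, reading_count(Reading, Count), Pairs),
    msort(Pairs, Sorted),
    last(Sorted, _-Best).

% ?- preferred_reading(R).   yields R = station_as_noun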
4.2
Completeness & Interaction Analyzer
The completeness and interaction analyzer represents
one of the most ambitious aspects of KITSS. It is based
on experience with the WATSON research prototype
[Kelly & Nonnenmann, 1991]. Originally, WATSON was
designed as an automatic programming system to generate executable specifications from episodic descriptions
in the telephone switching software domain. This was an
extremely ambitious goal and could only be realized in
a very limited prototype. To be able to scale up to real-world use, the focus has been shifted to merely checking and augmenting given tests and maybe generating
related new ones rather than generating the full specification.
Based on the natural language processor output, the
analyzer groups the input logical forms into several
episodes. Each episode defines a stimulus-response-cycle
of the switch, which roughly corresponds to the action/verify statements in the original test case. These
episodes are the input for the following analysis phases.
Each episode is represented as a logical rule, which is
checked against the dynamic model. The analyzer uses
first-order resolution theorem proving in a temporal logic
as its inference mechanism, the same as WATSON.
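As an illustration only, with hypothetical predicates (the analyzer itself performs resolution in the temporal logic), an episode and its check against a toy dynamic model can be sketched as:

:- use_module(library(lists)).

% An episode pairs one stimulus with the responses that must follow it.
episode(goes_offhook(b), [receives_tone(b, normal_dial_tone)]).

% A toy dynamic-model rule base.
dynamic_rule(goes_offhook(S), receives_tone(S, normal_dial_tone)).

% An episode is consistent when every response is sanctioned by a rule.
consistent(Stimulus, Responses) :-
    forall(member(R, Responses), dynamic_rule(Stimulus, R)).

% ?- episode(S, Rs), consistent(S, Rs).   succeeds for the episode above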
The analysis consists of several phases that are specifically targeted for this domain and have to be re-targeted
for any different application. All phases use the dynamic
model extensively. The purpose of each phase is to yield
a more detailed understanding of the original test case.
The following are the current analysis phases:
• The structure of a test case is analyzed to recognize or attribute purpose to pieces of the test case.
There are four major pieces that might be found:
administration of the switch, feature activation or
deactivation, feature behavior, and regression testing.
• The test case is searched for connections among concepts, e.g., there might be relations between system
administration concepts and system signaling that
need to be understood.
• Routine omissions are inserted into the test case.
Testers often reduce (purposefully or not) test sequences to their essential aspects. However, these
omissions might lead to errors during testing and
therefore need to be added.
• Based on the abstract plans in the dynamic model,
we can enumerate possible specializations, which
yield new test cases from the input example.
• Plausible generalizations are found for objects and
actions as a way to abstract tests into classes of
tests.
During the analysis phases, the user might interact
with the system. We try to exploit the user's ease at
verifying or falsifying examples given by the analyzer.
At the same time, the initiative of generating the details
of a test lies with the system. For example, some test
case might violate the look & feel of the system, i.e., there is a conflict with an invariant. However, the user might want this behavior intentionally, which will lead to a change in the look & feel itself.
The final output of the analyzer is a corrected and
augmented test case in temporal logic. As an example of
the analyzer's representation after analysis, the following shows the logical forms for the first few episodes in
Figure 1. Notice that the test case is expanded since the
analyzer applied abstract stimulus plans.
((OCCURS (GOES-OFFHOOK B))
(BEGINS (RECEIVES-TONE B NORMAL-DIAL-TONE)))
((OCCURS (DIALS-CODE B
(ACTIVATE-ACCESS-CODE CF)))
(BEGINS (RECEIVES-TONE B SECOND-DIAL-TONE)))
((OCCURS (DIALS-EXTENSION B C))
(ISSUES (RECEIVES-TONE B CONFIRMATION-TONE))
(BEGINS (STATUS-LAMP-STATE B (BUTTON CF)
STEADY)))
This representation is passed to the translator.
4.3
Translator
To make use of the analyzer's formal representation,
the translator needs to convert the test case into an
executable test language. This language exercises the
switch's capabilities by driving test equipment with the
goal of finding software failures. One goal of the KITSS
project was to extend the life of test cases so that they
could be used as many times as possible. To accomplish
this, it was decided to make the translator support two
types of test case independence.
First, a test case must be test machine independent.
Each PBX that we run our tests on has a different configuration. KITSS permits a test author to write a test
case without knowing which particular machine it will be
run on and assuming unlimited resources. The translator
loads the configuration setup of a particular switch into
the test execution model. It uses this to make the test
case concrete with respect to equipment used, system administration performed, and permissions granted. Thus,
if the functional description of a test case is identical in
two distinct environments, then the logical representation produced by the earlier modules of KITSS should
also be identical.
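A minimal sketch of such a binding, with an invented configuration fact (in KITSS the configuration is loaded into the test execution model and represented in CLASSIC):

:- use_module(library(lists)).

% Hypothetical configuration of one particular switch.
configuration(lab_switch_1, [station(b, '4620'), station(c, '4623')]).

% Bind an abstract station of a test case to a concrete extension.
concrete_extension(Switch, Station, Extension) :-
    configuration(Switch, Stations),
    member(station(Station, Extension), Stations).

% ?- concrete_extension(lab_switch_1, c, E).   yields E = '4623'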
Second, a test case must be independent of the automated test language. KITSS generates test cases in an
in-house test language. The translator's code is small
because much of the translation information is static
and can be represented in CLASSIC. If a new test language replaces the current one then the translator can
be readily replaced without loss of test cases, with minimal changes to the KITSS code, and without a rewrite
of most of the domain model.
5
Status
The KITSS project is still a prototype system that has
not been deployed for general use on the DEFINITY
project. It was built by a team of researchers and developers. Currently, it fully translates 38 test cases
(417 sentences) into automated test scripts. While this
is a small number, these test cases cover a representative range of the core features. Additionally, each test
case yields multiple test scripts after conversion through
KITSS. The domain model consists of over 500 concepts,
over 1,500 individuals, and more than 80 temporal constraints. The domain model will grow somewhat with
the number of test cases covered, however, so far the
growth has been less than linear for each feature added.
All of the modules that were described in this paper
have been implemented but all need further enhancements. System execution speed doesn't seem to be a
bottleneck at this point in time. CLASSIC's fast classification algorithm's complexity is less than linear in the
size of the domain model. Even the analyzer's theorem
prover, which is computationally the most complex part
of KITSS, is currently not a bottleneck due to continued
specialization of its inference capability. However, it is
not clear how long such optimizations can avoid potential
intractability.
The current schedule is to expand KITSS to cover a
few hundred test cases. To achieve this, we will shift our
strategy towards more user interaction. The version of
KITSS currently under development will intensely question the user to explain unclear passages of test cases. We
will then re-target the reasoning capabilities of KITSS
to cover those areas. This rapid-prototyping approach is
only feasible since we have already developed a robust
core system. Although scaling-up from our prototype to
a real-world system remains a hard task, KITSS demonstrates that our knowledge-based approach chosen for
functional software testing is feasible.
6
Conclusion
As we have shown, testing is perhaps one of the most
expensive and time-consuming steps in product design,
development, and maintenance. KITSS uses some novel
approaches to achieving several desirable goals. Features
will continue to be specified in English. To support this
we have incorporated a statistical parser that is linked to
the domain model as well as to the analyzer. Additionally, KITSS will interactively give the user feedback on
the test cases written and will convert them to a formal
representation. To achieve this, we needed to augment
the domain model represented in a terminological logic
with a dynamic model written in a temporal logic. The
temporal logic inference mechanism is customized for the
domain. Tests will continue to be specified independent
of the test equipment and test environment and the user
will not have to provide unnecessary details.
Such a testing system as demonstrated in KITSS will
ensure project-wide consistent use of terminology and
will allow simple, informal tests to be expanded to formal and complete test scripts. The result is a better
testing process with more test automation and reduced
maintenance cost.
Acknowledgments
Many thanks go to Van Kelly, Mark Jones, and Bob Hall
who also contributed major parts of the KITSS system.
Additionally, we would like to thank Ron Brachman for
his support throughout the project.
References
[Balzer et al., 1977] Balzer, R., Goldman, N., and Wile, D.: Informality in program specifications. In Proceedings of the 5th IJCAI, Cambridge, MA, 1977.
[Barstow, 1985] Barstow, D.R.: Domain-specific automatic programming. IEEE Transactions on Software Engineering, November 1985.
[Brachman et al., 1989] Brachman, R.J., Borgida, A., McGuinness, D.L., and Alperin Resnick, L.: The CLASSIC knowledge representation system, or, KL-ONE: The next generation. In preprints of Workshop on Formal Aspects of Semantic Networks, Santa Catalina Island, CA, 1989.
[Brachman et al., 1990] Brachman, R.J., McGuinness, D.L., Patel-Schneider, P.F., Alperin Resnick, L., and Borgida, A.: Living with CLASSIC: When and how to use a KL-ONE-like language. In Formal Aspects of Semantic Networks, J. Sowa, ed., Morgan Kaufmann, 1990.
[Brodie et al., 1984] Brodie, M.L., Mylopoulos, J., and Schmidt, J.W.: On Conceptual Modeling: Perspectives from Artificial Intelligence. Springer-Verlag, New York, NY, 1984.
[Brooks, 1987] Brooks, F.P.: No silver bullet: Essence and accidents of software engineering. Computer, Vol. 20, No. 4, April 1987.
[Howden, 1985] Howden, W.E.: The theory and practice of functional testing. IEEE Software, September 1985.
[Jones & Eisner, 1991] Jones, M.A., and Eisner, J.: A probabilistic chart-parsing algorithm for context-free grammars. AT&T Bell Laboratories Technical Report, 1991.
[Kelly & Nonnenmann, 1991] Kelly, V.E., and Nonnenmann, U.: Reducing the complexity of formal specification acquisition. In Automating Software Design, M. Lowry and R. McCartney, eds., MIT Press, 1991.
[McCartney, 1991] McCartney, R.: Knowledge-based software engineering: Where we are and where we are going. In Automating Software Design, M. Lowry and R. McCartney, eds., MIT Press, 1991.
[Myers, 1976] Myers, G.J.: Software Reliability. John Wiley & Sons, New York, NY, 1976.
[Myers, 1979] Myers, G.J.: The Art of Software Testing. John Wiley & Sons, Inc., New York, NY, 1979.
[Nonnenmann & Eddy, 1991] Nonnenmann, U., and Eddy, J.K.: KITSS - Toward software design and testing integration. In Automating Software Design: Interactive Design - Workshop Notes from the 9th AAAI, L. Johnson, ed., USC/ISI Technical Report RS-91-287, 1991.
[Yonezaki, 1989] Yonezaki, N.: Natural language interface for requirements specification. In Japanese Perspectives in Software Engineering, Y. Matsumoto and Y. Ohno, eds., Addison-Wesley, 1989.
[Zave & Jackson, 1991] Zave, P., and Jackson, M.: Techniques for partial specification and specification of switching systems. In Proceedings of the VDM'91 Symposium, October 1991.
A Diagnostic and Control Expert System Based on a Plant Model
Junzo SUZUKI* Chiho KONUMA
Mikito IWAMASA
Naomichi SUEDA
Systems&Software Engineering Laboratory, Toshiba Corporation
70, Yanagi-cho, Saiwai-ku, Kawasaki 210, Japan
*Email: suzuki%ssel.toshiba.co.jp@uunet.uu.net
Shigeru MOCHIJI
Akimoto KAMIYA
Fuchu Works, Toshiba Corporation
1, Toshiba-cho, Fuchu 183, Japan
Abstract
A conventional expert system for plant control is based
on heuristics, which are a priori knowledge stored in a
knowledge base. Such a system has a substantial limitation in that it cannot deal with "unforeseen abnormal
situations" in a plant due to the lack of heuristics. To
realize a flexible plant control system which can overcome this limitation, we focus on model-based reasoning. Our system has three major functions: 1) model-based diagnosis for unforeseen abnormal situations, 2) model-based knowledge generation for plant control, and 3) knowledge-based plant control with both generated and a priori stored knowledge.
In this paper, we focus on the function of model-based
knowledge generation. First, we show an overview of
our system which has an integrated architecture of deep
reasoning with shallow reasoning. Next, we explain the
theoretical aspects of model-based knowledge generation.
Finally, we show the experimental results of our system, and discuss the system's capabilities and some open
problems.
1
Introduction
Currently in the field of diagnosis and control of thermal power plants, the more intelligent and flexible systems become, the more knowledge they need. Conventional diagnostic and control expert systems are based
on heuristics stored a priori in knowledge bases, so they
cannot deal with unforeseen abnormal situations in the
plant. Such situations could occur if knowledge engineers
forgot to implement some necessary knowledge.
A skilled human operator is able to operate the plant
and somehow deal with such unforeseen abnormal situations because he has fundamental knowledge about
the structure and functions of component devices of a
plant, the principles of plant operations, and the laws of
physics. His thought process is as follows.
• Diagnosis of an unforeseen abnormal situation
• Generation of plant control knowledge
• Verification of generated knowledge
A skilled human operator can deal with unforeseen
abnormal situations by repeatedly executing these steps
using the fundamental knowledge mentioned before.
Therefore, the concepts of our diagnostic and control expert system are based on the same steps.
In this paper, we focus on the generation and verification of plant control knowledge. First, we show an
overview of our system. Next, we explain the model representations and the model-based reasoning mechanisms.
After that, we describe the experimental results and discuss the system's capabilities. Finally, we discuss some
open problems and related work.
2
A System Overview
The model-based diagnostic and control expert system
(Figure 1) consists of two subsystems: the Shallow Inference Subsystem (SIS) and the Deep Inference Subsystem
(DIS).
The SIS is a conventional plant control system based
on heuristics, namely the shallow knowledge for plant
control. It selects and executes plant operations according to the heuristics stored in the knowledge base. The
Plant Monitor detects occurrences of unforeseen abnormal situations, and then activates the DIS.
The DIS consists of the following modules: the Diagnosor, the Operation-Generator, the Precondition-Generator, and the Simulation-Verifier. The Diagnosor
utilizes the Qualitative Causal Model for plant process
parameters to diagnose unforeseen abnormal situations.
The Operation-Generator figures out which plant operations are necessary to deal with these unforeseen abnormal situation. It utilizes the Device Model and the
Operation Principle Model. The Precondition-Generator
attaches the preconditions to each plant operation above,
and as a result, generates rule-based knowledge for plant
control. The Simulation-Verifier predicts plant behavior which is to be observed when the plant is operated according to the generated knowledge. It utilizes the Dynamics Model, verifies the generated knowledge using predicted plant behavior, and gives feedback to the Operation-Generator to refine the knowledge if necessary.
The knowledge compiled from models by the DIS is
transmitted to the SIS. The SIS executes the plant operations accordingly, and as a result, the unforeseen abnormal situations should be handled properly.
Figure 1: An overview of the system
3
Model-Based Generation and Verification of Knowledge
The main purpose of this section is to present a generation and verification procedure for plant control knowledge to deal with unforeseen abnormal situations. This
knowledge is in IF-THEN format.
3.1
Model Representation
The Device Model and the Operation Principle Model are
used to generate the knowledge. The Dynamics Model is
used to verify the knowledge. We explain these models
briefly.
1. Device Model
The Device Model represents the fundamental knowledge about the functions, structure, and characteristics of a plant. Because a plant consists of component devices, a Device Model can be defined for each component device. Figure 2 shows the Device Model representation for a boiler-feeding-water-pump, which supplies water to a boiler.
Figure 2: An example of the Device Model (slots: name, demand, goal, states, operation, quality, flow_in, flow_out, and system for the pump a_bfp)
The demands for each component device are described in the demand slot, and their constraints to be satisfied are described in the goal slot. The functions of each component device are described as possible states of each device in the states slot. The operations of a device are defined by the change of its state. Direct and indirect influences to plant processes by operations are described in the operation and quality slots respectively. The structure of a plant is described in the flow_in and flow_out slots. In addition, hierarchical modeling can be done as shown in Figure 3.
Figure 3: Hierarchical modeling of plant devices
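As a rough illustration only (the actual Device Models are frames implemented in KL1 on Multi-PSI), the slots of Figure 2 can be pictured as one Prolog fact per slot, with illustrative values:

% Hypothetical slot facts for the boiler-feeding-water-pump a_bfp.
device_slot(a_bfp, states,    [on, off]).
device_slot(a_bfp, operation, [off_to_on, on_to_off]).
device_slot(a_bfp, flow_out,  boiler).
device_slot(a_bfp, system,    bfp_system).

% Retrieve any slot of a device.
slot(Device, Slot, Value) :-
    device_slot(Device, Slot, Value).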
2. Operation Principle Model
The Operation Principle Model is concerned with
the principles for safe and economical plant control.
It consists of the following two rules.
• Strict Accordance Rule
The purpose of this rule is to ensure plant
safety throughout a series of plant operations.
It consists of the following two components: a
rule to use a device within its own allowable
range, and a rule to keep a faulty device out of
service.
• Preference Rule
The purpose of this rule is to ensure an economical plant operation. It consists of the following
two components: a rule to keep the number of
in-service devices to a minimum, and a rule to
equalize the service-time of each device.
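For illustration only, the second component of the Preference Rule can be sketched in Prolog with invented service-time facts:

% Hypothetical standby devices and their accumulated service times.
standby(a_bfp).
standby(b_bfp).
service_time(a_bfp, 1200).
service_time(b_bfp,  950).

% Equalize service time: prefer the standby device used least so far.
preferred_backup(Device) :-
    findall(Time-D, (standby(D), service_time(D, Time)), Pairs),
    msort(Pairs, [_-Device|_]).

% ?- preferred_backup(D).   yields D = b_bfp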
3. Dynamics Model
The Dynamics Model represents the dynamic characteristics of the plant. In the area of plant control, the Dynamics Model is concerned either with the functions of traditional plant controllers based on PID control or with the characteristics related to physical laws. Figure 4 shows the model of a water-flow-controller. Kp and T are constants, and 1/s denotes the integral operator.
Figure 4: An example of the Dynamics Model
3.2
Model-Based Reasoning Mechanism
We briefly explain the model-based reasoning mechanism
of these modules: the Operation Generator, the Precondition Generator, and the Simulation-Verifier.
Figure 5: Constraints verification function
(b) Update of the Goal-State
If some of the constraints at a certain device
are proved not to be satisfied, a new state for
this device should be sought in order to satisfy them. This function (Figure 6) consists
of the following sub-functions: searching for a
state of each device where all of its demands
can be satisfied, distributing the demands for
a device of higher hierarchy to devices of lower
hierarchy according to the constraints defined
by the Operation Principle Model, and generating new demands for connected devices according to the Device Model and propagating them.
The plant operations are deduced by taking the
difference between the initial goal-state and the
updated one.
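A minimal sketch of such demand propagation, with hypothetical connection facts (the real system derives the connections from the flow_in and flow_out slots of the Device Model):

% Hypothetical upstream connections: the boiler is supplied by a_bfp,
% which in turn is fed from the de-aerator.
feeds(a_bfp, boiler).
feeds(deaerator, a_bfp).

% Propagate a new demand from a device to every device upstream of it.
propagate(Device, Demand, [Device-Demand|Upstream]) :-
    feeds(Supplier, Device),
    !,
    propagate(Supplier, Demand, Upstream).
propagate(Device, Demand, [Device-Demand]).

% ?- propagate(boiler, 600, Demands).
%    yields Demands = [boiler-600, a_bfp-600, deaerator-600]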
1. Operation Generator
This module determines the goal-state where all of
the constraints defined by the Device Model and
the Operation Principle Model are satisfied. Generally, an unforeseen abnormal situation causes a
state change of a plant, and this change can make
the above constraints unsatisfied. To detect these unsatisfied constraints, the following functions are
needed.
(a) Verification of Constraints
All the constraints defined by the Device Model
should be verified to see if they are still satisfied after the unforeseen abnormal situation.
This function (Figure 5) consists of the following two sub-functions: propagating the change at each device to the others according to the connections of devices, and locally verifying the constraints at each device.
Figure 6: Goal-State update function
2. Precondition Generator
In the domain of thermal power plant control,
preconditions of each plant operation can be
classified into the following five generic classes
[Konuma 1990J.
• Preconditions for the state before an operation
• Preconditions for the order of operations
• Preconditions for safety during an operation
• Preconditions for the timing of an operation
• Preconditions for completion of an operation
procedure Generate&Test(M or D0, S0)
begin
  [Se, Op] <= Operation_Generate(M or D0, S0);
  K1 <= Precondition_Generate(S0, Se, Op);
  PS <= Simulate(S0, K1);
  [NG, D1, S1] <= Verify(PS);
  if NG reports no constraint violation
    then return(K1, Se);
  else
    [K2, S3] <= Generate&Test(D1, S1);
    [K3, Se] <= Generate&Test(M, S3);
    K4 <= FIX(K1) + K2 + K3;
    return(K4, Se);
  endif
end.
This module generates the above preconditions for
each operation by analyzing the goal-state according
to the constraints defined by the Device Model. An
image of their generation process is shown in Figure
7.
NOTATION:
  Si, Se : plant state
  Di     : demand for a device
  PS     : plant behavior
  Ki     : plan of plant operations
  [ ]    : list expression
  M      : output of Diagnosor
  Op     : plant operations
  NG     : flag for allowable range violations
  <=     : substitution expression
Figure 8: Generate&Test algorithm of the knowledge
4
Experiments
We have implemented the expert system on Multi-PSI
[Taki 1988]. To realize a rich experimental environment,
we have also implemented a plant simulator instead of an
actual plant on a mini-computer G8050. Both computers are linked by a data transmission line. This section
describes the results of some experiments.
Figure 7: Generation process of preconditions (the preconditions form the IF-part and the operation the THEN-part of the generated rule)
3. Simulation Verifier
This module predicts plant behavior using the Dynamics Model to verify the knowledge compiled
from models by the Operation Generator and Precondition Generator. The prediction of plant behavior can be realized through simulation methods
[Suzuki 1990]. After the prediction, the module examines whether or not undesirable events have occurred. Undesirable events can be defined by several
criteria, but one of the most important is the transient violation of the allowable range for each process parameter's value. The execution of plant operations usually causes the transient change of processes due to the dynamic characteristics of a plant.
If this change is beyond the allowable range of a
current plant state, it is detected as a violation.
The Simulation-Verifier supports the Generate&Test algorithm of knowledge [Suzuki 1990] as
illustrated in Figure 8. This process can be formalized as updating the goal-state according to the
degree of the violation.
4.1
Configuration of a Thermal Power Plant
Figure 9 shows the configuration of the thermal power
plant. It consists of controllers (hatched rectangle) and
devices. The condenser is a device for cooling the turbine's exhaust steam; the steam is condensed into water using cooling water taken from the sea. The condensed water is moved through the de-aerator to the boiler by the condensation-pump-system and the boiler-feeding-pump-system. The cooling water is provided by the circulation-pump-system. The fuel-system supplies pulverized coal
to the boiler.
4.2
Experimental Results
The total of the Device Models in the system amounts
to 78 (Table 1). In this table, the difference between the
numbers in the left and right columns is due to hierarchical modeling.
The experiments were performed as follows.
1. First, we selected appropriate faults of the following devices: a coal-pulverizer, a boiler-feeding-pump,
a condensation-pump, a circulation-water-valve, and
a water-heater. We made these faults the malfunctions of the plant simulator. We also set them up
for multiple faults.
Table 1: The amount of devices and controllers

              Amount in the plant   Amount in the Device Models
  Devices             43                        63
  Controllers          7                        15
  Total               50                        78
Figure 9: Configuration of a thermal power plant
2. Next, we extracted some specific knowledge for plant
control from the knowledge base in the SIS. This
specific knowledge was necessary to deal with the
selected faults. As a result, the selected faults were
equal to unforeseen abnormal situations.
3. Finally, after activating the malfunctions of the
plant simulator, we confirmed that the DIS compiled the knowledge from the models and that the
SIS executed the operations accordingly.
We explain the quality of generated knowledge for a
single fault, because the results in multiple faults are the
same as in a single fault. In the experiments, the contents of generated knowledge are concerned with switching from a faulty device to a backup one. Table 2 summarizes all the generated plant operations. In the case of
a water-heater fault, the system failed to generate plant
operations. In other cases, the system succeeded in generating plant operations. We estimate the quality of the
generated knowledge in terms of its preconditions. This
table lists columns consisting of the following items for
each operation: the number of the preconditions encoded
by a human expert (N1), the number of the essential ones
in N1 (N2), the number of generated ones by the system (N3), the covered ratio of N2 by the system (CR1), and the uncovered ratio of N2 (ER), namely, the ratio of N2 missed being generated or incorrectly generated by the system. (CR2 will be explained in Section 5.)
The difference between N1 and N2 is due mainly to the following reason. Although a human expert specifies the preconditions of the knowledge as generally as possible, the system generates specialized preconditions for each occurring unforeseen abnormal situation. With this point in mind, we determine N2 by eliminating unnecessary preconditions from N1. CRi (i = 1, 2) and ER are calculated by the following formulas.

CRi = Success(N2) / N2

ER = (Miss(N2) + Fail(N2)) / N2
Success(N2) denotes the number of N2 generated by
the system; Miss(N2) the number of N2 which were not
generated; and Fail(N2) the number of N2 incorrectly
generated.
We also consider the following in evaluating
Success(N2).
• Although the generated preconditions enumerate
the individual state of each device, a human expert
often represents them succinctly. For example, the
conjunctive precondition "a_bfp = on" 1\ "b_bfp =
on" 1\ "c_bfp = off" are represented as "the number
of activated bfp = 2".
• The system often generates superfluous preconditions that a human expert does not mention.
• Although a human expert encodes preconditions for
the selection of an in-service device, the system
never generates them because they are already taken into account in applying the Operation Principle Model.
None of the above devalues the quality of generated
knowledge because the system is required only to generate specific preconditions for an occurring unforeseen
abnormal situation. For this reason, we regard generated preconditions applicable to any of the above as
Success(N2).
We carried out the experiments under the following
conditions.
Table 2: Quality of generated knowledge

Unforeseen   Operation (OP)             N1   N2   N3   CR1[%]  ER[%]  CR2[%]
situation
Pulverizer   1. activate a Pulverizer   12    6   11     100      0     100
Fault        2. halt a Pulverizer        8    8   12     100      0     100
BFP          3. activate a BFP          42   18    8      22     78     100
Fault        4. open a FWCV             26   10    8      40     60      90
             5. set FWCV auto           32    8    8      38     62      75
             6. set FWCV hand           12    4    7      75     25     100
             7. close a FWCV            14    6    9      83     17     100
             8. halt a BFP              23    8   11      87     13     100
CP           9. activate a CP           17    7    6      57     43     100
Fault        10. halt a CP              13    7    8      86     14     100
CWV          11. activate a CWP          8    7    7      57     43     100
Fault        12. open a CWV              4    3    7      67     33     100
             13. close a CWV             4    3    8      67     33     100
             14. halt a CWP              7    -    8      71     29       -
HTR          15. open a HTR Bypass VLV   -    -     failed to generate OP
Fault        16. close a HTR VLV         -    -     failed to generate OP

(N1: preconditions encoded by a human expert in the knowledge base; N2: essential preconditions among N1; N3: preconditions generated by the system; CR1: covered ratio of N2; ER: error ratio; CR2: covered ratio after refinement; -: not available.)
• Once DIS was activated, no further unforeseen abnormal situation occurred.
• The Diagnosor deduced the exact diagnostic results.
Because of the above conditions, SIS interpreted all
the generated knowledge and handled the unforeseen abnormal situations. Figure 10 shows the generated knowledge and its corresponding knowledge encoded by a human expert for the operation no.5 in Table 2. We also
show some additional information in Figure 10, which is
referred to in the next section.
5
Discussion
In this section, we evaluate the system's capability to
generate the necessary plant operations and to generate the correct preconditions for each operation. The
former is concerned with performance of the Operation
Generator' and the latter is concerned with that of the
Precondition Generator. In addition, we discuss the pros
and cons of using Multi-PSI and some open problems.
1. Capability to generate plant operations
In the experiment, the system could generate all the
necessary plant operations for each malfunction except the water-heater fault. We briefly explain the
reason for this failure below.
At a boiler, the following approximation holds true
for outlet steam pressure (P), inlet fuel flow (F), inlet water temperature (T) and inlet water flow (G).
c1 and c2 are positive constants, and a1 and a2 are correction terms related to other process parameters.
The Operation Generator calculates F, G and T
from P using this formula defined in the Device
Model. P is the demand for the boiler. After that,
the Operation Generator propagates F to the fuel-system, and G and T to the water-heater, as new demands respectively. At this point, the Operation Generator must evaluate the above formula from left to right, but the possible value combinations of F, G, and T cannot be decided from the single input value P. To deal with this undecidability, the Operation Generator utilizes the Operation Principle
Figure 10: Knowledge for the c_bfp controller
Model and approximation functions supplemented with the Device Model. The failure in the water-heater fault is caused by this reasoning mechanism. We believe that additional principles are needed to evaluate such a process balance.
2. Capability to generate preconditions
From CR1 and ER in Table 2, we can see that
most of the generated preconditions are imperfect,
namely ER > O. The reasons are as follows.
• The Precondition Generator failed to generate
preconditions related to devices not modeled
in Device Model. An example is the set of preconditions to establish the electric power supply for the pump. We can resolve this problem
easily by augmenting the Device Model.
• Although all the necessary preconditions could
be checked in the goal-state search, the Precondition Generator missed analyzing them. No.i
to no.j in Figure 10 illustrates this point. The
system focuses only on the neighbor devices
of the operated device. Because the system
is required only to generate specific preconditions for an occurring unforeseen situation, we can resolve this problem easily by
extending the focusing area.
• The Precondition Generator generated incorrect preconditions for the timing of operations,
as shown by no.8 to no.9 in Figure 10. Although the system is based on the concept that
the timing of operations can be determined
from the maximum outlet process flow of each device, this concept does not hold true for devices such as PID-controllers or devices placed under the control of PID-controllers.
Although we can resolve the former two problems
easily, the last problem is serious because it is closely
related to the basic concept for the generation of
preconditions. It is still an open problem. In Table
2, column C R2 represents the expected results after
the refinements against the former two problems.
The remaining uncovered parts for operations 4 and
5 (ER is 10% and 25% respectively) are related to
the last mentioned problem above.
3. Real-time reasoning using Multi-PSI
Although our system does not require the severe real-time reasoning capability needed for either PID control or adaptive control, it requires at least the
ability to compile the knowledge within a few minutes. To guarantee this performance, we have been
investigating a parallel reasoning mechanism with
Multi-PSI [Suzuki 1991]. We can use KL1 language
on Multi-PSI, which is a profitable language to implement a multi-process system concisely. In particular, its process synchronization mechanism by
"suspend" is an advantage for our system implementation. In spite of this point, it is very difficult to
achieve a drastic speedup using KL1 and Multi-PSI.
We have already demonstrated a threefold to fivefold improvement of reasoning time by using Multi-PSI with 16 processor elements. To achieve more
improvement, we think we must make a more elaborate implementation.
4. Utility of the compiled knowledge
In contrast to the classical approach by shallow
knowledge, our proposed model-based reasoning mechanism succeeded in dealing with unforeseen abnormal situations in a plant. This point is the utility
of the compiled knowledge.
Although our proposed mechanism is powerful to
deal with unforeseen abnormal situations, it is weak
with respect to the acquisition of knowledge which
is reusable in the SIS. Because the system generates
specific knowledge only for occurring unforeseen abnormal situations, the generated knowledge is either
too general with respect to the lack of some conjunctive preconditions or too specific with respect
to their enumerative representations from the viewpoint of its reusability.
5. Facility of model acquisition
The system utilizes the Qualitative Causal Model,
the Device Model, the Operation Principle Model
and the Dynamics Model. These models could be
built from the plant design, and should be consistent with each other. In the current implementation
of the system, each model is built and implemented
separately. Therefore, model sharing is not yet realized.
In a diagnostic task, Yamaguchi[Yamaguchi 1987]
refers to the facility of model acquisition. Some
other related works are in the area of the qualitative
reasoning. Crawford [Crawford 1990] attempted to
maintain and support the qualitative modeling environment by QPT.
6. Over-sensitive verification of the plant behavior
In the current implementation of the Generate&Test
algorithm for the knowledge, the priority of each
allowable range is not considered at all. Therefore, even though the violation of the range is slight
enough to be ignored, the system tries to deal with
this violation sensitively. This sensitivity is meaningless for all practical purposes because a plant
would be designed with enough capacity to absorb
the violation. For this reason, the system should
check the range with some allowable degree of violation. We are now investigating the mechanism.
7. Monitoring the execution of the generated knowledge
In this paper, we supposed that the Diagnosor can
diagnose unforeseen events exactly. However, in
general, this supposition can be invalid. Diagnostic results should be estimated by plant monitoring
following the plant operations.
As for the related work, Dvorak [Dvorak 1989] utilizes the QSIM [Kuipers 1986] to monitor a plant.
However, he does not refer to the generation of the
knowledge for unforeseen events.
6
Conclusion
We proposed a diagnostic and control expert system
based on a plant model. The main target of our approach
is a system which could deal with unforeseen abnormal
situations. Our approach adopts a model-based architecture to realize the thought process of a skilled human
plant operator.
In this paper, we focused on model-based generation of
plant control knowledge, and explained the details of the
model-based reasoning. Our system utilizes the following models: the Device Model, the Operation Principle
Model and the Dynamics Model. We also discussed its
ability as demonstrated through some experimental results. The results encourage us to further establish model-based reasoning capabilities in plant control.
Acknowledgements
This research was carried out under the auspices of
the Institute for New Generation Computer Technology
(ICOT).
References
[Crawford 1990] Crawford, J., Farquhar, A. and
Kuipers, B. "QPC: A Compiler from Physical Models into Qualitative Differential Equations", Proc. of
AAAI-90, pp.365-372 (1990).
[Dvorak 1989] Dvorak, D. and Kuipers, B. "Model-Based Monitoring of Dynamic Systems", Proc. of IJCAI-89, pp.1238-1243 (1989).
[Konuma 1990] Konuma, C., et. al. "Deep Knowledge
based Expert System for Plant Control - Development
of Conditions Generation Mechanism of Plant Operations - ", Proc. of 12th Intelligent System Symposium, Society of Instrument and Control Engineers, pp.13-18 (1990) (in Japanese).
[Kuipers 1986] Kuipers, B. "Qualitative Simulation",
Artificial Intelligence, 29, pp.289-338 (1986).
[Suzuki 1990] Suzuki, J., et al. "Plant Control Expert
System Coping with Unforeseen Events - Model-based
Reasoning Using Fuzzy Qualitative Reasoning - ",
Proc of Third International Conference on Industrial
and Engineering Applications of Artificial Intelligence
and Expert Systems (IEA/AIE-90), ACM, pp.431-439
(1990).
[Suzuki 1991] Suzuki, J., et al. "Plant Control Expert System on Multi-PSI Machine", Proc. of
KL1 Programming Workshop, pp.101-108 (1991) (in
Japanese).
[Taki 1988] Taki, K. "The parallel software research and
development tool: Multi-PSI system ", Programming
of Future Generation Computers, North-Holland,
pp.411-426 (1988).
[Yamaguchi 1987] Yamaguchi, T., et al. "Basic Design
of Knowledge Compiler Based on Deep Knowledge",
Journal of Japanese Society for Artificial Intelligence,
vol.2, no.3, pp.333-340 (1987) (in Japanese).
A Semiformal Metatheory for Fragmentary and Multilayered
Knowledge as an Interactive Metalogic Program
Andreas Hamfelt and Ake Hansson
Uppsala Programming Methodology and Artificial Intelligence Laboratory
Computing Science Dept., Uppsala University
Box 520, S-751 20 Uppsala, Sweden
+46-18-18 25 00
hamfel tes = [actor(X),actor(Y»),
LegSetl = [[provisionno(sga(5»U,LegSetO],
Text = the same text as in rule schema 1 in fig. 2.l.
propertyping(t(1 ),RulePropl ,ModAtl ,Types,Text).
In IT this provision is assumed open with respect to the
concepts 'vendee' and 'vendor'. So, the assumed fixed
structure of this provision is represented in the metalanguage as the term specified for the metavariable RulePropl
with open places expressed by the metavariables X and
Y. These variables have to be specialised interactively
with the user. The predicate propertyping/5 is defined for
this interaction. The metavariable ModAtl expresses the
relation between the concepts of IT, i.e., the text of Text
and its open parts, i.e., 'vendee' and 'vendor', and its
formal counterpart in OT partly specified in RuleProp1. Thus, a proper typing carried out by the user gives a meaningful rule of the object language of level 1, represented in the metalanguage by the specialised term of RuleProp1. The metavariable LegSet1 identifies what part
of level 1 in IT is relevant for a particular case.
The rules of acceptance may also only be partially
characterized in the metalanguage. However, a user can
interactively add interpretation data, thereby extending
the partial characterization of OT in the theory MT of
the metalanguage. What is hard to characterize is the
determination of whether or not a meaningful rule belongs to a theory of IT, i.e., is legally acceptable, and
thus should have a formal counterpart in OT. Presently,
this is solved by assuming in MT that a rule is acceptable when a user tries to apply it, and the conditions for
its application are accepted, i.e., either follow by logic
from other accepted rules or are included in the theory by rules at the higher adjacent level in co-operation
with the user. So, we presuppose that it is only the user
who can determine the relevance of a specific principle.
Consequently, at the end of a session these assumptions
should be possible for a user to examine.
These aspects are encoded in the prover clause [UP]
(short for upward reflection). Observe that the prover
clauses belong to MT which takes as object theory the
whole multilayered OT. Their first demo argument defines the formalisation in OT of logic provability between
a theory Ti of IT and a sentence of IT, but even though, e.g., the fourth (proof term) argument has a counterpart in OT, namely a formal proof extending over the whole hierarchy of OT, it includes expressions solely of MT as well.
[UP]
prover(demo(n(t(I)),n(SentPropI)),ModI,LegSetI,ProofI):-
    proposesent(t(I),SentPropI,ModI,LegSetI),
    J is I + 1,
    ground([SentPropI,ModI,LegSetI]),
    permissible(t(I),SentPropI),
    prover(demo(n(t(J)),n(demo(n(t(I)),n(SentPropI)))),
           [ModAtJ,ModI],[LegSetAtJ,LegSetI],ProofJ),
    ProofI = (sentenceof(theory(I),SentPropI):-
              proofof(theory(J),proved(theory(I),SentPropI),ProofJ)).

permissible(t(I),SentPropI) :- I = 1.
permissible(t(I),SentPropI) :- I >= 2, \+ SentPropI = (Head:-Body).
Clause [UP] encodes in MT upward reflection between two theories Ti and Tj of arbitrary adjacent levels in IT, with formal counterparts t(I) and t(J) in OT.
A sentence is assumed to belong to a theory Ti if this
accords with the rules of theory Tj of the higher adjacent level. In MT, LegSetI and ModI identify and modify
formula schemata corresponding to known fragments of
sentences of the theory Ti. The predicate proposesent/4 is
defined to specialise interactively with a user such meaningfulsent schemata. Proofl is a metaproof in MT of the
existence of a sequence of formulas in OT's formalisation of IT constituting a formal proof of the proposed
sentence.
Upward reflection must be constrained. If each sentence were upward reflected directly when proposed, the
reasoning process would ascend directly to the topmost
level since the metarule proposed for assessing the sentence would itself directly be upward reflected, etc.
Therefore, at levels i, i ~ 2, only sentence proposals
which are ground facts may be upward reflected, postponing the assessment of rules, which may only be proposed as non-ground conditional sentences, till facts are
activated by their premises.
Under this reasoning
scheme the content of all sentences involved in the reasoning process will eventually be assessed. The restriction is maintained by the permissible subgoal.
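For illustration, with the permissible/2 clauses above and schematic sentence terms, the restriction behaves as follows:

% At level 1 any sentence proposal may be upward reflected:
% ?- permissible(t(1), (legalcons(pay) :- demands(letter, price))).   succeeds
% At levels i >= 2 only non-conditional sentences (facts) may be:
% ?- permissible(t(2), (legalcons(pay) :- demands(letter, price))).   fails
% ?- permissible(t(2), intendedtomeet(sga(5), interests, law)).       succeeds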
Clause [ANDI] handles ∧-introduction. In MT a theory Ti of IT, with t(I) as formal counterpart in OT, is assumed to include a sentence which is a conjunction if both its conjuncts may be assumed included in Ti.
[ANDI]
prover(demo(n(t(I)),n(and(G1,G2))),
       [[ModG1,ModG2],ModsBelow],LegSetI,ProofI):-
    I >= 2,
    prover(demo(n(t(I)),n(G1)),[ModG1,ModsBelow],LegSetI,ProofG1),
    prover(demo(n(t(I)),n(G2)),[ModG2,ModsBelow],LegSetI,ProofG2),
    ProofI = (sentenceof(theory(I),and(G1,G2)):-
              and(proofof(theory(I),G1,ProofG1),
                  proofof(theory(I),G2,ProofG2))).
Clause [MP] encodes our version of modus ponens. In MT a theory Ti of IT, with t(I) as formal counterpart in OT, is assumed to include a sentence which is the consequence of a proposed implication of Ti whose antecedent can be assumed included in Ti.
[MP]
prover(demo(n(t(I)),n(HeadI)),ModI,LegSetI,ProofI):-
    I >= 2,
    proposesent(t(I),(HeadI:-BodyI),ModI,LegSetI),
    prover(demo(n(t(I)),n(BodyI)),ModI,LegSetI,ProofBodyI),
    ProofI = (sentenceof(theory(I),HeadI):-
              and(ruleof(theory(I),(HeadI:-BodyI)),
                  proofof(theory(I),BodyI,ProofBodyI))).
The knowledge of rules in IT for assessing sentence proposals for the adjacent lower level theory Ti will at some level j be too rudimentary for composing a theory Tj. At this level, Tj is considered to be the user's opinion of the sentences proposed for Ti. This is encoded in MT in the clause [TOP].
[TOP]
prover(demo(n(t(J)),n(demo(n(t(I)),n(RulePropI)))),
       ModJ,LegSetJ,ProofJ):-
    J >= 2,
    \+ proposesent(t(J),(demo(n(t(I)),n(RulePropI)):-BodyJ),
                   ModJ,LegSetJ),
    externalconfirmation(t(I),RulePropI,ModJ,LegSetJ),
    ProofJ = externallyconfirmed(sentenceof(theory(I),RulePropI)).
Mod1 = [[hirer/vendee,letter/vendor],unspec], call it (mod1)
RuleProp1 =
(legalcons(pay,hirer,letter,goods,price):-
    and(actor1(hirer,goods),and(actor2(letter,goods),
    and(unsettledprice(goods),and(demands(letter,price),
        reasonable(price,goods)))))), call it (ruleprop1)
Now it must be established whether it accords with
the higher adjacent level, i.e., the theory T2, to assume a primary rule with this proposed content is included in the theory T1. This is accomplished through "upward
reflection". Before a formula with content information
is upward reflected it must be checked for groundness (a
hack) and permissibleness. These are the tasks of the
third subgoal of [UP] (where (name) is shorthand for an
occurrence of the term named by name).
ground([(ruleprop1),(mod1),(legset1)])
and of the fourth subgoal of [UP]
permissible(t(1),(ruleprop1)),
which permits a conditional rule on levell to be upward
reflected. The fifth, "upward reflection" , subgoal of [UP]
Let us now partially trace the computation of a sample query
prover(demo(n(t(2)),n(demo(n(t(1)),n((ruleprop1))))),
    [ModAt2,(mod1)],[LegSetAt2,(legset1)],Proof2),
prover(demo(n(t(1)),n(RuleProp1)),Mod1,LegSet1,Proof1).
resolves with the prover clause [MP] leading to four subgoals (the first and last of which controls the index of the
current level and builds the proof term, respectively).
Now a secondary rule must be proposed for assessing
the lower level expression. The second subgoal of [MP]
is
This query could be read as "is there a metaproof Proof1 stating that the theory T1 of level 1 includes a primary rule which is represented in OT by RuleProp1 and modified by Mod1 in the legal setting LegSet1?" Since it is completely unspecified at this point what particular problem
to solve the query can be stated in these general terms
and be generated by the system. The goal resolves with
the prover clause [UP] leading to six subgoals, the last
of which builds the proof term to bind Proofl. Below,
we refrain from discussing how the proof term is built
during the computation. The first subgoal of [UP] is
proposesent(t(1),SentProp1,Mod1,LegSet1)
which through user interaction selects a legal rule and
modifies it for the current case. The unifying clause
proposesent(Theory,RuleProp,Mod,LegSet):-
    (Theory = t(1) ; RuleProp = (demo(_,_):-Body)),1
    findlegalsetting(Theory,LegSet),
    meaningfulsent(Theory,RuleProp,Mod,LegSet,Text).
identifies the relevant part of the legal domain from
which it retrieves a proposal for a rule provided it is
meaningful. The latter is sorted out by meaningfulsent
clau~es, say, the one presented above. In this clause the
propertyping condition is intended to promote that user
proposed modifications preserve the rule's meaningfulness. Suppose now that the user interaction makes the
first subgoal of [UP] return with the following ground
argument bindings, i.e., the schemata from sect. 5 Sale
of Goods Act is adapted into a primary rule proposal
regulating a case of 'hire of goods',
LegSet1 =
[[provisionno(sga(5)),
  provisioncategory('Determination of Purchase Money'),
  legalfield('Commercial Law')],unspec], call it (legset1)
1 The legal setting may be assumed unknown if either of these two conditions holds.
proposesent(t(2),(demo(n(t(1)),n((ruleprop1))):-Body2),
    (mod2),(legset2)),
where (mod2) is [ModAt2,(mod1)], (mod1) is [(modat1),unspec],
(modat1) is [hirer/vendee,letter/vendor], and (legset2) is
[LegSetAt2,(legset1)].
Suppose the user chooses the analogia legis principle. The relation between primary rules of theory T1 and secondary rules for analogia legis of theory T2 is encoded in this clause:
meaningfulsent(t(2),RuleProp2,Mod2,LegSet2,Text):-
    RuleProp2 =
      (demo(n(t(1)),n(RuleProp1)):-
         analogialegis(n(RuleProp1),n(ModAt1),LegSet1)),
    Mod2 = [_,[ModAt1,_]],
    LegSet2 = [[interpretationtheory('analogia legis')|_],LegSet1],
    Text = '"A primary rule proposal is legally valid (i.e., belongs to the theory t1 of valid primary rules) if its inclusion accords with the secondary rule for analogia legis." ...',
    propertyping(t(2),RuleProp2,[],[],Text).
The second subgoal of [MP] returns with its second argument bound to
(demo(n(t(1)),n((ruleprop1))):-
    analogialegis(n((ruleprop1)),n((modat1)),(legset1)))
and LegSetAt2 bound to [interpretationtheory('analogia legis')|_].
The third subgoal of [MP] is
prover(demo(n(t(2)),n(analogialegis(n((ruleprop1)),
    n((modat1)),(legset1)))),
    (mod2),(legset2),ProofBody2),
which recursively calls [MP]. Now a meaningful proposal
for an actual analogia legis secondary rule will, by the
second proposesent subgoal of [MP], be retrieved from this
clause
meaningfulsent(t(2),RuleProp2,_,LegSet2,Text):-
    RuleProp2 =
      (analogialegis(n((Cons:-Ante)),n(ModAt1),LegSet1):-
         and(not(casuisticalinterpretation(LegalField,
             n((not(Cons):-Ante)))),
         and(intendedfor(ProvisionNo,n(TypeCase)),
         and(substantialsimilarity(n(TypeCase),n(Ante),n(ModAt1)),
         and(intendedtomeet(ProvisionNo,Interests,LegalField),
         and(supports(ProvisionNo,n(ModAt1),ProInt,Interests),
         and(recommendrejection(ProvisionNo,n(ModAt1),ContraInt,Interests),
             outweigh(ProInt,ContraInt)))))))),
    LegSet2 = [[interpretationtheory('analogia legis')|_],LegSet1],
    LegSet1 = [[provisionno(ProvisionNo),_,legalfield(LegalField)],_],
    Text = the same text as in rule schema 3 in fig. 2.1,
    propertyping(t(2),RuleProp2,[],[],Text).
with these bindings (where (ruleprop1) is ((consrule1) :- (anterule1))):

analogialegis(n(((consrule1) :- (anterule1))), n((modat1)), (legset1)) :-
    and(not(casuisticalinterpretation('Commercial Law',
                                      n((not((consrule1)) :- (anterule1))))),
    and(intendedfor(sga(5), n(TypeCase)),
    and(substantialsimilarity(n(TypeCase), n((anterule1)), n((modat1))),
    and(intendedtomeet(sga(5), Interests, 'Commercial Law'),
    and(supports(sga(5), n((modat1)), ProInt, Interests),
    and(recommendrejection(sga(5), n((modat1)), ContraInt, Interests),
        outweigh(ProInt, ContraInt))))))).
Now it must be proved that with the proposed content the antecedent of the analogia legis rule (call it (albody)) is included in T2. The third subgoal of [MP] is

prover(demo(n(t(2)), n((albody))), _,
       [[interpretationtheory('analogia legis') | _], (legset1)], _),
and each of the conjuncts in (albody) will be demonstrated
in turn by the prover clauses [AND1], [MP], and [UP]. To
illustrate how user proposed content for a sentence is
accepted (or rejected) at higher levels let us focus on
the fourth conjunct which gives rise to the goal
prover(demo(n(t(2)),
            n(intendedtomeet(sga(5), Interests, 'Commercial Law'))),
       _, [[interpretationtheory('analogia legis') | _], (legset1)], _).
An "intended to meet" sentence must be proposed by
the user. The result may be a meaningful fact (unconditional sentence) whose inclusion in the theory T2 must
be accepted by the rules of theory T3 or it may be a
rule (conditional sentence) which is assumed included
in T2 directly after the user's acceptance. The resolving
clauses in the respective cases are [UP] and [MP]. Thus,
in the first case upward reflection occurs immediately.
In the second case upward reflection is postponed until
backward inferencing by modus ponens at the current
level leads to the proposal of a fact. Note that this
guarantees that the application of the originally proposed rule is not accepted unless all the components of
its antecedent eventually are assessed and accepted.
Suppose a fact is proposed. The goal will resolve
with the prover clause [UP], whose recursive fifth subgoal resolves with the prover clause [MP] leading to the
application of tertiary rules for assessing the proposed
(secondary) fact. Reasons of space force us to remove a
part of the trace here. The inferencing at the tertiary
level is similar to that just described for the secondary
level. We conclude this section with a fragment of the
trace in which a tertiary fact is proposed but no quaternary rules exist for assessing it. The upward reflected
goal looks like
prover(demo(n(t(4)),
            n(demo(n(t(3)),
                   n(adequatetoequalize(
                       'actors with similar economical positions',
                       'consumer protection'/'hirer protection',
                       'Commercial Law'))))),
       Mod4, LegSet4, _).
For the theory T4, proposesent fails, however, to return any quaternary rules which may assess the adequatetoequalize fact. The goal resolves with the prover clause [UP] and the user may or may not accept the content of the "adequate to equalize" rule.
Provided the rule is accepted this completes the computation of the fourth conjunct in the antecedent of the
analogia legis rule. The following three conjuncts in
the antecedent of the analogia legis rule are computed
likewise which completes the computation of the initial
query. A conclusion is not considered as final before the
line of arguments leading up to it has been considered
and accepted by the user. To this end the user needs
a comprehensible presentation of the proof term. We
illustrate elsewhere [Hamfelt and Hansson 1991b] how
derivations of goals can be entrusted to the user's acceptance or rejection by an interactive piecemeal unfolding
of a term representing the proof of the goal.
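For readers unfamiliar with the amalgamated object-level/metalevel style underlying MT, the following is a deliberately stripped-down Prolog sketch of a demo-style provability predicate in the spirit of [Bowen and Kowalski 1982]. It is not the authors' multilevel prover: the real [MP] and [UP] clauses also thread modifications, legal settings, user interaction and proof terms. The axiom/2 facts and the name rule_sga5 are invented purely for illustration.

% axiom(Theory, Clause): a hypothetical object theory at level 1.
axiom(t(1), (valid(R) :- accepted(R))).
axiom(t(1), accepted(rule_sga5)).

% demo(Theory, Goal): Goal is provable from the axioms of Theory.
demo(_Theory, true).
demo(Theory, (A, B)) :- demo(Theory, A), demo(Theory, B).
demo(Theory, Goal) :-                      % use a conditional axiom
    axiom(Theory, (Goal :- Body)),
    demo(Theory, Body).
demo(Theory, Goal) :-                      % use an unconditional axiom
    axiom(Theory, Goal),
    Goal \= (_ :- _).

% ?- demo(t(1), valid(rule_sga5)).   % succeeds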
5. Coping with Change
A program should be able to cope with changes in the
frequently revised legal knowledge it formalises. Also it
should be structure preserving ("isomorphic") modulo
this knowledge, cf. [Sergot et al. 1986]. This is a conflict,
Bratley et al. [1991] claim, since coping with changes requires modifying "implicit or explicit rules which do not
correspond directly to paragraphs in the text of law" .
Our metalogic program MT, however, is a structure preserving formalisation of legal knowledge coping with changes. The schemata give a modular, direct and easily changed description of statutory rules
and (meta ... )metarules of legal interpretation. MT is
modular both horizontally and vertically entailing that
adjustments can be made locally to the schemata for the
(higher level) rules of legal interpretation as well as to
the schemata for the ordinary (low level) statutory rules.
The level of the knowledge is identified and the appropriate adjustment made to its rule schemata, which then
control the computation of accepted rules assumed included in theories of the lower adjacent level. Also, since
MT takes as its object language the whole n-level language of OT, we can encode in the formal part of MT
rules coping with global changes which are not possible
to localize to rule schemata of a certain level. Furthermore, if the legal system has undergone an even more
drastic revision, a large part of our system will nevertheless remain intact since the structure of principles such
as analogia legis will hardly be affected. The structure
preserving model of the British Nationality Act [Sergot
et al. 1986] is according to Kowalski and Sergot [1990]
"of limited practical value" since it expresses a "layman's reading of the provision" but in our MT expert
knowledge may be incorporated e.g., for verifying the
correctness of 01', modifying and augmenting it, and
for suggesting promising ways for applying its rules.
6. Related Work
Allen and Saxon [1991] discuss, in contrast to our multiple semantic interpretations, assistance for multiple
structural interpretation of components of provisions,
such as "if", "not", "provided that", e.g., by changing
which component is taken as the main connective of a
sentence. The logical relationship between theories comprising interpretative knowledge and interpreted theories is not analysed.
Assessing and compiling persuasive lines of arguments pro and contra different, often contradictory, legal
decisions is important in legal reasoning. Proof terms
should thus be objects of discourse and be reasoned
about, which they are in MT. This is advocated also by
Bench-Capon and Sergot [1988], who do not, however,
propose a formalisation or a detailed informal theory,
such as our IT, concerning how these aspects are sorted
out in informal legal reasoning.
7. Conclusions and Further Work
Above we have proposed a novel approach for representing fragmentary, multilayered, not fully formalisable knowledge, in which the informal metatheory of the
usual formalisation approach is replaced by a semiformal
metalogic program which interactively composes formal
object theories to be accepted or rejected as formalisations of the knowledge by the user. Our representation
easily copes with changes in the represented knowledge.
Imprecise knowledge requires advanced user interaction that promotes meaningful user answers and queries,
constructs and intelligibly displays proof terms explaining derived conclusions, and makes the system pose its
questions in a natural order. These aspects have been
considered and to some extent solved in our program
[Hamfelt and Hansson 1991b].
Multiple semantic interpretation of provisions is realised by allowing the user to fill schemata with meaningful content referring to his fact situation, whereupon
the system accepts or rejects the thus proposed rule. Including multiple structural interpretations, e.g., adding
premises, should raise no real obstacles provided rules
of acceptance for such alteration can be established.
In case law, rules of legal interpretation are as important as in statute law, and apart from the difficult problem of inducing schemata from precedent cases, we hypothesize that our framework needs only minor adaptations to handle case-based reasoning.
Proof terms should, since the notion of being a persuasive line of arguments is vague, not only be displayed
for user communication but also reasoned about.
Acknowledgments
We would like to thank Keith Clark and Leon Sterling for valuable comments.
References
[Allen and Saxon 1991] L. E. Allen and S. S. Saxon.
More IA Needed in AI: Interpretation Assistance for
Coping with the Problem of Multiple Structural Interpretations. In Proc. Third Int. Conf. on Artificial
Intelligence and Law, ACM, New York, 1991. pp. 53-61.
[Bench-Capon and Sergot 1988] T. Bench-Capon and
M. Sergot. Toward a Rule-Based Representation of
Open Texture in Law. Computer Power and Legal
Language, ed. C. Walter, Quorum Books, New York,
1988. pp. 39-60.
[Bowen and Kowalski 1982] K. A. Bowen and R. A.
Kowalski. Amalgamating Language and Metalanguage in Logic Programming. Logic Programming,
eds. K. Clark and S.-A. Tarnlund, Academic Press,
London, 1982. pp. 153-72.
[Bratley et al. 1991] P. Bratley, J. Fremont, E. Mackaay
and D. Poulin. Coping with Change. In Proc. Third
Int. Conf. on Artificial Intelligence and Law, ACM,
New York, 1991. pp. 69-75.
[Costantini 1990] S. Costantini. Semantics of a Metalogic
Programming Language. In Proc. Second Workshop
on Metaprogramming in Logic, ed. M. Bruynooghe,
Katholieke Universiteit Leuven, 1990. pp. 3-18.
[Hamfelt 1990] A. Hamfelt. The Multilevel Structure
of Legal Knowledge and its Representation, Uppsala
Theses in Computing Science 8/90, Uppsala University, Uppsala, 1990.
[Hamfelt and Barklund 1990] A. Hamfelt, J. Barklund.
Metaprogramming for Representation of Legal Principles. In Proc. Second Workshop on Metaprogramming in Logic, ed. M. Bruynooghe, Katholieke Universiteit Leuven, 1990. pp. 105-22.
[Hamfelt and Hansson 1991a] A. Hamfelt, A. Hansson.
Metalogic Representation of Stratified Knowledge.
UPMAIL TR 66, Comp. Sci. Dept., Uppsala University, Uppsala, 1991.
[Hamfelt and Hansson 1991b] A. Hamfelt, A. Hansson. Representation of Fragmentary and Multilayered Knowledge - A Semiformal Metatheory as an Interactive Metalogic Program. UPMAIL TR 68, Comp. Sci. Dept., Uppsala University, Uppsala, 1991.
[Horovitz 1972] J. Horovitz. Law and Logic. Springer-Verlag, Vienna, 1972.
[Kleene 1980] S. C. Kleene, Introduction to Metamathematics. North Holland, New York, 1980.
[Kowalski 1990] R. A. Kowalski. Problems and Promises
of Computational Logic. Computational Logic, ed.
J. W. Lloyd, Springer-Verlag, Berlin, 1990. pp. 1-36.
[Kowalski and Sergot 1990] R. A. Kowalski, M. J. Sergot.
The Use of Logical Models in Legal Problem Solving.
Ratio Juris, Vol. 3, No.2 (1990), pp. 201-18.
[Sergot et al. 1986] M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, H. T. Cory. The
British Nationality Act as a Logic Program. Comm.
ACM 29, (May 1986), pp. 370-86.
[Sergot 1983] M. J. Sergot. A Query-the-User Facility
for Logic Programming. Integrated Interactive Computer Systems, eds. P. Degano and E. Sandewall,
North-Holland, Amsterdam, 1983. pp. 27-41.
HELIC-II:
A Legal Reasoning System on the Parallel Inference Machine
Katsumi Nitta (1)    Masayuki Ono (1)    Yoshihisa Ohtake (1)
Hiroshi Ohsaki (2)   Shigeru Maeda (1)   Kiyokazu Sakane (3)

(1) Institute for New Generation Computer Technology
    4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
(2) Japan Information Processing Development Center
(3) Nippon Steel Corporation

nitta@icot.or.jp
Abstract
This paper presents HELIC-II, a legal reasoning system
on the parallel inference machine. HELIC-II draws legal
conclusions for a given case by referring to a statutory
law (legal rules) and judicial precedents (old cases). This
system consists of two inference engines. The rule-based
engine draws legal consequences logically by using legal
rules. The case-based engine generates legal concepts by
referencing similar old cases. These engines complementally draw all possible conclusions, and output them in
the form of inference trees. Users can use these trees as
material to construct arguments in a legal suit.
HELIC-II is implemented on the parallel inference machine, and it can draw conclusions quickly by parallel
inference.
As an example, a legal inference system for the Penal Code is introduced, and the effectiveness of the legal
reasoning and parallel inference model is shown.
1 Introduction
The primary knowledge source of a legal inference system
is a statutory law. A statutory law is a set of legal rules.
As legal rules are given as logical sentences, they are
easily represented as logical formulae. Therefore, if a
new case is described using the same predicates as those
appearing in legal rules, we can draw legal conclusions
by deductive reasoning.
However, legal rules often contain legal predicates (legal concepts) such as "public welfare" and "in good
faith" . Some legal concepts are ambiguous and their
strict meanings are not fixed until the rules are applied
to actual facts. Predicates which are used to represent
actual facts do not contain such legal concepts. As there
are no rules to define sufficient conditions for legal predicates, in order to apply legal rules to actual facts, interpreting rules and matching between legal concepts and
facts are needed. To realize this, precedents (old cases)
are often referenced because they contain the arguments
of both sides (plaintiff vs. defendant or prosecutor vs.
defendant) and the judge's opinions concerning interpretation and matching.
Consequently, legal reasoning can be modeled as
a combination of logical inference using legal rules
and case-based reasoning using old cases. Based on
this model, several hybrid legal inference systems consisting of two inference engines have been developed
[Rissland et al. 1989] [Sanders 1991(a)]. However, as
practical legal systems contain many legal rules and old
cases, it takes a long time to draw conclusions. Moreover, controlling two engines often requires a complex
mechanism.
ICOT (Institute for New Generation Computer Technology) has developed parallel inference machines (Multi
PSI and PIMs) [Uchida et al. 1988],[Goto et al. 1988].
These are MIMD-type computers, and users' programs written in the parallel logic programming language KL1
[Chikayama et al. 1988] are executed in parallel on
them.
HELIC-II (Hypothetical Explanation constructor by Legal Inference with Cases by 2 inference engines)
is a legal inference system based on the hybrid model. It
has been developed on the parallel inference machine,
and draws legal conclusions for a given case by quickly
referencing statutory law and old cases.
In Section Two, we introduce the function and architecture of HELIC-II. In Section Three, we explain legal
knowledge representation. In Section Four, we explain
the reasoning mechanism of HELIC-II. In Section Five,
a legal inference system of the Penal Code is explained.
2 Overview of HELIC-II
The function of HELIC-II is to generate all possible legal
conclusions for a given case by referring to legal rules
and old cases. These conclusions are represented in the
form of inference trees which include final conclusions
and explanations of them.
HELIC-II consists of two inference engines - the rule-based engine and the case-based engine - and three knowledge sources - a rule base, a case base and a dictionary of concepts (see Fig. 1). The rule-based engine refers to legal rules and draws legal consequences logically. The case-based engine generates abstract predicates (legal concepts) from concrete predicates (given facts) by referring to similar old cases.

HELIC-II draws legal consequences using these two engines. Since the reasoning of these engines is data-driven, there are no special control mechanisms to manage them. A typical pattern of reasoning by HELIC-II is as follows. When a new case (original facts) is given to HELIC-II, the case-based engine initially searches for similar old cases and generates legal concepts which may hold in the new case. These concepts are passed to the rule-based engine by way of working memory (WM). Then, the rule-based engine draws legal consequences using original facts and legal concepts.
These results are gathered by an explanation constructor, which then produces inference trees.
3 Knowledge Representation
In this section, we will explain the representation of legal
knowledge in HELIC-II. We will show how to represent
legal rules, old cases and legal concepts.
3.1 Representation of Legal Rules
A statutory law consists of legal rules. Each legal rule is
represented as follows.
RuleN ame( Comment, Rulelnfo,
[A I ,A2 , ••• ,Ai ]-+ [[BI, .. ,Bk],[CI, .. ,C1],
•• ]).
In this clause, RuleName is the rule identification,
Comment is a comment for users and Rulelnfo is adThe.
ditional information such as article number.
LHS ([All A 2 , ••• , Ai]) is the condition part, and the
RHS([[BI , .. , B k ], [CI , .. , Cd, .. ]) is the consequence part.
[BI, .. , B k ] and [CI, .. , Cd are combined disjunctively.
Each literal of the LHS and RHS is an extended predicate or its negation (denoted by "~" or "not"). An extended predicate consists of a predicate (concept), an object identifier and a list of attribute = value pairs. The following is an example of an extended predicate. An object "drive1" is an instance of a concept "drive". Two attribute = value pairs (agent = tom and car = toyota1) are defined.

drive(drive1, [agent = tom, car = toyota1]).

Internally, this extended predicate is treated as a set of triplets {object, attribute, value} as follows.

{drive1, agent, tom}
{drive1, car, toyota1}
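As an illustration of this flattening step, here is a minimal Prolog sketch; the predicate extended_to_triplets/2 and the triple/3 wrapper are our own names, not part of HELIC-II.

% Flatten an extended predicate Concept(Object, [Attr = Value, ...])
% into {Object, Attr, Value} triplets, represented here as triple/3 terms.
extended_to_triplets(ExtPred, Triplets) :-
    ExtPred =.. [_Concept, Object, Pairs],
    findall(triple(Object, Attr, Value),
            member(Attr = Value, Pairs),
            Triplets).

% ?- extended_to_triplets(drive(drive1, [agent = tom, car = toyota1]), T).
% T = [triple(drive1, agent, tom), triple(drive1, car, toyota1)].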
In a clause, we can use "not" (negation as failure) in addition to "~" (logical not). By introducing "not", nonmonotonic reasoning is realized, and exceptional rules and presumed facts are easily represented [Sartor 1991].
The following are examples of legal rules.
Figure 1: The architecture of HELIC-II
homicideOI("example", [article = 199],
[person(A) , person(B),
action(Action, [agent = A]),
intention (Intention, [agent = A, action = Action,
goal = Result]),
death(Result, [agent = BJ),
caused(Caused, [event = Action,effect = Result2]),
death(Result2, [agent = B]),
riot( '" illegality( Illegal, [agent = A,
1117
action
= Action, result = Result2])]
[[crimeO f H omicide( Crime, [agent = A,
action = Action, result = Result2])]]).
legality01("example", [article = 38],
[action(Action, [agent = A]),
intention (Intention, [agent = A, action = Action,
goal = Result]),
selfDefence(Result, [object = Action]),
caused( Caused, [event = Action, effect = Result2])]
[["" illegality(Illegal, [agent = A,
action = Action, result = Result])]]).
The first rule is a definition of the crime of homicide, which is given by the Penal Code. The meaning of "not(~illegality(Illegal, [...]))" is that illegality is presumed; in other words, if there is no proof that "~illegality(Illegal, [...])" holds, then "not(~illegality(Illegal, [...]))" is true. The second rule is an exception to the first rule. If a person did some action in self-defence, "illegality(Illegal, [...])" is refuted.

3.2 Representation of Cases

A judicial precedent consists of the arguments of both sides, the opinion of the judges and a final conclusion. We represent a precedent (an old case) as a situation and some case rules, and represent a new case as a situation.

(1) Situation

A situation consists of a set of events/objects and their temporal relations. An event and an object are represented as an extended predicate as introduced in the previous section. The temporal relations are represented as follows.

problem(CaseID, Comment, TemporalRelations).

CaseID is the case identification, Comment is a comment for users and TemporalRelations is a list of relations between events. To represent temporal relations between events/objects, we use Allen's interval notation such as "before", "meets", "starts", and so on [Allen 1984].

The following is an example of a situation.

problem(trafficAccident112, "example",
    [before(dinner1, drive1), during(accident1, drive1)]).
dinner(dinner1, [agent = john, place = maxim's]).
drive(drive1, [agent = john, car = toyota1]).
accident(accident1, [agent = john]).
person(john, [sex = male]).
person(mary, [sex = female]).
restaurant(maxim's, [rank = 5stars]).
car(toyota1, [type = sportsCar]).

The meaning of this example is that the case "trafficAccident112" consists of three events, "dinner1", "drive1" and "accident1". "Dinner1" occurred before "drive1", and "accident1" happened during "drive1". The event "dinner1" is a lower concept of "dinner", and it is acted by "john" in "maxim's", etc.

(2) Case Rules
Arguments by both sides are represented as a set of
case rules. The following is the syntax of a case rule.
RuleName(Comment, RuleInfo,
    [A1, A2, ..., Ai] → [B1, B2, ..., Bk]).

RuleName is the rule identification, Comment is a comment for users and RuleInfo is additional information such as a related article, an index to the opposing side's case rules, the relation to the judge's decision and so on. The LHS ([A1, A2, ..., Ai]) is the context of the opinion, and the RHS ([B1, B2, ..., Bk]) is the conclusion insisted on by one side.
The following is an example of a case rule.
rule001("example" ,
[ article = 218, insisted = prosecutor,
result = lost],
[ drive(drive1, [agent = john/important,
object = toyota1/trivial]),
person(john, [sex = male/trivial]),
person(mary, [sex = female/trivial]),
accident(accident1, [agent = john/important]),
caused( caused1, [event = accident1/important,
effect = injury1/important]),
injury(injury1, [agent = mary/trivial])]
[ responsibility(resp1, [agent = john,
object = ken, reason = accident1])]).
The meaning of this case rule is: "In the case that a traffic accident caused by John injured Mary, John had a responsibility of care to Mary." This rule concerns article 218 of the Penal Code and was insisted on by the prosecutor, but the judge didn't employ this rule. On the LHS, "effect = injury1" is an important fact from the legal point of view. Therefore, this fact is marked as "important". We can use "exact", "important" and "trivial" to represent levels of importance. This information is used to calculate the similarity between two situations.
Arguments in a case are sequences of case rules. As
both sides try to draw contradictory conclusions, an old
case contains case rules whose conclusions are inconsistent.
3.3 Representation of Concepts
All concepts in legal rules and cases must be contained
in the dictionary. In other words, each event and object
in a situation are instances of these concepts.
In the dictionary, a super concept, a concept and a list
of attributes are defined as follows.
object(creature, []).
creature(person, [age, sex]).
person(person, []).
person(infant, []).
creature(lion, []).
action(drive, [agent, car, destination]).

The similarity between concepts is defined by the distance in the hierarchy (see Fig. 2). For example, "baby" is closer to "infant" than to "lion" because it requires two steps for "baby" to reach "infant" but three steps to reach "lion" in this hierarchy.
Figure 2: Hierarchy of concepts
4 Reasoning by HELIC-II

In this section, we will explain the reasoning mechanisms of the rule-based engine and the case-based engine. These engines are implemented in the parallel logic programming language KL1 and run on the parallel inference machine.

4.1 A Rule-based Engine

The function of the rule-based engine is to draw all legal consequences by the forward reasoning of legal rules, using original data (a new case) and results from the case-based engine.

The rule-based engine is based on the parallel theorem prover MGTP (Model Generation Theorem Prover) [Fujita et al. 1991] developed by ICOT. MGTP solves range-restricted non-Horn problems by generating models. For example, let's take the following clauses.

C1: true → p(a); q(b).
C2: p(X) → q(X); r(X).
C3: r(X) → s(X).
C4: q(X) → false.
Figure 3: MGTP proof tree
MGTP calculates models which satisfy these clauses as follows (see Fig. 3). The proof starts with the null model M0 = {}. By applying C1, M0 is extended into M1 = {p(a)} and M2 = {q(b)}. Then, by applying C2, M1 is extended into M3 = {p(a), q(a)} and M4 = {p(a), r(a)}. Using C4, M3 and M2 are discarded. By C3, M4 is extended to M5 = {p(a), r(a), s(a)}. M5 is a model which satisfies all clauses.
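To make the model-generation procedure concrete, here is a naive sequential sketch in plain Prolog of clauses C1-C4 and the search just described. It only illustrates the principle; ICOT's MGTP is a compiled, parallel KL1 system, and the clause_/3 encoding below is ours.

% clause_(Name, Body, Disjuncts): if every literal in Body is in the model,
% one alternative in Disjuncts must be added; an empty Disjuncts list
% encodes "false" and rejects the model.
clause_(c1, [],     [[p(a)], [q(b)]]).
clause_(c2, [p(X)], [[q(X)], [r(X)]]).
clause_(c3, [r(X)], [[s(X)]]).
clause_(c4, [q(X)], []).

model(Model) :- extend([], Model).

extend(M, Model) :-
    (   violated(M, Alternatives)
    ->  member(Ext, Alternatives),        % pick one way to repair M
        append(M, Ext, M1),
        extend(M1, Model)
    ;   Model = M                         % nothing violated: M is a model
    ).

% A clause instance is violated if its body holds in M but no disjunct does.
violated(M, Alternatives) :-
    clause_(_, Body, Alternatives),
    holds_all(Body, M),
    \+ (member(D, Alternatives), holds_all(D, M)).

holds_all([], _).
holds_all([L|Ls], M) :- member(L, M), holds_all(Ls, M).

% ?- model(M).
% M = [p(a), r(a), s(a)].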
In MGTP, each clause is compiled into a KL1 clause,
and each KL1 clause is applied in parallel on the parallel
inference machine. In problems in which the proof tree has many branches, parallel inference performance becomes high.
To use MGTP as a rule-based engine of HELIC-II, we
extended the original MGTP as follows.
1. Realization of "not (negation as failure)": We
made MGTP able to treat "negation as failure"
based on [Inoue et aI. 1991]. For example, the following C is treated as C', and the model is extended
in two ways (see FigA). Here, "k" is a modal operator, and "k(r(X»" means that the model is believed
to contain a datum which will satisfy reX) in the
future.
C:  not(r(X)) → s(X).
C': dom(X) → k(r(X)) ; ~k(r(X)), s(X).

Figure 4: Negation as failure of MGTP

After MGTP generates models which satisfy all clauses, the rule-based engine examines each of them. For example, if a model contains both ~k(r(a)) and r(a), or if a model contains k(r(a)) and doesn't contain r(a), the model is discarded.

Figure 5: Splitting a model
2. Realization of the multiple context: The rule-based engine uses both original facts (a new case) and results from the case-based engine as the initial model. The case-based engine may generate data which conflict with each other, such as "q(b)" and "~q(b)". Therefore, before reasoning, the rule-based engine has to split the initial model into several ones so that each model doesn't contain any conflicts (see Fig. 5; a toy sketch of such splitting is given at the end of this subsection).

However, the case-based engine has not generated all its results when the rule-based engine begins to reason, because the reasoning of both engines is data-driven. To obtain the pipeline effect, we developed a function to register predicates which may cause conflicts, and to split the model when such predicates reach the rule-based engine. For example, in Fig. 5, if ~q(b) reaches the rule-based engine, the model is split before q(b) is reached. We implemented this mechanism by using a modal operator similar to the "k" operator.
3. Keeping justification: To construct inference trees, the rule-based engine must keep the justifications for each consequence. A justification consists
of a rule name and data which matches the LHS of
the rule.
4. Temporal reasoning: We prepared a small rule
set of temporal reasoning [Allen 1984] to help in describing the temporal relation. The following are
example rules.
before(A, B), before(B, C) → before(A, C).
meets(A, B), overlaps(C, B) →
    overlaps(A, C); during(A, C); starts(A, C).
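As a small illustration, the deterministic transitivity rule can also be read as ordinary backward-chaining Prolog; the sketch below is ours (HELIC-II applies such rules forward in MGTP), and the event "leave1" is an invented example fact.

% fact_before(A, B): recorded interval facts (the second one is assumed).
fact_before(dinner1, drive1).
fact_before(drive1, leave1).

% before/2 closes fact_before/2 under transitivity.
before(A, B) :- fact_before(A, B).
before(A, C) :- fact_before(A, B), before(B, C).

% ?- before(dinner1, leave1).    % succeeds via dinner1 -> drive1 -> leave1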
With these extensions, the rule-based engine has many proof-tree branches even if clauses don't have disjunctions such as C1 and C2 in Fig. 3. Therefore, the rule-based engine has a lot of parallelism in its reasoning.
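The following toy Prolog sketch illustrates the model-splitting idea mentioned in extension 2 above; it is our own simplification (conflicts are detected syntactically between F and neg(F)) and not HELIC-II's k-operator mechanism.

% conflicting(F, G): F and its negation neg(F) conflict.
conflicting(F, neg(F)).
conflicting(neg(F), F).

% split(Facts, Model): Model is one conflict-free selection from Facts.
split(Facts, Model) :- split_(Facts, [], Model).

split_([], Acc, Acc).
split_([F|Fs], Acc, Model) :-                 % keep F if it does not clash
    \+ (member(G, Acc), conflicting(F, G)),
    split_(Fs, [F|Acc], Model).
split_([F|Fs], Acc, Model) :-                 % drop F only if it clashes
    ( member(G, Acc) ; member(G, Fs) ),
    conflicting(F, G), !,
    split_(Fs, Acc, Model).

% ?- split([q(b), neg(q(b)), p(a)], M).
% M = [p(a), q(b)] ;
% M = [p(a), neg(q(b))].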
4.2 A Case-based Engine
The function of the case-based engine is to generate legal
concepts by using similar old cases. The reasoning of the
case-based engine consists of two stages (see Fig.6).
Figure 6: Reasoning by the case-based engine
1. Searching similar cases:
The role of the first stage is to search for similar
cases from the case base. At first, the case-based
engine constructs a sequence of events for each case.
As the situations of the new case and old cases are
described as a set of events/objects and their temporal relations, it is easy to construct a sequence of
events for each situation.
Then, the case-based engine tries to extract common subsequences from event sequences of the new
case and each old case. For example, let's take the
following two sequences.
S1: [..., meets(strike1, injury1), during(runAway1, injury1), ...]
S2: [..., before(kick2, sneak2), ...]
In this example, the temporal relation between "strike1" and "runAway1" is the same as that of "kick2" and "sneak2". Furthermore, "strike1" and "kick2" have a common upper concept "violence", and "runAway1" and "sneak2" have a common upper concept "escape" in the dictionary. Therefore, we regard [strike1, runAway1] and [kick2, sneak2] as mapped subsequences of S1 and S2 (see Fig. 7).

Figure 7: Subsequence of events

The similarity between two cases is evaluated by the length of the longest mapped subsequence. Several cases whose similarities are beyond a threshold are selected in the first stage.

2. Applying case rules:
The role of the second stage is to apply the case
rules of selected cases as follows [Branting 1989].
At first, the similarity between the LHS of a case
rule and a new case is evaluated. For example, let's
take "rule001" in section 3.2 and the following new
case.
person(bill, []).
baby(jane, []).
cycle(cycle2, [agent = bill, object = honda2]).
collision(collision2, [agent = bill]).
sprain(sprain2, [agent = jane]).
intention(intention2, [goal = injury2]).
injury(injury2, [agent = jane]).
The engine tries to map the LHS of "rule001" to the new case. As the following pairs of events/objects have common upper concepts in the dictionary, we map these pairs (see Fig. 8).
john      ↔  bill
mary      ↔  jane
drive1    ↔  cycle2
toyota1   ↔  honda2
accident1 ↔  collision2
injury1   ↔  sprain2
caused1   ↔  caused2

Figure 8: Mapping networks
The similarity is evaluated by counting the number of mapped links in Fig. 8. As we explained in section 3.2, an annotation (exact, important, trivial) is attached to each link in the network. These annotations and the distances between concepts are used as weights to evaluate similarities. Even if some conditions of a case rule are not satisfied, but the important conditions are satisfied, the LHS may be judged as similar to the new case. For example, in Fig. 8, though there is no node which can be mapped to "negligence1", "rule001" may be selected as similar.
Next, the case-based engine selects case rules whose
LHSes are similar to the new case, and executes their
RHSes.
The matching and executing case rules are repeated
until there are no case rules left to be fired.
On the parallel inference machine, each stage is executed in parallel. In the first stage, before searching,
cases are distributed to processors (PEs) of the parallel
inference machine, and then a new case is sent to each
PE. Each PE evaluates similarities between the new case
and old cases, and selects similar ones.
Figure 9: Rete-like networks of KL1 processes
In the second stage, case rules are distributed to PEs,
and the LHS of each case rule is compiled into a Rete-like network of KL1 processes (see Fig. 9). Then, triplets
({object,attribute,value}) which are facts of the new
case are distributed to each PE as tokens. To realize
matching based on similarity, each one-input node refers
to the dictionary of concepts, and each two-input node
not only examines the consistency of pairs of tokens but also evaluates their similarities with the LHS.
5 A legal reasoning system for the Penal Code
We developed an experimental legal reasoning system for
the Penal Code.
In the Penal Code, general provisions and definitions
of crimes are given as legal rules. Though they seem to be
strictly defined, the existence of criminal intention and
causality between one's action and its result often becomes the most difficult issue in the court. The concept
of causality in the legal domain is similar to the concept
of responsibility and is different from physical causality.
Therefore, to judge the existence of causality, we have to
take into account various things such as social, political
and medical aspects.
We show the function of the reasoning system of the
Penal Code using Mary's case. We selected this case
from the qualification examination for lawyers in Japan.
Mary's Case:
On a cold winter's day, Mary abandoned her
son Tom on the street because she was very
poor. Tom was just 4 months old. Jim found
Tom crying on the street and started to drive
Tom by car to the police station. However, Jim
caused an accident on the way to the police.
Tom was injured. Jim thought that Tom had
died of the accident and left Tom on the street.
Tom froze to death.
The problem is to decide the crimes of Mary and Jim.
The hard issues of this case are the following.
1. Causality between Mary's action and Tom's
death:
If Mary hadn't abandoned Tom, Tom wouldn't have
died. Moreover, the reason for his death wasn't injury but freezing. Therefore, some lawyers will judge
the existence of causality and insist she should be
punished for the crime of "abandonment by person
responsible resulting in death". On the other hand,
other lawyers will deny any causality because causality was interrupted by Jim's action.
2. Causality between Jim's action and Tom's
death:
Jim did several actions such as "pick up", "drive",
"cause accident" and "leave Tom". Among them,
"cause accident" will be punished by the crime of
"injury by negligence in the performance of work" ,
and "leave Tom" will be punished by the crime of
"d eath by negligence". Moreover, if there is causality
between "cause accident" and Tom's death, Jim will
be punished by the crime of "death by negligence
in the performance of work" which is very grave.
As the main reason of Tom's death is freezing, it is
difficult to judge the causality.
Though the Penal Code has no definite rule for the
causality, lawyers can get hints from old cases. For example, let's take Jane's case which was handled by the
Supreme Court in Japan.
Jane's Case:
Jane strangled Dick to kill him. Though Dick
only lost consciousness, Jane thought he was
dead. Then, she took him to the seashore, and
left him there. He inhaled sand and suffocated
to death.
In the court, there were arguments between the prosecutor and Jane. The prosecutor insisted Jane should be
punished by the crime of homicide because of the following reasons.
PI: "Strangling" and "taking to the seashore" should be
considered the one action of performing the homicide. Therefore, it is evident that there was an intention to kill Dick and causality between her action
and Dick's death.
P2: There is causality between "strangling" and "Dick's
death" even though "strangling" wasn't the main
reason for his death.
On the contrary, Jane insisted her actions didn't satisfy the condition of the crime of homicide because of the
following reason.
J1: "Strangling" should be punished be the crime of
"attempted homicide, and "taking to the seashore"
should be punished by the crime of "manslaughter caused by negligence" because there isn't causality between strangling and Dick's death, and there
wasn't an intention to kill him when taking him to
the seashore.
We represent Mary's situation and Jane's case rule as
follows.
Mary's situation
problem("mary's case", "example", ..... ).
abandon(aba1, [agent = mary, object = tom]).
pickup(pic2, [agent = jim, object = tom]).
trafficAccident(acc1, [agent = jim]).
Jane's opinion
rule002("Jane's case",
[article = 218, insisted = defendant,
result = lost],
[ suf focate(suf1, [agent = jane/trivial,
object = dick/trivial]),
intention(int1, [agent = jane/trivial,
object = act1/important,
goal = deathl/important]),
death(deathl, [agent = dick/trivial]),
1123
caused(causedl, [event = aetl/important,
ef feet = lostl/important]),
Speedup
Time(sec.)
500~--------------------------------T20
rv
caused(causedl, [event
ef feet = death3])]).
= actl,
400 ,
_///~--..--.-..-.--.-.----.-..•
300
The case-based engine of HELIC-II generated "caused(ID, [event = acc1, effect = death9])" by applying rule002.

In Mary's case, HELIC-II generated 12 inference trees. Some of them are based on the prosecutor's opinion and others are based on the defendant's opinion. The root of each tree is a possible crime, such as abandonment by a person responsible resulting in death, manslaughter caused by negligence, etc. The leaves are the initial data of the new case, and intermediate nodes are consequences of case rules or legal rules (see Fig. 10).
"rv
~ ~~:~~~.)
,//'
200
10
,/
,;
.:/
100
t
..;.
,
o+o----1-o--~20-----30--~4-0----5-0~--6-0----+700
Number of processors
Figure 11: Performance of stage 1 of the case-based engine
Speedup
TIme(sec.)
1200r-~~----------------------------T60
1000
50
800
40
600
Time(sec.)
Speedup
~
400
. . . . . . . . . f[e
4
200
au
I
......
30
20
/
10
.t••••...-
0+0~--1~0--~2~0---3~0----4~0----5~0----6~0---4700
Number of processors
Figure 12: Performance of stage 2 of the case-based engine
Figure 10: An Inference Tree
We measured the calculation time to draw a conclusion
for Mary's case on the experimental parallel inference
machine Multi-PSI. The number of rules used was about
20 and the number of cases used was about 30.
Figures 11 and 12 show the performance of the case-based engine, and Fig. 13 shows the performance of the rule-based engine. These graphs show the effectiveness of the parallel inference.
Figure 11: Performance of stage 1 of the case-based engine

Figure 12: Performance of stage 2 of the case-based engine

Figure 13: Performance of the rule-based engine

6 Conclusion

We introduced the parallel legal reasoning system HELIC-II. The advantages of HELIC-II are as follows.
1. The hybrid architecture of HELIC-II is appropriate for realizing legal reasoning. As the reasoning of both engines is data-driven, controlling these engines is easier.

2. The knowledge representation and inference mechanisms of HELIC-II are simple but convenient for representing legal rules and old cases.
3. By parallel inference, HELIC-II draws conclusions
quickly. As the rule base and the case base of the
legal domain are very large, quick searching and
quick reasoning are important to develop practical
systems.
4. Though it is troublesome to represent cases in detail, the rules of temporal reasoning help to describe
cases.
There are many tasks for extending HELIC-II. The
following are examples.
• Though the case-based engine focuses on the similarity between two cases, we have to develop a mechanism to contrast two cases [Rissland et al. 1987], [Rissland et al. 1989]. By comparing two inference trees, it is possible to construct a debate system.
• To describe legal rules in detail, we have to integrate
an extended logic system such as the logic of belief
and knowledge with temporal logic on MGTP.
• To improve the power of the similarity based matching of the case-based engine, we have to introduce a
derivational analogy mechanism.
• As inference trees are not suitable for allowing lawyers to understand the inference steps, they should be represented in natural language.
References
[Uchida et al. 1988] Shunichi Uchida et al. Research and Development of the Parallel Inference System in the Intermediate Stage of the FGCS Project. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 16-36.
[Goto et al. 1988] Atsuhiro Goto et al. Overview of the Parallel Inference Machine Architecture. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 208-229.
[Chikayama et al. 1988] Takashi Chikayama et al. Overview of the Parallel Inference Machine Operating System (PIMOS). In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 230-251.
[Nitta et al. 1991] K. Nitta et al. Experimental Legal Reasoning System on Parallel Inference Machine. In Proc. PPAI Workshop of 12th IJCAI, Sydney, Australia, 1991. pp. 139-145.
[Rissland et al. 1987] E. L. Rissland et al. A Case-Based System for Trade Secrets Law. In Proc. Int. Conf. on Artificial Intelligence and Law, Boston, USA, 1987. pp. 60-66.
[Rissland et al. 1989] E. L. Rissland et al. Interpreting Statutory Predicates. In Proc. Int. Conf. on Artificial Intelligence and Law, Vancouver, Canada, 1989. pp. 46-53.
[Sartor 1991] G. Sartor. The Structure of Norm Conditions and Nonmonotonic Reasoning in Law. In Proc. Int. Conf. on Artificial Intelligence and Law, Oxford, UK, 1991. pp. 155-164.
[Branting 1989] L. K. Branting. Representing and Reusing Explanations of Legal Precedents. In Proc. Int. Conf. on Artificial Intelligence and Law, Vancouver, Canada, 1989. pp. 103-110.
[Sanders 1991(a)] K. Sanders. Representing and Reasoning about Open-Textured Predicates. In Proc. Int. Conf. on Artificial Intelligence and Law, Oxford, UK, 1991. pp. 137-144.
[Sanders 1991(b)] K. Sanders. Planning in an Open-Textured Domain: A Thesis Proposal. Technical Report CS-91-08, Brown University, 1991.
[Fujita et al. 1991] H. Fujita et al. A Model Generation Theorem Prover in KL1 Using a Ramified-Stack Algorithm. ICOT TR-606, 1991.
[Inoue et al. 1991] K. Inoue et al. Embedding Negation as Failure into a Model Generation Theorem Prover. ICOT TR-722, 1991.
[Allen 1984] J. F. Allen. Towards a General Theory of Action and Time. Artificial Intelligence, Vol. 23, No. 2 (1984), pp. 123-154.
Chart Parsers as Proof Procedures for
Fixed-Mode Logic Programs
David A. Rosenblueth
IIMAS, UNAM
Apdo. 20-726, 01000 Mexico D.F.
drosenbl@unamvm1.bitnet
Abstract
Logic programs resemble context-free grammars. Moreover, Prolog's proof procedure can be viewed as a generalization of a simple top-down parser with backtracking.
Just as there are parsers with advantages over that simple one, it may be desirable to develop other proof procedures for logic programs than the one used by Prolog.
The similarity between definite clauses and productions
suggests looking at parsing to develop such procedures.
We show that for an important class of logic programs
(fixed-mode logic programs with ground data structures)
the conversion of parsers into proof procedures can be
straightforward. This allows for proof procedures that
construct refutations that Prolog does not find and opens
up opportunities for parallelism.
1 Introduction
A logic program consists of clauses that look like the productions of a context-free grammar. This suggests connections between proof procedures and parsers. In fact,
Prolog's proof procedure can be regarded as a generalization of a simple parser with backtracking. Although this
language has found numerous applications, its execution
mechanism has several disadvantages. For instance, if
such a mechanism finds an infinite branch of the derivation tree, it enters a nonterminating loop. Thus, it may
be desirable to develop new proof procedures for logic
programs.
Simple parsers with backtracking also enter nonterminating loops easily. This has motivated the design of
other more sophisticated parsing methods. In contrast
with proof procedures for logic programs, there already
exists a great variety of parsers. The resemblance between definite clauses and productions suggests looking
at parsers to develop new proof procedures.
Pereira and Warren [1983] have adapted Earley's
[1970] parsing algorithm, but the result is inefficient compared with Prolog. It uses subsumption, which is NP-complete [Garey and Johnson 1979]. We show that by
considering a restricted class of logic programs, parsers
can be readily adapted to proof procedures. This class is
important: it consists of fixed-mode logic programs with
ground data structures. Moreover, our proof procedures
do not use subsumption and may be more efficient than
Pereira and Warren's.
Compositional programs. By using difference lists
to represent strings, a logic program can be restricted to
coincide with the productions of a context-free grammar.
Hence, for this class of logic programs, parsers are proof
procedures. Such a class, however, only has the expressive power of context-free grammars. Assuming that we
are interested in having a programming language, this
suggests generalizing such programs without losing the
close similarity with grammars. We do so by allowing the
body of clauses to denote the composition of arbitrary
binary relations; we call such programs "compositional."
Prolog programs are not normally written in compositional form. Thus, we consider programs in a larger class
(fixed-mode programs with ground data structures) and
transform [Rosenblueth 1991] them into compositional
form.
Fixed-mode programs. A "mode" for a subgoal is
the subset of arguments that are variables at the time
the subgoal is selected. Thus, the mode depends on the
derivation tree for a program and a query. When we refer
to a "fixed-mode logic program," we actually mean a program and a query such that with Prolog's computation
rule all subgoals with the same predicate symbol have
the same mode. By further restricting these programs
to have "ground data structures," we require all arguments in a subgoal that are not variables to be ground
terms when the subgoal is selected. This class of program is important because it includes many programs
occurring in practice.
At first glance, it seems that the presence of difference lists causes a program to have data structures with
variables. However, by separating both components of a
difference list it is possible to write some programs using
difference lists as programs with ground data structures.
(The usual quicksort program is such an example; the
sorted list is then built backwards.)
Overview of the paper. The rest of this paper is organized as follows. Section 2 reviews chart parsers. Section 3 shows that such parsers are also correct for compositional programs. Section 4 deals with a method for
converting fixed-mode to compositional programs, thus
making chart parsers proof procedures for the former
class of programs. Section 5 compares these procedures
with Pereira and Warren's. Section 6 concludes this paper with some remarks.
2 Chart parsers
Charts. Chart parsers [Gazdar and Mellish 1989] are
methods for parsing strings of context-free languages
that can be regarded as a generalization of Earley's algorithm. A chart is a set of "partially" applied productions,
usually called edges. Each edge contains, in addition to
the part of a production to be applied and the left-hand
side of that production, two pointers to symbols of the
string being parsed. The substring between these pointers corresponds to the part of that production that has
already been applied.
It is useful to classify edges into those that have not
been applied at all: empty active edges, those that have
already been applied completely: passive edges, and all
the others: nonempty active edges.
The fundamental rule. New edges are created according to the following rule, often called the fundamental rule.
If a chart contains:
1. an active edge (either empty or nonempty) from
point a to point b in which the next symbol to be
applied is Q, and
2. a passive edge with left-hand side Q, from point b
to point c,
then create a new edge from a to c in which the production is the same as the one in the active edge, but with
Q applied.
Figure 1 illustrates this rule. In figures
representing edges, we use the following notation. Each
edge is labeled with an arrow, a symbol to the left of
the arrow, and a possibly empty string to the right. The
symbol is the left-hand side of the partially applied production. The string is the part of that production that
remains to be applied.
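Read operationally, the fundamental rule is a single inference over edge records. The following Prolog sketch uses our own edge(From, To, LHS, Remaining) representation; the sample symbols s, np, vp and the vertex numbers are invented for the example.

% fundamental(Active, Passive, New): combine an active edge whose next
% symbol is Q with a passive edge for Q, producing a new edge with Q applied.
fundamental(edge(A, B, LHS, [Q|Rest]),     % active edge, next symbol Q
            edge(B, C, Q, []),             % passive edge for Q from B to C
            edge(A, C, LHS, Rest)).        % resulting edge from A to C

% ?- fundamental(edge(0, 1, s, [np, vp]), edge(1, 3, np, []), New).
% New = edge(0, 3, s, [vp]).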
Figure 1: The fundamental rule.

Top-down and bottom-up parsing. The fundamental rule takes only existing edges to create new ones, and does not use information from the set of productions.
Therefore, a mechanism is needed for building edges from
productions. Two main mechanisms for this purpose
are used, commonly called "top-down" and "bottom-up"
rules. The former builds parse trees from the root towards the leaves, and the latter does so from the leaves
towards the root.
The top-down rule creates edges as follows. If an active edge from a to b is added to the chart, in which the next symbol to be applied is Q, then create one empty active edge from b to b for every production having Q as left-hand side and labeled with that production. Figure 2 exemplifies this rule.

Figure 2: The top-down rule.
Given a parse tree having a leaf Q and a node P as
parent of Q, this rule allows for Q to be expanded by
creating an empty active edge with Q as left-hand side.
Hence, parse trees are built by expanding the leaves with
nonterminals, which is a construction of parse trees from
the root towards the leaves.
The bottom-up rule creates edges as follows. If a passive edge from a to b is added to the chart, in which
the left-hand side symbol is Q, then create one empty active edge from a to a for every production having Q as
first symbol on the right-hand side and labeled with that
production. This rule is depicted in Figure 3.
The bottom-up rule takes a passive edge, representing
a parse subtree with Q as root. By creating an empty
active edge with Q as first symbol to be applied, and P
as left-hand side, Q becomes the child of a node P, which
is the root of a new subtree. Thus, this rule builds parse
trees from the leaves towards the root.
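Under the same edge(From, To, LHS, Remaining) representation as in the earlier sketch, the two edge-creation rules can be written as follows; the production/2 facts form an invented toy grammar, not one used in the paper.

% production(LHS, RHS): assumed toy grammar.
production(s, [np, vp]).
production(np, [det, n]).

% Top-down: an active edge needing Q spawns, at its right end, an empty
% active edge for every production with Q on the left-hand side.
top_down(edge(_A, B, _LHS, [Q|_Rest]), edge(B, B, Q, RHS)) :-
    production(Q, RHS).

% Bottom-up: a passive edge for Q spawns, at its left end, an empty active
% edge for every production whose right-hand side starts with Q.
bottom_up(edge(A, _B, Q, []), edge(A, A, LHS, RHS)) :-
    production(LHS, RHS),
    RHS = [Q|_].

% ?- top_down(edge(0, 1, s, [np, vp]), E).   % E = edge(1, 1, np, [det, n])
% ?- bottom_up(edge(2, 4, np, []), E).       % E = edge(2, 2, s, [np, vp])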
Figure 3: The bottom-up rule.
Base of the chart. The fundamental rule takes two
edges. One of them is active and the other one passive.
The next symbol to be applied in the former must be
the left-hand side of the latter. This means that the case
where the next symbol to be applied is a terminal is not
covered (all left-hand sides of productions are nonterminals). We can remedy this situation by assuming that
the productions have been written in such a way that
each terminal occurs only in productions with exactly
one symbol (that terminal) on the right-hand side. Now
we can create certain edges as follows. For each production with a terminal occurring in the string being parsed,
we create a passive edge from that terminal to the next
one, labeled with that production. We can do so, because
an edge represents a partially applied production (where
"partially" may mean "completely") and all those productions can be immediately applied. Now we can rely
only on the fundamental rule to operate existing edges.
We shall call the set of all edges created from terminals
the base of the chart.
Initialization. To initialize a parser using the bottom-up rule, it suffices to create the base. The reason is that
the creation of edges in the bottom-up rule depends only
on the existence of a passive edge. In a parser using
the top-down rule, however, we must also create empty
active edges from the first symbol of the string being
parsed to itself labeled with productions having the start
symbol of the grammar as left-hand side. This is because
such a rule uses an active edge to create another one.
Agenda. The rules for producing edges that we have
described only create edges, but do not add them to the
chart. Normally, chart parsers store edges in two different data structures: the chart and an agenda of the set
of edges to be added to it. The choice of the procedure
for selecting edges from the agenda to be added to the
chart is a degree of freedom relegated to the chart-parser
designers. When an edge is removed from the agenda,
it is added to the chart only if it has not been added
before.
Figure 4: A chart constructed with the top-down rule.
Example. Figure 4 shows a chart created by a parser
using the top-down rule for the grammar with productions:
a  → k0 a k1
a  → ⟨[a], [], [b]⟩
k0 → ⟨[], [a], [b]⟩
k1 → ⟨[a], [b]⟩

and the input string ⟨[], [a], [b]⟩ ⟨[a], [], [b]⟩ ⟨[a], [b]⟩ ⟨[], [a, b]⟩. Terminals have been enclosed in angled brackets. The last symbol ⟨[], [a, b]⟩ is not part of the string itself, but rather an end marker. This example will be used again to illustrate the chart created by a proof procedure when concatenating [a] to [b].
Phillips' variant of the bottom-up parser.
Phillips observed [Simpkins and Hancox 1990] that the
bottom-up chart parser can be modified so that some
edges can be disposed of as the chart is built. The
agenda, then, only keeps passive edges, ordered with respect to the position of the symbol on the string they start
from. The chart only keeps active edges. When the first
passive edge E is removed from the agenda and momentarily added to the chart, then
1. the fundamental rule is applied as many times as
possible, and
2. the bottom-up rule is also applied if possible, followed by applications of the fundamental rule.
In both cases, if the resulting edges are active, they are
added to the chart; otherwise they are added to the
agenda. After this, E can be disposed of. The reason
is that E cannot contribute to the creation of any more
new edges.
1128
3 Chart parsers as proof procedures
In this section we will show that chart parsers can be regarded as proof procedures for compositional programs.
State-oriented computations. The difference-list representation of strings associates a production

    p0 → p1 ... pn    (1)

with a clause of the form

    p0(X0, Xn) ← p1(X0, X1), ..., pn(Xn-1, Xn)    (2)

and a production with a single terminal a on its right-hand side

    p → a    (3)

with

    p([a|X], X) ←    (4)

With a programming language having only those clauses we cannot compute all computable functions. But if we generalize (4) to

    p(t, t') ←    (5)

where t and t' are terms such that var(t') ⊆ var(t), we can. (Throughout, var(t) denotes the set of variables occurring in term t.) This can be shown, for instance, by associating a logic program with a flowchart in such a way that both have the same set of computations [Clark and van Emden 1981]. A refutation for such a program and a query with a ground term in its first argument may be said to define a sequence of ground terms, resembling the sequence of states in a computation of a programming language using destructive assignment. Thus we shall say that such a logic program defines state-oriented computations.

Strings vs. state-oriented computations. There are two main differences between state-oriented computations and strings. One is that at a given point of a state-oriented computation, there may be more than one way to extend it. State-oriented computations are then said to be nondeterministic. This phenomenon does not occur in strings, which have a linear structure.

The other difference is that whereas we do know all the symbols of the string before it is parsed, we do not know initially all the states in a computation. A proof procedure could in principle compute some sequence of states before trying to build a chart. However, it may not be convenient to do so, because not all sequences of states form the base of a chart. A better idea is to extend the computations one step at a time, guided by the part of the chart built so far.

Chart parsers as proof procedures. We shall generalize chart parsers to proof procedures by establishing a correspondence between chart parsing and resolution. The difference-list representation of languages suggests that clauses of the form (2) should play the role of productions with no terminals on the right-hand side (1). Clauses of the form (5) would then be the counterpart of productions with exactly one terminal on the right-hand side (3).

Given this correspondence, we now turn our attention to edges. The fundamental rule of chart parsing takes two edges and produces another one. Resolution, on the other hand, takes two clauses and produces another one. This suggests identifying edges with clauses and the fundamental rule with a resolution step.

The fundamental rule. If an edge from a to b labeled with p0 → pi ... pn corresponds to a clause of the form

    p0(a, Xn) ← pi(b, Xi), pi+1(Xi, Xi+1), ..., pn(Xn-1, Xn)    (6)

then the fundamental rule corresponds to a resolution step having (6) (which plays the role of the active edge) and

    pi(b, c) ←

(which plays the role of the passive edge) as input clauses. The resolving clause of this resolution step is

    p0(a, Xn) ← pi+1(c, Xi+1), ..., pn(Xn-1, Xn)

which corresponds to an edge from a to c labeled with p0 → pi+1 ... pn. By correctness of resolution, the resolving clause is a logical consequence of the two input clauses. Thus, we have generalized the fundamental rule to a correct operation.
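As a concrete instance (our illustration, using the same conventions), take the active-edge clause p0(a, X2) ← p1(b, X1), p2(X1, X2) and the passive-edge clause p1(b, c) ←. Resolving on the leftmost body atom gives p0(a, X2) ← p2(c, X2), which corresponds to an edge from a to c labeled with p0 → p2.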
The top-down and the bottom-up rules. Given the above identification of clauses with edges, the top-down rule for parsing corresponds to the following. Let P be a program in compositional form. If a clause whose leftmost body atom is of the form q(b, Y) is added to the chart, then, for every clause in P whose head predicate is q, create the instance of that clause obtained by unifying b with the input argument of its head. The created clause is an instance of a clause in P, which is a logical consequence of P. The bottom-up rule can be generalized in a similar way.
The base. The base can be extended one step at a time as follows. For each clause that is created whose leftmost body atom is of the form pi(b, Xi), create a clause

    r(b, t'θ) ←

for each clause in P of the form

    r(t, t') ←

such that b and t unify with unifier θ and there is a path from pi to r. There is a path from p to r if

1. p is r, or

2. there is a clause in P whose head predicate is p and whose leftmost body atom has predicate q, and there is a path from q to r.
4 Conversion of fixed-mode to compositional programs
We have seen that chart parsers can be regarded as
proof procedures for compositional programs. However,
logic programs are not normally written in compositional form. In this section we observe that it is possible to convert a fixed-mode logic program with ground data structures into compositional form. The resulting program is
logically implied by an extension of the original one.
First we define the class of programs transformable by
our method and the class produced by it. Then we prove,
for a particular example, the correctness of the resulting
program. We omit the proof for the general case, which
can be found in [Rosenblueth 1991].
4.1 Directed and compositional programs
Directed form. The class of transformable programs has fixed modes. Thus we assume, without loss of generality, that in each predicate all input arguments have been grouped into one argument, and all output arguments into another one. We write the input argument first, and the output argument second. A definite clause of the form

    p0(t0, t'n) ← p1(t'0, t1), p2(t'1, t2), ..., pn(t'n-1, tn)

where

1. var(ti) ∩ var(tj) = ∅, for i, j = 0, ..., n and i ≠ j;

2. var(t'i) ⊆ var(t0) ∪ ... ∪ var(ti), for i = 0, ..., n;

3. each variable occurring in t'i occurs only once in t'i, for i = 0, ..., n;

is a directed clause. A directed program is a logic program having only directed clauses. Condition 1 causes the term constructed when a subgoal succeeds to have an effect only on the input of other subgoals. Condition 2 causes the input argument of all selected subgoals to be ground if the input of the initial query is also ground and subgoals are selected in a left-to-right order. We include Condition 3 only for technical reasons. This is a minor restriction that considerably simplifies both stating our transformation and proving it correct. We call these "directed programs" because we can visualize the binding of a variable as flowing from one occurrence to subsequent occurrences.

Compositional form. A compositional clause is a definite clause of the form

    p0(X0, Xn) ← p1(X0, X1), p2(X1, X2), ..., pn(Xn-1, Xn)

or

    p(t, t') ←

where t and t' are terms such that var(t') ⊆ var(t), and the Xi are distinct variables. A logic program with only compositional clauses is a compositional program.
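To illustrate the directed conditions with an example of our own: the clause p(X, f(X, Y)) ← q(X, Y) is directed, since the subgoal's input X is built from the head's input (Condition 2), the head's output f(X, Y) is built from the head's input and the subgoal's output (Condition 2), and the head's input and the subgoal's output share no variables (Condition 1). By contrast, p(X, Z) ← q(Y, Z) is not directed, because the subgoal's input Y is not built from the head's input, violating Condition 2.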
We shall need various axioms. As with program
clauses, we assume that each axiom is implicitly universally quantified with respect to its variables.
Normally, an SLD-derivation is either successful,
failed, or infinite. Sometimes, however, we shall use
derivations that end in a clause that could possibly be
resolved with a program clause. We shall refer to these
derivations as partial derivations.
A partial derivation with a single-subgoal initial query
yields a conditional answer [Vasey 1986]. Such an answer is a clause in which the head is the subgoal in the
initial query of that derivation with the composition of
the substitutions applied to it, and the body is the set
of subgoals in the last query of that derivation.
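For instance (our illustration), suppose a program contains the clause p(X) ← q(X), r(X). A partial derivation for the single-subgoal query ← p(a) that stops after resolving with this clause yields the conditional answer p(a) ← q(a), r(a): its head is the initial subgoal with the computed substitution applied, and its body is the set of subgoals in the last query.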
4.2 Example
We illustrate our method with the following program for
concatenating two lists. It defines the usual append relation, but its arguments have been grouped in such a way
that its two inputs constitute the first argument, and its
output, the second. a(⟨X, Y⟩, ⟨Z⟩) holds if Z is the concatenation of the list Y at the end of the list X. The angled brackets ⟨ ⟩ are an alternative notation for ordinary brackets [ ], that we use to group input and output
arguments. We do this for clarity.
    a(⟨[], Y⟩, ⟨Y⟩) ←    (7)

    a(⟨[W|X], Y⟩, ⟨[W|Z]⟩) ← a(⟨X, Y⟩, ⟨Z⟩)    (8)

In (8), the head's input ⟨[W|X], Y⟩ is the term t0 and its output ⟨[W|Z]⟩ is t'1, while the subgoal's input ⟨X, Y⟩ and output ⟨Z⟩ are t'0 and t1, respectively.
We shall convert (8), which is directed, to compositional
form. This process can be motivated as follows.
Assume that we wish to construct an SLD-derivation
for (7) and (8) with a query having a ground input that
unifies with the head of (8). It is necessary, then, to
remember the t~rm with which W unifies, to be able
to add it to the front of the result of appending the
lists that unify with X and Y. This lack of information in the arguments of the subgoal of (8) prevents us
from representing a computation by the composition of
the relation denoted by a(X, Y) with itself. To be able
to use relational composition for representing computations, we must provide the missing information to the
arguments. A common technique in the implementation
of state-oriented languages for recording values needed in
subsequent steps of a computation is the use of a stack.
This suggests storing the term unifying with W in a list
that is treated as a stack. We thus define the predicate:
    a(⟨St0|X⟩, ⟨St1|Y⟩) ↔ St0 = St1 & a(X, Y)    (9)
Although both St0 and St1 represent the same stack, it
will be convenient to keep two names for this term, so
that the input of this new predicate shares no variables
with the output. Later we will see why we wish clauses
in which the input and the output of their atoms share
no variables.
We will also use the standard equality theory. This theory consists of the following axioms:

    X = X ←
    X = Y ← Y = X
    X = Z ← X = Y, Y = Z
    f(X1, ..., Xn) = f(Y1, ..., Yn) ← X1 = Y1, ..., Xn = Yn
    p(U, V) ← U = X, V = Y, p(X, Y)

which are called, respectively, reflexivity, symmetry, transitivity, function substitutivity, and predicate substitutivity. Note that the last two axioms are actually axiom schemas; an axiom is included for every function and predicate symbol, respectively.
Next, we can derive another clause in which the input and the output of the atoms have no variables in common:

    a(⟨[W|X], Y⟩, ⟨[W'|Z]⟩) ← W = W', a(⟨X, Y⟩, ⟨Z⟩)    (10)

This clause can be obtained as a conditional answer, starting from the query ← a(U, V) and using function substitutivity to disassemble the term ⟨[W|Z]⟩, and reflexivity to assemble it with W' instead of W.
Next we can proceed as follows. Unfolding¹ (10) on the "if" part of the definition of a (9), we obtain:

    a(⟨St0, [W0|X0], Y0⟩, ⟨St1, [W1|Z1]⟩) ← St0 = St1, W0 = W1, a(⟨X0, Y0⟩, ⟨Z1⟩)

Next we fold the "iff" version [U|V] = [U'|V'] ↔ U = U' & V = V' of the function substitutivity axiom for the list-constructor function symbol:

    a(⟨St0, [W0|X0], Y0⟩, ⟨St1, [W1|Z1]⟩) ← [W0|St0] = [W1|St1], a(⟨X0, Y0⟩, ⟨Z1⟩)

and fold the definition of a:

    a(⟨St0, [W0|X0], Y0⟩, ⟨St1, [W1|Z1]⟩) ← a(⟨[W0|St0], X0, Y0⟩, ⟨[W1|St1], Z1⟩)    (11)

Now the head (W0) of the first list in the original clause can be thought of as being removed from that list and pushed onto the stack, then being removed from the stack with another name (W1), and finally added to the front of the result of appending the tail of the first list to the second.

The fact that in (11) the inputs share no variables with the outputs allows us to fold the definitions of k0 and k1:

    k0(U, V) ↔ ∃St0 ∃W0 ∃X0 ∃Y0.[⟨St0, [W0|X0], Y0⟩ = U & ⟨[W0|St0], X0, Y0⟩ = V]

    k1(U, V) ↔ ∃St1 ∃W1 ∃Z1.[⟨[W1|St1], Z1⟩ = U & ⟨St1, [W1|Z1]⟩ = V]

in the following clause:

    a(U0, U3) ← ⟨St0, [W0|X0], Y0⟩ = U0,
                ⟨[W0|St0], X0, Y0⟩ = U1,
                ⟨[W1|St1], Z1⟩ = U2,
                ⟨St1, [W1|Z1]⟩ = U3,
                a(U1, U2)

which is a logical consequence of (11) and the standard equality theory. The resulting clause is:

    a(U0, U3) ← k0(U0, U1), a(U1, U2), k1(U2, U3)

¹ In program-transformation terminology, the "unfold" operation is a resolution step. The "fold" operation replaces the subgoals that unify with a conjunction of atoms by a single atom, using a definition.
Using a result found, for instance, in [Shoenfield 1967
p. 57, 58] we can prove that the fold steps preserve all
models of the program.
It may not be practical to transform a program with
fold and unfold operations. The compositional form of
a directed program may be obtained in a more straightforward manner based on the theorem in the Appendix.
4.3 Example (continued)
The compositional form of the append program used to concatenate lists is, then:

    a(U0, U3) ← k0(U0, U1), a(U1, U2), k1(U2, U3)
    a(⟨St, [], Y⟩, ⟨St, Y⟩) ←
    k0(⟨St, [W|X], Y⟩, ⟨[W|St], X, Y⟩) ←
    k1(⟨[W|St], Z⟩, ⟨St, [W|Z]⟩) ←
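For readers who want to execute this program, here is a rendering in plain Prolog (our own sketch, not part of the paper): the angled tuples ⟨...⟩ are encoded as Prolog lists, with the stack in the first position.

    % Compositional append with an explicit stack, following the clauses above.
    % A tuple <St, X, Y> is rendered as the Prolog list [St, X, Y].
    a(U0, U3) :- k0(U0, U1), a(U1, U2), k1(U2, U3).
    a([St, [], Y], [St, Y]).

    k0([St, [W|X], Y], [[W|St], X, Y]).
    k1([[W|St], Z], [St, [W|Z]]).

The query ?- a([[], [a], [b]], Z) then answers Z = [[], [a, b]], mirroring the query ← a(⟨[], [a], [b]⟩, Z) discussed next.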
The chart created by a proof procedure using the top-down rule for this program and the query ← a(⟨[], [a], [b]⟩, Z) was shown in Figure 4.
5 A comparison with Pereira and Warren's Earley deduction
Pereira and Warren [1983] have extended Earley's [1970]
algorithm to a proof procedure for logic programs that
they call "Earley deduction," and we shall now compare
their work with ours. Their proof procedure has the
advantage that it can be applied to any logic program.
Two rules produce new clauses; when none can be applied, the process terminates. Since chart parsers are a
generalization of Earley's algorithm, we can give such
rules using the chart-parsing terminology.
1. If the chart contains a clause C having a selected
literal that unifies with a unit clause either in the
chart or in the program, then create the resolvent of
C with that unit clause. (This rule is the counterpart of the fundamental rule as well as the extension
of the base.)
2. If the chart contains a clause having a selected literal that unifies with the head of a nonunit clause C in the program with most general unifier θ, then create the clause Cθ. (This rule parallels the top-down rule of chart parsing.)
A new clause is added to the chart only if there is
no clause already in the chart that subsumes the new
one. Subsumption, however, is NP-complete [Garey and
Johnson 1979].
Earley deduction terminates for some programs if subsumption is replaced by a test for syntactic equality.
This change results in a proof procedure that can be
faster than the original Earley deduction and our methods. Our proof procedures, however, are preferable to
this variant of Earley deduction in programs for which
our methods terminate but such a variant does not. We
now exhibit one such example. Given the directed program
    p(0, X) ← p(0, f(X))

and a chart initialized with the clause ans(Y) ← p(0, Y), Earley deduction with a syntactic equality test instead of subsumption produces the infinite sequence

    p(0, Y) ← p(0, f(Y))
    p(0, f(Y)) ← p(0, f(f(Y)))
    ...
With subsumption, Earley deduction does terminate for
this example. Our method, in contrast, does not require
subsumption and yet also terminates.
We have implemented Earley deduction based on the top-down chart parser of [Gazdar and Mellish 1989, pp. 211-212], using Robinson's [1965] subsumption algorithm as modified in [Gottlob and Leitsch 1985]. We have also adapted both top-down and bottom-up parsers [Gazdar and Mellish 1989, pp. 208-212] to proof procedures for compositional programs. In addition, we have modified Phillips' variant of the bottom-up chart parser as presented in [Simpkins and Hancox 1990]. The following table summarizes execution times for several programs and queries. The tests were performed on a SUN SPARCstation 1 using SICStus Prolog.
              PW1        top-down        Phillips           PW2
             time      time     su      time     su      time     su
    perm       48        46    1.0        11    4.4         7    6.9
    hanoi      36        21    1.7         9    4.0         2   18.0
    append     49        22    2.2         5    9.8         6    8.2
    qsort     249        30    8.3         7   35.6        17   14.6
"perm" computes all permutations (four elements),
"hanoi" solves the Towers of Hanoi problem using difference lists to store the sequence of steps of the solution (five disks), "append" is the ordinary append
used to concatenate lists (80 elements), and "qsort" is
quicksort using difference lists (20 elements). "PW1" is
Pereira and Warren's proof procedure, "top-down" and
"Phillips" result from our method, and "PW2" is a variant of Pereira and Warren's proof procedure in which
subsumption has been replaced by a syntactic equality
test. "su" stands for "speedup." Times are in seconds.
6 Concluding remarks
Chart parsers work for a generalization of the difference-list representation of context-free grammars. This generalization replaces the clauses representing productions
with exactly one terminal by clauses having terms subject to only one syntactic restriction: all variables in the
second argument must appear in the first (compositional
programs).
It is possible to transform [Rosenblueth 1991] fixed-mode logic programs into this generalization by adding
arguments that play the role of a stack. Consequently,
chart parsers can be used as proof procedures for
fixed-mode logic programs transformed by this method.
Strings correspond to sequences of ground terms.
Experiments have shown that programs so transformed can be executed several times faster than with
the previous adaptation of Earley's parser to a proof procedure done by Pereira and Warren [1983].
Phillips has modified [Simpkins and Hancox 1990] the
bottom-up chart parser so that portions of the chart being built can be disposed of. It is essential in the doctored
parser to keep edges ordered with respect to the string
being parsed. In compositional programs, computations
form sequences and Phillips' idea can also be applied.
It is not clear how to apply it to Pereira and Warren's
method.
Proof procedures obtained from chart parsers terminate for some programs for which Prolog does not. In
addition, it is possible to build charts in parallel [Trehan
and Wilk 1988].
Acknowledgments
We are grateful to Felipe Bracho, Carlos Brody, Warren Greiff, Rafael Ramirez, Paul Strooper, and Carlos
Velarde. The anonymous referees also made valuable
suggestions. We acknowledge the facilities provided by
IIMAS, UNAM.
Bibliography

[Clark and van Emden 1981] Keith L. Clark and M.H. van Emden. Consequence verification of flowcharts. IEEE Transactions on Software Engineering, SE-7(1):52-60, January 1981.

[Earley 1970] Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 14:453-460, 1970.

[Garey and Johnson 1979] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.

[Gazdar and Mellish 1989] Gerald Gazdar and Chris Mellish. Natural Language Processing in Prolog: An Introduction to Computational Linguistics. Addison-Wesley, 1989.

[Gottlob and Leitsch 1985] G. Gottlob and A. Leitsch. On the efficiency of subsumption algorithms. Journal of the ACM, 32(2):280-295, 1985.

[Pereira and Warren 1983] Fernando C.N. Pereira and David H.D. Warren. Parsing as deduction. Technical Report 295, SRI, June 1983.

[Robinson 1965] J.A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12:23-41, 1965.

[Rosenblueth 1991] David A. Rosenblueth. Fixed-mode logic programs as state-oriented programs. Technical Report Preimpreso No. 2, IIMAS, UNAM, 1991.

[Shoenfield 1967] Joseph R. Shoenfield. Mathematical Logic. Addison-Wesley, 1967.

[Simpkins and Hancox 1990] Neil K. Simpkins and Peter Hancox. Chart parsing in Prolog. New Generation Computing, 8:113-138, 1990.

[Trehan and Wilk 1988] R. Trehan and P.F. Wilk. A parallel chart parser for the committed choice non-deterministic logic languages. In K.A. Bowen and R.A. Kowalski, editors, Logic Programming: Proceedings of the Fifth International Conference and Symposium, pages 212-232. MIT Press, 1988.

[Vasey 1986] P. Vasey. Qualified answers and their application to transformation. In Proceedings of the Third International Logic Programming Conference, pages 425-432. Springer-Verlag Lecture Notes in Computer Science 225, 1986.
Appendix

Our method for converting fixed-mode programs to compositional form is based on the following theorem, which is proved in [Rosenblueth 1991].

Theorem 1  Let C be a directed clause

    p0(t0, t'n) ← p1(t'0, t1), p2(t'1, t2), ..., pn(t'n-1, tn)

and let

    μi = (var(t0) ∪ ... ∪ var(ti-1)) ∩ (var(t'i) ∪ ... ∪ var(t'n))

for i = 1, ..., n. Then the clause

    p0(X0, X2n+1) ← k0(X0, X1), p1(X1, X2),
                    k1(X2, X3), p2(X3, X4), ...,
                    pn(X2n-1, X2n), kn(X2n, X2n+1)

is logically implied by C, the standard equality theory, the "iff" version of the function substitutivity axiom for the list-constructor function symbol, and the following axioms:

    pi(⟨St|X⟩, ⟨St'|Y⟩) ↔ St = St' & pi(X, Y)        i = 0, ..., n

    k0(U, V) ↔ ∃Y1,0 ... ∃Ym0,0.[⟨St|t0⟩ = U & ⟨E1|t'0⟩ = V]

    k1(U, V) ↔ ∃Y1,1 ... ∃Ym1,1.[⟨E1|t1⟩ = U & ⟨E2|t'1⟩ = V]
    ...
    kn-1(U, V) ↔ ∃Y1,n-1 ... ∃Ymn-1,n-1.[⟨En-1|tn-1⟩ = U & ⟨En|t'n-1⟩ = V]

    kn(U, V) ↔ ∃Y1,n ... ∃Ymn,n.[⟨En|tn⟩ = U & ⟨St|t'n⟩ = V]

where Y1,i, ..., Ymi,i are the variables on the right-hand side of the definition of ki, except for U and V, for i = 0, ..., n; Ei is any list of the form [X1,i, ..., Xdi,i|St], and {X1,i, ..., Xdi,i} = μi, for i = 1, ..., n.
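As a worked check of our own: for the append clause (8) we have n = 1, t0 = ⟨[W|X], Y⟩, t'0 = ⟨X, Y⟩, t1 = ⟨Z⟩, t'1 = ⟨[W|Z]⟩, and μ1 = {W}, so E1 may be taken to be [W|St]. The theorem then yields k0(U, V) ↔ ∃St∃W∃X∃Y.[⟨St, [W|X], Y⟩ = U & ⟨[W|St], X, Y⟩ = V] and k1(U, V) ↔ ∃St∃W∃Z.[⟨[W|St], Z⟩ = U & ⟨St, [W|Z]⟩ = V], which correspond to the unit clauses for k0 and k1 given in Section 4.3.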
A Discourse Structure Analyzer for Japanese Text*
K. Sumita, K. Ono, T. Chino, T. Ukita, and S. Amano
Toshiba Corp. R&D Center
Komukai-Toshiba-cho 1, Saiwai-ku, Kawasaki 210, Japan
sumita@isl.rdc.toshiba.co.jp

* This work was supported by ICOT (Institute for New Generation Computer Technology), and was carried out as a part of the Fifth Generation Computer Systems research.
Abstract
This paper presents a practical procedure for analyzing
discourse structures for Japanese text, where the structures are represented by binary trees. In order to construct discourse structures for Japanese argumentative
articles, the procedure uses local thinking-flow restrictions, segmentation rules, and topic flow preference. The
thinking-flow restrictions restrict the consecutive combination of relationships detected by connective expressions. Whereas the thinking-flow restrictions restrict
the discourse structures locally, the segmentation rules
constrain them globally, based on rhetorical dependencies between distant sentences. In addition, the topic
flow preference, which is the information concerning the
linkage of topic expressions and normal noun phrases,
chooses preferable structures. Using these restrictions,
the procedure can recognize the scope of relationships
between blocks of sentences, which no other discourse
structure analysis methods can handle. The procedure
has been applied to 18 Japanese articles, different from
the data used for algorithm development. Results show
that this approach is promising for extracting discourse
information.
1 Introduction
A computational theory for analyzing linguistic discourse
structure and its practical procedure are necessary to
develop machine systems dealing with plural sentences;
e.g., systems for text summarization and for knowledge
extraction from a text corpus.
Hobbs developed a theory in which he arranged three
kinds of relationships between sentences from the text coherency viewpoint [Hobbs 1979]. Grosz and Sidner proposed a theory which accounted for interactions between
three notions on discourse: linguistic structure, intention, and attention [Grosz and Sidner 1986]. Litman and
Allen described a model in which a discourse structure
of conversation was built by recognizing a participant's
plans [Litman and Allen 1987]. These theories all depend on extra-linguistic knowledge, the accumulation of
which presents a problem in the realization of a practical
analyzer. The authors aim to build a practical analyzer
which dispenses with such extra-linguistic knowledge dependent on topic areas of articles to be analyzed.
Mann and Thompson proposed a linguistic structure
of text describing relationships between sentences and
their relative importance [Mann and Thompson 1987].
However, no method for extracting the relationships from
superficial linguistic expressions was described in their
paper. Cohen proposed a framework for analyzing the
structure of argumentative discourse [Cohen 1987], yet
did not provide a concrete identification procedure for
'evidence' relationships between sentences, where no linguistic clues indicate the relationships. Also, since only
relationships between successive sentences were considered, the scope which the relationships cover cannot be
analyzed, even if explicit connectives are detected.
This paper discusses a practical procedure for analyzing the discourse structure of Japanese text. The
authors present a machine analyzer for extracting such
structure, the main component of which is a structure
analysis using thinking-flow restrictions for processing of
argumentative documents. These restrictions, which examine possible sequences of relationships extracted from
connective expressions in sentences, indicate which sentences should be grouped together to define the discourse
structure.
2 Discourse structure of Japanese text

2.1 Discourse structure
This paper focuses on analyzing discourse structure, representing relationships between sentences. In text, various rhetorical patterns are used to clarify the principle of
argument. Among them, connective expressions, which
state inter-sentence relationships, are the most significant. They can be divided into the categories described
in Table 1.
Here, connective expressions include not only normal connectives such as "therefore", but also idiomatic
expressions stating relations to the other part of the text, such as "in addition" and "here ... is described."
The authors extracted 800 connective expressions from
a preliminary analysis of more than 1,000 sentences in
several argumentative articles [Ono et al. 1989]. Then, connective relationships were classified into 18 categories as shown in Table 1. Using these relationships, linguistic
structures of articles are captured.
Table 1 is the current version of the relationship categories. The number of relationship categories necessary
and sufficient to represent discourse structures must be
determined through further experimentation. New categories will be formed as need becomes apparent; likewise,
categories found to overlap in function will be merged.
Final categorization can only be fixed after extensive
analysis.
Sentences of similar content may be grouped together into a block. Just as each sentence in a block
serves specific roles, e.g., "serial", "parallel", and "contrast", each block in text serves a similar function. Thus,
the discourse structure must be able to represent hierarchical structures as well as individual relationships between sentences. In this paper, a discourse structure is
represented as a binary tree whose terminal nodes are
sentences; sub-trees correspond to local blocks of sentences in text.
Figure 1 shows a paragraph from an article titled "a zero-crossing rate which estimates the frequency of a
speech signal," where underlined words indicate connective expressions. Figure 2 shows its discourse structure.
Extension relationships are set to sentences without any
explicit connective expressions. Although the fourth and
fifth sentences are clearly the exemplification of the first
three sentences, the sixth is not. Thus, the first five can
be grouped into a block.
Discourse structure can be represented by a formula. The discourse structure in Figure 2 corresponds to the following formula:

    [[[1 [2 3]] [4 5]] 6].
2.2 Local constraint for consecutive relationships
For analyzing discourse structure, a local constraint on
consecutive relationships between blocks of sentences is
introduced. The example shown in Figures 1 and 2 suggests that the sequence of connective relationships can
limit the accepted discourse structures to those most accurately representative of original argumentative text.
Consider the sequence [P Q R], where P, Q,
R are arbitrary (blocks of) sentences. The premise of R
is obviously not only Q but both P and Q. Since the argument in P and Q is considered to close locally, the two
should be grouped into a block. This is a local constraint
on natural argumentation.
Table 1: Connective relationships.
RELATION               EXAMPLES and EXPLANATION
serial connection      dakara (thus, therefore), yotte (then)
negative connection    daga (but), shikashi (though)
reason                 nazenara (because), sono wake wa (the reason is ...)
parallel               doujini (at the same time), sarani (in addition)
contrast               ippou (however), hanmen (on the contrary)
exemplification        tatoeba (for example), ... nado dearu (and so on)
repetition             toiunowa (in other words), sore wa (it is ...)
supplementation        mochiron (of course)
rephrase               tsumari, sunawachi (that is ...)
summarization          kekkyoku (after all), matomeruto (in sum)
extension              kore wa (this is)
definition             koko de ... to suru (... is defined as ...)
rhetorical question    naze ... nanodarouka (Why is it ...)
direction              kokode wa ... wo noberu (here ... is described)
reference              zu X ni ... wo noberu (Fig. X shows ...)
topic shift            sate, tokorode (well, now)
background             juurai (hitherto)
enumeration            dai 1 ni (in the first place), dai 2 ni (in the second place)
1 : In the context of discrete-time signals, zero-crossing is said to occur if successive samples have different algebraic signs.

2 : The rate at which zero crossings occur is a simple measure of the frequency content of a signal.

3 : This is particularly true of narrow band signals.

4 : For example, a sinusoidal signal of frequency F0, sampled at a rate Fs, has Fs/F0 samples per cycle of the sine wave.

5 : Each cycle has two zero crossings so that the long-term average rate of zero-crossings is Z = 2F0/Fs.

6 : Thus, the average zero-crossing rate gives a reasonable way to estimate the frequency of a sine wave.
Figure 1: Text example 1.
Figure 2: Discourse structure for the text example 1. (This structure can be represented as the form [[[1 [2 3]] [4 5]] 6]; the relations appearing in the tree are extension, exemplification, and serial.)
Thinking-flow is defined by a sequence of connective relationships and the way in which the sequence fits
into the allowable structure. The authors have investigated all 324 (18 x 18) pairs of connective relationships and derived possible local structures for thinking-flow restrictions. The pairs of connective relationships can be represented by (r1, r2), where the relations r1 and r2 are arbitrary connective relationships. They can be classified into the following four major groups.
(1) POP-type: permitting [[P r1 Q] r2 R] (eliminating [P r1 [Q r2 R]])
    ex. [[P Q] R] (exemplification, serial).

(2) PUSH-type: permitting [P r1 [Q r2 R]]
    ex. [P [Q R]] (reason).

(3) NEUTRAL-type: permitting both (1) and (2)
    ex. [[P Q] R], [P [Q R]] (parallel).

(4) NON-type: permitting non-structure [P r1 Q r2 R]
    ex. [P Q R].
The relationship sequence of POP-type means that the
local structure for the first two blocks should be popped
up, because the local argument is closed. On the other
hand, the relationship sequence of PUSH-type means
that the local structure should be pushed down.
The relationship sequence of NON-type permits non-structure, which is of the form [P r1 Q r2 R]. Therefore, to be exact, the discourse structure which contains
the sequence of this type is not a binary tree.
The thinking-flow restrictions can be used to eliminate structures expressing unnatural argumentative extensions, by examining their local structures. Although
the thinking-flow restrictions define local constraints on
relationships to neighbors, the scope of relationships is
analyzed by recursively checking all local structures of a
discourse structure.
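As a rough sketch of our own (not the authors' analyzer) of how such local checking could be realized, the following Prolog fragment rejects candidate trees that contain a forbidden local structure; the relation names and the pair table are assumptions used only for illustration.

    % A discourse tree is s(N) for sentence N, or t(Left, Relation, Right).
    % pair_type(R1, R2, Type) classifies a pair of consecutive relationships;
    % the entries below are illustrative assumptions.
    pair_type(exemplification, serial, pop).      % permits [[P Q] R] only
    pair_type(serial, reason, push).              % permits [P [Q R]] only
    pair_type(parallel, parallel, neutral).       % permits both

    % well_formed(+Tree): no local structure violates a POP or PUSH restriction.
    well_formed(s(_)).
    well_formed(t(L, R, Rt)) :-
        \+ violates(t(L, R, Rt)),
        well_formed(L),
        well_formed(Rt).

    % A POP-type pair (R1, R2) eliminates [P R1 [Q R2 X]];
    % a PUSH-type pair eliminates [[P R1 Q] R2 X].
    violates(t(_, R1, t(_, R2, _))) :- pair_type(R1, R2, pop).
    violates(t(t(_, R1, _), R2, _)) :- pair_type(R1, R2, push).

With this table, well_formed(t(t(s(1), exemplification, s(2)), serial, s(3))) succeeds, while well_formed(t(s(1), exemplification, t(s(2), serial, s(3)))) fails, mirroring the POP-type case above.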
2.3 Distant dependencies
The greater part of text can be appropriately analyzed,
using the above local constraints on connective relationships to neighbors, if the relationships are extracted correctly. However, in real text, there are rhetorical dependencies concerning distant sentences, which cannot be
detected by examining only the normal relationships to
neighbors. Two kinds of linguistic clues to distant dependencies must be considered in the realization of a precise
discourse analyzer: rhetorical expressions which cover
distant sentences, and referential relations of words, in
particular, topics.
2.3.1 Rhetorical expressions stating global structure
First, rhetorical expressions which relate to an entire article play an important role. Examples are:

    "... ? ... ? The reason is ...",
    "... as follows. ... (TENSE=present). ... (TENSE=present).",
    "... is not an exceptional case. ...".
Consider the text example in Figure 3, in which unnecessary words are omitted for expositional clarity. In this
text the rhetorical expressions which relate to the entire
paragraph affect its discourse structure. The expressions
"first" and "second" in the last two sentences correspond
to the expression "two pieces" in the first sentence; the
second and the third sentences, therefore, can be said to be connected by a parallel relationship, as they have similar relations with the first sentence. Thus, the discourse
structure in Figure 4 is a natural representation.
While, in real text, there is a wide variety of rhetorical expressions of this type, those that are often used in
argumentative articles can be determined through analysis. A robust discourse analysis system must detect these
rhetorical expressions to restrict discourse structures.
2.3.2 Topic flow
The other significant phenomenon concerning the distant
dependencies is reference. While English uses pronouns
and definite noun phrases in reference, in Japanese, a
phrase that is identical to or a part of the original noun
phrase is used when referring to some other part of the
text. By analyzing the appearance of the same expressions, a restriction or a preference for building discourse
structures can be determined. However, the same expressions tend to scatter in a text, and it is difficult to determine the referent for a reference without task knowledge. In what follows, TS => P denotes that the topic of sentence S refers to a word in P. In the case of the text shown in Figure 5, T2 => 1, T3 => 2, and T4 => 3 hold. If a topic in a sentence refers to a word in the previous sentence, it is regarded as an elaboration of the earlier sentence. Thus,
these sentences must be kept close together in their discourse structure; the structure depicted in Figure 6 is
appropriate for this text.
In addition, the relative importance of the relationships connecting sentences in text must be considered for the topic flow analysis. Connective relationships can be classified into three categories according to their relative importance: left-hand, right-hand, and neutral type. For example, the exemplification relationship is a left-hand type; i.e., for [P Q], P strongly relates to the
global flow of argumentation beyond the outside of this
block, and in this sense P is more important than Q. In
contrast, the serial relationship is a right-hand type, and
the parallel relationship is a neutral type.
Consider the structure [[P r1 Q] r2 R], where r1 is a left-hand type relationship and r2 can be any relationship. If TR => P, the above structure is natural, even if there is the same word as TR in Q. However, if TR => Q, this structure is unnatural, in the sense of coherency. In this case, the structure [P r1 [Q r2 R]] is preferable to [[P r1 Q] r2 R].

On the contrary, in the case where r1 is a right-hand type, [[P r1 Q] r2 R] is a natural structure, even if TR => Q. In short, the naturalness of a discourse structure closely depends on the appearance position of topics and their referents, and the relative importance of the referred nodes.
1 : Two pieces of X are relevant.
2 : First, ... .
3 : Second, ... .
Figure 3: Text example 2 (X is a noun phrase.)
Figure 4: Discourse structure for the text example 2. (The tree connects sentences 2 and 3 by a parallel relationship, which is linked to sentence 1 by an enumeration relationship.)
1 : A wa B to C kara naru. (A consists of B and C.)

2 : C wa ... D to E ni wakerareru. (C is divided into D and E.)

3 : D wa ... F wo motsu. (D has ... F.)

4 : F wa .... (F is ....)

Figure 5: Text example 3 (A - F are noun phrases.)
Figure 6: Discourse structure for the text example 3. (All relations in the tree are extension.)
3 Discourse structure analyzer

3.1 System configuration
Figure 7 shows the discourse structure analyzer, which
consists of five parts: pre-processing, segmentation, candidate generation, candidate reduction and preference
judgement. If input text consists of multiple paragraphs
or multiple sections, every section or every paragraph in
the text is analyzed individually. Figure 8 outlines the
input/output data of each stage for a paragraph. The
outline of each stage of the discourse structure analyzer
is described in the following sections.
3.1.1 Pre-processing
In this stage, input sentences are analyzed, character
strings are divided into words, and the dependency structure for each sentence is constructed. The stage consists
of the following sub-processes:
(1) Extracting the text of an article from chapters or
sections.
(2) Accomplishing morphological and syntactic analysis.
(3) Extracting topic expressions and the reappearance
of the targeted expression.
(4) Detecting connective relationships and constructing their sequence.
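As a rough illustration of Steps (3) and (4), a connective-expression table and the construction of a connection sequence might be sketched in Prolog as follows. This is our own sketch, not the authors' implementation; the table entries and predicate names are assumptions.

    :- use_module(library(lists)).

    % A tiny connective-expression table (assumed entries).
    connective(dakara,   serial_connection).    % "thus, therefore"
    connective(shikashi, negative_connection).  % "though"
    connective(tatoeba,  exemplification).      % "for example"

    % relation_of(+Sentence, -R): R is signalled by the first connective word
    % found in Sentence; extension is the default when none is found.
    relation_of(Sentence, R) :- member(W, Sentence), connective(W, R), !.
    relation_of(_, extension).

    % connection_sequence(+Sentences, -Seq): Sentences is a list of word lists;
    % Seq alternates sentence numbers and relationships.
    connection_sequence(Sentences, Seq) :-
        number_from(Sentences, 1, Seq).

    number_from([_], N, [N]).
    number_from([_, S2|Ss], N, [N, R|Rest]) :-
        relation_of(S2, R),
        N1 is N + 1,
        number_from([S2|Ss], N1, Rest).

For example, connection_sequence([[a], [dakara, b], [tatoeba, c]], Seq) gives Seq = [1, serial_connection, 2, exemplification, 3].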
Figure 7: System overview. (Diagram labels: Input; Segmentation Rule; Thinking-flow Restriction; Topics and their Referents; Output.)
Input sentences (Figure 8): six sentences, in which sentence 2 begins with dai 1 ni (First), sentence 3 with kono (this) and mentions A, sentence 4 begins with A wa (A is), sentence 5 with dai 2 ni (Second), and sentence 6 with shitagatte (Thus).

Pre-processing result: [1 2 3 4 5 6]
In Step (1), the title of an article is eliminated, and
the body is extracted. Next, in Step (2), sentences in the
body of the article, extracted in Step (1), are morphologically and syntactically analyzed. In Step (3), topic
expressions are extracted, according to a table of topic
denotation expressions. The following are examples of
topic expressions.
" ...
" ...
" ...
" ...
wa" (as for ... ),
niwa" (in ... ),
dewa" (in ... ),
nioitewa" (in ... ).
In Step (4), a connective expression is detected based
on an expression table consisting of a word and its part
of speech for individual connective relationships. In this
step, connection sequence, a sequence of sentence identifiers and connective relationships, is acquired. For example, a connection sequence is of the form
Segmentation result:
[1 {2